Meta-Evaluation of IEG Evaluations (FY15–19)

© 2022 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW, Washington, DC 20433
Telephone: 202-473-1000; Internet: www.worldbank.org

ATTRIBUTION
Please cite the report as: World Bank. 2022. Meta-Evaluation of IEG Evaluations (FY15–19). Independent Evaluation Group. Washington, DC: World Bank.

COVER PHOTO
shutterstock/Thaiview

EDITING AND PRODUCTION
Amanda O'Brien

GRAPHIC DESIGN
Luísa Ulhoa

This work is a product of the staff of The World Bank with external contributions. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.

RIGHTS AND PERMISSIONS
The material in this work is subject to copyright. Because The World Bank encourages dissemination of its knowledge, this work may be reproduced, in whole or in part, for noncommercial purposes as long as full attribution to this work is given. Any queries on rights and licenses, including subsidiary rights, should be addressed to World Bank Publications, The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2625; e-mail: pubrights@worldbank.org.

Meta-Evaluation of IEG Evaluations (FY15–19)
February 14, 2022

Contents

Abbreviations v
Acknowledgments vi
Meta-Evaluation Universe vii
Executive Summary ix

1. Introduction 1
  Background, Objectives, and Scope 1
  Questions 2
  Approach 2
2. Framework 5
3. Inventory of Methods 11
  Summary of Main Trends 11
4. In-Depth Review of Evaluations 21
  Attribute 1: Scope and Focus 21
  Attribute 2: Reliability 24
  Attribute 3: Construct Validity 28
  Attribute 4: Internal Validity 35
  Attribute 5: External Validity 38
  Attribute 6: Data Analysis Validity 40
  Attribute 7: Consistency 42
5. Using Innovative Methods in Independent Evaluation Group Evaluations 51
6. Conclusions and Suggestions 56
  Scope and Focus of IEG Evaluations 57
  Use of Conceptual Frameworks and Theories of Change 57
  Clarity of Research Methods and Design 58
  Validity 59
  Consistency 60
  Innovation in Evaluation 61
References 64

Figures

Figure 3.1. Inventory of Methods Referenced in Approach Papers and Evaluation Reports 13
Figure 3.2. Prevalence of Methods over Time 14
Figure 3.3. Distribution of Innovative Methods over Time 15
Figure 3.4. Difference in Methods Tallies between Approach Papers and Evaluation Reports 16
Figure 3.5. References to Special Issues in Approach Papers and Evaluation Reports 17
Figure 3.6. References to Research Design Attributes in Evaluation Reports 18

Tables
Table FM.1. Universe of Evaluation Reports vii
Table 2.1. Division Matrix of Evaluation Reports 9

Appendixes

Appendix A. Stratified Random Sample of IEG Evaluations 74
Appendix B. List of Interviewees 77
Appendix C. Assessment Framework for the IEG Meta-Evaluation 78
Appendix D. Tabulated Scores of Reports and Approach Papers 83
Appendix E. Inventory of Methods Used in Evaluations and Approach Papers 86
Appendix F. Formulation and Categorization of Evaluation Questions Referenced in the Sample of Eight IEG Evaluations 104
Appendix G. Failures When Formulating Evaluation or Research Questions Based on the Literature 118

Abbreviations

CCT conditional cash transfers
CDM Clean Development Mechanism
CF carbon finance
EDM evaluation design matrix
ERPA Emission Reduction Purchase Agreement
FY fiscal year
IEG Independent Evaluation Group
IFC International Finance Corporation
QCA qualitative comparative analysis

All dollars are US dollars unless otherwise indicated.

Acknowledgments

This meta-evaluation was conducted by two senior evaluation consultants, Frans Leeuw and Julian Gayfer, and supported by a research fellow, Ariya Hagh (Georgetown University). The task team leader was Jos Vaessen.

Meta-Evaluation Universe

Throughout the meta-evaluation, reports are referred to by topic rather than title. Table FM.1 provides a gloss.

Table FM.1. Universe of Evaluation Reports (full title, followed by topic)

FY15
» Financial Inclusion: A Foothold on the Ladder toward Prosperity? An Evaluation of World Bank Group Support for Financial Inclusion for Low-Income Households and Microenterprises (topic: Financial inclusion)
» Learning and Results in World Bank Operations: How the Bank Learns (topic: Learning and results)
» The Poverty Focus of Country Programs: Lessons from World Bank Experience (topic: Ending poverty)
» World Bank Group Support to Electricity Access, FY2000–2014 (topic: Electricity access)
» World Bank Support to Early Childhood Development (topic: Early childhood development)

FY16
» Behind the Mirror: A Report on the Self-Evaluation Systems of the World Bank Group (topic: Self-evaluation systems)
» Industry Competitiveness and Jobs: An Evaluation of World Bank Group Industry-Specific Support to Promote Industry Competitiveness and Its Implications for Jobs (topic: Competitiveness and jobs)
» Program-for-Results: An Early-Stage Assessment of the Process and Effects of a New Lending Instrument (topic: Program-for-Results)
» The World Bank Group's Support to Capital Market Development (topic: Capital market development)

FY17
» A Thirst for Change: The World Bank Group's Support for Water Supply and Sanitation, with Focus on the Poor (topic: Water supply and sanitation)
» Data for Development: An Evaluation of World Bank Support for Data and Statistical Capacity (topic: Data for development)
» Growing the Rural Nonfarm Economy to Alleviate Poverty: An Evaluation of the Contribution of the World Bank Group (topic: Rural nonfarm economy)
» Higher Education for Development: An Evaluation of the World Bank Group's Support (topic: Higher education)
» Mobile Metropolises: Urban Transport Matters: An IEG Evaluation of the World Bank Group's Support for Urban Transport (topic: Urban transport)
» Toward a Clean World for All: An IEG Evaluation of the World Bank Group's Support to Pollution Management (topic: Pollution management)
» World Bank Group Country Engagement: An Early-Stage Assessment of the Systematic Country Diagnostic and Country Partnership Framework Process and Implementation (topic: SCD/CPF process)
FY18
» Carbon Markets for Greenhouse Gas Emission Reduction in a Warming World (topic: Carbon markets)
» Engaging Citizens for Better Development (topic: Engaging citizens)
» Growth for the Bottom 40 Percent: The World Bank Group's Support for Shared Prosperity (topic: Shared prosperity)
» The International Finance Corporation's Approach to Engaging Clients for Increased Development Impact (topic: IFC client engagement)
» World Bank Group Support to Health Services: Achievements and Challenges (topic: Health services)

FY19
» 'Creating Markets' to Leverage the Private Sector for Sustainable Development and Growth: An Evaluation of the World Bank Group's Experience through 16 Case Studies (topic: Creating markets)
» Building Urban Resilience: An Evaluation of the World Bank Group's Evolving Experience (2007–17) (topic: Urban resilience)
» Grow with the Flow: An Independent Evaluation of the World Bank Group's Support to Facilitating Trade 2006–17 (topic: Facilitating trade)
» Knowledge Flow and Collaboration under the World Bank's New Operating Model (topic: Knowledge flow and collaboration)
» Two to Tango: An Evaluation of World Bank Group Support to Fostering Regional Integration (topic: Fostering regional integration)
» World Bank Group Support in Situations Involving Conflict-Induced Displacement (topic: Forced displacement)

FY20
» The World's Bank: An Evaluation of the World Bank Group's Global Convening (topic: Convening power)

Source: Independent Evaluation Group.

Executive Summary

Since 2005, the Independent Evaluation Group (IEG) has been subject to independent external reviews. To support the next review, a meta-evaluation of IEG programmatic and corporate process evaluations was conducted in 2020–21 by independent experts. The purpose of the meta-evaluation was to (i) provide inputs on the quality and credibility of IEG's evaluations for IEG's upcoming independent external review and (ii) provide IEG's leadership team with an external perspective and suggestions on how to improve the quality and credibility of evaluations.

The assessment focused on the credibility of evaluations (excluding utility and independence). More particularly, it focused on aspects of credibility that could be gleaned from the reports and Approach Papers. The analysis was conducted in three phases. The first phase (inventory stage) focused on mapping the rationale, scope, use of (innovative) methods, and several research design attributes of all 28 IEG evaluations within the universe of evaluations published from fiscal year (FY)15 to FY19. In the second phase (assessment stage), an assessment framework was developed and applied to a stratified random sample of eight evaluations. The in-depth review assessed evaluations according to their scope and focus, reliability, validity (including construct, internal, external, and data analysis validity), and consistency. Finally, the analysis was supplemented with interviews with IEG team leaders and evaluation officers to obtain contextual information on the design and implementation of evaluations within IEG.

The meta-evaluation arrived at the following six major conclusions and associated suggestions for improvement. First, information presented on scope, rationale, and goals in the evaluation reports and Approach Papers was elaborate, relevant, and thorough. At the same time, the scope of some IEG evaluations tended to be overambitious and diluted. The meta-evaluation offers two suggestions for improvement in this area: (i) The use of portfolio analysis as a standard operational procedure should be reconsidered.
(ii) Evaluators should refrain from formulating "bags of questions," instead devoting more time to refining the focus of evaluations.

Second, IEG evaluations adequately defined concepts (though they did not always operationalize them). More recent evaluations systematically incorporated evidence from the literature and made adequate use of theories of change. However, the function of the theory of change was not always clearly articulated; its relationship to the empirical parts of the evaluative analysis could have been strengthened. The meta-evaluation offers three suggestions in this area: (i) Evaluations should more explicitly articulate the role theories of change play in data collection and analysis, assessing their relationship to relevant empirical work. (ii) Evaluations could be more precise about the content of their theories of change. (iii) Greater attention to operationalizing concepts into variables and measurement instruments could improve construct validity.

Third, clarity in evaluation design has improved in IEG evaluations over the past five years. The use of tools such as the evaluation design matrix is widespread. However, sometimes the evaluation design matrix presents only a list of "evaluative instruments." Several evaluations still do not show sufficient clarity on how different methods help answer specific evaluation questions and how evidence from different sources is triangulated and used to substantiate evaluation findings. Two suggestions are provided for this area: (i) More attention should be paid to distinguishing between data collection and data analysis methods, fully articulating the ways in which the two complement each other. (ii) Guidance on best practices in the practical implementation of principles of triangulation and synthesis in evaluation should be developed.

Fourth, while there are good examples of evaluations with high internal, external, and data analysis validity of findings, there are ongoing challenges that merit further attention. The meta-evaluation proposes three suggestions for improvement in this area: (i) Although suggestions related to the use of theories of change have already been presented, it should be noted that improvements in this area can also improve internal validity. (ii) A dedicated section on the diagnosis and treatment of internal and external validity issues could be useful in mitigating some of the challenges posed by the complexity of evaluands. (iii) Guidance (as suggested above) on how to triangulate evidence with and across sources of evidence would be helpful.

Fifth, IEG evaluation reports fared quite well with respect to the consistency among rationale, scope, questions, methods, findings, and recommendations. There was generally a strong fit among the use of methods, data sources, and evaluation questions. One suggestion is provided for this area: To further strengthen analytical rigor, IEG evaluations should consider developing a more systematic approach to assess how contextual (macro and meso) characteristics may or may not influence the behavior of beneficiaries of World Bank Group–supported interventions.

Finally, during FY15–19, IEG evaluations demonstrated a broadening of the range of methods used to respond to evaluation questions. While innovation in methods used for data collection and analysis should be applauded, such innovation should not become an end in itself.
The meta-evaluation provides the following suggestion for improvement in this area: IEG could benefit from a more strategic view of methodological innovation in evaluation. Given the recent challenges posed by the coronavirus (COVID-19) pandemic, digital tools and approaches will undoubtedly grow in relevance in the work of the Bank Group generally and IEG specifically. IEG should therefore be ready to learn from recent experiences in innovation (especially in the field of data science) and make informed decisions to adapt its practices where needed.

1 | Introduction

Background, Objectives, and Scope

Since 2005, the Independent Evaluation Group (IEG) has been subject to independent external reviews assessing the credibility, utility, and independence of its work.1 To support the next review, a meta-evaluation of IEG evaluations was conducted in 2020–21. More specifically, the purpose of the meta-evaluation was the following:

» To provide inputs on the quality and credibility of IEG's evaluations for IEG's upcoming independent external review, and
» To provide IEG's leadership team an external perspective and suggestions on how to improve the quality and credibility of evaluations.

IEG conducts independent evaluations of the World Bank Group's interventions and processes mainly at three levels of analysis:

» Major or thematic and corporate process evaluations with a global or regional reach,2
» Country Program Evaluations, and
» Project-level evaluations.

The meta-evaluation covered the first category of IEG's evaluations, programmatic and corporate process evaluations,3 completed between fiscal year (FY)15 and FY19.

Questions

The meta-evaluation was guided by the following questions:

1. Can the meta-evaluation appraise the quality and credibility of IEG evaluations according to a dedicated assessment framework? How would such a framework be operationalized?4
2. Which data are required for such an assessment framework?
3. Which methodological approaches (both standard and broadened) were used in the 28 IEG evaluation reports published between FY15 and FY19? How did the methods used in the evaluation reports compare with what was initially proposed in the Approach Papers guiding the evaluations? Did the evaluations explicitly discuss elements of research design?
4. What are the results of the in-depth review of the eight selected IEG evaluations?
5. What do evaluation reports, Approach Papers, and interviews with IEG staff tell us about the use of innovative methods in the context of evaluation in IEG?
6. What conclusions may be derived from the inventory, in-depth review, and interviews? What suggestions can be made for future IEG evaluations?

Approach

The meta-evaluation relied primarily on a desk review of evaluation reports (and their corresponding Approach Papers) and was complemented by selected interviews. The assessment focused on the credibility of evaluations (excluding utility and independence). More particularly, it focused on aspects of credibility that could be gleaned from the reports and Approach Papers. The assessment framework was developed according to the guidelines of the American Evaluation Association, the Organization for Economic Co-operation and Development's Development Assistance Committee, and the Evaluation Cooperation Group.
It was further supplemented by standards from various professional evaluation societies, selected international development organizations, and applied behavioral and social science research.

The analysis was conducted in three phases. The first phase (inventory stage) focused on the rationale and scope of all 28 IEG evaluations within the universe of evaluations published from FY15 to FY19. The inventory also appraised the evaluation reports and Approach Papers in terms of various research design attributes, the reliability of the evaluation approach, and the use of innovative (also referred to here as broadened) methods. An inventory of core attributes provided insights on credibility, research design, and methodological diversity across all reports in the universe. A combination of manual and automatic content analysis was used to tabulate the prevalence of conventional (standard) and innovative (broadened) evaluative methods, comparing the methods suggested in Approach Papers with those used in the evaluation reports.5

In the second phase (assessment stage), an in-depth review guided by the assessment framework was conducted to assess the quality and credibility of a stratified random sample of eight evaluations. The review assessed evaluations according to their scope and focus, reliability, validity (including construct, internal, external, and data analysis validity), and consistency. Special attention was also given to the use of innovative evaluation and research methods. Finally, the analysis was supplemented with interviews with IEG team leaders and evaluation officers to obtain contextual information on the design and implementation of evaluations within IEG.

The remainder of the report is structured as follows: Chapter 2 presents the assessment framework, outlining the operationalization of concepts and the set of guidance used to assess the various attributes under consideration. The chapter also provides a brief overview of the ways in which the data were collected and analyzed. Chapter 3 describes the output from the inventory exercise, covering 28 IEG evaluations.6 Chapter 4 describes the results of the in-depth review of eight selected IEG evaluations. Chapter 5 elaborates on the use of innovative methods in IEG evaluations, building on insights from the inventory, interviews, and in-depth review of selected evaluation reports and Approach Papers. Chapter 6 draws conclusions and presents some suggestions to IEG.

1 The previous self-evaluation was conducted in 2015. The 2020 review was postponed as a result of the coronavirus (COVID-19) pandemic. Historically, meta-evaluations can be traced back to the 1960s when evaluators such as Scriven, Stake, and Stufflebeam began discussing procedures and formal criteria of this genre of work. The term "evaluation of the evaluation," however, was most likely coined by Orata in 1940. A checklist for conducting meta-evaluations can also be found in Scriven (2015).
2 We use the term programmatic evaluations in this report.
3 When we use the term IEG evaluation, we refer to the subset of programmatic and corporate process evaluations.
4 An internal working document on the development of the assessment framework and other guiding templates was prepared for the meta-evaluation.
5 Conventional (standard) methods included interviews, focus groups, questionnaires, surveys, traditional document analysis, case studies, descriptive statistics, regression analysis, and literature reviews. Innovative (broadened) methods included machine learning, network analysis, geospatial data analysis, social media analysis, process tracing, qualitative comparative analysis, theory layering (including nested theories of change), and (quasi-) experimental methods.
6 These are programmatic and corporate process evaluations.

2 | Framework

Evaluation question 1. Can the meta-evaluation appraise the quality and credibility of IEG evaluations according to a dedicated assessment framework? How would such a framework be operationalized?

An assessment framework was developed to delineate the scope of the meta-evaluation, focusing the analysis on relevant evaluation reports and Approach Papers and their methodological characteristics. Per IEG's request, the meta-evaluation sought not only to look back on past evaluations but also to present IEG leadership with suggestions on how to improve the quality and credibility of its evaluations. As such, a focus on innovative developments and approaches within evaluations was deemed important. The assessment focused on the credibility of evaluations (excluding utility and independence). More particularly, it focused on aspects of credibility that could be gleaned from the reports and Approach Papers. The exercise did not cover attributes of credibility that could not be assessed on the basis of the reports and Approach Papers, such as consultations between evaluators and counterparts, expertise and evaluation team composition, quality assurance process, and peer review.1

Development of the framework began with a set of relevant Bank Group documents, notably World Bank Group Evaluation Principles (2019). The document discusses the credibility of evaluations as "grounded in expertise, objectivity, transparency, and rigorous methodology [emphasis added]. Ensuring credibility requires that evaluations be conducted ethically and be managed by evaluators who exhibit professional and technical competence in working toward agreed dimensions of quality. Independence is a prerequisite for credibility" (World Bank Group 2019, 5). The document also makes the point that the "rigor of evaluation design and of the corresponding data collection and analysis enhances the confidence with which conclusions can be drawn. Rigor is a prerequisite for the credibility of evaluation findings and, in turn, for evaluation use" (World Bank Group 2019, 13).

The meta-evaluation's focus on the methodological attributes of evaluations thus links to the perspectives on quality and credibility elaborated above. The approach also builds on the definition of evaluation quality from a methodological perspective developed by Vaessen (2018).2 According to Vaessen, quality from a methodological perspective can be understood as a function of validity (internal, external, construct, and data analysis validity), reliability (the idea that the evaluation process can be verified and in part replicated), consistency (the need for a logical flow among the evaluation rationale, questions, design, data collection and analysis, and findings), and focus (balancing depth and breadth of analysis in evaluation).
In addition to the resources outlined above, the meta-evaluation also drew from the Big Book on Evaluation Good Practice Standards, published a decade ago by the Evaluation Cooperation Group (ECG 2012). This resource proved valuable to the development of the assessment framework as it provided guidelines on how to "organize the evaluation principles by type, i.e., general and specific, as well as to address overlaps noted in the good practice standards and to resolve differences in terminologies" (ECG 2012, 4). For the purposes of the meta-evaluation, chapter VI-A, "GPS on Self-Evaluation," on good practice standards on country strategy and program evaluations, provided the most relevant guidance.3 The good practice standards outline 16 principles on the process of evaluation and methodological best practices. They are supported by a corresponding set of operational principles, including "Guidance Note 1: Attributing Outcomes to the Project" (annex III.3).

The assessment framework further benefited from five other resources. First, the Organization for Economic Co-operation and Development–Development Assistance Committee framework provided useful inspiration on assessing the rationale, purpose, and objectives of evaluations. The framework also offered useful guidance on scoping evaluations, developing an intervention logic, gauging the validity and reliability of information sources, and clearly linking evidence to evaluation questions.4 Second, attributes and operationalization schemes from the UN Evaluation Group's Norms and Standards for Evaluation (2016) informed the development of the assessment framework. These were combined with checklists and approaches used by evaluation functions from international organizations such as the United States Agency for International Development and the Norwegian Agency for Development Cooperation. Third, the framework drew on insights from three professional evaluation societies (the American, Canadian, and UK evaluation associations) to refine its assessment of methodological standards and quality. Fourth, a set of criteria published by knowledge institutions and repositories such as Campbell and 3ie were used in refining the framework's evaluation of methodological quality. Finally, a number of guidance books, handbooks, and seminal papers were used to develop and operationalize the framework.5

The assessment framework was finalized after a series of meetings with the members of the meta-evaluation team (Frans Leeuw, Julian Gayfer, and Ariya Hagh) under the guidance of IEG's methods adviser. The framework operationalized seven main attributes of methodological quality in evaluations: scope and focus, reliability, construct validity, internal validity, external validity, data analysis validity, and consistency.

The assessment framework was then applied to a stratified random sample of eight evaluations. Evaluations were rated on each of the attributes using the following scale: adequate, inadequate, partial, or nonapplicable. The inventory of methods did not assign scores and was devised as an objective means of gathering aggregate-level information from the full universe of evaluations between FY15 and FY19. Appendix C provides a full elaboration of the framework, its operationalization, and the various facets it incorporated.

Evaluation question 2. Which data are required for such an assessment framework?
The data used in the meta-evaluation were collected and analyzed in several steps. As noted earlier, the assessment included an inventory exercise covering the universe of 28 programmatic and corporate process evaluations (Approach Papers) and evaluation reports completed between FY15 and FY19.6 It included both programmatic (N = 20) and corporate (N = 8) evaluations. Programmatic evaluations focus on activities, programs, and operations that have been financed or implemented by the Bank Group, or both, to support clients in achieving their national development goals, the Sustainable Development Goals, and the Bank Group's twin goals of reducing poverty and boosting shared prosperity. Corporate evaluations focus on the Bank Group's internal processes, systems, and behaviors, which are designed to improve the organization's efficiency and effectiveness.

The full universe of evaluations was used in an inventory exercise of methodological aspects referenced in both Approach Papers and evaluation reports. First, automated content analysis was used to provide preliminary insights on the prevalence and distribution of methodological approaches cited. Next, manual coding was used to generate a more granular measure of said attributes. Finally, the output data were aggregated and broken down by type of method, the range of methods employed, and the level of congruence between proposed and delivered methods.

The inventory of evaluation methods was conducted according to a coding scheme classifying research methods as conventional or innovative, with the latter emphasizing the use of approaches such as machine learning, network modeling, geospatial methods, and qualitative comparative analysis.7 The assessment of conventional methods included both qualitative and quantitative approaches commonly used in evaluation reports. After coding the range of methods used in both Approach Papers and evaluation reports, the full sample was then disaggregated according to the type of evaluation (corporate versus programmatic) and the prevalence of innovative or conventional methodological approaches. The results from this exercise were converted into a matrix (table 2.1).

This matrix was used to generate a sample of reports for in-depth review. To ensure that both methodological diversity and variations among evaluation types were preserved, reports were randomly selected from each of the four cells in line with the proportional distribution of evaluations in the evaluation universe. The reports selected for in-depth review are shown in bold in table 2.1. Stratified randomization ensured that at least one report was selected from each cell, examining a range of both corporate and programmatic evaluations employing both conventional and more innovative evaluative methods. Given the disparity between the number of corporate and programmatic evaluations, two reports were chosen from the former and six from the latter category. The results of the in-depth review are explored in chapter 4.8
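The logic of this proportional draw can be illustrated with a short Python sketch. The cell assignments below are an abbreviated, hypothetical subset of table 2.1, and the seed is arbitrary; the actual procedure and sample are documented in appendix A. Applied to the full universe (4, 4, 10, and 10 reports per cell), the same allocation rule yields the two corporate and six programmatic reports described above.

```python
import random

random.seed(7)  # arbitrary seed, for reproducibility of the illustration

# Abbreviated, illustrative cell assignments from the report-type x method-type
# division matrix; table 2.1 gives the full assignment of all 28 evaluations.
cells = {
    ("corporate", "innovative"): [
        "learning and results", "self-evaluation systems",
        "engaging citizens", "knowledge flow and collaboration"],
    ("corporate", "conventional"): [
        "Program-for-Results", "SCD/CPF process",
        "IFC client engagement", "convening power"],
    ("programmatic", "innovative"): [
        "financial inclusion", "electricity access",
        "creating markets", "health services"],
    ("programmatic", "conventional"): [
        "facilitating trade", "higher education",
        "rural nonfarm economy", "urban resilience"],
}

TOTAL_SAMPLE = 8
universe_size = sum(len(reports) for reports in cells.values())

sample = []
for cell, reports in cells.items():
    # Proportional allocation with a floor of one report per cell, so that every
    # combination of report type and method type appears in the sample.
    quota = max(1, round(TOTAL_SAMPLE * len(reports) / universe_size))
    sample.extend(random.sample(reports, min(quota, len(reports))))

print(sorted(sample))
```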
Table 2.1. Division Matrix of Evaluation Reports (report type by method type)

Corporate reports, broadened or innovative methods:
» Learning and results
» Self-evaluation systems
» Engaging citizens
» Knowledge flow and collaboration

Corporate reports, conventional or standard methods:
» Program-for-Results
» SCD/CPF process
» IFC client engagement
» Convening power

Programmatic reports, broadened or innovative methods:
» Financial inclusion
» Electricity access
» Creating markets
» Data for development
» Support for shared prosperity
» Health services
» Carbon finance
» Forced displacement
» Early childhood development
» Fostering regional integration

Programmatic reports, conventional or standard methods:
» Facilitating trade
» Ending poverty
» Capital market development
» Urban transport
» Water supply and sanitation
» Higher education
» Rural nonfarm economy
» Pollution management
» Competitiveness and jobs
» Urban resilience

Source: Independent Evaluation Group.
Note: Bolded text represents reports selected for in-depth review. This table provides the topics of the reviewed evaluations. For the full titles and information, see table FM.1.

Next, in-depth review (including coding and scoring) was conducted in several stages by Frans Leeuw and Julian Gayfer on the eight sampled evaluations (on the basis of reports and Approach Papers). The first stage involved a test to gauge the workability of the framework's operationalization guidance: two IEG reports and their corresponding Approach Papers were selected for this purpose. Leeuw and Gayfer independently coded the selected reports, subsequently comparing scores in a meeting to evaluate the consistency of ratings and ensure intercoder reliability. The results of this test indicated that the operationalization of the assessment framework appeared to be consistent, relevant, and reliable. Having established this, Leeuw and Gayfer independently analyzed all eight evaluations in the sample, assigning scores to each according to the seven attributes under consideration.9 These results were again compared, and after adjudication among Leeuw and Gayfer, the final scores were assigned. Finally, nine interviews with IEG staff were conducted with task team leaders and senior IEG evaluators to complement the findings.10
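Intercoder agreement of the kind checked in this double-coding exercise is often quantified with a chance-corrected statistic such as Cohen's kappa. The sketch below is illustrative only: the ratings are invented, and the meta-evaluation reports adjudicated scores rather than an agreement statistic. It shows the computation for two coders using the framework's four-point scale.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement under independence, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n)
                   for label in set(freq_a) | set(freq_b))
    return (observed - expected) / (1 - expected)

# Hypothetical double-coded ratings on the four-point scale used in the framework.
coder_a = ["adequate", "partial", "adequate", "inadequate", "adequate", "partial"]
coder_b = ["adequate", "partial", "partial", "inadequate", "adequate", "adequate"]

print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.45
```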
1 This is a common limitation of meta-evaluations.
2 We use the term programmatic evaluations in this report.
3 The Big Book also pays attention to self-evaluations in chapter VI-B.
4 The meta-evaluation specifically drew on a number of the elements listed in sections 2 and 3 (OECD-DAC 2010, 2, 3, 11–14).
5 Among others, see Farrington 2003; DFID 2012; NONIE 2009; Bamberger, Rugh, and Mabry 2011; Cook and Campbell 1979; Leeuw and Schmeets 2016; and Hedges 2017.
6 Note that no Approach Paper was available for the ending poverty (FY15) evaluation. As such, this evaluation was excluded from some of the analyses conducted.
7 These methods are also referred to as "broadened" in the meta-evaluation. See appendix E for more details.
8 See appendix A for a full list of selected reports and the procedure used to draw the sample of evaluations for in-depth assessment.
9 Output from this scoring exercise can be found in appendix D. Discussions surrounding the revision of attribute scores can be shared by request.
10 To ensure adequate confidentiality standards, notes from the interviews were made available only to the external experts conducting the meta-evaluation. These notes will be destroyed one year after the finalization of the meta-evaluation.

3 | Inventory of Methods

Evaluation question 3. Which methodological approaches (both standard and broadened) were used in the 28 IEG evaluation reports published between FY15 and FY19? How did the methods used in the evaluation reports compare with what was initially proposed in the Approach Papers guiding the evaluations? Did the evaluations explicitly discuss elements of research design?

An inventory of methodological approaches was conducted to explore the range and diversity of empirical strategies used in the evaluation reports and their corresponding Approach Papers. First, the inventory tallied the conventional evaluative methodologies used in corporate and programmatic evaluations. Next, the same was done for more innovative approaches, broadening the spectrum of methods used in evaluation. Finally, the inventory briefly examined the coverage of various research design attributes, measuring the extent to which evaluations and their supplemental appendixes discussed issues related to sampling, data collection, and operationalization. The following section provides a brief overview of the data collection and operationalization scheme used to generate the inventory, as well as a discussion of trends and insights derived from the data.

Summary of Main Trends

The inventory drew on the full universe of 28 evaluation reports and corresponding Approach Papers produced between FY15 and FY19. The sample included 8 corporate and 20 programmatic evaluations, with the analysis examining both the final reports and the corresponding Approach Papers that guided each evaluation.1

Data collection relied on a combination of automated and manual content analysis, using a series of tags representing the different methodological approaches referenced in the Approach Papers and evaluation reports. Automated content analysis (for example, bigram analysis) offered preliminary insights on the prevalence of methods in the universe. The models provided particularly useful information on the prevalence of conventional evaluative approaches such as portfolio reviews, statistical analysis, and semistructured interviews. These insights were then refined through manual analysis, which provided additional granularity to generate a representative image of the methods used in the universe of evaluations.

The inventory coded 13 conventional methods and 8 innovative ones used in evaluative analysis. Among the latter, the coding scheme examined the prevalence of content analysis, Bayesian modeling, network analysis, Delphi panels, evidence gap maps, geospatial analysis, process tracing, and qualitative comparative analysis (QCA). Of the innovative methods categorized in the inventory, "content analysis" refers to any procedures related to machine learning applications or automated content analysis, including text mining and computer-assisted classification or parsing. "Network analysis" includes methods related to social network analysis, organizational network analysis, or network modeling of any kind. "Geospatial analysis" includes the use of geographic information systems data, satellite imagery, or other geospatial methods.
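A minimal sketch of this kind of automated tagging pass is shown below. The tag set is an abbreviated, hypothetical subset of the inventory's coding scheme (which covered 13 conventional and 8 innovative methods), and the matching is deliberately naive: each tag is a fixed word pair, in the spirit of the bigram analysis mentioned above.

```python
import re
from collections import Counter

# Hypothetical word-pair cues mapped to the method category they signal.
METHOD_TAGS = {
    "portfolio review": "portfolio review (conventional)",
    "case study": "case studies (conventional)",
    "focus group": "focus groups (conventional)",
    "semistructured interview": "interviews (conventional)",
    "network analysis": "network analysis (innovative)",
    "geospatial analysis": "geospatial analysis (innovative)",
    "process tracing": "process tracing (innovative)",
    "machine learning": "content analysis (innovative)",
}

def tally_methods(text):
    """Count references to each tagged method in one document's text."""
    normalized = re.sub(r"\s+", " ", text.lower())  # undo line breaks and case
    counts = Counter()
    for tag, method in METHOD_TAGS.items():
        hits = len(re.findall(re.escape(tag), normalized))
        if hits:
            counts[method] += hits
    return counts

approach_paper = ("The team proposes semistructured interviews, "
                  "a portfolio review, and network analysis.")
print(tally_methods(approach_paper))
```

Such automated tallies are a screening device only; as described above, they were refined through manual coding before being aggregated.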
Figure 3.1 summarizes the output from the inventory of methods. The bars in blue represent the tally of conventional methods used in the universe, with darker bars representing output from evaluation reports (what was done) and the lighter bars output from the Approach Papers (what was proposed). The bars in orange represent the innovative methods used in the universe: once again, the lighter bars represent Approach Papers and the darker bars evaluation reports.2

As can be seen, conventional methods such as case studies, structured interviews, and statistical analysis were relatively common across the universe of evaluations, with innovative methods like geospatial analysis and network analysis present in only a few of the evaluations. Nearly all evaluations employed some combination of interviews, case studies, desk reviews, and surveys. The total count of conventional methods tended to be higher in the final evaluation reports than what was initially proposed in the Approach Papers. The only apparent exceptions to this involved a few of the more innovative methods (for example, network analysis and content analysis, both of which appeared in seven Approach Papers but only five evaluation reports).

Temporal analysis of the same data suggests that the use of more innovative methods increased in more recent evaluations: this is shown in figure 3.2. Annual tallies of methods employed in evaluation reports are shown along the axis on the left-hand side. Trendlines graph the average number of methods used per report, as shown on the right-hand axis.3

Figure 3.1. Inventory of Methods Referenced in Approach Papers and Evaluation Reports
[Bar chart. Vertical axis: times method is used (no.). Horizontal axis: portfolio review, case studies, interviews, surveys, focus groups, desk reviews, literature review, structured literature review, conceptual framework, theory of change, descriptive statistics, inferential statistics, (quasi-) experiment, content analysis, Bayesian inference, network analysis, Delphi panels, evidence gap maps, geospatial analysis, process tracing, and QCA. Series: conventional versus innovative methods, in Approach Papers versus evaluation reports.]
Source: Independent Evaluation Group.
Note: QCA = qualitative comparative analysis.

Figure 3.2. Prevalence of Methods over Time
[Combined bar and line chart, 2015–19. Left axis: total number of conventional and innovative methods used per year. Right axis: average number of conventional and innovative methods per report.]
Source: Independent Evaluation Group.

The figure suggests that the average number of conventional evaluative methods used per report remained roughly consistent across the universe of evaluations, ranging between 8.0 and 9.6 approaches per evaluation report. However, there was a modest but clear increase in the use of so-called innovative methods: while this number was less than 1.0 per report up to 2017, it increased to 1.4 and 1.2 in 2018 and 2019, respectively. In other words, the use of at least one innovative method per report appears to have become the norm in more recent evaluations. Figure 3.3 further disaggregates the use of innovative methods over time, graphing the prevalence of various approaches in the evaluation reports examined in the universe.
Certain approaches such as network analysis and content analysis consistently feature in evaluation reports across the universe. Others, such as QCA, appear to peak in more recent evaluations, potentially suggesting a shift toward a more systematic analysis of case study and other qualitative data. This provides further support for the view that more innovative approaches to evaluation were used more frequently in more recent evaluations covered in the universe.

Figure 3.3. Distribution of Innovative Methods over Time
[Stacked bar chart, 2015–19. Vertical axis: number of innovative methods used per year, broken down by content analysis, network analysis, evidence gap maps, geospatial analysis, process tracing, QCA, and other.]
Source: Independent Evaluation Group.
Note: QCA = qualitative comparative analysis.

Data from the inventory were also used to compare the methodological approaches suggested for use in the Approach Papers to those that were ultimately delivered in the evaluation reports. As seen in figure 3.1, four of the eight innovative approaches were referenced in Approach Papers but not used in the evaluation reports (content analysis, network analysis, evidence gap maps, and process tracing). Figure 3.4 compares the number of methods listed in Approach Papers to those that were used in the final evaluation report for 27 of the 28 evaluations covered.4 The results showed that a majority of evaluations used more methods than their corresponding Approach Papers initially proposed.

Figure 3.4. Difference in Methods Tallies between Approach Papers and Evaluation Reports
[Bar chart: for each of the 27 evaluation topics with an Approach Paper, the number of methods applied in the evaluation report minus the number proposed in the Approach Paper, ranging from −3 (urban resilience) to +7 (self-evaluation systems).]
Source: Independent Evaluation Group.
Note: This figure provides the evaluation topic or short title. For complete information, see appendix A.

As shown in figure 3.4, a minority of evaluations used fewer methods in the evaluations than were initially proposed in the Approach Papers: for example, the urban resilience evaluation (FY19) ultimately used three fewer methods than were proposed in the corresponding Approach Paper (World Bank 2019b). However, most evaluations ultimately used more methodological approaches than initially proposed. In the starkest case, the self-evaluation systems evaluation (FY16) ultimately featured seven more methods than were initially proposed (World Bank 2016a). The graph also suggests that the majority of reports tended to roughly align with their Approach Papers on the issue of methodological diversity: all but seven evaluations diverged from their Approach Papers by only one or two methods.
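The comparison underlying figure 3.4 reduces to a per-report difference of two tallies, as in the sketch below. The absolute counts here are hypothetical (the actual tallies appear in appendix E), but the differences for urban resilience and self-evaluation systems reproduce the −3 and +7 cited above.

```python
# Hypothetical per-report method tallies (Approach Paper vs. evaluation report).
proposed = {"urban resilience": 12, "self-evaluation systems": 5, "higher education": 9}
applied = {"urban resilience": 9, "self-evaluation systems": 12, "higher education": 10}

# Positive difference: more methods applied than proposed; negative: fewer.
differences = {topic: applied[topic] - proposed[topic] for topic in proposed}

for topic, diff in sorted(differences.items(), key=lambda item: item[1], reverse=True):
    print(f"{topic:>24}: {diff:+d}")
```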
It should be noted that the discrepancies in methods proposed versus used between Approach Papers and evaluation reports can have many causes (many of them entirely justifiable), and no single interpretation covers them all.

In sum, the inventory highlights the breadth of methodological approaches featured in the evaluation reports, tallying the frequency of use of different analytical tools over time. While the output suggests that innovative methods remain somewhat underused in major evaluations, such methods have also gained traction, with more recent evaluations relying on a broader spectrum of approaches to address complex evaluation challenges. This trend is expected to grow as more evaluations take advantage of cutting-edge tools to better use available qualitative and quantitative evidence.

The inventory also captured the extent to which evaluations paid attention to special issues such as gender and data transparency. The inventory tallied all references to these issues across all available Approach Papers and evaluation reports. The results of this analysis are summarized in figure 3.5. The graphs show the total percentage of all evaluations and Approach Papers that address such issues in each indicated year.5

Figure 3.5. References to Special Issues in Approach Papers and Evaluation Reports
[Two panels, 2015–19: (a) gender and (b) data transparency. Each panel shows the proportion of Approach Papers (AP) and evaluation reports (ER) addressing the issue in each year.]
Source: Independent Evaluation Group.
Note: AP = Approach Paper; ER = evaluation report.

Most reports included references to data transparency and gender, with 21 of 28 evaluation reports referencing the former and 22 of 28 evaluation reports referencing the latter. For both issues, the final evaluation reports (logically) featured more references than the corresponding Approach Papers.

Finally, the inventory took stock of references to various research design elements within the evaluations. Specifically, relevant methodological appendixes were judged based on whether they discussed the sampling, data collection, and causal analysis strategies employed in the evaluation. Furthermore, the reports were examined for discussions of potential limitations and adequate links to the evaluation question(s). The results from this probe are graphed in figure 3.6.

Figure 3.6. References to Research Design Attributes in Evaluation Reports
[Five panels, 2015–19: proportion of evaluation reports addressing (a) causal strategy, (b) limitations, (c) evaluation question(s), (d) data collection, and (e) sampling strategy in each year.]
Source: Independent Evaluation Group.

References to research design parameters were either stable or increased slightly over the time period assessed, with some fluctuations attributable to the total number of evaluations assessed in each year. Nearly 90 percent of the appendixes discussed the sampling strategy used in the evaluation, along with the limitations of the methodological approach employed.
About 85 percent of all evaluations linked the methodological strategy to specific evaluation questions, and 78 percent discussed the data collection strategy used. About 65 percent of evaluations incorporated the issue of causal identification into the analysis, though coverage of this issue increased over time.6

Examining the development of these trends over time, we see that nearly all evaluations linked their methodological approaches to specific evaluation questions, a trend that remained roughly consistent over time. Likewise, most evaluations discussed the sampling strategy used in data collection, though this practice fell in FY19, with only about 70 percent of reports explicitly discussing sampling procedures. Except in FY16, a majority of evaluations elaborated on the data collection methods used in their supplemental appendixes. More evaluations discussed the limitations of their empirical strategies over time. Likewise, discussions of causal strategy increased substantially from FY17 onward. Overall, with the exception of references to data collection (low outlier in FY16), we see high and stable values in relation to evaluation questions and sampling strategy as well as a positive trend over time on clarity in terms of limitations and causal strategy.

Data from the inventory presented in this section provide a broad overview of the range and diversity of methodological approaches used in the 28 evaluations examined in this meta-evaluation. The inventory highlighted the breadth of methodological approaches featured in the full universe of assessed evaluations and the ways in which such tools have been leveraged to address a broad range of evaluation questions across the Bank Group's diverse portfolio of activities. Conventional methods such as case studies, structured interviews, and statistical analysis are relatively common across the universe of evaluations, with innovative methods like geospatial analysis and network analysis present in only a minority of the evaluations studied. However, the prevalence of innovative methods increased in more recent evaluations, suggesting an upward trend. Finally, a growing number of evaluations have been providing a more developed elaboration of their research design, discussing data collection procedures, causal strategies, and potential limitations with increasing frequency.

1 The only exception to this was the ending poverty (FY15) evaluation, for which no Approach Paper was provided (World Bank 2015c).
2 See appendix E for an expanded analysis of the methodological inventory.
3 Averages were calculated to offset the differences in the number of evaluations completed each year. For example, there were only four evaluation reports in 2016 (hence the lower overall tally), but each report used an average of 9.5 methodological tools.
4 As noted above, the ending poverty (FY15) evaluation had to be excluded from this analysis because no Approach Paper was provided for it (World Bank 2015c).
5 As a caveat to these data, it should be noted that such a tally provides at best a crude instrument for the assessment of such complex issues. Questions related to the coverage of these concepts in IEG evaluations merit a more in-depth exploration, one that is outside the scope of this report.
6 The inventory also examined references to hypotheses or hypothesis-testing frameworks.
However, issues of data sparsity made it difficult to reach any meaningful conclusions about trends pertaining to that parameter. As such, it was not included in the analysis of research design attributes.

4 | In-Depth Review of Evaluations

Evaluation question 4. What are the results of the in-depth review of the eight selected IEG evaluations?

This chapter presents the results of the in-depth review of the eight IEG evaluations selected in the sample. The evaluations were appraised according to the seven attributes distinguished in the framework. The results from this analysis are laid out below.1

Attribute 1: Scope and Focus

The first attribute in the in-depth review of evaluations focuses on the delimitation of the scope, focus, and context in which the evaluations operated. The attribute examines the evaluations' rationale and the clarity with which evaluation questions are formulated. Particular attention is given to issues of complexity (including the complexity of the evaluand). Given that IEG evaluations often address portfolios of up to hundreds of projects and interventions in multiple countries—portfolios that are often multilevel, multiactor, and multisite in nature—it is crucial that evaluations carefully specify the rationale, scope, and questions studied.2 This attribute also gauges the extent to which evaluation questions are clear and focused instead of manifesting a "bag-of-questions" approach.3

To assess the focus and clarity of questions used in the sample of evaluations, the meta-evaluation drew on previous literature to distinguish between the types of questions typically employed in the context of evaluation.4 Such questions can be disaggregated into five categories: descriptive, exploratory, evaluative, explanatory, and design oriented. Descriptive questions provide a summary of the state of affairs in a given field, society, or organization. Exploratory questions focus on garnering a better understanding of a topic or development. Evaluative questions deal with the development, implementation, and consequences of policies, programs, or interventions of major organizations. Such questions typically focus on the relevance, effectiveness, or efficiency of interventions. Explanatory questions focus on clarifying the impact and effectiveness of programs or policies, including any side effects that may arise from such interventions. Finally, design-oriented evaluation questions address the development of new intervention designs, including the characteristics of programs, evaluation systems, common property regimes, common pool resources, and so forth. Appendix F categorizes the evaluation questions listed in the evaluations from the sample according to these categories.

Most of the overarching questions cited in the sampled evaluations were descriptive, evaluative, or (to a lesser extent) design oriented. Evaluation questions were almost never formulated in the exploratory or explanatory style. Some questions turned an explicit eye to the future, delineating the design-oriented steps the Bank Group could take, whereas others did not.

Though the evaluations reviewed in the sample generally fared well in clearly outlining their scope, the meta-evaluation nonetheless found that evaluation questions were not always brought together in a cohesive manner. Some evaluations did not integrate questions in an accessible section or paragraph.
In other cases, it was not immediately clear which questions were more central or how the questions related to one another.5 The issue was raised in several interviews with IEG staff, who noted that the bag-of-questions approach was a suboptimal means of focusing the scope of evaluations.6

All eight Approach Papers were rated as adequate with respect to this attribute. Six of the evaluation reports were rated as adequate, and two received a score of partial. The vignettes below provide greater detail on the ratings and how specific projects fared with respect to this attribute.

The International Finance Corporation's Approach to Engaging Clients for Increased Development Impact (FY18) provides a useful example of adequate scope and focus considerations (World Bank 2018f). The evaluation distinguished between the three complementary modalities the International Finance Corporation (IFC) has employed: client-focused partnerships, programmatic interventions, and country-focused interventions.7 The report investigated the effectiveness of IFC's approaches to client engagement between FY04 and 2016, providing a clear delineation of the evaluation's scope: "Given the importance of the first modality, the report's focus is on client-focused partnerships" (5). This was justified according to IFC's engagement with long-term clients, helping them enter new markets and enhance their contribution to the organization's strategic priorities. The central outcome was likewise clearly defined as "increasing its developmental impact" (7).

World Bank Group Support to Health Services: Achievements and Challenges (FY18) provides another useful example of adequate scope (World Bank 2018g). The evaluation aimed to fill "an evaluative evidence gap in the health sector" (xi) and was the first comprehensive health sector evaluation carried out by IEG since 2009. In laying out its scope, the evaluation made sure to clearly delineate the many complexities of the health field, its myriad actors, and the interconnected systems and operations within it. In particular, it recognized and responded to the political economy of health systems and the challenges in using monitoring data to interpret progress toward health outcomes.

Conversely, Higher Education for Development: An Evaluation of the World Bank Group's Support (FY17) listed the following as its overarching question: "How has the World Bank Group's support to higher education contributed to its twin goals of poverty reduction and shared prosperity?" (59). This was then divided into three subquestions (for example, "Is the World Bank Group's support for higher education consistent and well articulated?") and 13 subsequent components. A somewhat similar situation was found in Growing the Rural Nonfarm Economy to Alleviate Poverty (FY17), which cited two overarching questions, four subquestions, and eight subcomponents. Both examples resemble the bag-of-questions approach noted above.

Overall, the meta-evaluation found that all reports and Approach Papers provided a good range of evaluation questions. The sheer number of questions and subquestions listed in some reports (over 50 in the sample of eight evaluations) in some instances led to a fragmentation of focus. For example, at times one or more overarching questions were followed by 10 or more subquestions.
The assessment of evaluation focus also demanded a brief examination of the role of portfolio review and analysis in structuring the scope of IEG evaluations. Portfolio review is, to a large extent, a standardized (if not routine) activity in IEG evaluations. While portfolio-based work has its merits, in certain cases it can reduce the focus and specificity of evaluations. IEG evaluation teams tend to spend a significant amount of time on the identification and description of the portfolio.8 In addition, due to the sheer number of projects (and underlying interventions), effectiveness analysis often focuses on project performance indicators instead of developing a causal analysis of impact. Weaknesses in the system (such as poor-quality outcome indicators)9 can reduce the utility of this type of analysis.

Taken together, the meta-evaluation noted that the information presented in reports and Approach Papers was rather elaborate and relevant; as such, nearly all evaluations scored adequately on this attribute. All reports and Approach Papers paid attention to evaluation questions to guide their assessment: the reports examined in the sample of eight evaluations listed more than 50 evaluation questions and subquestions in total. Usually one or more overarching questions were formulated, but certain evaluations subsequently added more than 10 subquestions, resembling a bag-of-questions approach to scoping. Portfolio analysis was used as a standard operation in characterizing and structuring the scope and focus of evaluations. However, the scope of some IEG evaluations tended to be overambitious and diluted due to two aspects: First, the complexity of the evaluand, especially in terms of the number of and diversity in countries and projects in the portfolio, motivated a broadening of the scope in some instances. Second, this complexity was further amplified by the multisite, multilevel, and multiactor nature of the interventions supported by the Bank Group (especially in the case of the World Bank).

Attribute 2: Reliability

In an IEG blog post by Vaessen (2018), reliability is described as "the idea that if one would repeat the analysis it would lead to the same findings. Even though replicability would be too ambitious a goal in many (especially multilevel, multisite, multiactor) evaluative exercises, at the very least transparency and clarity on research design … should be ensured to enhance the verifiability and defensibility of knowledge claims."10 The meta-evaluation focused on six sections related to evaluation reliability: evaluation design, data collection, data analysis, synthesis, limitations discussed, and limitations addressed.

Of the eight Approach Papers, two were rated adequate, five partial, and one inadequate with respect to this attribute. Of the corresponding evaluation reports, three were rated adequate, four partial, and one inadequate. The meta-evaluation specifically focused on four topics pertinent to reliability: use of the evaluation design matrix (EDM), the number of methods used in each evaluation, discussions of possible limitations, and the triangulation and synthesis of evaluative evidence. These will now be explored in sequence.

The first topic examines the way in which the EDM is used in evaluations. Relative to the attention paid to methodological approaches, the introduction of the EDM has been quite important, contributing to more transparent and structured evaluations.
This view was also reflected in several of the interviews conducted for the meta-evaluation. The EDM provides an essential structure for the evaluation's questions, methods, rationales, and sources, incentivizing evaluators to think through the methods and sources that should be used in evaluative analysis.

The evaluation on health services provides an illustrative example of the benefits of the EDM. The report adequately specifies key facets of data collection and analysis, addressing the relevant data architecture used, the theory of change (including intervention-specific theories of change), systematic reviews of existing research, and the range of methods required to address the evaluand. These include document analysis, case studies, interviews, statistical modeling, and social network analysis. The EDM proves particularly useful in justifying the use of specific methods, indicating how they are to be used and the ways in which evaluative evidence from each will be triangulated and synthesized. This was noted across country case studies, cross-validating country-level findings with those from the portfolio and literature reviews.

However, in certain cases the EDM was treated as little more than a list of "evaluative instruments" such as questionnaires, interview topic lists, consultations, project portfolio reviews, statistics, and similar tools. Such reports often do not make a distinction between "instruments" used in data collection and data analysis. They also seldom discuss evaluation design, instead focusing largely on individual methods. White (2013) discusses these distinctions in detail: "Although the terms 'research methods' and 'research design' are often used interchangeably, there are important differences between the two. The essence of developing a research design is making decisions about the kinds of evidence required to address your research questions (de Vaus 2001). Research design is not about the logistics of research—how the data are collected, for example—but rather about the logic of inquiry, the links between questions, data and conclusions."11

Learning and Results in World Bank Operations: Toward a New Learning Strategy (FY15) provides an example of this (World Bank 2015b). In this report, IEG developed a survey instrument to assess the type and quality of evidence on project efficacy, applying it to Implementation Completion and Results Reports that discussed experiments, quasi-experimental approaches, and other approaches in line with the literature on evidence hierarchies. The evaluation appendix referred to a "results framework" and several "evaluation instruments" such as seven country case studies, surveys, and semistructured interviews with 50 World Bank staff.12 In addition, the evaluation listed a series of other methods, including an analysis of staff mobility across sectors and regions (using roughly 20,000 individual records from the World Bank's Time Recording System), as well as a content analysis of responses to an open-ended question in the first Global Practices and Cross-Cutting Solutions Areas Rapid Survey. However, the evaluation made no mention of how insights from this rather large battery of methods and data were synthesized or triangulated.
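The contrast between a mere list of instruments and a genuine design matrix can be made concrete. The sketch below shows one way a single EDM row could be represented, linking each evaluation question to methods, sources, and a rationale; it is a minimal, hypothetical illustration, and the question, methods, and sources shown are invented rather than drawn from any IEG evaluation.

```python
from dataclasses import dataclass, field

@dataclass
class EDMRow:
    """One row of a hypothetical evaluation design matrix (EDM):
    an evaluation question tied to the evidence needed to answer it."""
    question: str                                   # the evaluation (sub)question
    methods: list = field(default_factory=list)     # how evidence is generated
    sources: list = field(default_factory=list)     # where evidence comes from
    rationale: str = ""                             # why these methods fit this question

# Illustrative only; none of these entries describe a real evaluation.
edm = [
    EDMRow(
        question="To what extent did Bank Group support improve service coverage?",
        methods=["portfolio review", "country case studies", "statistical analysis"],
        sources=["project documents", "stakeholder interviews", "survey data"],
        rationale="Coverage claims require triangulating portfolio ratings "
                  "with independent country-level evidence.",
    ),
]

for row in edm:
    print(f"Q: {row.question}\n  Methods: {', '.join(row.methods)}")
```

The point of the structure is that every method and source must justify itself against a question, which is precisely the "logic of inquiry" that White distinguishes from the logistics of data collection.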
The second topic addresses the number of methods used in each evaluation. In some cases, up to 10 methodological approaches were deployed, some of them obtrusive (interviews, surveys, focus groups, consultations) and others unobtrusive (documentary evidence, basic statistics, country-focused evaluations, reviews of project-level evaluations, and so on). This raised concerns that the proliferation of methodological approaches may sidestep the question of which methods are most appropriate or useful given each evaluation's scope and context.13

The third topic addresses the extent to which the limitations of evaluations (including "shoestring" conditions) were discussed.14 A well-developed discussion of limitations can positively impact the scope, breadth, and depth of the evaluation. Most of the evaluations examined in the sample fared well with respect to this factor, addressing limitations in a meaningful and convincing manner. The evaluation Carbon Markets for Greenhouse Gas Emission Reduction in a Warming World (FY18) presents a good example (World Bank 2018a). The report lists six potential limitations, taking care to explain how each was addressed in the evaluation. The evaluation further addressed specific limitations related to each of the methods used, including portfolio analysis (appendix B of the report), causal analysis (appendix C of the report), and econometric analysis (appendix D of the report).

Finally, the fourth topic addresses the triangulation and synthesis of evaluative evidence. The combination of different methodological approaches can facilitate the corroboration of findings; it can also expose unforeseen contradictions and nuance. Though triangulation and synthesis are essential to both outcomes, the meta-evaluation noted that coverage of this facet could be improved. The point was further raised in several of the interviews. With this in mind, several of the reviewed evaluations showed an excellent integration of triangulation and synthesis. For instance, in the health services evaluation report, "triangulation [was] applied at multiple levels, first by cross-checking evidence sources within a given methodological component. For instance, within country case studies interview findings were compared across types of stakeholders (Bank Group staff, government officials, academia, health experts, and other development partners). Second, triangulation across evaluation components—for example, cross-validating findings from country-level case studies with findings from portfolio analysis and literature reviews" (World Bank 2018g, 77). The evaluation also took steps to triangulate evidence across the portfolio analysis, the country case studies, and the intervention case studies of delivery mechanisms for the case of the World Bank's response to pandemics. The evaluation on the rural nonfarm economy also provided an example of triangulation, pointing out that the structured literature reviews played a central role in guiding the analysis of project documents and data.

Taken together, the meta-evaluation found that most evaluations in the sample performed relatively well in terms of the attributes of reliability outlined above. The integration of the evaluation design matrix was touted as a major improvement in design, clarifying the role of individual methods and enhancing the general reliability of evaluations.
The meta-evaluation also found that the use of the EDM had increased in recent years, indicating a positive development with respect to reliability. While the large number of methods used in certain evaluations raised some questions about the adequate use of triangulation and synthesis of findings, in other evaluations this issue was handled in a clear and satisfactory manner.

Attribute 3: Construct Validity

The concept of construct validity originated in psychological research. However, as Strauss and Smith (2009) have shown, the concept has been broadened to cover the operationalization of key concepts and relationships in other forms of research.15 In the context of evaluation, construct validity relates, among other things, to the theory of change or intervention logic used in the conceptualization and delimitation of the evaluand. Bamberger et al. (2004) define construct validity as "the adequacy of the constructs used to define processes, outcomes and impacts," including "the indicators of outputs, impacts and contextual variables."16 Specifically, the assessment focuses on three facets of construct validity: the attention paid to the identification and operationalization of core concepts or variables, the ways in which theories of change or intervention logics are used, and the integration of existing (academic) research through structured reviews.17

Of the eight Approach Papers reviewed, three were rated as adequate and five as partial. Of the corresponding evaluation reports, four were rated as adequate and four as partial.

Most evaluations pay attention to the identification of core concepts, usually defining them in a supplemental glossary. Relatively fewer evaluations provide a dedicated operationalization of core concepts. The learning and results evaluation presents an interesting example of this discrepancy. The evaluation drew heavily on World Development Report 2015: Mind, Society, and Behavior, which incorporated insights from cognitive, social, psychological, and neuroscience studies to better understand learning in Bank Group operations. The evaluation defines the various types of learning and knowledge used in the analysis of operations. It also outlines the EAST principles for encouraging behavior change, along with group dynamics such as forming, storming, and norming.18 Some concepts, like signaling, are not formally operationalized but can be deduced from the context in which they are used.19

Turning to theories of change and intervention logics, the meta-evaluation noted that all evaluations in the sample included some type of theory. Three main approaches to the use of theories of change were identified in the review.

The first approach involved the presentation of an overarching "causal" framework, often distinguishing among inputs, activities, outputs, and outcomes. The framework often directed or restricted the analysis to specific instruments, their intended results, and (at a high level) related economic, sociological, or policy factors. While the exact relationships between the steps of the theory were usually not fully articulated or empirically tested, the theory nevertheless offered a sense-making framework aimed at deconstructing the complex evaluand under consideration.20 Two examples illustrate this approach.
The higher education evaluation presented a conceptual model (the "evaluation framework for higher education") of Bank Group support in this field (World Bank 2017d, 73). In practice the model resembled a logic model, distinguishing among inputs, outputs, and outcomes without delving into the mechanisms explaining the occurrence of events.21 While the logic model structured the evaluation, it did not serve as a full conceptual model in terms of testing, validating, and assessing points of departure. Similarly, Mobile Metropolises: Urban Transport Matters: An IEG Evaluation of the World Bank Group's Support for Urban Transport (FY17) provided a theory of change visualizing the links between activities, outputs, intermediate outcomes, and development outcomes (World Bank 2017e). The theory of change also listed eight "enabling factors" such as culture, human capacity, and macro stability; however, the specific relationships between these factors and outcomes were not explicitly specified. Once again, the theory of change resembled a logic model, "reflecting how the World Bank Group's strategy and sectoral leadership posited that its interventions would contribute to desired outcomes and impact. The emergent elements became focal points of the evaluation, reflected in its chapter organization" (60).22

The second approach to formulating and using theories of change involved presenting a substantive intervention logic, often expanding on the underlying package of interventions in a more rigorous empirical manner. Particular attention is paid to mechanisms (behavioral, cognitive, economic, institutional) that can alter the impact of projects, investments, and other interventions. In the sample of IEG evaluations selected for review, three were identified as employing such an intervention theory.

In the evaluation on IFC client engagement, the theory of change reconstructed how "the objectives sought by IFC's approach to client engagement were expected to improve client outcomes and IFC's development impact, as the concept evolved over a series of IFC strategy documents" (World Bank 2018f, 55). The theory of change was then tested, with special focus placed on mechanisms like the targeting of selected companies as long-term partners. IFC supported these entities "with dedicated client relationship teams to provide them with … specialized local knowledge and contacts [to] assist with regulatory issues and mitigation of political risk" (59). Such interventions helped develop transactions that advanced IFC's strategic objectives, triggering behavioral changes and promoting intangible benefits such as a deeper understanding of client needs and improved access to key client decision-makers.23

In the health services evaluation, the approach relied on a search of relevant literature to develop four specific intervention-related theories of change: conditional cash transfers (CCT), performance-based financing, pandemic preparedness and control, and public-private partnerships (World Bank 2018g). Next, these intervention theories were supported with evidence from Bank Group sources (portfolio data) and existing evaluation literature. For the CCT theory of change, the analysis addressed the degree to which Bank Group support for CCTs in health services had effectively contributed to the achievement of relevant health services-related goals (see figure E.1).
The framework integrated the following assumptions:

1. The beneficiaries of CCT programs are currently underusing existing health services.

2. The existing supply of services is sufficient to accommodate increasing demand.

3. The beneficiaries of CCT programs are aware of the program and correctly informed about eligibility and available benefits.

4. The cash transfers received are used to finance health services and improve food consumption, as opposed to detrimental products like tobacco and alcohol.

5. The transfers are sufficiently generous to incentivize compliance with the required conditionalities.

6. The design features of the CCT (enrollment, verification of conditionalities, cash transfer management) are credible means of producing the desired behavioral changes.

The theory was tested against existing literature, including some 30 impact evaluation studies on CCT programs.

The health services evaluation also featured a pandemic preparedness and control theory of change, which was used to structure Bank Group activities conducive to the realization of effective pandemic preparedness and mitigation strategies (World Bank 2018g; see figure E.11). The theory of change noted that such responses required a collective global health response aimed at fulfilling four critical conditions: surveillance, protection of the population, effective outbreak response, and communication.24 Like the analysis of CCTs, the theory of change laid out several assumptions necessary for the achievement of the desired outcomes:

1. Frontline human resources would continue to provide essential health services even under increasing risk of contagion.

2. The population and the health workforce would respond to behavior change interventions (for example, information and incentives).

Having laid out a framework of interventions and assumptions, the evaluation then compared outcomes from the Bank Group portfolio with the theory of change.

Finally, the urban transport evaluation paid attention to the "two lenses" of behavior change and service delivery in an appendix (World Bank 2017e). For the topic of behavior change, a model rooted in neoclassical and behavioral economics was developed, showing that such change depends on communication, availability of resources, information on incentives, social factors, and psychological factors.25 The model was then tested on a random sample of World Bank urban transport projects, drawn from the larger urban transport portfolio under review. The main objectives of this review were to (i) explore the extent to which information on behavior change is available in project documents, (ii) analyze how behavior change is described and operationalized in project documents, and (iii) assess the quality of the information provided in project documents (140). Likewise, the issue of service delivery was assessed using a theoretical framework applied to a random sample of 68 World Bank investment operations drawn from the core World Bank operations identified by the urban transport evaluation (149).

The third approach to formulating and using theories of change involved a combination of a general theory of change underlying a "macro-level" complex evaluand (that is, a thematic or sectoral portfolio) and one or more "nested" theories within this broader theoretical framework. Given its expansive scope, the broader theory of change is not a testable theory and serves as a broad sense-making framework (see previous discussion).
As such, only the nested theory is empirically tested in this approach. The carbon finance evaluation provides an excellent example of this approach (World Bank 2018a). The overarching theory of change was "developed around the four main roles of carbon finance (CF), shaped by the changes in global needs and priorities, with a focus on the following components: (i) creating and developing markets, (ii) innovating carbon finance; (iii) building capacity of the clients; and (iv) thought leadership and convening" (85). The approach resembles a more general or synthetic theory of change, listing outputs and outcomes that could emerge from CF interventions in relation to the four key components listed (see figure 1.1 on page 6 of that report).

The evaluation also offered a nested theory on Emission Reduction Purchase Agreements (ERPA) under the general assessment of carbon markets (World Bank 2018a). The ERPA theory of change "fits squarely the logic of what Trochim (1985) popularized as Pattern Matching" (125; figure C.1). The nested ERPA theory was "tested based on new empirical evidence. The empirical strategy retained for this study consisted of a combination of two case-based methods that have a comparative advantage in providing robust evidence for causal analysis: process tracing and QCA applied to 16 cases of ERPAs. For each case, the evaluation team traced the contribution of the Bank Group, the project entity, and other critical actors throughout the process of development, implementation, and follow-through of each ERPA. Data collection was broadly meant to include document review, field visits, and a series of interviews with the key stakeholders engaged throughout the ERPA cycle and beyond. Patterns of convergence and divergence across cases were systematically analyzed, using the logic of QCA, ultimately forming a robust empirical base" (125).

The meta-evaluation's assessment of construct validity concluded with an appraisal of the integration of existing (academic) research through structured reviews. Several excellent examples were found among the eight reports assessed. In appendix J of World Bank Group Support to Electricity Access (FY15), a structured literature review was presented on "access to electricity for improving health, education and welfare in low- and middle-income countries" (World Bank 2015d, 128). The review served the primary objective of critically analyzing and synthesizing existing evidence to answer the following question: What is the impact of electricity access on health, education, and welfare outcomes in low- and middle-income countries?

In the health services evaluation, existing research was integrated through an evidence gap map (World Bank 2018g): "The evaluation used [evidence gap maps] EGMs to identify knowledge gaps on the effects of selected interventions on expected health outputs and outcomes commonly targeted by World Bank Group projects according to portfolio review evidence… The searches resulted in a total of 5,506 citations coming from the Cochrane Database of Systematic Reviews and others" (73).26
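At its core, an evidence gap map is a matrix that counts the available studies for each intervention-outcome pair; cells left empty are the "gaps." The sketch below illustrates that logic with invented study records; the interventions, outcomes, and counts are hypothetical and do not reproduce the health services EGM.

```python
from collections import Counter

# Hypothetical study records: (intervention, outcome) pairs that a
# literature search might yield. None of these entries are real studies.
studies = [
    ("conditional cash transfers", "service utilization"),
    ("conditional cash transfers", "child health"),
    ("performance-based financing", "service utilization"),
    ("public-private partnerships", "service quality"),
    ("conditional cash transfers", "service utilization"),
]

# Count studies per intervention-outcome cell of the gap map.
cells = Counter(studies)

interventions = sorted({i for i, _ in studies})
outcomes = sorted({o for _, o in studies})

# Print the map; cells showing zero studies are the evidence "gaps".
print(" " * 30 + "".join(f"{o[:20]:>22}" for o in outcomes))
for i in interventions:
    row = "".join(f"{cells.get((i, o), 0):>22}" for o in outcomes)
    print(f"{i[:28]:<30}{row}")
```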
The carbon finance evaluation also made use of this method, using it to better understand the function of the Clean Development Mechanism (CDM), "the major international offset mechanism within the broader world of carbon finance" (World Bank 2018a, 164). The CDM was designed to lead to significant emission reductions "that will help reduce the cost of climate mitigation in countries with commitments as well as contribute to sustainable development in the host countries" (164). As background for the evaluation, IEG carried out a structured literature review on the generation of local community co-benefits from CDM projects.

While the examples listed above showcased the integration of existing research in evaluations, it should be noted that the use of structured literature reviews was not considered standard practice during the period examined (FY15–19). For instance, the higher education evaluation referred to the use of literature in only one section, reviewing "the existing academic and policy literature to provide a better understanding of current thinking about the sector" (World Bank 2017d, 73). Evidence from interviews indicates that structured literature reviews have become more widely used since their "introduction" in 2016.

In summary, the meta-evaluation noted adequate coverage of construct validity issues in the sample of evaluations appraised. The evaluations paid close attention to the definition of key concepts and took steps to outline a meaningful theory of change. At the same time, more attention could be paid to the operationalization of concepts (including the key variables and measurement instruments used): coverage of this facet was less visible in the eight reports reviewed.

As noted above, the reports generally took one of three approaches to formulating a theory of change guiding evaluations. In the first approach, a conceptual framework was used to delineate the inputs, activities, and outputs that enable or restrict outcomes of interest. These frameworks usually served as sense-making devices to better understand the often-complex elements underlying the evaluand (for example, as a result of the time period assessed, the number of projects examined, and so on). The second approach involved the development of a substantive theory of change underlying more specific interventions, confronting that theory with evidence from the empirical part of the evaluation. Particular attention was paid to the mechanisms underlying particular interventions. The third approach combined a more general theory of change (covering Bank Group activities on a macro level) with one or more nested theories of change, the latter of which were empirically tested.

The coverage of theoretical frameworks illuminated a potential area of growth for future IEG evaluations: while all the evaluations outlined their underlying intervention logics, more could have been done to link them to the empirical part of the studies.27 Furthermore, capturing insights from existing research and evidence through the adoption of structured literature reviews as a standard practice in evaluation seems to be gaining ground in IEG's evaluative work. The sample provided several excellent examples highlighting the benefits of this practice.

Attribute 4: Internal Validity

In IEG's self-evaluation systems evaluation, internal validity was defined as "how well an assessment tool measures what it is intended to measure" (World Bank 2016a, viii). Beyond this notion of accuracy, internal validity also refers to the degree of confidence in the causal or contributory relationship being evaluated, as well as the assurance that findings were not influenced by external factors.
Internal validity concerns the extent to which a study establishes a trustworthy causal relationship (attribution) or, alternatively, a trustworthy contributory relationship between interventions and outcomes. This includes an evaluation of the degree to which studies address and explore possible alternative explanations.

Internal validity is particularly important given the scope and complexity of IEG evaluations. Conventional threats to internal validity (for example, attrition, maturation) can be exacerbated by the inherent complexity of the evaluand, a notable concern given that the evaluations covered by the meta-evaluation often each encompassed hundreds of projects spread over dozens of countries. The meta-evaluation's assessment of internal validity focused on four attributes: the extent to which issues of causality, attribution, and contribution were discussed; the degree to which causal questions were adequately addressed by the methods employed; the level of attention paid to unintended effects; and the discussion of internal validity concerns relative to the validity of findings.

Of the eight Approach Papers reviewed, two were scored as adequate, three as partial, and three as inadequate. Of the corresponding evaluation reports, two were rated adequate, five as partial, and one as inadequate. Some of the strengths and weaknesses related to internal validity are outlined through the examples highlighted below.

As noted in the discussion of construct validity above, the carbon finance evaluation included a well-developed nested theory of change, along with a pattern-matching exercise and a case study design for causal analysis (World Bank 2018a). The case study design consisted of the following steps assuring internal validity:

"First, for each of the 16 cases, we traced the process of change at play throughout the 15 steps of the theory of change (developed in detail in a separate common template for data collection; the main steps are shown in appendix C.1) and the causal contribution of the World Bank Group and other contributory actors and factors, with rich and deep description. Second, a systematic analysis of patterns of convergence and divergence across cases for each step of the causal chain was performed. Third, the empirical patterns emerging from the cross-case comparison were linked to the theory of change, checking for match and mismatch. Fourth, given the causal complexity underlying the explanation of the five main outcomes of interest, the team resorted to crisp-set QCA to formally test the theory of change. Crisp-set QCA is a well-established technique which resorts to Boolean minimization to 'simplify complex data structures in a logical and holistic manner.'" (World Bank 2018a, 126)

The structured literature review on the CDM also produced relevant insights on causality and contribution (World Bank 2018a). Finally, the econometric study assessed the Bank Group's effectiveness "in reducing greenhouse gas emissions through its support to the Clean Development Mechanism (CDM) interventions" (144). The evaluation combined several approaches and empirical strategies that constituted a convincing causal narrative, supporting the internal validity of the findings.
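To illustrate the logic of crisp-set QCA invoked in the passage quoted above, the sketch below builds a small truth table from invented binary case data and computes, for each configuration of conditions, the consistency with which the outcome occurs; configurations with perfect consistency are the candidate sufficient conditions that Boolean minimization would then simplify. The conditions, cases, and values are hypothetical and do not reproduce the ERPA analysis.

```python
from collections import defaultdict

# Hypothetical crisp-set data: each case codes two binary conditions
# (e.g., strong Bank engagement, capable project entity) and one outcome.
cases = [
    {"engagement": 1, "capacity": 1, "outcome": 1},
    {"engagement": 1, "capacity": 1, "outcome": 1},
    {"engagement": 1, "capacity": 0, "outcome": 0},
    {"engagement": 0, "capacity": 1, "outcome": 0},
    {"engagement": 0, "capacity": 0, "outcome": 0},
]

# Group cases into truth-table rows, keyed by their configuration of conditions.
rows = defaultdict(list)
for case in cases:
    config = (case["engagement"], case["capacity"])
    rows[config].append(case["outcome"])

# Consistency = share of cases with a given configuration that show the outcome.
for config, outcomes in sorted(rows.items()):
    consistency = sum(outcomes) / len(outcomes)
    print(f"engagement={config[0]}, capacity={config[1]}: "
          f"n={len(outcomes)}, consistency={consistency:.2f}")
```

In this toy example, only the configuration with both conditions present is consistently associated with the outcome, which a QCA analysis would read as evidence that the two conditions are jointly (rather than individually) sufficient.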
In the health services evaluation, the complexity of assessing internal validity was discussed in depth:

"Although overall portfolio analysis exploited the breadth of the evaluable material, IEG acknowledges that the assessment of project effectiveness through outcomes ratings challenges the internal validity of the evaluation findings. First, outcome ratings used in the portfolio analyses are based on incomplete samples of closed projects. Second, when available, outcome ratings tend to be a biased measure of the overall projects' success. Third, the team recognizes that IFC [investment services] IS, IFC [advisory services] AS and World Bank project financing define and monitor objectives differently, therefore direct comparison between interventions with regards to the ratings of project outcomes and [project development objective] PDO's efficacy should be considered with caution." (World Bank 2018g, 78)

Though not focusing on internal validity per se, the evaluation took pains to ensure the validity of findings, "including consultations with World Bank Group staff, use of specific protocols and coding templates … and intercoder reliability and quality control measures to guarantee a consistent approach to coding and analysis across evaluation components and across team members" (World Bank 2018g, 77).
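Intercoder reliability of the kind invoked here is commonly quantified with an agreement statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below computes kappa for two hypothetical coders rating the same ten items; the codes shown are invented for illustration and are not drawn from the health services evaluation.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two team members to ten portfolio documents.
a = ["adequate", "partial", "partial", "adequate", "inadequate",
     "adequate", "partial", "adequate", "partial", "adequate"]
b = ["adequate", "partial", "adequate", "adequate", "inadequate",
     "adequate", "partial", "partial", "partial", "adequate"]

print(f"Cohen's kappa: {cohens_kappa(a, b):.2f}")  # 1.0 would be perfect agreement
```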
The report also noted that the use of outcome ratings in intervention-type case studies presented additional challenges related to the complexity of health projects (World Bank 2018g). Given that health projects are usually composed of multiple overlapping interventions, project outcome ratings can become a rather imperfect measure of the effectiveness of each specific intervention. The evaluation was further complicated by the fact that relatively few closed projects were available for assessment, offering a limited sample for the inference of Bank Group contributions to health outcomes.

The evaluation on growing the rural nonfarm economy presented another interesting vignette with respect to internal validity (World Bank 2017c). An appendix on community-based approaches reviews interventions in terms of their objectives, targeting, metrics, and results. The review is critical with regard to the design of a number of projects, what was measured (often unclear), the completeness of data (often incomplete), how data were treated, and which methods were used. Some of the criteria evaluated were in line with the "evidence or design hierarchies" that evaluators use to separate the valuable from the useless when addressing internal validity.28

The IFC client engagement evaluation took several steps to ensure that a consistent approach was taken by the evaluation team members—for example, using a case study template and interview protocols to ensure a common framework and evaluative lens across studies (World Bank 2018f). The evaluation also demonstrated empirically (through an econometric analysis of client learning versus selection) a self-reinforcing selection effect through which client quality and strategic fit promoted a gradual deepening of relationships into a de facto strategic engagement.

It should be noted that several of the evaluations examined in the sample were less successful in addressing issues related to internal validity, engaging in a limited discussion of causality or contribution. For example, the electricity access evaluation made numerous references to effectiveness and impact, but there was never an explicit discussion of causality or contribution issues (World Bank 2015d). Self-reported achievement of project objectives (some measured at output or direct outcome levels) was equated with impact, establishing a line of argumentation that does not apply in situations where human behavior is crucial to making the infrastructure work (for example, through interactions with human dimensions such as awareness, education, gender responsiveness, and accessibility). While the higher education evaluation made the limitations of the underlying evidence base explicit, the report still drew largely unfounded higher-order causal claims (World Bank 2017d). Though the evaluators' instincts may be correct with respect to the conclusions drawn, the mechanisms underpinning the causal analysis were nonetheless weakly formulated. Similar conclusions were drawn from interviews with the learning and results evaluation team.

Taken together, the meta-evaluation's assessment of internal validity yielded mixed results on this attribute, making it an important area of improvement for the credibility and quality of IEG evaluations. More could be done to address conventional threats to validity. Although evaluations need not engage in causal analysis, triangulation of evidence across different sources and a more explicit acknowledgment of potential limitations would strengthen the internal validity of findings in future evaluations.

Attribute 5: External Validity

External validity (or generalizability) refers to how well the findings from an evaluation can be expected to apply in other settings. For instance, do the findings apply to other people, organizations, situations, and time periods? The meta-evaluation focused on five facets related to the generalizability of findings: the extent to which generalizability was discussed, whether external validity concerns affected the validity of findings, whether attention was paid to population validity, how issues of ecological validity were addressed, and the coverage of temporal validity. Population validity is here defined as the extent to which reports pay attention to the ability to generalize results to other individuals or targeted groups. Ecological validity refers to the level of attention paid to generalizability across different settings. Finally, temporal validity refers to the ability to generalize findings across time. Of the eight Approach Papers reviewed, five were rated as partial and three as inadequate. Of the corresponding evaluation reports, two were rated adequate, four as partial, and two as inadequate.

The assessment found that the coverage of external validity was subject to certain weaknesses among the five facets explored, resulting in partial ratings for several of the reports reviewed. For instance, several reports provided limited discussion of the limitations on generalizability.29 Other reports drew on a relatively narrow sample of country-level assessments, with limited attempts to systematically establish the causal underpinnings of observed change in relation to the overarching evaluation questions.
While aspects of temporal and ecological validity were well covered, there was no explicit discussion of the generalization of findings in the higher education evaluation (World Bank 2017d). The carbon finance evaluation identified certain weaknesses related to external validity but did not expand on specific mitigation strategies (World Bank 2018a). This was also the case in the IFC client engagement evaluation (World Bank 2018f). However, the rural nonfarm economy evaluation explicitly focused on the way in which variations in country conditions limited the generalizability of findings, aligning with the report's goal of formulating a holistic understanding of Bank Group engagement in this area (World Bank 2017c).

Although the evaluation questions can guide the evaluation toward generating generalizable findings, there are rare instances when (given the institutional context) the nature of external validity can diverge from the intent of the evaluation.30 The urban transport evaluation operationalized urban mobility through four variables, but two of the four were based on evidence from country case studies in Africa (World Bank 2017e, 14–15). The lack of representativeness in cases (relative to the rest of the Bank Group portfolio) may have affected the ecological validity of the results across other relevant contexts.

However, several evaluations provided excellent coverage of external validity issues. For instance, the evaluation on learning and results in World Bank operations was explicit about the representativeness and randomness of the sample of evidence used (World Bank 2015b, 3–4). The evaluation also made clear its focus on ecological (as opposed to population) validity, specifically for the case studies chosen to reflect the diversity in contexts. Finally, the evaluation noted an intention to arrive at conclusions that would prove useful for the World Bank, incorporating a discussion of how the results should be interpreted to ensure temporal validity (2–3).

To conclude, while the ratings indicate a mixed picture on external validity, the discussion and approach to this attribute were generally consistent with the nature of the evaluations. Aspects of ecological and temporal validity were generally well covered. Some evaluations explicitly spelled out the limitations of generalizability across contexts but provided limited mitigation strategies. This did not always constrain the inferences made from specific findings to broad conclusions for Bank Group interventions.

Attribute 6: Data Analysis Validity

Hedges (2017) distinguishes between data analysis validity and the more narrowly defined statistical conclusion validity, which gauges whether the conclusions of a study are founded on robust statistical inferences. Data analysis validity is a broader concept that also addresses issues such as whether the evaluation has paid attention to risks of bias (unreliable data, improper choice of methods, incorrect use of methods) and has indicated ways to address the risks associated with these issues. Three factors are considered in the meta-evaluation's assessment of this attribute: whether attention is paid to risks of bias (from unreliable data, incorrect use of methods, and so on), whether the evaluation indicates ways to address risks of bias, and indications of data analysis concerns related to validity.
Of the eight Approach Papers reviewed, three were scored adequate, three as partial, and two as inadequate. Of the corresponding evaluation reports, one was rated adequate, six as partial, and one as inadequate.

While the quality of the data analysis was generally found to be good across the sample, two common challenges were noted for this attribute, relating to issues of transparency and triangulation. First, some evaluations faced difficulties in clearly demonstrating the stream of evidence that supported some of the key findings. Second, triangulation of evidence was found to be insufficient in certain contexts. However, certain evaluations proved very successful with respect to both challenges. The carbon finance evaluation took care to ensure data sources were validated at every stage (World Bank 2018a). Likewise, the higher education evaluation effectively addressed the risk of bias in a transparent manner, triangulating evidence from multiple sources to reach a cohesive and convincing assessment (World Bank 2017d). The use of triangulation was evident in the latter evaluation's assessment of the Bank Group's support to access, retention, and equity in its higher education portfolio. Evidence from interviews and case studies was explicitly compared with the Country Partnership Frameworks, the country strategy analysis, and the portfolio analysis. Both the range of methods used and the transparency with which the output was synthesized reflected a high standard of research.

The evaluations examined in the sample also took steps to discuss the potential limitations of the input data. However, in some instances the data analysis did not go far enough in examining the quality of the underlying data. The electricity access evaluation provides an example. In this case, the assessment of results drew primarily on the reporting of indicators derived from the projects under review (World Bank 2015d). While these indicators were transparently reported, the risk of bias underpinning the data was not discussed. This contrasted strongly with the explicit consideration of bias in the external literature informing the evaluation. The reliance on secondary data sources had the additional effect of reducing the strength of evidence where reporting was weak; indicators on welfare outcomes (including gender-related outcomes) were more likely to be missing, poorly defined, or inadequately followed up during project implementation.

Overall, while the evaluations examined in the sample were generally robust in addressing data analysis validity, data quality concerns and strategies to mitigate potential biases resulting from weaker data came up as areas of concern under this attribute. An expanded focus on these facets would generally improve the validity of findings in future evaluations.

Attribute 7: Consistency

Consistency refers to the need for a logical flow between the evaluation rationale, questions, design, data collection, analysis, findings, and recommendations. It is therefore only applicable to evaluation reports, given that Approach Papers (by definition) do not integrate any findings. Of the eight reports examined, four were scored as adequate and four as partial. The reports examined fared relatively well with respect to this attribute. As such, the challenges listed below mainly apply to areas in which further improvements can be achieved from an already strong baseline.
There was a generally strong fit between the methods and data sources used to address evaluation questions. However, more could be done to provide a consolidated explanation of how specific methods advanced the evaluation and what each approach was designed to contribute to the analysis under each evaluation question. An example of good practice can be found in the IFC client engagement evaluation (World Bank 2018f): the report outlined each of the methods used and the rationale for each.31 This provided the reader with a clear view of how each method should be expected to contribute to the evidence base and the overarching objectives of the evaluation.

While the findings presented in evaluation reports generally related well to the evaluation questions, two related challenges were noted in the sample. First, subtle (but potentially significant) shifts in the interpretation of evaluation questions could alter the course of the evaluation, particularly if the central questions are paraphrased within the report.32 Second, the danger of findings "overreaching" relative to the data analysis can hinder the effectiveness of the prescriptions or generalizations derived from an evaluation. In the electricity access evaluation, the report states that "the World Bank's performance in the electricity sector is somewhat lower than its performance in other infrastructure sectors combined" (World Bank 2015d, 23). However, it is then suggested that "the complexity and diversity of energy sector activities and operations compared with those of other infrastructure sectors may partly explain this difference" (23–24). This latter claim is neither substantiated nor explored further.

In most cases, recommendations from the report followed logically from the evidence and findings presented. For instance, the carbon finance evaluation presented a clear and explicit flow between the evaluation logic, methods deployed, and findings derived (World Bank 2018a). The chapter "Effectiveness of World Bank Group Roles" was structured in accordance with the theory of change (see figure 1.1 of that report), which was itself clearly justified in relation to the roles of the Bank Group in this sphere (see pp. 3–4, 6). Statements were transparently related to the evidence stream from which they were derived. In addition, endnotes in the chapter provided additional evidence for many of the points made (see pp. 56–60). The flow from the intervention logic to arguments, evidence, and findings presented a clear and compelling case to support the evaluation's findings.

At a minimum, there was generally a good multitiered depiction of the links between different levels of intervention and different levels of outcomes in the evaluations. However, the meta-evaluation did not find examples where this framing was then worked into a model to help better understand and probe the underlying issues identified. This is surprising given that the nature of the evaluand often had strong features of dependency between actions taken at different levels. Yet how such links were investigated was not always sufficiently clear. Exploring and understanding these links in a selective and targeted way is critical, particularly where assumptions of linearity do not hold or else apply only under certain restrictions.33 The higher education evaluation provides an example of this point (World Bank 2017d).
The evaluation posed three central questions. First, was World Bank support to higher education consistent and well articulated? Second, did the World Bank contribute to higher education systems? Third, did support for higher education contribute to improved socioeconomic outcomes? To address the third question in a robust way, attention must be paid to what may be dubbed "macro-meso-micro" links: How does World Bank support influence or contribute to what the evaluation framework calls "broader outcomes" like skills and impacts (poverty reduction, employment, productivity)? Such broader outcomes must be measured at the level of beneficiaries. However, the links between the elements in the evaluation framework and micro-level behavior were not addressed.

Several macro-level variables referred to in the visualization of the evaluation's logic model invoked concepts like political economy, business climate, and environmental and social conditions (World Bank 2017d). But the evaluation did not clearly articulate how these were linked to the meso (Bank Group support for higher education) and micro (outcomes and impact) levels. The evaluation noted that micro-level interventions "to improve equity, teaching and learning, employability, and research outcomes are all amenable to rigorous piloting and evaluation, unlike systemwide reform, which is more difficult to measure" (34). Elsewhere, the evaluation notes, "although the World Bank supervised the grants, there is little evidence that it provided support or direction to project staff of beneficiaries in the form of evidence on 'what works' in higher education pedagogy" (43–44). This presents yet another indicator of the importance of paying closer attention to macro-meso-micro links.

The nature of macro-meso-micro links could also be more explicitly elaborated. Such links can be defined as the way in which Bank Group interventions trickle down to individual decision-makers and beneficiaries. Frameworks such as the Coleman Boat Model are particularly effective at emphasizing such links (Coleman 1990). The model distinguishes between three types of mechanism that are jointly required to explain the existence of a relationship between macro situations and the characteristics and outcomes of individual behavioral choices. The first (situational mechanisms) operate at the macro-to-micro level. They show how specific social situations shape the beliefs and opportunities of individual actors.34 The second (action-formation mechanisms) operate at the micro-to-micro level. These mechanisms assess how individual choices and actions are influenced by specific combinations of (individual) behavioral characteristics, capacities, opportunities, and limitations.35 The third (transformation mechanisms) operate at the micro-to-macro level and show how individuals generate macro-level outcomes through their actions and interactions.36

To conclude, the evaluations performed well on this attribute, presenting a strong fit between the methods and data sources used for each evaluation question. Less clearly evident or articulated was the link between methods and the scope for inference (from the evidence generated by the evaluation's methods of inquiry). Overall, most of the recommendations logically followed from the evidence presented. The acknowledgment or assessment of interlevel links tended to be implicit rather than explicit.
1. The scores are based on a combination of ratings assigned by the external experts to each respective evaluation reviewed in the sample.

2. For the sake of parsimony, issues related to institutional complexity within the Bank Group itself will not be discussed in this meta-evaluation.

3. The evaluation questions listed in the evaluations from the sample are summarized in appendix F. While Kane's (1984) suggestion that all evaluation questions should be posed as a single sentence is an exaggeration, the assessment framework takes steps to assess cases in which evaluation questions are insufficiently focused. Per Goethe's proverb that "in der Beschränkung zeigt sich erst der Meister" (it is in limitation that the master first shows himself), the scope of an evaluation can become unclear if it is approached via a set of unstructured questions. When an overarching research problem includes some 10–15 (or more) questions and subquestions, it becomes increasingly difficult to see how each specific question relates to the rest, reducing the overall utility and effectiveness of the queries. Such a failure can also occur in the opposite direction. As an example, Epstein and Martin (2014, 23) cite the question, "what leads people to obey the law?" Though it presents an interesting problem, it is impossible to answer without further disaggregation. Finding the correct balance between these extremes requires careful calibration, something that was appraised in this component of the meta-evaluation. See also White (2010) and Leeuw and Schmeets (2016, chapter 3).

4. See White (2010), Bunge (1997), Ultee (2001), and Leeuw and Schmeets (2016).

5. In his article "Who's afraid of research questions? The neglect of research questions in the methods literature and a call for question-led methods teaching," White (2013) discusses this issue in the context of the educational sciences. Appendix G addresses potential failures when formulating evaluation questions.

6. Issues of question clarity and focus could also be addressed in the evaluation design matrix. The "bag of questions" approach can also be characterized by substantial variations in the focus of evaluation questions. At times, the questions discuss high-level strategic issues; at others, the subquestions address rather specific topics (such as the source, operationalization, and description of service delivery in project appraisal documents).

7. Furthermore, the report defines two mechanisms for scoping: a self-reinforcing selection mechanism and a demonstration mechanism.

8. For example, the higher education evaluation portfolio analysis examined the following documents (World Bank 2017d): Implementation Completion and Results Reports, Implementation Completion and Results Report Reviews, and Project Performance Assessment Reports. Furthermore, "a standard quantitative portfolio review was conducted of IFC's higher education portfolio detailing the number of new investment projects committed between FY03 and April 30, 2016, and the volume of investments committed" (74–75). In the absence of an identified portfolio, the rural nonfarm economy evaluation "used the theme code 'rural nonfarm income generation,' which was applied by the World Bank to 152 projects between 2004 and 2014" (World Bank 2017c, 8).
After disaggregating the activities collected under the code, the evaluation "identified 529 World Bank projects, valued at $35 billion, which have directly supported rural nonfarm income generating activities during the same period" (213). In the urban transport evaluation, the portfolio covered 73 community-based projects (plus 32 additional financing operations), of which 44 (valued at $8.3 billion) were closed and evaluated (World Bank 2017e). "IEG filtered and identified projects approved between 2004 and 2014 that were within the Transport sector board, were rural themed, and that had a 'Rural and Inter-Urban Roads and Highways' code or a 'Roads and Highways' code (n = 162). It then filtered and identified projects within the Agricultural and Rural Development sector board that included a 'Rural,' an 'Inter-Urban Roads and Highways' (TI), or a 'Roads and Highways' (TA) sector code (n = 70)" (214). Finally, the electricity access evaluation "assessed both quantitative and qualitative results for individual projects during FY2000–2014. The portfolio review covered all projects for the World Bank, IFC, and MIGA that were approved or closed/matured during [this period]" (see table 1.2 of that report).

9. See the higher education evaluation report (xi) for an example of this.

10. This definition is in line with many methodological handbooks and guidance publications. See Vaessen (2018).

11. See also White (2010), Gorard (2010), Leeuw and Schmeets (2016), and de Vaus (2001).

12. The interviews asked staff to relate the ways in which the World Bank's new organizational structure was likely to impact learning and knowledge sharing in operations.

13. In this regard, Janesick (1998) refers to such proliferation as "methodolatry." See also White (2013, 219–20).

14. See Bamberger et al. (2004), who coined this term; it refers to the time, data, and budget constraints under which evaluations are implemented.

15. See Strauss and Smith (2009) and DFID (2012).

16. Bamberger et al. (2012, 219ff). This conceptualization was first presented in Campbell and Stanley (1963) and later revised by Cook and Campbell (1979) and Shadish (2002). Construct validity is here defined as "the degree to which inferences are warranted from the observed persons, settings, and cause-and-effect operations included in a study to the constructs that these instances might represent" (Shadish et al. 2002, 38). For more on the Campbellian approach to construct validity, see Lund (2020).

17. See World Bank (2018), Conducting a Structured Literature Review in the Framework of IEG (Major) Evaluations.

18. The EAST acronym is derived from the following: "If you want to encourage a behavior, make it Easy, Attractive, Social and Timely."

19. Although construct validity originally emerged from psychological research, Strauss and Smith (2009) showed how this concept can be broadened to cover the definition and operationalization of key concepts in studies, as well as the relations between concepts and variables.

20. This was particularly valuable for evaluations that spanned multiple years, projects, interventions, and institutional layers.

21. In the report, the mechanism concept is only referred to in reference to issues of tracing, funding, and quality assurance.

22. Two "evaluative lenses" are presented: one on behavioral change and the other on service delivery.
23. The literature review that underpinned the evaluation also cited mechanisms such as trust and raising awareness.

24. See Lee and Fidler (2007).

25. The model was dubbed CRI2SP, standing for communication, resources, incentives, information, society, and psychology (figure 4.1).

26. Evidence gap maps are evidence collections that map out existing and ongoing systematic reviews or primary studies on a particular set of interventions in a framework of policy-relevant interventions and outcomes.

27. Specifically, it is important to ensure that there are feedback loops between theory and empirical evidence. While the theory determines how evidence is brought in, the latter can be used to iteratively refine the former.

28. The Maryland Scientific Methods Scale is one example of such a design hierarchy. The Cochrane Collaboration, the Campbell Collaboration, and several other organizations have developed publications, protocols, and other guidance documents on this topic.

29. For instance, the evaluation on World Bank Group support to electricity access (World Bank 2015d).

30. For example, the learning and results evaluation explicitly included a country case study that was not intended to be representative of the Bank Group portfolio (World Bank 2015b). Findings were based on evidence gathered from a pre-2014 organizational structure, whereas recommendations were framed around the perceived needs of a post-2014 reformed structure in which power had shifted from countries and regions to sector and thematic practices.

31. For example, "the evaluation also included some interviews with IFC comparator institutions to benchmark IFC's approaches to client engagement," and "a comprehensive assessment of IFC's investment and advisory portfolio … to derive characteristics and patterns of performance" (World Bank 2018f, 5).

32. The health services evaluation provides an example of this phenomenon.

33. As noted in Results and Performance of the World Bank Group 2020, the World Bank Group collects limited systematic evidence on its contribution to higher-level outcomes. Higher-level outcomes result from the interplay of different projects and types of World Bank Group engagements (lending, knowledge, and convening) over time (World Bank 2020b). In response, the Board requested more evidence on how interventions help achieve the Sustainable Development Goals. "Better evidence on higher level outcomes would also help with learning, reflections on strategy, and course corrections where needed." See https://ieg.worldbankgroup.org/blog/what-world-bank-groups-performance-results-cannot-tell-us-about-development-outcomes.

34. For example, this can involve the opportunity structures by which a community is defined: the more opportunities (such as employment) present, the greater the chance that any individual will be able to find work. Another example can be found in the demographic composition of families and societies (including the Easterlin mechanism linking the size of birth cohorts to job opportunities, and so on).

35. Examples include cognitive dissonance, fundamental attribution errors, and other cognitive biases. Crowding out, stress levels, relative deprivation, reactance, and incentive-response mechanisms are also included in this category.

36. Examples include threshold effects (also referred to as tipping points or critical mass models of collective action).
5 | Using Innovative Methods in Independent Evaluation Group Evaluations

Evaluation question 5. What do evaluation reports, Approach Papers, and interviews with IEG staff tell us about the use of innovative methods in the context of evaluation in IEG?

As noted in chapter 3, conventional methods such as case studies, structured interviews, and statistical analysis were relatively common across the sample, with innovative or broadened methods present in a minority of the reports studied. Nearly all evaluations employed some combination of interviews, case studies, desk reviews, and surveys. The total count of conventional methods tended to be higher in the final evaluation reports than what was initially proposed in the Approach Papers. Furthermore, analysis of temporal trends suggested that the adoption of more innovative methods had increased in more recent evaluations.

Given that one of the goals of the meta-evaluation was to "provide IEG's Leadership Team with an external perspective on how to improve the quality and credibility of IEG's evaluations," attention was paid to the use of innovative evaluation methods both in the review of Approach Papers and reports and during interviews with IEG staff. With respect to the latter, it was noted that several ongoing evaluations have expanded the scope of methods employed, suggesting a growing trend in this regard. Among the methods used, the meta-evaluation found a growth in applications of geospatial analysis, process tracing, QCA, machine learning, and social network analysis. A non-exhaustive set of examples is discussed below. Because we did not pass summative judgment on the use of innovative methods, we cite examples from the sample as well as from other (including more recent, ongoing) evaluations.

Geographically targeted analysis of georeferenced data on World Bank investments was used in the Mexico Country Program Evaluation: An Evaluation of the World Bank Group's Support to Mexico (2008–17). The background of this approach is described as follows in appendix 1 of the report: "geo-referenced poverty and aid data allow [us] to evaluate targeting effectiveness of development interventions. Initially, this can be done by correlating the geographical allocation of World Bank projects at regional level with regional measures of (under)development. Relatively high correlations are consistent with effective geographic targeting, whereby most resources are directed toward underdeveloped regions. However, finding low correlations may not necessarily point to poor targeting as there are many factors potentially affecting the allocation of World Bank projects. Therefore, a regression approach is necessary, controlling for other factors such as conflict, public spending and other factors."
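A stylized sketch of such a regression approach is shown below. It uses simulated regional data; all variable names, magnitudes, and the simulated relationship are invented for illustration and do not reproduce the Mexico evaluation's actual model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 32  # hypothetical number of subnational regions

regions = pd.DataFrame({
    "poverty_rate": rng.uniform(0.1, 0.6, n),    # regional (under)development measure
    "conflict": rng.integers(0, 2, n),           # control: conflict indicator
    "public_spending": rng.normal(100, 20, n),   # control: public spending per capita
})
# Simulated outcome: commitments allocated to each region (invented relationship)
regions["wb_commitments"] = (
    50 * regions["poverty_rate"] - 5 * regions["conflict"] + rng.normal(0, 5, n)
)

# Regress allocation on poverty, controlling for other factors
X = sm.add_constant(regions[["poverty_rate", "conflict", "public_spending"]])
model = sm.OLS(regions["wb_commitments"], X).fit()
# A positive poverty_rate coefficient is consistent with pro-poor targeting
print(model.params)
```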
The carbon finance and engaging citizens evaluations provide clear examples of the benefits of process tracing in evaluation. In the latter, "the evaluation team piloted an in-depth causal analysis method called process tracing in the case of the Reportes Comunitarios of the national CCT of the Dominican Republic. Process tracing was used to assess the impact of embedding a participatory monitoring [mechanism] in the CCT and to evaluate the significance of the World Bank's contribution. Process tracing is a rigorous method of within-case causal inference that relies on Bayesian updating logic to transparently assess the probative value of pieces of evidence provided to justify specific contribution claims."1
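The Bayesian updating logic referred to in the quote can be illustrated with a minimal sketch: each piece of evidence carries a likelihood ratio expressing its probative value, and the credence in the contribution claim is updated by multiplying the prior odds by these ratios. The prior, evidence descriptions, and ratios below are invented for illustration.

```python
# Minimal sketch of Bayesian updating in process tracing.
# Prior and likelihood ratios are invented; in practice they reflect the
# probative value assigned to each piece of evidence.
prior = 0.50  # initial credence in the contribution claim

evidence = [
    ("beneficiaries recall the participatory mechanism", 2.0),   # weakly probative
    ("program records link the change to the mechanism", 6.0),   # strongly probative
]

odds = prior / (1 - prior)
for description, likelihood_ratio in evidence:
    odds *= likelihood_ratio            # update odds with each piece of evidence
    posterior = odds / (1 + odds)       # convert odds back to a probability
    print(f"{description}: P(claim | evidence) = {posterior:.2f}")
```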
The use of a (semisupervised) machine learning approach presented another example of innovation in evaluation. In the Approach Paper for Evaluation of the World Bank's Support to Improving Child Undernutrition and Its Determinants, such an approach was piloted to assess the Bank Group's contribution to reducing undernutrition, exploring the effectiveness of various interventions relative to the outcome. Having identified key concepts from the underlying theory of change, machine learning was then used to explore a large portfolio of projects across sectors and databases in a more efficient way. Given that nutrition interventions can be nested in a broad pool of projects (such as those involving health, agriculture, water, governance, and social protection), a machine learning–supported portfolio analysis presented a more effective means of examining the pool of over 4,000 projects considered in the evaluation scope. This was complemented with the production of automatically generated knowledge graphs that explicitly encoded expert knowledge that would otherwise have been difficult to capture.2 The combination resulted in the development of a more nuanced theory of change, as well as a streamlined portfolio review process.3

Finally, social network analysis was applied in several reports, including the Knowledge Flow and Collaboration under the World Bank's New Operating Model (FY19) and World Bank Group Support to Health Services: Achievements and Challenges evaluations. The evaluation The World's Bank: An Evaluation of the World Bank Group's Global Convening (FY20) also used this approach, analyzing Twitter data "to assess the reach and visibility of the Bank Group on Twitter and to compare its connectedness in its social networks on selected issue areas with that of key actors (by virtue of their mandate and comparative strengths) in said area" (World Bank 2020c, 50).4

In several interviews with task team leaders and senior evaluators, attention was paid to the importance of broadening the integration of innovative methods in IEG's evaluations. Interviews on the development of innovative methods suggested a generally positive trend in recent years, moving toward the broader integration of such methods into evaluations. In some cases, innovation was perceived to be coming "from the outside or from above" without due consideration of the relevance of these methods to the subject of evaluation. It was noted that if innovation is imposed from the outside, it could contribute to a (less than optimal) fragmentation of resources and evaluation results.5

Overall, the meta-evaluation noted that the use of innovative methods has increased in IEG evaluations over time. The inventory of methods from IEG evaluations (chapter 3) supports this assertion. As noted previously, innovative methods include the analysis of big data from social media sources, geospatial data, and "text-as-data" approaches (including machine learning in portfolio analysis), as well as specialized theory-based evaluations. Theory-based evaluation methods can be used to reconstruct and test the underlying assumptions about mechanisms (behavioral, cognitive, economic, and institutional) that can explain how and under what circumstances Bank Group interventions can have an impact.6

The meta-evaluation also noted that innovative methods can be classified into two categories. First, there are innovations that may significantly influence the overall design and approach of an evaluation. For example, some of the new text analytics and machine learning approaches change the way portfolios are identified and analyzed. Other innovative approaches can better be classified as "boutique studies," a term that carries both a positive connotation and certain implications of detachment. In principle, innovative "boutique studies" should be stimulated. Experimentation in the use of innovative methods can be a strong incentive for staff and can help IEG maintain its edge as a leading evaluation institution. Yet prudence is in order. Though interviewees emphasized the importance of innovation, they also noted that the relevance of such approaches was not always fully articulated or integrated into the evaluation design matrix. This may have influenced the perceived fragmentation noted above. While the trend of increasing methodological diversity identified in the inventory of methods should be applauded, innovation should not become an end in itself. Evaluation teams should always carefully consider the cost-benefit ratio of innovation and the logic of using specific methods to address evaluation questions, making sure that each new approach adds value to the analysis.

1. Elsewhere in this report, it is indicated that "The process tracing study in the Dominican Republic was used to test formally the theoretical framework emerging from the literature review." See box A.4, Process Tracing of Citizen Engagement in the Dominican Republic, p. 78.

2. As noted in the report, "knowledge graphs allow for a 'smart' theory of change that integrates the theory of change and project outcome data to streamline the portfolio reviewing process, as well as to assist reporting, strategic analysis, and portfolio management. Knowledge graphs are complementary to machine learning because they can explicitly encode expert knowledge in ways that are difficult with machine learning models."

3. As the theory of change is "a static object, which keeps the task of validating project indicators and outcomes manual hitherto, the challenge for AI-based decision support is to formulate the theory of change as an instantiated machine-readable artifact" (World Bank, forthcoming).

4. Published April 1, 2020. While social media analysis provides certain clear advantages, it should be noted that there are also serious analytical limitations tied to the nature of the underlying data analyzed. Such issues are outside the scope of the meta-evaluation.

5. The reasoning seems to be that they are perceived as an extra lens leading to new and possibly different insights.

6. See Pawson (2013) and the earlier references to the Coleman Boat Model for assessing macro-meso-micro links.

6 | Conclusions and Suggestions

Evaluation question 6. What conclusions may be derived from the inventory, in-depth review, and interviews? What suggestions can be made for future IEG evaluations?
The meta-evaluation examined the quality and credibility of IEG evaluations based on their methodological characteristics. The analysis distinguished between the inventory of methods (assessing the full universe of IEG evaluations published between FY15 and FY19) and an in-depth assessment of a sample of eight evaluations. The latter involved an assessment of the evaluation reports and their corresponding Approach Papers on the basis of a framework of seven attributes of methodological clarity and rigor. The inventory exposed the breadth of methodological approaches featured in the full sample of evaluation reports, comparing the range of methodologies used across evaluation reports and their respective Approach Papers. The total number of methods tended to be higher in the final evaluation reports than what was initially proposed in the Approach Papers. The prevalence of more innovative methods also increased in more recent evaluations. The use of at least one innovative method per report appears to have become a norm in more recent evaluations. Overall, IEG evaluations scored very well on the attributes of scope and focus and consistency. Evaluations also performed quite well on the attributes of construct validity and data analysis validity. Finally, a more mixed picture was found for the attributes of reliability, internal validity, and external validity. On each of these, a number of good and weaker examples of evaluations were identified.

The sections below present six conclusions from the meta-evaluation. These are supplemented with suggestions for future IEG evaluations, highlighting some of the strengths and weaknesses identified in the assessment of programmatic and corporate evaluations.

Scope and Focus of IEG Evaluations

Conclusions

Overall, information presented on scope, rationale, and goals in the evaluation reports and Approach Papers was elaborate, relevant, and thorough. At the same time, the scope of some IEG evaluations tended to be overambitious and diluted. This was mainly due to two aspects: the complexity of the evaluand (multisite, multilevel, and multiactor in nature) and the number and clarity of evaluation questions. While one or more overarching questions were usually formulated, certain evaluations subsequently added more than 10 subquestions, resulting in a bag-of-questions approach.

Suggestions

The meta-evaluation offers two suggestions for improvement in this area. First, the use of portfolio analysis as a standard operational procedure should be reconsidered. Specifically, Approach Papers should explicitly discuss the necessity of addressing the full diversity of interventions underlying a (thematic or sectoral) portfolio.1 Such an analysis will help formulate more precise evaluation questions. Moreover, less time and fewer resources would need to be spent on the identification and descriptive analysis of the portfolio.2 Second, evaluators should refrain from formulating bags of questions and instead devote more time to refining the focus of evaluations.

Use of Conceptual Frameworks and Theories of Change

Conclusions

Overall, IEG evaluations adequately defined concepts (though they did not always operationalize them). More recent evaluations systematically incorporated evidence from the literature and made adequate use of theories of change.
However, the function of the theory of change was not always clearly articulated; its relation to the empirical parts of the evaluative analysis could have been strengthened.

The evaluations in the sample usually employed one (or more) of three approaches for applying theories of change. In the first, the conceptual framework would capture the inputs, activities, outputs, and outcomes of a body of work alongside major enabling or restricting contextual factors.3 This usually served as a sense-making framework to better understand and define the often complex scope of the evaluation. The second approach involved the development of a substantive theory of change, disaggregating specific packages of interventions and confronting the theory with empirical evidence.4 The third approach involved a combination of a more general theory of change underlying macro-level Bank Group categories of activities and one or more nested theories within this broader framework. Though all evaluations applied theories of change, more attention could have been paid to the ways in which they interact with the empirical part of the evaluation. Some evaluations studied intervention mechanisms, but relatively less attention was paid to how such mechanisms operate in specific contexts.5

Suggestions

The meta-evaluation offers three suggestions in this area. First, evaluations should more explicitly articulate the role theories of change play in data collection and analysis, assessing their relationship to relevant empirical work. Where possible, the analysis should always link back to the theory of change, providing an assessment of its veracity as well as its potential shortcomings. Second, evaluations could be more precise about the content of their theories of change. Specifically, the adoption of a context-mechanism-outcomes model or comparable analogs from the field of realist evaluation is recommended.6,7 Finally, greater attention to operationalizing concepts into variables and measurement instruments could improve construct validity.

Clarity of Research Methods and Design

Conclusions

Overall, clarity in evaluation design has improved in IEG evaluations over the past five years. The use of tools such as the EDM is widespread. However, the EDM sometimes presents only a list of evaluative instruments. A number of evaluations still do not show sufficient clarity on how different methods help answer specific evaluation questions and how evidence from different sources is triangulated and used to substantiate evaluation findings.

As shown in the inventory of 28 evaluations (see chapter 3), the EDM is an increasingly important tool for enhancing the reliability of evaluations, with more recent evaluations paying closer attention to its formulation. However, despite their role in clarifying the evaluation design, certain EDMs (and the supporting narratives) did not go beyond a listing of the individual methods used. Designs are "not about the logistics of research—how the data are collected, for example—but rather about the logic of inquiry, the links between questions, data and conclusions" (White 2013).

Suggestions

Two suggestions are provided for this area. First, more attention should be paid to distinguishing between data collection and data analysis methods, fully articulating the ways in which the two complement each other. Approach Papers (and the methodology sections of reports) should clarify the logic of the design rather than merely listing the methods (to be) used. Second, guidance on best practices in the practical implementation of principles of triangulation and synthesis in evaluation should be developed.
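One way to make this logic of inquiry explicit is to treat each EDM row as a mapping from a question to its data collection methods, data analysis methods, and triangulation strategy, rather than as a flat list of instruments. A schematic sketch follows; the question and entries are invented for illustration.

```python
# Schematic, invented example of an evaluation design matrix (EDM) row that
# separates data collection from data analysis and names the triangulation step.
edm = [
    {
        "question": "To what extent did support improve service delivery?",
        "data_collection": ["semistructured interviews", "portfolio review", "survey"],
        "data_analysis": ["qualitative content analysis", "descriptive statistics"],
        "triangulation": "confront interview themes with portfolio ratings and survey results",
    },
]

for row in edm:
    print(row["question"])
    print("  collect:", ", ".join(row["data_collection"]))
    print("  analyze:", ", ".join(row["data_analysis"]))
    print("  triangulate:", row["triangulation"])
```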
Validity

Conclusions

While there are good examples of evaluations with high internal, external, and data analysis validity of findings, there are ongoing challenges that merit further attention.8

Internal validity assesses the extent to which a study establishes a trustworthy causal relationship (either attribution or contribution).9 As noted previously, theories of change play an important role in this area. However, the reviewed evaluations offered limited references to conventional threats to validity or how to address them. The complexity of evaluands exacerbates this challenge, especially in contexts where evaluations covered dozens of countries, hundreds of projects, and several years of implementation. While the sample yielded mixed results on the attribute of external validity (or generalizability), its discussion was generally consistent with and reflective of the nature of the evaluation. Some evaluations explicitly discussed the limitations of generalizability across different contexts but provided limited mitigation strategies. Finally, the meta-evaluation's assessment of data analysis validity was quite positive across the sample. However, two common challenges were noted, relating to issues of transparency and triangulation. First, some evaluations faced difficulties in clearly demonstrating the stream of evidence that supported some of the key findings. Second, the triangulation of evidence was insufficiently applied (or clarified) in some evaluations.

Suggestions

The meta-evaluation proposes three suggestions for improvement in this area. First, while suggestions related to the use of theories of change have already been presented, it should be noted that improvements in that area can also improve internal validity. Second, a dedicated section on the diagnosis and treatment of internal and external validity issues could be useful in mitigating some of the challenges posed by the complexity of evaluands. Finally, guidance (as suggested previously) on how to triangulate evidence within and across sources of evidence would be helpful.

Consistency

Conclusions

Overall, IEG evaluation reports fared quite well with respect to the consistency between rationale, scope, questions, methods, findings, and recommendations. There was a generally strong fit among the use of methods, data sources, and evaluation questions. In most cases, recommendations from the reports logically followed from the findings.

Less evident in some cases was the added value of individual methods within a given evaluation. The consistency between questions, levels of data collection and analysis, and synthesis of findings was not always clear. Furthermore, the nature of macro-meso-micro links tended to be implicit rather than explicit in most of the evaluations assessed.10

Suggestions

To further strengthen analytical rigor, IEG evaluations should consider developing a more systematic approach to assessing how contextual (macro and meso) characteristics may or may not influence the behavior of the beneficiaries of Bank Group-supported interventions.
This would include clarifying how and under what conditions different levels of analysis are linked. Apart from the use of multilevel EDMs, the literature provides several analytical models to tackle this issue: the Coleman Boat Model, for example, could provide a useful framework in this context.11

Innovation in Evaluation

Conclusions

During FY15 to FY19, IEG evaluations demonstrated a broadening range of methods used to respond to evaluation questions. While innovation in methods used for data collection and analysis should be applauded, such innovation should not become an end in itself. Evaluation teams should always carefully consider the cost-benefit ratio of innovation and the logic of using specific methods to address evaluation questions.

Suggestions

The meta-evaluation proposes the following suggestions on innovation. IEG could benefit from a more strategic view of methodological innovation in evaluation. Among other things, this would involve distinguishing between innovations that (potentially) significantly change the evaluation approach as a whole (or a large part thereof) and boutique studies. Systems of innovation should be seen as "a way of summarizing the patterns of interactions and interdependencies [that are] evolving and changing" between and within organizations (Eig 2014). If a collaborative social environment for innovation can be fostered, the quality of evaluations can be improved through the integration of innovative approaches and greater interactions between them. We suggest that IEG further stimulate experimentation and collaboration across IEG on innovative approaches.

Finally, as Jewitt et al. (2017) note, "the digital is a catalyst for innovation." Given the recent challenges posed by the COVID-19 pandemic, digital tools and approaches will undoubtedly grow in relevance in the work of the Bank Group generally and IEG specifically. IEG should therefore be ready to learn from recent experiences in innovation (especially in the field of data science) and make informed decisions to adapt its practices where needed.

1. This is particularly relevant for evaluations whose scope spans multiple countries, long time horizons, and the three Bank Group institutions (World Bank, International Finance Corporation, Multilateral Investment Guarantee Agency) in both lending and nonlending operations.

2. This will also improve the value added by investments in portfolio review and analysis.

3. Such characteristics were sometimes referenced in a manner similar to a logical framework approach.

4. Attention was sometimes paid to the mechanisms that made interventions work.

5. This is critical given that there is often no a priori evidence that a theory of change will be valid in different contexts.

6. See Lemire et al. (2020).

7. See Pawson (2013).

8. Regarding construct validity, please refer to the points made above under the heading "Use of Conceptual Frameworks and Theories of Change."

9. Given the complexity of evaluands and issues of equifinality in attributing formal causal relationships, contributory causal relationships (those that support the outcome but are not the sole determinant of causation) are mainly considered here.
10. "Macro" in this context pertains to country-level characteristics such as infrastructure, connectivity, investment climate, social inclusion/exclusion, fragility or conflict situations, economic or financial context, demography, and so forth. "Meso" refers to the role played by intermediary organizations and institutions. Finally, "micro" concerns the behavior of beneficiaries and end users. In most if not all logic models (theories of change) examined in the sample of eight evaluations, these links were not clearly articulated.

11. See, for example, Hedström and Ylikoski (2010), Raub et al. (2012), and Astbury and Leeuw (2010).

References

Astbury, B., and F. L. Leeuw. 2010. "Unpacking Black Boxes: Mechanisms and Theory Building in Evaluation." American Journal of Evaluation 31 (3): 363–81.

Bamberger, M., J. Rugh, and L. Mabry. 2011. RealWorld Evaluation: Working under Budget, Time, Data, and Political Constraints. 2nd ed. Thousand Oaks, CA: Sage Publications.

Bamberger, M., J. Rugh, M. Church, and L. Fort. 2004. "Shoestring Evaluation: Designing Impact Evaluations under Budget, Time and Data Constraints." American Journal of Evaluation 25 (1): 5–37.

Bunge, M. 1997. Philosophy of Science: From Problem to Theory. New Brunswick, NJ: Transaction Publishers.

Campbell, Donald T., and Julian C. Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Ravenio Books.

Coleman, J. 1990. Foundations of Social Theory. New York: Belknap Press.

Cook, T. D., and D. T. Campbell. 1979. Quasi-Experimentation: Design and Analysis for Field Settings. Chicago: Rand McNally.

DFID (Department for International Development). 2012. Broadening the Range of Designs and Methods for Impact Evaluations. Report of a study commissioned by DFID, Working Paper 38. London: DFID.

ECG (Evaluation Cooperation Group). 2012. Big Book on Evaluation Good Practice Standards. ECG.

Eig, L. 2014. Innovations and New Technology. What Is the Role of Research? Implications for Public Policy. VINNOVA—Swedish Governmental Agency for Innovation Systems.

Epstein, Lee, and Andrew D. Martin. 2014. An Introduction to Empirical Legal Research. Oxford: Oxford University Press.

Farrington, D. 2003. "Methodological Quality Standards for Evaluation Research." The Annals of the American Academy of Political and Social Science 587: 49–68.

Fitzpatrick, J., Blaine R. Worthen, and James R. Sanders. 2004. Program Evaluation: Alternative Approaches and Practical Guidelines. Boston: Pearson/Allyn & Bacon.

Gorard, S. 2010. "Research Design, as Independent of Methods." In Sage Handbook of Mixed Methods, edited by C. Teddlie and A. Tashakkori, 237–52. Thousand Oaks, CA: Sage Publications.

Hedges, L. V. 2017. "Design of Empirical Research." In Research Methods and Methodologies in Education, edited by R. Coe, M. Waring, L. V. Hedges, and J. Arthur. Thousand Oaks, CA: Sage Publications.

Hedström, Peter, and Petri Ylikoski. 2010. "Causal Mechanisms in the Social Sciences." Annual Review of Sociology 36: 49–67.

Janesick, V. J. 1998. "The Dance of Qualitative Research Design: Metaphor, Methodolatry, and Meaning." In Strategies of Qualitative Inquiry, edited by N. K. Denzin and Y. S. Lincoln, 35–55. Thousand Oaks, CA: Sage Publications.
Jewitt, Carey, Anna Xambo, and Sara Price. 2017. "Exploring Methodological Innovation in the Social Sciences: The Body in Digital Environments and the Arts." International Journal of Social Research Methodology 20 (1): 105–20.

Kane, E. 1984. Doing Your Own Research: Basic Descriptive Research in the Social Sciences and Humanities. London: Marion Boyars.

Lee, Kelley, and David Fidler. 2007. "Avian and Pandemic Influenza: Progress and Problems with Global Health Governance." Global Public Health 2 (3): 215–34.

Leeuw, F., and H. Schmeets. 2016. Empirical Legal Research: A Guidance Book for Lawyers, Legislators and Regulators. Cheltenham, UK: Edward Elgar Publishing.

Lemire, S., A. Kwako, S. B. Nielsen, C. A. Christie, S. L. Donaldson, and F. L. Leeuw. 2020. "What Is This Thing Called a Mechanism? Findings from a Review of Realist Evaluations." In Causal Mechanisms in Program Evaluation, edited by J. Schmitt. New Directions for Evaluation 167: 73–86.

Lund, Thorleif. 2020. "A Revision of the Campbellian Validity System." Scandinavian Journal of Educational Research: 1–13.

NONIE (Network of Networks for Impact Evaluation). 2009. Impact Evaluations and Development: NONIE Guidance on Impact Evaluation. Washington, DC: NONIE.

OECD-DAC (Organization for Economic Co-operation and Development—Development Assistance Committee). 2010. Development Evaluation Resources and Systems. Paris: OECD-DAC.

Orata, P. 1940. "Evaluating Evaluation." The Journal of Educational Research 33 (9): 641–66.

Ostrom, E. 2010. "Beyond Markets and States: Polycentric Governance of Complex Economic Systems." American Economic Review 100: 641–72.

Pawson, R. 2013. The Science of Evaluation: A Realist Manifesto. Thousand Oaks, CA: Sage Publications.

Ragin, C. 2014. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley, CA: University of California Press.

Raub, W., et al. 2012. "Micro-Macro Links and Microfoundations in Sociology." The Journal of Mathematical Sociology 35: 1–25.

Scriven, M. 2015. The Meta-Evaluation Checklist. Claremont, CA: Claremont Evaluation Center.

Shadish, William R. 2002. "Revisiting Field Experimentation: Field Notes for the Future." Psychological Methods 7 (1): 3–18.

Strauss, M., and G. T. Smith. 2009. "Construct Validity: Advances in Theory and Methodology." Annual Review of Clinical Psychology 5 (1): 1–25.

Ultee, W. 2001. "Problem Selection in the Social Sciences: Methodology." In International Encyclopedia of the Social and Behavioral Sciences, edited by N. Smelser and P. Baltes, 12110–17. Amsterdam: Elsevier.

UNEG (United Nations Evaluation Group). 2016. Norms and Standards for Evaluation. New York: UNEG.

Vaessen, J. 2018. "Five Ways to Think About Quality in Evaluation." Independent Evaluation Group (blog), December 11, 2018. https://ieg.worldbankgroup.org/blog/five-ways-think-about-quality-evaluation.

Van Thiel, Sandra. 2014. Research Methods in Public Administration and Public Management: An Introduction. London: Routledge.

Vaus, D. de. 2001. Research Design in Social Research. London: Sage Publications.

White, H., and H. Waddington. 2012. "Why Do We Care about Evidence Synthesis? An Introduction to the Special Issue on Systematic Reviews." Journal of Development Effectiveness 4 (3): 351–58.

White, P. 2010. Developing Research Questions: A Guide for Social Scientists. Houndmills, UK: Palgrave Macmillan.
White, P. 2013. "Who's Afraid of Research Questions? The Neglect of Research Questions in the Methods Literature and a Call for Question-Led Methods Teaching." International Journal of Research & Method in Education 36 (3): 213–27.

World Bank. 2014. World Development Report 2015: Mind, Society, and Behavior. Washington, DC: World Bank.

World Bank. 2015a. Financial Inclusion: A Foothold on the Ladder toward Prosperity? An Evaluation of World Bank Group Support for Financial Inclusion for Low-Income Households and Microenterprises. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2015b. Learning and Results in World Bank Operations: How the Bank Learns. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2015c. The Poverty Focus of Country Programs: Lessons from World Bank Experience. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2015d. World Bank Group Support to Electricity Access, FY2000–2014. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2015e. World Bank Support to Early Childhood Development. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2016a. Behind the Mirror: A Report on the Self-Evaluation Systems of the World Bank Group. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2016b. Industry Competitiveness and Jobs: An Evaluation of World Bank Group Industry-Specific Support to Promote Industry Competitiveness and Its Implications for Jobs. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2016c. Program-for-Results: An Early-Stage Assessment of the Process and Effects of a New Lending Instrument. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2016d. The World Bank Group's Support to Capital Market Development. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017a. A Thirst for Change: The World Bank Group's Support for Water Supply and Sanitation, with Focus on the Poor. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017b. Data for Development: An Evaluation of World Bank Support for Data and Statistical Capacity. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017c. Growing the Rural Nonfarm Economy to Alleviate Poverty: An Evaluation of the Contribution of the World Bank Group. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017d. Higher Education for Development: An Evaluation of the World Bank Group's Support. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017e. Mobile Metropolises: Urban Transport Matters: An IEG Evaluation of the World Bank Group's Support for Urban Transport. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017f. Toward a Clean World for All: An IEG Evaluation of the World Bank Group's Support to Pollution Management. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2017g. World Bank Group Country Engagement: An Early-Stage Assessment of the Systematic Country Diagnostic and Country Partnership Framework Process and Implementation. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2018a. Carbon Markets for Greenhouse Gas Emission Reduction in a Warming World. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2018b. Conducting a Structured Literature Review in the Framework of IEG (Major) Evaluations. Independent Evaluation Group. Washington, DC: World Bank.
World Bank. 2018c. Engaging Citizens for Better Development. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2018d. Growth for the Bottom 40 Percent: The World Bank Group's Support for Shared Prosperity. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2018e. Mexico Country Program Evaluation: An Evaluation of the World Bank Group's Support to Mexico (2008–17). Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2018f. The International Finance Corporation's Approach to Engaging Clients for Increased Development Impact. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2018g. World Bank Group Support to Health Services: Achievements and Challenges. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2019a. 'Creating Markets' to Leverage the Private Sector for Sustainable Development and Growth: An Evaluation of the World Bank Group's Experience through 16 Case Studies. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2019b. Building Urban Resilience: An Evaluation of the World Bank Group's Evolving Experience (2007–17). Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2019c. Grow with the Flow: An Independent Evaluation of the World Bank Group's Support to Facilitating Trade 2006–17. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2019d. Knowledge Flow and Collaboration under the World Bank's New Operating Model. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2019e. Two to Tango: An Evaluation of World Bank Group Support to Fostering Regional Integration. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2019f. World Bank Group Support in Situations Involving Conflict-Induced Displacement. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2020a. Evaluation of the World Bank's Support to Improving Child Undernutrition and Its Determinants. Approach Paper. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. 2020b. Results and Performance of the World Bank Group 2020. Washington, DC: World Bank.

World Bank. 2020c. The World's Bank: An Evaluation of the World Bank Group's Global Convening. Independent Evaluation Group. Washington, DC: World Bank.

World Bank. Forthcoming. "Advanced Content Analysis: Can Artificial Intelligence Accelerate Theory-Driven Complex Program Evaluation?" IEG Methods and Evaluation Capacity Development Working Paper Series, World Bank, Washington, DC.

World Bank Group. 2019. World Bank Group Evaluation Principles. Washington, DC: World Bank Group.

Yeager, S. J. 2008. "Where Do Research Questions Come from and How Are They Developed?" In Handbook of Research Methods in Public Administration, 45–60. New York: Taylor & Francis Group.

Appendixes

Appendix A. Stratified Random Sample of IEG Evaluations

Using a stratified random sample, the meta-evaluation identified the subset of evaluations to which the framework was applied. The following stepwise approach was used to draw the sample of eight evaluations examined in the in-depth review. First, all major and thematic evaluations from fiscal year (FY)15 to FY19 were divided into two groups (corporate and programmatic evaluations).
Corporate evaluations focus on World Bank Group processes, institutional structures, or corporate strategies of engagement. Such evaluations seek to assess the World Bank's internal capacity to deliver on its mandate.1 Programmatic evaluations focus on Bank Group programs and operations that directly benefit its clients, focusing on the World Bank's direct and indirect contributions to achieving the twin goals of ending extreme poverty and boosting shared prosperity. Table A.1 presents the classification of evaluations into the two categories described.

Table A.1. Classification of Evaluations

Corporate Evaluations (n = 8):
» Learning and Results in World Bank Operations, Phase 2 (FY15)
» Assessment of World Bank Group's Self-Evaluation System (FY16)
» P4R: Program for Results: An Early-Stage Assessment of the Process and Effects of a New Lending Instrument (FY16)
» SCD/CPF's Process Evaluation (FY17)
» IFC Client Engagement Model (FY18)
» Engaging Citizens (FY18)
» World Bank Group Convening Power (FY19)
» Knowledge Flow and Coordination (FY19)

Programmatic Evaluations (n = 20):
» Ending Poverty (FY15)
» Financial Inclusion (FY15)
» Electricity Access (FY15)
» Early Childhood Development (FY15)
» Capital Market Development (FY16)
» Competitiveness and Jobs (FY16)
» Higher Education (FY17)
» Shared Prosperity (FY17)
» Rural Nonfarm Economy (FY17)
» Water Supply and Sanitation (FY17)
» Urban Transport Mobile (FY17)
» Data for Development (FY17)
» Clean World for All (FY18)
» Essential Health Care Services (FY18)
» Carbon Finance (FY18)
» Facilitating Trade (FY18)
» Forced Displacement (FY18)
» Fostering Regional Integration (FY19)
» Urban Resilience (FY19)
» Creating Markets (FY19)

Source: Independent Evaluation Group.
Note: The table is based on the set of evaluations completed between FY15 and FY19. Two evaluations were excluded because no final report was available in FY19 (one on public finance and one on subnational governments). This table provides the evaluation topic; for the full title and complete information, see the reference list of the main report. FY = fiscal year.

Next, an inventory of methodological approaches was made for the evaluations identified above, mapping the various approaches proposed and applied in each report and its respective Approach Paper. The inventory was used to classify all evaluations into two groups: studies largely relying on standard evaluation methodologies and those employing broadened evaluation methods (that is, where a broader set of methods or designs significantly determined the collection and analysis of data underpinning evaluation findings).2 This classification resulted in a 2×2 matrix, dividing the evaluations by type and use of methods. Based on this, a random sample was drawn from each of the cells (one each from the cells containing corporate evaluations and three each from the cells containing programmatic evaluations). Samples were drawn in proportion to the distribution of evaluations relative to the total universe assessed.

The approach outlined above provided two key advantages for analysis. First, stratification between standard and broadened evaluation methodologies allowed for the examination of a wider range of evaluations, optimizing the meta-evaluation's potential for generating lessons on the enhanced use of methods. Second, random selection within the defined strata reduced the risk of "cherry-picking" based on a priori biases, generating a more objective assessment of evaluations. A schematic sketch of this stratified draw follows the notes below.

1. Such evaluations can relate either to the World Bank Group as a whole or to its underlying institutions.

2. As noted in appendix E, standard evaluation methodologies encompass the use of the following methods and designs: portfolio review and analysis (delimitation, description, content analysis); case study analysis (interviews, desk reviews, and a combination of the other methods listed here); desk reviews of internal documents (strategies, reports, and so on); structured literature reviews of external literature (academic and "grey" policy literature); the integration of an overarching conceptual framework or causal theory (including theories of change and intervention logics) as a basis for data collection and analysis; semistructured interviews; surveys; focus groups; descriptive and inferential statistical analysis (univariate, bivariate, or multivariate regressions and quasi-experimental econometric methods); qualitative content analysis of interviews and documents using CAQDAS (for example, NVivo); and narrative synthesis of information from different sources. Evaluations relying on a broader evaluation methodology encompass the use of the following methods and designs: social network analysis, Delphi panels, theory-driven ("realist") evaluation, evidence gap maps, geospatial analysis of (satellite) imagery data or existing geotagged data, machine learning–based information extraction and classification, within-case causal analysis, process tracing, cross-case causal analysis (qualitative comparative analysis and pattern matching), social media analysis, and advanced multivariate statistical techniques (beyond regularly applied regression designs).
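The sketch below illustrates the stratified draw described in this appendix. The frame, column names, and method labels are invented for illustration; only the cell sizes (one draw per corporate cell, three per programmatic cell, yielding eight evaluations) follow the procedure described above.

```python
import pandas as pd

# Hypothetical universe: one row per evaluation, with its type (corporate or
# programmatic) and methods classification (standard or broadened).
universe = pd.DataFrame({
    "evaluation": [f"eval_{i}" for i in range(28)],
    "type": ["corporate"] * 8 + ["programmatic"] * 20,
    "methods": (["standard", "broadened"] * 14),
})

# One draw per corporate cell, three per programmatic cell (eight in total)
draws = {"corporate": 1, "programmatic": 3}
sample = (
    universe.groupby(["type", "methods"], group_keys=False)
    .apply(lambda g: g.sample(n=draws[g.name[0]], random_state=1))
)
print(sample)
```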
Appendix B. List of Interviewees

Leonardo Bravo
Soniya Carvalho
April Connelly
Hiroyuki Hatashima
Ramachandra Jammi
Lauren Kelly
Raghavan Narayanan
Maria Elena Pinglo
Estelle Raimondo
Bekele Shiferaw
Andrew Stone
Maria De Las Mercedes Vellez
Stephan Wegner

Interviewers: Frans Leeuw and Julian Gayfer

Appendix C. Assessment Framework for the IEG Meta-Evaluation

Table C.1. Assessment Framework

Scope
Review (sample):
» Is the context and rationale of the evaluation adequately described?
» Are the evaluation goals adequately formulated?
» Are the evaluation questions adequately formulated (also in relation to each other)?
» Are the evaluation questions adequately linked to the evaluation goals?
» Have the scope and delimitation of the evaluation been adequately described?
» Has attention been paid to the complexity of the evaluand? Has complexity been described, and how?
Inventory (universe):
» Has explicit attention been paid to the context and rationale of the evaluation, evaluation goals, and evaluation questions?

Reliability (concerned with the transparency and clarity in describing the use of methods and data in view of the potential replicability of the evaluation)
Review (sample):
» Is the methodology of the evaluation adequately described, including the design matrix: theory of change (theory of action/conceptual framework); portfolio identification and analysis; quality assurance principles in coding and synthesis; sampling and selection considerations; data collection methods and sources of data; data analysis methods; and triangulation and synthesis of findings, including how (different) findings coming from different methods/designs have been integrated to reach (general) conclusions?
» Are the limitations of the evaluation adequately described (resulting from limitations in scope, methods/data, validity of findings)?
Inventory (universe):
» Is the discussion of the methodology comprehensive? Are any of the key elements missing (based on checklist/existing guidance)?

Construct validity (concerned with how to ensure that the variables and their relationships that are measured adequately represent the underlying realities of interventions and their contexts)
Review (sample):
» Has the evaluation adequately defined key concepts?
» Has the evaluation adequately operationalized key concepts into measurable attributes?
» Have relationships between the concepts/variables been adequately articulated (theory of action, theory of change, and/or conceptual framework)?
» Has the evaluation made adequate use of external existing literature? Have principles of structured literature review been adequately applied?
» If there was an intention to do a theory-driven evaluation, how has that been done (for example, was attention paid to the articulation of mechanisms, contexts, and outcomes)?

Internal validity (concerned with how to establish a causal relationship between intervention outputs and processes of change leading/contributing to outcomes and impacts)
Review (sample):
» Has there been an explicit discussion on how to deal with the issue of causality/attribution or contribution in the evaluation?
» Are causal questions adequately addressed through the use of causal methods/designs?
» Has adequate attention been paid to unintended effects?
» Are there any indications of internal validity concerns affecting the validity of findings?

External validity (concerned with the extent to which one can generalize findings to other interventions, regions, time periods, target groups, and so on)
Review (sample):
» Are the potential and the limitations for the generalizability of findings adequately described?
» Has the report paid adequate attention to population validity (the ability to generalize the study results to individuals or target groups, organizations, or regions not included in the study)?
» Has the report paid adequate attention to ecological validity (the ability to generalize the results of a study across settings)?
» Has the report paid adequate attention to temporal validity (the extent to which the study results can be generalized across time)?
» Are there any indications of external validity concerns affecting the validity of findings?

Data analysis validity (concerned with how to ensure that the data collected and analyzed are reliable and the methods are used correctly [for example, statistical inference])
Review (sample):
» Has the evaluation paid attention to the risks of bias resulting from unreliable data and the incorrect use of methods?
» Has the evaluation indicated ways to address potential risks of bias resulting from the above?
» Are there any indications of data analysis validity concerns affecting the validity of findings?

Consistency (concerned with the logical flow between evaluation rationale, questions, design and methods choice, actual data collection and analysis, findings, and recommendations)
Review (sample):
» Are the methods and data sources logically linked to the evaluation questions?
» Have the methods that are reported as being applied indeed been applied?
» Do the findings logically relate to the underlying data and methods used?
» Do the findings respond to the original evaluation questions?
» Do the recommendations logically flow from the findings?
» If there was an intention to link macro (that is, societal) developments/processes to meso (that is, organizational) and micro (individuals/beneficiaries) levels, how has this layering taken place and with what (kind of) results?

Broadening the use of methods
Review (sample):
» In what ways have "nonstandard" methods helped enhance the depth or breadth of evaluative analysis?
» Have assumptions underlying the use of approaches working with big data/machine learning been articulated?
Inventory (universe):
» What are the main methods applied by the evaluation? To what extent, based on a classification of methods, does the evaluation broaden the use of methods beyond "standard" methods applied throughout IEG evaluations?

Source: Independent Evaluation Group.

Appendix D. Tabulated Scores of Reports and Approach Papers

Table D.1. Approach Paper and Evaluation Report Scores
Scores are listed in the order: scope and focus; reliability; construct validity; internal validity; external validity; data analysis validity; consistency. In the source table, consistency scores appear only for evaluation reports.

» Urban Transport (Approach Paper): Adequate, Partial, Partial, Inadequate, Partial, Adequate
» Urban Transport (Evaluation report): Partial, Partial, Partial, Inadequate, Inadequate, Inadequate, Partial
» Health Services (Approach Paper): Adequate, Adequate, Adequate, Partial, Partial, Partial
» Health Services (Evaluation report): Adequate, Adequate, Adequate, Partial, Partial, Partial, Adequate
» Client Engagement (Approach Paper): Adequate, Partial, Partial, Adequate, Inadequate, Adequate
» Client Engagement (Evaluation report): Adequate, Adequate, Adequate, Partial, Inadequate, Partial, Adequate
» Carbon Finance (Approach Paper): Adequate, Adequate, Adequate, Adequate, Partial, Adequate
» Carbon Finance (Evaluation report): Adequate, Adequate, Adequate, Adequate, Partial, Partial, Adequate
» Learning and Results (Approach Paper): Adequate, Partial, Partial, Inadequate, Inadequate, Partial
» Learning and Results (Evaluation report): Adequate, Partial, Adequate, Partial, Adequate, Partial, Partial
» Electricity Access (Approach Paper): Adequate, Inadequate, Partial, Partial, Inadequate, Partial
» Electricity Access (Evaluation report): Adequate, Partial, Partial, Partial, Partial, Partial, Partial
» Higher Education (Approach Paper): Adequate, Partial, Partial, Inadequate, Partial, Inadequate
» Higher Education (Evaluation report): Adequate, Inadequate, Partial, Adequate, Partial, Adequate, Adequate
» Rural Nonfarm (Approach Paper): Adequate, Partial, Adequate, Partial, Partial, Inadequate
» Rural Nonfarm (Evaluation report): Partial, Partial, Partial, Partial, Adequate, Partial, Partial

Source: Independent Evaluation Group.

Appendix E. Inventory of Methods Used in Evaluations and Approach Papers

Bigram Analysis

Figure E.1 shows output from a preliminary bigram analysis of the 28 evaluation reports and Approach Papers used in the meta-evaluation of IEG evaluations.1 As can be seen, the automated analysis provides certain preliminary insights on the prevalence of methods in the reports and Approach Papers but requires manual refinement to generate a representative image of the methods used therein.

Figure E.1. Bigram Analysis
Source: Independent Evaluation Group.

Inventory of Methods

In figure E.2, projects are categorized by year, with the matrix showing the use of both conventional and innovative methods for each. Conventional methods are marked in blue, while innovative methods (content analysis, qualitative comparative analysis [QCA]) are in orange.

Figure E.2. Methods Referenced in Approach Papers
Source: Independent Evaluation Group.

Figure E.3 shows the methods that were ultimately used in the evaluation reports. Those marked in navy are conventional methods, while the ones in orange (content analysis, QCA) are innovative methods. Note that "Content Analysis" here includes any methods involving machine learning applications or automated content analysis, including text mining and computer-assisted classification and parsing. "Network Analysis" includes methods related to social network analysis, social media analysis, organizational network analysis, or network modeling of any kind. "Geospatial Analysis" includes the use of geographic information systems data, satellite imagery, or other geospatial methods for data collection.

Figure E.3. Methods Referenced in Evaluation Reports
Source: Independent Evaluation Group.

Operationalization and Classification of Evaluation Methods

Some of the categories above were expanded or compressed to provide a useful heuristic of the various methods used in the reports and Approach Papers. References to portfolio review and analysis were condensed under "portfolio review": this category captures the delimitation, description, and analysis of the project portfolio relative to the evaluation question. The category does not account for automated versus manual processes, which are disaggregated in the innovative methods section. "Desk review" refers to the review of World Bank internal documents (strategies, reports, and so on) in the evaluation. All conventional methods used in the Approach Papers and evaluation reports were tallied in the inventory.
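A minimal sketch of such a tally is shown below, assuming a binary coding of methods per document. The evaluations, method labels, and codings are invented for illustration.

```python
import pandas as pd

# Invented binary coding: rows are evaluations, columns are methods,
# 1 = method referenced in the document, 0 = not referenced.
ap = pd.DataFrame(
    {"case study": [1, 1], "survey": [1, 0], "QCA": [0, 1]},
    index=["eval_A", "eval_B"],
)
report = pd.DataFrame(
    {"case study": [1, 1], "survey": [1, 1], "QCA": [0, 0]},
    index=["eval_A", "eval_B"],
)

# Tally methods per document and compare Approach Paper against final report
tally = pd.DataFrame({
    "methods_in_AP": ap.sum(axis=1),
    "methods_in_report": report.sum(axis=1),
})
tally["difference"] = tally["methods_in_report"] - tally["methods_in_AP"]
print(tally)  # positive difference = overdelivery relative to the Approach Paper
```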
The breakdown also provides a sense of which methods were over- or underdelivered between the Approach Paper proposals and the final reports. Figures E.4 and E.5 break down methods by the type of report (programmatic versus corporate).

Figure E.4. Methods Referenced in Corporate and Programmatic Approach Papers
Source: Independent Evaluation Group.

Figure E.5. Methods Referenced in Corporate and Programmatic Evaluation Reports
Source: Independent Evaluation Group.

Correlation Analysis

After coding the prevalence of conventional and innovative methods in the sample of Approach Papers and evaluation reports, these data were converted into a binary matrix and used to assess the correlation between the methods indicated in the Approach Papers and those actually referenced in the evaluation reports. This was done to generate a broad sense of how faithfully the methodological approaches proposed in the first stages of the evaluation were implemented in the final result. The procedure is illustrated graphically for the Approach Paper–evaluation report pairings with the highest and lowest methods correlations in figure E.6.

Figure E.6. Comparison of Methods between Approach Papers and Evaluation Reports
Source: Independent Evaluation Group.
Note: Light blue represents conventional methods used in Approach Papers; dark blue, conventional methods used in evaluations; light orange, innovative methods used in Approach Papers; dark orange, innovative methods used in evaluations.

In each pairing, the top row shows the methods proposed in the Approach Paper, and the bottom row those ultimately delivered in the evaluation report; pairings with greater overlap in methods between the two stages thus have higher correlations. Note that the correlations do not take into account how many methods were proposed, nor do they assess whether innovative methods were used. As can be seen, the competitiveness and jobs evaluation uses no innovative methods but shows a higher Approach Paper–evaluation report correlation than the engaging citizens evaluation.

Correlation coefficients for all of the reports are shown in figure E.7, rank-ordered by the degree of overlap between Approach Papers and evaluation reports; a sketch of the underlying computation follows the figure. There is no coefficient for the ending poverty evaluation because no Approach Paper was provided to serve as a point of reference.

Figure E.7. Correlation of Methods between Approach Papers and Evaluation Reports
Source: Independent Evaluation Group.
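The report does not name the specific coefficient; a natural implementation, shown below, is a Pearson correlation over the two binary method vectors (equivalently, the phi coefficient). The vectors are illustrative, not the actual coding.

```python
# Sketch: correlation between the binary method vectors of an Approach
# Paper and its evaluation report. Vectors are illustrative only;
# 1 = method referenced, 0 = not referenced.
import numpy as np

approach_paper = np.array([1, 1, 1, 0, 1, 0, 1])  # methods proposed
evaluation_rpt = np.array([1, 1, 0, 1, 1, 0, 1])  # methods delivered

# Pearson correlation of two 0/1 vectors equals the phi coefficient.
phi = np.corrcoef(approach_paper, evaluation_rpt)[0, 1]

# The coefficient hides direction, so also count the two kinds of
# mismatch discussed next: proposed-but-not-delivered (overstatement)
# and delivered-but-not-proposed (understatement).
overstated = int(((approach_paper == 1) & (evaluation_rpt == 0)).sum())
understated = int(((approach_paper == 0) & (evaluation_rpt == 1)).sum())
print(f"phi = {phi:.2f}; proposed only = {overstated}; added = {understated}")
```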
While the correlations provide a useful quantitative metric for assessing the methodological differences between Approach Papers and evaluation reports, they fail to account for an important distinction that can influence the degree of overlap. Low correlations can be attributed to two factors. The first involves an overstatement of methodological diversity, representing cases in which Approach Papers cite more methods than are ultimately delivered in the evaluation reports. The second involves an understatement of methods, in which methodological approaches that were not proposed in the Approach Papers are deployed in the final evaluation. Examples of such over- and underdelivery are illustrated in figure E.8.

The three evaluations shown in figure E.8 have roughly comparable correlation coefficients. However, the urban resilience evaluation underdelivered on methods, listing a number of methodological approaches that were ultimately not featured in the final evaluation report. By contrast, Behind the Mirror overdelivered on methods, using a number of approaches that were not listed in the initial proposal. Both have relatively low correlations but represent different issues relative to methodological diversity. To better appraise this issue, figure E.9 provides a tally of the number of methods used in the final evaluation reports, disaggregated into conventional and innovative methods.

Figure E.8. Methodological Under- and Overdelivery
Source: Independent Evaluation Group.
Note: Light blue represents conventional methods used in Approach Papers; dark blue, conventional methods used in evaluations; light orange, innovative methods used in Approach Papers; dark orange, innovative methods used in evaluations.

Figure E.9. Tally of Methods Used in Evaluation Reports
Source: Independent Evaluation Group.

As can be seen, the majority of evaluation reports overdelivered on methods relative to what was originally proposed in their respective Approach Papers, and those that underdelivered did so with a relatively small shortfall from the number of approaches proposed at the Approach Paper phase. Once again, ending poverty (far right) was omitted from the analysis because an Approach Paper was not provided for it. Taken alongside figure E.7, the two figures provide a useful appraisal of the methodological diversity of the sample of evaluation reports assessed. Moreover, they suggest that methodological diversity evolves as a function of evaluation challenges, with additional approaches subsequently added to address challenges related to the appraisal of the evaluand.

Based on the coding of methods shown above, the Approach Papers and evaluation reports were categorized into a division matrix. Note that slight differences in categorization stem from both the differences in proposed methods between Approach Papers and evaluation reports and the omission of the Approach Paper for the ending poverty evaluation. The division helped categorize reports by type, as well as by the diversity of methods used. These distinctions were used in the stratified random sampling procedure employed in selecting evaluations for in-depth review, sketched below.
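The actual draw is documented in appendix A; purely to illustrate the mechanics, the sketch below groups reports by stratum tags (report type crossed with methods diversity) and samples within each stratum. The tags, report labels, and per-stratum count are hypothetical.

```python
# Sketch of a stratified random draw over a division matrix. The
# stratum tags, report labels, and per-stratum count are hypothetical;
# they only illustrate the mechanics of the procedure.
import random

reports = [
    {"name": "Report A", "type": "programmatic", "diversity": "high"},
    {"name": "Report B", "type": "programmatic", "diversity": "low"},
    {"name": "Report C", "type": "corporate", "diversity": "high"},
    {"name": "Report D", "type": "corporate", "diversity": "low"},
    {"name": "Report E", "type": "programmatic", "diversity": "high"},
    # ... remaining reports in the universe
]

def stratified_sample(items, key, n_per_stratum, seed=0):
    """Draw up to n_per_stratum items at random from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in items:
        strata.setdefault(key(item), []).append(item)
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return picked

sample = stratified_sample(reports, key=lambda r: (r["type"], r["diversity"]),
                           n_per_stratum=1)
print([r["name"] for r in sample])
```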
Discussion of Special Issues

The inventory below examines the degree to which issues related to transparency, confidentiality or privacy, ethical considerations, and gender dynamics were incorporated in the sample of evaluations and Approach Papers. Figures E.10 and E.11 show the prevalence of these considerations across the sample of 28 project documents, for Approach Papers and evaluation reports, respectively.¹

Figure E.10. References to Special Issues in Approach Papers
Source: Independent Evaluation Group.

Figure E.11. References to Special Issues in Evaluation Reports
Source: Independent Evaluation Group.

References to ethical issues are quite rare in both Approach Papers and evaluation reports: of the 28 projects assessed in the sample, there were only two references to ethical issues in evaluation reports and none in the Approach Papers. Likewise, issues of privacy or confidentiality featured in only a minority of the evaluation reports (8 of 28). By contrast, nearly all of the reports included references to transparency and gender: 21 of 28 evaluation reports referenced the former, and 22 of 28 referenced the latter. For both transparency and gender, the final evaluation reports featured more references than the corresponding Approach Papers.

Figure E.12 breaks these patterns down further by year and subject area. The graphs show the proportion of Approach Papers and evaluation reports that reference issues of gender, ethics, confidentiality, and transparency in each year. For example, 14 percent of Approach Papers from 2018, as compared with 29 percent of the final evaluation reports from that year, referenced privacy or confidentiality concerns.

Figure E.12. Breakdown of Special Issues by Year
Source: Independent Evaluation Group.

For nearly every category and year, the evaluation reports overperformed relative to the coverage of special issues in the corresponding Approach Papers. Looking at temporal patterns, the coverage of both transparency and gender issues appears to have declined slightly over the period explored. However, this may simply be a feature of the limited sample and might not be indicative of a broader pattern within the data.

Assessment of Methodological Appendixes

An inventory of methodological practices was completed for the full sample of evaluation reports appraised (N = 28). The inventory categorized compliance along seven dimensions, assessing the presence and quality of various facets within the supplemental appendixes. The following attributes were used as a coding scheme to generate the inventory; a scoring sketch follows the list.

1. Does the evaluation report provide a dedicated methodological appendix in which questions of research design and implementation are fully elaborated?
2. Is there any discussion of the sample of projects used in the evaluation? Does the report discuss the sampling criteria used to select projects for inclusion in the analysis?
3. Does the discussion make an explicit link to the evaluation question(s) or evaluand(s)? Are these actively linked to the approaches and methods subsequently used?
4. Is there any discussion of causal pathways or a framework for causal inference within the methodological appendix? Does the appendix incorporate such discussions into the research design? Alternatively, is there any attempt to discuss the implausibility of causal inference relative to the evaluation question(s)?
5. Does the appendix discuss the method(s) of data collection or provide information on any guidelines used in the operationalization of data?
6. Is there any discussion of the limitations (methodological or otherwise) of the evaluation, the methodological design, and/or the findings?
7. Is there any reference to hypotheses generated and tested?
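As an illustration of how a partial grade can enter an aggregate score, the sketch below credits 1 point for a full treatment of an attribute, 0.5 for a partial one, and 0 for an absent one, then returns the share of the seven attributes covered. The attribute keys and the example coding are hypothetical.

```python
# Sketch: scoring a methodological appendix against the seven
# attributes above. Yes = 1, Partial = 0.5, No = 0; the example
# coding is hypothetical, not an actual report's grades.
ATTRIBUTES = [
    "dedicated_appendix", "sampling_discussed", "linked_to_questions",
    "causal_framework", "data_collection", "limitations", "hypotheses",
]

def attribute_share(grades):
    """Return the share of the seven attributes covered by a report."""
    points = {"yes": 1.0, "partial": 0.5, "no": 0.0}
    return sum(points[grades[a]] for a in ATTRIBUTES) / len(ATTRIBUTES)

example = {
    "dedicated_appendix": "yes", "sampling_discussed": "yes",
    "linked_to_questions": "yes", "causal_framework": "partial",
    "data_collection": "yes", "limitations": "partial", "hypotheses": "no",
}
print(f"share of attributes covered: {attribute_share(example):.0%}")
```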
The methodological appendixes were graded according to the presence or absence of these features. Every evaluation had a supplementary methodological appendix, and a majority addressed all of the issues raised in the coding scheme above. Where an attribute was partially discussed or only referred to cursorily, a partial grade was assigned. Output from the inventory is summarized in figure E.13.

Figure E.13. Inventory of Methodological Appendixes
Source: Independent Evaluation Group.

Figure E.14 provides an additional breakdown of these data, sorting evaluations by the share of the seven attributes each report addresses. Several evaluations address all or nearly all of these questions in the supplemental appendix. In particular, both the facilitating trade and shared prosperity evaluations cover all of the aspects listed above: they provide an appendix with a discussion of the sampling, causal, and data collection strategies employed, linking these to the evaluation questions, establishing testable hypotheses, and discussing potential limitations. On the other hand, the competitiveness and jobs evaluation provides only a partial discussion of the sampling strategy and potential limitations. As can be seen, the vast majority of evaluations performed rather well in this regard.

Figure E.14. Total Share of Attributes
Source: Independent Evaluation Group.

Furthermore, nearly 90 percent of the appendixes discussed the sampling strategy used in the evaluation, as well as the limitations of the methodological approach employed. About 85 percent linked the methodological strategy to specific evaluation questions, and 78 percent discussed the data collection strategy used. Only about 65 percent of evaluations incorporated the issue of causal identification into the analysis, though coverage of this issue increased over time. Lastly, a minority (29 percent) of reports used a hypothesis-testing framework in their methodological appendixes.

Appendix F. Formulation and Categorization of Evaluation Questions Referenced in the Sample of Eight IEG Evaluations

Urban Transport Evaluation

The overarching evaluation question is a combination of two questions: To what extent has the World Bank Group supported sustainable urban transport development in client countries that contributed to cities' efficiency and economic growth, environmental quality, the welfare of the poor and vulnerable groups, and road/traffic safety?

The subordinate questions addressed several topics:

Relevance

To what extent has the World Bank Group's support for urban transport been relevant to client countries (and cities) and the priority needs of their poor, female, and other vulnerable populations, as well as to local priorities?

Effectiveness (Efficacy)

To what extent has the Bank Group been effective in achieving its objectives (improved accessibility and mobility; environmental sustainability; the welfare of the poor, women, and vulnerable groups; and road/traffic safety) with regard to urban transport development?

Efficiency

To what extent are Bank Group interventions in urban transport efficient from both program and institutional perspectives?
This question aims to elicit the extent to which Bank Group interventions (or the systems they supported) reached beneficiaries at a reasonable cost and were well used and financially viable.

Work Quality

To what extent has the World Bank Group achieved high standards in managing factors within its control and coordinating its work internally and externally? This question focuses on how well the Bank Group designed and supported implementation, executed safeguard policies, and tracked the results of its urban transport portfolio. It also focuses on how well the Bank Group used collaboration, coordination, or complementarities across the Bank Group and with other partners.

Furthermore, two "evaluative lenses" posed other specific evaluation questions:

» To what extent is information on service delivery contained in project appraisal documents?
» How is service delivery described and operationalized in appraisal documents, and what is the quality of this information?

With respect to the second lens, the question was posed as "whether or not projects identified beneficiaries and whether or not diagnostic work was undertaken to learn what factors influence people's current behaviors (for example, service use) and to understand likely barriers to achieving a project's desired outcome."

Carbon Finance Evaluation

The overarching question is a combination of questions: What has been the strategic objective, nature of engagement, and contribution of the World Bank Group in supporting carbon finance (CF)? What lessons can be drawn from this to inform the Bank Group's strategic direction in supporting the next generation of market-based carbon mitigation activities, given its potential comparative advantages?

This was followed by several subordinate questions and corresponding "sub-subquestions":

Subquestion 1: What has been the nature and extent of engagement of World Bank Group support to CF since its inception around 2000?

» What has been the nature and the evolution of the Bank Group's support to carbon finance over time?
» What has been its strategic objective, and to what extent has the support been underpinned by and aligned with relevant Bank Group strategies?

Subquestion 2: What have been the evolving needs and priorities in CF for stakeholders at global and national levels from Kyoto to Paris, and how did the World Bank Group respond to these?

» How have stakeholder needs and priorities at global and national levels evolved over time, and how are they likely to evolve in the near future? How have markets and global regulatory regimes evolved over time?
» How and to what extent did the Bank Group adjust or respond to changes and uncertainties in markets and in the global regulatory regime? How and to what extent has the Bank Group been responsive to the evolving needs and priorities of its clients (funders and countries)?

Subquestion 3: To what extent and in what ways has the World Bank Group contributed to developing and innovating carbon markets and building capacities through its multiple roles and support to CF?

» How effectively has the Bank Group been able to fulfill its role in catalyzing and developing carbon markets and leveraging private investments, innovating CF, building capacity of its clients, and convening thought leadership at the global and national levels?
» What does the existing and new evidence tell us about the effectiveness of the main CF interventions in reducing greenhouse gas emissions and generating co-benefits for sustainable development?

Subquestion 4: To what extent and in what ways does World Bank Group support to CF distinguish itself from support provided by other institutional actors and contribute to its own operations?

» How has the Bank Group positioned itself relative to other major institutional actors in its CF support?
» How and to what extent has the Bank Group been able to leverage CF internally to augment its operational core business and scale up results (for example, through "blending" or more coherent programmatic integration of CF with other Bank Group operations)?

Learning and Results Evaluation

The evaluation addresses the following overarching combination of questions: How well has the World Bank Group learned in its lending operations? What is the scope for improving how it generates, accesses, and uses learning and knowledge in these operations?¹

Electricity Access Evaluation

The overarching question is again a combination of questions: To what extent has the World Bank Group been effective in the past and, going forward, how well is it equipped to put its country clients on track to achieve universal access to electricity that is adequate, affordable, and of the required quality and reliability?

The following question is also formulated under "Global Programs' Contribution to Knowledge on Electricity Access": "To what extent have the four programs contributed to knowledge on energy access?"²

In the systematic review, the following evaluation question is formulated: "What is the impact of electricity access on health, education, and welfare outcomes in low- and middle-income countries?"

Higher Education Evaluation

The evaluation's overarching question is: How has the World Bank Group's support to higher education contributed to its twin goals of poverty reduction and shared prosperity?

To address this subject, the evaluation is divided into three questions and 13 subquestions.

Question 1: Is the World Bank Group's support for higher education consistent and well articulated?

1. How has the Bank Group incorporated higher education in its strategic documents?
2. How does it coordinate its support for higher education internally within the Bank Group?
3. How does it coordinate its support for higher education with external development partners and nongovernment actors?
4. How does it conceptualize higher education and incorporate local context into the design of its operations?

Question 2: How has World Bank Group support contributed to higher education systems?
1. How has the Bank Group contributed to changes in the financial sustainability and management of higher education systems?
2. How has its support strengthened the connection between higher education and both the public and private sectors?
3. How has it supported regulation and quality assurance in public and private universities?
4. How has its support contributed to internal efficiency in higher education?

Question 3: How has the World Bank Group's support for higher education contributed to social and economic outcomes?

1. How has Bank Group support improved access and equity for lower-income households?
2. How has its support addressed gender and other traditionally excluded groups in higher education?
3. How has its support contributed to external efficiency through developing skills and improving the employability of graduates?
4. How has its support contributed to external efficiency through private sector development and increased industry competitiveness?
5. How has its support contributed to the quality of research and its relevance to local development challenges?

Health Services Evaluation

The overarching question of the evaluation is again combined: What are the roles and contributions of the World Bank Group in support of health services, and what can be done to enhance them?

This is divided into four subquestions:

Subquestion 1: What has been the nature, extent, and evolution of support to health services in the past 10 years?
Subquestion 2: How relevant has Bank Group support to health services been to the main health needs and priorities?
Subquestion 3: To what extent has Bank Group support effectively contributed to the achievement of its goals?
Subquestion 4: What has been the role of the Bank Group in global and country-level partnerships supporting health services?

In the section on the "Analysis of Service Delivery and Behavior Change," an additional question is posed: "To what extent is information on behavior change and service delivery presented and operationalized in project appraisal documents (and completion reports)?"

Rural Nonfarm Economy Evaluation

The overarching question is a combination of questions: How successfully has the World Bank Group contributed to the creation of sustainable income-generating opportunities for the rural poor within the rural nonfarm economy (RNFE), and what attributable effects have Bank Group efforts had on reducing poverty?

To answer this question, specific subquestions regarding the relevance, effectiveness, efficiency, and sustainability of Bank Group interventions at all levels (strategy, project, portfolio, program, country, and aggregate) were posed.

Relevance

Are Bank Group interventions relevantly responding to client needs to help alleviate poverty by developing the RNFE in a sustainable and inclusive way? Is the Bank Group strategically collaborating with partners to help develop the RNFE for the benefit of the poor?

» How relevantly are Bank Group interventions diagnosing and addressing the supply- and demand-side constraints related to the development of a sustainable, profitable, and inclusive (pro-poor) RNFE?
» At the global and country levels, how is the Bank Group positioning itself strategically? At the country level, how relevant are project designs to country contexts and national poverty reduction planning needs with regard to the development of the RNFE?
» At the household level (project design, targeting, measurement), how relevantly is the Bank Group addressing the differentiated needs of the marginalized, women, youth, and other vulnerable groups?

Effectiveness

How effectively have Bank Group interventions contributed to the development of a sustainable and inclusive RNFE? How have these efforts contributed to alleviating rural poverty?

» How effectively has the Bank Group supported employment creation, increased incomes, and enhanced welfare for the poor within the RNFE?
» How has this assistance been targeted toward, and how has it impacted, the marginalized, women, youth, and other vulnerable groups?

Efficiency

How efficiently have the World Bank Group agencies worked together to help develop a sustainable and inclusive RNFE?

Environmental and Social Sustainability

Is the World Bank Group's support for the RNFE environmentally and socially sustainable?

IFC Client Engagement Model Evaluation

The evaluation poses the following questions:

» Question 1: What is the nature and extent of implementation of IFC's approaches to strategic client engagement from FY04 to FY16?
» Question 2: What are the effects of IFC's approaches to strategic client engagement for its strategic clients?
» Question 3: What are the effects of IFC's approaches to strategic client engagement on IFC?
» Question 4: What are the effects of IFC's approaches to strategic client engagement on the host developing countries?
» Question 5: What are the main factors explaining the differences in effects?

Table F.1 categorizes the evaluation questions in the stratified random sample of evaluations examined in the in-depth review.

Table F.1. Evaluation Questions Categorized

| Evaluation Report | Overarching Question(s) | Type of Question(s) | Number of Questions |
| --- | --- | --- | --- |
| Urban Transport | One overarching question: To what extent has the World Bank Group supported sustainable urban transport development in client countries that contributed to cities' efficiency and economic growth, environmental quality, the welfare of the poor and vulnerable groups, and road/traffic safety? | This question is evaluative (ex post), posed in two parts. Part 1: To what extent has the Bank Group supported X? Part 2: To what extent has support of X contributed to Y? | One overarching question, 7 subquestions, 6 "to what extent" questions. Total: 7 questions |
| Learning and Results | Two overarching questions: How well has the Bank Group learned in its lending operations? What is the scope for improving how it generates, accesses, and uses learning and knowledge in these operations? | The first question is evaluative, ex post. The second question is exploratory and design oriented. | Two overarching questions and one subquestion ("Do Bank Group projects that obtain better results do so, at least in part, because of more learning taking place during the project cycle?"). Total: 2 questions |
| Carbon Finance | A combination of two overarching questions: What has been the strategic objective, nature of engagement, and contribution of the World Bank Group in supporting carbon finance (CF)? What lessons can be drawn from this to inform the Bank Group's strategic direction in supporting the next generation of market-based carbon mitigation activities, given its potential comparative advantages? | The first question is exploratory (what has been…) and the second is evaluative (ex post). | Two overarching questions, four subquestions themselves broken into 14 sub-subquestions. Total: 20 questions |
| Electricity Access | Two overarching questions: To what extent has the Bank Group been effective in the past? How well is the Bank Group equipped to put its country clients on track to achieve universal access to electricity that is adequate, affordable, and of the required quality and reliability? | The first question is evaluative, ex post (to what extent…). The second question is design oriented (how well equipped…). | One overarching "to what extent" question and one on "how well equipped to…," with two subquestions, of which one is "to what extent" and one is "what is the impact of electricity access on health, education, and welfare outcomes in low- and middle-income countries?" Total: 4 questions |
| Higher Education | One overarching question: How has the Bank Group's support to higher education contributed to its twin goals of poverty reduction and shared prosperity? | Evaluative question, ex post (how well has the Bank Group support contributed, and so on). | One overarching question; 3 subquestions ("Is the Bank Group's support for higher education consistent and well articulated?"; "How has Bank Group support contributed to higher education systems?"; "How has the Bank Group's support for higher education contributed to social and economic outcomes?"); 13 sub-subquestions. Total: 17 questions |
| Health Services | Two overarching questions: What are the roles and contributions of the Bank Group in support of health services, and what can be done to enhance them? | The first is a descriptive question (what are…); the second is design oriented (what can be done to…). | Two overarching questions and 5 subquestions, of which 2 are "to what extent" questions and one is a descriptive ("what is…") question. Total: 7 questions |
| Rural Nonfarm Economy | Two overarching questions: How successfully has the Bank Group contributed to the creation of sustainable income-generating opportunities for the rural poor within the RNFE? What attributable effects have Bank Group efforts had on reducing poverty? | The first question is evaluative, ex post. The second is also evaluative, ex post, but in a sequence (that is, "if the Bank Group contributed to x, what then are the attributable effects of that on reducing poverty?"). | Two overarching questions, 4 subquestions, and 8 sub-subquestions. Total: 14 questions |
| IFC Client Engagement Model | Five questions: What is the nature and extent of implementation of IFC's approaches to strategic client engagement from FY04 to FY16? What are the effects of IFC's approaches to strategic client engagement for its strategic clients? What are the effects on IFC? What are the effects on the host developing countries? What are the main factors explaining the differences in effects? | One descriptive question (what is the nature…?), 3 evaluative, ex post questions (what are the effects…?), and 1 explanatory question (what are the main factors…?). | Total: 5 questions |

Source: Independent Evaluation Group.

Appendix G. Failures When Formulating Evaluation or Research Questions Based on the Literature

Failure 1: Generating ill-formulated and suboptimally formulated research problems

White and Waddington (2012, 361) give an interesting example of this issue: "A good answer needs a good question. The main issue in setting the question is the breadth of the question. We would all like to know the answer to the question 'how do we end global poverty and achieve world peace?', but it is rather too broad for a research project." In line with this, asking the question "what is the situation of cybercrime in France?" is another example of an ill-formulated research problem, because the question attempts to formulate a very broad topic (the "object variable," cybercrime). Specific aspects of cybercrime (the modus operandi or the fields covered), the time period, and the impacted targets (companies, individuals, victims, offenders) are not defined. This failure can be prevented by specifying at least two other variables next to the object variable: the independent and the dependent variable.

Failure 2: Studying erroneous research problems

These are problems that are formulated against a background consisting of at least one incorrect statement. The background "is constituted by the antecedent knowledge and, in particular, by the presuppositions of the problem. The presuppositions of the problem are the statements that are somehow involved but not questioned in the statement of the problem and in the inquiry prompted by it" (Bunge 1997, 194).

Failure 3: Studying research problems lacking clarity

Defining key terms is central to achieving clarity in a research question. However, clarity does not solely concern definitions. At one extreme, scholars like Kane (1984) suggest that all research problems should be posed as a single sentence. However, the German proverb "in der Beschränkung zeigt sich erst der Meister" (mastery shows itself in restraint) is applicable, as the structure of a research problem can indeed be unclear. When a single research problem includes a dozen (or more) subquestions and sub-subquestions without specifying how they relate to each other, this reduces the guidance emanating from the research problem. Such a failure can also occur in the opposite direction. Epstein and Martin (2014, 23) give as an example the question, "what leads people to obey the law?" Though an interesting overarching problem, the question is difficult to answer without subsequent disaggregation into more specific subquestions.
Failure 4: Studying problems characterized by a wrong level of abstraction

Van Thiel (2014, 29) provides two examples of this. The first involves situations in which a researcher formulates a problem of too abstract or general a nature (for example, regarding the impact of key performance indicators on the efficiency of public tasks carried out by municipalities) when in fact the study will be dedicated to only one particular municipality. The other example involves selecting too low a level of abstraction. This takes place when the research problem is basically nothing more than one or two very concrete and direct questions that respondents in a survey have to answer. In this case, a link to a more general (overarching) problem, under which these "respondent questions" reside, is missing. As Yeager (2008, 45) notes, a research problem "is the focal question a research project is intended to answer. It is not a question developed for a survey or an interview protocol."

Failure 5: Forgetting that an (implicit) theory, assumption, or set of assumptions underlies the respective evaluation question(s)

This failure suggests that the implicit theory can and often will guide the ways in which the evaluation question is addressed. When the theory that guides the evaluation is explicitly formulated, this failure can be prevented by explicitly referring to this theory and acknowledging that other theories are possible and relevant, but not "at this time in this evaluation."

Failure 6: Assuming that a "bag of questions" increases the depth, breadth, and width of the evaluation

This failure notes that it is much easier to formulate multiple questions than to systematically investigate them and combine the findings. Often a bag of questions leads to an unconsolidated bag of answers.

1. Note that no Approach Paper was provided for the ending poverty (FY15) evaluation.
2. Note that much of this analysis was ultimately excluded from the meta-evaluation.