Delivery Challenges and Development Effectiveness: Assessing the Determinants of World Bank Project Success

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


Introduction
Scholars and practitioners have recently contributed to a flurry of empirical research analyzing the links between the general characteristics of development projects and their outcomes. Most work on project effectiveness has focused on broad determinants related to the socio-economic attributes of recipient nations, sector-specific obstacles, and the efficiency of project implementation. Generally, these approaches can be grouped into three camps: where the project ran, which sector the project targeted, and who was responsible for its execution. Particular attention has been paid to micro-attributes of project implementation, including the quality of team leadership, stakeholder capacity, bureaucratic challenges, and exogenous shocks (macroeconomic or otherwise). While these efforts have substantially improved our understanding of the specific conditions that impact international lending and technical assistance, the proliferation of competing explanations necessitates a more holistic approach to better understand how the various proposed mechanisms function relative to one another. This paper attempts to address this gap, making use of the wealth of prior research to offer a systematic analysis of the weaknesses that impact the World Bank's performance and hinder the achievement of development objectives.
The analysis takes advantage of a novel dataset examining the challenges that are faced throughout the project life cycle, from preparation to completion. Using data mining techniques and automated content analysis of over 5,000 World Bank project completion reports, we compile a taxonomy of 42 "delivery challenges," mapping problems faced by Task Team Leaders across time, region, and sector. The challenges are clustered by project, stakeholder, and context, with the latter group serving as structural controls that plausibly impact delivery and effectiveness. These data are then analyzed using Bayesian model averaging, addressing variations in parameter estimates induced by variable specification effects. Posterior probabilities for each delivery challenge are conditioned on the overall likelihood of having a substantive effect across all possible model configurations. These are further tested alongside a battery of controls and fixed effects, offering a sense of their overall impact on the aggregate subset of World Bank projects tested. Delivery challenges are assessed with respect to their impact on three outcome measures: the achievement of stated development objectives (outcomes), the World Bank's effectiveness in carrying out the project (performance), and the proactive identification of potential risks to overall project success (supervision).
Analytically, delivery challenges serve as powerful measures, standardizing the set of obstacles encountered throughout the project life cycle and offering a cross-cutting assessment of overall performance. Paired with the model averaging procedure described above, this approach offers two advantages. First, it provides a more granular measure of impediments than previous work, looking at the potential correlates of project success at every node along the development and execution life cycle. This is of value to scholars and practitioners alike, as it offers both a more nuanced exploration of outcomes as well as a diagnostic tool to detect ex-ante the potential challenges that the World Bank may face in preparing and supervising projects. With reference to the latter, the practical utility of the DeCODE framework has already been demonstrated within country-1 and sector-specific challenges.2 This paper contributes to our understanding of the impact these challenges have relative to project outcomes and World Bank performance overall.
Second, the identification of delivery challenges provides a way to capture variation in project performance across a broad set of logged project case histories. By drawing on implementation completion reports, we strive to aggregate and analyze the impact of obstacles that are organically defined by those who supervise disbursement projects in the first place. Put together, this allows for a holistic analysis of success and failure across the project's lifespan. By using a cross-cutting measure of delivery challenges, this paper identifies the most likely issues projects face, as well as the magnitude of effects posed by said challenges. Matching these factors to the sequence in which delivery challenges are likely to be expected, we develop a typology for structured and focused analysis. This is especially helpful in addressing issues related to model selection and diagnosis, as it provides a means of simultaneously assessing the various dimensions along which projects have historically under-performed.
The data and methods proposed above serve to generate standardized coefficients for the most impactful challenges faced in the implementation of World Bank projects. Though some of the measures are comparatively less informative, the examination of the full corpus of delivery challenges allows us to weigh the relative impact of each. The paper is organized as follows. Section 2 offers an overview of the relevant literature on project effectiveness. Section 3 summarizes the conceptual framework, including a description of the delivery challenge taxonomy, the outcome measures tested, and the methodology. Section 4 presents the results of the analysis, Section 5 presents findings on World Bank performance and outcomes, and Section 6 concludes.

Previous Literature
The existing literature examining World Bank project effectiveness is too extensive to be cited in full. Previous studies can generally be divided into three broad categories, examining region- and sector-specific work, the political determinants of project effectiveness, and the internal dynamics of project management influencing performance. Much of the aggregative work focuses on the importance of project design and management, whereas sector- and region-specific work emphasizes the impact of cross-national and industrial variation in determining whether a disbursement project is ultimately successful. Some of the more relevant work relating to these camps will now be examined.
One of the earliest assessments of project effectiveness tested the rates of return from Bank projects at inception as opposed to completion, accounting for the divergences in productivity across a large sample of completed projects. Pohl and Mihaljek (1992) analyzed the World Bank's project evaluation record using a sample of 1,015 projects via a comparison of rates of return at appraisal and completion (typically accounting for a ten-year project lifespan). They found that cost overruns and implementation delays have a small but negative impact on project performance, specifically in agricultural and industrial projects. The paper also identified a discrepancy between expected and actual rates of return, noting that World Bank appraisal estimates were often overly optimistic. Their analysis emphasized the importance of the macroeconomic environment, which they found to be more substantively significant than the other indicators considered and an under-emphasized element in project evaluation.
Similarly, Kilby (2012) uses a measure of economic vulnerability (as measured by the ratio of the recipient country's short-term to long-term debt) to assess project performance. Projects with longer preparation times are expected to perform better, conditional on the degree of economic vulnerability: when project design takes into account the underlying economic conditions of the recipient country, the initiatives tend to perform better.3 These findings support both Kilby (2011) and Dreher et al.'s (2009) work on the political economy of World Bank lending projects, which states that the political motivations of donor countries can lead to insufficiently prepared and underperforming projects. Both papers emphasize the importance of project design and preparation. Kilby (2011) notes that politically motivated projects can sometimes be rushed into implementation, whereas Dreher et al. show that politically motivated projects significantly under-perform in economically vulnerable countries.
Taking a more generalizable approach, Ika, Diallo, and Thuillier (2012) expand the typology for project failure outside project design, highlighting five "Critical Success Factors" (CSFs) encompassing monitoring, coordination, design, training, and the institutional environment. They look beyond the internal dynamics of project design, expanding to include elements of the general economic and social environment in which projects are implemented. Each of the five CSFs is positively related to project success, with project design and monitoring showing strong substantive significance. Moll, Geli, and Saavedra (2015) assess the correlates of success in World Bank development policy lending, showing that individual loan characteristics such as task team leader proficiency, leadership, and project design had a significant effect on improving the chances of project success. Interestingly, Moll and co-authors find that certain sectoral reforms (for instance, energy-sector reforms) tend to reduce the likelihood of success, perhaps due to the intricate political difficulties involved in enacting such reforms. However, the paper concludes by noting that increases in leadership and coordination from the Bank at the project level generally have a positive effect on outcome. In a recent book chapter, Enyinda (2017) assesses the role of quantitative risk analysis on project management, examining the ways in which ex-ante risk analysis can mitigate poor outcome ratings on completion.
Region- and sector-specific works related to project success are too numerous to cite here with any degree of meaningful exhaustiveness. That being said, several works with a specific eye to issues that may be seen as delivery challenges can be highlighted here. Diallo and Thuillier's (2004) examination of success factors in African projects presents an insightful analysis of the challenges posed to project success in Africa. Vawda et al. (2003) test the performance of education projects that have been subjected to economic analysis at the time of appraisal, showing a strong link between the project outcome and cost-benefit analysis. This suggests that project design is a strong indicator of ultimate performance. At a regional level, Blanc et al. (2016) use data from World Bank projects in the East Asia and Pacific regions to model the importance of project supervision, implementation progress, and the achievement of development objectives during project supervision. Similarly, Noorbakhsh and Paloni (2007) provide a novel take on the lessons from structural adjustment programs and the role of selectivity in Africa. Along more prescriptive lines, Ika (2012) offers a broad assessment of the general factors that may be leading to project failure on the continent, concluding with a list of prescriptions for improving under-performing projects.
In more sector-specific assessments of project ratings, Vickland and Nieuwenhuijs (2005) provide an interesting take on the primary factors affecting the modernization of public financial management information systems in Bosnia and Herzegovina, and Chauvet, Collier, and Duponchel (2010) analyze the elements that affect aid project success in post-conflict areas more generally. Finally, several recent works look at the conditions that structure the ultimate effectiveness of environmental (Buntaine and Parks, 2013), water (Kfouri, 2016), and education programs (Lee, 2016) financed by the Bank. Notably, such works only take cursory steps towards creating a cross-cutting and generalizable theory of project failure. Most sector-specific studies take few steps towards assessing cross-cutting cleavages that can either positively or negatively impact a project's overall performance.
Among the various substantive evaluations of World Bank implementation success, the importance of data and monitoring is given particular note, as seen in Golini, Kalchschmidt, and Landoni's (2015) examination of project management practices and Raimondo's (2016) analysis of the role of proper supervision on World Bank project performance. Chauvet, Collier, and Foster (2017) employ a useful principal-agent framework in order to examine the role of delegation and observation of agent efforts on project success. Lastly, recent work has examined new approaches to project management, notably from a knowledge-based approach (Todorovic et al., 2015) and transformational leadership (Aga, Noorderhaven, and Vallejo, 2016; Adams and van Tran, 2017). While focusing on a specific challenge to successful project delivery (in this case, the role of project management and supervision), this literature only takes a somewhat cursory approach to examining the role of leadership relative to other indices that may affect project outcomes in parallel.
A great deal of attention has also been paid to how project management, supervision, and leadership can have positive effects on the outcome of aid disbursement initiatives.4 Key among these works is the highly-cited paper by Denizer, Kaufmann, and Kraay (2013), which uses data from the performance of Task Team Leaders (TTLs) to assess the success rates of Bank projects. Denizer et al.'s paper presents the closest analogue to our study, in that it compares project performance across a variety of theorized delivery challenges, including both micro- and macro-determinants of aid project outcomes. One of the earliest works examining the effects of project supervision is Kilby's (2000) insightful analysis of the marginal rate of return from added supervision, particularly when applied early on in the project life cycle, and particularly so when applied to smaller projects.5 Ika's (2015) survey analysis of World Bank task team leaders similarly engages the effect of supervision, showing that a principal cause of failure in World Bank projects can be traced to a lack of timely supervision. This echoes Kilby's findings, particularly with respect to the cost-benefit analysis of added supervision relative to project performance.
Several insights from literature outside the immediate scope of World Bank project effectiveness may be deemed useful to this analysis. First, a focus on the often-ignored political realities of international aid practices can offer insights into one of the most significant exogenous factors that may affect the ultimate success of disbursements. It is informally understood that aid projects are beholden to political considerations and maneuvering by both donors and recipients. The extent of this focus has been explored by several works in recent literature, which have looked at both the general effects of political dynamics as well as more focused effects (Laws, 2016). For example, Shin et al.'s (2017) examination of implementing partnerships provides a more nuanced view of how different coalitions of implementing partners can lead to variation in project success. However, in keeping with the general theme of the literature explored above, such works also exist in a degree of isolation relative to a more comprehensive catalog of delivery challenges that may affect project outcomes.
In addition to political factors, some findings from the literature on public-sector project management may be deemed useful to the analysis attempted here. Khang and Moe's (2008) use of a project life cycle framework offers a useful analogue to our examination of the sequencing effects that may be anticipated in the delivery challenge taxonomy.6 By parsing delivery challenges according to different moments in the project life cycle, we may begin to develop an informative theoretical framework for why certain challenges may be expected at certain times. Naturally, some of the challenges explored are to be intuitively expected at specific times (for instance, the design-based challenges can be seen as specific to the front-end definition of aid projects). By applying a life cycle framework to World Bank projects, we gain an informative theoretical prior with which we can better understand the root causes of the issues that often present themselves as delivery challenges.
Lastly, recent work on the performance of public sector programs may offer a useful lens through which World Bank project effectiveness may be analyzed. Knut and Volden's (2016) examination of the paradoxes of project governance in Norway sheds light on several under-represented challenges and themes that merit further exploration in the literature. The authors focus on the importance of the initial project concept as a major determinant of project outcome, noting that a "lack of competence among planners, avoidance of hidden agendas during planning, underestimation of costs and overestimation of benefits, unrealistic and inconsistent assumptions, and how to secure essential planning data and adequate contract regimes" play a key role in project outcome (Knut and Volden 2016, p.297). In particular, two of the paradoxes identified by Knut and Volden appear to be relevant to a delivery challenge-based approach to project success. First, fewer resources are used "up front to identify the best conceptual solution (project governance), than to improve tactical performance during implementation (project management)...once decisions are made, essential choices become locked and it is more difficult and expensive to change the overall design" (Ibid, p.301). The relevance of a delivery challenge-based approach to project success appears intuitive: such a format allows scholars and practitioners to focus on the specific factors that inform project success at its conception.
Second, the alternative extreme can also present a challenge to project delivery. Project leaders can become overwhelmed with an abundance of less-than-relevant information. "Adding information, therefore, makes sense - but only to a certain degree. Projects are path-dependent, so adding useful information at the offset is valuable." Once again, the approach advocated here allows us to home in on the specific factors whose inclusion and examination would provide the greatest return in terms of maximizing the likelihood of project success. Addressing the most common and pertinent symptoms also allows for controlling for these effects and focusing on the role of less tangible factors. Thus, we hope that this approach offers a way to benchmark across projects, over both sectoral and regional lines.
Taking into account the wide swath of analysis already conducted in this subject area, we may discern a trend towards greater focus on project-level determinants as a means of assessing the success of World Bank project investments. Previous literature that has taken a broader approach towards project success has touched upon the various individual aspects of what may be dubbed delivery challenges. These include factors related to project design, sequencing, socio-cultural issues, coordination, and a battery of other indicators related to the social, political, economic, geographic, and normative environments in which projects are implemented. However, each has been explored in relative isolation, outside the context of an overarching typology of the various paths to implementation failure.
In this paper, we use practitioner assessments of implementation challenges as a diagnostic tool for assessing the relative impact that a nearly comprehensive set of potential obstacles may have on a project's ultimate outcome and the Bank's overall performance. The inductively generated taxonomy captures the set of issues project leaders identified as having an impact on the overall success of the disbursement. We test these alongside other potentially omitted correlates to gauge their substantive impact relative to one another. Iterated model averaging runs using a variety of different theoretical priors ensure that the standardized coefficients attributed to each challenge remain robust regardless of the impact other potentially exogenous factors may have. By taking a more holistic view, we attempt to examine a wider swath of the potential variation in project outcome, be it related to external conditions, the design of the project itself, or a function of the interaction between Bank staff and local actors. To achieve this end, the dataset of delivery challenges offers the most complete agglomeration of project-level indicators compiled thus far (Ortega Nieto and Agarwal, 2017). By approaching project challenges "from cradle to grave," we offer a more complete view of the range of variation that has impacted project success over the last two decades. The next section will explore in greater depth the conceptual framework underlying our analysis, offering a brief overview of the delivery challenge framework as well as a review of the methodology employed to examine the determinants of project success.

Explanatory Framework: Delivery Challenges
The exploration of past research demonstrated the need for a comprehensive joint analysis of the impact that various implementation issues can have on Bank performance and project outcomes. By taking this approach, we strive to account for possible correlates of project success more thoroughly than would be feasible through a sector- or region-specific examination of project indicators. We achieve this with the data compiled in the delivery challenge framework. The taxonomy provides an overarching examination of critical success factors for World Bank projects. Examining the possible symptoms of failure across three broad phases (planning, implementation, and completion), the more holistic approach employed here allows for a robust analysis of project success while creating parameters that are not bound to sectoral and regional specificities. Thus, the data used offer opportunities for generalization, allowing for the examination of the most critical factors at each phase of the project. This is combined with an exhaustive treatment of the relative impact of each challenge on outcome ratings, overall Bank performance, and quality of supervision. By testing the delivery challenges against pre-existing macro- and micro-determinants of project success, this paper offers a systematic breakdown of the location, frequency, and impact of the most challenging obstacles faced by practitioners throughout the disbursement process.
Delivery challenges are central to understanding the intricate underpinnings of project success. Complex linkages between a diverse series of factors have led to an unfortunate reliance on all-encompassing jargon such as "institutional weakness" or "lack of political will." These heuristics provide a general sense of the underlying difficulties faced, but lack an important element of analytical rigor that would allow them to be compared across different projects and in a variety of different contexts. Simply put, current metrics for "what went wrong" have been defined with insufficient specificity, in a way that does not properly capture the extent of variation across different delivery projects. One issue involves the inherently subjective conceptual framework that underlies many delivery challenges. Guided by both the literature and the practical experiences of project managers, the delivery challenge taxonomy offers a set of cross-comparable definitions based on three insights.
First, a review of the literature showed that similar delivery challenges were often referred to in different ways, preventing comparison across cases and projects. Standardized categories were formed to capture these issues. Second, challenges were often defined in overly broad terms, creating difficulties in establishing a sufficiently precise metric of concepts for analysis. The overarching categories were correspondingly disaggregated to capture as much granularity as possible. Third, consultations with practitioners and experts in different fields across multiple organizations allowed for cross-validation of the proposed delivery challenge taxonomy, ensuring that the categories identified were in line with the tangible challenges that practitioners faced in implementing development projects. These principles played a central role in generating the taxonomy, consisting of 14 categories and 42 nested subcategories. Tables 1 to 3 provide a summary of the challenges as divided into three clusters encompassing project-, stakeholder-, and context-based challenges.
Observations were clustered according to whether the identified challenges were related to internal project dynamics, interactions with relevant stakeholders, or external factors inherent to the implementation environment. Three categories emerged within the first cluster, addressing issues with project design, financing, and monitoring. Project design challenges address issues relating to the scope of objectives, allocation of time and sequencing of deliverables, as well as general targeting problems involving the engagement of stakeholders and beneficiaries. Financing issues relate to problems with procurement, budgeting, auditing, and other financial management issues. Lastly, issues with monitoring and evaluation focus on the absence of adequate indicators and data baselines for tracking development objectives and identifying potential problems as they evolve.
In the stakeholder cluster, challenges were grouped according to problems of coordination, leadership, and organizational capacity. Coordination and engagement challenges assess difficulties related to administrative and bureaucratic hurdles, ambiguous assignment of responsibilities, and insufficient communication between Bank staff and project beneficiaries. Leadership challenges address shifts in government priorities and opposition to proposed interventions by beneficiaries. Lastly, human resource challenges examine the availability (or absence) of requisite skills, staff turnover, and weaknesses in skill transfer.
The third cluster catalogs the various contextual challenges practitioners faced in carrying out their development objectives. The eight categories defined here (legislation, governance, conflict, societal issues, environmental concerns, natural disasters, business environment, and macroeconomic context) appraise the various internal and external factors that impacted practitioners' implementation of the project. Some of the challenges (e.g., those involving natural disasters, economic shocks, or civil unrest) are exogenous to implementation, but their inclusion provides a useful set of contextual controls against which the impact of delivery challenges can be tested. Figure 1 graphs the distribution of all delivery challenges by project sector and Figure 2 shows the average number of challenges per project as disaggregated by region. The heterogeneity among sectors and regions of implementation shown there will be addressed in the analysis.
The delivery challenges themselves were extracted from over 1,500 World Bank Project Implementation Completion Reports (ICRs). Word clouds, n-grams, and manual reviews were used to create a training set for supervised analysis. These were paired with annotated summaries that associated text labels to the different challenges. The combination ensured that no pertinent categories were missed in classification. A training set of over a thousand manually classified project completion reports was prepared, integrating output from the annotated summaries and unsupervised cluster analysis. This was evaluated using expert interviews in order to confirm its validity. After accounting for spelling errors, stemming, and removing stop-words, K-means clustering was employed on all available ICRs, yielding the first version of the dataset. Cluster dendrograms were then used to cross-validate the output against the categories identified in the literature and practitioner interviews. Ultimately, the output categories were shown to correspond rather closely to theoretical concepts reflected in the literature.
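To make the clustering step concrete, the following is a minimal sketch of the kind of pipeline described above (stop-word removal, stemming, and K-means on the resulting text vectors). It is an illustration under stated assumptions, not the procedure actually used to build the dataset: the four text snippets, the stop-word list, the crude suffix stemmer, the TF-IDF weighting, and the farthest-point initialisation are all hypothetical choices made for the example, and spelling correction and the supervised training set are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy snippets standing in for ICR text (not drawn from the dataset).
DOCS = [
    "procurement delays and procurement budget overruns",
    "budget shortfalls delayed procurement of equipment",
    "staff turnover weakened skills and staff capacity",
    "high turnover of trained staff reduced capacity",
]
STOP_WORDS = {"and", "of", "the", "a", "in", "to"}  # assumed, deliberately tiny


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop-words, strip a few suffixes."""
    stems = []
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in STOP_WORDS:
            continue
        for suffix in ("ing", "ed", "s"):  # crude stand-in for a real stemmer
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 4:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


def tfidf_vectors(docs):
    """Return the vocabulary and unit-length TF-IDF vectors, one per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * (math.log(n / df[t]) + 1.0) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means with deterministic farthest-point initialisation (assumed)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


vocab, vectors = tfidf_vectors(DOCS)
labels = kmeans(vectors, k=2)
print(labels)  # cluster index per snippet
```

In this toy setting the two procurement-related snippets and the two staffing-related snippets share no stems across groups, so they separate cleanly; real completion reports would require a far larger vocabulary, a proper stemmer, and validation of the resulting clusters against expert judgment, as described above.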
The resulting taxonomy of delivery challenges serves as a comprehensive diagnostic tool for understanding the Bank's performance as well as the achievement of project objectives. That being said, the taxonomy's prioritization of implementation problems over country- and sector-specific heterogeneities presents a potential issue for calculating unbiased and consistent estimates. As will be shown in Section 4, this is addressed using a battery of control variables designed to capture the impact of unique sectoral, regional, and loan-type effects. In addition to controlling for baseline factors such as gross domestic product, population, financial health, foreign debt, and industrialization, additional contextual covariates were included to capture attributes of project performance that might have otherwise been linked to regime type, urbanization, natural resource rents, former colony status, and the quality of project leadership. However, inclusion of these indicators ultimately had a limited impact on outcomes.
The relative representation of challenges may pose a further obstacle for analysis, as certain issues may arguably face problems of serial under-reporting in project completion reports. Insofar as self-assessments of overall performance can be relied upon to accurately reflect practitioner experiences, the delivery challenge framework can serve as a useful analytical tool for assessing project success and Bank performance. This assumption is in line with previous literature, which similarly relies on such data (Kilby and Michaelowa, 2019). Testing delivery challenges against subsequently assigned evaluation ratings provides an additional benefit, highlighting which challenges actually end up substantively impacting service delivery and performance. This in itself serves a useful purpose, conditioning prior expectations of overall performance against independently assigned posterior evaluations.
Delivery challenges emulate measures such as Country Policy and Institutional Assessment (CPIA) scores in this regard, offering a comprehensive diagnosis of project weaknesses. But where CPIA scores cover 16 criteria over four clusters, delivery challenges disaggregate issue-areas across a more comprehensive set of issues. In addition to covering dimensions that CPIA scores simply do not cover (including contextual determinants, specific project design attributes, and stakeholder effects), the delivery challenge taxonomy offers more granularity than other instruments, something that has been noted in qualitative applications of the data in previous analytic work.7 This allows us to take a holistic view of the factors that impact project outcomes and Bank performance.

Outcome Measures
Following previous literature, we employ the Independent Evaluation Group's (IEG) rating of project outcomes as the main dependent variable in our analysis. Outcomes are benchmarked against the achievement of stated project objectives, standards, and expectations, as laid out in the project design.8 Though the measure is conditioned on the scope of development objectives laid out at the time of approval, it provides several advantages for our analysis. First, IEG's ratings have become a standard measure in a growing body of literature that uses them as a heuristic for project success.9 Second, IEG's operationalization of project outcomes provides a degree of external validity to the general concept under exploration. All projects are subjected to a desk review, and approximately 25% of the projects completed each year undergo a more detailed evaluation. IEG evaluates World Bank projects according to a six-point scale ranging from highly unsatisfactory to highly satisfactory. Though not a perfect measure of project performance, using IEG evaluation scores allows for a consistent analysis of a large sample of projects, offering similar advantages as the delivery challenge taxonomy in terms of both replicability and generalization.
Furthermore, the models are tested against IEG's performance and project supervision variables.The Bank's overall performance is defined as a function of two indicators, measuring the ability to ensure the quality of the project at entry and maintaining quality of supervision during implementation.Analogous to the outcome measure, the performance variable more explicitly focuses on how successful the Bank was in implementing a project from inception to conclusion.Testing the impact of delivery challenges on performance as well as outcomes allows us to correct for potential biases related to a project's operating environment, given that support to low-income and conflictprone states has been known to negatively impact implementation.Finally, assessing the impact of various delivery challenges on the quality of supervision provides an added robustness test, since ratings on this metric can circumvent the potential feedback loop between the assessment of the Bank's performance and the achievement of development objectives.10While IEG's overall outcome measure appraises the extent and efficiency to which major objectives were achieved, the rating of Bank performance focuses more on project quality at entry and the quality of supervision.Instead of only examining the overall outcome of the project, models are tested against performance indicators to assess the differences (if any) between the success of the project and the efficacy of the Bank in implementing it.Following this logic, model runs on supervision ratings are also included to focus on the Bank's efficiency in identifying and resolving obstacles to the achievement of relevant objectives.

Methodology: Bayesian Model Averaging
Our assessment of the determinants of project success has identified a rubric for assessing project performance across a broad set of issue areas that may affect a particular project throughout its lifetime. The resulting theoretical framework raises an important question: given the number of indicators that could potentially have an impact on project outcomes, how can we deal with model uncertainty over such a large set of covariates? In order to address this concern, we employ Bayesian model averaging (BMA) to account for those variables that appear to be relevant to the data generating process across the entire model space. Typically, the approach to model selection involves relying on a small subset or even a single model specification to account for substantive and statistical relationships between a subset of covariates and the outcome. Naturally, this is an inadequate approach, and it becomes more so as the size of the model space increases. In the context of this study, two considerations make model selection pivotal to the analysis conducted. First, the decision to use delivery challenges as an explanatory framework was made to account for as much of the total variation in project outcomes as possible, providing a granular measure of what can "go wrong" throughout implementation. Second, the granularity in explanatory factors calls for a systematic procedure for determining which challenges pose the strongest impact on outcome, ceteris paribus, and the relative effect of each delivery challenge.
The analysis thus demands a method of variable selection that assesses the relative impact of each covariate in a model space composed of the total number of possible variable combinations. We aim to address endogeneity brought about by potential omitted variable bias whilst also accounting for model selection effects that arise as a function of the large number of indicators considered. Bayesian model averaging provides a natural solution, systematically accounting for model uncertainty while still offering a robust model inclusion parameter (Hoeting, 2002). Analysis of project outcomes is inherently complex, demanding the consideration of a broad explanatory framework and multiple modeling decisions, many of which may be subject to arbitrary considerations in a typical frequentist approach (Clyde and George, 2004). Examples include setting limits for p-values, accounting for outlier observations, and modulating deviation using mechanical adjustments such as "robust" standard errors (Clyde, 2003).
An over-reliance on common practices or heuristics can result in the selection of a "best" model based on a set of less-than-robust assumptions, a practice that can be particularly risky when such a large model space is under consideration (Draper, 1994). An understanding of covariate effects (both in terms of substantive and statistical significance) will therefore profit from a methodology that explores the full model space instead of arbitrarily exploring certain segments of it. Such an approach offers the added advantage of accounting for the difficulties of variable selection in those cases where models are sensitive to specification choices. Instead of specifying models according to pre-existing conventions that may systematically over- or under-emphasize coefficients and certainty, Bayesian model averaging establishes a hierarchical model (per George and McCulloch, 1997) that assesses covariate effects relative to all other possible model specifications in the model space.
Model averaging uses each model's posterior probability as its weight, with that probability determined by the model's marginal likelihood relative to the sum of all marginal likelihoods in the model space (Amini and Parmeter, 2011). Quantities of interest are represented as weighted averages of model-specific parameters taken over all tested specifications in which that covariate was included (Chipman et al., 2001). This is a rather intuitive approach to model choice, since the model probabilities are easily interpretable in relation to each variable's posterior probability of inclusion. In addition, where the model space is too large to allow full enumeration of all possible specifications, the method can be extended with stochastic sampling such as the Markov Chain Monte Carlo (MCMC) methods proposed by Green (1995), or else Markov Chain Monte Carlo Model Composition ($MC^3$) proposed by Simmons et al. (2010), which assesses model changes according to a fixed acceptance probability. This approach (technically an ensemble learning method) relies on posterior model probabilities, which are calculated in terms of model priors $p(M_j)$ and marginal likelihoods $p(y \mid M_j, X)$ for all models in the model space, $j \in \{1, 2, \ldots, 2^k\}$ (Hoeting et al., 1999). As mentioned earlier, there exist $2^k$ possible model specifications (including the model with no predictors), which define the model space $\mathcal{M} = \{M_1, M_2, \ldots, M_{2^k}\}$. We specify prior distributions on the model's parameters $\alpha$, $\beta_s$, and $\sigma^2$ for ordinary least squares estimation of coefficients and errors. For any given model $M_s$ in the model space, weighted averages are created from all potential covariate combinations (or else from an MCMC sample of the model space in cases where it is too large for full enumeration), generating a Posterior Model Probability (PMP) based on the marginal likelihood $p(y \mid M_s, X)$. The PMP for our chosen model $M_s$ is therefore conditional on the data $(y, X)$ and proportional to its marginal likelihood such that:

$$p(M_s \mid y, X) = \frac{p(y \mid M_s, X)\, p(M_s)}{\sum_{j=1}^{2^k} p(y \mid M_j, X)\, p(M_j)}$$

Calculation of posterior model probabilities for each specification is facilitated by a hyperparameter based on Zellner's g-prior for each regression coefficient, which accounts for the researcher's level of certainty that the coefficients are zero (Zellner, 1986).11 Using the g-prior, we can calculate a closed-form expression of posterior statistics based on the Normal-conjugate Normal-Gamma framework (representing a lack of prior knowledge). This yields the following posterior marginal likelihood for the hyperparameter g and $k_s$ covariates:

$$p(y \mid M_s, X, g) \propto \left[(y - \bar{y})'(y - \bar{y})\right]^{-\frac{N-1}{2}} (1+g)^{-\frac{k_s}{2}} \left(1 - \frac{g}{1+g} R_s^2\right)^{-\frac{N-1}{2}}$$

where $N$ is the number of observations, $k_s$ is the number of covariates included in $M_s$, and $R_s^2$ is the least-squares coefficient of determination for $M_s$. A small g value renders small prior coefficient variance (pushing coefficients towards zero), whereas a large g value represents greater uncertainty that the coefficients are zero.12 Our PMPs are thus actually conditional on $(y, X, g)$, with $p(M_s \mid y, X, g)$ serving as model weights (Fragoso and Neto, 2015):

$$p(\beta \mid y, X, g) = \sum_{s=1}^{2^k} p(\beta \mid M_s, y, X, g)\, p(M_s \mid y, X, g)$$

Given the value of g, the posterior distribution reflects prior uncertainty according to a t-distribution with expected value:

$$E(\beta_s \mid y, X, g, M_s) = \frac{g}{1+g}\, \hat{\beta}_s$$

where $\hat{\beta}_s$ is the standard least-squares estimator for model $M_s$ (Chipman et al., 2001). The posterior covariance term is similar to that of the standard OLS estimator, but incorporates the hyperparameter g. The following subsection will delve deeper into the various options for specifying g, choices that can significantly impact the reliability of the results obtained. Comparison of models is conducted via the Bayes factor, which allows for the rank-ordering of model specifications according to a weighting criterion defined by the ratio of marginal likelihoods for the two competing models (Kass and Raftery, 1995). Posterior Inclusion Probabilities (PIPs) are then the sum of posterior probabilities for all models in which a covariate is included.
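To make the enumeration logic concrete, the following Python sketch computes posterior model probabilities, posterior inclusion probabilities, and model-averaged coefficients by brute force over all $2^k$ models. It is a minimal illustration under stated assumptions rather than the estimation code behind the reported results: it assumes a uniform model prior, a fixed g (defaulting to g = N), the standard closed-form g-prior marginal likelihood, and simulated data. The function name `bma_enumerate` and all variable names are hypothetical.

```python
import itertools
import numpy as np


def bma_enumerate(X, y, g=None):
    """Brute-force BMA over all 2^k linear models (uniform model prior, fixed g)."""
    n, k = X.shape
    g = float(n) if g is None else float(g)  # assumed default: g = N
    shrink = g / (1.0 + g)                   # posterior shrinkage factor
    Xc = X - X.mean(axis=0)                  # centering absorbs the intercept
    yc = y - y.mean()
    tss = yc @ yc

    models, log_ml, coefs = [], [], []
    for size in range(k + 1):
        for subset in itertools.combinations(range(k), size):
            beta = np.zeros(k)
            r2 = 0.0
            if subset:
                cols = list(subset)
                b_ols, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
                resid = yc - Xc[:, cols] @ b_ols
                r2 = 1.0 - (resid @ resid) / tss
                beta[cols] = shrink * b_ols  # E[beta | y, M, g] = g/(1+g) * OLS
            # Log marginal likelihood, dropping terms common to every model.
            log_ml.append(-0.5 * size * np.log1p(g)
                          - 0.5 * (n - 1) * np.log(1.0 - shrink * r2))
            models.append(subset)
            coefs.append(beta)

    log_ml = np.array(log_ml)
    pmp = np.exp(log_ml - log_ml.max())      # subtract max for numerical stability
    pmp /= pmp.sum()                         # posterior model probabilities
    included = np.array([[j in m for j in range(k)] for m in models], dtype=float)
    pip = pmp @ included                     # posterior inclusion probabilities
    post_mean = pmp @ np.array(coefs)        # averaged over all models (zero when excluded)
    return models, pmp, pip, post_mean


# Simulated example: only the first of four covariates drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(size=200)
models, pmp, pip, post_mean = bma_enumerate(X, y)
print(len(models), np.round(pip, 3), np.round(post_mean, 3))
```

Full enumeration of this kind is only practical for small k; for larger model spaces a sampler over models would replace the two nested loops.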

Specifying the Hyperparameter
As noted above, the hyperparameter g influences both posterior inclusion and model probabilities (PIPs and PMPs, respectively) by determining the concentration and spread of the posterior model mass. The parameter g serves as the posterior shrinkage factor, determining the weight placed on different posterior model size distributions. We use Zellner's g for the sake of consistency and parsimony and take care to control against certain idiosyncrasies inherent in hyperparameter specification. In some cases, the choice of g can concentrate model mass on a few or even a single well-performing model: this is referred to as the "supermodel" effect. Choice of g can therefore be vital to the variable selection effort central to model averaging.
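As a rough illustration of how the choice of g translates into shrinkage, the sketch below compares two common fixed settings, g = N and g = max(N, k²), and computes the hyper-g parameter whose prior expected shrinkage matches the g = N case (using the Beta form of the hyper-g prior in Liang et al., 2008, under which the prior expected shrinkage is 2/a). The sample sizes and the helper names are hypothetical and chosen only for illustration.

```python
def g_uip(n_obs, n_covariates):
    """Fixed g equal to the sample size (g = N)."""
    return float(n_obs)


def g_bric(n_obs, n_covariates):
    """Fixed g equal to max(N, k^2)."""
    return float(max(n_obs, n_covariates ** 2))


def shrinkage(g):
    """Posterior shrinkage factor g / (1 + g) applied to the OLS estimate."""
    return g / (1.0 + g)


def hyper_g_a_matching_uip(n_obs):
    """Hyper-g parameter a such that the prior expected shrinkage 2/a equals N/(1+N)."""
    return 2.0 + 2.0 / n_obs


# Illustrative (hypothetical) sample sizes with k = 14 covariates.
k = 14
for n_obs in (100, 1000):
    print(
        n_obs,
        round(shrinkage(g_uip(n_obs, k)), 4),
        round(shrinkage(g_bric(n_obs, k)), 4),
        round(hyper_g_a_matching_uip(n_obs), 4),
    )
```

With few observations relative to k², the second rule implies a larger g and therefore less shrinkage towards zero; once N exceeds k², the two fixed rules coincide.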
In addition, a random-theta model prior is used with g set at the Bayesian Risk Inflation Criterion (BRIC), or $g = \max(N, k^2)$ for k covariates. Lastly, we employ the hyper-g prior distribution with the prior expected shrinkage factor conforming to the Unit Information Prior (Liang et al., 2008). This allows for the incorporation of data-dependent shrinkage in variable selection, adjusting prior weights according to data support. Flexible priors tend to offer better out-of-sample performance and greater accuracy than fixed-g priors. Instead of concentrating posterior model mass on a small subset of variables, flexible prior choices for g yield a more tolerant distribution of model mass, protecting against the impression that the data generating process is driven by a few indicators with very high inclusion probabilities. This, however, comes at the cost of computational complexity. By incorporating both fixed and flexible priors, we therefore aim to guard against undue model selection side-effects that may otherwise affect the results of our analysis.13

Data Analysis

Diagnostic Tests
Prior beliefs regarding some of the tested covariates were summarized from the relevant literature as the first step of the analysis. This provided a theoretical point of reference from which to evaluate the posterior coefficients.14 Models were run with delivery challenges measured at the cluster, category, and subcategory levels, systematically increasing the granularity of analysis and allowing for the disaggregation of substantive effects for each delivery challenge. Over 50 model averaging runs were conducted, testing the three levels of delivery challenges on project outcomes, performance, and supervision, including basic covariates and fixed effects, and testing a variety of alternate model priors.
In addition to basic controls such as project cost, duration, the presence of various specialists, setting, and Task Team Leader (TTL) quality, several indicators of aid effectiveness from Girod and Tobin (2016) were included to ensure that all relevant context- and project-based variables were accounted for. Before conducting the model averaging runs, data were tested for collinearity and missingness issues in order to ensure that variable selection was not unduly affected by structural faults inherent in the samples tested. Inclusion of collinear indicators can introduce an unintended stochastic element that may even lead to one of the collinear variables being chosen at random. Similarly, listwise deletion of observations can severely impact predictive power. Such tests were conducted on diagnostic subsets of control variables used as well as all subsets used in model averaging runs. Furthermore, models were run using both original and imputed data to guard against potential idiosyncrasies introduced by systematic missingness.15

Model averaging runs were conducted on the delivery challenges as well as five additional subsets: the first two tested delivery challenge clusters alongside the control variables and fixed effects, respectively. Subsets three and four tested the delivery challenges at the category level against the same controls and fixed effects. Finally, the last specification tested subcategories of delivery challenges alongside the main control variables (excluding the Girod-Tobin covariates). As noted in Section 3.2, the specifications were tested on project outcomes, performance, and supervision ratings to test the impact of delivery challenges on the achievement of project objectives as well as the Bank's overall performance. Results from the model averaging runs are presented below according to each outcome indicator.
Results from alternate prior specifications are presented in the supplemental appendix to this paper. Ultimately, it should be noted that alternate prior specifications did little to change the substantive coefficients, even though PIPs fluctuated slightly. Minor fluctuations were mostly seen in models using the Random-Theta Model Prior with a UIP on Zellner's g Hyperparameter.
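One common way to screen covariates for collinearity before model averaging is the variance inflation factor (VIF). The sketch below is an assumption about how such a check could be run, not a description of the diagnostic actually used for this paper; the function name, the simulated covariates, and the rule of thumb that values above roughly 10 deserve attention are all illustrative.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X, regressing it on all other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        tss = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / tss
        # A perfectly collinear column has R^2 = 1 and an unbounded VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Simulated covariates: the third column is nearly a copy of the first.
rng = np.random.default_rng(1)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = x0 + 0.05 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([x0, x1, x2]))
print(np.round(vifs, 1))
```

In this simulated case the first and third columns receive very large VIFs while the independent second column stays close to 1, which is the pattern that would prompt dropping or combining one of the near-duplicate indicators.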

Posterior Probabilities: Project Outcomes
Posterior Inclusion Probabilities (PIPs) from the Bayesian model averaging run on delivery challenge categories are presented in Figure 3. Model averaging was conducted using a random model prior, which (per Ley and Steel, 2012) implements a random-theta prior and binomial-beta hyperprior on the prior inclusion probability. The random prior represents the default model prior choice in the BMS package. The model employs the default setting for the hyperparameter on Zellner's g prior, using the Unit Information Prior (UIP), which sets $g = N$. The horizontal axis proportionally graphs the Cumulative Model Probabilities (CMPs) for the most likely model specifications: for instance, the top model specification has a posterior model probability of 0.346. Colloquially put, this translates into a 35% likelihood that the combination of variables represents the "true" model specification, given the data. The vertical axis indexes the variables included in this model averaging run (in this case, the 14 delivery challenges), with blue bars representing positive coefficients and red bars negative ones. The substantive effects for the delivery challenge categories tested in this model averaging run are summarized in Table 4.
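The cumulative model probabilities on the horizontal axis are simply running sums of the posterior model probabilities, ranked from most to least likely. A minimal sketch follows; only the top value of 0.346 comes from the text, while the remaining probabilities and the function names are hypothetical placeholders.

```python
from itertools import accumulate


def cumulative_model_probabilities(pmps):
    """Rank models by posterior probability and return the running totals."""
    ranked = sorted(pmps, reverse=True)
    return list(accumulate(ranked))


def models_needed(pmps, threshold):
    """Number of top-ranked models required to reach a share of posterior mass."""
    for count, total in enumerate(cumulative_model_probabilities(pmps), start=1):
        if total >= threshold:
            return count
    return None  # the supplied models never reach the threshold


# 0.346 is the top model's PMP reported in the text; the rest are made up.
example_pmps = [0.10, 0.346, 0.07, 0.05, 0.03]
print(cumulative_model_probabilities(example_pmps))
print(models_needed(example_pmps, 0.5))
```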
Note that nine of the delivery challenges show posterior inclusion probabilities (PIPs) higher than 0.95, meaning that over 95% of the model mass rests on specifications that include those covariates. In addition, most of the tested variables take a binary value on the conditional positive column, indicating that their effect is either uniformly positive (1) or negative (0). Stability across conditional posterior direction and PIP is suggestive of model convergence and a good indication that a covariate is a stable predictor of outcome. Given the size of the model space ($2^{14}$ or 16,384 possible combinations), all specifications were enumerated, meaning that the posterior inclusion probabilities in this run are not estimates based on a sample of the model space but rather an expression of the total share of posterior model mass resting on models in which those variables were included.

Response, and Business Environment) show a positive impact on outcome ratings. Challenges pertaining to the project's environment and emergency response have the strongest substantive impact among those named, and are both in the context cluster of delivery challenges. Why might this be the case? Understanding the positive impact of some challenges involves going back to the way the taxonomy was coded in the first place. Delivery challenges are situations and phenomena that test the capacity of the implementation team (and the Bank more generally) to successfully bring a project to fruition. TTLs identify both challenges that hindered the implementation of a loan as well as those that arose but were ultimately overcome. The positive results are therefore an intuitive reflection of the way in which Implementation Completion Reports (ICRs) are compiled, showing diversity in the magnitude and valence of the various obstacles that were faced (and potentially overcome) by the implementation team. In this case, context-based references to emergency response and environmental challenges appear to have had a net-positive impact on outcome ratings.
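The two summary quantities discussed above, the PIP and the conditional positive sign, can both be read off an enumerated model space: the PIP sums the posterior mass of every model that includes a variable, and the conditional positive sign is the share of that mass on which the coefficient is positive. The sketch below uses a toy three-model space with made-up probabilities and coefficients; the variable names are hypothetical.

```python
def inclusion_summary(models):
    """models: list of (posterior model probability, {variable: coefficient})."""
    total = sum(pmp for pmp, _ in models)
    pip, positive_mass = {}, {}
    for pmp, coefs in models:
        weight = pmp / total  # renormalize in case the PMPs do not sum to one
        for name, beta in coefs.items():
            pip[name] = pip.get(name, 0.0) + weight
            if beta > 0:
                positive_mass[name] = positive_mass.get(name, 0.0) + weight
    return {
        name: {
            "pip": pip[name],
            "cond_pos_sign": positive_mass.get(name, 0.0) / pip[name],
        }
        for name in pip
    }


# Toy model space (all numbers are illustrative, not estimates from the paper).
toy_models = [
    (0.5, {"project_design": -0.05, "project_finance": 0.02}),
    (0.3, {"project_design": -0.04}),
    (0.2, {"project_finance": -0.01}),
]
summary = inclusion_summary(toy_models)
print(summary)
print(2 ** 14)  # size of the model space for 14 candidate covariates
```

A conditional positive sign of exactly 0 or 1, as for `project_design` here, indicates that the coefficient keeps the same direction in every model that includes the variable.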
Further disaggregation of the delivery challenges at the subcategory level better illuminates the specific components that account for the positive coefficients shown above. PIPs from the subcategory-level model averaging run are shown in Figure 4. Posterior coefficients for the subcategories are summarized in the accompanying table. Curiously, none of the subcategories housed under Project Finance appear to have PIPs higher than 0.5. Instead, the impact of that category appears to have been subsumed under Macroeconomic challenges, specifically, Financial Instability (0.025). Thus, the issue appears to be more contextual, one that diligent project management has historically managed to overcome.
Further disaggregation shows which specific challenges have the greatest substantive impact on project outcomes. Overambitious project objectives pose the strongest risk to the achievement of outcomes (-0.089). Examining the distribution of challenges by region, we see that overambitious objectives are most prevalent among projects in the Middle East and North Africa (48% of total projects affected), South Asia (46% of projects affected), and Africa (45%). By sector, the challenge impacted 50% of examined water projects, 47% of agricultural projects, and 46% of both transport and public administration proposals. The next most substantively impactful challenges involved organizational capacity failures (-0.068) and changes in government priorities (-0.068), with similar spatial and sectoral variances as overambitious objectives.
Within the Environment & Geography group mentioned earlier, the effect is now shown to be driven by the Geographic Access subcategory, which measures "challenges stemming from problems accessing populations due to geographical barriers and remoteness." Of the 381 occasions that this challenge was experienced, 40.6% of all incidents happened in agriculture and transport sector projects (being identified 93 and 62 times, respectively). As such, its substantive impact may reflect a major surmountable challenge within these projects. Moving on to Disasters & Emergency Response, we see that the Natural Disasters subcategory (as opposed to epidemics) is the real driver behind the positive effect noted above. Agriculture and transport sector projects account for the largest share of incidents of this delivery challenge, making up 42.8% of all recorded cases. As before, the positive impact on outcome ratings may reflect the identification of the primary challenge under consideration in the project. That being said, this is still useful as it quantifies the impact of those challenges (relative to all the others identified at every stage in the project life cycle) and relates their impact to outcome ratings.
Summarizing across these findings, we can see that a handful of delivery challenge subcategories appear to have a clear impact on the achievement of development objectives. Projects that are poorly designed at the outset appear to under-perform, regardless of context and other extenuating factors. The same appears to be the case for those projects that are subject to poor monitoring and evaluation standards. A lack of organizational capacity, commitment, or adequate leadership by stakeholders similarly depresses outcome ratings. Lastly, unsupportive legal or regulatory processes, civil unrest, political interference, and electoral cycles can act as contextual impediments against the achievement of project indicators. To ensure that these effects are resistant to model specification choices, the delivery challenge categories and subcategories were further tested alongside a set of control variables and fixed effects. Posterior coefficients from a model averaging run on category-level challenges and control variables are summarized in Table 6. As with the baseline models, problems pertaining to project design (-0.051), stakeholder commitment (-0.081), and monitoring (-0.076) continue to have strong impacts.17 Among the other control variables, the impacts of natural resource rents (-0.057) and approval-closing time (-0.084) are interesting but unsurprising. However, the strong substantive impact of Task Team Leader quality (0.126) is revealing, adhering to the prior expectation expressed in Denizer et al. (2013), Geli et al. (2014), and Raimondo (2016). Tests on fixed-effects models yielded comparable results.18
Among the delivery challenge clusters, the presence of project- and context-related challenges showed a consistent negative impact on outcomes, irrespective of model specification. The presence of a procurement specialist tended to increase overall ratings (0.054), whereas the presence of a financial management specialist decreased them (-0.052). The difference could be indicative of the relative contextual challenges faced and the respective difficulty of overcoming them. Though TTL quality had a positive impact on outcome, it should be noted that TTL project count had a negligible effect, suggesting that success in project management is dictated by quality, not quantity. Projects in rentier states and former colonies tended to do worse on average, whereas larger states predictably performed better. When tested alongside contextual control variables, the delivery challenges recording disasters and regulatory legislation showed lower posterior coefficients and PIPs. This might suggest that the variation previously captured by those indicators shifted to measures of the project's sector or loan type.19 Per the discussion of results earlier, a positive direction of effect associated with a subset of challenges may be indicative of whether certain issues are coded as insurmountable obstacles whilst others (such as language barriers and prevailing group practices) are viewed as difficulties that can be overcome through careful stewardship and improved project management.
As noted earlier, positive coefficients may be indicative of challenges identified and overcome.20 With respect to the categories explored above, government relation challenges speak to difficulties in coordinating between different branches and levels of the stakeholder government. Communication strategy challenges impact the sharing of relevant information with beneficiaries, and stakeholder engagement issues deal with the facility with which the implementation team engages relevant subsidiaries. All three are tractable in the sense that timely communication and deeper collaboration with local authorities can prove sufficient in addressing the challenges raised. Furthermore, all three are associated with the quality of project leadership, as they represent facets of project management that can be duly addressed by Task Team Leaders.
Among the challenges that have a negative impact on outcomes, those that fell in the context cluster involved governance, politics, and conflicts. Civil unrest, electoral cycles, and political interference lowered project outcome ratings more than other challenges in this cluster. In terms of project attributes, design and monitoring were most likely to lower outcome ratings. Poorly designed or unrealistic indicators, lack of adequate supervision, insufficient time allocation, and overambitious objectives had the strongest substantive impacts on outcome. In addition, difficulties in coordinating among different entities in the stakeholder government (specifically, when differences in priorities, resources, or expectations are involved) tended to lower outcome ratings. Model averaging runs on alternate priors largely confirm these findings.21

Having discussed findings on project outcomes, we now turn towards two additional measures, testing the impact of delivery challenges on project performance and the quality of supervision. These are discussed in the next two sections.

17 The posterior coefficients for all three of the above increase when structural controls are added, confirming prior expectations of these variables' significance.
18 Results from the model averaging run with sector- and loan-fixed effects were largely analogous insofar as delivery challenges were concerned. Posterior inclusion parameters and coefficient estimates remained comparable across all specifications. Financial Intermediary Funds (leveraging public and private resources) had a negative impact on outcome ratings, while transport sector projects tended on average to outperform others. Region fixed effects models were also tested (see appendix), showing that projects in East Asia, Eurasia, and Latin America tended to slightly over-perform. However, the relevant delivery challenges retained their relative impacts over all model averaging runs.
19 It is important to note that a lack of regulation & legislation actually appears to have a positive impact on project outcome overall, whereas the presence of an unsupportive legal & regulatory process has a negative impact on project outcome.
20 For example, we see that gender-based delivery challenges have a positive impact on outcome, potentially highlighting instances in which these issues were overcome or else reflecting a situation in which the project succeeded in spite of difficulties.

Posterior Probabilities: Project Performance
As noted in Section 3, IEG assigns ratings not simply with respect to project outcomes but also for the effectiveness of the Bank's role in managing projects. While ratings of World Bank performance may be endogenous to project outcome ratings, for our purposes the measure offers a useful indicator for testing the delivery challenge taxonomy on an instrument explicitly measuring Bank involvement. A comparison of posterior probabilities across outcomes and performance shows that delivery challenges have a comparable impact across the two measures. However, given that IEG's performance measure focuses on whether the Bank ensured quality at entry (by facilitating the preparation and design) and maintained the quality of supervision, several of the challenges behave differently in these tests. Figure 5 illustrates cumulative model probabilities of delivery challenge categories on project performance ratings.
Project Finance, Coordination & Engagement, and Disasters & Emergency Response continue to show consistent positive coefficients, as was the case with tests on project outcomes. However, two additional delivery challenges (Macroeconomic Environment and Socio-Cultural Environment) are shown to impact the performance measure. Furthermore, the Governance & Politics delivery challenge has a negligible impact on performance, something to be expected given the measures evaluated in this variable. Comparing cumulative model probabilities on performance, we see that the top-three model specifications (accounting for 52% of the posterior model likelihood) all include delivery challenges related to project financing, design, and monitoring, stakeholder challenges impacting engagement and commitment, as well as several contextual features related to conflict, culture, macroeconomic environment, and emergency response. Environmental and HR-related challenges feature in the second- and third-best model specifications, but both yield relatively lower PIPs and posterior coefficients.
Table 7 presents the mean posterior impacts of the delivery challenges on project performance. Given that the performance measure evaluates the quality of supervision and quality at entry, it is unsurprising that the three most impactful delivery challenges are those impacting Data & Monitoring (-0.107), Project Design (-0.071), and Macroeconomic Environment (0.064). Unlike the model averaging runs on project outcomes, the models assessed here show that contextual challenges have a moderate but largely positive impact on performance assessments.
The first component of the Bank performance measure focuses on project design, including the "extent to which the Bank identified, facilitated preparation of, and appraised the operation such that it was most likely to achieve planned development outcomes and was consistent with the Bank's fiduciary role."22 The second component evaluates the quality of supervision, measuring whether and to what extent adequate steps were taken to proactively identify and resolve threats to efficient implementation.23 As such, positive coefficients on contextual challenges may represent areas in which adequate preparation and supervision were conducted; in other words, those challenges appear to be areas in which project leaders took pains to account for imminent obstacles to project delivery and the achievement of development objectives. This may also explain why the identification of Socio-Cultural and Macroeconomic factors was associated with higher Bank performance ratings, once again reflecting the project quality and monitoring dimensions of the instrument.
Similar to the tests on outcomes, delivery challenges are further disaggregated at the subcategory level in order to better understand which subcomponents are most closely related to variation in Bank performance. These are shown in Figure 6. Coefficients from the model averaging run on performance using subcategory-level delivery challenges are summarized in Table 8. Both components of the project design category show a negative impact on performance, with overambitious objectives (-0.095) weighing somewhat more than time allocation issues (-0.068). Likewise, challenges relating to the availability of indicators (-0.074) and reporting (-0.075) negatively impacted Bank performance, a feature that clearly relates to the supervision component of the measure. Turning to the H.R. category, a curious phenomenon can be seen. While that challenge has a relatively low PIP and substantive coefficient at the category level, disaggregation of the effect at the subcategory level shows that this is caused by the presence of two challenges with opposing effects, which appear to negate one another at the category level. Thanks to the more granular data, we see that while skill transfer challenges have a moderate positive impact, the subcategory assessing organizational capacity actually has a relatively pronounced negative impact on Bank performance ratings (-0.078). Finally, disaggregation of challenges shows that the macroeconomic challenges most relevant for understanding Bank performance relate to financial instability (0.061) and foreign-exchange volatility (0.048). These and the rest of the context-cluster challenges have a positive impact on performance ratings, with two exceptions: challenges related to civil unrest and post-conflict climates continue to reduce performance ratings, but their substantive impact appears to be relatively moderate.
The results remain roughly consistent when external controls are included in model averaging.This output is shown in Table 9.The six delivery challenges highlighted above retain their impact, in addition to several structural variables that now show relatively high posterior coefficients.As before, approval to closing time remains an informative determinant of overall performance, showing a strong negative coefficient (-0.118).Added focus on performance (as opposed to the achievement of project indicators) accounts for why this variable has a stronger impact on performance relative to outcome.While TTL quality yields a net-positive impact on performance (0.087), the presence of a financial management specialist decreases performance ratings overall (-0.084), as was the case with outcome ratings.The presence of a specialist is not itself indicative of effectiveness; rather, it appears to serve as an informative heuristic for weaker projects in need of readjustment.
Taken together, this output appears to confirm pre-existing expectations of what ought to impact performance ratings, benchmarking the impact of relative challenges in order to provide an overview of how these features behave relative to one another. In order to complete the analysis, the next section will focus explicitly on the supervision feature of the performance measure.

Posterior Probabilities: Project Supervision
Figure 7 shows cumulative model probabilities for a model averaging run of delivery challenges on project supervision ratings. Given that the outcome measure constituted one of the two components in the Bank performance indicator, results across the two runs should be roughly comparable. This appears to be the case. Project Finance and environmental challenges are less significant for supervision and are replaced by challenges impacting H.R. & Organizational Capacity. This appears to follow our prior expectations for understanding the determinants of success in supervision. Examining the posterior coefficients in Table 10, we see that Project Data & Monitoring challenges expectedly have the highest negative posterior coefficients on supervision (-0.109), followed by Project Design challenges (-0.067). On the other hand, Macroeconomic challenges as well as problems related to Coordination & Engagement have the highest positive impact on supervision quality, with both showing a mean coefficient of 0.048.
Disaggregating the results as before, we find that a similar subset of delivery challenges impacts project supervision ratings. This is illustrated in Figure 8. The breakdown of posterior coefficients in Table 11 shows that Overambitious Objectives (-0.105), absence of effective indicators (-0.069), Organizational Capacity (-0.061), and inadequate Reporting & Supervision (-0.092) all contribute to lower supervision ratings. Once again, the indicators related to higher average supervision ratings appear to link to those areas where project leadership was able to proactively identify and resolve risks to efficient implementation. Output from the models testing supervision quality shows that effective treatment of stakeholder engagement challenges, socio-cultural idiosyncrasies, and macroeconomic risks all partially contribute to better evaluations of Bank supervision. These results remain roughly consistent when control variables are added to the model space (see Table 12): once again, TTL quality (0.080), approval-duration timespan (-0.118), and the presence of a financial management specialist (-0.064) also appear to serve as substantively significant correlates.
Though not discussed at length here, the supplementary appendix also analyzes the impact of delivery challenges on IEG's rating of project quality at entry.24 Overambitious Objectives, Reporting & Supervision, Time Allocation, absence of Indicators, and Organizational Capacity are all shown to drive down ratings on this measure. On the other hand, socio-cultural challenges identified with gender relations and macroeconomic issues related to financial instability and forex volatility both appeared to be associated with higher quality at entry.

Discussion of Results
This paper used the DeCODE taxonomy to systematically test the ways in which different delivery challenges impact project outcome ratings, Bank performance, and the quality of supervision. Bayesian model averaging was used to measure the impact of different project challenges on the achievement of development objectives, iteratively testing granular issues in order to pinpoint the substantive impact of each one. Next, the delivery challenges were tested alongside various structural and contextual controls to make sure that the effects would be robust to the inclusion of standard covariates (e.g., size of commitment, democratization, level of industrialization, TTL quality, natural resource rents, etc.).
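For readers less familiar with the quantities reported throughout (posterior model probabilities, posterior inclusion probabilities or PIPs, and posterior mean coefficients), the sketch below shows how they relate to one another. It is a minimal illustration under stated assumptions, not the estimation procedure used in this paper: it assumes a uniform prior over models and approximates posterior model probabilities from BIC values, and the function `bma_summary`, the variable names, and all numbers in `toy_models` are hypothetical.

```python
import math
from itertools import accumulate

def bma_summary(models):
    """Summarize a small model space.

    models: list of dicts with 'bic' (float) and 'coefs' (variable name -> estimate).
    Assumes a uniform model prior and the BIC approximation to the marginal likelihood.
    """
    # Subtract the smallest BIC before exponentiating for numerical stability.
    best = min(m["bic"] for m in models)
    raw = [math.exp(-0.5 * (m["bic"] - best)) for m in models]
    total = sum(raw)
    weights = [r / total for r in raw]  # approximate posterior model probabilities

    names = sorted({name for m in models for name in m["coefs"]})
    pip = {n: 0.0 for n in names}
    post_mean = {n: 0.0 for n in names}
    for w, m in zip(weights, models):
        for n, beta in m["coefs"].items():
            pip[n] += w               # probability mass of models that include n
            post_mean[n] += w * beta  # models that exclude n contribute zero
    return weights, pip, post_mean

# Hypothetical three-model space; values are illustrative only.
toy_models = [
    {"bic": 100.0, "coefs": {"design": -0.10, "monitoring": -0.08}},
    {"bic": 102.0, "coefs": {"design": -0.09}},
    {"bic": 106.0, "coefs": {"monitoring": -0.07, "macro": 0.05}},
]
weights, pip, post_mean = bma_summary(toy_models)

# Cumulative model probability of the top-ranked specifications.
cumulative = list(accumulate(sorted(weights, reverse=True)))
```

A variable's PIP is the summed probability of the models that contain it, and its posterior mean is the probability-weighted average of its coefficient with zeros for the models that omit it. This is why a challenge can have a small averaged coefficient even when its estimate is sizeable in the few models that include it.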
Given the number of averaging runs conducted and the voluminous quantity of outputs from different model specifications and outcome variables, a heuristic may help in condensing the results of the project into a more readily digestible format. Table 13 visually summarizes the impact of the delivery challenges on project outcome ratings. The table nests the subcategories and categories in order to highlight which subcomponents of each indicator (if any) were the more robust correlates of project outcome ratings. Variable inclusion measures are aggregated according to the posterior model probabilities from the top models selected in each model averaging run.25 If a variable was selected in any of the runs for inclusion in the top model specification, its substantive impact was color-coded on the table, with red representing a negative impact on outcome and blue a positive effect.
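The color-coding rule can be expressed as a short sketch. The function `color_code` and the structure of `top_models` are assumptions made for illustration; in particular, using the sign of the average coefficient across the runs that selected a variable is one plausible reading of the rule, not a documented detail of how the summary tables were built. The coefficient values are borrowed from posterior means reported in this paper, but their assignment to individual runs is hypothetical.

```python
def color_code(top_models, variable):
    """Return 'red', 'blue', or None for one variable.

    top_models: one dict per model averaging run, mapping each variable in that
    run's top specification to its coefficient (hypothetical structure).
    """
    selected = [run[variable] for run in top_models if variable in run]
    if not selected:
        return None  # never selected in a top model: left uncolored
    average = sum(selected) / len(selected)  # assumption: sign of the average decides
    if average < 0:
        return "red"   # negative impact on the rating
    if average > 0:
        return "blue"  # positive impact on the rating
    return None

# Illustrative inputs; run membership is assumed, not taken from the tables.
top_models = [
    {"Overambitious Objectives": -0.089, "Geographic Access": 0.050},
    {"Overambitious Objectives": -0.095},
]
colors = {name: color_code(top_models, name)
          for name in ("Overambitious Objectives", "Geographic Access", "Epidemics")}
```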
Beginning at the cluster level (projects, stakeholders, and context), project-level challenges are shown to be prominent correlates of outcome ratings. Disaggregation at the category level, however, shows that a battery of challenges in all three clusters are substantively impactful. Further disaggregation helps identify which specific subcomponents of each challenge category most impact outcome ratings. The summary of outcome-related challenges is illustrative in that most project-cluster problems are shown to have a net-negative impact on outcome. At the stakeholder level, the overall effect is mixed, with certain challenges (skill transfer, stakeholder engagement, communication strategy, government relations) arising but ultimately corresponding to higher overall outcome ratings. A similar mixture was seen with respect to context-cluster challenges, with those related to conflict and governance reducing outcome ratings and others associated with emergency response, environment, and geography improving them. Taken together, however, the challenges with the strongest posterior estimates had a negative impact on outcome ratings.
Table 14 provides a similar summary for the models tested against Bank performance. Nearly all of the project-cluster delivery challenges (with the exception of project financing) were shown to have a negative impact, along with stakeholder organizational capacity and context-dependent conflict or instability. The majority of context-cluster challenges appeared to have a positive impact on Bank performance. However, with the exception of financial instability, geographic access, and issues related to culture, religion, and ethnicity, the most substantively significant challenges had a negative impact on performance. Lastly, Table 15 provides a summary for tests on project supervision. Geographic access, culture/religion, and stakeholder coordination show strong substantively positive coefficients, but the remaining challenges with large posterior means tend to negatively impact supervision. While the output from the three tables largely conforms to our prior expectations, several gaps in the literature are filled by the holistic methodology employed here.
Despite the marginally positive impact of project financing, all model averaging runs identified project-related challenges as having a negative impact on outcomes.26 The variation explained by the variable was captured when structural controls were included. Within the project design category, disaggregation of challenges showed that overambitious objectives and time allocation issues were the predominant contributors to the variable's negative valence. Likewise, of the three subcategories included in the project data & monitoring category, poorly-designed indicators and an inability to capture or report relevant information most significantly depressed outcome ratings. Overambitious Objectives were a consistently strong negative correlate across the three outcome measures, with coefficients of -0.089, -0.095, and -0.105 on outcome, performance, and supervision, respectively. Reporting & Supervision challenges were similarly informative, especially for models tested against the quality of supervision. The subcategory yielded posterior means of -0.054, -0.075, and -0.092 on outcome, performance, and supervision, respectively. The presence of adequate indicators behaved similarly (-0.051; -0.074; -0.069). Lastly, Time Allocation & Sequencing challenges were useful for explaining variation in Bank performance and supervision, with posterior means of -0.068 and -0.037, respectively.
Three challenges from the stakeholder cluster had strong substantive impacts on the outcome measures tested. Organizational Capacity negatively impacted all three variables, with posterior means of -0.068, -0.078, and -0.061 on outcome, performance, and supervision, respectively. While changes in priorities only impacted outcome ratings (-0.068), poorly-defined roles and responsibilities were shown to be informative for understanding the quality of supervision (0.044). The subcategories address similar concerns: when flagged, these issues represent cases where project managers experienced difficulties in engaging beneficiaries and relevant parties. In particular, the implementing team may have experienced hardships in raising awareness and sharing relevant information about the credit or loan with the general public. They also flag instances where the priorities and expectations of the recipient government and relevant stakeholders diverged from those of the Bank. Such divergences, however, need not automatically translate to lower rates of efficiency in the implementation of the project itself. Though difficulties in communicating with beneficiaries may have been highlighted, it appears that these challenges were ones that were ultimately overcome using a more dedicated engagement strategy. This could account for the overall positive impact on project ratings. In addition, differences in expectations with the recipient government do not automatically translate to poor implementation if the goals and objectives of the project were nonetheless achieved in a timely and efficient manner. A similar explanation could apply to commitment and leadership challenges: when priorities suddenly change, projects are less likely to succeed. In addition, when projects experience a sudden loss of commitment, they are likely to suffer. This could link with the negative coefficients associated with time allocation & sequencing challenges, as a sudden change in goals or an inability to retain commitment to a particular intervention were shown to correlate to a decline in a project's likelihood of success.
In the context-cluster of challenges, only issues relating to geographic access consistently impacted all three outcome measures, with posterior means of 0.050, 0.038, and 0.032 on outcome, performance, and supervision, respectively. Financial instability also positively impacted the quality of Bank performance (0.061) and supervision (0.059), with challenges related to culture and religion behaving similarly. We further see that incidents of ongoing civil unrest & armed conflict tend to lower outcome ratings, as do political interference and issues related to elections and changes in government. These disruptions may link with the negative impacts explored in the stakeholder cluster, specifically, those involving changes to priorities and commitment. By contrast, it appears that corruption and abuses of public power are less often cited in ICRs and therefore do not correspond to outcome ratings in this framework. This does not imply that patronage and clientelism do not influence outcome ratings (as has been suggested in the literature); rather, it may suggest that certain pressures influence the way in which implementation completion reports are written and the challenges that are explicitly cited therein. This suggests room for growth in future research on project effectiveness.
As noted earlier, some of the findings explored may be endogenous, given that the project itself may have set about ex ante to extend a loan or credit to address geographic barriers, remoteness, or the devastation wrought by a natural disaster. These features would specifically impact projects in certain sectors (such as transportation or agriculture). The positive effects might relate to improved project design attributes attached to those projects that address geographic isolation or natural disasters: their scope and objectives may be better defined and their deliverables more tightly focused than other comparable projects. In order to account for these idiosyncrasies, models were tested using both sector- and loan-type fixed effects, which would have captured any of the subgroup effects expressed here. Ultimately, the coefficients remained roughly consistent even with these indicators.
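To show what a fixed-effects check of this kind guards against, the sketch below contrasts a pooled slope with a within-sector slope on toy data. It is a simplified illustration under stated assumptions: a single challenge flag, one-way (sector) fixed effects implemented by demeaning within groups, and made-up ratings in which sector baselines differ. The helper names `ols_slope` and `demean_by_group` are hypothetical. Unlike this constructed example, the coefficients reported in this paper remained roughly consistent once fixed effects were included.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope from a simple one-regressor least-squares fit."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract each group's mean (the within transformation for one-way fixed effects)."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in buckets.items()}
    return [v - means[g] for v, g in zip(values, groups)]

# Toy data: the flag is more common in sector A, which also has a higher baseline rating.
TRUE_EFFECT = -0.1                      # assumed effect of the flag within any sector
baseline = {"A": 4.0, "B": 3.0}         # assumed sector-level baselines
sectors = ["A", "A", "A", "A", "B", "B", "B", "B"]
flags = [1, 1, 1, 0, 1, 0, 0, 0]
ratings = [baseline[s] + TRUE_EFFECT * f for s, f in zip(sectors, flags)]

# Pooled slope mixes the flag's effect with the sector baseline difference.
pooled_slope = ols_slope(flags, ratings)

# Within-sector slope removes anything constant within a sector.
within_slope = ols_slope(demean_by_group(flags, sectors),
                         demean_by_group(ratings, sectors))
```

In this constructed case the pooled slope is positive purely because of sector composition, while the within-sector slope recovers the assumed negative effect. Stable coefficients before and after adding fixed effects therefore suggest that sector composition is not what drives the estimates.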
We conclude with an overview of the delivery challenges that most strongly impacted each outcome measure, as well as those that were consistently significant across measures. With respect to outcome ratings, changes in priorities, weaknesses in organizational capacity, and overambitious objectives (all related to project design or stakeholder engagement) consistently lowered outcome ratings, ceteris paribus. Turning to Bank performance, we see that challenges related to project design (overambitious objectives) as well as those related to project data and monitoring yielded the greatest impact. The same appeared to be true for tests on the quality of supervision. Of the 42 individual challenges tested, four appeared to be consistently present in the model averaging runs. These were the quality of indicators, reporting and supervision, overambitious objectives, and organizational capacity. Given that all but the last were related to project design, we can conclude that the overall effectiveness of World Bank projects is to a great extent influenced by a set of factors established at the time the project is planned. Furthermore, it appears that these factors are relatively robust to variations in leadership, stakeholder relations, and context, posing a consistent risk to effective delivery across the portfolio of projects examined.
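The cross-measure comparison can be restated compactly using the posterior means quoted in this section. The sketch below only collects those reported values (ordered as outcome, performance, supervision) and filters for challenges that are negative on all three measures; the dictionary name `reported` and the simple averaging used to rank them are conveniences assumed here, not part of the original analysis.

```python
# Posterior means as quoted in the text, ordered (outcome, performance, supervision).
reported = {
    "Overambitious Objectives": (-0.089, -0.095, -0.105),
    "Reporting & Supervision": (-0.054, -0.075, -0.092),
    "Indicators": (-0.051, -0.074, -0.069),
    "Organizational Capacity": (-0.068, -0.078, -0.061),
    "Geographic Access": (0.050, 0.038, 0.032),
}

# Challenges with a negative posterior mean on every one of the three measures.
consistently_negative = sorted(
    name for name, values in reported.items() if all(v < 0 for v in values)
)

# Rank by the plain average across the three measures (an illustrative summary only).
strongest_negative = min(reported, key=lambda name: sum(reported[name]) / 3)
```

The filter returns the same four challenges singled out above, and Geographic Access stands apart as the only entry here that is positive on all three measures.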

Conclusion
Can implementation-driven data on project challenges be used to holistically assess the effectiveness of World Bank lending? This paper took advantage of a wealth of practitioner experience captured in implementation completion reports as a granular and cross-comparable measure of project outcomes and Bank performance. Using a data-driven taxonomy, we devised a novel approach to examining the determinants of project success. By focusing on how projects fail and the specific challenges faced in successful execution, we were able to capture variation hitherto unexplored in the aid effectiveness literature, offering a more precise assessment of the challenges faced throughout the implementation cycle while also providing a cross-cutting measure that reaches across the country-, time-, and sector-based approaches previously employed. Thus, while previous work offered insights into the individual mechanisms or contextual factors that might impact project effectiveness, we jointly tested the obstacles to success using a comprehensive analytic framework that allowed for the benchmarking of dozens of competing theories in a single unified model. Adjusting for alternate explanations and various contextual and structural preconditions, which components of implementation were most likely to impact project success? Using project-level data from over 5,000 lending projects approved between 1995 and 2015, we tested this question using 42 specific delivery challenges across 14 overarching thematic categories. The data were analyzed using iterated Bayesian model averaging runs on project outcome, performance, and supervision. Additional robustness checks included structural covariates and alternate model prior specifications to ensure that estimates remained consistent. The steps above were preceded by a comprehensive assortment of diagnostic tests to ensure that the results were both robust to modeling choices and insensitive to collinearity and missingness issues inherent to the data. Output from over 50 model averaging runs was summarized according to the three broad clusters covered by the delivery challenge taxonomy, highlighting both the challenges that were likely to impede performance as well as those that were identified by project leaders but subsequently overcome. Overall, the results substantiated several theoretical priors from the literature and provided further nuance to other mechanisms that had been tested with insufficient rigor. As a holistic test of the correlates of project success and Bank performance, this paper makes several contributions to our understanding of why projects succeed, how they are likely to fail, and what challenges are most commonly surmounted through skillful supervision.
Three generalizable lessons can be extracted from the analysis. First, we find that unrealistically ambitious goals and overly-complex project designs are not only likely to hinder the Bank's performance, but also are liable to impede the achievement of development objectives. Second, we find that information is necessary for success: poor data collection, inadequate monitoring, and inappropriate indicators contribute to project failure in all settings. This can be particularly problematic if a project employs duplicated or overlapping indicators, or if measures are not correctly calibrated against project objectives. We further observe that data collection is a necessary but insufficient condition for success; its importance becomes apparent when paired with timely reporting and supervision. Third, issues linked to stakeholders' organizational capacity are likely to hamper performance, particularly when institutional arrangements are insufficient to protect against emerging risks to development outcomes.
The dearth of available data opens the door to several future studies of delivery challenge impacts. First, a disaggregation and qualitative analysis of legislation and regulation challenges could shed light on the dynamic tension between a lack of regulatory frameworks and interference from unsupportive and redundant legal and regulatory processes. Second, additional focus on projects with environmental and emergency response objectives can help practitioners by disaggregating the positive correlates found in this paper. Why are these challenges (as well as several others found in the context cluster) more likely to be surmounted whereas challenges related to governance and politics tend to be more intractable? Lastly, a more in-depth exploration of challenges related to human resources and organizational capacity would shed light on the attributes of project supervision that are most likely to influence implementation outcomes. These and other sector- and region-specific studies would positively contribute to our understanding of project performance and implementation, building on the insights generated in this analysis.

Table 1: Project-Cluster Delivery Challenges

Category | Subcategory | Definition
Project Design | | Delivery challenges stemming from flaws in project design, including overly-complicated design, overambitious objectives, inappropriate time allocation, or issues in identifying and selecting/targeting stakeholders and beneficiaries.
Project Design | Overambitious Objectives | Challenges caused by setting targets that are unrealistically ambitious or making the project design overly complex.
 | Data Availability & Baselines | Challenges that stem from a lack of current or accurate data, as well as an inability to produce baselines.
 | Reporting & Supervision | Challenges caused by obstacles in capturing relevant information and reporting in a timely fashion.
 | Ecosystem | Challenges specific to the ecological makeup of the area.
Disasters & Emergency Response | | Delivery challenges caused by natural/manmade disasters or other unexpected emergency situations.
Disasters & Emergency Response | Natural Disasters | Challenges stemming from natural disasters.
Disasters & Emergency Response | Epidemics | Challenges stemming from disruptions caused by epidemics.
Business Environment | | Delivery challenges caused by a weak private sector, or weak sector regulations.
Business Environment | Informal & Illegal Markets | Challenges caused by distortions of high informality and shadow/parallel markets.
Macroeconomic Environment | | Delivery challenges caused by instability, volatility, or interruptions in trade, market conditions, or financial systems.
Macroeconomic Environment | Financial Instability | Challenges stemming from disruptions in the financial system.
Macroeconomic Environment | Forex Volatility | Challenges caused by sudden currency devaluation/depreciation or restrictions relating to transfer of forex.
Examining posterior means, we can see that the delivery challenges' PIPs positively correspond to their substantive significance, with Data & Monitoring (-0.07) and Commitment & Leadership challenges (-0.06) showing the strongest impact on outcome ratings.16 Cumulative model probabilities for the first two specifications account for over half of the posterior model likelihood. These include the first nine delivery challenges with either Legislation & Regulation or (in the second specification) Business Environment challenges. The results highlight an intriguing attribute of the delivery challenges visualized in Figure 3: of the covariates that show high posterior probabilities of inclusion in the top specifications, five of them (Project Finance, Coordination & Engagement, Environment & Geography, Disasters & Emergency

Figure 1: Delivery Challenges by Sector

Figure 7: Posterior Inclusion Probabilities - DC Categories on Supervision

Table 5.
Focusing on the subcategories of the four most substantively impactful positive categories identified above (Disasters & Emergency Response, Environment & Geography, Coordination & Engagement, and Project Finance), we can identify the specific subcomponents responsible for this effect. Disaggregation of challenges shows us that Natural Disasters (0.044), Geographic Access (0.05), Governmental Relations (0.042), Awareness & Communication Strategy (0.034), and Stakeholder Engagement (0.023) account for the impact seen at the category level.

Table 3: Context-Cluster Delivery Challenges

Table 4: Impact of Delivery Challenge Categories on IEG Outcome

Table 5: Impact of Delivery Challenge Subcategories on IEG Outcome

Table 6: Delivery Challenge Categories and Controls on IEG Outcome

Table 7: Impact of Delivery Challenge Categories on IEG Performance

Table 8: Impact of Delivery Challenge Subcategories on IEG Performance

Table 9: Delivery Challenge Subcategories and Controls on IEG Performance

Table 10: Impact of Delivery Challenge Categories on IEG Supervision

Table 11: Impact of Delivery Challenge Subcategories on IEG Supervision

Table 12: Impact of Delivery Challenge Subcategories and Controls on IEG Supervision

Table 13: Summary of Delivery Challenge Impacts on IEG Outcome

Table 14: Summary of Delivery Challenge Impacts on IEG Performance

Table 15: Summary of Delivery Challenge Impacts on IEG Supervision