The World Bank Economic Review, 36(3), 2022, 629–645. https://doi.org/10.1093/wber/lhac001

Article

What Do Local Government Education Managers Do to Boost Learning Outcomes?

Jacobus Cilliers, Eric Dunford, and James Habyarimana

Abstract

Recent public sector reforms have shifted responsibility for public service delivery to local governments, yet little is known about how their management practices or behavior shape performance. This study reports on a comprehensive management survey of district education bureaucrats and their staff, conducted in every district in Tanzania, and employs flexible machine-learning techniques to identify important management practices associated with learning outcomes. It finds that management practices explain 10 percent of the variation in a district's exam performance. The three management practices most predictive of performance are (a) the frequency of school visits, (b) school and teacher incentives administered by the district manager, and (c) performance review of staff. Although the model is not causal, these findings suggest the importance of incentives and active monitoring, including frequent monitoring of schools, in motivating district staff, schools, and teachers.

JEL classification: I25, I28, O15, H75, H83

Keywords: education, bureaucrats, Tanzania, public management

1. Introduction

There is increasing interest in the role that bureaucrats, including local government managers, play in public service delivery in developing countries. This research draws on the literature in two fields of social science. First, the public administration literature examines the role that street-level bureaucrats—such as teachers, health-care workers, and police officers—play in implementing policy (Lipsky 1980; Brodkin 2006, 2011).
This research, predominantly from developed countries, has emphasized that civil servants often have considerable discretion in how policies get implemented, and thus play an important mediating role in translating policy into local outcomes (Carpenter 2001).1

Jacobus Cilliers (corresponding author) is an assistant professor at Georgetown University, Washington DC; his email is ejc93@georgetown.edu. Eric Dunford is an assistant professor at Georgetown University, Washington DC; his email is eric.dunford@georgetown.edu. James Habyarimana is the Provost Distinguished Associate Professor at Georgetown University, Washington DC; his email is jph35@georgetown.edu. The authors thank Shardul Oza, Roxanne Oroxom, and Anthony Mwambanga for exceptional program management and research assistant support. The authors thank Christina Brown, Aidan Eyakuze, Baruani Mshale, Risha Chande, Youdi Schipper, and anonymous reviewers from the RISE Intellectual Leadership Team for their helpful comments and suggestions. Funding is provided by Research into Improving Systems of Education (RISE) and Twaweza. A supplementary online appendix for this article is available at The World Bank Economic Review website.

1 This has inspired a growing body of research in developing countries on the impact of reforms on public service delivery outcomes (Grindle 1997; Galiani, Gertler, and Schargrodsky 2008; Hanushek, Link, and Woessmann 2013; Faguet and Pöschl 2015; Schuster, Meyer-Sahling, and Mikkelsen 2020; Zarychta, Grillos, and Andersson 2020).

© The Author(s) 2022. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

Second, a related and more recent body of research in economics inspired by studies of private sector firm productivity (Bloom et al.
2013) has generated tools to measure managers' behavior (Bloom and Van Reenen 2007), and evidence of their impact on productivity in the public sector (Bloom et al. 2015; Pepinsky, Pierskalla, and Sacks 2017). Other related work examines how the degree of autonomy (Rasul and Rogger 2018; Rasul, Rogger, and Williams 2019), the role of information (Banerjee et al. 2020), and the structure of incentives across the public sector hierarchy (Deserranno et al. 2021) shape public agencies' productivity. This paper builds on this research, measuring what local government managers do and exploring which practices explain variation in learning outcomes in Tanzania. The context is one where recent public sector reforms have shifted monitoring responsibilities away from central government bureaucrats to local governments—giving local government managers more discretion in specific domains of policy implementation. For this purpose, we conducted a detailed management survey in 2019—taking inspiration from the World Management Survey (WMS) (Bloom and Van Reenen 2007)—of District Education Officers (DEOs) and their staff in each of the 184 Local Government Authorities (LGAs) in the country. The Tanzanian education system is decentralized and DEOs are responsible for implementing government policies at the LGA level, with an average of 97 primary schools per LGA. DEOs have limited de jure discretion at the school level, since central government decides on the curriculum, resource allocation, and the hiring, firing, and promotion of teachers. However, they have substantial discretion in the supervision of schools, and in how they manage the staff who directly report to them.
For this reason, Ward Education Officers (WEOs), who report directly to the DEOs and act as a conduit of communication between the DEO and all schools in the ward (there are typically 4–5 schools per ward), were also surveyed. By collecting information from a cadre of staff that the manager supervises, this study is able to observe management practices in more detail than the typical manager-focused survey. These survey results are then combined with rich sources of secondary data, including student performance in national standardized exams in 2012 and 2019, 2002 census data, and data from a nationally representative household survey conducted in 2015. This paper makes a number of empirical contributions. First, it documents considerable variation in DEO management practices. It also documents positive and significant associations between aggregates of these measures, which increase confidence in the construct validity of the fielded measures. It further validates these measures by showing that a particular set of measures of managerial behavior is higher in regions supported by two large donor programs focused on boosting related aspects of governance. Second, it employs flexible machine-learning methods to examine the importance of management practices in explaining variation in learning outcomes. The model also includes historical socioeconomic characteristics—such as employment rates, access to government facilities, parents' level of education, parents' investment in their child's education, and child anthropometric measures—and the baseline academic performance of each LGA, all of which might be correlated with both current student test scores and current management practices. Given the limited number of observed units and the large number of managerial practices and indicators of socioeconomic status, the paper uses a random forest (RF) with a permutation-based variable-importance algorithm to perform feature selection.
RF allows one to explore a wide range of predictors, even when the number of observations is smaller than the number of predictor variables, since only a subset of predictors is used to construct each tree in the ensemble. K-fold cross-validation is then used to tune the parameters of the model, preventing overfitting while ensuring the model provides a reasonable approximation of the data-generating function. Next, the analysis isolates the subset of important features (i.e., the variables the model relied on most to make its predictions) by permuting each variable and assessing the drop in cross-validated R-squared. This approach provides a way of exploring which managerial practices matter most. Moreover, tree-based models allow for the detection of complex interactions and higher-order relationships, detecting managerial practices that may only be important under specific socioeconomic conditions.

The study finds that the observed variation in management practices explains about 10 percent of the overall variation in test scores. This magnitude is consistent with recent findings by Fenizia (2019), who notes that public sector managers in Italy explain 9 percent of overall productivity, although the empirical approach of this paper is very different.2 For comparison, Park and Hannum (2001) found that differences between teachers explain about 24 percent of the variation in student learning in China.3 In this analysis, management practices have less explanatory power than socioeconomic factors (23 percent) or prior academic achievement (16 percent). Delving further into which particular practices influence learning outcomes, the three most important practices are (a) monitoring of schools by the WEO, (b) school and teacher incentives administered by the local government manager, and (c) performance review of WEOs.
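The pipeline just described—a random forest tuned by k-fold cross-validation, followed by permutation-based variable importance—can be sketched as follows. This is an illustrative scikit-learn implementation on simulated data, not the authors' code; all parameter values and the data-generating setup are hypothetical stand-ins for the LGA-level survey data.

```python
# Sketch: random forest + k-fold tuning + permutation importance (illustrative only)
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, KFold

# Simulated setting: 180 "districts", 60 predictors, only 5 truly informative.
# shuffle=False keeps the informative predictors in columns 0-4.
X, y = make_regression(n_samples=180, n_features=60, n_informative=5,
                       noise=10.0, shuffle=False, random_state=0)

# Tune forest hyperparameters with 5-fold cross-validation to limit overfitting;
# each tree uses only a subset of predictors (max_features < number of features)
search = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=0),
    param_grid={"max_features": ["sqrt", 0.3], "min_samples_leaf": [2, 5]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
search.fit(X, y)

# Permutation importance: shuffle one predictor at a time and record the
# resulting drop in R-squared
perm = permutation_importance(search.best_estimator_, X, y,
                              n_repeats=20, scoring="r2", random_state=0)
top3 = np.argsort(perm.importances_mean)[::-1][:3]
print("Most important feature indices:", top3)
```

Because each permutation breaks only one predictor's association with the outcome, the drop in R-squared isolates how much the fitted forest relied on that predictor, which is the sense of "importance" used in the text.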
This paper joins a small but growing body of work documenting the managerial practices of middle-tier bureaucrats and linking those practices to measures of service delivery. This analysis does not include political factors, such as electoral competition and party affiliation, which potentially affect both the behavior of bureaucrats and the quality of education service delivery.4 As such, the question of this paper is narrower: it documents what local government actors do, taking the constellation of clients and politics as given. Previous work has shown a positive correlation between school-level management practices and student learning, or student value-added in learning (Di Liberto, Schivardi, and Sulis 2014; Bloom et al. 2015; Crawfurd 2017; Leaver, Lemos, and Scur 2019). Building on these papers, a new avenue of research has sought to measure and explain the performance of public sector managers who work at a level below the central government and above frontline providers. Rasul and Rogger (2018) and Rasul, Rogger, and Williams (2019), for example, examine the role of autonomy and incentives in strengthening civil servants' performance in Nigeria and Ghana, respectively. Other studies, focused on education, have demonstrated that district superintendents can meaningfully shape school quality in their district (Meier and O'Toole 2002; Lavy and Boiko 2017; Walter 2018). These results also contribute to a long-standing literature on street-level bureaucracy (Lipsky 1980; Brodkin 2011) in two important ways: first, by extending it to a domain that sits between client-facing bureaucrats (teachers and head teachers) and the policy makers and managers at the central level, and second, in spatial coverage, by collecting detailed management practices for the universe of district education managers in Tanzania. Two features of the context are particularly relevant to the degree of discretion that drives the variation in measured behaviors.
First, the process of decentralization has arguably increased the degree of discretion exercised by remotely located district education managers. Second, and perhaps more important, education services are structured so that the ministry of education makes policy, but a different ministry, of local and regional government, implements it. This gives street-level bureaucrats and their immediate district-level supervisors more latitude in interpreting their mandate. In addition to the empirical results, this paper shows the value of employing machine-learning techniques, which allow one to take a descriptive, data-driven approach to selecting the most important practices. Ordinary least squares (OLS) estimates are easier to interpret, but impose a strict functional form when modeling the relationship between variables. This paper instead opts for more flexible models capable of capturing nonlinearities in how different managerial practices relate to education outcomes.

2 The author exploits the rotation of managers, using a fixed effects model to "decompose productivity into the components due to office characteristics, manager effects, and time effects."
3 Their approach was to decompose the variance in test scores at the school and teacher level.
4 A related literature examines the political determinants of bureaucratic quality, such as patron–client relations (Jiang 2018) and political competition and accountability (Gulzar et al. 2017).

2. Context

Over the past three decades, Tanzania has undergone a gradual process of decentralization, in which responsibility for service delivery has shifted from central to local government, and now falls under a separate line ministry: the President's Office for Regional Administration and Local Government (PO-RALG) (Gershberg and Winkler 2004).
Tanzania is divided into 184 LGAs, commonly known as districts, which represent the most important locus of authority and resources in service delivery, especially in education, health care, agricultural extension, local water supply, and roads. In this section we describe the organization of LGAs and the constellation of stakeholders, programs, and policies that potentially shape the behavior of this important set of actors in the education system.

2.1. LGA Organization and DEO Roles and Responsibilities

Each LGA has a District Executive Director (DED), appointed by PO-RALG, who is the chief administrative officer in the district. The DED supervises a set of departments responsible for education, health, agriculture, water, and roads services. For education, each LGA has a primary and a secondary school DEO, who are respectively the department heads for primary and secondary education services in the district. The median LGA has 95 primary schools, with a range of 13 to 275. Although their role is quite broadly defined, specific responsibilities include monitoring of schools, communication with schools, ensuring delivery of and adherence to the curriculum, ensuring quality teaching, administration of standardized examinations, coordination and communication with donors and local government, and transfer of resources such as textbooks. For the purposes of this paper, we focus on the primary school DEOs. Despite their broad responsibilities for education service delivery, DEOs in fact have limited control over resources, and no control over the curriculum. Schools receive capitation grants directly from central government, which also decides the allocation of new teachers and directly remunerates teachers (Gershberg and Winkler 2004).
The allocation of other resources such as textbooks and instructional materials to schools is supposed to be governed by formal rules.5 Consequently, only about 5 percent of the overall LGA budget for the primary education department is directly under the discretion of the DEO (see table S1.1 in the supplementary online appendix). Nonetheless, there are key areas where DEOs have considerable discretion to influence education performance. First, they manage a cadre of staff who report directly to them. The office of the DEO is legally entitled to have, at a minimum, five staff members who report directly to the DEO (see fig. S1.1 in the supplementary online appendix), as well as the WEOs. WEOs are required to visit primary and secondary schools in their ward on a regular basis, and act as a conduit and communication channel between schools and the DEO.6 At the time of the survey, there were 3,915 wards in the country (a median of 20 wards per LGA), and typically four to five primary schools and one secondary school per ward. DEOs have discretion in the tasks that they assign their staff—which could range from observing teachers in the classroom and distributing lesson plans to organizing training—and there could be variation in how well they supervise and motivate their staff to perform these tasks. A previous study that surveyed every WEO in 23 LGAs found large differences between LGAs in the activities performed by WEOs (Cilliers and Oza 2020). For example, in one LGA all the WEOs reported that they check whether a teacher is present, compared to another LGA where only a quarter of WEOs reported performing this activity.

5 Head teachers are required to submit information about the school annually to the Basic Education Management Information System (BEMIS) database, including textbook needs.
However, unlike capitation grants, DEOs receive those books from central government and administer the allocation to schools using BEMIS enrollment records (the current ratio is 1:3, with aspirations to make it 1 textbook per student).
6 Although the DEOs do not officially appoint WEOs (the letter of appointment comes from the Regional Administrative Secretary), they are responsible for proposing candidates, so they have a lot of influence over appointments. They can also put in requests for dismissals.

Although WEOs report directly to the DEOs, they still have discretion in how they perform their tasks, partly due to their geographic proximity to schools and distance from the district office. For example, 95 percent of WEOs either agreed or strongly agreed with the statement: "My work colleagues and I are granted enough discretion and autonomy to complete our tasks." But they often also face the challenge of reporting to multiple principals: 75 percent of WEOs either agreed or strongly agreed with the statement: "I find it difficult to complete all my tasks because different people expect different things of me." This suggests that the DEO does not have exclusive control over the activities of the WEO (Cilliers and Oza 2020). Second, DEOs have informal control over whether and how much certain resources are deployed. At a very high level, DEOs play an important role in the consultations, planning, and implementation of education-related local government capital development programs.7 Moreover, they play a key coordination role with donor-funded projects—which affords them some control over how and where donor resources are deployed in their LGA—and can exert pressure on the locally elected Ward Development Councils to mobilize the community to provide more resources to schools.
Finally, they have power over the following areas of human resource management at the school level: approval of teacher transfers within the LGA, promotion opportunities for head teachers, teacher leave policies, and the allocation of a small pool of funding meant for teacher allowances. All of these enable a motivated DEO to generate teacher and school incentives for performance. Third, even though DEOs cannot make direct decisions over the hiring and firing of teachers and head teachers, they can exert substantial soft pressure on schools to increase professional accountability. They can direct their staff to closely supervise particular schools or teachers, and they can leverage their "bully pulpit" to publicly recognize well-performing schools or shame badly performing schools. For example, some LGAs hold annual award ceremonies to reward well-performing teachers, schools, or WEOs; and some share rankings of school performance, based on the average score in the Primary School Leaving Exam (PSLE), with all the head teachers in the LGA (Cilliers, Mbiti, and Zeitlin 2020).

2.2. Donor Involvement in Tanzania

Multilateral and bilateral donors play a pivotal role in education sector programming in Tanzania, both as large contributors of new resources that shape DEO tasks and in defining education system objectives. Two large programs are the Education Quality Improvement Program in Tanzania (EQUIP-T) and Tusome Pamoja. Both were designed to enhance learning outcomes in lagging regions of Tanzania.8 EQUIP-T operated in 9 of the 26 regions in Tanzania over a six-year period (2014–2020), covering 31 percent of schools in the country. Tusome Pamoja has operated in a different set of four regions over a five-year period (2016–2021), covering 16 percent of schools in the country. Each project has some discretion over which aspects of the education production process to address.
One of the components of the EQUIP-T program was strengthening district planning and budgeting, with the goal of improving local education sector governance.9 This included management training for WEOs; monthly meetings between WEOs, DEOs, and School Quality Assurance Officers; and providing motorbikes and stipends to WEOs (so that they could conduct monitoring visits and report to district offices). The program worked within the government's financing system, so all of the implementation funding was wired to the central government, which in turn transferred it to the LGAs to spend. Tusome also engages in a number of activities, such as teacher training, training WEOs on how to provide professional development to teachers, and developing new materials related to basic skills instruction.

7 Local Government Development Grants pay for capital improvements identified by wards and approved by the LGA District Council. They include payments for new infrastructure or renovations to existing infrastructure.
8 The regions were selected on the basis of their poor education performance and resource constraints.
9 Other activities include (a) improved access to quality education, (b) strengthened school leadership and management, (c) stronger community participation and demand for accountability, and (d) improved learning and dissemination.

3. Data

3.1. Primary Data Collection

Between September and November 2019 we visited each of the 184 LGAs of mainland Tanzania and interviewed the primary school DEO and WEOs from two randomly selected wards in the LGA. The DEO survey instrument builds on the Development World Management Survey (D-WMS), originally developed by Lemos and Scur (2016) to survey officials in charge of managing schools and hospitals in developing countries.
Prior to designing the instrument, we conducted unstructured interviews with DEOs and other education officials in four different LGAs, to better understand their day-to-day activities and management practices. These insights informed the creation of a management survey instrument relevant to the Tanzanian context, which was further piloted and refined in six different LGAs. In addition to adapting the instrument to the Tanzanian context, our approach differs from the standard D-WMS in three important ways. First, whereas the D-WMS requires enumerators to rate managers on a five-point scale along a series of different dimensions, we fielded each dimension as a series of yes/no questions. The purpose was to minimize enumerator discretion, since this forces them to separately judge each component that enters into the scale. See table S2.1 in the supplementary online appendix for an example of the distinction between the two methods. Second, we drew a clear distinction between the DEOs' management of their staff and their management of schools and teachers. To this end, we added questions on how the DEOs motivate schools and teachers to perform well. Third, the survey was conducted in person, rather than over the phone. This was necessary to secure the DEO's undivided attention for the 1–1.5 hours required to complete the survey. There were ten LGAs where we were unable to survey the DEO, because the DEO was away during the period of data collection, the DEO did not consent to be surveyed, or a new DEO had not yet been appointed. The WEO survey included questions relating to the management practices of the DEO: setting targets, performance review, rewarding performance, and frequency of interaction with the DEO. We also asked how frequently WEOs visit schools and about the activities that take place during those visits.
Given the proximity of WEOs to the schools, the WEO survey can provide a plausibly more reliable source of information on the actual monitoring activities that take place in the LGA, as well as how the DEOs manage key staff. We took the following steps to ensure high-quality data collection and consistency in answers. First, two enumerators participated in the survey. One enumerator would ask the DEO open-ended questions while the other enumerator rated the management practices. The interviewer followed a clear script for both the open-ended questions and additional prompting questions to ask, if necessary. We decided to separate the two tasks to prevent the enumerator responsible for recording the DEO’s responses from inadvertently asking leading questions. Second, we trained enumerators extensively on the instruments, with several days spent exclusively on the DEO interview. For one training exercise, enumerators listened to and coded recordings of DEO interviews conducted during the pilot. Third, we recorded 115 out of the 174 DEO interviews that were conducted. Dar es Salaam–based survey supervisors, experts in the DEO survey, would listen to the recordings every evening and provide feedback/training to the DEO survey teams if they identified problems with the interviewing approach or how responses were coded. The data collection section in the supplementary online appendix provides further details on our data quality assurance protocols. For our descriptive and regression analyses, we construct each management practice score by simply taking the mean of all the indicators relating to a specific practice. Furthermore, we follow Bloom and Van Reenen (2007) and group the DEO practices into four categories, taking the mean of the z-score for each management practice. 
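The two-step aggregation just described (indicator means within each practice, then the mean of z-scores across practices) can be sketched as follows. The indicator names and data here are hypothetical illustrations, not the actual survey variables.

```python
# Sketch of the score construction: practice = mean of yes/no indicators,
# category = mean of z-scores of its constituent practices (illustrative names)
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 174  # number of surveyed DEOs

# Hypothetical yes/no indicators for two practice families
df = pd.DataFrame({
    "target_learning": rng.integers(0, 2, n),
    "target_written": rng.integers(0, 2, n),
    "monitor_reports": rng.integers(0, 2, n),
    "monitor_visits": rng.integers(0, 2, n),
})

# Step 1: each practice score is the mean of its indicators
df["targets"] = df[["target_learning", "target_written"]].mean(axis=1)
df["monitoring"] = df[["monitor_reports", "monitor_visits"]].mean(axis=1)

# Step 2: each category score is the mean of the z-scores of its practices
def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

df["category_score"] = pd.concat(
    [zscore(df["targets"]), zscore(df["monitoring"])], axis=1
).mean(axis=1)
```

Standardizing each practice before averaging puts practices measured on different scales on a common footing, which is the rationale behind the Bloom and Van Reenen (2007) style aggregation the text follows.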
The categories include setting targets, operations (budget, curriculum, training, and resource allocation), monitoring (collecting information, sharing performance indicators, documenting problems), and incentives (rewarding schools and rewarding teachers). We use the same aggregation for the WEO survey, except that we draw a distinction between how they are managed by the DEOs (including targets, monitoring, and incentives), and the tasks that they perform (including monitoring of schools). We should emphasize that there might be important measures of bureaucratic quality that we could not measure, including individual preferences and elements of bureaucratic culture (e.g., building supportive relationships for teamwork and problem solving, facilitating an environment that allows for learning and adjustment after mistakes are made). As in all surveys, we faced trade-offs in the selection of variables. Our motivation for focusing on the management indicators captured in the World Management Survey is twofold. First, we benefit from a literature that has tested the validity of these measures in different contexts. Second, these are "process-related" management practices that, in principle, could be improved upon with public sector reforms. Nonetheless, our conclusion about the contribution of mid-tier management to performance indicators should be considered a lower bound, since it does not consider all possible dimensions of bureaucratic quality.

3.2. Secondary Data Sets

This study draws on multiple sources of secondary data. First, we have data on every school's average performance in the PSLE, from 2012 and 2019. The PSLE is a standardized national exam administered to all grade seven students in the country.
The examined subjects are mathematics, English language, science, social studies, and Kiswahili.10 Every student receives a score ranging between 0 and 250 (scoring up to 50 on each subject). We use these data both as an objective measure of managerial performance and (for outcomes in prior years) as a proxy for other time-invariant unobserved determinants of learning outcomes. Second, we make use of two additional data sets, the 2002 Population Census and the 2015 Uwezo household survey, which capture the average socioeconomic status of households in each LGA. These data allow us to control for socioeconomic determinants of learning outcomes. We use the following variables from the 2002 census: household head literacy rate, highest education in the household, unemployment, school enrollment, ownership of assets, and rural status. The 2015 Uwezo household survey was conducted with 197,451 households, in 4,750 villages, in every LGA in the country. This data set includes basic socioeconomic characteristics such as the parents' level of education, employment status, wealth, income, asset ownership, access to water and electricity, and children's anthropometric measures. It also includes data on whether a child was enrolled in preschool and whether the child attends a public or private school. For each of the selected variables we construct the average at the LGA level, and merge these aggregates with the data from the management survey.11

3.3. Descriptive Statistics

Tables S3.1 and S3.2 in the supplementary online appendix provide basic descriptive statistics of our sample, including demographic characteristics of our survey respondents, as well as the examinations, census, and Uwezo data. Out of a total of 184 LGAs, we were able to survey 174 DEOs and 363 WEOs.
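The ward-level merge described in footnote 11, where wards that split inherit the aggregate of their "mother" ward before averaging up to 2019 LGA boundaries, can be sketched as follows. The column names, ward labels, and values are hypothetical illustrations, not the actual data.

```python
# Illustrative sketch of the footnote-11 merge (hypothetical names and values)
import pandas as pd

# Older (e.g., 2002-era) aggregates, keyed by the original "mother" ward
old = pd.DataFrame({
    "mother_ward": ["A", "B"],
    "literacy_rate": [0.62, 0.48],
})

# 2019 ward roster: ward B has since split into B1 and B2
crosswalk = pd.DataFrame({
    "ward_2019": ["A", "B1", "B2"],
    "mother_ward": ["A", "B", "B"],
    "lga_2019": ["LGA-1", "LGA-2", "LGA-2"],
})

# Each 2019 ward inherits its mother ward's aggregate...
merged = crosswalk.merge(old, on="mother_ward", how="left")

# ...and ward-level values are then averaged up to the 2019 LGA level
lga = merged.groupby("lga_2019", as_index=False)["literacy_rate"].mean()
```

A left merge keeps every 2019 ward even if an older aggregate is missing, which makes gaps in the crosswalk visible as NaN rather than silently dropping wards.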
The DEOs are typically more educated than the WEOs, and the majority have a background in education: 62 percent of DEOs have previous experience as a teacher, compared to 71 percent of WEOs. Surprisingly, DEOs have limited experience as DEO in their current LGA: the average years of experience is 2.24, and over a quarter of them have not yet completed a year as a DEO in their current LGA.12

10 https://www.necta.go.tz/psle Accessed 9 June 2020.
11 One challenge in merging is that the number of LGAs in the country has expanded by roughly 50 percent in the past decade. There were 132 and 159 LGAs in 2002 and 2015 respectively, compared to 184 in 2019. We overcame this problem by first merging at the ward level. For wards that never split, this allows us to determine which LGA a household would have belonged to, given the 2019 subdivisions. For wards that split, we assign the same aggregate measure of the mother ward in the earlier data sets.

Figures S3.1 and S3.2 in the supplementary online appendix show histograms of the families of management practices as measured by the DEO and WEO surveys respectively, as well as the total number of times a WEO visited a school in the past two weeks, and the number of activities performed during these visits. Moreover, an examination of the separate indicators (tables S3.3 to S3.9 in the supplementary online appendix) reveals that there is a wide range of initiatives taken by DEOs and tasks performed by WEOs that could conceivably improve learning, but these practices and tasks are not universally employed. Key insights include the following:

(1) DEO targets (table S3.3 in the supplementary online appendix). All DEOs have at least one target, and most of them (67 percent) are focused on student learning.
(2) DEO operations (table S3.4 in the supplementary online appendix). Almost all DEOs (97 percent) distribute resources such as textbooks in a systematic way, following government guidelines, but only half of DEOs indicated that they go beyond government guidelines to address specific needs faced by schools in the district. Almost all DEOs report following the standard government curriculum, but a smaller fraction (35 percent) go further in helping teachers develop lesson plans and schemes of work. Smaller fractions of DEOs actually organize training for teachers (26 percent) or head teachers (21 percent).

(3) Rewarding school and teacher performance (table S3.6 in the supplementary online appendix). Almost all DEOs reported rewarding schools or teachers for good academic performance, but fewer than 30 percent of them communicated these rewards ex ante to the teachers and schools. Award ceremonies were organized in their LGA by 30 percent of DEOs.

(4) Monitoring by the DEO (table S3.8 in the supplementary online appendix). There is variation in how deeply the DEO engages with the information created by the WEOs. Almost all WEOs write reports, but a smaller fraction (70 percent) share them with their DEO, 52.3 percent believe that the DEO actually reads them, and 45.2 percent believe that the reports are acted upon.

(5) Monitoring by the WEO (table S3.8 in the supplementary online appendix). WEOs interact frequently with schools—they visited four schools on average in the past two weeks—but there is variation in what they do when they visit: 63 percent check teacher attendance, 42 percent actually record teacher attendance, 41 percent observe teaching in the classroom, 45 percent assess student learning, but only 4 percent record student learning. This information is therefore unlikely to flow back to the DEO.

(6) Performance review and rewards by DEO (table S3.9 in the supplementary online appendix).
Of the WEOs, 82 percent state that they can get rewarded if they perform well, and 75 percent believe that the reward system is fair. The most common type of award is a bonus. This is surprising, since there are no official performance pay mechanisms in PO-RALG, and DEOs have only a small discretionary budget.

There is a moderate degree of correlation between the families of management practices, as measured in the DEO survey (see fig. S3.3 in the supplementary online appendix). The Targets score is moderately correlated with Monitoring (correlation coefficient of 0.379) and Operations (0.412), and weakly correlated with Incentives (0.059). The Incentives score is also weakly correlated with Monitoring (0.14) and Operations (0.02).

12 Note that DEOs' average years of experience as a DEO in any LGA is larger, since transfers are relatively common: over a quarter (28 percent) of DEOs have worked as a DEO in a different LGA before taking their current position.

3.4. Validation

As a further test of the validity of our instruments, we document co-variation between management practices and donor-driven governance reform. Table 1 shows that DEO management practices and activities performed by WEOs differ in the regions where two large donor-funded programs were operating in the past five years (see section S4 in the supplementary online appendix for a more detailed discussion).

Table 1. DEO and WEO Management Practices—by Donor Involvement

                                        EQUIP-T               Tusome
                                   Coefficient    SE     Coefficient    SE       R2      N
Panel A. DEO survey
  Overall                             0.247     0.278       0.032     0.180    0.013   174
  Targets                             0.379     0.263       0.132     0.194    0.030   174
  Operations
    Overall                           0.299     0.218       0.172     0.182    0.019   174
    Curriculum                        0.443     0.204      −0.359     0.188    0.080   174
    Resource allocation               0.350     0.180       0.644     0.243    0.059   174
    Teacher allocation               −0.355     0.195       0.079     0.307    0.033   174
    Training                          0.258     0.242       0.025     0.208    0.015   174
    Budget                            0.085     0.227       0.061     0.279    0.002   174
  Monitoring
    Overall                           0.270     0.213       0.096     0.207    0.015   174
    Collect information               0.456     0.201      −0.225     0.169    0.066   174
    Identify problems                 0.321     0.284       0.187     0.284    0.021   174
    Number of performance indicators  0.120     0.181       0.339     0.197    0.014   174
    Share performance indicators     −0.241     0.174      −0.066     0.124    0.012   174
  Incentives
    Overall                          −0.320     0.252      −0.381     0.127    0.029   174
    Reward schools                   −0.296     0.237      −0.222     0.147    0.019   174
    Reward teachers                  −0.254     0.213      −0.432     0.155    0.028   174
Panel B. WEO survey
  Overall                             0.739     0.183       0.591     0.164    0.183   184
  Targets                            −0.040     0.222      −0.067     0.201    0.001   184
  Resources                           1.401     0.162       1.329     0.113    0.601   184
  Activities                          0.368     0.124       0.320     0.152    0.056   184
  Incentives
    Overall                          −0.239     0.179      −0.084     0.169    0.020   184
    Performance review               −0.264     0.170       0.148     0.132    0.042   184
    Performance rewarded             −0.091     0.134      −0.273     0.148    0.016   184
  Monitoring (WEO)
    Overall                           0.779     0.183       0.369     0.238    0.186   184
    Meetings                          0.807     0.161       0.480     0.250    0.219   184
    Reporting                         0.375     0.164       0.080     0.166    0.052   184
  Monitoring (schools)
    Overall                           0.187     0.136       0.170     0.145    0.015   184
    Number of visits                 −0.175     0.144      −0.273     0.163    0.021   184
    Activities during school visits   0.433     0.161       0.508     0.131    0.083   184

Source: Authors' own analysis based on primary data collection.
Note: Each row represents a separate regression, with row titles referring to the dependent variables. Each management practice is constructed by taking the mean of all the indicators relating to the practice, normalized to have a mean of 0 and SD of 1. (See tables S3.3 to S3.9 in the supplementary online appendix for the indicators for each practice.) The "Overall" score is the mean of all the practices within the same category. The two independent variables in the model, EQUIP-T and Tusome, are dummy variables equal to 1 if the local government area is in a region where the respective donor programs were implemented. Standard errors are clustered at the regional level. DEO, District Education Officer; WEO, Ward Education Officer; EQUIP-T, Education Quality Improvement Program Tanzania.

The overall management score, as captured in the DEO survey, is 0.25 standard deviations (SD) larger in the EQUIP-T regions than in regions served by neither donor. There appears to be better target setting, operations, and monitoring, but weaker school/teacher incentives. Few of these differences are statistically significant at conventional levels, but the magnitudes are large and we have limited statistical power given that there are only 26 regions in the country.13 The difference is starker in the WEO survey, with a difference in the overall score of 0.74 SD, and the largest differences in access to resources (1.4 SD) and monitoring (0.78 SD). Moreover, WEOs in both donor-funded regions report performing more tasks overall, performing more activities when they visit schools, and having access to more resources. All of these patterns are consistent with the governance reforms, resource transfers, and donor activities highlighted in the Context section.

4. Empirical Strategy

We employ a random forest model to identify the managerial practices that are most predictive of academic performance.14 Our logic is that managerial practices that are predictive of good (bad) academic performance potentially improve (inhibit) the ways schools function and thus indirectly influence the students in these schools.
Predictive power provides a useful starting point in determining which managerial practices matter. A random forest averages across many regression trees when making a prediction, where each tree leverages a random subsample of the training data and variables. The advantage of the random forest over other modeling methods is that it can capture both nonlinear relationships and interactions in the data, offering a better approximation of the data-generating process. Moreover, by randomly sampling features when generating trees, a random forest reduces the influence of dominant features (i.e., features that are always strong predictors of the outcome and would thus dominate how each individual decision tree is grown; see James et al. (2013)). This allows the model to explore different sources of variation in the training data more effectively. Our outcome of interest is each LGA's average exam performance in 2019. We also include in the model the socioeconomic characteristics of each LGA, as captured in the 2002 census and 2015 household survey, as well as exam performance data from 2012.15 In contrast to typical regression analysis, we do not preprocess the data to construct aggregate measures for each management practice and an index for socioeconomic status; rather, we include each indicator separately. This allows the model to be as flexible as possible and further reduces the discretion of the researcher. When running the models, we first split our managerial sample into a training (122 LGAs) and a test (52 LGAs) data set. We then employ k-fold cross-validation on the training data to tune the hyperparameters of the random forest.16 We then compute the out-of-sample R2 on the held-out test data. Our best performing model yields an out-of-sample R2 of 46.93 percent, suggesting that it offers a good approximation of the data-generating process.
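The split-and-tune procedure above can be sketched in a few lines. The following is an illustrative Python analogue (the authors' analysis uses R's ranger package); the data are synthetic placeholders for the LGA indicators, and the grid values are assumptions rather than the paper's actual settings:

```python
# Sketch of the pipeline: hold out a test set, tune the two hyperparameters
# the authors tune -- max_features (ranger's mtry) and n_estimators (trees) --
# by 5-fold cross-validation, then score on the held-out data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(174, 40))                       # 174 LGAs, many separate indicators
y = 0.5 * X[:, 0] + rng.normal(scale=0.5, size=174)  # stand-in for 2019 exam scores

# Roughly the paper's 122-train / 52-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=52, random_state=0)

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"max_features": [5, 10, 20],         # illustrative values
                "n_estimators": [100, 300]},
    cv=5,
    scoring="r2",
)
grid.fit(X_train, y_train)

# Out-of-sample R^2 on the held-out test data.
print(round(grid.score(X_test, y_test), 3))
```

With real data one would report this held-out R2, as the authors do with their 46.93 percent figure; the synthetic data here will of course produce a different number.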
13 The donor-funded programs are implemented at a regional level, and we cluster our standard errors at this level.
14 For more on machine learning, see James et al. (2013).
15 We do not include exam performance from more recent years, since recent exam performance could itself be a result of current management practices. In particular, the donor-funded programs that instated governance reforms were directly targeted at LGAs that performed badly in the PSLEs prior to 2014.
16 Specifically, we tune the mtry parameter (the number of variables randomly selected when constructing each tree in the random forest ensemble) and the trees parameter (the number of trees generated and then averaged across). We hold all other hyperparameters at their default values. We use the ranger package in the R statistical programming language (Wright and Ziegler 2017), with 5 folds when generating the cross-validation samples.

In addition to testing for predictive accuracy, we leverage interpretable machine-learning techniques to determine (a) variable importance and (b) functional relationships. The former determines which variables matter for the prediction task, whereas the latter captures how those variables relate to the outcome (i.e., the marginal effect). Variable importance (VI) captures the extent to which the model relies on a particular variable when making a prediction. If a variable is deemed "important," then removing its information from the model results in a reduction in predictive accuracy.
Variable importance offers a way to look through the "eyes of the model" to see which variables it relied on to make its prediction. We rely on a model-agnostic technique that uses permutations to calculate variable importance. The method takes as input the training data $X^{train}$ and outcome $y^{train}$, a trained model $\hat{f}(X^{train})$, and an error measure $L(y^{train}, \hat{f}(X^{train}))$, which in our case is the model fit ($R^2$). We then isolate a variable $j$ in $X^{train}$ and permute the order of that variable across all $i$ observations, effectively breaking the association between $X_j^{train}$ and $y^{train}$.17 If

$$L(y^{train}, \hat{f}(X_{j\text{-}permuted}^{train})) < L(y^{train}, \hat{f}(X^{train})),$$

then we can conclude that $j$ is important to the model in generating its prediction. In other words, we scramble each variable, one at a time, and check whether the model can still predict accurately without it. If not, we conclude that the variable was important in generating the model's prediction. We calculate the magnitude of that importance as equation (1):

$$VI_j = L(y^{train}, \hat{f}(X^{train})) - L(y^{train}, \hat{f}(X_{j\text{-}permuted}^{train})), \qquad (1)$$

where the importance of variable $j$ for model $\hat{f}(X^{train})$ is the reduction in predictive accuracy when moving from the nonpermuted to the permuted data. We repeat this process for all $p$ variables in the training data, $j \in \{1, 2, \ldots, p\}$. Given that the permutations are random, we permute each variable $j$ multiple times, generating a distribution for every $VI_j$. Thus, we report the variable importance both as a point estimate, reflecting the average across all permutations, and as an interval, reflecting the 95 percent interval of the $VI_j$ distribution. Finally, we report the variables in decreasing order of importance. We innovate on the permutation-based variable importance technique by introducing the idea of a "cluster" permutation.
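The permutation logic behind equation (1) can be sketched as follows; the function is written so that permuting several columns at once also yields the cluster variant discussed next. The data, cluster definitions, and repeat count are illustrative assumptions (the authors work in R):

```python
# Permutation importance: permute one column (eq. 1) -- or every column in a
# cluster at once (eq. 2) -- without refitting, and record the drop in R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

def perm_importance(model, X, y, cols, n_repeats=30, seed=0):
    """Return the mean VI and a 95 percent interval for a set of columns."""
    rng = np.random.default_rng(seed)
    base = r2_score(y, model.predict(X))    # L(y, f(X)) on unpermuted data
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        perm = rng.permutation(len(X))
        Xp[:, cols] = X[perm][:, cols]      # break the X_j <-> y association
        drops.append(base - r2_score(y, model.predict(Xp)))
    drops = np.array(drops)
    return drops.mean(), np.percentile(drops, [2.5, 97.5])

rng = np.random.default_rng(1)
X = rng.normal(size=(122, 10))                                # 122 training LGAs
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=122)   # cols 0-1 informative
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

vi_single, _ = perm_importance(model, X, y, cols=[0])         # equation (1)
cvi_signal, _ = perm_importance(model, X, y, cols=[0, 1])     # cluster version
cvi_noise, _ = perm_importance(model, X, y, cols=[8, 9])
print(cvi_signal > cvi_noise)  # the informative cluster matters far more
```

Repeating the permutation and keeping the full distribution of drops is what yields the point estimates and 95 percent intervals plotted in the figures.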
As noted previously, many of the managerial practices captured in the survey fall within general categories, such as training, curriculum, and budgeting. These categories, or clusters, are composed not of any one managerial practice, but of many. To understand the importance of a specific managerial practice, we propose permuting all variables associated with that practice to assess its importance in generating a prediction. If $X_c^{train} \subseteq X^{train}$, where $c = 1, \ldots, C$ and $C$ is the total number of variable clusters, then the cluster variable importance is given by equation (2):

$$VI_c = L(y^{train}, \hat{f}(X^{train})) - L(y^{train}, \hat{f}(X_{c\text{-}permuted}^{train})). \qquad (2)$$

The key difference between equations (1) and (2) is that more than one variable is permuted at a time. Likewise, we permute a cluster multiple times to generate an estimate of the variation in $VI_c$. We view the cluster variable importance (hereafter "CVI") approach as superior to standard dimension-reduction techniques because it (a) does not require us to discard information and (b) allows for interpretation by further unpacking the variables contained within the cluster.

17 See Fisher, Rudin, and Dominici (2019) for more information on permutation-based variable importance.

5. Results

Figure 1 shows the cluster variable importance for the socioeconomic and management variables, as well as the variable importance of the 2012 average test scores. The R2 decreases by 0.1 when the managerial variables are permuted, relative to reductions of 0.23 and 0.16 respectively when the socioeconomic
variables or the 2012 test scores data are permuted. The management practices thus account for 10 percent of the variation in test scores, even after accounting for the contribution of socioeconomic characteristics and historical exam performance of an LGA to current performance. The intuition is similar to the idea of controlling for observable characteristics in regression analysis.18

Figure 1. Variable Importance for the Three Main Variable Subsets
Source: Authors' own analysis based on primary data collection, the Primary School Leaving Exam in 2012 and 2019, the 2002 Population Census, and the 2015 Uwezo household survey (see the Data section).
Note: The figure plots the cluster variable importance for the two main categories of variables—socioeconomic factors and management practices—and the variable importance for LGA-level average performance in the 2012 standardized exam. The points capture the average Cluster Variable Importance for each cluster (or the average Variable Importance for the standardized exam), and the bars reflect the 95 percent interval of the permuted distribution.

Figure 2 shows the cluster variable importance of the different managerial practices that are most predictive of performance. The three most important managerial practices are (i) school visits conducted by the WEO, (ii) school and teacher incentives provided by the DEO, and (iii) performance review of the WEO. Together, these results suggest that rewarding performance—of teachers, schools, or WEOs—and monitoring of schools are two key ingredients of management practices that improve school performance, at least in the Tanzanian context.
Figure 3 presents the partial dependencies (marginal effects) for each variable composing one of the three practices with the highest CVI.19 The bar plots capture the predicted change in the outcome when the specific variable is moved from its minimum to its maximum value. The line plots capture the marginal prediction for each variable as we alter values along the variable's range. All features have been transformed to have a minimum value of 0 and a maximum value of 1. We bold the top three variables in each cluster set to make it easier to associate the magnitude changes (left) with the marginal predicted values (right).

18 Note that the permutation only takes place after the model has been fitted using all of the variables. Since we permute each cluster one at a time, the information in the other variable sets is held at observed values. Thus, the drop in predictive accuracy can only be attributed to the variables we are permuting, given that the other variables are held at their observed values.
19 A partial dependency is derived by setting a single variable ($x_1$) to a specific value (e.g., $x_1 = 1$), holding all other variables in the model ($x_2, \ldots, x_p$) at their observed values. We then generate the model's average prediction when $x_1 = 1$ for all observations, $\bar{\hat{y}}_{x_1=1} = E[\hat{f}(x_1 = 1, x_2, \ldots, x_p)]$. We do this for all values along $x_1$'s range to generate a prediction curve that captures the model's average predictions as the values of $x_1$ are altered. See Zhao and Hastie (2021) for further details.
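The partial-dependency calculation described in footnote 19 can be sketched as follows; the model and data are synthetic stand-ins for the fitted forest and the LGA indicators:

```python
# Partial dependence: fix one feature at a grid value for every observation,
# average the model's predictions, and sweep the grid over the feature's
# [0, 1] range (the features in the paper are scaled to [0, 1]).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def partial_dependence_curve(model, X, j, grid):
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v                            # set x_j = v for all observations
        curve.append(model.predict(Xv).mean())  # E[ f(x_j = v, x_-j) ]
    return np.array(curve)

rng = np.random.default_rng(2)
X = rng.uniform(size=(150, 5))                  # features already on [0, 1]
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=150)
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

grid = np.linspace(0, 1, 11)
pd_curve = partial_dependence_curve(model, X, 0, grid)
print(pd_curve[-1] > pd_curve[0])  # curve rises in the informative feature
```

The curve traced by `pd_curve` corresponds to the line plots in figure 3, and the difference between its endpoints corresponds to the min-to-max changes in the bar plots; scikit-learn's `sklearn.inspection.partial_dependence` implements the same idea.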
Figure 2. Cluster Variable Importance of DEO Managerial Practices and WEO Activities
Source: Authors' own analysis based on primary data collection, the Primary School Leaving Exam in 2012 and 2019, the 2002 Population Census, and the 2015 Uwezo household survey.
Note: The figure plots the cluster variable importance for each management practice, including data from both the District Education Officer and Ward Education Officer surveys. The points capture the average Cluster Variable Importance for each cluster, and the bars reflect the 95 percent interval of the permuted distribution. Clusters shown, in decreasing order of importance: WEO visits; school and teacher incentives; WEO performance review; WEO targets; setting targets; WEO incentives; resource allocation; training; performance reviewed by DED; WEO reporting; monitoring; WEO meetings; budgeting; share performance indicators; teacher allocation; curriculum; WEO resources. See tables S3.3 to S3.9 in the supplementary online appendix for the indicators for each management practice.

In terms of school visits, the number of times that a WEO has visited a school is the strongest predictor of performance. Moving from the fewest to the most visits changes the predicted test score by roughly 0.03 standard deviations. In terms of performance review of the WEO, whether the DEO actually discussed the performance measures is the strongest predictor of performance. Turning to incentives, whether performance incentives are based on objective indicators is highly predictive, but whether or not the incentives are financial in nature is not. The relationships identified in fig. 3 are not causal, but they are intuitive and point to future quasi-experimental or experimental work to test for causal relationships.
6. Discussion and Conclusion

This study examines the contribution of mid-tier government education managers to the quality of education in their district. For this purpose, we conducted a detailed management survey of all DEOs in Tanzania, as well as of a random sample of key staff stationed close to schools—the WEOs. We then link these data with socioeconomic attributes and exam performance data and use machine-learning techniques to identify the attributes that are the strongest predictors of performance. There are three main findings. First, we find that management practices explain 10 percent of the variation in academic performance. Second, we find that behaviors measured in the WEO survey are most predictive of learning, especially the number of times that WEOs visit a school. The other management practice that is predictive is the existence of a reward system for well-performing schools, teachers, and WEOs. These results show the merits of using machine-learning techniques with observational data when the number of predictors is larger than the number of observations.
Figure 3. Partial Dependencies of the Top Three Managerial Practices
(A) WEO Visits (number of visits, other activities, report teacher attendance, assess learning, observe teaching). (B) WEO Performance Review (performance is reviewed, reviewed on OPRAS, reviewed on exam performance, reviewed on accomplishing assigned tasks, discuss OPRAS, discuss exam performance, discuss accomplishing assigned tasks). (C) School and Teacher Incentives (school/teacher awards, award criteria based on data or on more than one indicator, awareness of criteria and rewards, financial rewards). The left panels plot the change in the predicted outcome when switching each feature from its minimum to its maximum; the right panels plot predicted standardized test scores across each feature's range.
Source: Authors' own analysis based on primary data collection, the Primary School Leaving Exam in 2012 and 2019, the 2002 Population Census, and the 2015 Uwezo household survey.
Note: The figure reports the partial dependencies (marginal effects) for each of the three practices with the highest Cluster Variable Importance reported in fig. 2.
Each panel (A, B, and C) captures a different managerial strategy. The bar plots capture the predicted change in the outcome when the specific variable is moved from its minimum to its maximum value. The line plots capture the marginal predicted values for each variable. All features have been transformed to have a minimum value of 0 and a maximum value of 1. For each panel, the variables with the largest net change are highlighted in both graphs. These variables reflect the tactics that matter most for each strategy.

It is noteworthy that two of the three most important practices are captured by the WEO survey, rather than the DEO survey. Moreover, the co-variation between management practices and donor involvement is much larger for outcomes measured in the WEO survey. Some information is likely more accurate (e.g., activities performed by the WEOs, such as the number of school visits) and less biased (e.g., whether the DEO acted on a report submitted by the WEO) when asked of the WEO rather than the DEO. Although our survey was not explicitly designed to directly compare the informativeness of these two interviews, these results suggest a modest methodological contribution of this study: the importance of interviewing managers' staff, and not only the managers themselves, in order to capture relevant aspects of managerial quality.

Of course, all of the results of this study are based on observational data and cannot be given a causal interpretation. Yet they are consistent with previous experimental and quasi-experimental studies on school and teacher incentives. Previous studies have shown that increased monitoring of schools and teachers is associated with improved school performance (Muralidharan et al. 2017), especially if combined with financial incentives (Duflo, Hanna, and Ryan 2012; Cilliers et al. 2018).
Similarly, rewarding well-performing schools has been shown to improve school performance in Tanzania, even when not combined with explicit financial incentives (Cilliers, Mbiti, and Zeitlin 2020), and teacher incentives have been shown to improve student learning in Tanzania (Mbiti et al. 2019).

Taken together, these associations suggest that improvements in the management of WEOs could improve education outcomes. WEOs are a key input into the education production function. Their job description enables them to visit schools on a regular basis, and thus to monitor schools and teachers. But there is currently wide variation in how often they visit schools, what they do when they visit, and how deeply the DEOs engage with the information produced by WEOs. Similarly, the fact that school performance is higher in LGAs where there is more substantive performance review suggests that streamlining performance review to make it easy to implement could yield gains. Future experimental work can test whether exogenous improvements in the management practices and WEO activities highlighted in this study improve student learning.

7. Data Availability

Raw and derived data supporting the findings of this study are available from the corresponding author on request.

References

Banerjee, A., E. Duflo, C. Imbert, S. Mathew, and R. Pande. 2020. "E-Governance, Accountability, and Leakage in Public Programs: Experimental Evidence from a Financial Management Reform in India." American Economic Journal: Applied Economics 12(4): 39–72.

Bloom, N., B. Eifert, A. Mahajan, D. McKenzie, and J. Roberts. 2013. "Does Management Matter? Evidence from India." Quarterly Journal of Economics 128(1): 1–51.

Bloom, N., R. Lemos, R. Sadun, and J. Van Reenen. 2015. "Does Management Matter in Schools?" Economic Journal 125(584): 647–74.

Bloom, N., and J. Van Reenen.
2007. "Measuring and Explaining Management Practices across Firms and Countries." Quarterly Journal of Economics 122(4): 1351–408.

Brodkin, E. Z. 2006. "Bureaucracy Redux: Management Reformism and the Welfare State." Journal of Public Administration Research and Theory 17(1): 1–17.

———. 2011. "Policy Work: Street-Level Organizations under New Managerialism." Journal of Public Administration Research and Theory 21(suppl_2): i253–i277.

Carpenter, D. P. 2001. The Forging of Bureaucratic Autonomy: Reputations, Networks, and Policy Innovation in Executive Agencies, 1862–1928. Princeton, NJ: Princeton University Press.

Cilliers, J., I. Kasirye, C. Leaver, P. Serneels, and A. Zeitlin. 2018. "Pay for Locally Monitored Performance? A Welfare Analysis for Teacher Attendance in Ugandan Primary Schools." Journal of Public Economics 167(November): 69–90.

Cilliers, J., I. M. Mbiti, and A. Zeitlin. 2020. "Can Public Rankings Improve School Performance? Evidence from a Nationwide Reform in Tanzania." Journal of Human Resources 56(3): 655–85.

Cilliers, J., and S. Oza. 2020. "The Motivations, Constraints, and Behaviour of Tanzania's Frontline Education Providers." RISE Insight Series 2020/023.

Crawfurd, L. 2017. "School Management and Public–Private Partnerships in Uganda." Journal of African Economies 26(5): 539–60.

Deserranno, E., S. Caria, P. Kastrau, and G. Leon. 2021. "Financial Incentives in Multi-layered Organizations: An Experiment in the Public Sector." Technical report, Northwestern University.

Di Liberto, A., F. Schivardi, and G. Sulis. 2014. "Managerial Practices and Students' Performance." Economic Policy 30(84): 683–728.

Duflo, E., R. Hanna, and S. P. Ryan. 2012. "Incentives Work: Getting Teachers to Come to School." American Economic Review 102(4): 1241–78.

Faguet, J.-P., and C. Pöschl. 2015. Is Decentralization Good for Development? Perspectives from Academics and Policy Makers. Oxford: Oxford University Press, 1–29.

Fenizia, A.
2019. "Managers and Productivity in the Public Sector." Working Paper.

Fisher, A., C. Rudin, and F. Dominici. 2019. "All Models Are Wrong but Many Are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, Using Model Class Reliance." Journal of Machine Learning Research 20(177): 1–81.

Galiani, S., P. Gertler, and E. Schargrodsky. 2008. "School Decentralization: Helping the Good Get Better, but Leaving the Poor Behind." Journal of Public Economics 92(10–11): 2106–20.

Gershberg, A. I., and D. Winkler. 2004. "Education Decentralization in Africa: A Review of Recent Policy and Practice." In Building State Capacity in Africa: New Approaches, Emerging Lessons, 323–56. Washington, DC: World Bank.

Grindle, M. S. 1997. "Divergent Cultures? When Public Organizations Perform Well in Developing Countries." World Development 25(4): 481–95.

Gulzar, S., and B. J. Pasquale. 2017. "Politicians, Bureaucrats, and Development: Evidence from India." American Political Science Review 111(1): 162–83.

Hanushek, E. A., S. Link, and L. Woessmann. 2013. "Does School Autonomy Make Sense Everywhere? Panel Estimates from PISA." Journal of Development Economics 104(September): 212–32.

James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An Introduction to Statistical Learning. New York: Springer.

Jiang, J. 2018. "Making Bureaucracy Work: Patronage Networks, Performance Incentives, and Economic Development in China." American Journal of Political Science 62(4): 982–99.

Lavy, V., and A. Boiko. 2017. "Management Quality in Public Education: Superintendent Value-Added, Student Outcomes and Mechanisms." NBER Working Paper 24028, National Bureau of Economic Research.

Leaver, C., R. Lemos, and D. Scur. 2019.
"Measuring and Explaining Management in Schools: New Approaches Using Public Data." World Bank Policy Research Working Paper 9053.

Lemos, R., and D. Scur. 2016. "Developing Management: An Expanded Evaluation Tool for Developing Countries." Working Paper, RISE.

Lipsky, M. 1980. Street-Level Bureaucracy: Dilemmas of the Individual in Public Services. New York: Russell Sage Foundation.

Mbiti, I., K. Muralidharan, M. Romero, Y. Schipper, C. Manda, and R. Rajani. 2019. "Inputs, Incentives, and Complementarities in Education: Experimental Evidence from Tanzania." Quarterly Journal of Economics 134(3): 1627–73.

Meier, K. J., and L. J. O'Toole Jr. 2002. "Public Management and Organizational Performance: The Effect of Managerial Quality." Journal of Policy Analysis and Management 21(4): 629–43.

Muralidharan, K., J. Das, A. Holla, and A. Mohpal. 2017. "The Fiscal Cost of Weak Governance: Evidence from Teacher Absence in India." Journal of Public Economics 145(January): 116–35.

Park, A., and E. Hannum. 2001. "Do Teachers Affect Learning in Developing Countries? Evidence from Matched Student-Teacher Data from China." Conference paper, Rethinking Social Science Research on the Developing World in the 21st Century, 1–41.

Pepinsky, T. B., J. H. Pierskalla, and A. Sacks. 2017. "Bureaucracy and Service Delivery." Annual Review of Political Science 20(1): 249–68.

Rasul, I., and D. Rogger. 2018. "Management of Bureaucrats and Public Service Delivery: Evidence from the Nigerian Civil Service." Economic Journal 128(608): 413–46.

Rasul, I., D. Rogger, and M. J. Williams. 2019. "Management, Organizational Performance, and Task Clarity: Evidence from Ghana's Civil Service." Journal of Public Administration Research and Theory 31(2): 259–77.

Schuster, C., J.-H. Meyer-Sahling, and K. S. Mikkelsen. 2020.
"(Un)Principled Principals, (Un)Principled Agents: The Differential Effects of Managerial Civil Service Reforms on Corruption in Developing and OECD Countries." Governance 33(4): 829–48.

Walter, T. F. 2018. "State Management of Education Systems and Educational Performance: Evidence from a Management Survey at District Education Offices in Zambia." IGC Working Paper S-89454-ZMB-2, International Growth Centre, London.

Wright, M. N., and A. Ziegler. 2017. "Ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R." Journal of Statistical Software 77(1): 1–17.

Zarychta, A., T. Grillos, and K. P. Andersson. 2020. "Public Sector Governance Reform and the Motivation of Street-Level Bureaucrats in Developing Countries." Public Administration Review 80(1): 75–91.

Zhao, Q., and T. Hastie. 2021. "Causal Interpretations of Black-Box Models." Journal of Business & Economic Statistics 39(1): 272–81.