The Role of Coherence in Strengthening Community Accountability for Remote Schools in Indonesia APRIL 2022 This work is a product of the staff of The World Bank. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of the Executive Directors of The World Bank or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries. Rights and Permissions © 2022 The World Bank 1818 H Street NW, Washington DC 20433 Telephone: 202-473-1000; Internet: www.worldbank.org Some rights reserved. The material in this work is subject to copyright. Because The World Bank encourages dissemination of its knowledge, this work may be reproduced, in whole or in part, for non-commercial purposes as long as full attribution to this work is given. All queries on rights and licenses, including subsidiary rights, should be addressed to World Bank Publications, The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2625; e-mail: pubrights@worldbank.org. Attribution Please cite the work as follows: Hwa, Yue-Yi; Lumbanraja, S. K.; Riyanto, U. A.; Susanti, D. 2022. The Role of Coherence in Strengthening Community Accountability for Remote Schools in Indonesia, World Bank, Jakarta. © World Bank. 
WORKING PAPER

The Role of Coherence in Strengthening Community Accountability for Remote Schools in Indonesia

Yue-Yi Hwa, Sharon Kanthy Lumbanraja, Usha Adelina Riyanto, and Dewi Susanti

SOCIAL SUSTAINABILITY AND INCLUSION UNIT
THE WORLD BANK – INDONESIA
2022

Table of Contents

Abstract
Acknowledgments
01 Introduction
02 Overview of the KIAT Guru Project
    Context of the intervention
    Social accountability mechanism (SAM)
    KIAT Guru treatment groups and outcomes
03 Conceptual Framework
    A note on support in the context of power imbalances
04 Data and Methods
    Data sources
    Analytical approach
05 Results
    Coherence for learning
        Prioritizing student learning in service agreements
        Stakeholders’ contribution to the learning process
        Summary
    Coherence between stakeholders in accountability relationships
        Changes in how favorably stakeholders regard each other
        Disparities between principals’ and agents’ views of each other
        Summary
    Coherence between design elements within the voice and choice relationship
        Power dynamics in the voice and choice relationship
        Design elements and coherence in the voice and choice relationship
        Alternative hypothesis 1: Not coherence, but better information
        Alternative hypothesis 2: Not coherence, but clear incentives (information + delegation + motivation)
        Summary
06 Discussion
07 Conclusion
08 References
09 Appendices
    Appendix A. Descriptive Quantitative Analysis of the KIAT Guru Surveys and Process Monitoring
    Appendix B. Amendments to the Teacher and School Leader Service Agreement Indicators
    Appendix C. Detailed Analysis of Coherence Across Relationships in Case Study Schools

Figures

Figure 1. Distribution of Students’ Indonesian Literacy Levels Against Their Enrolled Grade Levels in 270 Remote Primary Schools (KIAT Guru Baseline Study)
Figure 2. SAM in KIAT Guru
Figure 3. How the Components of KIAT Guru Map Onto the RISE Education Systems Framework
Figure 4. Average Weighting of Selected Indicator Categories in the Teacher Service Agreements, Pre- and Postamendment
Figure 5. Proportion of Positive Views of Other Stakeholders Expressed in Interviews and Focus Groups in the Nine Case Study Schools
Figure 6. Principals’ and Agents’ Views of Each Other in the Voice and Choice and Management Relationships at Baseline, Midline, and Endline
Figure 7. Average Teacher Scorecard Ratings By Treatment Group, with and without the Attendance Indicator
Figure 8. Parents’ Perceptions of the Quality of Their Children’s Education Compared with the Previous Year
Figure B.1. Changes in Teacher Service Agreement Indicators, by Treatment Group
Figure B.2. Changes in Teacher Service Agreement Indicators, Average Across Treatment Groups
Figure B.3. Changes in School Leader Service Agreement Indicators, by Treatment Group
Figure B.4. Changes in School Leader Service Agreement Indicators, Average Across Treatment Groups
Figure C.1. Stakeholders’ Views of Each Other in SAM-Only Case Study Schools
Figure C.2. Stakeholders’ Views of Each Other in SAM+Score Case Study Schools
Figure C.3. Stakeholders’ Views of Each Other in SAM+Cam Case Study Schools

Tables

Table 1. Summary of KIAT Guru Treatment Groups
Table 2. Design Elements of the Voice and Choice Relationship Between Families/Community Members (Principals) and Teachers/School Leaders (Agents) for Each KIAT Guru Treatment Group
Table 3. Design Elements in the Village-Level Voice and Choice Relationship That are Incorporated into Hypotheses for the Greater Effectiveness of SAM+Cam
Table 4. Correlations Between School-Level Student Learning Assessment Scores (in Indonesian Language and Mathematics) and Endline Teacher Scorecard Ratings (with and without the Attendance Indicator)
Table 5. Average Teacher Attendance Within KIAT Guru Treatment and Control Groups at Baseline and Endline, with Two-Sample T-tests for Between-Group Differences (%)
Table A.1. Profile of KIAT Guru Cadres, by Treatment Group
Table A.2. Profile of KIAT Guru UCs, by Treatment Group
Table B.1. Service Agreement Indicators for Teachers in SDK Kondok, Pre- and Postamendment
Table B.2. Average Weighting of Teacher Service Agreement Indicator Categories, Pre- and Postamendment
Table B.3. Average Weighting of School Leader Service Agreement Indicator Categories, Pre- and Postamendment

Abstract

Incoherence in accountability relationships, or the lack of alignment between the various components of a specific education system, can hamper the quality of education. Such incoherence can be a particular challenge in resource-constrained, remote villages where teachers tend to have higher educational capital and social status than the parents and communities whom they serve. We analyzed quantitative and qualitative data from a randomized controlled trial of a social accountability mechanism (SAM) for primary schools in remote Indonesian villages. The intervention had three treatment groups, all of which included the SAM, which engaged village-level stakeholders in a consensus-building process that led to joint service agreements for supporting the learning process. Prior analyses have found that all three treatment groups significantly improved student learning, but the treatment group combining the SAM with teacher performance pay based on camera-monitored teacher attendance led to much larger gains than the SAM-only treatment group or the treatment group combining the SAM with teacher performance pay based on a community-evaluated scorecard.
Drawing on a range of quantitative data sources across all treatment group schools (process monitoring, survey, and service agreement indicators) and qualitative data from nine case study schools (interviews and focus group discussions), we show first that the student learning gains across all three treatment groups were accompanied by increases in both the coherence of the accountability relationships between village-level stakeholders and the degree to which these relationships were oriented toward the purpose of cultivating learning. We further show that the treatment group combining the SAM with camera-monitored teacher attendance led to greater improvements in the coherence of accountability relationships than the other treatment groups, because the cameras improved both the technical capacity and the social legitimacy of community members to hold teachers accountable. This coherence-focused, relational explanation for the relative effectiveness of the treatment groups has more explanatory power than alternative explanations that focus narrowly on information quality or incentive structure. Our analysis reinforces arguments for ensuring that accountability structures are coherent with the local context, including local social structures and power dynamics.

Acknowledgments

We acknowledge financial support for the original data collections from the Government of Australia’s Department of Foreign Affairs and Trade and USAID through trust funds managed by the World Bank. The Research on Improving Systems of Education (RISE) Programme in Indonesia, managed by SMERU Research Institute, also co-financed the endline surveys. We are grateful to Clare Leaver, Menno Pradhan, Jan Priebe, Christopher Bjork, Arya Gaduh, and Jason Silberstein for valuable feedback on an earlier draft. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.
During the analysis and writing of this paper, Yue-Yi Hwa was a Research Fellow for the RISE Programme and Sharon Kanthy Lumbanraja, Usha Adelina Riyanto, and Dewi Susanti were consultants with the World Bank.

01 Introduction

Despite substantial increases in education budgets and student enrollment over the last two decades, learning outcomes in Indonesia have stagnated (Beatty et al. 2018; de Ree et al. 2018). Furthermore, teacher performance tends to be unsatisfactory, with high rates of teacher absenteeism in rural and remote areas. To address these challenges, KIAT Guru (Kinerja dan Akuntabilitas Guru, or Teacher Performance and Accountability) was implemented as a randomized controlled trial from 2016 to 2018 in remote districts of East Nusa Tenggara and West Kalimantan provinces in Indonesia.

KIAT Guru tested three treatment groups that included a social accountability mechanism (SAM). Under the SAM, each community formulated a multistakeholder service agreement for improving local school quality, then conducted monthly evaluations of teachers and school leaders using a community scorecard. The treatment groups differed in whether and how they combined the SAM with individual performance-based financial penalties. These penalties took the form of deductions from a government-funded special allowance that would effectively double a teacher’s base salary if awarded in full. In “SAM-only,” no performance-based penalties were introduced, such that all allowance-eligible teachers continued to receive the allowance in full. In “SAM+Score,” the SAM was combined with performance pay based on the community scorecard. In “SAM+Cam,” the SAM was combined with performance pay based on camera-monitored teacher attendance.

After one year of implementation, student learning outcomes improved significantly in all three KIAT Guru treatment groups compared with the control group (see section 2.3 for detailed results).
The SAM+Cam treatment group achieved the best all-around results (Gaduh et al. 2020; World Bank 2020). A follow-up impact evaluation conducted one year after the endline survey showed that the gains persisted in SAM+Cam (Gaduh et al. 2021). Additionally, a prior qualitative study of KIAT Guru concluded that SAM+Cam was the most effective treatment group because it empowered local community members to hold school leaders and teachers accountable. Its combination of the reputational scorecard mechanism with a complementary financial disincentive (which SAM-only lacked) was seen as unbiased and technically straightforward (World Bank 2020). In contrast, the subjective and more technically demanding scorecard-based performance pay mechanism of SAM+Score was seen as less appropriate given the capacities and educational status of local community members relative to the teachers they were evaluating (World Bank 2020).

In this new study, we analyze field data from the nine schools in the qualitative case studies (reported in World Bank 2020), along with three rounds of survey and process monitoring data from 203 schools where the treatment groups were implemented. This allows us to further investigate factors that may have driven the learning gains in the project, building on the preceding KIAT Guru studies. We frame the analysis around the concept of coherence, or the degree of alignment between the various components of a specific education system. We draw on the RISE Education Systems Framework, which focuses on coherence between stakeholders and design elements in accountability relationships for education service delivery, conceptualized as principal-agent relationships (Pritchett 2015; Spivack 2021). Given the persistently low learning levels in Indonesia despite improvements in input-oriented aspects of the education system, we are especially interested in whether these relationships are oriented toward learning as compared with other goals. Accordingly, we consider three aspects of coherence: the degree to which accountability relationships are coherent for learning; the degree to which there is coherence between stakeholders in village-level accountability relationships; and the degree to which there is coherence between design elements in the voice and choice accountability relationship between the village community (principals) and teachers or school leaders (agents).

We pose a two-part research question: To what extent does the concept of coherence explain (a) the gains in student learning across all three KIAT Guru treatment groups and (b) the differences in effectiveness between the treatment groups?

The second part of the research question warrants particular interest because the greater effectiveness of SAM+Cam cannot be entirely explained by mainstream literature on incentives in job performance. The fact that the reductive, attendance-focused performance pay indicator of SAM+Cam was more effective than the multicriteria indicator of SAM+Score runs counter to the argument that single-indicator performance pay schemes are ill-suited for multidimensional jobs such as teaching, particularly when the incentivized indicator is an input (for example, teacher behavior) rather than a desired outcome (for example, student growth). This argument has been advanced in both the economics literature, as in Holmstrom and Milgrom’s (1991) model of multitask principal-agent relationships, and educational research, as in Murnane and Cohen’s (1986) analysis of teacher merit pay schemes in the US. Andrabi and Brown (forthcoming) offer a recent empirical example of this dynamic, where a performance pay scheme in private schools in Pakistan yielded similar test score gains whether teacher pay was contingent solely on student test scores or on broader appraisals by the school leader. However, the narrow test-score-based treatment weakened the non-incentivized domain of student socioemotional development.

Yet incentives based on single input indicators have, in some contexts, led to improvements in more complex outcomes. A notable example is a camera-monitored teacher performance pay intervention in India evaluated in Duflo, Hanna, and Ryan (2012), which led to significant gains in both teacher attendance and student learning. However, as we show below, it is unlikely that the teacher attendance gains in SAM+Cam were substantial enough to be the sole driver of the gains in student learning.1 Hence one goal of this paper is to explain the puzzle of why the narrow incentive structures of SAM+Cam proved to be more effective than the broad-based incentives in SAM+Score.

1 It is important to note that teacher attendance was measured during one unannounced visit per month for all thirty months of the intervention in Duflo, Hanna, and Ryan (2012), whereas KIAT Guru only conducted unannounced teacher absence surveys at two points (baseline and endline), yielding a noisier measure of teacher attendance.

We hypothesize that (a) as KIAT Guru progressed, all three treatments improved the degree to which stakeholder relationships were coherent for student learning, which led to learning gains; and (b) there were greater increases in the overall coherence of accountability relationships under the most effective treatment, SAM+Cam. To preview the analysis, we find evidence to support both hypotheses. Coherence for learning improved across all three treatment groups. The greater increase in coherence in SAM+Cam was driven by the tamper-proof cameras for monitoring teacher attendance, which empowered local community members not only with additional technical capacity but also additional authority to evaluate teachers. Thus, the cameras helped to redress a power imbalance between teachers and school leaders, on one hand, and children, families, and community members, on the other. As we show in our consideration of alternative hypotheses, this emphasis on overall coherence and power dynamics in accountability relationships has, in this context, more explanatory power than hypotheses that focus narrowly on the quality of information or on financial incentives.

Using coherence as the main analytical construct in this paper is appropriate given the design of the KIAT Guru project itself. KIAT Guru aimed to strengthen accountability relationships between agents (teachers and school leaders as service providers) and both sets of principals whom they serve (village communities as service recipients and the government as an employer). KIAT Guru expanded on a prior study by Pradhan et al. (2014) in rural public primary schools in Indonesia, which found that strengthening the relationship between school stakeholders and other village actors, through facilitated joint planning meetings between school committees and village councils, had a significant positive effect on student test scores. In contrast, treatment groups in the same study that focused on the school committees alone, by providing either block grants or training, did not have a significant effect on student learning. Accordingly, KIAT Guru expanded the joint planning meetings in Pradhan et al. (2014) into a SAM that involved service agreements between a wider set of stakeholders (parents, teachers, school leaders, and village leaders), with the additional incorporation of performance pay.

Thus, together with the prior analyses of KIAT Guru, this paper contributes to the growing body of research on interventions that aim to improve education service delivery in developing countries by empowering parents and local communities. These include interventions that target parents’ awareness of how to improve educational quality (for example, Barrera-Osorio et al. [2021] on providing training sessions to parent associations in Mexico);2 interventions that improve parents’ access to information on student and school performance (for example, Andrabi, Das, and Khwaja [2017] on school report cards in Pakistan); and interventions that improve parent-teacher communication (for example, Islam [2019] on parent-teacher meetings to discuss student progress in Bangladesh).

2 Unlike the other interventions mentioned in this paragraph, the parent training sessions in Barrera-Osorio et al. (2021) did not improve student learning. However, they had the positive effect of reducing disciplinary actions in schools.

We also contribute to research on interventions that test different financial incentive structures for teacher performance in low-resource settings. The design of these interventions has similarities to Duflo, Hanna, and Ryan’s (2012) evaluation of camera-monitored teacher performance pay in India; Cilliers et al.’s (2018) evaluation of teacher performance pay based on school leaders’ reports of their attendance in Uganda;3 and Andrabi and Brown’s (forthcoming) evaluation of teacher performance pay based on student test scores and appraisal by school leaders. In examining both the service provider–community relationship and the service provider–government relationship, we find that the structure of monitoring and incentives affected teachers’ responses to KIAT Guru treatments. However, the differential effects between treatment groups were not due to financial incentives in isolation but instead were mediated by wider power dynamics in the village-level educational ecosystem.

3 However, local monitoring of teacher performance in KIAT Guru’s SAM+Score treatment involved not only official reports but also monitoring by community members.
We explore these power dynamics, and coherence in KIAT Guru more broadly, in the rest of this paper. In section 2, we give an overview of the KIAT Guru project. In section 3 and section 4, we lay out our conceptual framework, data sources, and analytical approach. In section 5, we report our findings, focusing on three aspects of coherence. Section 6 is a brief discussion, and section 7 concludes.

02 Overview of the KIAT Guru Project

2.1 Context of the intervention

The KIAT Guru project was motivated by high teacher absenteeism (ACDP 2014) and low student learning outcomes in remote areas of Indonesia (Badan Pusat Statistik [BPS] 2017; Stern and Nordstrum 2014). The National Team for the Acceleration of Poverty Reduction (Tim Nasional Percepatan Penanggulangan Kemiskinan) under the Indonesian vice president’s office collaborated with the Ministry of Education and Culture to initiate the project in 2016 with support from the World Bank. Five districts from the government’s list of disadvantaged regions participated in the project: Ketapang, Landak, and Sintang in the province of West Kalimantan; and Manggarai Barat and Manggarai Timur in the province of East Nusa Tenggara. The government formally established a national committee along with local committees in each of the five districts to coordinate implementation throughout the project’s timeline.

The project used two separate instruments to measure student learning in Indonesian language and mathematics. The first instrument was an evaluation test to measure learning outcomes at baseline, endline, and the one-year follow-up (Lumbanraja, Prameswari, and Susanti 2021). At the time of the baseline evaluation test (between late 2016 and early 2017), 25 percent of classrooms had no teacher, and students were two grade levels below the curriculum standard on average, as shown in figure 1.
Yet, parents surveyed at baseline expressed satisfaction with the quality of education and learning outcomes, suggesting either low expectations or inadequate information (World Bank 2019). To address the lack of easily available data on student learning outcomes, the KIAT Guru research team developed a second student learning instrument: a rapid diagnostic test to give community members a gauge of student literacy and numeracy (Lumbanraja and Prameswari 2021). This diagnostic test was similar to learning assessments used in community advocacy in comparable contexts (ASER 2014; PAL Network 2018) and was administered regularly throughout the intervention.

Figure 1. Distribution of Students’ Indonesian Literacy Levels Against Their Enrolled Grade Levels in 270 Remote Primary Schools (KIAT Guru Baseline Study)
[Figure not reproduced: chart of actual mastery level (cannot recognize letters, below grade 1, and grade levels 1 to 5) against enrolled grade level (1 to 6), distinguishing students at or above grade level from students below grade level.]
Source: Original figure based on data from Table 52 in World Bank, 2019.

2.2 Social accountability mechanism (SAM)

Under KIAT Guru, all three treatment groups implemented a SAM, which operated in a cycle as summarized in figure 2. The initial cycle of SAM implementation began with KIAT Guru project facilitators presenting village-level student learning data from the baseline evaluation test, benchmarked against national curriculum targets, in a community meeting to trigger a series of discussions on how learning environments at schools and at home can better support learning. Modeled after community participation in monitoring frontline service providers (Bjorkman and Svensson 2009), the meetings were initially conducted with separate stakeholder groups (upper-grade level students and alumni, parents and community representatives, and school leaders and teachers) prior to having intergroup meetings to develop a schoolwide service agreement and teacher-specific scorecard. At the end of this series of meetings, community representatives elected a gender-balanced user committee (UC) consisting of at least nine members (six parents, representing each grade level, and three community leaders).

The UCs monitored teachers and school leaders regularly and presented their findings at a monthly village-wide meeting, where each teacher and school leader was assigned scorecard ratings that were reported to village and district governments. After each six-month period of implementation, the UCs and a locally appointed village cadre conducted another round of diagnostic learning assessments, which were followed by a round of amendments to the service agreements and scorecards, if the community deemed such amendments necessary.

Figure 2. SAM in KIAT Guru
[Figure not reproduced: cycle diagram with the steps “Assessment of student learning,” “Development (or amendment) of service agreement and scorecards,” “Monitoring by UC,” “‘Village-wide’ meeting,” and “Reporting of evaluation results,” annotated “every 6 months” and “every month.”]
Source: Original figure for this publication

The primary source of frontline support for KIAT Guru came from project facilitators, each of whom initiated implementation in several schools. They facilitated the initial stakeholder discussions that led to service agreements and scorecards and raised awareness of relevant issues among community representatives in “socialization” sessions. These sessions offered school-specific information on student learning outcomes along with general information on children’s rights to quality education and local communities’ rights to participate in and monitor education service delivery. They also trained UCs in collecting data to monitor progress on service agreements and scorecard indicators (through reviewing administrative data, conducting unannounced spot checks in school, and interviewing students and parents). Additionally, the facilitators identified, mentored, and coached a village cadre to take over their role in facilitating meetings once project implementation was eventually handed over to the communities (Marliyanti, Adelina, and Susanti, forthcoming 2022).

Besides this on-the-ground support from project facilitators, KIAT Guru was also supported through official channels. Each village government issued decrees to formalize the appointment of the UC and the village cadre, thus legitimizing their roles to participate in, monitor, and evaluate teacher scorecards. The district and national governments also issued special regulations to support the project. Alongside the commissioning decrees was an expectation that the village government would also allocate some funds to support the UC, the village cadre, and the school, such as money to cover the costs of office supplies or light refreshments for UC members while on duty.

2.3 KIAT Guru treatment groups and outcomes

Alongside the SAM that was implemented in all three treatment groups, KIAT Guru tested three different models of teacher accountability and incentives, as summarized in table 1. The first group implemented a SAM-only treatment, which relied on social rewards and sanctions for teachers’ scorecard performance. The other treatment groups added a performance-based pay element, where unsatisfactory teacher performance could result in cuts to the government-funded special allowance that is enjoyed by roughly 35 percent of teachers across the treatment groups.4 The second group, SAM+Score, tied the teacher special allowance to the monthly teacher scorecard as evaluated by the UC, such that a teacher who scored 95 percent in a given month would receive 95 percent of his or her allowance for that month. The third group, SAM+Cam, tied the allowance to teachers’ presence in school. SAM+Cam schools were provided with a smartphone with an application called KIAT Kamera, and teachers were required to record their daily presence in school using KIAT Kamera. Every month, the UCs would compare KIAT Kamera data with administrative attendance records to verify whether teachers’ absences had been formally excused by the school leader.5 The UC’s verification reports were then used to calculate deductions to a 100-point presence score for each teacher, with daily deductions of 1.5 points for partial attendance (that is, arriving late or leaving early), 2 points for an excused absence, and 5 points for an unexcused absence. If a teacher’s monthly presence score fell below 85, he or she would not receive the allowance. Otherwise, the amount matched the presence score, such that a teacher who received a presence score of 90 in a given month would receive 90 percent of the allowance for that month.

4 This special allowance (Tunjangan Khusus Guru), equal to their base salary, is allocated by the central government for civil servant teachers appointed to schools in special areas (including remote areas). In addition, teachers who are registered as civil servants also receive an allowance of Rp 1.5 million (US$105). Non-civil-servant and non-registered teachers, most of whom are school-contracted teachers, do not receive this allowance, even if they serve in remote schools. For more on the different employment statuses in Indonesia (for example, registered civil servants vs. contract teachers), see Huang et al. (2020).

5 “School leader” here refers to the school principal. School leaders were also subject to the interventions and underwent the same evaluations described for the teachers. Their evaluations were sent to and verified by district-level school supervisors.

Table 1. Summary of KIAT Guru Treatment Groups

                                                            SAM-only   SAM+Score  SAM+Cam    Control
SAM with UC                                                 ✓          ✓          ✓          -
Performance pay based on teacher presence in school         -          ✓          ✓          -
Performance pay based on other aspect of teacher practice   -          ✓          -          -
Tamper-proof cameras for monitoring presence in school      -          -          ✓          -
Number of schools                                           67         68         68         67

Source: Original table adapted from Table 1 in Gaduh et al., 2020.

Thus, across the treatment groups, the UCs had different capacities for sanctioning underperforming teachers. In SAM-only, they could penalize teachers reputationally by giving them a low score. In SAM+Score, the UC arguably had the most discretionary power, as the scores they chose to award to each teacher would result in both reputational and financial sanctions. In SAM+Cam, the UC’s discretionary judgments affected the scores that were associated with social sanctions, but they had less room for discretion with the camera-validated teacher presence scores that determined the financial sanctions.6

6 For more on the theoretical underpinnings of the three KIAT Guru treatment groups, refer to the impact evaluation papers (Gaduh et al., 2020, 2021) and the qualitative fieldwork report (World Bank, 2020).

The impact evaluation of KIAT Guru found that all three treatments improved learning outcomes, with average effect sizes across Indonesian language and mathematics of SD = 0.08 for SAM-only, SD = 0.11 for SAM+Score, and SD = 0.20 for SAM+Cam (see Table 3 in Gaduh et al. 2021; see also Gaduh et al. 2020). This analysis rejected equality between SAM+Cam and the other two treatment groups (p < 0.02). An additional round of data collection for SAM-only and SAM+Cam one year after the endline found that learning outcome gains persisted in SAM+Cam with an effect size of SD = 0.13 on average across the two subjects, whereas learning outcomes in SAM-only were no longer significantly different from the control group (see Table 3 in Gaduh et al. 2021; see also Gaduh et al. 2020).7

7 Given budget constraints as well as the government’s stated interest in scaling up SAM+Cam based on the results of the initial impact evaluation, follow-up data were not collected for SAM+Score (Gaduh et al., 2021).

In addition to the quantitative impact evaluation, a second study drew on three rounds of detailed qualitative fieldwork in nine schools, with three schools from each of the three treatment groups. The qualitative study concluded that SAM+Cam was the most effective treatment because it combined enforcement (in the form of a performance-based pay cut) with an evaluation metric that was viewed as objective rather than contingent on UC members’ interpretations (World Bank 2020).

In this reanalysis, we revisit both the qualitative data and the three rounds of quantitative survey data, complemented with process monitoring data to build a deeper understanding of the processes underlying the differential success of these treatment groups.
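The allowance rules described in section 2.3 can be illustrated with a minimal Python sketch. It is not the project's actual software: the class and function names are hypothetical, and flooring the presence score at zero is an assumption, since the text does not say what happens if deductions exceed 100 points. Only the deduction values, the 85-point cutoff, and the proportional payout come from the description above.

```python
from dataclasses import dataclass

# Daily deductions (in points) as described in section 2.3.
PARTIAL_ATTENDANCE_DEDUCTION = 1.5  # arriving late or leaving early
EXCUSED_ABSENCE_DEDUCTION = 2.0
UNEXCUSED_ABSENCE_DEDUCTION = 5.0
PRESENCE_THRESHOLD = 85.0           # below this score, no allowance is paid


@dataclass
class MonthlyAttendance:
    """Hypothetical record of one teacher's UC-verified month."""
    partial_days: int = 0
    excused_absences: int = 0
    unexcused_absences: int = 0


def presence_score(record: MonthlyAttendance) -> float:
    """Start from 100 points and subtract the daily deductions."""
    deductions = (
        record.partial_days * PARTIAL_ATTENDANCE_DEDUCTION
        + record.excused_absences * EXCUSED_ABSENCE_DEDUCTION
        + record.unexcused_absences * UNEXCUSED_ABSENCE_DEDUCTION
    )
    # Assumption: the score cannot go below zero.
    return max(0.0, 100.0 - deductions)


def sam_cam_allowance_share(record: MonthlyAttendance) -> float:
    """Share of the special allowance paid under SAM+Cam."""
    score = presence_score(record)
    if score < PRESENCE_THRESHOLD:
        return 0.0
    return score / 100.0


def sam_score_allowance_share(scorecard_percent: float) -> float:
    """Share paid under SAM+Score: proportional to the UC scorecard rating.

    No cutoff is described for SAM+Score in this section, so none is applied.
    """
    return scorecard_percent / 100.0


if __name__ == "__main__":
    month = MonthlyAttendance(unexcused_absences=2)
    print(presence_score(month), sam_cam_allowance_share(month))
```

Under these assumptions, two unexcused absences in a month leave a presence score of 90 and a payout of 90 percent of the allowance, matching the example in the text, while four unexcused absences push the score to 80 and the payout to zero.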
03 Conceptual Framework

The main analytical construct in this paper is coherence in relationships within education systems. We define coherence as “the degree of alignment between the various components of a specific education system.” Given the many interacting components within complex education systems, we focus our analysis of coherence on those aspects emphasized within the RISE Education Systems Framework (Pritchett 2015; Spivack 2021; see also World Bank 2003). This framework identifies four accountability relationships within education systems and five main design elements within each relationship. Each accountability relationship focuses on a set of principals, or actors who want a task to be completed; and a set of agents, or actors whom the principals engage to complete the task. The four relationships are

• The politics relationship, between citizens (principals) and the highest authorities of the state such as the president or prime minister (agents);
• The compact relationship, between the highest authorities of the state (principals) and education authorities and organizations such as the education ministry (agents);
• The management relationship, between education authorities and organizations (principals) and frontline providers such as school leaders and teachers (agents); and
• The voice and choice relationship, between service recipients such as children, families, and communities (principals) and frontline providers (agents).

Figure 3 shows how these four relationships are interconnected in an education system and how the two main KIAT Guru components map onto these relationships. The SAM operates primarily within the voice and choice relationship, and the performance pay component operates primarily within the management relationship (because the financial incentive involves deductions from a government-funded special allowance). Nonetheless, the two components are connected.
Under the performance pay component, the UC of local community members either generated (in SAM+Score) or verified (in SAM+Cam) the performance measure that determined the financial incentive. The fact that these community-based appraisals drew on attendance records administered by school leaders also helped to bring these village-level relationships into coherence with each other. Above the community-school level, the clearly specified service agreement indicators and scorecards also helped to align expectations, and improvement efforts, from schools to village and district governments.

Figure 3. How the Components of KIAT Guru Map Onto the RISE Education Systems Framework
[Figure: diagram of the politics, compact, management, and voice and choice relationships linking the highest state executive and fiduciary authorities; education authorities and organizations; schools and teachers; and children, families, and communities, with the social accountability and performance pay components overlaid.]
Source: Original figure adapted from Pritchett 2015 and Spivack 2021

In each of the four relationships, there are five design elements that can fundamentally affect the nature of the relationship and what the relationship accomplishes. The five design elements are

• Delegation, which is the goal that the principal wants the agent to fulfill, such as when children, families, and communities (principals in the voice and choice relationship) want teachers (agents in the voice and choice relationship) to educate the children under their care;
• Finance, which represents the resources that the principal provides to the agent to fulfill the delegated goal, such as the fees that families pay to local schools or the time they contribute to supporting learning-related initiatives;
• Information, which refers to what the principal knows about the agent’s performance, such as when parents receive report cards from the school;
• Motivation, which is how the agents’ wellbeing is affected by their performance on the delegated goal, such as when a teacher feels personally fulfilled or gains social recognition from the local community from helping students to improve their learning; and
• Support, which refers to resources and processes that strengthen the accountability relationship, especially when they are provided to the less empowered actor in the relationship.

Using the framework of these design elements and accountability relationships, we examine three different aspects of coherence in accountability relationships.

First, we look at the goal or purpose for which accountability relationships are coherent, looking specifically at coherence for learning in the village-level voice and choice relationship. This part of the analysis focuses on the design element of delegation (that is, the goals that principals want agents to fulfill), with secondary attention to the design element of finance (that is, the extent to which principals are willing to make costly contributions to the achievement of these goals). The overarching purpose or orientation of an accountability relationship is especially important in domains such as education where there are multiple priorities that can inadvertently compete for resources and for stakeholders’ attention. In many education systems that have high levels of enrollment but low levels of student learning, accountability relationships are oriented toward the expansion of access to schooling, but they are not coherent for cultivating student learning (Pritchett 2015).

Second, we look at coherence between stakeholders in accountability relationships. In this part of the analysis, we depart from the RISE framework in that we do not focus on design elements within accountability relationships. In our study context, the village-level voice and choice relationship includes a heterogeneous mix of principals: students, parents, school committees, village heads, UCs, and village cadres, all of whom participated in field interviews or focus group discussions. The nature of the qualitative case study data is such that we can either aggregate across multiple stakeholder groups to summarize their perceptions of specific design elements, which we do subsequently, or explore more generalized perceptions of each disaggregated stakeholder group, which we do in this part of the analysis. Specifically, we look at how different stakeholder groups generally regard other groups (for example, how parents regarded teachers), exploring whether KIAT Guru led to more positive views of each other’s contributions to student learning. We look at the views of principals and agents within both the voice and choice and management relationships at the village level. This falls within the broader definition of coherence as alignment between the components of a specific education system.

Third, returning more closely to the RISE framework, we look at coherence between design elements within a single accountability relationship (Pritchett 2015; Spivack 2021). For example, if the education ministry delegates to teachers the goal of improving student learning levels, but does not provide teachers with the training and instructional materials needed for effective classroom learning, this would constitute an incoherence between delegation and information in the management relationship. In our analysis, we focus on coherence (or lack thereof) between design elements in the voice and choice relationship, in the context of the village-level principals and agents who participated in KIAT Guru.

As noted in the introduction, we ask a two-part research question and test two hypotheses within this conceptual framework. The question is, To what extent does the concept of coherence explain (a) the gains in student learning across all three KIAT Guru treatment groups and (b) the differences in effectiveness between the treatment groups? We hypothesize that (a) as KIAT Guru progressed, all three treatments improved the degree to which stakeholder relationships were coherent for student learning, which led to learning gains; and (b) there were greater increases in the overall coherence of accountability relationships under the most effective treatment, SAM+Cam. In the analysis that follows, we weigh the evidence for both hypotheses within each of the three aspects of coherence outlined above.

3.1 A note on “support” in the context of power imbalances

Spivack (2021, 9) defines support in the RISE framework as “preparation and assistance that the principal provides to the agent to complete the task.” Our definition of support overlaps with Spivack’s in some instances. Spivack gives the example of the education ministry providing training to teachers in the management relationship, which falls within her definition of support as assistance from the principal (in this case, the ministry) to the agent (in this case, teachers). This also falls within our definition of support as resources and processes that strengthen the accountability relationship, especially by empowering the weaker actor: without appropriate training, teachers are unlikely to fulfill their complex responsibilities, which would weaken the accountability relationship because of the wide gap between stated delegation and feasible performance.

However, we depart from Spivack’s definition of support because the power dynamic in one of the most important relationships in the KIAT Guru project, that is, the voice and choice relationship, was such that the principals (children, families, and communities in remote villages) were initially much less empowered than the agents (school leaders and teachers). Consequently, the relationship could only be coherent with adequate support to rebalance that dynamic and strengthen the accountability relationship. Such support for UCs included capacity-building sessions, appointment via official decrees, and access to tamper-proof data, which strengthened both their technical capacity and perceived legitimacy, as we discuss below.

We are far from being the first to point out that hierarchical differences between teachers and other stakeholders can affect accountability structures. For example, Narwana (2015) observed that teachers in Haryana, India, were resistant to a community accountability initiative partly because they regarded themselves as higher-status actors who should not be subject to the judgment of lower-status, less-educated villagers. Broekman (2015) found that teachers in Indonesia questioned even their fellow teachers’ authority to conduct peer appraisals, on the basis that such authority should be held exclusively by those higher in the hierarchy, such as school leaders or supervisors.

More generally, Fox (2015) argues in a review of social accountability interventions in developing countries that “voice” from citizens needs to be complemented with “teeth” from the state (that is, positive and negative incentives), because antiaccountability forces are likely to hold more power than pro-accountability forces, whether within the state or in society; or, in the language of the RISE framework, the voice and choice relationship needs to be coherent with the management relationship. Similarly, in a review of forty-eight studies of interventions that aimed to use information to improve rural service delivery, Kosec and Wantchekon (2020) conclude that informational interventions will only yield gains if three conditions hold: the information must be relevant, the recipients of the information must have incentives to act on it, and the recipients must also have the power to act on it. They further note that the rural poor (a category encompassing the communities in KIAT Guru) often lack the power to respond to the information they receive (see also Banerjee and Duflo [2008] on the relative powerlessness of village education committees in Uttar Pradesh, India). Given this need to redress the disempowerment of rural villagers in accountability for service delivery, we redefine the design element of support to incorporate these power dynamics.

04 Data and Methods

4.1 Data sources

Process monitoring data. Throughout the implementation of KIAT Guru, process monitoring data from the 203 treatment schools were collected by project facilitators and project implementation teams. Process monitoring data include some indicators that were routinely collected (every month) and some that were occasional. Routinely collected data include monthly teacher scorecard ratings and data from the teacher attendance verification process in SAM+Cam schools. Occasionally collected data include information on project implementation, such as its timeline, descriptions of community participation in the project, and assessments of UCs’ and village cadres’ fulfillment of their responsibilities. Process monitoring data collection extended from November 2016 until June 2019.

Quantitative survey data. For impact evaluation purposes, an independent survey team collected quantitative survey data during three different periods (Gaduh et al. 2021). A baseline survey was conducted between November 2016 and February 2017.
After roughly one year of implementation (and soon after SAM facilitation was handed over from project facilitators to village cadres), an endline survey was conducted between February and mid-April 2018. A follow-up survey was undertaken from March to May 2019 to assess the impact of the project intervention one year after it concluded. All 270 treatment and control schools were included in the baseline and the endline surveys. However, due to budget restrictions, the relatively disappointing endline findings on SAM+Score, and the government’s stated preferences, only SAM+Cam, SAM, and control schools were included in the follow-up survey.

The quantitative survey encompassed evaluation tests of student learning, teacher absence surveys, and interviews with stakeholders participating in the project. The student learning tests assessed basic literacy and numeracy competencies based on the 2006 national curriculum. Students in grades 1–5 in participating schools took part in the assessment during the baseline survey, and subsequently in the endline survey when most of them were in grades 2–6. The teacher absence surveys aimed to provide a representative description of teacher attendance in participating schools through an unannounced visit during normal school hours on the first day of each quantitative survey round.

The quantitative interviews with local stakeholders captured background information on the perceptions, characteristics, and dynamics of stakeholders at the village level. School leaders, teachers, parents, school committees, and village heads were interviewed in all three rounds of the survey. UC members were only interviewed in the endline and follow-up surveys as the committees had not been established at baseline.8 In this paper, the primary areas of interest from the surveys are perceptions of teacher performance, school leadership, the extent of local government support, parental participation, UC effectiveness, and indicators that help to triangulate qualitative case study data on the relationships between stakeholders.

8 School committees are preexisting entities appointed by the school leader and primarily involved as counterparts in school management and the utilization of school operational funds. UCs are elected bodies, newly established under KIAT Guru to represent the wider village community in administering the SAM. UCs are formally appointed by the head of the village in which the school is located.

Service indicators. At the start of KIAT Guru, teachers, village leaders, and parents in all treatment schools formulated joint service agreements to improve education quality, as discussed above in section 2.2. Teacher service agreements were operationalized into teacher scorecards containing five to seven locally defined service indicators, which always included an indicator of teacher presence. These scorecards were used by UCs to monitor and evaluate each teacher every month. The scores were then reported to the district government. After each six-month semester, local stakeholders met to evaluate these scorecards for relevance, during which service indicators were sometimes amended to reflect identified needs. There were 2,542 service indicators in the beginning of the project, which grew into 10,508 service indicators following the first amendment of the scorecard. This large increase was attributed to the growing need for parents and teachers to tailor each scorecard to individual teachers’ subject- or grade-specific responsibilities.

Qualitative case studies. To complement the process monitoring and impact evaluation, a series of qualitative case studies was conducted in nine of the 203 treatment schools (World Bank 2020). The nine schools were located in three of the five project districts: Ketapang and Landak in West Kalimantan province and Manggarai Barat in East Nusa Tenggara province. In each district, three villages with similar characteristics (such as student learning outcomes, geography, and community characteristics) were chosen as case study sites, each representing one of the three treatment groups. The qualitative data collection was designed to identify changes in stakeholders’ views about education quality and the social accountability process. The three SAM-only schools were SDI Sangka, SDN Engkangin, and SDN Sungai Laur; the SAM+Score schools were SDI Konang, SDN Sungai Keli, and SD Simpang Dua; and the SAM+Cam schools were SDK Kondok, SDN Sampuraneh, and SD Usaba Sepotong.9

9 Although individual participants in the field data collection are identified by their titles or designations (for example, village head, UC member) rather than by name, we identify schools by name for consistency with the previous qualitative study (World Bank, 2020).

Researchers visited each school three times: in October–November 2016 before KIAT Guru started; in August–September 2017 after monthly teacher evaluations had been conducted several times in the schools; and in February–March 2018, shortly after the project facilitators had handed over implementation responsibilities to village stakeholders. The qualitative research questionnaires were developed by three principal investigators10 who recruited and trained field researchers who held master’s degrees in social sciences. The questionnaires were piloted in collaboration with the field researchers and then revised prior to each field visit. Each of the nine schools was studied by two researchers who visited the field together and shared data collection responsibilities. They interviewed students, teachers, parents, school leaders, UC members, government officials, and members of the community; attended UC meetings; facilitated focus group discussions with teachers, students, parents, and UC members; examined school records; observed lessons; and collected evidence of student learning outcomes from school report cards. Interviews and focus group discussions were transcribed by the field researchers who conducted them. These transcripts are the main sources of qualitative data in this paper.

10 Investigators were Christopher Bjork, Raihani, and Dewi Susanti. Bjork is the Dexter M. Ferry, Jr. Professor of Education and Chair & Coordinator of Teacher Education at Vassar College in the United States. Raihani is a Professor of Islamic Education Studies at the State Islamic University of Sultan Syarif Kasim in Indonesia.

4.2 Analytical approach

As noted above, we analyze these four data sources to explore the extent to which different aspects of coherence in accountability relationships can explain the overall improvements in student learning across treatment groups, as well as the relative effectiveness of the SAM+Cam treatment. We draw on a mix of sources throughout the analysis, depending on data availability for the constructs and issues in question. For example, our analysis of teacher attendance and incentives mainly uses quantitative data from process monitoring and surveys (see section 5.3.4), while our discussion of power dynamics in the voice and choice relationship mainly uses qualitative case study data (see section 5.3.1).

For the quantitative data sources (process monitoring data, quantitative surveys, and service agreement indicators), we use straightforward descriptive statistics, along with some significance tests. Where relevant, we also refer to models estimated using these data in the impact evaluation papers (Gaduh et al. 2020, 2021).

We analyze a wide range of process monitoring and quantitative survey variables that we identified, ex ante, as being related to student learning and coherence in accountability relationships between village-level stakeholders. Based on these findings, we iteratively refined our arguments, weighing evidence both for our hypotheses and for alternative hypotheses. The analysis of process monitoring and quantitative survey data is summarized in appendix A, and the most salient indicators are discussed in section 5.1 on coherence for learning and section 5.3 on coherence between design elements in the voice and choice relationship.

The raw service agreement indicators from village-specific scorecards were first grouped into larger categories, with twenty categories for teacher service agreement indicators (for example, “Teacher applies fun and motivating learning techniques in the classroom”) and fourteen categories for school leader service agreement indicators (for example, “Principal sets good example through their attendance and behavior while in school”). These categories were then classified according to whether or not they were directly related to student learning. Changes in distribution of indicators across categories before and after the first amendment process are outlined in section 5.1 on coherence for learning, with further descriptive statistics in appendix B. We focus on the first round of semesterly amendments to the service agreements because this gives a snapshot of the extent to which village stakeholders’ pretreatment educational priorities, as articulated in the initial service agreements, were altered through participation in KIAT Guru.

For the qualitative case study data, we read the transcripts of interviews and focus group discussions with village-level stakeholders in full, coding them according to a scheme based on three thematic areas of interest: (a) coherence, because of the conceptual framework of the paper; (b) the quality of information, because this was identified as a key variable in the initial analysis of the qualitative data (World Bank 2020); and (c) power dynamics, because both the wider literature and prior experience and analysis of KIAT Guru suggested that this would be an important explanatory factor. However, these codes were neither applied comprehensively nor analyzed in a structured manner. Rather, we use them to gain traction and highlight points of interest in a large corpus of 180 interview transcripts and 122 focus group discussion transcripts of village-level stakeholders. On occasion, we also draw on interview transcripts with stakeholders from other administrative levels (for example, project facilitators who coordinated KIAT Guru implementation across several schools) as well as from field notes and summary reports from the field researchers. Quotes included in the text have been translated from Indonesian to English and lightly edited for readability.

In addition to this broad analysis of key themes in the interview data, we conducted a systematic analysis of stakeholders’ views of each other, which is presented in section 5.2 on coherence between stakeholders in accountability relationships, with further summary tables in appendix C. For this analysis, we extracted quotes from interviews and focus groups in which a stakeholder (or set of stakeholders) expressed opinions about another stakeholder (or set of stakeholders). We then summarized each set of views with one classification for each stakeholder pair in each direction, for each data collection round in each school. (For example, there is one classification for SDI Sangka parents’ views of the school leader at midline and one classification for the SDI Sangka school leader’s view of parents at midline.) Each classification was assigned by one of us and then checked by a coauthor. Any divergences in classification were discussed and resolved by consensus. While some of these classifications represent an individual’s views (for example, the school leader), others span multiple individuals (for example, the views of multiple parents summarized into a single classification) as well as multiple interviews and focus groups (for example, the views of multiple teachers across a focus group and a few individual interviews from the same field visit summarized into a single classification). The classifications are as follows: negative; neutral or “don’t know”; mixed (that is, having positive views of some actions but negative views of others; or some members of the stakeholder group having positive views and others having negative ones); positive (which can include some mildly negative views if the overall opinion is obviously positive); and positive and oriented toward student learning.

To illustrate the difference between the last two categories, the following statement from the midline interview with the school leader at SDN Engkangin represents a generically positive view of teachers:

“There’s been a small change. Previously they were a bit tardy. Now they’re more punctual. Maybe they’re getting more and more aware of the roles and responsibilities in which they need to show determination, for the sake of their performance [on the scorecard rating].”

In contrast, during the endline round of data collection, the same school leader’s view of the teachers in his school was not only positive, but also distinctly oriented toward student learning:

“Over the last six months, there have been quite a lot of changes. First, there’s been change in the fulfillment of their duties. Before, they were maybe less effective in delivering teaching materials, but now that there are indicators for foundational skills, they’re right on target.”

As discussed above, this distinction between generic positive views and positive views oriented toward student learning is crucial because actors within education systems can have a range of priorities for schools and teachers on the frontline, some of which support student learning, and some of which compete with it for attention and other resources.

To strengthen the validity of our argument, in examining the third aspect of coherence (that is, coherence between design elements within the voice and choice relationship) we weigh the evidence for our explanation of the differential effectiveness of the KIAT Guru treatment groups against the evidence for two alternative hypotheses. This aligns both with statistical approaches to hypothesis testing and with realist arguments about validity in policy research (Pawson and Tilley 1997) as well as educational research (Porpora 2015).

05 Results

5.1 Coherence for learning

To examine the extent to which the KIAT Guru treatments increased coherence for learning in the village-level voice and choice relationship, we first examine service agreement indicator data for changes over time in the distribution of scorecard weightings (out of 100 percentage points) for indicators that are directly related to student learning compared with indicators that are less related to learning; and next we examine quantitative survey data for changes in the extent to which village-level stakeholders took actions that directly support the learning process.
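The first comparison described above amounts to summing, for each service agreement, the weights of the indicator categories classified as directly related to learning, once before and once after the amendment. A minimal Python sketch follows. The category keys, the weights, and the function name are invented for illustration; they are not project data and do not reproduce the paper's full classification of categories.

```python
# Illustrative sketch: total scorecard weight placed on learning-related categories.
# Category keys and weights are hypothetical; they are NOT project data.

LEARNING_RELATED = {"learning_comprehension", "literacy_numeracy", "remedial_sessions"}


def learning_weight(scorecard: dict[str, float]) -> float:
    """Sum the weights (percentage points) of categories directly related to learning."""
    total = sum(scorecard.values())
    if abs(total - 100.0) > 1e-9:
        raise ValueError(f"weights must add up to 100, got {total}")
    return sum(w for category, w in scorecard.items() if category in LEARNING_RELATED)


# Hypothetical scorecard for one teacher, before and after an amendment meeting.
before = {
    "class_on_time": 40.0,
    "positive_discipline": 25.0,
    "learning_comprehension": 20.0,
    "literacy_numeracy": 15.0,
}
after = {
    "class_on_time": 30.0,
    "positive_discipline": 15.0,
    "learning_comprehension": 25.0,
    "literacy_numeracy": 20.0,
    "remedial_sessions": 10.0,
}

# A positive shift means more weight on learning-related categories after the amendment.
shift = learning_weight(after) - learning_weight(before)
print(learning_weight(before), learning_weight(after), shift)
```

Averaging such per-scorecard totals within each treatment group would give group-level pre- and postamendment figures of the kind the analysis compares; the check that weights add up to 100 mirrors the statement that weightings are expressed out of 100 percentage points.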
5.1.1 Prioritizing student learning in service agreements

Beginning with the service agreement indicators, we focus on the first round of amendments to the service agreements, which took place after the first semester of KIAT Guru implementation, and on an additional round of rapid diagnostic tests to inform the amendment process. We examine whether these amendments resulted in heavier weighting of service agreement indicator categories that are directly related to student learning. We interpret this as a proxy for the degree to which village stakeholders prioritized learning (versus other goals) in monitoring the performance of teachers and school leaders.

For teachers, eight out of the twenty categories of service indicators are directly related to student learning (for example, “Teacher strives to ensure students’ learning comprehension, including in providing feedback”), while the other twelve are indirectly related at best (for example, “Teacher inculcates patriotism and values of obedience and orderliness in students”). For school leaders, three out of the fourteen service indicator categories are directly related to learning.11 An example of the service agreements for teachers in one of the qualitative case study schools, SDK Kondok, is shown in table B.1 in appendix B.

11 We are not implying that the service indicator categories that are indirectly related to learning are unimportant or illegitimate. Some of these categories represent other goals that matter tremendously for holistic development but may inadvertently compete with learning goals, for example, “Teacher instills religious, cultural, and social norms in students.” (See also Kurniasih, Utari, and Akhmadi [2018] on the character education policy in Indonesia.)
Other categories represent inputs that are necessary but far from sufficient for student learning, for example, “Teacher starts and ends class on time.” While such inputs are obviously important, many low-performing education systems fail to cultivate adequate levels of student learning because they focus excessively on inputs rather than on how these inputs contribute to meaningful classroom interactions (Pritchett, 2013). In this analysis, we focus on increases in coherence for learning because the outcome of interest that we are trying to explain is gains in student learning.

Across all three treatment groups, there was a clear postamendment shift toward indicators directly related to student learning.12 Prior to the amendments, the eight indicator categories in the teacher service agreements that were directly related to student learning had an average total weighting of 36.6 percentage points, which increased to an average of 52.1 percentage points postamendment. This increased emphasis on learning-related indicators was consistent across treatment groups, with preamendment weightings ranging from 35 to 40 percentage points and postamendment weightings ranging from 51 to 54 percentage points (see table B.2 in appendix B for more details). For school leaders, there was a similar but smaller shift, with the three learning-related school leader service indicator categories having an average total weighting of 22.6 percentage points preamendment and 29.3 percentage points postamendment (see table B.3 for more details). Again, changes were of comparable magnitude across treatment groups.

12 It is unclear to what degree this increased weighting of learning-related indicators was due to changes in village stakeholders’ priorities throughout the intervention, or to the amendment process itself. This probably varied across schools. For example, field notes from the qualitative case study of SDN Simpang Dua indicate that the project facilitator who led the amendment meeting at this school consistently asked meeting attendees whether each indicator had a direct impact on student learning outcomes; whereas field notes from the amendment meeting at SDI Konang do not indicate that the project facilitator played such an explicitly learning-oriented role.

This increased emphasis on learning-related indicator categories is also evident when looking at pre- and postamendment changes in specific indicator categories. For example, out of the top three indicator categories for school leaders, preamendment, only one was directly related to student learning: “School leader develops a teaching schedule for teachers, conducts supervision over teachers’ discipline, and ensures learning activities are implemented”; while the other two were related to the school leaders’ management of school facilities and their attentiveness to community needs and communication. In contrast, after the amendment process, the indicator categories about facilities management and community needs had been displaced by an indicator category about the school leader setting a good example while in school along with a learning-related indicator category: “Principal develops curriculum and ensures that teachers develop lesson plans and deliver fun learning activities” (see table B.3 in appendix B for more details). This shift across the top three indicators was consistent across treatment groups.

Similarly, for the teacher service agreement indicators, as summarized in panel (a) of figure 4, all three treatment groups saw a large increase in the weighting of the indicator category “Teacher works to improve students’ literacy and numeracy skills,” from an average weight of 2.4 percentage points preamendment to 12.4 percentage points postamendment. In contrast, there were consistent declines in the weighting of the indicator category “Teacher uses positive discipline with students and avoids any form of verbal or corporal punishments,” which is not directly related to student learning. That said, it is worth noting, as suggested by the qualitative case study data, that this reduced emphasis on positive discipline approaches may be due to teachers’ beliefs that eliminating corporal punishment led to unmanageable classroom disruptions given that children had been socialized into expecting corporal punishment, rather than to a specific desire to prioritize learning-related indicators. It is also worth noting that “Teacher starts and ends class on time” had, by far, the heaviest weighting both pre- and postamendment.

Figure 4. Average Weighting of Selected Indicator Categories in the Teacher Service Agreements, Pre- and Postamendment
[Figure: two panels showing average weightings before and after the amendment for SAM-only, SAM+Score, SAM+Cam, and overall. Panel A, teacher service agreement indicators directly related to learning: striving to ensure students’ learning comprehension; working to improve students’ literacy and numeracy skills; providing remedial sessions to improve student learning. Panel B, teacher service agreement indicators not directly related to learning: starting and ending class on time; using positive discipline and avoiding verbal/corporal punishment; inculcating patriotism, obedience, and orderliness.]
Source: Original figure for this publication
Notes: Each panel shows the three indicator categories with the highest overall average postamendment weighting, out of a total of 8 indicator categories directly related to student learning for panel (a) and 12 indicator categories not directly related to student learning for panel (b). The weightings of all indicators in each teacher service agreement add up to 100 percent. For a more detailed view of pre- and postamendment service agreement indicator weightings, see appendix B.

5.1.2 Stakeholders’ contributions to the learning process

While the service agreements identify the goals that the village-level principals (or service recipients, such as families and village government stakeholders) must prioritize in service delivery by village-level agents (or frontline providers, such as teachers and school leaders), the actions of the principals themselves also provide valuable information about the degree to which

respectively. Again, there was considerable variation between families, and no significant differences between groups at endline.13 Data on other aspects of parental participation in education are available in appendix A.

Besides families, the village government was willing to contribute to KIAT Guru improvements, but to a limited extent. A high proportion of village heads attended the monthly KIAT Guru meetings: 82.8 percent for SAM, 89.8 percent for SAM+Score, and 93.0 percent for SAM+Cam.
these bidirectional accountability relationships were Additionally, village governments for 83 percent of oriented toward learning. Here the quantitative survey treatment schools allocated some of the village budget data are encouraging but perhaps less conclusive than for KIAT Guru implementation. However, when it came the service agreement indicators. to more direct contributions, less than half of the UCs reported that their village government had undertaken Looking first at parents’ willingness to contribute directly any novel educational initiatives to honor the service to the learning process, there were small increases agreements (44.8 percent for SAM, 47.8 percent for between the baseline and endline self-report surveys SAM+Score, and 42.7 percent for SAM+Cam). Data on in the average amount of time that parents allocated other aspects of local government support for KIAT toward supporting their children’s learning at home Guru are available in appendix A. each day. These increases were comparable across treatment groups, with increases of 10.3 percent in the 5.1.3 Summary SAM-only treatment group (from 32.1 minutes daily at baseline to 35.4 minutes daily at endline) and 8.5 percent To summarize, data from service agreements and from in both SAM+Score and SAM+Cam. Average increases quantitative stakeholder surveys indicate that there notwithstanding, there was substantial variation across was a shift toward teacher and school leader service families (for example, for SAM-only, standard deviations indicator categories that were directly related to student were 21.7 minutes at baseline and 26.9 minutes at learning, suggesting a greater emphasis on learning in endline). Moreover, there were no significant differences delegation from principals (that is, service recipients) between treatment groups at either baseline or endline, to agents (that is, frontline providers) in the voice and based on t-tests at 5 percent significance. 
choice relationship at the village level. Additionally, there are slight increases in parents’ and village leaders’ Similar results were seen in families’ average financial willingness to contribute to the learning process or to expenditure on education, as reported by parents. the KIAT Guru social accountability process, suggesting Comparing baseline and endline averages, there was an increase in finance (that is, resources) in the voice and an increase in parent-reported educational expenditure choice relationship oriented toward student learning. of 10.8 percent for SAM, 4.7 percent for SAM+Score, and 18.1 percent for SAM+Cam. Comparing the All of this supports hypothesis (a), which attributes the endline averages for each treatment group with that gains in student learning across all three KIAT Guru of the control group, the typical family in the SAM-only treatment groups to corresponding gains in coherence group spent 6.3 percent more on education than the for learning in the accountability relationships across typical family in the control group, with increases of 5.3 all three treatment groups. However, the data sources percent and 8.7 percent for SAM+Score and SAM+Cam, discussed in this section showed neither large nor However, there was a significant difference at baseline between SAM+Score (mean=IDR326075.9) and SAM+Cam (mean=IDR298329.5). 13 20. RESULTS consistent differences between treatment groups. 5.2.1 Changes in how favorably Thus, they do not lend any support to hypothesis (b), stakeholders regard each other which posits that the larger student learning gains in SAM+Cam are due to greater increases in coherence Many opinions expressed in the qualitative interviews across accountability relationships in this treatment and focus groups suggest that KIAT Guru improved group (relative to the other treatment groups). 
the coherence between village-level stakeholders, particularly in their expectations of each other (that is, delegation) and in communication with each other 5.2 Coherence between (that is, information), across all three treatment groups. stakeholders in accountability For example, when asked in a midline interview about changes in the school over the last six months, the relationships school leader at the SAM-only school SDN Engkangin As noted above, there are many aspects of coherence said that there had been a change in “motivation for in accountability relationships for education systems. educational cooperation at school”: Having used data from the service agreement indicators and quantitative stakeholder surveys to examine Before this, there were still a lot of parents changes in coherence for the purpose of learning, who were ignorant. . . . There was often the we now weigh the evidence for changes in coherence assumption that the school would bear all between stakeholders in accountability relationships, 100 percent of [the responsibility for] their drawing on interview and focus group data from the children, but after we had those socialization qualitative case studies in nine KIAT Guru schools. meetings then they understood, so they also contribute. In this part of the analysis, we interpret a stakeholder’s positive view of another stakeholder as an indication During an endline focus group discussion at the of (some degree of) coherence in their relationship, SAM+Cam school SDK Kondok, a community elder said especially when both sets of stakeholders in the that there was “closeness” in the relationship between relationship express favorable views of each other. 
This the school leader and the UC: requires the strong assumption that positive views of another stakeholder correspond to the belief that this If there are any issues that the UC doesn’t stakeholder is fulfilling their obligations as principal or fully understand, they ask the teachers agent in an accountability relationship. This assumption and the school leader. There’s good may fail for multiple reasons, ranging from dispiriting communication between the school and the reasons (for example, if parents’ expectations of UC. They are like partners. What needs to teachers are so low that they view teachers positively be improved is that from all parties, starting as long as the latter show up for part of the school from the school, the committee, and the UC, day) to more encouraging ones (for example, if parents there needs to be even more socialization have commendably aspirational but unrealistically high and alignment of perceptions so that all can expectations of teachers such that they view teachers understand. negatively despite the latter’s best efforts). Although we cannot conclusively test the extent to which this strong Similarly, during an endline interview, a teacher at assumption is met, this part of the analysis nonetheless the SAM+Score school SDI Konang said, “The UC has contributes to the overall understanding of complexity become a bridge connecting parents’ aspirations with and coherence in community accountability in KIAT the school.” Guru schools. 21. RESULTS To develop a more systematic view of the extent to oriented views among village-level stakeholders which these positive views represent overall changes expressed in interviews and focus group discussions in village-level educational accountability relationships, in the nine case study schools. 
The graphs in figure 5 we extracted and classified quotes pertaining to are based on straightforward proportions of positive stakeholders’ views of each other, as described above and positive learning-oriented views among the total in section 4.2. To begin with a high-level, coarse-grained count of views expressed, where each unit included in view, figure 5 summarizes overall changes over time in the count represents one stakeholder group’s views of the proportion of positive views and positive learning- another stakeholder group.14 Figure 5. Proportion of Positive Views of Other Stakeholders Expressed in Interviews and Focus Groups in the Nine Case Study Schools SAM-Only SAM+Score SAM+Cam SDI Sangka SDI Konang SDK Kondok 80% 60% 40% 20% 0% Baseline Midline Endline Baseline Midline Endline Baseline Midline Endline 80% SDN Engkangin SDN Sungai Keli SDN Sampuraneh 60% 40% 20% 0% Baseline Midline Endline Baseline Midline Endline Baseline Midline Endline 80% SDN Sungai Laur SD Simpang Dua SD Usaba Sepotong 60% 40% 20% 0% Baseline Midline Endline Baseline Midline Endline Baseline Midline Endline Source: Original figure for this publication Notes: Stakeholders included in the analysis were students, teachers, school leaders, parents, school committee members, UC members, village cadres, and village heads. Views of other stakeholders were extracted from interview and focus group transcripts and were classified, for each pair of stakeholders in each round of data collection (for example, teachers’ views of students at baseline), as either negative, neutral, mixed, positive, or positive and oriented toward student learning 14 In this broad-brush summary, we use proportions (for example, the proportion of views that were positive) rather than counts because the total number of units (that is, the denominator) varies across each round of data collection for each school, ranging from 25 (sets of) views at baseline in Sangka to 51 (sets of) views at endline in SDN Engkangin. 
This wide variation is due to several practical reasons: participants choosing not to answer certain interview and focus group questions about other stakeholders, participants offering opinions about specific stakeholder groups in response to general questions, or, in a few cases, interviews or focus groups that could not be conducted because the stakeholders in question were unavailable during the field research teams’ visits. Across all case study schools, the total number of views included in the classification increased between the baseline and the endline, largely because endline data collection included UC members and village cadres, who had yet to be appointed at baseline. 22. RESULTS As shown in figure 5, all nine case study schools saw For a more granular view, figure 6 shows the classifications increases in the proportion of total positive views (that of principals’ and agents’ views of each other in the is, generic positive + learning-oriented positive) between voice and choice relationship and the management the baseline and endline field visits. To the extent that relationship. (For brevity, this figure does not include positive views of other stakeholders may indicate that principals’ views of other principals in the voice and accountability relationships are functioning effectively, choice relationship; for example, it does not include this suggests that there may have been increases in the students’ views of the school committee, nor parents’ coherence of accountability relationships as the project views of students, and so on. The full classifications of all progressed. Also, across all nine case study schools, stakeholder pairs’ of views are available in appendix C.) there was an increase in the proportion of positive views The figure shows a general trend toward stakeholders’ that were oriented toward student learning, from 0 views becoming more positive in successive rounds of percent at baseline in all cases. data collection. 
For example, representative views held by parents at the SAM+Score school SDI Konang were However, these increases were not uniform. In four out decidedly mixed at baseline: of the nine schools (SDN Engkangin, SDN Sungai Laur, SDI Konang, and SD Simpang Dua), the proportion of total In terms of teacher quality, we see there are positive views declined between midline and endline some teachers who have never gone into (while remaining higher than the baseline level). Similarly, the school. . . . Teachers aren’t very active in six schools, there was a higher proportion of learning- in teaching, to the point where children are oriented positive views at midline than at endline. (In always playing. We parents feel disappointed. SDI Sangka, this decline between midline and endline In my view, [only] 80 percent are active, we was such that none of the participating stakeholders don’t know why. expressed learning-oriented positive views of others at endline.) It is worth noting that all six of these schools At midline, parents at the same school had generally were in the SAM-only and SAM+Score treatment groups. positive views of teachers: Put differently, all three SAM+Cam case study schools saw monotonic increases in the proportions of total The change in this school is that teachers positive views and of learning-oriented positive views come on time. Secondly, the way they teach in successive rounds of data collection. These patterns has changed. Like in terms of behavior, must be interpreted with a great deal of caution given previously my child wasn’t neat [that is, the very small sample of schools involved, as well as the didn’t dress neatly] because they were only fact that the absolute proportions of positive views were in second grade, but now their teacher has lower in some SAM+Cam schools than in some schools given encouragement and shown examples, in the other treatment group. 
(For example, among so they now look neater when they go to the three schools in Landak district, the SAM+Cam school. school Sampuraneh had lower proportions of total positive views at endline than the SAM-only school SDN By the endline data collection, parents’ views of teachers Engkangin and the SAM+Score school SDN Sungai Keli.) in SDI Konang were not only positive but also clearly Nevertheless, these patterns at least fail to contradict oriented toward learning. For example, one parent said, hypothesis (b) that SAM+Cam was the most effective “With KIAT Guru, teachers are also able to open their treatment at raising student learning outcomes because eyes well to educate children well,” later commenting it was most effective at improving the coherence of that “the impact [of KIAT Guru] is that children’s learning accountability relationships. outcomes have gone up.” 23. 24. Figure 6. Principals’ and Agents’ Views of Each Other in the Voice and Choice and Management Relationships at Baseline, Midline, and Endline VOICE & CHOICE MANAGEMENT "Principals’ (as below) views of "Agents’ (teachers) views of ✪ Positive view, oriented to student learning agents (teachers)" principals (as below)" ● Positive view, general Principal’s Agents’ ◑ Mixed view (head teacher) (teachers) view view of agents of principal ? Neutral or don’t know (teachers) (head teacher) ○ Negative view [ ] Not mentioned or asked Students Parents School committee Village head UC Village cadre Students Parents School committee Village head UC Village cadre SAM-Only SANGKA base ○ ◑ ● ? ◑ ○ ◑ ◑ ● mid ● ● ● ● ● ✪ ◑ ○ ◑ ◑ ◑ ◑ end ◑ ● ● ● ◑ ● ✪ ◑ ○ ◑ ● ● ◑ ENGKANGIN base ◑ ◑ ◑ ○ ◑ ◑ ● ○ ● ● mid ✪ ● ● ◑ ● ◑ ◑ ● ● ◑ ● ◑ end ✪ ● ● ● ● ● ◑ ◑ ◑ ○ ● ● ✪ ● SUNGAI LAUR base ◑ ◑ ◑ ◑ ○ ◑ ● ◑ ◑ mid ● ◑ ◑ ◑ ◑ ○ ● ● end ● ● ● ● ● ● ◑ ◑ ○ ◑ ◑ ● ● SAM+Score KONANG base ◑ ◑ ● ◑ ◑ ○ ◑ ○ ● ◑ mid ● ● ● ● ◑ ● ◑ ? ? ◑ ● ◑ end ● ✪ ● ● ● ◑ ◑ ◑ ? ● ● ◑ ● ◑ SUNGAI KELI base ● ◑ ◑ ◑ ◑ ◑ ◑ ? ● ◑ mid ◑ ● ◑ ◑ ● ◑ ◑ ◑ ? 
● ● ◑ end ● ● ● ● ● ● ● ◑ ? ○ ● ◑ ● ● SIMPANG DUA base ● ◑ ◑ ◑ ◑ ◑ ● ○ ◑ ◑ mid ● ● ● ◑ ◑ ● ○ ◑ ○ ● end ◑ ● ● ● ● ● ◑ ◑ ? ◑ ◑ ● ● ◑ SAM+Cam KONDOK base ◑ ● ● ● ◑ ○ ○ ◑ ◑ mid ● ● ● ● ● ◑ ◑ ○ ? ◑ ✪ ● end ● ✪ ● ◑ ● ● ✪ ● ● ◑ ● ✪ ● SAMPURANEH base ◑ ◑ ○ ◑ ◑ ○ ○ ○ ● ● mid ◑ ● ● ● ● ✪ ◑ ○ ◑ ● ◑ end ● ● ● ● ● ● ✪ ◑ ○ ○ ◑ ? ● ● "USABA base ◑ ◑ ◑ ◑ ◑ ◑ ◑ ○ ● ◑ SEPOTONG" mid ● ● ● ● ◑ ○ ● ○ ● ○ end ● ✪ ● ● ● ✪ ✪ ◑ ● ● ◑ ◑ ● ◑ Source: Original figure for this publication 25. RESULTS 5.2.2 Disparities between principals’ and These mixed views of parents in the qualitative case agents’ views of each other study schools were echoed in the quantitative surveys across all KIAT Guru schools, where roughly half of all Despite this overall increase in positive views and teachers answered affirmatively when asked whether decrease in negative views, the snapshot in figure 6 also they think that parents have a good level of involvement indicates one distinct shortfall in the coherence (broadly in their children’s education. There was little variation construed) of village-level accountability relationships: in this proportion between the baseline and endline the fact that principals’ views of agents tended to be surveys, and between treatment groups (ranging from much more favorable than agents’ views of principals. 46.9 percent in SAM+Cam schools at baseline to 51.7 For example, while parents’ (that is, one set of principals) percent in both SAM+Score schools at baseline and SAM- views of teachers (that is, agents) in SDI Konang markedly only schools at endline). However, neither the qualitative improved, as illustrated in the quotes above, teachers at case study interviews and focus group discussions the same school were more circumspect in their views nor the quantitative surveys allow us to disentangle of parents. 
For example, one teacher commented at a the combination of reasons underlying teachers’ less baseline focus group discussion: positive views of school leaders (in the management relationship) and of other village stakeholders (in the Almost all of the children don’t have time voice and choice relationship). Possible reasons include to study at home, except for the children school leaders and village stakeholders being less of [village or government] officials. There’s effortful or effective in fulfilling their parts of the service almost no time, with parents just trying to agreements, versus teachers simply being more critical earn money. The problem of learning is just or exacting in their judgements of other stakeholders. handed over to the school. At home, teaching Power dynamics related to teachers being more critical is only limited to things like manners. How in their judgments of other stakeholders are explored could they teach their children when they further in section 5.3. can’t read? 5.2.3 Summary This same teacher was somewhat more positive during a midline interview, noting that “there’s been a change This analysis suggests first that stakeholders viewed in parents’ motivation to make sure that their children each other more positively as the project progressed come to school.” However, he also commented that and second that these positive views became more roughly half of all parents failed to put their signatures oriented toward student learning. This aligns loosely on their children’s homework, a required action under with hypothesis (a) that attributes the student learning the KIAT Guru service agreements that parents support gains across all treatment groups to cross-treatment their children’s learning at home. He attributed this increases in the coherence of accountability relationships negligence to “the reason being that some parents still for student learning. 
However, as noted above, there don’t have awareness.” In his endline interview, this was limited concurrence between principals’ more teacher’s mixed views about parents persisted: positive views of agents and agents’ less positive views of principals; the SAM+Cam school SDK Kondok was It’s true that a portion of parents, 60 percent, the only school where principals and agents in both have fulfilled the service agreement. . .. the management and voice and choice relationships Some parts of the community don’t yet had generally positive views of each other at endline, as understand their roles and responsibilities in shown in figure 6. supporting children at home. 26. RESULTS As for hypothesis (b), which proposes that there were effective in coherently aligning the different elements of greater increases in the coherence of accountability the voice and choice relationship. A key design element relationships in SAM+Cam, there is some limited support in this coherence analysis is support to strengthen the in the coarse-grained summary shown in figure 5, in that accountability relationship. In the remote schools where the three SAM+Cam schools were the only case study KIAT Guru was implemented, the hierarchical differences schools that saw uniform increases in the proportions between higher-status teachers and school leaders of both generic positive and learning-oriented positive (agents) and lower-status parents and community views over time. Yet these advantages of SAM+Cam members (principals) meant that UCs in SAM-only and schools are neither large nor absolute, nor are they SAM+Score were insufficiently empowered for a strong clearly apparent in the more granular representation in accountability relationship between these committees figure 6. In short, at this level of detail, there is no distinct and teachers or school leaders. 
In SAM+Cam, the relationship between each treatment group’s impact on cameras provided UCs with teacher attendance data student learning and changes in coherence between that was neutral rather than observer-dependent, which stakeholders in accountability relationships. This absence not only improved the quality of information but also of a clear relationship applies not only when comparing gave support by boosting the perceived legitimacy of student learning gains across treatment groups, but UCs’ evaluations of teachers. also when looking at changes in student learning outcomes and coherence between stakeholders across In this section, we first draw on the qualitative case study the nine individual case study schools. The next section data to show the extent of the power imbalance between will consider a greater level of specificity—looking not teachers and community members, and to compare the only at overall coherence in stakeholders’ views of each coherence between design elements within the voice other but also at design elements within stakeholder and choice relationship across the three treatment relationships. groups. We then return to the process monitoring and quantitative survey data to weigh the evidence for two alternative hypotheses that could weaken the validity of 5.3 Coherence between design our explanation. elements within the voice and 5.3.1 Power dynamics in the voice and choice relationship choice relationship In this part of the analysis, we examine how the differences between the treatments manifested in the As in similar settings, teachers and school leaders in design elements of the voice and choice relationship, and the remote villages where KIAT Guru was implemented how this affected the degree to which this relationship tend to have significantly higher social status than the was coherent for learning. 
Accordingly, we focus on children and families whom they serve (see section 3 for hypothesis (b), that is, that SAM+Cam had the largest some examples). In other words, the agents in the voice student learning gains because it also had the greatest and choice relationship hold more power than their improvements in coherence for learning. principals. This can affect the local community’s ability to hold teachers accountable. For example, a SDI Sangka As noted in section 3, KIAT Guru affected both the UC member said, in an endline focus group discussion, management relationship and the voice and choice “We have never reprimanded them about their work relationship. That said, in this section we focus on the [because we’re] afraid they would be offended.” This village-level voice and choice relationship because the fear may be well-founded: in SDN Sungai Laur, a teacher qualitative case studies offer particularly rich data from who disagreed with their scorecard ratings angrily interviews and focus groups with stakeholders within confronted the UC “to the point where the whole village this relationship. We posit that SAM+Cam was the most knew” about the conflict (SDN Sungai Laur midline focus effective treatment group because it was the most group with UC members). 27. RESULTS Teachers are more empowered than most village-level Simpang Dua midline focus group with teachers), and stakeholders because of their education. This is partly a teacher scorecard rating of 113 percent (SD Simpang a question of social status. According to a parent in an Dua midline interview with school leaders).16 endline focus group in SDI Konang, Another factor that tilted the power balance toward KIAT Guru also had some negative impact teachers was their direct influence over individual because, actually, UCs don’t have the right children’s school experiences. For example, UC to evaluate teachers. 
As far as we know, members expressed the fear that agreeing to serve only inspectors can evaluate. . . . In my view, on the committee would result in teachers threatening teachers might just say to themselves, ‘you not to promote their children to the next grade (SDN farmers, on what basis could you evaluate Sungai Keli midline focus group with UC members) or in me?’ They might just feel, in their hearts, teachers not paying any attention to their children during that the community isn’t eligible to do classroom lessons (SDI Konang midline focus group with evaluations.15 UC members). In SDI Sangka, one UC member said in an endline focus group that children were afraid of speaking Besides social status, education levels also affect local to the UC because teachers had allegedly threatened perceptions about capacity for accurate evaluation. For to beat any children who gave information to the UC example, the village secretary in SDI Sangka said in an about teachers’ work. (That said, it is also important to endline interview that some UC members “don’t suit note that children in KIAT Guru schools also had agency. the standard” because, in his opinion, all UC members They were not necessarily passive service recipients. In should at least have completed upper secondary SDN Sungai Laur, some children had intentionally lied or school—a sentiment that was echoed by a SDN Sungai exaggerated unfavorably about their teachers to the UC, Laur parent in a midline focus group. To some extent, as mentioned separately by teachers in a midline focus this emphasis on formal educational qualifications group, the school leader in both midline and endline reflects real capacity issues in UC scorecard appraisals, interviews, and the village cadre in an endline interview. which were mentioned by village stakeholders in all three Some children also took advantage of the scorecard treatment groups. 
For example, a school committee member in the SAM-only school SDI Sangka said that “the service agreements don’t make sense, because it’s the teachers who understand how to teach the children, while the UC isn’t able to,” a sentiment echoed in an endline interview with a teacher in the SAM+Cam school SD Usaba Sepotong. Also, teachers in an endline focus group in the SAM+Score school SDN Sungai Keli remarked that UC members didn’t fully understand their duties, partly because their relatively low education levels hampered their ability to interpret the language of the service agreements. Specific errors in appraisal mentioned by teachers and school leaders include a teacher being penalized for an action that had in fact occurred the year before KIAT Guru was instituted (SD

prohibitions on corporal punishment to actively provoke teachers and behave rudely, because “they know that if the teacher gets mad, if they pinch me, I can tell the UC and ask them to fire that teacher,” as an SDN Engkangin UC member related in an endline focus group.)

15 See Broekman (2015) for similar views expressed from the teacher’s perspective in a separate study in Indonesia.

16 Another cause of UCs’ capacity issues was the fact that UC members were operating on a largely voluntary basis. Many UC members were farmers who could not leave their fields to monitor teachers’ arrivals and departures at school firsthand. Instead, they had to rely on school documents such as the teacher attendance register, as well as word of mouth from other children and other villagers, all of which could be unreliable (as noted by SDI Konang UC members in a midline focus group).

Besides teachers’ educational status and their sway over individual children’s experiences, another factor in the power dynamic between principals and agents was legitimation from higher authorities, or the lack thereof. Such legitimation is highly valued. This was apparent during the midline round of data collection in SDN Sungai Laur, which took place shortly after the chair of the UC had failed to travel to the national capital, Jakarta, for some KIAT Guru capacity-building sessions at the invitation of the central government. This omission was mentioned by multiple stakeholders, from other UC members who said that they were “deeply disappointed”; to teachers who said that they had “all lost out” on the information that would have been shared in Jakarta; to the village head who said that the errant UC chair “actually isn’t suitable to be the leader”; to a non-UC parent who said they had been “proud” when the UC chair was invited to Jakarta but “disappointed” by his failure to attend. In the field research team’s daily report, they note that the UC chair was regarded as having “besmirched the names of the school and the village” in failing to pursue this connection to higher administrative levels.

The interplay between village-level status hierarchies and legitimation from higher administrative levels was evident in this exchange between the SD Usaba Sepotong (SAM+Cam) school leader and a field researcher in an endline interview:

School leader: Because of their lack of knowledge, we are evaluated by people who have a lower education level than we do. That’s what the teachers objected to, having former students evaluate us. So there needs to be tolerance from them [that is, lenience from the UC in appraisals].
Interviewer: Have you said this to the UC before?
School leader: Yes, I have.
Interviewer: The UC wasn’t angry?
School leader: No, but the teachers have been angry with the UC. I said to the teachers, whatever scores they give, we will accept it, the important thing being that it’s not below 90.
Interviewer: What was the motivation for teachers [to accept it]?
School leader: Because it’s a government program, it must be done, and whatever happens it’s important that we submit.

This suggests first that the teachers would not have accepted the UC’s scorecard appraisals as legitimate on their own merits, but also that the authority conveyed by KIAT Guru as “a government program” was a crucial support in the UC–school accountability relationship. In turn, the UC members appear to have accepted the mantle of this government legitimation. During a midline focus group at the same school, UC members spoke of a teacher who had refused to have her picture taken using the KIAT Kamera application and who had “once said, while pointing with her hand, that ‘you’re my former students.’” One UC member deemed these actions to have “lower[ed] our status as UC members,” implying both that relative status matters and that UC members regard themselves as having special status within the village hierarchy. As we show below, these power dynamics substantially influenced the effectiveness of the three different KIAT Guru treatment groups.

5.3.2 Design elements and coherence in the voice and choice relationship

As discussed in section 3 on the conceptual framework, the design elements within any given accountability relationship, as well as the coherence between those design elements, are pivotal to effective functioning of the relationship. We now examine design elements within the voice and choice relationship across the three KIAT Guru treatment groups. These design elements are summarized in table 2 and are contrasted with the status quo in the control group.

Table 2. Design Elements of the Voice and Choice Relationship Between Families/Community Members (Principals) and Teachers/School Leaders (Agents) for Each KIAT Guru Treatment Group

Delegation: What do principals want agents to do?
Control: There is no clear delegation from the community to teachers and school leaders (besides the government-specified minimum service standard).
SAM-only ➕, SAM+Score ➕, SAM+Cam ➕: The community specifies service standards, as contextualized in service agreement indicators, for teachers and school leaders.

Finance: What resources do principals provide to agents?
Control: Parents sporadically contribute time, effort, and money to the learning process.
SAM-only ➕, SAM+Score ➕, SAM+Cam ➕: Parents, village governments, and the community contribute various amounts of time, effort, and money to the learning process and are regularly reminded to do so via SAM processes.

Information: How do principals know that agents are performing?
Control: Community monitoring is informal and ad hoc.
SAM-only ➕, SAM+Score ➕, SAM+Cam ➕: UCs monitor teachers and school leaders using community perceptions and learning assessments, based on the service agreements.
SAM+Cam, additional ➕: In SAM+Cam, user committees also use cameras.

Motivation: How does agents’ wellbeing depend on performance?
Control: Teachers and school leaders face some social rewards and penalties, but these are not systematic.
SAM-only ➕, SAM+Score ➕, SAM+Cam ➕: All teachers and school leaders face social rewards and penalties aligned around the SAM.
SAM+Score and SAM+Cam, additional ➕ ➕: In SAM+Score and SAM+Cam, allowance-eligible teachers and school leaders also face financial penalties.

Support: How is the accountability relationship strengthened?
Control: School committees receive little (if any) funding and technical support.
SAM-only ➕, SAM+Score ➕, SAM+Cam ➕: Communities have awareness-raising sessions about rights, responsibilities, and quality in education. UCs have commissioning decrees and some funding from the village government, and visits from the project facilitator.
SAM+Cam, additional ➕: In SAM+Cam, the camera also provides more authority to user committees.

Legend: ➕ = improvement relative to the control group, given the specific context (where the agents are more empowered than the principals, and so on)
Source: Original table for this publication

Beginning with delegation, or the specification of what principals want from agents, the SAM within KIAT Guru improved the clarity and the degree of local consensus about performance expectations for teachers. In the control group, dominant expectations of teachers came from the central government’s minimum service standards for all teachers, which are general, rather than being specifically calibrated by principals and agents at the local level.17 In contrast, the process of discussing and agreeing on service agreements by village-level educational stakeholders in KIAT Guru treatment groups yielded much more specificity and immediacy in performance expectations for teachers. For example, a teacher in the SAM+Cam school SDK Kondok said in an endline interview that

for me personally, I don’t run away from the duties of serving as a teacher. These come from service agreements from teachers, from the community, from parents, so that’s good for me. The benefit for me is that I can do my job well, as a guide for me.

17 In remote areas, this conceptual distance in delegation often goes hand-in-hand with physical distance that affects information; district supervisors and other education bureaucrats who can facilitate accountability and support are often headquartered in district capital cities and may lack adequate transport resources to visit the schools under their purview, as documented in a study of Papua and West Papua provinces in Indonesia (UNICEF, 2012).

This clear, relational, personally endorsed delegation was due to the SAM process rather than the use of cameras for performance pay, as was evident in this teacher’s remark, immediately before the quoted statement, that “if it’s the camera on its own, [some] people intentionally won’t operate the camera.” Teachers across the three treatment groups similarly affirmed the value of service agreements and scorecards as guides that orient them toward their duties.

Another shift that was largely consistent across treatment groups was improvements in finance, that is, greater resource provision from families, the community, and the village government toward children’s education. In the words of an SD Simpang Dua (SAM+Score) teacher in a midline focus group:

In the past, before there was a UC, it was difficult to ask parents for help. For example, if we wanted to do something they’d complain, but now they are willing to contribute to providing uniforms. This means they have become open and supportive of school activities.

These anecdotal reports of improvements in resource provision from the community were borne out in the quantitative survey data, with increases across treatment groups in the amount of time and money that parents contributed to the learning process (see section 5.1.2). Still, there was considerable variation in this increased provision, which affected teachers’ mixed views of parental contributions (see section 5.2.2). Besides parents, village governments also increased their educational contributions under KIAT Guru. Again, there was wide variation in the resources that village heads allocated to support KIAT Guru (World Bank 2020), with roughly half of all village governments starting new educational initiatives to honor the service agreements (see section 5.1.2).

There were, however, more distinct differences between treatment groups in how KIAT Guru affected the design elements of motivation, information, and support. For motivation, two of the treatments involved performance-based deductions from a special allowance for eligible teachers working in remote and otherwise deprived areas. Given that the special allowance was roughly equal to a teacher’s base salary, these deductions could affect agents’ (that is, teachers’ and school leaders’) wellbeing considerably. Both principals and agents in SAM+Cam and SAM+Score schools agreed that the performance-based deductions were a significant motivator. For example, when asked about what had changed at the SAM+Score school SDI Konang, UC members reported the following during an endline focus group discussion:

UC member 1: Before KIAT, teachers were not very active.
UC member 2: Looking at it as a community representative, we were afraid when a teacher was late. We couldn’t say too much. Now there is definitely a change in teacher attendance. Previously, they were not very aware of their duties, now they’re more hardworking.
UC member 3: Maybe they’re afraid of the program. This links to their salary.
UC member 4: Because of this UC duty, watching teachers from a distance early in the morning, so they are afraid. They arrive on time because there’s someone monitoring.
UC member 5: They are afraid that their salaries and benefits will go down.

In some cases, teachers themselves appreciated the role of performance pay in teacher motivation. When asked whether the allowance deductions were effective in improving teacher performance and student learning, a teacher at the SAM+Cam school SD Usaba Sepotong said in an endline interview, “If it’s reasonable, this deduction in value is good for teachers. Without it, then there’s nothing to control the teachers.”

Social rewards and penalties also helped to align teacher motivation with agreed-upon delegation. Crucially, this reputational pressure was deeply felt even in the treatment group that lacked financial incentives. In an endline interview, the school leader in the SAM-only school SDI Sangka observed that KIAT Guru had led to “a new spirit” among both teachers and students, attributing this change to the fact that

there’s a shared commitment to strengthening teachers’ attention and concentration. If this is ignored, then the community will draw unfavorable conclusions. Now teachers are afraid of community assessment through the UC.

Teachers in the SAM-only school SDN Sungai Laur described the extent of this social pressure in a midline focus group discussion, directly linking the reputational pressure to central government legitimation of KIAT Guru:

Teacher 1: Because [KIAT Guru scores] are reported to the center, its influence is on morale.
Teacher 2: They said it has no effect on our allowance, but it has to do with morale, loyalty to duty, good conduct.
Teacher 3: Like when I got scolded at the UC meeting, I felt crushed as a human being. . . . Those scores go directly to the District Education Management Unit, where all of them are my friends, so now they know that [Teacher 3] from that school is bad; so my friends from my schooldays have seen it.

This social pressure was also pivotal in improvements in practice from uncertified teachers in SAM+Cam and SAM+Score schools who were ineligible for the teacher special allowance and, accordingly, were unaffected by the financial disincentive.18 One such teacher from the SAM+Score school SDI Konang spoke about the impact of KIAT Guru on his sense of responsibility:

Before the program, sometimes I just did as I pleased. With this program, I am increasingly aware of what our duties as teachers should be. I think this is a very extraordinary impact. . . . In the beginning, I was the one who had the lowest score, so I reflected and thought that my friends were chasing points, chasing 100, because of the allowance. But then I was still motivated because even though I don’t get the allowance, this had become my main duty whether I liked it or not, so I had to improve myself.

18 The KIAT Guru impact evaluation (Gaduh et al., 2020) found that KIAT Guru negatively affected the attendance and effort of teachers who were ineligible for the teacher special allowance, and that this negative effect was significant among ineligible teachers in the SAM+Score treatment group. However, among the small subsample of schools in the qualitative case studies, there were no obvious between-treatment differences in the sentiments of teachers who were not eligible for the allowance. Across all three treatment groups, stakeholders noted that these ineligible teachers, who were subject to KIAT Guru appraisals despite receiving none of its financial benefits, felt disappointed and jealous of their colleagues. This also manifested in the non-performance-pay SAM treatment because many stakeholders mistakenly associated the provision of teacher special allowance with KIAT Guru, although they were distinct policy programs. The point here is not to support or contradict the impact evaluation observation of perverse effects among allowance-ineligible SAM+Score teachers, not least because the sample size of three schools per treatment group would hardly permit such conclusions. Rather, the point is to show that the KIAT Guru SAM generated non-financial incentives that substantially influenced teachers’ behavior, whether or not they were subject to performance pay (and whether the presence or absence of this financial disincentive was due to individual-level circumstances or to school-level treatment assignment).

Given the extent of this social and reputational pressure on teacher motivation, we cannot conclude that the different financial incentives of the treatment groups were sufficient to explain the greater effectiveness of SAM+Cam.19

19 We further explore this alternative explanation related to incentive structures in section 5.3.4.

Finally, we turn to the two remaining design elements, information and support. The cameras in SAM+Cam gave this treatment group a distinct advantage in both of these elements (see table 2). In terms of information, teachers and UC members across all three SAM+Cam schools said that they appreciated the cameras because they were harder to manipulate than other information sources, whether by lying or by lobbying. Notably, a few frontline providers (for example, the Sampuraneh school leader in his midline interview) mentioned that the cameras also reduced manipulation by UC members who wanted to preferentially inflate certain teachers’ scores. These favorable perceptions of the cameras for improving information on teacher attendance were also reflected in the quantitative survey. On average, SAM+Cam UC members rated the cameras 9.42 out of 10 for helpfulness in evaluating teacher attendance, with two-thirds of UCs choosing the highest possible rating.

In addition to their immediate role in improving the quality of information, the cameras also played an indirect but vital role in support. Specifically, the cameras strengthened the voice and choice relationship by lending authority to the relatively disempowered principals in their interactions with higher-status teachers. According to a project facilitator who supported KIAT Guru in SDK Kondok and five other schools,

the most accurate [scores] are those using cameras, because teacher attendance is really evidenced with those cameras. They are more scared of the cameras than of the UCs. Especially when it is connected to their allowance.

That is, the accuracy of the camera-mediated information, together with the threat of allowance deductions, added legitimacy to the social accountability process. This was echoed by a Sampuraneh school committee member in an endline interview, who observed that teachers were afraid of the camera-wielding UC but had no such fear of the preexisting school committee. The cameras in SAM+Cam rebalanced power dynamics toward the community as represented by the UC. The fact that this was a question of power, legitimacy, and status rather than simply a question of financial penalties is apparent in the way teachers spoke about the cameras. For example, a contract teacher said in the endline teacher focus group in SD Usaba Sepotong, “Once, I didn’t want to be photographed, because I considered it an oppression of teachers.” This teacher was ineligible for the teacher special allowance and thus unaffected by the financial incentives. Yet they viewed the camera not only as an instrument of information, but also an instrument of authority.

Although UCs under SAM+Cam enjoyed legitimation from the visibly official, purposefully designed, tamper-proof smartphone cameras, there were also other sources of support that strengthened the voice and choice relationship in all three treatment groups. First, as described in section 2.2, the village government formally appointed the UCs and the village cadre and allocated some funding for the implementation. Second, project facilitators visited the schools at regular intervals to coordinate meetings, provide on-the-job training for UCs and the village cadre, and facilitate the communication of challenging issues between stakeholders. These supports positioned the UCs, and the village-level voice and choice relationship more generally, to be far more effective and empowered than the long-established but circumscribed school committees.

Notwithstanding these sources of support that were common across treatments, our conclusion after examining the data is that SAM+Cam was the only treatment group in which the design elements in the voice and choice relationship were coherent for improvements in student learning in the context of the principals and agents in these remote village schools.20 Across all treatment groups, the service agreements improved delegation between principals and agents, particularly in the clarity, contextualization, and emphasis on student learning. The SAM also mobilized greater contributions of finance and resources from the community toward educational provision, which made teachers more amenable to the more demanding expectations. The monthly appraisal meetings, along with various instruments for collecting information on teachers’ and school leaders’ performance, also improved the quality of information in the voice and choice relationship. This was particularly true of SAM+Cam with its manipulation-resistant teacher attendance records, which teachers were less likely to question, unlike the scorecard appraisals, which some teachers did question extensively due to their doubts about UCs’ appraisal capacity, as noted above.

20 This also aligns with Dewi’s, Sharon’s, and Usha’s firsthand observations of KIAT Guru as they were involved in the project as researchers throughout its implementation.

However, in SAM-only and SAM+Score, the voice and choice relationship still fell short of coherence in the intervention contexts. Specifically, the improved delegation via the SAM placed greater demands on the relationship than it was able to deliver. Lacking technical and symbolic support from the camera, UC members in SAM-only and SAM+Score could not marshal enough authority, as lower-status villagers, to credibly hold teachers accountable to the new and more demanding performance expectations. In SAM-only schools, the imbalance was worsened because UCs could not deploy financial disincentives, such that the performance expectations that they delegated to teachers had too little influence over teachers’ motivation. In SAM+Score schools, the imbalance was worsened because UCs were seen as having too much influence (in the context of established local power dynamics) on the financial sources of teachers’ motivation, such that teachers challenged UCs’ perceived overreach. Given these imbalances between the design elements, the village-level voice and choice relationships in SAM-only and SAM+Score schools were not coherent enough to sustain the stakeholder cooperation needed to improve student learning. In contrast, the new demands on UCs in SAM+Cam were balanced with support, motivation, and information, such that the voice and choice relationship was adequately coherent for improvements in student learning.

From the previous two paragraphs, it is apparent that our argument for the greater effectiveness of SAM+Cam is fairly complex. The argument requires consideration not only of several design elements that interact with different degrees of coherence, but also of context-specific social hierarchies and power dynamics. To stress-test this argument, we weigh the evidence for two alternative, more parsimonious hypotheses. As shown in table 3, the first hypothesis focuses only on the design element of information, which prioritizes the technical aspect of the accountability relationship, rather than the social aspects emphasized in our hypothesis. The underlying assumption here is that for agents to perform well, it is sufficient for principals to monitor them accurately. In turn, the second alternative explanation focuses on the design elements of delegation, information, and motivation. This alternative explanation prioritizes agents and the incentive structures that they face. The assumption here is that for agents to perform well, it is sufficient to ensure that they face clear incentives. As noted in section 4.2 on the analytical approach, we weigh each alternative hypothesis by looking at both the strength of evidence for the alternative hypothesis itself and for its explanatory power in accounting for other findings from the field data.
Table 3. Design Elements in the Village-Level Voice and Choice Relationship That Are Incorporated into Hypotheses for the Greater Effectiveness of SAM+Cam

Design element | Our hypothesis | Alternative hypothesis 1 | Alternative hypothesis 2
Delegation | √ | | √
Finance | √ | |
Information | √ | √ | √
Motivation | √ | | √
Support | √ | |

Source: Original table for this publication

5.3.3 Alternative hypothesis 1: Not coherence, but better information

The first alternative hypothesis for the comparative success of the SAM+Cam treatment focuses on information. Specifically, it is possible that the improvements under SAM+Cam were driven primarily by the higher-quality teacher attendance information from the cameras. This would imply that the cameras did not confer symbolic support for the UCs’ authority, and also that the driving factor was not coherence between all the design elements, but rather the superiority of one design element on its own.

As noted above, the theory of change here is that technical improvement in monitoring was sufficient for improving teacher performance and student outcomes. This implies that the binding constraint in the other treatments (or, at least, in SAM+Score, which also had a performance pay element) was the quality of information. A possible mechanism here is that teachers’ financial penalties in SAM+Cam were based on this higher-quality, more neutral information source that made the treatment more acceptable to teachers. In contrast, the less reliable scorecard ratings, as in SAM+Score, resulted in teachers becoming dissatisfied with KIAT Guru and, consequently, demotivated (Gaduh et al. 2021; World Bank 2020).

The empirical data offer some suggestive evidence that UCs under SAM+Cam had higher-quality information, and that the quality of information mattered. For example, one piece of suggestive evidence that SAM+Cam had better information is that average teacher scorecard ratings were slightly but consistently lower (that is, less inflated) in SAM+Cam than in SAM+Score and SAM-only, as shown in figure 7. This is true whether or not the attendance indicator is included in the weighted average ratings. It is also true whether aggregating across all monthly appraisal rounds or looking at monthly within-treatment averages across the full duration of the study (with minor deviations from this pattern in the month-by-month case).21

21 For aggregated averages across all intervention months, the average scorecard rating, including the attendance indicator, was 92.6 percent for teachers under SAM, 94.3 percent for SAM+Score, and 90.4 percent for SAM+Cam. For weighted average ratings after removing the attendance indicator, the average ratings were 92.7 percent for SAM, 94.7 percent for SAM+Score, and 91.6 percent for SAM+Cam. In figure 7, the dip in average scores in SAM+Cam schools coincides with the first round of assessment following the first round of amendments to service agreement indicators; however, it is unclear why there was a sharp drop in SAM+Cam average scores but not in the other treatment groups.

Figure 7.
Average Teacher Scorecard Ratings by Treatment Group, with and without the Attendance Indicator
[Figure with two panels: (a) Average teacher scorecard rating, all indicators; (b) Average teacher scorecard rating, excluding the attendance indicator. Series: SAM-only, SAM+Score, SAM+Cam. Vertical axis: 0% to 100%. Horizontal axis: Apr-17 to Dec-18.]
Source: Original figure for this publication

However, for this alternative hypothesis to be persuasive, the field data also need to indicate that information quality in SAM+Cam improved sufficiently to alleviate what would otherwise have been a binding constraint, as experienced in the other two treatment groups. The data do not show this conclusively. For one thing, it is not obvious that the information available to UCs under SAM+Cam was substantially better than that available to UCs under the other two treatments. Despite the use of cameras to verify teacher attendance registers, the teacher attendance indicator in SAM+Cam schools was not immune to subjectivity, a fact noted by stakeholders in all three SAM+Cam qualitative case study schools. In a midline focus group discussion, an SD Usaba Sepotong teacher reported that

the other day, I didn’t want to use the camera because I actually did go home early. A user committee member said I could just go home first and take a photo later, but I didn’t want to tell a lie.

Similarly, an SDK Kondok teacher initially noted in an endline interview that “if it’s only manual [that is, written recording of attendance], sometimes teachers aren’t honest, but if it’s with cameras then it can’t be manipulated,” before adding, on reflection, that “if it’s just the camera alone, then people can purposely not operate the camera.” A Sampuraneh UC member said in an endline focus group that they would prefer it if KIAT Guru did not use cameras, because they preferred a more relational, heartfelt accountability, and that there was always the possibility that the camera would fall out of use, also noting, “It may be a camera, but those who operate it will always be human.” It is worth noting that human manipulation of ostensibly tamper-proof monitoring equipment (both through arbitrary exemptions and purported “failures” to use the equipment) was also a factor in the eventual failure of a performance-pay-for-attendance program for nurses in Rajasthan, India (Banerjee, Duflo, and Glennerster 2008).

Additionally, process monitoring data from across all KIAT Guru schools cast doubt on the proposition that SAM+Cam scorecard ratings were more rigorous than in the other treatment groups. Table 4 shows correlations between school-level average teacher scorecard ratings at endline (both with and without the camera-supported attendance indicator) and average student learning outcomes in both Indonesian language and mathematics at endline as well as baseline-to-endline changes. If SAM+Cam scorecard ratings were of higher quality than the scorecard ratings of the other treatment groups, then we would expect a consistently stronger correlation in SAM+Cam between these ratings and the desired outcome, that is, student learning levels. Instead, there are no consistent patterns. For the scorecard ratings that include the attendance indicator, SAM+Cam schools do have a stronger association in the expected direction with average student learning outcomes, but these correlations are not large (0.260 for language and 0.249 for mathematics). Moreover, this association does not hold for changes in scorecard ratings and in learning outcomes.22 Neither does it hold when the teacher attendance indicator is excluded, suggesting that any gains in information quality were limited just to the camera-verified indicator, without any spillovers to other indicators. In fact, school leaders in SAM+Cam schools gave UCs slightly lower ratings for their ability to choose scorecard indicators that are important to children’s learning (7.21, on a 10-point scale) compared with their counterparts in SAM-only (7.53) and SAM+Score (7.39). If the average UC in the other two treatment groups was at least as competent as the average SAM+Cam UC in appraising teacher performance on more complicated behaviors beyond merely showing up, it seems unlikely that the camera-mediated improvements in teacher attendance information were sufficient to alleviate a posited binding constraint in the quality of information that the other treatment groups hypothetically faced. All of this weighs against the argument that better-quality information, in and of itself, was the main driver behind the larger student learning gains in SAM+Cam.

Table 4. Correlations Between School-Level Student Learning Assessment Scores (in Indonesian Language and Mathematics) and Endline Teacher Scorecard Ratings (with and without the Attendance Indicator)

Student learning | All indicators, SAM-only | All indicators, SAM+Score | All indicators, SAM+Cam | Excluding the attendance indicator, SAM-only | Excluding the attendance indicator, SAM+Score | Excluding the attendance indicator, SAM+Cam
Language, endline | -0.208* | -0.239* | 0.260** | -0.234* | -0.130 | 0.097
Language, change | 0.184 | 0.224* | 0.158 | -0.368*** | -0.101 | 0.117
Mathematics, endline | -0.339*** | -0.127 | 0.249** | 0.122 | 0.252** | 0.113
Mathematics, change | 0.183 | 0.178 | 0.039 | 0.135 | 0.177 | 0.019

Source: Original table for this publication
Notes: All rating columns are endline teacher scorecard ratings. “Endline” denotes the average within-school student learning assessment/teacher service agreement score at endline. “Change” denotes the within-school percentage change in student learning assessments (comparing endline to baseline) and teacher service agreement scores (comparing endline to community-assessed retrospective ratings of how teachers performed prior to the start of the project). Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

22 Note that the changes in teacher scorecard ratings were based on comparing the average endline teacher scorecard rating with community members’ retrospective evaluations of how teachers performed prior to the inception of KIAT Guru. Although these retrospective evaluations were recorded only a few months after the intervention started, they are nonetheless subject to recall bias.

Besides the limited evidence for the alternative hypothesis that information was the most important design element in the greater effectiveness of SAM+Cam, another challenge to this hypothesis is its limited explanatory power. Specifically, this alternative hypothesis cannot, on its own, account for the fact that SAM+Cam led to multifaceted improvements in teacher performance above and beyond attendance rates, which were the only area for which information quality improved. For example, a UC member at the SAM+Cam school SDK Kondok said in an endline focus group that

there has been a change in the attendance of teachers such that they are more careful with their service agreement. More active and disciplined. . . . [In terms of] relationships between teachers and parents, there are many examples of good communication with the community. . . . Teachers have fulfilled their promise to use a variety of teaching and learning methods, and the children feel happier.

These encouraging improvements are evident not only across the case study schools but also in the quantitative survey data. As shown in figure 8, a much higher proportion of SAM+Cam parents at endline than at baseline (65.8 percent compared with 29.3 percent) reported that the quality of their children’s education, which implies far more than teacher attendance, had improved over the previous year (with similar distributions among SAM-only and SAM+Score parents). Similarly, among SAM+Cam UCs, 64.7 percent agreed that teacher performance had greatly improved since the UC had been established (with 62.7 percent of SAM-only UCs and 64.2 percent of SAM+Score UCs agreeing). If we take these data points at face value, they suggest that there were general improvements in teacher performance under SAM+Cam (as in the other treatment groups) that went beyond the performance area for which information quality improved. This weighs against the possibility that teachers were responding to improvements in information alone. If instead we interpret the similarity in survey response patterns across treatment groups as an indication that UCs and parents had limited capability for discerning meaningful improvements in performance, this would weigh against the possibility that information was a binding constraint in the other treatment groups but not in SAM+Cam.23 Either way, these survey indicators on improvements in educational quality, together with qualitative field data to the same effect, weaken the case for the information-focused hypothesis.

Figure 8. Parents’ Perceptions of the Quality of Their Children’s Education Compared with the Previous Year
[Figure: responses by treatment group at baseline and endline. Response categories: Better, Same, Worse, Do not know. Only two values per group and round are labeled; the first matches the Better share cited in the text, and the second is taken here to be the Same share.]

Treatment group | Round | Better | Same
SAM-only | Baseline | 26.3% | 65.6%
SAM-only | Endline | 63.3% | 31.0%
SAM+Cam | Baseline | 29.3% | 64.8%
SAM+Cam | Endline | 65.8% | 28.5%
SAM+Score | Baseline | 27.1% | 65.2%
SAM+Score | Endline | 63.4% | 32.3%

Source: Original figure for this publication

23 It is worth noting that these apparent similarities across treatment groups could be due either to (a) insufficiently granular response categories in the quantitative surveys or to (b) UCs and parents choosing to favorably inflate scorecard ratings to make their villages and schools look good. These possibilities may weaken the validity of the quantitative survey data on perceived performance improvements for the purposes of weighing alternative explanations. However, the arguments above from the process monitoring data (that is, scorecard ratings and learning assessments) and the qualitative case study data still stand.
In short, although information certainly played a role in the greater effectiveness of SAM+Cam, and although the quality of teacher attendance information in SAM+Cam was probably better than in the other treatment groups, the data do not provide convincing support for the argument that information, on its own, could sufficiently explain the differential effectiveness of the KIAT Guru treatments.

5.3.4 Alternative hypothesis 2: Not coherence, but clear incentives (information + delegation + motivation)

A second alternative hypothesis is that SAM+Cam was more effective because it created an incentive structure focused on a single, easily fulfilled metric, rather than the more complicated demands of SAM+Score. Teachers responded to this straightforward, formal incentive structure by improving their performance, leading to gains in student learning.24 In other words, the pivotal design elements were delegation, information, and motivation, as shown in table 3 above, rather than overall coherence across all five design elements.

Some of the quantitative survey data on the relationship between UCs and teachers may appear, at first glance, to support the incentives-focused hypothesis. When asked whether they felt pressured to give better scorecard ratings to teachers, 20.6 percent of SAM+Score UCs responded affirmatively, compared with only 8.1 percent of SAM+Cam UCs and 8.5 percent of SAM-only UCs. Similarly, 16.4 percent of SAM+Score UCs reported receiving threats not to give bad scores to teachers, compared with only 7.4 percent of SAM+Cam UCs. According to the impact evaluation paper, these differences between SAM+Score and the other treatment groups were significant (Gaduh et al. 2020).

The greater pressure faced by UCs in SAM+Score could be interpreted in (at least) two ways. One interpretation is consistent with the incentives-focused alternative hypothesis: teachers pressured UCs about their scores when those scores would have direct effects on their compensation, which was true in SAM+Score but neither in SAM-only, which did not have performance pay, nor in SAM+Cam, where financial incentives depended only on the nonnegotiable attendance metric. However, contrary to this interpretation, the camera-based teacher attendance indicator was, in fact, negotiable. The camera-recorded attendance data did not feed mechanistically to scorecard ratings and financial penalties. Rather, SAM+Cam UCs had to verify what was recorded by the cameras and triangulate the photographic records with formal leave allowances that had been approved by the school leader. This human-led verification process created room for “toleransi” (tolerance or lenience), as the village stakeholders called it. Tolerance could occur either on an individual basis, as reported in the quote above from an SD Usaba Sepotong teacher who said a UC member said she could leave school early and simply return later to take an end-of-day photo; or on a schoolwide basis, as reported in the Sampuraneh midline report from the field research team who observed that “there was tolerance for tardiness of up to 30 minutes that had been agreed by parties on both sides” and was “not in line with what was written in the service agreement indicators.”

Another interpretation of the greater pressure reported by SAM+Score UCs comes from the coherence-focused hypothesis: teachers in both SAM+Cam and SAM+Score schools had incentives to pressure the UC to give them higher scores (whether in attendance or overall), but the cameras lent the SAM+Cam UCs an authority that reduced the likelihood that teachers would question their scorecard ratings. (As for SAM-only UCs, they faced less pressure than their SAM+Score counterparts because their scorecard ratings had social consequences rather than having both social and financial ones.) Compared with the first interpretation, this interpretation is more consistent with the fact that UCs in both performance pay treatments could directly influence teacher salaries. Hence the coherence-focused hypothesis can account for these patterns in UC–teacher relationships more persuasively than the incentives-focused hypothesis.

24 In one sense, this could be taken as a benevolent form of Holmstrom and Milgrom’s (1991) multitask problem: in low-resource, low-performance settings, perhaps focusing on a single, straightforwardly quantifiable outcome may be more effective than expecting agents and principals alike to divide their attention among multiple tasks. However, Holmstrom and Milgrom assess the allocation of time and attention across multiple tasks, whereas in this case, the focal “task” is showing up at work and thus having enough time to be allocated to tasks in the first place.

Besides the survey data on pressure and threats to UCs, perhaps an even more challenging set of data points for the incentives-focused hypothesis is that the improvements in teacher attendance were neither large enough nor sufficiently different between treatment groups to convincingly demonstrate that attendance was the main driver of student learning gains. As shown in column 4 of table 5, gains in teacher attendance under SAM+Cam were non-significant (which was also true across all intervention groups). Moreover, as shown in columns 5, 6, and 7, respectively, two-sample t-tests also show that there was no significant difference between changes in teacher attendance under SAM+Cam and changes in teacher attendance in the control group; nor were there significant differences in such changes between SAM+Cam and SAM-only, or between SAM+Cam and SAM+Score. The only significant between-group difference in baseline–endline teacher attendance was between SAM-only and SAM+Score (p = 0.10, not shown in the table).25 This aligns with the regression analysis in the impact evaluation, which indicates that the main differential effect of KIAT Guru treatments was that SAM+Score had a significant negative effect on teacher attendance, which was driven by attendance declines among teachers who were ineligible for the teacher special allowance (Gaduh et al. 2021). Among allowance-eligible teachers, SAM+Cam had a small positive treatment effect on teacher attendance (a gain of 5 percentage points over the control group mean of 84 percent attendance), but this effect was only significant at the 15 percent level, and, unlike the student learning gains under SAM+Cam, the effect on eligible teachers’ attendance did not persist in the follow-up study one year after the project facilitators had left (Gaduh et al. 2021).26 In short, the salient point here is that SAM+Cam did not significantly improve teacher performance on the incentivized service agreement indicator (that is, attendance), which weighs against the hypothesis that the effectiveness of SAM+Cam was driven primarily by clear incentives.

Table 5.
Average Teacher Attendance Within KIAT Guru Treatment and Control Groups at Baseline and Endline, with Two-Sample T-tests for Between-Group Differences (%)

                                         Mean (standard errors)                        Difference between SAM+Cam and (p-value)
                                         Control    SAM-only   SAM+Score  SAM+Cam      Control    SAM-only   SAM+Score
                                         (1)        (2)        (3)        (4)          (5)        (6)        (7)
Average teacher attendance, baseline     77.25      76.72      79.22      79.45        2.20       2.73       -0.23
                                         (20.08)    (16.43)    (21.11)    (18.61)      (0.51)     (0.37)     (0.95)
Average teacher attendance, endline      80.31      83.93      79.66      85.03        4.71       1.09       -5.37*
                                         (18.51)    (17.70)    (16.87)    (16.09)      (0.12)     (0.71)     (0.06)
Difference between baseline and endline  3.06       7.21       0.44       5.58         2.51       -1.63      -5.14
                                         (20.85)    (21.38)    (25.50)    (18.99)      (0.47)     (0.64)     (0.19)
N                                        67         68         67         68           135        136        135

Source: Original table for this publication
Notes: Standard errors clustered at the school level. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

25 The corresponding t-tests for student attendance also failed to find significant gains in student attendance in the treatment groups (from baseline average student attendance ranging from 86.5 percent to 90.4 percent across the treatment groups). Changes in student attendance rates between baseline and endline in each treatment group were also not significantly different from changes in the control group. This aligns with the hypothesis that student learning gains in KIAT Guru were not due to increases in time spent in the classroom (by teachers or students) but were due to increases in the overall coherence of accountability relationships that led to changes in practice during time spent in the classroom and at home.

26 Beyond teacher attendance, the impact evaluation found that none of the treatments resulted in significant gains in the number of hours that teachers allocated weekly for school-related activities. However, teachers in SAM-only and SAM+Cam changed how they allocated these hours, such that there was a significant increase in time allocated to activities that were positively correlated with student learning (see Table 8 in Gaduh et al., 2021). This reinforces the argument that SAM+Cam (and KIAT Guru more generally) improved teacher performance not through the amount of time that teachers supplied but in how they used that time. This weighs against both the information-focused alternative hypothesis (because the informational difference in SAM+Cam focused on teacher attendance) and the incentives-focused alternative hypothesis (because the incentives in SAM+Cam similarly focused on attendance).

5.3.5 Summary

On balance, the coherence-focused hypothesis can explain more of the field observations than either the information-focused hypothesis or the incentives-focused hypothesis. This is not to imply that information and motivation were unimportant, nor that the alignment between delegation, motivation, and information in the formal incentive structures was unimportant. However, these elements in isolation cannot sufficiently explain the observed patterns without broadening the explanation to include the social nature of the accountability relationship. Village-level status hierarchies require support in order to sufficiently empower UCs to enforce the more demanding delegation of KIAT Guru; attention should also be given to the value of reciprocity in the accountability relationship, as demonstrated by the greater financial and other resources that village stakeholders contributed to complement the greater efforts demanded of teachers. Emphasizing the coherence between elements in this fundamentally social relationship of accountability may not be the most straightforward explanation, but it has the most explanatory power.
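As a rough check on the two-sample comparisons reported in table 5, the sketch below recomputes the endline differences between SAM+Cam and the other groups from the published summary figures alone. It rests on assumptions that are ours rather than the paper's: the parenthesized values in columns 1 to 4 are treated as group standard deviations (the table labels them standard errors), the schools are treated as independent observations with no clustering adjustment, and the p-value uses a normal approximation instead of the t distribution. The names `welch_t_test`, `GROUPS`, and `compare_with_sam_cam` are illustrative only and do not come from the KIAT Guru datasets.

```python
import math


def welch_t_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Compare two group means using only summary statistics.

    Returns (difference, t statistic, approximate two-sided p-value).
    The p-value uses a normal approximation, which is an assumption made
    here for simplicity; it is reasonable with roughly 67-68 schools per group.
    """
    diff = mean_a - mean_b
    # Unequal-variance (Welch) standard error of the difference in means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    t = diff / se
    # Two-sided tail probability under the standard normal distribution.
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, t, p


# Endline teacher attendance from table 5: (mean %, spread, number of schools).
# Assumption: the spread values are treated as standard deviations.
GROUPS = {
    "Control": (80.31, 18.51, 67),
    "SAM-only": (83.93, 17.70, 68),
    "SAM+Score": (79.66, 16.87, 67),
    "SAM+Cam": (85.03, 16.09, 68),
}


def compare_with_sam_cam(other):
    """Difference in endline attendance: SAM+Cam minus the named group."""
    return welch_t_test(*GROUPS["SAM+Cam"], *GROUPS[other])


if __name__ == "__main__":
    for other in ("Control", "SAM-only", "SAM+Score"):
        diff, t, p = compare_with_sam_cam(other)
        print(f"SAM+Cam vs {other}: diff={diff:.2f}, t={t:.2f}, p~{p:.2f}")
```

Under these assumptions the approximate p-values land close to the 0.12, 0.71, and 0.06 reported in columns 5 to 7 of the endline row, which is consistent with the reading that none of the SAM+Cam comparisons reaches the 5 percent level. Note that this sketch computes every difference as SAM+Cam minus the other group, so its sign for the SAM+Score comparison is the opposite of the one printed in column 7.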
06 Discussion

Our analysis of the KIAT Guru project for accountability in education in remote Indonesian villages aligns with arguments for the importance of coherence between stakeholders in service delivery for the poor (Pritchett, 2015; World Bank, 2003). We find that one key element of coherence in the intervention context, where most local community members were relatively disempowered compared with their frontline service providers, is support that comes from tools, processes, and structures that strengthen the accountability relationship, especially by empowering the weaker set of actors. Where communities have few resources, local-level accountability is most effective when communities are empowered with both the technical tools and legitimacy to hold teachers and school leaders accountable for education service delivery. In the SAM+Cam treatment, using dedicated smartphone cameras to monitor teacher attendance improved both the technical quality and the symbolic authority of UCs’ monitoring of teachers.

Our analysis also aligns with arguments for combining community-led voice with state-led teeth in social accountability (Fox 2015; see also Kosec and Wantchekon 2020). KIAT Guru improved the coherence between the voice and choice relationship (between communities and frontline providers) and the management relationship (between education authorities and frontline providers) by involving school leaders and a government-funded special allowance in a community-monitored performance pay scheme. By combining community-led voice with state-led teeth in a way that is coherent with the configuration of principals and agents and other aspects of the implementation context, KIAT Guru bucked the trend of school-based management committees in many parts of the Global South that present a formal and politically appealing image of educational decentralization but fail to respond to local needs and preferences (for example, see Bano [forthcoming]).

If we had not looked at village-level power dynamics and the importance of support, one (erroneous) conclusion might have been that SAM+Cam was the most effective treatment for raising student learning because teaching in these low-resourced remote schools is a straightforward task where the most important thing is the teacher showing up. Under this conclusion, potential side effects from skewed allocations of attention toward the incentivized behaviors (Holmstrom and Milgrom 1991; Murnane and Cohen 1986) would be irrelevant. However, taking power dynamics and social status hierarchies into account makes it clear that SAM+Cam was the most effective treatment precisely because teaching was regarded as a complex task. Consequently, the UCs were regarded as lacking the legitimacy to evaluate teachers, especially when this evaluation affects teacher pay. To use the terminology of Honig and Pritchett (2019), SAM+Cam struck the right balance between accounting-based accountability that linked official financial incentives to standardized, verifiable camera-monitored attendance data and accountability that linked social rewards and penalties to multifaceted community scorecards that were part of a deliberative, interactive social accountability process. Crucially, these two aspects of the voice and choice accountability relationship were mutually reinforcing, that is, coherent.27 That said, it is worth noting that in a different context where parents and community members were at least as well educated as their children’s teachers, SAM+Score could hypothetically be the more coherent intervention. In such a context, the community may have the technical capacity and social legitimacy to evaluate scorecards tailored to improving the complex task of teaching, holding teachers accountable through financial incentives based on the scorecards.

Methodologically, our analysis highlights the value of combining different types of data and levels of analysis in explaining the mechanisms underlying an intervention. Our argument for the importance of support with attention to local power dynamics would have been far weaker without the qualitative data from interviews and focus groups in the nine case study schools. Yet the qualitative data from this small sample of schools, however richly detailed, would not have been as persuasive if it had not been complemented by the quantitative process monitoring data, survey data, and service agreement indicator data from across all 203 treatment schools. It is also worth noting that different types and levels of data were suitable for different aspects of the analysis; for example, granular interview and focus group data led to an understanding of how teachers perceived the intervention, and aggregated data from unannounced visits to schools helped gauge whether the intervention affected teacher attendance rates.

27 For an example of complementary interventions that improved the overall coherence of the management relationship and resulted in student learning gains, Mbiti et al. (2019) find that providing both financial grants and test-based teacher financial incentives to schools in Tanzania improved student learning far more than providing either the grants or the incentives in isolation.

07 Conclusion

Frontline service delivery failures, particularly by those serving poor communities and those in remote locations, remain frustratingly common throughout low-income countries and some middle-income countries.
The case of KIAT Guru shows that strengthening the capacity and authority of parents and the broader community could be one pathway for improving service delivery outcomes and holding service providers more accountable. However, this requires strong policy support, particularly in a context where power relations between the two parties are not balanced.

It is worth restating that all three KIAT Guru treatments improved learning outcomes. In other words, independent of whether teachers faced financial disincentives, the SAM seemed to work. However, the empowerment of poor remote communities needs multiple reinforcements: regulations and resources (from government), on-site guidance and endorsement (from project facilitators), knowledge of which information and data are relevant (from capacity building), and willingness to contribute time (from the communities themselves).

Although it successfully improved learning outcomes, KIAT Guru as implemented was too complex an intervention to be scaled up. Given that SAM is a package of interventions (even apart from the different performance pay mechanisms), it would be useful, from a cost-effectiveness perspective, for future studies to identify which elements matter most: information on benchmarked learning outcomes, joint service agreements, community monitoring, or monthly meetings. To that end, between 2019 and 2020, the same project partners rolled out a simplified version of the project, expanding its scope to 410 schools. This Phase 2 project streamlined the SAM and digitized some of the administration process (including learning outcomes and service agreements). It also varied the configuration of involved stakeholders to test the influence of different accountability relationships: one more external to the school, as in the Phase 1 treatments discussed in this paper, and one more internal to the school, where the preexisting school committee took on the role of the UC.
Although some of these Phase 2 iterations seemed to improve program effectiveness, the impact evaluation could not be completed due to the pandemic (World Bank 2021).

Overall, the three treatment groups discussed in this paper differ in their potential for sustained collaboration and impact. This study validates findings from the previous KIAT Guru studies, suggesting that SAM-only and SAM+Score were not as effective as SAM+Cam because they were less coherent. SAM+Score generated the most pushback from teachers who deemed community members to lack the capacity and legitimacy to evaluate them. As for SAM-only, the follow-up KIAT Guru impact evaluation found that student learning gains were not sustained after the project facilitators stopped supporting the project in each village (Gaduh et al. 2021). However, SAM+Cam appears to be coherent for improvements in student learning in these implementation contexts because teachers deemed the evaluation to be fair, while community members found that the cameras boosted their limited capacity.

08 References

ACDP (Analytical and Capacity Development Partnership). (2014). “Study on Teacher Absenteeism in Indonesia 2014.” Jakarta, Indonesia: Education Sector Analytical and Capacity Development Partnership.

Andrabi, T., and C. Brown. (Forthcoming). “Subjective Versus Objective Incentives and Employee Productivity.” RISE Working Paper Series.

Andrabi, T., J. Das, and A. I. Khwaja. (2017). “Report Cards: The Impact of Providing School and Child Test Scores on Educational Markets.” American Economic Review 107 (6): 1535–63.

ASER. (2014). Annual Status of Education Report (Rural) 2013. New Delhi, India: ASER Centre.

Banerjee, A. V., and E. Duflo. (2008). “Mandated Empowerment: Handing Antipoverty Policy Back to the Poor?” Annals of the New York Academy of Sciences 1136 (1): 333–41. https://doi.org/10.1196/annals.1425.019.

Banerjee, A. V., E. Duflo, and R. Glennerster. (2008). “Putting a Band-Aid on a Corpse: Incentives for Nurses in the Indian Public Health Care System.” Journal of the European Economic Association 6 (2–3): 487–500. https://doi.org/10.1162/JEEA.2008.6.2-3.487.

Bano, M. (Forthcoming). “International Push for SBMCs and the Problem of Isomorphic Mimicry: Evidence from Nigeria.” RISE Working Paper Series.

Barrera-Osorio, F., P. Gertler, N. Nakajima, and H. A. Patrinos. (2021). Promoting Parental Involvement in Schools: Evidence from Two Randomized Experiments. Research on Improving Systems of Education (RISE). https://doi.org/10.35489/BSG-RISE-WP_2021/060.

Beatty, A., E. Berkhout, L. Bima, T. Coen, M. Pradhan, and D. Suryadarma. (2018). Indonesia Got Schooled: 15 Years of Rising Enrolment and Flat Learning Profiles. Research on Improving Systems of Education (RISE). https://doi.org/10.35489/BSG-RISE-WP_2018/026.

Bjorkman, M., and J. Svensson. (2009). “When is Community-Based Monitoring Effective? Evidence From a Randomized Experiment in Primary Health in Uganda.” Journal of the European Economic Association 8 (2–3): 571–81. https://doi.org/10.1111/j.1542-4774.2010.tb00527.x.

BPS (Badan Pusat Statistik). (2017). Portret Pendidikan Indonesia. Statistik Pendidikan 2017. Jakarta: Badan Pusat Statistik.

Broekman, A. (2015). “The Effects of Accountability: A Case Study from Indonesia.” In J. Evers and R. Kneyber (eds.), Flip the System: Changing Education from the Ground Up, 72–96. Taylor & Francis Group. https://doi.org/10.4324/9781315678573-7.

Cilliers, J., I. Kasirye, C. Leaver, P. Serneels, and A. Zeitlin. (2018). “Pay for Locally Monitored Performance? A Welfare Analysis for Teacher Attendance in Ugandan Primary Schools.” Journal of Public Economics 167: 69–90. https://doi.org/10.1016/j.jpubeco.2018.04.010.

De Ree, J., K. Muralidharan, M. Pradhan, and H. Rogers. (2018). “Double for Nothing? Experimental Evidence on an Unconditional Teacher Salary Increase in Indonesia.” Quarterly Journal of Economics 133 (2): 993–1039. https://doi.org/10.1093/qje/qjx040.

Duflo, E., R. Hanna, and S. P. Ryan. (2012). “Incentives Work: Getting Teachers to Come to School.” American Economic Review 102 (4): 1241–78. https://doi.org/10.1257/aer.102.4.1241.

Fox, J. A. (2015). “Social Accountability: What Does the Evidence Really Say?” World Development 72: 346–61. https://doi.org/10.1016/j.worlddev.2015.03.011.

Gaduh, A., M. Pradhan, J. Priebe, and D. Susanti. (2020). “Scores, Camera, Action? Incentivizing Teachers in Remote Areas.” RISE Working Paper Series 20/035. https://doi.org/10.35489/BSG-RISE-WP_2020/035.

Gaduh, A. B., M. P. Pradhan, J. Priebe, and D. Susanti. (2021). “Scores, Camera, Action: Social Accountability and Teacher Incentives in Remote Areas.” Policy Research Working Paper Series no. 9748. World Bank. https://openknowledge.worldbank.org/handle/10986/36112.

Holmstrom, B., and P. Milgrom. (1991). “Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design.” Journal of Law, Economics and Organization 7 (Special Issue): 24–52.

Honig, D., and L. Pritchett. (2019). “The Limits of Accounting-Based Accountability in Education (and Far Beyond): Why More Accounting Will Rarely Solve Accountability Problems.” RISE Working Paper Series 19/030. https://doi.org/10.35489/BSG-RISE-WP_2019/030.

Huang, A. R., S. Revina, R. Fillaili, and Akhmadi. (2020). “The Struggle to Recruit Good Teachers in Indonesia: Institutional and Social Dysfunctions.” RISE Working Paper Series 20/041. https://doi.org/10.35489/BSG-RISE-WP_2020/041.

Islam, A. (2019). “Parent–Teacher Meetings and Student Outcomes: Evidence From a Developing Country.” European Economic Review 111: 273–304.

Kosec, K., and L. Wantchekon. (2020). “Can Information Improve Rural Governance and Service Delivery?” World Development 125: 104376. https://doi.org/10.1016/j.worlddev.2018.07.017.

Kurniasih, H., V. Utari, and Akhmadi. (2018). “Character Education Policy and Its Implications for Learning in Indonesia’s Education System.” RISE Insight. https://doi.org/10.35489/BSG-RISE-RI_2018/007.

Lumbanraja, S. K., and I. A. Prameswari. (2021). “Diagnostic Test to Increase Community Participation in Improving Learning Outcomes in Indonesia’s Remote Primary Schools: Quick Test (English).” World Bank Group, Brief. http://documents.worldbank.org/curated/en/977701617083603131/.

Lumbanraja, S. K., I. A. Prameswari, and D. Susanti. (2021). “Community Participation in Measuring Learning Outcomes in Remote Areas of Indonesia: Results from the Development and Implementation of Tes Cepat Teacher Performance and Accountability (KIAT Guru).” World Bank, Jakarta, Background Paper. https://openknowledge.worldbank.org/handle/10986/35466.

Marliyanti, U. R. Adelina, and D. Susanti. (Forthcoming, 2022). “From Facilitation to Participation: Community Empowerment to Improve Education in Remote Areas.”

Mbiti, I., K. Muralidharan, M. Romero, Y. Schipper, C. Manda, and R. Rajani. (2019). “Inputs, Incentives, and Complementarities in Education: Experimental Evidence from Tanzania.” Quarterly Journal of Economics 134 (3): 1627–73. https://doi.org/10.1093/qje/qjz010.

Murnane, R., and D. Cohen. (1986). “Merit Pay and the Evaluation Problem: Why Most Merit Pay Plans Fail and a Few Survive.” Harvard Educational Review 56 (1): 1–18. https://doi.org/10.17763/haer.56.1.l8q2334243271116.

Narwana, K. (2015). “A Global Approach to School Education and Local Reality: A Case Study of Community Participation in Haryana, India.” Policy Futures in Education 13 (2): 219–33. https://doi.org/10.1177/1478210314568242.

PAL Network. (2018). PAL Network 2018 Annual Plan: Assessment for Action. https://palnetwork.org/wp-content/uploads/2019/03/2018_PAL_Annual-Plan-Budget_Final.pdf.

Pawson, R. (2013). The Science of Evaluation: A Realist Manifesto. SAGE Publications. https://doi.org/10.4135/9781473913820.

Pawson, R., and N. Tilley. (1997). Realistic Evaluation. London; Thousand Oaks, Calif.: SAGE Publications.

Porpora, D. V. (2015). Reconstructing Sociology: The Critical Realist Approach. New York: Cambridge University Press.

Pradhan, M., D. Suryadarma, A. Beatty, M. Wong, A. Gaduh, A. Alisjahbana, and R. P. Artha. (2014). “Improving Educational Quality through Enhancing Community Participation: Results from a Randomized Field Experiment in Indonesia.” American Economic Journal: Applied Economics 6 (2): 105–26. https://doi.org/10.1257/app.6.2.105.

Pritchett, L. (2013). The Rebirth of Education: Schooling Ain’t Learning. Washington, DC: Center for Global Development.

Pritchett, L. (2015). “Creating Education Systems Coherent for Learning Outcomes.” RISE Working Paper Series 15/005. https://doi.org/10.35489/BSG-RISE-WP_2015/005.

Spivack, M. (2021). “Applying Systems Thinking to Education: The RISE Systems Framework.” RISE Insight Note 2021/028. https://doi.org/10.35489/BSG-RISE-RI_2021/028.

Stern, J., and L. Nordstrum. (2014). Indonesia 2014: The National Early Grade Reading Assessment (EGRA) and Snapshot of School Management Effectiveness (SSME) Survey; Report of Findings. USAID, EdData II.

UNICEF. (2012). “‘We Like Being Taught’: A Study on Teacher Absenteeism in Papua and West Papua.” https://eric.ed.gov/?id=ED566745.

World Bank. (2003). World Development Report 2004: Making Services Work for Poor People. The World Bank. https://doi.org/10.1596/0-8213-5468-X.

World Bank. (2019). Primary Education in Remote Indonesia: Survey Results from West Kalimantan and East Nusa Tenggara. World Bank. https://openknowledge.worldbank.org/handle/10986/33113.

World Bank. (2020). “Community Participation and Teacher Accountability: Improving Learning Outcomes in Remote Areas of Indonesia.” https://openknowledge.worldbank.org/handle/10986/33807.

World Bank. (2021). “Implementation Completion and Results Report (ICR) Document – Indonesia: Improving Teacher Performance and Accountability (KIAT Guru) Phase 2 - P167216.” http://documents.worldbank.org/curated/en/897531625000304453/.

09 Appendices

Appendix A. Descriptive Quantitative Analysis of the KIAT Guru Surveys and Process Monitoring

The descriptive analysis below is formulated to quantitatively complement this reanalysis paper as well as the findings of the original qualitative study (World Bank 2020). The original qualitative study assesses the impacts of KIAT Guru treatments on the following categories: (a) teacher presence; (b) teacher performance and school leadership; (c) parent participation; (d) student learning, attitudes, and discipline; (e) village government activities; (f) UC effectiveness; and (g) village cadre performance. In this appendix, we provide descriptive analysis findings for all categories except (a) and (d), for which the impact evaluation reports provide robust analysis (Gaduh et al. 2020, 2021). A selection of variables provided from various questionnaires was employed to gauge any changes within the respective categories between baseline and endline surveys. In general, the changes we aim to focus on are those related to the coherence between the actions of various stakeholders and the objective of achieving better learning outcomes.

Findings are presented in two categories: the overall treatment level, which covers 203 treatment schools; and case study schools, which covers nine treatment schools that were included in the qualitative study. Codes in parentheses (for example, LED05, DPA13) refer to indicator names used in the KIAT Guru quantitative survey and process monitoring datasets.

1. Teacher Performance

• Treatment Level. Across all treatment groups, at least 98.5 percent of UCs in 202 schools28 saw improvements in teachers’ performance since they had become active (LED05).
In comparison with the baseline, there was an increase in the proportion of parents who viewed their children’s education to be at least “good” during the endline survey (DPA13; from 86.1 percent at baseline to 95.3 percent at endline) and a higher proportion of parents who viewed the quality of their children’s education in better light compared with the previous year (DPA14; from 27.6 percent to 64.2 percent).

• Nine Case Study Schools. Findings corroborated the overall results with very few exceptions. Only two out of the nine case study schools saw a decreased endline proportion of parents who perceived their children’s education to be at least good. Meanwhile, UCs in all nine schools agreed that there had been improvement in teachers’ performance since the UC had become active. The proportion of parents who thought the quality of education had been better than during the previous year showed consistent increase.

2. School Leadership

• Treatment Level. With the exception of SAM+Score school leaders, there was an increase in the proportion of school leaders who performed evaluations other than the routine SKP29 (BEV17; from 69.2 percent to 75.7 percent in SAM-only schools and from 74.2 percent to 80.6 percent in SAM+Cam schools). This was corroborated by teachers, who reported an increase from 79.0 percent to 84.4 percent in principals conducting non-SKP performance evaluations (CME01). There was also an increase in the percentage of school leaders who gave rewards to well-performing teachers (BEV22; from 33.9 percent to 45.4 percent in SAM-only schools and from 42.4 percent to 54.8 percent in SAM+Cam schools), with the exception of SAM+Score schools (decreasing from 47.0 percent to 43.3 percent). Out of 202 UCs across all treatment groups, 95.6 percent attested to improvements in the principal’s performance since the UC became active.

• Nine Case Study Schools. Eight of the case study UCs reported an improvement in the school leader’s performance since they had become active. At baseline, school leaders performed evaluations other than the routine SKP in only seven schools. However, at endline we found that all school leaders performed teacher evaluations other than the routine one. For schools where school leaders had already conducted non-SKP teacher evaluations, we saw an increase in the number of evaluation criteria. We did not, however, find more school leaders giving rewards to well-performing teachers in these schools.

3. Local Government Support

• Treatment Level. Of 202 UCs, 79.2 percent perceived village government performance to have improved since they had become active (LED14). These numbers were slightly lower than the proportion of UCs who thought there had been improvements from the teachers’ side (95 percent). Interestingly, SAM+Score villages had the highest proportion of UCs attesting to village governments taking novel initiatives to honor scorecard indicators (LSR06; 48 percent out of 68 schools). On the other hand, 93.0 percent of 57 SAM+Cam village heads were found to have enthusiasm in attending KIAT Guru meetings, compared with 82.8 percent out of 58 SAM-only villages and 89.8 percent out of 59 SAM+Score villages (APDD05d). Similar enthusiasm could also be found in all schools’ village governments; on average, 74.7 percent of village heads across treatment groups verified that other village government officials had also attended KIAT Guru meetings (APDD05g). With regard to support from school supervisors and district education agencies, between baseline and endline surveys we saw an increase in the proportion of SAM+Cam schools that had been visited by school supervisors, from 86.8 percent to 89.7 percent, whereas schools in the two other treatment groups saw a decrease in the proportion (from 87.8 percent to 77.9 percent in SAM-only schools and from 88.0 percent to 82.0 percent in SAM+Score schools).

• Nine Case Study Schools. With the exception of one school, UCs in case study schools saw improvements in village government officials’ performance since they had become active. While almost all schools’ village governments were involved in KIAT Guru activities such as meetings and proposing education initiatives, SD Usaba Sepotong (SAM+Cam school) did not experience these beneficial changes.

28 During the 2018 midline survey, enumerators failed to meet any representatives from the UC in one of the SAM-only schools during the field visit, hence data are available for UCs in 202 out of the 203 schools.

29 Sasaran Kinerja Pegawai, or Officer Performance Targets, a mandatory annual performance evaluation for all civil service officers.

4. Parent Participation

• Treatment Level. The introduction of KIAT Guru shifted the belief that school was the only party responsible for children’s education. In all schools, we saw a notable increase in the proportion of parents who thought communities were also responsible for children’s education (DEO; from 70.5 percent to 80.0 percent), with parents in SAM-only schools contributing to the increase.

[...] minutes during baseline to 34.7 minutes during endline (DPR06). Additionally, parents also raised the investment in their children’s education by increasing the average education expenditure by Rp 34,269 (11.2 percent) between baseline and endline (DPS).

• Nine Case Study Schools. In four schools, the proportion of parents who believed communities were also responsible for children’s education grew between baseline and endline. In the case study schools, UCs saw improvement in parent participation, with the sole exception of SDI Konang (a SAM+Score school). This perception is corroborated by the higher proportion of teachers who thought parents had a good level of involvement in children’s education, with the exception of two schools. Parents in SAM+Cam
Concurrently, 95.0 percent of 202 UCs case study schools exhibited greater positive reported improvements in parents’ participation changes in their level of involvement, with since they had become active (LED08). However, consistently higher proportions of parents who such perceptions were not mirrored by teachers, helped with their children’s homework, increased as the change in proportion of teachers who their education expenditure, and increased the thought parents had a good level of involvement number of days per week allocated to accompany in children’s education hovered at 43.0 percent their children’s home learning. However, parents (CKP15). in other treatment groups compensated for involvement with higher allocated time per day to Yet the data also show minimal behavior accompany children learning at home. change from parents. The proportion of parent respondents who had ever helped with their 5. UC Effectiveness children’s homework decreased by 0.2 percentage points in SAM-only schools and 0.8 percentage • Treatment Level. There is a negligible difference points in SAM+Score schools between baseline across treatment groups in terms of the efforts and endline, while the proportion increased that UCs exerted as measured by the average by 3.3 percentage points in SAM+Cam schools number of members involved in monthly (DPR02). Additionally, the proportion of parents monitoring and monthly evaluation meetings or guardians who accompanied their children in (LMS02; LMM01). According to principals, UCs learning at home within the past week decreased in SAM-only and SAM+Cam schools exhibited by 3.3 percentage point in SAM-only schools, 6.2 a slightly greater effort in providing suggestions percent in SAM+Cam schools, and 3.9 percent regarding the learning process at schools (BUC05; in SAM+Score schools (DPR04). There also was at 85.3 percent and 83.8 percent, respectively, a decrease in the average number of days on compared with SAM+Score at 80.6 percent). 
which parents accompanied children’s learning Correspondingly, parents in SAM+Score schools at home per week, from 3.4 days during baseline had the lowest awareness of the UC’s existence to 3.2 days during endline (DPR05). However, in their village (DUC02; at 50.5 percent compared parents compensated for this reduction by with 59.0 percent for SAM-only and 57.5 percent increasing the average daily allocated time to for SAM+Cam). Parents also thought that their accompany children learning at home, from 31.8 cooperation with teachers had improved since UCs 53. APPENDICES became active (DUC25). Interestingly, SAM+Score • Nine Case Study Schools. We found no evidence of UCs had the highest average perception scores pressure or threats from schools to UCs based in self-assessing whether they had the capability on the quantitative data for the nine case study to evaluate teachers without feeling intimidated schools. Fascinatingly, leaders in SAM+Score (LMM08; 8.8 compared with 8.6 for SAM-only and case study schools had the most favorable views 8.7 for SAM+Cam). of UCs’ ability to choose important indicators compared with schools in other treatment groups, • Nine Case Study Schools. In seven of the case study in contrast to the overall trend. schools, the average number of UC members involved in monthly monitoring exceeded that 7. UC and Cadre Performance (From Process of the average in 202 schools. All case study UCs Monitoring Data) also exhibited great confidence in their capability to evaluate teachers without feeling intimidated. • Treatment Level. Across all treatment groups, more With the exception of two schools, parents in than 60 percent of village cadres have at least case study school samples showed good levels of high school education. This may be attributed to awareness regarding the UC’s work in their village. the nature of the role that requires willingness to With the exception of SD Simpang Dua (SAM+Score learn about technology. 
As of March 2018, the school), at least 80 percent of parents thought proportion of cadres with at least a high school UCs had improved parent-teacher cooperation in education is the lowest in SAM+Score schools their respective schools. (61.2 percent) compared with SAM-only (73.5 percent) and SAM+Cam (75.0 percent) schools. On 6. Parent-Teacher Dynamics the other hand, the proportion of female cadres, 25.6 percent out of 203 cadres, indicates that the • Treatment Level. Contrary to UCs’ self-assessments role is still male-dominated. SAM-only schools had in the previous point, the highest proportion of the lowest representation of female cadres (23.5 UCs that felt intimidated in discussing score percent), although the difference with SAM+Cam results with teachers is exhibited in SAM+Score (26.5 percent) and SAM+Score (26.9 percent) schools (LMS12; 9.5 percent compared with is insignificant, magnitude-wise. Overall, cadres 3.4 percent for SAM-only and 4.8 percent for show adequate capability in delivering their tasks SAM+Cam). Additionally, SAM+Score schools had with at least 80 percent of them regularly attending the highest proportion of UCs who felt pressured and speaking up in meetings, as well as having to give better scores to teachers (LMS16; 20.6 facilitation skills, which is their main responsibility percent compared with 8.5 percent in SAM-only as a cadre. In terms of their performance, village and 8.0 percent in SAM+Cam). In schools with facilitators consistently graded SAM+Cam cadres’ pay-for-performance mechanisms, SAM+Score performance lower than their SAM-only and schools had a higher proportion (16.4 percent) SAM+Score counterparts, as suggested in the of UCs who received threats from teachers table below. compared with SAM+Cam schools (7.4 percent). 
Interestingly, the lowest average school leader perception score on the UC’s ability to choose important education indicators is exhibited in the SAM+Cam group (7.2), in contrast to SAM+Score (7.4) and SAM-only (7.5) groups. 54. APPENDICES Table A.1. Profile of KIAT Guru Cadres, by Treatment Group At least high Regularly attends Speaks up in Has facilitation Treatment school education, Female, % meetings, % meetings, % skills, % % SAM-only 73.5 23.5 93.9 87.9 90.9 SAM+Cam 75.0 26.5 92.7 80.6 84.9 SAM+Score 61.2 26.9 97.0 87.9 89.4 Source: Original table for this publication, number of samples fluctuates between 65 to 68 for each treatment group and classification In comparison with the cadres, the proportion of indicates more equal gender representation, with UC members with at least high school education females representing 44.3 percent, 47.1 percent, is lower across all treatment groups. However, the and 43.7 percent of the total members in SAM, trends of the between-treatment comparison are SAM+Cam, and SAM+Score UCs respectively. reversed, as SAM+Score UCs were found to have While almost 80 percent of UC members regularly the highest proportion of members with at least attend meetings, only half of them were brave a high school education (32.9 percent) compared enough to speak up at these meetings. Nearly 20 with SAM-only (24.6 percent) and SAM+Cam percent of them had facilitation skills. (29.8 percent). Moreover, the structure of the UC Table A.2. Profile of KIAT Guru UCs, by Treatment Group At least high Regularly attends Speaks up in Has facilitation Treatment school education, Female, % meetings, % meetings, % skills, % % SAM-only 24.6 44.3 80.2 55.5 23.1 SAM+Cam 29.8 47.1 79.5 55.4 17.5 SAM+Score 32.9 43.7 81.9 51.7 21.5 Source: Original table for this publication, number of samples fluctuates between 578 to 630 for each treatment group and classification • Nine Case Study Schools. Eight cadres had at least school-educated in SDN Engkangin. 
The female high school education and only two of them were representation also varies, with six of the schools female. Due to incomplete data, we do not have achieving more than 50 percent representation information regarding the capacity of SDI Sangka’s and three of them having less than the average rate cadre. All cadres attended meetings regularly. of female representation. Among the nine case However, two of them rarely spoke up in meetings study schools, UCs in the SAM+Score treatment and one of these two had inadequate facilitation group had lower willingness to regularly attend skills. meetings. However, when it comes to speaking up in meetings, UCs in SAM+Cam case study schools The education profile of UC members in the performed worse than other treatment groups. nine case study schools varies from none of Intuitively, the provision of KIAT Kamera may have the members having a high school education in dampened the need to voice out opinions in the SDI Konang to almost all members being high meetings. 55. APPENDICES Appendix B Amendments to the Teacher and School Leader Service Agreement Indicators Table B.1. 
Service Agreement Indicators for Teachers in SDK Kondok, Pre- and Postamendment

| Indicator | Preamendment weight | Postamendment weight |
| --- | --- | --- |
| Indicators directly related to student learning | | |
| Teacher gives homework to students and makes sure parents are aware of and sign off on students’ completed homework | 10 | 10 |
| Teacher creates problem sets for student learning groups to work on and provides instructions to the problem sets during classroom learning | 10 | – |
| Teacher uses varieties of teaching methods, including storytelling, singing, role playing, and question-answer with students, as well as teaching aids | 10 | – |
| Teacher supervises student learning groups by conducting regular visits to all groups at least once a month | 10 | 10 |
| Grades 1–3 teachers ask students to rehearse reading letters and numbers daily before classroom lessons begin | – | 10 |
| Teachers provide remedial assistance for students who are behind by providing additional lessons 15 minutes before class begins | – | 20 |
| Grades 1–3 teachers use letter and number cards as teaching aids for students to be able to read and count | – | 10 |
| Teacher supervises student learning groups by conducting regular visits to all groups at least once a month | – | 10 |
| Subtotal: directly linked to learning | 40 | 70 |
| Indicators not directly related to student learning | | |
| Teacher arrives and leaves on time: Monday–Thursday, 7:30 a.m. to 12:20 p.m.; Friday–Saturday, 7:30 a.m. to 10:55 a.m. | 20 | 20 |
| Teacher disciplines students gently with positive discipline. Teachers are not to use harsh words or physical punishment when disciplining students | 20 | 10 |
| Teacher motivates students using positive encouragement and advice | 10 | – |
| Teacher informs parents of students who are facing challenges in school by conducting a visit to their home and writing a formal letter to the parents | 10 | – |
| Subtotal: indirectly linked to student learning | 60 | 30 |
| Total weights | 100 | 100 |

Source: Original table for this publication

Table B.2. Average Weighting of Teacher Service Agreement Indicator Categories, Pre- and Postamendment

| Code | Description of indicators | SAM-only Pre- | SAM-only Post- | SAM+Score Pre- | SAM+Score Post- | SAM+Cam Pre- | SAM+Cam Post- | Overall Pre- | Overall Post- |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A-01 | Teacher applies fun and motivating learning techniques in classroom | 4.9 | 5.0 | 7.4 | 6.3 | 7.3 | 4.4 | 6.6 | 5.2 |
| B-02 | Teacher develops and applies lesson plan; assists students during classroom learning | 4.5 | 5.0 | 2.2 | 6.2 | 2.8 | 3.1 | 3.1 | 4.8 |
| C-03 | Teacher uses teaching aids | 4.4 | 5.5 | 3.1 | 5.2 | 3.2 | 7.9 | 3.5 | 6.2 |
| C-04 | Teacher works to improve students’ literacy and numeracy skills | 2.8 | 12.5 | 2.2 | 10.6 | 2.3 | 14.1 | 2.4 | 12.4 |
| D-05 | Teacher strives to ensure students’ learning comprehension, including by providing feedback | 14.6 | 13.0 | 11.6 | 12.2 | 11.5 | 12.5 | 12.6 | 12.6 |
| E-06 | Teacher uses positive discipline with students and avoids any form of verbal or corporal punishments | 14.1 | 10.8 | 14.6 | 9.3 | 14.2 | 9.1 | 14.3 | 9.8 |
| E-07 | Teacher offers motivation, appreciation, and praise to students | 1.3 | 1.4 | 1.8 | 1.3 | 1.6 | 1.0 | 1.5 | 1.2 |
| E-08 | Teacher instills religious, cultural, and social norms in students | 3.7 | 1.4 | 4.6 | 2.4 | 4.8 | 2.4 | 4.4 | 2.1 |
| E-09 | Teacher inculcates patriotism and values of obedience and orderliness in students | 1.9 | 1.3 | 4.0 | 2.4 | 3.1 | 2.7 | 3.0 | 2.1 |
| E-10 | Teacher does not ask students to work for teacher’s personal needs | 0.6 | 0.4 | 0.9 | 0.2 | 1.2 | 0.4 | 0.9 | 0.3 |
| F-11 | Teacher starts and ends class on time | 30.4 | 29.3 | 30.4 | 28.2 | 31.6 | 27.1 | 30.8 | 28.2 |
| F-12 | Teacher requests permission and provides a legitimate reason and evidence for any absences | 0.1 | 0.0 | 0.4 | 0.1 | 0.0 | 0.0 | 0.2 | 0.0 |
| F-13 | When absent, teacher ensures she or he is replaced by a substitute and complies with administration procedure for taking leave | 2.0 | 0.5 | 0.5 | 0.6 | 1.6 | 0.9 | 1.4 | 0.7 |
| G-14 | Teacher sets a good example through their behavior while in school | 0.8 | 0.4 | 1.5 | 1.1 | 1.9 | 0.6 | 1.4 | 0.7 |
| H-15 | Teacher promotes the use of Bahasa Indonesia as the medium of communication in school | 1.1 | 1.3 | 1.3 | 1.8 | 1.8 | 2.0 | 1.4 | 1.7 |
| H-16 | Teacher informs parents of student learning progress | 1.8 | 1.8 | 1.9 | 2.2 | 1.8 | 2.0 | 1.8 | 2.0 |
| H-17 | Teacher holds and attends meetings with parents and community members; teacher communicates with UC | 1.6 | 0.3 | 1.2 | 0.4 | 1.1 | 0.3 | 1.3 | 0.3 |
| I-18 | Teacher provides remedial sessions to improve students’ learning comprehension | 6.1 | 6.9 | 5.2 | 7.4 | 4.5 | 7.6 | 5.3 | 7.3 |
| I-19 | Teacher provides assistance to students during physical education, local content, scouting, and other extracurricular activities | 1.7 | 2.1 | 3.7 | 1.5 | 2.5 | 1.7 | 2.7 | 1.8 |
| I-20 | Teacher educates students to promote clean and orderly school environment | 1.8 | 1.1 | 1.6 | 0.7 | 1.3 | 0.5 | 1.6 | 0.8 |
| | Total weighting for indicators directly related to learning | 40.1 | 50.9 | 34.8 | 51.8 | 35.1 | 53.5 | 35.1 | 52.1 |

Legend: Top 3 categories preamendment; top 3 categories postamendment
Source: Original table for this publication
Note: The 8 (out of 20) indicator categories directly related to learning are A-01, B-02, C-03, C-04, D-05, H-15, H-16, and I-18, as shown in blue text.

Figure B.1. Changes in Teacher Service Agreement Indicators, by Treatment Group

[Bar chart: change in weighting for each teacher indicator category (A-01 to I-20), for SAM-only, SAM+Cam, and SAM+Score; vertical axis from -8.0% to 14.0%. The plotted values are listed below.]

| Category | SAM-only | SAM+Cam | SAM+Score |
| --- | --- | --- | --- |
| A-01 | 0.1% | -2.9% | -1.2% |
| B-02 | 0.6% | 0.4% | 4.0% |
| C-03 | 1.2% | 4.7% | 2.1% |
| C-04 | 9.7% | 11.8% | 8.4% |
| D-05 | -1.6% | 0.9% | 0.6% |
| E-06 | -3.3% | -5.1% | -5.3% |
| E-07 | 0.1% | -0.6% | -0.4% |
| E-08 | -2.3% | -2.4% | -2.2% |
| E-09 | -0.6% | -0.4% | -1.6% |
| E-10 | -0.2% | -0.8% | -0.7% |
| F-11 | -1.1% | -4.6% | -2.2% |
| F-12 | -0.1% | 0.0% | -0.3% |
| F-13 | -1.5% | -0.8% | 0.1% |
| G-14 | -0.4% | -1.4% | -0.4% |
| H-15 | 0.1% | 0.2% | 0.6% |
| H-16 | 0.0% | 0.2% | 0.3% |
| H-17 | -1.3% | -0.8% | -0.8% |
| I-18 | 0.8% | 3.1% | 2.2% |
| I-19 | 0.4% | -0.8% | -2.3% |
| I-20 | -0.6% | -0.8% | -0.9% |

Source: Original figure for this publication
Notes: All values are within-treatment-group average changes, in percentage points. See table B.2 for full descriptions of each indicator category.

Figure B.2.
Changes in Teacher Service Agreement Indicators, Average Across Treatment Groups

[Bar chart: overall change in weighting for each teacher indicator category (A-01 to I-20); vertical axis from -6.0% to 12.0%. The plotted values are listed below.]

| Category | Overall |
| --- | --- |
| A-01 | -1.3% |
| B-02 | 1.7% |
| C-03 | 2.6% |
| C-04 | 10.0% |
| D-05 | 0.0% |
| E-06 | -4.6% |
| E-07 | -0.3% |
| E-08 | -2.3% |
| E-09 | -0.9% |
| E-10 | -0.6% |
| F-11 | -2.6% |
| F-12 | -0.1% |
| F-13 | -0.7% |
| G-14 | -0.7% |
| H-15 | 0.3% |
| H-16 | 0.2% |
| H-17 | -1.0% |
| I-18 | 2.0% |
| I-19 | -0.9% |
| I-20 | -0.8% |

Source: Original figure for this publication
Notes: All values are overall average changes, in percentage points. See table B.2 for full descriptions of each indicator category.

Table B.3. Average Weighting of School Leader Service Agreement Indicator Categories, Pre- and Postamendment

| Code | Description of indicators | SAM-only Pre- | SAM-only Post- | SAM+Score Pre- | SAM+Score Post- | SAM+Cam Pre- | SAM+Cam Post- | Overall Pre- | Overall Post- |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A-01 | Principal sets a good example through her or his attendance and behavior while in school | 11.0 | 29.8 | 9.9 | 28.1 | 10.2 | 27.9 | 10.4 | 28.6 |
| A-02 | Principal uses positive discipline with students | 6.4 | 8.0 | 7.8 | 5.7 | 5.6 | 7.5 | 6.6 | 7.1 |
| A-03 | Principal offers motivation, appreciation, and praise to students and teachers | 0.2 | 0.9 | 0.6 | 1.0 | 0.2 | 0.7 | 0.3 | 0.8 |
| A-04 | Principal tends to the needs of the community, attends meetings with parents, and communicates with school committee and UC | 17.6 | 2.2 | 16.3 | 1.5 | 14.7 | 0.9 | 16.2 | 1.6 |
| B-05 | Principal develops curriculum and ensures that teachers develop lesson plans and deliver fun learning activities | 2.3 | 7.9 | 2.7 | 9.4 | 1.6 | 8.3 | 2.2 | 8.5 |
| B-06 | Principal communicates and ensures that other teachers communicate student learning progress to parents | 1.0 | 1.4 | 1.2 | 4.1 | 1.1 | 2.5 | 1.1 | 2.7 |
| B-07 | Principal instills patriotism and values of compliance, cleanliness, and orderliness in students | 4.0 | 5.2 | 4.6 | 6.0 | 4.1 | 6.9 | 4.3 | 6.0 |
| C-08 | Principal develops and reports school’s work and budget plan | 0.2 | 2.9 | 0.2 | 1.2 | 0.5 | 2.8 | 0.3 | 2.3 |
| D-09 | Principal optimally manages and uses school facilities to support learning activities | 27.6 | 9.6 | 31.5 | 6.3 | 35.7 | 8.0 | 31.6 | 8.0 |
| D-10 | Principal manages school environment that promotes security, safety, and health | 1.9 | 3.9 | 2.1 | 4.6 | 1.2 | 3.4 | 1.7 | 4.0 |
| D-11 | Principal manages the schedule and supervises teacher’s additional assignments as well as extracurricular and cleaning activities in school | 3.7 | 5.8 | 3.3 | 6.2 | 3.9 | 7.4 | 3.6 | 6.5 |
| D-12 | Principal manages school administration | 1.9 | 3.3 | 1.2 | 2.3 | 1.1 | 2.7 | 1.4 | 2.8 |
| D-13 | Principal holds meetings with teachers | 0.8 | 1.3 | 1.5 | 4.1 | 1.1 | 3.8 | 1.1 | 3.1 |
| E-14 | Principal develops teaching schedule for teachers, conducts supervision over teacher’s discipline, and ensures learning activities are implemented | 21.5 | 17.8 | 17.3 | 19.4 | 19.0 | 17.2 | 19.3 | 18.1 |
| | Total weighting for indicators directly related to learning | 24.8 | 27.1 | 21.2 | 32.9 | 21.8 | 28.0 | 22.6 | 29.3 |

Legend: Top 3 categories preamendment; top 3 categories postamendment
Source: Original table for this publication
Note: The 3 (out of 14) indicator categories directly related to learning are B-05, B-06, and E-14, as shown in blue text.

Figure B.3. Changes in School Leader Service Agreement Indicators, by Treatment Group

[Bar chart: within-treatment-group change in weighting for each school leader indicator category (A-01 to E-14), for SAM-only, SAM+Cam, and SAM+Score; vertical axis from -30.0% to 30.0%.]

Source: Original figure for this publication
Notes: All values are within-treatment-group average changes, in percentage points. See table B.3 for full descriptions of each indicator category.

Figure B.4. Changes in School Leader Service Agreement Indicators, Average Across Treatment Groups

[Bar chart: change in weighting for each school leader indicator category (A-01 to E-14); vertical axis from -30.0% to 25.0%. The accompanying values are listed below.]

| Categories | SAM-only | SAM+Cam | SAM+Score | Overall |
| --- | --- | --- | --- | --- |
| A-01 | 18.8% | 17.7% | 18.2% | 18.2% |
| A-02 | 1.5% | 1.9% | -2.1% | 0.5% |
| A-03 | 0.7% | 0.5% | 0.4% | 0.5% |
| A-04 | -15.4% | -13.8% | -14.8% | -14.6% |
| B-05 | 5.6% | 6.7% | 6.7% | 6.3% |
| B-06 | 0.4% | 1.4% | 2.9% | 1.6% |
| B-07 | 1.2% | 2.8% | 1.4% | 1.8% |
| C-08 | 2.7% | 2.3% | 1.0% | 2.0% |
| D-09 | -18.0% | -27.7% | -25.2% | -23.6% |
| D-10 | 2.0% | 2.2% | 2.5% | 2.3% |
| D-11 | 2.1% | 3.5% | 2.9% | 2.9% |
| D-12 | 1.4% | 1.5% | 1.1% | 1.4% |
| D-13 | 0.5% | 2.7% | 2.6% | 1.9% |
| E-14 | -3.7% | -1.8% | 2.1% | -1.1% |

Source: Original figure for this publication
Notes: All values are overall average changes, in percentage points. See table B.3 for full descriptions of each indicator category.

Appendix C
Detailed Analysis of Coherence Across Relationships in Case Study Schools

Figures C.1, C.2, and C.3 categorize each stakeholder group’s overall opinions of other village-level stakeholders, as expressed in interviews and focus groups during the field visits, for each of the three case study schools in each treatment group. These figures are summarized in figure 5 in the main body of the paper, which shows the proportion of expressed views (excluding stakeholders’ views of themselves) that are positive.
Stakeholder groups’ overall opinions of other village-level stakeholders are classified as follows:

• positive and explicitly oriented toward student learning (for example, students now study diligently; the quality of teachers’ lessons has improved; because of parents’ support more children now can read)
• positive, which can include some mildly negative views if on the whole the opinion is obviously positive (for example, one teacher still disciplines children harshly, but all of the teachers are now punctual and hardworking)
• mixed (that is, having positive views about some actions but negative views about others; some members of the stakeholder group having positive views and others having negative ones)
• neutral or don’t know (for example, saying that they don’t know what the school committee’s duties are; listing the committee’s duties without stating whether committee members perform their duties effectively)
• negative (for example, teachers are tardy; the UC doesn’t bother checking school documentation; parents don’t help their kids with homework)

Figure C.1. Stakeholders’ Views of Each Other in SAM-Only Case Study Schools
Opinions of interview and focus group participants (rows) about other school stakeholders (columns)

[Matrix figure, treatment: SAM only. Panels: SDI Sangka, SDI Engkangin, SDN Sungai Laur. Periods: 1. Baseline, 2. Midline, 3. Endline. Rows: students, teachers, head teacher, parents, school committee, village head, UC (from midline), village cadre (endline). Columns: students, teachers, head teacher, parents, school committee, UC, village head, village cadre. Legend: positive view, oriented to student learning; positive view, general; mixed view; neutral or don’t know; negative view; not mentioned or asked. Stakeholders’ views of themselves and stakeholders’ views of teachers are highlighted.]

Source: Original figure for this publication

Figure C.2. Stakeholders’ Views of Each Other in SAM+Score Case Study Schools
Opinions of interview and focus group participants (rows) about other school stakeholders (columns)

[Matrix figure, treatment: SAM+Score. Panels: SDI Konang, SDN Sungai Keli, SD Simpang Dua. Periods: 1. Baseline, 2. Midline, 3. Endline. Rows: students, teachers, head teacher, parents, school committee, village head, UC (from midline), village cadre (endline). Columns: students, teachers, head teacher, parents, school committee, UC, village head, village cadre. Legend: positive view, oriented to student learning; positive view, general; mixed view; neutral or don’t know; negative view; not mentioned or asked. Stakeholders’ views of themselves and stakeholders’ views of teachers are highlighted.]

Source: Original figure for this publication

Figure C.3. Stakeholders’ Views of Each Other in SAM+Cam Case Study Schools
Opinions of interview and focus group participants (rows) about other school stakeholders (columns)

[Matrix figure, treatment: SAM+Cam. Panels as labeled in the source: SDI Sangka, SDI Engkangin, SDN Sungai Laur. Periods: 1. Baseline, 2. Midline, 3. Endline. Rows: students, teachers, head teacher, parents, school committee, village head, UC (from midline), village cadre (endline). Columns: students, teachers, head teacher, parents, school committee, UC, village head, village cadre. Legend: positive view, oriented to student learning; positive view, general; mixed view; neutral or don’t know; negative view; not mentioned or asked. Stakeholders’ views of themselves and stakeholders’ views of teachers are highlighted.]

Source: Original figure for this publication
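
As a rough illustration of how the summary statistic described at the start of this appendix (the proportion of expressed views, excluding stakeholders’ views of themselves, that are positive) could be computed from a coded matrix of views, the following Python sketch may help. It is not the authors’ procedure. The category keys, the function name `share_positive`, and the example data are hypothetical, and the sketch assumes that both positive categories count as positive while mixed, neutral, and negative views remain in the denominator.

```python
# Category keys are hypothetical names for the five-way classification
# listed in this appendix; they are not identifiers from the study.
POSITIVE = {"positive_learning", "positive"}
CATEGORIES = POSITIVE | {"mixed", "neutral", "negative"}


def share_positive(views):
    """Return the share of expressed views that are positive.

    `views` maps (respondent_group, rated_group) to a category key, or to
    None where the stakeholder was not mentioned or asked about.
    Self-assessments and None entries are left out of the denominator
    (an assumption of this sketch).
    """
    expressed = [
        category
        for (respondent, rated), category in views.items()
        if respondent != rated and category is not None
    ]
    unknown = set(expressed) - CATEGORIES
    if unknown:
        raise ValueError(f"unrecognized categories: {sorted(unknown)}")
    if not expressed:
        return None  # no expressed views to summarize
    positives = sum(category in POSITIVE for category in expressed)
    return positives / len(expressed)


# Made-up illustration, not data from any case study school.
example_views = {
    ("parents", "teachers"): "positive_learning",
    ("parents", "parents"): "positive",   # self-view, excluded
    ("teachers", "parents"): "mixed",
    ("teachers", "uc"): "negative",
    ("uc", "teachers"): "positive",
    ("uc", "village_head"): None,         # not mentioned or asked
    ("village_head", "teachers"): "neutral",
}

print(share_positive(example_views))
```

In this made-up example, five views are counted and two of them are positive. Treating mixed or neutral views differently (for example, dropping them from the denominator) would change the result, so the choice would need to follow the definition used for figure 5.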