Scaling Up Collaborative Social Accountability in Complex Governance Systems: A Relational Approach for Evidencing Sustainability

Florencia Guerzovich and Alix Wadeson

JANUARY 2024

Abstract

When social accountability interventions scale up and their sustainability depends on the interactions of many agents and system components, related results are rarely observable at the end of an intervention. The 2019 OECD Development Assistance Committee’s (OECD DAC) revamped evaluation criteria for assessing sustainability acknowledge that such results are often emergent, and should be monitored and evaluated with this in mind. The criteria therefore emphasize a turn towards assessing complex processes prospectively. They also ask evaluations to consider how likely it is that these results are evident at the time they are monitored or evaluated. However, the social accountability field still has gaps in doing this effectively in practice. This paper presents, and provides evidence from testing, an innovative operational approach that has promising potential to support this aim: a sequential, relational rubric. This approach can support practitioners to monitor, evaluate and learn about the causal processes of scale-up of social accountability interventions with an eye towards sustainability (i.e., considering prospective sustainability). It is grounded in systems thinking, co-production and social learning theory, as well as links with collective governance and social contract theory for development. Evidence yielded from the authors’ testing of this approach on a sample of diverse projects from the Global Partnership for Social Accountability (GPSA) program revealed that the alleged ‘absence of evidence’ dilemma of social accountability scale-up is due to ill-fitting concepts and methods for assessment.
It challenges existing assumptions and findings that claim that social accountability processes do not scale and are unsustainable. The authors propose that by using fit-for-purpose concepts and methods with a focus on social learning and compromise – also called a ‘resonance pathway to scale’, which this paper discusses in detail – it is possible to observe loosely coordinated scale-up processes at work in many (but not all) social accountability interventions and identify tangible evidence of prospective sustainability. An important caveat is that these processes, the outcomes they generate, and the corresponding evidence often look qualitatively different from the original intervention design and the predictions for scale-up made at that point in time. This is because the process of deliberation and compromise inherent to social accountability work in dynamic local systems introduces changes and new conditions for uptake by diverse actors in the public sector, civil society, and donor institutions. The paper concludes that even relatively small-scale localized projects of three to five years with budgets of less than one million USD, across different contexts and sectors, can produce processes and outcomes which contribute to many forms of sustainability, including via scale-up. Furthermore, the cross-fertilization of learning and aggregation of results for scale-up across projects within and beyond the GPSA (and other programs) can help monitoring, evaluation and learning (MEL) and social accountability practitioners alike to deliver on a program’s mandate. Doing so can also create new knowledge for the wider social accountability field that siloed interventions, lacking suitable concepts and methods for assessing scale-up and prospective sustainability, often fail to produce. The paper ends with recommendations for taking forward this approach and the associated benefits, implications and required investments.
Acknowledgments

This paper represents a culmination of evolving learning, evidence, research and practical experience from the authors and the wider field of social accountability. A key source of findings and evidence base for the paper is our respective work with the World Bank’s GPSA programming over the past decade, in partnership with civil society, the public sector, communities and citizens around the world. This paper reflects the contributions of numerous stakeholders engaged with the GPSA, especially Jeff Thindwa, Ann-Sofie Jespersen, and Aly Zulficar Rahim. We greatly appreciate the valuable feedback and time provided by peer reviewers Mathieu Cloutier and Tom Aston. We also acknowledge the work of other social accountability practitioners, researchers and evaluators for their contributions to social accountability work and evidence building for the wider field. This paper builds upon the existing evidence base to support improvements and offers insights about how social accountability programming can be strengthened, sustained and scaled.

Copy Editor: Amber Meikle
Designer: Mohamed Elmahdy
Cover Image: © Curt Carnemark / World Bank. Further permission required for reuse

Suggested Citation

Guerzovich, Florencia and Wadeson, Alix. 2024. Scaling up Social Accountability in Complex Governance Systems: A Relational Approach for Evidencing Sustainability. World Bank, Washington, DC.
Contact Information

For questions and other information about this paper and its findings, please contact the authors: alixwadesonconsulting@gmail.com and florcig@gmail.com

© 2024 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW | Washington DC 20433
Telephone: 202-473-1000
www.worldbank.org

Glossary of Key Concepts

Accountability: A social relationship between a power-holder, an actor that performs a task such as a government official, and an account-holder, those for whom the task is performed or who are affected by it. In this social relationship the power-holder is, in practice, obliged to be more transparent and to explain and justify their decisions, behaviors, trajectories, and results (answerability), with information and transparency. There is the possibility of dialogue among the parties, and the account-holder can pose questions and ask for rectifications, remedies, corrective action or problem solving (accountability processes). As a result, the account-holder can pass judgment and the power-holder can face some form of consequences. These consequences can be hard or soft, formal and explicit or informal and implicit sanctions in the case of malperformance. In other words, accountability entails a proactive set of processes and practices where the how – the quality of the social relationship between a power-holder and an account-holder – is the what (Guerzovich, 2022, drawing on Bovens et al, 2014).

Social Accountability: Processes that seek to make communities leading agents in their localized development story by: (1) improving the quality of goods and services, (2) primarily through monitoring and oversight of those goods and services, (3) citizens’ collective, rather than individual, efforts to hold power-holders (primarily service providers and bureaucrats, secondarily politicians) to account, (4) providing a concrete mechanism to rework the social contract and strengthen local systems, in the sense of transforming state-society relationships and the norms and power dynamics associated with them (Guerzovich and Aston, 2023).

Collaborative Social Accountability: Processes whereby civil society organizations and public sector institutions with decision-making power and public management authority at different levels across the institutional and service delivery chain convene to analyze a problem, identify citizen participation mechanisms to help solve it, and agree on joint actions to co-produce solutions and appropriate responses (Poli and Guerzovich, 2020). This is a term coined by and applied in all of the GPSA’s programming.

Learning: The process of creating new knowledge, insights, or understanding – usually about what works, what might work, or what doesn’t work in advancing a given goal. This paper, and the resonance pathway to scale, is most interested in shared learning – learning that happens with others, also known as joint learning or social learning (Guerzovich et al, 2022 drawing on Wenger-Trayner and Wenger-Trayner, 2021).

Resonance Pathway to Scale: This expects social accountability to scale up based on deliberation, compromise, and coordinated collective action among diverse actors (Guerzovich et al 2022). The logic is that social accountability processes contribute to overcoming the challenges of collective action in a game theoretical sense (Ostrom, 1990; also see World Bank, 2017). Its main thrust is social learning.
That means enabling a group of individuals to organize and work out how to make the most of a situation (e.g., insights learned by implementing social accountability in select locations) to create shared gains (e.g., using those insights to inform decisions in other locations) through loose coordination and collaboration.

Scale/scale-up: The ability of a project or program to grow its effects beyond its sectoral and geographic boundaries, to reach more people (Guerzovich and Poli, 2014).

Sustainability: When and how a project’s net benefits continue or are likely to continue after the end of the project (OECD DAC, 2019).

System: The interconnected set of factors (policies, practices, resource flows, relationships and connections, power dynamics and mental models) that jointly produce a development outcome – the whole is greater than the sum of its parts (Kania et al, 2018).

Uptake: In the context of the GPSA and this paper, these are actions taken by public service sector actors, policymakers, practitioners, and other development actors that facilitate and contribute to the adaptation, application, and/or sustainability of elements of collaborative social accountability processes (e.g., approaches, strategies, tools, mechanisms) and/or the application of lessons and insights from collaborative social accountability programming and evidence. This definition of uptake includes many types of sustainability – actual, prospective and scale-up.

Executive Summary

The Paper at a Glance

In 2019, the Organization for Economic Co-operation and Development - Development Assistance Committee (OECD DAC) revamped its criteria to assess interventions. These new criteria put front and center the methodological challenges associated with evaluating sustainability.
Sustainability is defined as “when and how a project’s net benefits continue or are likely to continue after the end of a project” (OECD DAC, 2019). When sustainability results are dependent on the interactions of multiple actors and elements in a complex local system, they are often emergent over time, prospective in nature, and uncertain at the point of a project’s final evaluation. This paper presents an innovative operational approach – a sequential relational rubric – to monitor, evaluate and learn about the causal processes of scale-up, with an eye towards sustainability (i.e., considering prospective sustainability). Scale-up – “the ability of a project or program to grow its effects beyond its sectoral and geographical boundaries, to reach more people” (Guerzovich and Poli, 2014) – and sustainability are not synonymous. However, in the social accountability field, many assessments consider scale-up as an essential pathway and key indicator of project sustainability. Accordingly, many of the relevant reviews and much of the literature in the field have found that while many projects achieve some form of positive results, there is limited tangible evidence to demonstrate they have successfully scaled up to reach more locations and people (see for example E-Pact Consortium, 2016). This commonly held conclusion implies that such projects are not sustainable, contributing to a pessimistic narrative about the potential and long-term impact of social accountability programming overall (Aston, 2021). Written for fellow monitoring, evaluation and learning (MEL) practitioners working in the social accountability space, this paper argues that absence of evidence of sustainability to date is not evidence of absence in practice, nor does it equate to social accountability projects being unsustainable. Rather, there is an evidence challenge, which lies in the ill-fitting concepts and methods that are often used to monitor and evaluate scale-up of social accountability projects.
The authors propose that by applying fit-for-purpose concepts and methods that focus on social learning and compromise, it is possible to observe processes of movement towards scale-up. These different forms of prospective sustainability can be evidenced in a significant proportion of social accountability projects. The relational rubric presented and discussed herein builds upon and strengthens the recent theoretical proposition of the resonance pathway to scale (Guerzovich et al, 2022). This pathway asserts that the scale-up of many social accountability processes involves social learning at its core, and that such processes may occur gradually based on deliberation, compromise, and coordinated collective action among diverse actors. An important caveat to note is that these processes, the outcomes they generate, and the corresponding evidence often look qualitatively different from the design and predictions for scale-up of the original intervention. This is due to changes and conditions for uptake that emerge, both throughout a social accountability project and beyond its implementation. The approach presented and evidenced in this paper can support practitioners to monitor, evaluate and learn about the causal processes of scale-up of social accountability interventions with an eye towards sustainability. It is grounded in systems thinking, co-production and social learning theory, and links with collective governance and social contract theory for development. All these models also underscore the uncertainty and emergent nature of complex and relational processes, validating the need for conceptual and methodological approaches that sufficiently account for such dynamics. Accordingly, the paper discusses new empirical findings and a wealth of examples yielded through the authors’ test of this sequential relational rubric across a sample of 15 completed projects directly supported by the World Bank’s Global Partnership for Social Accountability (GPSA).
The GPSA’s approach to MEL at a portfolio (or program) level enabled the iterative development of this rubric method through quick feedback cycles of learning and adaptation. The paper concludes that even relatively small-scale localized projects of three to five years with budgets of less than one million USD, working across different contexts and sectors, can produce processes and outcomes which contribute to many forms of sustainability, including via scale-up. Furthermore, the cross-fertilization of learning and aggregation of results for scale-up across projects within and beyond the GPSA (and other programs) can help MEL and social accountability practitioners alike to deliver on a program’s mandate. Doing so can also create new knowledge where “the whole is greater than the sum of its parts” (Guerzovich, 2021b). With further testing of the rubric approach and building the evidence base for the resonance pathway to scale, the paper proposes that a solution for the absence of evidence dilemma is possible. If relevant organizations and funders commit to and invest sufficiently in portfolio-level MEL that is grounded in fit-for-purpose concepts and methods for assessing scale-up, then the narrative can shift towards prospective sustainability in its many forms, recognizing the promise of long-term impact from social accountability programming.

More Meaningful Monitoring and Evaluation of Relational Social Accountability Processes within Dynamic Local Systems

The sequential relational rubric for assessing scale-up and other forms of sustainability, and the concepts embedded within it, are aligned with and help to operationalize the revised OECD DAC evaluation criteria for sustainability (Guerzovich, 2023 and Guerzovich, 2023a).
Key features of the concepts and criteria applied in the authors’ work, and discussed in the paper, are:

● A systemic lens that considers how interventions fit within local systems and affect different actors. A systemic lens focuses on interactions of a wide range of actors in a system, rather than a narrow focus on the siloed actions of the project’s direct civil society implementers. This is essential to capture scale-up processes because they often rely on downstream actions taken by others in the system. These include public sector institutions, funders, and other development agencies that adopt, adapt and/or sustain elements of social accountability processes in different ways after a project ends.

● An emphasis on prospective sustainability. During a project’s life and at its closure, it is not possible to have certainty about the future in a complex system (or any system). However, it is both possible and desirable to focus on the likelihood of sustainability, and scale-up as one form of it. Therefore, given this uncertainty, assessments should be based on signals for prospective sustainability and uptake, using both monitoring data from the whole project life and triangulated evidence at the final evaluation stage. Emphasis on both forms of data is critical because ongoing attention to monitoring data helps project teams to identify, plan for and build opportunities that can support the continuation of positive effects of a project, from the point of its design, while also mitigating barriers and risks along the way.

● A focus on function over form. Social accountability processes and their scale-up will vary widely in their form, i.e., the tools, strategies and mechanisms selected for implementing social accountability work. The variance between contexts reflects the different perspectives, relationships, and incentives of the key actors who can drive scale-up in the long term.
Therefore, the forms of social accountability processes need to be localized to different systems of implementation. At the same time, they often play a similar function: improving public service delivery in a collaborative manner that includes communities and citizens. Discrete elements (or components) of a social accountability process will often be adapted and applied in many forms, rather than the whole process replicated completely. When evaluations look only for complete replication of a process that is the same as its original design, they fail to capture other forms of scale. This is not only an unrealistic expectation for social accountability projects, but also discounts ways that different actors can engage in more meaningful and responsive social accountability relationships, while also contributing to sustainable outcomes and scale.

These features are often lacking in traditional monitoring and evaluation processes, which the paper discusses in greater technical detail with examples. However, applying them challenges the (erroneous) conclusion that if we cannot evidence or demonstrate lasting change during or right at the end of a project, then movement towards scale-up or other forms of sustainability is not happening and will not continue. When uncertainty is rife and insufficient time has elapsed to observe sustainability at work, the authors propose that evaluators should apply concepts and methods that can assess the conditions required for actual sustainability and the likelihood of prospective sustainability (rather than the certainty), in a wide range of different forms.

Defining and Evidencing ‘Good Enough’ Results and ‘What Counts’

Most of the existing literature on social accountability sustainability and scale-up focuses on wholesale replication or complete institutionalization of social accountability processes, equating such outcomes with success.
It thus fails to capture legitimate outcomes that include adaptation, incremental progress, fits and starts, and gray areas (for alternative approaches, see Integrity Action, 2020). Instead, the resonance pathway to scale and the relational rubric approach for assessment recognize that success depends heavily on the interactions between and actions of several actors within a given system, and consider how these dynamics evolve over time to yield positive results for sustainability. But what does this look like in practice, and what counts? And how can we sufficiently evidence it with a MEL system and its data? While acknowledging that there are no perfect definitions and that concepts change with learning and practice, this paper argues that striving for ‘good enough’ is reasonable, while also being careful about conceptual stretching – defined as “the distortion that occurs when a concept does not fit the new cases” (Collier and Mahon, 1993). The approach presented is based on the assertion from the GPSA that a result is demonstrated when lessons from or elements of collaborative social accountability inform decisions and actions taken by the public sector and other civil society and development actors beyond an individual project, including after the project has ended. Such results are often associated with the uptake of selected element(s) of a collaborative social accountability process, rather than wholesale scale-up of it. The relational rubric method was developed and then tested through the assessment of the associated operational indicator in the GPSA Results Framework, applied across its portfolio of projects: “The percentage of GPSA grants in which public sector institutions and other relevant actors (e.g., the World Bank, other donors, civil society organizations) seek to: i. use substantive lessons for improvements of targeted policies, processes, and mechanisms; ii.
apply or sustain elements of collaborative social accountability processes after life of the project; iii. adapt insights from GPSA projects to scale them through programs or policies; or iv. apply elements of collaborative social accountability processes in additional localities or sectors.”

It is important to emphasize that the ‘seek to’ part of the indicator statement is critical because the uptake of GPSA projects is contingent, in that it can be introduced but not sustained by the project after its closure. Therefore, the authors’ assessment of this GPSA result and indicator casts a wide net for ‘what counts’ for sustainability, including examples of scale-up. Relevant examples evidenced across the GPSA portfolio and its respective projects include:

● Work by key stakeholders closely engaged directly in a project (i.e., public sector officials, GPSA/World Bank project personnel, and representatives from civil society and community organizations and networks) integrated into another public sector project or program.

● Public sector counterparts used lessons to inform public sector reforms and policies.

● Emulation by local public sector or service providers (e.g., education officials and schools) that observed, adopted, or adapted the collaborative social accountability process from a project.

● The World Bank or other funders used lessons and approaches to advise public sector or other development partners’ programs.

● The World Bank or other funders financed an adaptation of the project in the same or other sectors.

● Any observed or reported uptake, sustainability and/or scale-up led by other international non-government organizations (INGOs) or civil society organizations (CSOs).

● The project actions and trajectory demonstrated ongoing dialogue with key actors (relevant public sector officials and World Bank operations staff) to move the process for potential uptake of collaborative social accountability processes forward.

Understanding and Incorporating a Causal Sequence

A critical part of evidencing the likelihood for scale-up and prospective sustainability is to first understand and then investigate the concrete and sequential steps involved in these processes. This relational rubric also has an innovative sequential component. It organizes relevant actions and events in a temporal order to help identify if and how scale-up is on the right track or not, with an eye towards prospective sustainability. Such sequencing can provide significant leverage and support for project teams and evaluators to causally trace complex change processes and produce plausible explanations when concrete outcomes are still unknown. The rubric uses a five-point scale based on these sequential steps, with respective criteria for each level, moving from none to partial to full uptake of collaborative social accountability processes. In recognizing the reasonable limits and appropriate expectations for sustainability and scale, a score of 5 or 100% does not equate to wholesale uptake or replication in the context of this rubric, for the many reasons discussed above.[1] This interpretation may differ from the use of percentages by other MEL practitioners and assessment methods. However, translating each level (1-5) in the rubric into a percentage score provides a common reference point and metric that is easily comparable and transferable across different projects and programs (for more detail on GPSA outcomes, indicators and application of the rubric, see the MERL Guide for GPSA Grant Partners and Consultants). Uses or adaptations of this rubric can eliminate percentages and adjust criteria for different levels in the scale, as long as the core features of reasonable expectations (‘good enough’), sequential causal steps, and transferable units of measurement are still applied.

[1] As per the GPSA’s evolving Theory of Action and by design, GPSA projects do not intend or expect to achieve wholesale uptake of a collaborative social accountability process within a given sector and country of operations, given the limited budget and time-frame and their experimental nature. This aligns with the authors’ conceptions and evidence about what is realistic to expect for sustainability and scale-up of social accountability programming operating in complex governance systems of intersecting and continuously shifting political, economic and social dynamics of influence.

Uptake 01 (Score: 0%): No evidence of any use/application/adaptation of element(s) of or insights from a collaborative social accountability process by any priority stakeholders and/or public sector institutions. No evidence of stakeholder interest, dialogue or alignment. The unit of measurement for this indicator in the GPSA’s Results Framework is a percentage. Therefore, a score of 0% would be provided for the indicator in the Results Framework and considered as ‘no uptake’.

Uptake 02 (Score: 25%): Evidence of interest by priority stakeholders and/or public sector institutions, expressed publicly or privately, about learning from a collaborative social accountability process in the project. In this instance, a score of 25% would be provided for the indicator in the GPSA’s Results Framework.

Uptake 03 (Score: 50%): Evidence that priority stakeholders and/or public sector institutions have expressed where to adopt, adapt and/or sustain elements or insights from a collaborative social accountability process and how this could be incorporated in some way into other operations, programs, policies (i.e., concrete entry points have been identified). In this instance, a score of 50% would be provided for the indicator in the GPSA’s Results Framework.

Uptake 04 (Score: 75%): Evidence of dialogue with priority stakeholders and/or public sector institutions on how to adopt, adapt and/or sustain elements of the collaborative social accountability process in future operations, policies, or programs.[2] In this instance, a score of 75% would be provided for the indicator in the GPSA’s Results Framework.

Uptake 05 (Score: 100%): Evidence of actions taken by priority stakeholders and/or public sector institutions on adoption, adaptation and/or sustaining elements of a collaborative social accountability process in other operations, policies, or programs. Triangulation of data with at least 2 sources of evidence to confirm is required. In this instance, a score of 100% would be provided for the indicator in the GPSA’s Results Framework.

Source: Adapted from Wadeson and Guerzovich, 2023

[2] The dialogue with priority stakeholders is done by the project (e.g., the project team, grant partners).

Empirical Findings and Benefits to Date

After developing this relational sequential rubric, the authors tested it by systematically assessing an existing dataset (e.g., project reports, knowledge products, independent evaluations, official Implementation Completion Reports) from a sample of 15 closed GPSA projects, against the indicator presented above.
This exercise identified numerous examples of scale-up, actual, and prospective sustainability in various forms from the source evidence. The breakdown of rubric scores and associated percentages from the assessed sample is shown in a figure that is not reproduced here (Source: World Bank data).

The findings validated the promise of the resonance pathway to scale as well as the feasibility and applicability of the relational rubric approach to evidencing it, despite the experimental nature of the relational rubric and the limitations of the exercise. The testing enabled adaptive learning, and improvements were made to the rubric. The findings challenge the ‘absence of evidence’ dilemma regarding the sustainability and scale of social accountability work. They demonstrate that it is plausible for relatively short projects of three to five years with budgets of less than one million USD to contribute to actions taken by the public sector and other priority stakeholders to adopt, adapt and/or sustain elements of a collaborative social accountability process in other operations, policies, or programs, moving along a resonance pathway to scale. However, in line with the key features and concepts embedded in this approach, the forms found across projects were diverse and did not look the same, but still had similar functions. This reflects and supports the central notion that context-specific processes of interactions, deliberation, social learning, compromises, and loosely coordinated collective action manifest in various ways (forms), yet they are still coherent with and strengthen local dynamic systems and processes for improved public service delivery and policy (functions). Another benefit yielded through applying this relational rubric method is its potential to help move beyond a siloed understanding of projects within a program or portfolio; emergent findings can foster synergies and cross-learning between projects and aggregate results at a higher level.
In this case, the rubric’s application for each sample project, and the aggregation of these results at the portfolio level, produced knowledge in a way that is not possible through evaluating projects individually without a transferable method. This added value contributes to the GPSA’s corporate mandate and offers a practical means for other social accountability programs (or related fields) to do the same.

Recommendations

With more testing and iteration of the relational rubric, and intentional design and implementation of projects and MEL systems to apply it in real-time with primary data, the authors propose that the evidence base for the resonance pathway to scale will grow. This can in turn create new knowledge and influence a shift in discourse about the potential long-term impact of social accountability work, challenging the ‘absence of evidence’ dilemma in the field. To meet these interlinked aims, the authors recommend the following:

» The GPSA and other funders and organizations working on social accountability should make intentional and long-term investments in robust research and evaluation initiatives using the relational rubric to assess actual and prospective sustainability and scale-up, based on the key concepts, features and preliminary evidence presented in this paper.

» Funders, practitioners and evaluators should hold realistic expectations for the success of collaborative social accountability processes and individual projects, recognizing the many legitimate forms of sustainability, the incremental steps involved, and the long-term time-frames required for scale-up.

» Targeted selection criteria and sufficient planned resourcing should be put in place for external evaluators and internal MEL staff with the appropriate skillsets to monitor and evaluate programming in this way.

» A supportive leadership environment and sufficient investment should be provided for systematic assessments of scale-up and sustainability at project and program levels, repeated consistently over time.
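
For MEL teams that keep their monitoring data in scripts or spreadsheets, the five-level rubric and its portfolio-level aggregation summarized above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the GPSA's own tooling: the names `UptakeLevel`, `percent_score` and `level_distribution` are hypothetical, the choice to reject an unconfirmed level 5 score (rather than downgrade it) is our assumption, and the four example projects are made up and do not represent GPSA results.

```python
from collections import Counter
from enum import IntEnum


class UptakeLevel(IntEnum):
    """Five sequential rubric levels, from no uptake (1) to actions taken (5)."""
    NO_EVIDENCE = 1    # no evidence of use, interest, dialogue or alignment
    INTEREST = 2       # stakeholders express interest in learning
    ENTRY_POINTS = 3   # concrete entry points for uptake have been identified
    DIALOGUE = 4       # dialogue on how to adopt, adapt and/or sustain
    ACTIONS_TAKEN = 5  # actions taken; requires triangulated evidence


# The rubric asks for "at least 2 sources of evidence" to confirm level 5.
MIN_SOURCES_FOR_TOP_LEVEL = 2


def percent_score(level: UptakeLevel, evidence_sources: int) -> int:
    """Translate a rubric level into its percentage score (0, 25, 50, 75, 100)."""
    if level == UptakeLevel.ACTIONS_TAKEN and evidence_sources < MIN_SOURCES_FOR_TOP_LEVEL:
        # Assumption: flag an unconfirmed top score instead of silently downgrading it.
        raise ValueError("level 5 requires triangulation with at least 2 sources")
    return (int(level) - 1) * 25


def level_distribution(levels: list[UptakeLevel]) -> dict[UptakeLevel, float]:
    """Share of projects (in percent) at each rubric level across a portfolio."""
    if not levels:
        raise ValueError("portfolio must contain at least one assessed project")
    counts = Counter(levels)
    return {lvl: 100 * counts[lvl] / len(levels) for lvl in UptakeLevel}


# Hypothetical assessments for four made-up projects (not GPSA data).
portfolio = [
    UptakeLevel.INTEREST,
    UptakeLevel.DIALOGUE,
    UptakeLevel.ACTIONS_TAKEN,
    UptakeLevel.ACTIONS_TAKEN,
]
distribution = level_distribution(portfolio)
```

Keeping the level (the sequential step reached) separate from the percentage (the transferable unit of measurement) mirrors the point made above that adaptations of the rubric may drop the percentages while retaining the sequential causal steps.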
01 — Introduction

Scale (or scale-up) is “the ability of a project or program to grow its effects beyond its sectoral and geographical boundaries, to reach more people” (Guerzovich and Poli, 2014). It is often referred to as the Achilles’ heel of social accountability programming and various other participatory approaches to development. Relevant reviews and literature in the social accountability field find that while many interventions achieve some form of positive results, there is limited tangible evidence to demonstrate they have successfully scaled up to reach more locations and people (E-Pact, 2016; Fox, 2014).

Sustainability in the simplest terms is defined as “when and how a project’s net benefits continue or are likely to continue after the end of a project” (OECD DAC, 2019). While a social accountability project can be sustainable by yielding net benefits that continue beyond the project life without scaling up (Guerzovich and Poli, 2014), many social accountability assessments consider scale-up as a key pathway and indicator of project sustainability (see Guerzovich, 2022c). While scale and sustainability are not synonymous terms, the prevailing assumption in the social accountability literature is that scale-up is required to evidence sustainability. The failure to scale up by the end of an intervention is often assumed to suggest a lack of sustainability in the future, contributing to a pessimistic narrative about the potential and long-term impact of these interventions and broader social accountability work (Aston, 2021).

These pessimistic and ill-fitting assessments of the potential of social accountability processes conceptualize and evaluate scale with an eye towards sustainability (i.e., considering prospective sustainability) in simplistic and unrealistic terms.
The traditional evaluation approach found in the current evidence base often reflects scenarios in which social accountability processes will look qualitatively the same from their design to the end of an intervention, and the expected changes are expressed quantitatively. The first diagram in Figure 1 illustrates this assumption as a ‘scale-up transmission belt’ by showing a gray ring that enters a black box and then produces bigger or more replicated gray rings. Yet, when the changes that interventions seek to make are complex and contingent on many other actors in a system who bring with them their own circumstances and agendas, what happens inside the black box is critical to informing the expectations for and assessment of results. Inside the black box, the causal path towards scale-up is rarely linear, and the results to which it contributes are diverse, as illustrated in the second diagram in Figure 1: Comparing Expectations of Outcomes for Social Accountability Scale-up in Complex Governance Systems. Adequately understanding and assessing these phenomena requires a different approach to gauge whether projects are on the right track towards meeting their goals for complex change, and the potential for benefits to continue and evolve in the longer term (Haldrup, 2020).
Figure 1: Comparing Expectations of Outcomes for Social Accountability Scale-up in Complex Governance Systems
[Two diagrams. Traditional Expectations: Social Accountability Process, Black Box, Outcomes for scale with an eye towards sustainability. Relational Expectations: Social Accountability Process, Resonance Pathway, Outcomes for scale with an eye towards sustainability.]
Source: Own elaboration

This paper is aligned with and builds upon other salient factors emerging in both evaluation and social accountability fields: the Organization for Economic Co-operation and Development - Development Assistance Committee’s (OECD DAC) revamped evaluation criteria to assess sustainability of interventions (OECD DAC, 2019); the lively debate about how to connect (Patton, 2020) and apply (Kania et al, 2018) this within a systems thinking lens; and literature about co-production and social learning (Ostrom, 1990; Doin et al 2012; Wenger-Trayner and Wenger-Trayner, 2021). These all provide useful stepping stones for re-thinking, defining and developing fit-for-purpose concepts and methods for evaluating complex change processes both within and beyond the social accountability space. However, there is a dearth of operational and transferable approaches to put this cutting-edge thinking into practice.

To help address the challenges of monitoring and evaluating the scale-up and sustainability of complex change processes both conceptually and operationally, this paper presents an innovative method developed for this purpose - a relational rubric.
Using the evidence from applying this method to a sample of social accountability projects, the paper explores whether and how social accountability processes and the emerging insights support scale-up during and beyond a project life; how these results can be better evidenced through individual projects; and how they can be aggregated upwards to provide more comprehensive knowledge about the sustainability of social accountability programming.

The theoretical underpinning of this relational rubric is the resonance pathway to scale, which has been ignored or missed by most research and evaluations in the social accountability field.3 As will be discussed further, the resonance pathway opens up the black box and identifies the many forms of results to which localized social accountability processes contribute. It also accounts for what can happen when stakeholders that were not involved in the design and/or implementation of such processes encounter learning and insights that emerge from them.

Accordingly, this paper conceptualizes scale-up with an eye towards sustainability as a gradual, sequential process of joint or social learning, deliberation, and compromise. All these interactions are central to support, trigger, and contribute to scale-up in ways that support adaptive uptake. However, these often look like loosely coordinated collective action rather than replication of an original model in which all forms of scale look qualitatively the same. See Box 1 for two scenarios that illustrate these dynamics at play with examples of qualitatively diverse emergent outcomes that can result through a resonance pathway to scale, as also depicted in Figure 1.
Box 1: Scenarios Illustrating the Emergent Outcomes That Can Result through a Resonance Pathway to Scale

Scenario 1: Actors such as the mayor of a different village, the person responsible for an education region, a bureaucrat in a national ministry and a staffer in a different donor agency are not directly implementing the social accountability process of a given civil society-led intervention. They may have awareness or some engagement at different points. They will often look at the lessons emerging from the implementation of a social accountability process. They will also consider what value it can add to the work they are doing in different locations or sectors in the system. Due to their different roles and interests within the local system, they will bring in new perspectives and considerations. Therefore, if and when they decide to scale up the social accountability process, their version of the process and its scale-up will often be an adaptation of the original design; this is an emergent outcome rather than one that could be accurately anticipated at design. The resulting process will take on new properties, with some components taken up, and others changed or dropped. This happens as part of the deliberation and compromises needed to enable the scale-up for sustainability of a given social accountability process.

Scenario 2: A civil society group implementing a social accountability process finds out that a ministry may incorporate parts of its process, but will only agree to focus on some aspects and integrate some new protocols. This would be a form of scale-up which could reach many locations because of the ministry’s role within the system. An alternative would be to multiply the facilitators that can effectively implement a social accountability process, from a handful of committed and experienced professionals hired by a civil society group to hundreds hired by another public or civic organization to reach new locations. This would be a huge task, and without compromise on the process between the designers and implementers (the civil society group and professionals) and other actors who can facilitate scale-up, it is unlikely that the ministry would choose to scale up the social accountability process.5

Source: Own elaboration

3 The resonance pathway to scale introduces two important theoretical innovations. First, it identifies a pathway to scale that reflects the lived experience of many practitioners but has been overlooked by traditional schools of thought in the field, which prioritized best practices and resistance as the driving forces for scale up and sustainability. Second, traditional schools of thought have often presented their preferred pathway to change as universally applicable, despite mixed results. The resonance pathway to scale focuses on the conditions under which it and other models may be better bets and their limits (Guerzovich et al, 2022). Also see Aston, 2022.

5 Thank you to Thomas Aston for sharing this example from a GPSA project.

Moving beyond conceptions to practice, the authors systematically applied the relational rubric method on a sample of 15 closed projects directly supported by the World Bank’s Global Partnership for Social Accountability (GPSA). They used existing project documentation (e.g., project reports, knowledge products, independent evaluations, Implementation Completion Reports), as well as GPSA staff’s tacit knowledge and accounts to triangulate the findings. A rubric assessment was determined for each project and then aggregated across the sample.
This new dataset identified numerous examples of scale-up and of actual and prospective sustainability in various forms. The resulting evidence helped to validate the resonance pathway. Learning from the process supported testing and improvement of the rubric. The exercise revealed that absence of evidence about social accountability scale-up to date does not equate to evidence of absence in practice, nor does it equate to social accountability processes being unsustainable. Rather, the core problem is one of ill-fitting concepts and methods for assessment of social accountability processes and outcomes. Such processes and outcomes may be familiar to practitioners but are only observable and evidenced when monitoring and evaluation of interventions captures systemic processes at work between different actors, as the relational rubric does.

This relational rubric also has an innovative sequential component. It organizes relevant actions and events in a temporal order to help identify if and how scale-up is on the right track, with an eye towards prospective sustainability. Such sequencing can provide significant leverage and support for project teams and evaluators to enable them to causally trace complex change processes and produce plausible explanations when concrete outcomes are still unknown. The approach also has points of contact with other theories and frameworks that focus on dynamic relationships, such as collective governance (World Bank Group, 2017) and social contract theory for development (Cloutier, 2021), as well as key literature in social science focused on researching and causally explaining complex phenomena (including Pierson, 2004; Guerzovich, 2022a).

This paper is targeted primarily at monitoring, evaluation and learning (MEL) and social accountability practitioners interested in or grappling with how to better assess and evidence sustainability and scale-up of specific projects.
It also provides a practical means to aggregate that evidence at the program level, in order to demonstrate higher-level impact, enable comparisons, and build testable hypotheses about what works and in which conditions.

The next two sections provide a brief overview of the two key building blocks of this paper. Section 2 explores the revised OECD DAC evaluation criteria, particularly the sustainability criterion, and the linkages with and relevance to the approach presented in this paper. Section 3 presents the resonance pathway to scale (Guerzovich et al, 2022) and its features. Sections 4 and 5 present and justify the development of the innovative relational rubric methodology touched on above. They explain how such a method can effectively and practically monitor, evaluate and learn about how scale-up of social accountability programming happens, with an eye towards sustainability. Section 6 presents findings from applying the relational rubric method to the GPSA sample. The examples and evidence demonstrate that localized projects of three to five years with budgets of less than one million USD can produce processes and outcomes which contribute to many forms of sustainability, including via scale-up, in different settings. This section will likely most interest social accountability practitioners and their funders.

Section 7 concludes with reflections on the promise of a shift of focus to prospective sustainability and the resonance pathway to scale. It proposes ways that the relational rubric approach can help the GPSA and potentially other practitioners and funders better design, monitor and evaluate complex, systemic social accountability projects and portfolios in real-time. It suggests how such learning can build new knowledge about social accountability sustainability and scale in its many forms, and challenge ill-fitting methodological paradigms and erroneous claims about the limited long-term value of this work.
Key recommendations for collectively meeting these aims across the social accountability field are presented. Annex A presents a deep dive into cross-fertilization across the World Bank’s Global Partnership for Social Accountability and beyond. Using this approach, the program delivered on its corporate mandate by producing knowledge that no single project would have produced on its own.

02 — Shifting Towards More Meaningful Monitoring and Evaluation of Relational Social Accountability Processes within Dynamic Local Systems

In 2017, Caroline Heider, then-Director General, Evaluation at the World Bank Group, argued that the time was ripe for the evaluation community to revisit the evaluation criteria that most development organizations use. She explained that “development practitioners, as much as evaluators, know that development processes do not follow such linear assumptions. Instead, one action might cause a number of reactions that have effects in rather diverse ways. Hence, we need to develop evaluation models that capture the effects of complexity to inform policymakers and practitioners about the actual effects of choices they make and actions they take” (IEG, 2017). Two years later the OECD DAC revised its evaluation criteria to assess interventions, taking those insights into account (OECD DAC, 2019). Of particular relevance to this paper and its findings is the revised sustainability criterion, as presented in Box 2.

Box 2: OECD DAC Definitions for Sustainability Evaluation Criteria

Conditions for actual sustainability: This examines “the extent to which any positive effects generated by the intervention demonstrably continued for key stakeholders, including intended beneficiaries, after the intervention has ended. Evaluators can also examine if and how opportunities to support the continuation of positive effects from the intervention have been identified, anticipated and planned for, as well as any barriers that may have hindered the continuation of positive effects. This can support findings that demonstrate adaptive capacity in an intervention where it was required”.

Prospective Sustainability (or the future potential for sustainability given factors in the operating environment that could favor sustainability): “Examining prospective sustainability entails a slightly different approach. An evaluation examining the future potential for sustainability would assess how likely it is that any planned or current positive effects of the intervention will continue, usually assuming that current conditions hold. The evaluation will need to assess the stability and relative permanence of any positive effects realized, and conditions for their continuation, such as institutional sustainability, economic and financial sustainability, environmental sustainability, political sustainability, social sustainability and cultural sustainability”.

Source: OECD 2021, p73.

The approach for monitoring and evaluating scale-up with an eye towards sustainability operationalizes two important insights that are embedded within the revised criteria. Firstly, the new criteria introduce a systemic lens, asking evaluators to consider how interventions fit with the system in which they are implemented and to what effect. A systemic lens focuses on interactions of a wide range of actors in a system, rather than a narrow focus on the siloed actions of the project’s direct civil society implementers. This lens is essential to capture scale-up processes because they often rely on downstream actions taken by others in the system, rather than the ongoing dependence on civil society actors to continue leading this work indefinitely through more interventions.
These actors include public sector institutions, funders, and other development agencies that adopt, adapt and/or sustain elements of social accountability processes in different ways after a project ends. This paper illustrates the many ways in which different actors in a complex governance system can support, trigger, and contribute to scale-up in ways that strengthen local public service delivery and their respective systems.

Secondly, the updated OECD DAC criteria provide a way out of another problem found within social accountability evaluations and evidence: the ‘absence of evidence versus evidence of absence’ dilemma. This often leads to the erroneous conclusion that if lasting change cannot be evidenced or demonstrated during or right at the end of a project, then movement towards scale-up or other forms of sustainability is not happening and will not continue. When there is high uncertainty and insufficient time has elapsed to observe tangible sustainability, evaluators should consider and use methods that can assess the conditions required for actual sustainability and the likelihood of prospective sustainability, in a wide range of different forms. The overall methodological framework and relational rubric presented in this paper align with and further these concepts by integrating and building evidence for the resonance pathway to scale.

03 — The Resonance Pathway to Scale for Social Accountability

This paper follows and builds on the work of Guerzovich et al (2022), who argue that scale is a complex, relational process.
This is often misunderstood in social accountability literature, which also fails to acknowledge the important role of social learning and compromise in fostering sustainable results.5 To help fill this evidence gap, the GPSA teamed up with World Vision and several experts and colleagues in the social accountability field to research and inform a theory of change focused on scale-up. This theory of change accounted for emergent insights, evidence, and experiences from a wide range of social accountability programs of the GPSA, World Vision and CARE International, amongst other partners. As presented in Box 3 and Figure 2, several pathways to scale in social accountability were found, including:

1. Replication of best practices
2. Resistance
3. Resonance

This research claims that the resonance pathway has been largely ignored in literature and evaluations to date; this claim is further validated by this paper and the evidence from applying the relational rubric approach. Addressing this theoretical blind spot, and evidencing how a resonance pathway works with more fit-for-purpose ways to assess it, will enable social accountability practitioners and evaluators to fill these evidence gaps. It will also help change faulty narratives about the potential and limits of social accountability work.

Box 3: Pathways to Scale for Social Accountability

Guerzovich et al (2022) argue that there are at least three major pathways to scale, based on their research and experience in the field:

1. The replication of best practices pathway, whose main anchor is technical expertise and ‘rigorous’ knowledge.
2. The resistance pathway, which works by leveraging the countervailing power of resistance to power and opposition.
3. The resonance pathway, which seeks resonance and best fit with existing public sector efforts.

The first two pathways are commonly assumed in the social accountability literature. However, practitioners’ experience often reflects the third one, the resonance pathway.
Unlike the others, the main thrust of this pathway is social learning. The expectation here is that social accountability work scales up based on deliberation, compromise, and coordinated collective action among diverse actors. The underpinning logic is that social accountability processes contribute to overcoming the challenges of collective action in a game theoretical sense (Ostrom, 1990; also see World Bank, 2017). That means by enabling a group of individuals to organize and work out how to make the most of a situation (e.g., insights learned by implementing social accountability in select locations), they can create shared gains (e.g., using those insights to inform decisions in other locations) through loose coordination and collaboration. Each pathway places different emphasis on the dividends derived from conflict and on the promise of social learning to resolve collective action problems (see Figure 2 below).

5 Social learning for the purposes of this paper and the pathways is a flow – or a chain of events – that involves people engaging with each other and which leads to a change in something they care about. See (Wenger-Trayner, 2014).

Figure 2: The Role of Opposition and Social Learning in Pathways to Scale for Social Accountability
[Diagram. Vertical axis: Perceived Opposition (Low, Medium, High). Horizontal axis: Role of Social Learning (Low, Medium, High). Pathways shown: Resistance, Resonance, Best Practice.]
Source: Own elaboration

The resonance pathway and relational rubric method also account for how different actors in a complex governance system can support, trigger and contribute to scale-up in ways that strengthen local public service delivery and their respective local systems.
This is a dynamic that existing research and evaluations in the field often miss when they equate social accountability scale with:

● An overly narrow stakeholder focus, e.g., only the implementing civil society organizations or community groups.
● An overly limited range of applicable actions, e.g., continuation of existing projects, short-term outputs, advocacy, and campaigning.
● Unrealistic expectations for ambitious results over a short time frame, e.g., ‘all-or-nothing’ dramatic changes, such as wholesale replication or complete institutionalization of social accountability processes without adaptation, incremental progress, fits and starts, and gray areas (for alternative approaches see Integrity Action, 2020).

Instead, the resonance pathway and the relational rubric recognize that social accountability scale-up depends on the interactions between and actions of several actors within a given system. This thinking also asserts that there are many discrete elements (or components) comprising a social accountability process. Social accountability interventions are led by civil society, but scale-up often implies negotiation with other actors in the system, who adapt and apply social accountability processes in many forms, rather than uptake through replication of the whole process. For example, imagine a collective social accountability process designed, facilitated and implemented in a few schools through a funded intervention of a civil society group. Rather than expecting this civil society group to fund and continue their work in these same roles indefinitely, the learning and evidence from the intervention can inform how other actors within a school district can implement their own version of such a process in more schools in the future. A systems-thinking lens is required to recognize these nuanced dynamics at play.
Due to emerging contextual changes and uncertainty, adaptations to a social accountability process are acknowledged to be common as well as potentially desirable. This supports a narrative for long-term, lasting changes that places local actors as the drivers at the center of their own context-specific development stories. Taking this view, the goal of sustainable social accountability interventions, scaled or not, should be to contribute towards a stronger local system that is constantly, if imperfectly, “innovating in terms of how people participate and how those in power are accountable to the society they serve” (Jacobstein, 2019). Furthermore, the GPSA’s learning and experience over the past decade reinforces that a key ingredient for sustainability gains in complex governance systems is the collective action of civil society, citizens and community groups working with public service actors to jointly organize and solve problems in a way that is suited to their context. Enabling and expecting adaptation to specific local spaces and over time, and the associated ongoing experimentation and learning, are all critical for both delivering and assessing fit-for-context social accountability outcomes and their sustainability.

The GPSA usually tailors its call for proposals to fit World Bank strategies in countries whose governments opted into the program at a given point in time. This may suggest social learning is not at work in the scale-up of social accountability interventions from the ground upwards. However, it is important to emphasize that, ultimately, these are civil society-led processes that operationalize broad parameters and seek to engage local bureaucrats and officials. This is different from the traditional approaches of other World Bank operations that are anchored in the interface between national governments and World Bank staff.
Furthermore, the government officials at the central level who opt into the GPSA program or engage in a World Bank operation are not the same actors that engage in a GPSA social accountability project design and implementation at local levels. Even during implementation, key engaged stakeholders often change over the course of the project. Therefore, the uptake of social accountability processes outside of the boundaries of the project is often facilitated by people who were involved in the project at different stages. This is especially relevant in contexts of instability that experience frequent changes within government and civil society, whereby public service officials and civil society members often shift between roles, levels, and organizations. GPSA evidence shows that these actors bring specific learning, capacities, tools and other elements of social accountability processes with them, and that they apply adaptations of social accountability processes in their new government posts, civil society or funder organizations, and/or require the original designers to consider and accept (or not) compromises as part of the ongoing process.

Box 4: The Role of and Contribution to Local Systems in Social Accountability Thinking and Practice

For the purposes of this paper and its findings, a system is defined as the interconnected set of factors (policies, practices, resource flows, relationships and connections, power dynamics and mental models) that jointly produce a development outcome: the whole is greater than the sum of its parts. Traditionally, social accountability research and evaluation have focused on standalone interventions, with local systems and their components being of secondary concern.
However, more recently, there has been a growing number and diversity of schools of thought about ways of thinking about and doing social accountability that are mindful of local systems. According to USAID, a local system refers to actors in a partner country. As these actors jointly produce an outcome, they are ‘local’ to it. Development outcomes may occur at many levels: local systems can be national, provincial, or community-wide in scope. Using this approach means relying on that local system to produce desired outcomes. As will be discussed below, there is also evidence about how social accountability contributes towards stronger local systems. From the perspective of international actors, strengthening a local system means building up the capacities of many local actors from government, civil society, communities, and the private sector, and of the system as a whole.

Source: Own elaboration

All these interconnected and fluid dynamics also mean that scale-up is not guaranteed. The inherent uncertainty involved in complex governance processes and contexts also means that outcomes are likely to vary in form and significance, as the examples in this paper illustrate.6 The authors argue that this is why sustainability via a resonance pathway to scale is a legitimate framework for conceptualizing and assessing the outcomes of social accountability projects. It is also necessary if relatively small projects are to contribute to scale-up in complex dynamic systems.

Table 1 synthesizes key insights about the resonance pathway to scale, drawing on the final research paper as well as a series of dissemination blog posts from sector experts (see Guerzovich et al, 2022; Guerzovich, 2021c). These are illustrated by examples and evidence from the GPSA portfolio.
As further unpacked in the table, resonance captures the idea of an iterative process of deliberation, compromise, social learning, and collective action through which scale-up happens, with fits and starts (see Guerzovich et al, 2022, and Aston 2022).

6 In technical terms it is possible to assess whether scale up (effect) occurred via resonance of a social accountability project (cause) through theory-informed causal analysis without assuming that endogeneity is a problem or pre-determines results such as isomorphic mimicry (Andrews et al, 2017). Project design seeks to increase the chances that uptake happens but does not determine the results. More generally, on this issue in social science research design see Meadwell, 2022.

Table 1: Lessons about the Resonance Pathway to Scale from the GPSA Portfolio and Learning

Theme: A focus on the problem that needs solving
Lesson: The GPSA and other social accountability practitioners’ main aim; focus for sustainability and scale-up; and indicators are centered on the relationships, norms, resource flows and other factors through which actors in the system can contribute to scale and sustainability over time, including as circumstances shift. This approach means that continuity of a particular civil society led tool, project, or even brand that may have been fit for solving a specific problem at a point in time is not expected. Instead, the insights emerging from the collaborative social accountability process inform other actors’ actions. This often entails adaptation and some renegotiation of the original method or tool (e.g., scorecard). This is less of a concern if the adaptation helps to solve the given problem.7
Examples and Evidence from the Field: In Georgia, the concrete ways through which Save the Children-Georgia, CIVITAS, and partners contributed to implementing the Early and Preschool Education Law at the municipal level provided useful insights that were fed back into the ongoing education policy-making process in country. Insights and relationships from the project were also instrumental in supporting an improved COVID-19 pandemic response when schooling went virtual. However, providing insights for different kinds of problems needs different kinds of responses and listening to different stakeholders – in other words the process of uptake was subject to emergence to develop solutions fit for the job in each case.

Theme: A resonance pathway to scale
Lesson: GPSA projects are likely to contribute to scale-up by promoting social learning, deliberation, compromise, and collective action. The resonance pathway to scale, although new to the social accountability theoretical evidence base (Haldrup, 2020), reflects the lived experience of many practitioners (including but not limited to GPSA partners in Moldova, Georgia, the Dominican Republic, the Democratic Republic of Congo (DRC), and Mongolia). Collaborative social accountability processes in these cases seem to have enabled groups of individuals to organize and work on making the most of GPSA-funded projects to create shared gains beyond the project time frames and geographical limits.
Examples and Evidence from the Field: In Sud Kivu in DRC, insights emerged from activating Village Health Committees through the Cordaid-led GPSA project. These became useful for other donors who used lessons to inform their own programing – they adapted the lessons to their own organizational priorities and circumstances rather than pick up and fund the GPSA project.

Theme: Resonance has potential when there is appetite for solving problems with others (Guerzovich et al, 2022, and Guerzovich, 2022b)
Lesson: Scale-up via resonance seems to be possible in contexts in which there is some mutual appetite for stakeholders to solve problems with others, even if there is initial skepticism.8 It seems to be more likely when stakeholders have prior experience engaging in dialogue across the state-society divide and the capacities and trust associated with it.9 It is also harder to pivot to resonance when actors have a history of confrontation and mistrust (Aston and Zimmer Santos, 2022).
Examples and Evidence from the Field: Rudy Prawiradinata, a Senior Advisor to the Minister of National Development Planning in Indonesia, shared that he was considering how to improve frontline service delivery through citizen engagement, but was concerned about resistance from local authorities as well as raising citizens’ expectations without having the capacity to meet them. That is until he talked with stakeholders in communities where Wahana Visi implemented its Citizen Voice and Action project. Then, Prawiradinata realized that there were ways to use insights from this project to achieve his goals, by informing a component of another funding facility (Kompak). The work would eventually be adapted, funded and implemented by the Asia Foundation, with initial support provided from Wahana Visi but no long-term engagement from the organization in implementation (see Annex A and Kompak, 2018).

Theme: High levels of perceived opposition (either by government or civil society) need to be overcome to enable resonance
Lesson: The GPSA identified early on that it would focus its grant-making on targeting concrete problems that actors in specific countries prioritized as fertile ground for joint problem-solving (see GPSA, 2020). In Paraguay, for instance, it focused on addressing shortcomings of the country’s conditional cash transfers program (GPSA, 2019), while in Tajikistan it focused on improving community-based monitoring standards for the water and sanitation sector (GPSA, 2018). Tailored context and stakeholder engagement helped to ensure that the barriers to collaboration would not be too high nor undermine possibilities for multi-stakeholder social learning, across most of the GPSA’s portfolio. It is important to note that there were instances when dialogue broke down but could be renewed (as reflected in the example).
Examples and Evidence from the Field: The mid-term and final evaluations of the first GPSA project in the Dominican Republic suggest that pivots from more confrontational to more collaborative approaches are possible over the course of one project. However, behind the scenes this process was burdensome and risky. Organizations predisposed to open confrontation to ‘open a door’ continued to do so, even when the door was already opened. This meant that time, resources, and opportunities were lost and the risk of losing the trust of officials who opened the door had to be proactively mitigated. There were other cases where civil society groups considered that they were in a zero-sum game with their governments and walked away from the funding and the project.

Theme: Connectors can contribute to or hinder resonance
Lesson: The World Bank can play multiple roles to support the application or adaptation of elements of GPSA projects, lessons and processes via resonance. For example, facilitating timely access to public sector actors; convening and brokering; informing, advising or funding public sector or other development partner strategies and operations by linking them with insights from GPSA grants (Guerzovich et al, 2020; Guerzovich and Poli, 2020; Green, 2017). Many GPSA grant partners have benefited from the support of World Bank task team leaders (TTLs) and country and sector teams, as they paved the way for sustainability and scale-up.
Examples and Evidence from the Field: The 2021-2022 internal assessment of the GPSA’s results framework indicators (which includes a review of several GPSA project evaluations) highlights the importance of the role of the TTL in brokering entry points and linkages between GPSA projects and relevant public sector actors, programs and policies, to enhance the potential for sustainability gains. The evidence also reflected that when TTL project engagement and support is weaker, opportunities for sustainability and scale-up might have been missed or not leveraged to their full potential (Wadeson, 2022). GPSA project evaluations and documentation from projects in Kyrgyzstan, Morocco, Bangladesh or Ghana suggest that when TTL interest wanes and World Bank teams can no longer enable social learning, the chain to scale-up is more likely to break down (see Mills, 2019; GPSA, 2019a).

Theme: Incremental progress despite fits and starts
Lesson: The kind of systems change associated with social accountability sustainability and scale-up is not linear. Sustaining meaningful achievements over time (and seeding the conditions for them) depends on many actors, their relationships, and interactions as well as other components of the local system. Fits and starts in ‘resonance-style’ uptake due to systemic factors – such as variations in the support from World Bank teams or other local dynamics – are sometimes temporary, rather than permanent. Most project evaluations do not benefit from delays and cannot demonstrate the extent to which any positive effects generated by projects continue after they end, including ongoing influence over policy dialogue. In these cases, it is important that evaluators consider that gradual, complex transformations are more common than often assumed in the social accountability space (Guerzovich, 2022a). To address this timing problem, as recommended by the OECD DAC and discussed in this paper, the evaluative focus should be on a project’s investments in creating conditions for sustainability during its lifetime as well as prospective sustainability after it closes. This can be evidenced through specific signals (e.g., expressed interest, dialogue, established entry points) in the operating environment that could favor sustainability, and therefore the potential for scale-up too.
Examples and Evidence from the Field: The World Bank Implementation Completion Report at the close of the TAME project in Mongolia noted that strong local ownership as well as upfront planning and investments in sustainability and scalability diminished risks over time (Meyanathan, 2021). When the independent evaluation of TAME was delayed and carried out months after the project’s closure, it found that the 31 Parent-Teacher Associations established by TAME were still functioning and playing their roles, continuing to find ways to collectively solve problems at school levels. The delayed evaluation therefore uncovered that while the application of lessons seemed to be stalled at project completion, it was rekindled later. This phenomenon has been observed in other GPSA projects, including in Mozambique, and discussed in the 7th GPSA’s Global Partners’ Forum (GPSA, 2021).

Theme: Sustainability and scale that is guided by localization in dynamic systems and focuses on function, not form
Lesson: Social accountability scale happens in context, mediated by local actors to address local problems in context. What matters is not just whether the specific tools or scale-up processes have the same form – i.e., that they look the same on paper or across contexts. The very process of scale-up with an aim for sustainability requires adaptations to be effective. Therefore, the key is whether such processes function comparably in practice and produce similar effects. These processes of grounding and adapting interventions to local contexts and enabling the ‘continuous pursuit of improvement, innovation in approaches, crowding in by various actors’ are also associated with greater resilience in dynamic contexts. They often bounce back and continue to evolve and produce results, as opposed to attempts or expectations for wholesale replication and formal continuity of interventions that do not fit the context or fail to account for changes in dynamic systems (see Jacobstein, 2019; GPSA 2021a).
Examples and Evidence from the Field: An intervention where there are a strong set of pre-existing relationships and joint engagement practices often requires different levels and types of investments for facilitation than for sites where the same types of groups and relationships must be started from scratch.10 Similarly, a government seeking to scale-up an intervention across a country, whether in-house or by partnering with diverse civil society groups, often needs to consider organizational circumstances that may be overlooked when the intervention is implemented by a single civil society group (or assessed as if the outcome depended on a single civil society actor).

7 Integrity Action (2021) calls this a process view of sustainability. This is shared by others in the social accountability space.

8 This skepticism is one example of the factors that enable evaluators in specific projects to assume a cause-effect relationship at work in resonance, rather than a situation in which uptake is determined to happen by project design.

9 Organizations that have long-term trajectories in a single site often build these bases in a project cycle and can reap the benefits and raise their ambition in subsequent ones. See (Guerzovich, 2022) on Pact’s portfolio.

Resonance is not the only pathway to scale, but it seems to apply to a broad range of social accountability work analyzed by the researchers who defined resonance; this has also since been validated in learning sessions with others working in the field. In cases where the contextual conditions fail to hold, other pathways to scale may be better suited for reaching more geographic locations, sectors and ultimately people (Guerzovich, 2022b). These include the often referred to ‘best practice’ and ‘resistance pathways’ to scale anchored in technical expertise or pressure tactics, respectively (see Box 4 above).

Collectively these insights inform a nested, mid-level theory of change (see Guerzovich et al, 2022) that specifies these three identified pathways (resistance, best practice, and resonance) to social accountability scale-up. These helped the authors to identify where the GPSA experience may fit, as well as the contextual conditions under which each one may be most promising. This theory of change still merits further research to test and refine it. However, the relational rubric approach and assessment exercise discussed in the next sections have supported this effort by building the evidence base further and testing these assumptions.
10 This layering phenomenon is common across project cycles in the social accountability space (Guerzovich, 2022).

04 - A Relational Rubric to Evidence Sustainability and the Resonance Pathway to Scale

Through its 2020 updated Theory of Action and Results Framework, the GPSA provided a formal starting point to systematically evidence the resonance pathway in GPSA projects. It included a specific medium to long-term outcome and indicator on uptake, to assess the many forms of sustainability, especially via scale-up (see Table 2 below). As discussed, success for the GPSA is demonstrated when lessons from or elements of collaborative social accountability inform decisions and actions taken by the public sector and other civil society and development actors beyond an individual project, including after the project has ended. This result is often associated with the uptake of selected element(s) of a collaborative social accountability process, rather than complete replication or scale-up of the entire process (wholesale). Expecting wholesale replication is unrealistic given the scope, budget, and time-frame of GPSA projects (and many social accountability projects in general).

It is also important to emphasize that collaborative social accountability processes do not equate to specific tools or capacity development, nor are these the only elements that the GPSA seeks for uptake by other actors. While these are important components that can be sustained and scaled based on a GPSA project experience, a social accountability process is much broader, encompassing many elements and examples of ‘what counts’ for sustainability. This section and Box 5 unpack this in further detail.
Table 2: The GPSA’s Results Framework Outcome and Indicator on Uptake (i.e., Sustainability and Scale)

Outcome: Elements of collaborative social accountability processes are taken up by public sector institutions and other relevant actors beyond individual GPSA projects.11 Other relevant actors can be INGOs/CSOs, World Bank teams, funders.

Indicator: Percentage of GPSA grants in which public sector institutions and other relevant actors seek to:
1. use substantive lessons for improvements of targeted policies, processes, and mechanisms;
2. apply or sustain elements of collaborative social accountability processes after life of the project;
3. adapt insights from GPSA projects to scale them through programs or policies; or
4. apply elements of collaborative social accountability processes in additional localities or sectors.
Note: this can be done through the government’s own reform program, donor-funded programs, or World Bank-financed programs.

Examples:
Health sector: Number of priority stakeholders, including local hospitals, public health sector institution officials (central, regional, district and/or village), CSOs, and World Bank team that commit to applying elements of the project’s collaborative social accountability process in additional localities after the project ends (i.e., scale).
Education sector: The Ministry of Education uses lessons from the project’s collaborative social accountability process to improve the ongoing education sector policy reform.

Source: Adapted from Wadeson and Guerzovich, 2023

11 Language related to social learning should be integrated into future revisions of this GPSA Results Framework outcome and indicator, the MERL Guide, and this rubric. The GPSA and its MERL framework are always evolving based on learning and evidence.
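As a minimal sketch of how the Table 2 indicator could be tallied, the Python below computes the share of grants in which at least one of the four ‘seek to’ conditions is evidenced. The grant names, the short condition keys and the evidence flags are hypothetical illustrations, not GPSA data, and reading the four conditions as alternatives (any one suffices) is an assumption based on the ‘or’ in the indicator wording.

```python
from dataclasses import dataclass

# Hypothetical short keys for the four "seek to" conditions in Table 2.
CONDITIONS = (
    "use_lessons",       # 1. use substantive lessons to improve policies
    "sustain_elements",  # 2. apply or sustain elements after the project
    "adapt_to_scale",    # 3. adapt insights to scale via programs/policies
    "apply_elsewhere",   # 4. apply elements in additional localities/sectors
)


@dataclass
class Grant:
    name: str
    evidenced: frozenset  # conditions backed by triangulated evidence


def meets_indicator(grant: Grant) -> bool:
    """Assumption: evidence for any one condition counts towards the indicator."""
    return any(condition in grant.evidenced for condition in CONDITIONS)


def uptake_percentage(grants: list) -> float:
    """Percentage of grants in which relevant actors seek uptake."""
    if not grants:
        return 0.0
    hits = sum(1 for grant in grants if meets_indicator(grant))
    return 100.0 * hits / len(grants)


# Illustrative portfolio only; these are not real GPSA grants or results.
portfolio = [
    Grant("Grant A", frozenset({"use_lessons"})),
    Grant("Grant B", frozenset()),
    Grant("Grant C", frozenset({"adapt_to_scale", "apply_elsewhere"})),
    Grant("Grant D", frozenset()),
]

print(uptake_percentage(portfolio))  # 50.0
```

In practice each flag would rest on the triangulated monitoring and evaluation evidence the paper describes, rather than on a simple yes/no entry.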
During the lifetime of a project and at its closure, it is not possible to have certainty about the future in a complex system (or any system). However, it is both possible and desirable to focus on the likelihood of sustainability, and therefore scale-up as one form of it. As per the revised OECD DAC criteria and guidance, related assessments should focus more on signals for prospective sustainability and uptake, such as whether a project has been moving along a sequence of steps that can contribute to scale-up, even after project closure. Because the process is sequential, monitoring data from the whole project life and triangulated evidence at the final evaluation stage are both critical. Furthermore, ongoing attention to monitoring data helps project teams to identify, plan for and build opportunities that can support the continuation of positive effects of a project, from the point of its design, while also mitigating barriers and risks along the way.

For example, the mid-term evaluation of the TAME project in Mongolia suggests that relevant and effective monitoring, evaluation and learning for action can help projects to more systematically anticipate and integrate specific elements to enhance the prospect for uptake in the future. This intentional practice and foresight can also preempt discontinuity in uptake processes after project closure.

The GPSA’s internal monitoring, evaluation, reporting and learning (MERL) system and recently published MERL Guide (Wadeson and Guerzovich, 2023) have built this focus into the indicator and its respective assessment approach. The phrase ‘seek to’ is a key part of this indicator statement and also operationalizes the OECD DAC’s guidance.
The uptake of elements of collaborative social accountability processes facilitated through GPSA projects is contingent, in that it can be introduced but not sustained by the project after its closure. The key to evidencing the likelihood of uptake is to investigate whether concrete steps have been taken by the project, such as: ongoing dialogue with relevant public sector officials and World Bank operations staff over time; identifying potential entry points where elements of the project’s collaborative social accountability process can live and grow in future; and the steps stakeholders take to compromise and to leverage those points. Uptake could take substantive practical form via another program or be reflected in government policy changes or reforms.

When this relational rubric is used to measure the conditions for and likelihood of prospective sustainability, relationships, associated capacities (especially adaptation) and systemic factors are at the core of evidencing scale-up with an eye towards sustainability in the GPSA model, as in other innovative indicators used in the social accountability field.12 The project trajectory should ideally demonstrate that this has been an ongoing process from the onset, driving the potential for uptake forward, and course correcting as required, based on solid learning and evidence. Accordingly, the relational rubric is designed to support projects to monitor and evaluate this process and the associated causal sequence of events.

Resonance, as explained above, captures characteristics of scale-up of social accountability (non-linear and emergent, multi-dimensional, the product of multi-directional interactions) entailing processes that often look different in various local systems. They are contingent on localized relationships, social learning processes, and give and take, among other factors. This requires a shared understanding about what constitutes progress and success to ensure measurement of what is intended (i.e., construct validity).
Therefore, it’s critical that evaluative judgments are transparent and clearly understood. Rubrics, which are a form of qualitative scale to denote levels of performance and support assessment, explain what the standard means and clarify the reasoning behind an assessment. They are a useful instrument to deal with the challenge of assessment of a relational pathway such as resonance. As social accountability evaluator Tom Aston has argued, rubrics “provide a harness but not a straitjacket for assessing complex change and they help stakeholders build a shared understanding of what success looks like” (Aston, 2021).

12 For example, Pact monitors the social capital of organizations as key to sustainability and Integrity Action implemented an iterative approach to monitor sustainability. For other approaches see Guerzovich, 2022.

Figure 3 introduces the five-point relational rubric13 that the authors developed to transparently assess the many possible outcomes for sustainability, including via scale-up, in line with the resonance pathway and the GPSA’s Theory of Action.14 It was tested across a sample of 15 projects supported by the GPSA. It will be used for current and future GPSA projects and to aggregate and compare results at the portfolio level across time. It is important to note that this rubric can be applied at different stages of a project, not just at the final evaluation, for example, to monitor prospective sustainability throughout the project or for ex-post reviews, by slightly adapting the language in the rubric to clarify the timing.15

13 To learn more about the rubric and guidance for its application during both monitoring and evaluation phases, see Wadeson and Guerzovich, 2023.
14 This rubric was specifically designed for assessing the evaluation criteria of sustainability as conceived by this paper, not for other evaluation criteria (e.g., OECD DAC criteria of effectiveness, impact). If additional criteria are part of a MEL system or specific evaluation, then other suitable evaluative tools and methods would be needed to complement the rubric.

15 For example, a GPSA internal learning exercise in 2022 collected long-term results from a sample of closed GPSA projects (unpublished). The sample included 14 of the 15 projects used for the initial rubric test and findings presented in this paper.

Figure 3: Sustainability Relational Rubric Levels with Criteria
Source: Adapted from Wadeson and Guerzovich, 2023

05 - Reflections and Guidance on Using This Relational Rubric

While presenting empirical findings of results is always a key priority and interest, the process that enables evidencing them is also important to explain, because the ‘what is the how’. Previous theoretical and methodological choices have created blind spots that pre-empted useful and fit-for-purpose evidence for sustainability and scale-up of social accountability work. This section provides a range of partly overlapping considerations associated with the definition and application of this rubric to address challenges and misconceptions and to provide more guidance.
A politically minded, relational rubric: Throughout the life of GPSA projects, grant partners seek to create buy-in and demonstrate the value of collaborative social accountability processes to a range of relevant actors who can support or directly ensure downstream uptake, which often calls for repeat interactions and embracing compromises. In doing so, these actors often develop a stake and ownership in the process, becoming more capable and likely to promote collaborative social accountability. As a result, they may actively seek opportunities for scale, whether through new or existing policies or programs or by applying insights and lessons in new localities and sectors. These actors also play insider influencing roles, inaccessible to many CSOs due to common power asymmetries. As such, they can better identify where uptake and partial adoption or adaptation of collaborative social accountability processes are possible, and effectively support scale-up for long-term sustainability. Achieving this requires strong political acumen, relationship building skills, and access on the part of grant partners. If relevant actors are invested, they may choose to put their own ‘stamp’ on a collaborative social accountability process or adapt it with an iterated model when scaling, so that it resonates with their own perspective and place in the system. This product of deliberation, compromise and loosely coordinated action is considered a success in the GPSA’s Theory of Action. Such a case would be a strong example of impact, even if the project’s specific contribution is less visible, and the form of the process looks quite different than what it was during the project.

Identify logical sequencing and incremental steps: Expectations and existing evidence both point to the tendency of scale-up to be incremental.
A critical element of enabling this is that the project’s efforts to engage and find avenues for sustainability take place throughout the project life, not just at the end. GPSA learning over time has shown that there is often a certain sequence of events associated with uptake for scale-up. There is a long tradition in comparative institutional and political analysis of studying ‘cases’ that are decomposed into a sequence of events and whose “causal claims rest upon the inferences derived from the analysis and comparison of those sequences” (Falleti and Mahoney, 2015). This ‘comparative sequential’ method, which “can and must encompass more specific methods of cross-case analysis and within-case analysis,” informed the development of the relational rubric.16

The rubric establishes a logical sequence of events to be pro-actively driven by a project to enhance the likelihood of sustainability, including scale-up. Its performance levels are directly related to each stage in this sequence. The sequence starts from no evidence of any interest expressed, dialogue with or actions taken by relevant priority stakeholders regarding sustainability and scale of the social accountability process. It then moves through a chain of increasing levels of interest, identification, dialogue, and actions regarding uptake of element(s) or insights from a collaborative social accountability process by any priority stakeholders or institutions, each requiring sufficient supporting evidence or insights for substantiation. It sets out concrete, observable steps that could be causally linked to a project’s efforts, evidencing its contribution. In this way, the relational rubric is designed to help project teams and evaluators to effectively trace and evidence the sequential steps that often play out in a causal manner. This specificity and ongoing evidence collection also support project teams to make informed course-corrections when projects are not on track.
16 Process tracing informed our approach to theory building and testing theory within cases, while a transferability lens, grounded in simple matching tools and other methodologies, was used for meaningful aggregation and comparison of complex portfolios. On the former, see Section 4; on the latter see Wadeson et al, 2020.

Define reasonable limits for the rubric scores and criteria: The 1-5 scale in the rubric provides criteria to guide scoring of the indicator with a sequential logic, where each step is assumed to be a steppingstone to the next. Each numerical score corresponds to a percentage, with all percentages above zero representing different degrees of partial uptake. However, full uptake is not included as this is unrealistic. Setting this limit ensures a realistic ambition for projects and their assessments. By design, the GPSA does not expect that individual and relatively small-scale projects will lead to sustainable collaborative social accountability processes in the form of complete adoption, continuation, replication and/or extensive scale-up.

Ensure realistic expectations for scores and expect uncertain, emergent effects: A score of 2 or 3 reflects a positive outcome. A score of 4 or 5 would be regarded as a significant success, but also one that is quite challenging to achieve within the time-frame and scope of most GPSA projects – which makes it important to judge the direction of travel in context. Expectations should be tempered given the timescale and resources of a given grant as well as the nature of progress, which is often incremental. For example, it could be reasonable to celebrate a score of 2 earlier in a project but to expect a 3 or 4 by its end.
Scale-up is an experimental and highly contingent result: it requires difficult changes to result from the interactions of multiple stakeholders and momentum to be maintained (although often with stops and starts) over time-frames that are usually longer than a project cycle. Focusing on ‘ideal’ results can obscure learning about plausible, partial results, which had been identified by the GPSA community as most relevant to learning and course-correction. The relational rubric is designed to guide monitoring and evaluation practice in a more realistic and fit-for-purpose way, considering partial uptake of specific elements with modifications as ‘success’ to be evidenced, learned from, and shared. It is therefore important to cast a wide net on different potential outcomes for sustainability in a range of different forms, as success is manifested in diverse ways. Box 5 provides examples of ‘what counts’ and what should not be considered as evidence of scale-up, based on GPSA learning and evidence, and the findings of this exercise.

Focus the relational rubric explicitly on processes and functions, not tools (see Wadeson, 2020): As previously discussed, scale-up happens in context. It is mediated by local actors who seek to address local problems suited to their given locality and sector. The process brings together unique combinations of stakeholders, dynamics, norms, perspectives, and experiences, amongst other variables. Therefore, these processes will vary widely in their form between contexts to be locally relevant. As a result, the specific form the exchange takes or the design of specific components of the process is far less important than the way the process is meaningfully adapted to the context to support deliberation, compromises and, eventually, coordinated action.
For example, an evaluation of GPSA projects in Malawi explains that when teachers who participated in social accountability processes were transferred to new schools, they “inspired by the project activities became harbingers of the social accountability initiatives … at their new schools. Similarly, the [primary education advisors] in targeted zones had participated in capacity building initiatives on social accountability principles and practices and observed them at work in the targeted schools. They took the messages to non-project schools within their zones through their advisory roles” (Chingaipe et al, 2022).17 In different Malawian schools, the transferred teachers and primary education advisors look different, as do their specific activities, but they are playing the same role of supporting uptake of lessons beyond the initial sites of project implementation. This result seems to have been obtained in other GPSA projects not included in this initial rubric testing.18 In other projects, government authorities or World Bank staff performed this function and paved the way to scale, as will be discussed more in the next section.

17 This evaluation was concluded after the initial testing of the rubric, but further validates the findings of this note.

Focusing on function over form starts by defining standard but broad concepts, such as uptake, sustainability and scale-up with an eye towards sustainability. In many projects, GPSA stakeholders defined both old and new concepts together, considering emergent practices, evaluations as well as research in the sector. Their aim was to craft clear, explicit definitions to support common understanding about what exactly the GPSA intends to measure and learn about collectively, before determining the ‘how’.
Such definitions are also important to support the transfer of key ideas to different project contexts more easily, and to ensure similar dynamics are being assessed (i.e., construct validity). This is not meant to prescribe, but rather to ensure consistency while continuing to draw on multi-directional learning and dialogue with partners on the ground and evaluators (such as discussed in Annex A). There are no perfect definitions, and concepts can evolve with learning and practice, so striving for ‘good enough’ is reasonable, while also being careful about conceptual stretching.19

Functional equivalent indicators are key: After concepts are defined for common understanding, establishing a set of core indicators to operationalize these concepts is important, with specific guidance on what is (and is not) essential to document, monitor and evaluate across projects. This enables aggregation and comparison at the program level for richer, more systematic evidence and learning over time. These indicators should be linked to the theory of action or theory of change of a given program, articulating how and when change is expected to happen on the pathway to scale. While these indicators will be localized to each project, they should be standardized in a few ways. This includes using the same units of measurement and representing the same core concepts and assumptions for how change is expected to happen over time. These are referred to as ‘functional equivalents’, akin to the expression ‘comparing apples to apples’. Ensuring functional equivalents enables reliable comparison across projects in different geographies and sectors and also allows for aggregation at the program level (see Wadeson and Guerzovich, 2023). The expectation is that with more comparable data, the appropriate conditions and time horizons for impact can be identified in realistic terms, based on a broad range of examples of ‘what counts’ for success.
For the GPSA, this means that key concepts and elements (including practices, approaches, tools, and mechanisms) involved in collaborative social accountability processes are consistently defined and represented by the role (function) they are expected to play rather than by their exact form. The forms they take across GPSA projects may look different in practice, as they should, since they need to be adapted and localized to the project context (Wadeson and Guerzovich, 2023). It is important to note that the process of clarifying concepts and ensuring functional equivalents often requires dedicated MEL staff and external evaluators to work directly with project teams to build this understanding and practical capacity. The level of effort depends on how much project teams will be directly involved in the MEL of a project (the degree of participation).

18 Tom Aston shared evidence to solidify this result in his independent evaluation of a GPSA project in the Dominican Republic.

19 Mahon (1993, 845) defines conceptual stretching as “the distortion that occurs when a concept does not fit the new cases.”

Make the assessment based on triangulated evidence: Ideally this will use monitoring data from project teams as well as primary data collected by those applying the rubric, to make a balanced and robust assessment on the rubric scale. While the rubric application used GPSA projects as the sample and evidence base, it is relevant beyond GPSA projects. Other MEL practitioners in the social accountability field are encouraged to help test, learn from and improve it over time.
It is recommended that MEL practitioners who wish to apply it as intended understand both rubrics as an assessment tool and the conceptual framework for social accountability sustainability and scale-up presented in this paper.

Ensure sufficient evidence quality and range: In addition to casting a wide net for what counts, using the rubric to reach specific scores relies both on the quality of evidence generated by a monitoring and evaluation system and on the range of identified examples analyzed in context. Examples that lack triangulated, verifiable evidence, or that give insufficient detail on why and how the uptake happened, would not contribute to learning about what works for sustainability and scale-up of collaborative social accountability processes.

The use of percentages: Recognizing appropriate expectations for sustainability, a rubric score of 5 or 100% does not equate to wholesale uptake or scale-up. Translating each level in the rubric to a corresponding percentage score provides a common reference point that is easily understood and comparable. Each level in the scale moves sequentially from no uptake, to partial uptake, to full uptake (while noting that full uptake does not mean total replication for the GPSA, as discussed). However, the use of percentages is not essential in the use or adaptation of this rubric by others. The criteria for each level of the scale can be adjusted, but the authors advise that the core features of the rubric approach and its conceptual underpinnings are maintained.

Box 5: What ‘Counts’ for Results of Uptake?
Cast the net wide to mitigate blind spots and identify hidden successes: The rubric embeds the same ‘detective’ approach that allowed the GPSA to uncover tacit knowledge about resonance to date, including actions taken as a result of lessons from collaborative social accountability projects, even in cases where the decision is to not pursue the recommendations as they were written. This means ensuring that a range of results ‘count’ positively when collecting and analyzing data. Examples range vastly and can include:
1. GPSA/World Bank project personnel, and representatives from civil society and community organizations and networks integrated into another public sector project or program.
2. Public sector counterparts used lessons to inform public sector reforms and policies.
3. Emulation by local public sector or service providers (e.g., education officials and schools) that observed, adopted or adapted the collaborative social accountability process from a project.
4. The World Bank or other funders used lessons and approaches to advise public sector or other development partners’ programs.
5. The World Bank or other funders financed an adaptation of the project in the same or other sectors.
6. Any observed or reported uptake, sustainability and/or scale-up led by other international non-government organizations (INGOs) or civil society organizations (CSOs).
7. The project actions and trajectory demonstrated ongoing dialogue with key actors (relevant public sector officials and World Bank operations staff) to move the process for potential uptake of collaborative social accountability processes forward.

Be clear about the boundaries of what does not count: It is important to be clear about events and actions which are not applicable examples of uptake, even though they are sometimes claimed as such. These include:
1. Grant partners share a report or knowledge product and invite key stakeholders to events on learning (an output).
2. Grant partners meet with government, without information about follow-up or subsequent actions taken.
3. Grant partners run a campaign, issue documents and messaging for awareness, advocacy, etc.
4. The media disseminates the content of civil society demands for collaborative social accountability or related advocacy messages.

While these might be useful activities that are part of the project implementation or results in terms of information sharing and dissemination, they do not count as positive instances of uptake action by decision makers to support sustainability and scale-up. It is the actions taken and/or the use of information shared which is critical for uptake and to enable sustainability and scale-up, as per the GPSA’s conception.

Source: Own elaboration

06 – Testing the Relational Rubric to Better Understand and Learn about Collaborative Social Accountability Scale-up

The relational rubric was tested, using sample GPSA projects, to uncover whether it enables better understanding about whether and how the projects have scaled up, and whether it validates the GPSA’s Theory of Action and the resonance pathway to scale. The test methodology is shared in Box 6.

Box 6: Testing the Relational Rubric on a GPSA Sample

The first step towards testing the relational rubric was to define the criteria for a purposive sample. There were 30 eligible projects within the GPSA portfolio that could be considered. The sampling and analysis were done by a professional evaluation consultant working with the GPSA, who had not been involved in any of these projects. The scope of the sample also considered the limitations on resources available for this initial test. The final criteria were:
● Completed projects, as these would be the most likely to demonstrate potential for scale-up.
● Projects with several high-quality secondary documents covering the whole project life.
● Projects with external midterm and/or final evaluations and/or World Bank Implementation Completion Reports (ICR), because these sources often have the most detailed information and include independent assessments, which helps for triangulation and mitigation of bias.

The final purposive sample of 15 projects (50 percent of the sample frame) is presented in Figure 6. After sample selection, the application of the rubric was piloted in one project in the sample: the TWISA project in Tajikistan led by Oxfam. The following examples of scale-up were identified:
1. Another CSO adopted the project’s collaborative social accountability model in other country locations, which were not part of the original TWISA project.
2. WHO Tajikistan used the project’s Service Performance Indicators in their project on water supply and sanitation services assessments and water safety plans.
3. The Swiss Development Cooperation Agency supported use of the project’s collaborative social accountability model by government implementing agencies that they are funding.
4. The European Commission also supported use of the project’s collaborative social accountability model in its other funded projects.

Evidence that supported this was sourced from the project documents:
● The Implementation and Completion Report provided specific details and actors demonstrating actions for collaborative social accountability process uptake or expressed interest/support for it.
● The independent final evaluation found that the project actively created synergies with other programs and actors that could help with uptake (scale-up/scale-out) and sustainability i.e., Tajikistan Water Supply and Sanitation Network, and engaged the Ombudsman presidential appointee.
● The project identified suitable policy entry points to advocate to the government for collaborative social accountability support i.e., the new Action Plan for Water reform signed-off by the highest national authority and the Ministry of Energy and Water.
● The project pursued avenues for long-term sustainability from the onset and throughout, not just at the end.

This combined and triangulated evidence resulted in a score of 5, considered as 100% uptake as per the rubric criteria. The rubric was then applied to the other 14 projects in the sample. The results were aggregated.

Source: Own elaboration

Figure 6: Final Purposive Sample of Projects
Source: Based on World Bank data

Each of the 15 projects in the sample was deemed to achieve at least partial uptake, meeting a rubric score of 2 (25% uptake) or more. Several examples were identified across the source documents (and in many cases, the final independent evaluations) and triangulated as much as possible with the data available.20 Table 3 shows the breakdown of results by rubric level.

Table 3: Results of the Relational Rubric Testing Exercise on a Sample of 15 GPSA Projects in 2021

Relational Rubric Score | # of Projects in the Sample | Relational Rubric Criteria
1 or 0% UPTAKE | 0 projects | No evidence of any use/application/adaptation of element(s) of or insights from a collaborative social accountability process by any priority stakeholders and/or public sector institutions. No evidence of stakeholder interest, dialogue or alignment.
2 or 25% UPTAKE | 2 projects | Evidence of interest by priority stakeholders and/or public sector institutions expressed publicly or privately about learning from a collaborative social accountability process in the project.
3 or 50% UPTAKE | 4 projects | Evidence that priority stakeholders and/or public sector institutions have expressed where to adopt, adapt and/or sustain elements or insights from a collaborative social accountability process and how this could be incorporated in some way into other operations, programs, policies (i.e., concrete entry points have been identified).
4 or 75% UPTAKE | 4 projects | Evidence of dialogue with priority stakeholders and/or public sector institutions on how to adopt, adapt and/or sustain elements of the collaborative social accountability process in future operations, policies, or programs.
5 or 100% UPTAKE | 5 projects | Evidence of actions taken by priority stakeholders and/or public sector institutions to adopt, adapt and/or sustain elements of a collaborative social accountability process in other operations, policies, or programs. Triangulation of data with at least 2 sources of evidence to confirm is required.

Source: Based on World Bank data

20 The 15 projects not included in the sample did not meet the criteria for evidence availability. Therefore, their potential sustainability is unknown; primary data collection and targeted analysis would be required to determine this.

The exercise was limited, as the time and resources available did not allow for primary data collection, but it provided good enough evidence for the purposes of testing the relational rubric. It was conducted by an independent consultant who was not part of the projects or the GPSA team, which helped to mitigate bias.
The relational rubric was also reviewed and tested by other evaluation experts to support inter-rater reliability. Testing should be continued as the rubric is used by the GPSA and others.

The testing exercise helped to refine the relational rubric further, and surfaced the following lessons:
● It can be applied using secondary evidence. Although evaluations did not explicitly apply the rubric, the retrofitting of data to apply to the rubric was still possible and useful.
● It made sense, helping to build confidence in it as a fit-for-purpose method to assess sustainability and scale outcomes.
● It provided a transferable metric to compare and aggregate these outcomes and the success of different GPSA projects.
● Its logic and format were easy to communicate to others (as was done at the GPSA’s 8th Annual Partners Forum, in the Scaling Social Accountability panel session; GPSA, 2022).
● It validated the assumption about casting a wide net for what counts for success, and captured scale-up in the various forms that a resonance pathway can take.
● It supported the GPSA’s inductive-deductive approach to theory building.

The rubric, with key evidence and lessons from using it, has been presented at internal and external forums with positive reception from World Bank stakeholders, GPSA grant partners and evaluators outside of the GPSA. While there is clear promise in the process and results, the use of the relational rubric is experimental; it is still being tested. It has been and should continue to be updated with new evidence and learning, including beyond the GPSA.

Because a wide range of potential outcomes can meet the sustainability criteria, scores can be aggregated and compared. However, what this looks like in practice will still differ vastly across projects. It is therefore important to emphasize that when projects received the same rubric score, scale still looked very different. For example, four projects were given a rubric score of 3 or 50%.
This means that there was “evidence that priority stakeholders and/or public sector institutions have expressed where to adopt, adapt and/or sustain elements or insights from a collaborative social accountability process and how this could be incorporated in some way into other operations, programs, policies (i.e., concrete entry points have been identified).” To meet this criterion, a wide range of examples were included from these four projects, such as:
● Expressions of interest from government/CSOs to use tools and guidance developed by the project (SEND Ghana).
● Likelihood of a governance body mechanism (like a steering committee) to be used in future projects (SEND Ghana).
● Government actors and parliament making regular requests to the lead CSO for inputs on relevant sector matters post-project (SEND Ghana).
● Key international aid actors (GIZ, UNICEF, USAID, IRC) report use of approaches derived from the GPSA/CODESA experience, often after having witnessed it in the field (Cordaid DRC).
● Project managers who exited the project ended up in influential positions in other institutions and, when interviewed, suggested that they were using CODESA ideas in the design of large UNICEF, World Bank, and USAID programs (Cordaid DRC).
● Evidence that GPSA/CODESA health sector experience influenced the set-up of social accountability mechanisms in the education sector (Cordaid DRC).
● Municipalities have pledged to continue the collaborative social accountability mechanism of benchmarking beyond the project. Additional municipalities have expressed the will to participate in the established governance mechanism (Save the Children Georgia).
● Social audit practices are being considered for scale-up to other districts, including nationwide, and to other programs of the Ministry of Social Development (CIRD Paraguay).
● A World Bank Country Partnership Framework incorporated new engagements expected in social protection system effectiveness including social accountability mechanisms based on the project experience (CIRD Paraguay).

The findings of the test exercise suggest that it is plausible to evidence the realistic contribution to scale-up with an eye towards sustainability of collaborative social accountability processes in projects with a duration of three to five years and budget of less than USD one million, provided projects are assessed with relevant conceptual frameworks and methodologies under qualified MEL staff and evaluators. Those methodologies need to be suitable for exploring and capturing actions taken by priority stakeholders and/or public sector institutions (to adopt, adapt and/or sustain elements of a collaborative social accountability process in other operations, policies, or programs) both within. This can include signals or commitments made by them during the project life for future scale-up. For example, in Madagascar, most participating municipalities (32 of 46) budgeted for the continued operation of the collaborative social accountability processes post-project and the expansion of the approach by new projects and areas by other development actors.21

The findings validated the assumptions of the authors and other partners that there are many potential outcomes and forms of scale-up (see Box 5). They also reinforced the need to look beyond numbers and percentages in the rubric when analyzing the data; a relational and systems lens is critical for understanding the ways in which the assets and learning of various local actors improve on approaches in ways beyond those envisioned during the project design. The potential for growth and resilience exists within these emergent systemic dynamics, through continuity and ongoing adaptation of social accountability processes suited to evolving local contexts.
Transformation, thus, is not contingent on wholesale adoption of solutions advocated by the organizations that initially designed or piloted the solution. Adaptation, give and take, and social learning are essential elements that enable resonance with others holding different perspectives and ideas to be taken forward. It is important to review the qualitative details and narrative trajectory of each case to understand the nuances and vast range of how sustainability can take shape on the resonance pathway to scale.

An important caveat is that the longer-term trajectory for scale-up, signaled at the end of a project and often reflected in its final evaluation, is not a guarantee. Some processes for scale-up will stop and stall during and after project implementation. Others may resume after years of stalling. Systematically analyzing this variation can offer important insights for how to monitor and evaluate the scale-up of social accountability processes across different contexts.

21 See Jespersen, 2022, explaining that some improvements piloted by TAME have informed the Basic Education Support Project (PASEB II) – co-financed by the World Bank and the GPSA. Also see Lekweiry and Falisse, 2022.

07 – Concluding Insights and Recommendations for Measuring Complex Change in Social Accountability Work and Beyond

Many of the assumptions at the core of the approach and findings discussed in this paper are consistent with a growing evidence base from broader literature on systems-aware work.
Commonly used evaluation frameworks are not compatible with the complexity and uncertainty associated with these processes and equate mixed results to lack of effectiveness, contributing to problematic ‘gloom and doom’ narratives in the social accountability field (Aston, 2021). Efforts to develop and apply relevant monitoring and evaluation approaches have often been seen as too complicated and dismissed or poorly prioritized. However, the emergence of a vibrant community of monitoring and evaluation professionals committed to developing methods that are well-suited to support the complex, systemic and pressing development problems of this era – from the climate crisis to governance failures – is changing that trend.

This paper introduced the relational rubric as part of a systematic and operational approach for monitoring and evaluating complex change, applied to collaborative social accountability programming. The rubric is consistent with the revised OECD DAC evaluation criteria for sustainability. The findings discussed show that systematically and causally tracking complex scale-up with an eye towards sustainability is achievable and not as difficult as previously imagined. While using primary evidence in real time is highly recommended, the authors found that this can still be done in a meaningful way even using secondary data, long after project closure.

The Value of Applying a Relational Rubric Grounded in a Resonance Pathway to Scale

Building upon previous evaluation-action work on pathways to social accountability scale with an eye towards sustainability, this paper focuses on a new relational pathway: resonance. The resonance pathway was previously missed and seems to apply to a broad set of (but not all) social accountability interventions and contextual circumstances.
The relational rubric helped operationalize the resonance pathway further, capturing the many forms of sustainability, involving scale-up through deliberation, compromise and coordination of diverse local actors working in complex governance systems. As hypothesized, these emergent, multi-stakeholder processes of scale-up with an eye towards sustainability can be as complex as the systems that they help to strengthen. Yet the rubric demonstrates that they can also be knowable and traceable with fit-for-purpose concepts and tools. It provides a way of uncovering them systematically across very different projects and contexts, at both the project and portfolio level.

Overall, the findings of this inductive-deductive exercise validated the promise of the resonance pathway to scale as well as the relational rubric method for evidencing it, both of which are fit for evidencing incremental, transitory, piecemeal, and intermediary processes, contexts, and outcomes (Guerzovich, 2022a). It demonstrated that projects did move towards scale with a view towards sustainability, along a pathway that is coherent with and strengthens local dynamic systems.

The testing of this relational rubric also revealed that it is possible to find evidence and concrete examples of prospective sustainability in many forms by the time projects end. However, expectations for achievement and/or evidence of sustainability and scale-up during the life of most projects, or directly after they end, are both overly ambitious and unhelpful for projects with short time frames and limited budgets (such as GPSA projects with average three-year durations). Therefore, the sequential logic embedded into the relational rubric approach provides a practical tool to causally assess the conditions required for actual sustainability and the likelihood (rather than the certainty) of prospective sustainability.
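The sequential logic of the rubric can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the tool used in the testing exercise: the `ProjectEvidence` fields and the `rubric_level` and `share_at_or_above` functions are hypothetical names, and the rule that a project receives the highest level its evidence supports is a simplification made here. The level-to-percentage mapping, the two-source requirement for level 5, and the project counts are taken from Table 3.

```python
from dataclasses import dataclass

# Percentage equivalents of each rubric level, as listed in Table 3.
LEVEL_TO_PERCENT = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100}

# Number of sampled projects at each level, as reported in Table 3.
TABLE_3_COUNTS = {1: 0, 2: 2, 3: 4, 4: 4, 5: 5}


@dataclass
class ProjectEvidence:
    """Hypothetical summary of what the evidence for one project shows."""
    interest_expressed: bool = False        # level 2 criterion
    entry_points_identified: bool = False   # level 3 criterion
    dialogue_on_how: bool = False           # level 4 criterion
    actions_taken: bool = False             # level 5 criterion
    sources_confirming_actions: int = 0     # level 5 needs at least 2 sources


def rubric_level(ev: ProjectEvidence) -> int:
    """Return the highest rubric level the evidence supports (assumed rule)."""
    if ev.actions_taken and ev.sources_confirming_actions >= 2:
        return 5
    if ev.dialogue_on_how:
        return 4
    if ev.entry_points_identified:
        return 3
    if ev.interest_expressed:
        return 2
    return 1


def share_at_or_above(counts: dict, level: int) -> float:
    """Share of projects scoring at or above the given rubric level."""
    total = sum(counts.values())
    return sum(n for lvl, n in counts.items() if lvl >= level) / total


# Illustrative project: interest expressed and entry points identified.
example = ProjectEvidence(interest_expressed=True, entry_points_identified=True)
print(rubric_level(example), LEVEL_TO_PERCENT[rubric_level(example)])
print(sum(TABLE_3_COUNTS.values()), share_at_or_above(TABLE_3_COUNTS, 2))
```

Applied to the Table 3 counts, the last line reflects the finding that all 15 sampled projects reached level 2 or above. As the paper stresses, such tallies are only a starting point; projects with the same score can look very different in practice, so the qualitative detail behind each score still needs to be read.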
Potential Uses of the Relational Rubric at the Project-Level

The sample data was coded retroactively using a range of independent and internal project documentation. However, the value and utility of the relational rubric can be increased if applied during a project, with real-time targeted MEL to provide actionable information to support teams designing and implementing projects:
● By helping them to reflect on whether they are building a viable approach to achieving (realistic) outcomes for sustainability and scale-up into their project implementation and strategies.
● To spot emergent windows of opportunity, analyze the trade-offs and compromises in different scenarios, and course-correct accordingly. For example, at the mid-point of a project, it is helpful to intentionally reflect on whether potential entry points and relationships with actors in the system who could support uptake have been identified and acted upon, and whether others are emerging following contextual changes. This can catalyze planning and actions by project teams to leverage during the remainder of the project, rather than considering this at the end, when it is often too late, and teams are focused on completing implementation and project close-out.22
● Evidencing challenges go beyond the social accountability field and this framework can also be applied to projects that support other types of systems strengthening. For example, the World Bank supports political economy analysis and other analytical products in several contexts; however, it can be quite difficult to measure the uptake of the resulting information that is provided to counterpart operational teams and partners. The relational rubric could potentially help.23

Potential Uses of the Relational Rubric at the Portfolio Level That Can Support the Wider Social Accountability and MEL Fields

Beyond assessing individual projects, conceptualizing and evidencing social accountability scale-up along the resonance pathway with the rubric method provides important insights about transferring the results of individual projects within portfolios. A portfolio-level analysis of interventions can produce insights greater than the sum of their parts. The interaction and iteration of multiple evaluations form the basis for theoretical, methodological and empirical innovations that no single project could deliver alone. In turn, this can support the wider social accountability field by:
● Helping both social accountability and MEL practitioners better understand and address the silos created by project MEL that is unconnected to the wider evidence base, preventing knowledge transfer. The fragmentation found between locally led processes can be overcome through cross-fertilization of results and learning across projects. This can help build stronger theories of action and change and social accountability narratives, especially about scale-up and sustainability.
● Fostering more meaningful comparisons and aggregation of results within an overall social accountability program, such as the GPSA.
● Contributing to improved learning about how sustainability and scale-up happen in the broader social accountability field as interventions induce new interactions, innovation, and changes in local systems. In this sense, the relational rubric provides a useful building block to help address the ‘absence of evidence’ dilemma found in current erroneous assumptions and incomplete evidence related to the scale-up of social accountability work for long-term prospective sustainability.

22 For detailed examples and guidance on using the rubric for monitoring, see Wadeson and Guerzovich, 2023.
23 Thanks to Mathieu Cloutier for this insight into the promise of the rubric method and how it could be useful beyond social accountability work, as per this example about the political economy analysis of the World Bank.

Recommendations

More testing and iteration of the relational rubric, alongside intentional design of projects and evaluations to apply it in real time with primary data, will lead to a growing evidence base for the resonance pathway to scale, increase knowledge about social accountability scale-up, and highlight the many legitimate forms of sustainability outcomes. The authors propose the following recommendations to enable this in practice:

Investment in targeted and robust research and evaluations to build on the resonance pathway and improve the rubric approach. The GPSA, other funders, and organizations working on social accountability should make intentional and long-term investments in robust research and evaluation initiatives using the relational rubric to continue to assess and test the resonance pathway to scale, based on the conceptions and preliminary evidence presented in this paper.24 For example, to improve inter-rater reliability and replicability of the rubric method, multiple independent reviewers could assess the same projects to ensure that the core concepts and rubric level criteria are clear and well defined enough.
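One common way to quantify agreement between two reviewers who score the same projects is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The paper does not specify a statistic, so the sketch below is only an illustration of how such a check might look; the function name and the reviewer scores are hypothetical, and the scores are invented for the example rather than drawn from the GPSA sample.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of projects.")
    n = len(rater_a)
    # Observed agreement: share of projects given the same rubric level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater uses each level.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[level] * freq_b[level] for level in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical level throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical rubric levels (1 to 5) given by two reviewers to five projects.
reviewer_1 = [5, 5, 3, 4, 2]
reviewer_2 = [5, 4, 3, 4, 2]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

Because rubric levels are ordered, a weighted variant that penalizes a one-level disagreement less than a three-level one may be more appropriate in practice; the unweighted version is shown here for simplicity.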
Eventually the rubric could also provide an intermediary variable in measuring systemic-level social accountability and impact on public sector and civil society capacity, independent of the limited scope of a project. To test this, research or evaluation exercises designed to assess mixed results in the scale-up of social accountability processes and related impacts should be conducted. For example, such exercises could look into clusters (geographic, sector-based and/or relational) connected to the original project, where better public service delivery and development outcomes have materialized from new projects, policies, programs and policy reforms.

Understanding the complex dynamics at play and setting expectations accordingly. This means ensuring realistic expectations for the success of collaborative social accountability processes, recognizing the many legitimate forms of sustainability, the incremental steps involved, the inherent mixed results associated with complex processes, and the long-term time frames required for scale-up.

24 The authors credit and thank peer reviewer Mathieu Cloutier for these important recommendations on strengthening the replicability of the rubric method and enhancing its potential value and uses, through the investment in more targeted research and evaluation by social accountability funders and organizations.

Careful evaluator selection. Independent evaluators need to bring the right technical skillset (methods) and a firm understanding of social accountability. External evaluations of social accountability interventions require a deep understanding of and experience with social accountability work, as well as recognition of reasonable limits and expectations for sustainability.
It is valuable if evaluators understand the nature of complex processes, such as those in a resonance pathway to scale (with social learning at its core), where scale-up happens through ongoing deliberation and negotiation between a wide set of actors, dynamics, and contextual shifts within a system. This will often result in non-linear processes, producing mixed results that do not look exactly like the original design, progressing in fits and starts over time.

Strong organizational commitment and investment. Project and portfolio-level monitoring and evaluation using the relational rubric method requires competence and a good understanding of functional equivalents of core concepts. This in turn requires a sound grounding of projects in a portfolio-level theory of change that can be applied across diverse individual project contexts. It also calls for intentional portfolio design; qualified and long-term MEL staff embedded within a program; a supportive leadership environment; and sufficient investment for systematic assessments repeated consistently over time.

In closing, the authors offer words of caution and advice for MEL and social accountability practitioners, and their respective organizations and funders. Applying these recommendations and the many enabling conditions can be challenging. It requires the engagement of all stakeholders, from senior management to implementing staff in civil society organizations, to understand and commit to common concepts within a theory of change, and to monitor and evaluate a set of functionally equivalent indicators from the project onset. This is necessary for reliable data and comparison over the long term, aggregable at the portfolio level, and so that MEL systems can track cumulative results and impact.
The nature of organizational restrictions, limited resources, technical criteria and continuously shifting political dynamics within organizations and the systems in which they work can make it difficult to embed these essential features. However, the findings in this paper are clear that the evidence and learning pay-off is worth the effort.

Annex A: Learning about Collaborative Social Accountability Sustainability and Scale-up through Quick Feedback Cycles

The conceptual framework for scale-up with an eye towards sustainability and the relational rubric approach can apply across many projects and portfolios in the social accountability field. As explained in the main paper, a sample of 15 projects funded by the World Bank’s GPSA was used to test and illustrate the main arguments and findings. The World Bank’s decision to establish and fund a portfolio of locally tailored and led social accountability projects that collectively produce value for the field (see Box 7 below) is directly linked to the challenges and gaps in evidencing how social accountability can be sustained and scaled.

Box 7: The Creation of the GPSA and the World Bank’s Mandate

In June 2012, the World Bank’s Board established the GPSA to provide grants for CSO-led social accountability initiatives in partnership with governments, and to foster knowledge and learning about social accountability in different contexts. The imperative for scale, including as a mechanism to support sustainability, and the field’s limited knowledge about it were both important considerations.
The Board paper states: “There is a need for more robust evidence on whether and how social accountability approaches can be sustained, scaled up, and replicated in different sociopolitical settings, and how international partnerships can leverage beneficial change. Addressing these knowledge gaps requires learning by doing, rather than passive research.” The challenge set out by the Board was about learning across the GPSA’s portfolio, rather than from individual grants, and potentially connecting this learning with lessons from how scale happens across the broader social accountability field.

Source: World Bank data

According to the GPSA’s Theory of Action,25 the GPSA expects to demonstrate success when elements and lessons of small and experimental “collaborative social accountability processes inform public sector decisions and actions beyond individual GPSA projects” (GPSA, 2020). The term collaborative social accountability was used by the GPSA to distinguish different assumptions and forms of social accountability programming which exist in the field, acknowledging the diversity and its own comparative advantages. The specification of how this process might unfold emphasized the relational aspects of social accountability for short-, medium-, and long-term results, rather than the tools and capacities that had been central to monitoring, research, and evaluation. Those tools and capacities are only small parts of what social accountability practitioners deploying collaborative approaches do, or contribute, to the deliberation, compromise and collective action that may contribute to scale with an eye towards sustainability.

By 2014, the GPSA had awarded two rounds of grants. It convened 165 global partners, a diverse group of development agencies, international and national CSOs, private sector groups and government representatives from around the world, alongside World Bank staff.
The GPSA Secretariat invited them to reflect on the scale of their interventions, capturing the discussion in a GPSA learning note, which concluded:

“We discovered a range of on-the-ground meanings for scale. We also uncovered that there may be some common challenges and similar pathways that show promise. A big ‘Aha!’ moment: in practice, our colleagues’ experience with scale is rarely directly associated with replication” (Guerzovich and Poli, 2014).

25 For the distinction between theory of action and theory of change, see Tyrrel, 2019.

This insight was partly puzzling because, at the time, the replication of best practices for particular institutional forms across contexts26 was widely assumed to be desirable and widely promoted in donor documents and governance approaches, as was noted in the World Development Report of 2017. Stakeholders’ fundraising strategies were telling donors what they seemed to want to hear. At the same time, the widespread assumption co-existed awkwardly with the mantra that ‘context matters’. It also affected how similar interventions functioned in practice (Grandvoinnet, Aslam and Raha, 2015), especially as the field had not identified the circumstances under which the replication of specific interventions might be more favorable. For example, by 2014, the GPSA had received over 600 proposals for funding collaborative social accountability interventions. A systematic analysis of a sample revealed that only a few projects had a clear approach to scale.
Guerzovich and Poli (2014) illustrate how a typical CSO applying for GPSA funding articulated its assumptions in this regard: “[The CSO applying for GPSA funds proposed to] implement a pilot project in a range of local settings [it had] identified carefully and where [it would] work with local stakeholders to ensure adoption and implementation. Work in these areas of primary focus [would] help [the CSO] identify best practices that could be replicated elsewhere in the country. However, [the CSO] realize(d) that many of the key decisions about the process [it] care(d) about are made at the national level – i.e., not where [the CSO would be] working most of the time in [the proposed] project. Hence, [the CSO] would employ advocacy and awareness raising activities for national decision-makers taking advantage of the national networks the [CSO who applied] already belongs to. These networks [would] facilitate sharing of best practices and lessons learned to the wider national level audience and through the media for making a strong case for wider adoption of the model. The final phase of the knowledge and learning component of the project [would] focus on advocacy at the national level for country wide adoption of the model developed by the project. This process of wider dissemination and advocacy [would] contribute significantly to enhancing the knowledge base on local government dynamics, practices and intervention needs.” Yet, early on, as the GPSA Secretariat’s capacity building team27 began engaging and connecting civil society partners, World Bank teams and public officials, other approaches and entry points to growing impact began to look more promising and plausible. 
For example, in 2015, the formative evaluation of the Good Governance Practices for Dominican Republic project noted that public officials were advancing actions relevant to the project’s work (e.g., a transparency portal, the ‘Salir del Escondite’ campaign, among others) and this synergy could pay off in terms of project scale (Guerzovich, 2015). Similarly, a 2017 unpublished mid-term review of the Transparency and Accountability in Mongolia (TAME) project identified that investing in building synergies between a component of TAME and the World Bank-financed Education Quality Reform Project’s school grants component could offer a prospective pathway to scaling and sustaining insights from the TAME project. But to do so, CSO partners and World Bank teams had to listen to other perspectives, reflect on alternative scenarios available to them, and choose whether or not to compromise on their original vision for scale-up.

The GPSA team took note of these insights and began connecting the dots, using available resources to continue experimenting, gathering evidence, reflecting on additional knowledge from practice, and making adaptive course corrections. Moments to pause and reflect with grant project teams and annual grant partners’ meetings, among others, helped sharpen the focus of the GPSA’s approach to scale. The parameters of this emergent approach to scale were integrated into specific projects, subsequent calls for proposals and reporting templates (Poli and Guerzovich, 2020a).

26 This pathway to scale “assumes that technical experts who produce knowledge can determine the unique form of social accountability mechanisms that work and then use their authority and knowledge to promote cross-context convergence (i.e., scale up) towards those arrangements” (Guerzovich et al 2022).
27 On the GPSA’s capacity building approach during this period see Poli and Guerzovich, 2020.
These emergent insights about how project teams build a road towards scale-up prospectively and emergently in practice focused on function. This means “identifying the systems and functions which need to be in place in order to support an ongoing process of state-citizen interaction around a particular problem or problem area.” These insights diverged from some grant partners’ and evaluators’ expectations that the GPSA or a government would fund the continuity or expansion of a project’s exact form, i.e., a technology, a tool, a standard.28 Scale-up with an eye towards sustainability can be wholesale or partial, happening gradually or immediately. Sometimes our expectations of transformative, wholesale change mean that evaluators and practitioners fail to identify and document how more incremental change happens in practice over time (Guerzovich, 2022a). Scale-up can also be done effectively by others outside of government (e.g., CSOs/INGOs, donors, World Bank teams). In fact, grant partners later reflected that holding on to assumptions that CSOs would be funded on an ongoing basis to implement the same approaches in more sites, or that their advocacy would enable wholesale uptake by national authorities, could divert attention from more promising pathways to scale and result in missed opportunities for scale-up. Furthermore, such misplaced or unrealistic expectations often led to undue disappointment or a failure to celebrate successes that were happening in practice. This also perpetuated the misinformed narrative about social accountability failure regarding scale.
When the team reflected on the evaluation of Wahana Visi’s Citizen Voice and Action for Government Accountability and Improved Services: Maternal, Newborn, Infant and Child Health Services project in Indonesia, they confirmed that quick feedback loops were taking place at project and portfolio levels. Although this was insufficient to account for how scale happens in practice, the evaluation provided important insights to steer future GPSA evaluations towards an adaptive learning approach, as well as more generally for the GPSA’s monitoring and evaluation system for the whole program (see Box 8). At the same time, the Indonesian evaluation focused on the role of civil society and, thus, did not go far enough. While it found that advocacy was not charting the pathway to scale, it neither theorized nor looked into how processes that were unfolding beyond civil society were triggering actions that could have done so.

In Box 8 below, there is no straight line between civil society advocates and national authorities; rather, multidirectional flows connect each actor’s work. The flows connecting actors entail give and take, social learning and collective action within and across sites, rather than replication of best practices or the exercise of civil society’s countervailing power. For instance, World Bank teams and documentation had complementary information about their work with government counterparts and other development partners which helped to spread lessons even wider. These lessons seem to have found their way into more policy dialogues and programming decisions than the evaluation suggested, including inspiring national officials to incentivize other public officials, funders and civil society groups to consider and adapt insights from the Wahana Visi project into their own work (Poli and Guerzovich, 2019; GPSA, 2016; authors’ interviews with stakeholders and GPSA documentation).
28 On the distinction between form and function, see Integrity Action (2020).

Box 8: Using Evaluation to Advance Knowledge About the Effects of Social Accountability

The evaluation of the GPSA-sponsored project in Indonesia delivered by grant partner Wahana Visi was the first in the social accountability field to systematically identify how collaborative social accountability can shift power asymmetries and strengthen health systems. Figure 7 below synthesizes the findings (World Bank Group, 2007). It marked a breakthrough compared with traditional approaches to evaluating social accountability for at least three reasons:

1. It provided concrete evidence about the ways in which social accountability projects, which seemingly focused on producing more responsive service delivery, were producing systemic effects that are critical to support local systems strengthening and transformation after the end of a particular project. This viewpoint stands in contrast to widespread positions in the field that projects are limited to short-term effects, while long-term organic processes deliver long-term systemic transformations.29

2. It helped GPSA partners and other organizations to highlight the work social accountability interventions are already doing to strengthen systems and produce more concrete knowledge about these systemic, but often implicit, effects on state-society relationships. When research and evaluations sought wholesale normative transformations of power relationships as well as short-term production of transparency or other results, they omitted this important aspect of the work. It was unclear whether this was the result of an absence of evidence or evidence of absence.
Conversely, when research and evaluations began explicitly asking about systemic effects, they found this effect, which is valued by practitioners and communities alike. For example, J.B. Falisse and colleagues conducted independent evaluations of GPSA and non-GPSA projects in DRC. In the latter they reflected that: “The real thread running through achievements … is that the (collaborative social accountability) approach seems to allow the construction of a dialogue between parties that used to speak little (or not at all) to each other and an improvement in the relationship between the population, providers and the governmental side … There are two ways (not necessarily opposed) to consider this renewed dialogue: either as a means to achieve the achievements described below or as an end in itself … but let us emphasize here that dialogue is something that communities, providers, and authorities celebrate as an achievement in itself” (Falisse et al, nd; see also Falisse et al, 2019). As stakeholders in the field spotted the blind spot, prioritized learning about these effects and began to ask questions about them, the absence of evidence and the possibility to address it became clearer.30

3. The Wahana Visi evaluation developed a methodology to trace and make causal claims about the concrete mechanisms which connect the facilities on the frontline to other sites of decision-making, informing collective action and social learning that stakeholders use beyond the frontline. The initial insights about these connections had more in common with the mid-term reviews of the GPSA Dominican Republic and Mongolia projects than with the assumptions explored by the evaluation or those included in many of the funding proposals submitted to the GPSA.

Source: World Bank data

29 For a synthesis of these two positions, see Nelson et al, 2022.
30 For other evaluations that add to the body of evidence see Guerzovich, 2022.
Figure 7: The Creation of the GPSA and the World Bank’s Mandate
Source: Guerzovich 2022, adapted from Ball and Westhorp (2018)

Therefore, a theoretical gap still needed to be addressed by systematically exploring what seemed to be happening in Indonesia, Mongolia, and the Dominican Republic. Following the learning gained from the Wahana Visi evaluation, the GPSA team agreed with a series of grant partners to ask more direct questions about pathways to scale, encouraging evaluators to explicitly think prospectively about scale, looking beyond civil society’s direct action. Increasing investments in opening the black box could potentially help surface plausible connections between the GPSA’s project efforts and government actors and actions at different levels, as well as with the World Bank and other donors.

The investment in using case-based causal analysis to conduct evaluations and produce inferences that could travel (under certain conditions) to other cases paid off.31 Evaluation findings further validated the assumption that something important was missing in the mainstream alternative theories of change about social accountability scale-up, whether those grounded in replication of best practice, those that bet on CSOs’ adversarial countervailing power, or the hybrid reflected in applications for funding to the GPSA. The first evaluation of this set, for the My School project in Moldova (‘Scoala Mea’), explicitly asked whether the project had influenced policy through the World Bank Group’s Country Management Office’s dialogue with the government. The evaluation found that “the project provided information on World Bank operations in Moldova and the dialogue and strategies in the education sector to a relatively large extent” (Costachi et al, 2018).
Without asking and assessing this directly, this vital finding and evidence for contribution to the project’s scale (and therefore sustainability) might have gone unnoticed. The GPSA team then shared the Moldovan findings with independent evaluators of other projects. Some accepted the challenge to use this new lens to inform their own assessments. Theoretical and methodological cross-fertilization across the portfolio helped uncover how change was happening across projects as well as the localization of the pathway in specific contexts.

31 On the rigor and potential of this use of cases for causal analysis see World Bank Group, 2023.

For example, the GPSA-funded public sector finance and budgeting project led by CSO SEND-Ghana was assessed as unlikely to be replicated in full in more districts due to financial constraints and competing prioritizations. However, the final evaluation identified three processes that were the most likely to be used by the government (i.e., prospective sustainability): a multi-actor steering committee format to oversee citizen-public sector engagement on budgeting; an adapted spinoff of the project’s dashboard for citizen engagement on public sector budgeting;32 and the project’s learning and insights to help monitor other World Bank projects. It also identified the government and its parliament’s new expectation and demand for SEND-Ghana’s future technical inputs into these processes, all of which require give and take from those involved (Mills, 2019). Furthermore, in the DRC, the GPSA-funded health sector project led by Cordaid informed other development partners’ programming.
This was facilitated by local citizens who are redeploying collective action learned from the GPSA project towards other efforts to improve local health services delivery by working with local government and other relevant stakeholders, listening to multiple perspectives, and reaching adaptive compromises to move forward (World Bank Group, 2020).

The evaluations of the Moldova, Ghana and DRC projects were uncovering significant but previously ignored results. Collectively, these evaluations provided new, valuable data to specify how collaborative social accountability operated and which results should be prioritized to better understand and inform approaches for evidencing sustainability and scale-up. Consequently, emerging insights informed the next iteration of the GPSA’s Theory of Action and its monitoring, evaluation, and learning processes. At the same time, the GPSA also revamped its Results Framework outcomes and indicators and began developing fit-for-purpose MEL approaches that could help it to better understand how the results of individual projects ‘added up’ and contributed to wider impact and learning for the social accountability field. This was also necessary to test the assumptions in its Theory of Action and validate the logic of the expected short- to long-term results at the program level. This challenge is discussed in Box 9 below.

Encouragingly, these initial findings were further validated by subsequent GPSA project evaluations. For example, the final evaluation of the Improved Social Accountability for Bettering Preschool Quality in Georgia project made a positive assessment of its interventions’ sustainability.
Tangible results were a combination of strong ownership (including by stakeholders in municipalities); direct references in the Government’s draft Education Strategy; the transfer of learning to a new World Bank operation; and the fact that civil society partners now had a seat at the table of the country’s education governance.33

32 The Dashboard was an ICT interactive social accountability platform for citizens to report or raise concerns with state actors at the district and national levels. The intention was to enable citizens to give the government feedback in real time.
33 “From the perspective of social accountability, the intangible results of strengthening relationships, experiences gained in collaborative action, and improved agency on the part of project beneficiaries and local stakeholders, are likely to continue after the project completion.” (Ecorys, 2020)

Box 9: Greater Than the Sum of Its Parts? The Challenge of Evidencing Impact at the Portfolio Level

The GPSA’s grantmaking instrument funds individual projects for civil society organizations to lead responses to local public service delivery and policy problems that may benefit from collaborative problem-solving amongst citizens, civil society, and the public sector. Between 2012 and 2022, the GPSA invested US$50 million (and over $6 million in parallel funding) in a portfolio of 51 grants for CSO-led collaborative social accountability initiatives in 34 countries and a range of public service sectors such as health, education, governance, social protection, water, agriculture, and public finance. As per its Theory of Action, “The nature of the GPSA’s grant-making is to make small experimental investments with the potential for scale-up and sustainability.
When elements and lessons of collaborative social accountability processes inform public sector decisions and actions beyond individual GPSA projects, the GPSA demonstrates success.” A portfolio-based approach like this one also acts as a ripe platform for strategic learning for action across these complementary experimental interventions.

The GPSA was set up with portfolio-level ambitions, namely contributing to field learning about collaborative social accountability. However, as all projects are also localized to fit their unique context and are relatively small-scale in nature, it can be challenging to compare and aggregate evidence at this higher level, to understand whether the combined efforts of all the projects contribute to something that is greater than the sum of their parts. Comparing diverse results and aggregating them in a fit-for-purpose and meaningful way is difficult, especially considering all the diverse forms that scale and sustainability can take, as well as mixed results.

This challenge is not unique to the GPSA: many funders and organizations in the transparency, participation and accountability sector (and other development fields) have long been grappling with it. As more organizations focus on complex, system-level transformations, more civil society organizations and funders are seeking ways to close this evidence and evidencing gap. The GPSA has been tackling this over the past few years with several connected efforts, including a revised Theory of Action and Results Framework of 2020 and a move towards a harmonized Monitoring, Evaluation, Reporting and Learning (MERL) System, with specific indicators and methods that projects can use to feed in their results and learning, enabling portfolio-level analysis, including additive effects in terms of the development of the system.
This includes the approach discussed in this note about evidencing scale, sustainability, and uptake of collaborative social accountability processes.

Source: Own elaboration

Despite these advances in learning and evidence about scale-up, the following key related questions remained (ways to address them are discussed in the main paper):

i) Could the insights from these and other GPSA projects produce lessons about a whole that was more than the sum of the parts (see Box 9 above)?
ii) How did this Theory of Action fit into a broader understanding of how change happens within collaborative social accountability processes?
iii) Is this emerging understanding also applicable to explain the results of the swath of other social accountability programming?34
iv) If so, could the GPSA be on track to produce and consolidate learning by doing that delivered on the World Bank’s Board mandate to contribute towards “more robust evidence on whether and how social accountability approaches can be sustained, scaled up, and replicated in different sociopolitical settings, and how international partnerships can leverage beneficial change”?

34 This paper explores how new data informed the GPSA’s work. Insights from GPSA team members and partners are cross-fertilized through working with other organizations and partners. Arguably, then, the GPSA’s modest investments in thought leadership, through writing as well as convening events, spilled over beyond its own programming. However, tracing that contribution is beyond the scope of this paper.
For examples of cross-fertilization beyond the GPSA po rtfo l i o , s e e (Ja c o bs t ein, 2 0 2 2 ; G uer zov ic h and G ondo, 2022; Guerzovich et al, 2017; Guerzovich, 2022c) 63 Sc a l i n g U p C o l l a b o r ativ e S o cial Acco u n tability in Co m plex Gover nance Sys tems : A Re l a t i o n a l A p p r o a ch fo r E v iden cin g S u stain ability 08 — References 64 Andrews, Matt; Pritchett, Lant; Woolcock, Michael. 2017. Looking like a state: The seduction of isomorphic mimicry. https://academic.oup.com/book/26994/chapter/196206819 Aston, Thomas. 2022. “Introducing a Resonance Pathway to Scale” published 4 January, 2022, accessed at https://thomasmtaston.medium.com/introducing-a-resonance-pathway-to-scale-6cacd5163cd8 Aston, Thomas. 2020. Rubrics as a harness for complexity. https://thomasmtaston.medium.com/rubrics-as-a- harness-for-complexity-6507b36f312e Aston, Thomas; Guerzovich, Florencia; and Wadeson, Alix. 2021. “Tales of triumph and disaster in the transparency, participation, and accountability sector.” published 26 August 2021 accessed at: https:// thomasmtaston.medium.com/tales-of-triumph-and-disaster-in-the-transparency-participation-and- accountability-sector-5f638261983c Aston, T.; & Zimmer Santos, G. (2022). Social Accountability and Service Delivery Effectiveness: What is the Evidence for the Role of Sanctions. GPSA Bovens, M.; Goodin, R. E., Schillemans; T., Bovens, M; Schillemans, T; and Goodin, R. E. 2014. Public Accountability. In The Oxford Handbook of Public Accountability. https://doi.org/10.1093/ oxfordhb/9780199641253.013.0012 Chingaipe, Henry; Thombozi, Joseph; Katundu, Enea and Bongololo, Grace. 2022. Evaluation of the GPSA Program Based on Projects in the Primary Education Sector in Malawi. Cloutier, Mathieu. 2021. Social Contracts in Sub-Saharan Africa: Concepts and Measurements. World Bank Group. Governance Global Practice. Policy Research Working Paper 9788 Collier, D. and Mahon, JE. 1993. 
“Conceptual stretching revisited: adapting categories in comparative analysis.” Am Polit Sci Rev 87(4):845–855.
Costachi, Ionela; Criste, Aliona; Terzi Barbarosie, Daniela. 2018. My School - Empowered Citizens Enhancing Accountability of the Education Reform and Quality of Education in Moldova (English). Washington, D.C.: World Bank Group, p.44. http://documents.worldbank.org/curated/en/744681621832061746/My-School-Empowered-Citizens-Enhancing-Accountability-of-the-Education-Reform-and-Quality-of-Education-in-Moldova
Doin, Guilherme Augusto; Dahmer, Jeferson; Schommer, Paula Chies; Spaniol, Enio Luiz. 2012. Mobilização Social E Coprodução Do Controle: O Que Sinalizam Os Processos De Construção Da Lei Da Ficha Limpa E Da Rede Observatório Social Do Brasil De Controle Social
Ecorys. 2020. Reinforcing Social Accountability in Health Services in Sud Kivu and Kongo Central Provinces: Final Evaluation of the GPSA-CODESA Project (English). Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/883991607354429777/Final-Evaluation-of-the-GPSA-CODESA-Project
E-Pact Consortium. 2016. Empowerment and Accountability Annual Technical Report 2016: What Works for Social Accountability, Macro Evaluation of DFID’s Policy Frame for Empowerment and Accountability. Oxford: e-PACT.
Falleti, T. G., & Mahoney, J. 2015. “The comparative sequential method.” In J. Mahoney, & K. Thelen (Eds.), Advances in Comparative-Historical Analysis (Strategies for Social Inquiry) (1st ed., pp. 211-239). Cambridge University Press.
Falisse, Jean-Benoît; Mulongo, Philémon; and Koko Kirusha, Janvier. n.d. “The Voice and Citizen Action (CVA) Approach of World Vision DRC Meta-Evaluation (2013-2020).” World Vision International.
Falisse, J., Mafuta, E., & Mulongo, P. 2019. Reinforcing Social Accountability in Health Services in Sud Kivu and Kongo Central Provinces. Final Evaluation of the GPSA/CODESA Project.
Fox, Jonathan. 2014.
Social Accountability: What does the evidence really say? GPSA Working Paper No. 1, Washington, DC: International Bank for Reconstruction and Development.
Global Partnership for Social Accountability. 2022. Scaling Social Accountability to Create Lasting Change. https://thegpsa.org/sessions/scaling-social-accountability-to-create-lasting-change/
Global Partnership for Social Accountability. 2021a. How to make mid-level theory more useful for social accountability that contributes to building back better? https://vimeo.com/558157200
Global Partnership for Social Accountability. 2021. Global Partners Forum. Social Accountability for a Strong COVID-19 Recovery. https://vimeo.com/564191691
Global Partnership for Social Accountability. 2020. Theory of Action. https://documents1.worldbank.org/curated/en/425301607358292998/pdf/The-Global-Partnership-for-Social-Accountability-Theory-of-Action.pdf
Global Partnership for Social Accountability. 2019a. Global Partners Forum. Social Accountability and the Challenge of Inclusion. https://vimeo.com/451986935?signup=true
Global Partnership for Social Accountability. 2019. Improving Transparency and Performance of the Conditional Cash Transfer Program. GPSA project. https://thegpsa.org/projects/improving-transparency-and-performance-of-the-conditional-cash-transfer-program/
Global Partnership for Social Accountability. 2018. Improving Social Accountability in the Water Sector Through the Development of Quality Standards and Citizen Participation in Monitoring in Tajikistan. GPSA Project. https://thegpsa.org/projects/improving-social-accountability-in-the-water-sector-through-the-development-of-quality-standards-and-citizen-participation-in-monitoring-in-tajikistan/
Grandvoinnet, Helene; Aslam, Ghazia; Raha, Shomikho. 2015.
Opening the Black Box: The Contextual Drivers of Social Accountability. https://elibrary.worldbank.org/doi/abs/10.1596/978-1-4648-0481-6
Guerzovich, Florencia. 2023a. “Does the whole add more than the SUM of its parts?” https://medium.com/@florcig/does-the-whole-add-more-than-the-sum-of-its-parts-8b9eb352bb67
Guerzovich, Florencia. 2023. “Adaptive Management Across Project Cycles: Look into Coherence in Time” https://medium.com/@florcig/adaptive-management-across-project-cycles-look-into-coherence-in-time-ab99caa3a9e5
Guerzovich, Florencia. 2022. Systems-Aware Social Accountability (SASA): Supporting the Whole to Be Greater than the Sum of Its Parts. Pact. Washington, DC.
Guerzovich, Florencia. 2022b. “How Context Shapes Pathways to Scale in Social Accountability”. https://medium.com/@florcig/how-context-shapes-pathways-to-scale-in-social-accountability-post-4-of-5-d417cfe2b4f5
Guerzovich, Florencia. 2022a. “Scale up In Time: Revisiting How we Evidence Process and Context.” https://medium.com/@florcig/scale-up-in-time-revisiting-how-we-evidence-process-context-6c53f82a1817
Guerzovich, Florencia. 2022c. Literature Review towards a WV theory on Social Accountability as a Driver of Sustainable Child Well-Being. World Vision International.
Guerzovich, Florencia. 2021c. “Pathways to scale in social accountability.” Published December 21. 2023. https://medium.com/@florcig/pathways-to-scale-in-social-accountability-post-1-of-5-40e5ff51a053
Guerzovich, Florencia. 2021b. “Learning from consortia and portfolios: From cacophony to symphony” published August 31, 2021, accessed at: https://medium.com/@florcig/learning-from-consortia-and-portfolios-from-cacophony-to-symphony-ab0c8ddedaff
Guerzovich, Florencia. 2015.
Evaluación Formativa Proyecto Prácticas de Buen Gobierno en República Dominicana (Proyecto Vigilantes) https://documents1.worldbank.org/curated/en/900101607357369855/pdf/Evaluacion-Formativa-Proyecto-Practicas-de-Buen-Gobierno-en-Republica-Dominicana-Proyecto-Vigilantes.pdf
Guerzovich, Maria F. and Aston, Tom. 2023. “Social Accountability 3.0: Engaging Citizens to Increase Systemic Responsiveness” (July 19, 2023). Available at SSRN: https://ssrn.com/abstract=4606929
Guerzovich, Maria F; Aston, Tom; Levy, Brian; Chies Schommer, Paula; Haines, Rebecca; Cant, Sue; Faria Zimmer Santos, Grazielli. 2022. How do we shape and navigate pathways to social accountability scale? Introducing a middle-level Theory of Change, CEDIL Research Project Paper 1. Centre of Excellence for Development Impact and Learning (CEDIL), London and Oxford. https://policycommons.net/artifacts/3533985/cedil-research-project-paper-1/4335198/
Guerzovich, Florencia and Gondo, Rachel. 2022. “Social Accountability Practitioners as System Conveners.” https://Medium.Com/@florcig/Social-Accountability-Practitioners-as-System-Conveners-33b77c8a4778
Guerzovich, Florencia; Yeukai Mukorombindo; and Elsie Eyakuze. 2017. “Beyond Fundamentals: Learning About Social Accountability Monitoring Capacities and Action in Southern Africa.” PSAM. Grahamstown.
Guerzovich, Florencia; Poli, Maria. 2020. How Social Accountability Strengthens Cross-sector Initiatives to Deliver Quality Health Service? GPSA Note 17 Washington, D.C. World Bank Group. https://documents1.worldbank.org/curated/en/600891606911830725/pdf/How-Social-Accountability-Strengthens-Cross-Sector-Initiatives-to-Deliver-Quality-Health-Services.pdf
Green, Duncan. 2017. Theories of Change for Promoting Empowerment and Accountability in Fragile and Conflict-Affected Settings. IDS Working Paper 499.
https://www.ids.ac.uk/publications/theories-of-change-for-promoting-empowerment-and-accountability-in-fragile-and-conflict-affected-settings/
Guerzovich, Maria Florencia; Poli, Maria. 2014. How are GPSA’s Partners Thinking About Scale and Trying to Achieve It (English). Global Partnership for Social Accountability (GPSA), Note No. 8 Washington, D.C. World Bank Group. http://documents.worldbank.org/curated/en/654161606890404650/How-are-GPSA-s-Partners-Thinking-About-Scale-and-Trying-to-Achieve-It
GPSA. 2016. OGP Summit Workshop, Coproducing Open Government Results: Insights from the Global Partnership for Social Accountability. https://thegpsa.org/event/ogp-summit-workshop-co-producing-open-government-results-insights-from-the-global-partnership-for-social-accountability
Haldrup, Soren Vester. 2020. “Measuring Systems Transformation: Towards a Preliminary Framework”. UNDP Strategic Innovation. Accessed at https://medium.com/@undp.innovation/measuring-systems-transformation-towards-a-preliminary-framework-958ad3444949
IEG. 2017. Rethinking Evaluation. Reflections from Caroline Heider. https://ieg.worldbankgroup.org/sites/default/files/Data/RethinkingEvaluation.pdf
Integrity Action. 2020. “Citizen-Centred Accountability: How Can We Make it Last?” Briefing Note. October 2020. Accessed at: https://integrityaction.org/media/16127/integrity-action-sustainability-research-briefing-note_.pdf
Jacobstein, 2020. What is the Work? https://usaidlearninglab.org/community/blog/what-work
Jacobstein, David. 2019. Market Systems Insights for DRG - Success as a Dynamic System. https://usaidlearninglab.org/community/blog/market-systems-insights-drg-success-dynamic-system
Jespersen, Ann-Sofie. 2022. Civil Society Actions Push Reforms in Mauritanian Schools. https://www.globalpartnership.org/blog/civil-society-actions-push-reforms-mauritanian-schools
Kania, John; Kramer, Mark; and Senge, Peter. 2018.
The Water of Systems Change. FSG.
KOMPAK (Kolaborasi Masyarakat dan Pelayanan untuk Kesejahteraan). 2018. “KOMPAK Program Logic and Ways of Working 2018–2022.” Jakarta, Indonesia: KOMPAK. https://www.dfat.gov.au/sites/default/files/indonesia-kompak-program-logic-and-ways-of-working-2018-2022.pdf
Lekweiry, Mohamedou Ould and Falisse, Jean-Benoit. 2022. Final Evaluation of the Transparency of the Mauritanian Education Budget (TOME) Project (English). Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/352401647610255802/Final-Evaluation-of-the-Transparency-of-the-Mauritanian-Education-Budget-TOME-Project
Meadwell, Hudson. 2022. Endogeneity and qualitative political analysis: Debates about method or debates about ontology? https://journals.sagepub.com/doi/full/10.1177/05390184221138493
Meyanathan, Saha. 2021. Catalysts for Change: Parent-Teacher Association in Mongolian Schools. Final Evaluation of the Transparency and Accountability in Mongolian Education Project.
Mills, Linnea Cecilia. 2019. Making the Budget Work for Ghana: Final Evaluation (English). Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/418131607356580690/Making-the-Budget-Work-for-Ghana-Final-Evaluation
Nelson, E., Waiswa, P., Coelho, V. S., & Sarriot, E. 2022. Social accountability and health systems’ change, beyond the shock of Covid-19: drawing on histories of technical and activist approaches to rethink a shared code of practice. International Journal for Equity in Health, 21(S1), 41. https://doi.org/10.1186/s12939-022-01645-0
OECD. 2021. “Using the evaluation criteria in practice”, in Applying Evaluation Criteria Thoughtfully. OECD Publishing, Paris, https://www.oecd-ilibrary.org/development/applying-evaluation-criteria-thoughtfully_d1aca6d0-en
OECD DAC. 2019.
Evaluation Criteria https://www.oecd.org/dac/evaluation/daccriteriaforevaluatingdevelopmentassistance.htm
Ostrom E. 1990. Governing the commons: The evolution of institutions for collective action. Cambridge, UK: Cambridge University Press.
Patton, Michael Quinn. 2020. Evaluation Criteria for Evaluating Transformation: Implications for the Coronavirus Pandemic and the Global Climate Emergency. American Journal of Evaluation. Volume 42. Issue 1. https://doi.org/10.1177/109821402093368
Pierson, P. 2004. Politics in Time: History, Institutions, and Social Analysis.
Poli, Maria; Guerzovich, Maria Florencia. 2020a. Integrating Adaptive Learning in Grant-Making: The Case of the GPSA (English). Global Partnership for Social Accountability (GPSA), Note No.16 Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/116071606910702575/Integrating-Adaptive-Learning-in-Grant-Making-The-Case-of-the-GPSA
Poli, Maria; and Guerzovich, Maria Florencia. 2020. Capacity and Implementation Support Area: Portfolio Performance Review (English). Global Partnership for Social Accountability (GPSA), Note No.15 Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/893741606909911810/Capacity-and-Implementation-Support-Area-Portfolio-Performance-Review
Poli, Maria; Guerzovich, Maria Florencia. 2019. How Social Accountability Strengthens Cross-Sector Initiatives to Deliver Quality Health Services (English). Global Partnership for Social Accountability (GPSA), Note No.17 Washington, D.C.: World Bank Group.
Tyrrel, Lavinia. 2019. Theory of Change and Theory of Action: What’s the difference and why does it matter? https://abtgovernance.com/2019/07/19/theory-of-change-and-theory-of-action-whats-the-difference-and-why-does-it-matter/
Wadeson, Alix; and Guerzovich, Florencia. 2023. Monitoring, Evaluation, Reporting and Learning Guide for GPSA Grant Partners and Consultants. World Bank, Washington, DC.
Wadeson, Alix. 2022.
Internal World Bank Document.
Wadeson, Alix. 2021. “Orchestrating a MEL system for portfolios and programs: what we’re testing now.” https://medium.com/@alixsara/orchestrating-a-mel-system-for-portfolios-and-programs-what-were-testing-now-4ca76210c2b7
Wadeson, Alix; Monzani, Bernardo; Aston, Tom. 2020. Process Tracing as a Practical Evaluation Method: Comparative Learning from Six Evaluations. https://mande.co.uk/wp-content/uploads/2020/03/Process-Tracing-as-a-Practical-Evaluation-Method_23March-Final.pdf
Wenger-Trayner, Beverly. 2014. What is Social Learning? https://www.wenger-trayner.com/what-is-social-learning/#:~:text=Social%20learning%20in%20the%20way,in%20something%20they%20care%20about
Wenger-Trayner, E; and Wenger-Trayner B. 2021. Systems convening: a crucial form of leadership for the 21st century. Social Learning Lab.
World Bank Group. 2023. The Rigor of Case-Based Causal Analysis Busting Myths through a Demonstration. The Rigor of Case-Based Causal Analysis (worldbankgroup.org)
World Bank Group. 2020. Reinforcing Social Accountability in Health Services in Sud Kivu and Kongo Central Provinces: Final Evaluation of the GPSA-CODESA Project (English). Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/883991607354429777/Final-Evaluation-of-the-GPSA-CODESA-Project
World Bank Group. 2017. World Development Report 2017: Governance and the Law. Washington, DC: World Bank. © World Bank. https://openknowledge.worldbank.org/handle/10986/25880 License: CC BY 3.0 IGO.
World Bank Group. 2007. Citizen Voice and Action for Government Accountability and Improved Services: Maternal, Newborn, Infant and Child Health Services: Final Evaluation Report (English). Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/331651607355519716/Final-Evaluation-Report