Note No. 68                                                                                                  July 2001

Social Assessments and Program Evaluation with Limited Formal Data: Thinking Quantitatively, Acting Qualitatively

Michael Woolcock is the author of this note. He is a social scientist in the Development Research Group. The views expressed in this note are those of the author(s) and do not necessarily reflect the official policies of the World Bank.

This note revisits the long-standing tension between qualitative and quantitative approaches to poverty analysis, with reference to social assessments and program evaluation. It presents a summary of recent work in St. Lucia and Colombia, where innovative efforts were made to integrate the guiding principles of quantitative approaches with the practice of qualitative approaches. While neither case should be seen as ideal or as a substitute for more comprehensive analysis, together they present a series of strategies for generating meaningful and useful results in environments where, for any number of reasons, formal data is weak or absent. Such environments, of course, are all too common in low-income countries.

The first case, a social assessment of poverty, comes from St. Lucia. The task manager had funds sufficient to cover key informant and focus group interviews in sixteen communities around the island. Given this small number, he elected not to work with a "random sample" as such but rather to maximize coverage across as many key variables as possible (rural/urban, access to clean water, distance to main road, level of poverty, etc.).[1] Our St. Lucia-based colleagues happened to have access to a 1990 census, but it did not contain data on the full set of variables that would have enabled us to generate a final sample meeting all our criteria. Moreover, program field staff, who would be charged with conducting the social assessment, were understandably skeptical about studying communities coldly selected by outsiders on the basis of a dated census when they themselves were intimately familiar with their small country. Yet to turn the entire process of selecting 16 communities from a total sample of more than 400 over to field staff would have been cumbersome, controversial, and time-consuming.

A compromise strategy entailed using the census data to make the first round of cuts in the total sample. The first step was to identify the poorest 200 communities: of the total sample, income data was available for 360 communities, and by ranking them on the basis of average household income we were able to identify the poorest 200.[2] The census also contained data on the number of households in each community receiving particular forms of water delivery and sewerage (public/private pipe, well, etc.), enabling a "quality of basic services" index to be constructed, scored on a 1 (low) to 7 (high) scale. The 200 poorest communities could therefore be ranked according to their quality of basic services. Finally, using geographical data, we were able to measure the distance of all 200 communities from the main ring road that circumnavigates St. Lucia. Dividing the sample in half on the basis of this distance measure, we labeled those close to the road "urban" and those far from the road "rural". We were thus able to construct a simple 2x2 matrix, with 'Quality of Basic Services' (high/low) on one axis and 'Rural/Urban' on the other.
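The first-round cuts described above amount to a simple stratification exercise. The sketch below illustrates the logic in Python with an entirely synthetic census table; the field names (avg_household_income, services_score, distance_to_ring_road_km) are invented for illustration rather than taken from the actual 1990 census, and the median splits are only an approximation of the ranking procedure used in St. Lucia.

    import numpy as np
    import pandas as pd

    # Synthetic stand-in for the census extract: one row per community.
    rng = np.random.default_rng(0)
    census = pd.DataFrame({
        "community": [f"community_{i}" for i in range(400)],
        "avg_household_income": rng.normal(10_000, 3_000, 400),
        "services_score": rng.uniform(1, 7, 400),            # index from 1 (low) to 7 (high)
        "distance_to_ring_road_km": rng.uniform(0, 15, 400),
    })

    # First cut: the 200 poorest communities by average household income.
    poorest = census.nsmallest(200, "avg_household_income").copy()

    # Second cut: split the quality-of-basic-services index into "low" and "high".
    poorest["services"] = np.where(
        poorest["services_score"] > poorest["services_score"].median(), "high", "low")

    # Third cut: split distance to the ring road into "urban" (near) and "rural" (far).
    poorest["location"] = np.where(
        poorest["distance_to_ring_road_km"] > poorest["distance_to_ring_road_km"].median(),
        "rural", "urban")

    # The resulting 2x2 sampling frame, with roughly 50 communities per cell.
    print(poorest.groupby(["services", "location"]).size())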
St. Lucia's 200 poorest communities now fell neatly onto these axes, with 50 communities in each cell.

This was followed up the next day by a four-hour session in which field staff selected the final 16 communities. Twenty field staff gathered for this meeting and, after a brief presentation on the task at hand and the steps already taken with the census data, were divided into four groups. Each group was given the names of the 50 poor communities from one of the 2x2 cells above, and was then asked to select five communities from this list that varied according to (a) exposure to the recent hurricane, (b) major forms of employment, and (c) whether or not they had participated in the initial round of the St Lucia Social Development Program. After two hours, the four groups reconvened with the names of their five communities, and over the final hour all field staff negotiated together to whittle the list of twenty names down to 16, ensuring (d) adequate regional coverage and (e) that no two communities were contiguous across regional boundaries. At the end of four hours, the group emerged with a list of sixteen communities that maximized the variance according to the eight different criteria listed above.

Reliance on quantitative or qualitative methods alone could never have achieved this result: the formal data was limited and dated but nonetheless still useful, while relying exclusively on local experts would have been unrealistic and invalid. Combining the best aspects of both, however, produced a selected sample with maximum diversity and validity, and full local ownership.
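Although the whittling itself was a negotiated, face-to-face exercise, the agreed shortlist can still be checked mechanically against the selection criteria. The sketch below is purely illustrative: the community names, regions, cells, and adjacency pairs are invented, and the checks simply flag any violations of criteria (d) and (e) for the group to resolve.

    import pandas as pd

    # Invented shortlist: the 16 negotiated communities with their region and 2x2 cell.
    final16 = pd.DataFrame({
        "community": [f"c{i}" for i in range(16)],
        "region":    ["north", "south", "east", "west"] * 4,
        "cell":      ["urban-high", "urban-low", "rural-high", "rural-low"] * 4,
    })

    # Invented adjacency data: pairs of communities that share a boundary.
    adjacent_pairs = {("c0", "c5"), ("c3", "c20")}

    # (d) adequate coverage: every region and every cell should be represented.
    print("regions covered:", sorted(final16["region"].unique()))
    print("cells covered:  ", sorted(final16["cell"].unique()))

    # (e) no two selected communities should be contiguous.
    selected = set(final16["community"])
    violations = [pair for pair in adjacent_pairs if set(pair) <= selected]
    print("contiguous pairs still on the list:", violations)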
The second case, an evaluation of an urban community development program, comes from Bogota, Colombia. This program was being implemented by the Bogota city council[3] and had been in place for some three years. In essence, the program entailed working with recent immigrants to the barrios in the outer districts of the city, in the hope that (a) by incorporating the immigrants' ideas, time, and resources directly into the design and implementation of local infrastructure projects, the projects would be carried out more effectively and efficiently, and that this would thereby (b) help to assimilate the new arrivals, encouraging them to take on the identity and responsibilities of citizenship in the city. In a country long beset by violence and accompanying waves of rural-urban migration, the importance of accomplishing both goals cannot be overstated.

The managers of the program were confident that both goals were indeed being accomplished, but they were conscious that, sooner or later, they would need evidence to persuade both the current administration and any subsequent one that the program was performing as advertised, and was thus a deserving recipient of continued funding. The program managers sought advice on how to proceed, but upon arrival in Bogota, routine preliminary discussions confirmed that they had collected no baseline data on program participants (let alone non-participants), and that no census-type data was available, certainly not in the new areas where recent migrants were settling. Stressing from the outset that the point of conducting evaluations was not to provide "evidence" to save even the most well-intentioned programs, I agreed to stay on and advise them on how one might proceed.

I began by outlining the fundamentals of program evaluation, explaining its task of disentangling program from non-program effects. Is a village microcredit project raising the income of the poor, or is this outcome the result of a recent increase in remittances from relatives living in the city? There are several formal ways to find out. The classical (and ideal) approach is to allocate programs randomly from the outset (as in medical trials), thereby enabling the evaluator to state with confidence that observed outcome differences between participants and non-participants are indeed attributable to the program. If random assignment is not administratively or ethically possible (as is often the case in development), a "double differences" approach can be taken, in which the evaluator subtracts the difference between baseline and outcome scores for non-participants (the second difference) from the difference between baseline and outcome scores for participants (the first difference). If baseline data is not available--as is also common--but some other independent source of data is (such as a census or other large household survey), then the evaluator may use a "matched sampling" strategy, in which both the program and independent datasets are used to identify samples of participants and non-participants who share otherwise similar demographic characteristics.[4] Recent work deploying these strategies[5] has greatly helped to improve the reliability and validity of program evaluation at the World Bank and elsewhere.
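To make the double-difference logic concrete, consider a purely invented numerical example:

    program effect = (participants' outcome - participants' baseline)
                     - (non-participants' outcome - non-participants' baseline)

If participants' average income rises from 100 to 140 between the baseline and follow-up surveys (a first difference of 40), while comparable non-participants' average rises from 100 to 125 over the same period (a second difference of 25), the estimated program effect is 40 - 25 = 15, since the 25-point gain that occurred even without the program is netted out.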
What if neither baseline nor sufficient independent data is available, as was the case here? This too is common, especially for projects being conducted in countries with weak administrative infrastructure and/or by small organizations or government departments that lack the necessary human and financial resources. In these circumstances a realization of the importance of evaluation may come only after the program has been running for several years. Can something be done to generate some reasonable answers for program managers and sponsors?

A common response in these circumstances is to conduct either (a) a PRA/RRA-type exercise, in which program beneficiaries are asked to document diagrammatically their family's and their community's economic circumstances before and after participation in the project, or (b) some other form of "participatory" assessment,[6] in which key informants are identified and focus groups assembled for semi-structured interviews regarding participants' perceptions of the program's efficacy. These approaches have their place, and can certainly yield useful insights, but they have rightly been criticized for providing non-representative information, on which it is hard to base decisions regarding the fate of even the most humble program.

The one feature of this program that I felt could be the basis for the beginnings of an "evaluation" was the presence of a large number (150) of experienced field staff, whose detailed knowledge of the city constituted an untapped reservoir of information. The program's senior management indicated that they would encourage field staff to be involved in this exercise, on the grounds that it was giving them additional skills while simultaneously helping to generate feedback on the program. In a short address to the staff, I laid out the basic problem at hand and requested their cooperation in the following strategy, developed after consultation with program managers:

Senior program managers recruit 24 staff members to do the "evaluation" and pair them into 12 teams. The managers should prepare a master list of the names and locations of all the communities in which the program is being conducted.

At a meeting of all selected field staff, each of these communities is allocated to one of four categories, broadly distinguished on the basis of the length of time the program has been in effect (new, old) and whether, in their estimation, it is going well or not (strong, weak).[7] One community from each of the four categories is then selected at random, but assigned to the team that feels it knows it best. Forty-eight different communities (12 teams x 4 categories) are thus selected.

Each team is to spend at least a day in each community, conducting key informant and (if possible) focus group interviews, asking open-ended questions regarding program processes and outcomes. The team should also construct a basic "community profile" on the basis of its observations and interactions, describing various demographic features -- size, quality of housing, major sources of employment, levels of unemployment, age distribution, access to markets, quality of infrastructure (roads, electricity), stratification (0-6),[8] presence of other development programs, etc. The team's prior familiarity with the community should expedite this.

While the teams are in the field, the principal investigator prepares a draft of a short basic survey instrument that will be used to obtain quantitative data on selected outcome variables (based on the program's stated objectives and on discussions with program staff).

All teams report back for a joint day-long meeting. Each team will have prepared (a) a "community profile" for each of the four communities visited, and (b) a summary of the responses provided to the open-ended questions. The 12 teams should be divided into four groups of three teams, each hearing reports on 12 communities. The goal of the presentations should be to (a) clarify and confirm the findings from the field regarding the 48 community profiles, and (b) provide feedback to the principal investigator on the validity and appropriateness of the draft survey instrument.

The crucial next stage involves going back to the four groups and their 12 community profiles. Using these profiles, and each member's detailed knowledge of Bogota, the groups are then asked to select 12 communities elsewhere in Bogota that are as similar as possible demographically to those they have profiled, except that they have not received the program. Having selected their 12 "matching communities", each group's choices would be ratified by the other groups in a joint final session. At the end of the day, then, 48 comparable non-program communities will have been identified.

On the basis of feedback from the meeting, the principal investigator modifies the survey instrument. Depending on available resources, the survey is then administered to a number (10-100) of randomly selected households in the 48 program and 48 non-program communities. Responses on present and past behavior on key outcome variables are thereby obtained, generating the data needed for a basic "double difference" evaluation. In short, this approach uses qualitative methods to obtain the "matched samples" needed to conduct a quantitative "double difference" evaluation.
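As a rough sketch of that final step, the code below computes the double difference from the kind of household survey the protocol would produce. Everything here is hypothetical: the column names, the numbers, and the use of respondents' recalled ("past behavior") values as the stand-in for the missing baseline. A real analysis would also report standard errors and probe the recalled data for bias.

    import pandas as pd

    # Hypothetical survey extract: one row per household interviewed in the 48 program
    # and 48 matched non-program communities (column names are illustrative only).
    survey = pd.DataFrame({
        "group":            ["program"] * 4 + ["comparison"] * 4,
        "outcome_recalled": [100, 90, 110, 95, 100, 105, 95, 100],     # recalled pre-program value
        "outcome_now":      [140, 130, 150, 135, 120, 125, 115, 120],  # value reported today
    })

    # Mean recalled ("baseline") and current outcomes for each group of communities.
    means = survey.groupby("group")[["outcome_recalled", "outcome_now"]].mean()

    # First difference: change over time in program communities.
    first_diff = means.loc["program", "outcome_now"] - means.loc["program", "outcome_recalled"]

    # Second difference: change over time in the matched non-program communities.
    second_diff = means.loc["comparison", "outcome_now"] - means.loc["comparison", "outcome_recalled"]

    # Double difference: the program effect net of changes that would have happened anyway.
    print("estimated program effect:", first_diff - second_diff)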
A last-minute change in the directorship of the program, and competing commitments in the months leading up to the city elections, meant that this evaluation protocol never got beyond the formative stages. It nonetheless shows one strategy for using qualitative methods to (a) construct the missing quantitative data needed to ascertain a program's effectiveness, and (b) systematically generate useful insights in their own right. It can be adapted for other environments where formal data is missing.

Both of these cases demonstrate innovative ways to integrate different methodologies in development studies. They are based on the relatively simple principle of "thinking quantitatively, acting qualitatively." This entails using the people most familiar with the program's communities (program field staff and community workers) to provide the type of information, albeit approximate, that would otherwise come from a census or other formal independent data source. The qualitative material is instructive in its own right because it has been collected systematically, thereby enabling more valid generalizations[9] to be drawn from it. Working directly with field staff also helps to build local capacity and to foster their involvement in, and commitment to, the evaluation process.

Vitally important poverty alleviation work is being undertaken in countries around the world where formal data simply is not available, nor is it likely to be anytime soon. Rather than ignoring these situations and the important lessons they hold, this note argues that the logic underlying more formal quantitative program evaluation strategies can be fruitfully applied using qualitative approaches, yielding insight and ownership that neither approach would attain on its own.
Endnotes

[1] Maximizing the diversity of a sample in order to search for commonalities--the "method of agreement", as opposed to the "method of difference"--is a common approach in inductive research, where the goal is to generate propositions rather than test hypotheses. See Charles Ragin (1987) The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.

[2] Census data was missing for several of the smallest communities. Given that a community needed to have at least 15 households to generate enough people for interviews, selecting the "poorest 200" communities using this method was not considered to be unduly biased. Another assumption was that income and quality-of-services changes over the intervening decade had affected each community uniformly, with only minor impacts on the ranking.

[3] El Departamento Administrativo de Acción Comunal.

[4] A related approach to generating comparable groups entails the use of "propensity scores." Details on each of these formal approaches to program evaluation can be found in Judy L. Baker (2000) Evaluating the Impact of Development Projects on Poverty: A Handbook for Practitioners. Washington, DC: The World Bank.

[5] See Jyotsna Jalan and Martin Ravallion (1999) "Income gains to the poor from workfare: estimates for Argentina's Trabajar program." Policy Research Working Paper No. 2149. Washington, DC: The World Bank.

[6] On participatory approaches see Caroline Robb (1999) Can the Poor Influence Policy? Participatory Poverty Assessments in the Developing World. Washington, DC: The World Bank.

[7] If funds are available to allow a sufficiently large sample to be selected, then naturally it should be selected randomly. In this case, as in many others of this kind, funds were limited, permitting only a relatively small sample. It was deemed best to ensure that this small sample contained some structured degree of variation. Having such variation is also useful should the evaluation end prematurely (for financial, political, or administrative reasons), since it enables some preliminary conclusions to be salvaged.

[8] This is a standard classification used in Bogota to categorize communities by class/income differences.

[9] On managing various forms of validity in qualitative work, see Joseph Maxwell (1992) "Understanding and validity in qualitative research." Harvard Educational Review 62(3): 279-300.

"Social Development Notes" are published informally by the Social Development Family in the Environmentally and Socially Sustainable Development Network of the World Bank. For additional copies, contact Social Development Publications, World Bank, 1818 H Street, NW, MSN MC5-507, Washington, DC 20433, USA. Fax: 202-522-3247. E-mail: sdpublications@worldbank.org.

Printed on Recycled Paper