Note No. 68                                                                                                  July 2001

Social Assessments and Program Evaluation with Limited Formal Data: Thinking Quantitatively, Acting Qualitatively

Michael Woolcock is the author of this note. He is a social scientist in the Development Research Group. The views expressed in this note are those of the author(s) and do not necessarily reflect the official policies of the World Bank.

This note revisits the long-standing tension between qualitative and quantitative approaches to poverty analysis, with reference to social assessments and program evaluation. It presents a summary of recent work in St. Lucia and Colombia, where innovative efforts were made to integrate the guiding principles of quantitative approaches with the practice of qualitative approaches. While neither case should be seen as ideal or as a substitute for more comprehensive analysis, together they present a series of strategies for generating meaningful and useful results in environments where, for any number of reasons, formal data is weak or absent. Such environments, of course, are all too common in low-income countries.

The first case, a social assessment of poverty, comes from St. Lucia. The task manager had funds sufficient to cover key informant and focus group interviews in sixteen communities around the island. Given this small number, he elected not to work with a "random sample" as such but rather to maximize coverage across as many key variables as possible (rural/urban, access to clean water, distance to main road, level of poverty, etc.).[1] Our St. Lucia-based colleagues happened to have access to a 1990 census, but it did not contain data on the full set of variables that would have enabled us to generate a final sample meeting all our criteria. Moreover, program field staff, who would be charged with conducting the social assessment, were understandably skeptical about studying communities coldly selected by outsiders on the basis of a dated census when they themselves were intimately familiar with their small country. Yet to turn the entire process of selecting 16 communities from a total sample of more than 400 over to field staff would have been cumbersome, controversial, and time-consuming.

A compromise strategy entailed using the census data to make the first round of cuts in the total sample. The first step was to identify the poorest 200 communities: of the total sample, income data was available for 360 communities, and by ranking them on the basis of average household income we were able to identify the poorest 200.[2] The census also contained data on the number of households in each community receiving particular forms of water delivery and sewerage (public/private pipe, well, etc.), enabling a "quality of basic services" index to be constructed, scored on a 1 (low) to 7 (high) scale. The 200 poorest communities could therefore be ranked according to their quality of basic services. Finally, using geographical data, we were able to measure the distance of all 200 communities from the main ring road that circumnavigates St. Lucia. Dividing the sample in half on the basis of this distance measure, we labeled those close to the road "urban" and those far from the road "rural". We were thus able to construct a simple 2x2 matrix, with 'Quality of Basic Services' (high/low) on one axis and 'Rural/Urban' on the other.
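The first-round cuts described above amount to a simple stratification exercise. The sketch below illustrates the logic in Python with an entirely synthetic census table; the field names (avg_household_income, services_score, distance_to_ring_road_km) are invented for illustration rather than taken from the actual 1990 census, and the median splits are only an approximation of the ranking procedure used in St. Lucia.

    import numpy as np
    import pandas as pd

    # Synthetic stand-in for the census extract: one row per community.
    rng = np.random.default_rng(0)
    census = pd.DataFrame({
        "community": [f"community_{i}" for i in range(400)],
        "avg_household_income": rng.normal(10_000, 3_000, 400),
        "services_score": rng.uniform(1, 7, 400),            # index from 1 (low) to 7 (high)
        "distance_to_ring_road_km": rng.uniform(0, 15, 400),
    })

    # First cut: the 200 poorest communities by average household income.
    poorest = census.nsmallest(200, "avg_household_income").copy()

    # Second cut: split the quality-of-basic-services index into "low" and "high".
    poorest["services"] = np.where(
        poorest["services_score"] > poorest["services_score"].median(), "high", "low")

    # Third cut: split distance to the ring road into "urban" (near) and "rural" (far).
    poorest["location"] = np.where(
        poorest["distance_to_ring_road_km"] > poorest["distance_to_ring_road_km"].median(),
        "rural", "urban")

    # The resulting 2x2 sampling frame, with roughly 50 communities per cell.
    print(poorest.groupby(["services", "location"]).size())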
St. Lucia's 200 poorest communities now fell neatly onto these axes, with 50 communities in each cell.

This was followed up the next day by a four-hour session in which field staff selected the final 16 communities. Twenty field staff gathered for this meeting and, after a brief presentation on the task at hand and the steps already taken with the census data, were divided into four groups. Each group was given the names of the 50 poor communities from one of the 2x2 cells above, and was then asked to select five communities from this list that varied according to (a) exposure to the recent hurricane, (b) major forms of employment, and (c) whether or not they had participated in the initial round of the St Lucia Social Development Program. After two hours, the four groups reconvened with the names of their five communities, and over the final hour all field staff negotiated together to whittle the list of twenty names down to 16, ensuring (d) adequate regional coverage and (e) that no two communities were contiguous across regional boundaries. At the end of four hours, the group emerged with a list of sixteen communities that maximized the variance according to the eight different criteria listed above.

Reliance on quantitative or qualitative methods alone could never have achieved this result: the formal data was limited and dated but nonetheless still useful, while relying exclusively on local experts would have been unrealistic and invalid. Combining the best aspects of both, however, produced a selected sample with maximum diversity and validity, and full local ownership.
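Although the whittling itself was a negotiated, face-to-face exercise, the agreed shortlist can still be checked mechanically against the selection criteria. The sketch below is purely illustrative: the community names, regions, cells, and adjacency pairs are invented, and the checks simply flag any violations of criteria (d) and (e) for the group to resolve.

    import pandas as pd

    # Invented shortlist: the 16 negotiated communities with their region and 2x2 cell.
    final16 = pd.DataFrame({
        "community": [f"c{i}" for i in range(16)],
        "region":    ["north", "south", "east", "west"] * 4,
        "cell":      ["urban-high", "urban-low", "rural-high", "rural-low"] * 4,
    })

    # Invented adjacency data: pairs of communities that share a boundary.
    adjacent_pairs = {("c0", "c5"), ("c3", "c20")}

    # (d) adequate coverage: every region and every cell should be represented.
    print("regions covered:", sorted(final16["region"].unique()))
    print("cells covered:  ", sorted(final16["cell"].unique()))

    # (e) no two selected communities should be contiguous.
    selected = set(final16["community"])
    violations = [pair for pair in adjacent_pairs if set(pair) <= selected]
    print("contiguous pairs still on the list:", violations)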
The second case, an evaluation of an urban community development program, comes from Bogota, Colombia. This program was being implemented by the Bogota city council[3] and had been in place for some three years. In essence, the program entailed working with recent immigrants to the barrios in the outer districts of the city, in the hope that (a) by incorporating the immigrants' ideas, time, and resources directly into the design and implementation of local infrastructure projects, the projects would be carried out more effectively and efficiently, and that this would thereby (b) help to assimilate the new arrivals, encouraging them to take on the identity and responsibilities of citizenship in the city. In a country long beset by violence and accompanying waves of rural-urban migration, the importance of accomplishing both goals cannot be overstated.

The managers of the program were confident that both goals were indeed being accomplished, but they were conscious that, sooner or later, they would need evidence to persuade both the current administration and any subsequent one that the program was performing as advertised, and was thus a deserving recipient of continued funding. The program managers sought advice on how to proceed, but upon arrival in Bogota, routine preliminary discussions confirmed that they had collected no baseline data on program participants (let alone non-participants), and that no census-type data was available, certainly not in the new areas where recent migrants were settling. Stressing from the outset that the point of conducting evaluations was not to provide "evidence" to save even the most well-intentioned programs, I agreed to stay on and advise them on how one might proceed.

I began by outlining the fundamentals of program evaluation, explaining its task of disentangling program from non-program effects. Is a village microcredit project raising the income of the poor, or is this outcome the result of a recent increase in remittances from relatives living in the city? There are several formal ways to find out. The classical (and ideal) approach is to allocate programs randomly from the outset (as in medical trials), thereby enabling the evaluator to state with confidence that observed outcome differences between participants and non-participants are indeed attributable to the program. If random assignment is not administratively or ethically possible (as is often the case in development), a "double differences" approach can be taken, in which the evaluator subtracts the difference between baseline and outcome scores for non-participants (the second difference) from the difference between baseline and outcome scores for participants (the first difference). If baseline data is not available--as is also common--but some other independent source of data is (such as a census or other large household survey), then the evaluator may use a "matched sampling" strategy, in which both the program and independent datasets are used to identify samples of participants and non-participants who share otherwise similar demographic characteristics.[4] Recent work deploying these strategies[5] has greatly helped to improve the reliability and validity of program evaluation at the World Bank and elsewhere.
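To make the double-difference logic concrete, consider a purely invented numerical example:

    program effect = (participants' outcome - participants' baseline)
                     - (non-participants' outcome - non-participants' baseline)

If participants' average income rises from 100 to 140 between the baseline and follow-up surveys (a first difference of 40), while comparable non-participants' average rises from 100 to 125 over the same period (a second difference of 25), the estimated program effect is 40 - 25 = 15, since the 25-point gain that occurred even without the program is netted out.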
What if neither baseline nor sufficient independent data is available, as was the case here? This too is common, especially for projects being conducted in countries with weak administrative infrastructure and/or by small organizations or government departments that lack the necessary human and financial resources. In these circumstances a realization of the importance of evaluation may come only after the program has been running for several years. Can something be done to generate some reasonable answers for program managers and sponsors?

A common response in these circumstances is to conduct either (a) a PRA/RRA-type exercise, in which program beneficiaries are asked to document diagrammatically their family's and their community's economic circumstances before and after participation in the project, or (b) some other form of "participatory" assessment,[6] in which key informants are identified and focus groups assembled for semi-structured interviews regarding participants' perceptions of the program's efficacy. These approaches have their place, and can certainly yield useful insights, but they have rightly been criticized for providing non-representative information, on which it is hard to base decisions regarding the fate of even the most humble program.

The one feature of this program that I felt could be the basis for the beginnings of an "evaluation" was the presence of a large number (150) of experienced field staff, whose detailed knowledge of the city constituted an untapped reservoir of information. The program's senior management indicated that they would encourage field staff to be involved in this exercise, on the grounds that it was giving them additional skills while simultaneously helping to generate feedback on the program. In a short address to the staff, I laid out the basic problem at hand and requested their cooperation in the following strategy, developed after consultation with program managers:

Senior program managers recruit 24 staff members to do the "evaluation" and pair them into 12 teams. The managers should prepare a master list of the names and locations of all the communities in which the program is being conducted.

At a meeting of all selected field staff, each of these communities is allocated to one of four categories, broadly distinguished on the basis of the length of time the program has been in effect (new, old) and whether, in their estimation, it is going well or not (strong, weak).[7] One community from each of the four categories is then selected at random, but assigned to the team that feels it knows it best. Forty-eight different communities (12 teams x 4 categories) are thus selected.

Each team is to spend at least a day in each community, conducting key informant and (if possible) focus group interviews, asking open-ended questions regarding program processes and outcomes. The team should also construct a basic "community profile" on the basis of its observations and interactions, describing various demographic features -- size, quality of housing, major sources of employment, levels of unemployment, age distribution, access to markets, quality of infrastructure (roads, electricity), stratification (0-6),[8] presence of other development programs, etc. The team's prior familiarity with the community should expedite this.

While the teams are in the field, the principal investigator prepares a draft of a short basic survey instrument that will be used to obtain quantitative data on selected outcome variables (based on the program's stated objectives and on discussions with program staff).

All teams report back for a joint day-long meeting. Each team will have prepared (a) a "community profile" for each of the four communities visited, and (b) a summary of the responses provided to the open-ended questions. The 12 teams should be divided into four groups of three teams, each hearing reports on 12 communities. The goal of the presentations should be to (a) clarify and confirm the findings from the field regarding the 48 community profiles, and (b) provide feedback to the principal investigator on the validity and appropriateness of the draft survey instrument.

The crucial next stage involves going back to the four groups and their 12 community profiles. Using these profiles, and each member's detailed knowledge of Bogota, the groups are then asked to select 12 communities elsewhere in Bogota that are as similar as possible demographically to those they have profiled, except that they have not received the program. Having selected their 12 "matching communities", each group's choices would be ratified by the other groups in a joint final session. At the end of the day, then, 48 comparable non-program communities will have been identified.

On the basis of feedback from the meeting, the principal investigator modifies the survey instrument. Depending on available resources, the survey is then administered to a number (10-100) of randomly selected households in the 48 program and 48 non-program communities. Responses on present and past behavior on key outcome variables are thereby obtained, generating the data needed for a basic "double difference" evaluation. In short, this approach uses qualitative methods to obtain the "matched samples" needed to conduct a quantitative "double difference" evaluation.
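As a rough sketch of that final step, the code below computes the double difference from the kind of household survey the protocol would produce. Everything here is hypothetical: the column names, the numbers, and the use of respondents' recalled ("past behavior") values as the stand-in for the missing baseline. A real analysis would also report standard errors and probe the recalled data for bias.

    import pandas as pd

    # Hypothetical survey extract: one row per household interviewed in the 48 program
    # and 48 matched non-program communities (column names are illustrative only).
    survey = pd.DataFrame({
        "group":            ["program"] * 4 + ["comparison"] * 4,
        "outcome_recalled": [100, 90, 110, 95, 100, 105, 95, 100],     # recalled pre-program value
        "outcome_now":      [140, 130, 150, 135, 120, 125, 115, 120],  # value reported today
    })

    # Mean recalled ("baseline") and current outcomes for each group of communities.
    means = survey.groupby("group")[["outcome_recalled", "outcome_now"]].mean()

    # First difference: change over time in program communities.
    first_diff = means.loc["program", "outcome_now"] - means.loc["program", "outcome_recalled"]

    # Second difference: change over time in the matched non-program communities.
    second_diff = means.loc["comparison", "outcome_now"] - means.loc["comparison", "outcome_recalled"]

    # Double difference: the program effect net of changes that would have happened anyway.
    print("estimated program effect:", first_diff - second_diff)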
A last-minute change in the directorship of the program, and competing commitments in the months leading up to the city elections, meant that this evaluation protocol never got beyond the formative stages. It nonetheless shows one strategy for using qualitative methods to (a) construct the missing quantitative data needed to ascertain a program's effectiveness, and (b) systematically generate useful insights in their own right. It can be adapted for other environments where formal data is missing.

Both of these cases demonstrate innovative ways to integrate different methodologies in development studies. They are based on the relatively simple principle of "thinking quantitatively, acting qualitatively." This entails using the people most familiar with the program's communities (program field staff and community workers) to provide the type of information, albeit approximate, that would otherwise come from a census or other formal independent data source. The qualitative material is instructive in its own right because it has been collected systematically, thereby enabling more valid generalizations[9] to be drawn from it. Working directly with field staff also helps to build local capacity and to foster their involvement in, and commitment to, the evaluation process.

Vitally important poverty alleviation work is being undertaken in countries around the world where formal data simply is not available, nor is it likely to be anytime soon. Rather than ignoring these situations and the important lessons they hold, this note argues that the logic underlying more formal quantitative program evaluation strategies can be fruitfully applied using qualitative approaches, yielding insight and ownership that neither approach would attain on its own.
Endnotes

[1] Maximizing the diversity of a sample in order to search for commonalities--the "method of agreement", as opposed to the "method of difference"--is a common approach in inductive research, where the goal is to generate propositions rather than test hypotheses. See Charles Ragin (1987) The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.

[2] Census data was missing for several of the smallest communities. Given that a community needed to have at least 15 households to generate enough people for interviews, selecting the "poorest 200" communities using this method was not considered to be unduly biased. Another assumption was that income and quality-of-services changes over the intervening decade had affected each community uniformly, with only minor impacts on the ranking.

[3] El Departamento Administrativo de Acción Comunal.

[4] A related approach to generating comparable groups entails the use of "propensity scores." Details on each of these formal approaches to program evaluation can be found in Judy L. Baker (2000) Evaluating the Impact of Development Projects on Poverty: A Handbook for Practitioners. Washington, DC: The World Bank.

[5] See Jyotsna Jalan and Martin Ravallion (1999) "Income gains to the poor from workfare: estimates for Argentina's Trabajar program." Policy Research Working Paper No. 2149. Washington, DC: The World Bank.

[6] On participatory approaches see Caroline Robb (1999) Can the Poor Influence Policy? Participatory Poverty Assessments in the Developing World. Washington, DC: The World Bank.

[7] If funds are available to allow a sufficiently large sample to be selected, then naturally it should be selected randomly. In this case, as in many others of this kind, funds were limited, permitting only a relatively small sample. It was deemed best to ensure that this small sample contained some structured degree of variation. Having such variation is also useful should the evaluation end prematurely (for financial, political, or administrative reasons), since it enables some preliminary conclusions to be salvaged.

[8] This is a standard classification used in Bogota to categorize communities by class/income differences.

[9] On managing various forms of validity in qualitative work, see Joseph Maxwell (1992) "Understanding and validity in qualitative research." Harvard Educational Review 62(3): 279-300.

"Social Development Notes" are published informally by the Social Development Family in the Environmentally and Socially Sustainable Development Network of the World Bank. For additional copies, contact Social Development Publications, World Bank, 1818 H Street, NW, MSN MC5-507, Washington, DC 20433, USA. Fax: 202-522-3247. E-mail: sdpublications@worldbank.org.

Printed on Recycled Paper