The World Bank Economic Review, 36(4), 2022, 889–908
https://doi.org/10.1093/wber/lhac015
                                                                                                                     Article




                                                                                                                                                         Downloaded from https://academic.oup.com/wber/article/36/4/889/6698557 by Sectoral Library Rm MC-C3-220 user on 10 December 2023
Telescoping Error in Recalled Food Consumption:
Evidence from a Survey Experiment in Ethiopia
Gashaw T. Abate, Alan de Brauw, John Gibson, Kalle Hirvonen,
and Abdulazize Wolle
Abstract
Telescoping errors occur if survey respondents misdate events from outside the reference period and include
them in their recall. Concern about telescoping influenced the design of early Living Standards Measurement
Study (LSMS) surveys, which used a two-visit interview format to bound food consumption recall. This design
fell out of favor, though not for evidence-based reasons. To measure the extent of telescoping bias on food
consumption measures, a survey experiment was conducted in Addis Ababa, Ethiopia, randomly assigning
households to either a two-visit bounded recall or a single-visit unbounded recall. The average value of reported
food consumption is 16 percent higher (95 percent CI: 7.4–25.9) in the unbounded single-visit recall relative to
the two-visit bounded recall. Most of the error is explained by differences in reported spending on less frequently
consumed, protein-rich foods, so apparent food security indicators based on household diet diversity are likely
overstated with unbounded recall.
JEL classification: C81, D12, I32

Keywords: diet quality, food consumption, household surveys, recall, telescoping




Gashaw T. Abate is a research fellow at the International Food Policy Research Institute (IFPRI), 1201 Eye St NW, Washington,
DC, 20005, USA; his email address is g.abate@cgiar.org. Alan de Brauw (corresponding author) is a senior research fellow at
IFPRI; his email address is a.debrauw@cgiar.org. John Gibson is a professor of economics at the University of Waikato, Private
Bag 3105, Hamilton 3240, New Zealand; his email address is jkgibson@waikato.ac.nz. Kalle Hirvonen is a senior research
fellow at IFPRI and a research fellow at the United Nations University World Institute for Development Economics Research
(UNU-WIDER), Katajanolanlaituri 6B, Helsinki, FI-00160, Finland; his email address is k.hirvonen@cgiar.org. Abdulazize
Wolle is a graduate student in the Department of Economics at the University at Albany, State University of New York,
1400 Washington Ave, Albany, NY 12222, USA; his email address is abdulazize.wolle@gmail.com. The authors
thank the team at NEED for excellent survey coordination, particularly Abinet Tekle, Betelhem Lakew, Alemayehu Deme,
and Abraha Weldegerima. The authors are also grateful to the survey supervisors and enumerators for their hard work in
interviewing the respondents. None of this work would have been possible without the generosity of the households that took
part in these surveys. The authors thank them all sincerely. Thanks also to Roy Van der Weide (the editor), three anonymous
reviewers, Kibrom A. Abay, Kate Ambler, Kaleab Baye, Kathleen Beegle, Calogero Carletto, Joachim De Weerdt, Tesfaye
Hailu, Sylvan Herskowitz, Vivian Hoffmann, Dean Jolliffe, Talip Kilic, Berber Kramer, Karen Macours, Otto Toivanen, and
Alberto Zezza for comments that improved this manuscript. This work was undertaken as part of, and funded by, the CGIAR
Research Program on Agriculture for Nutrition and Health (A4NH). The opinions expressed here belong to the authors and
do not necessarily reflect those of A4NH or CGIAR. A supplementary online appendix is available with this article at the
World Bank Economic Review website.

© The Author(s) 2022. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/
licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
890                                                                                                        Abate et al.


1. Introduction
Monitoring progress towards meeting the first two Sustainable Development Goals (SDGs), to end poverty
and hunger, requires accurate measurement. In low- and middle-income countries, the large consumption
share for food means that any comprehensive assessment of welfare requires accurate food consumption
data, typically obtained through surveys. Yet the general understanding of errors in the survey measures
of food consumption remains incomplete, bringing into question how progress towards the SDGs can be
measured accurately.
   Recently, this understanding has improved due to a series of survey experiments; key findings are
described by De Weerdt, Gibson, and Beegle (2020).1 These experiments generally show that survey design
can have a large influence on measurements of concepts related to household consumption, labor use,
and agricultural production. Moreover, the error structures appear to be complex and do not follow
the classical assumptions of errors that are purely random. As survey designs exist that can reduce the
occurrence of these errors, such designs may be quite important for measuring indicators of progress
toward the SDGs.
   Surprisingly, a long-discussed type of error in consumption surveys has not been addressed by the
survey experiments covered in this recent literature. This error is telescoping, which is misdating by ei-
ther recalling more distant events as occurring more recently (forward telescoping) or pushing recent
events further back in time (backwards telescoping). Mahalanobis and Sen (1953) gave early attention
to telescoping after finding that the food consumption of Indian households reported with a one-week
unbounded recall appeared to greatly exceed that of households for whom the foods that were consumed
had been weighed. Concern about telescoping influenced the design of the early Living Standards Measurement
Study (LSMS) surveys, which adopted a two-visit interview format partly to allow a bounded recall.2
What Deaton (1997, 26) called the standard format of LSMS surveys was to have “two visits, roughly
two weeks apart, and the interviewer asks how much was spent on each food item ‘since my last visit.’”3
   However, the two-visit format fell out of favor when Vietnam abandoned it after its 1998 survey,
and other countries followed suit. Thus, by 2000, Deaton and Grosh (2000, 114) noted (in the LSMS
books entitled Designing Household Surveys for Developing Countries) the two-visit structure was being
used less frequently. The return to unbounded recall was not the result of evidence either that telescoping
is unimportant or that the two-visit format had failed. Instead, it reflected the practical matter
that including bounding visits will tend to raise survey costs and complicate field work because of the
need to return to the same households within a week or two. Without any firm evidence that the two-visit
format helped, it was easy to jettison the method for something simpler and less costly.
   Yet researchers continue to speculate that telescoping errors affect patterns found in consumption
survey data. For example, in a survey experiment in Tanzania, Beegle et al. (2012) find that unbounded 7-day
recall yields higher estimates of consumption and lower poverty rates than 14-day recall with the same list
of items and comes closer to matching their benchmark from a highly supervised 14-day individual diary.
They note telescoping could contribute to this pattern, as bringing forward consumption that happened
before the recall period matters more when that error is spread over just 7 days rather than over 14 days
(Eisenhower, Mathiowetz, and Morganstein 2004). To provide more evidence on telescoping, Beegle et al.
(2012) suggest that future survey experiments could compare bounded and unbounded seven-day recall.
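The amortization argument can be made concrete with a short calculation. The Python sketch below assumes, purely for illustration, that daily consumption is constant and that one day's worth of consumption from before the reference period is telescoped into the report; under those assumptions the proportional overstatement is simply the number of telescoped days divided by the length of the recall period.

```python
def telescoping_inflation(telescoped_days: float, recall_days: int) -> float:
    """Proportional overstatement of a recall report.

    Illustrative assumption: constant daily consumption, so a report that
    wrongly includes `telescoped_days` of pre-period consumption overstates
    the true `recall_days` total by telescoped_days / recall_days.
    """
    if recall_days <= 0:
        raise ValueError("recall_days must be positive")
    return telescoped_days / recall_days


# One telescoped day of consumption, spread over two recall lengths.
for window in (7, 14):
    share = telescoping_inflation(1, window)
    print(f"{window}-day recall: {share:.1%} overstatement")
# 7-day recall: 14.3% overstatement
# 14-day recall: 7.1% overstatement
```

The same misdating error is twice as large, in proportional terms, under a 7-day recall as under a 14-day recall, which is why the comparison of recall lengths alone cannot separate telescoping from other errors.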
   A further analysis of the experiment in Tanzania found that macro- and micro-nutrient intakes mea-
sured from the seven-day recall most closely match those derived from the benchmark individual diaries,

1     Special issues of journals that focus on survey measurement and survey experiments are introduced by McKenzie and
      Rosenzweig (2012) and Zezza et al. (2017).
2     Using two visits also let very long LSMS interviews be broken into two more reasonable blocks of time.
3     Early LSMS surveys with the two-visit format include those for Côte d’Ivoire, Ghana, Pakistan, and Vietnam.
      However, this approach was not used in Jamaica, Nepal, South Africa, and several other countries.
The World Bank Economic Review                                                                                        891


even as the seven-day recall is sensitive to having errors that vary with household characteristics (Ameye,
De Weerdt, and Gibson 2021). These authors suggest that seven-day recall may have offsetting errors—
including those from telescoping—that roughly balance, and they note that an experiment on telescoping
may help diagnose the source of the apparent success (in terms of matching the benchmark) with this
design. In an experiment in Niger, seven-day unbounded recall indicated that food consumption was 28
percent higher than what a seven-day diary showed; the authors speculate that telescoping could cause
the seven-day recall to overstate the true value of consumption (Backiny-Yetna, Steele, and Djima 2017).
    This paper provides evidence on telescoping from a survey experiment in Addis Ababa, Ethiopia, that
randomly assigned either two-visit bounded recall or single-visit unbounded recall to surveyed house-
holds. In the two-visit format, a survey supervisor visited the household prior to the actual survey. During
this first visit, the supervisor only informed the household that an enumerator would visit the household
exactly seven days later. No data were collected during this visit, nor did the supervisor prime households
to start thinking about their food consumption during the next seven days. The purpose of the super-
visor visit was to introduce a salient recall marker. In the second visit, respondents were asked to recall
consumption of food items since the visit by the supervisor. The sample was part of an endline study eval-
uating a randomized video-based intervention related to fruit and vegetable consumption, and as part of
this evaluation the food consumption of the subjects had been surveyed three to four months prior to the
present experiment. The unbounded and bounded recall groups were balanced, not only on household
characteristics, but also on their food consumption in the previous survey.
    The results indicate that the value of food consumption is 16 percent (95 percent CI: 7.4–25.9) higher
among the group of households to whom the unbounded recall was administered, relative to the bounded
recall group. In effect, on average and relative to the bounded recall group, an entire extra day of con-
sumption is included in the report for the previous seven days. This difference between the two recall
groups is not evenly distributed. It is particularly prominent for protein-rich foods like meat and eggs that
are typically less frequently consumed.4 As a result, there are also implications for standard indicators of
household food security and diet quality derived from consumption survey data, as these indicators may
be overstated when unbounded recall is used.
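The "extra day" interpretation follows from scaling the percentage gap by the length of the recall period. A minimal sketch, assuming constant daily consumption so that each day accounts for one-seventh of a 7-day report, applied to the point estimate and to the bounds of the 95 percent confidence interval reported above (the helper name `extra_days` is illustrative):

```python
RECALL_DAYS = 7  # length of the reference period in the experiment


def extra_days(pct_gap: float, recall_days: int = RECALL_DAYS) -> float:
    """Convert a proportional gap in reported consumption into days of
    consumption, assuming every day contributes equally to the report."""
    return pct_gap * recall_days


# Point estimate and 95 percent CI bounds from the text.
for label, gap in [("point estimate", 0.16), ("CI lower", 0.074), ("CI upper", 0.259)]:
    print(f"{label}: {extra_days(gap):.2f} extra days")
# point estimate: 1.12 extra days
# CI lower: 0.52 extra days
# CI upper: 1.81 extra days
```

Under this assumption the confidence interval corresponds to roughly half a day to nearly two extra days of consumption included in the unbounded report.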
    Three developments make this an opportune time to experimentally examine effects of telescoping on
surveyed food consumption. First, there has been a move away from surveys using abstract constructs like
the “usual month,” in which respondents are asked to recall how many months per year purchases are
made for each type of food, how often those purchases are made per month, and the typical spending per
occasion (with similar questions for self-production). These questions substantially increase the time taken
for survey interviews and add education-related inequality to reported consumption due to the cognitively
demanding nature of those questions, while failing to accurately measure either means or variance-based
indicators like inequality statistics (Beegle et al. 2012). With the switch to asking about consumption in
an actual recent period, telescoping should matter more than it did when using the hypothetical “usual
month” construct. Second, recent guidelines from FAO and the World Bank (2018) for food data collection in
household surveys recommend a 7-day recall period, shorter than those often used in the past (14 days or
1 month). With a shorter recall period, a telescoping error will loom larger as it
is amortized over a shorter period (Eisenhower, Mathiowetz, and Morganstein 2004). Finally, more diverse
diets that result from rising affluence and urbanization may make reports more susceptible to telescoping
errors; when many people were poor, with monotonous diets based on the cheapest local staple, survey
reports of food consumption could rely on respondents using a rate-based estimation strategy where
they multiply the frequency of occurrence by the length of the reference period (e.g., 2 loaves of bread


4   Unfortunately, it is not possible to go one step further and measure the effect of this reporting error on measures of
    inequality, as the survey did not include the non-food consumption module required to construct the consumption
    aggregate used in inequality measurement.

per day, ergo 14 loaves of bread for 7 days). Now, there are growing numbers of people who can afford
occasional luxuries like meat and fish. These foods are still eaten sufficiently infrequently that respondents
need to remember and count them rather than using rate-based estimates when answering the food recall
questions. For this reason, survey data on these foods may be more affected by telescoping.
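The contrast between the two answering strategies can be illustrated with a toy model. The Python sketch below is not drawn from the paper: the functions, the telescoping probability `p_telescope`, and the three-day `horizon` are hypothetical, chosen only to show why misdating changes an answer built by counting episodes but leaves a rate-based answer untouched.

```python
from typing import Iterable

RECALL_DAYS = 7


def rate_based_report(daily_rate: float, recall_days: int = RECALL_DAYS) -> float:
    """Rate-based estimate: usual frequency times the length of the period
    (e.g. 2 loaves per day -> 14 loaves in 7 days). Episode dates never enter,
    so misdating cannot change the answer."""
    return daily_rate * recall_days


def counted_report(days_ago: Iterable[int], p_telescope: float,
                   horizon: int = 3, recall_days: int = RECALL_DAYS) -> float:
    """Expected episode count when the respondent enumerates episodes.

    Toy assumptions: episodes 1..recall_days days ago are always reported;
    episodes up to `horizon` days before the window are pulled forward into
    it with probability `p_telescope`; older episodes and forgetting are ignored.
    """
    expected = 0.0
    for d in days_ago:
        if 1 <= d <= recall_days:
            expected += 1.0
        elif recall_days < d <= recall_days + horizon:
            expected += p_telescope
    return expected


# Hypothetical household: bread every day; meat eaten 2, 6 and 8 days ago.
print(rate_based_report(2.0))                      # 14.0 loaves, unaffected by misdating
print(counted_report([2, 6, 8], p_telescope=0.5))  # 2.5 expected meat episodes vs 2 true
```

In this sketch a bounding visit would act by lowering `p_telescope`, which matters only for the counted (infrequent) items.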
   The higher reported food consumption of the group surveyed with an unbounded recall can be inter-
preted as reflecting the impact of telescoping errors. This pattern fits with a widely discussed hypothesis
about the memory task that is required of survey respondents. However, it should be noted that estimates
in this paper cannot be benchmarked against any “true value” of food consumption. As mentioned by
De Weerdt, Gibson, and Beegle (2020), it is not clear if the true value is always available in household
consumption surveys. Even the “gold standard” diary-based method to collect consumption-expenditure
data can be prone to measurement error unless it is highly supervised, making it prohibitively expensive
in large-scale surveys (Beegle et al. 2012; Brzozowski, Crossley, and Winter 2017). At a minimum, the
findings call into question the earlier decision to switch from the two-visit format to unbounded recall in LSMS and
other consumption surveys administered in low- and middle-income countries.
   This paper proceeds as follows. Section 2 reviews the literature related to telescoping, and section
3 describes the experiment and the data used for analysis. Section 4 provides the main results along
with heterogeneity and robustness analyses. Section 4 also explores alternative interpretations, namely
that bounded recall leads to lower reported food consumption because of declining compliance among
households. Section 4 concludes with a discussion of the feasibility of constructing adjustment factors
to correct for telescoping bias. Section 5 explores cost implications for surveys using a two-visit format
rather than a single-visit recall. Section 6 concludes and outlines research and policy implications of the
findings.


2. Previous Work Related to Telescoping
In a two-visit survey, the first visit to the household can provide a distinct start to the recall period in the
mind of the respondent. When the recall questions are asked in the second visit, the first visit then bounds
the recall period for the respondent (Grootaert 1986). While the first use of bounded recall is usually
attributed to Neter and Waksberg (1964), it was in quite a different context from surveys like the LSMS;
the survey was of infrequent spending on alterations and repairs of dwellings, and it used an unbounded
recall on the first visit, so that expenses reported then could be conveyed to respondents in the subsequent
interview to help prevent them from being reported again.5 Other studies with benchmarks for assessing
the extent of telescoping are also for high-value and infrequent purchases, such as computers (Morwitz
1997). It is unclear if reports for frequently consumed and low-value items like food would exhibit the
same response to bounding, especially as there is no easy way to gather consumption data in the first
interview that can be relayed to respondents in the second interview, to ensure that such foods are not
reported again.
   It is conceivable that a bounding visit could help respondents remember whether episodes that involved
low-value items occurred within the recall period. The difficulty of this memory task is shown by the
following comment from a discussant of one of the first papers to suggest that telescoping errors may
explain why expenditures measured with a one-week unbounded recall appeared higher than what other
survey approaches showed:
    I confess that if somebody asked me to give an account of my expenditure for the last seven days I should not
    remember whether it was seven or eight days since I bought an article such as face powder (Cole and Utting 1956,
    389).



5     For a useful review of the early literature on this topic, see Dex (1995).

   However, whether bounding helps respondents to remember the occurrence of episodes largely depends
on the cognitive process they use to answer these questions. A plausible model is that when there are a large
number of items, respondents do not try to remember and count, and instead use a rate-based estimation
strategy. If rate-based estimation is used to answer questions about food consumption, there is less reason
to expect that a device to aid memory, such as a bounding visit, would help to improve the accuracy of
these data.
   Friedman et al. (2017) find that food consumption data from a seven-day unbounded recall survey
are subject to incidence errors within several important food groups (the respondents entirely forget to
report any consumption) that are offset by errors of overstatement in value of what was consumed, con-
ditional on reporting any consumption. Thus, the good performance of the seven-day unbounded recall
(in matching data from the benchmark individual diaries in Tanzania) may be from happenstance of inci-
dence errors and value errors canceling each other out. This analysis also found that the overstatement of
consumption, conditional on incidence, was more than twice as high for infrequently purchased foods (versus
frequently purchased foods) or for self-produced foods that are seldom consumed (versus those that are
frequently consumed). These frequency-related patterns may be due to respondents using a rate-based
rule-of-thumb estimate for frequently purchased or consumed items, while they try to remember and
count episodes for infrequently consumed foods (Chang and Krosnick 2003; De Weerdt, Gibson, and
Beegle 2020). With these different modes of answering survey questions, telescoping error would matter
more for infrequently consumed foods.
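The offsetting of incidence and value errors follows from a simple decomposition: the mean reported value for a food group equals the share of households reporting any consumption times the mean value among those who report. The numbers in the Python sketch below are hypothetical, not estimates from Friedman et al. (2017); they show how a 20 percent shortfall in incidence is exactly cancelled by a 25 percent overstatement conditional on reporting.

```python
def expected_reported_value(p_report: float, value_if_reported: float) -> float:
    """Mean reported value for a food group = share of households reporting
    any consumption x mean value among those who report."""
    return p_report * value_if_reported


# Hypothetical numbers: the benchmark vs a recall with both kinds of error.
true_mean = expected_reported_value(p_report=0.50, value_if_reported=100.0)
recalled_mean = expected_reported_value(p_report=0.40, value_if_reported=125.0)
print(true_mean, recalled_mean)  # 50.0 50.0
```

Because 0.8 x 1.25 = 1, the group mean matches the benchmark even though both of its components are wrong, which is why a good match on means says little about the absence of telescoping.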
   There are at least two other recent but unpublished survey experiments comparing bounded and un-
bounded recall. Both introduced other design variations as well, such as length of the food list, length of
the recall period, frequency of interviewer visits, and type of data capture (diary versus recall). Durazo
et al. (2017)6 randomized a number of different survey experiments on consumption into questionnaires
in Indonesia; among other things, they found that with a seven-day unbounded recall, per capita food
consumption was 24 percent higher than what a bounded recall showed, using a recall list with 94 items.
This effect was larger than for some of the other design variants tested, such as cutting the food list from
229 items to 126 (which results in just a 2 percent drop in apparent consumption). Sharp et al. (2022)
found no difference in food consumption for a Marshall Islands sample given a seven-day unbounded
recall and another sample where an initial visit was made to the household. However, for the sample
with two visits the recall questions continued to use the “In the last seven days . . .” wording, and the
gap between visits varied; visit 2 was eight days after visit 1 for 53 percent of the sample, nine or more
days after for 14 percent of the sample, and seven or fewer days after for 33 percent of the sample. Thus,
the latter experiment did not really implement a bounded recall although it does highlight the logistical
challenges of this survey design.


3. The Survey Experiment, Data and Methods
The survey experiment was designed to study the implications of telescoping for food consumption
measurement by systematically contrasting responses from unbounded and bounded recalls. For the un-
bounded recall, the common approach to food consumption measurement was used, requiring a respon-
dent to report on the household’s food consumption for each item from a list of 128 food items, asking
about consumption within the reference or recall period (the last 7 days).7 For the bounded recall, a salient

6   This study has been presented at an international conference but, at the time of writing (May 2022), is not available as
    an online manuscript.
7   The selection of these food items was based on the 2016 Household Consumption Expenditure Survey (HCES) data
    collected by the Central Statistical Agency of Ethiopia. HCES is a nationally and regionally representative sample, and
    the 2016 sample had about 3,800 households in Addis Ababa. All food items consumed by at least 1 percent of the
    2016-HCES households located in Addis Ababa were included on the list.

Table 1. Survey Experiment Design

                                                        Bounded recall                                           Unbounded recall

Method of data capture                                  CAPI                                                     Computer-assisted personal interviewing (CAPI)
Total number of survey modules                          10 modules                                               10 modules
Order of the food consumption module                    3rd module                                               3rd module
Reference period in the food                            7-day recall                                             7-day recall
consumption module
Designated respondent in the food                       Household member who decides on                          Household member who decides on
consumption module                                      food purchase and/or preparation                         food purchase and/or preparation
Food consumption measurement                            128 food items (frequency and quantity                   128 food items (frequency and quantity
                                                        consumed)                                                consumed)
Question format in the food                             Consumption of food item since the visit                 Consumption of food item during the
consumption module                                      by a survey team member*                                 last 7 days
Number of households                                    440                                                      450

Source: Authors.
Note: *Households in the bounded recall group were visited by survey supervisors exactly 7 days before the actual data collection. The supervisors wore a uniform
(a white T-shirt and hat) supplied by the research team while visiting households in the bounded recall group so as to differentiate themselves from other visitors.
The enumerators were trained to specifically remind and confirm the visit by a survey team member with a white T-shirt and hat just before administering the food
consumption module for households in the bounded treatment group.



recall marker was introduced by visiting sample households seven days prior to the actual survey and in
the second visit the respondents were asked to recall consumption of food items since the initial bounding
visit.8 The household visits were conducted by survey supervisors, who wore a uniform to distinguish
them from other visitors so as to make the visit equally notable and memorable for all sample households
in the subgroup. The two groups differ only in the question format (wording) of the consumption module
(table 1). For all other aspects, the survey designs for the two groups were identical: the method of data
capture, designated respondent, the number of food items in the recall list, and the total number as well
as the order of survey modules.
    The survey experiment was implemented as an add-on to an endline evaluation survey of a randomized
controlled trial designed to assess the impact of video-based behavioral change communication on fruit
and vegetable consumption in Addis Ababa, Ethiopia. To this end, the study team produced two types
of videos with different information content and randomly allocated the sample of households into three
groups: Control (no video screening), Video (video screening with standard informational content) and
Video+ (video screening with advanced informational content). S1 in the supplementary online appendix,
available with this article at the World Bank Economic Review website, provides more details about the
video intervention, and its impact evaluation results are reported in Abate et al. (2021). To ensure that
the experiment on bounded versus unbounded recall did not affect the outcomes of the impact evaluation
(and vice versa), the study cross-randomized study samples into the bounded and unbounded consumption
recall subgroups.
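Cross-randomization means that the recall assignment is drawn independently of the video assignment, so every video-by-recall cell contains households and neither experiment can confound the other. The Python sketch below is hypothetical: it assumes household-level assignment with equal arm sizes, whereas the actual assignment mechanics (including any clustering or stratification) are not described in this section. It illustrates the general technique, not the study's procedure.

```python
import random
from collections import Counter

VIDEO_ARMS = ("Control", "Video", "Video+")
RECALL_ARMS = ("bounded", "unbounded")


def cross_randomize(household_ids, seed=2020):
    """Assign each household to a video arm and, independently within each
    video arm, to a recall arm, so every video-by-recall cell is filled.

    Hypothetical: household-level assignment with equal arm sizes.
    """
    rng = random.Random(seed)
    ids = list(household_ids)
    rng.shuffle(ids)
    video = {hh: VIDEO_ARMS[i % len(VIDEO_ARMS)] for i, hh in enumerate(ids)}

    recall = {}
    for arm in VIDEO_ARMS:
        members = [hh for hh in ids if video[hh] == arm]
        rng.shuffle(members)  # fresh random order within the video arm
        for j, hh in enumerate(members):
            recall[hh] = RECALL_ARMS[j % len(RECALL_ARMS)]
    return video, recall


video, recall = cross_randomize(range(930))
print(Counter((video[h], recall[h]) for h in range(930)))
```

With 930 households, each of the six cells receives 155 households in this sketch; splitting within each video arm is one simple way to guarantee balance of the recall arms across video arms.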
    The study sample is representative of households in Addis Ababa and is formed from 930 households
randomly selected from six sub-cities, 20 woredas (districts), and 40 ketenas (neighborhoods; or clusters
of households) within Addis Ababa.9 The survey experiment took place between January 24 and February 11,
2020, and attempted to re-interview all 930 households who had been interviewed in September


8     For 99 percent of the 440 households in the bounded recall group, the duration between the two visits is recorded as
      exactly 7 days. For the remaining 1 percent (6 households), dates of the visits were not recorded.
9     Melesse et al. (2019) provide a detailed description of the sampling strategy. Hirvonen, Abate, and de Brauw (2020)
      show that the household demographics in this sample are comparable to other representative surveys conducted in Addis
      Ababa.

2019.10 The interviews for both recall groups took place in parallel. Out of the households that had been
interviewed in September 2019, 35 households were not interviewed in January–February.11 The response
rates are comparable across the two groups: 97 percent and 95 percent for the unbounded and bounded
subgroups, respectively.
   To choose the primary respondent, the survey module on consumption targeted the household mem-
ber who was most knowledgeable about the household’s food shopping and preparation. More than 90
percent of the respondents were women. For each item, the study asked the respondent whether their
household had consumed the item in the past seven days (or, for the bounded group, whether they had
consumed it since the visit by a survey supervisor). If the answer was yes, the respondent was asked
the number of days in which the item was consumed and the total quantity consumed during the recall
period.12

Data
Reported quantities were converted into local currency units (Ethiopian birr) using retail price data that
are collected every month by the Central Statistical Agency (CSA) of Ethiopia.13 Consumption quantities in
kilograms were also converted into calorie and protein equivalents using conversion factors provided in the
Ethiopian food composition tables (EPHI 1981). These calculations account for the inedible
portion of the weight by using the edible portion estimates provided by the United States Department of
Agriculture (USDA 2013).
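The conversion chain described above (quantity to value, and quantity to edible weight to nutrients) can be sketched in a few lines. In the Python example below, the class and function names are hypothetical and every numeric value is a placeholder, not a CSA price, an EPHI (1981) composition factor, or a USDA (2013) edible-portion estimate; only the order of the steps follows the text.

```python
from dataclasses import dataclass

RECALL_DAYS = 7


@dataclass
class ItemReport:
    item: str
    days_consumed: int   # days in the recall period on which the item was eaten
    quantity_kg: float   # total quantity consumed over the recall period

    def __post_init__(self):
        if not 0 <= self.days_consumed <= RECALL_DAYS:
            raise ValueError("days_consumed must lie within the recall period")


@dataclass
class ItemParams:
    # All values are placeholders, not CSA prices or EPHI/USDA factors.
    price_birr_per_kg: float
    kcal_per_100g_edible: float
    protein_g_per_100g_edible: float
    edible_share: float  # edible fraction of the reported weight


def convert(report: ItemReport, params: ItemParams):
    """Return (value in birr, kcal, grams of protein) for one item report."""
    value_birr = report.quantity_kg * params.price_birr_per_kg
    edible_g = report.quantity_kg * 1000 * params.edible_share
    kcal = edible_g / 100 * params.kcal_per_100g_edible
    protein_g = edible_g / 100 * params.protein_g_per_100g_edible
    return value_birr, kcal, protein_g


def per_capita_daily(total: float, household_size: int,
                     recall_days: int = RECALL_DAYS) -> float:
    """Spread a household total over members and days of the recall period."""
    return total / household_size / recall_days


# Hypothetical item: 1 kg over 3 days, 100 birr/kg, 150 kcal and 12 g protein
# per 100 g edible portion, 88 percent edible.
value, kcal, protein = convert(ItemReport("item_x", 3, 1.0),
                               ItemParams(100.0, 150.0, 12.0, 0.88))
print(value, kcal, protein)
```

Applying the edible share before the nutrient factors matters because composition tables are typically expressed per 100 g of edible portion, while respondents report purchased or prepared weight.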
   Extreme values in the household per capita consumption variables (birr, kcal, and protein) used in the
analyses were winsorized at the 99th percentile. After dropping five households with implausibly large
per capita consumption values,14 the final data set for analysis includes 890 households.15 The average
household in the bounded group consumed food valued at 275 birr per person during the 7-day period
(equivalent to about US$8.50 per person per week at market exchange rates). The average reported daily
consumption was 1,640 kilocalories per capita, including 45 grams of protein per capita.16
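Upper-tail winsorization at the 99th percentile can be sketched as below. The percentile uses linear interpolation between order statistics (the same rule as NumPy's default); the exact percentile definition, and whether the cap was computed within or across recall groups, are assumptions, since the text does not say. The function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def winsorize_upper(values, q=99.0):
    """Replace values above the q-th percentile with that percentile."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values]


# Usage: per capita weekly consumption in birr for a few households (made-up values).
birr_pc = [180.0, 240.0, 275.0, 310.0, 2900.0]
print(winsorize_upper(birr_pc))
```

Unlike trimming, winsorization keeps every household in the sample and only pulls the most extreme values in to the cap.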
   As an alternative outcome, the study uses household dietary diversity, a common indicator of household
food security (Hoddinott and Yohannes 2002). The household dietary diversity score (HDDS) of Swindale
and Bilinsky (2006) was computed by first grouping the 128 food items in this study’s consumption module
into 12 food groups: cereals; roots and tubers; vegetables; fruits; meat, poultry and offal; eggs; fish and
seafood; pulses, legumes and nuts; milk and milk products; oil and fats; sugar and honey; and
miscellaneous foods. The HDDS is the number of food groups from which the household consumed any item
during the 7-day recall period, with a minimum of 1 and a maximum of 12. As a second measure of
household dietary diversity, the food consumption score (FCS) developed by the WFP (2008) was computed.
The FCS combines dietary diversity and consumption frequency by grouping the consumed food items into
nine groups and allocating more weight to protein-rich foods.17

10   Previous work in Ethiopia shows that food consumption in urban areas is affected by religious fasting
     (Hirvonen, Taffesse, and Worku 2016). For this reason, the survey experiment was timed so that it did
     not overlap with any major Orthodox or Muslim fasting period.
11   Sixteen households refused the interview, 15 were not at home during the survey visit, enumerators
     were unable to trace 3 households, and, sadly, 1 respondent had passed away.
12   This module only covered food consumed in the house. The survey instrument had another module for
     foods consumed outside the house, but these data are not considered in this study.
13   More specifically, the study used the CSA retail price data for February 2020, restricting the price observations to Addis
     Ababa.
14   Specifically, calorie consumption above 5,000 kcal per adult equivalent per day was considered implausible.
15   Three of these households were in the bounded group and two in the unbounded group.
16   The corresponding values in per adult equivalent unit terms are 327 birr in 7 days, 1,946 kilocalories per day and 53
     grams of protein per day.
17   The FCS food groups are: main staples (weight: 2); pulses (3); vegetables (1); fruits (1); meat, eggs, fish (4); dairy products
     (4); sugar (0.5); oil/butter (0.5); and condiments (0).
Table 2. Household Characteristics, by Recall Type

                                                                              Bounded recall             Unbounded recall             Difference            t-test
Variable                                                                        Mean/[SE]                   Mean/[SE]                                      p-value

Female respondent                                                                  0.925                       0.911                      0.014             0.443
                                                                                  [0.015]                     [0.015]
Household size                                                                     4.566                       4.549                      0.017             0.907
                                                                                  [0.126]                     [0.109]
Household size in adult equivalent units                                           3.881                       3.837                      0.044             0.730
                                                                                  [0.109]                     [0.094]
Male-headed household                                                              0.566                       0.551                      0.015             0.696
                                                                                  [0.036]                     [0.029]
Head’s education in years                                                          6.345                       6.491                    −0.146              0.548
                                                                                  [0.275]                     [0.278]
Household asset index                                                             −0.130                       0.103                    −0.233              0.193
                                                                                  [0.168]                     [0.143]
Other treatment: Control                                                           0.305                       0.353                    −0.048              0.096
                                                                                  [0.015]                     [0.016]
Other treatment: Video                                                             0.343                       0.331                      0.012             0.632
                                                                                  [0.014]                     [0.014]
Other treatment: Video+                                                            0.352                       0.316                      0.036             0.264
                                                                                  [0.018]                     [0.018]
Weekly food consumption per capita before the                                    311.281                     323.739                  −12.458               0.313
experiment (in September 2019)                                                   [10.096]                    [13.650]

Number of households:                                                               440                         450
Clusters:                                                                                           40

Source: Authors’ calculations on Addis Ababa survey data.
Note: The unit of observation is the household. Standard errors (SE) are clustered at enumeration area (ketena) level. Difference in means between the groups tested
with a t-test (null-hypothesis: difference in means = 0).



The weighted FCS index ranges between 0 and 112, with higher scores indicating a better food security
situation.
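A sketch of the two scores, given a household's 7-day food-group data, follows. The group labels are shorthand for the groups listed above and in footnote 17, the mapping from the 128 items to groups is omitted, and capping each FCS group at 7 days is an assumption consistent with the stated 0–112 range. The function names are hypothetical.

```python
# The 12 HDDS food groups (Swindale and Bilinsky 2006), as listed in the text.
HDDS_GROUPS = (
    "cereals", "roots and tubers", "vegetables", "fruits",
    "meat, poultry and offal", "eggs", "fish and seafood",
    "pulses, legumes and nuts", "milk and milk products",
    "oil and fats", "sugar and honey", "miscellaneous",
)

# FCS group weights (WFP 2008), as given in footnote 17.
FCS_WEIGHTS = {
    "main staples": 2.0, "pulses": 3.0, "vegetables": 1.0, "fruits": 1.0,
    "meat, eggs, fish": 4.0, "dairy products": 4.0, "sugar": 0.5,
    "oil/butter": 0.5, "condiments": 0.0,
}


def hdds(groups_consumed):
    """Number of HDDS groups with any consumption during the 7-day recall."""
    return sum(1 for g in HDDS_GROUPS if g in groups_consumed)


def fcs(days_by_group):
    """Weighted sum of days consumed per FCS group, each capped at 7 days."""
    total = 0.0
    for group, weight in FCS_WEIGHTS.items():
        days = min(days_by_group.get(group, 0), 7)
        total += weight * days
    return total


# Usage with an illustrative (made-up) household.
example_days = {"main staples": 7, "pulses": 4, "vegetables": 5,
                "oil/butter": 7, "sugar": 6, "condiments": 7}
print(fcs(example_days))
print(hdds({"cereals", "vegetables", "oil and fats", "sugar and honey"}))
```

Because the weights sum to 16, a household eating every FCS group on all seven days reaches the maximum of 16 × 7 = 112.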
   The bounded and unbounded groups are similar in terms of household characteristics (table 2). Further,
the (random) assignment into the video experiment study arms is orthogonal to the (random) allocation
into the telescoping experiment groups. The subsamples given bounded and unbounded recall are also
balanced on baseline food consumption, measured in September 2019 (i.e., three to four months before the
survey experiment).

Methods
The difference in reported per capita consumption values across the two groups is quantified using
ordinary least squares (OLS). In the most basic model, the per capita food consumption value and its
logarithm are each regressed on a binary treatment variable equal to 1 if the household was randomly
assigned to the unbounded recall group and 0 if it was assigned to the bounded recall group. Subsequent
regressions also control for basic household characteristics (household size, a binary variable indicating
male-headed households, and the head’s education in years), the household’s treatment status in the video
experiment, and unobserved differences between sub-cities (sub-city fixed effects). Percentage differences
derived from the coefficients in semi-log regressions are computed as
$100 \times \left(e^{\hat{\beta} - 0.5\,\hat{V}(\hat{\beta})} - 1\right)$, with confidence intervals from the
approximate unbiased variance estimator of van Garderen and Shah (2002).18 The standard errors in all
regressions are clustered at the ketena level.19
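For the most basic model, the OLS coefficient on the treatment dummy equals the difference in group means. The sketch below computes it together with a cluster-robust standard error in pure Python. It is a minimal illustration under stated assumptions: it omits the household controls and sub-city fixed effects, and it applies the CR1 finite-sample factor G/(G − 1) × (N − 1)/(N − K) that Stata uses for clustered standard errors; the paper does not state which correction was used, and the function name is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ols_binary_treatment(y, d, clusters):
    """OLS of y on a constant and a 0/1 treatment dummy d.

    Returns (coefficient on d, CR1 cluster-robust standard error).
    No controls or fixed effects: only the most basic model.
    """
    n, k = len(y), 2
    # X'X for X = [1, d]; since d is 0/1, sum(d*d) == sum(d).
    sd = float(sum(d))
    det = n * sd - sd * sd
    inv = [[sd / det, -sd / det], [-sd / det, n / det]]
    xty = [float(sum(y)), float(sum(yi * di for yi, di in zip(y, d)))]
    b0 = inv[0][0] * xty[0] + inv[0][1] * xty[1]
    b1 = inv[1][0] * xty[0] + inv[1][1] * xty[1]

    # Per-cluster score sums s_g = sum over i in g of x_i * u_i.
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, di, g in zip(y, d, clusters):
        u = yi - b0 - b1 * di
        scores[g][0] += u
        scores[g][1] += di * u

    # "Meat" of the sandwich: sum over clusters of s_g s_g'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for a in range(2):
            for b in range(2):
                meat[a][b] += s[a] * s[b]

    # Element [1][1] of inv @ meat @ inv, i.e., the variance of b1.
    v11 = sum(inv[1][a] * meat[a][b] * inv[b][1] for a in range(2) for b in range(2))
    g = len(scores)
    cr1 = (g / (g - 1)) * ((n - 1) / (n - k))
    return b1, sqrt(cr1 * v11)


# Usage with toy data: two groups of three households in three clusters.
coef, se = ols_binary_treatment([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], [1, 1, 2, 2, 3, 3])
print(coef, se)
```

With only 40 ketenas in the real data, the G/(G − 1) part of the factor is small but not negligible, which is one reason to state the correction explicitly.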

18    In the equation, $\hat{\beta}$ refers to the estimated coefficient and $\hat{V}(\hat{\beta})$ to its estimated variance.
19    The household sample groups into 6 sub-cities and 40 ketenas.
Figure 1. Distribution of (ln) Weekly Food Consumption per Capita (in Birr), by Recall Type

Source: Authors’ calculations on Addis Ababa survey data.
Note: Kernel density estimates. N = 890 households.


Table 3. Impact of Unbounded Recall on Weekly per Capita Food Consumption (in Birr)

                                                                                (1)                 (2)                               (3)                  (4)
Dependent variable:                                                              Food consumption (birr)                            (ln) food consumption (birr)

Unbounded recall                                                            56.13**                   54.41***                   0.156***                    0.152***
                                                                            (15.88)                    (14.31)                    (0.044)                     (0.039)
Household level controls?                                                     No                         Yes                        No                          Yes
Sub-city fixed effects?                                                       No                         Yes                        No                          Yes

Observations:                                                                  890                       890                         890                        890
R2                                                                            0.020                     0.158                       0.017                      0.196

Bounded group mean of the dependent variable                                  275.3                     275.3                         n/a                        n/a

Source: Authors’ calculations on Addis Ababa survey data.
Note: The unit of observation is the household. Household level controls include household size (number of members), indicator variable for male-headed households,
head’s education in years, and treatment status in the video experiment. Standard errors are clustered at the enumeration area (ketena) level and reported in parentheses.
Statistical significance denoted with + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.



4. Results
As a first step, the study illustrates the full distributions of (log) household weekly per capita food con-
sumption measured in birr for both bounded and unbounded recall groups (fig. 1). Relative to the bounded
recall group, the estimated food consumption distribution for the unbounded group is shifted to the right,
indicating larger reported food consumption values across the board.
    Next, regression estimates of the difference between unbounded and bounded recall in household weekly
per capita food consumption, measured in birr, are computed (table 3). The estimated coefficients quantify
the difference in the consumption outcome when the consumption module used unbounded recall relative
to bounded recall. In columns (1) and (2), the outcome variable is household per capita food consumption
value in birr; in columns (3) and (4), it is the natural logarithm of that value. Columns (1) and (3) include
no additional covariates, whereas columns (2) and (4) control for the additional covariates described
above. As the differences between the unadjusted
Table 4. Impact of Unbounded Recall on (log) Daily per Capita Calorie and Protein Intakes

                                                           (1)                           (2)                                  (3)                                (4)
Dependent variable:                                              (ln) calorie consumption                                           (ln) protein consumption

Unbounded recall                                        0.081*                          0.084**                           0.147***                           0.148***
                                                        (0.031)                         (0.026)                            (0.040)                            (0.035)
Household level controls?                                 No                              Yes                                No                                 Yes
Sub-city fixed effects?                                   No                              Yes                                No                                 Yes

Observations                                              890                              890                               890                                890
R2                                                       0.008                            0.189                             0.017                              0.159

Source: Authors’ calculations on Addis Ababa survey data.
Note: The unit of observation is the household. Household level controls include household size (number of members), indicator variable for male-headed households,
head’s education in years, and treatment status in the video experiment. Standard errors are clustered at the enumeration area (ketena) level and reported in parentheses.
Statistical significance denoted with + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.




and adjusted regressions are negligible, the study focuses its reporting and discussion on the adjusted
regression results.
    The regression coefficient in column (2) in table 3 shows that unbounded recall increases the reported
per capita consumption value by 54 birr relative to bounded recall (p-value < 0.001; 95 percent CI: 25.5–
83.3). Considering the mean of the bounded group, this impact of using an unbounded recall is equivalent
to a 20 percent increase in apparent food consumption per capita. In the semi-log regression models, the
corresponding difference between the unbounded and bounded recall is 16 percent (p-value < 0.001; 95
percent CI: 7.45–25.95). As noted in the introduction, these estimates are large, roughly equivalent to
adding an entire day of consumption value to the household’s reported seven-day consumption total. The
most plausible source of this higher apparent food consumption is forward telescoping, in which
consumption episodes occurring more than seven days earlier are included in the report for the last seven
days.
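The conversion from a semi-log coefficient to a percentage difference, described in the methods, can be checked with the rounded column (4) figures from table 3. In this hedged sketch, the variance formula is the approximate unbiased estimator usually attributed to van Garderen and Shah (2002), the interval is a symmetric normal approximation (an assumption), and the function name is hypothetical.

```python
from math import exp, sqrt


def semilog_dummy_effect(beta_hat, se_beta, z=1.96):
    """Percent effect of a 0/1 regressor in a semi-log regression.

    Point estimate: 100 * (exp(beta_hat - 0.5 * V) - 1), with V = se_beta**2.
    Variance (approximate unbiased estimator, van Garderen and Shah 2002):
        100**2 * exp(2 * beta_hat) * (exp(-V) - exp(-2 * V)).
    The symmetric normal-approximation interval is an assumption.
    """
    v = se_beta ** 2
    pct = 100.0 * (exp(beta_hat - 0.5 * v) - 1.0)
    var_pct = 100.0 ** 2 * exp(2.0 * beta_hat) * (exp(-v) - exp(-2.0 * v))
    half_width = z * sqrt(var_pct)
    return pct, pct - half_width, pct + half_width


# Rounded inputs from table 3, column (4).
pct, lower, upper = semilog_dummy_effect(0.152, 0.039)
print(f"{pct:.1f} percent (approx. CI {lower:.1f} to {upper:.1f})")
```

With these rounded inputs the point estimate is about 16.3 percent, in line with the 16 percent reported above. The interval need not match the reported 7.45–25.95 exactly, since it is built from rounded coefficients and may differ in construction.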
    Next, the analysis in table 3 is repeated with per capita calorie and protein intakes as the dependent
variables (table 4, columns (1)–(2) and (3)–(4), respectively). Unbounded recall results in 8.8 percent higher
reported per capita calorie intakes compared to bounded recall (p-value < 0.01; 95 percent CI: 3.10–14.7),
which is considerably lower than the estimated difference when the consumption value is expressed in
birr terms. Meanwhile the corresponding difference in protein intake is 16 percent (p-value < 0.001; 95
percent CI: 7.91–24.5), which is similar to the estimated impact on the birr value reported in table 3. The
fact that apparent protein intake is more sensitive to differences in survey design than is calorie intake
agrees with a finding from Ameye, De Weerdt, and Gibson (2021), who note that previous findings on the
fragility of calorie-based hunger estimates to variations in the design of food consumption surveys (e.g.,
De Weerdt et al. 2016) are likely to understate the fragility of survey estimates when a richer consideration
of nutrition is used, one that focuses on macro- and micro-nutrients.
    Next, the influence of the recall type on the two indicators of food security and diet quality, HDDS and
FCS, is examined (table 5). The coefficient in column (2) of table 5 shows that households in the
unbounded recall group report consuming foods from 0.3 more food groups than households in the
bounded recall group (p-value < 0.01; 95 percent CI: 0.11–0.47). Given a mean HDDS of 9.1 food groups in
the bounded group, this estimate represents a 3 percent increase in HDDS when unbounded recall is used.
The effect on FCS is larger in relative terms: the mean FCS in the unbounded recall group is 4.3 units
(p-value < 0.01; 95 percent CI: 1.62–6.93), or 6 percent, higher than in the bounded recall group.
    To further consider potential differences by food group, the following analysis uses consumption of
specific food groups as dependent variables (table 6). In panel A, the dependent variable takes on a value
of 1 if the household consumed from the food group and 0 otherwise. Households in the unbounded
Table 5. Impact of Unbounded Recall on Household Diet Diversity (HDDS) and Food Consumption Scores (FCS)

                                                                                 (1)                        (2)                         (3)                       (4)
Dependent variable:                                                           Household diet diversity score                            Food consumption score

Unbounded recall                                                             0.299**                     0.290**                    4.371**                    4.275**
                                                                             (0.089)                     (0.089)                    (1.338)                    (1.313)
Household level controls?                                                       No                         Yes                         No                        Yes
Sub-city fixed effects?                                                        No                          Yes                        No                         Yes

Observations                                                                    890                        890                          890                      890
R2                                                                             0.009                      0.202                        0.011                    0.176

Bounded group mean of the dependent variable                                    9.12                       9.12                        65.89                    65.89

Source: Authors’ calculations on Addis Ababa survey data.
Note: The unit of observation is the household. Household level controls include household size (number of members), indicator variable for male-headed households,
head’s education in years, and treatment status in the video experiment. Standard errors are clustered at the enumeration area (ketena) level and reported in parentheses.
Statistical significance denoted with + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.


Table 6. Impact of Unbounded Recall on Consumption of Different Food Groups

                                              (1)          (2)          (3)          (4)          (5)          (6)          (7)
                                              Staples      Legumes      Vegetables   Fruit        Meat & eggs  Dairy        Other

Panel A. Dependent variable = 1 if consumed from the food group, 0 otherwise
Unbounded recall                              n/a          n/a          n/a          0.011        0.082**      0.034        n/a
                                                                                     (0.028)      (0.029)      (0.028)
R2                                            n/a          n/a          n/a          0.089        0.120        0.103        n/a
Bounded group mean of the dependent variable  1.00         0.99         1.00         0.79         0.69         0.55         1.00

Panel B. Dependent variable: Number of days consumed from the food group
Unbounded recall                              n/a          0.046        n/a          0.322*       0.750***     0.197        n/a
                                                           (0.159)                   (0.154)      (0.147)      (0.167)
R2                                            n/a          0.042        n/a          0.125        0.173        0.115        n/a
Bounded group mean of the dependent variable  6.96         5.54         6.96         3.49         2.38         2.18         6.99

Panel C. Dependent variable: Birr value consumed from the food group
Unbounded recall                              31.30**      0.76         9.82         3.64         110.71**     5.73         16.93+
                                              (9.70)       (4.04)       (8.89)       (6.68)       (40.24)      (6.82)       (9.16)
R2                                            0.304        0.112        0.190        0.127        0.096        0.095        0.124
Bounded group mean of the dependent variable  304.8        97.3         197.1        86.5         272.3        53.2         183.1

Source: Authors’ calculations on Addis Ababa survey data.
Note: The unit of observation is the household; N = 890. All regressions include household level controls (household size, indicator variable for male-headed house-
holds, head’s education in years, and treatment status in the video experiment) and sub-city fixed effects. Standard errors are clustered at the enumeration area (ketena)
level and reported in parentheses. Statistical significance denoted with + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001. n/a = not applicable; insufficient amount of
variation in the dependent variable.




recall group are 8 percentage points (or 11 percent) more likely to report having consumed meat, poultry,
fish, or eggs in the past seven days (p < 0.01). The estimated coefficients are positive for both the fruit
and dairy food groups but are not statistically different from 0. All other food groups were consumed by
virtually all households, so there is not sufficient variation in the outcome variable to estimate
coefficients using this method. In panel B, the dependent variable is the number of days on which the
household reports consuming items from the food group. Unbounded recall increases the reported
consumption frequency of meat and eggs by 0.75 days (p < 0.001) and of fruit by 0.32 days (p < 0.05).
Meanwhile, the differences for the legume and dairy food groups are not statistically different from 0. In
panel C, the dependent variable is food group consumption, measured in birr. The choice of recall has a
particularly large influence on the reported consumption values in the “meat and eggs” food group. Mean
consumption in this group is 110 birr higher (p < 0.01) when unbounded recall is used. Considering
the mean in the bounded group of 272 birr, this finding translates into a 40 percent increase in the ap-
parent value of consumption when unbounded recall is used. Using unbounded recall also increases the
reported consumption of staple crops by 31 birr—or 10 percent (p < 0.01).
   Overall, the impact of using a bounded versus an unbounded recall is most apparent for protein-rich
foods (e.g., meat, poultry, eggs).20 Among sample households, these foods are typically less regularly
consumed than are calorie-rich staple foods (e.g., maize, wheat, teff). Therefore, one implication of the
results in tables 4–6 (given that diet diversity and quality scores also rise with consumption of the protein-
rich foods) is that the telescoping effect could be driven by infrequently consumed food items. This pattern
has also been suggested by previous studies (e.g., Friedman et al. 2017), although without the benefit of
an experiment designed to get at the telescoping effect.
   To test the hypothesis that the telescoping effect is driven by infrequently consumed food items, the
mean weekly per capita food consumption for each food item is calculated, separately for the unbounded
and bounded recall groups. The two means are then used to calculate a ratio between mean per capita
consumption in the unbounded group relative to the bounded group for each food item. After dropping
food items that were consumed by fewer than 5 percent of households, the study is left with 70 food items.21
For those items, the ratio is compared in a scatter plot to the mean number of days each food item was
reportedly consumed by all households in the sample (fig. 2). A linear regression line weighted by the price
of the food item shows that the telescoping error is larger for infrequently consumed than for frequently
consumed foods. For foods consumed nearly every day of the week, the predicted ratio is close to 1,
indicating that for these foods the telescoping error is close to 0.
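The item-level comparison above can be sketched as follows: compute the ratio of mean weekly per capita consumption in the unbounded group to that in the bounded group for each item, keep items consumed by at least 5 percent of households, and fit a price-weighted least squares line of the ratio on mean days consumed. The record type and function names are hypothetical, and using the item price directly as the regression weight is an assumption about how the weighting was implemented.

```python
from dataclasses import dataclass


@dataclass
class ItemSummary:
    """Per-item summary statistics (hypothetical layout)."""
    name: str
    mean_unbounded: float   # mean weekly per capita value, unbounded group
    mean_bounded: float     # mean weekly per capita value, bounded group
    mean_days: float        # mean days consumed, all households
    price: float            # item price, used as the regression weight
    share_consuming: float  # share of all households reporting the item


def weighted_line(xs, ys, ws):
    """Weighted least squares fit y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def fit_telescoping_gradient(items, min_share=0.05):
    """Drop rarely reported items, then regress the unbounded/bounded ratio on days."""
    kept = [it for it in items if it.share_consuming >= min_share and it.mean_bounded > 0]
    xs = [it.mean_days for it in kept]
    ys = [it.mean_unbounded / it.mean_bounded for it in kept]
    ws = [it.price for it in kept]
    intercept, slope = weighted_line(xs, ys, ws)
    return intercept, slope, len(kept)


# Usage with made-up items: the last one is reported by only 2 percent of households.
items = [
    ItemSummary("rarely eaten", 13.0, 10.0, 1.0, 80.0, 0.30),
    ItemSummary("sometimes eaten", 11.5, 10.0, 4.0, 30.0, 0.60),
    ItemSummary("daily staple", 10.2, 10.0, 6.8, 20.0, 0.95),
    ItemSummary("very rare", 50.0, 1.0, 0.2, 100.0, 0.02),
]
print(fit_telescoping_gradient(items))
```

A negative slope with an intercept above 1 corresponds to the pattern in fig. 2: larger ratios for rarely consumed items and a predicted ratio near 1 for items eaten almost every day.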

Heterogeneity by Household Characteristics
The degree of telescoping error could vary with household and respondent characteristics, if such char-
acteristics affect the nature of the reporting task or the ability to do this task. To explore this possibility,
the binary treatment variable was sequentially interacted with three control variables: a binary variable
to indicate male-headed households, the head’s education in years, and household size (table 7).22 The
coefficient on the interaction term is insignificant when the treatment variable is interacted with the bi-
nary variable capturing male-headed household (column (1)) or the variable capturing head’s education
level in years (column (2)). In contrast, the interaction term is statistically significant (p < 0.05) and neg-
ative for household size, indicating the magnitude of the telescoping error decreases with household size
(column (3)). However, this finding is partly driven by very large households with nine or more members.
When these households (about 6.5 percent of the total sample) are omitted (column (4)), the interaction
coefficient shrinks in magnitude and is no longer statistically different from 0 (p = 0.240).
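As an illustrative reading of column (3) of table 7 (not a calculation reported in the paper), the implied log-point gap between unbounded and bounded recall at household size $S$ is the main coefficient plus the interaction coefficient times $S$. This assumes household size enters the interaction uncentered, which the table note does not state; $\hat{\beta}_U$, $\hat{\beta}_{U \times S}$, and $\widehat{\Delta}(S)$ are labels introduced here.

```latex
\widehat{\Delta}(S) = \hat{\beta}_{U} + \hat{\beta}_{U \times S}\, S
  \approx 0.310 - 0.035\, S,
\qquad
\widehat{\Delta}(4) \approx 0.310 - 0.140 = 0.170,
\qquad
\widehat{\Delta}(9) \approx 0.310 - 0.315 = -0.005
```

At four members the implied gap is similar in size to the pooled estimate of 0.152 in table 3, and it falls to about zero at nine members, consistent with the declining pattern described above.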




20    Dairy products are also protein rich and are infrequently consumed, yet the coefficient estimates on dairy products
      are not statistically different from 0 for any of the outcomes studied in table 6. However, one must be cautious about
      interpreting the lack of rejection of the null hypothesis as “no effect.” The coefficients are all positive, small relative
      to the average value of each outcome in the bounded group, and somewhat imprecisely estimated. It could be that the
      experiment simply lacked statistical power to estimate positive coefficients with p-values in the standard rejection range.
21    Five percent of the sample is about 45 households. There were 58 food items consumed by fewer than 45
      households. For these food items, the ratio of unbounded recall to bounded recall becomes excessively
      sensitive to extreme values because the two means are calculated from fewer than 45 observations. For
      this reason, these food items are omitted from the sample for the scatter plot and regression presented in fig. 2.
22    Variables measuring the gender and education level of the household head come from an earlier survey conducted among
      the same households, which is detailed in Melesse et al. (2019). The gender of the respondent to the consumption survey
      was recorded, but the respondent code cannot be linked back to the household roster from previous surveys.
Figure 2. Consumption Frequency and Telescoping Error

Source: Authors’ calculations on Addis Ababa survey data.
Note: N = 70 food items; food items consumed by fewer than 5 percent of the households were dropped. The vertical axis measures the mean weekly per capita food
consumption in the unbounded sample relative to the bounded sample. These two means are equal when the ratio equals 1, marked by the dashed horizontal line. The
horizontal axis measures the mean number of days the item was consumed by all households. The fitted line (solid black line) is based on a weighted linear regression
that puts more weight on more expensive food items. The shaded area around the fitted line is the 95 percent confidence interval (CI).


Table 7. Regression Results from Interaction Models

Dependent variable: (ln) total weekly food consumption per capita

                                              (1)        (2)        (3)        (4)
Unbounded recall                              0.171**    0.181**    0.310**    0.264**
                                              (0.051)    (0.058)    (0.088)    (0.091)
Unbounded recall * Male-headed household      −0.038
                                              (0.064)
Unbounded recall * Head’s education in years             −0.005
                                                         (0.006)
Unbounded recall * Household size                                   −0.035*    −0.023
                                                                    (0.017)    (0.020)
Household level controls?                     Yes        Yes        Yes        Yes
Sub-city fixed effects?                       Yes        Yes        Yes        Yes

Observations                                  890        890        890        832
R²                                            0.193      0.193      0.196      0.193

Source: Authors’ calculations on Addis Ababa survey data.
Note: The unit of observation is the household. The dependent variable in all columns is (ln) total weekly food consumption per capita (in birr). Household level
controls include household size (number of members), indicator variable for male-headed households, head’s education in years, and treatment status in the video
experiment. Column (4) removes households with 9 or more members. Standard errors are clustered at the enumeration area (ketena) level and reported in parentheses.
Statistical significance denoted with + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.
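Because column (3) interacts unbounded recall with a continuous variable, the implied effect of unbounded recall varies with household size. A minimal sketch of that arithmetic follows. It assumes that household size enters the model uncentered (the table does not say), so that the main coefficient is the effect at a household size of zero, and it converts log points to percent differences with exp(b) - 1, ignoring sampling variance. The function name is hypothetical.

```python
import math

# Coefficients from column (3) of table 7
B_UNBOUNDED = 0.310     # Unbounded recall
B_INTERACTION = -0.035  # Unbounded recall * Household size


def unbounded_effect(hh_size):
    """Implied log-point effect of unbounded recall at a given household size.

    Assumes household size enters the model uncentered.
    """
    return B_UNBOUNDED + B_INTERACTION * hh_size


for size in (2, 4, 6, 8):
    b = unbounded_effect(size)
    pct = 100 * (math.exp(b) - 1)  # log points to percent difference
    print(f"household size {size}: {b:.3f} log points, about {pct:.1f} percent")
```

Under these assumptions the implied effect is about 0.17 log points for a four-member household, close to the column (1) estimate, and it shrinks toward zero for larger households.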


Robustness Checks
The results of several robustness checks are reported in S2 in the supplementary online appendix. First,
the sensitivity of the main estimates to outliers is tested by adding the five households with implausibly
large food consumption back into the sample and by using non-winsorized consumption values; both
changes yield coefficients similar to those reported in column (4) of table 3 (table S2.1 in the supplementary
online appendix). Second, column 2 in table S2.2 shows that the results are similar if median regression
based on the least absolute deviation procedure is used; the least absolute deviation procedure is less
sensitive to outliers or other extreme values than OLS (Koenker and Bassett 1978).23 Moreover, while the
estimated impact of the unbounded recall based on the quantile regressions seems slightly larger for richer
households, the difference between the impacts estimated at the 25th and 75th percentiles of food consumption
is not statistically different from 0 (table S2.2). Third, the survey experiment was carried out during an
endline evaluation of a video intervention that encouraged households to consume more fruit and
vegetables. While regressions do control for treatment status in the video intervention, it is possible that the
video treatment amplified the effect of (un)bounding in the survey experiment. To explore this possibility,
the treatment status in the video intervention is interacted with the binary variable representing
membership in the unbounded recall group. Whether log food expenditures, calorie consumption, or protein
consumption is used as the dependent variable, none of the coefficient estimates on the interaction terms
are statistically different from 0, and they are all relatively small and imprecisely estimated, indicating
that the video experiment does not influence the findings (table S2.3). Fourth, table S2.4 replicates table 3
using an ANCOVA estimator, by adding household per capita consumption measured in September 2019
as an additional control variable. The coefficients are very similar to those reported in table 3.
    Finally, two robustness checks are conducted on consumption frequency and telescoping error (fig. 2).
First, the regression line in the figure is re-estimated without the weight on more expensive food items (fig.
S2.1). The slope of the unweighted regression line remains negative but is slightly less steep than the one
in fig. 2. Second, rather than omitting food items that were consumed by less than 5 percent of sample
households, an alternative way to deal with these imprecisely estimated values is to winsorize small and
large values of the ratio of the mean consumption values. Specifically, fig. S2.2 replicates fig. 2 based on
113 food items, winsorized at the 5th and 95th percentiles.24 As before, the telescoping error appears to be
driven by the infrequently consumed food items.
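Fig. S2.2 winsorizes the item-level consumption ratios at the 5th and 95th percentiles instead of dropping rarely consumed items. The sketch below shows one way to do this. The linear-interpolation percentile rule is an assumption (the paper does not state which percentile definition was used), and the example ratios and function names are hypothetical.

```python
def percentile(sorted_vals, p):
    """Percentile p (0-100) of an ascending list, by linear interpolation."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(values, lower=5.0, upper=95.0):
    """Clip each value to the [lower, upper] percentile range of the data."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Hypothetical unbounded-to-bounded ratios; the extremes stand in for items
# reported by only a handful of households.
ratios = [0.05, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 12.0]
print(winsorize(ratios))
```

Unlike trimming, winsorizing keeps every item in the scatterplot but caps the influence of the imprecise extreme ratios on the fitted line.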

Alternative Interpretations
This study attributes the results to forward telescoping. The higher reported food
consumption of the group surveyed with an unbounded recall is in line with concerns about the cognitive
burden of the memory task imposed on survey respondents. Therefore, any features of the survey design
that make this memory task easier, such as the use of a bounding visit to demarcate the beginning of the
recall period in the mind of the respondent, should yield data that are closer to the truth.
    However, this study acknowledges an alternative explanation for the lower reported food consump-
tion of the households who are given the two-visit, bounded, recall. It is possible that visiting the same
household twice, in quick succession, leads to declining compliance (Schündeln 2018), and less cooperative
respondents may under-report consumption in order to finish the interview sooner. For example,
there is a Yes or No screening question for each of the 128 food items, asking whether there was any
consumption within the last seven days (or since the visit by the survey supervisor) and uncooperative
respondents might say No when the true answer is Yes. In this case, the two-visit, bounded, recall might
result in food consumption data that are further from the truth.
    This alternative explanation, however, is unlikely, for four reasons. First, the initial visit that the survey
supervisor made to the household was very quick, usually taking less than five minutes. So respondents
in the two-visit bounded recall group did not bear a time cost much greater than what the respondents in

23    Note that in using quantile regression the assumption underlying randomization necessarily changes; when randomizing,
      the assumption is that the average value of the outcome of interest would be the same for both groups in the absence of
      the treatment. In a quantile regression, the causal effect estimated is on the distribution of outcomes at the group level,
      rather than at the quantile specifically, as it cannot be assumed that individual observations would not change ranking
      due to the treatment (e.g., Meager 2022).
24    Fifteen food items that were consumed by neither the unbounded nor the bounded recall group were dropped.
the one-visit unbounded recall group experienced. Second, the patterns of the bounding visit having more
effect on reported consumption of small households and of rarely consumed foods can be explained by
telescoping (in conjunction with a hypothesis about using rate-based rule-of-thumb reporting when the
number of episodes is larger) but are less easily explained by reduced cooperation, which should have
effects across the board. Third, declining cooperation should especially show up for foods that occur late
in the list of the 128 foods, as respondents learn the pattern of the questions and start answering No
when the true answer is Yes for the screening question, but there appears to be no such effect of position
in the food list.25 Fourth, Schündeln (2018) uses Benford’s Law as a benchmark for assessing data quality
in his study of declining respondent compliance in each successive survey visit; applying the same
benchmark here, there is no difference between the two groups in how the pattern of digits departs from
Benford’s Law (S3 in the supplementary online appendix; regression results are in tables S3.1 and S3.2).
Thus, there is no evidence of less cooperative respondents in the two-visit bounded recall group.
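Benford’s Law gives the expected share of leading digit d as log10(1 + 1/d). The study's own tests are the regressions in tables S3.1 and S3.2; the sketch below is only a simpler descriptive check. It assumes that conformity is summarized by the mean absolute deviation (MAD) between observed and expected first-digit shares, computed separately for each recall group. The helper names and the spending values are hypothetical.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's Law: log10(1 + 1/d), d = 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a positive number."""
    # Strip leading zeros and the decimal point, e.g. "0.0045" -> "45"
    return int(str(abs(x)).lstrip("0.")[0])


def benford_mad(values):
    """Mean absolute deviation of first-digit shares from Benford's Law."""
    digits = [first_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10)) / 9


# Hypothetical reported spending amounts (birr) for the two recall groups
bounded = [12, 25, 30, 8, 150, 18, 45, 60, 110, 14]
unbounded = [15, 22, 35, 9, 170, 19, 40, 75, 120, 13]
print(benford_mad(bounded), benford_mad(unbounded))
```

A similar MAD in both groups would be in line with the finding of no difference in data quality; with real data one would use far more reported values than this toy example.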

Adjustment Factors and Mean-Reverting Error Patterns
There are far more food consumption surveys carried out every year than there are experiments that
can inform about the sensitivity of results to different design choices. Thus, survey experiments are
commonly asked to provide adjustment factors that may let analysts take data collected in different ways
and line them up on a comparable basis. For example, the widely used Deininger and Squire (1996)
database of inequality estimates has Gini coefficients from both expenditure surveys and income surveys,
with expenditure-based Gini coefficients being 6.6 points lower, on average; thus, some analysts combine
the two types of Gini by adding 6.6 points to expenditure-based ones. In the current setting, unbounded
recall gives the equivalent of an entire extra day of consumption in the report for what is ostensibly the
last seven days of food consumption. Therefore, an analyst might be tempted to annualize this estimate
by multiplying by (365/8) rather than by (365/7) to make up for the overstatement that results from
telescoping.26
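The arithmetic behind that tempting correction is short. The sketch below uses the headline 16 percent overstatement to show how it maps to roughly one extra day in a seven-day recall, and what the 365/8 shortcut implies; the variable names are illustrative, and the calculation is shown to make the shortcut concrete, not to endorse it.

```python
RECALL_DAYS = 7
OVERSTATEMENT = 0.16  # unbounded recall reports about 16 percent more food consumption

extra_days = RECALL_DAYS * OVERSTATEMENT  # about 1.1 days of extra consumption
naive_factor = 365 / RECALL_DAYS          # usual weekly-to-annual scaling
tempting_factor = 365 / (RECALL_DAYS + round(extra_days))  # the 365/8 shortcut
print(f"extra days: {extra_days:.2f}")
print(f"365/7 = {naive_factor:.2f}, 365/8 = {tempting_factor:.2f}")
print(f"implied downward scaling: {1 - tempting_factor / naive_factor:.1%}")
```

The shortcut scales annualized values down by 12.5 percent, slightly less than the 1 - 1/1.16 (about 13.8 percent) that would exactly undo a uniform 16 percent overstatement. Either way, a single factor presumes the overstatement is uniform across households.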
   Evidence from other survey experiments suggests that such adjustment factors are rarely available.
One reason is that errors are mean-reverting, contrary to classical assumptions of measurement errors
being uncorrelated with anything of interest. For example, Pradhan (2009) considers evidence from the
SUSENAS survey in Indonesia in which some households get a short list of broadly defined items for their
consumption recall while others get a far longer list of narrowly defined items. Using fewer questions yields
lower consumption, but the fraction by which consumption is underestimated increases as consumption
rises, so without data on actual consumption it is not possible to devise a simple correction factor to line
up data from the two survey designs. Other consumption surveys also show this mean-reverting error
pattern (Gibson et al. 2015).
   The ideal way to test for mean-reverting error is to regress a noisy measure on the true measure, for
the same household. If errors are random, the slope coefficient of this regression should be 1 (and the
intercept will be 0), while with mean-reverting errors the slope coefficient is less than 1. The design of this
experiment does not provide two measures for the same household, so instead the approach of Gibson
et al. (2015) is followed by taking the mean of the (log) per capita household food consumption across

25   The study tested this point by regressing the ratio of Yes responses in the unbounded recall group to those in the
     bounded recall group on the order of the item in the food list (i.e., #1 through to #128). The estimated coefficient on
     the food item order variable was 0.0017 with a White (1980) adjusted 95 percent confidence interval ranging between
     −0.0015 and 0.0049 (p-value = 0.291). In other words, the position in the food list is not correlated with the ratio of
     Yes responses between the unbounded and bounded recall groups.
26   This approach uses a naïve extrapolation to annualize. Even if the sample is staggered over the weeks in the year so
     that extrapolation to annual means and totals is correct, any variance-based measures (including the share of
     observations in the lower tail, such as a poverty rate or a hunger rate) will be overstated, because short-term shocks
     that are subsequently reversed (at least partially) in the rest of the year are ignored (Gibson 2020).
each sampling unit (ketena), separately for both recall groups. The ketena level means for the unbounded
recall group (the noisier measure) are then regressed on the means for the bounded recall group. The slope
coefficient is 0.201 with a White (1980) adjusted 95 percent confidence interval ranging between −0.030
and 0.433. Consequently, the null hypothesis that the errors are random (i.e., coefficient equals 1) is firmly
rejected (p < 0.0001) in favor of the alternative hypothesis of a mean-reverting error structure.
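The sketch below illustrates the ideal test on simulated data, where the true measure is known: an OLS slope of the noisy measure on the true one, with a White (1980) heteroskedasticity-robust (HC0) standard error, and a test of the null that the slope equals 1. It then applies the same test to the reported ketena-level estimate, backing the standard error out of the reported 95 percent confidence interval under a normal approximation (an assumption; the text reports only the interval and the p-value bound). The sample size, variances, mean-reversion parameter of 0.3, and function names are hypothetical, and the study's own regression uses group means rather than paired measures.

```python
import math
import random


def ols_hc0(x, y):
    """OLS of y on x (with intercept): slope and White (HC0) standard error."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich variance for the slope in a one-regressor model
    var = sum(((xi - xbar) * ei) ** 2 for xi, ei in zip(x, resid)) / sxx ** 2
    return slope, math.sqrt(var)


def p_value_slope_is_one(slope, se):
    """Two-sided normal p-value for H0: slope = 1."""
    z = (slope - 1.0) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


random.seed(0)
true = [random.gauss(7.0, 0.4) for _ in range(200)]  # hypothetical log consumption
mu = sum(true) / len(true)
# Classical error: independent of the true value
classical = [t + random.gauss(0, 0.2) for t in true]
# Mean-reverting error: noisy - true = -0.7 * (true - mu) + noise
reverting = [mu + 0.3 * (t - mu) + random.gauss(0, 0.2) for t in true]

b_c, se_c = ols_hc0(true, classical)
b_r, se_r = ols_hc0(true, reverting)
print(f"classical: slope {b_c:.2f}, p(slope=1) {p_value_slope_is_one(b_c, se_c):.3f}")
print(f"mean-reverting: slope {b_r:.2f}, p(slope=1) {p_value_slope_is_one(b_r, se_r):.2g}")

# Reported ketena-level estimate: slope 0.201, 95 percent CI (-0.030, 0.433)
se_reported = (0.433 - (-0.030)) / (2 * 1.96)
z_reported = (0.201 - 1) / se_reported
print(f"reported: z = {z_reported:.2f}, p = {p_value_slope_is_one(0.201, se_reported):.1e}")
```

With these inputs the reported slope lies roughly 6.8 standard errors below 1, in line with the p < 0.0001 rejection stated in the text.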




   This error structure matters because with mean-reverting errors the bias in coefficient estimates can
go in either direction, rather than only toward zero (attenuation) as happens with classically mismeasured
right-hand-side variables (Abay et al. 2019). As a result, simple adjustment factors to account for telescoping
errors cannot be easily devised, and the typical econometric approach for mitigating errors-in-variables bias,
using instrumental variables, is unlikely to be successful with mean-reverting errors (Gibson et al. 2015).
Therefore, preventing these errors from occurring during data collection seems the only viable option (De
Weerdt, Gibson, and Beegle 2020).
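A short derivation shows why the direction of bias becomes ambiguous. Suppose, as a stylized assumption, that the outcome is y = beta * x_true + e and the observed regressor is x = lam * x_true + u, with u and e independent of x_true and of each other. Here lam = 1 is the classical case, and lam < 1 makes the measurement error x - x_true negatively correlated with the truth (mean-reverting). The OLS slope then converges to beta * lam * var(x_true) / (lam**2 * var(x_true) + var(u)), which can lie above or below beta, while an instrument correlated with x_true but not with u or e recovers beta / lam rather than beta. The function names and parameter values are hypothetical.

```python
def plim_ols(beta, lam, var_true, var_u):
    """Probability limit of the OLS slope of y on x = lam * x_true + u."""
    return beta * lam * var_true / (lam ** 2 * var_true + var_u)


def plim_iv(beta, lam):
    """Probability limit of an IV slope when the instrument is correlated
    with x_true but independent of u and of the outcome error."""
    return beta / lam


print(plim_ols(1.0, 1.0, 1.0, 0.1))  # classical error: attenuated, below 1
print(plim_ols(1.0, 0.5, 1.0, 0.1))  # mean-reverting error: inflated, above 1
print(plim_iv(1.0, 0.5))             # IV does not recover beta = 1
```

The IV result is the key point: a valid instrument removes the noise term u but not the mean-reverting scaling lam, so it cannot undo this kind of error.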



5. Cost Implications
One unexpected finding from recent survey experiments is that design variations that are expected to save
either time or money often do neither, while potentially giving lower-quality data (De Weerdt, Gibson,
and Beegle 2020). For example, in the Tanzania experiment analyzed by Beegle et al. (2012), cutting the
number of items on the food recall list from 58 to 17 reduced interview time by just 15 percent on average
(going from 49 minutes to 41 minutes). Likewise, in an experiment in Indonesia, cutting the length of the
food recall list from 229 items to just 21 reduced the average interview time (for food at home) by just
nine minutes (Durazo et al. 2017).
   It appears that in the setting of this experiment, in Addis Ababa, using the unbounded recall as a way
to save time and money has a similarly modest effect on costs, while opening up the results to the impact
of telescoping errors, as shown above. The cost comparison for the two ways of implementing the recall
survey in the case of the present experiment is straightforward, as the two arms are identical in all aspects
of the survey design and administration (table 1), except that households in the bounded group were
visited by a survey supervisor seven days prior to the actual survey to establish the recall marker in the
minds of respondents. In practice, this procedure required an additional seven days of field work by
survey supervisors before the actual survey began, so that households scheduled to be interviewed
during the first week of the survey could be visited. Once the actual survey was under way, the regular
appointment visits by survey supervisors were simply rescheduled to take place seven days before the actual
survey for households in the bounded group. Thus, the field expenses associated with an extra seven days
of field work by supervisors prior to the actual survey were the only additional costs due to bounded
recall. A back-of-the-envelope calculation suggests these extra costs averaged US$3.60 per household,
which is 6.5 percent higher than the cost for the unbounded recall.
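The text gives the extra cost both in absolute terms (US$3.60 per household) and in relative terms (6.5 percent), which together imply the per-household cost of each arm. A back-of-the-envelope sketch of that division follows; the implied levels are derived here, not reported in the paper, and are only approximate because the 6.5 percent figure is rounded.

```python
EXTRA_COST_USD = 3.60   # additional cost per household of bounded recall
RELATIVE_EXTRA = 0.065  # bounded recall costs 6.5 percent more than unbounded

unbounded_cost = EXTRA_COST_USD / RELATIVE_EXTRA  # implied cost of unbounded recall
bounded_cost = unbounded_cost + EXTRA_COST_USD
print(f"unbounded: about US${unbounded_cost:.2f} per household")
print(f"bounded:   about US${bounded_cost:.2f} per household")

# Sensitivity to rounding of the 6.5 percent figure
for rel in (0.0645, 0.0655):
    print(f"if the gap were {rel:.2%}: unbounded about US${EXTRA_COST_USD / rel:.2f}")
```

On these numbers each arm costs roughly US$55 to US$59 per household, far below the more than US$1,000 per household reported for the Marshall Islands experiment (Sharp et al. 2022).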
   In other settings, the cost of using a two-visit format to implement bounded recall may be higher.
For example, in surveys of rural communities where transportation costs would be higher, arranging
two visits to the household would have a bigger impact on both overall costs and the time demands
on supervisors. Indeed, in the experiment in the Marshall Islands (Sharp et al. 2022), where the overall
cost was more than US$1,000 per household partly due to expensive boat travel to atoll locations, using
the two-visit recall format versus single-visit unbounded recall appears to have increased costs by at
least 30 percent. However, the bounded approach can be feasible in other rural surveys, especially with
resident enumerators. For example, the Ethiopian household consumption expenditure survey (HCES),
a nationally representative survey and the official source of poverty statistics in the country, is based on
a two-visit structure (without bounded recall). At least in Ethiopia, the timing of the HCES visits can
potentially be adjusted with minimal cost implications.
   Clearly more experiments are needed to understand both the cost and benefit implications of the two-
visit format in different settings. However, at least in urban areas the cost savings from using an unbounded
recall may not be substantial. Yet it is in urban areas where unbounded recall surveys may be most
susceptible to telescoping errors; it is urban households whose diets are likely to include infrequently
consumed nonstaple foods such as meat and eggs that appear to be most prone to overstated reports of
consumption due to telescoping. Thus, a benefit-cost comparison of the two-visit format may be most
favorable in urban areas.


6. Discussion and Conclusions
This paper reports on a survey experiment in Addis Ababa conducted with a sample of households that
were asked to recall seven days of food consumption. The aim of the experiment was to help understand
the effects of telescoping errors on food consumption measures. A randomly selected subsample had their
seven-day recall bounded by a short visit from survey supervisors seven days before the actual enumeration
(and the questions were then phrased in terms of consumption since the visit by the survey supervisor).
Their reported food consumption is compared with that of a subsample whose food consumption was
reported with an unbounded recall. The unbounded recall survey design is currently widely used, while
the bounded recall design was used in the past in the early stages of the LSMS surveys. The paper finds that
reported per capita household food consumption expenditures are 16 percent higher among households in
the unbounded group. The apparent per capita protein consumption for this subsample is also 16 percent
higher and per capita calories 9 percent higher than among the bounded recall group. These differences
are most likely due to forward telescoping, which appears predominantly to occur for foods that are less
frequently consumed, such as animal-source foods.
    The results have potentially important implications for three types of measurements, all of which are
of considerable importance for measuring progress towards attaining SDGs related to poverty, food
insecurity, and inequality. First, poverty measurement might be affected in two ways. If households given
unbounded recall overstate food consumption, then they may appear less poor than they really are. For
example, for a household marginally below the poverty line, reported consumption that includes the
telescoping error could put it just above the line, and so the true poverty rate would be
understated. The strength of this effect depends partly on the type of poverty line: if it is either based on
a global standard (e.g., $1.90 per capita per day) or it is a poverty line established long ago whose value
is simply updated with some price index like a Consumer Price Index (CPI), then the error is in the welfare
measure but not in the threshold used to distinguish the poor from the nonpoor. Given that, at least
pre-COVID, consumption trends appear to include more luxuries in many countries, these trends could
lead to an overestimation of poverty reduction. On the other hand, if the poverty line and the welfare
measure are derived from the same survey that is subject to telescoping bias (as could be the case for a
cost-of-basic-needs food poverty line), then the errors might cancel out.27
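The mechanism can be illustrated with a headcount ratio. The sketch below is stylized: it applies a uniform 16 percent overstatement to a hypothetical per capita consumption aggregate, whereas in the experiment the overstatement concerns food consumption and the errors are mean-reverting rather than proportional. It contrasts a fixed poverty line with one scaled by the same overstatement, the case in which the errors would cancel. The household values, the line, and the function name are hypothetical.

```python
def headcount(consumption, line):
    """Share of households whose per capita consumption is below the line."""
    return sum(c < line for c in consumption) / len(consumption)


# Hypothetical per capita consumption for ten households, and a poverty line
true_c = [40, 55, 62, 68, 75, 90, 110, 150, 210, 300]
line = 70.0

reported = [1.16 * c for c in true_c]  # stylized uniform overstatement

print("true headcount:", headcount(true_c, line))
print("fixed line, overstated welfare:", headcount(reported, line))
print("line scaled by the same factor:", headcount(reported, 1.16 * line))
```

Two of the four poor households in this example cross a fixed line once their consumption is overstated, halving the measured headcount, while scaling the line by the same factor leaves it unchanged.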
    Second, estimates of food insecurity, of dietary quality, and more generally of hunger, that rely on food
consumption surveys are likely to be affected by these telescoping errors. There is already considerable
debate about these measurements and the role of household consumption expenditure survey data (see,
for example, De Weerdt et al. 2016 for a summary). Given that telescoping errors have a larger effect for a
shorter recall period, the recent recommendation by the FAO and World Bank (2018) to harmonize food
consumption surveys in low- and middle-income countries by using a one-week recall period may see a
spurious rise in apparent diet quality and a fall in apparent hunger, because the bias due to telescoping

27   However, the usual approach to setting cost-of-basic-needs food poverty lines relies only on a calorie target, and the
     results here show that calories are less overstated by telescoping bias than is overall food consumption. So once this
     feature of poverty lines is allowed for, the error in the welfare measure could be larger than the error in the threshold.
     This feature may be another reason to use food poverty lines that are derived from linear programming to consider
     dietary requirements for all macro- and micronutrients (Ameye, De Weerdt, and Gibson 2021).
will be relatively more important than it was when food consumption surveys were using longer recall
periods.
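One way to see why a shorter recall period magnifies telescoping bias: if, as a stylized assumption, telescoping pulls in a roughly fixed amount of consumption from just outside the reference period regardless of its length, the relative overstatement falls in proportion to the number of days recalled. The one-day default below is hypothetical, chosen to echo the roughly one extra day found here for a seven-day recall, and the function name is illustrative.

```python
def relative_bias(recall_days, telescoped_days=1.0):
    """Overstatement as a share of true consumption, assuming telescoping
    adds a fixed number of days' worth of consumption to any recall period."""
    return telescoped_days / recall_days


for days in (7, 14, 30):
    print(f"{days}-day recall: overstated by about {100 * relative_bias(days):.1f} percent")
```

Under this assumption the same telescoped amount that inflates a one-week recall by about 14 percent would inflate a one-month recall by only about 3 percent.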
    Third, consider again the fact that infrequently consumed foods, including those higher in protein,
appear to be the most overstated in a seven-day recall relative to the effect on reported consumption of
foods that are consumed more frequently (such as major staples). It is easy therefore to see that the share
of protein-rich foods in the overall food budget will be overstated, and this will tend to bias food-demand
elasticities (Deaton 1997). As these elasticities are used in macro-level models that are attempting to
measure food demand, the results of such models should be treated with caution when predicting demand
for anything but staple foods.
    There are two obvious ways in which the experiment here could be extended to learn more about these
telescoping effects. The first is to include both urban and rural households in any future experiment, as the
costs of implementing the bounded recall approach may be higher, and the benefits lower, in rural areas.
The second extension is to consider other ways to reduce telescoping errors without using the two-visit
bounded recall approach. For example, if corroborating evidence shows that infrequently
consumed foods are especially prone to telescoping, survey experiments could examine other ways to ask
questions about these foods even within the single-visit format. The standard design is to use
a fixed interval, such as the last seven days, and then ask for each food in turn whether a consumption
episode occurred within that interval. For the infrequently consumed foods, it is evidently difficult for
many respondents to correctly place the last consumption episodes into this interval. It may instead be
more natural for some foods to have respondents answer in terms of when was the last occasion that
the food was consumed, perhaps with some prompts about the sort of occasions when such foods might
be consumed—especially for expensive and rarely consumed foods—and to then “unfold” the questions
from that remembered event.


Data Availability
Raw data for this article are available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/Z51Y1T.
Replication data are available at IFPRI’s dataverse: https://dataverse.harvard.edu/dataverse/IFPRI.


References
Abate, G.T., K. Baye, A. de Brauw, K. Hirvonen, and A. Wolle. 2021. “Video-Based Behavioral Change Communication
  to Change Consumption Patterns: Experimental Evidence from Urban Ethiopia.” IFPRI Discussion Paper no.
  2052. International Food Policy Research Institute (IFPRI). Washington, DC, USA.
Abay, K.A., G.T. Abate, C.B. Barrett, and T. Bernard. 2019. “Correlated Non-Classical Measurement Errors, ‘Second
  Best’ Policy Inference, and the Inverse Size-Productivity Relationship in Agriculture.” Journal of Development
  Economics 139: 171–84.
Ameye, H., J. De Weerdt, and J. Gibson. 2021. “Measuring Macro- and Micronutrient Consumption in Multi-Purpose
  Surveys: Evidence from a Survey Experiment in Tanzania.” Food Policy 102 (July): 102042.
Backiny-Yetna, P., D. Steele, and I.Y. Djima. 2017. “The Impact of Household Food Consumption Data Collection
  Methods on Poverty and Inequality Measures in Niger.” Food Policy 72 (October): 7–19.
Beegle, K., J. De Weerdt, J. Friedman, and J. Gibson. 2012. “Methods of Household Consumption Measurement
  through Surveys: Experimental Results from Tanzania.” Journal of Development Economics 98 (1): 3–18.
Brzozowski, M., T.F. Crossley, and J.K. Winter. 2017. “A Comparison of Recall and Diary Food Expenditure Data.”
  Food Policy 72: 53–61.
Chang, L., and J.A. Krosnick. 2003. “Measuring the Frequency of Regular Behaviors: Comparing the ‘Typical Week’
  to the ‘Past Week.’” Sociological Methodology 33 (1): 55–80.
Cole, D., and J. Utting. 1956. “Estimating Expenditure, Saving and Income from Household Budgets.” Journal of the
  Royal Statistical Society. Series A (General) 119 (4): 371–92.
De Weerdt, J., K. Beegle, J. Friedman, and J. Gibson. 2016. “The Challenge of Measuring Hunger through Survey.”
   Economic Development and Cultural Change 64 (4): 727–58.
De Weerdt, J., J. Gibson, and K. Beegle. 2020. “What Can We Learn from Experimenting with Survey Methods?”
   Annual Review of Resource Economics 12 (1): 431–47.
Deaton, A. 1997. The Analysis of Household Surveys: A Microeconometric Approach to Development Policy.
   Baltimore: Published for the World Bank by Johns Hopkins University Press.
Deaton, A., and M. Grosh. 2000. “Consumption.” In Designing Household Survey Questionnaires for Developing
   Countries: Lessons from 15 Years of Living Standards Measurement Study, Vol. 1. Edited by M. Grosh and P.
   Glewwe, 91–133. Washington, DC: World Bank.
Deininger, K., and L. Squire. 1996. “A New Data Set Measuring Income Inequality.” World Bank Economic Review
   10 (3): 565–91.
Dex, S. 1995. “The Reliability of Recall Data: A Literature Review.” Bulletin de Methodologie Sociologique 49 (1):
   58–89.
Durazo, J., Y. Herawati, G. Pattinasarany, and D. Jolliffe. 2017. “Bridging Surveys: A Methodological Experiment
   of Food Consumption Data in Indonesia.” Paper presented at the 15th European Association of Agricultural
   Economists Congress, Parma, Italy, August 28 to September 1.
Eisenhower, D., N.A. Mathiowetz, and D. Morganstein. 2004. “Recall Error: Sources and Bias Reduction Techniques.”
   In Measurement Errors in Surveys, 125–44.
Ethiopia Public Health Institute (EPHI). 1981. Expanded Food Composition Table for Use in Ethiopia. Addis Ababa:
   EPHI.
Food and Agriculture Organization of the United Nations (FAO) and the World Bank. 2018. Food Data
   Collection in Household Consumption and Expenditure Surveys: Guidelines for Low- and Middle-Income
   Countries. FAO and the World Bank, Rome and Washington, DC, USA. Accessed on 21 June, 2020 from:
   https://openknowledge.worldbank.org/handle/10986/32503.
Friedman, J., K. Beegle, J. De Weerdt, and J. Gibson. 2017. “Decomposing Response Error in Food Consumption
   Measurement: Implications for Survey Design from a Randomized Survey Experiment in Tanzania.” Food Policy
   72: 94–111.
Gibson, J. 2020. “Measuring Chronic Hunger from Diet Snapshots.” Economic Development and Cultural Change
   68 (3): 813–38.
Gibson, J., K. Beegle, J. De Weerdt, and J. Friedman. 2015. “What Does Variation in Survey Design Reveal about the
   Nature of Measurement Errors in Household Consumption?” Oxford Bulletin of Economics and Statistics 77 (3):
   466–74.
Grootaert, C. 1986. “Measuring and Analyzing Levels of Living in Developing Countries: An Annotated
   Questionnaire.” Living Standards Measurement Study (LSMS) Working Paper, LSMS 24. World Bank. Washington,
   DC, USA.
Hirvonen, K., G.T. Abate, and A. de Brauw. 2020. “Food and Nutrition Security in Addis Ababa, Ethiopia during
   COVID-19 Pandemic: May 2020 Report.” IFPRI-ESSP Working Paper, 143. Ethiopia Strategy Support Program
   (ESSP) of the International Food Policy Research Institute (IFPRI). Washington, DC, USA.
Hirvonen, K., A.S. Taffesse, and I. Worku. 2016. “Seasonality and Household Diets in Ethiopia.” Public Health
   Nutrition 19 (10): 1723–30.
Hoddinott, J., and Y. Yohannes. 2002. “Dietary Diversity as a Food Security Indicator.” IFPRI-FCND Discussion
   Paper 136. International Food Policy Research Institute (IFPRI). Washington, DC, USA.
Koenker, R., and G. Bassett. 1978. “Regression Quantiles.” Econometrica 46 (1): 33–50.
Mahalanobis, P., and S. Sen. 1953. “On Some Aspects of the Indian National Sample Survey.” Bulletin de l’Institut
   International de Statistique 34 (2): 5–14.
McKenzie, D., and M. Rosenzweig. 2012. “Preface for Symposium on Measurement and Survey Design.” Journal of
   Development Economics 98 (1): 1–2.
Meager, R. 2022. “Aggregating Distributional Treatment Effects: A Bayesian Hierarchical Analysis of the Microcredit
   Literature.” American Economic Review 112 (6): 1818–47.
Melesse, M.B., M. van den Berg, A. de Brauw, and G.T. Abate. 2019. “Understanding Urban Consumers’ Food Choice
   Behavior in Ethiopia: Promoting Demand for Healthy Foods.” IFPRI-ESSP Working Paper 131. International Food
   Policy Research Institute (IFPRI). Washington, DC, USA.
Morwitz, V.G. 1997. “It Seems like Only Yesterday: The Nature and Consequences of Telescoping Errors in Marketing
  Research.” Journal of Consumer Psychology 6 (1): 1–29.
Neter, J., and J. Waksberg. 1964. “A Study of Response Errors in Expenditures Data from Household Interviews.”
  Journal of the American Statistical Association 59 (305): 18–55.
Pradhan, M. 2009. “Welfare Analysis with a Proxy Consumption Measure: Evidence from a Repeated Experiment in
  Indonesia.” Fiscal Studies 30 (3–4): 391–417.
Schündeln, M. 2018. “Multiple Visits and Data Quality in Household Surveys.” Oxford Bulletin of Economics and
  Statistics 80 (2): 380–405.
Sharp, M.K., B. Buffière, K. Himelein, N. Troubat, and J. Gibson. 2022. “Effects of Data Collection Methods on
  Estimated Household Consumption and Survey Costs.” World Bank Policy Research Working Paper 10029. The
  World Bank. Washington, DC, USA.
Swindale, A., and P. Bilinsky. 2006. Household Dietary Diversity Score (HDDS) for Measurement of Household Food
  Access: Indicator Guide. Food and Nutrition Technical Assistance Project, Academy for Educational Development,
  Washington, DC: FANTA FHI 360.
United States Department of Agriculture (USDA). 2013. National Nutrient Database for Standard Reference, Release
  28.
van Garderen, K., and C. Shah. 2002. “Exact Interpretation of Dummy Variables in Semilogarithmic Equations.”
  Econometrics Journal 5 (1): 149–59.
White, H. 1980. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
  Heteroskedasticity.” Econometrica 48 (4): 817–38.
World Food Programme (WFP). 2008. Food Consumption Analysis: Calculation and Use of the Food Consumption
  Score in Food Security Analysis. World Food Programme (WFP), Vulnerability Analysis and Mapping Branch
  (ODAV), Rome, Italy.
Zezza, A., C. Carletto, J.L. Fiedler, P. Gennari, and D. Jolliffe. 2017. “Food Counts. Measuring Food Consumption
  and Expenditures in Household Consumption and Expenditure Surveys (HCES). Introduction to the Special Issue.”
  Food Policy 72: 1–6.