Policy Research Working Paper 10680

Rural Labor and Long Recall Loss

Kate Ambler
Sylvan René Herskowitz
Mywish K. Maredia

Development Economics
Development Impact Group
January 2024

A verified reproducibility package for this paper is available at http://reproducibility.worldbank.org.

Abstract

Surveys frequently rely on annual recall to capture individuals’ labor activities over the preceding year. This paper uses a panel of rural households in Malawi for a survey experiment to test the effect of a long, annual recall window on reported labor supply relative to a set of quarterly interviews. The paper documents large losses in reported labor participation using the long recall window with reductions of over 20 percent of reported activities and months worked and a 2.5 times greater incidence of reported unemployment relative to the shorter window. These losses are greater for activities further in the past and especially for individuals whose labor supply is reported by other family members, reaching up to 50 percent for some outcomes. The profile of households’ primary respondents, predominantly male and older, and differential effects by age further suggest that long recall may cause meaningful biases in the resulting data for women and younger household members.

This paper is a product of the Development Impact Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at sherskowitz@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Rural Labor and Long Recall Loss

Kate Ambler (IFPRI)
Sylvan René Herskowitz (World Bank)
Mywish K. Maredia (Michigan State University)

Keywords: Labor, Survey Methodology, Measurement Bias, Recall Windows, Proxy Response

JEL Codes: O1, J2, C8, Q1

This work was supported by the CGIAR Research Program on Policies, Institutions, and Markets and was approved by the IFPRI Institutional Review Board and the Malawi Committee on Social Science and Humanities, National Commission for Science and Technology. We thank the MwAPATA Institute for collecting the baseline data, GeoPoll for data collection, and Laura Leavens for research support.

1. Introduction and Background

Understanding people’s productive activities over the course of the year is important both for documenting how people earn their livelihoods and for ensuring that poverty programs are well designed and well targeted.
Measuring employment is especially challenging in settings where productive activities are informal, with irregular intensity of participation, and seasonal, where much of the effort and earnings are concentrated in specific periods of the year (Feuerbacher et al. 2020). These characteristics are particularly relevant in rural labor markets in low-income countries that rely heavily on employment in the agriculture sector (Fink, Jack, and Masiye 2020; World Bank 2017). The prevailing survey methodology literature considers shorter recall periods to be more accurate, often relying on a standard seven-day recall window (Durazo et al. 2021). However, relying on reported labor activities over the most recent seven days is likely to miss individuals’ full range of labor contributions and work activities over a full year, especially in highly seasonal and informal labor markets. To address this gap, standard data collection practices typically also include questions about productive activities of each household member over the past year.1 While existing evidence is mixed, use of such a long window may lead to losses in data and data quality.2 In this paper, we use a survey experiment to test the effect of recall windows on reported annual work behaviors, contrasting reported labor participation using a long (annual) recall method to those resulting from a set of quarterly interviews that follow a similar structure and set of questions, but over shorter, 90-day, recall windows.

A commonly used format for annual recall labor questions in multitopic household surveys asks the household’s primary respondent to report on their labor activities over the preceding year, including type of activity, periods worked, and intensity of participation. These questions are asked about their main activity over the year as well as any secondary activity.
While there is often a stated preference for having each household member respond for themselves, the household’s primary respondent frequently reports on behalf of other members as well, “proxying” their responses, either when a member is unavailable or when the survey budget is insufficient to allow for self-reporting by all household members.3

1 The most common format for large multi-topic surveys such as the World Bank’s Living Standards Measurement Study is to ask about activities in the last seven days in detail, with follow-up questions regarding the last 12 months (see Durazo et al. 2021).
2 See de Weerdt, Gibson, and Beegle (2020) for an excellent review of the survey methods literature that includes an overview of both recall windows and proxy reporting.

Recently, the global penetration of cell phone access has introduced phone surveys as a method of data collection that could allow for more frequent interview intervals along with shorter recall periods at a lower cost. While weekly interviews have been shown to be less vulnerable to recall loss than longer windows (Arthi et al. 2018), weekly interviews across an entire year or season would be prohibitively expensive for most research budgets and highly fatiguing for respondents. In this paper we use a survey experiment to test an intermediary option, comparing quarterly (90-day) recall with annual recall questions using a series of phone surveys.

Labor outcomes measured using the long recall responses result in much lower recorded levels of any work involvement (22%), number of unique activities reported (24%), and number of months worked (20%) compared to those based on the quarterly measures, although we do not find significant losses in total hours worked. These gaps increase with the amount of time that has passed.
Proportional losses from long recall (relative to short recall averages) are, depending on the outcome variable, between two and ten times larger for proxied individuals than for households’ primary respondents who self-report. Splitting the sample by self and proxy reported individuals, we explore heterogeneity by gender and age of the respondent as well as of the proxied household member. We do not see clear patterns of heterogeneity for self-reports or by the primary respondent’s characteristics when reporting for others. We do, however, see that younger household members have greater relative losses from long recall when their responses are proxied than older proxied household members. The profile of primary respondents (generally household heads) further biases annual recall data against women and younger household members, as they will be exposed to larger losses from long recall as a consequence of having their labor contributions more frequently reported by proxy.

3 The Living Standards Measurement Study (LSMS) surveys follow a similar structure and have a similarly stated preference for self-reporting whenever possible. In a review of six LSMS surveys, Desiere and Costa (2019) find that the use of proxy reporting is still widespread, ranging from 24% in Nigeria to 85% in Mali.

This paper contributes to a growing body of work on the effects of recall periods in the survey methodology literature. A considerable body of evidence suggests that longer recall periods tend to undercount household consumption, provision of agricultural inputs, and negative health events relative to shorter recall windows.4 The evidence on recall windows as it applies to labor activities, however, is more mixed. In a study based in Tanzania, Arthi et al. (2018) compare reported contributions of farm labor in an agricultural plot-based module using weekly recall with those from an end-of-season recall window.
Longer recall windows exaggerate the number of hours worked per person by a factor of four, although the reported number of active plots and individual level participation are undercounted. In our rural sample in nearby Malawi, we find reductions in reported labor contributions using the longer recall periods relative to a shorter recall window, but no overall difference in hours worked, the intensive margin on which Arthi et al. (2018) find positive effects. In urban labor markets, Heath et al. (2021) examine very short recall windows and find modest effects of switching from weekly to daily recall with higher reported self-employment spells but no impact on wage employment. However, Garlick et al. (2020) find that weekly versus monthly surveys did not influence data quality or reported microenterprise activities.

This paper also links to a second area of the methods literature centered on the use of proxy responses for data collection. Of note, Bardasi et al. (2011) find that men are more affected by proxy losses in reported agricultural labor activities than women, though this is reduced when the primary respondent is the man’s wife or is well educated. By contrast, Serneels, Beegle, and Dillon (2017) find that proxying does not affect estimates of returns to education. Another related paper in this literature, which also intersects with the recall literature on labor measurement, is Kilic et al. (2022). They use two nationally representative surveys in Malawi conducted in the same year but following different research protocols. They find underreporting of certain labor activities in the survey using “business as usual” protocols, which allowed for proxy reporting and the presence of other household members, relative to the other survey, which required own response and privacy during the interview.
In contrast to their setup, which involves multiple differences between the “business as usual” and the more stringent protocols, our study uses experimental variation designed to isolate a single mechanism, recall windows. Our heterogeneous results, showing greater long recall loss for proxied individuals, are consistent with Kilic et al. (2022), as they compare the two survey methods with both annual and weekly recall.

4 Das and Sanchez-Paramo (2012) find that one-third of acute illness is unreported when comparing one month to one week recall periods. Other notable examples showing high sensitivity of consumption to recall windows include Beegle et al. (2012b), De Weerdt et al. (2016), Backiny-Yetna et al. (2017), and Di Maio and Fiala (2019). Beegle et al. (2012a) find little evidence of distortions from longer recall lags in recording agricultural inputs and production.

Finally, the findings in our paper have implications for a range of labor linked papers, especially those in rural settings that rely on similarly constructed data.5 Large, primarily descriptive literatures currently exist on rural labor, gender and age differences in labor contributions, and rural income diversification. Our results suggest that all these estimates could be meaningfully impacted by reliance on long recall-based data and further distorted by the use of proxy reporting.

We discuss our survey experiment and data in Section 2 and empirical strategy in Section 3. Results are presented in Section 4, followed by a brief discussion in Section 5, and conclusion in Section 6.

2. Data

Our survey experiment is built on a sample of households included in a separate study, the MwAPATA Institute’s Malawi Rural Agricultural Livelihood Survey (MRALS), conducted in person in the fourth quarter of 2019.
Their original sample is representative of farm households in rural areas of eight selected districts and used a multitopic household survey to collect information on demographics, health, socio-economic status, time use, and agricultural production from a sample of 3,259 households (Muyanga et al. 2020).6

2.1 Sample and survey structure

Our analysis uses a sample of 701 households for whom a complete set of four quarterly phone interviews was successfully completed. Starting with the MRALS sample of 3,259 households, only 2,435 households had contact phone numbers and were therefore eligible for inclusion in our phone-based survey. Stratifying by region, 1,505 eligible households were randomly selected to be included in our study. Of these, 1,020 households were successfully contacted and included in Round 1, and ultimately 701 households were successfully reached for the full set of interviews.

5 See Dzanku (2020), Abay et al. (2019), Asfaw et al. (2019), Yeboah and Jayne (2018), Imai et al. (2015), Himanshu et al. (2013), Djurfeldt (2013), Haggblade et al. (2010), Ellis (1998), and Ellis and Freeman (2004) for some recent examples.
6 The selected districts include two in the Northern Region (Rumphi and Mzimba), four in the Central Region (Lilongwe Rural, Dowa, Kasungu, and Mchinji), and two in the Southern Region (Neno and Blantyre Rural).

Interviews were conducted by phone and the primary respondent was the respondent from the initial MRALS study, typically the household head. The labor module asked this respondent about their own labor participation and additionally to report the labor activities of up to two other adult household members aged 18 to 65. These adults were pre-selected randomly in households with more than three adults.
While survey protocols allowed interviews to be done with other household members if the respondent was unavailable, in practice this happened very infrequently.7

Column 1 of Table 1 shows descriptive statistics of study participants who were successfully reached for the full set of quarterly interviews and who constitute the sample used in the empirical analysis. Household level characteristics, those of the designated respondent, and those of proxied respondents are shown in panels A, B, and C, respectively. Panel A shows that our final sample contains 701 households, with an average of just over five members split evenly between children and adults. In Panel B we see that 75 percent of primary respondents are men and their average age is 44. Sixty-nine percent are in a monogamous marriage and 10 percent are in polygamous marriages; primary respondents have 6.6 years of education on average. Panel C shows the characteristics of the other household members whose data was collected in the survey. These household members are younger, 30 years old on average, and 76 percent are female, with similar levels of education.8 Appendix Table 1 compares these summary statistics for the analysis sample with those for the households targeted in the sampling from the initial MRALS sample. They are nearly indistinguishable.

Households were contacted approximately once per quarter over the course of a year. While the surveys were intended to be spaced evenly, with three months between surveys, in practice logistical issues affected this timing and led to some variation in the actual interval between interviews. Dates and timing of each survey wave are illustrated in Figure 1. These quarterly surveys were designed to be brief, averaging under 20 minutes, and focused on labor activities of adult household members. The labor module followed a similar structure to that used in many household surveys where respondents are asked about their primary work activity, with a recall period of the preceding 90 days.
Follow-up questions capture further details about that activity, in particular which months they spent doing it and how many days and hours they spent on it in a typical month in which they were doing it (within that recall window). After reporting that activity, they were asked if there was a secondary activity that they were engaged in over that same 90-day window and, if so, to provide similar follow-up information.9 After completing this, the respondent was then asked the same sequence of questions about up to two other adult household members (if available). The interviewers asked to speak to the same primary household respondent in each follow-up round of the survey and asked them about the labor activities of the same family members as those identified and included in the first interview.

7 This occurred in 2.3%, 2.0%, 0.9%, and 0.7% of instances across rounds 1 to 4, respectively.
8 There is slightly higher attrition among this group resulting from a requirement that individuals were living in the household in all four rounds in order to be included in the analysis sample.

In the final interview, respondents were additionally asked to report primary and secondary activities over the past 12 months. Aside from the reference window, the follow-up questions on the activity details were held the same, with the list of months spanning the whole year instead of the last three months. If the sequence of these questions affects respondents’ responses, possibly through either framing effects or fatigue, as has been shown in other studies (e.g., Ambler et al. 2021; Jeong et al. 2023), this could bias the resulting data when using both measures for all individuals in the same order. To address this concern, we randomized the order of the annual recall questions to be either just before or just after the 90-day recall for all family members.
In the analysis, we use the reported labor participation of individuals under the recall window about which they were asked first, to avoid the possibility of order-induced bias. The analysis then leverages this randomization to test the effect of long versus short recall windows. Columns 3 and 4 of Table 1 show summary statistics and baseline balance between individuals in households who were randomly assigned to have the short or the long recall questions asked first, respectively. The final column shows the p-value of a test for difference between the two groups. We do not find any statistically significant differences between our two randomly assigned groups.

9 Respondents were then asked about their primary and secondary activities over the preceding week and, if distinct from the 90-day activities, details about those activities. Across all rounds, new jobs were reported in the 7-day category fewer than ten times.

2.2 Outcome variables

Our analysis focuses on four primary labor supply measures. We examine two extensive margin measures, whether the person had any work activities reported about them at all and the number of unique activities they worked, and two intensive margin measures, the number of months worked and the number of hours worked.

We use the 90-day recall periods to construct quarterly estimates of each of these outcomes. For the intensive margin outcomes, we use the survey date and the variables indicating which months the indicated person worked to calculate the percentage of each month that is applicable for each individual. This is then used to calculate the months and hours worked during that 90-day period.10 Using the 12-month recall data we create similar quarterly measures to match the dates of the shorter recall.11

Figure 1 provides a timeline of the four rounds of the surveys, showing the survey dates, the months covered in each recall period, along with the agricultural calendar for maize, the main staple crop in Malawi.
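The partial-month weighting used to turn a 90-day window into months worked can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example survey date, and the convention that the window covers the 90 days immediately before the interview day (excluding the interview day itself) are all assumptions made for the example.

```python
from datetime import date, timedelta
import calendar


def month_fractions(survey_date, window_days=90):
    """Return {(year, month): share of that calendar month inside the window}.

    Assumption: the window is the `window_days` days immediately before
    `survey_date`, with the survey day itself excluded.
    """
    start = survey_date - timedelta(days=window_days)
    days_covered = {}
    day = start
    while day < survey_date:
        key = (day.year, day.month)
        days_covered[key] = days_covered.get(key, 0) + 1
        day += timedelta(days=1)
    # Divide covered days by the length of each calendar month.
    return {
        key: n / calendar.monthrange(key[0], key[1])[1]
        for key, n in days_covered.items()
    }


def months_worked(survey_date, worked_months, window_days=90):
    """Sum the window shares of the calendar months reported as worked."""
    fractions = month_fractions(survey_date, window_days)
    return sum(share for key, share in fractions.items() if key in worked_months)


# Hypothetical interview on 15 September 2020: the window touches four
# calendar months, with June and September only partly covered.
example_fractions = month_fractions(date(2020, 9, 15))
example_months = months_worked(date(2020, 9, 15), {(2020, 6), (2020, 7)})
```

In this hypothetical case June and September each contribute 14 of their 30 days, so a person reported as working in June and July would be credited with 1 + 14/30 months in that quarter rather than two full months.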
We note that there was a delay between the third and fourth (final) survey. This has the effect that some of the time covered by the 90-day recall asked in the Round 1 interview is not well covered by the 12-month recall questions asked at the endline. As such, we observe very low levels of work in Quarter 1 as measured in the 12-month recall data. In particular, we were concerned that including all four rounds of short recall from the four surveys would result in artificially low estimates of work in Quarter 1 for the long recall group. Additionally, these delays resulted in capturing labor data for May and June, the harvest months for the primary agricultural season, in both Round 1 and Round 4. To address both of these issues, we exclude Quarter 1 from our main analysis and use quarters 2-4 to improve the comparability of the short and long recall periods. As seen in Figure 1, these three quarters cover the main components of one maize growing season, although there is a small gap between the final two interviews and some seasonal activities just after harvest time may be undercounted or missed in our estimates. This would likely lead to an understatement of long recall losses. We show robustness (and strengthening) of our main results using all four interviews in Appendix Table 1.

To identify the number of jobs done by individuals over the full study period, we linked reported activities across survey rounds, so that if a respondent reported working the same job in all four rounds it is only counted as one unique job.12 Finally, we sum the quarterly estimates to create aggregate measures comparable to a 9-month recall. The main analysis compares the aggregated estimates collected by quarterly or long recall.

10 This is necessary because for a 90-day recall period, there are 4 calendar months in which the respondent may have worked. To estimate the months worked or hours worked, we must calculate the relevant fraction of the first and fourth month options.
11 We calculate these quarterly measures by using the question in the 12-month recall module that asks respondents to indicate which months they worked in each job, and then use the survey dates from the quarterly interviews to reflect the same time periods covered by the short recall questions. We use the day of survey for each household, so the period covered varies slightly by household based on the date of interview.

In comparing labor aggregates using the short and long recall windows, two features of the data should be acknowledged. In the short recall estimates, respondents were able to list different jobs in each round so that, if they reported a different primary and secondary activity in each round, they could have up to eight unique activities.13 Consistent with common survey practices, the long recall questions only allow for a maximum of two work activities. The opportunity to report a greater diversity of work activities may be an advantage of the short recall estimates. In practice, only 2% of individuals were reported to have participated in more than two unique activities based on the short recall data and only 25% of individuals were reported to have done two unique activities in the long recall data (the value at which their responses would be censored). While this mechanical difference in the range of possible reported values may influence the analyses for a handful of individuals, the low incidence of cases in which the limit would be binding makes it unlikely to be driving the results. We return to this later in our discussion of results.
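The counting of unique activities across rounds, and the effect of the two-activity limit in the long recall module, can be illustrated with a short sketch. It assumes each report has already been linked to a stable key of employment type and employer; the function name, the keys, and the example reports are hypothetical and are not taken from the study data.

```python
def count_unique_activities(reports, max_activities=None):
    """Count distinct activities among linked reports.

    `reports` is an iterable of (employment_type, employer) keys, one per
    reported activity per round. Reports of the same job in several rounds
    share a key and are therefore counted once. `max_activities` mimics a
    questionnaire that records at most that many activities.
    """
    n_unique = len(set(reports))
    if max_activities is not None:
        n_unique = min(n_unique, max_activities)
    return n_unique


# Hypothetical short recall reports for one person over three quarters.
short_recall_reports = [
    ("self", "household farm"),  # quarter 2, primary activity
    ("self", "household farm"),  # quarter 3, primary activity (same job)
    ("wage", "entity A"),        # quarter 3, secondary activity
    ("wage", "entity B"),        # quarter 4, secondary activity
]

n_short = count_unique_activities(short_recall_reports)
n_if_capped_at_two = count_unique_activities(short_recall_reports, max_activities=2)
```

Here the person has three unique activities under quarterly reporting but could be recorded with at most two under a two-slot annual module, which is the censoring case the text describes as rare in this sample.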
A second difference is that when calculating hours worked, we rely on a series of questions that ask respondents which months were worked in that time frame, the number of days worked in a typical month, and then the number of hours worked in a typical day. In the short recall responses, the number of days and hours apply to the months worked in that quarter; for long recall, those values are applied to all months worked throughout the year. The short recall responses therefore allow for greater variability in recorded labor supply within periods worked for the same activity. We also return to this later in the discussion but generally consider the greater flexibility in reported labor intensity across the year using the short recall window to be an important benefit of higher frequency interviews, since differences in intensity are missed when relying on annual recall.

12 This was done by preloading the activities reported in earlier survey rounds and asking respondents to indicate when they were referencing the same activity as earlier.
13 Unique activities are distinct both by type of employment (e.g., self, wage, etc.) and the employer. For example, if a person reported wage employment with entity A and entity B, these would be considered as two different activities.

The main 9-month aggregate outcome variables described here are summarized in Table 2, separately by short and long recall group (and anticipating the first, main result). Ninety-one percent of those in the short recall group report working at all, in an average of 1.19 activities in 3.6 months for 360 total hours. With the exception of total hours worked, these measures are all higher for short recall than for long recall. Ninety percent of subjects in the short recall group report working on the household farm or in other household agricultural businesses. Much smaller numbers report working in a non-agricultural family business (9.4%) or as wage employees (13.7%).
Like the overall measures, these numbers are all lower in the long recall group. Overall, this sample is heavily involved in household agricultural activities, with limited participation in other activities. Non-farm activities, when reported, are more likely to be secondary activities.

3. Empirical Approach

In the final interview, we randomized the order of the questions in the labor module so that some households used the long (annual) recall window first while others used the short (90-day) recall first. For those who received the short recall questions first, we use labor measures based exclusively on their quarterly survey responses, and for those who received the long recall first, we use measures based exclusively on the annual recall responses.14 Our main analysis leverages this randomization to test for differences between these alternative recall windows using the following main empirical specification:

$$Y_{ih} = \beta_0 + \beta_1 \mathit{LongRecall}_{h} + \gamma_{g(i)} + \varepsilon_{ih}$$

where $Y_{ih}$ is a reported labor outcome for individual $i$ in household $h$ and $\mathit{LongRecall}_{h}$ indicates households that were asked the long recall questions first. The coefficient of interest is $\beta_1$, the difference in average reported labor measures between individuals using long and short recall. $\gamma_{g(i)}$ is a set of gender by age group by reporting status (proxy or self) fixed effects. Age groups are defined as individuals under 25, 25-34, 35-49, and 50 and above, approximately quartiles of the study sample. Robust standard errors are clustered at the household level, the level of the treatment.

In additional analyses we test for heterogeneity of these differences by proxy status, gender, and age. In these analyses we use fully saturated regression models to test each group’s differences against a null hypothesis of no effect and report p-values for differences across groups.

14 As discussed in Section 2, we have taken this approach of only using the data from the module that was administered first in the final round to avoid concerns over the potential influence of asking 90-day recall questions just before annual recall questions (or vice versa). Appendix Table 2 shows that this ordering indeed has a substantial impact, particularly on the long recall reported values.

4. Results

4.1 Main outcomes

Table 3, Panel A, presents the main results. In columns 1 and 2 we examine whether the person worked at all and the total number of unique jobs reported. We find that long recall led to large reductions in both of these measures. Long recall reduces any reported work participation by 20 percentage points relative to short recall. Just 8% of individuals report no work whatsoever in the short recall, so an increase of 20 percentage points constitutes an enormous difference in reported labor force participation. Long recall also reduces the number of unique jobs reported by 0.3, which is 24% of the short recall mean.

On the intensive margin of labor supply, column 3 shows a similar pattern for the number of months worked. Long recall reduces the reported number of months worked by 0.7 relative to short recall, 20% of the short recall mean. However, when considering hours worked, in column 4, there is only a small negative coefficient that is not statistically different from zero.

In columns 5, 6, and 7 of Table 3, Panel A, we consider the types of work reported for each individual. We report the impact of long recall on having worked on the household farm or in an agriculture-related home business, participation in a non-agricultural home business, or having done wage labor of any kind. As expected in a rural sample, agricultural work is the most common, with 89% of individuals reporting participation in the short recall group. Long recall reduces this rate by 22%. These relative losses are comparable to those for having worked at all. Looking at non-agricultural household business and wage work, long recall reduces reported participation by six and seven percentage points, respectively, from short recall bases of 9 and 14 percent.
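The relation between an absolute reduction and the proportional loss quoted alongside it is a simple ratio to the short recall mean. The sketch below is only an illustration of that arithmetic: the function name is hypothetical, and the inputs are the rounded figures quoted in the text, so the results can differ slightly from percentages computed from the unrounded estimates in the tables.

```python
def proportional_loss(reduction, short_recall_mean):
    """Express a long recall reduction as a share of the short recall mean."""
    if short_recall_mean <= 0:
        raise ValueError("short recall mean must be positive")
    return reduction / short_recall_mean


# Any work: a 20 percentage point reduction against a short recall
# participation rate of 91 percent (rounded figures from the text).
any_work_loss = proportional_loss(0.20, 0.91)

# Months worked: a reduction of 0.7 months against a mean of 3.6 months.
months_loss = proportional_loss(0.7, 3.6)
```

With these rounded inputs the ratio for any work is about 0.22, in line with the 22% loss cited in the introduction, while the ratio for months worked is about 0.19, slightly below the reported 20% because the inputs are rounded. The same ratio applies to the reductions in non-agricultural business and wage work participation relative to their short recall bases.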
These translate into very large proportional effects of 65% and 51%, suggesting disproportionate losses in reported non-agricultural labor contributions when using long recall that may reflect important sources of income diversification in this rural sample.

Given such strong effects on the other measures, the lack of detectable effect on the number of hours worked is surprising. One possibility is that respondents report hours worked in high intensity seasons instead of values that correspond more closely to a true mean across all months in a given range. This would mean that while reported hours worked vary between relatively high and low seasons in short recall, the maximal levels of labor intensity end up getting applied to all months worked across the full year in the long recall, generating an inflated estimate of hours even when months and activities are held constant. This pattern of reporting is similar to behaviors shown in Arthi et al. (2018). It is also consistent with the reporting in our data. For those participating, the average hours per month worked under short recall (averaged across rounds) is 96.4 for primary jobs and 83.4 for secondary jobs. The hours per month for the long recall are notably higher, 111.9 for primary jobs and 115.7 for secondary jobs.

4.2 Heterogeneity by self and proxy reporting

Next, we examine heterogeneous impacts of long recall by proxy/self-reporting, gender, and age. Because a respondent is thought to have more complete knowledge of their own labor contributions than those of other household members, use of long recall when reporting on behalf of others could be more taxing and more susceptible to omission. We present these results in Table 3, Panel B, estimating effects by proxy status using a fully saturated regression model, such that the reported interaction is the long recall effect for each group compared to no effect.
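One way to write such a saturated specification is sketched below. The notation is an assumption made for illustration, not the authors' own: $Y_{ih}$ is the outcome for individual $i$ in household $h$, $\mathit{LongRecall}_{h}$ indicates assignment to the long recall questions first, $\mathit{Self}_{ih}$ and $\mathit{Proxy}_{ih} = 1 - \mathit{Self}_{ih}$ indicate reporting status, and $\gamma_{g(i)}$ denotes gender by age group by reporting status fixed effects, which absorb the main effect of reporting status.

```latex
% Requires amsmath for equation* and \text.
\begin{equation*}
Y_{ih} = \theta_{\text{self}} \bigl(\mathit{LongRecall}_{h} \times \mathit{Self}_{ih}\bigr)
       + \theta_{\text{proxy}} \bigl(\mathit{LongRecall}_{h} \times \mathit{Proxy}_{ih}\bigr)
       + \gamma_{g(i)} + \varepsilon_{ih},
\qquad H_0 : \theta_{\text{self}} = \theta_{\text{proxy}}
\end{equation*}
```

Under this formulation each $\theta$ is read directly as the long recall effect for its own group and can be tested against zero, while the test of $H_0$ gives a p-value for the difference between self and proxy reports.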
Beneath each column, we also report the means in the short-recall group separately for self and proxy reports, the scaled differences of the treatment effects, and the p-value for the difference between the two groups. In considering these patterns we emphasize that a household’s primary respondent, and therefore proxy or self-reporting status, was not randomly assigned and therefore we cannot make causal claims of the effects of proxying itself on reported measures.15 However, we can show how absolute and relative gaps between long and short recall differ by proxy status in our sample.

15 The survey respondents are almost exclusively household heads and, as reported in Table 1, are overwhelmingly male and significantly older than the proxied respondents.

Across the first four outcomes, losses from long recall are larger for proxied individuals, in both absolute and relative (to their short recall mean) terms, although we lack sufficient precision to say whether these differences are statistically significant for the two intensive margin measures. Long recall loss for working at all among those who self-report is 5 percentage points, or 5% considering a short recall mean of almost 100%. The effect among those proxied is large: 42 percentage points or 54% of the short-recall mean for proxied individuals. These large differences between self and proxy reports may be due to increased recall bias in proxy reports or to the fact that primary respondents are engaged in different types of labor activities that may be less vulnerable to omission. When considering total activities and months worked (columns 2 and 3), the proportional effect is larger than for working at all (15% compared to 5%), suggesting that recall bias may differentially affect the reporting of a second job or the specific months worked for those who are self-reporting. For these measures, the effect of long recall continues to be larger among the proxied respondents, both in absolute magnitude and percent effect. However, the difference is not statistically significant for months worked. As in the overall sample, there is no impact on hours worked, even while the point estimates are consistent with proxy reported labor being more vulnerable to long recall loss.16

4.3 Heterogeneity by gender and age

Next, we examine differences in long recall loss by gender and age within respondent type (self or proxy). We begin with heterogeneity by gender, focusing on the impacts of long recall on total activities and number of months worked in Table 4, with worked at all and hours worked reported in Appendix Table 3. For each outcome we estimate three specifications testing for different types of heterogeneity: (1) respondents (self-reports) only, considering heterogeneity by their own gender, (2) proxied individuals only, considering the gender of the household’s respondent, and (3) proxied individuals only, considering heterogeneity by their own gender. We do not observe any meaningful differences in long recall loss by respondent gender among self-reported measures in columns 1 and 4 or proxied measures in columns 2 and 5. Losses among proxied women appear to be larger in absolute and relative terms, especially for number of months worked in column 6, but the estimates lack precision to distinguish these differences from statistical noise.

Finally, we perform a similar analysis on effects by age group in Table 5 and Appendix Table 4, replicating the specifications and table structure from the gender analysis. While tests for differences between all four age groups do not always show statistical significance, it appears that older respondents have higher levels of long recall loss when reporting their own labor activities.
Although column 2 does not show clear patterns, column 5 suggests that this pattern may be reversed when reporting about other household members, with older respondents exhibiting less long recall loss than younger respondents. Finally, younger household members appear to be more affected by long recall loss in both absolute and relative magnitude with significant differences among those under 25 or 25-34 when tested against people 50 and above. 16 Regarding types of activities in columns 5-7 the patterns for farm work are similar to working at all. For non- agricultural businesses and wage work we observe larger absolute effects for those who are self-reporting; the proportional effect for non-agricultural businesses is larger for respondents and the effect on wage work is larger for proxies. In the latter case however, we cannot reject that the two effects are equal. 4.4 Time pattern of results To better understand the dynamics and mechanics of these results, we show results for the four main outcomes by quarter, again focusing on quarters 2, 3, and 4 in Figure 2. Consistent with our discussion in section 2, we omit quarter 1 as the long recall window did not sufficiently overlap with the first short-run recall to afford comparable reference periods. Each panel shows the regression adjusted means by quarter, with a 95% confidence interval on the difference between the two recall groups. Across all four outcomes, we see that long recall labor measures are furthest below the short run measures in the second quarter, the quarter furthest from the time of the final interview, and the difference is highly significant, showing proportionate reductions of approximately 40%. Reported differences in labor participation for any work, number of activities, and number of months worked grow smaller as the amount of time since the endline is reduced in quarters 3 and 4. However, the intensive measure of hours worked shows a sudden re-widening in the final quarter. 
And for both the number of months and hours worked, the long recall responses in quarter 4 are significantly greater than the short recall-based responses. These patterns are puzzling, since major activities reported under long recall should not be missed by the short recall interviews, which cover 90-day segments within that same range. One behavior that could lead to increasing labor measures over time in the long recall is if respondents have a tendency to “pull forward” the attributed months of a given activity from the months they report having started it up towards the present day at the time of the endline (in quarter 4). If, as discussed in section 4.1, respondents then apply their highest levels of participation for these activities, this could result in especially dramatic exaggerations in quarter 4. Despite this caveat, we generally observe meaningfully lower reported labor measurements when using long recall, with effects that increase with distance in time from the endline.

5. Discussion and Interpretation

While we have discussed these gaps between long and short recall measurements as “long recall losses”, a few additional points deserve consideration in interpreting these differences as losses. First, the majority of current research suggests that reported data is less accurate with longer recall windows. However, even the shorter, three-month recall window that we use as our benchmark may be missing meaningful activities that could be captured using even shorter interview intervals and recall periods. To the extent that our short recall responses are themselves suffering from recall loss, our estimated long recall losses will underestimate the full extent of missed labor contributions. Second, telescoping, the crowding in of actions that actually occurred just outside of the intended recall window, could lead to exaggeration of reported behaviors (Abate et al. 2022).
While this phenomenon can affect responses using either short or long recall, short recall estimates could be especially impacted when aggregating across multiple survey rounds to characterize longer windows of time. However, the use of quarterly interviews may have also served as natural (if not explicit) time markers, bounding the time frame of responses, which has been shown in other settings to limit the influence of telescoping (Abate et al. 2022). Further, asking individuals to identify the specific months of participation in activities may further limit risks of these effects. A final consideration is that the act of responding to additional interviews, as used to collect the short recall data, might itself impact people’s survey responses or affect attrition (Arthi et al. 2018; Zwane et al. 2011). We note that because everyone in our analysis sample was interviewed the same number of times, this does not threaten the internal validity of our results, though it could influence whether we think we would get the same long-recall responses in the absence of the earlier interviews. The setup of our study prevents us from directly testing this, although it could be an important question for future research.

Despite these caveats, this work suggests that there are advantages to the three-month windows, although future research could evaluate their performance against even shorter recall periods to further inform these tradeoffs along with their vulnerability to telescoping or distortions induced by repeated interviews. Ultimately, researchers will need to carefully consider their context and the objectives of their research. They must weigh the value of more precise shorter recall measurements and the improved ability to capture seasonal, secondary, and informal labor contributions against survey costs and the potential for respondent fatigue.

6. Conclusion

The results presented in this paper suggest that using an annual recall window to measure seasonal variability in employment can lead to considerable losses relative to the use of a shorter recall window in a set of quarterly interviews. Secondary, non-agricultural sources of income are disproportionately impacted. The extent of the losses that we observe in our analysis is heavily influenced by whether an individual’s labor contribution is being self-reported or reported by the household’s primary respondent. Labor contributions of youth are more likely to be omitted when using longer recall windows and relying on proxy reports. Because women and youth are less likely to be household heads and therefore more likely to be proxied in household surveys, the use of long recall periods may especially undercount their labor contributions. The time pattern of results suggests more underreporting the further an activity lies in the past relative to the survey, coupled with possible overestimation of hours worked.

Given these losses, use of long recall windows can fundamentally affect our understanding of time patterns in people’s labor contributions, even when respondents are asked to disaggregate their work over the course of that period. Capturing these time patterns is not a principal goal of many multi-topic surveys, which focus primarily on the number of months worked and place even less emphasis on which months those are. However, as policy makers increasingly demand data-driven insights, continuing to refine and improve our methods of data collection is becoming more important. Understanding the seasonality, intensity, and diversity of labor is central to rural development planning. As this interest in the intensity, diversity, and seasonality of work continues to grow, an accompanying understanding of how to accurately measure labor contributions is becoming increasingly important.
This objective cannot be separated from a need to understand how survey design and measurement choices, such as recall windows and proxy reporting, impact the resulting data, and to find ways to reduce their influence and potential biases.

Bibliography

1. Abate, G., de Brauw, A., Gibson, J., Hirvonen, K., and Wolle, A. 2022. Telescoping Error in Recalled Food Consumption: Evidence from a Survey Experiment in Ethiopia. The World Bank Economic Review, Volume 36, Issue 4, November 2022, Pages 889–908.
2. Abay, K.A., Asnake, W., Ayalew, H., Chamberlin, J. and Sumberg, J., 2021. Landscapes of opportunity: patterns of young people’s engagement with the rural economy in sub-Saharan Africa. The Journal of Development Studies, 57(4), pp.594-613.
3. Ambler, K., Herskowitz, S., and Maredia, M.K. 2021. Are we done yet? Response fatigue and rural livelihoods, Journal of Development Economics, Volume 153, 2021, 102736
4. Arthi V, Beegle K, De Weerdt J, Palacios-López A. 2018. Not your average job: measuring farm labor in Tanzania. Journal of Development Economics. 130:160–72
5. Asfaw, S., Scognamillo, A., Di Caprera, G., Sitko, N. and Ignaciuk, A., 2019. Heterogeneous impact of livelihood diversification on household welfare: Cross-country evidence from Sub-Saharan Africa. World Development, 117, pp.278-295.
6. Backiny-Yetna P, Steele D, Djima IY. 2017. The impact of household food consumption data collection methods on poverty and inequality measures in Niger. Food Policy 72:7–19
7. Bardasi E, Beegle K, Dillon A, Serneels P. 2011. Do labor statistics depend on how and to whom the questions are asked? Results from a survey experiment in Tanzania. World Bank Economic Review. 25(3): 418–47
8. Beegle, Carletto, Himelein 2012a. Reliability of recall in agricultural data. Journal of Development Economics. 98: 34-41.
9. Beegle, de Weerdt, Friedman, Gibson 2012b. Methods of household consumption measurement through surveys: Experimental results from Tanzania. Journal of Development Economics. 98: 3-18
10. Das J, Hammer J, Sánchez-Páramo C. 2012. The impact of recall periods on reported morbidity and health seeking behavior. Journal of Development Economics. 98(1): 76–88.
11. De Weerdt J, Beegle K, Friedman J, Gibson J. 2016. The challenge of measuring hunger through survey. Economic Development and Cultural Change. 64(4):727–58
12. De Weerdt, Gibson, and Beegle. 2020. What Can We Learn from Experimenting with Survey Methods? Annual Review of Resource Economics. 431-47
13. Desiere, Sam, Costa, Valentina, 2019. Employment Data in Household Surveys: Taking Stock, Looking Ahead. The World Bank.
14. Di Maio M, Fiala N. 2019. Be wary of those who ask: a randomized experiment on the size and determinants of the enumerator effect. World Bank Economic Review.
15. Djurfeldt, Agnes Andersson, 2013. African re-agrarianization? Accumulation or pro-poor agricultural growth? World Development. 41, 217–231.
16. Durazo, Josefine, Costa, Valentina, Palacios-Lopez, Amparo, Gaddis, Isis. 2021. Employment and Own-Use Production in Household Surveys: A Practical Guide for Measuring Labor (English). LSMS Guidebook Washington, D.C.: World Bank Group.
17. Dzanku, Fred Mawunyo, 2020. Poverty reduction and economic livelihood mobility in rural sub-saharan Africa. Journal of International Development.
18. Ellis, Frank, 1998. Household strategies and rural livelihood diversification. The Journal of Development Studies. 35 (1), 1–38.
19. Ellis, Frank, Freeman, H. Ade, 2004. Rural livelihoods and poverty reduction strategies in four African countries. The Journal of Development Studies. 40 (4), 1–30.
20. Feuerbacher, A., McDonald, S., Dukpa, C. and Grethe, H., 2020. Seasonal rural labor markets and their relevance to policy analyses in developing countries. Food Policy, 93, p.101875.
21. Fink, Günther, B. Kelsey Jack, and Felix Masiye. 2020. "Seasonal Liquidity, Rural Labor Markets, and Agricultural Production." American Economic Review, 110 (11): 3351-92.
22. Garlick, Orkin, Quinn. 2020. Call Me Maybe: Experimental Evidence on Frequency and Medium Effects in Microenterprise Surveys. World Bank Economic Review. 34(2): 418-443.
23. Haggblade, Steven, Hazell, Peter, Reardon, Thomas, 2010. The rural non-farm economy: Prospects for growth and poverty reduction. World Development. 38 (10), 1429–1441.
24. Heath, Mansuri, Rijkers, Seitz, Sharma. 2021. Measuring Employment: Experimental Evidence from Urban Ghana. World Bank Economic Review. 35(3): 635-651.
25. Himanshu, Lanjouw, Peter, Murgai, Rinku, Stern, Nicholas, 2013. Non-Farm Diversification, Poverty, Economic Mobility and Income Inequality: A Case Study in Village India. The World Bank.
26. Imai, Katsushi S., Gaiha, Raghav, Thapa, Ganesh, 2015. Does non-farm sector employment reduce rural poverty and vulnerability? Evidence from Vietnam and India. J. Asian Econ. 36, 47–61.
27. Jeong, D., Aggarwal, S. Robinson, J., Kumar, N., Spearot, A., Park, D.S., 2023. Exhaustive or exhausting? Evidence on respondent fatigue in long surveys, Journal of Development Economics, Volume 161, 102992
28. Kilic, T., Van den Broeck, G., Koolwal, G. and Moylan, H., 2022. Are You Being Asked? Impacts of Respondent Selection on Measuring Employment in Malawi. Journal of African Economies.
29. Muyanga, Milu, Zephania Nyirenda, Yanjanani Lifeyo & William J. Burke. 2020. The Future of Smallholder Farming in Malawi. Working Paper No. 20/03. Lilongwe, Malawi: MwAPATA Institute (Accessed, December 19, 2021) https://www.mwapata.mw/_files/ugd/dd6c2f_f3cd0a352667458ea4e7ddd894db4ab3.pdf?index=true
30. Serneels, Beegle, Dillon. 2017. Do returns to education depend on how and whom you ask? Economics of Education Review. October Vol. 60. pp.5-19
31. Yeboah, Felix Kwame, Jayne, Thomas S., 2018. Africa’s evolving employment trends. Journal of Development Studies. 54 (5), 803–832.

Figure 1.
Maize crop calendar for Malawi, survey rounds, and data recall periods

[Figure: monthly timeline from January 2020 to July 2021. Rows: maize production season (growing, harvesting, sowing, growing, harvesting); survey rounds (R1, R2, R3, R4); data recall period across entire sample; round 4 data recall; long recall data.]

Note: Maize crop calendar provided by FAO at https://www.fao.org/giews/countrybrief/country/MWI/pdf/MWI.pdf

Figure 2. Reported labor supply by quarterly measures

Panel A: Worked at all
Panel B: Number of unique activities
Panel C: Number of months worked
Panel D: Hours worked (100s)

Notes: Quarter 1 is excluded due to lack of overlap between annual and 90-day recall. Short recall measures are shown in black and long recall-based measures are shown in red.

Table 1: Summary statistics, attrition and balance

                          Full sample   Short recall first   Long recall first   P-value: Short recall = Long recall
Panel A: Households
Household size            5.073         5.073                5.072               0.993
Number of adults          2.454         2.446                2.461               0.857
Number of children        2.619         2.627                2.611               0.898
Sample size               701           354                  347
Panel B: Household head (designated respondent)
Female                    0.247         0.263                0.231               0.324
Age                       43.916        44.249               43.576              0.566
Married (monogamous)      0.693         0.681                0.706               0.469
Married (polygamous)      0.104         0.107                0.101               0.779
Years of education        6.649         6.585                6.715               0.663
Sample size               701           354                  347
Panel C: Other adults (proxied respondents)
Female                    0.762         0.762                0.762               0.996
Age                       30.418        30.105               30.676              0.544
Married (monogamous)      0.612         0.614                0.609               0.916
Married (polygamous)      0.079         0.086                0.074               0.648
Years of education        6.908         7.081                6.766               0.365
Sample size               466           210                  256

Notes: Column 1 shows the summary statistics of households successfully reached for all four quarterly surveys and used in the empirical analysis. Columns 2 and 3 split these statistics by whether a household was randomly assigned to have the short or long recall window asked first in the final interview. Column 4 shows the p-value for a test of equality between these two groups.
Table 2: Outcome variables

                                                Short recall   Long recall
Worked at all                                   0.915          0.693
Total activities                                1.190          0.864
Number of months worked                         3.605          2.760
Hours worked                                    360.5          356.3
Worked on household farm / doing agriculture    0.892          0.667
Worked in non-agricultural business             0.094          0.032
Worked for a wage                               0.137          0.063

Notes: Table shows nine-month aggregate measures of labor participation based on short or long recall responses.

Table 3: Test of Losses from Long Recall Window

                                (1)           (2)           (3)           (4)           (5)           (6)           (7)
                                Worked        Total         Number of     Hours worked  Worked hh     Worked non    Worked
                                at all        activities    months worked (100s)        farm/ag       ag business   wage
Panel A
Long Recall                     -0.199***     -0.289***     -0.719***     -0.117        -0.201***     -0.061***     -0.070***
                                (0.020)       (0.034)       (0.167)       (0.246)       (0.020)       (0.015)       (0.018)
Mean Short Recall               0.915         1.190         3.605         3.601         0.892         0.094         0.137
Scaled Difference               -0.217        -0.243        -0.199        -0.032        -0.225        -0.654        -0.511
Observations                    1167          1167          1167          1165          1167          1167          1167
R-squared                       0.348         0.314         0.210         0.152         0.345         0.038         0.040
Panel B
Long Recall X Self-Report       -0.053***     -0.218***     -0.657***     -0.059        -0.054***     -0.081***     -0.084***
                                (0.013)       (0.040)       (0.189)       (0.334)       (0.016)       (0.019)       (0.025)
Long Recall X Proxy             -0.417***     -0.395***     -0.812***     -0.203        -0.421***     -0.032        -0.049**
                                (0.042)       (0.056)       (0.217)       (0.267)       (0.042)       (0.021)       (0.021)
Mean Self-Report                0.997         1.407         4.431         4.755         0.983         0.110         0.172
Mean Proxy                      0.776         0.824         2.213         1.655         0.738         0.067         0.076
Scaled Difference Self-Report   -0.053        -0.155        -0.148        -0.012        -0.055        -0.738        -0.486
Scaled Difference Proxy         -0.537        -0.479        -0.367        -0.123        -0.570        -0.476        -0.642
P-Val: Self=Proxy               0.000         0.007         0.497         0.707         0.000         0.066         0.272
Observations                    1167          1167          1167          1165          1167          1167          1167
R-squared                       0.397         0.318         0.210         0.152         0.392         0.041         0.040

Notes: All estimates cover three quarters of data. Reported means are of short recall estimates. Scaled differences are the coefficients divided by the short recall mean. Standard errors are clustered at the household level.
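As a quick arithmetic check on the scaled differences in Table 3, the following Python sketch divides each Panel A coefficient by its short recall mean, as described in the table notes. The names `PANEL_A` and `scaled_difference` are illustrative assumptions and are not taken from the authors' replication package. The inputs are the rounded values printed in the table, so the results can differ from the reported scaled differences in the third decimal place.

```python
# Illustrative sketch only: names are hypothetical, and the inputs are the
# rounded Panel A values printed in Table 3.
PANEL_A = {
    # outcome: (long recall coefficient, short recall mean)
    "Worked at all": (-0.199, 0.915),
    "Total activities": (-0.289, 1.190),
    "Number of months worked": (-0.719, 3.605),
    "Hours worked (100s)": (-0.117, 3.601),
    "Worked hh farm/ag": (-0.201, 0.892),
    "Worked non ag business": (-0.061, 0.094),
    "Worked wage": (-0.070, 0.137),
}


def scaled_difference(coefficient, short_recall_mean):
    """Express the long recall coefficient as a share of the short recall mean."""
    return coefficient / short_recall_mean


scaled = {
    outcome: scaled_difference(coefficient, mean)
    for outcome, (coefficient, mean) in PANEL_A.items()
}

for outcome, value in scaled.items():
    print(f"{outcome}: {value:.3f}")
```

Read as percentages, these ratios correspond to the proportional losses discussed in the text, for example roughly 22% for working at all and roughly 65% for non-agricultural business work.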
Table 4: Long recall losses: Heterogeneity by Gender

                            (1)          (2)          (3)          (4)          (5)          (6)
                            Total activities                       Number of months worked
Long Recall x Male          -0.210***    -0.396***    -0.332***    -0.654***    -0.835***    -0.313
                            (0.047)      (0.062)      (0.109)      (0.220)      (0.244)      (0.458)
Long Recall x Female        -0.243***    -0.388***    -0.414***    -0.666*      -0.676       -0.967***
                            (0.076)      (0.118)      (0.063)      (0.369)      (0.464)      (0.235)
Sample                      Respondent   Proxy        Proxy        Respondent   Proxy        Proxy
Characteristics             Respondent   Respondent   Proxy        Respondent   Respondent   Proxy
Mean Male                   1.429        0.848        0.760        4.438        2.320        1.946
Mean Female                 1.344        0.688        0.844        4.411        1.621        2.297
Scaled Difference Male      -0.147       -0.466       -0.437       -0.147       -0.360       -0.161
Scaled Difference Female    -0.181       -0.564       -0.491       -0.151       -0.417       -0.421
P-Val: Male=Female          0.712        0.952        0.505        0.979        0.764        0.190
Observations                696          471          471          696          471          471
R-squared                   0.049        0.126        0.115        0.022        0.067        0.060

Notes: All estimates cover three quarters of data. Reported means are of short recall estimates. Scaled differences are the coefficients divided by the short recall mean. The reported p-values show a test of equality in the effect of long recall between men and women, estimated at the top of each column. Standard errors are clustered at the household level.
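To give a sense of how the equality tests in Table 4 can be read, the sketch below approximates a two-sided p-value for the difference between two coefficients from their printed standard errors. This is not the authors' procedure: the reported p-values come from the estimated regressions with household-clustered standard errors, whereas this sketch assumes a normal approximation and zero covariance between the two estimates. The function name `equality_p_value` is hypothetical.

```python
import math


def equality_p_value(coef_a, se_a, coef_b, se_b):
    """Approximate two-sided p-value for H0: coef_a == coef_b.

    Assumes the two estimates are independent (zero covariance) and
    approximately normal. Both are simplifying assumptions of this sketch.
    """
    standard_error = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (coef_a - coef_b) / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded values from Table 4, columns 1 and 2 (total activities).
p_col1 = equality_p_value(-0.210, 0.047, -0.243, 0.076)
p_col2 = equality_p_value(-0.396, 0.062, -0.388, 0.118)

print(f"Column 1, male vs female: p = {p_col1:.3f}")
print(f"Column 2, male vs female: p = {p_col2:.3f}")
```

Under these assumptions the approximation should land close to the p-values reported for columns 1 and 2. It can diverge where the two coefficients are correlated, for instance when men and women in the comparison belong to the same households.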
Table 5: Long recall losses: Heterogeneity by Age Group

                              (1)          (2)          (3)          (4)          (5)          (6)
                              Total activities                       Number of months worked
Long Recall x Under 25        -0.167       -0.304       -0.434***    -0.533       -1.309*      -1.225***
                              (0.146)      (0.190)      (0.087)      (0.606)      (0.714)      (0.294)
Long Recall x 25 - 34         -0.090       -0.534***    -0.462***    -0.529       -1.395***    -0.919**
                              (0.080)      (0.106)      (0.100)      (0.364)      (0.361)      (0.409)
Long Recall x 35 - 49         -0.247***    -0.367***    -0.348***    -1.020***    -0.447       -0.604
                              (0.067)      (0.091)      (0.113)      (0.310)      (0.391)      (0.455)
Long Recall x 50 plus         -0.302***    -0.303***    -0.066       -0.398       -0.599       1.343
                              (0.072)      (0.099)      (0.204)      (0.358)      (0.397)      (0.940)
Sample                        Respondent   Proxy        Proxy        Respondent   Proxy        Proxy
Characteristics               Respondent   Respondent   Proxy        Respondent   Respondent   Proxy
Mean Under 25                 1.364        0.889        0.770        4.531        2.704        2.072
Mean 25 - 34                  1.378        1.000        0.948        4.414        2.646        2.557
Mean 35 - 49                  1.423        0.825        0.804        4.546        2.169        2.239
Mean 50 plus                  1.420        0.645        0.714        4.307        1.787        1.575
Scaled Difference Under 25    -0.122       -0.342       -0.563       -0.118       -0.484       -0.591
Scaled Difference 25 - 34     -0.065       -0.534       -0.487       -0.120       -0.527       -0.359
Scaled Difference 35 - 49     -0.174       -0.444       -0.432       -0.224       -0.206       -0.270
Scaled Difference 50 plus     -0.213       -0.470       -0.092       -0.092       -0.335       0.853
P-Val: U25=25 - 34            0.644        0.292        0.831        0.996        0.915        0.544
P-Val: U25=35 - 49            0.618        0.768        0.535        0.474        0.289        0.235
P-Val: U25=50 plus            0.406        0.997        0.103        0.848        0.386        0.009
P-Val: 25 - 34=35 - 49        0.134        0.230        0.447        0.305        0.077        0.606
P-Val: 25 - 34=50 plus        0.049        0.115        0.085        0.798        0.143        0.027
P-Val: 35 - 49=50 plus        0.576        0.636        0.230        0.189        0.782        0.065
Observations                  696          471          471          696          471          471
R-squared                     0.054        0.150        0.121        0.025        0.081        0.074

Notes: All estimates cover three quarters of data. Reported means are of short recall estimates. Scaled differences are the coefficients divided by the short recall mean. The reported p-values show a test of equality in the effect of long recall between respondents of different age groups, estimated at the top of each column. Standard errors are clustered at the household level.

Appendix Figure 1

[Figure: two columns of plots, Self-Reported and Proxy Reported, each by quarter (Quarter 2, Quarter 3, Quarter 4). Panel A: Any Work. Panel B: Number of Months. Panel C: Number of Activities. Panel D: Number of Hours (100s).]

Notes: This figure reproduces the analyses conducted in Figure 2 of the main paper, splitting the sample by self and proxy reporting.

Appendix Table 1: Sample Selection

                          Full panel analysis sample   Targeted MRALS sample
Panel A: Households
Household size            5.073                        5.017
Number of adults          2.454                        2.428
Number of children        2.619                        2.589
Sample size               701                          1505
Panel B: Household head (designated respondent)
Female                    0.247                        0.250
Age                       43.916                       42.605
Married (monogamous)      0.693                        0.687
Married (polygamous)      0.104                        0.098
Years of education        6.649                        6.488
Sample size               701                          1505
Panel C: Other adults (proxied respondents)
Female                    0.762                        0.722
Age                       30.418                       29.419
Married (monogamous)      0.612                        0.551
Married (polygamous)      0.079                        0.071
Years of education        6.908                        6.860
Sample size               466                          1765

Notes: Column 1 shows descriptive baseline statistics of the final, full-panel sample used in the empirical analysis. Column 2 shows these same measures from the initially targeted households sampled from the full MRALS study.
Appendix Table 2: Test of Losses from Long Recall Window, Annual Aggregates

                                (1)           (2)           (3)           (4)           (5)           (6)           (7)
                                Worked        Total         Number of     Hours worked  Worked hh     Worked non    Worked
                                at all        activities    months worked (100s)        farm/ag       ag business   wage
Long Recall                     -0.214***     -0.368***     -1.495***     -1.423***     -0.220***     -0.102***     -0.139***
                                (0.019)       (0.035)       (0.193)       (0.294)       (0.020)       (0.017)       (0.020)
Mean Short Recall               0.938         1.294         4.716         5.289         0.918         0.135         0.209
Scaled Difference               -0.228        -0.284        -0.317        -0.269        -0.239        -0.758        -0.666
Observations                    1167          1167          1167          1165          1167          1167          1167
R-squared                       0.341         0.345         0.253         0.186         0.340         0.057         0.081

Notes: All estimates cover four quarters of data. Reported means are of short recall estimates. Scaled differences are the coefficients divided by the short recall mean. Standard errors are clustered at the household level.

Appendix Table 3: Test of Losses from Order of Modules

                            (1)          (2)          (3)          (4)          (5)          (6)          (7)          (8)
                            Three month recall                                  Twelve month recall
                            Worked       Total        Number of    Hours        Worked       Total        Number of    Hours
                            at all       activities   months       worked       at all       activities   months       worked
                                                      worked       (100s)                                 worked       (100s)
Asked first                 0.060***     0.071**      0.324**      -0.050       0.293***     0.367***     1.734***     2.112***
                            (0.017)      (0.033)      (0.132)      (0.202)      (0.026)      (0.039)      (0.171)      (0.252)
Mean in Reference Group     0.839        1.085        3.157        3.487        0.693        0.864        2.760        3.302
Scaled Difference           0.072        0.065        0.103        -0.014       0.423        0.425        0.628        0.640
Observations                1167         1167         1167         1167         1167         1167         1167         1162
R-squared                   0.234        0.244        0.239        0.196        0.327        0.269        0.195        0.140

Notes: Columns 1 - 4 show the impact of asking the quarterly recall first on the quarterly recall measures. Columns 5 - 8 show the impact of asking the long recall questions first on the long recall measures. Other specification notes are as in the main tables.
Appendix Table 4: Long recall losses: Heterogeneity by Gender

                            (1)          (2)          (3)          (4)          (5)          (6)
                            Worked at all                          Hours worked (100s)
Long Recall x Male          -0.050***    -0.418***    -0.341***    -0.133       -0.193       0.155
                            (0.014)      (0.046)      (0.089)      (0.387)      (0.304)      (0.486)
Long Recall x Female        -0.065**     -0.409***    -0.440***    0.168        -0.251       -0.314
                            (0.028)      (0.105)      (0.047)      (0.660)      (0.546)      (0.309)
Sample                      Respondent   Proxy        Proxy        Respondent   Proxy        Proxy
Characteristics             Respondent   Respondent   Proxy        Respondent   Respondent   Proxy
Mean Male                   0.996        0.792        0.720        4.816        1.721        1.359
Mean Female                 1.000        0.688        0.794        4.584        1.285        1.747
Scaled Difference Male      -0.050       -0.527       -0.474       -0.028       -0.112       0.114
Scaled Difference Female    -0.065       -0.594       -0.554       0.037        -0.196       -0.180
P-Val: Male=Female          0.632        0.937        0.320        0.694        0.926        0.405
Observations                696          471          471          694          471          471
R-squared                   0.031        0.207        0.199        0.006        0.027        0.026

Notes: All estimates cover three quarters of data. Reported means are of short recall estimates. Scaled differences are the coefficients divided by the short recall mean. Standard errors are clustered at the household level.
Appendix Table 5: Long recall losses: Heterogeneity by Age Group

                              (1)          (2)          (3)          (4)          (5)          (6)
                              Worked at all                          Hours worked (100s)
Long Recall x Under 25        -0.036       -0.356**     -0.445***    -0.859       -0.811       -0.797***
                              (0.035)      (0.166)      (0.067)      (0.936)      (0.966)      (0.281)
Long Recall x 25 - 34         -0.033*      -0.493***    -0.470***    0.433        -1.241***    -0.148
                              (0.019)      (0.073)      (0.073)      (0.674)      (0.433)      (0.630)
Long Recall x 35 - 49         -0.051**     -0.368***    -0.357***    -0.952*      0.554        0.190
                              (0.020)      (0.072)      (0.086)      (0.538)      (0.526)      (0.507)
Long Recall x 50 plus         -0.076***    -0.404***    -0.253       0.674        -0.018       1.711
                              (0.029)      (0.077)      (0.160)      (0.626)      (0.441)      (1.439)
Sample                        Respondent   Proxy        Proxy        Respondent   Proxy        Proxy
Characteristics               Respondent   Respondent   Proxy        Respondent   Respondent   Proxy
Mean Under 25                 1.000        0.889        0.724        5.116        2.166        1.490
Mean 25 - 34                  1.000        0.898        0.897        4.901        2.115        1.999
Mean 35 - 49                  1.000        0.762        0.725        5.173        1.609        1.614
Mean 50 plus                  0.992        0.661        0.786        4.147        1.201        1.398
Scaled Difference Under 25    -0.036       -0.400       -0.615       -0.168       -0.374       -0.535
Scaled Difference 25 - 34     -0.033       -0.549       -0.524       0.088        -0.586       -0.074
Scaled Difference 35 - 49     -0.051       -0.483       -0.491       -0.184       0.345        0.118
Scaled Difference 50 plus     -0.076       -0.611       -0.323       0.163        -0.015       1.224
P-Val: U25=25 - 34            0.949        0.450        0.802        0.263        0.688        0.345
P-Val: U25=35 - 49            0.707        0.945        0.405        0.932        0.207        0.082
P-Val: U25=50 plus            0.380        0.793        0.277        0.174        0.450        0.088
P-Val: 25 - 34=35 - 49        0.520        0.223        0.315        0.109        0.009        0.676
P-Val: 25 - 34=50 plus        0.215        0.408        0.226        0.794        0.056        0.235
P-Val: 35 - 49=50 plus        0.485        0.735        0.574        0.049        0.407        0.319
Observations                  696          471          471          694          471          471
R-squared                     0.033        0.235        0.201        0.013        0.053        0.037

Notes: All estimates cover three quarters of data. Reported means are of short recall estimates. Scaled differences are the coefficients divided by the short recall mean. Standard errors are clustered at the household level.

Data Availability Statement: The data underlying this paper will be shared on reasonable request to the corresponding author.
After journal acceptance and prior to publication, we will make data and replication files available in a public repository.