Are You Being Asked? Impacts of Respondent Selection on Measuring Employment

Accurate estimates of men's and women's employment are at the heart of understanding sources of productivity and economic growth and designing well-targeted, gender-sensitive labor policies. How respondent selection in household and labor force surveys affects these estimates is a key question, for which experimental evidence outside of high-income settings is limited. Leveraging two concurrent, national surveys in Malawi that differed in their approach to respondent selection, the analysis shows that compared to the best practice of privately interviewing adults about their employment outcomes, the common "business-as-usual" approach that permits the use of proxy respondents and non-private/group interviews leads to significant underreporting of employment across a range of wage- and self-employment activities, with stronger effects for women and for a longer (12-month) recall period. Under the business-as-usual approach, the main factors linked to underreporting include household wealth, proxy reporting, and potential difficulties associated with interpreting and answering questions regarding household non-farm enterprises.


Introduction
Accurate estimates of men's and women's employment are at the heart of understanding sources of productivity and economic growth and designing well-targeted, gender-sensitive labor policies. Country employment surveys, spanning labor force surveys and labor modules in multi-topic household surveys, are usually structured to ask working-age individuals to report the incidence of and time allocation to a range of employment-related activities over different reference periods, for example over the last week or 12 months. The common "business-as-usual" approach in these surveys, however, can involve interviewing respondents in group settings; proxy respondents reporting on behalf of individuals who are not available for interviews; and other issues affected by survey time constraints, including lack of follow-up questions linked to difficult-to-measure or seasonal activities, as well as intra-household roles in enterprises. All of these variations can potentially create errors in reporting employment, particularly when questions involve longer recall periods, or for types of employment that are more difficult to measure, including seasonal/informal work. A key question, as a result, is whether individual interviews with respondents, focusing on self-reporting and one-on-one (as opposed to group) settings, have a significant effect on the reporting of employment relative to the standard practice.
This study aims to answer this question by leveraging two national surveys that were implemented concurrently by the Malawi National Statistical Office with different approaches to respondent selection in collecting individual-disaggregated survey data, including on employment. The two surveys allow us to compare the data obtained under the business-as-usual approach (the Fourth Integrated Household Survey (IHS4) 2016/17) with the data obtained from individual interviews conducted in private (the Integrated Household Panel Survey (IHPS) 2016). Compared to the individual interviews conducted in the IHPS, our findings show that the business-as-usual approach leads to significant underreporting of employment in livestock-related activities and household non-farm enterprises (in management and worker capacity), with stronger effects for women and under a recall period of 12 months, as opposed to 7 days. The analysis also reveals underreporting in wage employment among men under the business-as-usual approach. Looking at how the choice of survey approach affects time allocation to employment, we find that most differences arise within agriculture, with the business-as-usual approach underestimating weekly hours spent by men and women in livestock activities as well as weekly hours devoted by men to crop agriculture. We find that the higher incidence of proxy reporting in the IHS4 is likely a significant contributor to underreporting. A further likely contributor is the difficulty that IHS4 respondents, given less time spent with enumerators, may have faced in interpreting and reporting on concepts related to businesses or enterprises (as evidenced by greater discrepancies in reporting in the IHS4 across the labor and non-farm enterprise modules). Finally, household wealth is significantly associated with the extent of reporting differences across the individual interview and business-as-usual approaches.
The paper is organized as follows. Section 2 presents the literature review, followed by the description of the country context and data sources in Section 3. Section 4 lays out the empirical strategy and presents the results, and Section 5 concludes.

Literature Review
Across high-income countries, there is a longstanding literature on respondent selection and labor reporting in nationally-representative surveys. In the United States, for example, a large number of survey experiments have been conducted to better understand sources of survey measurement error in earnings by matching individual survey data with a concurrent, "validation" data set (often administrative data from the Internal Revenue Service (IRS) or Social Security Administration, or employers' records). Bound et al. (2001) provide a review of these validation studies, from 1950 up until 1999.
Further, several recent studies have matched individual-level survey rounds from the U.S. Census Bureau's Survey of Income and Program Participation (SIPP) with administrative tax records from the IRS, to understand different sources of measurement error in earnings reporting. This includes proxy response, which can reduce survey costs and potentially provide less biased information on sensitive topics, 2 but also carries the risk of inaccurate reporting due to informational gaps, or differences in preferences and motivation between the proxy and intended respondent (see Cobb, 2018, and Bound et al., 2001). Cognitive issues related to respondents' interpretation of questions, and ability to recall past outcomes accurately, additionally need to be considered (see Moore et al., 2000, and Bickart et al., 1990). Tamborini and Kim (2013), for example, match the SIPP and IRS tax records at the individual level to find that proxy reporting leads to significant underreporting of earnings for single female workers (also see Hill, 1987). Cristia and Schwabish (2009) demonstrate that other covariates (in particular, earning higher incomes) lead to underreporting in the SIPP, and that individual characteristics that are associated with higher earnings (including age and marital status) are systematically associated with underreporting, leading to bias in estimates of these commonly used predictors of earnings. Abowd and Stinson (2013) do not treat either approach as a "gold standard," showing that both sources of data have measurement errors, and argue for constructing a hybrid measure of earnings from survey and administrative data. Quality issues with earnings data across survey and administrative sources have also been raised. In the United Kingdom, Britton et al. (2019) find systematically lower reporting of earnings in a data set linking administrative tax records with student loan data, as compared to the concurrent Labor Force Survey (LFS), likely due to a combination of low survey response in the LFS from low/irregularly-paid earners, as well as a higher share of self-employed workers with no income in the administrative data (perhaps due to underreporting to avoid tax obligations).
Methodological survey research on measuring labor outcomes outside high-income settings has been less common. The quality and regularity of administrative data is one reason, due in large part to the prevalence of informal and irregularly-paid work in these countries. Recent validation studies have involved smaller, more focused experiments where concurrent household surveys contrast alternative methodological approaches, or modules testing for different approaches are randomized within the same survey. Bardasi et al. (2011), for example, randomized the implementation of a "short" and "detailed" module in the Survey of Household Welfare and Labour in Tanzania, a survey of 1,344 households across seven districts in the country, to examine how employment statistics are affected by (a) the level of detail of questions and (b) self-reporting as opposed to using proxies. Their findings have strong gender implications: proxy response significantly lowers male employment in agriculture but has no effect for women (the proxy effect is mitigated when the spouse is selected as the proxy and when the proxy has some education). A few additional screener questions to identify work mitigate over-reporting of employment among women engaged completely in domestic work, as well as higher average weekly hours conditional on working, especially for men. Dillon et al. (2012) use the same data and experiment to study the effects of proxy reporting and question detail on child labor, and Serneels et al. (2017) use them to examine the returns to education. In both of these studies, proxy reporting did not have an effect, but screening questions (i.e. a detailed versus short module) did matter.
Finally, the findings around respondent selection can vary by context and experimental design. In a separate experiment in Ethiopia covering 1,200 Fairtrade coffee households (Galdo et al., 2019), survey respondents were randomly selected to test, across three different seasons, the effects of proxy reporting on reporting of child labor (children themselves versus the household head or spouse). Proxy reporting was found to significantly underreport work of girls in agricultural settings, relative to children's own reports.
Our paper builds on this emerging literature by testing whether implementing individual interviews makes a significant difference in the quality of labor reporting, using two concurrent, nationally representative surveys from Malawi. We also examine how individual and household characteristics vary with survey approach to affect labor reporting. 3 Large-scale studies on labor survey methods from developing countries are also critical, given changes in international definitions of work and employment under the 19th International Conference of Labour Statisticians (19th ICLS) in 2013 that, for the first time, expand the definition of "work" to include all paid and unpaid activity, but at the same time narrow the definition of employment strictly to paid work (ILO, 2013). 4 This restricted definition of employment means that seasonal and/or irregularly-paid jobs, which by recent ILO estimates across 110 countries constitute about 70 percent of employment in developing and emerging economies, with much of it in agriculture (ILO, 2018a), need to be elicited more carefully, so as not to be missed through standard survey approaches. A better understanding of issues around respondent selection and survey measurement is therefore critical, to better design policies that foster economic opportunities, along with monitoring SDG targets around small-scale agricultural and informal employment. 5

Country Context and Data
Malawi is a predominantly rural country situated in southeast Africa, with a little over half of its households below the national poverty line, and 20 percent living in extreme poverty (World Bank, 2019). Agriculture makes up 26 percent of the GDP and is a source of employment for 83 percent of Malawian households (Davis et al., 2017). A high share of households, particularly in rural areas, are engaged in seasonal economic activities, including ganyu, the local term for short-term, temporary, off-farm labor opportunities.
Our analysis is informed by two surveys that were implemented concurrently by the Malawi National Statistical Office: the Fourth Integrated Household Survey (IHS4) 2016/17 and the Integrated Household Panel Survey (IHPS) 2016. 6 The IHS4 was a multi-topic, nationally representative, cross-sectional household survey that was conducted over the period of April 2016-April 2017. The IHS4 followed the business-as-usual approach to individual-level labor data collection, which allows for (1) proxy respondents to be interviewed on behalf of unavailable respondents, and (2) any respondent, whether proxy or self, to be interviewed in group, non-private settings. The Integrated Household Panel Survey (IHPS), on the other hand, was the third wave of a multi-topic, national, longitudinal household survey that was implemented from April 2016 to January 2017. The IHPS attempted to interview each adult household member in private, as part of a broader effort to operationalize the international best practices in collecting individual-disaggregated survey data on asset ownership and work and employment. The individual interviews were capped at four per household and it was ensured that the head of household and his/her spouse (if one exists) were among the interviewed individuals. 7 Within-household interviews were always administered in private and were attempted to be administered simultaneously and with a gender match-up between the enumerator and respondent. Appendix I includes the protocol for administering the IHPS individual questionnaire. 8
In comparing the two survey approaches, our prior is that self-reporting with one-on-one interviews would more accurately reflect labor outcomes among respondents. Our analysis focuses on individuals aged 18-64, for whom the information was elicited during the IHS4-IHPS overlapping fieldwork period of April 2016-January 2017. 9 To compare the IHS4 to the IHPS, we pool the data on 17,433 IHS4 respondents together with 4,377 IHPS respondents that self-reported and were interviewed in private. The combined sample includes 21,810 observations.
4 The ICLS, which meets every five years, affects how country labor force surveys, guided by the ILO, are designed and undertaken.
5 Relevant SDG targets include, for example, Target 2.3 to double the agricultural productivity and incomes of small-scale food producers by 2030, through secure and equal access to land, other productive resources and inputs, knowledge, financial services, markets and opportunities for value addition and non-farm employment; as well as Target 8.3 to promote development-oriented policies that support productive activities, decent job creation, entrepreneurship, creativity and innovation, and encourage formalization and growth of micro-, small- and medium-sized enterprises including through access to financial services.
6 The data, questionnaires and basic information document for the IHS4 2016/17 can be accessed here: https://microdata.worldbank.org/index.php/catalog/2936. The data, questionnaires and basic information document for the IHPS 2016 can be accessed here: https://microdata.worldbank.org/index.php/catalog/2939. Both the IHS4 and IHPS were implemented with technical and financial assistance from the World Bank Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA). The implementation of the individual interviews as part of the IHPS was made possible by technical and financial assistance from the World Bank LSMS Plus (LSMS+) initiative, which aims to improve in IDA countries the availability and quality of individual-disaggregated survey data on asset ownership, work and employment and entrepreneurship. For more information on the LSMS+, please visit: http://surveys.worldbank.org/lsms/programs/lsms-plus.
The IHS4 sample is a mix of self-reporting individuals as well as individuals for whom the information was provided through a proxy. The latter set constitutes 39.4 percent of the IHS4 sample. 10 We do not discard any IHS4 observations on the basis of self- versus proxy reporting since the official statistics would be based on the totality of these survey data, irrespective of the deviations from the international best practices.
Conversely, there are 969 IHPS respondents who were eligible but were unavailable for the individual interview, constituting a non-response rate of 16.8 percent (of a total of 5,776). To use the IHPS data set in a way that can gauge the accuracy of the IHS4 data, we focus only on the IHPS individuals who have been subject to individual interviews and correct for non-response. To do the latter, we calculate weights by first running a logistic regression of individual response status among the 5,776 adults eligible for individual interviews. The results from the logistic regression are presented in Appendix Table A1. 11 Subsequently, we (1) take the inverse of the predicted response probability to construct the response weight variable for each IHPS adult household member who was interviewed in private; (2) winsorize the response weights at the top 3 percent to account for potential outliers; and (3) set the response weight equal to 1 for all adults in the IHS4 sample. Henceforth, all statistics are weighted using the response weight.
Table 1 presents the descriptive statistics on selected individual and household attributes. Individuals in the IHS4 are slightly older, more likely to be the household head or spouse, female and less educated, and they live in smaller households. Although these are statistically significant differences, the magnitude of the differences is negligible. 12
Table 1. Observations: 4,377 (IHPS); 17,433 (IHS4). Notes: IHPS = individual interviews; IHS4 = business-as-usual survey approach. Significant differences are indicated with * (p<0.1), ** (p<0.05) and *** (p<0.01). Estimates are weighted with the response weight.
Table 2 presents the descriptive statistics for labor market outcomes of interest, by gender. The estimates are coupled with the results from the tests of mean differences across the two surveys.
identifying females; a series of dichotomous variables on educational attainment; dichotomous variables identifying whether the individual is currently married, and separately, whether he/she is head/spouse of head; and individual's number of months living away from the household over the past year; and (iii) household covariates, including household size, dependency ratio, and wealth index. The latter is a factor analysis-based index that is composed of (i) a series of dichotomous variables that capture the ownership of mortar, bed, table, chair, fan, air conditioner, radio, radio with flash drive/micro CD, TV, VCR, sewing machine, kerosene/paraffin stove, electric/gas stove, refrigerator, washing machine, bicycle, motorcycle, car, minibus, lorry, beer-brewing drum, sofa, coffee table, cupboard/drawers, lantern, desk, clock, iron, computer, satellite dish, solar panel and generator, and (ii) a series of dwelling covariates, including number of dwelling rooms per capita and categorical variables that identify construction material (permanent; semi-permanent; traditional); roof type (grass; iron sheets; clay tiles/concrete/plastic sheeting/other); floor type (sand; smoothed mud; smooth cement/wood/tile/other); water source (piped/well; borehole; other), and toilet facility (flush/VIP toilet; traditional latrine; other/none).
12 These differences are due to the survey setting, as the IHS4 aims to represent the national population of Malawi in 2016, while the IHPS is based on the national population in 2010.
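The response-weight construction described above (inverse of the predicted response probability, winsorized at the top 3 percent, and set to 1 for IHS4 adults) can be sketched as follows. This is a minimal illustration, not the authors' code: the predicted probabilities are hypothetical values standing in for the output of the logistic regression, and the nearest-rank rule used to locate the 97th percentile is an assumption, since the text does not say how the cutoff was computed.

```python
def winsorize_top(values, top_percent):
    """Cap values above the (100 - top_percent)th percentile at that percentile.

    The percentile is located with the nearest-rank rule (an assumption);
    integer arithmetic avoids floating-point surprises in the rank.
    """
    ordered = sorted(values)
    n = len(ordered)
    rank = ((100 - top_percent) * n + 99) // 100  # ceil((100 - p) * n / 100)
    cutoff = ordered[max(rank, 1) - 1]
    return [min(v, cutoff) for v in values]


def response_weights(predicted_response_prob, top_percent=3):
    """Inverse-probability response weights, winsorized at the top."""
    raw = [1.0 / p for p in predicted_response_prob]
    return winsorize_top(raw, top_percent)


# Hypothetical predicted response probabilities for 100 interviewed IHPS adults:
# most respond with high probability, three have very low predicted probability.
ihps_prob = [0.8] * 97 + [0.10, 0.05, 0.02]
ihps_weight = response_weights(ihps_prob)

# IHS4 adults all receive a response weight of 1, as stated in the text.
ihs4_weight = [1.0] * 5

print(max(1.0 / p for p in ihps_prob), max(ihps_weight))
```

In this toy sample the three largest raw weights are pulled down to the cutoff, which is the intended effect of winsorizing: a few adults with very low predicted response probability do not dominate the weighted statistics.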
First, we analyze a range of dichotomous variables regarding participation in specific activities during the past seven days and during the past 12 months. These variables are constructed from the answers given to the screening questions asked at the beginning of the labor module, and account for participation in the following activities: (1) crop production (i.e. household farming activities whether for sale or for household food), (2) raising livestock (i.e. household livestock activities whether for sale or for household food), (3) managing a family non-farm enterprise (NFE) (i.e. run or do any kind of non-agricultural or non-fishing household business, big or small, for yourself), (4) working in an NFE (i.e. help in any of the household's non-agricultural or non-fishing household businesses), (5) engaging in off-farm wage employment (i.e. any work for a wage, salary, commission, or any payment in kind, excluding ganyu), and (6) ganyu employment (i.e. engage in casual, part-time or ganyu labor).
Subsequently, we examine hours allocated to each of these activities during the past seven days, as well as annual hours allocated to and earnings from wage, and separately, ganyu employment. 13 For now, we do not distinguish within the IHS4 between self-reporting respondents and respondents-by-proxy, although we do discuss proxy issues in the IHS4 later on. This is because the survey setting of the panel sample not only differs in terms of self-reporting but also in interviewing the respondents separately in absence of other household members, as explained above. Table 2 shows that in the IHS4, all activities have the same or lower participation rates as compared to the IHPS, with the exception of activities in agriculture (any crop work, as well as men's crop work strictly for household consumption). The distinction between crop production intended mainly for sale and crop production entirely for household consumption (subsistence agriculture) was added to better understand how the survey approach affects the measurement of crop employment under the revised 19th ICLS definitions of work and employment. This has been a particularly challenging area for economies with a large share of smallholder farmers, for whom paid and subsistence work often vary seasonally throughout the year, and where women often have economically productive but hidden roles (Benes and Walsh, 2018a; Koolwal, 2019). Over the last few years, country labor force surveys and the surveys supported under the World Bank Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) initiative have been introducing individual-level questions on the main intended destination of agricultural output to better elicit employment in agriculture under the new definitions.
13 Additional results including zeroes for the non-employed (i.e. unconditional sample) are available upon request. There were more significant differences in the unconditional sample owing to differences in participation across the two surveys. For weekly hours, we assume that no one can work more than 12 hours per day during the past seven days such that weekly hours are capped at 84 hours. We replace missing values with the median value of weekly hours within the enumeration area (EA) and take the logarithm conditional on participation. Annual hours are calculated based on the average number of hours per week spent on wage employment multiplied by the average number of weeks per month and the total number of months during the last 12 months. For ganyu employment, we do not have data on the number of hours per week, so we assume an average of six hours per day multiplied by the number of days per week spent on ganyu. We replace unrealistically high values with a maximum of 4,368 hours as we assume that no one can work more than 12 hours per day all year round. Annual earnings represent net income and are derived from all cash and in-kind remuneration that a worker received during the past 12 months. We winsorize earnings at the top 1 percent; replace missing values with the median value within the enumeration area; and take the logarithm conditional on employment.
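The cleaning rules for hours described in the text (a cap of 84 weekly hours, median imputation within the enumeration area, logarithms conditional on participation, and a cap of 4,368 annual hours) can be sketched as below. This is a hedged illustration: the record keys `ea` and `hours` are hypothetical, computing the EA median on the capped values is an assumption about the order of operations, and applying the weeks-per-month and months multipliers to ganyu is likewise an assumption.

```python
import math
from statistics import median

MAX_WEEKLY_HOURS = 12 * 7        # 84: at most 12 hours per day over seven days
MAX_ANNUAL_HOURS = 12 * 7 * 52   # 4,368: at most 12 hours per day all year round
GANYU_HOURS_PER_DAY = 6          # assumed average, as stated in the text


def clean_weekly_hours(records):
    """Cap weekly hours, impute missing values with the EA median, take logs.

    `records` holds participants only; each is a dict with hypothetical keys
    'ea' (enumeration area id) and 'hours' (None when missing). An EA with
    no observed hours at all would need a fallback rule, not handled here.
    """
    capped = [
        dict(r, hours=None if r["hours"] is None else min(r["hours"], MAX_WEEKLY_HOURS))
        for r in records
    ]
    observed_by_ea = {}
    for r in capped:
        if r["hours"] is not None:
            observed_by_ea.setdefault(r["ea"], []).append(r["hours"])
    ea_median = {ea: median(hours) for ea, hours in observed_by_ea.items()}

    cleaned = []
    for r in capped:
        hours = r["hours"] if r["hours"] is not None else ea_median[r["ea"]]
        cleaned.append(dict(r, hours=hours, log_hours=math.log(hours)))
    return cleaned


def annual_hours(hours_per_week, weeks_per_month, months):
    """Annual hours from weekly hours, weeks per month and months worked, capped."""
    return min(hours_per_week * weeks_per_month * months, MAX_ANNUAL_HOURS)


def ganyu_weekly_hours(days_per_week):
    """Weekly ganyu hours under the six-hours-per-day assumption."""
    return GANYU_HOURS_PER_DAY * days_per_week


sample = [
    {"ea": 1, "hours": 100},   # capped at 84
    {"ea": 1, "hours": 40},
    {"ea": 1, "hours": None},  # imputed with the EA median of 84 and 40
]
for row in clean_weekly_hours(sample):
    print(row["hours"], round(row["log_hours"], 3))
print(annual_hours(ganyu_weekly_hours(5), 4, 12))
```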
Management of, and working in, an NFE and wage employment are also underreported under the IHS4 business-as-usual approach, and the difference widens when the recall period is longer. Although participation rates might differ, the ranking of importance of activities is similar in both scenarios, with cropping as the most common activity, followed by ganyu and livestock, management of a non-farm enterprise, and off-farm wage employment and self-employment as the least common activities. Further, Table 2 shows that weekly hours worked among those employed are not statistically different across survey approaches for most activities, with some variations by gender (wage employment for men and ganyu for women). This suggests that if participation is screened correctly, weekly hours tend to be adequately estimated for most activities under the business-as-usual approach. For annual hours and earnings, we do find greater differences across survey approaches (among women, for example, conditional annual hours worked in wage employment are significantly higher in the IHS4, and conditional hours and earnings in ganyu for both men and women are also significantly higher in the IHS4). The choice of survey approach may matter more, for example, with longer recall periods as well as seasonality of different activities among this population.
Looking at gender differences, the last two columns of Table 2 show that women are significantly more likely than men under both survey approaches to work in crop agriculture over the last 12 months (particularly in own-use production as opposed to market activity) and less likely to work in wage employment and ganyu. Weekly hours, on the other hand, tend to be greater for men across both farm and off-farm activities. 14 While men are significantly more likely to report management of a non-farm enterprise than women under the business-as-usual approach, this difference disappears in the individual-interview approach. Figure 1 looks at gender differences across samples by age, finding that the individual-interview approach in the IHPS tends to raise reported wage employment and NFE management for nearly all men across the age distribution, but for women the largest increases in these activities are concentrated among younger age groups (around 30 years of age); for NFE management, there is an additional increase among older women (55+) in the IHPS. For women, lower time constraints outside their child-rearing years may be one reason for these trends. Generally, Table 2 shows that men's participation across different off-farm activities rises in the IHPS relative to the IHS4 sample by a greater factor as compared to women, resulting in wider gender gaps for the IHPS sample.

Table 2. Comparing Individual-Level Labor Outcomes by Survey Approach and Gender
[Table 2 columns: Men; Women; Gender differences (2)-(4). Rows begin with labor activities (Y=1, N=0), starting with any crop work (7d).]

Moreover, Figure 2 depicts the kernel density distribution of conditional annual hours (top panel) and earnings (lower panel) across wage and ganyu employment, for both men and women. Annual hours in wage employment for men and women tend to follow a bimodal distribution, indicating that people are either part-time or full-time wage employed, whereas hours in ganyu employment tend to be unimodal, and focused around the lower concentration of wage employed (possibly reflecting, to some degree, that part-time wage employed may also be supplementing their income with ganyu). For women, the business-as-usual approach, relative to individual interviews, tends to lead to a higher concentration of individuals employed full-time, but a lower share among those working part-time/fewer hours. The distributions for men, on the other hand, indicate a higher share of both part-time and full-time wage employed (albeit a smaller increase among part-time workers/those with fewer hours). Regarding hours in ganyu, the business-as-usual approach raises the concentration of men and women around the middle of their respective distributions.
Finally, Figure 2 is coupled with the p-values of the non-parametric Kolmogorov-Smirnov tests of equality of the IHPS and IHS4 distributions. The cross-survey differences are statistically significant in the comparisons of the distributions of (i) ganyu hours and earnings for women and men, and (ii) wage hours for women. The IHPS distributions in these cases take on lower values vis-à-vis their IHS4 counterparts.
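For readers unfamiliar with the test, the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between the two empirical distribution functions. The sketch below computes only that statistic, on hypothetical data, and ignores the response weights; it is not the authors' implementation. In practice a statistics library (for example `scipy.stats.ks_2samp`) would also supply the p-value.

```python
def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    largest_gap = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both pointers past every observation equal to x,
        # so that ties are handled consistently.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / n - j / m))
    return largest_gap


# Hypothetical annual ganyu hours under the two approaches (illustrative only).
ihps_hours = [120, 200, 260, 300, 410]
ihs4_hours = [180, 320, 400, 450, 520]
print(ks_two_sample_statistic(ihps_hours, ihs4_hours))
```

A statistic near 0 means the two distributions are nearly indistinguishable; a statistic near 1 means they barely overlap.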

Effect of business-as-usual approach on labor reporting
To understand the effect of survey approach on labor reporting, we estimate the following equation separately for the samples of men and women aged 18-64:

$$y_{ih} = \alpha + \beta \, \mathrm{IHS4}_{ih} + \gamma' D_{ih} + \varepsilon_{ih} \qquad (1)$$

Above, $i$ and $h$ represent individual and household, respectively; $y_{ih}$ represents individuals' labor outcomes (as presented in Table 2); and $\alpha$ and $\varepsilon_{ih}$ represent constant and error terms, respectively.
$\mathrm{IHS4}_{ih}$ is a binary variable identifying the adults in the IHS4 sample, with the individuals in the IHPS constituting the comparison category, so that $\beta$ captures the effect of the business-as-usual approach. $D_{ih}$ is a vector of individual and household attributes presented in Table 1 earlier to capture any remaining unobserved heterogeneity that may also jointly determine both the dependent variable and the likelihood of being selected for one survey versus the other.
The aim is to understand how the business-as-usual survey setting compares to the best-practice approach of conducting individual interviews with self-reporting respondents. For the binary outcome variables, Equation (1) is estimated as a linear probability model with probability weights adjusting for non-response. 15 For weekly hours across activities, and annual hours and earnings in wage and ganyu, OLS regressions are run on the conditional sample of respondents (those who reported working in that activity over the relevant reference period). 16 Standard errors are clustered at the enumeration area level.
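To build intuition for the estimator, consider the simplest case of a weighted linear probability model with no controls and no fixed effects. There, the coefficient on the IHS4 indicator equals the difference in weighted mean participation between the two surveys. The sketch below illustrates only that special case on hypothetical data; the regressions in the paper additionally include controls and cluster standard errors at the enumeration-area level, neither of which is reproduced here.

```python
def weighted_mean(values, weights):
    """Weighted average of `values` using `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def lpm_survey_gap(outcome, is_ihs4, weight):
    """Coefficient on the IHS4 indicator in a weighted regression of a binary
    outcome on a constant and the indicator only.

    With a single binary regressor, weighted least squares reduces to the
    difference in weighted group means (IHS4 minus IHPS).
    """
    ihs4 = [(y, w) for y, d, w in zip(outcome, is_ihs4, weight) if d == 1]
    ihps = [(y, w) for y, d, w in zip(outcome, is_ihs4, weight) if d == 0]
    mean_ihs4 = weighted_mean([y for y, _ in ihs4], [w for _, w in ihs4])
    mean_ihps = weighted_mean([y for y, _ in ihps], [w for _, w in ihps])
    return mean_ihs4 - mean_ihps


# Hypothetical pooled sample: 1 = participated in the activity, 0 = did not.
outcome = [1, 0, 1, 1, 0, 1, 0, 0]
is_ihs4 = [0, 0, 0, 0, 1, 1, 1, 1]
weight = [1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # IHS4 weights equal 1

gap = lpm_survey_gap(outcome, is_ihs4, weight)
print(round(gap, 3))  # negative values indicate lower reported participation in the IHS4
```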
Based on equation (1), Tables 3 and 4 summarize the effect of the business-as-usual survey setting on the estimation of different labor statistics (for the full sample, and for men and women separately). On the whole, after controlling for demographic and survey characteristics, we find that the business-as-usual scenario (relative to the individual interviews) significantly underestimates participation in the last 7 days in livestock and in NFE management and supporting work (for both men and women), with weakly significant effects on wage employment for men. The effects are quite large, given the participation rates in Table 2, and tend to be stronger for women. Among women, for example, Table 4 shows underreporting under the business-as-usual approach of 3.2 percentage points for livestock (which, given mean participation for women in the IHS4, is a relative decrease of 20 percent); 3.9 percentage points for NFE management (a relative decrease of about 50 percent); and 1 percentage point for supporting work in an NFE (a relative decrease of 33 percent). For men, these relative decreases in reporting are 15 percent for livestock activity, 26 percent for NFE management, and 40 percent for supporting NFE work. The magnitude of the underestimation becomes even larger when the recall period changes to 12 months. There is no under- or over-estimation for cropping and ganyu labor.
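The relative decreases quoted above follow from dividing the estimated effect (in percentage points) by mean participation in the IHS4. As a back-of-the-envelope check, the sketch below inverts that relationship to recover the baseline participation rates implied by the quoted figures for women. These implied baselines are derived here purely for illustration and are not read from Table 2.

```python
def relative_decrease(effect_pp, ihs4_mean_pp):
    """Underreporting expressed as a share of mean IHS4 participation."""
    return effect_pp / ihs4_mean_pp


def implied_ihs4_mean(effect_pp, relative):
    """Baseline participation (percentage points) implied by an effect
    and its stated relative decrease."""
    return effect_pp / relative


# Effects (percentage points) and relative decreases for women, as quoted in the text.
women = {
    "livestock": (3.2, 0.20),
    "NFE management": (3.9, 0.50),
    "NFE supporting work": (1.0, 0.33),
}
for activity, (effect_pp, relative) in women.items():
    baseline = implied_ihs4_mean(effect_pp, relative)
    print(f"{activity}: implied IHS4 participation of about {baseline:.1f} percent")
```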
Looking at how the choice of survey approach affects time in employment, we find that most differences arise within agriculture. The business-as-usual approach underestimates weekly hours in livestock activities on the whole and for both men and women. Table 4 also shows that the business-as-usual approach underestimates weekly hours for men in crop agriculture. We find that annual hours in ganyu employment are higher under the business-as-usual approach than under the individual-interview approach for both women and men, and reported earnings in ganyu are higher for men. For women, we also find that annual hours in wage employment are higher under the business-as-usual approach.
Notes (Tables 3 and 4): The results are from linear probability models for binary outcomes, and from linear regressions for hours and earnings. Coefficients represent the impact of the business-as-usual survey approach, originating from regressions that also include the aforementioned control variables, and district, month and enumerator fixed effects. The regressions are weighted by the response weight, and t-statistics accounting for clustering at the enumeration-area level are presented in brackets. Significant differences are indicated with * (p<0.1), ** (p<0.05) and *** (p<0.01).

Mechanisms behind under- or over-reporting of labor
What mechanisms could be driving these effects? Figure 3 shows that proxy reporting in the IHS4 may contribute to underreporting across different activities. When we split the IHS4 individuals by whether they reported for themselves and estimate the same regressions as in Tables 3 and 4, we find that the IHS4 proxy sample contributes more to underreporting than the own-reporting/non-proxy sample, particularly in agricultural and non-farm enterprise activities (see Appendix Tables A2 and A3). Only coefficients significant at the 5 percent level are presented in Figure 3.
Another potential reason for underreporting under the business-as-usual approach is how respondents interpret questions when they have less dedicated time with the enumerator. Under-reporting of men's wage employment under the business-as-usual approach may be due to the irregular or seasonal nature of wage work across many of these communities, and the need for more detailed questioning to better capture this work. This mechanism helps to explain the results in Figure 3 showing that the IHS4 non-proxy sample reports lower non-farm enterprise and wage employment. Recent cognitive studies conducted by the ILO across countries, for example, have found that many respondents in highly informal and agricultural contexts can have difficulty, without detailed contextual and follow-up questions, identifying their work in an enterprise or casual wage work (Benes and Walsh, 2018b).
Table 5 aims to better understand whether this mechanism is at work for non-farm enterprise activity, where underreporting under the business-as-usual approach is more systematic across the wealth distribution and for both men and women. Specifically, Table 5 compares the enterprise-related outcomes above that come from the labor module with the same outcomes from a separate household enterprise module that was also part of each survey. The enterprise module in each survey creates a roster of all enterprises that household members are involved in, and for each enterprise, one person (almost always the owner/manager, discussed below) reported on which household members were owners, managers, or in supporting roles.
We find that the IHPS individual-interview approach tends to result in lower discrepancies between the labor module and non-farm enterprise module for the overall sample (although there is a significant difference, albeit small in magnitude, for the share of women working in a supporting role). For the IHS4, on the other hand, a significantly greater number of men and women are captured by the enterprise module as owners/managers, as well as supporting workers. Within the sample of households with an enterprise, there are significant differences in both the IHPS and IHS4, where the non-farm enterprise module captures a greater share of men and women owners/managers as compared to the labor module. The discrepancy, however, is much greater under the business-as-usual approach, and also relatively greater for women (an increase from 37 to 51 percent in women's ownership/management moving from the labor module to the non-farm enterprise module, or a 14-percentage point increase) as compared to men (a 12-percentage point increase).
In addition, there were cases of proxy respondents in the enterprise module (those who neither owned/managed nor worked in the enterprise), although for the sample of individuals used in the analysis, the share of proxy response was lower in the IHPS (6.7 percent, or 64 individuals) compared to the IHS4 (10.4 percent). 17 The last two columns of Table 5 show that proxy response in the non-farm enterprise module has a greater effect on reporting of ownership/management in the IHS4, particularly for men, where own-reporting tended to be lower than the overall (proxy plus own-reporting) sample. There was, otherwise, no effect of proxy response in the IHPS non-farm enterprise module, and proxy response in the non-farm enterprise module did not have an effect on supporting work roles in either survey approach. Along with issues around interpreting one's role in a business, therefore, proxy reporting might be another issue to consider, although larger differences in reporting seem to emerge across the labor and non-farm enterprise modules than by proxy response (as discussed as part of the related regressions below).

Notes:
Significant differences are indicated with * (p<0.1), ** (p<0.05) and *** (p<0.01). Estimates are weighted with the response weight. † indicates that in the labor reporting module, the question asked of respondents was "in the last 12 months, did you run a non-farm business of any size for yourself or the household, even if only for one hour?" In the non-farm enterprise module (covering activity over the last 12 months), one respondent for each enterprise was asked to identify up to two household members for each of the following: (a) "who manages this enterprise or is most familiar with it," and (b) "who owns this enterprise?" ‡ indicates that the most recent survey period in the labor module was the last 7 days, and in the NFE module was the last operational month. These two groups were more comparable than using the last 12 months in the labor module (which led to much higher rates of participation).

Figure 4 further looks at specific cases where individuals were identified as owners/managers of a business in one module, but not the other. Again, the non-farm enterprise module captures a greater share of individuals as owners/managers than the labor module. Although the number of cases is not large, we do see much wider gaps across modules in the IHS4 compared to the IHPS, and the labor module in the IHPS also tends to capture a greater share of individuals owning/managing an enterprise. Table A4 in the Appendix also runs regressions of individuals' work in enterprises using data strictly from the enterprise module and finds very few differences by survey approach. Where there is significant underreporting under the business-as-usual approach, the magnitudes of the coefficients are smaller compared to Table 4. Overall, therefore, underreporting of enterprise/business participation under the business-as-usual approach is linked significantly to the level of detail of questions in the labor module, which the individual-interview approach helps address.
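The cross-module comparison described above (who is identified as an owner/manager in the labor module, in the enterprise module, in both, or in neither) amounts to a simple cross-classification of individuals. The sketch below shows one way to tabulate it. The records, field names and function names are hypothetical, and the shares are unweighted, whereas the paper's estimates use the response weight.

```python
from collections import Counter

LABELS = {
    (True, True): "both modules",
    (True, False): "labor module only",
    (False, True): "enterprise module only",
    (False, False): "neither module",
}


def module_agreement(records):
    """Count individuals by whether each module flags them as owner/manager."""
    return Counter(
        LABELS[(bool(r["labor_owner"]), bool(r["nfe_owner"]))] for r in records
    )


def share(records, key):
    """Unweighted share (in percent) of individuals flagged under `key`."""
    return 100.0 * sum(bool(r[key]) for r in records) / len(records)


# Ten made-up individuals from households with an enterprise.
flags = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, True),
    (False, False), (False, False), (False, False),
]
records = [
    {"person_id": i, "labor_owner": lab, "nfe_owner": nfe}
    for i, (lab, nfe) in enumerate(flags, start=1)
]

counts = module_agreement(records)
gap_pp = share(records, "nfe_owner") - share(records, "labor_owner")

for label, n in counts.items():
    print(f"{label}: {n}")
print(f"enterprise module minus labor module: {gap_pp:.0f} percentage points")
```

A positive gap corresponds to the pattern reported in the text, where the enterprise module captures more owners/managers than the labor module.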

Interaction effects of the business-as-usual approach on labor reporting: The role of household wealth quintiles
To further understand whether under- and over-reporting in the business-as-usual scenario are stronger for certain subpopulations, Appendix Tables A5 and A6 summarize the interaction effects of the business-as-usual survey approach with education, relation to household head, age, household wealth and location (on participation in the last 7 days; results were very similar for participation in the last 12 months). The results are presented separately for the samples of men and women and show that, along with age (being in the younger 15-24 age group), interaction effects with household wealth are associated with a greater range of labor participation outcomes for men and women. 18 For households with greater wealth under the business-as-usual approach, reporting of employment in market work in agriculture, as well as livestock, is significantly higher, but lower for non-farm enterprise work (ownership/management). On the other hand, the interactions of the business-as-usual survey approach with relation to the household head, marital status, and dependency ratio have relatively fewer significant effects across different labor outcomes.
Figure 5 examines how the business-as-usual approach affects reporting on participation in specific activities during the last 7 days within each wealth quintile (i.e., estimating equation (1) separately for each wealth quintile). The trends in 12-month participation are similar (though with larger effects) and are presented in Appendix Figure A1. 19 As seen in Tables 3 and 4, Figure 5 shows that the business-as-usual approach leads to under-reporting of labor participation across most activities, relative to individual interviews, with the exception of men's employment in ganyu, and with ambiguous effects on wage employment. Figure 5 also shows that whether underreporting is greater among higher or lower wealth quintiles depends on the sector of activity.
Similar to the results in Appendix Tables A5 and A6, the business-as-usual approach leads to greater under-reporting in non-farm enterprise work within higher wealth quintiles, who tend to be concentrated in these activities, with larger effects for ownership/management as opposed to supporting work. Under-reporting in agriculture, on the other hand, is more likely among men and women in lower wealth quintiles, who are also more likely to be mainly concentrated in agriculture.
Generally, trends in over/under-reporting tend to be similar for women and men for self-employed activities. For wage and ganyu employment, however, there are cases where trends for men and women move in opposite directions, similar to the pooled sample results presented in Appendix Tables A5 and A6. With wage employment, for example, we find significant over-reporting of men's employment under the business-as-usual approach in the highest quintile, but for women, relative to individual interviews, the business-as-usual approach leads to under-reporting, even though full results from the regressions in Tables 3 and 4 (available upon request) show that wage employment is positively associated with higher wealth quintiles for both men and women. For ganyu, there is significant over-reporting of men's employment among the lowest wealth quintiles, but under-reporting among women under the business-as-usual approach. Given the earlier results from Tables 3 and 4 showing that underreporting under the business-as-usual approach tends to be greater for women, some of the gender differences may be due to norms around how women's paid work outside the household is viewed, particularly when women themselves are not reporting. Women are often engaged in several productive activities within and outside the home, for example, but their main roles may continue to be viewed as domestic instead of in paid labor (see Benes and Walsh, 2018a; Fletcher et al., 2018; Comblon and Robilliard, 2017; and Fox and Pimhidzai, 2013).

18 We also found that wealth interactions with the business-as-usual approach tend to have more widespread effects on participation (in the last seven days or 12 months) as opposed to weekly/annual hours and annual earnings (results available upon request).
19 The full estimates are available upon request.

Notes:
The quintiles are for the wealth index, as explained in footnote 11. q1 = lowest wealth quintile, q5 = highest wealth quintile. The points in the graph represent coefficients from linear probability models of participation on the business-as-usual survey approach, for each wealth quintile, controlling for district, month and enumerator fixed effects. Lines represent 95 percent confidence intervals. The regressions are weighted by the response weight.
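For readers who want to relate the plotted intervals to the star notation used in the tables, the sketch below converts a coefficient and its standard error into a 95 percent confidence interval, a two-sided p-value and significance stars. It assumes a large-sample normal approximation, which may differ slightly from the reference distribution used for the published estimates. The function name and the example numbers are hypothetical.

```python
import math


def summarize(coef: float, se: float, z_crit: float = 1.96) -> dict:
    """Confidence interval, two-sided p-value and stars for one coefficient,
    under a normal approximation (z_crit = 1.96 gives a 95 percent interval)."""
    t = coef / se
    # Two-sided p-value: 2 * (1 - Phi(|t|)) == erfc(|t| / sqrt(2)).
    p = math.erfc(abs(t) / math.sqrt(2.0))
    if p < 0.01:
        stars = "***"
    elif p < 0.05:
        stars = "**"
    elif p < 0.1:
        stars = "*"
    else:
        stars = ""
    return {
        "ci": (coef - z_crit * se, coef + z_crit * se),
        "t": t,
        "p": p,
        "stars": stars,
    }


# Made-up example: an effect of -4.5 percentage points with SE of 2 points.
example = summarize(-0.045, 0.02)
low, high = example["ci"]
print(f"95% CI: [{low:.4f}, {high:.4f}], t = {example['t']:.2f} {example['stars']}")
```

An interval that excludes zero, as in this example, corresponds to a coefficient that is significant at the 5 percent level.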

Conclusions
Accurate measurement of employment is at the heart of designing effective policies that improve economic opportunities for the poor and alleviate gender inequities in access to and returns from productive activities. Leveraging two national surveys that were implemented concurrently in Malawi with different approaches to respondent selection in collecting individual-disaggregated survey data, we find that compared to the best practice of conducting private individual interviews to elicit self-reported information on labor outcomes, the business-as-usual approach that allows for proxy respondents and group/non-private interviews leads to significant underreporting of employment in livestock-related activities and household non-farm enterprises (in management and worker capacity). The effects are stronger for women and under a recall period of 12 months, as opposed to 7 days. The business-as-usual approach also brings about under-reporting in the incidence of wage employment among men.
We further discuss how underreporting is associated with a higher incidence of proxy reporting under the business-as-usual approach, as well as potential difficulties (given less time spent with enumerators) that respondents may face in interpreting and reporting on work that may be seasonal, such as casual wage work, as well as work in enterprises (as evidenced by greater discrepancies in reporting across the labor and non-farm enterprise modules under the business-as-usual approach, with more pronounced differences for women). The extent of under- and over-reporting is also systematically associated with household wealth (greater under-reporting of non-farm enterprise employment for the highest wealth quintile, for example), highlighting the importance of respondent selection for specific socioeconomic subgroups as well as by gender.

APPENDIX

Appendix I
Protocol for Administering the IHPS Individual Questionnaire
1. Upon arrival in a Panel A EA during Visit 1, the team leader must attempt to identify all households assigned on Day 1.
2. At this time, the team leader needs to compile a preliminary list of the number of eligible adults in each household and the gender composition. This is, of course, the preliminary list, and the final determination of target individuals in each household will be based on the information in Module B.
3. After administering Module B, the enumerator should call/text/WhatsApp the supervisor confirming the number of adults that are within the EA and that are eligible for the individual interview.
4. Individual interviews should not all be saved for the last day in the EA, but should be conducted during the 4 days in a Panel A EA in Visit 1 or a Panel B EA in Visit 2.
5. After the enumerator administers the Household and Agriculture Questionnaires, he/she MUST copy the key information from the interview into the booklet of rosters on (i) household members, (ii) agricultural gardens (i.e. parcels), and (iii) agricultural plots.
6. Prior to approaching the household for the individual interview(s), the enumerators who will be conducting the interviews should meet away from the household, and
   a. Copy the information from the booklet of rosters into the CAPI application.
   b. Have a short briefing on the household composition such that each enumerator has a basic understanding of the household prior to starting their interview.
7. Make a proper introduction to the household of the purpose of the individual questionnaire.
8. Proceed with the interview(s) while making sure that interviews are done in private, simultaneously, and with a gender match between the enumerator(s) and the respondent(s).
9. Present questions in a way that the respondents feel comfortable sharing any hidden assets.
10. Present questions in a way that respondents feel comfortable responding honestly to questions on ownership of and rights to assets.
11. As necessary, add any agricultural gardens that were missed in the full household interview, in line with the instructions on the CAPI application.
12. Do not share any confidential information from these interviews with anyone, including others in the same household, some of whom may also be subject to an individual interview.

Notes:
Average marginal effects are derived from logit models. Z-statistics in brackets. * (p < 0.1), ** (p < 0.05) and *** (p < 0.01). The logit regression dropped 14 observations because one enumerator fixed effect predicted the outcome perfectly.

Notes:
The results are from linear probability models for binary outcomes, and from linear regressions for hours and earnings. Coefficients represent the impact of the business-as-usual survey approach, originating from regressions that also include the aforementioned control variables, and district, month and enumerator fixed effects. The regressions are weighted by the response weight, and t-statistics accounting for clustering at the enumeration-area level are presented in brackets. Significant differences are indicated with * (p<0.1), ** (p<0.05) and *** (p<0.01).

Appendix Table A4. Results on the Impact of Business-As-Usual Approach on Outcomes from NFE Module
Column groups: (1) Overall sample in NFE module (proxy + self-reporting); (2) Only self-reporting sample in NFE module

Notes:
The results are from linear probability models. Coefficients represent the impact of the business-as-usual survey approach, originating from regressions that also include the aforementioned control variables, and district, month and enumerator fixed effects. The regressions are weighted by the response weight, and t-statistics accounting for clustering at the enumeration-area level are presented in brackets. Significant differences are indicated with * (p<0.1), ** (p<0.05) and *** (p<0.01).
Bolded coefficients reflect statistically significant differences across the overall (self-reporting plus proxy) and just the own-reporting samples.

For ownership/management of an enterprise, Table A4 shows that differences in effects were statistically significant across the total (own reporting + proxy) and only own-reporting samples, likely owing to sample sizes; however, the magnitude of these differences was small (less than 0.01).

Notes:
The results are from linear probability models for binary outcomes, and from linear regressions for hours and earnings. Coefficients represent the impact of the business-as-usual survey approach, originating from regressions that also include the aforementioned control variables, and district, month and enumerator fixed effects. The regressions are weighted by the response weight, and t-statistics accounting for clustering at the enumeration-area level are presented in brackets. Significant differences are indicated with * (p<0.1), ** (p<0.05) and *** (p<0.01). † indicates dependency ratio of children and elderly/adults 15-64. ‡ Wealth index is based on durable goods ownership and housing conditions, as explained in footnote 11. # City = Mzuzu, Lilongwe, Zomba or Blantyre city.