Policy Research Working Paper 10456

Using Paradata to Assess Respondent Burden and Interviewer Effects in Household Surveys: Evidence from Low- and Middle-Income Countries

Ardina Hasanbasri, Talip Kilic, Gayatri Koolwal, Heather Moylan

Development Economics, Development Data Group, May 2023

Abstract

Over the past decade, national statistical offices in low- and middle-income countries have increasingly transitioned to computer-assisted personal interviewing and computer-assisted telephone interviewing for the implementation of household surveys. The byproducts of these types of data collection are survey paradata, which can unlock objective, module- and question-specific, actionable insights on respondent burden, survey costs, and interviewer effects. This study does precisely that, using paradata generated by the Survey Solutions computer-assisted personal interviewing platform in recent national household surveys implemented by the national statistical offices in Cambodia, Ethiopia, and Tanzania. Across countries, the average household interview, based on a socioeconomic household questionnaire, ranges from 82 to 120 minutes, while the average interview with an adult household member, based on a multi-topic individual questionnaire, takes between 13 and 25 minutes. Using a multilevel model that is estimated for each household and individual questionnaire module, the paper shows that interviewer effects on module duration are significantly larger than the estimates from high-income contexts. Food consumption, household roster, and non-farm enterprises consistently emerge among the top five household questionnaire modules in terms of total variance in duration, with 5 to 50 percent of the variability being attributable to interviewers. Similarly, labor, health, and land ownership appear among the top five individual questionnaire modules in terms of total variance in duration, with 6 to 50 percent of the variability being attributable to interviewers. These findings, particularly by module, point to where additional interviewer training, fieldwork supervision, and data quality monitoring may be needed in future surveys.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at ardina.hasanbasri@yale.edu or tkilic@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Using Paradata to Assess Respondent Burden and Interviewer Effects in Household Surveys: Evidence from Low- and Middle-Income Countries

Ardina Hasanbasri†, Talip Kilic⁑, Gayatri Koolwal⁑ and Heather Moylan⁑1

JEL Codes: C81, C83

Keywords: Survey Methodology, Household Surveys, Paradata, Respondent Burden, Interviewer Effects, Cambodia, Ethiopia, Tanzania.

1 The authors are listed alphabetically.
Corresponding authors: ardina.hasanbasri@yale.edu or tkilic@worldbank.org. † Jackson School of Global Affairs, Yale University; ⁑ Living Standards Measurement Study, Development Data Group, World Bank.

1. Introduction

Household surveys serve a vital role in national statistical systems, inform official statistics on an extensive range of socioeconomic phenomena, and are required for tracking progress towards national and international development goals. Multi-topic household surveys are implemented frequently across the developing world to fill data and research gaps, and there is increasing international momentum to improve the scope of intra-household, self-reported, individual-disaggregated survey data collected on key dimensions of men’s and women’s economic wellbeing (FAO, World Bank and UN Habitat, 2019; Hasanbasri et al., 2021; ILO, 2018; UN, 2019).2 Although household surveys have continued to grow in topical coverage and complexity, particularly in low- and middle-income countries, empirical evidence remains thin on several aspects of survey implementation, including respondent burden, survey costs, and interviewer effects. Assessing these aspects is critical for gauging data quality concerns, both during and after data collection, and for informing decisions on the design of future surveys. A related, promising development is that over the last decade, national statistical offices (NSOs) in low- and middle-income countries have accelerated their transition to computer-assisted personal interviewing (CAPI) for face-to-face surveys (Carletto et al., 2022) and have adopted computer-assisted telephone interviewing (CATI) for phone surveys, particularly in response to the data needs brought on by the COVID-19 pandemic (Gourlay et al., 2021).
As such, practitioners can address the aforementioned knowledge gaps regarding survey implementation by leveraging survey paradata: data that are generated as a by-product of computer-assisted data collection and that capture the entire process of creating a final survey dataset (Couper, 1998; Kreuter, 2013). For example, the Survey Solutions CAPI/CATI platform automatically provides an extensive paradata file for each survey. This ancillary dataset is a highly disaggregated account of the “life” of a survey and includes time-stamped records of all “events” associated with each interview (e.g., interview record creation, interview assignment to an enumerator, the provision or modification of an answer and the addition of a comment in each questionnaire field, and interview completion).

2 Several indicators for the Sustainable Development Goals (SDGs) require individual-disaggregated survey data, including SDG 1.4.2 (the proportion of total adult population with secure tenure rights to land, with legally recognized documentation and who perceive their rights to land as secure, by sex and by type of tenure); SDG 5.a.1 (a) (the proportion of total agricultural population with ownership or secure rights over agricultural land, by sex); SDG 5.a.1 (b) (the share of women among owners or rights-bearers of agricultural land, by type of tenure); SDG 5.b.1 (the proportion of individuals who own a mobile telephone, by sex); and SDG 8.10.2 (the proportion of adults (15 years and older) with an account at a bank or other financial institution or with a mobile-money-service provider, by sex). Research has revealed the importance of eliciting self-reported survey data in private interviews for the accurate measurement of these and related indicators (Kilic et al., 2021; Kilic et al., 2022; Hasanbasri et al., 2021).
Past research has demonstrated the use of survey paradata for (i) monitoring survey progress and informing adaptive survey designs; (ii) analyzing and adjusting for survey non-response; (iii) computing granular interview duration statistics during fieldwork and as an input into the design and costing of future surveys; (iv) tracking answer modification patterns and compliance with the intended interview flow; (v) identifying falsified data; (vi) verifying compliance with the intended visits to sampled enumeration area and household locations; and (vii) studying respondent behavior and predicting future survey participation (Choumer-Nkolo et al., 2019; Couper and Kreuter, 2013; Gordeev et al., 2021; Jans et al., 2011; Kreuter et al., 2010; Kreuter and Olsen, 2013; Murphy et al., 2019; Virgile, 2016). While these research efforts have drawn on surveys conducted in high-income countries, comparable applications are scarce in low- and middle-income countries, where building NSO technical capacity in the use of paradata for survey design, management, and quality control has been identified by the United Nations Intersecretariat Working Group on Household Surveys as one of the technical priorities for positioning household surveys for the next decade (Carletto et al., 2022). To generate paradata-powered insights on respondent burden, survey costs, and interviewer effects in low- and middle-income contexts for the first time, this paper uses paradata generated as part of the national household surveys implemented by the NSOs in Cambodia, Ethiopia, and Tanzania between 2018 and 2020 using the Survey Solutions CAPI platform.
These surveys were supported by the World Bank Living Standards Measurement Study – Plus (LSMS+) project and included (a) a multi-topic socioeconomic household questionnaire and (b) a cross-country comparable individual questionnaire that was administered to adult household members in private interviews to collect self-reported information on their work and employment and on their ownership of and rights to physical and financial assets, among other topics. The paper starts by providing precise country- and module-specific duration estimates, as proxies for respondent burden, for an extensive range of household and individual questionnaire modules. These statistics can serve as operationally relevant inputs for survey practitioners interested in implementing similar questionnaire modules in comparable contexts. The average household interview ranges from 82 minutes in Cambodia to 120 minutes in Tanzania. Food consumption tends to be the most time-consuming household questionnaire module to complete, averaging 22 to 26 minutes depending on the country. Besides food consumption, the household modules on non-food consumption, housing, and the household roster consistently rank among the longest. Likewise, the average individual interview ranges from 13 minutes in Ethiopia to 25 minutes in Cambodia, with the individual questionnaire modules on land ownership, labor, health, and education consistently ranking among the most time-consuming. The disaggregation of the paradata allows for a more detailed look into respondent burden. For instance, one can compute the average minutes per question conditional on respondent characteristics (for example, the duration of the land modules for individuals who own land versus those who do not).
We also estimate the extent to which interview length for sampled households rises with the number of adult household members targeted for individual interviews. In Cambodia, each additional interview with an adult household member increases total household interview time by about 37 to 48 minutes, on average. In Ethiopia, the increase is 41 minutes when moving from one targeted individual to two, and falls to less than 22 minutes for each additional individual interview target thereafter. Tanzania’s increase in time per additional interview target is the largest of the three countries (between 81 and 191 minutes), reflecting the number of modules the survey administered. Furthermore, the analysis combines the duration data with information on total survey costs to estimate the cost of a minute of face-to-face multi-topic survey data collection in each country. The cost per minute is estimated at $0.87 in Cambodia, $1.71 in Ethiopia, and $3.94 in Tanzania. These unit costs can be used to construct budgets for hypothetical surveys in comparable country contexts, as a function of the expected total interview time per household. That expected time depends on the duration of specific questionnaire modules (which this paper can inform) and, in the case of individual-level data collection, on the number of intra-household interview targets. We posit that the unit costs reported in this paper are more informative and comparable than past cross-country cost estimates per interviewed household (see, for example, Kilic et al., 2017), since countries differ significantly in their approach to and scope of household survey data collection. Subsequently, we turn to the cross-country analysis of interviewer effects on module duration.
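The budgeting logic described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch rather than the authors' costing method: only the three cost-per-minute values come from the paper, while the function names (`cost_per_minute`, `household_minutes`, `survey_budget`), the base household interview time, the per-target increments, and the sample size in the usage example are illustrative placeholders. It assumes that cost per minute is total survey cost divided by total interview minutes and that field costs scale linearly with interview time.

```python
from typing import Sequence

# Cost per minute of face-to-face data collection (USD), as estimated in the paper.
COST_PER_MINUTE_USD = {"Cambodia": 0.87, "Ethiopia": 1.71, "Tanzania": 3.94}


def cost_per_minute(total_survey_cost: float, total_interview_minutes: float) -> float:
    """Assumed definition: total survey cost divided by total interview minutes."""
    if total_interview_minutes <= 0:
        raise ValueError("total_interview_minutes must be positive")
    return total_survey_cost / total_interview_minutes


def household_minutes(base_minutes: float, marginal_minutes: Sequence[float], n_targets: int) -> float:
    """Expected interview time for one household (hypothetical inputs).

    base_minutes: household questionnaire time.
    marginal_minutes[k]: extra minutes added by the (k+1)-th adult interview target;
    the last value is reused for any further targets.
    """
    if n_targets < 0:
        raise ValueError("n_targets must be non-negative")
    if n_targets > 0 and not marginal_minutes:
        raise ValueError("marginal_minutes is required when n_targets > 0")
    total = base_minutes
    for k in range(n_targets):
        total += marginal_minutes[min(k, len(marginal_minutes) - 1)]
    return total


def survey_budget(country: str, n_households: int, base_minutes: float,
                  marginal_minutes: Sequence[float], n_targets: int) -> float:
    """Hypothetical field budget: cost per minute x minutes per household x households."""
    minutes = household_minutes(base_minutes, marginal_minutes, n_targets)
    return COST_PER_MINUTE_USD[country] * minutes * n_households


if __name__ == "__main__":
    # Placeholder inputs: 75 household-questionnaire minutes and three adult targets
    # with hypothetical marginal times of 30, 41 and 22 minutes.
    budget = survey_budget("Ethiopia", 2000, 75, [30, 41, 22], 3)
    print(f"Hypothetical field budget: ${budget:,.0f}")
```

Allowing the marginal time to differ by target mirrors the pattern reported above, where the second interview target in Ethiopia adds more time than later ones; a constant increment is the special case of a one-element list.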
Interviewers play a large part in survey implementation, with potential effects on how respondents answer questions, non-response, measurement error, and interview length (West and Blom, 2017; Flores-Macias et al., 2008; Vollmer et al., 2021; Maio and Fiala, 2020). Research on interviewer effects on interview length has focused exclusively on high-income settings (Couper and Kreuter, 2013). Contributing to this literature, we estimate multi-level hierarchical models of module duration, with levels defined by enumeration areas and interviewers; compute the intraclass correlation coefficient (ICC) for each module under different model specifications; and decompose the ICC to quantify the share of the variance in module duration that is explained by its interviewer component (i.e., ICC-I). Our findings show that interviewer effects vary by module, even though the same interviewers administer all modules. These effects are generally larger than comparable multi-level model estimates from high-income countries and explain a large share of the total variance in interview length. Based on our preferred multi-level hierarchical model specification, interviewer effects explain 2 to 50 percent of total variance, depending on the module and country.

Identifying modules with high total variance as well as high ICC-I estimates is a first step towards minimizing interviewer effects, for example through additional interviewer training and fieldwork supervision. Among the household questionnaire modules, food consumption, household roster, and non-farm enterprises consistently emerge among the top 5 modules in terms of total variance in duration. For the food consumption module, 22 to 50 percent of the variability is attributable to interviewers, depending on the country; the comparable ranges are 7 to 27 percent for the household roster module and 5 to 17 percent for the non-farm enterprises module.
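As a minimal sketch of the ICC decomposition described above, assume a random-intercepts model in which the variance of a module's duration splits additively into an interviewer component, an enumeration-area (EA) component, and a residual. The ICC is then the share of total variance at the two upper levels, and ICC-I is the interviewer share alone. The variance components would come from a fitted multi-level model; the class and function names and the numbers below are hypothetical, and the paper's preferred specification (covariates, how levels are structured) may differ. The helper `flag_modules` mirrors the criterion the paper uses when highlighting modules: a top-5 rank in total variance combined with an ICC-I of at least 10 percent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VarianceComponents:
    """Estimated variances of module duration (minutes squared) at each level."""
    interviewer: float
    enumeration_area: float
    residual: float

    def __post_init__(self) -> None:
        if min(self.interviewer, self.enumeration_area, self.residual) < 0:
            raise ValueError("variance components must be non-negative")
        if self.total == 0:
            raise ValueError("total variance must be positive")

    @property
    def total(self) -> float:
        return self.interviewer + self.enumeration_area + self.residual

    def icc(self) -> float:
        """Share of total variance at the interviewer and EA levels combined."""
        return (self.interviewer + self.enumeration_area) / self.total

    def icc_interviewer(self) -> float:
        """ICC-I: share of total variance attributable to interviewers."""
        return self.interviewer / self.total


def flag_modules(components: Dict[str, VarianceComponents],
                 top_n: int = 5, min_icc_i: float = 0.10) -> List[str]:
    """Modules in the top_n by total variance whose ICC-I is at least min_icc_i."""
    ranked = sorted(components, key=lambda m: components[m].total, reverse=True)
    return [m for m in ranked[:top_n] if components[m].icc_interviewer() >= min_icc_i]


if __name__ == "__main__":
    # Hypothetical variance components, not estimates from the paper.
    example = {
        "food consumption": VarianceComponents(30.0, 5.0, 65.0),
        "household roster": VarianceComponents(5.0, 5.0, 80.0),
        "housing": VarianceComponents(2.0, 1.0, 7.0),
    }
    for name, vc in example.items():
        print(f"{name}: ICC={vc.icc():.2f}, ICC-I={vc.icc_interviewer():.2f}")
    print(flag_modules(example, top_n=2))
```

Ranking by total variance first and only then applying the ICC-I threshold keeps attention on modules where interviewer effects matter in absolute terms, not merely as a share of a small variance.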
Country-specific findings reveal additional modules that rank in the top 5 in terms of total variance and that are also associated with ICC-I estimates of at least 10 percent. These include housing, non-food consumption, consumer durables, and livestock, with interviewer effects explaining 14 to 40 percent of total variance in module duration, depending on the module. Likewise, labor, health, and land ownership rank among the top 5 individual questionnaire modules in terms of total variance in duration in the majority of the countries. The ICC-I estimates range from 10 to 25 percent for the labor module, 11 to 15 percent for the health module, and 6 to 50 percent for the land module. Additional modules with high total variance and ICC-I estimates (of at least 10 percent) emerge in specific countries, including time use, subjective wellbeing, and the modules that capture ownership of and rights to livestock and financial assets. In these cases, depending on the module, interviewer effects explain 11 to 29 percent of total variance in module duration. On the whole, the module duration estimates, survey unit costs, and multi-level hierarchical model insights presented in this paper constitute operationally relevant and previously undocumented reference points for NSOs and survey practitioners in low- and middle-income countries that may adopt (a) questionnaires and fieldwork protocols similar to ours, including the goal of minimizing the use of proxy respondents when collecting personal information on adults; and (b) the use of paradata and the empirical methods presented in this paper to further improve the efficiency and quality of future surveys.3 The paper is structured as follows. Section 2 discusses the LSMS+ data as well as how the paradata were extracted and cleaned.
Section 3 provides descriptive statistics on the duration of the individual questionnaire modules in the LSMS+ surveys relative to the household questionnaire modules, as well as a land module example of using interview length and budgets to make design decisions. Section 4 conducts a multi-level model analysis to understand how interviewer effects contribute to variation in interview length. Finally, Section 5 concludes.

3 Since the implementation of the surveys that inform our analyses, Survey Solutions has issued multiple updates that also have a bearing on the paradata file structure. While researchers may not be able to apply our syntax files directly to their own Survey Solutions paradata, the files can be made available upon request. Our paradata, however, cannot be shared, as they contain confidential information that is excluded from the public use survey datasets.

2. Paradata from National Surveys in Cambodia, Ethiopia, and Tanzania

2.1 Overview

This paper uses the paradata from three nationally representative surveys supported by the LSMS+ program: the Cambodia Living Standards Measurement Study – Plus (LSMS+) Survey 2019/20, the Ethiopia Socioeconomic Survey (ESS) 2018/19, and the Tanzania National Panel Survey 2019/20. The country surveys were implemented by their respective NSOs. Each survey included a multi-topic household questionnaire, as well as a cross-country comparable individual questionnaire that aimed to collect self-reported data on adult household members’ work and employment, and ownership of and rights to physical and financial assets, among other topics.4 The questionnaire structure, wording, and approach to implementing the individual-level survey modules were the same across countries. Table 1 reports general descriptions of each survey and the types of modules included. A more thorough list and short descriptions of the topics covered in the household and individual questionnaires are available in Appendix Table A1.
Appendix Table A2 provides further information on the characteristics of respondents for each survey.

4 Each survey was supported by the World Bank Living Standards Measurement Study – Plus (LSMS+) project, which was established in 2016 to improve the availability and quality of individual-disaggregated survey data on key dimensions of men’s and women’s economic opportunities and welfare. For more information, please visit www.worldbank.org/lsmsplus.

Table 1: Overview of National Surveys Used in Analysis

| | Ethiopia | Tanzania | Cambodia |
|---|---|---|---|
| Survey year | 2018/2019 | 2019/2020 | 2019/2020 |
| Survey | Ethiopia Socioeconomic Survey | Tanzania National Panel Survey | Cambodia LSMS+ Survey |
| Implementing agency | Ethiopia Central Statistical Agency | Tanzania National Bureau of Statistics | National Institute of Statistics of Cambodia |
| Fieldwork period | 9/2018 – 8/2019 | 1/2019 – 1/2020 | 10/2019 – 1/2020 |
| Household sample | 6770 households | 1184 households | 1512 households |
| Scope of household questionnaire | 8 modules | 16 modules | 10 modules |
| Adult respondent sample for individual questionnaire | 7235 men, 8153 women | 1407 men, 1506 women | 1845 men, 2095 women |
| Scope of individual questionnaire | 7 modules | 8 modules | 11 modules |
| Individual questionnaire modules on asset ownership | Non-residential (primarily agricultural) and residential land, financial accounts, mobile phones, livestock | Non-residential (primarily agricultural) and residential land, financial accounts, mobile phones | Non-residential (primarily agricultural) and residential land, financial accounts, mobile phones, livestock, consumer durables |
| Other individual questionnaire modules | Employment, non-farm enterprises, education, health, savings | Employment, non-farm enterprises, education, health, subjective well-being | Employment, non-farm enterprises, education, health, 24-hour time use diary; domestic and international migration |

Notes: LSMS+ data are publicly available. More information on the data and questionnaire can be found at this link.
2.2 Paradata collection and cleaning

The Cambodia, Ethiopia, and Tanzania surveys were conducted using the World Bank Survey Solutions CAPI software, which automatically generates a supplementary paradata file containing timestamps for all “events” associated with each interview. Table 2 provides an example of a paradata file. Each row corresponds to one event and records the timestamp at which the event occurred. The “parameters” column provides the input associated with the event. For example, the first row shows an AnswerSet event for the question hh_a01_1, for which the answer “53” was recorded.5 Rows are ordered sequentially in time, as indicated by the “order” column.6 Our analysis includes data on events that were initiated by the interviewer (variable role==1) and are associated with a questionnaire. The number of observations per person or household may differ, since questions that are not applicable are skipped automatically and are not logged as events.

5 The variable “parameters” is important since it contains key information about the event. For example, for an AnswerSet event, the parameters value contains the question that was answered, the answer, and who answered the question. In some cases, the paradata contain multiple IDs that uniquely distinguish specific assets or agricultural plots. Parsing this information allows for a richer set of analyses at the question or respondent level.

6 Some events were deleted during the data cleaning process, which explains the gaps in the “order” column.

Table 2: Example of Paradata Collected through Survey Solutions

Notes: The column “responsible” reports the name of the interviewer, which is anonymized in the example above.

Several types of events tracked in the paradata are central to the computation of our duration measures.
Examples include AnswerSet (indicates when a question was answered in the interview), CommentSet (marks when a comment was added to a question), AnswerRemoved (flags when the answer to a question was removed), and Paused (denotes a prolonged pause, such as when a tablet goes to sleep).7 The number of observations in a paradata file is typically very large, since a single respondent can generate thousands of events; this volume is one of the major complexities of handling paradata. Events that were not needed for our analysis were excluded, such as KeyAssigned (indicates when an enumerator creates a new interview and Survey Solutions automatically assigns it a unique key) and ApprovedByHeadquarter (indicates when the interview was approved by one of the individuals at headquarters, typically a survey manager). Because AnswerSet events are the ones that occur during the interviews, they make up most of the final analysis dataset (97.5 percent of all events in Cambodia, 98.7 percent in Tanzania, and 98.8 percent in Ethiopia). We then construct a measure of interview length in minutes by calculating the elapsed time between two logged events, and remove outliers, namely events in the top 1 percent of the duration distribution in each country.8 We use the final dataset to estimate interview length at the question, module, individual, and household levels, and to link it with information on household, individual, and interviewer attributes.

7 For detailed information regarding the Survey Solutions paradata file format and comprehensive descriptions of events, please visit: https://docs.mysurvey.solutions/headquarters/export/paradata_file_format/.

8 The removed outliers mostly constitute events that are recorded as the entry of a comment or the assignment of an interviewer to a given household. Additionally, there were rare timestamp entries that were not recorded in sequence and thus produced very large interview times; these were also excluded during the trimming. After the trimming, means were quite close to the medians, indicating that the distribution is more centered than before the trimming. In their analysis, Couper and Kreuter (2013) remove “outlier” events with negative or zero response times or with response times more than 2 standard deviations above the mean. This trimming would have been quite conservative in our case, removing only 0.001 to 0.04 percent of events, depending on the country.

3. Duration and Costs of Household and Individual Interviews

Using the setup discussed above, we turn to country-specific insights on respondent burden, specifically descriptive statistics on the duration of each household and individual questionnaire module and on the duration of household and individual interviews. Given the often-limited resources for conducting multi-topic surveys, the descriptive analysis in this section can indicate which modules might be more costly and complex to implement than others, aiding survey design decisions. We also show how interview length estimates can be used to estimate the monetary costs of household and individual interviews.

3.1 Duration Estimates

Table 3 provides key descriptive statistics on the time burden of the household questionnaire modules, while Table 4 provides the same for the individual questionnaire modules. Countries are presented side by side for ease of comparison. The mean and median duration estimates for each module are computed over the entire household sample for the household questionnaire modules, and over the entire age-eligible sample of household members (which varies by module, e.g., education versus health versus labor) for the individual questionnaire modules.
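The cleaning steps in Section 2.2 and the module-level statistics described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' syntax files. It assumes each paradata event has already been read into a dictionary with hypothetical keys `interview_id`, `order`, `event`, `role`, `timestamp` (an ISO 8601 string), and `variable` (the question name parsed from the parameters column); actual Survey Solutions export column names vary by version. It further assumes that only AnswerSet, CommentSet, and AnswerRemoved events are retained, that the time elapsed since the previous retained event in the same interview is attributed to the current event's question, and that module statistics are computed over interviews with at least one event in the module, whereas the paper uses the full (age-eligible) sample. The question-to-module mapping passed to `module_stats` is hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import datetime

INTERVIEWER_ROLE = 1  # the paper keeps events with role == 1
KEEP_EVENTS = {"AnswerSet", "CommentSet", "AnswerRemoved"}  # assumption


def event_minutes(events):
    """Return (interview_id, variable, minutes) for each retained event after the first."""
    by_interview = defaultdict(list)
    for e in events:
        if e["role"] == INTERVIEWER_ROLE and e["event"] in KEEP_EVENTS:
            by_interview[e["interview_id"]].append(e)
    rows = []
    for interview_id, evs in by_interview.items():
        evs.sort(key=lambda e: e["order"])
        # Elapsed time since the previous retained event is charged to the current question.
        for prev, cur in zip(evs, evs[1:]):
            delta = datetime.fromisoformat(cur["timestamp"]) - datetime.fromisoformat(prev["timestamp"])
            rows.append((interview_id, cur["variable"], delta.total_seconds() / 60))
    return rows


def trim_top_percent(rows, pct=1):
    """Drop events above the (100 - pct)th percentile of the duration distribution."""
    durations = [minutes for _, _, minutes in rows]
    cutoff = statistics.quantiles(durations, n=100, method="inclusive")[100 - pct - 1]
    return [r for r in rows if r[2] <= cutoff]


def module_stats(rows, module_of):
    """Median, mean, std and number of interviews per module (sum of event minutes)."""
    per_interview = defaultdict(float)
    for interview_id, variable, minutes in rows:
        module = module_of.get(variable)
        if module is not None:
            per_interview[(interview_id, module)] += minutes
    per_module = defaultdict(list)
    for (_, module), minutes in per_interview.items():
        per_module[module].append(minutes)
    return {
        module: {
            "median": statistics.median(v),
            "mean": statistics.mean(v),
            "std": statistics.stdev(v) if len(v) > 1 else 0.0,
            "obs": len(v),
        }
        for module, v in per_module.items()
    }
```

In this sketch, passing the output of `trim_top_percent(event_minutes(events))` to `module_stats` produces one median/mean/std/observation record per module, analogous to a row of the upper panel of Table 3. Trimming is applied to the pooled event durations, matching the country-level top 1 percent rule described in Section 2.2.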
Most modules have a median duration of less than 10 minutes per interview, with a few exceptions noted below. We also find consistent patterns across countries in which modules take the longest and in the ranking of modules by interview length. Table 3 shows that the average total household interview ranges from 82 minutes in Cambodia to 120 minutes in Tanzania. The food consumption module emerges as the longest module to administer across countries, with an average module duration of 22 to 26 minutes, which reflects a very short time per question, about 0.16 minutes or less, conditional on answering.9 Other top time-consuming modules across countries are (i) non-food consumption (5 to 13 minutes per module, or 0.19 minutes or less per answered question), (ii) housing (6 to 7 minutes per module, or 0.20 minutes or less per answered question), and (iii) household roster (13 to 23 minutes per module, or 0.29 minutes or less per answered question). The average duration for the administration of the entire set of household-level asset rosters for land, livestock, and apartments ranges from 6 to 7 minutes, resulting in an average duration of 0.20 minutes or less per answered question.

9 Although most questions within modules are comparable across countries, the survey designer can modify or add questions that are of interest to the country and better fit the country’s context. Given this, questions are not identical across countries, and there may be some variation.

Table 3: Duration of Household Questionnaire Modules, by Country

Upper panel: median, mean, and standard deviation of module duration (minutes) and number of module observations

| Module | Cambodia median | Cambodia mean | Cambodia std | Cambodia obs. | Ethiopia median | Ethiopia mean | Ethiopia std | Ethiopia obs. | Tanzania median | Tanzania mean | Tanzania std | Tanzania obs. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cover | 2.88 | 3.15 | 1.42 | 1512 | 2.58 | 2.89 | 1.65 | 6770 | 0.90 | 2.02 | 2.33 | 1184 |
| household roster | 12.38 | 14.35 | 9.02 | 1512 | 11.82 | 13.23 | 8.19 | 6770 | 17.91 | 22.81 | 20.47 | 1184 |
| food consumption | 21.70 | 21.70 | 9.06 | 1512 | 22.13 | 24.12 | 11.83 | 6770 | 25.24 | 26.28 | 12.27 | 1182 |
| food aggregate | - | - | - | - | 3.65 | 4.25 | 2.43 | 6770 | - | - | - | - |
| non-food consumption | 12.02 | 12.94 | 5.22 | 1512 | 5.00 | 5.53 | 2.78 | 6769 | - | - | - | - |
| non-food consumption weekly/monthly | - | - | - | - | - | - | - | - | 5.73 | 6.14 | 2.48 | 1182 |
| non-food consumption annual | - | - | - | - | - | - | - | - | 5.50 | 5.86 | 2.48 | 1182 |
| housing | 5.34 | 5.81 | 2.56 | 1512 | 5.86 | 6.43 | 2.79 | 6770 | 6.78 | 7.28 | 2.95 | 1183 |
| land roster | 4.23 | 5.01 | 3.59 | 1512 | 1.13 | 1.56 | 1.49 | 6766 | 2.34 | 3.15 | 2.87 | 1182 |
| livestock roster | 1.02 | 1.25 | 1.12 | 1512 | 1.37 | 4.55 | 6.85 | 6769 | - | - | - | - |
| apartment roster | 0.27 | 0.41 | 0.39 | 1512 | - | - | - | - | - | - | - | - |
| consumer durables | 1.22 | 1.48 | 0.92 | 1512 | 3.23 | 3.58 | 1.79 | 6770 | 13.23 | 14.55 | 7.02 | 1182 |
| children elsewhere | 1.38 | 6.84 | 9.50 | 1512 | - | - | - | - | - | - | - | - |
| household enterprise | 1.60 | 9.41 | 12.59 | 1512 | 0.93 | 3.29 | 4.93 | 6770 | 0.94 | 5.36 | 6.84 | 1140 |
| credit | - | - | - | - | 0.37 | 1.10 | 1.80 | 6770 | 0.20 | 0.65 | 1.18 | 1178 |
| finance | - | - | - | - | - | - | - | - | 3.98 | 4.86 | 3.34 | 1182 |
| food security | - | - | - | - | 1.90 | 2.36 | 1.59 | 6770 | 3.03 | 3.37 | 1.72 | 1183 |
| shock | - | - | - | - | 1.25 | 1.86 | 1.66 | 6770 | 1.53 | 1.97 | 1.61 | 1182 |
| other income | - | - | - | - | 1.00 | 1.42 | 1.23 | 6770 | - | - | - | - |
| assistance | - | - | - | - | 0.27 | 0.58 | 0.93 | 6769 | 0.68 | 1.14 | 1.36 | 1182 |
| recontact information | - | - | - | - | - | - | - | - | 8.52 | 8.90 | 2.91 | 1182 |
| anthropometry | - | - | - | - | - | - | - | - | 2.95 | 3.80 | 3.55 | 1029 |
| death in household | - | - | - | - | - | - | - | - | 0.17 | 0.44 | 1.08 | 1179 |
| all household modules combined | 75.98 | 82.32 | 32.03 | 1512 | 72.48 | 76.76 | 30.15 | 6770 | 167.54 | 195.18 | 112.45 | 1184 |

Lower panel: percentage of households answering the module, total questions in the module, average number of questions answered per household, and average length per question (minutes) if answered

| Module | Cambodia % answering | Cambodia total questions | Cambodia avg. questions answered | Cambodia min. per answered question | Ethiopia % answering | Ethiopia total questions | Ethiopia avg. questions answered | Ethiopia min. per answered question | Tanzania % answering | Tanzania total questions | Tanzania avg. questions answered | Tanzania min. per answered question |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cover | 100 | 14 | 7 | 0.35 | 100 | 21 | 6 | 0.35 | 100 | 12 | 6 | 0.29 |
| household roster | 100 | 43 | 26 | 0.14 | 100 | 28 | 20 | 0.16 | 100 | 40 | 30 | 0.20 |
| food consumption | 100 | 8 | 8 | 0.16 | 100 | 131 | 55 | 0.13 | 100 | 44 | 34 | 0.16 |
| food aggregate | - | - | - | - | 100 | 22 | 20 | 0.12 | - | - | - | - |
| non-food consumption | 100 | 5 | 5 | 0.14 | 100 | 8 | 5 | 0.15 | - | - | - | - |
| non-food consumption weekly/monthly | - | - | - | - | - | - | - | - | 100 | 4 | 4 | 0.19 |
| non-food consumption annual | - | - | - | - | - | - | - | - | 100 | 5 | 4 | 0.19 |
| housing | 100 | 40 | 27 | 0.19 | 100 | 56 | 32 | 0.18 | 100 | 62 | 34 | 0.20 |
| land roster | 100 | 19 | 14 | 0.18 | 100 | 10 | 4 | 0.16 | 100 | 24 | 8 | 0.24 |
| livestock roster | 100 | 6 | 5 | 0.09 | 100 | 16 | 4 | 0.14 | - | - | - | - |
| apartment roster | 100 | 7 | 4 | 0.10 | - | - | - | - | - | - | - | - |
| consumer durables | 100 | 5 | 5 | 0.11 | 100 | 3 | 3 | 0.08 | 100 | 5 | 5 | 0.17 |
| children elsewhere | 100 | 31 | 13 | 0.21 | - | - | - | - | - | - | - | - |
| household enterprise | 100 | 48 | 21 | 0.18 | 100 | 42 | 10 | 0.16 | 96 | 33 | 14 | 0.26 |
| credit | - | - | - | - | 100 | 28 | 5 | 0.20 | 99 | 13 | 2 | 0.22 |
| finance | - | - | - | - | - | - | - | - | 100 | 36 | 15 | 0.17 |
| food security | - | - | - | - | 100 | 19 | 13 | 0.16 | 100 | 24 | 16 | 0.19 |
| shock | - | - | - | - | 100 | 11 | 5 | 0.08 | 100 | 6 | 2 | 0.13 |
| other income | - | - | - | - | 100 | 10 | 3 | 0.09 | - | - | - | - |
| assistance | - | - | - | - | 100 | 13 | 3 | 0.12 | 100 | 22 | 3 | 0.17 |
| recontact information | - | - | - | - | - | - | - | - | 100 | 24 | 24 | 0.35 |
| anthropometry | - | - | - | - | - | - | - | - | 87 | 7 | 5 | 0.26 |
| death in household | - | - | - | - | - | - | - | - | 100 | 21 | 2 | 0.24 |

Notes: The number of module observations is based on how many respondents answered that module. In Tanzania, the household member roster and land roster were integrated and are thus combined in the table.

Table 4 shows that the average individual interview ranges from 13 minutes in Ethiopia to 25 minutes in Cambodia. The land module consistently emerges as the longest module to administer in each country, with an average duration of 8 to 13 minutes. It is followed by the labor module (average duration of 3 to 6 minutes), the health module (2 to 3 minutes), and the education module (2 to 3 minutes). The other individual-level asset modules are less time-consuming.
The average duration for the administration of the entire set of modules on land, livestock, apartments, financial assets, mobile phones, durables (if applicable), and savings (if applicable) ranges from 14 to 18 minutes for each interviewed adult, with an average of 0.18 minutes or less per answered question. The number of questions within a module and the number of individual interview targets for a given module are clearly linked with interview duration. The lower panels of Tables 3 and 4 show how many questions each module contains and how many respondents answered each module (some were not eligible to answer). Some household modules are required, and their response rates are therefore almost always 100 percent. Individual modules, on the other hand, may not be administered to all individuals in the household, depending on the age eligibility for each module. For instance, the health module aims to collect information on all household members, while the age threshold is typically 5 years for the education module and 18 years for the land module (except in rare cases when the head of household or his/her spouse is younger than 18 and still qualifies as an interview target). Moreover, not all questions within a module are applicable to every respondent. For each country in the lower panels of Tables 3 and 4, the third-to-last column gives the total number of questions in the module, while the second-to-last column gives the average number of questions that the respondent answered.
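The module-level quantities in the lower panels of Tables 3 and 4 (module duration, questions answered, and length per answered question) can be built up from timestamped paradata events. The Python sketch below shows one possible aggregation; it is not the authors' cleaning procedure. It assumes a hypothetical event layout (`interview_id`, `module`, `question`, `timestamp`), credits each answer with the time elapsed since the previous answer in the same interview, and caps long gaps at a hypothetical `max_gap_seconds` threshold so that pauses do not inflate module durations.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median, pstdev


@dataclass
class AnswerEvent:
    """Hypothetical paradata record: one answer given during an interview."""
    interview_id: str
    module: str
    question: str
    timestamp: datetime


def module_durations(events, max_gap_seconds=300.0):
    """Map (interview_id, module) -> (minutes, number of distinct questions answered).

    Each answer is credited with the time since the previous answer in the
    same interview, capped at max_gap_seconds (an assumed pause threshold).
    The first answer of an interview is credited with zero time.
    """
    by_interview = defaultdict(list)
    for ev in events:
        by_interview[ev.interview_id].append(ev)

    seconds = defaultdict(float)
    answered = defaultdict(set)
    for interview_id, evs in by_interview.items():
        evs.sort(key=lambda e: e.timestamp)
        previous = None
        for ev in evs:
            key = (interview_id, ev.module)
            if previous is not None:
                gap = (ev.timestamp - previous).total_seconds()
                seconds[key] += min(gap, max_gap_seconds)
            answered[key].add(ev.question)  # a re-answered question counts once
            previous = ev.timestamp
    return {key: (seconds[key] / 60.0, len(qs)) for key, qs in answered.items()}


def summarize_module(durations, module):
    """Median, mean, population std (minutes), count, and minutes per answered question."""
    rows = [value for (_, mod), value in durations.items() if mod == module]
    minutes = [m for m, _ in rows]
    return {
        "median": median(minutes),
        "mean": mean(minutes),
        "std": pstdev(minutes),
        "n": len(rows),
        "minutes_per_answered_question": mean(m / n for m, n in rows),
    }


if __name__ == "__main__":
    t0 = datetime(2019, 5, 1, 9, 0, 0)
    demo = [
        AnswerEvent("hh1", "cover", "q1", t0),
        AnswerEvent("hh1", "cover", "q2", t0 + timedelta(seconds=30)),
        AnswerEvent("hh1", "roster", "r1", t0 + timedelta(seconds=90)),
        AnswerEvent("hh1", "roster", "r2", t0 + timedelta(seconds=1090)),  # long pause
    ]
    durations = module_durations(demo)
    print(durations)
    print(summarize_module(durations, "roster"))
```

In the demo, the 1,000-second gap before `r2` is capped at 300 seconds, so the roster module is credited with 6 minutes (60 + 300 seconds) rather than about 17.7 minutes. The cap, like the event layout, is an illustrative assumption; the appropriate treatment of pauses depends on the data cleaning rules adopted for the survey.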
A module with more questions than another does not necessarily take longer to administer, since its duration ultimately depends on (i) the number of questions actually answered, (ii) the reliance on recall to provide information (as in the case of reporting quantities and expenditures in the consumption modules), and (iii) the use of open-ended questions (as in the case of descriptions of the main and secondary jobs in the labor module), among other factors.
Table 4: Duration of Individual Questionnaire Modules, by Country
Columns for each country: median (minutes) | mean (minutes) | std | number of module obs.
Module name | Cambodia | Ethiopia | Tanzania
education | 1.67 2.35 1.93 6029 | 1.17 1.53 1.15 26066 | 2.62 3.14 2.10 4619
health | 1.55 2.09 1.66 6332 | 1.30 2.04 1.84 28877 | 2.53 2.93 1.73 5550
labor | 2.68 4.69 5.00 5741 | 2.48 3.20 2.31 23416 | 5.10 6.43 4.61 4616
land | 8.80 11.22 13.07 3566 | 3.63 7.63 12.11 15040 | 8.44 12.81 15.90 2934
mobile phones | 1.25 1.60 1.35 3430 | 0.58 0.77 0.73 15048 | 0.93 1.10 0.79 2926
financial assets | 0.58 1.13 1.48 3660 | 0.88 1.60 1.72 15074 | 0.70 1.03 0.95 2437
livestock | 1.97 2.50 2.09 2082 | - - - - | - - - -
migration | 1.35 3.10 4.11 4007 | - - - - | - - - -
time | 8.00 8.76 3.53 3650 | - - - - | - - - -
durables | 1.10 1.40 1.09 3663 | - - - - | - - - -
savings | - - - - | 3.43 4.04 2.34 15037 | - - - -
subjective well-being | - - - - | - - - - | 1.70 1.91 1.36 3503
food outside household | - - - - | - - - - | 0.23 0.59 0.84 5524
all ind modules combined | 21.9 25.17 22.01 6363 | 8.68 13.29 13.73 28958 | 16 20.43 17.9 5564
Lower panel, columns for each country: % of HH/Ind answering module | total questions in module | average questions answered per HH/Ind | average length per answered question (minutes)
Module name | Cambodia | Ethiopia | Tanzania
education | 95 29 13 0.14 | 90 25 9 0.15 | 83 49 16 0.18
health | 100 29 12 0.13 | 100 52 14 0.14 | 100 61 19 0.15
labor | 90 81 20 0.17 | 81 62 20 0.15 | 83 96 29 0.20
land | 56 49 25 0.11 | 52 154 31 0.13 | 53 128 23 0.18
mobile phones | 54 14 9 0.13 | 52 12 5 0.15 | 53 14 6 0.16
financial assets | 58 22 5 0.12 | 52 20 5 0.15 | 44 17 4 0.15
apartment | 0 19 10 0.11 | - - - - | - - - -
livestock | 33 14 8 0.16 | - - - - | - - - -
migration | 63 72 14 0.19 | - - - - | - - - -
time | 57 50 49 0.16 | - - - - | - - - -
durables | 58 8 5 0.17 | - - - - | - - - -
savings | - - - - | 52 38 18 0.11 | - - - -
subjective well-being | - - - - | - - - - | 63 17 10 0.17
food outside household | - - - - | - - - - | 99 15 3 0.18
Notes: The number of module observations is the number of individuals (or households) that answered the module.
When interpreting the results in Table 4, note that the duration estimates for the modules on asset ownership are not conditional on ownership of the asset in question. For example, the incidence of financial asset ownership is fairly low across countries, and individuals who respond “no” to the initial filter question on ownership do not complete the remaining questions. This drives down the module-specific mean and median estimates. Table 5 therefore shows how interview length varies once a respondent’s ownership of the asset is taken into account. Conditional on the individual’s ownership of the applicable asset, the average duration estimates for the modules on mobile phones, financial assets, and livestock are still quite low across countries. By contrast, the conditional average duration for the land module ranges from 10 to 18 minutes (compared with the unconditional range of 8 to 13 minutes shown in Table 4).
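Because the unconditional module duration pools owners, who go through the full module, with non-owners, who stop after the filter question, it behaves like an observation-weighted average of the two conditional means. A minimal Python sketch of that arithmetic, using the Cambodia land-module figures from Table 5 (the helper name `pooled_mean` is ours, for illustration):

```python
def pooled_mean(groups):
    """Observation-weighted mean of (mean_minutes, n_obs) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(m * n for m, n in groups) / total_n


# Cambodia, land module, Table 5: (mean minutes, number of observations)
non_owners = (1.11, 250)
owners = (12.04, 3298)
print(f"{pooled_mean([non_owners, owners]):.2f}")  # prints 11.27
```

Because non-owners make up only a small share of land-module respondents in Cambodia, the pooled mean stays close to the owners' mean; where non-ownership is common, as for financial assets, the pooled figure is pulled down much more. The observation counts in Tables 4 and 5 do not match exactly, so this illustrates the mechanism rather than reconciling the two tables.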
Table 5: Duration of Individual Questionnaire Modules on Assets, by Respondents’ Asset Ownership Status
Columns for each country: mean (minutes) | median (minutes) | std | number of obs.
Module name | Cambodia | Ethiopia | Tanzania
Non-owners
Financial assets | 0.98 0.55 1.23 3459 | 0.75 0.63 0.38 9306 | 0.97 0.68 0.89 2346
Land | 1.11 0.77 1.22 250 | 1.16 0.75 1.53 4066 | 1.81 1.27 1.94 275
Mobile phones | 0.56 0.37 0.52 449 | 0.22 0.17 0.18 6784 | 0.57 0.45 0.45 626
Livestock | 0.73 0.55 0.55 350 | - - - - | - - - -
Owners
Financial assets | 3.61 2.88 2.74 200 | 2.97 2.43 2.12 5767 | 2.59 2.18 1.33 81
Land | 12.04 9.58 13.26 3298 | 10.08 5.92 13.43 10852 | 18.36 13.17 16.73 1977
Mobile phones | 1.76 1.37 1.37 2967 | 1.07 0.92 0.63 8056 | 1.39 1.17 0.80 1792
Livestock | 2.87 2.35 2.11 1720 | - - - - | - - - -
Notes: The table describes the length of the asset modules for each individual respondent.
Due to the disaggregated nature of the paradata, we can also estimate how the average total time spent in the household varies with the number of adults targeted for individual interviews. Figure 1 compares the average total time spent in the household (for the administration of the household and individual questionnaires) with the average total duration of the individual interviews in each household. The estimates are presented by the number of individual interview targets, together with the average duration of the land, labor, health, and education modules. Total time spent in the household increases with the number of individual interview targets. In Cambodia, interviewers spent on average 103 minutes in total administering the household and individual questionnaires when the household had only one individual interview target. Moving from one target to two, the average total time in the household increases to 152 minutes. The marginal time for each additional individual interview target, however, generally decreases, with the exception of Tanzania.
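The marginal time per additional interview target is the difference between consecutive totals in Figure 1. A minimal Python sketch using the total-household-time data labels of the Cambodia and Ethiopia panels (the dictionary layout and function name are ours):

```python
# Average total time in the household (minutes) for 1, 2, 3 and 4
# individual interview targets, as labeled in Figure 1.
total_hh_minutes = {
    "Cambodia": [103, 152, 195, 231],
    "Ethiopia": [80, 121, 142, 164],
}


def marginal_minutes(totals):
    """Extra minutes associated with each additional interview target."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


for country, totals in total_hh_minutes.items():
    print(country, marginal_minutes(totals))
# Cambodia [49, 43, 36]
# Ethiopia [41, 21, 22]
```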
In Ethiopia, the total time in the household increases by 41 minutes when moving from one target to two, while each additional target beyond the second adds only 21 to 22 minutes. Tanzania’s increase in time per additional interview target is the largest of the three countries, reflecting the number of modules its survey administered: it ranges from 81 minutes when moving from one to two targets to as high as 191 minutes when moving from three to four targets.
Figure 1: Interview Length (in Minutes) by Number of Individual Interview Targets
[Figure: one panel each for Cambodia, Tanzania, and Ethiopia; bars show minutes by number of interview targets in the household. Data labels, in minutes:]
Country | # of interview targets in HH | total HH time | total ind. modules | land | labor | health | education
Cambodia | 1 | 103 | 37 | 8 | 6 | 4 | 2
Cambodia | 2 | 152 | 79 | 19 | 13 | 7 | 7
Cambodia | 3 | 195 | 109 | 26 | 17 | 8 | 9
Cambodia | 4 | 231 | 139 | 31 | 23 | 10 | 11
Tanzania | 1 | 135 | 29 | 7 | 9 | 3 | 2
Tanzania | 2 | 224 | 65 | 18 | 17 | 9 | 7
Tanzania | 3 | 328 | 105 | 29 | 26 | 14 | 14
Tanzania | 4 | 519 | 259 | 87 | 63 | 39 | 44
Ethiopia | 1 | 80 | 22 | 3 | 5 | 3 | 2
Ethiopia | 2 | 121 | 47 | 9 | 9 | 8 | 5
Ethiopia | 3 | 142 | 62 | 15 | 12 | 8 | 7
Ethiopia | 4 | 164 | 77 | 18 | 15 | 9 | 8
Notes: Only up to four targeted individuals are shown in the graphs. Most households have between one and four eligible individuals: 95.75 percent of households in Ethiopia, 93.92 percent in Cambodia, and 90.88 percent in Tanzania have four or fewer.
Overall, except for a few more complex-to-implement modules, module duration estimates appear modest for most of the individual questionnaire modules. While including modules deemed critical for development research and policy making may not add as much time at the margin as one might think, the total respondent burden implied by the administration of both household and individual questionnaires is considerable in every country.
As such, analyzing survey paradata to obtain objective proxies for respondent burden is precisely what survey practitioners should do to make evidence-based decisions about survey data collection, in particular about the scope of household- and individual-level data collection given budget constraints and data priorities.
3.2. Cost Estimates
The limited cross-country availability of information on household survey costs continues to be a challenge for the international statistical community and donor organizations. Even when such information is available, computing unit costs on the basis of the number of interviewed households is a second-best approach when surveys differ substantially in questionnaire design and fieldwork organization (e.g., one visit versus multiple visits to sampled households, or resident interviewers versus mobile field teams; see, for instance, the analysis of Kilic et al., 2017). In view of these challenges, in this section we combine (a) the paradata on the total time spent administering the household and individual questionnaires with (b) information on survey implementation costs to estimate the unit cost of a minute of face-to-face multi-topic survey data collection in Cambodia, Ethiopia, and Tanzania. This approach improves the precision with which country-specific unit costs can be compared. These estimates, and comparable measures from other countries, can also be paired with interview-, module-, or question-specific average durations to estimate more precisely the costs associated with specific types of data collection in future surveys. To estimate the unit cost of a minute of survey data collection, one can divide the total survey budget by the total interview length across all completed interviews in the survey.
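As a minimal Python sketch of this arithmetic (function names are illustrative), the unit cost is the budget divided by total interview minutes, and the cost of an interview is the unit cost multiplied by its average duration. Applied to the rounded unit costs and average durations reported in Table 6, the second step reproduces the published per-interview costs to within a dollar or two; the small gaps reflect rounding in the published inputs.

```python
def unit_cost_per_minute(total_survey_cost, total_interview_minutes):
    """Total survey budget divided by the total adjusted interview time."""
    return total_survey_cost / total_interview_minutes


def interview_cost(cost_per_minute, average_minutes):
    """Average cost of one interview of a given average length."""
    return cost_per_minute * average_minutes


# Table 6 values: (USD per minute, avg. household minutes, avg. individual minutes)
table6 = {
    "Cambodia": (0.87, 82, 25),
    "Ethiopia": (1.71, 77, 13),
    "Tanzania": (3.94, 120, 20),
}
for country, (per_minute, hh_min, ind_min) in table6.items():
    hh = interview_cost(per_minute, hh_min)
    ind = interview_cost(per_minute, ind_min)
    print(f"{country}: household ${hh:.0f}, individual ${ind:.0f}")
```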
The total interview length is the adjusted length, which includes only the key parts of the interview process (certain events are excluded from the time calculation; see the earlier section on how the data were cleaned). The disaggregated survey budgets that we have access to are comparable in terms of the main budget categories, including household listing, piloting, recruitment of staff, training and field practice, fieldwork implementation (including remuneration and transportation costs), and purchases of equipment and office materials. Table 6 reports our cost estimates. In Cambodia, the cost of a minute of survey data collection is estimated at $0.87, while the comparable figure is $1.71 in Ethiopia and $3.94 in Tanzania. With the estimated unit cost, we can calculate the average cost of a household interview, which is $71 in Cambodia, $131 in Ethiopia, and $472 in Tanzania. Each additional individual interview per household costs, on average, $22 in Cambodia, $23 in Ethiopia, and $80 in Tanzania.
Table 6: Cross-Country Cost Comparisons
Indicator | Ethiopia ESS 2018/2019 | Tanzania NPS 2019/2020 | Cambodia LSMS+ 2019/2020
Unit Cost of a Minute of Interview Time (USD in 2019 Prices)* | $1.71 | $3.94 | $0.87
Household Interviews
# of Completed Interviews | 6,770 | 1,323 | 1,519
# of Modules in Household Questionnaire | 15 | 16 | 11
# of Questions in Household Questionnaire | 418 | 397 | 226
Average Duration of a Household Interview (Minutes) | 77 | 120 | 82
Average Cost of a Household Interview | $131 | $472 | $71
Individual Interviews
# of Completed Interviews | 29,038 | 5,564 | 6,363
# of Modules in Individual Questionnaire | 7 | 8 | 12
# of Questions in Individual Questionnaire | 363 | 382 | 392
Average Duration of an Individual Interview (Minutes) | 13 | 20 | 25
Average Cost of an Individual Interview | $23 | $80 | $22
Notes: The unit cost of a minute of interview time is calculated as the ratio of total survey cost to the total duration of all completed household and individual interviews in the country.
Survey costs include the costs of household listing, piloting, recruitment of staff, training and field practice, fieldwork (field staff salaries and per diems, managerial staff per diems, vehicle rental and maintenance, and fuel), and purchases of equipment and office materials.
4. Interviewer Effects on Interview Duration
Given the connection between interview length and survey costs, survey practitioners need to understand the factors related to fieldwork implementation that may affect interview duration, especially those that can be addressed through enhanced training and supervision, which would also improve the efficiency and quality of surveys. Interviewers are undoubtedly among these factors. In view of the central role that interviewers play in face-to-face survey data collection, this section presents analyses that seek to document potential interviewer effects on survey data collection, specifically on interview duration. Given the complexity of the multi-topic survey data collection that interviewers are entrusted with, interviewers should receive rigorous training and field practice to minimize heterogeneity across interviewers in all aspects of data collection from human subjects. Variation in interview length by interviewer could indicate that additional training, field practice, and fieldwork supervision are needed, along with a critical evaluation of the interviewer pool and the recruitment practices for future surveys. The type of analysis that we showcase can be conducted both during and after survey fieldwork: the former may lead to additional steps for “course correction” in an ongoing survey, while the latter informs decisions regarding future surveys. In the literature, the term “interviewer effect” refers to interviewer contributions to variation in interview outcomes.
Interviewers can contribute to the variability of respondents’ answers, non-response/survey participation, measurement error or bias, and interview length. West and Blom (2017) summarize findings concerning interviewer effects, focusing on high-income countries. In low- and middle-income countries, interviewer characteristics such as gender (Flores-Macias et al. 2008; Vollmer et al. 2021) and ethnicity (Adida et al. 2016) have been found to affect responses. Research has also demonstrated that interviewer characteristics can affect answers to sensitive questions, including those on political preference in Uganda (Maio and Fiala 2020), domestic violence in India (Singh et al. 2022), and abortion (in the context of Demographic and Health Surveys) (Leone et al. 2021). Until now, however, paradata-driven analyses of interviewer effects on survey duration have been limited in low- and middle-income countries, in contrast with the numerous applications in high-income contexts. The latter research has typically relied on multilevel models, which are designed for data with a hierarchical structure, i.e., data nested within larger groups (such as responses nested within interviewers). These models have been used to decompose the variance in survey duration into contributions from interviewers, enumeration areas, and respondents. In high-income countries, these contributions have overall been quite modest (Couper and Kreuter, 2013).¹⁰ With the increasing availability of CAPI/CATI paradata, there is now an opportunity to undertake similar analyses in low- and middle-income countries. In what follows, we use the CAPI survey paradata at our disposal to estimate multilevel models that identify interviewer effects on survey duration in Cambodia, Ethiopia, and Tanzania.
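To make the idea of variance decomposition concrete, the Python sketch below estimates the share of variance in module duration attributable to interviewers with the classical one-way random-effects ANOVA (method-of-moments) estimator. This is a deliberately simpler technique than the multilevel models estimated in this paper: it has a single grouping level (comparable to a setup in which each interviewer works in a single EA), no covariates, and a hypothetical input layout of `(interviewer_id, minutes)` pairs.

```python
from collections import defaultdict


def anova_icc(records):
    """One-way random-effects ICC from (interviewer_id, minutes) pairs.

    ANOVA (method-of-moments) estimator for possibly unbalanced groups:
        sigma2_u = max(0, (MSB - MSW) / n0)   between-interviewer variance
        sigma2_e = MSW                         within-interviewer variance
        ICC      = sigma2_u / (sigma2_u + sigma2_e)
    """
    groups = defaultdict(list)
    for interviewer, minutes in records:
        groups[interviewer].append(float(minutes))

    g = len(groups)
    n_total = sum(len(v) for v in groups.values())
    if g < 2 or n_total <= g:
        raise ValueError("need at least two interviewers and repeated interviews")

    group_means = {k: sum(v) / len(v) for k, v in groups.items()}
    grand_mean = sum(sum(v) for v in groups.values()) / n_total

    ss_between = sum(len(v) * (group_means[k] - grand_mean) ** 2 for k, v in groups.items())
    ss_within = sum((x - group_means[k]) ** 2 for k, v in groups.items() for x in v)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)

    # n0 is the effective group size used for unbalanced designs.
    n0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / (g - 1)

    sigma2_u = max(0.0, (ms_between - ms_within) / n0)
    sigma2_e = ms_within
    total = sigma2_u + sigma2_e
    return sigma2_u / total if total > 0 else 0.0


if __name__ == "__main__":
    durations = [("A", 1), ("A", 2), ("A", 3),
                 ("B", 4), ("B", 5), ("B", 6),
                 ("C", 7), ("C", 8), ("C", 9)]
    print(round(anova_icc(durations), 3))  # 0.897
```

Truncating a negative between-group estimate at zero is a common convention for this estimator. Unlike multilevel models, this sketch cannot separate interviewer from EA variance or adjust for respondent covariates; it only illustrates what an interviewer share of variance measures.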
Our analysis differs from previous paradata-powered research in that we do not model question-level duration; rather, we focus on module duration and conduct a separate analysis for each module. This allows us to contrast results across modules, which is currently a significant gap in the literature. Specifically, we estimate a range of multilevel models that are specific to each individual and household questionnaire module. The dependent variable is always the module duration in minutes. To model duration for individual questionnaire modules, we first estimate Equation 1 as a basic intercept-only model, with random effects solely for interviewers and enumeration areas (EAs), i.e., the hierarchy defined for the multilevel model:
$$ y_{ihjk} = \beta_0 + u_j + u_{jk} + e_{ihjk} \qquad \text{(Equation 1)} $$
where $y_{ihjk}$ is the module duration for individual $i$ (or household $h$, if it is a household interview) who is interviewed by interviewer $j$ in enumeration area $k$; $\beta_0$ is the intercept; $u_j$ is the interviewer random effect; $u_{jk}$ is the EA random effect; and $e_{ihjk}$ is the error term, with $u_j \sim N(0, \sigma^2_{u_j})$, $u_{jk} \sim N(0, \sigma^2_{u_{jk}})$, and $e_{ihjk} \sim N(0, \sigma^2_e)$.¹¹,¹²
Equation 2 builds on Equation 1 for the individual questionnaire modules: it brings in vectors of individual ($\mathbf{I}$) and household ($\mathbf{H}$) attributes and otherwise leaves the hierarchy unaltered:
$$ y_{ihjk} = \hat{\beta}_0 + \hat{\beta}_1 \mathbf{I}_{ih} + \hat{\beta}_2 \mathbf{H}_h + \hat{u}_j + \hat{u}_{jk} + \hat{e}_{ihjk} \qquad \text{(Equation 2)} $$
Equation 3 is the equivalent of Equation 2 for the household questionnaire modules and controls only for household attributes:
$$ y_{hjk} = \hat{\beta}_0 + \hat{\beta}_2 \mathbf{H}_h + \hat{u}_j + \hat{u}_{jk} + \hat{e}_{hjk} \qquad \text{(Equation 3)} $$
Furthermore, each survey attempted to administer an individual questionnaire in private to each adult man and woman in the sampled households. This setup enables us to estimate additional specifications as sensitivity checks of whether an alternative hierarchical structure changes the results. Instead of EA random effects, we include household random effects in addition to interviewer random effects. Equation 4 is the intercept-only version of this specification, and Equation 5 builds on Equation 4 by controlling for the vectors of individual and household attributes:
$$ y_{ihj} = \dot{\beta}_0 + \dot{u}_j + \dot{u}_{jh} + \dot{e}_{ihj} \qquad \text{(Equation 4)} $$
$$ y_{ihj} = \ddot{\beta}_0 + \ddot{\beta}_1 \mathbf{I}_{ih} + \ddot{\beta}_2 \mathbf{H}_h + \ddot{u}_j + \ddot{u}_{jh} + \ddot{e}_{ihj} \qquad \text{(Equation 5)} $$
where $\dot{u}_{jh}$ and $\ddot{u}_{jh}$ denote the household random effects.
10 Couper and Kreuter (2013), for example, found that interviewers contribute less than 2 percent of the variation in interview length, while respondents contribute about 3.8 to 6.3 percent, depending on the model. Most of the variation, about 96 percent (again depending on the model), is at the question level. The authors noted that this result is consistent with the literature, and thus interviewer and respondent contributions should not be expected to have much impact on survey time in fieldwork.
11 For further descriptions of multilevel models, see Gelman and Hill (2007).
12 The main objective of the surveys that inform our research was not to analyze interviewer effects; as such, interviewers were not randomly assigned to sampled households and individual interview targets. Following common practice in large-scale household surveys, NSOs considered regional language requirements in composing field teams, which were in turn assigned to regions with matching language profiles. Since unobserved variables may confound our analyses given the lack of random assignment of interviewers to sampled households and individual interview targets, we prioritize the discussion of results from the estimations that control for observable individual- and household-level attributes.
Multilevel models allow us to calculate the intraclass correlation (ICC). Based on Equation 1, the total ICC, encompassing both the interviewer and the enumeration area components, can be defined as:
$$ \text{ICC Total} = \frac{\sigma^2_{u_j} + \sigma^2_{u_{jk}}}{\sigma^2_{u_j} + \sigma^2_{u_{jk}} + \sigma^2_e} $$
The ICC describes how much of the variation in module duration is attributable to the grouping structure, that is, to interviews belonging to the same enumeration area and conducted by the same interviewer. One can also think of the ICC as the share of total variance accounted for by the variance of the random intercepts, which in our case can be decomposed further into enumeration area and interviewer components. A large intercept variance reflects large variation in interview length across groups, even after controlling for other factors. A large interviewer component of the ICC indicates that the random intercepts are widely spread across interviewers relative to the other variance components. We thus use this component to denote the interviewer effects:
$$ \text{ICC-I} = \frac{\sigma^2_{u_j}}{\sigma^2_{u_j} + \sigma^2_{u_{jk}} + \sigma^2_e} $$
with the enumeration area component, ICC-EA, defined analogously with $\sigma^2_{u_{jk}}$ in the numerator.
In the cases of Cambodia and Tanzania, Equations 1-3 are estimated with the same hierarchical structure (i.e., with EA and interviewer random effects). For Ethiopia, since the survey relied on resident interviewers, each of whom was assigned to a single EA, Equations 1-3 are estimated with a single hierarchical level that captures both interviewers and EAs. For consistency with the other countries, we label the resulting ICC estimates as interviewer effects.
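The ICC definitions can be applied directly to estimated variance components. A minimal Python sketch (function and key names are ours) computes ICC Total, ICC-EA, ICC-I, and the share of ICC-I in ICC Total from the interviewer, EA, and residual variances. The example backs out the components from the Cambodia time-use row of Table 7a (sum of variance 0.334, residual variance 0.165, ICC-I 0.270), so the recovered values can match the table only up to rounding.

```python
def icc_decomposition(var_interviewer, var_ea, var_residual):
    """ICC measures from the interviewer, EA and residual variance components."""
    total = var_interviewer + var_ea + var_residual
    grouped = var_interviewer + var_ea
    return {
        "sum_of_variance": total,
        "icc_total": grouped / total,
        "icc_ea": var_ea / total,
        "icc_i": var_interviewer / total,
        "pct_icc_i_in_total": 100.0 * var_interviewer / grouped if grouped > 0 else 0.0,
    }


# Back out the variance components from Table 7a (Cambodia, time use module).
sum_of_variance, residual, icc_i = 0.334, 0.165, 0.270
var_interviewer = icc_i * sum_of_variance
var_ea = sum_of_variance - residual - var_interviewer
result = icc_decomposition(var_interviewer, var_ea, residual)
print({name: round(value, 3) for name, value in result.items()})
```

This recovers an ICC Total of about 0.506 and an ICC-EA of about 0.236, as reported, and an ICC-I share of about 53.4 percent of ICC Total (53.39 in Table 7a, which is computed from unrounded estimates).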
When we estimate Equation 5 for Ethiopia, we present the ICC components for households and interviewers separately. Comparing the Ethiopia-specific findings from Equations 2 and 5 in fact reveals only marginal differences between the Equation 2 total ICC and the Equation 5 interviewer component of the ICC for the individual questionnaire modules.
Tables 7a-7c present the multilevel model results, controlling for individual and household attributes (Equations 2 and 3), for each module administered in the three countries.¹³ The results do not vary significantly when controlling for other covariates.¹⁴
13 The results from the intercept-only Equations 1 and 4 are provided in Appendix Tables A3.1-A3.3 in the interest of brevity. The vector of household-level attributes in Equations 2 and 3 includes household size, the dependency ratio, and dichotomous variables that identify (i) whether the head of household is female; (ii) whether the household is in a rural area; (iii) non-food consumption quintiles (with the first quintile as the comparison category); (iv) whether the dwelling is built with concrete/bricks; (v) whether there is piped water into the dwelling; (vi) whether a toilet facility is available in the household or elsewhere; and (vii) whether the household is an agricultural household. The vector of individual-level attributes in Equations 2 and 5 includes the individual’s highest year of education and dichotomous variables that identify (i) whether the individual is female; (ii) whether he/she is the head of household; (iii) whether the person worked in any of the following employment categories: own farm, wage/salary work, and self-employed activities; (iv) whether the person is married; (v) the individual’s age group (18-24, 25-34, 35-44 as the reference category, 45-54, and 55 years and above); and (vi) whether the individual reported owning any of the following assets: land, financial assets, and mobile phones.
14 The full results with the regression coefficients are not provided here due to space limitations but are available upon request. The significance of certain covariates in explaining module duration varies by module and country, and thus the investigation of these coefficients should depend on the survey designer’s question of interest. For example, in Ethiopia we found that being female is negatively correlated with interview length for the education and labor modules, positively correlated for the land module, and insignificant for other modules. In Cambodia, age group (being older) is significant and positively correlated with length in the education and labor modules, but insignificant for other modules. The coefficient for rural is negative and significant for the non-farm enterprise module, while positive and significant for the land roster module. The significance of individual and household characteristics thus varies by country and module. The covariate coefficients themselves may be of interest to practitioners, which highlights the additional insights that multilevel models can provide.
The first column of Table 7a reports the sum of all variances, $\sigma^2_{u_j} + \sigma^2_{u_{jk}} + \sigma^2_e$, which is the denominator of the ICC. The second column reports the rank of the sum of variance relative to the other modules. The sum of variance can be decomposed into three components; the residual variance is provided as a benchmark in the third column. In the remaining columns, we provide three types of ICC and their rankings: ICC Total and its two subcomponents associated with enumeration areas (ICC-EA) and interviewers (ICC-I). The latter is our measure of interviewer effects on survey duration. The last column indicates the share of ICC-I in ICC Total. The modules in Tables 7a-7c are ranked by the sum of variance. For example, in Table 7a for Cambodia, the sum of all variances for the time use module is 0.334, of which the residual variance is 0.165. The interviewer effect (ICC-I) accounts for 27 percent of the total variance and 53.39 percent of ICC Total, showing that the interviewer effect contributes slightly more than the EA grouping structure. By comparison, the land module has a much higher sum of variance (3.958). Only 6.4 percent of this variance is attributable to the interviewer effect, which accounts for 32.6 percent of ICC Total; the enumeration area thus contributes much more than the interviewer effect. However, most of the overall variance of the land module comes from the residual, $\sigma^2_e$ (3.184).
Several key findings emerge from Tables 7a-7c. ICC Total and ICC-I both vary by module, even though the interviewer sample is the same across modules. Our discussion focuses on the modules that have both high sum of variance estimates and high ICC-I measures, since these should be thought of as the most consequential modules for which additional measures can be instituted to minimize interviewer effects. These measures can include additional field staff training and fieldwork supervision, complemented by rigorous data quality monitoring and feedback to interviewers. The module-specific findings reveal considerable cross-country consistency. Food consumption, household roster, and non-farm enterprises are consistently among the top 5 household questionnaire modules in terms of the sum of variance in duration. For the food consumption module, 22 to 50 percent of the variability, depending on the country, is attributable to interviewers (i.e., the ICC-I estimates). The comparable ranges are 7 to 27 percent for the household roster module and 5 to 17 percent for the non-farm enterprises module. Given the importance of the household roster and food consumption modules for consumption-based monetary poverty and inequality measurement in these contexts, minimizing interviewer effects in the administration of these modules can have non-negligible implications for the quality of survey data on demographics, poverty, and inequality.
Country-specific findings point to additional household questionnaire modules that rank among the top 5 in terms of sum of variance and that are also associated with ICC-I estimates of at least 10 percent. These include housing, non-food consumption, consumer durables, and livestock, with interviewer effects explaining 14 to 40 percent of total variance in module duration, depending on the module. Similarly, in the majority of the countries, labor, health, and land rank among the top 5 individual questionnaire modules in terms of sum of variance. The ICC-I estimates range from 10 to 25 percent for the labor module, 11 to 15 percent for the health module, and 6 to 50 percent for the land module, depending on the country. Additional modules with high total variance and ICC-I estimates (of at least 10 percent) emerge in specific countries, including time use and subjective well-being, as well as the modules that capture the ownership of and rights to livestock and financial assets.
In these cases, depending on the module, interviewer effects explain 11 to 29 percent of the total variance in module duration. The individual questionnaire modules that we single out in this discussion all collect the information required for an extensive range of individual-level indicators used to monitor the Sustainable Development Goals.
Table 7a: Selected Multilevel Model Estimation Results from Cambodia
Results from models with covariates
Columns: sum of variance | rank of sum of variance | residual variance | ICC Total | rank of ICC Total | ICC-EA | ICC-I | rank of ICC-I | % of ICC-I in ICC Total
Individual Qx module (Equation 2)
land | 3.958 | 1 | 3.184 | 0.196 | 6 | 0.132 | 0.064 | 9 | 32.60
labor | 0.411 | 2 | 0.309 | 0.247 | 4 | 0.146 | 0.101 | 5 | 40.84
migration | 0.396 | 3 | 0.337 | 0.149 | 9 | 0.085 | 0.064 | 8 | 42.99
time use | 0.334 | 4 | 0.165 | 0.506 | 1 | 0.236 | 0.270 | 1 | 53.39
livestock | 0.102 | 5 | 0.074 | 0.274 | 2 | 0.162 | 0.112 | 3 | 40.96
health | 0.074 | 6 | 0.061 | 0.173 | 7 | 0.059 | 0.113 | 2 | 65.59
mobile phones | 0.045 | 7 | 0.033 | 0.266 | 3 | 0.214 | 0.052 | 10 | 19.54
education | 0.035 | 8 | 0.030 | 0.132 | 10 | 0.061 | 0.071 | 6 | 53.83
financial assets | 0.030 | 9 | 0.026 | 0.151 | 8 | 0.081 | 0.069 | 7 | 46.01
durables roster | 0.028 | 10 | 0.022 | 0.198 | 5 | 0.096 | 0.101 | 4 | 51.20
Household Qx module (Equation 3)
non-farm enterprise | 1.517 | 1 | 1.379 | 0.091 | 10 | 0.040 | 0.051 | 10 | 55.88
children elsewhere | 1.097 | 2 | 1.008 | 0.081 | 11 | 0.034 | 0.048 | 11 | 58.73
food cons. | 0.890 | 3 | 0.383 | 0.570 | 1 | 0.129 | 0.442 | 1 | 77.46
household roster | 0.537 | 4 | 0.369 | 0.312 | 4 | 0.203 | 0.110 | 5 | 35.07
non-food cons. | 0.253 | 5 | 0.133 | 0.474 | 2 | 0.105 | 0.369 | 2 | 77.82
land roster | 0.157 | 6 | 0.112 | 0.287 | 5 | 0.161 | 0.126 | 4 | 43.99
housing | 0.076 | 7 | 0.042 | 0.444 | 3 | 0.170 | 0.274 | 3 | 61.65
cover | 0.024 | 8 | 0.020 | 0.157 | 7 | 0.053 | 0.104 | 6 | 66.27
livestock roster | 0.014 | 9 | 0.011 | 0.195 | 6 | 0.119 | 0.077 | 9 | 39.23
consumer durables | 0.009 | 10 | 0.008 | 0.139 | 8 | 0.058 | 0.080 | 7 | 57.91
apartment roster | 0.002 | 11 | 0.002 | 0.124 | 9 | 0.045 | 0.079 | 8 | 63.77
# of EAs: 252; # of interviewers: 42
Table 7b: Selected Multilevel Model Estimation Results from Tanzania
Table 7c: Selected Multilevel Model Estimation Results from Ethiopia
In contrast to findings from higher-income contexts using multilevel models, the contribution of area and interviewer effects to the variance of interview length is much higher in our surveys of interest. Our finding is more consistent with evidence from developing-country contexts showing that interviewer effects can be large, although those results were obtained with different methodologies and did not consider interview length as the outcome of interest. Couper and Kreuter (2013), in a higher-income country context, found that interviewer and respondent effects together account for at most 7 percent of the total variance in interview length.¹⁵ In our results, total ICC (inclusive of both interviewer and enumeration area effects) can reach 50 percent or more, depending on the module, and the interviewer effect often constitutes a large share of total ICC, for example accounting for 99.4 percent of ICC Total for the food consumption module in Tanzania.
15 Couper and Kreuter (2013) could estimate respondent-level effects because they analyzed question-level observations, which our module-level setup does not include.
As a sensitivity check, we also report the results of a multilevel model that sets the hierarchical grouping structure at the household and interviewer levels and controls for the individual- and household-level covariates specified above (Equation 5). It is possible that certain households (with only one or with multiple individuals interviewed) take longer to interview, which would contribute to the high variance in interview length. Our results in Table 8,¹⁶ however, remain consistent with those reported in Tables 7a-7c for the individual questionnaire modules. One difference is that the interviewer component of the ICC in Table 8 is marginally smaller than in Tables 7a-7c, because the grouping structure now has many more units, with each household being a group and a large number of household intercepts being estimated.
16 A model based on Equation 4 with no covariates was also estimated and is presented in Appendix Table A4.
Table 8: Selected Multilevel Model Estimation Results with an Alternative Hierarchical Structure
5. Conclusion
The goal of this paper is to showcase the power of paradata in generating operationally relevant insights to assist in designing household surveys and improving the quality of survey data collection in low- and middle-income countries. The analysis uses the timestamped paradata from the nationally representative household surveys implemented by NSOs in Cambodia, Ethiopia, and Tanzania between 2018 and 2020. Each survey coupled a multi-topic socioeconomic household questionnaire with an individual questionnaire that was to be administered in private to adult household members and that elicited in-depth individual-level data on a range of topics, with a strong focus on labor and asset ownership. Our paper conducts a range of analyses related to interview length that yield interview length estimates at the module, individual, and household levels; unit cost estimates as a function of interview length and total survey budget; and estimates of interviewer effects on variation in module duration. Module duration estimates constitute useful reference points for survey practitioners aiming to conduct similar data collection in comparable contexts.
Often the decision on whether to include an additional module or an additional person to interview depends on how this would affect overall interview burden and overall cost. Because paradata are highly disaggregated, the smallest unit of analysis is an “event”, which records the time at which a specific question is asked and therefore allows us to construct various measures of interview length at the module, individual, and household levels. We provide duration estimates for each module by country, as well as an estimate of the average total time spent in the household, which ranges from 82 minutes in Cambodia to 120 minutes in Tanzania. The food consumption module is, on average, the longest module, ranging from 22 to 26 minutes depending on the country. Other household modules with high duration include non-food consumption, housing, and household roster. Although beyond the scope of this paper, a question-level analysis could identify ways to make a module more efficient in terms of both time and the amount of information it extracts. For survey practitioners interested in administering individual questionnaires to adult household members, we estimate that the additional time spent on individual data collection for all adult members of a given household ranges from 37 to 48 minutes in Cambodia, 22 to 44 minutes in Ethiopia, and 81 to 191 minutes in Tanzania. These numbers depend on how many modules are administered per person, which explains why the additional duration is higher in Tanzania, where more individual modules were administered. However, the individual questionnaire, on average, takes only 13 to 25 minutes to administer to each adult, depending on the country. The most time-consuming individual questionnaire modules include land, labor, health, and education.
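To make the event-based construction of module duration more concrete, the sketch below aggregates timestamped events into minutes per module and per interview. It is only an illustration under stated assumptions, not the paper's exact procedure: the record layout `(interview_id, module, timestamp)`, the helper name `module_durations`, and the 10-minute cap used to discard pauses (`MAX_GAP_SECONDS`) are all hypothetical, and the gap between two consecutive events is simply attributed to the module of the later event. Real Survey Solutions paradata files have their own, richer structure.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified paradata: one record per event (a question being
# answered), tagged with its module and timestamp.
events = [
    ("hh001", "household roster", "2019-05-01T09:00:00"),
    ("hh001", "household roster", "2019-05-01T09:00:40"),
    ("hh001", "household roster", "2019-05-01T09:02:10"),
    ("hh001", "food cons.", "2019-05-01T09:03:00"),
    ("hh001", "food cons.", "2019-05-01T09:05:30"),
    ("hh001", "food cons.", "2019-05-01T10:45:30"),  # follows a long pause
]

# Hypothetical threshold: gaps longer than this are treated as pauses.
MAX_GAP_SECONDS = 600


def module_durations(events, max_gap=MAX_GAP_SECONDS):
    """Return {interview_id: {module: minutes}} from timestamped events.

    Each gap between consecutive events is credited to the module of the
    later event; gaps above max_gap are dropped. Time before the first
    event of an interview is not observed and therefore not counted.
    """
    by_interview = defaultdict(list)
    for interview_id, module, ts in events:
        by_interview[interview_id].append((datetime.fromisoformat(ts), module))

    durations = defaultdict(lambda: defaultdict(float))
    for interview_id, evs in by_interview.items():
        evs.sort()  # chronological order within the interview
        for (prev_t, _), (t, module) in zip(evs, evs[1:]):
            gap = (t - prev_t).total_seconds()
            if gap <= max_gap:
                durations[interview_id][module] += gap / 60
    return {k: dict(v) for k, v in durations.items()}


result = module_durations(events)
```

Household-level interview length then follows by summing over modules, e.g. `sum(result["hh001"].values())`, which gives about 5.5 minutes for these toy events (130 seconds of roster plus 200 seconds of food consumption; the 100-minute pause is discarded).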
Furthermore, combining the paradata with the total survey budgets allows us to estimate the cost of each minute of survey data collection. For instance, the unit cost estimate is $1.71 per minute in Ethiopia, implying that the administration of the socioeconomic household questionnaire to a sampled household would cost, on average, $131, and that the administration of the multi-topic individual questionnaire to an adult household member would cost, on average, $23. Constructing the unit cost per minute of data collection promotes comparability in cost estimation despite cross-country differences in questionnaire design, unlike previous attempts that, for lack of paradata, had to report country-specific cost estimates per completed interview. This approach also allows for a cross-country consistent cost assessment of specific modules and questions. The resulting estimates help in understanding the budget implications of questionnaire design decisions for future surveys.

Finally, we use the paradata to measure interviewer effects on variation in interview duration. Multilevel models permit the estimation of the ICC, whose decomposition captures the extent to which the residual variation in module duration is attributable to interviewers. The estimated interviewer effects are significantly larger than previous estimates from high-income contexts and show a high degree of cross-country consistency. Our discussion focuses on identifying household and individual questionnaire modules that may benefit from additional interviewer training, fieldwork supervision, and data quality monitoring, namely modules that are among the top five in terms of total variance in duration and that register elevated estimates of the interviewer component of the ICC.
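The two calculations summarized above can be sketched in a few lines of Python. The unit cost part assumes, as a simplification, that the cost per minute is the total survey budget divided by the total minutes of interviewing recorded in the paradata; the budget and minute figures are hypothetical. The ICC part assumes the standard variance-component definition for a model with enumeration area (EA) and interviewer random effects, which is consistent with Table 7a, where ICC-EA and ICC-I sum to ICC Total. The function names are illustrative, and the variance components are only approximately backed out of the land row of Table 7a.

```python
def cost_per_minute(total_budget, total_interview_minutes):
    """Unit cost: total survey budget spread over all interview minutes."""
    return total_budget / total_interview_minutes


def interview_cost(minutes, unit_cost):
    """Cost of an interview (or a module) of a given length."""
    return minutes * unit_cost


def icc_decomposition(var_ea, var_interviewer, var_residual):
    """Split the ICC into enumeration-area and interviewer components."""
    total = var_ea + var_interviewer + var_residual
    icc_ea = var_ea / total
    icc_i = var_interviewer / total
    icc_total = icc_ea + icc_i
    return {
        "total_variance": total,
        "icc_ea": icc_ea,
        "icc_i": icc_i,
        "icc_total": icc_total,
        # Interviewer share of the ICC, as in the last column of Table 7a.
        "icc_i_pct_of_icc_total": 100 * icc_i / icc_total,
    }


# Hypothetical budget and minutes, chosen only to illustrate the arithmetic.
unit = cost_per_minute(total_budget=1_000_000, total_interview_minutes=500_000)
household_cost = interview_cost(minutes=80, unit_cost=unit)  # 160.0

# Variance components approximately backed out of the "land" row of Table 7a.
land = icc_decomposition(var_ea=0.522, var_interviewer=0.253, var_residual=3.184)
print(round(land["icc_total"], 3), round(land["icc_i_pct_of_icc_total"], 1))
# prints: 0.196 32.6
```

The printed values reproduce the land row of Table 7a (ICC Total of 0.196, with about 32.6 percent of it attributable to interviewers), which is a useful check that the decomposition matches the one reported in the tables.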
We see that food consumption, household roster, and non-farm enterprises consistently emerge among the top five household questionnaire modules in terms of total variance in duration, with 5 to 50 percent of the variability being attributable to interviewers. Similarly, labor, health, and land appear among the top five individual questionnaire modules in terms of total variance in duration, with 6 to 50 percent of the variability being attributable to interviewers. There is in fact a near-perfect overlap between the modules singled out in the discussion of interviewer effects and those identified as the most time-consuming household and individual questionnaire modules. These modules are also central to understanding monetary poverty and inequality in these contexts and to monitoring a range of individual-level indicators for the Sustainable Development Goals.

Overall, our findings reveal important insights from survey paradata on module development and implementation in otherwise understudied low- and middle-income contexts. NSOs and survey practitioners can use the types of analyses conducted in this paper to obtain high-frequency, disaggregated insights on respondent burden and to identify priority modules with elevated interviewer effects on duration, so that timely training and field supervision measures can be deployed to minimize these effects during ongoing surveys, particularly in large-scale household surveys that can span up to 12 months in low- and middle-income contexts. Doing so would be in line with calls for building NSO technical capacity in the use of paradata for household survey design, management, and quality control (Carletto et al., 2022).

References

Adida, C. L., Ferree, K. E., Posner, D. N., & Robinson, A. L. (2016). Who’s asking? Interviewer coethnicity effects in African survey data. Comparative Political Studies, 49(12), 1630-1660.
Carletto, C., Chen, H., Kilic, T., & Perucci, F. (2022). Positioning household surveys for the next decade. Journal of the International Association for Official Statistics, 38(3), 923-946.
Choumert-Nkolo, J., Cust, H., & Taylor, C. (2019). Using paradata to collect better survey data: Evidence from a household survey in Tanzania. Review of Development Economics, 23(2), 598-618.
Couper, M. (1998). Measuring survey quality in a CASIC environment. Proceedings of the Survey Research Methods Section of the ASA at JSM 1998, 41-49.
Couper, M., & Kreuter, F. (2013). Using paradata to explore item level response times in surveys. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176, 271-286.
FAO, World Bank, & UN-Habitat. (2019). Measuring Individuals' Rights to Land: An Integrated Approach to Data Collection for SDG Indicators 1.4.2 and 5.a.1. Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/32321
Flores-Macias, F., & Lawson, C. (2008). Effects of interviewer gender on survey responses: Findings from a household survey in Mexico. International Journal of Public Opinion Research, 20(1), 100-110.
Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models (Analytical Methods for Social Research). Cambridge: Cambridge University Press.
Gordeev, V. S., Akuze, J., Baschieri, A., Thysen, S. M., Dzabeng, F., Haider, M. M., Smuk, M., Wild, M., Lokshin, M. M., Yitayew, T. A., Abebe, S. M., Natukwatsa, D., Gyezaho, C., Amenga-Etego, S., Lawn, J. E., Blencowe, H., & Every Newborn-INDEPTH Study Collaborative Group. (2021). Paradata analyses to inform population-based survey capture of pregnancy outcomes: EN-INDEPTH study. Population Health Metrics, 19, 10.
Gourlay, S., Kilic, T., Martuscelli, A., Wollburg, P., & Zezza, A. (2021). High-frequency phone surveys on COVID-19: Good practices, open questions. Food Policy, 105, 102153.
Hasanbasri, A., Kilic, T., Koolwal, G., & Moylan, H. (2021). LSMS+ Program in Sub-Saharan Africa: Findings from Individual-Level Data Collection on Labor and Asset Ownership. Washington, DC: World Bank.
ILO. (2018). Report III: Report of the Conference, 20th International Conference of Labour Statisticians (Geneva, 10–19 October 2018). International Labour Office, Department of Statistics, Geneva.
Jans, M., Sirkis, R., Schultheis, C., Gindi, R., & Dahlhamer, J. (2011). Comparing CAPI trace file data and quality control reinterview data as methods of maintaining data quality. American Statistical Association Proceedings of the Survey Research Methods Section. Retrieved from http://www.asasrms.org/Proceedings/y2011/Files/300407_64067.pdf
Kilic, T., Serajuddin, U., Uematsu, H., & Yoshida, N. (2017). Costing Household Surveys for Monitoring Progress Toward Ending Extreme Poverty and Boosting Shared Prosperity. World Bank Policy Research Working Paper 7951.
Kilic, T., Moylan, H., & Koolwal, G. (2021). Getting the (gender-disaggregated) lay of the land: Impact of survey respondent selection on measuring land ownership and rights. World Development, 146.
Kilic, T., Broeck, G., Koolwal, G., & Moylan, H. (2022). Are you being asked? Impacts of respondent selection on measuring employment in Malawi. Journal of African Economies.
Kreuter, F., Couper, M., & Lyberg, L. (2010). The use of paradata to monitor and manage survey data collection. In Proceedings of the Joint Statistical Meetings, American Statistical Association (pp. 282-296). Alexandria, VA: American Statistical Association.
Kreuter, F. (Ed.). (2013). Improving surveys with paradata: Analytic uses of process information. Hoboken, NJ: John Wiley & Sons.
Kreuter, F., & Olson, K. (2013). Paradata for nonresponse error investigation. In F. Kreuter (Ed.), Improving surveys with paradata: Analytic uses of process information (pp. 13-42). Hoboken, NJ: John Wiley & Sons.
Leone, T., Sochas, L., & Coast, E. (2021). Depends who's asking: Interviewer effects in demographic and health surveys abortion data. Demography, 58(1), 31-50.
Maio, M. D., & Fiala, N. (2020). Be wary of those who ask: A randomized experiment on the size and determinants of the enumerator effect. The World Bank Economic Review, 34(3), 654-669.
Murphy, J. J., Chew, R., Biemer, P. P., Duprey, M. A., Harris, K. M., & Halpern, C. T. (2019). Interactive visualization to facilitate monitoring longitudinal survey data and paradata. Retrieved from www.ncbi.nlm.nih.gov/books/NBK545492/
Singh, A., Kumar, K., & Arnold, F. (2022). How interviewers affect responses to sensitive questions on the justification for wife beating, the refusal to have conjugal sex, and domestic violence in India. Studies in Family Planning, 53(2), 259-279.
United Nations. (2019). Guidelines for Producing Statistics on Asset Ownership from a Gender Perspective. Retrieved from https://unstats.un.org/edge/publications/docs/Guidelines_final.pdf
Virgile, M. (2016). Measurement error in American Community Survey paradata and 2014 redesign of the contact history instrument. United States Census Bureau Research Report Series: Survey Methodology #2016-01.
Vollmer, N., Singh, M., Harshe, N., & Valadez, J. J. (2021). Does interviewer gender influence a mother’s response to household surveys about maternal and child health in traditional settings? A qualitative study in Bihar, India. PLOS ONE, 16(6).
West, B. T., & Blom, A. G. (2017). Explaining interviewer effects: A research synthesis. Journal of Survey Statistics and Methodology, 5(2), 175-211.
Appendix

Table A1: Description of Household and Individual-level Modules
Module: Overview of content(1) | Included in (labeled x): Cambodia, Ethiopia, Tanzania

Household modules
(1) Cover: Household location, contact information, interview and interviewer details. x x
(2) Household roster: Listing of all household members, including relationship to head, age, marital status, and residence in the household. x x x
(3) Food consumption: Household’s consumption of food in the last 7 days (quantities and expenditure), as well as amount from own production. x x x
(4) Food aggregate: Aggregated and abbreviated module on food expenditure in the last 7 days, across main categories of food, as well as number of days these foods were eaten. x
(5) Non-food consumption: Household expenditure on non-food items over different reference periods. Cambodia: last 1, 3, 6 months, and annual; Ethiopia: last 1 month, and annual; Tanzania: weekly/monthly, and annual.
(6) Housing: Access to infrastructure and services; details on house construction. x x x
(7) Land roster: Listing of all the land parcels owned by the household, whether residential or non-residential. x x x
(8) Livestock roster: Listing of livestock owned by the household. x x
(9) Consumer durables: Listing of durable goods owned by the household, across electronics, appliances, vehicles and large tools/implements. x x x
(10) Children living elsewhere: Listing of children aged 15 and older living outside the household, including whether for education/employment. x
(11) Household non-farm enterprises: Details on non-farm enterprises owned by the household, including industry, sales, and costs. x x x
(12) Credit: Household borrowing from different sources. x x
(13) Finance: Household use of financial services to transfer money. x
(14) Food security: Household’s ability to afford food for family members in the last week/year. x x
(15) Shocks: Natural, economic and household shocks faced by the household in the last year. x x
(16) Other income: Other sources of income received by the household, including transfers and remittances. x
(17) Assistance: Public/government assistance received by the household. x x
(18) Anthropometry: Height and weight measurements for household members. x
(19) Deaths in the household: Children and adults who had died in the last two years. x

Individual modules
(1) Education: Educational attainment (years of schooling, etc.), travel and expenses. x x x
(2) Health: Sudden and chronic illnesses/injuries, details on any treatment received. x x x
(3) Labor: Employment and non-market activities in the last 7 days. x x x
(4) Land: Ownership and rights to residential and non-residential land parcels covered in the household land roster, as well as rights over land (to sell, bequeath, invest, and use as collateral). Whether ownership and rights were exclusive/joint with other individuals was also covered. x x x
(5) Mobile phones: Ownership of mobile phones, as well as whether exclusively or jointly owned. x x x
(6) Financial assets: Ownership of financial accounts (across formal and informal sources), as well as whether exclusively or jointly owned. x x x
(7) Livestock: Ownership of livestock listed in the household livestock roster, as well as whether exclusively or jointly owned. x
(8) Migration: Information on migration history of an individual (time and purpose of migration). x
(9) Time use: Time use diary on main and secondary activities (conducted within 15-minute increments) in the last 24 hours. x
(10) Durables: Ownership of durables listed in the household consumer durables roster. x
(11) Savings: Savings across different sources (formal and informal). x
(12) Subjective well-being: Respondent’s satisfaction over different aspects of his/her life. x
(13) Food consumed outside the household: Expenditure on food consumed outside the household. x

Table A2: Respondent Characteristics
Ethiopia Tanzania Cambodia Men Women Men Women Men Women 0.64* Household head 0.66*** 0.22*** 0.20*** 0.64*** 0.15*** ** Age: 18-24 0.25 0.25 0.22* 0.27* 0.15* 0.13* 0.36* Age:
25-34 0.28** 0.30** 0.27*** 0.26** 0.24** ** Age: 45-54* 0.12** 0.11** 0.12 0.12 0.14 0.15 Age: 55+ 0.16*** 0.13*** 0.15 0.18 0.21*** 0.26*** 0.10* Never attended school 0.42*** 0.61*** 0.20*** 0.90*** 0.77*** ** Years of school, if attended 7.82 7.67 7.43 7.67 7.29*** 6.18*** Married 0.63 0.62 0.53 0.49 0.78*** 0.69*** 0.06* Separated/divorced 0.02*** 0.08*** 0.11*** 0.01*** 0.04*** ** 0.01* Widowed 0.01*** 0.10*** 0.10*** 0.03*** 0.15*** ** Months resp. is away from 0.38*** 0.32*** 1.00 0.91 0.72*** 0.38*** Household Last 7 days: work in 0.30* 0.10*** 0.04*** 0.13*** 0.44*** 0.26*** salary/wage ** Last 7 days: work in non- 0.089 0.086 0.18* 0.15* 0.16* 0.18* farm enterprise Last 7 days: work in 0.57*** 0.36*** 0.43 0.43 0.46 0.43 agriculture Household size 5.42*** 5.19*** 6.29 6.13 4.80** 4.69** Household dependency 0.81* 0.68 0.72 0.95*** 1.50*** 1.41*** ratio† ** Household has electricity ‡ 0.30*** 0.34*** 0.66 0.64 0.85 0.86 Household has piped water ‡ 0.17*** 0.19*** 0.41 0.40 0.26 0.27 Household: walls made of 0.06*** 0.07*** 0.20 0.22 0.26 0.26 concrete ‡ Lives in urban area 0.28*** 0.31*** 0.31 0.28 0.27 0.27 Observations 7235 8153 1407 1576 1845 2093
Notes: (1) All estimates are weighted. Statistically significant differences between men and women, within each survey, are indicated by asterisks (*** p<0.01, ** p<0.05, * p<0.10). * Excluded category is 35-44. † Indicates dependency ratio of children and elderly.

Table A3.1: Selected Multilevel Model Estimation Results from Cambodia

Table A3.2: Selected Multilevel Model Estimation Results from Tanzania

Table A3.3: Selected Multilevel Model Estimation Results from Ethiopia

Table A4: Selected Multilevel Model Estimation Results with an Alternative Hierarchical Structure