Policy Research Working Paper 9838

Poverty Imputation in Contexts without Consumption Data: A Revisit with Further Refinements

Hai-Anh H. Dang, Talip Kilic, Calogero Carletto, and Kseniya Abanokova*

Development Economics, Development Data Group
November 2021

Abstract

A key challenge with poverty measurement is that household consumption data are often unavailable or infrequently collected or may be incomparable over time. In a development project setting, it is seldom feasible to collect full consumption data for estimating the poverty impacts. While survey-to-survey imputation is a cost-effective approach to address these gaps, its effective use calls for a combination of both ex-ante design choices and ex-post modeling efforts that are anchored in validated protocols. This paper refines various aspects of existing poverty imputation models using 14 multi-topic household surveys conducted over the past decade in Ethiopia, Malawi, Nigeria, Tanzania, and Vietnam. The analysis reveals that including an additional predictor that captures household utility consumption expenditures, as part of a basic imputation model with household-level demographic and employment variables, provides poverty estimates that are not statistically significantly different from the true poverty rates. In many cases, these estimates even fall within one standard error of the true poverty rates. Adding geospatial variables to the imputation model improves imputation accuracy on a cross-country basis. Bringing in additional community-level predictors (available from survey and census data in Vietnam) related to educational achievement, poverty, and asset wealth can further enhance accuracy. Yet, there is within-country spatial heterogeneity in model performance, with certain models performing well for either urban areas or rural areas only. The paper provides operationally-relevant and cost-saving inputs into the design of future surveys implemented with a poverty imputation objective and suggests directions for future research.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at hdang@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Keywords: consumption, poverty, survey-to-survey imputation, household surveys, Vietnam, Ethiopia, Malawi, Nigeria, Tanzania, Sub-Saharan Africa.
JEL Codes: C15, I32, O15.
* Dang (hdang@worldbank.org; corresponding author) is a senior economist in the Data Production and Methods Unit, Development Data Group, World Bank and is also affiliated with GLO, IZA, Indiana University, and International School, Vietnam National University, Hanoi; Kilic (tkilic@worldbank.org) is a senior economist in the Data Production and Methods Unit, Development Data Group, World Bank; Carletto (gcarletto@worldbank.org) is the manager of the Data Production and Methods Unit, Development Data Group, World Bank; Abanokova (kabanokova@worldbank.org) is a consultant for the Data Production and Methods Unit, Development Data Group, World Bank and is a junior research fellow at the Higher School of Economics, National Research University, Russia. We would like to thank (i) Madeleine Gauthier, Dean Jolliffe, and Anne Swindale for their helpful feedback on the earlier drafts, (ii) Siobhan Murray and Linh Hoang Vu for their help with processing the GIS data and expenditure data, and (iii) Minh Do for excellent research assistance. We are grateful for the funding from the United States Agency for International Development (USAID).

1. Introduction

A key challenge with poverty measurement is the inadequacy of household consumption (or income) data, which underlie poverty estimates. Such data may simply be unavailable or may not be comparable from one survey round to the next. This data-scarce situation, regarding both data quantity and quality, occurs for various reasons ranging from lack of financial resources to local capacity constraints, or even difficulties with survey implementation because of conflicts. Indeed, Serajuddin et al. (2015) show that over the period 2002-2011, of the 155 countries for which the World Bank monitors poverty data using the World Development Indicators (WDI) database, almost one-fifth (i.e., 28) have only one poverty data point and as many as 29 countries do not have any poverty data point in the same period.
Worse still, poorer countries have fewer surveys: a 10-percent increase in a country’s household consumption level is associated with almost one-third (i.e., 0.3) more surveys (Dang, Jolliffe, and Carletto, 2019).¹ Even for middle-income countries with an established and long-running household consumption survey such as India, concerns have been raised over varying degrees of incompatibilities of the poverty rates over the past two decades due to changes in the way the consumption data are collected (Deaton and Kozel, 2005; Dang and Lanjouw, 2018).

Against this background, there have been more calls to use alternative methods to obtain poverty estimates in contexts with gaps in household consumption data (World Bank, 2017).² Survey-to-survey imputation is an increasingly common method that development practitioners have turned to. Building on the seminal technique that imputes from a household consumption survey into a census to generate poverty maps (Elbers et al., 2003), recent studies have imputed from a household consumption survey into another survey to provide poverty estimates.³ The basic intuition is that we can utilize an existing older consumption survey to build an imputation model, using appropriate predictor variables. This imputation model is subsequently employed in combination with the same variables in a more recent survey that does not collect consumption data to provide poverty estimates for the more recent survey.

Besides its relevance for obtaining updated, nationally representative poverty estimates, two other (common) applications of imputation are notable. One application is proxy means testing for social targeting programs. The other application is evaluation of before-and-after impacts of small-scale projects on poverty outcomes (e.g., a food subsidy project). These programs need to identify households that are eligible for program assistance because their (predicted) consumption levels are below a specified threshold, or to track changes in household poverty status that can be credibly attributed to the impacts of the project.⁴ Yet, these projects usually have neither the resources nor the capacity to implement a full-scale consumption survey.

¹ Notably, data quality is considered as essential to basic government operations and international aid agencies working in African countries (see, e.g., Jerven (2019)). Devarajan (2013) offers an overview of the statistical challenges facing these countries. The ongoing Covid-19 pandemic could increase poverty and further exacerbate these data deprivations and digital divides for poor countries (Naude and Vinuesa, 2020). Most recently, the World Bank (2021) highlights the role of data for improving global living conditions.

² Imputation techniques are regularly used by international organizations and national statistical agencies to fill in missing data gaps such as education statistics (UOE, 2020) and income data (US Census Bureau, 2017).

³ The poverty-mapping technique combines a household consumption survey and a non-consumption census, which allows us to provide poverty estimates at a more disaggregated level than available in the household survey.

⁴ See, e.g., Brown et al. (2018) for a recent application of proxy-means testing and Garbero (2014) for a recent application of imputation for evaluating project impacts on poverty reduction.

This paper makes several new contributions to the literature on survey-to-survey imputation of poverty estimates. First, we further refine various aspects of the poverty imputation models that have been employed in the existing literature. In particular, the paper explores the extent to which imputation accuracy is impacted as a result of varying the scope and complexity of predictors. This creates a wide range of scenarios with different design and cost implications for follow-up surveys
that would be fielded to compute the required predictors. Specifically, we examine (i) the robustness of the same poverty predictors for different survey rounds over time, (ii) whether adding predictors that capture sub-components of household consumption helps improve imputation accuracy, and (iii) whether additional predictors from auxiliary community or geospatial data sources result in additional gains in imputation accuracy. While previous studies have touched on some of these topics, to our knowledge this is the first study that attempts to provide a comprehensive and systematic examination of all of them.

Second, in order to offer illustrations for a range of data-scarce contexts where imputation methods are most useful, we harmonize and rigorously analyze data from 14 recent rounds of multi-topic household surveys that have been conducted over the past decade in Ethiopia, Malawi, Nigeria, Tanzania, and Vietnam. These five countries span two regions (i.e., Sub-Saharan Africa and Southeast Asia) and different income levels (i.e., low-income to lower-middle-income), and exhibit a greater degree of heterogeneity in terms of population sizes and density vis-à-vis the contexts that had been the focus of previous studies. These heterogeneous settings help ensure that the estimation results, if confirmed across countries, can reliably inform recommendations for future survey-to-survey imputation efforts.

Finally, based on the new findings and our review of key previous studies, we provide practical guidance on the variables that can be combined with existing consumption surveys to obtain reliable poverty estimates. These variables can be classified into two groups: those that are likely available in most household surveys (or auxiliary data sources) and those that can be relatively more easily collected (perhaps in a “lighter” and cheaper survey that does not collect full information on consumption).
This new and practical focus helps make our study relevant for the design of future surveys as part of survey-to-survey imputation approaches to poverty measurement.

The headline findings are as follows. Starting with a basic imputation model that includes household demographic and employment characteristics, we find that augmenting this model with additional predictors that capture household utility consumption expenditures (including electricity, water, and garbage), or to some extent, household assets and dwelling attributes generally provides poverty estimates that are not statistically significantly different from the true poverty rates. These models tend to perform better than the other models, and the resulting imputed poverty rates even fall, in many cases, within one standard error of the true poverty rates. Bringing in geospatial predictors, such as soil quality and distance-to-facilities (and nightlight in the case of Vietnam), by merging georeferenced household survey data with third-party geospatial data sources, is found to further improve imputation accuracy. For instance, including utilities expenditures as an additional predictor in the basic imputation model increases the probability of accurate imputation by 46 percentage points at the national level for all the countries. Further augmenting this model with satellite-based soil quality measures raises the probability of accurate imputation by an additional 17 percentage points.

Yet, while these models generally work both at the national level and separately in urban and rural areas, we document some spatial differences through a cross-country meta-analysis of model imputation accuracy. For urban areas, the best performing models are those that feature one of food, health, education, or utilities expenditures as an additional predictor alongside predictors related to demographics, employment, housing and household assets.
For rural areas, the best performing models are those that bring in one of total non-food or utilities expenditures as an additional predictor. Moreover, the analysis of the additional survey and census data for Vietnam demonstrates that adding community-level measures of infrastructure, topography, poverty status, education achievement, and wealth can significantly improve estimation accuracy for this country. Further adding continuous, or even dichotomous, measures of consumption of specific food groups as additional predictors may also improve imputation accuracy.

This paper consists of six sections. We provide a brief review of the literature in the next section before discussing the analytical framework and data in Section 3. We subsequently present in Section 4 the main estimation results (Section 4.1) and other extensions of analysis (Section 4.2). These include adding the geospatial variables (Section 4.2.1), more disaggregated food consumption items (Section 4.2.2), and additional variables from other auxiliary data sets such as a community survey or population census (Section 4.2.3). We further discuss a more specific application, within-year imputation, in Section 4.3 before offering meta-analysis results on model selection and some practical thoughts for survey implementation in Section 5. We finally conclude in Section 6.

2. Literature Review

We briefly review the most relevant studies in this section. Elbers et al. (2003) provide a seminal study that introduces the poverty mapping method (i.e., survey-to-census imputation) to the economic literature; this method allows poverty estimates at lower administrative levels than are possible using the household survey alone. Employing Elbers et al.
(2003)’s framework, various survey-to-survey imputation studies impute from one survey round to another, where these survey rounds can be of either the same design (e.g., imputing from one older household survey round into another more recent household survey round) or of different types (e.g., imputing from one older household survey round into a more recent labor force survey round). We review in Table A.1, Appendix A some key studies in the past 20 years that offer validation of imputation-based poverty estimates against the survey-based poverty estimates using actual consumption data (hereafter referred to as the “true poverty rate”).⁵

Several findings stand out from this table. First, the imputation-based poverty estimates can closely track the survey-based estimates in a number of different countries covering different geographical regions. Second, in terms of data combination, studies impute from one round to another round of the same household consumption survey (Christiaensen et al., 2012; Mathiassen, 2013; Daniels and Minot, 2015) or to a different survey such as the Demographic and Health Survey (DHS) (Stifel and Christiaensen, 2007) or the Labor Force Survey (LFS) (Douidich et al., 2016). Third, regarding methodology, subsequent studies offer various refinements of certain features of the poverty mapping technique, such as imposing a parametric probit functional form on the error term (Tarozzi, 2007) or offering a different formula to estimate the standard errors (Mathiassen, 2009). Most recently, building on the Elbers et al. (2003) method, Dang, Lanjouw, and Serajuddin (2017) attempt to bring some further improvements to the survey-to-survey poverty imputation method, which include simpler variance formulas and formulas for standardization of variables from surveys with different sampling designs (e.g., imputing from a household consumption survey into an LFS).
This method has been validated and applied to data from poor and middle-income countries in different regions ranging from India, Jordan, and Sub-Saharan African countries to Vietnam (Beegle et al., 2016; Dang et al., 2017; Cuesta and Ibarra, 2018; Dang and Lanjouw, 2018; Dang et al., 2019). Another recent application of this method is to provide poverty estimates for the Syrian refugees in Jordan (Dang and Verme, 2021) or the various refugee populations in Chad (Beltramo et al., 2020).⁶

⁵ Kijima and Lanjouw (2003) offer an earlier imputation study that applies the Elbers et al. (2003) framework but without validation against actual consumption data. See Dang et al. (2019), Dang (2021), and Dang and Lanjouw (forthcoming) for more detailed reviews on the poverty imputation literature.

Finally, regarding variable selection for imputation models, the variables that are found to work well typically include household assets and housing characteristics, with some inconclusive evidence regarding predictors that capture sub-components of household consumption (Christiaensen et al., 2012; Dang et al., 2019).⁷ Using a food demand conceptual framework based on the Engel curve, Christiaensen, Ligon, and Sohnesen (2021) make a theoretical suggestion that using consumption sub-aggregates for poverty imputation only works under certain stringent conditions (i.e., these items follow linear Engel curves given prevailing prices and the effect of price changes is small). As such, the key challenge is whether, and how, we can identify such variables in practice. On the other hand, Dang et al. (2017) propose an Oaxaca decomposition test that helps select the imputation model that offers the best estimation results where consumption data exist for earlier survey rounds (i.e., there are two earlier survey rounds with consumption data).
⁶ The economic poverty imputation literature is also related to a larger literature on missing data (or multiple imputation (MI)) in statistics (see, e.g., Rubin, 1987; Carpenter and Kenward, 2013). Certain differences, however, exist between the two literatures; one is that MI studies tend to employ Bayesian techniques for their estimation, which are more complex and require (far) more computation time for drawing from posterior distributions. Another difference is that economists appear to use economic theory alongside statistical theory for model selection, even though there is little formal discussion of this process in existing studies. See, e.g., Jenkins et al. (2011) and Douidich et al. (2016) for recent studies that apply MI techniques to economic issues. A related application of imputation methods is the construction of synthetic panels, which allow richer analysis of poverty dynamics (Dang et al., 2014).

⁷ This result is consistent with the concept of a wealth index that is constructed from household assets and housing characteristics to proxy for household wealth levels (Filmer and Pritchett, 2001).

Compared to the existing literature, our paper offers rigorous validation using multi-topic household survey data that span a greater number of countries and survey rounds and that are integrated with ancillary census and geospatial data. For example, most studies focus on validation using data from one single country or at most two countries (with up to seven survey rounds), while we analyze data from five countries (with 14 survey rounds). Our comparative assessment leverages a greater scope of potential predictors (including consumption sub-aggregate items) vis-à-vis the existing literature, with a focus on providing practical guidance for future survey implementation for survey-to-survey imputation of poverty.
Furthermore, we provide new meta-analysis on the estimated parameters from this richer data set that can practically guide model selection in other contexts.

3. Analytical Framework

3.1. Imputation Model

We employ Dang et al.’s (2017) method as the main imputation tool in this paper, which we briefly describe in this section. We assume that the linear projection of household consumption per capita ($y_j$), where $j$ denotes the survey, on household and other characteristics ($x_j$) is given by the following linear model

$$y_j = \beta_j' x_j + \mu_j \qquad (1)$$

where $\beta_j$ is the vector of coefficients, for $j = 1, 2$.⁸ For better accuracy, the error term $\mu_j$ is further broken down into two components, a cluster random effect ($\upsilon_{cj}$) and an idiosyncratic error term ($\varepsilon_j$). Conditional on the $x_j$ characteristics, the cluster random effect and the idiosyncratic error term are assumed to be uncorrelated with each other and to follow normal distributions such that $\upsilon_{cj} \mid x_j \sim N(0, \sigma^2_{\upsilon_j})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$. Equation (1) thus provides a standard linear model that can be estimated using most available statistical packages.

⁸ More generally, j can be larger than 2 and can indicate any type of relevant surveys that collect household data sufficiently relevant for imputation purposes such as labor force surveys or demographic and health surveys. To make the notation less cluttered, we do not show the subscript for households in the equations.

The consumption data exist in the base survey (i.e., $j = 1$, or survey 1) but are not available in the other survey(s). Our objective is to impute the missing (or low-quality) consumption data, which can be subsequently employed to obtain poverty estimates in the target survey (or survey 2), given that these data are available in this base survey alone. Assume that the sampled data in survey 1 and survey 2 are representative of the same population in each respective time period, such that estimates based on the same characteristics in these two surveys are consistent and comparable (Assumption 1).
In other words, this assumption implies that, for two contemporaneous (i.e., implemented in the same time period) surveys, measurements of the same characteristics are identical (except for potential sampling errors) since they are consistent measures of the population values; for two non-contemporaneous surveys, these estimates from the two surveys are consistent and comparable over time. While it is difficult, if not impossible, to formally test for Assumption 1, prior (expert) knowledge about the quality of the survey data can provide supportive evidence for its validation. For example, survey rounds of the same design (e.g., different rounds of a household consumption survey) are more likely to satisfy Assumption 1 than those of different designs (e.g., a household consumption survey round with a labor force survey round). Assumption 1 should not be taken for granted since the inconsistency between different rounds of the same survey or different surveys is well documented in studies using data from both poorer and richer countries.⁹ Clear violation of Assumption 1 rules out the straightforward application of the survey-to-survey imputation technique and may require further data checks to gauge the degree of violation of this assumption.

⁹ Survey design issues that compromise the comparability of poverty estimates are found in various countries. These issues can range from changes in the number of consumption items in the questionnaire in India and Vietnam (Dang and Lanjouw, 2018; World Bank, 2012) to data collection methods in China and Tanzania (Gibson, Huang, and Rozelle, 2003; Beegle et al., 2012). See also Angrist and Krueger (1999) for a related review of comparability and other data issues with a focus on labor force surveys in the U.S.
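As a rough illustration of the kind of data check that could help gauge Assumption 1 for two contemporaneous surveys, the sketch below compares the weighted mean of a common predictor across two samples. It is not part of the method described in this paper: the function names, the synthetic household-size data, and the simplified standard error (which ignores clustering and stratification) are all assumptions made for illustration only.

```python
import numpy as np

def weighted_mean_and_se(x, w):
    """Weighted mean and an approximate standard error.

    Assumption: units are treated as independent, so the survey design
    (clustering, stratification) is ignored in this simplified variance.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.sum(w * x) / np.sum(w)
    n = len(x)
    resid = w * (x - mean)
    var = n / (n - 1) * np.sum(resid ** 2) / np.sum(w) ** 2
    return mean, np.sqrt(var)

def compare_predictor(x1, w1, x2, w2):
    """Compare the weighted mean of one predictor across two surveys."""
    m1, se1 = weighted_mean_and_se(x1, w1)
    m2, se2 = weighted_mean_and_se(x2, w2)
    z = (m2 - m1) / np.sqrt(se1 ** 2 + se2 ** 2)
    return {"mean_1": m1, "mean_2": m2, "z": z}

# Synthetic illustration only: household size in two hypothetical surveys.
rng = np.random.default_rng(0)
hhsize_1 = rng.poisson(4, size=500) + 1
hhsize_2 = rng.poisson(4, size=500) + 1
w1 = rng.uniform(0.5, 2.0, size=500)
w2 = rng.uniform(0.5, 2.0, size=500)
result = compare_predictor(hhsize_1, w1, hhsize_2, w2)
print(result)
```

A large absolute z-value for a predictor that should be measured identically in two contemporaneous surveys would be a warning sign. For non-contemporaneous surveys, differences in means may instead reflect genuine changes over time, so such a comparison is only suggestive there.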
Further assume that, given the estimated consumption parameters from survey 1, the changes in the distributions of the explanatory variables between the two periods can capture the change in the poverty rate in the next period (Assumption 2). Given Assumptions 1 and 2, to obtain the imputed consumption for survey 2 we can replace $x_1$ with $x_2$ in Equation (1):

$$y_2^1 = \beta_1' x_2 + \upsilon_{c1} + \varepsilon_1 \qquad (2)$$

Put differently, Equation (2) applies the model parameter $\beta_1$ and the distributions of the error terms $\upsilon_{c1}$ and $\varepsilon_1$ from the base survey to the $x_2$ characteristics in the target survey to obtain estimates of household consumption $y_2^1$ in the target survey (with the superscript indicating that the household consumption variable is predicted using the model parameters from the base survey). Since the estimated parameters are obtained using a different survey from the target survey, we can use simulation to estimate Equation (2) as follows:

$$\hat{y}_2^1 = \frac{1}{S} \sum_{s=1}^{S} \left( \hat{\tilde{\beta}}_{1,s}' x_2 + \hat{\tilde{\upsilon}}_{1,s} + \hat{\tilde{\varepsilon}}_{1,s} \right) \qquad (3)$$

In Equation (3), $\hat{\tilde{\beta}}_{1,s}$, $\hat{\tilde{\upsilon}}_{1,s}$, and $\hat{\tilde{\varepsilon}}_{1,s}$ represent the $s$th random draw (simulation) from their estimated distributions using the base survey, for $s = 1, \ldots, S$. The poverty rate in the target survey and its variance can then be estimated as

$$\hat{P}_2 = \frac{1}{S} \sum_{s=1}^{S} P\left( \hat{y}_{2,s}^1 \le z_1 \right) \qquad (4)$$

$$V(\hat{P}_2) = \frac{1}{S} \sum_{s=1}^{S} V\left( \hat{P}_{2,s} \mid x_2 \right) + V\left( \frac{1}{S} \sum_{s=1}^{S} \hat{P}_{2,s} \mid x_2 \right) \qquad (5)$$

Subject to data availability, the vector of characteristics $x_j$ that are commonly observed in both the base survey and the target survey can include individual, household, and other characteristics.

To help provide relevant inputs for subsequent imputation or survey implementation efforts, we organize the estimation results centered on two principles. The first principle is that the variables in the imputation model are likely available in a standard household consumption survey (or other auxiliary data sets such as an LFS or geospatial data). The second principle is ease of data collection, such that these variables are collectible in most data-scarce contexts.
Together, these two principles ensure that our estimation results are operational; that is, we can provide imputation-based poverty estimates with the most parsimonious imputation model possible, or the best imputation model in terms of ease of data collection.¹⁰ Individual characteristics include variables such as age, sex, education, ethnicity, religion, language, and occupation. Household characteristics include variables such as household size, the living area of the house, the physical quality of the house (e.g., whether its roof or wall is of good quality), and household assets. These characteristics also include a house’s toilet since a better type of toilet such as a flush toilet is often observed to proxy for more wealth than other flimsier or less modern facilities (such as pit latrines or no toilet at all).

We investigate the sensitivity of imputation accuracy by showing the results when different predictor variables are used. In particular, we examine whether adding certain consumption sub-aggregates to the imputation model can help improve accuracy; these include items such as food consumption, non-food consumption, durables consumption, health consumption, and consumption on utilities such as electricity, kerosene, water, and garbage. We also consider variables that proxy for community accessibility and infrastructure such as the distances to the nearest facilities and major city, and whether the communes are classified as being poor or remote. These variables are typically available from community survey questionnaires and represent the economic development, and possibly the income levels, of the community.

¹⁰ Following these principles also implies that certain variables that may help improve imputation accuracy but are difficult to collect data on (such as food consumption with the appropriate deflators to make it comparable with previous surveys) are not recommended for a good and cost-effective imputation model.
We discuss this further in Sections 4 and 5.

For the geospatial variables, we consider the distances from the commune center to various locations such as the nearest major road and the nearest international land border crossing, and other variables including nightlight intensity and agricultural soil quality. Nightlight data have been used to produce poverty maps for African countries (Jean et al., 2016) and soil quality is strongly associated with higher agricultural outputs that can raise household living standards (Tittonell and Giller, 2013; West et al., 2014).

While we focus in this paper on examining the robustness of the same poverty predictors for different survey rounds over time (i.e., across-year imputation), we also consider their performance within the same time period (i.e., within-year imputation). Across-year imputation is typically employed to provide more updated poverty estimates, while within-year imputation is often used in contexts of proxy-means testing or evaluating project impacts on poverty reduction.¹¹

¹¹ The assumptions for these two types of imputations are also quite different. While across-year imputation requires the assumption of constant parameters, within-year imputation requires the assumption that the national model also applies to the specific region under investigation. Dang and Lanjouw (forthcoming) offer further classification of imputation methods.

3.2. Data

We analyze multi-topic household survey data from a total of 14 survey rounds from five different countries: Ethiopia (1), Malawi (4), Nigeria (2), Tanzania (3), and Vietnam (4), with the number of survey rounds for each country being noted in parentheses. In each Sub-Saharan African country, the data originate from the nationally-representative, multi-topic household surveys that have been implemented by the respective national statistical office with support from the World Bank Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) initiative. The data sets include:

i. the Ethiopia Socioeconomic Survey (ESS), 2018/19 round
ii. the Malawi Integrated Household Survey (IHS), 2010/11 and 2016/17 rounds
iii. the Malawi Integrated Household Panel Survey (IHPS), 2010 and 2013 rounds
iv. the Nigeria General Household Survey (GHS)–Panel, 2010/11 and 2012/13 rounds, and
v. the Tanzania National Panel Survey (TZNPS), 2008/09, 2010/11, and 2012/13 rounds.

The sample sizes hover around 3,000 to 5,000 households in each survey, except for the ESS 2018/19, which surveyed nearly 7,000 households, and the Malawi IHS3 and IHS4, which surveyed over 12,000 households.

In addition, the analysis leverages data from the Vietnam Household Living Standards Survey (VHLSS) 2010, 2012, 2014, and 2016 rounds. Similar to the LSMS-type surveys supported by the World Bank, these surveys are implemented biennially by Vietnam’s General Statistical Office (GSO) and collect rich, nationally representative data on household demographics, education, occupation, assets, and consumption. These surveys also collect data on commune characteristics with a community questionnaire. The sample size for each round is around 9,300 households. These surveys are generally regarded as being of high quality and are regularly employed by the Government of Vietnam and international organizations to provide estimates on household welfare and poverty measures.

Since the surveys for the five countries are LSMS-type surveys, the data are generally consistent and comparable across countries. The consumption data are deflated to the base survey year’s prices and are comparable across survey rounds for each country.
We provide both across-year and within-year imputation results for all the countries, except for Ethiopia where we can only analyze one survey round and test within-year imputation. The objective is to produce the imputation-based poverty estimates as if we did not have consumption data, and then evaluate these imputation-based poverty estimates against the poverty estimates based on the actual survey data (i.e., the “true” poverty rates).

We also prepare and add several geospatial variables for the five countries, including the distances from the commune center to various important locations (e.g., the nearest major road and the nearest international land border crossing), nightlight intensity, and agricultural soil quality. These data are obtained from various sources including FAO and are provided together with the LSMS-ISA public use data sets, except for Vietnam where we process these data separately.¹² For Vietnam, we further add several variables that are collected through the VHLSS community questionnaire and that capture community accessibility and infrastructure, including distance variables to the nearest facilities and a major city, and whether the communes are classified as being poor or remote. Since community questionnaires are often part of the instruments used by LSMS-type surveys, the main advantage of employing these commune characteristics is that they can be more readily available to use vis-à-vis predictors that are derived from third-party geospatial data sources that the georeferenced household survey data would need to be linked to. We also add several variables from Vietnam’s 2009 Population and Housing Census on education achievement, ethnicity, and household wealth, which are aggregated at the commune level from the micro census data.

4. Estimation Results

4.1. Main Results

The data from the survey rounds listed above are generally regarded as being of good quality.
These survey rounds share the same sampling frame for each country and are generally regarded as comparable over time by most data users (including World Bank poverty economists working on these countries). This satisfies Assumption 1 that the sampled data in round 1 and round 2 are representative of the same population in each time period. As noted above, for Ethiopia, we consider one survey round for within-year imputation purposes only.

12 See, e.g., Tanzania’s National Bureau of Statistics (2011) for more discussion on the geospatial variables in the context of this country. For Vietnam, we collect and process data from various public data sources including Harmonized World Soil Database, Open Street Map, and NOAA Climate Data.

To examine the sensitivity of imputation accuracy to various predictor variables, we build the estimation models on a cumulative basis, with the later models sequentially adding more variables to the earlier models. On the whole, we employ nine core imputation models across four countries. 13 Model 1 is the most parsimonious (or basic) model and consists of household size, household heads’ age and gender, household heads’ highest completed levels of schooling, a dummy variable indicating whether the head belongs to the ethnic majority group, the shares of household members in the age ranges 0-14, 15-24, and 25-59 (with the reference group being those 60 years old and older), a dummy variable indicating whether the head worked in the past 12 months, and a dummy variable indicating urban residence.

Model 2 adds household asset variables and house (dwelling) characteristics to Model 1. Household assets include variables indicating whether the household has a car, motorbike, bicycle, desk phone, mobile phone, DVD player, television set, computer, refrigerator, air conditioner, washing machine, or electric fan.
House characteristics include the construction materials for the house’s roof and wall and the type of water and toilet the household has access to. 14 Models 1 and 2 include standard variables that are available in most LSMS-type surveys and other types of micro surveys as well.

13 A recent theoretical study also suggests that for misspecified regressions, adding more variables may result in larger inconsistency (De Luca, Magnus, and Peracchi, 2018). On the other hand, dropping some variables from the core Model 1 such as employment generally, but not substantially, decreases the imputation accuracy (see Appendix E). As such, it is useful to examine imputation accuracy for different models.

14 For Vietnam, house wall material is assigned numerical values using the following categories: 6 "cement", 5 "brick", 4 "iron/wood", 3 "earth/straw", 2 "bamboo/board", and 1 "others". The types of toilet are assigned numerical values using the following categories: 6 "septic", 5 "suilabh", 4 "double septic", 3 "fish bridge", 2 "others", and 1 "none".

Model 3 adds total food expenditures to Model 2, and Model 4 adds total non-food expenditures to Model 2. Models 5 to 8 add to Model 2, respectively, durables expenditures, health expenditures, education expenditures, and utilities expenditures (such as on electricity, water, and garbage). All these expenditures are on a per capita basis and are converted to logarithmic form. Finally, Model 9 adds utilities expenditures to Model 1. The list of the specific predictors that are used in each country is provided in Appendix A, Table A.2.

For comparison purposes and robustness checks, we use two estimation methods with different assumptions about the error terms. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods include the random effects at the primary sampling unit for each country.
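The difference between the two estimation methods can be illustrated with a minimal sketch. It rests on several simplifying assumptions that are not part of the paper's specification: a single predictor, simulated data, no random effects at the primary sampling unit, and no allowance for parameter uncertainty. All function and variable names are hypothetical.

```python
import math
import random
import statistics


def fit_simple_ols(x, y):
    """Fit y = a + b*x + e by least squares; return (a, b, residuals)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def impute_poverty_rate(x_target, a, b, residuals, log_line,
                        method="normal", n_sims=100, seed=0):
    """Average simulated poverty headcount in the target sample.

    method="normal": errors drawn from N(0, sigma^2)  (in the spirit of Method 1)
    method="empirical": errors resampled from base-survey residuals (Method 2)
    """
    rng = random.Random(seed)
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - 2))
    rates = []
    for _ in range(n_sims):
        poor = 0
        for xi in x_target:
            if method == "normal":
                e = rng.gauss(0.0, sigma)
            else:
                e = rng.choice(residuals)
            # A household is poor if imputed log consumption is below the line.
            poor += (a + b * xi + e) < log_line
        rates.append(poor / len(x_target))
    return statistics.fmean(rates)


def simulate_survey(n, rng):
    """Hypothetical data: x is a log predictor, y is log consumption."""
    x = [rng.gauss(10.0, 1.0) for _ in range(n)]
    y = [2.0 + 0.8 * xi + rng.gauss(0.0, 0.5) for xi in x]
    return x, y


rng = random.Random(42)
x_base, y_base = simulate_survey(2000, rng)      # base survey (has consumption)
x_target, y_target = simulate_survey(2000, rng)  # target survey (y used only to validate)
log_line = 9.0                                   # hypothetical log poverty line

a, b, residuals = fit_simple_ols(x_base, y_base)
true_rate = sum(yi < log_line for yi in y_target) / len(y_target)
rate_normal = impute_poverty_rate(x_target, a, b, residuals, log_line, "normal")
rate_empirical = impute_poverty_rate(x_target, a, b, residuals, log_line, "empirical")
print(true_rate, rate_normal, rate_empirical)
```

In this simulated setting the errors are in fact normal, so the two methods should give similar rates; with skewed or heavy-tailed residuals they could diverge, which is one reason to report both.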
Table 1 provides the estimation results for the predicted poverty rates for 2016 for Vietnam using the 2014 VHLSS as the base survey round. (The full regression results for Equation (1) are shown in Appendix B, Table B.1.) The estimation results show that Models 1 to 8 provide inaccurate poverty estimates that differ from the true poverty rate of 9.6 percent for 2016. However, poverty estimates using Model 9, which controls for utilities consumption, are not statistically significantly different from the true poverty rate for both estimation methods. Furthermore, our predicted poverty rates, which are 9.7 percent and 9.1 percent using the normal linear regression model and the empirical distribution of the error terms, respectively, are within the one-standard-error interval of the true poverty rate of [9.2, 10.0].

To examine the robustness of the same poverty predictors for different survey rounds over time, we provide estimation results using the preceding survey round as the base survey and subsequently impute into the next survey round for all four rounds of the VHLSS. That is, we build the imputation model using the 2010 round and impute into 2012, and we build it using the 2012 round and impute into 2014. The estimation results are provided in Appendix B, Tables B.11 and B.12 for 2012 and 2014, respectively.

Several results stand out from these two tables. First, controlling for utilities consumption (Model 9) provides estimates that are mostly within one standard error of the true poverty rate for both years. The only case where the Model 9 estimate is not accurate is for the imputation model from 2012 to 2014, using the empirical distribution of the error terms (Table B.11, row 3). Yet, in this case the difference from the true poverty rate is not very large at one percentage point, which is roughly 8 percent of the true poverty rate (=1/13.2).
Second, adding the household asset variables and the house characteristics to Model 1 (Model 2) offers estimates that are within one standard error of the true poverty rate for both years. While we do not have the same result for 2016, this result is consistent with the finding in previous studies that these variables have an important role in prediction accuracy (as discussed in Section 2). 15 Third, the model that includes both utilities consumption and the household asset and house characteristics variables (Model 8) performs well. Indeed, three out of four estimates fall inside the 95 percent confidence interval (CI) of the true poverty rate; put differently, these estimates are not statistically significantly different from the true poverty rate. Yet, this model does not appear to clearly improve on either Model 2 or Model 9.

15 In addition, Model 2 performs better than some other models with more variables. This is also consistent with our discussion earlier that adding more variables to misspecified regressions may result in less imputation accuracy.

Finally, some models that control for certain consumption sub-aggregates appear to do well for (one of) these two years but not for 2016. Specifically, controlling for food expenditures (Model 3) provides estimates that are within one standard error of the true poverty rate for 2014. Controlling for health expenditures (Model 6) offers estimates that mostly fall within the 95 percent CI of the true poverty rate. This is likely because food expenditures often form a key component of household expenditures, particularly for poorer countries; health expenditures, on the other hand, do not typically make up a large share of household expenditures but can represent important expenses.
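The two accuracy benchmarks used throughout this section (within one standard error and within the 95 percent CI of the true poverty rate) can be sketched as a simple classification rule. The sketch assumes that only the standard error of the true poverty rate matters and that the CI uses the normal critical value of 1.96; the paper's actual significance test may differ. The standard error of 0.4 is the value implied by the reported interval [9.2, 10.0] around 9.6 percent, and the function names are hypothetical.

```python
def classify_estimate(estimate, true_rate, true_se, z=1.96):
    """Label an imputed poverty rate by its distance from the true rate."""
    gap = abs(estimate - true_rate)
    if gap <= true_se:
        return "within one standard error"
    if gap <= z * true_se:
        return "within 95 percent CI"
    return "outside 95 percent CI"


def relative_gap(gap_pp, true_rate):
    """Gap in percentage points expressed as a share of the true rate."""
    return gap_pp / true_rate


# True rate of 9.6 percent with an implied standard error of 0.4.
label_reported = classify_estimate(9.7, 9.6, 0.4)       # estimate reported in the text
label_hypothetical_1 = classify_estimate(10.3, 9.6, 0.4)  # hypothetical estimate
label_hypothetical_2 = classify_estimate(11.0, 9.6, 0.4)  # hypothetical estimate

# One percentage point relative to a true rate of 13.2 percent.
share = relative_gap(1.0, 13.2)
print(label_reported, label_hypothetical_1, label_hypothetical_2, round(share, 3))
```

The last line reproduces the arithmetic behind the "roughly 8 percent" figure quoted for the 2014 estimate.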
It is also useful to note that while a high value of R2 generally indicates a good model fit for the underlying regression (for Equation (1)), it does not automatically indicate that the poverty imputation model calibrated with the base survey data can provide accurate predictions once it is applied to the target survey data. For example, the R2 for Model 9 ranges between 0.45 and 0.59, which is much lower than the corresponding R2 value of roughly 0.9 for Model 3 for the four countries (Tables 1 to 4). Yet, in terms of providing accurate predictions vis-à-vis the true poverty rates, Model 9 performs better than Model 3 (and most other models with a higher R2 value). This result similarly holds for the coefficient of correlation ρ(y,yh) between the predicted and actual consumption variables, which is a statistic commonly used to measure how well the predicted variable approximates the actual variable (Pituch and Stevens, 2016). We return to more discussion on the estimated model parameters in Section 5.

We turn next to the estimates using similar models for other countries, which are shown respectively in Tables 2, 3, and 4 for Malawi, Nigeria, and Tanzania. (Since we only have data for one survey round for Ethiopia, we are unable to provide similar estimates for this country.) Notably, Model 2 (controlling for the household asset variables and the house characteristics) works well for Malawi and Nigeria, but not Tanzania. Model 9 (controlling for utilities expenditures) generally performs well for all three countries. On the other hand, Model 8, which controls for the household asset variables, the house characteristics, and utilities expenditures, works for Nigeria and Tanzania but not Malawi.

Similar to Vietnam, the imputation models that control for consumption sub-aggregates do not show a consistent pattern.
In particular, controlling for food and health expenditures (Model 3 and Model 6) works for Malawi and Nigeria, which is similar to the results for the years 2012 and 2014 for Vietnam. On the other hand, controlling for non-food expenditures (Model 4) works for Malawi and Tanzania.

Finally, do estimation results differ across urban and rural areas? We run the imputation models separately for urban and rural areas (e.g., we impute from the rural sample in the base survey to that in the target survey) for the latest two survey rounds for all four countries. Figure 1 plots the results for each model for each country, suggesting that there is no clear evidence that the imputation models uniformly perform better for either urban or rural areas. For example, all the models work well for urban Tanzania (with the larger square symbols indicating the estimates are not statistically significantly different from the true poverty rates), but only Models 4 and 9 work for rural Tanzania. In contrast, while all the models work well for rural Nigeria, only Model 3 works for urban Nigeria. We provide the full estimation results for urban and rural areas for all the countries in Appendix C, Tables C.1 to C.14. We return to more discussion on meta-analysis of model performance in Section 5.

4.2. Further Extensions with Complementary Predictors

Our estimation results so far suggest that controlling for household assets and house characteristics (Model 2) or controlling for utilities expenditures (Model 9) provides better poverty estimates than the other models. We next consider three extensions where we examine whether adding to each of these two models some other variables can help improve imputation accuracy. These include variables such as i) geospatial variables, ii) more disaggregated (either dichotomous or continuous) measures of consumption of specific food groups, or iii) variables from the community survey or population census.
While geospatial variables have become increasingly available, the latter two types of variables are not readily available in most contexts. The continuous measures of consumption of specific food groups also require deflators to make these items consistent and comparable over time, similar to the need for deflators in order to use other consumption sub-aggregates as poverty predictors. Similarly, data from the population census are not always accessible. As such, we present these variables in a rough order of data availability in a typical survey context.

4.2.1. Adding Geospatial Variables

Table 5 provides the poverty estimates in 2016 for Vietnam when we further add to Model 2 or Model 9 the distances from the commune center to various important locations (such as distances to the nearest major road, the nearest population center with 50,000 or more people, the nearest major port, the nearest international land border crossing, the provincial capital, and the land-based travel time to the nearest densely-populated area), nightlight intensity, and agricultural soil quality. The full regression results are provided in Appendix B, Table B.2. The estimation results show that while adding these variables to Model 2 leads to worse estimates that fall outside the 95 percent CI of the true poverty rate, doing so with Model 9 yields the opposite results. All the poverty estimates where we separately add these geospatial variables are still within one standard error of the true poverty rate of 9.6 percent.

The estimation results for the other countries are somewhat similar to those for Vietnam, except that we do not have nightlight intensity for these countries. Adding the geospatial variables to Model 9 works for Tanzania (Table 8) but not for Malawi (Table 6), where doing so even results in the poverty estimates falling outside the 95 percent CI of the true poverty rate.
Adding the same variables to Model 2 works for Malawi only in the case of agricultural soil quality (Table 6) but works quite well for Nigeria (Table 7), with both poverty estimates lying within one standard error of the true poverty rate.

4.2.2. Adding More Disaggregated Food Consumption Items

We turn next to examining models that add more disaggregated food consumption items to the imputation model with household assets (Table 1, Model 2) using the Vietnam data sets. As discussed above, we deflate these consumption items to the same prices across the 2012-14 rounds of the VHLSS before including them in the imputation models. We sequentially add to the imputation model each of eight sub-categories of food consumption: rice (the Vietnamese staple food), meat, seafood, vegetable and fruit, lard and cooking oil, milk products, drinks, and food away from home. These food items are popular in the country’s diet and range from 3 percent (drinks) to more than 30 percent (food away from home) of total household food consumption.

The models, whose estimation results are shown in Table 9, perform quite well. Except for the estimates for milk products (Model 6), which fall inside the 95 percent CIs, all the estimates for the other models are within one standard error of the true poverty rate of 13.2 percent for 2014.

In the case of Model 9, one of the leading imputation model alternatives based on the aforementioned findings, adding more disaggregated food consumption items to the imputation does not improve prediction performance over and above the core Model 9 and can in fact result in lower levels of accuracy (Appendix B, Table B.20). For instance, the model that includes milk products and uses the empirical distribution of the error terms provides an estimate that is statistically different from the true poverty rate (Model 6, row 2). The remaining models, however, offer estimates that fall within the 95 percent CI of the true poverty rate.
In fact, two-thirds (i.e., 10 out of 15) of the estimates are still within one standard error of the true poverty rate.

Table 10 shows the results from the models that are estimated with the 2012-2014 rounds of the VHLSS and that instead complement the specification of Model 9 with dichotomous, easier-to-collect measures of consumption of specific food groups. In particular, these dichotomous food consumption measures do not require the use of consumption deflators as with the continuous measures. The models for Vietnam perform well and all the estimates fall within one standard error of the true poverty rates. However, these models do not show a consistent pattern across countries. The poverty estimates work reasonably well for Malawi in 2013 (Appendix D, Table D.2) and Nigeria in 2012/13 (Appendix D, Table D.4) but fall outside the 95 percent CIs of the true poverty rate for Vietnam in 2016, Malawi in 2016/17, and Tanzania in 2010/11 and 2012/13 (Appendix D, Tables D.1, D.3, D.5, and D.6).

4.2.3. Adding Variables from Other Data Sets

Most LSMS-type surveys implement a community questionnaire in addition to the household questionnaire to collect data on the community characteristics. Would adding these community variables to the imputation model help improve accuracy? To investigate this question, we turn to the VHLSS rounds in 2012 and 2014 where we add several community variables such as the distances from the commune center to the nearest facilities, a major city, and whether the communes are classified as being poor or remote. 16 The estimation results, shown in Table 11, suggest that simply adding these variables to the most parsimonious model (that controls for demographics and employment) does not result in good poverty estimates (Table 11, Model 1).

16 These community variables are available for rural areas only, which results in a higher poverty rate and a smaller number of observations for this table compared to Table B.12.
However, adding these variables to either the imputation model that controls for household assets and house characteristics or the one that controls for utilities expenditures works well and provides poverty estimates of approximately 18.0 percent. This figure is very close to, and lies within one standard error of, the true poverty rate of 18.1 percent for rural Vietnam in 2014 (Table 11, Model 2 and Model 3).

Table 12 further adds to the imputation model commune-level characteristics that are generated using the 2009 Population and Housing Census. These variables include the share of the population with college/university education, the share of the population that belongs to ethnic majority groups, the average household's asset index and living areas, and the share of houses with high quality cooking fuel sources, drinking water sources, and toilet facilities. Adding these variables does not change the results with the imputation using the household assets (Model 2), since the estimates are already within one standard error of the true poverty rate (Appendix B, Table B.11, Model 2). But doing this significantly improves the prediction accuracy for the imputation model using the utilities expenditures. Specifically, the estimate using the empirical distribution of the error terms turns from lying outside the 95 percent CI (Appendix B, Table B.11, Model 8, row 2) to falling within one standard error of the true poverty rate.

4.3. Within-Year Imputation

For the within-year imputation, we divide the estimation sample into two random halves for each country. We subsequently use one random half as the base survey and impute from this base survey into the other random half, which serves as the target survey. The estimation results suggest that the within-year imputation works well for most models for every country.
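The random-halves design can be sketched as follows. This is only an illustration under assumed details: the household identifiers are hypothetical, the split ignores survey strata and sampling weights, and the seed is arbitrary.

```python
import random


def split_random_halves(household_ids, seed=0):
    """Shuffle household IDs and split them into a base half and a target half."""
    ids = list(household_ids)
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


# Ten hypothetical household IDs from a single survey round.
base_ids, target_ids = split_random_halves(range(1, 11), seed=1)
print(sorted(base_ids), sorted(target_ids))
```

The imputation model would then be estimated on the households in `base_ids` and applied to those in `target_ids`, whose observed consumption serves only to compute the benchmark poverty rate.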
Summarizing the results for Ethiopia, Malawi, Nigeria, and Tanzania (fully shown in Appendix B, Tables B.15 to B.19), Figure 2 indicates that the estimates mostly fall within the 95% CIs of the true poverty rates. The estimates are less accurate for Ethiopia and Vietnam, with four and six out of 18 estimates, respectively, falling outside the 95% CIs of the true poverty rates. On the other hand, the estimates for the other countries all fall within the 95% CIs, and many within one standard error of the true poverty rates.

These results indicate that with only a single base survey at hand, it could be misleading to carry out a similar within-survey imputation exercise and decide on the best performing model to be used for across-year imputation. The reason is that while all the tested models appear to achieve comparable within-year imputation performance, only a subset of the models can fulfill across-year imputation needs and provide poverty estimates that are not statistically significantly different from the true poverty rates.

5. Further Meta-Analysis on Model Selection

Given the various across-year imputation model variants that we tested for different countries and years, it is useful to summarize the results through a meta-analysis. Figure 3 plots for 26 different models the imputation accuracy, which is defined as the share of a model's estimates that are not statistically significantly different from the true poverty rate. The measure is computed across all instances of a given model’s estimation with a unique pair of a base survey and a target survey in a given country. These models include the core Models 1 to 9 (shown in Tables 1 to 4) and the six models with geospatial variables.
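The imputation accuracy measure defined above is a simple share, which can be sketched as below. The records are hypothetical outcomes for two placeholder models, not results from the paper, and the function name is an assumption.

```python
from collections import defaultdict


def imputation_accuracy(records):
    """Percent of estimates per model that are not significantly different
    from the true poverty rate.

    records: iterable of (model_name, not_significantly_different) pairs,
    one per base-target survey pair (and estimation method).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for model, accurate in records:
        totals[model] += 1
        hits[model] += bool(accurate)
    return {model: 100.0 * hits[model] / totals[model] for model in totals}


# Hypothetical outcomes for two placeholder models.
records = [
    ("model_a", True), ("model_a", False), ("model_a", False), ("model_a", True),
    ("model_b", True), ("model_b", True), ("model_b", True), ("model_b", False),
]
accuracy = imputation_accuracy(records)
print(accuracy)  # {'model_a': 50.0, 'model_b': 75.0}
```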
For further comparison, we also added:

i) three models that are variants of Model 2: demographics variables only, demographics variables and assets, and demographics variables and housing characteristics, and
ii) eight model variants that add to Model 2 a dummy variable indicating household consumption of, respectively, the staple food (rice or maize), meat, seafood, vegetable and fruit, lard and cooking oil, milk products, drinks, and food away from home.

Figure 3 suggests that for the first nine models, Model 9 performs better than average with an imputation accuracy of 70.8 percent, followed by Model 3 (58.3 percent) and Model 8 (45.8 percent). Further augmenting Model 9 with the geospatial variable on agricultural soil quality can raise its accuracy to 75 percent. Incorporating into Model 2 the dichotomous variables that capture consumption of food groups does not seem to help much, except that it raises the imputation accuracy above the average model performance, to 45.2 percent and 42.9 percent respectively, when we add cooking oil or drinks. 17

The analysis shown in Figure 3 is obtained by simply averaging across the imputation models the results across the countries, the years, as well as other variables (e.g., region or estimation methods). To further take into account the potential contributions from these model characteristics, we estimate the following logit regression

$$P_{kn} = \sum_{k=1}^{K} \beta_k' M_k + \gamma C_n + \varepsilon_{kn} \qquad (6)$$

where $P_{kn}$ is a binary variable that equals 1 if the poverty estimate is not statistically significantly different from the true poverty rate and 0 otherwise, for $k = 1, \ldots, K$ models and $n = 1, \ldots, N$ countries. $M_k$ are the dummy variables indicating the imputation models, $C_n$ are the country dummy variables, and $\varepsilon_{kn}$ is the error term. 18

17 As a special case, we excluded the employment-related predictors and re-estimated all the models using the two latest rounds of survey data in each country. These results are presented in Appendix E.
The exclusion of the employment-related predictors does not alter our previous findings regarding the performance of each model, except for Model 9. Excluding these predictors adversely affects the imputation accuracy of Model 9 only, and only in specific cases; in those instances, the inclusion of the geospatial variables appears to boost the predictive performance of the model to a level comparable with that of Model 9 with the employment-related predictors.

18 We exclude the results with the nightlight variables because these are only available for Vietnam.

The dynamics between a country dummy variable and its poverty rate can be captured to varying extents for different countries by the imputation models. Consequently, to shed more light on these differences, we can replace the country dummy variables with the model characteristics, to estimate the following alternative equation:

$$P_{kn} = \sum_{k=1}^{K} \beta_k' M_k + \delta' Z_{kn} + \varepsilon_{kn} \qquad (7)$$

where, as in Equation (6), $P_{kn}$ is the binary accuracy outcome and $M_k$ are the imputation model dummy variables, and $Z_{kn}$ are the model characteristics such as the true poverty rate in the target survey, the (logarithm of) sample size of the base survey, the time difference between the base survey and the target survey, and the estimation method (normal linear regression model or the empirical distribution of the error terms). But the model characteristics can only offer a guide to model selection, since these model characteristics likely represent a correlational (rather than causal) and ex post relationship with the imputation outcomes. Our preferred equation for interpretation is Equation (6) that clearly lays out the models a priori. 19

Table 13 shows the logit regression results tied to the estimation of Equations (6) and (7). The associated marginal effects are presented in Appendix B, Table B.21. 20 To explore heterogeneity across urban and rural areas, Equations (6) and (7) are estimated for the whole country (Specifications 1 and 2), and separately for urban (Specifications 3 and 4) and rural samples (Specifications 5 and 6).
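Because Equations (6) and (7) are estimated as logit regressions, their right-hand sides can be read as the index inside a logistic function. The following is a sketch under assumed notation rather than the authors' own: $P_{kn}$ denotes the binary accuracy outcome, $M_k$ the imputation model dummies (with Model 1 as the omitted reference category), $C_n$ the country dummies, and $\alpha$, $\beta_k$, and $\gamma$ the intercept and coefficients.

```latex
\Pr(P_{kn} = 1 \mid M, C_n)
  = \Lambda\!\Big(\alpha + \sum_{k=2}^{K} \beta_k M_k + \gamma' C_n\Big),
\qquad
\Lambda(u) = \frac{1}{1 + e^{-u}} .

% Discrete change in the probability of accurate imputation when moving
% from the reference Model 1 to Model k in country n:
\Delta_k(n) = \Lambda\big(\alpha + \beta_k + \gamma' C_n\big)
            - \Lambda\big(\alpha + \gamma' C_n\big).
```

Under this reading, a marginal effect for Model $k$ is a change in probability relative to Model 1 at given values of the other covariates. Whether the reported marginal effects are evaluated at covariate means or averaged over observations is not stated in this section, so that choice is left open here.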
We estimate robust standard errors that are clustered at the country level for both equations.

19 This concern is particularly relevant to the estimated model parameters (versus the exogenous model parameters given by the data). As an example, the correlation between the model goodness-of-fit statistics R2 (or the correlation between the predicted consumption and the actual consumption ρ(y,yh)) with the model numbers is around -0.34 and strongly statistically significant for the whole country sample. Figure B.2 in Appendix B provides a graphical illustration of these estimated statistics against the model numbers. As such, we do not include them in the regressions for Equations (6) and (7).

20 Alternatively, we can more rigorously define the outcome variable as taking the value of 1 or 2 if the poverty estimate falls within the 95 percent CIs or one standard error around the true poverty rate, respectively, and 0 otherwise. The results, shown in Appendix B, Table B.22, are qualitatively similar and have somewhat more statistical significance.

There are several interesting findings that stem from Table 13. First, regarding the specific imputation models to use, differences exist by geographical regions. Models 9 and 13 work for the whole country, urban, and rural areas. For urban areas, Models 2, 3, 6, 7, and 11 work well as shown by the strong statistical significance level, and Models 8 and 10 may also work as shown by the marginal statistical significance at the 10 percent level (Specifications 3 and 4). These models do not work for rural areas. On the other hand, Model 4 appears to work for rural areas only (Specifications 5 and 6).

Second, to some extent, the magnitude of the estimated impacts differs by geographical regions.
For example, after controlling for other characteristics, compared to the reference imputation model consisting of demographics and employment variables only (Model 1), Model 9 increases the probability of accurate imputation by 0.46 for the whole country, 0.44 for urban areas, and 0.58 for rural areas (Table B.21, Specifications 1, 3, and 5). Further adding agricultural soil quality to Model 9 raises the probability of accurate imputation by 0.62 for the whole country but does not much change this probability for urban or rural areas. Similarly, Model 12 raises the probability of accurate imputation by 0.44 for urban areas and by 0.48 for rural areas (Table B.21, Specifications 3 and 5).

Third, it is reassuring that the results in our main specifications for urban and rural areas (Specifications 3 and 5 respectively) are largely similar to the alternative specifications (Specifications 4 and 6 respectively). But several models, including Models 2, 6, 7, and 11, lose their statistical significance when we replace the country dummy variables with the estimated model parameters. Notably, these models have weaker impacts on raising the probability of imputation accuracy (e.g., under Specification 3, the impact of Model 2 is 0.14, roughly one-third of the impact of 0.44 for Model 9).

Finally, the estimation results using the estimated model parameters (Specifications 2, 4, and 6) indicate that a longer time interval between the base survey and the target survey can reduce the probability of a poverty prediction that is not statistically significantly different from the true poverty rate for the whole country and rural areas, but not for urban areas. Higher true poverty rates are positively (negatively) associated with increases in the probability of interest for the whole country and rural areas. The opposite is true concerning urban areas.
A larger sample size for the base survey can help the estimation for rural areas but may have the opposite effect for urban areas. However, as discussed earlier, the relationship between the estimated model parameters and the imputation accuracy is at best correlational, so these results should be regarded as indicative and should be further investigated.

We also examine the meta-analysis results for other model variants. In particular, more parsimonious models that use fewer variables than those in Model 2 (demographics variables only, demographics and asset variables, or demographics and housing characteristics) do not generally achieve high imputation accuracy, except for the model with demographics and asset variables for urban areas (Appendix B, Table B.23). Adding dummy variables indicating household consumption of disaggregated food items does not generally improve imputation accuracy, except for the models that control for consumption of vegetables and fruit or cooking oil in urban areas (Appendix B, Table B.24).

6. Conclusion

This paper advances the literature on the use of survey-to-survey imputation for consumption and poverty measurement by attempting to identify the cross-country consistent, minimum set of predictors that yields reliable estimates for poverty monitoring and evaluation purposes. In doing so, we seek to ultimately inform a set of guidelines that can be generalizable to the extent possible to different project contexts and that are aimed at survey practitioners and analysts interested in the use of survey-to-survey imputation for poverty measurement. The analysis leverages 14 multi-topic survey rounds that have been conducted over the past decade in Ethiopia, Malawi, Nigeria, Tanzania and Vietnam, and we conduct a comparative assessment of the performance of a range of imputation models for across-year and within-year imputation purposes at the national, urban, and rural levels.
The models cover a diverse set of scenarios in which the scope and complexity of both survey-based and geospatial predictors vary extensively.

We find that augmenting a basic imputation model that includes household demographic and employment characteristics with additional predictors that capture household utility consumption expenditures (including electricity, water, and garbage) and/or household assets and dwelling attributes generally provides poverty estimates that are not statistically significantly different from the true poverty rates. These poverty estimates even fall, in many cases, within one standard error of the true poverty rates. Incorporating additional geospatial predictors such as agricultural soil quality and the distance-to-facilities variables (or nightlights in the case of Vietnam) that are derived by linking georeferenced survey data with third-party geospatial data sources is found to further improve imputation accuracy.

We also consider a number of additional variables from auxiliary data sets such as community surveys or the population census for Vietnam, including community-level measures of infrastructure, topography, poverty status, education achievement, and wealth. Adding these commune characteristics significantly improves estimation accuracy in Vietnam. Across a larger set of countries, adding other consumption sub-aggregates to the imputation model, particularly more disaggregated food consumption items, as expenditures or even as dummy variables indicating household consumption of these items, may be useful as well.

A meta-analysis reveals spatial heterogeneity of imputation accuracy between urban and rural areas. The basic imputation model that consists of demographics, employment, and utilities expenditures (with or without geospatial variables) works well for the whole country, urban, and rural areas.
For urban areas, augmenting the basic imputation model with predictors that capture total food, health, or education expenditures further improves predictive accuracy. For rural areas, the best-performing model appears to be the basic imputation model augmented with total non-food expenditures as an additional predictor.

The consistent cross-country performance of the model variant that combines household-level demographic and employment predictors with utility consumption expenditures (with or without third-party geospatial variables that can be matched with georeferenced survey data) is welcome news for future imputation studies. These variables are typically available in the household surveys that would inform baseline imputation model estimation, and they would be relatively easy to collect in follow-up surveys. By comparison, alternative predictors that can also yield reliable poverty predictions, such as total food, non-food, education, or health expenditures, are more complex and costly to collect. The finding regarding utility consumption expenditures is also promising because measurement error in this predictor is likely to be lower where paid bills can be consulted for some utilities.

Future research can consider, subject to data availability, expanding (i) the geographic spread of the countries considered for the comparative assessment, and (ii) the set of predictors related to food and non-food consumption, for instance by considering more disaggregated non-food consumption sub-aggregates. Doing so will be important in further gauging the cross-country consistency of our recommendations. Methodological survey experiments would also be useful.
A promising direction is to gauge whether imputation accuracy is affected by differences in the design of the base and target surveys, for instance in terms of fieldwork duration and the burden on respondents and enumerators, even when poverty predictors are computed from identical questions in both surveys. In this respect, the evidence provided by Kilic and Sohnesen (2019)[21] suggests a useful experiment: administering alternative versions of a light target survey questionnaire (inclusive only of the questions required for the predictors in the models of interest) to different random subsets of households, in parallel with the base survey questionnaire (with full consumption data collection) administered to a separate random subset of households.

[21] Based on a randomized survey experiment implemented in Malawi, Kilic and Sohnesen (2019) document that observationally equivalent households, as well as the same households, answer the same questions differently depending on whether they are given a short questionnaire or its longer counterpart. The authors find statistically significant differences in reporting across all topics and question types. When a poverty imputation model (calibrated with an even longer base survey questionnaire) is applied to the poverty predictors from the samples that received questionnaires of different lengths in the experiment, the authors find differences of 3 to 7 percentage points in the predicted poverty rates between the two samples, depending on the model specification.

References

Angrist, Joshua D. and Alan B. Krueger. (1999). "Empirical Strategies in Labor Economics." In Ashenfelter, Orley and David E. Card. (Eds.). Handbook of Labor Economics, Vol. 3c. Amsterdam: North-Holland.

Beegle, Kathleen, Luc Christiaensen, Andrew Dabalen, and Isis Gaddis. (2016). Poverty in a Rising Africa. Washington, DC: The World Bank.
Beegle, Kathleen, Joachim De Weerdt, Jed Friedman, and John Gibson. (2012). "Methods of Household Consumption Measurement through Surveys: Experimental Results from Tanzania". Journal of Development Economics, 98(1): 3-18.

Beltramo, Theresa, Hai-Anh Dang, Ibrahima Sarr and Paolo Verme. (2020). "Estimating Poverty among Refugee Populations: A Cross-Survey Imputation Exercise for Chad". World Bank Policy Research Paper # 9222. World Bank: Washington, DC.

Carpenter, J. and Kenward, M. (2013). Multiple Imputation and its Application. Chichester: John Wiley & Sons.

Christiaensen, Luc, Ethan Ligon, and Thomas P. Sohnesen. (2021). "Should Consumption Sub-Aggregates be Used to Measure Poverty?" World Bank Economic Review. Doi: https://doi.org/10.1093/wber/lhab021

Christiaensen, Luc, Peter Lanjouw, Jill Luoto, and David Stifel. (2012). "Small Area Estimation-based Prediction Models to Track Poverty: Validation and Applications." Journal of Economic Inequality, 10(2): 267-297.

Cuesta, Jose and Gabriel Lara Ibarra. (2018). "Comparing Cross-Survey Micro Imputation and Macro Projection Techniques: Poverty in Post Revolution Tunisia". Journal of Income Distribution, 25(1): 1-30.

Dang, Hai-Anh. (2021). "To Impute or Not to Impute, and How? A Review of Alternative Poverty Estimation Methods in the Context of Unavailable Consumption Data". Development Policy Review, 39(6), 1008-1030.

Dang, Hai-Anh and Peter Lanjouw. (2018). "Poverty and Vulnerability Dynamics for India during 2004-2012: Insights from Longitudinal Analysis Using Synthetic Panel Data". Economic Development and Cultural Change, 67(1), 131-170.

---. (forthcoming). "Data scarcity and poverty measurement". In Jacques Silber. (Eds.). Handbook of Research on Measuring Poverty and Deprivation. Edward Elgar Press.

Dang, Hai-Anh and Paolo Verme. (2021). "Estimating Poverty for Refugee Populations: Can Cross-Survey Imputation Methods Substitute for Data Scarcity?" ECINEQ Working Paper 2021-578.
Dang, Hai-Anh, Dean Jolliffe, and Calogero Carletto. (2019). "Data Gaps, Data Incomparability, and Data Imputation: A Review of Poverty Measurement Methods for Data-Scarce Environments". Journal of Economic Surveys, 33(3): 757-797.

Dang, Hai-Anh, Peter Lanjouw, and Umar Serajuddin. (2017). "Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country." Oxford Economic Papers, 69(4): 939-962.

Dang, Hai-Anh, Peter Lanjouw, Jill Luoto, and David McKenzie. (2014). "Using Repeated Cross-Sections to Explore Movements in and out of Poverty". Journal of Development Economics, 107: 112-128.

Daniels, Lisa, and Nicholas Minot. (2015). "Is poverty reduction over-stated in Uganda? Evidence from alternative poverty measures." Social Indicators Research, 121(1): 115-133.

De Luca, Giuseppe, Jan R. Magnus, and Franco Peracchi. (2018). "Balanced variable addition in linear models." Journal of Economic Surveys, 32(4): 1183-1200.

Deaton, Angus and Valerie Kozel. (2005). The Great Indian Poverty Debate. New Delhi: Macmillan.

Devarajan, Shantayanan. (2013). "Africa's statistical tragedy." Review of Income and Wealth, 59: S9-S15.

Douidich, Mohamed, Abdeljaouad Ezzrari, Roy van der Weide, and Paolo Verme. (2016). "Estimating Quarterly Poverty Rates Using Labor Force Surveys: A Primer." World Bank Economic Review, 30(3): 475-500.

Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. (2003). "Micro-Level Estimation of Poverty and Inequality." Econometrica, 71(1): 355-364.

Filmer, Deon and Lant Pritchett. (2001). "Estimating Wealth Effects without Expenditure Data--or Tears: An Application to Educational Enrollments in States of India". Demography, 38(1): 115-132.

Garbero, Alessandra. (2014). "Estimating poverty dynamics using synthetic panels for IFAD-supported projects: a case study from Vietnam." Journal of Development Effectiveness, 6, 490-510.

Jean, Neal, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, and Stefano Ermon. (2016). "Combining Satellite Imagery and Machine Learning to Predict Poverty". Science, 353(6301): 790-794.

Jenkins, Stephen P., Richard V. Burkhauser, Shuaizhang Feng, and Jeff Larrimore. (2011). "Measuring Inequality Using Censored Data: A Multiple-imputation Approach to Estimation and Inference." Journal of the Royal Statistical Society: Series A, 174(1): 63-81.

Jerven, Morten. (2019). "The Problems of Economic Data in Africa." In Oxford Research Encyclopedia of Politics.

Kilic, T., and Sohnesen, T. (2019). "Same question but different answer: experimental evidence on questionnaire design's impact on poverty measured by proxies." Review of Income and Wealth, 65(1): 144-165.

Kijima, Yoko, and Peter Lanjouw. (2003). "Poverty in India during the 1990s: A Regional Perspective." Policy Research Working Paper # 3141. World Bank: Washington, D.C.

Mathiassen, Astrid. (2009). "A Model Based Approach for Predicting Annual Poverty Rates without Expenditure Data". Journal of Economic Inequality, 7: 117-135.

---. (2013). "Testing Prediction Performance of Poverty Models: Empirical Evidence from Uganda". Review of Income and Wealth, 59(1): 91-112.

Naudé, W. and Vinuesa, R. (2020). Data, global development, and COVID-19. WIDER Working Paper 2020/109.

Pituch, Keenan A. and James P. Stevens. (2016). Applied Multivariate Statistics for the Social Sciences: Analyses with SAS and IBM's SPSS. Routledge: New York.

Rubin, Donald B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Serajuddin, Umar, Hiroki Uematsu, Christina Wieser, Nobuo Yoshida, and Andrew Dabalen. (2015). "Data deprivation: another deprivation to end." World Bank Policy Research Paper no. 7252, World Bank, Washington, DC.

Stifel, D. and Christiaensen, L. (2007). "Tracking Poverty over Time in the Absence of Comparable Consumption Data". World Bank Economic Review, 21, 317-341.

Tanzania's National Bureau of Statistics. (2011). Basic Information Document--National Panel Survey 2010-11.

Tarozzi, Alessandro. (2007). "Calculating Comparable Statistics from Incomparable Surveys, With an Application to Poverty in India". Journal of Business and Economic Statistics, 25(3): 314-336.

Tittonell, Pablo, and Ken E. Giller. (2013). "When yield gaps are poverty traps: The paradigm of ecological intensification in African smallholder agriculture." Field Crops Research, 143: 76-90.

UNESCO-UIS/OECD/EUROSTAT. (UOE). (2020). Data collection on formal education--Manual on concepts, definitions and classifications. Montreal/Paris/Luxembourg.

United States Census Bureau. (2017). Current Population Survey, Imputation of Unreported Data Items. Accessed on the Internet on May 24, 2021 at https://www.census.gov/programs-surveys/cps/technical-documentation/methodology/imputation-of-unreported-data-items.html

West, Paul C., James S. Gerber, Peder M. Engstrom, Nathaniel D. Mueller, Kate A. Brauman, Kimberly M. Carlson, Emily S. Cassidy et al. (2014). "Leverage points for improving global food security and the environment." Science, 345(6194): 325-328.

World Bank. (2012). "Well Begun, Not Yet Done: Vietnam's Remarkable Progress on Poverty Reduction and the Emerging Challenges". Vietnam Poverty Assessment Report 2012. Hanoi: World Bank.

---. (2017). Monitoring Global Poverty: Report of the Commission on Global Poverty. Washington, DC: The World Bank.

---. (2021). World Development Report 2021: Data for Better Lives. Washington, DC: World Bank.

Table 1.
Predicted Poverty Rates Based on Imputation from 2014 to 2016, Vietnam (percentage)

                                              2016
Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8  Model 9
1) Normal linear regression model             15.1     13.4     6.2      8.5      10.6     12.5     13.3     11.5     9.7*
                                              (0.5)    (0.4)    (0.4)    (0.4)    (0.4)    (0.5)    (0.5)    (0.5)    (0.4)
2) Empirical distribution of the error terms  14.7     13.2     6.0      8.4      10.4     12.3     13.1     11.2     9.1*
                                              (0.5)    (0.5)    (0.4)    (0.4)    (0.4)    (0.5)    (0.5)    (0.5)    (0.4)
Control variables
Food expenditures                             -        -        Y        -        -        -        -        -        -
Non-food expenditures                         -        -        -        Y        -        -        -        -        -
Durables expenditures                         -        -        -        -        Y        -        -        -        -
Health expenditures                           -        -        -        -        -        Y        -        -        -
Education expenditures                        -        -        -        -        -        -        Y        -        -
Electricity, water, & garbage expenditures    -        -        -        -        -        -        -        Y        Y
Household assets & house characteristics      -        Y        Y        Y        Y        Y        Y        Y        -
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.46     0.69     0.86     0.94     0.73     0.71     0.70     0.70     0.56
ρ(y, yh)                                      0.46     0.69     0.86     0.94     0.74     0.71     0.69     0.72     0.57
N                                             9347     9347     9347     9347     9347     9347     9347     9347     9347
True poverty rate                             9.6 (0.4)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.1. True poverty rate is the estimate directly obtained from the survey data.

Table 2.
Predicted Poverty Rates Based on Imputation from 2010 to 2013, Malawi (percentage)

                                              2013
Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8  Model 9
1) Normal linear regression model             38.9*    40.9     35.5     40.3     39.5*    40.9     40.7     41.8     40.4
                                              (1.3)    (1.4)    (1.5)    (1.5)    (1.4)    (1.4)    (1.4)    (1.4)    (1.3)
2) Empirical distribution of the error terms  39.2*    41.0     35.9     40.5     39.7     41.0     40.8     42.2     40.9
                                              (1.3)    (1.4)    (1.5)    (1.5)    (1.4)    (1.4)    (1.4)    (1.4)    (1.3)
Control variables
Food expenditures                             -        -        Y        -        -        -        -        -        -
Non-food expenditures                         -        -        -        Y        -        -        -        -        -
Furnishings and household expenses            -        -        -        -        Y        -        -        -        -
Health expenditures                           -        -        -        -        -        Y        -        -        -
Education expenditures                        -        -        -        -        -        -        Y        -        -
Utilities: water, kerosene, lighting          -        -        -        -        -        -        -        Y        Y
Household assets & house characteristics      -        Y        Y        Y        Y        Y        Y        Y        -
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.52     0.68     0.93     0.87     0.71     0.69     0.68     0.71     0.59
ρ(y, yh)                                      0.49     0.65     0.92     0.87     0.67     0.66     0.64     0.69     0.54
N                                             4,000    4,000    4,000    4,000    4,000    4,000    4,000    4,000    4,000
True poverty rate                             37.9 (1.7)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.2. True poverty rate is the estimate directly obtained from the survey data.

Table 3.
Predicted Poverty Rates Based on Imputation from 2010/11 to 2012/13, Nigeria (percentage)

                                              2012/13
Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8  Model 9
1) Normal linear regression model             31.4     29.2*    27.0     31.8     29.3*    29.1*    29.7*    29.1*    31.1
                                              (1.1)    (1.2)    (1.2)    (1.2)    (1.2)    (1.1)    (1.2)    (1.2)    (1.1)
2) Empirical distribution of the error terms  31.3     29.2*    27.1     31.9     29.3*    29.1*    29.8*    29.2*    31.1
                                              (1.1)    (1.2)    (1.2)    (1.2)    (1.2)    (1.1)    (1.2)    (1.2)    (1.1)
Control variables
Food expenditures                             -        -        Y        -        -        -        -        -        -
Non-food expenditures                         -        -        -        Y        -        -        -        -        -
Infrequent non-food expenditures              -        -        -        -        Y        -        -        -        -
Health expenditures                           -        -        -        -        -        Y        -        -        -
Education expenditures                        -        -        -        -        -        -        Y        -        -
Utilities: electricity, fuel, water, garbage  -        -        -        -        -        -        -        Y        Y
Household assets & house characteristics      -        Y        Y        Y        Y        Y        Y        Y        -
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.44     0.56     0.92     0.73     0.57     0.58     0.57     0.56     0.45
ρ(y, yh)                                      0.43     0.54     0.93     0.73     0.57     0.57     0.56     0.55     0.44
N                                             4,406    4,406    4,406    4,406    4,406    4,406    4,406    4,406    4,406
True poverty rate                             28.7 (1.2)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.3. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table 4.
Predicted Poverty Rates Based on Imputation from 2010/11 to 2012/13, Tanzania (percentage)

                                              2012/13
Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8  Model 9
1) Normal linear regression model             18.1     17.2     18.5     21.2*    17.3     17.6     17.3     19.2     21.3*
                                              (0.9)    (0.9)    (1.0)    (1.0)    (0.9)    (0.9)    (0.9)    (1.0)    (1.0)
2) Empirical distribution of the error terms  18.0     17.1     18.5     20.9*    17.1     17.3     17.1     19.0     21.2*
                                              (0.9)    (0.9)    (1.0)    (1.0)    (0.9)    (0.9)    (0.9)    (1.0)    (1.0)
Control variables
Food expenditures                             -        -        Y        -        -        -        -        -        -
Non-food expenditures                         -        -        -        Y        -        -        -        -        -
Furnishings and household expenses            -        -        -        -        Y        -        -        -        -
Health expenditures                           -        -        -        -        -        Y        -        -        -
Education expenditures                        -        -        -        -        -        -        Y        -        -
Utilities: water, kerosene, lighting          -        -        -        -        -        -        -        Y        Y
Household assets & house characteristics      -        Y        Y        Y        Y        Y        Y        Y        -
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.45     0.59     0.92     0.75     0.61     0.61     0.59     0.60     0.49
ρ(y, yh)                                      0.42     0.57     0.93     0.76     0.59     0.59     0.58     0.59     0.50
N                                             4,858    4,858    4,858    4,858    4,858    4,858    4,858    4,858    4,858
True poverty rate                             20.8 (1.0)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.4. True poverty rate is the estimate directly obtained from the survey data.
Table 5. Predicted Poverty Rates Based on Imputation Using Geospatial Data from 2014 to 2016, Vietnam (percentage)

                                              2016
Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6
1) Normal linear regression model             13.4     13.3     13.3     9.6*     9.6*     9.5*
                                              (0.5)    (0.5)    (0.5)    (0.4)    (0.4)    (0.4)
2) Empirical distribution of the error terms  13.2     13.1     13.1     9.2*     9.1*     9.1*
                                              (0.5)    (0.5)    (0.5)    (0.4)    (0.4)    (0.4)
Control variables
Distances to facilities                       Y        -        -        Y        -        -
Nightlight intensity                          -        Y        -        -        Y        -
Agricultural soil quality index               -        -        Y        -        -        Y
Electricity, water, & garbage expenditures    -        -        -        Y        Y        Y
Household assets & house characteristics      Y        Y        Y        -        -        -
Demographics & employment                     Y        Y        Y        Y        Y        Y
R2                                            0.69     0.69     0.69     0.57     0.56     0.56
ρ(y, yh)                                      0.69     0.69     0.70     0.59     0.57     0.58
N                                             9326     9326     9326     9326     9326     9326
True poverty rate                             9.6 (0.4)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.5. True poverty rate is the estimate directly obtained from the survey data.

Table 6.
Predicted Poverty Rates Based on Imputation Using Geospatial Variables from 2010 to 2013, Malawi (percentage)

                                              2013
Method                                        Model 1  Model 3  Model 4  Model 6
1) Normal linear regression model             52.7     40.8     50.9     40.3
                                              (1.3)    (1.4)    (1.3)    (1.3)
2) Empirical distribution of the error terms  52.9     40.8     51.7     40.9
                                              (1.4)    (1.4)    (1.3)    (1.3)
Control variables
Distances to facilities                       Y        -        Y        -
Agricultural soil quality index               -        Y        -        Y
Utilities: water, fuel, gas, electricity      -        -        Y        Y
Household assets & house characteristics      Y        Y        -        -
Demographics & employment                     Y        Y        Y        Y
R2                                            0.69     0.68     0.60     0.59
ρ(y, yh)                                      0.61     0.64     0.51     0.54
N                                             4,000    4,000    4,000    4,000
True poverty rate                             37.9 (1.7)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.6. True poverty rate is the estimate directly obtained from the survey data.

Table 7.
Predicted Poverty Rates Based on Imputation Using Geospatial Variables from 2010/11 to 2012/13, Nigeria (percentage)

                                              2012/13
Method                                        Model 1  Model 3  Model 4  Model 6
1) Normal linear regression model             29.4*    29.1*    30.8     31.0
                                              (1.2)    (1.1)    (1.1)    (1.1)
2) Empirical distribution of the error terms  29.3*    29.1*    30.8     31.0
                                              (1.2)    (1.2)    (1.2)    (1.1)
Control variables
Distances to facilities                       Y        -        Y        -
Agricultural soil quality index               -        Y        -        Y
Utilities: electricity, fuel, water, garbage  -        -        Y        Y
Household assets & house characteristics      Y        Y        -        -
Demographics & employment                     Y        Y        Y        Y
R2                                            0.56     0.56     0.46     0.46
ρ(y, yh)                                      0.55     0.56     0.44     0.44
N                                             4,406    4,406    4,406    4,406
True poverty rate                             28.7 (1.2)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.7. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in the post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table 8.
Predicted Poverty Rates Based on Imputation Using Geospatial Variables from 2010/11 to 2012/13, Tanzania (percentage)

                                              2012/13
Method                                        Model 1  Model 3  Model 4  Model 6
1) Normal linear regression model             17.3     17.4     21.1*    21.3*
                                              (0.9)    (0.9)    (1.0)    (1.0)
2) Empirical distribution of the error terms  17.1     17.2     21.1*    21.3*
                                              (0.9)    (0.9)    (1.0)    (1.0)
Control variables
Distances to facilities                       Y        -        Y        -
Agricultural soil quality index               -        Y        -        Y
Utilities: water, kerosene, lighting          -        -        Y        Y
Household assets & house characteristics      Y        Y        -        -
Demographics & employment                     Y        Y        Y        Y
R2                                            0.59     0.59     0.50     0.49
ρ(y, yh)                                      0.57     0.58     0.50     0.48
N                                             4,837    4,837    4,837    4,837
True poverty rate                             20.9 (1.0)

Note: Estimates shown in boldface or with a "*" respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. The underlying regression results are provided in Appendix B, Table B.8. True poverty rate is the estimate directly obtained from the survey data.

Table 9.
Predicted Poverty Rates Based on Imputation Using More Disaggregated Food Item Consumption from 2012 to 2014, Vietnam (percentage)

Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8
1) Normal linear regression model             13.0*    13.4*    13.1*    13.2*    13.1*    12.6     13.2*    13.6*
                                              (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)
2) Empirical distribution of the error terms  12.9*    13.1*    13.0*    12.9*    13.0*    12.5     13.1*    13.5*
                                              (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)
Control variables
Rice expenditures                             Y        -        -        -        -        -        -        -
Meat expenditures                             -        Y        -        -        -        -        -        -
Seafood expenditures                          -        -        Y        -        -        -        -        -
Vegetable & fruit expenditures                -        -        -        Y        -        -        -        -
Lard & cooking oil expenditures               -        -        -        -        Y        -        -        -
Milk products expenditures                    -        -        -        -        -        Y        -        -
Drink expenditures                            -        -        -        -        -        -        Y        -
Food-away-from-home expenditures              -        -        -        -        -        -        -        Y
Household assets & house characteristics      Y        Y        Y        Y        Y        Y        Y        Y
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.68     0.71     0.69     0.71     0.69     0.70     0.70     0.71
N                                             9300     9300     9300     9300     9300     9300     9300     9300
True poverty rate                             13.2 (0.4)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table 10.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption, Vietnam 2014 (percentage)

Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8
1) Normal linear regression model             13.0*    13.0*    13.0*    13.0*    13.0*    13.0*    13.0*    13.3*
                                              (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)
2) Empirical distribution of the error terms  12.9*    12.9*    12.9*    12.9*    12.9*    12.9*    12.9*    13.1*
                                              (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)    (0.4)
Control variables
Had rice expenditures                         Y        -        -        -        -        -        -        -
Had meat expenditures                         -        Y        -        -        -        -        -        -
Had seafood expenditures                      -        -        Y        -        -        -        -        -
Had vegetable & fruit expenditures            -        -        -        Y        -        -        -        -
Had lard & cooking oil expenditures           -        -        -        -        Y        -        -        -
Had milk products expenditures                -        -        -        -        -        Y        -        -
Had drink expenditures                        -        -        -        -        -        -        Y        -
Had food-away-from-home expenditures          -        -        -        -        -        -        -        Y
Household assets & house characteristics      Y        Y        Y        Y        Y        Y        Y        Y
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.68     0.71     0.69     0.71     0.69     0.70     0.70     0.71
N                                             9300     9300     9300     9300     9300     9300     9300     9300
True poverty rate                             13.2 (0.4)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table 11.
Predicted Poverty Rates Based on Imputation Using Variables from Commune Survey from 2012 to 2014, Vietnam (percentage)

Method                                        Model 1  Model 2  Model 3
1) Normal linear regression model             22.3     18.0*    17.8*
                                              (0.6)    (0.6)    (0.6)
2) Empirical distribution of the error terms  22.0     17.9*    17.5*
                                              (0.6)    (0.6)    (0.6)
Control variables
Demographics & employment                     Y        Y        Y
Household assets & house characteristics      -        Y        -
Electricity, water, & garbage expenditures    -        -        Y
Commune topography & poverty status           Y        Y        Y
N                                             6494     6494     6494
True poverty rate                             18.1 (0.6)

Note: Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table 12. Predicted Poverty Rates Based on Imputation Using Population Characteristics from 2012 to 2014, Vietnam (percentage)

Method                                        Model 1  Model 2  Model 3
1) Normal linear regression model             16.5     13.3*    13.1*
                                              (0.5)    (0.4)    (0.4)
2) Empirical distribution of the error terms  16.1     13.1*    12.8*
                                              (0.5)    (0.6)    (0.6)
Control variables
Demographics & employment                     Y        Y        Y
Household assets & house characteristics      -        Y        -
Electricity, water, & garbage expenditures    -        -        Y
Census characteristics on education,
ethnicity, household assets, and house
quality averaged at commune level             Y        Y        Y
R2                                            0.51     0.70     0.58
N                                             9241     9241     9241
True poverty rate                             13.2 (0.4)

Note: Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms.
Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Census commune-averaged characteristics include the share of the population with college/university education, the share of the population that belongs to ethnic majority groups, the average household's asset index and living areas, and the share of houses with high-quality cooking fuel sources, drinking water sources, and toilet facilities.

Table 13. Meta-analysis of Imputation Models and Their Parameters, Logit Regressions

Figure 1. Predicted Poverty Rates for Urban vs. Rural Areas (Using Normal Linear Regression Models)

Figure 2. Predicted Poverty Rates Based on Within-Year Imputation
[Panels: Vietnam, Ethiopia, Malawi, Nigeria, Tanzania. Vertical axis: poverty rate (%), 10 to 50. Horizontal axis ticks: 1 to 9 (Ethiopia: 1 to 4 and 7 to 9). Legend: predicted poverty, true poverty, 95% CIs.]
Note: Calculations are based on data for 2018/19 for Ethiopia, 2016/17 for Malawi, 2012/13 for Nigeria, and 2010/11 for Tanzania. Estimates are obtained by imputing from sample 1 into sample 2.

Figure 3. Imputation Accuracy for Different Imputation Models

Appendix A: Overview of (i) Key Poverty Imputation Studies and (ii) Poverty Predictors in Core Imputation Models

Table A.1. Overview of Key Poverty Imputation Studies (with Validation) since the 2000s

Table A.2. List of variables that are used in the core imputation models, by country

Appendix B: Additional Tables for the Main Analysis

Table B.2. Household consumption model, Malawi 2010

Table B.3. Household consumption model, Nigeria 2010/11

Table B.4. Household consumption model, Tanzania 2010/11

Table B.5.
Household consumption model using geospatial variables, Vietnam 2014

Table B.6. Household consumption model using geospatial variables, Malawi 2010

Table B.7. Household consumption model using geospatial variables, Nigeria 2010/11

Table B.8. Household consumption model using geospatial variables, Tanzania 2010/11

Table B.9. Household consumption model, Tanzania 2010/11-2012/13

Table B.10. Household Consumption Model Using Geospatial Variables, Tanzania 2010/11-2012/13

Table B.11. Predicted Poverty Rates Based on Imputation from 2010 to 2012, Vietnam (percentage)

                                              2012
Method                                        Model 1  Model 2  Model 3  Model 4  Model 5  Model 6  Model 7  Model 8  Model 9
1) Normal linear regression model             20.7     17.1*    18.0     10.8     19.4     17.4     17.1*    16.4*    16.8*
                                              (0.5)    (0.5)    (0.5)    (0.4)    (0.5)    (0.5)    (0.5)    (0.5)    (0.5)
2) Empirical distribution of the error terms  20.4     17.1*    18.0     10.8     19.3     17.4     17.0*    16.3*    16.3*
                                              (0.5)    (0.5)    (0.5)    (0.4)    (0.5)    (0.5)    (0.5)    (0.5)    (0.5)
Control variables
Food expenditures                             -        -        Y        -        -        -        -        -        -
Non-food expenditures                         -        -        -        Y        -        -        -        -        -
Durables expenditures                         -        -        -        -        Y        -        -        -        -
Health expenditures                           -        -        -        -        -        Y        -        -        -
Education expenditures                        -        -        -        -        -        -        Y        -        -
Electricity, water, & garbage expenditures    -        -        -        -        -        -        -        Y        Y
Household assets & house characteristics      -        Y        Y        Y        Y        Y        Y        Y        -
Demographics & employment                     Y        Y        Y        Y        Y        Y        Y        Y        Y
R2                                            0.47     0.69     0.87     0.94     0.74     0.71     0.70     0.71     0.57
ρ(y, yh)                                      0.45     0.68     0.87     0.92     0.73     0.71     0.69     0.70     0.56
N                                             9261     9261     9261     9261     9261     9261     9261     9261     9261
True poverty rate                             16.6 (0.5)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects.
Imputed poverty rates for 2012 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table B.12. Predicted Poverty Rates Based on Imputation from 2012 to 2014, Vietnam (percentage)

| Method (2014) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 16.2 (0.5) | 13.0* (0.4) | 13.6* (0.4) | 9.8 (0.4) | 13.1* (0.4) | 12.4 (0.4) | 12.9* (0.4) | 12.3 (0.4) | 12.7* (0.4) |
| 2) Empirical distribution of the error terms | 16.0 (0.5) | 12.9* (0.4) | 13.5* (0.4) | 9.7 (0.4) | 13.0* (0.4) | 12.2 (0.4) | 12.8* (0.4) | 12.2 (0.4) | 12.2 (0.4) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Durables expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Electricity, water, & garbage expenditures | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.45 | 0.68 | 0.87 | 0.92 | 0.72 | 0.70 | 0.69 | 0.69 | 0.54 |
| ρ(y, yh) | 0.44 | 0.67 | 0.86 | 0.93 | 0.72 | 0.69 | 0.68 | 0.69 | 0.55 |
| N | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 |
| True poverty rate | 13.2 (0.4) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table B.13.
Predicted Poverty Rates Based on Imputation, from 2010/11 to 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 62.7 (0.7) | 61.2 (0.8) | 52.3* (0.9) | 55.6 (0.8) | 60.6 (0.8) | 60.3 (0.8) | 60.7 (0.8) | 54.2 (0.8) | 51.8* (0.8) |
| 2) Empirical distribution of the error terms | 62.9 (0.7) | 61.2 (0.8) | 52.6 (0.9) | 55.9 (0.8) | 60.6 (0.8) | 60.2 (0.8) | 60.7 (0.8) | 54.3 (0.8) | 52.3* (0.7) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.46 | 0.62 | 0.93 | 0.85 | 0.66 | 0.63 | 0.62 | 0.65 | 0.55 |
| ρ(y, yh) | 0.49 | 0.63 | 0.91 | 0.83 | 0.66 | 0.65 | 0.62 | 0.65 | 0.56 |
| N | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 |
| True poverty rate | 51.5 (0.9) | | | | | | | | |

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2016/17 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table B.14.
Predicted Poverty Rates Based on Imputation, from 2008/09 to 2010/11, Tanzania (percentage)

| Method (2010/11) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 17.7* (0.9) | 14.5 (0.9) | 18.9* (1.0) | 15.5 (1.0) | 14.2 (0.9) | 14.8 (0.9) | 14.5 (0.9) | 16.2 (1.0) | 18.9* (1.0) |
| 2) Empirical distribution of the error terms | 17.3* (0.9) | 14.0 (0.9) | 18.8* (1.1) | 15.1 (1.0) | 13.8 (0.9) | 14.4 (0.9) | 14.0 (0.9) | 15.7 (1.0) | 18.6* (1.0) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.43 | 0.56 | 0.92 | 0.76 | 0.58 | 0.58 | 0.56 | 0.59 | 0.50 |
| ρ(y, yh) | 0.43 | 0.57 | 0.92 | 0.75 | 0.57 | 0.59 | 0.55 | 0.59 | 0.51 |
| N | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 |
| True poverty rate | 18.0 (1.1) | | | | | | | | |

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2010/11 use the estimated parameters based on the 2008/09 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table B.15.
Predicted Poverty Rates Based on Within-Year Imputation in 2018/19, Ethiopia (percentage)

| Method (2018/19) | Model 1 | Model 2 | Model 3 | Model 4 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 43.9 (1.6) | 45.1 (1.6) | 42.0* (2.3) | 45.0 (1.7) | 45.1 (1.6) | 45.0 (1.7) | 44.3 (1.6) |
| 2) Empirical distribution of the error terms | 44.6 (1.6) | 46.1 (1.6) | 42.0* (2.3) | 46.5 (1.7) | 46.1 (1.6) | 46.1 (1.7) | 45.1 (1.6) |
| Control variables | | | | | | | |
| Food expenditures | | | Y | | | | |
| Non-food expenditures | | | | Y | | | |
| Education expenditures | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.42 | 0.49 | 0.95 | 0.60 | 0.49 | 0.51 | 0.47 |
| ρ(y, yh) | 0.42 | 0.48 | 0.96 | 0.60 | 0.49 | 0.50 | 0.45 |
| N | 3,368 | 3,368 | 3,368 | 3,368 | 3,368 | 3,368 | 3,368 |
| True poverty rate | 40.8 (2.4) | | | | | | |

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. The estimation sample is generated by splitting the data into two random samples. The imputed poverty rate for sample 2 uses the estimated parameters based on sample 1. 100 simulations are implemented. True poverty rate is the estimate directly obtained from sample 2.

Table B.16.
Predicted Poverty Rates Based on Within-Year Imputation in 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 50.4 (1.0) | 51.3* (1.0) | 51.4* (1.0) | 51.3* (1.0) | 51.3* (1.0) | 50.9* (1.0) | 51.3* (1.0) | 51.1* (1.0) | 50.1 (1.0) |
| 2) Empirical distribution of the error terms | 50.7* (1.0) | 51.3* (1.0) | 51.7* (1.0) | 51.4* (1.1) | 51.4* (1.0) | 50.9* (1.0) | 51.3* (1.0) | 51.1* (1.0) | 50.5* (1.0) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.53 | 0.64 | 0.92 | 0.84 | 0.70 | 0.67 | 0.65 | 0.67 | 0.58 |
| ρ(y, yh) | 0.54 | 0.65 | 0.91 | 0.84 | 0.70 | 0.66 | 0.64 | 0.67 | 0.59 |
| N | 6,223 | 6,223 | 6,223 | 6,223 | 6,223 | 6,223 | 6,223 | 6,223 | 6,223 |
| True poverty rate | 51.6 (1.1) | | | | | | | | |

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. The estimation sample is generated by splitting the data into two random samples. The imputed poverty rate for sample 2 uses the estimated parameters based on sample 1. 100 simulations are implemented. True poverty rate is the estimate directly obtained from sample 2.

Table B.17.
Predicted Poverty Rates Based on Within-Year Imputation in 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 29.0* (1.4) | 27.7 (1.4) | 29.6* (1.5) | 27.4 (1.4) | 27.6 (1.4) | 27.5 (1.4) | 27.8* (1.4) | 27.6 (1.4) | 28.7* (1.4) |
| 2) Empirical distribution of the error terms | 29.2* (1.4) | 28.0* (1.4) | 29.9* (1.5) | 27.4 (1.4) | 27.9* (1.4) | 27.7 (1.4) | 28.1* (1.5) | 27.9* (1.4) | 28.9* (1.4) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Infrequent non-food expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: electricity, fuel, water, garbage | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.43 | 0.56 | 0.95 | 0.74 | 0.57 | 0.58 | 0.57 | 0.57 | 0.45 |
| ρ(y, yh) | 0.43 | 0.54 | 0.92 | 0.72 | 0.56 | 0.58 | 0.56 | 0.58 | 0.43 |
| N | 2,197 | 2,197 | 2,197 | 2,197 | 2,197 | 2,197 | 2,197 | 2,197 | 2,197 |
| True poverty rate | 29.3 (1.5) | | | | | | | | |

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. The estimation sample is generated by splitting the data into two random samples. The imputed poverty rate for sample 2 uses the estimated parameters based on sample 1. 100 simulations are implemented. True poverty rate is the estimate directly obtained from sample 2. Consumption expenditures are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table B.18.
Predicted Poverty Rates Based on Within-Year Imputation in 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 20.0* (1.2) | 21.0* (1.3) | 20.0* (1.3) | 20.8* (1.3) | 20.5* (1.3) | 20.9* (1.3) | 21.0* (1.3) | 21.2 (1.3) | 20.7* (1.3) |
| 2) Empirical distribution of the error terms | 19.5* (1.2) | 20.8* (1.3) | 20.0* (1.3) | 20.7* (1.3) | 20.3* (1.3) | 20.8* (1.3) | 20.8* (1.3) | 21.0* (1.4) | 20.3* (1.3) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.45 | 0.59 | 0.93 | 0.77 | 0.60 | 0.60 | 0.59 | 0.59 | 0.47 |
| ρ(y, yh) | 0.41 | 0.61 | 0.93 | 0.77 | 0.59 | 0.59 | 0.57 | 0.57 | 0.45 |
| N | 2,430 | 2,430 | 2,430 | 2,430 | 2,430 | 2,430 | 2,430 | 2,430 | 2,430 |
| True poverty rate | 19.9 (1.2) | | | | | | | | |

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. The estimation sample is generated by splitting the data into two random samples. The imputed poverty rate for sample 2 uses the estimated parameters based on sample 1. 100 simulations are implemented. True poverty rate is the estimate directly obtained from sample 2.

Table B.19.
Predicted Poverty Rates Based on Within-Year Imputation in 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 10.2 (0.5) | 10.3 (0.5) | 10.1 (0.6) | 9.0* (0.5) | 10.2 (0.5) | 10.2 (0.5) | 10.3 (0.5) | 9.9 (0.5) | 9.1* (0.5) |
| 2) Empirical distribution of the error terms | 9.8 (0.5) | 10.0 (0.5) | 10.0 (0.5) | 8.8* (0.5) | 9.9 (0.5) | 9.9 (0.5) | 10.0 (0.5) | 9.6 (0.5) | 8.5* (0.5) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Durables expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Electricity, water, & garbage expenditures | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.47 | 0.69 | 0.87 | 0.95 | 0.74 | 0.71 | 0.70 | 0.71 | 0.59 |
| ρ(y, yh) | 0.47 | 0.70 | 0.87 | 0.94 | 0.75 | 0.71 | 0.70 | 0.71 | 0.57 |
| N | 4,679 | 4,679 | 4,679 | 4,679 | 4,679 | 4,679 | 4,679 | 4,679 | 4,679 |
| True poverty rate | 9.0 (0.5) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. The estimation sample is generated by splitting the data into two random samples. The imputed poverty rate for sample 2 uses the estimated parameters based on sample 1. 100 simulations are implemented. True poverty rate is the estimate directly obtained from sample 2.

Table B.20.
Predicted Poverty Rates Based on Imputation Using More Disaggregated Food Item Consumption, Vietnam 2014 (percentage)

| Method | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 12.8* (0.4) | 13.3* (0.4) | 12.9* (0.4) | 13.0* (0.4) | 12.9* (0.4) | 12.3 (0.4) | 13.2* (0.4) | 13.3* (0.4) |
| 2) Empirical distribution of the error terms | 12.4 (0.4) | 12.7* (0.4) | 12.5 (0.4) | 12.5 (0.4) | 12.5 (0.4) | 11.9 (0.4) | 12.8* (0.4) | 12.9* (0.4) |
| Control variables | | | | | | | | |
| Rice expenditures | Y | | | | | | | |
| Meat expenditures | | Y | | | | | | |
| Seafood expenditures | | | Y | | | | | |
| Vegetable & fruit expenditures | | | | Y | | | | |
| Lard & cooking oil expenditures | | | | | Y | | | |
| Milk products expenditures | | | | | | Y | | |
| Drink expenditures | | | | | | | Y | |
| Food-away-from-home expenditures | | | | | | | | Y |
| Electricity, water, & garbage expenditures | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| Adjusted R2 | 0.54 | 0.61 | 0.56 | 0.60 | 0.55 | 0.60 | 0.60 | 0.59 |
| N | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 | 9300 |
| True poverty rate | 13.2 (0.4) | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table B.21. Meta-analysis of Imputation Models and Their Parameters, Marginal Effects from Logit Regressions
Table B.22. Meta-analysis of Imputation Models and Their Parameters, Ordered Logit Regressions
Table B.23. Meta-analysis of Imputation Models and Their Parameters, Logit Regressions with More Parsimonious Models
Table B.24.
Meta-analysis of Imputation Models and Their Parameters, Logit Regressions with Dummy Variables for Food Consumption

Figure B.1. Predicted Poverty Rates for Urban vs. Rural Areas (Using Empirical Distribution of the Error Terms)
Figure B.2. Relationship between Model Goodness-of-fit Statistics and Model Numbers

Appendix C: Additional Tables for Estimates by Urban/Rural Areas

Table C.1. Predicted Poverty Rates Based on Imputation for Urban Areas from 2012 to 2014, Vietnam (percentage)

| Method (2014) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 5.2 (0.5) | 3.8* (0.5) | 3.9* (0.5) | 2.5 (0.4) | 4.0* (0.5) | 3.6* (0.5) | 3.8* (0.5) | 3.4* (0.4) | 3.5* (0.4) |
| 2) Empirical distribution of the error terms | 4.8 (0.5) | 3.7* (0.5) | 3.8* (0.5) | 2.4 (0.4) | 3.8* (0.5) | 3.5* (0.5) | 3.7* (0.5) | 3.2 (0.4) | 2.8 (0.4) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Durables expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Electricity, water, & garbage expenditures | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.36 | 0.66 | 0.82 | 0.93 | 0.70 | 0.67 | 0.66 | 0.67 | 0.51 |
| ρ(y, yh) | 0.31 | 0.63 | 0.82 | 0.93 | 0.69 | 0.65 | 0.64 | 0.66 | 0.49 |
| N | 2,774 | 2,774 | 2,774 | 2,774 | 2,774 | 2,774 | 2,774 | 2,774 | 2,774 |
| True poverty rate | 3.7 (0.4) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented.
True poverty rate is the estimate directly obtained from the survey data.

Table C.2. Predicted Poverty Rates Based on Imputation for Rural Areas from 2012 to 2014, Vietnam (percentage)

| Method (2014) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 22.1 (0.6) | 17.8* (0.6) | 18.6* (0.6) | 13.6 (0.5) | 17.8* (0.6) | 16.8 (0.6) | 17.6* (0.6) | 16.9 (0.6) | 17.8* (0.6) |
| 2) Empirical distribution of the error terms | 21.8 (0.6) | 17.6* (0.6) | 18.4* (0.6) | 13.5 (0.5) | 17.6* (0.6) | 16.7 (0.6) | 17.5* (0.6) | 16.8 (0.6) | 17.3 (0.6) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Durables expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Electricity, water, & garbage expenditures | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.37 | 0.62 | 0.86 | 0.90 | 0.67 | 0.64 | 0.63 | 0.63 | 0.47 |
| ρ(y, yh) | 0.37 | 0.63 | 0.86 | 0.91 | 0.68 | 0.64 | 0.62 | 0.63 | 0.49 |
| N | 6,526 | 6,526 | 6,526 | 6,526 | 6,526 | 6,526 | 6,526 | 6,526 | 6,526 |
| True poverty rate | 18.1 (0.6) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2014 use the estimated parameters based on the 2012 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.3.
Predicted Poverty Rates Based on Imputation for Urban Areas from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 4.0 (0.5) | 3.9 (0.5) | 1.3* (0.3) | 1.6* (0.3) | 2.4 (0.4) | 3.5 (0.4) | 3.9 (0.5) | 2.6 (0.4) | 1.8* (0.3) |
| 2) Empirical distribution of the error terms | 3.8 (0.4) | 3.7 (0.5) | 1.2 (0.3) | 1.6* (0.3) | 2.3 (0.4) | 3.4 (0.4) | 3.7 (0.5) | 2.5 (0.4) | 1.5* (0.3) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Durables expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Electricity, water, & garbage expenditures | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.35 | 0.65 | 0.83 | 0.95 | 0.69 | 0.67 | 0.66 | 0.68 | 0.54 |
| ρ(y, yh) | 0.34 | 0.63 | 0.84 | 0.95 | 0.69 | 0.65 | 0.64 | 0.67 | 0.53 |
| N | 2826 | 2826 | 2826 | 2826 | 2826 | 2826 | 2826 | 2826 | 2826 |
| True poverty rate | 1.6 (0.3) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.4.
Predicted Poverty Rates Based on Imputation for Rural Areas from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 20.1 (0.6) | 17.7 (0.7) | 8.2 (0.5) | 11.7 (0.6) | 14.3 (0.6) | 16.6 (0.6) | 17.7 (0.7) | 15.7 (0.6) | 13.6* (0.6) |
| 2) Empirical distribution of the error terms | 19.9 (0.6) | 17.6 (0.7) | 8.0 (0.5) | 11.6 (0.6) | 14.2 (0.6) | 16.5 (0.6) | 17.5 (0.7) | 15.5 (0.6) | 13.3* (0.6) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Durables expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Electricity, water, & garbage expenditures | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.35 | 0.65 | 0.83 | 0.95 | 0.69 | 0.67 | 0.66 | 0.68 | 0.54 |
| ρ(y, yh) | 0.42 | 0.65 | 0.85 | 0.92 | 0.71 | 0.69 | 0.65 | 0.67 | 0.54 |
| N | 6521 | 6521 | 6521 | 6521 | 6521 | 6521 | 6521 | 6521 | 6521 |
| True poverty rate | 13.3 (0.6) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.5.
Predicted Poverty Rates Based on Imputation for Urban Areas, from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 13.5 (1.9) | 15.6 (2.4) | 20.3 (3.2) | 23.1* (3.9) | 16.8 (2.6) | 15.7 (2.5) | 15.7 (2.3) | 16.9 (2.8) | 15.4 (2.3) |
| 2) Empirical distribution of the error terms | 13.8 (2.0) | 15.9 (2.5) | 20.6 (3.4) | 23.5* (4.0) | 17.3 (2.7) | 15.9 (2.6) | 16.1 (2.5) | 17.1 (2.8) | 15.3 (2.4) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.54 | 0.76 | 0.92 | 0.94 | 0.78 | 0.77 | 0.77 | 0.78 | 0.61 |
| ρ(y, yh) | 0.47 | 0.71 | 0.93 | 0.94 | 0.77 | 0.72 | 0.75 | 0.74 | 0.58 |
| N | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 |
| True poverty rate | 25.2 (4.2) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.6.
Predicted Poverty Rates Based on Imputation for Rural Areas, from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 44.3 (1.4) | 45.6 (1.5) | 38.5* (1.6) | 43.9 (1.6) | 43.7 (1.5) | 45.7 (1.5) | 45.3 (1.5) | 46.5 (1.5) | 45.5 (1.4) |
| 2) Empirical distribution of the error terms | 44.3 (1.4) | 45.5 (1.5) | 38.7 (1.6) | 44.1 (1.6) | 43.7 (1.5) | 45.7 (1.5) | 45.2 (1.5) | 46.5 (1.5) | 45.8 (1.4) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.36 | 0.54 | 0.91 | 0.79 | 0.58 | 0.55 | 0.54 | 0.58 | 0.44 |
| ρ(y, yh) | 0.37 | 0.53 | 0.91 | 0.80 | 0.58 | 0.54 | 0.53 | 0.58 | 0.43 |
| N | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 |
| True poverty rate | 40.3 (1.8) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.7.
Predicted Poverty Rates Based on Imputation for Urban Areas, from 2010/11 to 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 31.0 (1.8) | 33.3 (2.1) | 21.7 (1.8) | 21.2 (1.9) | 32.1 (2.1) | 31.3 (2.1) | 32.8 (2.1) | 25.7 (1.9) | 19.6 (1.5) |
| 2) Empirical distribution of the error terms | 31.5 (1.8) | 33.5 (2.2) | 21.6 (1.8) | 21.1 (1.9) | 32.4 (2.1) | 31.6 (2.1) | 33.0 (2.1) | 26.0 (1.9) | 19.7 (1.5) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.52 | 0.75 | 0.93 | 0.93 | 0.78 | 0.76 | 0.75 | 0.78 | 0.63 |
| ρ(y, yh) | 0.50 | 0.74 | 0.90 | 0.91 | 0.77 | 0.74 | 0.74 | 0.76 | 0.62 |
| N | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 |
| True poverty rate | 17.7 (1.8) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2016/17 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.8.
Predicted Poverty Rates Based on Imputation for Rural Areas, from 2010/11 to 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 70.4 (0.6) | 67.6 (0.7) | 59.6* (0.9) | 63.3 (0.8) | 67.1 (0.7) | 67.0 (0.7) | 67.1 (0.7) | 60.8 (0.8) | 59.9* (0.7) |
| 2) Empirical distribution of the error terms | 70.6 (0.6) | 67.6 (0.7) | 59.9* (0.9) | 63.7 (0.8) | 67.2 (0.7) | 66.9 (0.7) | 67.0 (0.7) | 60.8 (0.8) | 60.2* (0.7) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.34 | 0.50 | 0.92 | 0.79 | 0.55 | 0.52 | 0.50 | 0.54 | 0.43 |
| ρ(y, yh) | 0.35 | 0.48 | 0.89 | 0.74 | 0.56 | 0.52 | 0.50 | 0.53 | 0.43 |
| N | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 |
| True poverty rate | 59.4 (1.0) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2016/17 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.9.
Predicted Poverty Rates Based on Imputation for Urban Areas from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 13.8 (0.1) | 13.0 (0.1) | 9.2* (0.1) | 14.8 (0.1) | 12.8 (0.1) | 12.7 (0.1) | 13.7 (0.1) | 13.1 (0.1) | 13.9 (0.1) |
| 2) Empirical distribution of the error terms | 13.2 (0.1) | 12.8 (0.1) | 9.2* (0.1) | 14.4 (0.1) | 12.5 (0.1) | 12.6 (0.1) | 13.5 (0.1) | 12.8 (0.1) | 13.2 (0.1) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Infrequent non-food expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: electricity, fuel, water, garbage | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.40 | 0.56 | 0.89 | 0.75 | 0.58 | 0.57 | 0.57 | 0.56 | 0.42 |
| ρ(y, yh) | 0.33 | 0.52 | 0.89 | 0.75 | 0.54 | 0.54 | 0.55 | 0.53 | 0.39 |
| N | 1,341 | 1,341 | 1,341 | 1,341 | 1,341 | 1,341 | 1,341 | 1,341 | 1,341 |
| True poverty rate | 9.8 (1.2) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in the post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table C.10.
Predicted Poverty Rates Based on Imputation for Rural Areas from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 41.4 (0.1) | 38.6* (0.1) | 37.1 (0.1) | 41.5 (0.1) | 38.8* (0.1) | 38.5* (0.1) | 39.1 (0.1) | 38.5* (0.1) | 41.2* (0.1) |
| 2) Empirical distribution of the error terms | 41.1 (0.1) | 38.2* (0.1) | 37.4 (0.1) | 41.4 (0.1) | 38.4* (0.1) | 38.0 (0.1) | 38.8* (0.1) | 38.1* (0.1) | 41.0* (0.1) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Infrequent non-food expenditures | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: electricity, fuel, water, garbage | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R2 | 0.34 | 0.49 | 0.93 | 0.67 | 0.50 | 0.51 | 0.50 | 0.49 | 0.36 |
| ρ(y, yh) | 0.32 | 0.45 | 0.94 | 0.68 | 0.50 | 0.49 | 0.46 | 0.47 | 0.34 |
| N | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 |
| True poverty rate | 39.6 (1.5) | | | | | | | | |

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in the post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table C.11.
Predicted Poverty Rates Based on Imputation for Urban Areas, from 2008/09 to 2010/11, Tanzania (percentage)

| Method (2010/11) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 6.5 (1.2) | 5.7* (1.1) | 6.2* (1.1) | 6.1* (1.2) | 5.3* (1.1) | 5.8* (1.1) | 5.8* (1.1) | 6.2* (1.2) | 6.9 (1.2) |
| 2) Empirical distribution of the error terms | 6.7 (1.2) | 5.9* (1.1) | 6.1* (1.1) | 6.1* (1.2) | 5.5* (1.1) | 6.0* (1.1) | 5.9* (1.2) | 6.1* (1.2) | 6.7 (1.2) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.46 | 0.62 | 0.91 | 0.85 | 0.63 | 0.63 | 0.62 | 0.64 | 0.55 |
| ρ(y, yh) | 0.44 | 0.60 | 0.90 | 0.83 | 0.60 | 0.61 | 0.57 | 0.64 | 0.54 |
| N | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 |

True poverty rate: 5.3 (0.9)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2010/11 use the estimated parameters based on the 2008/09 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.12.
Predicted Poverty Rates Based on Imputation for Rural Areas, from 2008/09 to 2010/11, Tanzania (percentage)

| Method (2010/11) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 19.9 (1.1) | 15.9 (1.1) | 23.3* (1.3) | 17.5 (1.2) | 15.7 (1.1) | 16.4 (1.1) | 15.8 (1.1) | 17.6 (1.1) | 21.5* (1.2) |
| 2) Empirical distribution of the error terms | 19.2 (1.1) | 15.3 (1.0) | 23.3* (1.3) | 16.8 (1.2) | 15.2 (1.0) | 15.8 (1.1) | 15.3 (1.0) | 17.1 (1.1) | 20.9 (1.2) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.19 | 0.33 | 0.91 | 0.60 | 0.35 | 0.37 | 0.33 | 0.37 | 0.26 |
| ρ(y, yh) | 0.21 | 0.38 | 0.91 | 0.62 | 0.39 | 0.40 | 0.39 | 0.41 | 0.30 |
| N | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 |

True poverty rate: 22.4 (1.4)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2010/11 use the estimated parameters based on the 2008/09 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.13.
Predicted Poverty Rates Based on Imputation for Urban Areas, from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 5.0* (0.9) | 4.7* (0.9) | 4.6* (1.0) | 5.6* (1.0) | 5.0* (0.9) | 4.9* (0.9) | 4.8* (0.9) | 5.1* (0.9) | 5.4* (0.9) |
| 2) Empirical distribution of the error terms | 4.7* (0.8) | 4.7* (0.9) | 4.5 (1.0) | 5.6* (1.0) | 4.8* (0.9) | 4.8* (0.9) | 4.7* (0.9) | 5.0* (0.9) | 5.1* (0.9) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.45 | 0.63 | 0.91 | 0.84 | 0.64 | 0.64 | 0.63 | 0.64 | 0.51 |
| ρ(y, yh) | 0.40 | 0.59 | 0.89 | 0.83 | 0.59 | 0.59 | 0.56 | 0.60 | 0.45 |
| N | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 |

True poverty rate: 5.6 (1.0)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table C.14.
Predicted Poverty Rates Based on Imputation for Rural Areas, from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 22.8 (1.1) | 21.4 (1.1) | 23.7 (1.3) | 25.6* (1.3) | 21.6 (1.1) | 21.7 (1.2) | 21.5 (1.1) | 23.8 (1.2) | 26.3* (1.2) |
| 2) Empirical distribution of the error terms | 23.1 (1.1) | 21.8 (1.2) | 23.6 (1.3) | 25.6* (1.3) | 22.0 (1.1) | 22.0 (1.2) | 21.9 (1.2) | 24.1 (1.2) | 26.6* (1.2) |
| Control variables | | | | | | | | | |
| Food expenditures | | | Y | | | | | | |
| Non-food expenditures | | | | Y | | | | | |
| Furnishings and household expenses | | | | | Y | | | | |
| Health expenditures | | | | | | Y | | | |
| Education expenditures | | | | | | | Y | | |
| Utilities: water, kerosene, lighting | | | | | | | | Y | Y |
| Household assets & house characteristics | | Y | Y | Y | Y | Y | Y | Y | |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.28 | 0.45 | 0.91 | 0.63 | 0.46 | 0.46 | 0.45 | 0.46 | 0.32 |
| ρ(y, yh) | 0.25 | 0.41 | 0.92 | 0.63 | 0.42 | 0.44 | 0.43 | 0.43 | 0.30 |
| N | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 |

True poverty rate: 26.3 (1.3)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Appendix D: Additional Tables for Estimates with Dummy Variables for More Disaggregated Food Item Consumption

Table D.1.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 13.3 (0.5) | 13.3 (0.5) | 13.3 (0.5) | 13.3 (0.5) | 13.3 (0.5) | 13.3 (0.5) | 13.3 (0.5) | 13.3 (0.5) |
| 2) Empirical distribution of the error terms | 13.2 (0.5) | 13.2 (0.5) | 13.2 (0.5) | 13.2 (0.5) | 13.2 (0.5) | 13.2 (0.5) | 13.2 (0.5) | 13.1 (0.5) |
| Control variables | | | | | | | | |
| Had rice expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had seafood expenditures | | | Y | | | | | |
| Had vegetable & fruit expenditures | | | | Y | | | | |
| Had lard & cooking oil expenditures | | | | | Y | | | |
| Had milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food-away-from-home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | 0.70 |
| ρ(y, yh) | 0.69 | 0.68 | 0.69 | 0.69 | 0.69 | 0.69 | 0.69 | 0.70 |
| N | 9,347 | 9,347 | 9,347 | 9,347 | 9,347 | 9,347 | 9,347 | 9,347 |

True poverty rate: 9.6 (0.4)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.2.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 40.9 (1.4) | 40.6 (1.4) | 41.5 (1.4) | 40.8 (1.4) | 38.9* (1.4) | 41.3 (1.4) | 38.6* (1.4) | 40.7 (1.4) |
| 2) Empirical distribution of the error terms | 41.0 (1.4) | 40.7 (1.5) | 41.6 (1.4) | 41.0 (1.4) | 39.1* (1.4) | 41.3 (1.4) | 38.7* (1.5) | 40.8 (1.4) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.68 | 0.73 | 0.70 | 0.68 | 0.70 | 0.70 | 0.72 | 0.69 |
| ρ(y, yh) | 0.64 | 0.70 | 0.66 | 0.66 | 0.65 | 0.66 | 0.68 | 0.65 |
| N | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 |

True poverty rate: 37.9 (1.7)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.3.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption from 2010/11 to 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 61.3 (0.8) | 64.0 (0.8) | 61.5 (0.8) | 61.3 (0.8) | 59.2 (0.8) | 61.8 (0.8) | 63.5 (0.8) | 61.4 (0.8) |
| 2) Empirical distribution of the error terms | 61.3 (0.8) | 64.0 (0.8) | 61.5 (0.8) | 61.2 (0.8) | 59.3 (0.8) | 61.7 (0.8) | 63.5 (0.8) | 61.3 (0.8) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.62 | 0.68 | 0.63 | 0.62 | 0.65 | 0.64 | 0.67 | 0.62 |
| ρ(y, yh) | 0.62 | 0.68 | 0.64 | 0.62 | 0.65 | 0.66 | 0.68 | 0.63 |
| N | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 | 12,446 |

True poverty rate: 51.5 (0.9)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2016/17 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.4.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 28.1* (1.1) | 28.3* (1.1) | 28.1* (1.1) | 28.2* (1.1) | 28.0* (1.1) | 27.9* (1.1) | 28.3* (1.1) | 29.2* (1.2) |
| 2) Empirical distribution of the error terms | 27.7* (1.1) | 27.9* (1.1) | 27.7* (1.1) | 27.7* (1.1) | 27.5* (1.1) | 27.5* (1.1) | 27.9* (1.1) | 28.8* (1.2) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.57 | 0.58 | 0.57 | 0.57 | 0.57 | 0.58 | 0.58 | 0.58 |
| ρ(y, yh) | 0.53 | 0.55 | 0.56 | 0.54 | 0.54 | 0.56 | 0.57 | 0.55 |
| N | 4,405 | 4,405 | 4,405 | 4,405 | 4,405 | 4,405 | 4,405 | 4,405 |

True poverty rate: 28.7 (1.2)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table D.5.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption from 2008/09 to 2010/11, Tanzania (percentage)

| Method (2010/11) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 14.4 (0.9) | 13.9 (0.9) | 14.4 (0.9) | 14.4 (0.9) | 14.0 (0.9) | 14.0 (0.9) | 15.1 (0.9) | 13.6 (0.9) |
| 2) Empirical distribution of the error terms | 13.8 (0.9) | 13.5 (0.9) | 13.8 (0.9) | 13.8 (0.9) | 13.6 (0.9) | 13.6 (0.9) | 14.6 (0.9) | 13.1 (0.9) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.56 | 0.61 | 0.56 | 0.56 | 0.57 | 0.59 | 0.58 | 0.59 |
| ρ(y, yh) | 0.56 | 0.59 | 0.56 | 0.56 | 0.57 | 0.57 | 0.57 | 0.58 |
| N | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 | 3,823 |

True poverty rate: 18.0 (1.1)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2010/11 use the estimated parameters based on the 2008/09 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.6.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 17.4 (0.9) | 18.2 (0.9) | 17.5 (0.9) | 17.3 (0.9) | 17.5 (0.9) | 17.4 (0.9) | 18.3 (1.0) | 17.4 (0.9) |
| 2) Empirical distribution of the error terms | 17.0 (0.9) | 17.8 (0.9) | 17.1 (0.9) | 16.9 (0.9) | 17.1 (0.9) | 17.0 (0.9) | 18.0 (1.0) | 16.9 (0.9) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.59 | 0.62 | 0.59 | 0.59 | 0.59 | 0.60 | 0.61 | 0.62 |
| ρ(y, yh) | 0.56 | 0.62 | 0.59 | 0.58 | 0.58 | 0.58 | 0.57 | 0.60 |
| N | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 |

True poverty rate: 20.8 (1.0)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.7.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Urban Areas from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 3.9 (0.5) | 3.9 (0.5) | 3.9 (0.5) | 3.9 (0.5) | 3.9 (0.5) | 3.9 (0.5) | 3.9 (0.5) | 3.8 (0.5) |
| 2) Empirical distribution of the error terms | 3.7 (0.5) | 3.7 (0.5) | 3.7 (0.5) | 3.7 (0.5) | 3.7 (0.5) | 3.7 (0.5) | 3.7 (0.5) | 3.7 (0.5) |
| Control variables | | | | | | | | |
| Had rice expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had seafood expenditures | | | Y | | | | | |
| Had vegetable & fruit expenditures | | | | Y | | | | |
| Had lard & cooking oil expenditures | | | | | Y | | | |
| Had milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food-away-from-home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.66 |
| ρ(y, yh) | 0.64 | 0.63 | 0.63 | 0.63 | 0.63 | 0.64 | 0.63 | 0.64 |
| N | 2,826 | 2,826 | 2,826 | 2,826 | 2,826 | 2,826 | 2,826 | 2,826 |

True poverty rate: 1.6 (0.3)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.8.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Rural Areas from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 17.7 (0.7) | 17.7 (0.7) | 17.7 (0.7) | 17.7 (0.7) | 17.7 (0.7) | 17.7 (0.7) | 17.7 (0.7) | 17.6 (0.7) |
| 2) Empirical distribution of the error terms | 17.6 (0.7) | 17.6 (0.7) | 17.5 (0.7) | 17.5 (0.7) | 17.6 (0.7) | 17.5 (0.7) | 17.5 (0.7) | 17.5 (0.7) |
| Control variables | | | | | | | | |
| Had rice expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had seafood expenditures | | | Y | | | | | |
| Had vegetable & fruit expenditures | | | | Y | | | | |
| Had lard & cooking oil expenditures | | | | | Y | | | |
| Had milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food-away-from-home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.66 |
| ρ(y, yh) | 0.66 | 0.66 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.67 |
| N | 6,521 | 6,521 | 6,521 | 6,521 | 6,521 | 6,521 | 6,521 | 6,521 |

True poverty rate: 13.3 (0.6)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.9.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Urban Areas from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 16.1 (2.5) | 16.4 (2.6) | 16.2 (2.6) | 16.2 (2.5) | 15.4 (2.4) | 16.7 (2.5) | 15.8 (2.6) | 15.6 (2.5) |
| 2) Empirical distribution of the error terms | 15.7 (2.5) | 16.3 (2.6) | 16.0 (2.6) | 15.7 (2.5) | 15.1 (2.4) | 16.4 (2.6) | 15.6 (2.7) | 15.3 (2.5) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.76 | 0.78 | 0.77 | 0.76 | 0.77 | 0.78 | 0.77 | 0.77 |
| ρ(y, yh) | 0.74 | 0.77 | 0.74 | 0.66 | 0.73 | 0.77 | 0.74 | 0.76 |
| N | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 | 1,046 |

True poverty rate: 25.2 (4.2)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.10.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Rural Areas from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 45.6 (1.4) | 45.1 (1.5) | 46.3 (1.4) | 45.6 (1.4) | 43.4 (1.4) | 46.1 (1.5) | 42.8 (1.5) | 45.6 (1.4) |
| 2) Empirical distribution of the error terms | 45.6 (1.5) | 45.3 (1.5) | 46.4 (1.4) | 45.6 (1.5) | 43.4 (1.4) | 46.0 (1.5) | 42.8 (1.6) | 45.6 (1.5) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.54 | 0.61 | 0.57 | 0.54 | 0.58 | 0.56 | 0.60 | 0.54 |
| ρ(y, yh) | 0.54 | 0.60 | 0.56 | 0.53 | 0.56 | 0.57 | 0.59 | 0.55 |
| N | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 | 2,954 |

True poverty rate: 40.3 (1.8)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.11.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Urban Areas from 2010/11 to 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 33.4 (2.2) | 33.7 (2.1) | 33.3 (2.1) | 33.3 (2.2) | 32.8 (2.1) | 32.6 (2.2) | 34.0 (2.2) | 33.1 (2.1) |
| 2) Empirical distribution of the error terms | 33.7 (2.2) | 34.2 (2.1) | 33.6 (2.2) | 33.7 (2.2) | 33.4 (2.2) | 32.9 (2.2) | 34.5 (2.2) | 33.4 (2.1) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.75 | 0.77 | 0.75 | 0.75 | 0.76 | 0.78 | 0.77 | 0.75 |
| ρ(y, yh) | 0.73 | 0.77 | 0.73 | 0.73 | 0.74 | 0.76 | 0.75 | 0.73 |
| N | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 | 2,272 |

True poverty rate: 17.7 (1.8)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2016/17 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.12.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Rural Areas from 2010/11 to 2016/17, Malawi (percentage)

| Method (2016/17) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 67.6 (0.7) | 71.0 (0.7) | 68.0 (0.8) | 67.6 (0.7) | 65.4 (0.8) | 68.3 (0.7) | 70.3 (0.7) | 67.8 (0.7) |
| 2) Empirical distribution of the error terms | 67.5 (0.7) | 70.9 (0.7) | 68.0 (0.7) | 67.5 (0.7) | 65.4 (0.8) | 68.1 (0.7) | 70.2 (0.7) | 67.7 (0.7) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.50 | 0.59 | 0.52 | 0.50 | 0.55 | 0.52 | 0.58 | 0.50 |
| ρ(y, yh) | 0.49 | 0.57 | 0.52 | 0.48 | 0.54 | 0.52 | 0.54 | 0.49 |
| N | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 | 10,174 |

True poverty rate: 59.4 (1.0)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2016/17 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.13.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Urban Areas from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 12.2 (0.1) | 12.5 (0.1) | 12.4 (0.1) | 12.2 (0.1) | 12.1 (0.1) | 12.2 (0.1) | 12.3 (0.1) | 12.6 (0.1) |
| 2) Empirical distribution of the error terms | 12.1 (0.1) | 12.3 (0.1) | 12.2 (0.1) | 12.1 (0.1) | 12.0 (0.1) | 12.1 (0.1) | 12.1 (0.1) | 12.6 (0.1) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.56 | 0.57 | 0.56 | 0.56 | 0.56 | 0.56 | 0.57 | 0.57 |
| ρ(y, yh) | 0.55 | 0.54 | 0.55 | 0.53 | 0.55 | 0.54 | 0.54 | 0.54 |
| N | 1,340 | 1,340 | 1,340 | 1,340 | 1,340 | 1,340 | 1,340 | 1,340 |

True poverty rate: 9.8 (1.2)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table D.14.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Rural Areas from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 37.4 (0.1) | 37.5 (0.1) | 37.4 (0.1) | 37.4 (0.1) | 37.2 (0.1) | 37.1 (0.1) | 37.6 (0.1) | 38.4* (0.1) |
| 2) Empirical distribution of the error terms | 36.4 (0.1) | 36.5 (0.1) | 36.2 (0.1) | 36.3 (0.1) | 36.0 (0.1) | 36.1 (0.1) | 36.6 (0.1) | 37.3 (0.1) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.51 | 0.52 | 0.51 | 0.51 | 0.50 | 0.51 | 0.52 | 0.51 |
| ρ(y, yh) | 0.55 | 0.54 | 0.55 | 0.53 | 0.55 | 0.54 | 0.54 | 0.54 |
| N | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 | 3,065 |

True poverty rate: 39.6 (1.5)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table D.15.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Urban Areas from 2008/09 to 2010/11, Tanzania (percentage)

| Method (2010/11) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 5.7* (1.1) | 5.7* (1.1) | 5.7* (1.1) | 5.7* (1.1) | 5.7* (1.1) | 5.7* (1.1) | 5.9* (1.1) | 5.2* (1.1) |
| 2) Empirical distribution of the error terms | 5.6* (1.1) | 5.5* (1.1) | 5.6* (1.1) | 5.5* (1.1) | 5.6* (1.1) | 5.6* (1.1) | 5.7* (1.1) | 5.1* (1.1) |
| Control variables | | | | | | | | |
| Had maize expenditures | Y | | | | | | | |
| Had meat expenditures | | Y | | | | | | |
| Had fish/seafood expenditures | | | Y | | | | | |
| Had vegetables and fruits expenditures | | | | Y | | | | |
| Had cooking oil/fats expenditures | | | | | Y | | | |
| Had milk/milk products expenditures | | | | | | Y | | |
| Had drink/beverages expenditures | | | | | | | Y | |
| Had food away from home expenditures | | | | | | | | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.62 | 0.65 | 0.62 | 0.62 | 0.62 | 0.64 | 0.62 | 0.64 |
| ρ(y, yh) | 0.60 | 0.60 | 0.58 | 0.61 | 0.58 | 0.60 | 0.59 | 0.62 |
| N | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 | 1,253 |

True poverty rate: 5.3 (0.9)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2010/11 use the estimated parameters based on the 2008/09 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.16.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Rural Areas from 2008/09 to 2010/11, Tanzania (percentage)

| Method (2010/11) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 15.9 | 15.3 | 15.9 | 16.0 | 15.3 | 15.9 | 17.0 | 15.3 |
|  | (1.1) | (1.0) | (1.1) | (1.1) | (1.1) | (1.1) | (1.1) | (1.1) |
| 2) Empirical distribution of the error terms | 15.4 | 14.9 | 15.4 | 15.5 | 14.8 | 15.3 | 16.3 | 14.7 |
|  | (1.1) | (1.0) | (1.1) | (1.1) | (1.0) | (1.0) | (1.1) | (1.0) |
| Control variables |  |  |  |  |  |  |  |  |
| Had maize expenditures | Y |  |  |  |  |  |  |  |
| Had meat expenditures |  | Y |  |  |  |  |  |  |
| Had fish/seafood expenditures |  |  | Y |  |  |  |  |  |
| Had vegetables and fruits expenditures |  |  |  | Y |  |  |  |  |
| Had cooking oil/fats expenditures |  |  |  |  | Y |  |  |  |
| Had milk/milk products expenditures |  |  |  |  |  | Y |  |  |
| Had drink/beverages expenditures |  |  |  |  |  |  | Y |  |
| Had food away from home expenditures |  |  |  |  |  |  |  | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.34 | 0.42 | 0.33 | 0.34 | 0.36 | 0.37 | 0.37 | 0.36 |
| ρ(y, yh) | 0.38 | 0.45 | 0.39 | 0.37 | 0.40 | 0.38 | 0.42 | 0.42 |
| N | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 | 2,570 |

True poverty rate: 22.4 (1.4)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2010/11 use the estimated parameters based on the 2008/09 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.17.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Urban Areas from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 4.4 | 4.8* | 4.6* | 4.5 | 4.6* | 4.8* | 4.7* | 4.8* |
|  | (0.8) | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) |
| 2) Empirical distribution of the error terms | 4.5 | 4.7* | 4.7* | 4.6* | 4.6* | 4.7* | 4.7* | 4.7* |
|  | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) | (0.9) |
| Control variables |  |  |  |  |  |  |  |  |
| Had maize expenditures | Y |  |  |  |  |  |  |  |
| Had meat expenditures |  | Y |  |  |  |  |  |  |
| Had fish/seafood expenditures |  |  | Y |  |  |  |  |  |
| Had vegetables and fruits expenditures |  |  |  | Y |  |  |  |  |
| Had cooking oil/fats expenditures |  |  |  |  | Y |  |  |  |
| Had milk/milk products expenditures |  |  |  |  |  | Y |  |  |
| Had drink/beverages expenditures |  |  |  |  |  |  | Y |  |
| Had food away from home expenditures |  |  |  |  |  |  |  | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.63 | 0.63 | 0.63 | 0.63 | 0.63 | 0.63 | 0.63 | 0.66 |
| ρ(y, yh) | 0.59 | 0.58 | 0.57 | 0.59 | 0.59 | 0.58 | 0.58 | 0.62 |
| N | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 | 1,722 |

True poverty rate: 5.6 (1.0)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table D.18.
Predicted Poverty Rates Based on Imputation with Dummy Variables Indicating More Disaggregated Food Item Consumption in Rural Areas from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 |
|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 21.6 | 22.5 | 21.5 | 21.5 | 21.7 | 21.5 | 22.7 | 21.4 |
|  | (1.2) | (1.2) | (1.2) | (1.2) | (1.2) | (1.2) | (1.2) | (1.2) |
| 2) Empirical distribution of the error terms | 21.8 | 22.7 | 21.9 | 21.8 | 22.0 | 21.7 | 23.0 | 21.7 |
|  | (1.2) | (1.2) | (1.2) | (1.1) | (1.2) | (1.1) | (1.2) | (1.2) |
| Control variables |  |  |  |  |  |  |  |  |
| Had maize expenditures | Y |  |  |  |  |  |  |  |
| Had meat expenditures |  | Y |  |  |  |  |  |  |
| Had fish/seafood expenditures |  |  | Y |  |  |  |  |  |
| Had vegetables and fruits expenditures |  |  |  | Y |  |  |  |  |
| Had cooking oil/fats expenditures |  |  |  |  | Y |  |  |  |
| Had milk/milk products expenditures |  |  |  |  |  | Y |  |  |
| Had drink/beverages expenditures |  |  |  |  |  |  | Y |  |
| Had food away from home expenditures |  |  |  |  |  |  |  | Y |
| Household assets & house characteristics | Y | Y | Y | Y | Y | Y | Y | Y |
| Demographics & employment | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.45 | 0.51 | 0.45 | 0.45 | 0.46 | 0.46 | 0.48 | 0.47 |
| ρ(y, yh) | 0.41 | 0.48 | 0.44 | 0.43 | 0.43 | 0.44 | 0.44 | 0.44 |
| N | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 | 3,136 |

True poverty rate: 26.3 (1.3)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Appendix E: Estimation of Imputation Models without Employment Variables

Table E.1.
Predicted Poverty Rates Based on Imputation from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 15.1 | 13.4 | 6.1 | 8.4 | 10.6 | 12.5 | 13.3 | 11.4 | 9.6* |
|  | (0.5) | (0.5) | (0.4) | (0.4) | (0.4) | (0.5) | (0.5) | (0.5) | (0.4) |
| 2) Empirical distribution of the error terms | 14.7 | 13.1 | 6.0 | 8.3 | 10.4 | 12.3 | 13.1 | 11.2 | 9.0 |
|  | (0.5) | (0.5) | (0.4) | (0.4) | (0.4) | (0.5) | (0.5) | (0.5) | (0.4) |
| Control variables |  |  |  |  |  |  |  |  |  |
| Food expenditures |  |  | Y |  |  |  |  |  |  |
| Non-food expenditures |  |  |  | Y |  |  |  |  |  |
| Durables expenditures |  |  |  |  | Y |  |  |  |  |
| Health expenditures |  |  |  |  |  | Y |  |  |  |
| Education expenditures |  |  |  |  |  |  | Y |  |  |
| Electricity, water, & garbage expenditures |  |  |  |  |  |  |  | Y | Y |
| Household assets & house characteristics |  | Y | Y | Y | Y | Y | Y | Y |  |
| Demographics | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.46 | 0.69 | 0.86 | 0.94 | 0.73 | 0.71 | 0.69 | 0.70 | 0.56 |
| ρ(y, yh) | 0.46 | 0.69 | 0.86 | 0.94 | 0.74 | 0.71 | 0.70 | 0.70 | 0.56 |
| N | 9347 | 9347 | 9347 | 9347 | 9347 | 9347 | 9347 | 9347 | 9347 |

True poverty rate: 9.6 (0.4)

Note: Estimates that fall within the 95% CI of the true rates are shown in bold; estimates that fall within one standard error of the true rates are shown in bold and with a star "*". Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table E.2.
Predicted Poverty Rates Based on Imputation from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 38.9* | 40.9 | 35.7 | 40.2 | 39.4* | 40.8 | 40.6 | 41.8 | 40.3 |
|  | (1.3) | (1.4) | (1.5) | (1.5) | (1.4) | (1.4) | (1.4) | (1.4) | (1.3) |
| 2) Empirical distribution of the error terms | 39.3* | 40.9 | 36.0 | 40.4 | 39.6 | 41.0 | 40.7 | 42.2 | 40.9 |
|  | (1.3) | (1.4) | (1.5) | (1.5) | (1.4) | (1.4) | (1.4) | (1.4) | (1.3) |
| Control variables |  |  |  |  |  |  |  |  |  |
| Food expenditures |  |  | Y |  |  |  |  |  |  |
| Non-food expenditures |  |  |  | Y |  |  |  |  |  |
| Furnishings and household expenses |  |  |  |  | Y |  |  |  |  |
| Health expenditures |  |  |  |  |  | Y |  |  |  |
| Education expenditures |  |  |  |  |  |  | Y |  |  |
| Utilities: water, kerosene, lighting |  |  |  |  |  |  |  | Y | Y |
| Household assets & house characteristics |  | Y | Y | Y | Y | Y | Y | Y |  |
| Demographics | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.51 | 0.68 | 0.93 | 0.87 | 0.71 | 0.69 | 0.68 | 0.71 | 0.58 |
| ρ(y, yh) | 0.48 | 0.64 | 0.92 | 0.86 | 0.68 | 0.65 | 0.65 | 0.68 | 0.54 |
| N | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 | 4,000 |

True poverty rate: 37.9 (1.7)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table E.3.
Predicted Poverty Rates Based on Imputation from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 31.4 | 29.2* | 27.0 | 31.9 | 29.3* | 29.1* | 29.7* | 29.2* | 31.2 |
|  | (1.1) | (1.2) | (1.2) | (1.2) | (1.2) | (1.1) | (1.2) | (1.2) | (1.1) |
| 2) Empirical distribution of the error terms | 31.3 | 29.3* | 27.1 | 31.9 | 29.3* | 29.1* | 29.8* | 29.2* | 31.2 |
|  | (1.1) | (1.2) | (1.2) | (1.2) | (1.2) | (1.1) | (1.2) | (1.2) | (1.1) |
| Control variables |  |  |  |  |  |  |  |  |  |
| Food expenditures |  |  | Y |  |  |  |  |  |  |
| Non-food expenditures |  |  |  | Y |  |  |  |  |  |
| Infrequent non-food expenditures |  |  |  |  | Y |  |  |  |  |
| Health expenditures |  |  |  |  |  | Y |  |  |  |
| Education expenditures |  |  |  |  |  |  | Y |  |  |
| Utilities: electricity, fuel, water, garbage |  |  |  |  |  |  |  | Y | Y |
| Household assets & house characteristics |  | Y | Y | Y | Y | Y | Y | Y |  |
| Demographics | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.43 | 0.56 | 0.92 | 0.73 | 0.57 | 0.57 | 0.57 | 0.56 | 0.45 |
| ρ(y, yh) | 0.42 | 0.55 | 0.93 | 0.75 | 0.56 | 0.57 | 0.57 | 0.54 | 0.45 |
| N | 4,406 | 4,406 | 4,406 | 4,406 | 4,406 | 4,406 | 4,406 | 4,406 | 4,406 |

True poverty rate: 28.7 (1.2)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table E.4.
Predicted Poverty Rates Based on Imputation from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Model 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 18.1 | 17.3 | 18.6 | 21.3* | 17.4 | 17.5 | 17.3 | 19.2 | 21.4* |
|  | (0.9) | (0.9) | (1.0) | (1.0) | (0.9) | (0.9) | (0.9) | (1.0) | (1.0) |
| 2) Empirical distribution of the error terms | 17.9 | 17.1 | 18.4 | 20.8* | 17.2 | 17.3 | 17.1 | 19.0 | 21.2* |
|  | (0.9) | (0.9) | (1.0) | (1.0) | (0.9) | (0.9) | (0.9) | (1.0) | (1.0) |
| Control variables |  |  |  |  |  |  |  |  |  |
| Food expenditures |  |  | Y |  |  |  |  |  |  |
| Non-food expenditures |  |  |  | Y |  |  |  |  |  |
| Furnishings and household expenses |  |  |  |  | Y |  |  |  |  |
| Health expenditures |  |  |  |  |  | Y |  |  |  |
| Education expenditures |  |  |  |  |  |  | Y |  |  |
| Utilities: water, kerosene, lighting |  |  |  |  |  |  |  | Y | Y |
| Household assets & house characteristics |  | Y | Y | Y | Y | Y | Y | Y |  |
| Demographics | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| R² | 0.44 | 0.59 | 0.92 | 0.75 | 0.60 | 0.61 | 0.59 | 0.60 | 0.49 |
| ρ(y, yh) | 0.42 | 0.58 | 0.93 | 0.75 | 0.58 | 0.59 | 0.57 | 0.59 | 0.48 |
| N | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 | 4,858 |

True poverty rate: 20.8 (1.0)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table E.5.
Predicted Poverty Rates Based on Imputation Using Geospatial Data from 2014 to 2016, Vietnam (percentage)

| Method (2016) | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 |
|---|---|---|---|---|---|---|
| 1) Normal linear regression model | 13.4 | 13.3 | 13.3 | 9.7* | 9.6* | 9.6* |
|  | (0.5) | (0.5) | (0.5) | (0.4) | (0.4) | (0.4) |
| 2) Empirical distribution of the error terms | 13.2 | 13.1 | 13.1 | 9.2* | 9.1* | 9.0 |
|  | (0.5) | (0.5) | (0.5) | (0.4) | (0.4) | (0.4) |
| Control variables |  |  |  |  |  |  |
| Distances to facilities | Y |  |  | Y |  |  |
| Nightlight intensity |  | Y |  |  | Y |  |
| Agricultural soil quality index |  |  | Y |  |  | Y |
| Electricity, water, & garbage expenditures |  |  |  | Y | Y | Y |
| Household assets & house characteristics | Y | Y | Y |  |  |  |
| Demographics | Y | Y | Y | Y | Y | Y |
| R² | 0.69 | 0.69 | 0.69 | 0.57 | 0.56 | 0.56 |
| ρ(y, yh) | 0.70 | 0.70 | 0.69 | 0.58 | 0.58 | 0.57 |
| N | 9326 | 9326 | 9326 | 9326 | 9326 | 9326 |

True poverty rate: 9.6 (0.4)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ commune random effects. Imputed poverty rates for 2016 use the estimated parameters based on the 2014 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table E.6.
Predicted Poverty Rates Based on Imputation Using Geospatial Variables from 2010 to 2013, Malawi (percentage)

| Method (2013) | Model 1 | Model 3 | Model 4 | Model 6 |
|---|---|---|---|---|
| 1) Normal linear regression model | 53.0 | 40.8 | 51.5 | 40.3 |
|  | (1.3) | (1.4) | (1.3) | (1.3) |
| 2) Empirical distribution of the error terms | 53.1 | 40.9 | 52.1 | 40.9 |
|  | (1.3) | (1.4) | (1.3) | (1.3) |
| Control variables |  |  |  |  |
| Distances to facilities | Y |  | Y |  |
| Agricultural soil quality index |  | Y |  | Y |
| Utilities: water, fuel, gas, electricity |  |  | Y | Y |
| Household assets & house characteristics | Y | Y |  |  |
| Demographics | Y | Y | Y | Y |
| R² | 0.69 | 0.68 | 0.59 | 0.58 |
| ρ(y, yh) | 0.62 | 0.64 | 0.51 | 0.54 |
| N | 4,000 | 4,000 | 4,000 | 4,000 |

True poverty rate: 37.9 (1.7)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2013 use the estimated parameters based on the 2010 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.

Table E.7.
Predicted Poverty Rates Based on Imputation Using Geospatial Variables from 2010/11 to 2012/13, Nigeria (percentage)

| Method (2012/13) | Model 1 | Model 3 | Model 4 | Model 6 |
|---|---|---|---|---|
| 1) Normal linear regression model | 29.4* | 29.1* | 30.9 | 31.1 |
|  | (1.2) | (1.1) | (1.2) | (1.1) |
| 2) Empirical distribution of the error terms | 29.3* | 29.1* | 30.8 | 31.1 |
|  | (1.2) | (1.2) | (1.2) | (1.1) |
| Control variables |  |  |  |  |
| Distances to facilities | Y |  | Y |  |
| Agricultural soil quality index |  | Y |  | Y |
| Utilities: electricity, fuel, water, garbage |  |  | Y | Y |
| Household assets & house characteristics | Y | Y |  |  |
| Demographics | Y | Y | Y | Y |
| R² | 0.56 | 0.56 | 0.46 | 0.45 |
| ρ(y, yh) | 0.55 | 0.55 | 0.43 | 0.46 |
| N | 4,406 | 4,406 | 4,406 | 4,406 |

True poverty rate: 28.7 (1.2)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data. Consumption expenditures in post-harvest period are measured in 2011 PPP$. The poverty line is set at $1.90 in 2011 PPP$.

Table E.8.
Predicted Poverty Rates Based on Imputation Using Geospatial Variables from 2010/11 to 2012/13, Tanzania (percentage)

| Method (2012/13) | Model 1 | Model 3 | Model 4 | Model 6 |
|---|---|---|---|---|
| 1) Normal linear regression model | 17.2 | 17.3 | 21.2* | 21.4* |
|  | (0.9) | (0.9) | (1.0) | (1.0) |
| 2) Empirical distribution of the error terms | 17.0 | 17.0 | 21.0* | 21.2* |
|  | (0.9) | (0.9) | (1.0) | (1.0) |
| Control variables |  |  |  |  |
| Distances to facilities | Y |  | Y |  |
| Agricultural soil quality index |  | Y |  | Y |
| Utilities: water, kerosene, lighting |  |  | Y | Y |
| Household assets & house characteristics | Y | Y |  |  |
| Demographics | Y | Y | Y | Y |
| R² | 0.59 | 0.59 | 0.49 | 0.49 |
| ρ(y, yh) | 0.56 | 0.57 | 0.47 | 0.47 |
| N | 4,837 | 4,837 | 4,837 | 4,837 |

True poverty rate: 20.9 (1.0)

Note: Estimates shown in boldface or with a “*” respectively fall within the 95% confidence interval or one standard error of the true poverty rate. Standard errors in parentheses are adjusted for complex survey design. All estimates are obtained with population weights. Method 1 uses the normal linear regression model with the theoretical distribution of the error terms and Method 2 uses the empirical distribution of the error terms. Both methods employ cluster random effects. Imputed poverty rates for 2012/13 use the estimated parameters based on the 2010/11 data. 100 simulations are implemented. True poverty rate is the estimate directly obtained from the survey data.
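
The flagging rule stated in the table notes (a star for estimates within one standard error of the true poverty rate, boldface for estimates within its 95% confidence interval) can be illustrated with a minimal sketch. It assumes the 95% confidence interval is the true rate plus or minus 1.96 standard errors, which the notes do not state explicitly; the names `Flag`, `flag_estimate`, and `Z_95` are hypothetical and introduced here for illustration only. The inputs are the Method 1 estimates and the true rate reported for Tanzania in Table E.8.

```python
from dataclasses import dataclass

# Assumption: normal-approximation critical value for a two-sided 95% CI.
Z_95 = 1.96


@dataclass
class Flag:
    within_one_se: bool  # would carry a "*" in the tables
    within_ci95: bool    # would be shown in boldface in the tables


def flag_estimate(imputed: float, true_rate: float, true_se: float) -> Flag:
    """Compare an imputed poverty rate with the survey-based (true) rate."""
    gap = abs(imputed - true_rate)
    return Flag(
        within_one_se=gap <= true_se,
        within_ci95=gap <= Z_95 * true_se,
    )


# Method 1 estimates and true poverty rate as reported in Table E.8 (Tanzania).
true_rate, true_se = 20.9, 1.0
method1 = {"Model 1": 17.2, "Model 3": 17.3, "Model 4": 21.2, "Model 6": 21.4}

flags = {model: flag_estimate(est, true_rate, true_se)
         for model, est in method1.items()}

for model, flag in flags.items():
    marks = ("*" if flag.within_one_se else "") + (" bold" if flag.within_ci95 else "")
    print(f"{model}: {method1[model]}{marks}")
```

Under this assumption, Models 4 and 6 are flagged with a star and Models 1 and 3 are not flagged at all, which agrees with the stars shown in Table E.8. The exact interval used by the authors may differ, for example if it reflects the survey design degrees of freedom.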