Policy Research Working Paper 10867

Imputing Poverty Indicators without Consumption Data: An Exploratory Analysis

Hai-Anh H. Dang, Talip Kilic, Kseniya Abanokova, Calogero Carletto

Development Economics, Development Data Group

August 2024

Abstract

Accurate poverty measurement relies on household consumption data, but such data are often inadequate, outdated, or display inconsistencies over time in poorer countries. To address these data challenges, this paper employs survey-to-survey imputation to produce estimates for several poverty indicators, including headcount poverty, extreme poverty, poverty gap, near-poverty rates, as well as mean consumption levels and the entire consumption distribution. Analysis of 22 multi-topic household surveys conducted over the past decade in Bangladesh, Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam yields encouraging results. Adding household utility expenditures or food expenditures to basic imputation models with household-level demographic, employment, and asset variables could improve the probability of imputation accuracy by 0.1 to 0.4. Adding predictors from geospatial data could further increase imputation accuracy. The analysis also shows that a larger time interval between surveys is associated with a lower probability of predicting some poverty indicators, and that a better imputation model goodness-of-fit (R²) does not necessarily help. The results offer cost-saving inputs for future survey design.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at hdang@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Imputing Poverty Indicators without Consumption Data: An Exploratory Analysis

Hai-Anh H. Dang, Talip Kilic, Kseniya Abanokova, and Calogero Carletto*

Keywords: consumption, poverty, survey-to-survey imputation, household surveys, Viet Nam, Ethiopia, Malawi, Nigeria, Tanzania, Sub-Saharan Africa.

JEL Codes: C15, I32, O15.

* Dang (hdang@worldbank.org; corresponding author) is a senior economist in the Living Standards and Measurement Unit, Development Data Group, World Bank and is also affiliated with GLO, IZA, Indiana University, and London School of Economics and Political Science; Kilic (tkilic@worldbank.org) is the senior program manager of the Living Standards Measurement Unit, Development Data Group, World Bank; Abanokova (kabanokova@worldbank.org) is an economist in the Living Standards Measurement Unit, Development Data Group, World Bank and is a junior research fellow at the Higher School of Economics, National Research University, Russia; Carletto (gcarletto@worldbank.org) is the senior manager of the Strategy and Collaboratives Unit, Development Data Group, World Bank.
We would like to thank Madeleine Gauthier, Nathan Ives, Hoa Nguyen, Lars Osberg, Anne Swindale, Petra Todd, and seminar participants at Australian National University, the Center for the Study of African Economies conference (Oxford), IARIW-TNBS conference (Arusha), Econometric Society's Summer Meeting (Vanderbilt), and Statistics Canada's International Methodology Symposium for their helpful discussion and feedback on earlier drafts. We are grateful for funding from the United States Agency for International Development (USAID).

1. Introduction

Accurate poverty measurement is a prerequisite for policies aimed at reducing poverty. Yet development practitioners typically face the challenge that the household survey data underlying poverty estimates are either inadequate (e.g., they do not offer nationally representative estimates) or outdated (e.g., they do not offer timely estimates of poverty trends). Worse still, even in the few countries where survey capacity is well established, data have been known to exhibit varying degrees of incompatibility over time due to changes in survey design (Deaton and Kozel, 2005). These data challenges can hinder effective policy implementation, especially in poorer countries with low statistical capacity (Devarajan, 2013; Jerven, 2019).¹

To address these challenges, alternative methods that obtain poverty estimates through data imputation (instead of direct data collection through surveys) have become increasingly common (World Bank, 2021; Dang and Lanjouw, 2023).² Building on the seminal technique that imputes from a household consumption survey into a census to generate poverty maps (Elbers et al., 2003), recent studies have imputed from a household consumption survey into another survey to provide poverty estimates.³ The central idea is to build an imputation model using appropriate predictor variables from an existing, older consumption survey, which can subsequently be applied to the same variables in a more recent survey (that does not collect consumption data) to provide poverty estimates for the latter survey.

¹ Serajuddin et al. (2015) show that over the period 2002-2011, of the 155 countries for which the World Bank monitors poverty data using the World Development Indicators (WDI) database, almost one-fifth (i.e., 28) have only one poverty data point and as many as 29 countries do not have any poverty data point in the same period. Furthermore, poorer countries have fewer surveys: a 10-percent increase in a country's household consumption level is associated with almost one-third (i.e., 0.3) more surveys (Dang, Jolliffe, and Carletto, 2019). The ongoing Covid-19 pandemic could increase poverty and further exacerbate these data deprivations and digital divides for poor countries (Naude and Vinuesa, 2020).

² Imputation techniques are regularly used by international organizations and national statistical agencies to fill in missing data such as education statistics (UOE, 2020) and income data (US Census Bureau, 2017).

³ The poverty-mapping technique combines a household consumption survey and a non-consumption census, which allows us to provide poverty estimates at a more disaggregated level than is available from the household survey alone.

Building on Elbers et al.'s (2003) method, recent studies have innovated in various aspects. These include combining data from a household consumption survey and a different survey (Stifel and Christiaensen, 2007; Douidich et al., 2016), modeling the error terms or standard errors (Tarozzi, 2007; Mathiassen, 2009; Dang, Lanjouw, and Serajuddin, 2017), and experimenting with survey design and the selection of suitable variables (Kilic and Sohnesen, 2019; Christiaensen, Ligon, and Sohnesen, 2022; Dang et al., forthcoming).
Most recently, poverty imputation has been employed to provide estimates for hard-to-reach refugee population groups that are not typically captured in standard household surveys (Altındağ et al., 2021; Beltramo et al., 2024; Dang and Verme, 2023). Reviewing key studies from the past 20 years covering poor and middle-income countries ranging from India, Jordan, and Sub-Saharan African countries to Viet Nam, Dang et al. (forthcoming) observe that imputation-based poverty estimates can perform reasonably well against survey-based poverty estimates using actual consumption data. Further analyzing data from 14 rounds of multi-topic household surveys conducted over the past decade in Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam, the authors find that rather parsimonious imputation models consisting of household-level demographic and employment variables and household utility expenditures could provide accurate estimates, which in many cases even meet the more rigorous precision criterion of falling within one standard error of the true poverty rates.

This paper makes several new contributions to the literature on survey-to-survey imputation of poverty estimates, both conceptually and empirically. On the conceptual front, we significantly expand this literature to various common poverty indicators such as i) near-poverty (vulnerability) status, ii) extreme poverty, iii) the poverty gap, and iv) other Foster, Greer, and Thorbecke (FGT) poverty indices. Furthermore, we also examine the performance of the imputed consumption distribution against the distribution of the actual household consumption data, which underlies these poverty indicators. These extensions set our paper apart from the existing literature, which almost exclusively focuses on the headcount poverty rate.
Indeed, to our knowledge, this is the first study that attempts to provide a comprehensive and systematic examination of these various poverty indicators as well as the entire consumption distribution. Empirically, for illustration, we harmonize and rigorously analyze data from 22 recent rounds of multi-topic household surveys conducted over the past decade in Bangladesh, Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam. These six countries span three regions (i.e., Sub-Saharan Africa, South Asia, and Southeast Asia) and different income levels (i.e., low-income to lower-middle-income), and thus exhibit more heterogeneity in income levels, geography, and population size than the countries in previous studies. To our knowledge, our study applies survey-to-survey imputation to the most comprehensive dataset analyzed to date. Consequently, our findings should make a useful contribution to future survey-to-survey imputation efforts.⁴

We find evidence of (imputation) model heterogeneity, with certain models performing better only for some poverty indicators and the consumption distribution. In particular, two models perform better than the others. One model adds food expenditures to household demographic and employment characteristics and household assets (Model 3), and the other adds household utility expenditures (including electricity, water, and garbage) to household demographic and employment characteristics (Model 9). Model 3 works reasonably well for headcount poverty, extreme poverty, the poverty gap, and mean consumption, raising the probability of accurate imputation for these indicators by around 0.3 (compared to a reference model with just household demographic and employment characteristics). Compared to Model 3, Model 9 performs slightly better for headcount poverty, raising the probability of imputation accuracy by 0.4. It also raises the probability of imputation accuracy for near-poverty, extreme poverty, the poverty gap, and mean consumption by around 0.1-0.2. Further adding agricultural soil quality information to Model 9 results in higher imputation accuracy (and stronger statistical significance) for headcount poverty, increasing the probability of imputation accuracy by 0.5. Models 3 and 9 also perform better than the other models for imputing the consumption distribution. Finally, a larger time interval between the base survey and the target survey is associated with lower imputation accuracy, but a better model goodness-of-fit (R²) does not appear to help.

⁴ The existing study with the most comprehensive dataset is Dang et al. (forthcoming), which analyzes data from 14 survey rounds in Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam. This study focuses on headcount poverty alone.

This paper consists of six sections. We discuss the analytical framework in the next section before describing the data in Section 3. In Section 4, we present the main estimation results using the latest survey rounds for each country and then summarize the results using all the available survey rounds for all the countries (Section 4.1). We further extend the analysis to a more general setting (Section 4.2), such as other FGT indices that are more sensitive to the poor and estimates for the entire consumption distribution, before discussing a more specific application, within-year imputation. We offer meta-analysis results on model selection in Section 5 and conclude in Section 6.

2. Analytical Framework

2.1. Imputation Model

A household maximizes utility subject to an income budget constraint that includes choice variables such as quantities of goods, durables, and leisure (or labor supply) (Deaton and Muellbauer, 1980).
This results in the common practice of constructing total household consumption as an aggregate of consumption of different items such as food, non-food (including clothing, education, and/or health expenses), durable goods, and housing (Deaton and Zaidi, 2002). It follows that a model of (log) household consumption per capita ($y_j$) is typically estimated using the following reduced-form linear model for survey $j$, for $j = 1, 2$,

$$y_j = \beta_j' x_j + u_j \tag{1}$$

where $x_j$ can include household variables such as the household head's age, sex, education, occupation, ethnicity, religion, and language, which can represent household tastes.⁵ $x_j$ can also include household assets or incomes, and $u_j$ is the error term (see, e.g., Elbers et al., 2003; Ravallion, 2016).

We employ the method of Dang et al. (2017) as the imputation tool in this paper; we briefly describe it next. For better accuracy, the error term $u_j$ is further broken down into two components, a cluster random effects term ($\upsilon_{cj}$) and an idiosyncratic error term ($\varepsilon_j$), so that $u_j = \upsilon_{cj} + \varepsilon_j$. Conditional on the characteristics $x_j$, the cluster random effects and the idiosyncratic error term are assumed to be uncorrelated with each other and to follow normal distributions such that $\upsilon_{cj} \mid x_j \sim N(0, \sigma^2_{\upsilon_{cj}})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$. We relax this assumption later and employ an alternative approach that uses the empirical distribution of the error terms instead.

Household consumption (or income) data exist in one survey but are missing in the other; thus, without loss of generality, let survey 1 and survey 2 respectively represent the surveys with and without household consumption data, and let $y_1$ represent household consumption in survey 1. More generally, these two surveys can be either in the same period or in different periods. Our objective is thus to impute the missing consumption data in survey 2, given that consumption data are available in survey 1 only and that the characteristics $x_j$ are available in both surveys. Note that while we do have consumption data for survey 2, for validation purposes we assume that household consumption data in this survey round are unavailable. Writing out Equation (1) for survey 1, we have

$$y_1 = \beta_1' x_1 + \upsilon_{c1} + \varepsilon_1 \tag{2}$$

Equation (2) is a standard linear random effects model that can be estimated using most available statistical packages. Applying the parameters obtained from Equation (2) to the variables in survey 2, the imputed household consumption in this survey round is given by⁶

$$y_2^1 = \beta_1' x_2 + \upsilon_{c1} + \varepsilon_1 \tag{3}$$

While Equations (1) and (2) can also be specified as a simple OLS model (i.e., with the random effects subsumed into the error terms), modeling the random effects explicitly helps improve the precision of the estimation results. Indeed, the advantage of the random effects model over the OLS model is that the former can better capture between-cluster variation thanks to the additional information offered by the random effects. This role of $\upsilon_{c1}$ is especially important under our estimation framework, since the random effects are instrumental not only in estimating $\beta_1$ but also in our estimates of poverty in survey 2, as a component of the predicted household consumption. Put differently, $\upsilon_{c1}$ is used for both the point estimate of poverty in survey 2 and its standard errors.

⁵ More generally, $j$ can be larger than 2 and can indicate any type of survey that collects household data sufficiently relevant for imputation purposes, such as labor force surveys or demographic and health surveys. To make the notation less cluttered, we do not show the subscript for households in the equations. It is also standard practice in household survey analysis to transform the consumption variable to a logarithmic scale to help improve the model fit.

⁶ This assumes that the returns to the characteristics $x_j$ are captured by Equations (1) and (2), and it precludes the (perhaps exceptionally) rare situations where there could be no correlation between these characteristics and household consumption due to unexpected upheavals in the economy or calamitous disasters. Contexts with sudden changes to economic structures (e.g., an overnight regime change) may also introduce noise into the comparability of the estimated parameters, but (variants of) this imputation approach have been found to be rather robust to such changes; see our discussion later.

We are most interested in the poverty estimates for survey 2, where the consumption data are missing. Let $z_2$ be the poverty line in period 2; if $y_2$ existed, the poverty rate $P_2$ in this period could be estimated with the following quantity

$$P(y_2 \le z_2) \tag{4}$$

where $P(\cdot)$ is the probability (or poverty) function that gives the percentage of the population under the poverty line $z_2$ in survey 2. Since poverty has an inverse relationship with household consumption (i.e., richer households are less likely to be poor), this function is generally non-increasing in household consumption.⁷

We further make the following assumptions underlying the theoretical framework, which we will relax and offer validation tests for in subsequent sections.

Assumption 1: Let $x_j$ denote the values of the variables observed in survey $j$, for $j = 1, 2$, and let $X_j$ denote the corresponding measurements in the population. Then $x_j$ are consistent measures of $X_j$ for all $j$ (i.e., $x_j = X_j$ for all $j$).

Assumption 1 is crucial for imputation and ensures that the sampled data in survey 1 and survey 2 are each representative of the target population. Put differently, this assumption implies that, for two contemporaneous surveys (i.e., implemented in the same time period), measurements of the same characteristics $x$ are identical (except for potential sampling errors), since they are consistent measures of the population values; for two non-contemporaneous surveys, these estimates from the two surveys are consistent and comparable over time.
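Equation (2) splits the error into a cluster random effect and an idiosyncratic term, and the paper estimates it with standard random effects routines. As a rough illustration of what that decomposition recovers, the sketch below applies a one-way ANOVA (method-of-moments) estimator to residuals grouped by cluster. This is a simpler estimator swapped in for illustration, not the estimator used by the authors, and the function name `anova_variance_components` and its inputs are hypothetical.

```python
from collections import defaultdict


def anova_variance_components(residuals, clusters):
    """Split residual variance into cluster and idiosyncratic components.

    One-way ANOVA (method-of-moments) estimator for unbalanced clusters,
    used here as a simple stand-in for a full random effects fit of Eq. (2).

    residuals -- per-household residuals, e.g. from an OLS fit of log consumption
    clusters  -- cluster (PSU) id of each household, same length as residuals
    Returns (sigma2_u, sigma2_e): cluster-effect and idiosyncratic variances.
    """
    groups = defaultdict(list)
    for r, c in zip(residuals, clusters):
        groups[c].append(r)
    n, k = len(residuals), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more households than clusters")

    grand_mean = sum(residuals) / n
    ss_within = ss_between = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        ss_within += sum((v - m) ** 2 for v in vals)  # spread around the cluster mean
        ss_between += len(vals) * (m - grand_mean) ** 2  # spread of cluster means

    ms_within = ss_within / (n - k)
    ms_between = ss_between / (k - 1)
    # Effective cluster size; equals the common cluster size when clusters are balanced
    n0 = (n - sum(len(v) ** 2 for v in groups.values()) / n) / (k - 1)
    sigma2_e = ms_within
    sigma2_u = max(0.0, (ms_between - ms_within) / n0)  # negative estimates truncated at 0
    return sigma2_u, sigma2_e
```

For example, `anova_variance_components([1.0, 1.0, -1.0, -1.0], ["a", "a", "b", "b"])` returns `(2.0, 0.0)`: all residual variation lies between the two clusters. A maximum likelihood random effects fit, as in standard statistical packages, would instead estimate $\beta_1$ and both variances jointly.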
While surveys of the same design (and sample frame) are more likely to be comparable and can thus satisfy Assumption 1, there is no a priori guarantee that these surveys provide comparable estimates across two different time periods, or even the same estimates in the same time period. Examples where Assumption 1 may be violated include cases where national statistical agencies change the questionnaire for the same survey over time, or where one considers different surveys that focus on different population groups (e.g., the average household size may differ between a household survey and a labor force survey depending on the specific definition used). Violation of Assumption 1 rules out the straightforward application of the survey-to-survey imputation technique and would require additional assumptions on the relevance of the estimated parameters from one survey to the other.

⁷ If we impose the more restrictive assumption that $\varepsilon_j \mid x_j$ follows the standard normal distribution and combine the estimation of Equations (2) and (3) in the same step, we obtain the probit model that directly estimates poverty (i.e., the estimating equation is $P(y_j) = \Phi(\beta_j' x_j + \upsilon_{cj})$ with $j = 1, 2$, where $\Phi(\cdot)$ is the cumulative standard normal distribution). Note that we also assume homoscedasticity of the error terms for simplicity.

Assumption 2: Let $\Delta P_2$ and $\Delta x_2$ respectively represent the changes in poverty rates and in the explanatory variables $x$ over time, and let $\Theta_j = (\beta_j, \sigma^2_{\upsilon_{cj}}, \sigma^2_{\varepsilon_j})$ be the set of parameters that map the variables $x$ into the household consumption space in period $j$, where the consumption data are available. Then $\Delta P_2 = P(\Delta x_2 \mid \Theta_1)$, where $P(\cdot)$ is the given poverty function.

Assumption 2 implies that, given $\Theta_1$ (the estimated consumption parameters from survey 1), the changes in the explanatory variables $x$ between the two periods can capture the change in the poverty rate in the next period.
More intuitively, given the variables commonly observed in the two surveys and their linkage to household consumption, this assumption allows the imputation of the missing household consumption for survey 2. In practical terms, it implies that the change in poverty rates over time is attributable to changes in the explanatory variables $x$ rather than to the returns to characteristics (or economic structure) and the unexplained characteristics (or random shocks), which are respectively represented by $\beta_1$ and $(\upsilon_{c1}, \varepsilon_1)$. Clearly, this is a testable assumption if household consumption data are available for both periods under consideration.

As discussed earlier, previous studies commonly assume that the distributions of the household consumption parameters $\beta_1$, $\upsilon_{c1}$, and $\varepsilon_1$ in Equations (2) and (3), based on the data in survey (or period) 1, remain the same for the data in survey (period) 2. Assumption 2 is less restrictive, since it allows the distributions of these estimated parameters to change over time, as long as the changes in the variables $x$ alone can correctly capture the change in the poverty rate. Assumption 2 only requires that, overall, the parts of the consumption distributions below the poverty line in both periods (that can be explained by the changes in $x$ in our model) be equal; it does not require all the percentiles along the consumption distributions to be equal, as implied by the assumption made in existing studies. This result is formally stated in Corollary 1.2 below. Given Assumptions 1 and 2, Dang et al. (2017) provide the following proposition that lays out the estimation framework.

Proposition 1: Imputation framework. Given Assumptions 1 and 2, the poverty rate for period 2 can be predicted using the estimated consumption parameters based on survey 1 and the data in survey 2. In particular, let $P(\cdot)$ be the poverty function and let $y_2^1$ be defined as $\beta_1' x_2 + \upsilon_{c1} + \varepsilon_1$; we have

$$P(y_2^1 \le z_2) = P(y_2 \le z_2) \tag{5}$$

Corollary 1.1. Let $\hat\beta_1$, $\hat\sigma^2_{\upsilon_{c1}}$, and $\hat\sigma^2_{\varepsilon_1}$ represent the estimated parameters obtained from Equation (2), and let $y_{2,s}^1 = \hat\beta_1' x_2 + \tilde\upsilon_{c1,s} + \tilde\varepsilon_{1,s}$, where $\tilde\upsilon_{c1,s}$ and $\tilde\varepsilon_{1,s}$ represent the $s$th random draws from their estimated distributions, for $s = 1, \ldots, S$. The poverty rate $P_2$ in period 2 can be estimated as

$$\hat P_2 = \frac{1}{S} \sum_{s=1}^{S} P(y_{2,s}^1 \le z_2) \tag{6}$$

Corollary 1.2. Instead of Assumption 2, adopt the traditional but more restrictive assumption that the consumption model parameters and the distributions of the error terms in Equation (1) remain the same in period 2 (that is, $\beta_1 \equiv \beta_2$, and $\upsilon_{c1}$ and $\varepsilon_1$ have the same distributions as $\upsilon_{c2}$ and $\varepsilon_2$, respectively). Given Assumption 1 and this stricter assumption, we have

$$W(y_2^1) = W(y_2) \tag{7}$$

where $W(\cdot)$ is a general one-to-one mapping welfare function, which includes the poverty function $P(\cdot)$ as a special case.

Proof. See Dang et al. (2017).

2.2. Welfare Indicators

The poverty indicators that we estimate generally belong to the Foster, Greer, and Thorbecke (FGT) (1984) class. Consider a population of income-receiving units (persons or households), $i = 1, \ldots, n$, with income $y_i$ and weight $w_i$. Let $N = \sum_{i=1}^{n} w_i$; when the data are unweighted, $w_i = 1$ and $N = n$. The poverty line is $z$, and the income gap up to the poverty line for person $i$ is $\max(0, z - y_i)$. The FGT class of poverty indices is given by

$$P(z;\alpha) = \sum_{i=1}^{n} \frac{w_i}{N} \left(\frac{z - y_i}{z}\right)^{\alpha} I_i \tag{8}$$

where $I_i = 1$ if $y_i \le z$ and $I_i = 0$ otherwise. $\alpha$ is a given parameter, whose first three non-negative integer values are most commonly used. In particular, $P(z;0)$ is the headcount poverty ratio, $P(z;1)$ is the (average normalized) poverty gap, and $P(z;2)$ is the (average normalized) squared poverty gap. The larger $\alpha$ is, the greater the degree of poverty aversion (i.e., more weight is placed on poorer individuals).
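Equation (6), combined with the FGT class in Equation (8), suggests a direct simulation: draw one cluster effect per cluster and one idiosyncratic error per household, rebuild consumption, compute the index, and average over the $S$ draws. The sketch below is a minimal illustration under stated assumptions: normal errors (the paper's Method 1), fitted values $\hat\beta_1' x_2$ already computed, and unweighted households by default. The function names are hypothetical. Because $y_2^1$ is in logs, consumption is exponentiated before the FGT gap ratios are formed. For the headcount ($\alpha = 0$) the expectation has a closed form, $\frac{1}{n}\sum_i \Phi\big((\ln z_2 - \hat\beta_1' x_{2i})/\sqrt{\hat\sigma^2_{\upsilon_{c1}} + \hat\sigma^2_{\varepsilon_1}}\big)$, which the last function provides as a check.

```python
import math
import random
from statistics import NormalDist


def fgt(y, z, alpha, w=None):
    """Weighted FGT index P(z; alpha) of Eq. (8); units with y_i <= z count as poor."""
    w = [1.0] * len(y) if w is None else w
    poor_sum = sum(wi * ((z - yi) / z) ** alpha for yi, wi in zip(y, w) if yi <= z)
    return poor_sum / sum(w)


def simulate_fgt(xb2, clusters2, sigma2_u, sigma2_e, z2, alpha=0,
                 n_draws=1000, seed=0, w=None):
    """Simulation estimator of Eq. (6), with P(.) taken to be the FGT index.

    xb2       -- fitted log consumption beta1_hat' x2 for each survey-2 household
    clusters2 -- cluster id of each household; one cluster effect is drawn per cluster
    sigma2_u, sigma2_e -- estimated cluster and idiosyncratic error variances
    z2        -- poverty line in consumption levels (not logs)
    Assumes normally distributed errors (the paper's Method 1).
    """
    rng = random.Random(seed)
    sd_u, sd_e = math.sqrt(sigma2_u), math.sqrt(sigma2_e)
    estimates = []
    for _ in range(n_draws):
        # dict.fromkeys keeps first-seen cluster order, so a given seed is reproducible
        u = {c: rng.gauss(0.0, sd_u) for c in dict.fromkeys(clusters2)}
        y_log = [m + u[c] + rng.gauss(0.0, sd_e) for m, c in zip(xb2, clusters2)]
        estimates.append(fgt([math.exp(v) for v in y_log], z2, alpha, w))
    return sum(estimates) / n_draws


def expected_headcount(xb2, sigma2_u, sigma2_e, z2):
    """Closed-form check for alpha = 0 (unweighted): mean of Phi((ln z2 - xb) / total sd)."""
    sd = math.sqrt(sigma2_u + sigma2_e)
    std_normal = NormalDist()
    return sum(std_normal.cdf((math.log(z2) - m) / sd) for m in xb2) / len(xb2)
```

Drawing the cluster effect once per cluster is what lets $\upsilon_{c1}$ enter both the point estimate and its sampling variability, as discussed in Section 2.1. Replacing the `rng.gauss` calls with draws from estimated residuals (for example via `rng.choice`) would give a rough version of the empirical-distribution alternative mentioned there.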
The poverty gap measure as defined by USAID is a modified version of $P(z;1)$ that applies only to the poor population:

$$P_{U}(z;1) = \sum_{i=1}^{n} \frac{w_i}{N_q} \left(\frac{z - y_i}{z}\right) I_i \tag{9}$$

where $N_q = \sum_{i=1}^{n} w_i I_i$ is the number of poor people (i.e., those with income below the poverty line $z$). Hereafter we refer to this indicator as the USAID poverty gap.

The near-poverty (vulnerability) rate is the proportion of the population with income above the poverty line but at or below the vulnerability line $v$:

$$V = \sum_{i=1}^{n} \frac{w_i}{N} I_{i,v} \tag{10}$$

where $I_{i,v} = 1$ if $z < y_i \le v$ and $I_{i,v} = 0$ otherwise. In our analysis, $v$ is set at 1.25 times the poverty line.

Finally, in addition to the FGT indices, we also provide imputed estimates of the general distribution of household consumption $y_i$, which underlies the estimation of all the poverty indicators discussed above. This outcome extends our focus from the poorer part of the consumption distribution to the whole distribution.

Some additional remarks are useful. First, since the FGT class of poverty indices is monotonic (Foster et al., 2010), it satisfies the one-to-one mapping condition for the welfare function $W(\cdot)$ in Corollary 1.2. Consequently, the imputed poverty estimates are asymptotically equivalent to the true poverty indicators in Equations (8) to (10). Second, since it is not straightforward to obtain analytical formulae for the standard errors of the poverty indicators in Equations (8) to (10), we provide bootstrap standard errors for these estimates.⁸ Third, the poverty line and the extreme poverty line vary across countries, so we employ those that are commonly used for each country. We return to the (extreme) poverty lines in the next section.

⁸ Dang et al. (2017) offer an analytical formula for the standard error of the estimated headcount poverty rate, which provides estimates similar to the bootstrap standard errors that we obtain.

3. Data
We analyze multi-topic household survey data from a total of 22 survey rounds in six countries: Bangladesh (3), Ethiopia (1), Malawi (5), Nigeria (3), Tanzania (6), and Viet Nam (4), with the number of survey rounds for each country noted in parentheses. In the four Sub-Saharan African countries (Ethiopia, Malawi, Nigeria, and Tanzania), the data come from nationally representative, multi-topic household surveys implemented by the respective national statistical offices with support from the World Bank Living Standards Measurement Study – Integrated Surveys on Agriculture (LSMS-ISA) initiative. Similar to the LSMS-type surveys supported by the World Bank, the surveys from Viet Nam are implemented biennially by the country's General Statistical Office (GSO) with technical support from the World Bank. These surveys are generally regarded as being of high quality and are regularly employed by national governments, international organizations, and academic researchers to provide estimates of household welfare.⁹ The data sets include

i. the Bangladesh Integrated Household Survey (BIHS) 2011/12, 2015, and 2018/19 rounds
ii. the Ethiopia Socioeconomic Survey (ESS) 2018/19 round
iii. the Malawi Integrated Household Survey (IHS) 2010/11, 2016/17, and 2019/20 rounds
iv. the Malawi Integrated Household Panel Survey (IHPS) 2010 and 2013 rounds
v. the Nigeria General Household Survey (GHS)–Panel 2010/11, 2012/13, and 2018/19 rounds¹⁰
vi. the Tanzania National Panel Survey (TZNPS) 2008/09, 2010/11, 2012/13, 2014/15, 2019/20, and 2020/21 rounds
vii. the Viet Nam Household Living Standards Survey (VHLSS) 2010, 2012, 2014, and 2016 rounds.

⁹ For example, Baulch (2011) considers the VHLSSs as having high-quality data and heavily uses these surveys for poverty analysis. Other researchers analyze the LSMS-ISA surveys for various topics such as agricultural input use (Sheahan and Barrett, 2017) or temperature shocks and household consumption (Letta, Montalbano, and Tol, 2018).

¹⁰ We did not include the 2015/16 round for Nigeria because of comparability issues (e.g., the total consumption aggregate for this round does not include health care expenditures, and it is not adjusted using temporal and spatial price deflators).

The sample sizes hover around 3,000 to 5,000 households per survey round for the LSMS-ISA surveys (including Nigeria and Tanzania), 5,500 to 7,000 households for the BIHSs and the ESS, 9,300 households for the VHLSSs, and over 12,000 households for the Malawi IHS. For each country, the consumption data are deflated to the prices of a single survey year and are comparable across survey rounds.¹¹

¹¹ In particular, for Bangladesh, Tanzania, and Viet Nam, consumption data are deflated to 2018/19 prices, 2020/21 prices, and 2010 prices, respectively. For the Malawi IHPSs and IHSs, consumption data are deflated to 2013 prices and 2010/11 prices, respectively.

The objective is to produce the imputation-based welfare estimates of interest as if we did not have consumption data, and then to evaluate these estimates against those based on the actual survey data (i.e., the "true" welfare rates). For the poverty line, we use the national poverty lines for Ethiopia, Malawi, Tanzania, and Viet Nam and the international poverty line of $1.90 (in 2011 Purchasing Power Parity (PPP) prices) for Bangladesh and Nigeria.¹² The extreme poverty line is defined as US$1.25 (2011 PPP) per person per day for Bangladesh and Nigeria and as half of the national poverty line for Viet Nam and Ethiopia. For Malawi and Tanzania, we use the national food poverty lines as the extreme poverty lines.
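The validation logic described above treats the estimate computed from the actual survey-2 consumption data as the "true" rate and asks whether the imputation-based estimate lies close enough to it. The sketch below encodes the two criteria used in this literature: falling inside the 95 percent confidence interval of the true estimate, and the stricter one-standard-error band. It assumes a normal-approximation interval; the helper and class names are hypothetical rather than taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class AccuracyResult:
    within_95_ci: bool   # imputed estimate inside true +/- 1.96 * SE
    within_one_se: bool  # stricter check: inside true +/- 1 * SE


def check_imputation_accuracy(imputed, true_estimate, true_se, z_crit=1.96):
    """Compare an imputation-based estimate with the estimate from actual consumption data.

    Hypothetical helper: uses a normal-approximation 95% CI around the "true"
    survey-based estimate; true_se would come from the survey (e.g., a bootstrap).
    """
    if true_se < 0:
        raise ValueError("standard error must be non-negative")
    gap = abs(imputed - true_estimate)
    return AccuracyResult(within_95_ci=gap <= z_crit * true_se,
                          within_one_se=gap <= true_se)
```

With purely illustrative inputs, `check_imputation_accuracy(0.25, 0.22, 0.02)` passes the 95 percent CI check (a gap of 0.03 is below 1.96 × 0.02) but fails the one-standard-error check.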
¹² For Bangladesh and Nigeria, we employ the international poverty line for analysis, since official poverty data for these countries are based on different data sources that are not available to us, such as the Bangladesh Household Income and Expenditure Survey (HIES) and the Nigeria Living Standards Surveys (NLSSs). Note, however, that Nigeria's national poverty line calculated using the NLSS 2018/19 is close to the international poverty line of $1.90 per person per day in 2011 PPP (Lain and Vishwanath, 2022). Furthermore, data comparability issues also exist across rounds of the Bangladesh HIES (Fernandez et al., 2024).

We prepare and add several geospatial variables for Malawi, Nigeria, Tanzania, and Viet Nam, including the distances from the commune center to various important locations (e.g., the nearest major road and the nearest international land border crossing) and agricultural soil quality. These data are obtained from various sources, including the Food and Agriculture Organization (FAO), and are provided together with the LSMS-ISA public-use data sets. The exception is Viet Nam, where we process these data separately and are also able to add nighttime light intensity data.¹³ There are two data limitations with the BIHSs: the BIHS questionnaires provide inconsistent variables for utility expenditures (Appendix A, Table A.11), and there are no geospatial data for this country. Consequently, we do not estimate certain models (Models 8 and 9) for Bangladesh, and we exclude this country from the discussion of overall imputation accuracy (Section 4.1) and from the subsequent meta-analysis (Section 5).

The survey rounds listed above share the same sampling frame for each country and are generally regarded as comparable over time by most data users. This satisfies Assumption 1, that the sampled data in round 1 and round 2 are representative of the same population in each period. As LSMS-type surveys, these surveys are also comparable across countries.
We provide both across-year and within-year imputation results for all the countries except Ethiopia, for which only one survey round is available, so we can test only within-year imputation.

[13] See Tanzania’s National Bureau of Statistics (2011) for more discussion of the geospatial variables in the context of this country. For Viet Nam, we collect and process data from various public data sources, including the Harmonized World Soil Database, OpenStreetMap, and NOAA Climate Data.

4. Estimation Results

4.1. Main Results

Main results

To examine the sensitivity of imputation accuracy to various predictor variables, we build the estimation models cumulatively, with later models sequentially adding more variables to the basic models (Model 1 or Model 2). In total, we employ nine core imputation models across five countries.[14]

Model 1 is the most parsimonious (or basic) model and consists of household size, the household head’s age and gender, the household head’s highest completed level of schooling, a dummy variable indicating whether the head belongs to the ethnic majority group, the shares of household members in the age ranges 0-14, 15-24, 25-59, and 60 and older, a dummy variable indicating whether the head worked in the past 12 months, and a dummy variable indicating urban residence. Model 2 adds household asset variables and house (dwelling) characteristics to Model 1. Household assets include variables indicating whether the household has a car, motorbike, bicycle, desk phone, mobile phone, DVD player, television set, computer, refrigerator, air conditioner, washing machine, or electric fan. House characteristics include the construction materials of the house’s roof and walls and the type of water source and toilet to which the household has access.[15] Models 1 and 2 include standard variables available in most LSMS-type surveys and other types of micro surveys. Model 3 adds total food expenditures to Model 2, and Model 4 adds total non-food expenditures to Model 2.
Models 5 to 8 add to Model 2, respectively, durables expenditures, health expenditures, education expenditures, and utilities expenditures (such as on electricity, water, and garbage). All these expenditures are on a per capita (or per adult equivalent) basis and are converted to logarithmic form. Finally, Model 9 adds utilities expenditures to Model 1. The specific predictors used in the imputation models for Equation (2) for each country are provided in Appendix A, Tables A.1 to A.6.

[14] For misspecified regressions, adding more variables may result in larger inconsistency (Snijders and Bosker, 1994; De Luca, Magnus, and Peracchi, 2018). As such, it is useful to examine imputation accuracy for different models.

[15] For Viet Nam, house wall material is assigned numerical values using the following categories: 6 "cement", 5 "brick", 4 "iron/wood", 3 "earth/straw", 2 "bamboo/board", and 1 "others". Toilet type is assigned numerical values using the following categories: 6 "septic", 5 "suilabh", 4 "double septic", 3 "fish bridge", 2 "others", and 1 "none".

For comparison purposes and robustness checks, we use two estimation methods with different assumptions about the error terms. Method 1 uses the normal linear regression model (assuming that the error terms are normally distributed), and Method 2 uses the empirical distribution of the error terms. Both methods include random effects at the primary sampling unit level for each country.

Table 1 (Panel A) provides the imputed poverty rates for 2018/19 for Bangladesh using the 2015 round as the base survey. The estimation results show that all the imputation models except Models 1 and 3 provide headcount poverty estimates that are not statistically significantly different from (i.e., fall inside the 95 percent confidence interval (CI) of) the “true” poverty rate of 7.3 percent for 2018/19, that is, the poverty rate estimated using the actual consumption data for 2018/19.
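As an illustration of the two estimation methods, the following minimal Python sketch imputes a headcount poverty rate into a target survey that has no consumption data. It is not the authors' implementation: it assumes a single hypothetical predictor `x` (rather than the full predictor sets in Tables A.1 to A.6), omits the random effects at the primary sampling unit, and ignores survey weights. Setting `method="normal"` draws residuals from a normal distribution (as in Method 1), while `method="empirical"` resamples the base-survey residuals (as in Method 2); the function names are hypothetical.

```python
import math
import random
import statistics


def fit_ols(x, y):
    """OLS of y on a constant and one predictor x; returns (a, b, residuals)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid


def impute_headcount(base_x, base_logc, target_x, log_z,
                     method="normal", n_sims=50, seed=0):
    """Impute the headcount poverty rate in a target survey without consumption.

    The model is fitted on the base survey. In each simulation, a residual is
    drawn for every target household and added to its predicted log
    consumption; the result is the share below log_z, averaged over draws.
    """
    rng = random.Random(seed)
    a, b, resid = fit_ols(base_x, base_logc)
    # Residual standard deviation with a degrees-of-freedom correction (2 parameters).
    sigma = math.sqrt(sum(e * e for e in resid) / (len(resid) - 2))
    rates = []
    for _ in range(n_sims):
        poor = 0
        for xi in target_x:
            if method == "normal":
                e = rng.gauss(0.0, sigma)  # Method 1: normal error draws
            elif method == "empirical":
                e = rng.choice(resid)      # Method 2: empirical error draws
            else:
                raise ValueError("method must be 'normal' or 'empirical'")
            if a + b * xi + e < log_z:
                poor += 1
        rates.append(poor / len(target_x))
    return statistics.fmean(rates)
```

Averaging over simulated residual draws, rather than comparing the predicted mean with the poverty line, matters because the poverty rate depends on the spread of consumption around the line and not only on its expected value.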
Regarding the near-poverty rate, both the most basic model (Model 1) and Model 3 offer estimates that lie within the 95 percent CI of the true near-poverty rate of 12.2 percent. In fact, the estimates from Model 1 fall inside one standard error of the true near-poverty rate. The estimated extreme poverty rate and poverty gap for Models 3, 4, and 6 fall inside the 95 percent CI of the true rates of, respectively, 0.6 and 1.1 percent for 2018/19. For extreme poverty, Model 4, which controls for non-food expenditures, produces estimates that fall inside one standard error of the true rate. For the USAID poverty gap, Models 3 and 4, which include food and non-food expenditures, respectively, work well, with estimates that even fall inside one standard error of the true rate.

We turn next to the results for the other countries, shown in Table 1 (Panels B to E) for Malawi, Nigeria, Tanzania, and Viet Nam, respectively.[16] Table 1 (Panel B) provides the predicted poverty rates for 2019/20 for Malawi using the 2016/17 round as the base survey. The estimation results show that six out of nine imputation models, including Model 9, provide headcount poverty estimates that are not statistically significantly different from the true rate of 51.1 percent for 2019/20. These results are consistent with those found by Dang et al. (forthcoming). Regarding the near-poverty rate, all models except Model 3 offer estimates that lie within the 95 percent CI of the true near-poverty rate of 14.1 percent. In fact, all of these estimates except that of Model 8 even fall inside one standard error of the true near-poverty rate. Again, the predicted extreme poverty rate and poverty gap mostly mirror the estimates for the headcount poverty rate, with Model 5 joining the set of models whose estimates fall inside the 95 percent CI of the true rates of, respectively, 20.6 and 17.1 percent for 2019/20.
For the USAID poverty gap, all models except Model 3 (i.e., eight out of nine models) yield good estimates that fall inside the 95 percent CI of the true rate. In fact, seven of these eight models produce estimates that fall inside one standard error of the true rate.

Notably, fewer models work for Nigeria, which may be due to a longer time gap between the base and target surveys (Table 1, Panel C). While no model works for headcount poverty and extreme poverty, eight out of nine models, including Model 9, produce near-poverty estimates that fall inside the 95 percent CI of the true rate of 13.7 percent in 2018/19. Models 1, 2, 7, 8, and 9 provide good estimates for the USAID poverty gap, with estimates even falling inside one standard error of the true rate.

[16] Since we only have data for one survey round for Ethiopia, we are unable to provide similar estimates for this country.

On the other hand, almost all models work for headcount poverty, near-poverty, extreme poverty, and the USAID poverty gap in Tanzania, the only exception being Model 3 for the poverty gap (Table 1, Panel D). For Viet Nam, Models 4, 5, and 9 each work for only two to three out of four indicators (Table 1, Panel E). More models work for the USAID poverty gap than for the other four indicators in Viet Nam, with the estimates from Models 1 to 5 and 7 falling inside the 95 percent CI of the true rate.

The results for imputed mean consumption show a mixed pattern, with certain models performing well for some countries only (Table 2). Model 3 performs well for Bangladesh and Malawi, Model 9 for Viet Nam, and Model 5 for Tanzania and Viet Nam. In Tanzania, five out of nine models produce estimates of mean consumption per capita that fall inside the 95 percent CI of the true mean, and the estimate from Model 6 even falls inside one standard error of the true mean.
As an alternative to the normal linear regression model, we employ the empirical distribution of the error terms. The results, shown in Appendix A, Tables A.7 and A.8, are qualitatively similar.

Overall imputation accuracy

The results shown in Tables 1 and 2 use the most recent pair of survey rounds for each country. We also implemented imputation for the other, older survey rounds for all the countries and years available, and we added geospatial variables where data are available. Given the many across-year imputation model variants that we tested for different countries and years, it is useful to summarize the results graphically. We plot in Figure 1 the imputation accuracy for 15 different models (of which the last two, with nightlight data, are for Viet Nam alone). Imputation accuracy is defined as the share of a model’s estimates that are not statistically significantly different from the true poverty rate; it is computed across all instances of the model’s estimation with a unique pair of base and target surveys in a given country. These models include the core Models 1 to 9 (shown in Tables A.1 to A.6) and four additional models where we further add geospatial variables to Models 2 and 9.

Regarding headcount poverty, Figure 1 suggests that, among the first nine models, Models 3 and 9 perform better than average, with imputation accuracy of 65 and 69 percent, respectively, followed by Model 8 (50 percent). Adding agricultural soil quality and other geospatial characteristics, such as the soil index and distance to facilities, raises the accuracy of Model 9 to 70 and 75 percent, respectively, but does not improve Model 3. On the other hand, adding geospatial nightlight information to Model 2 increases imputation accuracy to 67 percent for Viet Nam. Moreover, adding nightlight information to Model 9 increases accuracy to 83 percent for Viet Nam.
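The accuracy measure defined above can be sketched in a few lines of Python. This is a hedged illustration with hypothetical numbers: it treats an estimate as accurate when it lies within 1.96 standard errors of the true rate, which is the 95 percent CI criterion used in the text, and the function names are our own. Passing `z=1.0` gives the stricter one-standard-error criterion also reported in the tables.

```python
def within_true_ci(estimate, true_rate, true_se, z=1.96):
    """True if the estimate lies inside the CI of half-width z * SE around the true rate."""
    return abs(estimate - true_rate) <= z * true_se


def imputation_accuracy(results, z=1.96):
    """Share of (estimate, true_rate, true_se) triples that fall inside the true CI.

    `results` holds one entry per base/target survey pair estimated with a given model.
    """
    if not results:
        raise ValueError("no estimates to summarize")
    hits = sum(within_true_ci(est, true, se, z) for est, true, se in results)
    return hits / len(results)


# Hypothetical estimates for one model across four base/target survey pairs.
example = [
    (0.50, 0.51, 0.010),
    (0.30, 0.25, 0.010),
    (0.20, 0.21, 0.008),
    (0.40, 0.40, 0.020),
]
print(imputation_accuracy(example))  # 0.75: three of the four estimates are inside the CI
```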
Regarding near-poverty, both Model 3 and Model 9 perform better than average, with imputation accuracy of 77 and 69 percent, respectively, followed by Models 8 and 5 (both up to 65 percent). However, adding geospatial characteristics such as soil quality and distance to facilities to Model 9 only marginally improves its imputation accuracy, to 70 percent. Adding nightlight to Model 9 does not improve it for Viet Nam.

Regarding extreme poverty, Model 3 has the highest imputation accuracy across all the models tested (about 69 percent), followed by Model 9 (54 percent) and Model 4 (46 percent). Model 3 also has the highest imputation accuracy for the poverty gap, at 65 percent, which is above the average model performance.

Unlike for the indicators discussed above, multiple models perform better than average for the USAID poverty gap. Models 8 and 3 have the highest imputation accuracy of 65 percent, followed by Models 1, 2, 7, and 9 (62 percent) and Model 5 (58 percent).

We further plot in Figure 2 the imputation accuracy for mean consumption. Model 3, again, is the best performer, with an imputation accuracy rate of 65 percent, followed by Model 9 with 46 percent. Adding soil quality and distance to facilities to Model 9 increases its imputation accuracy to 50 percent, and adding nightlight to Model 9 increases imputation accuracy to 100 percent for Viet Nam.

Machine learning as alternative

We consider machine learning (ML) as an alternative imputation method.[17] Standard ML procedures split a data sample into a training sample (to estimate the imputation model) and an estimation sample (to obtain out-of-sample predictions). In our context, the base survey and the target survey correspond to the training sample and the estimation sample, respectively.
Employing three common ML techniques (LASSO, Elastic Net, and Random Forest), we show the estimation results in Appendix B, Tables B.1 and B.2 for Tanzania and Viet Nam, respectively. The ML poverty estimates do not work for either country, except for the estimates of consumption mean that are within one standard error of the true mean for LASSO and Elastic Net in Malawi.[18] These inconsistent results are similar to those obtained earlier for poverty imputation in Dang et al. (forthcoming).

[17] See Mullainathan and Spiess (2017) and Athey and Imbens (2019) for recent reviews of ML in economics.

[18] The LASSO and Elastic Net linear models are trained on the first round and tested against the second round. Lambda in LASSO is selected by 10-fold cross-validation for out-of-sample prediction. Alpha and lambda in Elastic Net are selected by 10-fold cross-validation for out-of-sample prediction. The final selected variables and prediction models, with statistics for LASSO and Elastic Net using post-selection coefficient estimates, are shown in Table B.3 for Tanzania and Table B.4 for Viet Nam (Appendix B). The Random Forest model is trained on the first round and tested against the second round. The number of sub-trees is set at 1,000. Both the out-of-bag error and the validation error are used to determine the best possible model. The variable importance matrix is shown in Table B.5 for Tanzania and Table B.6 for Viet Nam.

4.2. Further Extensions

The results shown in the preceding section focus on FGT indexes with α ≤ 2 (i.e., headcount poverty, poverty gap, and poverty gap squared) and mean consumption. We further investigate whether these results still hold in more general settings. In particular, we consider the entire consumption distribution instead of just mean consumption. We also provide estimation results for higher values of α (up to 5). Finally, we consider within-year imputation results.
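For reference, the FGT family (Foster, Greer, and Thorbecke, 1984) can be computed as sketched below. The sketch assumes a list of per capita consumption values, a poverty line `z`, and optional survey weights; setting `alpha` to 0, 1, and 2 gives the headcount ratio, the poverty gap, and the squared poverty gap, and larger values of `alpha` put more weight on the poorest households. The USAID poverty gap used elsewhere in the paper is not covered by this sketch.

```python
def fgt(consumption, z, alpha, weights=None):
    """Foster-Greer-Thorbecke index P_alpha for poverty line z.

    P_alpha = sum_i w_i * ((z - y_i) / z) ** alpha * 1[y_i < z] / sum_i w_i
    alpha = 0: headcount ratio; alpha = 1: poverty gap; alpha = 2: squared gap.
    """
    if weights is None:
        weights = [1.0] * len(consumption)
    total = sum(weights)
    acc = 0.0
    for y, w in zip(consumption, weights):
        if y < z:  # only the poor contribute; a household exactly at z is not poor
            acc += w * ((z - y) / z) ** alpha
    return acc / total


incomes = [50, 80, 100, 150]
for a in (0, 1, 2, 5):
    print(a, fgt(incomes, z=100, alpha=a))
```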
Entire consumption distribution

We plot in Figure 3 the imputation accuracy for different percentiles of the consumption distribution (the 5th, 10th, 25th, 50th (median), 75th, 90th, and 95th percentiles), using the latest survey round for each country. Model 1 works for the upper part of the consumption distribution for Bangladesh and Malawi, producing estimates for the 75th and higher percentiles that are within the 95 percent CI of the true figures (i.e., the gray bands); however, it mostly works for the lower part of the distribution in Nigeria and Tanzania and does not work in Viet Nam. Model 2 works for the 5th percentile in Bangladesh and Nigeria, for the lower parts of the distribution in Tanzania, and for the 95th percentile in Tanzania and Malawi, but it does not work in Viet Nam. Model 3 works from the 10th to 50th percentiles for Bangladesh, from the 10th to 95th percentiles for Malawi and Tanzania, and for the 75th and 95th percentiles for Nigeria; it does not work in Viet Nam. Models 4 and 6 mostly work in Tanzania, but they also work for the lowest 5th and 10th percentiles for Bangladesh, the highest 5th and 10th percentiles for Viet Nam, and the 95th percentile in Malawi. Model 5 works for the full distribution in Viet Nam, but mostly works in the lower parts of the distributions for Tanzania and Bangladesh. Models 7, 8, and 9 work well from the 5th to 50th percentiles for Tanzania and for the 5th percentile in Nigeria, and Model 9 also works from the 5th to 25th percentiles for Viet Nam. In summary, Figure 3 suggests that Models 3 and 9 work better than the other models. Figure A.1 in Appendix A summarizes the number of models that offer estimates that are not statistically different from the true estimates.
For a more in-depth look into these models’ performance, we plot in Figure 4 the results for Model 3, which covers all countries. Except for Viet Nam, Model 3 works reasonably well for predicting consumption values over the entire distribution, with the estimates overlapping with the true values and their 95 percent CIs (i.e., the dotted red line and the gray band). We also plot the results for Model 9 in Appendix A, Figure A.2, which excludes Bangladesh. This figure suggests that Model 9 works well for predicting consumption values over the entire distribution for Tanzania and Viet Nam, while it works from the 75th percentile upward for Malawi and for the lowest part of the distribution in Nigeria. We return to model performance in the meta-analysis in Section 5.

Other FGT indexes

As discussed earlier, a larger α in the FGT poverty index implies greater poverty aversion. We consider Tanzania as an example, letting α go up to 5, and plot the results in Figure 5. This figure shows that all the imputation models except Model 3 work for the FGT indexes with these different values of α. For the USAID poverty gap, all the models work for all values of α, except that Model 3 does not work for α between 3 and 5 (Appendix A, Figure A.3).

Within-year imputation

For the within-year imputation, we divide the estimation sample for each country into two random halves.[19] We then use one random half as the base survey and impute from it into the other random half, which serves as the target survey. The estimation results suggest that within-year imputation works well for most models in every country. Summarizing the results for Bangladesh, Ethiopia, Viet Nam, Malawi, Nigeria, and Tanzania, Figure 6 indicates that the estimates fall within the 95 percent CI of the true poverty rates for the majority of the models.
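The random split used for within-year imputation can be sketched as follows. This minimal example uses hypothetical simulated log consumption and treats the sample as the universe of households, as the text describes; it ignores stratification, clustering, and survey weights, which a real application would need to respect, and the helper names are hypothetical.

```python
import random


def random_halves(household_ids, seed=0):
    """Randomly split the sampled households into two halves (base, target)."""
    ids = list(household_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def headcount(logc, ids, log_z):
    """Unweighted share of the given households with log consumption below log_z."""
    return sum(logc[i] < log_z for i in ids) / len(ids)


# Hypothetical survey: 1,000 households with simulated log consumption.
rng = random.Random(42)
logc = {hh: rng.gauss(1.0, 0.6) for hh in range(1000)}
base_ids, target_ids = random_halves(logc.keys(), seed=7)
print(len(base_ids), len(target_ids))
print(headcount(logc, base_ids, 1.0), headcount(logc, target_ids, 1.0))
```

Because the two halves are complementary random draws from the same sample, their actual poverty rates differ only by sampling noise, which is what makes one half a natural base survey for the other.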
Out of the nine models considered, at least seven work for all the countries and all the outcomes, except for Malawi and Nigeria (regarding the USAID poverty gap) and Viet Nam (regarding headcount poverty); even for these exceptions, at least three models still work. In Bangladesh, at least four of the seven models estimated work for all the outcomes. We offer more detailed results for each country in Figures A.4-A.9 in Appendix A.

These results have several practical implications for survey implementation for poverty imputation. First, in contexts where only a single base survey is at hand, it could be tempting to carry out a similar within-survey imputation exercise and select the best-performing model for across-year imputation. We would strongly advise against this approach. The reason is that while all the tested models appear to achieve comparable within-year imputation performance, only a subset of the models can fulfill across-year imputation needs and provide poverty estimates that are not statistically significantly different from the true poverty rates.

[19] We treat each household survey as the universe of households for each country and apply random sampling to the sampled households to obtain the random halves. The poverty rates based on the actual consumption data for these random halves are thus not identical to, but are very close to, those based on all the sampled households.

Second, these results support findings in earlier studies (see, e.g., Dang and Verme (2023) in the context of refugees) that within-year imputation may offer a promising way to obtain poverty estimates at lower cost in various situations. For example, data may not be collected for a location for reasons beyond one’s control, such as inaccessible roads, unexpected natural calamities (e.g., floods, storms, or landslides), or conflict and violence.
Alternatively, survey costs at a specific location may simply be prohibitive. In these cases, if the welfare variable exists for another geographical location that is comparable to the location without these data, we can employ our proposed technique to provide imputation-based poverty estimates for the latter location.[20]

[20] To ensure that geographical locations are comparable, we may need to bring in additional information on other aspects (e.g., some qualitative information about income levels or poverty rates for these regions).

5. Meta-analysis

The results shown in Figures 1 and 2 are obtained by simply averaging, for each imputation model, the results across countries, years, and other variables (e.g., region or estimation method). To further take into account the potential contributions of these model characteristics, we estimate the following logit regression with country fixed effects:

$$A_{kn} = F\left(\sum_{k=1}^{K} \beta_k' M_k + \gamma C_n + \varepsilon_{kn}\right) \tag{11}$$

where $A_{kn}$ is a binary variable that equals 1 if the poverty estimate is not statistically significantly different from the true poverty rate and 0 otherwise, for $k = 1, \ldots, K$ models and $n = 1, \ldots, N$ countries. $F(\cdot)$ is the logit function, i.e., $F(x) = \frac{1}{1 + e^{-x}}$; $M_k$ are the dummy variables indicating the imputation models, $C_n$ are the country dummy variables, and $\varepsilon_{kn}$ is the error term. The dynamics between a country dummy variable and the performance of the imputation models can be captured, to varying extents, by the characteristics of the imputation models.
Consequently, to shed more light on these differences, we can replace the country dummy variables with the model characteristics and estimate the following alternative equation:

$$A_{kn} = F\left(\sum_{k=1}^{K} \beta_k' M_k + \delta' X_{kn} + \varepsilon_{kn}\right) \tag{12}$$

where $X_{kn}$ are the model characteristics, such as the true poverty rate in the target survey, the (logarithm of the) sample size of the base survey, the time difference between the base survey and the target survey, the number of pairs of survey rounds available for analysis, the model goodness-of-fit (as measured by R2), and the estimation method (normal linear regression model or empirical distribution of the error terms). However, the model characteristics can only offer a guide to model selection, since they likely represent a correlational (rather than causal) and ex post relationship with the imputation outcomes. While the estimation results are strongest when they agree under both equations, our preferred equation for interpretation, particularly where the estimates differ, is Equation (11), which lays out the models a priori.[21]

[21] This concern is particularly relevant to the estimated model parameters (versus the exogenous model parameters given by the data). As an example, the correlation between the model goodness-of-fit statistic R2 (or the correlation between the predicted and actual consumption for the target survey, ρ(y, ŷ)) and the model numbers is around -0.34 and strongly statistically significant for the whole country sample. As such, we do not include them in the regressions for Equations (5) and (6).

For easier interpretation, Table 3 shows the marginal effects from the logit regressions for Equations (11) and (12), using Model 1 as the reference model. The associated regression results are presented in Appendix A, Table A.9.[22] We estimate robust standard errors clustered at the country level for both equations. Several interesting findings stand out from Table 3.
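The marginal effects reported in Table 3 come from logit regressions such as Equation (11). The following minimal pure-Python sketch, which is not the authors' estimation code, fits a logit by Newton-Raphson on a tiny hypothetical dataset (an intercept, one model dummy, and one country dummy) and computes the average discrete change in the predicted probability of accurate imputation when the model dummy switches on, relative to the reference model. It does not compute the country-clustered robust standard errors used in the paper, and all function names are hypothetical.

```python
import math


def logistic(t):
    """Logit CDF F(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def solve(a, b):
    """Solve the square system a v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logit(X, y, iters=25):
    """Maximum-likelihood logit coefficients via Newton-Raphson steps (X'WX)^-1 X'(y - p)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [logistic(sum(bj * xj for bj, xj in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        hess = [[sum(pi * (1.0 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta


def model_marginal_effect(beta, X, col, model_cols):
    """Average change in Pr(accurate) when each row switches from the reference
    model (all model dummies 0) to the model whose dummy sits in column `col`."""
    total = 0.0
    for row in X:
        off = list(row)
        for c in model_cols:
            off[c] = 0.0
        on = list(off)
        on[col] = 1.0
        total += (logistic(sum(b * v for b, v in zip(beta, on)))
                  - logistic(sum(b * v for b, v in zip(beta, off))))
    return total / len(X)


# Hypothetical data: columns = [intercept, Model 9 dummy, country 2 dummy].
# The reference model is accurate in 4 of 10 cases per country, Model 9 in 8 of 10.
X, y = [], []
for country in (0, 1):
    for model9, n_accurate in ((0, 4), (1, 8)):
        for i in range(10):
            X.append([1.0, float(model9), float(country)])
            y.append(1.0 if i < n_accurate else 0.0)

beta = fit_logit(X, y)
print(round(model_marginal_effect(beta, X, col=1, model_cols=[1]), 3))
```

In this balanced example the accuracy shares do not vary by country, so the fitted country coefficient is zero and the marginal effect recovers the difference in accuracy shares between Model 9 and the reference model.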
First, regarding the specific imputation models to use, differences exist by type of indicator. Among the nine main models, Model 9 performs the best for headcount poverty: it raises the probability of accurate imputation by 0.4 under both Specification 1 (the preferred Equation (11)) and Specification 2 (Equation (12)), with the results for both specifications being statistically significant at the 5 percent level. Except for the USAID poverty gap, Model 9 also works for most of the remaining indicators to varying degrees. In particular, this model raises the probability of imputation accuracy for near-poverty, extreme poverty, and poverty gap by around 0.1-0.2 (under Specification 1), but these differences are only marginally statistically significant, at the 10 percent level, except for extreme poverty, where the difference is statistically significant at the 5 percent level. Notably, Model 9 under Specification 2 works well for consumption mean, increasing the probability of imputation accuracy by 0.2, a result that is statistically significant at the 1 percent level. Further adding agricultural soil quality information to Model 9 (i.e., creating Model 13) results in much higher imputation accuracy (and stronger statistical significance) for headcount poverty, increasing the probability of imputation accuracy by 0.5 under both specifications.

Model 3 works for headcount poverty, extreme poverty, poverty gap, and consumption mean, raising the probability of accurate imputation for these indicators by around 0.3 (under Specification 1). These results are statistically significant at the 5 percent level or better. For the USAID poverty gap, while Model 3 is strongly statistically significant under Specification 2, it does not work under Specification 1. Several models work for the USAID poverty gap under Specification 2 but not under Specification 1. These include Models 4 to 8; the results are statistically significant at the 5 percent level for Models 4, 5, and 8 and marginally statistically significant at the 10 percent level for Models 6 and 7.

[22] Alternatively, we can employ an ordered logit regression, where the outcome variable takes the value of 2 if the poverty estimate falls within one standard error of the true poverty rate, 1 if it falls within the 95 percent CI of the true poverty rate, and 0 otherwise. The results, shown in Appendix A, Table A.10, are qualitatively similar but less statistically significant. For example, the pseudo-R2 values for headcount poverty and near-poverty in this table are about half (or less) of those for the logit regressions shown in Appendix A, Table A.9 for Specifications 1 and 2.

Finally, the estimation results using the estimated model parameters (Specification 2) indicate that a larger time interval between the base survey and the target survey is generally associated with lower imputation accuracy for headcount poverty, extreme poverty, and poverty gap. Higher levels of extreme poverty, poverty gap, and consumption mean are positively associated with imputation accuracy. While more survey rounds help increase imputation accuracy for the poverty rate and poverty gap (possibly through higher data quality and/or greater local survey staff capacity as more surveys are implemented), a higher model goodness-of-fit (R2) does not help. However, as discussed earlier, the relationship between the estimated model parameters and imputation accuracy is, at best, correlational, so these results should be regarded as indicative and should be investigated further.

6. Conclusion

We make several new conceptual and empirical contributions to the literature on survey-to-survey imputation of poverty estimates. Conceptually, we significantly expand this literature to various poverty indicators, including near-poverty (vulnerability) status, extreme poverty, poverty gap, and other FGT poverty indexes.
These extensions go beyond the existing literature, which focuses almost exclusively on the headcount poverty rate. Furthermore, we examine the performance of imputed household consumption, which underlies these poverty indicators. Empirically, we harmonize and rigorously analyze data from 22 recent rounds of multi-topic household surveys conducted over the past decade in Bangladesh, Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam. These six countries span three regions (i.e., Sub-Saharan Africa, South Asia, and Southeast Asia) and different income levels (i.e., low-income to lower-middle-income) and offer the most comprehensive dataset that has been analyzed to date.

We find that survey-to-survey imputation provides encouraging results. However, imputation model heterogeneity exists, with certain models performing better for some poverty indicators only. In particular, for headcount poverty, adding household utilities expenditures (including electricity, water, and garbage) to a basic imputation model that includes household demographic and employment characteristics (Model 9) performs the best. Compared to a reference imputation model with basic demographic and employment variables, it raises the probability of imputation accuracy by 0.4. Model 9 also works for most of the remaining indicators, but to varying degrees, raising the probability of imputation accuracy for near-poverty, extreme poverty, and poverty gap by around 0.1-0.2. Further adding agricultural soil quality information to Model 9 increases the probability of imputation accuracy for headcount poverty by 0.5. Alternatively, adding food consumption expenditures to an imputation model that includes household demographics, employment, assets, and house characteristics (Model 3) works for headcount poverty, extreme poverty, poverty gap, and consumption mean, raising the probability of accurate imputation for these indicators by around 0.3.
The results are strongly statistically different at the 5 percent level or less. 29 Further testing the imputation models with meta-analysis, we find that certain, but not all, model specifications work. In particular, Model 9 works better for mean consumption levels under a model specification (Specification 2) that includes model characteristics rather than country dummy variables (Specification 1). For the USAID poverty gap, several models work under Specification 2 but do not work in Specification 1. These include Models 3, 4, 5, 6, 7, and 8, with varying degrees of statistical significance. While these results provide some tentative evidence that these models may be used to obtain poverty estimates, they also need further study for more robustness. The estimation results using the estimated model parameters (Specifications 2) also indicate that a larger time interval between the base survey and the target survey is generally associated with lower imputation accuracy for headcount poverty, extreme poverty, and poverty gap. On the other hand, more (higher) extreme poverty, poverty gap and consumption mean are positively associated with imputation accuracy, as is more survey rounds. A higher model goodness-of-fit (R2) does not necessarily help with raising across year imputation accuracy. However, as discussed earlier, the relationship between the estimated model parameters and the imputation accuracy is, at best, correlational, so these results should be regarded as indicative and should be further investigated. These results are broadly consistent with earlier studies and offer useful inputs for future survey design. Collecting data on utilities expenditures or food expenditures clearly requires fewer resources and less time than implementing a full-fledged household consumption survey, thus employing imputation methods in combination with these data to provide updated poverty 30 estimates presents a cost-effective option. 
Furthermore, in contexts where collecting such data requires relatively little survey effort (e.g., where receipts for these expenditures are largely digitalized), these advantages appear even stronger. Given the increasing digitalization of payment transactions around the world, such contexts may become much more common in the near future.

²³ Dang et al. (2024) offer experimental evidence from Tanzania suggesting that collecting data on either a reduced number of food consumption categories or more aggregated ones could significantly improve the imputation accuracy of poverty estimates.

References

Altındağ, O., O'Connell, S. D., Şaşmaz, A., Balcıoğlu, Z., Cadoni, P., Jerneck, M., and Foong, A. K. (2021). "Targeting humanitarian aid using administrative data: model design and validation." Journal of Development Economics, 148: 102564.
Athey, S., and Imbens, G. W. (2019). "Machine learning methods that economists should know about." Annual Review of Economics, 11: 685-725.
Baulch, Bob (Ed.). (2011). Why Poverty Persists: Poverty Dynamics in Asia and Africa. Cheltenham, UK: Edward Elgar Publishing.
Beltramo, Theresa, Hai-Anh Dang, Ibrahima Sarr, and Paolo Verme. (2024). "Estimating Poverty among Refugee Populations: A Cross-Survey Imputation Exercise for Chad." Oxford Development Studies, 52(1): 94-113.
Christiaensen, Luc, Ethan Ligon, and Thomas P. Sohnesen. (2022). "Should Consumption Sub-Aggregates be Used to Measure Poverty?" World Bank Economic Review, 36(2): 413-432.
Christiaensen, Luc, Peter Lanjouw, Jill Luoto, and David Stifel. (2012). "Small Area Estimation-based Prediction Models to Track Poverty: Validation and Applications." Journal of Economic Inequality, 10(2): 267-297.
Dang, Hai-Anh, and Peter Lanjouw. (2023). "Regression-based Imputation for Poverty Measurement in Data Scarce Settings." In Jacques Silber (Ed.), Handbook of Research on Measuring Poverty and Deprivation. Edward Elgar Publishing.
Dang, Hai-Anh, and Paolo Verme. (2023). "Estimating Poverty for Refugees in Data-scarce Contexts: An Application of Cross-Survey Imputation." Journal of Population Economics, 36(2): 653-679.
Dang, Hai-Anh, Dean Jolliffe, and Calogero Carletto. (2019). "Data Gaps, Data Incomparability, and Data Imputation: A Review of Poverty Measurement Methods for Data-Scarce Environments." Journal of Economic Surveys, 33(3): 757-797.
Dang, Hai-Anh, Peter Lanjouw, and Umar Serajuddin. (2017). "Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country." Oxford Economic Papers, 69(4): 939-962.
Dang, Hai-Anh, Talip Kilic, Calogero Carletto, and Kseniya Abanokova. (Forthcoming). "Poverty Imputation in Contexts without Consumption Data: A Revisit with Further Refinements." Review of Income and Wealth.
Dang, Hai-Anh, Talip Kilic, Vladimir Hlasny, Kseniya Abanokova, and Calogero Carletto. (2024). "Using Survey-to-Survey Imputation to Fill Poverty Data Gaps at a Low Cost: Evidence from a Randomized Survey Experiment." IZA Discussion Paper No. 16792.
De Luca, Giuseppe, Jan R. Magnus, and Franco Peracchi. (2018). "Balanced variable addition in linear models." Journal of Economic Surveys, 32(4): 1183-1200.
Deaton, Angus, and Valerie Kozel. (2005). The Great Indian Poverty Debate. New Delhi: Macmillan.
Deaton, Angus, and John Muellbauer. (1980). Economics and Consumer Behavior. Cambridge, UK: Cambridge University Press.
Deaton, Angus, and Salman Zaidi. (2002). Guidelines for Constructing Consumption Aggregates for Welfare Analysis (Vol. 135). Washington, DC: World Bank Publications.
Devarajan, Shantayanan. (2013). "Africa's statistical tragedy." Review of Income and Wealth, 59: S9-S15.
Douidich, Mohamed, Abdeljaouad Ezzrari, Roy van der Weide, and Paolo Verme. (2016). "Estimating Quarterly Poverty Rates Using Labor Force Surveys: A Primer." World Bank Economic Review, 30(3): 475-500.
Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. (2003). "Micro-Level Estimation of Poverty and Inequality." Econometrica, 71(1): 355-364.
Fernandez, J., Olivieri, S., and Wambile, A. (2024). "Reconstructing 2010–2022 Poverty and Inequality Trends in Bangladesh: A Statistical Matching Approach." Policy Research Working Paper 10749. Washington, DC: World Bank.
Foster, J., Greer, J., and Thorbecke, E. (1984). "A class of decomposable poverty measures." Econometrica, 52(3): 761-766.
Foster, J., Greer, J., and Thorbecke, E. (2010). "The Foster–Greer–Thorbecke (FGT) poverty measures: 25 years later." Journal of Economic Inequality, 8: 491-524.
Jerven, Morten. (2019). "The Problems of Economic Data in Africa." In Oxford Research Encyclopedia of Politics.
Kilic, T., and Sohnesen, T. (2019). "Same question but different answer: experimental evidence on questionnaire design's impact on poverty measured by proxies." Review of Income and Wealth, 65(1): 144-165.
Lain, J., and Vishwanath, T. (2022). A Better Future for All Nigerians: Nigeria Poverty Assessment 2022. Washington, DC: World Bank Group.
Letta, Marco, Pierluigi Montalbano, and Richard S. J. Tol. (2018). "Temperature shocks, short-term growth and poverty thresholds: Evidence from rural Tanzania." World Development, 112: 13-32.
Mathiassen, Astrid. (2009). "A Model Based Approach for Predicting Annual Poverty Rates without Expenditure Data." Journal of Economic Inequality, 7: 117-135.
Mathiassen, Astrid, and Bjørn K. Getz Wold. (2021). "Predicting poverty trends by survey-to-survey imputation: the challenge of comparability." Oxford Economic Papers, 73(3): 1153-1174.
Mullainathan, S., and Spiess, J. (2017). "Machine learning: an applied econometric approach." Journal of Economic Perspectives, 31(2): 87-106.
Naudé, W., and Vinuesa, R. (2020). "Data, global development, and COVID-19." WIDER Working Paper 2020/109.
Ravallion, Martin. (2016). The Economics of Poverty: History, Measurement, and Policy. New York: Oxford University Press.
Serajuddin, Umar, Hiroki Uematsu, Christina Wieser, Nobuo Yoshida, and Andrew Dabalen. (2015). "Data deprivation: another deprivation to end." Policy Research Working Paper 7252. Washington, DC: World Bank.
Sheahan, Megan, and Christopher B. Barrett. (2017). "Ten striking facts about agricultural input use in Sub-Saharan Africa." Food Policy, 67: 12-25.
Snijders, Tom A. B., and Roel J. Bosker. (1994). "Modeled variance in two-level models." Sociological Methods & Research, 22(3): 342-363.
Stifel, D., and Christiaensen, L. (2007). "Tracking Poverty over Time in the Absence of Comparable Consumption Data." World Bank Economic Review, 21: 317-341.
Tanzania National Bureau of Statistics. (2011). Basic Information Document: National Panel Survey 2010-11.
Tarozzi, Alessandro. (2007). "Calculating Comparable Statistics from Incomparable Surveys, With an Application to Poverty in India." Journal of Business and Economic Statistics, 25(3): 314-336.
UNESCO-UIS/OECD/EUROSTAT (UOE). (2020). Data collection on formal education: Manual on concepts, definitions and classifications. Montreal/Paris/Luxembourg.
United States Census Bureau. (2017). Current Population Survey, Imputation of Unreported Data Items. Accessed May 24, 2021, at https://www.census.gov/programs-surveys/cps/technical-documentation/methodology/imputation-of-unreported-data-items.html
World Bank. (2021). World Development Report 2021: Data for Better Lives. Washington, DC: World Bank.

Table 1.
Predicted Poverty Rates Based on Imputation (percentage)
Second-round estimates by imputation model; true rates in the last column; standard errors in parentheses.

Indicator                  Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9    True

Panel A: Bangladesh 2015-2018/19
Headcount poverty rate        13.0    7.7*     8.5    7.0*     8.0    7.0*    7.8*     N/A     N/A     7.3
                             (0.7)   (0.6)   (0.8)   (0.7)   (0.7)   (0.6)   (0.7)                   (0.6)
Near-poverty rate            12.2*     9.5    11.3    10.2     9.9     9.2     9.5     N/A     N/A    12.2
                             (0.5)   (0.6)   (0.7)   (0.6)   (0.6)   (0.6)   (0.6)                   (0.6)
Extreme poverty rate           2.3     1.0     0.9    0.6*     0.9     0.8     1.0     N/A     N/A     0.6
                             (0.3)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)                   (0.2)
Poverty gap                    2.6     1.4     1.4    1.1*     1.4    1.2*     1.4     N/A     N/A     1.1
                             (0.2)   (0.2)   (0.2)   (0.1)   (0.2)   (0.1)   (0.2)                   (0.1)
USAID poverty gap             19.9    17.5    16.5   15.9*    17.2    17.0    17.6     N/A     N/A    15.2
                             (0.7)   (1.0)   (0.9)   (0.9)   (0.9)   (1.0)   (1.0)                   (0.8)
N                            5,604   5,604   5,604   5,604   5,604   5,604   5,604                   5,604

Panel B: Malawi 2016/17-2019/20
Headcount poverty rate        53.2    52.7    60.5    52.7    53.6    52.6    52.6    52.2    52.3    51.1
                             (0.9)   (1.0)   (1.2)   (1.2)   (1.0)   (1.0)   (1.0)   (1.0)   (0.9)   (0.9)
Near-poverty rate            14.0*   14.3*    12.4   14.2*   14.4*   14.4*   14.3*    14.5   14.4*    14.1
                             (0.4)   (0.4)   (0.5)   (0.5)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)
Extreme poverty rate          22.8    21.7    29.5    21.7    22.0    21.5    21.8   21.2*    21.8    20.6
                             (0.7)   (0.7)   (1.1)   (0.9)   (0.8)   (0.7)   (0.8)   (0.8)   (0.7)   (0.8)
Poverty gap                   18.2    17.6    22.5    17.6    17.9   17.5*    17.7   17.3*    17.7    17.1
                             (0.4)   (0.5)   (0.7)   (0.6)   (0.5)   (0.5)   (0.5)   (0.5)   (0.4)   (0.4)
USAID poverty gap             34.3   33.5*    37.2   33.4*   33.4*   33.3*   33.6*   33.3*   33.8*    33.5
                             (0.4)   (0.4)   (0.6)   (0.5)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.5)
N                           11,432  11,432  11,432  11,432  11,432  11,432  11,432  11,432  11,432  11,432

Panel C: Nigeria 2012/13-2018/19
Headcount poverty rate        33.4    33.3    50.8    27.2    24.7    29.4    33.2    34.1    34.5    46.4
                             (1.9)   (2.0)   (2.6)   (2.1)   (1.9)   (1.9)   (2.1)   (2.1)   (2.0)   (1.9)
Near-poverty rate             12.2    12.7   13.6*   13.4*    11.3    12.6    12.5    12.6    12.2    13.7
                             (0.9)   (0.9)   (1.0)   (1.0)   (0.9)   (0.9)   (0.9)   (0.9)   (0.9)   (1.0)
Extreme poverty rate          15.1    14.8    26.4     9.2     9.9    12.3    14.9    15.5    15.8    22.2
                             (1.4)   (1.5)   (2.2)   (1.3)   (1.2)   (1.3)   (1.5)   (1.5)   (1.4)   (1.4)
Poverty gap                   10.9    10.9    17.9     7.4     7.5     9.2    10.9    11.3    11.4    15.3
                             (0.8)   (0.9)   (1.3)   (0.8)   (0.8)   (0.8)   (0.9)   (0.9)   (0.9)   (0.8)
USAID poverty gap            32.7*   32.7*    35.3    27.2    30.4    31.3   32.9*   33.0*   33.1*    33.0
                             (1.2)   (1.3)   (1.1)   (1.2)   (1.5)   (1.4)   (1.3)   (1.3)   (1.1)   (0.8)
N                            4,976   4,976   4,976   4,976   4,976   4,976   4,976   4,976   4,976   4,976

Panel D: Tanzania 2019/20-2020/21
Headcount poverty rate       17.4*   17.5*    19.4   16.9*   17.3*    16.7   17.5*   17.8*   18.2*    17.8
                             (1.0)   (1.3)   (1.4)   (1.3)   (1.2)   (1.3)   (1.3)   (1.3)   (1.1)   (1.1)
Near-poverty rate            10.5*    10.9   10.6*    10.9   10.9*   10.3*   10.8*    11.0   10.4*    10.2
                             (0.7)   (0.7)   (0.8)   (0.8)   (0.8)   (0.7)   (0.7)   (0.8)   (0.7)   (0.7)
Extreme poverty rate          9.7*    9.4*    11.1    9.1*    9.3*    9.0*    9.5*    9.7*   10.3*     9.8
                             (0.8)   (0.9)   (1.1)   (1.0)   (0.9)   (1.0)   (1.0)   (1.0)   (0.9)   (0.8)
Poverty gap                   4.6*    4.4*     5.3     4.2    4.3*     4.2    4.4*    4.5*    4.9*     4.6
                             (0.4)   (0.4)   (0.5)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.3)
USAID poverty gap            26.3*   25.3*    27.5   24.9*   25.1*   25.2*   25.4*   25.5*   26.9*    25.9
                             (1.1)   (1.1)   (1.3)   (1.1)   (1.1)   (1.2)   (1.1)   (1.1)   (1.1)   (1.1)
N                            4,644   4,644   4,644   4,644   4,644   4,644   4,644   4,644   4,644   4,644

Panel E: Viet Nam 2014-2016
Headcount poverty rate        15.0    13.3     6.1     8.4    10.6    12.5    13.3    11.4    9.6*     9.6
                             (0.5)   (0.6)   (0.5)   (0.5)   (0.5)   (0.6)   (0.6)   (0.6)   (0.5)   (0.4)
Near-poverty rate              9.3     8.5     4.9     6.0    7.1*     8.0     8.5     7.7    6.8*     6.9
                             (0.4)   (0.4)   (0.3)   (0.4)   (0.3)   (0.4)   (0.4)   (0.4)   (0.3)   (0.3)
Extreme poverty rate           2.0     2.0     0.7    1.2*     1.5     2.1     2.1     2.0     2.0     1.2
                             (0.2)   (0.3)   (0.1)   (0.2)   (0.2)   (0.3)   (0.3)   (0.3)   (0.3)   (0.2)
Poverty gap                    4.0     3.7     1.5     2.3     2.9     3.5     3.7     3.2     2.9     2.5
                             (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.3)   (0.2)
USAID poverty gap             26.7    27.6    24.9    27.4    27.1    28.1    27.8    28.3    30.1    26.1
                             (0.7)   (0.9)   (1.3)   (1.2)   (1.0)   (1.0)   (0.9)   (1.2)   (1.7)   (0.9)
N                            9,347   9,347   9,347   9,347   9,347   9,347   9,347   9,347   9,347   9,347

Control variables
Food expenditures  Y
Non-food expenditures  Y
Furnishings and durable household expenses  Y
Health expenditures  Y
Education expenditures  Y
Utilities expenditures  Y Y
Household assets & house characteristics  Y Y Y Y Y Y Y
Demographics & employment  Y Y Y Y Y Y Y Y Y

Note: Normal linear regression with bootstrapped standard errors is used. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. All estimates are obtained with population weights. The normal linear regression model with the theoretical distribution of the error terms employs cluster random effects. 'Near poor' status is defined as living on an income between 100 and 125% of the poverty line. All indicators are expressed in percent. The true rate is the estimate directly obtained from the survey data. Estimates shown in boldface or with an asterisk (*) fall within the 95% confidence interval or within one standard error of the true rate, respectively.

Table 2. Predicted Mean (Log) Consumption Based on Imputation
Second-round estimates by imputation model; true values in the last column; standard errors in parentheses.

Country and survey rounds      Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9    True
Bangladesh, 2015 to 2018/19       10.7    10.8    10.8    10.8    10.8    10.9    10.8     N/A     N/A    10.8
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)                   (0.0)
Malawi, 2016/17 to 2019/20        12.1    12.1    12.0    12.1    12.1    12.1    12.1    12.1    12.1    12.1
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)
Nigeria, 2012/13 to 2018/19       11.3    11.3    11.0    11.4    11.4    11.3    11.3    11.3    11.3    11.0
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)
Tanzania, 2019/20 to 2020/21      13.7    13.7    13.7    13.7    13.7   13.7*    13.7    13.7    13.7    13.7
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)
Viet Nam, 2014 to 2016             9.6     9.7    10.0     9.8    9.8*     9.7     9.7     9.7    9.8*     9.8
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)

Control variables
Food expenditures  Y
Non-food expenditures  Y
Furnishings and household expenses  Y
Health expenditures  Y
Education expenditures  Y
Utilities expenditures  Y Y
Household assets & house characteristics  Y Y Y Y Y Y Y
Demographics & employment  Y Y Y Y Y Y Y Y Y

Note: Normal linear regression with bootstrapped standard errors is used.
The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. All estimates are obtained with population weights. The normal linear regression model with the theoretical distribution of the error terms employs cluster random effects. Imputed consumption per capita for the second round uses the estimated parameters based on the first-round data. 100 simulations are implemented. True consumption per capita is the estimate directly obtained from the survey data. Estimates shown in boldface or with an asterisk (*) fall within the 95% confidence interval or within one standard error of the true value, respectively.

Table 3. Meta-analysis of Imputation Models and Their Parameters, Marginal Effects from Logit Regressions

Figure 1. Imputation Accuracy for Different Imputation Models, Poverty Indicators
[Five panels: headcount poverty rate, near-poverty rate, extreme poverty rate, poverty gap, and USAID poverty gap. Each panel plots imputation accuracy (%, 0 to 100) for models m1 to m9 and for models m2, m3, and m9 augmented with dist_facilities, soil_quality, or nightlight.]
Note: Imputation accuracy is the share of estimates that are not statistically significantly different from the true poverty rates, across all countries and years. The red dashed line indicates mean accuracy.

Figure 2. Imputation Accuracy for Different Imputation Models
[One panel, consumption mean: imputation accuracy (%, 0 to 100) for models m1 to m9 and for models m2, m3, and m9 augmented with dist_facilities, soil_quality, or nightlight.]

Figure 3. Distribution of imputed log of consumption
[Five panels: Bangladesh 2015-2018/19, Malawi 2016/17-2019/20, Nigeria 2012/13-2018/19, Tanzania 2019/20-2020/21, and Viet Nam 2014-2016. Each panel shows models m1 to m9 against the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.]

Figure 4. Predicted consumption distribution, Model 3
[Five panels: Bangladesh 2015-2018/19, Malawi 2016/17-2019/20, Nigeria 2012/13-2018/19, Tanzania 2019/20-2020/21, and Viet Nam 2014-2016. Each panel plots the log of consumption at the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles. Legend: 95% CIs of true value; true percentile value; 95% CIs of imputed value; imputed percentile value.]

Figure 5.
Predicted FGT indexes, Tanzania 2019/20-2020/21
[Five panels, FGT 1 to FGT 5, each plotting the estimated poverty gap (%) against imputation models m1 to m9.]
Note: Estimates are obtained by imputing from sample 1 into sample 2. 100 simulations are implemented. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicate that the estimates are not statistically significantly different from the true poverty gap. Dashed lines represent the true poverty gap. Dotted lines represent the confidence intervals of the true poverty gap. Estimates are obtained using the normal linear regression models.

Figure 6. Number of Models with Predicted Estimates That Are Statistically Insignificantly Different from the True Poverty Estimates, Within-Year Imputation
[Six panels: (A) headcount poverty, (B) near poverty, (C) extreme poverty, (D) poverty gap, (E) USAID poverty gap, and (F) consumption mean. Each panel plots the number of models (0 to 9) for Bangladesh, Ethiopia, Malawi, Nigeria, Tanzania, and Viet Nam.]
Note: Estimates are obtained by imputing from sample 1 into sample 2.
100 simulations are implemented. Estimates are obtained using the normal linear regression models.

Appendix A: Additional Tables and Figures

Table A.1. Household consumption model, Bangladesh 2015
Standard errors in parentheses.

Variable                                       Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   Model 7
Household size                               -0.054*** -0.067*** -0.024*** -0.031*** -0.055*** -0.062*** -0.072***
                                                (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)
Age of HH head                                0.002***   0.001**     0.000     0.000  0.002***   0.001**     0.000
                                                (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)
HH head is female                             0.058***    -0.005  0.033***  -0.023**    -0.014     0.003    -0.020
                                                (0.02)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Head has less than 5 years of schooling       0.091***  0.039***  0.022***     0.005   0.029**   0.031**   0.035**
                                                (0.02)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Head has 5-9 years of schooling               0.204***  0.078***  0.040***    0.015*  0.051***  0.072***  0.074***
                                                (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Head has 10 or more years of schooling        0.451***  0.151***  0.079***   0.033**  0.115***  0.145***  0.134***
                                                (0.02)    (0.02)    (0.01)    (0.01)    (0.02)    (0.02)    (0.02)
Share of HH members aged 0-14                -0.742*** -0.451*** -0.184*** -0.202*** -0.348*** -0.386*** -0.533***
                                                (0.04)    (0.03)    (0.02)    (0.03)    (0.03)    (0.03)    (0.03)
Share of HH members aged 15-24               -0.187*** -0.116***    -0.027 -0.092*** -0.139***  -0.068** -0.135***
                                                (0.04)    (0.03)    (0.02)    (0.02)    (0.03)    (0.03)    (0.03)
Share of HH members aged 60 and older        -0.322*** -0.187*** -0.086***  -0.047** -0.128*** -0.210*** -0.129***
                                                (0.03)    (0.03)    (0.02)    (0.02)    (0.03)    (0.03)    (0.03)
HH head did wage/salary work (last 7 days)   -0.181*** -0.094*** -0.046*** -0.029*** -0.060*** -0.076*** -0.095***
                                                (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
HH head was self-employed (last 7 days)        0.027**    0.022*  -0.014**  0.025***     0.015  0.037***     0.013
                                                (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Log of food expenditures                                          0.698***
                                                                    (0.01)
Log of nonfood expenditures                                                 0.504***
                                                                              (0.01)
Log of durable expenditures                                                           0.182***
                                                                                        (0.01)
Log of health expenditures                                                                      0.094***
                                                                                                  (0.00)
Log of education expenditures                                                                             0.017***
                                                                                                            (0.00)
Household owns a radio                                   0.021     0.000     0.025     0.013     0.016     0.028
                                                          (0.04)    (0.02)    (0.03)    (0.04)    (0.04)    (0.04)
Household owns a television                           0.134***  0.055***  0.072***  0.037***  0.120***  0.131***
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Household owns an audio cassette/CD player               0.028    0.024*     0.006     0.001     0.028     0.030
                                                          (0.02)    (0.01)    (0.02)    (0.02)    (0.02)    (0.02)
Household owns a sewing machine                         -0.007    0.019*    -0.013  -0.039**    -0.005    -0.008
                                                          (0.02)    (0.01)    (0.01)    (0.02)    (0.02)    (0.02)
Household owns a stove/gas burner                     0.164***  0.091***  0.062***  0.104***  0.145***  0.166***
                                                          (0.02)    (0.01)    (0.02)    (0.02)    (0.02)    (0.02)
Household owns a bicycle                              0.040***  0.029***    -0.001     0.001  0.034***   0.026**
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Household owns a motor vehicle                        0.242***  0.154***  0.088***  0.087***  0.227***  0.250***
                                                          (0.02)    (0.01)    (0.02)    (0.02)    (0.02)    (0.02)
Household owns a mobile phone                         0.163***  0.084***  0.043***   0.029**  0.143***  0.154***
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Household owns an iron                                0.114***  0.076***     0.019  0.088***  0.091***  0.110***
                                                          (0.02)    (0.01)    (0.01)    (0.02)    (0.02)    (0.02)
Household owns an electric fan                        0.127***  0.064***  0.048***  0.051***  0.119***  0.122***
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Log of total floor area of the dwelling               0.143***  0.075***  0.058***  0.093***  0.130***  0.141***
                                                          (0.01)    (0.00)    (0.01)    (0.01)    (0.01)    (0.01)
Household dwelling wall materials                        0.015  0.023***     0.010    -0.015     0.017     0.012
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Household dwelling roof materials                        0.014    0.032*    -0.015    -0.018     0.003     0.014
                                                          (0.03)    (0.02)    (0.02)    (0.03)    (0.03)    (0.03)
Household dwelling floor materials                    0.138***  0.084***  0.062***  0.102***  0.132***  0.135***
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Household dwelling water access                       0.078***  0.037***    0.022*  0.064***  0.066***  0.076***
                                                          (0.02)    (0.01)    (0.01)    (0.02)    (0.02)    (0.02)
Household toilet is water sealed                      0.036***  0.025***     0.008     0.015  0.031***  0.034***
                                                          (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
Household toilet is other types                         -0.021    -0.002    -0.015    -0.016    -0.016    -0.019
                                                          (0.02)    (0.01)    (0.01)    (0.01)    (0.01)    (0.02)
Constant                                     11.110*** 10.085***  3.188***  5.777***  9.212***  9.495*** 10.131***
                                                (0.04)    (0.06)    (0.07)    (0.08)    (0.06)    (0.06)    (0.06)
sigma_e                                           0.38      0.32      0.17      0.24      0.29      0.30      0.32
sigma_u                                           0.14      0.11      0.05      0.07      0.10      0.09      0.10
rho                                               0.13      0.10      0.06      0.09      0.10      0.09      0.10
r2_o                                              0.33      0.54      0.87      0.75      0.61      0.60      0.55
N                                                 5447      5447      5445      5447      5441      5447      5447

Table A.2. Household consumption model, Ethiopia 2018/19

Table A.3. Household consumption model, Malawi 2016/17

Table A.4. Household consumption model, Nigeria 2012/13

Table A.5. Household consumption model, Tanzania 2019/20

Table A.6. Household consumption model, Viet Nam 2014

Table A.7. Predicted Poverty Rates Based on Imputation (percentage)
Second-period estimates by imputation model; true rates in the last column; standard errors in parentheses.

Indicator                  Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9    True

Panel A: Bangladesh 2015-2018/19
Headcount poverty rate        12.5    7.5*     8.2     6.7    7.7*    6.8*    7.6*     N/A     N/A     7.3
                             (0.7)   (0.6)   (0.8)   (0.7)   (0.6)   (0.6)   (0.7)                   (0.6)
Near-poverty rate             13.1     9.7   11.8*    10.5    10.1     9.4     9.7    13.1     9.7    12.2
                             (0.6)   (0.6)   (0.6)   (0.6)   (0.6)   (0.6)   (0.6)   (0.6)   (0.6)   (0.6)
Extreme poverty rate           1.9     0.9     0.8    0.5*     0.8    0.7*     0.9     1.9     0.9     0.6
                             (0.3)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.3)   (0.2)   (0.2)
Poverty gap                    2.4     1.3     1.3    1.0*     1.3    1.1*     1.3     2.4     1.3     1.1
                             (0.2)   (0.1)   (0.2)   (0.1)   (0.1)   (0.1)   (0.1)   (0.2)   (0.1)   (0.1)
USAID poverty gap             18.8    17.0    16.1   15.1*    16.5    16.3    17.0    18.8    17.0    15.2
                             (0.7)   (0.9)   (0.9)   (0.9)   (0.9)   (0.9)   (0.9)   (0.7)   (0.9)   (0.8)

Panel B: Malawi 2016/17-2019/20
Headcount poverty rate        53.5    52.8    60.8    52.8    53.9    52.7    52.7    52.3    52.7    51.1
                             (0.9)   (1.0)   (1.1)   (1.2)   (1.0)   (1.0)   (1.0)   (1.0)   (0.9)   (0.9)
Near-poverty rate            14.0*   14.2*    12.3   14.3*   14.3*   14.4*   14.2*   14.4*   14.3*    14.1
                             (0.4)   (0.4)   (0.5)   (0.5)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)
Extreme poverty rate          22.7    21.5    29.6    21.6    21.9    21.5    21.7   21.1*    21.8    20.6
                             (0.7)   (0.8)   (1.1)   (0.9)   (0.8)   (0.8)   (0.8)   (0.8)   (0.7)   (0.8)
Poverty gap                   18.2    17.6    22.6    17.5    18.0   17.5*    17.7   17.3*    17.7    17.1
                             (0.4)   (0.5)   (0.6)   (0.5)   (0.5)   (0.5)   (0.5)   (0.5)   (0.4)   (0.4)
USAID poverty gap             34.1   33.4*    37.2   33.2*   33.3*   33.3*   33.6*   33.2*   33.6*    33.5
                             (0.4)   (0.4)   (0.6)   (0.5)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.5)
N                           11,432  11,432  11,432  11,432  11,432  11,432  11,432  11,432  11,432  11,432

Panel C: Nigeria 2012/13-2018/19
Headcount poverty rate        33.4    33.5    51.3    27.6    24.7    29.7    33.4    34.4    34.6    46.4
                             (2.0)   (2.0)   (2.4)   (1.9)   (1.8)   (1.9)   (2.0)   (2.0)   (2.0)   (1.9)
Near-poverty rate             12.5   12.7*   13.4*   13.3*    11.5    12.6    12.6    12.6    12.4    13.7
                             (0.9)   (0.9)   (1.0)   (1.0)   (0.9)   (0.9)   (0.9)   (0.9)   (0.9)   (1.0)
Extreme poverty rate          14.7    14.8    26.4     9.0    10.0    12.2    14.9    15.4    15.5    22.2
                             (1.4)   (1.5)   (2.1)   (1.1)   (1.2)   (1.3)   (1.5)   (1.5)   (1.4)   (1.4)
Poverty gap                   10.9    10.9    18.0     7.4     7.6     9.3    11.0    11.3    11.4    15.3
                             (0.9)   (0.9)   (1.2)   (0.7)   (0.7)   (0.8)   (0.9)   (0.9)   (0.9)   (0.8)
USAID poverty gap            32.6*   32.7*    35.1    26.9    30.7    31.2   33.0*   33.0*   33.0*    33.0
                             (1.1)   (1.3)   (1.1)   (1.1)   (1.5)   (1.3)   (1.3)   (1.2)   (1.1)   (0.8)
N                            4,976   4,976   4,976   4,976   4,976   4,976   4,976   4,976   4,976   4,976

Panel D: Tanzania 2019/20-2020/21
Headcount poverty rate       17.1*   17.3*    19.4    16.4   17.1*    16.5   17.3*   17.6*   18.2*    17.8
                             (1.1)   (1.3)   (1.5)   (1.3)   (1.2)   (1.3)   (1.3)   (1.3)   (1.1)   (1.1)
Near-poverty rate            10.5*   10.9*   10.7*    11.3   10.8*   10.3*   10.8*   10.8*   10.5*    10.2
                             (0.7)   (0.8)   (0.8)   (0.8)   (0.8)   (0.8)   (0.8)   (0.8)   (0.7)   (0.7)
Extreme poverty rate          9.8*    9.4*    10.9     8.4    9.3*     9.0    9.5*    9.6*   10.4*     9.8
                             (0.8)   (1.0)   (1.1)   (0.9)   (0.9)   (1.0)   (1.0)   (1.0)   (0.9)   (0.8)
Poverty gap                   4.6*    4.5*     5.2     3.9    4.4*     4.2    4.5*    4.5*    4.9*     4.6
                             (0.4)   (0.4)   (0.5)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.4)   (0.3)
USAID poverty gap            26.9*   25.8*   27.0*    24.0   25.6*   25.5*   25.8*   25.7*   26.8*    25.9
                             (1.1)   (1.1)   (1.4)   (1.1)   (1.1)   (1.2)   (1.1)   (1.1)   (1.1)   (1.1)
N                            4,644   4,644   4,644   4,644   4,644   4,644   4,644   4,644   4,644   4,644

Panel E: Viet Nam 2014-2016
Headcount poverty rate        14.7    13.1     6.0     8.3    10.3    12.3    13.1    11.2     9.0     9.6
                             (0.5)   (0.6)   (0.5)   (0.6)   (0.6)   (0.6)   (0.6)   (0.6)   (0.5)   (0.4)
Near-poverty rate              9.4     8.6     4.9     6.1    7.2*     8.0     8.6     7.8    6.9*     6.9
                             (0.4)   (0.4)   (0.3)   (0.4)   (0.3)   (0.4)   (0.4)   (0.4)   (0.3)   (0.3)
Extreme poverty rate           1.9     1.9     0.6    1.2*     1.5     2.0     2.0     1.9     1.9     1.2
                             (0.2)   (0.3)   (0.1)   (0.2)   (0.2)   (0.3)   (0.3)   (0.3)   (0.4)   (0.2)
Poverty gap                    3.9     3.6     1.5     2.3     2.8     3.4     3.6     3.1     2.7     2.5
                             (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.2)   (0.3)   (0.3)   (0.2)
USAID poverty gap            26.5*    27.2    24.5    27.4   26.9*    27.9    27.4    28.0    30.0    26.1
                             (0.7)   (1.0)   (1.3)   (1.3)   (1.0)   (1.0)   (1.0)   (1.4)   (1.9)   (0.9)
N                            9,347   9,347   9,347   9,347   9,347   9,347   9,347   9,347   9,347   9,347

Control variables
Food expenditures  Y
Non-food expenditures  Y
Furnishings and household expenses  Y
Health expenditures  Y
Education expenditures  Y
Utilities expenditures  Y Y
Household assets & house characteristics  Y Y Y Y Y Y Y
Demographics & employment  Y Y Y Y Y Y Y Y Y

Note: The model with the empirical distribution of the error terms and bootstrapped standard errors is used. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. All estimates are obtained with population weights. The normal linear regression model with the theoretical distribution of the error terms employs cluster random effects. Imputed poverty rates for 2019/20 use the estimated parameters based on the 2016/17 data. 100 simulations are implemented. 'Near poor' status is defined as living on an income between 100 and 125% of the poverty line. All indicators are expressed in percent. The true rate is the estimate directly obtained from the survey data. Estimates shown in boldface or with an asterisk (*) fall within the 95% confidence interval or within one standard error of the true rate, respectively.

Table A.8.
Predicted Mean (Log) Consumption Based on Imputation
Second-round estimates by imputation model; true values in the last column; standard errors in parentheses.

Country and survey rounds      Model 1 Model 2 Model 3 Model 4 Model 5 Model 6 Model 7 Model 8 Model 9    True
Bangladesh, 2015 to 2018/19       10.7    10.8    10.8    10.8    10.8    10.9    10.8     N/A     N/A    10.8
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)                   (0.0)
Malawi, 2016/17 to 2019/20        12.1    12.1    12.0    12.1    12.1    12.1    12.1    12.2    12.1    12.1
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)
Nigeria, 2012/13 to 2018/19       11.3    11.3    11.0    11.4    11.4    11.3    11.3    11.2    11.2    11.0
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)
Tanzania, 2019/20 to 2020/21      13.7    13.7    13.7    13.7    13.7   13.7*    13.7    13.7    13.7    13.7
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)
Viet Nam, 2014 to 2016             9.6     9.7    10.0     9.8    9.8*     9.7     9.7     9.7    9.8*     9.6
                                 (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)   (0.0)

Control variables
Food expenditures  Y
Non-food expenditures  Y
Furnishings and household expenses  Y
Health expenditures  Y
Education expenditures  Y
Utilities: water, kerosene, lighting  Y Y
Household assets & house characteristics  Y Y Y Y Y Y Y
Demographics & employment  Y Y Y Y Y Y Y Y Y

Note: The model with the empirical distribution of the error terms and bootstrapped standard errors is used. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. All estimates are obtained with population weights. The normal linear regression model with the theoretical distribution of the error terms employs cluster random effects. Imputed consumption per capita for the second round uses the estimated parameters based on the first-round data. 100 simulations are implemented. True consumption per capita is the estimate directly obtained from the survey data. Estimates shown in boldface or with an asterisk (*) fall within the 95% confidence interval or within one standard error of the true value, respectively.

Table A.9. Meta-analysis of Imputation Models and Their Parameters, Logit Regressions

Table A.10.
Meta-analysis of Imputation Models and Their Parameters, Ordered Logit Regressions

Table A.11. Comparison of utility expenditure variables in the BIHSs, 2010/11 to 2018/19
Columns: 2010/11 BIHS, 2015 BIHS, 2018/19 BIHS
Module P1 (monthly recall)
1. Firewood  + + +
2. Cow dung/cakes/bhushi/wood-powder  + + +
3. Jute stick  + + +
4. Kerosene  + + +
5. Agriculture by-products used for fuel: paddy, hag, pressed sugarcane, and dried corn plants  + + +
6. Gas (natural, bio-gas) or liquefied petroleum gas (LPG)  + + +
7. Electricity  + +
8. Pit coal, char coal, wood coal  + + +
9. Other fuels and light (e.g., matches and candles)  + + +
10. Electricity (national grid)  +
11. Electricity (generator)  + +
12. Electricity (solar)  + +
Notes: The inconsistent variables are marked in red. The sign "+" indicates that the variable is included in the specific survey round. For item 8, very few households (8 in the 2015 BIHS and 1 in the 2018/19 BIHS) reported using "Pit coal, char coal, wood coal".

Figure A.1. Number of Models with Predicted Estimates of (Log) Consumption That Are Statistically Insignificantly Different from the True Estimates
[Five panels: Bangladesh 2015-2018/19, Malawi 2016/17-2019/20, Nigeria 2012/13-2018/19, Tanzania 2019/20-2020/21, and Viet Nam 2014-2016. Each panel plots the number of models (0 to 9) at the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.]

Figure A.2.
Predicted consumption distribution, Model 9
[Four panels: Malawi 2019/20, Nigeria 2018/19, Tanzania 2020/21, and Viet Nam 2016. Each panel plots the log of consumption at the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles. Legend: 95% CIs of true value; true percentile value; 95% CIs of imputed value; imputed percentile value.]

Figure A.3. Predicted USAID FGT indexes, Tanzania 2019/20-2020/21
[Five panels, FGT 1 to FGT 5, each plotting the estimated poverty gap (%) against imputation models m1 to m9.]
Note: Estimates are obtained by imputing from sample 1 into sample 2. 100 simulations are implemented. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicate that the estimates are not statistically significantly different from the true poverty gap. Dashed lines represent the true poverty gap. Dotted lines represent the confidence intervals of the true poverty gap. Estimates are obtained using the normal linear regression models.

Figure A.4.
Predicted Headcount Poverty Based on Within-Year Imputation
[Six panels: Bangladesh 2018/19, Ethiopia 2018/19, Tanzania 2020/21, Malawi 2019/20, Nigeria 2018/19, and Viet Nam 2016. Each panel plots the estimated poverty rate (%) against imputation models m1 to m9.]
Note: Estimates are obtained by imputing from sample 1 into sample 2. 100 simulations are implemented. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicate that the estimates are not statistically significantly different from the true poverty rates. Dashed lines represent the true poverty rates. Dotted lines represent the confidence intervals of the true poverty rates. Estimates are obtained using the normal linear regression models.

Figure A.5. Predicted Near-Poverty Rates Based on Within-Year Imputation
[Six panels: Bangladesh 2018/19, Ethiopia 2018/19, Tanzania 2020/21, Malawi 2019/20, Nigeria 2018/19, and Viet Nam 2016. Each panel plots the estimated near-poverty rate (%) against imputation models m1 to m9.]
Note: Estimates are obtained by imputing from sample 1 into sample 2. 100 simulations are implemented.
The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicates that the estimates are statistically insignificantly different from the true near- poverty rates. Dashed lines represent the true near-poverty rates defined as living on an income between 100 and 125% of the poverty line. Dotted lines represent confidence intervals of the true near- poverty rates. Estimates are obtained using the normal linear regression models. 62 Figure A.6. Predicted Extreme Poverty Rates Based on Within-Year Imputation Bangladesh 2018/19 Ethiopia 2018/19 2 16 Estimated poverty rate (%) Estimated poverty rate (%) 1.5 14 1 12 .5 10 0 8 m1 m2 m3 m4 m5 m6 m7 m8 m9 m1 m2 m3 m4 m5 m6 m7 m8 m9 Imputation model Imputation model Tanzania 2020/21 Malawi 2019/20 23 12 Estimated poverty rate (%) Estimated poverty rate (%) 22 11 21 10 9 20 8 19 7 18 m1 m2 m3 m4 m5 m6 m7 m8 m9 m1 m2 m3 m4 m5 m6 m7 m8 m9 Imputation model Imputation model Nigeria 2018/19 Vietnam 2016 30 2.5 Estimated poverty rate (%) Estimated poverty rate (%) 25 2 1.5 20 1 15 .5 m1 m2 m3 m4 m5 m6 m7 m8 m9 m1 m2 m3 m4 m5 m6 m7 m8 m9 Imputation model Imputation model Note: Estimates are obtained by imputing from sample 1 into sample 2. 100 simulations are implemented. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicates that the estimates are statistically insignificantly different from the true extreme poverty rates. Dashed lines represent the true extreme poverty rates. Extreme poverty line is defined as US$1.25 (2011 PPP) per day per capita in Bangladesh and Nigeria and as half of the national poverty line in Viet Nam and Ethiopia. National food poverty lines are used as extreme poverty line in Malawi and Tanzania. Dotted lines represent confidence intervals of the true extreme poverty rates. Estimates are obtained using the normal linear regression models. 
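The figure notes describe the within-year exercise. A model of log consumption is fitted in sample 1 and used to impute consumption into sample 2. The results from 100 simulations are then combined and compared with the true values in sample 2.

The sketch below illustrates that loop under simplifying assumptions made for illustration only, not taken from the paper:

- A single predictor stands in for the m1 to m9 specifications.
- Survey weights and draws of the regression coefficients are ignored.
- Simulations are combined by simple averaging.
- The near-poverty band treats 125% of the line as an exclusive upper bound.
- No bootstrap standard errors or complex-design adjustments are computed.

All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_ols(x, y):
    """Unweighted OLS of y on a single predictor x.

    Returns (intercept, slope, residual standard deviation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (len(y) - 2))
    return intercept, slope, sigma


def fgt(consumption, z, alpha):
    """Unweighted Foster-Greer-Thorbecke index for poverty line z.

    alpha=0 gives the headcount ratio, alpha=1 the poverty gap index."""
    return sum(((z - c) / z) ** alpha for c in consumption if c < z) / len(consumption)


def near_poverty(consumption, z, upper=1.25):
    """Share with consumption from 100% up to (excluding) 125% of z (assumed bounds)."""
    return sum(1 for c in consumption if z <= c < upper * z) / len(consumption)


def impute_indicators(x1, log_y1, x2, z, z_extreme, n_sims=100, seed=0):
    """Fit on sample 1, impute consumption into sample 2, and average
    each indicator over n_sims simulated draws (hypothetical helper)."""
    a, b, sigma = fit_ols(x1, log_y1)
    rng = random.Random(seed)
    draws = {"headcount": [], "near_poverty": [], "extreme": [], "poverty_gap": []}
    for _ in range(n_sims):
        # Predicted log consumption plus a normal error, converted back to levels
        y2 = [math.exp(a + b * xi + rng.gauss(0.0, sigma)) for xi in x2]
        draws["headcount"].append(fgt(y2, z, 0))
        draws["near_poverty"].append(near_poverty(y2, z))
        draws["extreme"].append(fgt(y2, z_extreme, 0))
        draws["poverty_gap"].append(fgt(y2, z, 1))
    return {name: statistics.fmean(vals) for name, vals in draws.items()}


if __name__ == "__main__":
    # Synthetic data only: one asset-type predictor and log consumption
    gen = random.Random(42)
    x = [gen.uniform(0, 5) for _ in range(2000)]
    log_y = [9.0 + 0.3 * xi + gen.gauss(0.0, 0.5) for xi in x]
    s1, s2 = range(0, 1000), range(1000, 2000)
    est = impute_indicators([x[i] for i in s1], [log_y[i] for i in s1],
                            [x[i] for i in s2], z=math.exp(9.5), z_extreme=math.exp(9.0))
    true_rate = fgt([math.exp(log_y[i]) for i in s2], math.exp(9.5), 0)
    print(f"imputed headcount {est['headcount']:.3f} vs true {true_rate:.3f}")
```

The `fgt` helper uses the standard Foster-Greer-Thorbecke form. The USAID poverty gap shown in the figures is not reproduced here because its definition is not given in this section.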
Figure A.7. Predicted Poverty Gap Based on Within-Year Imputation
[Figure panels: Bangladesh 2018/19, Ethiopia 2018/19, Tanzania 2020/21, Malawi 2019/20, Nigeria 2018/19, and Viet Nam 2016. X-axis: imputation model (m1 to m9); y-axis: estimated poverty gap (%).]
Note: Estimates are obtained by imputing from sample 1 into sample 2, using 100 simulations. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicate estimates that are not statistically significantly different from the true poverty gap. Dashed lines represent the true poverty gap, and dotted lines represent its confidence intervals. Estimates are obtained using normal linear regression models.

Figure A.8. Predicted USAID Poverty Gap Based on Within-Year Imputation
[Figure panels: Bangladesh 2018/19, Ethiopia 2018/19, Tanzania 2020/21, Malawi 2019/20, Nigeria 2018/19, and Viet Nam 2016. X-axis: imputation model (m1 to m9); y-axis: estimated USAID poverty gap (%).]
Note: Estimates are obtained by imputing from sample 1 into sample 2, using 100 simulations.
The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicate estimates that are not statistically significantly different from the true poverty gap. Dashed lines represent the true poverty gap, and dotted lines represent its confidence intervals. Estimates are obtained using normal linear regression models.

Figure A.9. Predicted (Log) Consumption Based on Within-Year Imputation
[Figure panels: Bangladesh 2018/19, Ethiopia 2018/19, Tanzania 2020/21, Malawi 2019/20, Nigeria 2018/19, and Viet Nam 2016. X-axis: imputation model (m1 to m9); y-axis: estimated mean log consumption.]
Note: Estimates are obtained by imputing from sample 1 into sample 2, using 100 simulations. The standard errors are calculated using 100 bootstrap replications and are adjusted for complex survey design. Larger hollow symbols indicate estimates that are not statistically significantly different from the true consumption mean. Dashed lines represent the true consumption mean, and dotted lines represent its confidence intervals. Estimates are obtained using normal linear regression models.

Appendix B: Machine Learning Results

Table B.1. ML Estimates, Tanzania 2020/21
Indicators | LASSO | Elastic Net | Random Forest | True rates
Headcount poverty rate | 9.8 (0.9) | 9.7 (0.9) | 5.8 (0.7) | 17.8 (1.1)
Near-poverty rate | 12.0 (0.8) | 12.0 (0.8) | 16.6 (1.0) | 10.2 (0.7)
Extreme poverty rate | 3.8 (0.6) | 3.8 (0.6) | 0.1 (0.0) | 9.8 (0.8)
Poverty gap | 1.9 (0.3) | 1.8 (0.3) | 0.4 (0.4) | 4.6 (0.3)
USAID poverty gap | 18.9 (1.3) | 19.0 (1.3) | 6.6 (0.5) | 25.9 (1.1)
Log of consumption mean | 13.7* (0.0) | 13.7* (0.0) | 13.7* (0.0) | 13.7 (0.0)
N | 4,644

Table B.2. ML Estimates, Viet Nam 2016
Indicators | LASSO | Elastic Net | Random Forest | True rates
Headcount poverty rate | 25.2 (0.6) | 25.2 (0.6) | 21.5 (0.6) | 9.5 (0.4)
Near-poverty rate | 14.8 (0.4) | 14.8 (0.4) | 16.3 (0.5) | 6.9 (0.3)
Extreme poverty rate | 4.7 (0.4) | 4.7 (0.4) | 3.5 (0.3) | 1.2 (0.2)
Poverty gap | 7.3 (0.3) | 7.3 (0.3) | 5.9 (0.2) | 2.5 (0.2)
USAID poverty gap | 29.1 (0.7) | 29.1 (0.7) | 27.3 (0.7) | 26.1 (0.9)
Log of consumption mean | 9.7 (0.0) | 9.7 (0.0) | 9.7 (0.0) | 9.8 (0.0)
N | 4,644

Table B.3. The list of selected variables in LASSO and Elastic Net models, Tanzania
Variable | LASSO: non-standardized | LASSO: standardized | Elastic Net: non-standardized | Elastic Net: standardized
Head has primary education | -0.050 | -0.025 | -0.049 | -0.024
Head has secondary ordinary education | 0.024 | 0.010 | 0.025 | 0.010
Head has secondary advanced education and higher | 0.003 | 0.001 | 0.004 | 0.001
Household size | -0.051 | -0.167 | -0.050 | -0.164
Dependency ratio | -0.023 | -0.021 | -0.023 | -0.022
Gender ratio | -0.046 | -0.051 | -0.046 | -0.050
Household head worked as an unpaid apprentice in the last 12 months | 0.048 | 0.003 | 0.045 | 0.003
Household head worked in farming in the last 12 months | -0.008 | -0.004 | -0.008 | -0.004
Household head employed in mining, manufacturing, construction | -0.014 | -0.003 | -0.012 | -0.003
Household head employed in retail, transportation | 0.105 | 0.028 | 0.103 | 0.027
Household head employed in information and communication | 0.209 | 0.020 | 0.208 | 0.020
Household head employed in technical, administrative, education | 0.084 | 0.019 | 0.083 | 0.019
Proportion of adult males that worked for a wage, salary, or commission | -0.053 | -0.025 | -0.051 | -0.024
Proportion of adult males that engaged in casual/ganyu labor in the last 12 months | -0.003 | -0.001 | -0.000 | 0.000
Proportion of adult males that worked as unpaid apprentices in the last 12 months | 0.091 | 0.005 | 0.083 | 0.005
Proportion of adult males that worked in farming in the last 12 months | -0.004 | -0.002 | -0.005 | -0.002
Proportion of adult males that worked in electricity or water supply |  |  | 0.000 | 0.000
Proportion of adult males that worked in information and communication | 0.075 | 0.006 | 0.072 | 0.006
Overcrowded | -0.032 | -0.043 | -0.033 | -0.045
Household dwelling roof materials | -0.092 | -0.030 | -0.091 | -0.030
Household dwelling floor materials | -0.042 | -0.021 | -0.043 | -0.021
Burnt bricks/concrete walls | 0.007 | 0.003 | 0.007 | 0.003
Piped water/truck water | 0.075 | 0.038 | 0.075 | 0.038
Flush/VIP toilet | 0.061 | 0.031 | 0.062 | 0.031
Electricity for lighting | 0.072 | 0.035 | 0.072 | 0.035
Household owns a chair or sofa | -0.037 | -0.018 | -0.037 | -0.018
Household owns a sewing machine | -0.072 | -0.019 | -0.070 | -0.019
Household owns an electric/gas stove | 0.174 | 0.068 | 0.173 | 0.068
Household owns a refrigerator/freezer | 0.051 | 0.018 | 0.050 | 0.018
Household owns a bicycle | -0.014 | -0.006 | -0.013 | -0.006
Household owns a motor vehicle | 0.030 | 0.010 | 0.031 | 0.010
Household owns a computer | 0.166 | 0.032 | 0.164 | 0.032
Household owns a mobile phone | 0.042 | 0.014 | 0.040 | 0.013
Household owns an iron | 0.010 | 0.005 | 0.010 | 0.005
Household owns an air conditioner/fans | 0.120 | 0.041 | 0.119 | 0.041
Anyone in the household owns livestock | -0.039 | -0.019 | -0.038 | -0.019
Household owns cows | 0.033 | 0.010 | 0.030 | 0.009
Household consumed spaghetti, macaroni | 0.042 | 0.012 | 0.042 | 0.012
Household consumed onions, tomatoes, carrots | -0.138 | -0.044 | -0.135 | -0.043
Household consumed sweets | -0.035 | -0.008 | -0.031 | -0.007
Household consumed biscuits, buns, scones | 0.047 | 0.023 | 0.047 | 0.023
Household consumed potato | 0.080 | 0.040 | 0.079 | 0.040
Household consumed beef | 0.130 | 0.062 | 0.129 | 0.062
Household consumed eggs | 0.105 | 0.040 | 0.104 | 0.040
Household purchased cigarettes or other tobacco | 0.005 | 0.001 | 0.001 | 0.000
Household purchased matches | -0.024 | -0.010 | -0.023 | -0.010
Household purchased toothpaste, toothbrush | 0.131 | 0.063 | 0.130 | 0.063
Household purchased personal products | 0.058 | 0.029 | 0.058 | 0.029
Household purchased petrol or diesel | 0.167 | 0.047 | 0.164 | 0.046
Household purchased cleaning products | 0.162 | 0.042 | 0.161 | 0.042
Household spent on taxes | 0.032 | 0.008 | 0.032 | 0.008
Household spent on wedding | 0.076 | 0.035 | 0.075 | 0.035
Household purchased education | 0.078 | 0.012 | 0.074 | 0.011
Household purchased schoolbooks | -0.044 | -0.017 | -0.043 | -0.017
Cereals, grains, and cereal products | -0.089 | -0.022 | -0.088 | -0.022
Fruits | 0.077 | 0.038 | 0.077 | 0.038
Meat, fish, and animal products | 0.041 | 0.014 | 0.039 | 0.013
Milk/milk products | 0.124 | 0.056 | 0.123 | 0.056
Nuts and pulses | 0.060 | 0.024 | 0.059 | 0.023
Roots, tubers, and plantains | 0.045 | 0.020 | 0.044 | 0.019
Spices/condiments | -0.290 | -0.066 | -0.282 | -0.064
Sugar/sugar products/honey | 0.080 | 0.032 | 0.080 | 0.032
Vegetables | -0.000 | -0.000 | -0.005 | -0.001
Constant | 14.281 | 0.000 | 14.277 | 0.000
MSE | 0.18 |  | 0.18 | 
R squared | 0.63 |  | 0.63 | 
N | 1,182 |  | 1,182 | 

Table B.4. The list of selected variables in LASSO and Elastic Net models, Viet Nam
Variable | LASSO: non-standardized | LASSO: standardized | Elastic Net: non-standardized | Elastic Net: standardized
Head's age | 0.003 | 0.039 | 0.003 | 0.039
Head's ethnicity | -0.125 | -0.046 | -0.125 | -0.046
Primary education | 0.037 | 0.016 | 0.036 | 0.016
Lower secondary education | 0.051 | 0.023 | 0.050 | 0.023
Upper secondary education | 0.121 | 0.043 | 0.120 | 0.043
College | 0.220 | 0.055 | 0.219 | 0.055
Household size | -0.151 | -0.236 | -0.150 | -0.236
Dependency ratio | -0.087 | -0.060 | -0.087 | -0.060
Gender ratio | -0.015 | -0.013 | -0.015 | -0.013
Household head worked for a wage, salary, or commission in the last 12 months | -0.029 | -0.010 | -0.029 | -0.010
Household head engaged in casual/ganyu labor in the last 12 months | -0.001 | 0.000 | -0.001 | -0.000
Household head employed in industry 1 | -0.018 | -0.007 | -0.018 | -0.007
Household head employed in industry 2 | 0.036 | 0.012 | 0.035 | 0.012
Household head employed in industry 3 | 0.028 | 0.004 | 0.028 | 0.004
Household head employed in industry 5 | 0.033 | 0.003 | 0.032 | 0.003
Proportion of adult males that engaged in casual/ganyu labor in the last 12 months | -0.007 | -0.004 | -0.007 | -0.004
Proportion of adult males that worked in farming in the last 12 months | -0.019 | -0.009 | -0.019 | -0.009
Proportion of adult males that worked in industry 1 | -0.008 | -0.003 | -0.008 | -0.003
Proportion of adult males that worked in industry 4 | 0.012 | 0.003 | 0.012 | 0.003
Proportion of adult males that worked in industry 5 | 0.019 | 0.002 | 0.019 | 0.001
Log of residential area | 0.196 | 0.113 | 0.196 | 0.112
Roof: cement | 0.017 | 0.007 | 0.017 | 0.007
Wall: bricks | 0.029 | 0.012 | 0.029 | 0.012
Wall: cement | 0.051 | 0.007 | 0.050 | 0.007
Improved water source | 0.006 | 0.017 | 0.006 | 0.017
Improved toilet source | 0.029 | 0.049 | 0.029 | 0.049
Lighting source: electricity | -0.108 | -0.015 | -0.107 | -0.015
Dwelling_cookfuel | 0.016 | 0.006 | 0.016 | 0.006
Household owns a car | 0.625 | 0.080 | 0.624 | 0.080
Household owns a motorbike | 0.094 | 0.035 | 0.094 | 0.035
Household owns a bicycle | -0.048 | -0.024 | -0.048 | -0.024
Household owns a DVD | 0.060 | 0.030 | 0.060 | 0.030
Household owns a TV | 0.054 | 0.014 | 0.054 | 0.014
Household owns a computer | 0.182 | 0.072 | 0.182 | 0.072
Household owns a refrigerator | 0.124 | 0.061 | 0.125 | 0.061
Household owns a sewing machine | 0.063 | 0.013 | 0.063 | 0.013
Household owns an electric/gas cooker | 0.122 | 0.042 | 0.122 | 0.042
Anyone in the household cultivates any plot | -0.072 | -0.036 | -0.072 | -0.036
Anyone in the household earns revenues from husbandry, hunting, trapping and dome | -0.023 | -0.012 | -0.023 | -0.012
Household obtains goat/sheep | 0.027 | 0.002 | 0.026 | 0.002
Household obtains chickens | -0.023 | -0.011 | -0.022 | -0.011
Household consumed noodles last 30 days | 0.026 | 0.009 | 0.026 | 0.009
Household consumed peas, beans last 30 days | 0.047 | 0.024 | 0.047 | 0.024
Household consumed tomatoes last 30 days | 0.064 | 0.028 | 0.064 | 0.028
Household consumed tea, coffee last 30 days | 0.022 | 0.010 | 0.064 | 0.024
Household consumed potatoes last 30 days | 0.033 | 0.015 | 0.022 | 0.010
Household consumed beef last 30 days | 0.117 | 0.056 | 0.033 | 0.015
Household consumed ice cream & yogurt last 30 days | 0.045 | 0.019 | 0.117 | 0.056
Household consumed chicken last 30 days | 0.078 | 0.038 | 0.045 | 0.019
Household purchased matches | -0.040 | -0.014 | 0.078 | 0.038
Household purchased petrol or diesel | 0.068 | 0.026 | -0.040 | -0.014
Household purchased cleaning products | 0.059 | 0.014 | 0.068 | 0.026
Household purchased soap | 0.055 | 0.023 | 0.059 | 0.014
Household spent on wedding last 12 months | 0.252 | 0.040 | 0.055 | 0.023
Household spent on building housing accommodation | -0.039 | -0.015 | 0.251 | 0.039
Household spent on house repair and maintenance over the past 12 months | 0.038 | 0.011 | -0.039 | -0.014
Household purchased tuition fee | 0.080 | 0.039 | 0.038 | 0.011
Household purchased school uniform | -0.011 | -0.006 | 0.080 | 0.039
Consumption category last 30 days: fruits | 0.064 | 0.024 | -0.011 | -0.006
Consumption category last 30 days: milk/milk products | 0.042 | 0.021 | 0.042 | 0.021
Consumption category last 30 days: peanuts & sesame | 0.014 | 0.006 | 0.014 | 0.006
Consumption category last 30 days: sugar/confectionery/molasses | 0.053 | 0.019 | 0.052 | 0.019
Consumption category last 30 days: vegetables | -0.052 | -0.006 | -0.052 | -0.006
Household has living conditions improved in 5 years | 0.002 | 0.001 | 0.002 | 0.001
Constant | 8.494 | 0.000 | 8.494 | 0.000
MSE | 0.12 |  | 0.12 | 
R squared | 0.71 |  | 0.71 | 
N | 9,296 |  | 9,296 | 

Table B.5. Variable importance scores in Random Forest, Tanzania
Variable | Importance
Head's age | 0.0546
Head is literate | 0.0409
Head has primary education | 0.0462
Head has secondary ordinary education | 0.0450
Head has secondary advanced education and higher | 0.0640
Household size | 0.1974
Dependency ratio | 0.1003
Gender ratio | 0.0620
Household head worked as an employee in the last 12 months | 0.0396
Household head worked as self-employed in the last 12 months | 0.0388
Household head worked as an unpaid apprentice in the last 12 months | 0.0318
Household head worked in farming in the last 12 months | 0.1056
Household head employed in mining, manufacturing, construction | 0.0492
Household head employed in retail, transportation | 0.0452
Household head employed in electricity or water supply | 0.0371
Household head employed in information and communication | 0.0474
Household head employed in technical, administrative, education | 0.0768
Proportion of adult males that worked for a wage, salary, or commission | 0.0412
Proportion of adult males that were self-employed | 0.0397
Proportion of adult males that worked as unpaid apprentices | 0.0524
Proportion of adult males that worked in farming | 0.0443
Proportion of adult males that worked in mining, manufacturing, construction | 0.0516
Proportion of adult males that worked in retail, transportation | 0.0445
Proportion of adult males that worked in electricity or water supply | 0.0348
Proportion of adult males that worked in information and communication | 0.0495
Proportion of adult males that worked in technical, administrative, education | 0.0708
Number of rooms | 0.0549
Overcrowded | 0.1625
Household dwelling roof materials | 0.0631
Household dwelling floor materials | 0.3701
Burnt bricks/concrete walls | 0.0521
Piped water/truck water | 0.2013
Flush/VIP toilet | 0.6522
Charcoal for cooking | 0.1125
Electricity for lighting | 1.0000
Household owns a chair or sofa | 0.0440
Household owns a radio | 0.0433
Household owns a TV | 0.4729
Household owns a DVD | 0.1763
Household owns a sewing machine | 0.0475
Household owns an electric/gas stove | 0.9191
Household owns a refrigerator/freezer | 0.1640
Household owns a bicycle | 0.0443
Household owns a motor vehicle | 0.0803
Household owns a computer | 0.1311
Household owns a mobile phone | 0.0618
Household owns an iron | 0.1057
Household owns an air conditioner/fan | 0.1585
Household owns a decoder | 0.0770
Anyone in the household owns livestock | 0.0521
Household owns goats | 0.0542
Household owns chickens | 0.0455
Household owns cows | 0.0532
Household consumed spaghetti, macaroni | 0.0774
Household consumed beans | 0.0696
Household consumed onions, tomatoes, carrots | 0.0782
Household consumed fruits | 0.0938
Household consumed sugar | 0.1569
Household consumed sweets | 0.0553
Household consumed tea | 0.0209
Household consumed biscuits, buns, scones | 0.1033
Household consumed potato | 0.0627
Household consumed beef | 0.1249
Household consumed yogurt | 0.0600
Household consumed chicken | 0.0757
Household consumed eggs | 0.1615
Household purchased cigarettes or other tobacco | 0.0459
Household purchased matches | 0.0514
Household purchased toothpaste, toothbrush | 0.1203
Household purchased personal products | 0.0659
Household purchased petrol or diesel | 0.0998
Household purchased cleaning products | 0.2526
Household purchased soap | 0.0623
Household spent on taxes | 0.0842
Household spent on construction | 0.0480
Household spent on wedding | 0.0548
Household spent on repair | 0.0397
Household purchased education | 0.0623
Household purchased schoolbooks | 0.0541
Household purchased uniform | 0.0545
Cereals, grains, and cereal products | 0.0920
Oil/fats | 0.0643
Fruits | 0.1559
Meat, fish, and animal products | 0.0886
Milk/milk products | 0.0677
Nuts and pulses | 0.0906
Roots, tubers, and plantains | 0.0705
Spices/condiments | 0.1161
Sugar/sugar products/honey | 0.1868
Vegetables | 0.1040
Note: Values are expressed relative to the largest importance score in the set.

Table B.6. Variable importance scores in Random Forest, Viet Nam
Variable | Importance
Head's age | 0.0086
Head's ethnicity | 0.2932
Primary education | 0.0069
Lower secondary education | 0.0072
Upper secondary education | 0.0102
College | 0.0309
Household size | 0.0493
Dependency ratio | 0.0218
Gender ratio | 0.0099
Household head worked for a wage, salary, or commission in the last 12 months | 0.0094
Household head worked as self-employed in the last 12 months | 0.0083
Household head engaged in casual/ganyu labor in the last 12 months | 0.0086
Household head employed in industry 1 | 0.0080
Household head employed in industry 2 | 0.0110
Household head employed in industry 4 | 0.0119
Household head employed in industry 3 | 0.0115
Household head employed in industry 5 | 0.0137
Proportion of adult males that worked for a wage, salary, or commission | 0.0136
Proportion of adult males that engaged in casual/ganyu labor in the last 12 months | 0.0103
Proportion of adult males that worked in farming in the last 12 months | 0.0121
Proportion of adult males that worked in industry 1 | 0.0094
Proportion of adult males that worked in industry 2 | 0.0107
Proportion of adult males that worked in industry 3 | 0.0120
Proportion of adult males that worked in industry 4 | 0.0126
Proportion of adult males that worked in industry 5 | 0.0138
Log of residential area | 0.0212
Roof: cement | 0.0087
Roof: cement | 0.0140
Wall: bricks | 0.0144
Wall: cement | 0.0161
Improved water source | 0.0208
Improved toilet source | 0.1869
Lighting source: electricity | 0.0115
Dwelling_cookfuel | 0.0089
Household owns a car | 0.1665
Household owns a motorbike | 0.0253
Household owns a bicycle | 0.0109
Household owns a DVD | 0.0117
Household owns a TV | 0.0183
Household owns a computer | 0.4517
Household owns a refrigerator | 1.0000
Household owns a sewing machine | 0.0100
Household owns an electric/gas cooker | 0.2177
Anyone in the household cultivates any plot | 0.0560
Anyone in the household earns revenues from husbandry, hunting, trapping and dome | 0.0169
Household obtains goat/sheep | 0.0112
Household obtains chickens | 0.0120
Household obtains pigs | 0.0101
Household consumed noodles last 30 days | 0.0118
Household consumed peas, beans last 30 days | 0.0134
Household consumed tomatoes last 30 days | 0.0166
Household consumed fruits last 30 days | 0.0239
Household consumed sugar last 30 days | 0.0125
Household consumed tea, coffee last 30 days | 0.0164
Household consumed potatoes last 30 days | 0.0114
Household consumed beef last 30 days | 0.2596
Household consumed ice cream & yogurt last 30 days | 0.0147
Household consumed eggs last 30 days | 0.0177
Household consumed chicken last 30 days | 0.0117
Household purchased matches | 0.0111
Household purchased petrol or diesel | 0.0291
Household purchased cleaning products | 0.0190
Household purchased soap | 0.0131
Household spent on wedding last 12 months | 0.0220
Household spent on building housing accommodation | 0.0125
Household spent on house repair and maintenance over the past 12 months | 0.0104
Household purchased tuition fee | 0.0143
Household purchased schoolbooks | 0.0108
Household purchased school uniform | 0.0109
Consumption category last 30 days: fruits | 0.0321
Consumption category last 30 days: milk/milk products | 0.0131
Consumption category last 30 days: peanuts & sesame | 0.0107
Consumption category last 30 days: sugar/confectionery/molasses | 0.0138
Consumption category last 30 days: vegetables | 0.0169
Household has living conditions improved in 5 years | 0.0117
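Tables B.3 to B.6 report two kinds of rescaled quantities.

- **Importance scores.** The note to Table B.5 states that the Random Forest scores are expressed relative to the largest score, which is why the top variable shows 1.0000. The underlying raw importance measure is not stated here.
- **Standardized coefficients.** The section does not give a formula for the "standardized" LASSO and Elastic Net coefficients. The sketch assumes a common convention: the coefficient multiplied by the sample standard deviation of its covariate, with log consumption left unstandardized.

The function names and example values are hypothetical.

```python
import statistics


def standardized_coefficient(beta, x_values):
    """Effect of a one-standard-deviation change in a covariate.

    Assumed convention: beta_std = beta * sd(x), using the sample
    standard deviation; the outcome (log consumption) is not rescaled."""
    return beta * statistics.stdev(x_values)


def scale_to_max(importances):
    """Divide every importance score by the largest one, so the top
    variable gets 1.0 and the others read as fractions of it."""
    top = max(importances.values())
    return {name: score / top for name, score in importances.items()}


if __name__ == "__main__":
    # Hypothetical raw importance scores (not taken from the paper)
    raw = {"electricity_for_lighting": 0.050, "flush_vip_toilet": 0.032, "household_size": 0.010}
    for name, score in scale_to_max(raw).items():
        print(f"{name}: {score:.4f}")
    # Hypothetical household sizes and a hypothetical coefficient
    print(standardized_coefficient(-0.05, [2, 3, 4, 5, 6, 7, 9]))
```

Dividing by the maximum preserves the ranking and the ratios between variables. Scores are therefore comparable within a table but not across countries.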