Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country

Obtaining consistent estimates on poverty over time as well as monitoring poverty trends on a timely basis is a priority concern for policy makers. However, these objectives are not readily achieved in practice when household consumption data are neither frequently collected, nor constructed using consistent and transparent criteria. This paper develops a formal framework for survey-to-survey poverty imputation in an attempt to overcome these obstacles, and to elevate the discussion of these methods beyond the largely ad-hoc efforts in the existing literature. The framework introduced here imposes few restrictive assumptions, works with simple variance formulas, provides guidance on the selection of control variables for model building, and can be generally applied to imputation either from one survey to another survey with the same design, or to another survey with a different design. Empirical results analyzing the Household Expenditure and Income Survey and the Unemployment and Employment Survey in Jordan are quite encouraging, with imputation-based poverty estimates closely tracking the direct estimates of poverty.


I. Introduction
Building on the success of the Millennium Development Goal that saw the global poverty rate in 1990 halve before 2015, the international community has redoubled its efforts to reduce poverty further. For example, the World Bank recently proposed an ambitious goal of reducing the global extreme poverty rate to no more than 3 percent by 2030. In this connection, measuring poverty serves as an instrumental tool for poverty eradication; reliable estimates can help us understand which policies work and which do not work, and how efficient they are.
Estimation of poverty is, however, a rather involved process, one that typically imposes significant demands on financial resources and that needs to draw on specialized technical expertise. The process often confronts practical challenges that can undermine efforts to track poverty trends for timely policy interventions. For instance, if poverty estimates are to be compared over time, a crucial requirement is that both the consumption aggregates and poverty lines be consistently constructed across survey rounds and be strictly comparable. However, studies document that this seemingly undemanding condition is satisfied less often than one might think. A well-known example is the vibrant debate in India in the early 2000s where, among other factors, changes in the questionnaire design resulted in considerable controversy around the degree and direction of change in poverty during the 1990s. According to official estimates, the headcount poverty rate decreased by 10 percentage points (equivalent to 60 million people escaping poverty) between 1993/1994 and 1999/2000. In contrast, independent researchers produced conflicting estimates, ranging from a rate of decline slightly slower than the official estimates (Deaton and Drèze, 2002; Kijima and Lanjouw, 2003; Tarozzi, 2007) to a mere three percentage point decline in poverty (Sen and Himanshu, 2005). This latter estimate was associated with the absolute number of people living in poverty remaining unchanged during the 1990s.¹ Another issue that commonly hinders the tracking of poverty over time is that consumption surveys are typically conducted only occasionally (particularly in developing countries), and poverty estimates are not available in the intervening years during which surveys have not been implemented.
Yet another issue is that collecting, cleaning, and preparing data for analysis can be a protracted process that, at times, can span multiple years from the start of field work to the time when the data are ready for analysis. In all these cases, the challenge can be broadly regarded as one involving missing data: consumption data are available in one period but in the next period(s) are either not available, or are not comparable.
The topic of imputing missing consumption data from one survey to another (i.e., survey-to-survey imputation) has received some attention in the statistics literature, but relatively little in the economics literature. With a handful of exceptions, the estimation framework utilized by most current economic studies that focus on poverty comparisons appears to be largely based on earlier work exploring the feasibility of survey-to-census imputation by Elbers, Lanjouw, and Lanjouw (2003). This survey-to-census imputation model provides a related, but not perfectly transferable, econometric model for survey-to-survey imputation.² It can be contrasted with the multiple imputation (MI) approach discussed in the statistics literature, which has grown rapidly since it was first introduced by Donald Rubin in the late 1970s (Rubin, 1978). Indeed, the widespread availability of a variety of missing data imputation procedures offered in most current statistical software packages can pose a challenge to the analyst in identifying the best method to use, and especially in assessing which estimation technique is best suited to the specific economic question, assumptions and data requirements at hand.
¹ See Deaton and Kozel (2005) for further discussion on this poverty debate in India. See also Christiaensen et al. (2012) and World Bank (2012a) for similar issues compromising the comparability of poverty estimates in Russia and Vietnam respectively.
² Significant differences exist between survey-to-census imputation and survey-to-survey imputation methods. In particular, the former focuses on intratemporal (i.e., same point in time) imputation for producing poverty estimates at lower administrative levels than a survey would reasonably allow, while the latter focuses on intertemporal imputation for poverty estimates at more aggregated population groups. These differences clearly raise distinct econometric issues for each method. We will discuss the relevant studies in the next section on literature review.
In this paper we make new contributions on both the theoretical and empirical fronts.³ On the theoretical front, we provide a formal framework for survey-to-survey poverty imputation with several original features ranging from assumption testing to model building and estimation variance. First, we provide an explicit discussion of the different assumptions required for the appropriate application of our poverty imputation method, which are often only implicitly considered in existing studies. In particular, we show that the key and traditionally made assumption of constant parameters in the household consumption model is both unduly restrictive and unlikely to hold in practice, and we offer a less restrictive assumption instead.
Existing studies commonly invoke the assumption of constant parameters, but to our knowledge none provides a direct test for this assumption. We thus propose formal tests for our general assumption as well as for this traditional but more restrictive assumption, and we also discuss further what can be done when these assumptions are relaxed.
Second, our proposed formula for the variance of the estimated poverty rate is simple and accords with the one commonly used in the statistics literature. Our framework also allows us to provide more insights into the selection of control variables for model building, which has received relatively cursory treatment in the literature. An enhanced understanding of this model selection process coupled with certain additional assumptions enables us to offer bound estimates even in cases where data constraints are so severe that only very few control variables are available. Our paper thus aims at providing a systematic and comprehensive treatment of survey-to-survey poverty imputation methods that appear to be implemented on a somewhat ad hoc basis in most of the existing economics literature.
³ We focus in this paper on predicting household consumption in cross sectional rather than panel data. For predicting poverty mobility based on synthetic (pseudo) panel data, see Dang, Lanjouw, Luoto, and McKenzie (2014), and Dang and Lanjouw (2013). We also focus on survey-to-survey imputation; for survey-to-census imputation, see, e.g., Elbers, Lanjouw, and Lanjouw (2003) and Tarozzi and Deaton (2009) for economic studies, and Rao (2003) for statistical studies. For a related literature on partial identification with different samples see, e.g., Manski (2003); see also Ridder and Moffitt (2007) for a recent review on the econometrics of data combination.
Third, we also show that, given some standard assumptions, our framework can be generally applied to imputation either from one survey to another survey with the same design, or to another survey of a different design. The former is relevant to situations where consumption data in a more recent survey round are not consistent with those in an earlier round (say, owing to measurement errors or poorly constructed consumption aggregates), or where no reliable consumer price index (CPI) data exist to update the poverty line over time. On the other hand, imputation from one survey to another of a different design is pertinent to situations where one survey is implemented less frequently but collects consumption data (e.g., household expenditure or budget surveys), while the other survey is conducted more frequently but does not collect consumption data (e.g., labor force surveys). Using surveys of different designs can remarkably expand the application range of imputation methods, but the inevitable tradeoff is that the sample statistics estimated from surveys of different designs would likely be different due to various reasons, which would in turn render imputation-based estimates incomparable. We propose rather straightforward standardization procedures to harmonize the different surveys and show that employing these procedures can produce estimates that are statistically indistinguishable from the actual poverty rates, in sharp contrast to the severely biased estimates obtained from non-standardized data.
Finally, in constructing our framework, we offer a critical review of the economics literature and of the related studies on data imputation in statistics. Our paper thus also represents an early attempt at distinguishing the currently available methods in statistics and economics as well as incorporating the advances from the former into the latter. This is consistent with similar ongoing efforts in other disciplines that build on the multiple imputation method in statistics to better address their own disciplinary needs.⁴
Empirically, we illustrate our method with an application to Jordan, a particularly interesting case for analysis. Not much is known about poverty trends since Jordan's Department of Statistics (DOS) last conducted its Household Expenditure and Income Survey (HEIS) in 2010.
In the meantime, this country's economy has experienced several major events such as the introduction of new poverty-reduction policies by the government (e.g., in accordance with its recent Poverty Reduction Strategy), economic reforms (e.g., reducing its petroleum subsidies and implementing a targeted cash transfer), and shocks due to higher energy prices. Socio-political change and unrest in neighboring Syria and Egypt also add further uncertainty to the economy.
Given this fast-evolving context, policy makers are keenly interested in tracking poverty trends on a more frequent and timely basis. In contrast with the HEIS, which was last conducted in 2010, DOS administers the Employment-Unemployment Survey, a labor force survey (LFS) with wide geographical coverage, on a quarterly basis. We exploit the LFS, which does not collect consumption data and has a different design from the HEIS, to address the missing poverty data problem in Jordan for the years in which the HEIS is absent.
We validate our imputation-based estimates of poverty against those obtained from the actual consumption data (or design-based estimates) for the two years 2008 and 2010 when consumption data are available, before imputing estimates for other years when consumption data are not available.⁵
This paper consists of five sections. A review of recent studies in economics and statistics is provided in the next section. This is followed in Section III by the theoretical framework, estimation procedures, and empirical application for imputation using surveys of the same design. Section IV extends this framework to imputation for surveys of different designs and then provides empirical illustrations. Section V concludes.

II. Review of Missing Data Imputation Methods in Recent Studies
The idea of imputing missing household consumption has existed in various forms in the economic literature, but there was an upsurge of interest in the 2000s. Except for the survey-to-survey imputation on India by Deaton and Drèze (2002) and Tarozzi (2007), earlier work on poverty based on imputations largely focuses on survey-to-census imputation and includes a study on Ecuador by Hentschel et al. (2000), which is followed by a formalization of the approach in Elbers, Lanjouw, and Lanjouw (ELL) (2003).⁶ While a consumption survey collects consumption data, its limited sample size means the survey is only representative at highly aggregated administrative levels; conversely, the population census has exactly the opposite strength and weakness, being representative at a far more disaggregated administrative level but offering no consumption data. Applying the estimated model parameters of consumption from a household expenditure survey to the variables that overlap with the census, ELL predict consumption in the latter. These data can then be disaggregated to estimate poverty at lower administrative levels than are possible using the household survey alone. This method is sometimes referred to as the "poverty-mapping" approach owing to its extensive presentation of poverty estimates in a cartographic format. Kijima and Lanjouw (2003) then apply this method to provide survey-to-survey imputation-based poverty estimates for India.
Building on this approach, Stifel and Christiaensen (2007) combine household expenditure survey data with more recent rounds of the Demographic and Health Survey (DHS) in Kenya to impute household consumption into the latter. A more recent paper by Christiaensen et al. (2012) predicts consumption in the second round of a consumption survey using the estimated model parameters from the first round of the same survey for several countries. By generating consumption data in the second round that are more consistent with those in the first round, this study indicates that imputation methods can help obviate the need to update expenditure data with problematic deflators over time. Using seven rounds of household survey data from Uganda, Mathiassen (2013) also finds imputation-based poverty estimates to accurately track the true poverty rates in most cases.
⁵ While a more general and widely used statistical term "model-based" exists which can include the term "imputation-based", we prefer to use the latter to emphasize the more specific imputation nature of our estimates. We also use the terms "imputation" and "prediction" interchangeably in this paper.
⁶ An earlier study by Ravallion (1996) proposes using time series data consisting of aggregated agricultural wages and outputs to forecast poverty rates in India. Another method to track poverty over time constructs an index for household wealth based on household assets (Sahn and Stifel, 2000). This method's greatest strength is perhaps that it is straightforward to implement in most contexts where information on household assets is available; however, the non-monetary nature of asset indices renders poverty estimates more difficult to interpret. Another branch of the (statistics and economics) literatures constructs weights to adjust estimates in the presence of missing data instead; for studies that follow this approach, see, e.g., Tarozzi (2007) and Bethlehem, Cobben, and Schouten (2011).
In the same spirit, another approach is to combine a household expenditure survey and a more recent labor force survey to impute consumption into the latter and subsequently to estimate poverty. This approach has been implemented for Mozambique by Mathiassen (2009).
Douidich, Ezzrari, van der Weide, and Verme (2013) similarly take advantage of an almost identical design between the household expenditure survey and the LFSs in Morocco to impute poverty rates in the latter and find very encouraging results.
Among all these cited studies, however, only the three most recent studies by Christiaensen et al. (2012), Mathiassen (2013), and Douidich et al. (2013) offer validation for their estimates against the true poverty rates before extending their analysis to the years without consumption data. It is worth noting that all these validation studies restrict their analysis to surveys of the same design, but none of them explicitly discusses this assumption on which they rely.⁷
Missing data imputation, however, does not appeal to economics researchers alone. The few existing studies in economics appear to have been developed independently of a much more established literature on missing data imputation in statistics. Starting with the seminal work on imputation methods by Rubin in the late 1970s (Rubin, 1977, 1978), imputation methods have steadily become one of the main tools of the professional statistician. Government agencies such as the U.S. Census Bureau regularly use imputation to fill in important missing data on various statistics for income (Census Bureau, 2014a) and labor (Census Bureau, 2014b).
⁷ A recent study that uses the ELL approach for poverty imputation for Sri Lanka by Newhouse et al. (2014) is an exception. It finds that differences in sampling design can undermine the accuracy of survey-to-survey predictions. Another study by Dabalen et al. (2014) imputes poverty estimates from one household survey round to another round for Liberia but does not provide validation due to missing consumption data in the latter.

However, due to different disciplinary focuses, while the imputation methods used in statistics share common features with those used in economics, important differences exist. The studies considered for economics include Stifel and Christiaensen (2007), Christiaensen et al. (2012), and Mathiassen (2013), and those for statistics include Rubin (1987), Little and Rubin (2002), Schafer and Graham (2002), van Buuren (2012), and Carpenter and Kenward (2013). These studies do not represent all the existing studies in their respective literatures, but they are indicative of the "typical" approach used within each field.⁸ The common and different features across economic and statistical studies are broadly classified along several dimensions including the target population, the type and proportion of missing data as well as the mechanism underlying missing data, and timing and modeling issues.
Several findings emerge from Table 1. There is much commonality between imputation methods used in economics and statistics, even though statistical imputation methods are more general than economic imputation methods. For example, economic studies mostly focus on a single missing variable, usually the household consumption variable; conversely, statistical studies pay attention to missing variables that can either be outcome or explanatory ones (rows 1.1 and 1.2, Table 1). Economic studies mostly investigate a missing data mechanism defined in statistical terminology as missing data at random (MAR) (row 2) and employ parametric and semi-parametric estimation techniques (row 3.3); statistical studies, however, broadly consider other missing data mechanisms and estimation techniques as well.
The differences between economic studies and statistical studies stem largely from their different disciplinary focuses. The cited economic studies are mostly interested in predicting consumption in a new survey (census) round, while the statistics studies pay more attention to filling in the missing data in an existing data set. Consequently, economists usually impute from one survey to another (row 4.1) with missing consumption data (row 5), where the second survey is implemented either at the same time or more recently (row 6). In contrast, statisticians often impute missing data within the same survey, where usually less than half of the data are missing. Another difference is that economists appear to use economic theory alongside statistical theory for model selection, even though there is little formal discussion of this process in existing studies (row 3.4).
In short, all these reviewed economic and statistical studies rely on a key assumption that the (distributions of the) parameters estimated from the first survey (for economics) or the observed complete data (for statistics) be identical for the missing data (row 3.1). This assumption is practically a prerequisite for any existing work with data imputation; another implicit assumption which is not often discussed is that the two surveys (or the complete data and the missing data sources) have comparable designs. However, hardly any economic studies explicitly discuss the assumption of comparable survey design, and none tests for the assumption of identical parameters. This latter assumption in fact constitutes the major divergence between the intratemporal survey-to-census imputation and intertemporal survey-to-survey imputation. We will discuss in more detail these assumptions and what should be done when these are relaxed as well as other modelling issues in our imputation framework.

III. Imputation Using Surveys of the Same Design
III.1. Estimation Framework
Let $x_j$ be a vector of characteristics that are commonly observed between the two surveys, where j indicates the type of survey that can either be the same household expenditure survey or another survey.⁹ Subject to data availability, these characteristics can include household variables such as the household head's age, sex, education, ethnicity, religion, language, occupation, household assets or incomes, and other community or regional variables.
Occupation-related characteristics can generally include whether household heads work, the share of household members that work, the type of work that household members participate in, as well as context-specific variables such as the share of female household members that participate in the labor force. Regional characteristics related to macroeconomic trends such as (un)employment rates or commodity prices can also be included if such data are available. As discussed below, these variables would play a critical role in capturing the changes in estimated poverty rates.
Household consumption (or income) data exist in one survey but are missing in the other survey; thus, without loss of generality, let survey 1 and survey 2 respectively represent the survey with and without household consumption data, and let $y_1$ represent household consumption in survey 1. More generally, these two surveys can be either in the same period or in different periods. We focus in this section on the latter case, before discussing the more complicated cases of combining surveys of different designs in the same period and in different periods in the next section.¹⁰ To further operationalize our estimation, we assume that the linear projection of household consumption on household and other characteristics ($x$) for survey 1 is given by a cluster random-effects model
$$y_1 = \beta_1' x_1 + \mu_1 + \varepsilon_1 \qquad (1)$$
Were the household consumption data $y_2$ available in survey 2, we assume the same linear projection of household consumption on household characteristics¹¹
$$y_2 = \beta_2' x_2 + \mu_2 + \varepsilon_2 \qquad (2)$$
where, conditional on household characteristics, the cluster random effects $\mu_j$ and the error terms $\varepsilon_j$ are assumed uncorrelated with each other and to follow normal distributions, $\mu_j \mid x_j \sim N(0, \sigma^2_{\mu_j})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$, for j = 1, 2. Equation (1) thus provides a linear random effects model that can be straightforwardly estimated using most available statistical packages.
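As an illustration of what estimating such a cluster random-effects model involves, the following is a minimal sketch in plain Python. It is not the estimator used in the paper: it assumes a single regressor, fits the slope by ordinary least squares, and recovers the two variance components with a simple moment (ANOVA-type) calculation rather than GLS or maximum likelihood. The function name `fit_cluster_re`, the simulated data, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, variance


def fit_cluster_re(y, x, cluster):
    """Return (intercept, slope, sigma2_mu, sigma2_eps) for y = a + b*x + mu_c + eps.

    Simplified sketch: OLS for the coefficients, moment estimators for the
    variance of the cluster effect (mu) and of the idiosyncratic error (eps).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    # Group the OLS residuals by cluster.
    resid_by_cluster = defaultdict(list)
    for xi, yi, c in zip(x, y, cluster):
        resid_by_cluster[c].append(yi - intercept - slope * xi)
    groups = list(resid_by_cluster.values())
    cluster_means = [mean(g) for g in groups]

    # Deviations from the cluster mean identify the idiosyncratic variance.
    ss_within = 0.0
    for g, m in zip(groups, cluster_means):
        ss_within += sum((e - m) ** 2 for e in g)
    sigma2_eps = ss_within / (len(y) - len(groups))

    # Var(cluster mean residual) is roughly sigma2_mu + sigma2_eps / n_c.
    avg_inv_size = mean([1.0 / len(g) for g in groups])
    sigma2_mu = max(variance(cluster_means) - sigma2_eps * avg_inv_size, 0.0)
    return intercept, slope, sigma2_mu, sigma2_eps


# Hypothetical "survey 1": 200 clusters of 10 households each.
rng = random.Random(0)
cluster, x, y = [], [], []
for c in range(200):
    mu_c = rng.gauss(0.0, 0.3)          # cluster random effect
    for _ in range(10):
        xi = rng.gauss(0.0, 1.0)        # one household characteristic
        cluster.append(c)
        x.append(xi)
        y.append(1.0 + 0.4 * xi + mu_c + rng.gauss(0.0, 0.5))

a_hat, b_hat, s2_mu, s2_eps = fit_cluster_re(y, x, cluster)
print(a_hat, b_hat, s2_mu, s2_eps)
```

In applied work a vector of regressors, survey weights, and a proper random-effects routine from a statistical package would replace this hand-rolled version; the sketch only shows which quantities (coefficients and two variance components) the later imputation step needs.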
We are most interested in the poverty estimates for survey 2, where the consumption data are missing. Let $z_2$ be the poverty line in period 2. If $y_2$ existed, the poverty rate $P_2$ in this period could be estimated with the following quantity
$$P(y_2 \le z_2) \qquad (3)$$
where P(.) is the probability (or poverty) function that gives the percentage of the population that are under the poverty line $z_2$ in survey 2. P(.) is thus non-increasing in household consumption.
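The poverty function can be made concrete with a short sketch. The function below computes the (optionally weighted) share of households whose consumption is at or below the poverty line; the name `headcount_rate`, the weights argument, and the toy numbers are illustrative assumptions rather than anything specified in the paper.

```python
def headcount_rate(consumption, poverty_line, weights=None):
    """Weighted share of units with consumption at or below the poverty line."""
    if weights is None:
        weights = [1.0] * len(consumption)
    total = sum(weights)
    poor = sum(w for y, w in zip(consumption, weights) if y <= poverty_line)
    return poor / total


# Hypothetical consumption values for four households, poverty line of 100.
y2 = [50.0, 80.0, 120.0, 200.0]
rate_unweighted = headcount_rate(y2, 100.0)              # 0.5
rate_weighted = headcount_rate(y2, 100.0, [3, 1, 1, 1])  # 4/6, about 0.667
print(rate_unweighted, rate_weighted)
```

Raising every household's consumption can only leave this share unchanged or lower it, which is the sense in which P(.) is non-increasing in consumption.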
We further make the following assumptions that underlie the theoretical framework.
Assumption 1: Let $x_{jt}$ denote the values of the variables observed in survey j at time t, for j = 1, 2, and t = 1, ..., T; and let $X_t$ denote the corresponding measurements in the population. Then $x_{jt} = X_t$ for all j and t.
Assumption 1 is crucial for imputation and ensures that the sampled data in survey 1 and survey 2 are representative of the population in each respective time period. Put differently, this assumption implies that, for two contemporaneous surveys (i.e., surveys implemented in the same time period), estimates based on the same characteristics x are identical since they equal the population values; and for two non-contemporaneous surveys, estimates based on the same characteristics x in these two surveys are consistent and comparable over time. While surveys of the same design (and sample frame) are more likely to be comparable and can thus satisfy Assumption 1, there is no a priori guarantee that these surveys can provide comparable estimates across two different time periods, or even the same estimates in the same time period. Examples where Assumption 1 may be violated include the cases where national statistical agencies change the questionnaire for the same survey over time, as with the NSS for India discussed earlier, or where one considers different surveys that focus on different population groups (e.g., the average household size may differ between a household survey and a labor force survey depending on the specific definition that is used).
Violation of Assumption 1 rules out the straightforward application of the survey-to-survey imputation technique and would require that additional assumptions be made on the relevance of the estimated parameters from one survey to the other. To make notation less cluttered, we will suppress the subscript t for time in subsequent expressions.
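One informal way to probe Assumption 1 for two contemporaneous surveys is to compare the sample means of a shared variable. The sketch below computes a Welch-type z statistic for the difference in means; it is only an illustration and not the paper's procedure. It ignores survey weights, stratification, and clustering, and the function name `mean_difference_z` and the household-size data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference_z(x_a, x_b):
    """z statistic for the difference in means of one variable across two samples."""
    se = sqrt(variance(x_a) / len(x_a) + variance(x_b) / len(x_b))
    return (mean(x_a) - mean(x_b)) / se


# Hypothetical household sizes recorded by two surveys in the same period.
hh_size_survey1 = [4, 5, 6, 5, 4, 6, 5, 5]
hh_size_survey2 = [6, 7, 8, 7, 6, 8, 7, 7]

z = mean_difference_z(hh_size_survey1, hh_size_survey2)
print(z)  # a large |z| suggests the two surveys measure the variable differently
```

A large absolute value, as in this toy example, would signal that the variable is measured or defined differently across the two surveys and should be harmonized or dropped before imputation.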

Assumption 2: Let $\Delta P$ and $\Delta x$ respectively represent the changes in poverty rates and the explanatory variables x over time, and $\Theta_j$ the set of parameters $(\beta_j, \mu_j, \varepsilon_j)$ that map the variables x into the household consumption space in period j where the consumption data are available. Then $\Delta P = P(\Delta x \mid \Theta_j)$, where P(.) is the given poverty function.
Assumption 2 implies that, given the estimated consumption parameters from survey 1, the changes in the distributions of the explanatory variables x between the two periods can capture the change in poverty rate in the next period. Given the commonly observed variables in the two surveys, this assumption allows the imputation of the missing household consumption for survey 2. In practical terms it implies that the change in poverty rates over time is attributable to changes in the explanatory variables x rather than the returns to characteristics (or economic structure) and the unexplained characteristics (or random shocks), which are respectively represented by $\beta_1$ and $\varepsilon_1$. In other words, given the same observed characteristics x, households would be subject to the same level of poverty regardless of the time period the data were collected. While this assumption may seem counterintuitive, it may be especially relevant to economies where the returns to characteristics do not change or simply change little over time.
Clearly, this is a testable assumption if household consumption is available for both of the periods under consideration.
As discussed earlier, previous studies commonly assume that the distributions of the household consumption parameters $\beta_1$, $\mu_1$, and $\varepsilon_1$ in equations (1) and (2) based on the data in survey (or period) 1 remain the same for the data in survey (period) 2. Assumption 2 is less restrictive since it allows for the estimated parameters to change over time, as long as the changes in the distribution of the variables x alone can correctly capture the change in poverty rate. Technically speaking, Assumption 2 only requires that, overall, the parts of the consumption distributions below the poverty line for both periods (that can be explained by the changes in x in our model) be equal, and not that all the percentiles along the consumption distributions be equal as implied by the assumption made in existing studies; this result is formally stated in Corollary 1.2 below.¹²
Given these two assumptions, we propose the following proposition that lays out the estimation framework.¹³
Proposition 1: Imputation framework
Given Assumptions 1 and 2, the poverty rate based on data in survey 2 can be predicted using data in survey 1. In particular, let P(.) be the poverty function and $y_2^1$ be defined as $y_2^1 = \beta_1' x_2 + \mu_1 + \varepsilon_1$. Then
$$P(y_2 \le z_2) = P(y_2^1 \le z_1) \qquad (4)$$
Corollary 1.1
Let $\hat\beta_1$, $\hat\mu_1$, and $\hat\varepsilon_1$ represent the estimated parameters obtained from equation (1), and let $\hat y_{2,s}^1 = \hat\beta_1' x_2 + \tilde\mu_{1,s} + \tilde\varepsilon_{1,s}$, where $\tilde\mu_{1,s}$ and $\tilde\varepsilon_{1,s}$ are the s-th random draws (simulations) from their estimated distributions, for s = 1, ..., S. The poverty rate in period 2 and its variance can then be estimated as
$$\hat P_2 = \frac{1}{S}\sum_{s=1}^{S} P(\hat y_{2,s}^1 \le z_1) \qquad (5)$$
$$V(\hat P_2) = \frac{1}{S}\sum_{s=1}^{S} V(\hat P_{2,s} \mid x_2) + V\Big(\frac{1}{S}\sum_{s=1}^{S} \hat P_{2,s} \,\Big|\, x_2\Big) \qquad (6)$$
Corollary 1.2
Instead of Assumption 2, assume the traditional but more restrictive assumption that the consumption model parameters in equation (1) remain the same in period 2 (that is, $\beta_2 \equiv \beta_1$, $\mu_2 \equiv \mu_1$, and $\varepsilon_2 \equiv \varepsilon_1$). Given Assumption 1 and this stricter assumption, we have $W(y_2^1) = W(y_2)$, where W(.) is a general one-to-one mapping welfare function, which includes the poverty function P(.) as a special case.
¹² Assumption 2 is also more general in the sense that it practically allows for the estimated parameters to change even in different directions, as long as the changes in the x variables can capture the net changes in poverty given the estimated parameters in period 1. Another difference between Assumption 2 and the stricter assumption of constant parameters, related to model checking, is that backward imputation (i.e., using the predicted coefficients from the later survey round to impute backwards on the data in the earlier survey round) may not necessarily yield the same results as forward imputation. The difference in terms of prediction accuracy between the two would also depend on the changes in these predicted coefficients, in addition to the changes in the x characteristics over time. The former type of changes is set to zero under the stricter assumption but allowed to occur with our more general assumption.
¹³ Note that in situations where Assumption 1 fails (e.g., one survey is representative of the whole population while the other survey specially targets a population segment such as elderly people), survey imputation may still be feasible provided that Assumption 2 holds. In such cases, Assumption 2 essentially boils down to implying that the estimated parameters for equation (1) with the appropriate adjustments (say, by including the dummy variables for different population groups) apply to the population group targeted by the other survey.
Some remarks about Proposition 1 and its corollaries may be useful. First, the simulation of the error terms for households in survey 2 is mandatory rather than a matter of choice since we are working with two cross sections, which by definition precludes the linkage of households in survey 1 to those in survey 2. Second, we use the poverty line in period 1 in equation (5) rather than the poverty line in period 2 to be consistent with the estimated parameters that are also obtained from the data in period 1. More generally, the poverty line to be used should come from the same time period as the estimated parameters. The consistency between these estimated parameters in the same period holds by construction, and can in fact provide more comparable poverty estimates in contexts where there is reason to believe the poverty line (and/or consumption aggregates) is not consistently updated across the two different periods.
Third, the variance for the estimated poverty rate in (6) consists of two components: the variance of the estimated poverty rate conditional on household characteristics, averaged over the S simulations (the first term on the right-hand side of (6)), and the variance of the average of the predicted poverty rate (the second term on the right-hand side of (6)). This is related to Rubin's (1987) variance formula, the difference being that we exclude the component due to simulation errors in his formula. 14 The reason is simple: if the number of simulations is large enough, this component is negligible. We thus recommend using a large number of simulations (e.g., at least 1,000) in the estimation procedures proposed in the next section. 15 Furthermore, the first and second terms on the right hand side in (6)
Finally, the assumption of constant parameters employed by most, if not all, existing studies is overly restrictive and much more demanding than our Assumption 2. As implied by Corollary 1.2, this assumption leads to a number of very general results, for example that any imputed quantity (including mean consumption or any percentile along the consumption distribution) can approximate that based on the true data. These results are sweepingly broad and are thus unlikely to hold in most contexts. We return to the validity of this assumption in the next section on empirical results.
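Returning to the variance remark above, the role of the simulation-error component can be illustrated with Rubin's (1987) rule for combining S imputations. The block below is a sketch in assumed notation ($\hat{P}_{2,s}$ for the poverty rate in simulation s, $\bar{W}$ for the average within-simulation variance, and $B$ for the between-simulation variance); it is not a verbatim reproduction of equation (6).

```latex
\begin{align*}
\hat{P}_2 &= \frac{1}{S}\sum_{s=1}^{S}\hat{P}_{2,s}, \\
T_{\text{Rubin}} &= \underbrace{\frac{1}{S}\sum_{s=1}^{S} V\bigl(\hat{P}_{2,s}\mid x_2\bigr)}_{\bar{W}}
  \;+\; \underbrace{\frac{1}{S-1}\sum_{s=1}^{S}\bigl(\hat{P}_{2,s}-\hat{P}_2\bigr)^2}_{B}
  \;+\; \frac{B}{S}.
\end{align*}
```

With S = 1,000 the last term is one-thousandth of B, which is why dropping it and reporting $\bar{W} + B$ changes the result very little.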
In practice, the set of observed overlapping variables between the two survey rounds can be small (i.e., few common variables exist between the two surveys), so that these variables may be unable to capture well the intertemporal change in poverty. Put differently, Assumption 2 may not hold due to a limited set of overlapping variables, which can in turn invalidate our imputation framework. However, in such cases, if the trend in the unobserved variables across the survey rounds and the direction of their correlation with household consumption are known (or can be inferred from previous survey rounds), we can still obtain bound estimates of poverty as proposed in the following proposition. While Proposition 2 appears to require much additional information, it is relevant in cases where, for example, no data on household assets are available. Since assets are positively correlated with household consumption (see, e.g., Filmer and Pritchett (2001)), additional knowledge about the trend of asset ownership over time (say, from macroeconomic data or qualitative surveys) can help determine the direction of the bias in the estimates.

III.2. Validation in the Jordanian Context
We turn in this section to discussing poverty imputation using the 2008 and 2010 rounds of the HEIS. Since we have the actual consumption data in 2010, we can validate our imputation method by imputing from 2008 into 2010 to obtain imputation-based poverty estimates pretending that consumption data did not exist in the latter year, and then compare these estimates with the design-based (true) estimates based on the actual consumption data. We provide an overview of the country background and data description before discussing estimation results.

III.2.1. Country Background: Poverty in Jordan
The official poverty line in Jordan is constructed based on a "cost of basic needs" approach with a common food and non-food basket for all households, where the food consumption is year. This poverty line is then fixed for 2010 and is adjusted for changes in the cost of living using official CPI deflators to obtain a comparable poverty line in 2008 and its associated poverty rate of 19.5 percent.
Macroeconomic trends shown in Figure 1 appear to corroborate the poverty decline as shown by the household consumption data, since the downward sloping poverty trend is consistent with that of growth in real GDP per capita. The period between 2002 and 2007 sees rapid growth, which, however, slows down in the subsequent period between 2008 and 2010. Real GDP per capita grew by 3 percent and poverty was estimated to fall by about 5 percentage points in this latter period.
While poverty could be tracked between 2008 and 2010 with the consumption data from the HEIS, no consumption data exists after 2010 that can be used to monitor poverty trends.
Projections show per capita GDP growth to be weak, but this alone does not say much about poverty trends. The recent subsidy reforms and the associated cash transfer could well impact poverty, as could the various economic stresses including a continued weak labor market, increased energy prices, and a large influx of war refugees from Syria. 18 Against the background of infrequent collection of consumption data, the country's economically uncertain atmosphere provides an even stronger impetus for policy makers to track poverty with alternative methods like imputation-based estimates.

III.2.3. Estimation Results
We start first with checking on Assumptions 1 and 2 before discussing estimation results.
Since the 2008 and 2010 rounds of the HEIS share the same sampling frame based on the 2004 Population and Housing Census, and their questionnaire design remains almost identical, Assumption 1 for a similar survey design is satisfied. Assumption 2 is usually assumed and can only be checked if data for both survey rounds are available. In this case, since we are validating estimates with the actual consumption data, we can check this assumption using these data in both survey rounds.
We propose an explicit test for this assumption. Specifically, we can use a decomposition that is similar in spirit to the Oaxaca-Blinder framework (Oaxaca, 1973; Blinder, 1973), where the change in poverty between the survey rounds is broken down into two components: one due to the changes in the estimated coefficients (the first term in square brackets in equation (8) below) and the other due to the changes in the x characteristics (the second term in square brackets in equation (8))
where $\eta_j$ is defined as $\mu_j + \varepsilon_j$, j = 1, 2, for less cluttered notation. 19
Decomposition results are provided in Table 2, where seven different models are used. These models are built on a cumulative basis, with later models sequentially adding more variables to earlier models. The reason is that few common variables may exist between survey rounds in other settings, especially with surveys of different designs as will be discussed in the next section; using different models with different sets of control variables thus provides a useful illustration.
Model 1 is the most parsimonious model and consists of household size, the household head's age, age squared, gender, and highest completed years of schooling, a dummy variable indicating whether the head is Jordanian, and a dummy variable indicating urban residence. Model 2 adds to Model 1 household demographics, namely the shares of household members in the age ranges 0-14, 15-24, and 25-59 (with the reference group being those 60 years old and older).
Model 3 adds to Model 2 employment variables, which include dummy variables indicating whether the head worked in the past week, whether the household has at least one female member working in the past week, whether the household has one member working as employer, and whether the household has a member who is self-employed. These employment variables are commonly collected in most household surveys, and can provide a richer model than Model 1 while still keeping the model relatively parsimonious for most applications.
Model 4 adds to Model 3 some asset variables, including the number of rooms in the house, the construction materials for the outside wall of the building, the sources of drinking water, 20 and whether the household owns a car, computer, television set, desk phone, cell phone, internet, air conditioner, microwave, and a water filter. Model 5 adds a more detailed list of asset variables, which includes the physical characteristics of the house, the energy sources for cooking, and whether the household has a satellite dish/cable, video player, radio, camera, fax machine, fridge, freezer, oven, gas-operated oven, dishwasher, washing machine, vacuum cleaner, solar boiler, and a sewing machine. As an alternative to adding these asset variables to the basic ones in Model 3, Model 6 adds only the log of per capita income to Model 3. Finally, Model 7 adds the log of per capita income to Model 5. Full model specifications are provided in Appendix 2. Estimation results suggest that, unsurprisingly, as the list of control variables becomes richer, the share of the change in poverty that can be explained by the x characteristics grows larger.
For example, this component increases from around 70 percent in Models 1 to 3 to more than 80 percent in Models 4 and 5, and finally to more than 100 percent in Models 6 and 7. 21 This indicates that Assumption 2 is satisfied with Models 6 and 7, is likely to be satisfied with Models 4 and 5, and is less likely to be satisfied with the remaining models. As an additional check, we also present decomposition results for the changes in poverty using a more restrictive probit model (see, e.g., Yun (2004)). 22 Estimation results (Panel B, Probit model) are qualitatively similar, and even suggest that, in addition to Models 6 and 7, Models 4 and 5 can satisfy Assumption 2.
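As a hedged illustration of the two-component decomposition behind Table 2, the change in the poverty rate could be written as below. This is a sketch that follows the verbal description in the text, with $z_j$ denoting the poverty line in period j and $\beta_j' x_j$ the linear predictor (both assumed notation); it should not be read as a verbatim reproduction of equation (8).

```latex
\begin{align*}
P(y_2 \le z_2) - P(y_1 \le z_1)
  &= \bigl[\, P(\beta_2' x_2 + \eta_2 \le z_2) - P(\beta_1' x_2 + \eta_1 \le z_1) \,\bigr] \\
  &\quad + \bigl[\, P(\beta_1' x_2 + \eta_1 \le z_1) - P(\beta_1' x_1 + \eta_1 \le z_1) \,\bigr].
\end{align*}
```

The first bracket holds the x characteristics at their period-2 values and changes the coefficients, while the second holds the coefficients at their period-1 values and changes the x characteristics. In this form, Assumption 2 corresponds to the first bracket being (close to) zero, so that the second bracket accounts for roughly all of the change in poverty.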
For comparison purposes, we also provide a Wald (Chow) test for the assumption of constant parameters traditionally made in the existing studies. The test procedure is rather straightforward and includes four steps: i) pool the data for both years, ii) generate a dummy variable for the second year and then generate interaction terms between this dummy variable and all the control variables, iii) run a regression of household consumption on the usual control variables plus the year dummy variable and all its interaction terms, and iv) test the joint significance of the estimated coefficients on the year dummy variable and its interaction terms. The resulting test (Panel C, Wald test) overwhelmingly rejects the assumption of constant parameters for all seven models considered, which further emphasizes that our less restrictive Assumption 2 is more appropriate.
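The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the routine used for Table 2: it ignores survey weights, clustering, and stratification, it uses the F form of the test (which is equivalent to the Wald test divided by the number of restrictions under homoskedastic OLS), and the function name `chow_f_statistic` is hypothetical.

```python
import numpy as np


def chow_f_statistic(X1, y1, X2, y2):
    """F statistic for H0: intercept and slopes are equal across two years.

    X1, X2 are (n_j, k) arrays of controls without a constant; y1, y2 are
    the consumption outcomes. Survey design features are ignored here.
    """
    X = np.vstack([X1, X2])                      # step (i): pool both years
    y = np.concatenate([y1, y2])
    n, k = X.shape
    year2 = np.concatenate([np.zeros(len(y1)), np.ones(len(y2))])  # step (ii)
    const = np.ones(n)
    restricted = np.column_stack([const, X])
    # step (iii): controls plus year dummy plus dummy-by-control interactions
    unrestricted = np.column_stack([const, X, year2, X * year2[:, None]])

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    q = k + 1                                    # year dummy + k interactions
    df_resid = n - unrestricted.shape[1]
    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    # step (iv): joint test of the dummy and its interactions
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_resid)
```

The statistic would be compared with an F(k + 1, n - 2k - 2) critical value; a large value indicates that the constant-parameter assumption is rejected.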
Estimated poverty rates are then provided in Table 3. Consistent with our test for Assumption 2, estimates using Models 4 to 7 are within the 95 percent confidence interval of the true poverty rate, while estimates based on Models 1 to 3 are just outside this interval. Adding a richer list of variables significantly improves the precision of the estimates from Models 1 to 3 to Models 4 to 7 (as indicated by the point estimates moving from outside to inside the 95 percent confidence interval of the true rate), but there is practically not much difference among the estimates provided by the latter four models.
22 The probit model is more restrictive than our estimation framework since it converts the continuous household consumption variable into a binary poverty-status variable for the dependent variable, and imposes a standard normal distribution where $\sigma^2_{\varepsilon_j}$ is assumed to equal 1.
These validation results provide rather encouraging support for the application of prediction-based methods to obtain poverty estimates in the absence of consumption data. Put differently, if consumption data in the 2010 survey round were not available, we could provide reasonable estimates using the consumption data in the 2008 survey round in combination with the household characteristics in the 2010 round.
Another interesting result is that, since household assets are known to have a positive correlation with household consumption (as empirically indicated by the regression results in Appendix 2, Tables 2.1 and 2.2), if we know that asset ownership rates are generally increasing over time (as seen in Appendix 2, Table 2.3), we would also know from Proposition 2 that estimation models that omit assets would provide upward biased estimates of the true poverty rate. Indeed, the estimates from Models 1 to 3 are around 1.5 percentage points higher than the true rate of 14.4 percent. Thus, with additional knowledge on the trend of asset ownership over time, Proposition 2 practically offers a way to obtain a bound estimate on poverty where Assumption 2 is not satisfied.

III.2.4. Alternative Imputation Methods
We provide other modelling options to our imputation framework in Table 4. The imputation framework that provides the estimates in Table 3 relies on the assumption of a normal distribution for the error terms $\mu_j$ and $\varepsilon_j$, j = 1, 2. Is this a valid assumption? We provide a robustness check by assuming no functional form for these error terms and using their empirical distribution instead. Estimation results (Table 4, row 1) provide accurate estimates only for Models 5 to 7, and higher poverty rates for the remaining models, suggesting that the normality assumption is reasonable and helps improve our estimates. Conversely, we provide another check by using the more restrictive probit model to directly estimate the poverty rate. 23 Estimation results are accurate for Models 4 to 6 but inaccurate for Model 7, and similarly provide higher poverty rates for the remaining models.
As discussed earlier in the literature review, MI methods are commonly used in statistics.
We provide estimates based on the MI method equivalent to our imputation framework in row 3 (the normal linear regression model), and another version that employs predictive mean matching method in row 4 (which essentially matches a household's predicted consumption level in 2010 with its closest number in the actual consumption data in 2008 and then substitute the former with the latter for the household consumption in 2010; see Little (1988) to 7 under the MI predictive mean matching model perform well. This suggests that out of all these alternative modelling options, the MI predictive mean matching method brings the best results. Notably for the misspecified Models 1 to 3, all these alternative modelling options provide more upward biased estimates that are between one and two percentage points higher than those offered by our imputation framework.
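To make the matching idea concrete, the sketch below implements a simplified form of predictive mean matching. It is an illustration under assumptions, not the MI routine behind Table 4: each target household is matched to the single donor whose own predicted value is closest (the standard way of defining closeness in predictive mean matching), whereas MI software typically also redraws the model parameters and picks at random among several nearest donors. The function name and arguments are hypothetical.

```python
from bisect import bisect_left


def predictive_mean_match(donor_pred, donor_actual, target_pred):
    """Return, for each target prediction, the observed value of the donor
    whose own prediction is closest (single nearest donor, no random draw)."""
    donors = sorted(zip(donor_pred, donor_actual))
    preds = [p for p, _ in donors]
    matched = []
    for t in target_pred:
        i = bisect_left(preds, t)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(donors)]
        best = min(candidates, key=lambda j: abs(preds[j] - t))
        matched.append(donors[best][1])
    return matched
```

Here the donors would be the 2008 households, with both predicted and actual consumption, and the targets the 2010 households, with predicted consumption only.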

III.3. Estimation Procedures
We thus propose the following estimation procedures to predict the poverty rate in period 2, where consumption data are missing but the relevant characteristics x are available.
Step 1: Check that Assumption 1 is satisfied, which involves verifying that key features of the two surveys such as the sampling frames and the questionnaires are (essentially) the same. If data from earlier survey rounds are available, check that the regression model that is used for imputation satisfies Assumption 2 on these data.
Step 2: Using the data in survey round 1, estimate equation (1) and obtain the distributions of the predicted parameters $\beta_1$, $\mu_1$, and $\varepsilon_1$.
Step 3: Take a random draw from the normal distributions of the predicted parameters $\mu_1$ and $\varepsilon_1$ obtained in Step 2 and denote these by $\tilde{\mu}_1$ and $\tilde{\varepsilon}_1$. Then, using these predicted parameters and the data in survey round 2, estimate the consumption level for each household in round 2 as $y_2^1 = \beta_1' x_2 + \tilde{\mu}_1 + \tilde{\varepsilon}_1$.
Step 4: Estimate the quantity in (5) and the first term on the right-hand side of (6) (i.e., $V(P_2 \mid x_2)$), using the given poverty line $z_1$ in survey round 1 and $y_2^1$ obtained from Steps 2 and 3 above.
Step 5: Repeat Steps 3 to 4 S times and save the data with all S simulations. Take the average of the estimated quantity in (5) over the S simulations to obtain the estimate of the poverty rate in survey round 2. (We use S = 1,000 in our simulations below.)
Step 6: Take the average of the estimated quantity for $V(P_2 \mid x_2)$ over the S simulations to obtain the estimate of the first term on the right-hand side of (6). Obtain the estimate for the second term on the right-hand side of (6) using the simulated dataset, and add it to the estimate of the first term to obtain the estimate of the variance of the poverty rate in survey round 2.
Step 7 (recommended but optional): Provide additional robustness checks using the empirical distributions of the error terms or other modelling options discussed above.
23 The difference is that we use a random effects probit model to estimate equations (1) and (2)
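The simulation steps above can be sketched as follows. This is a minimal illustration under several assumptions that are not part of the procedure as stated: the fitted values from Step 2 are taken as given, the cluster and household error terms are merged into a single normal error, households are equally weighted, and the within-simulation variance uses the simple-random-sampling formula rather than a design-based estimator. The function name `impute_poverty` and its arguments are hypothetical.

```python
import random
from statistics import fmean, variance


def impute_poverty(xb2, sigma_eta, z1, n_sims=1000, seed=0):
    """Simulation-based poverty rate and variance for survey 2.

    xb2       : fitted values beta_1'x_2 (e.g., log consumption) for the
                survey-2 households, using coefficients from survey 1
    sigma_eta : std. dev. of the combined error mu_1 + eps_1 from survey 1
    z1        : period-1 poverty line, on the same scale as xb2
    """
    rng = random.Random(seed)
    n = len(xb2)
    rates, within = [], []
    for _ in range(n_sims):
        # Step 3: draw an error for every household and impute consumption.
        poor = [xb + rng.gauss(0.0, sigma_eta) <= z1 for xb in xb2]
        p = sum(poor) / n                    # Step 4: headcount rate
        rates.append(p)
        within.append(p * (1.0 - p) / n)     # sampling variance (SRS assumption)
    rate = fmean(rates)                      # Step 5: average over simulations
    # Step 6: average within-simulation variance plus between-simulation variance.
    total_var = fmean(within) + variance(rates)
    return rate, total_var
```

For example, `impute_poverty(xb2, sigma_eta=0.4, z1=6.5)` would return the imputed poverty rate and its variance for a list `xb2` of fitted log-consumption values, where the numerical inputs are purely illustrative.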

IV. Imputation Combining Surveys of Different Designs to Update Poverty Estimates
While it may not seem unreasonable to make the assumption that the two rounds of survey under consideration are representative of the population and consequently produce comparable estimates (Assumption 1), this assumption is restrictive. Contexts where one survey has consumption data while the other does not and the two surveys do not produce the same statistics are much more common for a variety of reasons. We consider the application of poverty imputation in such contexts by relaxing Assumption 1 and analyze the LFS in this section.
Jordan's Department of Statistics is responsible for implementing both the HEIS and the LFS, but different departments within this agency are in charge of each survey and thus conduct them independently according to their different mandates. The HEIS collects consumption data and is implemented every two years, while the LFS does not have consumption data but collects data on labor statistics more frequently, on a quarterly basis.

IV.1. Making the Different Survey Designs More Comparable
The violation of Assumption 1 implies that the estimated distributions of the common variables for the two surveys in the same period may be different and not representative of the same underlying population. We propose to "standardize" the distributions of the variables in survey 2 by those in survey 1 in Proposition 3 below. 24

Proposition 3: Standardizing common variables in surveys of different design
Assume that survey 2 has the same design over time, is collected more frequently than survey 1, and that the time periods for which data from the former are available include the periods for which data from the latter are available. Assume further that the overlapping variables between the two surveys follow a normal distribution such that $x_{jt} \sim N(\mu_{jt}, \sigma_{jt}^2)$, j = 1, 2.
Proof. Appendix 1.
The intuition behind Proposition 3 is that, for the overlapping period t between the two surveys, the distribution of the variables in survey 2 can be standardized against those in survey 1 in the standard way. Once this is done, these standardized distributions in period t can be used as a benchmark for other periods when only data from survey 2 exist. The term in parentheses on the right-hand side of equation (11) tracks the changes in the means of the variables in survey 2 over time, which is then rescaled with the relative differences in these variables' variances between survey 1 and survey 2, and finally made comparable to the distributions of the variables in survey 1 by anchoring to their means. In practice, since the x variables in different rounds of the same survey (particularly if they are adjacent in time) typically have roughly equal variances, we can assume the within-survey scaling factor for these variables, $\sigma_{2t'}/\sigma_{2t}$, equals one. Note that equation (11) is also a general version of equation (10): the former is identical to the latter when t' = t.
We can then modify the estimation procedures provided in Section III.3 for two surveys of different designs by replacing the step of checking Assumption 1 with the following two steps: i) standardizing the distributions of the control variables in survey 2 according to those in survey 1 using Proposition 3 (if necessary), and ii) checking that imputation using these standardized variables provides estimates that are not statistically different from the true rate for the same year.
For better estimation results, it may also be useful to transform (some) variables in both surveys to normality before standardizing them. We come back to discuss this more in the next section.
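The standardization idea in Proposition 3 can be sketched as follows. This is an illustration based on the verbal description of the proposition, not a transcription of equations (10) and (11): it is unweighted, it sets the within-survey scaling factor to one as suggested in the text, and the function name and argument names are hypothetical.

```python
from statistics import fmean, pstdev


def standardize_to_survey1(x2_target, x2_bench, x1_bench):
    """Rescale a survey-2 variable so that, in the benchmark period, it has
    the mean and standard deviation of the same variable in survey 1.

    x2_target : survey-2 values in the period to be imputed (t')
    x2_bench  : survey-2 values in the overlapping period (t)
    x1_bench  : survey-1 values in the overlapping period (t)
    Assumes the survey-2 variance is stable between t and t'.
    """
    m1, s1 = fmean(x1_bench), pstdev(x1_bench)
    m2, s2 = fmean(x2_bench), pstdev(x2_bench)
    # Deviation from the survey-2 benchmark mean, rescaled by the ratio of
    # standard deviations, then anchored to the survey-1 mean.
    return [m1 + (s1 / s2) * (x - m2) for x in x2_target]
```

When `x2_target` is the benchmark-period data itself, the output has exactly the survey-1 mean and standard deviation; for a later period, any shift in the survey-2 mean carries over after rescaling.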

IV.2. Updating Poverty Estimates with Different Survey Sources

IV.2.1. Data Description for the LFS
The Employment Unemployment Survey (LFS) is the official source of employment and unemployment data in Jordan. While it shares certain similarities with the HEIS, such as a two-stage cluster stratified sampling design and a common sample frame based on the Population and Housing Census of 2004, its design is different. In particular, between 660 and 680 PSUs (depending on the year) were selected in the first stage out of a total of 1,336 PSUs for the whole country, and within each selected PSU, 10 households were randomly selected at the second stage. Twelve governorates are divided into 24 rural and urban strata, and the six major cities across the country with more than 100,000 people are strata on their own, which together form 30 strata in total. The LFS collects data on employment status, occupation, and economic activities for between 11,000 and 12,500 households on a quarterly basis, and these data are representative of the population for each quarter. The LFS questionnaire practically remains the same during the period under study. We analyze all 24 quarterly rounds of the LFS from 2008 to 2013 in this paper. 25
25 Half the sample households in the LFS are designed to be renewed across two consecutive years and for two straight quarters within a year. However, DOS does not maintain any identifying information that allows the construction of panel households or individuals over time, and the data provided to us have no non-Jordanians in three quarters in 2011 and 2012. For these reasons, we analyze each quarter of the LFS separately, and average the four quarters within each year to obtain the yearly estimates later.
The LFS does not collect data on assets but collects demographic and employment variables, as does the HEIS (those variables are used in Model 3, Table 2). The LFS also collects data on wage income in the past month for each worker, which is categorized into five income groups: less than 100 JD, 100 to 199 JD, 200 to 299 JD, 300 to 499 JD, and 500 JD or more. Since a considerable number (around 38 percent) of household heads did not work and thus had no income in the past month, we assign zero to the wage income for these individuals to make use of all the data. To match this categorical income variable in the LFS, we convert the continuous per capita income variable in the HEIS into a categorical variable with the same income categories.
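The conversion of a continuous income variable into the five LFS income groups can be sketched as below. This is a minimal illustration: the labels and the function name are hypothetical, values between two published bounds (e.g., 199.5 JD) are placed in the lower group, and a zero income falls into the lowest group, which is an assumption since the coding of zero incomes is not spelled out in the text.

```python
from bisect import bisect_right

# Lower bounds (in JD) of the second to fifth LFS wage-income groups.
CUTOFFS = [100, 200, 300, 500]
LABELS = ["<100", "100-199", "200-299", "300-499", "500+"]


def income_group(income_jd):
    """Map a continuous monthly income in JD to one of the five LFS groups."""
    return LABELS[bisect_right(CUTOFFS, income_jd)]
```

Applying `income_group` to the HEIS per capita income variable yields a categorical variable defined on the same five groups as the LFS wage income variable.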
We provide in Tables 5 and 6
Estimation results in Table 8 indicate a decreasing trend for poverty rates over time. Since data for each quarter are representative of the population, we can average the estimates for all four quarters in one year to obtain the yearly estimates, which are illustrated graphically in Figure 2. The decreasing trend in poverty is steady, even though it is less steep during 2010-2013 than in the previous period 2008-2010, perhaps due to the various events taking place in the economy during this time period as discussed earlier. 27 Notably, estimated poverty rates based on the non-standardized variables (not shown) provide a qualitatively similar decreasing trend over time.

IV.2.2. Estimation Results
The estimated poverty rates at the national level are encouraging. To further investigate whether this result holds at more disaggregated levels, we estimate poverty rates broken down by urban and rural areas. Estimation results (not shown) are rather encouraging for urban areas with estimated poverty rates falling within the 95 percent confidence interval of the true rates in both years. The same is true for estimates for rural areas for 2008 but not in 2010. One possible reason for this is that the Jordanian population is predominantly urban (83 percent, Table 6), thus it can be harder to predict poverty rates in rural areas which account for a smaller share of the population. 28

V. Conclusion
In this paper we develop a formal and generalized framework for survey-to-survey poverty imputation, which has typically been handled on an ad hoc basis in the existing literature. We offer less restrictive assumptions and formal tests for these assumptions where data are available, provide more insights into the selection of control variables for model building, and offer simpler variance formulae. Our framework can be generally applied to imputation either from one survey to another survey with the same design, or to another survey with a different design. We also provide a critical review of recent studies in the economics and statistics literatures that use imputation. Our estimation results, which combine different rounds of the HEIS as well as the HEIS with the LFS, are quite encouraging, with imputation-based poverty estimates not showing statistically significant differences from the true poverty rates. We also provide step-by-step estimation procedures that can facilitate the implementation of our proposed methods.
27 We also experimented with imputation from the HEIS into the DHS. However, one major issue is that the latter survey's most recent two rounds are in 2009 and 2012, which do not overlap with the HEIS, thus making it difficult to benchmark the DHS. We tried benchmarking both rounds of the DHS using the HEIS in 2010, and found a qualitatively similar decreasing trend in poverty across these two survey rounds.
28 It is also more demanding to make the distributions of the explanatory variables comparable for smaller population groups (e.g., as disaggregated by regional characteristics or other distributional characteristics such as quintiles) in surveys of different designs. We leave this extension for further research.
Even though we provide an illustration with data from a middle-income country like Jordan, our method is more general and can be applied in other contexts where household consumption surveys are not frequently or consistently collected, while other surveys that can be benchmarked to these household surveys exist. We thus provide support to the growing assessment that survey-to-survey imputation methods can be a valuable tool for poverty tracking purposes in developing countries, where financial and technical constraints on fielding (expensive and high-quality) consumption surveys can be particularly binding.
(2007), Christiaensen et al. (2012), and Mathiassen (2013). Studies for statistics include Rubin (1987), Little and Rubin (2002), Schafer and Graham (2002), van Buuren (2012), and Carpenter and Kenward (2013). We only consider imputation for cross sections in this table.

Features
Type of missing data Modelling Different Common
The decomposition of the changes in poverty for Panels A and C is implemented using, respectively, equation (8) and the Wald test as discussed in the text. The decomposition for Panel B uses the user-written Stata routine "mvdcmp" (Powers, Yoshioka, and Yun, 2011). All estimates adjust for complex survey design with cluster sampling and stratification. Full model specification is provided in Appendix 2,
10,908 10,908 11,223
Note: Standard errors are in parentheses. We use 1,000 simulations for the error terms. All estimates adjust for complex survey design with cluster sampling and stratification. The underlying regression results are provided in Appendix 2, Table 2.1. True poverty rate is the same for the estimation samples used in Models 1 to 7.

Estimated rate
True rate (0.7)
Note: Standard errors are in parentheses. We use 1,000 simulations for the error terms for simulation using the empirical distribution of the error terms (row 1) or using the probit model (row 2), and 50 simulations for MI methods (rows 3 and 4). The true poverty rate for 2010 is 14.4 percent, with a standard error of 0.5 percent. Model specification is the same as in Table 3 and provided in more detail in Appendix 2,
11,925
Note: * p<0.05, ** p<0.01, *** p<0.001. Standard deviations/errors are in parentheses. Differences are estimated with t-tests that take into account complex survey design with cluster sampling and stratification.

Quarter 4
Head's highest years of schooling completed Head worked in past 7 days Share of household members age 0-14 Share of household members age 15-24

HEIS
Household has at least one member self-employed Household has at least one member working as employer

Share of household members age 25-59
Share of household members working in past 7 days Household has at least one female member working in past 7 days Share of household members age 60 or older
11,223
Note: Imputation-based estimates for poverty rates using the LFS data in 2008 and 2010 are shown under the columns "Quarter 1" to "Quarter 4". True poverty rate estimated from the HEIS for each year is shown under the column "True Rate". Model 6 in Table 3 is used to estimate the underlying consumption model, which regresses household per capita consumption on household size, household head's age, age squared, gender, marital status, nationality, years of schooling, work status in the past 7 days, the shares of household members in the age ranges 0-14, 15-24, and 25-59, the share of members working in the past 7 days, and dummy variables indicating whether the household has at least one female member working in the past 7 days, at least one member working as employer, at least one member being self-employed, per capita income, and whether the household resides in an urban area. These control variables in each quarter of the LFS are standardized by those in the HEIS such that the former have the same weighted mean and standard deviation as the latter respectively in 2008 and 2010. The variables household size and age in both surveys are transformed to normality using the Box-Cox method before standardizing. 1,000 simulations are used for the estimates in each quarter. Standard errors are in parentheses. All estimates adjust for complex survey design with cluster sampling and stratification.
11,147
Note: Imputation-based estimates for poverty rates using the LFS data in 2008 and 2010 are shown under the columns "Quarter 1" to "Quarter 4". True poverty rate estimated from the HEIS for each year is shown under the column "True Rate". Model 6 in Table 3 is used to estimate the underlying consumption model, which regresses household per capita consumption on household size, household head's age, age squared, gender, marital status, nationality, years of schooling, work status in the past 7 days, the shares of household members in the age ranges 0-14, 15-24, and 25-59, the share of members working in the past 7 days, and dummy variables indicating whether the household has at least one female member working in the past 7 days, at least one member working as employer, at least one member being self-employed, per capita income, and whether the household resides in an urban area. These control variables in each quarter of the LFS are standardized by those in the HEIS and LFS respectively in 2008 and 2010 using Proposition 3(ii). The variables household size and age in both surveys are transformed to normality using the Box-Cox method before standardizing. 1,000 simulations are used for the estimates in each quarter. Standard errors are in parentheses. All estimates adjust for complex survey design with cluster sampling and stratification.
(A1.2)

By Assumption 1, since both $x_1$ and $x_2$ are representative of the population (either at the same time or in different time periods), we can replace $x_1$ with $x_2$ in equation (A1.1) to obtain the imputed household consumption in survey 2.

(A1.6)

where $P(\cdot)$ is the given poverty function.

Corollary 1.1
i) Since the poverty function $P(\cdot)$ is defined as the average poverty rate for the population, it is an expectation function. Using the iterated expectation rule,29 we can rewrite equality (A1.6) as

$\varepsilon$ represent the $s$th random draw from the estimated distributions for $\mu_1$ and $\varepsilon_1$ (see, e.g., Gourieroux and Monfort, 1997). The number of simulations $S$ should thus be large enough for equality (A1.10) to hold.
ii) The proposed variance formula is based on the total variance formula provided in equality (5.20) in Little and Rubin (2002). When $S$ tends to infinity (or is practically large enough), the third term on the right-hand side of equality (A1.11) vanishes, and the stated result follows.

Corollary 1.2
Given the stricter assumption of constant parameters in place of Assumption 2, that is

Proposition 2
Using a general matrix notation for the population

$$Y_j = X_j \beta_j + \eta_j$$

where $Y_j$ and $X_j$ are $n_j \times 1$ and $n_j \times k$ respectively, $\beta_j$ is $k \times 1$, and $\eta_j$ is $n_j \times 1$ and represents the vector of error terms, we can break down $X_j$ into two components: the observed variables $X_{j1}$ ($n_j \times k_1$) and the unobserved variables $X_{j2}$ ($n_j \times k_2$), for $k_1 + k_2 = k$ and $j = 1, 2$. We can rewrite equations (1) and (2) in a general format as

$$Y_j = X_{j1} \beta_{j1} + X_{j2} \beta_{j2} + \eta_j$$

where $\beta_{j1}$ and $\beta_{j2}$ are $k_1 \times 1$ and $k_2 \times 1$ accordingly.
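Returning briefly to the variance result in Corollary 1.1(ii), the role of the third term can be sketched numerically. The example below assumes the standard multiple-imputation decomposition (average within-simulation variance, plus between-simulation variance, plus between-simulation variance divided by $S$); the function name and the inputs are hypothetical, and the paper's exact definition of each term in (A1.11) may differ.

```python
import numpy as np

def combine_simulations(estimates, variances):
    """Combine S simulated poverty estimates into a point estimate and total variance.

    estimates: poverty rate obtained in each simulation
    variances: sampling variance of the poverty rate within each simulation
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    s = estimates.size
    point = estimates.mean()            # averaged poverty rate
    within = variances.mean()           # first term: average within-simulation variance
    between = estimates.var(ddof=1)     # second term: between-simulation variance
    third = between / s                 # third term: shrinks toward zero as S grows
    return point, within + between + third, (within, between, third)

# Made-up inputs for three simulations.
point, total, parts = combine_simulations(
    [0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004]
)
```

Because the third component is the between-simulation variance divided by $S$, it becomes negligible for a large number of simulations such as 1,000, leaving only the first two components.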
If imputed correctly, the predicted household consumption in survey 2 should be

where the next-to-last equality holds since by definition,

where the next-to-last equality holds since by definition,

Since $x_2$ is assumed to be normally distributed, so is its linearly transformed variable $x_{2\to 1,t}$.31 Since the first and second moments completely determine the distribution of normally distributed variables, $x_{2\to 1,t}$ and $x_1$ have identical distributions. In fact, strictly speaking, the assumption of normality is more restrictive than necessary; we can assume more generally that the distributions of $x_1$ and $x_2$ belong to the same location-scale family (see, e.g., Casella and Berger, 2002, p. 104).
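The moment-matching step in this argument can be written out explicitly. The following is a sketch under assumed notation rather than the paper's original display: $\mu_j$ and $\sigma_j$ denote the mean and standard deviation of $x_j$ in survey $j$, time subscripts are suppressed, and the standardized variable is assumed to take the usual location-scale form.

```latex
\begin{align*}
x_{2\to 1} &= \mu_1 + \frac{\sigma_1}{\sigma_2}\,(x_2 - \mu_2), \\
E[x_{2\to 1}] &= \mu_1 + \frac{\sigma_1}{\sigma_2}\,E[x_2 - \mu_2] = \mu_1, \\
\operatorname{Var}(x_{2\to 1}) &= \frac{\sigma_1^2}{\sigma_2^2}\,\operatorname{Var}(x_2) = \sigma_1^2 .
\end{align*}
```

The mean result uses $E[x_2 - \mu_2] = 0$, which holds by definition of $\mu_2$; the variance result uses $\operatorname{Var}(x_2) = \sigma_2^2$. Under these assumptions the transformed variable shares its first two moments with $x_1$.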
ii) Similarly, we want to show that the transformed variable $x_{2\to 1,t'}$ in survey 2 has the same distribution as $x_{1t'}$ in survey 1 at time $t'$. The assumption that the changes in the variables $x$ between time $t$ and time $t'$ are the same for survey 1 and survey 2 is equivalent to

The mean of the standardized variable

where the next-to-last equality holds using equality (A1.23). The variance of the standardized variable

where the last equality holds given our assumption that the variables $x$ in different rounds of the same survey are on the same scale, or

-0.000*** -0.000*** -0.000*** -0.000* -0.000* -0.000*** -0.000
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
Head is male -0.074*** -0.059*** -0.074*** -0.090*** -0.089*** -0.040*** -0.066***
(0
10791

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard errors are in parentheses. All estimations employ cluster random effects models. Model 5 and Model 7 add to Model 4 the types of house, the energy sources used for cooking, and dummy variables indicating whether the household has a radio, camera, satellite dish or cable, video player, fax machine, solar boiler, freezer, fridge, washing machine, oven, gas-operated oven, dishwasher, vacuum cleaner, and sewing machine.

Share of household members working in past 7 days
Head's highest years of schooling completed
Share of household members age 0-14
Share of household members age 15-24
Share of household members age 25-59
Head worked in past 7 days
Household has at least one female member working in past 7 days
Household has at least one member working as employer
Household has at least one member self-employed
Construction material for the outside walls of the building
Number of rooms in the house
10,936 11,142

Note: * p<0.05, ** p<0.01, *** p<0.001. Standard deviations/errors are in parentheses. Differences are estimated with t-tests that take into account complex survey design with cluster sampling and stratification.