WPS4155

How Good a Map? Putting Small Area Estimation to the Test

Gabriel Demombynes, Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw[1]

Abstract

This paper examines the performance of small area welfare estimation. The method combines census and survey data to produce spatially disaggregated poverty and inequality estimates. To test the method, predicted welfare indicators for a set of target populations are compared with their true values. The target populations are constructed using actual data from a census of households in a set of rural Mexican communities. Estimates are examined along three criteria: accuracy of confidence intervals, bias, and correlation with true values. We find that while point estimates are very stable, the precision of the estimates varies with alternative simulation methods. The original Elbers et al (2002, 2003) approach of numerical gradient estimation yields standard errors that seem appropriate, but some computationally less intensive simulation procedures yield confidence intervals that are slightly too narrow. Precision of estimates is shown to diminish markedly if unobserved location effects at the village level are not well captured in the underlying consumption models. With well-specified models there is only slight evidence of bias, but we show that bias increases if the underlying models fail to capture latent location effects. Correlations between estimated and true welfare at the local level are highest for mean expenditure and poverty measures and lower for inequality measures.

Keywords: Poverty, Inequality, Small Area Estimation
JEL Classification: C13, C88, D31, I32, O15, R13

World Bank Policy Research Working Paper 4155, March 2007

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished.
The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent. Policy Research Working Papers are available online at http://econ.worldbank.org.

[1] World Bank, Free University of Amsterdam, UC Berkeley and World Bank. We are grateful to Martin Ravallion and Danny Pfeffermann for comments and suggestions. The views in this paper are the authors' and should not be interpreted to reflect those of the World Bank or affiliated institutions.

1. Introduction

This paper examines the performance of a method for producing small area estimates of the spatial distribution of economic welfare. The methodology is described in Elbers, Lanjouw and Lanjouw (2002, 2003), henceforth referred to as ELL (2002). These "poverty maps" offer the promise of generating useful data about poverty and inequality at the local level, information that has potential applications in both the policy and the research spheres. In this paper, an unusual data set is used to compare community-level welfare measures estimated with the small area estimation method against measures computed from direct observations of household expenditure collected over the entire population of those communities.

Poverty maps have two sets of uses. They can be used as tools for geographical targeting of social spending. In a number of countries they have been used by governments and non-governmental organizations to identify the areas where the poor are concentrated, as a first step towards directing resources to the poor. While policymakers in wealthy nations are accustomed to having information about local conditions and welfare readily at hand, in the typical less developed country information compiled at the local level is scarce and available only through specialized surveys.
In such environments poverty maps are a potentially valuable resource. On the research front, poverty maps have a variety of applications. With the resurgent interest in economic growth theory, and in particular the focus on the role of inequality, spatial profiles of welfare within a country can be useful. Poverty maps can also be used to investigate the spatial relationship between poverty and a variety of outcomes, including health and crime. The research applications are particularly strong when poverty maps can be produced for multiple years in a single country; in such cases poverty maps can be employed for policy evaluation.

The method examined here has been employed for a number of countries, and the resulting poverty maps have been used by both policymakers and researchers.[2] The growing popularity of the methodology adds to the need for a validation exercise. The analysis in this paper compares the poverty and inequality rates predicted by the methodology for groups of rural Mexican communities with the actual poverty and inequality rates in those communities. One strength of the small area estimation approach is that it produces confidence intervals for its estimated welfare measures. An important objective of this paper is to assess the degree to which the confidence intervals produced by the ELL method capture the distribution of error in the point estimates. Bias in the point estimates is also examined.

The paper is organized as follows. Section 2 details the poverty mapping methodology. Section 3 describes the data employed, Section 4 sketches the validation exercise, and Section 5 presents the results. Section 6 concludes with a discussion of the results and their implications.

2. Methodology

This section reviews the poverty mapping methodology, which is explained in more detail in ELL (2002).[3] The basic approach is straightforward and typically involves a household survey and a population census as data sources.
First, the survey data are used to estimate a prediction model for either consumption or income. The selection of explanatory variables is restricted to variables that can also be found in the census (or some other large dataset) or in a tertiary dataset that can be linked to both the census and the survey. The parameter estimates are then applied to the census data, expenditures are predicted, and poverty (and other welfare) statistics are derived. The key assumption is that the models estimated from the survey data apply to census observations.

[2] Poverty maps based on this method are now underway or completed in more than 30 developing countries. Early examples include Alderman et al. (2002) and Mistiaen, Ozler, Razafimanantena and Razafindravonona (2002). See also Demombynes et al (2002).

The first stage begins with an association model of per capita household expenditure for a household h in location c, where the explanatory variables are a set of observable characteristics:

$$\ln y_{ch} = E[\ln y_{ch} \mid x_{ch}] + u_{ch}. \qquad (1)$$

The locations correspond to the survey clusters as they are defined in a typical two-stage sampling scheme. The observable characteristics must be found as variables in both the survey and the census, or in a tertiary data source that can be linked to both data sets.[4] Using a linear approximation to the conditional expectation, the household's logarithmic per capita expenditure is modeled as

$$\ln y_{ch} = x_{ch}\beta + u_{ch}. \qquad (2)$$

The vector of disturbances, $u$, is distributed $F(0, \Sigma)$. The model in (2) is estimated by Generalized Least Squares (GLS) using the household survey data. In order to estimate the GLS model, $\Sigma$, the associated error variance-covariance matrix, is estimated. Individual disturbances are modeled as

$$u_{ch} = \eta_c + \varepsilon_{ch}, \qquad (3)$$

where $\eta_c$ is a location component and $\varepsilon_{ch}$ is a household component. This error structure allows for both spatial autocorrelation, i.e.
a "location effect" for households in the same area to the extent that it is not already captured by location-level explanatory variables, and heteroskedasticity in the household component of the disturbance. The two components are uncorrelated with each other and (by construction) uncorrelated with the observable characteristics in the regression equation.

[3] Early variants of the methodology were presented in Hentschel et al (2000) and Elbers, Lanjouw and Lanjouw (2000). These earlier versions differ in important ways from the approach outlined in ELL (2002).

The model in (2) is first estimated by simple OLS. The residuals from this regression serve as estimates of the overall disturbances, denoted $\hat u_{ch}$. These residuals are decomposed into uncorrelated household and location components:

$$\hat u_{ch} = \hat\eta_c + e_{ch}. \qquad (4)$$

The estimated location components, $\hat\eta_c$, are the within-cluster means of the overall residuals. The household component estimates, $e_{ch}$, are the overall residuals net of the location components. Two additional parameters are estimated: $\hat\sigma^2_\eta$, the variance of $\eta_c$, and $\hat V(\hat\sigma^2_\eta)$, the variance of $\hat\sigma^2_\eta$.[5]

To allow for heteroskedasticity in the household component, a logistic model of the variance of $\varepsilon_{ch}$ conditional on a set of variables, $z_{ch}$, is estimated, bounding the prediction between zero and a maximum, $A$, set equal to $1.05 \cdot \max\{e^2_{ch}\}$:

$$\ln\left[\frac{e^2_{ch}}{A - e^2_{ch}}\right] = z_{ch}^T\hat\alpha + s_{ch}. \qquad (5)$$

Letting $\exp\{z_{ch}^T\hat\alpha\} = B$ and using the delta method, the model implies a household-specific variance estimator for $\varepsilon_{ch}$ of

$$\hat\sigma^2_{\varepsilon,ch} \approx \left[\frac{AB}{1+B}\right] + \frac{1}{2}\,\mathrm{Var}(s)\left[\frac{AB(1-B)}{(1+B)^3}\right]. \qquad (6)$$

This heteroskedasticity model generates a vector of coefficient estimates, $\hat\alpha$, and its variance-covariance matrix, $\hat V(\hat\alpha)$. The coefficient estimates are used to predict $\hat\sigma^2_{\varepsilon,ch}$, the household-specific variance of $\varepsilon_{ch}$.

These error calculations are used to produce two square matrices of dimension n, where n is the number of survey households.
The first is a block matrix, where each block corresponds to a cluster and the cell entries within each block are $\hat\sigma^2_\eta$. The second is a diagonal matrix, with household-specific entries given by $\hat\sigma^2_{\varepsilon,ch}$. The sum of these two matrices is $\hat\Sigma$, the estimated variance-covariance matrix for the original model given by equation (2). Once this matrix has been calculated, the original model is estimated by GLS.

[4] Note that these variables need not be exogenous.
[5] See Appendix 1 of Elbers et al (2002) for details.

In the second stage, predicted log expenditures, and subsequently local-level estimates of poverty with their accompanying standard errors, can be generated via several routes. Elbers et al (2002) describe a method based on numerical gradient estimation. An alternative approach, known as parametric bootstrapping (Pfeffermann and Tiller, 2005), has been found to yield closely similar results and proceeds as follows.[6] A series of simulations is conducted, where for each simulation r a set of first-stage parameters is drawn from the corresponding distributions estimated in the first stage. A set of beta and alpha coefficients, $\tilde\beta^r$ and $\tilde\alpha^r$, is drawn from the multivariate normal distributions described by the first-stage point estimates and their associated variance-covariance matrices. Additionally, $(\tilde\sigma^2_\eta)^r$, a simulated value of the variance of the location error component, is drawn.[7] Combining the alpha coefficients with census data, the household-specific variance of the household error component, $(\tilde\sigma^2_{\varepsilon,ch})^r$, is estimated for each census household.
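As a rough illustration of the heteroskedasticity model in equations (5) and (6) and of the parameter draws that open each simulation r, the Python sketch below fits $\hat\alpha$ by OLS on the transformed squared residuals, evaluates the delta-method variance in (6), and draws $\tilde\beta^r$, $\tilde\alpha^r$ and $(\tilde\sigma^2_\eta)^r$, with the last drawn from a gamma distribution matched to the mean and variance given in footnote 7. This is a minimal sketch under stated assumptions, not the ELL or POVMAP2 implementation. The function names, the data, the variables in $z_{ch}$, and the values of $\hat\beta$, $\hat V(\hat\beta)$, $\hat\sigma^2_\eta$ and $\hat V(\hat\sigma^2_\eta)$ are all hypothetical. The survey $z_{ch}$ also stand in for census households, and $\hat V(\hat\alpha)$ uses the textbook OLS formula.

```python
import numpy as np


def fit_hetero_model(e, Z):
    """Fit eq. (5) by OLS: ln[e^2 / (A - e^2)] = Z @ alpha + s, with A = 1.05 * max(e^2)."""
    e2 = e ** 2
    A = 1.05 * e2.max()
    target = np.log(e2 / (A - e2))
    alpha_hat, *_ = np.linalg.lstsq(Z, target, rcond=None)
    s = target - Z @ alpha_hat
    var_s = s @ s / (len(s) - Z.shape[1])          # residual variance of eq. (5)
    V_alpha = var_s * np.linalg.inv(Z.T @ Z)       # textbook OLS covariance (an assumption)
    return alpha_hat, V_alpha, A, var_s


def household_variance(Z, alpha, A, var_s):
    """Delta-method approximation of eq. (6) for each row of Z."""
    B = np.exp(Z @ alpha)
    return A * B / (1 + B) + 0.5 * var_s * A * B * (1 - B) / (1 + B) ** 3


def draw_parameters(beta_hat, V_beta, alpha_hat, V_alpha, sig2_eta_hat, V_sig2_eta, rng):
    """One set of first-stage parameter draws for simulation r."""
    beta_r = rng.multivariate_normal(beta_hat, V_beta)
    alpha_r = rng.multivariate_normal(alpha_hat, V_alpha)
    # A gamma with mean m and variance v has shape m**2 / v and scale v / m (footnote 7)
    sig2_eta_r = rng.gamma(sig2_eta_hat ** 2 / V_sig2_eta, V_sig2_eta / sig2_eta_hat)
    return beta_r, alpha_r, sig2_eta_r


rng = np.random.default_rng(0)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=n)])    # hypothetical z_ch: constant + one variable
e = rng.normal(scale=0.6 * np.exp(0.2 * Z[:, 1]))         # hypothetical heteroskedastic residuals e_ch

alpha_hat, V_alpha, A, var_s = fit_hetero_model(e, Z)
beta_hat, V_beta = np.array([5.0, 0.3]), np.diag([0.01, 0.002])  # hypothetical GLS output
beta_r, alpha_r, sig2_eta_r = draw_parameters(
    beta_hat, V_beta, alpha_hat, V_alpha, sig2_eta_hat=0.02, V_sig2_eta=5e-5, rng=rng)
sig2_eps_r = household_variance(Z, alpha_r, A, var_s)     # eq. (6) evaluated at the drawn alpha
```

In an application, `household_variance` would be evaluated on census rather than survey $z_{ch}$, and the draws repeated for each r = 1, ..., R.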
Then, for each household, simulated disturbance terms, $\tilde\eta^r_c$ and $\tilde\varepsilon^r_{ch}$, are drawn from their corresponding distributions.[8] A value of expenditure for each household, $\hat y^r_{ch}$, is simulated based on both predicted log expenditure, $x_{ch}\tilde\beta^r$, and the disturbance terms:

$$\hat y^r_{ch} = \exp\left(x_{ch}\tilde\beta^r + \tilde\eta^r_c + \tilde\varepsilon^r_{ch}\right). \qquad (7)$$

Finally, the full set of simulated per capita expenditures, $\hat y^r_{ch}$, is used to calculate estimates of the welfare measures for each target population.[9] This procedure is repeated R times, drawing a new $\tilde\beta^r$, $\tilde\alpha^r$, $(\tilde\sigma^2_\eta)^r$ and new disturbance terms for each simulation. For each subgroup, the mean and standard deviation of each welfare measure are calculated over all r = 1, ..., R simulations. For any given location, these means constitute our point estimates of the welfare measure, while the standard deviations are the standard errors of these estimates.

[6] We will see below that while the methods yield very similar point estimates, the approach employed in ELL (2002) produces slightly wider (and possibly more plausible) confidence intervals. In Appendix 1 we outline yet a third approach that yields confidence intervals that also more closely track those obtained with the method outlined in ELL (2002).
[7] The $(\tilde\sigma^2_\eta)^r$ value is drawn from a gamma distribution defined so as to have mean $\hat\sigma^2_\eta$ and variance $\hat V(\hat\sigma^2_\eta)$.
[8] Non-normality is allowed for in the distribution of both $\eta_c$ and $\varepsilon_{ch}$. For example, for each distribution a Student's t-distribution can be chosen, with degrees of freedom such that its kurtosis most closely matches that of the corresponding first-stage residual component, $\hat\eta_c$ or $e_{ch}$. An alternative, semi-parametric, approach can also be adopted in which standardized residuals are drawn from the first-stage survey residuals.

There are two principal sources of error in the welfare measure estimates produced by this method.[10] The first component, referred to as model error in ELL (2002), arises because the parameters of the first-stage model in equation (2) are estimated.
The second component, termed idiosyncratic error, is associated with the disturbance term in the same model, which implies that households' actual expenditures deviate from their expected values. While population size in a location does not affect the model error, the idiosyncratic error increases as the number of households in a target subgroup decreases.

[9] These calculations are performed using household size as weights, implicitly assuming that expenditure is distributed uniformly within households. The same methodology could be applied using equivalence scales to capture alternative intrahousehold distributional assumptions.

3. Data

The analysis in this paper uses data collected as part of the targeting and evaluation program of PROGRESA, a health, education, and nutrition program of the Mexican government. Assignment to PROGRESA was randomized by community: a census of all households in 506 communities was conducted in November 1997, 320 of these communities were incorporated into PROGRESA in late spring 1998, and three follow-up surveys (complete censuses) of households in all 506 communities were conducted in 1998 and 1999. Additionally, a survey was conducted in March 1998, before PROGRESA was introduced to treatment communities. The March survey included a fairly detailed expenditure survey. This paper employs household characteristic data from the November 1997 survey and an expenditure aggregate constructed from the March 1998 survey.[11] While it would be possible to undertake the analysis using income data from the November survey, the expenditure data is preferred for two reasons. First, the income data is very noisy: a substantial fraction of households report no income at all, and the income data shows no correlation with the March expenditure aggregate.
The March expenditure aggregate, in contrast, is highly correlated with an expenditure aggregate from the June 1999 survey (for control group households), suggesting that it is a fairly consistent measure of household welfare. Second, applications of the ELL methodology thus far have most commonly used household expenditure or consumption as the basis for welfare analysis, following the consensus that, given the potential for consumption smoothing, consumption is likely to be a better indicator of long-term welfare than income. While it would be preferable to have expenditure data collected at the same time as the household characteristics data, the household variables used here are unlikely to change substantially over time. Consequently, the time gap between the November and March surveys should not distort the analysis.

While detailed, the expenditure aggregate is less comprehensive than the typical consumption aggregates developed from some surveys carried out in developing countries. It covers only cash expenditures and does not include figures for rent. The expenditure survey was not carried out in 14% of the households interviewed in November 1997. These households, which are concentrated in a small number of communities, are not included in the analysis. The ten communities with fewer than 10 households with expenditure information are also excluded, leaving 20,544 households in 496 communities.

[10] A third potential source of error is associated with computation methods. Elbers et al (2002) show that this can be made arbitrarily small by selecting a sufficiently large number of simulations.
[11] Most questions in the November 1997 survey were similar to those in the 2000 national Mexican census. They concerned household characteristics and recent income of the household.

4. Analysis

The approach used for the validation exercise is to estimate a first-stage model using a "pseudo-survey" drawn from the PROGRESA households with a two-stage sampling procedure.
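The two-stage sampling of a pseudo-survey and the random grouping of localities into target populations (steps 1 and 3 below) can be sketched as follows. This is a hypothetical Python illustration on synthetic locality sizes, not the authors' code, and the function names are made up. The paper does not say which probability-proportional-to-size algorithm was used. The sketch therefore relies on NumPy's weighted sampling without replacement, which draws localities one at a time with probability proportional to size. That procedure only approximates a strict PPS design.

```python
import numpy as np


def draw_pseudo_survey(locality_of_hh, rng, n_localities=50, hh_per_locality=10):
    """Step 1: localities drawn with probability proportional to size, then households within them."""
    ids, sizes = np.unique(locality_of_hh, return_counts=True)
    # Successive weighted draws without replacement: an approximation to PPS sampling
    chosen = rng.choice(ids, size=n_localities, replace=False, p=sizes / sizes.sum())
    sample = [rng.choice(np.flatnonzero(locality_of_hh == loc), size=hh_per_locality, replace=False)
              for loc in chosen]
    return np.concatenate(sample)   # indices of the 500 pseudo-survey households


def group_into_targets(locality_ids, rng, group_sizes=(24,) * 4 + (25,) * 16):
    """Step 3: shuffle localities and cut them into 20 target populations."""
    assert sum(group_sizes) == len(locality_ids)
    shuffled = rng.permutation(locality_ids)
    return np.split(shuffled, np.cumsum(group_sizes)[:-1])


rng = np.random.default_rng(1)
sizes = rng.integers(10, 80, size=496)               # synthetic locality sizes (>= 10 households each)
locality_of_hh = np.repeat(np.arange(496), sizes)    # locality index of each household

survey_idx = draw_pseudo_survey(locality_of_hh, rng)
targets = group_into_targets(np.arange(496), rng)
```

Calling both functions again with fresh random numbers corresponds to one new round of the repetition in step 6.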
Welfare measures are then predicted for target populations composed of groups of PROGRESA households. The PROGRESA communities themselves have too few households to produce meaningful confidence intervals for the estimates using the methodology. Previous experience, e.g. ELL (2002), has shown that standard errors are very large for target populations with fewer than a few hundred households. To generate more suitably sized target populations, the communities were grouped at random into 20 target populations. Both the pseudo-survey and the target populations were drawn repeatedly in order to generate estimates for a large number of target populations. Specifically, the steps in the analysis were as follows:

1) A random sample of 50 localities was drawn from the 496 localities, with probability of selection proportional to the size of the locality. From each of the 50 localities, 10 households were selected at random. The data from these households (a total of 500) serve as a pseudo-survey.
2) The first-stage methodology described above was applied using the pseudo-survey. A set of explanatory variables for log per capita expenditure was selected from a candidate list. An additional set of explanatory variables that best explained the estimated location effects was selected from a set of community-level averages.[12]
3) The 496 localities were grouped into 4 groups of 24 communities and 16 groups of 25 communities. These serve as the 20 target populations for the poverty mapping analysis, and the location effect is modeled at the level of the localities.

[12] From equations (2) and (3) it is clear that the variance of the location effect $\eta_c$ must be small if acceptable standard errors on welfare predictions are to be obtained. We have found that including means of explanatory variables, calculated from the census for the relevant enumeration areas, reduces $\sigma^2_\eta$ considerably. See ELL (2002) for details, and see also below.
The target populations each cover an average of 1042 households.
4) True poverty and inequality rates were calculated for the 20 target populations based on actual per capita expenditure.[13]
5) The poverty mapping methodology was applied to predict poverty and inequality rates for the 20 target populations, using first-stage models estimated with the pseudo-survey.
6) The entire procedure was repeated 10 times, drawing a new pseudo-survey for each round of analysis.

The output of this procedure is a set of poverty and inequality estimates, with associated standard errors, for 200 target populations. To examine the sensitivity of the estimates to the error specification, two different specifications are used for the second-stage analysis. In the first, both the location component and the household component of the error are modeled as Student's t-distributions. In the second, a semi-parametric approach is used for both the location and the household components: instead of drawing from a t-distribution, standardized residuals are drawn from the first-stage survey residuals. For both specifications, the household component of the error is modeled as heteroskedastic, with the predicted log per capita expenditure as the sole explanatory variable.[14]

[13] The poverty line was set to 159 pesos, the per capita expenditure of the median household in the full set of households. This corresponds roughly to PROGRESA's poverty-classification scheme: using discriminant analysis techniques based on household income, approximately 50% of households were initially classified as "poor" and thus qualified for PROGRESA.
[14] Note that for the semi-parametric approach, it is the standardized residuals that are drawn from the observed distributions in the survey.
These standardized residuals, with mean zero and variance one, are drawn and multiplied by the square root of the relevant simulated variance (of the location or household effect) to produce simulated residual values.

5. Results

First-Stage Results

OLS regression results from the first-stage models are given in Appendix 2, Tables A1-A10. Across the ten pseudo-surveys used here, the R² ranges from 0.415 to 0.53 (see Table 1). The explanatory power of the models in this analysis is in the general range of models from past applications. The R² for models for particular strata ranged from 0.45 to 0.77 in Ecuador (Hentschel et al, 2000), from 0.29 to 0.63 in Madagascar (Mistiaen et al, 2002), and from 0.47 to 0.72 in South Africa (Alderman et al, 2002). The explanatory power achieved with the PROGRESA models is rather good given that the households in the PROGRESA communities are more homogeneous than those within a stratum in a typical application. All the communities in the PROGRESA sample were selected for the program because they were poor and rural, based on indicators in the 1990 and 1995 censuses. Consequently, the households are more similar to one another than are the households in an entire stratum of a country. Household size was used in all models, and some variables were selected in the models for several pseudo-surveys, but there was generally little consistency in the models chosen across pseudo-surveys. The estimated location effects were generally small, with variances ranging from 0.9% to 3.1% of the overall variance of the disturbance term after the addition of cluster-level means. This can also be seen in the fact that the models achieved levels of explanatory power very close to what would be achievable with models that employed, instead, a cluster-level fixed-effects specification (see Table 1).
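The location-variance shares reported above rest on the residual decomposition in equation (4). The Python sketch below uses synthetic data shaped like one pseudo-survey (50 clusters of 10 households). It fits OLS, splits the residuals into cluster means $\hat\eta_c$ and household deviations $e_{ch}$, and computes the share $\hat\sigma^2_\eta/(\hat\sigma^2_\eta + \hat\sigma^2_\varepsilon)$. The function names and data are hypothetical. The estimator of $\sigma^2_\eta$ is a simple method-of-moments correction based on $\mathrm{Var}(\hat\eta_c) \approx \sigma^2_\eta + \sigma^2_\varepsilon/n_c$. It is an assumption made for illustration and is not necessarily the estimator in Appendix 1 of Elbers et al (2002).

```python
import numpy as np


def decompose_residuals(u_hat, cluster):
    """Eq. (4): u_hat = eta_hat[c] + e, with eta_hat the within-cluster mean residual."""
    _, inv, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    eta_hat = np.bincount(inv, weights=u_hat) / counts
    e = u_hat - eta_hat[inv]
    return eta_hat, e, inv, counts


def location_share(u_hat, cluster):
    """Share of the disturbance variance attributed to the location component."""
    eta_hat, e, _, counts = decompose_residuals(u_hat, cluster)
    sig2_eps = e @ e / (u_hat.size - eta_hat.size)                  # pooled within-cluster variance
    sig2_eta = max(np.mean(eta_hat ** 2 - sig2_eps / counts), 0.0)  # remove sampling noise in eta_hat
    return sig2_eta / (sig2_eta + sig2_eps)


rng = np.random.default_rng(2)
cluster = np.repeat(np.arange(50), 10)                          # 50 clusters x 10 households
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])  # hypothetical regressors
y = (X @ np.array([5.0, 0.3, -0.2])
     + rng.normal(0.0, np.sqrt(0.02), 50)[cluster]              # location effects eta_c
     + rng.normal(0.0, np.sqrt(0.5), 500))                      # household effects eps_ch
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_ols
eta_hat, e, inv, _ = decompose_residuals(u_hat, cluster)
share = location_share(u_hat, cluster)
```

The correction matters at 10 households per cluster. Even with no location effect at all, the mean of $\hat\eta_c^2$ still contains $\sigma^2_\varepsilon/n_c$ of pure sampling noise.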
Second-Stage Results

5.1 Point Estimates and Precision

Tables 2 and 3 present illustrative results for the headcount rate based on two pseudo-surveys, 2 and 3.[15] For each of the 20 target populations, these tables present the true headcount rate as well as the headcount rate estimated with a variety of procedures. Column 1 presents estimates and standard errors based on the numerical gradient simulation procedure sketched out in Elbers et al (2002). Columns 2-4 present estimates based on the "parametric bootstrapping" (Pfeffermann and Tiller, 2005) procedure outlined in Section 2, computed using the POVMAP2 software purpose-written by Qinghua Zhao in the Research Department of the World Bank.[16] The parametric bootstrapping results vary depending on whether disturbances are drawn from the empirical distribution (Column 2) or from parametric distributions (Column 3). The estimates in Column 4 are based on a program written in SAS that also applies the procedure outlined in Section 2 (with disturbances drawn from a parametric distribution). They are presented to illustrate that simulation-based results do vary with different random number generating algorithms as well as seeds. Finally, the results presented in Column 5 are based on an alternative, non-parametric, scheme outlined in Appendix 1.[17]

[15] These two pseudo-surveys have been chosen arbitrarily in order to avoid unnecessary repetition. Qualitative conclusions are unchanged if other, or all, pseudo-surveys are examined.

Point estimates differ only slightly across the different simulation approaches. In Table 2, the true headcount rate for target population 1 is 60.5%, while the estimated rate for this target population varies between 60.9% and 61.6% across the different estimation approaches. The approaches are more clearly at odds in terms of the estimated standard errors.
In particular, standard errors derived from the "parametric bootstrapping" procedure described in Section 2 and summarized in Columns 2-4 tend to be somewhat smaller than those based on the numerical gradient method described in ELL (2002) (Column 1) and on the non-parametric approach of Appendix 1 (Column 5). In the case of pseudo-survey 2 the distinction is not of great significance: irrespective of methodology, the 95% confidence interval around each target population's estimated headcount rate encompasses the true poverty rate in 19 out of 20 cases. With other pseudo-surveys, however, the distinction does matter. Table 3 presents results based on a model of consumption estimated from pseudo-survey 3. With this survey, the "classical" approach (Elbers et al, 2002) and the alternative approach outlined in the appendix yield three cases where a target population's true poverty rate falls outside the 95% confidence interval around the estimated poverty rate. With the parametric bootstrap approach underpinning the estimates in Columns 2-4, the failure rate is higher (7 cases). For this pseudo-survey the parametric bootstrapping approach appears to produce standard errors that are too "optimistic", suggesting greater precision of the estimates than is warranted.

[16] POVMAP2 can be freely downloaded at http://iresearch.worldbank.org.
[17] Note that these estimates do not show significant differences in poverty between target populations. This reflects the relative homogeneity of the group of PROGRESA households, the random composition of the target populations, and the small size of the target populations (about 1000 households). Discriminating between the poverty levels of the target populations is not, however, the subject of the current paper, and all standard errors are about the same size as one would get from survey-based estimates at the aggregate level.
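The coverage check behind Tables 2-4 amounts to counting how often the true value lies within z standard errors of the estimate. For a 95% interval z = 1.96, and Table 4 uses two standard deviations. The hypothetical Python sketch below (the `coverage` function and all numbers are made up, not the paper's data) also shows how standard errors that are too small by a constant factor pull coverage below the nominal level, which is the pattern described for the parametric bootstrap with pseudo-survey 3.

```python
import numpy as np


def coverage(truth, estimate, se, z=1.96):
    """Fraction of target populations whose true value lies within estimate +/- z * se."""
    truth, estimate, se = (np.asarray(a, dtype=float) for a in (truth, estimate, se))
    return np.mean(np.abs(truth - estimate) <= z * se)


rng = np.random.default_rng(3)
truth = rng.uniform(0.4, 0.8, size=2000)   # synthetic true headcount rates
sd = 0.03                                   # actual sampling spread of the estimates
estimate = truth + rng.normal(0.0, sd, size=truth.size)

print(coverage(truth, estimate, np.full(truth.size, sd)))        # honest s.e.: roughly 0.95
print(coverage(truth, estimate, np.full(truth.size, 0.7 * sd)))  # s.e. 30% too small: roughly 0.83
```

With only 20 target populations per pseudo-survey, one or two misses more or fewer than expected is well within chance, so the comparison is most informative when pooled across pseudo-surveys.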
Given this evidence of a tendency for the parametric bootstrapping procedure to produce confidence intervals that are somewhat too narrow, from now on, unless explicitly noted otherwise, we employ the non-parametric approach outlined in Appendix 1. Additional comparisons, not reported here, confirm that conclusions derived with this simulation procedure also hold for estimates based on the considerably more computationally intensive numerical gradient approach outlined in ELL (2002). The important point to take away here is that simulation methods do seem to matter (with respect to standard errors, if not point estimates). Further research is underway to understand better why the different simulation methods do not always agree.[18]

Table 4 looks more closely at the confidence intervals estimated around welfare estimates produced with our non-parametric simulation scheme. If the confidence intervals accurately reflect the true uncertainty in the estimates, the fraction of cases in which the "truth" falls within a confidence interval around an estimate should be approximately equal to the corresponding confidence level. Note, however, that twenty 'target populations' are drawn for each of the ten 'surveys', so the experiments are not entirely independent. For each welfare measure and each of the ten pseudo-surveys, we count the number of instances in which true welfare in each of the 20 target populations falls within two standard deviations of the target population's estimated welfare level. For example, in the case of pseudo-survey 1, the true values of all four welfare measures (mean, headcount, squared poverty gap, and the General Entropy class inequality measure with parameter 0) always fall within the confidence interval around the estimated welfare measure.
In Table 2 we saw that for pseudo-survey 2 this occurred 95% of the time (19 out of 20 cases) for the headcount. Table 4 shows that the same was observed for the mean, while for the squared poverty gap and for inequality measured by GE0 the truth always falls within the confidence intervals calculated around the estimates. On average across all pseudo-surveys, the success rate is just under 95% for the mean consumption, headcount, and squared poverty gap measures, and just below 90% for the GE0 measure.

[18] The most recent version of POVMAP2 now offers the user the choice of the "classical" numerical gradient or the parametric bootstrapping procedures outlined in Section 2.

In Table 5 we consider how sensitive our estimated standard errors are to the presence of unobserved location effects. We saw in Table 1 that our preferred specifications for the different pseudo-surveys were quite successful in proxying unobserved location effects ($\hat\sigma^2_\eta / \hat\sigma^2_u$ ranges between 0.9% and 3%). How much larger would standard errors be if our underlying models had not been so successful in this respect? Table 5 compares estimates and standard errors of small area estimates of the headcount rate from pseudo-survey 2 based on two models: one with our preferred specification, and the other with a specification in which no census-mean variables were included.[19] In the latter model, the share of the variance of the overall disturbance term that is attributable to the variance of the cluster component is now 11.9%, a four-fold increase over the 2.7% in the preferred model (Table 5). At the all-census level, the two models predict headcount rates of 61.9% and 61.5%, respectively, both virtually indistinguishable from the 61.1% actual headcount rate in the population. However, the standard error for the model with no location variables is now 0.024, up by more than two fifths from the standard error of 0.017 obtained with the preferred model.
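Section 2 notes that idiosyncratic error grows as the target population shrinks. That is why most of it cancels out at the all-census level but not for target populations of about 1,000 households. The hypothetical Python illustration below holds predicted log expenditure fixed, so there is no model error and no location effect, and recomputes the headcount over repeated draws of household disturbances only. It then compares the spread of those headcounts for 1,000 and 20,000 households. The function name, the disturbance variance, the spread of predicted log expenditure, and the normalization of the log poverty line to zero are all assumptions.

```python
import numpy as np


def idiosyncratic_se(n_households, sigma2_eps=0.45, R=400, seed=0):
    """Std. dev. of the simulated headcount when only household disturbances are redrawn."""
    rng = np.random.default_rng(seed)
    xb = rng.normal(0.0, 0.4, size=n_households)   # fixed predicted log expenditure (log poverty line = 0)
    headcounts = np.empty(R)
    for r in range(R):
        eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=n_households)
        headcounts[r] = np.mean(xb + eps < 0.0)
    return headcounts.std(ddof=1)


se_target = idiosyncratic_se(1_000)     # roughly the size of one target population
se_census = idiosyncratic_se(20_000)    # roughly the size of the full set of households
print(se_target, se_census, se_target / se_census)   # ratio should be close to sqrt(20), about 4.5
```

Model error is not represented in this sketch. Section 2 notes that it is unaffected by population size.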
Part of the increase in the standard error is due to the fact that the explanatory power of the model with no location variables is lower than that of the preferred model. As a result, idiosyncratic error would be expected to be higher (see Section 2 and ELL (2002)). However, at the level of the total population most of the idiosyncratic error will have cancelled out, because poverty is being estimated over a population of more than 20,000 households. Thus the increase in the standard error from 0.017 to 0.024 is likely due mainly to our failure to capture unobserved location effects adequately. At the target population level, standard errors are higher than at the level of the total population, irrespective of the underlying model. Moving from the preferred specification to the model with no location variables, standard errors rise considerably, and in some cases the percentage change is even greater than at the level of the total population. For example, standard errors rise by as much as 43% across the two models for target population 2 (0.030 × 1.43 ≈ 0.043). Here, however, the changes in standard errors reflect both the influence of idiosyncratic error and our failure to capture location effects.

[19] Our calculations here are based on the numerical gradient "classical" simulation procedure.

5.2 The Level of Location Effects

Note that the location effect $\eta_c$ may include group effects at levels higher than the survey cluster. To see this, consider the following model with group random effects at a 'district' level (v) as well as at the cluster level (c):

$$\ln y_{vch} = x_{vch}\beta + \mu_v + \eta_{vc} + \varepsilon_{vch}.$$

As before, the error components are uncorrelated. If clusters are the primary sampling unit, a district is sampled only indirectly, viz. if one of the sampled clusters happens to be located in that district.
In a typical living standards survey there will only rarely be districts that have been sampled more than once in this way, making it impossible to separate the location effect in the sample into a 'district effect' and a 'cluster effect'. Assume accordingly that a district is sampled at most once, and write $v(c)$ for the unique district sampled along with the cluster. The model now becomes

$$\ln y_{v(c)ch} = x_{v(c)ch}\beta + \mu_{v(c)} + \eta_{v(c)c} + \varepsilon_{v(c)ch},$$

or, with obvious relabelling,

$$\ln y_{ch} = x_{ch}\beta + \eta^*_c + \varepsilon_{ch}, \qquad \eta^*_c = \mu_{v(c)} + \eta_{v(c)c}.$$

Consequently, the estimated variance of the location effect in a model with only cluster-level random effects is in fact an estimate of $\sigma^2_\mu + \sigma^2_\eta$, the combined group effects operating at the sample's cluster level.

In the simulation phase the analyst has to choose whether the location effect estimated from the pseudosurvey should be applied at the cluster or at the 'district' level. When there is no way of separating the location effect into a cluster and a 'district' effect, the best one can do is to assume either that the effect is entirely a cluster-level effect or that it occurs entirely at the district level. The latter is quite a conservative assumption, as it rules out that any part of the estimated location effect applies only at the cluster level; this approach might be considered as yielding an "upper bound" on the standard error. The former is "optimistic" in the sense that it yields standard errors that could underestimate the true standard error, particularly if the location effect is large. In our setting, it does not make sense to apply the location effect at a level higher than the cluster, as clusters correspond to villages and these have been assembled randomly into 20 target populations.
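Both points above, that a cluster-only model estimates the sum of the district variance $\sigma^2_\mu$ and the cluster variance $\sigma^2_\eta$ when each district contributes one cluster, and that applying the location effect at the 'district' level gives larger standard errors, can be illustrated with a small Python simulation. This is a stylized sketch, not the estimator used in the paper: it assumes equal-sized clusters, drops the regression part $x\beta$ entirely, replaces the mixed-model estimation with a one-way ANOVA moment estimator, and uses hypothetical variances and sizes throughout.

```python
"""Why a cluster-only model picks up sigma_mu^2 + sigma_eta^2 (a sketch).

Simulates residuals mu_v + eta_vc + eps_vch with one sampled cluster per
district, then applies a one-way ANOVA (method-of-moments) estimator of the
between-cluster variance. All parameter values are hypothetical.
"""
import random
import statistics


def simulate_residuals(n_districts, hh_per_cluster, var_mu, var_eta, var_eps, seed=0):
    """One cluster per district; each cluster is a list of household residuals."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_districts):
        # District and cluster effects are shared by every household in the cluster.
        location = rng.gauss(0.0, var_mu ** 0.5) + rng.gauss(0.0, var_eta ** 0.5)
        clusters.append([location + rng.gauss(0.0, var_eps ** 0.5)
                         for _ in range(hh_per_cluster)])
    return clusters


def anova_location_variance(clusters):
    """Between-cluster variance for equal-sized clusters:
    Var(cluster means) - (mean within-cluster variance) / m."""
    m = len(clusters[0])
    means = [statistics.fmean(c) for c in clusters]
    within = statistics.fmean([statistics.variance(c) for c in clusters])
    return statistics.variance(means) - within / m


def se_of_target_mean(var_location, var_eps, n_households, n_location_units):
    """Standard error of a target population's mean residual when one location
    draw is shared by each of n_location_units equal-sized groups of households."""
    return (var_location / n_location_units + var_eps / n_households) ** 0.5


if __name__ == "__main__":
    clusters = simulate_residuals(4000, 10, var_mu=0.03, var_eta=0.02, var_eps=0.3, seed=1)
    # Should be close to 0.03 + 0.02 = 0.05, not to the cluster-only 0.02.
    print(round(anova_location_variance(clusters), 3))
    # Hypothetical target population: 1,000 households in 20 villages.
    print(se_of_target_mean(0.05, 0.3, 1000, 20))  # "optimistic": effect applied per village
    print(se_of_target_mean(0.05, 0.3, 1000, 2))   # "upper bound": effect applied per district
```

The last two lines differ only in how many independent location draws are averaged within the target population; the idiosyncratic term `var_eps / n_households` is the same in both and becomes negligible as the number of households grows, which is also why it largely cancels at the level of the total population.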
ELL (2002) illustrate in the more plausible setting of rural Ecuador, however, that when the location effect estimated at the cluster level is assumed to apply entirely at a higher level (in Ecuador, the parroquia level), the idiosyncratic component of the standard error does rise appreciably. They also show, however, that the impact on overall standard errors is negligible because in their setting, as in the present study, the estimated location effect is small. If the introduction of cluster means or other cluster-level variables does not succeed in capturing group effects, the choice of the level of aggregation at which to apply the location effect in the simulations can affect final results more substantially. In such a case there would be a wider gap between the "optimistic" standard errors and the upper-bound estimates obtained by assuming that the location effect occurs entirely at the 'district' level.

5.3 Bias

Another way to gauge the reliability of small area estimates of welfare is to consider whether there is evidence of bias: a systematic tendency for estimates to deviate from the truth. Figures 1-4 show, for each target population and for four different welfare measures, the relationship between true welfare and the difference between true and estimated welfare. Figure 1 shows some tendency for the estimation procedure to overestimate mean per capita consumption for target populations with a low true mean consumption level, and to underestimate the mean consumption level of rich target populations. To see this, note that when true consumption is low the bias, defined here as "truth" minus estimated consumption, is negative, while it is positive when true average consumption is high. However, this relationship is not strong.
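The bias diagnostic used in Figures 1-5 (the average of truth minus estimate, together with how that difference varies with the truth) can be illustrated with a stylized Python simulation of the mechanism this section goes on to describe: when a village-level location effect is left out of the consumption model, poverty is overstated in villages that are truly better off and understated in villages that are truly poor. Everything here is hypothetical: parameters are fixed rather than estimated, the poverty line is set so that about half of households are poor, and the "captured" model is an idealized case in which location variables absorb the location effect completely.

```python
"""Bias from an omitted location effect (a stylized sketch).

Household log consumption is beta * x + eta + eps, with eta a village effect.
Two predictors of each village's headcount are compared: one that ignores eta
(treating all error as household noise) and an idealized one whose location
variables capture eta exactly. All parameter values are hypothetical.
"""
import random
from statistics import NormalDist, fmean

PHI = NormalDist().cdf  # standard normal CDF


def simulate_villages(n_villages=400, hh_per_village=50, beta=0.5,
                      sd_eta=0.5, sd_eps=0.5, log_z=0.0, seed=0):
    rng = random.Random(seed)
    sd_total = (sd_eta ** 2 + sd_eps ** 2) ** 0.5
    truth, est_omitted, est_captured = [], [], []
    for _ in range(n_villages):
        eta = rng.gauss(0.0, sd_eta)
        xs = [rng.gauss(0.0, 1.0) for _ in range(hh_per_village)]
        log_y = [beta * x + eta + rng.gauss(0.0, sd_eps) for x in xs]
        # True headcount: share of households below the poverty line.
        truth.append(fmean([1.0 if v < log_z else 0.0 for v in log_y]))
        # Omitted: P(beta*x + u < z) with u ~ N(0, sd_eta^2 + sd_eps^2).
        est_omitted.append(fmean([PHI((log_z - beta * x) / sd_total) for x in xs]))
        # Captured: P(beta*x + eta + eps < z) with eta known.
        est_captured.append(fmean([PHI((log_z - beta * x - eta) / sd_eps) for x in xs]))
    return truth, est_omitted, est_captured


def bias_diagnostics(truth, estimate):
    """Average of (truth - estimate) and the OLS slope of that difference on truth."""
    diff = [t - e for t, e in zip(truth, estimate)]
    t_bar, d_bar = fmean(truth), fmean(diff)
    cov = sum((t - t_bar) * (d - d_bar) for t, d in zip(truth, diff))
    var = sum((t - t_bar) ** 2 for t in truth)
    return d_bar, cov / var


if __name__ == "__main__":
    truth, omitted, captured = simulate_villages()
    print("location effect omitted :", bias_diagnostics(truth, omitted))
    print("location effect captured:", bias_diagnostics(truth, captured))
```

With these settings the average difference should be close to zero for both predictors, while the slope is steep and positive when the location effect is omitted and much flatter when it is captured, the pattern Figure 5 shows for the real data. The captured slope is not exactly zero, because sampling noise within each village enters both the truth and the difference.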
Overall, the average difference between estimated and true mean consumption is about 1.5 pesos, about 1% of the mean consumption level of the poorest target population. The bias is similarly modest for the headcount (Figure 2), the squared poverty gap (Figure 3) and the mean log deviation inequality measure (the General Entropy measure with parameter 0; Figure 4). The extent of bias in these estimates is related to the degree to which the model specification fails to capture location effects through census-mean variables or other variables intended to capture locality-level characteristics. As we saw in the preceding section and in Table 1, our model specifications are quite successful in removing the effect of latent community-level characteristics, and as a result the bias in our estimates is quite modest. If we produce estimates that omit village-level census means, the bias is accentuated. Figure 5 shows that when estimates are based on a consumption model that fails to capture unobserved location effects, the line relating the bias to true welfare becomes steeper: the headcount is overestimated by more in truly non-poor communities and underestimated by more in truly poor communities. The intuition behind this bias is straightforward: if there is a sizeable location effect and our model fails to capture it, poverty will tend to be overestimated in communities that are relatively well-off given the explanatory variables in the model, i.e. those with large positive location effects. Part of the reason these communities are well-off is likely attributable to community-wide characteristics, and this will not be reflected in estimates based on a model that fails to capture the effect of those characteristics. As a result, estimates will tend to overstate poverty in such communities.
Conversely, in truly poor communities, part of the reason they are so poor will be due to the broader characteristics of the community. Again, if the consumption model does not capture the impact of those broader characteristics, estimated poverty will tend to understate true poverty in the community. There is therefore a strong incentive to proxy location characteristics, not only to improve the precision of estimates (Section 5.1) but also to minimize a systematic tendency to overstate poverty in truly non-poor communities and understate poverty in truly poor communities.

5.4 Correlation

A further way to consider the reliability of the small area estimates is to examine the correlation between the predictions and the true values. Table 6 shows simple Pearson and Spearman rank correlations between true and predicted values. Each cell shows the correlation between predicted and true welfare across the 20 target populations. Rows represent alternative pseudosurveys and columns indicate alternative welfare measures. Correlations (both Pearson and rank) are positive and reasonably high for mean consumption and the two poverty measures (headcount rate and squared poverty gap). In the case of inequality the correlations are much lower, presumably because the target populations vary very little in terms of true inequality. Indeed, households in the PROGRESA communities are more homogeneous than those within a stratum in a typical poverty mapping application. All the communities in the PROGRESA sample were selected for the program because they were poor and rural, based on indicators in the 1990 and 1995 censuses. Consequently, the households are more similar to one another than the households in an entire stratum of a country. This high level of homogeneity across households (and target populations) is a somewhat unusual feature of this empirical application.
However, it might be expected to present a particularly difficult setting in which to implement the small area estimation methodology, and it therefore provides a useful (conservative) setting in which to gauge the methodology's performance.

6. Discussion

The results presented here offer a rough test of the ELL methodology and point to some tentative conclusions that may inform future applications of the ELL welfare mapping method. In terms of predictive power, the results provide strong evidence that ELL estimates have important information content. Bias is low, and the correlations between actual and predicted values of the poverty indices and the mean are generally positive and not insubstantial. For inequality measures, the results are generally weaker. Because the signal-to-noise ratio is lower for these inequality estimates, it is particularly important to take the error in the estimates into account when using them in research or policy applications.

The ability to provide confidence intervals is a crucial advantage of the ELL method compared with alternative approaches to welfare mapping. In the analysis presented here, alternative simulation methods were found to influence the size of the estimated standard errors on welfare estimates. The numerical gradient approach originally proposed in ELL (2002) was found to produce satisfactory standard errors, as was the non-parametric simulation procedure outlined in Appendix 1. However, the parametric bootstrapping procedure described in Section 3 was found to yield standard errors that are somewhat understated. It is not entirely clear why this latter procedure should suffer from this propensity, and further research is needed to resolve this concern.

An important objective of this analysis has been to document how important it is, when applying small area estimation methods, to think hard about possible unobserved community-level factors that may influence welfare outcomes.
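The correlations between estimated and true welfare reported in Table 6 take only a few lines of code to compute. The Python sketch below is ours, not the authors' code: it implements Pearson's coefficient directly and obtains Spearman's as the Pearson correlation of ranks, giving tied values their average rank (the usual convention, which we assume here). As example input it uses the pseudosurvey 2 headcount truth and column (1) estimates from Table 2; since the estimates underlying Table 6 may come from a different simulation procedure, the printed values need not reproduce the corresponding Table 6 cells. On Python 3.10 and later, `statistics.correlation` offers the Pearson case directly.

```python
"""Pearson and Spearman rank correlations between estimated and true welfare.

Spearman's coefficient is the Pearson correlation of ranks, with tied values
given their average rank. Example data: pseudosurvey 2 headcount, truth and
column (1) estimates of Table 2.
"""
from statistics import fmean


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


TRUTH = [0.605, 0.568, 0.572, 0.636, 0.612, 0.640, 0.621, 0.647, 0.610, 0.675,
         0.603, 0.568, 0.647, 0.604, 0.576, 0.595, 0.553, 0.589, 0.676, 0.613]
ESTIMATE = [0.614, 0.616, 0.621, 0.636, 0.586, 0.641, 0.568, 0.643, 0.592, 0.609,
            0.609, 0.681, 0.623, 0.591, 0.618, 0.613, 0.564, 0.634, 0.638, 0.654]

if __name__ == "__main__":
    print("Pearson :", round(pearson(ESTIMATE, TRUTH), 2))
    print("Spearman:", round(spearman(ESTIMATE, TRUTH), 2))
```

Ties matter in this example: two target populations share a true headcount of 0.568, two share 0.647, and two estimates equal 0.609.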
Experience with "poverty mapping" in a large number of countries indicates that including census means as regressors in the underlying consumption model (and/or household variables that capture "network" effects, or additional community-level variables from tertiary datasets such as administrative and GIS data) can go a long way towards securing specifications in which unobserved location effects are kept small. The analysis here has shown that failure to capture such location effects in this way can lead to markedly higher standard errors and also to increased bias.

It is important to recognize the limitations of the analysis in this paper. The data used here are less well suited to poverty mapping than those usually employed. First, the expenditure aggregate used is less comprehensive than that found in a typical developing country survey, and the general quality of the data may be worse than, for example, that of data collected in a World Bank LSMS survey. This reduces the potential for variation in expenditure to be explained by observed variables. Second, the data all come from poor households in rural Mexico. Consequently, there is relatively little variation in expenditure across households, and a relatively large fraction of the variation is due to measurement error or short-term fluctuations that cannot be explained by observable characteristics. The problem associated with the small range of expenditures is compounded in this exercise by the fact that it was necessary to construct target populations by randomly assembling groups of communities. This resulted in a narrow spread of welfare measure values across the target populations. The ELL method is likely to produce estimates with a higher signal-to-noise ratio when the underlying population has greater variation in consumption.

All in all, the analysis presented here suggests that the details of poverty mapping matter.
But the evidence also suggests that the small area estimation procedure can provide useful and reliable estimates of welfare at fine levels of aggregation that survey data by themselves could not accommodate.

References

Alderman, Harold, Miriam Babita, Gabriel Demombynes, Nthabiseng Makhatha, and Berk Özler. 2002. "How Small Can You Go? Combining Census and Survey Data for Mapping Poverty in South Africa." Journal of African Economies 11(3).

Demombynes, Gabriel, Chris Elbers, Jenny Lanjouw, Peter Lanjouw, Johan Mistiaen, and Berk Özler. 2002. "Producing a Better Geographic Profile of Poverty: Methodology and Evidence from Three Developing Countries." WIDER Discussion Paper No. 2002/39, The United Nations.

Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. 2000. "Welfare in Villages and Towns: Micro-Measurement of Poverty and Inequality." Tinbergen Institute Working Paper No. 2000-029/2, Amsterdam, Netherlands.

Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. 2002. "Micro-Level Estimation of Welfare." Policy Research Working Paper No. 2911, The World Bank, October 2002.

Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. 2003. "Micro-Level Estimation of Poverty and Inequality." Econometrica 71(1): 355-364.

Hentschel, J., J. O. Lanjouw, P. Lanjouw, and J. Poggi. 2000. "Combining Census and Survey Data to Study Spatial Dimensions of Poverty: A Case Study of Ecuador." World Bank Economic Review 14(1): 147-166.

Mistiaen, Johan, Berk Özler, Tiaray Razafimanantena, and Jean Razafindravonona. 2002. "Putting Welfare on the Map in Madagascar." World Bank Africa Region Working Paper Series No. 34, The World Bank.

Pfeffermann, D., and R. Tiller. 2005. "Bootstrap Approximation to Prediction MSE for State-Space Models with Estimated Parameters." Journal of Time Series Analysis 26(6): 893-916.

Table 1: Diagnostics for 10 Pseudosurvey Consumption Models

Pseudosurvey  Sample size  No. of clusters  R² (adjusted)  $\hat{\sigma}^2_\eta/\hat{\sigma}^2_u$  R²/R²_f.e.
1             500          50               0.4678         0.0291                                    0.927
2             500          50               0.4593         0.0270                                    0.912
3             500          50               0.5274         0.0247                                    0.927
4             500          50               0.4151         0.0019                                    0.901
5             500          50               0.5176         0.0195                                    0.961
6             500          50               0.4766         0.0259                                    0.920
7             500          50               0.4549         0.0263                                    0.971
8             500          50               0.4205         0.0241                                    0.910
9             500          50               0.4910         0.0088                                    0.945
10            500          50               0.4193         0.0310                                    0.874

Table 2: Pseudosurvey 2

Columns: (1) 'Classical' procedure (Elbers et al 2002); (2) PovMap2 (non-parametric); (3) PovMap2 (parametric); (4) SAS-based program; (5) Alternative procedure (see Appendix). Each column reports the estimate followed by its s.e.

Targetpop  Truth  (1)    s.e.   (2)    s.e.   (3)    s.e.   (4)    s.e.   (5)    s.e.
1          0.605  0.614  0.030  0.616  0.027  0.609  0.029  0.611  0.025  0.612  0.037
2          0.568  0.616  0.030  0.622  0.028  0.621  0.028  0.613  0.027  0.616  0.039
3          0.572  0.621  0.032  0.624  0.032  0.619  0.029  0.614  0.029  0.613  0.040
4          0.636  0.636  0.031  0.635  0.024  0.630  0.024  0.627  0.027  0.640  0.036
5          0.612  0.586  0.034  0.585  0.034  0.592  0.033  0.591  0.034  0.584  0.041
6          0.640  0.641  0.031  0.638  0.033  0.641  0.032  0.638  0.029  0.639  0.038
7          0.621  0.568  0.034  0.565  0.035  0.573  0.035  0.572  0.036  0.569  0.038
8          0.647  0.643  0.036  0.644  0.035  0.645  0.033  0.640  0.032  0.626  0.048
9          0.610  0.592  0.029  0.595  0.030  0.599  0.032  0.597  0.033  0.589  0.039
10         0.675  0.609  0.033  0.609  0.034  0.615  0.030  0.612  0.031  0.596  0.038
11         0.603  0.609  0.038  0.605  0.034  0.607  0.030  0.607  0.029  0.606  0.038
12         0.568  0.681  0.037  0.690  0.031  0.685  0.030  0.677  0.033  0.680  0.046
13         0.647  0.623  0.033  0.629  0.029  0.631  0.030  0.623  0.032  0.630  0.038
14         0.604  0.591  0.035  0.599  0.029  0.594  0.030  0.592  0.030  0.583  0.043
15         0.576  0.618  0.036  0.619  0.029  0.625  0.030  0.614  0.030  0.625  0.039
16         0.595  0.613  0.030  0.614  0.029  0.616  0.027  0.608  0.024  0.611  0.038
17         0.553  0.564  0.038  0.565  0.030  0.569  0.029  0.561  0.031  0.553  0.043
18         0.589  0.634  0.039  0.633  0.029  0.636  0.033  0.638  0.033  0.629  0.043
19         0.676  0.638  0.037  0.639  0.029  0.642  0.023  0.637  0.025  0.656  0.039
20         0.613  0.654  0.030  0.653  0.029  0.656  0.027  0.657  0.029  0.651  0.036
Cases of truth falling outside the 2 s.e. interval: (1) 1, (2) 1, (3) 1, (4) 1, (5) 1

Table 3: Pseudosurvey 3

Columns: (1) 'Classical' procedure (Elbers et al 2002); (2) PovMap2 (non-parametric); (3) PovMap2 (parametric); (4) SAS-based program (parametric); (5) Alternative procedure (see Appendix). Each column reports the estimate followed by its s.e.

Targetpop  Truth  (1)    s.e.   (2)    s.e.   (3)    s.e.   (4)    s.e.   (5)    s.e.
1          0.605  0.555  0.030  0.554  0.023  0.555  0.030  0.554  0.034  0.554  0.022
2          0.568  0.570  0.037  0.569  0.024  0.570  0.037  0.560  0.040  0.568  0.026
3          0.572  0.544  0.033  0.544  0.030  0.544  0.033  0.531  0.043  0.541  0.030
4          0.636  0.554  0.034  0.551  0.029  0.554  0.034  0.548  0.043  0.554  0.024
5          0.612  0.576  0.032  0.582  0.028  0.576  0.032  0.562  0.040  0.580  0.028
6          0.640  0.591  0.033  0.587  0.027  0.591  0.033  0.581  0.040  0.591  0.026
7          0.621  0.571  0.033  0.575  0.028  0.571  0.033  0.566  0.039  0.573  0.026
8          0.647  0.629  0.036  0.629  0.032  0.629  0.036  0.619  0.040  0.632  0.029
9          0.610  0.554  0.034  0.556  0.024  0.554  0.034  0.554  0.038  0.558  0.023
10         0.675  0.595  0.033  0.600  0.026  0.595  0.033  0.574  0.043  0.594  0.025
11         0.603  0.584  0.038  0.586  0.025  0.584  0.038  0.587  0.037  0.586  0.027
12         0.568  0.562  0.034  0.561  0.027  0.562  0.034  0.556  0.043  0.563  0.028
13         0.647  0.567  0.040  0.568  0.027  0.567  0.040  0.568  0.040  0.567  0.025
14         0.604  0.527  0.030  0.525  0.025  0.527  0.030  0.531  0.039  0.523  0.022
15         0.576  0.548  0.030  0.545  0.022  0.548  0.030  0.549  0.037  0.545  0.025
16         0.595  0.589  0.026  0.589  0.026  0.589  0.026  0.593  0.040  0.588  0.025
17         0.553  0.492  0.033  0.495  0.022  0.492  0.033  0.487  0.030  0.497  0.025
18         0.589  0.548  0.040  0.549  0.024  0.548  0.040  0.546  0.042  0.547  0.025
19         0.676  0.649  0.031  0.651  0.025  0.649  0.031  0.641  0.033  0.651  0.024
20         0.613  0.652  0.040  0.653  0.025  0.652  0.040  0.632  0.039  0.652  0.027
Cases of truth falling outside the 2 s.e. interval: (1) 3, (2) 7, (3) 7, (4) 7, (5) 3

Table 4: Relative Frequency of True Target Population Welfare Falling Within 95% Confidence Interval Around Estimated Welfare

Survey   Mean  Headcount  FGT2  GE0
1        1.00  1.00       1.00  1.00
2        0.95  0.95       1.00  1.00
3        0.90  0.85       0.80  0.95
4        0.95  1.00       1.00  0.90
5        1.00  1.00       1.00  0.85
6        0.80  0.90       0.80  0.60
7        0.95  0.95       0.95  1.00
8        0.95  0.95       0.90  0.70
9        0.85  0.85       0.80  0.90
10       0.95  0.90       0.90  0.95
Overall  0.93  0.94       0.92  0.89

Table 5: Precision of Headcount Estimates with and without Location Variables
Numerical gradient "classical" simulation; pseudosurvey 2, POVMAP2 calculations
Model I: model with location variables (sample size = 500, R² = 0.459, $\sigma^2_\eta/\sigma^2_u$ = 0.027)
Model II: model with no location variables (sample size = 500, R² = 0.413, $\sigma^2_\eta/\sigma^2_u$ = 0.119)

Columns: village code (sorted by true FGT0); population; true FGT0; Model I estimated FGT0 and s.e.; Model II estimated FGT0 and s.e.; % change in standard error moving from Model I to Model II.

Village  Population  True FGT0  I: FGT0  I: s.e.  II: FGT0  II: s.e.  % change in s.e.
1        946         0.605      0.614    0.030    0.600     0.040     33%
2        1046        0.568      0.616    0.030    0.622     0.043     43%
3        1162        0.572      0.621    0.032    0.604     0.042     31%
4        991         0.636      0.636    0.031    0.598     0.041     32%
5        1061        0.612      0.586    0.034    0.609     0.042     24%
6        935         0.640      0.641    0.031    0.606     0.040     29%
7        932         0.621      0.568    0.034    0.602     0.046     35%
8        861         0.647      0.643    0.036    0.653     0.042     14%
9        871         0.610      0.592    0.029    0.615     0.038     31%
10       1219        0.675      0.609    0.033    0.622     0.040     21%
11       845         0.603      0.609    0.038    0.615     0.038     0%
12       992         0.568      0.681    0.037    0.624     0.044     9%
13       1289        0.647      0.623    0.033    0.623     0.039     18%
14       1271        0.604      0.591    0.035    0.624     0.045     29%
15       854         0.576      0.618    0.036    0.612     0.039     8%
16       1141        0.595      0.613    0.030    0.614     0.038     27%
17       1181        0.553      0.564    0.038    0.582     0.044     16%
18       820         0.589      0.634    0.039    0.616     0.045     15%
19       1060        0.676      0.638    0.037    0.623     0.038     3%
20       1008        0.613      0.654    0.030    0.637     0.040     33%
Total    20485       0.611      0.619    0.017    0.615     0.024     41%

Figure 1: Checking for Bias (mean per capita consumption: truth minus estimate plotted against true value, by target population). Average difference: -1.49

Figure 2: Checking for Bias (headcount: truth minus estimate plotted against true value, by target population). Average difference: 0.012

Figure 3: Checking for Bias (squared poverty gap: truth minus estimate plotted against true value, by target population). Average difference: -0.0015

Figure 4: Checking for Bias (GE0: truth minus estimate plotted against true value, by target population). Average difference: -0.0024

Figure 5: Model Specification and Bias (headcount: truth minus estimate plotted against true value, by target population)

Table 6: Correlations Between Estimated and True Welfare Across Target Populations

         Mean               Headcount          FGT2               GE0
Survey   Pearson  Spearman  Pearson  Spearman  Pearson  Spearman  Pearson  Spearman
1        0.58     0.53      0.64     0.58      0.73     0.75      0.14     -0.05
2        0.27     0.32      0.20     0.22      0.47     0.55      0.02     -0.01
3        0.68     0.69      0.62     0.61      0.54     0.45      0.03     0.14
4        0.50     0.54      0.59     0.57      0.33     0.29      -0.11    -0.06
5        0.67     0.67      0.75     0.69      0.71     0.67      -0.02    0.12
6        0.45     0.50      0.67     0.73      0.80     0.78      0.06     0.15
7        0.37     0.36      0.35     0.30      0.21     0.20      0.24     0.07
8        0.66     0.67      0.59     0.50      0.53     0.51      0.18     0.15
9        0.22     0.11      0.23     0.12      0.15     0.04      0.11     0.18
10       0.28     0.21      0.38     0.28      0.18     0.08      -0.17    -0.18
Average  0.47     0.46      0.50     0.46      0.46     0.43      0.05     0.05

Appendix 1: A Non-parametric Simulation Procedure

In this appendix we describe the procedure used for generating the welfare predictions reported in the paper. The procedure was developed to diminish the role of distributional assumptions and increase the role of bootstrapping. A key aspect of the prediction is the way in which 'model error', the inevitable deviation between estimated and true parameters, is handled.20 So far we have accounted for model error using the estimated covariance matrices for the model parameters.
Alternatively, sampling error of the parameter estimates can be simulated directly, by re- sampling the survey and re-estimation of the parameters, which is what we do in the current paper. The survey is resampled by parametric bootstrapping of the error term, based on an initial set of point estimates and residuals. This procedure also allows us to detect bias in the estimators for the parameters of the error model. Starting from any given 'fake survey' the steps are as follows21: 1. For the current application, model selection must necessarily be a semi-automatic procedure. Thus we carry out an OLS regression of log per capita consumption on an extensive set of candidate variables. 2. Next we limit the number of covariates using a procedure for step-wise selection of regressors. 3. With the resulting set of regressors, we specify and estimate a linear mixed effect model accounting for both cluster random effects and household-level heteroskedasticity.22 We have used the following specification for heteroskedasticity: h =0e 1 h y^ where y^h denotes the point estimate of household h's log per capita consumption (pcx). 4. The estimation yields - point estimates for the regression coefficients,^ . - point estimates for log per capita expenditure, y^ . - point estimates for the heteroskedasticity model, ^ . - the ^ allows us to derive point estimates for the standard deviation of household- level errors, ^s . 20'True' is interpreted here as the parameter estimates that would result from a sample consisting of the full population. 21The computations have been carried out using R version 2.2.1 and the nlme package, version number 3.1.66. Script files of the procedure can be obtained upon request from the authors. 22 See Venables and Ripley (1997) and Bates and Pinheiro (1998). The procedures for estimating linear mixed effect models in R's nlme package can handle cluster random effects and household-level heteroskedasticity of a simple type. 
33 - residuals, which we split into mean residuals per cluster, ^ , the standard deviation of these, , and deviations from the cluster mean, ^ . - the standardized household residuals, ^ = ^ . ^s These estimates are used to check for bias in the estimation procedure. There is reason to expect such a bias, especially for the heteroskedasticity model and the variance of the cluster effects . 23 5. The general idea to generate 100 samples by parametric bootstrapping using the above parameters as the 'true' model. We resample 's from ^ , standardized household residuals from ^ , multiplying the latter with each households specific standard deviation from ^s . The total residual is added to y^ to yield a new value for log per capita expenditure for each household. The new value is compatible with the model estimated under 3 above, and with the value of household regressors. 6. Each bootstrapped sample is used to re-estimate the model and the mean of the estimates is used to check for estimation bias. It turns out that the bias (if any) is small and inconsequential. Nevertheless, we have compensated for bias in the estimators for and using the average bias found in this first round of simulations. With the adjusted values for the variance estimators we again generate 100 samples by parametric bootstrapping. 7. For each sample we restimate the model, resulting in point estimates for , , , and . These are used to impute log per capita consumption values for households in the 'census'. For census 'EAs' an is drawn from the estimation result, for households a is drawn and multiplied with the household-specific variance, using the current value of . The sum of cluster and household 'error' is added to the systematic part of log per capita expenditure, based on the household regressors and the current value of . Thus we generate values of log per capita expenditure for all households in the census. Using these we compute welfare statistics (poverty and inequality measures). 
The tables and figures in the text represent means and standard deviations of the simulated welfare statistics thus generated. 23See Pfefferman and Glickman(2004), and Rao (2003). The estimators for the regression coefficients are unbiased regardless of the error structure imposed. 34 Appendix References Bates, D.M. and Pinheiro, J.C. (1998) "Computational methods for multilevel models" available in PostScript or PDF formats at http://franz.stat.wisc.edu/pub/NLME/) Pfeffermann, D., and Tiller, R. (2005). Bootstrap Approximation to Prediction MSE for State-Space Models with Estimated Parameters. Journal of Time Series Analysis, 26, 893-216. Pfeffermann, D., and Glickman, H. (2004). "Mean Square Error Approximation in Small Area Estimation By Use of Parametric and Nonparametric Bootstrap". Invited lecture at the Joint Statistical Meeting, Toronto. Rao, J.N.K. (2003) Small Area Estimation. Wiley: New York. Venables, W.N. and Ripley, B.D. (1997) Modern Applied Statistics with S-plus. 3rd Edition, Springer-Verlag. 35 Appendix 2: OLS Regression Results of Consumption Models Table 1: Pseudo Survey 1 Dependent Variable: Log Per Capita Expenditure Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.998746 0.376066 15.951 < 2e-16 *** hsize -0.088087 0.013425 -6.562 1.37e-10 *** onlyindhead -0.357614 0.187783 -1.904 0.057450 . refrig 0.164402 0.076970 2.136 0.033187 * toilet -0.096050 0.052603 -1.826 0.068475 . 
vehicle 0.203101 0.088630 2.292 0.022359 * bilinghead -0.341641 0.080568 -4.240 2.67e-05 *** rechead 0.092900 0.059246 1.568 0.117526 av_femhead -0.898957 0.371149 -2.422 0.015798 * av_onlyindhead 2.250072 0.566152 3.974 8.13e-05 *** av_primedhead 0.774239 0.260069 2.977 0.003056 ** av_rechead 0.786780 0.223840 3.515 0.000481 *** av_runwater -0.098425 0.066368 -1.483 0.138717 rhsize2 0.796609 0.167641 4.752 2.66e-06 *** rroompp -0.174065 0.039715 -4.383 1.44e-05 *** rroompp2 0.011750 0.003473 3.384 0.000773 *** Multiple R-Squared: 0.4838, Adjusted R-squared: 0.4678 Table 2: Pseudo Survey 2 Dependent Variable: Log Per Capita Expenditure Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 9.129474 0.804244 11.352 < 2e-16 *** hsize -0.096499 0.014865 -6.492 2.12e-10 *** gasstove 0.172803 0.070264 2.459 0.014270 * refrig 0.133641 0.081375 1.642 0.101186 toilet 0.087655 0.059192 1.481 0.139298 adultfracf 0.327968 0.159454 2.057 0.040243 * av_adultfracm 0.747587 0.468641 1.595 0.111320 av_agehead -0.033981 0.007541 -4.506 8.29e-06 *** av_concreteroof -0.382385 0.207337 -1.844 0.065759 . av_femhead -2.605026 0.637204 -4.088 5.09e-05 *** av_primedhead -0.659667 0.308155 -2.141 0.032800 * av_radio -0.874114 0.318263 -2.747 0.006249 ** av_rechead 0.451829 0.286645 1.576 0.115622 av_runwater -0.179179 0.086103 -2.081 0.037964 * av_television 0.776940 0.212897 3.649 0.000292 *** av_waterheater 1.502314 0.854344 1.758 0.079308 . rhsize2 0.953988 0.147147 6.483 2.23e-10 *** rroompp -0.027115 0.017004 -1.595 0.111454 ragehead2 123.342227 50.256095 2.454 0.014470 * Multiple R-Squared: 0.4788, Adjusted R-squared: 0.4593 36 Table 3: Pseudo Survey 3 Dependent Variable: Log Per Capita Expenditure Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 7.740004 0.553045 13.995 < 2e-16 *** hsize -0.111892 0.012125 -9.228 < 2e-16 *** blender 0.142074 0.069276 2.051 0.040833 * brickwall -0.123116 0.065063 -1.892 0.059067 . 
gasstove 0.231063 0.072605 3.182 0.001556 ** naturalroof -0.169465 0.071431 -2.372 0.018070 * onlyindhead 0.242028 0.166921 1.450 0.147733 radio 0.140417 0.055806 2.516 0.012193 * stereo 0.247070 0.116874 2.114 0.035038 * adultfracf 0.302865 0.165445 1.831 0.067787 . bilinghead 0.163705 0.073534 2.226 0.026468 * agehead -0.002257 0.001585 -1.424 0.155226 secedhead 0.227859 0.118303 1.926 0.054693 . av_agehead -0.015256 0.006668 -2.288 0.022575 * av_blender -1.091239 0.259010 -4.213 3.02e-05 *** av_concreteroof 1.030535 0.205624 5.012 7.63e-07 *** av_femhead -0.657499 0.421795 -1.559 0.119708 av_hsize -0.096361 0.038338 -2.513 0.012285 * av_onlyindhead -0.539298 0.359583 -1.500 0.134336 av_primedhead -0.386760 0.255997 -1.511 0.131505 av_radio -0.745915 0.219001 -3.406 0.000715 *** av_refrig 0.870410 0.258107 3.372 0.000807 *** av_television 0.807982 0.192275 4.202 3.16e-05 *** av_toilet -0.258594 0.096860 -2.670 0.007851 ** av_waterheater -1.194664 0.657062 -1.818 0.069666 . rhsize2 0.949978 0.146055 6.504 1.99e-10 *** Multiple R-Squared: 0.5511, Adjusted R-squared: 0.5274 37 Table 4: Pseudo Survey 4 Dependent Variable: Log Per Capita Expenditure Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.061854 0.351890 14.385 < 2e-16 *** hsize -0.109834 0.016429 -6.685 6.35e-11 *** refrig 0.174286 0.076323 2.284 0.022831 * toilet 0.161254 0.054947 2.935 0.003497 ** adultfracm 0.320246 0.139893 2.289 0.022495 * adultfracf 0.293536 0.138096 2.126 0.034042 * bilinghead 0.143261 0.062064 2.308 0.021403 * secedhead 0.205535 0.105298 1.952 0.051520 . 
av_agehead 0.014903 0.007363 2.024 0.043521 * av_blender 0.423415 0.159784 2.650 0.008314 ** av_brickwall 0.382044 0.128597 2.971 0.003117 ** av_radio -0.727830 0.218147 -3.336 0.000914 *** rhsize2 0.476885 0.148333 3.215 0.001392 ** rroompp -0.140513 0.045565 -3.084 0.002161 ** rroompp2 0.012268 0.004478 2.740 0.006379 ** Multiple R-Squared: 0.4315, Adjusted R-squared: 0.4151 Table 5: Pseudo Survey 5 Dependent Variable: Log Per Capita Expenditure Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 6.20858 0.33501 18.533 < 2e-16 *** hsize -0.10914 0.01331 -8.198 2.22e-15 *** blender 0.17330 0.06220 2.786 0.00554 ** brickwall 0.19870 0.06127 3.243 0.00126 ** onlyindhead -0.31920 0.16104 -1.982 0.04804 * toilet 0.09907 0.05699 1.738 0.08279 . adultfracm 0.26519 0.13636 1.945 0.05239 . av_adultfracm 1.05350 0.38360 2.746 0.00625 ** av_blender -0.36338 0.16296 -2.230 0.02621 * av_femhead -0.88381 0.36526 -2.420 0.01590 * av_refrig 1.56893 0.30584 5.130 4.21e-07 *** av_runwater 0.19768 0.07834 2.524 0.01194 * av_secedhead -0.88101 0.49439 -1.782 0.07538 . av_toilet -0.38558 0.11117 -3.468 0.00057 *** av_washmachine -1.43055 0.49677 -2.880 0.00416 ** rhsize2 0.72648 0.15162 4.791 2.21e-06 *** rroompp -0.04117 0.01615 -2.550 0.01109 * Multiple R-Squared: 0.5331, Adjusted R-squared: 0.5176 38 Table 6: Pseudo Survey 6 Dependent Variable: Log Per Capita Expenditure Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4.830e+00 4.540e-01 10.639 < 2e-16 *** hsize -1.031e-01 1.365e-02 -7.555 2.13e-13 *** blender 1.693e-01 6.489e-02 2.608 0.009384 ** onlyindhead -3.751e-01 1.941e-01 -1.933 0.053881 . refrig 1.485e-01 7.901e-02 1.879 0.060809 . 
bilinghead         -3.069e-01  7.068e-02   -4.342  1.73e-05 ***
agehead            -6.775e-03  2.913e-03   -2.325  0.020469 *
av_adultfracm       2.464e+00  6.953e-01    3.545  0.000432 ***
av_agehead         -1.184e-02  5.958e-03   -1.987  0.047493 *
av_blender         -5.047e-01  1.965e-01   -2.569  0.010503 *
av_brickwall        1.187e+00  2.092e-01    5.671  2.45e-08 ***
av_concreteroof    -7.636e-01  2.516e-01   -3.035  0.002537 **
av_onlyindhead      3.661e+00  5.887e-01    6.219  1.09e-09 ***
av_rechead          1.371e+00  2.384e-01    5.752  1.57e-08 ***
av_refrig           4.606e-01  3.069e-01    1.501  0.134090
av_washmachine     -7.053e-01  3.694e-01   -1.909  0.056798 .
av_waterheater      2.058e+00  7.781e-01    2.645  0.008436 **
rhsize2             6.923e-01  1.459e-01    4.746  2.74e-06 ***
rroompp            -4.672e-02  1.594e-02   -2.931  0.003541 **
ragehead2          -1.285e+02  8.692e+01   -1.479  0.139890
Multiple R-Squared: 0.4965, Adjusted R-squared: 0.4766

Table 7: Pseudo Survey 7
Dependent Variable: Log Per Capita Expenditure
Coefficients:
                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)           7.05900    0.43468   16.240   < 2e-16 ***
hsize                -0.12896    0.01467   -8.793   < 2e-16 ***
brickwall             0.12956    0.06725    1.927  0.054605 .
refrig                0.27110    0.08101    3.347  0.000882 ***
toilet                0.10500    0.06705    1.566  0.117989
rechead               0.11263    0.05835    1.930  0.054186 .
av_brickwall          0.44800    0.19190    2.335  0.019975 *
av_concreteroof      -0.65035    0.24228   -2.684  0.007518 **
av_femhead           -2.13496    0.43132   -4.950  1.03e-06 ***
av_hsize              0.16780    0.04718    3.556  0.000414 ***
av_primedhead         0.73362    0.31380    2.338  0.019801 *
av_radio             -0.41700    0.19357   -2.154  0.031714 *
av_secedhead          1.06547    0.75789    1.406  0.160414
av_secplusedhead     -2.31016    1.26275   -1.829  0.067947 .
av_toilet            -0.33099    0.12981   -2.550  0.011084 *
av_waterheater       -1.91772    0.77601   -2.471  0.013809 *
rhsize2               0.51461    0.14845    3.467  0.000574 ***
rroompp              -0.04888    0.01765   -2.769  0.005839 **
Multiple R-Squared: 0.4735, Adjusted R-squared: 0.4549

Table 8: Pseudo Survey 8
Dependent Variable: Log Per Capita Expenditure
Coefficients:
                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)         6.734e+00  5.829e-01   11.552   < 2e-16 ***
hsize              -1.208e-01  1.689e-02   -7.151  3.22e-12 ***
radio               1.080e-01  6.370e-02    1.695   0.09076 .
refrig              2.748e-01  8.626e-02    3.186   0.00154 **
toilet              1.568e-01  7.117e-02    2.203   0.02806 *
vehicle             2.872e-01  1.095e-01    2.623   0.00898 **
agehead            -7.636e-03  3.424e-03   -2.230   0.02619 *
av_adultfracm       1.856e+00  9.437e-01    1.967   0.04976 *
av_concreteroof     8.002e-01  1.790e-01    4.472  9.70e-06 ***
av_femhead         -1.495e+00  5.201e-01   -2.875   0.00422 **
av_primedhead      -1.095e+00  4.020e-01   -2.724   0.00668 **
av_rechead          5.684e-01  2.779e-01    2.045   0.04139 *
av_runwater        -1.586e-01  8.212e-02   -1.931   0.05410 .
av_secedhead        2.328e+00  7.829e-01    2.974   0.00309 **
av_toilet          -2.154e-01  1.340e-01   -1.608   0.10844
rhsize2             8.414e-01  1.902e-01    4.424  1.20e-05 ***
rroompp            -7.209e-02  3.950e-02   -1.825   0.06864 .
rroompp2            6.130e-03  3.114e-03    1.968   0.04962 *
ragehead2          -1.873e+02  1.038e+02   -1.805   0.07173 .
Multiple R-Squared: 0.4414, Adjusted R-squared: 0.4205

Table 9: Pseudo Survey 9
Dependent Variable: Log Per Capita Expenditure
Coefficients:
                        Value  Std.Error   DF     t-value  p-value
(Intercept)          5.086357  0.1885547  441   26.975497   0.0000
hsize               -0.141745  0.0124185  441  -11.414072   0.0000
brickwall            0.104505  0.0600241  441    1.741055   0.0824
gasstove             0.135917  0.0672063  441    2.022382   0.0437
onlyindhead         -0.895540  0.1898896  441   -4.716112   0.0000
radio                0.137231  0.0543460  441    2.525141   0.0119
adultfracf           0.402884  0.1555154  441    2.590636   0.0099
bilinghead          -0.111148  0.0660533  441   -1.682702   0.0931
secedhead            0.260845  0.1083821  441    2.406719   0.0165
av_hsize             0.098639  0.0312940   44    3.152021   0.0029
av_runwater         -0.149705  0.0722344   44   -2.072487   0.0441
av_secedhead         1.286449  0.3965916   44    3.243761   0.0023
av_television       -0.318822  0.1260081   44   -2.530169   0.0151
av_washmachine       1.140216  0.2628267   44    4.338280   0.0001
rhsize2              0.653687  0.1320114  441    4.951745   0.0000
Multiple R-Squared: 0.506, Adjusted R-squared: 0.491

Table 10: Pseudo Survey 10
Dependent Variable: Log Per Capita Expenditure
Coefficients:
                     Estimate Std. Error  t value  Pr(>|t|)
(Intercept)           6.31149    0.23697   26.634   < 2e-16 ***
hsize                -0.11552    0.01597   -7.232  1.87e-12 ***
naturalroof          -0.18068    0.07998   -2.259  0.024322 *
television            0.14613    0.05767    2.534  0.011593 *
vehicle               0.26146    0.09715    2.691  0.007363 **
bilinghead           -0.15631    0.07185   -2.175  0.030083 *
av_adultfracm        -1.67378    0.69667   -2.403  0.016655 *
av_blender           -0.65744    0.19602   -3.354  0.000859 ***
av_brickwall          0.22799    0.12851    1.774  0.076677 .
av_radio             -0.59248    0.21000   -2.821  0.004978 **
av_roompp             0.64006    0.23260    2.752  0.006150 **
av_secedhead          1.37118    0.53967    2.541  0.011371 *
rhsize2               0.72202    0.17679    4.084  5.17e-05 ***
rroompp              -0.03031    0.01717   -1.765  0.078153 .
Multiple R-Squared: 0.4344, Adjusted R-squared: 0.4193
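As a reading aid for these R-style coefficient tables, the minimal Python sketch below (not part of the paper's estimation procedure) checks two things. First, it confirms that each printed t value equals Estimate divided by Std. Error up to rounding. Second, it maps printed p-values to the significance codes shown in the last column. The helper names (`signif_code`, `t_tolerance`, `check_rows`) are hypothetical. The star thresholds are assumed to follow R's usual legend (0.001, 0.01, 0.05, 0.1). The tolerance assumes estimates and standard errors rounded to five decimals, as printed in Table 10.

```python
"""Reading aid: recompute t = Estimate / Std. Error and map p-values to stars."""
from typing import Dict, List, Tuple

Row = Tuple[float, float, float, float]

# Rows copied from Table 10 (Pseudo Survey 10):
# name -> (estimate, std. error, printed t value, printed Pr(>|t|)).
TABLE10_ROWS: Dict[str, Row] = {
    "hsize": (-0.11552, 0.01597, -7.232, 1.87e-12),
    "naturalroof": (-0.18068, 0.07998, -2.259, 0.024322),
    "av_blender": (-0.65744, 0.19602, -3.354, 0.000859),
    "rroompp": (-0.03031, 0.01717, -1.765, 0.078153),
}


def signif_code(p: float) -> str:
    """Assumed mapping following R's legend 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1.
    A p-value lying exactly on a cutpoint is not guaranteed to match R's output."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < cutoff:
            return code
    return ""


def t_tolerance(est: float, se: float, half_unit: float = 5e-6) -> float:
    """First-order bound on |est/se - printed t| when est and se are each rounded
    to +/- half_unit and the t value itself is printed to three decimals."""
    t = est / se
    return abs(t) * (half_unit / abs(est) + half_unit / se) + 5e-4


def check_rows(rows: Dict[str, Row]) -> List[Tuple[str, float, float, bool, str]]:
    """Return (name, recomputed t, printed t, agrees within rounding, stars) per row."""
    results = []
    for name, (est, se, t_printed, p_printed) in rows.items():
        t = est / se
        agrees = abs(t - t_printed) <= t_tolerance(est, se)
        results.append((name, round(t, 3), t_printed, agrees, signif_code(p_printed)))
    return results


if __name__ == "__main__":
    for row in check_rows(TABLE10_ROWS):
        print(row)
```

For the other tables, `half_unit` would need to match the printed precision of each entry. Table 9 prints its p-values without stars, so only the t-ratio check carries over there.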