WPS 2911
POLICY RESEARCH WORKING PAPER 2911

Micro-Level Estimation of Welfare

Chris Elbers
Jean O. Lanjouw
Peter Lanjouw

The World Bank
Development Research Group
Poverty Team
October 2002

Abstract

The authors construct and derive the properties of estimators of welfare that take advantage of the detailed information about living standards available in small household surveys and the comprehensive coverage of a census or large sample. By combining the strengths of each, the estimators can be used at a remarkably disaggregated level. They have a clear interpretation, are mutually comparable, and can be assessed for reliability using standard statistical theory.

Using data from Ecuador, the authors obtain estimates of welfare measures, some of which are quite reliable for populations as small as 15,000 households (a "town"). They provide simple illustrations of their use. Such estimates open up the possibility of testing, at a more convincing intra-country level, the many recent models relating welfare distributions to growth and a variety of socioeconomic and political outcomes.

This paper (a product of the Poverty Team, Development Research Group) is part of a larger effort in the group to develop tools for the analysis of poverty and income distribution. Copies of the paper are available free from the World Bank, 1818 H Street NW, Washington, DC 20433. Please contact Patricia Sader, room MC3-556, telephone 202-473-3902, fax 202-522-1153, email address psader@worldbank.org. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at celbers@econ.vu.nl, jlanjouw@brookings.edu, or planjouw@worldbank.org. October 2002. (57 pages)

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues.
An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent. Produced by the Research Advisory Staff.

MICRO-LEVEL ESTIMATION OF WELFARE

BY CHRIS ELBERS, JEAN O. LANJOUW, AND PETER LANJOUW¹

¹ We are very grateful to Ecuador's Instituto Nacional de Estadistica y Censo (INEC) for making its 1990 unit-record census data available to us. Much of this research was done while the authors were at the Vrije Universiteit, Amsterdam, and we appreciate the hospitality and input from colleagues there. We also thank Don Andrews, Francois Bourguignon, Andrew Chesher, Denis Cogneau, Angus Deaton, Jean-Yves Duclos, Francisco Ferreira, Jesko Hentschel, Michiel Keyzer, Steven Ludlow, Berk Ozler, Giovanna Prennushi, Martin Ravallion, Piet Rietveld, John Rust and Chris Udry for comments and useful discussions, as well as seminar participants at the Vrije Universiteit, ENRA (Paris), U.C. Berkeley, Georgetown University, the World Bank and the Brookings Institution. Financial support was received from the Bank Netherlands Partnership Program.

1. INTRODUCTION

RECENT THEORETICAL ADVANCES have brought income and wealth distributions back into a prominent position in growth and development theories, and as determinants of specific socio-economic outcomes, such as health or levels of violence.² Empirical investigation of the importance of these relationships, however, has been held back by the lack of sufficiently detailed high quality data on distributions. Time series data are sparse, constraining most econometric analyses to a cross-section of countries.
Not only may these data be non-comparable, but such estimations also require strong assumptions about the stability of structural relationships across large geographical areas and political units.³ Further, many of the hypothesized relationships are more obviously relevant for smaller groups or areas. For example, as noted by Deaton (1999), while it is not clear why country-wide inequality should directly affect an individual's health, a link could be made to the degree of inequality within his reference group.

² The models in this growing literature describe a wide variety of linkages between distributions and growth. For example, inequality (or poverty) limits the size of markets, which slows growth when there are scale economies (Murphy, Shleifer and Vishny, 1989); with imperfect capital markets, greater inequality limits those able to make productive investment and occupational choices (Galor and Zeira, 1993; Banerjee and Newman, 1993). Aghion and Bolton (1997) endogenize inequality, with growth having a feedback effect on the distribution of wealth via its effect on credit, or labour, markets. Political economy models such as Alesina and Rodrik (1994) and Persson and Tabellini (1994) suggest that, in democratic regimes, inequality will lead to distortionary redistributive policies which slow growth.

³ The state-of-the-art data set for this purpose, compiled by Deininger and Squire (1996), goes a long way towards establishing comparability but the critique by Atkinson and Brandolini (2001) shows it remains very far from ideal. (See also Fields, 1989 and 2001, on data.) Bruno, Ravallion and Squire (1998) give examples of country-level estimation of growth models. Although they do not include distributional variables, Barro and Sala-i-Martin estimate a growth model using U.S. state-level data where the fact that it is a better controlled situation is emphasized (see Comments and Discussion in Barro and Sala-i-Martin, 1991). Ravallion (1998) points out that aggregation alone can bias estimates of the relationship between asset inequality and income growth derived from country-level data, and demonstrates this using county-level panel data from China. For a more general identification critique of cross-country models see Banerjee and Duflo (2000).

The problem confronted is that household surveys that include reasonable measures of income or consumption can be used to calculate distributional measures, but at low levels of aggregation these samples are rarely representative or of sufficient size to yield statistically reliable estimates. At the same time, census (or other large sample) data of sufficient size to allow disaggregation either have no information about income or consumption, or measure these variables poorly.⁴ This paper outlines a statistical procedure to combine these types of data to take advantage of the detail in household sample surveys and the comprehensive coverage of a census. It extends the literature on small area statistics (Ghosh and Rao (1994), Rao (1999)) by developing estimators of population parameters which are non-linear functions of the underlying variable of interest (here unit level consumption), and by deriving them from the full unit level distribution of that variable.

In examples using Ecuadorian data, our estimates have levels of precision comparable to those of commonly used survey based welfare estimates, but for populations as small as 15,000 households, a 'town'.
This is an enormous improvement over survey based estimates, which are typically only consistent for areas encompassing hundreds of thousands, even millions, of households. Experience using the method in South Africa, Brazil, Panama, Madagascar and Nicaragua suggests that Ecuador is not an unusual case (Alderman et al. (2002), and Elbers, Lanjouw, Lanjouw, and Leite (2002)).

⁴ For example, a single question regarding individuals' incomes in the 1996 South African census generates an estimate of national income just 83% the size of the national expenditure estimate derived from a representative household survey, and a per-capita poverty rate 25% higher, with discrepancies systematically related to characteristics such as household location (Alderman et al., 2002).

With accurate welfare measures for groups the size of towns, villages or even neighborhoods, researchers should be able to test hypotheses at an appropriate level of disaggregation, where assumptions about a stable underlying structure are more tenable. Better local measures of poverty and inequality will also be useful in the targeting of development assistance, and many governments are enthusiastic about new methods for using their survey and census data for this purpose. Poverty 'maps' can be simple and effective policy tools. Disaggregated welfare estimates can also help governments understand the tradeoffs involved in decentralizing their spending decisions. While it is beneficial to take advantage of local information about community needs and priorities, if local inequalities are large and decisions are taken by the elite, projects may not benefit the poorest. Local level inequality measures, together with data on project choices, make it possible to shed light on this potential cost of decentralization.

Datasets have been combined to fill in missing information or avoid sampling biases in a variety of other contexts.
Examples in the econometric literature include Arellano and Meghir (1992), who estimate a labour supply model combining two samples. They use the UK Family Expenditure Survey (FES) to estimate models of wages and other income conditioning on variables common across the two samples. Hours and job search information from the much larger Labour Force Survey is then supplemented by predicted financial information. In a similar spirit, Angrist and Krueger (1992) combine data from two U.S. censuses. They estimate a model of educational attainment as a function of school entry age, where the first variable is available in only one census and the second in another, but an instrument, birth quarter, is common to both. Lusardi (1996) applies this two-sample IV estimator in a model of consumption behaviour. Hellerstein and Imbens (1999) estimate weighted wage regressions using the U.S. National Longitudinal Survey, but incorporate aggregate information from the U.S. census by constructing weights which force moments in the weighted sample to match those in the census.

After outlining the basic idea in Section 2, we develop a model of consumption in Section 3. We use a flexible specification of the disturbance term that allows for non-normality, spatial autocorrelation and heteroscedasticity. One might ask whether, given a reasonable first-stage model of consumption, it would suffice to calculate welfare measures on the basis of predicted consumption alone. In general such an approach yields inconsistent estimates and, more importantly, it may not even preserve welfare rankings of villages. Figures 1.a and 1.b demonstrate this using the data from Ecuador described below. In Figure 1.a 'villages' are ordered along the x-axis according to a consistent estimate of the expected proportion of their households that are poor. The jagged line represents estimates of the same proportions based only on the systematic part of households' consumption.
Figure 1.b shows the same comparison for the expected general entropy (0.5) measure of inequality. There is clearly significant and sizable bias and re-ranking associated with ignoring the unobserved component of consumption, even with the extensive set of regressors available to us in this example. Thus one would expect the use of predicted consumption to be problematic in many actual applications.

The welfare estimator is developed in Section 4 and its properties derived in Section 5. Section 6 gives computational details, with results for our Ecuadorian example presented in Section 7. In that section, we also explore briefly the implications of making various modelling assumptions. Section 8 indicates how much the estimator improves on sample based estimates. Section 9 gives results for additional welfare measures and then, in Section 10, we provide simple illustrations of the use of our estimators. The final section concludes.

2. THE BASIC IDEA

The idea is straightforward. Let $W$ be an indicator of poverty or inequality based on the distribution of a household-level variable of interest, $y_h$. Using the smaller and richer data sample, we estimate the joint distribution of $y_h$ and a vector of covariates, $x_h$. By restricting the set of explanatory variables to those that can also be linked to households in the larger sample or census, this estimated distribution can be used to generate the distribution of $y_h$ for any sub-population in the larger sample conditional on the sub-population's observed characteristics.⁵ This, in turn, allows us to generate the conditional distribution of $W$, in particular, its point estimate and prediction error.

3. THE CONSUMPTION MODEL

The first concern is to develop an accurate empirical model of $y_{ch}$, the per capita expenditure of household $h$ in sample cluster $c$.
We consider a linear approximation to the conditional distribution of $y_{ch}$,

(1)  $\ln y_{ch} = E[\ln y_{ch} \mid x_{ch}^T] + u_{ch} = x_{ch}^T \beta + u_{ch}$,

where the vector of disturbances $u \sim \mathcal{F}(0, \Sigma)$.⁶ Note that, unlike in much of econometrics, $\beta$ is not intended to capture only the direct effect of $x$ on $y$. Because the survey estimates will be used to impute into the census, if there is (unmodelled) variation in the parameters we would prefer to fit most closely the clusters that represent large census populations. This argues for weighting observations by population expansion factors.

⁵ The explanatory variables are observed values and thus need to have the same degree of accuracy in addition to the same definitions across data sources. Comparing distributions of responses at a level where the survey is representative is a check that we have found to be important in practice.

⁶ One could consider estimating $E(y \mid x)$ or the conditional density $p(y \mid x)$ non-parametrically. In estimating expenditure for each household in the populations of interest (perhaps totalling millions) conditioning on, say, thirty observed characteristics, a major difficulty is to find a method of weighting that lowers the computational burden. See Keyzer (2000) and Tarozzi (2002) for examples and discussion.

To allow for a within-cluster correlation in disturbances, we use the following specification:

$u_{ch} = \eta_c + \varepsilon_{ch}$,

where $\eta$ and $\varepsilon$ are independent of each other and uncorrelated with observables, $x_{ch}$. One expects location to be related to household income and consumption, and it is certainly plausible that some of the effect of location might remain unexplained even with a rich set of regressors. For any given disturbance variance, $\sigma^2_{ch}$, the greater the fraction due to the common component $\eta_c$, the less one enjoys the benefits of aggregating over more households within a village. Welfare estimates become less precise. Further, the greater the part of the disturbance which is common, the lower will be inequality.
Thus, failing to take account of spatial correlation in the disturbances would result in underestimated standard errors on welfare estimates, and upward biased estimates of inequality (but see the examples below).

Since residual location effects can greatly reduce the precision of welfare estimates, it is important to explain the variation in consumption due to location as far as possible with the choice and construction of $x_{ch}$ variables. We see in the example below that location means of household-level variables are particularly useful. Clusters in survey data typically correspond to enumeration areas (EA) in the population census. Thus, means can be calculated over all households in an EA and merged into the smaller sample data. Because they include far more households, location means calculated in this way give a considerably less noisy indicator than the same means taken over only the households in a survey cluster. Other sources of information could be merged with both census and survey datasets to explain location effects as needed. Geographic information system databases, for example, allow a multitude of environmental and community characteristics to be geographically defined both comprehensively and with great precision.

An initial estimate of $\beta$ in equation (1) is obtained from OLS or weighted least squares estimation. Denote the residuals of this regression as $\hat u_{ch}$. The number of clusters in a household survey is generally too small to allow for heteroscedasticity in the cluster component of the disturbance. However, the variance of the idiosyncratic part of the disturbance, $\sigma^2_{\varepsilon,ch}$, can be given a flexible form. With consistent estimates of $\beta$, the residuals $e_{ch}$ from the decomposition

$\hat u_{ch} = \hat u_{c.} + (\hat u_{ch} - \hat u_{c.}) = \hat\eta_c + e_{ch}$

(where a subscript '.' indicates an average over that index) can be used to estimate the variance of $\varepsilon_{ch}$.
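The residual decomposition just described can be illustrated with a minimal sketch. It assumes the first-stage residuals are already available, grouped by survey cluster, and that the cluster average is an unweighted mean (the text does not say whether expansion factors enter this average). The function name `decompose_residuals` and the sample values are hypothetical.

```python
from statistics import fmean


def decompose_residuals(residuals_by_cluster):
    """Split first-stage residuals u_ch into a cluster part and a household part.

    residuals_by_cluster maps a cluster id to the list of residuals u_ch of its
    sampled households. Returns (eta_hat, e), where eta_hat[c] is the cluster
    mean of the residuals and e[c][h] = u_ch - eta_hat[c].
    Assumption: the cluster average is unweighted.
    """
    eta_hat = {c: fmean(u) for c, u in residuals_by_cluster.items()}
    e = {c: [u_ch - eta_hat[c] for u_ch in u]
         for c, u in residuals_by_cluster.items()}
    return eta_hat, e


# Hypothetical residuals for two survey clusters.
resid = {"c1": [0.30, 0.10, 0.20], "c2": [-0.25, -0.05]}
eta_hat, e = decompose_residuals(resid)
```

By construction the household components sum to zero within each cluster, so `eta_hat[c] + e[c][h]` reproduces the original residual.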
We propose a logistic form,

(2)  $\sigma^2(z_{ch}, \alpha, A, B) = \left[ \frac{A e^{z_{ch}^T \alpha} + B}{1 + e^{z_{ch}^T \alpha}} \right]$.

The upper and lower bounds, $A$ and $B$, can be estimated along with the parameter vector $\alpha$ using a standard pseudo maximum likelihood procedure.⁷ This functional form avoids both negative and extremely high predicted variances.

⁷ An estimate of the variance of the estimators can be derived from the information matrix and used to construct a Wald test for homoscedasticity (Greene (2000), Section 12.5.3). Allowing the bounds to be freely estimated generates a standardized distribution for predicted disturbances which is well behaved in our experience. This is particularly important when using the standardized residuals directly in a semi-parametric approach to simulation (see Section 7 below). However, we have also found that imposing a minimum bound of zero and a maximum bound $A^* = (1.05) \max\{e_{ch}^2\}$ yields similar estimates of the parameters $\alpha$.

The variance, $\sigma^2_\eta$, of the remaining (weighted) cluster random effect is estimated non-parametrically, allowing for heteroscedasticity in $\varepsilon_{ch}$. This is a straightforward application of random effect modelling (e.g., Greene (2000), Section 14.4.2). An alternative approach based on moment conditions gives similar results. See Appendix 1.

In what follows we need to simulate the residual terms $\eta$ and $\varepsilon$. Appropriate distributional forms can be determined from the cluster residuals $\hat\eta_c$ and standardized household residuals

(3)  $e^*_{ch} = \frac{e_{ch}}{\hat\sigma_{\varepsilon,ch}} - \left[ \frac{1}{H} \sum_{ch} \frac{e_{ch}}{\hat\sigma_{\varepsilon,ch}} \right]$,

respectively, where $H$ is the number of observations. The second term in $e^*_{ch}$ adjusts for weighting at the first stage. One can avoid making any specific distributional form assumptions by drawing directly from the standardized residuals. Alternatively, percentiles of the empirical distribution of the standardized residuals can be compared to the corresponding percentiles of standardized normal, t, or other distributions.

Before proceeding to simulation, the estimated variance-covariance matrix, $\hat\Sigma$, weighted by the household expansion factors, $\ell_{ch}$, is used to obtain GLS estimates of the first-stage parameters, $\hat\beta_{GLS}$, and their variance, $\mathrm{Var}(\hat\beta_{GLS})$.⁸ In our experience, model estimates have been very robust to estimation strategy, with weighted GLS estimates not significantly different from the results of OLS or quantile regressions weighted by expansion factors. The GLS estimates do not differ significantly from coefficients obtained from weighted quantile regressions.

⁸ Consider the GLS model $y^* = X^* \beta + \varepsilon^*$, where $y^* = Py$, etc., $E[\varepsilon \varepsilon^T] = \Omega$, $W$ is a weighting matrix of expansion factors, and $P^T P = W \Omega^{-1}$. Then $\mathrm{Var}(\hat\beta_{GLS}) = (X^T W \Omega^{-1} X)^{-1} (X^T W \Omega^{-1} W X) (X^T W \Omega^{-1} X)^{-1}$.

4. THE WELFARE ESTIMATOR

Although disaggregation may be along any dimension, not necessarily geographic, for convenience we refer to our target populations as 'villages'. There are $M_v$ households in village $v$ and household $h$ has $m_h$ family members. To study the properties of our welfare estimator as a function of population size we assume that the characteristics $x_h$ and the family size $m_h$ of each household are drawn independently from a village-specific constant distribution function $G_v(x, m)$: the super population approach.

While the unit of observation for expenditure in these data is typically the household, we are more often interested in poverty and inequality measures based on individuals. Thus we write $W(m_v, X_v, \beta, u_v)$, where $m_v$ is an $M_v$-vector of household sizes in village $v$, $X_v$ is an $M_v \times k$ matrix of observable characteristics and $u_v$ is an $M_v$-vector of disturbances.

Because the vector of disturbances for the target population, $u_v$, is unknown, we estimate the expected value of the indicator given the village households' observable characteristics and the model of expenditure. This expectation is denoted $\mu_v = E[W \mid m_v, X_v, \zeta_v]$, where $\zeta_v$ is the vector of model parameters, including those which describe the distribution of the disturbances.
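To make the dependence of the indicator on household sizes, characteristics, coefficients and disturbances concrete, the sketch below evaluates two indicators with individuals as the unit of analysis: a poverty headcount and a generalized entropy measure. It is an illustration under stated assumptions, not the authors' code. The function names, the poverty line `log_z` and the sample inputs are hypothetical, and the generalized entropy formula is the standard textbook form, assumed here to match the general entropy (0.5) measure used in the paper.

```python
import math


def headcount(m, log_y, log_z):
    """Share of individuals living in households whose log per-capita
    expenditure is below the (hypothetical) log poverty line log_z."""
    n = sum(m)
    return sum(m_h for m_h, ly in zip(m, log_y) if ly < log_z) / n


def general_entropy(m, log_y, c=0.5):
    """Generalized entropy GE(c) for c not in {0, 1}, weighting each
    household by its size so that individuals are the unit of analysis."""
    n = sum(m)
    y = [math.exp(ly) for ly in log_y]
    y_bar = sum(m_h * y_h for m_h, y_h in zip(m, y)) / n
    mean_pow = sum(m_h * (y_h / y_bar) ** c for m_h, y_h in zip(m, y)) / n
    return (mean_pow - 1.0) / (c * (c - 1.0))


def welfare(m, x, beta, u, indicator, **kwargs):
    """W(m, X, beta, u): form log expenditure x'beta + u for each household,
    then apply the chosen indicator."""
    log_y = [sum(b * x_k for b, x_k in zip(beta, x_h)) + u_h
             for x_h, u_h in zip(x, u)]
    return indicator(m, log_y, **kwargs)


# Hypothetical village of three households.
m = [4, 2, 3]
x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [1.0, 0.5]
u = [0.0, 0.0, 0.0]
poverty = welfare(m, x, beta, u, headcount, log_z=1.2)
inequality = welfare(m, x, beta, u, general_entropy, c=0.5)
```

The headcount is a household-size-weighted sum of household contributions, while the entropy measure depends on the village mean, so one household's contribution depends on the expenditure of the others.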
For most poverty measures $W$ can be written as an additively separable function of household poverty rates, $w(x_h, \beta, u_h)$, and $\mu_v$ can be written

(4)  $\mu_v = \frac{1}{N_v} \sum_{h \in H_v} m_h \int_{u_h} w_h(x_h, \beta, u_h) \, dF_{vh}(u_h)$,

where $H_v$ is the set of all households in village $v$, $N_v = \sum_{h \in H_v} m_h$ is the total number of individuals, and $F_{vh}$ is the marginal distribution of the disturbance term of household $h$ in village $v$. When $W$ is an inequality measure, however, the contribution of one household depends on the level of well-being of other households and $W$ is no longer separable. Then we need the more general form,

(5)  $\mu_v = \int_{u_1} \cdots \int_{u_{M_v}} W(m_v, X_v, \beta, u_v) \, dF_v(u_{M_v}, \ldots, u_1)$,

where $u_1, \ldots, u_{M_v}$ are the disturbance terms for the $M_v$ households in village $v$.

In constructing an estimator of $\mu_v$ we replace $\zeta_v$ with consistent estimators, $\hat\zeta_v$, from the first stage expenditure regression. This yields $\hat\mu_v = E[W \mid m_v, X_v, \hat\zeta_v]$. This expectation is often analytically intractable, so simulation or numerical integration is used to obtain the estimator $\tilde\mu_v$.

5. PROPERTIES AND PRECISION OF THE ESTIMATOR

The difference between $\tilde\mu$, our estimator of the expected value of $W$ for the village, and the actual level may be written

(6)  $W - \tilde\mu = (W - \mu) + (\mu - \hat\mu) + (\hat\mu - \tilde\mu)$.

(The index $v$ is suppressed here and below.) Thus the prediction error has three components: the first due to the presence of a disturbance term in the first-stage model which causes households' actual expenditures to deviate from their expected values (idiosyncratic error); the second due to variance in the first-stage estimates of the parameters of the expenditure model (model error); and the last due to using an inexact method to compute $\hat\mu$ (computation error). The error components are uncorrelated (see below). We consider the properties of each:⁹

Idiosyncratic Error - $(W - \mu)$

The actual value of the welfare indicator for a village deviates from its expected value, $\mu$, as a result of the realizations of the unobserved component of expenditure in that village. Figure 2 illustrates.
For convenience, denote the known expenditure component $\{x_h^T \beta\}$ as $t_h$. Randomly drawn vectors $u^r$ are added to $t$ and empirical distributions of log per-capita expenditure are graphed. The first panel shows the cumulative distribution of log per-capita expenditure based on a single simulation draw for 10 households. Subsequent panels superimpose 25 simulations for target populations of increasing size (where, for the purpose of illustration, $u_h$ is assumed to be distributed iid $\mathcal{N}(0, \sigma^2)$). For small populations there is considerable variation in distributions across realizations of $u$. It is easily proved that a limiting picture, that is for an infinite-sized population, will portray the underlying distribution. As is clear from Figure 2, particular realizations of $u$ lose their effect on the empirical distribution of consumption as the population grows.

⁹ Our target is the level of welfare that could be calculated if we were fortunate enough to have observations on expenditure for all households in a population. Clearly because expenditures are measured with error this may differ from a measure based on true expenditures. See Chesher and Schluter (2002) for methods to estimate the sensitivity of welfare measures to mismeasurement in $y$.

When $W$ is separable, this error is a weighted sum of household contributions:

(7)  $(W - \mu) = \frac{1}{\bar m_M} \frac{1}{M} \sum_{h \in H_v} m_h \left[ w(x_h, \beta, u_h) - \int_{u_h} w(x_h, \beta, u_h) \, dF(u_h) \right]$,

where $\bar m_M = N/M$ is the mean household size among $M$ village households. As the village population size increases, new values of $x$ and $m$ are drawn from the constant distribution function $G_v(x, m)$. To draw new error terms in accordance with the model $u_{ch} = \eta_c + \varepsilon_{ch}$, complete enumeration areas are added, independently of previous EAs.
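The shrinking influence of particular disturbance draws can be reproduced numerically. The sketch below is a simplified illustration in the spirit of Figure 2, not a replication of it: it assumes iid normal disturbances (as the figure does), households of equal size, no cluster effect, and hypothetical values for the predicted part of log expenditure, the poverty line and the disturbance standard deviation.

```python
import random
from statistics import pvariance


def simulated_headcounts(t, log_z, sigma, reps, rng):
    """Headcount in each of `reps` simulated villages that share the same
    predicted log expenditure t but receive fresh iid N(0, sigma^2) draws."""
    out = []
    for _ in range(reps):
        poor = sum(1 for t_h in t if t_h + rng.gauss(0.0, sigma) < log_z)
        out.append(poor / len(t))
    return out


rng = random.Random(0)
spread = {}
for M in (10, 100, 1000):
    # Hypothetical predicted log per-capita expenditure for M households.
    t = [rng.gauss(0.0, 0.5) for _ in range(M)]
    draws = simulated_headcounts(t, log_z=0.0, sigma=0.5, reps=200, rng=rng)
    # Variance of the indicator across realizations of the disturbances.
    spread[M] = pvariance(draws)
```

Under these assumptions the variance across simulated villages falls roughly in proportion to the number of households, which is the pattern the idiosyncratic error is expected to follow.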
Since $\bar m_M$ converges in probability to $E[m]$,

(8)  $\sqrt{M} (\mu - W) \xrightarrow{d} \mathcal{N}(0, \Sigma_I)$ as $M \to \infty$,

where

(9)  $\Sigma_I = \frac{1}{(E[m])^2} E\left[ m_h^2 \, \mathrm{Var}(w \mid x_h, \beta) \right]$.

When $W$ is a non-separable inequality measure there usually is some pair of functions $f$ and $g$, such that $W$ may be written $W = f(\bar y, \bar g)$, where $\bar y = \frac{1}{N} \sum_{h \in H_v} m_h y_h$ and $\bar g = \frac{1}{N} \sum_{h \in H_v} m_h g(y_h)$ are means of independent random variables.¹⁰ The latter may be written

(10)  $\bar g = \frac{1}{\bar m_M} \frac{1}{M} \sum_{h \in H_v} m_h g(y_h)$,

which is the ratio of means of $M$ iid random variables $g_h = m_h g(y_h)$ and $m_h$. Assuming that the second moments of $g_h$ exist, $\bar g$ converges to its expectation and is asymptotically normal. The same remark holds for $\bar y$. Thus, non-separable measures of welfare also converge as in (8) for some covariance matrix $\Sigma_I$.¹¹

¹⁰ The Gini coefficient is an exception but it can be handled effectively with a separable approximation. See Elbers et al. (2000).

¹¹ The above discussion concerns the asymptotic properties of the welfare estimator, in particular consistency. In practice we simulate the idiosyncratic variance for an actual sub-population rather than calculate the asymptotic variance.

The idiosyncratic component, $V_I = \Sigma_I / M$, falls approximately proportionately in $M$. Said conversely, this component of the error in our estimator increases as one focuses on smaller target populations, which limits the degree of disaggregation possible. At what population size this error becomes unacceptably large depends on the explanatory power of the $x$ variables in the expenditure model and, correspondingly, the importance of the remaining idiosyncratic component of expenditure.

Model Error - $(\mu - \hat\mu)$

This is the second term in the error decomposition of equation (6). The expected welfare estimator $\hat\mu = E[W \mid m_v, X_v, \hat\zeta_v]$ is a continuous and differentiable function of $\hat\zeta$, which are consistent estimators of the parameters.
Thus $\hat\mu$ is a consistent estimator of $\mu$ and:

(11)  $\sqrt{s} (\mu - \hat\mu) \xrightarrow{d} \mathcal{N}(0, \Sigma_M)$ as $s \to \infty$,

where $s$ is the number of survey households used in estimation.¹² We use the delta method to calculate the variance $\Sigma_M$, taking advantage of the fact that $\mu$ admits of continuous first-order partial derivatives with respect to $\zeta$. Let $\nabla = [\partial \mu / \partial \zeta]|_{\hat\zeta}$ be a consistent estimator of the derivative vector. Then $V_M = \Sigma_M / s \approx \nabla^T V(\hat\zeta) \nabla$, where $V(\hat\zeta)$ is the asymptotic variance-covariance matrix of the first stage parameter estimators.

¹² Although $\hat\mu$ is a consistent estimator, it is biased. Our own experiments and analysis by Saul Morris (IFPRI) for Honduras indicate that the degree of bias is extremely small. We thank him for his communication on this point. Below we suggest using simulation to integrate over the model parameter estimates, $\hat\zeta$, which yields an unbiased estimator.

Because this component of the prediction error is determined by the properties of the first stage estimators, it does not increase or fall systematically as the size of the target population changes. Its magnitude depends, in general, only on the precision of the first-stage coefficients and the sensitivity of the indicator to deviations in household expenditure. For a given village $v$ its magnitude will also depend on the distance of the explanatory $x$ variables for households in that village from the levels of those variables in the sample data.

Computation Error - $(\hat\mu - \tilde\mu)$

The distribution of this component of the prediction error depends on the method of computation used. When simulation is used this error has the asymptotic distribution given below in (16). It can be made as small as computational resources allow.

The computation error is uncorrelated with the model and idiosyncratic errors. There may be some correlation between the model error, caused by disturbances in the sample survey data, and the idiosyncratic error, caused by disturbances in the census, because of overlap in the samples.
However, the approach described here is necessary precisely because the number of sampled households that are also part of the target population is very small. Thus, we can safely neglect such correlation.

For two populations, say $Q$ and $K$, one can test whether the difference in their expected welfare estimates is statistically significant using the statistic

(12)  $\frac{(\tilde\mu_Q - \tilde\mu_K)^2}{\mathrm{Var}[(\tilde\mu_Q - W_Q) - (\tilde\mu_K - W_K)]}$,

which is distributed asymptotically $\chi^2(1)$ under the null hypothesis $H_0: W_Q = W_K$. The parts of the variance in the prediction error for populations $Q$ and $K$ due to computation and the idiosyncratic component of $W$ are independent. However, if the same first-stage model estimates are used to estimate $t_h$ for households in both populations, then the model component of the prediction error will be correlated across populations. Let $\zeta_t$ be a vector of all of the parameters used in the estimation of either $\hat\mu_Q$ or $\hat\mu_K$, and let $q$ be a vector of the partial derivatives $[\partial (\hat\mu_Q - \hat\mu_K) / \partial \zeta_t]|_{\hat\zeta_t}$. Then,

(13)  $\mathrm{Var}[(\tilde\mu_Q - W_Q) - (\tilde\mu_K - W_K)] \approx q^T V(\hat\zeta_t) q + V_I^Q + V_I^K + V_C^Q + V_C^K$.

If the first-stage parameter estimates used to estimate household expenditure differ across the two regions then the first term is simply $V_M^Q + V_M^K$.

6. COMPUTATION

We use Monte Carlo simulation to calculate: $\hat\mu$, the expected value of the welfare measure given the first stage model of expenditure; $V_I$, the variance in $W$ due to the idiosyncratic component of household expenditures; and the gradient vector $\nabla = [\partial \hat\mu / \partial \zeta]|_{\hat\zeta}$.

Let the vector $\tilde u^r$ be the $r$th simulated disturbance vector. Treated parametrically, $\tilde u^r$ is constructed by taking a random draw from an $M_v$-variate standardized distribution and pre-multiplying this vector by a matrix $T$, defined such that $T T^T = \hat\Sigma$. Treated semi-parametrically, $\tilde u^r$ is drawn from the residuals with an adjustment for heteroscedasticity. We consider two approaches. First, a location effect, $\tilde\eta_c^r$, is drawn randomly, and with replacement, from the set of all sample $\hat\eta_c$.
Then an idiosyncratic component, $\tilde\varepsilon_{ch}^r$, is drawn for each household $ch$ with replacement from the set of all standardized residuals and $\tilde\varepsilon_{ch}^r = \hat\sigma_{\varepsilon,ch} \, \tilde e_{ch}^{*r}$. The second approach differs in that this component is drawn only from the standardized residuals $e^*_{ch}$ that correspond to the cluster from which household $ch$'s location effect was derived. Although $\eta_c$ and $\varepsilon_{ch}$ are uncorrelated, the second approach allows for non-linear relationships between location and household unobservables. It is considered empirically in the example below, Section 7.

With each vector of simulated disturbances we construct a value for the indicator, $\hat W_r = W(m, \hat t, \tilde u^r)$, where $\hat t = X \hat\beta$, the predicted part of log per-capita expenditure. The simulated expected value for the indicator is the mean over $R$ replications,

(14)  $\tilde\mu = \frac{1}{R} \sum_{r=1}^{R} \hat W_r$.

The variance of $W$ around its expected value $\mu$ due to the idiosyncratic component of expenditures can be estimated in a straightforward manner using the same simulated values,

(15)  $\tilde V_I = \frac{1}{R} \sum_{r=1}^{R} (\hat W_r - \tilde\mu)^2$.

Simulated numerical gradient estimators are constructed as follows: We make a positive perturbation to a parameter estimate, say $\hat\beta_k$, by adding $\delta |\hat\beta_k|$, and then calculate $\hat t^+$, followed by $\hat W_r^+ = W(m, \hat t^+, \tilde u^r)$, and $\tilde\mu^+$. A negative perturbation of the same size is used to obtain $\tilde\mu^-$. The simulated central distance estimator of the derivative $\partial \hat\mu / \partial \beta_k |_{\hat\zeta}$ is $(\tilde\mu^+ - \tilde\mu^-) / (2 \delta |\hat\beta_k|)$. As we use the same simulation draws in the calculation of $\tilde\mu$, $\tilde\mu^+$ and $\tilde\mu^-$, these gradient estimators are consistent as long as $\delta$ is specified to fall sufficiently rapidly as $R \to \infty$ (Pakes and Pollard (1989)). Having thus derived an estimate of the gradient vector $\nabla = [\partial \hat\mu / \partial \zeta]|_{\hat\zeta}$, we can calculate $V_M = \nabla^T V(\hat\zeta) \nabla$.
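The simulation steps above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the indicator is a headcount with equal household sizes, the idiosyncratic standard deviation is a single scalar rather than household specific, only the regression coefficients are perturbed, and the parameter covariance matrix `v_beta`, the poverty line and all data values are hypothetical.

```python
import random


def draw_disturbances(cluster_sizes, eta_hat, e_star, sigma_eps, rng):
    """One simulated disturbance vector: a location effect resampled per
    cluster plus a resampled standardized residual rescaled by sigma_eps."""
    u = []
    for size in cluster_sizes:
        eta = rng.choice(eta_hat)  # shared by every household in the cluster
        u.extend(eta + sigma_eps * rng.choice(e_star) for _ in range(size))
    return u


def headcount(t, u, log_z):
    return sum(1 for t_h, u_h in zip(t, u) if t_h + u_h < log_z) / len(t)


def simulate(x, beta, draws, log_z):
    """Mean and variance of the indicator over a fixed set of draws,
    following equations (14) and (15)."""
    t = [sum(b * x_k for b, x_k in zip(beta, x_h)) for x_h in x]
    w = [headcount(t, u, log_z) for u in draws]
    mu = sum(w) / len(w)
    v_i = sum((w_r - mu) ** 2 for w_r in w) / len(w)
    return mu, v_i


def gradient(x, beta, draws, log_z, delta=0.05):
    """Central differences, reusing the same draws for every perturbation.
    Assumes every coefficient is non-zero so that the step is positive."""
    grad = []
    for k, b_k in enumerate(beta):
        step = delta * abs(b_k)
        up = beta[:k] + [b_k + step] + beta[k + 1:]
        down = beta[:k] + [b_k - step] + beta[k + 1:]
        mu_up, _ = simulate(x, up, draws, log_z)
        mu_down, _ = simulate(x, down, draws, log_z)
        grad.append((mu_up - mu_down) / (2 * step))
    return grad


rng = random.Random(1)
x = [[1.0, rng.random()] for _ in range(200)]   # hypothetical census covariates
beta_hat = [0.2, 0.8]                            # hypothetical first-stage estimates
v_beta = [[0.01, 0.0], [0.0, 0.04]]              # hypothetical parameter covariance
draws = [draw_disturbances([50] * 4, [-0.1, 0.0, 0.1],
                           [-1.5, -0.5, 0.0, 0.5, 1.5], 0.4, rng)
         for _ in range(100)]

mu_tilde, v_idio = simulate(x, beta_hat, draws, log_z=0.5)
grad = gradient(x, beta_hat, draws, log_z=0.5)
# Delta-method model variance: grad' V grad.
v_model = sum(grad[j] * v_beta[j][k] * grad[k] for j in range(2) for k in range(2))
```

Because the headcount is a step function of expenditure, the finite-difference gradient is only informative when the perturbation moves some households across the poverty line, which is one reason the choice of the perturbation size matters in practice.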
Because $\tilde\mu$ is a sample mean of $R$ independent random draws from the distribution of $(W \mid m, \hat t, \hat\Sigma)$, the central limit theorem implies that

(16)  $\sqrt{R} (\tilde\mu - \hat\mu) \xrightarrow{d} \mathcal{N}(0, \Sigma_C)$ as $R \to \infty$,

where $\Sigma_C = \mathrm{Var}(W \mid m, \hat t, \hat\Sigma)$.¹³

¹³ Whenever a parametric distribution is used, efficiency can be improved using a minimum discrepancy estimator, where draws are made systematically from the disturbance distribution (see Traub and Werschulz, 1998). In experiments estimating the headcount measure, we found that, for $R < 100$, $\sqrt{\Sigma_C}$ for this estimator was 74-78% of its value for Monte Carlo simulation.

When the decomposition of the prediction error into its component parts is not important, a far more efficient computational strategy is available. Write

$\ln y_{ch} = x_{ch}^T \beta + \eta_c(\zeta) + \varepsilon_{ch}(\zeta)$,

where we have stressed that the distribution of $\eta$ and $\varepsilon$ depend on the parameter vector $\zeta$. By simulating $\zeta$ from the sampling distribution of $\hat\zeta$, and $\{\eta_c^r\}$ and $\{\varepsilon_{ch}^r\}$ conditional on the simulated value $\zeta^r$, we obtain simulated values $\{y_{ch}^r\}$, consistent with the model's distributional characteristics, from which welfare estimates $W^r$ can be derived (Mackay (1998)). Estimates of expected welfare, $\mu$, and its variance are calculated as in equations (14) and (15). Drawing from the sampling distribution of the parameters replaces the delta method as a way to incorporate model error into the total prediction error. Equation (15) now gives a sum of the variance components $V_I + V_M$, while $\Sigma_C$ in equation (16) becomes $\Sigma_C = \mathrm{Var}(W \mid m, X, \hat\zeta, V(\hat\zeta))$.

7. BASIC SIMULATION RESULTS

This section uses the 1994 Ecuadorian Encuesta Sobre Las Condiciones de Vida, a household survey following the general format of a World Bank Living Standards Measurement Survey. It is stratified by 8 regions and intended to be representative at that level. Within each region there are several levels of clustering. At the final level, 12 to 24 households are randomly selected from a census enumeration area. Expansion factors allow the calculation of regional totals.
The analysis in this section uses data from the rural Costa region. Table 1 gives diagnostics for four different first-stage regressions. The first column refers to a regression with a range of demographic and education variables, but excluding all information about infrastructure. The second column corresponds to a regression where regressors include means of some of these same variables. The third column has results for a model with no means but including household level infrastructure variables, and the last column corresponds to a 'full' model with regressors chosen from all household level variables and also some of their means.14 Detailed results for the full model are presented in Appendix 2, Table A.1. All of the regressions are weighted by population expansion factors. These weights differ considerably across clusters and the test results in row one of Table 1 indicate that weighting has a significant effect on the coefficients. Weighting is discussed further below.

14 In order to choose which variable means to include we first estimated the model with only household level variables. We then estimated the residual location effect for each cluster in rural Costa, and regressed them on variable means to determine a set of means particularly suited to explaining the effect of location. We limited the chosen number of variables to five so as to avoid over-fitting our 39 sample cluster effects.

In row 3 we examine the varying importance of residual intra-cluster correlation across the different models by decomposing the overall disturbance variance. The (weighted) cluster random effect variance, $\sigma^2_\eta$, is estimated non-parametrically, allowing for heteroscedasticity in $\varepsilon_{ch}$. For details, along with the formula used to estimate $\mathrm{Var}(\hat\sigma^2_\eta)$, see Appendix 1. Further evidence on the importance of residual location effects is provided by a regression of the total residuals, $\hat u$, on cluster fixed effects.
Row 4 gives results of an F-test of the null hypothesis that fixed effect coefficients are jointly zero. Both rows 3 and 4 indicate that there is a significant intra-cluster correlation in the disturbances of models that do not include location mean variables. However, when means of household-level variables are included as regressors they effectively capture most of the effect of location on consumption. Infrastructure variables also contribute, and in the full model there is little remaining evidence of spatial correlation in the residuals.

We next model the variance of the idiosyncratic part of the disturbance, $\sigma^2_{\varepsilon,ch}$. In Section 3 we suggested estimating a logistic model with free bounds. However, we have found that imposing a minimum bound of zero and a maximum bound $A^{*} = (1.05)\max\{e^2_{ch}\}$ yields similar estimates of the parameters $\alpha$. These restrictions allow one to estimate the simpler form

$\ln\left[\frac{e^2_{ch}}{A^{*} - e^2_{ch}}\right] = z_{ch}^{T}\alpha + r_{ch},$

which is what we do here.15 Detailed results corresponding to the full model may again be found in Appendix 2, Table A.2. Results of chi-square tests of the null that estimated parameters are jointly zero in these regressions are found in row 5 of Table 1, where homoscedasticity is clearly rejected for all but the first model specification. Letting $\exp\{z_{ch}^{T}\hat\alpha\} = B$ and using the delta method, the model implies a household specific variance estimator for $\varepsilon_{ch}$ of

(17) $\hat\sigma^2_{\varepsilon,ch} = \left[\frac{A^{*}B}{1+B}\right] + \frac{1}{2}\widehat{\mathrm{Var}}(r)\left[\frac{A^{*}B(1-B)}{(1+B)^3}\right].$

Finally, the last rows in Table 1 present results of tests of the null hypotheses that $\eta_c$ and $\varepsilon_{ch}$ are distributed normally, based on the cluster residuals $\hat\eta_c$ and standardized household residuals $e^{*}_{ch}$, respectively. For some strata in Ecuador the standardized residual distribution appears to be approximately normal, even if formally rejected by tests based on skewness and kurtosis. Elsewhere, we find a t(5) distribution to be the better approximation. Relaxing the distributional form restrictions on the disturbance term and taking either of the semi-parametric approaches outlined above makes very little difference in the results for our Ecuadorian example.

15 Specifying the bounds is problematic in that it generates some small values of $\hat\sigma_{\varepsilon,ch}$ and, consequently, very large absolute standardized residuals. Thus, when simulating on the basis of the empirical distribution of these residuals we drop four observations with $|e^{*}_{ch}| > 5$.

Simulation results for the headcount measure of poverty and the general entropy (0.5) measure of inequality are in Tables 2 and 3. We construct populations of increasing size from a constant distribution $G_v(x, m)$ by drawing households randomly from all census households in the rural Costa region. They are allocated in groups of 100 to pseudo enumeration areas, with 'parroquias' of a thousand households created out of groups of ten EAs. We continue aggregating to obtain nested populations with 100 to 100,000 households. For each model and measure we present estimates of the expected value of the welfare indicator, calculated with a sufficient number of simulation draws to ensure that the standard error due to computation is less than 0.001.

In all examples we adjust for outliers. In standard situations, where the analyst has direct information about $y$, it is common to have outliers in that variable due to mismeasurement, inputting errors, etc. The problem is typically dealt with by discarding suspect observations. Here we have an analogous problem with respect to the $x$ variables used to infer expenditure levels, and we deal with it in the usual way.16 In addition to the standard "dirty data" problem, when treating the distribution of $u_h$ parametrically there is a non-zero probability of getting an extreme simulation draw and therefore an 'outlying' value for $y_h$. This problem is resolved by using truncated distributions.
Since it is the best information we have, we use the minimum and maximum of $\hat\eta$ and $\hat\varepsilon$ from our first-stage log-expenditure regression as truncation points.17 Poverty measures give zero weight to expenditure levels above the poverty line and are not very sensitive to variations below. Inequality measures, however, can be very sensitive to outlying values and therefore the choices made to discard observations and 'trim' disturbances. (Sampling raises similar issues and this subject is an area of continuing research.)

16 We delete households with predicted per-capita expenditure, $\hat y_h$, outside the range of observed per-capita expenditure in the household survey, losing less than 0.2% of our total census observations as a result.

17 Although they are in line with common practice, both steps of this procedure are admittedly somewhat ad hoc. Addressing the standard problem of mismeasurement in $y_h$, Cowell and Victoria-Feser (1996) suggest leaving suspected outliers in the data when estimating inequality and using weighting to lessen their importance. A similar approach could be taken here.

Table 2, column 1, refers to the headcount measure of poverty. It is defined as

(18) $W_0 = \frac{1}{N}\sum_{h\in H_v} m_h\,1(y_h < z),$

where $z$ is a poverty line defined in per-capita expenditure terms and $1(\cdot)$ is an indicator function taking on the value of one if the expression inside of the brackets is true and zero otherwise. When $\eta_c$ and $\varepsilon_{ch}$ are normally distributed there is a simple analytical form for the welfare estimator:

(19) $\hat\mu = \frac{1}{N}\sum_{h\in H_v} m_h\,\Phi\big((\ln z - \hat t_h)/\hat\sigma_h\big),$

where $\Phi(\cdot)$ is the standard normal distribution function and $\hat\sigma_h = \sqrt{\hat\sigma^2_\eta + \hat\sigma^2_{\varepsilon,ch}}$.

Table 2, column 2, refers to the general entropy (GE) measure with parameter $c = 0.5$. This measure is defined as

(20) $W_c = \frac{1}{c(1-c)}\left\{1 - \frac{1}{N}\sum_{h\in H_v} m_h\left(\frac{y_h}{\bar y}\right)^{c}\right\}.$

The first set of results (I) is calculated using the full first-stage model (column four of Table 1).
Here we assume that the location effect estimated at the cluster level in the survey data applies in the census to an enumeration area, and that household disturbances across different EAs are uncorrelated. The set of results (II) again are calculated using the full first-stage model, but now with the (conservative) assumption that the location effect estimated from clusters applies across an entire parroquia. This has the expected effect of increasing the idiosyncratic variance, although the estimator is still remarkably good given the small size of the residual location effect once infrastructure means are included as observable correlates of consumption. For comparison, (III) and (IV) give simulation results using the most sparse first-stage model - that with only household-level variables and no means (column one of Table 1). In (III) we estimate it as in (I), with the location effect at the EA level, while in (IV) we impose the assumption that there is no intra-cluster correlation, i.e. that $\eta_c = 0$.

A comparison of the results in (I) and (III) highlights the importance of developing a set of regressors that succeeds in picking up most of the influence of location on consumption. The prediction errors in (III) are higher, particularly for inequality. As noted above, there is great potential to enrich both the survey and census with other data to obtain appropriate variables. Comparing (III) and (IV) one sees that failing to allow for the effect of location can lead to a markedly over-optimistic view of the precision of the estimator.

Table 3 shows estimates of the expected value of the welfare indicator, the standard error of the prediction, and the share of the total variance due to the idiosyncratic component for increasingly large target populations. The location effect estimated at the cluster level in the survey data is applied to EAs in the census. In all cases the standard error due to computation is less than 0.001.
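For readers who want to compute the two indicators reported in Tables 2 and 3, the following Python sketch implements the headcount, its closed form under normal disturbances, and the general entropy measure. It is a minimal illustration, not the authors' code: function names are hypothetical, and $N$ is assumed to be the number of individuals, i.e. the sum of household sizes $m_h$.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def headcount(y, m, z):
    """Size-weighted share of households with per-capita expenditure below z."""
    return sum(mh for yh, mh in zip(y, m) if yh < z) / sum(m)

def expected_headcount(t_hat, sigma, m, z):
    """Closed-form expected headcount when disturbances are normal.

    t_hat: predicted log expenditure per household
    sigma: total disturbance standard deviation per household
    """
    n = sum(m)
    return sum(mh * normal_cdf((math.log(z) - th) / sh)
               for th, sh, mh in zip(t_hat, sigma, m)) / n

def general_entropy(y, m, c=0.5):
    """General entropy inequality measure with parameter c (c not 0 or 1)."""
    n = sum(m)
    ybar = sum(mh * yh for yh, mh in zip(y, m)) / n
    s = sum(mh * (yh / ybar) ** c for yh, mh in zip(y, m)) / n
    return (1.0 - s) / (c * (1.0 - c))
```

A household whose predicted log expenditure equals the log poverty line contributes a probability of exactly one half to the expected headcount, and a population with identical expenditures has general entropy of zero; both are useful sanity checks.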
Looking across columns one sees how the variance of the estimator falls as the size of the target population increases. For both measures the total standard error of the prediction falls to about five to seven percent of the point estimate with a population of just 15,000 households. At this point, the share of the total variance due to the idiosyncratic component of expenditure is already small, so there is little to gain from moving to higher levels of aggregation. The table also shows that estimates for populations of 100 have large errors. Clearly it would be ill advised to use this approach to determine the poverty of yet smaller groups or single households.

We now examine briefly several other modeling choices. First we consider the importance of modelling heteroscedasticity in the idiosyncratic component of the disturbance. We estimate expected headcount and GE (0.5) measures for the entire rural Costa, by parroquia, first using a model of heteroscedasticity and then assuming homoscedasticity. Table 4, column 1, indicates that there is little re-ranking of parroquias based on their headcount measures when heteroscedasticity is ignored. However, allowance for heteroscedasticity does have an important effect on rankings by inequality. The bottom half of the table indicates that the Spearman's rank correlation of general entropy inequality estimates is just 0.83. The difference in estimates within each parroquia is not always trivial for either measure. Differences across the two sets of estimates reach 0.08 and 0.11 for the headcount and GE (0.5) measure, respectively.

We next consider the effect of weighting by population expansion factors. As noted above, all of our analyses use these weights. The argument for doing so is that there may be some variance in the parameters $\zeta$ within regions which is not modelled.
If so, because we want to use the model estimates to impute into the census, we would prefer the model to fit most closely the clusters that represent large census populations. However, this decision is not innocuous. The expansion factors range by a factor of about 600, with about half of the clusters receiving on the order of 100 times as much weight in the regression as the other half. To explore this, we estimate parroquia welfare measures using the full first-stage model without weighting by population expansion factors. Column two of Table 4 shows that this choice is very important. The rank correlation across weighted and unweighted estimates of the expected headcount is just 0.77, the average absolute difference is 0.05, and the difference reaches as high as 0.34. For the general entropy measure, the rank correlation is similar: 0.78, with a maximum difference of 0.19.

Finally, we consider the second of the semi-parametric approaches to estimating the effect of the unobserved component of consumption on the welfare measure (see Section 6). Results are found in the third column of Table 4. Relaxing the functional form restrictions on the disturbance term makes very little difference in this example. The rank correlations between the parametric and semi-parametric treatments are 1.00 and 0.98 for the headcount and GE (0.5) measure, respectively, with maximum differences in the estimates of 0.04 and 0.05.

8. HOW MUCH IMPROVEMENT?

Most users of welfare indicators rely, by necessity, on sample survey based estimates. Table 5 demonstrates how much is gained by combining data sources. The second column gives the sampling errors on headcount measures estimated for each stratum using the survey data alone (taking account of sample design). There is only one estimate per region as this is the lowest level at which the sample is representative. The population of each region is in the third column.
When combining census and survey data it becomes possible to disaggregate to sub-regions and estimate poverty for specific localities. Here we choose as sub-regions parroquias or, in the cities of Quito and Guayaquil, zonas, because our prediction errors for these administrative units are similar in magnitude to the survey based sampling error on the region level estimates. (See the median standard error among sub-regions in the fourth column.) The final column gives the median population among these sub-regions. Comparing the third and final columns it is clear that, for the same prediction error commonly encountered in sample data, one can estimate poverty using combined data for sub-populations of a hundredth the size. This becomes increasingly useful the more there is spatial variation in well-being that can be identified using this approach. Considering this question, Demombynes et al. (2002) find, for several countries, that most sub-region headcount estimates do differ significantly from their region's average level.

9. OTHER MEASURES

Table 6 summarizes results for a range of welfare measures, again using the four nested census populations described above. In each case, location effects are assumed to apply at the EA level. The measures are the FGT (1) measure of the severity of poverty,

(21) $W_1 = \frac{1}{N}\sum_{h\in H_v} m_h\left(1 - \frac{y_h}{z}\right)1(y_h < z);$

the variance of log expenditure,

(22) $W = \frac{1}{N}\sum_{h\in H_v} m_h\left(\ln y_h - \overline{\ln y}\right)^2;$

and the Atkinson measure with inequality aversion parameter of 2,

(23) $W_2 = 1 - \left\{\frac{1}{N}\sum_{h\in H_v} m_h\left(\frac{y_h}{\bar y}\right)^{-1}\right\}^{-1},$

where the village mean expenditure, $\bar y$, is weighted by household size.

Results for the FGT(1) measure, often called the poverty gap, are similar to those for the headcount. Again quite precise estimates are obtained for populations of just 15,000 households. Results for the variance of log expenditure measure are similar to those for the GE (0.5) measure presented in Table 3.
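The three measures of this section are simple to compute once a vector of household per-capita expenditures is available. The Python sketch below is illustrative only; function names are hypothetical, and it assumes that all means are weighted by household size $m_h$ with $N$ equal to the sum of the $m_h$.

```python
import math

def poverty_gap(y, m, z):
    """FGT(1): size-weighted mean shortfall below the poverty line z."""
    return sum(mh * (1.0 - yh / z) for yh, mh in zip(y, m) if yh < z) / sum(m)

def variance_of_logs(y, m):
    """Size-weighted variance of log per-capita expenditure."""
    n = sum(m)
    mean_log = sum(mh * math.log(yh) for yh, mh in zip(y, m)) / n
    return sum(mh * (math.log(yh) - mean_log) ** 2 for yh, mh in zip(y, m)) / n

def atkinson_2(y, m):
    """Atkinson index with inequality aversion 2: one minus the ratio of the
    harmonic mean to the arithmetic mean."""
    n = sum(m)
    ybar = sum(mh * yh for yh, mh in zip(y, m)) / n
    return 1.0 - 1.0 / (sum(mh * (ybar / yh) for yh, mh in zip(y, m)) / n)

# Two single-person households with expenditures 1 and 3, poverty line 2.
gap = poverty_gap([1.0, 3.0], [1, 1], 2.0)
atk = atkinson_2([1.0, 3.0], [1, 1])
vlog = variance_of_logs([1.0, 3.0], [1, 1])
```

In this two-household example the poor household falls halfway below the line, so the poverty gap is one quarter; the Atkinson index also works out to one quarter because the harmonic mean (1.5) is three quarters of the arithmetic mean (2).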
Our estimates of the Atkinson measure are somewhat more precise than those of the other inequality measures.

10. PUTTING THE INDICATORS TO WORK - ILLUSTRATIONS

We now use estimates of distributional measures in two different types of applications. The measures have been calculated for all parroquias in rural Ecuador using the full census. Parroquias are the lowest administrative units. The calculations are based on three separate regional first-stage consumption models (estimation results available from the authors on request).

Geographical Maps of Welfare

A useful way of understanding the geographical spread of poverty or inequality is to construct a map using GIS data. Figure 3 provides an example. Comparisons between the Costa, the coastal region of Ecuador, and the Sierra, the central mountainous region, feature highly in popular political debate in Ecuador.18 The top two maps in Figure 3 depict the spatial distribution of poverty on the basis of two common measures: the headcount and the poverty gap, FGT(1).19 The bottom two maps in Figure 3 indicate those instances where the two alternative poverty measures differ in their ranking of cantons. The map on the lower left shows that in the Costa a number of cantons are ranked poorer under the headcount criterion than under the poverty gap. In contrast, in the Sierra, numerous cantons are ranked poorer under the poverty gap criterion than under the headcount. Clearly, views about the relative poverty of the regions will be affected by the measure of poverty employed. It is also clear that, irrespective of the poverty measure used, all cantons in the eastern part of Ecuador are particularly poor.

This type of map could be used for targeting development efforts, or for exploring relationships between welfare indicators and other variables. For example, a poverty or inequality map could be overlaid with maps of other types of data, say on agro-climatic or other environmental characteristics. The visual nature of the maps may highlight unexpected relationships that would escape notice in a standard regression analysis.

18 See, for example, "Under the Volcano", The Economist, November 27, 1999, p. 66.

19 For visibility we have disaggregated only to cantons, the administrative level just above a parroquia.

Are Neighbors Equal?

An important issue in the area of political economy and public policy is to determine the appropriate level of government to give responsibility for public services and their financing. The advantage of decentralizing to make use of better community-level information about priorities and the characteristics of residents may be offset by a greater likelihood that the local governing body is controlled by elites - to the detriment of weaker community members. In a recent paper, Bardhan and Mookherjee (1999) highlight the roles of both the level and heterogeneity of local inequality (and poverty) as determinants of the relative likelihood of capture at different levels of government. As most of the theoretical predictions are ambiguous, they stress the need for empirical research into the causes of political capture - analysis which has been held back by a lack of empirical measures for most variables.20 Our community-level welfare estimates can help to address this problem.

20 Galasso and Ravallion (2002), which compares the inter- vs intra-district targeting of schooling in Bangladesh, uses village-level inequality measures, but is limited to those sampled in the household expenditure survey.

We can answer, first, many questions about the level and heterogeneity of welfare at different levels of government. For example, here we decompose inequality in rural Ecuador into between- and within-group components and examine how within-group inequality evolves at progressively lower levels of regional disaggregation.
At one extreme, when a country-level perspective is taken, all inequality is, by definition, within-group. At the other extreme, when each individual household is taken as a separate group, the within-group contribution to overall inequality is zero (assuming, as is implicit in our use of a per-capita indicator, an equal distribution within each household). But how rapidly does the within-group share fall? Is it reasonable to suppose that at a sufficiently low level of disaggregation (say, a village or neighbourhood) differences within groups are small, and most of overall inequality is due to differences between groups?

We employ the general entropy (0.5) inequality measure because it is decomposable. If $N$ individuals are each placed in one of $J$ groups indexed by $j$, and the $j$th group contains a proportion $f_j$ of the population and has weighted mean per-capita expenditure $\bar y_j$ and inequality $w_j$, then

(24) $W_{0.5} = 4\left\{1 - \sum_{j} f_j\left(\frac{\bar y_j}{\bar y}\right)^{0.5}\right\} + \sum_{j} w_j f_j\left(\frac{\bar y_j}{\bar y}\right)^{0.5},$

where the first term is the inequality between groups and the second is within groups (Cowell, 1995). In stages we disaggregate the country down to the parroquia level. Table 7 illustrates that even at a very high degree of spatial disaggregation, 86% of overall rural inequality can still be attributed to differences within groups.21 For further interpretation and examples from other countries, see Elbers et al. (2002).

Thus, as often suggested by anecdotal evidence, even within local communities there exists a considerable heterogeneity of living standards. In addition to affecting the likelihood of political capture, this may have implications for the feasibility of raising revenues locally, as well as for the extent to which residents of such communities can be viewed as having similar demands and priorities.
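The decomposition of the general entropy (0.5) measure into a between-group term, 4 times one minus the sum over groups of $f_j(\bar y_j/\bar y)^{0.5}$, and a within-group term, the sum of $w_j f_j(\bar y_j/\bar y)^{0.5}$, can be checked numerically. The Python sketch below is an illustration under assumptions, not the authors' code: the function names and the two toy groups are hypothetical, and weights are household sizes.

```python
def ge_half(y, m):
    """General entropy (0.5) inequality for expenditures y with weights m."""
    n = sum(m)
    ybar = sum(mh * yh for yh, mh in zip(y, m)) / n
    return 4.0 * (1.0 - sum(mh * (yh / ybar) ** 0.5 for yh, mh in zip(y, m)) / n)

def decompose(groups):
    """Split GE(0.5) into (between, within) for a list of (y, m) groups."""
    n = sum(sum(m) for _, m in groups)
    ybar = sum(mh * yh for y, m in groups for yh, mh in zip(y, m)) / n
    share_sum = 0.0
    within = 0.0
    for y, m in groups:
        n_j = sum(m)
        f_j = n_j / n                                  # population share
        ybar_j = sum(mh * yh for yh, mh in zip(y, m)) / n_j
        share = f_j * (ybar_j / ybar) ** 0.5
        share_sum += share
        within += ge_half(y, m) * share                # w_j times its weight
    return 4.0 * (1.0 - share_sum), within

# Hypothetical example: two groups of households (expenditure, household size).
groups = [([1.0, 2.0, 4.0], [3, 2, 4]), ([5.0, 9.0], [1, 6])]
between, within = decompose(groups)
all_y = [yh for y, _ in groups for yh in y]
all_m = [mh for _, m in groups for mh in m]
total = ge_half(all_y, all_m)
```

The two components add up to the overall measure, so the within-group share reported in Table 7 corresponds to `within / total` in this notation.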
Put together with either survey data on attitudes towards government or on the allocation of public spending, disaggregated inequality estimates could be used to directly assess the influence of welfare distributions on the political process. We plan to explore this further in the context of the targeting of social fund programs.

21 We have confined our attention to rural areas where there is no evidence of spatial autocorrelation in $\varepsilon$. Results using all of Ecuador were very similar.

11. CONCLUSIONS

In constructing disaggregated estimates of welfare we have explored a straightforward idea. We use detailed household survey data to estimate a model of per-capita expenditure and then use the resulting parameter estimates to weight the census-based characteristics of a target population in determining its expected welfare level. While others have taken weighted combinations of variables in the census to estimate household poverty, this merging of data sources has the advantage of yielding estimators with clear interpretations via their link to household expenditure; which are mutually comparable; and, perhaps most importantly, which can be assessed for reliability using standard statistical theory.

What is quite remarkable is how well this method of estimating welfare measures can work in practice. In our examples using Ecuadorian data we find that estimates are often quite reliable for populations as small as 15,000 households, a 'town'. This is a very considerable improvement over the direct survey-based estimates, which are only consistent for areas encompassing hundreds of thousands of households. Given these promising initial results there is also no reason to be passive consumers of existing data sets. Governments and surveying bodies can be encouraged to design both census and survey instruments to correspond more closely for this purpose.
So now that we have estimates of poverty and inequality in thousands of 'towns' or other groups, what can we do with them? The possibilities seem many and varied. For many questions, intra-regional cross-town analysis could considerably enrich the existing results of cross-country studies (see Elbers and Lanjouw, 2001). At the micro-level increasing attention is being paid to ways in which welfare distributions within groups relate to socioeconomic and political outcomes. Of the resulting multitude of theories, most remain to be tested. Again, our findings regarding the level and heterogeneity of well-being at different levels of government, features which have been linked in theory to political capture and the targeting of public resources, are just one illustration of what is possible. Merging these measures with data on crime, education, health, voting patterns, unemployment, and so on, will open up many promising avenues for further research.

Department of Economics, Vrije Universiteit, De Boelelaan 1105, 1081 HV Amsterdam, N.L.; celbers@feweb.vu.nl, and
Department of Agriculture and Resource Economics, University of California at Berkeley, and the Brookings Institution, 1775 Massachusetts Avenue NW, Washington, DC, 20036, U.S.A.; jlanjouw@brook.edu, and
The World Bank, 1818 H. Street, Washington, DC, 20433, U.S.A.; planjouw@worldbank.org.

REFERENCES

AGHION, P., AND P. BOLTON (1997): "A Theory of Trickle Down Growth and Development," Review of Economic Studies, 64, 2, 151-72.
ALDERMAN, H., M. BABITA, G. DEMOMBYNES, N. MAKHATHA, AND B. OZLER (2002): "How Low Can You Go?: Combining Census and Survey Data for Mapping Poverty in South Africa," Journal of African Economies, forthcoming.
ALESINA, A., AND D. RODRIK (1994): "Distributive Politics and Economic Growth," Quarterly Journal of Economics, 109, 465-90.
ANGRIST, J. D., AND A. B.
KRUEGER (1992): "The Effect of Age of School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples," Journal of the American Statistical Association, 87, 328-36.
ARELLANO, M., AND C. MEGHIR (1992): "Female Labour Supply and on the Job Search: an Empirical Model Estimated using Complementary Data Sets," Review of Economic Studies, 59, 537-59.
ATKINSON, A. B., AND A. BRANDOLINI (2001): "Promise and Pitfalls in the Use of "Secondary" Data-Sets: Income Inequality in OECD Countries," Journal of Economic Literature, 39, 3.
BANERJEE, A., AND E. DUFLO (2000): "Inequality and Growth: What Can the Data Say?," NBER Working paper no. 7793.
BANERJEE, A., AND A. NEWMAN (1993): "Occupational Choice and the Process of Development," Journal of Political Economy, 101, 1, 274-98.
BARDHAN, P., AND D. MOOKHERJEE (1999): "Relative Capture of Local and Central Governments: An Essay in the Political Economy of Decentralization," CIDER Working Paper no. C99-109, University of California at Berkeley.
BARRO, R., AND X. SALA-I-MARTIN (1991): "Convergence Across States and Regions," Brookings Papers on Economic Activity, no. 1, 107-82.
BRUNO, M., M. RAVALLION, AND L. SQUIRE (1998): "Equity and Growth in Developing Countries: Old and New Perspectives on the Policy Issues," in Income Distribution and High-Quality Growth, eds. V. Tanzi and K.-Y. Chu. Cambridge: MIT Press.
CHESHER, A., AND C. SCHLUTER (2002): "Welfare Measurement and Measurement Error," Review of Economic Studies, forthcoming.
COWELL, F. (1995): The Measurement of Inequality, 2nd ed. Hemel Hempstead: Prentice Hall/Harvester Wheatsheaf.
COWELL, F., AND M.-P. VICTORIA-FESER (1996): "Robustness Properties of Inequality Measures," Econometrica, 64, 1, 77-101.
DEATON, A. (1997): The Analysis of Household Surveys: A Microeconometric Approach to Development Policy. Washington, D.C.: The Johns Hopkins University Press for the World Bank.
DEATON, A. (1999): "Inequalities in Income and Inequalities in Health," NBER Working paper no. 7141.
DEININGER, K., AND L. SQUIRE (1996): "A New Data Set Measuring Income Inequality," The World Bank Economic Review, 10, 565-91.
DEMOMBYNES, G., C. ELBERS, J. O. LANJOUW, P. LANJOUW, J. MISTIAEN, AND B. OZLER (2002): "Producing an Improved Geographic Profile of Poverty: Methodology and Evidence from Three Developing Countries," WIDER Discussion Paper no. 2002/39, The United Nations.
ELBERS, C., AND P. LANJOUW (2001): "Intersectoral Transfer, Growth, and Inequality in Rural Ecuador," World Development, 29, 3, 481-96.
ELBERS, C., J. O. LANJOUW, AND P. LANJOUW (2000): "Welfare in Villages and Towns: Micro-Measurement of Poverty and Inequality," Tinbergen Institute Working Paper no. 2000-029/2.
ELBERS, C., J. O. LANJOUW, P. LANJOUW, AND P. G. LEITE (2002): "Poverty and Inequality in Brazil: New Estimates from Combined PPV-PNAD Data," Unpublished Manuscript, The World Bank.
ELBERS, C., P. LANJOUW, J. MISTIAEN, B. OZLER, AND K. SIMLER (2002): "Are Neighbours Equal? Estimating Local Inequality in Three Developing Countries," Unpublished Manuscript, The World Bank.
FIELDS, G. (1989): "A Compendium of Data on Inequality and Poverty for the Developing World," Unpublished Manuscript, Cornell University.
FIELDS, G. (2001): "Economic Growth and Inequality: A Review of the Empirical Evidence," Chapter 3 in Distribution and Development: A New Look at the Developing World. Russell Sage Foundation and MIT Press.
GALOR, O., AND J. ZEIRA (1993): "Income Distribution and Macroeconomics," Review of Economic Studies, 60, 35-52.
GALASSO, E., AND M. RAVALLION (2002): "Decentralized Targeting of an Anti-Poverty Program," Unpublished Manuscript, The World Bank.
GHOSH, M., AND J. N. K. RAO (1994): "Small Area Estimation: An Appraisal," Statistical Science, 9, 55-93.
GREENE, W. H. (2000): Econometric Analysis. Fourth Edition. New Jersey: Prentice-Hall Inc.
HELLERSTEIN, J., AND G.
IMBENS (1999): "Imposing Moment Restrictions from Auxiliary Data by Weighting," Review of Economics and Statistics, 81, 1, 1-14.
KEYZER, M. (2000): "Reweighting Survey Observations by Monte Carlo Integration on a Census," Stichting Onderzoek Wereldvoedselvoorziening, Staff Working Paper no. 00.04, the Vrije Universiteit, Amsterdam.
LUSARDI, A. (1996): "Permanent Income, Current Income and Consumption: Evidence from Two Panel Data Sets," Journal of Business and Economic Statistics, 14, 1.
MACKAY, D. J. C. (1998): "Introduction to Monte Carlo Methods," in Learning in Graphical Models; Proceedings of the NATO Advanced Study Institute, ed. by M. I. Jordan. Kluwer Academic Publishers Group.
MURPHY, K. M., A. SHLEIFER, AND R. C. VISHNY (1989): "Income Distribution, Market Size and Industrialization," Quarterly Journal of Economics, 104, 537-64.
PERSSON, T., AND G. TABELLINI (1994): "Is Inequality Harmful for Growth," American Economic Review, 84, 600-21.
PAKES, A., AND D. POLLARD (1989): "Simulation and the Asymptotics of Optimization Estimators," Econometrica, 57, 1027-58.
RAO, J. N. K. (1999): "Some Recent Advances in Model-Based Small Area Estimation," Survey Methodology, 25, 175-86.
RAVALLION, M. (1998): "Does Aggregation Hide the Harmful Effects of Inequality on Growth?," Economics Letters, 61, 1, 73-7.
TAROZZI, A. (2002): "Estimating Comparable Poverty Counts from Incomparable Surveys: Measuring Poverty in India," RPDS Working paper no. 213, Princeton University.
TRAUB, J. F., AND A. G. WERSCHULZ (1998): Complexity and Information. Cambridge: Cambridge University Press.

FIGURES AND TABLES

Figure 1a: Estimated Headcounts by Parroquia in Rural Costa. [Plot of the estimated headcount and the headcount calculated from predicted consumption; vertical axis: headcount; horizontal axis: parroquias ranked by estimated poverty.]

Figure 1b: Inequality by Parroquia in Rural Costa. [Plot of estimated inequality and inequality calculated from predicted consumption, general entropy class with parameter 0.5; vertical axis: inequality; horizontal axis: parroquias ranked by estimated inequality.]

Figure 2: Idiosyncratic error falling with number of households in target population. [Four panels; plots not recoverable from the source.]

Figure 3(a): Rural Poverty by Canton: Headcount and Poverty Gap. [Four maps. Top left: head count index, shaded in classes 0.13-0.48, 0.48-0.54, 0.54-0.59, 0.59-0.64, 0.64-0.85, no data. Top right: poverty gap index, shaded in classes 0.04-0.17, 0.17-0.20, 0.20-0.24, 0.24-0.28, 0.28-0.45, no data. Bottom left: areas ranked poorer using head count. Bottom right: areas ranked poorer using poverty gap.]

Note: (a) The top two maps illustrate the geographical distribution of rural poverty across cantons based on, respectively, the headcount measure of poverty and the poverty gap index. The shaded regions in the bottom two maps highlight those cantons where the rankings in the top two maps are not the same. The map on the left highlights those cantons that are ranked lower (more poor) according to the headcount measure than they would be according to the poverty gap index. The map on the right highlights those cantons that are ranked lower according to the poverty gap index than they would be according to the headcount measure.
Table 1: Diagnostics for Selected First-Stage Model Specifications

| Diagnostic | Model I (Sparse): No Infrastructure, No Means | Model II: No Infrastructure, Location Means | Model III: Infrastructure, No Means | Model IV (Full): Infrastructure, Location Means |
|---|---|---|---|---|
| Hausman test of population weights (Deaton, 1997), $H_0: \beta_W = \beta_{NW}$: F-test | 1.66 | 2.05 | 1.57 | 1.84 |
| 95% critical value | (18,448) = 1.42 | (23,438) = 1.53 | (21,442) = 1.57 | (26,432) = 1.50 |
| $R^2$ | 0.41 | 0.47 | 0.42 | 0.50 |
| Importance of random effect | 0.141 | 0.048 | 0.149 | 0.019 |
| $H_0$: Location effects jointly = 0, p-value | <0.001 | 0.024 | <0.001 | 0.235 |
| $H_0: \sigma^2_{\varepsilon,ch} = \sigma^2_{\varepsilon}$, p-value | 0.98 | <0.001 | <0.001 | <0.001 |
| Distribution of $\hat\eta_c$: Skewness | 0.52 | 0.12 | 0.38 | 0.25 |
| Distribution of $\hat\eta_c$: Kurtosis | 3.06 | 0.87 | 3.91 | 0.35 |
| Distribution of $\hat\eta_c$: $\chi^2(2)$-test of normal distribution | 1.79 | 7.47 | 2.27 | 11.85 |
| Distribution of $e^{*}_{ch}$: N | 483 | 484 | 484 | 484 |
| Distribution of $e^{*}_{ch}$: Skewness | -0.22 | -0.48 | -0.14 | -0.51 |
| Distribution of $e^{*}_{ch}$: Kurtosis | 5.67 | 6.23 | 7.14 | 3.79 |
| Distribution of $e^{*}_{ch}$: $\chi^2(2)$-test of normal distribution | 147.44 | 229.90 | 346.49 | 33.51 |

Table 2: Headcount and General Entropy (0.5) Measures - Different Consumption Models (a)

| Model | Estimates | Headcount | GE (0.5) |
|---|---|---|---|
| I: Full Model, Location Effect at EA Level | No. Draws R | 300 | 300 |
| | $\hat\mu$ | 0.508 | 0.275 |
| | Estimated Standard Error | 0.024 | 0.020 |
| | Due to (b): Model | 0.023 | 0.004 |
| | Due to (b): Idiosyncratic Error | 0.005 | 0.019 |
| II: Full Model, Location Effect at Parroquia Level | $\hat\mu$ | 0.508 | 0.504 |
| | Estimated Standard Error | 0.025 | 0.040 |
| | Due to: Model | 0.021 | 0.004 |
| | Due to: Idiosyncratic Error | 0.013 | 0.039 |
| III: Sparse Model, Location Effect at EA Level | $\hat\mu$ | 0.504 | 0.514 |
| | Estimated Standard Error | 0.029 | 0.038 |
| | Due to: Model | 0.028 | 0.008 |
| | Due to: Idiosyncratic Error | 0.009 | 0.037 |
| IV: Sparse Model, Assumption of No Location Effect | $\hat\mu$ | 0.526 | 0.521 |
| | Estimated Standard Error | 0.021 | 0.016 |
| | Due to: Model | 0.020 | 0.007 |
| | Due to: Idiosyncratic Error | 0.004 | 0.015 |

Notes: a These are household groups drawn randomly from the rural Costa census population as described in the text. The 'population' samples are of 15,000 households.
b These are the estimated standard deviations for each separate piece of the total variance, $V_M$ and $V_I$.

Table 3: Headcount and General Entropy (0.5) Measures - Different Population Sizes

| Measure | Estimates | 100 households (a) | 1,000 | 15,000 | 100,000 |
|---|---|---|---|---|---|
| Headcount | Estimate | 0.46 | 0.50 | 0.51 | 0.51 |
| | Total Standard Error | 0.067 | 0.039 | 0.024 | 0.024 |
| | $V_I$ / Total Variance | 0.75 | 0.24 | 0.04 | 0.02 |
| GE (0.5) | Estimate | 0.26 | 0.28 | 0.28 | 0.28 |
| | Total Standard Error | 0.048 | 0.029 | 0.022 | 0.022 |
| | $V_I$ / Total Variance | 0.79 | 0.28 | 0.03 | <0.01 |

Notes: a These are household groups drawn randomly from the rural Costa census population as described in the text. Smaller 'population' samples are subsets of the larger 'populations'. b These are the estimated standard deviations for each separate piece of the total variance, $V_M$ and $V_I$.

Table 4: Further Diagnostics

| Comparison of Parroquia-level Estimates (a, b) | Estimation assuming homoscedasticity of disturbance components and no location effect | Estimation without use of population expansion factors | Estimation with semi-parametric disturbance distribution |
|---|---|---|---|
| Headcount: Spearman's rank correlation | 0.98 | 0.73 | 0.998 |
| Headcount: Mean absolute difference | 0.017 | 0.053 | 0.008 |
| Headcount: Minimum difference | -0.069 | -0.169 | -0.024 |
| Headcount: Maximum difference | 0.023 | 0.306 | 0.026 |
| General Entropy (0.5): Spearman's rank correlation | 0.86 | 0.91 | 0.957 |
| General Entropy (0.5): Mean absolute difference | 0.016 | 0.026 | 0.011 |
| General Entropy (0.5): Minimum difference | -0.174 | -0.151 | -0.047 |
| General Entropy (0.5): Maximum difference | 0.067 | 0.009 | 0.192 |

Notes: a The baseline estimates use the full model in column 4 of Table 1. The comparison estimates differ from the baseline as indicated in the column headings. b Comparisons are made of 271 parroquias in the rural Costa region.

Table 5: Improvement using Combined Data - Headcount

| (1) Region | (2) Sample Data Only (region): S.E. of Estimate | (3) Sample Data Only (region): Population (1000s) | (4) Combined Data (sub-region): S.E. of Estimate (median) | (5) Combined Data (sub-region): Population (median, 1000s) |
|---|---|---|---|---|
| Rural Sierra | 0.027 | 2,509 | 0.038 | 3.3 |
| Rural Costa | 0.042 | 1,985 | 0.046 | 4.6 |
| Rural Oriente | 0.054 | 298 | 0.043 | 1.2 |
| Urban Sierra | 0.026 | 1,139 | 0.026 | 10.0 |
| Urban Costa | 0.030 | 1,895 | 0.031 | 11.0 |
| Urban Oriente | 0.050 | 55 | 0.027 | 8.0 |
| Quito | 0.033 | 1,193 | 0.048 | 5.8 |
| Guayaquil | 0.027 | 1,718 | 0.039 | 6.5 |

Table 6: Other Measures of Welfare

| Measure | Estimates | 100 households (a) | 1,000 | 15,000 | 100,000 |
|---|---|---|---|---|---|
| FGT (1) Poverty Gap | Estimate | 0.159 | 0.176 | 0.176 | 0.176 |
| | Estimated Standard Error | 0.030 | 0.016 | 0.013 | 0.013 |
| | Due to (b): Model | 0.013 | 0.013 | 0.012 | 0.012 |
| | Due to (b): Idiosyncratic Error | 0.026 | 0.010 | 0.002 | 0.002 |
| Variance of Log Per-capita Expenditure | Estimate | 0.453 | 0.480 | 0.480 | 0.482 |
| | Estimated Standard Error | 0.071 | 0.044 | 0.037 | 0.037 |
| | Due to: Model | 0.037 | 0.039 | 0.037 | 0.037 |
| | Due to: Idiosyncratic Error | 0.060 | 0.021 | 0.006 | 0.002 |
| Atkinson Index (2) | Estimate | 0.368 | 0.389 | 0.390 | 0.391 |
| | Estimated Standard Error | 0.046 | 0.028 | 0.024 | 0.023 |
| | Due to: Model | 0.024 | 0.024 | 0.024 | 0.023 |
| | Due to: Idiosyncratic Error | 0.039 | 0.014 | 0.004 | 0.001 |

Notes: a, b See notes to Table 3.

Table 7: Decomposition of Inequality in Rural Ecuador by Regional Sub-Group, General Entropy (0.5)

| Level of Decomposition | No. of sub-groups | Within-Group (%) | Between-Group (%) |
|---|---|---|---|
| National | 1 | 100.0 | 0 |
| Sector and: | | | |
| Region (Costa, Sierra, Oriente) | 3 | 100.0 | 0 |
| Province | 21 | 98.7 | 1.3 |
| Canton | 195 | 94.1 | 5.9 |
| Parroquia | 915 | 85.9 | 14.1 |
| Household | 960,529 | 0 | 100.0 |

Appendix 1: The Estimator $\hat\sigma_\eta^2$ and its Distribution

Estimation using moment conditions

For $c = 1, \ldots, C$; $h = 1, \ldots, n_c$, let $\eta_c$ and $\varepsilon_{ch}$ be independent random variables with zero expectation and finite variance, where the $\eta_c$ are identically distributed. Suppose we have observations on $u_{ch}$, where

$$u_{ch} = \eta_c + \varepsilon_{ch}. \tag{25}$$

The problem is to estimate $\sigma_\eta^2 = \mathrm{var}(\eta)$. Using '.' to indicate the arithmetic mean over an index (e.g., $\varepsilon_{c.} = \frac{1}{n_c}\sum_h \varepsilon_{ch}$), we note that $u_{c.} = \eta_c + \varepsilon_{c.}$. Hence

$$E[u_{c.}^2] = \sigma_\eta^2 + \mathrm{var}(\varepsilon_{c.}) = \sigma_\eta^2 + \tau_c^2. \tag{26}$$

We use the following lemma:

Lemma 1. For $i = 1, \ldots, n$, let $x_i$ be independent random variables with zero mean and finite variance, and let $\lambda_1, \ldots, \lambda_n$ be a given set of non-negative numbers satisfying $\sum_i \lambda_i = 1$. Let $x_{.} = \sum_i \lambda_i x_i$ be the weighted average of the $x_i$. Then

$$E\Big[\sum_i \lambda_i (x_i - x_{.})^2\Big] = \sum_i \lambda_i (1 - \lambda_i) E[x_i^2].$$

The lemma implies that, for a set of non-negative weights $w_c$ summing to 1, and with $u_{..} = \sum_c w_c u_{c.}$:

$$E\Big[\sum_c w_c (u_{c.} - u_{..})^2\Big] = \sum_c w_c (1 - w_c)(\sigma_\eta^2 + \tau_c^2). \tag{27}$$

Hence:

$$\sigma_\eta^2 = \frac{E\big[\sum_c w_c (u_{c.} - u_{..})^2\big] - \sum_c w_c (1 - w_c)\tau_c^2}{\sum_j w_j (1 - w_j)}. \tag{28}$$

Note that

$$\tau_c^2 = \mathrm{var}(\varepsilon_{c.}) = E\Big[\frac{1}{n_c (n_c - 1)} \sum_h (\varepsilon_{ch} - \varepsilon_{c.})^2\Big]. \tag{29}$$

A natural candidate for an estimator for $\sigma_\eta^2$ is therefore

$$\hat\sigma_\eta^2 = \max\left(\frac{\sum_c w_c (u_{c.} - u_{..})^2 - \sum_c w_c (1 - w_c)\hat\tau_c^2}{\sum_c w_c (1 - w_c)},\; 0\right) \tag{30}$$

where

$$\hat\tau_c^2 = \frac{1}{n_c (n_c - 1)} \sum_h (\varepsilon_{ch} - \varepsilon_{c.})^2. \tag{31}$$

The deviations in (31) can be computed from the data because $\eta_c$ cancels within a cluster: $\varepsilon_{ch} - \varepsilon_{c.} = u_{ch} - u_{c.}$.

An estimator for the variance of $\hat\sigma_\eta^2$ can be obtained using simulation (see below). Alternatively, to approximate $\mathrm{var}(\hat\sigma_\eta^2)$ we make the following simplifying assumptions:

* $\varepsilon_{ch} \sim N(0, \sigma_{\varepsilon,c}^2)$, homoskedastic within cluster.
* $\eta_c \sim N(0, \sigma_\eta^2)$.
* $u_{c.}^2$ and $\hat\tau_c^2$ are treated as independent.
* $u_{..} = 0$.

Denote $a_c = w_c / \sum_j w_j (1 - w_j)$ and $b_c = w_c (1 - w_c) / \sum_j w_j (1 - w_j)$, so that under these assumptions $\hat\sigma_\eta^2 \approx \sum_c a_c u_{c.}^2 - \sum_c b_c \hat\tau_c^2$. Then

$$\mathrm{var}(u_{c.}^2) = \mathrm{var}(\eta_c^2 + \varepsilon_{c.}^2 + 2\eta_c \varepsilon_{c.}) = \mathrm{var}(\eta_c^2) + \mathrm{var}(\varepsilon_{c.}^2) + 4\sigma_\eta^2 \tau_c^2. \tag{32}$$

Note that under the assumptions above, $\hat\tau_c^2$ is distributed as $\tau_c^2 \chi^2_{n_c - 1} / (n_c - 1)$, hence its variance is

$$\mathrm{var}(\hat\tau_c^2) = \frac{2\tau_c^4}{n_c - 1}. \tag{33}$$

Similarly, $\varepsilon_{c.}^2$ is distributed as $\tau_c^2 \chi^2_1$ with variance $2\tau_c^4$, and $\mathrm{var}(\eta_c^2) = 2\sigma_\eta^4$. Combining, we find

$$\mathrm{var}(\hat\sigma_\eta^2) \approx \sum_c \big[a_c^2\, \mathrm{var}(u_{c.}^2) + b_c^2\, \mathrm{var}(\hat\tau_c^2)\big] = \sum_c 2\Big[a_c^2 \big\{(\sigma_\eta^2)^2 + (\tau_c^2)^2 + 2\sigma_\eta^2 \tau_c^2\big\} + b_c^2 \frac{\tau_c^4}{n_c - 1}\Big]. \tag{34}$$

Estimation using simulation

The following, more direct approach can also be taken.

* Estimate $\sigma_\eta^2$ from equation (30) above. This gives $\hat\sigma_\eta^2$.
* Estimate $\sigma_{\varepsilon,ch}^2$ using the heteroskedasticity model in Section 3. This gives $\hat\sigma_{\varepsilon,ch}^2$.
* Using the estimated variance components, and assuming $\eta_c$ and $\varepsilon_{ch}$ to be independent and normally distributed with mean zero, generate new values for $u_{ch}$ using equation (25).
* Compute a new estimate for $\sigma_\eta^2$ using formula (30).
* Repeat many times, keeping the simulated values of $\hat\sigma_\eta^2$.

The set of simulated values for $\hat\sigma_\eta^2$ thus obtained can be used to calculate the sampling variance of $\hat\sigma_\eta^2$ directly.
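The moment estimator in equation (30) and the simulation steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `sigma_eta2_hat` and `simulate_sigma_eta2` are hypothetical, equal cluster weights $w_c = 1/C$ are assumed when no weights are given, and the idiosyncratic variances are passed in directly rather than estimated from the heteroskedasticity model of Section 3.

```python
import numpy as np


def sigma_eta2_hat(u, cluster, w=None):
    """Moment estimator of var(eta), following equations (30) and (31)."""
    u = np.asarray(u, dtype=float)
    _, idx = np.unique(np.asarray(cluster), return_inverse=True)
    n_c = np.bincount(idx)                        # households per cluster (each must be >= 2)
    u_c = np.bincount(idx, weights=u) / n_c       # cluster means u_c.
    if w is None:
        w = np.full(n_c.size, 1.0 / n_c.size)     # assumption: equal cluster weights
    w = np.asarray(w, dtype=float)
    u_bar = np.sum(w * u_c)                       # weighted grand mean u_..
    ss_within = np.bincount(idx, weights=(u - u_c[idx]) ** 2)
    tau2_hat = ss_within / (n_c * (n_c - 1))      # equation (31), using u_ch - u_c.
    between = np.sum(w * (u_c - u_bar) ** 2)
    correction = np.sum(w * (1.0 - w) * tau2_hat)
    return max((between - correction) / np.sum(w * (1.0 - w)), 0.0)


def simulate_sigma_eta2(sig_eta2, sig_eps2, cluster, n_rep=500, seed=0, w=None):
    """Simulated sampling distribution of the estimator under normal errors."""
    rng = np.random.default_rng(seed)
    labels, idx = np.unique(np.asarray(cluster), return_inverse=True)
    # sig_eps2 may be a scalar or one variance per household (hypothetical input).
    sd_eps = np.sqrt(np.broadcast_to(np.asarray(sig_eps2, dtype=float), idx.shape))
    draws = np.empty(n_rep)
    for r in range(n_rep):
        eta = rng.normal(0.0, np.sqrt(sig_eta2), size=labels.size)
        eps = rng.normal(0.0, sd_eps)
        draws[r] = sigma_eta2_hat(eta[idx] + eps, cluster, w)  # equation (25), then (30)
    return draws


# Hypothetical design: 40 clusters of 12 households, small location variance.
cluster = np.repeat(np.arange(40), 12)
draws = simulate_sigma_eta2(0.01, 0.25, cluster, n_rep=300)
print("simulated sampling s.d.:", draws.std(ddof=1))
print("share of zero estimates:", np.mean(draws == 0.0))
```

Keeping the full vector of draws, rather than only its variance, makes it possible to inspect how much mass the truncation at zero places on $\hat\sigma_\eta^2 = 0$ when the location variance is small.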
In practice $\hat\sigma_\eta^2$ is often so small that equation (30) will generate a significant number of zero variance estimates for $\eta$ (i.e., the distribution is far from normal). Given this feature of the sampling distribution of $\hat\sigma_\eta^2$, using only information on the point estimate and its sampling variance could be misleading (as when using the delta method to calculate the model variance, $V_M$). The alternative approach to calculating the variance of $\hat\mu$ discussed following equation (16) could be implemented by taking random draws of $\hat\sigma_\eta^2$ from the set of simulated values obtained above, thereby using the full distribution.

Appendix 2: First Stage Regression Results

Table A.1. First-Stage Estimates for Log Per-Capita Expenditure: Rural Costa

| Variable (a) | Parameter estimate (b) | Estimated Standard Errors |
|---|---|---|
| I. Household-level / Non-Infrastructure | | |
| Family size | -0.623 | 0.0947 |
| Family size squared | 0.062 | 0.0138 |
| Family size cubed | -0.002 | 0.0006 |
| Indigenous language spoken | 0.004 | 0.0035 |
| Rented home | 0.001 | 0.0015 |
| Owned home | 0.002 | 0.0005 |
| Walls of brick | 0.002 | 0.0007 |
| Walls of wood | -0.002 | 0.0008 |
| Cooking on gas fire | 0.0001 | 0.0019 |
| Cooking with wood or charcoal | -0.0008 | 0.0019 |
| Persons per bedroom | 0.049 | 0.1018 |
| Persons per bedroom squared | -0.014 | 0.0185 |
| Persons per bedroom cubed | 0.0007 | 0.0009 |
| Household head with no spouse | -0.089 | 0.1500 |
| Years of schooling of: Household head | 0.027 | 0.0067 |
| Years of schooling of: Spouse of head | 0.011 | 0.0084 |
| Age of: Household head | 0.005 | 0.0025 |
| Age of: Spouse of head | -0.002 | 0.0030 |
| II. Household-level / Infrastructure | | |
| Own connection to modern sewage | 0.002 | 0.0005 |
| Shared connection to modern sewage | 0.0005 | 0.0010 |
| Own latrine | 0.0002 | 0.0006 |
| III. Location Means / Non-Infrastructure | | |
| Age of household head | -0.026 | 0.0064 |
| Years of schooling of spouse of head | -0.098 | 0.0327 |
| % of household heads male | -0.025 | 0.0054 |
| (Persons per bedroom)^2 | 0.019 | 0.0043 |
| IV. Location Means / Infrastructure | | |
| Own connection to modern sewage | 0.004 | 0.0012 |
| Number of household observations | 485 | |
| Number of sample clusters | 39 | |

Notes: a Age and education for a child in a specific birth position is set equal to zero if the household does not have such a child. The location mean variables are household values of the indicated variable in the census data averaged over all households in a census enumeration area. ^2 indicates that the mean is squared. Dummy variables are defined as either 100 or 0. b Parameters and standard errors are two-step GLS estimates calculated using household expansion factors and estimated variances of the disturbance components $\sigma_\eta^2$ and $\sigma_\varepsilon^2$.

Table A.2. Model of Heteroscedasticity in $\varepsilon_{ch}$

| Variable (a) | Parameter Estimate | Estimated Standard Errors |
|---|---|---|
| Constant | -4.161 | 0.427 |
| Years schooling of head's spouse | -2.516 | 1.066 |
| Wood walls | 0.018 | 0.004 |
| Predicted log per capita expenditure * spouse education | 0.299 | 0.083 |
| Head's education * age of head | -0.005 | 0.002 |
| Head's education * cooking with gas | 0.001 | 0.0007 |
| Age of head * education of spouse | 0.019 | 0.009 |
| Spouse's education * age of spouse | -0.009 | 0.003 |
| Spouse's education * crowding | -0.525 | 0.150 |
| Spouse's education * own latrine | 0.001 | 0.0006 |
| Age of spouse ^2 | 0.0004 | 0.0001 |
| Shared sewage connection * brick walls | -0.0002 | 0.00005 |
| Head with no spouse * rented home | 0.044 | 0.004 |
| Spouse's education * household size | 0.059 | 0.018 |
| Spouse's education * (crowding^2) | 0.104 | 0.029 |
| Spouse's education * (crowding^3) | -0.006 | 0.002 |
| Own sewage connection * (crowding^3) | -0.00003 | 0.00003 |
| Brick walls * (household size^3) | 0.00004 | 0.00001 |
| Wooden walls * (crowding^3) | -0.00008 | 0.00002 |
| Gas cooking * (household size^3) | -0.00004 | 0.00001 |
| Gas cooking * (crowding^3) | 0.00004 | 0.00001 |
| $R^2$ | 0.25 | |

Note: a The dependent variable is $(\hat u_{ch} - \hat u_{c.})^2$. See notes to Table A.1 for other variable definitions. The model and standard errors are estimated using household expansion factors. Standard errors are White robust estimates.