Policy Research Working Paper 11110

When Aggregation Misleads: Bias in Unit-Level Small Area Estimates of Poverty with Aggregate Data

Paul Andres Corral Rodas

Poverty and Equity Global Department
May 2025

Abstract

This paper explores why small area poverty estimates from models at the household level that only use aggregate data as covariates exhibit systematic bias. The analysis demonstrates that this bias stems from the models' inability to capture the complete between-household variation in welfare, as they rely solely on covariates aggregated at geographic levels. Through model-based simulations, the paper shows that the bias in these models is minimized when the empirical variability of simulated welfare based on the model is closest to the true empirical variance of welfare at the area level. This finding also has implications for bias in unit-level models.

This paper is a product of the Poverty and Equity Global Department. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may be contacted at pcorralrodas@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team

When Aggregation Misleads: Bias in Unit-Level Small Area Estimates of Poverty with Aggregate Data

Paul Andres Corral Rodas∗

Key words: Small area estimation; poverty mapping; satellite imagery; census; official statistics
JEL classification: C13; C55; C87; C15

∗ The World Bank Group - Poverty and Equity Global Practice (pcorralrodas@worldbank.org). The author acknowledges financial support from the World Bank. Special thanks to Carlos Rodriguez-Castelan, Alexandru Cojocaru, Tara Vishwanath, and Isabel Molina for comments on an earlier draft. Full replication package for the results presented in this paper may be found in: https://github.com/pcorralrodas/UC_source_of_bias

1 Introduction

Household surveys aimed at gauging a population's living standards often lack representativeness beyond broad regions or specific population demographics. Additionally, there is a risk that many pertinent locations or groups may be omitted from these surveys. However, detailed information on poverty is crucial for effectively targeting resources to alleviate it. The demand for disaggregated statistics has increased the reliance on indirect techniques that integrate supplementary data from censuses, registries, or larger-scale surveys. These methods are used to produce sufficiently precise statistics for granular populations.

Small area estimation encompasses a broad range of statistical techniques designed to enhance the precision of estimates when household surveys lack the sample size required for the desired level of accuracy. Among these techniques, model-based approaches stand out by leveraging the concept of "borrowing strength" from larger datasets or auxiliary information. These methods use models that establish relationships across areas (e.g., regression techniques), enabling the creation of indirect estimators (Molina and Rao 2010).
Most model-based techniques fall into two main categories: unit-level models and area-level models. Unit-level models are generally applied when data on individual units (e.g., households) are available, while area-level models are used when only aggregate data for specific geographic areas (e.g., area means) are accessible, as described by Fay and Herriot (1979). In poverty estimation, unit-level models predict the welfare distribution first and then apply a threshold to determine the proportion of people below that threshold. In contrast, area-level models directly estimate the poverty rate for an area.

Unit-level models face limitations when survey and census data are from different years, a common issue in developing countries where censuses and surveys are conducted infrequently. Area-level models, however, provide a feasible alternative. These models rely on linear functional forms and perform estimation and prediction using only aggregate data for geographic entities of interest (Fay and Herriot 1979; Torabi and Rao 2014). Another approach, unit-context models, employs an estimation stage where household-level measures are modeled as a linear function of area-level characteristics (Nguyen 2012; Lange et al. 2018; Masaki et al. 2020). Unit-context models, like unit-level models, predict the welfare distribution first and then apply a threshold to determine the proportion of people below that threshold, but do so using only area-level characteristics.

Although alluring, unit-context models have been discouraged because the method yields biased estimates. Corral et al. (2022) show that the method is unable to fully replicate the welfare distribution, which results in biased estimates. For a practical example of the resulting bias, see Edochie et al. (2024).
In that work, the authors present the aggregated estimates at the geographic level of representativeness, and it is evident that across many areas the differences compared to direct estimates are considerable (see Figures 7 to 9 in Edochie et al. (2024)). Moreover, the majority of the estimates fall above the 45 degree line, suggesting an upward bias in the estimates. Using real world data, where samples are taken following methods implemented in developing countries, Corral Rodas et al. (2023) present evidence that unit-context models produce biased estimates of poverty and that the noise and bias of unit-context models are considerably larger than those of area-level models.

A key question is the source of the bias in unit-context models. In this note, the simulations implemented by Corral et al. (2022) are used to study the source of the method's bias. The bias studied here goes beyond the potential bias of the method noted in Corral et al. (2021), which was related to a sampling issue. Instead, the bias noted in this paper is related to the transformation bias noted by Würz et al. (2022). These authors attempt to address the same issue as Nguyen (2012), Lange et al. (2018) and Masaki et al. (2020): not having auxiliary microdata and relying on aggregate population-level auxiliary information. Würz et al. (2022) note that relying on aggregate-level data to model household-level welfare leads to first-order bias from back-transformation and second-order bias from using aggregate data. They note that using aggregate means as covariates instead of individual values introduces additional bias due to the convexity of the back-transformation function. The bias studied here stems from the poor explanatory power of unit-context models and is related to the bias noted by Würz et al. (2022).
Because unit-context methods model transformed household-level welfare as a linear function of area-level covariates only, they are unable to adequately account for between-household variation in welfare. The simulations undertaken here show that areas where the variance of predicted welfare is aligned with the area's true variance of welfare are those that exhibit the lowest bias across the entire welfare distribution. This illustrates that under unit-level and unit-context models, bias at the area level is a function of the dependent variable's mean and the empirical variance of the linear fit. The mean of the dependent variable benefits from the use of empirical best (EB) methods, but the empirical variance is fully dependent on the auxiliary data at hand and how aligned it is to the population in each area.

The note proceeds to present the assumed model for unit-level small area estimation and provides a discussion on why not accounting for variance leads to biased estimates of poverty. The method for creating simulated data is illustrated, followed by the results. Finally, conclusions are presented.

2 Small area estimation

The model-based small area estimation methods described in this paper depend on an assumed model. The nested error model used for small area estimation was originally proposed by Battese et al. (1988) to produce county-level corn and soybean crop area estimates for the American state of Iowa. For the estimation of poverty and welfare, Molina and Rao (2010) and Elbers et al. (2003) assume that the transformed welfare $y_{ah}$ for each household $h$ within each location $a$ in the population is linearly related to a $1 \times K$ vector of characteristics (or correlates) $x_{ah}$ for that household, according to the nested error model:$^{1}$

$$y_{ah} = x_{ah}\beta + \eta_a + e_{ah}, \quad h = 1, \ldots, N_a, \quad a = 1, \ldots, A, \qquad (1)$$

where $\eta_a$ and $e_{ah}$ are respectively location and household-specific idiosyncratic errors, assumed to be independent from each other, following:

$$\eta_a \overset{iid}{\sim} N(0, \sigma^2_\eta), \qquad e_{ah} \overset{iid}{\sim} N(0, \sigma^2_e),$$

where the variances $\sigma^2_\eta$ and $\sigma^2_e$ are unknown. Here, $A$ is the number of locations into which the population is divided, $N_a$ is the number of households in location $a$, for $a = 1, \ldots, A$, and $n_a$ is the sample size from area $a$. Finally, $\beta$ is the $K \times 1$ vector of regression coefficients.

$^{1}$ For simplicity, $y_{ah}$ is considered the transformed welfare. The most common transformation applied is the natural logarithm.

A key assumption is that, conditional on the observed characteristics, the model's errors are normally distributed.

To obtain estimates, the first step is to fit the model from Eq. 1 to the observed sample data via any method providing consistent estimators. Usual fitting methods under this approach are maximum likelihood (ML) or restricted maximum likelihood (REML), both based on the normal likelihood, and the H3 method, which does not specify a distribution. This yields the vector of parameter estimates $\hat{\theta} = (\hat{\beta}, \hat{\sigma}^2_\eta, \hat{\sigma}^2_e)$. The empirical best (EB) area effects for the model are estimated from:

$$\hat{\eta}_a = \hat{\gamma}_a\left(\bar{y}_a - \bar{x}_a\hat{\beta}\right), \qquad \hat{\gamma}_a = \frac{\hat{\sigma}^2_\eta}{\hat{\sigma}^2_\eta + \hat{\sigma}^2_e / n_a},$$

where $\bar{x}_a$ and $\bar{y}_a$ are the sample means in area $a$ for $x$ and $y$, respectively. The variance of the area effects is given by (Molina and Rao, 2010):

$$\text{var}\left[\eta_a \mid \bar{y}_a\right] = \sigma^2_\eta (1 - \gamma_a).$$

Making use of the parameters estimated from the model of Eq. 1, it is possible to produce a value of $y_{ah}$ for every household in the census data:

$$y^*_{ah} = x_{ah}\hat{\beta} + \hat{\gamma}_a\left(\bar{y}_a - \bar{x}_a\hat{\beta}\right) + \varepsilon^*_{ah},$$

where $\varepsilon^*_{ah}$ is drawn from $\varepsilon^*_{ah} \sim N\left(0, \sigma^2_\eta(1 - \gamma_a) + \sigma^2_e\right)$. This Monte Carlo procedure is often repeated 100 times in order to derive indicators under each simulated population and then average across them to derive the final EB estimate. Notice that in the simulated vectors $x_{ah}\hat{\beta} + \hat{\gamma}_a\left(\bar{y}_a - \bar{x}_a\hat{\beta}\right)$ does not vary across simulations; only $\varepsilon^*_{ah}$ varies across the simulated vectors.

Alternatively, because normally distributed errors are assumed, the probability of being poor for any household $h$ in area $a$ is entirely dependent on its expected welfare, $x_{ah}\hat{\beta}$, its idiosyncratic error, $e_{ah}$, and the predicted area effect, $\hat{\gamma}_a\left(\bar{y}_a - \bar{x}_a\hat{\beta}\right)$, where the errors are assumed to follow $e_{ah} \sim N(0, \sigma^2_e)$ and $\eta_a \sim N\left(\gamma_a(\bar{y}_a - \bar{x}_a\beta), \sigma^2_\eta(1 - \gamma_a)\right)$, respectively. The empirical best estimated probability of a household being poor is given by:

$$\text{Prob}\left(\text{poor}_{ah}\right) = \Phi\left(\frac{z - \hat{y}_{ah}}{\sqrt{\hat{\sigma}^2_\eta(1 - \hat{\gamma}_a) + \hat{\sigma}^2_e}}\right) \qquad (2)$$

where $z$ is the transformed poverty threshold and $\hat{y}_{ah} = x_{ah}\hat{\beta} + \hat{\gamma}_a\left(\bar{y}_a - \bar{x}_a\hat{\beta}\right)$. The poverty rate for the area is given by the average probability of being poor across households.$^{2}$ Within a given area, the only thing that varies across households is the covariates.

$^{2}$ Traditionally, this process is approximated via Monte Carlo simulations as noted above.

Under the assumed model, variation in simulated welfare across the population in a given area ($S^{2*}_{y_a}$) is the sum of two components: 1) the variability of the explained portion, $S^2_{\hat{y}_a} = \frac{1}{N_a}\sum_h \left(\hat{y}_{ah} - \bar{y}^*_a\right)^2$, and 2) the variability in the simulated errors, $\frac{1}{N_a}\sum_h \varepsilon^{*2}_{ah}$:

$$S^{2*}_{y_a} = \frac{1}{N_a}\sum_h \left(y^*_{ah} - \bar{y}^*_a\right)^2$$
$$S^{2*}_{y_a} = \frac{1}{N_a}\sum_h \left(\hat{y}_{ah} + \varepsilon^*_{ah} - \bar{y}^*_a\right)^2$$
$$S^{2*}_{y_a} = \frac{1}{N_a}\sum_h \left(\hat{y}^2_{ah} + \varepsilon^{*2}_{ah} + \bar{y}^{*2}_a + 2\hat{y}_{ah}\varepsilon^*_{ah} - 2\hat{y}_{ah}\bar{y}^*_a - 2\bar{y}^*_a\varepsilon^*_{ah}\right)$$
$$S^{2*}_{y_a} = \frac{1}{N_a}\sum_h \left(\hat{y}_{ah} - \bar{y}^*_a\right)^2 + \frac{1}{N_a}\sum_h \varepsilon^{*2}_{ah}$$

The last line follows because the simulated errors have mean zero and are drawn independently of $\hat{y}_{ah}$, so the cross terms involving $\varepsilon^*_{ah}$ average out to approximately zero when $N_a$ is large.

The bias of poverty estimates at the area level will be determined by differences between the true $S^2_{y_a}$ and the simulated $S^{2*}_{y_a}$. This is simplified if we assume that welfare in a given area follows a log-normal distribution, so that the dependent variable is normally distributed. Then the simulated poverty rate for a given area under a given threshold ($\ln z$) is:

$$FGT_{0a} = \Phi\left(\frac{\ln z - \bar{y}^*_a}{\sqrt{S^{2*}_{y_a}}}\right) \qquad (3)$$

Consequently, the poverty rate for a given area is a function of $\bar{y}^*_a$ and $S^{2*}_{y_a}$.

Unit-context models are an approximation to the assumed underlying data generating process from Eq. 1. They were originally introduced by Nguyen (2012), re-introduced by Lange et al. (2018), and modified by Masaki et al. (2020). Unit-context models are defined as models where household-level welfare is modeled using only area and sub-area level characteristics. Masaki et al. (2020) suggest that the model should include characteristics that explain variability at a geographic level below the one for which we aim to estimate poverty. A possible unit-context model follows:

$$y_{sach} = z_{sac}\alpha + t_{sa}\omega + g_s\lambda + \eta_{sa} + \varepsilon_{sach}$$

where $s$ is used for an aggregation level that is over the target areas (a super-area) and $c$ is used for subareas, e.g., clusters that are nested in area $a$. Hence, $z_{sac}$ contains subarea-level characteristics, $t_{sa}$ includes area-level characteristics and $g_s$ is composed of super-area-level characteristics (which may include super-area fixed effects). The regression coefficients across these levels are respectively denoted $\alpha$, $\omega$ and $\lambda$. The random effects, $\eta_{sa}$, are specified in this model at the area level, the same as in Eq. 1. Note that, among the set of covariates in this model, none is at the unit level; covariates only vary at the subarea level and above.

A key feature of unit-context models is that the linear fit explains only a relatively small share of the total variance of the dependent variable ($y$). The coefficient of determination ($R^2$) is often considerably lower than in models that include household-level covariates, and ranges between 0.15 and 0.25 in most instances.
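To make the low explanatory power concrete, the following minimal Python sketch compares the fit obtained with a household-level covariate against the fit obtained with only its area mean. It is illustrative only: the parameter values, the single covariate, the number of areas, and the use of ordinary least squares via NumPy in place of a nested-error fit are all assumptions made here, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)

A, N_a = 50, 200                      # hypothetical: 50 areas, 200 households each
area = np.repeat(np.arange(A), N_a)   # area label for every household

# Household covariate = area-specific mean + household-level deviation.
mu_a = rng.normal(0.0, 0.5, size=A)
x = mu_a[area] + rng.normal(0.0, 1.0, size=A * N_a)

# Outcome with area and household errors, in the spirit of Eq. 1
# (all coefficient and variance values are illustrative assumptions).
eta = rng.normal(0.0, 0.15, size=A)
y = 1.0 + 1.0 * x + eta[area] + rng.normal(0.0, 0.5, size=A * N_a)


def ols_r2(covariate, outcome):
    """R^2 of an OLS fit of outcome on an intercept and one covariate."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    return 1.0 - resid.var() / outcome.var()


# Area mean of x, broadcast back to households (the unit-context covariate).
x_bar = (np.bincount(area, weights=x) / N_a)[area]

r2_unit = ols_r2(x, y)      # household-level covariate
r2_uc = ols_r2(x_bar, y)    # area-level covariate only
print(f"unit-level R2: {r2_unit:.2f}, unit-context R2: {r2_uc:.2f}")
```

Because the area mean is constant within an area, the second fit can only explain between-area differences, which is why its $R^2$ is lower in this toy setting.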
Define $S_y$ as the standard deviation of the sample's dependent variable. The coefficient of determination, $R^2$, is then given by:

$$R^2 = 1 - \frac{\hat{\sigma}^2_\eta + \hat{\sigma}^2_e}{S^2_y}$$

Unit-context models have lower explanatory power because they rely only on area-level covariates. Consequently, if $\hat{\sigma}^2_{\eta_{uc}}$ is the unit-context model estimate of $\sigma^2_\eta$, and $\hat{\sigma}^2_{e_{uc}}$ is the unit-context model estimate of $\sigma^2_e$, then:

$$R^2_{uc} = 1 - \frac{\hat{\sigma}^2_{\eta_{uc}} + \hat{\sigma}^2_{e_{uc}}}{S^2_y} < R^2 = 1 - \frac{\hat{\sigma}^2_\eta + \hat{\sigma}^2_e}{S^2_y}$$
$$\hat{\sigma}^2_{\eta_{uc}} + \hat{\sigma}^2_{e_{uc}} > \hat{\sigma}^2_\eta + \hat{\sigma}^2_e. \qquad (4)$$

The lower $R^2$ of unit-context models implies that the explained portion of welfare is lower than the true portion of explained welfare.

When simulating vectors of the dependent variable at the national level, the empirical standard deviation of the dependent variable is approximated under both unit-level and unit-context models. However, under unit-context models the empirical standard deviation at the area level is not properly approximated due to the model misspecification. Consequently, when creating simulated vectors of welfare under unit-context models, the variation in simulated welfare across the population in an area ($S^{2*}_{y_a}$) will not match that of the true population. From Eq. 4, we know that:

$$\sum_h \varepsilon^{*2}_{ah_{uc}} > \sum_h \varepsilon^{*2}_{ah}.$$

Additionally, given the poor model fit we also know that:

$$\sum_h \left(\hat{y}_{ah_{uc}} - \bar{y}^*_{a_{uc}}\right)^2 < \sum_h \left(\hat{y}_{ah} - \bar{y}^*_a\right)^2$$

Hence, under unit-context models some areas will have a value of $S^{2*}_{y_a}$ that is larger than that of the true DGP and some will have a smaller or equal value of $S^{2*}_{y_a}$. This result, coupled with Eq. 3, implies that the bias of unit-context models will be larger in areas where $S^{2*}_{y_a}$ is most different from the true model's $S^2_{y_a}$.

The focus here is the model's performance in estimating poverty. However, the poor model fit also affects predictions of welfare. Corral et al. (2022) present evidence that unit-context models will do a good job at predicting the model's dependent variable ($y$), thanks to the use of empirical best predictors.
A similar result is presented by Chen et al. (2024), who indicate that under model misspecification, unit-context models will perform just as well as area-level models in predicting the model's dependent variable.$^{3}$ Nevertheless, this seems to hold true only when the goal is estimating the mean of the model's dependent variable. It does not hold true when the goal is estimating the original untransformed equivalized income or expenditure. Under unit-context models, bias is introduced when the goal is a measure that may be distributionally sensitive, e.g., poverty or equivalized income or expenditure. Corral et al. (2022) present evidence from simulations on how unit-context models will produce biased predictions of the back-transformed dependent variable, i.e. equivalized income or expenditure (see figure 5.3 in Corral et al. (2022)). Würz et al. (2022) also present evidence on how unit-context models will lead to biased estimates of welfare that arise from the back-transformation of the dependent variable.$^{4}$

$^{3}$ Under unit-level and unit-context models welfare is usually transformed to ensure that errors are normally distributed to conform to the model's assumptions.
$^{4}$ Under unit-level models welfare is usually transformed to ensure that errors are normally distributed to conform to the model's assumptions.

Assuming log linearity, the simulated nested error model ($y^*_{ah} = x_{ah}\hat{\beta} + \hat{\gamma}_a\left(\bar{y}_a - \bar{x}_a\hat{\beta}\right) + \varepsilon^*_{ah}$) implies that $\exp(y^*_{ah}) = \exp(\hat{y}_{ah})\exp(\varepsilon^*_{ah})$. Since $\varepsilon^*_{ah} \sim N\left(0, \sigma^2_\eta(1 - \gamma_a) + \sigma^2_e\right)$, then $E\left[\exp(\varepsilon^*_{ah})\right] = \exp\left(0.5\left[\sigma^2_\eta(1 - \gamma_a) + \sigma^2_e\right]\right)$. Therefore, at the area level welfare will be equal to:

$$\frac{1}{N_a}\sum_{h=1}^{N_a} \exp(\hat{y}_{ah})\exp\left(0.5\left[\sigma^2_\eta(1 - \gamma_a) + \sigma^2_e\right]\right)$$

Consequently, just like for poverty, the predicted welfare under unit-context models will be more biased the greater the difference between $S^{2*}_{y_a}$ and the true model's $S^2_{y_a}$.
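The role of the variance in the back-transformation can be illustrated with a short Python sketch. It makes a simplifying assumption not made in the text: log welfare within an area is treated as a single normal distribution with mean `mu` and variance `s2`, so that mean welfare is `exp(mu + 0.5 * s2)`. The numerical values are hypothetical and chosen only for illustration.

```python
import math


def mean_welfare_lognormal(mu, s2):
    """Mean of exp(y) when y ~ N(mu, s2), i.e. exp(mu + 0.5 * s2)."""
    return math.exp(mu + 0.5 * s2)


mu = 3.0        # hypothetical area mean of log welfare, assumed unbiased
s2_true = 0.6   # hypothetical true area variance of log welfare
s2_sim = 0.8    # hypothetical variance of the simulated log welfare

true_mean = mean_welfare_lognormal(mu, s2_true)
sim_mean = mean_welfare_lognormal(mu, s2_sim)

# Relative bias depends only on the variance gap: exp(0.5 * (s2_sim - s2_true)) - 1
rel_bias = sim_mean / true_mean - 1.0
print(f"relative bias in mean welfare: {rel_bias:.1%}")
```

Under these assumptions, the mean of log welfare is identical in both cases, yet mean welfare is overstated whenever the simulated variance exceeds the true one, and understated when it falls short.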
In the following section, I present a model-based simulation to illustrate how $S^{2*}_{y_a}$ is related to biased estimates at the area level.

3 Simulation data

Data is generated for the simulations following the assumed model from Eq. 1.$^{5}$ The population size for the simulated data is $N = 500{,}000$, and the observations are allocated among $A = 100$ areas ($a = 1, \ldots, A$). Within each area $a$, observations are uniformly allocated over $C_a = 20$ clusters ($c = 1, \ldots, C_a$). Each cluster $c$ consists of $N_{ac} = 250$ observations. In this simulation experiment, a simple random sample of $n_{ac} = 10$ households per cluster is taken, and this sample is kept fixed across simulations. Using a sample, it is possible to compare with estimators based on the FH model (see Corral et al. (2022); Molina et al. (2022); Rao and Molina (2015); Fay and Herriot (1979)). The model that generates the population data contains both cluster and area effects. Cluster effects are simulated as $\eta_{ac} \overset{iid}{\sim} N(0, 0.1)$, area effects as $\eta_a \overset{iid}{\sim} N(0, 0.15^2)$ and household-specific residuals as $e_{ach} \overset{iid}{\sim} N(0, 0.5^2)$, where $h = 1, \ldots, N_{ac}$, $c = 1, \ldots, C_a$, $a = 1, \ldots, A$. The covariates are simulated as follows:

1. $x_1$ is a binary variable, taking value 1 when a random uniform number between 0 and 1, at the household level, is less than or equal to $0.3 + 0.5\frac{a}{40} + 0.2\frac{c}{10}$.
2. $x_2$ is a binary variable, taking value 1 when a random uniform number between 0 and 1, at the household level, is less than or equal to 0.2.
3. $x_3$ is a binary variable, taking value 1 when a random uniform number between 0 and 1, at the household level, is less than or equal to $0.1 + 0.2\frac{a}{40}$.
4. $x_4$ is a binary variable, taking value 1 when a random uniform number between 0 and 1, at the household level, is less than or equal to $0.5 + 0.3\frac{a}{40} + 0.1\frac{c}{10}$.
5. $x_5$ is a discrete variable, simulated as the rounded integer value of the maximum between 1 and a random Poisson variable with mean $\lambda = 3\left(1 - 0.1\frac{a}{40}\right)$.
6. $x_6$ is a binary variable, taking value 1 when a random uniform value between 0 and 1 is less than or equal to 0.4. Note that the values of $x_6$ are not related to the area's label.
7. $x_7$ is generated from a random Poisson variable with mean $\lambda = 3\left(\frac{c}{20} - \frac{a}{100} + u\right)$, where $u$ is a random uniform value between 0 and 1.

For the unit-context model variations implemented here, the PSU-level means of the covariates are used as the eligible covariates to fit the model.

$^{5}$ Data is simulated following the same approach as Corral et al. (2022). The write-up here is also borrowed from Corral et al. (2022).

In this experiment, I take a grid of 99 poverty thresholds, corresponding to the 99 percentiles of the very first population generated. In total, 1,000 populations are generated. In each of the 1,000 populations, the following quantities are computed in every area for each of the 99 poverty lines:

1. True poverty indicators $\tau_a$, using the "census".
2. CensusEB estimators $\hat{\tau}_a^{CEBa}$ presented in Corral et al. (2021), based on a nested-error model with only area random effects and including the unit-level values of the covariates. The $R^2$ for this model is roughly 0.60.
3. Unit-context CensusEB estimators $\hat{\tau}_a^{UC-CEBa}$ based on a nested-error model with random effects at the area level. This estimator follows the approach of Masaki et al. (2020). The $R^2$ of the resulting model hovers around 0.17.

The average difference between the true poverty indicator and the estimate across the 1,000 simulations represents the empirical bias for each area.

4 Results

Figure 1: Empirical bias of poverty for CensusEB and Unit-Context small area estimation
Note: Simulation based on 1,000 populations generated as described in section 3. Each line corresponds to one of the 100 areas. The x-axis represents the percentile on which the poverty line falls, and the y-axis is the empirical bias.
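The bias computation described in section 3 (a grid of 99 percentile-based poverty lines, repeated populations, and an average difference per area and line) can be sketched in Python as follows. This is a toy version under loudly stated assumptions: the populations are drawn from simple normal distributions instead of the paper's data-generating process, the sizes are far smaller, the sign convention (estimate minus truth) is chosen here, and `pooled_estimator` is a hypothetical placeholder, not the CensusEB or unit-context estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
A, N_a, n_pop = 5, 500, 30            # hypothetical sizes, far below the paper's
area_sd = np.linspace(0.7, 1.1, A)    # hypothetical: areas differ in spread


def simulate_population():
    """Toy stand-in for one simulated census of log welfare, shape (A, N_a)."""
    return rng.normal(3.0, area_sd[:, None], size=(A, N_a))


def headcount(y, lines):
    """True poverty rate per area and line, shape (A, len(lines))."""
    return (y[:, :, None] < lines).mean(axis=1)


def pooled_estimator(y, lines):
    """Placeholder estimator: imposes the pooled distribution on every area."""
    pooled = (y.reshape(-1, 1) < lines).mean(axis=0)
    return np.tile(pooled, (y.shape[0], 1))


# Poverty lines: the 99 percentiles of the first generated population.
lines = np.percentile(simulate_population(), np.arange(1, 100))

diffs = [pooled_estimator(pop, lines) - headcount(pop, lines)
         for pop in (simulate_population() for _ in range(n_pop))]
bias = np.mean(diffs, axis=0)               # empirical bias, shape (A, 99)
mean_abs_bias = np.abs(bias).mean(axis=1)   # one summary value per area
print(np.round(mean_abs_bias, 3))
```

In this toy setting the placeholder estimator gives every area the same spread, so areas whose true spread is far from the pooled one show the larger summary bias.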
Because poverty is predicted across the 99 thresholds noted in the previous section, it is possible to plot the bias across all lines and areas. The 99 percentiles are considered since the goal of unit-level small area estimation of poverty is to replicate the full welfare distribution and, from it, estimate poverty. The simulations presented here train the model and predict using the same data; thus, sampling does not play a role. This is done in order to remove other potential sources of bias from unit-context models, such as the potential omitted variable bias noted by Corral et al. (2021). As can be seen in Figure 1, the bias of unit-context models is present across all lines, but as noted by Corral et al. (2022), the bias is lower for some areas and for some percentiles.

Table 1: Statistics for the area-level empirical variation ($S^2_{\hat{y}}$) of the model's linear fit

          True $S^2_{\hat{y}}$    UC $S^2_{\hat{y}}$    EB $S^2_{\hat{y}}$
Min       0.374                   0.075                 0.374
Max       0.861                   0.106                 0.861
Mean      0.568                   0.088                 0.568
p25       0.448                   0.084                 0.449
p50       0.549                   0.088                 0.549
p75       0.682                   0.091                 0.682

Note: Simulations based on 1,000 populations generated as described in section 3 and averaged to the area level. The true level variation in the explained portion of the model ranges from 0.374 to 0.861, with the average across the 100 areas being 0.568.

Table 2: Statistics for the area-level total empirical variance of the model simulated dependent variable

          True             Unit-context model                          EB
          $S^2_{y_a}$      $S^{2*}_{y_a}$   $S^{2*}_{y_a}/S^2_{y_a}$   $S^{2*}_{y_a}$   $S^{2*}_{y_a}/S^2_{y_a}$
Min       0.632            0.813            0.744                      0.632            0.998
Max       1.119            0.844            1.295                      1.119            1.003
Mean      0.826            0.826            1.026                      0.826            1.000
p25       0.707            0.822            0.879                      0.707            0.999
p50       0.808            0.826            1.024                      0.807            1.000
p75       0.940            0.830            1.166                      0.940            1.001

Note: Simulations based on 1,000 populations generated as described in section 3 and averaged to the area level.
As argued in section 2, the bias arises because, at the area level, the unit-context model does a poor job at capturing the empirical variation of the explained portion of the data, $S^2_{\hat{y}}$.$^{6}$ Table 1 presents the value of $S^2_{\hat{y}}$ across areas. The true empirical variation of the explained portion, $S^2_{\hat{y}}$, ranges from 0.37 for the area with the lowest empirical variation to 0.86 for the area with the largest, a similar range to the EB model. However, the range for the unit-context model is much smaller, from 0.075 to 0.106. Nevertheless, on average, the total empirical variation of the dependent variable, $S^2_y$, is matched by unit-context models (Table 2; mean). This is because the model is fit at the national level, and $\sigma^2_\eta$ and $\sigma^2_e$ are estimated to be much larger since the covariates capture so little of the total empirical variance of the dependent variable, $S^2_y$. Consequently, the range of the total empirical variance across areas is minimal, from 0.813 to 0.844, compared to the truth of 0.632 to 1.119. Thus, in unit-context models the larger estimates for $\sigma^2_\eta$ and $\sigma^2_e$ only work at the national level. Across areas, this leads to some areas where $S^{2*}_{y_a}$ is considerably larger than the true $S^2_{y_a}$ and some where $S^{2*}_{y_a}$ is considerably smaller than the true $S^2_{y_a}$. Unit-context models are biased because they deviate from the truth by assuming welfare is not dependent on unit-level characteristics, which leads to poorly fitting models.

$^{6}$ At the area level this is equal to: $\frac{1}{N_a}\sum_h \left(\hat{y}_{ah} - \bar{y}^*_a\right)^2$

Figure 2 illustrates how this is manifested across selected poverty lines. The absolute bias for a given area decreases as the ratio of the model's simulated variance to the true variance for that area, $S^{2*}_{y_a}/S^2_{y_a}$, gets close to 1.
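The link between the variance ratio and poverty bias can be reproduced in a few lines using Eq. 3. In the Python sketch below, the simulated variance (0.826) and the three true variances (0.632, 0.826, 1.119) are taken from the mean, minimum and maximum reported in Table 2; the area mean and the poverty line are hypothetical values chosen here, and the area mean is assumed to be estimated without error.

```python
from statistics import NormalDist


def poverty_rate(ln_z, mean_y, var_y):
    """Eq. 3: share of households whose log welfare falls below ln_z."""
    return NormalDist(mu=mean_y, sigma=var_y ** 0.5).cdf(ln_z)


mean_y = 3.0          # hypothetical area mean of log welfare (assumed unbiased)
ln_z = mean_y - 0.8   # hypothetical poverty line below the area mean
var_sim = 0.826       # unit-context simulated variance (Table 2, mean)

bias_by_true_var = {}
for var_true in (0.632, 0.826, 1.119):   # Table 2: min, mean, max of the truth
    simulated = poverty_rate(ln_z, mean_y, var_sim)
    truth = poverty_rate(ln_z, mean_y, var_true)
    bias_by_true_var[var_true] = simulated - truth
    ratio = var_sim / var_true
    print(f"ratio {ratio:.2f}: bias {bias_by_true_var[var_true]:+.3f}")
```

With a line below the area mean, a ratio above 1 overstates poverty and a ratio below 1 understates it, while a ratio of exactly 1 leaves no bias under these assumptions.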
Therefore, even if the use of empirical best methods guarantees that the mean of the model's dependent variable at the area level is unbiased, the inability of unit-context models to approximate the full distribution at the area level leads to biased poverty estimates, because poverty is distribution dependent.

Figure 2: Empirical bias of Unit-Context models across select lines
Note: Simulation based on 1,000 populations generated as described in section 3. Each dot corresponds to one of the 100 areas. The x-axis represents $S^{2*}_{y_a}/S^2_{y_a}$ for the UC model, and the y-axis is the absolute empirical bias. The gray line represents the quadratic fit plot.

Figure 3 averages the absolute bias from the model across all 99 poverty lines for each area.$^{7}$ This figure makes it more salient that bias is at its minimum around the point where the unit-context model's $S^{2*}_{y_a}/S^2_{y_a}$ is closest to 1 in that area. Hence, estimates are biased for some areas more than for others.

$^{7}$ For area $a$, the average absolute bias across all lines is given by: $\frac{1}{99}\sum_{p=1}^{99} \left|\hat{\tau}_{pa} - \tau_{pa}\right|$, where $\tau$ is the head count poverty rate.

Figure 3: Empirical absolute bias of Unit-Context models across areas
Note: Simulation based on 1,000 populations generated as described in section 3. Each dot corresponds to one of the 100 areas. The x-axis represents $S^{2*}_{y_a}/S^2_{y_a}$ for the unit-context model, and the y-axis is the absolute empirical bias across all 99 percentiles. The gray line represents the quadratic fit plot.

A similar issue is observed for welfare ($\exp(y^*_{ah})$) under unit-context models. The absolute bias decreases as $S^{2*}_{y_a}/S^2_{y_a}$ gets closer to 1 (Fig. 4, left). Considering that the true area values of untransformed welfare range from roughly 18 to 110, the bias for some areas is quite considerable. In accordance with what was noted in section 2, the method yields unbiased values of the model's dependent variable, $\bar{y}_a$, i.e. the transformed welfare (Fig. 4, right).
This is a similar finding to that of Chen et al. (2024), who illustrate through simulations that, in the presence of substantial model misspecification, the unit-context model shows similar performance to that of area-level models with known variances. Nevertheless, Chen et al. (2024) do not provide results for the back-transformed variable. Considering that welfare usually requires transformation so that model assumptions are met, the authors' results hold little value for international welfare and poverty monitoring.

Figure 4: Empirical absolute bias of Unit-Context model's predicted welfare, $\exp(y_a)$ and $\bar{y}_a$ across areas
Note: Simulation based on 1,000 populations generated as described in section 3. Each dot corresponds to one of the 100 areas. The x-axis represents $S^{2*}_{y_a}/S^2_{y_a}$ for the unit-context model, and the y-axis is the absolute empirical bias of the mean welfare for the area. The gray line represents the quadratic fit plot.

5 Conclusions

This paper provides evidence that the bias in unit-context models for small area poverty estimation stems primarily from their inability to adequately capture the full variance of welfare at the area level. While these models may achieve unbiased estimates of mean transformed welfare (i.e. the model's dependent variable) through empirical best prediction methods, they systematically fail to replicate the true welfare distribution within areas, leading to biased poverty and welfare estimates. The simulation results reveal several key insights.

First, unit-context models typically explain only a small portion of the total variance in welfare ($R^2$ ranging from 0.13 to 0.25) compared to traditional unit-level models ($R^2$ around 0.50). This limited explanatory power arises because unit-context models rely solely on area-level covariates, omitting household-level variation.
Adding more covariates to the unit-context model risks overfitting and could lead to further bias; thus, the solution does not lie in adding more data. An approach similar to the one presented by Würz et al. (2022) could work, but may require access to unit-level data and thus is not feasible when using satellite-derived data.

Second, while unit-context models may be able to match the total empirical variation of welfare at the national level through the estimation of area and household error components ($\sigma^2_\eta$ and $\sigma^2_e$), this compensation mechanism breaks down at the area level. Some areas end up with significantly over- or under-estimated variation of welfare, leading to systematic bias in poverty estimates.

Third, the analysis reveals that the magnitude of bias in poverty estimates is directly related to how well the model's simulated total empirical variation of welfare matches the true variability of welfare in each area. Areas where this ratio approaches 1 show minimal bias, while areas with substantial mismatches exhibit larger biases.

These findings have important implications for practitioners. While unit-context models offer practical advantages in situations where household-level census data is unavailable or outdated, their inherent limitations in capturing welfare distributions should be carefully considered. Users should be particularly cautious when interpreting poverty estimates for areas where the model's simulated variability of welfare differs substantially from the empirical variance of welfare observed in survey data. As noted by Corral Rodas et al. (2023) and Corral et al. (2022), using simulated and real world data, area-level models such as the well-known Fay-Herriot model will outperform unit-context models.

Future research might explore methods to improve the empirical variance approximation in unit-context models or develop diagnostic tools to identify areas where these models are most likely to produce reliable estimates.
Additionally, investigating alternative approaches that better capture within-area welfare distributions while maintaining the practical advantages of unit-context models could prove valuable.

References

Battese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28–36.

Chen, Y., Lahiri, P., and Salvati, N. (2024). Effects of model misspecification on small area estimators. arXiv preprint arXiv:2403.11276.

Corral, P., Molina, I., Cojocaru, A., and Segovia, S. (2022). Guidelines to small area estimation for poverty mapping. The World Bank, Washington, DC.

Corral, P., Molina, I., and Nguyen, M. (2021). Pull your small area estimates up by the bootstraps. Journal of Statistical Computation and Simulation, 91(16):3304–3357.

Corral Rodas, P. A., Henderson, H. L., and Segovia Juarez, S. C. (2023). Poverty mapping in the age of machine learning. World Bank Policy Research Working Paper No. 10429.

Edochie, I., Newhouse, D., Tzavidis, N., Schmid, T., Foster, E., Hernandez, A. L., Ouedraogo, A., Sanoh, A., and Savadogo, A. (2024). Small area estimation of poverty in four West African countries by integrating survey and geospatial data. Journal of Official Statistics, page 0282423X241284890.

Elbers, C., Lanjouw, J. O., and Lanjouw, P. (2003). Micro-level estimation of poverty and inequality. Econometrica, 71(1):355–364.

Fay, R. E. and Herriot, R. A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74(366a):269–277.

Lange, S., Pape, U. J., and Pütz, P. (2018). Small area estimation of poverty under structural change. World Bank Policy Research Working Paper No. 8472.

Masaki, T., Newhouse, D., Silwal, A. R., Bedada, A., and Engstrom, R. (2020). Small area estimation of non-monetary poverty with geospatial data. World Bank Policy Research Working Paper No. 9383.

Molina, I., Corral, P., and Nguyen, M. (2022). Estimation of poverty and inequality in small areas: Review and discussion. TEST, pages 1–24.

Molina, I. and Rao, J. (2010). Small area estimation of poverty indicators. Canadian Journal of Statistics, 38(3):369–385.

Nguyen, V. C. (2012). A method to update poverty maps. The Journal of Development Studies, 48(12):1844–1863.

Rao, J. and Molina, I. (2015). Small area estimation. John Wiley & Sons, Hoboken, NJ, 2nd edition.

Torabi, M. and Rao, J. (2014). On small area estimation under a sub-area level model. Journal of Multivariate Analysis, 127:36–55.

Würz, N., Schmid, T., and Tzavidis, N. (2022). Estimating regional income indicators under transformations and access to limited population auxiliary information. Journal of the Royal Statistical Society: Series A (Statistics in Society), 185(4):1679–1706.