Policy Research Working Paper                    10171




      How Accurate Is a Poverty Map Based
          on Remote Sensing Data?
                     An Application to Malawi

                             Roy van der Weide
                             Brian Blankespoor
                                Chris Elbers
                              Peter Lanjouw




Development Economics
Development Research Group
September 2022
Policy Research Working Paper 10171


  Abstract
 This paper assesses the reliability of poverty maps derived                        assessment depends somewhat on the evaluation criteria
 from remote-sensing data. Employing data for Malawi, it                            employed. The two approaches reveal the same patterns
 first obtains small area estimates of poverty by combining                         in the geography of poverty. However, there are instances
 the Malawi household expenditure survey from 2010/11                               where the two approaches obtain markedly different esti-
 with unit record population census data from 2008. It then                         mates of poverty. Poverty maps obtained using remote
 ignores the population census data and obtains a second                            sensing data may do well when the decision maker is inter-
 poverty map for Malawi by combining the survey data                                ested in comparisons of poverty between assemblies of areas,
 with predictors of poverty derived from remote sensing                             yet may be less reliable when the focus is on estimates for
 data. This allows for a clean comparison between the two                           specific small areas.
 poverty maps. The findings are encouraging—although that




 This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the
 World Bank to provide open access to its research and make a contribution to development policy discussions around the
 world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may
 be contacted at rvanderweide@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
  How Accurate Is a Poverty Map Based on Remote
     Sensing Data? An Application to Malawi∗
                  , Brian Blankespoor‡
 Roy van der Weide†                                , and Peter Lanjouw¶
                                     , Chris Elbers§




       Keywords: Poverty, Small Area Estimation, Remote Sensing Data
       JEL Classiﬁcation: C13, D31, I32, O15, R13, Y91




   ∗
      The ﬁndings, interpretations, and conclusions expressed in this paper are entirely those of the
authors. They do not necessarily represent the views of the World Bank and its aﬃliated organizations,
or those of the Executive Directors of the World Bank or the governments they represent. The authors
gratefully acknowledge ﬁnancial support from the World Bank Knowledge for Change Program.
    †
      World Bank: rvanderweide@worldbank.org
    ‡
      World Bank: bblankespoor@worldbank.org
    §
      Vrije Universiteit Amsterdam: c.t.m.elbers@vu.nl
   ¶
      Vrije Universiteit Amsterdam: p.f.lanjouw@vu.nl
1     Introduction
Tracking poverty at the global, regional, and national levels helps international orga-
nizations and the development community identify lagging regions and allocate their
resources accordingly (see e.g. Ravallion (2018), Ferreira et al. (2015), Bourguignon
(2017)). Poverty is commonly measured by the percentage of the population whose
income or consumption expenditure falls below a poverty line. The household income
and expenditure data used to estimate poverty come from household surveys that are
conducted every 1-5 years by the national statistical agencies. Individual countries use
these data to monitor poverty at the national and subnational levels.
    Elbers et al. (2007) demonstrates the value of estimating poverty at a more disag-
gregated level. Speciﬁcally, they estimate by how much governments could lower the
costs of providing resources to the poor if they had access to a poverty map that pro-
vides estimates of poverty down to the district or municipality level. Their simulation
study using data from Madagascar, Cambodia, and Ecuador suggests that the gains
can be substantial. In all three countries, the reduction in poverty that is achieved with
a uniform transfer to all households can often be realized with less than half the budget
if the transfers could be targeted on the basis of municipality level estimates of poverty.
Unfortunately, collecting data on household income and expenditure is costly and time
consuming (Kilic et al. 2017; Fujii and van der Weide, 2020). As a result, survey
samples tend to be relatively small, permitting only a limited degree of disaggregation,
often not beyond large sub-regions.
    Estimating poverty at the small area level requires alternative sources of data, ideally
a population census, that are able to provide complete coverage of the population.
Elbers, Lanjouw, and Lanjouw (2003), hereafter referred to as ELL, pioneered the
small area estimation of poverty by combining household survey with population census
data. The proposed approach imputes household income or consumption expenditure
(multiple times) for each household into the population census, and then aggregates the
multiple imputed values to obtain estimates of poverty at the municipality or district
level. The population census arguably provides the richest possible data source for this
purpose. It contains a comprehensive set of variables that are highly correlated with
household income or consumption (although not, usually, income itself). Speciﬁcally, a
census typically includes information on demographics, education, employment, housing
characteristics, asset ownership, and community services (i.e. local amenities) for all
households in the population. In the ELL procedure, a household survey is used to infer
the relationship between these variables and household income, taking advantage of the
fact that many of these variables are available in both the census and household survey.


                                             2
Elbers et al. (2008) assess the validity of the approach using data from Brazil and ﬁnd
that small-area estimates based on the ELL procedure are close to direct calculations
of sub-national poverty in this setting. ELL has been adopted to obtain small area
estimates of poverty and inequality in over 60 countries worldwide.1 Poverty maps
are also increasingly used as a source of data in a variety of empirical applications,
see e.g. Araujo et al. (2008), Baird et al. (2013), Bazzi (2017), Crost et al. (2014),
Demombynes and Ozler (2005), Elbers et al. (2005), and Maloney and Caicedo (2015).
     An important limitation of the ELL approach, however, is precisely its reliance
on the population census. For all the strengths of a population census, a notable
disadvantage is its limited availability. Population censuses are generally conducted
only once every 10 years, and it often takes two years or longer to process the data.
These factors combine to imply that ELL-based poverty maps are rarely up-to-date (i.e.
are often already outdated at the time of release). Furthermore, even when current
population census data exist and are ready for analysis, getting access to such data is
often not trivial. As a result, areas that may be most in need of a timely poverty map
may lack the data necessary to implement the ELL approach. An alternative approach
that can produce a poverty map in the absence of population census data will thus have
obvious appeal.
     The objective of this study is to evaluate whether an accurate map of poverty can be
obtained using predictors derived from publicly available remote sensing data instead
of a population census. A requirement in such a case would be that the household
expenditure survey is also geo-referenced, i.e. that the enumeration areas (or primary
sampling units) can be located on a map. While this would have been a demand-
ing requirement a decade ago, geo-referencing of household surveys is now increasingly
becoming standard practice. To preserve the anonymity of sampled households, the ge-
ographic coordinates of the primary sampling units in public use datasets are not exact,
i.e. some random oﬀsets are applied. Our secondary objective is to investigate whether
these random oﬀsets undermine this alternative approach of obtaining a poverty map.
The primary sampling unit, such as a village or a neighborhood in a city, will be the
unit of observation (rather than the household). Correspondingly, the level of poverty
in the primary sampling units will now serve as the dependent variable. Examples of
possible predictors of local poverty rates include local Night-Time-Lights (NTL), road
network, urban footprint, vegetation, elevation, surface temperature, rainfall.
     The use of remote sensing data in the economics literature has expanded dramat-
ically in recent years, as these data have become richer and more accessible, see e.g.
   1
     See e.g. Alderman et al. (2002) and and Bedi et al. (2007). Recently, the approach has also
been extended to the small area estimation of child malnutrition (Fujii, 2010). For an overview of the
literature on small area estimation we refer the reader to Elbers and van der Weide (2014).


                                                  3
Donaldson and Storeygard (2016). Early applications to the measurement of economic
activity and welfare include Henderson et al. (2012), which also established the value of
these data as predictors of economic outcomes (see also Pinkovskiy and Sala-I-Martin,
2016). These early studies conﬁrm that national income can be accurately predicted
using NTL data.2 Economists have since expanded the set of predictors extracted from
remote sensing data and increased the spatial resolution at which economic welfare
including poverty is estimated. Jean et al. (2016) is among the early studies to use a
comprehensive set of predictors derived from remote sensing data to estimate poverty
at the small area level. The studies vary in the resolution of imagery used, with higher
resolution images allowing for a richer set of predictors that can be extracted from them
(see e.g. Marx et al., 2019). The increased accessibility of remote sensing data com-
bined with the continued demand for highly disaggregated poverty data has prompted
a surge in the production of poverty maps in recent years. In the years following Jean
et al. (2016), dozens of remote sensing-based poverty and/or household welfare maps
have been produced, see e.g. Blumenstock et al. (2015), Blumenstock (2016), Bosco et
al. (2017), Burke et al. (2021), Engstrom et al. (2021), Imran et al. (2014), Pokhriyal
and Jacques (2017), Steele et al. (2017), Watmough et al. (2016), Watmough et al.
(2019), Zhao et al. (2019), and Lee and Braithwaite (2022).
    This highlights the importance of assessing the accuracy of remote-sensing-based
poverty maps. One logical entry-point in this regard is an assessment of how closely
a poverty map obtained using remote sensing data tracks a poverty map derived from
population census data. Such assessments are currently missing from the literature.
Existing studies typically validate their small area poverty estimates by comparing
them against direct-estimates obtained from the household survey. However, these
direct estimates are subject to large sampling error and cover only a small subset of
the total number of areas. They thus fail to confront what motivates the use of small
area estimation methods to begin with. Engstrom et al. (2021) is a notable exception.
That study uses the census-based small area estimates of poverty obtained using the
ELL approach as the dependent variable in their model employing predictors derived
from remote sensing data as independent variables. This does not allow, however, for
a validation of a poverty mapping approach that does not use any population census
data (i.e. an approach that combines household survey with remote sensing data).
    The validation exercise presented here provides a clean comparison between poverty
maps obtained with and without population census data. Employing data for Malawi,
we ﬁrst obtain small area estimates of poverty that combine the Malawi household
  2
    More recent examples of empirical studies that use remote-sensing data to track local economic
activity include Nhu and Noy (2020), Marx et al. (2019), and Van der Weide et al. (2018), among
others.


                                                4
expenditure survey from 2010 with the unit record population census data from 2008
using the ELL approach. This poverty map serves as our benchmark. We then ignore
the population census data and obtain a second poverty map for Malawi by combining
the household expenditure survey data with predictors of poverty derived from publicly
available remote sensing data.
    An important motivation for using data from Malawi is that we have access to the
exact coordinates of the survey PSUs. This enables us to produce the remote-sensing-
based poverty map both using the publicly available geographic coordinates (that have
been subjected to random oﬀset to protect the anonymity of households) and exact
coordinates (that are not publicly available) to investigate whether the random oﬀsets
might undermine the approach – and inspect the correspondence.
    Our ﬁndings are two-fold. First, assessments of the correspondence between the
remote-sensing-based poverty estimates and the census-based poverty estimates de-
pend on how the correspondence is evaluated. The two approaches reveal the same
patterns in the geography of poverty in Malawi; it is hard to tell the diﬀerence in a
side-by-side comparison of the two poverty maps. The statistical correlation between
the two diﬀerent small area estimates of poverty is above 0.9, again conﬁrming their
close correspondence. However, there are districts for which the two approaches obtain
markedly diﬀerent estimates of poverty. Accordingly, poverty maps obtained using re-
mote sensing data can be expected to do well when the decision maker is interested
in estimates or comparisons of poverty between assemblies of areas. Yet, the decision
maker may not be served as well by remote-sensing-based estimates when the focus
is on estimates for speciﬁc small areas. Second, the random oﬀsets embedded in the
geographic coordinates of the public use data do not meaningfully alter the estimated
geography of poverty. But they may, however, lead to an under-estimation of the stan-
dard errors of the small area poverty estimates (i.e. an over-estimation of precision).
The reason for this is that the random oﬀsets partially destroy the spatial correlation
structure in the data and error term.
    The decision to restrict attention to publicly available remote sensing data is a
deliberate one. This ensures that the data are available at no cost in any given country,
and so the approach can easily be replicated and readily repeated as new survey data
become available. By the same token, our exercise arguably provides a conservative
assessment, i.e. a lower bound of success as higher resolution remote sensing data that
come at a cost will allow for more precise estimates. It should be noted that with the
rapid technological advancements in this ﬁeld, what is costly today will in all likelihood
become low-cost or publicly-available in the near future.
    The paper is organized as follows. An overview of the two small area estimation


                                            5
methods is presented in Section 2. In this section we also oﬀer a methodological contri-
bution by deriving standard error estimates for the remote-sensing based poverty map.
Section 3 describes the diﬀerent data used in this study. Our empirical ﬁndings are
presented in Section 4. Finally, Section 5 concludes.


2     Methodologies
2.1     Population census-based small area estimation of poverty
Elbers et al. (2003; ELL) pioneered the use of unit record population census data
combined with household consumption expenditure survey data for the small area es-
timation of poverty and inequality. ELL assume the following data generating process
(DGP):

                                      yah = xT
                                             ah β + va + εah ,

where y is log per capita income or expenditure, x is a vector with independent variables,
and where v and ε are zero expectation error terms that are independent of each other
                              2      2
with variances denoted by σv    and σε,ah , respectively. The subscripts indicate household
level h residing in small area a. The predictors x may consist of variables observed at
diﬀerent levels of aggregation, i.e. household level employment status and local area
employment rates may both serve as good predictors of household expenditure.
    The model permits heteroskedasticity in the household errors εah to allow for in-
stances where the predictability of expenditures varies between demographics (i.e. elder
versus young), employment status (i.e. unemployed versus employed), and sectors (i.e.
                                                               2
blue collar versus white collar). Speciﬁcally, the variance σε,ah  is assumed to be a func-
tion of household and area characteristics, denoted by zah . A logistic transformation
of the squared error will serve as the dependent variable (as the distribution of the
untransformed squared error tends to be heavily skewed):

                                         ε2
                                          ah             T
                                 ln                   = zah α+   ah ,
                                       A − ε2ah
                                                                            2
where the vector zah includes the constant term. An estimate of σε,ah           = E [ε2
                                                                                      ah |zah ] is
obtained by applying the inverse of the logistic transformation (see Elbers et al., 2003).
Estimates of the household level variances are re-scaled such that the sample mean
matches the estimate of the unconditional variance σε  2
                                                         , i.e. E [σ 2          2
                                                                     ε,ah ] = σε .
    For households sampled in area a, the residuals eah = yah − xT         ah β = va + εah are
informative of the latent area error va . Assuming both errors are normally distributed,


                                                  6
it follows that conditional on the residuals observed in the survey data, the error eaj is
normally distributed with mean and variance given by (see e.g. van der Weide, 2014):

                 E [eaj |ea1 , . . . , eama ] = γa         αah eah                                     (1)
                                                     h

                                                 2    2            2            2   2         2
               var [eaj |ea1 , . . . , eama ] = σv − γa           σv +         αah σε,ah   + σε,ah ,   (2)
                                                                          h

              2     2    2
where γa = σv   / (σv + σε /ma ), with ma equal to the number of survey households sam-
pled into the area a, and where:

                                                            1
                                                           2
                                                          σε,ah
                                            αah =                    .
                                                              1
                                                      h      2
                                                            σε,ah


    If there are sampling weights these will be absorbed by (and modify) the weights αah
(see van der Weide, 2014). Conditioning on the survey data when drawing the errors is
known as Empirical Bayes (EB) prediction (see e.g. Molina and Rao, 2010). Note that
EB prediction only concerns the drawing of the area errors, and only those areas that
have been sampled into the survey. The gain in precision resulting from conditioning
                                                                       2
on the survey data for these areas is most notable when the variance σv  and the number
of sampled households ma are relatively large.
    The small area estimates of poverty and corresponding standard errors are obtained
by means of bootstrapping. Let R denote the number of simulations. The estimator
for poverty then takes the form:

                                                     R
                                         ˆ = 1
                                         H                   ˜(r) ,
                                                           h y
                                             R       r=1

where h(y ) is a function that converts the vector y with (log) consumption expenditures
for all households into the corresponding poverty rate, and where y  ˜(r) denotes the r-th
simulated vector with elements:

                                      (r)      ˜(r) + v                  (r)
                                     ˜ah = xT
                                     y      ah β      ˜a(r)
                                                             ˜ah .
                                                            +ε
                             ˜(r) , σ         (r )                  (r)   (r)
where the model parameters β        ˜( r)
                                            ˜ε and the errors v
                                      v and σ                           ˜ah are drawn
                                                                 ˜a and ε
                                               ˜(r) , σ         (r )
from their estimated distributions. We draw β         ˜( r)
                                                              ˜ε by re-estimating the
                                                        v and σ
model parameters using the r-th bootstrap version of the survey.3 The error term
  (r)  (r )
˜a + ε
v     ˜ah is drawn from a normal distribution with mean and variance given by eq. (1)-
  3
      Alternatively, the model parameters can be drawn from their estimated asymptotic distribution.


                                                      7
(2). Finally, the point estimates and their corresponding standard errors are obtained
by computing respectively the average and the standard deviation over R imputed
poverty rates. Note that the standard errors of the poverty estimates are driven both
by model error and idiosyncratic error. The contribution of the idiosyncratic error (the
area- and the household error) will tend to zero as the size of the target population of
a small area (in the population census) tends to inﬁnity. Similarly, the contribution
of the model error tends to zero when the sample size of the survey (which is used to
estimate the model parameters) tends to inﬁnity.


2.2    Remote-sensing-based small area estimation of poverty
Relying on remote sensing data to estimate small area poverty rates alters the dimen-
sions of the model with which we work. The unit of observation will be a geographic
unit, think of a small administrative unit such as a village. The dependent variable
in this case will be village level poverty rather than household level consumption ex-
penditure. Furthermore, restricting the set of the covariates that can be derived from
publicly available remote sensing data could conceivably increase the magnitude as well
as the spatial correlation structure of the error term. Adopting a spatial error model
will help ﬁt this spatial structure (e.g. Anselin, 2001).
    We use a spatial error model (henceforward SEM) that assumes iid normal shocks
to local poverty spilling over to other locations depending on distance. It is assumed
that village level poverty q can be described (in matrix notation) by:

                                        q = Xθ + u,

where X is the matrix with geo-covariates (rows indicating villages and columns indi-
cating covariates), u satisﬁes u = ρW u + ε, where ε is a vector of independent local
errors satisfying ε ∼ N (0, σ 2 I ) and E [ε|X ] = 0, and where W is a spatial weight matrix
with elements wij measuring the spatial correspondence between villages i and j (rel-
ative to other villages). We deﬁne wij = t−       τ
                                                ij (for i = j ) as a function that is inversely
related to travel time tij between i and j with τ > 0. We take W to be symmetric, with
wii = 0, and normalize it by its largest eigenvalue, i.e. W = W ∗ /λmax , where W ∗ is
the un-normalized spatial weight matrix and λmax is the largest eigenvalue of W ∗ (see
e.g. Bell and Bockstael (2000) and Kelejian and Prucha (2010)). Locations that are
in close proximity to each other are exposed to the same latent geographical features
which induces spatial correlation. These correlations are modeled as the product of W
and ρ, where ρ is a scalar, expected to be positive. The spatial covariance structure of



                                              8
the error term u = (I − ρW )−1 ε is seen to satisfy:

                                  Σ = EuuT = σ 2 CC T ,

where C = (I − ρW )−1 .
    Let m = Xθ be the mean of q conditional on X . Let the subscripts S and O
indicate whether the village is “in sample” or “out of sample”, respectively. For ease
of exposition, let us also sort the villages such that the “in sample” villages are stacked
above the “out of sample” villages, which permits for the following notation:

                                 qS        mS          uS
                                       =         +          .
                                 qO        mO          uO

The corresponding partition of the variance matrix Σ takes the form:

                                           ΣSS ΣSO
                                   Σ=              .
                                           ΣOS ΣOO

    Suppose that we know the model parameters (in practice we will have to work with
estimates) such that mS and mO are both observed (note that X is observed for all
villages). Let us also assume for now that we observe qS , the outcome variable of
interest for a sub-set of locations. (Since the village level poverty rates are derived from
a sample of households in any given village, we observe qS with error due to sampling;
we will return to this later.)
    The objective then is to estimate qO given our observations (or estimation) of mS ,
mO , Σ, as well as qS , so as to obtain a map of poverty that covers the whole country. If
the data exhibits spatial correlation (i.e. ρ = 0), then observing qS and mS will carry
information that is relevant for the estimation of qO above and beyond the information
that is conveyed in mO . It follows from the normality assumptions that the Best Linear
Unbiased Predictor (BLUP) of qO solves (Goldberger, 1962):

                  BP (qO ) = mO + cov [qO , qS ] var [qS ]−1 (qS − mS )
                             = mO + ΣOS Σ−1
                                         SS (qS − mS ) .


   Let us next account for the fact that qS is observed with sampling error. We shall
denote the sampling direct estimate of qS by qˆS :

                              ˆS = qS + eS = mS + uS + eS ,
                              q



                                             9
where eS denotes sampling error. This modiﬁes the BLUP estimate of qO to:

                    ˜O = mO + cov [qO , q
                    q                             qS ]−1 (ˆ
                                        ˆS ] var [ˆ       q S − mS )
                         = mO + ΣOS (ΣSS + DS )−1 (ˆ
                                                   qS − mS ) ,

where DS is a diagonal matrix with the sampling variances (i.e. variance of eS ) as
diagonal elements. Note that when the magnitude of sampling variance is large (i.e.
when qˆS denotes a noisy observation of qS ), more weight is given to the synthetic
estimator mO .
    We can similarly use the model to obtain a more precise estimate of qS . It can be
veriﬁed that the BLUP estimator of qS in this case satisﬁes:

                     ˜S = mS + cov [qS , q
                     q                             qS ]−1 (ˆ
                                         ˆS ] var [ˆ       qS − mS )
                         = mS + ΣSS (ΣSS + DS )−1 (ˆ
                                                   qS − mS ) .

Note that q ˜S is of the same form as the estimator put forward by Pratesi and Salvati
(2008), who consider a diﬀerent choice of SEM model. As Pratesi and Salvati (2008)
restrict their analysis to in-sample prediction, they do not oﬀer an estimator for qO .

Precision In practice a prediction of qO is also subject to error induced by the sam-
pling variation of estimators for model parameters. Denote by a ‘check’ ˇ estimators
                                                                                   ˇO
and predictions derived from them. The covariance matrix H of prediction errors of q
can be decomposed into three terms:

                    H = E ( qO − q          ˇO )T = V1 + V2 + V3 ,
                                 ˇO ) (qO − q

   where:

   • V1 reﬂects the sampling variation of m        ˇ, resulting from the fact that θ is
                                          ˇ O = XO θ
                                                  ˇ) and the sampling distribution of
     estimated from the survey data (indicated by θ
                          ˇ
     ˇ carries over to XO θ
     θ

   • V2 represents sampling variation induced by θˇ, as well as the estimators of the
     variance parameters, on ΣOS (ΣSS + DS )−1 (ˆ
                                                qS − mS )

   • V3 reﬂects the variance component that is due to the residuals uO = qO − mO , to
     the extent that they are independent of (and therefore cannot be predicted by)
     ˆS − mS
     q

Estimators for V1 , V2 and V3 are proposed in Lemma 1 and in Appendix 1.

                                           10
Aggregation In general, precision will be comparatively low at the level of geograph-
ical units, while predicting poverty over larger areas increases precision as idiosyncratic
errors are averaged out. Aggregation over groups of units (‘areas’) can be represented
by a linear mapping B . The estimator for the area-level poverty rate is seen to equal
BqˇO , while its variance-covariance matrix solves:

                         E B (qO − q          ˇO )T B T = BHB T .
                                   ˇO ) (qO − q

The standard errors of the area-level poverty estimates reported in Table 5 (in the
column headed ‘SEM’) are obtained by evaluating the square root of the diagonal
elements of BHB T .

Lemma 1 Let q, u, W, X, Σ, ε; ω = (ρ, θ, σ 2 , τ ) and index sets S, O be as deﬁned and
discussed above; assume further that observation errors eS are independent of ε and
normally distributed with zero mean and given variance matrix DS = EeS eT                S . Let
ˇ = (ˇ
ω               ˇτ
          ˇ 2 , θ,
       ρ, Σ        ˇ) be maximum-likelihood estimators of the corresponding parameters with
variance matrix V (ˇ     ω ), and let q       ˇ+Σ
                                      ˇO = XO θ       ˇ SS +DS ]−1 (ˆ
                                                ˇ OS [Σ                     ˇ) denote predictions
                                                                    qS − XS θ
of qO . Denote by SOS the matrix:

                                  SOS = ΣOS [ΣSS + DS ]−1

Then: (i)

               qO − q            ˇ) + SOS (uS + eS ) − S
                    ˇ0 = XO (θ − θ                     ˇOS (ˆ       ˇ) + rO ,
                                                            qS − XS θ                           (3)

with:
                                 rO = uO − E (uO |uS + eS ).

And: (ii) The three terms on the right side are asymptotically independent and their
covariance matrices can be consistently estimated by:

                        ˇ)T X T = XO V (θ
                 ˇ)(θ − θ
 V1 = Eω XO (θ − θ                      ˇ)X T
                             O             O
                                                                                            T
 V2 = Eω                          ˇOS q
                 SOS (uS + eS ) − S           ˇ
                                      ˆS − XS θ                         ˇOS q
                                                       SOS (uS + eS ) − S           ˇ
                                                                            ˆS − XS θ
        ≈ Eω        ˇOS q
              SOS − S    ˆS q
                            ˆST
                                (SOS − SˇOS )T + S            ˇ)X T S
                                                    ˇOS XS V (θ     ˇ
                                                                 S SO

 V3             T
        = E rO rO ≈Σˇ OO − Σ      ˇ S + DS ]−1 Σ
                            ˇ OS [Σ            ˇ SO .


Proof Here we provide an outline of the proof. The detailed proof is given in Ap-




                                               11
pendix 1. The ﬁrst assertion follows from:

           qO = xO θ + uO = x0 θ + E (u0 |qS − XS θ) + r0 = x0 θ + E (u0 |uS + eS ) + r0 .

Asymptotic independence of the ﬁrst and second term on the right hand side of eq. (3)
follows from the independence of θ ˇ and q        ˇ and the fact that the information
                                          ˆS − XS θ
matrix of the model for qˆS is block diagonal in θ and (ρ, σ 2 , τ ) respectively. As for
independence between the third and the ﬁrst two terms, note that the estimator ω     ˇ is
a function of uS + eS and hence is uncorrelated with rO = uO − E (uO |uS + eS ), which
together with the normality assumptions implies independence.
    To emphasize that the expectations of the ﬁrst two terms are taken with respect
to the sampling distribution of ω we have used the symbol Eω . The expressions for
the asymptotic covariance of the ﬁrst and third terms of eq. (3) are obvious. Details
on the covariance matrices of the three terms and estimators for them are discussed in
Appendix 1.



3         Data
Figure 1 (left panel) shows the map of Malawi, our study area. Malawi is a landlocked
country in the South-East of Sub-Saharan Africa, situated between Zambia, Mozam-
bique, and Tanzania, with a large lake alongside its Eastern border. Its territory has
an elongated shape that is divided into a Northern, Central, and Southern region, with
over 80 percent of the population residing in the Central and Southern regions (see Ta-
ble 1). The country is still highly rural with just over 15 percent of the population (in
2010) residing in urban areas. The three largest cities span all three regions: Lilongwe
(the capital) in the Central region with a population of 674,448 (2008), Blantyre in
the Southern region with a population of 661,256 (2008) , and Mzuzu in the Northern
region with a population of 133,968 (2008) (Malawi NSO 2008).
    Both approaches considered in this paper rely on the 2010-11 Malawi Third Inte-
grated Household Survey (IHS3), which is part of the Living Standards Measurement
Study - Integrated Surveys on Agriculture (LSMS-ISA) project (Malawi National Sta-
tistical Oﬃce (NSO) and World Bank, 2012). The survey contains high-quality data on
household consumption expenditure that is used to measure poverty, deﬁned as the per-
centage of individuals whose total annual household consumption per capita fall below
the national poverty line. Household expenditure and the poverty line are expressed in
Malawi kwacha using February/March 2010 prices.4 The poverty line, the sum of the
    4
        For details on the price deﬂator used in the Malawi IHS3, we refer the reader to Malawi National


                                                    12
food and non-food poverty lines, equals kwacha 37,001.68.5
    The ELL approach combines the household survey with the Malawi 2008 Popula-
tion and Housing Census (henceforward referred to as the census). The census provides
individual and household level information on household composition, education, em-
ployment, dwelling characteristics and asset ownership. These data are used to obtain
multiple imputed values for household consumption expenditures for all households in
the census, which are then aggregated to obtain estimates of poverty at the small area
level. Alternatively, the SEM approach model combines the household survey with re-
mote sensing data, such as night time lights, urban footprints, major roads, population
density, vegetation, surface temperature, rainfall. It uses these data to obtain multiple
imputed values for village level (i.e. enumeration area) poverty rates which too can
be aggregated to the small area level to facilitate a comparison between the two ap-
proaches. The following sub-sections provide further details on the respective databases
used.


                               In 2010     % population       % urban

                               Malawi 100                     15.2
                               North   13.1                   14.3
                               Central 42.6                   15.4
                               South   44.3                   15.2
                                  Table 1: Population in 2010




3.1     Household expenditure survey
The Third Integrated Household Survey (IHS3) was conducted between March 2010
and March 2011. Its target universe consists of the households and individuals in all
districts, except for the Likoma district. A total of 12,271 households were interviewed
(with 668 replacements), see Table 2. The sample is designed to be representative at
Statistical Oﬃce and World Bank (2012). For a more general discussion on spatial (and temporal)
price deﬂation and real consumption measurement, see e.g. Gibson et al. (2017) and van Veelen and
van der Weide (2008).
    5
      The food poverty line in the IHS3 is Kwacha 22,956, the monetary cost of 2,400 calories per person
per day, where the price of a calorie is estimated from the population in the 5th and 6th deciles of the
aggregate distribution for consumption. The non-food poverty line (non-basic food needs) in the IHS3
is kwacha 14,045, and is estimated as the average non-food consumption of the population whose food
consumption is close to the food poverty line.


                                                  13
Figure 1: Study area (left panel) and LSMS survey points color-coded by poverty rate
(right panel: count of individuals below the national poverty line divided by total
individuals sampled per household cluster)

the nation, region, and district level, as well as for urban and rural areas at the national
and regional level. The sampling frame is based on listing information and cartography
from the 2008 Malawi Population and Housing Census (the same Census used in this
poverty mapping project). The IHS3 uses a stratiﬁed and two stage sampling design
with: (1) 767 primary sampling units (PSUs); and (2) 16 households randomly selected
from each PSU. In the actual population, at the time of the census, an EA (which
serves as the PSU) on average counted a total of 235 households. A minimum of 24
EAs were chosen in each of the 31 districts included in the IHS3 (out of the 32 districts
in Malawi).
    The PSUs are geo-referenced such that they can be placed on a map and merged
with remote sensing data. The right-panel of Figure 1 shows the observed poverty
rates for the 767 PSUs (i.e. villages), obtained by evaluating for each PSU the share of
sampled individuals with a household expenditure per capita that is below the poverty

                                            14
line. While there is a larger concentration of sampled clusters in and around major
urban centers, the survey provides a complete coverage of the country including rural
and more remote areas. Poverty rates are seen to vary considerably across space. On
average however, poverty levels tend to be higher in more remote areas that located
further away from the urban centers. Note that this does not imply that poverty is
less of a concern in urban centers. In fact, the majority of the poor plausibly reside
in urban centers due to the higher concentration of population there, i.e. despite the
lower prevalence of poverty.


                                           2008 census    IHS3

                          #   regions    3                3
                          #   districts  32               31
                          #   Wards/TAs 353               279
                          #   EAs        12,575           767
                          #   Households 2,859,720        12,271
                Table 2: Descriptive statistics of 2008 census and IHS3




3.2    Population census
The enumeration for the 2008 Population and Housing Census for Malawi was carried
out between June 8 and 28, 2008. It covers regular households, institutions, and the
homeless. For the purpose of this exercise, institutions and the homeless are omitted.
The total population was found to be 13,077,160. Table 2 reports the number of house-
holds in the census and number of sampled households in the survey at diﬀerent levels
of aggregation. The most disaggregated level at which poverty will be estimated is the
Traditional Authority (TA) level, which we may also refer to as the‘small area level.
    The administrative boundary data are taken from the National Statistical Oﬃce
(NSO) of Malawi in order to match the boundaries with the census data at the enumer-
ation area level. Malawi has 12,666 enumeration area boundaries from the 2008 census.
The IHS3 household survey has 12,271 observations from 768 enumeration areas.6
  6
    See the background report for more details on the design, sample and survey results at:
http://microdata.worldbank.org/index.php/catalog/2939/download/41224




                                            15
3.3     Public remote sensing data
The analysis uses publicly available global remote sensing data from numerous sources
and summarizes predictors derived from them at the administrative level (i.e. TA level)
for Malawi. For most of the variables, we take advantage of the geographically com-
prehensive raster data structure, which uses pixels to form a grid that covers all land
areas. From the raster structure we extract geo-referenced information at the adminis-
trative level. Using Geographic Information Systems (GIS) and Google Earth Engine,
we overlay the administrative boundaries with other geo-referenced gridded data, (e.g.
temperature, precipitation, elevation, etc.). When an administrative area consists of
multiple pixels (which is typically the case), the pixel observations are collapsed (i.e.
aggregated) to obtain summary statistics (e.g. mean and sum) at the administrative
area level.7 This process of extraction provides a nearly exhaustive correspondence.
For a select few cases, we make two minor adjustments:

    • Administrative areas near boundaries with water or small islands do not always
      overlay with the raster data. In this case, we ﬁrst extract data by center point
      if the results from the polygon extract are missing. Second, we make a small
      adjustment by ﬁlling in the missing cells from the original raster with the average
      value of data from contiguous neighbors.

    • When the polygon representing an administrative area is much smaller than the
      raster pixel size, we disaggregate the raster data. If the center point of the polygon
      is located in an area of “no data” (e.g. a lagoon or a lake), then we move the
      center point to the nearest land pixel. We also calculate the area at the pixel size
      to ensure consistency between the sum of pixel values and the corresponding land
      area.8

   For the remaining data, we use vector data to calculate distances. Table 3 lists the
variables considered in this paper that have been derived from publicly available remote
sensing datasets. These variables are assembled into various themes related to initial
conditions, agriculture, climate, environment, socio-economic conditions, population
and remoteness. Speciﬁc details on the source and construction of the variables can be
found in Appendix 2.

   7
     We use two geometries (center points and polygons) to represent the administrative area. We
extract raster data by polygon using the extract function in R. In Google Earth Engine, we use the
Reducer function over a region at the administrative level.
   8
     An area calculation with the vector data provides more accurate measurements, however it does
not always yield consistent area with the pixel calculations, which is necessary to ensure variables with
the share are between 0 and 1.


                                                   16
Theme / Layer        Description                  approximate grid   Source
                                                  size

Initial conditions
dist2cap             Distance to capital          N/A                NSO and Global
                                                                     Insights v6.1
dist2border          Distance to border           N/A                NSO
dist2major river     Distance to major river      N/A                NSO and Nat-
                     (centerline)                                    ural Earth 3.0
                                                                     1:10 million
                                                                     (2017)
dist2rail            Distance to railway          N/A                NSO and Global
                                                                     Insights v6.1
elev30m              Elevation from SRTM          90m and 1km2       Jarvis et al.
                                                                     (2008)
srtm popmask         Elevation in populated ar-   1km2               Jarvis et al.
                     eas in 2012 from Landscan                       (2008); Bright
                                                                     et al. (2013)
Slope                Slope (degrees of slope      30-arc-seconds     Verdin et al.
                     (x100))                                         (2007)
Ruggedness           Ruggedness Index             3 arc-seconds      Nunn and Puga
                                                                     (2012)
rugged popmask       Ruggedness Index in popu-    1km2               Nunn and Puga
                     lated areas                                     (2012); Bright
                                                                     et al. (2013)

Agriculture
ag hybrid            Agricultural Hybrid          1km2               Fritz et al.
                                                                     (2015)
lgp50 v6990          Length of Growing Period     10km2              FAO-GAEZ ver-
                                                                     sion 3 (Fischer
                                                                     et al. 2012)
rain crop sh         Share of area with rainfed   1km2               GFSAD1000
                     crop                                            V1.0: (Telu-
                                                                     guntla et al.
                                                                     2014)




                                     17
Theme / Layer       Description                    approximate grid   Source
                                                   size

irrigated crop sh   Share of area with irrigated   1km2               GFSAD1000
                    crop                                              V1.0: (Telu-
                                                                      guntla et al.
                                                                      2014)
agr oppt cost       Global map of gross eco-       10km2              Naidoo and
                    nomic rents from agricul-                         Iwamura (2007)
                    tural lands (2000 US dol-
                    lars per hectare per year)

Climate
bio1                Annual Mean Temperature        1km2               Bioclim (Hi-
                                                                      jmans et al.
                                                                      2005)
bio4                Temperature Seasonality        1km2               Bioclim (Hi-
                    (standard deviation *100)                         jmans et al.
                                                                      2005)
bio12               Annual Precipitation           1km2               Bioclim (Hi-
                                                                      jmans et al.
                                                                      2005)
bio15               Precipitation Seasonality      1km2               Bioclim (Hi-
                    (Coeﬃcient of Variation)                          jmans et al.
                                                                      2005)
bio16               Precipitation of Wettest       1km2               Bioclim (Hi-
                    Quarter                                           jmans et al.
                                                                      2005)
bio17               Precipitation of Driest        1km2               Bioclim (Hi-
                    Quarter                                           jmans et al.
                                                                      2005)
oxmap daylst        Day time land surface tem-     5km2               MOD11A2 and
                    perature (Celcius)                                Weiss et al.
                                                                      (2014)
oxmap ngtlst        Night time land surface        5km2               MOD11A2 and
                    temperature (Celcius)                             Weiss et al.
                                                                      (2014)


                                    18
Theme / Layer     Description                    approximate grid   Source
                                                 size

Aridity           Aridity Index                  1km2               Trabucco and
                                                                    Zomer (2009)
Pet               Potential Evapotranspira-      1km2               Trabucco and
                  tion                                              Zomer (2009)

Environment
NDVI              Normalized Diﬀerence           10km2              Zhu et al.
                  Vegetation Index (NDVI)                           (2013)
                  (1981-2011)
lndsatndvi 2008   Normalized Diﬀerence                              LandSat
                  Vegetation Index (NDVI)
                  (2008)
EVI               Enhanced Vegetation Index      5km2               MODIS BRDF-
                  (EVI)                                             corrected
                                                                    imagery
                                                                    (MCD43B4)
                                                                    with gap-ﬁlled
                                                                    method by
                                                                    Weiss et al.
                                                                    (2014)
Forloss           Forest Loss (2008 – 2000)                         Hansen et al.
                                                                    (2013)
PM2.5             Particulate Matter 2.5 at      1km2               van Donkelaar
                  35% RH (ug/m3) (no dust                           et al. 2015
                  or seasalt) (2008)
ET solrad         Extraterrestrial Solar Radi-   2.5km2             Zomer et al.
                  ation by month                                    2008



Socio-economics
NTL               Night Time Lights (1996,       1km2               NOAA DMSP-
                  2000, 2010)                                       OLS




                                  19
Theme / Layer          Description                  approximate grid   Source
                                                    size

Population count and
density
lpop12                 Landscan (2012)              1km2               Bright et al.
                                                                       (2013)
urban lpop12           Urban population             1km2               Bright et al.
                                                                       (2013); WDI
                                                                       (2017); Authors’
                                                                       calculations
ghspop                 GHS-Pop (1975, 1990,         1km2               Freire et al.
                       2000, 2014)                                     (2016)
ghsbuiltup             GHS-Built up (1975, 1990,    300m               Freire et al.
                       2000, 2014)                                     (2016)

Travel time and ac-
cessibility
dist2railway           Distance to railway          N/A                NSO and Global
                                                                       Insights v6.1
tt50k2000              Mean Travel time to major    1km2               Uchida and Nel-
                       cities (circa 2000)                             son (2008)
tt50k2010              Mean Travel time to major    1km2               Berg et al. (in
                       cities (circa 2010)                             prep)
ai50k2000              Agglomeration Index (circa   1km2               Uchida and Nel-
                       2000)                                           son (2008)
ai50k2010              Agglomeration Index (circa   1km2               Roberts et al.
                       2010)                                           (2017)
                              Table 3: Data source summary




                                       20
4       Empirical application to Malawi
4.1     Model selection and estimation of model parameters
4.1.1    ELL approach: Household level data with household expenditure as
         dependent variable

We rely on variables that are available in both the survey and the census. The ﬁrst step
of our model selection procedure groups candidate variables by type: (i) demograph-
ics, (ii) education, (iii) employment, (iv) dwelling characteristics, (v) asset ownership,
and (vi) village level variables. This latter group includes variables that are created
by computing mean values of selected variables from the population census at the
enumeration-area (EA) levels (think of local employment rates, share of the population
with given levels of education and the penetration of phone use) and variables derived
from the publicly available remote data (think of night-time-lights, distance to nearest
city and river, local climate, and agricultural land quality).
    Next, we regress the log of per capita expenditure (combined with region and urban
dummies) on each group separately using Ordinary Least Squares (OLS). Independent
variables that are not signiﬁcant at the 5% level are sequentially dropped from the
regressions, starting with those that are least signiﬁcant. Once all groups have been
considered we select the group that provides the best ﬁt, combine this with the group
that provides the second-best ﬁt, and continue to trim the variables that cease to
be signiﬁcant. This process is continued until every single group has been given a
chance to contribute one or more variables to the model. We allow for interactions
with the urban-rural dummy to capture selected area heterogeneities (see Tarozzi and
Deaton, 2009; Tarozzi, 2011). Finally, LASSO model selection is implemented to verify
that all key variables are accounted for. The same procedure is applied to obtain the
heteroskedasticity model describing the variance of the errors. The resulting regression
models are included in Appendix 3.
    Among the diﬀerent categories of explanatory variables, dwelling characteristics and
household asset ownership rank as the strongest predictors of household consumption
expenditure per capita, followed by household head education and employment vari-
ables. An interesting observation is that local area night-time-lights, agricultural land
quality, and climate data (derived from public remote sensing data that serve as pre-
dictors in the SEM approach) are found to be signiﬁcant when included on their own
– but once we control for variables available in the population census, these variables
lose their signiﬁcance and are subsequently dropped from the model.
    Once the models have been selected, the model parameters are estimated by going


                                           21
through the following steps: (a) estimate the models using OLS and extract the resid-
                                                              2       2
uals; (b) estimate the unconditional variance parameters σv     and σε  , using Henderson’s
method III (see Henderson, 1953; and Searle et al., 1992); (c) estimate the conditional
            2
variance σε,ah  ; and ﬁnally (d) construct the covariance matrix Ω = E vv T + εεT |x =
  2             2
σv  In + diag(σε,ah ), and compute the feasible GLS estimator for β .
     While the regression coeﬃcients purely capture correlations and not causal eﬀects,
it is instructive to verify what predictors stand out and inspect the signs of their eﬀects.
As one would expect, higher (lower) quality materials used for roof, ﬂoor, and walls
as well as a larger (smaller) number of rooms are positively (negatively) correlated
with (log) household expenditure. Household assets, particularly, owning a car, fridge,
cooker, and tv, all report positive regression coeﬃcients. Furthermore, we observe
that per capita household expenditures tend to be higher (lower) for smaller (larger)
households, and for households with a more (less) educated household head.

4.1.2   SEM approach: Village level data with village poverty rate as depen-
        dent variable

A similar procedure is adopted to build the model for village level poverty using variables
derived from public remote sensing data as predictors which are grouped into: (i)
population density and land size, (ii) night time lights, (iii) road connectivity and
distance to cities, borders and rivers, (iv) climate and environment, and (v) agricultural
suitability variables. Among these categories of variables, night time lights, market
access (i.e. road connectivity), and local climate and environment report the highest
correlations with village poverty with similar in-sample goodness-of-ﬁt levels. The ﬁnal
model has 15 explanatory variables.
    We estimate the model by maximum likelihood. The distance decay parameter
τ is set to τ = 0.54, which maximizes the in-sample goodness-of-ﬁt (measured by the
correlation between predicted and observed poverty rates). It turns out that, except for
ρ, the parameters (as well as their conditional covariance matrix) are not very sensitive
to the choice of τ . The estimation results are reported in Table 4, where poverty is
measured as a percentage. Details on working out the asympototic variance-covariance
matrix of the parameter estimators are given in Appendix 1.
    While also here the regression coeﬃcients capture correlations and not causal eﬀects,
it will nevertheless be instructive to inspect the signs of the correlations. Lower poverty
rates are associated with higher levels of population density, higher values of night time
lights and higher shares of cultivated land that can be irrigated through rainfall. In
contrast, higher poverty rates are associated with increased remoteness (i.e. larger
distances from major cities), higher levels of ruggedness, and higher rates of forest loss.

                                            22
                                            Poverty (NSO)
                           (Intercept)        -0.45205 **
                                               (0.21987)
                           ln lpop12         -0.03781 ***
                                               (0.00734)
                           ln rugged          0.02051 **
                                               (0.01005)
                           ln slope          -0.06148 ***
                                               (0.01745)
                           ln forloss         0.07558 ***
                                               (0.01169)
                           agr opp cst        0.0085 ***
                                               (0.00328)
                           rain crop sh         -0.02675
                                               (0.01844)
                           dist2major river  -0.00131 ***
                                               (0.00016)
                           ln tt50k2010       0.02608 **
                                               (0.01045)
                           ln ntl 2010        -0.02033 **
                                               (0.00968)
                           ln ntl 1996       -0.03566 ***
                                               (0.01279)
                           bio1              -0.00406 ***
                                               (0.00077)
                           bio4              -0.00021 ***
                                                 (4e-05)
                           pet                0.00147 ***
                                               (0.00016)
                           s2                    0.02357
                                               (0.24747)
                           rho               0.75524 ***
                                               (0.00165)
                           Observations            765
                                  2
                           adj. R                 0.558

Table 4: Parameter estimates of SEM model with τ = 0.54
t-statistics in parentheses: * p<0.10, ** p<0.05, *** p<0.01

4.2    Comparing poverty maps
Before we inspect the estimates of poverty at the small area level, let us ﬁrst evaluate
the estimates at the national and the regional levels. Since the IHS3 survey is designed
to be representative at these levels, the survey direct estimates provide a natural bench-
mark to compare our model-based estimates against. The ELL and SEM estimates are


                                           23
obtained by aggregating the small area estimates to the national and regional level using
population weights. The results are shown in Table 5. The model-based estimates from
both approaches are seen to line up with the survey-direct estimates. ELL and SEM
both identify the North and South as the regions with the highest rates of poverty.9
While the two approaches disagree slightly on the ordering of these two regions, the
observed diﬀerence in the poverty estimates lies well within the 95 percent conﬁdence
interval.
    More striking is the diﬀerence in statistical precision between the two approaches.
ELL achieves slightly greater statistical precision than the survey-direct estimates (by
virtue of combining the survey with unit record population census data), while the
standard errors obtained by the SEM approach are notably larger (between two and
three times the ELL standard errors). This diﬀerence in precision is in large part due
to divergence in number of observations used to estimate the model parameters: 12,271
households in the ELL approach versus 767 PSUs in the SEM approach. The variance
of the poverty estimates are largely driven by the variance of the model parameter
estimates, which in turn scales with the number of observations used in the respective
regression model. As the model adopted by ELL is at the household level, it has 16
times more observations to work with compared to the SEM approach.


                              survey-direct     ELL              SEM

                   Malawi 0.507 (0.009) 0.498          (0.006) 0.492 (0.023)
                   North   0.543 (0.021) 0.562         (0.018) 0.540 (0.022)
                   Central 0.445 (0.015) 0.438         (0.009) 0.434 (0.024)
                   South   0.555 (0.014) 0.537         (0.008) 0.533 (0.026)
                Table 5: Estimates of poverty from the survey, ELL and
                SEM




   The small area estimates of poverty for the two approaches are presented in Figure 2,
which shows the two poverty maps side by side. A visual inspection of the two maps
shows a striking agreement on the spatial distribution of poverty. Both maps suggest
low levels of poverty in the urban areas of Mzuzu, Kasungu, Lilongwe, Zomba, and
Blantyre, as well as areas of high poverty in the far north and south of the country.
   9
    SEM slightly underestimates poverty in areas of the Northern region with comparatively low rates
of poverty.


                                                24
The models diﬀer slightly in areas of intermediate poverty, but the spatial patterns
shown in the two maps are generally remarkably similar.




Figure 2: ELL (left) and SEM (right) estimates of poverty at the TA level. National
parks and game reserves are masked as “protected areas”

    Figure 3 maps the diﬀerence between the SEM and ELL estimates. A well-deﬁned
geographical pattern would hint to the possibility that there is a predictable structure
in the data that is picked up by one model but not by the other. Noteworthy patterns
include an area in the north-central part of the country between Mzuzu and Kasungu
coincident with a low density of survey points. In this area the SEM estimates are
lower than the ELL estimates for three contiguous TAs. Conversely the areas where
SEM predict higher poverty rates tend to be TAs in an east-west oriented band located
south of Lake Malawi.


                                          25
Figure 3: Spatial distribution of diﬀerences SEM and ELL estimates of poverty at the
TA-level

4.3    Comparing individual small area estimates
While the diﬀerence between SEM and ELL estimates of poverty is small for the large
majority of TAs, there are a number of TAs for which the discrepancy is more sub-
stantial. Figure 4 plots a histogram of the diﬀerences in poverty estimates (the SEM
estimate minus the ELL estimate). It follows that for about half of the TAs the esti-
mates lie within +/- 5% points of each other (for 75% of TAs, the estimated poverty
rates lie within 10% points of each other). The histogram also conﬁrms that the diﬀer-
ence between estimates are symmetrically distributed around zero by approximation;
there is no meaningful bias in the SEM estimates relative to ELL, and overestimates of
poverty occur with the same frequency as underestimates of poverty.
    Figure 5 evaluates the correspondence between the two approaches by plotting the
SEM estimates (vertical axis) against the ELL estimates of poverty (horizontal axis),


                                         26
Figure 4: Histogram of diﬀerence between SEM and ELL estimates of poverty at the
TA level

where the size of the dot/bubble measures the population size of the TA. The green
line shows the diagonal while the red line plots the ﬁtted regression line. The two lines
almost perfectly coincide indicating a near zero bias. The R2 is 0.83 which conﬁrms the
close correspondence between the SEM and ELL estimates. The ﬁgure also conﬁrms
however that despite the close correspondence for the large majority of TAs, there are a
select number of TAs, including a number of populated TAs, where the two estimates of
poverty diﬀer substantially. This suggests that while the SEM approach accurately cap-
tures the geography of poverty and conceivably provides accurate estimates of poverty
for groups of TAs, estimates may be oﬀ for any particular TA (i.e. small area). This
means that the value of the SEM approach will depend on the needs of the application.


4.4    Robustness of SEM method to random oﬀsetting of GPS
       locations
In order to preserve the conﬁdentiality of sample households and communities, the IHS3
does not provide exact geographic coordinates of the Primary sampling Units (PSUs) in
its public use dataset. Such an approach is standard practice; it is rare for exact GPS
locations of PSUs to be made available in the public domain. In our assessment of the
SEM method above, however, the precise EA coordinates provided by the Malawi NSO
were employed. The question thus arises whether performance of the SEM method is
seriously undermined when one is compelled to work with the public use coordinates.
    The coordinates in the IHS3 made available for public use have been modiﬁed using


                                           27
   Figure 5: Scatter plot of SEM against ELL estimates of poverty at the TA level

the MeasureDHS methodology (Perez-Heydrich et al. 2013). This method applies a
random oﬀset to the exact GPS locations within a speciﬁed range that varies between
0-2 km in urban areas, and 0-5 km in rural areas. Two caveats apply. First, the
speciﬁed range for rural PSUs that are particularly remote (about one percent of rural
PSUs) is extended to 0-10 km. Second, the condition is imposed that location of the
random oﬀset must be within the sampled district. The result is a set of modiﬁed EA
coordinates with known limits of accuracy for public use (Malawi NSO and World Bank
2012).
    Table 6 compares the regression models obtained with the NSO and the public use
coordinates, respectively. Using public coordinates reduces the in-sample goodness-of-
ﬁt of the model, as is to be expected. The reduction in adjusted R2 is reasonably modest,
however, from 0.558 to 0.514. The two models are qualitatively similar; all regression
coeﬃcients remain signiﬁcant and are of the same order of magnitude. Some regression
coeﬃcients are more sensitive to the choice of public versus exact NSO coordinates
than others. Examples of covariates that are found to be particularly stable include:
bio1, bio4m pet, slope, ruggedness, and distance to river. Not coincidentally these are
covariates that exhibit a high degree of spatial correlation, and as such are less sensitive
to random spatial perturbations.
    Table 7 reports the statistical correlation coeﬃcients between the two diﬀerent SEM
estimates (obtained using public versus exact NSO coordinates) and the ELL estimates
of poverty (the benchmark) at the TA level. It can be seen that: (a) While SEM esti-
mates obtained with public coordinates exhibit a lower correlation with the benchmark
ELL estimates compared to the estimates obtained with exact NSO coordinates, the

                                            28
                                 Poverty (NSO)      Poverty (Public)
                (Intercept)        -0.45205 **         -0.58123 **
                                    (0.21987)            (0.22611)
                ln lpop12         -0.03781 ***        -0.02969 ***
                                    (0.00734)            (0.00848)
                ln rugged          0.02051 **          0.02977 ***
                                    (0.01005)            (0.01076)
                ln slope          -0.06148 ***        -0.05727 ***
                                    (0.01745)            (0.01836)
                ln forloss         0.07558 ***         0.04426 ***
                                    (0.01169)            (0.01122)
                agr opp cst        0.0085 ***          0.01517 ***
                                    (0.00328)            (0.00357)
                rain crop sh         -0.02675             -0.02488
                                    (0.01844)            (0.02009)
                dist2major river  -0.00131 ***        -0.00131 ***
                                    (0.00016)            (0.00017)
                ln tt50k2010       0.02608 **           0.02195 *
                                    (0.01045)            (0.01145)
                ln ntl 2010        -0.02033 **            -0.01452
                                    (0.00968)            (0.01016)
                ln ntl 1996       -0.03566 ***        -0.03638 ***
                                    (0.01279)            (0.01248)
                bio1              -0.00406 ***        -0.00401 ***
                                    (0.00077)            (0.00079)
                bio4              -0.00021 ***        -0.00025 ***
                                      (4e-05)              (4e-05)
                pet                0.00147 ***         0.00153 ***
                                    (0.00016)            (0.00017)
                s2                    0.02357              0.02622
                                    (0.24747)             (0.3463)
                rho                0.75524 ***          0.6478 ***
                                    (0.00165)            (0.00181)
                Observations            765                  738
                       2
                adj. R                 0.558                0.514

Table 6: Parameter estimates of SEM model with τ = 0.54
t-statistics in parentheses: * p<0.10, ** p<0.05, *** p<0.01

diﬀerence is small, and (b) as expected, the correlation between the two SEM estimates
is high. Figures 8 and 9 conﬁrm the close correspondence by comparing the respective
poverty maps.




                                         29
                                ELL    BP (NSO)     BP (Public)

                  ELL           1.000
                  BP (NSO)      0.910 1.000
                  BP (Public)   0.883 0.987         1.000
             Table 7: Correlation matrix of estimates of poverty from
             ELL, BP (NSO), and BP (Public) at the TA level.




Figure 6: SEM BP (Public) (left) and SEM BP (NSO) (right) estimates of poverty at
the TA level. National parks and game reserves are masked as “protected areas”



                                        30
Figure 7: Spatial distribution of diﬀerences between BP (NSO) and BP (Public) (left)
and between ELL and BP (public) estimates of poverty (right) at the TA-level. National
parks and game reserves are masked as “protected areas”.

    We observe larger diﬀerences when comparing standard errors (SEs) of the small
area poverty estimates obtained with public versus private NSO coordinates, see Ta-
ble 8 which reports selected summary statistics of the respective SEs at the diﬀerent
administrative levels. Lower SEs are obtained when the public coordinates are used to
estimate the spatial regression model, notably for poverty estimates at the more aggre-
gate levels. At ﬁrst glance it seems counter-intuitive that the use of randomly scrambled
coordinates (when compared to using exact coordinates) yields more precise estimates.
This observation is consistent, however, with the fact that scrambling destroys part of
the spatial correlation structure that is present in the data (and underlying errors).
An under-estimation of the spatial correlation structure in errors will lead to an over-

                                           31
estimation of statistical precision, meaning that the estimated SEs obtained using the
private NSO coordinates will arguably provide more accurate estimates of precision.


                             BP (NSO)     BP (Public) OLS (Public)

                 EA       0.155           0.165          0.186
                 TA       0.039           0.0.038        0.036
                 Region   0.0245          0.0199         0.011
                 National 0.0230          0.0178         0.0075
               Table 8: Mean standard errors (SEs) of poverty estimates
               from BP (NSO), BP (Public), and OLS for diﬀerent levels
               of aggregation.




    Provided that the spatial correlation structure is only partially destroyed by scram-
bling the coordinates, this explanation would predict that SEs obtained with public
coordinates would be higher than SEs obtained using OLS which would fully ignore
any spatial correlation structure in the errors, yet lower than SEs obtained with the
private NSO coordinates (where the spatial correlation structure is fully accounted for).
Fully ignoring the spatial correlation structure, as the OLS model does, would lead to
a comparatively larger downward bias in SEs at more aggregate levels (as errors are
incorrectly assumed to average out) – relative to SEs obtained with the spatial re-
gression models. Both predictions are born out, as can be seen in Table 8 (which also
includes estimates of SEs obtained with OLS). Standard statistics measuring the degree
of spatial correlation found in the residuals are reported in Table 6, which conﬁrm that
the errors from the regression model with public coordinates exhibit a weaker spatial
correlation structure.


5    Concluding remarks
This study investigates whether an accurate poverty map could be obtained by com-
bining remote sensing data that are available in the public domain with household
consumption survey data. This approach to poverty mapping takes advantage of the
fact that geo-referencing survey locations (i.e. Primary Sampling Units; PSUs) have
become standard in recent years and that remote-sensing data from which predictors of
poverty can be derived have become widely available. It should be noted, however, that

                                           32
the publicly available geo-coordinates of survey locations are generally subjected to a
random oﬀset (i.e. are scrambled) in order to protect the anonymity of survey respon-
dents. A second objective of this study therefore is to investigate whether these random
oﬀsets undermine the ability of this approach to produce accurate poverty maps.
    Assessing the success of this alternative approach to producing a poverty map will
depend on how success is measured. When the ability to capture the geography of
poverty is used as the metric of success, our analysis suggests that estimating small area
poverty using remote sensing data oﬀers a promising alternative to building a poverty
map using population census data. In our application to Malawi the two approaches
are found to produce maps that show nearly identical geographies of poverty. When
the objective is to identify poverty in a speciﬁc district or set of districts of the country,
however, we ﬁnd that poverty estimates obtained using remote sensing data can deviate
substantially from the benchmark estimates obtained using population census data. In
summary, the value of poverty maps built using remote sensing data alone will depend
on the needs of the decision makers.
    On the empirical question of whether randomly oﬀ-setting survey coordinates (to
protect anonymity of survey respondents) may undermine the accuracy of a poverty map
derived from remote-sensing data, our ﬁndings echo the comparison with census-based
estimates. The random scrambling of the geographic coordinates does not meaningfully
alter the estimated geography of poverty, yet may have meaningful impact on estimates
for individual districts and may furthermore lead to an over-estimation of statistical
precision. The latter observation arguably stems from the fact that the random oﬀsets
partially destroys the spatial correlation structure in the data and error term.
    Our validation study conﬁrms that for many practical purposes accurate poverty
maps can be obtained in countries where population census data are not available.
Given that remote sensing data are globally available and are typically updated on an
ongoing basis, this also opens the door to updating poverty maps on a higher frequency
– whenever new household survey data become available. On a practical note, since
the remote-sensing-based models are estimated at the survey PSU level (rather than
household level), with the area-level poverty rate as dependent variable (rather than
household income or consumption), experimenting with alternative poverty measures or
alternative poverty lines would require returning repeatedly to the basic modeling stages
of the analysis. This is not the case in the census-based approach, where the model is
estimated at the household level. In this sense the remote-sensing-based approach can
become rather burdensome.
    It remains to be veriﬁed whether the remote-sensing-based approach is equally suc-
cessful in heavily urbanized settings. Small, though not necessarily low-population,


                                             33
administrative units found in cities may be more diﬃcult to model, particularly when
wealthy and poor neighborhoods are in close proximity to each other. Diﬀerences across
such neighborhoods are conceivably harder to capture using the satellite images that
are currently available in the public domain. On a positive note, the technology un-
derlying satellite imagery data is rapidly evolving, and with it the artiﬁcial intelligence
algorithms that derive usable data from the images (see e.g. Jean et al., 2016; Burke et
al., 2021). In conclusion, validating the approach in diﬀerent locations will be impor-
tant. Now that we established that the approach works well in a country like Malawi,
it remains to be veriﬁed whether it is equally successful in countries with diﬀerent lev-
els of development, diﬀerent levels of urbanization and diﬀerent degrees of population
density.




                                            34
Appendix 1: Variance of the prediction error
Introduction
Consider a simple data generating mechanism:

                                       y = f (x, θ) + ε,

with the assumptions that ε and x are independent, realizations of the DGP are in-
dependent, and Eε = 0, Eε2 = σε   2
                                    . We want to estimate the precision of predicting
y conditional on x, using parameter values θ   ˇ and σˇ 2 for θ and σε2
                                                                        , consistently esti-
mated from an appropriate sample of (y, x) observations. If (y0 , x0 ) is an out-of-sample
observation and y0 is predicted by y            ˇ) the prediction error is:
                                   ˇ0 = f (x0 , θ

                            y0 − y                         ˇ) + ε0 .
                                 ˇ0 = f (x0 , θ) − f (x0 , θ

     The prediction error consists of two parts, which we will term model error f (x0 , θ) −
         ˇ) and idiosyncratic error ε0 . Under the model assumptions (and assuming a
f ( x0 , θ
typical sampling scheme for the sample of (y, x) observations on which the estimator
for θ is based) the two error components are independent and a consistent estimator
for the mean square prediction error is:

                              MSE(ˇ                 ˇ)) + σ
                                  y0 ) = V (f (x0 , θ     ˇ 2 I,

where:
                                                                   ˇ))2
                                     ˇ)) = E (f (x0 , θ) − f (x0 , θ
                          V (f (x0 , θ
                                                                                  ˇ
and where the latter expectation is based on the (asymptotic) distribution of the θ
estimator (typically using the ‘delta method’ if f is nonlinear).


Prediction errors in the SEM model
Denote by qS and qO the vectors of poverty rates in survey locations and non-survey
locations respectively. Predictions of qO are based on out-of-sample application of the
SEM model:
                                      q = Xθ + u,

                                     ˆS of qS , due to sampling variation:
where we have imperfect observations q

                                   ˆS = XS θ + uS + eS .
                                   q



                                               35
With this data generating process the best linear unbiased prediction of q is:

                                        ˆS ) = Xθ + E (u|uS + eS )
                               E (q |X, q

for both in-sample and out-of-sample locations, and uS = qS − XS θ. Let the idiosyn-
                                            ˆS be denoted by r:
cratic part of prediction error given X and q

                                      r = q − E (q |X, q
                                                       ˆS ),

with variance matrix ErrT .
    An empirical implementation involves sampling variation of estimators for θ and
hence estimates of the residual uS + eS = qˆS − XS θ. Let θˇ denote the maximum-
likelihood estimator for θ. Then:

                           q    ˇ + E (u|uS + eS = q
                           ˇ = Xθ                          ˇ).
                                                   ˆS − XS θ

With normally distributed residuals (uS , eS , uO ), E (u|uS + eS ) is linear in uS + eS . Be-
                    ˇ are asymptotically independent for consistent and asymptotically
              ˇ and θ
      ˆS − XS θ
cause q
normal estimators of θ, the sampling variance of q    ˇ, given X and q   ˆS can be estimated
by:
                                q |X, q
                             V (ˇ     ˆS ) = XV (θ ˇ)X T + Z,
                                                            ˇ

where Z ˇ denotes the sampling variance of u
                                           ˇ = E (u|uS + eS = q       ˇ). Combining the
                                                              ˆS − XS θ
sources of idiosyncratic and model error, the total covariance matrix H of prediction
error q − qˇ is estimated by:

                                         ˇ)X T + Z
                                 H = XV (θ       ˇ + ErrT .

Below, these three components will be discussed in turn.


Estimation
              ˇ ρ, σ 2 , τ )
Estimating V (θ,

The SEM model is estimated by maximum likelihood (ML), assuming normally dis-
tributed error terms throughout. Under these assumptions, and given the variance
                    2
parameters ρ and σu   as well as exogenous V (eS ), ML estimation of θ amounts to GLS.
Moreover, asymptotically the full covariance matrix of the full parameter estimator is
                                                     2
block diagonal in the components relating to ρ, σu     and θ respectively. In practice we
                                            2
concentrate the likelihood function on (ρ, σu ), estimating the variance parameters and


                                               36
their covariance matrix ﬁrst. The estimator for θ and its covariance matrix then follows
from GLS.

           ˇ = V (ˇ
Estimating Z      uO )

Given the residual process u = σ (I − ρW )−1 ε, we have:

                                      Σ = EuuT = σ 2 CC T ,

                                                        −τ
where C = (I − ρW )−1 . Recall that for i = j , Wij = Tij  with T a (scaled) symmetric
matrix of distances between locations i and j , and Wii = 0. Partition the variance
matrix Σ according to in-sample locations (S ) and out-of-sample locations (O) so that
we can write:
                                        ΣSS ΣSO
                                Σ=                   .
                                       ΣOS ΣOO
Also, let Σ·S denote the ﬁrst ’column’ of this partition:

                                                   ΣSS
                                          Σ·S =            .
                                                   ΣOS

       Assuming normally distributed ε and survey error eS , it follows that:

                          E (u|uS + eS ) = Σ·S [ΣSS + DS ]−1 (uS + eS ),

where DS is the variance matrix of eS , established separately, based on the survey
design, and taken as a given matrix of parameters.10 The empirical counterpart is:

                                u        ˇ SS + DS ]−1 (ˆ
                                  ˇ · S [Σ
                                ˇ=Σ                             ˇ),
                                                        qS − XS θ

and is subject to model error – the topic of this sub-section.
   We approximate the MSE of u    ˇ using the Delta method. Let operator d() indicate
taking (inﬁnitesimal) deviations from the true parameter values. Then:

           d(ˇ      ˇ ·S [ Σ
             u) = d(Σ      ˇ SS + DS ]−1 )(ˆ       ˇ) + Σ
                                           qS − XS θ           ˇ SS + DS ]−1 d(ˆ
                                                        ˇ ·S [ Σ                        ˇ).
                                                                               qS − X S θ

To simplify notation, write A = Σ·S [ΣSS + DS ]−1 and b = q  ˆS − XS θ, with obvious
                    ˇ ˇ
derived meaning of A, b. Then, denoting actual deviations from true parameter values
  10                                                                       2
    Under the model assumptions DS stems from variation in XS and σε         . Alternatively, it could
                                              2
therefore be argued that DS is increasing in σ . We ignore this complication here.




                                                  37
by ∆, we get:

 ˇ ≈ E ∆(A
 Z               ˇ)T + EA∆(ˇ
                           b)∆(ˇ
         ˇ)bbT ∆(A                          ˇ)b∆(ˇ
                               b)T AT + E ∆(A    b)T AT + EA∆(ˇ      ˇ)T . (4)
                                                              b)bT ∆(A

In the expression above, symbols without accent indicate ‘true’ values, i.e, constants
with respect to the expectation operator. In actual computation they are replaced by
estimated values. The two last terms of the equation vanish asymptotically, because
∆(A ˇ) and ∆(ˇ
             b) are asymptotically independent with zero expectation.11
    The second term of equation (4) evaluates to:

                  b)∆(ˇ
              EA∆(ˇ                        ˇ)T X T AT ≈ AX
                                       ˇ)∆(θ
                      b)T AT = AXS E ∆(θ        S
                                                               ˇ)X T A
                                                        ˇ S V (θ
                                                                  S
                                                                     ˇT .

                                                        ˇ). Note that:
For the ﬁrst term of equation (4) we need to work out d(A

                                           1         1       1
                                     A=      2
                                               Σ·S [ 2 ΣSS + 2 DS ]−1
                                           σ        σ       σ

and that Σ/σ 2 does not depend on σ 2 . In what follows Σ∗ and DS∗
                                                                   will refer to Σ/σ 2 and
DS /σ 2 . Also, when there is no danger of confusion, we will omit accents from symbols.
It follows that:

        d(A) = d(Σ∗    ∗     ∗ −1       ∗     ∗     ∗ −1
                  ·S [ΣSS + DS ] ) = d(Σ·S )[ΣSS + DS ]  + Σ∗      ∗     ∗ −1
                                                            ·S d([ΣSS + DS ] ),


where: d(Σ∗
          ·S ) is the ·S sub-matrix of:


                                         dΣ∗ = d (I − ρW )−2 .

Likewise,

              d([Σ∗     ∗ −1       ∗     ∗ −1    ∗        ∗    ∗     ∗ −1
                  SS + DS ] ) = −[ΣSS + DS ] [d(ΣSS ) + dDS ][ΣSS + DS ] ,


with d(Σ∗                            ∗
        SS ) the SS sub-matrix of d(Σ ) given above. To evaluate these expressions, it
  11
    Independence follows from (i) the normality assumptions; (ii) linearity of ∆(A         ˇ) in (∆ (ˇ
                                                                                                     ρ) , ∆ σˇ2 )
                   ˇ          ˇ                                             2
and linearity of ∆(b) in ∆(θ); (iii) the information matrix I (θ, ρ, σ , τ ) of the data generating process is
block diagonal, I θ, ρ, σ 2 , τ = diag(I (θ) ; I (ρ, σ 2 , τ )). See e.g., Mardia, K. V.; Marshall, R. J. (1984).
“Maximum likelihood estimation of models for residual covariance in spatial regression”. Biometrika.
71 (1): 135–46.




                                                       38
can be veriﬁed that:

                           ∂    ∗
                             2
                               DS                     ∗
                                  = −σ −4 DS = −σ −2 DS
                          ∂σ
                            ∂ ∗
                               Σ = (CW C )C T + C (CW C )T ,
                           ∂ρ

where C = (I − ρW )−1 .
  Then we have:

                                            ∂ ∗
                                 ∆Σ∗ =          Σ ∆ρ
                                           ∂ρ
                                  ∗
                                ∆DS              ∗
                                        = −σ −2 DS ∆σ 2 ,

so that:

                  ∂ ∗
     (∆A)b =          Σ   [Σ∗        ∗ −1
                             SS + DS ] b ∆ρ
                 ∂ρ ·S
                                       ∂ ∗
                −Σ∗     ∗      ∗ −1
                  ·S [ΣSS + DS ] [        Σ               ∗
                                               ∆ρ − σ −2 DS ∆σ 2 ][Σ∗     ∗ −1
                                                                    SS + DS ] b
                                      ∂ρ SS
                 ∂ ∗                    ∗ −1 ∂
              =     Σ·S − Σ∗     ∗
                            ·S [ΣSS + DS ]     Σ∗      ∗     ∗ −1
                                                SS [ΣSS + DS ] b∆ρ
                 ∂ρ                         ∂ρ

                  + Σ∗    ∗     ∗ −1 −2 ∗  ∗     ∗ −1   2
                     ·S [ΣSS + DS ] σ DS [ΣSS + DS ] b∆σ .



    Finally, when working out E (∆A) bbT ∆AT , the (co)variance between ∆ρ and ∆σ 2
is taken from the corresponding components of the covariance matrix of the estimators
for the model’s variance parameters.

                  T
Estimating E [ro rO ]

                               ˆS ), the idiosyncratic variance ErrT is consistently esti-
Finally, using r = u − E (u|X, q
mated by:
                           V (r) = Σ ˇ −Σ        ˇ SS + DS ]−1 Σ
                                          ˇ ·S [ Σ             ˇ S. .



Appendix 2: Source of Remote Sensing Variables
We describe below the data sources and provide an overview of the speciﬁc variables
that are constructed for the remote-sensing based analysis in this paper.
   Initial conditions:
   The capital is distinct from other areas due to its centrality of political decisions and
public disbursement. We measure the bird-ﬂight (Euclidean) distance from the center

                                            39
point of each administrative area to the coordinates of the capital (from the Global
Insights (v 6.1) data), which we call dist2cap. Likewise, we calculate distance from the
center point of each administrative area to the border (dist2border ) using the boundary
ﬁle provided by the NSO.
    In consideration of the initial conditions of administrative areas for regional trade, we
construct distance to major rivers (dist2major river ) and distance to coast (dist2coast ).
Also here we use the center point of the administrative areas as the points of origin and
destination. Major rivers and coastline data are from Natural Earth12 .
    In addition, we consider the terrain with regards to the elevation, slope and rugged-
ness. The elevation data are from the Shuttle Radar Topography Mission (SRTM)
Digital Elevation dataset (version 4) and is measured in meters at 3 arc-seconds and 30
arc-seconds (approximately 1km2 ) (Jarvis et al. 2008). We summarize mean and max:
elev and elev max30. We derive elevation in populated places using a 1% threshold level,
which is approximately 10 people per km2 from the Landscan 2012 dataset (described in
the population count and density section). We take advantage of the slope function on
Google Earth to process the 3 arc-seconds elevation data into a slope variable: slope.13
Figure A2.1 (left panel) illustrates the variation of slope, where the Great Rift Valley is
seen to run from North to South. The Ruggedness Index we use is developed by Nunn
and Puga (2012) which measures the level of ruggedness in hundreds of meters of eleva-
tion diﬀerence for each grid cell. We summarize the pixel mean (rugged ) and maximum
(rugged max ) of this ruggedness index evaluated over the entire administrative areas
as well as over populated areas (rugged popmask and rugged max popmask ). Populated
areas are deﬁned as areas with any population in the Landscan 2012 population model
(described in population count and density section below). Figure A2.1 (right panel)
shows the spatial variation of the log of rugged.
    Agriculture:
    Agriculture and arable land can indicate higher economic growthpotential, or con-
versely, imply a lack of urbanization and infrastructure. Agricultural activities provide
evidence of economic activity, which in turn may indicate a presence of water infras-
tructure. In order to measure agricultural land-use we use the Global Hybrid dataset
(0611-2012 V2) produced by Fritz et al. (2015), which estimates the percentage share of
land used for agriculture within a one square kilometer pixel. Expert assessment of ﬁve
existing global land cover products along with national and subnational crop statistics
as inputs provides the likelihood of a pixel indicating agricultural land use. By mul-
  12
     Large    scale   (1:10    million)   vector   data   are   available   for    download   from:
http://www.naturalearthdata.com/downloads/
  13
     Similarly, Verdin et al. (2007) derive slope measures at 30-arc-seconds derived from the 3-arc-
seconds SRTM data.


                                                40
    Figure 8: Map of ln slope (left panel) and map of ln ruggedness (right panel)

tiplying this percentage of agricultural likelihood by the total pixel area, we obtain a
measure of total agricultural land use area in a pixel. We then aggregate these to obtain
agricultural land-use data at the administrative level. The total share of agricultural
land-use within the total administrative land area is labeled sh ag land. Furthermore,
we consider irrigated and rainfed crop area data from GFSAD1000 V1.0, which are de-
rived from a variety of sources including: multi-sensor remote sensing data, secondary
data and ﬁeld-plot data (Teluguntla et al. 2015). We process these data in Google
Earth Engine and divide by the area to obtain: rain crop sh and irrigated crop sh (see
Figure A2.2). We also consider the Global Map of Irrigation Areas (version 5.0), which
provides the area equipped for irrigation expressed in hectares per cell.
    Measures of greenness consider the intensity or density of vegetation within a dis-
trict over time, which identiﬁes areas of vegetation including agricultural land and forest
cover. Greenness may be aﬀected by land fertility, irrigation, precipitation and climate.
The most common measure of greenness is the Normalized Diﬀerentiated Vegetation


                                            41
Figure 9: Map of distance to major river (left panel) and map of share of rainfed crop
(right panel)

Index (NDVI), which is derived from remote sensing data. NDVI values range between
-1 (indicating complete absence of vegetation) and 1 (indicating the greatest intensity
of vegetation). We examine two sources of NDVI data. The ﬁrst source is hosted by
Google Earth Engine at an annual level, which is the result of a composite of L1T
orthorectiﬁed scenes from Landsat 7 using the computed top-of-atmosphere (TOA)
reﬂectance.14 We derive ﬁve measures that summarize the data for 2008 at the admin-
istrative area level: minimum (ndvi min ), mean (ndvi mean ), median (ndvi median ),
standard deviation (ndvi sd ), and maximum (ndvi max ). Figure A2.3 (left panel) illus-
trates the greenness across Malawi with high values in the North near Lake Malawi and
in the South in the Shire River catchment. The second source of NDVI data is provided
by the U.S. National Aeronautics and Space Administration (NASA) Global Inventory
Modeling and Mapping Studies (GIMMS v.3) at a bi-monthly frequency. It measures
 14
      For more details on the TOA computation, see Chander et al. (2009)



                                                 42
greenness over 8 square kilometers pixels (Zhu et al. 2013).15 Similarly, we derive ﬁve
measures that summarize the data for 2010 at the administrative area level: minimum
(ndvi min ), mean (ndvi mean ), median (ndvi median ), standard deviation (ndvi sd ),
and maximum (ndvi max ). We also consider a similar greenness measure EVI. We use
the EVI MODIS BRDF-corrected imagery (MCD43B4) monthly product available on
Google Earth Engine as a result of a gap-ﬁlled method by Weiss et al. (2014). We
calculate the monthly mean for 2008 and summarize it at the administrative level.




       Figure 10: Map of median NDVI (left panel) and map of PET (right panel)

   The Food and Agricultural Organization (FAO) and IIASA extract agricultural
data from the Global Agricultural Ecological Zones (GAEZ v3.0) (Fischer et al. 2012).
The database includes estimates of yield for various crops (e.g. cotton), inputs, water
sources (rain-fed or irrigated). It also includes aggregated information on Net Primary
Production, total crop production value per hectare, and total crop production value.
The FAO also produces the Food Insecurity, Poverty and Environment Global GIS
 15
      Jim Tucker (NASA) kindly provided us the data.


                                                43
database (FGGD). The FGGD 2005 data includes a suitability index for rain-fed crops
measured at a spatial resolution of 5 arc-minutes (10 km) (Van Velthuizen et al. 2007).16
    We also include the production of plant biomass using the measurement of Net Pri-
mary Productivity (NPP), which is the result of all the carbon used in photosynthesis
minus the loss of carbon from the maintenance respiration. Google Earth Engine pro-
vides the cloud-corrected MODIS product (MOD17A3) data at 1km spatial resolution
summarized at the annual level from the 45 8-day Net Photosynthesis (PSN) products.
Evaluating the mean level at the administrative level yields the variable npp.
    Naidoo and Iwamura (2007) derive a global map of gross economic rents from agri-
cultural lands (in 2000 US dollars per hectare per year) by integrating spatial infor-
mation on crop productivity, livestock density, and producer prices from FAOSTAT
(http://faostat.fao.org; accessed June 2005). We extract these economic rents by pixel
and summarize the mean over the administrative unit (agr opp cst ).
    Climate:
    Climate provides the initial conditions to agricultural production and human settle-
ment. Increasing change in climate will arguably aﬀect conditions to support agricul-
tural production. In a low production scenario, Hertel et al. (2010) ﬁnd that the prices
for major staples rise 10–60% by 2030, which would have heterogenous impacts on the
poor depending on local changes in agricultural production and the sources of income
for households in the area. In the case of drought and urban ﬂoods, Winsemius et al.
(2015) ﬁnd evidence that the poor already are overexposed to natural hazards of ﬂoods
and droughts and are vulnerable to changes in climate.
    BioClim is a global gridded database at 30 arc-second (1km2) spatial resolution of
bioclimatic variables that are derived from monthly temperature and rainfall values
from cleaned weather station data and elevation. The bio-physical model interpolates
these meteorological variables and provides derivative variables of seasonality over the
base period of 1960 to 1991 (Hijmans et al. 2005)17 . We extract these 1km2 pixels for
annual mean temperature (bio1 in degrees Celsius * 10), temperature seasonality (bio4
in standard deviation * 100), annual precipitation (bio12 in mm * 10), precipitation
seasonality (bio15 in coeﬃcient of variation * 10), precipitation of wettest quarter
(bio16 in mm * 10) and precipitation in driest quarter (bio17 in mm). We aggregate
the data to obtain mean values for each administrative area. In Figure A2.4 (left
panel), we see the eﬀect of elevation in the higher temperature of the Great Rift Valley
  16
     We modiﬁed negative values that represent pixels classiﬁed as urban, closed forest or irrigated to
equal zero for the mean calculations by administrative area.
  17
     For the interpolation of noisy multi-variate meteorological data, the authors use thin plate smooth-
ing splines from the ANUSPLIN program with latitude, longitude, and elevation as independent vari-
ables.



                                                   44
compared to higher elevation areas. Figure A2.4 (right panel) shows a higher level of
temperature seasonality in the South compared to the North. We also use measures from
a daytime and nighttime product that is derived from MODIS land surface temperature
data (MOD11A2) and a gap-ﬁlling algorithm by Weiss et al. (2014): oxmap ngtlst
and oxmap ngtlst. The diﬀerence between the nighttime and daytime land surface
temperature is stored in the variable delta lst08.




Figure 11: Map of annual mean temperature (bio1) (left panel) and map of temperature
seasonality (bio4) (right panel)

   The Extraterrestrial Solar Radiation (ET solrad) monthly data (Zomer et al. 2008)
provide a measure of evaporation.18 We summarize mean values (at the administrative
area level) for the months of January, April, July and October. Likewise, from the
same website, we make use of annual average Potential Evapotranspiration data that
  18
     ETRA (mm/month) data are calculated using the methodology by Allen et al. (11998) and are
available from the International Food Policy Research Institute (IFPRI) website http://www.cgiar-
csi.org/data/global-aridity-and-pet-database (accessed 2017-03-06) and see Global Geospatial Poten-
tial EvapoTranspiration & Aridity Index: Methodology and Dataset Description for additional details.


                                                45
measures the amount of evaporation that would occur given suﬃcient water resources
over the period 1950 to 2000. This variable is labeled pet and plotted in Figure A2.3
(right panel).
    In addition, we extract an aridity index and potential evapo-transpiration from
models created by Trabucco and Zomer (2009). The models use data from WorldClim
(Hijmans et al. 2005) as input parameters. The global mean Aridity index is Mean
Annual Precipitation divided by Mean Annual Potential Evapo-Transpiration for the
period from 1950 to 2000. High values of this index represent humid conditions, while
low values represent arid conditions. A generalized climate classiﬁcation scheme by
UNEP (1997) suggests: hyper arid and arid are values less than 0.2, semi-arid has
values 0.2-0.5 and dry sub-humid and humid have values above 0.5.19
    Environment:
    The environment provides context to human settlements and surrounding economic
activities. Land cover provides important information on the extent and type of human
activity. We use the MODIS land cover products to select the crop and urban areas
from the Annual International Geosphere-Biosphere Programme (IGBP) classiﬁcation.
These data area are the result of both supervised classiﬁcations of MODIS Terra and
Aqua reﬂectance data and subsequent post-processing reﬁnements for speciﬁc classes
from prior knowledge and ancillary information. We select the crop and urban classes:
crop and urb. The forest data are from Hansen et al. (2013) version 1.6 that provide
both the stock of the forest from 2000, which is deﬁned as “Tree canopy cover for year
2000, deﬁned as canopy closure for all vegetation taller than 5m in height”, and forest
loss, which is deﬁned as “a stand-replacement disturbance (a change from a forest to
non-forest state)”: forloss and forest.
    To measure pollution, we use the ﬁne particulate matter (PM 2.5) of air pollution
data estimated from a model that excludes dust and sea-salt particles (van Donkelaar
et al., 2015). The model uses geographically weighted regressions along with monitor-
ing and satellite data, including: aerosol composition and land use information. We
obtain the mean (pm25 ), sum (pm25 sum) and standard deviation (pm25 sd ) for each
administrative unit.
    Socio-economic: Night time lights
    Night time lights (NTL) data, the visible light emitted from earth at night captured
by satellites, are found to correlate strongly with population density and economic
activity (e.g. Ebener et al., 2005; Henderson et al., 2012). The NTL high gain data
come from the Defense Meteorological Satellite Program DMSP-OLS, which are made
 19
   See Table 2 in CGIAR-CSI Global Aridity Index (Global-Aridity) and Global Potential Evapo-
Transpiration (Global-PET) Climate Database (2009)



                                             46
publicly available by NOAA20 . Speciﬁcally, we use the radiance calibrated data that is
available for the years 1996, 2005 and 2010. The variables used in our study sum all of
the NTL that is emitted within an administrative level. Although the 1996 and 2010
night time lights are from diﬀerent sensors, we observe an increase in the spatial extent
of light areas (Figure A2.5).




Figure 12: Map of ln night time lights (1996) (left panel) and map of ln night time
lights (2010) (right panel)

    Socio-economic: Population count and density
    Because of the relative eﬃciency in the construction of infrastructure to support
service delivery in higher population dense urban areas compared to rural areas, ur-
  20
     NOAA DMPS-OLS has another night time light data product that provides a radiance correction
and corrects for the top-coding problem, which is present in the high gain night time lights data.
Corrections by Elvidge et al. (2013) and Wu et al. (2013) provide corrections due to the lack of inter-
calibration of the satellite, however we are only interested in cross-sectional data. Currently, the low
gain data are not available at a consistent annual basis necessary for this time-series analysis. Image
and data processing by NOAA’s National Geophysical Data Center. DMSP data collected by US Air
Force Weather Agency.


                                                  47
ban areas play an important role in providing service delivery, which in turn inﬂuences
poverty. Diﬀerent measures of urbanity have been proposed in the literature (e.g Dijk-
stra and Poelman, 2014; Aubrecht et al., 2016; Roberts et al., 2017). We take advantage
of geospatial subnational gridded global data to measure travel time to major cities,
the agglomeration index, population count, and built-up area.
    We construct population density and totals from two main sources of global model
population distribution data: Landscan (e.g. Bright et al., 2013) and GHS-Pop (Freire
et al., 2016).21 Landscan is a model that uses dasymetric and spatial modeling methods
to estimate population. The model takes advantage of multiple sources of spatial mi-
crodata including land cover, road, slopes and urban areas data, and population census
data. Figure A2.6 illustrates the spatial distribution of the total population across the
enumeration area.
    GHSpop data are derived from a raster-based dasymetric model estimating global
population. Using newly processed LandSat data to provide multiple periods in time
(1975, 1990, 2000 and 2014), this model leverages newly available built-up data (GHS-
BUILT) to deﬁne the locations where the GHSpop allocates population. The estimates
of population are from the highest number of spatial units available to account for the
total population known across subnational areas. We construct the following variables of
total population within the administrative unit: GHS poptot 1975, GHS poptot 1990,
GHS poptot 2000, and GHS poptot 2014. GHS-BUILT at 300m spatial resolution is
the source of built-up area within each administrative unit from which we derive the
variables: GHSL built km2 1975, GHSL built km2 1990, GHSL built km2 2000, and
GHSL built km2 2014.
    The share of urban population within an administrative unit is constructed as fol-
lows. We ﬁrst deﬁne urban population as the number of people within a given threshold
of density from the 2014 GHS population grid (ghspop14 ) and the Landscan 2012 pop-
ulation grid (lpop12 ). Next we use the World Urban Prospects estimates of the urban
share at the country level provided by the World Bank World Development Indicators
to inform a population density threshold that sums to the estimated urban share at the
country level22 . Finally, we construct the share of urban population that meet these
density criteria within the administrative unit as the urban share of total population
(urban lpop12 ).
    Socio-economic: Travel time and accessibility
    Transport infrastructure enables improved economic growth by lowering costs of
moving goods and people. Nelson (2008) derives the travel time from all land area to
 21
    For a recent review of gridded population datasets see Leyk et al. (2019).
 22
    Another method of country consistent urban rates from World Urbanization Prospects is the
iUrban method by Aubrecht et al. (2016).


                                             48
major cities, which are deﬁned by a city size of 50,000 people or greater (circa 2000).
They use global roads data with a travel time model based on road functional class.
Remote sensing data are the input to the terrain classiﬁcation. We extract these data at
30 arc-seconds (approximately 1km2 ) and label the variable tt50k2000 mean. Following
the methodology by Uchida and Nelson (2008), Berg et al. (2017) construct a travel
time model using circa 2010 data of global roads and land cover (tt50k2010 mean ).
Figure A2.6 plots the mean travel time to major cities which highlights transportation
infrastructure network between cities such as Blantyre.




Figure 13: Map of ln population (2012), source: Landscan (left panel) and map of ln
travel time to major cities (ca. 2010) (right panel)

    Uchida and Nelson (2008) developed an Agglomeration Index (AI) which measures
the area of land that satisﬁes the following criteria: (1) travel time of 60 minutes or less
to a city of 50,000 people or greater, and (2) population density of at least 150 people
per square kilometer. (They use the model of travel time from Uchida and Nelson (2008)
and the population model from Landscan 2004.) We sum the data at the administrative


                                            49
area level and label it ai50k2000. Roberts et al. (2017) provide an updated AI circa
2010 using the newer travel time model and the population model from Landscan 2012
(ai50k2010). We construct the variable of the total population in the AI for each period
and, subsequently, the share of population in an AI out of the total population of the
administrative area, respectively ai50k2000 lpop04 and ai50k2010 lpop12.


Appendix 3: ELL (log) consumption model

                                                      Coeﬃcient    Std. Err.
           Intercept                                  10.5881      (0.1237)
           EA M HHD HAS BICYCLE                       -0.1779      (0.0595)
           EA M HHD HAS CAR                           1.2011       (0.2212)
           EA M HHD HAS PHONE                         0.2678       (0.0836)
           EA M HHD HAS TV                            -0.4604      (0.1471)
           EA M HOH RESID 5YR                         0.1522       (0.0797)
           EA M HOH TR SENA                           -0.3658      (0.0513)
           EA M HOH TR TUMBUKA                        0.1242       (0.0555)
           EA R CHLDBRN LSTYR FEM 30 45               0.2403       (0.0688)
           HHD AGE 00 05 ALL 0                        0.1749       (0.0199)
           HHD AGE 00 05 ALL 1                        0.0722       (0.0155)
           HHD AGE 06 15 ALL 00                       0.0715       (0.0207)
           HHD AGE 06 15 ALL 01                       0.0649       (0.0167)
           HHD AGE 65 ALL 0                           0.0352       (0.0214)
           HHD COOK WOOD OTHER 1                      -0.1877      (0.0276)
           HHD EDU SECONDARY 0                        -0.0814      (0.0261)
           HHD FLOOR SOFT 1                           -0.1754      (0.0203)
           HHD HAS BICYCLE 1                          0.098        (0.0136)
           HHD HAS COOKER 1                           0.1738       (0.0467)
           HHD HAS FRIDGE 1                           0.2897       (0.0444)
           HHD HAS ITN 1                              0.1469       (0.0126)
           HHD HAS RADIO 1                            0.1934       (0.0134)
           HHD HAS TV 1                               0.2631       (0.027)
           HHD OCCUPIED 06 15 0                       0.034        (0.0154)
           HHD ROOF GRASS 1                           -0.1072      (0.0181)
           HHD ROOMS 01                               -0.2334      (0.0236)
           HHD ROOMS 02                               -0.1608      (0.019)


                                          50
                             Coeﬃcient   Std. Err.
HHD ROOMS 03                 -0.0815     (0.0182)
HHD SIZE 01                  1.2754      (0.0375)
HHD SIZE 02                  0.7781      (0.0336)
HHD SIZE 03                  0.5297      (0.0273)
HHD SIZE 04                  0.3551      (0.0223)
HHD SIZE 05                  0.21        (0.0196)
HHD SIZE 06                  0.1046      (0.0187)
HHD TOIL NONE 1              -0.2377     (0.032)
HHD TOIL PIT 1               -0.1515     (0.0262)
HHD WALL MUDBRCK 1           -0.0376     (0.0146)
HOH AGE LN                   -0.0597     (0.0236)
HOH EDU PRIMARY 1            0.1332      (0.0208)
HOH EDU SECONDARY 1          0.1766      (0.0346)
HOH EDU SOMEPRIM 1           0.1047      (0.0157)
HOH EDU SOMESEC 1            0.2069      (0.023)
HOH EDU TERTIARY 1           0.2779      (0.0455)
HOH OCCUPIED 1               0.117       (0.0194)
HOH OCC PRIVATE 1            -0.0605     (0.0209)
HOH OCC SELFAGRO 1           -0.0588     (0.0146)
HOH RELIG MUSLIM 1           0.0513      (0.0203)
STRATA 1                     0.3166      (0.0796)
STRATA 2                     0.4951      (0.0722)
STRATA 3                     0.5221      (0.0714)
STRATA 4                     -0.0796     (0.0465)
STRATA 5                     0.2127      (0.0272)
TA REMOTE100K                -0.017      (0.0098)
 URBAN$HHD HAS BICYCLE 11    -0.125      (0.0353)
 URBAN$HHD HAS RADIO 11      -0.0983     (0.0339)
 URBAN$HHD ROOF GRASS 11     0.1054      (0.0475)
 URBAN$HHD ROOMS 101         -0.4033     (0.0682)
 URBAN$HHD ROOMS 102         -0.2877     (0.0563)
 URBAN$HHD ROOMS 103         -0.2806     (0.0543)
 URBAN$HHD ROOMS 104         -0.1993     (0.0535)
 URBAN$HHD WALL MUDBRCK 11   -0.0967     (0.0386)




                    51
                                                    Coeﬃcient      Std. Err.
              Table 9: ELL household (log) consumption regression
              model



                                  Coeﬃcient            Std. Err.
                    Intercept     -4.3293              (0.0726)
                    STRATA 1      0.0265               (0.1417)
                    STRATA 2      -0.2575              (0.1435)
                    STRATA 3      -0.2739              (0.1332)
                    STRATA 4      0.2134               (0.0946)
                    STRATA 5      0.2481               (0.064)
                    TA REMOTE100K -0.062               (0.0262)
                      Table 10: ELL heteroskedasticity model




References
                                                               ¨
Alderman, H., Babita, M., Demombynes, G., Makhatha, N. and Ozler, B. (2002). How
  low can you go? combining census and survey data for mapping poverty in south
  africa. Journal of African Economies, 11, number 2, 169–200.

Allen, R. G., Pereira, L. S., Raes, D. and Smith, M. (1998). Crop evapotranspiration-
  guidelines for computing crop water requirements. Irrigation and drainage paper 56.
  FAO, Rome.

Anselin, L. (2001). Spatial econometrics, A companion to theoretical econometrics,
 310330 edn.

Araujo, M., Ferreira, F., Lanjouw, P. and Ozler, B. (2008). Local inequality and
  project choice: Theory and evidence from ecuador. Journal of Public Economics, 92,
  1022–1046.

Aubrecht, C., Gunasekera, R., Ungar, J. and Ishizawa, O. (2016). Consistent yet
 adaptive global geospatial identiﬁcation of urban–rural patterns: The iurban model.
 Remote Sensing of Environment, 187, 230–240.

Baird, S., McIntosh, C. and Ozler, B. (2013). The regressive demands of demand-driven
  development. Journal of Public Economics, 106, 27–41.



                                         52
Bazzi, S. (2017). Wealth heterogeneity and the income elasticity of migration. American
  Economic Journal: Applied Economics, 9, 219–255.

Bedi, T., Coudouel, A. and Simler, K. (2007). More than a pretty picture: using poverty
  maps to design better policies and interventions. World Bank Publications.

Bell, K. and Bockstael, N. (2000). Applying the generalized-moments estimation ap-
  proach to spatial problems involving microlevel data. Review of Economics and Statis-
  tics, 82, 72–82.

Berg, C., Blankespoor, B., Li, Z. and Selod, H. (2017). Travel time to major cities.
  mimeo. The World Bank.

Blumenstock, J., Cadamuro, G. and On, R. (2015). Predicting poverty and wealth
  from mobile phone metadata. Science, 350, 1073–1076.

Blumenstock, J. E. (2016). Fighting poverty with data. Science, 353, 753–754.

Bosco, C., Alegana, V., Bird, T., Pezzulo, C., Bengtsson, L., Sorichetta, A. and Wetter,
  E. (2017). Exploring the high-resolution mapping of gender-disaggregated develop-
  ment indicators. Journal of The Royal Society Interface, 14.

Bourguignon, F. (2017). The globalization of inequality. Princeton and Oxford: Prince-
  ton University Press.

Bright, E. A., Rose, A. N. and Urban, M. L. (2013). LandScan 2012: High Resolution
  Global Population Data, US Department of Energy: http://www. ornl. gov/landscan
  edn. UT-Battelle, LLC. Oak Ridge National Laboratory.

Burke, M., Driscoll, A., Lobell, D. and Ermon, S. (2021). Using satellite imagery to
  understand and promote sustainable development. Science, 371.

Chander, G., Markham, B. L. and Helder, D. L. (2009). Summary of current radiometric
 calibration coeﬃcients for landsat mss, tm, etm+, and eo-1 ali sensors. Remote
 Sensing of Environment, 113, number 5, 893–903.

Crost, B., Felter, J. and Johnston, P. (2014). Aid under ﬁre: Development projects
  and civil conﬂict. American Economic Review, 104, 1833–1856.

Demombynes, G. and Ozler, B. (2005). Crime and local inequality in south africa.
  Journal of Development Economics, 76, 265–292.




                                          53
Dijkstra, L. and Poelman, H. (2014). A harmonised deﬁnition of cities and rural
  areas: the new degree of urbanisation. Regional Working Paper. European Comission
  Directorate-General for Regional and Urban Policy.

Donaldson, D. and Storeygard, A (2016). The view from above: Applications of satellite
 data in economics. Journal of Economic Perspectives, 30, number 4, 171–198.

Ebener, S., Murray, C., Tandon, A. and Elvidge, C. C. (2005). From wealth to health:
  modelling the distribution of income per capita at the sub-national level using night-
  time light imagery. International Journal of Health Geographics, 4, number 1.

Elbers, C., Fujii, T., Lanjouw, P., Ozler, B. and Yin, W. (2007). Poverty alleviation
  through geographic targeting: How much does disaggregation help? Journal of
  Development Economics, 83, number 1, 198–213.

Elbers, C., Lanjouw, J. and Lanjouw, P. (2003). Micro–level estimation of poverty and
  inequality. Econometrica, 71, number 1, 355–364.

Elbers, C., Lanjouw, J. and Lanjouw, P. (2005). Crime and local inequality in south
  africa. Journal of Economic Geography, 5, 101–118.

Elbers, C., Lanjouw, P., and Leite, P. (2008). Brazil within brazil: Testing the poverty
  mapping methodology in minas gerais. Policy Research Working Paper 4513. The
  World Bank.

Elbers, C. and van der Weide, R. (2014). Estimation of normal mixtures in a nested
  error model with an application to small area estimation of poverty and inequality.
  Policy Research Working Paper 6962. The World Bank.

Elvidge, C.C., Hsu, F. C., Baugh, K. E. and Ghosh, T. (2013). National Trends in
  Satellite Observed Lighting: 1992-2012, O. Weng (ed), Global Urban Monitoring and
  Assessment Through Earth Observation edn. CRC Press.

Engstrom, R., Hersh, J. and Newhouse, D. (2021). Poverty from space: Using high-
  resolution satellite imagery for estimating economic well-being. World Bank Eco-
  nomic Review.

Ferreira, F. H. G., Chen, S., Dabalen, A. L., Dikhanov, Y. M., Hamadeh, N., Jolliﬀe,
  D. M., Narayan, A., Prydz, E. B., Revenga, A. L., Sangraula, P., Serajuddin, U.
  and Yoshida, N. (2015). A global count of the extreme poor in 2012 : data issues,
  methodology and initial results. Policy Research Working Paper 7432. The World
  Bank.

                                          54
                                                              oth, G., Van Velthuizen, H.,
Fischer, G., Nachtergaele, F. O., Prieler, S., Teixeira, E., T´
  Verelst, L. and Wiberg, D. (2012). Global agro-ecological zones (gaez v3. 0) - model
  documentation. Technical Report. FAO, Rome.

Freire, S., MacManus, K., Pesaresi, M., Doxsey-Whitﬁeld, E. and Mills, J. (2016).
  Development of new open and free multi-temporal global population grids at 250 m
  resolution, Geospatial Data in a Changing World; Association of Geographic Infor-
  mation Laboratories in Europe (AGILE) edn. AGILE.

Fritz, S., See, L., McCallum, I., You, L., Bun, A., Moltchanova, E., Havlik, P. and Co.
  (2015). Mapping global cropland and ﬁeld size. Global change biology, 21, number
  5, 1980–1992.

Fujii, T. (2010). Micro-level estimation of child undernutrition indicators in cambodia.
  World Bank Economic Review, 24, number 3, 520–553.

Fujii, T. and van der Weide, R. (2020). Is predicted data a viable alternative to real
  data? World Bank Economic Review.

Gibson, J., Le, T. and Kim, B. (2017). Prices, engel curves, and time-space deﬂation:
  Impacts on poverty and inequality in vietnam. World Bank Economic Review, 31,
  504–530.

Goldberger, A. S. (1962). Best linear unbiased prediction in the generalized linear
 regression model. Journal of the American Statistical Association, 57, 369–375.

Hansen, M.C., Potapov, P.V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina,
  A., Thau, D., Stehman, S.V., Goetz, S.J., Loveland, T.R. and Kommareddy, A.
  (2013). High-resolution global maps of 21st-century forest cover change. Science,
  342, 850–853.

Henderson, C. (1953). Estimation of variance and covariance components. Biometrics,
  9, 226–253.

Henderson, J. V., Storeygard, A. and Weil, D. N. (2012). Measuring economic growth
  from outer space. American Economic Review, 102, number 2, 994–1028.

Hertel, T. W., Burke, M. B. and Lobell, D. B. (2010). The poverty implications
  of climate-induced crop yield changes by 2030. Global Environmental Change, 20,
  number 4, 577–585.

Hijmans, R. J., Cameron, S., Parra, J., Jones, P. G., Jarvis, A. and Richardson, K.
  (2005). WorldClim, version 1.3 edn. University of California, Berkeley.

                                           55
Imran, M., Stein, A. and Zurita-Milla, R. (2014). Investigating rural poverty and
  marginality in burkina faso using remote sensing-based products. International Jour-
  nal of Applied Earth Observation and Geoinformation, 26, 322–334.

Jarvis, A., Reuter, H. I., Nelson, A. and Guevara, E. (2008). Hole-ﬁlled seamless SRTM
  data, version 4, available at: http://srtm. csi. cgiar.org edn. International Center for
  Tropical Agriculture (CIAT).

Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B. and Ermon, S. (2016).
  Combining satellite imagery and machine learning to predict poverty. Science, 353,
  790–794.

Kelejian, H. and Prucha, I. (2010). Speciﬁcation and estimation of spatial autore-
  gressive models with autoregressive and heteroskedastic disturbances. Journal of
  Econometrics, 157, 53–67.

Kilic, T., Serajuddin, U., Uematsu, H. and Yoshida, N. (2017). Costing household
  surveys for monitoring progress toward ending extreme poverty and boosting shared
  prosperity. Policy Research Working Paper 7951. The World Bank.

Lee, K. and Braithwaite, J. (2022). High-resolution poverty maps in sub-saharan africa.
  World Development, 159.

Leyk, S., Gaughan, A. E., Adamo, S. B., Sherbinin, A. D., Balk, D., Freire, S. and
  Comenetz, J. (2019). The spatial allocation of population: a review of large-scale
  gridded population data products and their ﬁtness for use. Earth System Science
  Data, 11, number 3, 1385–1409.

Maloney, W. and Caicedo, F. (2015). The persistence of (subnational) fortune. Eco-
 nomic Journal, 126, 2363–2401.

Marx, B., Stoker, T. and Suri, T. (2019). There is no free house: Ethnic patronage in
 a kenyan slum. American Economic Journal: Applied Economics, 11, 36–70.

Molina, I. and Rao, J. (2010). Small area estimation of poverty indicators. Canadian
 Journal of Statistics, 38, 369–385.

Naidoo, R. and Iwamura, T. (2007). Global-scale mapping of economic beneﬁts from
  agricultural lands: Implications for conservation priorities. Biological Conservation,
  140, number 1-2, 40–49.




                                           56
Nelson, A. (2008). Travel time to major cities: A global map of Accessibility, Global En-
  vironment Monitoring Unit edn. Joint Research Centre of the European Commission,
  Ispra Italy.

Nhu, C. and Noy, I. (2020). Measuring the impact of insurance on urban earthquake
 recovery using nightlights. Journal of Economic Geography, 20, 857–877.

NSO (2012). Third integrated household survey 2010-2011 - ﬁnal report. Technical
 Report. Malawi National Statistical Oﬃce (NSO) and The World Bank.

Nunn, N. and Puga, D. (2012). Ruggedness: The blessing of bad geography in africa.
 Review of Economics and Statistics, 94, number 1, 20–36.

Perez-Heydrich, C., Warren, J. L., Burgert, C. R. and Emch, M. (2013). Guidelines on
  the use of DHS GPS data edn. ICF International.

Pinkovskiy, M. and Sala-i Martin, X. (2016). Lights, camera ... income! illuminating the
  national accounts-household surveys debate. The Quarterly Journal of Economics,
  131, number 2, 579–631.

Pokhriyal, N. and Jacques, D. C. (2017). Combining disparate data sources for improved
  poverty prediction and mapping. Proceedings of the National Academy of Sciences,
  114, E9783–E9792.

Pratesi, M. and Salvati, N. (2008). Small area estimation: the eblup estimator based
  on spatially correlated random area eﬀects. Statistical Methods and Applications, 17,
  113–141.

Ravallion, M. (2018). Inequality and globalization: A review essay. Journal of Economic
  Literature, 56, number 2, 620–642.

Roberts, M., Blankespoor, B., Deuskar, C. and Stewart, B. (2017). Urbanization and
  development: is latin america and the caribbean diﬀerent from the rest of the world?
  Policy Research Working Paper 8019. The World Bank.

Searle, S., Casella, G. and McCulloch, C. (1992). Variance components. New York:
  Wiley.

Steele, J. E., Sundsøy, P. R., Pezzulo, C., Alegana, V. A., Bird, T. J., Blumenstock, J.
  and Hadiuzzaman, K. N. (2017). Mapping poverty using mobile phone and satellite
  data. Journal of The Royal Society Interface, 14.



                                           57
Tarozzi, A. (2011). Can census data alone signal heterogeneity in the estimation of
  poverty maps? Journal of Development Economics, 95, 170–185.

Tarozzi, A. and Deaton, A. (2009). Using census and survey data to estimate poverty
  and inequality for small areas. Review of Economics and Statistics, 91, 773–792.

Teluguntla, P., Thenkabail, P., Xiong, J., Gumma, M.K., Giri, C., Milesi, C., Ozdo-
  gan, M., Congalton, R., Tilton, J., Sankey, T.R., Massey, R., Phalke, A. and Yadav,
  K. (2015). Global Cropland Area Database (GCAD) derived from Remote Sensing in
  Support of Food Security in the Twenty-ﬁrst Century: Current Achievements and Fu-
  ture Possibilities, Vol. II edn. Land Resources: Monitoring, Modelling, and Mapping,
  Remote Sensing Handbook edited by Prasad S. Thenkabail. In Press.

Trabucco, A. and Zomer, R. J. (2009). Global aridity index (global-aridity) and global
  potential evapo-transpiration (global-PET) geospatial database. CGIAR Consortium
  for Spatial Information.

Uchida, H. and Nelson, A. (2008). Agglomeration index: towards a new measure of
  urban concentration. Background paper for the World Bank’s World Development
  Report 2009. The World Bank.

Van der Weide, R. (2014). Gls estimation and empirical bayes prediction for linear
  mixed models with heteroskedasticity and sampling weights: A background study for
  the povmap project. Policy Research Working Paper 7028. The World Bank.

Van der Weide, R., Rijkers, B., Blankespoor, B. and Abrahams, A. (2018). Obstacles
  on the road to palestinian economic growth. Policy Research Working Paper 8385.

Van Donkelaar, A., Martin, R. V., Brauer, M. and Boys, B. L. (2015). Use of satel-
  lite observations for long-term exposure assessment of global concentrations of ﬁne
  particulate matter. Environmental health perspectives, 123, number 2, 135–143.

Van Veelen, M. and van der Weide, R. (2008). A note on diﬀerent approaches to index
  number theory. American Economic Review, 98, 1722–1730.

Van Velthuizen, H., Huddleston, B., Fischer, G., Salvatore, M., Ataman, E., Nachter-
  gaele, F. O., Zanetti, M. and Bloise, M. (2007). Mapping biophysical factors that
  inﬂuence agricultural production and rural vulnerability. Woking Paper 11. FAO,
  Rome.

Watmough, G. R., Atkinson, P. M., Saikia, A. and Hutton, C. W. (2016). Understand-
 ing the evidence base for poverty–environment relationships using remotely sensed
 satellite data: an example from assam, india. World Development, 78, 188–203.

                                         58
Watmough, G. R., Marcinko, C. L., Sullivan, C., Tschirhart, K., Mutuo, P. K., Palm,
 C. A. and Svenning, J. C. (2019). Socioecologically informed use of remote sensing
 data to predict rural household poverty. Proceedings of the National Academy of
 Sciences, 116, number 4, 1213–1218.

Weiss, D. J., Atkinson, P. M., Bhatt, S., Mappin, B., Hay, S. I. and Gething, P. W.
 (2014). An eﬀective approach for gap-ﬁlling continental scale remotely sensed time-
 series. ISPRS Journal of Photogrammetry and Remote Sensing, 98, 106–118.

Winsemius, H. C., Jongman, B., Veldkamp, T. I., Hallegatte, S., Bangalore, M. and
 Ward, P. J. (2015). Disaster risk, climate change, and poverty: assessing the global
 exposure of poor people to ﬂoods and droughts. Technical Report.

Wu, J., He, S., Peng, J., Li, W. and Zhong, X. (2013). Intercalibration of dmsp-ols
 night-time light data by the invariant region method. International journal of remote
 sensing, 34, 7356–7368.

Zhao, X., Yu, B., Liu, Y., Chen, Z., Li, Q., Wang, C. and Wu, J. (2019). Estimation
  of poverty using random forest regression with multi-source data: A case study in
  bangladesh. Remote Sensing, 11, number 4.

Zhu, Z., Bi, J., Pan, Y., Ganguly, S., Anav, A., Xu, L. and Myneni, R. (2013). Global
  data sets of vegetation leaf area index (lai) 3g and fraction of photosynthetically
  active radiation (fpar) 3g derived from global inventory modeling and mapping studies
  (gimms) normalized diﬀerence vegetation index (ndvi3g) for the period 1981 to 2011.
  Remote sensing, 5, number 2, 927–948.

Zomer, Robert J, Trabucco, Antonio, Bossio, Deborah A and Verchot, Louis V (2008).
  Climate change mitigation: A spatial analysis of global land suitability for clean
  development mechanism aﬀorestation and reforestation. Agriculture, ecosystems &
  environment, 126, number 1-2, 67–80.




                                          59