The World Bank Economic Review, 36(2), 2022, 382–412
https://doi.org/10.1093/wber/lhab015
Article
Downloaded from https://academic.oup.com/wber/article/36/2/382/6333255 by LEGVP Law Library user on 08 December 2023

Poverty from Space: Using High Resolution Satellite Imagery for Estimating Economic Well-being

Ryan Engstrom, Jonathan Hersh, and David Newhouse

Abstract

Can features extracted from high spatial resolution satellite imagery accurately estimate poverty and economic well-being? The present study investigates this question by extracting both object and texture features from satellite images of Sri Lanka. These features are used to estimate poverty rates and average expected log consumption taken from small-area estimates derived from census data, for 1,291 administrative units. Features extracted include the number and density of buildings, the prevalence of building shadows (proxying building height), the number of cars, length of roads, type of agriculture, roof material, and several texture and spectral features. A linear regression model explains between 49 and 61 percent of the variation in average expected log consumption, and between 37 and 62 percent for poverty rates. Estimates remain accurate throughout the consumption distribution, and when extrapolating predictions into adjacent areas, although performance falls when using fewer households to calculate estimates of poverty and welfare.

JEL classification: I32, C50
Keywords: poverty estimation, satellite imagery, machine learning, big data, inequality

Ryan Engstrom is an associate professor of geography at George Washington University in Washington, DC; his email address is rengstro@gwu.edu. Jonathan Hersh (corresponding author) is an assistant professor of economics and management science at Chapman University in Orange, CA, and may be reached at hersh@chapman.edu. David Newhouse is a Senior Economist at the Poverty and Equity Global Practice at the World Bank.
His email address is dnewhouse@worldbank.org. This project benefited greatly from the comments of two anonymous referees and discussions with Sarah Antos, Ana Areias, Marianne Baxter, Sam Bazzi, Azer Bestavros, Jacob Bien, Kristen Butcher, John Byers, Pedro Conceição, Francisco Ferreira, Ray Fisman, Michael Gechter, Alex Guzey, Klaus-Peter Hellwig, Kristen Himelein, Selim Jahan, Matthew Kahn, Tariq Khokhar, Kala Krishna, Hannes Mueller, Trevor Monroe, Dilip Mookherjee, Vivian Peng, Pierre Perron, Hashem Pesaran, Bruno Sánchez-Andrade Nuño, Kiwako Sakamoto, Jacob Shapiro, David Shor, Benjamin Stewart, Andrew Whitby, Nat Wilcox, Nobuo Yoshida, and seminar participants at Boston University, Chapman University, University of Southern California, Penn State, Princeton University, UNDP, The World Bank, and the Department of Census and Statistics of Sri Lanka. All remaining errors in this paper are the sole responsibility of the authors. Sarah Antos, Benjamin Stewart, and Andrew Copenhaver provided assistance with texture feature classification. Object imagery classification was assisted by James Crawford, Jeff Stein, and Nitin Panjwani at Orbital Insight, and Nick Hubing, Jacqlyn Ducharme, and Chris Lowe at Land Info, who also oversaw imagery pre-processing. Hafiz Zainudeen helped validate roof classifications in Colombo. Colleen Ditmars and her team at DigitalGlobe facilitated imagery acquisition, Dung Doan and Dilhanie Deepawansa developed and shared the census-based poverty estimates, and the authors thank Dr. Amare Satharasinghe for authorizing the use of the Sri Lankan census data. Liang Xu and Cady Stringer provided research assistance. Zubair Bhatti, Benu Bidani, Christina Malmberg-Calvo, Adarsh Desai, Nelly Obias, Dhusynanth Raju, Martin Rama, and Ana Revenga provided additional support and encouragement.
The authors gratefully acknowledge financial support from the Strategic Research Program and World Bank Big Data for Innovation Challenge Grant, and the Hariri Institute at Boston University. The views expressed here do not necessarily reflect the views of the World Bank Group or its executive board, and should not be interpreted as such.

© The Author(s) 2021. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

1. Introduction

Despite the best efforts of national statistics offices and the international development community, local area estimates of poverty and economic welfare remain rare. Between 2002 and 2011, as many as 57 countries conducted zero or only one survey capable of producing poverty statistics, and data are scarcest in the poorest countries (Serajuddin et al. 2015). But even in countries where data are collected regularly, household surveys are typically too small to produce reliable estimates below the district level. Generating welfare estimates for smaller areas requires both a household welfare survey and contemporaneous census data, and the latter are typically available only once per decade at best. Furthermore, in many conflict areas safety concerns may prohibit survey data collection altogether.

Satellite imagery has generated considerable enthusiasm as a potential supplement to household data that can help fill these data gaps. In recent years, private companies such as DigitalGlobe (now Maxar) and Airbus have rapidly expanded the coverage and availability of High Spatial Resolution Imagery (HSRI), driving down commercial prices.
The startup Planet currently operates hundreds of satellites with the goal of daily coverage of the entire planet at 3 to 5 meter spatial resolution per pixel. Continued technological advances are likely to further allow social scientists to benefit from this type of imagery, which has been utilized intensively by the intelligence and military communities for decades.

This paper investigates the ability of object and texture features derived from HSRI to estimate and predict poverty rates at local levels. The study area of this paper covers 3,500 square kilometers in Sri Lanka, which contain 1,291 local administrative areas known as Grama Niladhari (GN) divisions. The study employs a two-step methodology in which, for each GN, it extracts meaningful object and texture features from the satellite images, and then uses these features to model poverty, average income, or an asset index of an area. Object features extracted include the number of cars, number and size of buildings, type of farmland (plantation or paddy), the type of roofs, the share of shadow pixels (building height proxy), road extent and road material, along with contextual measures. These features are identified using a combination of deep learning–based Convolutional Neural Networks (CNN) and classification of spectral and textural characteristics. These satellite-derived features were then matched to household estimates of per capita consumption imputed into the 2011 Census for the 1,291 GN Divisions.

The article investigates five main questions: 1) To what extent can variation in GN economic well-being (headcount poverty rates defined at the 10th and 40th percentiles of national income, and average GN consumption) be explained by high spatial-resolution features? 2) Which features are most strongly correlated with these measures of well-being? 3) Do these features predict equally well in poor and rich GNs? 4) Can these models predict into areas different from those in which the model was estimated?
Finally, 5) how robust is the prediction model to using a smaller number of households and a single simulation per household to generate ground-truth measures of GN poverty and welfare?

The study finds that 1) satellite features are highly predictive of economic well-being and explain between 35 and 60 percent of the variation in both GN average consumption and estimated poverty headcount rates; 2) built-up area and roof type strongly correlate with welfare; car counts and building height are strong correlates in urban areas, while the share of paved roads and agricultural type are strong correlates in rural areas; 3) accuracy declines only slightly in the poorest decile of villages (average consumption of $4.67 per day); 4) predicting into geographically distinct areas sees a slight reduction in accuracy but remains relatively high; and 5) the predictive power of the model is highly sensitive to the number of households and the number of simulations used to generate the “ground truth” training data.

This paper contributes to a growing literature exploring how remotely sensed data may be used to assess economic outcomes. The first paper, to the authors’ knowledge, that combined satellite and survey data for prediction used daytime imagery from Landsat to predict the area under corn and soybean cultivation in 12 Iowa counties (Battese, Harter, and Fuller 1988). Since then, the most popular remotely sensed measure for economic applications has been night-time lights (NTL), which measures the intensity of light captured passively by satellite. Strong correlations between NTL and GDP appear at the country level (Elvidge et al. 1997; Henderson, Storeygard, and Weil 2012; Pinkovskiy and Sala-i-Martin 2016), although within a country NTL appears more strongly correlated with density than with welfare. The relationship between lights and wages or other measures of income appears weak (Mellander et al.
2013), casting doubt on its reliability as a proxy for small area estimates of economic activity. Additionally, NTL is ill-suited for identifying variation in welfare within small areas because of its low spatial resolution. Even the most advanced NTL satellite, the Visible Infrared Imaging Radiometer Suite (VIIRS), has a spatial resolution at nadir of approximately 1.0 km².¹ Indeed, this study finds that NTL captures only 20 percent of the variation in poverty or income in the same area where high resolution spatial features capture 35–60 percent of the variation.

Daytime imagery has recently re-emerged as a practical source of information on welfare, in large part due to new developments in computer vision algorithms. Advances in Deep Learning such as Convolutional Neural Networks (CNN) have the capability to algorithmically classify objects such as cars, building area, roads, crops, and roof type (Krizhevsky, Sutskever, and Hinton 2012). These objects may be more strongly correlated with local income and wealth than NTL. Furthermore, textural and spectral algorithms provide a simpler alternative to analyzing HSRI that does not rely on object classification (Graesser et al. 2012; Engstrom et al. 2015b; Sandborn and Engstrom 2016; Engstrom et al. 2019). In this approach, the spatial and spectral variations in imagery are calculated over a neighborhood of pixels to characterize the local-scale spatial pattern of the objects observed in the imagery. These measures, which this study refers to as contextual features, capture information about an area that may not be clear from object recognition alone.
This paper also contributes to a literature exploring how supervised learning techniques from machine learning may be applied to unstructured data, such as images or text, to reveal information about human welfare (Donaldson and Storeygard 2016; Athey 2017; Gechter and Tsivanidis 2018). Glaeser et al. (2015) apply texture-based machine vision classification to images that are captured from Google Street View, trained using subjective ratings of the images on the basis of perceived safety. They estimate a support-vector machine model and show that the fitted model can reliably predict block-level income in New York City. Jean et al. (2016) employ an innovative transfer learning approach, in which a set of 4,096 unstructured features are extracted from the penultimate layer of a convolutional neural network that uses Google Earth daytime imagery to predict the luminosity of NTL. These 4,096 features are then used to predict the average per capita consumption of enumeration areas (villages), taken from living-standard measurement surveys, using ridge regression to prevent overfitting. The resulting model explains an average of 46 percent of the variation in village per capita consumption, out of sample, across the four countries in which it was trained. Subsequent researchers estimated a direct end-to-end CNN to model poverty in Mexico (Babenko et al. 2017), Africa (Yeh et al. 2020), and Uganda (Ayush et al. 2020) without the NTL transfer learning stage.²

While Jean et al.’s innovative use of daytime imagery substantially improves on the use of night-time lights alone, there are two problems with its applicability to poverty measurement. First, extensions of this approach in Haiti and Nepal (Head et al. 2017) show declines in predictive performance, suggesting the NTL step in the transfer learning process may be ill suited for many areas, especially those that are primarily dark when viewed through NTL.
Second, the transfer learning method is not necessarily optimal for predicting very poor areas. When the top two quintiles are excluded from their sample, restricting the sample to those below twice the international poverty line, the R² falls to about 0.12. In contrast, this study’s method explains 48–55 percent of the variation in the poorest decile of villages.³ Head et al. (2017) comment on the transfer learning approach that it may be “possible that other approaches to feature engineering might be more successful than the brute force approach of the convolutional neural network.” This is precisely what this paper does.

1 Pixel size can vary depending on the angle of the satellite relative to the ground site.
2 This current paper is also distinguished from a previous conference proceeding, which only uses spatial features and restricts the analysis to Colombo (Engstrom et al. 2017b).

Other researchers have used CNNs to predict an asset index as the outcome variable. Yeh et al. (2020) train a CNN model directly to household survey data from 23 DHS surveys in Sub-Saharan Africa, using both daytime and night-time imagery. The prediction explains roughly 70 percent of survey-measured average wealth at the cluster level, out of sample. Furthermore, correlations between predictions at the district level and independent census data are only slightly weaker than the correlation between the survey data and the census data. Unlike this paper, however, Yeh et al. (2020) only evaluate the ability of satellite imagery to predict a measure of asset wealth, whereas this paper predicts log per capita consumption and poverty rates directly. In addition, Yeh et al. (2020) is limited to medium-resolution publicly available imagery from the Landsat satellite.
The present paper also differs in that its primary results are based on regression models of distinct interpretable features, and in that way is more similar to Ayush et al. (2020).

This paper differs in four significant ways from Jean et al. (2016), as well as related articles that predict differences in village wealth using satellite imagery such as Yeh et al. (2020). The first is that it demonstrates that satellite data can accurately predict spatial variation in local headcount poverty rates, in addition to mean per capita consumption. This is important for informing policies that explicitly seek to target areas characterized by high rates of poverty. Second, this paper predicts welfare and poverty rates using post-Lasso estimation, which generates unbiased estimators of welfare and poverty, instead of ridge estimation.

The third significant difference with the existing literature is that the measures of poverty and welfare used to train the prediction model are taken from model-based estimates of welfare and poverty rates derived from census data, rather than design-based estimates derived solely from household survey data. As a consequence, the headline results on predictive accuracy are not comparable to other papers in the literature that train models to cluster averages derived from household surveys (Jean et al. 2016; Yeh et al. 2020). In the Sri Lankan context, model-based estimates of GN-division poverty and welfare derived from the census are more precise than design-based estimates from the sample, for two reasons. First, the model-based estimates leverage census auxiliary data that contain data on far more households in each GN Division than a typical household survey. Second, the model-based estimates of per capita consumption and poverty rates are based on averages over 100 draws for each household from the distribution of unexplained welfare, which is assumed to be stochastic.
For poverty rates, this allows each household to be assigned an estimated probability of being poor, which prevents information loss that occurs when each household is dichotomously classified as either poor or non-poor no matter how far from the poverty line their measured welfare lies. For mean welfare, taking repeated simulations for each household virtually eliminates variation due to both unexplained shocks and measurement error in household welfare, and therefore provides a different indicator of longer-term predicted welfare. In the Sri Lankan context, the additional precision provided by incorporating both the full census data and 100 simulations per household substantially improves the predictive performance of the model. This illustrates that the explanatory power of prediction models, as measured by their R², is highly sensitive to how precisely the “ground truth” is measured.

3 This is not to say the method outlined in this paper is necessarily better than a CNN approach. The two contexts are dissimilar and the authors cannot make general claims about performance. However, the method here appears to perform better for poorer households in the sample relative to the method in Jean et al. (2016) for the poorest households in their sample.

The fourth significant difference from most of the existing literature, with the notable exception of Ayush et al. (2020), is the utilization of imagery features that are based either on recognizable objects or texture algorithms developed for computer vision applications. This method offers several advantages for the estimation of poverty rates. Interpretable features may provide a more transparent understanding of the underlying factors that explain geographic variation in welfare in different contexts. Additionally, features developed from HSRI, such as roads and the extent of built-up area, are useful for policy analysis in other areas such as transport and urban planning.
A feature-based approach can easily be extended to alternative welfare indicators, such as headcount poverty rates measured at different thresholds, without the extensive retraining that is necessary for some end-to-end deep learning methods. Finally, separating the satellite-based feature engineering from the poverty modeling stage may be a more feasible processing pipeline for economists and statisticians tasked with generating small-area estimates of poverty and welfare.

The paper proceeds as follows: section 2 summarizes how the data were created and presents brief summary statistics. Section 3 presents the statistical methodology. Section 4 presents the baseline models. Section 5 compares this method to a direct CNN approach or models using NTL features only. Section 6 examines the predictive power of these features with a household asset index, and examines robustness to spatially stratified cross-validation, that is, predicting into novel areas. Section 7 concludes.

2. Data Description

The analysis is restricted to a sample area of approximately 3,500 km² in Sri Lanka. National coverage was not feasible due to the high cost and only partial availability of high-resolution imagery.⁴ The study sampled DS Divisions conditional on HSRI being available, drawing areas from urban, rural, and estate sectors.⁵ According to the 2012 census, population by sector in Sri Lanka is rural (77.4 percent), urban (18.2 percent), and estate (4.4 percent) (Sri Lanka Department of Census and Statistics 2012). Population by sector in the sample is rural (45.9 percent), urban (46.2 percent), and estate (7.8 percent).

2.1 Details on Satellite Imagery

Figure 1 depicts the coverage area of our satellite imagery over a map of Sri Lanka.
The satellite imagery consists of 55 unique “scenes” purchased from DigitalGlobe (now Maxar), covering areas specified in the study’s sample area. Each scene is an individual image captured by a particular sensor at a particular time. Images were acquired by three different sensors: Worldview 2, GeoEye 1, and Quickbird 2. These sensors have a spatial resolution of 0.46 m², 0.41 m², and 0.61 m², respectively, in the panchromatic band and 1.84 m², 1.65 m², and 2.4 m², respectively, in the multispectral bands. Preprocessing of imagery included pansharpening, orthorectification, and image mosaicking. Most imagery was captured in either 2011 or 2012, although some imagery from 2010 was also used.⁶

2.2 Details on Poverty Data

Ideally village poverty and consumption statistics would be generated directly from the 2012/2013 Household Income and Expenditure Survey (HIES), a detailed survey that measures the consumption patterns of 25,000 households on approximately 400 consumption items. The survey contains an average of 8.4 households per GN Division in the 47 sampled DS Divisions, making GN Division poverty estimates that would be derived directly from the HIES imprecise. The study therefore draws on the simulations that were used to generate official DS Division poverty estimates, which draw on the 2011 Census of Population and Housing (Department of Census and Statistics and World Bank 2015). The study used these simulations to generate poverty rate and welfare estimates at the GN Division level.

The methodology to derive the poverty estimates follows the traditional method employed by the World Bank (Elbers, Lanjouw, and Lanjouw 2003).
For each household in the census, per capita consumption was estimated based on models developed from the HIES, using household indicators that are common to both the census and the HIES. Sixteen random effect models were estimated in the HIES, corresponding to different groupings of provinces.

Figure 1. Coverage Area of High Resolution Satellite Imagery
Source: Authors’ calculation using data derived from DigitalGlobe.
Note: Sample area shown highlighted in white.

4 These data are rapidly becoming more available and less expensive as companies such as Planet and DigitalGlobe expand their archives and launch newer, more precise satellites with more frequent revisit rates.
5 Sri Lanka classifies sectors as urban, rural, or estate. The estate sector refers to plantation areas of more than 20 acres with 10 or more residential laborers. Except for sample stratification, the estate sector is grouped with the rural sector.
6 More detail on the satellite imagery is provided in the supplementary online appendix.

The models estimated using the HIES data are:

\[ \ln W_{ic} = \beta X_{ic} + \eta_c + \varepsilon_{ic} \qquad (1) \]

where \(W_{ic}\) is the welfare, or per capita expenditure, of household \(i\) in cluster \(c\), \(X_{ic}\) is a vector of predictor variables common to the census and survey, \(\eta_c\) is a random cluster effect, and \(\varepsilon_{ic}\) is a household-specific error term, both of which are assumed to be normal. Feasible GLS is used to estimate the variance of the household-specific error term in order to account for heteroscedasticity. With estimates for \(\beta\), the variance of \(\eta_c\), and household-specific estimates of the variance of \(\varepsilon_{ic}\) in hand, welfare for each household is simulated 100 times.
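The simulation step built on equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simulate_log_welfare`, the array layout, and all numeric values are hypothetical, and the coefficient and variance estimates are taken as given rather than estimated by feasible GLS.

```python
import numpy as np

def simulate_log_welfare(X, beta, cluster_ids, sigma_eta, sigma_eps,
                         n_sims=100, seed=0):
    """Draw simulated log welfare for each household, following equation (1).

    X           : (n_households, k) predictors common to census and survey
    beta        : (k,) estimated coefficients
    cluster_ids : (n_households,) integer cluster index of each household
    sigma_eta   : standard deviation of the random cluster effect
    sigma_eps   : (n_households,) household-specific error standard deviations
    Returns an (n_sims, n_households) array of simulated ln W.
    """
    rng = np.random.default_rng(seed)
    n_clusters = cluster_ids.max() + 1
    # One cluster effect per cluster and simulation, shared by its households.
    eta = rng.normal(0.0, sigma_eta, size=(n_sims, n_clusters))[:, cluster_ids]
    # Household-specific errors, scaled by heteroscedastic standard deviations.
    eps = rng.normal(0.0, 1.0, size=(n_sims, X.shape[0])) * sigma_eps
    return X @ beta + eta + eps

# Toy inputs (hypothetical numbers, for illustration only).
X = np.array([[1.0, 0.2], [1.0, 0.8], [1.0, 0.5], [1.0, 0.1]])
beta = np.array([9.0, 0.5])
cluster_ids = np.array([0, 0, 1, 1])
sigma_eps = np.array([0.3, 0.3, 0.4, 0.4])

draws = simulate_log_welfare(X, beta, cluster_ids,
                             sigma_eta=0.1, sigma_eps=sigma_eps)
# Averaging over the simulations gives each household's expected log welfare.
expected_log_welfare = draws.mean(axis=0)
```

Keeping the full matrix of draws, rather than only the average, is what allows each household to be assigned a probability of falling below a given welfare threshold.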
Households are considered poor in each simulation if their simulated welfare falls below the poverty line, and the GN poverty rate equals the average of the poverty indicator across simulations and households for each GN. The procedure is described in more detail in Department of Census and Statistics and World Bank (2015). Similar methods can be used to derive the poverty gap, denoted as \(P_1\) and defined as the product of the headcount poverty rate and the average relative shortfall of poor households (Foster, Greer, and Thorbecke 1984):

\[ P_1 = \frac{1}{100 \times N} \sum_{s=1}^{100} \sum_{i=1}^{q} \frac{Z - W^{*}_{ics}}{Z} \qquad (2) \]

where \(N\) is the number of households in the GN, \(q\) is the number of poor households in the GN, \(Z\) is the poverty line, and \(W^{*}_{ics}\) represents simulated welfare for household \(i\) in cluster \(c\) in simulation \(s\). Therefore, in the baseline specification, the study averages poverty rates over the one hundred simulations. Later, as a robustness check, poverty rates are considered for only one imputation.

The study calculates GN headcount poverty rates and poverty for two poverty lines: poverty line 1 at the 10th percentile of the national per capita consumption distribution, and poverty line 2 at the 40th percentile. This is equivalent to $3.00 and $5.13 per day respectively in 2011 PPP terms, which is higher than the global extreme poverty line in 2011 prices of $1.90 per day.

Imputing welfare into the census requires an assumption of spatial homogeneity within small areas. This assumption “may severely underestimate the variance of the error in predicting welfare estimates at the local level in the likely presence of small-area heterogeneity in the conditional distribution of expenditure or income” (Tarozzi and Deaton 2009).
To test the extent of spatial heterogeneity in practice, small area estimates of poverty have been compared to census-based measures in Mexico and Brazil, which each collect income information in their census. Considerable spatial heterogeneity is present in Mexico.⁷ In contrast, Elbers et al. (2008) find significantly less in Minas Gerais, Brazil. The effect of spatial heterogeneity on the results presented is unclear. The authors are not aware of any empirical estimate of the extent to which the spatial homogeneity assumption leads to biased poverty headcount estimates at the local level. To the extent that any additional noise in the poverty estimates due to uncaptured heterogeneity in the coefficients is independent across neighboring households within a GN, this noise would be significantly reduced after averaging over a large number of households.

2.3 Comparison of GN Poverty Rates and Mean NTL Reflectance

A simple visual comparison between mean NTL and GN poverty rates illustrates why NTL provides limited information on subnational welfare. Figure 2 presents a panel of three images for the Western Province, Sri Lanka: mean raw NTL (left), poverty rates derived from the 10 percent national income threshold (middle), and log of mean population density (right). Comparing the left and middle panels, there is a modest association between villages that have low NTL reflectance and those that are high in poverty. Problems of overglow (Henderson, Storeygard, and Weil 2012) could result in poor villages adjacent to wealthy ones being misclassified as nonpoor. While NTL tracks the general contours of poverty for the DS (lower poverty areas in the northwest and higher poverty areas in the southeast), this coarse association is of only limited use for public policy. The statistical correlation between GN NTL and population density is equally modest, at about 0.30.
The study takes this to suggest that the information content contained within NTL related to human welfare is limited. While lights at night may indicate gross associations, they are an imperfect measure of welfare. The study therefore investigates whether the much richer set of information contained in HSRI daytime imagery translates into more accurate welfare predictions.

Figure 2. Comparison of Mean Night Time Lights (NTL), Poverty Rate, and Mean Population Density, Western Province, Sri Lanka. (a) Average night time lights (NTL). (b) Average headcount relative poverty rate using 10th percentile of national income. (c) Population density
Source: Authors’ analysis based on 2012/13 Sri Lankan HIES, 2012 Census, and VIIRS NTL.
Note: Average headcount relative poverty rate using 10th percentile of national income.

7 Simulations indicate that in 10 percent of municipalities, the coverage rate of the estimated poverty rate is less than 50 percent. In other words, in these 10 percent of municipalities, confidence intervals from simulations that estimate headcount rates exclude the true poverty rate in more than half the simulations.

2.4 Feature Extraction from High Resolution Satellites

The derived high-resolution spatial features fall into seven broad categories: (1) Agricultural Land, (2) Cars, (3) Building Density and Vegetation, (4) Shadows (building height proxy), (5) Road and Transportation, (6) Roof Type, and (7) Textural and Spectral characteristics. In addition to the satellite features, two geographic attributes of the GN Division are used: whether it is administratively classified as an urban area, and its area in square kilometers.⁸ Table 1 presents summary statistics for all variables.
Deep learning–based object classification was used to classify the share of the GN division that is built-up (i.e., consists of buildings), the number of cars in the GN, the share of pixels in the GN that were identified as shadow pixels (a proxy for building heights), and crop type. The classification method used is similar to Krizhevsky, Sutskever, and Hinton (2012), which utilizes convolutional neural networks (CNN) to build object predictions from raw imagery. Roof type, paved and unpaved roads of different widths, and railroads were classified using a combination of Trimble eCognition and Erdas Imagine software, utilizing a combination of support vector machines and visual identification. Classifier accuracy is greater than 90 percent for all of the objects recognized. Details on the extraction and classification process are provided in the supplementary online appendix, which includes an example ROC curve for buildings.

2.4.1 Object Classification Details

The agricultural land variables consist of the fraction of GN agriculture identified as paddy (rice cultivation) or plantation (cash crops such as tea). These sum to one hundred percent for GNs with agricultural land, so the excluded category in subsequent regressions is GN Divisions with no agricultural land. The study also calculated the fraction of total GN area that is either paddy, plantation, or any agriculture.

Figure 3 shows an example of a developed area building classification, with the raw image shown at the top and CNN classification accuracy shown below. On the bottom panel, true positives are highlighted green, with false positives highlighted red. Figure 4 shows a sample car classification. Cars that are positively identified are shown circled in blue. False negatives are most prevalent where there is considerable tree masking of pixels.

8 An urban indicator and area could in principle be calculated using remote sensing alone.

Table 1. Grama Niladhari Summary Statistics

| Variable | Mean | Sd | Min | Max |
|---|---|---|---|---|
| Economic well-being | | | | |
| Avg consumption in Rs | 10274.2 | 3052.7 | 4881.9 | 21077 |
| Avg log consumption | 9.19 | 0.28 | 8.49 | 9.96 |
| Rel. pov. rate at 10% nat. cons. | 0.0903 | 0.066 | 0.0023 | 0.39 |
| Rel. pov. rate at 40% nat. cons. | 0.332 | 0.16 | 0.035 | 0.8 |
| Geographic descriptors | | | | |
| log area (square meters) | 14.73 | 1.01 | 12.1 | 18 |
| = 1 if urban | 0.304 | 0.46 | 0 | 1 |
| province == [1] Western | 0.587 | 0.49 | 0 | 1 |
| province == [3] Southern | 0.255 | 0.44 | 0 | 1 |
| province == [6] North-Western | 0.0643 | 0.25 | 0 | 1 |
| province == [7] North-Central | 0.0155 | 0.12 | 0 | 1 |
| province == [8] UVA | 0.0782 | 0.27 | 0 | 1 |
| Agricultural land | | | | |
| % of GN area that is agriculture | 16.8 | 0.15 | 0 | 94 |
| % of GN agriculture that is paddy | 44.4 | 37.5 | 0 | 100 |
| % of GN agriculture that is plantation | 46.38 | 37.8 | 0 | 100 |
| % of Total GN area that is paddy | 8.629 | 10.9 | 0 | 74.7 |
| % of Total GN area that is plantation | 8.168 | 11 | 0 | 94.1 |
| Cars | | | | |
| log number of cars | 3.123 | 1.44 | 0 | 8.3 |
| Total cars divided by total road length | 0.00556 | 0.01 | 0 | 0.17 |
| Total cars divided by total GN Area | 0 | 0.00007 | 0 | 0.00093 |
| Building density and vegetation | | | | |
| % of area with buildings | 7.817 | 6.82 | 0.13 | 33.9 |
| % shadows (building height) covering valid area | 6.509 | 6.01 | 0.31 | 34.9 |
| Vegetation index (NDVI), mean, scale 64 | 0.427 | 0.21 | 0 | 0.86 |
| Vegetation index (NDVI), mean, scale 8 | 0.566 | 0.24 | 0 | 0.99 |
| Shadows | | | | |
| ln shadow pixels (building height) | 12.96 | 1.04 | 7.31 | 17.6 |
| ln number of buildings | 6.90 | 0.92 | 0 | 9.3 |
| Road variables | | | | |
| log of sum of length of roads | 9.445 | 0.94 | 1.47 | 13.1 |
| fraction of roads paved | 38.3 | 28.7 | 0 | 100 |
| ln length airport roads | 0.013 | 0.33 | 0 | 9.25 |
| ln length railroads | 1.098 | 2.67 | 0 | 10.8 |
| Roof type | | | | |
| Fraction of total roofs that are clay | 36.5 | 22 | 0 | 100 |
| Fraction of total roofs that are aluminum | 14.08 | 7.06 | 0 | 71.9 |
| Fraction of total roofs that are asbestos | 7.766 | 11.3 | 0 | 71.2 |
| Textural and spectral characteristics | | | | |
| Pantex (human settlements), mean | 0.627 | 0.54 | 0.02 | 2.94 |
| Histogram of oriented gradients (scale 64m), mean | 3509.4 | 2070.3 | 129.1 | 10381 |
| Linear binary pattern moments (scale 32m), mean | 49.5 | 1.1 | 18.1 | 49.5 |
| Line support regions (scale 8m), mean | 0.00836 | 0.004 | −2E-07 | 0.035 |
| Gabor filter (scale 64m), mean | 0.469 | 0.28 | 0.014 | 1.3 |
| Fourier transform, mean | 84.34 | 17.8 | 4.51 | 113.4 |
| SURF (scale 16m), mean | 12.06 | 7.77 | 0.13 | 31.6 |
| Observations | 1291 | | | |

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from DigitalGlobe imagery.

Figure 3. Example Developed Area (Buildings) Classification. (a) Raw satellite imagery. (b) Satellite imagery overlaid with building classification
Source: DigitalGlobe.
Note: Areas shown in green are true positive building classifications. Areas in red are false positives: areas erroneously classified as buildings.

Figure 4. Example Car Classification
Source: DigitalGlobe.
Note: Cars identified by convolutional neural network shown in blue.

Three car-related variables were calculated: the log total number of cars in a GN, total cars divided by total road length, and cars per square kilometer of the GN. The average GN Division in the sample contains 50 cars. However, there is wide dispersion, as the 99th percentile of the car count distribution is equal to 577 cars, and the maximum value is 4,000 cars. On the left side of the distribution, 136 out of 1291 GNs contain no cars.
Because the distribution is skewed, the study takes the log of the car count, while imposing a smooth function for GNs with zero or few cars.9

Building density variables include the fraction of an area covered by built-up area and the number of roofs identified. Built-up area captures any human settlements (buildings, homes, and so forth) regardless of use or condition. These are grouped with two measures of the Normalized Difference Vegetation Index (NDVI). Although NDVI is technically a spectral characteristic, the presence of vegetation in urban areas indicates development such as parks, trees, or lawns (i.e., areas that are not built up) within the urban environment. In the rural environment it also indicates undeveloped areas, and the values can aid in describing variations in agricultural type and productivity depending on the timing of the image acquisition.

The study includes two indicators that capture shadows of buildings: the log of the number of pixels classified as shadow and the fraction of shadows in a GN. The shadow variables use the angle of the sun as it shines on a building, and the shadows the building casts, to estimate the presence of tall buildings.10 The road variables are the log of total road length, the fraction of roads that are paved, the length of airport runway, and the length of railroad identified.

For roof type, the study calculates the fraction of roofs in a GN that are clay, aluminum, or asbestos. The omitted category is roofs identified as none of the above, the vast majority of which are gray cement roofs. Different roof materials exhibit different spectral properties, particularly in the subvisible bands of the spectrum. The roofs in the sample are clay (36.5 percent), aluminum (14.08 percent), asbestos (7.8 percent), or gray concrete (41.6 percent).
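To make the car-count transformation in footnote 9 concrete, the following minimal sketch implements one literal reading of that footnote, log(c + sqrt(c + 1)). The footnote's wording can be parsed in more than one way, so this functional form is an assumption, and the function name log_car_count is hypothetical. Under this reading a GN with no cars maps to 0 and a GN with 4,000 cars maps to roughly 8.3, which is in line with the minimum and maximum of the log car variable in table 1.

```python
import math


def log_car_count(cars: int) -> float:
    """Assumed reading of footnote 9: log(c + sqrt(c + 1)).

    The transform stays finite at zero cars (log(0 + 1) = 0) and
    approaches log(c) as the count grows.
    """
    if cars < 0:
        raise ValueError("car count must be non-negative")
    return math.log(cars + math.sqrt(cars + 1))


# Counts mentioned in the text: zero cars, the sample mean (50),
# the 99th percentile (577), and the maximum (4,000).
for count in (0, 50, 577, 4000):
    print(count, round(log_car_count(count), 3))
```

The appeal of a transform of this kind is that it avoids dropping the 136 GNs with no cars, which a plain logarithm would exclude.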
2.4.2 Details of Textural and Spectral Features (Contextual Features)

The study calculates seven separate types of contextual features: Fourier transform, Gabor filter, Histogram of Oriented Gradients (HoG), Local Binary Pattern Moments (LBPM), Line Support Regions (LSR), Pantex, and Speeded-Up Robust Features (SURF). These are often used in computer vision problems to decompose an image. They are intended to capture aspects of a neighborhood that are not so easily identified directly, including the presence of characteristics associated with slums such as many irregular building lines or high density. These features may be considered outputs from a dimension reduction technique, in that they are reduced-dimensionality descriptions of complex 2-D satellite imagery.

Because these measures may be novel to readers without backgrounds in remote sensing, further description may be helpful. The authors consider Pantex here to be a measure of human settlements. It is a spatial similarity index, where each cell is compared to adjacent cells in all directions. Open fields will have a low Pantex level, since cells in all directions have similar contrast, as will cells with straight roads. Dense cities with many buildings will have high Pantex values. HoG captures "local intensity gradients or edge directions" (Dalal and Triggs 2005) and in this context captures the intensity of lines of development or agriculture. LBPM captures local spatial patterns and gray scale contrast. SURF detects local features used for characterizing grid patterns, and measures the orderliness of building development, the opposite of which is typically referred to as a slum. Areas with right angles, corners, or regular grid patterns will have larger SURF values relative to areas with chaotic or irregular spacing. For more detail on imagery and the feature extraction process, see the supplementary online appendix.

9 The log car variable is calculated as the log of the sum of the car count and the square root of the car count plus one.
10 Valid area refers to areas at the foot of building where shadows may appear.

3. Statistical Methodology

Given the list of available covariates, variable choice is not obvious. Estimating a model with the full set of candidate variables in table 1 would likely produce predictions that are overfit, in the sense that they perform much better in-sample than out-of-sample (Athey and Imbens 2015). One attractive method for variable selection among a large selection of covariates is Lasso regularization. Lasso is a regularized regression that estimates a regression model with an added constraint that enforces parsimony (Tibshirani 1996). The motivation for the shrinkage estimator is that, by reducing the number of parameters in the model, it accepts some increase in bias in exchange for lower variance.

The baseline model is a "Post-Lasso" estimator (Belloni and Chernozhukov 2013). This two-step estimator first estimates a Lasso model over the full set of coefficients, followed by an OLS model over the set of non-zero coefficients from the Lasso step. The model that is estimated in the Lasso step is defined as

\[
\beta^{Lasso} = \arg\min_{\beta} \Bigg\{ \underbrace{\sum_{i=1}^{N} \Big( y_i - \sum_{j=1}^{K} x_{ij}\beta_j \Big)^{2}}_{\text{Residuals}} + \underbrace{\lambda \sum_{j=1}^{K} |\beta_j|}_{\text{Shrinkage factor}} \Bigg\} \qquad (3)
\]

where the poverty rate in a GN is given by $y_i$ and $\lambda \ge 0$ is a parameter that penalizes the absolute values of the coefficients. At the extreme, full relaxation of the penalization factor, that is, setting $\lambda$ to zero, yields unconstrained OLS estimates. Thus as $\lambda \to 0$, $\beta^{Lasso} \to \beta^{OLS}$. As $\lambda \to \infty$, the penalty increases and $\beta^{Lasso}$ converges to the zero vector.
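The two-step Post-Lasso logic can be illustrated with a small numerical sketch. It is not the study's code: the coordinate-descent solver, the function names, the simulated data, and the value of λ are all assumptions made for illustration. The solver minimizes the objective in equation (3) directly, so the soft-threshold is λ/2 because the residual term carries no factor of one half.

```python
import numpy as np


def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning zero inside [-t, t]."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Minimise sum((y - X b)^2) + lam * sum(|b|) by coordinate descent."""
    n, k = X.shape
    beta = np.zeros(k)
    col_ss = (X ** 2).sum(axis=0)  # sum of squares of each column
    for _ in range(n_iter):
        for j in range(k):
            # Residual with the contribution of variable j added back in.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
    return beta


def post_lasso(X, y, lam):
    """Step 1: Lasso selects variables. Step 2: OLS on the selected set."""
    beta_lasso = lasso_cd(X, y, lam)
    selected = np.flatnonzero(beta_lasso != 0.0)
    beta_ols = np.zeros(X.shape[1])
    if selected.size:
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        beta_ols[selected] = coef
    return beta_lasso, beta_ols, selected


# Simulated data (hypothetical): only the first two of eight
# standardised covariates affect the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
true_beta = np.array([1.5, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=200)
y = y - y.mean()

beta_lasso, beta_ols, selected = post_lasso(X, y, lam=100.0)
print("selected columns:", selected)
print("Lasso coefficients:", np.round(beta_lasso[selected], 3))
print("Post-Lasso OLS coefficients:", np.round(beta_ols[selected], 3))
```

In this simulation the Lasso coefficients on the selected variables are smaller in magnitude than the second-step OLS coefficients, which is the shrinkage that the second step is meant to undo.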
Lasso regression is useful as a variable selection methodology because the sharp ℓ1 penalty shrinks coefficients exactly to zero if the corresponding variables prove not to be useful in decreasing the sum of squared errors, thus creating a type of variable selection. At the same time, however, the Lasso "shrinks" the magnitude of coefficients towards zero, even for those that remain nonzero (Varian 2014). By subsequently estimating an OLS model in the second stage for variables that remain nonzero after the Lasso step, the study ensures that the coefficient estimates are unbiased (Belloni and Chernozhukov 2013).

To choose the appropriate value of λ, 10-fold cross-validation is applied, and the value of λ is chosen whose cross-validated root-mean-squared error (RMSE) lies within one standard error of the minimum across folds.11 GLM versions of the model, which ensure that predicted values lie between zero and one, do not change the results qualitatively and are available by request.

Inferential standard errors are typically absent from Lasso models. Because of the Oracle property of the Lasso estimator (Fan and Li 2001), the standard errors from the OLS model are used in the second stage as the measures of population inference. The Oracle property ensures that inference in the second stage using the reduced set of variables selected in the first stage is consistent with inference were the study to use a single stage estimation strategy using only the selected variables present in the true data-generating process (Belloni and Chernozhukov 2013).

4. Results

Table 2 presents the estimates from the main specification for the full sample. The first two columns show the model where GN poverty is defined at the lower poverty rate, the next two present the higher poverty rate models, and the last two present the models with average GN consumption as the dependent variable.
Many extracted satellite features have high explanatory power, including agriculture type, length of roads and fraction of roads paved, number and density of buildings, NDVI, roof type, shadows (building height proxy), and two spatial features, LBPM and Fourier transform. The models explain a high amount of the variation in poverty, summarized in the in-sample R-squared values between 0.608 and 0.618. Cross-validated R-squared values, estimated using tenfold cross-validation, vary between 0.588 and 0.605. Because the cross-validated values are close to the in-sample values, the models are not likely to be overfit to the data.

11 See Krstajic et al. (2014).

Table 2. Prediction of Local Area Poverty Rates Using High-Res Spatial Features
Models, in column order: Lower poverty rate (10% Nat. Inc.); Higher poverty rate (40% Nat. Inc.); Average log per capita consumption. Entries are coef [t].
log area (square meters)  0.020* [2.52]  0.0093 [0.60]  −0.0079 [−0.31]
= 1 if urban  −0.023 [−1.80]  −0.037 [−1.06]  0.08 [1.18]
% of GN area that is agriculture  −0.00025 [−1.04]  −0.00017 [−0.27]
% of GN agriculture that is paddy  −0.00033** [−2.97]  −0.00087** [−2.97]  0.0014** [2.92]
% of GN agriculture that is plantation  −0.00021** [−2.84]  −0.00059* [−2.66]  0.0012** [2.72]
% of Total GN area that is paddy  −0.00019 [−0.58]  −0.00083 [−1.10]  0.0016* [2.10]
Total cars divided by total road length  −0.31 [−1.17]
Total cars divided by total GN Area  29.6 [0.54]
log number of cars  −0.0059 [−0.89]  −0.015 [−1.39]  0.024 [1.60]
log sum of length of roads  −0.020*** [−3.64]  −0.027* [−2.32]  0.033 [1.67]
fraction of roads paved  −0.00035*** [−4.24]  −0.00079** [−3.24]  0.0014** [3.06]
ln length airport roads  −0.0051 [−1.45]  0.022 [1.52]
ln length railroads  0.00098 [1.31]  −0.0046 [−1.26]
% of area with buildings  −0.0027* [−2.31]  −0.0093* [−2.34]  0.020* [2.56]
log of total count of buildings in GN  −0.0090** [−2.71]  −0.019* [−2.05]  0.029 [1.70]
Vegetation index (NDVI), mean, scale 64  0.061* [2.20]  0.14** [2.94]  −0.21** [−2.93]
Vegetation index (NDVI), mean, scale 8  −0.064** [−2.80]
% shadows (building height)  0.0022* [2.04]  0.0064* [2.18]  −0.013* [−2.27]
ln shadow pixels (building height)  0.016* [2.51]  0.039* [2.64]  −0.047 [−1.95]
Fraction of total roofs that are clay  0.00077** [3.35]  0.0017** [3.25]  −0.0027** [−3.15]
Fraction of total roofs that are aluminum  0.00091*** [3.63]  0.0022** [3.15]  −0.0040** [−3.15]
Fraction of total roofs that are asbestos  −0.00033 [−1.08]
Linear binary pattern moments (scale 32m), mean  0.0021** [2.91]  0.0090*** [5.53]  −0.017*** [−5.92]
Line support regions (scale 8m), mean  −0.66 [−0.87]
Gabor filter (scale 64m), mean  −0.052 [−1.53]
Fourier transform, mean  0.0017** [3.42]
SURF (scale 16m), mean  −0.0014 [−0.94]  −0.001 [−0.59]  0.0034 [1.06]
Constant  −0.32** [−3.03]  −0.31 [−1.43]  10.1*** [29.9]
Observations  1291  1291  1291
R-sq  0.610  0.618  0.608
R-sq Adj.  0.602  0.613  0.602
Cross-validated R-sq  0.588  0.605  0.594
Cross-validated mean absolute error  0.032  0.078  0.139
Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Unit of observation is Grama Niladhari (GN) division. Variables were selected using Lasso regularization from the candidate set of variables shown in table 1. *p < 0.05, **p < 0.01, ***p < 0.001. Estimated standard errors may be biased downward due to the use of a common sample for model selection and estimation.

The results suggest that a simple linear model that includes only the geographic size of the GN Division, whether it is urban, and remotely sensed information explains 54–61 percent of the variation across GNs in headcount poverty rates. Figure 5 plots predicted consumption against true average GN consumption, with colors assigned by the province in which the GN is located.
A Lowess smoothing line is shown with its associated confidence interval. A perfect model would have predictions exactly on the 45° line. While there is noise, the predictions tend to straddle the 45° line, indicating a high degree of agreement between the predicted and true welfare values, although the model tends to under-predict for wealthier GNs.

Figure 5. Model Diagnostic Plot of Predicted against True Average GN Consumption
Source: Author's analysis based on 2012/13 HIES, 2012 Census, and Digital Globe.
Note: Units are in 2012 Sri Lankan Rs.

4.1 Discussion of Significance and Magnitude of Satellite Features

While the primary objective of this exercise is to obtain accurate predictions, the study also aims to shed light on which satellite features are most helpful in predicting poverty in this context. Additionally, table S2.1 in the supplementary online appendix presents estimated marginal coefficients. The size of the GN, in square kilometers, is more strongly correlated with headcount or average consumption, but is statistically significant only for the lowest poverty rate model. This suggests that households in the bottom decile are disproportionately found in larger GN Divisions. The presence of agricultural land is weakly and negatively associated with poverty, controlling for other characteristics of the GN, although the result is not statistically significant. Of the indicators related to the distribution of paddy vs. plantation land, Lasso selected three of the indicators for the 10 and 40 percent poverty incidence models, and two for the log consumption model.12 The results indicate a statistically significant negative relationship between the presence of paddy agricultural land and poverty, which is consistent with the relative deprivation of the tea plantation sector in Sri Lanka.
12 Since an increase in paddy land implies a reduction in agricultural land, for those GNs with agricultural land, the latter is subtracted instead of added when calculating the marginal effect.

Compared with land type, the association between poverty and cars is mildly stronger, although not statistically significant in any of the specifications. Length of roads, fraction of roads paved, and runways are negatively associated with poverty, though only the first two are statistically significant, while GNs with more railways are poorer. Building density is strongly associated with log welfare and poverty and is statistically significant in all specifications. Vegetation is moderately associated with poverty and strongly statistically significant. For the lower poverty line model, both NDVI measures are selected. The higher poverty line and log welfare models only include NDVI calculated over blocks of 64 pixels, suggesting that very high spatial resolution imagery may not be critical for generating informative measures of NDVI for prediction. Two measures of shadows, a proxy for building height, are selected (the share of valid area covered by shadows, and the log number of shadow pixels) and are statistically significant in most specifications.

Figure 6. Predicted versus True Welfare Measures, Average Consumption (top), 10 percent Poverty (middle), 40 percent Poverty (bottom). (a) Average household consumption. (b) Average headcount relative poverty rate using 10th percentile of national income. (c) Average headcount relative poverty rate using 40th percentile of national income.
Source: Author's calculations based on 2012/13 HIES, 2012 Census, and Digital Globe.
Note: Units are in 2012 Sri Lankan Rs.
For roof type, the Lasso procedure selects both the fraction of roofs classified as clay and the fraction classified as aluminum for all three models, and the post-Lasso model finds them strongly statistically significant. It also selects the fraction of roofs classified as asbestos for the lower poverty line model, although this is not statistically significant. The signs on clay and aluminum in the poverty regressions are positive, suggesting that these are generally inferior compared to the omitted category of grey concrete. This appears to be consistent with an analysis in Kenya that documents that roofs with greater luminosity, like aluminum, are associated with lower levels of poverty (Marx, Stoker, and Suri 2019).

Of the contextual features, five out of seven are selected for the 10 percent model (LBPM, LSR, Gabor, Fourier, and SURF). Of these, only LBPM and SURF are selected for the higher poverty line and log per capita consumption models. The coefficient on LBPM is strongly statistically significant in all of the specifications. The main exception is the mean of the Fourier transform, which is positively associated with poverty in the lower poverty line model, though the coefficient is not statistically significant. This is consistent with wealthier areas being laid out in a more orderly way, with more "right angles" in housing layouts.

Figure 6 presents a map showing the true welfare measures on the left panel, against the predicted welfare measures on the right, for Western Province, Sri Lanka. The top panel shows predicted welfare from the OLS model against actual welfare. The model is able to distinguish the poorer eastern areas from the richer western ones. Even poor GNs adjacent to richer ones can be distinguished; although the smallest GNs are less than a half mile across, the HRSF model is able to distinguish with considerable accuracy the variation in average consumption. The middle panel shows predicted and true poverty rates defined at the lower poverty line.
Again, the predicted model approximates the true poverty rates with considerable accuracy. The lower poverty regions in the south and north east are replicated in the predicted values. The model tends to underpredict poverty in the lowest poverty areas in the mid-west, suggesting that two-step or zero-inflated Poisson models may perform better.

Table 3. Shapley Decomposition of Share of Variance Explained (R2) by High Resolution Spatial Feature Subgroup
Columns, in order: Lower poverty rate (10% Nat. Inc.); Higher poverty rate (40% Nat. Inc.); Average log per capita consumption.
Area  10.4  8.3  8.4
Urban  9.4  9.7  10.8
Agricultural land  0.9  1.0
Paddy land  3.8  4.6  4.1
Cars  7.3  5.6  4.6
Building density  14.8  19.5  22.5
Vegetation  8.0  6.2  4.4
Shadows  14.4  14.1  14.0
Road variables  9.4  7.7  9.8
Roof Type  10.4  8.3  8.4
Texture variables  9.4  9.7  10.8
Observations  1291  1291  1291
R-sq  0.610  0.618  0.608
Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Agricultural variables include fraction agriculture plantation, fraction agriculture paddy, and fraction of GN area that is plantation. Car variables include log of car count, and cars per total road length. Building density variables include log of developed area, shadow count (building height proxy), fraction of GN developed, fraction covered by shadow, NDVI at scales 64 and 8. Road variables include log of unpaved road length, log of paved roads narrower than 5m, log of paved roads 5m+, log of airport roads, log of railroad length, and fraction of roads paved. Roof variables include count of roofs by type: clay, aluminum, asbestos, grey cement, and fraction of roofs of same type. Texture variables include Fourier series, Gabor, histogram of oriented gradients, Local Binary Pattern Moments mean and standard deviation, line support regions, and SURF.

In summary, predictive models based on an urban indicator, the size of the GN, and a host of features derived from satellite imagery predict poverty rates and mean log per capita consumption remarkably well. Greater numbers of cars are associated with lower poverty, although the relationship is not statistically significant; a denser road network and a larger share of paved roads are also associated with lower poverty. The indicators most strongly associated with poverty are building density and shadows. Shadows are positively associated with poverty, which suggests they are capturing variation in tree cover that is inversely related to building density. Consistent with this, areas characterized by more and lusher vegetation tend to be poorer. Clay and aluminum roofs, compared to grey roofs, are associated with greater levels of poverty. Of the contextual features, SURF exhibits a fairly strong association with poverty at the lower poverty line, suggesting that neighborhoods laid out in a more orderly way tend to be less poor. The following sections consider the robustness of these main findings.

4.2 Decomposition of Satellite Feature Explanatory Power

The results presented indicate that features derived from satellite imagery explain a large portion of village income or poverty, and that associations are particularly strong for measures of building density and shadows. However, these results do not address the question of which indicators account for the model's predictive power. To address this issue, the study decomposes the R2 using a Shapley decomposition (Israeli 2007; Huettner and Sunder 2012; Shorrocks 2013). This procedure calculates the marginal R2 of a set of explanatory variables, as the amount by which R2 declines when removing that set from the set of candidate variables.
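A Shapley decomposition of R2 over groups of variables can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the study's code: the group names, the simulated data, and the function names are hypothetical. Each group's share is its marginal R2 averaged over all subsets of the other groups with the standard Shapley weights, so the shares sum to the R2 of the full model.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R2 of an OLS regression of y on an intercept and the given columns."""
    if not cols:
        return 0.0
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def shapley_r2(X, y, groups):
    """Shapley share of R2 for each named group of column indices."""
    names = list(groups)
    k = len(names)
    shares = {}
    for g in names:
        others = [h for h in names if h != g]
        total = 0.0
        for size in range(k):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                base = [c for h in subset for c in groups[h]]
                gain = r_squared(X, y, base + groups[g]) - r_squared(X, y, base)
                total += weight * gain
        shares[g] = total
    return shares


# Simulated data (hypothetical): the "buildings" group drives most variation.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + X[:, 1] + X[:, 3] + rng.normal(size=300)

groups = {"buildings": [0, 1], "roads": [2], "texture": [3]}
shares = shapley_r2(X, y, groups)
full_r2 = r_squared(X, y, [0, 1, 2, 3])
for name, share in shares.items():
    print(f"{name}: {100 * share / full_r2:.1f} percent of R2")
```

Expressing each share as a percentage of the full-model R2, as in the last loop, gives figures that sum to 100 across groups.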
For a model with k sets of explanatory variables, the procedure will estimate 2^k − 1 models and average the marginal R2 obtained for each set of independent variables across all estimated models. This ensures that the variable's contribution to R2 is independent of the order in which it appears in the model.

Table 3 presents the R2 decomposition. The results confirm that measures of building density (built-up area, number of buildings, shadow pixels, and to a lesser extent vegetation) are powerful contributors to predictive power. Collectively, these three sets of variables account for 39 to 45 percent of the model's explanatory power. However, a number of other variables are moderately important. GN area, urban classification, road characteristics, roof type, and the texture variables each explain 8 to 12 percent of the variation. The car and agricultural variables explain a bit less than that, between 5 and 7 percent each.

4.3 Urban and Rural Linear Models

How does the relationship between indicators and welfare differ in urban and rural areas? Table 4 shows models estimated separately for the 393 urban villages and the 898 rural ones, based on Sri Lanka's official definition of urban and rural areas.13 Variables were again selected through Lasso estimation. The urban model selects fewer variables: 13 of the candidate variables are selected in the urban model versus 15 in the rural model. R-squared values are slightly higher in rural areas (0.656) and considerably lower in urban areas (0.446).14 For the urban model, log number of cars, built-up development, and shadow pixels are important. In rural models, agricultural variables, roof type, shadow pixels, NDVI, Pantex, and LBPM are important. The association between cars and poverty is significantly stronger in urban areas. In addition, the association between NDVI and poverty is strongly negative in rural areas, as rural areas with more vegetation and less built-up area are poorer. The coefficient on NDVI in urban areas, meanwhile, is positive and not statistically significant, suggesting that if anything wealthier urban GNs are characterized by a greater prevalence of lush vegetation.

13 This definition is based on administrative units and has not been updated in many years. As a result, some areas officially classified as rural have urban characteristics.
14 This might be due to the presence of de facto urban GNs in the rural sample. In addition, the nature of the consumption module in the HIES may better capture consumption in rural areas than in urban ones.

Table 4. Urban and Rural Models of Local Area Poverty Rates (10 percent Relative Poverty Line) using High Resolution Spatial Features
Models, in column order: Rural; Urban. Entries are coef [t].
% of GN area that is agriculture  0.12* [2.34]
% of GN agriculture that is paddy  0.00076** [3.11]  0.0002 [0.36]
log number of cars  0.019 [1.27]  0.085*** [5.73]
log area (square meters)  −0.033 [−1.43]
log of sum of length of roads  0.029+ [1.93]
fraction of roads paved  0.0012** [3.44]  0.0014+ [2.06]
ln length airport roads  0.044*** [6.59]
ln shadow pixels (building height)  −0.057** [−3.23]
Fraction of total roofs that are clay  −0.0041*** [−6.70]  0.0026 [1.41]
Fraction of total roofs that are aluminum  −0.0051*** [−5.63]  −0.0033+ [−1.84]
Fraction of total roofs that are asbestos  −0.0017* [−2.05]
log of Total count of buildings in GN  0.040** [3.53]  0.031 [0.77]
Vegetation index (NDVI), mean, scale 64  −0.27*** [−4.68]  0.28 [1.65]
Pantex (human settlements), mean  0.18*** [3.73]
Linear binary pattern moments (scale 32m), mean  −0.013*** [−10.7]
% of Total GN area that is plantation  −0.0058** [−3.20]
ln length railroads  −0.0052 [−1.50]
% of area with buildings  0.028*** [6.07]
% shadows (building height) covering valid area  −0.015** [−2.87]
Line support regions (scale 8m), mean  −1.4 [−0.34]
Fourier transform, mean  −0.0042+ [−1.96]
Constant  10.6*** [36.8]  8.89*** [27.6]
Observations  898  393
R-sq  0.6562  0.4464
R-sq adj.  0.6503  0.4274
Cross-validated R-sq  0.5716  0.4184
Cross-validated MAE  0.0327  0.02
+ p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001.
Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.

Table 5. Model Performance for Prediction of Average log per Capita Consumption at Different Points in the Welfare Distribution
Statistic  Bottom 20%  Bottom 40%  Bottom 60%  Bottom 80%  Full sample
Observations  259  517  775  1033  1291
R-sq  0.551  0.454  0.474  0.509  0.608
Adjusted R-sq  0.52  0.436  0.461  0.5  0.602
Cross-validated R-sq  0.487  0.425  0.447  0.475  0.595
Mean absolute error  0.064  0.0774  0.0909  0.115  0.139
Mean log p.c. income  8.83  8.95  9.00  9.09  9.16
Standard deviation  0.11  0.13  0.15  0.20  0.28
Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Table reports model performance statistics for the national model for different subsamples of the bottom portion of the GN Division welfare distribution. The dependent variable is average predicted log GN per capita consumption. The rightmost column is identical to the results reported in the right column of table 2.

4.4 Model Performance at Varying Income Levels

The model's ability to predict variation in headcount poverty rates at both poverty lines suggests that it can effectively distinguish between households within lower parts of the welfare distribution.
To verify this, the sample of GN Divisions is divided into quintiles based on the mean predicted per capita consumption of census households, and the main model for log per capita consumption is re-estimated on the subsamples containing the bottom 80, 60, 40, and 20 percent of the distribution. Model performance across income quintiles is shown in table 5. Overall, the model continues to predict well within the poorest subsamples, as the R-squared declines from 0.608 in the full sample to 0.551 (in-sample) and 0.487 (cross-validated) when only considering the bottom 20 percent. The poorest decile of GNs have an average welfare of $4.67 per day, a little more than double the international poverty line. This suggests that this approach for estimating welfare from high-resolution satellite images is accurate even for moderately poor contexts.

4.5 Correcting for Spatial Autocorrelation

One unaddressed concern is whether the presence of either spatial autocorrelation or spatial heterogeneity leads the standard errors to be underestimated. Spatial autocorrelation can occur in the presence of geographic spillovers or interactions (Anselin 2013), and considering the village-level observations one could develop plausible stories by which poverty is influenced by this mechanism. A Moran's I test for the presence of such disturbances according to Anselin (1996) rejects the null hypothesis that there is no spatial autocorrelation present. To correct for the spatial autocorrelation, this study explicitly models the spatial autoregression (SAR) process and allows for SAR disturbances, a so-called SARAR model. This is implemented via generalized spatial two-stage least squares (GS2SLS) as shown in Drukker, Prucha, and Raciborski (2013). The results presented in table 6 show that after correcting for spatial autocorrelation most high-resolution spatial features remain significant predictors of local-area poverty.
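For readers unfamiliar with Moran's I, the statistic itself can be sketched as follows. This is a minimal illustration that computes only the statistic on a vector of values (for example, regression residuals) given a spatial weights matrix; it omits the inference step of the test cited above. The chain-shaped weights matrix and the example values are hypothetical and are not taken from the study.

```python
import numpy as np


def morans_i(values, W):
    """Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z the demeaned values."""
    x = np.asarray(values, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)


def chain_weights(n):
    """Binary contiguity weights for n units arranged in a line."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return W


W = chain_weights(6)
# Neighbouring units with similar values give positive autocorrelation.
clustered = morans_i([1, 2, 3, 4, 5, 6], W)
# Neighbouring units with opposite values give negative autocorrelation.
alternating = morans_i([1, 6, 1, 6, 1, 6], W)
print(round(clustered, 3), round(alternating, 3))
```

Values of the statistic well above its expectation under no spatial dependence indicate that nearby units resemble each other, which is the pattern that motivates the spatial correction.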
Although there is some presence of autocorrelation, it is not sufficient to alter the joint significance of the spatial variables.

Table 6. MLE Estimation Correcting for Spatial Autocorrelation

Dependent variable: average log per capita consumption

Variable                                            coef          t
log area (square meters)                            −0.046***     [−4.01]
= 1 if urban                                        0.048+        [1.96]
% of GN area that is agriculture                    0.00022       [0.42]
% of GN agriculture that is paddy                   0.00046+      [1.74]
% of GN agriculture that is plantation              0.00076**     [3.09]
% of Total GN area that is paddy                    0.00057       [0.79]
Total cars divided by total road length             −0.93         [−1.20]
Total cars divided by total GN Area                 401.4*        [2.28]
log number of cars                                  0.020***      [3.57]
% of area with buildings                            0.0083***     [4.19]
log of total count of buildings in GN               0.012         [1.23]
Vegetation index (NDVI), mean, scale 64             0.071         [1.54]
Vegetation index (NDVI), mean, scale 8              −0.042        [−0.67]
log of sum of length of roads                       0.029**       [2.70]
Fraction of roads paved                             0.0012***     [6.00]
ln length airport roads                             0.0052        [1.50]
ln length railroads                                 −0.00092      [−0.48]
Fraction of total roofs that are clay               −0.0025***    [−5.83]
Fraction of total roofs that are aluminum           −0.0034***    [−4.92]
Fraction of total roofs are asbestos                0.0014*       [2.26]
Linear binary pattern moments (scale 32m), mean     −0.0080***    [−3.38]
Line support regions (scale 8m), mean               −1.25         [−0.71]
Gabor filter (scale 64m) mean                       −0.053        [−0.92]
Fourier transform, mean                             −0.0030***    [−3.61]
SURF (scale 16m), mean                              0.0052*       [2.24]
Constant                                            9.74***       [51.6]
Observations                                        1287

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Standard errors have been corrected according to Conley (1999), with model estimation via GMM. + p < 0.10, *p < 0.05, **p < 0.01, ***p < 0.001

4.6 Using Alternative Measures of Welfare as Ground Truth

The results so far have demonstrated that indicators derived from satellite imagery are strongly predictive of variation in welfare and poverty rates, using a measure of welfare that was simulated in the 2011 census. As the dependent variable, the baseline method uses the average welfare or poverty rate, taken across both all census households in each GN Division and the 100 simulations of predicted residuals. This average is then regressed on various features derived from satellite data. Because the dependent variable is an average taken over 100 simulations, it is a measure of expected poverty and welfare across both simulations and GN households. When estimating poverty rates, this procedure uses the estimated variances of both the cluster and household idiosyncratic components to incorporate the full distribution of potential outcomes into the measure of poverty rates, which allows each household to be assigned an estimated probability of being poor rather than a dichotomous classification of poor or non-poor. Averaging over the one hundred simulations per household therefore reduces the variance of the estimated poverty rates and the measure of expected per capita consumption, which raises the explanatory power of the satellite indicators.

An alternative would be to compare satellite-based predictions against simulated poverty and welfare. One way to test this is to use the estimated GN mean log welfare and poverty rate for only one of the 100 simulations instead of the average across all simulations. This eliminates the additional precision obtained by averaging results for average welfare and poverty across 100 simulations of the stochastic distribution of unexplained welfare. In addition, because the simulation process is based on census data, it can be used to shed light on an additional question: How sensitive is the model’s predictive performance to the number of households used to generate GN-level poverty and welfare estimates?

Table 7. R2 of Predicted Poverty and Welfare under Alternative Samples

                                                Lower poverty rate   Higher poverty rate   Average log per
                                                (10% Nat. Inc.)      (40% Nat. Inc.)       capita consumption
Expected poverty rate and welfare over 100 simulations
  Full census sample                            0.611                0.619                 0.609
  30 household census sample                    0.548                0.581                 0.551
  8 household census sample                     0.396                0.479                 0.485
Poverty rate and welfare using one simulation
  Full census sample                            0.373                0.469                 0.486
  30 household census sample                    0.265                0.400                 0.391
  8 household census sample                     0.167                0.273                 0.348
  Number of GN divisions                        1291
HIES subsample of GN divisions
  8 household census sample (one simulation)    0.174                0.319                 0.378
  HIES sample                                   0.217                0.259                 0.322
  Number of GN divisions                        425

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Cross-validated R2 reported. Unit of observation is Grama Niladhari (GN) division. Independent variables are identical to those used in table 2. Expected welfare refers to the average poverty rate or the average log per capita consumption averaged across both GN households and one hundred simulations.

In the results reported so far, the dependent variables (poverty and welfare) are calculated using simulations based on the full set of census data. On average, the census data contain approximately 500 households per GN Division. But census data are not typically available to calibrate models. In more typical settings, surveys are used that may contain 30 households per cluster, as in the Demographic and Health Surveys. In Sri Lanka’s case, the Household Income and Expenditure Survey (HIES) contains an average of eight households per GN Division in the sampled areas.
When log per capita consumption is predicted from census characteristics, the average R2 of the prediction models is 0.46 (Department of Census and Statistics 2015). Therefore, increasing the average number of households in each GN by a factor of about 60 more than makes up for the fact that the results are based on a model, and generates far less noisy estimates. This in turn improves the predictive performance of indicators derived from satellite imagery. Table 7 shows the extent to which the predictive performance of the model depends on both the number of households and the number of simulations used to generate the “ground truth” data used to train the model. The top row of table 7 reports the in-sample R2 when using expected welfare as the dependent variable, which is identical to the results reported in table 2. Subsequent rows report R2 values when fewer households are used to generate GN Division estimates, when one simulation instead of 100 is used per household, and when the sample is limited to GN Divisions present in the HIES survey. The bottom row shows R2 values when using consumption as measured in the HIES sample. The set of satellite indicators used to predict poverty and welfare is the same for each row, matching those used for the main results shown in table 2, so the only difference across rows is the dependent variable used in the regression.

Four main findings emerge from the table. First, the estimated R2 of the model is very sensitive to how the “ground truth” dependent variable is measured. Model R2 values range from 0.17 to 0.61 for 10 percent poverty rates, from 0.26 to 0.62 for 40 percent poverty rates, and from 0.32 to 0.61 for log mean consumption. The highest R2 values are obtained from the simulation model that averages across 100 simulations of the stochastic error terms for each household (expected welfare) and uses all households in the census.
In contrast, the lowest R2 values are obtained either by averaging over the observations in the HIES sample survey or, in the case of the 10 percent poverty measure, by using one simulation per household based on a census subsample containing eight randomly selected households per GN Division.

The second clear finding from table 7 is that using one simulation rather than one hundred simulations per household significantly decreases the predictive power of the satellite indicators, particularly when predicting the 10 percent poverty rate. With the full census, the fall in R2 is large, from 0.61 to 0.37. For the 40 percent poverty rate and mean consumption, the fall is noticeably smaller but still substantial, from 0.62 to 0.47 for 40 percent poverty and from 0.61 to 0.49 for mean log per capita consumption. Using 100 simulations to estimate the probability that each household is poor is particularly beneficial for generating accurate poverty rates at lower poverty thresholds, because it more accurately estimates the probability that households will be poor when their predicted per capita consumption, based solely on their census characteristics, lies just above the poverty line. For mean log per capita consumption, the higher R2 when using 100 simulations per household reflects the greater correlation between satellite indicators and expected log per capita consumption, since virtually all of the noise added back by the simulation procedure is averaged away across the 100 simulations.

Third, the use of smaller subsamples of the census to estimate poverty and welfare explains the rest of the range of R2 values.
When using one simulation, the estimated R2 for 10 percent poverty falls from 0.37 to 0.17 when using eight households per GN Division instead of the full census, and the comparable drop for 40 percent poverty rates is from 0.47 to 0.27. For mean consumption, however, the drop in R2 is smaller, from 0.49 to 0.35.

Finally, the explanatory power of the satellite variables is similar whether using the HIES sample itself or eight households per GN Division in the census sample and only one simulation. This remains true after limiting the estimation to the 425 GN Divisions included in the HIES sample. The only indicator for which the R2 when using the sample exceeds that of the simulation with eight households per GN and one simulation is the 10 percent poverty rate, in which case the R2 when using the sample is 0.22 and the R2 for the simulated poverty rate is 0.17. This suggests that, to the extent that the measure of consumption collected in the HIES survey data contains accurate information on household transient shocks, these are not captured by the satellite data either.

This study’s preferred estimates of GN-Division welfare and poverty rates remain the expected welfare and poverty rates based on averages across 100 simulations and the full set of census households, because these use all available information to generate local estimates of poverty and welfare. There are three important caveats that bear mentioning, however. The first is that expected welfare is a measure of predicted welfare that is largely free of measurement error and will not pick up transient shocks such as drought experienced by the village. It is therefore easier to predict using satellite imagery than average consumption taken from a typical household survey, as is reflected by the increase in R2 from 0.49 to 0.61 when moving from one to one hundred simulations.
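The mechanism behind these differences can be illustrated with a small synthetic experiment. The sketch below is not the study's simulation procedure: the data-generating process, the error standard deviations, and the function name are all assumptions chosen only to show how averaging over more simulations and more households removes noise from the dependent variable and raises the R2 obtained from a fixed predictor.

```python
import numpy as np

def r2_with_satellite_index(n_households, n_sims, n_gn=5000, seed=0):
    """Squared correlation between a synthetic index and simulated GN welfare."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_gn)                        # hypothetical satellite index
    mu = 0.5 * x + rng.normal(scale=0.4, size=n_gn)  # expected log welfare per GN
    # One cluster-level error per GN and simulation (assumed sd of 0.3).
    cluster = rng.normal(scale=0.3, size=(n_gn, n_sims))
    # The mean of n_households i.i.d. household errors (assumed sd of 1.0)
    # is drawn directly from its sampling distribution.
    household = rng.normal(
        scale=1.0 / np.sqrt(n_households), size=(n_gn, n_sims)
    )
    y = mu + (cluster + household).mean(axis=1)      # average over simulations
    return np.corrcoef(x, y)[0, 1] ** 2

r2_expected = r2_with_satellite_index(n_households=500, n_sims=100)
r2_one_sim = r2_with_satellite_index(n_households=500, n_sims=1)
r2_small = r2_with_satellite_index(n_households=8, n_sims=1)
print(f"{r2_expected:.2f} {r2_one_sim:.2f} {r2_small:.2f}")
```

Under these assumed parameters the ordering of the three values mirrors the qualitative pattern in table 7 (expected welfare highest, one simulation with eight households lowest), but the magnitudes carry no empirical meaning.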
A second caveat is that census-based imputations are not typically available to train prediction models, and if they are, there is no need to use geospatial data to generate predictions. However, these results do point to the importance of collecting high-quality training data. The type of training data that work well for design-based estimates may not be optimal for model-based estimates. For example, training data that cover a larger number of households in selected administrative units, or that collect welfare proxies such as assets for all households in an administrative unit to use in imputations, could be used to estimate models that fit the data better than a standard household survey. The third and final caveat is that the estimated poverty rates from the model-based simulations contain a small amount of bias. This results from the assumption that the error terms in the model describing log per capita consumption follow a normal distribution, which is typical in small area estimation exercises (Rao and Molina 2015). However, as shown in tables S5 and S6 in the supplementary online appendix, the bias introduced by the modeling procedure is small relative to the increase in precision. Using the model-based estimates therefore reduces the mean squared error of the estimated mean welfare and poverty rates by about 90 percent. Using model-based estimates of GN-Division poverty and log per capita consumption will, therefore, generate more accurate predictions of welfare and poverty rates using satellite data.

4.7 Do High Resolution Satellite Features Explain the Poverty Gap?

The poverty gap is a useful supplement to the headcount rate for understanding poverty because it takes the depth of poverty into account. The poverty gap or FGT1 metric measures poverty depth by considering how far the poor are from a given poverty line.15 This study computes the average poverty gap for each village, and uses this measure as a dependent variable in a regression where the right-hand side includes the size of the GN, a dummy indicating urban classification, and the features created from high resolution satellite imagery. The study considers again poverty lines defined at the 10th and 40th percentiles of national consumption per capita. Table 8 presents the results estimated via OLS. The coefficients can be interpreted as a unit change in the distance between the poverty gap and the poverty line for the average village. As was the case for headcount rates, high resolution features explain the poverty gap well, with R2 values between 0.588 and 0.610 (adjusted: 0.579 and 0.604). Not surprisingly, building density and shadow variables are also strong correlates of the poverty gap.

Table 8. Estimating Poverty Gap Using High Resolution Features

                                                    Poverty gap (FGT1 - 10%)     Poverty gap (FGT1 – 40%)
Variable                                            coef           t             coef           t
log area (square km)                                0.0060**       [2.84]        0.0063         [1.02]
= 1 if urban                                        −0.0063        [−2.00]       −0.013         [−1.05]
% of GN area that is agriculture                    −0.000081      [−1.29]       −0.00018       [−0.76]
% of GN agriculture that is paddy                   −0.000087**    [−3.24]       −0.00033**     [−3.10]
% of GN agriculture that is plantation              −0.000053**    [−2.91]       −0.00021*      [−2.63]
% of total GN area that is paddy                    −2.3E-05       [−0.29]       −0.00025       [−0.88]
Total cars divided by total road length             −0.09          [−1.32]
Total cars divided by total GN Area                 9.55           [0.72]
log number of cars                                  −0.0014        [−0.83]       −0.0058        [−1.24]
log of sum of length of roads                       −0.0049**      [−2.97]       −0.011*        [−2.48]
fraction of roads paved                             −0.000077**    [−3.37]       −0.00023*      [−2.67]
ln length airport roads                             −0.00027       [−0.89]
ln length railroads                                 0.00026        [1.35]
% of area with buildings                            −0.00062*      [−2.16]       −0.0028*       [−2.04]
% shadows (building height) covering valid area     0.00053        [1.76]        0.0017         [1.54]
ln shadow pixels (building height)                  0.0037*        [2.19]        0.016*         [2.68]
Fraction of total roofs that are clay               0.00020**      [2.96]        0.00070**      [3.12]
Fraction of total roofs that are aluminum           0.00024**      [3.31]        0.00084**      [3.19]
Fraction of total roofs are asbestos                −9.1E-05       [−1.14]
log of total count of buildings in GN               −0.0022*       [−2.62]       −0.0073*       [−2.09]
Vegetation index (NDVI), mean, scale 64             0.017*         [2.33]        0.056**        [2.88]
Vegetation index (NDVI), mean, scale 8              −0.019**       [−2.95]
Linear binary pattern moments (scale 32m)           0.00048*       [2.55]        0.0029***      [4.87]
Line support regions (scale 8m), mean               −0.27          [−1.39]
Gabor filter (scale 64m) mean                       −0.016         [−1.78]
Fourier transform, mean                             0.00046**      [3.44]
SURF (scale 16m), mean                              −0.00025       [−0.67]       −0.0001        [−0.15]
Constant                                            −0.093**       [−3.41]       −0.17+         [−2.00]
Observations                                        1234                         1234
R-sq                                                0.5884                       0.6097
R-sq adj.                                           0.5792                       0.6039
Cross-validated R-sq                                0.5855                       0.6075
Cross-validated MAE                                 0.0080                       0.0282

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Independent variables are the same as those listed in table 2. *p < 0.05, **p < 0.01, ***p < 0.001

15 The study calculates for its sample the FGT1 metric (Foster, Greer, and Thorbecke 1984), which is defined as $\mathrm{FGT}_1 = \frac{1}{N} \sum_{j=1}^{q} \left( \frac{z - y_j}{z} \right)$, where $y_j$ is an individual’s income, $z$ is the poverty threshold, $N$ is the number of individuals, and the sum runs over the $q$ individuals whose income falls below $z$.

5. Comparison to Alternative Estimation Methods

5.1 Comparisons to Convolutional Neural Networks

It is important to consider how the method used in this paper differs from other approaches to modeling poverty from satellite imagery, most notably the use of convolutional neural networks (CNNs) (LeCun et al. 1998). There is much similarity between a CNN used to model poverty directly and the baseline method described in this paper. With a convolutional neural network, a series of filters is applied to images, which produces “feature maps,” or outputs that highlight certain characteristics of the image. Using deep learning optimization methods, these filters are adjusted during training such that the model “learns” filters that are useful for the prediction task. The adjustment of several layers of filters is a very data-intensive task, often requiring millions of images to appropriately learn those that produce reliable feature maps for the specific prediction task. Researchers have used CNNs to directly model poverty both using transfer learning with Night Time Lights as an intermediate step (Jean et al. 2016) and using transfer learning with ImageNet weights (Babenko et al. 2017). Because poverty applications do not have access to millions of training examples, applications often use transfer learning, where weights are pre-defined from an auxiliary image prediction task. These auxiliary tasks often have millions of image examples, usually from the ImageNet data repository. Once these filters are constructed, they are then applied to the prediction task. The baseline method used in this paper applies pre-built filters designed to recognize objects and other information from satellite images. Some of these, as in the case of cars or building height, come from deep learning models.
Others are filters used for specific remote sensing applications. The advantages of using pre-built satellite-specific filters are: (1) the filters are specifically designed to recognize characteristics in satellite images rather than objects from still photography; (2) it may be more straightforward for these filters to incorporate additional information outside the visible spectrum;16 and (3) the satellite-specific features can be designed such that they carry interpretable information, such as the number of cars or buildings. The disadvantage of using pre-built filters is that, because the filters are static, they cannot learn patterns unforeseen by the researchers that may be predictive of poverty.

To compare the prediction accuracy of this method with direct training via a CNN, the study implemented a standard CNN model (ResNet-50) trained against the same imagery. ResNet-50 models have been used in a variety of computer vision tasks and are generally easy to train (Akiba, Suzuki, and Fukuda 2017). The study produced rectangular tiles of its satellite images at 250 by 250 pixels, producing 4130 images for training and 1044 for validation. The outcome variable (poverty of each GN) was discretized into bins of width 0.05, for instance [0, 0.05), [0.05, 0.10), and so forth. This was done because classification tasks are easier than regression tasks in CNNs. Data augmentation was further used to increase the number of imagery training samples. Two models were produced, one predicting poverty using a poverty line at the 10th percentile of national income, and one using a poverty line defined at the 40th percentile of national income. To optimize training, the study started with pre-trained ImageNet weights but re-optimized for 1000 epochs.17

16 Babenko et al. (2017) use an additional channel of infrared information from the satellite imagery.
However, because weights from RGB channels are pre-trained starting at ImageNet weights, the model did not optimize to use the infrared channel. Yeh et al. (2020) use multispectral Landsat bands where the RGB bands are pre-trained to ImageNet weights, and the weights for the non-RGB bands in the first convolutional layer are set to the mean RGB channel weights. The difference in ease of training non-RGB bands may be due to the use of medium- and high-resolution imagery in the first study, and lower-resolution Landsat imagery in the second.

17 The model was implemented in PyTorch and is available at github.com/jonhersh/LKA_CNN_public.

Table 9. Comparison with CNN ResNet-50 Model

                                                                         Lower poverty rate   Higher poverty rate
Dependent variable                                                       (10% Nat. Inc.)      (40% Nat. Inc.)
Root mean squared error (RMSE), ResNet-50 CNN predicted poverty vs. true 0.0748               0.1762
Mean absolute error (MAE), ResNet-50 CNN predicted poverty vs. true      0.0554               0.1192
R2, ResNet-50 CNN predicted poverty vs. true                             0.3949               0.2888

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: This table shows model diagnostics estimating a CNN ResNet-50 model against the same training data in the baseline model. Imagery was tiled into 250 by 250 pixel grids, which were processed through a ResNet-50 model in PyTorch. Model is pre-trained with ImageNet weights, with the final layers unfrozen and retrained for 1,000 epochs. Final model diagnostics reflect performance at the unit of analysis, Grama Niladhari Divisions (GNs).

Table 9 summarizes the results, showing in-sample prediction accuracy at the GN level. These results are comparable with results using the method outlined in this paper, if slightly worse.
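The discretization of poverty rates into 0.05-wide classes and the GN-level diagnostics of the kind reported in table 9 can be sketched as follows. This is an illustrative reconstruction, not the study's PyTorch pipeline: mapping each predicted class back to its bin midpoint, averaging tile predictions within a GN, and computing R2 as a coefficient of determination are all assumptions, as are the function names and the toy data.

```python
import numpy as np

BIN_WIDTH = 0.05
N_BINS = int(round(1.0 / BIN_WIDTH))

def rate_to_class(rate):
    """Map poverty rates in [0, 1] to integer class labels (bins of width 0.05)."""
    # The small epsilon guards against floating-point error at bin edges.
    labels = np.floor(np.asarray(rate, dtype=float) / BIN_WIDTH + 1e-9)
    return np.minimum(labels.astype(int), N_BINS - 1)

def class_to_rate(label):
    """Map a class label back to the midpoint of its bin (an assumed convention)."""
    return (np.asarray(label) + 0.5) * BIN_WIDTH

def gn_level_metrics(tile_gn, tile_class, gn_true):
    """Average tile predictions within each GN, then compare with the true rate."""
    tile_gn = np.asarray(tile_gn)
    tile_rate = class_to_rate(tile_class)
    gn_ids = sorted(gn_true)
    pred = np.array([tile_rate[tile_gn == g].mean() for g in gn_ids])
    true = np.array([gn_true[g] for g in gn_ids])
    err = pred - true
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "r2": float(1.0 - np.sum(err ** 2) / np.sum((true - true.mean()) ** 2)),
    }

# Hypothetical data: three GNs with one or two image tiles each.
gn_true = {"A": 0.07, "B": 0.32, "C": 0.58}
tile_gn = ["A", "A", "B", "B", "C"]
tile_class = [1, 1, 6, 7, 11]            # stand-in for predicted CNN classes
metrics = gn_level_metrics(tile_gn, tile_class, gn_true)
print(rate_to_class([0.07, 0.32, 0.58]), metrics)
```

One consequence of this design is that bin width places a floor on achievable accuracy: even a perfect classifier is off by up to half a bin (0.025) per tile before aggregation.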
Results in the validation sample during training are roughly comparable to the in-sample results, indicating a lack of overfitting. Cross-validation is not feasible with this study’s computational infrastructure given the large computational cost of training the model. For the model of poverty with the poverty line defined at the 10th percentile of national income, the study estimates an MAE of 0.055 and an R2 value of 0.3949. For the model with the poverty line at the 40th percentile of national income, the study estimates an MAE of 0.119 and an R2 of 0.2888. The lower poverty line model produces results roughly comparable to the results from this study’s method, while the results with the higher poverty line model are slightly worse than this study’s baseline estimates.

The present article does not interpret this to mean that building models using pre-built filters necessarily dominates a direct CNN approach. However, given the current sample sizes for poverty training datasets, and existing CNN training methods, there does not appear to be a cost in terms of reduced prediction accuracy in using interpretable features. This is a conclusion that Ayush et al. (2020) also reach. As dataset size increases, it is very likely that this result will reverse, but researchers do not appear to be at that inflection point yet.

5.2 Comparisons to Night Time Lights

How does the predictive power of indicators derived from daytime imagery compare with night-time lights (NTL)? To shed light on this, table 10 presents OLS models covering the same sample area using NTL as the independent variable. The first three columns present poverty and per capita consumption models. Aggregate NTL is positively correlated with welfare and negatively correlated with poverty; however, the total explanatory power is low: R2 values for the three regressions are between 0.15 and 0.22, with performance lowest for the 10 percent head-count measure and highest for log consumption per capita.
Adding higher-order polynomials up to a quartic only increases it to 0.15. Models built using high-resolution satellite indicators capture around three to four times as much variation in poverty or welfare as NTL. Columns (4)–(6) of table 10 show estimates that include DS Division fixed effects. Night-time lights are no longer significant in any of the specifications, indicating that within DS Divisions, NTL is only weakly correlated with welfare.

Table 10. Model Estimates, Night Lights on Poverty/Average GN Consumption

Columns (1) and (4): lower poverty rate (10% Nat. Inc.); columns (2) and (5): higher poverty rate (40% Nat. Inc.); columns (3) and (6): average log per capita consumption.

                              (1)            (2)            (3)              (4)            (5)            (6)
Avg night lights 2012         −98.47*        −201.4         262.3            −26.82         −58.45         111.7
                              [−2.06]        [−1.74]        [−1.23]          [−1.40]        [−0.99]        [−0.88]
Avg night lights squared      −16991.5       −150061.7      457585.9         20678.3        31038.3        −54736.5
                              [−0.33]        [−0.98]        [−1.47]          [−0.73]        [−0.34]        [−0.27]
Avg night lights cubed        26286044.1     110121407.3    −266793439.7*    −3062554.7     1662858        −5719566
                              [−1.5]         [−1.97]        [−2.31]          [−0.38]        [−0.06]        [−0.09]
Avg night lights std dev      0.0014         0.00298        −0.0049          0.000408       0.0000568      0.00095
                              [−0.87]        [−0.71]        [−0.62]          [−0.51]        [−0.03]        [−0.21]
Observations                  1291           1291           1291             1291           1291           1291
R-sq                          0.154          0.198          0.227            0.00485        0.00662        0.00758
R-sq adj.                     0.151          0.196          0.225            0.00176        0.00353        0.0045
R-sq within                                                                  0.00485        0.00662        0.00758
R-sq between                                                                 0.324          0.407          0.472
R-sq overall                                                                 0.0868         0.118          0.133
Divisional Secretariat FEs    No             No             No               Yes            Yes            Yes

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Unit of observation is Grama Niladhari (GN) Division. All models include a regression constant which is omitted from the table. *p < 0.05, **p < 0.01, ***p < 0.001

Given the prevalence, ease of use, and familiarity of NTL, one might also ask how much explanatory power NTL adds to the indicators extracted from daytime imagery. Table 11 answers that question by adding NTL to the Shapley decomposition. The NTL category includes the average, squared, cubed, and average standard deviation of NTL. The NTL variables explain between 7 and 12 percent of the variance in per capita consumption or poverty according to the decomposition, meaning that roughly 90 percent of the explained variation in poverty or income is captured by the high-resolution satellite indicators. Furthermore, adding NTL only marginally increases the overall R2 of the regression, by about 0.01. In this context, NTL is not a particularly accurate proxy for poverty and welfare, and adds little explanatory power to the set of available daytime indicators.

Table 11. Shapley Decomposition by High-Resolution Spatial Feature Subgroup and Night-Time Lights

                              Lower poverty rate   Higher poverty rate   Average log per
                              (10% Nat. Inc.)      (40% Nat. Inc.)       capita consumption
Area                          10.2                 8.1                   8.0
Urban                         8.7                  8.7                   9.5
Agricultural land             0.9                  1.0                   3.3
Paddy land                    3.3                  3.8
Cars                          6.7                  5.1                   4.0
Buildings                     13.0                 16.7                  19.0
Vegetation                    8.0                  6.0                   4.1
Shadows                       12.1                 13.0                  10.6
Road variables                8.0                  8.0                   8.5
Roof type                     13.0                 12.0                  11.7
Texture variables             8.5                  7.1                   8.9
Night time lights variables   7.6                  10.6                  12.1
Observations                  1291                 1291                  1291
R-sq                          0.621                0.636                 0.632

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Night time lights category includes the following transformations of night time lights: average, squared, cubed, and standard deviation. Variable groupings are identical to those in table 5.

6. Extensions and Applications

6.1 Correlation with Household Asset Index

Much of the existing literature that uses satellite data to predict household welfare uses household asset index values, typically estimated from a principal components analysis, as the dependent variable in the model. Asset indices are generally effective in ranking the welfare of households, especially in urban areas, but perform less well for rural households and in identifying the extreme poor (Ngo and Christiaensen 2019). Per capita consumption is also more responsive to shocks than asset indices are. Despite these shortcomings of asset indices, this study repeats the analysis using an asset index as the dependent variable, to shed light on how the nature of the dependent variable affects model fit.

The study constructs a measure of nonmonetary poverty derived from a principal component analysis, equal to the score of the factor loading of several individual welfare measures. This more transparently reflects observed welfare measures in the census. The welfare indicators and their associated factor loadings are listed in table S2.2 in the supplementary online appendix. They include six asset ownership dummy variables: home, computer, land phone, mobile phone, radio, and TV.
The other variables included in the index measure are the quality of the housing floor and roof, the quality of sanitation facilities and services, the type of energy used for lighting and cooking, and the principal source of drinking water.21 The asset index is estimated based on census data from the 32 DS Divisions covered by the satellite imagery, and then averaged across all households in each GN Division. As in the per capita consumption and poverty modeling, post-lasso is used to select predictor variables from the full set of candidate variables. The results of regressing the average asset index against the lasso-selected satellite predictors are shown in table 12. The third column reports the share of the R2 explained by each variable, according to a Shapley decomposition. Overall, the satellite features explain two-thirds of the variation in the average asset index at the GN Division level, a result quite similar to the CNN estimates reported in Yeh et al. (2020). Of the variables in the regression, those that explain the most variation are those most closely related to population density. These include log geographic area, the urban dummy, built-up area, the seven spatial features, and building counts, each of which explains between 12 and 17 percent of the variation. The two NDVI measures, the two measures of shadows, and type of roof are less powerful, but each explains between 7 and 11 percent of the variation.

6.2 Poverty Estimation via Geographic Extrapolation

One motivation for using satellite imagery is to extrapolate poverty estimates into areas where survey data on economic well-being are not collected. While most of the data deprivation that characterizes the developing world occurs at the country level, it is also common for surveys to omit selected regions, due to political turmoil, violence, animosity towards the central government, or prohibitive expense.
For example, from 2002 through 2009/10, Sri Lanka’s HIES failed to cover certain districts in the northern and eastern parts of the country due to civil conflict, and Pakistan’s HIES excludes the Federally Administered Tribal Areas, Jammu and Kashmir. To assess how well a model “travels” to a different geographic area, the study fits a series of models, where in each model it excludes a single Divisional Secretariat (DS), a larger administrative area, from the model, and uses the estimated model to predict into that excluded area. This is a form of “leave-one-out cross-validation” (LOOCV), a common method used to infer statistical out-of-sample performance (Gentle et al. 2012), where the held-out folds are spatially stratified into distinct units. In that manner, this is similar to the spatially stratified cross-validation procedure recommended by Deville et al. (2014), where the study spatially stratifies according to distinct DS geographical units. In fact, it is identical when the number of partitions is equal to the number of Divisional Secretariats. In their context of population mapping using mobile phone data, Deville et al. (2014) find that failing to stratify at the geographic unit overstates model performance. The present study estimates both linear models and random forest models18 to predict out-of-sample to determine if more flexible model specifications perform better out-of-sample.

Table 12. Prediction of GN Division Average Asset Index Using High-Resolution Features

Dependent variable: average asset index

Variable                                    Coef         T        Shapley contribution
log area (square meters)                    −0.23**      −3.5     16.6%
= 1 if urban                                0.45***      3.62     15.1%
Percent built-up                            0.03*        2.41     17.1%
Percent shadow                              −0.02        −1.91    5.9%
Log number of shadows                       −0.04        −0.7     4.3%
Log of building count                       0.28***      3.91     12.3%
Share clay roofs                            0.00         −0.08    4.8%
Share aluminum roofs                        −0.01**      −2.73    1.9%
Share asbestos roofs                        0.00         −0.35    0.6%
Mean NDVI Scale 64                          −0.49        −1.96    2.9%
Mean NDVI Scale 8                           −0.17        0.14     4.4%
Pantex Scale 8                              0.58***      0.14
Hist of ordered gradients Scale 64          0.00***      0.00
Linear binary pattern support Scale 32      −0.04***     0.01
Line support region Scale 8                 −17.43***    4.81     14.1%
Gabor Filter Scale 64                       0.27         0.16
Fourier transform Scale 32                  0.00         0.00
Speed-up robustness features Scale 32       0.03***      0.01
R2                                          0.686
Cross-validated R2                          0.669
Cross-validated MAE                         0.364
Number of observations                      1291

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.
Note: Unit of observation is Grama Niladhari (GN) Division. All models include a regression constant which is omitted from the table. *p < 0.05, **p < 0.01, ***p < 0.001. 14.1 percent in the right column refers to the combined Shapley contribution of the seven texture features listed between Pantex Scale 8 and Speed-up robustness features Scale 32.

To give more detail on the spatially stratified leave-one-out cross-validation, the study starts by enumerating all Divisional Secretariats in the sample. It fits a model using all Divisional Secretariats except the first DS in the sample, and uses that fitted model to predict into the withheld DS.
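The leave-one-DS-out loop can be sketched in a few lines. The example below is a minimal illustration with ordinary least squares on synthetic data, not the study's implementation; the function name, the three made-up features, the 32 groups of 20 units, and the use of squared correlation as the out-of-sample R2 are all assumptions.

```python
import numpy as np

def leave_one_group_out_predictions(X, y, groups):
    """For each group, fit OLS on all other groups and predict the held-out one."""
    design = np.column_stack([np.ones(len(y)), X])    # add an intercept column
    preds = np.empty(len(y))
    for g in np.unique(groups):
        held_out = groups == g
        beta, *_ = np.linalg.lstsq(design[~held_out], y[~held_out], rcond=None)
        preds[held_out] = design[held_out] @ beta
    return preds

# Synthetic stand-in data: 32 groups (playing the role of DS units) of 20 GNs each.
rng = np.random.default_rng(0)
n_groups, per_group = 32, 20
groups = np.repeat(np.arange(n_groups), per_group)
X = rng.normal(size=(n_groups * per_group, 3))        # hypothetical features
group_shock = rng.normal(scale=0.3, size=n_groups)[groups]
y = (X @ np.array([0.5, -0.3, 0.2]) + group_shock
     + rng.normal(scale=0.3, size=len(groups)))

preds = leave_one_group_out_predictions(X, y, groups)
r2_out = np.corrcoef(y, preds)[0, 1] ** 2
print(f"spatially stratified out-of-sample R2: {r2_out:.3f}")
```

Because the group-level shock of a held-out unit is never seen during fitting, this design scores lower than a random split would on the same data, which is the reason the procedure is a stricter test of how well a model travels.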
The study then withholds the second DS from estimation, fits a model using the remaining Divisional Secretariats, and uses that model to predict into the withheld DS. The study proceeds in this fashion until it has a prediction for each DS. Note that because Divisional Secretariats are large geographic units, this is a stronger test than non-spatially stratified sampling, as the study may be predicting into areas that are geographically or economically distinct. This test is more representative of the intended application outlined earlier.

Table 13 shows model performance at predicting into novel areas, comparing predicted and true welfare measures from HRSF models fitted using both random forests and linear models. The novel area prediction error rates are larger than when predicting randomly out of sample using cross-validation. Across the models that use the full set of satellite variables, the study estimates R² values between 0.450 and 0.579 (0.488 to 0.579 for average log per capita consumption), which are smaller than its baseline estimates, but not substantially smaller. While these error rates imply that predicting into adjacent areas may be too imprecise for producing welfare measures intended as official statistics, they may be sufficient for generating a rank ordering of villages by poverty or income.

18 For each random forest model the study uses 1000 decision trees, sampling 13 of the predictors with replacement.

Table 13. Divisional Secretariat Spatially Stratified Cross-Validation Model Performance

Dependent variable                                                                      Average log per      Lower poverty rate   Higher poverty rate
                                                                                        capita consumption   (10% Nat. Inc.)      (40% Nat. Inc.)
R², predicted and true poverty rates, linear models with full satellite variables       0.4876               0.4498               0.4586
R², predicted and true poverty rates, random forest models with full satellite variables 0.5788              0.5643               0.5510
R², predicted and true poverty rates, linear models with only night-time light variables 0.0652              0.0463               0.0510

Source: GN poverty statistics are derived from the 2012/13 HIES and 2012 Sri Lankan Census. Satellite features are derived from Digital Globe imagery.

7. Conclusion

Traditionally, given the prohibitive cost of conducting surveys sufficiently large to provide accurate statistics for small areas, generating small-area poverty estimates requires pairing a welfare survey with a census or inter-census survey. Census data are expensive to collect and are therefore produced relatively infrequently. Such data are also usually disseminated with a lag, making it difficult to rapidly assess changes in local living standards. The results show that indicators derived from high spatial resolution imagery, when paired with survey data, generate accurate predictions of local-level poverty and welfare, and that by and large the conditional correlations are of sensible signs and magnitudes. Furthermore, predictions based on specific features accurately predict mean per capita consumption throughout the welfare distribution. While the welfare consequences of more frequent measures of poverty and inequality are unknown, they may be large, given the many applications of frequent local measures of economic well-being, ranging from impact evaluation to budget allocation to social transfers.

How well do indicators derived from satellite imagery predict poverty, and which indicators are most important?
This study investigates these questions using a sample of 1,291 villages in Sri Lanka, linking measures of economic well-being with features derived from HRSI. The results indicate that the correlation between satellite-derived indicators and economic well-being is remarkably strong when using model-based measures of ground truth that use the full census data and average over one hundred simulations. Simple linear models explain 35 to 60 percent of the variation in poverty and average log per capita consumption. Models explain 68 percent of the variation in a household asset index. These models perform slightly better than an end-to-end CNN model trained over the same data, suggesting that models built with interpretable features do not come at a cost of predictive power.

Additional analysis also highlights the sensitivity of model performance to the measure of ground truth used for poverty and welfare. Predictive performance falls significantly, for example, when using only one simulation per household to estimate mean log per capita consumption and poverty rates at the GN Division level. In this case, simple linear models explain 37 percent of the variation in the poverty rate when the poverty line is set at the 10th percentile, 47 percent of the variation in the poverty rate when the poverty line is set at the 40th percentile, and 49 percent of the variation in mean log consumption per capita. The explanatory power of the satellite indicators falls further when using only a subsample of census households to generate GN Division poverty and welfare estimates, especially when predicting the 10 percent poverty rate. When using only 8 households per GN Division, the models explain only 17 percent of the variation in the 10 percent poverty rate, 32 percent of the variation in the 40 percent poverty rate, and 38 percent of the variation in mean log welfare.
The sensitivity of the predictive performance of models using satellite data to different measures of ground truth may have implications for the efficient design of sample surveys, when the surveys are intended to be linked with geographically aggregated satellite data. Linking remote sensing and other “big data” with survey data can combine the decades of knowledge gained in collecting and interpreting survey data with the benefits of comprehensive big data. For example, as linking survey data with satellite imagery and other forms of big data becomes more popular, the benefits of collecting “micro-censuses” that interview all households in a random sample of low-level administrative areas may increase. At the same time, the sensitivity of models’ predictive performance to the precision of the training data highlights a challenge of using the R² as the sole measure of model performance. The models are intermediary inputs used to generate predictions, and the accuracy and precision of the predictions themselves ultimately matter more than the predictive performance of the models used to generate them. Estimating how targeting performance benefits from augmenting survey data with satellite indicators, and how this depends on the precision of the ground truth data used to train the model, is an important area for further research.

These findings raise a host of questions for further work. First, it is important to better understand the extent to which these results generalize to different social and ecological environments, such as Africa, the Middle East, and other parts of Asia. There is no guarantee that the predictive power of building density, shadows, and other features documented will hold in all environments.
A second line of research could explore whether changes in satellite imagery could be used to forecast changes in economic well-being across space and time. Poverty surveys are typically collected every three years, and the most recent global estimates are produced with a three-year lag. Therefore, the ability to “now-cast” measures of economic well-being by combining frequently updated satellite imagery with the most recent survey-based measures of poverty has great potential. Third, there is much room for algorithm development, both for satellite feature extraction and model building. While this study has shown the direct CNN modeling to be no better than the study’s method, as the size of training datasets increases the CNN approach should overtake linear models.

Finally, and most pressingly, how should statistical agencies make best use of the wealth of information from satellite imagery? Given the increased quality and decreased cost of satellite images, and the continuing advancement of processing power to extract features and build models, how can statistical agencies adapt to make use of this information? Many of the features extracted (roads, number of buildings, amount of vegetation) would be useful as policy variables for additional uses beyond poverty mapping. Should statistical agencies develop these themselves, or should they rely on the many third-party agencies that supply this information? The conclusion of this study is that, if satellite features continue to be valuable, there is room for multilaterals to provide this information.¹⁹ The concern is that private companies will own these data pipelines and possibly extract excess surplus from their use (Hersh, Engstrom, and Mann 2020).
Overall, the inevitable increase in the availability of imagery and feature identification algorithms, in conjunction with the encouraging results from this study and others in the literature, implies that satellite imagery will become an increasingly valuable tool to help governments and stakeholders better understand the spatial nature of poverty and economic welfare.

8. Data Availability

Due to the data sharing agreement signed between the authors and the Sri Lankan Department of Census and Statistics, the underlying poverty and income data cannot be shared publicly. Researchers may contact the Sri Lankan Department of Census and Statistics to obtain a data-sharing agreement: http://www.statistics.gov.lk/ContactUs/headOffice. Interested researchers may contact the corresponding author (hersh@chapman.edu) to facilitate the application for access to the data.

19 UN Global Pulse PulseSatellite is an excellent example of an open-data satellite analytics tool, but it focuses primarily on humanitarian efforts. https://www.unglobalpulse.org/microsite/pulsesatellite/.

References

Akiba, T., S. Suzuki, and K. Fukuda. 2017. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. arXiv preprint arXiv:1711.04325. Cornell University, Ithaca, NY.
Anselin, L. 2013. Spatial Econometrics: Methods and Models. Vol. 4. Springer Science & Business Media.
Anselin, L., A.K. Bera, R. Florax, and M.J. Yoon. 1996. “Simple Diagnostic Tests for Spatial Dependence.” Regional Science and Urban Economics 26 (1): 77–104.
Athey, S. 2017. “Beyond Prediction: Using Big Data for Policy Problems.” Science 355 (6324): 483–85.
Athey, S., and G. Imbens. 2015. Machine Learning Methods for Estimating Heterogeneous Causal Effects. arXiv preprint arXiv:1504.01132. Cornell University, Ithaca, NY, USA.
Ayush, K., B. Uzkent, M. Burke, D. Lobell, and S. Ermon. 2020. “Generating Interpretable Poverty Maps Using Object Detection in Satellite Images.” arXiv preprint arXiv:2002.01612.
Babenko, B., J. Hersh, D. Newhouse, A. Ramakrishnan, and T. Swartz. 2017. “Poverty Mapping Using Convolutional Neural Networks Trained on High and Medium Resolution Satellite Images, with an Application in Mexico.” Proceedings from NIPS 2017: Neural Information Processing Systems Workshop on Machine Learning for the Developing World. Long Beach, CA.
Battese, G.E., R.M. Harter, and W.A. Fuller. 1988. “An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data.” Journal of the American Statistical Association 83 (401): 28–36.
Belloni, A., and V. Chernozhukov. 2013. “Least Squares after Model Selection in High-Dimensional Sparse Models.” Bernoulli 19 (2).
Conley, T.G. 1999. “GMM Estimation with Cross Sectional Dependence.” Journal of Econometrics 92 (1): 1–45.
Dalal, N., and B. Triggs. 2005. “Histograms of Oriented Gradients for Human Detection.” In Computer Vision and Pattern Recognition (CVPR). 886–93. San Diego, CA.
Department of Census and Statistics. 2012. “Sri Lanka Census of Population and Housing 2011.”
Department of Census and Statistics and World Bank. 2015. “The Spatial Distribution of Poverty in Sri Lanka.” http://www.statistics.gov.lk/poverty/SpatialDistributionOfPoverty2012_13.pdf.
Deville, P., C. Linard, S. Martin, M. Gilbert, F.R. Stevens, A.E. Gaughan, and A.J. Tatem. 2014. “Dynamic Population Mapping Using Mobile Phone Data.” Proceedings of the National Academy of Sciences 111 (45): 15888–93.
Donaldson, D., and A. Storeygard. 2016. “The View from Above: Applications of Satellite Data in Economics.” Journal of Economic Perspectives 30 (4): 171–98.
Drukker, D.M., I.R. Prucha, and R. Raciborski. 2013. “Maximum Likelihood and Generalized Spatial Two-Stage Least-Squares Estimators for a Spatial-Autoregressive Model with Spatial-Autoregressive Disturbances.” The Stata Journal 13 (2): 221–41.
Elbers, C., J.O. Lanjouw, and P. Lanjouw. 2003. “Micro-Level Estimation of Poverty and Inequality.” Econometrica 71 (1): 355–64.
Elbers, C., P.F. Lanjouw, and P.G. Leite. 2008. “Brazil Within Brazil: Testing the Poverty Map Methodology in Minas Gerais.” World Bank Policy Research Working Paper Series, Vol.
Elvidge, C.D., K.E. Baugh, E.A. Kihn, H.W. Kroehl, and E.R. Davis. 1997. “Mapping City Lights with Nighttime Data from the DMSP Operational Linescan System.” Photogrammetric Engineering and Remote Sensing 63 (6): 727–34.
Engstrom, R., D. Pavelsku, T. Tomomi, and A. Wambile. 2019. “Mapping Poverty and Slums Using Multiple Methodologies in Accra, Ghana.” Joint Urban Remote Sensing Conference, Vannes, France. May 22-24, 2019, 1–4.
Engstrom, R., A. Sandborn, Q. Yu, J. Burgdorfer, D. Stow, J. Weeks, and J. Graesser. 2015. Mapping Slums Using Spatial Features in Accra, Ghana. Joint Urban and Remote Sensing Event Proceedings (JURSE). Lausanne, Switzerland. DOI: 10.1109/JURSE.2015.7120494.
Engstrom, R., A. Copenhaver, D. Newhouse, J. Hersh, and V. Haldavanekar. 2017. “Evaluating the Relationship between Spatial and Spectral Features Derived from High Spatial Resolution Satellite Data and Urban Poverty in Colombo, Sri Lanka.” Joint Urban Remote Sensing Event (JURSE 2017), Dubai, UAE. DOI: 10.1109/JURSE.2017.7924590.
Fan, J., and R. Li. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96 (456): 1348–60.
Foster, J., J. Greer, and E. Thorbecke. 1984. “A Class of Decomposable Poverty Measures.” Econometrica 52 (3): 761–6.
Gechter, M., and N. Tsivanidis. 2018. “The Welfare Consequences of Formalizing Developing Country Cities: Evidence from the Mumbai Mills Redevelopment.” Working Paper. https://economics.yale.edu/sites/default/files/mumbaimills_ada-ns.pdf.
Gentle, J.E., W.K. Härdle, and Y. Mori (Eds.). 2012. Handbook of Computational Statistics: Concepts and Methods. Berlin, Heidelberg: Springer-Verlag.
Glaeser, E.L., S.D. Kominers, M. Luca, and N. Naik. 2015. Big Data and Big Cities: The Promises and Limitations of Improved Measures of Urban Life (No. w21778). National Bureau of Economic Research. Cambridge, MA, USA.
Graesser, J., A. Cheriyadat, R.R. Vatsavai, V. Chandola, J. Long, and E. Bright. 2012. “Image Based Characterization of Formal and Informal Neighborhoods in an Urban Landscape.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (4): 1164–76.
Head, A., M. Manguin, N. Tran, and J. Blumenstock. 2017. “Can Human Development Be Measured with Satellite Imagery?” Article No. 8, ICTD ’17: Proceedings of the Ninth International Conference on Information and Communication Technologies and Development, November 2017.
Henderson, J.V., A. Storeygard, and D.N. Weil. 2012. “Measuring Economic Growth from Outer Space.” American Economic Review 102 (2): 994–1028.
Hersh, J., R. Engstrom, and M. Mann. 2020. “Open Data for Algorithms: Mapping Poverty in Belize Using Open Satellite Derived Features and Machine Learning.” Information Technology for Development 27 (2): 1–30.
Huettner, F., and M. Sunder. 2012. “Axiomatic Arguments for Decomposing Goodness of Fit According to Shapley and Owen Values.” Electronic Journal of Statistics 6: 1239–50.
Israeli, O. 2007. “A Shapley-Based Decomposition of the R-square of a Linear Regression.” Journal of Economic Inequality 5 (2): 199–212.
Jean, N., M. Burke, M. Xie, W.M. Davis, D.B. Lobell, and S. Ermon. 2016. “Combining Satellite Imagery and Machine Learning to Predict Poverty.” Science 353 (6301): 790–4.
Krstajic, D., L.J. Buturovic, D.E. Leahy, and S. Thomas. 2014. “Cross-Validation Pitfalls When Selecting and Assessing Regression and Classification Models.” Journal of Cheminformatics 6 (1): 1–15.
Krizhevsky, A., I. Sutskever, and G.E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Advances in Neural Information Processing Systems. 1097–105.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11): 2278–2324.
Marx, B., T.M. Stoker, and T. Suri. 2019. “The Political Economy of Ethnicity and Property Rights in Slums: Evidence from Kenya.” American Economic Journal: Applied Economics 11 (4).
Mellander, C., J. Lobo, K. Stolarick, and Z. Matheson. 2015. “Night-Time Light Data: A Good Proxy Measure for Economic Activity?” PLoS ONE 10 (10): e0139779.
Ngo, D.K., and L. Christiaensen. 2019. “The Performance of a Consumption Augmented Asset Index in Ranking Households and Identifying the Poor.” Review of Income and Wealth 65 (4): 804–833.
Pinkovskiy, M., and X. Sala-i-Martin. 2016. “Lights, Camera… Income! Illuminating the National Accounts-Household Surveys Debate.” Quarterly Journal of Economics 131 (2): 579–631.
Rao, J.N.K., and I. Molina. 2015. Small-Area Estimation. Hoboken, NJ: John Wiley and Sons, Inc.
Sandborn, A., and R. Engstrom. 2016. “Determining the Relationship Between Census Data and Spatial Features Derived From High Resolution Imagery in Accra, Ghana.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) Special Issue on Urban Remote Sensing.
Serajuddin, U., H. Uematsu, C. Wieser, N. Yoshida, and A. Dabalen. 2015. “Data Deprivation: Another Deprivation to End.” Policy Research Working Paper 7252. World Bank, Washington, DC, USA.
Shorrocks, A.F. 2013. “Decomposition Procedures for Distributional Analysis: A Unified Framework Based on the Shapley Value.” Journal of Economic Inequality 11: 1–28.
Tarozzi, A., and A. Deaton. 2009. “Using Census and Survey Data to Estimate Poverty and Inequality for Small Areas.” Review of Economics and Statistics 91 (4): 773–92.
Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–288.
Varian, H.R. 2014. “Big Data: New Tricks for Econometrics.” Journal of Economic Perspectives 28 (2): 3–27.
Yeh, C., A. Perez, A. Driscoll, G. Azzari, Z. Tang, D. Lobell, S. Ermon, and M. Burke. 2020. “Using Publicly Available Satellite Imagery and Deep Learning to Understand Economic Well-Being in Africa.” Nature Communications 11 (1): 1–11.