Report No: AUS0001842

Timor-Leste
Timor-Leste Poverty Monitoring & Analysis
Estimating Small Area Poverty and Welfare Indicators in Timor-Leste using Satellite Imagery Data

September 28th, 2020

POV

© 2020 The World Bank
1818 H Street NW, Washington DC 20433
Telephone: 202-473-1000; Internet: www.worldbank.org

Some rights reserved

This work is a product of the staff of The World Bank. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of the Executive Directors of The World Bank or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.

Rights and Permissions

The material in this work is subject to copyright. Because The World Bank encourages dissemination of its knowledge, this work may be reproduced, in whole or in part, for noncommercial purposes as long as full attribution to this work is given.

Attribution: Please cite the work as follows: “Purnamasari, R, Wirapati, B.A., Alatas, H., Nasiir, M. 2020. Estimating Small Area Poverty and Welfare Indicators in Timor-Leste using Satellite Imagery Data. © World Bank.”

All queries on rights and licenses, including subsidiary rights, should be addressed to World Bank Publications, The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA; fax: 202-522-2625; e-mail: pubrights@worldbank.org.
ESTIMATING SMALL AREA POVERTY AND WELFARE INDICATORS IN TIMOR-LESTE USING SATELLITE IMAGERY DATA

Ririn Purnamasari
Bagus Arya Wirapati
Hamidah Alatas
Mercoledi Nasiir

POVERTY AND EQUITY
THE WORLD BANK
28 SEPTEMBER 2020

TABLE OF CONTENTS

List of Figures
List of Tables
List of Acronyms
Acknowledgements
1. Background
2. Fay-Herriot Small Area Estimation Model
3. Data
   3.1. Timor-Leste Survey of Living Standards 2014
   3.2. Timor-Leste Demographic and Health Survey 2016
   3.3. Geospatial Satellite Data
4. Model Specification
5. Results and Discussions
   5.1. Poverty Rate
   5.2. Log Average Real Per Capita Consumption
   5.3. Wealth Index
6. Conclusion
References
Appendix 1: Poverty Estimates using FHSAE
Appendix 2: Log Average Real Per Capita Consumption using FHSAE
Appendix 3: Normalized Wealth Index Estimates using FHSAE
Appendix 4: Validation and Precision
Appendix 5: District Direct and Sub-District FH Estimates
Appendix 6: Comparison between ELL and FHSAE poverty estimates
Appendix 7: List of Variables and Outliers
Appendix 8: FH Estimation Model Details
LIST OF FIGURES

Figure 1: Standard errors (left) and confidence intervals (right) of poverty estimates at sub-district level
Figure 2: Standard errors (left) and confidence intervals (right) of Log Average Real Per Capita Consumption at sub-district level
Figure 3: Standard errors (left) and confidence intervals (right) of Wealth Index at sub-district level
Figure 4: Map of sub-district poverty estimates using FHSAE
Figure 5: Map of sub-district Log Average Real Per Capita Consumption using FHSAE
Figure 6: Map of sub-district wealth index using FHSAE
Figure 7: RMSEs of TL-SLS direct estimates versus FHSAE estimates
Figure 8: RMSEs of FHSAE results with Chosen Model versus StepVIF + Stepwise Regression Selection
Figure 17: Poverty Rate Comparison between District Direct Estimates and Sub-District FH Estimates
Figure 18: Log Real Per Capita Consumption Comparison between District Direct Estimates and Sub-District FH Estimates
Figure 19: Wealth Index Comparison between District Direct Estimates and Sub-District FH Estimates
Figure 20: Confidence interval of ELL poverty estimates
Figure 21: Confidence interval of FHSAE poverty estimates
Figure 22: Poverty rates at sub-district level based on direct, ELL, and FHSAE estimations
Figure 23: Poverty map comparison between ELL and FHSAE

LIST OF TABLES

Table 1: Poverty Rates and Average Real Per Capita Consumption of Timor-Leste, 2007 and 2014
Table 2: Poorest Sub-Districts Based on FH Estimates of Poverty Rates
Table 3: Poorest Sub-Districts Based on FH Estimates of Log Average Real per Capita Expenditure
Table 4: Poorest Sub-Districts Based on FH Estimates of Log Average Real per Capita Expenditure
Table 5: Poverty Estimates using FHSAE
Table 6: Log Average Real Per Capita Consumption Estimates using FHSAE
Table 7: Normalized Wealth Index Estimates using FHSAE
Table 8: List of Variables and Outliers
Table 9: FHSAE Regression Results for Poverty Rate
Table 10: FHSAE Regression Results for Log Per Capita Expenditure
Table 11: FHSAE Regression Results for Normalized Wealth Index

LIST OF ACRONYMS

BLUP     Best Linear Unbiased Predictor
CV       Coefficient of Variation
DNB      Day-Night Band
EA       Enumeration Area
EBLUP    Empirical Best Linear Unbiased Predictor
ELL      Elbers, Lanjouw and Lanjouw
FGLS     Feasible Generalized Least Squares
FH       Fay-Herriot
FHSAE    Fay-Herriot Small Area Estimation
IID      Independent and Identically Distributed
MODIS    Moderate Resolution Imaging Spectroradiometer
MSE      Mean Squared Error
NDVI     Normalized Difference Vegetation Index
RMSE     Root Mean Squared Error
SAE      Small Area Estimation
SRTM     Shuttle Radar Topography Mission
TLDHS    Timor-Leste Demographic and Health Survey
TLPHC    Timor-Leste Population and Housing Census
TL-SLS   Timor-Leste Survey of Living Standards
VIF      Variance Inflation Factor
VIIRS    Visible Infrared Imaging Radiometer Suite

ACKNOWLEDGEMENTS

This report was prepared by a team led by Ririn Salwa Purnamasari (Senior Economist, EEAPV), with vital support from Bagus Arya Wirapati (Consultant, EEAPV), Hamidah Alatas (Consultant, EEAPV) and Mercoledi Nikman Nasiir (Consultant, EEAPV). The geospatial satellite data were prepared by Yasuhiro Kawasoe (Social Protection Specialist, HEASP). Excellent comments on the draft were received from peer reviewers: Paul Andres Corral Rodas (Senior Economist, GGHVP), William Hutchins Seitz (Senior Economist, EECPV), and Alexandru Cojocaru (Senior Economist, EPVGE). The method to generate the Fay-Herriot small area estimates in this report relied upon Stata commands developed by Paul Andres Corral Rodas (Senior Economist, GGHVP) and William Hutchins Seitz (Senior Economist, EECPV).
Overall guidance was provided by Rinku Murgai (Practice Manager, Poverty and Equity Global Practice, East Asia and Pacific Region), Macmillan Anyanwu (Resident Representative, Timor-Leste), and Satu Kahkonen (Country Director, Indonesia and Timor-Leste).

1. BACKGROUND

Timor-Leste has made significant progress in recent years in reducing poverty. The poverty rate at the national poverty line fell from 50.4 percent in 2007 to 41.8 percent in 2014. Measured using an internationally comparable poverty line of US$1.90 per person per day (2011 PPP), poverty declined even more rapidly, from 47.2 percent to 30.7 percent between 2007 and 2014. However, Timor-Leste’s poverty rate remains amongst the highest in the East Asia and Pacific region.

Moreover, these impressive reductions in poverty were not experienced equally across the country. Poverty is still predominantly a rural phenomenon, with 80 percent of the poor living in rural areas. While poverty fell in both rural and urban areas, the decline was larger in urban areas. Across regions, the biggest falls in poverty between 2007 and 2014 were observed in the Central region. Poverty also declined in the West, though poverty incidence in this region remains the highest in the country. Although the eastern region has the lowest poverty rate, poverty there increased slightly, driven by a rise in rural poverty.

TABLE 1: POVERTY RATES AND AVERAGE REAL PER CAPITA CONSUMPTION OF TIMOR-LESTE, 2007 AND 2014

Region      Poverty rate (%)      Real consumption (US$ per person per month)
            2007      2014        2007      2014
National    50.4      41.8        53.9      60.1
Rural       54.7      47.1        50.3      55.5
Urban       38.3      28.3        64.2      71.7
East        31.6      33.8        65.3      63.0
Centre      54.6      40.0        51.6      62.2
West        60.3      55.5        47.7      51.1

Source: TL-SLS 2014

A reduction in poverty reflects a greater capability to meet basic consumption needs. Nationally, household real per capita consumption increased from US$53.9 to US$60.1 between 2007 and 2014.
Mirroring the decline in the poverty rate, the Central region showed the largest increase in real per capita consumption over the same period. An increase in real per capita consumption was also observed in the West. Meanwhile, real per capita consumption declined slightly in the eastern region (from US$65.3 to US$63.0).

Since 2011, the Government of Timor-Leste has set a goal to eradicate extreme poverty by introducing socially inclusive policies and programs. The success of these policies and programs will depend on how effectively they are developed and targeted. Regular monitoring of the progress of poverty reduction, especially in small administrative areas, is therefore key to informing the government in its allocation of scarce resources to improve people’s welfare.

The challenge for Timor-Leste, however, is that the existing welfare estimates from the Timor-Leste Survey of Living Standards (TL-SLS) and the Timor-Leste Demographic and Health Survey (TL-DHS) are representative only at the district level and thus do not capture the detail and heterogeneity of living standards within districts. Increasing the geographical disaggregation of nationwide household welfare survey data is costly, both financially and in terms of the time taken to do the survey. To overcome this constraint, a variety of Small Area Estimation (SAE) methods that combine survey data with more granular non-survey data to estimate poverty have been developed and widely used. However, traditional SAE methods, including the one introduced by Elbers, Lanjouw, and Lanjouw (2003) (ELL), which is commonly used within the World Bank, require census data to generate poverty estimates for smaller areas. In most countries, census data are only available every decade. An alternative method for estimating poverty in small areas using non-conventional data sources is therefore needed, particularly when census data are not available.
An emerging alternative source of data to perform SAE is geospatial data from satellite imagery. As satellite imagery is increasingly available, including in the public domain, there have been several applications of the analysis of satellite imagery to economic questions. Night-time luminosity, which measures the intensity of light captured passively by satellite, is becoming a commonly used remote sensing measure of economic activity for small areas. Recently, daytime imagery has also emerged as a practical source of information on welfare and economic activities. For instance, daytime satellite imagery has been used to predict enterprise counts and employment in Vietnam (Goldblatt et al. 2019) and to define urban markets in India (Vogel et al. 2019).

Most satellite data are available frequently, some even on a weekly or daily basis, with minimal time lags. This permits the use of satellite data in the same time frame as household survey data. In addition, satellite images are available in a wide range of resolutions, which supports accurate analysis at the intended administrative level where household survey data are available. Various geospatial data from satellite imagery are also freely available from a range of sources.

Recent studies have demonstrated the effectiveness of employing satellite imagery to estimate poverty or welfare in small areas. Examples include Jean et al. (2016) in five African countries, Engstrom et al. (2017) in Sri Lanka, Babenko et al. (2017) in Mexico, Tingzon et al. (2019) in the Philippines, and Seitz (2019) in Central Asia.
In terms of methodology, the SAE approach introduced by Fay and Herriot (1979) offers an advantage: it allows estimation of poverty or welfare at an administrative level where survey data are not representative, without relying on census data. It does so by linking the survey estimates with area-level information that is less subject to imprecision, such as administrative data and geospatial data obtained from satellite imagery. The Fay-Herriot Small Area Estimation (FHSAE) method thus allows estimation of welfare indicators at a relatively granular level whenever household survey data and satellite imagery data are available at a high frequency.

However, the FHSAE method becomes highly inefficient when it is applied to areas with very small sample sizes, resulting in extremely high variance. In addition, the method assumes that the sampling variances of the direct estimators are known. Therefore, the resulting Mean Squared Error (MSE) of the poverty estimators does not account for the uncertainty due to the estimation of the sampling variance of the direct estimator. While the FHSAE can be applied to non-sampled areas, this is not recommended because the estimate will rely entirely on the linear fit. Moreover, since the estimates are model-based, the model needs to be checked to ensure that its assumptions approximately hold.

Given the advantages and disadvantages of the FHSAE method, in this paper we apply the FHSAE approach to Timor-Leste in order to produce poverty and welfare estimates at the sub-district level. There are 65 sub-districts spread across 13 districts in Timor-Leste. The number of sub-districts in each district ranges between three and seven. The TL-SLS and TL-DHS sampling designs provide a number of sampled households for each sub-district. While sub-district sample sizes are not large enough to perform direct estimations at sub-district level, they are large enough to allow for the use of FHSAE.
The fact that all sub-districts are sampled is an advantage for the Timor-Leste FHSAE exercise: there are no out-of-sample sub-districts, and thus the results of the estimation are potentially more precise than similar exercises in other countries that do not have this advantage. The exercise was performed for three indicators: poverty estimates and average real per capita consumption using the TL-SLS, and an asset-based welfare index using the TL-DHS.

This report is structured as follows. Section 2 presents an in-depth explanation of the FHSAE method. Section 3 reviews the sub-district level data used in this study, which include the imprecise TL-SLS and TL-DHS direct estimates as well as the satellite imagery data. Section 4 explains the variable selection method used for the FHSAE model in this study. Section 5 provides the results of the FHSAE exercise on poverty estimates, average real per capita consumption and the wealth index, presenting them as maps. Section 6 concludes.

2. FAY-HERRIOT SMALL AREA ESTIMATION MODEL

Fay and Herriot (1979) proposed a methodology to improve the direct estimator $\hat{\theta}_i$ of each area $i$ to estimate the true small area mean $\theta_i$ using the following two levels: the sampling model and the linking model. The sampling model is defined as

$$\hat{\theta}_i = \theta_i + e_i \tag{1}$$

where $\hat{\theta}_i$ is the survey direct estimate of the welfare indicator (in this case, poverty rate or average real per capita consumption) and $e_i$ is the sampling error corresponding to $\hat{\theta}_i$, such that $e_i \mid \theta_i \sim N(0, \psi_i)$ and $\psi_i$ is the known variance of the sampling error. The linking model is defined as

$$\theta_i = x_i^{\top}\beta + u_i \tag{2}$$

where $x_i$ is the vector of area-specific auxiliary variables (in this case, geospatial satellite imagery data), $\beta$ is a vector of regression coefficients to be estimated, and $u_i$ are independent and identically distributed (IID) random errors, such that $u_i \sim N(0, \sigma_u^2)$, where $\sigma_u^2$ is the unknown variance of the area-specific random effect to be estimated.

Since $x_i$ is obtained from continuous monitoring of satellite imagery data, it is considered free of sampling error. The sampling model accounts for the variability of the survey direct estimates $\hat{\theta}_i$, while the linking model links $\theta_i$ to the vector $x_i$. By substituting $\theta_i$ from the linking model into the sampling model, the Fay-Herriot Small Area Estimation (FHSAE) model can be defined as

$$\hat{\theta}_i = x_i^{\top}\beta + u_i + e_i \tag{3}$$

where it is assumed that $u_i$ and $e_i$ are independent. The FHSAE method aims to estimate the small area mean $\theta_i = x_i^{\top}\beta + u_i$ and obtain a measure of the uncertainty associated with the estimation. The best linear unbiased predictor (BLUP) of $\theta_i$, which minimizes the Mean Squared Error (MSE), can be expressed as a weighted average of the direct estimate $\hat{\theta}_i$ and the model-based estimator $x_i^{\top}\tilde{\beta}$ as follows

$$\tilde{\theta}_i = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, x_i^{\top}\tilde{\beta} \tag{4}$$

where $\gamma_i = \dfrac{\sigma_u^2}{\psi_i + \sigma_u^2}$ is the shrinkage factor, such that $\gamma_i \in (0,1)$. Note that $\gamma_i$ measures the uncertainty in modelling $\theta_i$, namely $\sigma_u^2$, relative to the total variance $\psi_i + \sigma_u^2$.

If the parameter $\sigma_u^2$ is known, then $\tilde{\beta}$ can be obtained by a standard weighted least squares estimation. In practice, however, $\sigma_u^2$ is commonly unknown, so it must be replaced by an estimator $\hat{\sigma}_u^2$. Thus, an Empirical Best Linear Unbiased Predictor (EBLUP) is obtained and defined as

$$\hat{\theta}_i^{\mathrm{EBLUP}} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\, x_i^{\top}\hat{\beta} \tag{5}$$

where $\hat{\beta}$ is the Feasible Generalized Least Squares (FGLS) estimator for $\beta$ and $\hat{\gamma}_i = \dfrac{\hat{\sigma}_u^2}{\psi_i + \hat{\sigma}_u^2}$.

Observe that the shrinkage factor $\hat{\gamma}_i$ serves as a weight on the direct and model-based estimates that depends on the parameter $\psi_i$. The greater the sampling variance $\psi_i$, the less precise the direct estimate, and hence the lower the weight given to it. In this case, the FH estimate will be based more on the synthetic model-based estimate. Conversely, the lower the sampling variance, the greater the weight given to the direct estimate. To obtain the parameter $\psi_i$, this exercise uses a linearized variance estimator based on a first-order Taylor series.

3. DATA

3.1.
Timor-Leste Survey of Living Standards 2014

TL-SLS 2014 is the most recent consumption-based household survey of Timor-Leste. The survey was fielded from April 2014 to March 2015 on a sample of 400 Enumeration Areas (EAs). The EAs were stratified by urban and rural location within each district. Within each EA, a sample of 15 households was randomly drawn, resulting in 5,916 successfully interviewed households out of the intended sample size of 6,000. The sampling weights reflect the unequal probabilities of selection inherent in a sample that is representative at district level.

The consumption estimates are in terms of USD per person per month [1]. These estimates were obtained based on a seven-day recall of consumption of 135 food commodities and a mixed recall period (monthly, quarterly, or annual) for expenditure on 53 non-food commodities. The consumption aggregate also includes an estimate of imputed rent, which is based on a hedonic housing equation.

Poverty estimates used in this exercise are based on the national poverty lines of Timor-Leste. The poverty lines were estimated by the Directorate General of Statistics (DGS) using a cost of basic needs approach for each district, without urban and rural disaggregation. Each poverty line consists of a food, a rent and a non-food component and varies across the 13 districts. The food component accounts for the largest share, over one-half, of each poverty line. The highest poverty line was in Dili, at USD 56.16 per person per month, while the lowest were in Ermera and Liquica, at USD 37.97.

Real per capita consumption used here is the total household expenditure per capita, deflated by a temporal and a spatial price index. The temporal price index applied by the DGS captures the monthly inflation of food and fuel (kerosene and firewood) items over the year of TL-SLS 2014 data collection, using the Laspeyres price index calculation. The temporal adjustment is required because households were interviewed throughout the year.
Meanwhile, the spatial price index is derived from a relative average of the official poverty lines. In this exercise, real per capita consumption was log-transformed to bring its distribution closer to a normal distribution.

Since TL-SLS 2014 is not designed to produce direct estimates below the district level, the sub-district level direct estimates produced from the dataset are expected to have high standard errors. As can be seen from Figure 1 and Figure 2, the standard errors ranged from 0 to 10.7 percent for sub-district level poverty estimates and from 0 to 0.1 for sub-district level log average real per capita consumption, resulting in wide confidence intervals in some sub-districts. The Coefficient of Variation (CV) of poverty rates ranged from 0 to 68 percent, while that of log average real per capita consumption ranged from 0 to 6.5 percent.

[1] The official currencies of Timor-Leste are United States dollar notes and Timor-Leste centavo coins. The coins, however, are not internationally recognized and hence have not been assigned an ISO code.

FIGURE 1: STANDARD ERRORS (LEFT) AND CONFIDENCE INTERVALS (RIGHT) OF POVERTY ESTIMATES AT SUB-DISTRICT LEVEL
[Figure not reproduced. Left panel axis: 0% to 12%. Right panel axis: 0% to 100%.]
Source: TL-SLS 2014

FIGURE 2: STANDARD ERRORS (LEFT) AND CONFIDENCE INTERVALS (RIGHT) OF LOG AVERAGE REAL PER CAPITA CONSUMPTION AT SUB-DISTRICT LEVEL
[Figure not reproduced. Left panel axis: 0% to 12%. Right panel axis: 3 to 4.4.]
Source: TL-SLS 2014

3.2. Timor-Leste Demographic and Health Survey 2016

The TL-DHS 2016 was established to provide a comprehensive overview of population, maternal and child health issues in Timor-Leste. Data collection took place from September to December 2016. The sampling frame was based on the 2015 Timor-Leste Population and Housing Census (TLPHC). The TL-DHS 2016 was designed to be representative at the district level. The stratified sample was drawn in two stages.
In the first stage, 455 EAs were randomly sampled with probability proportional to EA population size from TLPHC 2015: 129 EAs in urban areas and 326 EAs in rural areas. In the second stage, 26 households were randomly sampled within each selected EA. During data collection, no replacement or changes of the pre-selected households were allowed. This resulted in a total of 11,502 successfully interviewed households, or a response rate of approximately 99 percent.

Welfare in TL-DHS 2016 was represented by a wealth index. This is a score assigned to households based on their housing characteristics, such as source of drinking water, toilet facilities and flooring materials, and on ownership of consumer durable goods, such as TVs, bicycles and cars, amongst others. A Principal Component Analysis (PCA) was employed to derive the household’s wealth index at national level. About 86 percent of urban households fell in the top 40 percent of the wealth distribution, while 53 percent of rural households fell in the bottom 40 percent. Based on the wealth index, urban households therefore tend to be wealthier. In this exercise, the wealth index is normalized to span between 0 and 1.

As with the TL-SLS 2014, since TL-DHS 2016 was not designed to be representative below district level, the sub-district direct estimates of the wealth index have standard errors spanning from 0 to 0.04 (Figure 3). This results in wide confidence intervals in some sub-districts. The CV of wealth index direct estimates ranged from 0 to 21 percent.

FIGURE 3: STANDARD ERRORS (LEFT) AND CONFIDENCE INTERVALS (RIGHT) OF WEALTH INDEX AT SUB-DISTRICT LEVEL
[Figure not reproduced. Left panel axis: 0 to 0.05. Right panel axis: 0 to 0.8.]
Source: TLDHS 2016

3.3. Geospatial Satellite Data

A wide variety of geospatial data were drawn from various free sources.
The data were categorized into three groups: economic activities, agro-climatic conditions, and project locations of donors and FCV-related data.

1. Economic activities

• Night-time Luminosity data: Monthly average radiance composite images using night-time data from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB) [2] for 2016 were collected through the Google Earth Engine. The spatial resolution of the original data is 15 arc seconds. The annual median value of DNB radiance (nanoWatts/cm2/sr) was then aggregated at the sub-district level.
• Built up area: Global Human Settlement Layer data [3], which estimate built-up areas, were developed by the European Commission Joint Research Center using Landsat image collections (GLS1975, GLS1990, GLS2000, and Landsat 8 collection 2013/2014). In this study, we categorize the data into five groups as follows:
  a. Never built up;
  b. Built-up before 1975;
  c. Built-up between 1975-1990;
  d. Built-up between 1990-2000; and
  e. Built-up between 2000-2014.
  The total area (square kilometers) for each category was then aggregated at the sub-district level.
• Population density data: Mean population density at sub-district level was calculated using the Gridded Population of the World, Version 4 (GPWv4), Revision 11 [4]. It models the distribution of global human populations for 2015 on 30 arc-second (approximately 1km) grid cells. The study used the United Nations (UN) adjusted version, where total population per country is adjusted to match the 2015 Revision of UN World Population Prospects country totals.
• Accessibility data: Average travel time from the central points of grid cells (the resolution of the data is around 1km) to cities was calculated for each sub-district using the global map of travel time to cities to assess inequality in accessibility in 2015 [5].
  The definition of city adopted in this study is a densely populated area (contiguous areas with 1,500 or more inhabitants per square kilometer or a majority of built-up land cover types coincident with a population center of at least 50,000 inhabitants).
• Number of buildings: The number and GPS location of buildings, schools, and hospitals were collected during the 2015 census.
• Length of road network: The data were obtained from OpenStreetMap (as of February 2, 2020). We categorized the initial 25 types of road from OpenStreetMap into nine types: trunk, primary, primary link, secondary, secondary link, motorway, motorway link, tertiary, and tertiary link.
• Gross Domestic Product Grid: Map of total economic activity, including both formal and informal economic activity for ~2006; created from nighttime lights and LandScan population grid.

[2] https://spie.org/Publications/Proceedings/Paper/10.1117/12.2023107
[3] https://publications.jrc.ec.europa.eu/repository/handle/JRC97705
[4] https://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev11
[5] https://malariaatlas.org/research-project/accessibility_to_cities/

2. Agro-climatic conditions

• Elevation and slope data: from the Shuttle Radar Topography Mission (SRTM) digital elevation dataset Version 4 in 90m resolution [i].
• Precipitation data: from Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) version 2.0 at 0.05 arc degrees resolution [6]. The average daily precipitation values (mm/day) in each cell were then aggregated at the sub-district level.
• Normalized Difference Vegetation Index (NDVI): The sub-district annual average of the 16-day NDVI values was aggregated from the cell-level data of the MOD13Q1 V6 Terra Vegetation Indices 16-Day product in 250 m resolution [7].
• Surface temperature data: from TerraClimate [8] by the University of Idaho, which captures Monthly Climate and Climatic Water Balance for Global Terrestrial Surfaces in 2.5 arc minutes resolution.
• Landcover data: from the Global Land Cover by National Mapping Organizations (GLCNMO) Global version 3⁹, which classifies the land cover of the whole globe into 20 categories using 2013 MODIS data at 500m resolution.
• Tree canopy cover for year 2000 (GFC 2018, v1.6): Tree cover in the year 2000, defined as canopy closure for all vegetation taller than five meters in height. Dataset is downloaded from AidData.
• Year of gross forest cover loss event (GFC 2018 v1.6): Forest loss during the period 2000-2018, defined as a stand-replacement disturbance, or a change from a forest to non-forest state. Dataset is downloaded from AidData.
• Wind Speed Potential: Wind speed potential at 50m resolution from Global Wind Atlas. Dataset is downloaded from AidData.
• Wind Power Density Potential: Wind power density potential at 50m resolution from Global Wind Atlas. Dataset is downloaded from AidData.
• Distance to Coast: Distance to coast (on land only), measured in meters. Derived by AidData using World Vector Shorelines.
• Distance to Water: Distance to water, measured in meters. Derived by AidData using World Vector Shorelines combined with rivers and lakes from World Data Bank 2.
• Distance to Roads: Distance to roads, measured in meters, based on the Global Roads Open Access Dataset (gRoads) version 1.0 by AidData.
• Distance to Country Borders: Distance to country borders, measured in meters. Derived by AidData using GADM 2.8 ADM0 (Country) boundaries.
• Ozone Concentration: Ozone concentration (ug/m3) from TM5 FASST simulation. Dataset is downloaded from AidData.
• Particulate Matter (PM2.5) Concentration: Particulate matter (PM2.5) estimate, based on a prediction model using a combination of satellite-based estimates and TM5-FASST simulation, from AidData.
6 https://www.chc.ucsb.edu/data/chirps
7 https://lpdaac.usgs.gov/products/mod13q1v006/
8 http://www.climatologylab.org/terraclimate.html
9 https://globalmaps.github.io/glcnmo.html
3. Project Locations of Donors and FCV Related Data
• World Bank Geocoded Aid Data v1.4.2: Location of World Bank projects from 1995 to 2014 geocoded by AidData.
• Timor Leste Geocoded Aid Data v1.4.1: Location of projects by donors from 1998 to 2015 geocoded by AidData.
• Global Environment Facility Sectors Geocoded Aid Data v1.1.0: Aid data from Global Environment Facility Sectors, covering projects from 1994 to 2014.
• Chinese Official Finance v1.1.0: Includes the entire Geocoded Global Chinese Official Finance from AidData.
• The World Database on Protected Areas (WDPA): Global database of marine and terrestrial protected areas. Dataset downloaded and managed by AidData in April 2017.
• UCDP Conflict Deaths: Number of total fatalities per 0.01 decimal degree grid cell resulting from conflict events, using UCDP Georeferenced Event Dataset (GED) global version 17.1. Dataset is downloaded from AidData.
4. MODEL SPECIFICATION
Given a list of 102 variables derived from the satellite imagery data and AidData, we need to select, from all possible subsets of variables, the subset that best fits the FHSAE model estimation. We first performed an outlier test using Grubbs' test¹⁰, also known as the maximum normalized residual test, which detects outliers in a univariate data set assumed to be drawn from a normally distributed population. Once outliers were detected, we winsorized them to reduce their effect on the estimation. Finally, variables with more than ten outliers were dropped from the candidate pool. The variable selection was performed manually by setting up an iteration that selects the variable with the lowest p-value in the FHSAE estimation from the pool of 102 variables. The selected variable was then stored, and the iteration was repeated to select the next variable with the lowest p-value until ten variables were selected.
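The screening and selection steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the percentile cut-offs used for winsorizing are hypothetical (the report does not state them), and `pvalue_of` is a hypothetical callback standing in for a refit of the FHSAE model that returns the candidate variable's p-value. The variable names and values are toy data.

```python
from statistics import mean, stdev


def grubbs_statistic(values):
    """Grubbs' statistic G: largest absolute deviation from the mean, in sample SDs."""
    centre, spread = mean(values), stdev(values)
    return max(abs(v - centre) for v in values) / spread


def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to percentile bounds (the cut-offs are assumed, not from the report)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low, high = ordered[int(lower * last)], ordered[int(upper * last)]
    return [min(max(v, low), high) for v in values]


def forward_select(candidates, pvalue_of, max_vars=10):
    """Greedy selection: repeatedly add the candidate with the lowest p-value.

    pvalue_of(selected, candidate) is a hypothetical callback that would refit
    the model with selected + [candidate] and return the candidate's p-value.
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < max_vars:
        best = min(remaining, key=lambda name: pvalue_of(selected, name))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    radiance = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy values, not report data
    print(round(grubbs_statistic(radiance), 2))
    print(winsorize(radiance))
    toy_pvalues = {"night_lights": 0.001, "ndvi": 0.03, "road_length": 0.20}
    print(forward_select(toy_pvalues, lambda chosen, name: toy_pvalues[name], max_vars=2))
```

In practice G would be compared against a critical value for Grubbs' test at the chosen significance level; that step needs the t distribution and is left out of this sketch.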
The ten selected variables were then evaluated using the uncentered Variance Inflation Factor (VIF), and variables with a VIF greater than 20 were dropped from the model. Once the selection using uncentered VIF was complete, we dropped variables that were not significant in the FHSAE estimation to obtain the final model. In dropping the variables, we observed the change in Schwarz's Bayesian Information Criterion (BIC) to ensure that the new model provided a better approximation than the previous one. To ensure that the model obtained from the iteration was sufficiently robust, we compared our final model with the screening steps used by Seitz (2019). The screening consisted of two steps: first, a StepVIF procedure, which dropped variables in the pool with a VIF greater than ten; second, a stepwise regression applied to the variables retained by StepVIF. Once the set of variables from the automated model had been obtained, all selected variables were checked, and combinations of variables considered relevant were manually added back to see whether any improvement resulted. This comparison was made to confirm that no major gain can be obtained by deviating to a model selected using a different selection method. Further discussion regarding model validation and precision can be found in Appendix 4.
10 Grubbs (1969) and Stefansky (1972)
5. RESULTS AND DISCUSSIONS
5.1. Poverty Rate
As seen in Figure 4, the FHSAE estimates at the sub-district level underline the variability of sub-district poverty estimates within each district. The sub-district FHSAE estimates are consistent with the fact that western Timor-Leste is poorer than eastern Timor-Leste. The majority of these sub-district variations, however, are not significantly different from the corresponding district direct estimates (see confidence intervals in Figure 9).
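One way to read "significantly different" from an estimate and its standard error is to compare confidence intervals. The sketch below is an illustration under assumptions: the report does not state how significance was judged, so it assumes 95 percent normal-approximation intervals and treats non-overlapping intervals as a (conservative) sign of a significant difference. The inputs are the rounded poverty rates and standard errors listed in Table 5, so results may differ slightly from those based on unrounded values.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation interval: estimate +/- z * standard error."""
    margin = z * std_error
    return estimate - margin, estimate + margin


def intervals_overlap(first, second):
    """True when two (low, high) intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


if __name__ == "__main__":
    # Rounded poverty rates and standard errors (percent) as listed in Table 5.
    cailaco = confidence_interval(82, 4)
    maliana = confidence_interval(58, 8)
    balibo = confidence_interval(41, 6)
    atabae = confidence_interval(46, 10)
    print(intervals_overlap(cailaco, maliana))
    print(intervals_overlap(balibo, atabae))
```

Non-overlap is stricter than a formal test of the difference between two estimates, so this check can miss differences that a direct test would flag.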
In some sub-districts, poverty rates from the FHSAE estimates are significantly lower than the respective district direct estimates and those of other sub-districts within the same district, such as Liquidoe (Aileu), Fatumean (Covalima) and Tutuala (Lautem). On the other hand, Venilale (Baucau), Cailaco (Bobonaro), Fatululic (Covalima), Atauro (Dili), Soibada (Manatuto) and Lacluta (Viqueque) are sub-districts with significantly higher poverty rates than their respective districts and other sub-districts within the same district. Moreover, within districts, some sub-district poverty rates are statistically different from one another.
FIGURE 4: MAP OF SUB-DISTRICT POVERTY ESTIMATES USING FHSAE
Source: FHSAE based on TL-SLS 2014 and satellite imagery data
Table 2 presents Timor-Leste's sub-districts with poverty rates above 60 percent. All of these poorest sub-districts are concentrated in the four poorest districts. Cailaco (Bobonaro) and Fatululic (Covalima) are estimated as the poorest sub-districts in Timor-Leste, with 82 percent and 79 percent poverty rates, respectively. Poverty rates in these two sub-districts are significantly higher than in the rest of the poorest sub-districts. Variations are more apparent beyond the poorest sub-districts, as several sub-district poverty rates are statistically different from other sub-district poverty rates (see Figure 9). This suggests that the FHSAE estimates at sub-district level provide additional information beyond the existing district direct estimates.
TABLE 2: POOREST SUB-DISTRICTS BASED ON FH ESTIMATES OF POVERTY RATES
District    Sub-district    Estimates    S.E.    Gamma
Bobonaro    Cailaco         82%          4%      0.11
Covalima    Fatululic       79%          0%      0.00
Oecussi     Nitibe          71%          6%      0.32
Viqueque    Lacluta         69%          1%      0.00
Ermera      Letefoho        67%          8%      0.54
Oecussi     Oesilo          67%          6%      0.31
Manatuto    Soibada         64%          0%      0.00
Oecussi     Passabe         63%          9%      0.70
Ermera      Hatolia         63%          6%      0.28
Covalima    Zumalai         61%          7%      0.42
Ermera      Ermera          60%          9%      0.63
Source: FHSAE based on TL-SLS 2014 and satellite imagery data
5.2. Log Average Real Per Capita Consumption
At the sub-district level, estimates of log average real per capita consumption have less extreme values than poverty rates and thus display less variability. As the map in Figure 5 shows, many sub-districts in the western part of the country tend to be poorer than those in the eastern part. Similar to the FHSAE estimates of poverty rates, the majority of sub-district FHSAE estimates are not significantly different from their respective district estimates. Among those with significant differences, Fatumean (Covalima) and Fatuberliu (Manufahi) are two sub-districts with significantly greater log average real per capita consumption than their respective districts and other sub-districts within the same district (see the respective confidence intervals in Figure 10). Meanwhile, Baguia (Baucau), Venilale (Baucau), Fatululic (Covalima), Metinaro (Dili), Atauro (Dili), Soibada (Manatuto) and Lacluta (Viqueque) are sub-districts with significantly lower log average real per capita consumption than their respective districts and other sub-districts within the same district.
FIGURE 5: MAP OF SUB-DISTRICT LOG AVERAGE REAL PER CAPITA CONSUMPTION USING FHSAE
Source: FHSAE based on TL-SLS 2014 and satellite imagery data
Table 3 presents sub-districts with the lowest log average real per capita consumption. Most of these sub-districts are also the poorest sub-districts as estimated previously and are from the four poorest districts in the country.
However, unlike the FHSAE estimates of poverty rates, the distribution of FHSAE estimates of log average real per capita consumption is tighter around the tail. As a result, many of the FHSAE estimates of log average real per capita consumption are not significantly different across sub-districts among the poorest districts. This is probably the result of the log transformation, which reduces the intensity of extreme values.
TABLE 3: POOREST SUB-DISTRICTS BASED ON FH ESTIMATES OF LOG AVERAGE REAL PER CAPITA EXPENDITURE
District    Sub-district     Estimates    S.E.    CV
Covalima    Fatululic        3.58         0.00    0.00
Oecussi     Nitibe           3.64         0.06    1.68
Bobonaro    Cailaco          3.64         0.08    2.24
Ermera      Letefoho         3.65         0.09    2.40
Viqueque    Lacluta          3.72         0.04    0.97
Oecussi     Passabe          3.74         0.09    2.48
Oecussi     Oesilo           3.75         0.09    2.37
Manatuto    Soibada          3.76         0.00    0.00
Ermera      Ermera           3.77         0.10    2.55
Bobonaro    Maliana          3.80         0.08    2.19
Oecussi     Pante Macasar    3.80         0.04    1.00
Source: FHSAE based on TL-SLS 2014 and satellite imagery data
5.3. Wealth Index
The FHSAE estimates of wealth index at sub-district level preserve the tendency for the Eastern region to be generally wealthier than the Western region (Figure 6). However, as the green and red/orange areas in the maps are relatively spread out, the welfare disparity between the Eastern and Western regions is not demonstrated as strongly as in the previous results based on poverty rates and average real per capita consumption. This suggests that the sub-district FHSAE estimates of wealth index show greater variability within districts than those of the poverty rates and average real per capita consumption. As before, the majority of the sub-districts are not significantly different from their respective districts, but more variation is observed in the FHSAE wealth index estimates than in the FHSAE estimates of poverty rates and log average per capita consumption.
For example, Quelicai (Baucau), Baguia (Baucau), Cailaco (Bobonaro), Fatululic (Covalima), Fatumean (Covalima), Zumalai (Covalima), Metinaro (Dili), Atauro (Dili), Luro (Lautem), Laclubar (Manatuto) and Nitibe (Oecussi) are among the sub-districts that are significantly less wealthy than their respective districts (see the respective confidence intervals in Figure 11). The FHSAE estimates of wealth index, however, do not show sub-districts that are significantly wealthier than their respective districts, though some sub-districts have significantly higher estimates than other sub-districts within the same district. This suggests that the FHSAE estimates of wealth index provide additional information beyond the already known estimates at the district level.
FIGURE 6: MAP OF SUB-DISTRICT WEALTH INDEX USING FHSAE
Source: FHSAE based on TLDHS 2016 and satellite imagery data
As Figure 6 shows, poor sub-districts based on wealth index are found in almost every district, with Dili being the exception. The wide spread of the wealth index seems highly attributable to the fact that urban households are more likely to be wealthier, so that sub-districts with more urban areas tend to have higher wealth index estimates. The evidence that urban households are wealthier is more pronounced when comparing Dili's sub-districts to sub-districts in other districts. Four of Dili's six sub-districts (Dom Aleixo, Nain Feto, Vera Cruz, and Cristo Rei) are the most urbanized sub-districts in Timor-Leste (more than 88 percent), and also the top four sub-districts in terms of wealth index. The outlier trait of these sub-districts mirrors the direct estimates at district level, where Dili also possesses a much higher wealth index than other districts. Table 4 presents the poorest sub-districts based on the FHSAE estimates of wealth index, which come from seven districts.
This means that the poorest sub-districts in terms of wealth index are spread across more districts than the four districts that contain the poorest sub-districts based on the FHSAE poverty rate and log average real per capita consumption. Interestingly, even Lautem and Manatuto, the second and third richest districts in terms of wealth index, have their poorest sub-districts with wealth indices as low as that of Oecussi's Nitibe. As in the poverty estimates, the wealth index of many sub-districts is statistically different from that of other sub-districts located in different districts (Figure 11), providing new, more granular information on poverty that is not available in the existing district direct estimates.
TABLE 4: POOREST SUB-DISTRICTS BASED ON FH ESTIMATES OF WEALTH INDEX
District    Sub-district    Estimates    S.E.    CV
Oecussi     Nitibe          0.26         0.01    5.25
Lautém      Luro            0.27         0.02    6.51
Ermera      Hatolia         0.29         0.01    3.97
Oecussi     Oesilo          0.30         0.01    4.65
Oecussi     Passabe         0.31         0.01    3.95
Covalima    Zumalai         0.31         0.01    4.16
Ainaro      Hatu-Builico    0.31         0.01    3.65
Ermera      Atsabe          0.31         0.02    5.33
Manatuto    Laclubar        0.31         0.02    5.74
Ainaro      Maubisse        0.32         0.01    3.87
Baucau      Quelicai        0.32         0.02    5.72
Source: FHSAE based on TL-SLS 2014 and satellite imagery data
6. CONCLUSION
The FHSAE method using satellite imagery provides robust estimates of poverty, average real per capita consumption and wealth index at the sub-district level for Timor-Leste. The estimates resulting from the FHSAE are superior to direct estimates as they are more precise, i.e. the FHSAE estimates have lower standard errors than the direct estimates. The method offers a reliable, alternative way to estimate poverty and welfare in small areas using non-conventional data sources. The FHSAE method, however, must be used cautiously as it loses its precision exponentially when estimating welfare indicators in areas with small sample sizes.
Precision is reduced further in non-sampled areas, as the prediction would then rely purely on synthetic modelling. In the case of Timor-Leste, TL-SLS and TL-DHS provide a number of sampled households for each sub-district, and therefore we do not have any out-of-sample estimation, which would usually result in much less precise estimates. The FHSAE estimates at the sub-district level show that, in Timor-Leste, many sub-districts in the western part of the country tend to be poorer than those in the eastern part. While most sub-district FHSAE estimates do not differ significantly from their respective district direct estimates, some sub-district poverty and welfare estimates within the same district do statistically differ from one another. Moreover, variations are more evident across districts, with several sub-district poverty and welfare estimates differing significantly from those of other sub-districts in different districts. Therefore, the FHSAE estimates at sub-district level provide additional information beyond the already known district direct estimates from the existing survey, emphasizing the importance of granular poverty and welfare estimates in providing useful information for developing spatially targeted interventions. With most satellite data now available on a frequent basis, the FHSAE method allows estimates to be produced at any time as household survey data become available, without relying on the availability of census data. This means the small area estimates resulting from the FHSAE can be updated on a more regular basis, and hence poverty and welfare indicators can be monitored more frequently at levels of granularity that are typically not feasible with survey data alone.
REFERENCES
AidData. 2016. TimorLesteAIMS_GeocodedResearchRelease_Level1_v1.4.1 geocoded dataset. Williamsburg, VA and Washington, DC
AidData. 2017. WorldBank_GeocodedResearchRelease_Level1_v1.4.2 geocoded dataset.
Williamsburg, VA and Washington, DC
AidData Research and Evaluation Unit. 2017. Geocoding Methodology, Version 2.0. Williamsburg, VA: AidData at William & Mary. https://www.aiddata.org/publications/geocoding-methodology-version-2-0
AidData. 2018. GlobalEnvironmentFacilitySectors_GeocodedResearchRelease_Level1_v1.1.0 geocoded dataset. Williamsburg, VA and Washington, DC: AidData.
Avila-Valdez et al (2018) “The Fay-Herriot Model in Small Area Estimation: EM Algorithm and Application to Official Data” Revstat – Statistical Journal.
Babenko et al. (2017) “Poverty Mapping using Convolutional Neural Networks Trained on High and Medium Resolution Satellite Images, With an Application in Mexico”. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
Benavent, R. and D. Morales (2016) “Multivariate Fay-Herriot Models for Small Area Estimation”. Computational Statistics & Data Analysis. 94: 372-390.
Brauer M, et al. Ambient air pollution exposure estimation for the Global Burden of Disease 2013. Environmental Science & Technology. 2015 Nov 23. doi: 10.1021/acs.est.5b03709.
Center for International Earth Science Information Network - CIESIN – Columbia University, and Information Technology Outreach Services - ITOS - University of Georgia. 2013. Global Roads Open Access Data Set, Version 1 (gROADSv1). Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4VD6WCT.
Elbers, C., J. O. Lanjouw and P. Lanjouw (2003) “Micro-Level Estimation of Poverty and Inequality”. Econometrica 71(1): 355-364.
Engstrom et al. (2019) “Exploring the Landscape of Spatial Robustness”. Proceedings of the 36th International Conference on Machine Learning, PMLR 97, 1802-1811
Fay, R. and R. Herriot (1979) “Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data”. Journal of the American Statistical Association. 74(1979): 269-277.
General Directorate of Statistics (GDS), Ministry of Health and ICF (2018) “Timor-Leste Demographic and Health Survey 2016”. Dili, Timor-Leste and Rockville, Maryland, USA: GDS and ICF.
Ghosh, T., Powell, R.L., Elvidge, C.D., Baugh, K.E., Sutton, P.C., Anderson, S.: Shedding light on the global distribution of economic activity. The Open Geography Journal 3, 148-161 (2010)
Global Administrative Areas (GADM) http://www.gadm.org
Global Wind Atlas 2.0 by the Technical University of Denmark (DTU) in partnership with the World Bank Group, utilizing data provided by Vortex, with funding provided by the Energy Sector Management Assistance Program (ESMAP). https://globalsolaratlas.info. Accessed online 2019-03-05.
Goodman, S., BenYishay, A., Lv, Z., & Runfola, D. (2019). GeoQuery: Integrating HPC systems and public web-based geospatial data tools. Computers & Geosciences, 122, 103-112.
Hansen, M. C., P. V. Potapov, R. Moore, M. Hancher, S. A. Turubanova, A. Tyukavina, D. Thau, S. V. Stehman, S. J. Goetz, T. R. Loveland, A. Kommareddy, A. Egorov, L. Chini, C. O. Justice, and J. R. G. Townshend. 2013. “High-Resolution Global Maps of 21st-Century Forest Cover Change.” Science 342 (15 November): 850–53.
IUCN and UNEP-WCMC (2016), The World Database on Protected Areas (WDPA) [On-line], [April 2017], Cambridge, UK: UNEP-WCMC. Available at: www.protectedplanet.net.
Jean et al. (2016) “Combining Satellite Imagery and Machine Learning to Predict Poverty”. Science. Vol. 353, Issue 6301: 790-794
Molina I. and Rao J. (2010) “Small Area Estimation of Poverty Indicators”. Canadian Journal of Statistics, 38(3): 369-385.
Rao, J. (2003) “Small Area Estimation”. 1st ed. Wiley-Interscience.
Seitz, William (2019) “Where They Live: District-Level Measures of Poverty, Average Consumption, and the Middle Class in Timor-Leste”. Policy Research Working Paper (8940). Poverty and Equity Global Practice, The World Bank Group.
Sundberg, Ralph, and Erik Melander, 2013, 'Introducing the UCDP Georeferenced Event Dataset', Journal of Peace Research, vol.50, no.4, 523-532
Croicu, Mihai and Ralph Sundberg, 2017, “UCDP GED Codebook version 17.1”, Department of Peace and Conflict Research, Uppsala University.
Tingzon et al (2019) “Mapping Poverty in the Philippines using Machine Learning, Satellite Imagery, and Crowd-Sourced Geospatial Information”. AI for Social Good ICML 2019 Workshop.
Wessel, P., and W. H. F. Smith, A Global Self-consistent, Hierarchical, High-resolution Shoreline Database, J. Geophys. Res., 101, #B4, pp. 8741-8743, 1996.
Wolter, Kirk (2007) “Introduction to Variance Estimation”. Springer Science & Business Media.
World Bank (2019) “Developing Timor-Leste Gender-Disaggregated Poverty Small Area Estimates: Technical Report”. The World Bank Group.
APPENDIX 1: POVERTY ESTIMATES USING FHSAE
TABLE 5: POVERTY ESTIMATES USING FHSAE
Columns: district code, district name, sub-district code, sub-district name, poverty rate estimate (%), S.E., Gamma, CV
01 Ainaro 0101 Ainaro 39% 9% 0.63 22.65
01 Ainaro 0102 Hatu-Builico 46% 5% 0.20 10.63
01 Ainaro 0103 Maubisse 47% 6% 0.32 13.51
01 Ainaro 0104 Hatu-Udo 43% 5% 0.22 11.68
02 Aileu 0201 Aileu Vila 36% 5% 0.18 12.72
02 Aileu 0202 Liquidoe 14% 4% 0.13 27.57
02 Aileu 0203 Remexio 31% 9% 0.65 29.58
02 Aileu 0204 Laulara 34% 3% 0.07 8.14
03 Baucau 0301 Baucau 22% 4% 0.13 17.89
03 Baucau 0302 Laga 37% 9% 0.62 23.47
03 Baucau 0303 Quelicai 36% 5% 0.25 14.85
03 Baucau 0304 Baguia 48% 6% 0.29 12.00
03 Baucau 0305 Vemase 29% 8% 0.51 27.02
03 Baucau 0306 Venilale 54% 3% 0.06 4.95
04 Bobonaro 0401 Maliana 58% 8% 0.48 13.38
04 Bobonaro 0402 Cailaco 82% 4% 0.11 4.42
04 Bobonaro 0403 Balibo 41% 6% 0.32 14.77
04 Bobonaro 0404 Atabae 46% 10% 0.83 21.66
04 Bobonaro 0405 Lolotoe 54% 10% 0.81 18.20
04 Bobonaro 0406 Bobonaro 52% 6% 0.26 10.64
05 Covalima 0501 Fatululic 79% 0% 0.00 0.00
05 Covalima 0502 Fatumean 37% 0% 0.00 0.00
05 Covalima 0503 Forohem 53% 3% 0.07 5.40
05 Covalima 0504 Maukatar 51% 9% 0.70 17.81
05 Covalima 0505 Suai 49% 7% 0.41 14.36
05 Covalima 0506 Tilomar 34% 9% 0.65 25.52
05 Covalima 0507 Zumalai 61% 7% 0.42 12.11
06 Dili 0601 Vera Cruz 24% 7% 0.37 28.23
06 Dili 0602 Nain Feto 29% 6% 0.29 20.52
06 Dili 0603 Metinaro 42% 7% 0.39 16.08
06 Dili 0604 Atauro 56% 3% 0.08 5.71
06 Dili 0605 Dom Aleixo 26% 4% 0.13 15.22
06 Dili 0606 Cristo Rei 33% 5% 0.18 14.13
07 Ermera 0701 Railaco 47% 8% 0.60 17.89
07 Ermera 0702 Ermera 60% 9% 0.63 14.79
07 Ermera 0703 Letefoho 67% 8% 0.54 12.37
07 Ermera 0704 Atsabe 59% 9% 0.63 15.66
07 Ermera 0705 Hatolia 63% 6% 0.28 9.00
08 Liquiça 0801 Bazartete 43% 7% 0.40 15.82
08 Liquiça 0802 Liquiça 37% 8% 0.50 20.86
08 Liquiça 0803 Maubara 37% 11% 0.75 28.59
09 Lautém 0901 Lospalos 24% 7% 0.33 30.64
09 Lautém 0902 Lautém 38% 7% 0.43 18.81
09 Lautém 0903 Iliomar 33% 4% 0.16 13.03
09 Lautém 0904 Luro 51% 8% 0.49 14.92
09 Lautém 0905 Tutuala 9% 4% 0.15 45.74
10 Manufahi 1001 Same 43% 4% 0.17 10.49
10 Manufahi 1002 Alas 49% 8% 0.57 16.73
10 Manufahi 1003 Fatuberliu 28% 5% 0.21 17.30
10 Manufahi 1004 Turiscai 49% 9% 0.68 18.88
11 Manatuto 1101 Manatuto 30% 5% 0.26 18.50
11 Manatuto 1102 Laleia 16% 5% 0.24 32.83
11 Manatuto 1103 Laclo 57% 8% 0.56 14.86
11 Manatuto 1104 Soibada 64% 0% 0.00 0.00
11 Manatuto 1105 Barique/Natarbora 43% 9% 0.67 20.51
11 Manatuto 1106 Laclubar 45% 7% 0.42 15.66
12 Oecussi 1201 Pante Macasar 58% 4% 0.13 6.94
12 Oecussi 1202 Nitibe 71% 6% 0.32 8.67
12 Oecussi 1203 Oesilo 67% 6% 0.31 9.11
12 Oecussi 1204 Passabe 63% 9% 0.70 14.76
13 Viqueque 1301 Uatucarbau 40% 6% 0.32 15.41
13 Viqueque 1302 Ossu 32% 6% 0.28 17.77
13 Viqueque 1303 Watulari 37% 7% 0.38 18.10
13 Viqueque 1304 Viqueque 33% 3% 0.08 8.78
13 Viqueque 1305 Lacluta 69% 1% 0.00 0.72
APPENDIX 2: LOG AVERAGE REAL PER CAPITA CONSUMPTION USING FHSAE
TABLE 6: LOG AVERAGE REAL PER CAPITA CONSUMPTION ESTIMATES USING FHSAE
Columns: district code, district name, sub-district code, sub-district name, log RPCC estimate, S.E., Gamma, CV
01 Ainaro 0101 Ainaro 4.01 0.10 0.70 2.47
01 Ainaro 0102 Hatu-Builico 3.90 0.06 0.23 1.43
01 Ainaro 0103 Maubisse 3.88 0.06 0.24 1.46
01 Ainaro 0104 Hatu-Udo 4.02 0.06 0.31 1.57
02 Aileu 0201 Aileu Vila 3.99 0.05 0.22 1.34
02 Aileu 0202 Liquidoe 4.17 0.04 0.10 0.87
02 Aileu 0203 Remexio 4.00 0.10 0.73 2.60
02 Aileu 0204 Laulara 4.04 0.06 0.30 1.55
03 Baucau 0301 Baucau 4.12 0.04 0.11 0.89
03 Baucau 0302 Laga 4.05 0.09 0.68 2.34
03 Baucau 0303 Quelicai 3.99 0.04 0.15 1.07
03 Baucau 0304 Baguia 3.82 0.04 0.10 0.94
03 Baucau 0305 Vemase 4.07 0.07 0.41 1.82
03 Baucau 0306 Venilale 3.84 0.03 0.05 0.67
04 Bobonaro 0401 Maliana 3.80 0.08 0.49 2.19
04 Bobonaro 0402 Cailaco 3.64 0.08 0.48 2.24
04 Bobonaro 0403 Balibo 3.91 0.06 0.25 1.44
04 Bobonaro 0404 Atabae 3.94 0.09 0.68 2.38
04 Bobonaro 0405 Lolotoe 3.82 0.10 0.77 2.64
04 Bobonaro 0406 Bobonaro 3.86 0.06 0.31 1.65
05 Covalima 0501 Fatululic 3.58 0.00 0.00 0.00
05 Covalima 0502 Fatumean 4.01 0.00 0.00 0.00
05 Covalima 0503 Forohem 3.81 0.07 0.35 1.76
05 Covalima 0504 Maukatar 3.86 0.10 0.81 2.64
05 Covalima 0505 Suai 3.84 0.08 0.48 2.09
05 Covalima 0506 Tilomar 4.10 0.08 0.53 2.04
05 Covalima 0507 Zumalai 3.81 0.09 0.48 2.24
06 Dili 0601 Vera Cruz 4.13 0.06 0.30 1.54
06 Dili 0602 Nain Feto 4.21 0.07 0.33 1.55
06 Dili 0603 Metinaro 3.92 0.04 0.12 1.01
06 Dili 0604 Atauro 3.85 0.06 0.21 1.48
06 Dili 0605 Dom Aleixo 4.17 0.05 0.22 1.28
06 Dili 0606 Cristo Rei 4.09 0.06 0.30 1.55
07 Ermera 0701 Railaco 3.91 0.10 0.75 2.54
07 Ermera 0702 Ermera 3.77 0.10 0.66 2.55
07 Ermera 0703 Letefoho 3.65 0.09 0.55 2.40
07 Ermera 0704 Atsabe 3.81 0.09 0.59 2.40
07 Ermera 0705 Hatolia 3.81 0.08 0.50 2.11
08 Liquiça 0801 Bazartete 3.93 0.07 0.35 1.73
08 Liquiça 0802 Liquiça 3.95 0.07 0.41 1.88
08 Liquiça 0803 Maubara 3.84 0.08 0.51 2.14
09 Lautém 0901 Lospalos 4.07 0.07 0.33 1.61
09 Lautém 0902 Lautém 3.97 0.06 0.27 1.48
09 Lautém 0903 Iliomar 4.07 0.06 0.28 1.47
09 Lautém 0904 Luro 3.92 0.08 0.50 2.05
09 Lautém 0905 Tutuala 4.16 0.08 0.45 1.85
10 Manufahi 1001 Same 3.93 0.04 0.14 1.06
10 Manufahi 1002 Alas 3.81 0.05 0.22 1.40
10 Manufahi 1003 Fatuberliu 4.09 0.05 0.16 1.11
10 Manufahi 1004 Turiscai 3.87 0.05 0.20 1.32
11 Manatuto 1101 Manatuto 4.10 0.06 0.25 1.37
11 Manatuto 1102 Laleia 4.21 0.09 0.64 2.21
11 Manatuto 1103 Laclo 3.84 0.09 0.62 2.44
11 Manatuto 1104 Soibada 3.76 0.00 0.00 0.00
11 Manatuto 1105 Barique/Natarbora 3.96 0.08 0.50 2.03
11 Manatuto 1106 Laclubar 3.92 0.07 0.35 1.72
12 Oecussi 1201 Pante Macasar 3.80 0.04 0.11 1.00
12 Oecussi 1202 Nitibe 3.64 0.06 0.29 1.68
12 Oecussi 1203 Oesilo 3.75 0.09 0.59 2.37
12 Oecussi 1204 Passabe 3.74 0.09 0.63 2.48
13 Viqueque 1301 Uatucarbau 3.97 0.05 0.20 1.28
13 Viqueque 1302 Ossu 4.09 0.03 0.09 0.83
13 Viqueque 1303 Watulari 4.03 0.06 0.26 1.42
13 Viqueque 1304 Viqueque 4.05 0.03 0.06 0.65
13 Viqueque 1305 Lacluta 3.72 0.04 0.11 0.97
APPENDIX 3: NORMALIZED WEALTH INDEX ESTIMATES USING FHSAE
TABLE 7: NORMALIZED WEALTH INDEX ESTIMATES USING FHSAE
Columns: district code, district name, sub-district code, sub-district name, wealth index estimate, S.E., Gamma, CV
01 Ainaro 0101 Ainaro 4.01 0.10 0.70 2.47
01 Ainaro 0102 Hatu-Builico 3.90 0.06 0.23 1.43
01 Ainaro 0103 Maubisse 3.88 0.06 0.24 1.46
01 Ainaro 0104 Hatu-Udo 4.02 0.06 0.31 1.57
02 Aileu 0201 Aileu Vila 3.99 0.05 0.22 1.34
02 Aileu 0202 Liquidoe 4.17 0.04 0.10 0.87
02 Aileu 0203 Remexio 4.00 0.10 0.73 2.60
02 Aileu 0204 Laulara 4.04 0.06 0.30 1.55
03 Baucau 0301 Baucau 4.12 0.04 0.11 0.89
03 Baucau 0302 Laga 4.05 0.09 0.68 2.34
03 Baucau 0303 Quelicai 3.99 0.04 0.15 1.07
03 Baucau 0304 Baguia 3.82 0.04 0.10 0.94
03 Baucau 0305 Vemase 4.07 0.07 0.41 1.82
03 Baucau 0306 Venilale 3.84 0.03 0.05 0.67
04 Bobonaro 0401 Maliana 3.80 0.08 0.49 2.19
04 Bobonaro 0402 Cailaco 3.64 0.08 0.48 2.24
04 Bobonaro 0403 Balibo 3.91 0.06 0.25 1.44
04 Bobonaro 0404 Atabae 3.94 0.09 0.68 2.38
04 Bobonaro 0405 Lolotoe 3.82 0.10 0.77 2.64
04 Bobonaro 0406 Bobonaro 3.86 0.06 0.31 1.65
05 Covalima 0501 Fatululic 3.58 0.00 0.00 0.00
05 Covalima 0502 Fatumean 4.01 0.00 0.00 0.00
05 Covalima 0503 Forohem 3.81 0.07 0.35 1.76
05 Covalima 0504 Maukatar 3.86 0.10 0.81 2.64
05 Covalima 0505 Suai 3.84 0.08 0.48 2.09
05 Covalima 0506 Tilomar 4.10 0.08 0.53 2.04
05 Covalima 0507 Zumalai 3.81 0.09 0.48 2.24
06 Dili 0601 Vera Cruz 4.13 0.06 0.30 1.54
06 Dili 0602 Nain Feto 4.21 0.07 0.33 1.55
06 Dili 0603 Metinaro 3.92 0.04 0.12 1.01
06 Dili 0604 Atauro 3.85 0.06 0.21 1.48
06 Dili 0605 Dom Aleixo 4.17 0.05 0.22 1.28
06 Dili 0606 Cristo Rei 4.09 0.06 0.30 1.55
07 Ermera 0701 Railaco 3.91 0.10 0.75 2.54
07 Ermera 0702 Ermera 3.77 0.10 0.66 2.55
07 Ermera 0703 Letefoho 3.65 0.09 0.55 2.40
07 Ermera 0704 Atsabe 3.81 0.09 0.59 2.40
07 Ermera 0705 Hatolia 3.81 0.08 0.50 2.11
08 Liquiça 0801 Bazartete 3.93 0.07 0.35 1.73
08 Liquiça 0802 Liquiça 3.95 0.07 0.41 1.88
08 Liquiça 0803 Maubara 3.84 0.08 0.51 2.14
09 Lautém 0901 Lospalos 4.07 0.07 0.33 1.61
09 Lautém 0902 Lautém 3.97 0.06 0.27 1.48
09 Lautém 0903 Iliomar 4.07 0.06 0.28 1.47
09 Lautém 0904 Luro 3.92 0.08 0.50 2.05
09 Lautém 0905 Tutuala 4.16 0.08 0.45 1.85
10 Manufahi 1001 Same 3.93 0.04 0.14 1.06
10 Manufahi 1002 Alas 3.81 0.05 0.22 1.40
10 Manufahi 1003 Fatuberliu 4.09 0.05 0.16 1.11
10 Manufahi 1004 Turiscai 3.87 0.05 0.20 1.32
11 Manatuto 1101 Manatuto 4.10 0.06 0.25 1.37
11 Manatuto 1102 Laleia 4.21 0.09 0.64 2.21
11 Manatuto 1103 Laclo 3.84 0.09 0.62 2.44
11 Manatuto 1104 Soibada 3.76 0.00 0.00 0.00
11 Manatuto 1105 Barique/Natarbora 3.96 0.08 0.50 2.03
11 Manatuto 1106 Laclubar 3.92 0.07 0.35 1.72
12 Oecussi 1201 Pante Macasar 3.80 0.04 0.11 1.00
12 Oecussi 1202 Nitibe 3.64 0.06 0.29 1.68
12 Oecussi 1203 Oesilo 3.75 0.09 0.59 2.37
12 Oecussi 1204 Passabe 3.74 0.09 0.63 2.48
13 Viqueque 1301 Uatucarbau 3.97 0.05 0.20 1.28
13 Viqueque 1302 Ossu 4.09 0.03 0.09 0.83
13 Viqueque 1303 Watulari 4.03 0.06 0.26 1.42
13 Viqueque 1304 Viqueque 4.05 0.03 0.06 0.65
13 Viqueque 1305 Lacluta 3.72 0.04 0.11 0.97
APPENDIX 4: VALIDATION AND PRECISION
The objective of performing FHSAE is to minimize the RMSE caused by the imprecision of TL-SLS and TL-DHS when directly estimating poverty and welfare indicators below the district level, in this case at the sub-district level. This validation and precision section therefore presents comparisons of the RMSE from the various approaches that were tried before deciding on the model specification explained in this report. The precision and validation checks conducted during the exercise confirmed that all chosen independent variables in the three FHSAE models are significant and have an uncentered VIF of less than 20 (please see the regression diagnostics of the three models presented in Table 9 to Table 11 in Appendix 8). Figure 7 shows comparisons of the RMSE obtained from the sub-district direct estimates and the FHSAE estimates. The size of the markers indicates the sample size of the sub-district. In general, the scatter plots show that the FHSAE estimates have lower RMSE compared to the direct estimates.
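The Fay-Herriot estimate for each sub-district is a weighted combination of the direct survey estimate and the model prediction. The sketch below is a minimal illustration, assuming gamma denotes the weight placed on the model (FGLS) prediction, which is how this appendix uses the term; many textbook presentations instead use gamma for the weight on the direct estimate. The function names and all numbers are hypothetical.

```python
def model_weight(sampling_var, model_var):
    """Gamma as assumed here: share of total variance due to sampling error."""
    return sampling_var / (sampling_var + model_var)


def fh_estimate(direct, synthetic, sampling_var, model_var):
    """Composite estimate: gamma on the model prediction, 1 - gamma on the direct one."""
    gamma = model_weight(sampling_var, model_var)
    return (1 - gamma) * direct + gamma * synthetic


if __name__ == "__main__":
    # Hypothetical numbers chosen for illustration only.
    # Noisy direct estimate: the composite leans towards the model prediction.
    print(fh_estimate(direct=0.50, synthetic=0.40, sampling_var=0.03, model_var=0.01))
    # Zero sampling variance: gamma is zero and the direct estimate is kept.
    print(fh_estimate(direct=0.79, synthetic=0.60, sampling_var=0.0, model_var=0.01))
```

Under this convention, a sub-district with a noisy direct estimate gets a gamma close to one and borrows most of its estimate from the model, while a precisely measured sub-district keeps its direct estimate almost unchanged.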
The reduction in RMSE grows as the RMSE of the sub-district direct estimate becomes larger, indicating that the FHSAE estimates rely more heavily on the model estimates in these sub-districts, because higher gamma values are applied to the FGLS estimates. Although not obvious, there is also an indication of larger gains as the sample size gets smaller. In sub-districts with small sample sizes, the RMSE of the FHSAE estimates is smaller than that of the direct estimates.

FIGURE 7: RMSE OF TL-SLS DIRECT ESTIMATES VERSUS FHSAE ESTIMATES
[Scatter plot panels: Poverty Rate; Log Average Real Per Capita Consumption; Normalized Wealth Index]
Source: TL-SLS 2014, TL-DHS 2016, World Bank staff calculations

The improvement made by the FHSAE becomes more obvious when the CVs of the sub-district direct estimates are compared with those of the FHSAE estimates. The CV of the direct estimates reached 68 percent for poverty rates and 7 percent for log average real per capita consumption. The FHSAE reduced the maximum CV of sub-district poverty rates and log average real per capita consumption to 46 percent and 3 percent, respectively. For the normalized wealth index, the maximum CV was reduced from 20 percent to 12 percent.

To check whether further major improvement can be made to the model, the chosen model, which was selected by iterating over the independent variables with the highest significance level, is compared in Figure 8 with the StepVIF and Stepwise Regression selection method proposed by Seitz (2019). For the poverty rate and log average per capita consumption FHSAE estimates, the chosen model selected by the iteration method clearly performs much better at higher RMSEs.
As for the normalized wealth index, StepVIF and Stepwise Regression do perform better, but the gain from switching to that selection method is deemed not significant, as there are no major improvements even at the highest RMSE.

FIGURE 8: RMSE OF FHSAE ESTIMATES FROM CHOSEN MODEL VERSUS STEPVIF+STEPWISE REGRESSION SELECTION
[Panels: Poverty Rate; Log Average Real Per Capita Consumption; Normalized Wealth Index]
Source: TL-SLS 2014, TL-DHS 2016, World Bank staff calculations

APPENDIX 5: DISTRICT DIRECT AND SUB-DISTRICT FH ESTIMATES

FIGURE 9: POVERTY RATE COMPARISON BETWEEN DISTRICT DIRECT ESTIMATES AND SUB-DISTRICT FH ESTIMATES
[Chart panels, one per district, each comparing the district direct estimate with the FH estimates for its sub-districts; vertical axis from 0% to 100%. Panels: Ainaro, Aileu, Baucau, Bobonaro, Covalima, Dili, Ermera, Liquiça, Lautém, Manufahi, Manatuto, Oecussi, Viqueque]

FIGURE 10: LOG REAL PER CAPITA CONSUMPTION COMPARISON BETWEEN DISTRICT DIRECT ESTIMATES AND SUB-DISTRICT FH ESTIMATES
[Chart panels, one per district, each comparing the district direct estimate with the FH estimates for its sub-districts; vertical axis from 0 to 5. Panels: Ainaro, Aileu, Baucau, Bobonaro, Covalima, Dili, Ermera, Liquiça, Lautém, Manufahi, Manatuto, Oecussi, Viqueque]

FIGURE 11: WEALTH INDEX COMPARISON BETWEEN DISTRICT DIRECT ESTIMATES AND SUB-DISTRICT FH ESTIMATES
[Chart panels, one per district, each comparing the district direct estimate with the FH estimates for its sub-districts; vertical axis from 0 to 0.8. Panels: Ainaro, Aileu, Baucau, Bobonaro, Covalima, Dili, Ermera, Liquiça, Lautém, Manufahi, Manatuto, Oecussi, Viqueque]

APPENDIX 6: COMPARISON BETWEEN ELL AND FHSAE POVERTY ESTIMATES

The Small Area Estimation (SAE) method for estimating poverty at a level below the district is not new for Timor-Leste. The World Bank has developed a suco (village) level poverty map using the ELL method (World Bank, 2019), a model-based estimation that relies on household characteristics variables common to the TL-SLS 2014 and TL-PHC 2015 to estimate households’ per capita consumption in all census areas. Poverty rates at suco level were obtained by applying poverty lines to households’ estimated per capita consumption.

This section briefly compares the poverty estimates resulting from the FHSAE and ELL methods in order to validate the two sets of model estimates. Two criteria were used for validation: the precision and the consistency of the estimates.
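As a reminder of how a poverty line turns estimated per capita consumption into a poverty rate, the following is a simplified sketch of a headcount calculation for a single suco. The household values, the poverty line, and the function name are hypothetical, and the sketch weights households by their size, which is an assumption. The full ELL procedure also involves simulating the error terms of the consumption model, which is not reproduced here.

```python
def poverty_rate(households, poverty_line):
    """Share of people living in households whose estimated per capita
    consumption falls below the poverty line.

    households: list of (estimated per capita consumption, household size)
    """
    poor_people = sum(size for pcc, size in households if pcc < poverty_line)
    all_people = sum(size for _, size in households)
    return poor_people / all_people

# Hypothetical suco: (estimated per capita consumption, household size)
suco_households = [(30.0, 6), (55.0, 4), (42.0, 5), (80.0, 3), (25.0, 7)]
HYPOTHETICAL_POVERTY_LINE = 46.0

rate = poverty_rate(suco_households, HYPOTHETICAL_POVERTY_LINE)
print(f"Estimated poverty rate: {rate:.0%}")
```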
Figure 12 and Figure 13 display the confidence intervals of the ELL and FHSAE estimates of the poverty rate. Note that the graphs differ in resolution because the ELL estimates were calculated at suco level and the FHSAE estimates at sub-district level. Visually, the confidence intervals of the two methods look somewhat similar; however, a direct comparison of their RMSE distributions shows that the FHSAE method produced more precise estimates. The ELL estimates have minimum and maximum RMSE of 2.8 and 25.6, respectively, with a median of 9.6. Meanwhile, the FHSAE estimates have minimum and maximum RMSE of 0 and 10.7, respectively, with a median of 6.1. While the precision of FHSAE is higher than that of ELL, it is acknowledged that the comparison is performed at different administrative levels: the FHSAE estimates use sub-district level data and the ELL estimates use suco level data.

FIGURE 12: CONFIDENCE INTERVAL OF ELL POVERTY ESTIMATES
[Chart; vertical axis from (20) to 120]
Source: TL-SLS 2014, TL-PHC 2015, World Bank staff calculations

FIGURE 13: CONFIDENCE INTERVAL OF FHSAE POVERTY ESTIMATES
[Chart; vertical axis from 0 to 1]
Source: TL-SLS 2014, TL-PHC 2015, World Bank staff calculations

To examine whether the ELL and FHSAE estimates are consistent, the suco level ELL poverty estimates were aggregated to sub-district level to allow comparison with the sub-district FHSAE poverty estimates. The Spearman correlation coefficient between the ELL and FHSAE estimates is 0.55, which suggests that the two sets of estimates are moderately correlated; the correlation is statistically significant at the 0.01 level. This moderate correlation could partly be driven by variations in the magnitude of the gaps between the estimated poverty rates, as shown in Figure 14 below, and thus does not necessarily verify the relative precision of the estimates.
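The consistency check described above can be sketched as follows. The sketch assumes that suco-level rates are aggregated with population weights; the weights used in the actual exercise are not stated here. All sub-district names, rates, and populations are hypothetical, and the Spearman coefficient is computed as the Pearson correlation of the ranks.

```python
def aggregate_to_subdistrict(sucos):
    """Population-weighted mean of suco poverty rates (weighting is an assumption).

    sucos: list of (poverty rate, population)
    """
    total_pop = sum(pop for _, pop in sucos)
    return sum(rate * pop for rate, pop in sucos) / total_pop

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical suco-level ELL estimates: (poverty rate, population)
ell_sucos = {
    "A": [(0.50, 1000), (0.30, 3000)],
    "B": [(0.60, 2000), (0.40, 2000)],
    "C": [(0.20, 5000)],
    "D": [(0.70, 1000), (0.50, 1000)],
}
# Hypothetical sub-district FHSAE estimates
fhsae = {"A": 0.45, "B": 0.40, "C": 0.25, "D": 0.65}

names = sorted(ell_sucos)
ell = [aggregate_to_subdistrict(ell_sucos[name]) for name in names]
fh = [fhsae[name] for name in names]
rho = spearman(ell, fh)
print(f"Spearman correlation between ELL and FHSAE: {rho:.2f}")
```

Because the coefficient depends only on ranks, it measures whether the two methods order sub-districts similarly, not whether their poverty rates are close in level.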
FIGURE 14: POVERTY RATES AT SUB-DISTRICT LEVEL BASED ON DIRECT, ELL, AND FHSAE ESTIMATIONS
[Chart; vertical axis from 0 to 100; series: ELL, SAE, Direct Estimates]
Source: TL-SLS 2014, TL-PHC 2015, World Bank staff calculations

The graphs of estimated poverty rates in Figure 14 were constructed by ordering sub-districts by their direct poverty estimates. Here the direct estimates are presented only as a benchmark for comparison and should not be considered reliable or representative estimates of poverty at sub-district level. As Figure 14 shows, the FHSAE estimates seem much more aligned with the sub-district direct estimates than the ELL estimates are. The Spearman rank correlation with the direct estimates is also much higher for the FHSAE estimates than for the ELL estimates, with coefficients of 0.93 and 0.62, respectively. However, this is somewhat to be expected, given that the FHSAE estimate is constructed as a weighted average of the direct estimate and the model-based estimator.

Visually, consistency between the estimates from the two methods can be observed from maps using the same color range (Figure 15; see footnote 11). Despite slight variations in the poverty rates produced by the ELL and FHSAE estimations, the two sets of model estimates reveal a consistent pattern of poverty distribution in Timor-Leste: the poverty rate is much higher in the western region than in the eastern region. The intensity of poverty differs slightly between the two methods, with the FHSAE predicting more intense poverty in the western region than the ELL does.

FIGURE 15: POVERTY MAP COMPARISON BETWEEN ELL AND FHSAE
Source: TL-SLS 2014, TL-PHC 2015, World Bank staff calculations

In conclusion, both the FHSAE and ELL methods have their advantages and disadvantages. The ELL method allows small area estimation to be performed at a more granular level whenever census data are available.
The FHSAE method, on the other hand, provides estimates with relatively lower standard errors because of the way it combines information from both direct estimates and model estimates. Although FHSAE allows estimation for non-sampled areas, performing FHSAE at suco level is not advisable because of the lack of sample from the TL-SLS and TL-DHS. Therefore, whenever census data are available, ELL can be the preferred tool for estimating suco level welfare indicators, while in the absence of census data, FHSAE has proven to be a powerful tool for small area estimation whenever household survey data are available.

11 Figure 15 was constructed using the same color range used for the sub-district FHSAE poverty maps in Figure 4.

APPENDIX 7: LIST OF VARIABLES AND OUTLIERS

TABLE 8: LIST OF VARIABLES AND OUTLIERS
No. | Variable Name | Label | No. of Outliers
1 | city_access | Access to City (2015) | 0
2 | bcount | Number of housing buildings (census) | 1
3 | hcount | Number of Hospitals | 0
4 | scount | Number of Education Facilities | 0
5 | major_road | Length of major road -km (osm) | 1
6 | other_road | Length of non-major road -km (osm) | 0
7 | elevation | Mean elevation (m) | 0
8 | slope | Slope | 0
9 | maxtemp | Mean maximum temperature | 0
10 | mintemp | Mean minimum temperature | 0
11 | ndvi | Mean NDVI | 0
12 | ntl | Mean Nighttime Light | 8
13 | popden | Mean Population density | 6
14 | precipitation | Mean precipitation | 0
15 | lu1 | Broadleaf Evergreen Forest (Land cover area km2) | 0
16 | lu2 | Broadleaf Deciduous Forest (Land cover area km2) | 0
17 | lu3 | Needleleaf Evergreen Forest (Land cover area km2) | 2
18 | lu4 | Needleleaf Deciduous Forest (Land cover area km2) | 3
19 | lu5 | Mixed Forest (Land cover area km2) | 3
20 | lu6 | Tree Open (Land cover area km2) | 0
21 | lu7 | Shrub (Land cover area km2) | 8
22 | lu8 | Herbaceous (Land cover area km2) | 0
23 | lu10 | Sparse vegetation (Land cover area km2) | 65
24 | lu11 | Cropland (Land cover area km2) | 1
25 | lu12 | Paddy field (Land cover area km2) | 1
26 | lu13 | Cropland / Other Vegetation Mosaic (Land cover area km2) | 0
27 | lu14 | Mangrove (Land cover area km2) | 65
28 | lu18 | Urban (Land cover area km2) | 65
29 | lu20 | Water bodies (Land cover area km2) | 9
30 | bu1 | Water surface | 8
31 | bu2 | Land no built-up in any epoch (km2) | 0
32 | bu3 | Built-up from 2000 to 2014 epochs (km2) | 5
33 | bu4 | Built-up from 1990 to 2000 epochs (km2) | 11
34 | bu5 | Built-up from 1975 to 1990 epochs (km2) | 4
35 | bu6 | Built-up up to 1975 epoch (km2) | 8
36 | adm1_en | TL 13 District name | 0
37 | adm2_en | TL 65 Subdistrict name | 0
38 | tc_mean | Tree cover year 2000, mean | 0
39 | wspeed_mean | Wind speed, mean | 10
40 | wpower_mean | Wind power, mean | 0
41 | gdp_mean | GDP mean | 0
42 | dcos_mean | Distance to coast, mean | 1
43 | dwat_mean | Distance to water, mean | 0
44 | drod_mean | Distance to roads, mean | 0
45 | dbor_mean | Distance to country borders, mean | 0
46 | o3_2013 | Ambient air pollution exposure estimation | 65
47 | o3cal_2013 | Ambient air pollution exposure estimation 2013, calibrated | 0
48 | tc_max | Tree cover year 2000, max | 0
49 | wspeed_max | Wind speed, max | 6
50 | wpower_max | Wind power, max | 0
51 | gdp_max | GDP max | 0
52 | dcos_max | Distance to coast, max | 1
53 | dwat_max | Distance to water, max | 0
54 | drod_max | Distance to roads, max | 65
55 | dbor_max | Distance to country borders, max | 0
56 | tc_min | Tree cover year 2000, min | 0
57 | wspeed_min | Wind speed, min | 65
58 | wpower_min | Wind power, min | 0
59 | gdp_min | GDP min | 0
60 | dcos_min | Distance to coast, min | 65
61 | dwat_min | Distance to water, min | 65
62 | drod_min | Distance to roads, min | 3
63 | dbor_min | Distance to country borders, min | 0
64 | aid_wb | Aid data from World Bank | 65
65 | aid_tlrs | Aid data from Timor Leste Recipient System, geocoded and published by aiddata. | 9
66 | aid_gefs | Aid data from Global Environment Facility Sectors | 0
67 | aid_chna | Aid data from Global Chinese Official Finance | 0
68 | wdpa_all | Protected areas (in pixel), total area | 65
69 | wdpa_unprot | Protected areas (in pixel), unprotected areas | 65
70 | wdpa_1a | Protected areas (in pixel), category 1a | 65
71 | wdpa_1b | Protected areas (in pixel), category 1b | 65
72 | wdpa_2 | Protected areas (in pixel), category 2 | 65
73 | wdpa_3 | Protected areas (in pixel), category 3 | 65
74 | wdpa_4 | Protected areas (in pixel), category 4 | 65
75 | wdpa_5 | Protected areas (in pixel), category 5 | 65
76 | wdpa_6 | Protected areas (in pixel), category 6 | 65
77 | wdpa_NA | Protected areas (in pixel), not applicable | 2
78 | wdpa_NAs | Protected areas (in pixel), not assigned | 65
79 | wdpa_DK | Protected areas (in pixel), mixed | 0
80 | wdpa_mix | Protected areas (in pixel), mixed | 0
81 | tcloss_all | Tree cover loss, total area | 0
82 | tcloss_noloss | Tree cover loss, in pixel | 0
83 | tcloss_2009 | Tree cover loss (in pixel) 2009 | 0
84 | tcloss_2008 | Tree cover loss (in pixel) 2008 | 0
85 | tcloss_2007 | Tree cover loss (in pixel) 2007 | 0
86 | tcloss_2006 | Tree cover loss (in pixel) 2006 | 0
87 | tcloss_2005 | Tree cover loss (in pixel) 2005 | 0
88 | tcloss_2004 | Tree cover loss (in pixel) 2004 | 0
89 | tcloss_2003 | Tree cover loss (in pixel) 2003 | 0
90 | tcloss_2002 | Tree cover loss (in pixel) 2002 | 0
91 | tcloss_2001 | Tree cover loss (in pixel) 2001 | 0
92 | tcloss_2010 | Tree cover loss (in pixel) 2010 | 0
93 | tcloss_2011 | Tree cover loss (in pixel) 2011 | 0
94 | tcloss_2012 | Tree cover loss (in pixel) 2012 | 0
95 | tcloss_2013 | Tree cover loss (in pixel) 2013 | 0
96 | tcloss_2014 | Tree cover loss (in pixel) 2014 | 0
97 | tcloss_2015 | Tree cover loss (in pixel) 2015 | 0
98 | tcloss_2016 | Tree cover loss (in pixel) 2016 | 0
99 | tcloss_2017 | Tree cover loss (in pixel) 2017 | 0
100 | tcloss_2018 | Tree cover loss (in pixel) 2018 | 0
101 | death_sum | Number of total fatalities per 0.01 decimal degree grid cell resulting from conflict | 0
102 | gdp_sum | GDP sum | 0

APPENDIX 8: FH ESTIMATION MODEL DETAILS

TABLE 9: FHSAE REGRESSION RESULTS FOR POVERTY RATE
VARIABLES | Coefficients | (Standard Errors)
Built-up from 1990 to 2000 epochs (km2) | 0.3*** | (0.069)
Length of non-major road (osm) | -0.001** | (0.00041)
Distance to roads, min | 0.00002*** | (0.00001)
Distance to coast, mean | 0.00001*** | (0.000002)
Distance to roads, mean | -0.000046*** | (0.000013)
Aid data from World Bank | -0.00000001** | (0)
Protected areas (in pixel), category 5 | -0.00015** | (0.00008)
Constant | 0.16406** | (0.09262)
Number of Observations | 65
Estimated Random Effects Variance | 0.01150843
Log Likelihood | 42.541907
AIC | -67.083815
AICc | -64.512386
BIC | -47.514329
Variance Inflation Factor
Built-up from 1990 to 2000 epochs (km2) | 2.01
Length of non-major road (osm) | 4.77
Distance to roads, min | 1.29
Distance to coast, mean | 5.90
Distance to roads, mean | 7.49
Aid data from World Bank | 2.66
Protected areas (in pixel), category 5 | 1.11
Mean VIF | 4.73

TABLE 10: FHSAE REGRESSION RESULTS FOR LOG PER CAPITA EXPENDITURE
VARIABLES | Coefficients | (Standard Errors)
Distance to coast, mean | -0.00001*** | (0.000003)
Built-up from 1990 to 2000 epochs (km2) | -0.219*** | (0.0606)
Aid data from World Bank | 0.00000001*** | (0.000000002)
Slope | -0.0095** | (0.0054)
Distance to roads, mean | 0.000037*** | (0.000014)
Distance to roads, min | -0.0000133** | (0.00000618)
Constant | 4.286*** | (0.102)
Number of Observations | 65
Estimated Random Effects Variance | 0.01215697
Log Likelihood | 39.142508
AIC | -62.285017
AICc | -60.320104
BIC | -44.889918
Variance Inflation Factor
Distance to coast, mean | 6.77
Built-up from 1990 to 2000 epochs (km2) | 1.39
Aid data from World Bank | 2.50
Slope | 17.31
Distance to roads, mean | 7.35
Distance to roads, min | 1.34
Mean VIF | 7.68

TABLE 11: FHSAE REGRESSION RESULTS FOR NORMALIZED WEALTH INDEX
VARIABLES | Coefficients | (Standard Errors)
Urban (land cover area km2) | 0.0407*** | (0.0074363)
GDP, mean | 1.117*** | (0.214588)
Wind power, min | 0.0032* | (0.0018)
Herbaceous (land cover area km2) | 0.0123* | (0.00659)
Needleleaf Evergreen Forest (land cover area km2) | -0.0142* | (0.00736)
Constant | 0.285** | (0.0288)
Number of Observations | 65
Estimated Random Effects Variance | 0.00246253
Log Likelihood | 98.862323
AIC | -183.72465
AICc | -182.27637
BIC | -168.50393
Variance Inflation Factor
Urban (land cover area km2) | 1.28
GDP, mean | 2.30
Wind power, min | 3.59
Herbaceous (land cover area km2) | 1.93
Needleleaf Evergreen Forest (land cover area km2) | 2.25
Mean VIF | 3.20
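The information criteria reported in Table 9 to Table 11 can be related to the log likelihoods with a short calculation. The sketch below assumes the usual definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, with k counting the covariates, the constant, and the random effects variance. This parameter count, and the use of only the regression coefficients (p = k - 1) in the AICc correction, are inferred from the reported values and are not stated in the source. The function name is hypothetical.

```python
import math

def information_criteria(log_lik, n_covariates, n_obs):
    """Return (AIC, AICc, BIC) from a log likelihood.

    Assumed parameter count: covariates + constant + random effects variance.
    """
    k = n_covariates + 2
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n_obs) - 2 * log_lik
    # Small-sample correction; assumed to count only the regression
    # coefficients (covariates + constant), i.e. p = k - 1.
    p = k - 1
    aicc = aic + 2 * p * (p + 1) / (n_obs - p - 1)
    return aic, aicc, bic

# (log likelihood, number of covariates) as reported in Tables 9 to 11
models = {
    "Table 9 (poverty rate)": (42.541907, 7),
    "Table 10 (log per capita expenditure)": (39.142508, 6),
    "Table 11 (normalized wealth index)": (98.862323, 5),
}

results = {name: information_criteria(ll, m, 65) for name, (ll, m) in models.items()}
for name, (aic, aicc, bic) in results.items():
    print(f"{name}: AIC={aic:.4f} AICc={aicc:.4f} BIC={bic:.4f}")
```

Lower values indicate a better trade-off between fit and the number of parameters, but the criteria are only comparable between models fitted to the same dependent variable, so they should not be compared across the three tables.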