Publication: Pull Your Small Area Estimates Up by the Bootstraps
Date
2021-05-08
ISSN
0094-9655
Published
2021-05-08
Abstract
This paper presents a methodological update to the World Bank's toolkit for small area estimation. The paper reviews the computational procedures of the current methods used by the institution: the traditional ELL approach and the Empirical Best (EB) addition, which was introduced to imitate the original EB procedure of Molina and Rao [Small area estimation of poverty indicators. Canadian J Stat. 2010;38(3):369–385] while incorporating heteroscedasticity and survey weights, but which uses a different bootstrap approach, here referred to as clustered bootstrap. Simulation experiments provide empirical evidence of the shortcomings of the clustered bootstrap approach, which yields biased and noisier point estimates. The document presents an update to the World Bank's EB implementation by considering the original EB procedures for point and noise estimation, extended for complex designs and heteroscedasticity. Simulation experiments illustrate that the revised methods yield considerably less biased and more efficient estimators than those obtained from the clustered bootstrap approach.
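To make the idea of a Monte Carlo EB point estimate concrete, the following is a minimal sketch for a single area under a basic nested-error model on log welfare, y = x'β + u + e. It is not the World Bank's implementation: it assumes the model parameters are already estimated, errors are homoscedastic, and survey weights are ignored. The function name `eb_headcount` and all of its parameters are hypothetical. The area effect is drawn from its normal distribution conditional on the area's sample residuals, as in the EB approach of Molina and Rao (2010), and the poverty headcount is averaged over the simulated populations.

```python
import numpy as np


def eb_headcount(x_census, beta, sigma_u2, sigma_e2, sample_resid, z,
                 n_draws=200, seed=None):
    """Monte Carlo approximation of an EB poverty headcount for one area.

    Hypothetical helper: assumes known beta and variance components,
    homoscedastic errors, and no survey weights.
    """
    rng = np.random.default_rng(seed)
    sample_resid = np.asarray(sample_resid, dtype=float)
    n_i = sample_resid.size
    if n_i > 0:
        # Shrinkage factor and conditional mean of the area effect
        gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n_i)
        u_mean = gamma * sample_resid.mean()
    else:
        # Area not in the sample: fall back to the unconditional distribution
        gamma, u_mean = 0.0, 0.0
    u_sd = np.sqrt(sigma_u2 * (1.0 - gamma))

    xb = np.asarray(x_census, dtype=float) @ np.asarray(beta, dtype=float)
    rates = np.empty(n_draws)
    for d in range(n_draws):
        u = rng.normal(u_mean, u_sd)                        # one area effect per draw
        e = rng.normal(0.0, np.sqrt(sigma_e2), size=xb.shape[0])
        log_welfare = xb + u + e                            # simulated census log welfare
        rates[d] = np.mean(log_welfare < np.log(z))         # headcount in this draw
    return float(rates.mean())


# Illustrative inputs only (made-up values, not taken from the paper)
x = np.ones((500, 1))
beta = np.array([np.log(100.0)])
rate = eb_headcount(x, beta, 0.05, 0.25, sample_resid=[0.3, 0.1, 0.4],
                    z=100.0, seed=1)
```

Conditioning on the sample residuals is what separates this from a purely synthetic simulation: areas whose sampled households sit above the model prediction get a positive expected area effect and therefore a lower simulated headcount.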
Citations
- Cited 8 times in Scopus
Related items
Showing items related by metadata.
Publication: Pull Your Small Area Estimates Up by the Bootstraps (World Bank, Washington, DC, 2020-05)
After almost two decades of poverty maps produced by the World Bank and multiple advances in the literature, this paper presents a methodological update to the World Bank's toolkit for small area estimation. The paper reviews the computational procedures of the current methods used by the World Bank: the traditional approach by Elbers, Lanjouw and Lanjouw (2003) and the Empirical Best/Bayes (EB) addition introduced by Van der Weide (2014). The addition extends the EB procedure of Molina and Rao (2010) by considering heteroscedasticity and includes survey weights, but uses a different bootstrap approach, here referred to as clustered bootstrap. Simulation experiments comparing these methods to the original EB approach of Molina and Rao (2010) provide empirical evidence of the shortcomings of the clustered bootstrap approach, which yields biased point estimates. The main contributions of this paper are then two: 1) to adapt the original Monte Carlo simulation procedure of Molina and Rao (2010) for the approximation of the extended EB estimators that include heteroscedasticity and survey weights as in Van der Weide (2014); and 2) to adapt the parametric bootstrap approach for mean squared error (MSE) estimation considered by Molina and Rao (2010), and proposed originally by González-Manteiga et al. (2008), to these extended EB estimators. Simulation experiments illustrate that the revised Monte Carlo simulation method yields estimators that are considerably less biased and more efficient in terms of MSE than those obtained from the clustered bootstrap approach, and that the parametric bootstrap MSE estimators are in line with the true MSEs under realistic scenarios.

Publication: A Map of the Poor or a Poor Map? (World Bank, Washington, DC, 2021-04)
This paper evaluates the performance of different small area estimation methods using model and design-based simulation experiments. Design-based simulation experiments are carried out using the Mexican Intra Censal survey as a census of roughly 3.9 million households from which 500 samples are drawn using a two-stage selection procedure similar to that of Living Standards Measurement Study surveys. Several unit-level methods are considered as well as a method that combines unit and area level information, which has been proposed as an alternative when the available census data is outdated. The findings show the importance of selecting a proper model and data transformation so that the model assumptions hold. A proper data transformation can lead to a considerable improvement in mean squared errors. The results from design-based validation show that all small area estimation methods represent an improvement, in terms of mean squared errors, over direct estimates. However, methods that model unit level welfare using only area level information suffer from considerable bias. Because the magnitude and direction of the bias are unknown ex ante, methods that rely only on aggregated covariates should be used with caution, but they may be an alternative to traditional area level models when these are not applicable.

Publication: How Good a Map? Putting Small Area Estimation to the Test (World Bank, Washington, DC, 2007-03)
The authors examine the performance of small area welfare estimation. The method combines census and survey data to produce spatially disaggregated poverty and inequality estimates. To test the method, they compare predicted welfare indicators for a set of target populations with their true values. They construct target populations using actual data from a census of households in a set of rural Mexican communities. They examine estimates along three criteria: accuracy of confidence intervals, bias, and correlation with true values. The authors find that while point estimates are very stable, the precision of the estimates varies with alternative simulation methods. While the original approach of numerical gradient estimation yields standard errors that seem appropriate, some computationally less-intensive simulation procedures yield confidence intervals that are slightly too narrow. The precision of estimates is shown to diminish markedly if unobserved location effects at the village level are not well captured in underlying consumption models. With well specified models there is only slight evidence of bias, but the authors show that bias increases if underlying models fail to capture latent location effects. Correlations between estimated and true welfare at the local level are highest for mean expenditure and poverty measures and lower for inequality measures.

Publication: SAE - A Stata Package for Unit Level Small Area Estimation (World Bank, Washington, DC, 2018-10)
This paper presents a new family of Stata functions devoted to small area estimation. Small area methods attempt to solve low representativeness of surveys within areas, or the lack of data for specific areas/sub-populations. This is accomplished by incorporating information from outside sources. Such target data sets are becoming increasingly available and can take the form of a traditional population census, but also large scale administrative records from tax administrations, or geospatial information produced using remote sensing. The strength of these target data sets is their granularity on the subpopulations of interest, however, in many cases they lack the ability to collect analytically relevant variables such as welfare or caloric intake. The family of functions introduced follow a modular design to have the flexibility with which these can be expanded in the future. This can be accomplished by the authors and/or other collaborators from the Stata community. Thus far, a major limitation of such analysis in Stata has been the large size of target data sets. The package introduces new mata functions and a plugin used to circumvent memory limitations that inevitably arise when working with big data. From an estimation perspective, the paper starts by implementing a methodology that has been widely used for the production of several poverty maps.

Publication: Frontiers in Small Area Estimation Research (Washington, DC: World Bank, 2024-06-28)
This paper reviews the main methods for small area estimation of welfare indicators. It begins by discussing the importance of small area estimation methods for producing reliable disaggregated estimates. It mentions the baseline papers and describes the contents of the different sections. Basic direct estimators obtained from area-specific survey data are described first, followed by simple indirect methods, which include synthetic procedures that do not account for the area effects and composite estimators obtained as a composition (or weighted average) of a synthetic and a direct estimator. The previous estimators are design-based, meaning that their properties are assessed under the sampling replication mechanism, without assuming any model to be true. The paper then turns to proper model-based estimators that assume an explicit model. These models allow obtaining optimal small area estimators when the assumed model holds. The first type of models, referred to as area-level models, use only aggregated data at the area level to fit the model. However, unit-level survey data were previously used to calculate the direct estimators, which act as response variables in the most common area-level models. The paper then switches to unit-level models, describing first the usual estimators for area means, and then moving to general area indicators. Semi-parametric, non-parametric, and machine learning procedures are described in a separate section, although many of the procedures are applicable only to area means. Based on the previous material, the paper identifies gaps or potential limitations in existing procedures from a practitioner’s perspective, which could potentially be addressed through research over the next three to five years.