Publication:
Is Predicted Data a Viable Alternative to Real Data?

Published
2016-09
Date
2016-10-13
Author(s)
Fujii, Tomoki; van der Weide, Roy
Abstract
It is costly to collect the household- and individual-level data that underlie official estimates of poverty and health. For this reason, developing countries often do not have the budget to update their estimates of poverty and health regularly, even though these estimates are most needed there. One way to reduce the financial burden is to substitute some of the real data with predicted data. An approach referred to as double sampling collects the expensive outcome variable for a sub-sample only, while collecting the covariates used for prediction for the full sample. The objective of this study is to determine whether this approach would allow meaningful reductions in financial costs while preserving statistical precision. The study does this using analytical calculations that cover a wide range of parameter values that are plausible in real applications. The benefits of using double sampling are found to be modest. There are circumstances in which the gains can be more substantial, but the study conjectures that these are the exception rather than the rule. The recommendation is to rely on real data whenever there is a need for new data, and to use the prediction estimator to leverage existing data.
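The double-sampling idea in the abstract can be illustrated with a minimal sketch. It assumes a single covariate and an ordinary least squares fit, which is a simplification chosen for illustration and not necessarily the estimator analyzed in the paper. The function names `fit_line` and `prediction_estimate` and the toy numbers are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with one covariate (illustrative assumption)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def prediction_estimate(x_full, x_sub, y_sub):
    """Estimate the mean outcome using double sampling.

    x_full: covariate observed for the full sample (cheap to collect)
    x_sub, y_sub: covariate and outcome observed for the sub-sample only
    """
    a, b = fit_line(x_sub, y_sub)            # model fitted on the sub-sample
    predicted = [a + b * xi for xi in x_full]  # predicted outcome for everyone
    return mean(predicted)

# Toy data: the outcome is measured for three of six units only.
x_full = [1, 2, 3, 4, 5, 6]
x_sub = [1, 3, 5]
y_sub = [5.0, 11.0, 17.0]

estimate = prediction_estimate(x_full, x_sub, y_sub)
sub_sample_mean = mean(y_sub)
```

In this toy case the prediction estimate uses the covariate information from all six units, whereas the sub-sample mean uses only the three units with a measured outcome. Whether that extra information justifies the cost of collecting covariates for the full sample is the question the study examines.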
Citation
Fujii, Tomoki; van der Weide, Roy. 2016. Is Predicted Data a Viable Alternative to Real Data? Policy Research Working Paper; No. 7841. © World Bank. http://hdl.handle.net/10986/25156 License: CC BY 3.0 IGO.

Related items

Showing items related by metadata.

  • Publication
    Is Predicted Data a Viable Alternative to Real Data?
    (Published by Oxford University Press on behalf of the World Bank, 2020-06) Fujii, Tomoki; van der Weide, Roy
    It is costly to collect the household- and individual-level data that underlie official estimates of poverty and health. For this reason, developing countries often do not have the budget to update estimates of poverty and health regularly, even though these estimates are most needed there. One way to reduce the financial burden is to substitute some of the real data with predicted data by means of double sampling, where the expensive outcome variable is collected for a subsample and its predictors for all. This study finds that double sampling yields only modest reductions in financial costs when imposing a statistical precision constraint in a wide range of realistic empirical settings. There are circumstances in which the gains can be more substantial, but these denote the exception rather than the rule. The recommendation is to rely on real data whenever there is a need for new data and to use prediction estimators to leverage existing data.
  • Publication
    Cost-effective Estimation of the Population Mean Using Prediction Estimators
    (World Bank, Washington, DC, 2013-06) Fujii, Tomoki; van der Weide, Roy
    This paper considers the prediction estimator as an efficient estimator for the population mean. The study may be viewed as an extension of an earlier study that proved that the prediction estimator based on the iteratively weighted least squares estimator outperforms the sample mean. The analysis finds that a certain moment condition must hold in general for the prediction estimator based on a Generalized-Method-of-Moments estimator to be at least as efficient as the sample mean. In an application to cost-effective double sampling, the authors show how prediction estimators may be adopted to maximize statistical precision under a budget constraint, or to minimize financial costs under a statistical precision constraint. This approach is particularly useful when the outcome variable of interest is expensive to observe relative to observing its covariates.
  • Publication
    GLS Estimation and Empirical Bayes Prediction for Linear Mixed Models with Heteroskedasticity and Sampling Weights : A Background Study for the POVMAP Project
    (World Bank Group, Washington, DC, 2014-09) van der Weide, Roy
    This note adapts results by Huang and Hidiroglou (2003) on Generalized Least Squares estimation and Empirical Bayes prediction for linear mixed models with sampling weights. The objective is to incorporate these results into the poverty mapping approach put forward by Elbers et al. (2003). The estimators presented here have been implemented in version 2.5 of POVMAP, the custom-made poverty mapping software developed by the World Bank.
  • Publication
    Poverty Alleviation through Geographic Targeting: How Much Does Disaggregation Help?
    (World Bank, Washington, D.C., 2004-10) Elbers, Chris; Fujii, Tomoki; Lanjouw, Peter; Özler, Berk; Yin, Wesley
    Using recently completed "poverty maps" for Cambodia, Ecuador, and Madagascar, the authors simulate the impact on poverty of transferring an exogenously given budget to geographically defined subgroups of the population according to their relative poverty status. They find large gains from targeting smaller administrative units, such as districts or villages. But these gains are still far from the poverty reduction that would be possible had the planners had access to information on household level income or consumption. The results suggest that a useful way forward might be to combine fine geographic targeting using a poverty map with within-community targeting mechanisms.
  • Publication
    Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data : Methods and Illustration with Reference to a Middle-Income Country
    (World Bank Group, Washington, DC, 2014-09) Lanjouw, Peter F.; Dang, Hai-Anh H.; Serajuddin, Umar
    Obtaining consistent estimates on poverty over time as well as monitoring poverty trends on a timely basis is a priority concern for policy makers. However, these objectives are not readily achieved in practice when household consumption data are neither frequently collected, nor constructed using consistent and transparent criteria. This paper develops a formal framework for survey-to-survey poverty imputation in an attempt to overcome these obstacles, and to elevate the discussion of these methods beyond the largely ad-hoc efforts in the existing literature. The framework introduced here imposes few restrictive assumptions, works with simple variance formulas, provides guidance on the selection of control variables for model building, and can be generally applied to imputation either from one survey to another survey with the same design, or to another survey with a different design. Empirical results analyzing the Household Expenditure and Income Survey and the Unemployment and Employment Survey in Jordan are quite encouraging, with imputation-based poverty estimates closely tracking the direct estimates of poverty.
