Publication:
Missing Evidence: Tracking Academic Data Use around the World

Files in English
English PDF (3.36 MB)
413 downloads
English Text (90.72 KB)
19 downloads
Date
2024-01-25
Published
2024-01-25
Author(s)
Stacy, Brian
Kitzmüller, Lucas
Wang, Xiaoyu
Mahler, Daniel Gerszon
Serajuddin, Umar
Abstract
Data-driven research on a country is key to producing evidence-based public policies. Yet little is known about where data-driven research is lacking and how it could be expanded. This paper proposes a method for tracking academic data use by country of subject, applying natural language processing to open-access research papers. The model’s predictions produce country estimates of the number of articles using data that are highly correlated with a human-coded approach, with a correlation of 0.99. Analyzing more than 1 million academic articles, the paper finds that the number of articles on a country is strongly correlated with its gross domestic product per capita, population, and the quality of its national statistical system. The paper identifies data sources that are strongly associated with data-driven research and finds that availability of subnational data appears to be particularly important. Finally, the paper classifies countries into groups based on whether they could most benefit from increasing their supply of or demand for data. The findings show that the former applies to many low- and lower-middle-income countries, while the latter applies to many upper-middle- and high-income countries.
Citation
Stacy, Brian; Kitzmüller, Lucas; Wang, Xiaoyu; Mahler, Daniel Gerszon; Serajuddin, Umar. 2024. Missing Evidence: Tracking Academic Data Use around the World. Policy Research Working Paper; 10673. © World Bank. http://hdl.handle.net/10986/40965 License: CC BY 3.0 IGO.
Report Series
Other publications in this report series
  • Publication
    Geopolitics and the World Trading System
    (Washington, DC: World Bank, 2024-12-23) Mattoo, Aaditya; Ruta, Michele; Staiger, Robert W.
    Until the beginning of this century, the GATT/WTO system worked. Economic research provided a compelling explanation. It showed that if governments maximize the well-being of their own countries broadly defined, GATT/WTO principles would facilitate mutually beneficial cooperation over their trade policy choices. Now heightened geopolitical rivalry seems to have undermined the WTO. A simple transposition of the previous rationalization suggests that geopolitics and trade cooperation are not compatible. The paper shows that this is only true if rivalry eclipses any consideration of own-country well-being. In all other circumstances, there are gains from trade cooperation even with geopolitics. Furthermore, the WTO’s relevance is in question only if it adheres too rigidly to its existing rules and norms. Through measured adaptation to the geopolitical imperative, the WTO can continue to thrive as a forum for multilateral trade cooperation in the age of geopolitics.
  • Publication
    Tradeoffs over Rate Cycles
    (Washington, DC: World Bank, 2025-05-23) Forbes, Kristin; Ha, Jongrim; Kose, M. Ayhan
    Central banks often face tradeoffs in how their monetary policy decisions impact economic activity (including employment), inflation and the price level. This paper assesses how these tradeoffs have evolved over time and varied across countries, with a focus on understanding the post-pandemic adjustment. To make these comparisons, we compile a cross-country, historical database of “rate cycles” (i.e., easing and tightening phases for monetary policy) for 24 advanced economies from 1970 through 2024. This allows us to quantify the characteristics of interest rate adjustments and corresponding macroeconomic outcomes and tradeoffs. We also calculate Sacrifice Ratios (output losses per inflation reduction) and document a historically low “sacrifice” during the post-pandemic tightening. This popular measure, however, ignores adjustments in the price level—which increased by more after the pandemic than over the past four decades. A series of regressions and simulations suggest monetary policy (and particularly the timing and aggressiveness of rate hikes) plays a meaningful role in explaining these tradeoffs and how adjustments occur during tightening phases. Central bank credibility is the one measure we assess that corresponds to only positive outcomes and no difficult tradeoffs.
  • Publication
    Global Socio-economic Resilience to Natural Disasters
    (Washington, DC: World Bank, 2025-05-22) Middelanis, Robin; Jafino, Bramka Arga; Hill, Ruth; Nguyen, Minh Cong; Hallegatte, Stephane
    Most disaster risk assessments use damages to physical assets as their central metric, often neglecting distributional impacts and the coping and recovery capacity of affected people. To address this shortcoming, the concepts of well-being losses and socio-economic resilience—the ability to experience asset losses without a decline in well-being—have been proposed. This paper uses microsimulations to produce a global estimate of well-being losses from, and socio-economic resilience to, natural disasters, covering 132 countries. On average, each $1 in disaster-related asset losses results in well-being losses equivalent to a $2 uniform national drop in consumption, with significant variation within and across countries. The poorest income quintile within each country incurs only 9% of national asset losses but accounts for 33% of well-being losses. Compared to high-income countries, low-income countries experience 67% greater well-being losses per dollar of asset losses and require 56% more time to recover. Socio-economic resilience is uncorrelated with exposure or vulnerability to natural hazards. However, a 10 percent increase in GDP per capita is associated with a 0.9 percentage point gain in resilience, but this benefit arises indirectly—such as through higher rates of formal employment, better financial inclusion, and broader social protection coverage—rather than from higher income itself. This paper assesses ten policy options and finds that socio-economic and financial interventions (such as insurance and social protection) can effectively complement asset-focused measures (e.g., construction standards) and that interventions targeting low-income populations usually have higher returns in terms of avoided well-being losses per dollar invested.
  • Publication
    From Chalkboards to Chatbots
    (Washington, DC: World Bank, 2025-05-20) De Simone, Martin; Tiberti, Federico; Barron Rodriguez, Maria; Manolio, Federico; Mosuro, Wuraola; Dikoru, Eliot Jolomi
    This study evaluates the impact of a program leveraging large language models for virtual tutoring in secondary education in Nigeria. Using a randomized controlled trial, the program deployed Microsoft Copilot (powered by GPT-4) to support first-year senior secondary students in English language learning over six weeks. The intervention demonstrated a significant improvement of 0.31 standard deviations on an assessment that included English topics aligned with the Nigerian curriculum, knowledge of artificial intelligence and digital skills. The effect on English, the main outcome of interest, was 0.23 standard deviations. Cost-effectiveness analysis revealed substantial learning gains, equating to 1.5 to 2 years of ‘business-as-usual’ schooling, situating the intervention among some of the most cost-effective programs to improve learning outcomes. An analysis of heterogeneous effects shows that while the program benefits students across the baseline ability distribution, the largest effects are for female students and those with higher initial academic performance. The findings highlight that artificial intelligence-powered tutoring, when designed and used properly, can have transformative impacts in the education sector in low-resource settings.
  • Publication
    Disentangling the Key Economic Channels through Which Infrastructure Affects Jobs
    (Washington, DC: World Bank, 2025-04-03) Vagliasindi, Maria; Gorgulu, Nisan
    This paper takes stock of the literature on infrastructure and jobs published since the early 2000s, using a conceptual framework to identify the key channels through which different types of infrastructure impact jobs. Where relevant, it highlights the different approaches and findings in the cases of energy, digital, and transport infrastructure. Overall, the literature review provides strong evidence of infrastructure’s positive impact on employment, particularly for women. In the case of electricity, this impact arises from freeing time that would otherwise be spent on household tasks. Similarly, digital infrastructure, particularly mobile phone coverage, has demonstrated positive labor market effects, often driven by private sector investments rather than large public expenditures, which are typically required for other large-scale infrastructure projects. The evidence on structural transformation is also positive, with some notable exceptions, such as studies that find no significant impact on structural transformation in rural India in the cases of electricity and roads. Even with better market connections, remote areas may continue to lack economic opportunities, due to the absence of agglomeration economies and complementary inputs such as human capital. Accordingly, reducing transport costs alone may not be sufficient to drive economic transformation in rural areas. The spatial dimension of transformation is particularly relevant for transport, both internationally—by enhancing trade integration—and within countries, where economic development tends to drive firms and jobs toward urban centers, benefitting from economies of scale and network effects. Turning to organizational transformation, evidence on skill bias in developing countries is more mixed than in developed countries and may vary considerably by context. Further research, especially on the possible reasons explaining the differences between developed and developing economies, is needed.

Related items

  • Publication
    When Is There Enough Data to Create a Global Statistic?
    (Washington, DC: World Bank, 2022-05-05) Serajuddin, Umar; Mahler, Daniel Gerszon; Maeda, Hiroko
    To monitor progress toward global goals such as the Sustainable Development Goals, global statistics are needed. Yet cross-country data sets are rarely truly global, creating a trade-off for producers of global statistics: the lower is the data coverage threshold for disseminating global statistics, the more statistics can be made available, but the lower is the accuracy of these statistics. This paper quantifies the availability-accuracy trade-off by running more than 10 million simulations on the World Development Indicators. It shows that if the fraction of the world’s population for which data are lacking is x, then the global value will on expectation be off by 0.37*x standard deviation, and it could be off by as much as x standard deviations. The paper shows the robustness of this result to various assumptions and provides recommendations on when there is enough data to create global statistics. Although the decision will be context specific, in a baseline scenario, it is suggested not to create global statistics when there are data for less than half of the world’s population.
  • Publication
    Missing SDG Gender Indicators
    (Washington, DC: World Bank, 2023-08-15) Serajuddin, Umar; Beegle, Kathleen; Stacy, Brian; Wadhwa, Divyanshi
    The Sustainable Development Goal agenda lays out an ambitious set of 231 indicators to track progress. Countries continue to fall short in terms of reporting on the indicators in general, and this is particularly the case for the subset of 50 gender-related indicators, where countries reported on average on 31 percent of these indicators in at least one year from 2016 to 2020. A closer look at this low coverage reveals four salient findings. First, this is not just a problem of missing data; lack of reporting on existing data is detected to be a problem. For example, of the 32 gender-related indicators that are sex disaggregated, if countries that had a population estimate also had a sex-disaggregated estimate (which is almost always feasible), the Sustainable Development Goal gender coverage rate would be 43 percent instead of 31 percent. Second, better statistical systems are a major part of the solution, as statistical system strength is correlated with higher coverage. Third, poorer countries are doing no worse in reporting on gender-related Sustainable Development Goal indicators than high-income countries, despite weaker statistical systems. Lastly, sizable over (and under) performance in reporting, conditional on statistical strength, suggests that country-level advocacy and focus can yield wins in Sustainable Development Goal gender indicator coverage.
  • Publication
    Updating Poverty Estimates at Frequent Intervals in the Absence of Consumption Data: Methods and Illustration with Reference to a Middle-Income Country
    (World Bank Group, Washington, DC, 2014-09) Lanjouw, Peter F.; Dang, Hai-Anh H.; Serajuddin, Umar
    Obtaining consistent estimates on poverty over time as well as monitoring poverty trends on a timely basis is a priority concern for policy makers. However, these objectives are not readily achieved in practice when household consumption data are neither frequently collected, nor constructed using consistent and transparent criteria. This paper develops a formal framework for survey-to-survey poverty imputation in an attempt to overcome these obstacles, and to elevate the discussion of these methods beyond the largely ad-hoc efforts in the existing literature. The framework introduced here imposes few restrictive assumptions, works with simple variance formulas, provides guidance on the selection of control variables for model building, and can be generally applied to imputation either from one survey to another survey with the same design, or to another survey with a different design. Empirical results analyzing the Household Expenditure and Income Survey and the Unemployment and Employment Survey in Jordan are quite encouraging, with imputation-based poverty estimates closely tracking the direct estimates of poverty.
  • Publication
    Technical Assessment of Open Data Platforms for National Statistical Organisations
    (World Bank, Washington, DC, 2014-10-18) World Bank Group
    The term "open data" is generally understood to be data that are made available to the public free of charge, without registration or restrictive licenses, for any purpose whatsoever (including commercial purposes), in electronic, machine-readable formats that ensure data are easy to find, download and use. National Statistics Offices (NSOs) have the potential to play a pivotal role in the implementation of open data initiatives. As producers and curators of data, the objective of making high quality data more accessible and usable is consistent with their guiding principles. NSOs indicate, in research conducted in support of this report, that one of the difficulties they encounter is that the technology they use to publish - or electronically distribute - data for public use is not compatible with open formats. They also indicate that common software packages used for open data portals do not accommodate the data formats and metadata they produce. Two key concerns related to data dissemination products are addressed: (1) Can such products designed primarily for NSOs satisfy requirements for an open data initiative?; and (2) Can such products designed primarily for open data satisfy the requirements of NSOs? Furthermore, data reuse, both by data experts and the public at large, is key to creating new opportunities and benefits from government data.
The following recommendations are made to improve the overall utility of data publication platforms to NSOs and the open data community: improve technical documentation; ensure public Application Programming Interfaces (APIs) and endpoints are interoperable; presentation of metadata and Uniform Resource Identifiers (URIs) must conform to W3C standards; natural language search and metadata faceting should be standard; structural metadata and hypercube support are core NSO requirements; dashboards and visualisations are necessary for user engagement; and develop data engagement tools for improving data-quality and reuse.
  • Publication
    Reviewing Assessment Tools for Measuring Country Statistical Capacity
    (Washington, DC: World Bank, 2024-03-11) Pullinger, John; Serajuddin, Umar; Stacy, Brian; Dang, Hai-Anh H.
    Country statistical capacity is increasingly recognized as crucial for development, but no academic study exists that reviews the available assessment tools. This paper offers the first review study that fills this gap, paying particular attention to data and practical measurement challenges. It compares the World Bank’s recently developed Statistical Performance Indicators and Index with other widely used indexes, such as the Open Data Inventory index, the Global Data Barometer index, and other regional and self-assessment tools. The findings show that each index brings advantages in data sources, number of indicators, measurement focus, coverage of countries and time periods, and correlation with common development indexes. The Open Data Inventory index covers the most countries, the Global Data Barometer index collects data through its surveys, and the Statistical Performance Indicators and Index offer a broader framework for assessing statistical systems. The paper offers further thoughts on the potential mechanisms through which these tools can bring positive impacts on economic activities and some political economy concerns, as well as future directions for development.

Users also downloaded

  • Publication
    Economic Recovery
    (World Bank, Washington, DC, 2021-04-06) Malpass, David; Georgieva, Kristalina; Yellen, Janet
    World Bank Group President David Malpass spoke about the world facing major challenges, including COVID, climate change, rising poverty and inequality and growing fragility and violence in many countries. On vaccines, he highlighted that, working closely with Gavi, WHO, and UNICEF, the World Bank has conducted over one hundred capacity assessments, many even before vaccines were available. The World Bank Group worked to achieve a debt service suspension initiative and increased transparency in debt contracts in developing countries. The World Bank Group is finalizing a new climate change action plan, which includes a big step up in financing, building on their record climate financing over the past two years. He noted big challenges to bring all together to achieve GRID: green, resilient, and inclusive development. Janet Yellen, U.S. Secretary of the Treasury, mentioned focusing on vulnerable people during the pandemic. Kristalina Georgieva, Managing Director of the International Monetary Fund, focused on giving everyone a fair shot during a sustainable recovery. All three commented on the importance of tackling climate change.
  • Publication
    Remarks at the United Nations Biodiversity Conference
    (World Bank, Washington, DC, 2021-10-12) Malpass, David
    World Bank Group President David Malpass discussed biodiversity and climate change being closely interlinked, with terrestrial and marine ecosystems serving as critically important carbon sinks. At the same time, climate change acts as a direct driver of biodiversity and ecosystem services loss. The World Bank has financed biodiversity conservation around the world, including over 116 million hectares of Marine and Coastal Protected Areas, 10 million hectares of Terrestrial Protected Areas, and over 300 protected habitats, biological buffer zones and reserves. The COVID pandemic, biodiversity loss, and climate change are all reminders of how connected we are. The recovery from this pandemic is an opportunity to put in place more effective policies, institutions, and resources to address biodiversity loss.
  • Publication
    World Development Report 2024
    (Washington, DC: World Bank, 2024-08-01) World Bank
    Middle-income countries are in a race against time. Many of them have done well since the 1990s to escape low-income levels and eradicate extreme poverty, leading to the perception that the last three decades have been great for development. But the ambition of the more than 100 economies with incomes per capita between US$1,100 and US$14,000 is to reach high-income status within the next generation. When assessed against this goal, their record is discouraging. Since the 1970s, income per capita in the median middle-income country has stagnated at less than a tenth of the US level. With aging populations, growing protectionism, and escalating pressures to speed up the energy transition, today’s middle-income economies face ever more daunting odds. To become advanced economies despite the growing headwinds, they will have to make miracles. Drawing on the development experience and advances in economic analysis since the 1950s, World Development Report 2024 identifies pathways for developing economies to avoid the “middle-income trap.” It points to the need for not one but two transitions for those at the middle-income level: the first from investment to infusion and the second from infusion to innovation. Governments in lower-middle-income countries must drop the habit of repeating the same investment-driven strategies and work instead to infuse modern technologies and successful business processes from around the world into their economies. This requires reshaping large swaths of those economies into globally competitive suppliers of goods and services. Upper-middle-income countries that have mastered infusion can accelerate the shift to innovation—not just borrowing ideas from the global frontiers of technology but also beginning to push the frontiers outward. This requires restructuring enterprise, work, and energy use once again, with an even greater emphasis on economic freedom, social mobility, and political contestability. 
Neither transition is automatic. The handful of economies that made speedy transitions from middle- to high-income status have encouraged enterprise by disciplining powerful incumbents, developed talent by rewarding merit, and capitalized on crises to alter policies and institutions that no longer suit the purposes they were once designed to serve. Today’s middle-income countries will have to do the same.
  • Publication
    Media and Messages for Nutrition and Health
    (World Bank, Washington, DC, 2020-06) Calleja, Ramon V., Jr.; Mbuya, Nkosinathi V.N.; Morimoto, Tomo; Thitsy, Sophavanh
    The Lao People’s Democratic Republic (Lao PDR) has experienced rapid and significant economic growth over the past decade. However, poor nutritional outcomes remain a concern. Rates of childhood undernutrition are particularly high in remote, rural, and upland areas. Media have the potential to play an important role in shaping health and nutrition–related behaviors and practices as well as in promoting sociocultural and economic development that might contribute to improved nutritional outcomes. This report presents the results of a media audit (MA) that was conducted to inform the development and production of mass media advocacy and communication strategies and materials with a focus on maternal and child health and nutrition that would reach the most people from the poorest communities in northern Lao PDR. Making more people aware of useful information, essential services and products and influencing them to use these effectively is the ultimate goal of mass media campaigns, and the MA measures the potential effectiveness of media efforts to reach this goal. The effectiveness of communication channels to deliver health and nutrition messages to target beneficiaries to ensure maximum reach and uptake can be viewed in terms of preferences, satisfaction, and trust. Overall, the four most accessed media channels for receiving information among communities in the study areas were village announcements, mobile phones, television, and out-of-home (OOH) media. Of the accessed media channels, the top three most preferred channels were village announcements (40 percent), television (26 percent), and mobile phones (19 percent). In terms of trust, village announcements were the most trusted source of information (64 percent), followed by mobile phones (14 percent) and television (11 percent). 
Hence of all the media channels, village announcements are the most preferred, have the most satisfied users, and are the most trusted source of information in study communities from four provinces in Lao PDR with some of the highest burden of childhood undernutrition.
  • Publication
    Business Ready 2024
    (Washington, DC: World Bank, 2024-10-03) World Bank
    Business Ready (B-READY) is a new World Bank Group corporate flagship report that evaluates the business and investment climate worldwide. It replaces and improves upon the Doing Business project. B-READY provides a comprehensive data set and description of the factors that strengthen the private sector, not only by advancing the interests of individual firms but also by elevating the interests of workers, consumers, potential new enterprises, and the natural environment. This 2024 report introduces a new analytical framework that benchmarks economies based on three pillars: Regulatory Framework, Public Services, and Operational Efficiency. The analysis centers on 10 topics essential for private sector development that correspond to various stages of the life cycle of a firm. The report also offers insights into three cross-cutting themes that are relevant for modern economies: digital adoption, environmental sustainability, and gender. B-READY draws on a robust data collection process that includes specially tailored expert questionnaires and firm-level surveys. The 2024 report, which covers 50 economies, serves as the first in a series that will expand in geographical coverage and refine its methodology over time, supporting reform advocacy, policy guidance, and further analysis and research.