Publication:
Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning

Abstract
With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development. The hypothesis is tested by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over age five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. The project geolocated 32,991 crash reports in Twitter for 2012-20 and clustered them into 22,872 unique crashes to produce one of the first crash maps for Nairobi. A motorcycle delivery service was dispatched in real-time to verify a subset of crashes, showing 92 percent accuracy. Using a spatial clustering algorithm, portions of the road network (less than 1 percent) were identified where 50 percent of the geolocated crashes occurred. Even with limitations in the representativeness of the data, the results can provide urban planners useful information to target road safety improvements where resources are limited.
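As an illustration of the step that groups geolocated crash reports into unique crashes, the sketch below merges reports that fall close together in both space and time. It is a minimal example under stated assumptions, not the paper's algorithm: the `Report` type, the greedy grouping rule, and the 500 m and 120 minute thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Report:
    """One geolocated crash report (hypothetical structure)."""
    lat: float
    lon: float
    minute: float  # minutes since an arbitrary reference time


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def cluster_reports(reports, max_m=500.0, max_min=120.0):
    """Greedily group reports into crashes.

    Reports are processed in time order. A report joins the first cluster
    whose earliest report lies within max_m metres and max_min minutes;
    otherwise it starts a new cluster. Both thresholds are assumptions.
    """
    clusters = []
    for rep in sorted(reports, key=lambda r: r.minute):
        for cluster in clusters:
            seed = cluster[0]  # earliest report in this cluster
            close_in_space = haversine_m(seed.lat, seed.lon, rep.lat, rep.lon) <= max_m
            close_in_time = rep.minute - seed.minute <= max_min
            if close_in_space and close_in_time:
                cluster.append(rep)
                break
        else:
            # No existing cluster matched, so this report is a new crash.
            clusters.append([rep])
    return clusters
```

Calling `cluster_reports(reports)` on a list of `Report` objects returns a list of clusters, each of which stands for one candidate crash. Tightening either threshold splits reports into more crashes, and loosening it merges them.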
Citation
Milusheva, Sveta; Marty, Robert; Bedoya, Guadalupe; Williams, Sarah; Resor, Elizabeth; Legovini, Arianna. 2020. Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning. Policy Research Working Paper;No. 9488. © World Bank. http://hdl.handle.net/10986/34910 License: CC BY 3.0 IGO.
Report Series
Other publications in this report series
  • Publication
    The Macroeconomic Implications of Climate Change Impacts and Adaptation Options
    (Washington, DC: World Bank, 2025-05-29) Abalo, Kodzovi; Boehlert, Brent; Bui, Thanh; Burns, Andrew; Castillo, Diego; Chewpreecha, Unnada; Haider, Alexander; Hallegatte, Stephane; Jooste, Charl; McIsaac, Florent; Ruberl, Heather; Smet, Kim; Strzepek, Ken
    Estimating the macroeconomic implications of climate change impacts and adaptation options is a topic of intense research. This paper presents a framework in the World Bank's macrostructural model to assess climate-related damages. This approach has been used in many Country Climate and Development Reports, a World Bank diagnostic that identifies priorities to ensure continued development in spite of climate change and climate policy objectives. The methodology captures a set of impact channels through which climate change affects the economy by (1) connecting a set of biophysical models to the macroeconomic model and (2) exploring a set of development and climate scenarios. The paper summarizes the results for five countries, highlighting the sources and magnitudes of their vulnerability, with estimated gross domestic product losses in 2050 exceeding 10 percent of gross domestic product in some countries and scenarios, although only a small set of impact channels is included. The paper also presents estimates of the macroeconomic gains from sector-level adaptation interventions, considering their upfront costs and avoided climate impacts and finding significant net gross domestic product gains from adaptation opportunities identified in the Country Climate and Development Reports. Finally, the paper discusses the limits of current modeling approaches, and their complementarity with empirical approaches based on historical data series. The integrated modeling approach proposed in this paper can inform policymakers as they make proactive decisions on climate change adaptation and resilience.
  • Publication
    Global Poverty Revisited Using 2021 PPPs and New Data on Consumption
    (Washington, DC: World Bank, 2025-06-05) Foster, Elizabeth; Jolliffe, Dean Mitchell; Ibarra, Gabriel Lara; Lakner, Christoph; Tetteh-Baah, Samuel
    Recent improvements in survey methodologies have increased measured consumption in many low- and lower-middle-income countries that now collect a more comprehensive measure of household consumption. Faced with such methodological changes, countries have frequently revised upward their national poverty lines to make them appropriate for the new measures of consumption. This in turn affects the World Bank’s global poverty lines when they are periodically revised. The international poverty line, which is based on the typical poverty line in low-income countries, increases by around 40 percent to $3.00 when the more recent national poverty lines as well as the 2021 purchasing power parities are incorporated. The net impact of the changes in international prices, the poverty line, and new survey data (including new data for India) is an increase in global extreme poverty by some 125 million people in 2022, and a significant shift of poverty away from South Asia and toward Sub-Saharan Africa. The changes at higher poverty lines, which are more relevant to middle-income countries, are mixed.
  • Publication
    Geopolitical Fragmentation and Friendshoring
    (Washington, DC: World Bank, 2025-06-26) Grover, Arti; Vézina, Pierre-Louis
    This paper examines the relationship between geopolitical fragmentation and friendshoring of foreign investments over time, countries, and sectors. The analysis uses comprehensive data on foreign direct investments covering greenfield projects, mergers and acquisitions, and stocks of affiliates, as well as data on four alternative measures of geopolitical distance between countries. The gravity estimations suggest that, first, geopolitical differences have a negative effect on foreign investments and the magnitude has heightened in the post-pandemic period compared to a decade ago. Second, it is primarily the companies from advanced Western economies whose foreign investment decisions are increasingly shaped by friendshoring forces. Finally, the paper shows that friendshoring is not only confined to strategic industries, implying that allocations of foreign direct investments may not solely reflect national security or resilience considerations.
  • Publication
    Soaring Food Prices Threaten Recent Economic Gains in the EU
    (Washington, DC: World Bank, 2025-07-02) Robayo, Monica; Lucchetti, Leonardo Ramiro; Delgado-Prieto, Lukas; Badiani-Magnusson, Reena
    The surge in food prices following the 2021 economic rebound has become a significant concern for households, particularly low-income ones, in Bulgaria, Croatia, Poland, and Romania. Food price inflation, which surpasses general inflation rates, risks worsening poverty and food insecurity in these countries. This paper explores the distributional impacts of rising food prices and the effectiveness of government response measures. Low-income households, who allocate a larger share of their income to food, are disproportionately affected and are struggling to cope with unexpected expenses, leading to increased difficulties in accessing proper nutrition. Simulations indicate that rising food prices contribute to higher poverty rates and greater income inequality, especially among vulnerable populations. They also suggest that the main poverty-targeted social assistance schemes offer critical support for the extreme poor, but expanding both coverage and benefits is vital to shield all at-risk individuals. Targeted policies that balance immediate relief with long-term resilience-building are essential to addressing the challenges posed by escalating food prices.
  • Publication
    Disentangling the Key Economic Channels through Which Infrastructure Affects Jobs
    (Washington, DC: World Bank, 2025-04-03) Vagliasindi, Maria; Gorgulu, Nisan
    This paper takes stock of the literature on infrastructure and jobs published since the early 2000s, using a conceptual framework to identify the key channels through which different types of infrastructure impact jobs. Where relevant, it highlights the different approaches and findings in the cases of energy, digital, and transport infrastructure. Overall, the literature review provides strong evidence of infrastructure’s positive impact on employment, particularly for women. In the case of electricity, this impact arises from freeing time that would otherwise be spent on household tasks. Similarly, digital infrastructure, particularly mobile phone coverage, has demonstrated positive labor market effects, often driven by private sector investments rather than large public expenditures, which are typically required for other large-scale infrastructure projects. The evidence on structural transformation is also positive, with some notable exceptions, such as studies that find no significant impact on structural transformation in rural India in the cases of electricity and roads. Even with better market connections, remote areas may continue to lack economic opportunities, due to the absence of agglomeration economies and complementary inputs such as human capital. Accordingly, reducing transport costs alone may not be sufficient to drive economic transformation in rural areas. The spatial dimension of transformation is particularly relevant for transport, both internationally, by enhancing trade integration, and within countries, where economic development tends to drive firms and jobs toward urban centers, benefitting from economies of scale and network effects. Turning to organizational transformation, evidence on skill bias in developing countries is more mixed than in developed countries and may vary considerably by context. Further research, especially on the possible reasons explaining the differences between developed and developing economies, is needed.

Related items

Showing items related by metadata.

  • Publication
    The Unintended Consequences of Curfews on Road Safety
    (World Bank, Washington, DC, 2023-04-13) Bedoya, Guadalupe; Dolinge, Amy; Dolkart, Caitlin; Legovini, Arianna; Milusheva, Sveta; Marty, Robert; Taniform, Peter
    During COVID-19, curfews spread like wildfire. Although their impact on curbing the spread of disease remains to be proven, curfews have the potential to bring about costs to society in multiple domains. This paper investigates the impact of curfews on road safety in an urban lower-middle-income setting. It shows that curfews lead to large reductions in crashes during the curfew hours when cars are off the road, but that these reductions can be fully offset by an increase in crashes during heavy traffic hours when people rush to get home before the curfew starts. These spillover effects result from a behavioral response to the curfew (increased driving speed), leading to higher crash rates. These findings forewarn that the use of curfews in future crises and pandemics should be carefully scrutinized and designed to minimize unintended negative effects.
  • Publication
    Program Targeting with Machine Learning and Mobile Phone Data
    (World Bank, Washington, DC, 2022-12) Aiken, Emily L.; Bedoya, Guadalupe; Blumenstock, Joshua E.; Coville, Aidan
    Can mobile phone data improve program targeting? By combining rich survey data from the baseline of a “big push” anti-poverty program in Afghanistan implemented in 2016 with detailed mobile phone logs from program beneficiaries, this paper studies the extent to which machine learning methods can accurately differentiate ultra-poor households eligible for program benefits from ineligible households. The paper shows that machine learning methods leveraging mobile phone data can identify ultra-poor households nearly as accurately as survey-based measures of consumption and wealth; and that combining survey-based measures with mobile phone data produces classifications more accurate than those based on a single data source.
  • Publication
    Weighting Justice Reform Costs and Benefits Using Machine Learning and Modern Data Science
    (World Bank, Washington, DC, 2023-05-22) Mahony, Chris; Manning, Matthew; Wong, Gabriel
    Can the impact of justice processes be enhanced with the inclusion of a heterogeneous component into an existing cost-benefit analysis app that demonstrates how benefactors and beneficiaries are affected? Such a component requires (i) moving beyond the traditional cost-benefit conceptual framework of utilizing averages, (ii) identification of social group or population-specific variation, (iii) identification of how justice processes differ across groups/populations, (iv) distribution of costs and benefits according to the identified variations, and (v) utilization of empirically informed statistical techniques to gain new insights from data and maximize the impact for beneficiaries. This paper outlines a method for capturing heterogeneity. The paper tests the method and the cost-benefit analysis online app that was developed using primary data collected from a developmental crime prevention intervention in Australia. The paper identifies how subgroups in the intervention display different behavioral adjustments across the reference period, revealing the heterogeneous distribution of costs and benefits. Finally, the paper discusses the next version of the cost-benefit analysis app, which incorporates an artificial intelligence-driven component that reintegrates individual cost-benefit analysis projects using machine learning and other modern data science techniques. The paper argues that the app enhances cost-benefit analysis, development outcomes, and policy making efficiency for optimal prioritization of criminal justice resources. Further, the app advances the policy accessibility of enhanced, social group-specific data, illuminating optimal policy orientation for more inclusive, just, and resilient societal outcomes, an approach with potential across broader public policy.
  • Publication
    Tracking Advances in Access to Electricity Using Satellite-Based Data and Machine Learning to Complement Surveys
    (World Bank, Washington, DC, 2021-04-15) Dhorne, Milien; Nicolas, Claire; Arderne, Christopher; Besnard, Juliette
    Access to electricity is widely considered a major determinant of socioeconomic development. But despite long-standing efforts to expand access, 789 million people remained without electricity in 2018. Accurate and reliable data to keep track of electrification efforts must be the first step toward achieving universal access. Monitoring access with the finest granularity and taking into account local socioeconomic characteristics enable a realistic depiction of electrification progress. Such data can be used to plan efficient and robust energy access policies and programs, to raise public awareness of the urgency of action, to sustain the pace of electrification, and ultimately to connect the hardest-to-reach populations. In addition to identifying where efforts should be targeted, high-resolution data are needed to show which electricity supply options are most relevant. Remote sensing techniques and geographic information systems have revolutionized data collection by providing a range of location-specific information that was not previously accessible. The use of standardized geospatial tools and methods has made it possible to offer countries technical assistance and operational support for the development of national electrification strategies, least-cost electrification plans, and country-based investment prospectuses that combine grid, mini-grid, and off-grid technologies.
  • Publication
    What Can We (Machine) Learn about Welfare Dynamics from Cross-Sectional Data?
    (World Bank, Washington, DC, 2018-08) Lucchetti, Leonardo
    This paper implements a machine learning approach to estimate intra-generational economic mobility using cross-sectional data. A Least Absolute Shrinkage and Selection Operator (Lasso) procedure is applied to explore poverty dynamics and household-level welfare growth in the absence of panel data sets that follow individuals over time. The method is validated by sampling repeated cross-sections of actual panel data from Peru. In general, the approach performs well at estimating intra-generational poverty transitions; most of the mobility estimates fall within the 95 percent confidence intervals of poverty mobility from the actual panel data. The validation also confirms that the Lasso regularization procedure performs well at estimating household-level welfare growth between two years. Overall, the results are sufficiently encouraging to estimate economic mobility in settings where panel data are not available or, if they are, to improve panel data when they suffer from serious non-random attrition problems.

Users also downloaded

Showing related downloaded files

  • Publication
    Argentina Country Climate and Development Report
    (World Bank, Washington, DC, 2022-11) World Bank Group
    The Argentina Country Climate and Development Report (CCDR) explores opportunities and identifies trade-offs for aligning Argentina’s growth and poverty reduction policies with its commitments on, and its ability to withstand, climate change. It assesses how the country can reduce its vulnerability to climate shocks through targeted public and private investments and adequation of social protection. The report also shows how Argentina can seize the benefits of a global decarbonization path to sustain a more robust economic growth through further development of Argentina’s potential for renewable energy, energy efficiency actions, the lithium value chain, as well as climate-smart agriculture (and land use) options. Given Argentina’s context, this CCDR focuses on win-win policies and investments, which have large co-benefits or can contribute to raising the country’s growth while helping to adapt the economy, also considering how human capital actions can accompany a just transition.
  • Publication
    Classroom Assessment to Support Foundational Literacy
    (Washington, DC: World Bank, 2025-03-21) Luna-Bazaldua, Diego; Levin, Victoria; Liberman, Julia; Gala, Priyal Mukesh
    This document focuses primarily on how classroom assessment activities can measure students’ literacy skills as they progress along a learning trajectory towards reading fluently and with comprehension by the end of primary school grades. The document addresses considerations regarding the design and implementation of early grade reading classroom assessment, provides examples of assessment activities from a variety of countries and contexts, and discusses the importance of incorporating classroom assessment practices into teacher training and professional development opportunities for teachers. The structure of the document is as follows. The first section presents definitions and addresses basic questions on classroom assessment. Section 2 covers the intersection between assessment and early grade reading by discussing how learning assessment can measure early grade reading skills following the reading learning trajectory. Section 3 compares some of the most common early grade literacy assessment tools with respect to the early grade reading skills and developmental phases. Section 4 of the document addresses teacher training considerations in developing, scoring, and using early grade reading assessment. Additional issues in assessing reading skills in the classroom and using assessment results to improve teaching and learning are reviewed in section 5. Throughout the document, country cases are presented to demonstrate how assessment activities can be implemented in the classroom in different contexts.
  • Publication
    Improving the Performance of Higher Education in Vietnam
    (World Bank, Washington, DC, 2020-04-28) World Bank
    The progress of East Asian economies in recent years illustrates a strong symbiotic relationship among higher education, innovation, and growth through the production of research and skills. In the case of Vietnam, higher education has a significant positive effect on household poverty and long-term earnings at the individual level, where annualized private returns to higher education are above fifteen percent, one of the highest levels in the world. As Vietnam aspires to become an upper middle-income country by 2035, its productivity needs to increase continuously, which requires greater production and effective use of high-skilled manpower and science, technology and innovation (STI). There is a disconnect between Vietnam’s remarkable achievement on equitable economic growth and human development, on the one hand, and the performance of the higher education system, on the other hand. Vietnam has experimented with a number of higher education reforms in the last two decades, with some success in expanding access but missing opportunities in achieving good results on quality and relevance, and in furthering equity. The main objective of this Bank’s report is to provide a diagnosis of the current performance of the Vietnamese universities and propose a range of options for transforming and developing the higher education system.
  • Publication
    Remarks at the G20 Finance Ministers and Central Bank Governors Meeting
    (World Bank, Washington, DC, 2021-04-07) Malpass, David
    David Malpass, President of the World Bank, discussed the World Bank climate change action plan, debt and the DSSI, and resource needs for IDA countries.
  • Publication
    Remarks to the Annual Meetings 2020 Development Committee
    (World Bank, Washington, DC, 2020-10-16) Malpass, David
    David Malpass, President of the World Bank Group, announced that the Board approved a fast track approach to emergency health support programs that now covers 111 countries. Most projects are well advanced, with average disbursement upward of 40 percent. The goal is to take broad, fast action early. The operational framework presented back in June has positioned the Bank to help countries address immediate health threats and social and economic impacts and maintain our focus on long-term development. The Bank is making good progress toward the 15-month target of 160 billion dollars in surge financing. Much of it is for the poorest countries and will take the form of grants or low-rate, long-maturity loans. IFC, through the Global Health Platform, will be providing financing to vaccine manufacturers to foster expanded production of COVID-19 vaccines in both part 1 and 2 countries, providing production is reserved for emerging markets. The Development Committee holds a unique place in the international architecture. It is the only global forum in which the Governments of developed countries and the Governments of developing countries, creditor countries and borrower countries, come together to discuss development and the ‘net transfer of resources to developing countries.’ The current International Financial Architecture system is skewed in favor of the rich and creditor countries. It is important that all voices are heard, so Malpass urged the Ministers of developing countries to use their voice and speak their minds today. Malpass urged consideration of how we can build a new approach to debt restructuring that allows for a fair relationship and balance between creditors and debtors. This will be critical in restoring growth in developing countries; and helping reverse the inequality.