Publication: Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning
Published
2020-12
Date
2020-12-10
Abstract
With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development. The hypothesis is tested by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over age five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. The project geolocated 32,991 crash reports in Twitter for 2012-20 and clustered them into 22,872 unique crashes to produce one of the first crash maps for Nairobi. A motorcycle delivery service was dispatched in real-time to verify a subset of crashes, showing 92 percent accuracy. Using a spatial clustering algorithm, portions of the road network (less than 1 percent) were identified where 50 percent of the geolocated crashes occurred. Even with limitations in the representativeness of the data, the results can provide urban planners useful information to target road safety improvements where resources are limited.
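The abstract reports that 32,991 geolocated crash reports were clustered into 22,872 unique crashes, but it does not describe the clustering rule. As a purely illustrative sketch, and not the authors' method, the following Python groups reports that fall close together in space and time. The `Report` type, the greedy first-match rule, and the 500 m and 120 minute thresholds are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Report:
    """One geolocated crash report (hypothetical structure)."""
    minutes: float  # report time, in minutes since an arbitrary origin
    lat: float      # latitude in decimal degrees
    lon: float      # longitude in decimal degrees


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points."""
    earth_radius_m = 6_371_000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def cluster_reports(reports, max_m=500.0, max_minutes=120.0):
    """Greedily group reports into candidate unique crashes.

    Reports are processed in time order. A report joins the first
    existing cluster whose earliest report is within `max_m` metres
    and `max_minutes` minutes; otherwise it starts a new cluster.
    Both thresholds are assumed values, not taken from the paper.
    """
    clusters = []
    for rep in sorted(reports, key=lambda r: r.minutes):
        for cluster in clusters:
            seed = cluster[0]  # earliest report in this cluster
            close_in_time = rep.minutes - seed.minutes <= max_minutes
            close_in_space = haversine_m(
                seed.lat, seed.lon, rep.lat, rep.lon) <= max_m
            if close_in_time and close_in_space:
                cluster.append(rep)
                break
        else:
            clusters.append([rep])
    return clusters


# Made-up example points in the Nairobi area.
example_reports = [
    Report(minutes=0, lat=-1.2921, lon=36.8219),
    Report(minutes=30, lat=-1.2925, lon=36.8222),   # same crash, second tweet
    Report(minutes=45, lat=-1.2600, lon=36.8000),   # different location
    Report(minutes=600, lat=-1.2921, lon=36.8219),  # same place, hours later
]
example_clusters = cluster_reports(example_reports)
```

In this made-up example, the first two reports are merged into one crash, while the distant report and the much later report each form their own cluster. A real pipeline would need thresholds tuned against verified crashes, such as the subset checked by the motorcycle delivery service.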
Citation
“Milusheva, Sveta; Marty, Robert; Bedoya, Guadalupe; Williams, Sarah; Resor, Elizabeth; Legovini, Arianna. 2020. Applying Machine Learning and Geolocation Techniques to Social Media Data (Twitter) to Develop a Resource for Urban Planning. Policy Research Working Paper;No. 9488. © World Bank. http://hdl.handle.net/10986/34910 License: CC BY 3.0 IGO.”