Publication:
Using Post-Double Selection Lasso in Field Experiments

Published
2024-09-27
Date
2024-09-27
Author(s)
Cilliers, Jacobus
Elashmawy, Nour
McKenzie, David
Abstract
The post-double selection (PDS) Lasso estimator has become a popular way of selecting control variables when analyzing randomized experiments. The aim is to improve precision and to reduce bias from attrition or chance imbalances. This paper re-estimates 780 treatment effects from published papers to examine how much difference this approach makes in practice. On average, PDS Lasso is found to reduce standard errors by less than one percent compared to standard ANCOVA, and it does not select variables to model treatment in over half the cases. The authors discuss and provide evidence on the key practical decisions researchers face in using this method.
Citation
Cilliers, Jacobus; Elashmawy, Nour; McKenzie, David. 2024. Using Post-Double Selection Lasso in Field Experiments. Policy Research Working Paper; 10931. © World Bank. http://hdl.handle.net/10986/42206 License: CC BY 3.0 IGO.
Report Series
Other publications in this report series
  • Publication
    The Economic Value of Weather Forecasts: A Quantitative Systematic Literature Review
    (Washington, DC: World Bank, 2025-09-10) Farkas, Hannah; Linsenmeier, Manuel; Talevi, Marta; Avner, Paolo; Jafino, Bramka Arga; Sidibe, Moussa
    This study systematically reviews the literature that quantifies the economic benefits of weather observations and forecasts in four weather-dependent economic sectors: agriculture, energy, transport, and disaster-risk management. The review covers 175 peer-reviewed journal articles and 15 policy reports. Findings show that the literature is concentrated in high-income countries and most studies use theoretical models, followed by observational and then experimental research designs. Forecast horizons studied, meteorological variables and services, and monetization techniques vary markedly by sector. Estimated benefits even within specific subsectors span several orders of magnitude and broad uncertainty ranges. An econometric meta-analysis suggests that theoretical studies and studies in richer countries tend to report significantly larger values. Barriers that hinder value realization are identified on both the provider and user sides, with inadequate relevance, weak dissemination, and limited ability to act recurring across sectors. Policy reports rely heavily on back-of-the-envelope or recursive benefit-transfer estimates, rather than on the methods and results of the peer-reviewed literature, revealing a science-to-policy gap. These findings suggest substantial socioeconomic potential of hydrometeorological services around the world, but also knowledge gaps that require more valuation studies focusing on low- and middle-income countries, addressing provider- and user-side barriers and employing rigorous empirical valuation methods to complement and validate theoretical models.
  • Publication
    The Macroeconomic Implications of Climate Change Impacts and Adaptation Options
    (Washington, DC: World Bank, 2025-05-29) Abalo, Kodzovi; Boehlert, Brent; Bui, Thanh; Burns, Andrew; Castillo, Diego; Chewpreecha, Unnada; Haider, Alexander; Hallegatte, Stephane; Jooste, Charl; McIsaac, Florent; Ruberl, Heather; Smet, Kim; Strzepek, Ken
    Estimating the macroeconomic implications of climate change impacts and adaptation options is a topic of intense research. This paper presents a framework in the World Bank's macrostructural model to assess climate-related damages. This approach has been used in many Country Climate and Development Reports, a World Bank diagnostic that identifies priorities to ensure continued development in spite of climate change and climate policy objectives. The methodology captures a set of impact channels through which climate change affects the economy by (1) connecting a set of biophysical models to the macroeconomic model and (2) exploring a set of development and climate scenarios. The paper summarizes the results for five countries, highlighting the sources and magnitudes of their vulnerability, with estimated gross domestic product losses in 2050 exceeding 10 percent of gross domestic product in some countries and scenarios, although only a small set of impact channels is included. The paper also presents estimates of the macroeconomic gains from sector-level adaptation interventions, considering their upfront costs and avoided climate impacts and finding significant net gross domestic product gains from adaptation opportunities identified in the Country Climate and Development Reports. Finally, the paper discusses the limits of current modeling approaches, and their complementarity with empirical approaches based on historical data series. The integrated modeling approach proposed in this paper can inform policymakers as they make proactive decisions on climate change adaptation and resilience.
  • Publication
    Labor Demand in the Age of Generative AI: Early Evidence from the U.S. Job Posting Data
    (Washington, DC: World Bank, 2025-11-18) Liu, Yan; Wang, He; Yu, Shu
    This paper examines the causal impact of generative artificial intelligence on U.S. labor demand using online job posting data. Exploiting ChatGPT’s release in November 2022 as an exogenous shock, the paper applies difference-in-differences and event study designs to estimate the job displacement effects of generative artificial intelligence. The identification strategy compares labor demand for occupations with high versus low artificial intelligence substitution vulnerability following ChatGPT’s launch, conditioning on similar generative artificial intelligence exposure levels to isolate substitution effects from complementary uses. The analysis uses 285 million job postings collected by Lightcast from the first quarter of 2018 to the second quarter of 2025. The findings show that the number of postings for occupations with above-median artificial intelligence substitution scores fell by an average of 12 percent relative to those with below-median scores. The effect increased from 6 percent in the first year after the launch to 18 percent by the third year. Losses were particularly acute for entry-level positions that require neither advanced degrees (18 percent) nor extensive experience (20 percent), as well as those in administrative support (40 percent) and professional services (30 percent). Although generative artificial intelligence generates new occupations and enhances productivity, which may increase labor demand, early evidence suggests that some occupations may be less likely to be complemented by generative artificial intelligence than others.
  • Publication
    External Finance in Emerging Markets and Developing Economies: A Tale of Differences in Vulnerabilities
    (Washington, DC: World Bank, 2025-12-04) Kim, Dohan; Milesi-Ferretti, Gian Maria
    Over the past two decades, many emerging markets and developing economies have been viewed as increasingly resilient to external financial shocks. This paper assesses whether such resilience is broadly shared across emerging markets and developing economies by classifying them into three tiers based on economic size, income level, institutional strength, and financial integration. The analysis shows that first-tier emerging markets and developing economies have improved their external balance sheets and reduced dependence on official support. However, second- and third-tier emerging markets and developing economies have experienced growing external vulnerabilities since the global financial crisis, marked by rising external debt liabilities and declining foreign exchange reserves. Using a range of indicators, including sovereign defaults, arrears, partial defaults, and International Monetary Fund lending, the paper identifies episodes of external financial distress and shows that distress remains widespread among second- and third-tier emerging markets and developing economies. The empirical analysis confirms that key components of the net international investment position—especially external debt and foreign exchange reserves—predict the onset of external financial distress, with institutional quality shaping the impact. Weak institutions amplify risks, while strong institutions mitigate them. These findings highlight the importance of recognizing heterogeneity across emerging markets and developing economies, strengthening institutional quality alongside external balance-sheet management, and rebuilding buffers to safeguard against renewed global financial stress.
  • Publication
    Rigging the Scores: Corruption through Scoring Rule Manipulation in Public Procurement Auctions
    (Washington, DC: World Bank, 2025-12-02) Chen, Qianmiao
    Public procurement is highly susceptible to corruption, especially in developing countries. Although open auctions are widely adopted to curb it, this paper finds that corruption remains prevalent even within this procurement format. Procurement officers can collaborate with firms to manipulate scoring rules, ensuring predetermined winners, while corrupt firms submit noncompetitive bids to meet minimum bidder requirements. Using extensive data from Chinese public procurement auctions, the paper introduces model-driven statistical tools to detect such corruption, identifying a corruption rate of 65 percent. A procurement expert audit survey confirms the tools’ reliability, with a 91 percent probability that experts recognize suspicious scoring rules when flagged. Firm-level analysis reveals that local, state-owned, and less productive firms are favored in corrupt auctions. Lastly, the paper explores policy implications. Analysis of the national anti-corruption campaign since 2012 suggests that general investigations may be insufficient to address deeply ingrained corrupt practices. Using counterfactuals based on an estimated structural model, the paper shows that implementing anonymous call-for-tender evaluations could improve social welfare by 10 percent by eliminating suspicious rules and encouraging broader participation.

Related items

Showing items related by metadata.

  • Publication
    Testing the Importance of Search Frictions, Matching, and Reservation Prestige through Randomized Experiments in Jordan
    (World Bank Group, Washington, DC, 2014-09-01) Groh, Matthew; McKenzie, David; Shammout, Nour; Vishwanath, Tara
    Unemployment rates for tertiary-educated youth in Jordan are high, as is the duration of unemployment. Two randomized experiments in Jordan were used to test different theories that may explain this phenomenon. The first experiment tested the role of search and matching frictions by providing firms and job candidates with an intensive screening and matching service based on educational backgrounds and psychometric assessments. Although more than 1,000 matches were made, youth rejected the opportunity to even have an interview in 28 percent of cases, and when a job offer was received, they rejected this offer or quickly quit the job 83 percent of the time. A second experiment built on the first by examining the willingness of educated, unemployed youth to apply for jobs of varying levels of prestige. Youth applied to only a small proportion of the job openings they were told about, with application rates higher for higher prestige jobs than lower prestige jobs. Youth failed to show up for the majority of interviews scheduled for low prestige jobs. The results suggest that reservation prestige is an important factor underlying the unemployment of educated Jordanian youth.
  • Publication
    In Pursuit of Balance: Randomization in Practice in Development Field Experiments
    (World Bank, Washington, DC, 2008-10) McKenzie, David; Bruhn, Miriam
    Randomized experiments are increasingly used in development economics, with researchers now facing the question of not just whether to randomize, but how to do so. Pure random assignment guarantees that the treatment and control groups will have identical characteristics on average, but in any particular random allocation, the two groups will differ along some dimensions. Methods used to pursue greater balance include stratification, pair-wise matching, and re-randomization. This paper presents new evidence on the randomization methods used in existing randomized experiments, and carries out simulations in order to provide guidance for researchers. Three main results emerge. First, many researchers are not controlling for the method of randomization in their analysis. The authors show this leads to tests with incorrect size, and can result in lower power than if a pure random draw was used. Second, they find that in samples of 300 or more, the different randomization methods perform similarly in terms of achieving balance on many future outcomes of interest. However, for very persistent outcome variables and in smaller sample sizes, pair-wise matching and stratification perform best. Third, the analysis suggests that on balance the re-randomization methods common in practice are less desirable than other methods, such as matching.
  • Publication
    Soft Skills or Hard Cash? The Impact of Training and Wage Subsidy Programs on Female Youth Employment in Jordan
    (World Bank, Washington, DC, 2012-07) Groh, Matthew; Krishnan, Nandini; McKenzie, David; Vishwanath, Tara
    Throughout the Middle East, unemployment rates of educated youth have been persistently high and female labor force participation, low. This paper studies the impact of a randomized experiment in Jordan designed to assist female community college graduates find employment. One randomly chosen group of graduates was given a voucher that would pay an employer a subsidy equivalent to the minimum wage for up to 6 months if they hired the graduate; a second group was invited to attend 45 hours of employability skills training designed to provide them with the soft skills employers say graduates often lack; a third group was offered both interventions; and the fourth group forms the control group. The analysis finds that the job voucher led to a 40 percentage point increase in employment in the short-run, but that most of this employment is not formal, and that the average effect is much smaller and no longer statistically significant 4 months after the voucher period has ended. The voucher does appear to have persistent impacts outside the capital, where it almost doubles the employment rate of graduates, but this appears likely to largely reflect displacement effects. Soft-skills training has no average impact on employment, although again there is a weakly significant impact outside the capital. The authors elicit the expectations of academics and development professionals to demonstrate that these findings are novel and unexpected. The results suggest that wage subsidies can help increase employment in the short term, but are not a panacea for the problems of high urban female youth unemployment.
  • Publication
    Designing and Analyzing Powerful Experiments
    (Washington, DC: World Bank, 2025-07-22) McKenzie, David
    This paper offers practical advice on how to improve statistical power in randomized experiments through choices and actions researchers can take at the design, implementation, and analysis stages. At the design stage, the choice of estimand, choice of treatment, and decisions that affect the residual variance and intra-cluster correlation can all affect power for a given sample size. At the implementation stage, researchers can boost power through increasing compliance with treatment, reducing attrition, and improving outcome measurement. At the analysis stage, power can be increased through using different test statistics or estimands, through the choice of control variables, and through incorporating informative priors in a Bayesian analysis. A key message is that it does not make sense to talk of “the” power of an experiment. A study can be well-powered for one outcome or estimand, but not others, and a fixed sample size can yield very different levels of power depending on researcher decisions.
  • Publication
    Beyond Baseline and Follow-up: The Case for More T in Experiments
    (2011-04-01) McKenzie, David
    The vast majority of randomized experiments in economics rely on a single baseline and single follow-up survey. If multiple follow-ups are conducted, the reason is typically to examine the trajectory of impact effects, so that in effect only one follow-up round is being used to estimate each treatment effect of interest. While such a design is suitable for study of highly autocorrelated and relatively precisely measured outcomes in the health and education domains, this paper makes the case that it is unlikely to be optimal for measuring noisy and relatively less autocorrelated outcomes such as business profits, household incomes and expenditures, and episodic health outcomes. Taking multiple measurements of such outcomes at relatively short intervals allows the researcher to average out noise, increasing power. When the outcomes have low autocorrelation, it can make sense to do no baseline at all. Moreover, the author shows how for such outcomes, more power can be achieved with multiple follow-ups than allocating the same total sample size over a single follow-up and baseline. The analysis highlights the large gains in power from ANCOVA rather than difference-in-differences when autocorrelations are low and a baseline is taken. The paper discusses the issues involved in multiple measurements, and makes recommendations for the design of experiments and related non-experimental impact evaluations.

Users also downloaded

Showing related downloaded files

  • Publication
    The World Bank Human Capital Index
    (Published by Oxford University Press on behalf of the World Bank, 2019-02) Kraay, Aart
    This paper provides a guide to the new World Bank Human Capital Index (HCI), situating its methodology in the context of the development accounting literature. The HCI combines indicators of health and education into a measure of the human capital that a child born today can expect to achieve by her 18th birthday, given the risks of poor education and health that prevail in the country where she lives. The HCI is measured in units of productivity relative to a benchmark of complete education and full health, and ranges from 0 to 1. A value of x on the HCI indicates that a child born today can expect to be only x×100 percent as productive as a future worker as she would be if she enjoyed complete education and full health.
  • Publication
    Combining Preschool Teacher Training with Parenting Education
    (World Bank, Washington, DC, 2016-09) Fernald, Lia C. H.; Ozler, Berk; Kariger, Patricia; McConnell, Christin; Neuman, Michelle; Fraga, Eduardo
    This paper evaluates a government program in Malawi, which aimed to improve quality at community-based childcare centers and complemented these efforts with a group-based parenting support program. Children in the integrated intervention arm (teacher training and parenting) had significantly higher scores in measures of language and socio-emotional development than children in centers receiving teacher training alone at the 18-month follow-up. However, the study finds no effects on child assessments at the 36-month follow-up. Significant improvements at the centers relating to classroom organization and teacher behavior in the teacher-training only arm did not translate into improvements in child outcomes at either follow-up. The findings suggest that, in resource-poor settings with informal preschools, programs that integrate parenting support within preschools may be more effective than programs that simply improve classroom quality.
  • Publication
    Where Are All the Jobs?
    (World Bank, Washington, DC, 2022-03) Barzin, Samira; Rentschler, Jun; O’Clery, Neave; Avner, Paolo
    Globally, both people and economic activity are increasingly concentrated in urban areas. Yet, for the vast majority of developing country cities, little is known about the granular spatial organization of such activity despite its key importance to policy and urban planning. This paper adapts a machine learning based algorithm to predict the spatial distribution of employment using input data from open access sources such as Open Street Map and Google Earth Engine. The algorithm is trained on 14 test cities, ranging from Buenos Aires in Argentina to Dakar in Senegal. A spatial adaptation of the random forest algorithm is used to predict within-city cells in the 14 test cities with extremely high accuracy (R-squared greater than 95 percent), and cells in out-of-sample “unseen” cities with high accuracy (mean R-squared of 63 percent). This approach uses open data to produce high resolution estimates of the distribution of urban employment for cities where such information does not exist, making evidence-based planning more accessible than ever before.
  • Publication
    Welfare Impacts of Rural Electrification: A Case Study from Bangladesh
    (2009-03-01) Barnes, Douglas F.; Khandker, Shahidur R.; Samad, Hussain A.
    Lack of access to electricity is one of the major impediments to growth and development of the rural economies in developing countries. That is why access to modern energy, in particular to electricity, has been one of the priority themes of the World Bank and other development organizations. Using a cross-sectional survey conducted in 2005 of some 20,000 households in rural Bangladesh, this paper studies the welfare impacts of households' grid connectivity. Based on rigorous econometric estimation techniques, this study finds that grid electrification has significant positive impacts on households' income, expenditure, and educational outcomes. For example, the gain in total income due to electrification can be as much as 30 percent and as low as 9 percent. Benefits go up steadily as household exposure to grid electrification (measured by duration) increases and eventually reach a plateau. This paper also finds that rich households benefit more from electrification than poor households. Finally, estimates also show that income benefits of electrification on an average exceed cost by a wide margin.
  • Publication
    World Development Report 2021
    (Washington, DC: World Bank, 2021-03-24) World Bank
    Today’s unprecedented growth of data and their ubiquity in our lives are signs that the data revolution is transforming the world. And yet much of the value of data remains untapped. Data collected for one purpose have the potential to generate economic and social value in applications far beyond those originally anticipated. But many barriers stand in the way, ranging from misaligned incentives and incompatible data systems to a fundamental lack of trust. World Development Report 2021: Data for Better Lives explores the tremendous potential of the changing data landscape to improve the lives of poor people, while also acknowledging its potential to open back doors that can harm individuals, businesses, and societies. To address this tension between the helpful and harmful potential of data, this Report calls for a new social contract that enables the use and reuse of data to create economic and social value, ensures equitable access to that value, and fosters trust that data will not be misused in harmful ways. This Report begins by assessing how better use and reuse of data can enhance the design of public policies, programs, and service delivery, as well as improve market efficiency and job creation through private sector growth. Because better data governance is key to realizing this value, the Report then looks at how infrastructure policy, data regulation, economic policies, and institutional capabilities enable the sharing of data for their economic and social benefits, while safeguarding against harmful outcomes. The Report concludes by pulling together the pieces and offering an aspirational vision of an integrated national data system that would deliver on the promise of producing high-quality data and making them accessible in a way that promotes their safe use and reuse. By examining these opportunities and challenges, the Report shows how data can benefit the lives of all people, but particularly poor people in low- and middle-income countries.