Policy Research Working Paper 9723

Is Investment in Preprimary Education Too Low? Lessons from (Quasi) Experimental Evidence across Countries

Alaka Holla, Magdalena Bendini, Lelys Dinarte, Iva Trako

Development Research Group, Education Global Practice & Development Impact Evaluation Group

June 2021

Abstract

Many studies have estimated high rates of return for preprimary education provided to children between the ages of 3 and 6, yet coverage is not universal in high-income countries and is very low in low- and middle-income countries. This study uses a novel dataset of impact estimates from 55 (quasi-) experimental studies conducted around the world and meta-regression methods to investigate whether this preprimary investment is suboptimal. Average effect sizes suggest strong demand for preprimary services when offered and significant improvements in children’s cognitive (0.15 sd) and executive functions, social-emotional learning, and behavior (0.12 sd) during the preprimary period, with no significant differences between high and low & middle income countries. Estimates from a more limited set of longitudinal studies indicate persistence of advantages of 0.07 sd in each type of skills beyond the preprimary period, suggesting that investments in preprimary education can make primary instruction more effective. In studies that report separate effects for populations that vary in socio-economic status, disadvantaged children benefit significantly more on average from preprimary interventions. Lastly, benefit-to-cost ratios estimated for a subset of studies conducted in low- and middle-income countries range from 1.7 to 103.5. Taken together, these results imply high returns and room for improvements in efficiency from reallocating the marginal dollar in existing budgets toward preprimary education.

This paper is a product of the Development Research Group, the Education Global Practice, and the Development Impact Evaluation Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at aholla@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Is investment in preprimary education too low? Lessons from (quasi) experimental evidence across countries*

Alaka Holla† Magdalena Bendini‡ Lelys Dinarte§ Iva Trako**

Keywords: Preprimary education, meta-regression, cognitive and non-cognitive outcomes

JEL Classification: I20, I26, I28

* We would like to thank Kathleen Beegle, Arianna Legovini, Emma Näslund-Hadley, Dana McCoy, Katherine King, Amer Hasan, Florencia Lopez Boo, and Hirokazu Yoshikawa for their valuable comments and suggestions. We are also grateful to Steffanny Romero Esteban and Yilin Pan for their assistance in our robust meta-regression and benefit-to-cost calculations. This research benefitted from funding from the World Bank’s Strategic Impact Evaluation Fund.
† Program Manager, Education Global Practice, The World Bank. Email: aholla@worldbank.org
‡ Senior Economist, Education Global Practice, The World Bank. Email: mbendini@worldbank.org
§ Economist, Development Research Group, The World Bank. Email: ldinartediaz@worldbank.org
** Economist, Development Impact Evaluation Group (DIME), The World Bank. Email: itrako@worldbank.org

1. Introduction

Studies in high-income countries have shown that preprimary education provided to children between the ages of 3 and 6 has high returns, both in small-scale interventions like the Perry Preschool Project (Heckman et al., 2010) and in larger-scale government programs like childcare expansions in Norway (Havnes & Mogstad, 2011) and Head Start in the United States (Carneiro & Ginja, 2014; Kline & Walters, 2016; Bailey et al., 2020b). In a comparative welfare analysis of 133 policies in the United States, Hendren and Sprung-Keyser (2020) find that investments in health and education for low-income children pay for themselves in the long run. More recently, evidence from low-income countries demonstrates sizeable improvements in skills and subsequent educational attainment following preschool interventions (Berlinski et al., 2008; Martinez et al., 2017; Ganimian et al., 2021; Dean & Jayachandran, 2020). Nevertheless, unlike primary education, coverage of early childhood education is not universal in high-income countries and is below 20 percent in low-income countries.1 Is this investment in preprimary education suboptimal?
If demand for preprimary education is low (Bouguen et al., 2018), if offered services do not improve children’s skills relative to their counterfactual situations (Baker et al., 2008; Cascio & Schanzenbach, 2013; Bouguen et al., 2016; Bernal et al., 2019), or if early primary grades are an adequate substitute for preprimary education, then there may be little return to expanding preprimary investment, whether through budget increases or reallocations within existing budgets. On the other hand, if there is demand for early childhood education services and these services improve skills that translate into earnings increases and other benefits for society, then the returns to an increase in preprimary investment may be positive. If these skill improvements during the preprimary years also make subsequent education more effective in building children’s human capital, then it may even be cost-effective to direct the marginal dollar of existing budgets toward preprimary education. Similarly, if certain subpopulations, such as children from lower socioeconomic backgrounds, benefit more from increases in preprimary investments (Johnson & Jackson, 2019; Hendren & Sprung-Keyser, 2020), then it may be more cost-effective to target these populations for such investments.

1 Early childhood education typically refers to services that provide care or education to children between age three and age six.

This study uses the entire body of experimental and quasi-experimental evidence focused on preprimary education to evaluate potential returns to increasing or redirecting investment toward preprimary education.
We focus our analysis on preprimary programs, defined as those that provide group-based childcare in a center setting with a developmental or educational focus for children between the ages of 3 and 6 years.2 Based on this definition, we construct a dataset of 1,017 effect sizes from 55 studies of preprimary interventions conducted across 19 countries to test whether, on average: (i) there is demand for preprimary education; (ii) preprimary interventions (either expansions in coverage or improvements to the quality of existing services) improve children’s skills during the preprimary period; (iii) skill advantages persist beyond the preprimary period; and (iv) preprimary interventions generate higher benefits for children of lower socioeconomic status. The interventions in these studies include both small-scale pilots implemented by non-governmental organizations (NGOs) and nationally scaled government programs. For a subset of studies conducted in low- and middle-income countries, we also estimate earnings returns to improvements in cognitive and social-emotional skills during the preprimary period to calculate benefit-to-cost ratios of preprimary interventions, using estimates from the literature on the extent to which skill improvements during the preprimary period translate into earnings increases.

To exploit all the information in studies that estimate effects using multiple outcome measures with varying levels of precision, we standardize the outcomes we extracted from studies (Lipsey and Wilson, 2001) and use meta-regression methods to aggregate evidence across studies and estimate an average effect size for each class of outcomes, effectively weighting study-specific average treatment effects by their precision.
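As a rough illustration of what standardizing outcomes and weighting by precision involve, the Python sketch below computes a standardized mean difference (the treatment-control difference divided by the pooled standard deviation) together with its usual large-sample sampling variance, one of the conversions collected in Lipsey and Wilson (2001), and then takes an inverse-variance weighted average. Which conversion applies to a given study (for example, dividing a regression coefficient by the control-group standard deviation instead) is an assumption here; the function names and inputs are hypothetical and are not the authors’ code.

```python
import math


def standardized_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (d, var_d) for a treatment-control contrast.

    d is the mean difference divided by the pooled standard deviation;
    var_d is the usual large-sample sampling variance of d.
    """
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, var_d


def inverse_variance_mean(effects, variances):
    """Precision-weighted average: each effect is weighted by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


# Hypothetical literacy test: treatment mean 10.5, control mean 10.0,
# SD 2.0 in both arms, 100 children per arm.
d, var_d = standardized_mean_difference(10.5, 10.0, 2.0, 2.0, 100, 100)
print(round(d, 3), round(var_d, 5))  # 0.25 0.02016
```

Expressing every outcome in standard-deviation units is what allows test scores, behavior ratings, and enrollment measures from different studies to be pooled, while the inverse-variance weights give more precise estimates more influence on the average.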
For this aggregation, we use robust variance meta-regression (Hedges et al., 2010; Tanner-Smith & Tipton, 2014; Tanner-Smith et al., 2016; Tipton, 2015), which adjusts the standard error of the aggregate average effect size for potential dependence among outcomes coming from the same study. Although we rely on our own judgment to screen studies for inclusion in our sample, meta-regression offers a way to assess whether an intervention (or class of interventions) works across contexts that does not require the analyst to engage in subjective weighting of positive, negative, and statistically insignificant effects, as is done in more narrative reviews. It also does not require the analyst to rely on heuristics like “vote counting,” which tallies the number of positive, negative, statistically significant, and statistically insignificant results but ignores information contained in the size and precision of estimated effects.

2 In this sense, we include programs that provide formal preprimary, community-based preprimary, kindergarten, pre-k, and daycare with an educational component, as well as programs that make preprimary education affordable (e.g., subsidies) or increase its quality (e.g., teacher training or provision of materials). This definition excludes programs that provide services outside the preprimary period.

The average effect sizes we estimate for outcomes related to school participation and progression, children’s skills during the preprimary years and beyond, and adult outcomes suggest high returns to increasing investment in preprimary education and to targeting disadvantaged populations for investment. When experimentally offered access to preprimary education, children are on average 1.4 standard deviations (sd) more likely to participate in the preprimary program than their control group counterparts, suggesting strong demand for services.
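The robust variance meta-regression cited above can be sketched, for the intercept-only case, as follows. This minimal Python version assumes the correlated-effects weights described by Hedges et al. (2010), treats the between-study variance `tau2` as a known input instead of estimating it, and omits the small-sample corrections of Tipton (2015); the function name and data are hypothetical. The key feature is the sandwich variance, which sums weighted residuals within each study before squaring, so that outcomes from the same study may be arbitrarily correlated.

```python
import math
from collections import defaultdict


def rve_intercept_only(records, tau2=0.0):
    """Intercept-only robust variance meta-regression (no small-sample correction).

    records: iterable of (study_id, effect_size, sampling_variance).
    tau2: between-study variance, treated here as a known input.
    Returns (pooled_effect, robust_standard_error).
    """
    by_study = defaultdict(list)
    for study, effect, var in records:
        by_study[study].append((effect, var))

    # Correlated-effects weights: each estimate from study j gets
    # 1 / (k_j * (mean variance_j + tau2)), so a study's total weight does
    # not grow with the number of outcomes it reports.
    weights = {}
    for study, rows in by_study.items():
        k = len(rows)
        mean_var = sum(v for _, v in rows) / k
        weights[study] = 1.0 / (k * (mean_var + tau2))

    total_w = sum(weights[s] * len(rows) for s, rows in by_study.items())
    beta = sum(weights[s] * e for s, rows in by_study.items() for e, _ in rows) / total_w

    # Sandwich ("meat") term: weighted residuals are summed within study
    # before squaring, which allows any correlation inside a study.
    meat = 0.0
    for s, rows in by_study.items():
        study_score = weights[s] * sum(e - beta for e, _ in rows)
        meat += study_score**2
    return beta, math.sqrt(meat / total_w**2)


records = [
    ("study_A", 0.20, 0.01),  # two outcomes from the same hypothetical study
    ("study_A", 0.40, 0.01),
    ("study_B", 0.10, 0.02),
]
beta, se = rve_intercept_only(records)
print(round(beta, 4), round(se, 4))  # 0.2333 0.0629
```

With only two studies, as in this toy input, the uncorrected sandwich estimator understates uncertainty, which is the motivation for the small-sample corrections in Tipton (2015). In the general version, a moderator such as an indicator for low socioeconomic status would enter as an additional column of a design matrix rather than as a separate pooled mean.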
During the preprimary period, children’s cognitive skills (in language, literacy, and math) improve by an average of 0.15 sd, while their executive functions, social-emotional skills, and behavior (sometimes referred to as “non-cognitive skills”) improve by an average of 0.12 sd. These effects are similar (and statistically indistinguishable) across high-income and low- and middle-income countries. Only a subset of studies has longitudinal designs, but this more limited sample shows persistence of these skill advantages beyond the preprimary period, with significant average gains of 0.07 sd in each type of skill. These persistent advantages suggest that preprimary education improves school readiness and helps make instruction after the preprimary period more effective. An even smaller subset of studies, all in high-income contexts, tracks children into adulthood, and while some individual studies do find large and statistically significant advantages in both adult health and labor market outcomes, our average effect is small and insignificant.

To estimate longer-term returns for our sample of studies from low- and middle-income countries, we combine study-specific estimates of impacts on skills, data on average wages and real wage growth, and estimates from the literature on the extent to which improvements in childhood skills translate into earnings. Our most conservative estimates suggest benefit-to-cost ratios above 1, ranging from 1.7 to 14.2. Less conservative estimates, which assume a discount rate and a return to cognitive skills for low- and middle-income countries similar to the values typically used in economic evaluations of programs in high-income countries, suggest benefit-to-cost ratios ranging from 3.5 to 103.5.
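The benefit-to-cost calculation described above combines a skill impact, a return to skills in earnings, average wages, real wage growth, and a discount rate. A hypothetical Python sketch of that arithmetic follows; the functional form (a proportional earnings gain over a fixed working life, discounted to the year of the intervention) and every parameter value are assumptions for illustration, not the authors’ specification or inputs. Because the conservative and less conservative scenarios differ mainly in the discount rate and the assumed return to skills, varying `discount_rate` and `earnings_gain_per_sd` shows how those choices move the ratio. Like the estimates in the text, it counts only the child’s own earnings.

```python
def benefit_cost_ratio(skill_gain_sd, earnings_gain_per_sd, annual_wage,
                       real_wage_growth, discount_rate, years_to_work_start,
                       working_years, cost_per_child):
    """Present value of projected earnings gains divided by the program cost.

    Hypothetical assumptions: annual earnings rise by the proportion
    skill_gain_sd * earnings_gain_per_sd of the average wage; the average
    wage grows at real_wage_growth per year from the year of the
    intervention (t = 0); the gain accrues for working_years starting
    years_to_work_start years after the intervention; the full cost is
    paid at t = 0.
    """
    present_value = 0.0
    for t in range(years_to_work_start, years_to_work_start + working_years):
        wage_t = annual_wage * (1 + real_wage_growth) ** t
        gain_t = wage_t * earnings_gain_per_sd * skill_gain_sd
        present_value += gain_t / (1 + discount_rate) ** t
    return present_value / cost_per_child


# Hypothetical inputs (not from the paper): a 0.15 sd skill gain, an 8
# percent earnings return per sd, an average annual wage of 2,000, 2 percent
# real wage growth, work starting 15 years after the intervention for 40
# years, and a cost of 300 per child.
for rate in (0.03, 0.06):
    ratio = benefit_cost_ratio(0.15, 0.08, 2000.0, 0.02, rate, 15, 40, 300.0)
    print(f"discount rate {rate:.0%}: benefit-to-cost ratio {ratio:.1f}")
```

Because earnings gains arrive decades after the cost is paid, the ratio is highly sensitive to the discount rate, which is one reason the two sets of estimates in the text differ so widely.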
Both ranges of benefit-to-cost ratios can be considered lower bounds for the returns to preprimary investments, as they focus solely on benefits related to children’s future earnings and do not take into account any “fiscal externalities” in the sense of Hendren and Sprung-Keyser (2020), such as the increases in tax revenues and decreases in transfer payments (public assistance) that might accompany increases in children’s lifetime earnings. Our estimates also ignore any contemporaneous benefits of preprimary education that might arise from increases in maternal labor force participation (Evans et al., 2021).

While very few studies permit a comparison of effects across subpopulations, the limited set of studies that report disaggregated effects is sufficient to detect statistically meaningful differences between populations of high and low socioeconomic status. Children from more disadvantaged backgrounds who are exposed to the preprimary interventions evaluated in the sample show significantly larger responses in school participation and progression, cognitive and “non-cognitive” skills, and health.

Our results are consistent with other reviews of subsets of the preprimary experimental and quasi-experimental literature, which use narrative, vote-tallying, and meta-analytic techniques to aggregate evidence and find impacts on cognitive skills during the preprimary period (Nores & Barnett, 2010; Duncan & Magnuson, 2013; Yoshikawa et al., 2013; McCoy et al., 2017; van Huizen & Plantenga, 2018) and larger impacts for disadvantaged children (Yoshikawa et al., 2013; van Huizen & Plantenga, 2018). We do find persistence of skill advantages after the preprimary period, in contrast to other studies and reviews that document fade-out (Yoshikawa et al., 2013; Duncan & Magnuson, 2013), although it is possible that individual studies (and therefore more narrative or vote-tallying reviews) lack sufficient statistical power to detect effects as small as 0.07 sd.
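The power argument can be made concrete with the standard sample-size formula for a two-arm comparison of a standardized outcome: the total sample equals the squared sum of the two normal quantiles, divided by the squared effect size and by p(1 - p), where p is the treated share. The Python sketch below assumes individual randomization, a two-sided 5 percent test, and 80 percent power; the `design_effect` parameter is a hypothetical stand-in for the inflation that clustered assignment would require. Under these assumptions, detecting 0.07 sd requires roughly 6,400 children in total, compared with about 1,400 for 0.15 sd.

```python
import math
from statistics import NormalDist


def required_total_sample(mde_sd, alpha=0.05, power=0.80,
                          treated_share=0.5, design_effect=1.0):
    """Total sample needed to detect a standardized effect of mde_sd.

    Assumes a two-sided test, two arms, an outcome with unit variance, and
    individual randomization; design_effect > 1 is a rough inflation
    factor for clustered designs.
    """
    z = NormalDist().inv_cdf
    multiplier = z(1 - alpha / 2) + z(power)
    n = (multiplier / mde_sd) ** 2 / (treated_share * (1 - treated_share))
    return math.ceil(n * design_effect)


for effect in (0.15, 0.12, 0.07):
    print(effect, required_total_sample(effect))
```

Because the required sample scales with the inverse square of the effect size, halving the detectable effect roughly quadruples the sample, and clustered designs need more still.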
Persistence of an advantage in executive function and social-emotional skills, however, is consistent with the hypothesized mechanism behind “sleeper effects,” or gains in health and earnings observed in adulthood even when test score advantages dissipate or disappear after the preprimary period (Heckman et al., 2010; Bailey et al., 2020a; Cascio & Schanzenbach, 2014).

Our results, along with current coverage rates, suggest that levels of preprimary investment are suboptimal, particularly in low- and middle-income countries. When offered services, families do elect to send their children to preprimary education centers. The significant average effects for cognitive and social-emotional skills show that preprimary education does provide better conditions for the development of skills than children’s counterfactual options of remaining at home or in informal care settings. This improvement in skills could help counter the inequities in skill development that children bring with them when they enter primary school (Fernald et al., 2011; Naudeau et al., 2011; Duncan and Magnuson, 2013; Schady et al., 2014). The high benefit-to-cost ratios indicate high returns to preprimary investments, and the persistence of impacts beyond the preprimary period suggests that investments in preschool can help improve the effectiveness of primary education (Johnson & Jackson, 2019), possibly by enhancing executive functions and social-emotional skills that help children learn (Cunha and Heckman, 2007) or by reducing heterogeneity in the classroom and thus making it easier to teach foundational skills (Banerjee et al., 2007; Dupas et al., 2011).

The rest of this paper is organized as follows. The next section (Section 2) provides background, describing skill development during the preprimary period, theories on when preprimary education should improve skills and well-being, and preprimary coverage and quality in low- and middle-income countries.
Section 3 presents a framework for using evidence from the (quasi) experimental literature to assess whether preprimary investments are too low. Section 4 describes the methods we used to search for studies, screen them for inclusion, and extract data on effect sizes. This section also describes the studies, interventions, and outcomes included in the analysis, as well as the methods we use to aggregate evidence across studies and to test our research questions related to the demand for preprimary education, skill improvements during the preprimary years, the persistence of any advantages beyond the preprimary years, and the returns to preprimary investments. Section 5 presents the main results and those by socioeconomic status. Section 6 discusses the implications of these results for preprimary investments and concludes. Various appendixes provide more detailed information on our methods and studies and present study-specific estimates of the standardized average treatment effects that we use in our main analyses.

2. Background and context

Learning during the preprimary years

Research in neuroscience, psychology, and developmental cognitive science has established that, due to higher brain malleability at younger ages, children learn faster in their first six years of life than at later stages of development (Shonkoff et al., 2000; Knudsen, 2004). They rapidly acquire skills in areas of knowledge that include numbers (DeWind, 2019) and language (Wang et al., 2020; Yuan & Fisher, 2009), as well as social interactions (Hamlin et al., 2013; Tamis-LeMonda et al., 2008; Tomasello et al., 2005). Before primary school, children also develop skills, including executive functions (inhibition, working memory, and cognitive flexibility), that allow them to “learn to learn,” manage emotions, and relate to others (Diamond, 2013; Zelazo et al., 2003).
As a result, most research on the development of children during the preprimary period measures skills in these “domains” of development (Fernald et al., 2017). Evidence from neuroscience and developmental psychology also demonstrates that exposure to learning opportunities promotes children’s innate learning abilities (Jara-Ettinger et al., 2016; Wang et al., 2020). Learning appears to be sequential and cumulative, as early, more basic skills provide the basis for mastering later, more complex skills (Knudsen et al., 2006; Spelke & Shutts, forthcoming). Learning is also interdependent: learning in specific domains promotes learning in other domains and results in complex skills such as literacy (Dehaene, 2010; Bailey et al., 2020a). Cunha and Heckman (2007) argue that, due to the cumulative and interdependent nature of learning, establishing sound foundational skills early in life can lead to a virtuous cycle of skill acquisition as children develop.

Whether preprimary education facilitates this process in practice, however, depends on the quality of the learning environment it provides and how that environment compares with what children would have experienced without formal preprimary education, both during and after the preprimary period. If children receive more cognitive and psychosocial stimulation at home or in informal care settings (for example, the homes of relatives or friends) than in formal preprimary education, then an expansion of formal services is unlikely to improve average skill development and may even set children back. That is what researchers suggest happened when the Canadian province of Quebec extended subsidized childcare coverage to less needy families (Baker et al., 2008; Baker et al., 2015).
If, on the other hand, the learning environment offered through preprimary education services provides more stimulation to children than their counterfactual situation, then coverage expansions or quality improvements will likely enhance skill development (Cascio, 2015; Cascio & Schanzenbach, 2014). Some preprimary programs even try to improve the home learning environment directly by counseling parents on parent-child interactions conducive to child development.

Similarly, if available preprimary education substitutes for other services that children would otherwise access, then investments in preprimary education may not translate into improvements in children’s skills. For example, in Denmark, children who had already benefited as infants and toddlers from a nurse home visiting program exhibited smaller gains on an index of human capital from an expansion in preprimary education than children who did not have the home visiting program (Rossin-Slater & Wüst, 2020). Likewise, a preprimary school construction program in Cambodia induced parents to switch their underage children out of formal primary school and into the new community-based preprimary schools, which decreased their cognitive development compared with children without access to the new schools (Bouguen et al., 2018).3 The federal Head Start program in the United States also induced switching out of private preprimary schools (Kline & Walters, 2016). Less stimulating counterfactual environments and lower access to substitute services may partially explain why children from more disadvantaged backgrounds tend to benefit more from preprimary education (Cascio and Schanzenbach, 2014; Currie, 2001).
Preprimary education in low- and middle-income countries

Preprimary enrollment has increased considerably around the world over the last two decades, from an average of 30 percent of children in 2000 to 50 percent in 2018.4 Access to preprimary education in low- and middle-income countries is still low, with only 19 percent of preprimary-aged children in low-income countries enrolled (UIS, 2018), a coverage rate less than half of what was observed in high-income countries fifty years ago. Beyond these averages, there is substantial variation in preprimary enrollment across and within countries that is associated with socioeconomic status, with the largest differences in enrollment in the poorest countries (Figure 1a). This unequal access to preprimary education can exacerbate learning inequalities, as children in families from lower socioeconomic groups also tend to have limited learning opportunities and stimulation at home and in their communities (McCoy et al., 2018; Figure 1b).

3 In contrast to the practice of “red-shirting” in higher-income countries, in many low-income countries parents try to enroll their children in primary school before the children are age eligible.
4 Data on provision are limited but suggest that around 38 percent of countries offer some free preprimary education, with most of these countries offering between one and three years of service. There is also wide variation in private provision across regions, ranging from 7 percent in Europe and Central Asia to 72 percent in the Middle East and North Africa (UIS, 2018).

Domestic financing for preprimary education has increased over the past decade, amounting to 6.6 percent of domestic education budgets globally. Low-income countries allocate substantially less, with less than 2 percent of their education budgets going toward preprimary education (UIS, 2018).
In these countries, standards and quality assurance systems are often nonexistent or under-resourced, and learning spaces often do not meet minimum safety and sanitation conditions (World Bank, 2013; UNICEF, 2019). Due to an acute shortage of preprimary teachers and staff, child-teacher ratios remain high even when enrollment rates are low (UNICEF, 2019). The preprimary education workforce also has lower remuneration and higher attrition rates than its primary education counterpart (UIS, 2018). Thus, on the one hand, the low levels of stimulation at home and very limited access to preprimary education in low-income countries suggest that offered preprimary services need only clear a low bar in terms of cognitive stimulation and psychosocial support to improve on children’s counterfactual situation. That is, the counterfactual situation in these contexts is not very conducive to children’s skill development. On the other hand, countries with limited state capacity to provide and regulate school quality may struggle to meet even this low bar.

3. Is investment in preprimary education too low?

While the large difference in preprimary coverage between high- and low-income countries suggests that preprimary investments in low-income countries might be too low, these countries also spend much less on education overall. Their ministries or departments focused on education have smaller budgets, and lower coverage of preprimary education could result from an optimal allocation of resources across sectors of the economy or subsectors within the education sector.

To fix ideas, let us assume the perspective of the decision maker who controls the budget in the education sector, although our arguments would also hold were we to consider a more general social planner with control over a country’s entire budget (for example, a ministry of finance). Our decision maker must choose between spending on preprimary education, $S_P$, and spending on any other education, $S_O$. Let $S$ denote total spending in education, such that $S_P + S_O = S$, and let $S_{max}$ represent the total budget so that $S \le S_{max}$. A function $V(\cdot)$ maps spending on education into the net present value of the benefits of education, which includes not only private returns to individuals but also “fiscal externalities” in the sense of Hendren and Sprung-Keyser (2020), whereby increases in children’s earnings as adults also increase tax revenues collected by the government and reduce transfers made through the public assistance system.5

If $S < S_{max}$ and there are no competing demands, it might be sufficient to show in a cost-benefit analysis that the benefits of preprimary education outweigh its costs, or that there is a positive return to increasing preprimary spending:

$$\frac{\partial V(S)}{\partial S_P} > 0 \tag{1}$$

If, however, $S = S_{max}$ (the budget constraint is binding) or there are competing demands for resources, then showing a positive return is insufficient to determine whether investment in preprimary education is suboptimal. Instead, both Conditions (1) and (2) would have to be satisfied. That is, the marginal returns to investing in preprimary education should also be larger than the marginal returns to other investments in education:

$$\frac{\partial V(S)}{\partial S_P} > \frac{\partial V(S)}{\partial S_O} \tag{2}$$

Thus, to assess whether spending is suboptimal, we need to demonstrate that at the margin (i) there is a positive return to investing in preprimary education, and (ii) investing in preprimary education is more cost-effective than investing in basic education at current levels of spending. That is, at the margin, there would be gains from moving some current spending to preprimary education. In this simplified framework, whether these conditions are satisfied will depend on parameters that could differ across countries: the levels of $S_P$, $S_O$, and $S_{max}$, as well as the function $V(\cdot)$
that determines the way spending translates into benefits, which itself may be different for preprimary and other levels of education.6 In this simple model, the decision maker also does not care about the distribution of benefits. Given that access to preprimary education in low-income countries differs across households (see, for example, the enrollment gaps between richer and poorer households in Figure 1a), it is also possible that a decision maker would put more or less weight on an increase in benefits accruing to one subpopulation than to another from a change in public spending.

5 To the extent that increases in education also improve health later in life (as in Carneiro and Ginja, 2014, and Brotman et al., 2016) or the intergenerational transmission of human capital (as in Rossin-Slater & Wüst, 2020), the function V(S) captures these benefits as well.
6 For example, diminishing marginal returns to spending may set in earlier for one type of education.

Data from experimental and quasi-experimental studies that estimate the benefits of preprimary interventions can shed light on whether in practice Conditions (1) and (2) are satisfied. For a preprimary intervention to generate a positive return, there must first be demand for preprimary education: when offered early childhood education services, households take them up. Evaluations should find an increase in participation in preprimary programs when these services are expanded (through, for example, school construction or subsidies). Second, the services that are offered should improve children’s skills relative to their counterfactual situations. Evaluations should find positive and significant effects on skills resulting from either expansions in preprimary education or improvements in the quality of existing services. Finally, any improvements in skills should translate into benefits with a net present value that outweighs their costs.
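To make Conditions (1) and (2) concrete, the toy Python sketch below uses hypothetical, additively separable benefit functions with diminishing returns (the framework’s V(·) is general and need not take this form), a binding budget of 10 split as 1 unit of preprimary spending and 9 units of other education spending, and finite differences for the marginal returns. None of these values come from the paper. When the marginal return to preprimary spending exceeds the marginal return to other education, moving a small amount of the budget toward preprimary education raises total benefits without changing total spending.

```python
import math


def marginal_return(v, s, h=1e-6):
    """Central finite-difference approximation of dV/ds at spending level s."""
    return (v(s + h) - v(s - h)) / (2 * h)


# Hypothetical benefit functions with diminishing returns; the paper's V(.)
# is general and need not be additively separable or logarithmic.
def v_pre(s):
    return 3.0 * math.log1p(s)


def v_oth(s):
    return 10.0 * math.log1p(s)


s_pre, s_oth = 1.0, 9.0  # hypothetical binding budget: s_pre + s_oth = 10

mr_pre = marginal_return(v_pre, s_pre)
mr_oth = marginal_return(v_oth, s_oth)
condition_1 = mr_pre > 0       # positive return to preprimary spending
condition_2 = mr_pre > mr_oth  # higher marginal return than other education

# Move a small amount from other education to preprimary, keeping the
# total fixed; total benefits rise when Condition (2) holds.
delta = 0.1
gain = (v_pre(s_pre + delta) - v_pre(s_pre)) + (v_oth(s_oth - delta) - v_oth(s_oth))
print(round(mr_pre, 3), round(mr_oth, 3), condition_1, condition_2, gain > 0)
# 1.5 1.0 True True True
```

Condition (1) alone only says the extra spending pays for itself; it is the comparison in Condition (2) that determines whether reallocating a fixed budget improves outcomes.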
Only a limited number of studies, globally or within individual countries, have tracked their preprimary cohorts into adulthood and estimated the costs of the evaluated preprimary intervention. Most evaluations, however, do estimate impacts on skills measured during the preprimary years, and a few have carefully measured the costs of their interventions (Spier et al., 2020; Wolf et al., 2019a). Moreover, from the few longitudinal studies that are available, we have some empirical evidence on how improvements in skills during a child’s early years translate into benefits in adulthood (Heckman et al., 2010; Kline & Walters, 2016; Galasso & Wagstaff, 2019).

Empirical tests of Condition (2) pose more of a challenge. To our knowledge, there are no experiments in which spending is reallocated away from basic education to preprimary education. The increase in preprimary spending evaluated in natural experiments (as in Johnson and Jackson, 2019, for example) does not necessarily come at the expense of other educational spending. Therefore, we must explore the potential cost-effectiveness of preprimary education more indirectly. Cunha and Heckman (2007) posit that investments during the preprimary years may exhibit dynamic complementarity, whereby early improvements in skills increase the productivity of investments in primary school and lead to greater skill development in later years. Some evaluations track children into primary and secondary school. If preprimary education were a substitute for early primary education, then evaluations that examine the impacts of preprimary education in the primary years should find that any skill improvements observed in the preprimary years have dissipated in later years, as comparison children catch up and acquire the skills that treatment children gained during the preprimary period.
If, on the other hand, preprimary and primary education were complements, and improvements in children’s school readiness during the preprimary years helped them learn more in primary school, then evaluations should find a persistent skill advantage among children benefiting from more (or improved) preprimary education.

Because children from disadvantaged backgrounds are less likely to attend preprimary education and because their home environments may be less conducive to skill development than a school setting would be, these children may benefit more from expansions or improvements in preprimary education. Some evaluations separately estimate impacts of preprimary interventions for populations with varying socioeconomic status. If impacts for more disadvantaged children tend to be higher, then targeting these groups for preprimary education spending may be even more cost-effective than untargeted preprimary investments.

Therefore, to test whether Conditions (1) and (2) hold, we can use estimated impacts from experimental and quasi-experimental studies of preprimary interventions to answer the following questions:

1. Is there demand for preprimary programs? When services are offered, do households take them up?
2. Do available preprimary education services improve children’s skills during the preprimary period? If so, should we expect a positive return on this increase in skills?
3. Are preprimary and primary education complements? Does any skill advantage among children who attended preprimary programs persist into basic education?
4. Do disadvantaged children benefit more from preprimary programs?

4. Methods

To answer our main research questions, we conducted a systematic review of the experimental and quasi-experimental literature focused on preprimary education. We then used meta-regression methods to aggregate evidence across studies.
This section describes our process for selecting studies to use in our analyses and for extracting our “raw data” from the studies. We also lay out our empirical specifications and describe how we standardize extracted coefficients across studies and map study-specific outcomes into broader categories that we can use in meta-regressions.

4.1 Data

To identify studies to review, we followed standard practice for systematic reviews and proceeded in three iterative stages: (i) search and application of inclusion criteria, (ii) screening (two rounds), and (iii) data extraction. Figure 2 summarizes the entire process and indicates the number of studies that remained in the review sample after each stage.

4.1.1 Search strategy and selection criteria

To start, we conducted a systematic search for studies using relevant keywords and terms in several search engines and databases, as well as known portfolios of experimental and quasi-experimental studies, such as the portfolio of the World Bank’s Strategic Impact Evaluation Fund.7 Panel A of Table 1 lists the databases and specific search terms we used. In this phase, we looked for studies that stated an aim to estimate the effects of preprimary interventions. We did not restrict our search to studies conducted in low- and middle-income countries (as defined by the country-income groupings of the World Bank), because evidence from these countries is limited. Moreover, evidence from high-income countries may offer insight for low- and middle-income countries, since many of the programs evaluated in high-income settings target disadvantaged populations, and achieving coverage of high-quality preprimary education is challenging even in these contexts.
Moreover, if we see convergence of results across country-income settings for outcomes consistently collected across settings (for example, skills measured during the preprimary years and beyond), then we might be more comfortable inferring that estimates only available in high-income settings (for example, adult outcomes) generalize to low- and middle-income settings as well.

7 The Strategic Impact Evaluation Fund supports research that estimates the effects of programs and policies to improve education, health, access to quality water and sanitation, and early childhood development in low- and middle-income countries. See the fund’s website for more information on the fund and a list of evaluations in its portfolio.

We also imposed no restrictions on the timing of publication, although the keywords related to experimental and quasi-experimental evaluation designs effectively limit our attention to studies published after 2005. In addition to the database searches, we found studies in two other ways. We searched the bibliographies of papers found through the database search, and we contacted experts and researchers who have frequently published on the impacts of preprimary programs in peer-reviewed journals and asked them to suggest other relevant published studies to incorporate in our review. This search process yielded a total of 270 studies: 183 from database searches and 87 from other search methods. These studies included systematic reviews, experimental and quasi-experimental studies, and other studies that did not employ experimental or quasi-experimental methods (for example, cost analyses and qualitative reports).

Panel B of Table 1 describes the inclusion criteria we used to identify studies for review. We sought to include experimental and quasi-experimental studies of preprimary programs that provided group-based childcare in a center setting with a developmental or educational focus for children between the ages of 3 and 6 years.
This definition of preprimary education encompasses the provision of formal preprimary, community-based preprimary, kindergarten, pre-k, and daycare with an educational component, as well as interventions that make preprimary education more affordable (such as subsidies) or increase its quality (such as teacher training, the introduction of a new curriculum or pedagogical approach, and provision of materials). We did not include programs that also provided services outside the preprimary period, such as childcare during infancy or classroom quality improvements after kindergarten, as it would be impossible to distinguish the impacts of interventions during the preprimary years from those of interventions across a longer period. This restriction ruled out seminal studies of programs like the Abecedarian Project (Campbell et al., 2014; Conti et al., 2016) and Project STAR (Chetty et al., 2011; Krueger, 1999), as well as large-scale expansions in childcare that also covered infants (Baker et al., 2008; Baker et al., 2015; Bernal, 2019; Bernal and Ramirez, 2019).8

8 We did include a subset of effects from Baker et al. (2008), specifically those restricted to 4-year-olds in the post-period survey, who were unlikely to have had access to subsidized care before age 3 since the childcare expansion started with older children.

For our purposes, experimental (randomized controlled trials, or RCTs) or quasi-experimental designs had to isolate the causal impact of preprimary programs or policies on children’s outcomes. Children’s outcomes had to include a measure of school participation or progression, a measure of children’s skills or development, or a measure of the child’s learning environment, such as the behavior of teachers or caregivers. This measurement could take place either when children were still in preprimary education or after they had progressed to higher levels of education.
We also included outcomes observed when the beneficiaries of preprimary programs were adults. Included studies also had to be published as peer-reviewed journal articles or in a formal working paper series, such as the working papers of the National Bureau of Economic Research or the Policy Research Working Paper series of the World Bank. We included technical reports only if they included a suggested formal institutional citation.9

4.1.2 Screening

We then screened these studies to verify that they met our inclusion criteria, focusing on the credible estimation of causal impacts of preprimary interventions and on the inclusion of outcomes that captured preprimary coverage, child development, caregiver behavior, or any subsequent indicator that could proxy for a dimension of well-being in adulthood. Figure 2 describes the screening process. We first eliminated studies based solely on a review of their citations, abstracts, and introductions. Of the 270 studies identified in the initial search, we excluded 111 studies during this stage because (i) they were not published in peer-reviewed journals or did not appear as part of a working paper series; (ii) they were not experimental or quasi-experimental studies; (iii) they assessed early childhood interventions that targeted children outside the 3-to-6-year age range; (iv) they measured outcomes unrelated to learning and well-being for affected children at some point in their life cycle; (v) they were duplicates (for example, working-paper versions of published studies); or (vi) they had no relevance to our research questions. This first stage of screening left us with a total of 159 studies.

9 One implication of not including unpublished studies is the potential for publication bias, that is, a bias toward positive and significant results. On the other hand, Brodeur et al.
(2020) find that randomized controlled trials exhibit less of this bias than methods like instrumental variables and difference-in-differences, and more than 70 percent of the studies we eventually included were randomized controlled trials.

In the second stage of screening, at least two authors of this study reviewed each identified study, reading the full text and assessing features meant to proxy for study quality, such as the use of an evaluation design that would generate causal impacts (RCT, regression discontinuity design, difference-in-differences estimation, matching, or an instrumental variable strategy) and the presentation of information on issues that could compromise the evaluation design.10 Specifically, to be included in our final sample, studies needed to (i) isolate the impact of the program using some sort of comparison group; (ii) present sufficient evidence that the experimental groups in an RCT were balanced on a set of characteristics prior to the intervention; (iii) report sample attrition and compliance with treatment assignment in the case of RCTs; (iv) experience less than 30 percent attrition in either the treatment or comparison group between the initiation of treatment and measurement; (v) report the precision of the estimated effect size by providing standard errors, confidence intervals, t-statistics, standard deviations and sample sizes, or p-values; and (vi) use well-known or established outcome variables and tools to measure them. For issues like attrition and compliance, we could not exclude studies based on how authors addressed them, as approaches vary across disciplines.11 Some studies omitted key design and estimation details, such as balance checks and the exact precision of estimated impacts. When related analyses already published in earlier work provided this information, we kept the study in the review if it met our other criteria.
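The two quantitative screening rules above, criteria (iv) and (v), can be written as a mechanical filter. The sketch below is a minimal illustration only. The `StudyRecord` fields and function names are hypothetical and do not describe the authors’ screening template. The sketch reads criterion (iv) as requiring attrition below 30 percent in both the treatment and the comparison group. The judgment-based criteria, such as credible identification and established outcome tools, remain with the reviewers.

```python
from dataclasses import dataclass

# Criterion (iv): attrition must stay below 30 percent in each group.
MAX_ATTRITION = 0.30


@dataclass
class StudyRecord:
    """Hypothetical summary of one screened study (illustrative fields only)."""
    name: str
    attrition_treatment: float  # share of the treatment group lost before measurement
    attrition_control: float    # share of the comparison group lost before measurement
    has_standard_errors: bool = False
    has_confidence_intervals: bool = False
    has_t_statistics: bool = False
    has_p_values: bool = False
    has_sds_and_sample_sizes: bool = False


def passes_attrition(study: StudyRecord) -> bool:
    """Criterion (iv): neither group may lose 30 percent or more of its sample."""
    return max(study.attrition_treatment, study.attrition_control) < MAX_ATTRITION


def reports_precision(study: StudyRecord) -> bool:
    """Criterion (v): at least one usable measure of precision is reported."""
    return any([
        study.has_standard_errors,
        study.has_confidence_intervals,
        study.has_t_statistics,
        study.has_p_values,
        study.has_sds_and_sample_sizes,
    ])


def passes_quantitative_screen(study: StudyRecord) -> bool:
    """Both mechanical criteria must hold; the remaining criteria need reviewer judgment."""
    return passes_attrition(study) and reports_precision(study)
```

For example, a study with 10 and 25 percent attrition that reports p-values passes this filter. A study with 31 percent attrition in either arm fails it, whatever precision measures it reports.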
Taking all screening criteria into account, each reviewer assigned studies to one of two categories: those to include in the review and those to exclude. When reviewers’ ratings matched, a study was automatically either included or removed from consideration; all four authors discussed rating discrepancies to arrive at a conclusion. Of the 159 studies that underwent this more intense screening, we excluded 104 studies, largely because they failed to credibly identify causal impacts or to report sufficient information for us to calculate an effect size or its precision. Thus, this second phase of screening yielded a final sample of 55 studies that moved to the next phase of data extraction.

10 We did not include studies that used sibling fixed effects to isolate the impact of a program on children’s outcomes, which excluded studies of the Head Start program such as Garces et al. (2002) and Deming et al. (2009).

11 For example, papers published by economists typically document attrition, note whether it is differential across treatment and comparison groups, and address it by calculating bounds for average treatment effects using the methods of either Lee (2009) or Horowitz and Manski (2000). Papers written by developmental psychologists, however, tend to address sample attrition through multiple imputation methods and then calculate average treatment effects.

4.1.3 Data extraction

We extracted three levels of information from each study. Studies are defined as independent publications concerning preprimary programs. Interventions refer to the studies’ different experimental arms (in an RCT) or implied treatment and comparison groups (in a quasi-experimental study). For example, a study might provide one group of participants with a preprimary program, a second group of participants with the preprimary program and an accompanying program targeting parents, and a third group with no service.
During data extraction, we treated these as three different intervention groups or contrasts. Outcomes refer to the estimated coefficients corresponding to the average treatment effects we extracted for each outcome for each intervention. The resulting dataset includes 55 studies, 143 interventions or contrasts, and 1,017 outcomes.

4.1.3.1 Studies

Table 2 presents characteristics of the studies in our sample. Appendix Table 1 presents additional details of each study, including intervention components and evaluation design. Figure 3 maps the studies, showing coverage across 19 countries but a concentration of research in the United States. In Table 2, we see that 85 percent of the studies we reviewed appeared in peer-reviewed journals, and around half appeared in economics journals. More than 70 percent of studies were RCTs. Close to half evaluated a program that expanded coverage of preprimary education, while the other half focused on programs that aimed to improve the quality of existing preprimary education services.

4.1.3.2 Interventions

We also extracted information about the evaluated interventions. Table 3 summarizes characteristics of the treatment arms of the included studies for our full sample and separately for high-income countries and low- and middle-income countries. Appendix Table 2 presents additional characteristics of the programs evaluated in each country.

We assigned interventions to at least one of 12 intervention categories.12 Around 60 percent of programs included a component focused on teachers’ professional development, and such components were more common in low- and middle-income countries. This is not surprising, as many interventions that aim to improve the quality of preprimary education, such as a change in curriculum or pedagogy, require teachers to receive some training prior to program implementation.
In high-income countries, close to half of all programs provided subsidized or free access, compared with 3 percent in low- and middle-income countries. In both contexts, around 70 percent aimed to target a disadvantaged population. These interventions include both small-scale pilots implemented by non-governmental organizations (NGOs) and nationally scaled government programs. Daycare programs with education components represented very few of the evaluated programs. Formal preschool was the dominant program type in both high-income and low- and middle-income contexts, although community preschool accounted for more than 40 percent of programs in low- and middle-income contexts. While a majority of programs in both contexts took place in publicly managed schools, a sizeable fraction had a combination of public and private management. A higher fraction of programs in high-income countries took place in classrooms with a teacher who was formally qualified to teach preprimary students; in low- and middle-income countries, only half of teachers were formally qualified to teach in preprimary settings.

4.1.3.3 Outcomes

We focused on outcomes related to child development, school participation, teacher behavior, parental behavior, and well-being in adulthood to extract average treatment effects and their precision for each intervention (Table 4). We extracted a total of 1,017 outcomes at the child, teacher, and parent levels. In one study that relied on a very large administrative dataset with a panel structure (Rossin-Slater & Wüst, 2020), outcomes were aggregated to combinations of geographic unit and birth cohort. Sample sizes associated with extracted outcomes ranged from 46 to 24,800,00 individuals, with a median of 1,483.
12 These categories included (i) teacher professional development, (ii) subsidized or free access, (iii) change in curriculum, (iv) change in pedagogy, (v) provision of materials, (vi) provision of new staff, (vii) provision of health and nutrition services, (viii) preschool construction, (ix) parental engagement, (x) community outreach, (xi) preschool day extension, and (xii) teacher payments.

Measures of precision included standard errors, confidence intervals, and p-values. When only stars or other symbols indicated precision in a table or chart and the study provided no other metric of precision, we assumed a p-value just below the upper bound of the interval indicated by the symbol.13 If a paper presented no standard measure of precision, we extracted the sample sizes and standard deviations of the outcome for the treatment and control groups to calculate the standard error of the average treatment effect. For each outcome, we included both average treatment effects measured on the full study sample and those measured separately for different socioeconomic groups and different age groups, as these tend to be the characteristics of children used to target preprimary programs when universal coverage is not an option.14 We aimed to extract intention-to-treat estimates of impact, although we extracted local average treatment effects or treatment-on-the-treated effects when intention-to-treat effects were unavailable or less relevant.15 Three-quarters of extracted outcomes were not disaggregated by socioeconomic status. Appendix Table 3 lists the different subpopulations for which we extracted separate coefficients, plus a mapping of these subpopulations to a more aggregate indicator meant to capture high and low socioeconomic status.
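The precision-recovery steps above reduce to two small formulas: backing out a standard error from a p-value, and computing the standard error of a difference in group means from standard deviations and sample sizes. The sketch below shows both under stated assumptions. It uses a two-sided normal approximation for p-values. The star-to-p mapping follows the paper’s rule of assuming 0.04 for ** (p<0.05), but the values for * and *** are hypothetical analogues that assume thresholds of 0.10 and 0.01. The unequal-variance difference-in-means formula is a standard choice, not necessarily the authors’ exact one.

```python
import math
from statistics import NormalDist

# Assumed p-values when a table reports only significance stars.
# "**" -> 0.04 follows the paper's rule for p<0.05; "*" and "***" are
# hypothetical analogues assuming thresholds of p<0.10 and p<0.01.
STAR_TO_P = {"*": 0.09, "**": 0.04, "***": 0.009}


def se_from_p(beta: float, p_value: float) -> float:
    """Back out a standard error from a two-sided p-value (normal approximation)."""
    if not 0 < p_value < 1:
        raise ValueError("p-value must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(beta) / z


def se_from_stars(beta: float, stars: str) -> float:
    """Standard error implied by a star symbol, using the assumed p-values above."""
    return se_from_p(beta, STAR_TO_P[stars])


def se_from_group_sds(sd_t: float, n_t: int, sd_c: float, n_c: int) -> float:
    """SE of a difference in treatment and control means, allowing unequal variances."""
    return math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
```

Assuming p = 0.04 rather than exactly 0.05 keeps a starred coefficient strictly significant after conversion: the implied z-statistic is about 2.05, above the 1.96 cutoff.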
We classified these outcomes according to the timing of their measurement, tagging them as occurring during the preprimary period (below age 6), during post-preprimary education (age 6 to 18), or during adulthood (after age 18). The majority (59 percent) of extracted outcomes were measured when children were still in the preprimary period, and 13 percent, all from high-income countries, were measured in adulthood. We scored each outcome as positive or negative depending on whether increases represented improvements or reductions in welfare. We also mapped the specific outcomes measured in each study (for example, receptive vocabulary using the Peabody Picture Vocabulary Test) to more aggregate domains that could be used in our subsequent analysis (for example, language), following standard domains observed in the early childhood education literature (Fernald et al., 2017). Specifically, we classified all outcomes that measured attendance, enrollment, years of education, and schooling attainment into a single category meant to capture school participation and progression.

13 For example, if ** indicated p<0.05, we assumed a p-value of 0.04. We could not use the exact threshold of 0.05, as that led us to misclassify effects that were statistically significant in their respective studies as insignificant when standardized.

14 For two studies (Carneiro & Ginja, 2014; Heckman et al., 2010), we had to extract effects disaggregated by gender, as full-sample estimates were not reported.

15 For example, in Ganimian et al. (2021), the evaluated intervention entailed an improvement in existing services. Thus, while all children living around the preprimary education center could access the services, only those regularly attending them would be exposed to the treatment.
We mapped all outcomes that measured children’s skills and knowledge about specific content areas (for example, mathematics, language, and literacy), as well as dispositions and skills that help children to think about and understand the world around them (for example, general intellectual ability), into one of the following categories: literacy, language, math, and general cognition. We classified constructs such as attention, inhibition, and working memory as executive functions; outcomes such as social cognition, social competence, and emotional regulation as social-emotional skills; and reported or observed measures of aggression, internalizing and externalizing behavior problems, conduct, and disciplinary actions, as well as incarceration, as behaviors.16 Physical outcomes encompassed health and motor development. Longitudinal studies tended to estimate impacts on labor market outcomes, such as earnings or labor market participation. We mapped engagement in learning activities at home, rules and routines, and measures of parenting involvement and the quality of parent-child interactions into a parental engagement category. Similarly, we classified teachers’ responses in the classroom, such as the extent of emotional and pedagogical support they provide their students, as teacher practices, while we mapped their self-reports of job satisfaction, motivation, or feelings of burnout to the category of teacher professional well-being. One-to-one mapping was at times difficult, since some composite scores used in studies included items from more than one related domain, for example, both language and literacy or both social-emotional skills and behavior. To address such difficulties with classification and to manage potential multiple inference problems that could arise as the number of outcome categories grows, we also aggregate outcomes into larger categories: school participation and progression; cognitive skills; executive functions, social-emotional skills, and behavior; parental responses; adult health; and labor market outcomes.

16 Appendix 1 describes the contents of each skill category in more detail.

When a study estimated treatment effects using more than one measure for a particular domain of development (for example, the authors used both a measure of separation anxiety and a measure of physical aggression and opposition to assess social-emotional skills), we extracted coefficients for each mode of measurement, unless one measure represented only the aggregation of all the other measures. We omitted effect sizes for outcomes where reported measurement did not conform to accepted practice, for example, asking about nonacute, nonchronic illnesses with a 12-month (instead of 2–4 week) recall period. We also omitted outcomes with very low response rates.17 We also omitted outcomes when it was not clear whether an increase in the measured outcome represented an improvement or a decrement to well-being. Take, for example, outcomes such as grade retention in kindergarten or body mass index. Grade retention in kindergarten could occur when a child’s learning suffers in the classroom (a bad outcome) or when teachers become more attuned to children’s readiness (a good outcome). An increase in body mass index for an underweight child in a low-income country would represent progress, but a decrease in body mass index for an overweight poor child in a high-income country would also be considered a benefit. Not all studies measured outcomes covering all domains of development, nor did studies routinely measure outcomes like parental engagement at home or teachers’ behavior in the classroom. Thus, sample sizes and the composition of studies vary by aggregated outcome.
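The coding rules above tag each extracted effect by measurement timing, orient its sign so that positive always means a welfare improvement, and map it from a fine-grained domain to an aggregate category. The sketch below illustrates these rules. The dictionary keys, the record layout, and the function names are hypothetical. Domains that the text does not assign to an aggregate, such as teacher practices, are left unmapped rather than guessed. The text does not say whether age 18 counts as post-preprimary or adulthood, so the boundary at exactly 18 is an assumption.

```python
from collections import defaultdict

# Hypothetical lookup from fine-grained domains to the aggregate categories named
# in the text. Domains not listed (e.g., teacher practices) are left unmapped.
BEHAVIOR_AGGREGATE = "executive functions, social-emotional skills, and behavior"
DOMAIN_TO_AGGREGATE = {
    "school participation and progression": "school participation and progression",
    "literacy": "cognitive skills",
    "language": "cognitive skills",
    "math": "cognitive skills",
    "general cognition": "cognitive skills",
    "executive functions": BEHAVIOR_AGGREGATE,
    "social-emotional skills": BEHAVIOR_AGGREGATE,
    "behavior": BEHAVIOR_AGGREGATE,
    "parental engagement": "parental responses",
    "labor market": "labor market outcomes",
}


def measurement_period(age_years: float) -> str:
    """Timing tag: below 6 is preprimary, 6 to 18 post-preprimary, above 18 adulthood."""
    if age_years < 6:
        return "preprimary"
    if age_years <= 18:  # assumption: age 18 itself counts as post-preprimary
        return "post-preprimary education"
    return "adulthood"


def orient(effect: float, higher_is_better: bool) -> float:
    """Flip the sign so that a positive effect always means a welfare improvement."""
    return effect if higher_is_better else -effect


def group_by_aggregate(records):
    """Group oriented effects by aggregate category.

    records: iterable of (domain, effect, higher_is_better) tuples (hypothetical layout).
    """
    grouped = defaultdict(list)
    for domain, effect, higher_is_better in records:
        aggregate = DOMAIN_TO_AGGREGATE.get(domain)
        if aggregate is None:
            continue  # unmapped domain: skip rather than guess a category
        grouped[aggregate].append(orient(effect, higher_is_better))
    return dict(grouped)
```

For example, a reduction of 0.1 sd in behavior problems is stored as +0.1 in the combined executive-function, social-emotional, and behavior category.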
4.2 Analytical strategy

We aggregate all extracted outcomes to measure an average treatment effect across studies, using meta-regression methods that account for the potential correlation among outcomes measured in the same study and that penalize precision appropriately when the number of studies or outcomes is small. We also combine our estimates of the average effects of preprimary interventions with information on the costs of these interventions and their estimated rates of return to estimate benefit-to-cost ratios.

17 We also did not include outcomes for all reported age groups in Berlinski et al. (2008), as primary school enrollment rates suggested little room for improvement for some ages. For this study, we selected age groups that matched the age composition of other studies in our sample that reported similar outcomes.

4.2.1 Robust variance meta-regression

Many reviews of evidence, particularly those focused on low- and middle-income countries, do not estimate an average effect across studies or incorporate the uncertainty associated with estimated effects in their assessments of the benefits of a certain intervention or class of interventions. For example, authors may plot effect sizes in a bar chart and use their expert judgment to determine whether an intervention does or does not improve outcomes (Kremer & Holla, 2009; Evans & Popova, 2016; Evans & Yuan, 2019). Others may employ a vote-counting method, tallying the number of positive and significant coefficients. These methods, however, do not fully consider the fact that estimated coefficients come with confidence intervals of different sizes that inform us about the precision of estimated impacts. That is, even if two estimated coefficients are both significantly different from zero, they are likely to have different levels of precision.
Moreover, when a literature includes a large number of statistically insignificant effects, we might worry about inter-rater reliability when human judgment adjudicates whether the average impact of an intervention appears positive (or negative) overall.

The education, health, and psychology literatures try to solve these issues through meta-regression, where estimated coefficients for studies $j = 1, \ldots, J$ are the units of observation and the goal of the analysis is to estimate the average effect size, $\beta_0$, in a model such as

$$T_j = \beta_0 + \mathbf{X}_j \boldsymbol{\beta} + \eta_j + \varepsilon_j \qquad (3)$$

where $T_j$ is the estimated effect size from study $j$, $\varepsilon_j$ is the study-level residual, $\eta_j$ is a study-level random effect, and $\beta_0$ and any other coefficients in the vector $\boldsymbol{\beta}$ associated with covariates in the vector $\mathbf{X}_j$ are estimated through weighted least squares. The weights are derived from the inverse of the standard errors associated with each coefficient (Tanner-Smith et al., 2016; Tipton et al., 2019).18 The precision of the average effect size $\beta_0$ is measured under the assumption that all estimated coefficients included in the regression are independent draws from the potential distribution of outcomes. Since we had no a priori reason to select one outcome over another when studies reported effects using multiple measures, we extracted all outcomes.

18 Some meta-analyses are interested not only in measuring the average effect $\beta_0$ across studies but also in (i) the effects of any study characteristics that might determine the size of a study’s average effect and (ii) heterogeneity in effects observed across studies that is not due to sampling variation. Meager (2019), for example, uses hierarchical linear modeling to jointly estimate both this sampling variance and heterogeneity for studies of microcredit interventions to assess the external validity of estimates and the extent to which we can generalize treatment effects from one context to another. Vivalt (2020) has a similar goal for a larger set of randomized controlled trials with multiple objectives.
Thus, this assumption of independence is unlikely to hold. We could have selected one effect size per study or averaged all effects within a study to create a single synthetic effect size, as in Baird et al. (2014) or Hidrobo et al. (2018). However, this could result in a loss of information, as outcomes within a study are rarely perfectly correlated (and sometimes not correlated at all), and we would like to use as many extracted effect sizes as possible. To deal with the potential dependence among effect sizes in the same outcome category extracted from the same study, we use robust variance meta-regression (Hedges et al., 2010; Tanner-Smith & Tipton, 2014; Tanner-Smith et al., 2016; Tipton, 2015). This method of meta-regression uses a working model of the structure of dependence of outcomes within a study (that is, the variance-covariance structure). Unlike approaches such as hierarchical linear modeling that nest estimated effects within clusters, it does not require assumptions on the exact distribution of effect size estimates. The robust variance regression method also provides unbiased estimates of the variance of the average effect size across studies, even with as few as 10 studies (Tanner-Smith et al., 2016). In particular, we use a working model of the variance-covariance structure characterized by correlated effects. This model assumes that the average effect size varies across studies, that effect sizes within studies are equally correlated (that is, they all have the same intracluster correlation), and that this correlation arises from sampling error when multiple outcome measures are collected on the same units.19

19 We use the robumeta command in Stata developed by Hedberg (2011) and the default assumption about the correlation among outcomes of the same study (that is, $\rho = 0.80$).
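To make the correlated-effects approach concrete, the sketch below estimates an intercept-only model, as would be fit separately for each outcome category. It gives every effect size in study $j$ the weight $w_j = 1 / [k_j(\bar{v}_j + \tau^2)]$, where $k_j$ is the number of effect sizes in the study and $\bar{v}_j$ their mean sampling variance. It then computes a cluster-robust (sandwich) variance that sums residuals within each study before squaring them, in the spirit of Hedges et al. (2010). Because it is only an illustration, it departs from what robumeta does in two ways. First, the between-study variance `tau2` is supplied by the user rather than estimated; in robumeta that estimate depends on the assumed within-study correlation $\rho$. Second, the small-sample corrections of Tipton (2015) are omitted. The function name is hypothetical.

```python
import math
from collections import defaultdict


def rve_correlated_effects_mean(effects, variances, study_ids, tau2=0.0):
    """Intercept-only robust variance meta-regression with correlated-effects weights.

    effects, variances, study_ids: parallel sequences, one entry per effect size.
    tau2: between-study variance, taken as given here (not estimated).
    Returns (weighted mean effect, cluster-robust standard error).
    """
    by_study = defaultdict(list)
    for t, v, s in zip(effects, variances, study_ids):
        by_study[s].append((t, v))

    # Correlated-effects weight shared by every effect size in study j:
    # w_j = 1 / (k_j * (mean sampling variance in study j + tau2)).
    weights = {}
    for s, rows in by_study.items():
        k = len(rows)
        v_bar = sum(v for _, v in rows) / k
        weights[s] = 1.0 / (k * (v_bar + tau2))

    total_w = sum(weights[s] * len(rows) for s, rows in by_study.items())
    beta = sum(weights[s] * t for s, rows in by_study.items() for t, _ in rows) / total_w

    # Sandwich variance: weighted residuals are summed within each study before
    # squaring, so correlated effect sizes from one study are not treated as independent.
    meat = sum(
        (weights[s] * sum(t - beta for t, _ in rows)) ** 2
        for s, rows in by_study.items()
    )
    se = math.sqrt(meat) / total_w
    return beta, se
```

In this design, a study that reports many correlated outcomes gains little extra influence: its total weight is $1/(\bar{v}_j + \tau^2)$ regardless of $k_j$.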
Another option for the working model of the variance-covariance structure is the hierarchical effects model. In this model, the observed effect size estimates are nested within studies that are nested within clusters, where clusters may correspond to countries, research groups, or interventions (when there are multiple studies drawn from the same intervention). We have no reason to believe that our country-income groups correspond to clusters. We do have some studies focused on a single intervention (for example, the Head Start experiment in the United States), but this is not the dominant driver of potential dependence among our outcomes. When both correlated effects and hierarchical effects may describe the dependence among our estimated coefficients, we follow Tanner-Smith and Tipton (2014) and use the working model that likely describes most of our data.

Thus, we modify Equation (3) to account for multiple outcomes coming from a single study:

$$T_{ijk} = \beta_k + \eta_j + \varepsilon_{ij} \qquad (4)$$

where $T_{ijk}$ is effect size $i$ in study $j$ of outcome category $k$; $\beta_k$ is the average effect size of outcome category $k$; $\eta_j$ is the study-level random effect such that $\operatorname{Var}(\eta_j) = \tau^2$ captures the between-study variance component; and $\varepsilon_{ij}$ is the residual for effect size $i$ in study $j$.

Recall that for the benefits of preprimary education to outweigh its costs, families must send their children to preprimary programs when they are made available, and these programs must, at a minimum, improve children’s skills relative to their counterfactual situations. We use Equation (4) and measures of school participation in the preprimary years to assess demand for preprimary education. We use measures of children’s knowledge and skills plus teacher behavior measured during the preprimary years to assess whether preprimary services can be delivered with sufficient quality to provide children a better learning environment than they would otherwise receive. In both cases, we test whether $\beta_k > 0$.
We can also use Equation (4) and data from the subset of studies that report outcomes for children after they progress out of preprimary education. With these data, we can assess whether preprimary education and later years of education are substitutes ($\beta_k = 0$) or complements ($\beta_k > 0$). Finally, to test whether certain subsamples benefit more from preprimary interventions, we modify Equation (4) to add a covariate $D_{ij}$, which indicates whether effect size $i$ in study $j$ was measured for a disadvantaged population. We also limit our sample to the set of effect sizes separately estimated for subpopulations defined by their socioeconomic status:

$$T_{ijk} = \beta_k + \gamma D_{ij} + \eta_j + \varepsilon_{ij} \qquad (5)$$

Because so few studies reported effects for different subpopulations, we group the outcomes into five aggregate categories: school participation and progression, cognitive skills, noncognitive skills, parental behavior, and health. Our estimated coefficient of interest is $\hat{\gamma}$, which indicates whether the average effect size of preprimary interventions differs for disadvantaged populations relative to populations that are less or not disadvantaged. We also include covariates capturing country income and the level of education at the time of outcome measurement.

4.2.2 Standardization of outcomes

Even within the outcome categories that we created, studies in our sample used different measures, each with its own scale, to assess school participation and progression, children’s skills and behavior, the responses of caregivers and teachers, and well-being in adulthood. Of all extracted outcomes, 61.5 percent were reported by the researchers in standard deviation units. Thus, before we could use treatment effects as dependent variables in our meta-regression, we needed to standardize the remaining outcomes, using other information we extracted from each study, such as sample sizes, standard errors, and the means of the treatment and control groups at follow-up.
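One common way to put a raw treatment effect on a standard deviation scale is to divide both the coefficient and its standard error by the pooled standard deviation of the treatment and control groups. The sketch below illustrates that approach for continuous outcomes only. It is not a reproduction of the authors’ Appendix 2 procedure, and the function names are hypothetical. When arm sizes are missing, it falls back on an even split of the total sample, mirroring the assumption the text states for papers that give no guidance. The handling of a single reported arm is an added assumption.

```python
import math
from typing import Optional, Tuple


def split_sample(n_total: int, n_t: Optional[int] = None,
                 n_c: Optional[int] = None) -> Tuple[float, float]:
    """Return (n_treatment, n_control), assuming an even split when both are missing."""
    if n_t is not None and n_c is not None:
        return float(n_t), float(n_c)
    if n_t is not None:  # assumption: infer the other arm from the total
        return float(n_t), float(n_total - n_t)
    if n_c is not None:
        return float(n_total - n_c), float(n_c)
    return n_total / 2, n_total / 2


def pooled_sd(sd_t: float, sd_c: float, n_t: float, n_c: float) -> float:
    """Pooled standard deviation of the treatment and control groups."""
    return math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))


def standardize(beta: float, se_beta: float, sd_t: float, sd_c: float,
                n_total: int, n_t: Optional[int] = None,
                n_c: Optional[int] = None) -> Tuple[float, float]:
    """Express a raw treatment effect and its SE in pooled-SD units."""
    nt, nc = split_sample(n_total, n_t, n_c)
    s = pooled_sd(sd_t, sd_c, nt, nc)
    return beta / s, se_beta / s
```

For example, a 5-point gain with a standard error of 1 on a test with a standard deviation of 10 in both arms becomes an effect of 0.5 sd with a standard error of 0.1 sd.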
Appendix 2 details our process for standardizing both continuous and binary outcomes and for imputing values when information was missing. We were able to standardize an additional 32.6 percent of all outcomes.20 Sometimes information required for standardization was missing, for example, separate sample sizes for the treatment and control groups, or the mean of the control group at endline for each sub-population when heterogeneous treatment effects are presented. To address these issues, we relied on information provided in the text or appendices of papers as much as possible to approximate sample allocations across treatment and control groups, and we assumed an even allocation when papers provided no guidance. We used control means of the full sample to represent the means for each sub-population when this information was missing. Given that disadvantaged sub-populations tend to be minorities and given that their average outcomes tend to be lower than those of less disadvantaged sub-populations, this type of imputation should bias us against finding significant differences between children from high socioeconomic backgrounds and those from low socioeconomic backgrounds.

20 A total of 5.9 percent of extracted outcomes were not standardized because some inputs for the standardization process were not reported.

4.2.3 Economic evaluation

Very few studies can track children into adulthood to assess the full impact of preprimary education and thus appropriately capture the benefits in a benefit-to-cost ratio, and none of these longitudinal studies in our sample take place in low- and middle-income countries. To estimate a benefit-to-cost ratio for these countries, we need to infer eventual benefits from short-term effects. For the subset of studies in low- and middle-income countries that report significant improvements in children's cognitive or social-emotional skills as well as sufficient information to infer per-child costs for the duration of the evaluated intervention, we follow the strategy of Galasso and Wagstaff (2018) and Ganimian et al. (2021) to project benefits and assess whether these projected benefits exceed reported costs. For each study, we gather information about the country's labor force participation, average wages, and real wage growth to calculate the total expected lifetime earnings of a child.21 We assume children will work between the ages of 22 and 65, which should be a conservative assumption for the number of active years in the labor market, given that labor market participation tends to start much earlier than age 22 in low- and middle-income countries. Because most studies reported program costs in US dollars, we first convert wage data into US dollars using official exchange rates from the International Financial Statistics of the International Monetary Fund.22 We discount this lifetime stream of earnings to the year in which preprimary program costs were incurred, using two discount rates: 3 percent and 5 percent. While 3 percent reflects the standard in the literature on economic evaluation in high-income countries, some researchers have argued that the higher economic growth rates in low- and middle-income countries make rates as high as 5 percent more appropriate (Haacker et al., 2020). Next, we rescale the present value of earnings to reflect the effect of preprimary interventions, combining study-specific estimates of the improvement in cognitive skills following a preprimary intervention with estimates from the literature on the extent to which improvements in cognitive skills translate into improvements in earnings. To be conservative, we select each study's lowest estimated treatment effect that is significantly different from zero.
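The discounting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it treats the expected annual wage (the average wage times the labor force participation rate) as growing at a constant real rate from the year costs are incurred, counts working years from age 22 through 65 inclusive, and discounts each year back to the child's age when costs were incurred. The function names, parameter names, and illustrative numbers are hypothetical.

```python
def to_usd(amount_local, local_per_usd):
    """Convert a local-currency amount at an official exchange rate (local units per US dollar)."""
    return amount_local / local_per_usd


def pv_lifetime_earnings(annual_wage_usd, lfp_rate, real_wage_growth, discount_rate,
                         age_at_cost, start_age=22, end_age=65):
    """Present value, at the child's age when program costs were incurred, of expected
    earnings from start_age through end_age (inclusive)."""
    total = 0.0
    for age in range(start_age, end_age + 1):
        years_ahead = age - age_at_cost
        # Expected earnings: participation probability times the wage, grown at a constant real rate.
        expected = lfp_rate * annual_wage_usd * (1.0 + real_wage_growth) ** years_ahead
        total += expected / (1.0 + discount_rate) ** years_ahead
    return total


# Hypothetical inputs for a child aged 4 when costs are incurred.
wage = to_usd(1_800_000, local_per_usd=1_200)  # 1,500 US$ per year
for rate in (0.03, 0.05):
    pv = pv_lifetime_earnings(wage, lfp_rate=0.7, real_wage_growth=0.02,
                              discount_rate=rate, age_at_cost=4)
    print(f"discount rate {rate:.0%}: PV = {pv:,.0f} US$")
```

Because earnings start roughly two decades after costs are incurred, moving from a 3 percent to a 5 percent discount rate shrinks the present value substantially, which is why the 5 percent results are the more conservative ones.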
To translate average effects on cognitive skills into increases in earnings, Kline and Walters (2016) use a value of 13 percent per standard deviation increase in cognitive skills in their study of the Head Start program in the United States, while Galasso and Wagstaff (2018), focusing on low-income countries, use a value of 4.3 percent. Thus, for example, we can multiply the present value of earnings by 0.13 × (the study-specific average treatment effect expressed in standard deviations) to calculate the lifetime gain in earnings from an increase in cognitive skills. When studies also include effect sizes for social-emotional outcomes, we again extracted the lowest significant average treatment effect and augmented the estimate of individual gains in earnings with the gains implied by improvements in social-emotional skills, again using estimates from the literature of the returns to social-emotional skills (Belfield et al., 2015). These estimates of benefit-to-cost ratios, even when using a discount rate of 3 percent and the higher value of the labor market returns to cognitive skills, should be considered conservative, as they capture only the earnings benefits accruing to the individual child.

21 We omit this data collection and calculation for Ganimian et al. (2021), which directly reports benefit-to-cost ratios. Because information on monthly wages in the general population of Mozambique is lacking, we also cannot calculate benefit-to-cost ratios for Martinez et al. (2017), which found significant impacts on children's cognitive skills from an expansion of community preschools and did report per-child costs.

22 As studies did not indicate whether they had used official exchange rates or exchange rates adjusted for purchasing power parity to convert program costs to US dollars, we assume they used official exchange rates and therefore use official exchange rates to convert wages to US dollars.
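The rescaling step and the resulting ratio can be written compactly. In this sketch, which is an illustration rather than the authors' implementation, the lifetime gain is the present value of earnings times the sum, over skills, of each skill's earnings return per standard deviation multiplied by its effect size; the ratio divides that gain by the per-child cost. Selecting the lowest positive estimate with |t| > 1.96 is our assumption about how "significantly different from zero" is operationalized. The returns of 0.13 and 0.043 are the values cited above; the effect sizes, present value, and cost are hypothetical, as are the function names.

```python
def lowest_significant_effect(estimates, critical_value=1.96):
    """Smallest positive effect (in sd) whose |t| exceeds the critical value, or None.

    estimates: list of (effect_size, standard_error) pairs from one study.
    """
    significant = [b for b, se in estimates
                   if b > 0 and se > 0 and abs(b / se) > critical_value]
    return min(significant) if significant else None


def benefit_cost_ratio(pv_earnings, cost_per_child, es_cognitive, return_cognitive,
                       es_social_emotional=0.0, return_social_emotional=0.0):
    """Projected lifetime earnings gain per dollar of per-child program cost."""
    gain = pv_earnings * (return_cognitive * es_cognitive
                          + return_social_emotional * es_social_emotional)
    return gain / cost_per_child


# Hypothetical study: three cognitive estimates, PV of earnings 40,000 US$, cost 300 US$ per child.
es_cog = lowest_significant_effect([(0.30, 0.10), (0.12, 0.05), (0.05, 0.04)])
for label, r in (("lower bound (4.3%)", 0.043), ("upper bound (13%)", 0.13)):
    print(label, round(benefit_cost_ratio(40_000, 300, es_cog, r), 1))
```

Pairing the lowest significant effect with the lower return gives the bottom of the reported range, and pairing it with the higher returns gives the top, so the spread of ratios reflects uncertainty about labor market returns rather than about the study's effect.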
These ratios also do not capture any intergenerational transmission of human capital, as documented in Rossin-Slater and Wust (2020), nor do they include any social externalities that could arise from an increase in tax revenues following an earnings increase, from a lower reliance on public assistance, from improvements in health (Carneiro and Ginja, 2014; Brotman et al., 2016) and the resulting lower burden on health systems, or from decreases in crime (Heckman et al., 2010).

5. Results

In this section, we summarize evidence from experimental and quasi-experimental studies that can isolate the impacts of center-based programs or policies that aimed to provide preprimary education to children between the ages of three and six years. We organize our results around the research questions of Section 3, focusing on the demand for preprimary education, the impacts of preprimary programs on children's skills, the persistence of effects beyond the preprimary period, and heterogeneity in impacts by socioeconomic status. Throughout, we present average effect sizes from robust variance meta-regressions, both for our full sample of studies and separately for high-income countries and for low- and middle-income countries. We could not compute average effects when all extracted outcomes came from a single study; the corresponding cells are blank in our tables. Appendix 4 presents the figures with the corresponding study-specific standardized effect sizes by outcome category.

Demand for preprimary education programs

In our review, 12 studies, equally distributed between high-income countries and low- and middle-income countries, report impacts on children's take-up of offered preprimary education services. Table 5, Panel A presents the average effect sizes of the impact of preprimary interventions on preprimary school participation and progression for the full sample of studies and for each country-income group.
On average, the odds of being enrolled in or attending preprimary education are 1.4 sd higher for children who were given access to preprimary education programs than for those who did not have access. We lack the statistical power to estimate effects separately for each country-income group or for the set of studies evaluating expansions of preschool coverage, possibly because only four studies in low-income countries measured this outcome and the numbers of studies examining expansion and quality were similarly small.23 Although most studies only reported participation in the program under evaluation, some also reported participation in any preprimary education services. Results from each of these studies indicate that preprimary interventions improve overall enrollment and attendance, suggesting that the improvements in program-specific participation do not solely reflect substitution away from existing services (Kline & Walters, 2016; Brinkman et al., 2017; Berkes et al., 2019; Spier et al., 2020).

Impacts on children's skills in the preprimary period

Panel B of Table 5 presents robust variance meta-regression results for cognitive skills related to language, literacy, and math, as well as for outcomes that either relate to general cognition or represent indices that combine multiple cognitive skills. Panel C of Table 5 presents results for executive functions, social-emotional skills, and behavior. Given the importance of both parents and teachers to children's skill development during the preprimary period, Table 6 presents average effect sizes of the impact of preprimary interventions on outcomes related to the quality of children's learning environments.24 Taken together, our results suggest that preprimary interventions do improve children's skills in both high-income countries and low- and middle-income countries; these interventions also alter the behavior of parents and teachers, generating more stimulation at home and improved classroom practices.

23 See Appendix Figure 1 for study-specific estimates on school participation and progression during the preprimary years.

24 While we did extract outcomes related to children's health and motor development, these are typically not the goals of many preprimary education programs, and we do not present meta-regression results for these outcomes in our main tables. The effect sizes are small and statistically insignificant (effect size: 0.032; standard error: 0.029; N=27). Appendix Figure 15 graphs study-specific standardized effect sizes for this outcome category.

5.2.1 Cognitive skills

For language, literacy, and math, the average effect sizes for the full sample of extracted outcomes are positive and statistically significant, suggesting gains of 0.108 sd in language, 0.216 sd in literacy, and 0.217 sd in math (Table 5, Panel B, column 1). This advantage over children in comparison groups, who either did not attend preprimary education or did not experience increased investment in their preprimary classrooms, is also evident for the general cognition category. Overall, when we aggregate these outcomes further into the larger category of cognitive skills, we find a significant average effect size of 0.147 sd. Estimated effect sizes are higher (although not significantly so) when we restrict our attention to high-income countries (column 2) rather than low- and middle-income countries (column 3). In high-income countries, for example, preprimary interventions lead to a significant 0.36 sd increase in math scores, a 0.11 sd increase in language scores, and a 0.24 sd increase in literacy scores. In low- and middle-income countries, preprimary education interventions significantly raise only math scores, by 0.16 sd on average. Estimated effect sizes in these countries are positive but fail to reach conventional levels of statistical significance for language and literacy, where sample sizes are 18 and 3 outcomes, respectively; however, when we pool across all skills, children in low- and middle-income countries experience statistically significant gains of 0.12 sd in cognitive skills. Interventions that expand preprimary coverage and those that target program quality both appear to improve cognitive skills. Appendix Figures 2–6 present study-specific effect sizes for cognitive skills (language, literacy, math, and general).

5.2.2 Executive functions, social-emotional skills, and behavior

Table 5, Panel C presents average effect sizes for outcomes measuring executive functions, behavior, and social-emotional skills. When pooled across income levels (column 1) and across all outcomes, estimated effects suggest that preprimary education can also lead to significant gains of 0.121 sd in these "non-cognitive" skills. Executive functions improve by an average of 0.095 sd (p<0.10). This effect is driven largely by interventions implemented in high-income countries that directly aimed to improve these skills (column 6), which increased these skills by an average of 0.218 sd (p<0.01). Preprimary interventions significantly improved social-emotional skills by an average of 0.115 sd across both income groups, with high-income countries and low- and middle-income countries demonstrating similar significant effects of 0.09 and 0.13 sd, respectively (columns 2 and 3). The average effect size for outcomes related to child behavior in high-income countries was a statistically insignificant 0.07 sd, although the sample of studies evaluating interventions focused on quality improvements did show marginally significant gains of 0.14 sd.
Studies conducted in low- and middle-income countries did not collect this outcome. Appendix Figures 7–10 graph study-specific effect sizes and visually demonstrate the overall positive but still mixed results for these outcomes.25

5.2.3 The learning environment

Table 6 reports average effects on parental engagement at home, teacher practices in the classroom, and teacher well-being. The average effect sizes pooled across country-income groups suggest that preprimary interventions can improve children's learning environments, raising parental engagement at home significantly, by an average of 0.097 sd, and teachers' practices in the classroom by a significant 0.473 sd (column 1). While average effect sizes are not statistically significant in the low- and middle-income country sample (column 3), they are large and cannot be statistically distinguished from the average effect sizes of high-income countries (column 2). Only two studies, both focused on the same intervention, report effects on measures of teachers' professional well-being. Although positive, the average effect size across these outcomes is statistically insignificant. The pattern of study-specific effect sizes for parental engagement, teacher practice, and teacher professional well-being in Appendix Figures 11–14 is consistent with the overall positive, but still mixed, results for these outcomes.

25 In high-income contexts, the largest effect sizes, both positive and negative, come from a US study reporting effects that range from a statistically significant 1.06 sd improvement in behavior to -0.77 sd for externalizing behavior following a teacher intervention to reduce children's behavioral problems (Raver et al., 2009). In low- and middle-income contexts, standardized effect sizes range from a significant 0.45 sd on a social composite index following exposure to a preprimary curriculum that promoted social and emotional abilities through play in India (Dillon et al., 2017) to an insignificant -0.08 sd in prosocial scores following a preprimary school expansion in Cambodia (Bouguen et al., 2018).

Persistence of advantage beyond preprimary education

This section reports robust variance meta-regression results for the average effect size of preprimary interventions after the preprimary period. If preprimary education imparts skills that help children learn and makes subsequent education efforts more effective, then we should expect the benefits of preprimary interventions to persist when studies track children into post-preprimary education or adulthood. Unfortunately, few studies estimate impacts beyond the preprimary period, and no studies in low- and middle-income countries report impacts measured in adulthood. Thus, we often lack statistical power for inference. Table 7 (Panels A–C) reports average effect sizes for outcomes measuring school participation and progression, cognitive skills, executive functions, behavior, and social-emotional skills when children are between the ages of 6 and 18 years. Table 8 presents average effect sizes for health and labor outcomes measured in adulthood.26 Overall, the results suggest that preprimary interventions can generate advantages that last beyond the preprimary period, particularly in language, social-emotional skills, and executive functions. Average effect sizes for adult outcomes, however, could not be statistically distinguished from zero.

5.3.1 School participation and progression

The meta-regression results in Table 7, Panel A suggest a positive but statistically insignificant average effect size of 0.142 sd for the impact of preprimary interventions on subsequent school participation and progression in the post-preprimary period and a marginally significant gain of 0.07 sd in adulthood. The study-specific effect sizes in Appendix Figures 16–17 suggest differences across income contexts.
Preprimary interventions can improve outcomes like high-school graduation rates in high-income countries (Rossin-Slater & Wust, 2020; Gray-Lobe et al., 2021), but the overall effect of these interventions on school participation and progression appears to be low after the preprimary period. In low- and middle-income contexts, on the other hand, effect sizes are positive and significant in most studies.

26 Only three studies tracked parental engagement in stimulation and learning at home beyond the preprimary period (Gelber & Isen, 2013; Ozler et al., 2018; Spier et al., 2020).

5.3.2 Cognitive skills

When we pool effect sizes across high-income countries and low- and middle-income contexts and aggregate outcomes into a single category of cognitive skills, we find a significant persistent advantage of 0.071 sd in the post-preprimary period following interventions that either expanded or improved preprimary education (Table 7, Panel B). Effect sizes are generally similar across skills and across country income contexts, although only pooled effects for language are statistically significant.27

5.3.3 Executive function, social-emotional skills, and behavior

The estimated average effect size pooled across income contexts and across outcomes in Table 7, Panel C suggests significant persistence of skill advantages of 0.068 sd in other domains associated with learning. In the post-preprimary period, children benefitting from preprimary interventions show a significant advantage of 0.058 sd in executive functions and 0.094 sd in social-emotional skills. Again, estimated effect sizes are similar in magnitude and statistically indistinguishable across country-income contexts. For behavioral outcomes measured in the post-preprimary period, advantages are smaller and statistically insignificant.
While large (0.30 sd), the average effect on behavior measured in adulthood is statistically insignificant.28

27 Appendix Figures 18–21, which plot study-specific effect sizes for cognitive skills, suggest that the limited longitudinal evidence is mixed. Some studies exhibit economically and statistically meaningful impacts; others report negative but statistically insignificant results. In high-income contexts, for example, standardized effect sizes range from a significant 0.38 sd in first-grade math scores following a change in the preprimary math curriculum and teacher training and coaching (Clements et al., 2013) to a significant -0.14 sd in a state achievement test following an offer of a slot in a subsidized pre-k program (Lipsey et al., 2018). Similarly, in low-income contexts, standardized effect sizes range from a significant 0.23 sd in literacy scores measured in first grade when an additional year of preprimary education is offered to children (Spier et al., 2020) to a statistically insignificant -0.05 sd in literacy scores following a teacher training and coaching program (Wolf et al., 2019b).

28 As for outcomes related to cognitive skills, study-specific effect sizes in Appendix Figures 22–24 reveal both small samples for executive functions, social-emotional skills, and behaviors after the preprimary period and mixed results, including many statistically insignificant effects.

5.3.4 Health

Table 8 reports average effects on health-related outcomes measured after the preprimary period, while Appendix Figure 25 presents study-specific effects. Most health outcomes measured in low- and middle-income countries in our sample are anthropometric measures, which may be more difficult to influence with preprimary education interventions. This is particularly true of height, which tends to be largely determined by individuals' very early development and is more or less fixed by the age of four (Schultz, 2010). On the other hand, a large share of the outcomes reported for high-income countries capture health outcomes, choices, and behaviors that may be more amenable to influence by preprimary interventions that promote child development in terms of cognitive skills, executive functions, and social-emotional skills, which in turn can influence health behavior later in life (Heckman, 2007). Despite these differences, we pool study-specific effects across income groups to have sufficient observations to estimate an overall average effect size. We find that preprimary interventions lead to statistically insignificant gains of 0.25 sd during the post-preprimary period and 0.03 sd in adulthood.

5.3.5 Labor and earnings

Similarly, the overall average effect size on labor outcomes measured in adulthood is small, positive, and insignificant (Table 8), in contrast to some evaluations that find positive and significant impacts of preprimary interventions on earnings (Heckman et al., 2010) and on the likelihood of working (Bailey et al., 2020b), as well as reductions in the likelihood of being a low earner or receiving public assistance (Havnes & Mogstad, 2011; Bailey et al., 2020b). Appendix Figure 26, however, shows a preponderance of sizable positive effects, both significant and insignificant, for labor-related outcomes. The small average effect size, therefore, likely results from two quite large negative effects (-0.47 sd and -0.38 sd) estimated for very young males (aged 19–21) in the Carneiro and Ginja (2014) Head Start evaluation and the Heckman et al. (2010) evaluation of the Perry Preschool Program. In the Heckman et al. (2010) study, the same large, negative, and insignificant effect estimated for the sample of 19-year-old males on not being jobless in the previous year turns large and positive when the men are older.
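The heterogeneity estimates reported next come from Equation (5). As a rough illustration of how a robust variance meta-regression of this kind works, the sketch below fits an intercept and a single indicator for disadvantaged subpopulations by weighted least squares, using the "correlated effects" weights common in robust variance estimation (the inverse of the number of effect sizes in a study times that study's mean sampling variance plus the between-study variance) and a sandwich variance clustered by study. It is a simplified sketch under explicit assumptions, not the estimator behind Table 9: it omits the country-income and education-level covariates, takes the between-study variance tau2 as given rather than estimating it, and applies no small-sample correction to the standard errors. The function and argument names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def rve_subgroup_difference(es, var, study, disadvantaged, tau2=0.0):
    """Fit ES_ij = alpha + delta * D_ij by weighted least squares and return
    ([alpha, delta], [se_alpha, se_delta]) with study-clustered robust SEs.

    es, var       : effect sizes and their sampling variances
    study         : study id of each effect size (the clustering unit)
    disadvantaged : 1 if the effect size is for a disadvantaged subgroup, else 0
    tau2          : between-study variance, taken as given here
    """
    # Correlated-effects weights: 1 / (k_j * (mean sampling variance in study j + tau2)).
    variances = defaultdict(list)
    for v, s in zip(var, study):
        variances[s].append(v)
    study_weight = {s: 1.0 / (len(vs) * (sum(vs) / len(vs) + tau2))
                    for s, vs in variances.items()}
    w = [study_weight[s] for s in study]
    d = disadvantaged

    # Bread: (X'WX)^-1 for rows x = (1, D); needs both D = 0 and D = 1 in the sample.
    m00 = sum(w)
    m01 = sum(wi * di for wi, di in zip(w, d))
    m11 = sum(wi * di * di for wi, di in zip(w, d))
    det = m00 * m11 - m01 * m01
    inv = [[m11 / det, -m01 / det], [-m01 / det, m00 / det]]

    # Weighted least squares coefficients: (X'WX)^-1 X'Wy.
    xwy = [sum(wi * yi for wi, yi in zip(w, es)),
           sum(wi * di * yi for wi, di, yi in zip(w, d, es))]
    coef = [inv[0][0] * xwy[0] + inv[0][1] * xwy[1],
            inv[1][0] * xwy[0] + inv[1][1] * xwy[1]]

    # Meat: sum over studies of u_j u_j', with u_j = X_j' W_j e_j (study-level scores).
    scores = defaultdict(lambda: [0.0, 0.0])
    for wi, di, yi, s in zip(w, d, es, study):
        resid = yi - coef[0] - coef[1] * di
        scores[s][0] += wi * resid
        scores[s][1] += wi * di * resid
    meat = [[sum(u[a] * u[b] for u in scores.values()) for b in range(2)]
            for a in range(2)]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    vcov = matmul(matmul(inv, meat), inv)
    se = [sqrt(max(vcov[0][0], 0.0)), sqrt(max(vcov[1][1], 0.0))]
    return coef, se


# Hypothetical data: three studies, each reporting one less disadvantaged (D=0)
# and one disadvantaged (D=1) subgroup estimate.
coef, se = rve_subgroup_difference(
    es=[0.10, 0.18, 0.05, 0.15, 0.12, 0.16],
    var=[0.01] * 6,
    study=["A", "A", "B", "B", "C", "C"],
    disadvantaged=[0, 1, 0, 1, 0, 1],
)
print("alpha =", round(coef[0], 3), "delta =", round(coef[1], 3), "se(delta) =", round(se[1], 3))
```

Clustering by study matters here because subgroup estimates from the same study share a comparison design and are unlikely to be independent; the sandwich variance remains valid without having to specify that within-study correlation.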
Heterogeneity by socioeconomic status

This section reports average effect sizes estimated from Equation (5) when we limit our sample to coefficients estimated separately for subpopulations defined by their socioeconomic status. Given the infrequency of separate estimates for these subpopulations, we only report outcomes aggregated into six categories: school participation and progression; cognitive skills; executive functions, behavior, and social-emotional skills; parental behavior; health; and labor. We control for both the level of education at the time of measurement and country-income context whenever possible.

The average effect sizes across these pooled outcome categories in Table 9 suggest that preprimary interventions have greater impacts on more disadvantaged populations. For outcomes related to school participation and progression, estimated effects are 0.10 sd higher on average for children from lower socioeconomic backgrounds than for their less disadvantaged peers. Estimated impacts for children who were more disadvantaged when exposed to the preprimary interventions are a significant 0.068 sd higher for cognitive skills, roughly half of the full-sample average effect size reported in Table 5. While not consistently statistically significant across the specifications in Table 9, the difference in estimated impacts on executive functions, behavior, and social-emotional skills associated with coming from a low socioeconomic background is also positive and large, equal to about 75 percent of the full-sample average effect size reported in Table 5. Parents of children from lower socioeconomic backgrounds increase their parental engagement by a significant 0.06 sd more than parents of children of higher socioeconomic status, and their children's health also improves significantly more (by 0.08 sd).
Finally, preprimary interventions do not appear to disproportionately benefit one group over another when it comes to labor-related outcomes. Overall, we do not find significant differences in these impacts between high-income countries and low- and middle-income countries.

Economic analysis

In Table 10, we present our findings from an economic evaluation of a subset of studies in low- and middle-income countries that reported sufficient information for us to compare costs per child to an estimate of the gain in lifetime earnings the child would receive from an improvement in cognitive skills. Panel A reports calculations for a discount rate of 3 percent, while Panel B presents results for a more conservative rate of 5 percent. Ganimian et al. (2021) report a range of benefit-to-cost ratios following the same method for a discount rate of 3 percent, which we report directly in Panel A. Appendix Table 4 documents the assumptions underlying these calculations as well as data sources for parameters such as the labor force participation rate and growth in real wages. In Panel A, our smallest benefit-to-cost ratio is 3.5 when we use the smallest estimate of the returns to cognitive skills (4.3 percent per standard deviation) to translate the smallest estimate of gains in cognitive skills into gains in earnings for the Spier et al. (2020) study, which assessed the impact of adding a year of preprimary education for four-year-old children on top of the mandatory year prior to the start of first grade in Bangladesh. Our largest benefit-to-cost ratio is 103.5 when we use the largest estimate of the returns to cognitive skills (13 percent per standard deviation) and the largest estimate of the returns to social-emotional skills (15 percent per standard deviation) to translate the smallest estimated average treatment effects into earnings for the Wolf et al. (2019a) study.
This study assessed the impact of an in-service teacher training program in Ghana that sought to help teachers transition to a more holistic, child-centered curriculum.29 Although both estimates of the benefit-to-cost ratio are quite high, it is understandable that the ratio would be higher for a teacher training program implemented once with some refresher sessions than for the provision of an additional year of education. Panel B uses the more conservative discount rate of 5 percent to translate lifetime earnings into a present value (Haacker et al., 2020). All benefit-to-cost ratios remain above 1. The minimum value for this ratio is 14.2 for the Gallego et al. (2021) study, which assessed the impact of a change in the math curriculum. The maximum value, again for the Wolf et al. (2019a) study, is still quite large at 49.7.

29 Though poor children did show positive and significant gains in cognitive skills in Brinkman et al. (2017), the average treatment effects for all cognitive skills in the full sample were not statistically distinguishable from zero. However, even in the full sample, there were average gains in social-emotional skills. If we estimate wage returns only for these gains in social-emotional skills with a discount rate of 3 percent, we still find sizable benefit-to-cost ratios of 8.1 (lower bound) to 24.5 (upper bound).

6. Discussion and conclusion

This study investigates whether current investments in preprimary education are too low through a systematic review of experimental and quasi-experimental evidence from around the world. Using 1,017 outcomes extracted and standardized from 55 studies, we use robust variance meta-regression to assess the returns to investment in preprimary education by establishing whether, on average, there is demand for preprimary programs and the extent to which preprimary education improves children's skills during the preprimary period. To gauge returns more concretely, we also combine estimates from the literature on the relationship between improvements in cognitive skills and future earnings with study-specific estimates of improvements in these skills in a subset of studies from low- and middle-income countries that report per-child costs.

We find that interventions that expand access to preprimary education lead to significant increases in the take-up of preprimary education services and school participation during the preprimary period. On average, preprimary education programs significantly improve children's cognitive and social-emotional skills and executive functions in the short run, suggesting that these kinds of services promote learning and skill development better than children's alternative care options during the preprimary period. A translation of these effects into gains in children's earnings suggests sizable benefit-to-cost ratios, ranging from 1.7 to 103.5, and thus provides strong evidence in favor of increasing funding for early childhood education.

We also use meta-regression methods to indirectly assess whether spending more on preprimary education would be a cost-effective way to improve children's skills. We assess whether the skill advantages among children who benefit from preprimary education persist beyond preprimary education and into adulthood, which would be consistent with preprimary education complementing later education. We also test whether targeted programs would be more cost-effective by investigating whether disadvantaged populations consistently benefit more from preprimary education interventions than populations with higher socioeconomic status. Although fewer studies track children beyond the preprimary years, we find evidence of statistically significant persistent effects of preprimary interventions on both cognitive skills and "non-cognitive" skills. Even fewer studies, all of them from high-income countries, follow children into adulthood.
Results from this sample suggest positive but statistically insignificant effects on health and labor outcomes. We also find significantly higher impacts for children who come from groups associated with lower socioeconomic status.

This set of results, together with the currently low coverage rates in low- and middle-income countries, suggests that current levels of investment in preprimary education may be suboptimal. That is, an increase in spending on preprimary education coverage and quality may improve the overall efficiency of education spending, particularly if investments first target children from lower socioeconomic backgrounds. Given that children from disadvantaged backgrounds tend to have very low access to preprimary education and alternative learning opportunities (McCoy et al., 2018), minimum quality thresholds for services may depend on local conditions. Thus, even preprimary interventions that do not cover every aspect of quality could still improve learning outcomes for very disadvantaged children (Cascio & Schanzenbach, 2014). Our results suggest that existing preprimary programs have, on average, promoted greater skill development than children would have experienced in the absence of these programs (either at home or in alternative programs), even in low- and middle-income countries.

While our sample of 1,017 estimated effect sizes is large, we recognize some limitations in our study design that may affect the magnitude and significance of the average effect sizes we estimate for school participation, knowledge and skills, and health and economic well-being. For example, we restricted our search to studies that had been published in peer-reviewed journals or through formal working paper series. Given the levels of publication bias estimated across multiple disciplines (Ioannidis, 2008; Ioannidis et al., 2017), this decision might bias us toward a positive and significant average effect.
On the other hand, we also aimed to extract all relevant outcomes from each included study, even if they might not be considered the best metrics to assess our outcomes of interest. This decision, and the attendant measurement error (attenuation bias) in many extracted outcomes, could bias our average estimates toward zero by adding many insignificant effects. Nevertheless, from our large sample of published studies and formal working papers, we find positive and often persistent average effects of preprimary interventions on school participation and progression, as well as on cognitive and social-emotional skills, during the preprimary period and beyond. Future research that aggregates evidence across studies would ideally try to understand variation in effectiveness based on program and child characteristics. Identifying what makes programs most effective could help make the returns and cost-effectiveness of preprimary education programs even higher.

References
Bailey, D.H., G.J. Duncan, F. Cunha, B.R. Foorman, and D.S. Yeager. 2020a. "Persistence and Fade-Out of Educational-Intervention Effects: Mechanisms and Potential Solutions." Psychological Science in the Public Interest 21 (2): 55-97.
Bailey, M.J., B.D. Timpe, and S. Sun. 2020b. Prep School for Poor Kids: The Long-Run Impacts of Head Start on Human Capital and Economic Self-Sufficiency. (No. w28268): National Bureau of Economic Research.
Baird, S., F.H. Ferreira, B. Özler, and M. Woolcock. 2014. "Conditional, unconditional and everything in between: a systematic review of the effects of cash transfer programmes on schooling outcomes." Journal of Development Effectiveness 6 (1): 1-43.
Baker, M., J. Gruber, and K. Milligan. 2015. Non-cognitive deficits and young adult outcomes: The long-run impacts of a universal child care program. (No. w21571): National Bureau of Economic Research.
Baker, M., J. Gruber, and K. Milligan. 2008. "Universal child care, maternal labor supply, and family well-being." Journal of Political Economy 116 (4): 709-745.
Banerjee, A., S. Cole, E. Duflo, and L. Linden. 2007. "Remedying Education: Evidence from Two Randomized Experiments in India." Quarterly Journal of Economics 122 (3): 1235-1264.
Belfield, C., A.B. Bowden, A. Klapp, H. Levin, R. Shand, and S. Zander. 2015. "The economic value of social and emotional learning." Journal of Benefit-Cost Analysis 6 (3): 508-544.
Berkes, J., A. Bouguen, D. Filmer, and T. Fukao. 2019. Improving Preschool Provision and Encouraging Demand: Heterogeneous Impacts of a Large-Scale Program. Washington D.C.: The World Bank Working Paper Series.
Berlinski, S., S. Galiani, and M. Manacorda. 2008. "Giving children a better start: Preschool attendance and school-age profiles." Journal of Public Economics 92 (5-6): 1416-1440.
Bernal, R., and S.M. Ramírez. 2019. "Improving the quality of early childhood care at scale: The effects of “From Zero to Forever”." World Development 118: 91-105.
Bernal, R., O. Attanasio, X. Pena, and M. Vera-Hernández. 2019. "The effects of the transition from home-based childcare to childcare centers on children’s health and development in Colombia." Early Childhood Research Quarterly 47: 418-431.
Bouguen, A., D. Filmer, K. Macours, and S. Naudeau. 2018. "Preschool and parental response in a second best world: Evidence from a school construction experiment." Journal of Human Resources 53 (2): 474-512.
Brinkman, S.A., A. Hasan, H. Jung, A. Kinnell, and M. Pradhan. 2017. "The impact of expanding access to early childhood education services in rural Indonesia." Journal of Labor Economics 35 (S1): S305-S335.
Brodeur, A., N. Cook, and A. Heyes. 2020. "Methods matter: P-hacking and publication bias in causal analysis in economics." American Economic Review 110 (11): 3634-3660.
Brotman, L.M., S. Dawson-McClure, D. Kamboukos, K.Y. Huang, E.J. Calzada, K. Goldfeld, and E. Petkova. 2016. "Effects of ParentCorps in prekindergarten on child mental health and academic performance: Follow-up of a randomized clinical trial through 8 years of age." JAMA Pediatrics 170 (12): 1149-1155.
Campbell, F., G. Conti, J.J. Heckman, S.H. Moon, R. Pinto, E. Pungello, and Y. Pan. 2014. "Early childhood investments substantially boost adult health." Science 343 (6178): 1478-1485.
Carneiro, P., and R. Ginja. 2014. "Long-term impacts of compensatory preschool on health and behavior: Evidence from Head Start." American Economic Journal: Economic Policy 6 (4): 135-73.
Cascio, E.U. 2015. The promises and pitfalls of universal early education. IZA World of Labor.
Cascio, E.U., and D.W. Schanzenbach. 2014. Proposal 1: Expanding preschool access for disadvantaged children. Policies to address poverty in America, p.19.
Cascio, E.U., and D.W. Schanzenbach. 2013. "The impacts of expanding access to high-quality preschool education." National Bureau of Economic Research (No. w19735).
Chetty, R., J.N. Friedman, N. Hilger, E. Saez, D.W. Schanzenbach, and D. Yagan. 2011. "How does your kindergarten classroom affect your earnings? Evidence from Project STAR." The Quarterly Journal of Economics 126 (4): 1593-1660.
Clements, D.H., J. Sarama, C.B. Wolfe, and M.E. Spitler. 2013. "Longitudinal evaluation of a scale-up model for teaching mathematics with trajectories and technologies: Persistence of effects in the third year." American Educational Research Journal 50 (4): 812-850.
Conti, G., J.J. Heckman, and R. Pinto. 2016. "The effects of two influential early childhood interventions on health and healthy behaviour." The Economic Journal 126 (596): F28-F65.
Cunha, F., and J. Heckman. 2007. "The technology of skill formation." American Economic Review 97 (2): 31-47.
Currie, J.M. 2001. Early childhood intervention programs: what do we know? Brookings Institution, Brookings Roundtable on Children.
Dean, J.T., and S. Jayachandran. 2020. "Attending kindergarten improves cognitive development in India, but all kindergartens are not equal."
Dehaene, S. 2010. Reading in the brain: The new science of how we read. Penguin Group USA.
Deming, D. 2009. "Early childhood intervention and life-cycle skill development: Evidence from Head Start." American Economic Journal: Applied Economics 1 (3): 111-34.
DeWind, N.K., J. Park, M.G. Woldorff, and E.M. Brannon. 2019. "Numerical encoding in early visual cortex." Cortex 114: 76-89.
Diamond, A. 2013. "Executive functions." Annual Review of Psychology 64: 135-168.
Dillon, M.R., H. Kannan, J.T. Dean, E.S. Spelke, and E. Duflo. 2017. "Cognitive science in the field: A preschool intervention durably enhances intuitive but not formal mathematics." Science 357 (6346): 47-55.
Duflo, E., P. Dupas, and M. Kremer. 2011. "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya." American Economic Review 101 (5): 1739-74.
Duncan, G.J., and K. Magnuson. 2013. "Investing in preschool programs." Journal of Economic Perspectives 27 (2): 109-32.
Evans, D.K., and A. Popova. 2016. "What really works to improve learning in developing countries? An analysis of divergent findings in systematic reviews." The World Bank Research Observer 31 (2): 242-270.
Evans, D.K., and F. Yuan. 2019. What We Learn about Girls' Education from Interventions that Do Not Focus on Girls. Washington D.C.: World Bank Policy Research Working Paper 8944.
Evans, D.K., P. Jakiela, and H.A. Knauer. 2021. "The impact of early childhood interventions on mothers." Science 372 (6544): 794-796.
Fernald, L.C., A. Weber, E. Galasso, and L. Ratsifandrihamanana. 2011. "Socioeconomic gradients and child development in a very low income population: evidence from Madagascar." Developmental Science 14 (4): 832-847.
Fernald, L.C., E. Prado, P. Kariger, and A. Raikes. 2017. A toolkit for measuring early childhood development in low and middle-income countries. Washington D.C.: The World Bank.
Galasso, E., and A. Wagstaff. 2019. "The aggregate income losses from childhood stunting and the returns to a nutrition intervention aimed at reducing stunting." Economics & Human Biology 34: 225-238.
Gallego, F.A., E. Näslund-Hadley, and M. Alfonso. 2021. "Changing Pedagogy to Improve Skills in Preschools: Experimental Evidence from Peru." The World Bank Economic Review 35 (1): 261-286.
Ganimian, A.J., K. Muralidharan, and C.R. Walters. 2021. Augmenting State Capacity for Child Development: Experimental Evidence from India. (No. w28780): National Bureau of Economic Research.
Garces, E., D. Thomas, and J. Currie. 2002. "Longer-term effects of Head Start." American Economic Review 92 (4): 999-1012.
Gelber, A., and A. Isen. 2013. "Children's schooling and parents' behavior: Evidence from the Head Start Impact Study." Journal of Public Economics 101: 25-38.
Haacker, M., T.B. Hallett, and R. Atun. 2020. "On discount rates for economic evaluations in global health." Health Policy and Planning 35 (1): 107-114.
Hamlin, J., T. Ullman, J. Tenenbaum, N. Goodman, and C. Baker. 2013. "The mentalistic basis of core social cognition: Experiments in preverbal infants and a computational model." Developmental Science 16 (2): 209-226.
Havnes, T., and M. Mogstad. 2011. "No child left behind: Subsidized child care and children's long-run outcomes." American Economic Journal: Economic Policy 3 (2): 97-129.
Heckman, J.J. 2006. "Skill formation and the economics of investing in disadvantaged children." Science 312 (5782): 1900-1902.
Heckman, J.J. 2007. "The economics, technology, and neuroscience of human capability formation." Proceedings of the National Academy of Sciences 104 (33): 13250-13255.
Heckman, J.J., S.H. Moon, R. Pinto, P.A. Savelyev, and A. Yavitz. 2010. "The rate of return to the HighScope Perry Preschool Program." Journal of Public Economics 94 (1-2): 114-128.
Hedberg, E.C. 2011. ROBUMETA: Stata module to perform robust variance estimation in meta-regression with dependent effect size estimates. Boston: Boston College Working Papers in Economics.
Hedges, L.V., E. Tipton, and M.C. Johnson. 2010. "Robust variance estimation in meta-regression with dependent effect size estimates." Research Synthesis Methods 1 (1): 39-65.
Hendren, N., and B. Sprung-Keyser. 2020. "A unified welfare analysis of government policies." The Quarterly Journal of Economics 135 (3): 1209-1318.
Hidrobo, M., J. Hoddinott, N. Kumar, and M. Olivier. 2018. "Social protection, food security, and asset formation." World Development 101: 88-103.
Horowitz, J.L., and C.F. Manski. 2000. "Nonparametric analysis of randomized experiments with missing covariate and outcome data." Journal of the American Statistical Association 95 (449): 77-84.
Ioannidis, J.P. 2008. "Why most discovered true associations are inflated." Epidemiology 640-648.
Ioannidis, J.P., T.D. Stanley, and H. Doucouliagos. 2017. "The power of bias in economics research."
Jara-Ettinger, J., E. Gibson, C. Kidd, and S. Piantadosi. 2016. "Native Amazonian children forego egalitarianism in merit-based tasks when they learn to count." Developmental Science 19 (6): 1104-1110.
Johnson, R.C., and C.K. Jackson. 2019. "Reducing inequality through dynamic complementarity: Evidence from Head Start and public school spending." American Economic Journal: Economic Policy 11 (4): 310-49.
Kline, P., and C.R. Walters. 2016. "Evaluating public programs with close substitutes: The case of Head Start." The Quarterly Journal of Economics 131 (4): 1795-1848.
Knudsen, E.I. 2004. "Sensitive periods in the development of the brain and behavior." Journal of Cognitive Neuroscience 16 (8): 1412-1425.
Knudsen, E.I., J.J. Heckman, J.L. Cameron, and J.P. Shonkoff. 2006. "Economic, neurobiological, and behavioral perspectives on building America’s future workforce." Proceedings of the National Academy of Sciences 103 (27): 10155-10162.
Kremer, M., and A. Holla. 2009. "Improving education in the developing world: what have we learned from randomized evaluations?" Annual Review of Economics 1 (1): 513-542.
Krueger, A.B. 1999. "Experimental estimates of education production functions." The Quarterly Journal of Economics 114 (2): 497-532.
Lee, D.S. 2009. "Training, wages, and sample selection: Estimating sharp bounds on treatment effects." The Review of Economic Studies 76 (3): 1071-1102.
Lipsey, M.W., and D.B. Wilson. 2001. Practical meta-analysis. SAGE Publications, Inc.
Lipsey, M.W., D.C. Farran, and K. Durkin. 2018. "Effects of the Tennessee Prekindergarten Program on children’s achievement and behavior through third grade." Early Childhood Research Quarterly 45: 155-176.
Martinez, S., S. Naudeau, and V. Pereira. 2017. Preschool and child development under extreme poverty: evidence from a randomized experiment in rural Mozambique. Washington D.C.: The World Bank.
McCoy, D.C., C. Salhi, H. Yoshikawa, M. Black, P. Britto, and G. Fink. 2018. "Home- and center-based learning opportunities for preschoolers in low- and middle-income countries." Children and Youth Services Review 88: 44-56.
McCoy, D.C., H. Yoshikawa, K.M. Ziol-Guest, G.J. Duncan, H.S. Schindler, K. Magnuson, R. Yang, A. Koepp, and J.P. Shonkoff. 2017. "Impacts of early childhood education on medium- and long-term educational outcomes." Educational Researcher 46 (8): 474-487.
Meager, R. 2019. "Understanding the average impact of microcredit expansions: A Bayesian hierarchical analysis of seven randomized experiments." American Economic Journal: Applied Economics 11 (1): 57-91.
Naudeau, S., S. Martinez, P. Premand, and D. Filmer. 2011. "Cognitive development among young children in low-income countries." In No Small Matter: The Impact of Poverty, Shocks, and Human Capital Investments in Early Childhood Development, edited by Harold Alderman, 9-50. Washington D.C.: The World Bank.
Nores, M., and W.S. Barnett. 2010. "Benefits of early childhood interventions across the world: (Under) investing in the very young." Economics of Education Review 29 (2): 271-282.
Ozler, B., L.C.H. Fernald, P. Kariger, C. McConnell, M. Neuman, and E. Fraga. 2018. "Combining preschool teacher training with parenting education: A cluster-randomized controlled trial." Journal of Development Economics 133: 448-467.
Phillips, D., M. Lipsey, K. Dodge, R. Haskins, D. Bassok, M. Burchinal, G. Duncan, M. Dynarski, K. Magnuson, and C. Weiland. 2017. The Current State of Scientific Knowledge on Pre-Kindergarten Effects. Washington D.C.: The Brookings Institution.
Raver, C.C., S.M. Jones, C. Li-Grining, F. Zhai, M.W. Metzger, and B. Solomon. 2009. "Targeting children's behavior problems in preschool classrooms: A cluster-randomized controlled trial." Journal of Consulting and Clinical Psychology 77 (2): 302.
Rossin-Slater, M., and M. Wüst. 2020. "What is the Added Value of Preschool for Poor Children? Long-Term and Intergenerational Impacts and Interactions with an Infant Health Intervention." American Economic Journal: Applied Economics 12 (3): 255-86.
Schady, N., J. Behrman, M.C. Araujo, R. Azuero, R. Bernal, D. Bravo, F. Lopez-Boo, et al. 2014. Wealth gradients in early childhood cognitive development in five Latin American countries. Washington D.C.: The World Bank.
Schultz, T.P. 2010. "Population and Health Policies." In Handbook of Development Economics 5, edited by Mark Rosenzweig and Dani Rodrik, 4785-4881.
Shonkoff, J.P., D.A. Phillips, and National Research Council. 2000. "The developing brain." In From Neurons to Neighborhoods: The Science of Early Childhood Development. National Academies Press (US).
Spelke, E., and K. Shutts. Forthcoming. "Chapter 1: Learning in the Early Years." In Quality Early Learning: Nurturing Children's Potential, edited by A. Devercelli and M. Bendini.
Spier, E., K. Kamto, A. Molotsky, A. Rahman, N. Hossain, Z. Nahar, and H. Khondker. 2020. "Bangladesh Early Years Preschool Program Impact Evaluation."
Tamis-LeMonda, C.S., K.E. Adolph, S.A. Lobo, L.B. Karasik, S. Ishak, and K.A. Dimitropoulou. 2008. "When infants take mothers' advice: 18-month-olds integrate perceptual and social information to guide motor action." Developmental Psychology 44 (3): 734.
Tanner-Smith, E.E., and E. Tipton. 2014. "Robust variance estimation with dependent effect sizes: Practical considerations including a software tutorial in Stata and SPSS." Research Synthesis Methods 5 (1): 13-30.
Tanner-Smith, E.E., E. Tipton, and J.R. Polanin. 2016. "Handling complex meta-analytic data structures using robust variance estimates: A tutorial in R." Journal of Developmental and Life-Course Criminology 2 (1): 85-112.
Tipton, E. 2015. "Small sample adjustments for robust variance estimation with meta-regression." Psychological Methods 20 (3): 375.
Tipton, E., J.E. Pustejovsky, and H. Ahmadi. 2019. "Current practices in meta-regression in psychology, education, and medicine." Research Synthesis Methods 10 (2): 180-194.
Tomasello, M., M. Carpenter, J. Call, T. Behne, and H. Moll. 2005. "Understanding and sharing intentions: The origins of cultural cognition." Behavioral and Brain Sciences 28 (5): 675-691.
UIS (UNESCO Institute for Statistics). 2018. UIS Stat. Accessed March 4, 2021. http://data.uis.unesco.org/.
UNICEF. 2019. A World Ready to Learn: Prioritizing Quality Early Childhood Education. Global Report. New York: UNICEF.
van Huizen, T., and J. Plantenga. 2018. "Do children benefit from universal early childhood education and care? A meta-analysis of evidence from natural experiments." Economics of Education Review 66: 206-222.
Vivalt, E. 2020. "How much can we generalize from impact evaluations?" Journal of the European Economic Association 18 (6): 3045-3089.
Wang, Y., R. Williams, L. Dilley, and D.M. Houston. 2020. "A meta-analysis of the predictability of LENA™ automated measures for child language development." Developmental Review 57: 100921.
Wolf, S., J.L. Aber, J.R. Behrman, and E. Tsinigo. 2019a. "Experimental impacts of the “Quality Preschool for Ghana” interventions on teacher professional well-being, classroom quality, and children’s school readiness." Journal of Research on Educational Effectiveness 12 (1): 10-37.
Wolf, S., J.L. Aber, J.R. Behrman, and M. Peele. 2019b. "Longitudinal causal impacts of preschool teacher training on Ghanaian children’s school readiness: Evidence for persistence and fade-out." Developmental Science 22 (5): e12878.
World Bank. 2013. What Matters Most for Early Childhood Development: A Framework Paper. SABER Working Paper Series. Washington, D.C.: World Bank.
World Bank. 2018. World Development Report 2018: Learning to Realize Education's Promise. Washington, DC: World Bank.
Yoshikawa, H., C. Weiland, J. Brooks-Gunn, M.R. Burchinal, L.M. Espinosa, W.T. Gormley, J. Ludwig, K.A. Magnuson, D. Phillips, and M.J. Zaslow. 2013. Investing in Our Future: The Evidence Base on Preschool Education. Society for Research in Child Development.
Yuan, S., and C. Fisher. 2009. "Really? She blicked the baby? Two-year-olds learn combinatorial facts about verbs by listening." Psychological Science 619-626.
Zelazo, P.D., U. Müller, D. Frye, S. Marcovitch, G. Argitis, J. Boseovski, J.K. Chiang, et al. 2003. "The development of executive function in early childhood." Monographs of the Society for Research in Child Development i-151.
Figures and tables

Figure 1a: Children in early childhood education by country income and household wealth
[Figure: share of children in early childhood education by country income group (Low Income, Low-Middle Income, High-Middle Income, High Income), for the bottom and top household wealth quintiles.]
Note: Data from the Multiple Indicator Cluster Surveys, reported in McCoy et al., 2018.

Figure 2b: Children receiving stimulation at home and attending early childhood education by country income
[Figure: shares of children by country income group (Low Income, Low-Middle Income, High-Middle Income, High Income) in four categories: high stimulation & preprimary; low stimulation & preprimary; high stimulation & no preprimary; low stimulation & no preprimary.]
Note: Data from the Multiple Indicator Cluster Surveys, reported in McCoy et al., 2018.

Figure 3: Study Selection Process
Identification: studies identified through database searches (N=183) and through other sources, e.g. experts (N=87).
Screening 1 (citation and abstract screening): total studies screened (N=270); studies excluded (N=111). Reasons for exclusion:
- Not center-based program
- Not targeting children 3 to 6
- Not peer-reviewed journal article or WBG/NBER working paper series
- Not quasi-experimental or experimental study
- Duplicates/working papers of final published papers
Screening 2 (full-text screening): full-text studies assessed for eligibility (N=159); studies excluded (N=68). Reasons for exclusion:
- Study impact evaluation (IE) quality checks (identification strategy, attrition, compliance, etc.)
- No impact estimate, standard errors, confidence interval, p-value, sample size, etc.
- Not enough information to calculate effect size
Included: total studies included for review and for citation (N=91); total studies included for review or data extraction (N=55).
Note: [Identification/Search Stage: Phase 1 and Phase 2]: includes searching literature from Economics, Education and Psychology, SIEF Portfolio, Google Scholar, ProQuest search (WB Librarian), external reviewers, experts, hand search and bibliographies. [Screening 1: Phase 3]: review of citations and abstracts to eliminate duplicates, unpublished papers, and papers that do not rigorously estimate the impacts of preprimary programs or policies. [Screening 2: Phase 4]: review of full-text studies to eliminate papers lacking a rigorous quantitative analysis (e.g., identification strategy, attrition, compliance). [Data Extraction: Phase 5]: coding of information at the study, intervention, and outcome (effect size) level.

Figure 4: Geographical Distribution of Included Studies (N=55)
[Figure: geographical distribution of the 55 included studies.]
Note: The sample of 55 studies included in our review covers the following countries: Argentina, Bangladesh, Cambodia, Canada, Chile, Colombia, Denmark, Gambia, Ghana, India, Indonesia, Malawi, Mozambique, Norway, Paraguay, Peru, Spain, U.S., and Uruguay.
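The stage counts in Figure 3 can be checked for internal consistency: the two identification sources sum to the number of studies screened, and each exclusion step subtracts from the previous total. A minimal Python sketch of that arithmetic, using only the numbers shown in the figure (the `SelectionFlow` class is a hypothetical helper, not part of the authors' workflow):

```python
from dataclasses import dataclass


@dataclass
class SelectionFlow:
    """Hypothetical container for the stage counts reported in Figure 3."""
    database_hits: int     # identified through database searches
    other_sources: int     # identified through other sources (e.g. experts)
    excluded_stage1: int   # excluded at citation/abstract screening
    excluded_stage2: int   # excluded at full-text screening
    extracted: int         # included for data extraction

    def screened(self) -> int:
        return self.database_hits + self.other_sources

    def full_text_assessed(self) -> int:
        return self.screened() - self.excluded_stage1

    def included_for_review(self) -> int:
        return self.full_text_assessed() - self.excluded_stage2


flow = SelectionFlow(database_hits=183, other_sources=87,
                     excluded_stage1=111, excluded_stage2=68, extracted=55)
print(flow.screened(), flow.full_text_assessed(), flow.included_for_review(), flow.extracted)
# prints: 270 159 91 55
```

The 55 studies used for data extraction appear to be a subset of the 91 retained for review and citation, so the last step is a selection rather than a further subtraction with its own reported exclusion count.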
Table 1: Search and inclusion criteria for studies
Panel A: Search
Sources: APA PsycNet, Academic Search Elite/EBSCOhost, EconLit, ERIC, IDEAS, ScienceDirect, Social Science Research Network (SSRN), Google Scholar, ProQuest Database, MEDLINE, NBER, WBG Working Papers, Strategic Impact Evaluation Fund (SIEF) Portfolio, JSTOR, JOLIS library catalogue (International Monetary Fund, World Bank and International Finance Corporation), PsycINFO, PubMed, and Web of Science
Search terms: (Preprimary OR preprimary OR Pre-school OR Kindergarten OR early childhood development OR early childhood education OR early child development OR early child education OR early learning OR early cognitive development OR early skills development) AND (impact evaluation OR field experiment OR randomized OR program evaluation OR meta-analysis OR systematic review OR synthesis review OR qualitative study) >2005. Age 3 to 6; Countries: Global
Panel B: Inclusion criteria
Program beneficiaries: Children aged 3-6 years
Interventions: Programs that provide group-based childcare to children 3 to 6 years old in a center setting with a formal or informal developmental and educational focus. This includes formal preprimary schools, community schools, preschools, kindergarten, pre-kindergarten, or daycare with an educational component. We also include supplementary or co-interventions targeting parents, teachers, or other inputs (e.g., teacher training, curricula, pedagogical approaches, infrastructure) whose primary outcomes of interest are either child outcomes or intermediary outcomes that affect the quality of a preschool program.
Study design: Studies implementing experimental and quasi-experimental methods with a credible source of exogenous variation. We included studies employing one of the following designs: randomized controlled trial (RCT), regression discontinuity design (RDD), difference-in-differences (DID), instrumental variables (IV), and matching methods. At least one treatment arm must be able to isolate the effect of a preprimary program.
Outcomes of interest: Child outcomes: school participation and progression, cognitive skills, social-emotional skills, behavior, long-term educational attainment, adult health, and labor outcomes. Teacher/parent (adult) outcomes: classroom practices and engagement in stimulation activities.
Publication type: Peer-reviewed journal articles, WB/NBER Working Papers, and other working/discussion papers only if they included a formal institution/citation. We excluded studies that are not part of a working paper series (including PhD dissertations, job market papers, and conference working papers) and other institutional policy publications.
Publication date: Any
Geography: Global

Table 2: Study characteristics
Characteristic | Mean (1) | Standard deviation (2)
Focused on high-income country | 0.67 | 0.47
Publication timing: 2007-2012 | 0.33 | 0.47
Publication timing: 2013-2018 | 0.42 | 0.50
Publication timing: 2019-2021 | 0.25 | 0.44
Published in peer-reviewed journal | 0.85 | 0.36
Discipline: Economics | 0.51 | 0.50
Discipline: Education | 0.44 | 0.50
Discipline: Psychology | 0.05 | 0.23
Randomized controlled trial | 0.71 | 0.46
Evaluation of expansion in coverage | 0.47 | 0.50
Notes: Observations are studies, N=55.
Table 3: Characteristics of interventions
Characteristic | High-income countries (1) | Low- & middle-income countries (2)
Program category:
Teacher professional development | 0.55 (0.07) | 0.69 (0.08)
Subsidized or free access | 0.49 (0.07) | 0.03 (0.03)
Change in curriculum | 0.29 (0.02) | 0.16 (0.07)
Change in pedagogy | 0.04 (0.03) | 0.13 (0.06)
Provision of materials | 0.14 (0.05) | 0.28 (0.08)
Provision of new staff | 0.00 (0.00) | 0.09 (0.05)
Provision of health and nutrition services | 0.08 (0.04) | 0.00 (0.00)
School construction | 0.02 (0.02) | 0.34 (0.09)
Parental engagement | 0.04 (0.03) | 0.50 (0.09)
Community outreach | 0.00 (0.00) | 0.16 (0.07)
Extension of school day | 0.02 (0.02) | 0.00 (0.00)
Teacher payments | 0.00 (0.00) | 0.09 (0.05)
Type of provision:
Formal preprimary | 0.90 (0.04) | 0.56 (0.09)
Community-based preprimary | 0.00 (0.00) | 0.44 (0.09)
Daycare with educational component | 0.10 (0.04) | 0.00 (0.00)
School management:
Public | 0.57 (0.07) | 0.63 (0.09)
Public-private | 0.43 (0.07) | 0.25 (0.08)
Community | 0.00 (0.00) | 0.13 (0.06)
Teacher formally qualified | 0.80 (0.06) | 0.50 (0.09)
Program targets disadvantaged population | 0.71 (0.07) | 0.66 (0.09)
Notes: Observations are evaluated programs, N=81. Standard errors of presented means are in parentheses.
Table 4: Characteristics of extracted outcomes
Characteristic | Mean (1) | Median (2) | Standard deviation (3)
Sample size | 978,113 | 1,483 | 4,551,054
Fraction that are estimates for separate SES subpopulations | 0.25 | | 0.44
Fraction measured during period:
Preprimary (3-6 years) | 0.59 | | 0.49
Post-preprimary (7-18 years) | 0.27 | | 0.45
Adulthood (after 18 years) | 0.13 | | 0.34
Fraction measuring domain:
School participation and progression | 0.17 | | 0.38
Language | 0.09 | | 0.29
Literacy | 0.05 | | 0.22
Math | 0.06 | | 0.24
General cognitive | 0.05 | | 0.21
Executive function | 0.07 | | 0.26
Social-emotional learning | 0.11 | | 0.31
Behavior | 0.07 | | 0.26
Health | 0.10 | | 0.30
Labor | 0.07 | | 0.25
Parental engagement | 0.08 | | 0.28
Teacher practices | 0.05 | | 0.22
Teacher professional wellbeing | 0.01 | | 0.12
Child development (index) | 0.01 | | 0.10
Notes: Observations are extracted outcomes from studies in the sample, N=1,017. SES refers to socio-economic status.

Table 5: Average effects of preprimary interventions on participation and skills during the preprimary period
Columns: (1) Pooled sample; (2) High-income; (3) Low- & middle-income; (4) Expansion, high-income; (5) Expansion, low- & middle-income; (6) Quality improvement, high-income; (7) Quality improvement, low- & middle-income.
Panel A: School participation and progression
School participation and progression 1.4048** 1.599* 1.2077 1.599* 1.4578 (0.4776) (0.739) (0.6391) (0.739) (0.775) N 29 7 22 7 19
Panel B: Cognitive skills
Pooled skills 0.147*** 0.179*** 0.122*** 0.112*** 0.0762 0.249*** 0.158*** (0.0259) (0.0431) (0.0331) (0.0270) (0.0504) (0.0794) (0.0416) N 131 76 55 25 33 51 22
Language 0.108*** 0.115** 0.0992 0.0978** 0.0521 0.143* 0.1859 (0.0295) (0.0367) (0.0526) (0.0310) (0.0315) (0.0692) (0.1355) N 53 35 18 12 11 23 7
Literacy 0.216*** 0.238** 0.143 0.240** (0.0634) (0.0783) (0.105) (0.0937) 31 28 3 22
Math 0.224*** 0.360** 0.164*** 0.122* 0.1447 0.461** 0.1773** (0.0469) (0.111) (0.0406) (0.0161) (0.1323) (0.131) (0.0394) 28 10 18 5 7 5 11
General 0.0844* 0.0954 0.0803 0.128 0.0781 (0.0366) (0.0382) (0.0464) (0) (0.0570) 19 3 16 2 14
Panel C:
Executive functions and social-emotional skills
Pooled skills 0.121*** 0.136*** 0.1138*** -0.0122 0.0852 0.183*** 0.139 (0.0279) (0.0494) (0.0330) (0.0645) (0.0510) (0.0531) (0.0429) N 114 75 39 9 23 66 16
Executive functions 0.0951* 0.169* 0.0652 0.0124 0.218** (0.0387) (0.0778) (0.0394) (0.0121) (0.0629) N 42 31 11 8 30
Social-emotional skills 0.1150*** 0.0935** 0.1301** 0.1266 0.0995* 0.136* (0.0301) (0.0318) (0.0463) (0.0798) (0.0432) (0.0574) N 50 22 28 15 19 13
Behavior 0.075 -0.0114 0.140* (0.050) (0.0720) (0.0609) N 22 5 17
Notes: Standard errors estimated using small sample adjustments as in Tipton (2015) are in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Number of observations (N) refers to the number of study-specific average effects used in the specification. Results in the “Expansion” and “Quality improvement” columns are estimated average effects from studies evaluating interventions aimed at expanding preschool access or at improving preschool quality, respectively. Specifications that pool skills use one large category for extracted outcomes: either cognitive skills (which combines language, literacy, math, and general cognitive skills) or what is typically referred to as non-cognitive skills (executive functions, social-emotional skills, and behaviors). See Appendix 1 for details on definitions of school participation and progression; language, literacy, math, and general skills; executive functions; social-emotional skills; and behavior. Cells are blank when all observations came from a single study.
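The average effects in Table 5 and the later tables come from meta-regressions with robust standard errors that allow several effect sizes per study. The sketch below illustrates the general idea of robust variance estimation (RVE): study-level correlated-effects weights, a weighted least-squares estimate, and a sandwich variance clustered by study. It is an illustration under stated assumptions, not the authors' estimator: the between-study variance `tau2` is supplied by the caller rather than estimated from the data, and a simple m/(m - p) inflation stands in for the Tipton (2015) small-sample adjustment used in the tables. The toy data and the `low_ses` indicator are hypothetical; with that indicator as a moderator, the slope is a low-minus-high SES difference in the spirit of Table 9.

```python
import numpy as np


def rve_meta_regression(effects, variances, study_ids, X, tau2=0.0):
    """Weighted meta-regression with study-clustered (sandwich) standard errors.

    Sketch only: correlated-effects weights in the spirit of Hedges, Tipton and
    Johnson (2010), a caller-supplied between-study variance tau2 (not estimated
    here), and a simple m / (m - p) small-sample inflation in place of the
    Tipton (2015) adjustment.
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    study_ids = np.asarray(study_ids)
    X = np.asarray(X, dtype=float).reshape(len(effects), -1)

    # Each effect in study j gets weight 1 / (k_j * (mean sampling variance in j + tau2)).
    studies = np.unique(study_ids)
    w = np.empty_like(effects)
    for s in studies:
        in_s = study_ids == s
        w[in_s] = 1.0 / (in_s.sum() * (variances[in_s].mean() + tau2))

    # Weighted least squares: beta = (X'WX)^{-1} X'WT.
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = bread @ (X.T @ (w * effects))

    # Sandwich "meat": sum over studies of the outer product of X_j' W_j e_j.
    resid = effects - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for s in studies:
        in_s = study_ids == s
        score = X[in_s].T @ (w[in_s] * resid[in_s])
        meat += np.outer(score, score)

    m, p = len(studies), X.shape[1]
    vcov = (m / (m - p)) * bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))


# Hypothetical toy data: four studies, two of which report two effect sizes each.
es = [0.20, 0.10, 0.15, 0.05, 0.30, 0.25]
var = [0.010, 0.012, 0.020, 0.015, 0.030, 0.025]
sid = ["A", "A", "B", "C", "D", "D"]
low_ses = [1, 0, 1, 0, 1, 0]

# Intercept-only model: the weighted average effect size and its robust SE.
beta_avg, se_avg = rve_meta_regression(es, var, sid, np.ones(len(es)))

# Intercept plus a low-SES indicator: the slope is the low-minus-high SES difference.
X_ses = np.column_stack([np.ones(len(es)), low_ses])
beta_ses, se_ses = rve_meta_regression(es, var, sid, X_ses)
print(beta_avg, se_avg)
print(beta_ses, se_ses)
```

Study-level weights keep a study that reports many correlated outcomes from dominating the average, while the clustered sandwich variance remains valid without specifying the within-study correlation; in practice, packages built for this purpose also estimate `tau2` and apply the small-sample corrections.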
Table 6: Average effects of preprimary interventions on the learning environment during the preprimary period
Columns: (1) Pooled sample; (2) High-income; (3) Low- & middle-income.
Parental engagement 0.0970** 0.0997 0.0882* (0.0228) (0.0337) (0.0334) N 28 9 19
Teacher practice in the classroom 0.473*** 0.422** 0.59 (0.0994) (0.106) (0.265) N 47 28 19
Teacher professional wellbeing 0.097 (0.0198) N 14
Notes: Standard errors estimated using small sample adjustments as in Tipton (2015) are in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Number of observations (N) refers to the number of study-specific average effects used in the specification. Cells are blank when all observations came from a single study.

Table 7: Average effects of preprimary interventions on participation and skills after the preprimary period
Columns: (1) Pooled sample; (2) High-income; (3) Low- & middle-income.
Panel A: School participation and progression
School participation and progression post-preprimary 0.142 0.0198 0.272 (0.0888) (0.0347) (0.182) 22 11 11
School participation and progression in adulthood 0.0700* (0.0245) 19
Panel B: Cognitive skills post-preprimary
Pooled skills 0.0718*** 0.0604 0.0862** (0.0236) (0.0375) (0.0311) 62 23 39
Language 0.0710** 0.0481 0.1491 (0.0241) (0.0220) (0.0937) 9 4 5
Literacy 0.0626 0.0842 0.0487 (0.0344) (0.0418) (0.0539) 14 7 7
Math 0.0571 0.0648 0.0512 (0.0311) (0.0566) (0.0361) 29 7 22
General 0.0603 0.0873 (0.0476) (0.0572) 10 5
Panel C: Executive functions and social-emotional skills
Pooled skills 0.0686*** 0.062 0.0750*** (0.0190) (0.0778) (0.0178) 70 36 34
Executive functions post-preprimary 0.0580** 0.0558** (0.00751) (0.00806) 10 8
Social-emotional skills post-preprimary 0.0945** 0.0653 0.0966** (0.0342) (0.1082) (0.0338) 34 13 21
Behavior post-preprimary 0.0555 0.0207 0.0833 (0.0378) (0.0580) (0.0550) 26 21 5
Behavior during adulthood 0.302 (0.2345) 6
Notes: Standard errors estimated using small sample adjustments as in Tipton (2015) are in parentheses.
*** p<0.01, ** p<0.05, * p<0.1. Number of observations (N) refers to the number of study-specific average effects used in the specification. Specifications that pool skills use one large category for extracted outcomes: either cognitive skills (which combines language, literacy, math, and general cognitive skills) or what is typically referred to as non-cognitive skills (executive functions, social-emotional skills, and behaviors). See Appendix 1 for details on definitions of school participation and progression; language, literacy, math, and general skills; executive functions; social-emotional skills; and behavior. Outcomes measured in basic education were measured when children were between the ages of 6 and 18, whether or not they were currently participating in education. Outcomes measured in adulthood were measured after the age of 18. Only studies of interventions in high-income countries measured outcomes in adulthood. Cells are blank when all observations came from a single study.

Table 8: Average effects of preprimary interventions on health and labor after the preprimary period
Outcome | Pooled sample (1) | N
Health post-preprimary | 0.2537 (0.1078) | 15
Health during adulthood | 0.0363 (0.0066) | 12
Labor | 0.0585 (0.0334) | 37
Notes: Standard errors estimated using small sample adjustments as in Tipton (2015) are in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Number of observations (N) refers to the number of study-specific average effects used in the specification. Outcomes measured in basic education were measured when children were between the ages of 6 and 18, whether or not they were currently participating in education. Outcomes measured in adulthood were measured after the age of 18. Only studies of interventions in high-income countries measured outcomes in adulthood.
Table 9: Difference in average effects of preprimary interventions by socio-economic status
Columns: (1) (2) (3)
Panel A: School participation and progression
Low socio-economic status 0.103 0.0560** 0.0569** (0.0674) (0.0264) (0.0264) N 36 36 36
Panel B: Cognitive skills
Low socio-economic status 0.0688** 0.0714** 0.0492 (0.0318) (0.0299) (0.0329) N 44 44 44
Panel C: Executive functions and social-emotional skills
Low socio-economic status 0.0653 0.0894* 0.0981** (0.0517) (0.0523) (0.0482) N 47 47 47
Panel D: Parental engagement
Low socio-economic status 0.0568*** (0.0216) N 20
Panel E: Health
Low socio-economic status 0.0805*** 0.0805*** (0.0202) (0.0202) N 18 18
Panel F: Labor
Low socio-economic status 0.0322 (0.0432) N 32
With controls for education level: x x
With controls for education level and country-income group: x
Notes: Standard errors estimated using small sample adjustments as in Tipton (2015) are in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Number of observations (N) refers to the number of study-specific average effects used in the specification. Appendix Table 2 presents the groups classified as "low socio-economic" sub-populations. All specifications include a constant term (the average effect size for the comparison group), which is omitted here. The coefficient on the low socio-economic status variable represents the average difference in study-specific average effect sizes between children from low and high socio-economic backgrounds.
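Table 10 converts skill gains into lifetime earnings and benefit-to-cost ratios following the steps described in its notes. A minimal Python sketch of those steps follows. It is not the authors' code: the age at which discounting starts (`age_now`), applying real wage growth from that age onward, counting working ages 22 through 65 inclusive, and leaving out the conversion to dollars with IMF exchange rates are all assumptions made for illustration. The per-standard-deviation return parameters are the ones quoted in the Table 10 notes.

```python
# Annual earnings returns per standard deviation, as quoted in the Table 10 notes.
COG_RETURN_HIGH = 0.13   # Kline and Walters (2016)
COG_RETURN_LOW = 0.043   # Galasso and Wagstaff
SEL_RETURN_HIGH = 0.15   # Belfield et al. (2015), upper end of range
SEL_RETURN_LOW = 0.04    # Belfield et al. (2015), lower end of range


def pv_lifetime_earnings(monthly_wage, lfp_rate, real_wage_growth, discount_rate,
                         age_now=5, start_age=22, end_age=65):
    """Present value at age_now of expected earnings from start_age to end_age.

    Illustrative assumptions: expected annual earnings are
    12 * monthly_wage * lfp_rate, they grow at real_wage_growth per year from
    age_now onward, and both working-age endpoints are included.
    """
    annual = 12 * monthly_wage * lfp_rate
    pv = 0.0
    for age in range(start_age, end_age + 1):
        t = age - age_now
        pv += annual * (1 + real_wage_growth) ** t / (1 + discount_rate) ** t
    return pv


def earnings_gain(pv_earnings, cog_effect_sd, cog_return, sel_effect_sd=0.0, sel_return=0.0):
    """Lifetime earnings gain implied by effect sizes (in SD) and per-SD returns."""
    return pv_earnings * (cog_effect_sd * cog_return + sel_effect_sd * sel_return)


def benefit_to_cost(gain, cost_per_child):
    """Ratio of the lifetime earnings gain to the per-child cost."""
    return gain / cost_per_child


# Ratios recomputed from the Panel A gains and per-child costs reported in Table 10.
print(round(benefit_to_cost(302, 53), 1))    # India (Dillon et al.), lower bound; prints 5.7
print(round(benefit_to_cost(3264, 37), 1))   # Peru (Gallego et al.), upper bound; prints 88.2
```

Dividing the reported gains by the reported per-child costs approximately reproduces the ratios in columns 8 and 9; small gaps in a few rows are consistent with rounding of the reported costs.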
Table 10: Returns to cognitive and social-emotional skills from low- and middle-income studies

Columns: (1) total costs per child (dollars); (2) to (7) preprimary augmentation to lifetime earnings (dollars), under the wage-return assumptions listed below the panels; (8) lower-bound benefit-to-cost ratio; (9) upper-bound benefit-to-cost ratio.

Panel A: Discount rate = 3%
                                     (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
India (Ganimian et al, 2021)                                                                11.1    11.8
India (Dillon et al, 2017)           $53    $913  $2,845  $1,429    $302  $2,234    $817     5.7    54.0
Malawi (Ozler et al, 2018)           $72    $782  $2,010  $1,109    $259  $1,487    $586     3.6    28.1
Ghana (Wolf et al, 2019a)            $16    $563  $1,657    $855    $186  $1,280    $478    11.6   103.5
Peru (Gallego et al, 2021)           $37  $3,264                  $1,080                    29.2    88.2
Bangladesh (Spier et al, 2020)      $145  $1,542  $4,175  $2,244    $510  $3,143  $1,212     3.5    28.8

Panel B: Discount rate = 5%
                                     (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
India (Dillon et al, 2017)           $53    $411  $1,281    $643    $136  $1,006    $368     2.6    24.3
Malawi (Ozler et al, 2018)           $72    $370    $950    $524    $122    $703    $277     1.7    13.3
Ghana (Wolf et al, 2019a)            $16    $271    $796    $411     $89    $615    $230     5.6    49.7
Peru (Gallego et al, 2021)           $37  $1,587                    $525                    14.2    42.9
Bangladesh (Spier et al, 2020)      $145    $738  $1,999  $1,074    $244  $1,505    $580     1.7    13.8

                                   (2)  (3)  (4)  (5)  (6)  (7)
High cognitive wage returns        Yes  Yes  Yes  No   No   No
Low cognitive wage returns         No   No   No   Yes  Yes  Yes
High social-emotional wage returns No   Yes  No   No   Yes  No
Low social-emotional wage returns  No   No   Yes  No   No   Yes

Notes: For each country, we first calculate the present discounted value of a child's lifetime earnings, assuming that the child works from age 22 to 65. To do this, we use average nominal monthly wages for the latest year available, as well as the labor force participation rate and the average growth in real wages over the 2015-2019 period. See Appendix Table 4 for these parameters and our data sources for all studies except Ganimian et al. (2021), whose benefit-to-cost ratios we use directly. In Panel A, we discount future earnings to present value at a rate of 3 percent per year; in Panel B, at 5 percent per year.
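The arithmetic behind Table 10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that the gain in lifetime earnings equals the present value of lifetime earnings multiplied by the sum, over the skills included in a column, of the annual wage return per standard deviation times the effect size in standard deviations. It also uses a hypothetical base age of 5 for discounting and a hypothetical present value in the usage lines. The paper's exchange-rate conversion and exact base year are not reproduced, so the output is not meant to match Table 10. The wage returns are those cited in the following paragraph of these notes: 13 and 4.3 percent for cognitive skills, and 4 to 15 percent for social-emotional skills. The example effect sizes (0.09 and 0.165 SD) and cost ($53) are the India (Dillon et al, 2017) values from Appendix Table 4 and Table 10.

```python
# Annual wage returns per SD of skill cited in the notes to Table 10.
RETURNS = {
    "cog_high": 0.13,   # Kline and Walters (2016)
    "cog_low": 0.043,   # Galasso and Wagstaff (2018)
    "se_high": 0.15,    # upper end of the Belfield et al. (2015) range
    "se_low": 0.04,     # lower end of the Belfield et al. (2015) range
}

# (cognitive return, social-emotional return) behind columns (2) to (7).
COLUMNS = [
    ("cog_high", None), ("cog_high", "se_high"), ("cog_high", "se_low"),
    ("cog_low", None), ("cog_low", "se_high"), ("cog_low", "se_low"),
]


def pv_lifetime_earnings(monthly_wage, lfp_rate, real_growth, discount_rate,
                         base_age=5, start_age=22, end_age=65):
    """Expected earnings from start_age to end_age (inclusive), discounted to base_age.

    lfp_rate and real_growth are fractions (Appendix Table 4 reports percent).
    base_age, the child's age in the year costs are incurred, is a hypothetical
    parameter; wages are assumed to grow at real_growth per year from that year.
    """
    annual_wage = 12 * monthly_wage * lfp_rate
    return sum(
        annual_wage * ((1 + real_growth) / (1 + discount_rate)) ** (age - base_age)
        for age in range(start_age, end_age + 1)
    )


def benefit_cost_row(pv_earnings, cost, d_cog, d_se=None):
    """Earnings gains for columns (2)-(7) and lower/upper benefit-to-cost ratios.

    Assumes gain = pv_earnings * sum(return per SD * effect in SD). Columns that
    need a social-emotional effect are left blank (None) when d_se is None,
    as for Peru in Table 10.
    """
    gains = []
    for cog_key, se_key in COLUMNS:
        if se_key is not None and d_se is None:
            gains.append(None)
            continue
        rate = RETURNS[cog_key] * d_cog
        if se_key is not None:
            rate += RETURNS[se_key] * d_se
        gains.append(pv_earnings * rate)
    reported = [g for g in gains if g is not None]
    return gains, min(reported) / cost, max(reported) / cost


if __name__ == "__main__":
    # Hypothetical $10,000 present value; effects and cost from the India (Dillon et al.) row.
    gains, lower, upper = benefit_cost_row(10_000, cost=53, d_cog=0.09, d_se=0.165)
    print([round(g, 1) for g in gains], round(lower, 2), round(upper, 2))
```

For example, `pv_lifetime_earnings(13143, 0.463, 0.0, 0.03)` returns a present value in rupees. That value would still need converting to dollars at the relevant exchange rate, a step this sketch omits. Because the gain is linear in each effect size, columns (2) and (5) differ only by the ratio of the two cognitive returns.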
To calculate gains in lifetime earnings from preprimary interventions, we use, for each study, the smallest statistically significant treatment effect on cognitive skills reported in standard deviations and the smallest statistically significant treatment effect on social-emotional skills. We take high and low estimates of the returns to a one-standard-deviation increase in cognitive or social-emotional skills from the literature. Kline and Walters (2016) use an annual return of 13 percent per standard deviation of cognitive skills, while Galasso and Wagstaff (2018) use 4.3 percent. For the returns to social-emotional skills, Belfield et al. (2015) use a range of 4 to 15 percent. We vary these assumptions across Columns 2 through 7. Because all papers except Ganimian et al. (2021) reported costs per child in US dollars, we discounted lifetime earnings to the year in which costs were incurred and used nominal exchange rates from the International Monetary Fund to convert earnings to dollars in that year. Columns 8 and 9 report the ratio of benefits to costs for an individual child.

Appendix: For Online Publication Only

Appendix 1. Mapping of extracted outcomes

This section describes the skill categories we used to aggregate outcomes extracted from each study. Literacy outcomes include early reading skills (e.g., letter/word recognition, decoding, and print and phonological awareness) and writing, as well as reading comprehension and performance on achievement tests in studies that focus on longer-term impacts. We classify constructs such as expressive and receptive vocabulary and oral comprehension as language. Math outcomes include early mathematics knowledge and skills, such as numerical identification and shape recognition, as well as math test scores beyond preprimary.
Under general cognition, we mapped more general knowledge and learning skills, ranging from general intellectual ability, such as problem-solving and communication skills, to summary indexes of achievement scores in literacy, language, and math, as captured by instruments such as the Early Development Instrument (Janus and Offord, 2007), the International Development and Early Learning Assessment (Pisani, Borisova, and Dowd, 2015), and the Woodcock–Johnson (Woodcock, McGrew, and Mather, 2001). We classified outcomes representing constructs such as attention, inhibition, working memory, and cognitive flexibility as executive functions. Social-emotional outcomes include social cognition, social competence, emotional regulation, prosociality, and relationships with peers, among others. Behavioral outcomes include reported or observed measures of aggression, internalizing and externalizing behavior problems, conduct, and disciplinary actions during preprimary and basic education, as well as indicators such as crime, arrests, convictions, and incarceration during adulthood. There is some overlap across these last three outcome categories, which include a very wide range of measures that are often grouped together in the early childhood development literature. Health outcomes include anthropometric measures such as weight, height, and their standardized equivalents (WAZ, the z-score for weight-for-age, and HAZ, the z-score for height-for-age), as well as indicators of general child health. For adults, health includes both physical and mental health. Motor development encompasses indicators of fine and gross motor development.

Appendix 2. Process for standardizing outcomes across studies

Continuous variables

To standardize continuous variables that were not already standardized, we scaled the estimated coefficient (the "raw" effect size, $\hat{\beta}$) by the standard deviation of the outcome variable for the control group at follow-up, $SD_C$, to obtain the standardized effect $d$:

$$d = \frac{\hat{\beta}}{SD_C} \tag{1}$$

Very few studies, however, report this standard deviation. When it was not available, we approximated it using the standard error and the sample sizes of the treatment ($n_T$) and control ($n_C$) groups:

$$SD_C \approx SE \times \sqrt{\frac{n_T \, n_C}{n_T + n_C}} \tag{2}$$

where $SE$ is the standard error of $\hat{\beta}$. When studies did not report even a standard error for the estimated coefficient, we used the p-value (or an approximation of it) to estimate the standardized effect size, recovering the test statistic as

$$t = -\,\mathrm{qnorm}\left(\frac{p}{2}\right) \tag{3}$$

where qnorm gives the z-score of the corresponding quantile of the standard normal distribution.

Some studies reported baseline or follow-up sample sizes only for the entire sample, not separately for the treatment and control groups. In these cases, we assumed that the total sample was split equally across experimental groups, unless information in the text or elsewhere indicated otherwise. For example, in the Baker et al. (2008) difference-in-differences study, we assigned 23 percent of the sample to the treatment group, since 23 percent of the Canadian population lives in Quebec. When the follow-up sample size for an estimation was not specified, we used the sample size reported at baseline and adjusted for attrition, if reported. We calculate the standard error associated with $d$ as

$$SE_d = \frac{d}{t} \tag{4}$$

where $t$ is the t-statistic of the raw estimated coefficient. Finally, if a study only reported stars to indicate statistical significance at the 1, 5, or 10 percent levels, we calculated the t-statistic assuming p-values of 0.005, 0.04, and 0.09, respectively.¹

Binary variables

To standardize binary variables, we first convert raw effect sizes into odds ratios (OR) using follow-up means of success (the enrollment rate, for example) and failure (the non-enrollment rate).
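The continuous-variable steps above, together with the odds-ratio route for binary outcomes that is detailed next, can be sketched in Python. This is a minimal illustration, not the authors' code. The function names are hypothetical. qnorm is implemented with the standard library's `statistics.NormalDist().inv_cdf`. When only a p-value is available, the sign of the effect is taken from the sign of $\hat{\beta}$, which is an assumption of this sketch. Substituting $SE = \hat{\beta}/t$ into the two continuous-variable formulas gives $d = t\sqrt{1/n_T + 1/n_C}$, so the p-value route does not need the standard error itself. The assumed p-values for stars and for the limit cases follow this appendix and its footnote.

```python
import math
from statistics import NormalDist

# p-values assumed when a study reports only significance stars or limit cases
# (this appendix and its footnote).
ASSUMED_P = {"***": 0.005, "**": 0.04, "*": 0.09,
             "not significant": 0.55, "p=0": 0.005, "p=1": 0.99}


def sd_from_se(se, n_t, n_c):
    """Approximate the outcome SD from the SE of the coefficient and group sizes."""
    return se * math.sqrt(n_t * n_c / (n_t + n_c))


def t_from_p(p):
    """|t| implied by a two-sided p-value: -qnorm(p / 2)."""
    return -NormalDist().inv_cdf(p / 2)


def d_continuous(beta, n_t, n_c, se=None, p=None):
    """Standardized effect d = beta / SD and its standard error d / t.

    Uses the coefficient's SE when reported; otherwise recovers |t| from the
    p-value and borrows the sign of beta (an assumption of this sketch).
    Requires a nonzero beta and t-statistic.
    """
    if se is not None:
        t = beta / se
    elif p is not None:
        t = math.copysign(t_from_p(p), beta)
        se = beta / t
    else:
        raise ValueError("need either a standard error or a p-value")
    d = beta / sd_from_se(se, n_t, n_c)
    return d, d / t


def d_binary(control_mean, beta, z):
    """Binary outcomes: odds ratio, logistic conversion (Wilson 2011), SE of d.

    control_mean: follow-up success rate in the control group.
    beta: impact on the success rate (covariate-adjusted if reported that way).
    z: z- or t-statistic of the treatment coefficient.
    """
    treat_mean = control_mean + beta  # covariate-adjusted treatment success rate
    if not (0 < control_mean < 1 and 0 < treat_mean < 1):
        # The paper drops cases where the control mean plus the impact exceeds 1.
        raise ValueError("success rates must lie strictly between 0 and 1")
    odds_ratio = (treat_mean / (1 - treat_mean)) / (control_mean / (1 - control_mean))
    d = math.log(odds_ratio) / 1.814  # 1.814 is approximately pi / sqrt(3)
    return d, d / z


if __name__ == "__main__":
    print(d_continuous(2.0, 100, 100, se=0.5))
    print(d_continuous(2.0, 100, 100, p=ASSUMED_P["**"]))
    print(d_binary(0.5, 0.1, 2.0))
```

Both routes return a pair (d, SE of d), so continuous and binary outcomes can enter the same meta-regression.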
$$OR = \frac{\bar{Y}_T / (1 - \bar{Y}_T)}{\bar{Y}_C / (1 - \bar{Y}_C)} \tag{5}$$

For randomized studies conducted at the individual level that report raw means at follow-up, we need the follow-up outcome means: $\bar{Y}_C$, the outcome mean for the control group, and $\bar{Y}_T = \bar{Y}_C + \hat{\beta}$, the mean for the treatment group. Because some studies do not report raw means at follow-up and instead report impact estimates adjusted for baseline characteristics or fixed effects, we can also add the impact estimate obtained from the regression model to the control group's follow-up mean to obtain the covariate-adjusted success rate in the treatment group. When treatment is assigned at an aggregate level (for example, schools or communities) rather than at the individual level, estimated standard errors must also take into account intracluster correlation in the outcome variable; otherwise, we might overstate the precision of the estimates.

We follow Wilson (2011) and convert the log odds ratio, $\ln(OR)$, into a standardized effect size $d$. Because the logistic distribution is similar to the normal distribution, we can convert each $\ln(OR)$ into $d$ using

$$d = \frac{\ln(OR)}{1.814} \tag{6}$$

where $1.814 \approx \pi/\sqrt{3}$ is the standard deviation of the standard logistic distribution. The standard error of $d$ can then be calculated from the coefficient estimate for the treatment indicator in the regression model and its standard error:

$$SE_d = \frac{d}{z} \tag{7}$$

where $z$ is the z- or t-statistic associated with the treatment effect in the regression model. As for continuous variables, we approximated the t-statistic when studies only reported significance levels.

In sum, to calculate standardized effect sizes and their standard errors within each study for binary variables, we needed the control group's outcome mean at follow-up and the estimated coefficient. These two elements allowed us to calculate the covariate-adjusted follow-up mean outcome in the treatment group. We then calculate $\ln(OR)$ and convert it into $d$ using Equation (6). Finally, we calculate the standard error of $d$ using Equation (7). When the follow-up mean was not reported for the sample or for a subgroup (T or C), we assumed that it was equal to the control group's baseline mean. We excluded a study from a regression if it only reported an impact estimate without any reference to a control mean at baseline or follow-up. We also excluded studies from a regression when the control group's follow-up mean plus the covariate-adjusted impact estimate exceeded 1, since an odds ratio cannot be calculated in these cases.

¹ Some studies only reported "Not significant," p-value = 0, or p-value = 1. For these cases, we assume p-values of 0.55, 0.005, and 0.99, respectively.

Appendix 3. Tables

Appendix Table 1: Characteristics of the included studies for review
Columns: (1) Country; (2) Program name; (3) Study; (4) Intervention components; (5) Discipline; (6) Publication type; (7) IE design
HIC - High Income Countries 1 Canada Quebec Universal Childcare (Baker et al. 2008) Subsidized or free access Economics Journal article DID Program 2 Chile Chile - Un Buen Comienzo (Bowne et al. 2016b) Teacher PD/ Education Journal article RCT + Pedagogy change/ HLM Provision of materials 3 Chile Chile - Un Buen Comienzo (Yoshikawa et al.
2015) Teacher PD/ Psychology Journal article RCT Pedagogy change/ Provision of materials 4 Denmark Denmark Preschool (Rossin-Slater and Wust 2020) Preschool construction Economics Journal article DID Expansion (1933-1960) 5 Norway Norway Universal Childcare (Drange et al 2016) Subsidized or free access/ Economics Journal article DID Program I Curriculum 6 Norway Norway Universal Childcare (Havnes and Mogstad 2011) Subsidized or free access Economics Journal article DID Program II 7 Norway Norway Universal Childcare (Havnes and Mogstad 2015) Subsidized or free access Economics Journal article DID + QR Program II 8 Spain Spain Universal Childcare (Felfe et al 2014) Subsidized or free access/ Economics Journal article DID Program Teacher PD/ Curriculum 9 U.S. USA High/Scope Perry (Heckman et al 2010) Subsidized or free access Economics Journal article RCT Preschool Program 10 U.S. USA Boston Pre-K - Building (Clements et al. 2011) Teacher PD/ Education Journal article RCT + Blocks Curriculum Curriculum/ HLM Provision of materials 11 U.S. USA Boston Pre-K - Building (Clements et al. 2013) Teacher PD/ Education Journal article RCT + Blocks Curriculum Curriculum/ HLM Provision of materials 12 U.S. USA Boston Pre-K - (Weiland and Yoshikawa Subsidized or free access/ Education Journal article RDD Subsidized Preschool 2013) Teacher PD/ Curriculum 13 U.S. USA Boston Pre-K - (Gray-Lobe et al 2021) Subsidized or free access/ Economics Journal article RCT + IV Subsidized Preschool Teacher PD/ Curriculum 14 U.S. USA Great Start Teacher PD (Neuman and Cunningham Teacher PD Education Journal article RCT + Initiative 2009) ANOVA 15 U.S. USA Head Start - CSRP (Raver et al. 2008) Teacher PD/ Education Journal article RCT + Health and nutrition HLM 16 U.S. USA Head Start - CSRP (Raver et al. 2009) Teacher PD/ Psychology Journal article RCT + Health and nutrition HLM 17 U.S. USA Head Start - CSRP (Raver et al. 
2011) Teacher PD/ Education Journal article RCT + Health and nutrition HLM 18 U.S. USA Head Start - CSRP (Watts et al 2018) Teacher PD/ Education Journal article RCT Health and nutrition 19 U.S. USA Head Start - Increased (Johnson and Jackson 2019) Subsidized or free access Economics Journal article DID + IV Spending 20 U.S. USA Head Start - REDI (Bierman et al. 2008) Teacher PD/ Education Journal article RCT + Program Curriculum/ HLM Provision of materials/ Parental engagement 21 U.S. USA Head Start - Subsidized (Bitler et al 2014) Subsidized or free access Economics Working Paper RCT + IV Preschool 22 U.S. USA Head Start - Subsidized (Carneiro and Ginja 2014) Subsidized or free access Economics Journal article RDD Preschool 23 U.S. USA Head Start - Subsidized (Frisvold and Lumeng 2011) Subsidized or free access/ Economics Journal article DID Preschool Preschool day extension 24 U.S. USA Head Start - Subsidized (Gelber and Isen 2013) Subsidized or free access Economics Journal article RCT + IV Preschool 25 U.S. USA Head Start - Subsidized (Kline and Walters 2016) Subsidized or free access Economics Journal article RCT Preschool 26 U.S. USA Head Start - Subsidized (Bloom and Weiland 2015) Subsidized or free access Education Working Paper RCT + Preschool HLM 27 U.S. USA Head Start - Subsidized (Bailey et al 2020) Subsidized or free access Economics Journal article DID Preschool 28 U.S. USA Head Start - Teacher (Hamre et al 2012b) Teacher PD/ Education Journal article RCT + PD & PATHS Curriculum Curriculum HLM 29 U.S. USA Head Start - Teacher (Pianta et al. 2017) Teacher PD Education Journal article RCT + PD Program II HLM 30 U.S. USA Head Start - Teacher (Powell et al. 2010) Teacher PD Education Journal article RCT + PD Program I HLM 31 U.S. USA Oklahoma/Tulsa (Gormley et al. 2011) Subsidized or free access Education Journal article PSM Universal Pre-K 32 U.S. USA ParentCorp Project (Brotman et al. 
2016) Teacher PD/ Education Journal article RCT + PSM Curriculum/ Parental engagement 33 U.S. USA Pre-K Programs (State (Wong et al. 2008) Subsidized or free access Economics Journal article RDD level) 34 U.S. USA Tennessee Pre-K (Lipsey et al. 2018) Subsidized or free access Education Journal article RCT + HLM 35 U.S. USA Tools of the Mind (Barnett et al. 2008) Teacher PD/ Education Journal article RCT Curriculum Curriculum/ Provision of materials 36 U.S. USA Tools of the Mind (Blair and Raver 2014) Teacher PD/ Psychology Journal article RCT Curriculum Curriculum 37 U.S. USA Tools of the Mind (Diamond et al 2007) Teacher PD/ Economics Journal article RCT Curriculum Curriculum LMIC - Low- & Middle-Income Countries 38 Argentina Argentina Preschool (Berlinski et al. 2009) Preschool construction/ Economics Journal article DID Construction Curriculum 39 Bangladesh Bangladesh Early Years (Spier et al 2020) Subsidized or free access Education Technical Report RCT + IV Preschool Program 40 Cambodia Cambodia Preschool (Berkes et al 2019) Preschool construction/ Economics Working Paper RCT Construction II Community outreach/ Parental engagement 41 Cambodia Cambodia Preschool (Bouguen et al. 2018) Preschool construction/ Economics Journal article RCT Construction I Teacher PD/ Provision of materials 42 Colombia Colombia Preschools - HIM (Andrew et al. 2019) Teacher PD/ Economics Working Paper RCT & FE Programs Provision of materials/ Provision of staff 43 Gambia Gambia Preschool (Blimpo et al 2019) Preschool construction/ Education Working Paper RCT Construction Curriculum/ Teacher PD 44 Ghana Quality Preschool for Ghana (Wolf 2019) Teacher PD/ Education Journal article RCT + (QP4G) Parental engagement HLM 45 Ghana Quality Preschool for Ghana (Wolf and Peel 2019) Teacher PD/ Education Journal article RCT + (QP4G) Parental engagement HLM 46 Ghana Quality Preschool for Ghana (Wolf et al.
2018) Teacher PD/ Education Journal article RCT + (QP4G) Parental engagement HLM 47 Ghana Quality Preschool for Ghana (Wolf et al. 2019) Teacher PD/ Education Journal article RCT + (QP4G) Parental engagement HLM 48 India India Pratham Game-Based (Dillon et al. 2017) Curriculum/ Economics Journal article RCT Math Curriculum Provision of materials 49 India India’s Integrated Child (Ganimian et al 2021) Provision of staff Economics Journal article RCT Development Services (ICDS) 50 Indonesia Indonesia Preschool (Brinkman et al. 2017) Preschool construction/ Education Journal article DID + IV Construction Teacher PD/ Community outreach 51 Malawi Malawi Community-based (Ozler et al. 2018) Teacher PD/ Economics Journal article RCT Childcare Centers Teacher Payments/ Parental engagement 52 Mozambique Mozambique Preschool (Martinez et al. 2017) Preschool construction/ Economics Working Paper RCT Construction Teacher PD/ Parental engagement 53 Paraguay, Inquiry and Problem-Based (Bando et al 2019) Teacher PD/ Economics Working Paper RCT Peru Pedagogy (IPP) Pedagogy change/ Provision of materials 54 Peru Inquiry and Problem-Based (Gallego et al 2019) Teacher PD/ Economics Journal article RCT Pedagogy (IPP) Pedagogy change/ Provision of materials 55 Uruguay Uruguay Preschool (Berlinski et al. 
2008) Preschool construction Economics Journal article IV Construction

Appendix Table 2: Characteristics of the programs evaluated
Columns: (1) Country; (2) Program name; (3) Intervention components; (4) Provision type; (5) Target group (years); (6) Target population; (7) Type of location; (8) Implementer type; (9) Number of studies in review
HIC - High Income Countries Canada Quebec Universal Childcare Subsidized or free access Daycare (w/education 3-4 General Urban & Rural Government 1 Program component) Chile Chile - Un Buen Comienzo Teacher PD/ Formal classroom 4-5 Disadvantaged Urban Government 2 Pedagogy change/ Provision of materials Denmark Denmark Preschool Preschool construction Preschool - formal 3-7 Disadvantaged Urban & Rural Government + 1 Expansion (1933-1960) classroom Private Norway Norway Universal Childcare Subsidized or free access/ Formal classroom 5-6 General Urban & Rural Government 1 Program I Curriculum Norway Norway Universal Childcare Subsidized or free access Daycare (w/education 3-6 General Urban & Rural Government + 2 Program II component) Private Spain Spain Universal Childcare Subsidized or free access/ Daycare (w/education 3 General Urban & Rural Government + 1 Program Teacher PD/ component) Private Curriculum U.S. USA High/Scope Perry Subsidized or free access Preschool - formal 3-5 Disadvantaged Urban & Rural Government 1 Preschool Program classroom U.S. USA Boston Pre-K - Building Teacher PD/ Formal classroom 3-5 Disadvantaged Urban Government 2 Blocks Curriculum Curriculum/ Provision of materials U.S. USA Boston Pre-K - Subsidized or free access/ Formal classroom 4-5 General Urban Government 2 Subsidized Preschool Teacher PD/ Curriculum U.S. USA Great Start Teacher PD Teacher PD Formal classroom 3-5 Disadvantaged Urban Government + NGO 1 Initiative U.S. USA Head Start - CSRP Teacher PD/ Formal classroom 3-5 Disadvantaged Urban & Rural Government + 4 Health and nutrition Private U.S.
USA Head Start - Increased Subsidized or free access Formal classroom 4 Disadvantaged Urban & Rural Government 1 Spending U.S. USA Head Start - REDI Teacher PD/ Formal classroom 4 Disadvantaged Urban & Rural No info 1 Program Curriculum/ Provision of materials/ Parental engagement U.S. USA Head Start - Subsidized Subsidized or free access Formal classroom 4 Disadvantaged Urban & Rural Government + 7 Preschool Private U.S. USA Head Start - Teacher PD Teacher PD/ Formal classroom 3-4 Disadvantaged No info Government 1 & PATHS Curriculum Curriculum U.S. USA Head Start - Teacher PD Teacher PD Formal classroom 4 Disadvantaged Urban & Rural No info 1 Program I U.S. USA Head Start - Teacher PD Teacher PD Formal classroom 4 Disadvantaged Urban Government + 1 Program II Private U.S. USA Oklahoma/Tulsa Subsidized or free access Formal classroom 4 Disadvantaged Urban Government 1 Universal Pre-K U.S. USA ParentCorp Project Teacher PD/ Daycare (w/education 4 Disadvantaged Urban Government 1 Curriculum/ component) Parental engagement U.S. USA Pre-K Programs (State Subsidized or free access Formal classroom 4 General Urban & Rural Government 1 level) U.S. USA Tennessee Pre-K Subsidized or free access Formal classroom 4-5 Disadvantaged Urban & Rural Government 1 U.S. 
USA Tools of the Mind Teacher PD/ Formal classroom 4-5 Disadvantaged Urban Government 3 Curriculum Curriculum LMIC - Low- & Middle-Income Countries Argentina Argentina Preschool Preschool construction/ Formal classroom 3-5 Disadvantaged Urban Government 1 Construction Curriculum Bangladesh Bangladesh Early Years Subsidized or free access Formal classroom 4 General Rural Government 1 Preschool Program Cambodia Cambodia Preschool Preschool construction/ Community-based 3-6 Disadvantaged Rural Government 1 Construction I Teacher PD/ Provision of materials Cambodia Cambodia Preschool Preschool construction/ Community-based 3-5 General Urban & Rural Government 1 Construction II Community outreach/ Parental engagement Colombia Colombia Preschools - HIM Teacher PD/ Community-based 2-5 Disadvantaged Urban & Rural Government + NGO 1 & FE Programs Provision of materials/ Provision of staff Gambia Gambia Preschool Preschool construction/ Community-based 3-6 Disadvantaged Rural Government 1 Construction Curriculum/ Teacher PD Ghana Quality Preschool for Ghana Teacher PD/ Formal classroom 4-6 Disadvantaged Urban & Rural Government 4 (QP4G) Parental engagement India India Pratham Game-Based Curriculum/ Formal classroom 5 Disadvantaged Urban International Non- 1 Math Curriculum Provision of materials Governmental (ING) India India’s Integrated Child Provision of staff Formal classroom 3-6 Disadvantaged Urban & Rural Government 1 Development Services (ICDS) Indonesia Indonesia Preschool Preschool construction/ Community-based 3-6 Disadvantaged Rural Government + IO 1 Construction Teacher PD/ Community outreach Malawi Malawi Community-based Teacher PD/ Community-based 3-5 Disadvantaged Rural Government + ING 1 Childcare Centers Teacher Payments/ Parental engagement Mozambique Mozambique Preschool Preschool construction/ Community-based 3-5 Disadvantaged Rural International Non- 1 Construction Teacher PD/ Governmental (ING) Parental engagement Paraguay, Inquiry and Problem-Based 
Teacher PD/ Formal classroom 5 General Urban & Rural Government 2 Peru Pedagogy (IPP) Pedagogy change/ Provision of materials Uruguay Uruguay Preschool Preschool construction Formal classroom 3-5 General Urban & Rural Government 1 Construction

Appendix Table 3: Mapping of subpopulations to high and low socioeconomic status

Low socioeconomic status              High socioeconomic status
Assets below median                   Assets above median
Black                                 Non-Black
Father not at home                    Father at home
Free lunch                            Not free lunch
Low income                            High income
Mother less educated                  Mother more educated
Mother without education              Mother with education
Mother without high school            Mother with high school
No parent with high school            One parent with high school
Parents with high school or less      Parents with more than high school
Poor                                  Non-poor
Single-parent                         Two-parent
Stunted child                         Unstunted child
Non-White                             White
                                      Mother with college
Hispanic
High-poverty schools

Appendix Table 4: Parameter values and data sources for calculating benefit-to-cost ratios for preprimary interventions in low- and middle-income countries

Columns: (1) labor force participation rate (%); (2) average nominal monthly wage (country currency); (3) annual real wage growth (%); (4) lowest treatment effect on cognitive skills (SD); (5) lowest treatment effect on social-emotional skills (SD); (6) data sources

India (Dillon et al, 2017): (1) 46.3; (2) 13143; (3) 0; (4) 0.09; (5) 0.165; (6) Periodic Labour Force Survey, 2019; India Ministry of Statistics and Programme Implementation; Bloomberg Newswire
Malawi (Ozler et al, 2018): (1) 37.2; (2) 125000; (3) 3.4; (4) 0.185; (5) 0.252; (6) Integrated Household Survey, 2017; National Statistical Office of Malawi
Ghana (Wolf et al, 2019a): (1) 57; (2) 618; (3) 2.16; (4) 0.107; (5) 0.18; (6) Living Standards Survey, 2017; Ghana Statistical Service; Bloomberg Newswire
Peru (Gallego et al, 2019): (1) 77.4; (2) 1570; (3) 1.7; (4) 0.19; (5) ---; (6) Encuesta Nacional de Hogares, 2019; ILO SIALC; Bloomberg Newswire
Indonesia (Brinkman et al, 2017): (1) 70.1; (2) 2913897; (3) -0.2; (4) ---; (5) 0.158; (6) National Labour Force Survey, 2020; Statistics Indonesia of the Republic of Indonesia; Bloomberg Newswire
Bangladesh (Spier et al, 2020): (1) 59.4; (2) 12016; (3) 3; (4) 0.25; (5) 0.37; (6) Labour Force Survey, 2017; Bangladesh Bureau of Statistics

Notes: Because recent data on average real wage growth were missing for Ghana, we used the average of the values for the other countries. Country currencies are as follows: India (INR), Malawi (MWK), Mozambique (MZN), Ghana (GHS), Peru (PEN), Indonesia (IDR), and Bangladesh (BDT).

Appendix 4. Figures

Appendix Figure 1: Effect sizes on school participation and progression (preprimary)
Appendix Figure 2: Effect sizes on language skills in HICs (preprimary)
Appendix Figure 3: Effect sizes on language skills in LMICs (preprimary)
Appendix Figure 4: Effect sizes on cognitive literacy skills (preprimary)
Appendix Figure 5: Effect sizes on cognitive math skills (preprimary)
Appendix Figure 6: Effect sizes on cognitive general skills (preprimary)
Appendix Figure 7: Effect sizes on executive function skills (preprimary)
Appendix Figure 8: Effect sizes on social-emotional skills in HICs (preprimary)
Appendix Figure 9: Effect sizes on social-emotional skills in LMICs (preprimary)
Appendix Figure 10: Effect sizes on behavioral skills (preprimary)
Appendix Figure 11: Effect sizes on parental behavior (preprimary)
Appendix Figure 12: Effect sizes on teaching practices in HIC (preprimary)
Appendix Figure 13: Effect sizes on teaching practices in LMICs (preprimary & post-preprimary)
Appendix Figure 14: Effect sizes on teacher professional well-being (preprimary)
Appendix Figure 15: Effect sizes on health and motor outcomes (preprimary)
Appendix Figure 16: Effect sizes on school participation and progression (post-preprimary)
Appendix Figure 17: Effect sizes on school participation and progression (adulthood)
Appendix Figure 18: Effect sizes on language skills (post-preprimary)
Appendix Figure 19: Effect sizes on cognitive literacy skills (post-preprimary)
Appendix Figure 20: Effect sizes on cognitive math skills (post-preprimary)
Appendix Figure 21: Effect sizes on cognitive general skills (post-preprimary)
Appendix Figure 22: Effect sizes on executive function skills (post-preprimary)
Appendix Figure 23: Effect sizes on social-emotional skills (post-preprimary)
Appendix Figure 24: Effect sizes on behavioral skills (post-preprimary and adulthood)
Appendix Figure 25: Effect sizes on health and motor outcomes (post-preprimary and adulthood)
Appendix Figure 26: Effect sizes on labor outcomes in HICs (adulthood)
Appendix Figure 27: Effect sizes on parental behavior (post-preprimary)
Note: These Appendix Figures present study-specific average treatment effects of preprimary programs on outcomes expressed in standard deviations. Results are drawn from 55 experimental and quasi-experimental studies that report impacts on a wide variety of outcome categories. Effect sizes to the right of zero represent an improvement in the outcome, while effect sizes to the left of zero represent a worsening. (*) indicates that the outcome has a negative connotation; its effect size has been re-signed so that it shows an improvement or a worsening in each specific case.

Appendix 5. Included studies

Andrew, A., O. Attanasio, R. Bernal, L.C. Sosa, S. Krutikova, and M. Rubio-Codina. 2019. Preschool quality and child development. (No. w26191): National Bureau of Economic Research.
Bailey, M.J., B.D. Timpe, and S. Sun. 2020. Prep school for poor kids: The long-run impacts of Head Start on human capital and economic self-sufficiency. (No. w28268): National Bureau of Economic Research.
Baker, M., J. Gruber, and K. Milligan. 2008. "Universal child care, maternal labor supply, and family well-being." Journal of Political Economy 116 (4): 709-745.
Bando, R., E. Näslund-Hadley, and P. Gertler. 2019. Effect of inquiry and problem based pedagogy on learning: Evidence from 10 field experiments in four countries. (No. w26280): National Bureau of Economic Research.
Barnett, W.S., K. Jung, D.J. Yarosz, J. Thomas, A. Hornbeck, R. Stechuk, and S. Burns. 2008. "Educational effects of the Tools of the Mind curriculum: A randomized trial." Early Childhood Research Quarterly 23 (3): 299-313.
Berkes, J., A. Bouguen, D. Filmer, and T. Fukao. 2019. Improving Preschool Provision and Encouraging Demand: Heterogeneous Impacts of a Large-Scale Program. Washington D.C.: The World Bank Working Paper Series.
Berlinski, S., S. Galiani, and M. Manacorda. 2008. "Giving children a better start: Preschool attendance and school-age profiles." Journal of Public Economics 92 (5-6): 1416-1440.
Berlinski, S., S. Galiani, and P. Gertler. 2009. "The effect of pre-primary education on primary school performance." Journal of Public Economics 93 (1-2): 219-234.
Bierman, K.L., C.E. Domitrovich, R.L. Nix, S.D. Gest, J.A. Welsh, M.T. Greenberg, C. Blair, K.E. Nelson, and S. Gill. 2008. "Promoting academic and social-emotional school readiness: The Head Start REDI program." Child Development 79 (6): 1802-181.
Bitler, M.P., H.W. Hoynes, and T. Domina. 2014. Experimental evidence on distributional effects of Head Start. (No. w20434): National Bureau of Economic Research.
Blair, C., and C.C. Raver. 2014. "Closing the achievement gap through modification of neurocognitive and neuroendocrine function: Results from a cluster randomized controlled trial of an innovative approach to the education of children in kindergarten." PLoS One 9 (11): e112393.
Blimpo, M., P.M. Carneiro, P. Jervis, and T. Pugatch. 2019. Improving access and quality in early childhood development programs: Experimental evidence from the Gambia. Washington D.C.: World Bank Policy Research Working Paper 8737.
Bloom, H.S., and C. Weiland. 2015. Quantifying variation in Head Start effects on young children's cognitive and socio-emotional skills using data from the National Head Start Impact Study. Available at SSRN 2594430.
Bouguen, A., D. Filmer, K. Macours, and S. Naudeau. 2018. "Preschool and parental response in a second best world: Evidence from a school construction experiment." Journal of Human Resources 53 (2): 474-512.
Bowne, J.B., H. Yoshikawa, and C.E. Snow. 2016. "Experimental impacts of a teacher professional development program in early childhood on explicit vocabulary instruction across the curriculum." Early Childhood Research Quarterly 34: 27-39.
Brinkman, S.A., A. Hasan, H. Jung, A. Kinnell, and M. Pradhan. 2017. "The impact of expanding access to early childhood education services in rural Indonesia." Journal of Labor Economics 35 (S1): S305-S335.
Brotman, L.M., S. Dawson-McClure, D. Kamboukos, K.Y. Huang, E.J. Calzada, K. Goldfeld, and E. Petkova. 2016. "Effects of ParentCorps in prekindergarten on child mental health and academic performance: Follow-up of a randomized clinical trial through 8 years of age." JAMA Pediatrics 170 (12): 1149-1155.
Carneiro, P., and R. Ginja. 2014. "Long-term impacts of compensatory preschool on health and behavior: Evidence from Head Start." American Economic Journal: Economic Policy 6 (4): 135-73.
Clements, D.H., J. Sarama, M.E. Spitler, A.A. Lange, and C.B. Wolfe. 2011. "Mathematics learned by young children in an intervention based on learning trajectories: A large-scale cluster randomized trial." Journal for Research in Mathematics Education 42 (2): 127-166.
Clements, D.H., J. Sarama, C.B. Wolfe, and M.E. Spitler. 2013. "Longitudinal evaluation of a scale-up model for teaching mathematics with trajectories and technologies: Persistence of effects in the third year." American Educational Research Journal 50 (4): 812-850.
Diamond, A., W.S. Barnett, J. Thomas, and S. Munro. 2007. "Preschool program improves cognitive control." Science 318 (5855): 1387.
Dillon, M.R., H. Kannan, J.T. Dean, E.S. Spelke, and E. Duflo. 2017. "Cognitive science in the field: A preschool intervention durably enhances intuitive but not formal mathematics." Science 357 (6346): 47-55.
Drange, N., T. Havnes, and A.M. Sandsør. 2016. "Kindergarten for all: Long run effects of a universal intervention." Economics of Education Review 53: 164-181.
Felfe, C., N. Nollenberger, and N. Rodríguez-Planas. 2015. "Can’t buy mommy’s love? Universal childcare and children’s long-term cognitive development." Journal of Population Economics 28 (2): 393-422.
Frisvold, D.E., and J.C. Lumeng. 2011. "Expanding exposure: Can increasing the daily duration of Head Start reduce childhood obesity?" Journal of Human Resources 46 (2): 373-402.
Gallego, F.A., E. Näslund-Hadley, and M. Alfonso. 2021. "Changing Pedagogy to Improve Skills in Preschools: Experimental Evidence from Peru." The World Bank Economic Review 35 (1): 261-286.
Ganimian, A.J., K. Muralidharan, and C.R. Walters. 2021. Augmenting State Capacity for Child Development: Experimental Evidence from India. (No. w28780): National Bureau of Economic Research.
Gelber, A., and A. Isen. 2013. "Children's schooling and parents' behavior: Evidence from the Head Start Impact Study." Journal of Public Economics 101: 25-38.
Gormley, W.T., D.A. Phillips, K. Newmark, K. Welti, and S. Adelstein. 2011. "Social-emotional effects of early childhood education programs in Tulsa." Child Development 82 (6): 2095-2109.
Gray-Lobe, G., P.A. Pathak, and C.R. Walters. 2021. The Long-Term Effects of Universal Preschool in Boston. (No. w28756): National Bureau of Economic Research.
Hamre, B.K., R.C. Pianta, A.J. Mashburn, and J.T. Downer. 2012. "Promoting young children's social competence through the preschool PATHS curriculum and MyTeachingPartner professional development resources." Early Education & Development 23 (6): 809-832.
Havnes, T., and M. Mogstad. 2015. "Is universal child care leveling the playing field?" Journal of Public Economics 127: 100-114.
Havnes, T., and M. Mogstad. 2011. "No child left behind: Subsidized child care and children's long-run outcomes." American Economic Journal: Economic Policy 3 (2): 97-129.
Heckman, J.J., S.H. Moon, R. Pinto, P.A. Savelyev, and A. Yavitz. 2010. "The rate of return to the HighScope Perry Preschool Program." Journal of Public Economics 94 (1-2): 114-128.
Johnson, R.C., and C.K. Jackson. 2019. "Reducing inequality through dynamic complementarity: Evidence from Head Start and public school spending." American Economic Journal: Economic Policy 11 (4): 310-49.
Kline, P., and C.R. Walters. 2016. "Evaluating public programs with close substitutes: The case of Head Start." The Quarterly Journal of Economics 131 (4): 1795-1848.
Lipsey, M.W., D.C. Farran, and K. Durkin. 2018. "Effects of the Tennessee Prekindergarten Program on children’s achievement and behavior through third grade." Early Childhood Research Quarterly 45: 155-176.
Martinez, S., S. Naudeau, and V. Pereira. 2017. Preschool and child development under extreme poverty: Evidence from a randomized experiment in rural Mozambique. Washington D.C.: The World Bank.
Neuman, S.B., and L. Cunningham. 2009. "The impact of professional development and coaching on early language and literacy instructional practices." American Educational Research Journal 46 (2): 532-566.
Ozler, B., L.C.H. Fernald, P. Kariger, C. McConnell, M. Neuman, and E. Fraga. 2018. "Combining preschool teacher training with parenting education: A cluster-randomized controlled trial." Journal of Development Economics 133: 448-467.
Pianta, R., B. Hamre, J. Downer, M. Burchinal, A. Williford, J. Locasale-Crouch, C. Howes, K. La Paro, and C. Scott-Little. 2017. "Early childhood professional development: Coaching and coursework effects on indicators of children’s school readiness." Early Education and Development 28 (8): 956-975.
Powell, D.R., K.E. Diamond, M.R. Burchinal, and M.J. Koehler. 2010. "Effects of an early literacy professional development intervention on Head Start teachers and children." Journal of Educational Psychology 102 (2): 299.
Raver, C.C., S.M. Jones, C. Li-Grining, F. Zhai, K. Bub, and E. Pressler. 2011. "CSRP’s impact on low-income preschoolers’ preacademic skills: Self-regulation as a mediating mechanism." Child Development 82 (1): 362-378.
Raver, C.C., S.M. Jones, C. Li-Grining, F. Zhai, M.W. Metzger, and B. Solomon. 2009. "Targeting children's behavior problems in preschool classrooms: A cluster-randomized controlled trial." Journal of Consulting and Clinical Psychology 77 (2): 302.
Raver, C.C., S.M. Jones, C.P. Li-Grining, M. Metzger, K.M. Champion, and L. Sardin. 2008. "Improving preschool classroom processes: Preliminary findings from a randomized trial implemented in Head Start settings." Early Childhood Research Quarterly 23 (1): 10-26.
Rossin-Slater, M., and M. Wüst. 2020. "What is the Added Value of Preschool for Poor Children? Long-Term and Intergenerational Impacts and Interactions with an Infant Health Intervention." American Economic Journal: Applied Economics 12 (3): 255-86.
Spier, E., K. Kamto, A. Molotsky, A. Rahman, N. Hossain, Z. Nahar, and H. Khondker. 2020. "Bangladesh Early Years Preschool Program Impact Evaluation."
Watts, T.W., J. Gandhi, D.A. Ibrahim, M.D. Masucci, and C.C. Raver. 2018. "The Chicago School Readiness Project: Examining the long-term impacts of an early childhood intervention." PLoS One 13 (7): e0200144.
Weiland, C., and H. Yoshikawa. 2013. "Impacts of a prekindergarten program on children's mathematics, language, literacy, executive function, and emotional skills." Child Development 84 (6): 2112-2130.
Wolf, S. 2019. "Year 3 follow-up of the ‘Quality Preschool for Ghana’ interventions on child development." Developmental Psychology 55 (12): 2587.
Wolf, S., and M.E. Peele. 2019. "Examining sustained impacts of two teacher professional development programs on professional well-being and classroom practices." Teaching and Teacher Education 86: 102873.
Wolf, S., J.L. Aber, J.R. Behrman, and E. Tsinigo. 2019a. "Experimental impacts of the ‘Quality Preschool for Ghana’ interventions on teacher professional well-being, classroom quality, and children’s school readiness." Journal of Research on Educational Effectiveness 12 (1): 10-37.
Wolf, S., J.L. Aber, J.R. Behrman, and M. Peele. 2019b. "Longitudinal causal impacts of preschool teacher training on Ghanaian children’s school readiness: Evidence for persistence and fade-out." Developmental Science 22 (5): e12878.
Wong, V.C., T.D. Cook, W.S. Barnett, and K. Jung. 2008. "An effectiveness-based evaluation of five state pre-kindergarten programs." Journal of Policy Analysis and Management 27 (1): 122-154.
Yoshikawa, H., D. Leyva, C.E. Snow, E. Treviño, M. Barata, C. Weiland, C.J. Gomez, et al. 2015. "Experimental impacts of a teacher professional development program in Chile on preschool classroom quality and child outcomes." Developmental Psychology 51 (3): 309.