Long-Term Impacts of Alternative Approaches to Increase Schooling: Evidence from a Scholarship Program in Cambodia

This paper reports on a randomized experiment to investigate the long-term effects of a primary school scholarship program in rural Cambodia. In 2008, fourth-grade students in 207 randomly assigned schools (103 treatment, 104 control) received scholarships based on the students' academic performance in math and language or their level of poverty. Three years after the program's inception, an evaluation showed that both types of scholarship recipients had more schooling than nonrecipients; however, only merit-based scholarships led to improvements in cognitive skills. This new study reports impacts, nine years after program inception, on the educational attainment, cognitive skills, socioemotional outcomes, socioeconomic status and well-being, and labor market outcomes of individuals who are, on average, 21 years old. The results show that both types of scholarships led to higher long-term educational attainment (about 0.21-0.29 grade level), but only merit-based scholarships led to improvements in cognitive skills (0.11 standard deviation), greater self-reported well-being (0.18 standard deviation), and employment probability (3.4 percentage points). Neither type of scholarship increased socioemotional skills. The results also suggest that there are labeling effects: the impacts of the scholarship types differ even for individuals with similar characteristics.


Policy Research Working Paper 8566
This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/research. The authors may be contacted at dfilmer@worldbank.org.

Introduction
How does additional schooling impact long-term life outcomes? According to the canonical human capital model, labor markets remunerate the skills acquired during the education process (Becker 2009). According to a signaling model (Arrow 1973; Spence 1973), education provides the market with a signal of individuals' higher abilities; as a result, the market pays for these skills. Both models predict positive effects from investment in education. At the same time, emerging research is showing that, in many settings, increased schooling has not meant increased learning, which potentially limits the market returns to education (Pritchett 2013; The World Bank 2017). There are, however, few studies in low-income settings that can isolate the impacts of schooling on skills accumulation. 1 Our paper aims to contribute to this evidence by presenting the causal long-term effects of a scholarship program that induced more schooling on cognitive and socioemotional outcomes, socio-economic status and well-being, 2 and labor outcomes in a group of 21-year-old individuals in Cambodia who received the scholarship nine years earlier.
1 Few well-identified studies on the causal impacts of education exist for developing countries; important exceptions are Duflo et al. (2017); Parker and Vogl (2018); Ozier (2016); Jakiela et al. (2015) and Friedman et al. (2011).
2 From this point on, we will refer to both socio-economic status and self-reported well-being as "well-being" for the sake of brevity.
Our study setup is the following. In 2008, 207 schools in Cambodia were randomly allocated between two treatment arms (103 schools) and a control group (104 schools). In half of the treatment schools, students in grade four received a scholarship based on merit (high-performing students were selected using a baseline test of math and language skills), and fourth-graders in the remaining treatment schools received a scholarship based on poverty (students were selected using a poverty index, based on household and family socio-economic characteristics). Scholarships were given to recipients for three years (i.e., until the completion of primary school), conditional on continued school participation and basic performance standards. A first follow-up study, three years after the inception of the program, showed two main effects: higher school progression for individuals receiving either type of scholarship (compared with non-recipients), and impacts on cognitive outcomes (as measured by a math test and a test of working memory) only for those receiving a merit-based scholarship (Barrera-Osorio and Filmer 2016). In this paper we report results from data collected in 2016, nine years after the beginning of the program, from a subsample of the original study participants. We present evidence of the effects of the scholarship on a range of long-term outcomes spanning cognitive skills, socioemotional outcomes, socio-economic and well-being outcomes, and labor market outcomes.
The analysis presents causal evidence to address three questions. First, what are the long-term effects of the program on cognitive skills and socioemotional outcomes? Specifically, we investigate the impacts of (exogenously induced) additional exposure to schooling on these outcomes. Heckman and Kautz (2014) show that socioemotional skills (which are also sometimes referred to as "non-cognitive" skills) are important determinants of labor outcomes in the long term. Second, what are the long-term effects of the scholarships on socioemotional outcomes? In particular, we investigate whether socioemotional outcomes are co-produced with (or are complements to) cognitive outcomes. We can pursue the answer to this question because only the merit-based scholarship induced changes in cognitive skills after the first three years of the intervention; therefore, we can test whether we observe effects on socioemotional outcomes for this group only, for both treatment groups, or for neither group. Third, what are the long-term effects of the scholarships on well-being and labor market outcomes? Given that scholarships induced more schooling for all treated individuals, but improved cognitive skills only for some, we can investigate the channels through which this additional education might affect these outcomes.
Based on an intent-to-treat model, the results show that, despite some catch-up by the control group between 2011 and 2016, scholarship recipients have on average 0.21-0.29 more years of schooling. This is in line with programs that attempt to reduce direct costs (for example scholarships; see Kremer et al. (2009) and Duflo et al. (2017)) and indirect costs of education (for example conditional cash transfers; see Fiszbein and Schady (2009)).
We find positive effects on measures of cognitive skills, but only for the merit-based approach to targeting. Impacts of the merit scholarships on a "family index" (that is, an index that standardizes the cognitive skill measures and calculates a weighted average) 3 have an effect size of 0.11 standard deviations (significant at the 10% level); the effects of the poverty-based treatment are close to zero (and not statistically significant). This is consistent with the effects found after the initial three years, suggesting limited fade-out for this outcome.
We do not find any systematic impacts on two measures of socioemotional outcomes: emotional and behavioral difficulties (as measured by the Strengths and Difficulties Questionnaire, "SDQ") and the "Big 5" personality traits (openness, conscientiousness, extroversion, agreeableness and neuroticism). 4 For a "family index" of these outcomes, we find imprecisely estimated impacts of 0.01 (merit-based scholarships) and 0.10 (poverty-based scholarships) standard deviations. The findings therefore support neither the hypothesis that more schooling (necessarily) produces more socioemotional skills, nor the hypothesis that cognitive and socioemotional skills are (necessarily) co-produced.
We find that the probability of working increased by 3.4 percentage points for young adults who had received a merit-based scholarship (significant at the 10% level), but the impact for those who had received a poverty-based scholarship was close to zero (and statistically insignificant). The point estimates for earnings are both negative (but not statistically significant), perhaps because the scholarships induced individuals to delay entry into the labor market.
Finally, we find positive overall impacts on various measures of self-reported well-being, but again only for those who had received merit-based scholarships. For a "family index" of socio-economic status, the point estimate is 0.17 standard deviations (significant at the 1% level) for the merit-based treatment arm; for the poverty treatment, the point estimate is 0.04 standard deviations (and not statistically significant).
Overall, both types of scholarships led to more schooling attainment, but only the merit-based scholarships had positive impacts on cognitive, well-being, and labor market outcomes. Neither of the two types of scholarships induced greater socioemotional skills. Two factors are important for interpreting these results. First, they are the marginal effect of increasing schooling by only about four additional months, although these may be critical months, inasmuch as the program induced individuals to finish primary education. But it is possible that some of the key impacts of schooling on socioemotional skills happen early on (when both the control and treatment groups were still in school) or later on in adolescence (when, for this population, both groups would have left school). Second, while attrition is neither especially high nor systematically different across the three groups of students, our relatively limited sample size may nonetheless have reduced the precision of the estimates. Our overall results present a complex picture, suggesting that demand-side interventions, such as scholarships, and their particular targeting approaches can have important long-term effects.
The paper is organized as follows: in Section 2 we (selectively) review the related literature, in Section 3 we describe in more detail the study setup and context, in Sections 4 and 5 we describe our estimation strategy and data, in Section 6 we present the results, and in Section 7 we provide some concluding comments.

Related Literature
Our study builds on three strands of literature. First, we add to previous research on the effects of demand-side incentive programs in low- and middle-income countries, both in terms of their overall effect and with respect to varying targeting approaches. Second, we contribute to research on whether increased schooling produces outcomes that go beyond cognitive skills, in particular socioemotional outcomes. We furthermore investigate how these might be co-produced. Third, we contribute to the relatively limited literature on the long-term effects of increased school enrollment on outcomes such as employment status or well-being (scant because these impacts only manifest themselves later in life and require a long-term approach to evaluation).

Demand-side incentives
There is a growing empirical literature on the impact of conditional cash transfers, of which scholarships are one form, in low- and middle-income countries (Baird et al. 2014; Barham et al. 2013; García and Saavedra 2017; Snilstveit et al. 2015). 5 Three studies are of particular relevance given their similar designs and scope. First, Friedman et al. (2011) and Jakiela et al. (2015) present experimental evidence on the effects of a Kenyan merit-scholarship program for sixth-grade girls, nine years post-intervention. The studies find that short-term impacts on educational attainment and cognitive skill (initially reported in Kremer et al. (2009)) result in greater female empowerment and improved political knowledge and attitudes in young adulthood (with weaker evidence for effects on political behaviors). Second, Barham et al. (2013) evaluate the long-run effects of a conditional cash transfer program targeted to poor families in Nicaragua. They find that boys who were 9-12 at the time of the program attained about half a year more schooling when they were 19-22 than boys in a comparison group, and subsequently had better labor market outcomes (the difference for girls was not statistically different from zero between the treatment and comparison groups). Third, Duflo et al. (2017) evaluate the long-term effects of a secondary school scholarship program in Ghana. This randomized evaluation finds that the program delayed fertility and marriage, improved educational attainment, cognitive skill, and reproductive and health behaviors, and had heterogeneous impacts on earnings. Our study builds on these: the Kenyan study does not report impacts on earnings, and the Ghana and Nicaragua studies investigate few impacts on socioemotional outcomes. Our paper includes indicators for both types of outcomes, and allows for a contrast of targeting approaches (building on Barrera-Osorio and Filmer (2016)). Together, these evaluations inform the degree to which individual findings from specific contexts might have broader external validity (Vivalt 2017).
5 Scholarships may also be designed as incentive mechanisms, where payments are made based on future performance. See Fryer Jr (2011) for related evidence from the US. See Berry (2015), Blimpo (2014), and Li et al. (2014) for examples of related research from India, Benin, and China, respectively. We study scholarships whose payout is not (or arguably, only weakly) incentivized. See Section 3, below, for a description of the scholarship program.

Socioemotional outcomes
Most of the literature on CCTs and scholarships focuses on schooling and cognitive skill outcomes, with some exceptions that consider the impact of transfers on political and social factors, household consumption smoothing (Sparrow 2007), labor market outcomes (Araujo et al. 2016; Filmer and Schady 2014; Parker and Vogl 2018; Silva and Sumarto 2015), or health (Cruz et al. 2017). Few studies analyze the impacts on various outcome dimensions simultaneously.
In high-income countries, socioemotional skills have been found to be important predictors of success in school and life in general (see West et al. (2016) for an overview), and the importance of social skills grew in the U.S. labor market between 1980 and 2012 (Deming 2017). Research from the United States suggests that teachers can have large effects on socioemotional outcomes, although a teacher's productivity in terms of student cognitive achievement is only a weak predictor for her impact on measures of students' socioemotional outcomes (Blazar 2017; Blazar and Kraft 2017; Jackson et al. 2014; Kraft 2017; Santorella 2017). At the same time, little is known, especially in low- and middle-income countries, about whether increased educational attainment leads to more socioemotional skills, and how this might interact with the formation of cognitive skills. Some analyses have tried to shed light on these relationships. For example, Kyllonen and Bertling (2013) report how participants' self-reported confidence in mathematics in the 2003 Programme for International Student Assessment (PISA) study was positively correlated with performance. Claro et al. (2016) use a national data set of all tenth-graders in Chile to show that a student's "growth mindset" can predict academic performance, offsetting socio-economic achievement gaps. However, these studies cannot identify exogenous variation in schooling and cognitive skill, making causal inferences difficult. 6
6 In a well-identified study, Fabregas (2017) investigates the effect of school quality and peer composition on students' academic performance, perseverance, aspirations, and time-management, in Mexico. But this and related research on peer effects (ibid., for a review) does not shed light on the effects of educational attainment.

Long-term effects
Research from high-income countries suggests that a common characteristic of the effects of educational interventions is a lack of persistence (or "fade out"); i.e., initial positive effects diminish in magnitude or disappear altogether over time (Bailey et al. 2017; Protzko 2015). But at the same time, other studies have shown positive effects on long-term outcomes, such as educational attainment, earnings, health outcomes, and (reduced) criminal behavior (Anderson et al. 2009; Carneiro and Ginja 2014; Chetty et al. 2011, 2014; Currie and Thomas 2000; Deming 2009; Dynarski et al. 2013; Frisvold and Lumeng 2011; Garces et al. 2002; Heckman et al. 2010; Ludwig and Miller 2007).
Comparable long-term evidence from low- and middle-income countries is scarce, with the examples from Kenya, Nicaragua and Ghana described above being some of the few. Drawing lessons requires that educational interventions be defined more broadly. Acevedo et al. (2016) exploit a randomized controlled trial to assess the effect of a youth training and internship program in the Dominican Republic, approximately four years after its inception. The authors investigate socioemotional outcomes (including grit and self-esteem), expectations, and labor market outcomes, finding that treatment effects differed substantially by gender. Further, Doyle et al. (2011) use a randomized experiment to evaluate the impact of a health education program in grades five to seven of Tanzanian primary schools (in combination with health services and community engagement). Six years after the program's implementation, the study documents improvements in sexual and reproductive health attitudes, knowledge, and behaviors. In addition, Walker et al. (2007) and Gertler et al. (2013) assess the long-term effects of a randomized early childhood stimulation program (in combination with food supplementation) for a small sample of adolescents in Jamaica. The authors find positive effects on anxiety, depressive symptoms, self-esteem, anti-social behavior, attention deficit, hyperactivity, and oppositional behavior, along with impacts on labor market outcomes. 7 Finally, both Ozier (2016) and Brudevold-Newman (2016) find positive effects of additional exposure to secondary education on labor market outcomes, in Kenya; Brudevold-Newman (ibid.) also demonstrates related delays in childbearing and marriage. A review of the long-term impacts of CCT programs in Latin America (Molina-Millan et al. 2016) concludes that the literature produces very mixed results, with CCTs during the school years resulting in improved cognitive skills, socioemotional skills, and labor market outcomes in some settings, but not in others.

Intervention and Experimental Design
In 2008, the Government of Cambodia began implementing a new pilot scholarship program for grade 4 students in 207 public schools. The program's stated goal was to reduce student drop-out rates and increase primary school completion, though the government also implicitly sought to improve students' educational performance. At the time, the program's 207 schools represented all public schools in three of the country's 25 provinces 8 (Mondulkiri, Ratanakiri, and Preah Vihear); the three provinces had been selected for having the highest drop-out rates in the upper primary grades (grades four to six), according to Cambodia's Education Management Information System (EMIS). 9 The program was phased in as a pilot over two years, with a random set of 103 schools starting in 2008/09 and the remaining schools entering in the following year (random assignment was stratified by province).
and aspirations, as well as positive impacts on educational attainment and initial labor market outcomes (approximately 11 years after program participation started).
8 Here, we count the capital as Cambodia's 25th "province". More precisely, Phnom Penh is a special administrative district whose administrative characteristics partly resemble those of provinces.
9 To limit the program's geographic scope, in Ratanakiri, only five of seven districts were selected, choosing those districts with the highest dropout rate. In the remaining two provinces, all districts were selected.
The scholarship program targeted students entering grade 4, using one of two selection approaches. In a randomly selected half of the scholarship schools (52 schools), students were selected based on their combined performance on a test of Khmer and mathematics. This "merit-based" eligibility was determined through a centrally-scored test; the maximum possible score was 25. In the remaining 51 schools, students were selected based on a "poverty-based" approach. A student's "poverty score" was determined based on their self-reported (but validated) household and socio-economic characteristics; the poverty index ranges from 0 (richest household) to 292 (poorest household). 10 Under both approaches, half of a given school's fourth-graders qualified (i.e., the top half of performers, or the poorest half of students). 11 Crucially (for our study), students in all 207 schools completed both types of assessments, independent of their school's assignment status.
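The within-school eligibility rule described above can be illustrated with a minimal sketch. The field names (`school`, `test_score`, `poverty_score`), the rounding down for odd class sizes, and the handling of ties are assumptions made for illustration; the program's actual tie-breaking and rounding rules are not described here.

```python
from collections import defaultdict

# Column to rank on under each targeting scheme (hypothetical field names).
# Both are ranked in descending order: a higher test score means better
# performance, and a higher poverty score means a poorer household.
RANKING_FIELD = {"merit": "test_score", "poverty": "poverty_score"}


def eligible_students(students, scheme):
    """Return the ids of students in the qualifying half of their school."""
    field = RANKING_FIELD[scheme]
    by_school = defaultdict(list)
    for student in students:
        by_school[student["school"]].append(student)

    selected = set()
    for group in by_school.values():
        ranked = sorted(group, key=lambda s: s[field], reverse=True)
        n_slots = len(ranked) // 2  # assumption: round down for odd class sizes
        selected.update(s["id"] for s in ranked[:n_slots])
    return selected


# Hypothetical fourth-graders in a single school.
students = [
    {"id": 1, "school": "A", "test_score": 20, "poverty_score": 100},
    {"id": 2, "school": "A", "test_score": 10, "poverty_score": 250},
    {"id": 3, "school": "A", "test_score": 15, "poverty_score": 50},
    {"id": 4, "school": "A", "test_score": 5, "poverty_score": 200},
]
merit_ids = eligible_students(students, "merit")
poverty_ids = eligible_students(students, "poverty")
```

Because every student completed both assessments, the same function can also flag the control-school students who would have qualified under either scheme.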
Scholarships were offered to beneficiaries for three years (i.e. through the end of primary school), conditional on their continued enrollment, passing grades, and regular attendance. These requirements were moderately enforced. 12 Scholarships were disbursed as a lump-sum payment of approximately USD20 in the first year, and two payments of approximately USD10 in each of the following two years. As reported by Barrera-Osorio and Filmer (2016), these amounts represent about 3.3 percent of the yearly per capita expenditure in the study sample. These transfers are small compared to similar programs in other countries (Fiszbein and Schady 2009); even relatively small impacts may therefore be cost-effective.
Our experimental design exploits the randomized roll-out of the program over its two phases. In 2008/09, during phase one, fourth-graders in schools that were selected to disburse the program in the second phase did not receive any scholarship and did not become eligible in the years thereafter. 13 Note that a sub-set of these fourth-grade students would have been eligible under one of the two targeting schemes (merit-based or poverty-based), had their school been selected. In expectation, these two sub-samples are equal to their respective eligible peers from phase-one schools (below, we present supportive evidence that the two groups of students are in fact balanced, across phase-one and phase-two schools). Thus, we can identify the causal intent-to-treat effect of the scholarship program, under either of the two targeting approaches. As phase-one schools were moreover randomly assigned to either the poverty-based or merit-based targeting scheme, we can also compare the scholarship's effect across the two targeting schemes.

Estimation Framework, Internal Validity
We estimate a generic production function model (Equation 1), where Y are outcomes such as educational attainment, cognitive skills, socioemotional skills, labor outcomes, or measures of well-being (which include socio-economic status, SES). Vector $X_{0,i}$ includes a rich set of baseline characteristics at the student's school-, village-, and individual-level (the next section describes these measures in greater detail). All estimations include district-level fixed effects and allow for the clustering of standard errors at the assignment level (i.e., within schools; cf. Abadie et al. (2017)). Equation 1 estimates an intent-to-treat model, with $\beta_1^j$ capturing the effect of offering the scholarship on outcomes Y.
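A specification consistent with this description can be sketched as follows. Only $Y$, $X_{0,i}$, and $\beta_1^j$ are named in the text; the treatment-offer indicator $T_i$, the coefficient vector $\gamma^j$, the district fixed effect $\delta_d$, the error term $\varepsilon_i^j$, and the exact placement of the index $j$ (taken here to denote the merit or poverty sub-sample) are assumed notation.

```latex
Y_i^j = \beta_0^j + \beta_1^j \, T_i + X_{0,i}' \gamma^j + \delta_d + \varepsilon_i^j,
\qquad j \in \{\text{merit}, \text{poverty}\}
```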
Our default approach is to estimate Equation 1 as two separate OLS models, for the merit- and poverty-based sub-samples. 14 For annual earnings we use a Tobit model with an inverse hyperbolic sine transformation of the outcome variable because its distribution shows a spike at zero (cf. Duflo et al. 2017). 15
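To illustrate why the inverse hyperbolic sine is convenient for an earnings variable with many zeros, the short sketch below applies it to hypothetical values. It shows only the transformation, not the Tobit estimation; the earnings figures are invented for illustration.

```python
import math


def ihs(y):
    """Inverse hyperbolic sine: log(y + sqrt(y**2 + 1))."""
    return math.asinh(y)


# Hypothetical annual earnings, including a respondent with zero earnings.
earnings = [0.0, 50.0, 1000.0]
transformed = [ihs(y) for y in earnings]

# Unlike log(y), the transformation is defined at zero.
value_at_zero = ihs(0.0)

# For large y it behaves like log(2 * y), so coefficients can be read
# approximately as in a log specification.
gap_to_log = ihs(1000.0) - math.log(2 * 1000.0)
```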

For each "family" of outcomes (cognitive skills, socioemotional outcomes, and well-being), we present the results from a test that the treatment coefficients are jointly zero (using seemingly unrelated regressions, SUR). Within these sets of outcomes, we also use SUR to test whether the treatment coefficient for the poverty subsample is equal to that for the merit subsample.
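The "family index" reported for each family of outcomes standardizes the component measures and takes a weighted average. A minimal sketch of such an index follows. Two choices in it are assumptions rather than details taken from the paper: measures are standardized against the control group, and weights are equal unless supplied. The function and variable names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values, reference):
    """Z-score values using the mean and SD of a reference group."""
    mu = mean(reference)
    sd = pstdev(reference)
    if sd == 0:
        raise ValueError("reference group has zero variance")
    return [(v - mu) / sd for v in values]


def family_index(outcomes, is_control, weights=None):
    """Weighted average of standardized outcomes, one value per respondent.

    outcomes:   dict mapping outcome name -> list of raw values
    is_control: list of bools, True for control-group respondents
    weights:    dict mapping outcome name -> weight (equal weights if None)
    """
    names = list(outcomes)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total_weight = sum(weights[name] for name in names)

    z_scores = {}
    for name in names:
        control_values = [v for v, c in zip(outcomes[name], is_control) if c]
        z_scores[name] = standardize(outcomes[name], control_values)

    return [
        sum(weights[name] * z_scores[name][i] for name in names) / total_weight
        for i in range(len(is_control))
    ]


# Hypothetical scores for four respondents (two control, two treated).
outcomes = {"math": [0.0, 2.0, 4.0, 3.0], "digit_span": [4, 6, 8, 5]}
is_control = [True, True, False, False]
index = family_index(outcomes, is_control)
```

Regressing such an index on the treatment indicator yields a single effect size in standard-deviation units for the whole family of outcomes.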
Our sampling frame consists of 5,964 fourth-grade students (in the program's 207 schools), who participated in the baseline eligibility assessment, in December 2008 and January 2009. Of those, 2,996 respondents were randomly selected for the first three-year follow-up survey, in 2011. For this first follow-up, an additional 658 "replacement" students were randomly selected, in case students from the target group could not be found. In the 2016 follow-up, we tracked all students who had participated in the 2011 study, a random subset of 140 respondents who had previously been found to be attritors, and all replacement students who were interviewed in 2011. Our 2016 sample includes 2,252 respondents, of whom 2,024 had been interviewed in 2011, 86 had not been reached previously, and 142 had served as replacements in 2011. Table 1 provides the control group means for key demographic characteristics, for the "merit" and "poverty" sub-samples (for the control group, these refer to respondents who would have qualified if their school had been assigned to one or the other scholarship approach). Our analysis sample consists of 890 and 825 respondents for the merit-based and poverty-based sub-samples, respectively. Among those, about half (48% and 51%, respectively) are female. On average, respondents live with an additional six household members. Almost all the respondents were already working at the time of the three-year follow-up survey.
Belotti et al. 2015). Results do not lead to substantial changes and are available upon request.
The data support the validity of our experimental design. First, we find that both sub-samples are balanced on observables. This holds true for the full set of respondents at baseline, as discussed by Barrera-Osorio and Filmer (2016), and for this paper's estimation samples (see Tables A1 and A2, in the Appendix). Second, overall attrition is 31 percent for either sub-sample, and we managed to track 88 percent of respondents who were included in the three-year follow-up study (i.e., six years after our last contact with study participants). As shown in Table 2, there are no systematic differences in attrition by treatment group. Column (5) of Table 2's "merit scholarship" and "poverty scholarship" panels presents the difference-in-differences between attritors and non-attritors, across respondents in the treatment and control groups (computed by OLS regression and including stratification fixed effects). Only two out of 16 indicators in the merit subsample and only three indicators in the poverty subsample show a statistically significant or marginally significant difference-in-differences; this result is not surprising given multiple comparisons. We also test for the individual coefficients being jointly equal to zero, using seemingly unrelated estimation (SUR); the resulting Chi-square statistics (and corresponding p-values) suggest that we should not reject that the two sub-samples are balanced.
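The attrition check above compares baseline characteristics of attritors and non-attritors across treatment and control groups. As a simplified illustration, the sketch below computes the raw difference-in-differences from the four cell means. It omits the stratification fixed effects and standard errors that the regression version includes, and the data and names are hypothetical.

```python
from statistics import mean


def attrition_did(rows):
    """Difference-in-differences of a baseline characteristic.

    rows: iterable of (baseline_value, treated, attrited) tuples.
    Returns (treated attritors - treated stayers)
          - (control attritors - control stayers).
    """
    cells = {(t, a): [] for t in (True, False) for a in (True, False)}
    for value, treated, attrited in rows:
        cells[(bool(treated), bool(attrited))].append(value)

    # statistics.mean raises StatisticsError if any of the four cells is empty.
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (
        m[(False, True)] - m[(False, False)]
    )


# Hypothetical baseline test scores: (score, treated, attrited).
rows = [
    (13, True, True),
    (12, True, False),
    (9, False, True),
    (11, False, False),
]
did = attrition_did(rows)  # (13 - 12) - (9 - 11)
```

A value near zero for a given characteristic indicates that attrition did not select differently on that characteristic in treatment and control schools.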

Data and Measurement
Our analysis combines data from five main sources. First, we collect outcome data through in-person interviews at the respondents' residence, using handheld tablets. Second, to construct a variable reflecting intention-to-treat, we use the official government declaration ("Prakas") of scholarship recipients. Third, we match each respondent to baseline data (application forms and baseline tests) as collected in December 2008 and January 2009. We can thus control for baseline test scores, and for students' initial household characteristics. Fourth, we construct a vector of control variables through administrative data on baseline school characteristics, as provided by the country's Education Management Information System (EMIS). 16 Fifth, we take advantage of the fact that Cambodia's 2008 census was conducted just before the scholarship program started. Using geographic coordinates, we match each school to its closest village and include this village's demographic characteristics as additional controls. 17 Data collection for the baseline and three-year follow-up occurred from December 2008 to January 2009, and from May to September 2011, respectively. Data collection for our latest round of follow-up took place from December 2016 to May 2017. We ensured data quality by following standard monitoring procedures, as described by Glennerster (2017). First, during the first week of field work, we conducted re-surveys ("back-checks", usually within three days) for 30% of interviews and then reduced this share, for an overall back-check rate of 15.7%. Second, we spot-checked approximately 20% of interviews, provided immediate feedback, and offered repeat-trainings to enumerators. These spot-checks were not only conducted by field supervisors but also through additional, independent field-monitoring. Third, we ran daily analytics on newly collected data to spot irregularities, and to identify training needs. Finally, we employed 15% of staff as dedicated quality-control officers, such that steps to improve data quality could be taken immediately, as part of the regular data flow.
The following discusses our newly collected outcome measures in greater detail. As education outcomes, we measure educational attainment (highest grade completed), formal and informal training that lasted for at least one week (a binary variable), and whether the respondent received any formal education since the earlier three-year follow-up (a binary variable).
We also collected data on four measures of cognitive skills. First, we administered a computer-adaptive math-test, in which respondents answered ten questions from a larger pool of 23 items. 18 We used a three-parameter logistic (3PL) item response theory (IRT) model with a single guessing parameter (Birnbaum 1968;Samejima 1969) to analyze responses to math tests from an evaluation of a similar scholarship program in Cambodia that was targeted to secondary school students (Filmer and Schady 2008). Participants in this assessment had been tested in two rounds, with overlapping items, and we follow the common (Stocking and Lord 1983) methodology for IRT-based scale equating. 19 Our test begins with the item of median difficulty.
of villagers with no schooling, the percentage of villagers engaged in crop or animal farming, the village's population size, and a continuous measure of villagers' household assets.
18 To the best of our knowledge, this assessment constitutes the first computer-adaptive ability test conducted during a household survey in a developing country. 19 We removed one item with low discrimination.
As the test is administered and respondents answer correctly or incorrectly, our assessment picks the next item to be displayed based on maximum information, recalculates a respondent's ability estimate using expected a posteriori, and continues thereafter until ten items are administered for each respondent (cf. Bock and Mislevy 1982;van der Linden and Pashley 2010). The second assessment is a test of shapes and puzzles loosely based on the Raven's Progressive Matrices. This test is a measure of fluid intelligence; respondents are asked to complete 15 sets of pattern recognition. Our third measure is a "Digit Span" test, which asks respondents to repeat sequences of single-digit numbers, of increasing length. This test is a common measure of respondents' working memory (Hamoudi and Sheridan 2015). Sequences are presented in sets of two and begin with two integers (asking respondents to repeat 2-1 and 1-3).
No additional sequences are asked if a respondent fails to repeat both prompts; the last set, with the longest sequences, presents two strings of eight integers (asking respondents to repeat 6-9-1-7-3-2-5-8 and 3-1-7-9-5-4-8-2). The fourth outcome is a vocabulary test based on picture recognition, similar to a Peabody Picture Vocabulary Test (PPVT). This test asks respondents to identify the picture corresponding to a word that the enumerator reads out loud. For each word, the respondent is asked to select from a choice of four pictures. The test is structured such that items become increasingly difficult (examples of easy items include "citrus" and "garment"; items of highest difficulty include "vitreous" and "lugubrious"). A maximum of 96 items is presented in sets of 12, and no additional item is displayed if a respondent fails to answer at least five items correctly in a given set. The final skill estimates for the math, pattern recognition, and vocabulary recognition tests are calculated with a two-parameter logistic (2PL) IRT model. The Digit Span test score reflects the number of integer sequences a respondent repeated correctly. All four measures are standardized (mean zero and standard deviation of one).
We report on two sets of socioemotional outcomes: we screen for emotional and behavioral difficulties with the Strengths and Difficulties Questionnaire ("SDQ"), and measure the "Big 5" personality traits. The SDQ represents a common screening instrument; we use (the official Khmer translation of) its most frequently used version with 25 items on psychological attributes (Goodman 1997). Following its scoring guidelines and official recommendations (ibid.), we report on three subscales, separated into 'internalizing problems' (emotional and peer symptoms, 10 items), 'externalizing problems' (conduct and hyperactivity symptoms, 10 items), and a scale of prosocial behavior (5 items).
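To make the adaptive procedure for the math test described above more concrete, the following Python sketch walks through its steps: start with the item of median difficulty, record the response, update the ability estimate by expected a posteriori (EAP), and pick the next item by maximum information until ten items have been administered. This is a minimal illustration under stated assumptions, not the survey's actual implementation: the 23 item parameters in `ITEMS`, the guessing parameter `GUESS`, the standard normal prior, and the quadrature grid are all hypothetical, and the sketch uses a 3PL response function with a single guessing parameter throughout.

```python
import math

# Hypothetical pool of 23 items: (discrimination a, difficulty b).
ITEMS = [(1.0 + 0.05 * (i % 5), -2.2 + 0.2 * i) for i in range(23)]
GUESS = 0.2  # hypothetical single guessing parameter shared by all items

# Quadrature grid and (unnormalized) standard normal prior, both assumptions.
GRID = [-4.0 + 0.1 * k for k in range(81)]
PRIOR = [math.exp(-0.5 * t * t) for t in GRID]


def p_correct(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def eap(responses, items, c):
    """Posterior mean of ability given (item_index, correct) pairs."""
    posterior = []
    for theta, weight in zip(GRID, PRIOR):
        like = weight
        for idx, correct in responses:
            a, b = items[idx]
            p = p_correct(theta, a, b, c)
            like *= p if correct else (1.0 - p)
        posterior.append(like)
    total = sum(posterior)
    return sum(t * w for t, w in zip(GRID, posterior)) / total


def run_cat(items, c, answer, n_items=10):
    """Administer n_items adaptively; answer(i) returns True if item i is correct."""
    by_difficulty = sorted(range(len(items)), key=lambda i: items[i][1])
    current = by_difficulty[len(by_difficulty) // 2]  # median-difficulty start
    responses = []
    while True:
        responses.append((current, answer(current)))
        theta = eap(responses, items, c)
        if len(responses) == n_items:
            return theta, [i for i, _ in responses]
        used = {i for i, _ in responses}
        remaining = [i for i in range(len(items)) if i not in used]
        # Next item: the unused item with maximum information at current theta.
        current = max(
            remaining,
            key=lambda i: item_information(theta, items[i][0], items[i][1], c),
        )


# A hypothetical respondent who answers every administered item correctly.
theta_hat, administered = run_cat(ITEMS, GUESS, lambda i: True)
```

In this sketch, a respondent who answers correctly is routed toward harder items and one who answers incorrectly toward easier ones, which is the property that lets a ten-item test cover a 23-item difficulty range.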
To capture respondents' personality traits, we use the Big Five scale, which measures five core dimensions of personality: extraversion, agreeableness, openness, conscientiousness, and neuroticism. Evidence that the Big Five are relevant for (and associated with) life outcomes has been growing, beginning with the research of Fiske (1949) and later expanded upon by other researchers including Norman (1967), Smith (1967), Goldberg (1981), and McCrae and Costa (1987). We use the short 15-item Big Five Inventory (BFI-S) (Lang et al. 2011), with three items per personality trait. Like the indicators of cognitive skill, all measures of socioemotional outcomes are standardized. 20
We also collected information on five labor market outcomes. We ask whether a respondent is currently working (yes or no) and the age at which she or he first started working. We moreover construct a binary indicator of whether a respondent's main work activity is cognitively demanding. We categorize an occupation as such if it requires at least occasional use of reading, writing, mathematics, or a computer (according to the respondent). Our survey also asked for respondents' income; our analysis reports on (the inverse hyperbolic sine of) yearly earnings and (the inverse hyperbolic sine of) a respondent's daily reservation wage, i.e., the minimum wage or payment for which a respondent is willing to accept work (both are reported in US dollars, a currency commonly used in Cambodia).
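As a brief illustration of the inverse hyperbolic sine transformation applied to earnings and reservation wages, the sketch below uses only Python's standard library. The dollar amounts are hypothetical and not taken from the survey. The transformation is defined at zero, unlike the logarithm, and is close to log(2x) for large values, which is why it is convenient for income data that contain zeros.

```python
import math


def ihs(x):
    """Inverse hyperbolic sine: log(x + sqrt(x**2 + 1))."""
    return math.asinh(x)


# Hypothetical yearly earnings in US dollars, including a zero.
earnings_usd = [0.0, 150.0, 1200.0, 3600.0]
transformed = [ihs(v) for v in earnings_usd]

# For large values the transformation is close to log(2 * x).
gap_at_1200 = ihs(1200.0) - math.log(2 * 1200.0)
```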
Our last set of outcomes includes six indicators of socio-economic status and well-being. We assess subjective social status using a "MacArthur community ladder". 21 Respondents were shown a picture of a ladder with ten rungs and were told that higher rungs correspond to higher socio-economic status. They were then asked to place themselves on this ladder in relation to everyone in their community. As a second measure of socio-economic status, we construct an index of respondents' household assets, asking whether they possess items from a list similar to the one presented in Table 2. To calculate an individual's latent SES score, we borrow from the psychometric literature and estimate a two-parameter logistic (2PL) IRT model, placing responses from 2009, 2011 and 2016 on the same scale. 22 We also asked respondents to rate their satisfaction with life at present, all things considered, on a scale from one ("completely dissatisfied") to ten ("completely satisfied") and to rate their quality of life and health, respectively, on a scale from one ("poor") to five ("excellent"). The sixth and last measure screens for (minor) mental health disorders, using the General Health Questionnaire ("GHQ"). We use the short form of the questionnaire (GHQ-12) with Likert scoring (Goldberg and Williams 2006; Quek et al. 2001). All six measures are standardized (mean zero and standard deviation of one). 23
Finally, for each set of educational outcomes, cognitive outcomes, socioemotional outcomes, and SES and subjective well-being, we also calculate an overall "family index," following Anderson (2008). 24 These indices have the benefit of reducing the number of statistical tests (and the temptation to selectively focus on positive results). In constructing the indices, we ensured that the qualitative "direction" of the construct was preserved: higher values point to more desirable outcomes.
However, our index construction is atheoretical and may therefore group together measurements with different underlying constructs. We therefore present and discuss results from both individual measurements and the family indices. 22 Filmer and Scott (2012) show that such an IRT approach produces similar household rankings when compared to other aggregation methods. 23 We standardize by focusing on the endline measures for control group students (who would have qualified for at least one of the two types of scholarships, had they been in a treatment school instead). 24 We also considered using an alternative index instead, following Kling et al. (2007). The alternative approach does not lead to qualitatively different conclusions.
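For readers unfamiliar with this type of summary index, the sketch below shows one way an inverse covariance matrix-weighted "family index" could be computed with NumPy. It is an illustration under assumptions rather than the exact procedure used here: the function name `anderson_index`, the `flip` argument (for outcomes where lower values are more desirable), and the choice to estimate the covariance matrix on the full sample are hypothetical; standardization relative to the control group follows the description in the text.

```python
import numpy as np


def anderson_index(outcomes, is_control, flip=()):
    """Inverse-covariance-weighted mean of standardized outcomes.

    outcomes   : (n, k) array, one column per outcome in the family
    is_control : length-n boolean array marking control-group respondents
    flip       : column indices to multiply by -1 so that higher is better
    """
    y = np.array(outcomes, dtype=float)  # np.array copies the input
    ctrl = np.asarray(is_control, dtype=bool)
    for j in flip:
        y[:, j] *= -1.0

    # Standardize each outcome using the control-group mean and SD.
    z = (y - y[ctrl].mean(axis=0)) / y[ctrl].std(axis=0, ddof=1)

    # Weights are the row sums of the inverted covariance matrix, so that
    # outcomes carrying more independent information receive more weight.
    sigma_inv = np.linalg.inv(np.cov(z, rowvar=False))
    weights = sigma_inv.sum(axis=1)
    index = z @ weights / weights.sum()

    # Re-standardize the index itself relative to the control group.
    return (index - index[ctrl].mean()) / index[ctrl].std(ddof=1)


# Hypothetical data: 200 respondents, three correlated outcomes.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
demo_outcomes = common + rng.normal(size=(200, 3))
demo_control = np.arange(200) % 2 == 0
demo_index = anderson_index(demo_outcomes, demo_control)
```

When the outcomes are uncorrelated and have equal variance, the weights are equal and the index reduces to a standardized simple average; correlated outcomes are down-weighted relative to that benchmark.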

6 Results
Tables 3 to 7 present results on five main categories of outcomes: education; cognitive skills; socioemotional outcomes; socio-economic and well-being outcomes; and labor outcomes. The tables share a common structure. Each table has two panels; Panel A reports results for the merit sample, whereas Panel B reports results for the poverty sample. Each panel presents separate regressions for a given dependent variable, as stated in the column headers. For the treatment variable (1 if assigned to treatment, 0 otherwise), the table presents regression coefficients and standard errors. All regressions control for covariates at baseline and district fixed effects; standard errors are clustered at the level of randomization (the school) (Abadie et al. 2017). Each of the two panels also presents the unconditional mean, as observed for the control group. Each panel moreover includes the results from a chi-square test on the null hypotheses that all treatment coefficients are jointly zero, using seemingly unrelated regression (SUR). Finally, across the two panels and for each of the outcome variables, we present results (in the bottom two rows of the tables) for a test of the null hypothesis that the two treatment coefficients (merit and poverty) are equal.
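The basic structure of these regressions (a treatment dummy, fixed effects, and standard errors clustered at the school level) can be sketched in a few lines of NumPy. The example below is illustrative only: the function `ols_cluster`, the simulated data, and the variable names are hypothetical, the baseline covariates are omitted, and the variance estimator is the plain cluster-robust sandwich without small-sample corrections, which may differ from the variant used for the tables.

```python
import numpy as np


def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of the outer product of
    # the within-cluster score X_g' u_g.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        sel = clusters == g
        score = X[sel].T @ resid[sel]
        meat += np.outer(score, score)

    vcov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))


# Hypothetical data: 6 schools in 3 districts, treatment assigned by school.
rng = np.random.default_rng(0)
school = np.repeat(np.arange(6), 3)
treat = (school % 2).astype(float)
district = school // 2
X = np.column_stack(
    [np.ones(school.size), treat, district == 1, district == 2]
).astype(float)
y = (1.0 + 0.25 * treat + 0.5 * (district == 1) + 1.0 * (district == 2)
     + rng.normal(0.0, 0.1, school.size))

beta, se = ols_cluster(y, X, school)  # beta[1] is the treatment coefficient
```

Clustering at the school level matters here because treatment varies only across schools, so respondents within a school do not provide independent information about the treatment effect.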

Education
The main stated objective of the program was to increase school progression of low-income individuals. Early dropout from primary school is still a major obstacle in education in Cambodia, especially in rural areas. At the inception of the program, only close to 40% of individuals in the poorest income quintile completed 6th grade (Barrera-Osorio and Filmer 2016). As such, the first set of outcomes that the program aimed to change relates to school progression, with an immediate goal of helping students successfully complete primary school (grade 6). Table 3 presents results for school progression (highest grade attained), primary school graduation (a zero-or-one variable), an indicator of whether the respondent received any formal education since the three-year follow-up study (in 2011), and a "family index" of the previous three measures (measured in standard deviations, SDs). On average, students in the control group completed 5.45-5.57 grade levels. Both types of scholarships increased educational attainment, with similar point estimates (0.213 and 0.291 for merit and poverty scholarships, respectively, equivalent to about four additional months of schooling). The effects on overall attainment reported by Barrera-Osorio and Filmer (2016, Table 4, column 3) three years after the start of the program were slightly higher for the poverty sample (0.34) and similar for the merit sample (0.23), indicating some catch-up by the control group. The scholarships increased primary school completion by 5.0 and 11.3 percentage points (pp) for the merit- and poverty-based approaches, respectively (statistically significant for poverty-based scholarships).
The point estimates for impacts on participation in any formal education (between 2011 and 2016) are positive and statistically significant for both groups: merit-based scholarships increased participation by 4.4pp over a control-group average of 77%, poverty-based scholarship increased it by 10pp over a control-group average of 71%, suggesting some catch-up with the merit-based scholarship group. The joint test of all coefficients being equal to zero is rejected for both treatments (a p-value of 0.06 and 0.04 for the merit and poverty treatments, respectively). Both point estimates of the regression with the "family index" as dependent variable are positive and statistically significant (at the 5% and 1% levels, respectively), with a point estimate of 0.131 standard deviations for the merit-based and of 0.264 standard deviations for the poverty-based scholarships. However, we cannot conclude that the two coefficients are in fact different (the p-value corresponding to this test is above 0.10).

Cognitive skills
An implicit objective of the program was to induce an increase in students' learning by encouraging greater attendance and retention, that is, by inducing additional schooling. We measure cognitive skills through proxies that relate to an individual's knowledge, ability to tackle problems, and fluid intelligence. Unsurprisingly, the control group for the merit sample has higher average test scores on these measures than the control group for the poverty sample (Table 4). Table 4 presents the impacts of scholarships on these measures of cognitive skills.

Across the different measures, we find suggestive evidence of positive effects for the merit-based treatment. All coefficients are positive, and two of them are statistically significant (Raven's and the overall "family index"). The estimation suggests an overall effect of 0.113 standard deviations on these measures (significant at the 10% significance level). In contrast, the results for the poverty-based scholarship are either close to zero or even negative, in the case of the Forward Digit Span (a point estimate of -0.129 SDs, significant at the 10% level, and different from the effect for the merit-based transfer, significant at the 5% level). The "family index" is close to zero in the case of the poverty scholarship (<0.01 standard deviations). The findings here, nine years after program inception, are consistent with those documented in the previous three-year follow up study. In that study, merit-based scholarship recipients scored higher in mathematics and for the Digit Span test (Barrera-Osorio and Filmer 2016), whereas poverty-based scholarship recipients did not.

Socioemotional skills
An important contribution of our study is the analysis of effects on socioemotional skills. We are not only interested in measuring the effects of scholarships on these measures; we are also interested in the relationship between cognitive and socioemotional skills. The intuition behind this analysis has two parts. First, if scholarships induced more schooling for both types of scholarship recipients, then, under the assumption that schools also "produce" socioemotional skills, we should observe effects on these skills from both poverty- and merit-based scholarships. In contrast, if there is a complementary relationship in the accumulation of cognitive and socioemotional skills, we should observe effects on socioemotional skills only for students with the merit-based scholarship, and not for students with the poverty-based scholarship. We formally present these relationships in the next paragraphs.
Our approach is based on two different conceptual models of the relationships between years of education ($E$), cognitive skills ($C$), and socioemotional skills ($S$). As a starting point, based on the evaluation three years after the program's inception (Barrera-Osorio and Filmer 2016), we know that treatment $T_0$ (at baseline, $t = 0$) increased years of education for both merit- and poverty-based scholarships ($E_t = f(T_0; X_0, Z_0)$, with $\partial E_t / \partial T_0 > 0$ for both types of scholarships). Furthermore, the evaluation showed a causal, positive effect of the intervention on cognitive skills for the merit-based scholarship only ($C^M_t = f(T^M_0; X_0, Z_0)$, with $\partial C^M_t / \partial T^M_0 > 0$), and zero effects for the poverty-based scholarship ($C^P_t = f(T^P_0; X_0, Z_0)$, with $\partial C^P_t / \partial T^P_0 = 0$), where $M$ denotes merit-based treatment and $P$ denotes poverty-based treatment.
The first conceptual relationship we explore is that between each type of skill (cognitive and socioemotional) and years of education:
$$C_t = g(E_t; X_0, Z_0), \qquad S_t = h(E_t; X_0, Z_0), \tag{1}$$
where $X_0$ are student characteristics and $Z_0$ are school inputs (at baseline). These equations state that the effect on either set of skills is a function of the years of education; i.e., exposure to more schooling will induce higher cognitive and socioemotional skills. Therefore, the first set of relationships we investigate are:
$$\frac{\partial C_t}{\partial T_0} = \frac{\partial C_t}{\partial E_t} \frac{\partial E_t}{\partial T_0} \tag{2}$$
and
$$\frac{\partial S_t}{\partial T_0} = \frac{\partial S_t}{\partial E_t} \frac{\partial E_t}{\partial T_0}. \tag{3}$$
If schooling produces cognitive and socioemotional skills, both Equations 2 and 3 are positive, independently of the type of treatment (merit or poverty). In contrast, the second conceptual relationship is based on a modification of this setup: for the merit-based scholarship we have an additional equation, relating cognitive skills and treatment:
$$C^M_t = f(T^M_0; X_0, Z_0), \qquad \frac{\partial C^M_t}{\partial T^M_0} > 0, \tag{4}$$
i.e., treatment induced higher cognitive skills only for the merit ($M$) treatment. The basic relationship of interest is between socioemotional skills and cognitive skills: $S_t = \tilde{h}(C_t, E_t; X_0, Z_0)$. The second relationship we investigate is therefore:
$$\frac{\partial S^M_t}{\partial T^M_0} = \frac{\partial S_t}{\partial E_t} \frac{\partial E_t}{\partial T^M_0} + \frac{\partial S_t}{\partial C_t} \frac{\partial C^M_t}{\partial T^M_0} > 0, \tag{5}$$
i.e., that the effect of treatment on socioemotional skills is positive, and it depends on the effect of cognitive skills on socioemotional skills ($\partial S_t / \partial C_t$). If there is complementarity (or co-production) between cognitive and socioemotional skills (i.e., $\partial S_t / \partial C_t > 0$), Equation 5 is positive. For the case of the poverty-based scholarship ($P$), the corresponding expression is:
$$\frac{\partial S^P_t}{\partial T^P_0} = \frac{\partial S_t}{\partial E_t} \frac{\partial E_t}{\partial T^P_0}. \tag{6}$$
There are three main relevant cases for Equations 5 and 6. If exposure to school in and of itself produces socioemotional skills, both Equations 5 and 6 are positive. If exposure to schooling does not produce socioemotional skills, Equation 6 is equal to zero. Finally, under complementarities between cognitive and socioemotional skills (e.g., if cognitive skills help in the acquisition of socioemotional skills, or if they are co-produced), Equation 5 is positive, independent of the relationship between socioemotional skills and exposure to school.
Table 5 presents results on the Strengths and Difficulties Questionnaire (SDQ), separating out the three attributes (prosocial, internalizing, and externalizing), and on the Big 5, separated into its five traits: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism (OCEAN).
Overall, we find little evidence of impacts on the subcomponents of these two groups of socioemotional outcomes. All point estimates are close to zero, with the exception of Neuroticism for the poverty treatment, with a coefficient of 0.186 standard deviations (statistically significant at the 1% level). Nevertheless, the coefficients for the impact on "family indices" for both treatments are close to zero (-0.005 and -0.099 standard deviations for the merit- and poverty-based treatment, respectively), and neither coefficient is statistically significant. The broad pattern of the table suggests that the program did not produce effects on socioemotional skills, despite the observed impact on school progression and, for the merit sample, on cognitive outcomes. We cannot rule out competing hypotheses such as low marginal exposure to schooling (i.e., the treatment groups increased their educational attainment by only about four months, on average). We note, however, that this amount of additional schooling was sufficient to produce improved cognitive performance among the recipients of merit-based scholarships.

Labor outcomes
Table 6 presents the effects of the program on current labor status (coded as one if the respondent is "currently working", and zero otherwise); the respondent's age when they started working (which captures child labor); whether the recipient participated in work-related training that lasted for at least one week (formal or informal, a zero-or-one variable); the cognitive demands of the respondent's main work activity; and two measures of income: yearly earnings and the daily reservation wage (both transformed using an inverse hyperbolic sine).
There is a positive impact on the probability of working for recipients of the merit-based scholarships (3.4 percentage points, statistically significant at the 10% level); the point estimate for the poverty arm is lower (1.2 percentage points) and not statistically significant. These effects are relative to a high baseline level of people who report to be currently working (the means of both control groups are around 92%). 25 Respondents in our two samples started to work very early in life, when they were between 12 and 13 years old. Respondents who were offered a scholarship delayed entering the labor market by 0.074 and by 0.339 years for the merit- and poverty-based program, respectively, in line with the results for school progression. However, these estimates have large standard errors and are not significantly different from zero. In addition, while about 58% of control group respondents report having received formal or informal training since 2011 (which could have improved their work prospects), we do not see any effects on this outcome. Only a small share of respondents (less than 18%) engage in economic activities that are cognitively demanding. There is no evidence of impact on the cognitive demands of the main work activity, for either of the two treatments. The point estimates on yearly earnings are negative for both treatment arms (but not statistically significant); one potential explanation is that the scholarship program delayed entry into the labor market for recipients and, as a result, they have less experience than non-recipients. We observe a positive impact on the daily reservation wage for both groups; however, the estimates are very imprecise.
individuals' occupation falls into ISCO Major Group 6 ("skilled agricultural and fishery workers").

Well-being outcomes
Table 7 presents effects on various measures of well-being. These include both self-assessed measures and more readily observed indicators such as measures of household asset ownership. All these outcomes are standardized to have a mean of zero and a standard deviation of one. As for some of the previous tables, we also present results for a standardized "family index" of these measures.
Both treatments had a positive impact on perceived status as measured by the SES ladder, with point estimates of 0.173 and 0.208 standard deviations for merit-based and poverty-based scholarships, respectively. In addition, merit-based scholarships resulted in statistically significant positive impacts on respondents' SES Index (i.e., ownership of household assets; 0.186 standard deviations), quality of health (0.129 standard deviations) and on the "family index" (0.174 standard deviations). Other than the SES ladder, none of the impacts for the poverty-based scholarships is statistically significantly different from zero, although most of the point estimates are positive. We reject the null hypothesis, for both targeting approaches, of all estimators being equal to zero (at the 1% level of significance). The point estimate for the impact on the overall "family index" for merit-based scholarships is substantially higher than for the poverty-based ones (although not statistically significantly so; p-value = 0.168).

Heterogeneity
We investigate two types of heterogeneous effects. Both sets of analyses use the "family indices" for education, cognition, socioemotional outcomes, and well-being outcomes; as an indicator of labor outcomes, we use a respondent's daily reservation wage. The first analysis of heterogeneous effects compares the impact of scholarships for respondents who would have qualified for a scholarship under both targeting schemes. For those individuals, the scholarship only differs in terms of its name or "label". To investigate this, we estimate a regression that includes the treatment dummy, an indicator for whether a respondent would not have qualified under the other scheme, and the interaction between the treatment and this indicator. Of interest is a comparison of the two direct treatment coefficients (the first rows of Panels A and B of Table 8), as these reflect the impact of the scholarships on students who were both high merit and high poverty. A difference in point estimates indicates a labeling effect. 26
The results are consistent with heterogeneous impacts by treatment label, favoring the merit-based presentation of scholarships over their poverty-based presentation. The key pattern in Table 8 is that for the indices other than education participation and socioemotional outcomes, the coefficients on "Treatment" for the merit-based scholarships are substantially larger than those for "Treatment" for the poverty-based scholarships. For example, for cognitive skills the impact of merit-based scholarships on high-merit high-poverty students is 0.211 (statistically significantly different from zero) whereas the impact of poverty-based scholarships on high-merit high-poverty students is 0.056 (not statistically significantly different from zero). The difference between these coefficients is 0.155 for cognitive skills, 0.144 for well-being, and 0.158 for reservation wage. By contrast, the impacts on education participation and socioemotional outcomes are more similar (the differences are -0.069 and 0.095, respectively).
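The logic of this comparison can be sketched as follows. In a regression with a treatment dummy, an indicator for respondents who would not have qualified under the other scheme, and their interaction, the coefficient on the treatment dummy is the effect for respondents who qualify under both schemes. The Python sketch below is a hedged illustration on noise-free synthetic data: the helper names `fit_interaction` and `synthetic` and the values of `gamma` and `delta` are hypothetical, and only the two treatment coefficients for cognitive skills (0.211 and 0.056) are taken from the text.

```python
import numpy as np

NAMES = ["const", "treatment", "not_other", "interaction"]


def fit_interaction(y, treat, not_other):
    """OLS of y on treatment, the 'not qualified under other scheme'
    indicator, and their interaction; returns coefficients by name."""
    treat = np.asarray(treat, dtype=float)
    not_other = np.asarray(not_other, dtype=float)
    X = np.column_stack(
        [np.ones_like(treat), treat, not_other, treat * not_other]
    )
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return dict(zip(NAMES, coef))


def synthetic(beta_treat, gamma=-0.3, delta=-0.1):
    """Noise-free outcome with a known treatment effect for dual qualifiers.
    gamma and delta are hypothetical values chosen for illustration."""
    treat = np.array([0, 0, 1, 1] * 5, dtype=float)
    not_other = np.array([0, 1, 0, 1] * 5, dtype=float)
    y = beta_treat * treat + gamma * not_other + delta * treat * not_other
    return y, treat, not_other


merit = fit_interaction(*synthetic(0.211))    # Panel A analogue
poverty = fit_interaction(*synthetic(0.056))  # Panel B analogue

# Difference in direct treatment coefficients: the "labeling" contrast.
labeling_gap = merit["treatment"] - poverty["treatment"]
```

Because the data are noise-free, the sketch recovers the coefficients exactly; with survey data, the contrast would carry sampling error and would need to be tested across the two samples.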
We interpret these results as suggesting that the labeling effect that was apparent in the earlier three-year follow-up study remains and is apparent in dimensions not documented before (well-being and reservation wage). However, we recognize that imprecision in the estimates makes it hard to be confident about this finding: the only difference in coefficient estimates that is statistically significantly different from zero is that for reservation wage (p-value of 0.018). 27
The second dimension of heterogeneity we investigate is that by gender. In Table 9, we present results from regressions of the dependent variables on a treatment indicator, a gender indicator (female = 1, and zero otherwise), and their interaction. The table also assesses whether the size of the impact is different across targeting types for boys (as indicated by a Chi-square test and its corresponding p-value in the last two rows of the table). The results are mixed, and should be interpreted with caution since, as in the discussion of Table 8, some of the point estimates suffer from large standard errors. We do not find gender-differential impacts on a beneficiary's educational attainment, socioemotional outcomes, or well-being (whether within or across the two programs and samples). In contrast, there are large differences in the effect of poverty-based transfers on cognitive skills. While the impact on boys is positive (with an effect size of 0.168), the impact on girls is negative (with an effect size of -0.141 = 0.168 - 0.309), with the difference being statistically significantly different from zero. Unlike in the average effect, the results reveal a positive impact of poverty-based scholarships for male recipients. Finally, Column (5) suggests that the estimated impact on a recipient's daily reservation wage comes from the impact for male recipients; for females, the point estimate is close to zero for either of the two programs.
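The subgroup effects quoted above follow directly from the interaction specification: with a female indicator, the treatment coefficient is the effect for male respondents, and the effect for female respondents is the sum of the treatment and interaction coefficients. A minimal sketch of this arithmetic, using the two numbers reported in the text for cognitive skills under the poverty-based scholarship (the function name `subgroup_effects` is hypothetical):

```python
def subgroup_effects(beta_treat, beta_interaction):
    """Effects by gender implied by a treatment-by-female interaction model."""
    return {
        "male": beta_treat,                      # female = 0
        "female": beta_treat + beta_interaction,  # female = 1
    }


# Coefficients as reported in the text: 0.168 (treatment) and -0.309
# (treatment x female) for the cognitive skills index, poverty sample.
effects = subgroup_effects(0.168, -0.309)
```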

Conclusions
This study has investigated the long-term impacts of increased schooling, with a particular focus on potential complementarities across schooling, the development of cognitive skills, and socioemotional and labor market outcomes later in life. To this end, we evaluated the long-term effects of a primary school scholarship program in rural Cambodia, nine years after the program's inception, tracking study participants when they were, on average, 21 years old. Overall, we find that the targeting approach matters for the impact on cognitive skills, socio-economic status, and well-being. The merit-based and poverty-based targeting schemes both led to increased schooling, but only the merit-based scholarship led to improvements in cognitive skills and to greater well-being. There is limited evidence of systematic differences across outcomes in these long-term impacts by gender.
Our study points to potentially important avenues for research and policy. Prior work argues that more schooling does not necessarily imply more learning (World Bank, 2018); in turn, our work highlights that more schooling, even if it enhances learning, may not necessarily translate to noticeable changes in labor market outcomes and may not lead to measurable improvements in socioemotional skills. To better understand this puzzle, additional research is needed in at least three areas. First, our analysis of heterogeneous effects provides suggestive evidence that labor market effects may be concentrated among poorer beneficiaries who are male. This result echoes the findings of Duflo et al. (2017), who find labor market effects for a subset of male students only. It will be important to understand how programs such as these can be designed such that they also fully benefit female recipients. Second, our findings are consistent with research by Jackson (2018), which suggests that the school-based production of cognitive skills may not necessarily go hand-in-hand with improvements in socioemotional outcomes. However, research on how to purposefully foster socioemotional skills in school settings is only in its infancy, especially in developing countries (see West et al. 2016). Third, our reported lack of impacts on socioemotional skills may be at least partially driven by a lack of precision; we would encourage other researchers to improve upon our study through continued work on the measurement of socioemotional outcomes in low-income countries (such as Laajaj and Macours 2017) and through similar, long-term evaluations with larger samples.
Married is a dummy equal to 1 if the respondent is currently married and 0 if never married, divorced or separated. This variable is missing for minors. Currently working is a dummy equal to 1 if the respondent worked during the last week or has a job at the moment and 0 otherwise; respondents may work and also be a student. Column (1) presents the number of observations in the analysis sample. Columns (2) to (4) display the means for the full sample, the treatment group, and the control group, respectively. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1.
Poverty Scholarship

Attritor C Attritor T Non-attritor C Non-attritor T Diff-in-Diffs (1) (1) (2) (3) Notes: All variables measured at baseline. Columns 1 to 4 display the means for the control group attritors, the treatment group attritors, the control group surveyed and the treatment group surveyed. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean among attritors minus the difference between the treatment group mean and the control group mean among respondents. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below) is the result of a test testing for the individual coefficients being jointly equal to 0 using seemingly unrelated estimation.  (1) is the highest grade the individual completed and is equal to -1 if the individual received no education, 0 if he only went to kindergarten and then ranges from 1 to 11 for Grade 1 to Grade 11. In column (2), the dependent variable is a dummy equal to 1 if the individual completed primary education. In column (3), the dependent variable is equal to 1 if the individual was enrolled in the formal education system during any of the years 2011 to 2016. In column (4), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from the three previous columns following Anderson (2008). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). 
Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding pvalue below) is a result of testing for the coefficients of individuals regressions being jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value below) is a result of testing for the coefficient of the merit sample and the coefficient of the poverty sample being equal.  (1) is the score on the mathematics computer adaptive test, computed using Item Response Theory (IRT) with a two parameter logistic (2PL) model, standardized. In column (2), the dependent variable is the score on the Raven's matrices test computed using IRT with a 2PL model, standardized. In column (3), the dependent variable is the standardized score on the digit span test using forward items only, standardized. In column (4), the dependent variable is the score on a Picture Recognition Vocabulary Test computed using IRT with a 2PL model, standardized. In column (5), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from the four previous columns following Anderson (2008). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value below) is a result of testing for the coefficients of individuals regressions being jointly equal to 0, using seemingly unrelated estimation. 
The poverty vs. merit Chi-square (and corresponding p-value below) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.

Notes: In column (1), the dependent variable is the score from 0 to 10 on the pro-social facet (the higher the score, the more pro-social) of the Strengths and Difficulties Questionnaire (SDQ), standardized. In column (2), the dependent variable is the score from 0 to 20 on the internalizing behavior facet (the higher the score, the more internalizing behavior problems) of the SDQ, standardized. In column (3), the dependent variable is the score from 0 to 20 on the externalizing behavior facet (the higher the score, the more externalizing behavior problems) of the SDQ, standardized. In columns (4) to (8), the dependent variables are the scores from 3 to 15 on the Openness, Conscientiousness, Extroversion, Agreeableness and Neuroticism facets of the Big Five scale, standardized. In column (9), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from the first eight columns following Anderson (2008) (scores from columns (2), (3), and (8) have been flipped beforehand). In column (10), the family index represents the first factor from an exploratory factor analysis (EFA) with quartimin rotation, on the same set of variables as in (9). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value below) is the result of testing whether the coefficients of the individual regressions are jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value below) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.

Notes: In column (1), the dependent variable is a dummy equal to 1 if the individual is currently working, i.e., the individual worked for at least 1 hour during the last week or has a job at the moment but did not work during the last week.
In column (2), the dependent variable is the age at which the individual started to work. In column (3), the dependent variable is a dummy equal to 1 if the individual participated in any formal or informal training that lasted at least one week, since 2011. In column (4), the dependent variable is a dummy equal to 1 if the main work activity demands cognitive ability (read, write, calculate, or use a computer) and 0 otherwise. In column (5), the dependent variable is yearly earnings expressed in US dollars and transformed using an inverse hyperbolic sine. In column (6), the dependent variable is the daily reservation wage in US dollars, transformed using an inverse hyperbolic sine. In column (4), values for respondents who did not work have been imputed with 0, except if they were students. In columns (1) and (4), the sample is restricted to respondents who are not currently students. Column (2) includes everyone who ever worked. Column (4) includes only people who worked over the past 12 months. Columns (3) and (6) include the entire sample. Columns (1), (2), (3), (4), and (6) are estimated using OLS regression; column (5) is estimated using Tobit regression. All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1.

Notes: In column (1), the dependent variable is the score from 1 to 10 on an economic ladder as compared to people of the same age in the village, standardized.
In column (2), the dependent variable is a socio-economic index constructed based on asset ownership, computed using Item Response Theory (IRT) with a two-parameter logistic (2PL) model, standardized. In column (3), the dependent variable is the score from 1 to 10 on a life satisfaction question, standardized. In column (4), the dependent variable is the score from 1 to 5 on a health quality question, standardized. In column (5), the dependent variable is the score from 1 to 5 on a life quality question, standardized. In column (6), the dependent variable is the standardized score on the General Health Questionnaire. In column (7), the family index is the inverse covariance matrix-weighted mean of the standardized dependent variables from all the previous columns following Anderson (2008). All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The joint significance Chi-square (and corresponding p-value below) is the result of testing whether the coefficients of the individual regressions are jointly equal to 0, using seemingly unrelated estimation. The poverty vs. merit Chi-square (and corresponding p-value below) is the result of testing whether the coefficient of the merit sample and the coefficient of the poverty sample are equal.

Notes: Columns (1) to (3) and (5) are the family indices from Tables 3 to 5 and 7. Column (4) is the same variable as in Table 6. Treatment captures effects for students who would have qualified for a scholarship under either scheme.
Below the median poverty score are individuals who qualify under the merit-based scheme but would not have received a poverty-based scholarship. Below the median test score are individuals who qualify under the poverty-based scheme but would not have received a merit-based scholarship. All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below) is the result of testing the equality between the interaction term from the merit sample and the interaction term from the poverty sample.

Notes: Columns (1) to (4) are the family indices from Tables 3 to 5 and 7. Column (5) is the same variable as in Table 6. All regressions control for district fixed effects, baseline test score, baseline poverty score, individual-level socio-economic variables from baseline, 6 school-level (EMIS) variables and 5 census village-level variables, measured at baseline. Panel A includes respondents who were eligible for the merit scholarship (Treatment=1, 0 otherwise) and Panel B respondents who were eligible for the poverty scholarship (Treatment=1, 0 otherwise). Robust standard errors are in parentheses (clustered at the school level). *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below) is the result of testing the equality between the interaction term from the merit sample and the interaction term from the poverty sample.
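
Several of the table notes describe a family index as the inverse covariance matrix-weighted mean of standardized outcomes following Anderson (2008). The sketch below illustrates one way such an index could be built. It is not the authors' code: the function names, the choice to standardize with the control-group mean and standard deviation, the use of the full-sample covariance matrix, and the toy data are all assumptions made for illustration, and missing values are not handled.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def anderson_index(outcomes, is_control):
    """Hypothetical helper: one index value per person.

    outcomes: list of rows, each a list of K outcome values (higher = better).
    is_control: list of booleans marking control-group members.
    """
    n, k = len(outcomes), len(outcomes[0])
    # Assumption: standardize each outcome with the control-group mean and SD.
    z = []
    for j in range(k):
        ctrl = [row[j] for row, c in zip(outcomes, is_control) if c]
        mu, sd = mean(ctrl), stdev(ctrl)
        z.append([(row[j] - mu) / sd for row in outcomes])
    # Sample covariance matrix of the standardized outcomes.
    zbar = [mean(col) for col in z]
    cov = [
        [sum((z[a][i] - zbar[a]) * (z[b][i] - zbar[b]) for i in range(n)) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]
    # Weights are the row sums of the inverse covariance matrix: w = cov^{-1} 1.
    w = solve(cov, [1.0] * k)
    total = sum(w)
    return [sum(w[j] * z[j][i] for j in range(k)) / total for i in range(n)]


# Toy data (invented): six people, two outcomes, first three are controls.
example_outcomes = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0],
                    [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]]
example_control = [True, True, True, False, False, False]
example_index = anderson_index(example_outcomes, example_control)
```

Because each outcome is centered on the control group, the index averages to zero among controls in this sketch. Outcomes where a higher score is worse would need to be sign-flipped before being passed in, as the notes describe for some columns.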

Appendix: Additional checks of validity and robustness of findings

Notes: Column (1) presents the number of observations in the analysis sample (excluding observations with imputed baseline information). Columns (2) to (4) display the means for the full sample, the treatment group, and the control group, respectively. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below) is the result of testing whether the individual coefficients are jointly equal to 0, using seemingly unrelated estimation.

Notes: Column (1) presents the number of observations in the analysis sample (excluding observations with imputed baseline information). Columns (2) to (4) display the means for the full sample, the treatment group, and the control group, respectively. Standard deviations in parentheses. Column (5) is the difference between the treatment group mean and the control group mean. Differences in means are computed by OLS regression, controlling for province fixed effects. Standard errors in parentheses are clustered at the school level. *** p<0.01, ** p<0.05, * p<0.1. The Chi-square (and corresponding p-value below) is the result of testing whether the individual coefficients are jointly equal to 0, using seemingly unrelated estimation.
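
The balance notes above state that differences in means are computed by OLS regression, controlling for province fixed effects. As a rough illustration of what that point estimate is, the sketch below recovers the treatment coefficient by demeaning the outcome and the treatment indicator within each province (the Frisch-Waugh-Lovell result), which gives the same coefficient as including province dummies. The function name and the toy data are hypothetical, and the school-level clustered standard errors reported in the tables are not computed here.

```python
from collections import defaultdict


def fe_difference_in_means(y, treat, province):
    """Coefficient on `treat` from an OLS of y on treat plus province fixed effects.

    Computed by within-province demeaning rather than by building dummy variables.
    """
    # Accumulate per-province sums of y, sums of treat, and counts.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, ti, pi in zip(y, treat, province):
        s = sums[pi]
        s[0] += yi
        s[1] += ti
        s[2] += 1
    num = den = 0.0
    for yi, ti, pi in zip(y, treat, province):
        sy, st, n = sums[pi]
        y_dm = yi - sy / n  # outcome net of province mean
        t_dm = ti - st / n  # treatment net of province mean
        num += t_dm * y_dm
        den += t_dm * t_dm
    if den == 0.0:
        raise ValueError("no within-province variation in treatment")
    return num / den


# Toy data (invented): within each province, treated units score 2 points higher.
example_y = [3.0, 1.0, 3.0, 1.0, 7.0, 5.0, 5.0]
example_treat = [1, 0, 1, 0, 1, 0, 0]
example_province = ["A", "A", "A", "A", "B", "B", "B"]
example_diff = fe_difference_in_means(example_y, example_treat, example_province)
```

In the toy data the two provinces have very different outcome levels, so a raw comparison of treated and control means would mix the treatment gap with province composition; the within-province estimate isolates the 2-point gap.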