The Role of Training Programs for Youth Employment in Nepal: Impact Evaluation Report on the Employment Fund

The youth unemployment rate is exceptionally high in the developing world. Because quality of education is arguably one of the most important determinants of youth’s labor force participation, governments worldwide have responded by creating job training and placement services programs. Despite the rapid expansion of skill-enhancement employment programs across the world and the long history of training program evaluations, debates about the causal impact of training-based labor market policies on employment outcomes persist. Using a quasi-experimental approach, this report presents the short-run effects of skills training and employment placement services in Nepal. Launched in 2009, the intervention provided skills training and employment placement services for over 40,000 Nepalese youth over a three-year period, including a specialized adolescent girls’ initiative that reached 4,410 women aged 16 to 24. We find that, after three years of the program, the Employment Fund (EF) intervention improved employment outcomes. Participation in the EF training program generated an increase in non-farm employment of 15 to 16 percentage points, for an overall gain of about 50 percent. The program also increased average monthly earnings by about 72 percent. We find significantly larger employment impacts for women than for men, but younger women aged 16 to 24 experienced the same improvements as older women. These employment estimates are comparable to, though somewhat higher than, those of other recent experimental interventions in developing countries.


I. Introduction
Youth unemployment and underemployment across much of the developing world are extremely high. Two facts highlight the importance of youth's labor in the world economy: 17 percent of the world's population are youth (ages 16-24) and youth make up 40 percent of the world's unemployed. In developing countries, where labor frequently encompasses informal self-employment and small-scale agriculture, youth also struggle with high underemployment (they are not able to work as much as they would like to) and low productivity. Up to two-thirds (60%) of the young population is underutilized in some developing economies, meaning that they are unemployed, in irregular employment (most likely in the informal sector), or neither in the labor force nor in education or training (ILO, 2013). Youth unemployment and underemployment not only slow economic growth but also raise crime rates (Fella and Gallipoli, 2007), depression rates (Frese and Mohr, 1987), substance abuse rates (Linn, Sandifer and Stein, 1985), and rates of social exclusion (Goldsmith, Veum and Darity, 1997).
Nepal is similarly affected by youth unemployment and underemployment. With a per capita income of US$700, Nepal is South Asia's second poorest country (ahead of only Afghanistan). Helped by remittances, the proportion of the population falling below the national poverty line has declined in recent years from 31 percent in 2003-04 to 25 percent in 2011 (Central Bureau of Statistics 2011). One underlying factor of poverty in Nepal is the lack of employment opportunities, and a reliance on self-employment in the agriculture sector, which accounts for 61 percent of the total labor force (Nepal Living Standard Survey 2011). The unemployment rate for those aged 15-29 is 19.2 percent, compared to just 2.7 percent for people older than 15 (ILO 2014). Faced with these prospects, young Nepalese are compelled to consider migrating overseas in search of better opportunities.
Governments worldwide have responded by creating job training and placement services programs. In 2013, the European Union (EU) launched an eight billion euro initiative aiming to provide every young European with a job, apprenticeship, or training within four months of becoming unemployed. 3 In Latin America, job training programs (referred to collectively as the "Jovenes" programs) have been implemented since the early 2000s. To date, more than 700 youth employment programs from around 100 countries have been implemented and more than 80 percent of these programs offer some sort of skills training. 4
Despite the rapid expansion of skill-enhancement employment programs across the world and the long history of training program evaluations, 5 debates about the causal impact of training-based labor market policies on employment outcomes persist. Based on US and European evidence, Card et al. (2009) review the impacts of various training programs. Their analysis suggests that classroom and on-the-job training programs are not particularly effective in the short run, but have larger positive impacts after two years. They also find that youth programs, on average, tend to yield less positive impacts than untargeted programs. The evidence regarding impacts by gender is mixed. Kluve (2006) reviewed a number of employment program evaluations in Europe and found that the programs tended to have larger impacts for women than for men. However, Card et al. (2009) found that programs tended to work equally well for men and women.
In general, program evaluations from developing countries show larger impacts than those from other regions. Based on 289 youth employment interventions in 84 countries, Betcherman et al. (2007) show higher impact in developing countries than in developed ones. 6 Most of the rigorous evidence on training programs in developing countries is from Latin America, where positive impacts are particularly pronounced (Gonzalez-Velosa et al., 2012; Attanasio et al., 2008). 7 Attanasio et al. (2008) evaluate the Jovenes en Acción job training program in Colombia. Jovenes en Acción provided three months of classroom training followed by a three-month unpaid internship at a company. Attanasio et al. (2008) detect positive employment effects for women (4 to 7 percentage points), no employment effects for men, and positive earnings effects for both men (8 percent) and women (18 percent). The study argues that the increase in earnings is due to increased employment in formal sector jobs upon training completion. 8
In Nepal, a wide variety of public and private technical education and vocational training (TEVT) programs are available to youth. Youth unemployment is frequently cited as a driver of the ten-year conflict from which Nepal emerged in 2006 (Macours 2011; ILO 2014), and since then international donors have invested heavily in the TEVT sector. Additionally, a combination of age, low educational attainment, and norms around marriage and childbirth confers multiple disadvantages on young women in the labor market. Their labor force participation rate is 43.0 percent compared to 51.7 percent for young men (ILO 2014). Concerns about the social exclusion of women, ethnic minorities, indigenous peoples, and other historically disadvantaged groups have also spurred advocacy and investment in training opportunities for these groups. The sector remains fragmented, however, with many public, private, and non-governmental training providers of varying quality and weak links to the job market. Among development partners, the World Bank and the Asian Development Bank are particularly active in this area, with the ADB's US$25 million Skills for Employment project and the World Bank's US$30 million Enhanced Vocational Education and Training Project. To the best of our knowledge, no rigorous evidence yet exists on the impacts of these programs on the labor market outcomes of participants.
Using a quasi-experimental approach, this report presents the short-term effects of skills training and employment placement services in Nepal. Founded in 2008, the Employment Fund (EF) is operated by Helvetas, a Swiss NGO, in partnership with the Government of Nepal. It is currently one of the largest youth training initiatives in the country, serving almost 15,000 youth annually.
In partnership with the Employment Fund's donors, 9 the Adolescent Girls Employment Initiative (AGEI) was launched in 2009 to expand the program's reach to an additional 4,410 Nepali women aged 16-24 over a three-year period.
Unique characteristics of this impact evaluation include:
- A large study sample (4677 individuals in the pooled 2010-2012 cohorts examined in this report)
- Particular focus on the outcomes of women aged 16-24 as well as the outcomes of a broader category of youth (men and women aged 16-35)
- Socio-economic survey data of participants and a control group of non-participants as well as administrative follow-up data of program participants
- Exhaustive tracking of program participants over time that produced high response rates
- Examination of a large set of outcome domains, including employment and earnings, empowerment and self-confidence, risky behaviors, and impacts on the household
We find that, approximately two years into the program, the EF program improved employment outcomes. Participation in the EF training program generated an increase in non-farm employment of 15 to 16 percentage points, for an overall gain of 46 percent. The program also generated an average earnings gain of 921 NRs per month (12 USD). Given that average monthly income at baseline was 1272 NRs (17 USD), the EF program generated an economically meaningful income gain of about 72 percent for the combined 2010-2012 cohorts. We find that the program's impacts on employment were larger for women than for men, though the impacts on other economic outcomes were similar for both sexes. We also detect no difference in impacts when comparing older women (aged 24-35) and younger women (aged 16-24), indicating that the program is equally effective for the younger women targeted under the AGEI.
This report's employment estimates are comparable to, though somewhat higher than, those of other recent experimental interventions in developing countries. Estimates of employment impacts of job training programs in Latin America by Attanasio et al. (2008) and Alzua et al. (2013) show gains of 14 percentage points and 10 percentage points, respectively. In terms of earnings, this evaluation finds substantially higher impacts than the 18 percent increase in earnings found by Attanasio et al. (2008).
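As a quick check on the earnings arithmetic reported above, the relative gain is the estimated monthly impact divided by mean baseline earnings. The sketch below uses only the two figures quoted in this section; the variable names are illustrative and are not taken from the report.

```python
# Figures quoted in the text, in Nepalese rupees (NRs) per month.
earnings_impact_nrs = 921      # estimated program impact on monthly earnings
baseline_earnings_nrs = 1272   # average monthly earnings at baseline

# Relative gain = estimated impact / baseline mean.
relative_gain = earnings_impact_nrs / baseline_earnings_nrs
relative_gain_pct = round(100 * relative_gain)

print(f"Earnings gain relative to baseline: about {relative_gain_pct} percent")
```

The result rounds to the 72 percent figure cited for the pooled 2010-2012 cohorts.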
The report is organized as follows: In Section II, we describe the global Adolescent Girls Initiative and the Employment Fund program in Nepal. Section III describes the design of the AGEI impact evaluation, Section IV describes our empirical approach, and Section V discusses the potential threats to the study's validity. Section VI presents results and Section VII concludes.

II. Skills Training for Young Women
a. Programs worldwide
This impact evaluation was conducted under the aegis of the World Bank's Adolescent Girls Initiative (AGI), launched in 2009 with the aim of facilitating the transition to productive employment for young women. The initiative comprised eight pilot projects (in Afghanistan, Haiti, Jordan, Laos, Liberia, Nepal, Rwanda, and South Sudan), and its launch was motivated by two factors: (1) the particular challenges faced by young women in developing countries during the school-to-work transition, including a high burden of domestic work, competing pressures to bear and raise children, negative social norms regarding occupational choice and mobility, and gender discrimination; and (2) the potential benefits of empowering young women, both to household members and to current and future children.
AGI interventions offered skills training and various ancillary services tailored to local context, such as childcare, mentoring, job placement assistance, and links to microcredit, in order to facilitate young women's transition to productive employment. Five of the eight pilots included an experimental component and two of the impact evaluations have reported results. The AGI project in Jordan, evaluated with a randomized design, provided short-term soft-skills training and six-month employment vouchers to recent female graduates of technical universities. Although the vouchers helped young women obtain short-term internships, the evaluation detected no effect on the recipients' employment or earnings after the end of the voucher period (Groh, 2010). In contrast, the AGI pilot in Liberia found strong positive impacts on employment and earnings. Under a randomized "pipeline" design, individuals were randomly assigned to receive training in two sequential batches. In addition to positive impacts on participants' self-confidence, savings, and household food security, the program had sizable impacts on employment (47 percent) and weekly earnings (80 percent) for women aged 16-27 (Adoho 2014).

Two other skills training programs for young women have also been rigorously evaluated. In India, a training program in stitching and tailoring offered to young women in poor slum communities of New Delhi had a unique feature, introduced to increase commitment and encourage regular attendance (Maitra & Mani, 2014). Maitra & Mani (2014) found that the program increased the likelihood of casual or permanent wage employment by more than 5 percentage points, self-employment by almost 4 percentage points, and any employment by 6 percentage points. The program increased hours worked in the post-training period by around 2.5 hours. Finally, a related project run by the international NGO BRAC in Uganda established village-level girls' clubs to provide life skills, reproductive health, and livelihood skills to young women aged 14 to 20. A randomized impact evaluation found substantial increases in employment (72 percent) and improved empowerment and reproductive health outcomes (Bandiera 2012).

b. The Employment Fund and the Adolescent Girls Employment Initiative (AGEI) in Nepal
Started in 2008, the Employment Fund (EF), now one of the largest skills training programs in the country, provides vocational training and placement services under a unique governance structure. Each year, the Employment Fund authorizes training programs under a competitive bidding system with various training providers. First, the Employment Fund issues a call for proposals to Training and Employment (T&E) providers seeking to provide skills training and employment services. The range of T&E provider types is enormous: from formal technical education and vocational training (TEVT) institutions, public and private providers, to skilled artisans offering apprenticeships. The second step, after the call for proposals, is for each T&E provider to complete a Rapid Market Assessment (RMA) outlining viable and potential employment opportunities. 10 The third step in the process is for the EF to evaluate submitted proposals according to preset criteria; the EF weighs the capacity and experience of each T&E provider, the market demand for the proposed trades being offered, and the proposed costs. Finally, the EF issues a contract to selected providers: the contract specifies the number of training courses (hereafter called "events") to be conducted and the number of individuals to be trained and employed in each event. The T&E providers are then free to recruit and select their own trainees for each of their training events, according to guidelines established by EF.
Table 1 provides the total number of T&E providers, training events, and trainees, and shows an increase in the total number of program beneficiaries between 2010 and 2012. Training courses in technical skills span a wide range of trades (e.g., incense stick rolling, carpentry, tailoring, welding and masonry). Beginning in 2011, all female trainees receive 40 hours of life skills training, and a sub-set of EF trainees receive a short course in basic business skills. In addition, each trainee is encouraged to complete a skills certification test offered by the National Skills Testing Board (NSTB). 11
Upon completion of the classroom-based training, the EF places emphasis on job placement services. EF verifies trainees' employment status three months and six months after the completion of the training. 12 Upon verification, T&E providers receive an outcome-based payment from the EF that is higher for trainees who are employed. The outcome-based payment system creates strong incentives for the T&E providers to provide placement assistance and provides graduates with an opportunity to put their new skills to work immediately after the training. The EF emphasizes the placement of trainees into "gainful" employment in which they earn a minimum of 3,000 NRs (40 USD) per month. 13
In 2010, the EF partnered with DFID and the World Bank's Adolescent Girls Initiative to improve the EF's reach and impact for young women aged 16 to 24. Training under this Adolescent Girls Employment Initiative (AGEI) proceeded in the same way as it did for other EF trainees, except that certain events had been flagged in advance as likely to attract female trainees. This was done in order to ensure that the EF reached adequate numbers of young women. 14 For the purpose of this evaluation, we designate all female participants aged 16 to 24 in EF-sponsored training courses as "AGEI".
The Employment Fund struggled in 2010 to recruit young women to training events and in 2011 launched an enhanced communication and outreach strategy to recruit more female trainees. 15 In addition to the T&E providers' advertisements, the EF sponsored radio and newspaper ads specifically geared towards young women. Many of these ads encouraged women to sign up for trades that are non-traditional for women, such as mobile phone repair, electronics, or construction. 16 The Employment Fund also partnered with women's and community-based organizations to attract applications from women and marginalized groups: if a referred applicant gained entry to an EF-sponsored training event, the partner organization was paid a small finder's fee equivalent to about US$1.25 per person.
The Employment Fund uses a differential pricing mechanism that awards a higher incentive to service providers who agree to train (and place) more disadvantaged groups, according to established vulnerability criteria. 17 The highest incentive is awarded for training and placing the most disadvantaged (highly vulnerable women including AGEI trainees, widows, ex-combatants, disabled women, etc.), and incentives are gradually lowered for less prioritized groups. Training providers that are able to cater to these higher priority target groups are therefore eligible to receive a higher outcome price, but they also face a higher risk of failing to achieve the outcome (gainful employment). The combination of a results-based system with a progressive incentive scheme ensures that training providers with the capacity to work with vulnerable groups will likely opt to do so.

III. Impact Evaluation Design
This evaluation estimates the impact of the EF training program by comparing the outcomes of participants, who comprise the "treatment" group, to those of a control group of individuals who applied for, but were not selected for, an EF-sponsored training course. Isolating the causal effects of the EF training program on employment and other outcomes is complicated by two facts: T&E providers have at least some degree of choice over whom they train, and applicants may seek EF training for reasons we cannot fully measure. 18 In other words, comparing post-program outcomes of EF program participants and non-participants may confound the influence of the training program with that of hard-to-measure individual-level, family-level, or T&E-level attributes that affect both training program participation and the outcome of interest.
This estimation problem would be greatly reduced if we could compare the outcomes of individuals from truly comparable backgrounds except for their EF training program status. The primary concern in being able to detect the causal effects of the training program is that the two sets of individuals (i.e., ones receiving training and ones who do not) may have had different characteristics to begin with and it may be those characteristics rather than the EF training program that explain the difference in outcomes between the two groups. 19 The unobserved differences in characteristics are particularly concerning.
16 Anecdotal evidence also suggests that the introduction of life skills training, covering topics such as communication, leadership, and reproductive health, was particularly popular among young women and may have contributed to the growing numbers of AGEI trainees.
17 For more details on the differential pricing scheme for vulnerable groups and the outcome-based payment system, see http://wwwwds.worldbank.org/external/default/WDSContentServer/WDSP/IB/2014/07/18/000470435_20140718142756/Rendered/PDF/894 760BRI0141100Box385283B00PUBLIC0.pdf
18 Systematic differences in unobservable characteristics that cannot be measured quantitatively, such as motivation, confidence, and natural ability, will lead to biased estimates of program impact.
T&E providers can select which beneficiaries enter their training programs, and this selection process drives the difference between trainees and non-trainees. Each T&E provider advertises, collects applications, shortlists and interviews applicants, and selects participants for each training event following the Trainee Selection guidelines established by EF. These guidelines stipulate a two-week minimum period of advertisement, the eligibility criteria for training events, and the procedures for shortlisting and interviewing. There are three eligibility criteria for all EF-sponsored training programs: age (from 16 to 35), education (below SLC, 20 or less than 10 years of formal education), and self-reported economic status. 21 Only applicants who meet all three criteria are eligible to be shortlisted. T&E providers are advised to shortlist at least 50 percent more candidates than the number of spaces in the training event. The guidelines also outline a uniform process for interviewing shortlisted candidates, including a detailed scoring rubric, instructions for ranking the shortlisted candidates by score, and instructions for selecting the top-scoring candidates for participation. 22 Figure 1 in Annex 2 shows a sample ranking form used by T&E providers. This scoring and ranking procedure forms the basis of our sampling strategy for this evaluation and is described in detail in the next section.
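The selection steps described above can be sketched in a few lines. This is a simplified illustration, not the EF's actual procedure: the `Applicant` fields, the encoding of the education criterion as fewer than 10 years of schooling, the boolean for self-reported economic status, and the example scores are all assumptions made here for the sake of the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Applicant:
    name: str
    age: int
    years_of_schooling: int
    low_income: bool        # hypothetical encoding of self-reported economic status
    interview_score: float  # hypothetical score from the interview rubric

def is_eligible(a: Applicant) -> bool:
    """All three criteria must hold: age 16-35, under 10 years of schooling, economic status."""
    return 16 <= a.age <= 35 and a.years_of_schooling < 10 and a.low_income

def minimum_shortlist_size(spaces: int) -> int:
    """Providers are advised to shortlist at least 50 percent more candidates than spaces."""
    return math.ceil(1.5 * spaces)

def select_trainees(applicants, spaces):
    """Rank eligible applicants by interview score; admit the top scorers, reject the rest."""
    eligible = [a for a in applicants if is_eligible(a)]
    ranked = sorted(eligible, key=lambda a: a.interview_score, reverse=True)
    return ranked[:spaces], ranked[spaces:]

# Hypothetical applicants: B is too old and D has too much schooling.
applicants = [
    Applicant("A", 22, 8, True, 71.0),
    Applicant("B", 40, 8, True, 90.0),
    Applicant("C", 19, 9, True, 64.5),
    Applicant("D", 30, 12, True, 88.0),
    Applicant("E", 17, 5, True, 58.0),
]
admitted, rejected = select_trainees(applicants, spaces=2)
print([a.name for a in admitted], [a.name for a in rejected])
```

In this sketch the lowest admitted score plays the role of the admission threshold on the ranking sheet.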
We use quasi-experimental methods to address the above evaluation concerns. Our general evaluation strategy is to observe an individual before and after an EF training program and to compute a simple difference in outcome for that individual over time. The average difference over time of the individuals enrolled in the training program (i.e., the treatment group) is then compared to the average difference over time of the individuals who are not enrolled (i.e., the comparison group). Conceptually, this so-called difference-in-differences method (Campbell, 1969; Meyer, 1995) cancels out the effect of all characteristics that are unique to a specific individual and that do not change over time. 23 Therefore the difference in outcomes across the two groups can (under certain conditions, discussed in Section Va) be attributed solely to the treatment status, i.e., whether the individuals received training or not. To further purge bias in our estimates arising from observable differences between trainees and non-trainees, we combine this difference-in-differences estimation with propensity score matching and propensity score weighting techniques. We describe the specific estimation and matching methods in the next section.
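The difference-in-differences logic can be illustrated with a minimal sketch. The data below are invented employment indicators (1 = employed, 0 = not employed) for four treated and four comparison individuals; they are not drawn from the evaluation, and the function name is hypothetical.

```python
from statistics import mean

def difference_in_differences(treated, comparison):
    """Each argument is a list of (baseline, follow_up) outcome pairs, one per individual."""
    # Step 1: within-person change over time, which removes fixed individual traits.
    treated_changes = [after - before for before, after in treated]
    comparison_changes = [after - before for before, after in comparison]
    # Step 2: difference between the two groups' average changes.
    return mean(treated_changes) - mean(comparison_changes)

# Hypothetical (baseline, follow-up) employment indicators.
treated = [(0, 1), (0, 1), (1, 1), (0, 0)]
comparison = [(0, 0), (0, 1), (1, 1), (0, 0)]

did = difference_in_differences(treated, comparison)
print(f"Difference-in-differences estimate: {did:.2f}")
```

Here employment rises by 50 percentage points among the treated and by 25 points in the comparison group, so the estimate is 25 percentage points.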

IV. Methodology and Data
a. Sample Description and Sampling Technique
Our primary source of data comes from a survey covering three consecutive cohorts of EF trainees (from 2010 to 2012), with two rounds of data collection for each cohort. 24 Figure 1 depicts the impact evaluation timeline.
We sample at the training event and the applicant level. The main sampling frame for data used in this study consisted of all EF training courses sponsored in a given year. The number of training events comprising the sample frame ranges from 598 (in 2010) to 711 (in 2012). Table 1 reports the number of events and participants by year.
Sampling into this study included a combination of stratified, random and convenience sampling. First, we selected a subset of training events occurring between January and April. 25,26,27 Second, from the universe of training events offered during these four months, we randomly selected up to 15 districts. Third, from the list of training events occurring in these districts, we randomly selected 20 percent of the training events. Finally, a survey team visited each sampled training event on the day when applicant selection took place. Each event's ranking sheet listed the shortlisted applicants from the top-scorer to the bottom and indicated the threshold, or minimum score needed to gain admission to the course. From this ranking sheet, the survey team selected applicants whose scores were within 20 percent of the threshold for admission to training events. The sampled applicants above the threshold comprise this study's treatment group, while those below the threshold make up the control group. Immediately following the sampling of applicants, a baseline survey was conducted on the treatment and control groups, before the results of the selection process were announced.
Table 2 shows the resultant sample of events for the three cohorts. The 2010 event sample comprised 64 events across 30 districts. The 2011 sample comprised 182 events, of which 113 events were dropped from the baseline survey, either because the survey team could not reach the event on the day of applicant selection or because the event was not "oversubscribed". 28 The remaining 69 events in 34 districts were included in the 2011 baseline sample. The sampling process was much improved in 2012, with 85 out of 112 sampled events successfully included in the final evaluation sample.
... over the course of an evaluation. Using the same reasoning, we might conclude that many unobserved characteristics of individuals are also more or less constant over time.
24 For the 2010 cohort, a second follow-up was conducted on half of the cohort. Future analysis will examine the longer-term outcomes for this group.
25 Because of the AGEI focus of this study, we prioritize AGEI training events (identified by the T&E as we described earlier). Because the selection into the study population was based on an individual's proximity to the threshold score, it was not possible to stratify on AGEI status. However, events that were likely to include more AGEI candidates were purposely oversampled in 2011 and 2012 so as to increase the number of AGEI candidates in the study population.
26 Eighty percent of EF training events occurred during these four months.
27 In 2010, because a complete event listing was not available in advance, the events were chosen by convenience, based on scheduling and accessibility.
This non-random (in 2010) and partially random (in 2011 and 2012) sampling of training events and applicants introduces bias in two potentially important ways. First, the training events in our study may not be representative of all EF-sponsored trainings. In all three cohorts, training events that enter our sample are more likely to be based in district centers, more likely to be sought after (and hence oversubscribed), and more likely to be run by high-capacity T&E providers. 29,30 Second, selected trainees could differ from non-trainees in characteristics other than being offered training, since T&E providers purposely interview and select the applicants they think will perform best. We mitigate this bias by selecting our study participants within a narrow range of each event's threshold score, which we hypothesize will limit the differences between the treatment and control groups. We examine this hypothesis further in Section V.
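The applicant-level sampling around each event's threshold can be sketched as follows. This is an illustration under stated assumptions: "within 20 percent of the threshold" is read here as a score between 80 and 120 percent of the threshold score, admission is assumed to require a score at or above the threshold, and the rule that an event needs at least three sampled rejected candidates follows the survey team's instruction for dropping events. The function name and the scores are hypothetical.

```python
def sample_near_threshold(scores, threshold, band=0.20, min_controls=3):
    """Pick shortlisted applicants whose scores lie within +/- band of the threshold.

    scores: dict mapping applicant id -> interview score.
    Returns (treatment_ids, control_ids), or None if the event would be dropped
    because too few rejected applicants fall inside the band.
    """
    lower = threshold * (1 - band)
    upper = threshold * (1 + band)
    # Admitted applicants just above the threshold form the treatment group.
    treatment = sorted(i for i, s in scores.items() if threshold <= s <= upper)
    # Rejected applicants just below the threshold form the control group.
    control = sorted(i for i, s in scores.items() if lower <= s < threshold)
    if len(control) < min_controls:
        return None
    return treatment, control

# Hypothetical ranking sheet with an admission threshold of 70.
ranking_sheet = {"a": 90, "b": 74, "c": 70, "d": 68, "e": 60, "f": 57, "g": 40}
result = sample_near_threshold(ranking_sheet, threshold=70)
print(result)
```

Applicants far above the threshold ("a") or far below it ("g") are left out, which is what keeps the two groups close in interview score.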
The sampling procedures described above resulted in a study population of 4677 over all three cohorts. For the pooled sample (i.e., 2010 through 2012 cohorts), the study population is about 64 percent female and on average 24.5 years old. Fifty-eight percent are married while 51 percent have at least one child. Approximately 59 percent of the sample engaged in some income-generating activity in the month prior to the survey, a figure which may seem high, but includes those who are working without pay on their own household farms. When we restrict to non-farm income-generating activities, the employment rate falls to 27 percent. At baseline, the average earnings of the pooled sample were 1272 NRs per month (equivalent to about 17 USD). This figure may seem low because it is the average over the entire study population of 4677 individuals, including those with zero earnings. Only 17 percent of the 2010-2012 pooled sample earned more than 3000 NRs per month, a level deemed to represent "gainful" employment. Interestingly, about 15 percent of the sample was already engaged in the same trade for which they applied for training (denoted as "trade-specific IGA"), indicating that a significant minority of applicants were looking to upgrade their existing skills. Though they are not older than the men in the sample, women are more likely to be married and have a child, and have lower employment and earnings at baseline. 31
As discussed in Section II, the EF provides financial incentives for T&E providers to recruit and train people from Dalit and Janajati ethnic groups. For Janajatis, the T&E providers appear to respond to these incentives, as 44 percent of the applicants are Janajatis, and they are statistically more likely to be in the treatment group than control. 32 The T&E providers appear to have less success with attracting Dalit applicants: only 8 percent of the applicants are Dalits (a bit less than the population average), and they are equally divided between the treatment and control groups.
28 The survey team was instructed to drop the event from the sample if there were not at least 3 rejected candidates that fell within 20% of the threshold score. In other words, if there were not at least 3 people who could be sampled for the control group, the event was dropped from the sample.
29 The survey team deployed a fixed number of staff to each district center, with a schedule of all events in that district. Events conducted by high-capacity T&E providers and in popular trades were more likely to keep to their scheduled start date. Because the T&E providers were not required to wait more than 5 days between the interviews and the start of training, and because interview dates were unpredictable, it was impossible for the survey team to reach all of the events with enough time to conduct the baseline survey.
30 If these characteristics also determine the quality of the training, this non-random sampling of events may bias our estimates upward and overstate the true impact of the program. One mitigating factor is that in every cohort our sample includes a high fraction of the T&E providers contracted by EF.

b. Estimating the EF Training Program Effects
To estimate the causal effects of the EF training program on various outcomes, we first employ a "difference-in-difference" (DID) technique. We will refer to this as the OLS specification in the remainder of this report. The main equation we estimate is:

$$Y_{ijt} = \beta_1 (\text{Treat}_i \times \text{Post}_t) + \gamma\, \text{Post}_t + \alpha_i + \varepsilon_{ijt} \qquad (1)$$

This regression relates a given outcome to EF program training status. $Y_{ijt}$ is the outcome of interest for individual i from training event j at time t. $\text{Treat}_i$ is an indicator equal to 1 for the treatment group and 0 for control. $\text{Post}_t$ is an indicator equal to 1 for follow-up observations and 0 for baseline. Its coefficient, $\gamma$, captures aggregate factors that would cause changes in $Y_{ijt}$ even in the absence of a training program. The term $\alpha_i$ represents an individual fixed effect. This individual fixed effect is critical to our identification strategy, as it controls for differences in time-invariant observable and unobservable characteristics at baseline, as described in Section III. The final term, $\varepsilon_{ijt}$, is an idiosyncratic error term that is clustered by training event, in order to account for the likely correlation of outcomes among applicants to the same training course.

The coefficient of interest, $\beta_1$, captures the impact of the program by comparing the change in $Y_{ijt}$ over time for treatment group individuals with the change for control group individuals. If outcomes of individuals assigned to EF training are similar to those of individuals not trained (that is, if the training has no impact), then we should find $\beta_1 = 0$. If individuals trained by EF have better labor market outcomes than non-participants, we should find $\beta_1 > 0$.

To further purge remaining differences in observable characteristics between trainees and non-trainees that could influence the difference in impacts between the two groups, we augment the "difference-in-difference" technique with propensity score matching and weighting approaches. Both methods rely on first estimating each individual's likelihood of being offered training (i.e., the propensity score), based on individual baseline characteristics, such as age, education, and family background.
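As an illustration of the difference-in-difference logic described above, the following is a minimal sketch rather than the report's estimation code. It assumes exactly two observations per person (baseline and follow-up), in which case the individual fixed effect drops out of the first difference and the coefficient on the Treat x Post interaction equals the difference in mean changes between the two groups. The function name, variable names, and toy data are hypothetical, and clustering of standard errors is not shown.

```python
from statistics import mean


def did_estimate(rows):
    """Two-period difference-in-difference on first differences.

    rows: list of (treat, y_baseline, y_followup) tuples.
    Returns (impact, time_effect), where time_effect is the mean change in
    the control group (the Post coefficient) and impact is the additional
    change in the treatment group (the Treat x Post coefficient).
    """
    change_treat = [y1 - y0 for treat, y0, y1 in rows if treat == 1]
    change_ctrl = [y1 - y0 for treat, y0, y1 in rows if treat == 0]
    time_effect = mean(change_ctrl)
    impact = mean(change_treat) - time_effect
    return impact, time_effect


# Hypothetical employment indicators (1 = employed); not EF data.
toy = [
    (1, 0, 1), (1, 0, 1), (1, 1, 1), (1, 0, 0),
    (0, 0, 0), (0, 1, 1), (0, 0, 1), (0, 0, 0),
]
impact, time_effect = did_estimate(toy)
print(impact, time_effect)
```

In this toy sample the treatment group's employment rate rises by 50 percentage points and the control group's by 25, so the sketch attributes 25 percentage points to the program and 25 to the common time effect.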
We implement the propensity score method by first estimating the following probit model:

$$\text{Treat}_{ij} = 1\{X_i'\theta + Z_j'\delta + u_{ij} > 0\} \qquad (2)$$

In this equation, $\text{Treat}_{ij}$ is equal to 0 (for non-trained individuals) or 1 (for trained individuals) for individual i in event j, and $X_i$ is a vector of individual and household level explanatory variables, all measured at baseline. The error term, clustered by event, is given by $u_{ij}$. To predict the likelihood of being trained, we use age, sex, education, ethnicity, employment status, marital and parental status, analytical ability (as measured by the commonly used Raven's progressive matrices and one financial literacy question), and an entrepreneurial orientation score based on a set of 11 questions. At the household level, we include household size, education level of the household head, and the quintile of the household's wealth based on an index of ten household assets. At the district level, the model controls for the district in which each event is held, the T&E provider, and the trade of the training (e.g., hospitality), all represented by the vector $Z_j$. The predicted value of $\text{Treat}_{ij}$ is the estimated propensity score, or likelihood of being in the treatment group.
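To make the last step concrete, the sketch below shows how a fitted probit index is converted into a propensity score through the standard normal distribution function. It is illustrative only: the coefficients, the three covariates, and the function names are hypothetical and are not estimates from this evaluation.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def propensity_score(covariates, coefficients, intercept=0.0):
    """Predicted probability of being in the treatment group."""
    index = intercept + sum(b * x for b, x in zip(coefficients, covariates))
    return normal_cdf(index)


# Hypothetical coefficients for (age, years of schooling, female).
coefficients = [0.01, -0.02, 0.10]
applicant = [22, 8, 1]
score = propensity_score(applicant, coefficients)
print(round(score, 3))
```

An index of zero corresponds to a propensity score of 0.5; positive index values push the score toward 1 and negative values toward 0.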
After estimating this propensity score, we derive the estimated treatment effect using two methods. The first method is "inverse propensity score weighting", in which individuals are weighted according to the inverse of their estimated propensity to participate in the program. The weighted observations are then used in a DID regression, as given by equation (1). We will refer to this as the IPSW specification from here on. We implement IPSW following Hirano et al. (2003). Unlike matching approaches, this weighting method uses all of the data (unless weights are set to 0) and does not depend on random sampling, which makes the results replicable. We use a weighted least squares regression model, with weights of 1/π for the treatment group and 1/(1-π) for the control group, where π is the estimated propensity score from (2). Standard errors are clustered by training event.
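The weighting scheme just described can be sketched as follows. This is a simplified illustration, not the report's code: it assumes a two-period panel, so that the weighted DID regression reduces to a difference in weighted mean changes, and it ignores standard errors. All names and the toy data are hypothetical.

```python
def ipsw_weight(treat, pscore):
    """1/p for treatment observations, 1/(1-p) for control observations."""
    return 1.0 / pscore if treat == 1 else 1.0 / (1.0 - pscore)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def ipsw_did(rows):
    """Weighted difference in mean first differences.

    rows: list of (treat, pscore, y_baseline, y_followup) tuples.
    """
    changes = {0: [], 1: []}
    weights = {0: [], 1: []}
    for treat, pscore, y0, y1 in rows:
        changes[treat].append(y1 - y0)
        weights[treat].append(ipsw_weight(treat, pscore))
    return (weighted_mean(changes[1], weights[1])
            - weighted_mean(changes[0], weights[0]))


# Hypothetical observations; not EF data.
toy = [
    (1, 0.5, 0, 1), (1, 0.8, 1, 1),
    (0, 0.5, 0, 0), (0, 0.2, 0, 1),
]
estimate = ipsw_did(toy)
print(round(estimate, 4))
```

Treated individuals with a low estimated propensity, and control individuals with a high one, receive the largest weights, which pulls the two weighted groups toward a common distribution of baseline characteristics.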
The second method is "propensity score matching", in which individuals in the treatment group are matched to individuals in the control group who have similar propensity scores. We use a nearest-neighbor matching algorithm, in which each individual in the treatment group is compared to a fixed number of control observations (in our case, four) with the closest propensity scores. We will refer to this as the NN specification in the remainder of this report. Following Smith and Todd (2005), we estimate the difference-in-difference matching estimator for the training program effect as follows:

$$\hat{\Delta} = \frac{1}{n_1} \sum_{i \in I_1} \Big[ (Y_{1i} - Y_{0i}) - \sum_{j \in I_0} W(i,j)\,(Y_{1j} - Y_{0j}) \Big]$$

where $n_1$ is the number of treatment observations, $I_1$ and $I_0$ are the sets of treatment and control observations, the subscript 1 denotes follow-up observations and 0 denotes baseline observations, and $W(i,j)$ is a matrix of weights. Weights for nearest-neighbor matching are computed by:

$$W(i,j) = \begin{cases} 1/x & \text{if } j \in A_x(i) \\ 0 & \text{otherwise} \end{cases}$$

where $A_x(i)$ is the set of $x$ control observations with the lowest values of $|\hat{\pi}_i - \hat{\pi}_j|$. As in the two previous models outlined in this section, the dependent variable is the first difference of a given outcome between the baseline observation and the follow-up observation. The statistical software package we use for this specification does not allow for clustering of standard errors. 33
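The nearest-neighbor matching step can be sketched as below. This is an illustrative sketch under stated assumptions, not the routine used for the report: it matches with replacement, gives the k nearest controls equal weight, and breaks ties by input order. The function name and toy data are hypothetical; the toy call uses k=2 only because the sample is tiny.

```python
def nn_did_matching(treated, controls, k=4):
    """Difference-in-difference nearest-neighbor matching estimate.

    treated, controls: lists of (pscore, y_baseline, y_followup) tuples.
    Each treated unit's change in outcome is compared with the average
    change among the k controls with the closest propensity scores.
    """
    effects = []
    for p_i, y0_i, y1_i in treated:
        nearest = sorted(controls, key=lambda c: abs(c[0] - p_i))[:k]
        matched_change = sum(y1 - y0 for _, y0, y1 in nearest) / len(nearest)
        effects.append((y1_i - y0_i) - matched_change)
    return sum(effects) / len(effects)


# Hypothetical observations; not EF data.
treated = [(0.60, 0, 1), (0.40, 1, 1)]
controls = [
    (0.61, 0, 0), (0.58, 0, 1), (0.42, 1, 1),
    (0.39, 0, 0), (0.10, 0, 1), (0.90, 0, 1),
]
estimate = nn_did_matching(treated, controls, k=2)
print(estimate)
```

Controls whose propensity scores are far from every treated unit (here 0.10 and 0.90) never enter the comparison, which is the main practical difference from the weighting approach.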

c. Intent-to-Treat (ITT) and Average Treatment Effect on the Treated (ATT)
Estimates of the EF program's effects correspond to two different questions. The first is the effect of the intervention on the average outcomes of those assigned to one of the EF training events, regardless of whether they used the training services. In the experimental literature, this is known as the intention to treat (ITT) effect. Angeluci and Orazio (2006) discuss the quasi-experimental counterpart, which is the method we apply in this study. 34 If one is interested in the effectiveness of a program among the entire class of youth who are eligible for it, then the ITT estimates are the appropriate results to examine. Note that this ITT estimate is not biased by the fact that only some individuals choose to participate in the EF program, because we derive the ITT by comparing the average outcomes of everyone assigned to one of the EF training groups, whether they use the program or not, with the average outcomes of everyone assigned to the control group (i.e., the non-trainees).
If instead one is interested in the effects among individuals who have actually completed the program, then one should consult the average treatment effects on the treated (ATT) results, which are based on actual program participation, rather than program assignment.
We present both intent-to-treat (ITT) and average treatment effects on the treated (ATT) estimates of the impact of the Employment Fund's training program. In our view, the ITT estimates are more relevant from the policymaker's perspective. 35 If one were to invest in scaling up the EF's training programs to more Nepali youth using the same eligibility criteria, the ITT estimates indicate the level of impact one could expect to achieve on those target youth. The ITT estimates account for the foreseeable uptake and dropout rates that one would expect if the program were expanded. The ATT estimates are relevant from a program implementer's point of view, as they indicate how well the program worked for the participants who completed the course. They compare everyone who completed the training to everyone who did not, regardless of why they did not complete the training (e.g., they were not offered a space, they were not eligible, or they dropped out or declined to join). 36 Because neither the direction nor the extent of bias can be determined precisely, and to address both the policy and programmatic perspectives, this report presents ITT results in Section VI and ATT estimates in Annex 3.

d. Heterogeneous Effects
In seeking to examine the effects of the Employment Fund's training program, it is important to keep in mind that different people within the intervention may respond differently to the same policy intervention, a possibility that researchers refer to as "treatment heterogeneity." We might be concerned that people with particular social or demographic characteristics may fare differently than the average program participant.
Because treatment heterogeneity has important policy implications, we estimate heterogeneous treatment effects by estimating the impact both for the full sample of participants as well as for several sub-populations. For example, we may want to know how well the program works for the average participant or how well the program works specifically for women, young women, and different ethnic groups that are especially targeted by the EF. Different effects for various demographic groups can assist us in informing the policy debate regarding whether specialized investments are paying off or whether further strategies might be needed to ensure that various groups participate and benefit equally from the program. The AGEI sub-group (i.e., young women aged 16-24), in particular, merits special attention because of the Employment Fund's outreach activities towards this particular age group.
35 Conceptually, the ITT estimates are equal to or lower than the ATT estimates, since the treatment group includes everyone to whom the program was targeted, including those who chose not to participate or who did not complete the program. ATT estimates may overestimate true program impact, since they represent the impact among a self-selected group of people who chose to take part and complete the program. It may be that those who are more motivated and capable are more likely to remain in the program, and the less capable drop out; on the other hand, it may be that the more capable decide that they don't really need what the program provides.
36 Data on training program completion were obtained from the Employment Fund's administrative records, and cross-checked with self-reported training program participation collected at follow-up.

To test whether the impacts of the EF training program vary by certain pre-defined sub-groups, we employ a triple difference specification:

$$Y_{ijt} = \beta_1 (\text{Treat}_i \times \text{Post}_t) + \psi (\text{Treat}_i \times \text{Post}_t \times \text{Group}_i) + \gamma\, \text{Post}_t + \lambda (\text{Post}_t \times \text{Group}_i) + \alpha_i + \varepsilon_{ijt} \qquad (3)$$

$\text{Group}_i$ represents the subgroup characteristic, and $\lambda$ allows the time trend to differ for that subgroup. The parameter of interest, ψ, indicates the differential impact of the EF training on that subgroup (relative to everyone else), while (ψ + β1) indicates the overall impact on the treated individuals in that subgroup. The remaining terms are as defined in (1).
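The interpretation of ψ and (ψ + β1) can be illustrated with a small sketch. It assumes a two-period panel and a fully interacted specification, in which case β1 is the difference-in-difference estimate for everyone outside the subgroup and ψ is the gap between the subgroup's difference-in-difference estimate and β1. The function names and toy data are hypothetical and are not taken from the evaluation.

```python
from statistics import mean


def did(rows):
    """Difference in mean changes; rows are (treat, y_baseline, y_followup)."""
    treated = [y1 - y0 for treat, y0, y1 in rows if treat == 1]
    control = [y1 - y0 for treat, y0, y1 in rows if treat == 0]
    return mean(treated) - mean(control)


def triple_difference(rows):
    """rows: (treat, group, y_baseline, y_followup) tuples.

    Returns (beta1, psi): the impact for individuals outside the subgroup
    and the differential impact for the subgroup.
    """
    in_group = [(t, y0, y1) for t, g, y0, y1 in rows if g == 1]
    others = [(t, y0, y1) for t, g, y0, y1 in rows if g == 0]
    beta1 = did(others)
    psi = did(in_group) - beta1
    return beta1, psi


# Hypothetical data with group = 1 for the subgroup of interest.
toy = [
    (1, 1, 0, 1), (1, 1, 0, 1), (0, 1, 0, 0), (0, 1, 0, 1),
    (1, 0, 0, 1), (1, 0, 1, 1), (1, 0, 0, 0), (1, 0, 0, 0),
    (0, 0, 0, 0), (0, 0, 1, 1),
]
beta1, psi = triple_difference(toy)
print(beta1, psi, beta1 + psi)
```

In the toy data the impact is 25 percentage points outside the subgroup and 50 within it, so ψ is 25 percentage points and (ψ + β1) recovers the subgroup's overall impact.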

V. Internal Validity
Although we have sought to maximize the scientific quality of this impact evaluation, it remains possible that we have missed some important program impacts. Alternative explanations, other than assignment to a training program, could account for differences in outcomes between individuals offered training and individuals who are not trained. These explanations are the so-called threats to internal validity. In this section, we address and dismiss such potential explanations.

a. Time-varying trends and differences at baseline
We start by addressing concerns about pre-existing differences and time-varying trends that could account for observed training effects when comparing trainees and non-trainees. 37 Towards that objective, we present "balancing tests" which capture the degree of similarity between the two types of participants. Table 4 presents baseline participant characteristics (i.e., balancing tests) for a set of 38 demographic indicators. 38 These tests are based on "ITT" comparisons of the treatment group (i.e., individuals whose scores qualify them for admission to an EF training event) and the control group. The baseline balance tests for the pooled sample (2010-2012) indicate that significant differences exist between treatment and control groups for baseline observable characteristics and pre-treatment outcome variables. 39 Relative to rejected candidates, treated individuals are more likely to be Janajati and are less likely to have finished SLC (10th grade), characteristics which reflect the eligibility criteria and the EF's differential pricing scheme for vulnerable groups. Further, treated individuals were more likely to be engaged in non-farm and trade-specific employment before taking up training, worked more hours, and were more likely to earn more than 3000 NRs a month. These differences are consistent with T&E providers' incentives to select candidates they think will perform best. Finally, individuals in the treatment group are also less likely to have control over savings and money of their own at baseline. To address these differences (and potential differences in unobservable characteristics) we applied a difference-in-difference approach in our analysis. However, growth in outcome variables may not follow a common trend in the two groups, particularly when they start off at very different initial levels.
Although it does not resolve the parallel trend assumption, we additionally applied propensity score weighting and matching techniques to achieve a higher degree of baseline comparability across groups. 40

b. Survey response rates and attrition
One potential problem with any study such as ours is sample attrition. Approximately one year after each baseline survey, we conducted a follow-up survey. 41,42 The response rates were quite high for both follow-up surveys (see Table 5). Overall, the survey firm was able to track and successfully interview 88 percent of the baseline survey respondents, yielding a final sample for analysis of 4,101 individuals. 43 High response rates such as these limit the degree to which any differences between respondents and non-respondents can affect our estimates.
Particularly worrisome is the possibility of so-called "differential attrition", a situation in which we re-interview trainees and non-trainees at statistically different rates. Even with low attrition rates for a panel survey, differential attrition may influence the scientific validity of results. Suppose, hypothetically, that very motivated individuals in the control group (i.e., non-trainees) are more likely than trainees to migrate to a district where they find employment. As a result of this migration, these individuals will not show up in our analyses and this scenario could compromise the scientific validity of this study's estimates.
In Table 20, we explore this possibility and show no evidence to support it. Table 20 shows a series of regressions on the correlates of survey attrition. We regress the panel status of respondents on their treatment status, depending on how the treatment indicator is defined. This specification provides information on whether treatment or control individuals are more likely to be lost to follow-up. Next, we add a series of covariates, such as gender, age, marital status, parental status, ethnicity, and employment, as well as a set of district and T&E provider-specific dummy variables.
The results indicate that attrition is not correlated with treatment status, and hence differential attrition between treatment and control groups is not an issue in this evaluation.
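A simplified version of this attrition check is sketched below. Regressing an attrition indicator on treatment status alone yields the difference in attrition rates between the two groups, which the sketch tests with a two-proportion z-test. This is an assumption-laden illustration: it omits the covariates and the clustering used in Table 20, and the counts are hypothetical rather than the survey's actual figures.

```python
from math import erf, sqrt


def attrition_gap(lost_treat, n_treat, lost_ctrl, n_ctrl):
    """Difference in attrition rates with a two-sided z-test.

    Returns (gap, z, p_value), where gap is the treatment attrition rate
    minus the control attrition rate.
    """
    rate_treat = lost_treat / n_treat
    rate_ctrl = lost_ctrl / n_ctrl
    pooled = (lost_treat + lost_ctrl) / (n_treat + n_ctrl)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_treat + 1.0 / n_ctrl))
    z = (rate_treat - rate_ctrl) / se
    normal_cdf = 0.5 * (1.0 + erf(abs(z) / sqrt(2.0)))
    p_value = 2.0 * (1.0 - normal_cdf)
    return rate_treat - rate_ctrl, z, p_value


# Hypothetical counts: 120 of 1000 treated and 115 of 1000 controls lost.
gap, z, p_value = attrition_gap(120, 1000, 115, 1000)
print(round(gap, 3), round(z, 2), round(p_value, 2))
```

A small gap with a large p-value, as in this toy example, is the pattern one would expect when attrition is unrelated to treatment status.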

c. Uptake of EF-sponsored training courses
Another threat to our study is imperfect compliance, which occurs when those offered treatment choose not to participate, or when people from the control group gain access to the program. These deviations from treatment assignment may bias our estimates of program impact, because the factors that determine one's actual participation, such as motivation and tenacity, cannot be observed and are likely correlated with the outcomes of interest.
Using administrative data from EF, we examine the rate of program take-up by the treatment and control groups for the 2010, 2011 and 2012 cohorts in Table 6. 44 The table shows a high degree of uptake (65 to 74 percent) among the treatment group, but also a high rate of participation among the control group. 45 Between 26 percent and 36 percent of the individuals in the control group participated in the EF training course that they applied for, even though their scores did not qualify them for admission. 46,47

In the presence of imperfect compliance, standard impact evaluation methods produce intent-to-treat (ITT) estimates, as described in section IV. However, the relatively high degree of noncompliance in our study leads to a likely downward bias in our ITT estimates, which compare a treatment group in which not everyone received treatment to a control group in which some people did receive treatment, hence diluting the impact of the program. We present an alternative set of ATT estimates in Annex 3 in which we compare those who actually complete training to those who do not, irrespective of treatment assignment. 48 In our view, the ITT results serve as a lower bound for the estimated program impact while the ATT results serve as an upper bound.
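The dilution described above can be illustrated with a back-of-the-envelope sketch. It rests on two strong assumptions that are not part of the report's analysis: training has the same effect on everyone who takes it, and assignment has no effect on those who do not train. Under those assumptions the ITT estimate equals the effect of training multiplied by the gap in take-up between the two groups. The take-up rates below are round numbers chosen from within the ranges reported above, and the effect size is hypothetical.

```python
def diluted_itt(effect_of_training, takeup_treatment, takeup_control):
    """ITT implied by a constant effect of training and imperfect compliance."""
    return effect_of_training * (takeup_treatment - takeup_control)


# Hypothetical effect of 10 percentage points on those who train.
full_compliance = diluted_itt(10.0, 1.00, 0.00)
observed_compliance = diluted_itt(10.0, 0.70, 0.30)
print(full_compliance, round(observed_compliance, 2))
```

With 70 percent take-up in the treatment group and 30 percent in the control group, only about four-tenths of the hypothetical effect would show up in the ITT comparison, which is the sense in which the ITT results act as a lower bound.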

VI. Short-Run Results of the EF Program
This section describes the impact of the Employment Fund's training programs for the combined 2010, 2011 and 2012 samples. We measure outcomes approximately one year after the start of training. 49 We start with a description of our analysis of the likelihood of being in the treatment group. We proceed with the short-term impacts of the EF program on the full sample and on various demographic and socio-economic sub-groups.

44 An individual is recorded in the EF monitoring database as a trainee only when the T&E provider submits the person's name at the end of the training. Since the EF only reimburses T&E providers at the conclusion of the training, they do not record nor do we have any way to track how many people enrolled, but did not complete, the training in 2010 to 2012.
45 The monitoring records indicate that the control group individuals who received training did so in the original training course that they applied for. There is no evidence that they applied for and participated in a different EF training course sponsored by a different T&E provider, nor that they participated in EF training courses in a different year.
46 In practice, while we cannot be sure what happened in each case, the T&E providers probably dipped into the pool of lower-scoring applicants in an attempt to fill up the training slots as people dropped out.
47 The stable unit treatment value assumption (i.e., SUTVA) based on Rubin (1980) assumes that (1) the treatment status of any unit does not affect the potential outcomes of the other units (i.e., non-interference) and (2) the treatments for all units are comparable (i.e., no variation in treatment). Note that take-up of training among control group units is not a violation of the SUTVA, as take-up among non-beneficiaries was not directly a result of the take-up of training among the individuals selected for training.
48 The ATT estimates suffer from an additional source of bias, since it is likely the "best" members of the control group who gained access to the program and are reassigned to treatment in the ATT analysis, and the least motivated among those assigned to treatment who fail to participate and are reassigned to control. See the discussion in Annex 3.

a. First-stage Estimation of Propensity Scores
At the heart of the propensity score matching method is pairing trainees with non-trainees who are similar in terms of their observable characteristics. This pairing between a trainee and comparable non-trainees is done via a propensity score, as described in section IV.
Results from the propensity score estimation (shown in Table 7) exhibit only three variables correlated with an individual being offered training (i.e., treatment status). 50 This finding reflects the high degree of similarity among short-listed candidates. The only characteristics correlated with an individual being assigned to a training program are being Muslim, the likelihood of having children, and the number of children one has. No other variables are correlated with training assignment, meaning that within the pool of shortlisted candidates, there are few observable differences between those selected for training and those who are not.
Based on the estimated propensity score, we match each treated individual (i.e., a person assigned to a training program) to a group of control observations (i.e., individuals who are non-trainees), as described in section IV.

b. Impacts on Employment and Earnings for the Full Sample
In this section we present impacts of the EF program as estimated with the three identification strategies described in Section IV. Table 8 shows the ITT results on employment and earnings for the pooled 2010, 2011 and 2012 cohorts. We find (results in the first row of Table 8) strong evidence of consistent impact on the employment rate across all specifications. 51 All three models indicate a positive and significant effect, despite the high employment rate (i.e., 61 percent) at baseline. Restricting employment to non-farm activities, we also find a significant increase: the rate of participation in non-farm income-generating activities increases by 15 to 16 percentage points (from a base of 29.6 percent). Translating the results into percentage change terms, we find that the program increased non-farm employment by 50 to 54 percent. These impacts are not only statistically significant but also economically meaningful.

49 Because the EF-sponsored training courses vary in length from 1 to 3 months, the follow-up survey examines outcomes 9 to 11 months after the end of the training.
50 We report the ITT results for all three cohorts. For the ITT first-stage probit, treatment is defined as having a score that qualifies the respondent for admission to the EF training course to which s/he applied.
51 We measure employment by whether the respondent reported any income-generating activities in the past month or not.
We also examine the trade-specific income generating activity (IGA) rate, that is, the percent of individuals who find employment in the same trade as the training that they applied for, and we find impacts ranging from 18 to 19 percentage points. The trade-specific IGA impacts are larger than the non-farm employment impacts, suggesting that members of the control group, even when able to find employment, were less able than the treatment group to find employment in the trade in which they sought training.
Our results are considerably lower than estimates obtained by a simple before-and-after comparison within the group of trainees. For the 2010 cohort, for example, the treatment group had a non-farm employment rate of 30 percent at baseline and 55 percent at midline. A simple difference in employment rates would have suggested that the program increased the employment rate by 25 percentage points (or 83 percent), substantially larger than our quasi-experimental estimate of 16 percentage points (or 50 percent).
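The conversions between percentage points and percent changes used in this section can be reproduced with a few lines of arithmetic. The sketch below uses only figures quoted in the text; the helper name is hypothetical, and reading the gap between the two estimates as a common trend is a simplifying interpretation rather than a result reported in the tables.

```python
def percent_change(gain_in_points, base_rate):
    """Convert a percentage-point gain into a percent change over the base."""
    return 100.0 * gain_in_points / base_rate


# Non-farm employment: 15 to 16 points over a base of 29.6 percent.
low = percent_change(15, 29.6)
high = percent_change(16, 29.6)

# Before-and-after comparison for the 2010 cohort: 30 percent to 55 percent.
naive_points = 55 - 30
naive_percent = percent_change(naive_points, 30)

# Part of the before-and-after change not attributed to the program.
gap_points = naive_points - 16

print(round(low, 1), round(high, 1), round(naive_percent, 1), gap_points)
```

The 9-point gap between the before-and-after change and the quasi-experimental estimate is the portion that, under the difference-in-difference logic, would be attributed to changes that also affected non-trainees.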
The EF program also leads to persistent improvements in the underemployment rate (i.e., cases in which people are working fewer hours than they wish). Table 8 shows that EF-sponsored training courses increased hours worked in IGAs for the pooled cohorts by 19-21 hours per month (i.e., 28-30 percent). All three model specifications exhibit a statistically significant and positive impact.
We detect strong program impacts on monthly earnings. We measure earnings as an individual's total earnings in the past month, including income from all IGAs, but not including unearned income. 52 We observe a statistically significant (at the 1 percent level) increase in monthly earnings for the treatment group of 850 to 921 NRs (≈ 12 USD), from a baseline average of 1272 NRs (≈ 17 USD). 53 In percentage terms, this earnings increase translates to a gain of about 72 percent for the pooled sample.
With alternative measurements of earnings, we detect even larger program impacts. To account for the highly skewed nature of earnings distributions, we examine the impact on logged earnings and we find impacts of over 100 percent. A third way to examine the impact on earnings is to consider the proportion of participants who earned a "decent living." The Employment Fund considers 3000 NRs per month (≈ 40 USD) as "gainful employment" and considers this amount as "being productively employed." At baseline, only about 20 percent of the sample was "gainfully employed". The EF training program increased the "gainful employment" rate by 13 to 14 percentage points, a result statistically significant across all three models. The surveys of the 2011 and 2012 cohorts provide data on self-employment and work outside of home, conditional on having non-farm employment. We find an increase in self-employment (significant at the 5 percent level) and no increase in the proportion of people working outside the home.

c. Non-Employment Impacts for the Full Sample
The last four rows of Table 8 present the estimated impacts on savings and borrowing behavior. We find no systematic impacts of the EF training program on loans. The program had a significant and positive impact on individual total savings: a year into the program, savings increased by 901 to 1171 NRs (≈ 12 to 15 USD).

Table 9 presents program impacts on a range of empowerment outcomes. Because empowerment is a multi-dimensional concept, we examine various aspects of empowerment, from psychological empowerment, 54 entrepreneurial self-confidence, 55 and financial literacy 56 to control over resources. 57 The results show a number of significant impacts of the EF training program. Relative to the control group, EF training participants report having more money of their own, more control over household spending, and more access to mentors who can advise them on work-related matters. These measures indicate strong gains in economic empowerment. We also see strong gains on psychological empowerment, including significant increases in self-confidence both in life and with regard to entrepreneurial activities. The gains are on the order of 0.1 to 0.15 standard deviations. We also find marginal impacts, equal to about 0.13 of a standard deviation, on the self-regulation ability of trainees, which reflects the respondent's self-reported ability to control impulses, delay gratification, stick to difficult or detail-oriented tasks, and exert control over what happens in their life, all of which are predictors of labor market success. Taken together, these positive impacts on six out of 10 empowerment indicators demonstrate how a skills training program and subsequent employment can benefit youth in a number of realms.
The final set of outcomes examined relate to reproductive health and household-level outcomes. 58 Table 10 shows that the EF program had no average effects on desired fertility. The program had no impacts on pregnancy. The EF training had no detectable impact on HIV knowledge, on household food insecurity, 59 or on protein consumption. 60

54 Psychological constructs related to self-confidence and self-regulation were measured using a series of questions to which the respondent indicates their level of agreement (using variations of a Likert scale), which were then aggregated into a single score. The self-regulation scale included questions on goal-setting and self-control. Preliminary reliability testing yielded a Cronbach's alpha of 0.77 for the self-regulation scale.
55 The entrepreneurial score is based on the respondent's self-confidence to perform a series of tasks related to running a business or searching for a job. The specific tasks asked were: Find information about job opportunities in your community, Apply and interview for a job, Run your own business, Work in a team with 3-4 other people to accomplish a task, Identify income generating activities to start up a new business, Obtain credit from a bank, microfinance institution, CBO or NGO, Manage financial accounts, and Collect the money someone owes you. Respondents rated their ability to do these tasks on a scale of 1 to 5, and their responses were summed to form a single numerical score.
56 Financial literacy is defined as responding correctly to at least one out of three questions indicating the respondent's familiarity with interest rates and ability to perform a mathematical computation (about one-third of respondents got the question right at baseline).
57 Control over monetary resources is asked in a variety of ways in different sections of the survey questionnaire, including a direct question on whether the respondent has any money of his/her own that he/she alone can decide how to use, whether he/she can decide how to spend any earnings from employment or whether he/she needs permission, whether he/she controls his/her own savings, and the extent to which he/she participates in decisions regarding common household expenditures such as food and medicine.
58 The rationale for studying the effects of a job training program on reproductive health outcomes comes from the strong links between employment and fertility decisions, particularly for female youth. When first entering the labor market, many youth are making concurrent decisions about whether to pursue paid work outside the home, whether and when to start a family, and how

Understanding the full array of potential advantages and disadvantages of job training entails considering the EF program impacts on trainees' out-migration. Nepal's youth labor force out-migration rates hover around 20 percent every year, with as many as 350,000 Nepalese leaving the country to look for employment opportunities elsewhere, mainly in Persian Gulf countries and Malaysia. On the one hand, out-migration may improve outcomes for the migrants' family through remittances. On the other hand, investing in skills development for youth only to have those youth depart the country may cause concern for policymakers.
This evaluation finds no evidence that the EF program increased out-migration, at least in the short run. Not only do we find no increase in the receipt or amounts of remittances received by households of trainees (as shown in the last two rows of Table 10), but the fact that we were able to successfully re-interview 89 percent of the baseline survey respondents approximately a year after program enrollment indicates that almost all of them remained in the country.

d. Trade-wise Program Impacts on the Full Sample
The Employment Fund sponsors about 600 training courses annually, ranging from short four-week courses on incense-stick rolling to three-month technical courses. The breadth of course offerings attracts individuals of varied training and career needs. 61 Our evaluation sample includes a bit more than 10 percent of the courses on offer each year (shown in Tables 1 and 2). Panel 1 of Table 11 shows the breakdown of courses by trade.
We grouped training courses into seven categories. 62 The most common categories of training in our sample are Electrical/Electronics/Computer (e.g., electric wiring, computer hardware technician, and mobile phone repair), Construction/Mechanical/Automobile (e.g., arc welding, brick molding, furniture making, motor bike service), and Tailoring/Garment/Textile (e.g., galaicha weaving, garment fabrication, hand embroidery, tailoring and dressmaking). Because we sample approximately the same number of applicants per event, the breakdown of applicants by trade (Panel 2 of Table 11) is very similar to the event-wise breakdown.

The electronics and tailoring trainings show strong ITT impacts on employment: graduates of these training programs are more likely to have employment in general and are also more likely to be working in the trade in which they were trained. Beautician training shows large impacts on both employment and earnings.
We detect no significant impacts on employment or earnings outcomes for the remaining four trades. Results for food and hospitality (e.g., cooking and wait service) show no significant ITT impacts; results for construction show no significant impacts except for a marginal impact on trade-specific employment and on earning more than 3000 NRs per month. 63 For the remaining three trades (i.e., poultry technician, handicrafts and farming), we detect some ITT impacts but they are not consistent across models.
Overall, the results in Tables 12 reveal substantial heterogeneity in employment outcomes across the various types of training. The positive and significant impacts discussed in Section VIb are driven almost entirely by three categories of trades: electronics, beautician training, and tailoring, which show positive and significant impacts on employment and earnings across both cohorts. We find no impacts for the food/hospitality and farming training. Construction-related trainings showed positive and significant impacts, but the effects are not consistent across outcomes. 64

e. Gender-and Age-disaggregated Impacts
An important objective of this report is to examine program impacts for men and women, and in particular for the "AGEI" population (i.e., young women aged 16 to 24). To that end, Tables 14-19 disaggregate the results to compare outcomes for men versus women, and for younger women (the "AGEI" population) versus older women. 65 Table 14 shows the differences in program impacts by gender. The employment impacts are significantly larger for women than for men, indicating that the EF program is more effective at increasing both overall employment and non-farm employment for female participants than for male participants. The results for other economic outcomes, such as hours worked, earnings, and type of employment, are similar for both sexes.
63 Construction is a unique case in which some labor demand comes from the Middle East and other foreign countries; the fact that the construction trainings do not increase employment and earnings, regardless of location of employment, may help to explain why no impacts on remittances were observed in Table 15.
64 The difference in impacts may be driven by the types of training falling under this category (the 2010 sample includes more brick-molding courses while 2011 includes more furniture making), the characteristics of the trainees (brick-molding is female-dominated), or may reflect differences in the demand for construction labor in those two years. With the exception of poultry training, all of the training categories were offered in several districts throughout the country, so it is unlikely that local market differences are driving the observed differences in results.
65 For simplicity, we present only the Inverse Propensity Score Weighting (IPSW) regression model results.

Several factors could account for the differential in employment impacts between men and women. First, when asked, T&E providers suggested that female students attend classes more regularly and are more diligent than male students. We lack data on the attendance or completion rates of EF trainees; however, the rate of non-compliance with treatment assignment is equal for men and women, suggesting that this mechanism is unlikely to exert much influence on outcomes. Second, the Employment Fund introduced life skills training for women in 2011 in all of its training courses. 66 Because all women received life skills training, we cannot disentangle the influence of this factor on outcomes from that of other program elements. A third explanation is simply that men start with a higher level of non-farm employment (47 percent compared to 20 percent for women at baseline), and hence it may be easier to make large gains on the extensive margin for women. A fourth possible explanation relates to the difference between the types of trades that men and women apply for. Although the Employment Fund specifically tries to encourage female participation in non-traditionally female trades, most of the training courses tend to be heavily gender-segregated. For example, men tend to dominate electronics and construction courses, while the tailoring and beautician trainings consist almost exclusively of women. As shown earlier, the tailoring and beautician trainings exhibit the largest employment impacts. 67

The results in Table 15 indicate that the impacts on the employment outcomes for the AGEI sample (women aged 16 to 24) resemble the ones for older women (aged 25 to 35). We detect almost no statistically significant differences between younger and older women, meaning that the EF program appears to work equally well for the young women trained under the AGEI. The only difference we find between these two groups is that, conditional on employment, young women saw larger gains in obtaining work outside the home than older women.

Table 16 shows the differential impacts by gender on empowerment outcomes. Some of the average impacts shown in Table 9 are stronger for men or for women.
For men, we detect a significantly larger impact on control over household spending, though the difference is only significant at the 10% level. For women, we detect a significantly larger impact on having a mentor, as well as marginally larger gains in control over earnings and mobility relative to men. In Table 17, there are no significant differences in empowerment impacts between younger and older women.
The final set of results, in Table 18, examines the impacts on reproductive health and family-level outcomes by gender. While the full sample results in Table 10 showed no impact on desired fertility, disaggregating by gender reveals that the program had significant, though countervailing, impacts on men and women. For men, the EF program increased the desired number of children, while for women the desired number of children went down. Despite this change in fertility preferences, we do not detect significant impacts on pregnancy or actual fertility. Table 18 also shows a marginally larger impact on protein consumption for men relative to women. We detect no statistically significant difference between the effects for family and reproductive health outcomes (Table 19) for younger women in the AGEI sample compared to older women.

66 The forty-hour curriculum covered topics such as negotiation skills, workers' rights, sexual and reproductive health, and dealing with discrimination. Female students overwhelmingly responded positively to the life skills training, often claiming that it was one of their favorite parts of the course. The skills learned and the positive experience in this life skills training may contribute to the increased employment impact for women, which is in line with the advice from experts in vocational training from around the world, who increasingly advocate for the inclusion of life skills in technical training programs.
67 It is entirely possible, in addition to the gender-related explanation, that occupations for which one obtains skills in the tailoring and beautician courses are more profitable (relative to other occupations) in the Nepali labor market.

VII. Summary and Implications
In this section, we summarize the impacts of the Employment Fund's training program. The EF program aims to increase employment among Nepal's disadvantaged youth through skills training and job placement support. It is one of the largest providers of Technical Education and Vocational Training (TEVT) in Nepal, and it works with local non-governmental training providers to offer courses of up to three months in a variety of trades, followed by six months of employment placement support.
To understand the effects of the EF program, we compared the experience of EF program trainees with the experience of comparable non-trainees, approximately one year after the program. Our sample included both male and female trainees, with a particular focus on the AGEI population (i.e., females aged 16-24).
Approximately one year after each baseline survey, we conducted a follow-up survey. The response rates were quite high for both follow-up surveys. Overall, the survey firm was able to track and successfully interview 89 percent of the baseline survey respondents. We detect a high degree of training uptake (65 to 74 percent) among the group offered training, but also a high degree of participation among the control group. Between 26 percent and 36 percent of the individuals in the control group participated in the EF training course that they applied for, even though their scores did not qualify them for admission.
Using a combination of difference-in-difference and propensity score matching methods, we find positive and statistically significant effects for EF trainees on the following labor market outcomes:
• employment rates (any or non-farm);
• finding employment related to the skill they learned;
• hours worked;
• earnings; and
• the proportion of people earning more than 3000 NRs per month.

Individuals selected for EF training programs experience an increase in non-farm employment of 15 to 16 percentage points, for an overall gain of 50 percent. We detect an increase in average monthly earnings of approximately 72 percent. The gains for individuals who actually complete the training (not just those who are selected for training) are even larger, as discussed in Annex 3.
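The difference-in-difference logic behind these estimates can be illustrated with a minimal sketch. All numbers below are invented for illustration and are not the report's data. The function name `did_estimate` is hypothetical. The sketch leaves out the propensity score weighting, the covariates, and the clustered standard errors that the report's models rely on.

```python
from statistics import mean


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-difference of group means (no covariates or weights)."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change


# Hypothetical 0/1 indicators for non-farm employment (20 people per group).
treat_pre = [1] * 6 + [0] * 14    # 30 percent employed at baseline
treat_post = [1] * 10 + [0] * 10  # 50 percent employed at follow-up
ctrl_pre = [1] * 6 + [0] * 14     # 30 percent employed at baseline
ctrl_post = [1] * 7 + [0] * 13    # 35 percent employed at follow-up

effect = did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post)
relative_gain = effect / mean(treat_pre)

print(f"DiD estimate: {effect * 100:.0f} percentage points")
print(f"Relative gain over baseline: {relative_gain * 100:.0f} percent")
```

In this made-up example the treatment group gains 20 points and the control group gains 5, so the estimate is 15 percentage points. On the assumed 30 percent baseline, that corresponds to a 50 percent relative gain, which shows how a percentage point effect and a percent gain can describe the same result.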
We also find strong impacts of EF training programs on economic and psychological empowerment. Trainees had more control over economic resources and stronger self-assessed self-confidence. In contrast to the findings on employment and empowerment outcomes, we find limited evidence of impacts on reproductive health or household level outcomes.
Alongside the sizable general impacts on employment outcomes, we find that training courses in electronics, beautician services, and tailoring underpin most of the EF program's impacts. These three categories of training are much more effective in consistently increasing employment and earnings than construction, poultry rearing, handicrafts, and food preparation and hospitality.
We also find larger impacts on employment for women than for men. Women selected for training in 2010 to 2012 experience overall and non-farm employment gains of 13 and 19 percentage points respectively, while the corresponding impacts for men are 2 and 10 percentage points. However, we find no significant differences by gender on other economic indicators such as earnings, hours worked, trade-specific employment, or savings and loans.
The impacts for young women aged 16 to 24 are not significantly different from those for older women aged 25 to 35. This finding highlights the suitability of TEVT programs for younger women, although it may require specialized outreach strategies to recruit them. The larger impacts for women overall are consistent with the findings of rigorous evaluations of several training programs in Latin America, though uncovering the reasons for these differences requires further research.
The EF training program impacts are among the largest for training programs that have been evaluated around the world. We posit two factors as possible explanations of these large impacts. First, the EF had time to become established and to develop systems prior to the launch of the impact evaluation in 2010. Most impact evaluations are conducted during the pilot stage of project implementation, and consequently they do not always measure program impacts at their best. In this evaluation, the program was already operating at scale, and the service delivery processes were already road-tested, which surely contributed to the large impacts. Second, the EF program is designed around employment outcomes. For instance, training providers must complete market assessments as part of their proposals to ensure future employability in the trades in which they propose to train individuals. Training providers are rewarded with outcome-based payments that are higher if they can demonstrate that their graduates have found work, a policy which fosters an incentive to not just train young people, but to find jobs for them. The fact that many training providers return year after year to work with the EF despite this unusual contracting arrangement speaks to the feasibility of this approach.

Notes: More events were sampled than conducted in Jan-Apr 2011 because some events that were scheduled for Jan-Apr were delayed and did not start on time.

Notes: Standard errors are clustered by training course. "ITT" indicates that treatment is defined as having a score that qualifies the respondent for an EF training course. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: This table uses "ITT" treatment status, that is, those whose scores qualify them for training. There are four individuals from the 2011 cohort and five individuals from 2010 whose status in the EF database is unknown. For these individuals, we rely on the respondent's self-report of whether they took an EF training in the past year for the ATT results. The table only includes those individuals who were surveyed for the first follow-up.

Pseudo R2: 0.050. Notes: Standard errors (reported in brackets) clustered by training course. "Treat (ITT)" equals 1 if individual qualified for a training course and 0 otherwise. Other independent variables (not shown): district and T&E provider fixed effects, training-type categories, quintiles of household wealth. All variables measured at baseline. Although baseline data were collected on 4,677 individuals, incomplete data on ethnicity reduces the number of observations to 4,449. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: All columns report difference-in-difference estimates. "ITT" indicates that everyone whose score qualified them for a given training event is included in the "treatment" group. Standard errors (reported in brackets) clustered at the event level where possible. Self-employment and location of work were not asked in 2010. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: All columns report difference-in-difference estimates. "ITT" indicates that everyone whose score qualified them for a given training event is included in the "treatment" group. Standard errors (reported in brackets) clustered at the event level where possible. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: Standard errors (reported in brackets) clustered at the event level where possible. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: Standard errors (reported in brackets) clustered at the event level where possible. *,**, and *** denote significance at the 10% level, 5% level, and 1% level. Younger women (aged 16 to 24) compared to older women (aged 25 to 35).

Notes: All regressions use probit models. "District, TE dummies" indicates that the regression controls for district and training provider effects. Columns 2 and 3 also include training category dummies (not shown). All standard errors are clustered by event. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Annex 3. Analysis of Average Treatment Effects on the Treated (ATT) Impacts
This Annex analyzes the impact of the Employment Fund program on people who actually participated in EF-sponsored trainings. These results are called "Average Treatment Effects on the Treated" and differ conceptually from the ITT impacts reported in the main paper. The difference lies in the definition of treatment: for ITT estimates, the treatment group includes anyone whose score was above the threshold for the training course they applied for, that is, their score qualified them for admission to training. For ATT estimates, the treatment group includes anyone who actually participated in that training course, regardless of their score. People who were offered training but declined to participate are counted as "treatment" under ITT, and "control" under ATT. People whose score did not qualify them for training, but who nonetheless participated (perhaps because they were an alternate) count as "control" under ITT, but "treatment" under ATT. Simply put, ITT analysis reports the effects for those selected into the program whereas ATT analysis reports the effects for those who actually participated in the program.
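The two treatment definitions can be made concrete with a small sketch. The applicant records, the score scale, and the threshold of 60 are all hypothetical. Only the classification rules (score above the threshold for ITT, actual participation for ATT) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    score: float         # selection score (hypothetical scale)
    participated: bool   # whether the person actually took the course


def itt_treated(applicant, threshold):
    """ITT: treated if the score qualified the applicant for admission."""
    return applicant.score > threshold


def att_treated(applicant):
    """ATT: treated if the applicant actually participated, whatever the score."""
    return applicant.participated


THRESHOLD = 60  # hypothetical cut-off, not taken from the report

applicants = [
    Applicant(score=82, participated=True),   # qualified and participated
    Applicant(score=78, participated=False),  # qualified but declined
    Applicant(score=55, participated=True),   # did not qualify, participated (e.g., an alternate)
    Applicant(score=40, participated=False),  # did not qualify, did not participate
]

for a in applicants:
    itt = "treatment" if itt_treated(a, THRESHOLD) else "control"
    att = "treatment" if att_treated(a) else "control"
    print(f"score={a.score:>3}  participated={a.participated!s:<5}  ITT={itt:<9}  ATT={att}")
```

The second and third applicants are the ones who switch groups between the two definitions, which is exactly where the ITT and ATT comparisons diverge.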
As described in Section Vc, both the ITT and ATT estimates likely suffer from some degree of bias. As expected, the ATT impacts are generally larger than the ITT impacts: since the ATT comparison only includes people who actually participated in EF training courses in the treatment group, it is not diluted by the inclusion of people who were chosen for, but did not take, the training. The drawback of the ATT comparison is that there is no way to account for the unobservable characteristics (such as motivation or ability) that led some people to decline the offer of participation or that enabled some control group members to gain access. Because the ATT estimates generally exceed the ITT estimates in our analysis, we will frame the impacts as a range, with the ITT estimate serving as a lower bound and the ATT estimate serving as the upper bound. To that end, we will replicate the previous (ITT) estimates of the program impact (reported in Tables 7-10 and 12-19 in the main paper) using the ATT definition of treatment and control groups. The resulting estimates are described in the remainder of this Annex.
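To see why dilution pulls an ITT estimate below the effect on actual participants, consider a simplified sketch. It assumes, purely for illustration, that training has the same effect on everyone who takes it and that participation is unrelated to unobserved characteristics, which is the very assumption the paragraph above warns may fail. The participation rates are hypothetical values chosen inside the uptake ranges reported in Section VII. The effect size and the function name `diluted_itt` are invented. This is not how the report computes its ATT estimates, which compare participants with non-participants directly.

```python
def diluted_itt(effect_on_participants, uptake_selected, uptake_not_selected):
    """ITT contrast implied by a constant effect under partial compliance.

    Each group's mean outcome rises by (effect x share participating),
    so the selected-vs-not-selected contrast is the effect scaled by
    the difference in participation rates.
    """
    return effect_on_participants * (uptake_selected - uptake_not_selected)


effect = 0.20  # hypothetical: 20 percentage point effect on participants

# Hypothetical uptake: 70 percent among those selected, 30 percent among
# those not selected.
itt_partial = diluted_itt(effect, uptake_selected=0.70, uptake_not_selected=0.30)

# With perfect compliance the ITT contrast equals the effect on participants.
itt_full = diluted_itt(effect, uptake_selected=1.0, uptake_not_selected=0.0)

print(f"ITT with partial compliance: {itt_partial * 100:.1f} percentage points")
print(f"ITT with full compliance:    {itt_full * 100:.1f} percentage points")
```

Under these assumptions, the more the selected and non-selected groups resemble each other in actual participation, the further the ITT contrast falls below the effect on participants.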
As with the ITT definition of treatment and control group, we find significant differences between groups at baseline in the ATT comparison. The baseline balance tests reported in Table A3.1 show that the treatment group consists of a higher share of Janajati and Muslim but lower share of Dalit individuals. Moreover, the treatment group is less educated in terms of completion of 10th grade (SLC) and desires a lower number of children at baseline on average. Further, a higher share of treatment group individuals reported that they had savings at the time of the interview and that they had been engaged in an income generating activity in the past month.
First-Stage (Table A3.2): The results are broadly similar to those of the ITT analysis in Table 7. Being Muslim and having a child are positively correlated with treatment, while the number of children an individual has is negatively correlated with treatment. Unlike in the ITT analysis, for the ATT comparison we also find that being engaged in an income generating activity at baseline is positively correlated with program participation. Although the correlation is only significant at the 10 percent level, this finding lends some support to our earlier assumption that the results from the ATT analysis might be slightly upwardly biased. Individuals who start off with a higher motivation for seeking employment may also have been more successful in self-selecting into the program even though they had initially not been selected for training. Therefore, it is important to emphasize again that the ATT results should be interpreted with care and can only serve as an upper bound of the program's impact.

Full Sample Results (Tables A3.3 to A3.5): For economic outcomes in Table A3.3, the ATT results are in most cases higher than the ITT estimates. Both the ATT and the ITT results for the first outcome (employment) are uniformly positive and significant. Yet the impact estimated in the ATT specification is a 9 to 13 percentage point increase in labor market participation, while the estimates of the ITT comparison indicate "only" an increase of 7 to 9 percentage points. The impact on non-farm employment ranges from 20 to 21 percentage points, compared to 15 to 16 percentage points for the ITT results. And the impact on the trade-specific employment rate, or the likelihood of obtaining employment in the same trade as one's training, ranges from 23 to 24 percentage points, compared to 18 to 19 percentage points for the ITT estimates.

Turning to the impact on earnings, the ATT results are again uniformly higher than the ITT results for average earnings, logged earnings, and the proportion of people earning more than 3000 NRs per month, indicating "gainful" employment. Although differences in earnings across groups are higher in the ATT analysis, total savings are lower and logged savings even turn insignificant compared to the ITT specification. A possible explanation for this finding may be the larger rise in self-employment in the ATT-based analysis (9 to 10 percentage points, compared to 6 percentage points in the ITT-based analysis). In the process of becoming self-employed, individuals may have used part of their savings as the initial investment for their small-scale businesses.

The ATT estimates for the empowerment outcomes (Table A3.4) are generally consistent with the ITT estimates in the main paper. EF training participants report having more control over household spending, more access to mentors who can advise them on work-related matters, and more money of their own. Except for the latter indicator, all estimates are slightly higher in the ATT compared to the ITT analysis. The same is true for the estimated impacts on self-regulation ability and entrepreneurial self-confidence. Additionally, we find significant evidence for improvements in control over earnings and mobility in the ATT specification. Only for the psychological measures of self-confidence do we not find consistent results across specifications. While the ITT analysis showed strong and significant impacts in this regard, the ATT estimates are much lower and clearly insignificant. This indicates that self-confidence rose with being selected into the program, but was apparently not affected by actual participation.

Table A3.5 on reproductive health and family outcomes shows no evidence that the EF training program has any impact on desired fertility, pregnancies, actual number of children, contraceptive use, HIV knowledge, or protein consumption. Unlike the (very weak) evidence from the ITT results in Table 10, the ATT results do not show any significant reduction in food insecurity among EF training participants relative to non-participants. In contrast to the ITT results, the ATT table shows some weak evidence of a decline in arguments with one's partner in the matching specification. These results are not precise enough or consistent across specifications, though, and therefore should not be interpreted as an overall significant impact of the program. We further find a slight upward shift (3 percentage points) in the share of households that experienced an increase in received remittances. Yet the result is only statistically significant at the 5 to 10 percent level and its economic importance is limited.

Analysis by Trade (Table A3.6): The trade-wise impacts in Table A3.6 are generally consistent with the ITT impacts from Table 12, with some notable differences. The construction training delivers large ATT impacts on employment and earnings, but no corresponding significant ITT impacts, implying that those who do not take up the training may be able to find employment, but they do not earn as much as those who actually complete the training and are assisted by their training providers to find employment opportunities. 68 We observe a similar phenomenon for food/hospitality related training, indicating that those who completed the training did better than those who dropped out. While ITT effects in all four employment measures are negative (yet insignificant) or zero, ATT effects on monthly earnings and trade-specific employment are large. Also, training in handicrafts shows an additional significant ATT impact on earning more than 3000 NRs per month, while the ITT estimate is not significant.

In contrast, poultry training shows negative ATT impacts on monthly earnings and on earning more than 3000 NRs (the latter not being significant), in comparison to the non-significant but strongly positive ITT effects. In addition, the ATT estimate on trade-specific employment decreases compared to the ITT analysis. This unusual pattern of impacts may indicate that the people who drop out of this training are actually better off than the people who complete it (although the small number of trainees in this training category makes any generalization unreliable). Since these courses tend to attract rural and less educated trainees, it is plausible that the most able among them seek out better opportunities even after gaining admission to the course.
Gender-disaggregated impacts (Tables A3.7 to A3.12): The ATT results on employment outcomes in Table A3.7 show significant impacts for men and women. While in the ITT results on employment we only find significant impacts for women, the ATT estimates show positive impacts for both genders. Although the estimate for women is still higher than that for men, the difference between men and women, in contrast to the ITT results in Table 14, is now statistically insignificant. The impact on non-farm employment, however, is higher for women (as it is in the ITT results). Further, we find significant impacts for men and women on trade-specific employment, monthly earnings of more than 3000 NRs, and savings (only women), where the impacts are in each case significantly higher for female than for male participants. By contrast, our results show an impact on taking loans for male participants only.

Table A3.8 overall confirms the results of the ITT analysis in Table 15, which showed that employment impacts of the program generally do not differ by a woman's age. Only one significant age-related difference in impacts on employment outcomes was observed in the ITT analysis, namely take-up of employment outside of the home (positive and significant impact for young women but not for older women). This result, however, is not confirmed in the ATT analysis. Instead, the ATT estimates weakly suggest that the program has affected the ability to earn more than 3000 NRs monthly for both age groups, but even more for older women.

Table A3.9 presents gender-disaggregated ATT results on empowerment. We see a positive and significant improvement in control over earnings and a gain in mobility only for women, which is similar to the pattern of ITT results. Although we again find an impact on control over household spending for men only, the difference between men and women is not significant in the ATT analysis. Yet trained men show a slight decline in financial literacy.
Overall, there are no significant differences in the program's impact on empowerment outcomes between young and old women (Table A3.10). The exception is a large positive and significant effect on self-confidence levels for trained older women, an effect that does not apply to the younger AGEI sample.
Finally, Table A3.11 examines the gender-disaggregated ATT impacts of the EF training program on reproductive health and family-level outcomes. Consistent with the full sample ATT results in Table A3.5, we see no significant impact on pregnancy. Moreover, the results that trained men show a higher protein intake and desire more children, while trained women desire fewer children (as seen in the ITT results in Table 18), are not confirmed in the ATT analysis. However, the ATT estimates suggest a similar pattern for the actual number of children. While the number of children of trained men increased by 3 percentage points, it fell by 4 percentage points for female participants. Consistent with the analysis of ITT impacts across age, we do not find any significant difference in impacts on reproductive health or household level outcomes between young and old women in the ATT estimates.

Notes: Standard errors are clustered by training course. "ITT" indicates that treatment is defined as having a score that qualifies the respondent for an EF training course. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Pseudo R2: 0.071. Notes: Standard errors (reported in brackets) clustered by training course. "Treat (ATT)" equals 1 if individual participated in a training course and 0 otherwise. Other independent variables (not shown): district and T&E provider fixed effects, training-type categories, quintiles of household wealth. All variables measured at baseline. Although baseline data were collected on 4,677 individuals, incomplete data on ethnicity reduces the number of observations to 4,449. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: All columns report difference-in-difference estimates. "ITT" indicates that everyone whose score qualified them for a given training event is included in the "treatment" group. Standard errors (reported in brackets) clustered at the event level where possible. Self-employment and location of work were not asked in 2010. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: Standard errors (reported in brackets) clustered at the event level where possible. *,**, and *** denote significance at the 10% level, 5% level, and 1% level.

Notes: Standard errors (reported in brackets) clustered at the event level where possible. *,**, and *** denote significance at the 10% level, 5% level, and 1% level. Younger women (aged 16 to 24) compared to older women (aged 25 to 35).