Peer Effects on Violence: Experimental Evidence from El Salvador

This paper provides experimental evidence of the effect of having peers with different propensities for violence in the context of an afterschool program. By randomly assigning students to participate in the program with a set of similar or diverse peers in terms of violence, the study measures the effects of segregation or integration on students' behavioral, neurophysiological, and academic outcomes. The paper also exploits a discontinuity around the median of the propensity for violence distribution, to measure the impacts of segregation on marginal students. The results indicate that integrating students with different propensities for violence is better for highly and less violent children than segregating them. In particular, the intervention can have unintended effects on misbehavior and stress, if highly violent students are segregated and treated separately from their less violent peers.


Policy Research Working Paper 9187
This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may be contacted at ldinartediaz@worldbank.org.

Introduction
Peers can have important effects on individuals' economic decisions. During adolescence in particular, young people are exposed to different groups of peers; this exposure can influence subsequent human capital accumulation and behavior. 1 Specifically, exposure to violent peers can affect not only adolescents' academic achievement in the short run (Lavy and Schlosser, 2011; Carrell and Hoekstra, 2010) but also their earnings in the long run (Carrell et al., 2018).
The existing literature on group composition leaves three important evidence gaps. First, most existing papers identify only the intensive margin, that is, the effect of exposing an individual to a group of peers with a certain average characteristic; with some notable exceptions (Garlick, 2018; Duflo et al., 2011; Lafortune et al., 2016), they do not directly compare the average effects of different policies to one another.
Second, there is little consensus about what type of group composition (integration or segregation) is more effective in improving relevant economic outcomes. Some studies find that integrated groups are preferable because the interactions with diverse peers have the potential to enhance the learning experience (Lafortune et al., 2016) or that exposure to high-performing peers improves outcomes for more disadvantaged individuals (Lavy et al., 2012; Rao, 2019; Griffith and Rask, 2014; Oreopoulos et al., 2017). However, another strand of the literature finds that grouping individuals with similar peers can generate better results, since segregation allows teachers to match instruction to a particular group's needs (Duflo et al., 2011), or because individuals prefer to interact with peers with whom they share particular characteristics (Carrell et al., 2013; Girard et al., 2015; Goethals, 2001). 2
Finally, most of the research on this subject examines the effects of group composition by income, academic or labor performance, and entrepreneurship propensity. 3 Existing evidence indicates that exposure to violent peers can also have negative impacts on human capital accumulation and criminal behavior later in life. 4 However, there has been little rigorous analysis of the impacts of group composition in terms of propensity for violence on behaviors and economic outcomes.
This paper conducts the first randomized controlled trial designed to address these three gaps. I experimentally manipulate whether students participated in an After-School Program (ASP) in homogeneous or heterogeneous groups according to their initial predicted propensity for violence.
The empirical design overcomes some issues related to the identification of peer effects, such as the reflection problem and the strong assumption of the separability of peer composition and other confounding effects within groups (Manski, 1993; Angrist, 2014). In addition to estimating average treatment effects, my study design allows me to exploit a discontinuity in propensity for violence to estimate effects on marginal group members. The analysis focuses on the differential impacts of exposure to a particular composition of peers on academic, neurophysiological, and behavioral outcomes, and on whether it changes students' effort at school. My results show that it is beneficial to mix students with different levels of violence rather than to segregate them into more and less violent groups. To my knowledge, this paper provides the first experimental evaluation of the effects of peer composition by propensity for violence.
The ASP I study in this paper consists of clubs implemented after school but within school facilities in El Salvador from April to mid-October during the 2016 academic year. 5 Students participated in two sessions per week that lasted 1.5 hours each. Every session combined: (i) a discussion oriented toward fostering children's conflict management, violence awareness, and social skills (similar to content used in cognitive behavioral therapy); and (ii) club curricula that included activities such as scientific experiments, artistic performances, and others. The intervention was implemented by volunteers of Glasswing International, a local NGO working in Central America and Mexico. The study sample includes 1,056 enrolled students ages 10-16 years from five vulnerable public schools, where children have a high risk of engaging in or becoming victims of criminal activities.
The Salvadoran context is relevant for this analysis because youth violence is of foremost importance. In 2015, El Salvador was one of the world's five deadliest places for young boys (UNICEF, 2017). In recent years, 18% of students reported that they dropped out of school due to delinquency within schools or in the surrounding neighborhoods (MINED, 2015). Thus important policy implications emerge from studying how to increase the effectiveness of an intervention designed to protect at-risk children and adolescents while teaching them how to address and control violent behaviors.
To measure group composition effects and to exploit the existence of excess demand for the program, within each participating school, I randomly assigned individual students according to their initial propensity for violence to a group with a heterogeneous or homogeneous combination of peers 6 or to a control group. Students in the homogeneous treatment were separated into two subgroups by considering their percentile in the distribution of violence; that is, students whose predicted violence was higher (lower) than the median were assigned to a club with peers with high (low) predicted propensity for violence. Participants in the control group went home after school time. This design was inspired by Duflo et al. (2011) and Lafortune et al. (2016). Randomization ensured that group size and club categories were balanced across both treatments.
Before the intervention, I collected self-reported data on personal and family characteristics from enrolled students. Follow-up self-reported data included questions to measure the intervention's impact on attitudes, violence, and crime as well as enrolled children's exposure to risky spaces.
I combined this self-reported information with neurophysiological evidence, particularly measures of stress and emotional regulation, from a random subsample of enrolled students. I used low-cost, portable electroencephalograms within an in-field lab setting. Finally, I also collected administrative records on grades, behavioral reports, and absenteeism data from the students. Schools provided these data before and after the intervention.
I report three sets of results. First, when I compare both types of group composition, integration is better than segregation. Results show that, on average, improvements in attitudes and misbehavior at school are larger when participants are in more diverse groups than when they are segregated. This is true for both high- and low-violence children. These results align with the evidence that interactions with diverse peers can generate differences in the learning experience (Lafortune et al., 2016). 7 In this sense, students in heterogeneous groups have the opportunity for exposure to both good behaviors they should follow and negative ones they should reject. These interactions are less available for students in the homogeneous group.
In terms of neurophysiological outcomes, I find that students exposed to similar peers in terms of violence (that is, assigned to homogeneous groups) have stress levels greater than those of students placed in a heterogeneous setting. The increase in stress is particularly great for children treated in a homogeneous and highly violent group versus comparable children treated in heterogeneous groups.
Second, I studied tracking effects on marginal students, who are just above or below the median of the propensity distribution function within each stratum. By virtue of my design, very similar students around this cutoff were assigned either to homogeneous highly or less violent groups. By exploiting the discontinuity around the median and using only the sample of children assigned to the homogeneous treatment, I find evidence that marginal students are negatively affected by being assigned to the most violent group in both academic outcomes and misbehavior at school. This result contributes to the literature on how segregation by initial violence may encourage the formation of networks of violence (Billings et al., 2016; Di Tella and Schargrodsky, 2013; Bayer et al., 2009) and affect individuals who were supposed to be the key beneficiaries of these types of interventions.
Third, both integration and segregation of students by their initial propensity for violence generate better average effects on behavioral, neurophysiological, and academic outcomes than no ASP at all. In summary, these three pieces of evidence on peer effects indicate that having some highly violent peers can constitute a learning alternative for children less prone to violence because they can see the behaviors they should not emulate. However, the jump around the median in the tracking group also indicates that being exposed to a more significant share of "bad" peers can have the opposite effect. This implies there is an optimal "bad"-to-"good" peer combination that maximizes the program's overall impact.
This paper contributes causal evidence to the discussion of tracking versus integration as an optimal strategy for assigning participants to an intervention. The effects on academic and noncognitive outcomes as a result of integration that I present here accord with a body of microlevel evidence that attributes these effects to the interaction between diverse individuals within groups. 8 My results are mainly similar to those of Rao (2019), who finds an improvement in some social preferences outcomes such as generosity, prosocial behavior, and equity when there is an exogenous change in wealth heterogeneity in India. My study is novel in its modification of the composition regarding violence and the inclusion of analysis of peer effects on additional noncognitive outcomes such as violence, misbehavior, and attitudes toward school and learning that are important in developing countries.
8 See Sacerdote et al. (2011) for a summary of recent literature on peer effects on student outcomes in educational settings.
There is also a growing body of evidence that finds benefits from tracking. Theoretically, Lazear (2001) shows that, amid different levels of classroom disruption, segregation by type maximizes the total school output. Some empirical papers also find that school tracking can improve academic results, with greater effects for low performers (Duflo et al., 2011; Cortes and Goodman, 2014; Girard et al., 2015). 9 In contrast to those papers, my results indicate that tracking can have unintended effects on academic outcomes and misbehaviors when it targets only the most violent students.
A plausible explanation for the differences between my results and those reported in the tracking literature is the lack of specific incentives for instructors to adapt club curricula to their groups' needs. In fact, my results fit into the predictions of Duflo et al. (2011)'s model under the special case in which instructors do not respond to group composition because the teacher's effort function is a constant, or when the cost of effort is zero below certain target levels to which teachers orient instruction. Under this assumption, tracking by violence worsens outcomes for those above the median of the distribution of violence in the group to which they are assigned; but increases the performance for those below it.
The remainder of the paper is organized as follows. Section 2 describes the intervention, data collection, and study design. Specifically, this section presents details of the propensity for violence (IVV) estimation, descriptive statistics, and results of experimental design checks. Section 3 summarizes the specifications used to estimate the effects of the intervention on academic, behavior, and violence outcomes, and peer effects in this context. These results are presented in Section 4. Section 5 discusses the results and provides evidence of the most plausible mechanisms. Finally, the preliminary conclusions appear in Section 6. All appendix tables are at the end of this paper.
9 Duflo et al. (2011) find that tracking benefits both lower- and higher-ability students in Kenya. Cortes and Goodman (2014) analyze the "double-dose" algebra policy in Chicago public schools, which sorted students into algebra classes by their math ability. They find that this policy improved short- and long-term academic performance. Girard et al. (2015) study students' social networks formation and find evidence of preferences for homophily along several dimensions.

After-School Clubs
This study was conducted within the context of an ASP in El Salvador that was implemented by an NGO, Glasswing International. I partnered with Glasswing to design and implement an experimental evaluation that (i) measures the impact of this ASP as is and (ii) identifies the best targeting approach to improve its effectiveness.
In Dinarte and Egana (2019), we address the first objective. Our main estimations indicate that this ASP not only improved attitudes toward school but also enhanced learning and lessened misbehavior at school. These two effects translate into an improvement in participants' academic performance. We also estimated neurophysiological measures of emotional regulation and stress using electroencephalogram recordings. We find evidence that control of emotions and automatic responses to stimuli is a potential mechanism for the effects on behaviors and academic attainment.
By implementing a tracking-by-violence experiment I describe below, this paper addresses the second objective. Therefore, I highlight in this section only the curricula structure and enrollment process that will help readers to better understand the experimental design and its results.
The ASP implementation is part of the NGO's program Community Schools in five public schools in highly vulnerable communities in El Salvador. According to the intervention approach, its main objective is to modify children's violence and attitudes through the acquisition of life skills, thereby improving their academic performance (Glasswing International, 2012). The NGO offers four categories of clubs-Leadership, Art and Culture, Sports, and Science-in the ASP by education level (ciclos). Considering this intervention structure, the experiment was designed by using the natural school-by-education-level organization as the stratification variable.
Clubs meet twice a week for approximately 1.5 hours each and take place immediately following school hours. This intervention occurred between April and October of 2016. Each session is divided into two sections according to the services an ASP can provide. The first section is social skills development, which relates to the learning service of an ASP (Taheri and Welsh, 2016; Durlak et al., 2010). This section is common to all participants and includes topics such as management of conflict and risk, school violence reduction, and soft skills. Some of this section's activities originate in exercises from Cognitive Behavioral Therapy and positive psychology. By implementing experiential learning and role playing, these activities try to make students aware of certain behaviors, to disrupt these patterns, and to promote better ones.
The session's second section relates to the protection (or incapacitation) aspect of an ASP (Gottfredson et al., 2007; Mahoney et al., 2001). During after-school hours, children are under adult supervision as a means of preventing them from being exposed to risks in their communities. In this section, instructors implement activities related to each club's category. For example, in the sports clubs, children play soccer and basketball, among other athletic pursuits. The sports element also attracts student participation and therefore increases ASP attendance.
At the beginning of the school year, the NGO visits schools to offer the program and enroll participants. Out of a total of 2,420 children from the five beneficiary schools, 1,056 students ages 10-16 years are recruited and enrolled to participate in the ASP. Children can self-enroll; they are only required to bring a parent's signed authorization to participate.
During the registration stage, children are asked to complete an enrollment form. It collects personal and family information I describe below. Then they are assigned to a group in light of their preferences, parent's authorization, and aggregated demand for each club category. Clubs are composed of 13 students from a single educational level.

Experimental Design
This paper aims to provide experimental evidence on the best targeting approach to increase the effectiveness of an ASP. After randomly assigning enrolled students to treatment or control groups, I create an additional exogenous variation on each student's peer composition in terms of violence. This is to test if integration or tracking is the best implementation strategy in the context of this program. Therefore, this design has different steps: violence measure estimation per participant, random allocation of enrolled students to different treatment arms, collection of relevant data at different stages, and experimental design robustness checks. This last component includes other distributional criteria in addition to balance on observables across treatments before the program. I describe all of them in this subsection.

Propensity-for-Violence Index (IVV) Estimation
To assign enrolled students to each treatment group, I first needed to measure their propensity for violence. It was not possible to ask directly about this during the registration phase because we could not guarantee this personal information would be kept confidential. For example, either the local authorities or gang organizations might force the research team or the NGO to reveal information that identified each child, risking not only the intervention but also, most importantly, the children's safety. Additionally, the inclusion of specific questions about gang membership and other associations with these organizations, which are highly correlated with crime and violence in El Salvador, might endanger both children and instructors.
Instead, following Chandler et al. (2011), I estimate a predictive model of violence and crime from existing data using a Two-Sample Least Squares strategy. First, using an existing anonymized database of youths' violence and crime from El Salvador (FUSADES, 2015), 10 I estimate the likelihood of having committed a violent act $V_f$ as a function of a wide range of covariates:
$$V_f = \alpha_0 + \alpha_1 D_f + \varepsilon_f$$
where $D_f$ is a vector of violence determinants of student $f$ in the FUSADES dataset. This vector includes variables that indicate individuals' vulnerability to violence, such as student characteristics (e.g., age, gender, time spent alone at home, and education level); children's household variables (e.g., residence area, mother's education, and household composition); and school-level controls (e.g., school location and commuting time to school). 11 Descriptive statistics and comparison of means (p-values) between the FUSADES sample and the one from this study are in Appendix Table A1. Estimations indicate that both samples are similar in most of the determinants except for some variables such as student's age and their report of being without adult supervision after school hours.
All estimated coefficients $\hat{\alpha}_1$ have the expected sign according to the literature of violence determinants, as shown in Appendix Table A2. For instance, boys are more likely to be violent than girls, adolescents behave worse than children (Rodriguez-Planas, 2012), and lack of parental supervision increases the probability of committing a violent act (Gottfredson et al., 2004). Statistically significant determinants are participant's age, gender, living in urban area, lack of parental supervision, and commuting time. Overall, lack of parental supervision is the most important determinant of propensity for violence in this sample.
Then, exploiting the availability of these variables in the registration forms of enrolled students, I predict the measure of propensity for violence (IVV) for each child, using the vector of estimated coefficients $\hat{\alpha}_1$. Two features of this IVV are important to emphasize. First, since the variables included in the estimation relate to students' violence exposure across different domains (family, school, and community), this measure is a more accurate proxy of students' overall propensity for violence than are reports of students' misbehavior from school records. Second, this predicted index can be interpreted as a measure of students' propensity for violence rather than as an indicator of actual violence.
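The two-sample prediction step can be illustrated with a minimal sketch. It assumes a linear probability model fitted by ordinary least squares, which may differ from the exact specification used in the paper. The covariates (`male`, `unsupervised`), the toy data, and all function names are hypothetical and chosen only to show the mechanics: coefficients are estimated on an auxiliary sample that observes violence, then applied to a study sample that observes only the covariates.

```python
# Sketch only: fit on an auxiliary sample, predict on the study sample.
# All data, covariates, and names are hypothetical.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, outcomes):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, outcomes)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_index(coefs, rows):
    """Apply coefficients estimated on the auxiliary sample to new rows."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]


# Auxiliary sample: (male, unsupervised) covariates plus a violence indicator.
aux_x = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
aux_v = [0, 0, 0, 1, 0, 1, 1, 1]
coefs = fit_ols(aux_x, aux_v)

# Study sample: the same covariates from registration forms, no violence measure.
study_x = [(0, 0), (1, 0), (0, 1), (1, 1)]
ivv = predict_index(coefs, study_x)
```

The key design point is that the violence outcome never has to be collected from the study sample; only the covariates shared by both samples are needed.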
Although the IVV is not a perfect measure of violence, I provide some evidence to argue that it is clearly the best proxy of propensity for violence considering the restrictions of this particular context. First, according to the existing literature of violence and crime determinants for particular groups (Klassen and O'Connor, 1988; Chandler et al., 2011), 12 these types of crime and violence models estimated from existing data have high predictive power. 13 As I mention below, had I used misbehavior reports as the classification variable in the experimental design, estimations indicate that an important share of the total sample would have been classified similarly.
An additional concern is that this index might capture another factor, such as school performance. Thus, I estimate the correlation between the predicted index and grades reported by teachers, and I find that it is not statistically significant. Yet, I also find that the correlation between the predicted IVV and misbehavior at school is positive and statistically significant at 1%. In Appendix Table A3, I present these estimations using different standardizations of academic grades and behavior reports.
Finally, the IVV predicts both intensive and extensive margins of future misbehavior. Using data from students in the control group, I find that the correlation between IVV and bad behavior at the end of the academic year is positive and statistically significant at 5%. The estimation strategy and main results are in Appendix Table A4.

Treatments
After estimating the IVV, enrolled children were randomly assigned to two groups, control (C, 25%) and treatment (T, 75%), within each school-by-educational-level "block." 14 Then, in a second randomization stage, treated children were randomly assigned to two treatment arms, heterogeneous (HT, 25%) and homogeneous (HM, 50%), as shown in Figure 1.
Next, students in the HM set were ranked and assigned to subgroups according to their index: all students with an IVV above the median at the HM-stratum level were assigned to the High-IVV group (HM-High, 25% of the full sample) and the rest were assigned to the Low-IVV (HM-Low, 25%) group. The HM-Low and HM-High groups are defined using the median in each school-by-educational-level block because I wanted variation in the cutoff. Moreover, a uniform cutoff across all randomization blocks could generate differences in group sizes that would probably confound the effects of tracking with group size. This design permits me to judge if targeting helps to improve the ASP's effects and allows me to test the potential existence of peer effects and heterogeneity by initial propensity for violence in the program. In addition, this strategy utilizes a regression discontinuity (RD) design approach to measure the impact of tracking on the marginal student and contrast it to the average impact.
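The regression discontinuity idea for marginal students can be sketched as follows. This is an illustration under stated assumptions, not the paper's estimator: it centers the IVV at the HM-stratum median, fits a separate line on each side of the cutoff with a rectangular kernel, and takes the difference in intercepts. The bandwidth, the toy data, and the function names are hypothetical.

```python
# Sketch only: sharp RD at the HM-stratum median with a rectangular kernel.
# The bandwidth and all data are hypothetical.

def fit_line(xs, ys):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(centered_ivv, outcomes, bandwidth):
    """Estimated jump in the outcome at the cutoff (centered IVV = 0).

    Students with centered IVV > 0 are in HM-High; the rest are in HM-Low.
    """
    pairs = list(zip(centered_ivv, outcomes))
    low = [(x, y) for x, y in pairs if -bandwidth <= x <= 0]
    high = [(x, y) for x, y in pairs if 0 < x <= bandwidth]
    low_intercept, _ = fit_line([x for x, _ in low], [y for _, y in low])
    high_intercept, _ = fit_line([x for x, _ in high], [y for _, y in high])
    return high_intercept - low_intercept


# Toy data: the outcome shifts up by 0.5 for students just above the median.
centered = [-0.03, -0.02, -0.01, 0.01, 0.02, 0.03]
misbehavior = [1 + 2 * x if x <= 0 else 1.5 + 2 * x for x in centered]
jump = rd_jump(centered, misbehavior, bandwidth=0.05)
```

Because the cutoff is the median within each block, centering the index block by block lets students from different blocks be pooled around a common cutoff of zero.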
14 Each education level consists of three years of schooling: the first is from first to third grades, the second from fourth to sixth grades, and the third from seventh to ninth grades.

Treatments are described below:
1. Heterogeneous (HT): Registered and randomly selected students are assigned to take part in a club with a heterogeneous composition of clubmates according to their IVV.
2. Homogeneous-Low (HM-Low): Registered and randomly selected students are assigned to participate in a club with low-violence peers if their IVV is lower than the median of the HM group within their respective stratum.
3. Homogeneous-High (HM-High): Registered and randomly selected students are assigned to participate in a club with highly violent peers if their IVV is greater than the median of the HM group within their respective stratum.
4. Control: This group of students was not selected to participate in the clubs during the 2016 academic year. They left school facilities after school hours. We were able to collect their information at follow-up because we gave them a "participation coupon" they could redeem to take part in the ASP the next year.
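The assignment procedure can be sketched in a few lines. The sketch below is a simplification under stated assumptions: it collapses the two randomization stages into a single shuffle within each block (which yields the same 25% / 25% / 50% shares), and the field names `id`, `block`, and `ivv`, the seed, and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import median


def assign_treatments(students, seed=0):
    """Assign each student to C, HT, HM-Low, or HM-High within blocks.

    `students` is a list of dicts with hypothetical keys "id", "block",
    and "ivv". Shares per block: 25% C, 25% HT, 50% HM.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for student in students:
        blocks[student["block"]].append(student)

    assignment = {}
    for block in sorted(blocks):
        members = blocks[block][:]
        rng.shuffle(members)
        quarter = len(members) // 4
        control = members[:quarter]
        hetero = members[quarter:2 * quarter]
        homog = members[2 * quarter:]
        for student in control:
            assignment[student["id"]] = "C"
        for student in hetero:
            assignment[student["id"]] = "HT"
        # Median split inside the HM arm, computed separately for each block.
        cutoff = median(s["ivv"] for s in homog)
        for student in homog:
            above = student["ivv"] > cutoff
            assignment[student["id"]] = "HM-High" if above else "HM-Low"
    return assignment


# Toy example: two blocks of 16 students with distinct index values.
students = [{"id": i, "block": i % 2, "ivv": 0.001 * (i + 1)} for i in range(32)]
assignment = assign_treatments(students, seed=1)
```

Computing the cutoff within each block keeps the HM-High and HM-Low groups the same size in every block, which is the reason the text gives for avoiding a uniform cutoff.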
As opposed to Duflo et al. (2011) and similar to Lafortune et al. (2016), neither instructors nor participants knew details of the assignment because I wanted to capture the effects of interactions among participants instead of other channels such as teaching or curriculum adaptation. To test for changes in teaching methodologies, I collected information from a trainers' survey and present the results in upcoming sections.

Data
As previously mentioned, after the NGO advertised the ASP in school facilities, a research team returned to schools to enroll and register participants. In this stage, students were asked to bring a consent form signed by their parents or tutor and to complete a registration form. This instrument collected personal and family information such as age, gender, mother's education, and average commuting time, among others. These were used to estimate the IVV. I also collected school records of academic grades, absenteeism, and behavior reports for all children.
According to the ASP's theory of change, the program can directly affect behavioral and neurophysiological outcomes such as children's violence, misbehavior at school, and emotional regulation. It may also have some indirect effects on academic performance, since changes in noncognitive outcomes can affect cognitive skills (Cunha and Heckman, 2008). Since this paper investigates differences in those outcomes by group composition, I study the effects of tracking and integration in the same three categories of variables.
First, short-term follow-up data on attitudes toward school and violent behaviors were collected from enrolled participants in school facilities. Students completed a follow-up survey in classrooms set up especially for this purpose at the end of October 2016, after all clubs had finished implementing their curricula. Each survey took approximately 45-60 minutes. Most surveys were self-administered, with assistance from staff trained in the survey methodology. To increase statistical power to detect effects for outcomes within a family and to reduce the number of hypothesis tests, I construct indexes of variables that are expected to move in a similar direction (Haushofer and Fehr, 2014; Heller et al., 2017). Since I do not necessarily trust self-reports, I attempted to recheck and validate these behaviors and attitudes using proxies for these outcomes obtained from administrative data such as absenteeism figures and reports of misbehavior at school.
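One common way to build such an index is to standardize each component against the control group and average the resulting z-scores with equal weights. The sketch below assumes that construction, which the text does not spell out, and assumes the components are already signed so that higher values point in the same direction. The variable names and data are hypothetical.

```python
from statistics import mean, stdev


def zscores(values, reference):
    """Standardize `values` using the mean and SD of `reference`."""
    mu = mean(reference)
    sd = stdev(reference)
    return [(v - mu) / sd for v in values]


def summary_index(components, is_control):
    """Equally weighted average of control-standardized components.

    `components` maps a (hypothetical) variable name to a list of values,
    one per student; `is_control` flags control-group students.
    """
    standardized = []
    for values in components.values():
        reference = [v for v, c in zip(values, is_control) if c]
        standardized.append(zscores(values, reference))
    # Average across components, student by student.
    return [mean(per_student) for per_student in zip(*standardized)]


components = {
    "attitude_school": [1, 2, 3, 4],
    "attitude_learning": [10, 20, 30, 40],
}
is_control = [True, True, True, False]
index = summary_index(components, is_control)
```

With this construction the index has mean zero in the control group, so a treated student's value reads directly as a distance from the control mean in standard deviation units.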
For the analysis of group composition on stress and emotional regulation, I use neurophysiological recordings collected for our work (Dinarte and Egana, 2019) from a random subsample of enrolled students. 15 Finally, to measure the effects of group composition on academic performance, schools provided administrative records of math and science grades at the end of the academic year. Appendix 1 includes a description of all collected data and main outcomes.
As shown in Appendix Table A5, the average matching rate of administrative data of enrolled children was 94% at baseline and 97% at follow-up. All matching rates were balanced between treatments and C groups except for the fraction of math grades at baseline between the HM and C groups, significant at 10%; and in absenteeism between both tracking groups, also significant at 10%. To account for this difference, I include the imputed grade for missing observations at baseline and a missing value indicator in all specifications for the academic outcomes. Additionally, the average matching rate of administrative data of nonenrolled students was 85% at baseline and 98% at follow-up.
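The missing-data adjustment can be illustrated with a short sketch. It assumes that missing baseline grades are replaced by the mean of the observed grades, which is one common choice and not necessarily the imputation rule used in the paper. The function name and data are hypothetical.

```python
from statistics import mean


def impute_with_indicator(grades):
    """Fill missing baseline grades and flag which ones were filled.

    Missing values are represented by None. Mean imputation is an
    assumption made for this sketch.
    """
    observed = [g for g in grades if g is not None]
    fill_value = mean(observed)
    imputed = [fill_value if g is None else g for g in grades]
    missing = [1 if g is None else 0 for g in grades]
    return imputed, missing


baseline_math = [6.0, None, 7.0, 8.0]
imputed, missing = impute_with_indicator(baseline_math)
```

Both the imputed grade and the indicator then enter the regression as controls, so students with missing baseline records stay in the sample while the indicator absorbs any level difference associated with missingness.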
The share of initially enrolled students that filled out the follow-up survey after the intervention was 92%, on average; for the HM and HT groups, it was 91% and 94%, respectively. There were no statistical differences between treatment and C groups in overall attrition rates. Therefore, results are not driven by the absence of follow-up survey data for any group. For the neurophysiological measures, after filtering electroencephalogram (EEG) recording data, the average attrition share was 49%. In Dinarte and Egana (2019), we present several checks to verify that this attrition rate was not correlated with the intervention. We argue that attrition was caused mainly by the quality of the data recordings. For example, long, dense, or dirty hair and/or "frozen" computers were the most common troubles that the Matlab toolbox encountered in reading the EEG recordings. As a summary, the time line of the study is shown in Figure 2.

Summary Statistics
Descriptive statistics of the full sample and each treatment and control group are shown in Table 1. Column 1 exhibits statistics for the control (C) group and columns 2 and 3 for the treatment (HT and HM) groups, respectively. Columns 4-5 show statistics for the two homogeneous subgroups.
Panel A presents summary statistics of the violence determinants. Participants are on average 11.9 years old, 49% are male, and 73% live in an urban area. Regarding family composition, 91% of the students live with at least one parent, and 9% live with a relative or a nonrelated adult.
On average, 62% of students' mothers have an intermediate education level (7-12 years), and 31% have fewer than six years of schooling. Regarding risk exposure, only 5% of students report being alone at home when they are not at school. On average, they travel around 18 minutes to school.
Finally, the last row of panel A shows that the average propensity for violence for the treatment and C groups is 0.038, with a standard deviation of 0.029, ranging from 0.001 to 0.215. This average propensity for violence is 14 times the mean probability that a given student will be vulnerable to violence in Chicago (Chandler et al., 2011). Although the two estimates are not completely comparable (because I use fewer violence determinants than Chandler et al. (2011)), this difference highlights the very high propensity for violence of the children in this study. 16
Panel B shows academic scores and absenteeism for the first quarter of the 2016 school year. On a grade scale of 0-10, with a minimum grade of 5 required to pass each course, enrolled students averaged 6.5 points. The mean absenteeism rate in the first quarter, before the intervention, was 4.2% (1.69 out of 40 days).
16 More descriptive statistics of the predicted propensity for violence are presented below.

Experimental Design Checks
This experimental design must meet five requirements to generate exogenous variation that allows me to identify the causal effects of group composition in terms of violence. First, the treatment and C groups must be balanced. I find some differences in means between the treatment arms and the comparison group. Appendix Table A6 reports the p-values for all tests of differences in means between each treatment group (HT, HM, HM-H, and HM-L) and the C group.
For example, in the comparison between the HM and C groups, there are differences in the share of students who live with both parents or only with one of them. In addition, children in the HT group attend higher-level courses than students assigned to the C group. When I compare the HT and HM groups, there is a slight difference in the mean propensity for violence. Finally, average absenteeism is higher for students in the C group than in any treatment arm. However, after adjusting p-values for multiple hypothesis testing of means to control the family-wise error rate (FWER), none of the differences is statistically different from zero. I nonetheless account for these differences by including the corresponding variables as controls in the estimations. Specifically, I control for the percentile of predicted IVV, household composition, absenteeism, and student's school year. Additionally, in the academic outcome specifications, I include the respective grades at baseline to account for differences in academic performance before the intervention.
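To make the multiple-testing adjustment concrete, the sketch below applies Holm's step-down procedure, one standard way to control the FWER. Which adjustment the paper actually uses is not stated here, so the choice of Holm is an assumption, and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must be non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from four balance tests (not values from Table A6).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_adjust(raw)
print(adjusted)
```

A difference that looks significant in isolation can lose significance once it is judged against the whole family of balance tests, which is the pattern the paragraph above describes.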
A second condition is that the HM-High group's IVV should be greater than that of the HM-Low group, a difference that should also appear in most of its determinants. As columns 4 and 5 in Table 1 show, and as tests for differences confirm, the HM-High group has a larger proportion of male and older students than the HM-Low group. They are also more exposed to violence because they live in an urban area, face greater travel time to school, and most of them spend time at home alone. 17 Finally, the average academic performance of students in the HM-High group is worse than that of children in the HM-Low treatment.
As the assignment to the HM and HT groups was defined over the predicted violence index, the third requirement is for the design to effectively generate changes in the propensity for violence of one's clubmates, depending on the random assignment. As I show in Table 2, consistent with the premise that nontracking groups are more violence diverse than any of the tracking groups, the standard deviation of the HT group was 0.010 and 0.018 points higher (35%-78%) than the same figure for the HM-H and HM-L groups, respectively. Additionally, a second premise held that the average violence level of the HT group must be between the HM-Low and HM-High levels. This design fulfills these conditions, as is apparent from the results in Table 2: the average HT group's IVV is between those of the HM-High and HM-Low.
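The two premises in this requirement can be checked mechanically. The sketch below is a toy illustration with hypothetical IVV values: it splits one stratum at its median into low and high groups and uses the unsplit stratum as a stand-in for an integrated (HT-like) group. The tie rule (students at the median go to the low group) is an assumption.

```python
from statistics import mean, median, pstdev

def split_at_median(ivv_values):
    """Split one stratum's IVV values into (low, high) groups at the stratum median.

    Students at or below the median go to the low group (an assumed tie rule).
    """
    cut = median(ivv_values)
    low = [v for v in ivv_values if v <= cut]
    high = [v for v in ivv_values if v > cut]
    return low, high

# Hypothetical IVV values for one stratum (not data from the study).
stratum = [0.010, 0.015, 0.020, 0.030, 0.045, 0.060, 0.080, 0.120]
low, high = split_at_median(stratum)

# The full stratum stands in for an integrated (HT-like) group.
mean_is_between = mean(low) < mean(stratum) < mean(high)
sd_is_larger = pstdev(stratum) > max(pstdev(low), pstdev(high))
print(mean_is_between, sd_is_larger)
```

In the actual design the HT group is a random subset of each stratum rather than the whole stratum, so the comparison holds in expectation rather than exactly.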
The fourth requirement relates to three desired characteristics of the IVV distribution functions of the HT, HM, and C groups before treatment. The first is that these distributions must be similar at baseline. Using the two-sample Kolmogorov-Smirnov test for equality of distribution functions, I cannot reject equality in any comparison: the p-values are 0.62, 0.89, and 0.68 for the HT-HM, HT-C, and HM-C comparisons, respectively. The similarity among distributions can be verified as well in Figure 3. The second characteristic is that the distributions of the HT, HM-High, and HM-Low groups must differ. As Figure 4 illustrates, there are differences among the three groups' distributions. In particular, using the two-sample Kolmogorov-Smirnov test, I reject the hypothesis of equality for each pair of distribution functions at the 1% level. The last desired feature is that the distributions of the HM-High and HM-Low groups should not fully overlap in the full sample in order to have some variability between both HM subgroups. Had I not stratified, there would be no overlap between both groups. However, as the assignment was defined within each stratum, there is overlap in 67% of the sample, as shown in [...]
[...] children as her potential income is low. Alternatively, if the mother has higher education, then she will probably have more financial means to pay for some sort of childcare or other presence in the home.
The fifth condition is that there must be a sharp discontinuity at the fiftieth percentile for the HM subsample, consistent with the discontinuous assignment at the median IVV within each stratum. Figure 6 shows the predicted IVV median of a student's clubmates as a function of her own IVV, and the expected jump at the fiftieth percentile.
Moreover, an RD-robust estimation using only this homogeneous subsample indicates that students assigned to the HM-High group are enrolled with peers whose mean IVV is 0.8 points greater, a difference that is statistically significant at the 5% level. 18 I contend that this IVV is a good proxy for violence because, even when misbehavior reports are used as the classification variable for high and low propensity for violence, estimations indicate a similar classification in 53% of the total sample. Crucially, there are no differences in the classification among treatments, as the last row of Appendix Table A3 shows.
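The two-sample Kolmogorov-Smirnov comparisons used in the fourth requirement can be sketched in a few lines. The function below computes the KS statistic (the largest gap between the two empirical distribution functions) and an approximate p-value from the asymptotic Kolmogorov series. It is an illustration with hypothetical data, not the routine used for the reported p-values, and the asymptotic p-value is only a rough guide in small samples.

```python
import math

def ks_two_sample(x, y):
    """Return (D, approximate p-value) for the two-sample Kolmogorov-Smirnov test."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk through the pooled values and track the gap between the two ECDFs.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    if d == 0.0:
        return d, 1.0
    # Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    lam = math.sqrt(n * m / (n + m)) * d
    total = 0.0
    for k in range(1, 10001):
        term = math.exp(-2.0 * (k * lam) ** 2)
        total += term if k % 2 == 1 else -term
        if term < 1e-12:
            break
    return d, min(1.0, max(0.0, 2.0 * total))

# Hypothetical IVV samples with no overlap (not data from the study).
group_a = [0.01, 0.02, 0.03, 0.04, 0.05]
group_b = [0.06, 0.07, 0.08, 0.09, 0.10]
d_stat, p_value = ks_two_sample(group_a, group_b)
print(d_stat, p_value)
```

A large p-value, as in the HT-HM, HT-C, and HM-C comparisons, means equality of the distributions cannot be rejected; a small one, as in the comparisons among HT, HM-High, and HM-Low, means it can.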

Empirical Strategy
In this section, I describe my empirical strategy to study group composition effects and how this heterogeneity interacts with children's initial propensity for violence. First, I describe the specifications to measure average effects of being treated in a particular composition of peers, exploiting the random variation generated directly from the experimental design. Second, using the discontinuity in the median of the IVV distribution function of the HM group, I evaluate the effect of tracking on the marginal participant.

Group Composition (GC) Effects
This study design creates direct experimental variation in GC regarding violence. Thus, I can directly test for differences in the ITT effects on the outcomes of students assigned to groups with either homogeneously or heterogeneously violent peers, using the following specification:

$$y_{ij} = \theta_0 + \theta_1 HM_{ij} + \theta_2 HT_{ij} + \theta_3 X_{ij} + S_j + \epsilon_{ij} \qquad (2)$$

where $y_{ij}$ is the postintervention behavioral, neurophysiological, or academic outcome of student $i$ in school and education level $j$. $HM_{ij}$ and $HT_{ij}$ are dummies that indicate whether student $i$ in school level $j$ is assigned to the HM or HT treatment, respectively. $X_{ij}$ is a vector of control variables measured at or before baseline, including a second-order polynomial of student's IVV percentile. For the academic outcomes regressions, I also include standardized grades at baseline (including imputed values) and a missing baseline grades indicator as controls. Finally, I also control for "randomization blocks" with school-by-education-level fixed effects $S_j$, and $\epsilon_{ij}$ is the error term. Due to the possible bias in the estimation of the IVV, standard errors are adjusted using a cluster bootstrap at the course-school level (Treiman, 2009). As a robustness check, I also estimate robust standard errors for the average effects.
In this setting, $\theta_1$ ($\theta_2$) can be interpreted as the effect on student $i$ of receiving an offer to participate in the ASP with a homogeneous (heterogeneous) composition of violent peers, relative to the C group. Testing for differences between the estimated coefficients $\theta_1$ and $\theta_2$ indicates the effects of group composition (tracking versus integration) on the outcomes of interest.
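The cluster bootstrap mentioned for the standard errors can be sketched as follows. This is a deliberately simplified stand-in: it bootstraps the raw difference in mean outcomes between the HM and HT arms rather than the regression coefficients, and the cluster labels, outcomes, and number of replications are all hypothetical.

```python
import random
from statistics import mean, stdev

def diff_in_means(rows):
    """Difference in mean outcome between the HM and HT arms."""
    hm = [y for _, arm, y in rows if arm == "HM"]
    ht = [y for _, arm, y in rows if arm == "HT"]
    return mean(hm) - mean(ht)

def cluster_bootstrap_se(rows, n_boot=500, seed=0):
    """Bootstrap standard error that resamples whole clusters with replacement."""
    rng = random.Random(seed)
    clusters = {}
    for row in rows:
        clusters.setdefault(row[0], []).append(row)
    ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        sample = []
        # Draw as many clusters as in the data, keeping each cluster intact.
        for cid in rng.choices(ids, k=len(ids)):
            sample.extend(clusters[cid])
        estimates.append(diff_in_means(sample))
    return stdev(estimates)

# Hypothetical (cluster, arm, standardized outcome) rows; each cluster has both arms.
rows = [
    ("A", "HM", 0.40), ("A", "HT", 0.10),
    ("B", "HM", 0.10), ("B", "HT", 0.20),
    ("C", "HM", 0.55), ("C", "HT", 0.05),
    ("D", "HM", 0.30), ("D", "HT", 0.25),
    ("E", "HM", 0.20), ("E", "HT", -0.10),
]
point_estimate = diff_in_means(rows)
standard_error = cluster_bootstrap_se(rows)
print(point_estimate, standard_error)
```

Resampling whole clusters, instead of individual students, preserves any within-cluster correlation in outcomes, which is why the bootstrap is done at the cluster level.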
I can also exploit the variation in peer quality generated by the experiment. Since participants in the HT subsample were randomly assigned to a group in the ASP, they have a random set of peers. I can restrict the sample to these groups and estimate the effect of the mean and variance of a student's peers' baseline IVV. An alternative specification that can help to increase the efficiency of the estimation is to control for the mean and variance of baseline IVV at the stratification-block level and to restrict the sample to treated students. Details of these estimation approaches are in Appendix 2. With these estimations, I can directly provide causal evidence of how student i's behaviors and/or academic outcomes are affected by the average or variance in the violence of her peers.
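As an illustration of the peer variables involved, the sketch below computes, for each student in a club, the mean and variance of her clubmates' baseline IVV while excluding her own value. The leave-one-out construction and the use of the population variance are assumptions (the exact definitions are in Appendix 2), and the student identifiers and IVV values are hypothetical.

```python
from statistics import mean, pvariance

def peer_moments(ivv_by_student):
    """Map each student to (mean, variance) of her clubmates' baseline IVV.

    The student's own IVV is excluded (leave-one-out). The club must contain
    at least two students.
    """
    result = {}
    for sid in ivv_by_student:
        peers = [v for other, v in ivv_by_student.items() if other != sid]
        result[sid] = (mean(peers), pvariance(peers))
    return result

# Hypothetical club with three students (not data from the study).
club = {"s1": 0.02, "s2": 0.04, "s3": 0.06}
moments = peer_moments(club)
for sid, (peer_mean, peer_var) in moments.items():
    print(sid, peer_mean, peer_var)
```

Excluding the student's own IVV keeps her peer measures from being mechanically correlated with her own baseline propensity for violence.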

GC Heterogeneity by Initial IVV
By design, the HM group comprises two different subgroups: HM-High and HM-Low. This allows me to also analyze differential effects of group composition for children assigned to the lower and upper sections of the IVV distribution. The assignment variable to those subgroups was the median of the IVV distribution at each HM-stratum level. Therefore, after controlling for the indicator $IVV^{high}_{ij}$ and for the IVV median at the $j$ level, $\overline{IVV}_j$, I can directly compare the results of each HM subgroup with the respective HT treatment, estimating the following specification:

$$y_{ij} = \theta_0 + \theta_1 HomH_{ij} + \theta_2 HomL_{ij} + \theta_3 IVV^{high}_{ij} + \theta_4 \overline{IVV}_j + \theta_5 X_{ij} + S_j + \epsilon_{ij} \qquad (3)$$

where $HomH_{ij}$ and $HomL_{ij}$ are dummies indicating whether student $i$ in stratum $j$ was assigned to HM-High or HM-Low, respectively, with the rest of the variables defined as before.
Specification (3) allows me to compare both treatments within each half of the IVV distribution, which is equivalent to including in specification (2) an interaction between $HM_{ij}$ and $IVV^{high}_{ij}$.
In the upper half, $\theta_1$ is an ITT estimator of the effect of assigning a child $i$ with a higher propensity for violence to a group of peers with low violence diversity, compared to allocating her to a group with high violence diversity. Likewise, for the lower half of the IVV distribution, $\theta_2$ is an ITT estimator of the effect of assigning a less violent child to a group of peers with low violence diversity, compared to a heterogeneously violent group.
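The equivalence between the subgroup dummies and an interaction term can be verified with a short identity. It is a sketch that assumes the subgroup dummies are products of the homogeneous-treatment dummy (written here as $HM_{ij}$) and the above-median indicator, that is, $HomH_{ij} = HM_{ij} \times IVV^{high}_{ij}$ and $HomL_{ij} = HM_{ij} \times (1 - IVV^{high}_{ij})$, which follows from the median split but is not written out in the text:

$$
\begin{aligned}
\theta_1 HomH_{ij} + \theta_2 HomL_{ij}
&= \theta_1 HM_{ij}\, IVV^{high}_{ij} + \theta_2 HM_{ij}\,(1 - IVV^{high}_{ij}) \\
&= \theta_2 HM_{ij} + (\theta_1 - \theta_2)\, HM_{ij} \times IVV^{high}_{ij}
\end{aligned}
$$

Under this assumption, the coefficient on the main effect recovers the effect for the lower half ($\theta_2$), and the coefficient on the interaction recovers the difference between the two halves ($\theta_1 - \theta_2$).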

Effects of Tracking on the Marginal Participant
The results of the previous equations identify the average effects of being treated in a particular group composition. Moreover, with this experimental design I can explore the effect of exposure to peer violence on children around the median in a tracking setting. I call them the marginal participants. This group includes the students just above or below the fiftieth percentile of the IVV distribution. Given that these just-above-the-median children have a similar propensity for violence to those at or below the median, I exploit their assignment to a group of high-IVV peers and compare them with others in a low-IVV set.
It is interesting to study effects on the marginal participant because having highly violent peers (on average) means she is the least violent child in her group before the intervention; having less violent peers implies that she is the most violent child in her track. In this sense, the marginal participants are the most different children within their group. Therefore, they may experience greater tracking impact.
To identify this impact, HM groups provide a natural setup for an RD design, with the median of the IVV distribution in each stratum as the discontinuity. The validity of this strategy rests on the assumption that nothing else changed discontinuously around the point of separation between the two groups, which holds true in this design. I estimate the following equation:

$$y_{ij} = \lambda_0 + \lambda_1 HomH_{ij} + f(IVV_{ij}) + \epsilon_{ij} \qquad (4)$$

where $f(IVV_{ij})$ is a flexible third-order polynomial of an individual's IVV percentile within each stratum, and $HomH_{ij} = 1$ if the participant was in the HM-High group. In this case, $\lambda_1$ is a LATE estimator that indicates the effects of tracking for the marginal participant on her cognitive and noncognitive outcomes. I also estimate this specification while restricting the sample to the eight students around the cutoff within each stratum.
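The sample restriction to the students nearest the cutoff can be sketched as a simple selection step. The code assumes that "eight students around the cutoff" means the four closest on each side of the median within a stratum, which is an interpretation rather than something stated in the text; the function name, identifiers, and percentiles are hypothetical.

```python
def around_cutoff(students, per_side=4, cutoff=50):
    """Keep the `per_side` students closest to the cutoff on each side.

    `students` is a list of (student_id, ivv_percentile) pairs for ONE stratum,
    with percentiles on a 0-100 scale. Students at or below the cutoff are
    treated as the low side.
    """
    below = sorted((s for s in students if s[1] <= cutoff),
                   key=lambda s: cutoff - s[1])
    above = sorted((s for s in students if s[1] > cutoff),
                   key=lambda s: s[1] - cutoff)
    return below[:per_side] + above[:per_side]

# Hypothetical stratum with ten students at percentiles 5, 15, ..., 95.
students = [(f"s{p}", p) for p in range(5, 100, 10)]
kept = around_cutoff(students)
print(sorted(p for _, p in kept))
```

Applying the function stratum by stratum and pooling the results gives the restricted sample on which the discontinuity specification would then be re-estimated.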

Results
In this section, I present reduced-form estimates of group composition and tracking effects on main outcomes of interest in the context of an ASP. I draw three main conclusions. First, mixing students by their initial propensity for violence yields better average effects on some behavioral and stress-related outcomes than segregating them. Second, tracking has detrimental effects compared to integration, especially for students with a greater propensity for violence. Finally, being the marginal participant (that is, the least violent child within a highly violent group) negatively affects behaviors but does not necessarily harm academic performance.

GC average effect
First, from the comparison between each treatment arm and the C group, estimations indicate that any group composition can be more effective than no treatment. For example, in panel A, I find that any group composition improves behavioral outcomes in relation to not participating in the ASP. Moreover, the estimated impacts are important in terms of magnitude. For example, the increase in time to do homework was up to 17% and the reduction in absenteeism was almost 25%. In addition, teacher reports of student misbehaviors indicate a reduction of 0.14-0.17 standard deviations (intensive margin) and 6-11 percentage points in the probability of having a bad behavior report (extensive margin).
In terms of emotional regulation (panel B), estimations indicate that most of the statistical differences come from the comparison between the HM and C groups. I estimate a reduction in the overreaction to stimuli (valence), particularly to positive ones (positive valence difference), and a move toward internal locus of control, which is a belief that their own actions determine the rewards they receive. The HT and C groups are also different in the overreaction to positive stimuli. The rest of the estimated coefficients are not statistically different from zero.
Finally, academic performance can be improved by any type of group composition, compared to no treatment. As panel C shows, the HT composition of peers can increase the probability of a passing grade by almost 4 percentage points (column 1), while being assigned to similar peers can also reduce the probability of failing at least one grade by almost 3 percentage points, both compared to the C group (column 2).
Does one particular group composition improve the ASP's effectiveness more than another? Overall, the estimated results indicate that the HT composition of peers has a better impact on most of the outcomes than tracking by violence. As presented in panel A, students assigned to the HM groups show a reduction of 0.16 standard deviations, on average, in positive attitudes toward school, compared to students assigned to heterogeneous groups. They also increase their probability of having a bad behavior report at school by 5.5 percentage points.
As shown in panel B, the only statistical difference between the HM and HT group compositions is in participants' stress. I find that when students are treated in the HM groups, their stress levels are 0.28 standard deviations greater than those of students treated in the HT group. Finally, I do not find statistical differences between both treatments in the rest of the academic outcomes.
The positive impacts of a more diverse composition of peers are consistent with the evidence that interactions with diverse peers can generate differences in the learning experience (Lafortune et al., 2016). Moreover, the rainbow peer-effects model (Hoxby and Weingarth, 2005) can also explain these results. This model suggests students do best when they have a diverse group of classmates. Additionally, the results on misbehavior at school are suggestive evidence that treating students in violence-diverse groups reduces the probability of creating networks of violent children (Billings et al., 2016).
Appendix Table A8 presents estimated coefficients from specification (2)

GC Heterogeneity by Initial IVV
A still-open question is which children within each treatment arm benefit from the composition of their peers. An argument in favor of segregation posits that when children are mixed, the least violent can be "contaminated" by the most violent ones. However, segregation limits the potential for highly violent children to learn good behaviors from nonviolent peers. Since the design includes two different subgroups regarding violence in the HM group, I can further explore differences in group composition by comparing each HM subgroup with the HT group, using specification (3). These results are in Table 4. Column 1 presents estimated differences between low-violence children assigned to the HM-L group and similar students assigned to the HT group.
Column 2 shows the same differences but between highly violent students who were randomly assigned to either the HM or HT groups.
First, perhaps surprisingly, I find that the HM-Low set is driving the negative effect of group composition on attitudes toward school and learning. Compared with the HT group, students in the HM-Low set experience a reduction in their positive attitudes of 0.22 standard deviations (panel A, column 1). This unexpected result is related to the invidious comparison peer-effects model of Hoxby and Weingarth (2005), which (in this context) implies that exposure to only less violent (or well-behaved) students depresses the average performance of the group. An alternative explanation is that students in heterogeneous groups have the opportunity to be exposed both to good behaviors they should follow and negative ones they should not emulate. These interactions are only weakly available for students in the homogeneous group. The second relevant result in this subsection is that the probability of having bad behavior reports is greater by more than 0.09 percentage points for highly violent students when they are segregated, as shown in panel A, column 2.
Moreover, as is evident in panel B, the increase in stress is greater for children treated in the HM groups when compared to respectively similar children treated in the HT group, independent of their propensity for violence at baseline. Finally, the effects of group composition on academic performance are not clear. While tracking by violence seems to improve academic scores by 0.13 standard deviations (panel C, column 1) for the less violent children in the HM groups compared to the HT group, it also worsens the performance (by almost 0.9 standard deviations) of the highly violent children assigned to tracked groups, compared to those assigned to integrated ones.
In sum, implementing this ASP using a tracking-by-violence design instead of an integration approach can negatively impact behaviors and emotional regulation outcomes of all high- and low-violence participants; it can also worsen the academic performance of the most vulnerable ones.

The only instance where segregation seems to be better than integration is for students who are less susceptible to violence on academic outcomes. As I argue in the discussion, this last result can stem mainly from the content of the club curricula. According to the ASP structure, it may be that more time was employed for the club curricula in less violent HM groups, and therefore the reinforcement of "academic" content was greater here.
These results may have different interpretations. First, diversity regarding violence can be more beneficial because it allows highly violent students to be exposed to less violent children and to learn social skills and good behaviors from them. Similarly, low-violence children benefit from being exposed to "misbehaviors" they should avoid. However, students in a homogeneous group are losing the opportunity to learn from behaviors of the other tail of the violence distribution function.
An alternative explanation is that diversity is the social norm in the environments where these children usually perform. Thus, assigning them to similar peers may stress them more. In this sense, enrolling and treating only highly violent students together in these programs can produce an unintended effect. This result suggests that teaching solely socioemotional skills may not be enough to reduce misbehavior or violence by highly violent students, who would also benefit from interacting with-and emulating-low-violence students.
Finally, since participants were randomly allocated to a group in the ASP, some variation in the group composition stems from the fact that assignment to the HM vs the HT group directly affects the mean and variance of one's peers. Following Lafortune et al. (2016) and Duflo et al. (2011), the identification assumption is that after controlling for strata fixed effects, the variance and mean IVV of peers arises entirely from the random assignment. Details of the estimation and summary of results are in Appendix 2. These results reinforce the previous findings using direct variation of the experiment. First, higher average clubmates' IVV increases absenteeism and reduces time to do homework. Moreover, exposure to a more violence-diverse group of clubmates reduces absenteeism and the probability of having a misbehavior report, and it improves positive attitudes toward school and time employed to do homework.

Effects of tracking on the marginal participant
An additional piece of evidence from this experiment is the effect of tracking on students in the middle of the distribution. To directly measure these effects, I can compare the two homogeneous subgroups using specification (4). This equation allows me to identify the effect of being assigned to a group of homogeneous peers with a higher propensity for violence.
The estimations of tracking effects on a marginal participant's behavioral and academic outcomes are summarized in Table 5. Column 1 presents estimated coefficients using all students assigned to the HM treatment. Following Duflo et al. (2011), I also run specification (4) restricting the sample to the eight students around the IVV median within each stratum. Results are reported in column 3. As I discuss below, this sample restriction allows me to focus on the students who were most similar before the intervention. The downside is that it increases the standard errors of the estimates, reducing statistical significance.
First, I control with a flexible second-order polynomial of a student's percentile in the IVV distribution within the homogeneous group at each stratum. As shown in panel A, I find that assigning a marginal participant to a group of peers with higher propensity for violence reduces her self-report of attitudes toward school and learning by 0.64 standard deviations, and her time to do homework by almost 1.6 hours. I do not find an effect on other behavioral outcomes due to the increase in standard errors.
On neurophysiological outcomes, estimations presented in panel B indicate that assigning a marginal participant to a group of peers with higher propensity for violence reduces her tendency for further reflection on responses, measured through the CRT, by 0.89 standard deviations. This finding is relative to peers who had a similar propensity for violence but who were enrolled with lower-violence peers. I do not find an effect on the rest of the outcomes. Effects of tracking on academic outcomes for marginal participants are also negative. As panel C illustrates, assignment to a high-violence group increases the probability of failing at least one course by 0.048 points. As before, there is an increase in standard errors, and some coefficients are not statistically significant.
In summary, being the least violent member of a highly violent group negatively affects behavioral and emotional regulation-related outcomes and academic performance. This is consistent with the existing evidence of endogenous formation of groups of badly behaved students when they are segregated. They seem to engage as group members and follow the group social norms of violence and negative attitudes, which indirectly impacts their academic performance.

Discussion
In this section, I summarize the main group composition impacts and discuss the results of this paper further to evaluate why the integration of students can have a greater impact on most behavioral and neurophysiological outcomes. I also discuss why tracking can positively impact the academic performance of low-violence children in this particular context.

Better together: GC effects
Using the direct source of variation yielded by this experimental design, I find evidence that an average student is better off in a more diverse ASP group than in a segregated one. Specifically, mixing is better for noncognitive outcomes regardless of the student's initial violence level. However, regarding academic grades, mixing is still better for the high-violence group, but segregation generates greater effects for the less violent children.
These results are consistent with a body of microlevel evidence, such as papers on random assignment of freshmen or students (Thiemann, 2013); on elite exam schools (Abdulkadiroglu et al., 2014; Dobbie and Fryer Jr, 2014; Lucas and Mbiti, 2014); and programs for gifted individuals (Bui et al., 2014). Additional evidence on academic and labor contexts is presented by Hoxby [...]
As I briefly noted before, my results relate mostly to those of Rao (2019), who provides the first evidence of how changes in peer composition at school can shape a student's social preferences through an improvement in her generosity, prosocial behavior, and equity. My paper contributes to these results by providing additional experimental evidence that is especially relevant for the developing world. I test how exposure to diversity regarding violence positively impacts additional noncognitive outcomes such as violence, approval of peers' antisocial behavior, misbehavior, and attitudes toward school and learning. An additional outstanding characteristic of Rao (2019) is that the author uses well-constructed measures of social preferences. In my study, I collected measures of noncognitive outcomes from students' self-reports and administrative data provided by schools. These two sources of information enable me to contrast and validate the results.
Additional evidence from my experimental design regards the tracking effects for marginal participants. For example, an individual at the median in the violence distribution who is assigned to a high-violence group can be "contaminated" by her peers and increase her violence level. Or, according to the invidious comparison model, she can become less violent because she does not want to be like her fellow group members (Hoxby and Weingarth, 2005). By restricting the analysis to the homogeneous group, I find that students with the same level of violence at baseline seem to be "contaminated" by the predominant level of violence of the group to which they were assigned.
In contrast to some theoretical and empirical pro-tracking papers (Lazear, 2001; Duflo et al., 2011; Cortes and Goodman, 2014; Girard et al., 2015), my results indicate that tracking can have unintended effects on academic and noncognitive outcomes when it targets only the most violent students. This result reinforces the main conclusion of the benefits of diversity regarding violence, since it allows highly violent students to be exposed to less violent children and to learn social skills and good behaviors from them.

Why does integration generate better results?
In this subsection, I provide suggestive evidence to understand how these group composition impacts on average and marginal participants may operate. I start by exploring peer effects in social skills learning. Students in heterogeneous groups benefit from being exposed to both "good behaviors" they should follow and "misbehaviors" they should avoid, as predicted by the rainbow peer-effects model (Hoxby and Weingarth, 2005). However, students in a homogeneous group are losing the opportunity to learn from behaviors of the other tail of the violence distribution function.
A second channel that could explain the results is that diversity is the social norm in the settings (particularly public schools) where students usually perform, making them more comfortable due to familiarity. In this sense, one might expect students in heterogeneous groups to have attended more sessions than those in homogeneous groups. I test for differences in attendance at the ASP between each HM group and the HT group and present the results in Appendix Table A9. I find a small reduction in club attendance by both HM groups, but it is not statistically significant because of an increase in the standard errors. Despite this lack of statistical significance, this result can shed light on preferences for diversity.
Further evidence to support the preference-for-diversity mechanism is presented in our work (Dinarte and Egana (2019)). For the same sample, we use data from spillovers and find different effects regarding proximity to misbehavior between nonenrolled and treated students. The results are higher for students whose bad behavior at school is between 1 and 2 standard deviations from the average misconduct of treated students from their classroom. Notably, the effects of this intermediate proximity are more significant on bad behavior reports.
The last mechanism that may underlie the group composition results is that tracking can strengthen the possibility of creating violence networks, which has been previously analyzed in the literature (Billings et al., 2016; Bayer et al., 2009). The implementation of interventions for groups comprising only highly or less violent students can generate unintended effects on both groups, particularly for the most violent children. These results also match those of Pekkarinen et al. (2009), who find benefits from ending school tracking on the performance of students from lower-ability backgrounds in Finland.
Such a limited effect of tracking is an unexpected finding, especially because the scope for targeting the learning experience of schoolchildren is large. In this particular context, qualitative evidence collected from focus groups with instructors indicates that instructors did not adjust the content of the club to the specific composition of the groups. Only 42% of instructors indicated that they employed different methods or activities in the clubs. Most of those changes were not related to particular needs of the participants, but to lack of materials, adequacy of spaces to implement the activities, reorganization of the games, and other factors. This may indicate that it is relevant to separate the responses of instructors from the direct effects of interaction between peers when evaluating tracking in other contexts. In addition, the differences between the results of this paper and those of Duflo et al. (2011) also suggest that club instructors, compared to school teachers, lacked incentives to target material to the particular needs of the students.

Explaining the puzzling outcomes for less violent children
It is puzzling that the effects on academic outcomes for low-violence students are greater under tracking even when mixing improves their attitudes toward school and learning. One explanation is that the time dedicated to each part of the session was conditional on the group composition.
For instance, tutors in Low-HM clubs may have had to use less time on social skills training than on the particular club's curriculum, compared to the High-HM or HT groups. Thus, it may be expected that Low-HM clubs with academic curricula are driving the improved academic results compared to the HT clubs. I test this channel by including in specification (3) an interaction between each HM treatment and a dummy for academic clubs on academic outcomes. I find that in the comparison of the Low-HM and HT groups, the effects on academic outcomes are driven by students enrolled in clubs that focus on academic topics.

Conclusions
This paper provides experimental evidence of group composition effects of an ASP on participants' academic outcomes, behavior, violence, and neurophysiological outcomes. The intervention occurred in schools located in highly violent communities in a developing country, El Salvador.
By exploiting the direct variation from the experimental design, I find that in terms of behavioral and stress-related outcomes, tracking generates adverse attitudes toward school and learning, particularly for low-violence students, and increases the probability of bad behavior reports, especially for ex ante highly violent students. It also increases the stress levels of all participants, independent of their initial propensity for violence. These results are confirmed using the exogenous variation in the peer composition. Additionally, being the marginal participant (that is, the least violent within a highly violent group) negatively affects behaviors and harms academic performance.
These results have implications for public policy discussions regarding interventions oriented to improve academic outcomes and reduce violence within schools. This paper takes a first step in understanding the relevance of group composition in an ASP. It shows that within this context, peer effects are an important mechanism that can improve relevant outcomes, directing attention to the implementation of these interventions in heterogeneous groups.
Will these results persist over time? Glasswing International's donors, in exchange for financially supporting this impact evaluation, required the research team to allow students in the control group to participate in the intervention the following year. This will make it difficult to measure the ASP's long-term effect. Nonetheless, this experimental design can potentially serve in other contexts where the implementing practitioner or policy maker would like to evaluate the usefulness of targeting while maintaining coverage of all initial beneficiaries.
Finally, in the literature on interventions aimed at reducing crime and violence, one important aspect of these programs concerns the development of new, healthier social ties that foster a sense of belonging for participants and positively influence their identity (Heller et al., 2017). There is still a lack of evidence on whether this intervention can be improved if students participate in the program within their closer network, exploiting their preferences for similar peers.

Appendix
Appendix 1. Description of Outcome Variables.
In our follow-up survey, we have multiple variables that measure some behavioral outcomes, such as attitudes, delinquency, and violent behavior. In order to have a single continuous measure that can be compared to previous evidence in the literature, we have built for some of them a standardized index that is an average of the multiple variables measured in the survey.
In the following section, we provide details of each outcome variable. For outcomes constructed as indexes, we list the main items included.
A. Behavior and Academic Outcomes
and Regular (R). It can be translated into a continuous scale that is comparable to course grades. In this paper, we used a reversed continuous scale to facilitate the interpretation and comparability to the self-reported measures of violence and crime.

1. Arousal: pre-test resting measure of an individual's stress, estimated directly from her brain activity using EEG recordings taken while children watched a black cross in the center of a gray screen for 30 seconds.

2. Valence: pre-test resting-state measure estimated directly from participants' brain activity using EEG recordings. As with the arousal measure, these recordings were taken while children watched a black cross in the center of a gray screen for 30 seconds.
This variable can be interpreted as a positive or negative mood, as well as an attitude of either approach toward or withdrawal from a stimulus (Harmon-Jones et al., 2010; Kassam et al., 2013).
3. Positive Valence Difference: the difference between the response intensity measured after exposure to positive stimuli and the resting-state valence index described above.
4. Negative Valence Difference: the variation in the valence index recorded when the stimulus was negative, net of the individual's baseline resting-state valence index.
Both differences can be interpreted as a lower level of overreaction by participants (they become more phlegmatic or cool-headed) or as a move toward more withdrawn behavior or attitudes.
5. Locus of control: psychometric test developed by Rotter (1966). People with an internal locus of control believe that their own actions determine the rewards that they obtain, while those with an external locus of control believe that their own behavior does not matter much and that rewards in life are generally outside of their control. Scores range from 0 to 13. A low score indicates internal control, while a high score indicates external control.
In other words, a reduction in the score indicates a move toward internal control.
6. Cognitive Reflection Test (CRT): a test designed to measure whether an individual tends to automatically choose an initially incorrect response or instead engages in deeper reasoning to find the correct answer. A higher CRT score indicates deeper reasoning or less automatic responses.
C. Academic Outcomes
1. Reading, math, and science scores: variables that indicate performance in each course on a 0-10 scale, where 0 is the worst performance and 10 is the best. We standardized these values using the control groups at the school-grade level. The academic score is the average of the three courses.
2. Probability of passing course: a dummy variable that takes the value 1 if the student has been promoted to the following course and 0 otherwise.
3. Failing at least one course: a dummy variable that takes the value 1 if the student failed one or more courses and 0 otherwise.
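The standardization of course scores described in item C.1 can be illustrated with a minimal sketch. It assumes that "standardized" means subtracting the control-group mean and dividing by the control-group standard deviation within each school-grade cell. The field names (`school`, `grade`, `control`) and the use of the population standard deviation are hypothetical choices for illustration, not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev

COURSES = ("reading", "math", "science")  # the three courses named in item C.1


def standardize_scores(students):
    """Return standardized course scores and their average for each student.

    students: list of dicts with hypothetical keys 'school', 'grade',
    'control' (bool), and one 0-10 score per course in COURSES.
    """
    # Collect control-group scores for every (school, grade, course) cell.
    control_scores = defaultdict(list)
    for s in students:
        if s["control"]:
            for course in COURSES:
                control_scores[(s["school"], s["grade"], course)].append(s[course])

    standardized = []
    for s in students:
        z = {}
        for course in COURSES:
            ref = control_scores[(s["school"], s["grade"], course)]
            # Assumption: z-score relative to the control group in the same cell.
            z[course] = (s[course] - mean(ref)) / pstdev(ref)
        # Academic score: average of the three standardized course scores.
        z["academic"] = mean(z[c] for c in COURSES)
        standardized.append(z)
    return standardized
```

The sketch presumes every school-grade cell contains at least two control students with non-identical scores; otherwise the division is undefined and the cell would need separate handling.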

Appendix 2. Exploiting the random allocation of peers
In addition to the main group composition effects obtained from the direct variation generated by this experiment, I also exploit additional variation in peer quality generated by this tracking-by-violence design. Since participants in the HT subsample were randomly assigned to a group in the ASP, they have a random set of peers. I can restrict the sample to these groups and estimate by OLS the effect of the mean and variance of a student's peers' baseline IVV. In the estimating equation, $\bar{x}_{-ij}$ and $var(x_{-ij})$ are the mean and variance of the IVV in the club to which student $i$ was assigned, excluding her own IVV, which allows me to address the reflection problem. In addition, $\bar{X}_{-ij}$ and $Var(x_{-ij})$ are the IVV mean and variance of all selected students, also excluding child $i$'s IVV. The vector of control variables $X_{ij}$ includes the participant's own baseline IVV. The remaining variables are defined as before. The estimated coefficients of interest are $\gamma_1$ and $\gamma_3$. They reflect the causal effect of the mean and variance of the violence of student $i$'s peers on her own results.
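The leave-one-out peer measures used in this specification can be sketched as follows. The function name, the data layout, and the choice of the population variance are illustrative assumptions. The only element taken from the text is that each student's own IVV is excluded before the club mean and variance are computed.

```python
from statistics import mean, pvariance


def leave_one_out_stats(ivv_by_club):
    """Map (club, student position) to the mean and variance of clubmates' IVV.

    ivv_by_club: dict from a club label to the list of baseline IVV scores
    of the students assigned to that club (hypothetical layout).
    """
    result = {}
    for club, scores in ivv_by_club.items():
        for i in range(len(scores)):
            # Exclude student i's own IVV so the peer measure is not
            # mechanically related to her own score.
            peers = scores[:i] + scores[i + 1:]
            result[(club, i)] = (mean(peers), pvariance(peers))
    return result
```

For a club with scores 0.2, 0.4, and 0.6, the first student's peers are the other two, so her peer mean is 0.5. Each club needs at least two members for the peer set to be non-empty.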
In addition, since participants were randomly allocated to a group in the ASP, there is some variation in group composition that stems from the fact that being assigned to HM versus HT directly affects the mean and variance of one's peers. As in Lafortune et al. (2016), after controlling for strata fixed effects, the variation in the mean and variance of peers' IVV stems entirely from the random assignment. Similar approaches have been used by Carrell et al. (2013), Duflo et al. (2011), Boozer and Cacciola (2001), and Lyle (2007). The estimating equation for the sample of students selected to participate in the ASP uses the variables defined in the previous specification. With this equation I can directly provide evidence of how student $i$'s non-cognitive and academic outcomes are affected by the mean and variance of her peers' baseline violence.
Results from both specifications are presented in Table A10 in the appendix tables section. Panel A exhibits effects on behaviors and attitudes toward school and learning. Panel B presents estimated effects of each group composition on emotional regulation, stress, and psychometric outcomes. Finally, Panel C shows the estimated effects of both specifications on academic performance. Columns 1-3 present the results using model 1, and columns 4-6 show similar estimations obtained from model 2. Using these alternative estimation approaches, I obtain results similar to those obtained using the direct variation in group composition generated by the experiment. Panel A shows that a higher average clubmates' IVV reduces self-reported time spent doing homework, whereas being in a more diverse group increases self-reported time spent doing homework and reduces absenteeism. In terms of violence, I do not find an effect of either the mean or the variance of clubmates' IVV. Despite the lack of power for neurophysiological outcomes and psychometric tests, I find that greater diversity reduces participants' stress levels. Finally, the effects on academic performance indicate that a greater level of peers' violence can increase academic performance, and that greater diversity can improve the extensive margin of academic grades.
Table 1 shows descriptive statistics of the available variables at baseline. Panel A summarizes information obtained from the enrollment form that was used as determinants in the IVV estimation. Panel B presents administrative data provided by schools for students who had consented. These academic data are from the first quarter of school year 2016, before the clubs were implemented. The scale of grades in El Salvador is 0-10 points. In columns (1), (2), and (3), I present control (C), heterogeneous (HT), and homogeneous (HM) groups' average characteristics, respectively.
The last two columns show the average for the two homogeneous subtreatments (HM-H and HM-L). Unadjusted p-values are presented in Table A7 in the Appendix. There are statistical differences in two categories of household composition and in average absenteeism between HM and C, in the average course and absenteeism between HT and C, and in the IVV score between HT and HM. Most of the differences between the HM-H and HM-L groups are statistically significant, which is desired in this particular experimental design.
(2015) survey, $V_f$ as a violence dummy indicating that a child or adolescent has committed at least one of the following actions: Have you ever (i) brought a gun, (ii) attacked someone with the intention to hurt him, (iii) attacked someone with a gun, or (iv) used a gun or a violent attitude to get money or things from someone? $D_f$ is a vector of violence determinants, including gender, age, mother's education, etc. According to this experimental design, I expect the average IVV of the HT group to lie between the HM-H and HM-L groups' averages. Moreover, the IVV variation in the HT group should be greater than in either homogeneous group.
This table presents the effects of tracking and integration on the outcomes of interest, which were estimated using specification (2). Panel A exhibits effects on behaviors and attitudes toward school and learning. Panel B presents results on emotional regulation outcomes and psychometric tests. Panel C shows estimated effects of each group composition on academic performance. Description of outcome variables is available in Appendix 1. Outcomes come from data collected during the follow-up, at the end of school year. In estimations for academic outcomes, absenteeism, and bad behavior reports, I also include the corresponding imputed outcome at the baseline and a dummy indicating a missing value at baseline. All regressions are estimated using models of specification (2).
Sample size in each specification varies according to the amount of data available for each output. ***, **, *, + indicates that the effect of being treated in a specific group composition (HT or HM) compared to the C group is significant at 1%, 5%, 10%, and 10%, respectively. Bootstrapped standard errors are in parentheses. Table A8 in the Appendix section presents the same models but with robust standard errors. As I discuss, almost all results remain statistically significant at the conventional levels.
***, **, * indicates that the effect of being treated in an HM (high or low) group compared to being treated in an HT group is significant at 1%, 5%, and 10%, respectively. Bootstrapped standard errors are in parentheses. Panel A exhibits effects on behaviors and attitudes toward school and learning. Panel B presents results on emotional regulation outcomes and psychometric tests. Panel C shows estimated effects on academic performance. Outcomes come from data collected during the follow-up, at the end of school year. All regressions are estimated using the models of specification (3). In estimations for academic outcomes, absenteeism, and bad behavior reports, I also include the corresponding imputed outcome at the baseline and a dummy indicating a missing value at baseline. Sample size in each specification varies according to the amount of data available for each output.
***, **, * indicates that the effect of being treated in an HM-H group compared to being treated in an HM-L group is significant at 1%, 5%, and 10%, respectively. Bootstrapped standard errors are in parentheses. Panel A exhibits effects on behaviors and attitudes toward school and learning. Panel B shows effects on emotional regulation outcomes and psychometric tests. Panel C shows the effects on academic outcomes. All regressions include, in addition to the main controls mentioned above, a second-order polynomial in the IVV.
Estimations first use the homogeneous groups subsample (results in columns (1) and (2)) and then the 8 students around the cutoff (results in columns (3) and (4)).
The table provides means and standard deviations of the main variables from this study and FUSADES (2015) samples. These variables were used to estimate the IVV for each student in the study sample. Column (5) shows the p-value of the comparison of means between both samples. ***, **, and * denote differences significant at the 1%, 5%, and 10% level, respectively, when comparing the means.
I estimated the correlation between the IVV prediction and academic grades and misbehavior reports before the intervention, using administrative data. The estimated specification was the following: $y_{ij} = \alpha_0 + \alpha_1 IVV_{ij} + \epsilon_{ij}$, where $y_{ij}$ is the academic grade or misbehavior report for student $i$ in school $j$ and $IVV_{ij}$ is the estimated propensity for violence. ***, **, * indicates that coefficients are significant at 1%, 5%, and 10%, respectively. Robust standard errors are in parentheses.
Results of the correlation between IVV prediction and misbehavior reports one year after the estimation. I used administrative data only for the control group (those who were not directly treated). The estimated specification was the following: $y_{ijt} = \alpha_0 + \alpha_1 IVV_{ijt-1} + \epsilon_{ijt}$, where $y_{ijt}$ is the misbehavior report for student $i$ in school $j$ in period $t$ (one year after) and $IVV_{ijt-1}$ is the estimated propensity for violence one year before. ***, **, * indicates that coefficients are significant at 1%, 5%, and 10%, respectively. Robust standard errors at the course-school level are in parentheses.
The table provides the match rate with administrative data, calculated as the fraction of students present at the baseline survey who could be matched with administrative data from schools. In comparing HM and C, + denotes a difference significant at the 10% level.
A similar notation is used to indicate statistically significant differences between HM-High and HM-Low (*).
In columns (1) and (2), I present p-values of balance tests between (C) and each treatment arm. Column (3) presents similar values for the comparison between treatment arms, and the last column shows the results of the test between the two homogeneous subtreatments (HM-H and HM-L). The variable "similar classification" = 1 if a student would have been classified as a high-violence child using their position in the IVV and misbehavior reports distribution functions, at the stratum-treatment arm (C, T, Het, Hom) level. Tests include strata fixed effects. Robust standard errors at the course-school level are in parentheses.
***, **, *, + indicates that the effect of being treated in a specific group composition (HT or HM) compared to the C group is significant at 1%, 5%, 10%, and 10%, respectively. Robust standard errors are in parentheses. Panel A exhibits effects on behaviors and attitudes toward school and learning. Panel B presents results on emotional regulation outcomes and psychometric tests. Panel C shows estimated effects of each group composition on academic performance. Description of outcome variables is available in Appendix 1. Outcomes come from data collected during the follow-up, at the end of school year. All regressions are estimated using the full sample and models of specification (2). Sample size in each specification varies according to the amount of data available for each output.
***, **, * indicates that the difference in club attendance between the HM (high or low) group and the HT group is significant at 1%, 5%, and 10%, respectively. Bootstrapped standard errors at the course-school level are in parentheses. The two measures of attendance are the number of sessions and the number of days. Regressions are estimated using only the treated group and models of specification (2).
***, **, * indicates that the estimated coefficient is statistically significant at 1%, 5%, and 10%, respectively. Bootstrapped standard errors at the course-school level are in parentheses. Panel A exhibits effects on behaviors and attitudes toward school and learning. Panel B presents results on emotional regulation outcomes and psychometric tests. Panel C shows estimated coefficients on academic outcomes that come from schools' academic data. All outcomes come from data collected during the follow-up, at the end of school year. All regressions are estimated using the sample of students according to specifications (5) and (6). Details are presented in Appendix 2. Sample size in each specification varies according to the amount of data available for each output.
This figure shows the sample composition and randomization procedure applied in this design. From the total of enrolled children in each educational level {1,2,3} in school A, I randomly assigned 25% to C, 25% to the HT treatment arm, and 50% to HM groups. The same procedure was implemented in the remaining schools.
Predicted IVV distribution functions generated by the experimental design for the heterogeneous treatment group and each of the homogeneous subgroups (High and Low IVV) in the whole study sample.
Cumulative distribution function for high- and low-homogeneous treatment groups' predicted propensity for violence. Vertical yellow lines define the limits of overlap between both distribution functions. This overlap in the violence level occurs because assignment was at the stratum level, and the median level was different within each stratum.
Median predicted IVV of student's clubmates as a function of the student's own baseline IVV in homogeneous high and low groups. Consistent with the discontinuous assignment at the median IVV, there is a sharp discontinuity at the fiftieth percentile for the entire subsample.
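The stratified assignment described in the figure notes above (25% to C, 25% to HT, and 50% to HM within each school-by-level stratum, with HM students split at the median IVV) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the study: the function name is hypothetical, the median is taken over the HM-assigned students of the stratum, and students exactly at the median are placed in HM-L.

```python
import random
from statistics import median


def assign_stratum(students, rng):
    """Assign the students of one stratum to C, HT, HM-H, or HM-L.

    students: list of (student_id, ivv) pairs for one school-by-level stratum.
    rng: a random.Random instance, so the draw is reproducible.
    """
    shuffled = students[:]
    rng.shuffle(shuffled)
    n_quarter = len(shuffled) // 4

    arms = {}
    # First quarter of the shuffled list goes to control, second to HT.
    for sid, _ in shuffled[:n_quarter]:
        arms[sid] = "C"
    for sid, _ in shuffled[n_quarter:2 * n_quarter]:
        arms[sid] = "HT"

    # The remaining half goes to HM and is tracked by predicted violence.
    hm = shuffled[2 * n_quarter:]
    cutoff = median(ivv for _, ivv in hm)  # assumption: median among HM students
    for sid, ivv in hm:
        # Assumption: ties at the cutoff are placed in the low group.
        arms[sid] = "HM-H" if ivv > cutoff else "HM-L"
    return arms
```

Because the cutoff is computed separately in each stratum, pooling strata with different medians produces overlapping IVV ranges for the HM-H and HM-L groups, in line with the overlap described in the cumulative distribution figure note.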