Policy Research Working Paper 10483

Can Digital Personalized Learning for Mathematics Remediation Level the Playing Field in Higher Education? Experimental Evidence from Ecuador

Diego F. Angel-Urdinola
Ciro Avitabile
Marjorie Chinen

Education Global Practice
June 2023

Abstract

Many Ecuadorian students entering higher education have cognitive skills gaps in mathematics that undermine their ability to assimilate academic contents. This paper presents the results of a randomized controlled trial assessing the effects on academic outcomes of a Digital Personalized Learning Software for mathematics remediation (the ALEKS software) offered to first-year students entering technical and technological higher education programs in Ecuador amid the COVID-19 pandemic. The possibility to use the software led to a large and marginally significant decline in the probability of repeating a course, as well as a very large positive impact on standardized test scores in math. The analysis finds no impact on the probability of enrolling in the third semester. When disaggregating the impacts, the findings show that the effects on repetition are particularly large for male students, possibly because of higher male enrollment in science, technology, engineering, and mathematics disciplines. When assessing the potential mechanisms, the findings show evidence that the software led to a net increase in hours dedicated to studying mathematics. The results suggest that Digital Personalized Learning Software can be a cost-effective solution for math remediation with potential for large-scale application.

This paper is a product of the Education Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp.
The authors may be contacted at dangelurdinola@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

JEL Classification: I20, I21, I23, I25, I29

Keywords: Mathematics remediation; higher education; teaching at the right level; computer assisted learning; digital personalized learning.

Special thanks to Segundo Agapito Farias and Nelly Fernanda Paredes from SENESCYT, as well as Flavio Jacome and Silvia Nuñez from McGraw Hill, for their significant contributions. We also acknowledge the contributions of Catalina Castillo and Lorena Moreno during the intervention design and implementation. Juan Baron provided valuable comments on an early version of the paper. This study was registered in the AEA RCT Registry on February 28th, 2022, with the RCT ID AEARCTR-0009036.

1. Introduction

Many students who graduate from high school are academically unprepared for college (Bettinger & Long, 2005).
The underlying problem is that the quality of secondary education does not always ensure that students have the core cognitive skills in reading and mathematics necessary to assimilate university-level academic content. To address gaps in academic readiness in mathematics, universities globally implement remedial programs (Bettinger & Long, 2005). In Latin America, due to institutional and budgetary constraints, remedial programs are scarce, do not follow clear quality standards, and remain largely unassessed (Ferreyra et al., 2017). Moreover, remedial programs often rely on tutors, making it challenging to customize them to the student’s needs and expensive to implement at scale. The COVID-19 pandemic exacerbated the need for remedial programs globally as school closures contributed to learning losses in core foundational skills, especially among students from socio-economically vulnerable households (World Bank, 2022a; Alban Conto et al., 2021). In higher education, the literature finds that first-year students who attend in-person remedial instruction in mathematics are highly likely to continue into their second study year (Calcagno & Long, 2009). A recent assessment of the effects of counseling and mathematics remedial courses on the academic achievement of higher education students in Chile shows that students who participated in these programs had better academic results than those with similar characteristics who did not take part (Venegas-Muggli et al., 2019). Nonetheless, implementing remedial education that satisfies minimum quality standards relies heavily on tutoring, is costly, and requires high levels of institutional capacity (Saxon & Boylan, 2001).
As a result, in Latin America remedial programs are scarce (Ferreyra et al., 2017) and some universities are opting to redesign/simplify their requirements and mathematics curricula, while others adjust the pedagogy of math-intensive courses using project-based learning and encouraging students to work in groups (Epper & Baker, 2009).

An alternative to providing in-person remediation in mathematics is to use Digital Personalized Learning (DPL), which can individualize students’ skills development process and offers the possibility for cost-effective deployment at scale. Essentially, DPL uses Artificial Intelligence (AI) and machine learning to provide students with adaptive instruction tailored to their competency levels, commonly known as "Teaching at the Right Level" (TARL).1 The basic principle of TARL is to adapt instruction to match students' needs based on their prior knowledge (Lalley & Gentile, 2009). This adaptation process helps students enhance knowledge retention and motivation, while providing a stronger foundation for new learning (Foshee et al., 2016). Adaptive Learning is a promising mechanism to improve student skills and their perceptions about those skills, known as perceived self-efficacy, which is often associated with academic performance, especially in mathematics (Ryan & Deci, 2000; Wigfield & Eccles, 2000). DPL offers additional advantages, such as providing students and teachers with different pedagogical strategies and regular data to assess and monitor learning. Many DPL platforms are available through PCs, tablets, and telephones with internet access, which makes them accessible and relevant.

2. Related Literature

Evidence on the impact of DPL is limited. The available literature shows promising results in primary and post-secondary education settings. Moreover, DPL has yielded promising results in developing countries in primary education settings (Banerjee et al., 2007; Muralidharan et al., 2019).
For instance, Muralidharan et al. (2019) present experimental evidence on the impact of a DPL program delivering after-school mathematics instruction at scale to middle schoolers in urban India. The authors report that students who benefited from the program scored 0.36 standard deviations higher (equivalent to 2 to 3 years of traditional instruction) in independent math exams after participating in the program for 4.5 months, with total exposure to the DPL platform of about 4.5 hours per week. Building on this experience, de Barros and Ganimian (2021) provided DPL to 1,528 students in grades 6 to 8 across 15 public schools in India. While the intervention had a positive but statistically insignificant effect on the math achievement of the average student in their sample, their study finds that treatment students with low initial performance outperformed their control counterparts by 0.22 standard deviations.

1 Although there is no consensus about its definition, DPL often includes four major components: (i) a communication interface that presents and receives information; (ii) a domain model that contains the information to teach; (iii) a student model that has students’ learning states (e.g., progress towards mastery, cognitive states, and performance); and (iv) a pedagogical model that represents instructional strategies (Sottilare, 2015). ALS often provide students with performance feedback (e.g., informing students about right or wrong answers, correcting responses, or providing worked examples) and support on steps to solve a problem, such as prompts, hints, and other scaffolds while a student is working on a problem (Vanlehn, 2006).

Foshee et al. (2016) discuss the results of a remedial mathematics intervention that provided DPL to 2,880 students in the U.S. who did not pass a math placement exam required to enroll in a first-year level college mathematics course.
Using a pretest and posttest design, the authors found that remediation using DPL helped 75 percent of students pass the placement exam and had a positive, statistically significant effect on students’ learning and academic competence.

Ma et al. (2014) conducted a meta-analysis to assess the impact of DPL on students’ learning achievement, which includes over 107 different interventions that use intelligent tutoring systems for mathematics remediation, mainly tailored to college students in developed countries. The authors find that DPL remediation is associated with higher student achievement than traditional remediation using tutors in large-group settings and non-adaptive computer-assisted remediation. The authors also find no significant difference in student achievement between learning from DPL and conventional tutoring with small groups. The findings are relevant for college remediation settings due to the high costs of tutors and setting up remedial classes.

A gap in the literature is that most studies assessing the effectiveness of DPL in post-secondary education are available for developed countries. Our study is a pioneer in filling this gap, especially given that the DPL intervention we evaluate rolled out at a large scale in Ecuadorian public technical colleges.

3. Context, Intervention and Study Design

3.1. Technical Colleges in Ecuador

In 2020, the public system of technical and technological colleges (TTC) in Ecuador comprised 90 public TTCs distributed nationwide. Enrollment in public TTCs reached 50,053 students in 2020 (about 8% of total enrollment in higher education). In the first half of 2019, 90 percent of students in TTCs were registered in the presence-based modality, 7.2 percent in dual programs, and 2.4 percent in distance or semi-distance modalities. In 2020, the system hosted 6,958 teachers, of which 56 percent worked full-time.
About 60 percent of teachers in the system have attained an undergraduate degree, 32 percent have a graduate degree, and 6.7 percent have a technical degree. Admission to TTCs is selective and requires a secondary school certificate and a minimum score on an entrance examination. Technical and technological programs offered by TTCs take between 2 and 3 years to complete. Upon completing the program, students are awarded a tertiary-level degree as technicians or technologists. Some professions offer an additional certificate evaluation, which provides graduates with a professional license in their specialization. The public system of TTCs offers 172 careers within 20 knowledge areas (see Table A1 in Appendix A).

Students who enroll in public TTCs come from low and medium-income households, and many cope with work and study simultaneously. Almost half of them come from families with parents who have achieved at most primary education. This population is more likely than the traditional college student to enter the system with academic gaps, especially in core foundational numeracy and literacy skills. Available data from the year 2021 revealed that about 61 percent of all new entrants to the public TTC system display inadequate levels of core competencies necessary for college readiness (such as communications, numeracy, and problem-solving) and were at risk of not being able to complete their post-secondary education successfully (ACET, 2021). Inadequate academic readiness often curtails student academic progression. For instance, in the first semester of 2018, approximately 19.6 percent of first-year students enrolled in public TTCs dropped out after six months, whereas 33 percent dropped out after 12 months (or two academic semesters).

3.2. Technical Higher Education Provision Under COVID-19

The COVID-19 pandemic led to the closure of in-class instruction in technical institutes nationwide and the adoption of remote learning modalities that began in March 2020 and continued for the academic period in 2021. In-person classes were gradually reintroduced to students starting in March 2022. Adopting virtual learning modalities was abrupt. The SENESCYT had to revise the admission requirements for public higher education students. Traditionally, all students who completed high school were required to take the "Ser Bachiller" exam, an assessment designed to evaluate high-school graduates' knowledge in mathematics, language and literature, natural sciences, and social sciences. During COVID-19, the Ministry of Education canceled the exam. Using content items similar to those in the "Ser Bachiller" assessment, the SENESCYT developed a new exam (the EAES, or exam for seeking access to higher education) required for students who wanted to enroll in public higher education, including TTCs.

During the pandemic, the SENESCYT and the Higher Education Council released general guidelines and transitional regulations for higher education institutions to develop academic activities. Under the regulation, institutes had the flexibility to adjust the content of their courses (as much as 25%) and the class schedule to fit their circumstances. Attendance was no longer required to pass a course, and teachers would decide which students would pass or fail based on formative assessments. The SENESCYT also attempted to increase internet capabilities (bandwidth and speed) across all TTCs and established an online tool to allow students and teachers to exchange information and connect to classes. Teachers needed adequate pedagogical support during the transition and had to cope with insufficient technological resources to maintain academic services.
During the second academic period of 2020 and the first academic period of 2021, about 20 percent of the students admitted into TTCs decided to withdraw from their studies (UNESCO, 2022). Many did so because they did not have adequate access to equipment and connectivity for virtual instruction modalities. Technical careers that offered in-class instruction and practical training (such as gastronomy and auto-mechanics) suspended all laboratory and workshop experiences. At the time of the rollout of the DPL program (first academic period of 2020), TTCs imparted all classes online.

3.3. The ALEKS Software and the Application in Ecuador

The ALEKS (Assessment and Learning in Knowledge Spaces) software is one of the most popular DPL software packages for mathematics instruction globally (Fang et al., 2019). ALEKS’s adaptive learning model uses knowledge space theory (KST), a probabilistic model to assess learning paths introduced by Falmagne & Doignon (2011).2 Fang et al. (2019) evaluated the overall effectiveness of ALEKS on student learning through a meta-analysis of available studies, most of which were implemented in post-secondary education settings in developed countries. Results of the meta-analysis revealed that ALEKS-led mathematics instruction improves students’ academic performance as much as traditional human instruction, which makes it a potentially cost-effective solution for student remediation in mathematics.

2 KST develops the concept of "learning items," a collection of examples of a curricular topic included in an academic course. For example, an item for a college remedial course in Algebra could be "Solving a compound linear inequality" or "Solving a word problem with two unknowns using a linear equation." Several hundred items make up a typical academic course, and having the knowledge and skill to complete all the items successfully means (according to KST) mastery of the course. Using AI, KST identifies which subjects the students are “ready to learn” and develops an individualized learning path aiming to ensure mastery of all contents.

All students eligible to access the ALEKS license receive an e-mail with instructions, the software license, and login credentials. Upon logging in, students must complete a brief diagnostic assessment comprising 20-30 problems. This assessment identifies their current level of knowledge and the areas where they can improve. The assessment is adaptive, meaning that the following problem in the assessment depends on the accuracy of the student's answer to previous problems. After the initial assessment, the student receives a color-coded pie chart report where each slice corresponds to an area in the course syllabus (e.g., systems of linear equations) and reflects the level of mastery of the items in that area. Each student also receives a list of topics/items that he or she is ready to learn in each area. Based on this list of items, the student chooses the topics he or she wants to work on, and ALEKS provides a set of related problems. The student learns by solving problems, and each problem includes an 'Explain' button, which presents a detailed explanation with worked examples. As the student covers new topics and develops proficiency in the items, new topics are added to the list that the student is ready to learn. The software conducts periodic assessments (typically 3 or 4 times during a course) to assess the student's knowledge state and revise their learning path. In summary, ALEKS creates a continuum of knowledge states and uses student modeling to decide what course materials to present to learners.
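The idea of a "ready to learn" list can be illustrated with a minimal Python sketch. It assumes a deliberately simplified, deterministic view of knowledge space theory in which every item has an explicit set of prerequisite items, and an item counts as ready to learn when it is not yet mastered but all of its prerequisites are. The item names and the prerequisite map are hypothetical, and the sketch does not reproduce the probabilistic model that ALEKS actually uses.

```python
# Hypothetical prerequisite map: item -> items that must be mastered first.
PREREQUISITES = {
    "solve_linear_equation": set(),
    "solve_linear_inequality": {"solve_linear_equation"},
    "solve_compound_inequality": {"solve_linear_inequality"},
    "word_problem_two_unknowns": {"solve_linear_equation"},
}


def ready_to_learn(mastered, prerequisites=PREREQUISITES):
    """Return the unmastered items whose prerequisites are all mastered."""
    return sorted(
        item
        for item, prereqs in prerequisites.items()
        if item not in mastered and prereqs <= mastered
    )


def mastery_share(mastered, prerequisites=PREREQUISITES):
    """Share of course items already mastered (a stand-in for a pie-chart slice)."""
    return len(mastered & set(prerequisites)) / len(prerequisites)


# A student who has mastered only the first item.
state = {"solve_linear_equation"}
print(ready_to_learn(state))
print(mastery_share(state))
```

As the student masters an item, adding it to `state` and calling `ready_to_learn` again extends the list, which mirrors how new topics become available as proficiency develops.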
As part of the activities of a World Bank-supported Project "Reconversion of Technical and Technological Institutes in Ecuador" (PRETT for its initials in Spanish), the SENESCYT and the World Bank agreed to implement a pilot in 5 technical and technological institutes throughout the country, benefiting more than 800 first-year students. The pilot was rolled out between January and March 2020 by giving all students access to licenses to use the "ALEKS pre-calculus for college readiness" course. The standard course consists of 597 topics. Since students did not need to master all these topics (each technical program requires a different mastery in mathematics), teachers and career directors selected a subset of topics according to the math curricular requirement of their program. During the pilot, students spent about 90 minutes per week at the institute’s computer laboratory working with the software as part of their coursework requirements. The pilot showed promising results. During the initial evaluation (pre-test) that the software conducts, on average, students in the pilot mastered only 20 percent of their course curricula. After using the platform for three months, the knowledge of the course curricula reached 61.2 percent, representing an increase in the curricular learning of between 8 and 10 percent per month. All students who received a license participated in the program and the great majority used it for the recommended time.

Due to the COVID-19 pandemic, TTCs closed in March 2020. As a result, the rollout of the intervention changed to the extent that students could not use the computer laboratories at their institutes and needed to secure their own means to access the software (through a computer, tablet, or smartphone) and the internet. In other words, the only change in the intervention post-pandemic was the technology delivery modality.
As in the pilot, instructors from TTCs reviewed the curricular contents included in the standard "ALEKS pre-calculus for college readiness" course. They selected the items that they considered relevant based on their course's curricular priorities. A series of item "calibration" workshops occurred in December 2020, which led to the customization and configuration of all ALEKS courses in the system and the provision of teacher credentials. A course would typically consist of about 200 items. But, since not all technical programs have the exact same mathematics requirements, the number of items in every course ranged between 80 items in technical careers related to the provision of services (e.g., health and wellbeing) and 207 items in engineering-related technical programs (Table A2 in Appendix A). Teachers participating in the program received training on accessing the software, creating and modifying the course, viewing student dashboards to monitor their performance (use and progress), and using the software data analytics functionalities.3

The rollout of the intervention began in January 2021. During the semester, McGraw Hill provided guidance and support and offered periodic reports on the access and use of the platform. Similarly, a local monitoring firm also prepared intermediate monitoring reports showing statistical data on the platform's performance (e.g., number of active vs. enrolled students, initial proficiency and progress, average hours of use, and percentage of students who meet the recommended minimum weekly use). The monitoring results helped identify institutes with a high share of students who had not used the platform and problems with the take-up of the program. Based on these findings, teachers received additional training the first week of March, which addressed take-up and individual student tracking issues. Figure 1 describes the timeline of implementation and data collection. The program cost was approximately $18 per student, considering various factors such as the number of licenses purchased by SENESCYT, the number of teachers trained, and the expenses associated with monitoring the program during its implementation period.

3 All training sessions were recorded to benefit teachers who could not participate. Additionally, the McGraw Hill team set up an email account teachers could use in case they had questions related to using the platform or required technical support.

4. Randomization and Data

Students were eligible to use an ALEKS license based on a randomized assignment. Of the 91 public TTCs operating in the second semester of the academic year 2020 (2020-II), 71 were identified as offering courses requiring mathematics during the first semester, by comparing the course curriculum to the "ALEKS pre-calculus for college readiness" course. Around 11,400 students enrolled in a course including at least one curricular mathematics-related content covered in the course. Randomization was conducted at the TTC level using a stratified design, with institutes being divided into terciles based on the expected size of the student enrollment in period 2020-II.4 Out of the 71 TTCs, 39 were randomly assigned to receive ALEKS licenses for all their first-semester students, while the remaining 32 TTCs acted as a control group and were scheduled to receive ALEKS licenses for first-semester students enrolled in the first semester of the academic year 2021 (2021-I).

4.1. Main Outcome Variables

Mathematics Achievement

The SENESCYT introduced the Higher Education Access Examination (EAES) in the second semester of 2020 for students wishing to access higher education. The original test covers four content areas: mathematics, language and literature, natural sciences, and social sciences. This study uses only two areas: mathematics and language and literature.
We selected the first content area to assess student knowledge in mathematics, which aligns well with the objectives of ALEKS, and the second one to examine crowding-out effects, or the possibility that the use of ALEKS for mathematics may unintentionally reduce the time students spend learning other subjects like language. Language and Literature were also selected because they cut across the curricula of various technical programs. The mathematics and the language and literature assessments include 19 and 23 items, respectively (Table 1). We compute the outcome variable as the percentage of correct answers in the EAES in the selected subjects. We then standardize it relative to the control group's mean and standard deviation. In the appendix, we present results based on Item Response Theory (IRT) in order to check the robustness of our results.

4 At the time of randomization, final data on enrollment for 2020-II were not available yet.

Enrollment and Repetition

Other key outcomes of interest include enrollment in the third semester and the probability of repeating at least one subject. The first outcome takes a value of one if a student enrolls in the third semester and zero otherwise. Similarly, the second outcome takes a value of one if a student repeats at least one subject (up to that semester) and zero otherwise. Both outcomes originate from available administrative data collected by SENESCYT in the second semester of the 2021 academic year.5

4.2. Baseline Covariates and Balance Results

We test whether pre-treatment characteristics differ for the treatment and control groups. These covariates were obtained from the administrative enrollment dataset gathered by SENESCYT in the second semester of the 2020 academic year.
This dataset collects comprehensive information on each student, ranging from unique identification (e.g., student ID) and basic demographics (e.g., date of birth, gender, ethnicity) to proxies of socioeconomic status (e.g., whether the student studies and works, whether the family receives cash transfers through the “Bono de Desarrollo Humano”, and parental education), along with other academic information such as the admission score obtained during their higher education application process.

5 The variables included in this dataset along with the descriptions and labels are described in the document named “Guía de Registro de Institutos y Conservatorios Superiores Públicos y Particulares Matriculados”. The information is uploaded directly by each institute to the National Information System for Higher Education (SNIESE) and transmitted directly by internet.

Table 2 presents the results, with student-level and institute-level characteristics displayed in the top and bottom panels, respectively. In the control group, students are on average 22 years old and primarily male (60 percent). About 40 percent combine study and work, and only 2.3 percent receive a scholarship. Out of the 13 baseline characteristics, only one is statistically different for treatment and control institutes, namely the admission score obtained in the application process. This variable is only available for one-third of the students, as many institutes did not provide the score. Nevertheless, it deserves careful consideration since the variable correlates with student outcomes and displays a large and significant imbalance (p < 0.01). For this reason, we include it in our baseline specification. The average number of professors per institute is relatively high (40) compared to the number of students (168). Furthermore, the SENESCYT has information on professors’ cognitive and non-cognitive skills (collected using the DESCAES standardized test), a proxy for their human capital.
These variables show no evidence of a significant imbalance between treatment and control schools.6

4.3. Intervention’s Take-up

After randomization, 6,069 students in the 39 TTCs were assigned to receive ALEKS. However, only 84 percent (or 5,077 students) used the license. One possible explanation for the initial drop in the sample is that some students only confirmed their enrollment in the course after their license was issued or failed to follow through with their intention to enroll. Of the 5,077 participants, 97 percent used the platform at least once, and 74 percent used it for 360 minutes or more per month during at least one of the five intervention months. McGraw Hill recommends using the platform for at least 90 minutes per week or 360 minutes per month. Although teachers encouraged using the platform, doing so was not compulsory. Using ALEKS did not affect student grades, which may have decreased students' incentives to use it for the recommended time. Figure 2 shows that the take-up of ALEKS fluctuated from month to month, starting a little above 50 percent in January and achieving its peak in February and March, when 86 percent of the students with licenses used the platform for at least one minute. In April, the percentage of students that used the platform dropped to 70 percent, and in May, which coincides with the end of the semester, the take-up of the platform dropped sharply to 7 percent.

6 The DESCAES assessment is a standardized, online test that can diagnose skills and measure competencies using task-based exercises that confront individuals with real situations.

ALEKS' use was not uniform across TTCs and students and varied depending on their field of study. Figure 3 presents the average number of minutes that the program was used between January and May 2021 by the general knowledge area of the technical course.
Unsurprisingly, students enrolled in programs with a heavier content in mathematics (such as those related to engineering and administration) used ALEKS more, on average, than those enrolled in services and agriculture programs. This result is also associated with the number of items ALEKS included in each course. Since some courses had fewer items (depending on the academic program), students would be able to complete them faster.

5. Empirical Strategy

We estimate the average impact of the eligibility to receive ALEKS, the so-called intention to treat (ITT), among students enrolled in the first semester of higher education using the following model:

$$Y_{is} = \beta_0 + \beta_1 T_s + \delta' S_s + \gamma' X_{is} + \varepsilon_{is} \qquad (1)$$

where $Y_{is}$ denotes the outcome for student i in institute s; $T_s$ is an indicator variable for whether institute s is among those institutes that were randomly assigned to receive the license to use ALEKS for their first-semester students; $S_s$ denotes the stratification dummies that account for differences in expected student enrollment ahead of the institute assignment to the treatment; and $X_{is}$ controls for a set of baseline characteristics for individual i, including age of the student, a dummy for whether age is missing, gender, whether the household receives social assistance (i.e., benefit of the Bono de Desarrollo Humano program), the admission score students obtained during their college application process, and a dummy for whether the score is missing. These are included to improve efficiency and to correct for any baseline imbalances. $\varepsilon_{is}$ is the residual term. Standard errors are clustered at the institute level, representing the treatment unit. The main parameter of interest is $\beta_1$.

We estimate the ITT of ALEKS on three primary outcomes: the math score in an independent cognitive test, the probability of being enrolled in the TTC in the third semester, and, for those who continue their studies, the probability of having failed at least one subject since they first enrolled in a SENESCYT institution. To account for multiple
To account for multiple 12 hypothesis testing, when discussing the main results, we present Romano & Wolf (2005) adjusted p-values. Multiple sources of attrition can affect the interpretation of our results. We will discuss the potential extent and the implications for each of them. In order to better characterize our results, we test how the impacts vary according to different baseline characteristics, allowing for fully interacted models. 6. Results 6.1 Main Impacts We start by assessing the average impact of being eligible to receive an ALEKS license on the outcome variables of interest. Table 3 presents the main results, with odd columns reporting results from the specification that only controls for strata fixed effects. Even columns report results for the baseline specification controlling for the baseline characteristics specified above. Columns 1 and 2 show the result of the cognitive test (EAES, selected topics) that students completed online about a month after the end of the intervention. On average, students in the treatment group scored 0.28 standard deviations (sd) more than the control group, with a statistical significance at a 1 percent level. This result is quantitatively similar to the impact of an online tutoring program implemented in Italy during the COVID-19 school closing (Carlana & La Ferrara, 2021) and relatively close to the average impact of in-person math tutoring for pre-K to 12 students (Nickow et al., 2020). About 30 percent of the students took the online test. While attrition is large, a variety of tests boost confidence in the results. First, the difference in attrition between the treatment and the control group is quantitatively small and not statistically significant (Table A3 in Appendix A). Second, when we conduct a test of selective attrition, the characteristics of online test takers are not statistically different between the treatment and control groups. 
Finally, when we compute Lee (2009) bounds to account for potentially non-random attrition, we find that the treatment effects vary between 0.04sd and 0.41sd (Table A4 in Appendix A). IRT results in Table A5 rule out the possibility that the treatment effects are driven by test features.

Improvements in math might come at the expense of paying less attention to other subjects, as students in the treatment group might be more likely to shift their time and effort toward studying math. Results in Table A6 in Appendix A, which show no significant effect on language scores, rule out this hypothesis.

Only 59 percent of the students in the control group enrolled in the third semester in November 2021. In columns 3 and 4 of Table 3, we report the impact of ALEKS on the probability of enrolling in the third semester on time. The effect is null, irrespective of whether we control for baseline characteristics. Among students who enrolled in the third semester, those who had the opportunity to use ALEKS display a lower probability of having failed at least one course since they first enrolled in the TTC (columns 5 and 6). The effect size is considerable, as it corresponds to 45 percent of the mean in the control group, and is statistically significant at the 10 percent level. The probability of failing any course is only reported for students enrolled in the third semester, thus creating a potential source of selection bias. However, due to the null effect of ALEKS on enrollment, this bias is likely to be the same for students in the treatment and the control group.

The results presented in this section show that for two (out of three) outcomes, the ITT of the ALEKS software was large and statistically significant. Addressing issues related to testing multiple hypotheses leaves our conclusions substantially unchanged.
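The Lee (2009) bounds mentioned above in connection with test attrition rest on a trimming argument that can be sketched in a few lines. The version below is a simplified illustration, not the authors' implementation: it ignores covariates, strata and standard errors, and the function name `lee_bounds` and the toy numbers are assumptions made for the example. The idea is to trim the arm with the higher response rate until both arms have the same effective response rate, dropping either the highest or the lowest outcomes to obtain the two bounds.

```python
def mean(values):
    return sum(values) / len(values)


def lee_bounds(y_treat, n_treat, y_ctrl, n_ctrl):
    """Trimming bounds on the treatment effect among always-responders.

    y_treat, y_ctrl: outcomes observed for respondents in each arm.
    n_treat, n_ctrl: number of students assigned to each arm.
    Returns (lower, upper). No covariates and no standard errors.
    """
    q_treat = len(y_treat) / n_treat    # response rate, treatment arm
    q_ctrl = len(y_ctrl) / n_ctrl       # response rate, control arm
    if q_treat >= q_ctrl:
        # Excess respondents in the treatment arm: trim treated outcomes.
        share = (q_treat - q_ctrl) / q_treat
        ys = sorted(y_treat)
        k = round(share * len(ys))
        lower = mean(ys[:len(ys) - k]) - mean(y_ctrl)   # drop the top k
        upper = mean(ys[k:]) - mean(y_ctrl)             # drop the bottom k
    else:
        # Excess respondents in the control arm: trim control outcomes.
        share = (q_ctrl - q_treat) / q_ctrl
        ys = sorted(y_ctrl)
        k = round(share * len(ys))
        lower = mean(y_treat) - mean(ys[k:])            # drop the bottom k
        upper = mean(y_treat) - mean(ys[:len(ys) - k])  # drop the top k
    return lower, upper


# Toy example (hypothetical): all 4 treated students respond, 2 of 4 controls do.
print(lee_bounds([1, 2, 3, 4], 4, [1, 2], 4))
```

The bounds are wide when the response rates differ substantially between arms and collapse to the simple difference in means when they are equal.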
In summary, our results suggest that the possibility of using ALEKS during the first semester of higher education improved students’ math preparedness and reduced the probability of failing a class during the first two semesters. However, it did not result in increased retention, possibly because multiple factors affect this outcome.

6.2. Heterogeneous Effects

To better understand which students benefited the most from ALEKS, we analyze how the treatment effect varies across different baseline characteristics. We first study whether there are differences in the effectiveness of ALEKS by gender; Panel A in Table 4 reports the results. For both male and female students, we find that ALEKS led to large and statistically significant improvements in math test scores (0.28sd and 0.26sd, respectively). No gender-related differences exist in the treatment effect on enrollment (columns 3 and 4). There is, nonetheless, a significant difference in the treatment effects on repetition. Among male students, ALEKS led to a 14-percentage-point reduction in repetition, equivalent to 54 percent of the repetition rate in the control group (column 5), while the effect on female repetition is close to zero (column 6). Interestingly, the treatment effect on repetition is large enough to eliminate the gender gap observed in the control group.

Many students in our sample enter higher education a few years after completing high school, possibly after gaining some work experience. If older students are less familiar with technology, ALEKS might have been less beneficial for them. We test this hypothesis and present the results in Panel B in Table 4. For none of the outcomes do we observe statistically significant differences in the treatment effects between students below and above age 25. Finally, we provide evidence on whether the effect varies according to the general field of study.
Within SENESCYT institutions, fields of study vary greatly in their mathematical requirements, with Engineering being much more math intensive than Agriculture and Services. Results reported in Figure 4 show that having the opportunity to use ALEKS led to improvements in math, irrespective of the field of study. However, the effects are marginally insignificant for students enrolled in Agriculture and Services, possibly due to the small sample size. Impacts on enrollment in the third semester also do not differ by field of study. We find a substantial and statistically significant effect on repetition for students enrolled in Engineering and a zero impact for students enrolled in other, potentially less math-intensive, fields. The sizable effect on Engineering students can explain the significant differences between male and female students: about half of the male students in our sample were enrolled in Engineering, as opposed to 14 percent of female students.

6.3. Dosage Effects

As discussed above, average measures hide substantial heterogeneity in the platform's usage. To identify the effects of an additional hour of ALEKS usage, we estimate the following model:

$$Y_{is} = \mu_0 + \mu_1 \text{Hours}_{is} + G_s'\gamma + X_{is}'\delta + \varepsilon_{is} \qquad (2)$$

where $\text{Hours}_{is}$ denotes the cumulative number of hours of use of the platform by individual i in institute s during the entire period between January and June 2021 and is set to zero for all students in the control group. $G_s$ and $X_{is}$ are defined as in equation (1). The main challenge in identifying the parameter $\mu_1$ is that hours of use are likely to be endogenous, as students might be reacting to shocks or might choose their usage based on other intrinsic determinants of academic performance. The treatment assignment is by construction independent of the error term and is therefore potentially a valid instrument for hours of use. We therefore estimate eq. (2) by instrumental variables (IV).
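To make the IV logic concrete: with a single binary instrument and no covariates, the two-stage least squares estimate reduces to the Wald ratio, that is, the ITT effect on the outcome divided by the ITT effect on hours of use. The sketch below illustrates only this special case; it leaves out the strata dummies, the controls and the clustered standard errors of the paper's specification, and the function name `wald_iv` and the toy data are assumptions made for the example.

```python
def wald_iv(y, hours, z):
    """IV estimate of the effect of one hour of use, with binary instrument z.

    y:     outcome for each student
    hours: hours of platform use (zero for every control student)
    z:     1 if the student's institute was assigned to treatment, else 0
    """
    def arm_mean(values, arm):
        picked = [v for v, zi in zip(values, z) if zi == arm]
        return sum(picked) / len(picked)

    reduced_form = arm_mean(y, 1) - arm_mean(y, 0)          # ITT on the outcome
    first_stage = arm_mean(hours, 1) - arm_mean(hours, 0)   # ITT on hours of use
    return reduced_form / first_stage


# Toy data (hypothetical): two control and two treated students.
y = [0.0, 0.0, 2.0, 4.0]
hours = [0.0, 0.0, 10.0, 20.0]
z = [0, 0, 1, 1]
print(wald_iv(y, hours, z))
```

Because control students have zero hours by construction, the first stage here is simply the average number of hours used in the treatment arm.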
The identifying assumption is that the treatment only affects the outcome by changing the hours of usage of the platform. In our context, this assumption is likely to hold. First, unlike in other settings (Muralidharan et al., 2019), students do not use the application in a computer lab, and it is therefore unlikely that the intervention could affect peer composition. Second, since the intervention was conducted during the school closures, when most teaching was already being conducted online, it is unlikely that the intervention could differentially affect teaching practices in the treatment and control groups. Results are reported in Table 5. A one-standard-deviation increase in the number of hours of use of ALEKS increases test scores by 0.20sd and reduces the probability of having failed any subject by 6 percentage points. In line with the ITT results, the impact on the probability of enrolling in the third semester is null.

6.4. Potential Mechanisms

There are different ways through which ALEKS might improve learning outcomes. First, by improving student preparedness, it might make learning more efficient. Unfortunately, our data do not allow us to test this hypothesis. Second, besides increasing the quality of learning, ALEKS might have increased the number of hours devoted to it (quantity). Information collected through the online survey administered alongside the math test supports this hypothesis. Results presented in columns 1 to 4 in Table 6 suggest that students in the treatment group are, on average, studying mathematics autonomously for more hours without sacrificing other subjects (column 5). Students in the treatment group also reported a higher perceived math ability (column 6).
Altogether, these results suggest that, either because they are better equipped to understand more complex concepts or because of the gamification of the learning method, students who are eligible to use ALEKS enjoy studying mathematics more and spend more hours doing it, without sacrificing other subjects.

7. Conclusions

Providing remedial education is the primary way higher education institutions cope with students who do not have the academic preparation needed to succeed in tertiary education. Expenses for remedial education programs represent a significant share of university and non-university budgets despite mixed evidence on their effectiveness. Even when effective, they are often unaffordable, especially for higher education institutions in low- and middle-income countries. We provide evidence on the effects of using a Digital Personalized Learning software that builds on AI to guide student remediation in mathematics by delivering content tailored to the learning needs of students. We evaluated the ALEKS platform at scale, since the evaluation involved all first-semester students in Ecuador's public technical higher education institutions. We find that receiving a license to use ALEKS for six months led to a considerable reduction in the probability of failing a course and a sizeable and statistically significant improvement in a math assessment. While the decrease in the likelihood of failing a class is concentrated among male students, possibly due to the predominantly male enrollment in more math-intensive fields, the improvements in the math assessment hold across all student groups. Given the low cost of the program ($18 per student), our results suggest that computer-assisted remediation is a cost-effective strategy to improve student readiness for higher education.
Since the evaluation rollout occurred during the COVID-19 pandemic, when technical universities were closed and providing academic services online, it is hard to generalize the findings of our study to a standard context with in-person instruction. On the one hand, program take-up might be higher once students return to an in-person modality, as they can access computer labs at the institutes. Teachers will also have more chances to monitor and promote the use of the platform among students. On the other hand, if there were complementarities between the usage of ALEKS and other forms of distance learning during the pandemic, the platform's relevance to students might drop as students return to in-person learning. In general, under the hypothesis that the effects of ALEKS are heterogeneous across individuals depending on their usage time, proficiency, and motivation, it is unclear whether students who benefited most from ALEKS (or used the platform more) during a time of virtual instruction would retain these benefits and utilization patterns under in-person instruction.

8. References

Alban Conto, C., Akseer, S., Dreesen, T., Kamei, A., Mizunoya, S., & Rigole, A. (2021). Potential effects of COVID-19 school closures on foundational skills and Country responses for mitigating learning loss. International Journal of Educational Development, 87, 102434. https://doi.org/10.1016/j.ijedudev.2021.102434

Aseguramiento de la Calidad en la Educación y en el Trabajo (ACET). (2021). DESCAES, Ecuador: Reporte General de Resultados.

Banerjee, A., Cole, S., Duflo, E., & Linden, L. (2007). Remedying Education: Evidence from Two Randomized Experiments in India. The Quarterly Journal of Economics, 122(3), 1235–1264. https://doi.org/10.1162/qjec.122.3.1235

Bettinger, E. P., & Long, B. T. (2005). Remediation at the community college: Student participation and outcomes. New Directions for Community Colleges, 2005(129), 17–26. https://doi.org/10.1002/cc.182

Calcagno, J. C., & Long, B. T. (2009). Evaluating the impact of remedial education in Florida community colleges: A quasi-experimental regression discontinuity design [National Center for Postsecondary Research (NCPR) Brief].

De Barros, A., & Ganimian, A. J. (2021). Which Students Benefit from Personalized Learning? Experimental Evidence from a Math Software in Public Schools in India.

Epper, R. M., & Baker, E. D. (2009). Technology Solutions for Developmental Math: An Overview of Current and Emerging Practices.

Falmagne, J.-C., & Doignon, J.-P. (2011). Learning Spaces: Interdisciplinary Applied Mathematics. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-01039-2

Fang, Y., Ren, Z., Hu, X., & Graesser, A. C. (2019). A meta-analysis of the effectiveness of ALEKS on learning. Educational Psychology, 39(10), 1278–1292. https://doi.org/10.1080/01443410.2018.1495829

Ferreyra, M. M., Avitabile, C., Botero Álvarez, J., Haimovich Paz, F., & Urzúa, S. (2017). At a Crossroads: Higher Education in Latin America and the Caribbean (Directions in Development). World Bank.

Foshee, C. M., Elliott, S. N., & Atkinson, R. K. (2016). Technology-enhanced learning in college mathematics remediation: TEL in college math remediation. British Journal of Educational Technology, 47(5), 893–905. https://doi.org/10.1111/bjet.12285

Lalley, J. P., & Gentile, J. R. (2009). Adapting Instruction To Individuals: Based on the Evidence, What Should It Mean? International Journal of Teaching and Learning in Higher Education, 20(3), 462–475.

Lee, D. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. The Review of Economic Studies, 76(3), 1071–1102.

Ma, W., Adesope, O. O., Nesbit, J. C., & Liu, Q. (2014). Intelligent tutoring systems and learning outcomes: A meta-analysis. Journal of Educational Psychology, 106(4), 901–918. https://doi.org/10.1037/a0037123

Muralidharan, K., Singh, A., & Ganimian, A. J. (2019). Disrupting Education? Experimental Evidence on Technology-Aided Instruction in India. American Economic Review, 109(4), 1426–1460. https://doi.org/10.1257/aer.20171112

Nickow, A., Oreopoulos, P., & Quan, V. (2020). The Impressive Effects of Tutoring on PreK-12 Learning: A Systematic Review and Meta-Analysis of the Experimental Evidence (No. w27476). National Bureau of Economic Research. https://doi.org/10.3386/w27476

Romano, J. P., & Wolf, M. (2005). Exact and Approximate Stepdown Methods for Multiple Hypothesis Testing. Journal of the American Statistical Association, 100(469), 94–108.

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78. https://doi.org/10.1037/0003-066X.55.1.68

Saxon, D. P., & Boylan, H. R. (2001). The cost of remedial education in higher education. Journal of Developmental Education, 25(2), 2–9.

Sottilare, R. A. (2015). Fundamentals of Adaptive Intelligent Tutoring Systems for Self-Regulated Learning. US Army Research Laboratory.

UNESCO. (2022). Evaluación de los Efectos e Impactos del COVID-19 en la Educación Superior.

Vanlehn, K. (2006). The Behavior of Tutoring Systems. International Journal of Artificial Intelligence in Education, 16, 227–265.

Venegas-Muggli, J. I., Muñoz-Gajardo, K. A., & González-Clares, M. J. (2019). The Impact of Counseling and Mathematics Remedial Programs on the Academic Achievement of Higher Education Students in Chile. Journal of College Student Development, 60(4), 472–488. https://doi.org/10.1353/csd.2019.0041

Wigfield, A., & Eccles, J. S. (2000). Expectancy–Value Theory of Achievement Motivation. Contemporary Educational Psychology, 25(1), 68–81. https://doi.org/10.1006/ceps.1999.1015

World Bank. (2022). The State of Global Learning Poverty: 2022 Update. World Bank.

Figure 1. Timeline of Implementation and Data Collection

Figure 1. Percent of Treatment Students that Used ALEKS for at Least One Minute
[Bar chart. Horizontal axis: month (January to May). Vertical axis: percent of students (0% to 100%).]
Note: Percent calculated over the sample of 5077 students with ALEKS licenses. Source: Authors using take-up data from ALEKS software.

Figure 2. Average Number of Minutes that ALEKS Was Used, by Knowledge Area of Technical Course
[Bar chart. Horizontal axis: knowledge area (Administration; Engineering, industry & construction; Information & communication technology (ICT); Agriculture, forestry, fishing and veterinary; Services). Vertical axis: minutes (0 to 3000).]
Source: Authors using take-up data from ALEKS software and course content from SENESCYT enrollment datasets.

Figure 4. ITT Heterogeneity by Field of Study

Table 1. Themes and Topics Included in EAES Assessment

| Themes | Topic |
| --- | --- |
| EAES: Mathematics | |
| Algebra and Functions | Real numbers, real polynomials with coefficients in R; factoring techniques |
| | First and second-degree equations with one unknown |
| | First degree inequalities with one unknown |
| | Systems of linear equations |
| | Real functions |
| | Quadratic function |
| | Trigonometric functions |
| | Exponential function and logarithmic function |
| Geometry and Measurement | The vector space R2; straight lines in R2 |
| Statistics and probability | Descriptive statistics |
| EAES: Language and Literature | |
| Language and Culture | Written culture |
| Oral communication | Oral communication and social interaction |
| Reading | Reading comprehension |
| Writing | Text production |
| Literature | Literature |

Source: Authors using information from SENESCYT.

Table 2. Characteristics of the Sample at the Baseline

| Variable | (1) Mean control | (2) SD control | (3) Mean treat | (4) SD treat | (5) P-val. control vs treat |
| --- | --- | --- | --- | --- | --- |
| Individual Level | | | | | |
| Age | 22.474 | 5.822 | 22.564 | 5.841 | 0.682 |
| Female | 0.393 | 0.488 | 0.394 | 0.489 | 0.844 |
| Repeated (Y/N) | 0.051 | 0.220 | 0.037 | 0.189 | 0.451 |
| Work and Study (Y/N) | 0.399 | 0.490 | 0.425 | 0.494 | 0.412 |
| Bono Desarrollo (Y/N) | 0.067 | 0.251 | 0.094 | 0.291 | 0.245 |
| Scholarship (Y/N) | 0.023 | 0.150 | 0.048 | 0.214 | 0.445 |
| Father Edu: Basic | 0.489 | 0.500 | 0.479 | 0.500 | 0.960 |
| Father Edu: Secondary | 0.406 | 0.491 | 0.403 | 0.490 | 0.798 |
| Father Edu: Higher | 0.105 | 0.307 | 0.119 | 0.323 | 0.280 |
| Mother Edu: Basic | 0.478 | 0.500 | 0.466 | 0.499 | 0.935 |
| Mother Edu: Secondary | 0.399 | 0.490 | 0.406 | 0.491 | 0.996 |
| Mother Edu: Higher | 0.123 | 0.329 | 0.128 | 0.334 | 0.795 |
| Application Grade | 753.536 | 40.546 | 737.341 | 41.707 | 0.006*** |
| Institute Level | | | | | |
| DESCAES Score (professor) | 55.820 | 7.682 | 53.832 | 8.229 | 0.323 |
| Number of professors | 40.375 | 38.366 | 40.718 | 33.492 | 0.276 |
| Share Female Prof | 0.428 | 0.132 | 0.437 | 0.149 | 0.669 |
| Number of students | 167.563 | 163.735 | 155.615 | 143.241 | 0.150 |

Source: Authors using SENESCYT’s administrative data.

Table 3. Impact on Student Outcomes

| | Math Score (1) | Math Score (2) | Enrolled 3rd sem. (Y/N) (3) | Enrolled 3rd sem. (Y/N) (4) | Repeated (Y/N) (5) | Repeated (Y/N) (6) |
| --- | --- | --- | --- | --- | --- | --- |
| ALEKS (6 months) | 0.235*** | 0.279*** | -0.004 | 0.010 | -0.101* | -0.092* |
| | (0.078) | (0.070) | (0.027) | (0.028) | (0.058) | (0.050) |
| Strata F.E. | Yes | Yes | Yes | Yes | Yes | Yes |
| Controls | No | Yes | No | Yes | No | Yes |
| Mean Control | -0.000 | -0.000 | 0.588 | 0.588 | 0.205 | 0.205 |
| SD Control | 1.000 | 1.000 | 0.492 | 0.492 | 0.404 | 0.404 |
| Obs | 3512 | 3512 | 11431 | 11431 | 6694 | 6694 |

RW p-value by outcome: Math Score 0.066; Enrolled 3rd sem. 0.727; Repeated 0.066.

Note: The math score is standardized with respect to the mean and the standard deviation in the control group. Enrolled 3rd sem. takes the value 1 if the student is enrolled in the third semester, 0 otherwise. The repeated dummy takes the value 1 if the student enrolled in the third semester has failed any subject throughout her career, 0 otherwise. Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. Romano–Wolf adjusted p-values (RW, Romano and Wolf 2005, 2016) are reported in order to account for three simultaneous hypotheses for student outcomes. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 4. Heterogeneous Effects

Panel A: Impact by Gender

| | Math Score (1) Males | Math Score (2) Females | Enrolled 3rd sem. (Y/N) (3) Males | Enrolled 3rd sem. (Y/N) (4) Females | Repeated (Y/N) (5) Males | Repeated (Y/N) (6) Females |
| --- | --- | --- | --- | --- | --- | --- |
| ALEKS (6 months) | 0.278*** | 0.263*** | 0.012 | 0.006 | -0.141** | -0.019 |
| | (0.070) | (0.093) | (0.035) | (0.030) | (0.063) | (0.041) |
| Strata F.E. | Yes | Yes | Yes | Yes | Yes | Yes |
| Controls | Yes | Yes | Yes | Yes | Yes | Yes |
| Mean Control | 0.043 | -0.067 | 0.567 | 0.622 | 0.260 | 0.129 |
| SD Control | 1.008 | 0.984 | 0.496 | 0.485 | 0.439 | 0.335 |
| Obs | 2034 | 1478 | 6933 | 4498 | 3891 | 2803 |

Panel B: Impact by Age

| | Math Score (1) Below age 25 | Math Score (2) Above age 25 | Enrolled 3rd sem. (Y/N) (3) Below age 25 | Enrolled 3rd sem. (Y/N) (4) Above age 25 | Repeated (Y/N) (5) Below age 25 | Repeated (Y/N) (6) Above age 25 |
| --- | --- | --- | --- | --- | --- | --- |
| ALEKS (6 months) | 0.290*** | 0.213** | -0.001 | 0.044 | -0.094* | -0.081 |
| | (0.069) | (0.100) | (0.029) | (0.031) | (0.049) | (0.061) |
| Strata F.E. | Yes | Yes | Yes | Yes | Yes | Yes |
| Controls | Yes | Yes | Yes | Yes | Yes | Yes |
| Mean Control | 0.009 | -0.040 | 0.609 | 0.527 | 0.212 | 0.182 |
| SD Control | 0.992 | 1.035 | 0.488 | 0.499 | 0.409 | 0.386 |
| Obs | 2790 | 721 | 8529 | 2901 | 5113 | 1580 |

Note: The math score is standardized with respect to the mean and the standard deviation in the control group. Enrolled 3rd sem. takes the value 1 if the student is enrolled in the third semester, 0 otherwise. The repeated dummy takes the value 1 if the student enrolled in the third semester has failed any subject throughout her career, 0 otherwise. Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 5. Dosage Effects

| | Math Score (1) | Enrolled 3rd sem. (Y/N) (2) | Repeated (Y/N) (3) |
| --- | --- | --- | --- |
| Hours of ALEKS usage | 0.006*** | 0.000 | -0.002* |
| | (0.001) | (0.001) | (0.001) |
| Strata F.E. | Yes | Yes | Yes |
| Controls | Yes | Yes | Yes |
| F First Stage | 155.852 | 128.038 | 159.744 |
| Mean Control | -0.000 | 0.588 | 0.205 |
| SD Control | 1.000 | 0.492 | 0.404 |
| Obs | 3512 | 11431 | 6694 |

Note: The math score is standardized with respect to the mean and the standard deviation in the control group. Enrolled 3rd sem. takes the value 1 if the student is enrolled in the third semester, 0 otherwise. The repeated dummy takes the value 1 if the student enrolled in the third semester has failed any subject throughout her career, 0 otherwise. Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. The hours of use are set to zero for the control group. The first-stage F statistic allows assessing whether the instrument is weak. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01

Table 6. Intermediate Mechanisms

| | (1) 0-5 math hours pw | (2) 6-10 math hours pw | (3) 11-20 math hours pw | (4) More than 20 math hours pw | (5) Stopped studying other subj. | (6) Perceived Math Ability [1-4] | (7) Access to computer | (8) Access to internet |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ALEKS (6 months) | -0.089*** | 0.003 | 0.051*** | 0.036*** | 0.018 | 0.070*** | -0.000 | 0.011 |
| | (0.017) | (0.013) | (0.013) | (0.009) | (0.019) | (0.026) | (0.012) | (0.016) |
| Strata F.E. | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Controls | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Mean Control | 0.353 | 0.418 | 0.165 | 0.064 | 0.266 | 2.723 | 0.858 | 0.822 |
| SD Control | 0.478 | 0.493 | 0.371 | 0.244 | 0.442 | 0.678 | 0.349 | 0.383 |
| Obs | 4850 | 4850 | 4850 | 4850 | 4850 | 4850 | 4850 | 4850 |

Note: Students were given a multiple-choice question about the number of hours they studied math on a weekly basis: 0-5, 6-10, 11-20, more than 20. A dummy for each option was generated and results are reported in columns 1 to 4. Stopped studying other subj. takes the value 1 if the student reported having stopped studying other subjects in order to study math, 0 otherwise. Perceived math ability takes values 1 to 4, with 4 denoting the highest level of competency. Access to computer takes the value 1 if the student reports having either a desktop or a laptop computer at home, 0 otherwise. Access to internet takes the value 1 if the student reports having internet access at home, 0 otherwise. Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01

Appendix A. Additional Tables

Table A1. Main Knowledge Areas Offered by Public TTCs in Ecuador

| No. | Knowledge Area |
| --- | --- |
| 1 | Engineering |
| 2 | Business Administration |
| 3 | Agriculture, forestry, fishing and veterinary |
| 4 | Management |
| 5 | Services |
| 6 | Health and well-being |
| 7 | Information and Communication Technologies |
| 8 | Construction |
| 9 | Computing |
| 10 | Medicine |
| 11 | Arts and Humanities |
| 12 | Industry and Production |
| 13 | Natural sciences |
| 14 | Mathematics and Statistics |
| 15 | Arts |
| 16 | Teacher training |
| 17 | Environmental protection |
| 18 | Agriculture, forestry, and fishing |
| 19 | Social sciences, journalism, media, and law |
| 20 | Security services |

Source: SENESCYT.

Table A2. Descriptive Statistics on ALEKS Courses Offered in Ecuador

| Knowledge Area of Technical Course | Average Number of Items per Course | No. of Students |
| --- | --- | --- |
| Engineering, industry & construction | 207 | 2036 |
| Information & communication technology | 179 | 680 |
| Business administration | 165 | 1429 |
| Agriculture, forestry, fishing and veterinary | 1254 | 308 |
| Services | 90 | 624 |
| All technical programs | 172 | 5077 |

Source: Authors using the ALEKS software.

Table A3. Differential Attrition

| | (1) Missing Math Score | (2) Missing Math Score |
| --- | --- | --- |
| ALEKS (6 months) | -0.027 | -0.040 |
| | (0.058) | (0.057) |
| Strata F.E. | Yes | Yes |
| Controls | No | Yes |
| Mean Control | 0.707 | 0.707 |
| SD Control | 0.455 | 0.455 |
| Obs | 11431 | 11431 |

Note: Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01

Table A4. Lee Bounds for Impact on Math Standardized Score

| | (1) |
| --- | --- |
| Lower Bound | 0.042 |
| | (0.208) |
| Upper Bound | 0.413** |
| | (0.179) |
| Obs | 11431 |

Standard errors in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01

Table A5. IRT Results

| | (1) Math std score (1 par.) | (2) Math std score (2 par.) |
| --- | --- | --- |
| ALEKS (6 months) | 0.291*** | 0.409*** |
| | (0.072) | (0.079) |
| Strata F.E. | Yes | Yes |
| Controls | Yes | Yes |
| Mean Pure Control | -0.000 | 0.000 |
| SD Pure Control | 1.000 | 1.000 |
| Obs | 3512 | 3512 |

Note: Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01

Table A6. ITT on Language Results

| | (1) Language score | (2) Language score |
| --- | --- | --- |
| ALEKS (6 months) | 0.007 | 0.045 |
| | (0.054) | (0.047) |
| Strata F.E. | Yes | Yes |
| Controls | No | Yes |
| Mean Pure Control | 0.000 | 0.000 |
| SD Pure Control | 1.000 | 1.000 |
| Obs | 3512 | 3512 |

Note: Controls include age, a dummy for whether age is missing, a dummy for whether a student is female or not, a dummy for whether the student’s household receives the Bono de Desarrollo transfer or not, the application note and a dummy for whether the application note is missing or not. Standard errors in parentheses are clustered at the institute level. * p < 0.10, ** p < 0.05, *** p < 0.01