The World Bank Economic Review, 39(1), 2025, 26–41
https://doi.org/10.1093/wber/lhae013
Article
Downloaded from https://academic.oup.com/wber/article/39/1/26/7640280 by WORLDBANK THIRDPARTY user on 05 February 2025

Too Hard, Too Easy, or Just Right: The Productivity of Schooling and the Match between Child Skill and School Complexity

Juan F. Castro and Lucciano Villacorta

Abstract

This study proposes a novel way of modeling the heterogeneous effects of schooling based on the notion that learning is maximized when the skill of the child matches the complexity of the learning experiences at school. It offers direct evidence about the importance of this match using longitudinal information on test scores and schooling attained by children from Peru, India, and Vietnam. Using data from Peru, it also finds that the relation between the effect of schooling and early childhood skill can follow an inverted-U shape. Increasing early childhood skill will raise the productivity of the school up to the point where it matches school complexity. Further increases in child skill, however, will reduce the productivity of schooling as they will widen the mismatch. If one relates the quality of schools to the amount of learning they produce, this framework predicts that quality gains can be achieved by reducing these mismatches.

JEL classification: O15, C33
Keywords: effect of schooling, early childhood skill, instructional match

1. Introduction

The idea that learning is enhanced when the experience or stimulus is appropriate for the learner’s degree of understanding is present across different theories of learning (see, for example, Fosnot and Stewart (2005) on constructivism and Paas, van Gog, and Sweller (2010) on cognitive load theory).
Recent literature on the economics of human development also highlights the importance of offering to the child experiences that are neither too hard nor too easy in order to avoid discouraging him/her (see, for example, Heckman and Mosso (2014) on the strategy of “scaffolding”).

Juan F. Castro (corresponding author) is a professor of economics at Universidad del Pacifico, Lima, Peru; his email address is castro_jf@up.edu.pe. Lucciano Villacorta is a senior economist at the Economic Research Department of the Central Bank of Chile, Santiago, Chile; his email address is lvillacorta@bcentral.cl. The authors would like to thank Manuel Arellano, Stéphane Bonhomme, Alonso Villacorta, and participants at the Economics Research Seminar at Universidad del Pacifico, the 2019 Annual Meeting of the Peruvian Economic Association, and the 2019 Latin American Meeting of the Econometric Society for their comments. We also thank Alexandra Heredia-Mayo, Lucas Cisneros, and Martin Ternero for excellent research assistance. All remaining errors are ours. The data used in this article are available online in the UK Data Archive at https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=2000060. The authors declare that they have no relevant or material financial interests that relate to the research described in this paper. A supplementary online appendix is available with this article at The World Bank Economic Review website.

© The Author(s) 2024. Published by Oxford University Press on behalf of the International Bank for Reconstruction and Development / THE WORLD BANK. All rights reserved.
For permissions, please e-mail: journals.permissions@oup.com

On the empirical front, some studies argue that poor child skill attainment can explain why certain educational inputs lack effect on learning (see, for example, Glewwe, Kremer, and Moulin (2009) on the effect of textbooks) and, more recently, several field experiments have shown the positive effect of interventions based on the pedagogical approach known as “teaching at the right level” (TRL) (see Banerjee et al. (2016) and Muralidharan and Ganimian (2019)). These interventions aim at identifying students’ initial level of competence and offering instructional experiences matched to this level. In addition to potentially mitigating the mismatch between child skill and school complexity, these interventions usually offer other educational inputs. Some examples of these additional inputs are more instructional time, appointing trained volunteers as substitutes for school teachers, or the provision of training for school teachers. The positive experimental evidence, therefore, can be the result of an intervention operating through multiple channels.

This study seeks to provide direct evidence about the importance that the match between child skill and school complexity has for learning. For this, it proposes a novel way of modeling the productivity of schooling that allows for individual-specific effects that depend on the difference between the child’s prior stock of skill and the complexity of the learning experiences offered to the child at school (the mismatch).

From the point of view of the household, one can think of the mismatch between the skill of children and the complexity of a learning input as a misallocation of child skill. This can occur for several reasons.
For example, incomplete information about child skill or financial frictions can prevent parents from purchasing the most appropriate input. Also, if one thinks of the input as the interactions that occur at school, it is not hard to imagine that mismatches will occur because these interactions cannot be perfectly tailored to every student. In fact, in a recent study, Bau (2022) shows that this mismatch can be the result of the optimal response of a private school facing more competition.

For its main results, the study uses longitudinal information provided by the Young Lives Study¹ on skill test scores, years of schooling, and school characteristics for a large sample of Peruvian children attending different schools. Child skill is measured using the scores obtained by the children in the Peabody Picture Vocabulary Test. Differences in the complexity of the learning experiences offered to the children at school are approximated using the heterogeneity in curriculum coverage reported in the children’s classes.

To identify the parameters that govern the individual-specific effect of schooling we use a value-added specification that includes fixed effects to control for individual-specific unobserved heterogeneity and an instrumental variable approach that exploits exogenous variation in children’s birth dates to deal with time-varying shocks that can influence schooling decisions. The empirical strategy relies on observed heterogeneity in the mismatches to uncover a distribution of marginal effects, and it shows how the marginal effect of schooling changes along the skill distribution for a given school complexity. We take advantage of three rounds of data to estimate a non-linear dynamic panel model. For this, we introduce a non-linear version of the Arellano–Bond GMM estimator that exploits valid moment conditions.

The main results can be summarized as follows.
Using data from Peru, this study finds that the productivity of schooling depends on the difference between the existing stock of skill of the child and the complexity of the school. In particular, the effect of schooling is maximized when there is a match between child skill and school complexity and, therefore, mismatches in either direction are detrimental for learning. Consistent with this source of heterogeneity, we find that the relation between the effect of schooling and the stock of early childhood skill follows an inverted-U shape for the median degree of school complexity. Increasing early childhood skill raises the productivity of the school for almost the entire first half of the distribution of skill. However, for the second half of this distribution, raising early childhood skill reduces the productivity of the school. Finally, we test and confirm that this same source of heterogeneity is relevant for the productivity of schooling in two other countries participating in the Young Lives Study: India and Vietnam.

This paper makes several contributions. First, it provides direct evidence on the importance that the match between child skill and school complexity has for learning. In addition, it offers external validity for the results of the field experiments reported in the TRL literature. In fact, our main analysis relies on observational data for a large sample of Peruvian children attending different schools across the country, and we also provide results for similarly large samples of children living in Vietnam and India.

¹ Young Lives is an international study of childhood poverty, following 12,000 children in 4 countries (Ethiopia, India, Peru, and Vietnam) over 15 years.
Bau (2022) assesses the importance of the match between students’ instructional needs and a school’s instructional level, proposes a model for school choices regarding instructional level, and simulates how it responds to competition. In her model, the instructional match depends on the heterogeneity in children’s wealth. Our paper complements this analysis by explicitly modeling the match as a function of children’s preschool skill and the curricular complexity of their school.

Finally, by explicitly modeling the mismatch, this paper also contributes by shedding light on the role that the skill attained early in the life of children has on the productivity of learning inputs occurring later. Empirical results from the early childhood development literature have shown that the existing stock of child skill has a positive effect on the productivity of inputs. Some works in this literature are Attanasio et al. (2020), Attanasio, Meghir, and Nix (2020), Aizer and Cunha (2012), and Cunha, Heckman, and Schennach (2010). The logic is that children who have been exposed to a more nurturing environment are better prepared for the learning experiences they encounter later. Recent empirical work, however, has found that an increase in children’s stock of skill can also reduce the productivity of an input (Agostinelli and Wiswall 2022). The flexible production function of skill proposed here accommodates both phenomena depending on the sign of the instructional mismatch. A negative mismatch favoring instructional complexity will produce a positive effect of child skill on the productivity of the learning input. This is because an increase in the skill of the child will reduce the mismatch and, thus, raise the productivity. A positive mismatch favoring child skill will produce a negative effect of skill on the productivity of the input because raising it will widen the mismatch.

The rest of the paper is organized as follows.
It first presents the specification proposed to allow for heterogeneity in the productivity of schooling as a function of the mismatch between child skill and school complexity. The following section discusses the data and empirical strategy employed to identify the productivity of schooling. The main results using Peruvian data and the results using data from other Young Lives countries are presented next. The last section closes with some concluding remarks.

2. Framework

This study allows for heterogeneous effects of schooling on child skill and proposes a way of modeling this heterogeneity so that it incorporates the notion that the productivity of a learning input is maximized when there is a match between the complexity of the input and the competence of the child. Deviations from this match mean that the input is either too complicated or too simple for the child and, thus, deviations are detrimental for its effect on learning (i.e. deviations are detrimental for the productivity of the input).

Let us assume the following value-added specification:

\[ a_{it} = \gamma_1 a_{it-1} + \phi_{it}\, i_{it} + \alpha_i + \mu_{it}, \tag{1} \]

where $a_{it}$ represents the logarithm of some measure of skill of child $i$ at time $t$ and $i_{it}$ indicates exposure to the input between periods $t$ and $t-1$, also measured in logs. The term $\alpha_i$ captures unobserved heterogeneity that is allowed to be correlated with all the observables of the model, and $\mu_{it}$ represents all other unobservable variables that are not correlated with the input or the child’s prior skill attainment. The parameter $\gamma_1$ captures persistence in child skill.

The productivity of the learning input is given by $\phi_{it}$. This is the effect on skill of an additional unit of exposure to the input. Note that this productivity is individual specific and, in particular, will be allowed to vary depending on the difference between the level of skill previously attained by the child $a_{it-1}$ and the degree of complexity of the input $D_{it}$. For concreteness, consider that the input is a particular learning environment to which child $i$ can be exposed between periods $t$ and $t-1$, so $i_{it}$ indicates the degree of exposure to this environment in that time interval. The degree of complexity ($D_{it}$), therefore, reflects how challenging the interactions offered to the child in this learning environment are.

Figure 1. Positive and Negative Effects of Child Skill on the Productivity of the Same Learning Input.
Source: Authors’ analysis based on the functional form proposed for the productivity of a learning input: $\phi_{it} = \gamma_2 / \exp[\lambda(a_{it-1} - \theta D_{it})^2]$.
Note: The term $\phi_{it}$ is the productivity of a learning input for child $i$ at period $t$, $D_{it}$ is the degree of complexity of the input, and $a_{it-1}$ is the level of skill previously attained by the child. The productivity of the learning input is maximized when its complexity matches the skill of the child.

Consider the following functional form for $\phi_{it}$:

\[ \phi_{it} = \frac{\gamma_2}{\exp[\lambda(a_{it-1} - \theta D_{it})^2]}. \tag{2} \]

In (2), the productivity of the learning environment is maximized at a value of $\gamma_2$ when there is a match between its complexity and the child’s prior skill attainment ($a_{it-1} = \theta D_{it}$). Differences between $a_{it-1}$ and $\theta D_{it}$ introduce variations in $\phi_{it}$. In fact, the larger these differences, the smaller the productivity. The parameter $\theta$ transforms units of input complexity $D_{it}$ into units of skill $a_{it-1}$. Importantly, this parameter also allows our specification to nest models where $a_{it-1}$ can only have either a positive or a negative effect on the productivity of the input (see discussion below). The parameter $\lambda$ controls the curvature of $\phi_{it}$.
In the extreme case where $\lambda = 0$, the productivity of the learning environment is constant and equal to $\gamma_2$.

Notice that the effect of the child’s prior skill attainment on the productivity of the learning environment is given by

\[ \frac{\partial \phi_{it}}{\partial a_{it-1}} = \frac{-\gamma_2\, 2\lambda (a_{it-1} - \theta D_{it})}{\exp[\lambda(a_{it-1} - \theta D_{it})^2]} = -\phi_{it}\, 2\lambda (a_{it-1} - \theta D_{it}). \tag{3} \]

Therefore, if the parameters $\gamma_2$ and $\lambda$ have a positive sign, the effect of child skill on the productivity of the input will be positive ($\partial \phi_{it} / \partial a_{it-1} > 0$) when $a_{it-1} - \theta D_{it} < 0$. Intuitively, the learning environment is too challenging for the child so raising his/her skill will enhance the effect of this environment on learning. In the opposite case, when $a_{it-1} - \theta D_{it} > 0$, child skill will have a negative effect on the productivity of the input ($\partial \phi_{it} / \partial a_{it-1} < 0$). Intuitively, the learning environment is too easy for the child so raising his/her skill will be detrimental for the productivity of this environment. Figure 1 illustrates this.

It is important to notice that the specification given in (2) nests the cases where child skill has only either a positive or a negative effect on the productivity of the learning input. In fact, if $\theta = 0$ and $\lambda < 0$, this effect is positive since $\partial \phi_{it} / \partial a_{it-1}$ is always positive (given $\gamma_2 > 0$ and $a_{it-1} > 0$). In contrast, if $\theta = 0$ and $\lambda > 0$, child skill has a negative effect on the productivity of the input since $\partial \phi_{it} / \partial a_{it-1}$ is always negative.

The idea that the capacity of a certain input to produce learning is maximized when there is a match between its complexity and the skill of the learner is consistent with several theories. For example, constructivism proposes that learning occurs within the child’s “zone of proximal development,” which reflects the ability of the child to understand the logic of the new concept (Fosnot and Stewart 2005).
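As a minimal numerical sketch of equations (2) and (3), the Python snippet below evaluates the productivity function and its derivative with respect to prior skill. The function names and the parameter values (gamma2 = 0.4, lam = 0.5, theta = 1.0, D = 1.0) are illustrative assumptions chosen for this example; they are not estimates from this study.

```python
import math


def productivity(a_prev, D, gamma2, lam, theta):
    """Equation (2): phi = gamma2 / exp(lam * (a_prev - theta * D) ** 2)."""
    mismatch = a_prev - theta * D
    return gamma2 / math.exp(lam * mismatch ** 2)


def skill_effect_on_productivity(a_prev, D, gamma2, lam, theta):
    """Equation (3): d(phi)/d(a_prev) = -phi * 2 * lam * (a_prev - theta * D)."""
    mismatch = a_prev - theta * D
    return -productivity(a_prev, D, gamma2, lam, theta) * 2.0 * lam * mismatch


# Hypothetical parameter values, for illustration only.
GAMMA2, LAM, THETA, D = 0.4, 0.5, 1.0, 1.0

for a_prev in (0.0, 0.5, 1.0, 1.5, 2.0):
    phi = productivity(a_prev, D, GAMMA2, LAM, THETA)
    slope = skill_effect_on_productivity(a_prev, D, GAMMA2, LAM, THETA)
    # Below the match (a_prev < THETA * D) the slope is positive,
    # at the match it is zero, and above the match it is negative.
    print(f"a_prev={a_prev:.1f}  phi={phi:.4f}  dphi/da={slope:+.4f}")
```

Under these assumed values the productivity peaks at gamma2 when prior skill equals theta * D, and setting lam = 0 makes it constant, which mirrors the homogeneous-effect case discussed in the text.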
Cognitive load theory (CLT) is another influential theory of learning that highlights the importance of achieving a balance between the complexity of the stimulus and the expertise of the learner. CLT is based on the notions that the human working memory has a limited capacity to process new information and that learning requires processing, organizing, and storing this new information in long-term memory (Leppink et al. 2015). Two important ideas behind this theory are (a) that the amount of cognitive load produced by an educational process depends on the difference between the complexity of the information being presented and the learner’s prior knowledge, and (b) that learning requires a certain amount of cognitive load but not too much (too little or too much cognitive load are both detrimental for learning). Combining these two ideas one deduces that, for a given degree of complexity of the information to be processed, learning is enhanced at a certain level of skill (an excess above this level or a deficit below it are both detrimental for learning). This last idea is captured in the way we model the productivity of the learning input ($\phi_{it}$; see fig. 1).

3. Data and Empirical Strategy

3.1. Data

The learning environment we consider for this analysis is the school. Following the model proposed above, this means that $i_{it}$ corresponds to the number of years of schooling accumulated by child $i$ between periods $t$ and $t-1$, and $\phi_{it}$ captures the effect of an additional year of schooling on his/her skill (i.e. the productivity of schooling). We allow this productivity to be individual specific and, in particular, we allow it to vary depending on the difference between the child’s prior skill attainment ($a_{it-1}$) and the complexity of the learning experiences offered to the child at school ($D_{it}$).

Our main estimations use the information of the Younger Cohort of the Young Lives Study in Peru.
In particular, we use rounds 2, 3, and 4 of the Child Survey collected between 2006 and 2013 and the School Survey collected in 2011. The Child Survey provides information on child skill and years of schooling. The School Survey provides the characteristics of the school attended by a sub-sample of the Young Lives children. The basic structure of these data is summarized in table 1. We use the sample of children who have complete cognitive test scores for rounds 2, 3, and 4 and attend a school that participated in the school survey (480 children in 125 schools).²

3.1.1. Child Skill

The availability of longitudinal information is key for our identification strategy (see the discussion in the section Empirical Strategy). In particular, we need to measure child skill in three different time periods (between preschool and school age) and use the same test to ensure that the scores reflect the accumulation of skill by the child in a way that is consistent with the model given in (1). The Young Lives Study in Peru offers the opportunity to do this only through the Peabody Picture Vocabulary Test (PPVT). The same test was applied to the children in rounds 2, 3, and 4. As shown in table 1, the scores correspond to ages 5 (preschool), 8 (grades 3 or 2), and 12 (grades 7 or 6). In fact, this is the only cognitive skill measure with longitudinal results for the Younger Cohort.

Table 1. Structure and Sample Sizes of the Relevant Young Lives Databases for the Younger Cohort

                              Child survey                                        School survey
                              Round 2               Round 3        Round 4
Year                          2006                  2009           2013           2011
Children’s age (years)        5                     8              12             10
Sample size (children)        2,052                 1,943          1,902          572 (132 schools)
Expected school attainment    None (in preschool)   Grade 3 or 2   Grade 7 or 6   Grade 5 or 4

Source: Rounds 2, 3, and 4 and the School Survey for the Younger Cohort of the Young Lives Study in Peru.
Note: The table presents the year, children’s age, sample size, and children’s expected school attainment for each of the three rounds of data considered, as well as for the School Survey.

The PPVT is a widely used test of receptive vocabulary and is designed to measure the acquisition of vocabulary from an early age (2.5 years) to adulthood. The task is to select the picture that best represents the word presented orally by the examiner. The items are arranged in order of increasing difficulty, and the score will depend on the number of correct answers provided until the test-taker makes a predetermined number of mistakes, after which the test stops. Several studies have shown that PPVT scores have a strong positive correlation with common measures of intelligence, such as the Wechsler and the McCarthy scales (Campbell, Bell, and Keith 2001). The Young Lives Study in Peru used the Spanish version adapted for Latin America (Dunn et al. 1986), which has 125 items. Supplementary online appendix fig. S1.1 shows the distribution of the raw test scores in the three survey rounds considered for our main analysis.

² The risk of selection bias due to this second condition is very small. Primary school attendance in Peru is close to 100 percent (only 0.7 percent and 0.3 percent of Young Lives Younger Cohort children were not attending school in round 3 and round 4, respectively), and schools participating in the school survey were randomly selected (Guerrero et al. 2012).
Table S1.1 shows the average scores obtained by children in different wealth quartiles between rounds 2 and 4. As expected, the average test score increases as the children grow older and accumulate more skill. In addition, PPVT scores exhibit a strong socioeconomic gradient which emerges early in the life of children and persists through their school years. This behavior has been documented for different cognitive skill measures in several studies such as Heckman (2006, 2007), Paxson and Schady (2007), and Schady et al. (2015). Please refer to Cueto et al. (2009) and Cueto and Leon (2012) for a detailed discussion about the reliability and validity of the PPVT scores provided by the Young Lives Study.

3.1.2. School Complexity

According to our framework, the productivity of a learning environment will depend on the difference between the skill of the child and the complexity of the interactions offered to the child. The smaller the difference, the larger the productivity. This idea is consistent with the TRL approach, which has been tested by measuring whether identifying students’ initial level of skill and offering interactions appropriate for that level can produce more learning than following a standard grade-appropriate school curriculum. These interventions, however, do not rely on an explicit measure of the complexity of the interactions offered by the school.

In this study, our objective is to directly assess the importance that the match between child skill and school complexity has for learning. Ideally, therefore, we need a measure of school complexity. Notice that, as expressed in (2), school complexity can vary both across time and across individuals, so we will assume that it has a time component ($\delta_t$) that captures increasing difficulty due to grade advancement and a cross-section component ($C_i$) that captures between-school heterogeneity within a given grade. In particular, we will assume that $D_{it} = \delta_t + \tau C_i$.
To approximate $C_i$, we will rely on the between-school differences in curriculum coverage. The basic idea is that, for a given grade-appropriate curriculum, the complexity of the learning interactions proposed to children at school is directly related to the number of topics covered in depth during classes. To measure this, we will rely on the Peruvian School Survey, which provides information about the proportion of the mathematics curriculum covered in depth in the classes attended by the Young Lives children. We will refer to this measure as curriculum coverage. Supplementary online appendix fig. S1.2 shows its distribution, and table S1.3 presents its main descriptive statistics.³

3.1.3. The Mismatch

As expressed in equation (2), the mismatch is given by the difference between the child’s skill and the school complexity. The presence of a time component in the school complexity variable requires estimating two additional parameters (the time component for rounds 4 and 3). To limit the number of parameters to be estimated and gain precision, our main results will rely on the demeaned versions of child skill and curriculum coverage.⁴

Let us define $\bar{D}_t = \delta_t + \tau \bar{C}$ and $\bar{a}_{t-1}$ as the cross-sectional average of school complexity and skill in $t-1$, respectively. Writing the mismatch as $m_{it} = a_{it-1} - \theta D_{it}$, the demeaned version of the mismatch is

\[
\begin{aligned}
m_{it} - (\bar{a}_{t-1} - \theta \bar{D}_t) &= a_{it-1} - \bar{a}_{t-1} - (\theta D_{it} - \theta \bar{D}_t) \\
&= a_{it-1} - \bar{a}_{t-1} - \theta(\delta_t + \tau C_i) + \theta(\delta_t + \tau \bar{C}) \\
&= (a_{it-1} - \bar{a}_{t-1}) - \theta\tau(C_i - \bar{C}).
\end{aligned}
\tag{4}
\]

Therefore, by using the demeaned versions of child skill ($a_{it-1} - \bar{a}_{t-1}$) and curriculum coverage ($C_i - \bar{C}$) one eliminates the time component and retains the mismatch under the assumption that the average skill is matched with the average degree of complexity ($\bar{a}_{t-1} = \theta \bar{D}_t$). This assumption implies that mismatches occur because of a misallocation of child skill or complexity within each grade and not because the education system is, on average, offering learning experiences that do not correspond to the average skill level of the children studying in a particular grade. In supplementary online appendix S4, we show in a simulation calibrated to our data that estimating the model using the demeaned version of the mismatch produces consistent estimates of the model’s parameters (see fig. S4.1 and table S4.1).

3.2. Empirical Strategy

Equation (1) is a linear model but with a heterogeneous effect and can be framed within the potential outcome framework with heterogeneous effects of the “treatment” $i_{it}$. One difference in our specification is that we are parameterizing the individual-specific effect as a function of the mismatch between the stock of skill of child $i$ and the complexity of the treatment (summarized in $D_{it}$). Therefore, we are interested not only in identifying a local average treatment effect but also in how the marginal effect of schooling evolves along the distribution of mismatches.

³ For the School Survey, the mathematics teachers of the Young Lives children were given a comprehensive list of topics and asked how many of them have been covered in class. The curriculum coverage corresponds to the proportion of these topics reported as covered in depth. We use the school average of this coverage if there is more than one Young Lives child attending the same school.

⁴ If one allows for a time component, collinearity between moment conditions significantly affects the precision of our estimates. In supplementary online appendix S3, we present the results obtained when one allows for a time component in $D_{it}$. Notice that parameter estimates are not significant, but the point estimates are similar to those we present in the main text (compare supplementary online appendix table S3.1 with table 2 in the main text).
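The demeaning step in equation (4) can be checked numerically. The sketch below is only an illustration: the skill and coverage values, the parameter values, and the function names are hypothetical and are not taken from the Young Lives data. It computes the demeaned mismatch once from the full definition of school complexity (time component plus tau times coverage) and once from the last line of equation (4), and confirms that the time component drops out.

```python
from statistics import mean


def demeaned_mismatch_full(a_prev, C, theta, tau, delta_t):
    """Mismatch a_prev - theta * D with D = delta_t + tau * C, minus its cross-sectional mean."""
    D = [delta_t + tau * c for c in C]
    m = [a - theta * d for a, d in zip(a_prev, D)]
    m_bar = mean(m)
    return [m_i - m_bar for m_i in m]


def demeaned_mismatch_short(a_prev, C, theta, tau):
    """Last line of equation (4): (a_prev - a_bar) - theta * tau * (C - C_bar)."""
    a_bar, C_bar = mean(a_prev), mean(C)
    return [(a - a_bar) - theta * tau * (c - C_bar) for a, c in zip(a_prev, C)]


# Hypothetical values for four children, for illustration only.
a_prev = [3.2, 3.9, 4.4, 4.1]    # log skill in the previous round
C = [0.35, 0.60, 0.80, 0.55]     # curriculum coverage of each child's school
theta, tau = 1.2, 0.8

short = demeaned_mismatch_short(a_prev, C, theta, tau)
for delta_t in (0.0, 1.5, 10.0):
    full = demeaned_mismatch_full(a_prev, C, theta, tau, delta_t)
    # The two versions agree for any value of the time component delta_t.
    print(delta_t, all(abs(f - s) < 1e-9 for f, s in zip(full, short)))
```

The check only illustrates the algebra of equation (4); whether the demeaned mismatch equals the mismatch itself still rests on the assumption stated in the text that average skill is matched with average complexity.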
Our empirical model is

\[ a_{it} = \gamma_1 a_{it-1} + \frac{\gamma_2}{\exp[\lambda(\tilde{a}_{it-1} - \pi C_i)^2]}\, i_{it} + \alpha_i + \mu_{it}, \tag{5} \]

where $\tilde{a}_{it-1}$ is the demeaned value of $a_{it-1}$, $C_i$ is the demeaned measure of curriculum coverage, and $\pi = \theta\tau$.

Our identification relies on a value-added specification that combines a fixed effect approach with an instrumental variable strategy. Following Andrabi et al. (2011), we take advantage of the fact that the Peruvian database possesses three waves of data with comparable PPVT scores to include an individual fixed effect in our empirical model. The fixed effects control for any unobservable characteristic that might influence skills and be correlated with the treatment ($i_{it}$), the stock of skill when the treatment takes place ($a_{it-1}$), or the complexity of the treatment ($C_i$). In general, we can expect a strong correlation between an observed input of skill and other unobserved influences both at the family and school levels. More affluent families are not only capable of purchasing more and better school inputs, but they are also capable of offering a more nurturing environment at home and during early childhood. The fixed effects in our panel data model will remove this source of heterogeneity insofar as these unobservable characteristics do not change between the last two waves of data. This is a reasonable assumption, as unobserved determinants of skill are likely related to family resources and preferences (which are typically stable over time), and the last two waves cover a relatively short period (four years). In addition, we implement an instrumental variable strategy to purge any remaining correlation between the treatment and $\mu_{it}$.

To gain intuition about how identification works, let us consider the case of a homogeneous effect of the input (i.e. $\lambda = 0$).
In this case, equation (5) takes the form of a linear dynamic panel model similar to the one studied in Andrabi et al. (2011):

\[ a_{it} = \gamma_1 a_{it-1} + \gamma_2 i_{it} + \alpha_i + \mu_{it}. \tag{6} \]

Using round 4 and round 3 of our database, we can eliminate the unobserved fixed effects by taking first differences of model (6):

\[ a_{i4} - a_{i3} = \gamma_1 (a_{i3} - a_{i2}) + \gamma_2 (i_{i4} - i_{i3}) + \mu_{i4} - \mu_{i3}. \tag{7} \]

From equation (7), we can clearly see that the identification of the treatment effect of one additional year of schooling (captured by $\gamma_2$) relies on $E[(\mu_{i4} - \mu_{i3}) \mid (i_{i4} - i_{i3})] = 0$. Almost all the students in our database have completed four years of schooling between rounds 4 and 3, which means that the variable $i_{i4}$ is practically constant across individuals ($i_{i4} = 4$). Given that all the students have taken the same schooling decision between rounds 4 and 3, we can infer that any heterogeneity across students will not explain this investment decision (i.e. $E[(\mu_{i4} - \mu_{i3}) \mid (i_{i4})] = 0$). Hence, all the cross-sectional variation for identifying $\gamma_2$ comes from the heterogeneity in the investment decision made between rounds 3 and 2. In our database, we can see that 36.7 percent of the sample have completed three years of schooling between rounds 3 and 2, whereas 60.4 percent of the sample have completed only two years in the same period. In terms of the potential outcome framework, we have a treatment group that received one more year of education than the other group (control group). We exploit these differences in schooling to learn about $\gamma_2$. In particular, by comparing the average change in the scores among the group of students that bought an extra year of schooling with the average changes in the scores among the group of students that did not pursue an additional year (controlling for the change in the score in the previous period) we can recover a homogeneous marginal effect of the input.
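To make the first-differencing logic of equation (7) concrete, the following minimal sketch simulates data from the linear model (6) and recovers $\gamma_1$ and $\gamma_2$ with an exactly identified instrumental variable estimator applied to the differenced equation. It is an illustration under stated assumptions and not the authors' estimator: the data are simulated rather than drawn from Young Lives, years of schooling enter in levels rather than logs, schooling between rounds 2 and 3 is assumed to follow a binary indicator `z` exactly, the instruments are taken to be the round 2 score and `z`, and all names and parameter values are hypothetical.

```python
import random


def simulate(n, gamma1, gamma2, seed=0):
    """Simulate three rounds of scores from model (6) with a child fixed effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)              # child fixed effect
        z = 1 if rng.random() < 0.4 else 0       # hypothetical binary instrument
        i3 = 3.0 if z else 2.0                   # schooling between rounds 2 and 3
        i4 = 4.0                                 # schooling between rounds 3 and 4
        a2 = alpha + rng.gauss(0.0, 2.0)         # round 2 score, correlated with alpha
        a3 = gamma1 * a2 + gamma2 * i3 + alpha + rng.gauss(0.0, 0.5)
        a4 = gamma1 * a3 + gamma2 * i4 + alpha + rng.gauss(0.0, 0.5)
        rows.append((a2, a3, a4, i3, i4, z))
    return rows


def iv_first_difference(rows):
    """Solve (W'X) b = W'y for equation (7), with instruments W = (a2, z)."""
    s11 = s12 = s21 = s22 = t1 = t2 = 0.0
    for a2, a3, a4, i3, i4, z in rows:
        y = a4 - a3      # differenced outcome; the fixed effect cancels
        x1 = a3 - a2     # differenced lagged score (correlated with the error)
        x2 = i4 - i3     # differenced schooling
        s11 += a2 * x1
        s12 += a2 * x2
        s21 += z * x1
        s22 += z * x2
        t1 += a2 * y
        t2 += z * y
    det = s11 * s22 - s12 * s21
    gamma1_hat = (t1 * s22 - s12 * t2) / det     # Cramer's rule for the 2x2 system
    gamma2_hat = (s11 * t2 - s21 * t1) / det
    return gamma1_hat, gamma2_hat


rows = simulate(20000, gamma1=0.3, gamma2=0.4, seed=1)
print(iv_first_difference(rows))
```

The round 2 score is used as an instrument because the differenced lagged score contains the round 3 disturbance, so ordinary least squares on equation (7) would generally be biased even after the fixed effect has been differenced out.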
It remains to see whether those differences in schooling decisions across groups at round 3 were exogenous to student characteristics that can influence their cognitive skills beyond their current stock of skills and a time-invariant student heterogeneity. Recall that model (6) includes a fixed effect; hence, any time-invariant unobserved factor that might influence the schooling decision, such as the parents’ background or initial skills, is already absorbed by the fixed effects.

What has driven the differences in schooling decisions at round 3? Our data reveal that the difference in school attendance at round 3 is mainly explained by the child’s birth date given the official entrance cutoff date, which we claim to be exogenous to unobserved factors once we control for a fixed effect. The official cutoff date for school entrance is July 31. In our sample, 96 percent of children who accumulated only two years of schooling up to round 3 were born after the official cutoff date in 2001, and 70 percent of children who accumulated three years of schooling were born before the official cutoff date in 2001. Based on the latter, we identify whether the child was born before or after the cutoff date ($z_i = 1$ if child $i$ was born before July 31 and 0 otherwise) and use this as an instrumental variable for $i_{i3}$ in equation (6). Date of birth provides a valid instrument since it fulfills the relevance condition, explaining most of the variation in schooling decisions at round 3, and is unlikely to be correlated with time-varying unobserved determinants that affect skills beyond the fixed effect.

Based on this instrumental variable approach and under the assumption that $\mu_{i3}$ is not serially correlated, we can consistently estimate $\gamma_2$ by a simple modification of the Arellano–Bond estimator (Arellano and Bond 1991) as in Andrabi et al.
(2011). It is important to remark that our identification assumptions are weaker than the Arellano–Bond assumptions, since Arellano–Bond assumes either a strictly exogenous regressor or predetermined regressors, whereas we allow our regressor of interest ($i_{i3}$) to be endogenous, with unrestricted correlation with past, contemporaneous, and future disturbances (i.e. $\mu_{it}$ for all $t$).⁵ With the data in hand, the IV-Arellano–Bond estimate of the homogeneous effect of schooling $\gamma_2$ is significant and equal to 0.41.

The non-linear dynamic panel specification we consider in equation (5) expands on the linear model above by allowing the marginal effect of schooling to be student-specific ($\partial a_{it} / \partial i_{it} = \phi_{it}$) and to depend on the mismatch (see equation (2)). As in the linear case with homogeneous effect, we can take first differences to eliminate the fixed effects from the estimation:

$$a_{i4} - a_{i3} = \gamma_1 (a_{i3} - a_{i2}) + \phi_{i4} i_{i4} - \phi_{i3} i_{i3} + \mu_{i4} - \mu_{i3}. \tag{8}$$

The identification of the parameters in (8) extends the arguments discussed above and relies on the assumption that $(z_i, a_{i2}, C_i)$ are independent of $\mu_{i4}$ and $\mu_{i3}$ once we control for the unobserved fixed effect $\alpha_i$. A key aspect of our empirical framework is to use observed heterogeneity in the mismatches to recover the parameters that govern the individual-specific effect of schooling. Note that in the absence of mismatches in the data (i.e. $\tilde a_{it-1} - \pi C_i = 0$), $\lambda$ is not identified and the model with homogeneous effect (model (6)) cannot be distinguished from the model with heterogeneous effects (model (5)). In supplementary online appendix S2, we describe the moment conditions employed by our non-linear Arellano–Bond estimator.

4. Results

Table 2 presents the estimates of the four parameters involved in (5). Notice that all parameter estimates are statistically significant. Parameter $\gamma_1$ captures the persistence of skill. Our results show that around 30 percent of skill is carried forward from one round to the next.
The estimated persistence parameter is in line with the results found in Andrabi et al. (2011), who use dynamic panel data techniques to estimate a linear version of the skill production function with unobserved heterogeneity. The Arellano–Bond estimates of the persistence parameter found by Andrabi et al. (2011) lie between 0.12 and 0.35 for different test scores.

Recall that $\gamma_2$ in the non-linear version is the maximum productivity of schooling. This is the productivity attained when there is a match between the skill of the child and the complexity of the school.

⁵ Note that, as in Arellano–Bond, we are assuming that lagged cognitive skill is a predetermined variable in the panel sense, which rules out correlation with present and future disturbances.

Table 2. Parameters Governing the Persistence of Skill and the Effect of Schooling

| $\gamma_1$ | $\gamma_2$ | $\lambda$ | $\pi$ | Mean $\hat\phi_{i3}$ | Mean $\partial\hat\phi_{i3}/\partial\tilde a_{i2}$ |
|---|---|---|---|---|---|
| 0.3098 | 0.5541 | 0.8472 | 7.1645 | 0.3305 | −0.0278 |
| (0.0443)*** | (0.1560)*** | (0.2216)*** | (1.2045)*** | (0.1877)* | (0.3055) |

Source: Authors’ estimations based on the data from rounds 2, 3, and 4 and the School Survey for the Younger Cohort of the Young Lives Study in Peru.
Note: The first four columns present the estimated values of the parameters involved in the regression $a_{it} = \gamma_1 a_{it-1} + \frac{\gamma_2}{\exp[\lambda(\tilde a_{it-1} - \pi C_i)^2]} i_{it} + \alpha_i + \mu_{it}$. The fifth column presents the estimated average productivity of schooling in round 3. The sixth column presents the estimated average effect of round 2 child skill on the productivity of schooling in round 3. Robust standard errors in parentheses. *** Significant at 1 percent. ** Significant at 5 percent. * Significant at 10 percent.

According to the results presented in table 2, a 1 percent increase in schooling can produce up to a 0.55 percent increase in skill.
Departures from the match will reduce this marginal effect. It is also worth noticing from table 2 that the estimated value of $\lambda$ is statistically different from zero. This means that there is heterogeneity in the effect of schooling, as $\lambda = 0$ would imply a constant productivity given by $\gamma_2$. In addition, notice that the parameter $\pi$ is also different from zero. Because π = θ τ, this implies that θ is also different from zero, which means that the data do not support a description where child skill has only a positive or only a negative effect on the productivity of schooling. In fact, because parameters $\gamma_2$ and $\lambda$ both have a positive sign, child skill will have a positive effect on the productivity of schooling when $\tilde a_{it-1} - \pi C_i < 0$ and a negative effect when $\tilde a_{it-1} - \pi C_i > 0$. This is consistent with the function depicted in fig. 1.

The fifth column in table 2 displays the average of the individual-specific effect of schooling for the entire sample, which is a consistent estimator of the mean of the individual-specific effect ($E[\phi_{i3}]$). This estimate (0.33) is smaller than the Arellano–Bond estimate of the linear model assuming a homogeneous effect (0.41). This is in line with the well-known result that imposing homogeneity induces bias in the estimation of the average effect in a random coefficient model.

The sixth column in table 2 displays the sample average of the derivative described in equation (3), which is a consistent estimator of $E[\partial \phi_{i3} / \partial \tilde a_{i2}]$. This estimate is not significantly different from zero, which means that, on average, child skill has no effect on the productivity of schooling. This, however, does not mean that this phenomenon is absent in our sample, but rather that both positive and negative effects are present and offset each other when computing the average.
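The sign pattern and the offsetting averages described above can be checked numerically. The sketch below codes the productivity function implied by the regression in the note to table 2, $\phi = \gamma_2 / \exp[\lambda(\tilde a - \pi C)^2]$, together with its derivative with respect to child skill, using the table 2 point estimates for $\gamma_2$ and $\lambda$. The mismatch values are hypothetical (chosen symmetric around zero for illustration) and are not taken from the Young Lives data; the function names are likewise illustrative.

```python
import numpy as np

GAMMA2 = 0.5541   # maximum productivity of schooling (table 2)
LAMBDA = 0.8472   # governs how fast productivity falls with the mismatch (table 2)


def productivity(mismatch, gamma2=GAMMA2, lam=LAMBDA):
    """phi = gamma2 / exp(lam * mismatch^2), where mismatch = skill - pi * C."""
    mismatch = np.asarray(mismatch, dtype=float)
    return gamma2 / np.exp(lam * mismatch ** 2)


def d_productivity_d_skill(mismatch, gamma2=GAMMA2, lam=LAMBDA):
    """Derivative of phi with respect to child skill: -2 * lam * mismatch * phi."""
    mismatch = np.asarray(mismatch, dtype=float)
    return -2.0 * lam * mismatch * productivity(mismatch, gamma2, lam)


# Hypothetical mismatches: negative = school too hard, positive = school too easy.
mismatch = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
phi = productivity(mismatch)
slope = d_productivity_d_skill(mismatch)

print("phi:       ", np.round(phi, 3))
print("slope:     ", np.round(slope, 3))
print("mean phi:  ", round(float(phi.mean()), 3))
print("mean slope:", round(float(slope.mean()), 3))
```

With mismatches placed symmetrically around zero, the individual slopes are non-zero but average out to zero, and the mean of $\phi$ lies below $\gamma_2$. This mirrors, qualitatively only, the pattern in the fifth and sixth columns of table 2.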
In addition to these results, we would like to evaluate how the productivity of a particular school changes with the skill attained during early childhood. For this, we fix the complexity of the school and use the functional form given in (5) and the parameter estimates given in table 2 to evaluate how the productivity of an additional year of schooling changes for different values of round 2 skill ($\tilde a_{i2}$). The results obtained for the median curriculum coverage (fixing $\hat\pi C_i = \hat\pi C_{50}$) are presented in fig. 2. The figure shows point estimates and a 95 percent confidence interval. Vertical lines indicate the cutoff values of the quartiles in the distribution of $\tilde a_{i2}$.

Figure 2 reveals that the productivity of schooling at the median degree of school complexity is a non-monotonic function of the stock of early childhood skill. For most of the first half of its distribution, child skill has a positive effect on the productivity of schooling. In fact, for all the children situated in the lower quartile of skill, a rise in their ability would enhance the productivity of the school input. According to the framework explained above, this rise in skill has a positive effect on productivity because it reduces the gap between the competence of the child and the complexity of the learning experiences offered at school. A child in the lower quartile of the initial skill distribution would experience a larger expansion in his/her skill (would learn more) from interacting with this environment if he/she brings more initial skill to this interaction.

This positive effect of early childhood skill on the productivity of schooling has a limit. In fact, at the median degree of curriculum coverage, productivity reaches the maximum of 0.55 percent at the 40th percentile of child skill. At this point, the gap between the skill of the child and the complexity of the school is zero. Higher values of child skill would reduce the productivity of schooling because they would increase the gap between the competence of the child and the complexity of the learning experiences offered at school.

Figure 2. Relation between the Productivity of Schooling and Early Childhood Skill at the Median School Complexity.
Source: Authors’ estimations based on the data from rounds 2, 3, and 4 and the School Survey for the Younger Cohort of the Young Lives Study in Peru.
Note: The term $\phi_{i3}$ is the productivity of schooling in round 3; $\tilde a_{i2}$ is the skill of children in round 2 (early childhood). Vertical lines indicate the cutoff values of the quartiles in the distribution of $\tilde a_{i2}$. The figure includes 95 percent confidence intervals. Standard errors were computed using the delta method.

In what follows, we replicate this exercise at the 25th and 75th percentiles of curriculum coverage. The results are presented in fig. 3. As expected, a negative effect of early childhood skill on school productivity dominates at the 25th percentile of curriculum coverage and a positive effect dominates at the 75th percentile. Relative to the skill attained during early childhood by the majority of children in our sample, the first school input is too simple. Raising children’s skill would increase the mismatch and reduce the effect of an additional year of schooling. The contrary is observed at the 75th percentile of curriculum coverage. This school input is more complex so, for the majority of children, raising their skill would close the mismatch and enhance the effect of schooling.

5. Evidence from Other Young Lives Countries

In this section we provide evidence of the heterogeneous effect of schooling for other developing countries in the Young Lives database.
In particular, we estimate a country panel data version of model (5), combining data from Vietnam and India.⁶ For $t = 3, 4$,

$$a_{ict} = \gamma_1 a_{ict-1} + \frac{\gamma_2}{\exp[\lambda (m^s_{ict})^2]} i_{it} + \alpha_c + \delta\, \mathit{preschool}_{ic} + \mu_{ict}, \tag{9}$$

where $m^s_{ict}$ is the mismatch measure based on the child’s self-perceived academic competence in country $c$, $\alpha_c$ is a country fixed effect, and $\mathit{preschool}_{ic}$ is a dummy variable that takes the value of 1 if child $i$ attended preschool for more than six months and 0 otherwise.

⁶ We could not include Ethiopia in this analysis because the School Survey of this country did not include information to measure school complexity or to approximate the mismatch through the children’s self-perceived academic competence.

Figure 3. Relation between the Productivity of Schooling and Early Childhood Skill at the 25th and 75th Percentiles of School Complexity.
Source: Authors’ estimations based on the data from rounds 2, 3, and 4 and the School Survey for the Younger Cohort of the Young Lives Study in Peru.
Note: The term $\phi_{i3}$ is the productivity of schooling in round 3; $\tilde a_{i2}$ is the skill of children in round 2 (early childhood). Vertical lines indicate the cutoff values of the quartiles in the distribution of $\tilde a_{i2}$. The figure includes 95 percent confidence intervals. Standard errors were computed using the delta method.

Table 3. Parameters Governing the Persistence of Skill and the Effect of Schooling in India and Vietnam

| $\gamma_1$ | $\gamma_2$ | $\lambda$ | $\pi$ | Mean $\hat\phi_{i3}$ | Mean $\partial\hat\phi_{i3}/\partial\tilde a_{i2}$ |
|---|---|---|---|---|---|
| 0.9999 | 0.7435 | 0.6789 | 0.4141 | 0.4149 | −0.0267 |
| (0.0200)*** | (0.0750)*** | (0.1943)*** | (0.0458)*** | (0.2522)** | (0.3634) |

Source: Authors’ estimations based on the data from rounds 2 and 3 and the School Survey for the Younger Cohort of the Young Lives Study in India and Vietnam.
Note: The first four columns present the estimated values of the parameters involved in the regression $a_{ict} = \gamma_1 a_{ict-1} + \frac{\gamma_2}{\exp[\lambda (m^s_{ict})^2]} i_{it} + \alpha_c + \delta\, \mathit{preschool}_{ic} + \mu_{ict}$. The fifth column presents the estimated average productivity of schooling in round 3. The sixth column presents the estimated average effect of round 2 child skill on the productivity of schooling in round 3. Robust standard errors in parentheses. *** Significant at 1 percent. ** Significant at 5 percent. * Significant at 10 percent.

There are two main differences with respect to the specification used in our baseline model. First, as opposed to the Peruvian case, the school surveys in India and Vietnam do not include a measure that reflects curriculum coverage, which prevents us from constructing the mismatch variables as a direct function of the distance between the child’s skill and his/her school complexity. For these countries, we directly consider a measure for the mismatch based on the child’s self-perceived academic competence at school. Arguably, a child’s self-assessment of his/her scholastic competence is directly related to the mismatch between his/her skill and the complexity of the interactions proposed to him/her at school. We argue that, on one hand, a low perceived academic competence is the consequence of a negative mismatch that favors school complexity. On the other hand, a high perceived competence is the reflection of a positive mismatch that favors the child’s skill. The School Surveys collected in India and Vietnam include four items that can be used to characterize the child’s self-perceived academic competence.
These items ask the child, on a scale from 1 to 4, whether he/she fully agrees (4) or disagrees (1) with statements such as “I always do poorly in tests” or “I can follow lessons easily” (please see table S5.1 in supplementary online appendix S5 for the complete list of items). We rescaled the scores coming from items that express difficulty or lack of competence (such as the first one above) so that a score of 4 reflects high self-perceived competence and 1 expresses low competence. Finally, the mismatch corresponds to the first principal component of these four scores. The distributions of the mismatches considered for India and Vietnam are presented in fig. S5.1 in supplementary online appendix S5.

The second difference with respect to Peru is that for India and Vietnam, we only have two rounds of data with comparable PPVT scores, preventing us from including individual fixed effects to control for unobserved heterogeneity as in our baseline model.⁷ Because of this, we introduce $\mathit{preschool}_{ic}$ as a control variable to deal with potential child heterogeneity that might affect skills and investment. Notice that, as in the Peruvian case, we estimate (9) after instrumenting the number of years of schooling with an indicator variable that identifies whether the child was born before the official school entrance cutoff date (July 1 in India and September 1 in Vietnam).

Table 3 presents the estimates of the four parameters involved in (9). Notice that all parameter estimates are statistically significant. The results displayed in table 3 align with the results discussed for the Peruvian case.⁸

⁷ Given that the model is dynamic, it would require at least three rounds of data for identification.

⁸ The estimated parameters are not statistically different from those estimated with the Peruvian data, except for the persistence parameter $\gamma_1$, which is close to 1 in the cross-country regression. An essential difference between model (9) and model (5) is the absence of a time-invariant fixed effect that can capture permanent shocks. In a panel with $T = 2$, a fixed effect model is observationally equivalent to a model without fixed effects and high serial correlation (Arellano 2003).

Figure 4. Relation between the Productivity of Schooling and the Mismatch in India and Vietnam.
Source: Authors’ estimations based on the data from rounds 2 and 3 and the School Survey for the Younger Cohort of the Young Lives Study in India and Vietnam.
Note: The term $\phi_{i3}$ is the productivity of schooling in round 3; $m^s_{i3}$ is the mismatch between child skill and school complexity, measured in round 3. Vertical lines indicate the cutoff values of the quartiles in the distribution of $m^s_{i3}$. The figure includes 95 percent confidence intervals. Standard errors were computed using the delta method.

As in the baseline model, $\gamma_2$ is the maximum value of the marginal effect of schooling. The estimated parameter in table 3 indicates that a 1 percent increase in schooling can produce up to a 0.74 percent increase in skill. Again, departures from the match will reduce this marginal effect. Importantly, table 3 shows that the estimated value of $\lambda$ is statistically different from zero, which implies heterogeneity in the effect of schooling. Because both parameters $\gamma_2$ and $\lambda$ have a positive sign, the effect of schooling will be increasing in $m^s_{ict}$ when $m^s_{ict} < 0$ and decreasing when $m^s_{ict} > 0$. The fifth column in table 3 shows that the average of the individual-specific effect of schooling for the entire sample is 0.41. Figure 4 displays how the estimated effect of an additional year of schooling changes for different values of the mismatch. The figure shows point estimates and a 95 percent confidence interval.
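To illustrate how delta-method bands of the kind shown in figure 4 can be obtained, the sketch below evaluates $\phi(m) = \gamma_2 / \exp(\lambda m^2)$ and an approximate standard error at a few mismatch values, using the table 3 point estimates and standard errors for $\gamma_2$ and $\lambda$. The covariance between the two estimates is not reported in the text, so it is set to zero here purely as an assumption; the resulting bands are illustrative and should not be expected to reproduce figure 4. The grid of mismatch values and the function name are also hypothetical.

```python
import numpy as np

GAMMA2, SE_GAMMA2 = 0.7435, 0.0750   # table 3
LAMBDA, SE_LAMBDA = 0.6789, 0.1943   # table 3
COV_GAMMA2_LAMBDA = 0.0              # ASSUMPTION: not reported, set to zero


def productivity_with_se(m):
    """Return (phi, se) at mismatch m via a first-order delta method."""
    m = np.asarray(m, dtype=float)
    phi = GAMMA2 * np.exp(-LAMBDA * m ** 2)
    # Gradient of phi with respect to (gamma2, lambda).
    d_gamma2 = np.exp(-LAMBDA * m ** 2)
    d_lambda = -(m ** 2) * phi
    var = (d_gamma2 ** 2 * SE_GAMMA2 ** 2
           + d_lambda ** 2 * SE_LAMBDA ** 2
           + 2.0 * d_gamma2 * d_lambda * COV_GAMMA2_LAMBDA)
    return phi, np.sqrt(var)


grid = np.linspace(-2.0, 2.0, 5)     # hypothetical mismatch values
phi, se = productivity_with_se(grid)
lower, upper = phi - 1.96 * se, phi + 1.96 * se
for m, p, lo, hi in zip(grid, phi, lower, upper):
    print(f"m = {m:+.1f}: phi = {p:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

At a zero mismatch the gradient with respect to $\lambda$ vanishes, so the standard error of $\phi$ reduces to that of $\gamma_2$; away from the match, uncertainty about $\lambda$ also contributes.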
Vertical lines indicate the cutoff values of the quartiles in the distribution of $m^s_{ict}$. Figure 4 confirms the non-monotonic relationship between the effect of the school and the mismatch that we found in the Peruvian case. For students who find the school input too complicated, the marginal effect of schooling will increase if their skill increases or the complexity of the school is reduced (when the mismatch goes from negative to zero). The impact of an extra year of schooling is at its maximum for students whose skill matches the school’s complexity. For students at this point, the marginal effect of schooling will decrease if their skill increases or the complexity of the school falls (when the mismatch goes from zero to positive values).

6. Concluding Remarks

We proposed a novel way of modeling the heterogeneous effects of schooling on child skill. For this, we employed the notion that learning (the productivity of schooling) is maximized when the level of skill already attained by the child matches the complexity of the learning experiences offered at school.

We tested this function using longitudinal information on cognitive test scores and schooling attained by a large sample of children living in three developing countries: Peru, India, and Vietnam. In particular, we found that the effect of schooling on learning is maximized at the match between child skill and school complexity and that mismatches in either direction are detrimental for learning. This constitutes direct evidence about the importance of tailoring the learning experiences offered at school to the degree of competence of the child, and provides external validity for the experimental results reported in the “teaching at the right level” literature (see Banerjee et al. (2016) and Muralidharan and Ganimian (2019)).
Bau (2022) relates the quality of schooling to the match between students’ instructional needs and a school’s instructional level. We complement this analysis by explicitly modeling this match as a function of children’s preschool skill and the curricular complexity of the school. In particular, using data from Peru, we found that the relation between the effect of schooling and early childhood skill can follow an inverted-U shape. Increasing early childhood skill will raise the productivity of the school up to the point where it matches school complexity. Further increases in child skill, however, will reduce the productivity of schooling as they will widen the mismatch. Based on these results, our framework can also serve to reconcile the empirical findings of the early childhood development literature in which increases in early childhood skill can have both a positive and a negative effect on the productivity of a learning input.

Data availability

The data used in this article are available online in the UK Data Archive at https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=2000060.

References

Agostinelli, F., and M. Wiswall. 2022. “Estimating the Technology of Children’s Skill Formation.” NBER Working Paper No. 22442. National Bureau of Economic Research.
Aizer, A., and F. Cunha. 2012. “The Production of Child Human Capital: Endowments, Investments and Fertility.” NBER Working Paper No. 18429. National Bureau of Economic Research.
Andrabi, T., J. Das, A. Khwaja, and T. Zajonc. 2011. “Do Value-Added Estimates Add Value? Accounting for Learning Dynamics.” American Economic Journal: Applied Economics 3(3): 29–54.
Arellano, M. 2003. Panel Data Econometrics. Oxford: OUP Oxford. (eds I. Guido, E. Grayham, P. Adrian and W. Mark).
Arellano, M., and S. Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” Review of Economic Studies 58(2): 277–97.
Attanasio, O., S. Cattan, E. Fitzsimons, C. Meghir, and M. Rubio-Codina. 2020. “Estimating the Production Function for Human Capital: Results from a Randomized Controlled Trial in Colombia.” American Economic Review 110(1): 48–85.
Attanasio, O., C. Meghir, and E. Nix. 2020. “Human Capital Development and Parental Investment in India.” Review of Economic Studies 87(6): 2511–41.
Banerjee, A., R. Banerji, J. Berry, E. Duflo, H. Kannan, S. Mukherji, M. Shotland, et al. 2016. “Mainstreaming an Effective Intervention: Evidence from Randomized Evaluations of ‘Teaching at the Right Level’ in India.” NBER Working Paper No. 22746. National Bureau of Economic Research.
Bau, N. 2022. “Estimating an Equilibrium Model of Horizontal Competition in Education.” Journal of Political Economy 130(7): 1717–64.
Campbell, J. M., S. K. Bell, and L. K. Keith. 2001. “Concurrent Validity of the Peabody Picture Vocabulary Test as an Intelligence and Achievement Screener for Low SES African American Children.” Assessment 8(1): 85–94.
Cueto, S., J. León, G. Guerrero, and I. Muñoz. 2009. “Psychometric Characteristics of Cognitive Development and Achievement Instruments in Round 2 of Young Lives.” Young Lives Technical Note 15. Young Lives.
Cueto, S., and J. León. 2012. “Psychometric Characteristics of Cognitive Development and Achievement Instruments in Round 3 of Young Lives.” Young Lives Technical Note 25. Young Lives.
Cunha, F., J. Heckman, and S. Schennach. 2010. “Estimating the Technology of Cognitive and Noncognitive Skill Formation.” Econometrica 78(3): 883–931.
Dunn, L. M., D. E. Lugo, E. R. Padilla, and L. Dunn. 1986. Test de Vocabulario en Imagenes Peabody. Circle Pines, MN: AGS Publishing.
Fosnot, C., and R. Stewart. 2005. Constructivism: A Psychological Theory of Learning. New York: Teachers College Press.
Glewwe, P., M. Kremer, and S. Moulin. 2009. “Many Children Left Behind? Textbooks and Test Scores in Kenya.” American Economic Journal: Applied Economics 1(1): 112–35.
Guerrero, G., J. León, S. Freire, S. Cueto, E. Rosales, M. Zapata, and V. Saldarriaga. 2012. “Young Lives School Survey in Peru: Design and Initial Findings.” Young Lives Working Paper No. 92. Young Lives.
Heckman, J. 2006. “Skill Formation and the Economics of Investing in Disadvantaged Children.” Science 312(5782): 1900–02.
Heckman, J. 2007. “The Economics, Technology, and Neuroscience of Human Capability Formation.” Proceedings of the National Academy of Sciences 104(33): 13250–55.
Heckman, J., and S. Mosso. 2014. “The Economics of Human Development and Social Mobility.” Annual Review of Economics 6: 689–733.
Leppink, J., T. van Gog, F. Paas, and J. Sweller. 2015. “Cognitive Load Theory: Researching and Planning Teaching to Maximise Learning.” In Researching Medical Education, edited by J. Cleland and S. J. Durning. New Jersey: Wiley-Blackwell.
Muralidharan, K., A. Singh, and A. Ganimian. 2019. “Disrupting Education? Experimental Evidence on Technology-Aided Instruction in India.” American Economic Review 109(4): 1426–60.
Paas, F., T. van Gog, and J. Sweller. 2010. “Cognitive Load Theory: New Conceptualizations, Specifications, and Integrated Research Perspectives.” Educational Psychology Review 22(2): 115–21.
Paxson, C., and N. Schady. 2007. “Cognitive Development among Young Children in Ecuador: The Roles of Wealth, Health, and Parenting.” Journal of Human Resources 42(1): 49–84.
Schady, N., J. Behrman, M. C. Araujo, R. Azuero, R. Bernal, D. Bravo, F. Lopez-Boo, et al. 2015. “Wealth Gradients in Early Childhood Cognitive Development in Five Latin American Countries.” Journal of Human Resources 50(2): 446–63.