WPS7429


Policy Research Working Paper                           7429




      Are Public Libraries Improving Quality
                  of Education?
     When the Provision of Public Goods Is Not Enough

                                Paul Rodriguez-Lesmes
                                 Jose Daniel Trujillo
                                 Daniel Valderrama




Poverty Global Practice Group
September 2015


  Abstract
 This paper analyzes the relation between public, education-related infrastructure and the
 quality of education in schools. The analysis uses a case study of the establishment of two
 large, high-quality public libraries in low-income areas in Bogotá, Colombia. It assesses the
 impact of these libraries on the quality of education by comparing national test scores
 (SABER 11) for schools close to and far from the libraries before (2000–02) and after
 (2003–08) the libraries were opened. The paper introduces a Blinder-Oaxaca decomposition on
 difference-in-differences estimates to assess whether variation of traditional determinants
 of mathematics, verbal, and science test scores explains the estimates. The analysis finds
 differences that are not statistically different from zero that could be attributed to the
 establishment of the libraries. These results are robust to alternative specifications, a
 synthetic control approach, and an alternative measure of distance.




  This paper is a product of the Poverty Global Practice Group. It is part of a larger effort by the World Bank to provide
  open access to its research and make a contribution to development policy discussions around the world. Policy Research
  Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at
  p.lesmes.11@ucl.ac.uk, jdtrujillos@dane.gov.co and dvalderramagonza@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
       Are public libraries improving quality of education?
             When the public good provision is not enough∗

                                      Paul Rodríguez-Lesmes
                            PhD-Student, Department of Economics UCL


                                        José Daniel Trujillo
                                          Consultant, DANE


                                         Daniel Valderrama
                                       Consultant, World Bank




Keywords Libraries; Quality of education; School quality; Public good provision.
JEL codes D62, I21, H52.

   ∗ This project was funded by the ICFES under a research grant for graduate students in 2010. We express
our gratitude to Hugo Ñopo, Andrés García and Ali Sharman, and acknowledge valuable comments from ICFES seminars.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.
They do not necessarily represent the views of ICFES or of the other institutions with which the authors
are affiliated. This paper has been accepted and published as an article in the journal Desarrollo y
Sociedad, No. 74, pp. 225-274.


1    Introduction
Facilitating public access to information, the traditional primary function of libraries, is
being challenged by the information revolution. However, public libraries serve multiple
functions beyond their role in disseminating materials. A big movement of public library
construction undertaken in the developing world reﬂects these functions by emphasizing
libraries as the center of social transformation in deprived slums, providing the general
population, especially the less well-oﬀ, with access to meeting spaces, cultural activities,
technology, and information services, among others. For example, impressive (and expensive),
massive public libraries were constructed in the most impoverished areas of Medellín
(Colombia), in zones with high crime rates, and Bogotá (Colombia). These libraries are
not only places where you can ﬁnd books or magazines for free, but also places oﬀering a
wide range of services which are intended to motivate the general public towards culture
and education and, ultimately, to change the living conditions of the people.
    The goal of this paper is to establish the impact on the quality of education of the 2001
construction of two of these massive libraries (from here on mega-libraries) in the city of
Bogotá (Colombia). Even public schools provide services to a selected group of students,
thus they can be considered a private asset in a sense. Public libraries, however, are
available to students from diﬀerent schools. Thus, this study will tell us something about
the possible eﬀect of truly public, education-related infrastructure on quality of education.
It is also possible to assess latent complementarities between public (libraries) and private
(schools) educational services in enhancing quality education by estimating the eﬀect of
libraries on the returns that certain school characteristics have on education. In other
words, the paper studies how public libraries aﬀect the quality of education and to what
extent this could be through the enhancement of services provided by schools.
    This paper contributes a new perspective to the literature on the determinants of qual-
ity of education. This literature is generally limited to the use of private characteristics
from the school and from the family to explain diﬀerences in student performance. By
widening the perspective of determinants beyond the walls of the school and the house,
this paper contributes to the education literature, looking towards public goods that are
around the schools and which could be used to enhance the impact of schools’ inputs. At
the same time, considering that the main objective of libraries is not their direct influence
on quality of education in schools, this paper contributes to the urban economics litera-
ture by analyzing the existence of externalities and complementarities between this kind
of public infrastructure and schools or households near to the libraries.
   The causal eﬀect of access to public libraries on student academic performance is as-
sessed using a Diﬀerence-in-Diﬀerences (DiD) methodology, combined with propensity score
matching as a robustness test of the results. The procedure takes advantage of the spatial
location of the libraries: the first, El Tunal, was constructed on the grounds of a public
park, and the second, El Tintal, in an old garbage processing plant. We compare the
average standardized test scores obtained at the end of secondary school (SABER 11) by schools
close to the libraries with those of schools far from them from 2000 to 2008, that is,
before and after the libraries’ opening. This concept is implemented under both paramet-
ric and nonparametric speciﬁcations of the relationship between distance to the library
and test scores. We also implement Oaxaca-Blinder decomposition of the impact of the
program on the quality of education to explore the possible improvement via the variation
of traditional inputs of education quality.
   Given our speciﬁcation, we are considering both the direct and indirect impacts that the
libraries could have on student performance. Direct impact might come from the possibility
that students living close to libraries access library services and programs independently
or that nearby schools deliberately take advantage of the library for their own activities.
Indirect eﬀects might come from the impact of the renovation of the public infrastructure
on the area which could improve crime perceptions, the general mood of the population, or
other neighborhood eﬀects. Due to the lack of information on students’ actual residences
or on speciﬁc school programs which take advantage of the libraries, we cannot assess these
channels separately.
   Our main results show that while the relationship shows the expected positive sign,
results are not statistically signiﬁcant. This either tells us that the libraries are not fully
exploited by schools or that the possible gains are concentrated among particular types of
individuals. This opens the question of how aligned incentives are to foster cooperation
between schools and public libraries in order to improve the quality of education. Perhaps
it is not enough to construct beautiful and well-equipped libraries that are near to schools;
a second generation of policies might be required to enhance the coordination between these
libraries and the current educational environment of neighborhood schools and households.
    The remainder of this paper is organized as follows. Section 2 discusses the theoretical
links between libraries and quality of education. Section 3 describes the program
and its context. Section 4 presents data on quality of education and other controls. Section
5 discusses the identification strategy and the decomposition of the effect. Section 6 presents
the results, and Section 7 concludes.


2    Libraries and academic performance
Vegas and Petrow (2008) classify determinants of education into demand-based and supply-
based components. Both groups include tangible and intangible inputs deﬁned by students’
access to private facilities or their environments. For instance, on the demand side, im-
portant inputs include an environment, deﬁned by parental characteristics, that promotes
study (Fertig and Schmidt, 2002; World Bank, 2005) and the availability of educational
resources in the household, like books or well used internet (Murnane et al., 1981; Gamboa
et al., 2010; Blomeyer et al., 2009). On the supply side, libraries are included as physical
infrastructure along with other, intangible, inputs which are generally considered more im-
portant, such as educational policy which incentivizes competence in schools and teacher
quality (Hanushek and Wößmann, 2007).
    Focusing on the impact of libraries on education beyond the ’infrastructure’ component
of schools, Lance (1994) in a largely descriptive study of improvements in school perfor-
mance that are associated with libraries in Colorado, shows a relationship between the
availability of libraries and speciﬁc skills such as reading, writing and critical thinking.
Similar relationships are discussed in Lance and others’ further research of libraries in the
United States (Lance, 1994; Lance et al., 2000; Rodney et al., 2002) and the United King-
dom (Williams et al., 2001). Lonsdale (2003) provides a review of studies linking libraries
to educational outcomes, such as Smith (2001), which argues that libraries improve school
performance by 4%. However, this literature does not involve causal analysis; it rests on
correlations and qualitative analysis.
    In terms of proper causal analyses, few in the literature analyze libraries themselves.
The most relevant literature analyzes the impact of programs which make learning materials
more available in schools on educational outcomes. These learning materials, a traditional
part of library services, are: textbooks (Glewwe et al., 2009), ﬂipcharts (Glewwe et al.,
2004) and computers in schools (Barrera-Osorio and Linden, 2009). Across programs, each
with its own particularities, the authors find no impact of the respective learning material on
the quality of education received by the average student.1 However, these evaluations do
not consider the joint eﬀect derived from the interaction of these learning materials, an
eﬀect that could be captured in an analysis of public libraries given that these institutions
provide learning materials simultaneously.
       Borkum et al. (2013) is the only study found that explores the role of libraries on
educational outcomes. In an evaluation of an educational program in Bangalore, India
that provides high quality libraries to public primary schools, the authors ﬁnd no impact
of school libraries on scores of diﬀerent subjects and on dropout rates. Given that this study
does not consider public libraries and, most importantly, the type of public libraries that
we are considering (mega-libraries), the present study is the first to present evidence
on the causal effect of public mega-libraries2 on educational outcomes within impoverished
areas in a developing country.
       We propose that the production function of education quality for school i, Yi , in urban
areas includes not only the demand characteristics that it faces, X1,i , and private supply (in
this case, schools) characteristics, X2,i , but also the beneﬁt from public, education-related
facilities Zi (equation 1). This additional input acts as a complement to the education
provided by schools. Assuming that these institutions do have a positive impact on the
skills related to test-scores of their users, the relationship between Z and Y might vary
according to the interaction between both the demand and supply elements related to using
the public, education-related facilities. In other words, the impact of public, education-
related facilities on quality of education depends on the degree to which both families
directly use them and schools facilitate their use.3 Let us consider two examples: ﬁrst, for
school managers who obtain more beneﬁts for promoting activities related to a particular
public facility than others, Z might be larger; second, families living far from public facilities
are less likely to benefit from them due to credit or time constraints, which will be reflected
in a lower value of Z than for those who live close by.

   1 In an evaluation of the impact of textbooks on student achievement, Glewwe et al. (2009) find a
localized positive effect on those students who already had relatively high achievement.
   2 Mega-libraries are not just large buildings full of learning materials but represent a catalyst for
redevelopment of urban zones and repositories of new public spaces.
   3 Positive returns to higher levels of school quality based on facility use in Colombia are expected for
families (Gamboa and Rodríguez-Lesmes, 2014). However, it is not clear that all schools have the same
incentives (Gaviria and Barrientos, 2001).


$$
Y_i = f\big(X_{1,i},\, X_{2,i},\, Z_i(X_{1,i}, X_{2,i})\big) \tag{1}
$$

    Our data cover only one kind of public, education-related facility (the mega-libraries)
with which to calculate Z, and we do not have information about the relation between
schools or households and libraries, so we cannot disentangle the relationship between Z and
Y at the level of detail just explained. Given these data restrictions, we use the
proximity of schools to the libraries as a proxy for Z.
    To link schools and libraries, we use the distance between them as a measure of
intensity. That is, we identify the difference δ between being close to and being far from
the public facility by assigning a discrete value of T = 1 if a school
is within a close range of a library and T = 0 if the school is outside of this range. Our
main assumption is that if a school is far enough away from the public facility, its students
do not receive any benefit from it (Z = 0, as shown in Equation 2).

$$
\begin{aligned}
\delta &= E[Y_i \mid T = 1, X] - E[Y_i \mid T = 0, X] \\
       &= f\big(X_{1,i}(T=1),\, X_{2,i}(T=1),\, Z(X_{1,i}(T=1), X_{2,i}(T=1))\big) \\
       &\quad - f\big(X_{1,i}(T=0),\, X_{2,i}(T=0),\, Z(X_{1,i}(T=0), X_{2,i}(T=0))\big) \\
       &= f\big(X_{1,i}(T=1),\, X_{2,i}(T=1),\, Z(X_{1,i}(T=1), X_{2,i}(T=1))\big) \\
       &\quad - f\big(X_{1,i}(T=0),\, X_{2,i}(T=0),\, 0\big)
\end{aligned}
\tag{2}
$$
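
The discrete treatment indicator T can be illustrated with a minimal sketch. It assumes that the cutoffs shown in the legend of Figure 2 (treatment within 1500, control between 1500 and 3500) are measured in meters and apply to the distance to the nearest library; the function names and the coordinates are hypothetical and are not taken from the paper.

```python
import math

# Cutoffs in meters, read off the legend of Figure 2. Treating them as the
# operative thresholds is an assumption of this sketch.
TREATMENT_RADIUS_M = 1500.0
SAMPLE_RADIUS_M = 3500.0


def euclidean_distance(a, b):
    """Straight-line distance between two (x, y) points in projected meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def treatment_status(school_xy, libraries_xy):
    """Return 1 (treated), 0 (control) or None (outside the study area)."""
    nearest = min(euclidean_distance(school_xy, lib) for lib in libraries_xy)
    if nearest <= TREATMENT_RADIUS_M:
        return 1
    if nearest <= SAMPLE_RADIUS_M:
        return 0
    return None


# Hypothetical coordinates, for illustration only.
libraries = [(0.0, 0.0), (8000.0, 0.0)]
print(treatment_status((600.0, 800.0), libraries))    # 1000 m from a library
print(treatment_status((3000.0, 0.0), libraries))     # 3000 m from the nearest
print(treatment_status((4000.0, 4000.0), libraries))  # beyond 3500 m from both
```

Schools for which the function returns None fall outside the estimation sample in this sketch, which mirrors the restriction to schools within a fixed range of the libraries.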


3    BibloRed program and Colombian schools
BibloRed is a program that Bogotá's local administration designed in 1998 and put into
operation by the end of 2001. The idea was to give the general population access to
information services and reading and writing resources. However, the program also seeks
to foster cultural growth and promote research. In the first stage, the operation started
with 3 major libraries (El Tunal, El Tintal and Virgilio Barco), 15 minor libraries and
1 bibliobus; almost ten years later another major library started operations (Julio Mario
Santo Domingo). Each major library has an area of around 10,000 square meters, 150,000
volumes and 600 reader seats (Tolosa, 2012). Information services not only include books
and magazines, but also children’s rooms with specialized staﬀ, programs for babies and
their parents, activities for teens, workshops in literature, puppets, etc. The intention
is to attract the public with these activities while integrating education into them. One
of the main projects occurs over holidays, when BibloRed implements Bibliovacaciones, a
program with the activities mentioned plus cost-free art, history and literature exhibitions
such as theater plays and ﬁlms. In this context, it is evident that these libraries have
many activities which enhance the quality of life, particularly through their integration
of culture; thus, the possible eﬀect on the educational performance of children and young
people is just one of the multiple beneﬁts that libraries bring to society.
      Since it is not possible to have information on which of the test-takers actually use
the libraries, we propose to use the distance of libraries to their schools as an alternative
indicator for treatment status. As discussed in the previous section, this rests on the
assumption that the use of libraries is likely to be higher for those living closer than for
those who live far, supported by travel costs to libraries incurred by the latter which
reduce students’ incentives to visit them frequently. According to Table 1, 77% of students
in Bogotá live less than 20 minutes from the school they attend. As a result, it is a fair
assumption that distance from school to the library approximates the distance from the
library to students’ residence and, therefore, the likelihood that they live in an environment
aﬀected by libraries.
      The Euclidean distance between the school and the local library is shown in Figure 1. We
calculate it based on the information on the spatial location of each school as specified by
Bogotá's Department of Education. Alternatively, we use road-based distances4 as shown
in Figure 2. Figure B.1 presents the link between both distances. As expected, the road-based
distances all fall above the blue line corresponding to the 45-degree line. The black
dotted line is the predicted linear relationship between both measures, which captures up
to 80% of the total variation. As a robustness check, the main estimators are repeated
using the ﬁtted distance.5
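
The fitted distance used in this robustness check can be sketched as follows. The sketch regresses road distance on Euclidean distance by OLS and then rescales each road distance as (RD − β̂0)/β̂1. The function names and the four data points are hypothetical and serve only to show the arithmetic.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def adjusted_road_distance(rd, b0, b1):
    """Rescale a road distance to the Euclidean scale: (RD - b0) / b1."""
    return (rd - b0) / b1


# Hypothetical distances in meters; here RD = 200 + 1.2 * ED holds exactly.
ed = [500.0, 1000.0, 1500.0, 2000.0]
rd = [800.0, 1400.0, 2000.0, 2600.0]

b0, b1 = ols_line(ed, rd)
adjusted = [adjusted_road_distance(r, b0, b1) for r in rd]
print(b0, b1)
print(adjusted)
```

Because the toy data lie exactly on a line, the adjusted road distances coincide with the Euclidean ones; with real data the two differ by the rescaled regression residual.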
  4 These calculations were made using ESRI ArcMap 10.2 Closest Facility Analysis. The road network
was obtained from the OpenStreetMap project (OSM).
  5 More explicitly: $\mathrm{AdjustedRD} = (RD - \hat{\beta}_0)/\hat{\beta}_1$, where the $\hat{\beta}$ come from
the OLS regression of road distance RD on the Euclidean distance ED: $RD = \beta_0 + \beta_1 ED + u$.



Figure 1: Libraries and treatment status allocation: euclidean distance




           Figure 2: Libraries and treatment status allocation: road distance




   [Map: El Tintal and El Tunal libraries. Legend: Libraries; Schools; Zones (Roads); Treatment (1500); Control (1500-3500).]




   El Tintal and El Tunal libraries are located in middle-low income zones, where most of
the students attend nearby schools. Schools near to Virgilio Barco and Julio Mario Santo
Domingo are populated by, on average, wealthier families which are more likely to live far
from school and use private transport for the daily commuting. If we include the last two
libraries, our approximation of taking the distance between the library and the school to
represent the treatment status will not be accurate. As a result, we decided to include only
El Tintal and El Tunal libraries in this analysis.
       In Colombia, schools can be classiﬁed according to four important characteristics that
are closely related with the quality of education in the literature. These characteristics
are: whether the school is managed by the government, the proportion of females to males
attending the school, the start of the academic year and the length of the school day. In
regards to the ﬁrst characteristic, most of the students who would demand the services
of libraries are part of the government-managed education system. Public schools are
free at the primary level and have low tuition fees at the secondary level, but provide a
lower quality of education than private schools (Núñez et al., 2002).6 In regards to the
second characteristic, the fact that some parents may prefer speciﬁc types of education
such as religious institutions or gender-speciﬁc schools could be correlated with demand-
side factors. With respect to the start of the academic year, schools can be calendar A or
calendar B, which means they start in January or August, respectively. While calendar
A is the norm, calendar B schools are typically private institutions usually designed in
order to follow European or US schedules. This typically means that calendar B schools
have higher test scores due to the strong selection related to the high income of students’
families. Finally, schools can serve students for a full school day (12 hours) or implement
double-shifts, with some students coming in the morning and others in the afternoon.7
Double-shifting is usually associated with lower academic results in the Latin American
context as documented by Bonilla-Mejía (2011).
   6 A small number of public schools are managed by the private sector and seem to follow a different
pattern (Sarmiento et al., 2005). None of them is close enough to our libraries.
   7 Other schools include night shifts or weekend shifts, but we will not consider them. Typically, these
institutions are intended for young adults who want to finish their secondary education after dropping out;
thus the education incentives and the environment are totally different from those of a typical student.




4     Data

4.1   Quality of education data

Our measure of education quality is the Colombian equivalent to the SAT, the SABER 11
test administered by the ICFES (Colombian Institute for Evaluation of Education) which
is part of the Ministry of Education. It includes a comprehensive evaluation of diﬀerent
areas of knowledge, speciﬁcally mathematics, verbal and sciences (biology, physics and
chemistry). The test is carried out twice per year due to the existence of two main school
calendars. Though it is not compulsory for graduation, universities require it for
admission and use it as a common filter for selecting their new students. In order
to ensure comparability, test results are standardized by wave at the Bogotá level in each
one of the described subject areas and an average is taken of the scores (called here the
general result).
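
The standardization just described can be sketched as follows. Scores are converted to z-scores within each test wave and subject, and the general result is the average of the three z-scores. The record layout, the field names, and the use of the population standard deviation are assumptions of this sketch, and the scores are invented.

```python
from collections import defaultdict
from statistics import fmean, pstdev

SUBJECTS = ("math", "verbal", "science")


def standardize_by_wave(records):
    """Return copies of the records with per-wave z-scores and their average."""
    by_wave = defaultdict(list)
    for rec in records:
        by_wave[rec["wave"]].append(rec)

    result = []
    for rows in by_wave.values():
        # Mean and population standard deviation of each subject in this wave.
        moments = {
            s: (fmean(r[s] for r in rows), pstdev([r[s] for r in rows]))
            for s in SUBJECTS
        }
        for r in rows:
            z = {f"z_{s}": (r[s] - moments[s][0]) / moments[s][1] for s in SUBJECTS}
            result.append({**r, **z, "general": fmean(z.values())})
    return result


# Hypothetical school-level scores for two waves.
records = [
    {"school": "A", "wave": "2000-1", "math": 40.0, "verbal": 45.0, "science": 30.0},
    {"school": "B", "wave": "2000-1", "math": 50.0, "verbal": 50.0, "science": 50.0},
    {"school": "C", "wave": "2000-1", "math": 60.0, "verbal": 55.0, "science": 70.0},
    {"school": "A", "wave": "2000-2", "math": 48.0, "verbal": 52.0, "science": 41.0},
    {"school": "B", "wave": "2000-2", "math": 56.0, "verbal": 58.0, "science": 47.0},
]
standardized = standardize_by_wave(records)
for row in standardized:
    print(row["school"], row["wave"], round(row["general"], 3))
```

Standardizing within each wave removes differences in test difficulty and scale between sittings, so that only a school's relative position in the score distribution is compared over time.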
    Tables 2 and 3 show average, standardized test scores of schools according to their
characteristics, including only the universe of schools used in the estimation, specifically,
Bogotá schools located within a 3.5 km range around the libraries as shown in Figure
1. Table 3 shows that students attending schools with a full-day schedule score higher,
on average, than students attending double-shift schools. Among the latter, the students
attending school in the morning score higher, on average, than those attending schools in
the afternoon. This is related to the management of the school: students attending schools
managed by the government typically do worse than those attending schools managed by the
private sector, which are normally private institutions. These relationships are stable over time and a
common factor in the Colombian quality of education literature (Gaviria and Barrientos,
2001). Table 2 shows that there are also diﬀerences in test scores between students who
attend diﬀerent types of schools in terms of school size, the teacher-student ratio, the
female-male student ratio, and teacher education level. These are all traditional inputs of
education that we will discuss further in the next section.
    Table 4 shows a U-shaped relationship between school quality and distance to the
libraries. Schools close to the libraries are normally better than those at a medium-range
distance (1 km to 2.5 km), but worse than or similar to those far away (2.5 km to 3.5
km). As this relationship might be driven by the allocation of inputs, our next section
will analyze them in more detail.


4.2       Other variables and data restrictions

In order to take into account other sources of variation that might be correlated with
distance to the libraries, we take into account variables that the literature has identiﬁed
as key determinants of the quality of education. Variables used to control for institutional
characteristics come from the C600 (a registry of students and school staﬀ) and C100 (a
registry of school infrastructure) from the Ministry of Education. Neighborhood controls
are derived from the General Population Census of 2005 conducted by DANE (national
statistics department). The relationship of these variables to our measures of quality of
education is described in Table 3.
   Though C100 information is only available starting from 2002, it provides valuable
information on the physical infrastructure of schools. It includes data on sports facilities,
the presence of a school library and a measure of the quality of educational assets, a dummy
which is one if the school has simultaneously computer, physics and chemistry labs. From
the C600 form we introduce several time-varying variables per school which are related to
the supply-side of quality of education. First, we take into account the number of students
per school in a logarithmic scale and the teacher-pupil ratio of the school. Larger schools
are correlated with better results. To provide us with an idea of the overall quality of
the facilities, we include the area in square meters of classrooms and sports facilities per
student. We also take into account the proportion of teachers with a graduate degree as a
proxy of their human capital. As the public sector incentivizes the concentration of teachers
with more qualifications, its relationship with quality seems to be negative as described by
Núñez et al. (2002). Gender differences might be relevant, so we include the proportion of
female students and teachers. Finally, we include some controls specific to the examined
cohort: its size and the ratio of female test-takers. These data were cleaned by removing
schools with a teacher-student ratio greater than 0.5 (one teacher for every two students)
or equal to 0 (no teachers), as these ratios indicate that the data may contain
errors.
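
The cleaning rule on the teacher-student ratio can be written as a simple filter. This is a minimal sketch: the field names and the four schools are hypothetical, and dropping schools with no recorded students is an added assumption, since the ratio is undefined in that case.

```python
MAX_TEACHER_STUDENT_RATIO = 0.5  # one teacher for every two students


def has_plausible_ratio(teachers, students):
    """True when the teacher-student ratio lies in the interval (0, 0.5]."""
    if students <= 0:
        # Assumption of this sketch: the ratio is undefined, so the school is dropped.
        return False
    ratio = teachers / students
    return 0 < ratio <= MAX_TEACHER_STUDENT_RATIO


# Hypothetical school records.
schools = [
    {"id": "s1", "teachers": 20, "students": 500},  # ratio 0.04, kept
    {"id": "s2", "teachers": 0, "students": 300},   # ratio 0, dropped
    {"id": "s3", "teachers": 30, "students": 40},   # ratio 0.75, dropped
    {"id": "s4", "teachers": 10, "students": 20},   # ratio 0.5, kept
]
clean = [s for s in schools if has_plausible_ratio(s["teachers"], s["students"])]
print([s["id"] for s in clean])
```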
   Finally, neighborhood-level controls are available at the census block level from 2005.
We averaged the information of the blocks which were at least 50 meters from the school.
These controls are the average age and the share in the block of the population who are
students, who have at most primary education, who immigrated from other municipalities
and from rural areas during the last 5 years, who are of working age, who are working or
looking for a job and who fasted for one week.
      Tables 5 and 6 report for diﬀerent ranges of distance to the library (column 1) the
number of schools-students (column 2) and the number of schools-students used in the
model (column 3), respectively.8 The difference between columns two and three is due to
information gaps in C600. Hot Deck imputation was used to minimize the
number of missing values, following the implementation of Báez and Buitrago (2010) based on
Ñopo's (2008) idea about donors and receptors.


4.3     Test scores and distance to the libraries

After observing the data on the relationships between some features of the campus and
the quality of education, and considering the causal impact that the literature attributes
to these features, it is prudent to identify whether the location of the libraries is correlated
with the type of schools. Table 7 addresses this question by calculating the average charac-
teristics of schools that are located in diﬀerent ranges from the nearest mega-library. The
main observation is that the nearest schools are more likely to be public. As public schools
tend to have lower test scores (Núñez et al., 2002; Gaviria and Barrientos, 2001), the
correlation between education quality and the distance of the libraries is negative. A first
approach to the impact of libraries on test scores is to explore the score-distance
relationship after deducting the impact of variation of common determinants from the score.
For this, we turn to a classic semi-parametric model. A partial linear regression allows us
to see a non-linear relationship as presented in Equation 3.9 In it, Y is the score, d is the
distance to the library, X is the controls, and u is an error such that E[u|d, X] = 0.
Figure B.2 shows the estimates m̂(d), which give the relationship between the score and the
distance after discounting the usual controls.


                                     Y = m(d) + Xβ + u                                       (3)
  8
   In the case of public institutions, a school is defined as a seat-day combination (a given campus in a given school day).
  9
   The estimation follows the differencing algorithm of Yatchew (1997), as implemented by
Lokshin (2006).
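To illustrate the differencing idea behind Equation 3, the following is a minimal sketch, not the Stata routine cited in footnote 9. It assumes a single control variable, simulated data with a U-shaped m(d), and a Gaussian kernel smoother for the second step; the function names, the bandwidth and the simulated values are all hypothetical.

```python
import math
import random

def yatchew_beta(d, x, y):
    """First-order differencing estimate of beta in y = m(d) + x*beta + u."""
    order = sorted(range(len(d)), key=lambda i: d[i])  # sort by distance
    dx = [x[order[k]] - x[order[k - 1]] for k in range(1, len(order))]
    dy = [y[order[k]] - y[order[k - 1]] for k in range(1, len(order))]
    # OLS without intercept on the differenced data: m(d) nearly cancels
    # between neighbours because it is smooth in d.
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

def smooth_m(d, resid, grid, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smoother of resid on d."""
    out = []
    for g in grid:
        w = [math.exp(-0.5 * ((di - g) / bandwidth) ** 2) for di in d]
        out.append(sum(wi * ri for wi, ri in zip(w, resid)) / sum(w))
    return out

# Simulated data (hypothetical): U-shaped m(d) with a minimum at 1500 m.
rng = random.Random(0)
n = 2000
d = [rng.uniform(0, 3500) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
true_beta = 0.5
y = [((di - 1500) / 1000) ** 2 + true_beta * xi + rng.gauss(0, 0.1)
     for di, xi in zip(d, x)]

beta_hat = yatchew_beta(d, x, y)
resid = [yi - beta_hat * xi for yi, xi in zip(y, x)]  # score net of controls
grid = [500, 1500, 2500]
m_hat = smooth_m(d, resid, grid, bandwidth=150)
```

In this sketch `m_hat` plays the role of the estimated score-distance relationship net of controls, evaluated at three distances.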


    We found a U-shaped relationship whose minimum lies near 1500 meters. As
a result, our analysis focuses on schools located between 750 and 2000
meters from the libraries, the range that the impact of the libraries is likely to reach. However, these graphs
are used only to explore the relationship, because they still reflect the unobserved determinants u;
in fact, the U pattern is found both before and after 2002.
    To estimate the effect, we must assume that unobservable variables can vary with
distance, but that their variation over time is unrelated to
distance. This restriction allows us to identify the average impact on the schools ‘close’ to
the libraries compared to those that are ‘distant’, and it motivates the use of the
DiD strategy, as discussed in the next section.


5    Empirical Strategy
The impact of libraries on the quality of education is identified using the Difference-in-Differences
(DiD) method. We define the schools ‘near’ the libraries as treated, and those ‘far’
from them as controls. That is, we assume that any difference between these two
groups of schools would have been preserved had no libraries been constructed (parallel
trends assumption). It is important to remember that in these cases the ‘libraries’ refer to
the entire intervention on the public infrastructure and urban planning development that
occurred in those areas. Thus, the estimation is based on the provision of libraries, not on the
intensity of their use, which is assumed to be a function of the distance from the school to the
physical building.
    The identification strategy involves two stages: the first measures the
magnitude and significance of the impact, and the second decomposes it into the impact
due to changes in observed inputs and the impact due to variations not linked to those inputs. The
decomposition addresses the question of complementarities between libraries and traditional
determinants of the quality of education; in other words, how the libraries enhance the
impact of traditional inputs already present in schools.
    As described before, our treatment indicator is the spatial proximity of schools to the libraries.
However, being ‘near’ or ‘far’ is an arbitrary definition and requires a selection rule that is
part of the research question. Discrete and continuous options were considered to define
treatment exposure using the distance of each school to the libraries, d.

   A first alternative (continuous approach) is to impose a parametric restriction on the
relationship between the distance to the library and test scores. Given the results from the
partially linear regression, it is possible to presume that the impact decreases with the
inverse of distance up to some far, arbitrary cutoff R1, beyond which we set the impact to be
exactly 0, including all the schools within a fixed radius R2. Hence, we define T = (R1/d) − 1
if d ≤ R1 and T = 0 if d > R1. For this specification we present results for radii
R1 ∈ {1500, 2000, 2500, 3000, 3500} and R2 = 3500.
   On the other hand, the effect could be discontinuous (discrete approach). Hence, in order to
avoid any assumption on the distance-score relation, schools within a certain radius R2 are
assigned to treated (T = 1) and control (T = 0) groups using an arbitrary distance-to-library
cut-off R1. This specification, henceforth Discrete I, is represented in Figure 1. An
alternative, Discrete II, is to omit some schools between the treatment and control zones, so that
the control zone starts at R3 ∈ [R1, R2]. Implementing different cut-offs in the analysis
did not show substantial differences. We will present results using R2 = 3500, R3 = 2000
and R1 ∈ {750, 1000, 1250, 1500, 1750, 2000}.
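The three treatment definitions can be written down compactly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the handling of schools exactly on a boundary is our own choice, and the continuous function leaves the restriction to schools within R2 to the caller.

```python
def continuous_treatment(d, r1):
    """T = R1/d - 1 for d <= R1, and 0 beyond R1 (d in meters, d > 0)."""
    return r1 / d - 1 if d <= r1 else 0.0

def discrete_treatment(d, r1, r2=3500, r3=None):
    """Return 1 (treated), 0 (control) or None (excluded from the sample).

    Discrete I: r3 is None, so controls lie in (r1, r2].
    Discrete II: controls lie in (r3, r2]; schools in (r1, r3] are dropped.
    """
    if d > r2:
        return None
    if d <= r1:
        return 1
    control_start = r1 if r3 is None else r3
    return 0 if d > control_start else None

distances = [400, 900, 1800, 2600, 4000]  # hypothetical school distances
discrete_1 = [discrete_treatment(d, r1=1000) for d in distances]
discrete_2 = [discrete_treatment(d, r1=1000, r3=2000) for d in distances]
continuous = [continuous_treatment(d, r1=1500) for d in distances]
```

With these hypothetical distances, the school at 1800 meters is a control under Discrete I and is dropped under Discrete II.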


5.1    Estimation of the general impact (DiD)

We define the average treatment effect on the treated, δ_τ, as the impact on average test
scores at year τ for schools that are located close to the libraries in comparison to those
that are far from them. In the continuous treatment scenario, the fullest impact
occurs for schools that are located right next to one of the libraries. This parameter is
estimated using the classic setup presented in equation 4. Let Y_it be the average test
score of school i at year t, T_i the treatment status of each school, A_t a dummy that equals
1 if t ≥ 2003, 1(τ = t) an indicator for year τ being equal to year t, and γ_i and γ_t school and
year fixed effects. For this specification, we assume that parallel trends hold conditional on the
school-level controls X_it.

\[ Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, T_i \cdot 1(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + e_{it} \tag{4} \]
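As an illustration of what the year-specific coefficients in equation 4 capture, the sketch below computes them from group means. It is a simplification under stated assumptions: a balanced panel, a binary treatment and no covariates X_it, in which case the fixed-effects estimate of each year's effect should coincide with the treated-control gap in that year minus the pooled pre-period gap. The toy panel and all names are hypothetical.

```python
from statistics import mean

def did_by_year(panel, treated, pre_years, post_years):
    """panel[(school, year)] -> score; treated[school] -> 0 or 1."""
    def gap(years):
        t = mean(panel[(s, y)] for s in treated if treated[s] == 1 for y in years)
        c = mean(panel[(s, y)] for s in treated if treated[s] == 0 for y in years)
        return t - c
    base = gap(pre_years)  # pre-treatment difference between the groups
    return {y: gap([y]) - base for y in post_years}

# Hypothetical toy panel: two treated and two control schools.
treated = {"a": 1, "b": 1, "c": 0, "d": 0}
level = {"a": -0.2, "b": 0.0, "c": 0.3, "d": 0.5}    # school fixed effects
shock = {2000: 0.0, 2001: 0.1, 2002: -0.1, 2003: 0.2, 2004: 0.05}  # year effects
effect = {2003: 0.1, 2004: 0.3}                       # built-in yearly impacts
panel = {(s, y): level[s] + shock[y] + treated[s] * effect.get(y, 0.0)
         for s in treated for y in shock}

delta = did_by_year(panel, treated, [2000, 2001, 2002], [2003, 2004])
```

Because the school and year effects cancel in the double difference, `delta` recovers the impacts built into the toy data.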

   The identification assumption might be too strong; schools placed in different areas
might follow dissimilar trends due to uncontrolled factors. For instance, migration of
people with different willingness to spend on education may shape schools’ investments in
a way that is not captured by our current covariates. In essence, some schools might be
improving while others are worsening. To address this, we can include school-specific
trends10, t · γ_i, as shown in equation 5. The limitation of this approach is that trends can
differ only as long as they do so in a linear fashion.

\[ Y_{it} = \sum_{\tau=2003}^{2008} \delta_\tau \, T_i \cdot 1(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + \omega_i \, t \cdot \gamma_i + e_{it} \tag{5} \]


5.2      Propensity Score Matching and Synthetic Control

One of the main concerns with the DiD method is how to choose the best control group
when there are few treated units, which makes the estimates highly
sensitive to the choice of controls, and when the unit of observation is an
aggregate (e.g., countries, states or schools). Abadie and Gardeazabal (2003) introduced an
approach known as the ‘synthetic control’ to deal with these problems. The idea is to select
a set of weights for the control units so that the weighted controls reproduce the outcome
trend of the treated units before the intervention. However, as suggested by Abadie and Gardeazabal (2003), the
synthetic control needs a long period of time prior to the intervention in order to control for
structural patterns in both observables and non-observables (Abadie et al., 2010). Given
that there are just three years available before the implementation of the mega-libraries
and that the objective is to forecast over the next six years, the synthetic control strategy
might lead to misleading results. An alternative that might be more suitable is to weaken
the DiD parallel trends assumption by introducing matching on the pre-treatment period
(Blundell and Dias, 2009). The matching estimator relies on the minimization of a distance
function, which becomes increasingly hard to estimate as the number of included covariates grows. A
traditional way to simplify this problem, when there is more than one treated unit, is
to perform the matching based on the predicted likelihood of being a treated unit, the
propensity score (Rosenbaum and Rubin, 1983).
      In this paper we combine both approaches by implementing kernel propensity score
matching11 (Heckman et al., 1997) that includes as controls the pre-treatment evolution
of test scores, which is in line with the synthetic control matching step. Once the synthetic
control is constructed by re-weighting the non-treated schools, the DiD specifications from
equations 4 and 5 are applied.12
 10
      For other applications that introduce this technique, see for instance, Besley and Burgess (2004).
 11
      The procedure was implemented using psmatch2 (Leuven and Sianesi, 2014) in Stata 12.
      In doing so, the underlying identification assumption changes slightly. Once the
observed covariates are taken into account, and provided that schools close to and far from the libraries follow
similar time trends or differ only in a linear way, estimated impacts can be attributed to the
mega-libraries. However, keep in mind that the identification will be invalid if there were
events that were not considered and that affected some of the schools (either close to or far from
the libraries) and not the others.
      Apart from the 2000-2002 test scores, the matching variables considered are the fol-
lowing: the proportion of teachers with graduate studies, pupil-teacher ratio, public school
dummy, morning school day dummy, complete school day dummy, female-teacher ratio,
11th grade female-male students ratio, 11th grade students, total students, girls-students
ratio, built area per student, classrooms area per student, sports area per student, and a
dummy for the presence of a school library.
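The re-weighting step of kernel matching can be sketched as follows. This is an illustration, not the psmatch2 routine used in the paper: it takes the propensity scores as given, and the Epanechnikov kernel, the bandwidth, the scores and the function names are all assumptions of the example.

```python
def epanechnikov(u):
    """Epanechnikov kernel, zero outside |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_weights(p_treated, p_control, bandwidth):
    """Total weight each control school receives across all treated schools."""
    totals = [0.0] * len(p_control)
    for pt in p_treated:
        k = [epanechnikov((pt - pc) / bandwidth) for pc in p_control]
        s = sum(k)
        if s == 0:  # treated school without nearby controls: skipped
            continue
        for j, kj in enumerate(k):
            totals[j] += kj / s  # weights for each treated school sum to 1
    return totals

# Hypothetical propensity scores and pre-treatment outcomes.
p_treated = [0.60, 0.70]
p_control = [0.20, 0.58, 0.66, 0.72]
y_control = [0.9, 0.1, 0.0, -0.1]

w = kernel_weights(p_treated, p_control, bandwidth=0.1)
synthetic_mean = sum(wi * yi for wi, yi in zip(w, y_control)) / sum(w)
```

Control schools whose score is far from every treated school, such as the first one here, receive zero weight and drop out of the re-weighted comparison group.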


5.3      DiD-OB: Decomposition of the impact

As discussed, the construction of the libraries implied a massive urban development. As
a result, it is likely that the mega-libraries triggered changes in other inputs. For instance, the
construction of mega-libraries could lead to emigration from the area due to changes in real
estate prices; it could also change the number of private schools or the teacher-student
ratio. Thus, part of the observed changes between schools close to and far from the libraries would
be due to this channel. Hence, we are interested in whether the program had an impact
on the inputs and whether such variation explains part of the difference in outcomes; let us call that
part ∆X. The remaining part of the impact, which is not due to changes in inputs, is ∆0. This
part could be due to changes in the impact that highly educated teachers
have in the presence of the libraries, or to changes in the
efficiency of public schools that engage with the libraries’ services. In that case, ∆0 would
be more likely to be related to the complementarity between schools and libraries. This
decomposition is achieved by implementing a novel strategy, proposed in this study, that introduces the
Oaxaca (1973) and Blinder (1973) decomposition into a DiD context (see the appendix for
details). The conditions for the identification of the effect are the usual parallel trends of
DiD but without conditioning on covariates. The decomposition is obtained by applying
equation 6.
 12
      As the matching is based on discrete categories, the continuous approach cannot be implemented.



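\[ Y_{it} = \alpha_0 + \alpha_1 T_{it} + \alpha_2 A_{it} + \alpha_3 X_{it} + \alpha_4 X_{it} \cdot T_{it} + \alpha_5 X_{it} \cdot A_{it} + \alpha_6 T_{it} \cdot A_{it} + \alpha_7 X_{it} \cdot T_{it} \cdot A_{it} + u_{it} \tag{6} \]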

    From this equation, we can deﬁne the impact generated by the covariates variation
(induced by the programme) ∆X , and the variation that is unrelated to them, ∆0 :

\begin{align*}
\delta &= (E[y \mid T=1, A=1] - E[y \mid T=0, A=1]) - (E[y \mid T=1, A=0] - E[y \mid T=0, A=0]) \\
\delta &= \Delta_0 + \Delta_X \\
\delta &= \alpha_6 + (\alpha_4 + \alpha_5 + \alpha_7) E[X \mid T=1, A=1] - \alpha_5 E[X \mid T=0, A=1] - \alpha_4 E[X \mid T=1, A=0] \\
&\quad + \alpha_3 \left[ (E[X \mid T=1, A=1] - E[X \mid T=0, A=1]) - (E[X \mid T=1, A=0] - E[X \mid T=0, A=0]) \right]
\end{align*}
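The decomposition can be checked numerically. The sketch below is illustrative and assumes a single covariate, in which case the fully interacted regression of equation 6 is equivalent to four separate regressions of the score on X, one per (T, A) cell. The toy data and function names are hypothetical.

```python
from statistics import mean

def ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def did_ob(cells):
    """cells[(T, A)] = (x_list, y_list). Returns (delta_0, delta_X)."""
    fit = {k: ols(*v) for k, v in cells.items()}
    xbar = {k: mean(v[0]) for k, v in cells.items()}
    (a00, b00), (a10, b10) = fit[(0, 0)], fit[(1, 0)]
    (a01, b01), (a11, b11) = fit[(0, 1)], fit[(1, 1)]
    # Map the four cell regressions into the coefficients of equation 6.
    alpha3 = b00
    alpha4 = b10 - b00
    alpha5 = b01 - b00
    alpha7 = b11 - b10 - b01 + b00
    alpha6 = a11 - a10 - a01 + a00
    delta_0 = (alpha6 + (alpha4 + alpha5 + alpha7) * xbar[(1, 1)]
               - alpha5 * xbar[(0, 1)] - alpha4 * xbar[(1, 0)])
    delta_x = alpha3 * ((xbar[(1, 1)] - xbar[(0, 1)])
                        - (xbar[(1, 0)] - xbar[(0, 0)]))
    return delta_0, delta_x

# Hypothetical toy data: (covariate values, scores) per (T, A) cell.
cells = {
    (0, 0): ([1.0, 2.0, 3.0], [0.1, 0.2, 0.4]),
    (1, 0): ([1.0, 2.0, 4.0], [0.0, 0.1, 0.2]),
    (0, 1): ([2.0, 3.0, 4.0], [0.2, 0.3, 0.5]),
    (1, 1): ([2.0, 4.0, 5.0], [0.2, 0.4, 0.6]),
}
delta_0, delta_x = did_ob(cells)
did_of_means = ((mean(cells[(1, 1)][1]) - mean(cells[(0, 1)][1]))
                - (mean(cells[(1, 0)][1]) - mean(cells[(0, 0)][1])))
```

Since each cell regression passes through the cell means, the two components add up exactly to the difference-in-differences of mean scores.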

    Standard errors are calculated by bootstrapping due to the lack of an analytical expres-
sion for them. In order to present results by year, the strategy is implemented by comparing
the pre-intervention period against each treatment-year in a separate regression.
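A bootstrap of this kind can be sketched as follows. The example is illustrative: it bootstraps a simple two-period DiD rather than the full decomposition, resamples schools with replacement within the treated and control groups (an assumption, since the resampling scheme is not detailed here), and uses simulated scores and hypothetical names.

```python
import random
from statistics import mean, stdev

def did(sample):
    """sample: list of (treated, pre_score, post_score), one tuple per school."""
    change_t = mean(post - pre for t, pre, post in sample if t == 1)
    change_c = mean(post - pre for t, pre, post in sample if t == 0)
    return change_t - change_c

def bootstrap_se(sample, stat, reps=500, seed=0):
    """Standard deviation of stat over bootstrap resamples of schools."""
    rng = random.Random(seed)
    treated = [s for s in sample if s[0] == 1]
    control = [s for s in sample if s[0] == 0]
    draws = []
    for _ in range(reps):
        # Resample schools with replacement within each group.
        resample = (rng.choices(treated, k=len(treated))
                    + rng.choices(control, k=len(control)))
        draws.append(stat(resample))
    return stdev(draws)

# Simulated standardized scores for 40 treated and 40 control schools.
rng = random.Random(1)
schools = [(t, rng.gauss(0, 1), rng.gauss(0, 1))
           for t in (1, 0) for _ in range(40)]
estimate = did(schools)
se = bootstrap_se(schools, did)
```

Any statistic that maps a resampled set of schools to a number, including each component of the decomposition, could be passed as `stat` in the same way.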


6     Results and discussion

6.1    Classic DiD strategy

First, using the parametric approach, we compare the evolution of the treatment group in
each year from 2003 to 2008 against the pre-treatment period, 2000 to 2002. In Table
8, we consider the intensity of treatment to be inversely proportional to the distance. It
ranges from 1, the intensity received by a school in front of the library, to 0, for a school that
is located R1 meters away or further. The general impact of being just beside the library implies
an increase in average scores of between 0.02 and 0.06 standard deviations (R1 = 1500, for
2003 and 2008, respectively). This impact is lower when we assume that there is a slower
decay in the benefit received based on distance (higher R1), suggesting that the area of
impact is relatively small. However, those impacts are not statistically different from 0.
   Table 9 presents the results from the discrete approach. In Panel A, the treatment
group comprises the schools between 0 and R1 meters from the libraries, and the controls are
those from R1 to R2 (fixed at 3.5 km), as shown in the map in Figure 1. Estimates
range between 0.21 for the smallest radius in 2005 and -0.05 for the largest. This is consistent
with the previous specification, which found that the impact is greater for the nearest
schools. However, there is no evidence of an impact different from 0. Similar results are found
in the last specification, shown in Panel B, where the controls are the schools between
R3 = 2000 and R2. That is, we do not take into account the schools between R1 and
R3 meters. These results are also presented in Figure 3, as a reference for comparison.

                         Figure 3: Euclidean Distance Estimators




   Equation 5 relaxes the parallel trends assumption by allowing school-specific trends.
Figure B.3 shows that, for both the Discrete II specification and the continuous approach,
schools that are very close to the libraries seem to have a declining trend in the outcome.
However, that pattern is still not statistically different from zero.
   One clear concern is the measure of distance. The Euclidean approach might not
capture the real cost of travelling between points in certain contexts. For instance, there
might be restrictions due to geographic features or infrastructure. However, in this urban
context it might not be a bad approximation. An alternative that takes these issues into account
is road distance, which measures the total distance necessary to reach a mega-library while
using the road infrastructure. Figure B.4 presents the main estimates using this approach.
In order to compare the main and additional results, the road distance was
rescaled using a linear function (see section 3), as the relevant difference might come not
from the absolute position of each school but from the relative one. The remainder of this
paper will consider only the Euclidean measure.


6.2   Synthetic Control

The next step is to introduce the matching strategy into the DiD. The main objective is
to ensure that schools that are close to the libraries are compared to similar schools that
are far from them. To achieve this, schools were matched on the propensity
score. Figures 4 and 5 show that once the matching weights are introduced, the propensity
score calculated for the synthetic control group resembles that of the treated schools
(according to the treatment definition). The purpose of this step is to ensure that, by
matching the score, the covariates are matched as well.
   We can check the performance of the technique in Tables 10 and 11, for
Discrete I and Discrete II, respectively. For each distance definition, the tables present the
difference for each matching variable between treatment and control groups before (General)
and after (Matched) the matching, as well as the percentage reduction in the standardized
bias (B.R.). Stars in the tables reflect the results of t-tests for equality of means
for each difference, where the null hypothesis is that the difference is equal to 0. The
matched results appear balanced and, given that we are matching on the outcome trend
before the intervention, the resulting synthetic control group trend closely resembles that
of the treatment group. A graphic representation of this is presented in Figures B.5 and B.6. The
only case in which the technique does not look as successful is specification II, where
the treatment group seems to follow a quite different trend.
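The balance statistics described above can be illustrated with a short sketch. It assumes the common definition of the standardized bias, the difference in means scaled by the square root of the average of the two group variances, with variances taken from the unweighted groups; that choice, the toy data and the function names are assumptions of this example rather than a description of the software output.

```python
from statistics import mean, variance

def standardized_bias(x_t, x_c, w_c=None):
    """100 * (mean_T - weighted mean_C) / sqrt((var_T + var_C) / 2).

    Variances come from the unweighted groups (an assumption of this sketch).
    """
    if w_c is None:
        w_c = [1.0] * len(x_c)
    mean_c = sum(w * x for w, x in zip(w_c, x_c)) / sum(w_c)
    scale = ((variance(x_t) + variance(x_c)) / 2) ** 0.5
    return 100 * (mean(x_t) - mean_c) / scale

def bias_reduction(before, after):
    """Percentage reduction in the absolute standardized bias."""
    return 100 * (1 - abs(after) / abs(before))

# Hypothetical pupil-teacher ratios and matching weights for the controls.
x_t = [30.0, 32.0, 34.0]
x_c = [20.0, 24.0, 30.0, 34.0]
w_c = [0.0, 0.5, 1.5, 1.0]

before = standardized_bias(x_t, x_c)       # General (unmatched) comparison
after = standardized_bias(x_t, x_c, w_c)   # Matched (re-weighted) comparison
reduction = bias_reduction(before, after)
```

In this toy case the weights shift the control mean toward the treated mean, so the bias falls by about two thirds.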

                Figure 4: Propensity Score Matching at 2002: Discrete I




                Figure 5: Propensity Score Matching at 2002: Discrete II




   Apart from the quality of matching, Figures B.5 and B.6 also tell another story. It seems
that schools that are closer to the libraries have a decreasing trend compared to distant
schools that are comparable in key covariates. This is reflected in the DiD estimates
in Figure 6. In contrast with Figure 3, almost all of the estimates are negative and, for
years 2006 and 2007, some of them are significant. In other words, after the libraries
were constructed, nearby schools, especially those very close to the libraries,
started to perform worse than similar ones not as close to the libraries. This means that
either the libraries and the urban development in their surroundings did decrease student
performance relative to their peers13 or that the identification assumption is not as good
as desired.

                              Figure 6: Matching at 2002 Estimators




    As described before, Figure B.6 for the 1000 meter definition under specification
II shows that the declining trend for some of these schools started prior to the construction
of the libraries, and this was not fully controlled for by the matching. To assess
this, performance data were de-trended by school (see Equation 5). Figure 7 and Table 12
present the results of this approach. Estimated coefficients are still negative but are not
statistically different from zero.
   13
      It might be that these schools did perform better, but not by as much as other schools in the city, which
are the base of our standardization.


             Figure 7: Matching at 2002 Estimators with School-Speciﬁc Trends




6.3    Blinder-Oaxaca Decomposition

So far it seems that there is no significant variation in the relationship between distance
to the libraries and average test scores in the mathematics, science and verbal sections.14 It
might be the case that the urban transformation was related to changes in the inputs of the
education quality production function. Table 13 studies this via the Oaxaca-Blinder
DiD decomposition proposed before, but we should bear in mind that the identification
assumptions are stronger than in the simple DiD analysis. In most of the cases, it seems
that the difference in test scores between schools far from and close to the libraries that is due to the
observed inputs (∆X) is negative. The direct impact of the libraries on test scores (∆0) is
around 0.1 to 0.2 standard deviations for schools located between 0 and 1.5 km from the
libraries. As a reference, the difference between students with college-graduate mothers
and the others in the same sample (3.5 km at most from each library) is 0.6 standard
deviations. However, these results are not statistically different from 0.
  14
    Results for each one of these scores separately are not meaningfully diﬀerent from the ones presented
here



6.4      Summary

The fact that estimation procedures with different sets of assumptions provide similar
results gives us a good idea of the underlying relationship between the construction of
mega-libraries and quality of education: there is no evidence of a positive and statistically
significant impact of the libraries on average standardized scores. We can interpret these
results in several ways. First, the fact that the estimates are positive but the
variance is large could be related to the small number of observations available (around
190 schools per year). If that is the case, any significant positive relationship between
public libraries and schools’ scores is likely to be small. This does not mean that the
libraries are useless for education: they could provide benefits that are not captured
by test scores but are important for society, such as the availability of safe
spaces and exposure to cultural activities. Current information makes it impossible to
test those alternatives. Second, the high variance could be due to a positive impact of the
libraries only on those schools, students or teachers that decided to take advantage of the
libraries and zero impact on those that did not. Heterogeneous impacts are the rule, not
the exception, in the literature on educational inputs (Murnane and Ganimian, 2014).15
Without further information on the selection mechanism, it is impossible to determine the
impact only on those schools, students or teachers that are willing to take advantage of
the public infrastructure.
       If some schools, students or teachers at similar distances from the libraries
use the libraries’ facilities at different rates, policy may be needed not only to construct and
run these public facilities but also to put in place incentive schemes that encourage their use.
Glewwe and Kremer (2006) argue that the provision of resources is insufficient to improve
student performance and that teachers should be instructed in order to maximize the
potential advantage of the resources. Moreover, following the theoretical framework proposed by
Witte and Geys (2011), the provision of most public goods, in this case the libraries, needs
two stages of policies: the first for the construction of the libraries, and the second
to work on how these programmatic inputs are transformed into observed and desired
outputs of education. For instance, prizes for both teachers and students for projects that
involve the use of these resources might be relevant.
  15
    Murnane and Ganimian (2014) highlight three cases: high- and low-education parents responded very
differently to initiatives to empower school councils in Niger (Beasley and Huillery, 2012); low- and high-
achieving students derived very different benefits from free textbooks in English in Kenya (Glewwe et al.,
2009); and rural girls did not profit nearly as much as urban boys from the use of LEGO kits to teach
science in Peru (Beuermann et al., 2013).


7    Conclusions
We have analyzed the impact on the quality of education, measured by mathematics,
science and verbal SABER 11 scores, of the construction of two big, public libraries that
involved the transformation of low-income, urban areas in Bogotá, Colombia. To do so,
we measured how the construction of the libraries could change the test scores of nearby
schools, controlling for observable variables that are related to students’ performances. We
opted for a DiD approach to analyze the evolution of the relation of distance-to-library
and average test scores before and after the public libraries’ introduction at the school
level. This approach assumes that the eﬀect of the libraries decays with distance and that,
without the intervention, the relationship would have been unaltered over time. We also
propose and implement a decomposition of the eﬀect considering the potential variations
of traditional determinants of quality of education.
    The libraries analyzed are public, education-related infrastructure that is progressive
in a context of inequality in access to quality school education. Both libraries were built in
areas populated by the less well-oﬀ and where schools have relatively poor facilities. Thus,
the policy has the potential to boost the equality of opportunities in terms of quality of
education. However, our findings show impacts of
the libraries on the average standardized test scores that are not statistically different from zero.
That is, there is no evidence that
schools close to the libraries are gaining a clear advantage in test scores over those with
similar characteristics that are located further from the new public infrastructure.
    It is important to note that the results are valid only if the
assumptions defined in the identification strategy hold. In general, there are two main scenarios
in which the assumptions would be invalid. First, the intensity of the use
of the libraries might be unrelated to the distance from them. For instance, there could be a network
of teachers who take advantage of library facilities even though their schools are not close to
the libraries. Another reason could be that the network of medium and small libraries
communicates perfectly with the more distant mega-libraries, so that there is no difference
in access according to distance. Second, it might be the case that schools close to and
far from the libraries were affected heterogeneously by other events that are not fully
captured by observed covariates. As an example, patterns of migration or criminality in
the zones near the libraries that did not affect cohort sizes, gender composition,
or any other observed inputs relative to the other neighborhoods could explain those
results.
   These results do not necessarily mean that libraries do not improve the quality of
education. On the one hand, libraries might foster skills that are not directly reflected
in test scores, or they might foster these skills among students at later stages of their lives,
such as college students. We are unable to assess these cases with the present methodology.
On the other hand, if a direct objective of these types of programs is to enhance test
scores, our results imply that the policies that introduced these public facilities should be
complemented with stronger programs that link and coordinate them with the
existing educational institutions. The capacity to reach the target population (school students and
teachers) is an important part of the policy and might require more attention from local
governments. For instance, prizes for both teachers and students for projects that involve
the use of these resources might be relevant.




References
Abadie, A., Diamond, A. and Hainmueller, J. (2010), ‘Synthetic control methods for com-
  parative case studies: Estimating the eﬀect of californias tobacco control program’,
  Journal of the American Statistical Association 105(490).

Abadie, A. and Gardeazabal, J. (2003), ‘The economic costs of conﬂict: A case study of
  the basque country’, American economic review pp. 113–132.

Barrera-Osorio, F. and Linden, L. L. (2009), The use and misuse of computers in education:
  evidence from a randomized experiment in colombia, Technical Report 4836, World
  Bank.

Beasley, E. and Huillery, E. (2012), ‘Empowering parents in schools: What they can (not)
  do’, Cambridge, MA: Abdul Latif Jameel Poverty Action Lab (J-PAL) .

Besley, T. and Burgess, R. (2004), ‘Can labor regulation hinder economic performance?
  evidence from india’, The Quarterly Journal of Economics pp. 91–134.

Beuermann, D. W., Naslund-Hadley, E., Ruprah, I. J. and Thompson, J. (2013), ‘The
  pedagogy of science and environment: Experimental evidence from peru’, The Journal
  of Development Studies 49(5), 719–736.

Báez, N. A. D. and Buitrago, C. F. (2010), Ingresos en el sistema de identificación de
  potenciales beneficiarios de programas sociales (sisbén): Tres metodologías de imputación,
  Archivos de Economía 006451, Departamento Nacional de Planeación.

Blinder, A. (1973), ‘Wage discrimination: reduced form and structural estimates’, Journal
  of Human resources pp. 436–455.

Blomeyer, D., Coneus, K., Laucht, M. and Pfeiﬀer, F. (2009), ‘Initial risk matrix, home
  resources, ability development, and children’s achievement’, Journal of the European
  Economic Association 7(2-3), 638–648.

Blundell, R. and Dias, M. (2009), ‘Alternative approaches to evaluation in empirical mi-
  croeconomics’, Journal of Human Resources 44(3), 565–640.


Bonilla-Mejía, L. (2011), Doble jornada escolar y calidad de la educación en colombia,
  Documento de Trabajo sobre Economía Regional 143, Banco de la República - Economía
  Regional.

Borkum, E., He, F. and Linden, L. L. (2013), The eﬀects of school libraries on language
  skills: Evidence from a randomized controlled trial in india, IZA Discussion Papers 7267,
  Institute for the Study of Labor (IZA).

Fertig, M. and Schmidt, C. M. (2002), The Role of Background Factors for Reading Lit-
  eracy: Straight National Scores in the PISA 2000 Study, IZA Discussion Papers 545,
  Institute for the Study of Labor (IZA).

Fortin, N., Lemieux, T. and Firpo, S. (2011), ‘Decomposition methods in economics’,
  Handbook of labor economics 4, 1–102.

Gamboa, L. F. and Rodríguez-Lesmes, P. A. (2014), Do colombian students underestimate
  higher education returns?, Documentos de Trabajo 164, Universidad del Rosario -
  Facultad de Economía.

Gamboa, L. F., Rodríguez-Acosta, M. and García-Suaza, A. (2010), Academic achievement
  in sciences: the role of preferences and educative assets, Documentos de Trabajo 78,
  Universidad del Rosario - Facultad de Economía.

Gaviria, A. and Barrientos, J. (2001), ‘Características del plantel y calidad de la educación
  en bogotá’, Coyuntura Social 25, 81–98.

Glewwe, P. and Kremer, M. (2006), ‘Schools, teachers, and education outcomes in devel-
  oping countries’, Handbook of the Economics of Education 2, 945–1017.

Glewwe, P., Kremer, M. and Moulin, S. (2009), ‘Many children left behind? textbooks and
  test scores in kenya’, American Economic Journal: Applied Economics 1(1), 112–135.

Glewwe, P., Kremer, M., Moulin, S. and Zitzewitz, E. (2004), ‘Retrospective vs. prospective
  analyses of school inputs: the case of ﬂip charts in kenya’, Journal of development
  Economics 74(1), 251–268.



Hanushek, E. and Wößmann, L. (2007), The role of education quality in economic growth,
  Technical report, World Bank.

Heckman, J. J., Ichimura, H. and Todd, P. E. (1997), ‘Matching as an econometric evalua-
  tion estimator: Evidence from evaluating a job training programme’, Review of Economic
  Studies 64(4), 605–54.

Lance, K. (1994), ‘The impact of school library media centers on academic achievement.’,
  School Library Media Quarterly 22(3), 167–70.

Lance, K., Rodney, M. and Hamilton-Pennell, C. (2000), Measuring up to standards: The
  impact of school library programs & information literacy in pennsylvania schools,
  Technical report, Pennsylvania State Dept. of Education, Office of Commonwealth Libraries.

Leuven, E. and Sianesi, B. (2014), ‘Psmatch2: Stata module to perform full mahalanobis
  and propensity score matching, common support graphing, and covariate imbalance
  testing’, Statistical Software Components.

Lokshin, M. (2006), ‘Semi-parametric difference-based estimation of partial linear
  regression models’, Stata Journal 6(3), 377–383.

Lonsdale, M. (2003), Impact of school libraries on student achievement: A review of the
  research., Information Analyses 70, Australian Council for Educational Research, Victo-
  ria.

Murnane, R. J. and Ganimian, A. J. (2014), Improving educational outcomes in develop-
  ing countries: Lessons from rigorous evaluations, 20284, National Bureau of Economic
  Research.

Murnane, R., Maynard, R. and Ohls, J. (1981), ‘Home resources and children’s achieve-
  ment’, The Review of Economics and Statistics 63(3), 369–377.

Ñopo, H. (2008), ‘Matching as a tool to decompose wage gaps’, The Review of Economics
  and Statistics 90(2), 290–299.

Núñez, J., Steiner, R., Cadena, X. and Pardo, R. (2002), ‘¿cuáles colegios ofrecen mejor
  educación en colombia?’, Archivos de Economía 193.

Oaxaca, R. (1973), ‘Male-female wage differentials in urban labor markets’, International
  Economic Review 14(3), 693–709.

Rodney, M., Lance, K., Hamilton-Pennell, C. and Center, M. (2002), Make the connection:
  Quality school library media programs impact academic achievement in Iowa, Mississippi
  Bend Area Education Agency.

Rosenbaum, P. R. and Rubin, D. B. (1983), ‘The central role of the propensity score in
  observational studies for causal effects’, Biometrika 70(1), 41–55.

Sarmiento, A., Alonso, C. E., Duncan, G. and Garzón, C. A. (2005), Evaluación de la
  gestión de los colegios en concesión en Bogotá 2000-2003, DNP.

Smith, E. (2001), Texas school libraries: Standards, resources, services, and students’
  performance, EGS Research & Consulting.

Tolosa, L. R. T. (2012), ‘Breve historia de las bibliotecas públicas en colombia’, Códices
  8(1), 57–86.

Vegas, E. and Petrow, J. (2008), Raising student learning in Latin America: The challenge
  for the 21st century, World Bank Publications.

Williams, D., Wavell, C. and Coles, L. (2001), Impact of school library services on achieve-
  ment and learning, Technical report, Department for Education & Skills and Resources:
  The Council for Museums, Archives & Libraries.

Witte, K. D. and Geys, B. (2011), ‘Evaluating efficient public good provision: theory and
  evidence from a generalised conditional efficiency model for public libraries’, Journal of
  Urban Economics 69(3), 319–327.

World Bank (2005), Mexico: Determinants of learning policy note, Report 31842-MX,
  World Bank, Washington D.C.

Yatchew, A. (1997), ‘An elementary estimator of the partial linear model’, Economics
  Letters 57(2), 135–143.




A. Tables

                              Table 1: Travelling time to school

          Time                                   Freq.   Cum.
          Less than 10 min.                       51%      51%
          Between 10 and 20 min.                  26%     77%
          Between 20 and 30 min.                  23%     100%
          Source: DANE Population Census 2005




       Table 2: Average test score by institutional and environment characteristics

                                                         Year
                   2000     2001     2003       2004     2005     2006     2007     2008     Total
School day
Complete           0.040   -0.075    -0.017     -0.046    0.020    0.016    0.030    0.051    0.002
Morning           -0.065   -0.209    -0.193     -0.288   -0.251   -0.313   -0.394   -0.365   -0.263
Afternoon         -0.252   -0.451    -0.356     -0.362   -0.420   -0.474   -0.460   -0.475   -0.408
Total             -0.082   -0.234    -0.174     -0.217   -0.197   -0.235   -0.245   -0.234   -0.204
Type of school
Public            -0.139   -0.312    -0.256     -0.289   -0.339   -0.410   -0.432   -0.414   -0.328
Private           -0.034   -0.162    -0.097     -0.147   -0.062   -0.068   -0.073   -0.060   -0.088
Total             -0.082   -0.234    -0.174     -0.217   -0.197   -0.235   -0.245   -0.234   -0.204
 Source: Own calculations based on SABER 11 (includes imputations).




            Table 3: Average test score by infrastructure and teaching force

                                                           Year
                          2000     2001    2003    2004    2005    2006       2007    2008    Total
Students
Less than 300              -0.26   -0.52   -0.44   -0.42   -0.41   -0.33      -0.41   -0.29   -0.38
Between 300-600            -0.22   -0.36   -0.15   -0.21   -0.13   -0.19      -0.26   -0.16   -0.21
Between 600-1000           -0.02   -0.15   -0.03   -0.14   -0.19   -0.17      -0.05   -0.14   -0.12
More than 1000              0.13   -0.01   -0.12   -0.09   -0.10   -0.16      -0.16   -0.18   -0.10
Total                      -0.06   -0.21   -0.15   -0.18   -0.19   -0.20      -0.20   -0.19   -0.17
Teacher-student ratio
Less than .03              -0.08   -0.57   -0.32   -0.29   -0.31   -0.34      -0.25   -0.28   -0.31
Between .03-.04            -0.06   -0.16    0.01   -0.19   -0.21   -0.25      -0.20   -0.27   -0.18
Between .04-.05            -0.11   -0.21   -0.00   -0.18   -0.00   -0.04      -0.17    0.01   -0.11
Between .05-.06             0.01   -0.10   -0.46   -0.03   -0.12   -0.19      -0.19   -0.27   -0.13
More than .06              -0.21   -0.45   -0.37   -0.35   -0.30   -0.34      -0.40   -0.36   -0.36
Total                      -0.08   -0.23   -0.17   -0.22   -0.20   -0.23      -0.25   -0.23   -0.20
Girls-students ratio
Less than 0.15              0.15    0.41    0.29    0.11    0.13    0.11       0.03   -0.03    0.15
Between 0.15-0.43           0.10   -0.06   -0.08   -0.11   -0.11   -0.15      -0.28   -0.15   -0.11
Between 0.43-0.48          -0.11   -0.29   -0.19   -0.19   -0.13   -0.18      -0.06   -0.20   -0.17
Between 0.48-0.52          -0.20   -0.40   -0.31   -0.30   -0.36   -0.37      -0.40   -0.33   -0.34
Between 0.52-0.85          -0.22   -0.36   -0.18   -0.39   -0.23   -0.28      -0.34   -0.42   -0.30
More than 0.85              0.24    0.26    0.33    0.13    0.09    0.01       0.03    0.16    0.16
Total                      -0.08   -0.23   -0.17   -0.22   -0.20   -0.23      -0.25   -0.23   -0.20
Basic level teachers
Less than .25              -0.03   -0.20   -0.11   -0.16   -0.19   -0.22      -0.20   -0.21   -0.17
Between .25-.5             -0.26   -0.35   -0.43   -0.47   -0.13   -0.38      -0.36   -0.21   -0.33
Between .5-.75              0.21   -0.03           -0.22   -0.36                              -0.05
More than .75              -0.31   -0.83   -0.66   -0.52   -0.76   -0.59              -0.90   -0.61
Total                      -0.06   -0.22   -0.15   -0.20   -0.19   -0.23      -0.21   -0.21   -0.19
Highest Level teachers
Less than .25              -0.08   -0.25   -0.19   -0.24   -0.14   -0.18      -0.17   -0.17   -0.18
Between .25-.5             -0.16   -0.37   -0.15   -0.55   -0.17   -0.18      -0.81   -0.26   -0.30
Between .5 -.75            -0.12   -0.28   -0.19   -0.27   -0.33   -0.32      -0.41   -0.34   -0.29
More than .75              -0.04   -0.20   -0.08   -0.18   -0.11   -0.30      -0.24   -0.32   -0.19
Total                      -0.09   -0.26   -0.18   -0.24   -0.18   -0.21      -0.23   -0.22   -0.20
 Source: Own calculations based on C600 and SABER 11 (includes imputations).




                           Table 4: Average test score by distance

                                                         Years
Distance to library          2000-2002          2003-2005            2006-2008      Total
Less than 1000                   -0.141             -0.048               -0.150     -0.110
Between 1000-2500                -0.239             -0.306               -0.346     -0.306
More than 2500                   -0.112             -0.140               -0.177     -0.147
Total                            -0.161             -0.196               -0.238     -0.204
 Source: Own calculations based on SABER 11 (includes imputations).




                                  Table 5: Schools by distance

                  Distance to the                Schools                 Used in
                  library (meters)                                    the models
                  0-500m                                5                       4
                  500m-1000m                           15                      11
                  1000m-1500m                          28                      27
                  1500m-2000m                          30                      24
                  2000m-2500m                          48                      40
                  2500m-3000m                          45                      39
                  3000m-3500m                          45                      38
                  3500m-4000m                          59                      49
                  Total                               275                     232
                   Source: Own calculations




                                 Table 6: Students by distance

                  Distance to the               Students                 Used in
                  library (meters)                                    the models
                  0-500m                              237                     115
                  500m-1000m                         2996                    2888
                  1000m-1500m                        5372                    5178
                  1500m-2000m                        5229                    4820
                  2000m-2500m                        6322                    5629
                  2500m-3000m                        7634                    7032
                  3000m-3500m                        6263                    6086
                  3500m-4000m                        7685                    7195
                  Total                             41738                   38943
                   Source: Own calculations




               Table 7: Distribution by distances and school characteristics

                                                   Distance to the library
                                Between 0 and 1        Between 1 and 2       Between 2 and 4
                                            Km                     Km                    Km
                                             %                      %                     %
Type of School
Public                                     59.84                   58.96               50.80
Private                                    40.16                   41.04               49.20
Total                                       100                     100                 100
Post-graduate teachers
ratio
Less than 30%                              51.18                   61.32               63.19
Between 30% and 60%                        25.20                   16.98               19.93
More than 70%                              23.62                   21.70               16.88
Total                                       100                     100                 100
School day
Complete                                   31.50                   37.26               42.90
Morning                                    35.43                   28.07               24.64
Afternoon                                  33.07                   34.67               32.46
Total                                       100                     100                 100
Student-teacher ratio
Less than 20                               20.47                   21.70               23.84
Between 20 and 30                          59.84                   58.96               54.06
More than 30                               19.69                   19.34               22.10
Total                                       100                     100                 100
School size
More than 1000 students                    39.37                   50.47               27.90
Between 500 and 1000 students              37.01                   25.47               38.84
Less than 500 students                     23.62                   24.06               33.26
Total                                       100                     100                 100
Gender of the school
Boys or Girls school                          0                    11.79               11.67
Coeducational school                        100                    88.21               88.33
Total                                       100                     100                 100




                        Table 8: DID Continuous Specification: Exponential

     Estimated values of $\delta^{\tau}$ from

     $$Y_{it} = \sum_{\tau=2003}^{2008} \delta^{\tau} T_i \cdot 1(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + e_{it}$$

     Exponential Specification: For a school at distance $d_i$ from a library, $T_i = R_1/d_i - 1$ if $d_i \le R_1$ and $T_i = 0$ if $d_i \ge R_1$
Distance Def            2003              2004                2005               2006               2007               2008
R1=1500                  0.03            −0.02                 0.06                 0.05             0.04               0.06
                       (0.07)            (0.05)              (0.08)             (0.09)             (0.10)             (0.09)
R1=2000                  0.02            −0.01                 0.03                 0.03             0.02               0.04
                       (0.04)            (0.03)              (0.05)             (0.06)             (0.07)             (0.06)
R1=2500                  0.01            −0.01                 0.02                 0.02             0.01               0.03
                       (0.03)            (0.02)              (0.04)             (0.04)             (0.05)             (0.04)
R1=3000                  0.01            −0.00                 0.02                 0.01             0.01               0.03
                       (0.03)            (0.02)              (0.03)             (0.03)             (0.04)             (0.04)
R1=3500                  0.01            −0.00                 0.01                 0.01             0.01               0.02
                       (0.02)            (0.02)              (0.03)             (0.03)             (0.03)             (0.03)
 R2=3500. Standard errors clustered by locality in parentheses. Significance level: * 90%, ** 95%, *** 99%.
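
The continuous treatment used in Table 8 can be illustrated with a minimal sketch. It assumes that the table header reads T_i = R1/d_i - 1 for schools within R1 meters and T_i = 0 otherwise. The function names `treatment_intensity` and `year_interactions` are hypothetical and are not taken from the authors' code.

```python
def treatment_intensity(distance_m, r1):
    """Continuous treatment T_i for a school at distance_m meters from a library.

    Assumed form: T_i = R1 / d_i - 1 inside the radius R1, and 0 at or beyond it.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m >= r1:
        return 0.0
    return r1 / distance_m - 1.0


def year_interactions(t_i, year, post_years=range(2003, 2009)):
    """Regressors T_i * 1(tau = year), one per post-opening year tau.

    Each delta^tau in the table is the coefficient on the matching regressor.
    """
    return {tau: (t_i if tau == year else 0.0) for tau in post_years}


if __name__ == "__main__":
    t = treatment_intensity(750, r1=1500)
    print(t)                           # intensity for a school 750 m away
    print(year_interactions(t, 2005))  # only the 2005 term is switched on
```

With R1 = 1500, a school 750 meters from a library has T_i = 1, and the intensity falls to zero at the cutoff, so the two branches agree at d_i = R1.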




                                             Table 9: DiD Discrete
     Estimated values of $\delta^{\tau}$ from

     $$Y_{it} = \sum_{\tau=2003}^{2008} \delta^{\tau} T_i \cdot 1(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + e_{it}$$

   A. Specification I: Schools between 0 and R1 meters are treated, Ti = 1, and from R1 to R2 meters are controls, Ti = 0
Distance Def             2003              2004               2005               2006             2007                2008
R1=750                    0.04              0.02               0.21               0.12             0.10                0.15
                        (0.19)            (0.17)             (0.22)             (0.26)           (0.28)              (0.26)
R1=1000                   0.10              0.04             −0.02                0.04           −0.03                 0.02
                        (0.13)            (0.11)             (0.14)             (0.16)           (0.16)              (0.16)
R1=1250                   0.10              0.06             −0.02                0.03           −0.02                 0.04
                        (0.09)            (0.08)             (0.10)             (0.11)           (0.12)              (0.11)
R1=1500                   0.03              0.05             −0.03                0.04           −0.02                 0.05
                        (0.07)            (0.06)             (0.07)             (0.07)           (0.08)              (0.08)
R1=1750                   0.02            −0.00              −0.06              −0.00            −0.04                 0.03
                        (0.06)            (0.06)             (0.06)             (0.07)           (0.07)              (0.07)
R1=2000                   0.02            −0.01              −0.02              −0.05            −0.05                 0.02
                        (0.06)            (0.05)             (0.06)             (0.06)           (0.06)              (0.06)
   B. Specification II: Schools between 0 and R1 meters are treated, Ti = 1, and from R3 to R2 meters are controls, Ti = 0
Distance Def              2003              2004                2005               2006               2007             2008
R1=750                     0.06              0.02                0.18               0.09               0.07             0.14
                         (0.19)            (0.17)              (0.22)             (0.26)             (0.28)           (0.26)
R1=1000                    0.10              0.03              −0.03                0.01             −0.05              0.02
                         (0.13)            (0.11)              (0.14)             (0.15)             (0.16)           (0.16)
R1=1250                    0.10              0.05              −0.03              −0.00              −0.04              0.03
                         (0.09)            (0.08)              (0.10)             (0.11)             (0.12)           (0.11)
R1=1500                    0.03              0.03              −0.03                0.01             −0.03              0.04
                         (0.07)            (0.06)              (0.07)             (0.07)             (0.08)           (0.08)
R1=1750                    0.02            −0.01               −0.05              −0.03              −0.05              0.03
                         (0.06)            (0.06)              (0.06)             (0.07)             (0.07)           (0.07)
R1=2000                    0.02            −0.01               −0.02              −0.05              −0.05              0.02
                         (0.06)            (0.05)              (0.06)             (0.06)             (0.06)           (0.06)
 R2=3500, R3=2000. Standard errors clustered by locality in parentheses. Significance level: * 90%, ** 95%, *** 99%.
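
The two discrete definitions in Table 9 differ only in where the control ring starts. The sketch below is an illustration under stated assumptions: the function name `assign_group` is hypothetical, and the treatment of schools exactly on a boundary is an assumption, since the table does not say which side they fall on.

```python
def assign_group(distance_m, r1, r2=3500, r3=None):
    """Return 1 for treated, 0 for control, None for schools left out of the sample.

    Specification I:  r3 is None, so controls lie between r1 and r2.
    Specification II: controls lie between r3 and r2, and schools between
                      r1 and r3 are dropped.
    Boundary handling (<= versus <) is an assumption of this sketch.
    """
    lower_control = r1 if r3 is None else r3
    if 0 <= distance_m <= r1:
        return 1
    if lower_control < distance_m <= r2:
        return 0
    return None


if __name__ == "__main__":
    for d in (500, 1500, 2500, 4000):
        print(d, assign_group(d, r1=1000), assign_group(d, r1=1000, r3=2000))
```

When R1 equals R3 (both 2000), the two specifications define the same groups, which is consistent with the identical R1=2000 rows in panels A and B.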




                                                        Table 10: Balance Status after Matching: Discrete I
     General: $E[X|T] - E[X|C]$
     Matched: $E[X|T] - E[X|SC]$
     Bias Reduction (BR): $\dfrac{E[X|T] - E[X|C]}{\frac{1}{2} V[X|T]\,V[X|C]} - \dfrac{E[X|T] - E[X|SC]}{\frac{1}{2} V[X|T]\,V[X|SC]}$
     T: Treated, C: Control, SC: Synthetic Control (weighted control group)
                          Year                                                             R1=750                            R1=1000                        R1=1250
                       Variables                                            General          Matched    B.R.      General     Matched    B.R.       General   Matched      B.R.
     Proportion of teachers with graduate studies                           −0.05             −0.02      63.3    −0.01          0.01       48.7    −0.00       −0.01      −130.0
     Pupil-Teacher Ratio                                                     0.01 ∗            0.01      45.7     0.01          0.00       70.5      0.00      −0.00        31.6
     Female-Teacher ratio                                                   −0.07 ∗           −0.04      50.9    −0.03          0.00       89.2    −0.02        0.00        97.3
     11 Grade Girls-students ratio                                          −0.02             −0.03     −66.7     0.03          0.00       86.0      0.02       0.01        39.3
     11 Grade Students                                                      11.25            −25.70    −240.0     3.75        −10.38     −290.0     13.15      −4.20        68.3
     Total Students                                                         85.47           −243.67    −290.0    58.98        −57.09      −20.0    118.48     −46.13        60.9
     Girls-Students ratio                                                   −0.04             −0.03      16.2     0.02          0.01       52.0      0.01       0.02       −89.3
     Public School                                                           0.17              0.13      19.7     0.02         −0.00       95.5    −0.09        0.03        72.5
     School day: morning                                                    −0.04              0.04      −6.4    −0.01          0.12    −1400.0      0.07       0.04        49.6
     School day: complete                                                    0.19              0.09      57.2     0.04         −0.06      −41.0    −0.03       −0.01        73.5
     Built area per student                                                 −0.25             −0.27     −11.8    −0.20         −0.04       81.6      2.80 ∗∗    1.18        58.6
     Classrooms area per student                                             0.09             −0.10      −2.8     0.20          0.11       51.9      0.51       0.48        18.1
     Sports area per student                                                 0.77             −0.27      65.3     1.32          0.35       73.4      1.99 ∗∗    0.70        71.1
     Has a library (C100)                                                   −0.13 ∗            0.05      24.1    −0.11 ∗∗       0.05       13.2    −0.05        0.04       −45.7
     Avg Std Test Score: 2000                                               −0.07             −0.06      16.9    −0.11         −0.11        8.4    −0.10       −0.05        47.0
     Avg Std Test Score: 2001                                                0.22              0.05      75.0     0.18         −0.04       79.8      0.09      −0.04        53.6
     Avg Std Test Score: 2002                                                0.09              0.02      76.0     0.04         −0.02       37.3    −0.08       −0.01        85.1
                          Year                                                             R1=1500                           R1=1750                         R1=2000
                        Variables                                          General           Matched    B.R.      General     Matched    B.R.        General    Matched   B.R.
     Proportion of teachers with graduate studies                          0.02               0.00        83.7     0.04        0.00         95.6     0.05         0.00     93.0
     Pupil-Teacher Ratio                                                 −0.00               −0.00        42.8     0.00        0.00         43.7     0.00         0.00     21.4
     Girls-teacher ratio                                                 −0.01                0.00        72.5   −0.01         0.01         31.0   −0.02          0.00     88.2
     11 Grade Girls-students ratio                                         0.01              −0.01       −14.3   −0.00        −0.01      −110.0      0.00         0.00     −8.2
     11 Grade Students                                                    23.25 ∗∗           −0.83        96.7    21.97 ∗∗     0.55         97.5    23.11  ∗∗∗  −3.76      79.4
     Total Students                                                      234.79 ∗∗∗          30.50        87.8   191.24 ∗∗    28.81         84.3   208.51 ∗∗∗ −43.47       75.0
     Girls-students ratio                                                  0.00              −0.00       −73.6     0.00        0.00         76.8     0.01         0.01    −27.0
     Public School                                                       −0.04                0.00        96.6   −0.04        −0.01         70.4   −0.04        −0.01      80.8
     School day: morning                                                   0.02               0.03      −110.0     0.00       −0.00       −33.9      0.01         0.02    −52.1
     School day: complete                                                  0.00              −0.02     −4800.0     0.03       −0.01         58.9     0.01       −0.00      83.1
     Built area per student                                                1.01               0.65        42.1     0.56        0.74       −22.8      1.23         1.16     12.2
     Classrooms area per student                                           0.19               0.05        76.4     0.05        0.08       −46.3      0.25         0.09     68.9
     Sports area per student                                               0.89               0.17        82.4     0.53        0.64       −24.0      0.89         0.03     96.8
     Has a library (C100)                                                −0.01               −0.00        95.5   −0.00        −0.02     −19000.0     0.02       −0.00      58.4
     Avg Std Test Score: 2000                                            −0.10               −0.00        95.4   −0.09        −0.02         77.6   −0.00          0.00    −68.5
     Avg Std Test Score: 2001                                              0.06              −0.00        93.2     0.04       −0.02         21.2     0.09         0.02     79.3
     Avg Std Test Score: 2002                                            −0.09                0.01        90.5   −0.09         0.00         98.8   −0.01          0.01    −25.9
     Significance level for t-tests for equality of means: * 90%, ** 95%, *** 99%
                                                       Table 11: Balance Status after Matching: Discrete II
     General: $E[X|T] - E[X|C]$
     Matched: $E[X|T] - E[X|SC]$
     Bias Reduction (BR): $\dfrac{E[X|T] - E[X|C]}{\frac{1}{2} V[X|T]\,V[X|C]} - \dfrac{E[X|T] - E[X|SC]}{\frac{1}{2} V[X|T]\,V[X|SC]}$
     T: Treated, C: Control, SC: Synthetic Control (weighted control group)
                          Year                                                             R1=750                          R1=1000                        R1=1250
                       Variables                                            General          Matched    B.R.     General    Matched     B.R.      General    Matched    B.R.
     Proportion of teachers with graduate studies                          −0.03              −0.02      46.5      0.01       0.04      −320.0     0.02      −0.01       17.0
     Pupil-Teacher Ratio                                                     0.01 ∗            0.00      85.4      0.01       0.00        71.5     0.00        0.00      92.7
     Female-Teacher ratio                                                  −0.08 ∗            −0.02      75.9    −0.03       −0.00        96.2   −0.02       −0.00       92.1
     11 Grade Girls-students ratio                                         −0.02              −0.03    −130.0      0.03       0.03         8.7     0.02      −0.00       82.3
     11 Grade Students                                                      18.99            −13.30     −20.4     11.74      −4.50        42.8    19.55 ∗    −2.17       87.8
     Total Students                                                        156.17           −113.86     −17.0    128.49     −37.55        63.4   176.24 ∗ −20.45         87.3
     Girls-Students ratio                                                  −0.04              −0.02      35.9      0.02       0.03       −79.4     0.01      −0.00       82.2
     Public School                                                           0.14              0.20     −43.9      0.00      −0.00        46.0   −0.10         0.02      76.5
     School day: morning                                                   −0.03               0.07    −130.0    −0.00        0.10     −3900.0     0.06        0.04      43.0
     School day: complete                                                    0.19              0.05      74.6      0.04      −0.02        43.5   −0.02         0.03     −17.4
     Built area per student                                                  0.20             −2.08    −650.0      0.26       0.31        −9.3     2.82 ∗∗     0.96      66.9
     Classrooms area per student                                             0.18             −0.43    −100.0      0.27      −0.12        65.3     0.52        0.34      45.8
     Sports area per student                                                 1.05              0.34      64.5      1.51       0.07        95.7     2.02 ∗∗     0.37      85.3
     Has a library (C100)                                                  −0.12               0.04      21.3    −0.09        0.05        −9.5   −0.04       −0.02       31.1
     Avg Std Test Score: 2000                                              −0.06              −0.05      28.0    −0.10        0.02        84.2   −0.08       −0.02       68.0
     Avg Std Test Score: 2001                                                0.24              0.05      80.1      0.20       0.08        57.0     0.11      −0.04       56.3
     Avg Std Test Score: 2002                                                0.08             −0.01      88.2      0.03       0.12      −270.0   −0.07       −0.01       88.4
                          Year                                                             R1=1500                         R1=1750                         R1=2000
                        Variables                                          General           Matched    B.R.       General   Matched    B.R.       General    Matched   B.R.
     Proportion of teachers with graduate studies                          0.03               0.02        42.8     0.05        0.01      88.3      0.05         0.00     93.0
     Pupil-Teacher Ratio                                                 −0.00               −0.00       −36.2     0.00      −0.00        3.0      0.00         0.00     21.4
     Girls-teacher ratio                                                 −0.01                0.00        99.3   −0.01         0.01      17.9    −0.02          0.00     88.2
     11 Grade Girls-students ratio                                         0.01              −0.01        −9.0   −0.00       −0.01     −540.0      0.00         0.00     −8.2
     11 Grade Students                                                    26.19 ∗∗∗           6.26        74.4    24.34  ∗∗∗ −2.17       90.2     23.11  ∗∗∗  −3.76      79.4
     Total Students                                                      255.52 ∗∗∗          74.85        68.7   214.53 ∗∗∗    6.53      96.6    208.51 ∗∗∗ −43.47       75.0
     Girls-students ratio                                                  0.00              −0.00       −33.1     0.01      −0.00       49.3      0.01         0.01    −27.0
     Public School                                                       −0.05               −0.04        18.4   −0.04       −0.01       75.6    −0.04        −0.01      80.8
     School day: morning                                                   0.02               0.05      −210.0     0.01        0.03    −280.0      0.01         0.02    −52.1
     School day: complete                                                  0.01              −0.05      −870.0     0.03      −0.01       41.1      0.01       −0.00      83.1
     Built area per student                                                1.22               0.65        53.6     0.85        0.85      12.3      1.23         1.16     12.2
     Classrooms area per student                                           0.24               0.02        92.4     0.13        0.04      74.9      0.25         0.09     68.9
     Sports area per student                                               1.01               0.06        94.2     0.70        0.42      41.7      0.89         0.03     96.8
     Has a library (C100)                                                −0.00               −0.04     −2900.0     0.01      −0.01     −170.0      0.02       −0.00      58.4
     Avg Std Test Score: 2000                                            −0.08                0.02        62.7   −0.07       −0.04       25.2    −0.00          0.00    −68.5
     Avg Std Test Score: 2001                                              0.08               0.01        86.8     0.06      −0.03       39.4      0.09         0.02     79.3
     Avg Std Test Score: 2002                                            −0.08                0.03        47.4   −0.07       −0.03       42.5    −0.01          0.01    −25.9
     Signiﬁcance level for t-tests for equality of means: * 90%, ** 95%, *** 99%
          Table 12: DiD Discrete after Matching Including School-Speciﬁc Trends

                              Estimated values of \delta^{\tau} from
Y_{it} = \sum_{\tau=2003}^{2008} \delta^{\tau} T_i \cdot 1(\tau = t) + \beta_1 A_t + \eta X_{it} + \gamma_i + \gamma_t + \omega_i t \cdot \gamma_i + e_{it}
   A. Speciﬁcation I: Schools between 0 and R1 meters are treated, Ti = 1, and from R1 to R2 meters are controls, Ti = 0
Distance Def             2003              2004               2005               2006             2007                2008
R1=750                  −0.23             −0.22                0.02             −0.14            −0.29               −0.09
                        (0.15)            (0.25)             (0.24)             (0.38)           (0.51)              (0.61)
R1=1000                 −0.12             −0.16              −0.20              −0.24            −0.41               −0.24
                        (0.11)            (0.15)             (0.20)             (0.23)           (0.28)              (0.35)
R1=1250                 −0.08             −0.11              −0.22              −0.21            −0.33               −0.27
                        (0.11)            (0.16)             (0.22)             (0.26)           (0.29)              (0.35)
R1=1500                 −0.06             −0.04              −0.14              −0.10            −0.19               −0.13
                        (0.09)            (0.11)             (0.15)             (0.18)           (0.21)              (0.24)
R1=1750                 −0.06             −0.06              −0.16              −0.13            −0.19               −0.14
                        (0.09)            (0.11)             (0.14)             (0.17)           (0.19)              (0.23)
R1=2000                 −0.06             −0.04              −0.08              −0.12            −0.14               −0.05
                        (0.08)            (0.11)             (0.14)             (0.17)           (0.20)              (0.23)
   B. Speciﬁcation II: Schools between 0 and R1 meters are treated, Ti = 1, and from R3 to R2 meters are controls, Ti = 0
Distance Def              2003              2004                2005               2006               2007             2008
R1=750                   −0.27             −0.15                 0.01             −0.09              −0.24            −0.01
                         (0.20)            (0.32)              (0.29)             (0.46)             (0.59)           (0.71)
R1=1000                 −0.20∗             −0.24               −0.39              −0.45             −0.56∗            −0.42
                         (0.11)            (0.18)              (0.26)             (0.28)             (0.32)           (0.39)
R1=1250                  −0.05             −0.09               −0.25              −0.23              −0.35            −0.28
                         (0.13)            (0.17)              (0.23)             (0.27)             (0.30)           (0.34)
R1=1500                  −0.07             −0.04               −0.18              −0.14              −0.22            −0.14
                         (0.09)            (0.12)              (0.17)             (0.20)             (0.22)           (0.25)
R1=1750                  −0.07             −0.04               −0.13              −0.11              −0.18            −0.12
                         (0.09)            (0.11)              (0.15)             (0.18)             (0.21)           (0.24)
R1=2000                  −0.06             −0.04               −0.08              −0.12              −0.14            −0.05
                         (0.08)            (0.11)              (0.14)             (0.17)             (0.20)           (0.23)
 R2=3500, R3=2000. Standard errors clustered by locality in parentheses. Signiﬁcance level: * 90%, ** 95%, *** 99%.
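
The two control-group definitions used in Panels A and B can be sketched as a small helper. This is only an illustration: the function name `assign_treatment` and the handling of schools exactly on a boundary are assumptions, not taken from the paper.

```python
from typing import Optional


def assign_treatment(distance_m: float, r1: float, r2: float = 3500.0,
                     r3: Optional[float] = None) -> Optional[int]:
    """Return 1 (treated), 0 (control) or None (excluded from the sample).

    Specification I  (r3 is None): controls lie between r1 and r2 meters.
    Specification II (r3 given):   controls lie between r3 and r2 meters,
                                   so schools between r1 and r3 are dropped.
    Boundary handling (<=) is an assumption made for this sketch.
    """
    control_start = r1 if r3 is None else r3
    if 0 <= distance_m <= r1:
        return 1
    if control_start < distance_m <= r2:
        return 0
    return None


# A school 1,500 m from the library with R1 = 1000:
spec_one = assign_treatment(1500, r1=1000)            # control in Spec. I
spec_two = assign_treatment(1500, r1=1000, r3=2000)   # dropped in Spec. II
```

When R1 equals R3 (2,000 m), the two definitions select the same schools, which is consistent with the identical R1=2000 rows in Panels A and B.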




                                        Table 13: BO-DD Discrete
Blinder-Oaxaca decomposition of the treatment effect: δ = ∆0 + ∆X
δ : Total impact
∆X : Impact due to variation on covariates
∆0 : Impact due to other channels
          Treated/Controls          2003             2004             2005       2006         2007       2008
R1=750                 δ           0.2008           0.2252            0.4293     0.3343      0.1444     0.1726
10/182                           (0.3981)         (0.3644)          (0.4220)   (0.5306)    (0.4816)   (0.4339)
                     ∆0            0.0310           0.2346            0.5356     0.4517      0.3101     0.3223
                                 (0.4573)         (0.3555)          (0.4093)   (0.4972)    (0.4399)   (0.4144)
                     ∆X            0.1698         −0.0094           −0.1063    −0.1174     −0.1657    −0.1497
                                 (0.1710)         (0.0985)          (0.1055)   (0.1137)    (0.1078)   (0.1005)
R1=1000                δ           0.1626           0.1460            0.0851     0.1524    −0.0009       0.0465
19/173                           (0.2353)         (0.2138)          (0.2474)   (0.2810)    (0.2593)    (0.2562)
                     ∆0            0.1253           0.1787            0.2083     0.2776      0.1597      0.2296
                                 (0.2616)         (0.2075)          (0.2410)   (0.2625)    (0.2414)    (0.2469)
                     ∆X            0.0373         −0.0327           −0.1232    −0.1252     −0.1605    −0.1831∗
                                 (0.1308)         (0.1149)          (0.1120)   (0.1241)    (0.1073)    (0.1080)
R1=1250                δ           0.1308           0.1056            0.0286     0.0639    −0.0229       0.0463
27/165                           (0.1595)         (0.1489)          (0.1789)   (0.1891)    (0.1787)    (0.1774)
                     ∆0            0.1360           0.1890            0.1740     0.2745      0.2032      0.2613
                                 (0.1895)         (0.1644)          (0.1890)   (0.2016)    (0.1754)    (0.1850)
                     ∆X          −0.0051          −0.0833           −0.1455    −0.2107    −0.2262∗∗   −0.2149∗
                                 (0.1091)         (0.1158)          (0.1079)   (0.1331)    (0.1118)    (0.1173)
R1=1500                δ           0.0560           0.0840            0.0115     0.0790    −0.0202      0.0637
44/148                           (0.1194)         (0.1145)          (0.1239)   (0.1245)    (0.1430)   (0.1270)
                     ∆0            0.0535           0.1459            0.1349     0.2530      0.1201     0.1831
                                 (0.1390)         (0.1278)          (0.1330)   (0.1586)    (0.1466)   (0.1568)
                     ∆X            0.0026         −0.0619           −0.1235    −0.1740     −0.1403    −0.1193
                                 (0.0990)         (0.1230)          (0.1052)   (0.1341)    (0.1155)   (0.1303)
R1=1750                δ           0.0437           0.0320          −0.0172      0.0379    −0.0237      0.0574
52/140                           (0.1081)         (0.1109)          (0.1106)   (0.1213)    (0.1337)   (0.1144)
                     ∆0            0.0515           0.0611            0.0967     0.2114      0.1084     0.1779
                                 (0.1211)         (0.1157)          (0.1204)   (0.1469)    (0.1370)   (0.1436)
                     ∆X          −0.0078          −0.0291           −0.1139    −0.1735     −0.1321    −0.1206
                                 (0.0906)         (0.1131)          (0.0958)   (0.1302)    (0.1204)   (0.1273)
R1=2000                δ           0.0208         −0.0132           −0.0178    −0.0446     −0.0722      0.0125
70/122                           (0.1074)         (0.1079)          (0.1071)   (0.1175)    (0.1178)   (0.1107)
                     ∆0            0.0230           0.0657            0.0530     0.1115      0.0862     0.1253
                                 (0.1206)         (0.1033)          (0.1103)   (0.1387)    (0.1334)   (0.1350)
                     ∆X          −0.0021          −0.0789           −0.0708    −0.1561     −0.1584    −0.1128
                                 (0.0853)         (0.1138)          (0.0952)   (0.1292)    (0.1195)   (0.1343)
R2=3500. Standard errors clustered by locality in parentheses.
Significance level: *** p<0.01, ** p<0.05, * p<0.1




B   Figures


              Figure B.1: Euclidean vs Road distances




Figure B.2: Distance and scores relationship




Figure B.3: Euclidean Distance Estimators with School-Speciﬁc Trends




Figure B.4: Road Distance Estimators




Figure B.5: Matching Test Scores Evolution: Discrete I




Figure B.6: Matching Test Scores Evolution: Discrete II




C     Appendix: Oaxaca-Blinder and DiD
Here we propose a new identification strategy that combines the advantages of the
Blinder-Oaxaca decomposition with the DiD specification. The Blinder (1973) and Oaxaca
(1973) procedure decomposes the difference in a variable y between two groups,
δ = E [y |T = 1] − E [y |T = 0], into a part due to the difference in observed
characteristics x, ∆x , and a part that is not related to them, ∆0 . We assume a linear
relationship between the observed characteristics x and the outcome y, which can be
specific to the group T .

                            y = β0 + β1 x + β2 T + β3 T · x + e2

    If we impose E [e2 |T = 1] = E [e2 |T = 0], the difference δ can be expressed in terms of
the difference in x between the two groups and a remainder.

              δ = E [y |T = 1] − E [y |T = 0]
                  = [β0 + β2 + (β1 + β3 )E [x|T = 1]] − [β0 + β1 E [x|T = 0]]
                  = β2 + (β1 + β3 )E [x|T = 1] − β1 E [x|T = 0]
                  = {β2 + β3 E [x|T = 1]} + {β1 (E [x|T = 1] − E [x|T = 0])}
                  = {∆0 } + {∆x }

    We define ∆x = β1 (E [x|T = 1] − E [x|T = 0]) as the part of the difference in y that
is explained by the difference in x between groups T = 1 and T = 0. The other term,
∆0 = β2 + β3 E [x|T = 1], reflects the difference in y that is not explained by the
difference in x. In empirical labour economics, this latter term was usually interpreted
as the ‘discrimination’ for being part of T = 1. In the treatment effects literature,
where T is a treatment with heterogeneous effects according to x, the ‘unexplained’
component is an average treatment effect on the treated (Fortin et al., 2011).
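
The two-group decomposition above can be illustrated numerically. The following is a minimal sketch on simulated data (not the paper's data), assuming a single covariate x and ordinary least squares; the function name `blinder_oaxaca` is hypothetical.

```python
import numpy as np


def blinder_oaxaca(y, x, t):
    """Return (delta, delta_0, delta_x) for two groups T in {0, 1}."""
    y, x, t = (np.asarray(v, dtype=float) for v in (y, x, t))
    # Design matrix for y = b0 + b1*x + b2*T + b3*T*x + e
    design = np.column_stack([np.ones_like(x), x, t, t * x])
    b0, b1, b2, b3 = np.linalg.lstsq(design, y, rcond=None)[0]
    x1, x0 = x[t == 1].mean(), x[t == 0].mean()
    delta_x = b1 * (x1 - x0)      # part explained by the difference in x
    delta_0 = b2 + b3 * x1        # part not explained by the difference in x
    delta = y[t == 1].mean() - y[t == 0].mean()   # raw difference in means
    return delta, delta_0, delta_x


# Simulated data, for illustration only
rng = np.random.default_rng(0)
t = rng.integers(0, 2, size=500)
x = rng.normal(loc=1.0 + 0.5 * t, scale=1.0)
y = 1.0 + 2.0 * x + 0.3 * t + 0.5 * t * x + rng.normal(size=500)
delta, delta_0, delta_x = blinder_oaxaca(y, x, t)
```

Because the regression includes a group-specific intercept and slope, the fitted group means equal the sample group means, so `delta` equals `delta_0 + delta_x` up to floating-point error.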
    We propose a Difference-in-Differences (DiD) analogue of the decomposition, which
shows what part of the variation is explained by the impact on an observed channel x. In
the case of our program, we would like to understand what part of the effect is due to
an improvement in school results via an increase in certain inputs, and what part is due
to a general impact that is not related to them. To the best of our knowledge, this is
the first paper that implements this decomposition.
   Assume that we observe two periods, A ∈ {0, 1}. Given this, we define the
average treatment effect on the treated estimator:

                     δ = (E [y |T = 1, A = 1] − E [y |T = 0, A = 1])
                         − (E [y |T = 1, A = 0] − E [y |T = 0, A = 0])

This is the classical DiD estimator under the usual parallel trends assumption. It can be
retrieved using the traditional specification,

                             y = η0 + η1 T + η2 A + δT · A + ε
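
As a quick check of the equivalence between the four-cell difference in means and the coefficient on T · A, the sketch below uses simulated data (illustrative only) and plain least squares; the function names are hypothetical.

```python
import numpy as np


def did_from_means(y, t, a):
    """DiD as the double difference of the four cell means of y."""
    def cell_mean(tt, aa):
        return y[(t == tt) & (a == aa)].mean()
    return (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))


def did_from_ols(y, t, a):
    """DiD as the coefficient on T*A in y = n0 + n1*T + n2*A + d*T*A + e."""
    design = np.column_stack([np.ones_like(y), t, a, t * a])
    return np.linalg.lstsq(design, y, rcond=None)[0][3]


# Simulated data, for illustration only (true interaction effect set to 0.4)
rng = np.random.default_rng(1)
t = rng.integers(0, 2, size=400).astype(float)
a = rng.integers(0, 2, size=400).astype(float)
y = 0.5 + 0.2 * t + 0.1 * a + 0.4 * t * a + rng.normal(size=400)

did_means = did_from_means(y, t, a)
did_ols = did_from_ols(y, t, a)
```

The specification is saturated in T and A, so the two numbers coincide up to floating-point error.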

   Now, assume that part of this impact is due to variation in a particular variable
x that is affected by the treatment. Our decomposition splits the treatment effect of T
on y into the impact through the observed channel, ∆x , and the impact through other
channels, ∆0 . It can be implemented using the following linear equation:



       y = α0 + α1 T + α2 A + α3 x + α4 x · T + α5 x · A + α6 T · A + α7 x · T · A + u

Given that

   E [y |T = 0, A = 0] = α0 + α3 E [x|T = 0, A = 0]
   E [y |T = 1, A = 0] = α0 + α1 + (α3 + α4 )E [x|T = 1, A = 0]
   E [y |T = 0, A = 1] = α0 + α2 + (α3 + α5 )E [x|T = 0, A = 1]
   E [y |T = 1, A = 1] = α0 + α1 + α2 + α6 + (α3 + α4 + α5 + α7 )E [x|T = 1, A = 1]

The impact δ is decomposed into the variation in x that is correlated with the treatment
implementation, ∆x , and the variation that is explained by other channels, ∆0 .

δ = ((α0 + α1 + α2 + α6 + (α3 + α4 + α5 + α7 )E [x|T = 1, A = 1])
   − (α0 + α2 + (α3 + α5 )E [x|T = 0, A = 1]))
   − ((α0 + α1 + (α3 + α4 )E [x|T = 1, A = 0]) − (α0 + α3 E [x|T = 0, A = 0]))
  = α6 + (α4 + α5 + α7 )E [x|T = 1, A = 1] − α5 E [x|T = 0, A = 1] − α4 E [x|T = 1, A = 0]
   + α3 [(E [x|T = 1, A = 1] − E [x|T = 0, A = 1]) − (E [x|T = 1, A = 0] − E [x|T = 0, A = 0])]
  = ∆0 + ∆x

Hence, the impact of T on y that can be explained by the impact of T on x is:

∆x = α3 [(E [x|T = 1, A = 1]−E [x|T = 0, A = 1])−(E [x|T = 1, A = 0]−E [x|T = 0, A = 0])]

   The remaining variation, due to other channels, is:

∆0 = α6 + (α4 + α5 + α7 )E [x|T = 1, A = 1] − α5 E [x|T = 0, A = 1] − α4 E [x|T = 1, A = 0]
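
The DiD decomposition can be sketched in the same way. The example below is a minimal illustration on simulated data with a single channel x, estimated by least squares; the function name `bo_dd` is hypothetical and this is not the estimation code used for Table 13.

```python
import numpy as np


def bo_dd(y, x, t, a):
    """Return (delta, delta_0, delta_x) for the DiD decomposition."""
    # Columns follow alpha_0 ... alpha_7 in the linear equation above
    design = np.column_stack([np.ones_like(y), t, a, x,
                              x * t, x * a, t * a, x * t * a])
    al = np.linalg.lstsq(design, y, rcond=None)[0]

    def cell_mean(v, tt, aa):
        return v[(t == tt) & (a == aa)].mean()

    ex = {(tt, aa): cell_mean(x, tt, aa) for tt in (0, 1) for aa in (0, 1)}
    # Impact through the observed channel x
    delta_x = al[3] * ((ex[1, 1] - ex[0, 1]) - (ex[1, 0] - ex[0, 0]))
    # Impact through other channels
    delta_0 = (al[6] + (al[4] + al[5] + al[7]) * ex[1, 1]
               - al[5] * ex[0, 1] - al[4] * ex[1, 0])
    # Total impact: double difference of the cell means of y
    delta = ((cell_mean(y, 1, 1) - cell_mean(y, 0, 1))
             - (cell_mean(y, 1, 0) - cell_mean(y, 0, 0)))
    return delta, delta_0, delta_x


# Simulated data, for illustration only
rng = np.random.default_rng(2)
t = rng.integers(0, 2, size=800).astype(float)
a = rng.integers(0, 2, size=800).astype(float)
x = rng.normal(loc=1.0 + 0.2 * t + 0.1 * a + 0.4 * t * a, scale=1.0)
y = 0.5 + 0.2 * t + 0.1 * a + 1.5 * x + 0.3 * t * a + rng.normal(size=800)
delta, delta_0, delta_x = bo_dd(y, x, t, a)
```

Since the equation allows a separate intercept and slope in each of the four (T, A) cells, the fitted cell means match the sample cell means, and `delta` equals `delta_0 + delta_x` up to floating-point error.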



