Policy Research Working Paper                               11143




                      Luck of the Draw
     The Causal Effect of Physicians on Birth Outcomes

                            Christian Posso
                             Jorge Tamayo
                             Arlen Guarin
                           Estefania Saravia




Development Economics
Development Impact Group
June 2025

A verified reproducibility package for this paper is available at http://reproducibility.worldbank.org.


  Abstract
 This paper estimates the effect on birth outcomes of a mother’s being treated by more-skilled versus
 less-skilled physicians, by exploiting a Colombian government program that randomly assigned newly
 graduated physicians to local health centers. It estimates the impact on 255,089 children whose
 mothers received care in the local health centers using administrative data from the program, local
 health centers’ vital statistics records, and records from physicians’ mandatory graduation exams.
 The findings show that mothers treated at local health centers with more-skilled physicians were
 9.14 percent less likely to give birth to an unhealthy baby, potentially because the more-skilled
 physicians better targeted care toward more-vulnerable mothers.




 This paper is a product of the Development Impact Group, Development Economics. It is part of a larger effort by the
 World Bank to provide open access to its research and make a contribution to development policy discussions around the
 world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be
 contacted at aguarin@worldbank.org. A verified reproducibility package for this paper is available at
 http://reproducibility.worldbank.org.








         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
                                        Luck of the Draw:
           The Causal Effect of Physicians on Birth Outcomes

       Christian Posso ⓡ Jorge Tamayo ⓡ Arlen Guarin ⓡ Estefania Saravia ∗




       Keywords: Physicians’ skills, birth outcomes, experimental evidence
       JEL Codes: H51, I14, I15, I18



   ∗
    The authors’ names are listed in random order. Posso: Banco de la República de Colombia (email:
cpossosu@banrep.gov.co); Tamayo: Harvard University, Harvard Business School, Digital Reskilling Lab (email:
jtamayo@hbs.edu); Guarin: Development Impact (DECDI), World Bank (email: aguarin@worldbank.org); Saravia:
University of California, Los Angeles (email: esaravia@ucla.edu). We are grateful to Achyuta Adhvaryu,
Maria Aristizabal, Carolina Arteaga, Guadalupe Bedoya, Francesco Bogliacino, Leonardo Bonilla, David Card,
Juan Esteban Carranza, Maíra Coube, Janet Currie, Kaveh Danesh, Margarita Gafaro, Robert Gonzalez, Marcus
Holmlund, Hilary Hoynes, Raymond Kluender, Rem Koning, Juliana Londoño-Vélez, Edward Miguel, Anant
Nyshadham, Paul Rodriguez, Daniel Rogger, Emmanuel Saez, Molly Schnell, Benjamin Scuderi, Jesse Shapiro,
Mauricio Villamizar, Christopher Walters, Danny Yagan, the editor and coeditor, four referees, and numerous
seminar participants for helpful comments and advice. We thank Manuela Cardona, Leidy Gomez, Silvia
Granados, Nicolas Mancera, Brayan Pineda, Daniel Marquez, Gabriel Suarez, Santiago Velasquez, and Carolina
Velez for excellent research assistance. Arlen gratefully acknowledges financial support from the University of
California, Berkeley, Opportunity Lab. We also thank the Colombian Ministry of Health, the Departamento
Administrativo Nacional de Estadística (DANE) and the Instituto Colombiano para la Evaluación de la Educación
(ICFES) for providing access to the data and for insightful discussions. The findings, interpretations, and
conclusions expressed in this paper are solely those of the authors and do not necessarily reflect the views of
the World Bank and its affiliated organizations, the Executive Directors of the World Bank, the governments they
represent, or of the Banco de la República and its Board of Directors.

    1    Introduction
    Inequality can originate as early as the prenatal period. These critical months shape children’s
    health at birth, which has been shown to predict future abilities and health trajectories beyond
    what genetics alone can explain (Almond et al., 2005; Black et al., 2007; Currie, 2011; Currie
    and Almond, 2011). While much research on the determinants of birth outcomes has focused
    on maternal health and families’ socioeconomic conditions (Currie, 2011; Currie and
    Schwandt, 2016b), recent evidence suggests that health care providers may also play a
    significant role (Okeke, 2023). This evidence shows that physicians have differential effects on
    babies compared to other practitioners with substantially less training. In turn, this evidence
    raises the question of whether physicians with similar training may nevertheless have
    differential effects on children’s birth outcomes because of differences in their level of medical
    skill.
          In this paper, we provide causal evidence of the role that skilled physicians play in
    children’s health at birth. Prior research has shown that physicians significantly impact
    patients’ health (Chan et al., 2022; Chen, 2021; Currie and MacLeod, 2017, 2020; Das and
    Hammer, 2005) and that poor health at birth has long-lasting adverse impacts on an
    individual’s future outcomes (and the outcomes of the next generation), including earnings,
    education, and disability (Almond et al., 2018; Currie, 2011; Persson and Rossin-Slater, 2018;
    ?). If more-skilled physicians have differential impacts on children’s health compared to
    less-skilled physicians, the health of babies and their future outcomes could both be boosted
    by policies aimed at better assigning or targeting these more-skilled physicians to populations
    with greater needs.
          The lack of causal evidence regarding the impact of skilled physicians on birth outcomes
    is not surprising because answering this question poses substantial empirical challenges: it
    requires both accounting for the selection bias associated with the match between physicians
    and hospitals (Doyle Jr et al., 2010) and overcoming the difficulty of obtaining reliable
    measures of physicians’ medical skills.1 We overcome these challenges by exploiting a policy
    experiment conducted in Colombia. In this national-level program, 2,126 recently graduated
    physicians were randomly assigned to 618 local health centers (LHCs) through a within-state
    random lottery. This feature of the program’s design enables us to bypass the selection bias
    issue and isolate the impact of physicians on children’s health at birth.2 As a proxy for their
    medical skills, we match these physicians with their scores on the health-specific modules of

1
  There is an extensive literature on positive assortative matching (PAM) that shows that companies and high-
  productivity workers match together (for example, Abowd et al., 1999; Becker, 1973; Kremer, 1993; Roy, 1951;
  Shimer and Smith, 2000; Woodcock, 2008).
2
  These LHCs are equipped to provide primary care, emergency care, and outpatient and inpatient care, including
  for childbirth. They are typically referred to as “hospitals” despite being smaller and having less capacity than
  hospitals in more developed urban settings.


the mandatory exams they took just before graduating from college.
      Several features of our context are conducive to accomplishing our study’s goals. First,
Colombian regulations mandate that medical school graduates dedicate the first year of their
careers to the national Mandatory Social Service (Servicio Social Obligatorio, or SSO) program.
This program randomly assigns new physicians to LHCs in the state where they apply.
Because LHCs and physicians are assigned to one another without regard to their
characteristics, physicians with different levels of skill encounter similar facilities,
administrative resources, and health staff. By comparing birth outcomes across LHCs, we can
estimate the causal effect of a mother’s being treated at an LHC that was randomly assigned a
more-skilled cohort of SSO physicians on children’s health at birth.
      Second, we combine several rich and granular administrative records in Colombia, which
allow us to observe the LHCs where physicians were assigned, obtain a proxy for their skills,
and measure LHC outcomes as performance measures. Specifically, we collect data from the
reports published by Colombia’s Ministry of Health after the SSO lottery draws that took place
between 2013 and the third quarter of 2014. Further, we use individual records from mandatory
college graduation exams to identify more-skilled physicians. Finally, we link the LHC to which
physicians were randomly assigned to the national vital statistics records (VSRs), from which
we obtain birth outcomes and maternal sociodemographic characteristics.
      The random assignment of physicians to LHCs allows us to satisfy the identification
assumption that the cohorts of SSO physicians assigned to LHCs are mean independent of
unobservable variables associated with the LHCs. In our setting, some mothers were exposed
to multiple cohorts of physicians during their pregnancies. To isolate the causal variation
associated with the random assignment, we estimate an instrumental variable (IV) model. In
this model, we use the skill level of the first SSO cohort to which a mother was exposed during
her pregnancy as an instrument for the average skill level of all SSO cohorts she was exposed
to over the course of her pregnancy. The key identifying assumption behind our IV approach
is that, conditional on the design fixed effects of the first cohort, the average graduation exam
score of the first physicians’ cohort predicts the average exam score of all physician cohorts to
which the mother was exposed and affects birth outcomes only through this channel. To make
the interpretation straightforward, all the results are expressed in standard deviations of the
skill measure.
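
The logic of this IV approach can be illustrated with a minimal simulation. The sketch below is not our estimation code: it omits the design fixed effects and all covariates, treats the outcome as continuous, and uses invented coefficients and a stylized unobserved factor purely for illustration. It shows how the just-identified IV (Wald) estimator, the ratio of the covariance between the instrument and the outcome to the covariance between the instrument and the endogenous regressor, recovers the effect of average skill when the first cohort's skill is randomly assigned.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n, true_effect, seed=0):
    """Simulated data; every coefficient below is an illustrative assumption."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        first_cohort_skill = rng.gauss(0, 1)  # instrument: randomly assigned
        unobserved = rng.gauss(0, 1)          # stylized factor tied to exposure
        # Average skill of all cohorts the mother was exposed to (endogenous).
        avg_skill = 0.7 * first_cohort_skill + 0.5 * unobserved + rng.gauss(0, 0.5)
        # Continuous stand-in for the birth outcome.
        outcome = true_effect * avg_skill + unobserved + rng.gauss(0, 0.5)
        z.append(first_cohort_skill)
        x.append(avg_skill)
        y.append(outcome)
    return z, x, y


def iv_estimate(z, x, y):
    """Just-identified IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def ols_estimate(x, y):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


z, x, y = simulate(n=20000, true_effect=-0.1)
beta_iv = iv_estimate(z, x, y)
beta_ols = ols_estimate(x, y)
print(f"IV estimate:  {beta_iv:.3f}")
print(f"OLS estimate: {beta_ols:.3f}")
```

In this simulated setting, the OLS slope absorbs the unobserved factor, whereas the IV estimate uses only the variation in average skill that is driven by the randomly assigned first cohort.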
      Our local average treatment effect (LATE) estimates indicate that more-skilled physicians
improve birth outcomes. We find that mothers who were treated at an LHC that had been
randomly assigned a cohort of SSO physicians whose graduation exam scores were one standard
deviation higher were 9.14 percent less likely to give birth to an unhealthy baby. We define a baby
as unhealthy if at least one of the following three conditions is satisfied: its birth weight is low
(below 2,500 grams), it is born prematurely (before 37 weeks of gestation), or its Apgar score



    is low (below 7).3 The effect of treatment by more-skilled physicians is consistent across each
    of these measures of health at birth: we find a 9.57 percent decrease in the probability that an
    infant has a low birth weight, a 10.99 percent decrease in the probability that an infant is born
    prematurely, and an 11.56 percent decrease in the probability that an infant has a low Apgar
    score.4 Our findings are consistent with evidence from related studies showing that variations
    in the quality or availability of health care providers significantly impact patient outcomes. For
    example, Chen (2021) finds that shared work experience among physicians reduces mortality
    rates, and Currie and Gruber (1996) show that increased access to Medicaid for pregnant women
    improves infant health outcomes.
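
To make the definition of an unhealthy baby concrete, the following minimal sketch encodes the three thresholds stated above. The record layout and field names are hypothetical and do not correspond to the variable names in the vital statistics records, and the sketch does not take a stand on which Apgar measurement is used.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
LOW_BIRTH_WEIGHT_GRAMS = 2500
PRETERM_WEEKS = 37
LOW_APGAR = 7


@dataclass
class Birth:
    # Hypothetical field names, chosen for illustration only.
    weight_grams: float
    gestation_weeks: float
    apgar: int


def is_unhealthy(birth: Birth) -> bool:
    """True if at least one of the three conditions is satisfied."""
    return (
        birth.weight_grams < LOW_BIRTH_WEIGHT_GRAMS
        or birth.gestation_weeks < PRETERM_WEEKS
        or birth.apgar < LOW_APGAR
    )


# Invented example records: healthy, low weight, preterm, low Apgar.
births = [
    Birth(3200, 39, 9),
    Birth(2400, 38, 8),
    Birth(3000, 36, 9),
    Birth(3100, 40, 6),
]
flags = [is_unhealthy(b) for b in births]
share_unhealthy = sum(flags) / len(flags)
print(flags, share_unhealthy)
```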
         To assess the internal validity of our identification strategy, we implement two tests. First,
    we assign a placebo treatment to babies born before the arrival of the SSO cohorts in our
    sample. The random assignments that we use in our main specification took place in 2013 and
    2014. We run placebo tests similar to our main specification but using outcomes for children
    born in the same LHCs from 2009–2012, the four years prior to the random assignment. We
    find that the treatment generates precisely estimated zeros. Second, we show evidence of the
    actual randomness of the assignment by testing for any correlation between physicians’ skill
    levels and LHC, municipal, and demographic characteristics using balance tests on
    pretreatment (2010–2012) and concurrent predefined characteristics during the SSO cohort’s
    assignment.
         We recognize that focusing solely on graduation exam scores as a proxy for physicians’
    level of medical skills may overstate their importance while understating the relevance of other
    correlated characteristics. We therefore take advantage of the random assignment to obtain an
    estimate of physicians’ relative value-added (Angrist et al., 2017; Chetty et al., 2014; Kane and
    Staiger, 2008). Following Jackson (2018) and Fletcher et al. (2014), our shrunken value-added
    result implies that assigning an LHC a cohort of physicians at the 75th percentile of the quality
    distribution, versus the 25th percentile, would decrease the likelihood of a baby being
    unhealthy by approximately 0.08 standard deviations. Using our unbiased value-added

3
  Low birth weight is one of the key measures of health at birth studied in the literature (Currie, 2011). Prematurity
  is highly correlated with low birth weight, mortality, and several health complications (Butler et al., 2007; Currie
  and Walker, 2011; Taylor et al., 2001; Veddovi et al., 2001). The Apgar score is also frequently used in the literature
  as an indicator of health at birth (Almond et al., 2010; Ehrenstein, 2009; Lin, 2009; Moore et al., 2014).
4
  Unfortunately, during our analysis period, we could not test the impact of physicians on mortality due to data
  issues. First, the variable indicating the number of weeks of gestation is missing from birth records for a significant
  portion of fetal and neonatal deaths. This omission prevents us from determining the gestation period’s start
  for these deaths, thereby hindering our ability to precisely identify exposure to physician cohorts, as we can for
  births. Additionally, fetal and neonatal records frequently lack information about the LHC and about mothers’
  and children’s covariates. Given these limitations, we conduct a cohort-level rather than a child-level exercise. We
  quantify the number of fetal and combined fetal plus neonatal deaths during the time a cohort was assigned to
  an LHC. This quantification disregards how long the gestation period was exposed to the cohort and is based on
  the data with all aforementioned limitations. While these results are expected to be subject to measurement error
  attenuation bias, we still observe a negative, albeit not statistically significant, point estimate, which aligns with our
  main results.


    estimates, we study the relationship between the physicians’ value-added and several
    observable characteristics, such as their scores on the health-specific modules of the
    mandatory graduation exam (our skill measure), proxies for the quality of the medical
    program they attended, family socioeconomic characteristics, and gender. The results suggest
    that the health-specific graduation exam scores are the variable with the highest power for
    predicting a physician’s skill level measured as relative value-added. In contrast, the other
    characteristics have no significant relationship to their value-added.
         How might more-skilled physicians contribute to improved birth outcomes? To shed light
    on potential mechanisms, we first analyze several heterogeneous effects across groups of
    mothers with different characteristics. Although the effects of being treated by more-skilled
    physicians are slightly more pronounced among first-time mothers, teenage mothers, mothers
    with low education, and single mothers, the differences between groups are not statistically
    significant.
         Furthermore, we examine heterogeneity across infants and LHCs. First, we estimate
    effects separately for male and female infants. It is commonly observed in the literature that
    male fetuses, as well as male infants, tend to be more susceptible to health shocks than females
    (Eriksson et al., 2010; Kraemer, 2000; Naeye et al., 1971; Pongou et al., 2017). To the extent that
    LHCs with more-skilled physicians improve children’s health at birth, they may help mitigate
    adverse shocks in utero. We find that the reduction in the probability of being born unhealthy
    is particularly pronounced for male infants, but the difference is not statistically significant
    between male and female infants. Second, we explore heterogeneity related to the proportion
    of SSO physicians within the LHC. We split the sample into LHCs with a high and low share of
    SSO physicians. While the point estimate is larger for LHCs with a higher share of SSO
    physicians, the difference between the two groups is not statistically significant.
         Having analyzed potential heterogeneous effects, we explore a mechanism through which
    physicians may improve health at birth: prenatal checkups. According to WHO (2016) and
    the Colombian government (Gomez et al., 2013), better and more frequent prenatal care can
    improve the health of mothers and their children.5 We follow the standard recommendations of
    the WHO (2016) in 2013, the first year of the records that we use, and define “adequate prenatal
    care” as having at least four checkups during pregnancy.6 We find that more-skilled physicians,
    on average, do not schedule more prenatal checkups than less-skilled physicians.7 This means

5
  Better and more frequent prenatal care improves maternal health because, during a prenatal checkup, pregnant
  women are screened and treated to avoid complications, preterm births, and other problems. Additionally,
  pregnant women are given critical information on nutrition, diet, and general safety practices, which has been
  shown to play a crucial role in in utero infant growth (Amarante et al., 2016; Kramer, 1987). Furthermore, in
  Colombia, the Ministry of Health requires that physicians carry out prenatal checkups (Gomez et al., 2013). As
  a result, physicians are responsible for prenatal care, and they are the professionals who attend 98 percent of
  deliveries.
6
  The data we have access to only record the number of prenatal checkups within ranges, preventing a more flexible
  use of this variable. In our sample, 87 percent of mothers have at least four checkups.
7
  Carrillo and Feres (2019) find no evidence of an increase in prenatal care when physicians were replaced by nurses.
that more-skilled physicians do not improve health at birth by increasing the number of prenatal
checkups they offer.
      Without increasing the number of prenatal checkups, more-skilled physicians might
improve birth outcomes by better targeting these checkups. We therefore test whether
more-skilled physicians target prenatal checkups toward more-vulnerable mothers (measured
as those predicted to be more likely to give birth to an unhealthy baby) without compromising
the care of lower-risk mothers. We use several machine learning techniques to generate
predictions of the probability that a mother will give birth to an unhealthy baby on the basis of
a set of LHC and mother characteristics, such as indicators of first-time mothers or teenage
mothers, that are usually salient to physicians at the time of prenatal care. Regardless of the
predictive technique we use, the results show that lower-risk mothers are not significantly
more likely to have at least the suggested number of prenatal checkups if they see more-skilled
physicians. This is consistent with the idea that more-skilled physicians do not compromise
the care of lower-risk mothers. However, physicians do seem to target more prenatal checkups
toward more-vulnerable mothers. We likewise show that the effects on birth outcomes of
being treated at an LHC with more-skilled physicians are particularly pronounced among
mothers with an ex ante high predicted probability of giving birth to an unhealthy baby. Taken
together, these results are consistent with the account that physicians are time constrained and
cannot increase the average number of prenatal checkups for all mothers but do improve the
targeting of care toward more-vulnerable mothers without compromising the care of
lower-risk mothers.
      This paper contributes to the literature in several ways. First, our study contributes to the
experimental evidence on the effects of more-skilled physicians on health outcomes (Chan and
Chen, 2022; Currie and Zhang, 2023; Dahlstrand, 2021; Fadlon and Van Parys, 2020; Stoye,
2022). Our identification strategy and the availability of granular administrative records allow
us to measure the causal impact on health outcomes of being treated at an LHC that was
randomly assigned more-skilled physicians. Previous studies have documented the
relationships between health outcomes and physicians’ diagnostic skills (Currie and MacLeod,
2020), physicians’ teams (Chen, 2021), health care access (Almond et al., 2010; Anderson et al.,
2014; Aron-Dine et al., 2015; Bardach et al., 2013; Finkelstein et al., 2012; Michalopoulos et al.,
2012), health care costs (Alsan et al., 2019; Clemens and Gottlieb, 2014; Molitor, 2018), the
quality of physicians’ academic institutions (Doyle Jr et al., 2010), physicians’ performance on
qualifying examinations (Carrera et al., 2018; Tamblyn et al., 2002; Wenghofer et al., 2009),
physicians’ competence (Das and Hammer, 2005, 2007; Das et al., 2008, 2016; Das and
Sohnesen, 2007; Leonard and Masatu, 2007; Leonard et al., 2007), physicians’ ability to
facilitate adherence to prescription medications (Iizuka, 2012; Simeonova et al., 2020),
physicians’ fees and payment for performance (Basinga et al., 2011; Ho and Pakes, 2014a,b),
general practitioners and specialists (Baicker and Chandra, 2004), and physicians’

communication (Curtis et al., 2013).
     Second, our paper contributes to the broader literature on service providers’ value-added,
extending the framework typically applied to education into health care. Studies have shown
that effective service providers, such as teachers, can significantly impact outcomes in their
respective fields (Araujo et al., 2016; Chetty et al., 2011; Rivkin et al., 2005; Rockoff, 2004).
Similarly, we find substantial heterogeneity in physicians’ value-added, highlighting the
crucial role of physician quality in health outcomes. Furthermore, consistent with Davis et al.
(1995) and Schnell and Currie (2018), who provide evidence on the significant link between
physicians’ education and their professional performance, our results show that physicians’
test scores on their graduation exams are strong predictors of their value-added. These
observable scores serve as practical tools with high predictive power for unobservable features
like physician value-added and can be effectively used to identify higher-performing
physicians.
     Finally, we contribute to the literature on the differential effects of variation in health care
personnel’s expertise. Previous papers have found wide variations in treatment rates
across LHCs due to allocative inefficiencies and variations in treatment expertise (Abaluck
et al., 2016; Chandra and Staiger, 2020; Currie and MacLeod, 2017). We benefit from recent
advances in machine learning techniques to show that more-skilled physicians target prenatal
consultations toward mothers with the highest risk of giving birth to an unhealthy baby. Our
results suggest that observable risk factors receive more attention from more-skilled physicians.
This, in turn, suggests that taking physicians’ skills into account when assigning and matching
them to areas or populations of greatest need could yield positive social value by improving
health outcomes among vulnerable populations.
     The remainder of this paper is organized as follows: In section 2, we describe the
Colombian health system and the SSO program, the setting we exploit to identify parameters
of interest. Section 3 describes the rich administrative data we derive from physicians’
graduation exams and patients’ birth outcomes. In section 4, we introduce our empirical
strategy, show evidence for the randomness of physicians’ LHC assignments, and present our
main estimated effects. Section 5 presents our robustness checks. In section 6, we discuss the
frequency of prescribed prenatal checkups as a potential mechanism through which
more-skilled physicians impact health outcomes. We conclude in section 7.


2     Institutional Background and Experimental Setting

2.1   Institutional Background
According to the Political Constitution of Colombia of 1991, access to health services is an
individual basic right. The system is structured to promote equity in the distribution of

    subsidies and access to health services (Law 100, Congress of Colombia, 1993). Law 100 of
    1993 introduced two types of health insurance: subsidized and contributive. The contributive
    regime covers formal employees (and their families) who contribute a fixed share of their
    employment income to the system. The subsidized regime covers poor household members
    who lack formal employment.8 By 2011, access to health care was close to universal; indeed,
    even among the poorest population, insurance coverage was at 87 percent, while in rural areas
    it was at about 88 percent (Páez et al., 2007).
         High levels of health care access are associated with greater use of reproductive health
    services, which is essential to reducing the risks associated with pregnancy and childbirth, as
    well as infant mortality (WHO, 2016). During our period of analysis, 87.7 percent of Colombian
    women received adequate prenatal care, defined by the WHO (2016) as having at least four
    prenatal checkups. Likewise, 8.8 percent of infants had a low birth weight, and 9.3 percent were
    born prematurely. Still, the system faces important challenges. In 2017, according to the United
    Nations Statistics Division database, the neonatal mortality rate (deaths per 1,000 live births)
    was 7.8 and the infant mortality rate (infant deaths per 1,000 live births) was 12.2.9
         To become a physician in Colombia, one must be accepted into an undergraduate health
    program in medicine.10 Medical students earn a BA after five to six years of education.
    According to Colombian law, all professionals who graduate from health programs are social
    servants; as such, directly after graduation, they must work in urban and rural areas with
    limited access to health services for one year before practicing as professionals. This service is
    provided under the SSO program. The current SSO program was created by Law 1164/2007
    (Congress of Colombia, 2007), but it was only adopted in 2010 when its implementation was
    legislated by Resolution 1058/2010 (Ministry of Health, 2010). The main objective of the SSO
    program is to improve the quality of health services in depressed urban and rural areas, to
    increase access to health services in those areas, and to better distribute human talent in health
    throughout the country. The SSO program also promotes spaces for the personal and
    professional development of those beginning their careers in the health sector.11
         Physicians play a key role in maternal medicine in the Colombian health system. The
    Ministry of Health (2013), in Resolution 1441 of 2013, states that any physician in Colombia
    can perform low-complexity surgeries and procedures, including child delivery, cesarean
    sections, providing medical care to infants, and offering early detection activities like prenatal
    checkups. An important characteristic of the Colombian health system is that physicians
    always carry out prenatal checkups. According to the practical guide for preventing, detecting,

8
   Eligibility for the subsidized regime is defined by the household’s wealth score in the System of Identification
   of Potential Social Program Beneficiaries (Sistema de Identificación de Potenciales Beneficiarios de Programas
   Sociales, or SISBEN), which is used to target public program beneficiaries in Colombia.
 9
   https://data.un.org/, consulted in May 2020.
10
   Other health programs include nursing, bacteriology, and dentistry.
11
   See Resolution 1058/2010 (Ministry of Health, 2010).


     and treating pregnancy complications by the Colombian Ministry of Health (Gomez et al.,
     2013), prenatal checkups can be carried out by nurses specializing in maternal-perinatal care
     instead, but calculations from the VSRs show that physicians are responsible for all prenatal
     checkups and attend 98 percent of deliveries.12


     2.2    Experimental Setting: The SSO Program
     By 2007, as the number of people getting medical training in Colombia increased, there were
     fewer available positions for SSO physicians than there were applicants. Therefore, how
     applicants would be chosen and assigned to LHCs became one of the program’s most critical
     decisions. Law 1164/2007 (Congress of Colombia, 2007) required that LHC assignments be
     “guided by the principles of transparency and equal conditions for all applicants.” In
     concordance, Resolution 1058/2010 established that applicants must be selected and assigned
     to LHCs through state-level lottery draws.
          At the end of 2012, a more organized approach was introduced. The first two years of the
     new program had shown that the directions in Resolution 1058/2010 were not robust enough
     to guarantee that the assignment of physicians to LHCs would be transparent and organized.
     Consequently, Resolution 566/2012 (Ministry of Health, 2012b) mandated that there would be
     four state-level SSO lottery draws each year, starting in January 2013.13 Applicants would
     choose the state of their assignment but would be randomly assigned to available positions in
     that state. Resolution 4503/2012 (Ministry of Health, 2012a) also provided clearer and more
     organized guidance on how the lottery draws should be conducted. To prevent strategic
     application behavior and to take advantage of the fact that the number of newly graduated
     physicians was about twice the number of available positions, Resolution 4503/2012
     established that physicians could apply only to one state and only when the number of
     applicants for that state was lower than twice the number of available positions. This rule
     guaranteed an excess of demand for spots in each state and cohort.
          After the application process closed, each state publicly and randomly assigned its available
     spots according to the following steps: First, an oversight board consisting of one civil servant
     from the state secretariat of health and four health professionals was chosen. The civil servant
     then publicly announced the number of positions available and who had registered for each
     profession. At this point, the civil servant also stated the rules for the lotteries, which typically used ballots.
     If an applicant received a white ballot, they were exempt from the SSO program and received
     a certificate allowing them to work in Colombia as a professional (i.e., their medical license).
     Otherwise, they received a red ballot with the randomly assigned code of the LHC where they
     would work. If there were fewer applicants than positions available, all the applicants who had

12
     Nurses who have just graduated from college cannot perform prenatal examinations in Colombia.
13
     The lottery draws took place in January, April, July, and October in each of Colombia’s 32 states.


     registered were assigned to an LHC, but the specific LHC was still assigned through the lotteries.
     Finally, the civil servant of the secretariat of health prepared a report listing the SSO physicians
     and their assigned LHCs, as well as the applicants who were exempt from the SSO program.
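     The ballot procedure described above can be illustrated with a minimal sketch. It is not the
     official procedure or the authors' code; the function name, the applicant labels, and the LHC
     codes are hypothetical, and the sketch assumes one ballot per applicant.

```python
import random

def run_state_lottery(applicants, lhc_codes, seed=None):
    """Simulate one state-level draw (hypothetical helper, for illustration only).

    Returns a dict mapping each applicant to an LHC code (red ballot)
    or to None (white ballot, i.e., exempt from the SSO program).
    """
    rng = random.Random(seed)
    # Red ballots: one per position that can be filled. If applicants are
    # scarce, a random subset of the open positions is used.
    n_red = min(len(applicants), len(lhc_codes))
    red_ballots = rng.sample(lhc_codes, n_red)
    # White ballots: one per applicant in excess of the available positions.
    white_ballots = [None] * (len(applicants) - n_red)
    ballots = red_ballots + white_ballots
    rng.shuffle(ballots)
    return dict(zip(applicants, ballots))

draw = run_state_lottery(["a1", "a2", "a3", "a4"], ["LHC-1", "LHC-2"], seed=1)
```

     With four applicants and two positions, two applicants receive an LHC code and two are exempt.
     With fewer applicants than positions, every applicant is assigned, but the specific LHC is
     still random.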
          A physician’s social service at their assigned LHC typically began between one and three
     months after the lottery draw and lasted for 12 months. The starting date was defined before
     the random assignment and, therefore, was orthogonal to the physicians’ characteristics as
     well. If a physician refused to work in the LHC to which they were assigned or unilaterally
     quit before the official end of their service, they were given a six-month sanction, during which
     time they could not work as a health professional. After that period, they had to apply to the
     SSO program again. This sanction imposed high costs on quitters and proved to be a good
     deterrent against dropping out of the program. This system for randomly assigning applicants to
     LHCs lasted for seven lottery draws.14 Since October 2014, a new centralized system that gives
     more weight to applicants’ stated preferences and a prioritized list has replaced the random
     assignment process.
          The Ministry of Health (1990, 2001) specifies that the responsibilities of physicians during
     their SSO service include the following:

        • Developing health prevention programs (such as vaccinations, family planning programs,
          prenatal checkups, chronic disease control, and oral and visual health programs)

        • Providing primary care and diagnosis

        • Assigning treatment and therapies

        • Creating and improving medical records

        • Making a health plan and epidemiological profile for the local community

        • Performing any other duty stated in their contract

     Moreover, LHCs explicitly mention attending and performing surgical procedures, including
     cesarean sections and child delivery, as part of the functions and activities of SSO physicians.15
         The period of time during which physicians were randomly assigned to LHCs is a
     convenient setting to estimate causal relationships that would otherwise be difficult to identify.
     The SSO assignment has implications for both the physicians who were selected randomly and
     the communities that were assigned physicians with different qualities. The latter set of
     implications is the focus of the present paper; the implications for physicians are studied in
     Guarin et al. (2023). In this paper, we use the exogenous rule of assignment to compare the

14
     This lottery system covered all four of the 2013 cohorts and the first three cohorts of 2014.
15
     We reviewed the manual of functions for five LHCs included in our sample. The reviewed institutions were LHC
     Salazar de Villeta, LHC Francisco Valderrama, Subred de Servicios de Salud sur, Red de servicios del primer nivel,
     and Guaviare.


     birth outcomes of patients in LHCs that were assigned physicians with different levels of
     medical skill but who are otherwise comparable.16


     3   Data
     We use five main sets of administrative data. The primary data set comes from the reports
     written and published by the Ministry of Health for each of the state-level SSO lottery draws,
     which were conducted in January, April, July, and October 2013 and January, April, and July 2014
     (Ministry of Health, 2014). From these data, we obtain individual identifications, the lottery
     draw date, the state to which each physician applied, whether the physician was selected by the
     lottery or not, and, importantly, the LHC to which each physician was randomly assigned and
     the proposed start date. For our period of analysis, 45 percent of the LHCs in the SSO program
     show up in only one lottery draw, while 29 percent of the LHCs appear in two lottery draws and
     26 percent of the LHCs appear in three to five lottery draws.
          The second administrative data set comes from the Colombian Institute for Educational
     Evaluation (Instituto Colombiano para la Evaluación de la Educación, or ICFES). ICFES is the
     institution that administers SABER PRO, the exam that all professionals, including physicians,
     must take before college graduation (Colombian Institute for Educational Evaluation, 2014).
     Using national ID numbers, we are able to link the physicians who participated in the SSO
     program to the ICFES records and recover information on their performance in SABER PRO.
     From SABER PRO, we glean data on physicians’ individual performance in two health-related
     modules, one that tests their knowledge of health care and another that tests their knowledge of
     disease prevention, as well as detailed sociodemographic information about each physician.17
          Our estimations use the scores in the two health-specific modules as proxies for
     physicians’ medical skills before the SSO program.18 The objective of the health-specific
     modules in SABER PRO is to measure the skills and knowledge of medical professionals.

16
   While service in the SSO is mandatory for health graduates in nursing, bacteriology, and dentistry as well as
   medicine, in this paper we focus on physicians for three reasons. First, the excess demand for the state-level lottery
   draws was mandatory for physician positions, creating suitable conditions for lotteries. Second, as previously
   mentioned, prenatal checkups in Colombia must be carried out by physicians (Gomez et al., 2013). Finally,
   physicians arguably make the greatest contribution to the health of the patient (Das and Hammer, 2005) and to
   birth outcomes in particular.
17
   We also recover data on physicians’ individual performance in two other modules: one that tests reading
   comprehension and another that tests quantitative reasoning. Graduation exam scores are only available for the
   newly appointed physicians (i.e., we do not have the exam scores of physicians who graduated before 2009).
18
   The correlation between a physician’s medical skills and their test performance has previously been documented
   in the literature. For example, Norcini et al. (2002) and Norcini et al. (2014) show a strong correlation between
   mortality and a physician’s certifying examination performance. Similarly, Tamblyn et al. (2002) find a relationship
   between examination scores and the primary care practice of doctors in Quebec. Wenghofer et al. (2009) find an
   association between medical examination scores and the quality of health care in Canada, while Tamblyn et al.
   (2007) find a relationship between physicians’ exam scores and patients’ complaints to the medical regulatory
   authorities.


     According to ICFES, the health care module assesses whether the physician has the
     competence to provide care that integrates both disease prevention and proper diagnosis with
     medical treatment and patient rehabilitation at all levels of complexity. The disease prevention
     module evaluates the physician’s competence to apply basic concepts of health promotion and
     disease prevention to prioritize actions according to individuals’ health conditions.
           ICFES ranks physicians according to four levels of quality. Physicians who score in the
     lowest level of the health care module only understand basic concepts and elements of
     epidemiology and public health. On the other hand, physicians who score in the highest level
     understand public health concepts (actions aimed at mitigating the health problems of
     communities), can assess patients’ health conditions, and can analyze social, cultural, and
     economic factors that may influence differences across patients’ health. Similarly, for the
     disease prevention module, physicians who score in the lowest level understand basic concepts
     of biosafety and occupational risk. Those who score in the highest level can analyze complex
     health situations in a given context and select appropriate actions following current
     regulations and standards in medicine. Because the SSO program is the physicians’ first real
     work experience, and because SABER PRO is taken just before graduation, we consider their
     scores a good measure of their medical skills at the time they start their SSO service and their
     professional career.19
           In Colombia, as in many other developing countries, there is high heterogeneity in the
     quality of medical education. In 2009, only 30 percent of medicine programs in Colombia had
     been accredited as high-quality programs by the Ministry of Education (Fernández Ávila
     et al., 2011). Figure 1 shows high heterogeneity in average scores on the health-specific SABER
     PRO modules between and within universities for the physicians in our sample.20 The figure
     shows the mean score for each university and an interval of one standard deviation to each
     side of the mean. Note that there is a difference of almost two standard deviations between the
     averages of the best and the worst programs. This high heterogeneity plays in our favor
     because it allows us to compare the outcomes of patients who were randomly exposed to
     physicians with very different baseline levels of knowledge and skills.21
           Using the scores and demographic characteristics from SABER PRO, Guarin et al. (2023)
     have shown that the SSO lotteries in our sample are well balanced between SSO physicians and
     those who were randomly exempted from participation in the SSO program. They use
     individual regressions correlating physicians’ characteristics and lottery status as well as
     machine learning techniques and a classification permutation test to provide evidence of the
     equality of multivariate distributions between the treatment and control groups

19
   Schnell and Currie (2018) provide evidence on the important link between physicians’ education and their
   professional performance.
20
   In Colombia, each university has no more than one medicine program.
21
   Similarly, figure A.4 shows substantial heterogeneity in scores on the quantitative and reading modules for the
   universities the physicians in our sample attended.

(Gagnon-Bartsch et al., 2019) and the randomness of selection into the program in general.
      The third administrative data set comes from VSRs collected by the Administrative
Department of Statistics (Departamento Administrativo Nacional de Estadística, or DANE)
(Administrative Department of Statistics, 2018b). The VSRs contain rich information for all
birth certificates filed in LHCs within Colombia’s 1,120 municipalities (subdivisions of the 32
states) from 1998 to 2018. Using LHCs’ identification codes, we are able to link physicians and
the birth records of the LHCs to which they were assigned. Using the birth date and number
of gestation weeks from the VSRs, we are able to identify children born between 2013 and 2016
who were exposed to each team of physicians. We also use the VSR data from 2009 to 2012 to
create mother- and LHC-level controls to provide evidence of covariate balance at the LHC
level and to run placebo tests.
      The fourth administrative data set comes from the 2005 National Census, also collected by
DANE (Administrative Department of Statistics, 2005). From the census, we get the population
and other variables at the municipality level that we use to test the randomization of the program
and as controls in the robustness checks.
      Finally, we collect information from the National Registry of Human Resources in Health
(Registro Único Nacional del Talento Humano en Salud, or ReTHUS). The Ministry of Health
designed ReTHUS through Law 1164 of 2007 (Congress of Colombia, 2007). ReTHUS registers
all individuals authorized to practice a health profession or occupation. These data contain
detailed information on the date of degrees, the date on which the medical license was granted,
and postgraduate degrees. We also collect additional data at the LHC level from the Colombian
Ministry of Health.




           Figure 1: Heterogeneity in SABER PRO Scores between and within Medicine Programs




                      Note: This figure reports the health care and disease prevention module
                      scores on SABER PRO for the universities (Ministry of Education, 2019)
                      that the physicians in our sample attended. The data account for 44
                      different universities. The figure shows the mean score for each university
                      and an interval of one standard deviation. The dashed horizontal line
                      represents the overall median. The figure shows substantial heterogeneity
                      both within and between programs. For all the fields reported, there is
                      a difference of almost two standard deviations between the averages of
                      the best and the worst programs and a difference of almost one standard
                      deviation between the averages of the worst and the median program and
                      the averages of the median program and the best program.



     3.1     Main Sample
     As noted above, our cohorts of SSO physicians were chosen at random in state-level lottery
     draws conducted in January, April, July, and October 2013 and January, April, and July 2014. We
     exclude physicians assigned to metropolitan areas (MA) because the presence of larger hospitals
     and other LHCs may introduce selection biases that we do not expect in smaller municipalities.22
     Additionally, SSO physicians play a less pivotal role in metropolitan areas.23 Our sample of 598
     municipalities covers about 58 percent of the Colombian population.
          The main sample consists of all babies whose mothers were exposed to our randomly
     assigned cohorts of SSO physicians in non-metropolitan areas. Although regulations stipulate

22
   To determine which municipalities are not part of a metropolitan area, we restrict our sample to those outside the
   23 metropolitan areas defined by DANE, which bases its definition on population size and the degree of integration
   of urban centers with surrounding municipalities.
23
   In Colombia, patients are assigned to a nearby Level 1 LHC as their primary facility for basic care. Level 1 LHCs,
   which are typically staffed by SSO practitioners and offer basic health care services with low-complexity technology,
   often are the only facilities in smaller municipalities. In contrast, metropolitan areas have health centers and
   hospitals of all levels, allowing mothers to easily substitute among multiple providers. While national regulations
   specify that SSO physicians are responsible for maternal care, including family planning and prenatal checkups
   (Ministry of Health, 1990, 2001), SSO physicians may be less likely to perform prenatal care in metropolitan
   areas due to the presence of more experienced and specialized doctors. The SSO program’s objective is to
   provide professional services in mostly rural areas with limited access to health services (Ministry of Health, 2010,
   Resolution 1058/2010); accordingly, between 2013 and 2014, 77.3 percent of the available positions for assigned
   physicians were in small cities outside metropolitan areas.

     that SSO physicians should be the ones treating pregnant mothers in their assigned hospitals,
     we do not observe which physicians actually treated the mothers. Instead, we consider a
     mother to be exposed to an SSO cohort if her gestation period overlaps with the time of a
     cohort’s assignment to the LHC where she gave birth. This implies that a mother can be
     exposed to multiple cohorts; in fact, 50 percent of the babies in our sample are exposed to more
     than one cohort.
          As detailed in the empirical strategy section, our main variable of interest—which serves
     as a proxy for the level of skill of the physicians to which a baby was exposed—is calculated
     based on the graduation exam scores of the SSO cohorts to which mothers were exposed. For
     the 50 percent of cases in which a mother was exposed to only one cohort, we use the average
     of that cohort’s exam scores. For the other 50 percent of cases, in which a mother was exposed
     to more than one cohort, we compute a weighted average of the exam scores of the different
     cohorts, where the weight for each cohort is the number of overlapping days between her
     gestation period and the time of the cohort’s assignment to her LHC. Since there is usually
     only one LHC per municipality in our non-metropolitan areas main sample, mothers are not
     expected to be exposed to more than one LHC. We exclude from the analysis 53 babies for
     whom gestational age information is missing.
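     The exposure measure described above can be sketched as follows. This is an illustration
     under stated assumptions, not the authors' code: the function names are hypothetical, the
     gestation period is approximated as the interval from the birth date minus the reported
     gestation weeks up to the birth date, and both intervals are treated as closed.

```python
from datetime import date, timedelta

def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two closed date intervals (0 if disjoint)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0, (earliest_end - latest_start).days + 1)

def exposure_score(birth_date, gestation_weeks, cohorts):
    """Overlap-weighted mean exam score for one birth.

    cohorts: list of (service_start, service_end, mean_score) tuples for the
    LHC where the birth took place. Returns None if no cohort overlaps the
    gestation period.
    """
    # Assumed start of the gestation period: birth date minus gestation weeks.
    gestation_start = birth_date - timedelta(weeks=gestation_weeks)
    weighted_sum = 0.0
    total_days = 0
    for start, end, score in cohorts:
        days = overlap_days(gestation_start, birth_date, start, end)
        weighted_sum += days * score
        total_days += days
    return weighted_sum / total_days if total_days else None
```

     When a mother overlaps with a single cohort, the function returns that cohort's average
     score; with several cohorts, each cohort's average is weighted by its days of overlap.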
          Our main sample contains 255,089 babies and 2,126 physicians. For each baby, we observe
     the birth certificate, which includes information on low birth weight, Apgar score, weeks of
     gestation, prenatal checkups, and demographic information for the mother and the child. For
     each physician, we observe their scores on the SABER PRO health care, disease prevention,
     reading comprehension, and quantitative reasoning modules, as well as sociodemographic
     information they provided at the time of the graduation exam.
          Table 1 provides basic descriptive statistics for the main health outcomes we measure using
     data from the VSRs.24 It also shows how our main sample compares to the full sample of mothers
     and babies exposed to SSO physicians. The binary variable unhealthy takes a value of 1 if the
     infant has a birth weight below 2,500 grams, is born before 37 weeks of gestation, or has an
     Apgar score below 7. We use the variable unhealthy as our main measure of a newborn infant’s
     health at birth, while also analyzing birth weight, prematurity, and Apgar score individually.
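     As a minimal sketch of this definition (hypothetical function names, not the authors' code),
     the composite indicator is the maximum of the three binary indicators. The sketch assumes
     that each indicator is already coded as 0 or 1.

```python
def unhealthy(low_birth_weight, premature, low_apgar):
    """Return 1 if at least one of the three binary conditions holds, else 0."""
    return max(int(low_birth_weight), int(premature), int(low_apgar))

def share_unhealthy(records):
    """Share of births flagged as unhealthy.

    records: iterable of (low_birth_weight, premature, low_apgar) tuples.
    """
    flags = [unhealthy(*record) for record in records]
    return sum(flags) / len(flags)
```

     Because conditions can co-occur, the share of unhealthy births is at most the sum of the
     three individual shares.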
          Columns 1 and 2 show the mean and standard deviation, respectively, for babies in LHCs
     to which at least one SSO physician was assigned (the full SSO sample); columns 3 and 4 show
     the same statistics when we constrain the sample to municipalities outside of the main
     metropolitan areas (the rural SSO sample, labeled "No MA SSO sample" in the table, which is
     our main sample). In our main sample, 4.27 percent of infants had a low

24
     Unfortunately, we do not have continuous measures for birth weight or Apgar scores. However, we do have data on
     gestational weeks. To keep consistency across analyses, we have opted to use binary outcome variables throughout.
     That said, we also conducted analyses using gestational weeks as a continuous variable and explored alternative
     definitions of the binary variable for prematurity. The results are consistent with the results reported in the main
     analysis.

     birth weight, 4.11 percent were born prematurely, 3.75 percent had an Apgar score below 7,
     and 9.52 percent of newborn infants experienced at least one of these three medical conditions,
     meaning they were classified as unhealthy. The share of female infants is 48.84 percent.
     Moreover, 16.3 percent of the mothers had insufficient prenatal care, which is an indicator
     variable that takes the value of 1 if the mother received fewer than four prenatal checkups.
     Teenage pregnancy accounts for 28.46 percent of total births in the main sample. Finally, the
     average number of LHCs by municipality is around 1.2.

      Table 1: Descriptive Statistics for Mothers and Babies Exposed to SSO Physicians, 2013–2016
        Covariate                     Description                     Full SSO sample       No MA SSO sample
                                                                      Mean       SD         Mean       SD
                                                                      (1)        (2)        (3)        (4)
        Low birth weight              1(Weight < 2500 g)              0.0594     0.2364     0.0427     0.2022
        Prematurity                   1(Gestational weeks < 37)       0.0615     0.2402     0.0411     0.1985
        Low Apgar score               1(Apgar score < 7)              0.0379     0.1911     0.0375     0.1900
        Unhealthy                     max(LBW, Premature, Apgar)      0.1175     0.3220     0.0952     0.2935
        Insufficient prenatal care    1(Prenatal checkups < 4)        0.1755     0.3804     0.1630     0.3693
        Number of observations                                             363,744               255,089
        Note: This table presents the mean and standard deviation (SD) for the main birth statistics of the mothers and babies
        affected by the SSO program. The data come from the 2013–2016 DANE VSRs, which collect information about all
        births and deaths in Colombia. The full SSO sample covers all the LHCs that had an SSO physician assigned to them
        in our sample, while the rural SSO sample, our main sample, is restricted to municipalities outside metropolitan
        areas. Low birth weight is the proportion of newborn infants whose birth weight was less than 2,500 grams. Prematurity
        is the proportion of newborn infants who were born after fewer than 37 weeks of gestation. Low Apgar score is the
        proportion of newborn infants whose Apgar score was lower than 7. Unhealthy, our main measure of health at birth, is
        the proportion of newborn infants with at least one of the three previous conditions. Female infants is the proportion
        of female infants. Insufficient prenatal care is the proportion of mothers who had fewer than four prenatal checkups.
        Teenage mothers is the proportion of mothers who were 19 years old or younger at the time they gave birth. Number of
        LHCs per municipality is the count of LHCs in the birthplace municipality.




     3.1.1   Municipalities

     As previously mentioned, we restrict our sample to municipalities in rural areas—outside of
     the main 23 Colombian metropolitan areas—where we expect fewer physicians per
     municipality. There are 598 municipalities included in our sample (see figure A.2). The
     median number of people living in each municipality is 14,049 (the mean is 22,042). The
     average share of people living with unsatisfied basic needs (UBN) is almost 50 percent,
     including some municipalities where the whole population lives with UBN.25 These figures
     indicate that SSO physicians provide their services in LHCs located in underserved areas.

25
     As a reference, the average share of people living with UBN for the 23 and 7 largest cities and their metropolitan
     areas is 21.5 percent and 17.4 percent, respectively.



         We obtain the total number of physicians per municipality from ReTHUS.26 Of the 598
     municipalities included in our sample, only 16 have more than one LHC. The
     median number of physicians per LHC is three, and around 94 percent of the LHCs have fewer
     than 20 physicians.27
         Most deliveries are attended by general practitioners and SSO physicians. In fact,
     approximately 90 percent (527 out of 582) of the municipalities with one LHC and available
     data on specialist availability do not have an obstetrician or gynecologist working in their
     LHCs at any time during our sample period. While this highlights the limited access to
     specialists in these areas, it is significant for our study, as SSO physicians play a crucial role in
     providing maternal health care in their LHCs.

     3.1.2   SSO Physicians

     As noted above, our main sample includes 2,126 physicians who were selected in one of seven
     lottery draws between 2013 and 2014. Table A.1 presents baseline summary statistics for the
     physicians in our sample. Nearly 56 percent of the physicians are women. While 29 percent of
     physicians lived in lower socioeconomic neighborhoods (strata 1 and 2), 36 percent lived in
     stratum 3, and 35 percent resided in higher-income neighborhoods (strata 4–6). Given that less
     than 10 percent of Colombians live in strata 4–6, this indicates that physicians in our sample
     generally come from households with significantly better economic conditions than the
     median Colombian. Physicians’ average household size is four people. Looking at the parents
     of SSO physicians, 64.4 percent (63.4 percent) of fathers (mothers) have completed tertiary
     education. Almost 45 percent of these households have a monthly income of less than three
     times the monthly minimum wage (22.9 percent earn less than two). Finally, the average score
     on the health care module for the physicians in our sample is 10.4, with a maximum of 13.9
     and a standard deviation of 1, and the average score on the disease prevention module for the
     physicians in our sample is 10.4, with a maximum of 13.4 and a standard deviation of 1.

     3.1.3   Compliance

     We use the ID numbers of all the physicians in the SSO program between 2013 and 2014 and
     merge them with ReTHUS to get the dates on which the physicians graduated and obtained their
     medical licenses.28 We define as compliers those physicians who obtained their licenses more
     than three months but less than two years after their graduation date. The share of compliers is

26
   Unfortunately, ReTHUS provides information at the municipality level, so we can only match SSO physicians, not
   every physician, to the LHC at which they work.
27
   Figure A.3 shows the distribution of physicians per municipality for the sample of 582 municipalities with one LHC
   per municipality.
28
   In addition to requiring physicians to receive their license between three months and two years after graduation,
   we limit the definition to those who do not appear in subsequent lottery draws within the same time frame.


     94 percent.
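     The complier definition can be sketched as a simple date rule. The function name is
     hypothetical, and the day thresholds (91 days for three months, 730 days for two years) are
     our approximations for illustration; the exact calendar convention is not stated in the text.

```python
from datetime import date

MIN_GAP_DAYS = 91    # assumed stand-in for "three months"
MAX_GAP_DAYS = 730   # assumed stand-in for "two years"

def is_complier(graduation_date, license_date, reappears_in_later_draw=False):
    """Flag a physician as a complier under the approximated date rule.

    A complier obtains the license more than three months but less than two
    years after graduation and does not reappear in a subsequent lottery draw.
    """
    gap = (license_date - graduation_date).days
    return MIN_GAP_DAYS < gap < MAX_GAP_DAYS and not reappears_in_later_draw
```

     For example, a license granted 14 months after graduation satisfies the rule, while one
     granted within a month of graduation does not.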


     4     Empirical Analysis
     The aim of our empirical analysis is to identify the impact of more-skilled physicians on birth
     outcomes. We estimate this impact on the 255,089 children whose mothers received care from
     2,126 physicians randomly allocated to 616 LHCs in rural areas in 2013 and 2014. As previously
     noted, the principal outcome that our empirical approach focuses on is an aggregate measure of
     health at birth that incorporates the three main indicators commonly studied in the literature:
     low birth weight, prematurity, and Apgar score. To proxy physicians’ level of medical skills,
     we focus on the average score of the two health-specific SABER PRO exam modules. We also
     provide robustness checks using the first principal component of the two health-specific scores,
     which statistically combines them into a single index capturing the largest shared variation, and
     we consider each score individually.
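     To illustrate how a first principal component combines two scores, the sketch below uses the
     closed form for two standardized variables. It is an illustration, not the authors' code:
     the function names are hypothetical, and standardizing the scores first is our assumption.
     For two standardized variables with correlation r, the covariance matrix is [[1, r], [r, 1]],
     whose leading eigenvector is (1, 1)/sqrt(2) when r is positive.

```python
import math

def standardize(values):
    """Center and scale to unit (population) variance. Assumes non-constant input."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def first_pc_scores(score_a, score_b):
    """First principal component of two standardized score vectors."""
    za, zb = standardize(score_a), standardize(score_b)
    # Correlation between the two standardized scores.
    r = sum(x * y for x, y in zip(za, zb)) / len(za)
    # Leading eigenvector is (1, 1)/sqrt(2) if r >= 0 and (1, -1)/sqrt(2) otherwise.
    sign = 1.0 if r >= 0 else -1.0
    weight = 1.0 / math.sqrt(2.0)
    return [weight * (x + sign * y) for x, y in zip(za, zb)]
```

     Under these assumptions, when the two scores are positively correlated, the first principal
     component is proportional to the simple average of the standardized scores.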
          We first test the internal validity of our identification strategy. Next, we present our main
     results on birth outcomes. We also explore whether physicians’ effects are more pronounced
     among different subgroups. Finally, we compute a relative measure of value-added and regress
     it on several physician characteristics, including our measure of physicians’ skill levels.


     4.1    Empirical Strategy
     Our empirical strategy examines a health production function linking birth outcomes to
     physicians’ skills. Specifically, we consider the following linear model:

                         $$Y_i = \alpha + \beta Z_{i,h,t} + \epsilon_i, \tag{1}$$

     where $Y_i$ is the birth outcome of child $i$, $Z_{i,h,t}$ represents the weighted average skill level of the
     physicians’ cohort working at LHC $h$ during the gestation period $t$ of child $i$, and $\epsilon_i$ is the error
     term. Note that the analysis is conducted at the child level and that, while we denote the LHC
     by $h$ and the gestation period by $t$, these depend on child $i$.29
         In our setting, some mothers were exposed to more than one SSO cohort during their
     pregnancy. Estimating equation (1) directly using ordinary least squares (OLS) may result in
     biased estimates of β due to potential correlation between the assignment of physicians and
     unobserved characteristics of patients. To address this endogeneity, we leverage the random

29
     For simplicity, we denote the LHC by $h$, the gestation period by $t$, and the draw-by-state fixed effects by
     $\gamma^1_d$, omitting their dependence on child $i$. Formally, we would write $h(i)$, $t(i)$, and $\gamma^1_{d(i)}$, but we use
     simplified notation to enhance readability. Note also that the gestation period $t$ represents the time interval
     corresponding to the pregnancy period of child $i$. Additionally, the draw-by-state fixed effect, $\gamma^1_d$, depends
     on $i$ through the timing of the first SSO cohort the mother was exposed to during her pregnancy.


assignment of physicians to LHCs and employ an IV approach. To isolate the causal variation
associated with the random assignment, we use the skill level of the first SSO cohort a mother
was exposed to during her pregnancy, $Z^1_{i,h,t}$, as an instrument for the weighted average skill
level of all SSO cohorts she was exposed to over the course of her pregnancy, $Z_{i,h,t}$. The
first-stage equation is

                    $$Z_{i,h,t} = \eta + \pi Z^1_{i,h,t} + \gamma^1_d + \nu_{i,h,t}, \tag{2}$$

where $Z_{i,h,t}$ is calculated as a weighted average of the graduation exam scores of the different
cohorts, with weights given by the number of overlapping days between the gestation period
and the period of each cohort’s assignment to the mother’s LHC; $Z^1_{i,h,t}$ is the skill level of the
first cohort of physicians the mother is exposed to, proxied by the average of their scores; $\gamma^1_d$
are draw-by-state fixed effects corresponding to the first cohort; and $\nu_{i,h,t}$ is the error term.
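
The overlap weighting described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure used in the paper's reproducibility package: the function names, the closed-interval day count, and all dates and scores are hypothetical assumptions.

```python
from datetime import date


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days shared by two closed date intervals (0 if disjoint)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0, (earliest_end - latest_start).days + 1)


def weighted_skill(gestation_start, birth_date, cohorts):
    """Overlap-weighted average of cohort scores for one pregnancy.

    `cohorts` is a list of (start, end, average_score) tuples describing the
    cohorts assigned to the mother's LHC.
    """
    weights = [overlap_days(gestation_start, birth_date, s, e) for s, e, _ in cohorts]
    total = sum(weights)
    if total == 0:
        return None  # the pregnancy overlaps no cohort
    return sum(w * score for w, (_, _, score) in zip(weights, cohorts)) / total


def first_cohort_skill(gestation_start, birth_date, cohorts):
    """Score of the earliest-starting cohort that overlaps the pregnancy."""
    for s, e, score in sorted(cohorts, key=lambda c: c[0]):
        if overlap_days(gestation_start, birth_date, s, e) > 0:
            return score
    return None


# Hypothetical pregnancy and cohort assignment periods (standardized scores).
cohorts = [
    (date(2013, 1, 1), date(2013, 6, 30), 0.5),
    (date(2013, 7, 1), date(2013, 12, 31), -0.3),
]
z = weighted_skill(date(2013, 3, 1), date(2013, 11, 30), cohorts)
z1 = first_cohort_skill(date(2013, 3, 1), date(2013, 11, 30), cohorts)
```

In this example the pregnancy overlaps the first cohort for 122 days and the second for 153 days, so the weighted average lies between the two cohort scores while the instrument equals the first cohort's score.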
      The reduced-form equation is

                    $$Y_i = \theta + \rho Z^1_{i,h,t} + \gamma^1_d + \varepsilon_i, \tag{3}$$

      and the second-stage equation is

                    $$Y_i = \alpha + \beta \hat{Z}_{i,h,t} + \gamma^1_d + \epsilon_i. \tag{4}$$

In equation (4), β identifies the impact on the child’s health at birth of a mother’s being treated
at an LHC h that has been randomly assigned a more-skilled SSO cohort. Similarly, ρ in the
reduced-form equation (3) captures the overall effect of the skill level of the first SSO cohort on
birth outcomes. This parameter reflects the total impact of the mother’s initial exposure to more-
skilled physicians on birth outcomes, combining both the effect through the average physician
skill level during gestation and any direct effects mediated by the instrument.
      A key identifying assumption behind our IV approach is that conditional on the draw-by-
state fixed effects of the first SSO cohort, the skill level of the first cohort predicts the average
skill level of the physicians the mother was exposed to and affects birth outcomes only through
this channel.
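
The relationship among the first-stage, reduced-form, and second-stage equations can be illustrated with a small simulation. The sketch below is not the paper's estimation code: the data-generating process, parameter values, and function names are assumptions chosen only to show that, with a single instrument and the same fixed effects in every equation, the second-stage coefficient equals the reduced-form coefficient divided by the first-stage coefficient. Fixed effects are absorbed by demeaning within groups.

```python
import random
from collections import defaultdict


def demean_within(values, groups):
    """Subtract group means; this absorbs group (draw-by-state) fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def slope(y, x):
    """OLS slope of y on x when both series are already demeaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Simulated data (hypothetical): u is an unobserved confounder that moves both
# realized skill exposure z and the outcome y; z1 is randomly assigned.
random.seed(0)
n = 5000
draw = [random.randrange(10) for _ in range(n)]
z1 = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
z = [0.8 * a + 0.5 * b + random.gauss(0, 0.3) for a, b in zip(z1, u)]
true_beta = -0.5
y = [true_beta * zi + ui + 0.1 * d + random.gauss(0, 1)
     for zi, ui, d in zip(z, u, draw)]

y_t, z_t, z1_t = (demean_within(s, draw) for s in (y, z, z1))

pi_hat = slope(z_t, z1_t)           # first stage, equation (2)
rho_hat = slope(y_t, z1_t)          # reduced form, equation (3)
z_hat = [pi_hat * v for v in z1_t]  # fitted values from the first stage
beta_2sls = slope(y_t, z_hat)       # second stage, equation (4)
beta_ols = slope(y_t, z_t)          # OLS with fixed effects, for comparison
```

In this simulation the OLS slope is pulled away from the true coefficient by the confounder, whereas the IV estimate stays close to it. The sketch reports point estimates only and omits the clustered standard errors used in the paper.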


4.2    Internal Validity
Our identification relies on the assumption that conditional on the design fixed effects, the
allocation of physicians to LHCs, h, is independent of potential outcomes, Yi . To assess the
internal validity of our identification strategy, we conduct two tests. First, we examine whether
any characteristics of the LHCs, municipalities, mothers, or children (including pretreatment
LHC birth outcomes) are correlated with the skill level of the physicians who were randomly
assigned in 2013 and 2014. Second, we implement placebo tests by assigning a “placebo

     treatment” to births recorded in the VSRs during the four years prior to the program
     (2009–2012) instead of the years used in our main estimation sample (2013–2016).

     4.2.1   Balance Tests on Pretreatment and Concurrent Covariates

     To test for any correlation between physicians’ skill levels and LHC, municipal, and
     demographic characteristics, we conduct balance tests on two separate sets of variables:
     pretreatment characteristics measured from 2010 to 2012 and concurrent characteristics during
     the period of each SSO cohort’s assignment to the LHC.
          The pretreatment variables include LHC-level covariates, such as municipality population,
     number of LHCs in the municipality, average birth outcomes, and predetermined demographics
     of mothers (e.g., education, age, and marital status) and children (e.g., sex), for births occurring
     from 2010 to 2012 in those LHCs. The concurrent variables include municipality population,
     number of LHCs in the municipality, and predetermined demographics of mothers and children
     born during the period when the SSO cohorts were assigned to each LHC.
          For each set of variables, we estimate the following equation using OLS:

                    $$X_{h(j),\tau} = \mu + \phi Z_j + \gamma_{d(j)} + \zeta_{h(j),\tau}, \tag{5}$$

     where $X_{h(j),\tau}$ represents the LHC, municipal, or demographic characteristic for LHC $h$ during
     the relevant variable-specific time interval $\tau$ (i.e., pretreatment or concurrent); $Z_j$ is the proxy
     for the medical skills of cohort $j$ assigned to LHC $h$ in lottery draw $d$, measured as the average
     of their health-specific graduation exam scores; $\gamma_{d(j)}$ are draw-by-state fixed effects; and $\zeta_{h(j),\tau}$ is
     the error term.
          Under our identification assumption, we expect that there should be no significant
     correlation between the proxy for physicians’ skill level and the baseline characteristics of
     LHCs, municipalities, and demographics. A lack of significant relationships in these balance
     tests would suggest that the random assignment of physicians is indeed independent of the
     pre-existing characteristics of the LHCs and the populations they serve.
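
     A balance regression of this form can be sketched in Python as follows. The example is a
     simplified illustration on simulated data, not the paper's code: fixed effects are absorbed by
     within-group demeaning, the cluster-robust standard error omits any finite-sample correction,
     and all names and values are hypothetical.

```python
import math
import random
from collections import defaultdict


def demean_within(values, groups):
    """Subtract group means; this absorbs the group fixed effects."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def balance_coefficient(x, z, fe_groups, clusters):
    """Slope of covariate x on skill z with fixed effects and clustered SE."""
    x_t = demean_within(x, fe_groups)
    z_t = demean_within(z, fe_groups)
    szz = sum(v * v for v in z_t)
    phi = sum(a * b for a, b in zip(z_t, x_t)) / szz
    # Sum the score z * residual within each cluster before squaring.
    scores = defaultdict(float)
    for a, b, c in zip(z_t, x_t, clusters):
        scores[c] += a * (b - phi * a)
    se = math.sqrt(sum(s * s for s in scores.values())) / szz
    return phi, se


# Simulated LHC-by-cohort observations in which skill is unrelated to x.
random.seed(1)
fe, cluster, z, x = [], [], [], []
for lhc in range(300):
    for cohort in range(3):
        fe.append((cohort, lhc % 5))          # hypothetical draw-by-state cell
        cluster.append(lhc)                   # cluster at the LHC level
        z.append(random.gauss(0, 1))          # cohort skill, randomly assigned
        x.append(0.4 + random.gauss(0, 0.1))  # covariate unrelated to skill

phi_hat, se_hat = balance_coefficient(x, z, fe, cluster)
```

     Because skill is assigned independently of the covariate in this simulation, the estimated
     coefficient is small relative to the range of the covariate, which is the pattern a balance
     test looks for.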
          Table 2 presents the coefficients (ϕ) from estimating equation (5) using OLS, with
     standard errors clustered at the LHC level. The results show no significant correlation between
     physicians’ skill level and either the pretreatment or concurrent LHC, municipal, or
     demographic characteristics, supporting the assumption that physician assignment is
     independent of these baseline characteristics.30

30
     In table A.2, we replicate the analysis using the average of all four modules of the SABER PRO exam as a proxy for
     physicians’ skill levels, and we find consistent results.




                              Table 2: Covariate Balance at the LHC Level

                                       Covariate                      Coefficient             Standard
                                                                                                error
                        a. Pretreatment variables (2010–2012)
                        Unhealthy                                         0.00744              0.00700
                        Low birth weight                                  0.00071              0.00198
                        Prematurity                                       0.00017              0.00330
                        Low Apgar score                                  −0.00246               0.0037
                        Insufficient prenatal care                       −0.00255              0.00574
                        Female infants                                   −0.00201              0.00315
                        Mothers with basic education                     −0.00220              0.00631
                        Married mothers                                   0.00004              0.00510
                        Teenage mothers                                   0.00607              0.00415
                        Number of LHCs per municipality                  −0.01687              0.01677
                        Municipality population                          −888.07               1,848.66
                        b. Concurrent variables
                        Female infants                                    0.00075              0.00266
                        Mothers with basic education                      0.00181              0.00699
                        Married mothers                                   0.00095              0.00498
                        Teenage mothers                                  −0.00047              0.00358
                        Number of LHCs per municipality                  −0.00777              0.02058
                        Municipality population                           −93.30               2,411.90
                        Note: This table presents the results of different LHC-by-cohort level regressions
                        (equation 5) of the LHC-level variables, listed in the first column, on the measure
                        of physicians’ skill level and the draw-by-state fixed effects. The coefficient and
                        the standard error of the physicians’ skill variable are reported in the second and
                        third columns, respectively. Standard errors are clustered at the LHC level. LHCs’
                        characteristics in panel a come from the 2010–2012 DANE VSRs, using a total of 1,837
                        LHC-by-cohort observations. LHCs’ characteristics in panel b come from the 2013–
                        2015 DANE VSRs, using a total of 1,714 LHC-by-cohort observations. Unhealthy, our
                        main measure of health at birth, is the proportion of newborn infants with at least
                        one of the three following conditions: low birth weight, prematurity, or low Apgar
                        score. Low birth weight is the proportion of newborn infants whose birth weight was
                        less than 2,500 grams. Prematurity is the proportion of newborn infants who were
                        born after fewer than 37 weeks of gestation. Low Apgar score is the proportion of
                        newborn infants whose Apgar score was lower than 7. Insufficient prenatal care is the
                        proportion of mothers who had fewer than four prenatal checkups. Female infants
                        is the proportion of female infants. Mothers with basic education is the proportion
                        of mothers with at least secondary education at the time they gave birth. Married
                        mothers is the proportion of mothers that were married at the time they gave birth.
                        Teenage mothers is the proportion of mothers who were 19 years old or younger at
                        the time they gave birth. Number of LHCs per municipality is the count of LHCs in
                        the birthplace municipality. We interpret the non-significance of these estimates as
                        evidence in favor of the randomness of the assignment of physicians.




4.2.2   Placebo Tests

To further assess internal validity, we conduct placebo tests by applying our estimation strategy
to data from the four years preceding the SSO program (2009–2012) rather than to data from
2013–2016, the actual years of SSO physician assignments. Specifically, we shift the physicians’
arrival times four years earlier, simulating the same lottery draw dates, proposed start dates,
and LHC assignments as in the main analysis, but for the period before the program began. We
then estimate equations (4) (LATE) and (3) (reduced form) using the same outcomes and fixed
effects as in our main analysis.
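
The date shift behind the placebo can be sketched in Python as follows. This is only an illustration: the record layout, the identifiers, and the rule for handling February 29 are assumptions, not details taken from the paper.

```python
from datetime import date


def shift_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


# Hypothetical assignment records: (LHC identifier, cohort start, cohort end).
assignments = [
    ("LHC-001", date(2013, 7, 1), date(2014, 6, 30)),
    ("LHC-002", date(2014, 1, 15), date(2015, 1, 14)),
]

# Placebo assignments keep the LHC and cohort but move both dates back four years.
placebo = [(lhc, shift_years(s, -4), shift_years(e, -4)) for lhc, s, e in assignments]
```

Births from 2009 to 2012 are then matched to these placebo periods exactly as actual births are matched to the real assignment periods.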


          Since physicians in our sample did not treat children born in 2009, 2010, 2011, and 2012, we
     would expect null effects. Table 3 shows that the point estimates are not statistically significant
     for our main outcome measure, unhealthy, and for each of the other birth outcome measures
     (low birth weight, prematurity, and low Apgar score).31 Our results are robust to the use of the first
     principal component as a proxy for skill and to the inclusion of controls, such as
     ex ante LHC and mother characteristics and a vector of mother-child sociodemographic
     information (figure A.5 and table A.4).

                                                         Table 3: Placebo Test

                                                              Unhealthy         LBW         Prematurity       Low Apgar
                                                                               Average health scores
                                                                  (1)           (2)         (3)                    (4)
                                                                              a. Reduced-form estimates
                        Coefficient                            −0.0015        −0.0011         −0.0019           < 0.0001
                        SE                                     (0.0020)       (0.0010)        (0.0013)          (0.0014)
                        Relative effect                        −1.28%         −2.38%          −3.71%             0.04%
                                                                                   b. 2SLS estimates
                        Coefficient                            −0.0019        −0.0014         −0.0024           < 0.0001
                        SE                                     (0.0024)       (0.0013)        (0.0016)          (0.0017)
                        Relative effect                        −1.58%         −2.93%          −4.59%             0.05%
                        Average dependent variable               0.118          0.046         0.052               0.046
                        Number of observations                                          261,216
                        Note: This table presents a placebo test in which we estimate equations (3) and (4) but move the
                        arrival date of the physician back four years (2009–2012). The coefficients represent the effect of
                        being treated at an LHC that was randomly assigned SSO physicians whose skill level is higher
                        by one standard deviation. Relative (percent) effects are computed as the coefficient divided by
                        the average of the dependent variable. Unhealthy is a binary variable that takes a value of 1 if the
                        newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer
                        than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero
                        otherwise. LBW is a binary variable that takes a value of 1 if the newborn infant has a birth weight
                        below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the
                        newborn infant is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a
                        binary variable that takes a value of 1 if the Apgar score of the newborn infant is lower than 7 and
                        zero otherwise. All regressions control for draw-by-state fixed effects. Numbers in parentheses
                        are LHC-level clustered standard errors. We read the results of this placebo test as additional
                        evidence in favor of the randomness of the assignment of the physicians to LHCs.
                        2SLS = two-stage least squares
                        * p < 0.1, ** p < 0.05, *** p < 0.01
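
The outcome definitions in the note above can be written as a short Python function. This is an illustrative sketch; the function name and the example records are hypothetical, and the note does not say which Apgar measurement (for example, one minute or five minutes after birth) is used.

```python
def unhealthy(birth_weight_g, gestation_weeks, apgar):
    """Return 1 if any of the three adverse conditions holds, else 0."""
    low_birth_weight = birth_weight_g < 2500
    premature = gestation_weeks < 37
    low_apgar = apgar < 7
    return int(low_birth_weight or premature or low_apgar)


# Hypothetical birth records: (weight in grams, weeks of gestation, Apgar score).
births = [
    (3100, 38, 8),  # none of the conditions
    (2400, 39, 9),  # low birth weight only
    (3000, 36, 9),  # premature only
    (3000, 40, 6),  # low Apgar only
]
flags = [unhealthy(*b) for b in births]
share_unhealthy = sum(flags) / len(flags)
```

Averaging the indicator over the births at an LHC gives the proportion used as the LHC-level variable in table 2.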




     4.3    Impacts on Birth Outcomes
     In this section, we present our main results on the impact of physicians’ level of medical skill
     on birth outcomes. Figure 2 shows the first-stage estimate of π in equation (2). The average
     graduation exam score of the first SSO cohort is a relevant instrument for the weighted average
     graduation exam score of all SSO cohorts to which the mother was exposed during pregnancy.
     There is a positive and statistically significant relationship, with an estimated coefficient of

31
     In table A.3, we repeat the same exercise and present the results for windows 3.5, 3, 2.5, and 2 years before the start
     of the SSO program.

0.823 (standard error of 0.0013). This strong relationship is expected because most mothers
were exposed to only one or two cohorts (50 percent were exposed to only one), and for a mother
exposed to a single cohort the two measures coincide. Table 4 presents the reduced-form
(panel a) and LATE (panel b) estimates, while figure 3 displays the reduced-form estimates
graphically. We find a substantial improvement in children’s health at birth when mothers are
treated at an LHC randomly assigned a more-skilled SSO cohort. In particular, our main skill
measure has a negative and significant effect on the unhealthy outcome measure and on each
health outcome measure individually. In table 4, both panel a and panel b present the
coefficients with standard errors in parentheses. Below the standard errors, the relative
(percent) effect is computed by dividing each coefficient by the mean of the dependent variable.
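
The relative-effect calculation can be reproduced from the printed values, as in the following Python sketch. It uses the rounded LATE coefficients and outcome means from table 4, so the results differ slightly from the percentages in the table, which were presumably computed from unrounded estimates. The function and variable names are illustrative.

```python
def relative_effect(coefficient, mean_outcome):
    """Coefficient expressed as a percentage of the dependent-variable mean."""
    return 100.0 * coefficient / mean_outcome


# Rounded LATE coefficients and outcome means as printed in table 4, panel b.
table_4_late = {
    "Unhealthy": (-0.0087, 0.095),
    "LBW": (-0.0041, 0.043),
    "Prematurity": (-0.0045, 0.041),
    "Low Apgar": (-0.0043, 0.038),
}

relative = {name: relative_effect(b, m) for name, (b, m) in table_4_late.items()}
for name, pct in relative.items():
    print(f"{name}: {pct:.2f}%")
```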

Figure 2: Average Exam Score of All SSO Cohorts by Average Exam Score of First SSO Cohort




                     Note: This figure presents a binned scatter plot of the average
                     graduation exam score of all SSO cohorts a mother was exposed
                     to during pregnancy against the first cohort’s average exam
                     score for the physicians in our sample. The regression fit corresponds
                     to the first-stage estimate (π) from equation (2). The
                     regression controls for draw-by-state fixed effects. The number
                     in parentheses is the LHC-level-clustered standard error.


     In the IV estimates in column (1) of panel b in table 4, we observe a significant negative
relationship between the skill level of SSO physicians and the probability that a baby is born
unhealthy—a decrease of 0.87 percentage points. That is, if a mother is assigned a cohort of
SSO physicians whose scores in the health modules of the graduation exam were one standard
deviation higher, the probability that her baby is born unhealthy decreases by 9.14 percent.
Notably, in our context, an increase of one standard deviation is almost equivalent to moving
from having a physician from the bottom-ranked program to one from a median-ranked
program, or from having a physician from a median-ranked program to one from the
top-ranked program (see figure 1). In the education context, the teacher value-added

     literature (Chetty et al., 2014; Rothstein, 2017) has found that an increase in teacher quality of
     one standard deviation corresponded to an increase in students’ test scores of 0.19 standard
     deviations in math and 0.14 standard deviations in reading. Columns (2) to (4) of panel b
     examine each birth outcome measure individually. The point estimate indicates a decrease in
     the probability of low birth weight by 0.41 percentage points (9.57 percent), a decrease in the
     probability of premature birth by 0.45 percentage points (10.99 percent),32 and a decrease in
     the probability of a low Apgar score by 0.43 percentage points (11.56 percent).33
          Our results align with Amarante et al. (2016), who explore in utero exposure to a social
     assistance program in Uruguay to estimate its effects on birth outcomes. They find that
     participation in the program led to a “sizeable” (19–25 percent) reduction in the incidence of
     low birth weight. Similarly, Currie and Schwandt (2016a) find that fetal exposure to the toxic
     dust release during the collapse of the World Trade Center in New York City on 9/11
     negatively affected gestation length, prematurity, birth weight, and low birth weight. Barber
     and Gertler (2010) evaluate the impact of a cash transfer program in Mexico on birth weight
     and find a very large reduction in the incidence of low birth weight (44.5 percent lower among
     beneficiary mothers).

32
   These results are consistent with prior findings in the literature and in the Colombian context that prematurity is
   an important determinant of birth weight (Almond et al., 2005). We find a strong correlation between prematurity
   and low birth weight in Colombia. Figure A.1 shows a monotonic negative correlation between the probability of
   low birth weight and the number of gestational weeks for all births in Colombia between 2009 and 2012. The figure
   presents the local polynomial regression fit of the probability of low birth weight over the number of gestational
   weeks using all birth records in Colombia from 2009 to 2012.
33
   Colombia’s infant mortality rate was 6.7 percent in 2022, lower than the average for middle-income countries and
   Latin America but slightly higher than that of upper-middle-income countries. Due to substantial data limitations in
   mortality records, including over 30 percent of records that are missing information on the number of weeks of
   gestation as well as incomplete LHC data, we are compelled to conduct a cohort-level analysis instead of our
   preferred birth-level estimates. We compute cohort-level estimates of mortality in table A.6. As expected, the
   estimates are subject to attenuation from measurement error. The results indicate that more-skilled physicians have a
   negative effect on mortality, though it is not statistically significant, consistent with our main findings.




           Table 4: Main Estimates of the Effect of Physicians’ Skill Level on Birth
           Outcomes

                                                     Unhealthy          LBW         Prematurity        Low Apgar
                                                                        Average exam scores
                                                          (1)            (2)         (3)                    (4)
                                                                      a. Reduced-form estimates
               Coefficient                           −0.0072***       −0.0034*        −0.0037***        −0.0036**
               SE                                     (0.0022)        (0.0017)         (0.0014)         (0.0015)
               Relative effect                        −7.52%          −7.88%           −9.05%            −9.52%
                                                                          b. LATE estimates
               Coefficient                           −0.0087***      −0.0041**        −0.0045***        −0.0043**
               SE                                     (0.0026)       (0.0021)          (0.0017)         (0.0019)
               Relative effect                        −9.14%          −9.57%           −10.99%          −11.56%
               Average dependent variable               0.095           0.043          0.041               0.038
               Number of observations                                           255,089
               Notes: This table presents our main estimates from equations (3) and (4). The coefficients represent
               the effect of being treated at an LHC that was randomly assigned SSO physicians whose skill level is
               higher by one standard deviation. Relative (percent) effects are computed as the coefficient divided
               by the average of the dependent variable. The first-stage coefficient and standard error are shown in figure
               2. Unhealthy is a binary variable that takes a value of 1 if the newborn infant has a birth weight
               below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or if the
               Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW is a binary variable that
               takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise.
               Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after fewer than
               37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the
               Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions control for
               draw-by-state fixed effects. The numbers in parentheses are LHC-level-clustered standard errors.
               We interpret the high significance and consistency of these results across the different measures
               of birth outcomes as evidence of the important role that skilled physicians play in determining an
               infant’s health at birth.
               LATE = local average treatment effect
               * p < 0.1, ** p < 0.05, *** p < 0.01
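
As a consistency check on table 4, the LATE coefficients in panel b can be compared with the reduced-form coefficients in panel a divided by the first-stage coefficient of 0.823. With a single instrument and the same fixed effects in every equation, the two should agree up to rounding. The Python sketch below uses only the rounded values printed in the text and table; the variable names are illustrative.

```python
FIRST_STAGE = 0.823  # first-stage coefficient reported with figure 2

# Rounded coefficients as printed in table 4.
reduced_form = {
    "Unhealthy": -0.0072,
    "LBW": -0.0034,
    "Prematurity": -0.0037,
    "Low Apgar": -0.0036,
}
late_reported = {
    "Unhealthy": -0.0087,
    "LBW": -0.0041,
    "Prematurity": -0.0045,
    "Low Apgar": -0.0043,
}

# Ratio of reduced form to first stage for each outcome.
late_implied = {name: rho / FIRST_STAGE for name, rho in reduced_form.items()}
gaps = {name: abs(late_implied[name] - late_reported[name]) for name in reduced_form}
```

Each implied value is within 0.0001 of the corresponding reported LATE coefficient, which is the precision allowed by four-decimal rounding.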




4.4   Physicians’ Impacts across Subgroups
In this section, we explore whether the effects of skilled physicians on birth outcomes,
presented in the previous sections, are more pronounced among some subgroups. We focus
solely on the LATE estimates from equation (4), although the reduced-form equation yields
similar (rescaled) conclusions. The economics literature has extensively explored
heterogeneous effects across different socioeconomic groups, using measures such as mother’s
education, age, marital status, and the sex of the infant (Almond and Mazumder, 2011;
Amarante et al., 2016; Currie and Schwandt, 2016a; Dinkelman, 2017; Eriksson et al., 2010;
Hoynes et al., 2011; Okeke and Abubakar, 2020; Persson and Rossin-Slater, 2018). Consistent
with these studies, our data include information from the VSRs on the infant’s sex and the
mother’s education, age, and marital status, as well as whether the mother is a first-time
mother. We find that the effect of being assigned to a more-skilled physician on our main birth
outcome measure, unhealthy, is slightly more pronounced among first-time mothers, teenage

Figure 3: Reduced-Form Estimates of the Effect of Physicians’ Skill Level on Birth Outcomes




          Note: This figure presents a binned scatter plot of our main birth outcome measures against the first-cohort
          average graduation exam score. Unhealthy is a binary variable that takes a value of 1 if the newborn infant
          has a birth weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation,
          or if the Apgar score of the newborn infant is lower than 7 and zero otherwise. Low birth weight is a binary
          variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise.
          Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after fewer than 37
          weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the Apgar
          score of the newborn infant is lower than 7 and zero otherwise. The regression fits correspond to the reduced-form
          estimates (ρ) from equation (3). All regressions control for draw-by-state fixed effects. The numbers
          in parentheses are LHC-level-clustered standard errors. Results are robust to the exclusion of outliers.




     mothers, mothers with low education, and single mothers (see table 5). While this pattern
     suggests that more vulnerable mothers may benefit somewhat more from more-skilled
     physicians, none of these differences across mothers’ characteristics were statistically
     significant. In section 6, we will revisit this heterogeneity across mothers’ characteristics while
     discussing potential mechanisms.

        Table 5: Heterogeneity of the Effects on Birth Outcomes across Subgroups of Mothers and
        Babies
                                                                                    Dependent variable: unhealthy
                                         First-time      Non-first-time  Teenage         Non-teenage     Mothers with    Mothers with    Married         Single
                                         mothers         mothers         mothers         mothers         low education   high education  mothers         mothers
                                         (1)             (2)             (3)             (4)             (5)             (6)             (7)             (8)
           Coefficient                   −0.0110***      −0.0077***      −0.0127***      −0.0079***      −0.0101***      −0.0075**       −0.0077***      −0.0106***
           SE                            (0.0037)        (0.0023)        (0.0035)        (0.0027)        (0.0028)        (0.0032)        (0.0027)        (0.0031)
           Relative effect               −10.05%         −9.02%          −11.19%         −8.93%          −10.14%         −8.42%          −8.90%          −9.81%
           Average dependent variable    0.109           0.086           0.113           0.088           0.100           0.089           0.087           0.109
           Number of observations        103,557         151,531         72,608          182,478         151,513         103,574         154,288         100,801
           Difference test (p-value)                 0.46                            0.28                            0.54                            0.47
           Note: This table presents the heterogeneity of our estimated results from equation (4) when we divide the sample by mothers’ characteristics and infants’ gender.
           The coefficients represent the effect of being assigned a physician whose skill level is higher by one standard deviation for each subgroup. Relative (percent)
           effects are computed as the coefficient divided by the average of the dependent variable. Unhealthy is a binary variable that takes a value of 1 if the newborn infant
           has a birth weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower
           than 7 and zero otherwise. First-time refers to the group of mothers who are giving birth to their first child, and non-first-time refers to the complementary
           group. A mother is a teenage mother if she is giving birth at age 19 or younger and a non-teenage mother otherwise. A mother is a married mother if she is married
           at the moment of giving birth and a single mother otherwise. All regressions control for draw-by-state fixed effects. The numbers in parentheses are LHC-level-
           clustered standard errors.
           * p < 0.1, ** p < 0.05, *** p < 0.01
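
           As a rough cross-check of the table (an illustrative sketch, not the authors' procedure), the relative effects and the difference tests can be approximated from the reported coefficients and standard errors. The sketch assumes that the two subgroup estimates are independent and uses a normal approximation; the paper's own test relies on LHC-level clustering and unrounded estimates, so it need not coincide with this calculation. The function names are hypothetical.

```python
import math


def difference_p_value(b1, se1, b2, se2):
    """Two-sided p-value for H0: b1 == b2, treating the two estimates as independent."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - cdf)


def relative_effect(coef, mean_dep_var):
    """Relative (percent) effect: coefficient divided by the mean of the outcome."""
    return 100.0 * coef / mean_dep_var


# Rounded inputs from columns (1) and (2): first-time vs. non-first-time mothers.
p = difference_p_value(-0.0110, 0.0037, -0.0077, 0.0023)
rel = relative_effect(-0.0110, 0.109)
print(round(p, 2), round(rel, 2))
```

           With these rounded inputs, the p-value falls between 0.4 and 0.5, in the same range as the 0.46 reported for the first pair of columns; small gaps relative to the table are expected because the inputs are rounded.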




          We also look at heterogeneity across infants’ and LHC characteristics. In table 6, columns
     1 and 2, we examine whether the treatment effects vary by the infant’s sex. It has been
     established that male fetuses are more vulnerable to health shocks than female fetuses
     (Almond and Mazumder, 2011; Currie and Schwandt, 2016a; Eriksson et al., 2010; Kraemer,
     2000; Naeye et al., 1971).34 It is possible that skilled physicians play an important role in
     mitigating negative shocks on more-vulnerable fetuses. Although the reduction in our
     measure of the number of unhealthy babies is particularly pronounced among male infants,
     we do not find any statistical difference between males and females.
          Finally, we examine heterogeneity associated with the share of physicians from the SSO
     program relative to a proxy for the entire physician workforce in their local settings.35 Although
     the Ministry of Health (1990, 2001) specifies that SSO physicians are responsible for maternal
     care, including family planning and prenatal checkups, if the randomly assigned physicians do
     not constitute the entire workforce at the LHCs, the coefficient in our main regression (equation

34
   In medicine and epidemiology, this phenomenon is known as “fragile males” (Cameron, 2004; Eriksson et al., 2010;
   Kraemer, 2000; Mathews et al., 2008; Mizuno, 2000).
35
   To calculate the share of physicians from the SSO program, we obtain the total number of physicians for each
   municipality using ReTHUS and PILA data (see section 3). While this share is calculated at the municipality level,
   it is equivalent to calculating at the LHC level for 97.3 percent of municipalities, as only 2.7 percent of municipalities
   in our sample have more than one LHC.


4) may show larger effects in LHCs with a greater share of physicians from the SSO program
(and hence greater exposure to the random assignment).
     To quantitatively test this idea, we implement two exercises. First, we estimate separately
for the subset of LHCs with a high and low share of physicians from the SSO program, where
high (low) is defined as those LHCs above (below) the 75th percentile of the distribution of the
shares. Table 6, columns 3 and 4, shows that, while the point estimate for LHCs with a higher
share of SSO physicians is larger, there is not a significant difference between the two groups.
In a second exercise, we re-estimate table 4 but add as a separate control the share of physicians
from the SSO program. Table A.7 shows that the results are quantitatively the same. The point
estimate’s lack of strong dependence on the share of SSO physicians may suggest that LHCs in
our sample adhere closely to the regulation recommending that SSO physicians take primary
responsibility for conducting prenatal care.

      Table 6: Heterogeneity of the Effects on Birth Outcomes across Subgroups of LHCs

                                                                   Dependent variable: unhealthy
                                                        Female            Male          Higher        Lower
                                                        infants          infants      share of SSO share of SSO
                                                                                       physicians   physicians
                                                           (1)             (2)            (3)          (4)
                 Coefficient                          −0.0072***      −0.0106***        −0.0145**           −0.0087***
                 SE                                    (0.0028)        (0.0029)         (0.0067)             (0.0027)
                 Relative effect                       −7.73%          −10.92%          −13.74%              −9.19%
                 Average dependent variable              0.093            0.097           0.106               0.095
                 Number of observations                 124,577          130,508          14,894             240,191
                 Difference test (p-value)                        0.40                               0.42
                 Note: This table presents the heterogeneity of our estimated results from equation (4) when we divide
                 the sample by LHC. The coefficients represent the effect of being assigned a physician whose skill
                 level is higher by one standard deviation for each subgroup. Relative (percent) effects are computed
                 as the coefficient divided by the average of the dependent variable. Unhealthy is a binary variable that
                 takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant
                 is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower
                 than 7 and zero otherwise. An LHC with a higher incidence of unhealthy (lower incidence of unhealthy) is
                 an LHC above (below) the 75th percentile of the ex ante unhealthy proportion distribution. An LHC
                 with a higher share of SSO physicians (lower share of SSO physicians) is an LHC above (below) the 75th
                 percentile of the ex ante share of the SSO physicians proportion distribution in the SSO sample. All
                 regressions control for draw-by-state fixed effects. We interpret these results as evidence of a (weak)
                 significant difference between the effect of physicians in LHCs with a high and low incidence of poor
                 health. The numbers in parentheses are LHC-level-clustered standard errors.
                 * p < 0.1, ** p < 0.05, *** p < 0.01




4.5   Physicians’ Value-Added
While our empirical setting links birth outcomes to physicians’ skill levels, the estimated
coefficients should not be interpreted as the effect of exogenously increasing physicians’ skills
while keeping all else constant. Instead, we identify the effect of being treated at an LHC that
is randomly assigned a more-skilled cohort of SSO physicians compared to a less-skilled
cohort, including all the characteristics that may differ between these two groups of


physicians. Our results can be informative for policy makers, as test scores are often an
observable proxy for skills. However, it is important to note that test scores may not capture all
the factors that influence clinical competence, so this measure may understate the role of
physicians’ skill levels in determining patient health outcomes. To estimate physicians’
broader contribution to children’s health at birth, we take advantage of the random
assignment of physicians to LHCs and compute a relative measure of value-added.
     Consider the model where child $i$’s potential birth outcome when assigned to a cohort of
SSO physicians $j$, denoted by $Y_{j,i}$, can be written as the sum of two components:

\[ Y_{j,i} = v_j + \alpha_i, \tag{6} \]

where $v_j$ is the average potential effect of physician $j$ on the child’s outcomes at birth and $\alpha_i$ is the
child’s latent health at birth. Let $D_{j,i}$ be a dummy variable indicating whether child $i$’s mother
was assigned to the cohort of SSO physicians $j$. The observed birth outcome for child $i$ can then be
expressed as

\[ Y_i = Y_{s,i} + \sum_{j=1}^{J} (Y_{j,i} - Y_{s,i}) D_{j,i} = v_s + \sum_{j=1}^{J} \beta_j D_{j,i} + \alpha_i, \tag{7} \]


where $v_s$ represents the average potential outcome associated with a reference cohort of SSO
physicians, indexed by $s$, and the parameter $\beta_j$ measures the value-added of cohort $j$ relative to
this reference cohort. In most settings, the match between physicians and patients is at risk of
being correlated with patients’ unobserved characteristics, implying that the estimation
of equation (7) using OLS would result in potentially biased estimates of the physician’s
value-added.
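
     To see why the second line of equation (7) follows from the first, one can substitute equation (6) into it. The short derivation below uses only the symbols already defined and assumes, as equation (6) does, that the cohort effect does not vary across children.

```latex
\begin{align*}
Y_{j,i} - Y_{s,i} &= (v_j + \alpha_i) - (v_s + \alpha_i) = v_j - v_s \equiv \beta_j, \\
Y_i &= Y_{s,i} + \sum_{j=1}^{J} (Y_{j,i} - Y_{s,i}) D_{j,i}
     = v_s + \alpha_i + \sum_{j=1}^{J} \beta_j D_{j,i}.
\end{align*}
```

     The child's latent health cancels in the difference, which is why each coefficient is a pure cohort contrast relative to the reference cohort.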
     Now, consider the projection version of equation (7), controlling for the draw-by-state
fixed effects $\gamma_d$:

\[ Y_i = \gamma_d + \sum_{j=1}^{J} \beta_j D_{j,i} + \varepsilon_i. \tag{8} \]

Since, in our setting, several cohorts of SSO physicians applied to a specific state and were
randomly assigned to LHCs in that state, we have $E[D_{j,i}\,\varepsilon_i \mid \gamma_d] = 0$ for all $j = 1, \dots, J$, and OLS
estimates can identify the causal effect of being randomly assigned a more-skilled cohort of
SSO physicians on children’s outcomes. Due to the random nature of physicians’ assignment
to LHCs at the draw-by-state level, we estimate physicians’ relative value-added by first
running a regression of the unhealthy indicator on the draw-by-state fixed effects:



\[ Y_i = \gamma_d + r_i. \tag{9} \]

     We then compute residuals $\hat{r}_i$ from equation (9) and regress them on the $J$ different assignment
     indicator dummies to recover the estimated physician effect:

\[ \hat{r}_i = \sum_{j=1}^{J} \beta_j D_{j,i} + \epsilon_i, \tag{10} \]

     where $\hat{\beta}_j$, estimated using OLS, is an unbiased estimate of physician $j$’s effect on children’s
     health relative to the draw-by-state average. Since the outcome in equation (9) is the probability
     of being born unhealthy, a “smaller” value-added has a positive connotation.
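
     The two-step procedure in equations (9) and (10) can be sketched in a few lines. The example is a minimal illustration on made-up data, not the authors' code. It assumes that each child belongs to exactly one cohort and one draw-by-state cell, so that step 1 reduces to subtracting cell means and step 2 (OLS on mutually exclusive dummies without a constant) reduces to averaging residuals within each cohort. All function and variable names are hypothetical.

```python
from collections import defaultdict


def group_means(values, groups):
    """Return {group: mean of the values observed in that group}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}


def relative_value_added(y, draw_state, cohort):
    """Two-step sketch of equations (9) and (10).

    Step 1: residualize y on draw-by-state fixed effects (subtract cell means).
    Step 2: regress residuals on cohort dummies without a constant; with
            mutually exclusive dummies, OLS returns each cohort's mean residual.
    """
    cell_mean = group_means(y, draw_state)
    residuals = [yi - cell_mean[d] for yi, d in zip(y, draw_state)]
    return group_means(residuals, cohort)


# Toy data (hypothetical): cell "d1" holds cohorts A and B, cell "d2" holds cohort C.
y = [1, 0, 0, 0, 1, 1]                      # 1 = unhealthy birth
draw_state = ["d1", "d1", "d1", "d1", "d2", "d2"]
cohort = ["A", "A", "B", "B", "C", "C"]
va = relative_value_added(y, draw_state, cohort)
print(va)
```

     In this toy example, cohort A has a positive estimate (more unhealthy births than its cell average) and cohort B a negative one, which, given the sign convention above, marks B as the better-performing cohort.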
          The empirical value-added literature typically shrinks the value-added estimates toward a
     common Bayesian prior (Herrmann et al., 2016). The benefit of the shrinkage procedure is to
     produce estimates of value-added for which the estimation error variance is reduced through
     the dependence on the stable prior. In practice, the prior is specified as the average
     value-added (Chetty et al., 2014; Kane et al., 2008).36 The weight applied to the prior for a
     cohort of physicians is an increasing function of the variance with which that value-added is
     estimated. The following formula describes the empirical shrinkage procedure estimated:

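\[ \hat{\beta}_j^{EB} = a_j \hat{\beta}_j + (1 - a_j)\,\bar{\beta}, \qquad a_j = \frac{\hat{\sigma}^2}{\hat{\sigma}^2 + \hat{\lambda}_j}, \]

     where $\hat{\sigma}^2$ is the estimated variance of value-added and $\hat{\lambda}_j$ is the squared standard error of $\hat{\beta}_j$. Our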
     shrunken value-added result implies that assigning a team of physicians at the 25th percentile of
     the skill distribution, compared to the 75th percentile, would increase the likelihood of a child’s
     being unhealthy by approximately 0.08 standard deviations.
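
     The shrinkage formula above can be illustrated with a short sketch. The text does not state how the variance of value-added is estimated, so the example assumes a common method-of-moments choice: the variance of the raw estimates minus the average squared standard error, floored at zero. That choice, the toy numbers, and the function name are assumptions of this sketch rather than details taken from the paper.

```python
from statistics import mean, pvariance


def shrink(beta_hat, se):
    """Empirical Bayes shrinkage of value-added estimates toward their mean.

    beta_hat: estimated value-added coefficients, one per cohort.
    se:       their standard errors.
    """
    beta_bar = mean(beta_hat)
    lam = [s ** 2 for s in se]  # squared standard errors
    # Assumption: signal variance = variance of the estimates minus the
    # average sampling variance, floored at zero.
    sigma2 = max(pvariance(beta_hat) - mean(lam), 0.0)
    shrunk = []
    for b, l in zip(beta_hat, lam):
        a = sigma2 / (sigma2 + l) if sigma2 + l > 0 else 0.0
        shrunk.append(a * b + (1.0 - a) * beta_bar)
    return shrunk


beta_hat = [-0.04, 0.00, 0.04]   # toy estimates
se = [0.01, 0.01, 0.03]          # the third cohort is estimated less precisely
shrunk = shrink(beta_hat, se)
print(shrunk)
```

     The first and third toy estimates are equally far from the mean, but the third has the larger standard error and is therefore pulled closer to the prior, as the weighting rule implies.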
          We then regress the unbiased value-added estimates on multiple physician characteristics,
     including average performance on the health modules of the graduation exam, to study which
     SSO physician characteristics correlate more with the estimated value-added effects.37
     Columns (1) and (2) of table 7 show the results of regressing the mentioned physicians’
     estimated effects on different sets of physician characteristics. Columns (3) and (4) control for
     the additional observable characteristics (children and LHC) in equation (9) (e.g., LHC health
     indicators) to account for the quality of other, potentially longer-term-appointed physicians at
     the LHC. Table 7 shows that results are similar to the ones presented in columns (1) and (2).

36
   This benefit is particularly valuable in applications where we want an estimator that performs well on average
   (Angrist et al., 2017; Chetty et al., 2014; Harris and Sass, 2014; Kane et al., 2008), reducing mean squared error.
37
   Since we are working with cohorts of SSO physicians, these characteristics will be calculated as averages.

          We interpret the results from table 7 as evidence of the relevance of the health-specific
     graduation exam scores for predicting physicians’ skills. Column (1) shows that the
     graduation exam score is negatively and significantly correlated with the physicians’ relative
     value-added. An increase of one standard deviation in the exam score is associated with a
     0.0124 percentage point improvement in value-added.38
          However, the significant relationship between the scores and the relative value-added
     could be the result of the graduation exam score’s correlation with other physician
     characteristics, which could be more relevant and closely associated with the physicians’
     performance. To test this hypothesis, we regress the estimated relative value-added on the
     physicians’ exam scores and other characteristics that were observed at the same time as exam
     scores, including gender, family socioeconomic characteristics, and some proxies for the
     quality of the medicine program they attended. The results in column (2) show that, not only
     does the coefficient on test scores remain significant and statistically similar to the one in column
     (1), but also, once we account for the exam score, none of the other observed physician
     characteristics have a significant correlation with the physicians’ performance.
          These two results highlight the relevance of the graduation exam scores as both a practical,
     observable tool and as an indicator with high predictive power. Finally, as expected, columns
     (3) and (4) indicate that the random assignment allows us to obtain similar results even when
     controlling for other physician characteristics in the value-added estimation.

38
     Note that this coefficient should be similar to the one estimated in table 4 but does not have to be the same; the
     regression in table 4 is at the child level, whereas the regression in table 7 is at the cohort level.




       Table 7: Physicians’ Observable Characteristics and Their Relative Value-Added

                                                                             Dependent variables
                                                     Value-added without controls             Value-added with controls
                                                        (1)           (2)                        (3)          (4)
           Average exam scores                       −0.0141**             −0.014**           −0.0152**         −0.0159***
                                                      (0.0063)             (0.0063)            (0.0061)          (0.0061)
           Female                                                          −0.0016                               −0.0036
                                                                            (0.009)                              (0.0084)
           Father with tertiary education                                  −0.0007                                0.0035
                                                                            (0.008)                              (0.0077)
           Mother with tertiary education                                  −0.0082                               −0.0122
                                                                           (0.0103)                              (0.0097)
           Father or the mother has a job                                   0.0025                                0.0075
                                                                           (0.0101)                              (0.0094)
           Top program                                                     −0.0119                               −0.0106
                                                                           (0.0154)                              (0.0139)
           Top income                                                       0.0059                                0.0116
                                                                           (0.0115)                              (0.0101)
           Public school                                                    0.0019                                0.0078
                                                                           (0.0141)                              (0.0132)
           Accredited program                                               0.0123                                0.0101
                                                                           (0.0093)                              (0.0088)
           Note: This table reports the results of regressing physicians’ estimated relative value-added on observable
           characteristics across 1,248 cohorts of physicians. Each column from (1) to (4) refers to a different regression.
           The regressors, listed in the first column, are expressed in relative terms with respect to the by-draw and by-
           state average. Column (1) includes only the average graduation exam score as a regressor. Column (2) includes
           other physician characteristics as well. Columns (3) and (4) present the results of analogous exercises where
           relative value-added is estimated as in equation (8) but also using the following observed child and mother
           characteristics as controls: an indicator variable for the sex of the infant; an indicator variable that takes a value
           of 1 if the mother has at least secondary education and zero otherwise; an indicator variable that takes a value
           of 1 if the mother is 19 years old or younger and zero otherwise; marital status; number of inhabitants in the
           municipality; number of LHCs per municipality; an indicator variable that equals 1 if the LHC is above the
           75th percentile of the low birth weight distribution for the country in 2010–2012, and 0 otherwise; an indicator
           variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for the country in
           2010–2012, and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the 75th percentile of the
           Apgar score distribution for the country in 2010–2012, and 0 otherwise. The numbers in parentheses are LHC-
           level-clustered standard errors. We interpret the results from this table as evidence of the distinctive relevance
           of the health-specific graduation exam scores in predicting physicians’ performance.
           * p < 0.1, ** p < 0.05, *** p < 0.01




5     Robustness Checks and Additional Exercises

5.1   Additional Controls
For robustness, we run additional specifications, adding different sets of controls. We show that
our results are robust to including ex ante LHC characteristics and a vector of sociodemographic
information about the mother and child. The estimated coefficients are stable with the inclusion
of controls. We report the results with and without controls in table A.9.




     5.2    Alternative Definitions of Physicians’ Skill Levels
     First, we use a (standardized) principal component instead of the standardized average
     health-specific graduation exam scores as a proxy for physicians’ skill levels. In addition, we
     use the average of the four SABER PRO modules (health care, disease prevention, reading
     comprehension, and quantitative reasoning) and each individual test score as proxies for
     physicians’ skill levels before the SSO program. Figure A.8 (and table A.9) compares the
     estimated relative coefficient (dividing by the dependent variable mean), β, in equation (4)
     using the average (main specification) and the principal component of the graduation exam
     scores both with and without controls, while table A.5 presents the results using each
     individual test score. Our conclusions are similar across the different definitions of physicians’
     skill levels and with the inclusion of controls.


     5.3    Alternative Definitions of the Main Outcome
     We standardize, center, and aggregate the three main health outcomes (low birth weight,
     prematurity, and low Apgar score) using the inverse covariance index suggested by Anderson
     (2008) and repeat our main empirical analysis using the index as the dependent variable. In
     table A.10, we present the results using the covariance index and our main outcome, unhealthy
     (standardized), as dependent variables.39 As before, we see that our conclusions are similar
     regardless of the definition of the main outcome.
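
     For readers unfamiliar with the inverse covariance index, the sketch below shows one way to build it: each outcome is standardized, and the index is a weighted average of the standardized outcomes with weights equal to the row sums of the inverse covariance matrix. This is a minimal standard-library illustration on made-up data, not the authors' code. It assumes that all outcomes are oriented the same way (here, 1 = adverse) and omits the final re-standardization that applications often add. All names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(col):
    m, s = mean(col), pstdev(col)
    return [(x - m) / s for x in col]


def covariance_matrix(cols):
    n, k = len(cols[0]), len(cols)
    means = [mean(c) for c in cols]
    return [[sum((cols[a][i] - means[a]) * (cols[b][i] - means[b]) for i in range(n)) / n
             for b in range(k)] for a in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def inverse_covariance_index(outcomes):
    """outcomes: one list per outcome, all of equal length."""
    z = [standardize(c) for c in outcomes]
    inv = invert(covariance_matrix(z))
    weights = [sum(row) for row in inv]  # row sums of the inverse covariance matrix
    total = sum(weights)
    n = len(z[0])
    return [sum(w * z[k][i] for k, w in enumerate(weights)) / total for i in range(n)]


# Toy data (hypothetical): eight births, 1 = adverse outcome.
low_bw = [1, 0, 0, 1, 0, 0, 1, 0]
preterm = [1, 0, 0, 0, 0, 1, 1, 0]
low_apgar = [0, 0, 1, 1, 0, 0, 1, 0]
index = inverse_covariance_index([low_bw, preterm, low_apgar])
print(index)
```

     Outcomes that are highly correlated with the others receive less weight, so the index rewards independent information rather than counting the same signal several times.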


     5.4    Nonlinear Estimations
     The average prevalence of the outcomes considered is relatively low and around 4 percent.
     One concern is that a linear regression may not fit the data well. To address this concern, we
     estimate an analogous logit model based on equation (3) and compute the average marginal
     effect associated with being treated in an LHC assigned an SSO cohort whose skill level is one
     standard deviation higher. Table A.11 shows that the marginal effects (signs and magnitudes)
     are very similar to those estimated using a linear regression model.
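
     The average marginal effect from a logit model can be sketched as follows. The example fits a logit with an intercept and a single regressor by Newton-Raphson and then averages the derivative of the predicted probability over the sample. It is a minimal illustration on made-up data, without the fixed effects or clustered standard errors of the paper's specification. All names and the toy data are hypothetical.

```python
import math


def fit_logit(x, y, iterations=50):
    """Newton-Raphson for a logit with an intercept and one regressor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score with respect to the intercept
            g1 += (yi - p) * xi       # score with respect to the slope
            h00 += w                  # entries of X'WX (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def average_marginal_effect(x, b0, b1):
    """Sample average of dP(y=1|x)/dx = b1 * p * (1 - p)."""
    effects = []
    for xi in x:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        effects.append(b1 * p * (1.0 - p))
    return sum(effects) / len(effects)


# Toy data (hypothetical): standardized skill and an unhealthy-birth indicator.
skill = [-2, -1, -1, 0, 0, 0, 1, 1, 2, 2]
unhealthy = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
b0, b1 = fit_logit(skill, unhealthy)
ame = average_marginal_effect(skill, b0, b1)
print(b1, ame)
```

     The average marginal effect is expressed in probability units and can therefore be compared directly with the coefficient from a linear probability model, which is the comparison the text describes.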


     5.5    Impact across Distribution of Skills
     Finally, while OLS allows us to compute the average effect of physicians’ skill levels, it does not
     tell us much about the magnitude of this effect across the distribution of physicians’ skills. We
     divide the score into quartiles and estimate equation (4) using a set of dummy variables
     indicating the quartile of the score distribution to which physicians belong. The results are
     presented in table A.12. Columns (1) and (2) present the coefficients associated with the effect of

39
     Note that the adjusted standardized coefficients (in standard deviations) are very similar for both specifications.


belonging to the second, third, and fourth quartiles of the distribution of the average of the
graduation exam scores and the first principal component, respectively, on our main birth
outcome measure, unhealthy, relative to the first quartile. Although we lack the power to find
statistically significant differences, we see that the point estimates are negative and
monotonically decreasing with respect to the quartile. This suggests potential gains are
associated with being assigned to more-skilled physicians across the whole distribution of
skills.
     In table A.14, we also interact the average score with the university’s (program’s)
average score to test whether top universities drive the estimated effect. We do not find evidence
that top-ranked universities drive the effects presented earlier.


6     Potential Mechanisms
Physicians differ systematically in the decisions they make when faced with similar cases (Chan
et al., 2022). Likewise, the previous literature has found differences in practice patterns and
identified how these practices affect health outcomes (Tsugawa et al., 2017). Some dimensions of
these practices, such as the quality of the medical advice doctors provide, are unobservable (Das
et al., 2008; Leonard and Masatu, 2007; Mullainathan and Obermeyer, 2022), whereas others,
such as the number of prenatal checkups they offer, are observable. In this section, we study
prenatal checkups as a potential mechanism for observed differences between more-skilled and
less-skilled physicians.


6.1   Prenatal Checkups
We first explore whether more-skilled physicians increase the number of prenatal checkups
that mothers have, as a mechanism to improve the quality of health care and birth outcomes.
Although most of the evidence from economics and medicine shows an important association
between prenatal care and both birth weight and prematurity, some disagreements persist
(Alexander and Korenbrot, 1995; Amarante et al., 2016; Barber and Gertler, 2010; Carrillo and
Feres, 2019; Conway and Deb, 2005; Currie and Grogger, 2002; Grossman and Joyce, 1990;
Kramer, 1987; McCormick and Siegel, 2001).
     According to the WHO (2016) and the Colombian government (Gomez et al., 2013),
prenatal care improves the health status of both mother and child. As noted above, in
Colombia, the Ministry of Health requires physicians to carry out prenatal monitoring (Gomez
et al., 2013). We follow the standard recommended by the WHO (2016) for our period of
analysis and measure “adequate prenatal care” as having at least four checkups during
pregnancy. We do not find evidence that more-skilled doctors reduce the probability that
mothers are scheduled for fewer than four prenatal checkups (see table A.8).

          We expect that SSO physicians assigned to rural areas are time constrained, as they are
     usually the only physicians in those areas.40 Anecdotal evidence supports this argument: in
     various reports from Colombian medical associations, physicians describe their experience
     during the SSO year as characterized by an overwhelming workload and long working
     hours.41 In this setting, in which physicians are time constrained, it comes as no surprise that
     the overall likelihood that a mother has a sufficient number of prenatal checkups is not
     significantly affected by the skill level of the physicians. However, we might expect that
     more-skilled physicians could better target care, allocating resources more effectively to
     more-vulnerable mothers without compromising the care of lower-risk mothers.
          Therefore, using the graduation exam scores, we analyze whether more-skilled physicians
     target their prenatal checkups toward more-vulnerable mothers, that is, those who are more likely
     to give birth to unhealthy babies. Supporting this argument, one of the health-specific exam
     modules directly evaluates the physician’s skill to “analyze the personal, social, economic, and
     environmental determinants that influence the health status of the individual, family, and
     community, in order to prioritize actions to be taken.”
          Recent studies have focused on applying machine learning techniques to analyze
     physicians’ decision-making in diagnoses (Mullainathan and Obermeyer, 2022; Stern and
     Trajtenberg, 1998). Taking a similar approach, we conceptualize the likelihood that a baby is
     born unhealthy as a predictive problem, leveraging recent advancements42 in these techniques
     to generate two groups of predictions about the probability that a mother gives birth to an
     unhealthy baby, using a set of mother-LHC characteristics that are available to the physician at
     the time of prenatal care. Specifically, we incorporate in the prediction all the characteristics
     listed in tables 5 and 6. We apply algorithms that are commonly used in the machine learning
     literature: random forest and logistic regression models.43
          The sample is clustered into training and testing subsets of randomly selected LHCs using
     a K-means algorithm. We repeat this procedure—splitting the main sample using K-means—
     1,000 times. We run logit and random forest models on the training sets and use the models to
     predict the probability of giving birth to an unhealthy child for each testing subset.44
          We then divide the test sample into two groups: low and high predicted probability, defined

40
   The median number of physicians per LHC in these rural areas is three.
41
   See, for example, reports from the Colegio Médico Colombiano (2018) and the Universidad del Rosario (2015).
42
   Supervised machine learning seeks to solve the problem of prediction (Kleinberg et al., 2015). Athey and
   Imbens (2017) and Mullainathan and Spiess (2017) emphasize that machine learning is significantly better at
   making predictions, in part because it can use very flexible functional forms and fit complex data structures
   without imposing any specific restrictions in advance. According to Mullainathan and Spiess (2017), machine
   learning algorithms can do significantly better than traditional methods, even with moderate sample sizes and few
   covariates.
43
   These methods are able to handle many covariates, and they provide natural estimators of parameters when these
   are highly complex. The focus in the machine learning literature is often on working properties of algorithms
   in specific settings. See Mullainathan and Spiess (2017) for a review of the literature and Breiman (2001) for a
   description of the methods.
44
   We follow Chernozhukov et al. (2018) and rescale the outcomes and covariates to be between 0 and 1 before training.
     as mothers with a probability of giving birth to an unhealthy child below and above the 75th
     percentile, respectively, for each of the two model predictions.45
          We estimate the reduced form equation (3) in each of the previously defined groups
     (i.e., low and high predicted probability of giving birth to an unhealthy child), using as our
     main outcome a dummy equal to 1 if the number of prenatal checkups is fewer than four. Table 8
     presents the average coefficient and standard error for the 1,000 repetitions.46 Columns (1) and
     (2) present the results for the sample of mothers with a low predicted probability of giving birth
     to an unhealthy child, and columns (3) and (4) for the sample of mothers with a high predicted
     probability of giving birth to an unhealthy child. We include the results both with and without
     controls.
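
The grouping and averaging steps can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the toy numbers are hypothetical, ties at the cutoff are assigned to the low group by assumption, and a simple mean of the per-repetition coefficients and standard errors is assumed because the text does not spell out the aggregation rule.

```python
import statistics


def split_by_percentile(predicted, percentile=75):
    """Split row indices into low and high groups at a percentile cutoff."""
    # quantiles(..., n=100) returns the 99 percentile cut points;
    # index percentile - 1 holds the requested percentile.
    cutoffs = statistics.quantiles(predicted, n=100, method="inclusive")
    threshold = cutoffs[percentile - 1]
    low = [i for i, p in enumerate(predicted) if p <= threshold]   # ties go low (assumption)
    high = [i for i, p in enumerate(predicted) if p > threshold]
    return low, high, threshold


def average_estimates(estimates):
    """Average (coefficient, standard error) pairs across repetitions."""
    coefs = [coef for coef, _ in estimates]
    ses = [se for _, se in estimates]
    return statistics.fmean(coefs), statistics.fmean(ses)


# Hypothetical predicted probabilities for 100 mothers in one testing subset.
predicted = [i / 100 for i in range(1, 101)]
low_rows, high_rows, cutoff = split_by_percentile(predicted)

# Hypothetical (coefficient, SE) pairs from three repetitions.
hypothetical_runs = [(-0.020, 0.0090), (-0.022, 0.0094), (-0.024, 0.0092)]
mean_coef, mean_se = average_estimates(hypothetical_runs)
```

Passing `percentile=50` would instead reproduce a median split.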
          Table 8 shows that regardless of the method we use, more-skilled doctors do not seem to
     change the recommended number of prenatal checkups for mothers with a low predicted
     probability of giving birth to an unhealthy child. Instead, they target prenatal checkups
     towards the more-vulnerable mothers, measured as mothers with a high predicted probability
     of giving birth to an unhealthy baby, but without compromising the care of lower-risk
     mothers. Consistent with our suggested mechanism (that more-skilled physicians are better
     able to target care toward more-vulnerable mothers), we find stronger effects of physicians’ skill
     levels when we focus on mothers with a higher predicted probability than when we focus on
     those with a lower predicted probability. While the point estimate for the effect of physicians’
     skill levels on the likelihood of having fewer than four prenatal checkups in the lower predicted
     probability sample is between −0.0011 and 0.0015, depending on the prediction used to
     divide the data and on whether controls are included, the point estimate for the higher
     predicted probability group is between −0.0163 and −0.0236. These estimates suggest that an
     increase of one standard deviation in physicians’ average graduation exam score decreases the
     probability that mothers are scheduled for fewer than four prenatal checkups by between 9.49
     and 13.17 percent for mothers with a high predicted probability of giving birth to an unhealthy child.
          Taken together, the results from this section are consistent with a story in which
     time-constrained physicians cannot increase the average time they spend on prenatal checkups,
     but more-skilled physicians are better at targeting care toward more-vulnerable mothers.
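
To make the link between the coefficients and the relative effects concrete, the sketch below applies the definition given in the note to Table 8 (coefficient divided by the average of the dependent variable) to the high predicted probability columns. The function names are hypothetical, and the implied baseline means are back-calculated here from the rounded table entries; they are not figures reported in the paper and should be read as approximate.

```python
def relative_effect(coefficient, mean_dependent_variable):
    """Relative (percent) effect: coefficient over the mean of the outcome."""
    return 100.0 * coefficient / mean_dependent_variable


def implied_mean(coefficient, relative_effect_percent):
    """Back out the outcome mean implied by a coefficient and its relative effect."""
    return coefficient / (relative_effect_percent / 100.0)


# Table 8, high predicted probability group: (coefficient, relative effect in %).
table8_high = {
    "logit, without controls": (-0.0203, -11.29),
    "logit, with controls": (-0.0236, -13.17),
    "random forest, without controls": (-0.0163, -9.49),
    "random forest, with controls": (-0.0194, -11.27),
}

for label, (coef, rel) in table8_high.items():
    # Approximate share of high-risk mothers with fewer than four checkups.
    print(f"{label}: implied mean = {implied_mean(coef, rel):.3f}")
```

Under these assumptions, the implied share of high-risk mothers with fewer than four checkups lies between roughly 0.17 and 0.18 in all four columns.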

     6.1.1   Effect on Probability of Giving Birth to an Unhealthy Child

     We next ask—consistent with the idea that more-skilled physicians are better at targeting care
     toward more-vulnerable mothers without compromising the care of lower-risk
     mothers—whether being assigned to a more-skilled cohort of SSO physicians reduces the
     probability that a mother gives birth to an unhealthy child, particularly among the most

45
   Liberman et al. (2018) and ? follow a similar strategy when studying the effects of information deletion and usury
   rates on consumer credit markets.
46
   Figure A.6 shows the distribution of the estimated coefficients for the 1,000 repetitions.

         Table 8: Number of Prenatal Checkups by Predicted Probability of an Unhealthy Child

                                                          Dependent variable: prenatal checkups < 4
                                                 Low predicted probability                High predicted probability
                                                   of an unhealthy child                    of an unhealthy child
                                                    Without               With                Without              With
                                                    controls             controls             controls            controls
                                                      (1)                  (2)                  (3)                 (4)
                                                                                   a. Logit
                           Coefficient               0.0015               0.0001              −0.0203**          −0.0236**
                           SE                       (0.0097)             (0.0098)              (0.009)           (0.0094)
                           Relative effect           0.94%                0.05%               −11.29%            −13.17%
                                                                             b. Random forest
                           Coefficient               0.0003              −0.0011              −0.0163*           −0.0194**
                           SE                       (0.0099)              (0.01)              (0.0088)            (0.009)
                           Relative effect           0.20%               −0.67%               −9.49%             −11.27%
                           Note: This table reports the differential effects of physicians on the number of prenatal
                           checkups a mother has by her predicted probability of giving birth to an unhealthy child.
                           To predict the probability of an unhealthy child, we divide our data into training and testing
                           subsets of randomly selected LHCs using a K-means algorithm. On the training sets, we run
                           logit and random forest models of the probability of being born unhealthy on our usual set
                           of mother and LHC ex ante covariates, and we use the estimations to predict the probability
                           of giving birth to an unhealthy child on each testing subset. Using the prediction on the
                           testing sample, we divide each subset into groups with a low and high predicted probability
                           of giving birth to an unhealthy child, defined as mothers with a probability of an unhealthy
                           child below and above the median, respectively. The coefficients (β) represent the effect of
                           being treated at an LHC that was randomly assigned SSO physicians whose skill level is higher
                           by one standard deviation on the probability of having insufficient (fewer than four) prenatal
                           checkups. Relative (percent) effects are computed as the coefficient divided by the average of
                           the dependent variable. All regressions control for draw-by-state fixed effects. The numbers in
                           parentheses are LHC-level-clustered standard errors. We interpret the non-significant effect
                           for the low predicted probability of an unhealthy child group and the significant effect for the high
                           predicted probability of an unhealthy child group as evidence consistent with the idea that more-
                           skilled physicians are better at targeting care toward more-vulnerable mothers.
                           * p < 0.1, ** p < 0.05, *** p < 0.01




     vulnerable mothers. Table 9 shows that being assigned to more-skilled doctors seems to
     improve the health at birth of children for all mothers (i.e., whether they have a low or high
     predicted probability of having an unhealthy child). However, the effect is more pronounced,
     regardless of the method we use to split the sample, for mothers with an (ex ante) high
     predicted probability of having an unhealthy child. In particular, for the more-vulnerable
     mothers, being assigned to a physician whose graduation exam scores were one standard
     deviation higher decreases the probability of an unhealthy baby by around 9.45 percent, while
     for mothers with an (ex ante) low predicted probability of an unhealthy baby, the effects are
     smaller in magnitude, close to 8.71 percent.47

47
     Figure A.7 presents the distribution of the estimated coefficients for the 1,000 repetitions for the four outcomes
     studied.




    Table 9: Main Outcomes by Predicted Unhealthiness

                                      Dependent variable: unhealthy
                     Low predicted probability               High predicted probability
                       of an unhealthy child                   of an unhealthy child
                        Without              With                Without              With
                        controls            controls             controls            controls
                          (1)                 (2)                  (3)                 (4)
                                                      a. Logit
Coefficient            −0.0079***          −0.0076***            −0.0111**         −0.0115**
SE                      (0.0026)            (0.0026)             (0.0045)          (0.0046)
Relative effect         −9.11%              −8.73%                −9.20%            −9.49%
                                                 b. Random forest
Coefficient            −0.0079***          −0.0076***            −0.0111**         −0.0115**
SE                      (0.0026)            (0.0026)             (0.0045)          (0.0046)
Relative effect         −9.07%              −8.71%                −9.41%            −9.45%
Note: This table reports the differential effects of physicians on the probability of a child’s
being born unhealthy, by the mother’s predicted probability of giving birth to an unhealthy
child. To predict the probability of an unhealthy child, we divide our data into training and
testing subsets of randomly selected LHCs using a K-means algorithm. On the training sets, we
run logit and random forest models of the probability of being born unhealthy on our usual set
of mother and LHC ex ante covariates, and we use the estimations to predict the probability
of giving birth to an unhealthy child on each testing subset. Using the prediction on the
testing sample, we divide each subset into groups with a low and high predicted probability
of giving birth to an unhealthy child, defined as mothers with a probability of an unhealthy
child below and above the median, respectively. Unhealthy is a binary variable that takes a
value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant
is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is
lower than 7, and zero otherwise. The coefficients (β) represent the effect of being treated at an
LHC that was randomly assigned SSO physicians whose skill level is higher by one standard
deviation on unhealthy. Relative (percent) effects are computed as the coefficient divided by
the average of the dependent variable. All regressions control for draw-by-state fixed effects.
The numbers in parentheses are LHC-level-clustered standard errors. The results show how,
consistent with the idea that more-skilled physicians are better at targeting care toward more-
vulnerable mothers, the negative effects on the probability of having an unhealthy child are
particularly pronounced among more-vulnerable mothers.
* p < 0.1, ** p < 0.05, *** p < 0.01
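
The definition of the unhealthy outcome in the note to Table 9 can be written as a small function. This is a minimal sketch: the function and argument names are hypothetical, and the handling of missing measurements, which the note does not describe, is left out.

```python
def is_unhealthy(birth_weight_grams, gestation_weeks, apgar_score):
    """Return 1 if any of the three criteria in the Table 9 note holds, else 0.

    Criteria: birth weight below 2,500 grams, fewer than 37 weeks of
    gestation, or an Apgar score lower than 7.
    """
    low_birth_weight = birth_weight_grams < 2500
    premature = gestation_weeks < 37
    low_apgar = apgar_score < 7
    return int(low_birth_weight or premature or low_apgar)


# Hypothetical newborns: (birth weight in grams, gestation in weeks, Apgar score).
newborns = [(3200, 39, 9), (2400, 39, 9), (3200, 36, 9), (3200, 39, 6)]
flags = [is_unhealthy(*newborn) for newborn in newborns]
```

Because the three criteria are combined with a logical "or", a newborn who meets any single criterion is counted as unhealthy, which is why the aggregate measure is at least as prevalent as each of its components.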




6.2     Other Mechanisms
In addition to more-skilled physicians’ ability to target care, we discuss two alternative
mechanisms through which assigning more-skilled SSO physicians to an LHC may impact
birth outcomes: the potential sorting of patients to LHCs and the potential impact of practice
styles on LHC outcomes.

6.2.1   Patient Assignment

It is possible that the presence of more-skilled physicians in an LHC could influence the
demographics of the mothers seeking care—for instance, by attracting more-vulnerable
mothers from the local municipality and nearby areas. However, the evidence from our study
suggests that this kind of sorting of patients by physicians’ skill level is limited. In particular,
there are no significant correlations between the skill level of physicians in an LHC and
various demographic characteristics of the mothers they treat, such as education level, marital
status, and age (table 2). This absence of significant correlations suggests that the arrival of
more-skilled physicians at an LHC does not systematically attract mothers with specific
demographic profiles. Moreover, because we focus on non-metropolitan areas that generally
have no more than one LHC, mothers have few practical alternatives for seeking care
elsewhere, which further reduces the likelihood of sorting.
     Furthermore, the temporary nature of the government’s SSO program, which places
physicians in an LHC for only 12 months, constrains the potential for long-term patient
sorting across LHCs. With such a short placement, mothers have little opportunity to switch
their care preferences based on physicians’ skill level, suggesting that the observed
improvements in birth outcomes are primarily attributable to the direct effects of physician
competence rather than changes in the patient mix.

6.2.2   Practice Style

The influence of physician practice styles and practice environments on treatment outcomes
has been extensively documented in the literature. This body of work highlights how
physician-specific factors such as personal preferences, training backgrounds, and
accumulated experience can lead to significant variations in treatment approaches within the
same local health care markets, resulting in persistent style differences among physicians
(Epstein and Nicholson, 2009; Grytten and Sørensen, 2003; Molitor, 2018; Phelps, 1992). While
these individual tendencies contribute to disparities in medical practices, environment-specific
factors such as hospital resources, staff productivity, and financial incentives also shape
practice styles, suggesting a complex interplay between individual and environmental
influences (Molitor, 2018).


     Our analysis suggests that the potential impact of matching individual physician
backgrounds with the specific environments of LHCs is limited in this context. First, the
random assignment of physicians through a within-state lottery controls for selection biases,
distributing more- and less-skilled physicians across LHCs without preference. Thus,
differences in outcomes are less likely to be due to the systematic matching of physician and
LHC characteristics; rather, they likely reflect the intrinsic skill differences among physicians.
Second, our study focuses on rural municipalities, where health care facilities are scarcer (only
16 out of 598 municipalities have more than one LHC). Facilities in these areas are likely to
have similar practice environments, processes, and systems. Third, our examination of
educational backgrounds (table A.14) indicates that differences in training institutions among
top-ranked universities do not significantly drive the observed outcomes in our study. This
finding suggests that, even if educational institutions impart distinct practice styles to their
graduates, such differences are not the primary determinants of the variations in patient
outcomes in our setting. Finally, the results presented in section 4.5 suggest that there is no
relationship between the physicians’ value-added and several observable characteristics.


7    Conclusions
Physicians are a key input in the production function of health at birth. Yet there is little evidence
on the effect they can have on birth outcomes. The lack of causal evidence on this topic is related
to the selection bias associated with the match between physicians and LHCs (Doyle Jr et al.,
2010). In the present study, we provide experimental evidence to answer the difficult question
of whether and how physicians’ skill level affects birth outcomes for the mothers and children
they treat.
     In Colombia, medical school graduates must spend the first year of their careers working
in the SSO. The SSO program randomly assigns physicians to their first jobs, providing a test
for the effects of being treated at an LHC with more-skilled physicians. In this paper, we
combine administrative records to match physicians in the SSO program, LHCs, VSRs,
physician characteristics, and scores from mandatory health-specific college graduation exams
to measure the skills of the physicians assigned to each LHC and the main birth outcomes.
Using these data sets, we provide evidence that LHC covariates are balanced with respect to
the skill level of the physicians assigned to them. Finally, we provide evidence of the causal
relationship between more-skilled physicians and health at birth.
     We find that being treated at an LHC that is randomly assigned more-skilled SSO physicians
has a negative and significant effect on the probability that a mother gives birth to an unhealthy
child. We estimate that being assigned to a physician whose graduation exam score was one
standard deviation higher reduces the probability that a mother gives birth to an unhealthy


child by 9.14 percent. Although we use an aggregate measure of health at birth as our main
measure, the results are robust to other measures, such as low birth weight, prematurity, and
low Apgar score.
     Furthermore, we explore whether being assigned to more-skilled physicians increases the
number of prenatal checkups a mother has, serving as a mechanism to improve the quality of
health care and birth outcomes. According to WHO (2016) and the Colombian government,
better and more frequent prenatal care improves a child’s health at birth. We find that more-
skilled doctors do not schedule mothers for more prenatal checkups. Nonetheless, we provide
evidence that these physicians target their prenatal checkups toward more-vulnerable mothers,
measured as those with a higher predicted likelihood of giving birth to an unhealthy baby.
     Finally, we present several placebo tests, whose results support the internal validity
of our exercise. We conclude that more-skilled physicians play a crucial role in overall health
at birth and that governments should consider these findings in developing policies to assign
physicians optimally.



References
Abaluck, J., L. Agha, C. Kabrhel, A. Raja, and A. Venkatesh (2016). The determinants
  of productivity in medical testing: Intensity and allocation of care. American Economic
  Review 106(12), 3730–64.
Abowd, J. M., F. Kramarz, and D. N. Margolis (1999). High wage workers and high wage firms.
  Econometrica 67(2), 251–333.
Administrative Department of Statistics (2005). National census 2005. www.dane.gov.co.
Administrative Department of Statistics (2018a). National geostatistical framework 2018.
  www.dane.gov.co.
Administrative Department of Statistics (2018b). Vital statistics records. www.dane.gov.co.
Alexander, G. R. and C. C. Korenbrot (1995). The role of prenatal care in preventing low birth
  weight. The Future of Children, 103–120.
Almond, D., K. Y. Chay, and D. S. Lee (2005). The costs of low birth weight. The Quarterly Journal
  of Economics 120(3), 1031–1083.
Almond, D., J. Currie, and V. Duque (2018). Childhood circumstances and adult outcomes: Act
  II. Journal of Economic Literature 56(4), 1360–1446.
Almond, D., J. J. Doyle Jr, A. E. Kowalski, and H. Williams (2010). Estimating marginal returns
  to medical care: Evidence from at-risk newborns. The Quarterly Journal of Economics 125(2),
  591–634.
Almond, D. and B. Mazumder (2011). Health capital and the prenatal environment: the effect
  of Ramadan observance during pregnancy. American Economic Journal: Applied Economics 3(4),
  56–85.
Alsan, M., O. Garrick, and G. Graziani (2019). Does diversity matter for health? experimental
  evidence from Oakland. American Economic Review 109(12), 4071–4111.




Amarante, V., M. Manacorda, E. Miguel, and A. Vigorito (2016). Do cash transfers improve
  birth outcomes? evidence from matched vital statistics, program, and social security data.
  American Economic Journal: Economic Policy 8(2), 1–43.
Anderson, M. L. (2008). Multiple inference and gender differences in the effects of early
  intervention: A reevaluation of the Abecedarian, Perry Preschool, and Early Training projects.
  Journal of the American Statistical Association 103(484), 1481–1495.
Anderson, M. L., C. Dobkin, and T. Gross (2014). The effect of health insurance on emergency
  department visits: Evidence from an age-based eligibility threshold. Review of Economics and
  Statistics 96(1), 189–195.
Angrist, J. D., P. D. Hull, P. A. Pathak, and C. R. Walters (2017). Leveraging lotteries for school
  value-added: Testing and estimation. The Quarterly Journal of Economics 132(2), 871–919.
Araujo, M. C., P. Carneiro, Y. Cruz-Aguayo, and N. Schady (2016). Teacher quality and learning
  outcomes in kindergarten. The Quarterly Journal of Economics 131(3), 1415–1453.
Aron-Dine, A., L. Einav, A. Finkelstein, and M. Cullen (2015). Moral hazard in health insurance:
  do dynamic incentives matter? Review of Economics and Statistics 97(4), 725–741.
Athey, S. and G. W. Imbens (2017). The state of applied econometrics: Causality and policy
  evaluation. Journal of Economic Perspectives 31(2), 3–32.
Baicker, K. and A. Chandra (2004). The productivity of physician specialization: evidence from
  the Medicare program. American Economic Review 94(2), 357–361.
Barber, S. L. and P. J. Gertler (2010). Empowering women: how Mexico’s conditional cash
  transfer programme raised prenatal care quality and birth weight. Journal of Development
  Effectiveness 2(1), 51–73.
Bardach, N. S., J. J. Wang, S. F. De Leon, S. C. Shih, W. J. Boscardin, L. E. Goldman, and R. A.
  Dudley (2013). Effect of pay-for-performance incentives on quality of care in small practices
  with electronic health records: a randomized trial. JAMA 310(10), 1051–1059.
Basinga, P., P. J. Gertler, A. Binagwaho, A. L. Soucat, J. Sturdy, and C. M. Vermeersch (2011).
  Effect on maternal and child health services in Rwanda of payment to primary health-care
  providers for performance: an impact evaluation. The Lancet 377(9775), 1421–1428.
Becker, G. S. (1973). A theory of marriage: Part I. Journal of Political Economy 81(4), 813–846.
Black, S. E., P. J. Devereux, and K. G. Salvanes (2007). From the cradle to the labor market? the
  effect of birth weight on adult outcomes. The Quarterly Journal of Economics 122(1), 409–439.
Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.
Butler, A. S., R. E. Behrman, et al. (2007). Preterm birth: causes, consequences, and prevention.
  National Academies Press.
Cameron, E. Z. (2004). Facultative adjustment of mammalian sex ratios in support of the
  Trivers–Willard hypothesis: evidence for a mechanism. Proceedings of the Royal Society of London. Series
  B: Biological Sciences 271(1549), 1723–1728.
Carrera, M., D. P. Goldman, G. Joyce, and N. Sood (2018). Do physicians respond to the costs and
  cost-sensitivity of their patients? American Economic Journal: Economic Policy 10(1), 113–52.
Carrillo, B. and J. Feres (2019). Provider supply, utilization, and infant health: evidence from a
  physician distribution policy. American Economic Journal: Economic Policy 11(3), 156–96.
Chan, D. C. and Y. Chen (2022). The productivity of professions: evidence from the emergency
  department. Technical report, National Bureau of Economic Research.
Chan, D. C., M. Gentzkow, and C. Yu (2022). Selection with variation in diagnostic skill:
  Evidence from radiologists. The Quarterly Journal of Economics 137(2), 729–783.
Chandra, A. and D. Staiger (2020). Identifying sources of inefficiency in healthcare. The Quarterly
  Journal of Economics 135(2), 785–843.

Chen, Y. (2021). Team-specific human capital and team performance: evidence from doctors.
  American Economic Review 111(12), 3923–3962.
Chernozhukov, V., M. Demirer, E. Duflo, and I. Fernandez-Val (2018). Generic machine learning
  inference on heterogenous treatment effects in randomized experiments. Technical report,
  National Bureau of Economic Research.
Chetty, R., J. N. Friedman, N. Hilger, E. Saez, D. W. Schanzenbach, and D. Yagan (2011). How
  does your kindergarten classroom affect your earnings? evidence from Project STAR. The
  Quarterly Journal of Economics 126(4), 1593–1660.
Chetty, R., J. N. Friedman, and J. E. Rockoff (2014). Measuring the impacts of teachers ii: Teacher
  value-added and student outcomes in adulthood. American Economic Review 104(9), 2633–2679.
Clemens, J. and J. D. Gottlieb (2014). Do physicians’ financial incentives affect medical treatment
  and patient health? American Economic Review 104(4), 1320–49.
Colegio Médico Colombiano (2018). Historia del servicio social obligatorio. Retrieved
  from: https://www.colegiomedicocolombiano.org/web_cmc/upload/docs/Epicrisis-7_web.pdf.
Colombian Institute for Educational Evaluation (2014). Quality evaluation of higher education.
Congress of Colombia (1993, December). Law 100 of 1993. por la cual se crea el sistema de
  seguridad social integral y se dictan otras disposiciones.
Congress of Colombia (2007, October). Law 1164 of 2007. por la cual se dictan disposiciones en
  materia del talento humano en salud.
Conway, K. S. and P. Deb (2005). Is prenatal care really ineffective? or, is the ‘devil’ in the
  distribution? Journal of Health Economics 24(3), 489–513.
Currie, J. (2011). Inequality at birth: Some causes and consequences. American Economic
  Review 101(3), 1–22.
Currie, J. and D. Almond (2011). Human capital development before age five. In Handbook of
  Labor Economics, Volume 4, pp. 1315–1486. Elsevier.
Currie, J. and J. Grogger (2002). Medicaid expansions and welfare contractions: offsetting effects
  on prenatal care and infant health? Journal of Health Economics 21(2), 313–335.
Currie, J. and J. Gruber (1996). Saving babies: The efficacy and cost of recent changes in the
  Medicaid eligibility of pregnant women. Journal of Political Economy 104(6), 1263–1296.
Currie, J. and W. B. MacLeod (2017). Diagnosing expertise: Human capital, decision making,
  and performance among physicians. Journal of Labor Economics 35(1), 1–43.
Currie, J. and W. B. MacLeod (2020). Understanding doctor decision making: The case of
  depression treatment. Econometrica 88(3), 847–878.
Currie, J. and H. Schwandt (2016a). The 9/11 dust cloud and pregnancy outcomes: a
  reconsideration. Journal of Human Resources 51(4), 805–831.
Currie, J. and H. Schwandt (2016b). Mortality inequality: The good news from a county-level
  approach. Journal of Economic Perspectives 30(2), 29–52.
Currie, J. and R. Walker (2011). Traffic congestion and infant health: Evidence from E-ZPass.
  American Economic Journal: Applied Economics 3(1), 65–90.
Currie, J. and J. Zhang (2023). Doing more with less: Predicting primary care provider
  effectiveness. Review of Economics and Statistics, 1–45.
Curtis, J. R., Q. Cai, S. W. Wade, B. S. Stolshek, J. L. Adams, A. Balasubramanian, H. N.
  Viswanathan, and J. D. Kallich (2013). Osteoporosis medication adherence: physician
  perceptions vs. patients’ utilization. Bone 55(1), 1–6.
Dahlstrand, A. (2021). Defying distance? the provision of services in the digital age. Job Market
  Paper, London School of Economics and Political Science.

Das, J. and J. Hammer (2005). Which doctor? combining vignettes and item response to measure
  clinical competence. Journal of Development Economics 78(2), 348–383.
Das, J. and J. Hammer (2007). Money for nothing: the dire straits of medical practice in Delhi,
  India. Journal of Development Economics 83(1), 1–36.
Das, J., J. Hammer, and K. Leonard (2008). The quality of medical advice in low-income
  countries. Journal of Economic Perspectives 22(2), 93–114.
Das, J., A. Holla, A. Mohpal, and K. Muralidharan (2016). Quality and accountability in
  health care delivery: audit-study evidence from primary care in India. American Economic
  Review 106(12), 3765–99.
Das, J. and T. P. Sohnesen (2007). Variations in doctor effort: Evidence from Paraguay: Doctors
  in Paraguay who expended less effort appear to have been paid more than doctors who
  expended more. Health Affairs 26(Suppl2), w324–w337.
Davis, D. A., M. A. Thomson, A. D. Oxman, and R. B. Haynes (1995). Changing physician
  performance: a systematic review of the effect of continuing medical education strategies.
  JAMA 274(9), 700–705.
Dinkelman, T. (2017). Long-run health repercussions of drought shocks: Evidence from South
  African homelands. The Economic Journal 127(604), 1906–1939.
Doyle Jr, J. J., S. M. Ewer, and T. H. Wagner (2010). Returns to physician human capital: Evidence
  from patients randomized to physician teams. Journal of Health Economics 29(6), 866–882.
Ehrenstein, V. (2009). Association of Apgar scores with death and neurologic disability. Clinical
  Epidemiology 1, 45.
Epstein, A. J. and S. Nicholson (2009). The formation and evolution of physician treatment
  styles: an application to cesarean sections. Journal of Health Economics 28(6), 1126–1140.
Eriksson, J. G., E. Kajantie, C. Osmond, K. Thornburg, and D. J. Barker (2010). Boys live
  dangerously in the womb. American Journal of Human Biology 22(3), 330–335.
Fadlon, I. and J. Van Parys (2020). Primary care physician practice styles and patient care:
  Evidence from physician exits in Medicare. Journal of Health Economics 71, 102304.
Fernández Ávila, D. G., L. C. Mancipe García, D. C. Fernández Ávila, E. Reyes Sanmiguel, M. C.
  Díaz, and J. M. Gutiérrez (2011). Analysis of the supply of medicine undergraduate programs
  in Colombia, during the past 30 years. Revista Colombiana de Reumatología 18(2), 109–120.
Finkelstein, A., S. Taubman, B. Wright, M. Bernstein, J. Gruber, J. P. Newhouse, H. Allen, K. Baicker,
  and the Oregon Health Study Group (2012). The Oregon health insurance experiment: evidence from the first
  year. The Quarterly Journal of Economics 127(3), 1057–1106.
Fletcher, J. M., L. I. Horwitz, and E. Bradley (2014). Estimating the value added of attending
  physicians on patient outcomes. Technical report, National Bureau of Economic Research.
Gagnon-Bartsch, J., Y. Shem-Tov, et al. (2019). The classification permutation test: A flexible
  approach to testing for covariate imbalance in observational studies. The Annals of Applied
  Statistics 13(3), 1464–1483.
Gomez, P., I. Arevalo, et al. (2013). Guías de práctica clínica para la prevención, detección
  temprana y tratamiento de las complicaciones del embarazo, parto y puerperio. Ministerio
  de Salud y protección social Colombia 84, 74–82.
Grossman, M. and T. J. Joyce (1990). Unobservables, pregnancy resolutions, and birth weight
  production functions in New York City. Journal of Political Economy 98(5, Part 1), 983–1007.
Grytten, J. and R. Sørensen (2003). Practice variation and physician-specific effects. Journal of
  Health Economics 22(3), 403–418.
Guarin, A., C. Posso, E. Saravia, and J. Tamayo (2023). Healing the gender gap: The impacts of
  randomized first-job on female physicians.

Harris, D. N. and T. R. Sass (2014). Skills, productivity and the evaluation of teacher
   performance. Economics of Education Review 40, 183–204.
Herrmann, M., E. Walsh, and E. Isenberg (2016). Shrinkage of value-added estimates and
   characteristics of students with hard-to-predict achievement levels. Statistics and Public
   Policy 3(1), 1–10.
Ho, K. and A. Pakes (2014a). Hospital choices, hospital prices, and financial incentives to
   physicians. American Economic Review 104(12), 3841–84.
Ho, K. and A. Pakes (2014b). Physician payment reform and hospital referrals. American
   Economic Review 104(5), 200–205.
Hoynes, H., M. Page, and A. H. Stevens (2011). Can targeted transfers improve birth outcomes?:
   Evidence from the introduction of the WIC program. Journal of Public Economics 95(7-8),
   813–827.
Iizuka, T. (2012). Physician agency and adoption of generic pharmaceuticals. American Economic
   Review 102(6), 2826–58.
Jackson, C. K. (2018). What do test scores miss? The importance of teacher effects on non–test
   score outcomes. Journal of Political Economy 126(5), 2072–2107.
Kane, T. J., J. E. Rockoff, and D. O. Staiger (2008). What does certification tell us about teacher
   effectiveness? Evidence from New York City. Economics of Education Review 27(6), 615–631.
Kane, T. J. and D. O. Staiger (2008). Estimating teacher impacts on student achievement: An
   experimental evaluation. Technical report, National Bureau of Economic Research.
Kleinberg, J., J. Ludwig, S. Mullainathan, and Z. Obermeyer (2015). Prediction policy problems.
   American Economic Review 105(5), 491–95.
Kraemer, S. (2000). The fragile male. BMJ 321(7276), 1609–1612.
Kramer, M. S. (1987). Determinants of low birth weight: methodological assessment and
   meta-analysis. Bulletin of the World Health Organization 65(5), 663.
Kremer, M. (1993). The O-ring theory of economic development. The Quarterly Journal of
   Economics 108(3), 551–575.
Leonard, K. L. and M. C. Masatu (2007). Variations in the quality of care accessible to rural
   communities in Tanzania: Some quality disparities might be amenable to policies that do not
   necessarily relate to funding levels. Health Affairs 26(Suppl2), w380–w392.
Leonard, K. L., M. C. Masatu, and A. Vialou (2007). Getting doctors to do their best the roles of
   ability and motivation in health care quality. Journal of Human Resources 42(3), 682–700.
Liberman, A., C. Neilson, L. Opazo, and S. Zimmerman (2018). The equilibrium effects of
   information deletion: Evidence from consumer credit markets. Technical report, National
   Bureau of Economic Research.
Lin, W. (2009). Why has the health inequality among infants in the US declined? Accounting for
   the shrinking gap. Health Economics 18(7), 823–841.
Mathews, F., P. J. Johnson, and A. Neil (2008). You are what your mother eats: evidence for
   maternal preconception diet influencing foetal sex in humans. Proceedings of the Royal Society
   B: Biological Sciences 275(1643), 1661–1668.
McCormick, M. C. and J. E. Siegel (2001). Recent evidence on the effectiveness of prenatal care.
   Ambulatory Pediatrics 1(6), 321–325.
Michalopoulos, C., D. Wittenburg, D. A. Israel, and A. Warren (2012). The effects of health
   care benefits on health care use and health: a randomized trial for disability insurance
   beneficiaries. Medical Care, 764–771.
Ministry of Education (2019). National higher education information system.


Ministry of Health (1990, June). Decree 1335 of 1990. por el cual se expide parcialmente el
  manual general de funciones y requisitos del subsector oficial del sector salud.
Ministry of Health (2001). Reglamento del año de servicio de salud rural.
Ministry of Health (2010, March). Resolution 1058 of 2010. por medio de la cual se reglamenta
  el servicio social obligatorio para los egresados de los programas de educación superior del
  área de la salud y se dictan otras disposiciones.
Ministry of Health (2012a, December). Resolution 4503 of 2012. por la cual se modifica el artículo
  6 de la resolución 274 de 2011 modificado por el artículo 2 de la resolución 566 de 2012.
Ministry of Health (2012b, March). Resolution 566 of 2012. por la cual se modifica parcialmente
  la resolución 274 de 2011.
Ministry of Health (2013, May). Resolution 1441 of 2013. por la cual se definen los
  procedimientos y condiciones que deben cumplir los prestadores de servicios de salud.
Ministry of Health (2014). Reports of professionals registered and assigned to the process of
  assigning places in the mandatory social service.
Mizuno, R. (2000). The male/female ratio of fetal deaths and births in Japan. The
  Lancet 356(9231), 738–739.
Molitor, D. (2018). The evolution of physician practice styles: evidence from cardiologist
  migration. American Economic Journal: Economic Policy 10(1), 326–56.
Moore, E. A., F. Harris, K. R. Laurens, M. J. Green, S. Brinkman, R. K. Lenroot, and V. J. Carr
  (2014). Birth outcomes and academic achievement in childhood: A population record linkage
  study. Journal of Early Childhood Research 12(3), 234–250.
Mullainathan, S. and Z. Obermeyer (2022). Diagnosing physician error: A machine learning
  approach to low-value health care. The Quarterly Journal of Economics 137(2), 679–727.
Mullainathan, S. and J. Spiess (2017). Machine learning: an applied econometric approach.
  Journal of Economic Perspectives 31(2), 87–106.
Naeye, R. L., L. S. Burt, D. L. Wright, W. A. Blanc, and D. Tatter (1971). Neonatal mortality, the
  male disadvantage. Pediatrics 48(6), 902–906.
Norcini, J. J., J. R. Boulet, A. Opalek, and W. D. Dauphinee (2014). The relationship between
  licensing examination performance and the outcomes of care by international medical school
  graduates. Academic Medicine 89(8), 1157–1162.
Norcini, J. J., R. S. Lipner, and H. R. Kimball (2002). Certifying examination performance and
  patient outcomes following acute myocardial infarction. Medical Education 36(9), 853–859.
Okeke, E. N. (2023). When a doctor falls from the sky: The impact of easing doctor supply
  constraints on mortality. American Economic Review 113(3), 585–627.
Okeke, E. N. and I. S. Abubakar (2020). Healthcare at the beginning of life and child survival:
  Evidence from a cash transfer experiment in Nigeria. Journal of Development Economics 143,
  102426.
Páez, G., L. Jaramillo, C. Franco, and L. Arregoces (2007). Estudio sobre el modo de gestionar
  la salud en Colombia.
Persson, P. and M. Rossin-Slater (2018). Family ruptures, stress, and the mental health of the
  next generation. American Economic Review 108(4-5), 1214–52.
Phelps, C. E. (1992). Diffusion of information in medical care. Journal of Economic
  Perspectives 6(3), 23–42.
Pongou, R., B. Kuate Defo, and Z. Tsala Dimbuene (2017). Excess male infant mortality: The
  gene-institution interactions. American Economic Review 107(5), 541–45.
Rivkin, S. G., E. A. Hanushek, and J. F. Kain (2005). Teachers, schools, and academic
  achievement. Econometrica 73(2), 417–458.

Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from
  panel data. American Economic Review 94(2), 247–252.
Rothstein, J. (2017). Measuring the impacts of teachers: Comment. American Economic
  Review 107(6), 1656–84.
Roy, A. D. (1951). Some thoughts on the distribution of earnings. Oxford Economic Papers 3(2),
  135–146.
Schnell, M. and J. Currie (2018). Addressing the opioid epidemic: is there a role for physician
  education? American Journal of Health Economics 4(3), 383–410.
Shimer, R. and L. Smith (2000). Assortative matching and search. Econometrica 68(2), 343–369.
Simeonova, E., N. Skipper, and P. R. Thingholm (2020). Physician health management skills and
  patient outcomes. Technical report, National Bureau of Economic Research.
Stern, S. and M. Trajtenberg (1998). Empirical implications of physician authority in
  pharmaceutical decisionmaking.
Stoye, G. (2022). The distribution of doctor quality: Evidence from cardiologists in England.
  Technical report, IFS Working Papers.
Tamblyn, R., M. Abrahamowicz, D. Dauphinee, E. Wenghofer, A. Jacques, D. Klass, S. Smee,
  D. Blackmore, N. Winslade, N. Girard, et al. (2007). Physician scores on a national clinical
  skills examination as predictors of complaints to medical regulatory authorities. JAMA 298(9),
  993–1001.
Tamblyn, R., M. Abrahamowicz, W. D. Dauphinee, J. A. Hanley, J. Norcini, N. Girard,
  P. Grand’Maison, and C. Brailovsky (2002). Association between licensure examination scores
  and practice in primary care. JAMA 288(23), 3019–3026.
Taylor, H. G., N. Klein, N. M. Minich, and M. Hack (2001). Long-term family outcomes for
  children with very low birth weights. Archives of Pediatrics & Adolescent Medicine 155(2),
  155–161.
Tsugawa, Y., A. B. Jena, J. F. Figueroa, E. J. Orav, D. M. Blumenthal, and A. K. Jha (2017).
  Comparison of hospital mortality and readmission rates for Medicare patients treated by male
  vs female physicians. JAMA Internal Medicine 177(2), 206–213.
Universidad del Rosario (2015). El año rural: Realidad agridulce para los médicos recién
  graduados. Un relato de quien lo vivió. Retrieved from:
  https://www.urosario.edu.co/Revista-Nova-Et-Vetera/Vol-1-Ed-2/Cultura/El-ano-rural-Realidad-agridulce-para-los-medicos-r.pdf.
Veddovi, M., D. T. Kenny, F. Gibson, J. Bowen, and D. Starte (2001). The relationship between
  depressive symptoms following premature birth, mothers’ coping style, and knowledge of
  infant development. Journal of Reproductive and Infant Psychology 19(4), 313–323.
Wenghofer, E., D. Klass, M. Abrahamowicz, D. Dauphinee, A. Jacques, S. Smee, D. Blackmore,
  N. Winslade, K. Reidel, I. Bartman, et al. (2009). Doctor scores on national qualifying
  examinations predict quality of care in future practice. Medical Education 43(12), 1166–1173.
WHO (2016). Pregnant women must be able to access the right care at the right time, says
  WHO. Retrieved from:
  https://www.who.int/news/item/07-11-2016-pregnant-women-must-be-able-to-access-the-right-care-at-the-right-time-says-who.
Woodcock, S. D. (2008). Wage differentials in the presence of unobserved worker, firm, and
  match heterogeneity. Labour Economics 15(4), 771–793.




Online Appendix
   Not for Publication




A   Appendix

     Figure A.1: Probability of low birth weight vs. gestational weeks, 2009-2012




                   Notes: This figure presents the local polynomial
                   regression fit of the probability of having low birth
                   weight over the number of gestational weeks using all
                   birth records for Colombia from 2009 to 2012.




Figure A.2: Population (per 100,000) for municipalities included in our main sample




                  Notes: This figure presents the map (Administrative
                  Department of Statistics, 2018a) of the population per
                  100,000 people for the municipalities included in our
                  main sample in 2005. The municipalities in orange are
                  not included in our sample or do not have SSO.




        Figure A.3: Distribution of physicians per municipality




             Notes: This figure shows the distribution of physicians
             per municipality for the sample of 582 municipalities
             with only one LHC. The data spans from January 2012
             to December 2012.


Figure A.4: Heterogeneity in quantitative and reading SABER PRO scores




     Notes: This figure reports the quantitative and reading test scores for the
     universities (Ministry of Education, 2019) that the physicians in our sample
     attended. Data accounts for 44 different universities. The figure shows the
     mean score for each university/program and an interval of one standard
     deviation to each side of the average. The dashed horizontal line represents
     the overall 50th percentile. The figure shows substantial heterogeneity both
     within and between programs. For all the fields reported, there is a
     difference of almost two standard deviations between the averages of the
     best and the worst programs.




           Table A.1: Summary statistics - physicians in the main sample

                          Covariate                                        Mean                Standard
                                                                                               deviation
Sex (female)                                                                0.558                 0.497
The household has a private car                                            0.483                  0.500
Number of people in the household                                           4.025                 1.659
Father with tertiary education                                              0.644                 0.479
Mother with tertiary education                                              0.634                 0.482
Socioeconomic strata: 1 or 2 or rural areas                                 0.292                 0.455
Socioeconomic strata: 4, 5 or 6                                             0.349                 0.477
The household has internet                                                  0.831                 0.375
Monthly household income: Less than 2 MW                                    0.229                 0.420
Monthly household income: between 2 and 3 MW                                0.220                 0.414
The father or the mother has a job                                          0.872                 0.335
The household has a washing machine                                         0.854                 0.353
The household has a television                                              0.859                 0.348
The household has a cellphone                                               0.963                 0.188
The house has proper flooring                                               0.908                 0.289
The household has an oven                                                   0.671                 0.470
Physician’s score on the Health care test                                  10.426                 1.059
Physician’s score on the Disease prevention test                           10.431                 1.010
Physician’s score on the Reading test                                      10.624                 1.007
Physician’s score on the Math test                                         10.572                 1.123
Physician’s average score on SABER PRO                                     10.513                 0.854
Observations                                                                           2,126
Notes: This table reports the summary statistics for the physicians included in our main sample. These
characteristics are obtained at the time physicians took their SABER PRO exam (before the SSO). Sex
is a binary variable that takes the value of 1 if the physician is female and zero otherwise; The household
has a private car takes the value of 1 if the physician’s household owned a private car at the time
the physician took the SABER PRO test and zero otherwise; Number of people in the household counts
the number of individuals living in the same house as the physician; Father with tertiary education is a
binary variable that takes the value of 1 if the physician’s father has at least tertiary education and zero
otherwise; Mother with tertiary education is a binary variable that takes the value of 1 if the physician’s
mother has at least tertiary education and zero otherwise; Socioeconomic strata: 1 or 2 or rural areas takes
the value of 1 if the physician’s household’s socioeconomic strata at the time of the SABER PRO test was
1, 2 or rural and zero otherwise; Socioeconomic strata: 4, 5 or 6 is a variable that takes the value of 1 if the
physician’s household’s socioeconomic strata at the time of the SABER PRO test was 4, 5 or 6 and zero
otherwise; The household has internet takes the value of 1 if the physician had internet service at home
at the time of the test; Monthly household income: Less than 2MW takes the value of 1 if the physician’s
household had an income lower than 2 times the national monthly minimum wage and zero otherwise;
Monthly household income: between 2 and 3 MW takes the value of 1 if the physician’s household had an
income between 2 and 3 times the national monthly minimum wage and zero otherwise; The father or
the mother has a job takes the value of 1 if either of the physician’s parents has a job and zero otherwise;
The household has a washing machine, television, cellphone, proper flooring, or oven each take the value
of 1 if the household has the described characteristic and zero otherwise; the physician’s scores are
continuous variables for the score obtained on each SABER PRO test component; and the physician’s average
score on SABER PRO is the average of the four main components of the test: health care, disease
prevention, reading, and math.
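The averaging rule in the notes can be checked against the reported means. The sketch below is illustrative only: it uses the four component means printed in Table A.1 and assumes all four are computed over the same 2,126 physicians, so that the mean of the per-physician averages equals the average of the component means. The dictionary keys and function name are hypothetical.

```python
# Component means as printed in Table A.1 (rounded to three decimals).
component_means = {
    "health_care": 10.426,
    "disease_prevention": 10.431,
    "reading": 10.624,
    "math": 10.572,
}


def average_score(scores):
    """Simple (unweighted) average of the four SABER PRO components."""
    return sum(scores.values()) / len(scores)


# By linearity of the mean, this should match the reported mean of the
# physician-level average score (10.513), up to rounding of the inputs.
avg_of_means = average_score(component_means)
print(round(avg_of_means, 3))
```

The result agrees with the 10.513 reported in the row "Physician’s average score on SABER PRO", which is consistent with an unweighted average of the four components.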




Table A.2: Covariate balance at LHC level using all the areas tested in the SABER PRO

                               Covariate                      Coefficient             Standard
                                                                                        Error
                a. Pretreatment variables (2010-2012)
                Unhealthy                                          0.0060               0.0094
                Low birth weight                                  −0.0027               0.0026
                Prematurity                                        0.0044               0.0043
                Low Apgar score                                   −0.0020               0.0048
                Insufficient prenatal care (Prop.)                −0.0054               0.0078
                Female infants                                    −0.0011               0.0046
                Mothers with basic education                      −0.0035               0.0082
                Married mothers                                   −0.0040               0.0070
                Teenage mothers                                    0.0055               0.0053
                Number of LHCs per municipality                   −0.0247               0.0203
                Municipality population                           1,196.04             2,767.35
                b. Concurrent variables
                Female newborn                                    −0.0008               0.0033
                Mother with basic education                        0.0053               0.0077
                Married mother                                    −0.0034               0.0059
                Teenage mother                                     0.0006               0.0039
                Number of LHCs per municipality                   −0.0043               0.0209
                Municipality population                           1,260.14             2,807.26
                Notes: This table presents the results of different LHC-by-cohort level regressions
                (equation 5) of the LHC-level variables, listed in the first column, on the measure
                of physicians’ skill level and the draw-by-state fixed effects. The coefficient and
                the standard error of the physicians’ skill variable are reported in the second and
                third columns, respectively. Standard errors are clustered at the LHC level. LHCs’
                characteristics in panel a come from the 2010–2012 DANE VSRs, using a total of 1,837
                LHC-by-cohort observations. LHCs’ characteristics in panel b come from the 2013–
                2015 DANE VSRs, using a total of 1,714 LHC-by-cohort observations. Unhealthy, our
                main measure of health at birth, is the proportion of newborn infants with at least
                one of the three following conditions: low birth weight, prematurity, or low Apgar
                score. Low birth weight is the proportion of newborn infants whose birth weight was
                less than 2,500 grams. Prematurity is the proportion of newborn infants who were
                born after fewer than 37 weeks of gestation. Low Apgar score is the proportion of
                newborn infants whose Apgar score was lower than 7. Insufficient prenatal care is the
                proportion of mothers who had fewer than four prenatal checkups. Female infants
                is the proportion of female infants. Mothers with basic education is the proportion
                of mothers with at least secondary education at the time they gave birth. Married
                mothers is the proportion of mothers that were married at the time they gave birth.
                Teenage mothers is the proportion of mothers who were 19 years old or younger at
                the time they gave birth. Number of LHCs per municipality is the count of LHCs in
                the birthplace municipality. We interpret the non-significance of these estimates as
                evidence in favor of the randomness of the assignment of physicians.
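The outcome definitions in the notes above can be written as a short sketch. The thresholds (2,500 grams, 37 weeks, Apgar below 7) come from the notes; the record layout, field names, and the four example births are hypothetical and are not drawn from the vital statistics records. Which Apgar measurement is used is not stated here, so the sketch takes a single score as given.

```python
from dataclasses import dataclass


@dataclass
class Birth:
    # Hypothetical record layout; field names are assumptions.
    weight_grams: float
    gestation_weeks: float
    apgar: float


def low_birth_weight(b):
    return b.weight_grams < 2500


def premature(b):
    return b.gestation_weeks < 37


def low_apgar(b):
    return b.apgar < 7


def unhealthy(b):
    # At least one of the three conditions holds.
    return low_birth_weight(b) or premature(b) or low_apgar(b)


def proportion(births, indicator):
    """Share of births in one LHC-by-cohort cell for which the indicator is true."""
    return sum(indicator(b) for b in births) / len(births)


# Made-up births for a single LHC-by-cohort cell.
births = [
    Birth(3200, 39, 9),  # none of the conditions
    Birth(2400, 38, 8),  # low birth weight only
    Birth(3000, 35, 9),  # premature only
    Birth(3100, 40, 6),  # low Apgar only
]

print(proportion(births, unhealthy))         # share with at least one condition
print(proportion(births, low_birth_weight))  # share below 2,500 grams
```

Because Unhealthy is the union of the three conditions, its proportion is at least as large as each component proportion and at most their sum.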




                                                               Table A.3: Placebo other years
                                 Unhealthy                                        LBW                                      Prematurity                                    Apgar < 7
                    Average Health           PCA Health           Average Health           PCA Health          Average Health           PCA Health           Average Health           PCA Health
                        Scores                 Scores                 Scores                 Scores                Scores                 Scores                 Scores                 Scores
                         (1)                    (2)                    (1)                    (2)                   (1)                    (2)                    (1)                    (2)
                                                                                                        a. 2 years
Coefficient              −0.0029               −0.0027                −0.003*                −0.0029*                −0.0005               −0.0004               −0.0022                −0.0021
SE                       (0.0026)              (0.0026)               (0.0016)               (0.0016)                (0.0016)              (0.0016)              (0.0019)               (0.0019)
Relative effect          −2.90%                −2.72%                 −6.57%                 −6.39%                  −1.09%                −0.98%                −5.44%                 −5.23%
                                                                                                       b. 2.5 years
Coefficient              −0.0022               −0.0021                −0.0009                −0.0008                 −0.0006               −0.0006               −0.0022                −0.0022
SE                       (0.0023)              (0.0023)               (0.0013)               (0.0013)                (0.0015)              (0.0015)              (0.0019)               (0.0019)
Relative effect          −2.18%                −2.07%                 −1.95%                 −1.82%                  −1.41%                −1.35%                −5.58%                 −5.42%
                                                                                                        c. 3 years
Coefficient              −0.0035               −0.0034                −0.0007                −0.0007                 −0.0016               −0.0016               −0.0026                −0.0026
SE                       (0.0022)              (0.0022)               (0.0012)               (0.0013)                (0.0014)              (0.0014)              (0.0018)               (0.0018)
Relative effect          −3.33%                −3.30%                 −1.40%                 −1.45%                  −3.85%                −3.92%                −6.27%                 −6.12%
                                                                                                       d. 3.5 years
Coefficient              −0.0014               −0.0014                −0.0007                −0.0007                 −0.0012               −0.0013               −0.0006                −0.0006
SE                       (0.0023)              (0.0023)               (0.0011)               (0.0012)                (0.0016)              (0.0016)              (0.0018)               (0.0018)
Relative effect          −1.26%                −1.29%                 −1.41%                 −1.58%                  −2.80%                −2.87%                −1.47%                 −1.37%
Notes: This table presents placebo tests in which we estimate equation (4) after moving the arrival date to 2, 2.5, 3, and 3.5 years before the start of the SSO program. The coefficients represent the effect
of being treated at an LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent) effects are computed as the coefficient divided
by the average of the dependent variable. The first-stage coefficient and standard error are shown in figure 2. Unhealthy is a binary variable that takes a value of 1 if the newborn infant has a birth
weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW is a binary
variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant is born
after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. All
regressions control for draw-by-state fixed effects. The numbers in parentheses are LHC-level-clustered standard errors. The results show that, regardless of the time window that we use for the
calculation of the placebo test, the estimated coefficients are always precisely estimated zeros, which we interpret as evidence of the randomness of the assignment of the physicians to the LHCs.
* p < 0.1, ** p < 0.05, *** p < 0.01
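As a rough illustration of how the significance stars relate to the reported coefficients and standard errors, the sketch below computes a two-sided p-value under a standard normal approximation. This is an assumption made for illustration: the reference distribution actually used for the clustered standard errors is not stated here, and the inputs are the rounded values printed in the table. The function name is hypothetical.

```python
import math


def two_sided_p_value(coefficient, standard_error):
    """Two-sided p-value for H0: coefficient = 0, normal approximation."""
    z = abs(coefficient / standard_error)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# LBW, panel a (2 years), Average Health Scores: -0.003 with SE 0.0016.
p_value = two_sided_p_value(-0.003, 0.0016)
print(p_value)
```

With these rounded inputs the ratio is about 1.9 in absolute value, which falls between the 10 percent and 5 percent thresholds and is consistent with the single star on that estimate.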




                                                       Table A.4: Placebo robustness checks
                                                         Unhealthy                                 LBW                             Prematurity                           Apgar < 7
                                                   Average              PCA             Average              PCA              Average             PCA              Average             PCA
                                                   Health              Health           Health              Health            Health             Health            Health             Health
                                                    Scores             Scores            Scores             Scores             Scores            Scores             Scores            Scores
                                                     (1)                 (2)              (3)                 (4)               (5)                (6)               (7)                (8)
                                                                                                             a. Without controls
    Coefficient                                    −0.0019           −0.0019            −0.0014            −0.0014            −0.0024           −0.0024            <0.0001           <0.0001
    SE                                             (0.0024)          (0.0025)           (0.0013)           (0.0013)           (0.0016)          (0.0016)           (0.0017)          (0.0017)
    Relative effect                                −1.58%            −1.59%             −2.93%             −3.05%             −4.59%            −4.68%              0.05%             0.13%
                                                                                                               b. With controls
    Coefficient                                     0.0013            0.0013            <0.0001          < −0.0001            −0.0017           −0.0017             0.002             0.002
    SE                                             (0.0018)          (0.0018)           (0.0008)          (0.0008)            (0.0011)          (0.0011)           (0.0016)          (0.0016)
    Relative effect                                 1.09%             1.06%              0.09%            −0.04%              −3.23%            −3.28%              4.42%             4.42%
    Average dependent variable                                 0.118                                0.046                                0.052                                0.046
    Number of observations                                                                                            261,616
    Notes: This table presents our placebo estimates from equation (4) with and without controls. The coefficients represent the effect of being treated at an
    LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent) effects are computed as the
    coefficient divided by the average of the dependent variable. The first-stage coefficient and standard error are shown in figure 2. Unhealthy is a binary variable
    that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or
    if the Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW is a binary variable that takes a value of 1 if the newborn infant has a
    birth weight below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after fewer than
    37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the Apgar score of the newborn infant is lower than 7
    and zero otherwise. All regressions control for draw-by-state fixed effects. Regressions for the coefficients labeled as With controls also include the following
    controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value of 1 if the mother has at least secondary education
    and zero otherwise; an indicator variable that takes the value of 1 if the mother is 19 years old or younger and zero otherwise; marital status, number of
    inhabitants in the municipality; number of LHCs per municipality; an indicator variable that equals 1 if the LHC is above the 75th percentile of the low
    birth weight distribution for the country in 2010–2012, and 0 otherwise; an indicator variable that equals 1 if the LHC is above the 75th percentile of the
    prematurity distribution for the country in 2010–2012, and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the 75th percentile of
    the Apgar score distribution for the country in 2010–2012, and 0 otherwise. Note that the results are robust to the inclusion/exclusion of controls and how
    we measure skills. Numbers in parentheses are LHC-level clustered standard errors.
    * p < 0.1, ** p < 0.05, *** p < 0.01




              Table A.5: Main estimates using all the areas tested in the SABER PRO

                                                                         Dependent variable: Unhealthy
                                       Average         Average          Health Care         Prevention          Average          Reading        Quantitative
                                         All           Health             Score              Disease           Academic           Score           Score
                                                        Scores                                Score              Scores
                                          (1)            (2)                 (3)               (4)                (5)               (6)               (7)
                                                                                        a. Without controls
Coefficient                           -0.0109***      -0.0087***         -0.0089***           -0.0054*         -0.0105***       -0.0065**         -0.0106***
Stand. Err.                            (0.0026)        (0.0026)           (0.0026)            (0.0027)          (0.0027)        (0.0027)           (0.0024)
Relative effect                        -11.44%          -9.14%             -9.33%              -5.64%           -11.01%          -6.78%            -11.12%
                                                                                          b. With controls
Coefficient                           -0.0096***      -0.0076***         -0.0075***           -0.0051**        -0.0094***       -0.0066**         -0.0091***
Stand. Err.                            (0.0022)        (0.0023)           (0.0023)            (0.0025)          (0.0023)        (0.0026)           (0.0022)
Relative effect                        -10.11%          -7.94%             -7.90%              -5.33%            -9.86%          -6.89%             -9.56%
Average Dependent Variable                                                                      0.095
Number of Observations                                                                         255,089
Notes: This table presents our main estimates from equation (4) using all areas tested in the SABER PRO. The coefficients represent the effect of being treated
at an LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent) effects are computed as the
coefficient divided by the average of the dependent variable. Unhealthy is a binary variable that takes a value of 1 if the newborn infant has a birth weight
below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero
otherwise. LBW is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise. Prematurity is a
binary variable that takes a value of 1 if the newborn infant is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable
that takes a value of 1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. Regressions for the coefficients labeled as With controls
also include the following controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value of 1 if the mother has at least
secondary education and zero otherwise; an indicator variable that takes the value of 1 if the mother is 19 years old or younger and zero otherwise; marital
status; number of inhabitants in the municipality; number of LHCs per municipality; area; an indicator variable that takes the value of 1 if the LHC is above
the 75th percentile of the distribution of low birth weight measured in 2010-2012 and zero otherwise; an indicator variable that takes the value of 1 if the LHC
is above the 75th percentile of the distribution of prematurity measured in 2010-2012 and zero otherwise; and an indicator variable that takes the value of 1 if
the LHC is above the 75th percentile of the distribution of the Apgar score measured in 2010-2012 and zero otherwise. These results show that the estimated
effects are robust to using the average of the four areas tested in the SABER PRO (health management, public health, reading, quantitative) as well as each
individual (except for reading) score as proxies of the physician’s skills before the SSO program. The results are also robust to the inclusion/exclusion of
controls and how we measure skills. Numbers in parentheses are LHC-level clustered standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01
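
The relative effects in these tables follow the rule stated in the notes: the coefficient divided by the average of the dependent variable. A minimal sketch in Python, using the rounded values printed in Table A.5 (the function name `relative_effect` is ours, not the authors'). Because the printed coefficient and mean are rounded, the result (about -9.16% for column 2, panel a) can differ in the second decimal from the published -9.14%, which presumably rests on unrounded estimates.

```python
def relative_effect(coefficient: float, mean_dep_var: float) -> float:
    """Return the coefficient as a percentage of the dependent variable's mean."""
    if mean_dep_var == 0:
        raise ValueError("mean of the dependent variable must be nonzero")
    return 100.0 * coefficient / mean_dep_var


# Rounded inputs from Table A.5, panel a, column (2): coefficient and mean of Unhealthy
print(f"{relative_effect(-0.0087, 0.095):.2f}%")
```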




                                       Table A.6: Cohort-level mortality estimates

                                                         Fetal deaths                  Fetal and neonatal                 Infant Mortality
                                                                                             deaths                            Ratio
                                                     Average            PCA            Average            PCA           Average             PCA
                                                     Health            Health          Health            Health         Health             Health
                                                      Scores           Scores           Scores           Scores          Scores            Scores
                                                       (1)               (2)             (3)               (4)            (5)                (6)
         Coefficient                                  -0.9408          -0.6315          -0.9582          -0.6421         -0.0005           -0.0003
         Stand. Err.                                 (2.4024)         (1.5996)         (2.4228)          (1.613)        (0.0015)          (0.0010)
         Relative effect                              -6.39%           -4.29%           -6.10%           -4.09%          -2.18%            -1.43%
         Average Dependent Variable                            14.728                           15.705                             0.024
         Number of Observations                                 1,073                            1,073                             1,073
         Notes: This table presents our cohort-level estimates on mortality following equation (4). Relative (percent) effects are
         computed as the coefficient divided by the average of the dependent variable. Fetal deaths is the total number of fetal deaths
         registered at the LHC during the timeframe when the cohort was assigned. Fetal and neonatal deaths represent the total number
         of fetal deaths and fatalities of children under one year old registered in an LHC during the cohort’s assignment period (ideally,
         we would have preferred to focus on shorter-term mortality, but under one year was the most granular definition of infant
         mortality available in our data). Infant Mortality Ratio represents the number of fetal and neonatal deaths divided by the total
         number of births during the cohort’s assignment period. These variables are regressed on either the cohort’s average
         health score (columns 1, 3, 5) or the cohort’s PCA for the health scores (columns 2, 4, 6). We restrict to cohorts assigned
         to LHCs where there are at least 5 births during their assignment period, but the results are similar when this threshold is
         increased, decreased, or ignored. While these results are expected to be subject to attenuation bias from substantial measurement
         error, we still observe negative, albeit not statistically significant, point estimates, which aligns with our main results.
         * p < 0.1, ** p < 0.05, *** p < 0.01
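
The cohort-level outcome construction described in these notes can be sketched as follows. This is an illustration under assumptions, not the authors' code: the `Cohort` record and its field names are hypothetical, and only the minimum of 5 births comes from the notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cohort:
    # Hypothetical field names; the paper's data layout is not shown.
    fetal_deaths: int
    infant_deaths: int  # deaths of children under one year old
    births: int


MIN_BIRTHS = 5  # threshold stated in the table notes


def mortality_ratio(cohort: Cohort) -> Optional[float]:
    """Fetal plus under-one deaths divided by births; None if the cohort is excluded."""
    if cohort.births < MIN_BIRTHS:
        return None
    return (cohort.fetal_deaths + cohort.infant_deaths) / cohort.births


def keep_eligible(cohorts: List[Cohort]) -> List[Cohort]:
    """Keep cohorts assigned to LHCs with at least MIN_BIRTHS births."""
    return [c for c in cohorts if c.births >= MIN_BIRTHS]
```

Changing `MIN_BIRTHS` corresponds to the threshold variations that the notes report as leaving the results similar.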




                  Table A.7: Controlling for the share of SSO physicians at the LHC

                                       Unhealthy         LBW        Prematurity       Apgar < 7
                                                       Average Health Scores
                                           (1)           (2)        (3)                    (4)
Coefficient                            -0.0087***      -0.004**      -0.0044***        -0.0044**
SE                                      (0.0026)       (0.002)        (0.0016)         (0.0019)
Relative effect                          -9.11%         -9.45%        -10.82%           -11.68%
Average Dependent Variable                0.095          0.043         0.041              0.038
Number of Observations                                           255,089
Note: This table presents our main estimates from equation (4) controlling for the share of
SSO physicians at the LHC. The coefficients represent the effect of being treated at an LHC that
was randomly assigned SSO physicians whose skill level is higher by one standard deviation.
Relative (percent) effects are computed as the coefficient divided by the average of the dependent
variable. First stage coefficient and standard error are shown in figure 2. Unhealthy is a binary
variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if
the newborn infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the
newborn infant is lower than 7 and zero otherwise. LBW is a binary variable that takes a value
of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise. Prematurity
is a binary variable that takes a value of 1 if the newborn infant is born after fewer than 37 weeks
of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the
Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions control
for draw-by-state fixed effects. Numbers in parentheses are LHC-level clustered standard errors.
We interpret the high significance and consistency of these results across the different measures
of health at birth as evidence of the important role that skilled physicians play in determining
an infant’s health at birth.
* p < 0.1, ** p < 0.05, *** p < 0.01
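
The outcome definitions repeated in these notes translate directly into indicator variables. A minimal sketch, assuming a record with birth weight in grams, gestation in weeks, and an Apgar score (the function and key names are ours; the thresholds are the ones stated in the notes):

```python
LBW_GRAMS = 2500     # low birth weight: below 2,500 grams
PRETERM_WEEKS = 37   # prematurity: fewer than 37 weeks of gestation
LOW_APGAR = 7        # low Apgar: score lower than 7


def birth_indicators(weight_g: float, gestation_weeks: float, apgar: int) -> dict:
    """Return the three binary outcomes and the composite Unhealthy indicator."""
    lbw = int(weight_g < LBW_GRAMS)
    premature = int(gestation_weeks < PRETERM_WEEKS)
    low_apgar = int(apgar < LOW_APGAR)
    return {
        "lbw": lbw,
        "prematurity": premature,
        "low_apgar": low_apgar,
        # Unhealthy equals 1 if any of the three conditions holds
        "unhealthy": int(lbw or premature or low_apgar),
    }
```

All three comparisons are strict, so a newborn at exactly 2,500 grams, 37 weeks, and an Apgar score of 7 is coded as healthy.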




       Table A.8: Antenatal consultations < 4

                                               Dependent variable:
                                            Antenatal consultations < 4
                                       Average Health                 PCA Health
                                           Scores                       Scores
                                            (1)                          (2)
                                                   a. Without controls
Coefficient                                 -0.0029                       -0.0031
Stand. Err.                                (0.0093)                      (0.0094)
Relative effect                             -1.77%                        -1.93%
                                                     b. With controls
Coefficient                                 -0.0055                       -0.0058
Stand. Err.                                (0.0092)                      (0.0093)
Relative effect                             -3.39%                        -3.53%
Average Dependent Variable                                0.163
Number of Observations                                   255,089
Notes: This table presents our main estimates from equation (4). The coefficients
represent the effect of being treated at an LHC that was randomly assigned SSO
physicians whose skill level is higher by one standard deviation, on the probability
that mothers are scheduled for fewer than four prenatal checkups (insufficient
antenatal consultations). Relative (percent) effects are computed as the coefficient
divided by the average of the dependent variable. First stage coefficient and
standard error are shown in figure 2. Antenatal consultations < 4 takes the value one if
the mother attended fewer than 4 consultations while pregnant, and zero otherwise.
All regressions control for draw-state fixed effects. Regressions for the coefficients
labeled as With controls also include the following controls: an indicator variable
for the sex of the newborn; an indicator variable that takes the value of 1 if the
mother has at least secondary education and zero otherwise; an indicator variable
that takes the value of 1 if the mother is 19 years old or younger and zero otherwise;
marital status, number of inhabitants in the municipality; number of LHCs per
municipality; an indicator variable that equals 1 if the LHC is above the 75th
percentile of the low birth weight distribution for the country in 2010–2012, and
0 otherwise; an indicator variable that equals 1 if the LHC is above the 75th
percentile of the prematurity distribution for the country in 2010–2012, and 0
otherwise; and an indicator variable that equals 1 if the LHC is above the 75th
percentile of the Apgar score distribution for the country in 2010–2012, and 0
otherwise. Note that the results are robust to the inclusion/exclusion of controls
and how we measure skills. Numbers in parentheses are LHC-level clustered
standard errors. The results show no significant average effect of more-skilled
doctors on the probability that mothers are scheduled for fewer than four
prenatal checkups.
* p < 0.1, ** p < 0.05, *** p < 0.01




          Figure A.5: Placebo using all samples and average scores




Notes: This figure presents our placebo estimates from equation (4). The coefficients represent the effect
of being treated at an LHC that was randomly assigned SSO physicians whose skill level is higher by one
standard deviation. Relative (percent) effects are computed as the coefficient divided by the average of the
dependent variable. First stage coefficient and standard error are shown in figure 2. Unhealthy is a binary
variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn
infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than
7 and zero otherwise. LBW is a binary variable that takes a value of 1 if the newborn infant has a birth weight
below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn
infant is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that
takes a value of 1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions
control for draw-state fixed effects. Regressions for the coefficients labeled as With controls also include the
following controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value
of 1 if the mother has at least secondary education and zero otherwise; an indicator variable that takes the
value of 1 if the mother is 19 years old or younger and zero otherwise; marital status, number of inhabitants
in the municipality; number of LHCs per municipality; an indicator variable that equals 1 if the LHC is
above the 75th percentile of the low birth weight distribution for the country in 2010–2012, and 0 otherwise;
an indicator variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for
the country in 2010–2012, and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the
75th percentile of the Apgar score distribution for the country in 2010–2012, and 0 otherwise. These results
show that the estimated effects are robust to the inclusion/exclusion of controls and the way we measure
skills. They support the results presented in Table 3 on the robustness of the estimated zero effect for
the placebo tests.




Figure A.6: Distribution of Logit simulations on antenatal consultations by predicted probability
of unhealthy newborn




         Notes: This figure plots the distribution of the estimated effects of physicians on antenatal
         consultations by mother’s predicted probability of giving birth to an unhealthy child from
         1,000 different random repetitions. In each of the 1,000 repetitions, to predict the probability
         of an unhealthy child, we divided our data into training and testing subsets of randomly
         selected LHCs using a K-means algorithm. On the training sets, we run a Logit model of the
         probability of being born unhealthy on our usual set of mother and LHC ex-ante covariates,
         and use the estimations to predict the probability of giving birth to an unhealthy child on
         each testing subset. Using the prediction on the testing sample, we divide each subset into
         high and low predicted probability of giving birth to an unhealthy child, defined as mothers
         with a probability of an unhealthy child above and below the 75th percentile, respectively.
         Unhealthy is a binary variable that takes the value of 1 if the newborn has low birth weight
         or if the newborn is premature (fewer than 37 weeks of gestation) or if the Apgar score
         of the newborn is lower than 7, and zero otherwise. The plotted coefficients represent the
         effect of being assigned a physician with one standard deviation higher quality (proxied
         by the average score) on the probability of having insufficient (fewer than four) antenatal
         consultations. All regressions control for draw-by-state fixed effects. The figure shows
         that there is almost no overlap between the distributions and that most of the mass of the
         distribution for the coefficient associated with the low predicted Unhealthy is around zero.
         This is consistent with the idea that more-skilled physicians are better at targeting care
         toward more-vulnerable mothers.
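
One repetition of the splitting and grouping steps described in these notes might look like the sketch below. It is not the authors' code: the function names are hypothetical, the split is a plain random split of LHC identifiers (the notes mention a K-mean algorithm whose details are not given), the 50/50 training share is an assumption, and the predicted probabilities are assumed to come from a Logit model already fitted on the training LHCs. "High" predicted risk is taken to mean above the 75th percentile of the testing sample.

```python
import random
from statistics import quantiles
from typing import Dict, List, Sequence, Tuple


def split_lhcs(lhc_ids: Sequence[str], train_share: float = 0.5,
               seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly assign whole LHCs to a training or a testing subset."""
    rng = random.Random(seed)
    shuffled = list(lhc_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_share)  # assumed share, not from the paper
    return shuffled[:cut], shuffled[cut:]


def risk_groups(predicted: Dict[str, float]) -> Dict[str, str]:
    """Label each mother 'high' if her predicted probability of an unhealthy
    newborn exceeds the 75th percentile of the testing sample, else 'low'."""
    p75 = quantiles(predicted.values(), n=4)[2]  # third quartile cut point
    return {mother: ("high" if p > p75 else "low")
            for mother, p in predicted.items()}
```

Repeating the split with 1,000 different seeds and re-estimating the effect within each group would give the two distributions that the figure plots.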




                             Table A.9: Main estimates without and with controls

                                            Unhealthy                         LBW                       Prematurity                     Apgar < 7
                                       Average            PCA         Average           PCA         Average            PCA         Average           PCA
                                       Health            Health       Health           Health       Health            Health       Health           Health
                                        Scores           Scores        Scores          Scores        Scores           Scores        Scores          Scores
                                         (1)               (2)          (3)              (4)          (5)               (6)          (7)              (8)
                                                                                       a. Without controls
Coefficient                           -0.0087***     -0.0086***      -0.0041**       -0.0040*      -0.0045***      -0.0045***      -0.0043**      -0.0043**
Stand. Err.                            (0.0026)       (0.0026)       (0.0021)        (0.0021)       (0.0017)        (0.0016)       (0.0019)       (0.0019)
Relative effect                         -9.14%         -9.02%         -9.57%          -9.38%        -10.99%         -10.89%         -11.56%        -11.46%
                                                                                         b. With controls
Coefficient                           -0.0076***     -0.0075***      -0.0045**       -0.0045**     -0.0050***      -0.0050***       -0.0024        -0.0023
Stand. Err.                            (0.0023)       (0.0023)       (0.0018)        (0.0018)       (0.0015)        (0.0015)       (0.0018)       (0.0018)
Relative effect                         -7.94%         -7.85%         -10.60%         -10.53%       -12.17%         -12.22%         -6.39%         -6.20%
Average Dependent Variable                       0.095                         0.043                          0.041                         0.038
Number of Observations                                                                       255,089
Notes: This table presents our main estimates from equation (4) with and without controls. The coefficients represent the effect of being treated at an
LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent) effects are computed as the
coefficient divided by the average of the dependent variable. First stage coefficient and standard error are shown in figure 2. Unhealthy is a binary variable
that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or
if the Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW is a binary variable that takes a value of 1 if the newborn infant has a
birth weight below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after fewer than
37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the Apgar score of the newborn infant is lower than 7
and zero otherwise. All regressions control for draw-state fixed effects. Regressions for the coefficients labeled as With controls also include the following
controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value of 1 if the mother has at least secondary education
and zero otherwise; an indicator variable that takes the value of 1 if the mother is 19 years old or younger and zero otherwise; marital status, number of
inhabitants in the municipality; number of LHCs per municipality; an indicator variable that equals 1 if the LHC is above the 75th percentile of the low
birth weight distribution for the country in 2010–2012, and 0 otherwise; an indicator variable that equals 1 if the LHC is above the 75th percentile of the
prematurity distribution for the country in 2010–2012, and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the 75th percentile of the
Apgar score distribution for the country in 2010–2012, and 0 otherwise. These results show that the estimated effects are robust to the inclusion/exclusion
of controls and the way we measure quality. Numbers in parentheses are LHC-level clustered standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01




Figure A.7: Distribution of Logit simulation coefficients on the probability of being born
unhealthy by the (ex-ante) predicted probability of an unhealthy newborn




         Notes: This figure plots the distribution of the estimated effects of physicians on the
         probability of being born unhealthy by mother’s predicted probability of giving birth to an
         unhealthy child from 1,000 different random repetitions. In each of the 1,000 repetitions, to
         predict the probability of an unhealthy child, we divided our data into training and testing
         subsets of randomly selected LHCs using a K-means algorithm. On the training sets, we run
         a Logit model of the probability of being born unhealthy on our usual set of mother and
         LHC ex-ante covariates, and use the estimations to predict the probability of giving birth to
         an unhealthy child on each testing subset. Using the prediction on the testing sample, we
         divide each subset into high and low predicted probability of giving birth to an unhealthy
         child, defined as mothers with a probability of an unhealthy child above and below the 75th
         percentile, respectively. Unhealthy is a binary variable that takes the value of 1 if the newborn
         has low birth weight or if the newborn is premature (fewer than 37 weeks of gestation)
         or if the Apgar score of the newborn is lower than 7, and zero otherwise. The plotted
         coefficients represent the effect of being assigned a physician with one standard deviation
         higher quality (proxied by the average score) on the probability of being born unhealthy.
         All regressions control for draw-by-state fixed effects.
         The figure shows that there is almost no overlap between the distributions and that the
         estimated effects of the more skilled physicians are consistently stronger for the population
         with higher predicted probability of being born unhealthy.




                     Figure A.8: Main estimates using the full sample




Notes: This figure presents our main estimates from equation (4). The coefficients represent the effect of being treated
at an LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation.
Relative (percent) effects are computed as the coefficient divided by the average of the dependent variable.
First stage coefficient and standard error are shown in figure 2. Unhealthy is a binary variable that takes a value
of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer
than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero otherwise.
LBW is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams
and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after
fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of
1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions control for
draw-by-state fixed effects. Regressions for the coefficients labeled as With controls also include the following
controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value of 1 if
the mother has at least secondary education and zero otherwise; an indicator variable that takes the value of
1 if the mother is 19 years old or younger and zero otherwise; marital status; number of inhabitants in the
municipality; number of LHCs per municipality; an indicator variable that equals 1 if the LHC is above the 75th
percentile of the low birth weight distribution for the country in 2010–2012, and 0 otherwise; an indicator
variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for the country in
2010–2012, and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the 75th percentile of
the Apgar score distribution for the country in 2010–2012, and 0 otherwise. These results show that the
estimated effects are robust to the inclusion/exclusion of controls and the way we measure physicians’ skills
(averages vs. principal components). Standard errors are clustered at the LHC level. Confidence intervals are
shown at the 95% level.




 Table A.10: Main results using covariance index (Anderson, 2008)

                                       Unhealthy Cov index                    Unhealthy standardized
                                  Average Scores        PCA Scores       Average Scores        PCA Scores
                                       (1)                   (2)               (3)                 (4)
                                                             a. Without controls
Coefficient                           -0.0220***         -0.0218***       -0.0297***          -0.0292***
Stand. Err.                            (0.0065)           (0.0065)         (0.0088)            (0.0088)
Relative effect                         -3.23%             -3.19%           -2.97%              -2.92%
                                                               b. With controls
Coefficient                           -0.0192***         -0.0190***       -0.0257***          -0.0255***
Stand. Err.                            (0.0060)           (0.0060)         (0.0078)            (0.0079)
Relative effect                         -2.82%             -2.79%           -2.57%              -2.55%

Number of Observations                                              255,089
Notes: This table presents our main estimates from equation (4) using the Anderson (2008) covariance index.
The coefficients represent the effect of being treated at an LHC that was randomly assigned SSO physicians
whose skill level is higher by one standard deviation. Relative (percent) effects are computed as the
coefficient divided by the average of the dependent variable. Unhealthy is a binary variable that takes a
value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after
fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero
otherwise. All regressions control for draw-by-state fixed effects. Regressions for the coefficients labeled
as With controls also include the following controls: an indicator variable for the sex of the newborn;
an indicator variable that takes the value of 1 if the mother has at least secondary education and zero
otherwise; an indicator variable that takes the value of 1 if the mother is 19 years old or younger and zero
otherwise; marital status, number of inhabitants in the municipality; number of LHCs per municipality; an
indicator variable that equals 1 if the LHC is above the 75th percentile of the low birth weight distribution
for the country in 2010–2012, and 0 otherwise; an indicator variable that equals 1 if the LHC is above
the 75th percentile of the prematurity distribution for the country in 2010–2012, and 0 otherwise; and an
indicator variable that equals 1 if the LHC is above the 75th percentile of the Apgar score distribution for the
country in 2010–2012, and 0 otherwise. These results show that the estimated effects are robust to using the
covariance index as an outcome instead of unhealthy. The results are also robust to the inclusion/exclusion
of controls and how we measure quality. Numbers in parentheses are LHC-level clustered standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01




                                  Table A.11: Main estimates using a Logit model

                                            Unhealthy                          LBW                       Prematurity                     Apgar < 7
                                       Average            PCA         Average            PCA         Average            PCA         Average           PCA
                                       Health            Health       Health            Health       Health            Health       Health           Health
                                        Scores           Scores        Scores           Scores        Scores           Scores        Scores          Scores
                                         (1)               (2)          (3)               (4)          (5)               (6)          (7)              (8)
                                                                                        a. Without controls
Coefficient                           -0.0070***      -0.0069***      -0.0032**      -0.0031**      -0.0039***     -0.0039***      -0.0034**       -0.0033**
Stand. Err.                            (0.0021)        (0.0021)       (0.0015)       (0.0015)        (0.0014)       (0.0014)       (0.0015)        (0.0015)
Relative effect                         -7.34%          -7.22%         -7.45%         -7.27%          -9.54%         -9.41%         -9.03%          -8.92%
                                                                                         b. With controls
Coefficient                           -0.0059***      -0.0058***     -0.0036***      -0.0035***     -0.0046***     -0.0046***        -0.0020        -0.0019
Stand. Err.                            (0.0018)        (0.0018)       (0.0013)        (0.0013)       (0.0012)       (0.0012)        (0.0012)       (0.0012)
Relative effect                         -6.20%          -6.13%         -8.37%          -8.29%        -11.17%        -11.18%          -5.22%         -5.08%
Average Dependent Variable                       0.095                          0.043                          0.041                         0.038
Number of Observations                                                                        255,079
Notes: This table presents our main estimates from equation (3) using a logit model. The coefficients represent the effect of being treated at an LHC that
was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent) effects are computed as the coefficient
divided by the average of the dependent variable. Unhealthy is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500
grams, if the newborn infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero otherwise.
LBW is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise. Prematurity is a binary
variable that takes a value of 1 if the newborn infant is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable
that takes a value of 1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions control for draw-by-state fixed effects.
Regressions for the coefficients labeled as With controls also include the following controls: an indicator variable for the sex of the newborn; an indicator
variable that takes the value of 1 if the mother has at least secondary education and zero otherwise; an indicator variable that takes the value of 1 if the
mother is 19 years old or younger and zero otherwise; marital status; number of inhabitants in the municipality; number of LHCs per municipality; an
indicator variable that equals 1 if the LHC is above the 75th percentile of the low birth weight distribution for the country in 2010–2012, and 0 otherwise; an
indicator variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for the country in 2010–2012, and 0 otherwise; and an
indicator variable that equals 1 if the LHC is above the 75th percentile of the Apgar score distribution for the country in 2010–2012, and 0 otherwise. These
results show that the estimated effects are robust to using an analogous logit model, for which we report the average marginal effect associated with a
one-standard-deviation increase in the skill measure. The results are also robust to the inclusion/exclusion of controls and how we measure quality. Numbers in
parentheses are LHC-level clustered standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01
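
For readers unfamiliar with average marginal effects, the following Python sketch shows one common way to compute them for a logit model. It is not the authors' procedure: the notes do not say whether the marginal effect is derivative-based or a discrete one-standard-deviation change, and this sketch assumes the derivative form with a single standardized regressor. The intercept, slope, and skill values are made up for illustration.

```python
import math


def logistic(z):
    """Logistic CDF, mapping a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(intercept, beta_skill, skill_values):
    """Average over observations of d P(y=1) / d skill = beta * p * (1 - p)."""
    effects = []
    for s in skill_values:
        p = logistic(intercept + beta_skill * s)
        effects.append(beta_skill * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical parameters and standardized skill values (not from the paper).
ame = average_marginal_effect(-2.2, -0.08, [-1.5, -0.5, 0.0, 0.5, 1.5])
print(ame)
```

Because the skill measure is standardized, a derivative-based effect of this kind can be read as the approximate change in the outcome probability for a one-standard-deviation increase in skill.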




         Table A.12: Main estimates linearity

                                      Dependent variable: Unhealthy
                                       Average Health            PCA Health
                                           Scores                  Scores
                                            (1)                     (2)
                Coefficient                  -0.0150                -0.0147
Quartile 2      Stand. Err.                 (0.010)                (0.0105)
                Relative effect             -15.79%                -15.45%

                Coefficient                 -0.0217**               -0.0227
Quartile 3      Stand. Err.                 (0.0104)               (0.0145)
                Relative effect              -22.74%               -23.82%

                Coefficient                 -0.0196**              -0.0199**
Quartile 4      Stand. Err.                 (0.0089)               (0.0079)
                Relative effect              -20.62%                -20.94%
Notes: This table presents our main estimates from equation (4) using the
quartiles of the quality distribution. The coefficients represent the effect
of being treated at an LHC that was randomly assigned SSO physicians
whose skill level was at the 2nd, 3rd, or 4th quartile of the physicians’
quality distribution compared to being treated at an LHC that was randomly
assigned SSO physicians whose skill level was at the 1st quartile. Relative
(percent) effects are computed as the coefficient divided by the average of the
dependent variable. Unhealthy is a binary variable that takes a value of 1 if the
newborn infant has a birth weight below 2,500 grams, if the newborn infant
is born after fewer than 37 weeks of gestation, or if the Apgar score of the
newborn infant is lower than 7 and zero otherwise. All regressions control for
draw-by-state fixed effects. Numbers in parentheses are LHC-level clustered
standard errors. While not all the coefficients are statistically different from
each other, the point estimates for the third and fourth quartiles are larger in
magnitude than the estimate for the second quartile, and we cannot reject
linearity of the effects.
* p < 0.1, ** p < 0.05, *** p < 0.01
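
The quartile specification replaces the continuous skill measure with indicators for the second, third, and fourth quartiles, leaving the first quartile as the omitted reference group. A minimal Python sketch of how such indicators could be built follows. The function name and the cutoff convention (the default of `statistics.quantiles`) are assumptions made for illustration; the notes do not state how the authors computed the quartile cutoffs.

```python
import statistics


def quartile_dummies(skills):
    """Build indicators for quartiles 2-4; quartile 1 is the reference group."""
    q1, q2, q3 = statistics.quantiles(skills, n=4)  # three quartile cut points
    rows = []
    for s in skills:
        # Count how many cut points the value exceeds to get its quartile (1-4).
        quartile = 1 + (s > q1) + (s > q2) + (s > q3)
        rows.append({f"quartile_{k}": int(quartile == k) for k in (2, 3, 4)})
    return rows


# Hypothetical skill scores (not from the paper).
for row in quartile_dummies([1, 2, 3, 4, 5, 6, 7, 8]):
    print(row)
```

Each coefficient in the table then compares outcomes at LHCs assigned physicians in the given quartile with outcomes at LHCs assigned first-quartile physicians.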




       Table A.13: Main estimates without controls, with dummy controls, and with continuous controls
                                             Unhealthy                          LBW                       Prematurity                      Apgar < 7
                                        Average           PCA          Average            PCA         Average            PCA         Average            PCA
                                        Health           Health        Health            Health       Health            Health       Health            Health
                                         Scores          Scores         Scores           Scores        Scores           Scores        Scores           Scores
                                          (1)              (2)           (3)               (4)          (5)               (6)          (7)               (8)
                                                                                        a. Without controls
Coefficient                            -0.0087***     -0.0086***       -0.0041**      -0.0040*       -0.0045***      -0.0045***      -0.0043**       -0.0043**
Stand. Err.                             (0.0026)       (0.0026)        (0.0021)       (0.0021)        (0.0017)        (0.0016)       (0.0019)        (0.0019)
Relative effect                          -9.14%         -9.02%          -9.57%         -9.38%         -10.99%         -10.89%         -11.56%         -11.46%
                                                                                      b. With dummy controls
Coefficient                            -0.0076***     -0.0075***       -0.0045**      -0.0045**      -0.0050***      -0.0050***       -0.0024         -0.0023
Stand. Err.                             (0.0023)       (0.0023)        (0.0018)       (0.0018)        (0.0015)        (0.0015)       (0.0018)        (0.0018)
Relative effect                          -7.94%         -7.85%          -10.60%        -10.53%        -12.17%         -12.22%         -6.39%          -6.20%
                                                                                   c. With continuous controls
Coefficient                            -0.0077***     -0.0077***       -0.0036**      -0.0036**      -0.0041***      -0.0042***      -0.0037**       -0.0037**
Stand. Err.                             (0.0021)       (0.0021)        (0.0017)       (0.0017)        (0.0012)        (0.0012)       (0.0017)        (0.0017)
Relative effect                          -8.11%         -8.06%          -8.53%         -8.52%          -9.95%         -10.12%         -9.95%          -9.73%
Average Dependent Variable                       0.095                           0.043                          0.041                          0.038
Number of Observations                                                                         255,089
Notes: This table presents our main estimates from equation (4) without controls, and with dummy and continuous controls. The coefficients represent
the effect of being treated at an LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent)
effects are computed as the coefficient divided by the average of the dependent variable. The first-stage coefficient and standard error are shown in Figure 2.
Unhealthy is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer
than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW is a binary variable that takes a value of 1
if the newborn infant has a birth weight below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant
is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the Apgar score of the newborn
infant is lower than 7 and zero otherwise. All regressions control for draw-by-state fixed effects. Regressions for the coefficients labeled as With dummy
controls also include the following controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value of 1 if the mother has
at least secondary education and zero otherwise; an indicator variable that takes the value of 1 if the mother is 19 years old or younger and zero otherwise;
marital status; an indicator variable that equals 1 if the LHC is above the 75th percentile of the low birth weight distribution for the country in 2010–2012,
and 0 otherwise; an indicator variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for the country in 2010–2012,
and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the 75th percentile of the Apgar score distribution for the country in 2010–2012,
and 0 otherwise. Regressions for the coefficients labeled as With continuous controls include the following controls: an indicator variable for the sex of the
newborn; an indicator variable that takes the value of 1 if the mother has at least secondary education and zero otherwise; an indicator variable that takes the
value of 1 if the mother is an adolescent and zero otherwise; marital status; the LHC’s low birth weight average measured in 2010–2012; the LHC’s prematurity
percentage measured in 2010–2012; and the LHC’s Apgar average measured in 2010–2012. These results show that the estimated effects
are robust to the inclusion/exclusion of controls and how we measure skills. Numbers in parentheses are LHC-level clustered standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01




               Table A.14: Interaction between cohort scores and program scores

                                                                   Unhealthy                   LBW                 Prematurity               Apgar < 7
                                                                      (1)                       (2)                    (3)                     (4)
                                           Coefficient               -0.0090**               -0.0046**                -0.0047**                -0.0035
  Average Health Score                     Stand. Err.               (0.0035)                (0.0019)                 (0.0021)                (0.0029)
                                           Relative effect            -9.40%                  -10.83%                  -11.47%                 -9.36%
                                           Coefficient                0.0023                  0.0028**                  0.0023                 -0.0011
  Program Average                          Stand. Err.               (0.0028)                (0.0014)                 (0.0016)                (0.0026)
                                           Relative effect             2.39%                   6.45%                    5.53%                  -2.83%
                                           Coefficient                0.0013                   0.0012                   0.0014                  0.0005
  Av. Health Sc. x Prog. Av.               Stand. Err.               (0.0017)                (0.0015)                 (0.0011)                (0.0013)
                                           Relative effect             1.35%                   2.75%                    3.32%                   1.42%
  Average Dependent Variable                                                  0.095                    0.043                     0.041                  0.038
  Number of Observations                                                                                  255,089
  Notes: This table presents our main estimates from equation (4) using the interaction between cohort and program scores. The coefficients
  represent the effect of being treated at an LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation.
  Relative (percent) effects are computed as the coefficient divided by the average of the dependent variable. Unhealthy is a binary variable that
  takes a value of 1 if the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation,
  or if the Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW is a binary variable that takes a value of 1 if the newborn
  infant has a birth weight below 2,500 grams and zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant
  is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of 1 if the Apgar score of
  the newborn infant is lower than 7 and zero otherwise. All regressions control for draw-by-state fixed effects. These results show that the effects
  presented in Table 4 are not driven by top-ranked universities. Numbers in parentheses are LHC-level clustered standard errors.
  * p < 0.1, ** p < 0.05, *** p < 0.01
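
  In a specification with an interaction term, the implied effect of physician skill depends on the level of the interacted variable. The Python sketch below illustrates the arithmetic under the assumption that the regression is linear in the Average Health Score, the Program Average, and their product, and that both variables are standardized. These are assumptions made for illustration: equation (4) is not reproduced in these notes, and the function name is hypothetical. The inputs are the rounded column (1) coefficients from the table above.

```python
def implied_skill_effect(beta_skill, beta_interaction, program_average):
    """Marginal effect of skill at a given program average in a linear model
    y = b1*skill + b2*program + b3*(skill*program) + ... (assumed form)."""
    return beta_skill + beta_interaction * program_average


# Rounded column (1) coefficients: Average Health Score and the interaction.
for level in (-1.0, 0.0, 1.0):
    print(level, implied_skill_effect(-0.0090, 0.0013, level))
```

  Under these assumptions, the implied effect on Unhealthy remains negative one standard deviation above and below the mean program average, which is in line with the small and statistically insignificant interaction coefficient.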




                       Table A.15: Main results using municipalities with one LHC

                                             Unhealthy                          LBW                        Prematurity                      Apgar < 7
                                        Average            PCA         Average            PCA          Average            PCA         Average            PCA
                                        Health            Health       Health            Health        Health            Health       Health            Health
                                         Scores           Scores        Scores           Scores         Scores           Scores        Scores           Scores
                                          (1)               (2)          (3)               (4)           (5)               (6)          (7)               (8)
                                                                                         a. Without controls
Coefficient                            -0.0077***      -0.0075***      -0.0040*        -0.0039*       -0.0045***      -0.0044***      -0.0033*        -0.0033*
Stand. Err.                             (0.0025)        (0.0026)       (0.0021)        (0.0021)        (0.0017)        (0.0017)       (0.0017)        (0.0017)
Relative effect                          -7.97%          -7.84%         -9.21%          -8.95%         -10.69%         -10.57%         -8.71%          -8.72%
                                                                                           b. With controls
Coefficient                            -0.0068***      -0.0068***      -0.0038**       -0.0038**      -0.0046***      -0.0046***       -0.0023         -0.0023
Stand. Err.                             (0.0024)        (0.0024)       (0.0019)        (0.0019)        (0.0015)        (0.0015)       (0.0018)        (0.0018)
Relative effect                          -7.07%          -7.02%         -8.91%          -8.88%         -10.89%         -11.02%         -6.14%          -6.01%
Average Dependent Variable                        0.096                          0.043                           0.042                          0.038
Number of Observations                                                                         238,296
Notes: This table presents our main estimates from equation (4) using municipalities with only one LHC. The coefficients represent the effect of being
treated at an LHC that was randomly assigned SSO physicians whose skill level is higher by one standard deviation. Relative (percent) effects are computed
as the coefficient divided by the average of the dependent variable. Unhealthy is a binary variable that takes a value of 1 if the newborn infant has a birth
weight below 2,500 grams, if the newborn infant is born after fewer than 37 weeks of gestation, or if the Apgar score of the newborn infant is lower than
7 and zero otherwise. LBW is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and zero otherwise.
Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after fewer than 37 weeks of gestation and zero otherwise. Low Apgar is
a binary variable that takes a value of 1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions control for draw-by-state
fixed effects. Regressions for the coefficients labeled as With controls also include the following controls: an indicator variable for the sex of the newborn; an
indicator variable that takes the value of 1 if the mother has at least secondary education and zero otherwise; an indicator variable that takes the value of 1
if the mother is 19 years old or younger and zero otherwise; marital status; number of inhabitants in the municipality; number of LHCs per municipality;
an indicator variable that equals 1 if the LHC is above the 75th percentile of the low birth weight distribution for the country in 2010–2012, and 0 otherwise;
an indicator variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for the country in 2010–2012, and 0 otherwise;
and an indicator variable that equals 1 if the LHC is above the 75th percentile of the Apgar score distribution for the country in 2010–2012, and 0 otherwise.
The table shows that the results presented in Table 4 are almost identical if we exclude from our main sample the ten municipalities with more than one
LHC. The results are also robust to the inclusion/exclusion of controls and how we measure skills. Numbers in parentheses are LHC-level
clustered standard errors.
* p < 0.1, ** p < 0.05, *** p < 0.01




Table A.16: Main results using the weighted score without and with controls

                                            Unhealthy             LBW           Prematurity        Apgar < 7
                                                              Average Health Scores
                                                (1)              (2)           (3)                      (4)
                                                                   a. Without controls
    Coefficient                             -0.0084***          -0.0041*         -0.0047***         -0.0041**
    Stand. Err.                              (0.0026)           (0.0021)          (0.0017)          (0.0019)
    Relative effect                           -8.82%             -9.54%           -11.35%            -10.92%
                                                                     b. With controls
    Coefficient                             -0.0070***         -0.0044**         -0.0049***          -0.0021
    Stand. Err.                              (0.0023)          (0.0018)           (0.0015)          (0.0018)
    Relative effect                           -7.39%            -10.18%           -12.00%            -5.70%
    Average Dependent Variable                 0.095              0.043             0.041             0.037
    Number of Observations                                                252,159
    Notes: This table presents our main estimates from equation (4) using the weighted score. The coefficients
    represent the effect of being treated at an LHC that was randomly assigned SSO physicians whose skill
    level is higher by one standard deviation. Relative (percent) effects are computed as the coefficient
    divided by the average of the dependent variable. Unhealthy is a binary variable that takes a value of 1 if
    the newborn infant has a birth weight below 2,500 grams, if the newborn infant is born after fewer than 37
    weeks of gestation, or if the Apgar score of the newborn infant is lower than 7 and zero otherwise. LBW
    is a binary variable that takes a value of 1 if the newborn infant has a birth weight below 2,500 grams and
    zero otherwise. Prematurity is a binary variable that takes a value of 1 if the newborn infant is born after
    fewer than 37 weeks of gestation and zero otherwise. Low Apgar is a binary variable that takes a value of
    1 if the Apgar score of the newborn infant is lower than 7 and zero otherwise. All regressions control for
    draw-by-state fixed effects. Regressions for the coefficients labeled as With controls also include the following
    controls: an indicator variable for the sex of the newborn; an indicator variable that takes the value of 1 if
    the mother has at least secondary education and zero otherwise; an indicator variable that takes the value
    of 1 if the mother is 19 years old or younger and zero otherwise; marital status; number of inhabitants in
    the municipality; number of LHCs per municipality; an indicator variable that equals 1 if the LHC is above
    the 75th percentile of the low birth weight distribution for the country in 2010–2012, and 0 otherwise; an
    indicator variable that equals 1 if the LHC is above the 75th percentile of the prematurity distribution for
    the country in 2010–2012, and 0 otherwise; and an indicator variable that equals 1 if the LHC is above the
    75th percentile of the Apgar score distribution for the country in 2010–2012, and 0 otherwise. The table
    shows that the results are very similar when the weighted score is used as a proxy of physicians’ skills.
    The results are also robust to the inclusion/exclusion of controls and how we measure skills. Numbers in
    parentheses are LHC-level clustered standard errors.
    * p < 0.1, ** p < 0.05, *** p < 0.01



