Policy Research Working Paper 7921
Does Mass Deworming Affect Child Nutrition?
Meta-Analysis, Cost-Effectiveness, and Statistical Power
Kevin Croke
Joan Hamory Hicks
Eric Hsu
Michael Kremer
Edward Miguel
Development Research Group
Impact Evaluation Team
December 2016
This paper is a product of the Impact Evaluation Team, Development Research Group. It is part of a larger effort by the
World Bank to provide open access to its research and make a contribution to development policy discussions around the
world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be
contacted at kcroke@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team
Kevin Croke¹, Joan Hamory Hicks², Eric Hsu², Michael Kremer³, Edward Miguel²
¹ World Bank and Harvard T.H. Chan School of Public Health
² University of California, Berkeley
³ Harvard University
December 16, 2016
Abstract: The WHO has recently debated whether to reaffirm its long-standing recommendation of mass
drug administration (MDA) in areas with more than 20% prevalence of soil-transmitted helminths
(hookworm, whipworm, and roundworm). There is consensus that the relevant deworming drugs are safe
and effective, so the key question facing policymakers is whether the expected benefits of MDA exceed the
roughly $0.30 per treatment cost. The literature on long run educational and economic impacts of
deworming suggests that this is the case. However, a recent meta-analysis by Taylor-Robinson et al.
(2015), (hereafter TMSDG), disputes these findings. The authors conclude that while treatment of children
known to be infected increases weight by 0.75 kg (95% CI: 0.24, 1.26; p=0.0038), there is substantial
evidence that MDA has no impact on weight or other child outcomes. This paper updates the TMSDG
analysis by including studies omitted from that analysis and extracting additional data from included
studies, and finds that the TMSDG analysis is underpowered: power is inadequate to rule out weight gain
effects that would make MDA cost effective relative to comparable interventions in similar populations,
and inadequate to distinguish the effect of MDA from the effect that might be
expected, given deworming’s effects on those known to be infected. The hypothesis of a common zero
effect of multiple-dose MDA deworming on child weight at longest follow-up is rejected at the 10% level
using the TMSDG dataset, and with a p value < 0.001 using the updated sample. In the full sample,
including studies in settings where prevalence is low enough that the WHO does not recommend
deworming, the average effect on child weight is 0.134 kg (95% CI: 0.031, 0.236, random effects). In
environments with greater than 20% prevalence, where the WHO recommends mass treatment, the average
effect on child weight is 0.148 kg (95% CI: 0.039, 0.258). The implied average effect of MDA on infected
children in the full sample is 0.301 kg. At 0.22 kg per U.S. dollar, the estimated average weight gain per
dollar is more than 35 times that from school feeding programs as estimated in RCTs. Under-powered
meta-analyses are common in health research, and this methodological issue will be increasingly important
as growing numbers of economists and other social scientists conduct meta-analysis.
JEL classifications: I10, I15, I38, H43.
Keywords: deworming, nutrition, meta-analysis, cost-effectiveness
Acknowledgments: We are grateful to the Campbell Collaboration deworming team, especially Vivian Welch, for generously
sharing data and information on their work. We thank Harold Alderman, Shally Awasthi, Don Bundy, Serene Joseph, Chengfang Liu,
Scott Rozelle, and Walter Willett for help interpreting their studies, searching for original data, or supplying us with additional results.
We thank Egor Abramov, Kamran Jamil, Spencer Ma, Cole Scanlon, Wen Wang, and Kevin Xie for research assistance. We thank
Amrita Ahuja, Harold Alderman, Sarah Baird, Matthew Basilico, Peter Hotez, Rachael Meager, Antonio Montresor, and Dina Pomeranz
for helpful comments.
Croke et al. (2016)
1 Introduction
Soil transmitted helminths (STH; including hookworm, whipworm, and roundworm) are
estimated to affect 1 in 4 people in endemic countries (Pullan et al., 2014) and at least 1.3
billion people worldwide (Global Atlas of Helminth Infections, 2016). The intensity of
infection is correlated with prevalence, and highly skewed, so while many infected
individuals have light infections, only a minority have the moderate to severe intensity
infections believed to account for the bulk of morbidity (Anderson, Truscott and
Hollingsworth, 2014).
Because of the damage worms can cause, there is consensus that those known to be
infected should be treated. There is also consensus that deworming drugs are safe and
effective; indeed, they are the standard of care for those known to be infected. Because
individual collection and testing of stool samples prior to treatment is prohibitively
expensive and logistically impractical in many low-income contexts, the World Health
Organization (WHO) has long recommended mass drug administration (MDA) in endemic
areas. In particular, the WHO recommends annual MDA in areas with more than 20%
prevalence of soil-transmitted helminths, and multiple annual MDA where prevalence is
above 50% (World Health Organization, 2012).
Studies of the long-term impact of deworming in multiple environments suggest
that even under conservative assumptions and allowing for uncertainty, the expected
benefits of MDA exceed the $0.30 per treatment costs (Ahuja et al., 2015; Baird et al.,
2016; Bleakley, 2007; Croke, 2014; Ozier, 2015). 1 Many organizations that have examined
the evidence have judged deworming to be highly cost effective, including the Copenhagen
Consensus (Hall and Horton, 2008), the Disease Control Priorities Project (Hotez et al.,
2006) 2, Givewell (Givewell, 2013; Givewell, 2014), the Abdul Latif Jameel Poverty Action
Lab (J-PAL Policy Bulletin, 2012) and the World Bank (World Bank, 1993).
However, a recent Cochrane Review (Taylor-Robinson et al., 2015; hereafter
“TMSDG”) argues against this long-standing consensus. TMSDG conduct a meta-analysis
of randomized studies focusing on the short-term impacts of deworming. The authors
express support for treating those known to be infected (p. 30), and estimate that single
dose treatment for those known to be infected (and to have moderate intensity infections
on average) increases weight by 0.75 kg (95% CI: 0.24,1.26; p=0.0038). However, they
take a strong stand on mass drug administration in areas where worms are endemic, arguing
that there is “substantial evidence” that mass treatment has no impact on child weight or
other outcomes (p. 2). This has led a subset of the TMSDG authors to conclude in other
work that the belief in long-term educational and economic impacts discussed above is
“delusional” (Garner, Taylor-Robinson and Sachdev, 2015).
Of course, failure to reject the null hypothesis of no effect does not constitute
evidence of no effect. Particularly when treatment is inexpensive and side-effects are mild,
studies and meta-analyses must have adequate power in order to rule out possibly modest
effect sizes that would still render treatment cost-effective. Unfortunately, underpowered
1 Givewell calculates the cost of deworming for soil-transmitted helminths in India at $0.30 per child per
treatment, which includes both drug and delivery costs, including the value of staff time (Givewell, 2016).
2 We note that while analysts at Givewell identified errors in the calculations of deworming's cost
effectiveness presented in the Disease Control Priorities Project, their revised calculations also find it to be
a cost effective intervention at $82.51 to $138.28 per DALY (Givewell, 2011).
meta-analyses are extremely common in many areas of health research (Turner, Bird and
Higgins, 2013), and it is important to examine when this is the case. In the case of
deworming, the cost of MDA for STH is estimated at just $0.30 per treatment (Givewell,
2016). Since the majority of those treated in MDA programs will either be uninfected or
have only light intensity infections rather than the moderate to severe infections thought to
account for the bulk of STH morbidity (De Silva et al. 2015, Montresor et al. 2015),
statistical power to pick up population-wide effects is typically limited (Bundy, Walson
and Watkins, 2013). TMSDG include studies in environments with prevalence below the
20% threshold at which the WHO recommends deworming, further weakening statistical
power by including estimates from settings where there are few worm infections to begin
with.
In this paper, we first assess statistical power in TMSDG, and conclude that power
is inadequate to rule out weight gain effects that would make MDA cost effective relative
to school feeding programs aimed at similar populations. Power is also inadequate to distinguish
the effect of MDA from the effect that might be expected, given
deworming’s effects on those known to be infected, and given STH prevalence in the
populations where the MDA trials took place. We then update the work of TMSDG to
create a more comprehensive, and thus better-powered, meta-analysis. The update includes
22 estimates from 20 studies examining the impact of multiple-dose MDA on child weight
at longest follow-up, twice as many as TMSDG. It includes four studies not identified in
TMSDG and additional data from six studies discussed by TMSDG but not included in
their meta-analysis for the child weight outcome, either using procedures in the Cochrane
Handbook for Systematic Reviews of Interventions (Higgins and Green, 2011) to extract
additional data or contacting the original authors to obtain additional information. 3
Additionally, in three cases, the updated analysis includes improved estimates, for
example, by obtaining information on intra-cluster correlation directly from the original
study authors rather than by imputing data from other studies.
With the full data set, the hypothesis of a common zero effect of multiple-dose
deworming on weight is rejected with p<0.001. With the TMSDG sample, this hypothesis
is rejected at the 10% level, but applying either a study classification approach used in the
previous 2012 Cochrane Review or a study classification approach used in earlier Cochrane
Reviews leads to rejection with p<0.01. 4 Any one of five other individual updates to the
TMSDG data leads to rejection of the null hypothesis of a common zero effect at the 5%
level.
Using our updated sample, and following TMSDG in including studies from low-
prevalence environments where the WHO does not recommend MDA, mass deworming is
estimated to increase child weight by 0.134 kg (95% CI: 0.031,0.236; p=0.01; random
effects estimation). This effect remains robust (at p<0.05) when any individual trial
estimate is dropped from the meta-analysis. The result is also nearly unchanged even when
simultaneously dropping any two of the 22 estimates: among the 231 possible
combinations of two studies that could be dropped simultaneously, in 96% of cases the
estimated effect remains statistically significant at p<0.05, and the largest p-value is just
0.067. In areas with prevalence above 20%, where the WHO recommends MDA, the
average estimated impact on child weight is 0.148 kg (CI: 0.039,0.258; p=0.008; random
3 In several cases we received data from the Campbell Collaboration deworming team, who had themselves
directly contacted the original authors. See Appendix A for details on the source of all studies included in
our sample.
4 See section 5.1 for more detailed discussion of these study classification issues.
effects). Where the WHO recommends multiple annual MDA (areas with prevalence above
50%), the average estimated weight gain is 0.182 kg (CI: 0.070,0.293; p=0.001; random
effects).
The implied average effect of MDA deworming on infected children in the full
sample (calculated by dividing estimated impact by worm prevalence for each study and
applying a random effects model) is 0.301 kg. 5 This average effect likely conceals
substantial heterogeneity. Light infections are often asymptomatic, and only between 2 and
16 percent of the population experience moderate to severe intensity infections in the
studies in our sample that report this information,6 so implied effects in the subpopulation
of those with moderate to severe intensity infections are likely much larger. For more
general context on the implied average effect size of 0.301 kg, the difference in weight gain
for boys at the 25th versus at the 50th percentile of the weight-for-age distribution between
ages 2 and 3 is 0.2 kg (World Health Organization Multicentre Growth Reference Study
Group, 2006). 7
Moreover, this gain comes at modest cost compared to some other common
interventions. The implied weight gain per U.S. dollar of expenditure is 0.22 kg, assuming
5 The implied average effect on infected children in the subsamples of studies conducted in areas of 20%
and 50% prevalence are similar, at 0.249 and 0.276 respectively. The confidence intervals overlap with that
of the full sample.
6 In particular, Joseph et al. (2015) report 15% prevalence of any helminthic infection among preschool
aged children at baseline in Peru, with only 1.8% of children exhibiting moderate or heavy infections, while
Miguel and Kremer (2004) report hookworm, whipworm, and roundworm prevalences of 77%, 42%, and
55%, with 15%, 16%, and 10% prevalence of moderate to severe infection respectively. Moreover, Miguel
and Kremer (2004) use thresholds that are lower than the WHO thresholds used by Joseph et al. (2015) for
moderate to severe intensity infections, suggesting that the comparable prevalence of moderate-severe
infections in the Miguel and Kremer (2004) sample is lower than the figures presented here.
7 According to WHO growth charts, a boy at the 25th percentile of the weight for age distribution grows 1.9
kg between age 2 and age 3, while a boy at the 50th percentile grows 2.1 kg over that year. Note that the
difference in weight gain for boys at the 25th versus at the 50th percentile of the weight-for-age distribution
between ages 3 and 4 is also 0.2 kg (World Health Organization Multicentre Growth Reference Study
Group, 2006). In our full sample, the median duration of follow up at which weight gain is measured is 1
year.
two MDA treatments per year. For comparison, the weight gain per dollar of expenditure
estimated in RCTs by Galloway et al. (2009) for school feeding programs is less than 0.01
kg, suggesting that relative to school feeding, deworming is highly cost effective in
increasing weight in school-age children in low-income countries. This echoes a recent
epidemiological study that similarly finds deworming to be highly cost-effective (Lo et al.,
2016).
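The weight gain per dollar quoted above appears to follow from dividing the full-sample estimate by the annual treatment cost. A minimal sketch of that arithmetic, using only figures quoted in the text (the function and variable names are illustrative, not taken from the paper):

```python
# Weight gain per dollar, using figures quoted in the text.
COST_PER_TREATMENT_USD = 0.30  # cost per child per treatment (Givewell, 2016)
TREATMENTS_PER_YEAR = 2        # assumption stated in the text
EFFECT_KG = 0.134              # full-sample random effects estimate


def weight_gain_per_dollar(effect_kg, cost_per_treatment, treatments_per_year):
    """Return kilograms of weight gained per U.S. dollar spent in a year."""
    annual_cost = cost_per_treatment * treatments_per_year
    return effect_kg / annual_cost


kg_per_usd = weight_gain_per_dollar(
    EFFECT_KG, COST_PER_TREATMENT_USD, TREATMENTS_PER_YEAR
)
print(f"{kg_per_usd:.2f} kg per dollar")  # prints "0.22 kg per dollar"
```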
This paper is organized as follows. Section 2 provides background information on
soil-transmitted helminths, mass drug administration, and earlier literature, including
TMSDG, and assesses whether TMSDG is adequately powered to rule out cost-
effectiveness. Section 3 discusses the sample, including criteria for study inclusion, the
procedure used to identify studies, and the general principles guiding data extraction and
determination of which estimates to use in the meta-analysis. Appendix A details the
application of these principles to individual studies. Section 4 describes our hypothesis
testing and estimation strategy. Section 5 replicates the TMSDG analysis, tests the
hypothesis of a common zero effect of multiple dose MDA, and estimates the impact of
multiple dose MDA on child weight, both in environments where the WHO recommends
mass deworming and more broadly. Section 6 concludes with a discussion of implications,
methodology, and directions for future research.
2 Background
STH are spread via eggs deposited in the local environment when individuals defecate in
their surroundings or do not practice proper hygiene after defecating. Due to the
transmission mechanism, school-aged children are especially vulnerable to these worm
infections, and also play an important role in spreading them in the local community (Hotez
et al., 2006). The potential health consequences of worm infections are generally agreed to
depend on the number of worms in the body (i.e., infection intensity), rather than a simple
binary indicator of infection status. Infection intensity is highly skewed and is strongly
correlated with disease prevalence: in low prevalence populations relatively few people
have severe infections, while many more do in high prevalence populations (Anderson,
Truscott and Hollingsworth, 2014).
Because of these health impacts, there is widespread acceptance that those who are
known to be infected with intestinal helminths should be treated. Indeed, this is the standard
of medical care (Horton, 2000; Keiser and Utzinger, 2008; Perez del Villar et al., 2012),
and some consider not treating individuals known to be infected as unethical. New trials of
this type are therefore typically not conducted, but TMSDG identify five trials of single-
dose treatment on what the authors term “infected children,” 8 and one trial of multiple dose
treatment. The TMSDG meta-analysis found a near-term child weight gain effect of 0.75
kg across the single dose trials (95% CI: 0.24,1.26; p=0.0038). 9 These trials were largely
conducted in settings with considerable infection intensity: for the 3 (out of 5) single dose
studies that report infection intensities as measured by eggs per gram (epg), all three report
mean epg values which are equivalent to a moderate intensity infection for at least one type
of worm (Stephenson et al. 1989; Stephenson et al. 1993; Yap and Steinmann 2014). 10
8 As described in more detail below, this terminology is slightly different than “screened for infection,”
which was the classification used in the previous Cochrane Review (Taylor-Robinson et al., 2012).
9 The degree of confidence that TMSDG have in these findings is unclear. At one point in the text they note
that the case for treating infected children is “obvious” (p. 30). However, they elsewhere describe the meta-
analysis results as “low quality evidence” (Taylor-Robinson et al. 2015, p. 2).
10 These are our calculations, using the infection intensity thresholds from World Health Organization
(2002).
Because deworming treatment is inexpensive and safe but diagnosis is
comparatively expensive (necessitating lab analysis of a stool sample) and logistically
difficult in many contexts, the WHO recommends annual mass treatment in areas where
worm infection prevalence is above 20% and multiple treatments annually where prevalence is
greater than 50%. Screening for worm infections requires testing stool samples, which in
turn necessitates skilled staff, laboratory facilities, and re-contacting infected individuals
for treatment, which can be challenging in many contexts where worm infections are
endemic. Furthermore, the Kato-Katz test, the most commonly used method for testing
for worms in these regions, has an estimated sensitivity between 52% and 91% (Barda et
al., 2013; Assefa et al., 2014), suggesting that many infections would go undetected, and
thus presumably untreated, even with proper screening. TMSDG note that screening for
worm infections is not recommended by the WHO because the cost of screening is 4 to 10
times that of the treatment itself (Taylor-Robinson et al., 2015, p. 7). Taken together, this
suggests that a policy of screened treatment for worm infections would be costly,
logistically complicated, and imprecise.
Subsequent to the WHO recommendation for MDA, a social science literature emerged
measuring the longer-term educational and economic impact of mass deworming. Four
studies in three moderate to high prevalence settings -- in Kenya, Uganda, and the historical
southern United States -- all find substantial long-run impacts of deworming on educational
outcomes (Croke, 2014; Ozier, 2015; Bleakley, 2007; Baird et al., 2016). Two of these
studies also report economic outcomes and both find positive effects.
• Croke (2014) finds that Ugandan children exposed to a deworming program
originally studied in Alderman et al. (2006) have higher math test scores nearly a
decade later, with effect sizes of over 0.2 standard deviation units.
• Ozier (2015) finds that infant children who lived in Kenyan communities where
older school-age children were dewormed show large cognitive test score
improvements ten years later, presumably due to reduced infection through
beneficial spillover effects. The magnitude of the effect is 0.2 to 0.3 standard
deviation units, which is equivalent to between 0.5 and 0.8 years of schooling.
• Using a difference-in-difference estimation methodology rather than a
randomized design to study a deworming campaign in the U.S. South in the early
1900s, Bleakley (2007) finds that deworming led to increased school enrollment
and attendance for children, and improved literacy and boosted income by 17%
for adults who were treated as children.
• Finally, Baird et al. (2016) estimate that a decade after treatment, males who
participated in mass deworming in Kenya worked 17% more hours per week and
had higher living standards, missing approximately one fewer meal per week.
Females were approximately one-quarter more likely to have passed the primary-
school leaving exam and attended secondary school. The estimated value of
benefits, in terms of the net present value of future earnings net of increased
schooling costs, exceeds the cost by more than one hundred fold.
These results suggest that the expected benefits of deworming would greatly exceed its
costs even if one took a conservative approach, assuming a very low probability of effects
of the measured magnitude or assuming that true effects are considerably less than the
measured effects (Ahuja et al. 2015).
While TMSDG estimate positive impacts of deworming treatment on weight of
children known to be infected, they argue in contrast that there is “substantial evidence”
that mass deworming does not improve weight or other child outcomes. Note that this is a
considerably different -- and far more demanding -- statement than a claim that the null
hypothesis of no effect cannot be rejected (for instance, due to a lack of statistical power),
or that MDA has a positive impact in some settings but not others.
TMSDG has been controversial. Campbell et al. (2016) note substantial changes
over time in the way the meta-analysis in the Cochrane Review on deworming is presented
(see the evolution across Dickson et al. (2000), Dickson et al. (2007), Taylor-Robinson et
al. (2012), and Taylor-Robinson et al. (2015)). Campbell et al. (2016) also note that
although the text refers to a protocol (Taylor-Robinson et al., 2015, p.29), they were unable
to find a publicly available pre-specified protocol for the updated Review, leading to
confusion over the precise hypothesis being tested, uneasiness over how studies are
grouped for analysis, and concern about which studies are included versus excluded. A
number of authors (Campbell et al., 2016; De Silva et al., 2015; Montresor et al., 2015;
Michael et al., 2000) have expressed concern over lack of consideration given to the effects
of different STH species, treatments, and drug distribution strategies.
2.1 Is TMSDG adequately powered?
A simple calculation suggests that the TMSDG analysis is underpowered to rule out the
possibility of effects that would make mass drug administration cost effective relative to
school feeding. As noted, the estimated weight gain in kilograms per dollar spent from
school feeding is less than 0.01 (from RCT studies; Galloway et al. (2009)). Given the
$0.60 per year treatment cost of deworming in environments where two annual doses are
required, an effect of just 0.006 kg would make deworming cost effective relative to school
feeding (setting aside, for the time being, the issue of other outcomes).
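The break-even effect size in this paragraph is the benchmark weight gain per dollar multiplied by the annual cost of deworming. A short sketch of that calculation, using the figures quoted above (the 0.01 kg per dollar figure is the upper bound given in the text, and the names are illustrative):

```python
SCHOOL_FEEDING_KG_PER_USD = 0.01   # upper bound quoted in the text (Galloway et al., 2009)
ANNUAL_DEWORMING_COST_USD = 0.60   # two doses per year at $0.30 each


def breakeven_effect_kg(benchmark_kg_per_usd, annual_cost_usd):
    """Smallest weight gain at which the intervention matches the benchmark's kg per dollar."""
    return benchmark_kg_per_usd * annual_cost_usd


threshold_kg = breakeven_effect_kg(SCHOOL_FEEDING_KG_PER_USD, ANNUAL_DEWORMING_COST_USD)
print(f"{threshold_kg:.3f} kg")  # prints "0.006 kg"
```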
Examining the random effects meta-analysis estimator that is the focus in TMSDG
and taking as given the variance of the effect across studies that TMSDG estimate, and the
standard errors TMSDG report for the underlying studies, the implied Minimum Detectable
Effect (MDE) size at 95% confidence and 80% power is 0.28 kg. 11 As illustrated in Figure
3, a wide range of possible effect sizes lies above the magnitude at which deworming would
be cost effective relative to other common interventions for similar populations (~0.006
kg), yet below the magnitude which TMSDG would be adequately powered to detect
(~0.28 kg). For example, this range includes TMSDG's random effects estimate of 0.08
kg, for which there is only 13% statistical power to detect a significant effect at 95%
confidence.
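The power and MDE figures above can be approximated with a two-sided z-test power function in the style of Hedges and Pigott (2001). The sketch below is illustrative only: the text does not report the standard error of TMSDG's pooled estimate, so `ASSUMED_SE = 0.10` kg is an assumption, chosen because it is roughly what an 80% MDE of 0.28 kg implies. Function names are hypothetical.

```python
import math

Z_CRIT = 1.96  # two-sided critical value at the 0.05 level


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power(effect, se, z_crit=Z_CRIT):
    """Probability of rejecting a zero average effect when the true effect is `effect`."""
    lam = effect / se
    return 1.0 - normal_cdf(z_crit - lam) + normal_cdf(-z_crit - lam)


def minimum_detectable_effect(se, target_power=0.80, z_crit=Z_CRIT):
    """Find by bisection the positive effect size that yields `target_power`."""
    lo, hi = 0.0, 10.0 * se  # power is increasing in the effect on this interval
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if power(mid, se, z_crit) < target_power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ASSUMED_SE = 0.10  # assumption: not reported in the text, backed out from the MDE

mde = minimum_detectable_effect(ASSUMED_SE)
print(round(mde, 2))                      # 0.28
print(round(power(0.08, ASSUMED_SE), 2))  # 0.13
```

Under this assumed standard error, the sketch reproduces both the 0.28 kg MDE and the 13% power at an effect of 0.08 kg.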
TMSDG also appears to be underpowered to detect effects that might be reasonable to
expect from MDA given the estimated effects TMSDG report in studies of those known
to be infected. This is notable because the suggestion that there are positive effects from
treating individuals known to be infected, yet no impact from treating populations
11 These estimates were obtained using the method of Hedges and Pigott (2001). In particular, for a given
effect size $\theta$, we estimate power as: $\text{Power}(\theta) = 1 - \Phi\left(1.96 - \theta/SE\right) + \Phi\left(-1.96 - \theta/SE\right)$,
where $\Phi$ is the cumulative distribution function for a standard normal random variable, and
$SE$ is the standard error for the average effect size under the random effects model. Reported
power for a given effect size is the probability that the null hypothesis that the average effect size is zero is
rejected at the 0.05 level of significance. The reported MDE is an estimate of the effect size that would
deliver a test with 80% power.
containing infected individuals, creates an apparent paradox, since if there are effects of
deworming among those known to be infected then one would expect a smaller but still
positive average effect of MDA in endemic populations. A simple exercise suggests that
this hypothesis cannot be rejected using TMSDG's data. Under the assumption that MDA
does not affect weight in uninfected children, 12 the expected average effect of MDA on
weight is given by multiplying prevalence by the expected effect on infected children.
Thus we calculate each study’s expected MDA treatment effect by multiplying the study-
specific STH prevalence by TMSDG’s random effects estimate for the impact of
deworming on infected children, which is 0.75 kg. 13 We generate a standard error for this
expected MDA treatment effect by multiplying the study-specific prevalence by
TMSDG’s standard error of 0.26 kg. These hypothetical data represent MDA effects that
would be reasonable to expect in each setting, with standard errors that reflect our
uncertainty about the expected treatment effects. 14 We apply a random effects model to
these expected MDA effects, which yields a point estimate of 0.20 kg and a confidence
interval of (0.13 kg to 0.28 kg). We cannot reject the hypothesis that this expected average effect of
MDA equals TMSDG’s estimate of 0.08 kg (p=0.23). Thus, TMSDG’s
MDA results are consistent with their estimates of the effect of deworming on infected
12 This assumption may not hold if deworming creates epidemiological externalities and also affects
weight, but it will hold under the null hypothesis that worms have no impact on weight.
13 For the expected effect on infected children, we use TMSDG’s random effects estimate of 0.75 kg (95%
CI: 0.24, 1.26). Note that these calculations use estimated prevalence for two studies in TMSDG's sample
that do not report prevalence within their study – Alderman et al. (2006) and Awasthi et al. (2008). See
Section 4 for more details.
14 In particular, suppose that our beliefs about the average treatment effect on infected children are
informed by TMSDG’s results and are characterized by $\mathcal{N}(0.75, 0.26^2)$. Under the assumptions
of this exercise, in an MDA setting with prevalence $p$, our beliefs about the MDA treatment effect
would be characterized by $\mathcal{N}(0.75p, 0.26^2 p^2)$. The hypothetical data we consider, then, are
consistent with our beliefs about MDA treatment effects. Results we obtain using these hypothetical
data are results that we consider reasonable to expect given TMSDG’s results on infected children.
children. Moreover, the TMSDG analysis is underpowered to detect these expected MDA
effects. Nearly all expected MDA effects that lie in the 95% confidence interval are
smaller than the 80% MDE of 0.28 kg.
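The construction of the expected MDA effects can be illustrated with a short sketch. It scales TMSDG's estimate for infected children (0.75 kg, standard error 0.26 kg) by a study's prevalence, under the assumption stated above that deworming affects weight only among the infected. The function name and the prevalence values are hypothetical and are not taken from any study in the sample.

```python
# TMSDG's random effects estimate for infected children (from the text).
EFFECT_ON_INFECTED_KG = 0.75
SE_ON_INFECTED_KG = 0.26


def expected_mda_effect(prevalence):
    """Return (expected effect, standard error) in kg for an MDA trial,
    assuming deworming changes weight only among infected children."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return EFFECT_ON_INFECTED_KG * prevalence, SE_ON_INFECTED_KG * prevalence


# Hypothetical prevalence values, for illustration only.
for prevalence in (0.10, 0.30, 0.60):
    effect, se = expected_mda_effect(prevalence)
    print(f"prevalence={prevalence:.2f}  effect={effect:.3f} kg  SE={se:.3f} kg")
```

At 30% prevalence, for example, the expected effect is 0.225 kg, which is already below the 0.28 kg MDE.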
Furthermore, it is commonly believed that the health impact of worms is much
greater for those with moderate to severe infections than for those with light infections. Since
TMSDG’s estimate for effects on infected children is based on studies with moderate
intensity infections on average, while the majority of those infected in most MDA studies
are likely to have light intensity infections (see footnote 6 above), we may actually expect
the average MDA effect to be even smaller than the 0.20 kg estimated above. This
underscores the points that (1) TMSDG’s MDA and screen-and-treat results are
consistent with one another, and (2) TMSDG is underpowered to detect effect sizes that
we would reasonably expect given TMSDG’s estimated effects on infected children.
Finally, power is also an issue in many of the individual studies which make up the
meta-analysis. In low worm infection prevalence populations, very large samples are
needed to have sufficient statistical power to pick up effects of mass drug administration.
Since low-intensity infections are expected to have little impact, and since intensity rises
non-linearly with prevalence (Anderson and May, 1985), only a small fraction of the
population in low prevalence environments will have moderate to heavy infections, and the
necessary increase in sample size may be much larger if weight gain effects are
concentrated among those with moderate-severe intensity infections. 15
15 Anderson and May (1985) explain that the relationship between prevalence and intensity is well
described by: Mean Intensity = $k(1-\text{Prevalence})^{-1/k} - k$, where k is a parameter representing the degree of
aggregation or dispersion of the parasite in the population. Estimates of k lie in the range 0.11 to 0.81
(Anderson, Truscott and Hollingsworth, 2014). Taking the derivative with respect to prevalence implies
that intensity is expected to rise more than linearly in prevalence.
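A small numerical sketch may help illustrate the more-than-linear rise of intensity with prevalence. It assumes the relationship reads Mean Intensity = k(1 - Prevalence)^(-1/k) - k, and it uses k = 0.5 purely as an illustrative value inside the cited range; the function name is hypothetical.

```python
def mean_intensity(prevalence, k):
    """Mean worm burden implied by prevalence, for aggregation parameter k,
    using Mean Intensity = k * (1 - prevalence) ** (-1 / k) - k."""
    if not 0.0 <= prevalence < 1.0:
        raise ValueError("prevalence must lie in [0, 1)")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * (1.0 - prevalence) ** (-1.0 / k) - k


K = 0.5  # illustrative value within the 0.11 to 0.81 range cited in the text
for prevalence in (0.2, 0.4, 0.8):
    intensity = mean_intensity(prevalence, K)
    print(f"prevalence={prevalence:.1f}  mean intensity={intensity:.3f}")
```

With k = 0.5, doubling prevalence from 20% to 40% raises mean intensity from about 0.28 to about 0.89, more than a threefold increase.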
Together, these factors imply that it will be important to use the most efficient
estimators possible, for example, by including baseline values of weight when possible
rather than just endline values, in order to improve statistical power. Improving power will
also require efforts to employ all the available studies and data.
3 Sample and data extraction procedures
This section describes trial inclusion criteria, the search procedure for identifying studies,
and the procedures for extracting data from included trials.
3.1 Trial inclusion criteria
Our analysis includes randomized controlled trials of deworming MDA with multiple
doses that include child body weight as an outcome. Following what TMSDG term their
“main comparison” (Taylor-Robinson et al. 2015, p. 4), we consider only trials in which
multiple doses of deworming treatment were administered, and include treatment effect
estimates from the longest follow-up reported. 16 We focus on child weight gain because it
is an important nutritional outcome which could potentially be improved over relatively
short time horizons.17 Moreover, there is substantial evidence for this outcome available
in existing studies, which we expect to lead to relatively more adequate statistical
power. 18 Weight is also highlighted as one of the three primary outcomes examined in
16 The median length of study in both our sample and the TMSDG sample is 12 months.
17 The studies we consider generally take place in low-income settings where child obesity is not
considered a widespread issue.
18 For many of the other outcomes they examine, TMSDG include relatively few studies in their
meta-analysis. Only weight (10 studies), height (7), and hemoglobin (7) have more than three studies that are
aggregated in formal meta-analysis. As height deficiencies and stunting are generally conceived in the
nutrition literature as the result of cumulative undernutrition over extended periods, we considered that
height was unlikely to respond to deworming over the course of relatively short run trials (the median
length is 12 months). Hemoglobin is an important outcome but it is most closely associated with
TMSDG (p.11). 19 Only trials for which a proper intention-to-treat estimate can be
obtained are included. Therefore, we require that the study (or trial authors, through
personal communication) report outcomes for the population assigned to treatment and
comparison groups, independent of whether they received treatment or not.
When estimating the mean effect of MDA on weight, we report results both in the
set of trials that take place in settings where the WHO recommends deworming (i.e., those
where prevalence of hookworm, whipworm, or roundworm is over 20%, which is
the threshold for annual MDA, or 50%, which is the threshold for multiple annual MDA),
and in the full sample (for completeness and comparability to TMSDG).
3.2 Search procedure
We start with the sample of studies included in TMSDG, as a well-known and oft-cited
systematic review, for their analysis of the impact of multiple-dose deworming treatment
of “all children living in an endemic area” (i.e. mass drug administration, or MDA) at
longest follow-up on child weight. We supplement this sample with additional studies we
could identify that meet the trial inclusion criteria above. The Campbell Collaboration
deworming team generously shared information on additional trials which they identified
for use in their own forthcoming systematic review on the impacts of mass deworming. All
studies we include were identified by the Campbell Collaboration and any study that they
hookworm; hookworm infection is the third leading contributor to the global burden of anemia, whereas
neither of the other two STH species rank among the top causes of anemia (Kassebaum et al., 2013; Smith
and Brooker, 2010). Of the 7 trials in the TMSDG meta-analysis of hemoglobin (which produce 9
treatment estimates, since two are factorial), only one appears to have any significant hookworm
prevalence (Dossa and Ategbo, 2001); the other six either do not report hookworm prevalence or report
very low values, between 1% and 11%.
19 The other two are hemoglobin and cognition.
identified for their analysis of the impact of multiple-dose MDA on weight was considered
for inclusion in our sample. 20
3.3 Data extraction and choice of estimator
Given the importance of statistical power, we sought to use the most precise unbiased
estimator available. We also followed guidelines in the Cochrane Handbook for Systematic
Reviews of Interventions (Higgins and Green, 2011). We use the following principles for
extraction of data and selection of estimates from included trials in our analysis below:
i. If treatment effects are presented without standard errors, standard errors are
calculated using other presented data (e.g., t-statistics, p-values, or 95%
confidence intervals), where possible following the formulas provided in the
Cochrane Handbook (Higgins and Green 2011, section 7.7.3.3).
ii. If results are reported in figures rather than in the text or in a table, Web Plot
Digitizer software (Rohatgi, 2015) is used to extract numerical estimates from the
figures.
iii. If key information on treatment impacts is missing from a paper (and cannot be
derived from what is presented), original microdata (where available) is used to
obtain relevant estimates. We also obtain information from trial authors in several
cases, through either direct communication or thanks to the generosity of the
Campbell Collaboration research team.
iv. Where studies report multiple treatment impact estimates, we follow the standard
in TMSDG and the medical literature of favoring unadjusted estimates. If studies
20 The Campbell Collaboration report just became publicly available.
do not report unadjusted estimates and we are unable to obtain them directly, but
the studies do report treatment effect estimates adjusted with standard covariates
or baseline values (such as child age and sex), these estimates are included in the
analysis. (Note that since expected weight gain varies with age, including age as a
covariate should generally improve precision of the estimates, and that including
age or other pre-determined variables as a covariate should not induce bias).
v. When there was a choice between treatment effect estimates based on a
comparison of endline differences and treatment effect estimates based on a
comparison of changes from baseline to endline, the “changes” estimate is used,
since it typically is more precise. Using the single difference treatment effect
typically leads to a substantial loss of statistical precision: when outcomes are
highly autocorrelated over time (as is the case for body weight), failure to use
baseline values in measurements of treatment effects results in far less precise
estimates (Gerber and Green, 2012; McKenzie, 2012). Estimates that take into
account baseline information remain unbiased, while typically improving
precision, and thus are preferable under standard statistical criteria, such as under
the goal of minimizing mean squared error. Following the Cochrane Handbook,
when baseline and endline means and measures of variance are present but the
variance of the changes is missing in the original text, the standard error for
changes is calculated using a correlation coefficient for the value between
baseline and endline imputed from other studies (Higgins and Green 2011, section
16.1.3.2).
vi. In the event of apparent textual contradictions about key parameter values in a
trial (for example, language in the study text which reports significant effects
versus reported standard deviation values which imply non-significant results), we
first try to obtain the original microdata to perform the estimation ourselves.
Where this is not possible, we assess which statistics were the primary focus of
reporting in the text and contact the original authors for clarification.
vii. When possible, treatment effect estimates are extracted based on an Analysis of
Covariance (ANCOVA) model, rather than estimates based on difference-in-
difference estimators. The Cochrane Handbook states that since ANCOVA
estimates “give the most precise and least biased estimates of treatment effects
they should be included in the analysis when they are available” (Higgins and
Green 2011, section 9.4.5.2). Properly including baseline weight measures in the
analysis is also critical in contexts with baseline imbalance (Kerwin, 2015).
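Principles (i) and (v) involve standard back-calculations that can be sketched as follows. The sketch uses a normal approximation throughout (for small trials a t-distribution would be the more careful choice), and the function names are hypothetical. The first example input is TMSDG's interval for infected children cited earlier; the standard deviations and the correlation in the last example are illustrative values, not data from any included trial.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric confidence interval."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def se_from_p(effect, p_two_sided):
    """Standard error implied by an effect and its two-sided p-value."""
    z = STD_NORMAL.inv_cdf(1 - p_two_sided / 2)
    return abs(effect) / z


def sd_of_change(sd_baseline, sd_endline, corr):
    """SD of the baseline-to-endline change, given an imputed correlation."""
    return math.sqrt(
        sd_baseline**2 + sd_endline**2 - 2 * corr * sd_baseline * sd_endline
    )


# TMSDG's interval for infected children implies an SE of roughly 0.26 kg.
print(round(se_from_ci(0.24, 1.26), 3))
# Illustrative values: a highly autocorrelated outcome has a small change SD.
print(round(sd_of_change(2.0, 2.0, 0.9), 3))
```

The last line shows why the "changes" estimate is typically more precise: with a baseline-endline correlation of 0.9, the standard deviation of the change is well under half the endline standard deviation.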
The full sample includes 22 estimates from 20 studies, twice as many as TMSDG. In
particular, the full sample includes four studies not identified in TMSDG or unpublished
when their review was compiled (Gateff, Lemarinier and Labusquiere 1972; Ostwald et al.
1984; Joseph et al. 2015; Liu et al. 2016); data extracted from six studies discussed by
TMSDG but not included in their meta-analysis for the MDA child weight outcome
(Willett, Kilama and Kihamia 1979; Miguel and Kremer 2004; Ndibazza et al. 2012; Wiria
et al. 2013; Stephenson et al. 1993; Gupta and Urrutia 1982); and improved estimates from
three studies that are included in the TMSDG meta-analysis (Sur et al. 2005; Hall et al.
2006; Awasthi and Pande 2001). Note that we classify Stephenson et al. (1993) as an MDA
trial; the Cochrane authors classify it in this way in their 2012 Review but change the
classification system in the 2015 update; we retain the 2012 classification in this analysis. 21
See Appendix A for detailed information on each individual study in the full sample.
4 Hypothesis tests and estimation strategy
In light of TMSDG's conclusion that there is “substantial evidence” of no impact of
deworming MDA, we first report a test of the hypothesis that the true impact of multiple-
dose deworming on weight is zero in all settings. This involves testing the null hypothesis
that $\beta = 0$ in a standard fixed effects meta-analysis estimate:
$$\hat{\beta}_i = \beta + \varepsilon_i$$
where $\hat{\beta}_i$ is the estimated effect in study $i$; $\beta$ is the true deworming treatment effect; and $\varepsilon_i$
is a random variable, representing measurement error, assumed to be distributed
normally, with mean zero and a standard deviation equal to the standard error of the
estimated treatment effect in each study. Rejection of this null hypothesis implies that
deworming affects child weight in at least some circumstances.
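As an illustration, the test of a common zero effect can be sketched with the usual inverse-variance weighted estimator. This is a minimal sketch that assumes a normal approximation; the function name and the input values are hypothetical and are not data from the trials.

```python
import math
from statistics import NormalDist


def fixed_effect(estimates, std_errors):
    """Inverse-variance fixed effects estimate.

    Returns (pooled estimate, standard error, two-sided p-value for the
    null hypothesis that the common effect is zero)."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    z = pooled / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, se_pooled, p_value


# Two hypothetical studies with equal precision.
pooled, se_pooled, p_value = fixed_effect([0.1, 0.3], [0.1, 0.1])
print(f"pooled={pooled:.3f} kg  SE={se_pooled:.3f}  p={p_value:.4f}")
```

Neither hypothetical study needs to be precise on its own for the pooled test to reject, which is the sense in which pooling adds power.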
We then report the estimated average impact using a random effects model:
$$\hat{\beta}_i = \beta + u_i + \varepsilon_i$$
21 TMSDG state that “We changed the classification of Stephenson et al. (1989) and Stephenson et al.
(1993). Previously these trials were in the ‘all children in an endemic area’ category, whereas now they are
classified in the ‘children with infection.’ This decision was based on reviewing the trials with
parasitologists and examining the prevalence and intensity of the infection where clearly the whole
community was heavily infected” (Taylor-Robinson et al., 2015, p. 154). It is worth noting that although
TMSDG exclude Stephenson et al. (1993), they include Watkins, Cruz and Pollitt (1996); the highest
recorded worm baseline prevalence in Watkins, Cruz and Pollitt (1996) by STH species is 92% (for
ascaris); the highest prevalence in Stephenson et al. (1993) is also 92% (for whipworm). Thus this
reclassification does not appear to have been done systematically by worm prevalence. In our view,
assessing the merits of the WHO policy by including studies in environments with prevalence below WHO
thresholds while excluding MDA studies in areas with high prevalence may lead to risk of bias. Since our
goal is to examine the effect of MDA, we retain the 2012 classification. Stephenson et al. (1989) is a single
dose deworming trial so does not enter into the present meta-analysis but Stephenson et al. (1993) has one
multi-dose treatment arm, which we incorporate. See Appendix A.3 for more detail.
where $\beta$ is the underlying true average effect, $u_i$ is a normally distributed random variable
denoting the difference between the average effect and the effect in the particular context,
and $\varepsilon_i$ represents measurement error due to sampling variation, which is assumed to be
captured in the study-specific standard errors.
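A random effects estimate of this kind can be sketched as follows. The sketch assumes the DerSimonian-Laird method-of-moments estimator for the between-study variance, which the text does not name explicitly, so it should be read as one plausible implementation rather than the exact procedure used here. The function name and input values are hypothetical.

```python
import math


def random_effects(estimates, std_errors):
    """Random effects pooled estimate (DerSimonian-Laird assumed).

    Returns (pooled estimate, standard error, between-study variance)."""
    weights = [1 / se**2 for se in std_errors]
    total = sum(weights)
    fixed = sum(w * b for w, b in zip(weights, estimates)) / total
    # Cochran's Q measures dispersion of study estimates around the fixed estimate.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, estimates))
    c = total - sum(w**2 for w in weights) / total
    df = len(estimates) - 1
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Each study's variance is inflated by the between-study variance.
    re_weights = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(w * b for w, b in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = math.sqrt(1 / sum(re_weights))
    return pooled, se_pooled, tau2


# Two hypothetical studies whose estimates disagree.
pooled, se_pooled, tau2 = random_effects([0.0, 0.4], [0.1, 0.1])
print(f"pooled={pooled:.3f} kg  SE={se_pooled:.3f}  tau^2={tau2:.3f}")
```

When study estimates disagree by more than sampling error would suggest, the estimated between-study variance is positive and the pooled standard error is larger than under fixed effects.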
Finally, under the assumption that deworming has no effect on uninfected children,
one can calculate for each study an implied estimated effect on weight for infected children
as the estimated intention-to-treat effect divided by prevalence. If one takes the estimated
prevalence to be accurate and not subject to measurement error, then standard errors for
these estimated effects are straightforward to compute. One can then apply a random
effects model to estimate the average treatment effect on infected children. In a few cases,
a study does not report an exact value for prevalence, but we are able to identify whether
the study has below or above 50% prevalence. If prevalence is above 50%, we then
compute the average prevalence among all studies in the sample reporting greater than 50%
prevalence and assign that value to the study. We proceed similarly for studies with below
50% prevalence that do not report an exact value for prevalence. 22
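The two steps described in this paragraph can be sketched briefly. The function names and all numbers are hypothetical; the sketch treats prevalence as known without error, as in the text.

```python
def implied_infected_effect(itt_effect, itt_se, prevalence):
    """Scale an intention-to-treat estimate and its standard error by
    prevalence, assuming no effect of deworming on uninfected children."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must lie in (0, 1]")
    return itt_effect / prevalence, itt_se / prevalence


def impute_prevalence(reported_prevalences, above_half):
    """Average prevalence among reporting studies on the same side of the
    50% threshold as the study whose prevalence is missing."""
    group = [p for p in reported_prevalences if (p > 0.5) == above_half]
    if not group:
        raise ValueError("no reporting studies in this prevalence group")
    return sum(group) / len(group)


# Hypothetical study: ITT effect of 0.15 kg (SE 0.05) at 50% prevalence.
print(implied_infected_effect(0.15, 0.05, 0.5))
# Hypothetical reported prevalences; impute for a study known to exceed 50%.
print(impute_prevalence([0.2, 0.4, 0.6, 0.9], above_half=True))
```

Note that dividing each study's estimate by its own prevalence and then pooling will generally give a different answer from dividing the pooled estimate by average prevalence.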
5 Results
Subsection 5.1 first replicates the results of the TMSDG subsample. Subsection 5.2 tests
the hypothesis of a common zero effect. Finally, subsection 5.3 reports estimated effects
of MDA on weight.
5.1 Verifying replication of results in the TMSDG sample
22 See Appendix A.7 for more detail on how prevalence categories were assigned when STH prevalence
was not reported in the study text.
We call the sample of 11 treatment effect estimates from 10 studies used in the TMSDG
meta-analysis for the impact of MDA on child weight gain the “TMSDG sample.” See
Table 1 for a list of these studies. Figure 1 verifies that our estimation procedure yields
results similar to TMSDG when using this sample. In particular, this figure presents a forest
plot with TMSDG estimates. Inserting the effect sizes, standard errors, and sample sizes
reported for each of these studies in TMSDG's text and figures into the relevant formulas
provided in Borenstein, Hedges and Rothstein (2007), and computing the results in the R
statistical software package, replicates the TMSDG results.
Using the data presented in TMSDG, the hypothesis of a common zero effect of
deworming on child weight gain is rejected at the 10% level (p=0.089). Figure 1 shows the
fixed effect estimate used to test this hypothesis.
The null hypothesis of a common zero effect is more strongly rejected when
applying either of two study classification approaches used in previous Cochrane Reviews
(prior to TMSDG).
1. The first versions of the Cochrane Review (Dickson et al. 2000; Dickson et al.
2007) did not create separate categories for test-and-treat (i.e. treatment only of
children who have been diagnosed with STH) and MDA studies (i.e. treatment of
the whole population), but rather considered all the data together. At a minimum,
reporting results with the full sample before turning to the subgroup analysis seems
reasonable, since if deworming has a positive effect on infected individuals, and if
there is no effect on uninfected individuals, then deworming must have a positive
effect on weight in a population that includes infected individuals. In the case of
the TMSDG dataset, using all of the multiple dose at longest follow-up studies
would only involve the addition of the one multiple dose study the authors identify
but classify as separate from the MDA studies – Stephenson et al. (1993). 23
2. The 2012 Cochrane Review changed this approach, however, introducing a
distinction that effectively distinguished between test-and-treat and MDA studies
(Taylor-Robinson et al. 2012). Applying this classification used in the 2012
Cochrane Review to the studies in TMSDG also leads to the inclusion of
Stephenson et al. (1993), which was classified as an MDA study by Taylor-
Robinson et al. 2012 (or, as a “target population treated” study, using the language
in that review) rather than a test-and-treat study.
Thus, applying either of these procedures from Cochrane Reviews prior to TMSDG results
in the addition of Stephenson et al. (1993) to the TMSDG sample. With Stephenson et al.
(1993) classified in this way but otherwise using the data set in the TMSDG study, the null
hypothesis of a common zero effect is strongly rejected with a p-value of 0.009 (see Table
2, row 2).
5.2 Testing the hypothesis of a common zero effect
We now turn to the full sample of 22 treatment effect estimates. Table 3 shows that the null
hypothesis of a common zero effect is rejected at p<0.001 in the full sample, in areas with
prevalence above the thresholds at which WHO recommends MDA (20% prevalence), and
in areas where it recommends multiple annual dose MDA (50% prevalence). This result is
not reliant on the addition of any one study to the TMSDG sample: any one of six updates
leads to rejection of the hypothesis of a common zero effect with p<0.05 (see Table 2).
23 Note that we believe that Stephenson et al. (1993) should be included in the set of MDA studies, since
individuals were treated without first being screened, as discussed in Appendix A.3.
Figure 2 shows point estimates and confidence intervals for the effect of mass
deworming drug administration on weight from each of the studies included in the full
sample.
The rejection of a common fixed effect of zero implies that MDA deworming
affects child weight in at least some circumstances. If effects are positive in some
circumstances, then unless they are negative in other circumstances, average effects must
be positive. Yet, there is no scientific reason to believe that deworming has negative side
effects on weight. With only one negative estimate significant at the 5% level out of 22
estimates in the full sample (Figure 2), the patterns in the data seem consistent with the
hypothesis that the true effect of MDA on weight is never negative. In future work, we
hope to more formally examine if this hypothesis can be rejected. However, for the sake of
comparability, we follow TMSDG in the next section by imposing that the distribution of
true effects is normal around some unknown mean.
5.3 Estimating the impact of deworming
Table 3 reports random effects estimates. In the full sample, the estimated weight gain
effect is 0.134 kg [95% CI: 0.031, 0.236; p=0.01]. Of course, the full sample includes trials
conducted in low infection prevalence areas where the WHO does not currently
recommend mass deworming. 24 In areas with greater than 20% prevalence, where the
24 TMSDG examine outcomes by subgrouping based on infection prevalence, but they split the data into
three subgroups, one containing only two studies, thus limiting statistical power for each test. In the weight
gain multiple dose comparison, TMSDG analyze 5 studies from low prevalence settings (defined as less than
50% infection), 4 from medium prevalence settings (50% to 70% prevalence), and 2 from high prevalence
settings (over 70% prevalence). They take this tripartite division from an earlier WHO framework (World
Health Organization, 2002). Creating groupings with only a few studies makes the resulting estimates far less
precise. The Cochrane Handbook notes that “when there are few studies or if the studies are small, a random-
effects analysis will provide poor estimates of the width of the distribution of intervention effects” (Higgins
and Green, 2011, section 9.5.4). Note that this tripartite division was not pre-specified in the original
WHO recommends MDA deworming, the estimated treatment effect is 0.148 kg [95% CI:
0.039,0.258; p=0.008]. In areas with more than 50% prevalence, where the WHO
recommends multiple doses annually, the estimated effect is 0.182 kg [CI: 0.070,0.293;
p=0.001].
Our full sample estimate has more statistical power than TMSDG. Using the same
approach as above, we find an MDE of 0.15 kg in the full sample, roughly half the MDE
of 0.28 kg using the TMSDG sample. Using the random effects estimate of 0.13 kg, we
find that the full sample analysis has 72% statistical power to detect a significant effect at
95% confidence. This compares to 18% statistical power to detect an effect of 0.08 kg, or
31% statistical power to detect an effect of 0.13 kg using the TMSDG sample (see Figure
3). 25
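The MDE and power figures follow the usual normal-approximation formulas, sketched here. The standard error is back-calculated from the rounded full sample confidence interval reported above, which is an assumption of this sketch, so the resulting numbers are approximate and need not match the reported figures exactly. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def minimum_detectable_effect(se, alpha=0.05, target_power=0.80):
    """Smallest true effect detectable with the target power in a
    two-sided test of size alpha."""
    return (STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(target_power)) * se


def power_to_detect(effect, se, alpha=0.05):
    """Probability of rejecting a zero effect when the true effect is `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / se
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


# SE implied by the reported full sample interval (0.031, 0.236); an assumption.
SE_FULL = (0.236 - 0.031) / (2 * 1.96)
print(f"MDE at 80% power: {minimum_detectable_effect(SE_FULL):.2f} kg")
print(f"Power to detect 0.13 kg: {power_to_detect(0.13, SE_FULL):.2f}")
```

Under these assumptions the sketch gives an MDE of about 0.15 kg for the full sample, in line with the figure reported above.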
This full sample effect remains robust (at p<0.05) when any individual trial estimate
is dropped from the meta-analysis, as shown in Table 4. The level of significance with
which the null hypothesis that the mean effect is zero can be rejected remains high even
when simultaneously dropping any two of the 22 estimates: among the 231 possible
combinations of two studies that could be dropped simultaneously, in 96% of cases the
estimated effect remains statistically significant at p<0.05, and the largest p value is just
0.067 (not shown). The results are similarly robust to dropping any one or two studies in
Cochrane pre-analysis plan (Dickson, Awasthi, and Demmellweek, 1997). Examining all studies conducted
in environments in which the WHO recommends MDA while excluding studies where the WHO does not
recommend MDA would enhance policy relevance.
25 Using an effect of 0.13 kg and the value of $\hat{\tau}^2$ estimated in the full sample (0.038 in the full sample as
opposed to 0.074 in the TMSDG sample), we find that the TMSDG analysis has 44% statistical power to
detect a significant effect at 95% confidence.
the subsample of studies with prevalence greater than 20% and those with prevalence
greater than 50%. 26
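The drop-one and drop-two robustness checks amount to re-estimating the model on every subset that omits one or two estimates. The sketch below shows the enumeration. For brevity it uses a simple inverse-variance test as a stand-in for the random effects model, and the 22 (estimate, standard error) pairs are hypothetical placeholders rather than the data in the full sample.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import NormalDist


def pooled_p_value(studies):
    """Two-sided p-value for a zero pooled effect, inverse-variance weights.
    A simple stand-in for the random effects model used in the paper."""
    weights = [1 / se**2 for _, se in studies]
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / sum(weights)
    z = pooled * sqrt(sum(weights))  # pooled estimate divided by its SE
    return 2 * (1 - NormalDist().cdf(abs(z)))


def leave_k_out(studies, k, p_value=pooled_p_value):
    """Map each set of k dropped study indices to the p-value of the rest."""
    results = {}
    for dropped in combinations(range(len(studies)), k):
        kept = [s for i, s in enumerate(studies) if i not in dropped]
        results[dropped] = p_value(kept)
    return results


# 22 hypothetical (estimate, standard error) pairs, not the paper's data.
studies = [(0.05 + 0.01 * i, 0.10) for i in range(22)]
pairs = leave_k_out(studies, 2)
share_significant = sum(p < 0.05 for p in pairs.values()) / len(pairs)
print(len(pairs), comb(22, 2), f"{share_significant:.0%}")
```

With 22 estimates there are 231 ways to drop two at once, matching the count given above.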
The full sample estimate of 0.134 kg comes from studies with average worm
prevalence of 51%. Assuming treatment only affects weight in the worm-infected
population, this implies an average effect of roughly 0.301 kg among those with worms
(calculated by dividing estimated impact by worm prevalence for each study and applying
a random effects model). 27 This in turn likely represents the average of a considerably
larger effect among the small proportion of those infected who have moderate or severe
intensity infections and a considerably smaller effect among the majority of those infected,
who have light-intensity infections.
While the average weight gain is fairly modest, it is far from negligible relative to
the very low cost of deworming. Given the estimated effect of 0.134 kg and the estimated
cost of $0.60 per person treated for two treatments per year (in the multiple dose context
we focus on in this meta-analysis), the estimated cost per kg gained is ($0.60)/(0.134kg) =
$4.48 per kg. For reference, it is worth comparing this to nutritional programs aimed at
similar populations, in particular, school feeding programs. Kristjansson et al. (2007)
conduct a Cochrane Review of the impact of school feeding, and Galloway et al. (2009)
combine these results with data on costs to estimate the cost effectiveness of school
feeding. They estimate that over a one year period, “the cost of an extra kilogram of weight
ranged from $112 to $252 in the RCTs and $38 to $86 in the [controlled before-and-after
26 Results are also robust to maintaining the treatment effects and standard errors reported in TMSDG
(rather than using updated treatment effects or standard errors for Awasthi 2001, Hall 2006, and Sur 2005)
and only altering the TMSDG sample by incorporating previously-excluded studies.
27 Note that this calculation includes estimated prevalence for three studies in the full sample which do not
report prevalence within their study – Gateff, Lemarinier and Labusquiere (1972), Alderman et al. (2006),
Awasthi et al. (2008). See Table 1 and Section 4 for more details.
studies]” (p.177-8). This suggests that just focusing on the weight outcome, deworming is
highly cost effective relative to another widely implemented intervention. As noted by
Galloway et al. (2009), school feeding is implemented in over 72 countries by the World
Food Programme alone. To the extent that school feeding programs aim to produce child
weight gain, deworming is likely to be a highly cost effective option for policymakers who
already support school feeding. 28
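The cost comparison is simple arithmetic and can be reproduced as follows. All inputs are the figures quoted in the text; the variable and function names are hypothetical.

```python
def cost_per_kg(cost_per_child, kg_gained):
    """Cost of one additional kilogram of weight gain."""
    return cost_per_child / kg_gained


# Deworming: $0.60 per child per year, 0.134 kg average gain (from the text).
deworming_cost_per_kg = cost_per_kg(0.60, 0.134)

# School feeding: lowest and highest cost per kg quoted from Galloway et al. (2009).
school_feeding_low, school_feeding_high = 38.0, 252.0
low_ratio = school_feeding_low / deworming_cost_per_kg
high_ratio = school_feeding_high / deworming_cost_per_kg

print(f"Deworming: ${deworming_cost_per_kg:.2f} per kg")
print(f"School feeding costs {low_ratio:.0f}x to {high_ratio:.0f}x as much per kg")
```

Even the lowest quoted school feeding figure is more than eight times the deworming cost per kilogram, subject to the caveats on comparability noted in footnote 28.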
Of course, a complete cost-effectiveness analysis of mass deworming treatment
would also need to consider effects on later educational and labor market outcomes, in
addition to child weight gains, and as noted above these are often substantial (Ahuja et al.,
2015; Baird et al., 2016; Bleakley, 2007; Croke, 2014; Ozier, 2015).
6 Conclusion
We began with the question of whether the expected benefits of mass drug administration
according to WHO guidelines for deworming exceed the cost.
To summarize, the null hypothesis of a common effect of zero weight gain from
multiple-dose mass drug administration is rejected at the 10% level using the TMSDG data.
Employing either of two classification systems used in previous Cochrane Reviews would
lead to rejection at p<0.01. Any one of five other updates to the data leads to rejection of
the null hypothesis at the 5% level. Combining all updates leads to rejection with p<0.001.
28 Note that this comparison is imperfect for a number of reasons, including that the cost figures for
deworming are for India whereas those for school feeding are based in three African countries. Moreover,
both school feeding and deworming may affect other outcomes. However, it is unlikely that appropriate
adjustments for these factors would overturn the conclusion that deworming is highly cost effective in
increasing weight relative to school feeding programs, given that the cost per kg of weight gain is an order
of magnitude higher for school feeding. Moreover, while school feeding may also promote school
participation, deworming, too, has been found to be highly cost effective in increasing school participation
(Miguel and Kremer, 2004).
Reasonable people may disagree about statistical methods for analyzing data.
However, at a minimum, it seems clear that implementing MDA generates child weight
gains in some circumstances. Since the null hypothesis of a common zero effect is rejected,
MDA must have positive impacts in at least some environments. This implies that if one
accepts the standard view that anthelmintic drugs have no substantial negative side effects,
the expected effect of deworming on child weight is positive.
Applying standard approaches from the Cochrane Handbook to a larger set of
studies yields an estimated average effect of deworming on weight of 0.134 kg [95% CI:
0.031,0.236], corresponding to an estimated average effect of 0.301 kg [CI: 0.071, 0.530]
among those with worm infections. While this effect is modest, it is substantial relative to
the cost of deworming, and suggests that deworming is many times as cost effective as
widely implemented school-feeding programs at enabling weight gain among school age
children in low-income populations. Moreover, the finding that deworming improves
nutrition in at least some environments implies that the literature on the long-run
educational and economic impacts of deworming cannot be dismissed on a priori grounds,
and this literature suggests that the expected benefits of MDA greatly exceed the cost.
Our results also suggest that the data from studies of mass deworming are consistent
with the data from studies of deworming of those known to be infected, given the much
lower prevalence and intensity of infection in MDA studies than in test-and-treat studies,
and the substantial confidence intervals around both the MDA and test-and-treat estimates.
A key lesson of this paper is that large samples are needed to have adequate
statistical power to pick up a minimum detectable effect (MDE) that corresponds to what
would be needed to form a policy judgment that the expected benefit of deworming is less
than the cost. 29 This implies that analyses which divide up the set of available studies will
likely be underpowered, limiting the scope for further subsample tests. While the results
here suggest that overall, multiple-dose MDA increases child weight, they also suggest that
if MDA had similar impacts on weight across drug type or worm species, a meta-analysis
focusing on any one species of worm or drug may well be underpowered. We therefore
think it would be appropriate for any future studies designed to explore heterogeneity,
across worm species or drug type, for example, to report a test of the hypothesis that the
average effect of MDA for each worm species or each drug is the same, rather than to
simply test the hypothesis that the effect of any one individual drug or MDA against any
one species has zero impact on weight. Beyond its relevance for health research, greater
awareness of the limitations of under-powered meta-analyses will become increasingly
important as more social scientists conduct meta-analyses (Vivalt, 2015).
A further methodological lesson is that it is appropriate to explicitly test the null
hypothesis of a common zero effect. Finally, in evaluating policy, it is appropriate to focus
on all mass-drug administration studies conducted in environments where mass deworming
is recommended under WHO guidelines rather than to mix in studies from environments
in which worm prevalence is sufficiently low that the WHO does not recommend mass
treatment, or to selectively exclude studies of mass drug administration conducted in high-
prevalence environments.
29 The estimated effect in the full sample is about 50% greater than the estimate in the TMSDG sample,
though well within the confidence interval of (-0.11 kg, 0.27 kg) found by TMSDG in their meta-analysis
of the impacts of multiple-dose MDA on weight. However, the confidence interval shrinks by 47% to (0.03
kg, 0.24 kg) with the full data set. Thus, incorporation of additional information into the analysis helps
address the problem of insufficient power in TMSDG.
While we have argued that deworming MDA is cost effective based on its impact
on child weight alone, there is evidence that deworming also leads to gains in other
outcomes (Ahuja et al. 2015). TMSDG aggregate data from a more limited number of
studies for outcomes other than weight, and some of these are from low-prevalence
environments where the WHO does not recommend deworming, further reducing
statistical power. This means that for each outcome assessed individually, it is typically
impossible to reject either the hypothesis of no average effect, or the hypothesis of effects
large enough to make deworming cost effective. In such a setting, it may be appropriate to
consider the joint hypothesis that there is no impact on any of the child outcomes
considered. We hope to do this in future work.
We have begun to explore heterogeneity of impact with covariates suggested by the
scientific literature, including prevalence and intensity, age, and whether the study design
captures epidemiological spillovers. One could also examine heterogeneity by helminth
type, drugs used, and co-morbidities. However, given the limited number of studies and
hence limited degrees of freedom, scope for examining heterogeneity while maintaining
statistical power is limited. In future work, we hope to systematically examine
heterogeneity, but here we simply capture heterogeneity using random effects estimation.
We follow TMSDG in assuming a normal distribution of the effect of MDA across
study environments, but future work could relax the assumptions of symmetry and
normality on the distribution of effects across studies and estimate these from the data.
There is no reason, a priori, to expect the distribution of deworming effects to be
symmetric or normal. In fact, the underlying science naturally suggests a non-symmetric
distribution, with positive effects in some cases and negligible effects in cases with low
prevalence and intensity of infections.
We hope to examine a Bayesian, rather than frequentist, approach to meta-analysis
for policy analysis in future work. Requiring 95% confidence before undertaking MDA,
without regard to the statistical power of the test, implies a loss function in
which there is a high cost of a false positive and a low cost of a false negative. That might
be appropriate if, for example, the Food and Drug Administration were considering a drug
that might have side effects. However, in the deworming context, the drugs have already
been through regulatory approval, the monetary cost of deworming is low, and there is no
evidence of serious side effects, while there is at least some potential that deworming has
large long-run benefits (Ahuja et al. 2015). Thus, the cost of a false positive is low while
the cost of a false negative is potentially quite substantial.
A Bayesian approach to estimating whether the expected benefit of MDA according
to WHO guidelines exceeds the cost would start with a prior on the effect of deworming.
Studies of the impact of treatment on those known to be infected provide a natural prior. The
analysis would then factor in the range of benefits that have been estimated from deworming,
each with an attached probability weight, in order to assess whether summing the potential
benefits, each multiplied by its probability, yields an expected benefit greater than the
estimated cost. Since the net present value of the long-run educational and economic benefits
has been estimated at more than one hundred times the cost (Baird et al. 2016), assigning
even a small subjective probability to these benefits would likely lead to the conclusion that
the expected benefits of MDA exceed its cost.
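The probability-weighted comparison described above can be illustrated with a minimal sketch. It assumes the roughly $0.30 per-treatment cost cited in this paper and treats "one hundred times the cost" as a lower bound on the long-run benefit; the function names and the two-scenario prior are hypothetical and chosen only for illustration.

```python
def expected_benefit(scenarios):
    """Probability-weighted sum of benefits.

    scenarios: list of (probability, benefit) pairs whose probabilities sum to 1.
    """
    total_prob = sum(p for p, _ in scenarios)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * b for p, b in scenarios)


def break_even_probability(cost, benefit_if_real):
    """Probability of the benefit being real at which expected benefit equals cost."""
    return cost / benefit_if_real


COST = 0.30                    # approximate cost per treatment (USD), as cited in the paper
LONG_RUN_BENEFIT = 100 * COST  # lower bound implied by "more than one hundred times the cost"

# Probability at which the long-run benefit alone would just cover the cost.
p_star = break_even_probability(COST, LONG_RUN_BENEFIT)

# Hypothetical prior (an assumption, not an estimate from the paper):
# a 10% chance that the long-run benefits are real, and a 90% chance of zero benefit.
illustrative = expected_benefit([(0.10, LONG_RUN_BENEFIT), (0.90, 0.0)])

print(f"break-even probability: {p_star:.3f}")
print(f"expected benefit under the hypothetical prior: {illustrative:.2f} vs cost {COST:.2f}")
```

Under these assumptions the break-even subjective probability is about 1 percent, which is the sense in which even a small probability of long-run benefits would suffice.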
References
Ahuja, Amrita, Sarah Baird, Joan Hamory Hicks, Michael Kremer, Edward Miguel and
Shawn Powers. 2015. “When Should Governments Subsidize Health? The Case of
Mass Deworming.” The World Bank Economic Review 29(suppl 1):S9–S24.
URL: http://wber.oxfordjournals.org/content/29/suppl 1/S9.abstract
Alderman, Harold, Joseph Konde-Lule, Isaac Sebuliba, Donald Bundy and Andrew
Hall. 2006. “Effect on weight gain of routinely giving albendazole to preschool
children during child health days in Uganda: a cluster randomised controlled trial.”
BMJ: British Medical Journal 333(7559):122–124.
URL: http://www.jstor.org/stable/40699371
Anderson, R, J Truscott and TD Hollingsworth. 2014. “The coverage and frequency of
mass drug administration required to eliminate persistent transmission of soil-
transmitted helminths.” Phil. Trans. R. Soc. B 369:20130435
Anderson, RM and RM May. 1985. “Helminth infections of humans: mathematical
models, population dynamics, and control.” Adv Parasitol 24:1–101.
Assefa, L. M., T. Crellen, S. Kepha, J. H. Kihara, S. M. Njenga, R. L. Pullan and S. J.
Brooker. 2014. “Diagnostic Accuracy and Cost-Effectiveness of Alternative Methods
for Detection of Soil-Transmitted Helminths in a Post-Treatment Setting in Western
Kenya.” PLoS Neglected Tropical Diseases 8(5):e2843.
Awasthi, S, R Peto, VK Pande, R Fletcher, S Read and DA Bundy. 2008. “Effects of
deworming on malnourished preschool children in India: an open-labelled, cluster-
randomized trial.” PLoS Neglected Tropical Diseases 2(4):e223.
Awasthi, Shally, Richard Peto, S Read, SM Richards, V Pande, Donald A Bundy and
DEVTA team. 2013. “Population de-worming with 6-monthly albendazole: DEVTA,
a cluster-randomised trial among 1 million preschool children in North India.” The
Lancet 381(9876):1478–1486.
Awasthi, Shally and Vinod Kumar Pande. 2001. “Six-monthly de-worming in infants to
study effects on growth.” Indian Journal of Pediatrics 68(9):823–7.
Awasthi, Shally, Vinod Kumar Pande and Robert H. Fletcher. 2000. “Periodic
deworming with albendazole and its impact on growth status and diarrhoeal incidence
among children in an urban slum of India.” Indian Pediatrics 37:19–29.
Baird, Sarah, Joan Hamory Hicks, Michael Kremer and Edward Miguel. 2016. “Worms at Work:
Long-Run Impacts of a Child Health Investment.” The Quarterly Journal of
Economics 131(4): 1637-1680. doi: 10.1093/qje/qjw022
Barda, B., H. Zepherine, L. Rinaldi, G. Cringoli, R. Burioni, M. Clementi and M.
Albonico. 2013. “Mini-FLOTAC and Kato-Katz: Helminth Eggs Watching on the
Shore of Lake Victoria.” Parasites & Vectors 6(220).
Bleakley, Hoyt. 2007. “Disease and Development: Evidence from Hookworm
Eradication in the American South.” Quarterly Journal of Economics 122(1):73–117.
Borenstein, Michael, Larry Hedges and Hannah Rothstein. 2007. “Introduction to Meta-
Analysis.”.
URL:https://www.metaanalysis.com/downloads/Meta%20Analysis%20Fixed%20vs
%20Random%20effects.pdf
Bundy, Donald AP, Judd L Walson and Kristie L. Watkins. 2013. “Worms, wisdom, and
wealth: why deworming can make economic sense.” Trends in Parasitology
29(3):142–148.
Campbell, SJ, SV Nery, SA Doi, DJ Gray, RJ Soares Magalhães, JS McCarthy et al.
2016. “Complexities and Perplexities: A Critical Appraisal of the Evidence for Soil-
Transmitted Helminth Infection-Related Morbidity.” PLoS Negl Trop Dis 10(5).
Croke, Kevin. 2014. “The long run effects of early childhood deworming on literacy and
numeracy: Evidence from Uganda.” Working Paper.
URL: http://scholar.harvard.edu/files/kcroke/files/ug lr deworming 071714.pdf
De Silva, Nilanthi, Be-Nazir Ahmed, Martin Casapia, H. J. de Silva, John Gyapong,
Mwelecele Malecela and A. Pathmeswaran. 2015. “Cochrane Reviews on
Deworming and the Right to a Healthy, Worm-Free Life.” PLoS Negl Trop Dis 9.
Dickson, R, S Awasthi, C Demellweek and P Williamson. 2000. “Anthelmintic drugs for
treating worms in children: effects on growth and cognitive performance (Review).”
The Cochrane Library (2).
Dickson, R, S Awasthi, C Demellweek and P Williamson. 2007. “Anthelmintic drugs for
treating worms in children: effects on growth and cognitive performance (Review).”
The Cochrane Library (2).
Dickson, R, S Awasthi and C Demmellweek. 1997. “Routine Intermittent Anthelminth
Therapy in Disadvantaged Populations [Protocol].”.
Donnen, P, D Brasseur, M Dramaix, F Vertongen et al. 1998. “Vitamin A
Supplementation but not deworming improves growth of malnourished preschool
children in Eastern Zaire.” Journal of Nutrition 128(8):1320–7.
Dossa, R. A. M. and E. A. D. Ategbo. 2001. “Impact of iron supplementation and
deworming on growth performance in preschool Beninese children.” European
Journal of Clinical Nutrition 55(4):223.
Frison, Lars and Stuart J. Pocock. 1992. “Repeated measures in clinical trials: Analysis
using mean summary statistics and its implications for design.” Statistics in Medicine
11(13):1685–1704.
URL: http://dx.doi.org/10.1002/sim.4780111304
Galloway, R., E. Kristjansson, A. Gelli, U. Meir, F. Espejo and D. Bundy. 2009. “School
feeding: outcomes and costs.” Food and Nutrition Bulletin 30(2):171–182.
Garner, Paul, David Taylor-Robinson and Harshpal Singh Sachdev. 2015. “Commentary:
Replication of influential trial helps international policy.” International Journal of
Epidemiology 44(5):1599–1601.
Gateff, C, G Lemarinier and R Labusquiere. 1972. “Chimiotherapie antihelminthique
systematique au thiabendazole en milieu scolaire africain.” Annales de la Societe
Belge de Medicine Tropicale 52(2):103–112.
Gerber, Alan S. and Donald P. Green. 2012. Field Experiments: Design, Analysis, and
Interpretation. W.W Norton and Co.
Givewell. 2011.
URL: http://blog.givewell.org/2011/09/29/errors-in-dcp2-cost-effectiveness-estimate-
fordeworming/#actualcosteffectivenes
Givewell. 2013.
URL: http://www.givewell.org/charities/top-charities
Givewell. 2014.
URL: http://www.givewell.org/international/top-charities/deworm-world-initiative
Givewell. 2016. “Deworm the World Initiative, led by Evidence Action.”.
URL: http://www.givewell.org/international/top-charities/deworm-worldinitiative#
Whatisthecostpertreatment
Global Atlas of Helminth Infections. 2016. “Soil-transmitted helminths.”.
URL: http://www.thiswormyworld.org
Goto, Rie, C.G. Nicholas Mascie-Taylor and Peter G. Lunn. 2009. “Impact of anti-
Giardia and anthelminthic treatment on infant growth and intestinal permeability in
rural Bangladesh: a randomized double-blind controlled study.” Transactions of The
Royal Society of Tropical Medicine and Hygiene 103(5):520–529.
Gupta, M C and J J Urrutia. 1982. “Effect of periodic antiascaris and antigiardia
treatment on nutritional status of preschool children.” The American Journal of
Clinical Nutrition 36(1):79–86.
URL: http://ajcn.nutrition.org/content/36/1/79.abstract
Hall, Andrew, L Nguyen Bao Khanh, Don Bundy, N Quan Dung, T Son Hong and R
Lansdown. 2006. “A randomized trial of six monthly deworming on the growth and
educational achievements of Vietnamese school children.” Unpublished manuscript.
Hall, Andrew and Sue Horton. 2008. Best Practice Paper: Deworming. Copenhagen
Consensus Center, Denmark.
Hedges, L. V. and T. D. Pigott. 2001. “The power of statistical tests in meta-analysis.”
Psychological Methods 6(3):203–217.
Higgins, Julian PT and Sally Green. 2011. Cochrane Handbook for Systematic Reviews
of Interventions. Cochrane.
Horton, J. 2000. “Albendazole: A Review of Anthelmintic Efficacy and Safety in
Humans.” Parasitology 121(Suppl):S113–32.
Hotez, Peter J, Donald A. P. Bundy, Kathleen Beegle, Simon Brooker, Lesley Drake,
Nilanthi de Silva and Lorenzo Savioli. 2006. Helminth Infections: Soil-transmitted
Helminth Infections and Schistosomiasis, from Disease Control Priorities in
Developing Countries (2nd edition). World Bank.
J-PAL Policy Bulletin. 2012. “Deworming: A Best Buy for Development.”.
URL: https://www.povertyactionlab.org/sites/default/files/publications/2012.3.22-
Deworming.pdf
Joseph, Serene A., Martin Casapia, Antonio Montresor, Elham Rahme, Brian J. Ward,
Grace S. Marquis, Lidsky Pezo, Brittany Blouin, Mathieu Maheu-Giroux and
Theresa W. Gyorkos. 2015. “The Effect of Deworming on Growth in One-Year-Old
Children Living in a Soil-Transmitted Helminth-Endemic Area of Peru: A
Randomized Controlled Trial.” PLoS Negl Trop Dis 9.
Kassebaum, Nicholas J., Rashmi Jasrasaria, Mohsen Naghavi, Sarah K. Wulf, Nicole
Johns, Rafael Lozano, Mathilda Regan, David Weatherall, David P. Chou, Thomas P.
Eisele, Seth R. Flaxman, Rachel L. Pullan, Simon J. Brooker and Christopher J. L.
Murray. 2013. “A systematic analysis of global anemia burden from 1990 to 2010.”
Blood 123(5):615–624.
URL: http://www.bloodjournal.org/content/123/5/615
Keiser, J. and J. Utzinger. 2008. “Efficacy of Current Drugs against Soil-Transmitted
Helminth Infections: Systematic Review and Meta-Analysis.” Journal of the
American Medical Association 299(16):1937–48.
Kerwin, Jason. 2015. “The Effect of HIV Infection Risk Beliefs on Risky Sexual
Behaviors: Scared Straight or Scared to Death?” Working paper.
Kristjansson, E. A., V. Robinson, M. Petticrew, B. MacDonald, J. Krasevec, L. Janzen, T.
Greenhalgh, G. Wells, J. MacGowan, A. Farmer, B. J. Shea, A. Mayhew and P.
Tugwell. 2007. “School feeding for improving the physical and psychosocial health
of disadvantaged elementary school children.” Cochrane Database Syst Rev
(1):CD004676.
Kruger, M, CJ Badenhorst, EPG Mansvelt, JA Laubscher and AJS Benade. 1996. “The
effect of iron fortification in a school feeding scheme and anthelminthic therapy on
the iron status and growth of 6-8 year old school children.” Food and Nutrition
Bulletin 17(1).
Liu, Chengfang, Louise Lu, Linxiu Zhang, Renfu Luo, Sean Sylvia, Alexis Medina, Scott
Rozelle, Darvin Scott Smith, Yingdan Chen and Tingjun Zhu. 2016. “Effect of
deworming on indices of health, cognition, and education among schoolchildren in
rural China: a cluster-randomized controlled trial.” Unpublished manuscript.
Lo, N. C., Y. S. Lai, D. A. Karagiannis-Voules, I. I. Bogoch, J. T. Coulibaly, E.
Bendavid, J. Utzinger, P. Vounatsou and J. R. Andrews. 2016. “Assessment of global
guidelines for preventive chemotherapy against schistosomiasis and soil-transmitted
helminthiasis: a cost-effectiveness modelling study.” Lancet Infect Dis .
McKenzie, David. 2012. “Beyond baseline and follow-up: The case for more T in
experiments.” Journal of Development Economics 99(2):210 – 221.
Michael, E, Alok Bhargava, Don Bundy, Richard Peto, Ed Cooper, Lorenzo Savioli,
Maria Neira, Marco Albonico, Michael J. Beach, Hababu Mohammed Chwaya,
David WT Crompton, John Dunne, John P Ehrenberg, Theresa Gyorkos, Jane
Kvalsvig, Martin G Taylor, Carlo Urbani, Feng Zheng, Paul Garner, Rumona
Dickson, Colin Demellweek, Paul Williamson and Shally Awasthi. 2000. “Letters to
the Editor: Treatment For Intestinal Helminth Infection.” BMJ: British Medical
Journal 321(7270):1224–1227.
URL: http://www.jstor.org/stable/25226175
Miguel, Edward and Michael Kremer. 2004. “Worms: Identifying Impacts on Education
and Health in the Presence of Treatment Externalities.” Econometrica 72(1):159–217.
Miguel, Edward and Michael Kremer. 2014. “Worms: Identifying Impacts on Education
and Health in the Presence of Treatment Externalities, Guide to Replication of Miguel
and Kremer (2004).”.
URL: http://emiguel.econ.berkeley.edu/assets/miguel research/46/PSDP-REP 2014-
11.pdf
Montresor, Antonio, David Addiss, Marco Albonico, Said Mohammed Ali, Steven K.
Ault, Albis-Francesco Gabrielli, Amadou Garba, Elkhan Gasimov, Theresa Gyorkos,
Mohamed Ahmed Jamsheed, Bruno Levecke, Pamela Mbabazi, Denise Mupfasoni,
Lorenzo Savioli, Jozef Vercruysse and Aya Yajima. 2015. “Methodological Bias Can
Lead the Cochrane Collaboration to Irrelevance in Public Health Decision-Making.”
PLoS Negl Trop Dis 9.
Ndibazza, Juliet, Harriet Mpairwe, Emily L. Webb, Patrice A. Mawa, Margaret Nampijja,
Lawrence Muhangi, Macklyn Kihembo, Swaib A. Lule, Diana Rutebarika, Barbara
Apule, Florence Akello, Hellen Akurut, Gloria Oduru, Peter Naniima, Dennison
Kizito, Moses Kizza, Robert Kizindo, Robert Tweyongere, Katherine J. Alcock,
Moses Muwanga and Alison M. Elliott. 2012. “Impact of Anthelminthic Treatment in
Pregnancy and Childhood on Immunisations, Infections and Eczema in Childhood: A
Randomised Controlled Trial.” PLoS ONE 7(12):e50325.
Ostwald, Rosemarie, Mark Fitch, Rainer Arnhold, Jennifer Shield, Louie Dexter, Jan
Kilner and Richard Kimber. 1984. “The effect of intestinal parasites on nutritional
status in well-nourished school age children in Papua New Guinea.” Nutrition
Reports International 30(6):1409–1421.
Ozier, Owen. 2015. “Exploiting Externalities to Estimate the Long-term Benefits of Early
Childhood Deworming.” World Bank Policy Research Working Paper No. 7052.
Perez del Villar, L., F. J. Burguillo, J. Lopez-Aban and A. Muro. 2012. “Systematic
Review and Meta Analysis of Artemisinin Based Therapies for the Treatment and
Prevention of Schistosomiasis.” PLoS ONE 7(9):e45867.
Pullan, R. L., J. L. Smith, R. Jasrasaria and S. J. Brooker. 2014. “Global Numbers of
Infection and Disease Burden of Soil Transmitted Helminth Infections in 2010.”
Parasites and Vectors 7(37).
Rohatgi, Ankit. 2015. “WebPlotDigitizer.”.
URL: http://arohatgi.info/WebPlotDigitizer
Smith, Jennifer L. and Simon Brooker. 2010. “Impact of hookworm infection and
deworming on anaemia in non-pregnant populations: a systematic review.” Tropical
Medicine & International Health 15(7):776–795.
URL: http://dx.doi.org/10.1111/j.1365-3156.2010.02542.x
Stephenson, L. S., M. C. Latham, E. J. Adams, S. N. Kinoti and A. Pertet. 1993.
“Physical Fitness, Growth and Appetite of Kenyan School Boys with Hookworm,
Trichuris trichiura and Ascaris lumbricoides Infections Are Improved Four Months
after a Single Dose of Albendazole.” Journal of Nutrition 123(6):1036–46.
Stephenson, LS, MC Latham, KM Kurz, SN Kinoti and H. Brigham. 1989. “Treatment
with a single dose of albendazole improves growth of Kenyan schoolchildren with
hookworm, Trichuris trichiura, and Ascaris lumbricoides infections.” American
Journal of Tropical Medicine and Hygiene 41(1).
Sur, D, D.R. Sahan, B Manna, K Rajendran and S.K Bhattacharya. 2005. “Periodic
deworming with albendazole and its impact on growth status and diarrhoeal incidence
among children in an urban slum of India.” Transactions of the Royal Society of
Tropical Medicine and Hygiene 99(4):261–7.
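Taylor-Robinson, David C, Nicola Maayan, Karla Soares-Weiser, Sarah Donegan and
Paul Garner. 2015. “Deworming drugs for soil-transmitted intestinal worms in
children: effects on nutritional indicators, haemoglobin, and school performance.”
Cochrane Database of Systematic Reviews 7.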
Taylor-Robinson, DC, N Maayan, K Soares-Weiser, S Donegan and P Garner. 2012.
“Deworming drugs for soil-transmitted intestinal worms in children: effects on
nutritional indicators, haemoglobin and school performance (Review).” The
Cochrane Library (11).
Turner, R. M., S. M. Bird and J. P. Higgins. 2013. “The impact of study size on meta-
analyses: examination of underpowered studies in Cochrane reviews.” PLoS ONE
8(3):e59202.
Vivalt, Eva. 2015. “Heterogeneous Treatment Effects in Impact Evaluation.” American
Economic Review 105(5):467–70.
URL: http://www.aeaweb.org/articles?id=10.1257/aer.p20151015
Watkins, William E., José R. Cruz and Ernesto Pollitt. 1996. “The effects of deworming
on indicators of school performance in Guatemala.” Transactions of The Royal
Society of Tropical Medicine and Hygiene 90(2):156–161.
Willett, Walter, WL Kilama and CM Kihamia. 1979. “Ascaris and Growth Rates: A
Randomized Trial of Treatment.” American Journal of Public Health 69(10):987–
991.
Wiria, Aprilianto E., Firdaus Hamid, Linda J. Wammes, Maria M. M. Kaisar, Linda May,
Margaretta A. Prasetyani, Sitti Wahyuni, Yenny Djuardi, Iwan Ariawan, Heri
Wibowo, Bertrand Lell, Robert Sauerwein, Gary T. Brice, Inge Sutanto, Lisette van
Lieshout, Anton J. M. de Craen, Ronald van Ree, Jaco J. Verweij, Roula Tsonaka,
Jeanine J. Houwing-Duistermaat, Adrian J. F. Luty, Erliyani Sartono, Taniawati
Supali and Maria Yazdanbakhsh. 2013. “The Effect of Three-Monthly Albendazole
Treatment on Malarial Parasitemia and Allergy: A Household-Based Cluster-Randomized,
Double-Blind, Placebo-Controlled Trial.” PLOS One 8(3):e57899.
World Bank. 1993. World Development Report 1993: Investing in Health. World Bank.
URL: http://elibrary.worldbank.org/doi/pdf/10.1596/0-1952-0890-0
World Health Organization. 2002. “Prevention and Control of Schistosomiasis and Soil-
Transmitted Helminthiasis.”.
URL: http://apps.who.int/iris/bitstream/10665/42588/1/WHO TRS 912.pdf
World Health Organization. 2012. “Deworming to combat the health and nutritional
impact of soil-transmitted helminths.”.
URL: http://www.who.int/elena/titles/bbc/deworming/en/
World Health Organization Multicentre Growth Reference Study Group. 2006. “WHO
Child Growth Standards: Length/height-for-age, weight-for-age, weight-for-length,
weight-for-height and body mass index-for-age: Methods and development.”.
URL: http://www.who.int/childgrowth/standards
Yap, P., F. W. Wu, Z. W. Du, J. Hattendorf, R. Chen, J. Y. Jiang and P. Steinmann. 2014.
“Effect of deworming on physical fitness of school-aged children in Yunnan, China: a
double-blind, randomized, placebo-controlled trial.” PLoS Neglected Tropical
Diseases 8:e2983.
Table 1: Summary of Studies
Study (full name)                              Short name       TMSDG estimate (SE)  Full estimate (SE)  Prevalence  Country
1. Kruger et al. (1996)                        Kruger 1996      -0.38 (0.23)         (same)              0.38        South Africa
2. Watkins, Cruz and Pollitt (1996)            Watkins 1996     0.13 (0.11)          (same)              0.92        Guatemala
3. Donnen et al. (1998)                        Donnen 1998      -0.45 (0.17)†        (same)              0.11        Zaire
4. Awasthi, Pande and Fletcher (2000)          Awasthi 2000     -0.05 (0.08)         (same)              0.13        India
5. Dossa and Ategbo (2001)                     Dossa 2001a      0.00 (0.27)          (same)              0.59        Benin
6. Dossa and Ategbo (2001)                     Dossa 2001b      0.00 (0.14)          (same)              0.59        Benin
7. Alderman et al. (2006)                      Alderman 2006    0.15 (0.09)‡         (same)              >50%±       Uganda
8. Awasthi et al. (2008)                       Awasthi 1995     0.98 (0.15)          (same)              ≤20%∓       India
9. Awasthi and Pande (2001)                    Awasthi 2001     0.17 (0.34)          0.17 (0.07)         0.09        India
10. Hall et al. (2006)                         Hall 2006        0.0 (0.07)           0.05 (0.06)         0.84        Vietnam
11. Sur et al. (2005)                          Sur 2005         0.5 (0.47)           0.29 (0.09)         0.54        India
12. Willett, Kilama and Kihamia (1979)         Willett 1979     (not included)       0.16 (0.08)         0.55        Tanzania
13. Joseph et al. (2015)                       Joseph 2015      (not included)       0.04 (0.05)         0.12        Peru
14. Miguel and Kremer (2004)                   Miguel 2004      (not included)       -0.76 (0.44)        0.77        Kenya
15. Ndibazza et al. (2012)                     Ndibazza 2012    (not included)       0.01 (0.09)         0.03        Uganda
16. Gupta and Urrutia (1982)                   Gupta 1982a      (not included)       0.027 (0.175)       0.51        Guatemala
17. Gupta and Urrutia (1982)                   Gupta 1982b      (not included)       0.13 (0.15)         0.54        Guatemala
18. Ostwald et al. (1984)                      Ostwald 1984     (not included)       0.70 (0.45)         0.96        Papua New Guinea
19. Gateff, Lemarinier and Labusquiere (1972)  Gateff 1972      (not included)       0.35 (0.13)         >50%        Cameroon
20. Wiria et al. (2013)                        Wiria 2013       (not included)       0.19 (0.45)         0.75        Indonesia
21. Liu et al. (2016)                          Liu 2016         (not included)       0.03 (0.15)         0.31        China
22. Stephenson et al. (1993)                   Stephenson 1993  (not included)       0.90 (0.18)         0.92        Kenya
Notes: This table summarizes key features of the studies included in the meta-analysis. We follow the
meta-analysis literature by referring to studies using the first author and year only, but report full
references in the first column of this table. We were able to include Gateff 1972, Gupta 1982, Ostwald
1984, Ndibazza 2012, Wiria 2013, and Liu 2016 thanks to the generosity of the Campbell Collaboration
deworming review team. Effect sizes are in kg. †The estimated effect for Donnen 1998 was taken
from TMSDG, who obtained unadjusted estimates from trial authors. ‡The estimated effect size for
Alderman 2006 was taken from TMSDG, who obtained clustered, unadjusted estimates from trial
authors. ±Alderman 2006 does not report baseline prevalence from the study population, but references
an earlier study done in the area. ∓Awasthi 1995 did not report baseline prevalence in the study, and was
classified according to two subsequent studies performed in the same region of India (Awasthi 2000 and
Awasthi 2001). Gateff 1972 does not provide baseline prevalence from within its own study, but is
classified based on statistics from a local health center mentioned within the trial text. See Appendix A
for other details on these studies.
Table 2: Tests of the Hypothesis of Common Zero Effect, Adding Updates Individually
Study Effect Size P-value
1. TMSDG .061 .089*
2. TMSDG (using prior Cochrane classifications) .092 .009***
3. Sur 2005 .092 .006***
4. Willett 1979 .077 .021**
5. Joseph 2015 .054 .066*
6. Awasthi 2001 .086 .006***
7. Ostwald 1984 .066 .069*
8. Gateff 1972 .082 .019**
9. Liu 2016 .06 .089*
10. Stephenson 1993 .092 .009***
11. Wiria 2013 .062 .084*
12. Ndibazza 2012 .054 .105
13. Gupta 1982a .06 .09*
14. Gupta 1982b .065 .063*
15. Hall 2006 (ANCOVA) .073 .032**
16. Miguel 2004 .056 .121
17. Full Sample (rows 3-16) .111 <0.001***
Notes: This table presents meta-analysis treatment effect estimates for the impact of multiple-dose
mass deworming on weight using a fixed effects model, and the p-values associated with a test of the
null hypothesis of a common zero effect across all studies included in the sample. Row (1) includes
the TMSDG sample of studies (described in the notes of Figure 1). Row (2) adds Stephenson 1993
(classified by TMSDG as a study of “infected children” rather than “all children living in an endemic
area”) to the TMSDG sample, following classification approaches used in prior Cochrane Reviews.
Rows (3) through (16) make the sample changes described for the full sample (described in the notes of
Figure 2) one at a time (holding the remainder of the TMSDG sample constant), and then for the full
sample altogether in Row (17). For brevity we refer to each study by the first author and year; see Table
1 for the full reference. * p < 0.1, ** p < 0.05, *** p < 0.01.
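As a rough illustration of the fixed effects calculation behind Row (1), the sketch below pools the 11 TMSDG estimates listed in Table 1 with inverse-variance weights and tests the null of a common zero effect with a two-sided z-test. The function name is hypothetical, and the use of a normal approximation for the p-value is an assumption. Because the inputs are the rounded values from Table 1, the output should only approximately match the reported .061 and .089.

```python
from math import sqrt
from statistics import NormalDist

# (estimate in kg, standard error) for the 11 TMSDG-sample estimates in Table 1
TMSDG = [
    (-0.38, 0.23), (0.13, 0.11), (-0.45, 0.17), (-0.05, 0.08),
    (0.00, 0.27), (0.00, 0.14), (0.15, 0.09), (0.98, 0.15),
    (0.17, 0.34), (0.00, 0.07), (0.50, 0.47),
]


def fixed_effect(studies):
    """Inverse-variance fixed effect estimate, its standard error, and the
    two-sided p-value for the null that the common effect is zero."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    estimate = sum(w * y for w, (y, _) in zip(weights, studies)) / total_weight
    std_err = sqrt(1.0 / total_weight)
    z = estimate / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return estimate, std_err, p_value


fe_estimate, fe_se, fe_p = fixed_effect(TMSDG)
print(f"fixed effect: {fe_estimate:.3f} kg (s.e. {fe_se:.3f}), p = {fe_p:.3f}")
```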
Table 3: Estimated Weight Gain (kg) and Test Results Across Samples
                                        TMSDG      Full sample   Full sample with   Full sample with
                                                                 >20% prevalence    >50% prevalence
Fixed effect estimate                   0.061      0.111         0.142              0.157
  (s.e.)                                (0.036)    (0.022)       (0.030)            (0.031)
P-value: test for common zero effect    0.089*     <0.001***     <0.001***          <0.001***
Random effects estimate                 0.078      0.134         0.148              0.182
  (s.e.)                                (0.098)    (0.052)       (0.056)            (0.057)
P-value: random effects estimate        0.426      0.011**       0.008***           0.001***
Number of studies                       11         22            16                 14
Notes: This table presents treatment effect estimates and key test results across the samples discussed in the main text. * p < 0.1, **
p < 0.05, *** p < 0.01.
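The random effects row can be approximated with a short sketch. It assumes the DerSimonian-Laird moment estimator for the between-study variance, which this section does not name, and uses the rounded TMSDG-sample inputs from Table 1, so it should only roughly reproduce the 0.078 (0.098) reported in the first column. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# (estimate in kg, standard error) for the 11 TMSDG-sample estimates in Table 1
TMSDG = [
    (-0.38, 0.23), (0.13, 0.11), (-0.45, 0.17), (-0.05, 0.08),
    (0.00, 0.27), (0.00, 0.14), (0.15, 0.09), (0.98, 0.15),
    (0.17, 0.34), (0.00, 0.07), (0.50, 0.47),
]


def random_effects_dl(studies):
    """Random effects pooled estimate using the DerSimonian-Laird estimator
    (an assumed choice) for the between-study variance tau^2."""
    y = [est for est, _ in studies]
    w = [1.0 / se ** 2 for _, se in studies]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)
    # Re-weight each study by the inverse of (sampling variance + tau^2)
    w_star = [1.0 / (se ** 2 + tau2) for _, se in studies]
    estimate = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    std_err = sqrt(1.0 / sum(w_star))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(estimate / std_err)))
    return estimate, std_err, tau2, p_value


re_estimate, re_se, re_tau2, re_p = random_effects_dl(TMSDG)
print(f"random effects: {re_estimate:.3f} kg (s.e. {re_se:.3f}), "
      f"tau^2 = {re_tau2:.3f}, p = {re_p:.3f}")
```

The larger standard error relative to the fixed effect estimate reflects the estimated between-study variance, which is added to each study's sampling variance before weighting.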
Table 4: Robustness of Random Effects Estimates to Dropping Individual Studies
Study Effect Size Standard Error P-value
Full Sample 0.134 (0.052) 0.011**
1. Kruger 1996 0.150 (0.052) 0.004***
2. Watkins 1996 0.134 (0.055) 0.016**
3. Donnen 1998 0.158 (0.051) 0.002***
4. Awasthi 2000 0.146 (0.055) 0.008***
5. Dossa 2001a 0.137 (0.054) 0.010***
6. Dossa 2001b 0.140 (0.054) 0.010***
7. Alderman 2006 0.132 (0.056) 0.018**
8. Awasthi 1995 0.095 (0.043) 0.028**
9. Awasthi 2001 0.131 (0.057) 0.021**
10. Hall 2006 0.139 (0.057) 0.016**
11. Sur 2005 0.123 (0.055) 0.024**
12. Willett 1979 0.132 (0.056) 0.018**
13. Joseph 2015 0.140 (0.058) 0.016**
14. Miguel 2004 0.144 (0.052) 0.006***
15. Ndibazza 2012 0.141 (0.055) 0.011**
16. Gupta 1982a 0.138 (0.054) 0.011**
17. Gupta 1982b 0.134 (0.054) 0.014**
18. Ostwald 1984 0.127 (0.053) 0.016**
19. Gateff 1972 0.122 (0.054) 0.023**
20. Wiria 2013 0.133 (0.053) 0.012**
21. Liu 2016 0.139 (0.054) 0.011**
22. Stephenson 1993 0.105 (0.049) 0.031**
Notes: This table presents meta-analysis treatment effect estimates for the impact of multiple-dose mass
deworming on weight using a random effects model, dropping one study from the sample at a time
(holding the remainder of the full sample constant). For brevity we refer to each study by the first author
and year; see Table 1 for the full reference. * p < 0.1, ** p < 0.05, *** p < 0.01.
Figure 1: Effect of Mass Deworming on Child Weight, TMSDG Sample
Notes: This meta-analysis forest plot includes all studies in the “TMSDG sample”, which is the set
of studies included in Taylor-Robinson et al. (2015) for the study of treating “all children living in an
endemic area” with multiple doses of deworming medication at longest follow-up on the weight gain
outcome, for a total of 11 effect estimates. For brevity, we refer to each study by the first author and
the year; see Table 1 for the full reference. Squares denote the point estimate, and the whiskers show
the 95% confidence interval. Effect sizes and standard errors are taken directly from Taylor-Robinson
et al. (2015), not from the original articles. The point estimate squares are sized according to the weight
each study is given in the fixed effect meta-analysis (calculated according to the precision of the study).
The dotted vertical line represents zero effect. The lower panel displays the estimated effect across all
studies using fixed and random effects models, the p-value associated with a test of the null hypothesis
of a common zero effect across all studies, and the p-value of the random effects estimate.
Figure 2: Effect of Mass Deworming on Child Weight, Full Sample
Notes: This meta-analysis forest plot includes all studies in the “full sample”, as described in the text,
for a total of 22 effect estimates. For brevity, we refer to each study by the first author and the year;
see Table 1 for the full reference. Squares denote the point estimate, and the whiskers show the 95%
confidence interval. For studies that were included in Taylor-Robinson et al. (2015), effect sizes and
standard errors are taken directly from that meta-analysis, except for the updates described to Awasthi
and Pande (2001), Hall et al. (2006), and Sur et al. (2005). The full sample additionally includes
several studies that were not included in the TMSDG meta-analysis: Gateff, Lemarinier and Labusquiere
(1972), Gupta and Urrutia (1982), Joseph et al. (2015), Liu et al. (2016), Miguel and Kremer (2004),
Ndibazza et al. (2012), Ostwald et al. (1984), Stephenson et al. (1993), Willett, Kilama and Kihamia
(1979), and Wiria et al. (2013). The point estimate squares are sized according to the weight each study
is given in the fixed effect meta-analysis (calculated according to the precision of the study). The dotted
vertical line represents zero effect. † denotes a study with data updated since TMSDG. ‡ denotes a study
not included in TMSDG. The lower panel displays the estimated effect across all studies using fixed and
random effects models, the p-value associated with a test of the null hypothesis of a common zero effect
across all studies, and the p-value of the random effects estimate.
Figure 3: Statistical Power and Minimum Detectable Effects in TMSDG (2015) and Full Sample
[Figure: two panels plotting statistical power (vertical axis, 0 to 100%) against child weight gain in kg (horizontal axis, 0.0 to 0.3) for the full sample (black) and the TMSDG sample (gray).]
(a) Panel A: Child Weight Gain and Statistical Power in Full Sample (black) and TMSDG (gray). TMSDG estimate: 0.08 kg; full sample estimate: 0.13 kg (each shown with its C.I.). Full sample power: 72%; TMSDG power: 13%.
(b) Panel B: Minimum Detectable Effects in Full Sample (black) and TMSDG (gray). Full sample MDE: 0.15 kg; TMSDG MDE: 0.28 kg.
Notes: This ﬁgure shows estimates of statistical power using the “full sample” as well as the TMSDG
sample, which is the set of studies included in Taylor-Robinson et al. (2015) for the study of treating
all children living in an endemic area with multiple doses of deworming medication at longest follow-
up on the weight gain outcome. The “full sample” includes 22 effect estimates, while the TMSDG
sample includes 11 effect estimates. For studies that were included in TMSDG, effect sizes and stan-
dard errors are taken directly from that meta-analysis, except for updates to Awasthi and Pande (2001),
Hall et al. (2006), and Sur et al. (2005), as described in the text. The full sample additionally in-
cludes several studies that were not included in the TMSDG meta-analysis: Gateff, Lemarinier and
Labusquiere (1972), Gupta and Urrutia (1982), Joseph et al. (2015), Liu et al. (2016), Miguel and
Kremer (2004), Ndibazza et al. (2012), Ostwald et al. (1984), Stephenson et al. (1993), Willett, Kil-
ama and Kihamia (1979), and Wiria et al. (2013). See Table 1 for the estimates that are included in
each sample. In Panel A, the gray circle represents the estimated average effect across all studies in
the TMSDG sample using a random effects model. The black circle represents the estimated average
effect across all studies in the “full sample” using a random effects model. Whiskers represent 95%
conﬁdence intervals for the estimated average effect sizes. In both panels, the black and gray lines (cor-
responding to the full sample and TMSDG sample respectively) show the estimated statistical power
(vertical axis) to detect a given average effect size (horizontal axis). These estimates were obtained
using the method of Hedges and Pigott (2001). In particular, for a given effect size, we estimate power as:
\[ \text{Power} = 1 - \Phi\left(1.96 - \frac{\text{EffectSize}}{\text{StandardError}}\right) + \Phi\left(-1.96 - \frac{\text{EffectSize}}{\text{StandardError}}\right), \]
where Φ is the cumulative distribution function for a standard normal random variable, and StandardError is the standard error for
the average effect size under the random effects model. Reported power for a given effect size is the
probability that the null hypothesis that the average effect size is zero is rejected at the 0.05 level of sig-
niﬁcance. Panel A shows the estimated power in the full sample and the TMSDG sample. The reported
80% MDE is an estimate of the effect size that would deliver a test with 80% power. Panel B shows the
estimated 80% MDE for the TMSDG sample and the full sample.
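As a rough illustration of the power formula in these notes, the sketch below evaluates it in Python. The figure does not report the standard error directly, so it is backed out here from the full-sample 95% confidence interval quoted in the abstract (0.031, 0.236); that step is our assumption, as are the function names `power` and `mde`.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def power(effect_size, standard_error, z_crit=1.96):
    """Two-sided power at the 0.05 level, following the formula in the notes."""
    ratio = effect_size / standard_error
    return 1 - PHI(z_crit - ratio) + PHI(-z_crit - ratio)


def mde(standard_error, target=0.80):
    """Smallest positive effect whose power reaches `target` (bisection)."""
    lo, hi = 0.0, 10 * standard_error
    for _ in range(60):
        mid = (lo + hi) / 2
        if power(mid, standard_error) < target:
            lo = mid
        else:
            hi = mid
    return hi


# Assumption: SE backed out from the abstract's 95% CI (0.031, 0.236).
se_full = (0.236 - 0.031) / (2 * 1.96)
power_full = power(0.134, se_full)  # power at the full-sample point estimate
mde_full = mde(se_full)             # 80% minimum detectable effect

print(f"SE = {se_full:.4f}, power = {power_full:.2f}, 80% MDE = {mde_full:.2f} kg")
```

Under these assumptions the computed power is close to the 72% reported in Panel A, and the 80% MDE is roughly 0.15 kg.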
Appendix A Details on data extraction for the full sample
This appendix describes the full sample of studies included in the primary analysis, and describes
data extraction for these studies, following the general principles outlined in section 3, which were
used to generate the full sample of 22 treatment impact estimates from 20 different trials. Table 1
presents descriptive characteristics of the full sample.
Appendix A.1 lists the studies included from TMSDG without any updates. Appendix A.2
discusses studies included in the updated analysis which were not mentioned in TMSDG, and
which we presume the TMSDG authors were unaware of. Appendix A.3 describes the process of
incorporating studies that were mentioned in TMSDG but not included in their child weight gain
analysis, for example, by using formulas in The Cochrane Handbook (Higgins and Green, 2011)
to derive standard errors from other reported data. The subsequent sections detail adjustments
to some estimates included in the TMSDG sample: Appendix A.4 discusses cases in which more
precise difference-in-difference estimates could be used instead of simply looking at endline differences,
while Appendix A.5 discusses cases in which ANCOVA estimation could be used.
Appendix A.6 describes the process for resolving conflicting information in Awasthi and Pande
(2001). Appendix A.7 explains how studies were classified according to WHO recommendations
for MDA based on helminth prevalence.
Appendix A.1 Estimates adopted from TMSDG
TMSDG include 11 treatment effect estimates from 10 different trials in their meta-analysis of
the impact of multiple dose deworming of “all children living in an endemic area” on weight gain
at longest follow-up. Eight of these treatment effect estimates are included without alteration in
the updated sample: those from Kruger et al. (1996), Watkins, Cruz and Pollitt (1996), Donnen
et al. (1998), Awasthi, Pande and Fletcher (2000), Dossa and Ategbo (2001) (two treatment arms), Alderman et al.
(2006), and Awasthi et al. (2008). Note that the clustered, unadjusted estimates from Alderman
et al. (2006), and the unadjusted estimates from Donnen et al. (1998), were not contained in the
published versions of the trials, but were obtained by Cochrane authors directly from the original
trial authors. We use these same estimates in our sample.
Appendix A.2 Incorporating studies not mentioned in TMSDG
The full sample employed in this paper additionally incorporates four studies not mentioned in
TMSDG.
Joseph et al. (2015) was likely not included in TMSDG simply because it was published in
2015, conceivably after the ﬁnal literature review was conducted for the meta-analysis. The trial
targeted children between ages 1 and 2 in rural Peruvian communities over the course of 1 year.
The study presents a treatment effect and 95% conﬁdence interval from the multiple dose treatment
arm.25 A formula provided in The Cochrane Handbook was used to compute the standard error
(following Principle i in Section 3.3 of the main text).
Liu et al. (2016) was likely not included in TMSDG simply because it is a new, still unpublished
study. This cluster randomized trial targeted school-aged children in China. The unadjusted
difference-in-difference treatment impact estimate, 95% confidence interval, and associated
p-value were obtained from the study authors, thanks to communication facilitated by the Camp-
bell Collaboration. Again, we use a formula provided in The Cochrane Handbook to compute the
standard error (following Principle i in Section 3.3 of the main text).
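The conversion from a reported 95% confidence interval to a standard error, as used for the two trials above, can be sketched as follows. This is the standard large-sample relationship (interval width divided by 3.92); the function name and the example interval are hypothetical and are not taken from either trial. For small samples, a t-based critical value would replace 1.96.

```python
def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (upper - lower) / (2 * z_crit)


# Hypothetical interval, for illustration only (not from Joseph et al. or Liu et al.).
example_se = se_from_ci(-0.05, 0.25)
print(round(example_se, 4))
```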
The Campbell Collaboration alerted us to the existence of Ostwald et al. (1984) and Gateff,
Lemarinier and Labusquiere (1972). Ostwald et al. (1984) is a trial involving school-aged children
in Papua New Guinea, and Gateff, Lemarinier and Labusquiere (1972) is a study of school-aged
children in rural Cameroon. It is unclear why neither of these studies is mentioned in TMSDG.
Treatment effects and standard errors were calculated using information available in the published
papers and formulas provided in Higgins and Green (2011) (following Principle i in Section 3.3 of
the main text).
25 Two treatment arms involved just a single dose of deworming and were not included.
Appendix A.3 Incorporating studies mentioned in TMSDG but omitted from
weight gain meta-analysis
Six studies mentioned in TMSDG but not incorporated in their meta-analysis for the weight gain
outcome are included in the sample.
1) Willett, Kilama and Kihamia (1979) is acknowledged in TMSDG, but not included in their
meta-analysis for weight gain, possibly (although it is not entirely clear) because the trial authors
report only an adjusted treatment effect of mass deworming on weight gain or because the standard
errors of the treatment effect are not directly reported in the text. Following TMSDG’s preference
for unadjusted treatment effect estimates, we contacted the trial authors in an attempt to obtain
the microdata in order to extract unadjusted values, but after searching in his archives, Dr. Wil-
lett determined that the original data had been destroyed. We thus include what appears to be an
adjusted treatment effect measure in our full sample (following Principle iv in Section 3.3 of the
main text). The covariates used are baseline weight, study induction date (there were two separate
study intakes), and age at the time of induction. All three of these are likely to improve precision
of the estimates. Although treatment impact standard errors are not directly reported in the study,
information presented is used to calculate standard errors of treatment effects, following the pro-
cedure and formulas in Higgins and Green (2011) (following Principle i in Section 3.3 of the main
text).
2) Miguel and Kremer (2004) report estimated impacts on weight-for-age z-score, but do not
report estimates for the raw weight outcome. As a result, this study is not included in the TMSDG
sample for meta-analysis on weight. However, the original trial data is publicly available, and we
computed the estimated impact on weight using that data and an ANCOVA speciﬁcation (following
data extraction principles iii and vii).26 Schools which received treatment for schistosomiasis
26Miguel and Kremer (2014) corrects rounding, coding, and typographical errors in the original
(praziquantel) are dropped.
3) Ndibazza et al. (2012) was not included in the weight meta-analysis in TMSDG, likely
because the study reports only impacts on outcomes derived from weight (weight-for-age and
weight-for-height), but does not present estimates for the raw weight outcomes. The data for this
trial is not publicly available, but the Campbell Collaboration generously shared information on
the raw weight impact from this study obtained through correspondence with the study authors,
allowing inclusion of this trial in our full sample (following data extraction principle iii).
4) Wiria et al. (2013) is classified in TMSDG as a single dose trial, but this appears to be
erroneous based on our reading of the article. In their abstract, the study authors write “481 house-
holds (2022 subjects) and 473 households (1982 subjects) were assigned to receive placebo and
albendazole, respectively, every three months.” Furthermore, this trial does not report raw weight
outcomes in the study text, although they were measured. The Campbell Collaboration authors had
contacted the original authors and received from them baseline and endline measures of weight and
standard deviations of those values for all study participants under age 16, and generously shared
these estimates with us.27 Wiria et al. (2013) does not report variance of changes, so a correlation
coefﬁcient is required to impute the standard error of the treatment effect. A correlation coefﬁcient
was estimated using a study with author-provided raw microdata of baseline and endline weight
values (Hall et al., 2006). Using this estimated correlation coefficient of 0.89 yields a standard
error of 0.4458 for Wiria et al. (2013).28 We thus incorporate this trial into our sample using data
extraction principles iii and vi.
5) Stephenson et al. (1993) was included in the 2012 Cochrane Review as a case of mass
paper and presents updated data and results. We use these updated data and refer to updated results
throughout, although we continue to reference Miguel and Kremer (2004) for simplicity.
27 It is not entirely clear whether the values that were calculated account for clustering, but since
the household clusters had so few children per cluster, additional clustering would not substantially
affect standard errors.
28Another trial for which authors provided raw microdata, Goto, Mascie-Taylor and Lunn
(2009), has the extremely similar baseline-endline correlation coefficient of 0.90.
treatment and since we are examining mass treatment studies, we include this study. Prevalence
at baseline was 92%, so while this is a high prevalence community, this was not a test and treat
study, but an MDA study. Departing from the previous review, TMSDG classify this as a study
of “infected children”, and do not include it in their meta-analysis of the effect of “all children
living in an endemic area”. Note that in the 2015 update, the Cochrane Review changed its test-
and-treat category, previously called “screened for infection”, to “children known to be infected.”
The result of this choice is that in the 2015 update, TMSDG no longer classify Stephenson et al.
(1989) and Stephenson et al. (1993) as mass treatment programs.29 The distinction used in the
2012 Cochrane Review, between “test and treat” and “mass treatment”, corresponds more closely
to the decision facing policymakers, and we preserve the original distinction. In doing so, one
treatment effect and standard error from Stephenson et al. (1993) is incorporated in the full sample,
which measured the impact of multiple doses in an unscreened, but heavily infected, population
of Kenyan schoolchildren.30 We are able to calculate this treatment impact and standard error,
following data extraction principle i.
6) Gupta and Urrutia (1982) was excluded from the TMSDG analysis for reasons that are un-
clear (to us). TMSDG note in the “Characteristics of excluded studies” section that “[There are]
only two units of allocation for relevant comparison. Children randomly divided into 4 groups,
29 TMSDG state that “We changed the classiﬁcation of Stephenson et al. (1989) and Stephenson
et al. (1993). Previously these trials were in the ‘all children in an endemic area’ category, whereas
now they are classified in the ‘children with infection.’ This decision was based on reviewing the
trials with parasitologists and examining the prevalence and intensity of the infection where clearly
the whole community was heavily infected” (Taylor-Robinson et al. (2015) p. 154). It is worth
noting that although TMSDG exclude Stephenson et al. (1993), they include Watkins, Cruz and
Pollitt (1996); the highest recorded worm baseline prevalence in Watkins, Cruz and Pollitt (1996)
by STH species is 92% (for ascaris); the highest prevalence in Stephenson et al. (1993) is also 92%
(for whipworm). Thus this reclassiﬁcation does not appear to have been done systematically by
worm prevalence. In our view, assessing the merits of the WHO policy by including studies in
environments with prevalence below WHO thresholds while excluding MDA studies in areas with
high prevalence may lead to risk of bias.
30Stephenson et al. (1989) and the other treatment arm of Stephenson et al. (1993) tested single
dose deworming so are excluded from our analysis.
‘taking care that age distribution was similar in each group’. The 4 groups were then allocated
1 of 4 different single treatment regimens; no details given.” (p. 97). Following data extraction
principle i, we calculate treatment effects and standard errors from the deworming versus placebo
comparisons (n=78), and the deworming plus giardia treatment versus giardia treatment only com-
parisons (n=80) in the published paper.
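For the imputation described in item 4 above (Wiria et al., 2013), the standard deviation of the baseline-to-endline change can be recovered from the baseline and endline standard deviations and an assumed correlation, using the usual variance-of-a-difference identity. The sketch below uses the correlation of 0.89 from the text; the standard deviations and group sizes are hypothetical placeholders, and clustering is ignored.

```python
import math


def sd_of_change(sd_base, sd_end, corr):
    """SD of (endline - baseline) given the two SDs and their correlation."""
    return math.sqrt(sd_base**2 + sd_end**2 - 2 * corr * sd_base * sd_end)


def se_of_effect(sd_change_t, n_t, sd_change_c, n_c):
    """SE of the difference in mean changes between two independent groups."""
    return math.sqrt(sd_change_t**2 / n_t + sd_change_c**2 / n_c)


CORR = 0.89  # baseline-endline correlation estimated from Hall et al. (2006)

# Hypothetical SDs (kg) and group sizes, for illustration only.
sd_change_treat = sd_of_change(10.0, 10.5, CORR)
sd_change_ctrl = sd_of_change(10.2, 10.4, CORR)
se_effect = se_of_effect(sd_change_treat, 250, sd_change_ctrl, 250)

print(round(sd_change_treat, 3), round(sd_change_ctrl, 3), round(se_effect, 3))
```

A higher assumed correlation shrinks the standard deviation of the change, which is why the choice of 0.89 (rather than, say, zero) matters for the imputed standard error.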
Like TMSDG, the full sample excludes Goto, Mascie-Taylor and Lunn (2009). The study authors
generously shared raw data with us (via the Campbell Collaboration). However, the shared
data only contained observations for children who had received the full set of intended doses of de-
worming medicine, rather than all who had been assigned to treatment, regardless of whether or not
they received full treatment. Therefore a valid intention-to-treat analysis could not be conducted
and estimates from this data were not included in the meta-analysis.
We also follow TMSDG in excluding Awasthi et al. (2013) since the text indicates that the non-
mortality outcomes such as weight were only measured for a subset of children from a randomly
chosen cluster, but that within clusters, measured children were not chosen randomly.
Appendix A.4 Increasing precision using differences-in-differences estimates
Sur et al. (2005) is included in the TMSDG sample using an endline-only comparison. The updated
sample uses additional data from the article in order to calculate a difference-in-difference estimate,
following data extraction principles vi and ii.
In particular, Web Plot Digitizer software (Rohatgi, 2015) was used to extract difference-in-difference
estimates for Sur et al. (2005) from a figure. Extracting endline values and endline
error bars from the graph nearly exactly reproduces in RevMan software the treatment effect of 0.5
and (abnormally large) standard error of 0.4717 reported in TMSDG (Web Plot Digitizer yields
a treatment effect of 0.53 and a standard error of 0.46). To calculate the standard error of the
baseline-to-endline change, we used the same software to extract data from the figure in which
baseline and endline values are reported, together with p-values reported in the paper text.
These data were combined with the regressions of the treatment effect reported in the paper.
The standard error of the change was calculated following the formulas and procedures in Higgins
and Green (2011), using information on the treatment effect, p-values, and degrees of freedom.31
The change in weight from baseline to endline in Sur et al. (2005) is 0.2925 (note that this is a
smaller treatment effect than the 0.5 difference at endline used by TMSDG). The article
explicitly states that the p-value of this change is 0.001.32 The t statistic is calculated
using the p-value and degrees of freedom. Once the t statistic is obtained, the standard error can
be calculated using the following formula:
\[ \text{standard error} = \frac{\text{treatment effect}}{t\text{ statistic}}, \tag{3} \]
The tinv function in Excel was used to determine that, given a p-value of 0.001 and a sample size
of 683 (and thus 681 degrees of freedom), the t statistic is 3.3048. This, in turn, using equation 3,
implies a standard error of 0.0885. This result takes Sur et al. (2005) from being an outlier with
an extremely large standard error of 0.4717 (despite a relatively large sample of n=683) to having
similar standard errors to the other study in the TMSDG sample with comparable sample size:
Awasthi, Pande and Fletcher (2000), with sample size of 1,045, has a standard error of 0.0760.33
This revised standard error is included in the full sample.
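The back-calculation above can be reproduced without Excel. The sketch below is our own pure-Python stand-in for the `tinv` step: it integrates the Student t density numerically and bisects for the two-sided critical value. The helper names are ours; the inputs (effect of 0.2925, p-value of 0.001, 681 degrees of freedom) come from the text.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for |T| > t, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)


def t_from_p(p, df):
    """Critical value t such that the two-sided p-value equals p (bisection)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


effect = 0.2925              # baseline-to-endline treatment effect from the text
t_stat = t_from_p(0.001, 681)
se = effect / t_stat         # equation (3)

print(f"t = {t_stat:.4f}, SE = {se:.4f}")
```

Run as written, this should give a t statistic of about 3.305 and a standard error of about 0.0885, in line with the figures reported above.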
31As the change in weight over time is not reported in the text of the paper, the same method
was used that we believe Taylor-Robinson et al. (2015) used to estimate the endline difference in
means, i.e. using data from Figure 1 in the article.
32See p. 261 and p. 265 of Sur et al. (2005).
33 We contacted Dr. Sur to obtain the original microdata from the trial, in order to verify these
calculations directly. Unfortunately, Dr. Sur is now retired and thus
no longer has access to the microdata.
Appendix A.5 Use of ANCOVA to account for baseline imbalance in outcome
Among all of the studies included in our meta-analysis, only one reports baseline imbalance in the
weight outcome measure - Hall et al. (2006). Specifically, the control group is heavier, at 20.7
kg, compared to 20.5 kg in the treatment group – a difference which is statistically signiﬁcant at
p = 0.01 and thus quite unlikely to occur by chance.
As noted in section 3, the Cochrane Handbook states clearly that when baseline data is avail-
able, the preferred analytical approach is to control for the baseline value of the outcome using
an Analysis of Covariance (ANCOVA) specification, instead of the difference-in-difference specification
used by the original authors and by TMSDG (data extraction principle vii). ANCOVA is not only
a more efficient estimator (McKenzie, 2012; Frison and Pocock, 1992), but it also reduces bias in
cases of baseline imbalance (Kerwin, 2015). We use microdata obtained directly from the Hall et al. (2006) trial authors in order to
estimate this ANCOVA speciﬁcation (properly accounting for clustering), and obtain an effect size
of 0.05 (SE 0.06).34
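To make the contrast between the three estimators concrete, the sketch below computes the endline difference, the difference-in-difference, and the ANCOVA estimate on a tiny constructed dataset. The data are hypothetical and noise-free (endline weight is built as 3 + 0.9 × baseline + 0.1 × treatment, with the control group heavier at baseline); they are not the Hall et al. (2006) microdata. The ANCOVA coefficient is obtained from the identity that, in a regression of endline on a treatment dummy and baseline, the treatment coefficient equals the endline difference minus the pooled within-group slope times the baseline difference.

```python
from statistics import mean


def within_group_slope(groups):
    """Pooled within-group slope of endline on baseline."""
    num = den = 0.0
    for base, end in groups:
        mb, me = mean(base), mean(end)
        num += sum((b - mb) * (e - me) for b, e in zip(base, end))
        den += sum((b - mb) ** 2 for b in base)
    return num / den


def estimators(base_t, end_t, base_c, end_c):
    """Endline-only, difference-in-difference, and ANCOVA treatment effects."""
    diff_end = mean(end_t) - mean(end_c)
    diff_base = mean(base_t) - mean(base_c)
    slope = within_group_slope([(base_t, end_t), (base_c, end_c)])
    return {
        "endline": diff_end,                    # ignores baseline (slope = 0)
        "did": diff_end - diff_base,            # forces slope = 1
        "ancova": diff_end - slope * diff_base  # uses the estimated slope
    }


# Hypothetical data: control group heavier at baseline, true effect 0.1 kg.
base_c = [20.2, 20.7, 21.2]
base_t = [20.0, 20.5, 21.0]
end_c = [3 + 0.9 * b for b in base_c]
end_t = [3 + 0.9 * b + 0.1 for b in base_t]

res = estimators(base_t, end_t, base_c, end_c)
print({k: round(v, 3) for k, v in res.items()})
```

In this constructed case the endline comparison understates the effect, the difference-in-difference overstates it, and ANCOVA recovers the 0.1 kg built into the data; with real, noisy data the three would differ less cleanly.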
Appendix A.6 Resolving apparently conﬂicting reporting
In the text of Awasthi and Pande (2001), the authors report conﬂicting treatment effect estimates,
an issue that was also noted by TMSDG in their meta-analysis (Taylor-Robinson et al. (2015),
p.43). In particular, the text of Awasthi and Pande (2001) states that deworming produced positive
34 A second issue with this trial relates to the imputation of clustered standard errors. In TMSDG,
the treatment effect values (for a weight gain of 0.00) are included in the meta-analysis using the
results reported in an unpublished manuscript obtained from the trial authors. TMSDG note that
while some estimates were analyzed using methods to account for clustering, the main unadjusted
results in the manuscript did not appear to use clustered standard errors, so they adjust the standard
errors using an ICC that they obtain from Alderman et al. (2006), which was a cluster randomized
trial in Uganda. In this analysis the original trial data are used to calculate, rather than impute, the
clustered standard errors.
and signiﬁcant effects on weight; the authors write that “Mean (+ SE) weight gain in Kg in control
versus ABZ [i.e. treatment] areas was 3.04 (0.03) versus 3.22 (0.03), (p=0.01)” (p. 823). Later in
the text, however, a similar treatment effect and level of statistical signiﬁcance, but a different set
of standard errors for the treatment effect, is reported: “The mean weight gain in 1.5 years in the
albendazole plus vitamin A group was 5.57% greater than that in the vitamin A group alone (3.22
KG (SD: 2.03, SE: 0.26) vs. 3.05 KG (SD: 1.47 SE: 0.19) P-value=0.01).” (p. 825).
We follow data extraction principle vii in consideration of this issue. In their meta-analysis,
TMSDG use the reported treatment effect (0.17 kg), and appear to calculate the standard error
using the second set of values (SE 0.26 and SE 0.19). Based on the p-values calculated from
these numbers, and in contradiction to the p-value of 0.01 reported in the study, TMSDG refer to
these results as not statistically signiﬁcant, with a standard error of 0.341. By contrast the standard
error is 0.0650 if one uses the p-value of 0.01 and treatment effect of 0.17 to back out a standard
error, following, as in section 3, the formulas and procedures in Higgins and Green (2011), section
7.7.3.3. 35
Three pieces of evidence were used to assess which estimate to use. First, we consulted directly
with Dr. Awasthi about this issue. She expressed disagreement with TMSDG’s interpretation of the
results, and confirmed that she agreed with the interpretation of the study’s results and the calculation
of the study’s standard errors using the p-values and effect
sizes used here.36 Second, the standard error for the weight outcome presented in the TMSDG
analysis is 0.341, very large for a trial of this size (124 clusters, and over 2,000 participants).
In fact, this is 1.5 to 3 times larger than the weight outcome standard errors that TMSDG calculate
for other trials in their original sample with only a fraction of the sample size.37 By contrast, if
35 There is yet a third possible way to calculate standard errors from data reported in this paper.
This would be to use a set of standard errors reported in the abstract (0.03 for both treatment and
control changes from baseline). These ﬁgures imply a still smaller standard error of 0.04.
36 Personal communication, March 23, 2016. Dr. Awasthi also noted that the original micro data
is no longer available.
37 For instance, Kruger et al. (1996) (n=74, SE=0.2241), Watkins, Cruz and Pollitt (1996) (n=226,
the standard error is calculated using the p-value and treatment effect (SE=0.0650), this makes it
comparable to the other large cluster RCTs.38 Finally, we note that it is the (statistically signiﬁcant)
p-value that is reported consistently in the paper, rather than the standard error. Essentially, either
the authors entered incorrect measures of variance at one point in the paper, or
the authors’ interpretation of the full set of study results was incorrect. Given our
correspondence with Dr. Awasthi, the evidence from the standard errors of comparable studies,
and the fact that the p-value is reported consistently in the paper while the standard errors differ,
the standard error derived from the p-value is incorporated into the full sample.
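As a rough check on the candidate standard errors discussed in this appendix section, the sketch below backs out a standard error from the reported treatment effect (0.17 kg) and p-value (0.01), and separately combines the two group-level standard errors of 0.03 quoted from the paper. It uses a normal approximation, which is our simplification: the result is close to, but not identical with, the 0.0650 used in the full sample, presumably because that figure reflects a t distribution with finite degrees of freedom, which this section does not report.

```python
import math
from statistics import NormalDist

effect = 0.17    # reported treatment effect, kg
p_value = 0.01   # reported two-sided p-value

# Normal-approximation critical value for the reported p-value (our assumption).
z = NormalDist().inv_cdf(1 - p_value / 2)
se_from_p = effect / z

# SE of a difference between two independent means with SE 0.03 each.
se_from_group_ses = math.sqrt(0.03**2 + 0.03**2)

print(f"SE from p-value: {se_from_p:.4f}")
print(f"SE from group SEs of 0.03: {se_from_group_ses:.4f}")
```

Both values are an order of magnitude below the 0.341 used by TMSDG, which is the comparison that matters for the argument above.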
Appendix A.7 Classiﬁcation of studies by prevalence
Studies are classiﬁed according to WHO guidelines for MDA recommendations which are in turn
based on whether helminth prevalence is greater than 20%, in which case MDA is recommended,
and greater than 50%, in which case multiple dose MDA is recommended. Helminth prevalence
in a study is classiﬁed based on the maximum prevalence across all worms reported in that study.
Where possible, helminth prevalence level (see Table 1) is classiﬁed based on prevalence described
within the study itself, using cutoffs that are appropriate for WHO policy guidelines. One study
in our sample is classiﬁed based on prevalence from an earlier study done in the area and which
was used for targeting of the intervention, rather than baseline data collection within the trial itself
(Alderman et al. (2006)). (Another study in our sample, Awasthi et al. (2008), does not report on
prevalence at all, and is classiﬁed based on two other subsequent trials conducted in the same area
of India – Awasthi, Pande and Fletcher (2000) and Awasthi and Pande (2001)). Finally, Gateff,
SE=0.1059), Donnen et al. (1998) (n=198, SE=0.1665), and the two treatment arms from Dossa
and Ategbo (2001) (n=65, SE=0.265 and n=64, SE=0.1385).
38 For instance, Hall et al. (2006) (40 clusters, SE=0.0599), Alderman et al. (2006) (50 clusters,
SE=0.0892), and Awasthi et al. (2008) (50 clusters, SE=0.148), and the large individually randomized trials
(Sur et al. (2005), n=683, SE=0.0885; Awasthi, Pande and Fletcher (2000), n=1,045, SE=0.076).
We do note, however, that there are two large cluster RCTs in the full sample with comparably
large standard errors: Miguel and Kremer (2004) (73 clusters, SE=0.44) and Wiria et al. (2013)
(954 household clusters, SE=0.45).
Lemarinier and Labusquiere (1972) is classiﬁed according to information from local health center
statistics provided in the article, although the authors do not report baseline prevalence in their own
sample.
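The classification rule described in this appendix section can be sketched as a small function: take the maximum prevalence across the worm species reported for a study and compare it with the 20% and 50% cutoffs. The function name, the labels, and the example prevalences are hypothetical; only the cutoffs and the use of the maximum come from the text.

```python
def classify_prevalence(prevalence_by_worm):
    """Classify a study by its maximum reported helminth prevalence (percent)."""
    peak = max(prevalence_by_worm.values())
    if peak > 50:
        return "multiple-dose MDA recommended"
    if peak > 20:
        return "MDA recommended"
    return "MDA not recommended"


# Hypothetical prevalence figures (percent), for illustration only.
high = classify_prevalence({"roundworm": 40, "whipworm": 92, "hookworm": 15})
medium = classify_prevalence({"roundworm": 30, "hookworm": 12})
low = classify_prevalence({"roundworm": 8, "whipworm": 5})

print(high, medium, low, sep="\n")
```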