Are There Lasting Impacts of Aid to Poor Areas? Evidence for Rural China

Shaohua Chen, Ren Mu and Martin Ravallion1
Development Research Group, World Bank
1818 H Street NW, Washington DC

Summary: The paper re-visits the site of a large, World Bank-financed, rural development program in China, 10 years after it began and four years after disbursements ended. The program emphasized community participation in multi-sectoral interventions (including farming, animal husbandry, infrastructure and social services). Data were collected on 2,000 households in project and non-project areas, spanning 10 years. A double-difference estimator of the program's impact (on top of pre-existing governmental programs) reveals sizeable short-term income gains that were mostly saved. Only small and statistically insignificant gains to mean consumption emerged in the longer-term -- though in rough accord with the gain to permanent income. The use of community-based beneficiary selection greatly reduced the overall impact, given that the educated poor were under-covered. The main results are robust to corrections for various sources of selection bias, including village targeting and interference due to spillover effects generated by the response of local governments to the external aid.

Keywords: Poor-areas, aid, credit, rural development, impact evaluation, China
JEL: D91, H43, I32, O22

World Bank Policy Research Working Paper 4084, March 2008

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.
They do not necessarily represent the view of the World Bank, its Executive Directors, or the countries they represent. Policy Research Working Papers are available online at http://econ.worldbank.org.

1 This study would not have been possible without the survey data collection effort over 10 years by the Rural Household Survey Team of China's National Bureau of Statistics (NBS). We are particularly grateful to Wang Ping Ping at NBS, who ably supervised the surveys. The authors have benefited from discussions with Guido Imbens and the comments of Kathleen Beegle, Richard Blundell, Solveig Buhl, Shubham Chaudhuri, Richard Chiburis, Alan De Brauw, Quy-Toan Do, Gershon Feder, Emanuela Galasso, Garance Genicot, Stuti Khemani, Alice Mesnard, Alan Piazza, Dominique van de Walle and seminar participants at University College London, the Overseas Development Institute, the University of Namur, the Indian Statistical Institute, the Paris School of Economics and the World Bank. The support of the Bank's Research Committee and the Knowledge for Change Trust Fund is gratefully acknowledged. The paper's findings, interpretations and conclusions are those of the authors and should not be attributed to the World Bank, its Executive Directors, or the countries they represent.

1. Introduction

Publicly-supported grants and loans to poor areas have long been an important vehicle for development assistance. For example, China's anti-poverty policies have emphasized such poor-area programs since the mid-1980s,2 motivated by the observation that the country's success against poverty over the last 25 years has been geographically uneven, with marked disparities in living standards emerging.3 Advocates of such programs claim that credit constraints in poor areas perpetuate their poverty and that targeted aid can relieve those constraints.
By this view, capital-market failures in poor areas entail that the investments made under such a program would be infeasible otherwise, implying both efficiency and equity gains.

It remains an open question how much impact can be expected. While not perfect, capital markets may still work well enough to assure that marginal products of capital come into rough parity between poor and non-poor areas in steady state. Then the problem of lagging poor areas is not so much lack of capital as low productivity of capital, owing for example to poor natural conditions, lack of complementary knowledge or skills, or poor policies. And even with credit constraints, some people are clearly more constrained than others. If those selected are not credit constrained, their participation is voluntary, and the interest rate is no different from other credit sources, then there will be no net gain from the extra availability of credit.

Heterogeneity in the impacts of such programs can also arise from inequalities in the complementary skills or knowledge needed to derive benefits from the extra investment. Beneficiary selection will then be crucial to the outcomes. However, it is not obvious that the selection procedures found in practice would "pick the winners." Beneficiary selection for local development programs has come to rely heavily on local community groups. This practice may well achieve greater equality in access to the aid within villages, but possibly at the expense of assuring that the aid goes to those who would benefit the most.

2 See (inter alia) Leading Group (1988), World Bank (1992, 1997), Jalan and Ravallion (1998) and Park et al. (2002).

3 See, for example, Knight and Song (1993), Jian et al. (1996), Khan and Riskin (1998), Ravallion and Jalan (1996, 1999), World Bank (1992, 1997), Kanbur and Zhang (1999) and Ravallion and Chen (2006).

This paper provides the first rigorous assessment of the longer-term impacts of a poor-area program.
The program is the World Bank's Southwest China Poverty Reduction Project -- the Southwest Program (SWP) for short. This comprised a package of multi-sectoral interventions targeted to poor villages using community-based participant and activity selection. The aim was to achieve a large and sustainable reduction in poverty. The paper reports results from an intensive survey data collection effort over 10 years, initiated by two of the authors and done in close collaboration with the Rural Survey Organization of China's National Bureau of Statistics. Assessing development aid effectiveness at the project level raises a number of challenges. A long-term commitment to collecting high-quality survey data is crucial, but it is not sufficient. Impact can only be meaningfully assessed relative to a counterfactual; our counterfactual is the absence of the SWP, which means that we assess the incremental impacts, on top of pre-existing governmental spending. As in any observational study, there are concerns about selection bias, i.e., differences in counterfactual outcomes between SWP participants and non-participants. Our data collection effort allows us to "difference out" the time-invariant component of the selection bias (arising from non-random placement). However, it is not obvious on a priori grounds that the bias would be constant over time, given that the initial village characteristics that attract the program (such as poor infrastructure) may also influence the growth rate under the counterfactual. We use both propensity-score weighted regression and kernel-matching methods to balance the observable covariates between sampled SWP and non-SWP villages. 
A further problem is that aid-financed poor-area development projects are likely to violate the common assumption in impact evaluations (both experimental and non-experimental) of no interference with the comparison units.4 A plausible source of interference in this setting is through local public-spending spillover effects to non-SWP villages. The local government cuts its own development spending in the villages targeted for external aid, and the spending is diverted in part at least to the non-participants used to form the comparison group. We propose and implement a test for spillover effects and we construct a bound for the bias.

4 This assumption is often implicit in impact evaluations but it was made explicit by Rubin (1980), who dubbed it the stable unit treatment value assumption (SUTVA). SUTVA is known to be implausible in certain bio-medical evaluations.

The paper's principal finding is that there were sizeable income gains from the SWP during its disbursement period, but these gains did not survive four years later. The longer-term impact on mean income is neither large nor statistically significant. However, we do find significant gains for some sub-groups, notably those among the poor with better schooling. Our results point to substantial losses from the community-based beneficiary selection process.

The following section describes the SWP while sections 3 and 4 describe our data and methods. Section 5 presents the main results while Section 6 draws some lessons for future evaluations.

2. Background on the program

In 1986, the Government of China designated about 15% of the country's 2,200 counties as "poor counties," which would receive extra assistance, mainly in the form of credit for development projects.
Past research has suggested that the designated poor counties are in fact poor (by a range of defensible criteria) and that they have seen higher growth rates than one would have otherwise expected (Jalan and Ravallion, 1998; Park et al., 2002). The gains have not been sufficient to reverse the underlying tendency for growth divergence (whereby poorer counties tend to have lower growth rates) and there is evidence that the impacts on economic growth may have declined in the 1990s (Park et al., 2002). Within these designated poor counties, geographic pockets of extreme poverty have persisted to the present day, mainly in upland areas.

The SWP was introduced in 1995 with the aim of reversing the fortunes of selected poor villages in the designated poor counties of Guangxi, Guizhou and Yunnan. About one-quarter of the villages were selected for the SWP (1,800 out of 7,600 villages). The aim was to choose relatively poor villages within these counties, with selection based on objective criteria, although not formulaic. The selection was done by the county government's project office in consultation with provincial and central authorities and the World Bank.

The total outlay on the SWP was US$464 million, which was financed by World Bank loans and counterpart funding from China's central and provincial governments. The total investment per capita under the SWP was only slightly lower than mean annual income per capita of the project villages.

As in other World Bank projects, there were numerous appraisal and supervision missions by Bank staff and consultants, and these missions often probed quite deeply into the project's local operations, including numerous visits to participating counties and villages.
Two of the authors (Chen and Ravallion) participated in some of these missions and also revisited a number of the sampled villages over two weeks in May 2005 (including some that they had visited 10 years earlier) and had informal discussions about the SWP with numerous ex-participants. Within the selected villages, virtually all households were expected to benefit from the infrastructure investments, such as improved rural roads, power lines and piped water supply. Widespread benefits were also expected from the improved social services, including upgrading village schools and health clinics, and training of teachers and village health-care workers. Those with school-aged children also received tuition subsidies (conditional cash transfers). Over half of the households in SWP villages also received individual loans (accounting for about 60% of disbursements). The interest rate was set at the same level as for loans from the government's poor-area programs and the Agricultural Development Bank of China, although this is a lower rate than for commercial sources of credit. The loans financed various activities including initiatives for raising farm yields, animal husbandry and tree planting. There was also a component for off-farm employment, including voluntary labor mobility to urban areas and support for village enterprises. The selection of project activities aimed to take account of local conditions and the expressed preferences of participants, although it is unclear how well this worked in practice; there have been reports that farmers' preferences were sometimes over-ruled by local cadres (World Bank, 2003). Household selection into the SWP was a less transparent process than village selection, which could be based on data and field observations. The household selection was typically done by the pre-existing "farmers' committee" in each village and was not subject to rigorous monitoring. 
From our discussions in field work, it appears that credit-worthiness criteria and successful past experience with similar project activities played an important role. No doubt local level connections also played a role.

In common with other development projects, the SWP provided the capital and technical assistance, but it did not provide insurance, and many of the activities are likely to entail non-negligible risk; the income gains will depend on a number of contingencies, including the vagaries of the weather, uncertain demand for the new products and risks associated with out-migration.

The ex ante expectation was that the SWP would virtually eliminate poverty in the selected villages over the longer term. The World Bank's Implementation Completion Report (ICR) -- the final document giving the ex post "self-assessment" of a lending operation by the relevant operational unit -- claimed that the SWP had a substantial impact on poverty, citing survey data indicating that the poverty rate had been more than halved in the project areas over 1995-2001 (World Bank, 2003).5 However, the attribution of these gains to the SWP is questionable. The evaluative claims in the ICR are reflexive comparisons, which only reveal the true impact under the assumption that there would have been no progress against poverty in the absence of the project. That assumption must be deemed highly implausible in this setting.

Ravallion and Chen (2005) studied the impacts of the SWP over the disbursement period, 1995-2000, using survey data for 2,000 randomly sampled households in both SWP and observationally similar non-SWP villages that had first been surveyed in 1995 (at the beginning of the project) and then annually until project completion. On comparing income changes in SWP villages with those in the matched non-SWP villages, they found an average income gain over five years of around 10% of baseline mean income, representing an average rate of return of 9%.
The gains are not as dramatic as suggested by the reflexive comparisons in the ICR, but they are still sizeable. However, Ravallion and Chen found that a large share of the income gain was saved. On comparing the final year of disbursement with the first, Ravallion and Chen found only a modest impact on mean consumption or consumption poverty. The savings rate from the project's income gains was well above the pre-intervention savings rate.

5 This was confirmed by researchers at the Chinese Academy of Social Sciences, who also pointed to a substantial increase in primary school completion rates and a decline in the infant mortality rate which they attributed to the SWP (Guobao et al., 2004).

Why was there such a high savings rate from the initial income gains? A number of explanations can be suggested, carrying rather different implications for the long-term impact of the SWP. Possibly households saved more to assure they could repay the loans. That depends on the extent to which repayment was enforced. While the World Bank's loan is made to the (central) Government of China, and repayment is virtually certain, that is not the case for the loans made at local level, where enforcement problems are common. Indeed, local repayment rates on loans for poverty reduction under the government's own program were less than 25% in the three provinces covered under SWP.6 However, it may be that the necessity of the center repaying the World Bank "trickled down" in the form of greater local enforcement of SWP repayments than for the loans made under the government's own poor-area programs.
Another possibility is that the high initial savings rate reflected a perception on the part of participants that the longer-term income gains from SWP would be modest or uncertain at best -- raising concerns about the sustainability of the program's impacts.7 When interpreted in terms of the Permanent Income Hypothesis, the Ravallion-Chen findings imply that participants felt that a large share of the income gain was transient, and (hence) it was saved. While this would happen even without uncertainty about the future income gains, such uncertainty is likely, and would probably lead to precautionary saving in response to the project.8 In this regard, it is instructive that Ravallion and Chen found large year-to-year differences in impact, which were primarily due to variability in the annual returns to the program's investments rather than the level of investment. This variability in the returns suggests that participants would have had a hard time assessing the program's impact on permanent income.

The transient-income explanation suggests that the income impacts of SWP would diminish appreciably after disbursement. Precautionary saving would also start to fall as participants learn more about the impacts. Consumption gains should become evident in due course, consistent with the project's underlying impact on permanent income.

6 The repayment rates on loans for poverty reduction in 1997 ranged from 8% in Yunnan to 23% in Guizhou. Repayment rates were somewhat higher for other types of loans but the overall average was still only 30% (Government of China, 1998).

7 The ICR rated "sustainability" as "highly likely." The Bank's internal evaluation of SWP by its Operations Evaluation Department pointed to the need for further evidence on the longer-term sustainability and impact of SWP.

8 There is evidence of precautionary savings in response to uninsured risk in the same region of rural China; see Jalan and Ravallion (2001).
There is another explanation for the high savings rate from the short-term income gains. This postulates that the SWP systematically alters the returns-to-saving in the participating villages. By this view, the project provided local public goods that increased the marginal product of private capital, and so stimulated higher savings to support the desired private investment, which would yield longer-term income gains beyond the life of the project.9 This assumes that there are capital-market imperfections, which entail that investment depends on own-savings and that the marginal products of private capital are not equalized across locations. With the poor facing severe constraints on access to credit and yet having higher marginal products of capital in their own (farm and non-farm) enterprises (given low capital stocks and concave production functions) one might expect to see a sizeable (and pro-poor) investment response. Clearly, this explanation offers a more positive view of the prospects of a sustained impact on poverty from the SWP, in that it suggests that income gains will persist well beyond the disbursement period (as the returns to investment start to be realized) and that sustainable consumption gains would emerge.

By re-surveying in 2004/05 the same sample studied by Ravallion and Chen (2005) we hope to throw light on which of these explanations is most plausible.

3. Data

The original plan for the impact evaluation of SWP was to do a baseline survey in 1995 and to only do follow-up surveys during the Bank's disbursement period, up to 2000. However, we decided to re-survey the original sampled households in 2004/05, to try to resolve the issues about longer-term impact raised by Ravallion and Chen (2005) and discussed in the previous section. All surveys were implemented by the Rural Household Survey (RHS) team of the government's National Bureau of Statistics (NBS).
The surveys covered 2,000 randomly-sampled households in 200 villages, with roughly half not participating in the SWP. All villages were in counties covered under the government's poor-area program, to assure that we will identify the impact of the SWP, on top of the government's program. There are 112 SWP villages and 86 non-SWP villages.10 The SWP villages were a random sample from all project villages, while the non-SWP villages were a random sample from all other villages in the designated poor counties. Ten randomly sampled households were interviewed in each village.

9 Jalan and Ravallion (2002) provide a micro model of growth with imperfect capital markets that is consistent with this property, and find supportive evidence in the same region of rural China.

The 1996-2000 and 2004/05 surveys included community, household and individual questionnaires. The community schedule collected data on natural conditions, infrastructure and access to services. The household survey collected data on (inter alia) incomes, consumption and assets. The individual questionnaires covered gender, age, education and occupation. A data set was collected from 1997 to 2001 on development project activities (both SWP and under other existing government programs). There are 34 project activities identified in these data, in seven categories (farming, animal husbandry, forestry, infrastructure, education, health and labor migration).11

We follow Ravallion and Chen (2005) in using 1996 as the baseline. There are serious comparability problems between the 1995 survey and later surveys.12 As a baseline, the 1996 data are not free of contamination; 17% of the program's total disbursement on household projects had been made by the end of 1996. We check robustness to using 1995 as the baseline.

Relative to other household surveys, unusual effort went into obtaining accurate estimates of consumption and income from the 1996-2000 and 2004/05 surveys.
While the community, individual and project activity surveys used conventional one-time interviews, the household survey was quite different. The household surveys from 1996 onwards were closely modeled on NBS's Rural Household Survey (RHS) (which is described in detail in Chen and Ravallion, 1996). This is a good quality budget and income survey, notable in the care that goes into reducing both sampling and non-sampling errors. Similarly to the RHS, sampled households maintain a daily record on all transactions plus log books on production. Local interviewing assistants visited each household at two- to three-week intervals, at which time inconsistencies found at the local (county-level) NBS office are checked. Other trained interviewers also visited at regular intervals to collect additional data. This intensive interviewing method is in marked contrast to most surveys, in which the respondent is visited only once or twice.

10 In the 2004/05 survey, two villages (one SWP and one not) were inadvertently replaced by two different villages in the same township.

11 A project activity survey in 1998 also gathered information about the scale and the starting year of each SWP sub-project at village and household level, as well as data on other funding these villages and households received from the government and other sources.

12 Because of delays in NBS being told the locations of SWP villages, the first survey in December 1995 had to use a one-time interview method, asking recall over the full year. The use of this long recall period is likely to lead to underestimation of income and consumption (though this is of less concern for village-level characteristics). The subsequent surveys used the daily-diary method over the full year, allowing more accurate income and consumption data.
The consumption aggregate is built up from very detailed data on cash spending on all commodities and imputed values of in-kind spending, which is mainly consumption from household production, valued at local selling prices. Living expenditures exclude spending on production inputs (which are accounted for in net income from own-production activities). They also exclude transfer payments, though these only account for a small share of total spending (3.7% over the whole sample in 1996).

The income aggregate includes cash income from all sources and imputed values for in-kind income. Income is measured net of all production costs, including interest on debt (including loans from the SWP). The migrant workers were not tracked, although the income aggregate includes remittances received from family members who migrated, including those supported by the SWP. Remittances are expected to be the main means by which the out-migration component reduced poverty in the short run.

Given the unusual effort that went into collecting and checking the consumption and income data, we expect that subtracting consumption from income will give reasonably accurate estimates of savings. We also look into what forms the savings took. There are many forms of saving in this setting, including money balances and investment in own-production activities. The survey was not designed to allow a complete independent accounting of all forms of saving. Some data were collected on assets and liabilities, although the reliability of the reported values is questionable. We also study impacts on holdings of specific assets.

For the 2004/05 follow-up survey we used exactly the same survey instrument as for the 1996-2000 surveys, augmented with a module to elicit perceptions of both welfare and the project's impacts. The module asked respondents to assess whether various aspects of their lives had improved over the preceding 10 years. (The questions in this module were asked in 2005.)
These involved a long list of aspects of well-being and in each case the respondent was asked whether this item had improved or not over the last 10 years, on a 10 point scale (from "extremely worse off" to "extremely better off").13 (The sample was restricted to adults who were at least 28 years of age at the time of the interview.) Our idea here is to see whether such a rapid appraisal tool -- which does not require any prior surveys, including a baseline -- gives similar results to our more costly longitudinal survey-based method. Over 1996-2005, the attrition rate was 12% (6% over 2000-05). Using a probit model for attrition over 1996-2005 we found a number of significant predictors, including age of head, share of children, landholding and some geographic variables.14 (Being an SWP village was not a significant predictor of attrition.) NBS survey teams were instructed to find replacement households as similar as possible to those that dropped out. We also tested how well this replacement worked, using a regression for the probability of being a replacement household estimated on the pooled sample of replacement and "drop-out" households.15 Among the same set of covariates for attrition, no regressor was a significant predictor for replacement and the regression has very low overall explanatory power. It appears that the sample with replacements can be considered representative of the population. We checked the robustness of our results to several potential data problems. One problem concerns the aggregation of total living expenditures. It appears that in processing the 2004/05 survey data, living expenditure in one county may have failed to include in-kind consumption.16 The data for three other households whose in-kind income was more than six times larger than their total living expenditure seem to have a similar problem. 
We re-estimated the impacts on consumption and income, dropping this one county and the three households. The results reported below were robust to this change (details available from the authors).

13 The Chinese and local language versions of the module were refined over time on the basis of field tests in poor villages in a number of locations.

14 A statistical addendum is available from the authors giving full details.

15 Note that we have baseline data for the "drop-outs" and the current year's data for the replacements. To deal with the time difference we did a pro-rata adjustment of the data on drop-outs to 2004 values according to the ratios of the means over time for each variable, based on the balanced panel. In calculating the ratios, we also weighted by the attrition probability.

16 We suspect there is a problem because the total living expenditure of 68% of the sample in that county is equal to cash expenditure, whereas net in-kind income is about half of overall total.

Another potential data problem is related to the coding of SWP projects. We find in the village-level project data base that all ten villages in one county claim to have a project funded by the SWP, even though six of them were officially designated as non-SWP villages.17 It may well be that there was significant SWP participation by villages that had not been selected for the project in this particular county (although we cannot rule out coding errors). On deleting this county we found that our main results were robust (details are available from the authors).

4. Estimation methods and sources of bias

Our aim is to estimate average treatment effects on the treated. The double-difference ("difference-in-difference") method identifies a project's impact under the assumption that the selection bias (the counterfactual difference in outcomes) is constant over time and additive in its effect on outcomes.
In the present context, we point to two sources of time-varying selection bias: (i) outcome changes are correlated with initial differences between the participating and non-participating areas, and (ii) spillover effects, whereby the project itself alters the subsequent path of outcomes for the non-participants.

17 There are scattered minor reports of SWP activity in non-SWP villages elsewhere, but these appear to be random, and are probably coding errors.

4.1 Biases due to targeting

Let us begin with the classic evaluation problem. We have data on an outcome measure $Y_{it}$ for the i'th unit observed at dates t=0,1. Each unit is observed to be either a participant ($T_{it}=1$) or non-participant ($T_{it}=0$). We can write the outcome measure as:

\[ Y_{it} = Y_{it}^{C} + T_{it} G_{it} \qquad (t=0,1;\ i=1,\dots,N) \tag{1} \]

where $G_{it} = Y_{it}^{T} - Y_{it}^{C}$ is the gain ("impact"), $Y_{it}^{T}$ is the outcome under treatment and $Y_{it}^{C}$ is the counterfactual outcome. $G_{it}$ is not directly observable for any i (or in expectation) since we do not know $Y_{it}^{T}$ for $T_{it}=0$ and $Y_{it}^{C}$ for $T_{it}=1$. The selection bias is the mean difference in counterfactual outcomes (dropping the i subscripts):

\[ B_t = E(Y_t^{C} \mid T_1 = 1) - E(Y_t^{C} \mid T_1 = 0) \tag{2} \]

We call this the unconditional bias, given that we have not yet allowed for control variables. Given the purposive targeting of the SWP it must be presumed that $B_t \neq 0$. The standard double-difference estimator assumes that $B_1 = B_0$, implying that the change in mean gains for period 1 participants is consistently estimated by:

\[ DD = E[(Y_1^{T} - Y_0^{T}) \mid T_1 = 1] - E[(Y_1^{C} - Y_0^{C}) \mid T_1 = 0] = E[G_1 - G_0 \mid T_1 = 1] \tag{3} \]

If period 0 is a true baseline, with $T_{i0} = 0$ for all i (by definition), then $Y_{i0} = Y_{i0}^{C}$ for all i, and so $DD = E(G_1 \mid T_1 = 1)$, i.e., mean impact on the treated units.

However, time-invariant unconditional bias ($B_1 = B_0$) is implausible for poor-area development programs.
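To make the role of the assumption $B_1 = B_0$ concrete, the following is a minimal numerical sketch in Python. It is not the authors' code; the four units, their outcome values and the helper name `double_difference` are invented purely for illustration. It computes the sample analogue of DD when counterfactual growth is the same in both groups, and again when the treated group would have grown more slowly without the program.

```python
import numpy as np

def double_difference(y0, y1, treated):
    """Mean outcome change among treated units minus that among untreated units."""
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    change = y1 - y0
    return change[treated].mean() - change[~treated].mean()

# Hypothetical data (invented for illustration only).
treated = np.array([True, True, False, False])
yc0 = np.array([90.0, 90.0, 100.0, 100.0])  # baseline outcomes, so B_0 = -10
gain = 8.0                                   # true impact G_1 on treated units

# Case A: counterfactual growth is 5 for everyone, so B_1 = B_0.
y1_a = yc0 + 5.0 + gain * treated
dd_a = double_difference(yc0, y1_a, treated)

# Case B: treated units would have grown by only 3, so B_1 - B_0 = -2.
y1_b = yc0 + np.where(treated, 3.0, 5.0) + gain * treated
dd_b = double_difference(yc0, y1_b, treated)

print(dd_a)  # equals the true gain of 8
print(dd_b)  # equals 8 + (B_1 - B_0) = 6, i.e. biased downward
```

In case B the estimator understates the gain by exactly the change in the selection bias, which is the pattern one would expect if untreated growth diverges between targeted and non-targeted areas.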
The targeted poor areas typically lack infrastructure and other initial endowments, which could (in turn) affect the subsequent growth rates. DD will then be a biased estimator, since the subsequent outcome changes are a function of initial conditions that also influenced the assignment of the sample between the two groups. In other words, the selection bias will not be constant over time.18

The direction of bias in DD depends on whether the underlying growth process is convergent or divergent. For the government's poor-area programs in southwest China in the 1980s, Jalan and Ravallion (1998) found that failure to control for the initial heterogeneity between the targeted counties and non-participating counties yields a downward bias in a DD estimator, consistent with growth divergence.19 However, it is unclear whether this also holds across villages within the same (poor) counties; indeed, the results of Jalan and Ravallion (2002) (also for southwest China) suggest that inter-county divergence can occur side-by-side with intra-county convergence.

We address this issue by balancing treatment and comparison units in terms of the initial conditions that may have influenced program placement. These variables are represented by the vector X. Our key identifying assumption is that the selection bias is time-invariant conditional on X, i.e., that:

\[ E(Y_1^{C} \mid T_1 = 1, X) - E(Y_1^{C} \mid T_1 = 0, X) = E(Y_0^{C} \mid T_1 = 1, X) - E(Y_0^{C} \mid T_1 = 0, X) \tag{4} \]

18 This echoes more general concerns about the importance of correcting for selection bias based on observables (Rosenbaum and Rubin, 1983; Heckman et al., 1998).

19 Also see Jalan and Ravallion (2002) who find evidence of divergence at the county level.

On applying a result due to Rosenbaum and Rubin (1983), if outcome changes are independent of participation given X, then they are also independent of participation given the propensity score: $P(X_i) = \Pr(T_i = 1 \mid X_i)$, $(0 < P(X_i) < 1)$.
This justifies balancing on P(X) to remove selection bias based on X. Note that this only addresses time-varying selection bias based on observables; a bias will remain if there are any latent (time-varying) factors correlated with the changes in counterfactual outcomes. As discussed later, a remaining bias due to unobservables appears to be more likely for household selection than village selection. We use various methods for assuring balance on P(X). One method is to limit comparisons to a trimmed sub-sample with sufficient overlap in propensity scores. For our data, the region of common support (minimum score for treated, maximum score for untreated) is (0.11, 0.95). For our "trimmed sample" we chose a slightly tighter interval (0.1, 0.9), whose endpoints are also the efficiency bounds recommended by Crump et al. (2006) for estimating average treatment effects with minimum variance.20 We also use the weighted-regression method proposed by Hirano, Imbens and Ridder (2003). Thus we estimate the DD from the following regression:

$$Y_{it} = \alpha + DD \cdot T_{i1}t + \gamma T_{i1} + \delta t + \varepsilon_{it} \quad (t=0,1;\ i=1,\ldots,N) \tag{5}$$

where $E(\varepsilon_{it} \mid T_{i1}) = 0$. This is estimated with weights of unity for treated units and $\hat{P}(X)/(1-\hat{P}(X))$ for controls, where $\hat{P}(X)$ is a consistent estimate of P(X) and $0 < \hat{P}(X) < 1$. Hirano et al. show that weighting the controls this way yields an efficient estimator.21 We estimate (5) on the pooled sample (for t=0,1 and including replacement households), for both the total sample and the trimmed sample.

20 Using the formula in Crump et al. (2006), the exact bounds are 0.0997 and 0.9003. For estimating the average treatment effect on the treated Crump et al. also recommend dropping treatment units with scores less than about 0.8 (for our data), but keeping all un-treated units. We did not follow this recommendation. For one thing, we felt that this entailed the loss of too many treatment villages, raising concerns about inference for the population of treated villages.
Secondly, our balancing tests performed better when we also deleted the low-score untreated villages, which are clearly poor comparison units.

21 If we wish to estimate the average treatment effect for the population, the weights are $1/\hat{P}(X)$ for the treated units and $1/(1-\hat{P}(X))$ for the controls (Hirano and Imbens, 2002).

To interpret (5) note that, in a balanced panel, we could instead estimate the equivalent regression in the more familiar "fixed-effects" form:

$$Y_{it} = \alpha^* + DD \cdot T_{i1}t + \delta t + \eta_i + \nu_{it} \tag{6}$$

Here the fixed effect is $\eta_i = \eta^T_i T_{i1} + \eta^C_i(1-T_{i1}) = \gamma T_{i1} + \bar{\eta}^C + \mu_i$ (with $E(\eta_i \mid T_{i1}) \neq 0$), where $\gamma = \bar{\eta}^T - \bar{\eta}^C$, $\mu_i = (\eta^T_i - \bar{\eta}^T)T_{i1} + (\eta^C_i - \bar{\eta}^C)(1-T_{i1})$, $E(\mu_i) = 0$, $\varepsilon_{it} = \nu_{it} + \mu_i$ and $\alpha = \alpha^* + \bar{\eta}^C$. Thus, $\gamma T_{i1}$ in (5) picks up differences in the mean of the latent individual effects, such as would arise from initial selection into the program. The advantage of (5) is that it does not require a balanced panel, and hence it gives estimates that are robust to selective attrition (recalling that the replacements appear to have preserved the sample's ability to represent the population). As a robustness check, we compare these estimates with matching on the propensity score. Note first that the sample estimate of mean impact can be written as:

$$\sum_{i=1}^{N_T}\Big[(Y^T_{i1} - Y^T_{i0}) - \sum_{j=1}^{N_C} W_{ij}(Y^C_{j1} - Y^C_{j0})\Big] \Big/ N_T \tag{7}$$

where $N_T$ is the number of SWP participants, $N_C$ is the number of control observations, and $W_{ij}$ is the propensity score-based weight given to the j'th non-participant in making a comparison with the i'th participant. How many non-participants to include in the control group and how to assign weights to each non-participant are practical questions in implementing PSM. One option is to use the popular method of nearest-neighbor matching. However, because of the non-smoothness of nearest neighbor matching, the conventional bootstrapping method is inappropriate for estimating the standard errors (Abadie and Imbens, 2006).
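
Because the weighted regression (5) is saturated in the treatment and period dummies, its DD coefficient equals a double difference of weighted cell means, which does not require a balanced panel. The Python sketch below illustrates that calculation; the function names, the record layout and the toy numbers are assumptions made for illustration, not the authors' code or data.

```python
from typing import Iterable, Tuple

# Each record: (treated T_i1 in {0, 1}, period t in {0, 1}, outcome Y_it, estimated score P_hat).
Record = Tuple[int, int, float, float]


def odds_weight(treated: int, p_hat: float) -> float:
    """Weight of 1 for treated units and P_hat / (1 - P_hat) for controls."""
    if not 0.0 < p_hat < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 if treated == 1 else p_hat / (1.0 - p_hat)


def weighted_dd(records: Iterable[Record]) -> float:
    """DD coefficient of the saturated weighted regression, via weighted cell means."""
    # For each (group, period) cell keep [sum of w * y, sum of w].
    sums = {(g, t): [0.0, 0.0] for g in (0, 1) for t in (0, 1)}
    for treated, period, y, p_hat in records:
        w = odds_weight(treated, p_hat)
        sums[(treated, period)][0] += w * y
        sums[(treated, period)][1] += w
    mean = {}
    for cell, (wy, w) in sums.items():
        if w == 0.0:
            raise ValueError(f"no observations in cell {cell}")
        mean[cell] = wy / w
    return (mean[(1, 1)] - mean[(1, 0)]) - (mean[(0, 1)] - mean[(0, 0)])


# Hypothetical toy data: two treated and two control units, each seen in both periods.
toy = [
    (1, 0, 100.0, 0.6), (1, 1, 160.0, 0.6),
    (1, 0, 120.0, 0.8), (1, 1, 170.0, 0.8),
    (0, 0, 150.0, 0.5), (0, 1, 170.0, 0.5),
    (0, 0, 200.0, 0.2), (0, 1, 240.0, 0.2),
]
dd_hat = weighted_dd(toy)
```

In the toy data the control with the higher score receives four times the weight of the other control, so the comparison change is pulled towards the control that most resembles the treated units.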
In order to assure valid bootstrapped standard errors, we choose to apply nonparametric kernel matching in which all the non-participants are used as controls and weights are assigned according to a kernel function of the predicted propensity score (following Heckman et al., 1997, 1998). The weights can be written as $W_{ij} = K_{ij} / \sum_k K_{ik}$, where $K_{ij} = K((\hat{P}_j(X) - \hat{P}_i(X))/a_n)$, in which $K(\cdot)$ is a kernel function and $a_n$ a bandwidth parameter. We use the normal density function as the kernel and the odds ratio (rather than propensity score) because SWP villages are over-sampled relative to their frequency in the population eligible for the project.

The conditional independence assumption motivates a specification test of whether there are differences in observables between the project and non-project villages after conditioning on P(X) through matching or re-weighting. Following Rosenbaum and Rubin (1985) and Abadie and Imbens (2006) we test for covariate balancing using differences in standardized means between the SWP villages and matched or re-weighted non-SWP villages. To achieve a better balance of covariates and to allow for a more flexible estimate of propensity scores, we also include polynomial terms for the initial income levels (see, for example, Smith and Todd, 2005). We will show that the matching and re-weighting procedures produce a satisfactory balancing of the observables between SWP and comparison villages.

4.2 Biases due to spillover effects

All the methods described above assume that an observationally similar comparison group pre-intervention reveals the counterfactual of what would have happened over time to mean outcomes for the treatment group in the absence of the intervention. This will clearly not be the case if there are any spillover effects, whereby the intervention changes outcomes for non-participants.
Spillover effects due to residential mobility between villages are unlikely in this setting given the village-level administrative land allocation. Under China's rural land laws, a migrating household would have little prospect of getting a share of the land available (and almost certainly cultivated) at the destination and would also risk losing their land at the origin. Another source of spillover effects is inter-village trade (possibly via urban hubs). To the extent that the project has an impact on local incomes and prices, trade-induced general equilibrium effects will entail spillover effects to the non-SWP villages used to infer the counterfactual. We will test for impacts on prices as well as incomes, distinguishing cash-incomes (as derived from inter-village trade) and incomes-in-kind. Local public spending responses to project aid can also be confounding. Recall that there were other development activities supported by the local (county and provincial) governments, side-by-side with the aid-financed SWP. Non-SWP villages could then be affected by a SWP-induced re-allocation of public spending by local authorities. If the SWP does not have a lasting impact then the bias will probably be confined to the disbursement period. However, if there are lasting impacts (observable to local authorities) then one would expect the local spending response to the SWP to continue beyond the disbursement period. A theoretical model of the local public spending response to external aid can help inform an assessment of the likely bias. Let $GOV_j$ denote the local government's spending on its own poor-area development programs in village j=SW, NSW, which index the SWP (project) and non-SWP (comparison) villages, and total spending is $GOV = GOV_{SW} + GOV_{NSW}$. (We treat the two groups as having equal size but this does not change the main result as long as the group sizes are fixed.)
The external aid provides extra spending in the amount AID in the project villages, so that total spending on poor-area programs in the SWP villages is $GOV_{SW} + AID$. The local government has a preference ordering over its spending allocation across the two sets of villages and its spending on all other activities, denoted Z. The preference ordering is represented by $W(GOV_{SW} + AID, GOV_{NSW}, Z)$, and this function is strictly increasing in all three elements, and strictly concave in all three; it simplifies the analytics if we also assume that the function is additively separable, though this can be weakened. The local government maximizes W subject to its local revenue constraint, which creates an upper bound on GOV+Z. Under these assumptions we have the following result (that is proved in Appendix 1):

Proposition: The external aid will displace local government spending in the project villages, increase spending in the comparison villages, but decrease total local government spending across both sets of villages.

The implication for our evaluation is plain: Comparing outcome changes over time between SWP and (matched) non-SWP villages in the same counties will under-estimate the project's true impact. We will test for spillover effects. The presence of non-SWP development projects in the SWP villages provides the clue. We use the same evaluation methods described above, but the "outcome variable" becomes the extent of non-SWP project activity in the SWP villages. The theoretical result in the above proposition will be exploited in determining an upper bound to the bias induced by spillover effects.

5. Estimated impacts

Table 1 gives mean income and consumption and the poverty rates for 1996, 2000 and 2004/05. We use the poverty line of 808 yuan per person per year in 1995 prices (corresponding to the $1 a day line used by Ravallion and Chen, 2005, at 1993 purchasing power parity) as well as poverty lines above and below this figure.
We see that the income gains in SWP villages between 1996 and 2000 were larger than among non-SWP villages, but that this reverses between 2000 and 2004/05. Ten years after its commencement, the SWP does not appear to have allowed the selected poor villages to catch up with the rest of these (poor) counties. Table 1 suggests that SWP had little or no impact on income and consumption. However, before accepting that conclusion we need to probe more deeply into the potential sources of bias described in the previous section. We begin with selection bias due to non-random placement of the SWP. At the end of this section we test for bias due to contaminating spillover effects. 5.1 Probits for selection into the SWP Table 2 gives probits for whether a village was selected for SWP, as used to estimate the propensity scores. The variables were chosen to reflect the selection criteria used by the project staff (based on our interviews at the time). We find that project villages tend to be in more hilly/mountainous areas, are less likely to have electricity, less likely to have a school in the village or nearby, though more likely to have a health clinic within the village relative to nearby.22 The SWP villages also tend to have larger populations, with lower mean income in 1995 (from the village-level data), lower mean consumption in 1995 (from the household survey) and more land per capita. The latter characteristic probably reflects lower population density and lower land quality in the project villages. In most respects, the results of Table 2 suggest that the SWP villages tend to be poorer than other villages within the project counties, consistently with Table 1. 
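
Given fitted scores from a probit such as the one in Table 2, the trimming and balance-checking steps described in section 4.1 could be sketched as below. This is an illustration only: the helper names and the example villages are hypothetical, and the standardized difference follows one common convention (difference in weighted means divided by the square root of the average of the two unweighted group variances), which the paper does not spell out.

```python
from math import sqrt
from statistics import variance


def trim(units, lo=0.1, hi=0.9):
    """Keep units whose estimated propensity score lies inside [lo, hi]."""
    return [u for u in units if lo <= u["p_hat"] <= hi]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def standardized_difference(units, covariate):
    """Percent standardized difference, with controls re-weighted by the odds P/(1-P)."""
    treated = [u for u in units if u["treated"] == 1]
    controls = [u for u in units if u["treated"] == 0]
    x_t = [u[covariate] for u in treated]
    x_c = [u[covariate] for u in controls]
    w_c = [u["p_hat"] / (1.0 - u["p_hat"]) for u in controls]
    gap = weighted_mean(x_t, [1.0] * len(x_t)) - weighted_mean(x_c, w_c)
    pooled_sd = sqrt((variance(x_t) + variance(x_c)) / 2.0)
    return 100.0 * gap / pooled_sd


# Hypothetical villages: treatment status, fitted score, initial mean income (yuan).
villages = [
    {"treated": 1, "p_hat": 0.97, "income": 400.0},   # outside the trimmed interval
    {"treated": 1, "p_hat": 0.80, "income": 500.0},
    {"treated": 1, "p_hat": 0.60, "income": 700.0},
    {"treated": 0, "p_hat": 0.75, "income": 550.0},
    {"treated": 0, "p_hat": 0.50, "income": 750.0},
    {"treated": 0, "p_hat": 0.05, "income": 1200.0},  # outside the trimmed interval
]
trimmed = trim(villages)
before = standardized_difference(villages, "income")
after = standardized_difference(trimmed, "income")
```

In this constructed example the standardized difference in initial income shrinks once the two extreme-score villages are dropped, which is the pattern a balancing test looks for.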
Using the propensity scores based on Table 2 to re-weight the data we were able to obtain a close balancing of the characteristics of the two samples (including in the means of the initial outcome variables), particularly after trimming the samples, as discussed in the previous section. Appendix 2 provides details on the balancing tests, which pass comfortably; this was also the case for a full set of covariates in Table 2, for which the balancing tests are reported in the Addendum available from the authors.

22 Remote villages are more likely to have a very basic health clinic, to compensate for the inaccessibility to more comprehensive township facilities.

5.2 Double-difference estimates of average impacts

In assessing impacts on mean consumption and income, we begin with the simple DD estimates of the mean impacts for income, consumption and saving, as given in Table 3. We give estimates for both 2000 (at the end of disbursements) and 2004/05 and for both the levels and the logs; the latter gives higher weight to the gains to poorer households. The baseline is 1996 in both cases. Focusing first on the disbursement period, we see a sizeable and statistically significant impact on income but not consumption; the bulk of the income gain was saved. (The same pattern was found using 1995 as the baseline.) On decomposing income (as wage income, farming, animal husbandry, fishery, forestry, non-farm enterprises, transfers and asset income), the only component that showed a statistically significant impact was animal husbandry, for which the simple DD impact on net income was 90.85 yuan (t=2.92), which rose to 117.26 (t=3.37) and 136.15 (t=3.55) using weighting and matching (respectively) to correct for selection bias (Table 4).23 Another way of disaggregating income is into cash or kind (which will be relevant when we consider trade spillovers in section 5.4).
We found that the bulk of the short-term income impact was income in-kind from animal husbandry, as is evident from Table 4. This is puzzling, as a sizeable share of income-in-kind from husbandry in a rural economy is also consumed directly, and should then show up in consumption. However, the income in-kind that is being affected by the project appears to be small non-productive animals and new litters of productive animals, which are counted as income in kind but are held over for consumption or sale at a later date rather than consumed.24 We will return to this point when we discuss the longer-term impacts.

23 We only report the results for husbandry, and summarize those for other components; a statistical addendum is available with full details.

24 We do not have data on this, but the practice of counting such animals as income in-kind is discussed in the manual for enumerators provided by NBS.

We can also disaggregate consumption expenditure. On separating food staples (rice, wheat etc.) from non-staples and other foods we found significant impacts in 2000 for non-staple foods (meat, vegetables etc.); the simple DD for this category was 26.26 yuan (t=1.68) though rising to 40.64 yuan (t=2.69) and 42.58 yuan (t=2.70) for the PS weighted and kernel matched estimators respectively. This is likely to entail nutritional gains through higher protein and more micro-nutrients. The results change dramatically when we track the impacts through to 2004/05, as is evident when we return to Table 3. We find no significant impacts on mean income or consumption over the longer observation period.25 (This was also true for staples and non-staples separately.) Table 3 also gives the DD estimates for mean income using the propensity scores to balance project and non-project villages; we give results using both weighting and matching, for both end dates, and for both the trimmed sample and total sample. The basic pattern in the simple DD estimates is still evident.
The results are robust to using kernel matching instead of the re-weighted regression method.26 While there is clearly some sensitivity to the choice of estimation method, the pattern is still reasonably robust, indicating significant and sizeable income gains during the disbursement period but much less in the longer term. The estimated income gains in 2000 tend to be larger when we correct for purposive selection of SWP villages; this is consistent with a divergent growth process between villages. However, no such pattern is evident for the 2004/05 impacts. We did find significant longer-term impacts on income in-kind. On breaking up income in-kind by source, we found that both farming and husbandry accounted for almost all these long-run impacts, though only husbandry was significant (Table 4). The simple DD estimate of impact in 2004/05 on income in-kind was 130.30 yuan (t=2.11), though this fell somewhat when we corrected for selection; with weighting we obtained DD=111.90 (t=1.89) while with matching, DD=96.98 (t=1.78). We found no other significant impacts in the long run amongst cash income components.

25 The same pattern was evident using 1995 as the baseline, although impacts were somewhat lower.

26 The results were also robust to deleting the troublesome county and the observations with problematic data (section 3).

In contrast to the period up to 2000, we find consumption gains in the post-disbursement period. The impact on total consumption in 2004/05 is not statistically significant (Table 3). However, when we break this up according to cash or kind, we do find signs of larger impacts on consumption in kind. The simple DD estimate for consumption in kind in 2004/05 is 118.40 yuan (t=2.54), although this drops appreciably when we correct for selection bias; using PS weighting the impact is 74.46 (t=1.50).
The longer-term impacts on consumption in kind probably include consumption of the income in-kind from animal husbandry that we observed in the SWP disbursement period. For either the simple DD or the score-weighted DD, the consumption gains exceed what one could reasonably expect under the permanent income hypothesis (PIH) if the income gains from SWP were purely transient. For then the consumption gains in the four-year period following SWP would simply be the rate of interest times the permanent-income equivalent of the transient income gain. For the simple and score-weighted DD, plausible rates of interest would imply lower consumption gains than we see in Table 3, although this is not true for the kernel-matched DD for which the post-disbursement consumption gains equal the increment to permanent income at a rate of interest of about 10%. Statistically, however, we cannot reject the null hypothesis that the post-disbursement consumption gain equals the increment to permanent income (at reasonable interest rates) treating the SWP income gain as transient. The PIH interpretation begs the question as to why we saw no consumption gains in the disbursement period. If SWP participants knew at the outset that the project would entail only a transient income gain then consumption would have immediately reflected the implied gain to permanent income. However, from what we know about the SWP, it is unlikely that participants could have formed a reliable estimate of the gain to permanent income due to SWP until at least project completion. As noted in section 2, there was considerable uncertainty about the income gains, and high initial savings may have been a short-term precautionary response. We found no evidence of impacts on interest payments on loans or the proportion of households paying interest or paying back loans, for either 2000 or 2004/05.27 So we find no support for the idea that either the high savings from the short-term gains or the lower longer-term impacts on incomes stem from greater enforcement of interest or repayment requirements under the SWP, compared to other credit sources.

27 Again we only summarize the results here; the addendum gives full details.

With weak enforcement of the SWP loan repayments, it might be conjectured that taxes on SWP areas would increase, to help local authorities pay back the SWP loans to higher levels of government. However, we did not find any evidence of impacts on taxes or fees paid per capita, in either 2000 or 2004. It appears that higher levels of government treated the SWP as, in large part, a transfer payment to lower levels. In testing for impacts on agricultural productivity, we used total farm income per unit area.28 We found no evidence of impacts. Nor did we find much evidence of impacts on holdings of productive assets and wealth (including housing). This was true for both the disbursement period and the longer-term. An exception is that the village data base revealed a significant impact on livestock holdings, notably cows and goats.29 There is some sign of a demographic impact. Household size fell in both SWP and non-SWP villages over 1996-2000, but more so in the former. The simple DD for household size is -0.13 persons (t=-1.75) and it is slightly larger with the corrections for selection bias (the PS-weighted estimate is -0.16, t=-1.64, and it was similar for kernel matching). The demographic effect was associated with slightly fewer children. However, the demographic impact was not evident in 2004.
Nor did we find any evidence of impacts on remittances received from family members migrating out, or on the probability of a family member migrating.30 We did find significant impacts on school enrollment rates during the disbursement period; our PS-weighted DD estimate was 0.074 (with a t-ratio of 2.20), i.e., a 7.4 percentage point increase in the school enrollment rate of children aged 6-14 by the year 2000 is attributed to SWP.31 However, this impact had dropped substantially by 2004/05; the corresponding DD estimate fell to 0.032 (t=1.00). The transient schooling impact probably reflects the fact that the tuition subsidies ended with other SWP disbursements. Of course, even though the non-SWP villages caught up substantially with the SWP villages in schooling by 2004/05, there were children in SWP villages who entered school earlier than they would have without the SWP, and this will probably yield future income gains.

28 Ideally we would use physical output for a given crop per unit area under its cultivation. However, only total land area under cultivation was collected. Instead we used an overall farm productivity measure, obtained by dividing total net income from farming by total cultivated area; this can be interpreted as a mean of crop-specific yields weighted by both prices and shares of land.

29 The simple DD for cows per person in 2000 was 0.05 (t-ratio=2.47); with score-weighting it rose to 0.07 (t=3.54) and it was the same with kernel matching (t=4.33). By 2004 the impacts were slightly higher and equally significant statistically; the simple DD estimate was 0.07 (t=3.69) while with score-weighting the impact was 0.09 (t=4.05) and with kernel matching it was 0.10 (t=3.92). Significant impacts were also evident for sheep, although with lower t-ratios.

30 Out migration in the previous year is only measured for those present in the village at the time of the interview, although NBS made an effort to ask the individual questions at times of the year when migrants are more likely to be present. Remittances may well be the better indicator.

There was almost no sign of impacts on the prices of agricultural outputs and purchase prices for inputs for 13 items.32 We found positive impacts during the disbursement period for a number of types of infrastructure, although they are generally not statistically significant. We found little sign of impacts in the 2004/05 data. The exception was TV reception, which showed significant impacts in the longer-term as well as during the disbursement period. Table 5(a) gives the estimated impacts on the incidence of income poverty for various poverty lines; Table 5(b) gives the corresponding results for consumption poverty. Again we give estimates using the poverty line of 808 yuan per person per year as well as selected poverty lines above and below this figure.33 The poverty impacts in the SWP disbursement period are broadly consistent with our findings for the impacts on the mean income and consumption in Table 3.34 In Figure 1 we also give the results graphically, by plotting the DD estimate of the impact on the headcount index of poverty (for income and consumption poverty in panels (a) and (b) respectively) against the poverty line, which we vary over virtually the whole distribution. Impacts on the income poverty rate are largest just below the 808 poverty line, for both end dates. The impacts on consumption poverty echo our results for mean consumption around the middle of the range of poverty lines, where 2004/05 consumption-poverty impacts exceed those for 2000; the results imply a sizeable nine percentage point drop in the consumption poverty rate at poverty lines around 600 yuan. However, this is not true at lower and higher lines, where impacts over the two time periods agree fairly closely.

31 The uncorrected DD was 0.046 (t=1.41) and the kernel matched DD was 0.072 (t=2.40).

32 The only exceptions were that diesel oil had a significantly higher price in the SWP villages by 2004/05 and edible oil crop had a slightly lower price.

33 The table only gives results for the trimmed sample, which is better balanced. However, although the precise estimates differ between the two samples, the basic pattern was the same, and our main conclusions do not depend on this choice.

34 The results were also robust to deleting the county in which some SWP activity was recorded in non-SWP villages. We found an impact on extreme consumption poverty in 2004 after deleting the consumption outliers. (The weighted DD at the 500 consumption poverty line is -8.06 with t-ratio of -1.72; the weighted DD at 600 is -9.20 with t-ratio of -1.67.)

For all of the above impact estimates, the counterfactual is the absence of the SWP. There is an alternative counterfactual of interest, namely the absence of direct participation in any anti-poverty program, including the government's programs. For identifying this counterfactual we can use those households in non-SWP villages who did not participate in any other program; this applied to 69% of the households in non-SWP villages. So we repeated the above calculations dropping those who recorded any direct participation in other programs. (The balancing tests passed comfortably.) The impacts for 2000 were similar to those above. However, the long-run impacts on mean income and consumption were larger. For example, the simple DD estimate of the impact on mean income in 2004 rose to 125 yuan per person (as compared to 45 yuan in Table 5) although this fell to 99 yuan when we corrected for selection bias using PS weighting. Nonetheless, the impacts relative to this alternative counterfactual were still not significantly different from zero; for example, the t-ratio on the simple DD for mean income was 1.47, which dropped to 1.13 with PS weighting.
5.3 Heterogeneity in impacts

We tested for differences in impacts according to the initial values of income, education and ethnicity.35 The score-weighted DD's were not significantly different for any of our outcome variables when we stratified by education or ethnicity. However, we found a notable difference when stratified by initial income (above or below the median), with significant longer term gains for the low-income group. When we interacted income with education we found that the longer-term gains were strongest for the relatively well educated (at least junior high school) amongst the low-income households, as can be seen in Table 6.

35 We distinguish Han Chinese from all other ethnic minorities. The ICR points to concerns about how well ethnic minorities were reached by the SWP (World Bank, 2003).

The heterogeneity in returns suggests that a different assignment of the loans would have increased overall impact. The household participation rate was slightly higher for the group of relatively poor but well educated households; 61.1% of this group in SWP villages participated, as compared to 58.8% of those with above median income and higher education, 50.0% of those with high income but low schooling, and 47.8% of those with both low income and schooling. (The program slightly favored better educated households both above and below median income.) Suppose that beneficiary selection had focused solely on the relatively well-educated poor, and saturated this group, with no change to conditional mean impacts by subgroup, which were zero for other groups (consistently with Table 6).
Then the impact of the program as a whole would have risen substantially, from a mean impact of about 40 yuan per person to about 150 yuan.36 To achieve this outcome, the program would have had to over-ride the community-based selection process, which evidently put too little weight on reaching the educated poor, even though this group was already favored in the selection process. While we found no impacts on average remittances and out-migration, significant positive impacts were evident when we stratified by initial income and education; the impacts were significant for those who were initially above median income and (among those with above-median income) were larger for those with more schooling.

5.4 Are we underestimating the impacts due to spillover effects?

Biases in long-term impact estimates can arise from interference due to spillover effects, as discussed in section 4.2. Our results do not offer much support to the idea of trade-induced spillover effects. We have seen that there were no significant impacts on prices, although it might be argued that arbitrage eliminated any price differentials. More damaging to the notion that there were significant trade-spillovers across villages is the fact that we did not find significant impacts on cash income, even during the disbursement period; the short-term income gains were in kind, and mainly from animal husbandry. Since inter-village trade is likely to involve cash, there must be a presumption that such trade was affected rather little by SWP. What about bias due to the responses of the local political economy? From the data on project activities, we counted the number of new non-SWP projects of each type that started between 1996 and 2001 (inclusive). (So this is the change in the number of non-SWP projects during the period.)
For the loans made to households, the project data also give counts of the total number of beneficiary households. However, we cannot tell what happened in the post-disbursement period since it was only possible to collect the project data we use for these calculations during the SWP disbursement period.

36 This is based on an impact of about 200 Yuan for this group (Table 5), scaled down by 25% to reflect the number of households in this group, which would then represent 75% of the total number of SWP participants.

Table 7 gives the results for various project activities.37 Large displacement effects are evident for virtually all non-SWP activities.38 For most categories, the mean in SWP villages is half or less of that in non-SWP villages, implying that 40% or more of the non-SWP spending allocation to SWP villages was cut, and re-allocated to non-SWP villages.39 Such large displacement effects would imply that the benefits of the SWP are likely to have spilled over to our comparison villages, leading us to under-estimate the impacts of SWP. How large is the bias in our estimates of the impact on income due to these spillover effects? We shall assume that the displacement is entirely within the same county; that is plausible given that the county government is the key decision maker in the sub-county allocation. Invoking the theoretical result in section 4.2, we expect that total government spending (in both project and comparison villages) will also fall. In other words spending is expected to rise in the comparison villages by less than the amount that had been displaced in the project villages. To determine an upper bound to the bias we can assume that the increase in spending in the comparison villages exactly equaled the displaced spending in the project villages. In this case we will be over-estimating the bias due to spillover effects.
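
The step from "half or less" to "40% or more" can be checked with a short calculation. As a sketch, suppose that spending per capita would have been equal, at some level $G$, in both types of village without the SWP, that a share $s$ of it is cut in SWP villages, that villages are of similar size, and that all of the cut is re-allocated to non-SWP villages in the same county. With SWP villages making up one quarter of the total, each non-SWP village gains one third of the amount cut, so the observed ratio $k$ of the two means satisfies

```latex
\begin{align*}
k = \frac{G(1-s)}{G(1+s/3)} = \frac{1-s}{1+s/3}
\quad\Longrightarrow\quad
s = \frac{1-k}{1+k/3},
\qquad
s\big|_{k=1/2} = \frac{1/2}{7/6} = \frac{3}{7} \approx 0.43 .
\end{align*}
```

Since $s$ is decreasing in $k$, any ratio of one half or less implies a cut of roughly 43% or more, in line with the figure quoted for Table 7.
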
37 The main activities excluded are minor infrastructure projects, none of which showed any significant displacement. When there is no response from a village for a specific activity we treat it as a zero; this is plausible, although we test robustness to treating it as a missing value.

38 We repeated these tests using the total samples and treating all cases in which no entry was made as missing values. The results in Table 9 were reasonably robust. (The effects tended to be stronger under the alternative treatment of "no response" entries.)

39 Recall that about one quarter of villages in SWP counties received the aid project, so that a non-SWP village will receive, on average, one third of the displaced spending.

To help throw light on the likely magnitude of bias due to spillover effects, let GOV denote the spending done under the government's own program, expressed as spending per capita of the total population. Some of this spending is done in SWP villages and some is in the non-SWP villages; $GOV = w\,GOV_{SW} + (1-w)\,GOV_{NSW}$ where w is the population share of the SWP villages while $GOV_{SW}$ and $GOV_{NSW}$ denote the observed (post-SWP) levels of government spending in SWP and non-SWP villages respectively (per capita of the relevant population). We assume that in the absence of the SWP there would be no difference in the level of the government's spending between these two types of villages.
The amount of displacement of non-SWP spending in SWP villages that is attributed to the SWP is then $(GOV_{NSW} - GOV_{SW})(1-w)$.40 The bias in the double-difference estimate is $R_{NSW}(GOV_{NSW} - GOV_{SW})$ where $R_{NSW}$ is the income rate of return to the government's projects.41 The true impact is thus:

$$DD^{*} = DD + R_{NSW}(GOV_{NSW} - GOV_{SW}) \qquad (8)$$

On noting that $DD^{*} = R_{SW}^{*}\,AID_{SW}$, where $R_{SW}^{*}$ is the true rate of return to the SWP and the external aid-financed investment is $AID_{SW}$ per capita in the SWP villages, we can then derive the following formula for the proportionate bias:

$$\frac{DD}{DD^{*}} = 1 - \frac{R_{NSW}\,GOV}{R_{SW}^{*}\,AID}\cdot\frac{w(1-k)}{\phi} \qquad (9)$$

where $\phi \equiv 1 - w(1-k)$, $k \equiv GOV_{SW}/GOV_{NSW}$ and $AID = w\,AID_{SW}$. There will be no bias if there is no displacement ($k=1$), or the SWP is negligible in size ($w=0$), or the rate of return to the displaced government investment is zero ($R_{NSW} = 0$).

However, this is still not a usable formula for determining an upper bound for the bias since the measured rate of return to SWP spending will also be contaminated by the spillover effect. (We assume that the bias due to the local-spending spillover effects induced by the external aid only contaminates estimates of the rate of return to that aid.) The true rate of return is $R_{SW}^{*} = R_{SW}\,DD^{*}/DD$. Substituting into (9) and solving we have:

$$\frac{DD^{*}}{DD} = 1 + \frac{R_{NSW}\,GOV}{R_{SW}\,AID}\cdot\frac{w(1-k)}{\phi} \qquad (10)$$

What are seemingly plausible values for the parameters of (10)?

40 Note that if $S$ is the per capita government spending displaced from SWP villages then $Sw/(1-w)$ is the corresponding gain (per capita) in the non-SWP villages. $GOV_{SW} = GOV - S$ and $GOV_{NSW} = GOV + Sw/(1-w)$.

41 Note that $Y_{SW} = Y_{SW}^{*} - R_{NSW}S$ and $Y_{NSW} = Y_{NSW}^{*} + R_{NSW}Sw/(1-w)$ are the measured income gains, where the * denotes the values without the spillovers. Also note that $DD = Y_{SW} - Y_{NSW}$ and $DD^{*} = Y_{SW}^{*} - Y_{NSW}^{*}$. The following result is then easily derived.
Jalan and Ravallion (1998) estimated an average rate of return of 12% for the Government's poor area development program in the same region of China over 1985-90. Using different methods, Park et al. (2002) also estimate a rate of return to the Government's national poor-area program of 12% in the period 1992-95. Using the same data, and similar methods to the present study, Ravallion and Chen (2005) estimated that the rate of return to the SWP spending during the disbursement period was $R_{SW}$ = 9%. So we set $R_{NSW}/R_{SW}$ = 1.33. One-quarter of villages in the poor counties participated in SWP, so $w$ = 0.25. Based on Table 7 we can take $k$ = 1/3 to be a reasonable lower bound (noting that $DD/DD^{*}$ is strictly increasing in $k$).42 So $w(1-k)/[1-w(1-k)]$ = 0.2. The level of investment per capita under the non-SWP projects is about half of that under SWP ($GOV/AID_{SW}$ = 0.5), implying that $GOV/AID$ = 2.43 Inserting these numbers into equation (10) we obtain $DD^{*}/DD$ = 1.53 (implying a true rate of return of $R_{SW}^{*}$ = 14%).

So allowing for spillover effects could yield as much as a 50% larger income gain attributed to the SWP during the disbursement period. The bias-corrected simple DD estimate of the income gain during the disbursement period could rise to about 200 yuan per person, from 130 yuan. In principle, the consumption gains could also be biased, although, given that we find virtually zero (indeed negative) consumption impacts in the disbursement period, our conclusion that the income gains were fully saved remains unaffected.

The more interesting question concerns the post-disbursement period. Recall that the tests for displacement in Table 7 do not cover the post-disbursement period. It might be expected that the local spending balance between the treatment and comparison villages would be restored once the external aid ceased.
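The upper-bound arithmetic above can be checked with a few lines of Python. This is only a sketch: it reads equation (10) as DD*/DD = 1 + (R_NSW/R_SW)(GOV/AID) w(1-k)/[1 - w(1-k)], and the function and variable names are illustrative, not taken from the paper.

```python
def bias_ratio(return_ratio, gov_over_aid, w, k):
    """Upper bound on DD*/DD under the reading of equation (10) stated above.

    return_ratio: R_NSW / R_SW (measured rates of return)
    gov_over_aid: GOV / AID, both per capita of the total population
    w: population share of SWP villages
    k: GOV_SW / GOV_NSW, the spending ratio after displacement
    """
    displaced = w * (1.0 - k)
    return 1.0 + return_ratio * gov_over_aid * displaced / (1.0 - displaced)


# Parameter values quoted in the text.
w = 0.25
k = 1.0 / 3.0
return_ratio = 0.12 / 0.09       # R_NSW = 12%, measured R_SW = 9%
gov_over_aid = 0.5 / w           # GOV / AID_SW = 0.5 and AID = w * AID_SW

ratio = bias_ratio(return_ratio, gov_over_aid, w, k)
corrected_return = 0.09 * ratio  # implied true rate of return to the SWP
corrected_gain = 130.31 * ratio  # simple DD income gain for 2000 (Table 3)

print(round(ratio, 2), round(corrected_return, 2), round(corrected_gain))
# prints: 1.53 0.14 200
```

Setting k = 1 (no displacement) or w = 0 returns a ratio of exactly 1, matching the no-bias cases listed in the text.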
Although the data used in Table 7 are not available for 2004/05, we can at least test for long-term impacts on new loan activity from non-SWP sources, as an indication of whether the SWP displaced other sources of finance in the post-disbursement period. (In 1995 we know who had received SWP loans so we can net this out of total loans received. Of course, in 2004/05 there were no new SWP loans.) By these calculations, we found no significant impacts on non-SWP loans in 2004/05. This does not suggest there was long-term displacement of other sources of finance.

While the displacement effect is presumably greater in the disbursement period, it cannot be ruled out post-disbursement. If there are in fact longer-term gains from the SWP and this is known locally then continuing positive displacement will be expected, making it harder to identify those gains. However, even the upper bound to the bias derived above of DD*/DD = 1.5 is well short of being sufficient to imply a significant long-term impact on mean income; assuming that the standard error is not biased by the spillover effect, one would need to quadruple the income gain in 2004 before it could be deemed statistically significant.

42 Using the project database to compare average loan amounts for non-SWP projects in SWP villages with those in non-SWP villages gives k=0.58.

43 According to the project data, mean lending per capita under non-SWP projects (whether in SWP or non-SWP villages) represents 53% of the corresponding mean loan under the SWP (per capita of the population in SWP villages).

6. Conclusions

The longer-term impacts of aid to poor areas depend crucially on why these areas are poor in the first place. If persistently poor areas arise from generalized capital-market failures then external aid can relieve the credit constraints and so enhance long-run growth.
If instead the credit market failures are specific to certain (liquidity-constrained) subgroups of the population then the aid will need to be targeted to those groups. However, persistently poor areas can arise from other causes, such as governance failures or (possibly policy-induced) distortions in other markets (including labor, such as due to restrictions on migration). Heterogeneity in impacts can also interact with the beneficiary selection process in a way that attenuates the aggregate impact. So the benefits from extra aid to poor areas may well be modest. Unfortunately, the absence of rigorous studies of the long-term impacts of aid to poor areas has left a gap in our knowledge about both the causes of geographically concentrated poverty and aid effectiveness.

To help fill this gap in knowledge, we have used a specially designed set of high-quality surveys collected over a 10 year period to study the impacts of a World Bank-financed poor-area development program in southwest China. We find a sizeable and statistically significant impact on mean household income in the participating villages during the disbursement period. However, there was a much smaller impact on consumption during that period; the short-term income gains were largely saved (although with some improvements in diet quality). Four years after disbursements had ended, both project and non-project villages had seen sizeable economic gains, with only modest net gain to mean income attributed to the project. Indeed, we cannot reject the null hypothesis that the longer-term average impact was in fact zero, although we do find evidence of longer-term impacts on income in-kind from animal husbandry.

The most plausible interpretation of our findings appears to be as follows.
The high savings rate from the initial income gains reflected uncertainty about the future impacts -- no doubt compounded by the uncertainty about the project's loan repayment and interest obligations, given uncertain contract enforcement at local level. Farm animals were clearly an important form of saving as well as being the main source of the short-term income gains. No doubt the relevant uncertainties were resolved in the longer term. Productivity gains turned out to be small. The initial income gains proved to be transient for most households, although there was some persistence in the income gains from animal husbandry. The mean consumption gains over the longer period are in rough accord with what one would expect from the (modest) increment to permanent income attributable to the project.

We highlight three findings that raise broader issues for development programs. First, heterogeneity in impacts can play an important role in explaining poor overall outcomes. We find that there were significant and lasting income gains among the subset of households who were initially poor and relatively well educated. Presumably these households had more productive investment options, which could not be financed otherwise given the liquidity constraints facing the poorest. The program's community-based selection process favored the better educated, but expanded coverage of those who were also poor could have greatly enhanced the program's overall impact. Given the heterogeneity in returns, the implied (ex-post) deficiencies of the community-based selection process help explain the program's disappointing overall impact. While the program performed well in selecting poor villages, overall impacts were greatly attenuated by inadequate coverage of the (educated) poor within poor villages.

This finding points to a potentially serious trade-off facing such programs.
The desirability of more participatory processes of local beneficiary selection may well come at a large cost to overall impacts, including on poverty. To assure larger impacts one would need to over-ride this process by dictating the types of households that should be targeted, based on the likely benefits to them. (In the program studied here, it appears that the presence of complementary skills and knowledge, as proxied by education, was crucial to the impact.) Whether that is feasible or not in practice is a moot point.

Second, our results point to the importance of taking account of the participants' inter-temporal behavior, such as in response to the uninsured risks often associated with a development project. Those responses can cloud impacts in both experimental and non-experimental evaluations. An evaluation that focused solely on the income or consumption gains during the disbursement period (as is commonly the case) can give a deceptive picture of the true impacts.

Third, our findings illustrate how the responses of local development agents can cloud identification of the long-term impacts of geographically-placed projects (whether randomly placed or targeted). We found evidence of positive spillover effects on the comparison villages through the displacement of other development spending during the program's disbursement period. Such interference suggests that the classic impact evaluation methods will systematically underestimate the impact. In our case, the biases could well be substantial, although it is unlikely that these effects are imparting a sufficiently large bias on our impact estimates (under seemingly plausible assumptions) to overturn our main qualitative results. But this may well be a bigger problem in other settings.

Appendix 1: Proof of the proposition in Section 4.2

The problem is to maximize $W(GOV_{SW} + AID,\ GOV_{NSW},\ Z)$ s.t. $GOV_{SW} + GOV_{NSW} + Z \le R$, where $R$ is the local government's revenue.
The first-order conditions for an optimum require that:

$$W_{SW}(GOV_{SW} + AID) = W_{Z}(Z) \qquad (A1.1)$$

$$W_{NSW}(GOV_{NSW}) = W_{Z}(Z) \qquad (A1.2)$$

(in obvious notation). By the implicit function theorem, the optimal levels of $GOV_{j}$ and $Z$ are functions of $AID$. Differentiating (A1.1) and (A1.2) totally with respect to $AID$ we have:

$$(W_{SS} + W_{ZZ})\frac{\partial GOV_{SW}}{\partial AID} + W_{ZZ}\frac{\partial GOV_{NSW}}{\partial AID} = -W_{SS} \qquad (A2.1)$$

$$W_{ZZ}\frac{\partial GOV_{SW}}{\partial AID} + (W_{NN} + W_{ZZ})\frac{\partial GOV_{NSW}}{\partial AID} = 0 \qquad (A2.2)$$

where $W_{SS}$ is the second derivative of $W$ w.r.t. $GOV_{SW}$, $W_{NN}$ is the second derivative of $W$ w.r.t. $GOV_{NSW}$ and $W_{ZZ}$ is the second derivative of $W$ w.r.t. $Z$. Solving (A2.1) and (A2.2) we have:

$$\frac{\partial GOV_{SW}}{\partial AID} = -W_{SS}(W_{NN} + W_{ZZ})/J < 0 \qquad (A3.1)$$

$$\frac{\partial GOV_{NSW}}{\partial AID} = W_{SS}W_{ZZ}/J > 0 \qquad (A3.2)$$

Summing (A3.1) and (A3.2) we also have:

$$\frac{\partial GOV}{\partial AID} = -W_{SS}W_{NN}/J < 0 \qquad (A4)$$

where $J \equiv W_{SS}W_{NN} + W_{SS}W_{ZZ} + W_{ZZ}W_{NN} > 0$. Proposition 1 follows immediately.

References

Abadie, Alberto and Guido Imbens, 2006, "Large Sample Properties of Matching Estimators for Average Treatment Effects," Econometrica, 74(1): 235-267.
______________________________, 2006, "On the Failure of the Bootstrap for Matching Estimators," mimeo, University of California, Berkeley.
Chen, Shaohua and Martin Ravallion, 1996, "Data in Transition: Assessing Rural Living Standards in Southern China," China Economic Review, 7: 23-56.
Crump, R., J. Hotz, G. Imbens, and O. Mitnik, 2006, "Moving the Goalposts: Addressing Limited Overlap in Estimation of Average Treatment Effects by Changing the Estimand," National Bureau of Economic Research, Technical Paper 330, Cambridge, Mass.
Government of China, 1998, Yearbook of China Agricultural Development Bank. Beijing: China Statistics Press.
Guobao, Wu, Qiulin Yang and Chengwei Huang, 2004, "The China Southwest Poverty Reduction Project," Paper presented at the conference, Scaling Up Poverty Reduction, Shanghai, China. Shortened version published in Reducing Poverty on a Global Scale, Edited by Blanca Moreno-Dodson, World Bank, 2005, CD-ROM, pp. 255-258.
Heckman, James and Petra Todd, 1995, "Adapting Propensity Score Matching and Selection Model to Choice-Based Samples," Working Paper. Department of Economics, University of Chicago.
Heckman, J., H. Ichimura, and P. Todd, 1997, "Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Program," Review of Economic Studies, 64(4): 605-654.
Heckman, J., H. Ichimura, J. Smith, and P. Todd, 1998, "Characterizing Selection Bias using Experimental Data," Econometrica, 66: 1017-1099.
Hirano, Keisuke and Guido Imbens, 2002, "Estimation of Causal Effects using Propensity Score Weighting: An Application to Data on Right Heart Catheterization," Health Services and Outcomes Research Methodology, 2: 259-278.
Hirano, Keisuke, Guido Imbens, and Geert Ridder, 2003, "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, 71(4): 1161-1189.
Imbens, Guido, 2004, "Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review," Review of Economics and Statistics, 86(1): 4-29.
Jian, Tianlun, Jeffrey Sachs and Andrew Warner, 1996, "Trends in Regional Inequality in China," China Economic Review, 7(1): 1-21.
Jalan, Jyotsna and Martin Ravallion, 1998, "Are There Dynamic Gains from a Poor-Area Development Program?" Journal of Public Economics, 67: 65-85.
___________ and ______________, 2001, "Behavioral Responses to Risk in Rural China," Journal of Development Economics, 66: 23-49.
___________ and ______________, 2002, "Geographic Poverty Traps? A Micro Model of Consumption Growth in Rural China," Journal of Applied Econometrics, 17(4): 329-346.
Kanbur, Ravi and Xiaobo Zhang, 1999, "Which Regional Inequality? The Evolution of Rural-Urban and Inland-Coastal Inequality in China from 1983 to 1995," Journal of Comparative Economics, 27: 686-701.
Khan, Azizur Rahman and Carl Riskin, 1998, "Income Inequality in China: Composition, Distribution and Growth of Household Income, 1988 to 1995," China Quarterly, 154: 221-253.
Knight, John and Lina Song, 1993, "The Spatial Contribution to Income Inequality in Rural China," Cambridge Journal of Economics, 17: 195-213.
Leading Group, 1988, Outlines of Economic Development in China's Poor Areas, Office of the Leading Group of Economic Development in Poor Areas Under the State Council, Agricultural Publishing House, Beijing.
National Bureau of Statistics (NBS), 2000, The Poverty Monitoring Report of Rural China 2000, Beijing: China Statistics Press.
Pack, Howard and Janet Pack, 1990, "Is Foreign Aid Fungible? The Case of Indonesia," Economic Journal, 100: 188-194.
Park, Albert, Sangui Wang and Guobao Wu, 2002, "Regional Poverty Targeting in China," Journal of Public Economics, 86(1): 123-153.
Ravallion, Martin and Shaohua Chen, 2005, "Hidden Impact: Household Saving in Response to a Poor-Area Development Project," Journal of Public Economics, 89: 2183-2204.
Ravallion, Martin and Jyotsna Jalan, 1999, "China's Lagging Poor Areas," American Economic Review, Papers and Proceedings, 89(2): 301-305.
Roemer, John and Joaquim Silvestre, 2002, "The Flypaper Effect is Not an Anomaly," Journal of Public Economic Theory, 4(1): 1-17.
Rosenbaum, Paul R., and Donald B. Rubin, 1983, "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70: 41-55.
_________________and ___________, 1985, "Constructing a Control Group Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score," American Statistician, 39(1): 33-38.
Rubin, Donald B., 1980, "Discussion of the Paper by D. Basu," Journal of the American Statistical Association, 75: 591-593.
Smith, A. Jeffrey and Petra E. Todd, 2005, "Does Matching Overcome LaLonde's Critique of Nonexperimental Estimators?" Journal of Econometrics, 125: 305-353.
van de Walle, Dominique and Ren Mu, 2007, "Fungibility and the Flypaper Effect of Project Aid: Micro-Evidence for Vietnam," Journal of Development Economics, 84: 667-685.
World Bank, 1992, China: Strategies for Reducing Poverty, Washington DC: World Bank.
__________, 1997, China 2020: Sharing Rising Incomes, Washington DC: World Bank.
__________, 2003, Implementation Completion Report on the Southwest Poverty Reduction Project, Report No. 26132, Washington DC: World Bank.

Figure 1: Impacts on poverty (trimmed sample)
[Figure not reproduced. Panels: (a) Income poverty; (b) Consumption poverty. Vertical axis: DD poverty impact (% points). Horizontal axis: poverty lines (Yuan per person per year), from 350 to 1150. Series: Year 2000 and Year 2004.]

Table 1: Summary statistics on outcome indicators

| Indicator | 1996 SWP villages | 1996 Non-SWP villages | 2000 SWP villages | 2000 Non-SWP villages | 2004/05 SWP villages | 2004/05 Non-SWP villages |
|---|---|---|---|---|---|---|
| Mean income | 996.061 (715.402) | 1158.319 (604.914) | 1263.412 (910.036) | 1223.698 (669.843) | 1390.766 (902.030) | 1518.963 (930.867) |
| Mean consumption | 843.559 (469.555) | 945.201 (445.787) | 943.550 (566.183) | 1023.352 (698.428) | 1130.588 (794.167) | 1211.973 (795.499) |
| Income poverty rate, poverty line = 600 yuan | 0.222 (0.416) | 0.127 (0.332) | 0.138 (0.345) | 0.112 (0.316) | 0.123 (0.329) | 0.095 (0.294) |
| Income poverty rate, 808 yuan | 0.453 (0.498) | 0.306 (0.461) | 0.290 (0.454) | 0.262 (0.440) | 0.242 (0.429) | 0.182 (0.386) |
| Income poverty rate, 1000 yuan | 0.614 (0.487) | 0.456 (0.498) | 0.449 (0.497) | 0.415 (0.493) | 0.369 (0.483) | 0.290 (0.454) |
| Consumption poverty rate, poverty line = 600 yuan | 0.290 (0.454) | 0.183 (0.387) | 0.276 (0.447) | 0.219 (0.414) | 0.179 (0.384) | 0.135 (0.342) |
| Consumption poverty rate, 808 yuan | 0.576 (0.494) | 0.454 (0.498) | 0.509 (0.500) | 0.441 (0.497) | 0.385 (0.487) | 0.317 (0.465) |
| Consumption poverty rate, 1000 yuan | 0.757 (0.429) | 0.648 (0.478) | 0.675 (0.468) | 0.627 (0.484) | 0.537 (0.499) | 0.468 (0.499) |

Notes: Standard deviations are in parentheses.
Income, consumption and poverty measures are weighted by household size. There are 112 project villages and 86 comparison villages. The mean of income/expenditure is Yuan per capita per year at 1995 prices.

Table 2: Probit regression of village participation in the SWP using baseline covariates

| Variable | Coeff. | z-value |
|---|---|---|
| Village on the plains | Reference category | |
| Hills | 4.876 | (4.02) |
| Mountainous | 2.771 | (3.05) |
| Whether village has electricity | -0.672 | (-1.82) |
| ...telephones | -0.070 | (-0.2) |
| ...road passing through it | 0.215 | (0.59) |
| ...radio transmitters | 0.352 | (1.09) |
| Whether village can receive TV transmission | 0.237 | (0.82) |
| Located <5km from the nearest market | 0.028 | (0.05) |
| ...5-10 km from the nearest market | -0.494 | (-0.94) |
| ...10-20 km from the nearest market | 0.740 | (0.95) |
| ...>20km | Reference category | |
| # of days in a cycle during which the market assembles | -0.115 | (-0.76) |
| County town within 5 km | Reference category | |
| Distance from village to county town is 5-10km | 1.373 | (1.95) |
| ...10-20km | -0.530 | (-0.85) |
| ...>20km | -0.448 | (-0.83) |
| Township=village | Reference category | |
| Distance from village to township is within 5km | 0.137 | (0.19) |
| ...5-10km | 0.229 | (0.34) |
| ...10-20km | -1.628 | (-2.55) |
| Main mode of transportation used by the villager: bicycle | -0.296 | (-0.4) |
| ...bus | -0.305 | (-0.9) |
| ...other automobile | 0.913 | (1.71) |
| ...walking | Reference category | |
| Nearest train station is within 5 km | -0.586 | (-0.62) |
| ...5-10km | 0.999 | (1.39) |
| ...10-20km | 1.111 | (1.52) |
| ...>20km | Reference category | |
| Nearest bus station is within 5 km | 0.021 | (0.07) |
| ...5-10km | 0.265 | (0.64) |
| ...10-20km | 0.469 | (1) |
| ...>20km | Reference category | |
| Whether village has a day-care center | 0.724 | (1.38) |
| Elementary school is in village | Reference category | |
| Nearest elementary school is within 5km | 0.055 | (0.16) |
| ...5-10km | 0.737 | (1.6) |
| Middle school is in village | Reference category | |
| Nearest middle school is within 5 km | 1.026 | (2.09) |
| ...5-10km | 0.142 | (0.21) |
| ...10-20km | 1.551 | (1.63) |
| ...>20km | 0.882 | (1.13) |
| Medical clinic in village | Reference category | |
| Nearest medical clinic is within 5 km | -1.026 | (-2.79) |
| ...5-10km | -0.420 | (-1.11) |
| ...10-20km | -0.820 | (-1.24) |
| ...>20km | -0.997 | (-1.46) |
| Total population of the village | 0.000 | (1.99) |
| Irrigated land (mu) | -0.001 | (-2.8) |
| Forest land (mu) | 0.000 | (-0.87) |
| # of people working in TVE over # of labor | 0.139 | (1.99) |
| Whether village has TVE | -0.798 | (-1.35) |
| Output of grain per capita (kg/person) | 0.001 | (1.52) |
| Net income per capita | 0.020 | (2) |
| Net income per capita squared | 0.000 | (-1.97) |
| Net income per capita cubed | 0.000 | (1.66) |
| (End of year) # of pigs per person | 0.972 | (1.75) |
| (End of year) # of cows per person | 0.840 | (0.7) |
| (End of year) # of sheep, goat per person | 0.531 | (1.12) |
| (End of year) # of poultry per person | 0.419 | (2.54) |
| (End of year) # of honey bees per person | -5.412 | (-2.27) |
| Workforce per capita | 0.036 | (1.4) |
| Average household size | -0.042 | (-1) |
| Share of workforce female | -0.082 | (-1.68) |
| Cultivated land per capita (mu) | 1.438 | (3.19) |
| Grassland per capita (mu) | 1.887 | (1.43) |
| Village mean of consumption (log) | -0.493 | (0.198) |
| Village mean of school enrollment (age 6-14) | -2.029 | (-2.84) |
| Guangxi | 1.394 | (2.73) |
| Guizhou | 0.659 | (0.92) |
| Yunnan | Reference category | |
| Intercept | -2.522 | (-0.88) |
| Pseudo-R2 | 0.360 | |

Note: The village is the unit of observation (n=200) and all explanatory variables are pre-intervention (1995). Standard errors are adjusted for clustering at county level.
Table 3: Impact of SWP on household income and consumption using propensity-score weighting or matching

| Sample, year | Outcome | 1996 mean in SWP villages | Gain in SWP project villages | Gain in non-SWP villages | Simple DD | t-ratio | PS weighted DD | t-ratio | Kernel matched DD | t-ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| Trimmed, 2000 | income | 981.906 | 196.322 | 66.012 | 130.31 | 1.826 | 182.655 | 2.541 | 169.150 | 2.392 |
| Trimmed, 2000 | consumption (C) | 841.729 | 67.092 | 70.480 | -3.388 | -0.067 | -17.662 | -0.313 | -45.762 | -0.751 |
| Trimmed, 2000 | saving (S) | 140.223 | 129.185 | -4.525 | 133.711 | 2.107 | 200.333 | 2.723 | 214.93 | 2.685 |
| Trimmed, 2004/05 | income | 981.906 | 432.325 | 387.399 | 44.926 | 0.500 | 42.975 | 0.455 | 42.234 | 0.549 |
| Trimmed, 2004/05 | consumption | 841.729 | 345.947 | 287.687 | 58.26 | 0.870 | 58.535 | 0.786 | 18.312 | 0.223 |
| Trimmed, 2004/05 | saving | 140.223 | 86.333 | 99.655 | -13.322 | -0.159 | -15.544 | -0.18 | 23.941 | 0.289 |
| Trimmed, 2000 | log income | 6.747 | 0.18 | 0.051 | 0.128 | 2.046 | 0.161 | 2.395 | 0.133 | 2.251 |
| Trimmed, 2000 | log consumption | 6.629 | 0.058 | 0.019 | 0.040 | 0.755 | 0.031 | 0.537 | 0.001 | 0.003 |
| Trimmed, 2000 | log(1+S/C) | 0.117 | 0.120 | 0.031 | 0.089 | 1.79 | 0.131 | 2.467 | 0.133 | 2.617 |
| Trimmed, 2004/05 | log income | 6.747 | 0.345 | 0.264 | 0.081 | 1.171 | 0.062 | 0.823 | 0.038 | 0.522 |
| Trimmed, 2004/05 | log consumption | 6.629 | 0.299 | 0.210 | 0.090 | 1.707 | 0.067 | 1.130 | 0.025 | 0.474 |
| Trimmed, 2004/05 | log(1+S/C) | 0.117 | 0.046 | 0.055 | -0.009 | -0.148 | -0.005 | -0.078 | 0.014 | 0.263 |
| Total, 2000 | income | 989.45 | 273.962 | 65.379 | 208.583 | 3.346 | 213.605 | 3.287 | 192.731 | 2.985 |
| Total, 2000 | consumption (C) | 843.559 | 99.991 | 78.151 | 21.84 | 0.510 | -151.054 | -1.180 | -189.569 | -1.427 |
| Total, 2000 | saving (S) | 145.934 | 173.928 | -12.828 | 186.755 | 3.141 | 364.696 | 3.371 | 382.342 | 3.612 |
| Total, 2004/05 | income | 989.45 | 401.316 | 360.644 | 40.673 | 0.537 | -47.159 | -0.423 | -45.246 | -0.344 |
| Total, 2004/05 | consumption | 843.559 | 287.029 | 266.772 | 20.258 | 0.371 | 36.752 | 0.633 | 25.893 | 0.439 |
| Total, 2004/05 | saving | 145.934 | 114.244 | 93.816 | 20.427 | 0.303 | -83.874 | -0.705 | -71.097 | -0.52 |
| Total, 2000 | log income | 6.752 | 0.230 | 0.050 | 0.180 | 3.448 | 0.180 | 3.337 | 0.160 | 3.221 |
| Total, 2000 | log consumption | 6.631 | 0.087 | 0.028 | 0.059 | 1.374 | -0.046 | -0.577 | -0.081 | -0.945 |
| Total, 2000 | log(1+S/C) | 0.121 | 0.143 | 0.021 | 0.122 | 2.727 | 0.227 | 3.568 | 0.241 | 3.615 |
| Total, 2004/05 | log income | 6.752 | 0.310 | 0.231 | 0.078 | 1.314 | -0.005 | -0.064 | -0.01 | -0.112 |
| Total, 2004/05 | log consumption | 6.631 | 0.223 | 0.188 | 0.035 | 0.682 | 0.021 | 0.388 | 0.007 | 0.115 |
| Total, 2004/05 | log(1+S/C) | 0.121 | 0.087 | 0.043 | 0.044 | 0.915 | -0.026 | -0.307 | -0.017 | -0.185 |

Notes: All the calculations are weighted by household size. T-ratio of kernel matching is obtained from bootstrapping (100 repetitions). Standard errors of weighted D-D estimations are robust to heteroskedasticity and serial correlation of households within each village. In the total sample, there are 112 project villages and 86 comparison villages. In the trimmed sample, there are 71 project villages and 66 comparison villages.

Table 4: Impacts on income from animal husbandry

| Year | Revenue or costs from animal husbandry (AH) | 1996 mean in SWPR | Gain in SWPR project | Gain in non-SWPR | DD | t-ratio | PS weighted D-D | t-ratio | Kernel matched D-D | t-ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | total revenue | 326.983 | 107.6 | 33.894 | 73.706 | 2.302 | 100.03 | 2.703 | 118.883 | 2.498 |
| 2000 | total cost of production | 190.901 | -15.516 | 1.623 | -17.139 | -0.801 | -17.229 | -0.779 | -17.271 | -0.717 |
| 2000 | net income from AH | 136.082 | 123.117 | 32.271 | 90.845 | 2.924 | 117.26 | 3.373 | 136.154 | 3.551 |
| 2000 | cash income (net) | 142.204 | 12.411 | 1.587 | 10.824 | 0.663 | 14.684 | 0.858 | -2.356 | -0.685 |
| 2000 | in-kind income (net) | -6.123 | 110.705 | 30.684 | 80.021 | 2.79 | 102.575 | 3.099 | 125.853 | 2.895 |
| 2004 | total revenue | 326.983 | 196.889 | 225.753 | -28.864 | -0.507 | -1.357 | -0.023 | 12.356 | 0.246 |
| 2004 | total cost of production | 190.901 | 80.772 | 121.847 | -41.075 | -1.196 | -36.175 | -1.031 | -42.896 | -1.285 |
| 2004 | net income | 136.082 | 116.118 | 103.906 | 12.212 | 0.282 | 34.818 | 0.785 | 55.252 | 1.344 |
| 2004 | cash income (net) | 142.204 | 103.745 | 150.839 | -47.093 | -1.025 | -30.646 | -0.578 | 3.219 | 0.641 |
| 2004 | in-kind income (net) | -6.123 | 12.372 | -46.932 | 59.305 | 2.033 | 65.464 | 1.805 | 74.179 | 1.705 |

Notes: All the calculations are weighted by household size. T-ratio of kernel matching is obtained from bootstrapping (100 repetitions). Standard errors of weighted D-D estimations are robust to heteroskedasticity and serial correlation of households within each village. The trimmed sample is used, for which there are 71 project villages and 66 comparison villages.
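The simple double-difference (DD) columns in Tables 3 and 4 are the mean gain in project villages minus the mean gain in comparison villages, weighted by household size. A minimal sketch of that calculation follows; the record layout, the function names and the toy households are hypothetical, and only the two mean gains in the final check come from Table 3.

```python
def weighted_mean(values, weights):
    """Mean of values using the given non-negative weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mean_gain(households):
    """Household-size-weighted mean change between baseline and follow-up.

    households: list of (baseline, follow_up, household_size) tuples.
    """
    gains = [after - before for before, after, _ in households]
    sizes = [size for _, _, size in households]
    return weighted_mean(gains, sizes)


def double_difference(project, comparison):
    """Simple DD: mean gain in project villages minus mean gain elsewhere."""
    return mean_gain(project) - mean_gain(comparison)


# Toy per capita incomes (hypothetical, for illustration only).
project = [(100.0, 150.0, 4), (200.0, 220.0, 2)]
comparison = [(100.0, 110.0, 3), (300.0, 330.0, 1)]
toy_dd = double_difference(project, comparison)

# The same subtraction applied to the trimmed-sample income gains for 2000
# reported in Table 3 (196.322 and 66.012) gives the simple DD of 130.31.
table3_dd = 196.322 - 66.012
print(toy_dd, round(table3_dd, 2))
```

The propensity-score weighted and kernel-matched columns reweight or match the comparison group before differencing, which this sketch does not attempt.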
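Table 5: Propensity score weighted estimates of impacts on poverty

| Measure, year | Poverty line | Poverty incidence (1996) in project villages | (1) Change in H in project villages | (2) Change in H in comparison villages | (1)-(2) Double difference | t-ratio |
|---|---|---|---|---|---|---|
| (a) Income poverty, 2000 | 500 | 14.584 | -6.747 | 0.957 | -7.704 | -2.138 |
| (a) Income poverty, 2000 | 600 | 22.762 | -7.331 | -1.672 | -5.659 | -1.247 |
| (a) Income poverty, 2000 | 700 | 35.116 | -13.093 | 1.490 | -14.582 | -2.824 |
| (a) Income poverty, 2000 | 808 | 46.697 | -15.713 | -4.599 | -11.114 | -1.515 |
| (a) Income poverty, 2000 | 900 | 55.047 | -15.193 | -4.771 | -10.422 | -1.581 |
| (a) Income poverty, 2000 | 1000 | 62.025 | -12.906 | -3.606 | -9.300 | -1.395 |
| (a) Income poverty, 2000 | 1100 | 68.973 | -10.802 | 1.642 | -12.444 | -2.195 |
| (a) Income poverty, 2000 | 1150 | 72.405 | -9.981 | 2.484 | -12.465 | -2.256 |
| (a) Income poverty, 2004/05 | 500 | 14.584 | -8.053 | -5.021 | -3.032 | -0.809 |
| (a) Income poverty, 2004/05 | 600 | 22.762 | -12.250 | -6.779 | -5.470 | -0.857 |
| (a) Income poverty, 2004/05 | 700 | 35.116 | -19.410 | -11.533 | -7.877 | -1.046 |
| (a) Income poverty, 2004/05 | 808 | 46.697 | -24.907 | -19.276 | -5.630 | -0.693 |
| (a) Income poverty, 2004/05 | 900 | 55.047 | -26.344 | -22.915 | -3.429 | -0.444 |
| (a) Income poverty, 2004/05 | 1000 | 62.025 | -28.097 | -23.816 | -4.281 | -0.530 |
| (a) Income poverty, 2004/05 | 1100 | 68.973 | -27.623 | -19.537 | -8.086 | -1.352 |
| (a) Income poverty, 2004/05 | 1150 | 72.405 | -28.378 | -20.347 | -8.031 | -1.424 |
| (b) Consumption poverty, 2000 | 500 | 18.673 | -2.695 | 6.111 | -8.806 | -1.691 |
| (b) Consumption poverty, 2000 | 600 | 29.053 | 0.078 | 5.298 | -5.221 | -0.841 |
| (b) Consumption poverty, 2000 | 700 | 40.749 | 1.140 | 1.088 | 0.052 | 0.006 |
| (b) Consumption poverty, 2000 | 808 | 57.392 | -5.266 | -1.902 | -3.364 | -0.386 |
| (b) Consumption poverty, 2000 | 900 | 67.000 | -5.761 | -0.715 | -5.046 | -0.734 |
| (b) Consumption poverty, 2000 | 1000 | 75.665 | -6.102 | -4.570 | -1.532 | -0.248 |
| (b) Consumption poverty, 2000 | 1100 | 80.898 | -4.987 | -5.782 | 0.796 | 0.164 |
| (b) Consumption poverty, 2000 | 1150 | 83.586 | -5.184 | -3.569 | -1.615 | -0.347 |
| (b) Consumption poverty, 2004/05 | 500 | 18.673 | -11.537 | -4.081 | -7.456 | -1.500 |
| (b) Consumption poverty, 2004/05 | 600 | 29.053 | -16.661 | -7.918 | -8.743 | -1.536 |
| (b) Consumption poverty, 2004/05 | 700 | 40.749 | -18.226 | -13.352 | -4.874 | -0.747 |
| (b) Consumption poverty, 2004/05 | 808 | 57.392 | -23.241 | -19.095 | -4.146 | -0.584 |
| (b) Consumption poverty, 2004/05 | 900 | 67.000 | -24.439 | -22.567 | -1.872 | -0.267 |
| (b) Consumption poverty, 2004/05 | 1000 | 75.665 | -25.936 | -23.121 | -2.815 | -0.520 |
| (b) Consumption poverty, 2004/05 | 1100 | 80.898 | -24.192 | -21.455 | -2.737 | -0.511 |
| (b) Consumption poverty, 2004/05 | 1150 | 83.586 | -22.006 | -17.962 | -4.044 | -0.789 |

Notes: Poverty lines are in Yuan per person per year; H is the headcount index of poverty (%). All the calculations are weighted by household size. Standard errors are robust to heteroskedasticity and serial correlation of households within each village. The trimmed sample is used with 71 project villages and 66 comparison villages.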
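The headcount index H used in Table 5 is the percentage of people living in households below the poverty line. The sketch below shows one way to compute it with household-size weights. It assumes a household is poor when its per capita income or consumption is strictly below the line (the treatment of ties is an assumption), the function name is illustrative, and the three households are hypothetical; only the two changes in H in the final check come from Table 5.

```python
def headcount_index(per_capita_values, household_sizes, poverty_line):
    """Percentage of people in households below the poverty line."""
    poor_people = sum(
        size
        for value, size in zip(per_capita_values, household_sizes)
        if value < poverty_line  # strict inequality is an assumption
    )
    return 100.0 * poor_people / sum(household_sizes)


# Toy per capita incomes and household sizes (hypothetical).
incomes = [400.0, 700.0, 900.0]
sizes = [5, 3, 2]
h_808 = headcount_index(incomes, sizes, 808.0)
h_500 = headcount_index(incomes, sizes, 500.0)

# Double difference in H for income poverty at the 500 yuan line in 2000,
# using the changes reported in Table 5 for project and comparison villages.
dd_h = -6.747 - 0.957
print(h_808, h_500, round(dd_h, 3))
```

Repeating the calculation over a grid of poverty lines, as Table 5 and Figure 1 do, shows whether a poverty impact depends on where the line is drawn.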
Table 6: Estimated impacts stratified by initial income and education

| Initial income, year | Outcome | Lower education: 1996 mean in SW villages | Weighted DD for lower education group (1) | t-ratio | Higher education: 1996 mean in SW villages | Weighted DD for higher education group (2) | t-ratio | Weighted triple difference (1)-(2) | t-ratio |
|---|---|---|---|---|---|---|---|---|---|
| Below median, 2000 | income | 643.538 | 81.686 | 1.015 | 645.831 | 207.958 | 2.525 | -126.271 | -1.491 |
| Below median, 2000 | consumption | 664.573 | -43.809 | -0.593 | 674.167 | 55.069 | 0.604 | -98.878 | -1.246 |
| Below median, 2000 | saving | -20.989 | 125.518 | 2.167 | -28.290 | 152.875 | 1.460 | -27.357 | -0.291 |
| Below median, 2000 | productive assets | 413.096 | -58.508 | -0.753 | 311.452 | 86.098 | 1.424 | -144.606 | -1.737 |
| Below median, 2000 | housing value | 501.121 | -39.476 | -0.189 | 611.993 | 173.552 | 0.959 | -213.028 | -0.947 |
| Below median, 2004/05 | income | 643.538 | 43.687 | 0.319 | 645.831 | 197.933 | 2.026 | -154.246 | -1.079 |
| Below median, 2004/05 | consumption | 664.573 | 97.623 | 1.188 | 674.167 | 219.517 | 2.370 | -121.894 | -1.105 |
| Below median, 2004/05 | saving | -20.989 | -53.914 | -0.521 | -28.290 | -21.598 | -0.247 | -32.316 | -0.277 |
| Below median, 2004/05 | productive assets | 413.096 | 80.478 | 0.752 | 311.452 | 134.206 | 1.985 | -53.728 | -0.446 |
| Below median, 2004/05 | housing value | 501.121 | 216.285 | 0.866 | 611.993 | 815.739 | 2.481 | -599.454 | -2.022 |
| Below median | Number of households | 312 (173+139) | | | 299 (169+130) | | | | |
| Above median, 2000 | income | 1465.163 | 305.638 | 1.535 | 1476.474 | 174.261 | 1.194 | 131.376 | 0.587 |
| Above median, 2000 | consumption | 1061.494 | -237.040 | -1.268 | 1170.625 | -8.934 | -0.071 | -228.105 | -0.979 |
| Above median, 2000 | saving | 403.747 | 542.693 | 1.720 | 305.881 | 183.215 | 1.375 | 359.479 | 1.086 |
| Above median, 2000 | productive assets | 600.292 | -160.040 | -1.775 | 609.010 | -34.391 | -0.374 | -125.649 | -1.054 |
| Above median, 2000 | housing value | 842.872 | 343.787 | 1.768 | 1109.570 | 60.008 | 0.303 | 283.780 | 1.118 |
| Above median, 2004/05 | income | 1465.163 | -27.644 | -0.133 | 1476.474 | -54.414 | -0.348 | 26.770 | 0.107 |
| Above median, 2004/05 | consumption | 1061.494 | -24.752 | -0.179 | 1170.625 | -136.847 | -0.913 | 112.095 | 0.637 |
| Above median, 2004/05 | saving | 403.747 | -2.876 | -0.015 | 305.881 | 82.452 | 0.493 | -85.328 | -0.389 |
| Above median, 2004/05 | productive assets | 600.292 | 120.572 | 1.089 | 609.010 | -201.500 | -1.258 | 322.072 | 1.816 |
| Above median, 2004/05 | housing value | 842.872 | 432.315 | 0.874 | 1109.570 | -697.603 | -0.910 | 1129.918 | 1.331 |
| Above median | Number of households | 204 (97+107) | | | 363 (170+193) | | | | |

Notes: The numbers in parentheses are the number of observations in SWP
villages and non-SWP villages respectively. Estimation is made on a balanced panel of 1178 households on the trimmed sample. Lower education is defined as household head education level being lower than junior high school (illiterate or primary school). Higher education is defined as household head education level being at least junior high school. Standard errors of weighted D-D estimations are robust to heteroskedasticity and serial correlation of households within each village. Balanced panel in trimmed sample is used with 67 project villages and 62 comparison villages.

Table 7: Testing for displacement of new non-SWP development projects in SWP villages

| Activity | Mean in SWP villages | Mean in non-SWP villages | Difference | t-ratio | PS weighted diff. | t-ratio | Kernel matched diff. | t-ratio |
|---|---|---|---|---|---|---|---|---|
| Farming: number of projects | 0.79 | 2.11 | -1.32 | -2.45 | -1.68 | -2.09 | -2.04 | -2.03 |
| Farming: number of households | 147.63 | 399.44 | -251.80 | -2.48 | -182.99 | -2.43 | -205.03 | -2.05 |
| Animal husbandry: number of projects | 1.51 | 3.03 | -1.52 | -2.08 | -2.21 | -1.98 | -2.38 | -1.78 |
| Animal husbandry: number of households | 135.09 | 324.87 | -189.78 | -1.17 | -94.99 | -1.18 | -62.14 | -1.00 |
| Forestry: number of projects | 0.54 | 1.34 | -0.79 | -2.50 | -1.50 | -1.84 | -2.28 | -1.66 |
| Forestry: number of households | 131.63 | 296.63 | -165.00 | -1.41 | -120.06 | -1.97 | -117.65 | -3.15 |
| Infrastructure: terracing | 0.12 | 0.65 | -0.53 | -2.08 | -0.94 | -1.58 | -1.35 | -1.46 |
| Infrastructure: drinking water | 0.31 | 0.90 | -0.59 | -3.04 | -0.86 | -2.58 | -1.04 | -2.54 |
| Infrastructure: irrigation | 0.24 | 0.60 | -0.36 | -1.80 | -0.30 | -1.42 | -0.27 | -1.31 |
| Infrastructure: electricity | 0.28 | 0.58 | -0.30 | -2.21 | -0.49 | -2.01 | -0.61 | -1.52 |
| Infrastructure: roads | 0.19 | 0.39 | -0.20 | -1.89 | -0.24 | -1.39 | -0.25 | -1.53 |
| Student subsidies: No. | 0.82 | 2.35 | -1.53 | -3.03 | -1.74 | -2.79 | -1.75 | -2.83 |
| New schools: No. | 0.35 | 0.79 | -0.44 | -2.10 | -0.55 | -1.96 | -0.84 | -2.06 |
| Teacher training: No. | 0.07 | 0.37 | -0.30 | -1.87 | -0.39 | -2.25 | -0.37 | -2.26 |
| Health insurance | 0.16 | 0.31 | -0.14 | -2.05 | -0.12 | -1.26 | -0.05 | -0.69 |
| New clinic | 0.06 | 0.24 | -0.18 | -3.84 | -0.12 | -1.85 | -0.09 | -1.46 |
| Doctor training | 0.07 | 0.26 | -0.18 | -1.62 | -0.18 | -1.56 | -0.12 | -1.23 |
| Total no. projects | 6.07 | 14.81 | -8.73 | -3.25 | -11.72 | -2.45 | -13.68 | -2.14 |
| Total no. households | 415.38 | 1026.10 | -610.71 | -1.74 | -399.02 | -2.22 | -386.20 | -2.98 |

Notes: Trimmed sample, treating "no response" as "no project". T-ratio of kernel matching is obtained from bootstrapping (100 repetitions). Standard errors of D-D and weighted D-D estimations are robust to heteroskedasticity and serial correlation of villages within each county. Trimmed sample is used with 71 project villages and 65 comparison villages.

Appendix 2: Balancing tests for village characteristics and household outcomes with and without weighting and trimming

The first two columns give standardized means*; the remaining columns give the difference in standardized means, reported as mean (s.e.).

| Variable | SWP villages | Non-SWP villages | Un-weighted | PS weighted for total sample | PS kernel-matched for total sample | PS-weighted for trimmed sample | PS kernel-matched for trimmed sample |
|---|---|---|---|---|---|---|---|
| Village characteristics (1995) | | | | | | | |
| Total population | 0.009 | -0.012 | 0.021 (0.143) | 0.013 (0.137) | 0.180 (0.134) | 0.076 (0.186) | 0.115 (0.187) |
| Electricity | -0.151 | 0.196 | -0.347 (0.141) | -0.229 (0.138) | -0.028 (0.157) | 0.104 (0.164) | -0.268 (0.147) |
| Phone | 0.053 | -0.069 | 0.122 (0.143) | 0.109 (0.141) | 0.373 (0.132) | 0.155 (0.168) | 0.072 (0.171) |
| Road | -0.061 | 0.079 | -0.139 (0.143) | -0.094 (0.141) | 0.090 (0.155) | 0.211 (0.164) | -0.130 (0.134) |
| Radio | 0.044 | -0.058 | 0.102 (0.143) | 0.075 (0.135) | 0.241 (0.126) | 0.271 (0.155) | 0.193 (0.170) |
| TV | -0.084 | 0.109 | -0.193 (0.142) | -0.131 (0.143) | 0.056 (0.152) | 0.117 (0.175) | -0.136 (0.163) |
| Nearest market <5km | -0.036 | 0.047 | -0.083 (0.143) | -0.068 (0.148) | 0.100 (0.152) | 0.078 (0.187) | 0.417 (0.206) |
| Elementary school in village | -0.009 | 0.011 | -0.02 (0.143) | -0.031 (0.143) | 0.102 (0.129) | -0.075 (0.182) | -0.005 (0.18) |
| Clinic in village | 0.021 | -0.028 | 0.049 (0.143) | 0.051 (0.141) | 0.258 (0.129) | 0.043 (0.170) | 0.073 (0.172) |
| Net income per capita | -0.162 | 0.211 | -0.373 (0.141) | -0.241 (0.142) | 0.133 (0.124) | 0.073 (0.164) | 0.094 (0.171) |
| Cultivated land per capita | 0.134 | -0.173 | 0.307 (0.141) | 0.238 (0.135) | 0.251 (0.122) | 0.299 (0.151) | -0.159 (0.144) |
| Household outcomes (1996) | | | | | | | |
| Consumption per capita | -0.156 | 0.203 | -0.36 (0.141) | -0.217 (0.190) | -0.195 (0.206) | -0.069 (0.181) | -0.007 (0.181) |
| Income per capita | -0.168 | 0.219 | -0.39 (0.141) | -0.23 (0.139) | -0.238 (0.137) | -0.182 (0.181) | -0.153 (0.185) |
| Headcount poverty index, 600 yuan (income) | 0.175 | -0.227 | 0.402 (0.141) | 0.384 (0.169) | 0.442 (0.18) | 0.248 (0.201) | 0.244 (0.234) |
| Headcount poverty index, 808 yuan (income) | 0.196 | -0.256 | 0.452 (0.14) | 0.345 (0.172) | 0.375 (0.196) | 0.194 (0.192) | 0.108 (0.194) |
| Headcount poverty index, 1000 yuan (income) | 0.212 | -0.277 | 0.489 (0.139) | 0.41 (0.202) | 0.457 (0.256) | 0.165 (0.201) | 0.062 (0.216) |
| Headcount poverty index, 600 yuan (consumption) | 0.161 | -0.21 | 0.371 (0.141) | 0.455 (0.186) | 0.534 (0.18) | 0.259 (0.198) | 0.325 (0.214) |
| Headcount poverty index, 808 yuan (consumption) | 0.155 | -0.202 | 0.357 (0.141) | 0.253 (0.194) | 0.242 (0.213) | 0.07 (0.194) | 0.008 (0.218) |
| Headcount poverty index, 1000 yuan (consumption) | 0.171 | -0.222 | 0.393 (0.141) | 0.319 (0.268) | 0.291 (0.308) | 0.006 (0.182) | -0.130 (0.203) |

Notes: * (sub-group mean minus mean for full sample)/standard deviation for full sample. In the total sample, there are 112 project villages and 86 comparison villages. In the trimmed sample, there are 71 project villages and 66 comparison villages. Household income, consumption and poverty measures are weighted by household size. The Addendum provides further balancing tests.
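The standardized means in Appendix 2 follow the formula in the table notes: the sub-group mean minus the full-sample mean, divided by the full-sample standard deviation. The sketch below illustrates the un-weighted version. It uses the population standard deviation, which is an assumption since the notes do not say which divisor was used, and the village values and function name are hypothetical.

```python
import statistics


def standardized_mean(subgroup, full_sample):
    """(sub-group mean - full-sample mean) / full-sample standard deviation.

    The population standard deviation (pstdev) is an assumption here.
    """
    centre = statistics.fmean(full_sample)
    spread = statistics.pstdev(full_sample)
    return (statistics.fmean(subgroup) - centre) / spread


# Toy baseline values of one covariate (hypothetical).
swp = [1.0, 2.0, 3.0]
non_swp = [3.0, 4.0, 5.0]
full = swp + non_swp

z_swp = standardized_mean(swp, full)
z_non_swp = standardized_mean(non_swp, full)
difference = z_swp - z_non_swp  # the un-weighted balance statistic

# The same subtraction reproduces the un-weighted entry for total
# population in Appendix 2: 0.009 - (-0.012) = 0.021.
appendix_check = 0.009 - (-0.012)
print(round(difference, 3), round(appendix_check, 3))
```

A difference close to zero indicates balance on that covariate; the weighted and matched columns apply the same comparison after reweighting or matching the comparison villages.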