Transport Infrastructure and Welfare: An Application to Nigeria

Transport infrastructure is deemed to be central to development and consumes a large fraction of the development assistance envelope. Yet there is debate about the economic impact of road projects. This paper proposes an approach to assess the differential development impacts of alternative road construction and prioritize various proposals, using Nigeria as a case study. Recognizing that there is no perfect measure of economic well-being, a variety of outcome metrics are used, including crop revenue, livestock revenue, non-agricultural income, the probability of being multi-dimensionally poor, and local gross domestic product for Nigeria. Although the measure of transport is the most accurate possible, it is still endogenous because of the nonrandom placement of road infrastructure. This endogeneity is addressed using a seemingly novel instrumental variable termed the natural path: the time it would take to walk along the most logical route connecting two points without taking into account other, bias-causing economic benefits. Further, the analysis considers the potential endogeneity from nonrandom placement of households and markets through carefully chosen control variables. It finds that reducing transportation costs in Nigeria will increase crop revenue, non-agricultural income, the wealth index, and local gross domestic product. Livestock sales increase as well, although this finding is less robust. The probability of being multi-dimensionally poor will decrease. The results also cast light on income diversification and structural changes that may arise. These findings are robust to relaxing the exclusion restriction. The paper also demonstrates how to prioritize alternative road programs by comparing the expected development impacts of alternative New Partnership for Africa's Development projects.


Introduction
Governments and donors in Sub-Saharan Africa have devoted considerable resources to the construction and rehabilitation of roads. An emphasis on transport infrastructure is evident in the lending pattern of the World Bank, which commits a larger share of resources to transport infrastructure than education, health and social services combined (World Bank 2007). Total transport commitments in fiscal year 2013 amounted to US$5.9 billion and rural and inter-urban roads remained the largest sub-sector with 60 percent of lending in FY13 (US$3.2 billion) (World Bank 2014). The rationale behind these investments is self-evident. Roads, while expensive, facilitate the creation of, and the participation in, markets and are deemed to be central to development. Africa has the lowest density of roads in the world, with 204 kilometers of road per 1,000 square km, nearly one-fifth the world average, and less than 30% of the next worst region, South Asia. Starting from such a low base, the potential for growth due to improvements in transportation infrastructure is presumed to be especially large in Africa.
However, the existing body of research about the impact of roads on economic well-being remains ambiguous, partially because it is hard to disentangle cause and effect. There is even less evidence on where investments might be the most transformative in creating new opportunities to link producers to markets. Given limited resources, there is a need for selectivity in deciding what investments should occur and where these should be located. This paper aims at tackling these issues by drawing on, and improving upon, the best data available, and by using a somewhat novel approach to overcome some of the technical challenges.
The two key challenges of estimating the impact of road networks on economic activity are well known. First is the difficulty of obtaining data which accurately reflect the conditions of the roads, and the cost of traveling along them. This is always a concern when dealing with road infrastructure, the quality of which is constantly in flux, but it is especially a challenge in Africa, where infrastructure assessments are infrequent and rural roads are often unaccounted for. The second challenge is overcoming the potential sources of endogeneity arising from the non-random (i) placement of roads, (ii) spatial sorting of households, and (iii) geographic emergence of markets. 2 Such endogeneity could potentially bias econometric results.
Roads tend to be built so as to connect major economic activities, e.g. linking cities, markets, mines, or areas of high agricultural productivity. Hence estimates need to take account of reverse causality in looking at the impact of roads: on the one hand, economic potential may determine where roads are built; on the other hand, roads may spur greater economic activity. In situations where natural experiments are not feasible, and panel data are unavailable, instrumental variables are the most commonly used technique to correct for these placement effects, and this is the approach used in this analysis. While no instrument is perfect, this paper constructs a variable, termed the "natural path" (described in the data section), which we suggest greatly improves the efficiency of the estimates over, say, estimates produced using more common straight-line instruments. 3 Spatial sorting by households could also potentially bias our estimates if, for example, a household moved to a particular location on the basis of a variable which we do not control for. Similarly, locations of the markets which we focus on in this paper (i.e. cities with populations greater than 100,000) may also be endogenous, as they emerged historically in locations of high economic potential. We address both of these potential biases by including carefully chosen control variables in our regression analysis (discussed later in the paper).
Another difficulty with estimating the impact of road networks on economic activity is generating a variable which properly captures the proximate benefits of both local and regional roads. We overcome this difficulty by calculating the actual cost of transporting goods to market along Nigeria's existing road network. Taking into account the road classification, quality, type of paving and roughness of the terrain, the measure of transport cost to market that is calculated is perhaps the most accurate possible, given existing information.
2 See Emran and Hou (2013), which discusses these three sources of bias in the context of rural China.
3 See also Faber (2014), which uses a similar instrument.
Nigeria has an extensive national road network of more than 85,000 km of classified roads (Gwilliam 2011). Both paved and unpaved road network densities are more than twice as high as those for the peer group of resource-rich African countries, although still only half of the levels found in Africa's middle-income countries (Foster and Pushak 2011). According to the Africa Infrastructure Country Diagnostic benchmark study (Foster and Briceño-Garmendia 2008), if Nigeria wishes to meet its economic and social targets for transportation infrastructure, it would need to invest $1.2 billion annually for a 10-year period. It is therefore important to evaluate the effects of investment in transportation infrastructure in order to justify spending on this scale.
This paper attempts to provide a more complete picture of the extent to which household welfare and incomes are expected to improve with a given reduction in transport costs. We do so by considering several different outcome variables which we obtain from two household surveys and a raster data set 4 on local gross domestic product (GDP). The household surveys employed in this paper are the 2010 Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) for Nigeria, and the 2008 Nigeria Demographic and Health Survey (NDHS). From these surveys, we are able to obtain several welfare indicators, including revenue from crop production, revenue from livestock sales, non-agricultural income, employment, wealth, and a multi-dimensional poverty indicator. A key advantage of these household surveys is that the geo-locations of the enumeration areas are recorded. The raster data set, which we obtain from Ghosh et al (2010), gives an estimate of local GDP at a very fine spatial level for the entire land area of Nigeria.
These measures also give us the ability to study the effects of transport costs on both 'flow' measures of welfare and 'stock' measures, which capture much longer term effects. Measures of income, such as crop revenue, livestock sales, and non-agricultural income, are in the former category and will be affected by idiosyncratic shocks or localized impacts, for instance, a bad harvest due to below-average rainfall, or a sudden illness of the household head. Meanwhile, although improving transportation infrastructure can lead to benefits in the short term, many of the benefits will not become apparent for many years after the improvement, at which point households and businesses have had time to adjust to the new equilibrium. To capture these benefits, we also study 'stock' measures of welfare, including a wealth index available in the NDHS, and a multi-dimensional poverty index (MPI) which we construct following Alkire and Santos (2010). 5 In addition, we examine a plausible mechanism of income diversification and improvement in economic activity (all year employment of working age population) by which transport cost reduction may affect local GDP. By looking at these different indicators of welfare, we are able to disaggregate the benefits of a transport cost reduction and obtain insights into the causal pathways to poverty reduction.
4 A raster is a matrix of cells where each cell contains spatial information, in our case local GDP.
The elasticities generated from the household survey and local GDP analysis are summarized in Table 1 below. We then use the elasticity on local GDP to forecast the economic impact of the improvement of the current network. Local GDP is used in these simulations because these data provide a baseline that is available for the entire country, unlike the household survey data, which have only limited spatial coverage. Our forecasts allow for heterogeneous benefits depending on current levels of welfare, current transportation costs, and spatially varying transportation cost reductions from new road construction. This can enable decision-makers to maximize the efficiency with which they use scarce resources, by prioritizing construction of those roads which would have the biggest impact on economic growth and poverty reduction in the region.
Note: These benefits were estimated using the natural path IV.

Related Literature
This paper is related to a vast and rapidly growing literature on the effects of infrastructure on well-being and a continuing debate (among planners, policy makers, and academics) about the role of transport investments in economic growth. This debate has been fostered by limited evidence of a causal relationship and conflicting evidence provided by different studies on the relationship between the two (Gunasekara et al Lakshmanan 2007).
Approaches to addressing this issue have varied considerably and evolved over time. Researchers have examined the effects of road infrastructure and transport capital investments on aggregate productivity (usually measured by GDP or personal income), output elasticity and productivity in developed countries (Aschauer 1989, Lakshman and Anderson 2002, Lakshman and Anderson 2007, Chandra and Thompson 2000, Demetriades and Mamuneas 2000, Annala and Perez 2001, Foster and Araujo 2004, Ihori and Kondo 2001, Lokshin and Yemtsov 2003, Nadiri and Mamuneas 1996, Munnell 1990, Shirley and Winston 2004, and Sturm 2001), and in developing countries (Deichmann et al 2002, Morrison-Paul et al 2001, Lokshin and Yemtsov 2003, Feltenstein and Ha 1995).
The results however remain ambiguous with conflicting evidence of impacts in both developed and developing countries. To a large extent the contradictory evidence and the ensuing debates are a consequence of the identification and reverse causality problems.
A set of recent papers has used rigorous and compelling identification strategies to shed light on the impact of large transport infrastructure improvements (Michaels 2008, Donaldson 2012, Datta 2012, Faber 2012, and Banerjee et al 2012). 6 One approach is to use panel data estimation methods (Dercon et al 2008, Khandker and Koolwal 2011) ... (Jacoby and Minten 2009, Shrestha 2012, Emran and Hou 2013).
Recent papers have looked at the mechanisms through which transportation costs impact wellbeing. One of these is that reducing transportation costs leads to greater access to markets, as well as a decrease in both trade costs and interregional price gaps (Donaldson 2012, Casaburi et al 2013). This further affects input and output prices of crops (Khandker et al 2006, Minten and Kyle 1999, Chamberlin et al 2007, Stifel and Minten 2008).

Data
This paper utilizes several different data sets to analyze the relationship between transportation costs that households incur to access the nearest market (defined as cities with population of at least 100,000) and several different measures of welfare. 7 In order to do so, a very thorough road network was constructed for Nigeria, using several sources of data described in section 3.1 below. To correct for endogenous road placement, an IV approach is used, and an instrument was generated for this paper which we refer to as the Natural Path, described in section 3.2. Finally, a multitude of welfare indicators are utilized in this paper, and these are described in section 3.3. (For summary statistics of the main variables used, see Appendix I.)

Travel costs to market
Throughout much of the literature on transport infrastructure, the variables of choice to measure infrastructure availability typically fall into three categories: local road 7 This paper analyzes the combined effect of both large transport infrastructure, such as highways, and rural roads, and thus differs from Michaels, 2008, Donaldson, 2012, Datta, 2012, Faber, 2012 and Banerjee, Duflo and Qian, 2012, which analyze the impact of large transport infrastructures, highways and railways. It also differs from Jacoby and Minten 2008, Dorosh et al 2010, Gibson and Rozelle (2003, Ali (2011) Banerjee, Duflo and Qian 2012, Stifel and Minten, 2008, Fafchamps and Moser, 2003; Jacoby, 2000; Minten and Kyle, 1999), and travel time to destination (e.g. Dorosh et al 2012). These are all attempts to proxy the true price of traveling or transporting goods throughout a road network. Proxies are necessary because obtaining true transport prices would require surveying every possible origin and destination to determine the local price of shipping goods, which would be very difficult, if not infeasible. These techniques, while certainly correlated with transport prices, are imperfect measures. Their biggest shortcoming is that they often fail to distinguish between roads of different qualities and across different terrains. It is certainly more costly to travel along an unpaved, tertiary road, or a road with a steep gradient, than it would be to travel down a flat, paved, national highway. However, simple measures of road infrastructure will not distinguish between these routes (although travel time can account for this to some extent by reducing speed on unpaved roads).
To better account for this, we use the Highway Development and Management Model (HDM-4) and a mixture of GIS tools to estimate the actual cost of traveling along a road.
This model penalizes routes which follow roads in less than perfect condition, that are unpaved, or that are not flat (See Appendix II for more details). While this is still not a perfect measure of transport prices, it is a significant improvement over the current state of the literature. The total travel cost to the cheapest market (defined as a city of at least 100,000 residents) is calculated using an iterative cost-minimizing process in which every possible travel path to every available market was calculated, and the least cost one chosen as the optimal route. 8
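The cost-minimizing search described above can be sketched as a shortest-path problem on the road graph. The snippet below is a minimal illustration only, not the authors' GIS workflow: the per-kilometer unit costs, road classes, and node names are hypothetical placeholders rather than HDM-4 outputs, and the search is a standard multi-source Dijkstra that starts from all markets at once.

```python
import heapq

# Hypothetical unit costs (USD per ton-km) by road class; illustrative values,
# NOT taken from HDM-4 or from the paper.
UNIT_COST = {"paved_good": 0.05, "paved_poor": 0.08, "unpaved": 0.15}

def cheapest_market_cost(edges, markets):
    """Return {node: (cost, market)} for the least-cost route from each node to any market.

    edges: iterable of (node_a, node_b, length_km, road_class), treated as two-way.
    markets: iterable of market nodes (e.g. cities above the population threshold).
    """
    graph = {}
    for a, b, km, road_class in edges:
        cost = km * UNIT_COST[road_class]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {}
    # Seed the queue with every market at zero cost (multi-source Dijkstra).
    heap = [(0.0, m, m) for m in markets]
    heapq.heapify(heap)
    while heap:
        cost, node, market = heapq.heappop(heap)
        if node in best:
            continue  # already settled at a cost that is no higher
        best[node] = (cost, market)
        for neighbour, step in graph.get(node, []):
            if neighbour not in best:
                heapq.heappush(heap, (cost + step, neighbour, market))
    return best

edges = [
    ("village", "junction", 10, "unpaved"),
    ("junction", "city_a", 40, "paved_good"),
    ("village", "city_b", 30, "unpaved"),
]
result = cheapest_market_cost(edges, ["city_a", "city_b"])
```

In this toy network the village's cheapest market is city_a, reached over 50 km of mostly paved road, rather than city_b, which is only 30 km away but on an unpaved road. The market returned for each node is also what would define a household's marketshed.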

The natural path instrument
As discussed above, it is well established in the literature that simple OLS regressions will often yield biased estimates of the effects of public investment due to the non-random placement of infrastructure. In cases where roads are built to connect regions of high economic potential, OLS will tend to overestimate the impacts of roads.
If on the other hand, roads are built with poverty-reduction goals in mind, then OLS would instead underestimate the impacts.
In order to eliminate the bias from this endogeneity, we use an instrumental variable (IV) approach based on the topography of the land between the origin (households) and the destination (markets). Specifically, this variable, which we refer to as the "natural path", is the time that it takes to walk along the time-minimizing route from a given location to the nearest market, in the absence of roads. Faber (2014) made use of a similar instrument. Given that road construction costs are a function of segment length and land topography, the natural path route is highly correlated with the most cost-effective place to construct a road network, if economic benefits were ignored.
Moreover, in the context of Africa it captures many of the historic trade and caravan routes where head-loading (walking) was the dominant pre-colonial mode of transportation. 9 Therefore, it is plausible to suggest that any endogeneity in the road network from placement decisions (i.e. decisions to place roads in areas which would maximize economic benefits) is captured in the difference between the current road network and the natural pathway. This instrument is an improvement over "straight-line" instruments, as the natural path more accurately represents what straight-line instruments are attempting to estimate, that is, the most cost-effective route to connect two points, while excluding any other economic benefits. Details on the GIS algorithm and data used to construct the Natural Path are in Appendix III.
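To make the idea of a time-minimizing walking route concrete, the sketch below computes the minimum walking time across a small elevation grid. It is an illustration under stated assumptions, not the algorithm of Appendix III: walking speed is taken from Tobler's hiking function (an assumption; the paper does not say which speed rule is used), movement is restricted to four neighbouring cells, and the elevation grid is invented.

```python
import heapq
import math

def tobler_speed_kmh(slope):
    """Tobler's hiking function: walking speed in km/h for slope = rise / run."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))

def walking_time_hours(elev_m, cell_km, start, goal):
    """Minimum walking time between two cells of an elevation grid (Dijkstra, 4 neighbours)."""
    rows, cols = len(elev_m), len(elev_m[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return t
        if t > best.get((r, c), math.inf):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                slope = (elev_m[nr][nc] - elev_m[r][c]) / (cell_km * 1000.0)
                nt = t + cell_km / tobler_speed_kmh(slope)
                if nt < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = nt
                    heapq.heappush(heap, (nt, (nr, nc)))
    return math.inf

# Hypothetical 3 x 3 grid (metres) with a 500 m hill between origin and market.
elevation = [
    [0, 0, 0],
    [0, 500, 0],
    [0, 0, 0],
]
natural_path_time = walking_time_hours(elevation, 1.0, (1, 0), (1, 2))
over_hill_time = 1.0 / tobler_speed_kmh(0.5) + 1.0 / tobler_speed_kmh(-0.5)
```

Here the time-minimizing route walks around the hill (four flat cells) instead of over it (two steep cells). A straight-line instrument would ignore this kind of topography.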

Welfare indicators
In order to robustly estimate the welfare benefits of reducing transportation costs,

Empirical Framework
Our main identification strategy is to instrument for cost to market with the natural path variable (i.e. time it takes to walk to market along the natural terrain).
To illustrate the approach, consider the following model:

$$y_i^k = \beta_0 + \beta_1 T_i + X_i' \beta_2 + \varepsilon_i \qquad (1)$$

$$T_i = \pi_0 + \pi_1 Z_i + X_i' \pi_2 + \nu_i$$

where $y_i^k$ denotes the level of outcome $k$ (agricultural revenue, livestock sales, nonagricultural income, multi-dimensional poverty index, wealth index, local GDP, income diversification, and all year employment) indicating welfare or employment of household $i$ in the case of the two household surveys, or location $i$ in the case of local GDP; $T_i$ is the transport cost to market, $X_i$ is a vector of household controls, and $Z_i$ is the natural path variable. For local GDP analysis, household controls are replaced with geographic-level control variables.
11 The LSMS-ISA is part of a $19 million project of the Bill and Melinda Gates Foundation. In Nigeria, the LSMS-ISA data was collected twice over two seasons. The Post-Planting Survey was conducted August-October 2010. This was followed by the Post-Harvest Survey in February-April 2011. Each survey is made up of three integrated questionnaires: household, agriculture, and community. In addition, certain geovariables are available (including information on agro-ecological zones). Each enumeration area is geolocated, allowing us to merge this data with spatial data from other sources. For the purposes of this analysis, we use the 2010 post-planting survey, mainly focusing on the agriculture questionnaire with a few variables (e.g. labor) from the household survey.
12 The original purpose of the survey was to inform policy makers on a variety of issues mainly affecting women and children, including fertility preferences, infant and young children feeding practices, nutritional status, and early childhood and maternal mortality. For an explanation of the sampling procedure used in the NDHS, see Appendix IV.
The key parameter of interest is $\beta_1$, the coefficient on transport cost, which captures the causal impact of the cost of traveling to the cheapest market on household welfare. Three of the outcome variables analyzed represent sources of income (i.e. from crops, livestock, or nonagricultural sources) and are likely to be related to one another. Thus, these three outcomes are estimated jointly in a seemingly unrelated regression (SUR) framework. Specifically, since transport costs are endogenous, we employ three-stage least squares (which combines SUR and IV methods). The remaining outcome variables are estimated using the customary two-stage approach. In all cases, the endogenous transport cost variable is instrumented using the natural path variable.
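The two-stage logic can be illustrated on simulated data. The sketch below is hedged in several ways: all numbers are invented, a single control stands in for the full control vector and marketshed fixed effects, and it shows plain two-stage least squares for one outcome, not the three-stage (SUR plus IV) estimator used for the income equations. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_beta = -0.6  # assumed elasticity of income with respect to transport cost

# Simulated data (all values hypothetical).
potential = rng.normal(size=n)   # unobserved economic potential
log_path = rng.normal(size=n)    # instrument: log natural-path walking time
control = rng.normal(size=n)     # one observed household control
# Roads are placed where potential is high, so cost is lower there (endogeneity).
log_cost = 0.8 * log_path + 0.2 * control - 0.5 * potential + rng.normal(scale=0.5, size=n)
log_income = true_beta * log_cost + 0.3 * control + potential + rng.normal(scale=0.5, size=n)

def ols(y, *columns):
    """OLS coefficients of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive OLS ignores that cost is correlated with unobserved potential.
beta_ols = ols(log_income, log_cost, control)[1]

# Stage 1: project transport cost on the instrument and the control.
first_stage = ols(log_cost, log_path, control)
fitted_cost = first_stage[0] + first_stage[1] * log_path + first_stage[2] * control

# Stage 2: replace cost with its fitted value.
beta_2sls = ols(log_income, fitted_cost, control)[1]
```

In this simulation OLS overstates the magnitude of the effect, which is the case described earlier where roads connect areas of high economic potential, while the 2SLS coefficient is close to the assumed value. Standard errors from a manual second stage are not valid and are omitted here.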
In addition to addressing the endogeneity of non-random road placement, we further consider the potential endogeneity stemming from non-random locations of households and markets. Failure to take these into account could yield potentially biased estimates. We address these sources of bias through carefully chosen control variables, which are included in the vector of controls.
Spatial sorting by households could potentially bias estimates if, for example, a household's location was determined by a variable that has not been controlled for. In the context of Nigerian farmers, this spatial sorting is arguably much less of a concern.
Given the lack of a functional land market, it is unlikely that farming households would change locations, as moving the entire household would require abandoning one's land.
In the Nigeria LSMS data, for example, 74% of land is inherited with less than 6% bought (the remaining land is either rented or used free of charge). Therefore, while individuals do migrate (usually to cities), it is rare for the household as a whole to relocate, especially to other rural areas. Even so, we do control for characteristics of the household head (i.e. age and literacy) in our regressions that may indicate whether a household will have the means to relocate.
The location of markets is largely determined by environmental and topographical factors. As discussed previously, cities tend to emerge historically in areas of high economic potential. To address the endogeneity of market locations we include fixed effects based on which city a household travels to according to the least cost path, which we refer to as 'marketshed fixed effects'. By doing so, we are accounting for any unobserved heterogeneity between market locations. 13
As a robustness check for the validity of the instrument, we calculate a set of Conley Bounds (Conley et al 2012) for the coefficients of interest. To illustrate this, let $Z_i$ be the IV and rewrite equation (1) as follows:

$$y_i^k = \beta_0 + \beta_1 T_i + X_i' \beta_2 + \gamma Z_i + \varepsilon_i$$

where $T_i$ is the transport cost to market and $X_i$ is the vector of controls. The traditional IV strategy assumes that $\gamma = 0$. Conley Bounds allow $\gamma$ to be close but not actually equal to zero; in other words, they allow the IV to be only "plausibly exogenous". By allowing the value of $\gamma$ to vary, we can then test how sensitive the estimates are to different degrees of exogeneity.
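One way to compute such bounds is the "union of confidence intervals" idea from Conley et al (2012): for each assumed value of the direct effect of the instrument on the outcome, subtract that effect from the outcome, re-estimate the IV model, and take the union of the resulting intervals. The sketch below follows that idea on simulated data. It is an illustration only: the data, the support chosen for the direct effect, and the homoskedastic standard errors are all assumptions, whereas the paper reports clustered results.

```python
import numpy as np

def iv_fit(y, endog, instr, controls):
    """Just-identified IV: return (coefficient on endog, its homoskedastic standard error)."""
    n = len(y)
    X = np.column_stack([endog, np.ones(n), controls])  # regressors
    W = np.column_stack([instr, np.ones(n), controls])  # instruments
    WX_inv = np.linalg.inv(W.T @ X)
    beta = WX_inv @ (W.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * WX_inv @ (W.T @ W) @ WX_inv.T
    return beta[0], np.sqrt(cov[0, 0])

def conley_union_ci(y, endog, instr, controls, gammas, z_crit=1.96):
    """Union of 95% intervals over assumed direct effects gamma of the instrument."""
    lows, highs = [], []
    for gamma in gammas:
        b, se = iv_fit(y - gamma * instr, endog, instr, controls)
        lows.append(b - z_crit * se)
        highs.append(b + z_crit * se)
    return min(lows), max(highs)

# Simulated data (hypothetical values; true coefficient is -0.6, true gamma is 0).
rng = np.random.default_rng(1)
n = 5_000
potential = rng.normal(size=n)
log_path = rng.normal(size=n)
control = rng.normal(size=n)
log_cost = 0.8 * log_path + 0.2 * control - 0.5 * potential + rng.normal(scale=0.5, size=n)
log_income = -0.6 * log_cost + 0.3 * control + potential + rng.normal(scale=0.5, size=n)

gammas = np.linspace(-0.05, 0.05, 11)  # assumed support for the direct effect
low, high = conley_union_ci(log_income, log_cost, log_path, control, gammas)
```

If the whole interval from `low` to `high` stays on one side of zero, the sign of the estimate survives that degree of relaxation of the exclusion restriction. This is how the "remains consistently negative" statements in the results can be read.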

Empirical Results
For each of our welfare measures, our identification strategy focuses on the use of our natural path instrumental variable. As a robustness check, we further report the results from using a Euclidean Distance IV, 14 which yields very similar results. In the interest of brevity, our discussion focuses on the natural path results. To account for the fact that household observations are likely correlated within enumeration areas, we report results clustered at the enumeration area. Overall, the empirical results suggest that lowering the cost to market would yield significant benefits to rural households, though the impacts appear to depend on the source of income and location.

LSMS Measures of Household Welfare
We begin by presenting the effects of transport cost on income from different sources, such as total revenue from crop sales, livestock sales, and non-agricultural income of the household over the past year. These are the flow measures of household welfare, taken from the LSMS-ISA data for Nigeria.
13 As a robustness check, we instead controlled for regional fixed effects (e.g. agro-ecological or geopolitical zones) in place of marketshed fixed effects. With the exception of livestock, which turned out insignificant, all outcome variable estimates remained robust.
14 Similar to the natural path variable, Euclidean distance measures the straight-line distance from the household (or cell, in the case of local GDP) to the least-cost market.

Crop Revenue
Our crop revenue regressions suggest that, on average, decreased transport costs lead to increased household revenue from crop sales. Preliminary SUR results are reported in column (1) of Table 2, which shows that decreasing transport cost by ten percent would increase crop revenue by approximately 6.2 percent. While these preliminary SUR results are reassuring in that they conform to prior expectations, they must be treated with caution as they do not take account of the endogeneity of roads. Columns (2) and (3) report the three-stage least squares (3SLS) estimates where cost to market is instrumented by the Euclidean distance and natural path, respectively.
These unbiased estimates of the effect of transport costs are slightly larger than the SUR coefficient (at -0.63 and -0.64 respectively). The natural path IV passes the Angrist-Pischke F Test of Weak Identification, with the F statistic far exceeding 10, the rule of thumb.
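As a worked illustration of how such a coefficient maps into the percentage effects quoted in the text, assume the crop revenue equation is estimated in log-log form (as stated explicitly for the non-agricultural income and wealth regressions), so that the coefficient is an elasticity. The symbols $y$ and $T$ below are shorthand for revenue and transport cost. The quoted effects correspond to the usual linear approximation; the exact change implied by the same coefficient is slightly larger:

```latex
\begin{align*}
\ln y &= \beta_0 + \beta_1 \ln T + \dots
  \quad\Longrightarrow\quad
  \frac{y_{\text{new}}}{y_{\text{old}}} = \left(\frac{T_{\text{new}}}{T_{\text{old}}}\right)^{\beta_1} \\[4pt]
% natural path 3SLS estimate: beta_1 = -0.64; a 10 percent cost reduction means T_new / T_old = 0.9
\text{Approximation:}\quad \%\Delta y &\approx \beta_1 \times \%\Delta T = (-0.64)(-10\%) = 6.4\% \\
\text{Exact:}\quad \frac{y_{\text{new}}}{y_{\text{old}}} - 1 &= 0.9^{-0.64} - 1 = e^{-0.64 \ln 0.9} - 1 \approx e^{0.0674} - 1 \approx 7.0\%
\end{align*}
```

The gap between the two figures grows with the size of the cost reduction, which matters when the elasticities are applied to large simulated changes in transport cost.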

Livestock Sales
As with crop revenue, we report both our SUR and 3SLS estimates of livestock sales in Table 3. In both cases, we find that the estimated coefficient on cost to market is negative and significant. However, after considering the various robustness checks below, it would seem that overall the relationship between cost to market and the sale of livestock is not very robust. This might be in part due to the multiple roles of livestock as a store of value and capital good. Livestock sales in a given year are therefore driven by decisions on asset management (e.g., the need for revenue to meet temporary household cash needs such as weddings or natural disaster losses) much more than are crop sales.

Non-Agricultural Income
Turning next to the relationship between access to markets and non-agricultural income, economic theory suggests that as transport costs decrease, more opportunities outside of the agricultural sector become available. To investigate this, we regress log non-agricultural income on the log of cost to market, holding constant household characteristics and marketshed fixed effects. Table 4 reports the SUR estimates (column 1), and the 3SLS estimates (column 2) for non-agricultural income. The SUR estimates suggest that reducing transport costs by 10 percent increases non-agricultural income by 3.2 percent. After controlling for endogeneity, our 3SLS estimates find a slightly larger increase in income: 3.3 percent.

Robustness Checks
As a robustness check, we estimate the impact of transport costs on the three sources of income separately. That is, we estimate three sets of two-stage least squares models and find very consistent estimates. Controlling for Agro-Ecological Zone fixed effects in place of the marketshed dummies yielded similar results for each of the three outcomes. Further, the estimated impact of transport costs on crop revenue and nonagricultural income was found to be robust to a number of alternative specifications. For example, they were robust to the inclusion and exclusion of land, labor, fertilizer, and irrigation control variables. Further, alternatives were tested such as credit, as well as an indicator of the presence or absence of a hospital within the community. These results are reported in Appendix VII.
To check the robustness of our estimates to the relaxation of the exclusion restriction on the natural path instrument, a set of Conley Bounds are calculated following Conley et al (2012), and reported in Table 10. 15 For both the crop revenue and non-agricultural income, the 95% confidence interval suggests that the coefficient on the variable of interest remains consistently negative. Taken together with the Angrist-Pischke F statistic and first stage results, these findings suggest that our estimates are robust to relaxation of the exclusion restriction. 16 In the case of livestock sales, the Conley Bound 95% confidence interval crosses the zero line. This, together with the abovementioned alternate specifications, suggests that these livestock estimates are not as robust as those of crop revenue and non-agricultural income.

NDHS Measures of Household Welfare
We now turn to the two measures of household welfare from the NDHS: the wealth index and the multidimensional poverty index.
15 Note that the Conley Bound estimation is designed for two-stage least squares, so for the purposes of this robustness check the three outcome variables are treated separately.
16 These first stage results are taken from separate two-stage least squares models.

Nigeria Demographic and Health Survey
The first indicator of household welfare is the "wealth index", available in the DHS data. 17 The second indicator, a multi-dimensional poverty measure, is generated specifically for this study. We follow Alkire and Santos (2010) to calculate the Multi-Dimensional Poverty Index (MPI) for each household. The MPI is a weighted sum of ten indicators of deprivations across three dimensions: education, health, and standard of living. We follow convention and use equal weights for each of the three dimensions and for indicators within dimensions. A household is considered to be multi-dimensionally poor if it is deprived in three of the ten weighted indicators. Table A4 in Appendix V gives more specific details on how this index was constructed. 18 Table 5 presents the results from regressing the wealth index on transportation costs (both in natural log form). Column (1) in Table 5 presents the coefficients from OLS estimation, and column (2) presents the coefficients from 2SLS estimation. The coefficient on the natural path instrument in the first stage is very highly statistically significant and positively related to transportation cost to the market, as expected. Our results indicate that a 10 percent reduction in transportation cost leads to a statistically significant 2.3 percent increase in the wealth index according to OLS estimation, and a 2.1 percent increase in the wealth index from our IV estimates. 19 Again, our IV passes the Angrist-Pischke F Test of Weak Identification, with the F statistic far exceeding 10. The control variables tend to follow intuition. Households located in areas of higher agricultural potential, larger households, and households with more adults in the working age group (15-49 years for females and 15-59 for males) tend to have accumulated more wealth. Households which are agriculturally involved, rural households, and households with more young children have lower levels of wealth.
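The MPI construction described above can be sketched as follows. This is a minimal illustration, assuming the ten standard Alkire and Santos (2010) indicators with weights scaled to sum to ten and a cutoff of three, which is one reading of "deprived in three of the ten weighted indicators". The indicator names are shorthand chosen here, and the exact definitions used in the paper are those of Table A4 in Appendix V.

```python
from fractions import Fraction

# Equal weight per dimension (10/3 each), split equally across its indicators.
# Indicator names are shorthand assumptions, not the paper's variable names.
WEIGHTS = {
    # Education
    "years_of_schooling": Fraction(10, 6),
    "child_enrollment": Fraction(10, 6),
    # Health
    "child_mortality": Fraction(10, 6),
    "nutrition": Fraction(10, 6),
    # Standard of living
    "electricity": Fraction(10, 18),
    "sanitation": Fraction(10, 18),
    "drinking_water": Fraction(10, 18),
    "floor": Fraction(10, 18),
    "cooking_fuel": Fraction(10, 18),
    "assets": Fraction(10, 18),
}
CUTOFF = 3  # weighted deprivations out of a maximum of 10

def deprivation_score(deprived):
    """Weighted sum of the indicators in which the household is deprived."""
    return sum(WEIGHTS[name] for name, is_deprived in deprived.items() if is_deprived)

def is_mpi_poor(deprived):
    """True if the weighted deprivation score reaches the cutoff."""
    return deprivation_score(deprived) >= CUTOFF

# Hypothetical households.
household_a = {"years_of_schooling": True, "nutrition": True}            # score 10/3
household_b = {"electricity": True, "sanitation": True, "floor": True}   # score 5/3
```

Under these weights, household_a is multi-dimensionally poor because two education or health deprivations already exceed the cutoff, while household_b is not, since three living-standard deprivations carry a combined weight of only 5/3.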

Multi-dimensional poverty
Turning next to the impact of transportation costs on the probability of a household being multi-dimensionally poor, we report two sets of results in Table 6: standard linear probability models (OLS and IV) and the marginal effects from maximum likelihood estimation (probit and IV probit). We present both sets of results for robustness but, for space considerations, interpret only the probit models here, since the two sets of estimates are broadly consistent. 20 Overall, decreasing a household's transport cost to market by 10 percent reduces its probability of being multi-dimensionally poor by 2.6 percent.
Our results also indicate that households that live in rural areas or are agriculturally involved are more likely to be multi-dimensionally poor, and households that live in areas with higher agricultural potential are less likely to be multi-dimensionally poor. Comparing the probit and IV probit marginal effects, we find that the IV probit estimate (0.26) is considerably larger than the probit estimate (0.08), indicating that the probit model underestimates the effect of transportation costs on multi-dimensional poverty and that the IV approach is important for obtaining an unbiased estimate. The Conley bounds in Table 10 show that, for both the wealth index and the multi-dimensional poverty index, the coefficients on market cost are robust to relaxing the exclusion restriction: they remain within a small range and keep the same sign.

Local GDP
Our next set of results looks at the impact of transport costs on local GDP. Our measure of local GDP is available as a gridded data set with fine cells (approximately 1 km²), constructed using the fact that brighter lights at night are associated with higher levels of economic activity (see Ghosh et al. 2010 for additional details about how these data were generated). Given the granularity of our control data, we aggregated these data into square cells with sides measuring 5 arc minutes in length (approximately 10 km).
As control variables, we include total population within each cell, 21 total population squared, the Euclidean distance to the nearest mining facility, 22 and indicators measuring the agro-ecological potential yield of the land for four staple crops 23 (cassava, maize, rice, and yams) together with their squared terms (all variables are in natural logs). In addition to these control variables, marketshed fixed effects are included in the regressions. This specification is tested both for all of Nigeria and for rural areas only.
Columns (1), (2), and (3) of Table 7 show OLS and IV results. The OLS estimate of the coefficient on transportation costs implies that a 10 percent reduction in transport costs increases local GDP by 5.4 percent. The IV estimates are slightly lower, at approximately 5 percent. The coefficients on population and on its squared term are both significant and positive, implying that there are important agglomeration economies in local GDP. The negative coefficient on distance to mine implies that economic activity is, as we should expect, denser around mining facilities, although this relationship is not very strong. The coefficients on the agricultural potential of the various crops are difficult to interpret because they are highly correlated with each other.
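The aggregation from the fine grid to 5 arc-minute cells can be sketched as a simple block sum. The sketch below is illustrative only: it assumes the fine grid has 30 arc-second cells, so that one coarse cell covers a 10 by 10 block, and the function name `aggregate_blocks` and the toy array are hypothetical rather than taken from the study.

```python
import numpy as np

def aggregate_blocks(fine_grid: np.ndarray, factor: int = 10) -> np.ndarray:
    """Sum a fine raster into coarse cells of factor x factor fine cells."""
    rows, cols = fine_grid.shape
    # Assumption of this sketch: drop edge rows/columns that do not fill a block.
    rows -= rows % factor
    cols -= cols % factor
    trimmed = fine_grid[:rows, :cols]
    # Reshape so that axes 1 and 3 index the cells inside each block.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))

# Toy example: every fine cell holds one unit of estimated GDP.
fine = np.ones((25, 40))
coarse = aggregate_blocks(fine)
```

Summing (rather than averaging) keeps total GDP unchanged across resolutions, apart from the trimmed edge cells.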
Columns (4), (5), and (6) of Table 7 show OLS and IV results when we only include rural areas of Nigeria. Examining the coefficient on transportation costs in the IV regression, we see that the effect of reducing transportation costs is slightly lower when urban areas are omitted: a 10 percent reduction in transportation costs implies a 4.5 percent increase in local GDP. However, a difference in means test shows that the coefficient on transportation costs for the full sample is statistically indistinguishable from that using only rural observations. Similarly, the coefficients on the control variables do not change significantly between columns (3) and (6), with the exception of the coefficient on population becoming insignificant (but the squared term remains significant, leading to a similar interpretation). Again, the Conley bounds in Table 10 show that our coefficient on market cost remains negative and within a small range when the exclusion restriction is relaxed.
Given the spatial layout of the data used in the above regressions, there is a possibility that spatial autocorrelation could be biasing these results. A Moran's I test confirms the presence of spatial autocorrelation in the residuals of the regressions in Table 7. In order to test for potential bias resulting from this, we employ a bootstrapping technique in which the data set is resampled 1,000 times in a way that ensures spatial independence of each of the samples. 24 The results from this bootstrap are shown in Table A11, Column 2, with the standard, non-bootstrapped results presented again in Column 1 for comparison. The coefficient of interest, on cost to market, while slightly lower in the bootstrapped model, is statistically indistinguishable between the two models. This implies that although spatial autocorrelation is present in the data, any bias it creates in our estimates is negligible. The result that the estimates from our non-spatial model are robust even in the presence of short-distance residual spatial autocorrelation (RSA) is not uncommon (Hawkins et al. 2007).
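For readers unfamiliar with the Moran's I statistic mentioned above, the following is a minimal sketch of how it can be computed for a set of regression residuals. The binary distance-band weights, the `threshold` parameter, and the toy grid are assumptions made for illustration; the weighting scheme actually used in the study is not stated in the text.

```python
import numpy as np

def morans_i(values, coords, threshold):
    """Moran's I with binary weights: w_ij = 1 if 0 < distance(i, j) <= threshold."""
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = ((dist > 0) & (dist <= threshold)).astype(float)
    z = x - x.mean()
    n = x.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

# Toy 4 x 4 grid: a checkerboard (neighbours always differ) and a smooth gradient.
side = 4
coords = [(r, c) for r in range(side) for c in range(side)]
checker = [(r + c) % 2 for r, c in coords]
gradient = [c for r, c in coords]
i_checker = morans_i(checker, coords, threshold=1.0)
i_gradient = morans_i(gradient, coords, threshold=1.0)
```

Values near zero indicate no spatial pattern, positive values indicate clustering of similar residuals (as with the gradient), and negative values indicate that neighbours tend to differ (as with the checkerboard).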

Level and diversification of economic activity
To analyze whether lower transportation costs create more, and more diverse, employment opportunities, we examine how transportation costs affect year-round employment and agricultural versus non-agricultural employment for males and females.
Our dependent variables are dummy variables indicating the employment types of individuals within households: full time versus less than full time or unemployed (Table 8), and agricultural versus non-agricultural employment (Table 9). We also break up our analysis by gender to allow for heterogeneous effects. Columns (3) and (6) in Table 8 indicate that a 10 percent reduction in transportation cost increases the probability of being employed all year round, as opposed to being unemployed or seasonally employed, by 4 percent for males and 3 percent for females. Columns (3) and (6) in Table 9 indicate that a 10 percent reduction in transportation cost will decrease the probability of being agriculturally employed for those who are employed (i.e., increase non-agricultural employment) by 4 percent among males and 5.3 percent among females. These results suggest that reducing transportation cost leads to both an increase in economic activity and diversification away from agricultural activity.

Economic Impact of Alternative Road Investments
In this final section, we use our estimate of the local GDP elasticity of transport costs to simulate the effect of several road infrastructure improvement projects that have been proposed by the World Bank, the African Development Bank (ADB), and the New Partnership for Africa's Development (NEPAD), a planning and coordinating technical body of the African Union. The approach is meant to be illustrative and could be applied to study any road improvement or new road construction project within Nigeria.

The projects
We present below an estimate of the impact of improving the portion of NEPAD's and ADB's Trans African Highway 25 project segments that runs through Nigeria, as shown in Figure 1. It is assumed in the simulations that each corridor would be improved from its current quality to paved and good condition status. The baseline scenario is obtained from FERMA 26 and requires that 20 km be paved, while approximately 1,275 km need to be improved from poor to good condition and 815 km from fair to good condition.

25 The system of Trans African Highways consists of 9 main corridors with a total length of 59,100 km. The concept, as originally formulated in the early 1970s, aims at the establishment of a network of all-weather roads of good quality, which would: a) provide as direct routes as possible between the capitals of the continent, b) contribute to the political, economic and social integration and cohesion of Africa and c) ensure road transport facilities between important areas of production and consumption.

Simulation Methodology
To calculate the change in transportation cost resulting from the improvement of each corridor, we follow the same procedure utilized in section 3.1 to estimate the travel cost to the cheapest market and compare these to current transport costs. The percentage change in transportation costs for each cell, if all three of the corridor improvement projects were completed, is shown in Figure 2.
New transport cost elasticities are calculated, one for each of the 6 geopolitical zones of Nigeria. 27 This allows us to account for regional heterogeneity in the benefits of reducing transport costs. These elasticities are shown in Table 11 (the South East region dummy is omitted). Formally, the increase in local GDP is given by:

$$\Delta GDP_j = \sum_{k} \sum_{i \in k} \varepsilon_k \cdot \%\Delta TC_{ij} \cdot GDP_i$$

where $\Delta GDP_j$ is the total increase to local GDP due to project j, $\varepsilon_k$ is the local GDP elasticity of transportation costs for region k (from Table 11), $\%\Delta TC_{ij}$ is the percentage change in transportation costs in cell i due to project j, and $GDP_i$ is the baseline GDP in cell i, from the local GDP data. This increase in GDP represents an increase in annual GDP over the baseline level. These benefits will accrue every year as long as the benefits from reducing transportation costs and baseline GDP levels both remain constant. 28

For each project, we estimate the increase in local GDP in each grid cell separately and then sum across all grid cells to arrive at an aggregate total benefit. The spatial approach also allows us to identify the number of beneficiaries and estimate the benefit per road kilometer improved. Given the inherent uncertainty involved in statistical analysis, we calculate total benefits given our preferred elasticities (point estimates given in Table 11), as well as a range of benefits representing the 95% confidence intervals around those point estimates.
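The cell-by-cell calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the study: the function name `project_benefit`, the zone labels, the elasticities, and the GDP figures are all hypothetical placeholders, and the percentage change in transport cost is expressed as a fraction (so a 10 percent reduction is -0.10).

```python
import numpy as np

def project_benefit(elasticity_by_zone, zone_of_cell, pct_change_cost, baseline_gdp):
    """Return (total GDP gain, per-cell gains) for one project.

    Each cell's gain is: zone elasticity x fractional change in transport
    cost x baseline GDP. With a negative elasticity, a cost reduction
    (negative change) gives a positive gain.
    """
    eps = np.array([elasticity_by_zone[z] for z in zone_of_cell])
    gains = eps * np.asarray(pct_change_cost, dtype=float) * np.asarray(baseline_gdp, dtype=float)
    return gains.sum(), gains

# Hypothetical inputs for three grid cells in two zones.
elasticities = {"zone_a": -0.5, "zone_b": -0.4}   # placeholder values, not Table 11
zones = ["zone_a", "zone_a", "zone_b"]
cost_change = [-0.10, 0.0, -0.20]                 # cell 2 is unaffected by the project
gdp = [100.0, 50.0, 200.0]                        # baseline GDP per cell

total_gain, cell_gains = project_benefit(elasticities, zones, cost_change, gdp)
```

Keeping the per-cell gains, rather than only the total, is what makes it possible to map where benefits occur and to count the population in cells with a non-zero gain.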

Benefit Point Estimates
We first present results from the point estimations of the elasticities in Table 11 for the benefits of the three NEPAD projects. These are shown in Table 12. The benefits from these projects are found to be quite large. The North-South Corridor, which is the longest road of the three projects, would result in estimated annual benefits of over $1 billion. Annual benefits from the Northeastern and Southern corridors are significantly lower, at $233 and $529 million, respectively. Nevertheless, these roads are also shorter, potentially implying a lower cost of improvement. If all projects were completed, total estimated annual benefits would be $1.8 billion. Note that the total benefits of all of the construction projects are not equal to the sum of the benefits of each of the projects individually, because there is some overlap between the project locations. Figure 3 shows where exactly the increase in local GDP would occur if all three of the corridor improvement projects were completed.
Turning to the third column of Table 12, we calculate the total benefit per km of each corridor, which allows us to rank the projects according to their benefit-efficiency, i.e., which project yields the most benefits per unit cost, assuming road improvement costs per km are uniform across projects. We see that the North-South Corridor project would return annual benefits of approximately $970,000 per km improved, significantly higher than that of the Northeastern Corridor, with benefits of $250,000 per km improved, and moderately higher than the Southern Corridor, at $730,000 per km improved.
Using Landscan population data, we also approximate the number of people whose transportation costs to market would decline as a result of each project. This is shown in column (4). Again, the North-South Corridor has the biggest impact here, benefiting 23.1 million people. The Northeastern and Southern Corridor projects would benefit 14.9 and 9.2 million people, respectively. Figure 4 shows the total population affected if all three corridor improvement projects are completed. By dividing total benefits by the number of people affected, we arrive at estimated benefits per person affected (column 5). Despite only having the second largest overall impact, the Southern Corridor project has the largest benefit per capita, at $57.7 per person benefited.
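The per-person figures follow from simple division, which the short sketch below reproduces for the two corridors whose total benefits are stated exactly in the text. The dictionary layout and function name are hypothetical. Because the inputs are the rounded values quoted above, the results can differ slightly from Table 12 (for example, about $57.5 rather than $57.7 for the Southern Corridor).

```python
# Annual benefits (million USD) and people affected (millions), as quoted in the text.
projects = {
    "Northeastern Corridor": {"benefit_musd": 233.0, "people_m": 14.9},
    "Southern Corridor": {"benefit_musd": 529.0, "people_m": 9.2},
}

def benefit_per_person(benefit_musd, people_m):
    """USD per person affected (millions of USD divided by millions of people)."""
    return benefit_musd / people_m

per_person = {
    name: benefit_per_person(p["benefit_musd"], p["people_m"])
    for name, p in projects.items()
}

# Rank projects from highest to lowest benefit per person affected.
ranked = sorted(per_person, key=per_person.get, reverse=True)
```

The same pattern applies to benefits per km: replace the population figure with the length of road improved.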

Range of Plausible Benefits
Given that statistical estimates are not precise, we also present benefit ranges based on the 95% confidence interval surrounding the estimated local GDP elasticity of transportation costs. These intervals are given in Table 13. The recalculated benefits for each of the NEPAD projects for these two elasticity bounds are given in Table 14. When all projects are completed, the estimated annual benefits range from $1.2 billion to $2.3 billion. Per capita and per km benefit could also easily be calculated for this range of benefits. Because the total number of people affected and the length of each road will not change, these values are not shown for brevity.

Road Prioritization
By analyzing small segments of each road separately, we can further prioritize these infrastructure projects not just by which overall project would have the largest impact, but within each project, which segments should be improved with greater urgency.
In order to analyze different segments of the road, we first divide roads into "marketsheds." Recall that a marketshed is defined by the land area served by each city (with a population of at least 100,000 residents), when the transport costs of reaching the city are minimized. The size and shape of a city's marketshed will therefore depend on both the road network around that city and its proximity to other cities.

Again, Lagos emerges as the top priority. Figure 6 displays visually the benefits per kilometer of each road segment.

Discussion and Conclusion
In summary, we find that reducing transportation costs in Nigeria would lead to a significant increase in several dimensions of welfare, raise economic activity, increase all-year employment, and open up opportunities for income diversification.
Our analysis further suggests that income diversification and increases in income from different sectors lead to long-term wealth accumulation and poverty reduction. We also provide guidance on how the analysis conducted in the study can be used to prioritize road projects based on the objective of the planner.
The paper contributes to the literature on transport cost in a variety of ways. It creates a rich data set on transportation costs for Nigeria by combining data on road quality and road networks from different sources. We deal with several potential sources of bias arising from the non-random placement of roads, the spatial sorting of households, and the geographic emergence of markets. We are thus able to disentangle cause and effect, using a novel instrumental variable: the time taken to reach markets using a natural pathway.
Results from Conley Bounds demonstrate that our estimates are robust even in the case that the exclusion restriction assumption of our IV fails.
The above analysis, while by no means exact, represents a robust attempt at estimating the economic impact from several road improvement projects. Although we believe we have used the best possible methods, and the best possible data, several shortcomings are acknowledged.
In calculating our estimated local GDP elasticity of transportation cost, we use data that are themselves estimated. This adds an additional level of uncertainty to our estimates, but the uncertainty is unavoidable because spatially disaggregated data on actual (non-estimated) GDP are not available. Additionally, even though our elasticity is based on estimated data, it falls within the range of other elasticities derived using survey data, suggesting that the estimate is plausible.
Another potential shortcoming is that we use cross-sectional data, which can often make discerning causality very difficult. The instrumental variable technique employed is commonly used in the literature, and we believe our IV is a significant improvement over those used by other very well cited authors. Nevertheless, there is no such thing as a "perfect" instrument. For this reason, we have been careful to present our point estimates along with the respective Conley bounds, which give a range of estimates under the assumption that our instruments are not perfectly exogenous.
Finally, it is important to note that benefits from these road projects simulated in section 6 will not all occur immediately, nor all at once. They will likely cascade over time, as people begin learning of the new, lower transportation costs, and adjusting their behavior accordingly. Therefore, these estimates should be considered long-term annual benefits.

Appendix II: Transport Cost Calculation
To construct a measure of travel costs to the market in Nigeria, we combine road survey data from the Federal Roads Maintenance Agency (FERMA) and the World Bank's FADAMA 29 project with GIS road network data from Delorme. 30 We used the Delorme data set for data on both the trunk roads and the rural network of Nigeria. The data used for the estimates in this paper were collected specifically for Nigeria, to best characterize the transportation conditions one would find there.

Characterization of network type and terrain
The road network of Nigeria includes three classes of roads: primary, secondary, and tertiary. Average vehicle speed and width of the main carriage road are used to characterize the differences among network types as follows:

Terrain        Primary (7 m)    Secondary (6 m)    Tertiary (5 m)
Flat           80               70                 60
Rolling        60               50                 40
Mountainous    40               30                 20

where terrain type is defined using the following concepts and road geometry parameters:

Unpaved Road Speed (km/hr) by Network & Condition

Characterization of network type and condition
The International Roughness Index IRI (m/km) is used to define the differences in road condition by network as follows:

The last step is to add the individual transport cost for each combination of road types (54 in total) to the network data set segments and multiply them by the length of the roads (see figure 1). As a result, a monetary road user cost can now be used as a measure of impedance, that is, the amount of resistance encountered when traversing a path in a network or moving from one element in the network to another. Higher impedance values indicate more resistance to movement, and a value of zero indicates no resistance. An optimum path in a network is thus the path of lowest impedance, also called the least-cost path. These costs per ton-km were used to calculate the cost for each georeferenced household or pixel centroid of transporting one ton of goods to the nearest market.

where W is the hiking velocity in km/hr and S is the slope or gradient of the terrain.
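The formula relating hiking velocity W to slope S is not shown in the text above. A widely used relationship with exactly these variables is Tobler's hiking function, W = 6 exp(-3.5 |S + 0.05|); the sketch below assumes that form, which may differ from the one used in the study. It then combines the walking speed with a least-cost search (Dijkstra's algorithm) over a small grid, to illustrate how a path of lowest impedance can be found when impedance is walking time. The function names, the 8-neighbour linkage, and the toy elevation grid are hypothetical.

```python
import heapq
import math

def tobler_speed_kmh(slope):
    """Walking speed (km/hr) for slope = rise/run, assuming Tobler's hiking function."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))

def walking_time_to_market(elev_m, cell_km, market):
    """Minimum walking time (hours) from every grid cell to `market` (row, col)."""
    n_rows, n_cols = len(elev_m), len(elev_m[0])
    best = [[math.inf] * n_cols for _ in range(n_rows)]
    best[market[0]][market[1]] = 0.0
    heap = [(0.0, market)]
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > best[r][c]:
            continue  # stale queue entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                continue
            dist_km = cell_km * math.hypot(dr, dc)
            # The walker moves from the neighbour (nr, nc) towards (r, c),
            # so the slope is measured in that direction of travel.
            slope = (elev_m[r][c] - elev_m[nr][nc]) / (dist_km * 1000.0)
            new_t = t + dist_km / tobler_speed_kmh(slope)
            if new_t < best[nr][nc]:
                best[nr][nc] = new_t
                heapq.heappush(heap, (new_t, (nr, nc)))
    return best

# Toy example: flat 3 x 3 grid of 10 km cells with the market in the top-left cell.
flat = [[0.0] * 3 for _ in range(3)]
times = walking_time_to_market(flat, cell_km=10.0, market=(0, 0))
```

Because the slope term depends on the direction of travel, the search expands outwards from the market but prices each link in the direction a walker heading to the market would take.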

Paved Road IRI (m/km) by Network & Condition
Finally, we compute the time that it takes to travel from each point in Nigeria to each of our selected markets. The map of Nigeria is divided into a 'fishnet' grid of 10 km² cells, with approximately 11,000 cells in total. Minimum travel times are calculated using the optimal walking path from the center of each of these 11,000 cells to each of the 65 markets. The algorithm utilizes a node/link cell representation system in which the center of each cell is considered a node and each node is connected to its adjacent nodes by multiple links. Every link has an impedance, which is derived from the time it takes to pass through the cell according to the natural path friction cost surface, and which takes into account the direction of movement through the cell. An ArcGIS/Python script was written that creates an optimal path raster for each of the 65 selected cities/markets. This raster defines the optimal path (minimizing walking time) and then records the total time required in each cell. As a result, we obtained an origin/destination travel time matrix of more than 11,000 rows (grid cells) and 65 columns (selected markets).

Assets: The household does not own more than one bicycle, motorcycle, radio, fridge, TV, or phone and does not own a car (weight: 0.56).

World Health Organization (WHO) standards were followed in determining what to consider unimproved water sources, inadequate sanitation, and dirty cooking fuel.
A household is considered to be multi-dimensionally poor if its weighted sum of indicators is greater than 3. Note that the weights add up to about 10, the number of indicators (the difference is due to rounding).
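The weighting and cutoff rule can be illustrated with a short sketch. It assumes the standard Alkire and Santos (2010) layout of two education indicators, two health indicators, and six standard-of-living indicators; the indicator names below are illustrative labels rather than the exact definitions in Table A4. With equal weight per dimension and weights summing to 10, each standard-of-living indicator receives 10/18, or about 0.56.

```python
# Indicator names are assumptions following the usual Alkire-Santos layout.
EDUCATION = ["years_of_schooling", "child_school_attendance"]
HEALTH = ["child_mortality", "nutrition"]
LIVING = ["electricity", "sanitation", "drinking_water",
          "flooring", "cooking_fuel", "assets"]

def mpi_weights(total=10.0):
    """Equal weight per dimension, split equally among its indicators."""
    weights = {}
    for group in (EDUCATION, HEALTH, LIVING):
        for name in group:
            weights[name] = total / 3 / len(group)
    return weights

def deprivation_score(deprived, weights):
    """Weighted sum of the indicators in which the household is deprived."""
    return sum(weights[name] for name, flag in deprived.items() if flag)

def is_mpi_poor(deprived, weights, cutoff=3.0):
    """Poor if the weighted sum of deprivations is greater than the cutoff."""
    return deprivation_score(deprived, weights) > cutoff

weights = mpi_weights()
# Household A: deprived in both education indicators (score about 3.33).
house_a = {name: name in EDUCATION for name in weights}
# Household B: deprived in five of the six living-standard indicators (about 2.78).
house_b = {name: name in LIVING[:5] for name in weights}
poor_a = is_mpi_poor(house_a, weights)
poor_b = is_mpi_poor(house_b, weights)
```

Under these weights no combination of deprivations yields a score strictly between 3 and 3.33, so a cutoff of "greater than 3" classifies households the same way as a cutoff of one third of the maximum score.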

Appendix VI: LSMS Multi-dimensional Poverty Index
To test the robustness of our multi-dimensional poverty indicator, we constructed a second MPI using data from the LSMS. The index is constructed in a similar manner, with three main components each receiving equal weight: education, health, and standard of living. Table A5 gives a detailed description of the components of this index. For consistency, similar controls were used in regressions on the LSMS MPI as were used for other LSMS regressions. As with the NDHS MPI, we estimate two linear probability models (OLS and IV) and two maximum likelihood models (probit and IV probit). Table A6, columns (1) and (2), report the OLS and IV estimates, respectively, of the effect that log cost to market has on the household's probability of being multi-dimensionally poor. We find that the coefficient on market cost is positive and significant at the one percent level in both regressions. Consistent with a positive bias, the IV estimate is smaller in magnitude, at 0.073, compared with 0.104 for OLS. Column (4), which reports the IV probit estimate, shows a much larger estimated impact, more than doubled. A ten percent decrease in transport costs decreases the probability of being multi-dimensionally poor by roughly 2 percent.
As a robustness check of the exogeneity of our IVs, we report the Conley Bounds in Table A7. From the 95% confidence intervals reported, we see that as we increase the correlation between the IV and the outcome variable, the range of estimated values widens but remains positive. Taken together with the first stage test results, these test statistics suggest that our IVs have power in explaining the variation in cost to market across households.
In general, the coefficient on market cost for the MPI constructed using the NDHS is very similar to that for the MPI constructed using the LSMS. The NDHS coefficients on market cost for IV and IV probit are 0.065 and 0.243, respectively. For the LSMS MPI, those coefficients are 0.073 and 0.208. In fact, a modified t-test confirms that there is no statistical difference between these estimates.

Appendix VII. Robustness Checks for LSMS results
To test the robustness of the estimated impact of transport costs on different sources of income, we estimate several alternative specifications. These include (1) estimating the impact on each source of income separately in a two-stage least squares model, (2) controlling for agro-ecological zone fixed effects in place of the marketshed dummies, (3) excluding land, labor, fertilizer, and irrigation controls, and (4) introducing additional controls including access to credit and the presence of a hospital in the community. These results are reported below.

The next step is to resample the data set. This is done in a similar way to the standard bootstrapping methodology, with two important distinctions. First, the data set is sampled without replacement. Second, the sampling is done in a way that ensures that all points are spatially independent of each other. To be conservative, we used a 30 km buffer, 10 km larger than that suggested by the correlogram. This buffer, along with the geography of Nigeria, allows for approximately 500 points to be selected that are spatially independent of each other. 1,000 samples were generated and each sample was regressed separately, with the mean of each respective coefficient calculated to arrive at the bootstrapped coefficient, and the standard deviation of the respective coefficients used to calculate t-statistics.
The results shown in Table A11 show that there is no significant difference between the coefficient on cost to market in the standard model and the bootstrapped estimate, implying that spatial autocorrelation has no statistically detectable influence on our estimate.
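The resampling procedure described above can be sketched as follows. This is a simplified illustration, not the study's code: it assumes a greedy rule for enforcing the minimum distance between sampled points (the text does not say how independence was imposed), uses plain OLS in place of the IV specification, and runs on synthetic data with a small number of samples. All function and variable names are hypothetical.

```python
import numpy as np

def spatially_independent_sample(coords_km, min_dist_km, rng):
    """Draw points in random order, without replacement, keeping a point only
    if it lies at least min_dist_km from every point already kept."""
    kept = []
    for i in rng.permutation(len(coords_km)):
        if all(np.hypot(*(coords_km[i] - coords_km[j])) >= min_dist_km for j in kept):
            kept.append(i)
    return np.array(kept)

def spatial_bootstrap(y, X, coords_km, min_dist_km=30.0, n_samples=1000, seed=0):
    """Mean coefficient, standard deviation, and t-statistic across samples."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_samples):
        idx = spatially_independent_sample(coords_km, min_dist_km, rng)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)  # OLS on this sample
        coefs.append(beta)
    coefs = np.array(coefs)
    mean, sd = coefs.mean(axis=0), coefs.std(axis=0, ddof=1)
    return mean, sd, mean / sd

# Synthetic data: a 20 x 20 grid of points spaced 10 km apart.
rng = np.random.default_rng(42)
coords = np.array([(10.0 * r, 10.0 * c) for r in range(20) for c in range(20)])
log_cost = rng.normal(size=len(coords))
log_gdp = 2.0 - 0.5 * log_cost + rng.normal(scale=0.1, size=len(coords))
X = np.column_stack([np.ones(len(coords)), log_cost])

mean, sd, t_stats = spatial_bootstrap(log_gdp, X, coords, n_samples=50)
```

Using the standard deviation of the coefficients across samples in place of a conventional standard error mirrors the description above of how the t-statistics were computed.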

Table A11 (notes)
Marketshed fixed effects: Yes (column 1), Yes (column 2).
Observations: 10,607 (column 1); 500 observations per sample, 1,000 samples (column 2).
t-statistics in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Data: Various sources, see section 5.3.
Coefficients given in column 2 are the mean coefficients of each variable from the 1,000 regressions of each sample. T-statistics are calculated by replacing the standard errors with the standard deviation of the sample estimates.