Firm Location, Transport Connectivity, and Agglomeration Economies: Evidence from Liberia

Transport connectivity is among the most important factors in increasing firm productivity and accelerating economic development. The literature generally supports the idea of agglomeration economies, although there is little evidence of their effectiveness in Africa. There are often empirical challenges, such as spatial externalities and endogeneity of infrastructure development. Using firm registry data in Liberia, this study used the instrumental variable spatial autoregressive model to examine the effects of transport connectivity on firms' decisions on where to locate. The study found significant spatial autocorrelation and possible endogeneity related to transport infrastructure, and that firms are more likely to be located where market accessibility is better. The data indicated strong agglomeration economies, suggesting that the primary city, Monrovia, is likely to continue to grow and attract more people and firms, and that secondary cities can also grow with greater transport connectivity between populated areas, such as district centers.


I. INTRODUCTION
Transport infrastructure is among the most important factors in increasing firm productivity and supporting local businesses. In particular, transport connectivity plays an important role in influencing firms' decisions on where to locate. Proximity to markets is essential to firms, and transport infrastructure is its main determinant. In Hungary, road availability is considered to be a significant determinant of firm investment (Boudier-Bensebaa 2005).
Access to good port facilities is also essential for private investors (Belderbos and Carree 2002; Deichmann et al. 2005). Foreign direct investment tends to be concentrated in countries with good transport infrastructure (Cieślik and Ryan 2004; Milner et al. 2006; Procher 2011; Lee et al. 2012; Mare and Graham 2013).
The traditional inventory model (e.g., Arrow et al. 1951) also suggests that firms can reduce inventory costs by lowering transportation costs. In general, it is financially costly to firms to carry inventory, restraining their operational flexibility. In the United States, the level of firm inventory declined by 7 cents for every additional dollar of investments in the highway capital (Shirley and Winston 2004). In China, each dollar of road spending has been estimated to reduce firms' held inventories by 2 percent (Li and Li 2013). Using the Golden Quadrilateral project linking India's four largest cities as a quasi-natural experiment, Datta (2012) found that firms in cities located along the improved highway reduced their inventory by 7 operating days' supply on average. When firms are located near one another, transport and transaction costs can be minimized.
The new economic geography literature indicates that agglomeration economies can make an industrial cluster or a city attractive (e.g., Krugman 1991; Fujita et al. 1999). This creates a dynamic effect that continues to influence further investment decisions of firms. Therefore, although the cost of distance has been declining in recent years because of new technologies such as information and communications technology, firms still prefer to be located near one another to share the common input markets of labor and intermediate inputs and thus minimize trade and transaction costs.
As a result, in many developed countries and rapidly growing emerging economies, firm agglomerations and industrial clusters have been established (e.g., Yusuf et al. 2008). In Africa, there are only a few examples of this occurring, such as the textile sector in East Africa (Otsuka and Sonobe 2011). Africa has lagged behind in the global manufacturing market since the 1960s. Among others, low labor productivity and lack of land access are significant constraints in Africa (Dinh et al. 2012). Poor infrastructure is another constraint (Harrison et al. 2011).
Empirically, there are methodological challenges to exploring the relationship between firm location, productivity, and transport infrastructure. First, infrastructure investment is often endogenous. Firms tend to be located where transport connectivity is better, but causality is less obvious than it would appear. Firms can be more productive because of better mobility of goods and people, but at the same time, governments may invest more in transport infrastructure where firms already exist. Recent publications propose different instrumental variable techniques to address this endogeneity problem (e.g., Chandra and Thompson 2000; Banerjee et al. 2012; Jedwab and Moradi 2012).
Another challenge is spatial autocorrelation. Transport infrastructure, by nature, forms a network. Thus, any improvement in transport connectivity has local and spillover effects. The extent to which firms would be affected is normally considered to depend on proximity to the improvement. Tobler's (1970) first law of geography says that "everything is related to everything else, but near things are more related than distant things." Spatial autoregressive models address this problem. Wang, Kockelman, and Damien (2012) used the spatial autoregressive probit model to analyze land use in Austin, Texas. Other authors have incorporated possible spatial autocorrelation into the count data regression context (e.g., Bhat, Paleti, and Singh 2014).
The current study examined whether transport connectivity affects firms' decisions on where to locate and how firms' decisions affect those of other firms. Using the firm registry data in Liberia, the study used the instrumental variable spatial autoregressive model (Drukker et al. 2011), which allows the above-mentioned empirical issues to be addressed simultaneously.
Policy implications will be developed from the estimation results: Where should investments be concentrated, in the primary city or other urban areas? What kind of transport connectivity affects firm location? The remaining sections are organized as follows: Section II provides an overview of recent developments of formal businesses and road networks in Liberia. Section III develops an empirical methodology and describes our data. Section IV presents main estimation results and discusses some policy implications. Section V concludes.

II. CURRENT ROAD NETWORK AND FIRM LOCATION IN LIBERIA
Liberia is one of the poorest countries in the world and has a population of approximately 4.4 million. Gross domestic product per capita was US$457 in 2014. Many people live below the poverty line, depending on subsistence farming without access to markets. According to the national census, the national poverty rate was estimated to be 54.1 percent in 2008 (LISGIS 2009). There is significant urban-rural inequality. Monrovia has a poverty rate of 31.6 percent, but poverty is generally more prevalent in the inland north central region (72 percent), such as Lofa County, which was most severely affected by the recent Ebola outbreak.
Agriculture is an important sector, employing approximately half of the labor force. A significant number of people work in the informal sector, which accounts for nearly 60 percent of the total labor force (table 1). The industry and service sectors, especially in the formal sector, are thin. A recent labor force survey showed that approximately 17,000 people are employed in the manufacturing sector and 343,000 in the service sector. Economic activity is highly concentrated in Monrovia, where more than 1.2 million people, or about 30 percent of the total population and 55 percent of the total urban population, live.
The distribution of firm location is even more skewed (figure 1). The government's business registry database indicates that more than 41,000 firms were officially registered in Liberia as of 2016, of which approximately 80 percent were located in Montserrado County, where the capital city, Monrovia, is (table 2). Thus, formal firms are almost exclusively concentrated in Monrovia, but even while Liberia has been experiencing rapid urbanization, many people live in rural and remote areas.
The skewness of firm location is higher than in other countries (figure 2). This does not mean that there are few businesses in other areas; there are many informal businesses, but they have not been registered. As discussed above, informality is significant in the Liberian economy, particularly in rural areas.
[...] Corridor, which has recently been rehabilitated, are in good or fair condition, but nearly 60 percent of unpaved roads are in poor or very poor condition (figure 3).
Because of the poor condition of the nonprimary road network, transport connectivity is generally limited and varies significantly across the country. For instance, in rural and remote areas, accessibility is extremely limited. The Rural Access Index (RAI), which measures the share of the rural population that lives within 2 kilometers of a road in good condition, is estimated at 41.9 percent in Liberia (figure 4).
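The RAI definition above can be illustrated with a minimal sketch. The settlement figures and the function name `rural_access_index` are hypothetical and are not taken from the study's data; the sketch only assumes that each rural settlement has a known population and a known distance to the nearest road in good condition.

```python
def rural_access_index(settlements, max_km=2.0):
    """Share (in percent) of rural population living within max_km of a good road.

    settlements: list of (population, distance_km_to_nearest_good_road) pairs.
    """
    total = sum(pop for pop, _ in settlements)
    if total == 0:
        raise ValueError("total rural population must be positive")
    # Count only people whose settlement lies within the distance threshold
    covered = sum(pop for pop, dist in settlements if dist <= max_km)
    return 100.0 * covered / total


# Hypothetical settlements: (rural population, km to nearest good-condition road)
example = [(1200, 0.5), (800, 1.9), (3000, 6.0), (500, 2.0)]
rai = rural_access_index(example)
print(f"RAI = {rai:.1f} percent")
```

In this illustration, one large settlement far from any good road is enough to pull the index well below 50 percent, which is the kind of pattern the district-level RAI is meant to capture.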

Figure 3. Road Network Condition. Source: World Bank survey.
Figure 4. Rural Access Index. Source: World Bank estimate.
Although access to the road network (RAI in the case of rural areas) is one of the most important economic fundamentals, it is not sufficient from a development point of view. Essentially, people need to be connected to markets to take advantage of economic and social opportunities, which are often located in urban areas. Transport costs to bring one unit of a good to a major market can be estimated using georeferenced road network data (figure 5).
Large cities and towns with more than 15,000 people are used as a proxy for major markets. Transport costs are relatively low along the Monrovia-Ganta Corridor but are much higher in other parts of the country, particularly in the north, such as Gbarpolu County, and the central region, such as River Cess and Nimba Counties. These areas are completely isolated from the domestic market. It is likely that the difference in market accessibility plays an important role in influencing firms' decisions on where to locate. There is a clear correlation between firm location and transport costs to market (figure 6).

III. METHODOLOGY AND DATA
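To examine firms' decisions on where to locate in connection with site-specific characteristics, a standard discrete choice model was considered in the conditional logit framework (McFadden 1974). Suppose that firm i maximizes the following general profit function at location or district j:
$$\pi_{ij} = \delta_j + \varepsilon_{ij} \tag{1}$$
where $\delta_j$ is the mean profit level at location (district) j, and $\varepsilon_{ij}$ is an idiosyncratic error including firm-specific unobservables. The mean profit level depends on $x_j$, which consists of location-specific observables, including a measurement of agglomeration economies and infrastructure variables.
Following Berry (1994), the set of firm-specific unobservables that lead a firm to choose district j is defined by $A_j = \{\varepsilon_i : \delta_j + \varepsilon_{ij} > \delta_k + \varepsilon_{ik} \ \text{for all } k \neq j\}$. Then the share of firms choosing to be located in district j can be written implicitly as $s_j = \int_{A_j} f(\varepsilon)\, d\varepsilon$. When $\varepsilon$ is assumed to be independently and identically distributed with a type I extreme value distribution, the log of the probability of a firm choosing location j is:
$$\ln s_j - \ln s_0 = \delta_j = x_j'\beta + \xi_j \tag{2}$$
with the mean profit of the outside option normalized to zero; $\xi_j$ denotes unobservable location-specific characteristics, and $s_0$ is the share of firms that take an outside option, which is assumed to be employment in the informal sector in our context.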
In other words, people who are working in the informal sector can choose to formalize their businesses in a particular district. They do so if expected profits are positive. Otherwise, they stay in the informal sector or remain unemployed. Recent labor force statistics were used to calculate the number of informal businesses that choose the outside option, and it is assumed that the average size of informal businesses is five employees.
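As an illustration of how the dependent variable of the share equation could be assembled, the following sketch converts district firm counts and an informal workforce figure into log share differences relative to the outside option. The district labels, counts, and the function name are hypothetical. Dropping districts without firms is an assumption made here only because the log share is undefined at zero; the study does not state how such districts were treated.

```python
import math


def berry_dependent_variable(firms_by_district, informal_workers, avg_informal_size=5):
    """Return {district: ln(s_j) - ln(s_0)} for districts with at least one firm."""
    # Number of informal businesses taking the outside option,
    # assuming five employees per informal business as in the text
    outside = informal_workers / avg_informal_size
    total = sum(firms_by_district.values()) + outside
    s0 = outside / total
    return {
        d: math.log(n / total) - math.log(s0)
        for d, n in firms_by_district.items()
        if n > 0  # assumption: log share is undefined for districts with no firms
    }


# Hypothetical district firm counts and informal workforce
counts = {"A": 400, "B": 50, "C": 0}
y = berry_dependent_variable(counts, informal_workers=10_000)
print(y)
```

Because the total cancels out, each value equals the log of the ratio of formal firms in the district to informal businesses, so the scale of the outside option shifts all districts by the same constant.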
Transport connectivity can be measured in different ways. In this analysis, six variables are examined: rural accessibility (RAI) in district j, road density (RDDEN), share of roads in good condition (GDRD), transport costs to a nearby large city (TCMK), transport costs to the largest port (Freeport of Monrovia, TCPT), and market access index (MAI) at location j. The first variable is a global indicator of rural accessibility at the district level, as discussed above. RDDEN and GDRD are traditional transport indicators. Whether they measure actual connectivity accurately has long been debated.
The transport cost to market variable (TCMK) is considered to be a more-accurate measurement than the above, representing connectivity with the road condition taken into account. Given the georeferenced road condition data, transport costs to bring one unit of a good to a major market are calculated using spatial software. In principle, if road conditions are bad, road user costs tend to be high and travel speed low, which increases time costs.
From each location, the optimal route is selected to minimize total transport cost to reach a city with a population of more than 15,000. Similarly, transport costs to Freeport of Monrovia are calculated.
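The route selection described above can be sketched as a shortest-path problem. The study reports only that spatial software was used, so the use of Dijkstra's algorithm here is an assumption, and the network, node labels, and unit costs by road condition are all hypothetical.

```python
import heapq


def cheapest_cost_to_market(graph, source, markets):
    """Dijkstra's algorithm: minimum total transport cost from source to any market node.

    graph: {node: [(neighbor, cost), ...]} with non-negative link costs.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        if node in markets:
            return cost  # first market popped is the cheapest to reach
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf")  # no market reachable


def link_cost(length_km, condition):
    # Hypothetical unit costs (US$ per ton-km): worse roads cost more per km
    unit = {"good": 0.10, "fair": 0.15, "poor": 0.30}
    return length_km * unit[condition]


# Hypothetical network: D is a district center, T and M are cities above 15,000 people
roads = [("D", "T", 40, "poor"), ("D", "V", 30, "good"),
         ("V", "T", 50, "good"), ("T", "M", 20, "fair")]
graph = {}
for a, b, km, cond in roads:
    c = link_cost(km, cond)
    graph.setdefault(a, []).append((b, c))
    graph.setdefault(b, []).append((a, c))

tcmk = cheapest_cost_to_market(graph, "D", markets={"T", "M"})
print(tcmk)
```

In this example the longer detour over good roads is cheaper than the direct link in poor condition, which illustrates why a cost-based measure can differ from simple distance or road density.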
A more-general market accessibility measurement can be calculated. The above transport cost variables are still less general, because they reflect connectivity to a particular destination. In reality, people and firms need to be connected to different locations or potentially all cities and markets. The level of the importance of each market may depend on proximity to it. Thus, with the size of market capacity and distance to the market taken into account, the following simple market access index is considered, based on a conventional gravity framework:
$$MAI_j = \sum_k \frac{Y_k}{d_{jk}} \tag{3}$$
This is the sum of purchasing power or market capacity, Y, inversely weighted by the degree of impediment between two locations (e.g., distance, d). For Y, city population is used as a proxy for market capacity; d is measured according to transport costs between district j and city k.
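A minimal sketch of the gravity-type index described above is given below, assuming the simplest form in which each city's population is divided by the transport cost of reaching it. The district and city labels and all numbers are hypothetical, and the min-max rescaling to the zero-to-100 range is an assumption, since the exact normalization used in the study is not stated.

```python
def market_access_index(cost_to_city, city_population):
    """Gravity-type index: sum over cities of population / transport cost.

    Assumes every transport cost is strictly positive.
    """
    return sum(city_population[k] / cost_to_city[k] for k in city_population)


def normalize_0_100(values):
    """Min-max rescaling to [0, 100] (an assumed normalization)."""
    lo, hi = min(values.values()), max(values.values())
    return {j: 100.0 * (v - lo) / (hi - lo) for j, v in values.items()}


# Hypothetical market capacities (city population) and transport costs
population = {"K1": 1000, "K2": 200}
costs = {
    "J1": {"K1": 10.0, "K2": 40.0},  # district J1 is close to the large city
    "J2": {"K1": 50.0, "K2": 5.0},   # district J2 is close to the small city
}

mai = {j: market_access_index(c, population) for j, c in costs.items()}
mai_scaled = normalize_0_100(mai)
print(mai, mai_scaled)
```

Unlike a cost-to-nearest-market measure, the index rewards district J1 for its proximity to the larger market even though J2 is closer to its own nearest city.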
To estimate Equation (2) in the logit framework, a critical empirical concern is that it assumes the independence of irrelevant alternatives, which requires that preferences between any pair of choices be independent of a third option (e.g., Bhat, Paleti, and Singh 2014). In our context, this means that the relative attractiveness of one location (A) over another location (B) is not changed even if the underlying characteristics of the third location (C) are changed, but this is not likely to hold. For instance, suppose that some unanticipated investment is made in location C. It is not in location A or B, but if A is next to C, A may become more attractive than B, because of the spillover effects from C. In general, locational attractiveness is interdependent with all other locations, especially if transport connectivity is [...]
Spatial interdependence comes from two sources. First, there is often spatial interdependency between locations. Although infrastructure is a typical network industry (e.g., roads are connected to each other), agglomeration economies themselves imply spatial interdependency with the neighboring areas. Second, from an empirical point of view, some variables are likely to have been omitted from the model. Spatial autocorrelation lies in ξj if there are any unobservable location-specific characteristics.
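The independence of irrelevant alternatives property discussed above can be checked numerically. In the following sketch, with hypothetical mean profit levels for three locations, improving location C changes every share but leaves the ratio of A to B untouched, which is exactly the restriction that spatial spillovers would violate.

```python
import math


def logit_shares(delta):
    """Conditional logit choice probabilities for mean profit levels delta."""
    denom = sum(math.exp(v) for v in delta.values())
    return {j: math.exp(v) / denom for j, v in delta.items()}


# Hypothetical mean profit levels; only location C improves in the second case
before = logit_shares({"A": 1.0, "B": 0.5, "C": 0.0})
after = logit_shares({"A": 1.0, "B": 0.5, "C": 2.0})

ratio_before = before["A"] / before["B"]
ratio_after = after["A"] / after["B"]
print(ratio_before, ratio_after)  # both equal exp(1.0 - 0.5)
```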
In the literature, there are two empirical solutions to address this problem. The first approach is the nested logit model, which can partly relax the independence of irrelevant alternatives assumption across different clusters. The potential disadvantage is that this is only a partial solution and that the nesting structure cannot be known ex ante. The second approach, which the current study relied on, is to allow correlated errors in ξj and eliminate the independence of irrelevant alternatives assumption. This is more flexible, with few presumptions required, and computationally simpler. Particularly, taking the share equation (Berry 1994), as in Equation (2), is advantageous, because it is computationally heavy to incorporate spatial autocorrelation into the standard maximum likelihood procedure.
Letting our dependent variable, i.e., $\ln s_j - \ln s_0$, be s and the explanatory variables x be X in matrix notation, the following general spatial autoregressive model is considered:
$$s = \lambda W s + X\beta + \xi, \qquad \xi = \rho M \xi + u$$
where λ is a spatial autoregressive dependence in district share s, ρ represents possible autocorrelation in error term ξ, u is the remaining innovation, and W and M are spatial weighting matrices. For both matrices, inverse distances between two locations (j and k) are used. This follows Tobler's (1970) first law of geography: two locations are more closely related to each other if they are located near each other. The distance is calculated according to the Euclidean distance between the two locations.
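The inverse-distance weighting can be sketched as follows. The coordinates and values are hypothetical, and row normalization of the weights is an assumption made for illustration; the text states only that inverse Euclidean distances are used.

```python
import math


def inverse_distance_weights(coords, row_normalize=True):
    """Spatial weights w_jk = 1 / distance(j, k), with zeros on the diagonal.

    Assumes at least two locations and no two locations at the same point.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if j != k:
                W[j][k] = 1.0 / math.dist(coords[j], coords[k])
        if row_normalize:  # assumption: each row is scaled to sum to one
            total = sum(W[j])
            W[j] = [w / total for w in W[j]]
    return W


def spatial_lag(W, s):
    """Compute W s, the distance-weighted average of neighbors' values."""
    return [sum(w * v for w, v in zip(row, s)) for row in W]


# Hypothetical district-center coordinates and dependent-variable values
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
s = [1.0, 2.0, 4.0]
W = inverse_distance_weights(coords)
lag = spatial_lag(W, s)
print(lag)
```

The spatial lag is the term multiplied by λ in the model: for each district it summarizes the outcomes of all other districts, with nearer districts receiving more weight.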
Another empirical challenge to estimating Equation (2) is that infrastructure placement may be endogenous. Firms are productive because of better transport connectivity. At the same time, the government may invest in particular places because firms are located there. For instance, some firms are technically more efficient and prefer to be located where transport access is good. Road authorities may invest more in the places where these productive firms are located. As a result, regardless of transport connectivity, good infrastructure and good firms are likely to coexist.
To address the possible endogeneity of infrastructure variables in X, the spatial instrumental variable estimator (e.g., Anselin 1988; Drukker et al. 2011) is used, in which $(\beta, \lambda)$ is first estimated using the instrumental variable technique in the untransformed model. Using the estimated residuals, ρ is then estimated. Finally, using the results and transforming the equation, the generalized spatial two-stage least-squares estimator can be obtained.
To instrument the transport connectivity variables, three instruments in logarithm are considered: terrain slope at each district center (SLOP), the elevation of each district center (ELEV), and straight-line distance from each district center to the nearest historical port city (DIST). The first two variables are considered to be fairly exogenous because they are geographic conditions. The third variable follows the recent literature focused on the evaluation of infrastructure investment. Chandra and Thompson (2000), examining the effect of U.S. interstate highways on earnings of firms, argue that the nonmetropolitan counties served by highways receive exogenous benefits, because the interstate highways first aimed at connecting metropolitan areas. Banerjee et al. (2012) apply the same concept to the case of Chinese railways, calculating the distance from counties to straight lines connecting historic cities and ports, which can be arguably treated as exogenous. The validity of our proposed instruments will be examined using conventional test statistics, such as the exogeneity and overidentifying restriction tests, in the following section.
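The three estimation steps described above can be written out in a stylized form. This is a sketch of the standard generalized spatial two-stage least-squares logic rather than a statement of the exact implementation used in the study. The notation $Z$, $\delta$, $H$, and $P_H$ is introduced here for illustration, and the composition of the instrument matrix $H$ (exogenous regressors, their spatial lags, and SLOP, ELEV, and DIST) is an assumption.

```latex
% Step 1: two-stage least squares in the untransformed model
% Z = [X, Ws], \delta = (\beta', \lambda)', P_H = H (H'H)^{-1} H'
\tilde{\delta} = (\hat{Z}' Z)^{-1} \hat{Z}' s, \qquad \hat{Z} = P_H Z

% Step 2: residuals from Step 1, from which \rho is estimated
% (by generalized method of moments in the standard procedure)
\tilde{\xi} = s - Z \tilde{\delta} \;\;\Longrightarrow\;\; \hat{\rho}

% Step 3: transform the model to remove the error autocorrelation
% and apply two-stage least squares again
s^{*} = (I - \hat{\rho} M)\, s, \qquad Z^{*} = (I - \hat{\rho} M)\, Z
\hat{\delta} = (\hat{Z}^{*\prime} Z^{*})^{-1} \hat{Z}^{*\prime} s^{*}, \qquad \hat{Z}^{*} = P_H Z^{*}
```

The transformation in the third step plays the same role as a Cochrane-Orcutt correction in time series: once the estimated spatial error dependence is filtered out, the remaining disturbance is treated as uncorrelated across districts.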
The summary statistics are shown in table 3. There were an average of 52.6 firms registered per district in 2016. There is significant variation in this. To avoid complexity related to another type of endogeneity, the number of existing firms in each district was measured using the 2015 data. On average, there were 320 firms in each district, with a range from none to more than 25,000 firms. As shown above, the RAI varies considerably from district to district. Similarly, other transport variables also differ from district to district. The market access index was normalized to a range from zero to 100.
Population density was included in the model to control for differences between districts, which was expected to capture the size of the local market. Firms are normally expected to be located near their customers, holding everything else constant. Labor costs may also affect firms' decisions on where to locate. In theory, firms prefer to locate where wages are low, everything else being equal. Because wage data were not available, poverty and unemployment were included. Presumably, wages tend to be low where poverty and unemployment are high. At the same time, firms may also consider the quality of labor. Firms may want to hire skilled labor regardless of cost. To capture this opposing effect, the share of population that attained at least a junior high school education (EDU) was included.
As discussed, informality of the economy may also be an important factor in firms' decisions about where to locate.
To control for other unobservable subregional characteristics, five region-specific dummy variables are also included: north western (Bomi, Gbarpolu, Grand Cape Mount), north central (Bong, Lofa, Nimba), south central (Grand Bassa, Margibi), south eastern A (Grand Gedeh, Rivercess, Sinoe), and south eastern B (Grand Kru, Maryland, River Gee). Montserrado is used as the baseline. This follows the classification in a recent poverty analysis (LISGIS 2009). Poverty data are not available at a more disaggregated level. Therefore, no poverty measurement can be included in X.

IV. MAIN ESTIMATION RESULTS
First, ordinary least squares (OLS) regression is performed. The results are shown in table 4. Although the results may be biased because endogeneity and spatial autocorrelation are not addressed, they are largely consistent with economic theory. Population density, which is an indication of the size of the local market, has a significant effect. The agglomeration effect is also consistently and strongly positive; firms establish themselves where other firms already exist. For transport connectivity, the coefficients are largely consistent with prior expectations, except for road density, which has a significant negative coefficient. Market access seems particularly important; the coefficients of TCMK and MAI are statistically significant, although the results are not conclusive, as discussed above.
Instrumental variable regression is performed to address the endogeneity related to transport infrastructure investment (table 5). The results are broadly similar; the data exhibit significant economies of agglomeration and show that market access is important to attracting firms. More firms are located where transport costs to markets are lower and the market access index is higher. The results indicate that transport connectivity seems largely exogenous, although in the model that includes MAI, the null hypothesis of exogeneity can be rejected at the 10 percent significance level. Our instruments are found to be largely valid; the overidentifying restrictions cannot be rejected regardless of which transport connectivity variable is used.
The instrumental variable spatial autoregressive model is used to address the possible spatial autocorrelation. The results are shown in table 6, showing that agglomeration economies are significant and market accessibility is important for firms in deciding where to locate. The coefficients of both TCMK and MAI are statistically significant, but transport costs to Freeport of Monrovia have a negative but insignificant coefficient. Thus, port access per se may not be particularly important to attracting firm investment, possibly because port accessibility is generally poor except in the Monrovia area. The significance of the coefficient of MAI can be interpreted to mean that connectivity to Monrovia is still important both because it is a populated market and because it has the primary port.
The instrumental variable spatial autoregressive results also indicate that spatial autocorrelation affects firms' decisions on where to locate. The spatial autoregressive term λ is estimated to be -0.33 to -0.37. Contrary to prior expectations, an unanticipated shock of a firm locating in a particular district has negative externalities for neighboring areas. On the other hand, the spatial error term ρ is found to be positive but statistically insignificant. This means that an exogenous shock in a particular district is not likely to affect its neighboring districts. For instance, if an extreme weather event, such as a flood, unexpectedly happens in a particular district, the event may affect firm investment behavior in that district but not in neighboring districts.
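To illustrate what a negative λ implies, the following sketch propagates a unit shock in one district through a stylized two-district system in which each district is the other's only neighbor. The weight matrix is hypothetical, and λ = -0.35 is simply a value inside the estimated range; the calculation solves x = shock + λWx by repeated substitution, which is an illustrative device and not the estimation procedure of the study.

```python
def propagate_shock(W, lam, shock, tol=1e-12, max_iter=10_000):
    """Solve x = shock + lam * W x by fixed-point iteration.

    Converges when |lam| times the largest row sum of |W| is below one.
    """
    x = list(shock)
    for _ in range(max_iter):
        lagged = [sum(w * v for w, v in zip(row, x)) for row in W]
        new_x = [e + lam * l for e, l in zip(shock, lagged)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("did not converge")


# Hypothetical two-district system: each district is the other's only neighbor
W = [[0.0, 1.0], [1.0, 0.0]]
effect = propagate_shock(W, lam=-0.35, shock=[1.0, 0.0])
print(effect)
```

In this stylized case the shocked district's outcome rises by slightly more than the shock itself, while the neighbor's outcome falls, which corresponds to the negative externality described above.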
Because the exogeneity tests above suggest that the transport connectivity variables are largely exogenous, the spatial autoregressive model without instruments is also estimated, as it is more efficient in that case. The results are unchanged (table 7). Firms are more likely to be located where transport costs to market are lower and general market accessibility measured according to MAI is higher. The coefficient of N15, the number of firms registered by 2015, is estimated to be approximately 0.75 to 0.8, which is also similar to the above, indicating strong agglomeration economies. Monrovia is likely to continue growing because that is where (formal) enterprises are currently located, although this does not exclude the possibility that other cities will grow. The coefficient of TCPT, which essentially measures the effect of proximity to Monrovia, is always insignificant, but MAI, which measures more-general connectivity to not only Monrovia but also secondary cities, has a significant coefficient.
Therefore, other secondary cities can also grow when their intercity connectivity is improved.
One may wonder whether the agglomeration variable N15 is also endogenous. More-lagged values are used to test this. The numbers of firms registered by 2013 and 2014 are denoted as N13 and N14, respectively. It is less likely that decisions made further in the past directly affect a firm's current decision on where to locate. The results confirm the robustness of the main estimation results shown above (table 8). Market accessibility is an important determinant of firm location, and agglomeration economies were still found to be significant. Autocorrelation was also significant.
Note: Robust standard errors are shown in parentheses. *, **, and *** indicate statistical significance at 10%, 5%, and 1%, respectively. The instrumental variable estimation is performed with three instruments in logarithm: SLOP, ELEV, and DIST.

V. CONCLUSION
Agglomeration economies are an important factor in increasing firm productivity and promoting economic growth. The evidence shows that firms prefer to be located close to one another, but unlike in other developing areas, Africa has only a few industrial clusters, such as textiles in East Africa. There is little evidence on agglomeration economies in Africa, although many African countries are currently experiencing rapid urbanization.
This study explored the relationship between firm location and transport connectivity. Using the recent firm registry database in Liberia, firms' decisions on where to locate were examined with respect to various indicators that measure transport connectivity. The instrumental variable spatial autoregressive model was used to address the potential endogeneity of infrastructure placement and spatial autocorrelation between neighboring locations. The results indicate significant agglomeration economies. Monrovia is likely to continue to grow because that is where (formal) enterprises are located. Other cities can also grow when their intercity transport connectivity is improved.
It was also found that market accessibility is a main driver of firm agglomeration; it is important to connect firms to large markets. Other connectivity measures do not have significant coefficients. Traditional indicators such as road density and the share of good roads have no power to explain firms' decisions on where to locate. These measurements may therefore be of limited use in analyzing firm location and promoting firm investment.
Finally, from a methodological point of view, the study highlights the importance of addressing spatial autocorrelation in the firm location literature. Spatial autocorrelation is always significant. Firm location choices generate externalities for neighboring areas.