Is India's Manufacturing Sector Moving Away From Cities?

This paper investigates the urbanization of the Indian manufacturing sector by combining enterprise data from the formal and informal sectors. We find that plants in the formal sector are moving away from urban and into rural locations, while the informal sector is moving from rural to urban locations. While the secular trend for India's manufacturing urbanization has slowed down, the localized importance of education and infrastructure has not. Our results suggest that districts with better education and infrastructure have experienced a faster pace of urbanization, although higher urban-rural cost ratios cause movement out of urban areas. This process is associated with improvements in the spatial allocation of plants across urban and rural locations. The spatial location of plants has implications for policy on investments in education, infrastructure, and the livability of cities. The high share of urbanization occurring in the informal sector suggests that urbanization policies that contain inclusionary approaches may be more successful in promoting local development and managing its strains than those focused only on the formal sector.


Introduction
One of the biggest challenges in development is urbanization. Within developing countries, nearly two billion people are expected to move from rural regions into cities in the next two decades (World Bank, 2011a). This will happen much faster than anything the world has experienced before. For instance, China's and India's economic transformation and urbanization are happening at 100 times the scale of the first country in the world to urbanize, the United Kingdom, and in just one-tenth of the time. Most countries have only one chance to get it right, and there is no "one size fits all" recipe.
McKinsey Global Institute (2010) contends that cities in India have the potential to generate 70% of the country's new jobs and GDP over the next 20 years, a process that could drive a four-fold increase in per capita incomes. While promising and greatly desired, such urbanization imposes unprecedented managerial and policy challenges. Yet despite this potential, research on the spatial location and concentration of economic activity in Indian cities is still at an early stage.
Given the great challenges ahead, we look backward to describe how the urbanization process has proceeded (or not) for the Indian manufacturing sector over the 1989-2005 period. We have two goals. The first goal is to trace the trends and depth of India's manufacturing urbanization across states and industries. This description can yield important insights for researchers and policy makers going forward.
Our second goal is to examine whether localized education and infrastructure are linked to urbanization and to a more efficient spatial allocation of India's industries and plants since 1989. As we discuss below, there are many candidate drivers of urbanization in India, in fact too many for one study to assess accurately. We seek to quantify whether and how these local conditions, along with other factors such as wage costs, promoted or discouraged the urbanization process.
Our study combines data from the Annual Survey of Industries for the organized (formal) sector and from the National Sample Survey for the unorganized (informal) sector. On the whole, India's manufacturing sector became more urbanized, with the share of workers in urban areas rising from 33% in 1989 to 41% in 2005 (Figure 1a). Urbanization growth was most dramatic from 1989 to 1994 but slowed from 1994 to 2000, and the urban employment share was basically flat from 2000 to 2005. This pattern of increasing urbanization was also present when looking just at manufacturing plant counts, but the opposite trend is observed for manufacturing output, which has increasingly moved toward rural areas. We investigate several features of these trends in detail.
Our first investigation focuses on the relative movements of the organized and unorganized sectors. The differences, illustrated in Figures 1b and 1c, are striking. Throughout the 1989-2005 period, the organized sector moved from urban to rural locations, with its urban employment share declining from 69% in 1989 to 57% in 2005. By contrast, the urban employment share for the unorganized sector increased from 25% to 37%. Since the unorganized sector accounts for about 80% of employment in India's manufacturing sector, the total urbanization level increased for the employment measure. Conversely, the organized sector accounts for over 80% of India's output, so the aggregate output series instead becomes more rural. Section 2 examines the differences across states and industries within India. Simply put, urbanization levels and trends are very heterogeneous at the micro level.
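Because the aggregate series is a weighted average of the two sectors, the employment figures above can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the unorganized sector holds a constant 80% of employment in both years; the sector urban shares are the figures quoted above, and the function name is hypothetical.

```python
# Aggregate urban share as an employment-weighted average of sector shares.
# Assumption (illustrative): the unorganized sector holds a fixed 80% of
# manufacturing employment in both 1989 and 2005.

def aggregate_urban_share(sector_shares, sector_weights):
    """Return sum over sectors of weight * urban share; weights must sum to one."""
    if abs(sum(sector_weights.values()) - 1.0) > 1e-9:
        raise ValueError("sector weights must sum to one")
    return sum(sector_weights[s] * sector_shares[s] for s in sector_shares)


weights = {"unorganized": 0.80, "organized": 0.20}
urban_1989 = {"unorganized": 0.25, "organized": 0.69}
urban_2005 = {"unorganized": 0.37, "organized": 0.57}

for year, shares in (("1989", urban_1989), ("2005", urban_2005)):
    print(year, round(aggregate_urban_share(shares, weights), 3))
```

The result is roughly 0.34 for 1989 and 0.41 for 2005, close to the aggregate shares reported above. Applying the same weighting with the organized sector's larger share of output explains why the output series moves in the opposite direction.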
To set the stage for our study of localized changes, Section 3 decomposes India's overall urbanization changes into shifts in urbanization within districts versus changes in the spatial allocation of activity across districts. Using two decomposition techniques, we find that both the within- and between-district components are important and tend to work in the same direction, both for the urbanization of the unorganized sector and for the de-urbanization of the organized sector. The within-district component explains about three-quarters of the overall observed adjustment. Following this observation, we focus the rest of our paper on studying this within-district adjustment process.
To assess the factors contributing to these within-district shifts, we consider a series of regressions that quantify district traits that are associated with increased urbanization during this period. Per our second objective, we find substantial evidence that links greater urbanization to districts with more educated workforces and better infrastructure levels. Further, we find evidence that higher costs, or sharper differences in urban-rural cost levels, decrease the rate of urbanization. These effects are most pronounced in the unorganized sector and before 2000. We also use interaction regressions to show that industries with high capital and land intensity are more likely to locate in rural areas in districts with strong education and infrastructure levels.
By itself, there is no guarantee that the increased urbanization associated with education and infrastructure is optimal. Our final exercise is to construct simple spatial mismatch indices to test whether these urbanization trends are associated with a more efficient allocation of industry between urban and rural settings within districts. The spatial mismatch index compares the observed industry distribution of district employment across urban and rural locations to a counterfactual in which plants or employment in the district are allocated (in some sense optimally) to urban and rural locations according to national industry propensities to be in urban settings. As elaborated further below, higher mismatch values for a district indicate that plants that would have been expected to be in urban areas are in rural areas, and vice versa. Figures 2a-2b plot the trends in our spatial mismatch indices. Encouragingly, there has been an aggregate decline in spatial mismatch since 1989, primarily driven by improved allocation of the unorganized sector. Additional regressions indicate that the urbanization shifts associated with better education and infrastructure have improved the spatial allocation of industry.
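As a purely illustrative sketch, one way to operationalize this comparison for a district is to sum, across industries, the absolute gap between observed urban employment and the urban employment implied by each industry's national urban propensity, scaled by district employment. This formula is our assumption, not necessarily the index defined in Section 5, and the data layout and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mismatch index (illustrative assumption, not the paper's formula):
#   M_d = sum_i |U_{d,i} - p_i * E_{d,i}| / sum_i E_{d,i},
# where p_i is the national urban employment share of industry i.


def national_propensities(cells):
    """cells maps (district, industry) -> (urban_employment, total_employment)."""
    urban, total = defaultdict(float), defaultdict(float)
    for (_, industry), (u, e) in cells.items():
        urban[industry] += u
        total[industry] += e
    return {ind: urban[ind] / total[ind] for ind in total}


def spatial_mismatch(cells, district):
    p = national_propensities(cells)
    gap = emp = 0.0
    for (d, industry), (u, e) in cells.items():
        if d == district:
            # Observed urban employment minus the counterfactual allocation.
            gap += abs(u - p[industry] * e)
            emp += e
    return gap / emp


# Invented toy data: district C matches national propensities exactly.
cells = {
    ("A", "textiles"): (80, 100), ("A", "food"): (10, 100),
    ("B", "textiles"): (20, 100), ("B", "food"): (30, 100),
    ("C", "textiles"): (50, 100), ("C", "food"): (20, 100),
}
for d in ("A", "B", "C"):
    print(d, round(spatial_mismatch(cells, d), 3))
```

In this toy example, districts A and B each place too much or too little of an industry in urban areas relative to national propensities and receive positive values, while district C, whose allocation matches the national pattern, scores zero.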
The plan of this paper is as follows. Section 2 discusses our data and the broad patterns of urbanization for India's manufacturing sector. We also discuss in depth how urbanization is defined in India and issues for longitudinal consistency. Section 3 presents the decomposition of urbanization changes across districts. Section 4 considers the district-level traits associated with urbanization changes; this section also presents our interaction analysis of industry traits. Section 5 analyzes spatial mismatch in industry allocations. The final section concludes and discusses implications from this work.

Section 2: Surveys of Indian Manufacturing--Organized and Unorganized Sectors
This paper employs cross-sectional establishment-level surveys of manufacturing enterprises carried out by the Government of India. We study the manufacturing data from surveys conducted in fiscal years 1989, 1994, 2000, and 2005. In all four cases, the survey was undertaken over two fiscal years (e.g., the 1994 survey was conducted during 1994-1995), but we refer only to the initial year for simplicity. This section describes some key features of these data for our study.

It is important to first define and characterize the distinction between the organized and unorganized sectors in the Indian economy. In the Indian context, this distinction relates to establishment size. In manufacturing, the organized sector includes establishments with 10 or more workers if the establishment uses electricity; if the establishment does not use electricity, the threshold is 20 or more workers. These establishments are required to register under the Factories Act of 1948. The unorganized manufacturing sector is, by default, composed of establishments that fall outside the scope of the Factories Act.
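The size rule above can be written as a small classifier. This is a stylized sketch assuming the Factories Act thresholds (10 or more workers with power, 20 or more without); in practice sector membership follows registration and the survey frames rather than a recomputation from worker counts, and the function name is hypothetical.

```python
def is_organized(workers: int, uses_electricity: bool) -> bool:
    """Stylized Factories Act size rule (hypothetical helper):
    10 or more workers with power, 20 or more workers without power."""
    threshold = 10 if uses_electricity else 20
    return workers >= threshold


print(is_organized(12, uses_electricity=True))   # True
print(is_organized(12, uses_electricity=False))  # False
```

The same 12-worker establishment falls in different sectors depending only on whether it uses electricity, which is one reason the two surveys cover establishments of overlapping sizes.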
The organized manufacturing sector is surveyed every year by the Central Statistical Organization through the Annual Survey of Industries (ASI), while unorganized manufacturing establishments are separately surveyed by the National Sample Survey Organization (NSSO) at approximately five-year intervals. Establishments are surveyed with state and four-digit National Industry Classification (NIC) stratification. We use the provided sample weights to construct population-level estimates of total establishments and employment at the district and two-digit NIC level. Our empirical analyses focus mostly on district and industry variation. Districts are administrative subdivisions of Indian states or union territories and are more appropriate spatial units than states for understanding the urbanization process.
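Population-level estimates from a stratified sample are weighted sums, with each sampled establishment standing in for `weight` establishments in the population. A minimal sketch, assuming hypothetical record fields (`district`, `nic2`, `weight`, `employment`) rather than the actual ASI or NSSO file layouts:

```python
from collections import defaultdict


def population_totals(records):
    """Weighted totals of establishments and employment by (district, NIC-2) cell.

    Each record is a dict with hypothetical keys: district, nic2, weight, employment.
    """
    plants = defaultdict(float)
    employment = defaultdict(float)
    for r in records:
        cell = (r["district"], r["nic2"])
        plants[cell] += r["weight"]                        # establishments represented
        employment[cell] += r["weight"] * r["employment"]  # workers represented
    return dict(plants), dict(employment)


# Invented sample records.
sample = [
    {"district": "D1", "nic2": "17", "weight": 40.0, "employment": 3},
    {"district": "D1", "nic2": "17", "weight": 10.0, "employment": 25},
    {"district": "D2", "nic2": "15", "weight": 5.0, "employment": 60},
]
plants, employment = population_totals(sample)
print(plants[("D1", "17")], employment[("D1", "17")])  # 50.0 370.0
```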
These surveys identify for each establishment whether it is in an urban or rural location. Our study considers changes in urbanization over time, and thus the stability and comparability of this survey question over time are very important. To begin with, the statutory definition of an urban setting during our period of study is: (a) All statutory places with a municipality, corporation, cantonment board or notified town area committee, etc., or (b) A place satisfying the following three criteria simultaneously: i) A minimum population of 5,000; ii) At least 75% of male working population engaged in non-agricultural pursuits; and iii) A density of population of at least 400 per sq. km. (1,000 per sq. mile).
This definition has been mostly stable since the 1961 Census. One set of adjustments with the 1971 and 1991 Censuses focused on including 'outgrowths' (e.g., railway colonies, university campuses, industrial townships, and residential and commercial complexes) that lay beyond strict town or village boundaries within a combined urban agglomeration concept for the purposes of classification. A second set of adjustments enacted since the 1981 Census focused on the definition of the workforce and the agricultural sector. Among our datasets, the 1989 survey follows the 1981 Census classification, the 1994 and 2000 surveys follow the 1991 classification, and the 2005 survey follows the 2001 Census classification. Our primary focus is on urbanization changes since 1994, and the formal definition of an urban area has not changed during this period. As we discuss further below, some sub-units of districts move from being rural areas in the 1991 Census to being urban areas in the 2001 Census as they begin to satisfy the urbanization criteria. This change can influence our measured urbanization levels, and we discuss below the robustness of our documented patterns to these reclassifications.

It is important to note that India uses a more demanding set of criteria than most countries to define what is 'urban' (e.g., Bhagat, 2005; United Nations, 2001). For instance, substantial parts of U.S. metropolitan areas like Atlanta or Phoenix would be classified as rural in Indian statistical analyses because their population densities fall below 1,000 persons per square mile. Thus, our measured urbanization for India will be lower than under many international standards. This consideration does not, however, affect the longitudinal consistency of our trends for India.
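The statutory criteria listed earlier amount to a simple rule: a place is urban if it is a statutory town, or if it meets all three census-town thresholds simultaneously. A sketch with hypothetical argument names:

```python
def is_urban(statutory_town: bool, population: int,
             male_nonag_share: float, density_per_sq_km: float) -> bool:
    """Census urban rule: (a) statutory town, or (b) all three criteria hold."""
    if statutory_town:                      # criterion (a)
        return True
    return (population >= 5_000             # (b)(i) minimum population
            and male_nonag_share >= 0.75    # (b)(ii) male non-agricultural work
            and density_per_sq_km >= 400)   # (b)(iii) persons per sq. km


print(is_urban(False, 6_200, 0.80, 450))  # True
print(is_urban(False, 6_200, 0.70, 450))  # False: fails (b)(ii)
```

Because criterion (b) requires all three conditions at once, a growing place can switch from rural to urban between censuses as soon as its last binding threshold is crossed, which is the reclassification issue discussed above.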
We primarily measure urbanization of manufacturing activity for a district through the share of manufacturing employment in establishments classified as being in an urban location. This choice most closely corresponds to the prior literature and the central concerns of Indian policy makers. For comparison, we also consider the urbanization of plants and output. Table 1 provides basic descriptive statistics on districts, further discussed below, beginning with the averages of these measures for 2001. Table 2 lists the 17 major states in our sample and their urban shares, combining the organized and unorganized sectors; Figure 3a provides a graphical depiction. These states are a subset of India's 35 states and union territories. States were excluded for one of three reasons: 1) the state was not sampled in all of our surveys, 2) the state's small sample size raised data quality concerns, or 3) the region experienced persistent conflict and political turmoil. We discuss below in greater detail the explicit criteria for a district's inclusion in the regression sample. These exclusions are minor in terms of economic activity.
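The three urbanization measures differ only in what is summed. The sketch below, with a hypothetical record layout, computes weighted urban shares of plants, employment, and output for one district. The toy numbers are invented and show how employment can be more urbanized than output when urban plants are more labor intensive.

```python
def urban_shares(establishments):
    """Weighted urban shares of plants, employment, and output for one district.

    establishments: list of (is_urban, weight, employment, output) tuples
    (hypothetical layout).
    """
    urban = {"plants": 0.0, "employment": 0.0, "output": 0.0}
    total = dict(urban)
    for is_urb, weight, emp, out in establishments:
        values = {"plants": weight, "employment": weight * emp, "output": weight * out}
        for metric, value in values.items():
            total[metric] += value
            if is_urb:
                urban[metric] += value
    return {metric: urban[metric] / total[metric] for metric in total}


district = [(True, 10, 5, 100), (False, 30, 2, 200)]  # invented figures
print({k: round(v, 3) for k, v in urban_shares(district).items()})
```

Here a quarter of plants are urban, but urban plants hold about 45% of employment and only about 14% of output.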

Tables 3a and 3b provide urban shares by two-digit NIC industry, again combining the organized and unorganized sectors. Office, accounting and computing machinery; Publishing, printing and media; and Medical, precision and optical instruments, watches are the most urbanized industries, with more than 80% of employment in urban plants. The least urbanized industries in 2000 are Wood and wood products, straw and plaiting materials; Other non-metallic mineral products; Tobacco products; and Food products and beverages, each with under 30% of industry employment in urban areas. Major increases in the urbanization of employment are evident for Textiles; Leather tanning, luggage, handbags, footwear; Machinery and equipment, n.e.c.; and Office, accounting and computing machinery. Table 3b also presents four industry traits: 1) labor intensity, defined as the total wage bill over the value of shipments; 2) capital intensity, defined as the total fixed capital value over the value of shipments; 3) materials intensity, defined as total raw materials costs over the value of shipments; and 4) land intensity, defined as the closing net land value over the value of shipments. Across industries, labor intensity has a correlation of 0.20 with the 2000 urban employment share, while capital intensity has a correlation of -0.05. No correlation is evident for materials or land intensity.
The last part of Table 3b documents the observed ratio of these industry traits between urban and rural establishments in each industry. That is, the urban-rural labor intensity ratio divides the labor intensity of urban establishments by the labor intensity of rural establishments for each industry; the other ratios are defined similarly. Plants in urban areas generally use more labor, similar levels of materials, and less capital and land than rural plants do. Across industries, the urban-rural differential in labor intensity has a correlation of 0.43 with the urbanization rate; the correlation for the materials differential is similarly high at 0.44. Capital intensity differentials are less correlated, at 0.26, and land usage differentials are not correlated with overall urbanization levels.

These descriptive tables refine one of the key trends noted in the introduction. Overall, manufacturing activity is becoming more urbanized as measured by employment and plants, while it is becoming less urbanized as measured by output. These trends are mainly explained by the share of the organized and unorganized sectors in each metric and the differences in the urbanization trends for these sectors. The details in Tables 2-3b also show that urbanization increases are concentrated rather than broad-based, even for employment: only 8 of the 17 states and 7 of the 22 industries exhibit an increase in urbanization from 1994 to 2005. The remainder of the paper analyzes the traits of districts that successfully urbanized over this period.
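The intensity measures in Table 3b are ratios to the value of shipments, and the urban-rural ratios divide the urban intensity by the rural intensity within an industry. A sketch with hypothetical function names and invented figures chosen to mimic the qualitative pattern described above (more labor, similar materials, less capital and land in urban plants):

```python
def intensities(wage_bill, fixed_capital, materials, net_land, shipments):
    """Industry traits as described for Table 3b: each input over value of shipments."""
    return {
        "labor": wage_bill / shipments,
        "capital": fixed_capital / shipments,
        "materials": materials / shipments,
        "land": net_land / shipments,
    }


def urban_rural_ratios(urban_totals, rural_totals):
    """Urban intensity divided by rural intensity, trait by trait."""
    u, r = intensities(**urban_totals), intensities(**rural_totals)
    return {trait: u[trait] / r[trait] for trait in u}


# Invented totals for urban and rural establishments in one industry.
urban = dict(wage_bill=30, fixed_capital=40, materials=60, net_land=5, shipments=100)
rural = dict(wage_bill=20, fixed_capital=80, materials=60, net_land=10, shipments=100)
print({k: round(v, 2) for k, v in urban_rural_ratios(urban, rural).items()})
```

A ratio above one (labor here) means urban plants are more intensive in that input; ratios below one (capital and land) mean rural plants are.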

Section 3: Decomposition of India's Urban Share Changes
Table 4 presents our first district-level analysis. Following the productivity growth decomposition work of Baily et al. (1992), Griliches and Regev (1995), and Foster et al. (2001), we decompose the observed change in aggregate urbanization from 1994 to 2005 into three components: the "within" changes in urban rates for districts (i.e., average growth in urbanization for districts weighted by initial employment shares), the "between" changes in activity across districts (i.e., relocation of activity from districts with low initial urbanization rates to districts with high initial urbanization rates), and a covariance term that captures whether districts gaining activity share were also those increasing their urbanization. These three components by definition sum to the total change in the aggregate urban share, both for manufacturing as a whole and for each sector. As we do not consider entry or exit of districts across years, our decomposition requires a balanced panel. Table 4 provides the decomposition of urban shares in plants, employment, and output for aggregate manufacturing (columns 1-3), as well as for the organized and unorganized sectors. The urban shares in Panel A of Table 4 closely mirror the earlier trends.
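In notation we introduce here, let $s_{d,t}$ be district $d$'s share of national activity in year $t$, $u_{d,t}$ its urban share, and $U_t = \sum_d s_{d,t} u_{d,t}$ the aggregate urban share. A Foster et al. (2001)-style version of this decomposition writes $\Delta U = \sum_d s_{d,0}\Delta u_d + \sum_d \Delta s_d (u_{d,0} - U_0) + \sum_d \Delta s_d \Delta u_d$, that is, within, between, and covariance terms. The sketch below implements this form on an invented balanced panel; the exact variant behind Table 4 may differ in detail, and the function name is hypothetical.

```python
def fhk_decomposition(s0, s1, u0, u1):
    """Within/between/covariance decomposition of the change in the aggregate
    urban share. s0, s1: district activity shares (each sums to one);
    u0, u1: district urban shares. Requires a balanced panel."""
    if not (s0.keys() == s1.keys() == u0.keys() == u1.keys()):
        raise ValueError("decomposition requires a balanced panel of districts")
    agg0 = sum(s0[d] * u0[d] for d in s0)
    agg1 = sum(s1[d] * u1[d] for d in s0)
    within = sum(s0[d] * (u1[d] - u0[d]) for d in s0)
    between = sum((s1[d] - s0[d]) * (u0[d] - agg0) for d in s0)
    covariance = sum((s1[d] - s0[d]) * (u1[d] - u0[d]) for d in s0)
    return {"total": agg1 - agg0, "within": within,
            "between": between, "covariance": covariance}


# Toy panel: A gains activity share and urbanizes; B's urban share is unchanged.
s0, s1 = {"A": 0.5, "B": 0.5}, {"A": 0.6, "B": 0.4}
u0, u1 = {"A": 0.2, "B": 0.6}, {"A": 0.3, "B": 0.6}
parts = fhk_decomposition(s0, s1, u0, u1)
print({k: round(v, 3) for k, v in parts.items()})
```

The total change of 0.02 splits into a within term of 0.05, a between term of -0.04 (activity moved toward the less urbanized district A), and a covariance term of 0.01. Because the district shares sum to one in both years, subtracting $U_0$ in the between term changes only its interpretation, not the sum.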
Over the 1994 to 2005 period, the within- and between-district components generally operate in the same direction in Panel B. For the organized sector, both components reduce the urbanization rate, while for the unorganized sector both components increase the urbanization rate of plants and employment. The aggregate metrics follow the combination of these two in accordance with their relative shares. In almost every case, and especially for the organized sector, the within-district component is larger than the between-district component.
Interestingly, the covariance term is almost always negative for all three metrics of urbanization, suggesting that rapidly growing districts experienced relative declines in urbanization levels. Equivalently, urbanization growth was highest in districts whose manufacturing base grew less than the national average. While the within- and between-district components acted in opposite directions for the organized and unorganized sectors, the covariance component was more consistent across them.
These patterns are confirmed in Panel C using the technique of Griliches and Regev (1995). This second technique and its merits and limitations are outlined in greater detail by Foster et al. (2001). Most importantly, the technique lacks a covariance component, which is instead absorbed into both the within- and between-district components. Weighing against this disadvantage, the Griliches and Regev (1995) technique is more robust to measurement error. This technique yields conclusions very similar to those in Panel B. The appendix also shows that we obtain similar patterns when examining 1994-2000 and 2000-2005 separately.
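The Griliches and Regev (1995) variant replaces initial-year weights with two-period averages. With $\bar{s}_d$ and $\bar{u}_d$ the averages of a district's activity share and urban share across the two years and $\bar{U}$ the average aggregate urban share, it writes $\Delta U = \sum_d \bar{s}_d \Delta u_d + \sum_d \Delta s_d (\bar{u}_d - \bar{U})$. A sketch on an invented two-district panel; the notation and function name are ours.

```python
def griliches_regev(s0, s1, u0, u1):
    """Griliches-Regev (1995) decomposition using two-period averages.
    s0, s1: district activity shares (each sums to one); u0, u1: district
    urban shares. Requires a balanced panel."""
    if not (s0.keys() == s1.keys() == u0.keys() == u1.keys()):
        raise ValueError("decomposition requires a balanced panel of districts")
    agg0 = sum(s0[d] * u0[d] for d in s0)
    agg1 = sum(s1[d] * u1[d] for d in s0)
    agg_bar = (agg0 + agg1) / 2
    within = sum((s0[d] + s1[d]) / 2 * (u1[d] - u0[d]) for d in s0)
    between = sum((s1[d] - s0[d]) * ((u0[d] + u1[d]) / 2 - agg_bar) for d in s0)
    return {"total": agg1 - agg0, "within": within, "between": between}


# Invented panel: A gains activity share and urbanizes; B's urban share is unchanged.
s0, s1 = {"A": 0.5, "B": 0.5}, {"A": 0.6, "B": 0.4}
u0, u1 = {"A": 0.2, "B": 0.6}, {"A": 0.3, "B": 0.6}
gr = griliches_regev(s0, s1, u0, u1)
print({k: round(v, 3) for k, v in gr.items()})
```

On this panel the within term is 0.055 and the between term is -0.035. Compared with a within/between/covariance split of the same data (0.05, -0.04, and 0.01), the covariance of 0.01 has been divided evenly between the two remaining components, which is how the technique absorbs it.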
These exercises set the stage for our regression analyses. The estimation techniques used in the remainder of the paper will thus mostly focus on changes in urbanization levels within district cells. These decompositions suggest that within-district changes capture the majority of the urbanization changes during this period. The Griliches and Regev (1995) technique provides the most straightforward calculation in this regard: the within-district component represents on average 83% of the total change observed. Using the Baily et al. (1992) technique and ignoring the covariance component, the within-district component accounts for about 75% of the total observed changes in absolute terms. Ongoing research is systematically evaluating the between-district component and the role of major transportation linkages (e.g., Lall et al., 2010).

Section 4: Empirical Analysis of Urban Movements in Economic Activity
This section analyzes the factors promoting or discouraging changes in urbanization from 1994 to 2005 in districts, with particular emphasis on education and infrastructure. We conduct our estimations at the district-industry level. In all regressions, we control for industry fixed effects to absorb changes in urbanization that are simply following national changes in urbanization levels for given industries. In other words, we will be considering factors that promote or discourage urbanization for a district beyond the change that would be expected based upon its industry composition. We first describe the district sample and the district traits that we consider, and then we present the estimation results.

District Sample
Our sample focuses on district-industry cells. Our core sample contains 262 of India's 630 districts. This sizeable reduction in district count is due to the steps taken to prepare a consistent estimation setting, with the primary reason being the lack of information on district traits (e.g., education, infrastructure) in the 2001 Census; this Census information was collected for only about 400 districts.
In addition to our state-level restrictions noted earlier, our regression sample also requires that each district-industry cell be observed in all of our manufacturing surveys so that we can observe changes over time. Our explicit criteria with respect to district size are that the district has a population of at least one million in the 2001 Census and 50 or more sampled establishments. Given our desire to study urbanization changes, we also exclude districts that had fewer than five urban plants or fewer than five rural plants. Finally, we exclude plants with negative value added, which account for 6%-7% of employment.
These restrictions exclude little economic activity. The resulting panel accounts for over 80% of plants, employment, and output in the manufacturing sector throughout the period of study. Just as important, the aggregate trends for these districts mirror those of India as a whole shown in the figures.
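The inclusion rules above can be collected into a few filters. A minimal sketch: the function names and the `value_added` key are hypothetical, and we assume the plant-count thresholds refer to sampled establishments.

```python
SURVEY_YEARS = {1989, 1994, 2000, 2005}


def keep_district(pop_2001, n_sampled, n_urban_plants, n_rural_plants):
    """District-size criteria for the regression sample."""
    return (pop_2001 >= 1_000_000 and n_sampled >= 50
            and n_urban_plants >= 5 and n_rural_plants >= 5)


def keep_cell(years_observed):
    """A district-industry cell must appear in every manufacturing survey."""
    return SURVEY_YEARS <= set(years_observed)


def drop_negative_value_added(plants):
    """Exclude plants with negative value added (hypothetical 'value_added' key)."""
    return [p for p in plants if p["value_added"] >= 0]


print(keep_district(1_450_000, 120, 14, 36))   # True
print(keep_district(1_450_000, 120, 3, 36))    # False: too few urban plants
print(keep_cell([1989, 1994, 2000]))           # False: missing 2005
```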

District Traits
Table 1 provides descriptive statistics on our districts. Beyond the starting urbanization level for a district-industry, our two most important explanatory variables are the education level of the local labor force and the quality of local physical infrastructure. These two factors are consistently linked to India's regional development. Education levels can play a central role in urbanization. Many local policy makers stress developing the human capital of their workforces, and India is no different (Amin and Mattoo, 2008). We measure the general education level of the district's labor force from the 2001 Census as the percentage of adults with a graduate (post-secondary) degree. Our results below are robust to alternatively defining a district's education as the percentage of adults with higher secondary education.
Our second trait is the physical infrastructure level of the district. Functioning urban areas depend critically on their underlying infrastructure, and infrastructure likewise connects the urban and rural parts of a district. The government of India is providing substantial financial resources for infrastructure investment (Ministry of Urban Development, 2008). The 2001 Census reports the number of villages in a district that have paved roads, electric power access, telecommunications access, and access to safe drinking water. We calculate the share of villages in a district with access to each type of infrastructure and sum across the four measures to create a continuous composite metric of infrastructure, which ranges from zero (no infrastructure access in any village) to four (full access to all four infrastructure components in all villages).

We empirically assess the role of education and infrastructure for development without strong theoretical priors, only noting their frequent emphasis by policy makers. We believe this agnostic approach is important because anecdotal accounts of India suggest that multiple relationships are at play. For example, a natural baseline for education is that agglomeration economies or urbanization premiums are higher for skilled workers and industries. Observers note, however, that skilled workers may prefer to live outside Indian cities to the extent that amenities are lower in the cities than in the surrounding areas. Likewise, better infrastructure typically supports higher urbanization. However, cities in India often experience some of the largest infrastructure failures, and self-generation of electricity by organized sector firms is common in both urban and rural settings. Thus, better infrastructure capacity may allow establishments to move to rural locations.
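The composite infrastructure index is the sum of four village-access shares. A sketch with hypothetical argument names and invented counts:

```python
def infrastructure_index(n_villages, paved_road, power, telecom, drinking_water):
    """Sum of four village-access shares; ranges from 0 (no access anywhere)
    to 4 (all villages have all four types of access)."""
    counts = (paved_road, power, telecom, drinking_water)
    if n_villages <= 0 or any(not 0 <= c <= n_villages for c in counts):
        raise ValueError("counts must lie between 0 and the number of villages")
    return sum(c / n_villages for c in counts)


# Invented district: 200 villages with partial access to roads, power, telecom.
print(round(infrastructure_index(200, 150, 180, 50, 200), 2))  # 2.9
```

Summing shares gives each component equal weight; a district with universal drinking water but little telecom coverage can therefore score the same as one with the reverse pattern.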
Cost factors also play a critical role in firms' location choices. We would ideally like to model several factors such as rental prices, utility prices, and wage rates, but data on cost factors are very limited. We primarily focus on the district's wage rate, examining both its average level and the urban-rural ratio within the district. For a given district, we anticipate that higher values of both measures can push firms to locate in rural areas. Lower average wages in a district may also attract plants searching for lower labor costs from neighboring districts.
District wage rates are calculated using firms in the organized sector in 2000. We focus on the organized sector because many firms in the unorganized sector do not report wages (and may not pay workers). To guard against outliers in the wage rates or ratios, we use the wage rate of the median employee in the district (and in its urban and rural areas). We do not directly observe the full wage distribution, but instead calculate these metrics from plant-level average wages weighted by plant employment. Wages are expressed in 2005 U.S. dollars. When discussing our results, we describe in greater detail the extent to which this factor may capture costs beyond labor. We model the average wage and the urban-rural wage ratio jointly, as their correlation is small at 0.12.
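Since only plant-level average wages are observed, the median employee's wage can be approximated by sorting plants by average wage and finding the plant at which cumulative employment first reaches half of the total. This sketch assumes every worker in a plant earns the plant average and takes the lower median at exact ties; the function name and inputs are invented.

```python
def median_employee_wage(plants):
    """Wage of the median employee from (average_wage, employment) pairs,
    assuming each worker earns the plant average (lower median at ties)."""
    ordered = sorted(plants)                       # sort by average wage
    half = sum(emp for _, emp in ordered) / 2
    cumulative = 0.0
    for wage, emp in ordered:
        cumulative += emp
        if cumulative >= half and cumulative > 0:
            return wage
    raise ValueError("no employment observed")


urban = [(3000, 50), (5000, 100), (9000, 10)]   # invented plant figures
rural = [(2000, 200), (4000, 40)]
ratio = median_employee_wage(urban) / median_employee_wage(rural)
print(median_employee_wage(urban), median_employee_wage(rural), ratio)  # 5000 2000 2.5
```

Unlike a mean, the median is unaffected by the single high-wage urban plant, which is the outlier protection the text describes.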
Our remaining variables are covariates to contrast with the above factors. We first consider district land area. This metric is calculated by the World Bank using Geographic Information System (GIS) data files of district boundaries in India. Conditional on the other variables specified, we would not anticipate district scale playing an important role. Our focus is instead on the urban build-up of a district. Following Schneider et al. (2009), we consider the share of land in the district that is built-up for urban use. This fraction is calculated using 2001 MODIS satellite information at 500 meter spatial resolution. Built-up area, focusing on building structures, is a good proxy for urban area. Greater built-up area in a district can allow for stronger urbanization growth and is our best proxy for local land markets. A second proxy, termed land use intensity, is calculated as total land value of organized sector plants per unit of shipments.
For our core estimations, we do not model the overall level or change in population urbanization for a district, as this may be endogenous to (or follow) the industry movements we consider. Robustness checks do, however, include the change in population urbanization from the 1991 Census to the 2001 Census as a control. These regressions thus consider the movement of industry over and above that of the underlying population base. Section 2 noted that some sub-areas of districts change their urbanization status in 2001 and that this could affect our urbanization measures even if a plant's location remained fixed. Because such a redefinition would apply to the sub-area's population too, including this change provides a quantitative check on whether reclassification matters.

Basic District Analyses
Our base estimation for district $d$ and industry $i$ is specified as:

$$\Delta \text{Urban Share}_{d,i} = \beta \, \text{Initial Urban Share}_{d,i} + X_d \gamma + \eta_i + \varepsilon_{d,i}$$

The dependent variable, $\Delta \text{Urban Share}_{d,i}$, is the change in the urban share of employment over a time period. $\text{Initial Urban Share}_{d,i}$ is the urban share of employment at the start of the period. This initial condition is important because the shares we consider are bounded; highly urbanized areas like Delhi, for example, have limited potential to urbanize further.

The vector $X_d$ includes the district-level covariates defined above, and $\eta_i$ is a vector of industry fixed effects. The industry fixed effects capture broad differences in production techniques and changes in urbanization levels that are specific to the period analyzed. In some models we further include a vector of state fixed effects, $\eta_s$, that captures heterogeneity in institutions, policies, and regulations across states. Including state fixed effects is a demanding test, as many education and infrastructure investments may be implemented at the state level, so part of their variation is absorbed by the fixed effects. Nevertheless, we place greater confidence in results that are not sensitive to including state fixed effects.

Table 5 investigates the correlates of India's urbanization of employment at the district-industry level from 1994 to 2005. Estimations report standard errors clustered by district, have 1,700 observations, and weight observations by the product of log district size and log industry size. The first row of Table 5 indicates strong conditional convergence: district-industries with higher initial urban shares experience slower urbanization growth. Figure 4 provides a graphical depiction of unconditional convergence in urban shares. While this convergence is partly mechanical due to the share-based formulation, this property generally holds in other specification formats.
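The estimator described above (weighted least squares with industry fixed effects and standard errors clustered by district) can be sketched in NumPy. This is our illustration, not the authors' code: the function name is hypothetical, the clustered variance is the basic CR0 sandwich without the finite-sample corrections that statistical packages typically apply, and the data are simulated.

```python
import numpy as np


def fe_wls_clustered(y, X, industry, district, weights):
    """WLS of y on X plus industry dummies, with district-clustered (CR0)
    standard errors. X must not contain a constant column."""
    y, X, w = (np.asarray(a, dtype=float) for a in (y, X, weights))
    industry, district = np.asarray(industry), np.asarray(district)
    levels = np.unique(industry)
    dummies = (industry[:, None] == levels[None, :]).astype(float)
    Z = np.column_stack([X, dummies])          # dummies absorb the constant
    sw = np.sqrt(w)
    Zw, yw = Z * sw[:, None], y * sw           # rescaling turns WLS into OLS
    bread = np.linalg.inv(Zw.T @ Zw)
    beta = bread @ (Zw.T @ yw)
    resid = yw - Zw @ beta
    meat = np.zeros((Z.shape[1], Z.shape[1]))
    for g in np.unique(district):              # sum score outer products by cluster
        score = Zw[district == g].T @ resid[district == g]
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread
    k = X.shape[1]
    return beta[:k], np.sqrt(np.diag(vcov)[:k])


# Simulated data: 40 districts x 5 industries (all values invented).
rng = np.random.default_rng(0)
n_d, n_i = 40, 5
district = np.repeat(np.arange(n_d), n_i)
industry = np.tile(np.arange(n_i), n_d)
init_share = rng.uniform(0.0, 1.0, n_d * n_i)
education = np.repeat(rng.uniform(0.0, 0.2, n_d), n_i)      # district-level trait
ind_effect = np.array([0.0, 0.05, -0.05, 0.10, 0.02])[industry]
y = -0.3 * init_share + 0.5 * education + ind_effect + rng.normal(0, 0.02, n_d * n_i)
weights = (np.log(np.repeat(rng.uniform(1e3, 1e5, n_d), n_i))
           * np.log(np.tile(rng.uniform(1e3, 1e5, n_i), n_d)))
beta, se = fe_wls_clustered(y, np.column_stack([init_share, education]),
                            industry, district, weights)
print(np.round(beta, 3), np.round(se, 3))
```

The estimates should land close to the simulated values of -0.3 (convergence) and 0.5 (education). To add state fixed effects, one would append state dummies to the design matrix while dropping one state category, since the full set of industry dummies already spans the constant.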
Columns 1 and 2 find that districts with better infrastructure and more educated workforces in 2000 experienced higher urbanization growth from 1994 to 2005. These patterns are evident with and without state fixed effects. In Columns 3 and 4, we include the two wage measures to model cost conditions in districts. The coefficients on both measures are always negative and have t-statistics of at least 1.5. Without state fixed effects, the urban-rural differential tends to come through strongest, while the overall wage level tends to come through strongest when state fixed effects are included. Either way, cost factors clearly matter.
Columns 5 and 6 add district land area and the change in urban population to the preceding specification. The latter is added as a robustness check for the reclassification of rural areas highlighted in Section 2. Neither trait is statistically significant. The urban population change covariate has a large point estimate that disappears when state fixed effects are included, perhaps indicative of growth differences across states. More important is the stability of the main regressors across these specifications.
Columns 7 and 8 next control for the share of a district's land area that is built up for urban use. Our reported results use indicator variables to allow for non-linear effects in the built-up area shares. A larger built-up share, especially in the top quartile, is strongly associated with increased urbanization; we find similar effects with other specification formats such as linear shares. A plausible interpretation of these coefficients is that land availability is a strong governor of the urbanization process. While this interpretation would match anecdotal accounts of the constraints that land availability and real estate prices place on the location choices of manufacturing firms in India, this measure is indirect (e.g., compared to real estate price data), and thus caution in interpretation is warranted.
One effect of adding this control is that the coefficient on the infrastructure index declines somewhat and becomes insignificant when state fixed effects are included. This is not surprising given that the built-up metric partly captures higher infrastructure levels. Adding this variable, however, does not affect the results on our wage variables. We take from this set of results that cost factors slowing manufacturing urbanization in India are clearly present in the labor market and most likely present in the real estate market, too. The evidence further suggests that both absolute cost levels and localized urban-rural cost ratios play a role in India's urbanization.
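The built-up share indicators used in Columns 7 and 8 can be illustrated with a small sketch, assuming quartile cut points computed across districts and the bottom quartile as the omitted category. The paper does not state the exact cut points, so this construction and the function name are hypothetical.

```python
import numpy as np

def quartile_indicators(shares):
    """0/1 indicators for quartiles 2-4 of `shares`; the bottom quartile is the reference."""
    shares = np.asarray(shares, dtype=float)
    cuts = np.quantile(shares, [0.25, 0.5, 0.75])
    q = np.digitize(shares, cuts, right=True)  # 0 = bottom quartile, ..., 3 = top quartile
    return np.column_stack([(q == j).astype(float) for j in (1, 2, 3)])

# Hypothetical built-up shares for eight districts.
built_up = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
D = quartile_indicators(built_up)
```

Each column then enters the regression separately, so the top-quartile effect is estimated freely rather than forced onto a linear slope.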
Finally, Columns 9 and 10 instead model land use intensity by organized-sector firms to measure the tightness of local land markets. Including this control does not affect our core variables. Interestingly, its coefficient is positive, suggesting that greater urbanization is associated with tighter land markets. This counter-intuitive result is one reason for caution regarding the land availability variables.
Table 6a considers the robustness of these results over different time periods. The first two columns repeat the specifications in Columns 5 and 6 of Table 5 … side. Average wages have a positive and significant effect on urbanization when only industry fixed effects are controlled for. With state fixed effects, the average wage does not exhibit a strong relationship. By contrast, the urban-rural wage ratio is very important in explaining urbanization even after controlling for both industry and state fixed effects.
In Columns 5 and 6, we again find broad comparability for the 2000-2005 period. The education variable is especially strong, and the infrastructure variable is similar to its 1994-2000 estimate. The differences thus suggest that while the secular trend for India's manufacturing urbanization has slowed, the localized importance of education and infrastructure has not. Among our cost measures, the average wage is negative and statistically significant. In contrast to the earlier period, the urban-rural wage differential is negative but not especially powerful in this second period.
These adjustments over time, especially in light of the aggregate trends, are interesting, but we can only conjecture about the differences in our cost variables. One plausible and intriguing possibility relates to the series of economic reforms that India undertook in the 1980s and early 1990s. These reforms substantially affected the manufacturing sector (e.g., Balakrishnan et al., 2000; Goldar and Kumari, 2003; Hashim et al., 2009; Srivastava et al., 2001). Prior to the reforms, many restrictions existed on where plants could locate. The early changes may have reflected pent-up sorting toward cheaper locations within districts. As this process worked itself out, general district wage rates became more important. We hope that future work can clarify the story behind this pattern.
Finally, Columns 7 and 8 of Table 6a present the correlates of changes in urban employment shares from 1989 to 2005. We focus primarily on the 1994-2005 period to maximize data quality and comparability; we also lose a quarter of the sample with the longer horizon. Nevertheless, the basic patterns we show are evident across the full time period, too.
Table 6b next compares our results on the change in urban employment share from 1994 to 2005 with the corresponding changes for plants and output. Columns 3 and 4 show similar patterns for the urbanization of plants, while the results for output urbanization are much weaker across the board. Only the growth in urbanization in districts with more educated workforces remains. Columns 7-10 show that these differences are directly linked to differences between the organized and unorganized sectors. 11 The unorganized sector, which accounts for most plants and employment, mirrors the aggregate trends documented in Table 5. By contrast, changes in urban location decisions in the organized sector, which dominates output-based metrics, do not appear to be systematically linked to these district traits.

We find similar patterns, both for the aggregate employment measure and for differences across sectors, in a variety of robustness checks. To mention a few: our core estimations weight observations by an interaction of log district size and log industry size. This weighting strategy focuses attention on generally more important observations while avoiding biases that could be introduced by using actual placements. Nevertheless, we find similar results when weighting by actual employment or dropping the weights. We have also confirmed that the education and infrastructure covariates are robust to including other district traits such as distance to a large city, household banking, demographic dividend, and import penetration. These do not affect our core patterns.
Our infrastructure index is a composite of four sub-components. Unreported regressions introduce these four components separately. These estimations emphasize the importance of paved roads and electric power access. We are cautious about these disaggregated results, however, as the paved roads metric dominates in regressions that include state fixed effects, while the electric power access metric dominates when state fixed effects are excluded. Given this sensitivity, we can only conclude that the aggregate infrastructure level is important. In a second test, we compared the combined sum of the infrastructure components to the minimum level a district attains across these components (with and without the sanitation measure). The combined measure outperforms the minimum measure, suggesting that the overall package of infrastructure in a district matters more than its weakest component.
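The comparison between the combined and minimum infrastructure measures can be sketched as below. This assumes the four components are already on a common 0-1 scale and sit in hypothetical column order [paved roads, electric power, sanitation, fourth component]; the paper's actual scaling and column layout are not given here.

```python
import numpy as np

def infrastructure_measures(components, include_sanitation=True, sanitation_col=2):
    """Return (sum, minimum) of infrastructure components for each district (row).

    components: (n_districts, 4) array on a common scale (assumed 0-1).
    sanitation_col: hypothetical column index of the sanitation measure.
    """
    components = np.asarray(components, dtype=float)
    cols = [j for j in range(components.shape[1])
            if include_sanitation or j != sanitation_col]
    sub = components[:, cols]
    return sub.sum(axis=1), sub.min(axis=1)

# Two hypothetical districts: strong but uneven vs. uniformly moderate.
comps = np.array([[0.9, 0.8, 0.2, 0.7],
                  [0.5, 0.5, 0.5, 0.5]])
total, minimum = infrastructure_measures(comps)
```

The example shows why the two measures can disagree: the first district ranks higher on the combined sum (2.6 vs. 2.0) but lower on the minimum (0.2 vs. 0.5) because of its weak sanitation score.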

District-Industry Interaction Analyses
Results in Tables 4-6b focus on district-level traits that explain where urbanization is occurring. These estimations control for industry differences but do not otherwise exploit industry-level variation. Table 7 evaluates the effect of interactions of industry traits with district characteristics on urbanization in plants, employment, and output across the three periods. Interacting industry and district traits helps us identify the industries that are most sensitive to district features. Estimations control for district and industry fixed effects, so identification comes only through the interaction.
Panel A studies the interaction of industry land intensity with education and infrastructure, while Panel B uses industry capital intensity instead. We focus on these two variables given their earlier emphasis. For the education interactions, we find little effect. The point estimates are all negative, suggesting that land- and capital-intensive industries may be urbanizing less in districts with more educated workforces, but the results are not precisely estimated. The infrastructure interactions, on the other hand, pick up more variation. Especially over the 1994-2000 period, land- and capital-intensive industries urbanized less in districts with better infrastructure. These results quantify somewhat more precisely the correlations discussed earlier.
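The identification logic of Table 7 can be illustrated with a sketch on a synthetic balanced district-industry panel: once district and industry fixed effects absorb each side's main effect, only the product of an industry trait and a district characteristic remains to be estimated. The trait names, the simulated coefficient of -0.4, and the unweighted OLS are assumptions for illustration, not the paper's specification.

```python
import numpy as np

def interaction_fe_estimate(y, district, industry, industry_trait, district_char):
    """OLS coefficient on industry_trait * district_char with district and industry FE."""
    n = len(y)
    D_dist = np.zeros((n, district.max() + 1))
    D_dist[np.arange(n), district] = 1.0
    D_ind = np.zeros((n, industry.max() + 1))
    D_ind[np.arange(n), industry] = 1.0
    # Drop one industry dummy: the district dummies already span the constant.
    Z = np.column_stack([industry_trait * district_char, D_dist, D_ind[:, 1:]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0]

rng = np.random.default_rng(1)
n_dist, n_ind = 40, 12
district = np.repeat(np.arange(n_dist), n_ind)   # balanced district-industry panel
industry = np.tile(np.arange(n_ind), n_dist)
land_intensity = rng.uniform(size=n_ind)          # hypothetical industry trait
infra = rng.normal(size=n_dist)                   # hypothetical district characteristic
y = (rng.normal(size=n_dist)[district] + rng.normal(size=n_ind)[industry]
     - 0.4 * land_intensity[industry] * infra[district]
     + rng.normal(scale=0.05, size=n_dist * n_ind))
est = interaction_fe_estimate(y, district, industry,
                              land_intensity[industry], infra[district])
```

Because the main effects of land intensity and infrastructure are fully absorbed by the dummies, `est` recovers only the differential response of land-intensive industries to infrastructure.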
Besides capital and land intensity (and financial intensity, which closely tracked them), unreported tests do not find other industry traits to be significant in interactions, either with the education and infrastructure measures or with our cost-side wage factors. This limited response, which suggests that most of the district traits noted earlier acted similarly across industries, is surprising. Moreover, it raises the question of whether the urbanization changes improved the allocation of industry across India's urban and rural regions. We turn to this in the next section.

Section 5: Urbanization and Spatial Mismatch
This section considers whether the urbanization changes linked to better education and infrastructure also translated into a better allocation of industries between urban and rural settings within a district. Our depiction of spatial mismatch for establishments is simple and based upon the extent to which industries are urbanized nationally, as shown in Table 3a. Industries like Office, accounting and computing machinery are predominantly in urban areas, while industries like Other non-metallic mineral products are predominantly rural.
We construct our spatial mismatch metrics as follows. Define the national urban share for industry i as National Urban%_i. For a given district d, we identify the industry distribution of the district's urban employment, combining both organized and unorganized sectors. Labeling this as Actual Urban Emp%_{d,i}, our first calculation is

$$\text{Actual Urban Allocation}_d = \sum_i \text{Actual Urban Emp\%}_{d,i} \cdot \text{National Urban\%}_i$$

This provides a weighted average of the extent to which the industries located in urban areas of district d are typically urbanized nationally.
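As a worked illustration of the formula, the actual allocation can be computed directly from a district's urban employment by industry. The three-industry district and its national urban shares below are hypothetical numbers, and the function name is ours.

```python
import numpy as np

def actual_urban_allocation(urban_emp, national_urban_pct):
    """Actual Urban Allocation_d: urban employment shares weighted by national urban shares.

    urban_emp: district d's urban employment by industry.
    national_urban_pct: each industry's national urban share, as a fraction (0-1).
    """
    urban_emp = np.asarray(urban_emp, dtype=float)
    shares = urban_emp / urban_emp.sum()          # Actual Urban Emp%_{d,i}
    return float(shares @ np.asarray(national_urban_pct, dtype=float))

# Hypothetical district with 1,000 urban workers across three industries.
actual = actual_urban_allocation([600, 300, 100], [0.9, 0.5, 0.2])  # about 0.71
```

Here 0.6 x 0.9 + 0.3 x 0.5 + 0.1 x 0.2 = 0.71: the typical urban worker in this district is in an industry that is 71% urbanized nationally.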
By itself, Actual Urban Allocation_d is difficult to interpret, as districts have varying industrial compositions and urbanization levels. Our next step is therefore to create a comparison point for each district using the observed industry distribution of its employment across both urban and rural settings. Conceptually, we ask what the maximum allocation value would have been had we assigned the district's urban employment to the industries in district d that are most urbanized nationally. Imagine district d had an urban employment of 1,000 workers. We would first take all employment from the Office, accounting and computing machinery industry and assign it to urban settings in the counterfactual. If this industry's employment fell short of 1,000 workers, we would then assign workers from the second most nationally urbanized industry, and so on, until we obtain the industry distribution of the 1,000 workers that would have been in urban locations had industries sorted according to national patterns. From this, we calculate

$$\text{Ideal Urban Allocation}_d = \sum_i \text{Counterfactual Urban Emp\%}_{d,i} \cdot \text{National Urban\%}_i$$

By construction, this ideal metric always equals or exceeds the actual metric. Our primary measure of spatial mismatch is

$$\text{Spatial Mismatch}_d = \frac{\text{Ideal Urban Allocation}_d - \text{Actual Urban Allocation}_d}{\text{Ideal Urban Allocation}_d}$$

The index captures the degree to which the allocation of industries in a district departs from what we would expect based on national urbanization patterns.
Table 8 examines changes in spatial mismatch in a regression format very similar to Table 5. We control for the initial level of spatial mismatch in a district and always find that areas with large initial mismatch tend to reduce it over time. We also control for the change in urbanization during this period.
We include this second control because spatial mismatch will be mechanically lower for more urbanized districts (at an extreme, no mismatch is possible if the district is 100% urbanized). The exact format of this control, or even its inclusion, is not important to our findings, but we include it to be conservative. Its coefficient confirms that our spatial mismatch metric declines as urbanization increases.
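The greedy counterfactual behind Ideal Urban Allocation_d and the resulting mismatch index can be sketched as follows. The district's total (urban plus rural) employment by industry and the actual allocation value of 0.71 are hypothetical, chosen to be consistent with an urban split of 600/300/100 workers; the function names are ours, not the paper's.

```python
import numpy as np

def ideal_urban_allocation(total_emp, national_urban_pct, urban_total):
    """Ideal Urban Allocation_d under a greedy counterfactual.

    Fills the district's urban_total jobs with industries in descending order of
    national urban share, taking each industry's full district employment
    (urban + rural) until the urban total is reached.
    """
    total_emp = np.asarray(total_emp, dtype=float)
    pct = np.asarray(national_urban_pct, dtype=float)
    counterfactual = np.zeros_like(total_emp)
    remaining = float(urban_total)
    for i in np.argsort(-pct):                    # most nationally urbanized first
        take = min(total_emp[i], remaining)
        counterfactual[i] = take
        remaining -= take
        if remaining <= 0:
            break
    return float((counterfactual / urban_total) @ pct)

def spatial_mismatch(ideal, actual):
    """(Ideal - Actual) / Ideal; zero when industries sort as national patterns predict."""
    return (ideal - actual) / ideal

# Hypothetical district: 1,500 workers in total, 1,000 of them urban.
ideal = ideal_urban_allocation([700, 500, 300], [0.9, 0.5, 0.2], urban_total=1000)
mismatch = spatial_mismatch(ideal, actual=0.71)
```

The counterfactual places all 700 workers of the most urbanized industry and 300 of the next in urban areas, giving 0.7 x 0.9 + 0.3 x 0.5 = 0.78 and a mismatch of about 0.09. Filling greedily by national urban share maximizes the weighted sum subject to each industry's employment cap, which is why the ideal value can never fall below the actual one; ties in national shares do not change the result.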
Among our focal variables, districts with more educated workforces show stronger declines in spatial mismatch. This is true in both sub-periods, and it is especially strong in the unorganized sector. In the organized sector, the education variable loses statistical significance once state fixed effects are controlled for. While the coefficient on the infrastructure measure is generally negative, it shows a strong connection only during the 1994-2000 period. The cost factors do not appear important, beyond what may be captured by the base urban change variable.
These results are robust to several adjustments in metric design. We find similar results when normalizing by the average of Ideal Urban Allocation_d and Actual Urban Allocation_d and when considering the raw difference without normalization. We likewise find similar results when using measures of plant allocation without employment weights. Finally, we obtain similar results when we do not weight the extent of misallocation by an industry's national urbanization percentage and instead treat misallocation as a binary outcome.
These results are encouraging and point to directions for future research. On the one hand, while our metric design is admittedly simple, Table 8's results suggest that the urbanization process in India linked to education, and perhaps infrastructure, is improving the spatial allocation of industry. Unreported estimations find these effects are most pronounced at medium and high levels of urbanization. On the other hand, changes in urban spatial allocation due to cost factors are not associated with improvements. This latter result could reflect limitations in our cost measures, or it could indicate that cost-based sorting does not improve allocation, which would be surprising. We hope that future research can clarify these matters.
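The two alternative normalizations mentioned in the robustness checks can be written side by side with the baseline index. This small sketch reuses hypothetical ideal and actual values of 0.78 and 0.71; the binary-misallocation variant is not sketched because its exact construction is not spelled out here.

```python
def mismatch_variants(ideal, actual):
    """Baseline spatial mismatch index and the two alternative normalizations."""
    return {
        "baseline": (ideal - actual) / ideal,
        "avg_normalized": (ideal - actual) / ((ideal + actual) / 2),
        "raw_difference": ideal - actual,
    }

variants = mismatch_variants(ideal=0.78, actual=0.71)
```

All three versions move in the same direction when the gap between ideal and actual narrows; they differ only in how strongly districts with low ideal values are scaled up.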
We note that our approach has limits in the Indian context that future work should consider. Anecdotal accounts of Indian business suggest that some firms that would typically be in urban areas instead choose rural locations because of a combination of cheaper land, looser pollution restrictions, greater ability to generate their own electricity, lighter regulation, less congestion, and so on. Our approach does not account for these realities, and it would be interesting for future work to model the ideal spatial allocation with them taken into account.

Section 6: Conclusions and Implications
In this paper, we closely examine the movement of economic activity in Indian manufacturing between urban and rural areas. We find that while the organized sector is becoming less urbanized, the unorganized sector is becoming more urbanized. This process has been most closely linked to greater urbanization changes in districts with high education levels; a second role is often evident for public infrastructure as well. On the whole, these urbanization changes have modestly improved the urban-rural allocation of industries within India's districts.
We want to note several key factors that our paper does not address. We have discussed at various points the limits of our cost-side measures. Especially with respect to real estate costs or limits on land availability, our measures are quite crude. We hope that better data emerge in the future to refine these estimates. Relatedly, anecdotal accounts for India suggest that urban-rural differences in regulation, severe congestion, 12 and limits on urban property titles also direct firm location. While we have begun collecting data on congestion, this paper does not yet model these factors systematically. Thus, as currently constructed, our wage variables may partly capture these issues.
Observers have frequently noted the relatively slow pace of India's urbanization (even recognizing differences in urban definitions); moreover, the movement of organized manufacturing plants to rural areas is surprising, given the relative youth of India's manufacturing sector. Received wisdom is that this sluggishness is partly due to the limits imposed by India's poor infrastructure and weaker education levels, among other factors such as strict building regulations (Sridhar 2010). Our work supports these claims. Continued investment in these factors, beyond its direct effects for Indian businesses, may also provide benefits from an urbanization and spatial allocation perspective. 13
Our findings suggest that policies taking an inclusionary approach to the urban informal economy may be more successful in promoting local development and managing its strains than those focused only on the formal sector. It is very important for Indian policy makers to recognize that much of the urbanization now occurring is in the unorganized sector. Moreover, education and infrastructure investments, regardless of their original motivation, operate primarily through the unorganized sector. Going forward, adequate provision of infrastructure is necessary for the informal sector to develop. The more Indian cities recognize this influx and design appropriate policies and investments to support it, the more effective policy interventions will be.
12 Ministry of Urban Development (2008) finds that congestion in India stems not from the number of vehicles but from their concentration in a few cities, particularly metropolitan cities. For instance, 32% of all vehicles are in metropolitan cities alone, while these cities account for about 11% of the country's total urban population.
Examples of inclusionary policies are mechanisms to ensure that urban informal livelihoods are integrated into urban plans, land allocation, and zoning regulations; that the urban informal workforce gains access to markets and to basic urban infrastructure services; and that organizations of informal workers participate in government procurement schemes and policy-making processes.
It is something of a paradox that India, among the most densely populated countries in the world, is also among the least urbanized. An important aspect of India's continued growth is achieving better and deeper urbanization over the next two decades than it has achieved over the past two decades.
Baily et al. (1992), and Griliches and Regev (1995). For the Baily et al. (1992) technique, the within component represents changes in urban shares within districts, with districts weighted by their initial employment shares in the Indian economy. Positive values indicate that districts tended to have increasing urban shares when weighted by initial employment. The between component represents changes in employment shares across districts interacted with the initial deviation of districts from the national urban share. Positive values indicate that employment tended to be reallocated toward districts with higher initial urban shares. The covariance component represents the interaction of changes in urban shares for districts across the period with changes in employment shares for districts across the period. Positive values indicate that fast-growing districts also experienced rising urban shares. By definition, the three components sum to the total change in urban share for India, with some minor deviations due to panel composition changes.
Appendix Tables 1a and 1b
Positive values indicate an increase across the period in the urban employment share of the district-industry. Estimations report standard errors clustered by district, include state and industry fixed effects, include 1700 observations, and weight observations by the interaction of log district size and log industry size. + significant at 10% level; ++ significant at 5% level; +++ significant at 1% level.
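The Baily et al. (1992) decomposition described in the notes above can be checked with a short sketch. It assumes a balanced set of districts whose employment shares sum to one in both years, which is why the three components add up exactly here; in the actual data, panel composition changes create the minor deviations the notes mention. The two-district numbers are hypothetical.

```python
import numpy as np

def baily_decomposition(s0, s1, u0, u1):
    """Split the change in national urban share U = sum_d s_d * u_d.

    s0, s1: district employment shares (each summing to 1) at start and end.
    u0, u1: district urban shares at start and end.
    """
    s0, s1, u0, u1 = (np.asarray(a, dtype=float) for a in (s0, s1, u0, u1))
    U0 = s0 @ u0                      # initial national urban share
    ds, du = s1 - s0, u1 - u0
    within = s0 @ du                  # urban-share changes, initial employment weights
    between = ds @ (u0 - U0)          # reallocation toward initially more urban districts
    covariance = ds @ du              # growing districts that also urbanized
    total = s1 @ u1 - U0
    return within, between, covariance, total

w, b, c, t = baily_decomposition(s0=[0.5, 0.5], s1=[0.6, 0.4],
                                 u0=[0.4, 0.2], u1=[0.5, 0.2])
```

In this example the within, between, and covariance terms are 0.05, 0.02, and 0.01, summing to the total change of 0.08. The between term uses deviations from the initial national share, which leaves its value unchanged (employment share changes sum to zero) but makes its sign easier to read.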
Table 8: Estimations for changes in urban spatial mismatch in the manufacturing sector
DV: Change in urban spatial mismatch for district employment
Notes: Estimations consider changes in district spatial mismatch for 1994-2005. The spatial mismatch index compares the observed industry distribution of district employment across urban and rural locations to a counterfactual in which plants in the district are allocated to urban and rural locations according to national propensities by industry to be in urban settings (holding constant the district's total urban and rural activity). Negative values indicate that the industry distribution across the district shifted into closer alignment with the counterfactual distribution. Estimations report robust standard errors, include state fixed effects as indicated, and weight observations by log district size. + significant at 10% level; ++ significant at 5% level; +++ significant at 1% level.
Griliches and Regev (1995)
Notes: See Table 4.