Trade, Internal Migration, and Human Capital: Who Gains from India’s IT Boom?

How do trade shocks affect welfare and inequality when human capital is endogenous? Using an external IT demand shock and detailed internal migration data from India, I first document that both IT employment and engineering enrollment responded to the rise in IT exports, with IT employment responding more when nearby regions have higher college age population. I then develop a quantitative spatial equilibrium model featuring two new channels: higher education choice and differential costs of migrating for college and work. Using the framework, I quantify the aggregate and distributional effects of the IT boom, and perform counterfactuals. Without endogenous education, estimated aggregate welfare gain from the export shock would have been half and regional inequality about a third higher. Reducing barriers to mobility for education, such as reducing in-state quotas for students at higher education institutes, would substantially reduce inequality in the gains from the IT boom across districts. JEL Classification: F16, F63, I24, J24, R12


Introduction
New economic opportunities that arise from globalization are often accompanied by a rising demand for different types of skills. Inequalities in local access to education and jobs, along with mobility frictions, make it costly for individuals in some regions to acquire education or pursue better job opportunities. These frictions could be particularly large in developing countries. To what extent do these frictions limit the gains from trade and exacerbate inequality? What policies can help reduce these inequalities? The main challenge in answering these questions is disentangling the different ways in which individuals respond to these opportunities, such as choosing the sector and the locations of work and education, and the interdependence between these decisions.
In this paper, I analyze the effects of trade on welfare and inequality when education choice is endogenous and when there are mobility frictions to access both education and work. Combining detailed spatial and migration data, I document that IT employment and engineering enrollment responded to the rise in Indian IT exports in the late 1990s, and this response was heterogeneous across regions. Consistent with these stylized facts, I develop and quantify a spatial equilibrium model that adds two new margins of response relative to the existing economic geography literature: first, agents can acquire new skills and second, they can migrate internally to acquire these skills. I find that without higher education choice, estimated aggregate welfare gains from the IT boom would be halved and estimated regional inequality would be a third higher. Restricting individuals to go to college only in their home districts (i.e., not allowing for mobility for education), reduces the estimated aggregate welfare gains marginally but increases regional inequality by 15%.
The paper begins by providing a set of stylized facts about the labor market consequences of the IT boom using spatially granular sectoral labor and education data that I compiled and a unique census dataset tracking migration flows between Indian districts, disaggregated by reason for migration. From 1998 to 2008, while Indian IT as a fraction of total service exports increased from 15% to 40%, engineering enrollment as a fraction of total enrollment more than doubled, and total college enrollment increased three-fold. I document two salient stylized facts: 1) IT employment and engineering enrollment positively respond to IT exports, with IT employment responding more when nearby regions have higher engineering enrollment and exports; and 2) distance affects migration, and individuals migrate more for work than for education. State borders restrict migration flows for education more than that for work, reflecting state-level barriers to mobility for education, such as in-state quotas for students at higher education institutes.
Consistent with these stylized facts, I develop a quantitative spatial equilibrium model that allows individuals to make education and work decisions in two stages. In the first stage, they decide what and where to study, accounting for access to higher education and job opportunities. In the second stage, individuals choose the sector and location of work. The first and second stage decisions generate the education and employment responses respectively, documented by stylized fact 1. To my knowledge, this is the first paper to allow for and estimate differential mobility costs for work and education. The estimated mobility costs are much higher for education than for work, consistent with stylized fact 2.
In the model, sector specific trade shocks, such as the Indian IT boom, change the relative returns to occupations across locations depending on two factors: 1) the location's comparative advantage in that sector and 2) the location's connectivity to other locations.
The changes in the relative returns to occupations affect an individual's incentives to invest in different skill types. Skill investments are constrained by the local availability of higher education and the costs of moving to regions with colleges. Thus, regions differ in how much skilled labor they can access and consequently, by how much they can expand IT production. To the extent that the external demand shock and historical regional differences in comparative advantage are not correlated with unobserved productivities that are determined from the supply side, I can leverage the IT boom to estimate the structural model parameters, such as the elasticity of software exports to software prices.
Differences in local access to jobs and education, along with differential moving costs for work and education, generate regional inequalities in the welfare gains from the IT boom.
People face differential migration costs when they move for education or for work.
The dependence of job opportunities on skill levels makes it challenging to separately estimate work mobility costs using migration data that do not track the skill level of the migrant. I use the two-stage structure of the model to explicitly account for such dependence and use the unique census data that track why people move to estimate these two costs separately. I find that the mobility costs across districts, measured as the disutility from moving, are 7 percentage points higher for education than for work. Estimated state border effects are large: being in the same state increases migration between neighboring districts by 269% for education and 59% for work. There are several reasons why the mobility cost of education could differ from that of work, such as policy-induced mobility barriers and language barriers that could have a differential effect depending on an individual's age. In India (as in many other countries like the U.S. and China), there are state quotas in higher education institutes for in-state students.
This policy could result in higher costs of crossing state borders for education than for work.
Compared to a benchmark quantitative model with fixed skill types, I find significantly different aggregate and distributional consequences of trade across regions after incorporating the mechanism of endogenous education choice. Almost half of the gains in average welfare are driven by the ability to change skills: without endogenous education, the average individual would have benefited by 0.67% compared with an average gain of 1.12% in the endogenous education case. Even though inequality in the distribution of employment is reduced, the rise in inter-regional welfare inequality due to the IT boom, measured as the coefficient of variation in regional welfare at the origin district, is 37% more in the fixed-skills model than in the model with endogenous education choice. The key mechanisms leading to higher aggregate welfare and lower welfare inequality in the endogenous education model, compared with the fixed-skills model are the ability to acquire skills and to move across regions for education.
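The inequality measure used above, the coefficient of variation of regional welfare gains, can be illustrated with a minimal sketch. The district-level gains below are hypothetical numbers chosen only for illustration, and the use of the population (rather than sample) standard deviation is an assumption, not something the text specifies.

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance) / mean

# Hypothetical welfare gains (in percent) for four origin districts
gains_fixed_skills = [0.2, 0.4, 0.8, 1.3]
gains_endogenous = [0.8, 1.0, 1.2, 1.5]

cv_fixed = coefficient_of_variation(gains_fixed_skills)
cv_endog = coefficient_of_variation(gains_endogenous)
# A larger value means gains are spread more unequally across districts
```

Because the measure is scaled by the mean, it compares dispersion across scenarios whose average gains differ.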
How important is the mobility cost for education that I introduce into the spatial equilibrium model? To quantify it, I restrict individuals to attend college in their home districts. This counterfactual increases regional inequality by 15%. The gap between the welfare gains of the worst-off and the best-off districts increases by 63%.
The question of how to reduce inequality across both regions and skill groups lies at the heart of many policy debates. This paper suggests policy interventions in the education market that can reduce trade-driven regional inequality, but not by moving jobs directly. The policy of reducing in-state quotas for students at colleges can reduce the migration costs for education relative to work. In the model, this is implemented by restricting the effect of state borders on migration for education to be exactly the same as that for work. I find that the rise in average welfare would have been 1% higher compared with the rise in the model without these restrictions on work and education costs, and regional inequality, measured by the coefficient of variation, would have been 27% lower. Reducing inter-state barriers to education can significantly increase access to education for out-of-state students. This can increase the opportunity for people from remote regions to gain access to education and migrate to areas with more high-skilled jobs. Although this policy does not reduce inequality in the distribution of employment, it reduces inequality in the distribution of welfare by increasing access to education.
This underscores the importance of general equilibrium effects induced by the expansion of exports, which requires us to give more consideration to the interactions of trade, education, and labor markets. This paper makes three contributions. First, I introduce human capital acquisition decisions in a general equilibrium economic geography model. The general equilibrium aspect is important, since human capital takes time to respond to employment opportunities, during which both people and goods can move. Second, to my knowledge, this is the first paper to estimate the mobility costs for work and education separately and show that these costs are quantitatively different. The unique Census data tracking migration flows disaggregated by reason, obtained through an agreement with the Indian government, made this estimation possible. I show that access to both jobs and education are individually important for determining the spatial dispersion in the gains from trade.
Third, the framework is well-suited for analyzing the effects of policy-induced spatial frictions to moving for higher education, such as in-state quotas at colleges. Reducing these barriers would increase aggregate welfare marginally but substantially decrease the impact of the export shock on regional inequality. The results underscore the potential for education policies to distribute the gains from globalization more equally.
The model builds on a large theoretical literature in the fields of international trade, economic geography, labor, and migration. Similarly to Caliendo et al. (2019), Fuchs (2018), and Kucheryavyy et al. (2016), the model features multiple sectors. Like Allen et al. (2018) and Tsivanidis (2018), the model features agents with heterogeneous skill types. However, unlike the above models, the theory developed here endogenizes the formation of skills across space.
Costly labor mobility relates this paper to the class of gravity migration models, such as those by Allen et al. (2018), Tombe and Zhu (2019), Fan (2019), and Bryan and Morten (2019), that feature multiple sectors or regions with costly mobility of goods and people. Kone et al. (2018) use the Indian migration data to provide evidence of how migration flows relate to distance and cultural differences. Imbert and Papp (2020) provide evidence that the cost of seasonal migration from rural to urban India is high enough that wage differences between the rural and urban sectors can persist.
Differently from these papers, I provide separate estimates for mobility costs by reasons for migration.
A few structural trade models study endogenous human capital acquisition. Khanna and Morales (2017) study how US immigration policy and the internet boom affected aggregate welfare in both the US and India in a dynamic setting with international migration. In contrast, this paper studies the regional distributional consequences of the IT boom, quantifying how costs of migration contributed to regional inequality induced by the IT boom. Compared to Ferriere et al. (2018), who build a dynamic multi-region model of international trade with heterogeneous households, incomplete credit markets, and costly endogenous skill acquisition, this paper, in a static setting, additionally features costly mobility for education. A few other theoretical works in this literature, such as Danziger (2017), focus on quantifying the overall response of endogenous education to trade without considering regional differences. Seminal works on the dynamic Heckscher-Ohlin (HO) model, which embed endogenous factor formation in response to trade in the classic HO framework, include Stiglitz (1970), Findlay and Kierzkowski (1983), and Borsook (1987). Consistent with this literature, I demonstrate that trade can strengthen a country's initial comparative advantage by changing the incentives to acquire skills, and thereby reduce regional inequality in the gains from trade.
Endogenizing education relates my model to the class of human capital accumulation models prominent in the education and labor literature. In these models, forward-looking individuals make education decisions based on labor market returns and the cost of tuition (Jones and Kellogg (2014), Johnson (2013), and Lee (2005)). Compared to this class of models, which requires keeping track of a large number of state variables, I use a simpler two-stage model that allows me to tractably incorporate many regions and bilateral migration flows between these regions.
Given the emphasis in the trade literature on the effect of exports and trade liberalization on skill premium, there has been relatively little research on the effect of trade on skill acquisition. A number of empirical studies such as Atkin (2016), Blanchard and Olney (2017), Edmonds et al. (2010), Greenland and Lopresti (2016), Shastry (2012), Liu (2017), and Oster and Steinberg (2013) focus on the impact of trade on primary and secondary education. Exceptions to these are Li (2018) and Khanna and Morales (2017) which study the response of college enrollment to high-skill export shocks. More evidence has emerged recently (Li (2019), Hou and Karayalcin (2019), Ma et al. (2019)).
Complementing this literature, I provide reduced form evidence about the response of tertiary enrollment to shocks in the high-tech sector in a large developing country, and I document regional heterogeneity in this response.

Data
A major constraint in studying the effects of IT export growth on human capital acquisition in the presence of costly migration is the lack of employment and education data, disaggregated at the sector of work and field of education level, combined with the absence of detailed migration data. To this end, I use three sources to collect data on India's IT sector and access confidential Indian Census data to obtain district-to-district migration flows by reasons for migration.

Data on the Indian IT sector
I use three rounds of Economic Census data (1998, 2005, and 2013) to obtain data on total IT employment across all districts of India. While the advantage of the Census data is that it covers the entirety of all Indian firms and hence reports total employment, the data are not disaggregated by level of education. To supplement this information, I use data from the National Sample Survey (NSS) rounds 50, 55, 60, 61, 62, 64, 66, and 68. These surveys record information on the sector and location of occupation as well as the field of study. The drawback of the NSS data is that it represents only a small sample, and hence does not contain a lot of important sector-level information.
However, it does report multipliers on each unit of observation, which, in the NSS, is an individual. This allows me to obtain the unbiased ratios of engineers, non-engineers, and both college-educated and non-college-educated individuals in each sector of employment.
By multiplying these ratios with total employment from the Economic Census, one can recover the distribution of the population by field of study and sector of employment in each district of India. Data on wages by sector of occupation and field of education are also obtained from the NSS, supplemented with data from the Economic Census.
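The combination step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of appendix A.3: the record layout, the district and sector labels, and the function names are all hypothetical. NSS multipliers are used as weights to compute the share of each field of study within a district-sector cell, and those shares are then scaled by the Economic Census employment total for the same cell.

```python
from collections import defaultdict

def weighted_shares(records):
    """records: iterable of (district, sector, field, multiplier) tuples.
    Returns the multiplier-weighted share of each field within (district, sector)."""
    cell_totals = defaultdict(float)
    field_totals = defaultdict(float)
    for district, sector, field, weight in records:
        cell_totals[(district, sector)] += weight
        field_totals[(district, sector, field)] += weight
    return {key: w / cell_totals[key[:2]] for key, w in field_totals.items()}

def allocate_employment(shares, census_totals):
    """Scale survey shares by census employment totals for each (district, sector)."""
    return {
        key: share * census_totals[key[:2]]
        for key, share in shares.items()
        if key[:2] in census_totals
    }

# Hypothetical survey records and census total, for illustration only
nss_records = [
    ("D1", "IT", "engineer", 300.0),
    ("D1", "IT", "non-engineer", 100.0),
    ("D1", "IT", "engineer", 100.0),
]
census_totals = {("D1", "IT"): 2000.0}

shares = weighted_shares(nss_records)
employment = allocate_employment(shares, census_totals)
```

Cells that appear in the survey but not in the census totals are dropped here; how such cells are actually handled is not stated in this section.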
For more details on combining the NSS with the Economic Census, see appendix A.3. Tables VII and VIII report the daily average district-level raw wages in Rs. and the average district-level employment, respectively. These tables show that both the wages and the employment of college-educated and non-college-educated workers increased more in high-skill-intensive industries than in manufacturing between the pre- and post-boom periods.
As an additional source, for the reduced form analysis, I supplement the IT employment and wage data with data on IT exports from NASSCOM (the leading trade association of the software industry in India) directories 1992, 1995, 1998, 1999, 2002, and 2003. The strength of the NASSCOM dataset is that it contains data on "95% of all registered IT firms in India". NASSCOM also contains data on IT employment, and this information is divided according to whether employees are technical employees (that is, associated directly with the provision and delivery of IT services) or non-technical employees (all other employees). Several papers have used the NASSCOM data, which is the most comprehensive source of data on Indian IT firms; among these, Tharakan et al. Throughout the paper, 1995-1999 is referred to as the pre-boom period and 2001-2011 is referred to as the post-boom period. The relatively longer post-boom period is chosen because it takes at least 2 or 3 years to prepare for college and at least 4 years to complete a college degree. Thus, the effect of the IT boom on enrollment and graduation will be observed with a lag.

Data on internal migration
The National Census of India for 2001 is the main data source for internal migration in India. An individual is a migrant, according to the Census, "if the place in which he is enumerated during the census is other than his place of immediate last residence" (Census, 2001). The Census includes additional questions based on the last residence criteria. These questions include reason for migration, such as marriage, education, or employment; the urban/rural status of the last residence's location; and the duration of stay in the current residence since migration. This level of disaggregation is crucial for separately estimating the costs of migration due to education and work. Publicly available Census data only report the destination district and whether the migrant's origin is in the same state or outside the state, aggregated over all reasons for migration. I obtain the more disaggregated data through a special agreement with the Census of India. More information on these data can be found in Section 4, and descriptive evidence about the proportion of people migrating for work and education can be found in Table II.

Other data
Data on the linguistic distance of each Indian district from Hindi was obtained from Gauri Kartini Shastry who constructed the linguistic distance measures for Shastry (2012).
Construction of the index, which is key to my empirical strategy, is detailed in Shastry's paper and in appendix ?? of this paper. The linguistic distance I use is calculated by ethnolinguists based on the similarity of grammar and cognates. For example, daughter in English is "dokhtar" in Persian and "nuer" in Mandarin Chinese. While Persian and English are both part of the Indo-European language family, Chinese is derived from the Sino-Tibetan language family. Linguistic distance between Persian and English is therefore lower than between Chinese and English or Chinese and Persian. In India, languages differ across regions. The 1961 Census of India documented speakers of 1652 languages from five language families. There can be wide linguistic diversity between districts, and most people adopt a second language that is a widely accepted speaking medium across districts. Of all multilingual people who were not native speakers, 60 percent chose to learn Hindi and 56 percent chose English (Shastry (2012)). Shastry (2012) proxies English-learning costs as linguistic distance from Hindi relative to English. She shows that since a necessary condition for employment in the IT industry is fluency in English, IT firms locate more in districts that have a higher proportion of English speakers, as proxied by linguistic distance of that district to English relative to Hindi. Data on the college-age population, college enrollment, and literacy are collected from the decadal Census data of 2001 and 2011. Summary statistics for enrollment are reported in Table IX in the appendix. The most notable pattern is the rise in engineering enrollment. Between the pre- and post-boom periods, the proportion of engineers in total college enrollment more than doubled from 5% to 11%. During this time, the total number of people enrolled in college also increased three-fold. Thus, the total number of students studying engineering also increased in absolute numbers.

Background of India's IT growth
While the last two decades have witnessed a world-wide expansion of IT and consequent increase in demand for computing skills, this expansion has been disproportionately larger for India than for any other country in the world (International Trade Center (2017)). Figure IV plots the growth in IT exports over time, where the value of IT exports in 1993 has been normalized to one. This figure shows that IT exports from India have been steadily increasing since 1993, but a large jump occurred in the late 1990s and early 2000s, when normalized software exports increased by more than 76% in one year. Figure IV shows that during this period, IT employment as a fraction of total employment was also rising. From 1998 to 2000, IT employment as a fraction of total employment almost doubled. Engineering as a fraction of total enrollment was also generally increasing, but the largest jump occurred after 2000.
While many factors are responsible for the growth of IT in India, the lack of domestic demand for IT means that the sector's growth is constrained by the growth in world demand for Indian IT. This constraint was eased during the late 1990s and early 2000s, when several major events suddenly escalated demand for Indian IT. The Y2K phenomenon dominated from 1998 to 2000, along with the earlier dot-com boom and, later on, the dot-com bust. In order to solve Y2K-related computer problems, commonly known as the "Y2K bugs", IT firms started offshoring large parts of their work to developing countries such as India. The dot-com boom was a historic economic bubble and period of excessive speculation that occurred from roughly 1995 to 2000; it was marked by extreme growth in the use and adoption of the Internet. The dot-com bust caused many firms in the US (two-thirds of India's IT market) and elsewhere to slash their IT budgets, prompting even more outsourcing to India (Economist (2003)). Most notably, technological progress in the worldwide Internet, which had been underway for some time, was responsible for bringing world outsourcing demand to Indian firms. As Khanna and Morales (2017) note: The absence of world-wide Internet during the 1980s meant that on-site work ("body-shopping") dominated, because otherwise software had to be transported on tapes that faced heavy import duties. But in 1992, satellite links were set up in Software Technology Parks (STP), negating the need for some kinds of on-site work, and this boosted the offshoring of work to India.
In 1993, the shift from B-1 to H-1 visas in the US further lowered the incentives to hire Indian engineers for on-site work, as they were to be paid the prevailing market wage.
While world-wide events such as the Y2K shock, the dot-com boom and bust, and changes in US H-1B visa policies provided considerable external demand stimuli for the growth of the Indian IT sector, certain factors inherent to India are responsible for this expansion of Indian IT exports. It is generally agreed that the availability of low-cost, high-skill human resources has given India a comparative advantage in the IT sector over its competitor nations (Kapur (2002)). Moreover, much of the population (over 60%) is under 25, and India has one of the largest pools of technical graduates in the world. India also has a large English-speaking population due to its British legacy, and this fact is considered one of the key ingredients in the success of IT. As Shastry (2012) has shown, IT firms in India are located mostly in regions with a larger English-speaking population.
A natural advantage of India is its time difference with the US, which is one of India's biggest customers for IT services; this enables India to offer overnight services to the US, effectively creating round-the-clock working hours for outsourcing firms (Carmel and Tjia (2005)).
The growth of Indian IT is the result of much more than a single transitory demand shock that temporarily catapulted the sector upward. With the expansion in Indian IT exports, Indian IT employment continued to increase. Wages peaked during the sudden expansion of the late 1990s and early 2000s. Arguably, in response to rising IT employment opportunities, engineering enrollment started to respond after 2000, as shown in Figure ??.

Reduced-form facts
In this section, I present four facts about internal migration and the relationships between IT exports, regional employment, and enrollment over the short run and the long run in India. I use the expansion of IT during 1998-2002, largely driven by external demand shocks as described in Section 1.3, to study the labor market effects in the long run, that is, between 2005 and 2011. The choice of this time frame is dictated by the fact that an engineering degree takes at least four years to complete, and thus any effect on the labor market related to skill acquisition will occur after 2004-2005.
Fact 1: IT employment and engineering enrollment positively respond to exports.
To understand how IT employment and engineering enrollment changed across regions after the IT boom, I estimate an event study specification in which the outcome $Y_{dt}$ is standardized IT employment or standardized engineering enrollment in district $d$ in period $t$. The idea is that districts which initially had stronger connections with the rest of the world, as measured by the proportion of software exports in 1995, will gain more from the expansion in world demand for Indian IT than districts that had little or no connection with the rest of the world. In alternative specifications reported in Online Appendix B, following Shastry (2012), I instrument the initial software exports with the historical linguistic distance of a district from English.
In panel B of Figure II, I plot the response of engineering enrollment at ten-year intervals, as the available census data allow. As the graph shows, engineering enrollment has also been rising since 2001. Since the Census data are available only at decadal intervals, I cannot show the pre-trend estimates for enrollment.
Fact 2: The effects are heterogeneous. Employment responds more when nearby regions have higher engineering enrollment and higher IT exports. The heterogeneous effects are stronger in the long run.
In equation 2, I add an interaction term between the number of students enrolled in engineering in 1991 and the proportion of software exports from district $d$ in 1995. The estimated coefficient $\delta_t$ is plotted in Figure III. $\delta_t$ measures the differential response of IT employment between the pre- and post-boom periods depending on the historical level of engineering college enrollment in 1991, in districts that already had software exports in 1995.
Figure III shows that, conditional on the level of software exports, post 1998, IT employment responds more in districts that, in 1991, had more enrolled engineering students in same-state, nearby districts. The intuition, formalized in the model, is that in these districts, it is easier to expand future IT production due to having access to more college-educated, engineering program graduates in close proximity. Regression results are reported in appendix B in Table XIII.
Both IT employment and engineering enrollment thus respond more in districts that had prior IT exports compared to districts that did not. I next establish a set of facts related to the costs of migration over distance for both work and education.
Fact 3: Migration reduces over distance. In addition, state borders negatively affect migration and this effect is significantly larger when people migrate for education than when they migrate for work or for any other reason.
Using the Poisson pseudo maximum likelihood procedure (PPML), I estimate (3), similar to Kone et al. (2018). PPML is a non-linear estimation procedure that performs better than a log-log estimation in the presence of zeros and has been traditionally used in the estimation of migration gravity equations (Santos Silva and Tenreyro (2006)).
where $l_{oj}$ is the stock of migrants migrating from district $o$ to district $j$ for education (column 1 in Table I), for work (column 2), or for other reasons (column 3). $Dist_{oj}$ is a measure of geographic distance between two districts. For bilateral distance between any two districts, I use the geodesic (flight) distance between the geographic centers of districts $i$ and $j$. All the variables included in the gravity specification are obtained from the calculations by Kone et al. (2018). $lang_{oj}$ denotes the likelihood of any two individuals from districts $i$ and $j$ being able to communicate in a common language. It is constructed from $s^l_i$, the share of people from district $i$ having mother tongue $l$.
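One formalization consistent with this verbal description is sketched below. It is an assumption rather than a formula recovered from the text: it treats the two individuals as independent draws from their districts' mother-tongue distributions, so that the likelihood of sharing a language is the sum over languages of the product of the two districts' shares.

```latex
% Assumed form: probability that a random person from district i and a
% random person from district j share mother tongue l, summed over l
\[
  lang_{ij} \;=\; \sum_{l} s^{l}_{i} \, s^{l}_{j},
  \qquad \text{with } \sum_{l} s^{l}_{i} = 1 \text{ for every district } i .
\]
```

Under this form the index lies between 0 (no shared mother tongue) and 1 (both districts speak a single common language).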
There are three contiguity variables: $diff\text{-}NBR_{ij}$ is a dummy variable that takes the value 1 if districts $i$ and $j$ are in different states but are neighbors; $same\text{-}NBR_{ij}$ is a dummy variable that is equal to 1 if districts $i$ and $j$ are in the same state and are neighbors; $same\text{-}notNBR_{ij}$ is a dummy variable that is equal to 1 if districts $i$ and $j$ are in the same state but are not neighbors. The base group is 'not in the same state and not neighbors'. The difference between $\gamma_1$ and $\gamma_2$ gauges the role of the state borders. Other reasons for migration include marriage, business, and other unclassified reasons. Geodesic distance is the length of the shortest curve between two points along the surface of a mathematical model of the earth; here it is measured between the districts' geographical centers, denoted as distance centroids.
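To make the estimation idea concrete, the following is a minimal pure-Python sketch of Poisson pseudo maximum likelihood for a one-regressor gravity equation, fitted by Newton's method. It is not the paper's estimator: specification (3) also includes the language and contiguity variables and is estimated on the Census flows, whereas the function name, the single log-distance regressor, and the toy flows here are hypothetical. The point is only that zero flows stay in the sample, which a log-log regression cannot accommodate.

```python
import math

def ppml_fit(x, y, iters=50, tol=1e-10):
    """Fit E[y] = exp(a + b*x) by Poisson pseudo-ML using Newton's method."""
    n = len(y)
    a, b = math.log(sum(y) / n), 0.0  # start from the constant-mean fit
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Negative Hessian X'WX with weights W = diag(mu)
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by the explicit inverse
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical bilateral data: log distance and migrant stocks, with zeros
log_dist = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
flows = [8.0, 4.0, 3.0, 0.0, 1.0, 0.0]

intercept_hat, slope_hat = ppml_fit(log_dist, flows)
# slope_hat is the estimated elasticity of flows with respect to distance
```

In applied work one would normally rely on an established Poisson regression routine with fixed effects and robust standard errors; the hand-rolled Newton loop is used here only to keep the sketch self-contained.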
This effect is calculated by $(e^{3.577-2.422} - 1) \times 100$.
Support for the 85% reservation policy started in Maharashtra from the year 2011 with the backing
Although quotas also exist for jobs and thus create significant hurdles for moving across states, the employment quotas are more specific and less ubiquitous than the in-state education quotas.
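The conversion in the note above, from a difference in dummy coefficients to a percentage effect on flows, can be sketched as a small helper. The function name is hypothetical; the two coefficients are the ones quoted in the note, and which column of Table I they come from is not stated in this section.

```python
import math

def percent_effect(coef_a, coef_b=0.0):
    """Percentage change in expected flows implied by a coefficient gap
    in an exponential (PPML) specification: (exp(a - b) - 1) * 100."""
    return (math.exp(coef_a - coef_b) - 1.0) * 100.0

# Coefficients as quoted in the note: 3.577 and 2.422
effect = percent_effect(3.577, 2.422)
```

The exponential transformation is needed because the dummies enter the conditional mean multiplicatively, so the raw coefficient gap is not itself a percentage.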
Fact 4: Individuals migrate more for work than for education, and the distributions of flows for work and for education across districts differ accordingly. Figure V shows the histogram of migration flows by reason for migration. The x-axis plots the percentage of people who migrated for work and for education out of the total number of migrants at the destination district. The y-axis plots the number of destination districts with the corresponding percentages. As is clear from the plots of these very different and almost non-overlapping distributions, out of the total migrant population in most destination districts, a much higher percentage had migrated for work than for education.
Facts 3 and 4 are also borne out by
variety which is costly to trade across locations, as in an Armington setup. Each worker is endowed with a unit of labor, which they supply inelastically. There are $S$ sectors in the economy.

Individuals
The utility of an individual $i$ who attained college education in field $f$ in region $o_2$ and then works in sector $S$ in region $d$ depends on wages, amenities, migration costs, price indices, and idiosyncratic productivity shocks. It is given by: (suppressing the individual subscript $i$ for expositional clarity) where $w_{f,dS}$ is the wage of a worker in region $d$ with a degree in field $f$ who works in sector $S$, and $u_{f,dS}$ is the amenity of living in region $d$ for a worker with degree $f$ working in $S$, henceforth referred to as a type $(f, S)$ worker. $P_d$ is the cost of living in region $d$, which is endogenously determined as described in Section 5.
$(1 - \mu^2_{o_2 d})$ is the utility cost of migrating from $o_2$ to work in $d$.

The idiosyncratic productivity shocks for each individual
$\theta$ determines the dispersion of the Fréchet productivity shocks.

Utility cost of education
The above formulation of utility ignores the utility cost of education. To add workers' education choice, I introduce a utility cost of education. Let $a_{o_2 f}$ denote the amenity of studying $f$ in $o_2$, which includes the unobserved preferences for studying $f$ in $o_2$ and the time and money cost of education. In other words, it is the fraction of utility lost in order to study field $f$ in region $o_2$. People who choose not to go to school earn income $w_{u,dS}$, and people who go to school earn a normalized stipend of 1. Let $\zeta_{iof}$ denote the idiosyncratic amenity shock of studying $f$ in $o_2$; $\gamma$ determines the dispersion of these shocks. There is also a migration cost of moving from one's location of birth $o_1$ to one's location of study $o_2$, denoted by $(1 - \mu^1_{o_1 o_2})$. Thus the utility of an individual $i$ born in $o_1$ who chooses to study field $f$ in location $o_2$ and then decides to work in sector $S$ in region $d$ is given by: (suppressing the individual subscript $i$ for expositional clarity) where $IU$ is the weight placed on period 1 utility.[10] $w_{u,o_2}$ is the wage earned by unskilled workers in stage 1 in region $o_2$; it equals 1 if the person is not employed in stage 1. In other words, people who are not working in stage 1 earn a normalized stipend of one. The derivation of this utility is given in Online Appendix C.1.

Migration decisions for education and work
When choosing the location and field of education in stage 1, the individual takes into account their expected utility from stage 2. They do not know the exact utility in stage 2 since the idiosyncratic productivity shock is not yet observed. We thus solve the individual's problem backwards. In stage 2, given the choice of location and field of education (sector of work for an unskilled person), the individual chooses the sector of occupation ($S$) and location ($d$), given by: Given the Fréchet distribution of the idiosyncratic productivity shock, the proportion of people with a degree in $f$ from region $o_2$ who go to region $d$ to work in sector $S$ is given by: $\Phi_{o_2 f}$ is a measure of access to jobs for an individual from $o_2$ with degree $f$. It summarizes the expected value of all the job opportunities available to a person from $o_2$ with a degree in $f$, taking into account costs of migration and the distribution of job opportunities.
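To make the stage 2 choice concrete, the sketch below computes destination-sector shares and the job access term under the standard Fréchet closed form, in which each option's share is its net value raised to the power $\theta$ divided by the sum over all options. This functional form is an assumption in the spirit of the text, not a transcription of the paper's equation; the function name, option labels, and numbers are hypothetical.

```python
def work_choice_shares(wage, amenity, price, mig_cost, theta):
    """Return (shares, access) over destination-sector options.

    Assumed net value of option k: wage * amenity * (1 - mig_cost) / price.
    Assumed share of option k: value_k**theta / sum_k' value_k'**theta.
    The denominator plays the role of the job access term Phi.
    """
    values = {}
    for option in wage:
        net = wage[option] * amenity[option] * (1.0 - mig_cost[option]) / price[option]
        values[option] = net ** theta
    access = sum(values.values())
    shares = {option: v / access for option, v in values.items()}
    return shares, access


# Two hypothetical options for a graduate located in district A.
options = [("A", "IT"), ("B", "IT")]
wage = {("A", "IT"): 1.0, ("B", "IT"): 1.5}
amenity = {o: 1.0 for o in options}
price = {o: 1.0 for o in options}
low_cost = {("A", "IT"): 0.0, ("B", "IT"): 0.3}
high_cost = {("A", "IT"): 0.0, ("B", "IT"): 0.5}

shares_low, access_low = work_choice_shares(wage, amenity, price, low_cost, theta=2.61)
shares_high, access_high = work_choice_shares(wage, amenity, price, high_cost, theta=2.61)
print(shares_low[("B", "IT")], shares_high[("B", "IT")])
```

In this sketch, raising the cost of moving to district B lowers both the share choosing B and the overall access term, which is the sense in which remoteness reduces the expected value of job opportunities.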
In stage 1, the individual maximizes E(U io 1 o 2 f,dS ) by choosing (o 2 , f ).
Using propositions 1 and 2, the result of the individual's maximization problem in stage 1, described by the left-hand side, is given by: ) is the expected income prior to drawing match productivities for workers trained in field $f$ at location $o_2$.
The proportion of people living in $o_1$ who study $f$ in region $o_2$ is then given by:

Firms
There is perfect competition in the production of each variety. The representative firm of sector $S$ in location $d$ produces a variety of the sector $S$ good using both high-skilled labor $L_{hdS}$ and low-skilled labor $L_{ldS}$, combined in a nested CES constant returns to scale production function: is the effective labor supply. The Armington structure of the model delivers a cost of living index $P_d$ for each
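A nested CES technology of the kind described in this section can be sketched as follows. The inner nest combines engineers and non-engineers into high-skilled labor, and the outer nest combines high-skilled and low-skilled labor. The exact placement of the productivity terms is an assumption, the function and key names are hypothetical, and the default elasticities (2 and 1.7) are the calibrated values reported later in the paper.

```python
def ces(inputs, weights, rho):
    """CES aggregate with elasticity of substitution rho (rho != 1)."""
    power = (rho - 1.0) / rho
    total = sum(a * (x ** power) for a, x in zip(weights, inputs))
    return total ** (1.0 / power)


def sector_output(engineers, non_engineers, low_skilled, prod, rho_h=2.0, rho=1.7):
    """Nested CES output: inner nest over high-skilled types, outer nest over skill levels.

    prod holds hypothetical productivity weights for engineers ("e"),
    non-engineers ("ne"), and low-skilled workers ("l").
    """
    high_skilled = ces([engineers, non_engineers], [prod["e"], prod["ne"]], rho_h)
    return ces([high_skilled, low_skilled], [1.0, prod["l"]], rho)


prod = {"e": 1.0, "ne": 1.0, "l": 1.0}
q_base = sector_output(10.0, 20.0, 100.0, prod)
q_double = sector_output(20.0, 40.0, 200.0, prod)
print(q_double / q_base)  # constant returns to scale: doubling inputs doubles output
```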

External trade
The country exports a tradeable good to the RoW, where each region of the country produces a variety of the tradeable good, and, in turn, imports an importable good from the RoW. The country is a price-taker in the world market, so the price of the importable good is given. The income of the RoW is also exogenously given.[11] Gravity determines the level of trade between each region of the country and the RoW. People can move within the country but not outside the country. The demand for IT exports from region where $p_{d,IT}$ is the price of the IT variety from region $d$, $\tau_{d,IT}$ are the costs of exporting IT to the RoW, mostly consisting of communication and management costs, and $E_{IT}$ is the RoW's income spent on the IT sector. Using equation 10, we can solve for IT prices in each district:

Internal trade
The sectors other than IT and the importable goods sector are all internally traded.
The gravity equations determining the flows of these internally traded sectors are given by: Equation 12 states that the income of sector S in region d equals the sum of exports from sector S in region d to all other districts. Equation 13 states that the expenditure of region j on sector S good must equal the sum of imports of good S from all other regions.

Equilibrium
For each region (in our analysis, district), equilibrium in the steady state is defined as a set of sectoral employment according to field of study ($L_{f,dS}$), field-wise college enrollment ($L_{o_2 f}$), wages ($w_{f,dS}$), prices ($P_d$), and quantities ($Q_{dS}$). For each district, the equilibrium takes as given population, amenities, and bilateral migration costs of studying and working according to fields of education and sectors of employment, as well as trade costs between domestic districts and between domestic districts and the RoW. It also takes as given the parameters governing the dispersion of productivity shocks ($\theta$) and amenity shocks ($\gamma$), expenditure shares ($\alpha$), and the elasticities of substitution between high-skilled and low-skilled workers ($\rho_S$) and between different types of high-skilled workers ($\rho_{hS}$).
The steady-state equilibrium is governed by the following equations describing goods and labor market clearing:
1. Given productivities and the initial distribution of population, the quantity produced in each location is determined by the production functions.
2. Given quantities produced in each location, trade costs, and exogenously given world income spent on IT, equation 11 gives the price of the tradeable good from market clearing for the tradeable good $S$ in each region $d$:
3. Given quantities produced in each location, trade costs, and the price of the tradeable good, equations 12 and 13 give the prices of the externally non-tradeable but internally tradeable goods $S$ from market clearing of the non-tradeable goods: is the income of region $j$, and $\alpha$ is the proportion of income spent on the non-tradeable good.[12]
4. Given prices of both tradeable and non-tradeable goods, the wages of workers with field of education $f$ working in industry $S$ in region $d$ are given by: (16)
5. Given wages and prices, migration flows for education determine the population distribution of skill at each location. The proportion of people from $o$ migrating to $j$ to seek education in field $f$ is given by: where $L_{o_1}$ is the college-eligible population in $o$.
6. Given wages, prices, and the distribution of skill in each region, the distribution of people with skill $f$ working in industry $S$ in region $d$ is given by:
In the steady state, the initial distribution of population working in different industries with different skill levels is equal to the final distribution.
This completes the description of equilibrium in this model. In Appendix C.4, I show that a competitive general equilibrium exists.

Summary of the mechanics of the model
This section describes how a rise in the demand for the externally traded good, in this case IT, affects employment, education, and ultimately welfare of individuals in different regions within the country. The rise in IT export demand translates into differential changes in IT real wages across regions, depending on the region's geographic location, which determines how difficult it is to migrate there, and the region's comparative advantage in IT, as measured by the historical regional software exports and the region's linguistic distance from English. People start moving into regions where the real wages rise faster.
This is the place where the mobility costs for work matter. This part of the model is like a specific factors model in that engineers are required more in the IT sector. This is a standard spatial model with no changes in skills. But then the rise in real wages changes the incentives for higher education, especially engineering. Individuals who are closer to skilled jobs or who are closer to good education facilities are more likely to get educated. Thus, enrollment rises. This is where the mobility costs for education matter and this is the new component that I add to existing spatial models. This generates a Heckscher-Ohlin (HO) type response to changes in skilled wages.
[12] amount of income spent on IT in the foreign country also has to be equal to the amount of income spent on imports by the domestic country. (3) uses the condition that the income spent on non-IT goods by each region is a proportion $\alpha$ of its income, that is, $\alpha(p_{IT} q_{IT} + p_{NonIT} q_{NonIT})$. This implies that $(1 - \alpha)(p_{IT} q_{IT} + p_{NonIT} q_{NonIT})$ is spent on imports. Since condition 2 ensures that the sum of imports is equal to the value of sales from IT, trade balance is maintained.

Identification and estimation
In this section, I estimate the structural parameters that determine the migration and IT trade costs and measure the expenditure shares on goods using available expenditure data. I then use the estimated parameters and the measured quantities, along with the available data on employment, wages, migration and enrollment, to back out the unknown amenities and productivities consistent with the model. Adapting the model for estimation, I assume F =3, where f ∈ F . f can be college degree in engineering, college degree in any other field, henceforth referred to as non-engineering, or no college degree at all. There are two types of high-skilled workers: those who complete a college degree in engineering and those who complete a college degree but not in engineering.
There is only one type of low-skilled worker, those who do not go to college. There are 7 sectors in the economy (S=7), where these sectors are: agriculture and allied activities, manufacturing, wholesale and retail trade, low-skill services, skilled services except IT, the IT sector and an importable sector.
IT is only consumed by the RoW. There is an importable sector: goods in this sector are not produced domestically but are consumed domestically. Goods in the other sectors are all traded internally between districts.
6.1 Estimation of migration costs
6.1.1 Estimation of migration costs due to education
In this section, the migration costs of people moving to acquire education are estimated.
Taking the logarithm on both sides of equation 49, the ideal gravity equation for flows of workers from $o_1$ to $o_2$ who move to study field $f$ would be estimated by: where $f$ is engineering, non-engineering, or no college. The equation states that the proportion of people who move from $o_1$ to $o_2$ to study field $f$ depends on
ii) the bilateral migration costs of moving from $o_1$ to $o_2$, given by $\mu^1_{o_1 o_2}$
iii) the geographic advantage of the origin district, determined by its proximity to regions with good job and education opportunities (
Following the migration gravity literature, I parameterize the costs of migration in equation 18, where the migration costs depend on geographic and cultural distances: where $DistCentroid_{o_1 o_2}$ measures the distance between district centroids and $lang_{o_1 o_2}$ measures the proportion of people speaking a common language in districts $o_1$ and $o_2$.
If two districts belong to different states but share a border, diff-NBR = 1. If two districts belong to the same state and also share a border, same-NBR = 1. If two districts belong to the same state and are not neighbors, same-notNBR = 1.
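The regressors entering the migration cost parameterization can be assembled for a district pair as in the sketch below. The three contiguity dummies follow the definitions just given, with 'different state and not neighbors' as the base group; the function name, dictionary keys, and input values are hypothetical.

```python
import math


def gravity_regressors(dist_km, common_lang, same_state, neighbors):
    """Build one row of gravity regressors for a district pair.

    dist_km: centroid-to-centroid distance (hypothetical units: kilometers).
    common_lang: probability that two residents share a language.
    same_state, neighbors: booleans describing the pair.
    """
    return {
        "log_dist": math.log(dist_km),
        "lang": common_lang,
        "diff_NBR": int((not same_state) and neighbors),
        "same_NBR": int(same_state and neighbors),
        "same_notNBR": int(same_state and (not neighbors)),
    }


row = gravity_regressors(250.0, 0.4, same_state=True, neighbors=False)
print(row)
```

For any pair, at most one of the three dummies equals 1, and all three are 0 for the base group.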
The estimating equation becomes: Given bilateral migration data on the number of people moving from district $o_1$ to district $o_2$ to acquire education, and bilateral geographic and cultural distances, the composite parameter $\lambda$ is identified in the cross-section by the elasticity of migration flows to distances. The key assumption required for the identification of $\lambda$ is that the unobserved error term for the district pair $(o_1, o_2)$, which is not derived from the model and does not represent any structural object, is random measurement error and is uncorrelated with bilateral district-to-district cultural and geographic distances.
Regression 21 thus gives an estimate of The results of the estimation are given in Table III.

Joint estimation of migration due to work and amenities
To estimate the migration costs for work, the ideal regression would estimate the log of equation 6, the migration flow equation for work: This relates the flow of people who move from location $o_2$ to location $d$ for work to the average wage in location $d$ in field $f$ weighted by the option value of studying $f$ ($\Phi_{o_2 f}$). Since the option value of education varies by origin ($o_2$) and field of education ($f$), the relative attractiveness of a destination is no longer separable in just the origin and destination fixed effects, as in traditional gravity models. The problem is that the option value of education contains the unobserved migration costs $\mu^2_{o_2 d}$ and amenities ($u_{f,dS}$), and so this relative attractiveness is not known. If we treat this relative attractiveness of a destination as unknown, the existence of unobserved migration costs in the error will bias the estimate of $\theta$.
First, for a relatively remote district $o_2$, a rise in the bilateral migration cost to $d$ will reduce migration to $d$, but by less than for a district that is relatively well-connected to regions with employment opportunities, since people from the remote district have fewer options to choose from. Even this effect will differ according to an individual's skill level, depending on how valuable destination $d$ is for that skill group. Thus, the existence of this unaccounted-for and unknown remoteness measure in the error term will bias the estimate of the elasticity of migration costs downward.
On the other hand, for people in well-connected locations, if the migration cost to a particular district falls, they can more easily turn to other districts compared to their more remote counterparts, and this effect varies according to their field of training $f$. For districts in well-connected locations, the elasticity of migration to migration costs is thus over-estimated.
On the aggregate, it remains an empirical question as to which effect dominates.
The costs of migration depend on distance: Rewriting the estimating equation by inserting the migration cost in terms of distance, where $\log dist_{o_2 d}$ is the vector of distances mentioned above, $\tilde{\zeta} = -\theta\zeta$, and $\zeta = (\zeta_1, \zeta_2, \zeta_3, \zeta_4, \zeta_5)$. I use a nested nonlinear least squares approach to estimate $\zeta$. The idea is to explicitly account for the effect of the unobserved option value of education by location and degree, thereby correcting the source of the bias in traditional gravity estimation.
After accounting for the unobserved option value of education as I describe below, the moment condition that identifies $\zeta$ is: The assumption is that after accounting for the unobserved attractiveness of the destination region relative to the origin region, the remaining unobserved term is white noise, uncorrelated with migration costs. In the outer loop, I choose migration cost parameters to minimize the distance between the bilateral migration flows predicted by the model and those observed in the data in 2001, the only year for which such detailed migration data are available. Given the assumption that migration costs do not change during the period under study, I use the estimated migration costs and the distribution of employment post-2004 to recover unknown amenities.
As unknown amenities are recovered in the last step from the distribution of population in each location, I update the amenities and re-estimate equation 25 until the migration costs converge. Note that the estimation of unknown amenities requires an estimate of θ, which is described in section 6.1.3.
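The logic of the nested procedure can be illustrated with a deliberately simplified toy. This is a sketch under strong assumptions, not the paper's estimator: there is a single distance elasticity, a single destination "attractiveness" term stands in for wages and amenities, and a grid search replaces nonlinear least squares. All names and numbers are hypothetical. The inner loop rescales destination attractiveness until predicted arrivals match observed arrivals; the outer loop picks the cost parameter that best fits bilateral shares.

```python
def predicted_shares(zeta, attract, dist):
    """Share of origin o choosing destination d: A_d * dist_od**(-zeta), normalized by origin."""
    n = len(dist)
    shares = []
    for o in range(n):
        weights = [attract[d] * dist[o][d] ** (-zeta) for d in range(n)]
        total = sum(weights)
        shares.append([w / total for w in weights])
    return shares


def recover_attractiveness(zeta, dist, pop, dest_totals, iters=500):
    """Inner loop: rescale attractiveness until predicted arrivals match observed arrivals."""
    n = len(dist)
    attract = [1.0] * n
    for _ in range(iters):
        shares = predicted_shares(zeta, attract, dist)
        arrivals = [sum(pop[o] * shares[o][d] for o in range(n)) for d in range(n)]
        attract = [attract[d] * dest_totals[d] / arrivals[d] for d in range(n)]
        attract = [a / attract[0] for a in attract]  # normalize: only ratios matter
    return attract


def estimate_zeta(observed, dist, pop, grid):
    """Outer loop: choose zeta minimizing squared gaps in bilateral shares."""
    n = len(dist)
    dest_totals = [sum(pop[o] * observed[o][d] for o in range(n)) for d in range(n)]
    best_zeta, best_loss = None, None
    for zeta in grid:
        attract = recover_attractiveness(zeta, dist, pop, dest_totals)
        shares = predicted_shares(zeta, attract, dist)
        loss = sum((shares[o][d] - observed[o][d]) ** 2 for o in range(n) for d in range(n))
        if best_loss is None or loss < best_loss:
            best_zeta, best_loss = zeta, loss
    return best_zeta


# Synthetic data generated with a known elasticity of 1.5.
dist = [[1.0, 2.0, 4.0], [2.0, 1.0, 3.0], [4.0, 3.0, 1.0]]
pop = [100.0, 80.0, 60.0]
observed = predicted_shares(1.5, [1.0, 2.0, 0.5], dist)
grid = [0.5 + 0.25 * k for k in range(9)]  # 0.5, 0.75, ..., 2.5
zeta_hat = estimate_zeta(observed, dist, pop, grid)
print(zeta_hat)
```

The design point the toy tries to convey is that destination attractiveness is not treated as a free fixed effect per pair; it is solved for conditional on each candidate cost parameter, so the cost parameter is identified from the remaining bilateral variation.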
In the same way, given estimated migration costs for education and the option value of education, one can use population with and without college degrees in each location to solve for unknown quantities a o 2 f , which includes the time and money cost of education as well as unobserved preferences for education.
The results of the estimation procedure are given in Table III. Columns 1 and 2 report the estimation results of the traditional PPML gravity regression where the reasons for migration are education and work, respectively. In the third column, I report the estimates for work using the non-linear least squares method.
Reading off columns 1 and 3 of other countries if such data are available for other countries.

Estimation of elasticity of migration flows to migration costs
The elasticity of migration flows to migration costs is the dispersion parameter $\theta$ that governs the variance of the idiosyncratic component of workers' productivity draws. The higher the value of $\theta$, the lower the variance in productivity, and thus the more alike workers are. This means that workers tend to respond more similarly to changes in migration costs than when they are more heterogeneous in their productivities.
Thus, for a given rise in migration cost, the higher is θ, the larger is the fall in migration.
Following Fan (2019), I use the variance in the wage distribution of stayers, that is, the wage distribution of people who do not migrate for work, to identify $\theta$. Using the properties of the Fréchet distribution, it can be shown that the productivity distribution of stayers also follows a Fréchet distribution where the mean varies by field of education, sector of work, and location of degree. For any $(f, d, S)$ combination, the wage observed in the data is the effective wage ($\tilde{w}_{f,dS}$), where Taking logs on both sides, where $F_{f,dS}$ is a sector of job, field of education, and district fixed effect, which is a combination of the average wage per effective unit of labor and the average productivity of stayers. The variance of exponentiated residuals ($\eta_{if,dS}$) identifies $\theta$, which turns out to be 2.61. This is very similar to the estimate of Fan (2019), who used the same method to estimate these elasticities to be within the tight range of 2.50 to 2.73. The assumption is that after controlling for field of education, sector, and location of work, the remaining variation in individual wages for those who stay in the same location is due to variation in the idiosyncratic component, which can include factors such as ability, talent, and family background.
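The link between residual wage dispersion and $\theta$ can be illustrated with simulated data. The sketch below uses a different moment from the one in the text: rather than the variance of exponentiated residuals, it uses the fact that the log of a Fréchet draw with shape $\theta$ has variance $\pi^2 / (6\theta^2)$. It is therefore an illustration of the identification idea under that swapped-in moment, not the paper's procedure; all names are hypothetical.

```python
import math
import random


def draw_frechet(theta, rng):
    """Inverse-CDF draw from a Frechet distribution with shape theta and unit scale."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / theta)


def theta_from_log_dispersion(values):
    """Back out theta from Var(log x) = pi^2 / (6 * theta^2)."""
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
    return math.pi / math.sqrt(6.0 * var)


rng = random.Random(0)
simulated_residuals = [draw_frechet(2.61, rng) for _ in range(20000)]
theta_hat = theta_from_log_dispersion(simulated_residuals)
print(theta_hat)  # close to the value used to simulate the data
```

A larger $\theta$ compresses the simulated distribution, which is why lower residual wage dispersion among stayers maps into a higher estimated elasticity.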
Given the estimate of θ (elasticity of migration flows to migration costs for work) and γ (elasticity of migration flows to distance for work) from section 6.1.2, it is possible to separately identify the elasticity of migration flows to migration costs for education (ζ).
The assumption required for this identification is the following: the elasticity of migration costs to geographic distance is the same irrespective of the reason for migration, once institutional boundaries such as state borders and neighboring districts dummies have been accounted for.
Note that this assumption does not require the elasticity of migration flows to distance to be the same. In fact, these elasticities are very different, as we estimated before.
It only requires the costs of migration to respond to geographic distances in exactly the same way, once we have accounted for state-specific institutional barriers such as differential quotas for work and for education. An example of a violation of this assumption would be any factor that increases or decreases the migration costs for education relative to work over the same geographic distance. For example, one such factor would be the provision of special transportation for students.
This completes the description of my estimation strategy for migration costs.
6.2 Trade costs
6.2.1 Trade costs in the IT sector
In this model, IT is the only good traded with the RoW and it is not consumed domestically. Taking the logarithm on both sides of equation 11, the gravity equation expressing IT trade flows as a function of IT prices and comparative advantage, and dropping the IT subscript from the notation, I get the following estimating equation: where is a quantity that is constant across districts. Following Shastry (2012) and Banerjee and Duflo (2000), I parameterize the costs of exporting IT as a function of the linguistic distance of $d$ from English and the prior software exports in 1995. The historical comparative advantage of a district in this sector depends on the prior links of a district to the RoW, measured by the proportion of software exports historically exported from that district. Prior connections, through building reputation, play an important role in determining the volume of transactions in this sector (Banerjee and Duflo (2000)). Shastry (2012) showed that the linguistic distance from English of each regional language spoken in a district determines the cost of learning English for individuals in that district. Since English proficiency is a necessary skill in this industry, the comparative advantage of a district also depends on the linguistic distance of the district from English. Let $dist_{d,IT}$ be the vector denoting the linguistic distance of each district from English and the proportion of historical software exports from $d$, measured using 1995 export data.
Note that price is unobserved since it includes the unobserved productivities. Using the structure of the production function, price can be log-linearly decomposed into its known and unknown components. Using marginal cost pricing, Substituting this, the estimating equation becomes: Taking first differences, where the logged term is the observable part of MC, referred to as $OC_{dt}$. Intuitively, in equilibrium, how responsive IT exports are to changes in price depends on the elasticity of substitution between different varieties of IT products ($\sigma_{IT}$), where each variety corresponds to a region. The lower the elasticity of substitution, the more difficult it is to switch to a different variety as the price of a particular variety rises, and the less responsive IT demand is to IT prices.
Since in general equilibrium the unobserved district-specific productivities determine the marginal cost of production, $\sigma_{IT}$ cannot be recovered through a linear regression of IT exports on the observed part of marginal cost. I construct an instrument by leveraging the IT boom of the late 1990s and early 2000s. As demand for IT increased, the prices of IT increased in all regions that produce IT. However, the capacities of IT production differ across regions. In particular, regions that are better connected geographically to other populous regions could expand supply more because people can migrate more easily into these regions and thus the supply of labor in these regions is more elastic. Also, regions with a historical comparative advantage in IT production attracted more IT demand from abroad. To formalize this intuition, I develop an instrument by interacting a measure of labor supply for each region (defined below) with the historical software exports of a region. A measure of labor supply access for each region is summarized by The instrument, referred to as $I_d$, is formally defined below: Regions that are better connected historically and where the potential labor supply is high will see lower increases in marginal costs and hence lower increases in prices. On the other hand, regions that have potentially high labor supply but are not historically connected will have relatively larger increases in MC.
This estimation requires the assumption that changes in the productivity of non-engineers in the IT sector between the pre- and post-2000 boom periods are uncorrelated with the pre-period exports and the remoteness of a region during the period of the IT boom.
The unobserved productivities, by model construction, do not depend on historical software exports, and the historical distribution of college educated workers. These productivities are the residual quantities that explain the deviation of predicted output from actual output, after these known quantities are taken into account. These historical factors, in turn, are not affected by future changes in productivities. However, to account for the fact that in reality district level productivities in the IT sector can be affected by these factors, I additionally run the following specifications: First, I include controls for the geographic remoteness of a district, as measured by the average log distance of a district to other districts. Second, I include state fixed effects to account for any differential growth in productivities across states.
In Table IV, column 1, I report the results from the OLS estimation. Columns 2, 3, and 4 report the results from the IV estimation. The regression in column 3 controls for the remoteness of districts, and the regression in column 4 additionally controls for state fixed effects, which, in a first-difference equation, implies controlling for a linear state-level time trend. Export demand responds negatively to changes in observable prices/MCs. In the most demanding specification, reported in column 4, I find that a 1% increase in prices leads to a 0.45% fall in demand, which translates into an elasticity of substitution of 1.45.
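The step from the estimated price response to the elasticity of substitution can be spelled out in a short sketch. It assumes that the coefficient on log price in the estimating equation equals $1 - \sigma_{IT}$, which is the mapping implied by the two numbers reported above; the function names are hypothetical.

```python
def elasticity_of_substitution(price_coefficient):
    """Back out sigma when the coefficient on log price equals 1 - sigma (assumed mapping)."""
    return 1.0 - price_coefficient


def demand_change_percent(price_increase_percent, sigma):
    """Percent change in export demand when demand is proportional to price**(1 - sigma)."""
    ratio = (1.0 + price_increase_percent / 100.0) ** (1.0 - sigma)
    return (ratio - 1.0) * 100.0


sigma_it = elasticity_of_substitution(-0.45)
change = demand_change_percent(1.0, sigma_it)
print(sigma_it, change)  # sigma of 1.45; demand falls by a little under 0.45 percent
```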
The first stage is reported in Table ( be justified on the ground that these regions specialize in very different types of tasks, such as, data processing, software development, multimedia graphics, as is reported in the NASSCOM software data.

Trade costs in the non-IT sector
The iceberg transport cost is taken to be $\tau_{od} = distance_{od}^{1}$, calibrating the distance elasticity to the canonical value of -1 (Head and Mayer (2014)).[15]

Quantifying sector-specific productivities
We use the equality of marginal costs and prices to back out the unobserved productivities, after calculating the observable part of MC that depends on known wages and employment, following the long tradition in urban economics (see, for example, Allen and Arkolakis (2014), Allen et al. (2018)). Equation (31) determines prices: Since the observable terms (those indexed by $(l,d,IT)$, such as $(w_{l,d,IT})^{1-\rho_{IT}}$, and $\rho_{h,IT}$) are all known, we can recover $A_{ne,d,IT}$ using Intuitively, the gap between the magnitude of estimated prices and that of the observed components of marginal cost, which consist of the information on wages and employment, helps determine productivities. Note that $p_{d,IT}$ is known by recovering it from equation 11, given estimated trade costs, $\sigma_{IT}$, and exports. $p^h_{d,IT}$ and the terms indexed by $(l,d,IT)$ are known as they are functions of observables. Finally, we recover the productivity of low-skilled workers in the IT sector $A_{l,d,IT}$ in all locations by using the firm's first order condition below: and then $A_{e,d,IT}$ by using the firm's first order conditions.
To recover prices in the internally traded sectors, use equations 12 and 13, the identities that state that the income of sector S in district d equals the sum of exports from sector S in district d to all other districts, and the expenditure of district d on sector S good must equal the sum of imports of good S from all other districts, respectively.
Combining these two equations, prices can be expressed as: where S is any sector other than IT and the importable goods sector.
Income of each region Y dS is obtained by summing wage bill and employment. Expenditure of each region on sector S goods E dS is calculated given share of GDP spent on sector S good. Internal trade costs τ jdS are calculated given distances between districts and σ S from the literature. Productivities in the internally traded sector can be recovered in exactly the same way as in the IT sector described above.

Calibration from the literature
The elasticity of substitution between engineers and non-engineers is calibrated to 2 across all sectors (Ryoo and Rosen (2004)). The elasticity of substitution between high- and low-skilled labor (college and non-college graduates) is taken to be 1.7 from Khanna and Morales (2017), who apply the Card and Lemieux (2001) methodology to Indian data and find the estimate to be consistent with the literature (such as Katz and Murphy (1992), Card and Lemieux (2001), and Goldin and Katz (2007)). The elasticity of substitution $\sigma$ between different types of goods traded internally within India is taken to be 5 following Simonovska and Waugh (2014). Several other papers estimate elasticities of substitution that are close. For example, Van Leemput (2016)

Model validity
Given model parameters and the exogenous amenities and productivities, the model makes predictions about the equilibrium wages, employment, and enrollment across districts, all of which are observable quantities in the data. In this section, I validate the model by first showing that the model generated data can replicate the reduced form facts established in section 5.

Replication of reduced form facts
Reduced form fact 1: IT employment and engineering enrollment positively respond to software exports
Using the model-generated data, I repeat the reduced form regression and plot the coefficients $\beta_t$ in figure VII where $Y_{dt}$ is IT employment or engineering enrollment in district $d$ at time $t$. $Exports_{d,1995}$ is the proportion of software exports from district $d$ in the year 1995 out of total Indian IT exports in 1995. $\alpha_t$ are time fixed effects that capture any factors that are common to all districts at time $t$. $\gamma_d$ are district fixed effects that capture any factors that are fixed over time in district $d$. $\chi_d * t$ is a district-level time trend capturing any linear trend in the outcome variable at the district level.
From this figure, just like in the data, we can see that post-1998, IT employment increased more in districts that had a higher level of software exports in 1995.
In figure X, I plot the response of engineering enrollment and here also the reduced form results are replicated: post 2000, engineering enrollment increased in districts with higher level of software exports.
Fact 2: The effects are heterogeneous. Employment responds more when nearby regions have higher engineering enrollment and higher IT exports.
The heterogeneous effects are stronger in the long run.
In equation 2 below, I add an interaction term between the number of students enrolled in engineering in 1991 and the proportion of software exports from district d in 1995. Estimated coefficient δ t is plotted in figure VI.
As in the reduced form counterpart, figure 13 shows that, conditional on the level of software exports, post 1998, IT employment responds more in districts that, in 1991, had more enrolled engineering students in same-state, nearby districts. The intuition, formalized in the model, is that in these districts, it is easier to expand future IT production due to having more college-educated, engineering program graduates in close proximity. The short-run is defined as the period during which people cannot change their skills.

Non-targeted moments
Since it takes at least four years of college and two years of pre-college to complete an engineering degree, the long-run changes in skill composition will not be visible in the labor market until 2001. The long run is defined as the period from 2001 to 2007.[16] In the long run, welfare is defined as: where $\Phi_{o_1}$ measures the access to higher education for college-eligible individuals from $o_1$, where access to education, in turn, depends on education amenities, connectivity of the region, and the job opportunities available from that region.
When skill levels are fixed, regional welfare depends on the access to jobs for each skill-group, weighted by the distribution of skills. In this case, welfare is defined as: where Φ s measures access to jobs for skill-group s, s = engineers, non-engineers, and unskilled.
With fixed skill-level, welfare increases on average by 0.61%, with the regional gains ranging from 0.17% to 2%.
Using the full general equilibrium model with endogenous skill acquisition, costs of human mobility for both education and work, and costs of moving goods internally, I find that the IT boom increased the average welfare of an individual by 1.12%. The average masks substantial variation across districts, with individuals born in districts with good access to jobs and education gaining as much as 2.63% while their counterparts in remote districts experienced gains as low as 0.67%. To quantify the importance of mobility for education, I run a counterfactual in which the option to move for education is shut off. I find that regional inequality increases by 15% and the gap between the welfare gains of the worst-off and best-off districts increases by 63%. The gains in aggregate welfare would have been 1.79% lower if there were no education mobility. These numbers, especially the changes in regional welfare, are large despite the fact that the estimated migration costs for education are quite high.
A high migration cost for education makes it more difficult to migrate across districts for education, undermining the importance of the endogenous education channel compared to a situation with no mobility frictions for education. However, since zero mobility frictions are not possible in reality, I conduct a counterfactual experiment where I reduce the costs of migrating for education across states.
Counterfactual policy: Reducing state quotas for education
In the particular case of India, the widespread prevalence of in-state student quotas at higher education institutions, reflected in the significantly higher cost of crossing state borders for education than for work, increases the potential for districts in larger states with good educational facilities to gain more from the IT boom. Because migration costs in India are among the highest of the available estimates for any country, the geographical connectivity of a district also plays an important role in determining its welfare gains. 17 There seems to be an obvious policy intervention in the education market, the reduction of state quotas for education, that is easier to implement than labor market policies that aim to move jobs. In the counterfactual, this is achieved by reducing the effect of state borders on migration costs for education to the same level as that for work.
The existing magnitude of quotas in higher education institutes in India is huge: most state colleges have home-state quotas of 50%, with some as high as 85%. 18 The size of the state quota varies by state and by whether the university in question is public or private, but in general, it is a substantial proportion of the total class size (Kone et al., 2018). 19 These domicile quotas, legally defined as quotas pertaining to the "place of living" or permanent residence, create huge costs of migrating to a different state. While such quotas also exist for jobs and thus create significant hurdles for moving across states, the employment quotas are more specific and less ubiquitous than the in-state education quotas. Such quotas are in no way unique to India and exist in many other countries, including the US and China.
One way of looking at the effect of a reduction in state quotas for education on the aggregate and distributional consequences of the IT boom is to reduce the effect of state borders on migration for education to be the same as that for work. The reduction of migration costs has the effect of increasing aggregate welfare due to the IT boom by 1% and reducing regional inequality by 27%, compared to the equilibrium change with full migration costs. An interesting point to note is that the reduction in state quotas increases aggregate welfare by very little, since not all districts gain from such a measure.
A little less than a third of districts actually gain less from the IT boom in this case compared to the case with the current levels of higher education quotas. In fact, the gains are negatively correlated with the initial education amenities in a district, implying that districts that gained the most from this policy are those that did not initially have good education facilities. Figure XIII shows that the histogram of welfare gains with reduced education quotas has a lower spread than that with education quotas.
It is also clear from the histograms that reducing in-state quotas is not a Pareto-improving measure and is likely to meet political resistance from districts that benefit from the quota policy. In 2016, when the Chinese government announced a policy of reducing provincial quotas to increase opportunities for students from poorer provinces to study in elite colleges, mostly located in the more prosperous provinces, there were widespread protests in Beijing and Shanghai, fueled by the fear that this would hurt local students. 20 In the mid-2000s, the Haryana state government invested land and money to build a hub for higher learning and a center for research, while at the same time implementing a policy that reserves 25% of seats for in-state students in all colleges across the state. 21
17 For example, many districts in Uttar Pradesh, the largest state of India in terms of land area and also the number of colleges, gained more than the average district.
18 Support for the 85% reservation policy started in Maharashtra in 2011 with the backing of nationalist state parties.
19 Reservation policy in India is a contentious issue. The magnitude of reservation at private institutions varies hugely from state to state and is still a matter of legal debate. Some private universities reserve seats following the state laws under which they were established. For example, in Haryana, private universities also have to reserve 25% of their seats for students domiciled in Haryana.

Conclusion
This paper assesses the aggregate and distributional consequences of human capital response to trade for the spatial distribution of welfare. In answering this question, the paper makes three contributions: first, it introduces human capital acquisition decisions in a general equilibrium model with multiple locations. It shows that studying the effects of trade on the labor market without taking into account endogenous skill acquisition can underestimate the aggregate welfare gains from trade. Second, a key innovation of this paper compared to the existing literature on migration is that people can move either for work or for education. Using confidential and unique district-to-district Indian migration data disaggregated by reasons for migration, this paper provides the first separate estimates of mobility costs by reasons for migration. I show that quantifying both of these costs separately is important as these costs can significantly alter the welfare gains from trade depending on their relative magnitudes. Third, as a result of studying the interaction of education and labor market choices in the presence of changes in export-driven employment opportunities, this paper is able to suggest new forms of policy intervention to reduce inequality in regional welfare gains from trade.
Despite considerable interest in the IT boom and its effect on geographic inequality in India, the lack of disaggregated data has made it challenging to quantify its effect on overall economic growth. This paper takes a first step by collecting district-level data and building a general equilibrium model to quantify the effect of the IT boom on skill acquisition and the regional distribution of welfare gains in India. Using the model, it finds that between 1995 and 2005, the IT boom in India increased average individual welfare by 1.116%, with individuals born in districts with good access to jobs and education gaining as much as 2.36% while those in remote districts experienced gains as low as 0.67%. These gains are attenuated by high costs of mobility for education and for work across Indian districts, leaving scope for policy interventions in both the education and labor markets that have the potential to reduce regional inequality as well as increase aggregate welfare. There is scope for future work to extend the research agenda presented in this paper by studying the regional welfare implications of endogenous education choice with trade in a dynamic framework, which can trace how welfare changes during the transition from the short run to the long run. The challenge will be to devise a way to handle the large state space that arises as people migrate across regions and over time for work and for education.

[Figure, Panel B: horizontal axis is year, 1993 to 2002]

Tables

Table I: PPML gravity estimation on district-to-district migration, by reason for migration

                                         Education    Work         Other reasons
log distance between district centers    -0.585***    -0.567***    -0.752***

language depends on the relative costs of learning that language, which in turn depends on her mother tongue.
Someone whose mother tongue is similar (not similar) to Hindi will find Hindi easier (more difficult) to learn relative to English. To quantify the relative cost of learning Hindi or English, Shastry constructs three measures of the linguistic distance of each native language from Hindi. The first measure classifies languages into five "degrees" of linguistic distance from Hindi based on cognates, grammar, and syntax (see Table 2). The second measure is the percent of words from a core list that are cognates of Hindi words. The third measure is based on language family trees from the Ethnologue database. These measures are highly correlated: 0.935 between degrees and percent cognates and 0.903 between degrees and nodes. From the 1991 census of India, Shastry calculates a district's linguistic distance from Hindi in two ways: (1) the population-weighted average distance of all native languages from Hindi and (2) the population share of languages at least 3 degrees away from Hindi.
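The two district-level measures can be computed as follows. This is a minimal sketch: the language labels, degree assignments, and speaker counts are hypothetical placeholders and are not taken from Shastry's classification or from the census.

```python
def district_distance_measures(pop_by_language, degrees_from_hindi, threshold=3):
    """Return the two district-level measures of distance from Hindi.

    Measure (1): population-weighted average degree of distance.
    Measure (2): population share of native languages that are at least
                 `threshold` degrees away from Hindi.
    """
    total = sum(pop_by_language.values())
    weighted_avg = sum(
        pop * degrees_from_hindi[lang] for lang, pop in pop_by_language.items()
    ) / total
    far_share = sum(
        pop for lang, pop in pop_by_language.items()
        if degrees_from_hindi[lang] >= threshold
    ) / total
    return weighted_avg, far_share

# Hypothetical district: speaker counts by native language.
population = {"hindi": 600, "lang_a": 200, "lang_b": 150, "lang_c": 50}
# Hypothetical degrees of distance (0 = Hindi itself, 5 = most distant).
degrees = {"hindi": 0, "lang_a": 1, "lang_b": 3, "lang_c": 5}

measure_1, measure_2 = district_distance_measures(population, degrees)
print(measure_1, measure_2)
```

In this made-up district, measure (1) is (200 + 450 + 250) / 1000 = 0.9 degrees and measure (2) is (150 + 50) / 1000 = 0.2.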
All of the analysis that follows uses measure (2), but the results are robust to using measure (1) instead. Shastry proxies English-learning costs with linguistic distance from Hindi. One may think the natural proxy is linguistic distance from English, but it is the relative cost of learning Hindi and English that should determine which language one learns. A native Hindi speaker can choose to learn English as a second language at a much lower cost than a non-native speaker whose language is close to Hindi. So the relationship is non-monotonic: native Hindi speakers are more likely to learn English, but speakers of languages close to Hindi learn Hindi rather than English.
Then, as distance from Hindi rises, the probability of learning English as a second language rises, except at distance 0. Shastry (2012) shows that such a relation holds. From now on, I use "linguistically distant from Hindi" and "linguistically closer to English" interchangeably.

A.3 Missing value imputation
The NSS is a sample, as opposed to the Census, which is a complete enumeration. In
This yields:
Consumption of variety k of good S for an individual who got his degree in o 2 and moved to d to work in occupation S is given by:
Assuming iceberg transportation costs, consumption of variety k of good S for an individual who got his degree in o 2 and moved to d is given by:
Using the above quantities, worker indirect utility in stage 2 is derived as:
We can derive the indirect utility for stage 1 very similarly, and this gives a combined stage 1 and stage 2 utility of the following form:
where P dS = d τ kdS p kS

C.2 Firm's problem
The firm's profit maximization condition for sector S is given by:
Differentiating with respect to $L_{s,S,k}$, where $s = e$ or $ne$:
In the empirical model, we use engineers and non-engineers as two types of skilled labor. Denoting $s = e$ and $s = ne$ for engineers and non-engineers respectively, one can derive the following first-order condition:
Under the assumption that all productivities are drawn from the same Frechet distribution and that firms do not know worker productivities, the first-order condition does not contain effective labor, only labor. We thus get the following estimating equation:
From the firm's first-order condition for high-skilled labor, we can rewrite it as:
Thus we get the following equation for high-skilled labor:
I now solve the first-order condition for low-skilled workers. For low-skilled workers, taking the first-order condition, we get:
Combining the two, we get the following equation:
Note, however, that these are not observable quantities due to the presence of unobserved productivity. For ease of notation, I now use $s = e$ and $s = ne$ for engineers and non-engineers respectively.
Note that all the quantities in this equation are observable. If we plug in first-order condition 38, this is a regression of known quantities with the unobserved productivities as residuals.
Recover IT prices then non-IT Use the following equation for S=IT (p S,k ) 1−ρ S = and p S,k = A u,l w l,S,k Thus we can write price as: The term in the bracket is a function of known quantities. How?
From 40, we get: Or, substituting prices and quantities in terms of their observable components,

C.3 Unknown amenities
Given the distribution of population in each region, the estimated migration costs, and real wages, I back out the unknown region-, field-of-education-, and sector-specific amenities.
The equilibrium population in location $d$ of workers with a degree in $s$ working in sector $S$ is given by:
And
Equations 44 and 45 have $D \times s \times S + D \times s$ unknowns and $D \times s \times S + D \times s$ equations.
We can solve these uniquely for the unknowns $\Phi_{sj}$ and $\left(\frac{W_{sdS}\, u_{sdS}}{P_d}\right)^{\theta_s}$. Using the obtained values, I run the following regression:
Since wages are given by the data and the regional prices $P_d$ have already been estimated, one can run this regression to recover $\theta_s$. However, local amenities are correlated with real wages in general equilibrium. I can again use the same instrument here, by using a long-difference equation with model-predicted wages, holding amenities constant at their old values, as the instrument.
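The role of the instrument can be illustrated with synthetic data. The sketch below assumes the regression takes the form "log of the recovered composite on log real wage", with log amenities in the residual; that form, the cross-sectional (rather than long-difference) setup, and all variable names are assumptions made for illustration. Log amenities are constructed to be exactly orthogonal to the instrument, so the exclusion restriction holds by construction.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

rng = np.random.default_rng(1)
n, theta = 500, 2.0                      # theta: elasticity to recover

z = rng.normal(size=n)                   # instrument: model-predicted log real wage
a_raw = rng.normal(size=n)
Z = np.column_stack([np.ones(n), z])
# Log amenities, residualized on [1, z] so they are orthogonal to the instrument.
log_amenity = a_raw - Z @ np.linalg.lstsq(Z, a_raw, rcond=None)[0]

# Observed log real wage is correlated with amenities (the endogeneity problem).
log_real_wage = z + log_amenity
# Log of the recovered composite (W u / P)^theta.
log_composite = theta * (log_real_wage + log_amenity)

theta_ols = ols_slope(log_real_wage, log_composite)
theta_iv = iv_slope(z, log_real_wage, log_composite)
print(theta_ols, theta_iv)
```

In this construction OLS overstates the elasticity because wages and amenities move together, while the IV estimate recovers the simulated value.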
To estimate the elasticity of movement for education, I use the population of people with degrees in field $s$ in each location:
In the same way, I can solve for the unknown quantities $a_{js}$ and $\Phi_o$. The utility cost of education has two components:

C.4 Existence of equilibrium proof
To show the existence of equilibrium, I use the following theorem, proved in Allen et al. (2019).
Theorem 1: Consider any $N \times K$ system of equations $F: \mathbb{R}^{N \times K}_{++} \to \mathbb{R}^{N \times K}_{++}$:
where $Q_m(\cdot)$ are nested CES aggregating functions:
where $\delta_{m,l} > 0$ and $\beta_m > 0$ for all $m$ and $l$; $K_{ijk}$, $U_l$, and $T_{j,n}$ are all strictly positive parameter values; $S_m$ and $T_{l,m}$ are (weak) subsets of $\{1, \ldots, K\}$; and $\{\alpha_{k,l}, \lambda_{k,l}, \gamma_{k,m}, \kappa_{k,p}\}$ are all real-valued.
The equilibrium conditions that govern enrollment are:
Let the following hold for some value of $\kappa$:
Thus, we get:
The two equations then reduce to one. This allows us to consider a single non-linear equation:
The equilibrium condition in the internally traded sector is given by:
We can rewrite the internal gravity equation 12 as:
Multiplying both sides by $Q_{dS}^{1-\sigma}$, we get:
Simplifying the above:
We can rewrite equation 13 as:
Suppose that the following relationship holds for some scalar $\kappa$:
In that case, as I show below, I can express equations 12 and 13 as a single equation.
Equation 12 is given below:
in the above we get back equation 13
This allows us to consider a single non-linear equation:
Now substitute the price index:
Simplifying:
Finally, substitute the expression for wages:
We are thus able to express the equilibrium conditions in the form required for theorem 1. An equilibrium thus exists by the contraction mapping theorem.
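The contraction argument can be illustrated on a toy system. The sketch below is not the paper's equilibrium system: it assumes a single equation type, $x_i = \sum_j K_{ij} x_j^{\alpha}$, with strictly positive $K_{ij}$ and $|\alpha| < 1$, a case in which the map is a contraction in the sup norm on logs, so iterating from any positive starting point converges to the unique positive solution. The matrix, exponent, and function name are hypothetical.

```python
import numpy as np

def solve_fixed_point(K, alpha, tol=1e-12, max_iter=10_000):
    """Iterate x <- K @ x**alpha until the log change is below `tol`.

    Returns the approximate fixed point and the number of iterations.
    """
    x = np.ones(K.shape[0])
    for it in range(max_iter):
        x_new = K @ x**alpha
        if np.max(np.abs(np.log(x_new) - np.log(x))) < tol:
            return x_new, it + 1
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical strictly positive kernel and exponent with |alpha| < 1.
K = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
alpha = 0.5

x_star, n_iter = solve_fixed_point(K, alpha)
residual = np.max(np.abs(x_star - K @ x_star**alpha))
print(x_star, n_iter, residual)
```

Each iteration shrinks the log distance to the solution by at least the factor $|\alpha|$, which is why a tight tolerance is reached in a few dozen steps here.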

D Identification and estimation
D.1 Migration cost estimation
In column 1, I report the results from the PPML estimation, the same one reported in the main text of the paper. In column 2, I report the results from regressing the log flows of people migrating for education on the relevant distance measures, where zeros are replaced by the minimum across all migration flows. In column 3, I repeat the same estimation as in column 2, but with zeros excluded. In column 4, I follow a more traditional estimation in which the combinations of same-state and neighbor dummies are replaced with just a same-state dummy. Across all specifications, the effect of state borders is huge: 269.5%, 481%, 293.5%, and 1039% in specifications 1, 2, 3, and 4 respectively. Specification 1 is the preferred specification. In specification 2, pairs of districts that do not record any migration flows receive a small value in order to avoid being thrown out of the sample, which induces downward bias in the estimated cost. In specification 3, the pairs of districts with zero flows are dropped, which introduces a selection problem. In specification 4, the estimated state border effects are huge because districts in different states share a border less frequently than districts in the same state; this was previously accounted for by the neighbor dummy and now loads onto the state dummy. In table 7 above, the same specifications are repeated for people migrating for work.
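For readers who want to move between gravity coefficients and percentage effects, the sketch below shows the usual conversions. It assumes that a dummy coefficient $\beta$ maps to a percentage effect of $(e^{\beta} - 1) \times 100$ and that the log-distance coefficient is an elasticity; the text does not state how the reported percentages were computed, so the implied coefficient below is illustrative only. The function names are hypothetical.

```python
import math

def pct_effect_of_dummy(beta):
    """Percentage change in flows when a dummy switches from 0 to 1."""
    return (math.exp(beta) - 1.0) * 100.0

def pct_change_from_distance(elasticity, distance_ratio):
    """Percentage change in flows when distance is scaled by `distance_ratio`."""
    return (distance_ratio ** elasticity - 1.0) * 100.0

# Coefficient that would imply the 269.5% border effect of specification 1,
# under the (e^beta - 1) x 100 assumption.
beta_implied = math.log(1.0 + 269.5 / 100.0)

# Effect of doubling distance, using the -0.585 log-distance coefficient
# reported for education migration in Table I.
doubling_effect = pct_change_from_distance(-0.585, 2.0)

print(round(beta_implied, 3), round(doubling_effect, 1))
```

Under these assumptions, doubling the distance between district centers lowers education migration flows by roughly a third.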
1. Quality differences in education: Let the idiosyncratic preference shock be drawn from a Frechet distribution with parameter $T_{o_2}$ governing its average level, where $T_{o_2}$ depends on the regional average quality of education.
$G(\zeta_{i o_2 f}) = \exp\left(-T_{o_2}\, \zeta_{i o_2 f}^{-\gamma}\right)$
Given this, the proportion of people migrating for education is given by:
The new $\Phi_{o_1}$ is scaled by $T_{o_2}$. The destination fixed effects capture the average quality of education in that region. Quality therefore behaves in exactly the same manner as amenities for education. Note that this does not change the migration equation for work.
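The claim that quality behaves like an education amenity can be checked numerically. The sketch below assumes the standard Frechet choice-share formula, $\pi_{o_1 o_2} = T_{o_2} v_{o_1 o_2}^{\gamma} / \sum_k T_k v_{o_1 k}^{\gamma}$, where $v_{o_1 o_2}$ stands for the deterministic part of utility (amenities net of migration costs). The symbol $v$, the numbers, and the function name are hypothetical; the paper's exact migration equation is not reproduced in this section.

```python
import numpy as np

def education_migration_shares(T, v, gamma):
    """Share of students from each origin (rows) choosing each destination (columns).

    T[o2]     : Frechet level parameter (average education quality) of o2
    v[o1, o2] : deterministic utility of studying in o2 for someone from o1
    gamma     : Frechet shape parameter
    """
    weights = T[None, :] * v ** gamma
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical three-district example.
T = np.array([1.0, 2.0, 0.5])
v = np.array([[1.0, 0.6, 0.3],
              [0.5, 1.0, 0.4],
              [0.2, 0.7, 1.0]])
gamma = 3.0

base = education_migration_shares(T, v, gamma)

# Doubling quality in district 1 ...
T_hi = T.copy()
T_hi[1] *= 2.0
by_quality = education_migration_shares(T_hi, v, gamma)

# ... is equivalent to scaling its deterministic utility by 2**(1/gamma).
v_hi = v.copy()
v_hi[:, 1] *= 2.0 ** (1.0 / gamma)
by_amenity = education_migration_shares(T, v_hi, gamma)

print(np.round(by_quality, 3))
```

The two experiments produce identical migration shares, which is the sense in which a destination's quality is absorbed by the destination fixed effect.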
2. International migration: International migration is introduced in the model by adding one region to which people can migrate but from which no one can migrate out. This region is closer to some regions of India and farther from others.
Education facilities and job opportunities are both better in this region than in any region of India. Introducing this region increases both aggregate welfare and regional inequality. This is because skilled and unskilled workers are complements in the production function: as skilled workers migrate out of districts that did not see much of the IT boom, the marginal productivity of the unskilled workers who remain falls.
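The complementarity mechanism can be illustrated with a two-input CES production function. This is a sketch under assumed functional form and parameters (the share, the substitution parameter, and the labor quantities are hypothetical and are not the paper's calibration): with $Y = (a H^{\rho} + (1-a) L^{\rho})^{1/\rho}$ and $\rho < 1$, the marginal product of unskilled labor $L$ rises with skilled labor $H$, so an outflow of skilled workers lowers it.

```python
def ces_output(H, L, share=0.4, rho=0.3):
    """CES output from skilled labor H and unskilled labor L."""
    return (share * H**rho + (1 - share) * L**rho) ** (1 / rho)

def marginal_product_unskilled(H, L, share=0.4, rho=0.3):
    """Analytical derivative of CES output with respect to L."""
    Y = ces_output(H, L, share, rho)
    return (1 - share) * L ** (rho - 1) * Y ** (1 - rho)

# Hypothetical district before and after skilled workers emigrate.
mp_before = marginal_product_unskilled(H=100.0, L=500.0)
mp_after = marginal_product_unskilled(H=60.0, L=500.0)

print(mp_before, mp_after)
```

Holding the number of unskilled workers fixed, the marginal product of unskilled labor is lower after the skilled outflow, which is the channel through which regional inequality rises in this extension.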