What Does 'Entrepreneurship' Data Really Show? A Comparison of the Global Entrepreneurship Monitor and World Bank Group Datasets

This paper compares two datasets designed to measure entrepreneurship. The Global Entrepreneurship Monitor dataset captures early-stage entrepreneurial activity; the World Bank Group Entrepreneurship Survey dataset captures formal business registration. There are a number of important differences when the data are compared. First, GEM data tend to report significantly greater levels of early-stage entrepreneurship in developing economies than do the World Bank data. The World Bank data tend to be greater than GEM data for developed countries. Second, the magnitude of the difference between the datasets across countries is related to the local institutional and environmental conditions for entrepreneurs, after controlling for levels of economic development. A possible explanation for this is that the World Bank data measure rates of entry in the formal economy, whereas GEM data are reflective of entrepreneurial intent and capture informal entrepreneurship. This is particularly true for developing countries. Therefore, this discrepancy can be interpreted as the spread between individuals who could potentially operate businesses in the formal sector and those who actually do so. In other words, GEM data may represent the potential supply of entrepreneurs, whereas the World Bank data may represent the actual rate of entrepreneurship. The findings suggest that entrepreneurs in developed countries have greater ease and incentives to incorporate, both for the benefits of greater access to formal financing and labor contracts, and for tax and other purposes not directly related to business activities.


Policy Research Working Paper 4667
This paper, a product of the Finance and Private Sector Team, Development Research Group, is part of a larger effort in the department to study entrepreneurship. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be contacted at lklapper@worldbank.org.

Introduction
Since the mid-1980s, entrepreneurship has increasingly been considered an important tool for economic growth and innovation across economies, regardless of their stage of economic development. Entrepreneurship is now at the center of many policy questions related to science and technology, sustainability, poverty, human capital, endogenous resources, employment, and regional and comparative advantages. The surge of policy interest in entrepreneurship has, not surprisingly, been accompanied by growing academic research into its dynamics and processes. With respect to policy, research priorities have focused first on understanding (measuring) entrepreneurship and second on creating environments supportive of it (Acs and Szerb 2007). One particularly important public policy issue for international development is the role played by institutional features of the investment climate, for instance, the indicators of the business environment measured in the World Bank Doing Business reports (World Bank 2007).
These indicators include, for example, measures of the regulatory burden of starting, operating, and closing a business, such as the cost, number of days, and number of procedures required to start a business.
In recent years, different sources of data on "entrepreneurship" have led to contradictory or inconclusive empirical findings about its dynamics.[1] For example, it is still unclear whether, and in what direction, causation exists between entrepreneurship and unemployment, poverty, taxation, regulatory burden, and so on. Country-specific differences may account for some of these contradictory findings, as may the variety of data used as broad measures of "entrepreneurship." This has contributed to a great deal of confusion in entrepreneurship research. For this reason, it is critically important to understand what the data indicate, and exactly which element of entrepreneurial dynamics is being measured. The World Bank Group Entrepreneurship Survey (WBGES) data, for example, measure the registration of limited liability companies (LLCs), which is only one kind of legal arrangement for a new firm. We discuss the implications of the various definitions of start-ups further in the comparative analysis section of this paper.
Separate studies using Global Entrepreneurship Monitor (GEM) and WBGES data have found contradictory results: no relationship is found between GEM data and administrative barriers to starting a business, whereas a significantly negative effect is found with WBGES data (van Stel et al. 2007 and Klapper et al. 2007, respectively).[2] This and similar contradictory results in the empirical entrepreneurship research may be attributable, to some degree, to differences in what the data capture.
For this reason, in this paper we compare two popular datasets designed to capture entrepreneurial dynamics: the GEM dataset for early-stage entrepreneurial activity and the WBGES dataset for formal business registration. We find two important trends when the data are compared descriptively. First, GEM data tend to report significantly higher levels of early-stage entrepreneurship in developing economies than do the World Bank business entry data. Second, the World Bank business entry data tend to be higher than GEM data for developed countries.
There are at least three possible ways to interpret this discrepancy. First, the datasets simply measure different dynamics related to "entrepreneurship." The WBGES data measure rates of entry in the formal economy, and more specifically, entry in the form of LLC establishments. The GEM data are perhaps more reflective of entrepreneurial intent and what some might call "entrepreneurial spirit." For this reason, GEM data capture informal entrepreneurship, particularly in developing countries; firm formation does not necessarily mean firm registration. Second, this discrepancy can also be interpreted as the spread between individuals who could potentially operate businesses in the formal sector and those who actually do so. If this is the case, then GEM data may represent the potential supply of entrepreneurs, whereas WBGES data would represent the actual rate of entrepreneurship. This interpretation is especially interesting in the context of the allocation of talent (Murphy et al., 1991) and the allocation of entrepreneurship (Baumol, 1990). In the allocation of talent model, the stock of talent is relatively constant, but its allocation across a range of activities can change. Similarly, in the allocation of entrepreneurship model, the stock of entrepreneurs in the economy is relatively constant, but the nature of their activities changes.
[2] This is also consistent with Klapper et al. (2006), who find a significant relationship between business registration in 35 European countries and entry barriers. De Soto (1989) and Djankov et al. (2002) find that costly regulations impede the setting up of businesses and stand in the way of economic growth. Djankov et al. (2002) find that high costs of entry exist in most countries, and that countries with more corruption have larger unofficial economies.
The motivation for entrepreneurs to operate in the formal versus informal sector is examined further in our empirical analysis. We find that the magnitude of the differences between the datasets across countries is related to the institutional and environmental conditions for entrepreneurs. In terms of institutional differences, we find that the conditions related to registering, operating, and closing a business are important; in terms of environmental differences, we find significant effects of economic and political conditions. Overall, entrepreneurs in developed countries have greater ease and incentives to incorporate, both for the benefits of greater access to formal financing and labor contracts, and for tax and other purposes not related to business activities. We elaborate on this further in the comparative analysis section of this paper.

GEM
The Global Entrepreneurship Monitor (GEM) project is unique in its cross-country comparability. While all countries collect official data on self-employment, the size distribution of firms, census data on all or most plants and firms, and firm and plant entry, almost none of these registry sources are comparable across countries, even among developed countries. Official data sources differ in how they define when an establishment enters and leaves the register, and in how they handle self-employment; these differences make cross-national comparisons almost impossible.[3] One of the major strengths of the project is therefore the application of uniform definitions and data collection across countries for international comparisons.
The intent of GEM is to systematically assess two things: the level of start-up activity, or the prevalence of nascent firms, and the prevalence of new or young firms that have survived the start-up phase. First, start-up activity (the "nascent" rate) is measured by the proportion of the adult population (18-64 years of age) in each country that is currently engaged in the process of creating a business. Second, the presence of new firms (the "baby" rate) is measured by the proportion of adults in each country who are involved in operating a business that is less than 42 months old. The distinction between nascent and new firms is made in order to determine the relationship of each to national economic growth. For both measures, the research focus is on entrepreneurial activity in which the individuals involved have a direct, but not necessarily full, ownership interest in the business.
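To make the two GEM measures concrete, here is a minimal, unweighted sketch that computes the nascent and baby rates from individual survey responses. The record layout (`Respondent`, `starting_business`, `owned_business_months`) and the function `gem_rates` are hypothetical. Actual GEM estimates rely on the project's own questionnaire and survey weights, which are not reproduced here. The sketch encodes only the two rules stated above: the denominator is adults aged 18-64, and a "baby" business is less than 42 months old.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Respondent:
    """One adult-population survey response (hypothetical field names)."""
    age: int
    starting_business: bool               # currently creating a business
    owned_business_months: Optional[int]  # age of a business the person owns/manages, if any


def gem_rates(respondents: List[Respondent]) -> Tuple[float, float]:
    """Return (nascent_rate, baby_rate) as percentages of respondents aged 18-64.

    Unweighted simplification: real GEM estimates use survey weights.
    """
    adults = [r for r in respondents if 18 <= r.age <= 64]
    if not adults:
        raise ValueError("no respondents aged 18-64")
    n = len(adults)
    nascent = sum(1 for r in adults if r.starting_business)
    # A "baby" business is one that is less than 42 months old.
    baby = sum(
        1
        for r in adults
        if r.owned_business_months is not None and r.owned_business_months < 42
    )
    return 100.0 * nascent / n, 100.0 * baby / n


sample = [
    Respondent(age=30, starting_business=True, owned_business_months=None),
    Respondent(age=45, starting_business=False, owned_business_months=12),
    Respondent(age=50, starting_business=False, owned_business_months=60),
    Respondent(age=70, starting_business=True, owned_business_months=None),  # outside 18-64
]
nascent_rate, baby_rate = gem_rates(sample)
```

In this toy sample, the 70-year-old is excluded from the denominator. The 60-month-old business does not count as a "baby" business, so each rate is one in three adults.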

World Bank Group data
The goal of the 2007 World Bank Group Entrepreneurship Survey was to collect benchmark data on formal entrepreneurial activity for a large number of developed and developing countries. The intent is that these data will be used to compare private sector development across countries, as well as to monitor and evaluate the impact of regulatory reforms over time. In order to measure entrepreneurship and make the data universally comparable, we developed a methodology that is applicable across heterogeneous legal regimes and economic systems. Previous efforts had been made in this regard, but the great majority focused solely on the developed world and did not take into account differences in legal systems, sectors, and economic structures (see United Nations, 2005).
The WBGES defines the unit of measurement of entrepreneurship as: Any economic unit of the formal sector incorporated as a legal entity and registered in a public registry, which is capable, in its own right, of incurring liabilities and of engaging in economic activities and transactions with other entities.
Notably, this definition excludes informal sector initiatives. This exclusion is based on the difficulty of quantifying the number of firms in the informal sector, rather than on the sector's relevance for developing economies (Boegh, Nielsen and Ploving, 1997). The only way to measure the informal sector is through economic censuses, which, because of their high cost, are conducted infrequently.
Furthermore, entrepreneurship is defined as:

The activities of an individual or a group aimed at initiating economic activities in the formal sector under a legal form of business.
However, few countries (e.g., Denmark) maintain "active" registries that annually confirm that registered firms are still operating. Therefore, official registration data include both businesses incorporated for economic activities and those incorporated for tax or other non-business purposes (e.g., shell companies). An additional limitation of the data is that they do not report the number of closed businesses.
The reasons differ from country to country, but are mainly due to the fact that the registrars generally have no enforcement mechanisms to obligate businesses to report closures. Although the number of closed companies is essential to paint a clear picture of the economic and entrepreneurial activities of a country, it is not yet feasible to obtain comparable data (Nuci, 1999).
For the purpose of the analysis in this paper, the data are used to calculate the "corporate" entrepreneurship rate, defined as the number of newly registered companies as a percentage of the adult population.
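The corporate rate is a single ratio. Here is a minimal sketch of it. The function name `corporate_rate` and the example counts are hypothetical and are not taken from the WBGES data.

```python
def corporate_rate(new_registrations: int, adult_population: int) -> float:
    """'Corporate' entrepreneurship rate: newly registered companies per 100 adults."""
    if adult_population <= 0:
        raise ValueError("adult_population must be positive")
    if new_registrations < 0:
        raise ValueError("new_registrations cannot be negative")
    return 100.0 * new_registrations / adult_population


# Hypothetical country: 25,500 new registrations among 1,000,000 adults.
rate = corporate_rate(25_500, 1_000_000)  # about 2.55 (percent)
```

Because the numerator counts registrations rather than people, the rate is only comparable to the GEM person-based rates under the implicit assumption of roughly one new company per entrepreneur.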

Comparative Analysis
To compare entrepreneurship rates between the two databases, we calculate the spread between the GEM "nascent" and "baby" entrepreneurship rates and the WBGES "corporate" entrepreneurship rate.[7] The first new indicator, SPR_N_C, measures the difference between the percentage of individuals who are in the process of starting a business (the GEM "nascent" rate) and the percentage who have actually started a formal corporation. The second new indicator, SPR_B_C, measures the difference between the percentage of individuals operating a young business in either the formal or informal sector ("baby") and the percentage of individuals who have chosen and/or succeeded in starting a formal corporation ("corporate").
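Here is a minimal sketch of the two spreads as simple differences in percentage points. The function name is hypothetical. The sketch assumes the sign convention implied by Table 1, where a higher value indicates a greater loss of entrepreneurial potential, so each spread is the GEM rate minus the corporate rate. The example uses the United States rates quoted in this section (nascent 8.12%, young 4.98%, formal 2.55%), giving spreads of about 5.57 and 2.43 percentage points. As footnote 7 cautions, the raw 42-month baby rate is not an annual rate, so SPR_B_C computed this way is not strictly comparable.

```python
from typing import Dict


def entrepreneurship_spreads(nascent: float, baby: float, corporate: float) -> Dict[str, float]:
    """Spreads in percentage points between GEM rates and the WBGES corporate rate.

    Assumed sign convention (from Table 1): GEM rate minus corporate rate, so a
    higher value means a greater loss of entrepreneurial potential.
    """
    return {
        "SPR_N_C": nascent - corporate,  # nascent vs. formal
        "SPR_B_C": baby - corporate,     # baby (young) vs. formal
    }


# United States rates quoted in the text: nascent 8.12%, young 4.98%, formal 2.55%.
us = entrepreneurship_spreads(8.12, 4.98, 2.55)
```

A negative value simply means the formal registration rate exceeds the GEM rate, which the text reports for several countries.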
We interpret these spreads as reflecting, in part, a loss of potential formal sector participation. In other words, they can represent individuals who were unsuccessful in registering their business, because of barriers to registration that we introduce later, or who chose to operate in the informal sector. The tendency of GEM data to be higher than WBGES data for developing countries is likely partly indicative of lost formal sector participation due to barriers, and partly indicative of informality by choice. These explanations are not mutually exclusive. In either case, the individual may still have started a business; as mentioned in the introduction, firm formation does not mean registration. We expect a higher spread, indicating a larger loss of entrepreneurial potential, in countries with weaker business environments.[8] The quality of the business environment, as measured by the Doing Business and other indicators, is widely accepted as a critical determinant of entrepreneurial activity.
These spreads, by country, are shown in Figure 1. What would we expect the data to show from a theoretical perspective? If the nascent rate represents early-stage activity, we expect it to be higher than the young entrepreneurship rate, because many people who take "some steps" towards starting a business do not actually succeed. We also expect the young entrepreneurship rate to be higher than the formal rate, since many firms are initially established as sole proprietorships and incorporated at a later stage. In fact, for the United States, these rates are 8.12%, 4.98%, and 2.55%, respectively. This ordering does not, however, hold across all developed and developing countries.
[7] SPR_B_C is not strictly comparable. The nascent prevalence rate refers to one point in time, so it is more or less an annual rate. However, the baby business data cover 42 months of activity, so they are not actually an 'entry rate' of new firms. An annual rate can nonetheless be estimated from the GEM data. First, estimate how many new births the numbers represent. Assuming attrition such that 95% of firms are still operating at the end of six months (for example, if 100 firms are born, 95 will still be operating in month seven), the total count is increased by 16% to compensate for discontinuances. Second, adjust from 42 months to one year. The final correction factor is 0.33.
In fact, it appears that in many countries, developed and developing, the young and nascent entrepreneurship rates are lower than the formal entrepreneurship rate. This is the case not only in Hong Kong, but also in Latvia, the [...] LLC to avoid taxes. The incentive to register firms for redundant or non-business activities might be greater in developed countries with more complex (and enforced) tax and regulatory structures.
For instance, laws on hiring and firing employees in Italy apply only to firms with more than 15 employees, which might encourage business owners to register multiple smaller firms (Klapper et al. 2006).
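Footnote 7 reports a 16% upward adjustment for attrition and a final correction factor of 0.33 for converting the 42-month "baby" rate into an approximate annual rate, but it does not spell out the arithmetic. The sketch below is our reconstruction under stated assumptions, not the author's calculation. It assumes that births are spread evenly over seven six-month cohorts and that each cohort loses 5% of its firms every six months (the footnote's "95% still operating" figure). Under these assumptions, the average surviving share is about 0.862 and its inverse is about 1.16. Multiplying by 12/42 then gives about 0.33, matching both figures in the footnote. The function and parameter names are hypothetical.

```python
from typing import Tuple


def baby_rate_correction(
    survival_per_period: float = 0.95,  # assumed share of a cohort surviving each six months
    period_months: int = 6,
    window_months: int = 42,            # GEM "baby" window
) -> Tuple[float, float]:
    """Return (attrition_uplift, annual_factor) under an even-births assumption.

    Births are assumed evenly spread over window_months / period_months cohorts;
    a cohort born k periods ago survives with probability survival_per_period ** k.
    """
    periods = window_months // period_months  # 7 cohorts for a 42-month window
    mean_survival = sum(survival_per_period ** k for k in range(periods)) / periods
    uplift = 1.0 / mean_survival                 # about 1.16, i.e. +16%
    annual_factor = uplift * 12 / window_months  # about 0.33
    return uplift, annual_factor


uplift, factor = baby_rate_correction()
```

With no attrition (`survival_per_period=1.0`), the uplift is exactly 1 and the factor reduces to the pure time adjustment 12/42.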

Data and Summary Statistics
The sample for the analysis is a pooled, cross-sectional, longitudinal unbalanced panel across 41 countries with nonmissing explanatory variables for 2003, 2004, and 2005. We consider a variety of country characteristics, which vary over time, as predictors of entrepreneurial activity. We include log GDP per capita (GDPPC) in all estimations to control for economic development, given the varied levels of development of the countries for which we have data. As an additional explanatory variable, we include domestic credit to the private sector as a percentage of GDP as a measure of financial development (DomCredit).
We use four measures of the regulatory barriers: First, an indicator of the difficulty of hiring and firing employees (Labor_Rig). Second, the log cost of business registration (Entry_Cost). Third, the log number of procedures required to start a business (Entry_Proc). Fourth, the ease of closing a business, proxied by the estimated recovery rate claimants can expect following foreclosure or bankruptcy (Rec_Rate).
These measures indicate the difficulties in starting, operating, and closing a business.
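Three of the four regulatory measures enter in logs. Here is a minimal sketch that builds them from raw Doing Business-style values. The function name, argument names, and sample numbers are hypothetical. The natural logarithm is assumed, because the text says only "log". Rec_Rate is logged because Table 1 describes it as a log estimate. Since the text does not describe how zero values are treated, the sketch rejects them rather than guess.

```python
import math
from typing import Dict


def regulatory_regressors(
    labor_rigidity: float,  # hiring/firing difficulty index (Labor_Rig), used as-is
    entry_cost: float,      # cost of business registration, must be > 0
    entry_procedures: int,  # number of start-up procedures, must be > 0
    recovery_rate: float,   # cents on the dollar recovered, must be > 0
) -> Dict[str, float]:
    """Build the four regulatory-barrier regressors (natural logs assumed)."""
    for name, value in (("entry_cost", entry_cost),
                        ("entry_procedures", entry_procedures),
                        ("recovery_rate", recovery_rate)):
        if value <= 0:
            raise ValueError(f"{name} must be positive to take logs")
    return {
        "Labor_Rig": labor_rigidity,
        "Entry_Cost": math.log(entry_cost),
        "Entry_Proc": math.log(entry_procedures),
        "Rec_Rate": math.log(recovery_rate),
    }


# Hypothetical inputs chosen so the logs are easy to check by hand.
row = regulatory_regressors(labor_rigidity=40.0, entry_cost=10.0,
                            entry_procedures=1, recovery_rate=math.e)
```

Logging the counts and costs compresses the long right tail across countries, so a one-unit change in a regressor corresponds to a proportional change in the underlying barrier.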
It is important to note that these indicators measure the barriers for a "typical" formal sector firm, which might in part explain the weak relationship with GEM data.
For instance, the methodology for entry barriers assumes: "The business is:
• A limited liability company.
• Has a turnover of at least 100 times income per capita."[11]
We expect that these barriers would have a stronger relationship with the formal entrepreneurship rates in the WB database. Furthermore, these indicators might be important predictors of a firm's decision to operate in the formal versus informal sector.
[10] The complete list of countries is shown in Annex B.
Next, we include indicators of operational risk, which may proxy for the risks and benefits to individuals of operating a firm in the formal (rather than informal) sector. For instance, we would expect individuals to be less willing to operate illegally (and more likely to pay taxes) in countries where registration laws are enforced, corruption is lower, and the economy is healthy. First, we include an index of political risk (Pol_Risk), which measures corruption, government stability, etc. Second, we include an index of law and order (Law_Order), which measures the efficiency of the legal and judicial system. Third, we include an index of economic risk (Econ_Risk), which measures the economic growth of the country. Fourth, we include a composite risk index, which is an average of political, economic, and governmental financial risk and stability.
Table 2 shows the correlation matrix of our variables. Univariate tests show significance for all variables except employment laws. One explanation might be that both formal and informal young firms are unlikely to hire a large number of employees.[12] Because of the large and significant correlations among the explanatory variables, estimations are run separately, while controlling for economic development through log GDP per capita.
Figure 2 shows scatter plots and univariate tests of our explanatory variables. We find significant relationships for both SPR_N_C and SPR_B_C. As expected, the spread between the two measures is negatively related to per capita GDP, composite risk, recovery rate, and law and order. It is positively related to the number of procedures needed to register a business and to the share of the informal economy.
[11] http://www.doingbusiness.org/MethodologySurveys/StartingBusiness.aspx.
[12] Especially since formal firms in developing countries are likely to be in wholesale and retail trade, which are less dependent on labor, and unlikely to be in manufacturing.
Empirical Results
Table 3 shows our estimation results for the spread between nascent and formal entrepreneurship. We find no relationship between this spread and domestic credit, which might suggest that start-ups are less dependent on formal bank financing (and depend more on personal savings). The strongest relationship among our investment climate variables is with closure costs. Since the default rate of new firms is very high, firms that expect to get the lowest return on their investment might be least likely to undertake the time and cost of joining the formal sector (and benefiting from formal legal bankruptcy proceedings). We find the interaction terms of entry costs, entry procedures, and recovery rates with GDP per capita to be significant: barriers to starting (and closing) a business matter more in lower-income countries. In other words, individuals in developing countries are only likely to have incentives to join the formal sector if entry barriers are low. A possible explanation is that many developing countries host substantial informal sectors, so entrepreneurs are able to operate entirely within the informal economy. For example, the ILO estimates that 60 percent of the workforce in Asia is in the informal sector (ILO, 2007). Individuals can start businesses that meet demand, and source supply, within the informal sector; in such cases, they have little actual need to join the formal sector in order to operate.
Table 4 shows the relationship between the spread with nascent entrepreneurs and measures of country risk. We find a strong and significant relationship with the composite risk index; again, individuals are more likely to choose, and succeed in, joining the formal sector if political, economic, and financial risks are low. Furthermore, the interaction with law and order is significant.
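The regression notes state that the models are population-averaged GEE with a year trend, and the results above rely on interactions with GDP per capita. Taken together, these suggest the following mean model for each separately estimated regression. This is our reading, not an equation stated in the paper. Here $\mathrm{SPR}_{it}$ is SPR_N_C or SPR_B_C for country $i$ in year $t$, and $\mathrm{GDPPC}_{it}$ is log GDP per capita as defined in the text. $X_{it}$ stands for one investment-climate or country-risk variable at a time (e.g., Entry_Cost, Entry_Proc, Rec_Rate, or Law_Order), and $t$ is the year trend. The GEE working correlation across a country's years is not shown.

```latex
\begin{aligned}
\mathrm{SPR}_{it} &= \alpha + \beta_1 \mathrm{GDPPC}_{it} + \beta_2 X_{it}
  + \beta_3 \left( X_{it} \times \mathrm{GDPPC}_{it} \right) + \delta\, t + \varepsilon_{it} \\
\frac{\partial\, \mathrm{E}[\mathrm{SPR}_{it}]}{\partial X_{it}} &= \beta_2 + \beta_3\, \mathrm{GDPPC}_{it}
\end{aligned}
```

The second line shows why a significant interaction supports the statement that barriers "matter more in lower-income countries." The marginal effect of $X_{it}$ depends on income. When $\beta_3$ has the opposite sign to $\beta_2$, the effect is larger in magnitude at low values of $\mathrm{GDPPC}_{it}$, over the income range where it keeps the sign of $\beta_2$.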
Next, we use as our dependent variable the spread between young businesses, both formal and informal, and formal entrepreneurship. We expect this spread to be largest in countries with weaker business environments (and larger informal sectors). Table 5 shows that in this case, in addition to recovery rates, entry procedures (and their interaction with GDP per capita) are significant; that is, entry barriers matter. Table 6 shows that law and order, i.e., legal and judicial efficiency, is the most important determinant of the decision whether or not to operate in the formal sector and/or to register as a limited-liability company.
The results raise one interesting question. As entry barriers increase, the spread between the informal and the formal sector rises, as expected, and as entry procedures fall, the spread falls. The implication is that barriers to entry are greater for corporate entrepreneurship than for young businesses that have not incorporated, or for nascent entrepreneurs who are in the process of starting a business. However, in developed countries, the spread between the informal and formal sectors not only decreases but often changes sign; i.e., the number of limited-liability companies is greater than the sum of sole proprietors and informal firms. This implies that it is at least as easy to start a limited liability company as a sole proprietorship.

Conclusion
The purpose of this paper is to compare two datasets designed to capture entrepreneurial dynamics: the GEM data for early-stage entrepreneurial activity and the World Bank Group Entrepreneurship Survey dataset for formal business registration. We find a number of important differences in the data. First, the GEM data tend to report significantly lower levels of early-stage entrepreneurial activity than the World Bank data in developed countries. In other words, in a developed country it is more common to start a formal business than a sole proprietorship.
Second, the GEM data tend to be higher for developing countries than for developed countries. One possible explanation is the distinction between intent and informality of entrepreneurial activity, particularly in developing countries, that is captured by GEM data.
However, important exceptions to this are found, in particular for the United States and Germany. This suggests that firms in developed countries have greater ease and incentives to incorporate, both for the benefits of greater access to formal financing and labor contracts, and for tax and other purposes not related to business activities.

Table 1: Variable Definitions and Summary Statistics
The sample is a pooled, cross-sectional, longitudinal unbalanced panel across 41 countries with nonmissing explanatory variables for 2003, 2004, and 2005. "GEM" is the Global Entrepreneurship Monitor; "WBGED" is the World Bank Group Entrepreneurship Database; "DB" is the World Bank Doing Business Database (www.doingbusiness.org); "ICRG" is the International Country Risk Guide.

Variable | Obs | Description | Mean | Std. Dev.
SPR_N_C | 90 | The spread between the "Nascent" entrepreneurship rate (GEM), defined as the number of people actively involved in starting a new venture as a percentage of the adult population, and "Corporate" entrepreneurship, defined as the number of newly registered limited-liability firms (less than 1 year old) as a percentage of the adult population. A higher value indicates a greater loss of entrepreneurial potential. | -0.36 | 4.14
SPR_B_C | 90 | The spread between the "Baby" entrepreneurship rate (GEM), defined as the number of people who are owners/managers of a business that is less than 42 months old as a percentage of the adult population, and "Corporate" entrepreneurship. A higher value indicates a greater loss of entrepreneurial potential. | … | …
Rec_Rate | … | The log estimate of how many cents on the dollar claimants (creditors, tax authorities, and employees) recover from an insolvent firm, as a measure of the efficiency of foreclosure or bankruptcy procedures. A higher value indicates lower closure barriers (DB). | … | …
… | … | Share of the informal economy, calculated as the size of the informal economy as a percentage of official GNI, normalized between 0 and 1 (DB). | … | …

Figure 1: Nascent, Young, and Formal Entrepreneurship
Shown are averages of non-missing variables for 2003, 2004, and 2005. "Nascent" is the number of people actively involved in starting a new venture, as a percentage of the adult population; "Baby" is the number of people who are owners/managers of a business that is less than 42 months old, as a percentage of the adult population; and "Corporate" is the number of newly registered limited-liability firms (less than 1 year old), as a percentage of the adult population. SPR_N_C is the spread between the Nascent and Formal entrepreneurship rates, and SPR_B_C is the spread between the Young and Formal entrepreneurship rates. Variables are defined in Table 1.
Country | "Nascent" | "Young" | "Formal" | SPR_B_C | SPR_N_C
… 2003, 2004, and 2005. The dependent variable is SPR_B_C, which is a measure of lost entrepreneurial potential. The regressions are estimated using population-averaged Generalized Estimating Equations (GEE), with a year trend added as a control. z-scores are shown in brackets beneath regression coefficients. Asterisks *, **, and *** indicate significance at 10%, 5%, and 1%, respectively.