Policy Research Working Paper 11119

Using Household Surveys and Specialized Enterprise Surveys to Measure Informal Enterprises

Akuffo Amankwah, Hibret B. Maemir, Pauline Castaing, Amparo Palacios-Lopez, Richmond Attah-Ankomah, Diego Zardetto, David C. Francis

Development Economics
Development Data Group
May 2025

Abstract

This paper compares two widely used methods for surveying informal enterprises: household surveys, which collect data on enterprises through household interviews, and area-based enterprise surveys, which directly target businesses in specific geographic areas. By implementing both survey approaches simultaneously in two urban centers in Ghana, this study examines key differences and similarities in the characteristics of informal enterprises across these cities. The analysis reveals substantial variation in estimates of the number of informal enterprises between the two methods, with the household survey approach reporting a significantly higher count. The paper explores potential reasons for these differences, focusing on design and implementation factors. The findings also suggest that both survey methods yield consistent statistics for characterizing informal businesses and identifying factors that drive their performance. Characteristics such as bank account ownership, sector of operation (retail), phone usage, and operating in a fixed premise outside the household are associated with higher productivity across both surveys.

This paper is a product of the Development Data Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at aamankwah@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Akuffo Amankwah¹, Hibret B. Maemir³, Pauline Castaing², Amparo Palacios-Lopez¹, Richmond Attah-Ankomah⁴, Diego Zardetto², David C. Francis³

¹ Living Standards Measurement Study, Development Data Group, World Bank Group, Washington DC, USA.
² Living Standards Measurement Study, Development Data Group, World Bank Group, Rome, Italy.
³ Enterprise Analysis Unit, World Bank Group, Washington DC, USA.
⁴ Institute of Statistical, Social and Economic Research, University of Ghana, Legon, Ghana.

Keywords: list-based household survey; area-based adaptive cluster sampling; informal sector enterprise survey; Ghana
WB Thematic topics: Enterprise Surveys; Informal Sector.
JEL Codes: D22; O17; M21.

1. Introduction

In low- and middle-income countries (LMICs), the informal sector is a prominent feature and means of livelihood, contributing 30%-70% of GDP and employing 20%-80% of the labor force (Ulyssea, 2020). This indicates that a large portion of economic activity in these countries operates outside regulatory frameworks and does not appear in official registration records. Informality can refer either to the informal status of an enterprise or to informal workers employed in formal firms.
As a result, accurately measuring the informal sector presents significant challenges, hindering research and limiting the scope for policy recommendations. To effectively inform policies and research in this area, it is crucial to have data that accurately represent the informal sector. However, a number of studies highlight methodological issues in estimating the size of the informal sector in LMICs (see Charmes, 2002; Hussmanns, 2004; Maligalig & Guerrero, 2008). This paper emphasizes the importance of sampling and survey methods and their implications for characterizing and estimating informal enterprises using data from Ghana, a lower middle-income country.

Two main survey approaches have been employed to characterize informal enterprises in LMICs (Hussmanns, 2004; 2009). The first is household surveys (HS) (typically Labor Force Surveys and multi-topic household/living standards surveys), which use clusters defined by population census data, select a random sample of primary sampling units (PSUs), and list all households in those PSUs. A random sample of households is then selected and interviewed in each of the sampled PSUs. In household surveys like the LSMS-ISA¹ that have a dedicated module on enterprises operated by the household, detailed data on those enterprises are collected (including formalization status, irrespective of the location of the enterprise relative to the household's dwelling). For these surveys, households constitute the main sampling units and are usually selected through multi-stage probability sampling procedures. The second approach involves sampling informal enterprises directly through specialized enterprise surveys, such as informal sector enterprise surveys (ISES) using an area-based sampling approach, described in Section 2 below.
¹ The Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) is a system of multi-topic, nationally representative panel household surveys with a strong focus on agriculture. More information can be found at https://www.worldbank.org/en/programs/lsms/initiatives/lsms-ISA.

Recognizing that each of the two methods has its own advantages and drawbacks, this study investigates the relative strengths of the two approaches and provides recommendations for the development of survey tools that can enhance the measurement of informal enterprises. To achieve this, the team conducted two surveys: a household survey with an integrated module on household enterprises (HS-IME) using a household sampling approach, and an ISES using an area-based approach. Both collected data on informal enterprises in the same areas of Ghana and at approximately the same time. The two surveys asked similar questions of the informal enterprises surveyed but used different sampling techniques. By conducting these overlapping surveys, we obtained two sets of information on informal enterprises that can be compared and combined for analysis of the extent and characterization of informality in Ghana.

This paper makes two main contributions to the existing literature. First, it contributes to the ongoing discussion regarding how to enhance the measurement of informality in LMICs. Due to their elusive nature, informal enterprises are often classified as hard-to-reach populations: they are not included in standard sampling frames, their locations are often not fixed, and they require non-standard sampling strategies to obtain unbiased estimates (Aga et al., 2023; Aberra et al., 2022). Our goal is to understand any divergences in the estimates obtained by the two methodologies used in this study, to better explain the nuances in their interpretation, and to advance improvements in the methodology of measuring informal enterprises.
Second, the study helps in profiling the informal enterprises in Ghana. The informal sector in Ghana comprises private agricultural and non-agricultural enterprises, accounting for more than 65% of employment and nearly 36% of gross domestic product (GSS, 2022; Good Governance Africa, 2023). Hence, the majority of off-farm jobs are provided by informal firms, which include own-account microenterprises, family businesses, and informal wage employment. Although informal sector jobs are an important source of income and employment, they are generally considered less desirable, more unstable, and lower paid (La Porta and Shleifer, 2014). At the same time, the prevalence of informality may also provide information about formal job availability and broader conditions in the business environment (Harris and Todaro, 1970). Therefore, understanding the operation, activity, and productivity of units or firms in this sector is crucial for researchers and policy makers alike. Policy makers can only assess progress or achievements with good quality statistics. We contribute to this effort by providing new evidence from two alternative surveys on the informal enterprises in two cities (Tamale and Kumasi) of Ghana.

We find significant deviations in estimates of the number of informal enterprises from the two methods, with non-overlapping confidence intervals. The HS-IME provides estimates that are aligned with the Ghana Living Standards Survey (GLSS 7), which uses a similar sampling approach,² while the ISES gives significantly lower estimates. Interestingly, despite the differences in estimating the number of such entities, the two survey approaches are largely consistent with each other in producing descriptive statistics for characterizing informal enterprises as well as the key correlates or drivers of enterprise performance.
These results are in general consistent with the characterization of informality in the extant literature (Maloney, 2004; La Porta and Shleifer, 2014; Aga et al., 2023; Koeda and Dabla-Norris, 2008; Kanbur, 2017; Benhassine et al., 2018; Campos et al., 2018). This suggests that while the two approaches diverge in counting informal enterprises, both provide consistent insights into the characteristics of these businesses.

² Results available upon request.

The rest of the paper is organized as follows: Section 2 presents a brief review of relevant literature, and Section 3 describes the data and sampling procedures used for the two surveys. Section 4 presents the analytical approaches used. Section 5 presents the results and discussion, and Section 6 provides conclusions and implications of the study.

2. Literature review

2.1. Defining or conceptualizing informal enterprises

The 21st International Conference of Labor Statisticians adopted a new definition of the informal market economy as “all production for pay or profit in the informal sector and all productive activities of workers in employment that are – in law or in practice – not covered by formal arrangements.” The resolution defines the “informal sector” as “comprising economic units whose production is mainly intended for the market with the purpose of generating income and profit, but that are not formally recognized as producers with a production distinct from the own-use production of the household” (Frosch 2025). The formal status of an economic unit is determined by meeting at least one of the following set of five operational criteria: “(i) being owned or controlled by the government (governmental or public unit); or (ii) being recognized as separate legal entities from their owners (incorporated enterprise); or (iii) keeping a complete set of accounts for tax purposes (quasi-corporations); or (iv) being registered in a governmentally established system of registration (registered enterprises); or (v) producing for the market and employing one or more persons to work as an employee with a formal job (economic unit is employing at least one formal employee)” (ILO 2023). If the economic unit meets none of the five criteria above, it is considered an “informal unincorporated household market enterprise” (Frosch 2025). In this paper, we define an informal enterprise as an enterprise that is owned by the household (does not fulfill criteria (i) and (ii)) and that is not registered (does not fulfill criterion (iv)).

2.2. Survey approaches for measuring informal enterprises

As mentioned above, household surveys and enterprise surveys have been the predominant vehicles for measuring informal enterprises (Charmes, 2002; Hussmanns, 2004; Maligalig & Guerrero, 2008). Each approach has its own advantages and drawbacks. For example, Hussmanns (2004) provides an evaluation of the use of labor force surveys to measure informal enterprises. He argues that labor force surveys and other traditional household surveys (that collect information on labor) can be useful in providing estimates of informality because they are usually conducted at higher frequency than enterprise surveys. However, in such surveys, employees may find it difficult to provide information on some important criteria, like registration and bookkeeping practices, needed to define informal enterprises.
Additionally, estimating the number of informal enterprises from labor force surveys is almost impossible, largely because the number of informal enterprises is not identical to the number of owners of informal enterprises due to the existence of business partnerships (Hussmanns, 2004; 2009).

Specialized enterprise surveys such as ISES, on the other hand, focus on understanding the composition, characteristics, and operations of informal enterprises. Hussmanns (2009) makes this point explicitly, stating that “if the measurement objectives are to collect detailed structural information on the composition of the informal sector in terms of the number and characteristics of enterprises involved, and to obtain data for an in-depth study of the production activities, employment, income generation and capital equipment of informal sector enterprises, the conditions and constraints under which they operate, their organization and relationships with the formal sector and the public authorities, etc., surveys are required in which the informal sector businesses themselves and their owners are the observation and reporting units” (p. 2). However, the ISES might miss less visible enterprises that operate within dwellings without identification, highly mobile enterprises, or enterprises that operate at night or outside normal business hours.

Hussmanns (2009) and Maligalig and Guerrero (2008) describe a mixed method as well as the ways in which household surveys and enterprise surveys could be combined. One variant involves first conducting the household survey, from which informal enterprises are identified and then further surveyed.
The first stage provides a comprehensive sampling frame for informal enterprises through a household listing or survey operation in selected primary sampling units, while in the second stage, all or a sample of the informal enterprises are interviewed to obtain detailed information on their characteristics, including those of owners and workers (Hussmanns, 2009). Besides using this consecutive two-stage process, household and enterprise surveys can be integrated into one survey such as the HS-IME, a household survey that includes a dedicated module on household enterprises. Household surveys might be more suitable for capturing small enterprises with low physical visibility, or highly mobile traders who move around the city. However, the number of questions that can be included in the HS-IME may be limited by respondent fatigue resulting from its multitopic nature. Depending on their sample size, these surveys might also fail to capture all the complexities of informal enterprises within a specific geographic area.

The HS and ISES follow different sampling approaches. The HS approach identifies informal enterprises based on where the owner(s) live, while the enterprise survey (ES) approach is based on where the enterprises are located. If owners live in a different area from the one where their enterprises operate, this could drive differences in the counts and profiling of these enterprises across the two methods. Household surveys typically rely on multi-stage probability sampling techniques (Scott et al. 2005; GSS, 2015). The multi-stage sampling method usually follows a two-step sampling procedure, in which primary sampling units (PSUs), which often consist of small geographic clusters (often called enumeration areas), are selected at random in a first stage (sometimes with stratification). The second stage involves the random selection of secondary sampling units (SSUs), which are households.
The multi-stage sampling approach is preferred to simple random sampling (SRS) because it helps in minimizing data collection cost, though with some impact on the precision of estimates (Scott et al. 2005).

The ISES relies on an adaptive cluster sampling (ACS) approach (Aga et al., 2023; Aberra et al., 2022; Thompson, 1990 p. 1056). Under ACS, an initial sample of PSUs is selected from a standardized sampling frame. Each PSU is enumerated in a process where a count of relevant units (in the present case, informal enterprises) is obtained. In the case of the ISES, such PSUs are defined as evenly sized geographical areas, called block areas (BAs), within clearly delineated boundaries of an urban area. In ACS, following the initial enumeration of the PSUs, implementers set a so-called expansion threshold, which, if met, triggers the selection of adjacent BAs for enumeration.³ ACS methods provide design-unbiased population estimates (Thompson, 1990) and generally provide implementers with the ability to reduce fieldwork effort (Aga et al., 2023). Such an ACS approach has been applied to surveying informal enterprises in developing countries under projects run by the World Bank’s Enterprise Analysis Unit (see Aga et al. 2023; Aberra et al. 2022). Based on surveys of informal enterprises in several developing countries which used ACS, Aga et al. (2023) argue that ACS produces substantial efficiency gains over simple random sampling, especially in fieldwork efforts.

ACS approaches are well established in the statistical sampling literature but had not been applied in social science research until their recent application to informal enterprise surveys by the World Bank’s Enterprise Analysis Unit. The approach had mostly been applied to the study of animal and plant ecology, and it emerged mainly to address the cost and efficiency problems associated with using SRS to study rare and clustered populations (see Thompson 1990, 1991).⁴
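The expansion rule can be made concrete with a small simulation. The sketch below is illustrative only: the grid of block-area counts, the initial sample, and the function name `acs_enumerate` are hypothetical, and it assumes that expansion into the (up to 8) adjacent BAs repeats whenever a newly enumerated BA also meets the threshold, as in Thompson (1990). The ISES fieldwork protocol may differ in its details.

```python
import math
from collections import deque

def acs_enumerate(counts, initial, threshold):
    """Return the set of (row, col) block areas enumerated under ACS.

    counts    -- 2D list with the number of target units in each block area
    initial   -- list of (row, col) block areas in the initial sample
    threshold -- expansion is triggered when a count meets or exceeds it
    """
    n_rows, n_cols = len(counts), len(counts[0])
    enumerated = set(initial)
    queue = deque(initial)
    while queue:
        r, c = queue.popleft()
        if counts[r][c] < threshold:
            continue  # below the threshold: no expansion from this block area
        # Threshold met: enumerate all adjacent block areas (up to 8).
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                if (dr, dc) != (0, 0) and inside and (nr, nc) not in enumerated:
                    enumerated.add((nr, nc))
                    queue.append((nr, nc))
    return enumerated

# Hypothetical 5 x 5 urban grid with one small cluster of informal enterprises.
counts = [
    [0, 0, 0, 0, 0],
    [0, 6, 7, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
initial = [(1, 1), (3, 4)]
enumerated = acs_enumerate(counts, initial, threshold=5)
```

With an infinite threshold only the initial sample is enumerated, while a threshold of 0 keeps expanding until the whole grid is covered; these are the two limiting cases of the expansion rule.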
The advantage of ACS is that its unbiasedness is design-based, which does not depend on the type of population being sampled, although its efficiency in relation to nonadaptive methods depends on the relative distribution of the population being sampled (Thompson 1990). Specifically, ACS tends to be “more efficient than simple random sampling if the within-network variance of the population is sufficiently high” (Thompson, 1990 p. 1056).⁵

³ In reality, different expansion patterns into neighboring BAs can be set. In the case of the ISES, PSUs that trigger an expansion result in the enumeration of all adjacent BAs. Expansions can, therefore, result in (up to) an additional 8 enumerated BAs for each PSU where the count of units meets or exceeds the set threshold.

⁴ It should be noted that the fieldwork efficiency gains from ACS rely upon relative and not absolute rareness. The intuition is straightforward: ACS intensifies enumeration in the immediate vicinity of an enumerated area that is determined to have a threshold-equivalent or higher count of the target units. Implementers, therefore, face a choice of threshold rules given their budget and time constraints. Note as well that a threshold of infinity (which prohibits expansions) is, therefore, equivalent to simple random sampling (within strata, if those are included in a complex design); likewise, a threshold of 0 is equivalent to a full census. The latter example is informative, as it shows that if implementers set an expansion threshold at a very low relative value, fieldwork effort will approximate a census, reducing any fieldwork efficiency gains (noting that ACS remains design unbiased).

2.3. Characteristics of informal enterprises

Informal enterprises exhibit certain characteristics that differentiate them from formal enterprises. These characteristics relate to both the profile of their owners and the nature and operations of the enterprises.
The existing literature shows that informal enterprises are owned or operated by young, less educated, or low-skilled people who start them out of necessity instead of an opportunity for growth (Maloney, 2004; La Porta and Shleifer, 2014; Aga et al. 2023). Ownership of informal enterprises tends to be gendered, but the extent of this has been found to be context specific (Aga et al. 2023). Female-owned enterprises are associated with low productivity levels and face more operational constraints compared to enterprises owned by males, which can also be indicative of barriers or restrictions to formal work (Aga et al., 2023; Peprah et al., 2019; Owoo et al., 2024). The existing literature also shows that informal enterprises are very small (arguably the vast majority of microenterprises are informal), and compared to their formal counterparts, they are younger and more concentrated in the retail and service sectors, with poor management practices and limited access to financial services (Koeda and Dabla-Norris, 2008; Kanbur, 2017; Benhassine et al., 2018; Campos et al., 2018).

Unsurprisingly, the performance of informal enterprises has been found to be considerably lower than that of formal ones. For example, using sales per worker as a measure of performance, Aga et al. (2023) find that informal enterprises perform very poorly in relation to their formal counterparts, and this finding is consistent with other studies that have also evaluated the performance of informal enterprises (see De Mel et al., 2008, 2009; Bardasi et al., 2011; Benjamin and Mbaye, 2012; La Porta and Shleifer, 2014; Ulyssea, 2020; Aga et al., 2021). Also, informal enterprises tend to be poorly integrated into the formal economy, and their owners perceive no benefits from registering (Maloney, 2004; Benhassine et al., 2018; Aga et al., 2023), although in some settings, and to varying degrees, they exert competitive pressure on formal enterprises (Williams and Kedir, 2019).

⁵ To illustrate this point, imagine two networks. The first comprises 8 BAs with a uniform count of 5 units in each BA, for a network total count of 40. In the second, the unit counts are 3,3,3,3,3,3,11,11, which also total 40 over the network. Assume a threshold of count = 2 for both networks. In the first example, note that there is no additional efficiency gained compared to SRS from enumerating any particular BA. In the second, however, the discovery via enumeration of the high-unit BAs (count=11) produces much more efficiency, as the higher-unit squares would have had a much lower probability of selection under SRS than with ACS, which intensifies fieldwork in areas adjacent to those meeting the expansion threshold.

3. Data and sampling procedure

The data used for the study come from two surveys conducted independently in the same locations at about the same time: an HS-IME conducted by the LSMS team and an ISES conducted by the Enterprise Analysis team, both of the World Bank. The HS-IME component was implemented between September and November 2022, while the ISES spanned August to November 2022. The two surveys were administered within the same geographic areas in Ghana: Kumasi and Tamale. Both cities are characterized by a high prevalence of informal enterprises. In each city, the area selected for the HS-IME overlapped with the delimitation area defined for the ISES (see Figures 1 and 2). In Tamale, the area selected for the HS-IME study spanned a 40 km radius from the city center to help in understanding the regional labor markets and firm dynamics in one catchment area.⁶ However, enterprises operating within the catchment area but outside the city delineation shared with the ISES were excluded from the comparative analysis to ensure consistency across the target populations of both surveys.

3.1. Sampling strategy for HS-IME

A standard two-stage stratified cluster sampling design was adopted for the HS-IME.
Census Enumeration Areas (EAs) were used as Primary Sampling Units (PSUs), while households served as Secondary Sampling Units (SSUs). The first-stage sampling frame was established by restricting the 2021 Ghana Population and Housing Census (PHC) list of EAs to those that overlapped with the geographic delineation of each city’s target area. This cartography exercise was generously facilitated by the Ghana Statistical Service (GSS), which also provided the necessary EA maps and datasets. For each city, a stratified sample of 67 EAs was randomly selected from the first-stage sampling frame with probability proportional to size (PPS), with size measured by the number of households in each EA within its stratum (district). The selection of EAs was performed independently within sampling strata, whose design was different for Kumasi and Tamale, owing to the different analytical objectives and estimation domains envisioned for the two cities. Within these sampling strata, implicit stratification was obtained by sorting the list of EAs according to geographical criteria before executing the systematic PPS selection.

At the second stage, fresh lists of households were generated through a household listing exercise conducted in each of the selected EAs. These fresh lists served as second-stage sampling frames, from which 15 households were randomly selected with equal probabilities within each EA. The planned sample size for each city was set at 67 EAs and 1,005 households.

⁶ The catchment area is defined as the geographic area obtained with the algorithm developed by Cattaneo et al. (2021), i.e., based on travel distance to the nearest urban center and city size.

3.1.1. Kumasi sampling design

For Kumasi, the target area surveyed by the HS-IME encompassed the Kumasi Metropolitan Assembly (KMA) and 6 surrounding municipal districts, which coincided exactly with the area targeted by the ISES (Figure 1). Within the target area, the sampling frame of Kumasi’s EAs was stratified by sub-metros.⁷
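The two-stage selection described in Section 3.1 can be sketched in a few lines. This is an illustrative sketch rather than the survey team's actual procedure: the EA sizes, the seed, the function names, and the use of Python's `random` module are assumptions, and the EA list is taken to be already sorted geographically within a stratum so that the systematic draw provides the implicit stratification.

```python
import random

def systematic_pps(sizes, n_sample, rng):
    """Systematic PPS draw; returns indices into `sizes` (kept in frame order).

    An EA larger than the sampling interval could be hit more than once;
    the hypothetical sizes below are all smaller than the interval.
    """
    total = sum(sizes)
    step = total / n_sample          # sampling interval
    start = rng.random() * step      # random start in [0, step)
    selected, cum, idx = [], 0.0, 0
    for k in range(n_sample):
        point = start + k * step
        # Advance to the EA whose cumulative-size interval contains the point.
        while idx < len(sizes) - 1 and cum + sizes[idx] <= point:
            cum += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

def select_households(listing, rng, m=15):
    """Second stage: equal-probability sample of m households from a fresh listing."""
    return rng.sample(listing, m)

rng = random.Random(2022)                          # hypothetical seed
ea_sizes = [120, 80, 200, 150, 95, 310, 60, 140]   # hypothetical household counts
sampled_eas = systematic_pps(ea_sizes, 3, rng)

# Hypothetical listing for the first sampled EA: households numbered 0..size-1.
listing = list(range(ea_sizes[sampled_eas[0]]))
households = select_households(listing, rng)
```

Under this scheme an EA's first-stage inclusion probability is proportional to its household count, which is what the design weights later have to undo.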
The sample of 67 EAs and 1,005 households was allocated to the resulting 11 sampling strata using the Kish allocation method (Kish, 1988), as shown in panel A of Table 1. The Kish method is a compromise allocation method whose result can be seen as a quadratic mean of the proportional and equal allocations. It allocates the sample of households disproportionately across strata, simultaneously ensuring a sufficient sample size for the smallest strata and limiting the maximum sample size for the largest ones:

$$ n_h = n \, \frac{\sqrt{W_h^2 + H^{-2}}}{\sum_{h=1}^{H} \sqrt{W_h^2 + H^{-2}}} \qquad (1) $$

where $n$ is the total sample size in terms of households, $W_h$ is the proportion of the total household population in stratum $h$, and $H$ is the total number of strata. Within each sub-metro, reserve EAs were also selected to serve as potential replacements in the event of inaccessibility, logistical hurdles, or security concerns. In the end, only two EAs were replaced, and the weights calculation was adjusted accordingly. Furthermore, three reserve households were randomly sampled within each selected EA with equal probability, facilitating field substitutions in cases of household nonresponse.

⁷ Here, for convenience, the term sub-metro encompasses both the actual sub-metros of KMA (5) and the municipal districts surrounding KMA (6), resulting in the 11 strata shown in Table 1.

3.1.2. Tamale sampling design

For Tamale, the target area of the HS-IME was designed as a 40 km radius circle centered around the Tamale Metropolitan Assembly (TMA), encompassing nearby urban districts as well as predominantly rural areas from the Northern and Savannah regions, as depicted in Figure 2. The circle was partitioned into two areas: a central area comprising 1,156 EAs that overlapped with the delineated area of the ISES, and a surrounding area consisting of 748 EAs that did not overlap with the ISES delineated area.
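As a numerical illustration of the Kish allocation in equation (1), the sketch below computes stratum sample sizes proportional to the square root of the squared household share plus the inverse squared number of strata. The household shares are hypothetical and are not the values in Table 1, and rounding to whole EAs of 15 households is not handled.

```python
import math

def kish_allocation(n, shares):
    """Allocate n households across strata with Kish's compromise rule.

    shares -- household population share of each stratum (should sum to 1)
    """
    n_strata = len(shares)
    terms = [math.sqrt(w ** 2 + n_strata ** -2) for w in shares]
    total = sum(terms)
    return [n * t / total for t in terms]

shares = [0.50, 0.30, 0.15, 0.05]            # hypothetical household shares
allocation = kish_allocation(1005, shares)

# Reference points: purely proportional and purely equal allocations.
proportional = [1005 * w for w in shares]
equal = 1005 / len(shares)
```

In this example the largest stratum receives fewer households than under proportional allocation and the smallest receives more, while both stay on their own side of the equal allocation, which is the compromise behaviour described in the text.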
The boundary of the overlapping area precisely aligned with the delineation of the area targeted by the ISES in Tamale, indicated by the red outline in Figure 2.⁸ Since the current study is centered on comparing methods and findings between the HS-IME and ISES that target a common population of informal enterprises, the remainder of the paper exclusively focuses on analyzing data derived from the overlapping area. In the overlapping area, 34 EAs were selected, which resulted in a planned sample of 510 households. The sample was further stratified by districts,⁹ yielding 3 sampling strata in the overlapping area. Similar to Kumasi, the Kish allocation method was employed to distribute Tamale’s subsamples among the sampling strata, as detailed in panel B of Table 1. The table also indicates the number of replacement households used during fieldwork.

⁸ The non-overlapping area, predominantly rural and without a homologous counterpart in the Kumasi HS-IME, was intentionally designed to investigate the catchment area of the urban center of Tamale. It comprises 33 selected EAs, corresponding to a planned sample of 495 households. However, these households are not part of the analysis since they are not located in the area that overlaps with ISES delineations.

⁹ Within the overlapping area, 3 strata were formed by merging EAs belonging to neighboring districts as follows: 1) Tamale Central (without any merges), 2) Sagnerigu_mod (mainly consisting of EAs from the Sagnerigu district, supplemented by a few additional EAs from the Kumbungu and Nanton districts), and 3) Tamale South_mod (comprising EAs from the Tamale South district, along with one EA from the Tolon district).

3.1.3. Target populations and indirect sampling

The HS-IME surveys conducted in Kumasi and Tamale were designed to collect data on multiple topics and enable objective inference regarding different target populations.
The primary target populations of the HS-IME are naturally constituted by the private (i.e., non-institutional) households situated within the specified boundaries of the city’s target area, along with the individuals residing in such households. However, the HS-IME also provides objective inferences on the population of household non-farm enterprises (NFEs) operated by the households. Within that target population, the subpopulation of NFEs that were informal and active at the reference time of the survey is of special interest in this paper, owing to its comparability with the target population of the ISES.

As far as the target population of NFEs is concerned, the HS-IME employed a methodology known as indirect sampling (Lavallee, 2007). When a list frame of the desired target population is not available, indirect sampling first randomly draws a sample of units belonging to a different but related population for which a list frame is available. Then, the selected units are used as a vehicle to collect data on linked units of the target population. If the initial random selection adheres to the principles of probability sampling, the links connecting units of the list and target populations are known, and each unit in the target population is linked to at least one unit in the list, the generalized weight share method (Deville and Lavallee, 2006) can be used to obtain unbiased estimates for the target population under indirect sampling. In the present context, households were used to indirectly sample NFEs by exploiting the “operated by” relationship as a link. To this end, the non-farm enterprise module of the HS-IME questionnaire was designed to separately collect information about each NFE that any respondent household was operating or had operated in the past 12 months. The analysis that follows targets NFEs that are informal and operating at the time of the survey.
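The weight share idea can be illustrated with a toy example. The sketch below is a simplified reading of the generalized weight share method in which each NFE is treated as its own cluster: an NFE's weight is the sum of the design weights of the sampled households operating it, divided by the total number of households (sampled or not) that operate it. The identifiers, weights, and link counts are hypothetical, and the paper's actual implementation may differ.

```python
def gwsm_weights(hh_weights, sampled_links, total_links):
    """Weight-share estimation weights for indirectly sampled enterprises.

    hh_weights    -- design weight of each sampled household
    sampled_links -- (household_id, nfe_id) pairs observed in the sample
    total_links   -- number of households in the population operating each NFE
    """
    weights = {}
    for household_id, nfe_id in sampled_links:
        share = hh_weights[household_id] / total_links[nfe_id]
        weights[nfe_id] = weights.get(nfe_id, 0.0) + share
    return weights

# Hypothetical sample: two households and three enterprises.
hh_weights = {"hh_A": 100.0, "hh_B": 200.0}
sampled_links = [
    ("hh_A", "nfe_1"),   # operated by hh_A only
    ("hh_A", "nfe_2"),   # operated jointly by hh_A and hh_B, both sampled
    ("hh_B", "nfe_2"),
    ("hh_B", "nfe_3"),   # operated by hh_B and one non-sampled household
]
total_links = {"nfe_1": 1, "nfe_2": 2, "nfe_3": 2}

nfe_weights = gwsm_weights(hh_weights, sampled_links, total_links)
```

Dividing by the total number of links is what prevents an enterprise run by several households from being counted once per owner, which is the partnership problem raised in Section 2.2.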
Additionally, interviewers recorded the GPS coordinates of each NFE that was active at the reference time of the survey by traveling to the location where the NFE was operated. 10

10 For mobile NFEs, the GPS coordinates were taken at the location where the business operates most of the time.

3.1.4. Nonresponse and bias analysis
Like any real-world survey, the HS-IME conducted in both Kumasi and Tamale were affected by unit nonresponse. However, since reserve EAs and households had been randomly selected from the outset to facilitate field substitutions of nonrespondent units, the realized sample size and its allocation to the strata exactly matched the planned ones in both cities. EA-substitution rates were either negligible or null. Only one EA out of 67 was replaced in Kumasi for logistical reasons, and no EAs were replaced in Tamale. On the other hand, household-level nonresponse rates were not negligible and were higher in Kumasi than in Tamale. In Kumasi, 65 households out of 1,005 had to be replaced, yielding a nonresponse rate of 6.5%. In Tamale, the nonresponse rate was 2.1%, with 21 replaced households out of 1,005. The stratum-level response rates achieved in the two cities are detailed in the last two columns of Table 1. The observed variation of response rates across strata is not statistically significant for Tamale, but is significant for Kumasi, where higher nonresponse rates were recorded, e.g., in the sub-metros of Asokore Mampong and Oforikrom.

While field substitutions improve estimation efficiency by reducing data loss from nonresponse, they do not guarantee the elimination of nonresponse bias. However, in the HS-IME in Kumasi and Tamale, extensive statistical tests using listing data found no evidence of systematic differences between nonrespondents and their replacements, suggesting no detectable nonresponse bias at the 5% significance level (see Appendix 1 for details).

3.1.5. Survey weights
As outlined in Section 3.1, the HS-IME adopted a standard two-stage stratified cluster sampling design. Therefore, the design weights of the households (i.e., the reciprocals of their inclusion probabilities) were calculated following the established formulas reported in Appendix 2. The design weight $w_{hij}$ of household j, belonging to EA i, sampled within stratum h, can be expressed as follows:

$$w_{hij} = \left( \frac{M_h}{n_h \times M_{hi}} \right) \left( \frac{M^{*}_{hi}}{15} \right) \qquad (2)$$

where $n_h$ is the number of sample EAs allocated to stratum h, $M_{hi}$ is the total number of households in sampled EA i of stratum h (i.e., the measure of size (MOS) of EA i of stratum h), $M_h$ is the total number of households across all EAs in stratum h (i.e., the total MOS of stratum h), $M^{*}_{hi}$ is the total number of households in sampled EA i of stratum h as resulting from the listing exercise, and 15 is the number of households randomly selected with equal probability within each sampled EA.

In addition to household-level variables, the HS-IME collected individual-level information from all household members. Consequently, all individuals within the same household share the same inclusion probability, which is equal to the household inclusion probability. The same property holds true for the weights. Moreover, as elaborated earlier, the realized sample in both cities exactly conformed to the planned design, and no indications of nonresponse bias were identified in either city. Therefore, survey estimates regarding the target populations of both households and individuals can be calculated using the design weights in equation (2) without any nonresponse adjustments.

Regarding the target population of non-farm enterprises operated by the households located in the city’s target area, we highlighted above that the HS-IME collected data on all NFEs operated by households.
In this setting, under the reasonable assumption that NFEs jointly operated by different households are negligibly rare, the generalized weight share method leads to NFE weights that are equal to the weights of the corresponding households (see section 5.2 of Deville and Lavallee, 2006). This implies that survey estimates related to the target population of NFEs can be calculated using the household design weights. In other words, equation (2) also expresses the weight $w_{hije}$ of NFE e, operated by household j, belonging to EA i, sampled within stratum h. The sample distributions of the HS-IME weights are reported in Appendix 2.

3.2. Area-based adaptive cluster sampling
3.2.1. Sampling approach
The ISES uses an area-based Adaptive Cluster Sampling (ACS) method (Aga et al., 2023; Thompson, 1990; 1991). The approach uses a spatial grid to partition each city into equal-sized squares measuring 150 meters by 150 meters. 11 Each square in the spatial grid, referred to as a block, constitutes a PSU (Figures 4 and 5). ACS begins with an initial random selection of a sample of starting blocks from the city's grid. Each sampled block (starting or selected through subsequent expansions) is fully enumerated, and the enterprises recorded in this way constitute the “listed enterprises”. Basic information (such as sector, size, etc.) for all informal enterprises located in the block is collected during this listing process. Enumerators were instructed to list and account for enterprises that refuse the listing exercise as well as those that are unavailable at the time of fieldwork (e.g., they are encountered during off-hours). Enumerators were also instructed to approach all structures, including households, and all apparent business activities (both fixed and mobile) observed within a BA at the time of enumeration.

11 This methodology can be applied to any geographic area. The ISES are typically conducted within a city or well-defined urban area.
In certain cases, apparent business activities are unavailable or refuse the initial questions that establish eligibility for the survey. In such cases, enumerators record the refusal or unavailability as the enumeration status and also record additional information by observation.

ACS takes advantage of the expectation that informal enterprises are geographically clustered (Aga et al., 2023, provide census-based evidence of such geographic clustering). While informality is pervasive in many economies such as Ghana, the required expectation is that there are relative concentrations where informal enterprises operate. To exploit this concentration, a threshold number of informal enterprises is set, which defines such a relative concentration. All blocks that meet or exceed the threshold trigger full enumeration of enterprises in all surrounding blocks (called “expansion”). The intuition behind this adaptive element is that it is more efficient, from a fieldwork perspective, to concentrate enumeration in areas that are opened to enumeration by virtue of being adjacent to those that have higher levels of informal enterprise activity. Expansions can continue propagating under the same logic, and the process of sample expansion continues until no blocks meet the set expansion threshold. 12 This process produces a set of networks, each formed by a group of contiguous blocks that all meet the expansion threshold.

A randomly selected subset of the enumerated enterprises is interviewed using a 25-minute questionnaire (the “interviewed enterprises”). Enterprises were selected for the longer interview based on a pre-set selection indicator (0 or 1) assigned to each row of the listed roster; the overall selection of rows adheres to a set selection probability (e.g., if the selection probability for a block is 20%, then on average 20% of the roster rows will be selected).
The selection indicator is determined by whether a random number drawn over the range (0,1) falls within an inclusive selection fraction. For example, if this selection fraction is 0.2, any row with a random number within (0, 0.2] receives a 1 in the selection indicator, and 0 otherwise. Each row is assigned its own random number, and this process is conducted independently by block to avoid situations where the same position in the roster is always selected for the longer interview. As such, the ISES follows a two-stage sampling process in which the probability of selection of each interviewed enterprise is clearly defined by combining the selection probabilities from both stages, resulting in a sample that is representative of the informal sector at the city level.

12 Different patterns of expansion can be implemented through this methodology. The ISES typically activate all surrounding blocks of those meeting the expansion threshold.

To enhance precision, the ISES uses the stratified version of the ACS (Thompson, 1991). After the spatial grid is created, blocks in the grid of each city are categorized into six strata: residential, Central Business District (CBD), other business districts, market center, everything else, and inaccessible. 13 In Kumasi, the survey encompassed a universe of 9,766 blocks, with 1,002 starting blocks enumerated. After the expansion, a total of 1,273 blocks were surveyed, from which 13,243 informal enterprises were listed. A long-form questionnaire was administered to 1,702 of these enterprises. In Tamale, the survey covered a universe of 24,869 blocks, with 984 starting blocks enumerated. After the expansion, the survey included 1,279 blocks, resulting in the listing of 5,113 informal enterprises. Of these, the long-form questionnaire was administered to 796 enterprises. Kumasi thus has more enterprises listed per sampled block (~10) than Tamale (~4) (see Table 2 for further details).

13 There is a sixth category for blocks that are physically inaccessible. This category is excluded from the sampling. See Table 2.

3.2.2. Sampling weights
As in SRS, the weights from the ACS approach are computed as the inverse of the selection probabilities, with the probability of selection (within stratum) adjusted to account for the adaptive selection process. Let $n_h$ denote the total number of starting blocks (PSUs) selected in stratum h, with $h = 1, \dots, H$ indexing each stratum. Let $N_h$ denote the total number of blocks in the universe of stratum h. Within each stratum, starting PSUs are selected using SRS without replacement.

Thompson (1990, 1991) defines a network as any group of blocks where the enumeration of one block would lead to the enumeration of all blocks in the network. 14 This definition helps show how, within a network, the probability of selection can be estimated. Specifically, let $m_i$ denote the total number of blocks in a network i, 15 and let $m_{h,i}$ denote the number of those blocks that fall in stratum h. A starting block that does not meet the threshold is considered a network of size $m_i = 1$. Then, the inclusion probability of a block in network i, $\pi_i$, is given by 16:

$$\pi_i = 1 - \prod_{h=1}^{H} \left[ \binom{N_h - m_{h,i}}{n_h} \Big/ \binom{N_h}{n_h} \right] \qquad (3)$$

The inverse of $\pi_i$ provides the first-stage weight for each block belonging to network i. For market center blocks, the weight is calculated as $N_h / n_h$ since their selection follows an SRS.

In providing population estimates using ACS with stratification, Thompson (1990, 1991) shows that these weights produce unbiased estimates. However, in many cases there will be informal enterprises that are enumerated in blocks (edge units) that are not part of the starting selection and do not meet the expansion threshold. Obtaining unbiased estimates requires omitting these blocks, since they border unenumerated surrounding blocks whose actual probabilities of selection are not known. In the population count estimates provided in this study, such edge units are omitted.
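As a numerical illustration of equation (3), the sketch below computes the inclusion probability of a network from the stratum universe sizes, the numbers of starting blocks, and the number of network blocks in each stratum. It assumes the stratified form in which the probability that no starting block falls in the network is a product of hypergeometric terms across strata. The function name, variable names, and all numbers are hypothetical and are not taken from the survey.

```python
from math import comb


def network_inclusion_probability(N, n, m):
    """Probability that at least one starting block falls in the network.

    N[h]: number of blocks in the universe of stratum h
    n[h]: number of starting blocks drawn by SRS without replacement in h
    m[h]: number of blocks of the network lying in stratum h
          (a stratum missing from m is treated as contributing 0 blocks)
    """
    prob_no_hit = 1.0
    for h in N:
        # Chance that none of the n[h] starting blocks drawn in stratum h
        # belongs to the network.
        prob_no_hit *= comb(N[h] - m.get(h, 0), n[h]) / comb(N[h], n[h])
    return 1.0 - prob_no_hit


# Hypothetical universe with two strata.
N = {"residential": 10, "cbd": 5}
n = {"residential": 2, "cbd": 1}

# A network made of a single residential block.
single_block = network_inclusion_probability(N, n, {"residential": 1})

# A network spanning one residential block and one CBD block.
two_strata = network_inclusion_probability(N, n, {"residential": 1, "cbd": 1})
first_stage_weight = 1.0 / two_strata
```

For the single-block network, the result equals 2/10, the plain SRS selection probability within the stratum, which is the simplification expected for networks of size 1.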
17 Second-stage weights are produced by multiplying the first-stage (block-level) weight by an adjustment factor, which is given by the inverse of the ratio of the number of interviews completed to the total number of informal enterprises found in the block. 18 Since within blocks there are typically both refusals and informal enterprises that cannot be reached, assumptions were made in calculating this adjustment factor. Specifically, the second-stage weights used in this study treat all refusals and unavailable (but identified) enterprises as part of the informal economy.

14 Note, then, that by definition a network is exclusive; it can also include multiple starting BAs. That is, since all BAs in a network meet or exceed the threshold, when an expansion rule into all adjacent BAs is set, any two adjacent BAs that meet the threshold must belong to the same network and that network only.
15 Note that within a network, all blocks are treated as the same, and so some ACS literature indexes by network rather than block.
16 If there are no blocks from stratum h in network i, then the corresponding stratum-h quantities are set to 0. Note that with networks of size $m_i = 1$, all blocks will be within one stratum and the probability of selection simplifies to the probability of the initial SRS selection (within the stratum).
17 However, due to the expense of fieldwork and the fact that there are often interviews conducted in edge units, weights are calculated for edge units as well. These weights were estimated according to the formula for $\pi_i$ with the following substitutions: the term $m_{h,i}$ is replaced with $(2m_{h,i} + e_h)$, where $e_h$ is a dummy that equals 1 for edge units of stratum h and 0 otherwise, and $n_h$ is replaced with $2n_h$. The resulting weights have a known but directionally agnostic bias on population estimates when edge units are retained in this manner. These adjusted weights are used for estimating population means of characteristics of informal enterprises.
18 Weights of the blocks that contain no completed interviews are proportionally transferred to blocks that (i) have at least one completed interview, (ii) have the same stratum, and (iii) have the same first-stage weight. If matching on these three parameters is not possible, the requirement of the same first-stage weight is replaced with a requirement that the blocks are in the same cluster. If matching is still not possible, only the first two parameters are used.

3.3. Pooled HS-IME and ISES data for analytic inference
The HS-IME and the ISES share a common target (sub-)population: NFEs that were informal, active, and located within the boundaries of the agreed city delineation at the reference time of the surveys. Moreover, both surveys include a set of common variables that were consistently measured on the respondent NFEs (Table A1). As a result, for these shared variables, the independent estimates produced by the HS-IME and the ISES for the common (sub-)population can be compared both statistically and substantively.

The HS-IME and ISES datasets were first restricted to the common population and then stacked to create a pooled dataset. This pooled dataset underpinned the statistical analyses involving variables that were observed by both the HS-IME and the ISES. This includes both descriptive and analytic inference, i.e., estimates of both population and model parameters. However, to yield valid inference, such analyses must properly account for (i) the independence of the original HS-IME and ISES samples and (ii) the different sampling designs that generated those samples. This required specific precautions both in using survey weights (e.g., for point estimation) and in considering sampling design metadata (e.g., for uncertainty estimation and hypothesis testing). Appendix 3 provides details on how these precautions were implemented.

4. Analytical and econometric approach
The study relies on descriptive and econometric approaches. In the descriptive analysis, we compare point estimates and their confidence intervals from the two surveys to assess how close they are. In the econometric analysis, we explore whether similar factors can be identified as drivers of informality from the two surveys. We rely on (the logarithm of) sales and sales per worker in the last month as the key dependent variables in our analysis for three reasons. First, both measures capture the performance attributes of the enterprises and are likely to be well correlated with many other attributes of informal enterprises (Böhme and Thiele, 2014; Shibia and Barako, 2017; Ackah et al., 2020; Owoo et al., 2024). Second, both measures are continuous variables and help capture the notion of a continuum, with the former being associated with the size definition of informal enterprises (Maloney, 2004; Aga et al., 2023), while the latter measures labor productivity, low levels of which are associated with high informality (La Porta and Shleifer, 2014; Aga et al., 2023; Aberra et al., 2022). Finally, among the continuous variables, these two were the most similar across the two surveys in how the information was solicited. The following two main equations were estimated:

$$Y_i = \alpha + \sum_{k=1}^{K} \beta_k X_{ki} + \varepsilon_i \qquad (4)$$

$$Y_i = \alpha + \gamma_1 S_i + \gamma_2 L_i + \gamma_3 \,(S_i \times L_i) + \sum_{k=1}^{K} \beta_k X_{ki} + u_i \qquad (5)$$

Both equations (4) and (5) were estimated using Weighted Least Squares. Inference on model parameters, including uncertainty estimation and hypothesis testing, is conducted within the pseudo maximum likelihood framework (see Appendix 3). In both equations, $Y_i$ denotes the logarithm of sales or the logarithm of sales per worker (labor productivity) and $X_{ki}$ represents a vector of independent variables, while $\varepsilon_i$ and $u_i$ are the error terms. In equation (4), $\alpha$ and $\beta_k$ are the parameters to be estimated, namely the constant term and the coefficient of the k-th explanatory variable, respectively. We run the regressions separately for the HS-IME and the ISES using equation (4) and compare their parameter estimates in terms of sign, magnitude, and statistical significance.

In equation (5), we perform similar regressions but with pooled data from both the ISES and the HS-IME. Pooling the two datasets helps to explore whether enterprises captured by one survey type differ from those captured by the other in terms of sales and sales per worker. Additionally, including an interaction between the survey type and location (city) variables allows us to assess whether any such difference is location specific. In equation (5), $S_i$ and $L_i$ represent the survey type and location, respectively. The coefficients of household, enterprise, and owner characteristics are denoted by $\beta_k$, while $\gamma_1$ and $\gamma_2$ are the parameters for survey type and location, respectively. We also examine the interaction effect of survey type and location on enterprise performance, the coefficient of which is denoted by $\gamma_3$.

Table 3 provides detailed descriptions of all the variables used in the analysis, including their expected signs, determined based on the existing literature on informal enterprises. In cases where we could not find relevant literature, we relied on our intuition and casual observation in deciding on the expected sign. Table A1 provides details on how the variables were measured in the respective surveys. We define an enterprise as informal if it is not registered with the appropriate government agency.

5. Results and discussions
5.1. Descriptive results
In this section, we rely on descriptive statistics to compare the characteristics of the informal enterprises sampled in the HS-IME and the ISES from the two cities. The comparison is done both between locations and between survey types, exploring differences and similarities in characteristics and performance across these two dimensions.
These comparisons could provide insights into whether the suitability of the two approaches is location- or context-specific, and an entry point for discussing the circumstances under which one of the survey approaches may be preferred to the other. The analysis uses both point and interval estimates (i.e., confidence intervals) for each variable. While the point estimates provide a simple and straightforward means of comparing the characteristics, the interval estimates indicate the precision of the estimates from the two surveys and locations. In general, the results show minimal deviations in the estimates (both point and interval) from the two surveys in Kumasi. In Tamale, however, we observe wider deviations, with non-overlapping confidence intervals for some of the variables as well as wide confidence intervals, particularly in the case of the ISES.

5.1.1. The estimated population of informal enterprises from HS-IME and ISES
Using the sample data from the HS-IME and the ISES, we estimate the population of informal enterprises for the two locations. Table 4 shows that the HS-IME projected a population of informal enterprises close to 120,000 in Kumasi and 89,000 in Tamale, while the ISES projected approximately 68,000 and 21,000, respectively. The differences are large, with no overlap between the respective confidence intervals. While the true population counts are unknown, 19 the HS-IME estimates align with the estimates of the GLSS 7. 20 However, the differences may partly arise from design and implementation issues leading to a risk of undercounting when using the ACS sampling approach or overcounting when using the HS-IME.
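A population count of this kind is a weighted total: each sampled enterprise that is informal and active contributes its survey weight. The sketch below illustrates the idea for the household survey side, assuming a two-stage design weight of the form described for equation (2) in Section 3.1.5. The function names, argument names, and all numbers are hypothetical, and the sketch omits variance estimation, which in practice must reflect the stratified cluster design.

```python
def design_weight(stratum_mos, n_sampled_eas, ea_mos, ea_listed_households, take=15):
    """Two-stage design weight (assumed form, for illustration only).

    stratum_mos: total households across all EAs in the stratum (frame count)
    n_sampled_eas: number of EAs sampled in the stratum
    ea_mos: households in the sampled EA according to the frame
    ea_listed_households: households found in the EA during listing
    take: households selected with equal probability within each sampled EA
    """
    first_stage = stratum_mos / (n_sampled_eas * ea_mos)  # inverse of PPS probability
    second_stage = ea_listed_households / take            # inverse of within-EA probability
    return first_stage * second_stage


def estimate_total(records):
    """Weighted count of enterprises flagged as informal and active.

    records: iterable of (weight, is_informal_and_active) pairs, one per
    sampled enterprise.
    """
    return sum(weight for weight, flag in records if flag)


# Hypothetical EA: frame size 150 households, 180 found at listing.
w = design_weight(stratum_mos=30000, n_sampled_eas=20, ea_mos=150,
                  ea_listed_households=180)

# Three hypothetical sampled enterprises; the second is not informal.
records = [(w, True), (w, False), (80.0, True)]
estimated_population = estimate_total(records)
```

Under these made-up inputs the design weight is 120, so the two informal enterprises together represent an estimated 200 enterprises in the population.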
Potential undercounting by ISES
Sampling enterprises based on their geographic location means that those with no physical location (that is, mobile enterprises such as hawkers and taxi services) will be captured only to the extent that they are observed in the moment by an enumerator. Moreover, even for enterprises with a fixed physical location, some may not be captured if the survey hours fall outside their peak operating period and/or coincide with their downturns. 21 Some of these enterprises also do not maintain a visible presence, meaning enumerators are less able to list them when they are not in operation. In all these scenarios, the HS-IME has a high potential to capture the enterprises concerned because its sampling method is not based on enterprise location.

19 A potential ideal benchmark for comparing the two estimates would be an exhaustive economic establishment census, including data on informality. Such a census was conducted in Ghana in 2024, but the data were not available at the time of drafting this paper.
20 Results from this analysis of the GLSS 7 data are available upon request.
21 Of the ISES interviews, 99% started between 8 AM and 6 PM.

Potential overcounting by HS-IME
On the other hand, the HS-IME approach may be prone to double counting of informal enterprises co-owned by multiple households. The HS-IME method also relies on respondents' own elaboration of their different business activities, meaning that activities that are similar and sometimes integrated may be listed separately (for example, an operator who both sells palm wine and distills it into Akpeteshie, a local homemade spirit, could be listed twice in the HS-IME but would be observed only once in the ISES). To test these hypotheses, further analysis was conducted on the analytical sample: 80% of households operate only one NFE, 17% of households have 2 NFEs, and 3% have 3 or 4 NFEs.
Among the NFEs that belong to households with multiple enterprises, 99% are unique in terms of location and owner, while 93% are unique in terms of location (different GPS coordinates), so the HS-IME may not be overestimating informal NFE counts for these reasons.

5.1.2. Characteristics of main owner
The analysis of the characteristics of owners from the two surveys shows that, on average, the owners of informal enterprises in Kumasi are older (42 years for ISES and 43 years for HS-IME), are better educated, and belong to smaller households than those in Tamale (Figure 6). However, the difference in the proportion of women-owned enterprises between the two surveys is not statistically significant. This finding diverges from the results reported by Kagy et al. (2023), who identified a significant disparity in female ownership representation between the two types of surveys in Sub-Saharan Africa. 22 They attribute this difference to the sampling criteria used in enterprise surveys, which predominantly target visible enterprises in densely populated commercial areas, potentially leading to an underestimation of female ownership. The field implementation protocol of the ISES mandates the identification of informal enterprise activities within all households in each surveyed block, likely resulting in a more accurate representation of women engaged in informal enterprises. The divergence likely stems from the methodology of Kagy et al. (2023), which compares surveys conducted in varied settings, cities, and time periods, unlike our paper. This highlights that a valid comparison between the two survey types requires conducting them in a similar setting, as our study did. We observe significant differences in the incidence of sole ownership only in Tamale (95% for ISES and 99% for HS-IME in Tamale).
The differences in gender, age, and household size across the two methods in Kumasi are not as large as those observed in Tamale. In general, the estimates from both surveys in Kumasi are associated with smaller confidence intervals than those in Tamale (Figure 6). Unlike in Kumasi, we observe larger confidence intervals for the ISES than for the HS-IME in Tamale. In the case of age and household size, the confidence intervals from the two surveys do not overlap. Interestingly, the ISES yielded smaller confidence intervals on age, education, and gender in Kumasi. It thus appears that, based on the characteristics presented in Figure 6, the ISES provides slightly narrower confidence intervals than the HS-IME for Kumasi, while for Tamale the reverse is true, with much wider confidence intervals for the ISES.

22 Even though Kagy et al. (2023) use data from both formal and informal enterprise surveys, these results hold for the comparison between multi-topic household surveys and informal sector enterprise surveys.

5.1.3. Characteristics of the workers
The results in Figure 7 show smaller confidence intervals for the ISES estimates in Kumasi than for the HS-IME estimates in the same location, particularly with regard to the total number of workers and the proportion of workers that are female. In contrast, the estimates from the ISES in Tamale, especially for the number of workers and the proportion of female workers, have much wider confidence intervals than those from the HS-IME in Tamale. Interestingly, we observe a very wide deviation between the point estimates from the HS-IME and the ISES on the number of workers in Tamale, with confidence intervals that are far apart. For the other characteristics in Figure 7 (especially literate workers), however, the deviations in the point estimates between the HS-IME and the ISES in Tamale are smaller than in Kumasi.
Thus, Figure 7 in general shows that the ISES in Tamale yielded wider confidence intervals on the characteristics of workers, though there are instances where the deviations between the estimates from the HS-IME and the ISES are minimal. The comparison of worker characteristics between Kumasi and Tamale corroborates the earlier point that the informal enterprises in Tamale may be ‘more informal’ than their counterparts in Kumasi. For example, the share of literate workers (able to read and write) in Tamale is much smaller than in Kumasi (Figure 7). Figure 7 also indicates that the enterprises in Tamale are likely to have a much smaller number of workers and a larger share of female employees.

5.1.4. Sector of operation and other characteristics
Figure 8 presents the performance indicators (sales and sales per worker) and the sectoral distribution of the informal enterprises in the two cities. The data show close estimates of average sales and sales per worker from the HS-IME and the ISES in Kumasi, though we observe relatively large deviations in Tamale, with wider confidence intervals for the estimates from the ISES. Figure 8 further shows that, in relative terms, there are more manufacturing enterprises in Tamale (22% for ISES and 29% for HS-IME) than in Kumasi (17% for ISES and 22% for HS-IME). The figure also shows more retail (59% for ISES and 61% for HS-IME) and services (25% for ISES and 19% for HS-IME) enterprises in Kumasi than in Tamale (retail: 59% for ISES and 52% for HS-IME; services: 19% for ISES and 19% for HS-IME). The ISES estimates in Tamale have much wider confidence intervals than those from the HS-IME in the same location, with relatively large deviations between the point estimates from the two surveys, except for enterprises in the service sector. While we observe relatively large deviations between the point estimates from the two surveys in Kumasi, the confidence intervals from the ISES are relatively small.
Specifically, for the share of retail and manufacturing enterprises in Kumasi, the confidence intervals of the estimates from the ISES are smaller than those from the HS-IME.

Figure 9 provides information on other characteristics of the enterprises in the two study locations. The figure shows that, irrespective of the survey type, informal enterprises in Kumasi have a higher incidence of bank account ownership, loan application, need for electricity, and membership in business associations than their counterparts in Tamale. The informal enterprises in Kumasi also appear to have been operating longer than those in Tamale. However, the incidence of mobile phone usage is slightly higher among the enterprises in Tamale than among those in Kumasi, while the proportion of enterprises that operate at home is higher in Tamale than in Kumasi. Except for mobile phone usage, the results in Figure 9 corroborate the earlier assertion that the informal enterprises in Tamale appear ‘more informal’ than those in Kumasi. The point estimates in Figure 9 from the two surveys generally show larger deviations in Tamale than in Kumasi, consistent with those observed for the characteristics presented earlier.

Overall, the descriptive analysis of the various characteristics (owner, worker, sector, and other characteristics) suggests that the point estimates and their confidence intervals from the two surveys largely converge in Kumasi. In Tamale, however, significant differences are observed more often than not, with considerably larger confidence intervals for the ISES. These variations in the estimates exist despite the advantage of the larger sample sizes used in the ISES.

5.2. Econometric results
5.2.1. Correlates of enterprise performance: ISES versus HS-IME
Table 5 presents the regression results on the correlates of the log of sales and the log of sales per worker (hereafter sales and labor productivity, respectively) separately for the two surveys (see Tables A2 and A3 for location-specific results by survey type). Sales and labor productivity capture the performance spectrum of enterprises. The results from the HS-IME are presented in columns 1-6 of Table 5, where columns 1-3 are for sales and columns 4-6 for labor productivity. Columns 7-12 present the results from the ISES, with columns 7-9 for sales and columns 10-12 for labor productivity. Columns 1 and 7 are the naïve results, while columns 2 and 8 control for the month of interview.

From the results in columns 3, 6 and 9, we find seven predictors (Tamale, total number of workers, ownership of a bank account, operating at home, mobile operation (i.e., no fixed location), use of mobile phones, and years of operation) with statistically significant coefficients for both the HS-IME and the ISES. While the signs of the coefficients of these variables appear fairly consistent across the two surveys, their magnitudes differ substantially. There are other statistically significant survey-specific correlates of sales. In particular, three correlates (manufacturing, retail, and membership of an association) are statistically significant in the HS-IME but not in the ISES. The sector-related variables (i.e., manufacturing and retail) are positively associated with sales, while membership of an association is negatively associated with sales. We also observe three variables (household size, proportion of workers that are female, and having applied for a loan) that are statistically significant drivers of sales in the ISES model but play no role in influencing sales in the HS-IME, though they have the expected signs.
On the labor productivity results, we see that the total number of workers, operating from inside the home, and mobile operation (i.e., no fixed location) are negatively associated with productivity, while the retail sector, business bank account ownership, use of mobile phones, and years of operation are positively associated with productivity, irrespective of the survey type. In addition, in the HS-IME, an owner with no schooling and membership of an association are negatively associated with labor productivity. For the ISES, enterprises located in Tamale or with a higher share of female workers are less productive, whereas those that applied for loans are more productive (Table 5, column 12).

The above results show that while the two surveys generally produce similar correlates of informal enterprise performance, there are also several areas of divergence, particularly in terms of the level of statistical significance of these correlates. Two perspectives may arise from the divergence. First, the divergence may be indicative of the differences in sampling approaches. Second, the differences in the statistically significant correlates between the two surveys may suggest that each approach captures different types of enterprises. For example, we observe from Table 5 that the location variable (i.e., Tamale) in the HS-IME is significant and negatively associated with sales (column 3), while it has no significant relationship with sales per worker (column 6). In contrast, for the ISES, the location variable has a statistically significant and negative relationship with both sales²³ and labor productivity. Section 5.2.2 explores this issue in more detail by pooling the two datasets and examining whether the survey type and location affect sales and labor productivity.

It is important to note that the signs of the variables with statistically significant coefficients in the HS-IME or ISES are largely consistent with the existing literature (see Table 3).
An exception, however, is membership of an enterprise association, which has a statistically significant and negative relationship with sales in the HS-IME. This suggests that enterprises that belong to associations tend to be smaller in terms of sales and, hence, are likely to be more informal. This result appears counterintuitive but seems to be driven by the enterprises in Tamale, where many program interventions by several entities, particularly nongovernmental organizations, often encourage enterprises to belong to associations. This is confirmed by the location-specific results in Tables A2 and A3, which indicate that membership of an association is statistically significant with a negative sign in Tamale but not statistically significant in Kumasi. Also, the negative sign of the location variable in Table 5 supports this argument and further suggests that the informal enterprises in Tamale are relatively smaller and may be more informal than their counterparts in Kumasi, as alluded to earlier in the descriptive results.

²³ Note that the relationship between location and sales is statistically significant only at the 10% level, while the relationship between location and productivity is statistically significant at the 1% level.

5.2.2. Correlates of enterprise performance in the pooled sample

Table 6 reports the results from the regressions that use pooled data from the HS-IME and ISES. Using the pooled data helps to directly examine the relationship between the survey type and the dependent variables (sales and sales per worker) while controlling for location as well as an interaction effect between location and survey type. The results in Table 6 show that the enterprises captured in the ISES have, on average, lower sales than those captured by the HS-IME (see columns 1-3).
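To give a sense of scale for the pooled estimates, the sketch below converts the survey-type, location, and interaction coefficients into gaps relative to an HS-IME enterprise in Kumasi. The three coefficients are transcribed from column 3 of Table 6; the function names and the exp(b) - 1 conversion from log points to a proportional gap are assumptions made here for illustration, not part of the authors' analysis. The sketch ignores standard errors, and the interaction term it uses is not statistically significant.

```python
import math

# Coefficients transcribed from column 3 of Table 6 (dependent variable: log of sales).
B_ES_SURVEY = -0.512         # enterprise captured in the ISES (base: HS-IME)
B_TAMALE = -0.335            # enterprise located in Tamale (base: Kumasi)
B_ES_SURVEY_TAMALE = 0.029   # interaction of the two dummies (not significant)


def log_gap(es_survey: int, tamale: int) -> float:
    """Log-point gap in sales relative to an HS-IME enterprise in Kumasi."""
    return (B_ES_SURVEY * es_survey
            + B_TAMALE * tamale
            + B_ES_SURVEY_TAMALE * es_survey * tamale)


def pct_gap(log_points: float) -> float:
    """Convert a log-point gap into a proportional gap, exp(b) - 1."""
    return math.exp(log_points) - 1.0


if __name__ == "__main__":
    # Print the implied gap for each survey-by-city cell.
    for es in (0, 1):
        for tam in (0, 1):
            gap = log_gap(es, tam)
            label = f"{'ISES' if es else 'HS-IME'}, {'Tamale' if tam else 'Kumasi'}"
            print(f"{label:16s} {gap:+.3f} log points  {100 * pct_gap(gap):+.1f}%")
```

Read this way, a coefficient of about -0.5 log points corresponds to sales roughly 40 percent lower, holding the other covariates fixed. The same arithmetic applies to the labor productivity columns once the coefficients from column 6 are substituted.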
Between the two locations, the results in columns 1-3 of Table 6 further indicate that the enterprises in Tamale have lower sales on average, which is consistent with the results from the HS-IME reported in Table 5. However, although both location and survey type have a statistically significant relationship with sales, we do not find a statistically significant interaction effect between these two variables. This suggests that the relationship between survey type and sales may have little to do with location or, equivalently, that the relationship between sales and location may not depend on the survey type.

Also consistent with the results on sales is the finding in columns 4-6 of Table 6 that the enterprises in the ISES have lower labor productivity, on average, than those in the HS-IME. In other words, the statistically significant difference in productivity levels between the two survey types largely aligns with the argument that the enterprises captured by the ISES may be relatively more informal than their HS-IME counterparts. The results in columns 4-6 of Table 6 do not show a robust relationship between location and labor productivity, consistent with the HS-IME results in Table 5. Similar to the findings on sales, column 6 of Table 6 shows that the interaction between location and survey type has no statistically significant relationship with labor productivity.

There are two important insights worth highlighting from the results on the other characteristics in the pooled regressions. First, the results on the other characteristics are generally in line (in terms of statistical significance and direction of effect) with what we observe for the separate survey-specific results in Table 5. Second, and more importantly, the sign of each of the statistically significant correlates in Table 6 for both sales and sales per worker is consistent with the characterization of informality in the existing literature.
For example, the total number of workers, which is also an indication of the size of enterprises, is positively related to sales, and this aligns with the literature suggesting that informal enterprises are mostly micro or small in size (see La Porta and Shleifer, 2014; Aga et al., 2023). Arguably, however, this result further indicates that the fewer workers an informal enterprise has, the higher its level of informality. Ownership of a bank account, use of loans, and use of mobile phones are all positively associated with sales, implying that access to financial and telecommunication services is higher among informal enterprises with higher sales. Thus, in relative terms, the more informal enterprises have lower access to financial and telecommunication services, which is also consistent with the extant literature on informality (Koeda and Dabla-Norris, 2008; Bardasi et al., 2011; Kanbur, 2017; Benhassine et al., 2018; Campos et al., 2018). However, home-based and mobile operation, as well as the employment of female workers, are negatively associated with sales; these factors are known to be defining characteristics of informal enterprises (Bardasi et al., 2011; Aberra et al., 2022) and, hence, of the level of informality.

The signs of the coefficients of the correlates of labor productivity are generally similar to those of sales, except for the total number of workers, which is positively associated with sales but negatively associated with productivity. This suggests that informal enterprises that are small may be more productive than larger informal enterprises, and this may help explain why informal enterprises often remain small or micro in nature.

6. Conclusions and implications

This study draws on data from two surveys that employ different survey methodologies, including different sampling approaches, for measuring and understanding informality.
The two surveys were conducted in the same cities in Ghana (Kumasi and Tamale) at approximately the same time, which allows us to investigate the potential differences and complementarities between the two approaches and their relative suitability. The study finds several important results with significant implications for the design and implementation of informal sector enterprise surveys, particularly with regard to enhancing our understanding and measurement of informality in LMICs.

First, we find that the estimates of the population count from the two surveys deviate significantly: the HS-IME provides larger estimates that are more aligned with the GLSS, while the ISES provides significantly lower estimates, with confidence intervals that do not overlap in either location. The similarity between the GLSS and HS-IME estimates does not necessarily confirm their accuracy, but it does suggest that the HS-IME estimates are corroborated by a comparable survey approach. The differences between the HS-IME and ISES estimates may partly arise from design and implementation issues leading to a risk of undercounting when using the ACS approach or overcounting when using the HS-IME methodology. For example, the ISES enumerates enterprises that are observed in real time, while the HS-IME asks whether the enterprise is currently operating without the need to observe its operation in real time. This narrow time frame in the ISES can make it challenging to time the survey to coincide with peak operating hours for businesses, potentially leading to underestimation of low-intensity enterprises, such as night markets, that operate outside the survey period. These results imply that further improvements and safeguards can be applied to ISES methodologies to reduce the risk of undercounting. These could include aligning the time windows of enumeration with peak hours and adding methods to estimate and capture, say, mobile enterprises that moved on before they could be listed.
The HS-IME approach, on the other hand, may be prone to double counting of informal enterprises co-owned by multiple households; hence, the questionnaire design should include questions on whether the household co-owns the enterprise with non-household members and where those non-household members reside.

Second, the descriptive analysis shows minimal deviations in both point and interval estimates in Kumasi across the two surveys. These findings suggest that, for Kumasi, the two methods are largely consistent with each other in providing a descriptive characterization of informal enterprises, though this is not the case for Tamale.

Third, the direction of the deviations in the descriptive statistics between the two locations generally suggests that the informal enterprises in Tamale may have lower performance than their counterparts in Kumasi. The results from the econometric analyses corroborate this, indicating that informal enterprises in Tamale have significantly lower sales and labor productivity than those in Kumasi.

Fourth, the econometric analysis shows largely consistent results from the two surveys and across both locations regarding the key correlates or drivers of enterprise performance, and these results are in line with the characterization of informality in the extant literature. We argue that the few areas of divergence in the determinants from the two surveys reflect the differences between the two approaches for understanding informality.

Fifth, the econometric analysis shows that informal enterprises surveyed in the ISES have lower performance than those from the HS-IME, after controlling for a set of covariates. This means that, in terms of performance, the informal enterprises captured in the ISES may be more informal than their counterparts in the HS-IME. This argument is anchored on the basic logic that the more informal an enterprise is, the poorer its performance.
Sixth, a finding that is consistent across the two surveys is that the number of workers is negatively correlated with labor productivity. This suggests that informal enterprises that are small (in terms of workers employed) may be more productive than larger ones. In this sense, the two survey approaches may be complementary in explaining why informal enterprises often remain small or micro.

Finally, on the development of survey tools, it is important to harmonize the questions meant to capture informality in the two survey approaches. With harmonized questions, the variables used to characterize informal businesses and assess their performance should be consistent across the two surveys, with minimal deviations.

References

Aberra, A., Aga, G., Jolevski, F., & Karalashvili, N. (2022). Understanding informality: Comprehensive business-level data and descriptive findings. Policy Research Working Paper 10208. World Bank, Washington, DC.
Ackah, C. G., Atta-Ankomah, R., & Appiah Kubi, J. (2020). Management practices and performance improvement in manufacturing enterprises: The case of Kaizen adoption in Ghana. In Hosono et al. (eds.), Workers, Managers, Productivity: Kaizen in Developing Countries, 269-292.
Aga, G. A., & Reilly, B. (2011). Access to credit and informality among micro and small enterprises in Ethiopia. International Review of Applied Economics, 25(3), 313-329.
Aga, G., Campos, F., Conconi, A., Davies, E., & Geginat, C. (2021). Informal firms in Mozambique. Policy Research Working Paper 9712. World Bank, Washington, DC.
Aga, G., Francis, D. C., Jolevski, F., Rodriguez Meza, J., & Wimpey, J. S. (2023). An application of adaptive cluster sampling to surveying informal businesses. Journal of Survey Statistics and Methodology, 11(5), 1246-1266.
Bardasi, E., Sabarwal, S., & Terrell, K. (2011). How do female entrepreneurs perform? Evidence from three developing regions. Small Business Economics, 37(4), 417.
Benhassine, N., McKenzie, D., Pouliquen, V., & Santini, M. (2018). Does inducing informal firms to formalize make sense? Experimental evidence from Benin. Journal of Public Economics, 157, 1-14.
Benjamin, N. C., & Mbaye, A. A. (2012). The informal sector, productivity, and enforcement in West Africa: A firm‐level analysis. Review of Development Economics, 16(4), 664-680.
Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review/Revue Internationale de Statistique, 279-292.
Böhme, M. H., & Thiele, R. (2014). Informal–formal linkages and informal enterprise performance in urban West Africa. The European Journal of Development Research, 26, 473-489.
Campos, F., Goldstein, M., & McKenzie, D. (2018). How should the government bring small firms into the formal system? Experimental evidence from Malawi. Policy Research Working Paper 8601. World Bank, Washington, DC.
Charmes, J. (2002). Estimation and survey methods for the informal sector. University of Versailles, Versailles, France.
Chu, A., Brick, J. M., & Kalton, G. (1999). Weights for combining surveys across time or space. Bulletin of the International Statistical Institute, 2, 103-104.
De Mel, S., McKenzie, D., & Woodruff, C. (2008). Returns to capital in micro-enterprises: Evidence from a field experiment. Quarterly Journal of Economics, 123(4), 1329-1372.
De Mel, S., McKenzie, D., & Woodruff, C. (2009). Are women more credit constrained? Experimental evidence on gender and microenterprise returns. American Economic Journal: Applied Economics, 1(3), 1-32.
Deen-Swarray, M., Moyo, M., & Stork, C. (2013). ICT access and usage among informal businesses in Africa. Emerald Insight, 15(5), 52-68.
Deville, J., & Lavallée, P. (2006). Indirect sampling: The foundations of the generalized weight share method. Survey Methodology, 32(2), 165.
Frosch M. (2025) Williams Revealing the unseen: The 21st ICLS statistical standards on the informal economy. Statistical Journal of the IAOS. 2025;40(4):767-785. doi:10.3233/SJI-240058
Ghana Statistical Service. (2015). 2015 Labor Force Report. Accra, Ghana. Retrieved from http://www.statsghana.gov.gh/docfiles/publications/Labor_Force/LFS report_fianl_21-3-17.pdf
Ghana Statistical Service. (2022). Ghana annual household income and expenditure survey. Quarterly labor force report. Retrieved: July 29, 2024. https://www.statsghana.gov.gh/gssmain/fileUpload/pressrelease/AHIES%202022%20Q1%20and%20Q2%20Labour%20Force%20Report.pdf
Good Governance Africa. (2023). Assessing Opportunities for the Sustainable Integration of Ghana’s Informal Sector Contributions into Socio-Economic Development of Ghana. West Africa Regional Office, Osu-Accra, Ghana.
Grimm, M., Hartwig, R., & Lay, J. (2013). Electricity access and the performance of micro and small enterprises: Evidence from West Africa. The European Journal of Development Research, 25, 815-829.
Hussmanns, R. (2004). Measuring the informal economy: From employment in the formal sector to informal employment. ILO Working Papers 993750003402676, International Labor Organization.
Hussmanns, R. (2009). Informal sector surveys: Advantages and limitations of different survey methods and survey designs for the data collection. ILO Bureau of Surveys.
International Labor Organization (ILO) (2023). Resolution concerning statistics on the informal economy, 21 International Labour Conference, Geneva.
Kagy, G., Hardy, M., & Jimi, N. (2023). Mind the data gaps: An examination of women-owned enterprise representation.
Kanbur, R. (2017). Informality: Causes, consequences and policy responses. Review of Development Economics, 21(4), 939-961.
Kish, L. (1988). Multipurpose sample designs. Survey Methodology, 14(1), 19-32.
Kish, L. (1992). Weighting for unequal Pi. Journal of Official Statistics, 8, 183-200.
Koeda, J., & Dabla-Norris, E. (2008). Informality and bank credit: Evidence from firm-level data. IMF Working Paper WP/08/94.
La Porta, R., & Shleifer, A. (2014). Informality and development. Journal of Economic Perspectives, 28(3), 109-126.
Lohr, S. (2021). Multiple-frame surveys for a multiple-data-source world. Survey Methodology, 47(2), 229-263.
Lumley, T., & Scott, A. (2017). Fitting regression models to survey data. Statistical Science, 265-278.
Maligalig, D. S., & Guerrero, M. F. (2008). How can we measure the informal sector. Philippine Statistical Association, Inc.
Maloney, W. F. (2004). Informality revisited. World Development, 32(7), 1159-1178.
McCaig, B., & Pavcnik, N. (2021). Entry and exit of informal firms and development. IMF Economic Review, 69, 540-575.
Ndoya, H., Okere, D., laure Belomo, M., & Atangana, M. (2023). Does ICTs decrease the spread of informal economy in Africa? Telecommunications Policy, 47(2), 102485.
Nishimura, R. (2015). Substitution of Nonresponding Units in Probability Sampling (Doctoral dissertation). University of Michigan, Ann Arbor, USA.
O’Muircheartaigh, C., & Pedlow, S. (2002). Combining samples vs. cumulating cases: A comparison of two weighting strategies in NLSY97. American Statistical Association Proceedings of the Joint Statistical Meetings, 2557-2562.
Owoo, N. S., Amankwah, A., Castaing, P., & Palacios-Lopez, A. (2024). Household business performance in Ghana: The role of personality traits and gender role attitudes. Policy Research Working Paper 10804. World Bank, Washington, DC.
Peprah, V., Buor, D., & Forkuor, D. (2019). Characteristics of informal sector activities and challenges faced by women in Kumasi Metropolis, Ghana. Cogent Social Sciences, 5(1), 1656383.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review/Revue Internationale de Statistique, 317-337.
Scott, K., Steele, D., & Temesgen, T. (2005). Chapter XXIII: Living Standards Measurement Study Surveys. Household sample surveys in developing and transition countries.
Shibia, A. G., & Barako, D. G. (2017). Determinants of micro and small enterprises growth in Kenya. Journal of Small Business and Enterprise Development, 24(1), 105-118.
Skinner, C. J. (1989). Domain means, regression and multivariate analysis. In Skinner, C. J., Holt, D., & Smith, T. M. F. (eds.), Analysis of Complex Surveys, 59-88. Chichester, UK; New York, USA: Wiley.
Thompson, S. K. (1990). Adaptive cluster sampling based on order statistics. Environmetrics, 7(2), 123-133.
Thompson, S. K. (1991). Stratified adaptive cluster sampling. Biometrika, 78(2), 389-397.
Ulyssea, G. (2020). Informality: Causes and consequences for development. Annual Review of Economics, 12, 525-546.
Williams, C. C., & Kedir, A. M. (2019). Explaining cross-country variations in the prevalence of informal sector competitors: Lessons from the World Bank Enterprise Survey. International Entrepreneurship and Management Journal, 15(3), 677-696.
World Bank (2022). Enterprise surveys sampling methodology. https://www.enterprisesurveys.org/content/dam/enterprisesurveys/documents/methodology/Sampling_Note-Consolidated-2-16-22.pdf.

Figure 1. Map of sub-metros selected in Kumasi for the HS.

Figure 2. Map of districts selected in Tamale for the HS.

Figure 3. Sample distribution of the weights of active informal NFEs in the delineated area from the HS-IME.

Figure 4. Stratification of grids for the ISES – Kumasi. Note: The districts covered are Kumasi Metropolitan Area, Asokore Mampong Municipal, Asokwa Municipal, Kwadaso Municipal, Oforikrom Municipal, Old Tafo Municipal, and Suame Municipal.

Figure 5. Stratification of grids for the ISES – Tamale. Note: The districts covered are Tamale Metropolitan and Sagnerigu. However, the sparsely populated areas within these districts have been excluded.

Figure 6. Characteristics of the main owner by survey type.

Figure 7. Characteristics of the workers by survey type.

Figure 8. Performance and sector of operation.

Figure 9.
Other characteristics of the informal enterprises.

Table 1. Sample distribution in the HS-IME.

Kumasi

| Stratum | Planned EAs | Planned HHs | Respondent HHs | Replacement HHs | Total interviewed HHs |
|---|---|---|---|---|---|
| Asokore Mampong | 7 | 105 | 88 | 17 | 105 |
| Asokwa | 6 | 90 | 88 | 2 | 90 |
| KMA-Bantama | 6 | 90 | 81 | 9 | 90 |
| KMA-Manhyia North | 5 | 75 | 72 | 3 | 75 |
| KMA-Manhyia South | 5 | 75 | 72 | 3 | 75 |
| KMA-Nhyiaeso | 6 | 90 | 90 | 0 | 90 |
| KMA-Subin | 5 | 75 | 73 | 2 | 75 |
| Kwadaso Municipal | 7 | 105 | 97 | 8 | 105 |
| Oforikrom | 7 | 105 | 95 | 10 | 105 |
| Old Tafo Municipal | 6 | 90 | 83 | 7 | 90 |
| Suame Municipal | 7 | 105 | 101 | 4 | 105 |
| Total | 67 | 1,005 | 940 | 65 | 1,005 |

Tamale

| Stratum | Planned EAs | Planned HHs | Respondent HHs | Replacement HHs | Total interviewed HHs |
|---|---|---|---|---|---|
| CORE.SAGNERIGU_Mod | 12 | 180 | 177 | 3 | 180 |
| Core.Tamale Central | 11 | 165 | 162 | 3 | 165 |
| CORE.TAMALE SOUTH_Mod | 11 | 165 | 161 | 4 | 165 |
| Core Total | 34 | 510 | 500 | 10 | 510 |
| Ring.Central Gonja | 4 | 60 | 57 | 3 | 60 |
| Ring.Gushiegu | 2 | 30 | 30 | 0 | 30 |
| Ring.Kumbungu | 6 | 90 | 87 | 3 | 90 |
| Ring.Mion | 2 | 30 | 29 | 1 | 30 |
| Ring.Nanton | 4 | 60 | 59 | 1 | 60 |
| Ring.North East Gonja | 3 | 45 | 45 | 0 | 45 |
| Ring.Savelugu | 5 | 75 | 73 | 2 | 75 |
| Ring.Tamale South | 2 | 30 | 29 | 1 | 30 |
| Ring.Tolon | 5 | 75 | 75 | 0 | 75 |
| Ring Total | 33 | 495 | 484 | 11 | 495 |
| Total | 67 | 1,005 | 984 | 21 | 1,005 |

Note: In each EA, 3 households were originally selected for replacements, though the actual replacements were more in some EAs than expected. These differences were accounted for in the final weights calculation.

Table 2. Sample distribution in the ISES.

| City | Strata | Universe of blocks | Starting blocks enumerated | Total number of blocks enumerated | Informal business units enumerated | Average no. of informal business units per block | Interviews completed |
|---|---|---|---|---|---|---|---|
| Kumasi | Residential | 6400 | 420 | 461 | 2329 | 5 | 439 |
| Kumasi | Central Business District | 109 | 60 | 81 | 1367 | 17 | 98 |
| Kumasi | Other Business Districts | 1627 | 325 | 526 | 7146 | 14 | 907 |
| Kumasi | Markets | 152 | 106 | 106 | 1891 | 18 | 136 |
| Kumasi | Everything else | 1478 | 91 | 99 | 510 | 5 | 122 |
| Kumasi | Total | 9766 | 1002 | 1273 | 13243 | | 1702 |
| Tamale | Residential | 7921 | 396 | 431 | 442 | 1.03 | 114 |
| Tamale | Central Business District | 182 | 100 | 166 | 1569 | 9.45 | 179 |
| Tamale | Other Business Districts | 1938 | 387 | 578 | 2666 | 4.61 | 467 |
| Tamale | Markets | 27 | 27 | 27 | 418 | 15.48 | 24 |
| Tamale | Everything else | 14801 | 74 | 77 | 18 | 0.23 | 12 |
| Tamale | Total | 24869 | 984 | 1279 | 5113 | | 796 |

Table 3. Variable definitions and expected signs in the regressions for sales and sales per worker.

| Variable | Description | Expected sign | Supporting literature for the expected signs* |
|---|---|---|---|
| Dependent variables: | | | |
| Log of sales | Log of sales in the last month of operation | NA | - |
| Log of sale per worker | Log of sales per worker in the last month of operation | NA | - |
| Independent variables: | | | |
| es_survey | NFE was captured in enterprise survey (base = household survey) | -/+ | Author’s intuition |
| tamale | NFE is located in Tamale (base = Kumasi) | - | Author’s intuition |
| es_survey_tamale | Interaction variable between es_survey and tamale | - | Author’s intuition |
| age_owner | Age of the main owner | + | La Porta and Shleifer, 2014; Aga et al. 2023 |
| noschooling_owner | Main owner has never been to school | - | Maloney, 2004; La Porta & Shleifer, 2014; Aga et al. 2022 |
| female_owner | The main owner is female | - | Aga et al. 2023; Peprah et al. 2019; Bardasi et al. 2011; Aberra et al. 2022 |
| one_owner | Only one person owns the NFE | - | Aberra et al. 2022 |
| hhsize | Number of household members | -/+ | Author’s intuition |
| manufacturing2 | NFE is into manufacturing (base = services) | + | Koeda and Dabla-Norris, 2008; Williams et al. 2016; Kanbur 2017; Benhassine et al., 2018; Campos et al., 2018 |
| retail2 | NFE is into retail (base = services) | + | Koeda and Dabla-Norris, 2008; Williams et al. 2016; Kanbur 2017; Benhassine et al., 2018; Campos et al., 2018 |
| tot_workers | Total number of workers | + | Benjamin & Mbaye 2012 |
| share_fem_workers | Proportion of female workers among total workforce | - | Aga et al. 2023; Peprah et al. 2019 |
| share_workers_read | Proportion of workers who can read | + | Maloney, 2004; La Porta and Shleifer, 2014; Aga et al. 2023 |
| share_workers_write | Proportion of workers who can write | + | Maloney, 2004; La Porta and Shleifer, 2014; Aga et al. 2023 |
| bank_account | Has a separate bank account for the business | + | Koeda and Dabla-Norris, 2008; Kanbur 2017; Benhassine et al., 2018; Campos et al., 2018 |
| op_athome_inside | Location of operation is the household (only inside) | - | Aberra et al. 2022 |
| applied_loan | NFE applied for a loan in the past year | + | Aga & Reilly 2011; Bardasi et al. 2011; Kanbur 2017; Benhassine et al., 2018; Campos et al., 2018 |
| use_phones | Uses smartphones or cellphones | + | Deen-Swarray et al. 2013; Ndoya et al. 2023 |
| need_electricity | Need electricity to operate | + | Grimm et al. 2013; Aga et al. 2023 |
| years_operation | Years of business operation | + | Williams et al. 2016; McCaig & Pavcnik, 2021 |
| member_organization | The NFE is a member of or part of any organized association | +/- | Author’s intuition |
| Month of interview | Dummies for month NFE was interviewed | NA | - |

Notes: *Between sales and sales per worker, signs are expected to be the same for all the independent variables except in the case of total number of workers, for which we expect a positive sign for sales but a negative sign for sales per worker; NA stands for ‘not applicable’.

Table 4. Estimated population of informal enterprises from HS-IME and ISES.

| | HS-IME Count | HS-IME 95% CI_L | HS-IME 95% CI_U | HS-IME n | ISES Count | ISES 95% CI_L | ISES 95% CI_U | ISES n |
|---|---|---|---|---|---|---|---|---|
| Kumasi: Operating, formal | 13,983 | 9,298 | 18,667 | 50 | | | | |
| Kumasi: Operating, informal | 119,722 | 106,530 | 132,914 | 473 | 67,548 | 58,651 | 76,445 | 1,580 |
| Tamale: Operating, formal | 6,157 | 3,098 | 9,216 | 31 | | | | |
| Tamale: Operating, informal | 88,820 | 74,621 | 103,019 | 417 | 21,295 | 17,242 | 25,348 | 784 |

Table 5.
Regression results on correlates of sales and sales per worker by ISES and HS-IME data.

| Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Survey | HS-IME | HS-IME | HS-IME | HS-IME | HS-IME | HS-IME | ISES | ISES | ISES | ISES | ISES | ISES |
| Dependent variable | log_sale_w | log_sale_w | log_sale_w | log_sale_per_worker_w | log_sale_per_worker_w | log_sale_per_worker_w | log_sale_w | log_sale_w | log_sale_w | log_sale_per_worker_w | log_sale_per_worker_w | log_sale_per_worker_w |
| tamale | -0.425*** (0.102) | -0.454*** (0.110) | -0.362*** (0.123) | -0.210* (0.109) | -0.253** (0.114) | -0.170 (0.125) | -0.460*** (0.161) | -0.439*** (0.149) | -0.198* (0.113) | -0.598*** (0.144) | -0.574*** (0.143) | -0.301*** (0.115) |
| age_owner | | | -0.000 (0.005) | | | -0.001 (0.005) | | | -0.003 (0.004) | | | -0.005 (0.004) |
| noschooling_owner | | | -0.156 (0.115) | | | -0.206* (0.114) | | | -0.081 (0.158) | | | 0.139 (0.149) |
| female_owner | | | -0.003 (0.404) | | | -0.535 (0.343) | | | -0.064 (0.129) | | | -0.146 (0.123) |
| one_owner | | | 0.220 (0.338) | | | 0.310 (0.333) | | | 0.049 (0.243) | | | 0.273 (0.240) |
| hhsize | | | 0.014 (0.016) | | | 0.006 (0.016) | | | 0.026* (0.015) | | | 0.014 (0.013) |
| manufacturing2 | | | 0.233* (0.118) | | | 0.116 (0.114) | | | -0.039 (0.110) | | | -0.033 (0.112) |
| retail2 | | | 0.361*** (0.101) | | | 0.383*** (0.097) | | | 0.168 (0.115) | | | 0.295*** (0.111) |
| tot_workers | | | 0.077** (0.035) | | | -0.083*** (0.020) | | | 0.089** (0.035) | | | -0.145*** (0.028) |
| share_fem_workers | | | -0.546 (0.406) | | | 0.053 (0.353) | | | -0.397*** (0.133) | | | -0.260** (0.125) |
| share_workers_read | | | -0.157 (0.266) | | | -0.033 (0.270) | | | 0.009 (0.284) | | | 0.166 (0.256) |
| share_workers_write | | | 0.016 (0.248) | | | -0.049 (0.249) | | | 0.264 (0.275) | | | 0.149 (0.249) |
| bank_account | | | 0.475*** (0.081) | | | 0.417*** (0.083) | | | 0.709*** (0.111) | | | 0.697*** (0.102) |
| op_athome_inside | | | -0.476*** (0.116) | | | -0.367*** (0.107) | | | -0.230* (0.124) | | | -0.192* (0.116) |
| op_mobile | | | -0.335*** (0.095) | | | -0.271*** (0.098) | | | -0.439*** (0.117) | | | -0.358*** (0.107) |
| applied_loan | | | 0.100 (0.147) | | | 0.083 (0.133) | | | 0.368** (0.143) | | | 0.374*** (0.136) |
| use_phones | | | 0.257*** (0.088) | | | 0.187** (0.080) | | | 0.163* (0.094) | | | 0.250*** (0.090) |
| need_electricity | | | -0.097 (0.104) | | | -0.106 (0.104) | | | 0.094 (0.086) | | | 0.045 (0.084) |
| years_operation | | | 0.012** (0.006) | | | 0.014** (0.006) | | | 0.034*** (0.007) | | | 0.032*** (0.007) |
| member_organization | | | -0.334** (0.162) | | | -0.333** (0.147) | | | 0.064 (0.151) | | | 0.055 (0.156) |
| 9.month | | | | | | | | 0.091 (0.120) | 0.025 (0.100) | | 0.100 (0.120) | 0.055 (0.100) |
| 10.month | | 0.173 (0.114) | 0.192* (0.110) | | 0.260** (0.109) | 0.264** (0.112) | | 0.148 (0.154) | 0.135 (0.131) | | 0.164 (0.150) | 0.130 (0.129) |
| 11.month | | | | | | | | 0.217 (0.222) | 0.128 (0.184) | | 0.202 (0.195) | 0.222 (0.190) |
| Constant | 7.426*** (0.076) | 7.301*** (0.079) | 6.876*** (0.387) | 6.944*** (0.072) | 6.757*** (0.079) | 6.591*** (0.363) | 6.908*** (0.065) | 6.829*** (0.101) | 6.099*** (0.369) | 6.471*** (0.059) | 6.385*** (0.094) | 5.854*** (0.351) |
| Observations | 882 | 882 | 848 | 882 | 882 | 848 | 1,651 | 1,651 | 1,548 | 1,641 | 1,641 | 1,548 |
| R-squared | 0.030 | 0.034 | 0.228 | 0.008 | 0.016 | 0.163 | 0.025 | 0.027 | 0.360 | 0.045 | 0.047 | 0.349 |
| F | 17.48 | 8.730 | 11.81 | 3.732 | 3.701 | 10.26 | 8.221 | 2.370 | 20.96 | 17.34 | 5.069 | 15.04 |
| p | 6.72e-05 | 0.000345 | 0 | 0.0565 | 0.0286 | 0 | 0.00446 | 0.0529 | 0 | 4.21e-05 | 0.000595 | 0 |

Notes: (1) Standard errors in parentheses; (2) *** p<0.01, ** p<0.05, * p<0.1

Table 6. Regression results on correlates of sales and sales per worker – pooled ISES and HS-IME data.

| Variable | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dependent variable | log of sales | log of sales | log of sales | log of sale per worker | log of sale per worker | log of sale per worker |
| es_survey | -0.516*** (0.102) | -0.442*** (0.104) | -0.512*** (0.094) | -0.471*** (0.094) | -0.353*** (0.102) | -0.429*** (0.098) |
| tamale | -0.425*** (0.103) | -0.445*** (0.108) | -0.335*** (0.113) | -0.210* (0.109) | -0.242** (0.113) | -0.186 (0.115) |
| es_survey_tamale | -0.032 (0.179) | 0.004 (0.181) | 0.029 (0.162) | -0.380** (0.183) | -0.322* (0.186) | -0.180 (0.159) |
| age_owner | | | -0.002 (0.003) | | | -0.003 (0.003) |
| noschooling_owner | | | -0.036 (0.088) | | | -0.039 (0.089) |
| female_owner | | | -0.053 (0.132) | | | -0.249* (0.127) |
| one_owner | | | -0.016 (0.201) | | | 0.146 (0.197) |
| hhsize | | | 0.022* (0.012) | | | 0.013 (0.012) |
| manufacturing2 | | | 0.145 (0.088) | | | 0.057 (0.084) |
| retail2 | | | 0.281*** (0.084) | | | 0.315*** (0.079) |
| tot_workers | | | 0.080*** (0.029) | | | -0.109*** (0.023) |
| share_fem_workers | | | -0.451*** (0.130) | | | -0.210* (0.127) |
| share_workers_read | | | -0.079 (0.212) | | | -0.001 (0.216) |
| share_workers_write | | | 0.070 (0.201) | | | 0.022 (0.201) |
| bank_account | | | 0.567*** (0.067) | | | 0.535*** (0.067) |
| op_athome_inside | | | -0.413*** (0.091) | | | -0.332*** (0.088) |
| op_mobile | | | -0.343*** (0.075) | | | -0.284*** (0.078) |
| applied_loan | | | 0.221** (0.107) | | | 0.209** (0.101) |
| use_phones | | | 0.272*** (0.067) | | | 0.247*** (0.064) |
| need_electricity | | | -0.037 (0.081) | | | -0.075 (0.081) |
| years_operation | | | 0.018*** (0.005) | | | 0.019*** (0.005) |
| member_organization | | | -0.143 (0.124) | | | -0.129 (0.122) |
| 9.month | | 0.017 (0.126) | -0.087 (0.108) | | 0.033 (0.119) | -0.082 (0.107) |
| 10.month | | 0.139 (0.145) | 0.070 (0.126) | | 0.224 (0.140) | 0.113 (0.128) |
| 11.month | | 0.071 (0.234) | -0.075 (0.183) | | 0.048 (0.177) | -0.030 (0.180) |
| Constant | 7.426*** (0.078) | 7.321*** (0.144) | 7.036*** (0.282) | 6.944*** (0.073) | 6.773*** (0.138) | 6.777*** (0.274) |
| Observations | 2,533 | 2,533 | 2,396 | 2,523 | 2,523 | 2,396 |
| R-squared | 0.041 | 0.042 | 0.266 | 0.038 | 0.042 | 0.224 |
| F | 16.09 | 8.404 | 23.28 | 19.49 | 10.70 | 21.01 |
| p | 7.56e-10 | 1.51e-08 | 0 | 0 | 5.90e-11 | 0 |

Notes: (1) Standard errors in parentheses; (2) *** p<0.01, ** p<0.05, * p<0.1

Table A1. How key questions/variables were asked in the HS-IME and ISES.

| Variable | Questions in HS-IME | Questions in ISES |
|---|---|---|
| NFE is informal | Is this [NFE] registered with any of the following government agency? Option: Yes, Registrar-General’s Dept | Is this business currently registered with Registrar General? |
| Age of the owner | How old is [NAME]? | What is the [owner]’s age? |
| Owner has been to school | Has [NAME] ever attended school? | What is [owner]’s highest level of completed education? |
| Sex of the owner | What is the sex of [NAME]? | Is the owner a female? |
| Number of owners | Who in the household owns/owned this [INCOME-GENERATING ACTIVITY]? Select the primary owner first, followed by the second owner if applicable. | How many owners does this business or activity have? |
| Household size | Listing of household members in the household roster | How many people live in [owner]’s household? |
| Number of workers | How many paid female employees work in this [NFE] who are not household members? How many Paid FEMALE household members work in this [NFE]? How many Unpaid FEMALE household members work in this [NFE]? How many paid MALE employees work in this [NFE] who are not household members? How many Paid MALE household members work in this [NFE]? How many Unpaid MALE household members work in this [NFE]? | In [insert last calendar month], how many people worked in this business? |
| Bank account | Does the owner have a bank account to run this [NFE]? | Does the [owner] have a bank account to run this business or activity? |
| NFE's operations at home (inside) | Where does your household operate this [NFE]? Option: HOME (INSIDE RESIDENCE) | INTERVIEWER: PLEASE INDICATE THE TYPE OF PREMISES WHERE BUSINESS TAKES PLACE. Option: Household |
| NFE's operations are mobile | Where does your household operate this [NFE]? Option: MOBILE/NO FIXED LOCATION | INTERVIEWER: PLEASE INDICATE THE TYPE OF PREMISES WHERE BUSINESS TAKES PLACE. Option: Non-fixed premises, including hawkers |
| Loan application | Did you or any member of your household use any FUNDS RECEIVED AS CREDIT (LOAN) to operate this [NFE]? During the past 12 months, has this [NFE] tried to get credit from banks, and other financial institutions? | At this time, does this business or activity or the [owner] have a loan for the business or activity? |
| Usage of phone | At the present time, does [NFE] use Cellphone or smartphone? | At the present time, does this business or activity use cellphone or smartphone? |
| Need of electricity | Does this [NFE] need electricity to operate? | Does this business or activity require electricity to produce goods or provide services? |
| Total years of operation | How long has this [NFE] been actively operating or operated? In years | In which year did this business or activity originally start? |
| Member of an organization | Is this [NFE] a member of or part of any organized association such as a market association, a professional association, a trade union, or any other type of organized association? | Is this [NFE] a member of or part of any organized association such as a market association, a professional association, a trade union, or any other type of organized association? |
| Proportion of female workers | How many paid female employees work in this [NFE] who are not household members? How many Paid FEMALE household members work in this [NFE]? How many Unpaid FEMALE household members work in this [NFE]? | In [insert last calendar month], how many females worked in this business or activity, including paid and unpaid workers? |
| Proportion of workers who can read | How many of the people working in this [NFE] (excluding [owner]) can read? | How many of the people working in this business or activity (including [owner]) have the following skills? Can read |
| Proportion of workers who can write | How many of the people working in this [NFE] (excluding [owner]) can write? | How many of the people working in this business or activity (including [owner]) have the following skills? Can write |
| Manufacturing sector; Retail sector; Services sector | Please describe each of the non-farm businesses or self-employed activities that individuals in your household did in the past 12 months? SUPERVISOR: record the industry code for the income generating activity | INTERVIEWER: WHAT IS THE MAIN OBSERVED BUSINESS ACTIVITY? |
| Sales | What were the total sales/revenue (either from goods or services) for the [NFE] during the last month of operation? | In [insert last calendar month], what were the total sales of this business or activity? |

Table A2. Kumasi estimates for the ISES and HS.
Standard errors are shown in parentheses next to each coefficient.

| Variable | Log sales | Log sales | Log sales | Log sales per worker | Log sales per worker | Log sales per worker |
|---|---|---|---|---|---|---|
| es_survey | -0.516*** (0.097) | -0.433*** (0.098) | -0.543*** (0.091) | -0.471*** (0.092) | -0.345*** (0.093) | -0.427*** (0.092) |
| age_owner | | | -0.001 (0.004) | | | -0.002 (0.004) |
| noschooling_owner | | | -0.123 (0.132) | | | -0.181 (0.128) |
| female_owner | | | 0.032 (0.137) | | | -0.088 (0.125) |
| one_owner | | | 0.009 (0.224) | | | 0.193 (0.213) |
| hhsize | | | 0.032** (0.015) | | | 0.021 (0.015) |
| manufacturing2 | | | 0.223* (0.130) | | | 0.124 (0.122) |
| retail2 | | | 0.332*** (0.097) | | | 0.438*** (0.093) |
| tot_workers | | | 0.091* (0.048) | | | -0.070*** (0.011) |
| share_fem_workers | | | -0.523*** (0.143) | | | -0.322** (0.134) |
| share_workers_read | | | -0.027 (0.284) | | | 0.218 (0.278) |
| share_workers_write | | | 0.132 (0.269) | | | 0.009 (0.265) |
| bank_account | | | 0.583*** (0.076) | | | 0.530*** (0.074) |
| op_athome_inside | | | -0.354** (0.139) | | | -0.250** (0.123) |
| op_mobile | | | -0.390*** (0.099) | | | -0.289*** (0.100) |
| applied_loan | | | 0.241* (0.137) | | | 0.223* (0.128) |
| use_phones | | | 0.328*** (0.083) | | | 0.292*** (0.075) |
| need_electricity | | | 0.037 (0.090) | | | -0.001 (0.088) |
| years_operation | | | 0.012** (0.006) | | | 0.016*** (0.006) |
| member_organization | | | 0.021 (0.154) | | | 0.071 (0.139) |
| 9.month | | -0.070 (0.143) | -0.199* (0.115) | | -0.040 (0.135) | -0.192 (0.117) |
| 10.month | | 0.103 (0.160) | 0.024 (0.135) | | 0.193 (0.149) | 0.088 (0.136) |
| 11.month | | -0.085 (0.265) | -0.246 (0.179) | | -0.118 (0.185) | -0.200 (0.166) |
| Constant | 7.426*** (0.071) | 7.371*** (0.151) | 6.827*** (0.365) | 6.944*** (0.069) | 6.816*** (0.143) | 6.351*** (0.318) |
| Observations | 1,719 | 1,719 | 1,630 | 1,710 | 1,710 | 1,630 |
| R-squared | 0.039 | 0.042 | 0.290 | 0.038 | 0.045 | 0.220 |
| F | 28.40 | 7.368 | 18.30 | 26.42 | 7.258 | 13.96 |
| p | 2.13e-07 | 1.23e-05 | 0 | 5.39e-07 | 1.49e-05 | 0 |

Table A3. Tamale estimates for the ISES and HS-IME.
Standard errors are shown in parentheses next to each coefficient.

| Variable | Log sales | Log sales | Log sales | Log sales per worker | Log sales per worker | Log sales per worker |
|---|---|---|---|---|---|---|
| es_survey | -0.548*** (0.157) | -0.341* (0.204) | -0.259 (0.201) | -0.851*** (0.151) | -0.601** (0.266) | -0.451* (0.244) |
| o.tamale | | | - | | | - |
| age_owner | | | -0.005 (0.004) | | | -0.006 (0.005) |
| noschooling_owner | | | -0.127 (0.096) | | | -0.128 (0.095) |
| female_owner | | | -0.748* (0.400) | | | -1.319*** (0.382) |
| one_owner | | | 0.046 (0.283) | | | 0.157 (0.280) |
| hhsize | | | 0.010 (0.015) | | | 0.003 (0.015) |
| manufacturing2 | | | 0.168 (0.128) | | | 0.128 (0.119) |
| retail2 | | | 0.285* (0.157) | | | 0.264* (0.148) |
| tot_workers | | | 0.055** (0.024) | | | -0.190*** (0.034) |
| share_fem_workers | | | 0.263 (0.386) | | | 0.841** (0.370) |
| share_workers_read | | | -0.079 (0.248) | | | -0.153 (0.253) |
| share_workers_write | | | -0.096 (0.259) | | | -0.107 (0.259) |
| bank_account | | | 0.473*** (0.124) | | | 0.464*** (0.117) |
| op_athome_inside | | | -0.505*** (0.120) | | | -0.471*** (0.120) |
| op_mobile | | | -0.304*** (0.111) | | | -0.293** (0.119) |
| applied_loan | | | -0.050 (0.190) | | | -0.011 (0.212) |
| use_phones | | | 0.209** (0.104) | | | 0.209** (0.102) |
| need_electricity | | | -0.222 (0.155) | | | -0.249 (0.160) |
| years_operation | | | 0.024*** (0.009) | | | 0.023*** (0.009) |
| member_organization | | | -0.523** (0.210) | | | -0.562** (0.226) |
| 9.month | | 0.457** (0.189) | 0.368* (0.220) | | 0.398* (0.204) | 0.358 (0.235) |
| 10.month | | 0.484* (0.262) | 0.433* (0.259) | | 0.509* (0.295) | 0.350 (0.291) |
| 11.month | | 0.802* (0.413) | 0.818** (0.365) | | 0.839* (0.445) | 1.009*** (0.335) |
| Constant | 7.001*** (0.062) | 6.520*** (0.257) | 6.645*** (0.422) | 6.734*** (0.073) | 6.238*** (0.287) | 6.911*** (0.417) |
| Observations | 814 | 814 | 766 | 813 | 813 | 766 |
| R-squared | 0.017 | 0.021 | 0.256 | 0.036 | 0.039 | 0.315 |
| F | 12.19 | 4.788 | 33.81 | 31.97 | 11.54 | 22.44 |
| p | 0.000724 | 0.00147 | 0 | 1.57e-07 | 1.15e-07 | 0 |

Appendix 1: Field Substitution and Non-Response Analysis

Field Substitutions

The implementation of field substitutions undoubtedly improves estimation efficiency by counteracting the data loss resulting from nonresponse and preserving the integrity of the planned sample structure.
However, there exists no theoretical ex-ante guarantee that substitutions completely protect against or effectively mitigate nonresponse bias (see Nishimura, 2015 and references therein). Indeed, any systematic difference between nonrespondent and replacement units could allow nonresponse bias to persist despite the field substitution treatment. Unfortunately, the absence of such systematic differences is difficult to assess even ex-post, since, by definition, little is generally known about nonrespondents.

However, for the HS-IME conducted in Kumasi and Tamale, efforts were made during the listing exercise to collect a parsimonious yet informative set of variables on both nonrespondent and replacement households. Leveraging these common variables, an extensive array of independent statistical tests was performed, as detailed below. In both cities, no evidence was found of systematic differences between the nonrespondent set and the replacement set at the 5% significance level. In other words, none of the tests detected any signs of nonresponse bias. This result should be regarded as robust, in that no adjustments for multiple comparisons were applied to the implemented battery of statistical tests, thereby preserving the statistical power of the tests to the maximum extent possible.

Nonresponse analysis for HS-IME – statistical tests

For both the HS-IME conducted in Kumasi and Tamale, 5 variables were collected during the listing exercise on both nonrespondent and replacement households: nfe_own (whether the household owns any NFEs), head_sex (sex of the household head), head_occupation (occupation of the household head), hh_size (household size), and nfe_num (number of NFEs owned by the household). Further details are provided in Table A4. Utilizing these common variables, statistical tests were conducted to identify any systematic differences between nonrespondent and replacement households.

Table A4.
Listing variables observed on both nonrespondent and replacement households

kind (dichotomous, NONRESPONDENT / REPLACEMENT)
nfe_own (dichotomous, YES / NO)
head_sex (dichotomous, MALE / FEMALE)
head_occupation (categorical, 7 modalities)
    1. In your own non-farm business
    2. In a non-farm business operated by a household member
    3. In a family farm, growing crops, raising livestock, or fishing
    4. As an employee for a private company or another individual
    5. As an employee for the government
    6. As an apprentice, trainee, intern
    7. No job
hh_size (numeric)
nfe_num (numeric)
REPL (dummy, 1 = REPLACEMENT / 0 = NONRESPONDENT)

The results of bivariate tests, aimed at assessing the association between each of the 5 variables and the dichotomous variable kind, which discriminates between nonrespondent and replacement households, are presented in Table A5. At the 5% significance level, none of the tests led to the rejection of the null hypothesis of independence in either of the two cities.

Table A5. Bivariate test of association between variables.

| Alternative hypothesis | Test | Significance | Kumasi p-value | Kumasi n | Tamale p-value | Tamale n |
|---|---|---|---|---|---|---|
| Association between kind and nfe_own | Pearson's Chi-squared test with Yates' continuity correction | 0.05 | 0.112 | 130 | 0.743 | 42 |
| Association between kind and head_sex | Pearson's Chi-squared test with Yates' continuity correction | 0.05 | 0.136 | 130 | 0.097 | 42 |
| Association between kind and head_occupation | Fisher's Exact Test for Count Data | 0.05 | 0.267 | 130 | 0.271 | 42 |
| Difference in mean of hh_size by kind | T-test | 0.05 | 0.394 | 130 | 0.153 | 42 |
| Difference in mean of nfe_num by kind | T-test | 0.05 | 0.266 | 130 | 0.641 | 42 |

Table A6 shows the results of a multivariate test where a logistic model was constructed using variable REPL (1 = replacement, 0 = nonrespondent) as outcome variable and the five variables available from the listing data as predictors.
Wald z-tests on the estimated model parameters were then evaluated to measure the ability of the fitted model to discriminate between nonrespondents and replacements. At the 5% significance level, no logistic regression coefficients were found to be statistically different from zero in either of the two cities.

Table A6. Multivariate logistic regression of predictors of replacements.

KUMASI

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 0.299 | 0.559 | 0.535 | 0.593 |
| nfe_own NO | -0.609 | 0.624 | -0.977 | 0.329 |
| head_sex FEMALE | 0.627 | 0.419 | 1.497 | 0.134 |
| head_occupation In a non-farm business operated by a household member | -1.386 | 1.194 | -1.160 | 0.246 |
| head_occupation In a family farm, growing crops, raising livestock, or fishing | 13.946 | 882.744 | 0.016 | 0.987 |
| head_occupation As an employee for a private company or another individual | -0.257 | 0.593 | -0.434 | 0.664 |
| head_occupation As an employee for the government | -0.399 | 0.955 | -0.418 | 0.676 |
| head_occupation As an apprentice, trainee, intern | 0.859 | 1.214 | 0.708 | 0.479 |
| head_occupation No job | -0.960 | 0.623 | -1.542 | 0.123 |
| hh_size | 0.038 | 0.092 | 0.412 | 0.680 |
| nfe_num | -0.194 | 0.315 | -0.615 | 0.538 |

n = 130. AIC = 190.320

TAMALE

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -0.718 | 1.337 | -0.537 | 0.591 |
| nfe_own NO | -0.237 | 1.299 | -0.182 | 0.855 |
| head_sex FEMALE | 1.956 | 1.217 | 1.606 | 0.108 |
| head_occupation In a family farm, growing crops, raising livestock, or fishing | -0.253 | 0.902 | -0.280 | 0.779 |
| head_occupation As an employee for a private company or another individual | -17.804 | 1923.452 | -0.009 | 0.993 |
| head_occupation As an employee for the government | -0.369 | 1.591 | -0.232 | 0.817 |
| head_occupation No job | 0.686 | 1.420 | 0.483 | 0.629 |
| hh_size | 0.210 | 0.135 | 1.560 | 0.119 |
| nfe_num | -0.193 | 0.664 | -0.291 | 0.771 |

n = 42. AIC = 61.193

As the goal of the tests above was to detect signs of nonresponse bias that might have survived despite the replacement of nonresponding households, type II errors (i.e., false negatives) were considered more dangerous than type I errors (i.e., false positives).
For this reason, to preserve the statistical power of the tests to the maximum extent possible, no adjustments for multiple comparisons were applied. Nonetheless, none of the tests detected any signs of differences between nonrespondent and replacement households.

Appendix 2: Weights calculation for HS-IME

Both the HS-IME conducted in Kumasi and Tamale adopted a standard two-stage stratified cluster sampling design. Therefore, the inclusion probability of household j, belonging to EA i, sampled within stratum h, can be expressed as follows:

$$\pi_{hij} = \pi_{1hi} \times \pi_{2j|hi} \qquad \text{(A2.1)}$$

where $\pi_{1hi}$ is the inclusion probability of EA i within stratum h, and $\pi_{2j|hi}$ is the inclusion probability of household j conditional to the first stage selection of EA i within stratum h (subscripts 1 and 2 clearly denote first and second stage selection). EAs were selected independently within strata with PPS and the total number of households resulting from the sampling frame was used as MOS. Thus, $\pi_{1hi}$ reads:

$$\pi_{1hi} = \frac{n_h \times M_{hi}}{M_h} \qquad \text{(A2.2)}$$

where $n_h$ is the number of sample EAs allocated to stratum h, $M_{hi}$ is the total number of households in sampled EA i of stratum h (i.e., the MOS of EA i of stratum h), and $M_h$ is the total number of households across all EAs in stratum h (i.e., the total MOS of stratum h). Following the household listing exercise, 15 households were randomly selected within each sampled EA from the freshly produced lists with equal inclusion probabilities. Thus, $\pi_{2j|hi}$ reads:

$$\pi_{2j|hi} = \frac{15}{M^{*}_{hi}} \qquad \text{(A2.3)}$$

where $M^{*}_{hi}$ is the total number of households in sampled EA i of stratum h as resulting from the listing exercise. From the above formulas, the inclusion probabilities of the households can be expressed as:

$$\pi_{hij} = \pi_{1hi} \times \pi_{2j|hi} = \frac{n_h \times M_{hi}}{M_h} \times \frac{15}{M^{*}_{hi}} \qquad \text{(A2.4)}$$

The design weights of sampling units are, by definition, reciprocals of inclusion probabilities.
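As a numerical illustration of equations (A2.1) to (A2.4) and of the reciprocal relationship just stated, the following minimal Python sketch computes a household design weight. The function names, argument names, and the stratum figures are hypothetical and chosen only for illustration; they are not taken from the survey data.

```python
import math


def ea_inclusion_probability(n_h, mos_ea, mos_stratum):
    """First-stage PPS inclusion probability of an EA, as in (A2.2)."""
    pi_1 = n_h * mos_ea / mos_stratum
    if not 0 < pi_1 <= 1:
        raise ValueError("first-stage inclusion probability must lie in (0, 1]")
    return pi_1


def household_conditional_probability(listed_households, take=15):
    """Second-stage probability of a household within a sampled EA, as in (A2.3).

    Assumption for this sketch: if fewer than `take` households are listed,
    all of them are selected, so the probability is capped at 1.
    """
    if listed_households <= 0:
        raise ValueError("the listing must contain at least one household")
    return min(1.0, take / listed_households)


def household_design_weight(n_h, mos_ea, mos_stratum, listed_households, take=15):
    """Design weight = reciprocal of the overall inclusion probability (A2.4)."""
    pi = (ea_inclusion_probability(n_h, mos_ea, mos_stratum)
          * household_conditional_probability(listed_households, take))
    return 1.0 / pi


# Hypothetical stratum: 10 sampled EAs, 50,000 frame households in the stratum,
# 200 frame households in the sampled EA, 240 households found at listing.
weight = household_design_weight(n_h=10, mos_ea=200, mos_stratum=50_000,
                                 listed_households=240)
print(round(weight, 1))  # 400.0
```

In this toy case each sampled household represents about 400 households; the listing count (240) differs from the frame count (200), which is why both quantities enter the weight.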
Therefore, the design weights of the households can be written as:

$$w_{hij} = \frac{1}{\pi_{hij}} = \left(\frac{M_h}{n_h \times M_{hi}}\right)\left(\frac{M^{*}_{hi}}{15}\right) \qquad \text{(A2.5)}$$

Table A7 summarizes the sample distributions of the HS-IME weights in Kumasi and Tamale. The table examines two distinct samples: households, and household enterprises that were informal and active at the reference time of the survey and located within the overlapping area. The Kish Unequal Weighting Effect (UWE) is also reported for each set of weights. Following Kish's definition (Kish, 1992), the UWE is calculated as 1 plus the relative sample variance of the weights. It can be regarded as a measure of how far the weights at hand are from the case of a self-weighting sample (UWE = 1). In general, increasing values of UWE are associated with estimates of decreasing precision. The UWE values displayed in Table A7, ranging from 1.12 to 1.29, can be considered moderate for a two-stage stratified cluster sampling design. Furthermore, while the UWE of households is higher in Tamale compared to Kumasi, the UWE of active informal enterprises operating within the overlapping area is nearly the same, and small, in both cities. As illustrated in Figure 3, the weights of these active informal NFEs also exhibit a well-shaped distribution and appear free from anomalous values.

Table A7. Household survey weights' distributions.
Survey weights' distribution summary

| City | Sample type | Sample size | Min | 1st Q | Median | Mean | 3rd Q | Max | UWE |
|---|---|---|---|---|---|---|---|---|---|
| Kumasi | Households | 1005 | 99.1 | 187.5 | 234.7 | 248.4 | 284.1 | 575.3 | 1.12 |
| Kumasi | Active informal NFEs | 473 | 99.1 | 190.4 | 240.6 | 253.1 | 284.1 | 575.3 | 1.12 |
| Tamale | Households | 1005 | 21.8 | 103.2 | 163.5 | 163.5 | 214.2 | 467.9 | 1.29 |
| Tamale overlapping area | Active informal NFEs | 417 | 99.0 | 164.1 | 187.7 | 213.0 | 242.3 | 467.9 | 1.13 |

Appendix 3: Pooling HS-IME and ISES data: Adjustment of weights and design metadata

Upon pooling the HS-IME and ISES datasets, the original HS-IME and ISES weights in equations (2) and (3), respectively, cannot be used as they are, because this would lead to invalid inferential results. For instance, the population total, $Y$, of any variable, $y$, would be overestimated by a factor of two. Therefore, to safely analyze the pooled sample, a pooling adjustment factor must be computed and applied to the original HS-IME and ISES weights. To achieve this, the weight of any NFE e belonging to the pooled sample must be expressed as follows (Chu, Brick, and Kalton, 1999):

$$w_e^{\text{pool}} = \begin{cases} \lambda_{HS} \times w_e^{HS} & \text{if } e \in s_{HS} \\ \lambda_{ES} \times w_e^{ES} & \text{if } e \in s_{ES} \end{cases} \qquad \text{(A3.1)}$$

where $s_{HS}$ and $s_{ES}$ denote the HS-IME and ISES samples, and the pooling adjustment factors $\lambda_{HS}$ and $\lambda_{ES}$ must be non-negative and satisfy $\lambda_{HS} + \lambda_{ES} = 1$. If the HS-IME and the ISES produce unbiased estimates $\hat{Y}$ of the population total when used in isolation, the above formula guarantees that unbiased estimates are obtained from the pooled sample as well.

The pooling adjustment factors in (A3.1) can be computed in many ways. By leveraging the independence of the original samples, it is easy to prove that the choice that minimizes the sampling variance of the pooled estimator of the total, $\hat{Y} = \sum_{e} w_e^{\text{pool}} y_e$, depends on the sample sizes and the Design Effects of the original samples (see, e.g., Lohr, 2021, and references therein):

$$\lambda_s = \frac{n_s / \mathrm{Deff}_s}{\sum_{k} n_k / \mathrm{Deff}_k} \quad \text{where } s, k \in \{HS, ES\} \qquad \text{(A3.2)}$$

Here $n_s$ is the sample size of survey $s$ and $\mathrm{Deff}_s$ is its Design Effect. Since Design Effects are specific to each estimator, the pooling adjustment (A3.2), which optimizes the precision of the pooled estimator of the total of variable $y$, will likely be sub-optimal for other variables. This can be particularly inconvenient when, as in the present work, the primary interest lies in utilizing the pooled sample for multivariate statistical analysis, such as fitting and testing multiple regression models. In such cases, a natural compromise solution could be to plug into equation (A3.2) the average Design Effect of the variables appearing in the regression. This solution may still be unsatisfactory, though, as empirical econometric analysis often necessitates exploration across several different regression models. To address these challenges, a more robust alternative was preferred in this work. Following the approach of O'Muircheartaigh and Pedlow (2002), the variable-specific Design Effects in equation (A3.2) were replaced with the variable-agnostic, yet survey-specific, Unequal Weighting Effects (UWE) of the HS-IME and ISES samples:

$$\lambda_s = \frac{n_s / \mathrm{UWE}_s}{\sum_{k} n_k / \mathrm{UWE}_k} \quad \text{where } s, k \in \{HS, ES\} \qquad \text{(A3.3)}$$

Once the pooled dataset is equipped with the pooling adjusted weights calculated from equations (A3.1) and (A3.3), combined point estimates of population or model parameters can be calculated with the standard procedures followed for a single complex sample survey. For instance, the pseudo maximum likelihood (PML) estimation method (see, e.g., Binder, 1983; Skinner, 1989), which is nowadays implemented in popular statistical packages such as R and Stata, allows for the seamless incorporation of survey weights when fitting classical statistical models to complex sample data (see, e.g., Pfeffermann, 1993; Lumley and Scott, 2017). Notably, the PML method also produces estimates of the standard errors of the estimated model parameters (e.g., regression coefficients) that fully account for the sampling design of the underlying survey data (e.g., stratification and clustering effects). Such estimated standard errors can be calculated either via Taylor linearization (as was the case in the present work) or via replication methods.
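The weight adjustment in equations (A3.1) and (A3.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pooling factors are taken to be proportional to the effective sample sizes n / UWE, the UWE is computed as n Σw² / (Σw)² (that is, 1 plus the relative variance of the weights with divisor n), and the function names and toy weights are hypothetical.

```python
import math


def kish_uwe(weights):
    """Kish Unequal Weighting Effect: n * sum(w^2) / (sum(w))^2.

    Equals 1 + relative variance of the weights (variance with divisor n).
    """
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def pooling_factors(weights_by_survey):
    """Pooling factors proportional to effective sample sizes n / UWE."""
    effective_n = {survey: len(w) / kish_uwe(w)
                   for survey, w in weights_by_survey.items()}
    total_effective_n = sum(effective_n.values())
    return {survey: n_eff / total_effective_n
            for survey, n_eff in effective_n.items()}


def pooled_weights(weights_by_survey):
    """Rescale each survey's original weights by its pooling factor."""
    factors = pooling_factors(weights_by_survey)
    return {survey: [factors[survey] * w for w in weights]
            for survey, weights in weights_by_survey.items()}


# Toy (made-up) weights: both samples target the same population.
toy = {"HS-IME": [1.0, 3.0], "ISES": [2.0, 2.0, 2.0]}
factors = pooling_factors(toy)
adjusted = pooled_weights(toy)
pooled_size_estimate = sum(sum(ws) for ws in adjusted.values())
print(factors)
print(pooled_size_estimate)
```

Because the two factors sum to one, the pooled estimate of the population size is a weighted average of the two single-survey estimates (4 and 6 in the toy data), rather than their sum.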
However, care must be applied when constructing the sampling design metadata for the pooled dataset. For instance, such metadata must encode the information that the HS-IME and ISES datasets originated from independent samples. This can be achieved by defining the stratification variable for the pooled sample as the cross-classification of a dichotomous variable that identifies the survey (HS-IME or ISES) with the specific stratification variables of the surveys. Similarly, the first-stage clustering variable for the pooled dataset must be constructed as the combination of categories of the clustering variables of the individual surveys.
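The construction of the pooled design metadata described above can be sketched as follows. This is a minimal Python illustration, not the procedure actually coded for the paper: the record layout and the field names (`stratum`, `psu`, `pooled_stratum`, `pooled_psu`) are hypothetical.

```python
def build_pooled_design(hs_records, ises_records):
    """Stack two survey files and derive pooled stratum and cluster identifiers.

    The pooled stratum is the cross-classification of the survey indicator with
    each survey's own stratum code; the pooled first-stage cluster identifier is
    built the same way, so codes reused by both surveys can never collide.
    """
    pooled = []
    for survey, records in (("HS-IME", hs_records), ("ISES", ises_records)):
        for record in records:
            row = dict(record)  # copy, so the input records are left untouched
            row["survey"] = survey
            row["pooled_stratum"] = f"{survey}|{record['stratum']}"
            row["pooled_psu"] = f"{survey}|{record['psu']}"
            pooled.append(row)
    return pooled


# Made-up records: both surveys happen to use stratum code 1 and cluster code 101.
hs = [{"id": "hs-001", "stratum": 1, "psu": 101},
      {"id": "hs-002", "stratum": 1, "psu": 102}]
ises = [{"id": "es-001", "stratum": 1, "psu": 101}]

pooled = build_pooled_design(hs, ises)
for row in pooled:
    print(row["id"], row["pooled_stratum"], row["pooled_psu"])
```

The derived `pooled_stratum` and `pooled_psu` columns are what one would pass as the strata and cluster identifiers when declaring the survey design in a statistical package, together with the pooling adjusted weights.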