Policy Research Working Paper 10077 Combining Survey and Geospatial Data Can Significantly Improve Gender-Disaggregated Estimates of Labor Market Outcomes Joshua D. Merfeld David Newhouse Michael Weber Partha Lahiri Poverty and Equity Global Practice & Social Protection and Jobs Global Practice June 2022 Policy Research Working Paper 10077 Abstract Better understanding the geography of women’s labor market labor force participation, as the mean absolute error falls outcomes within countries is important to inform targeted by approximately half, while the mean squared error falls efforts to increase women’s economic empowerment. This by almost 75 percent when holding coverage rates constant. paper assesses the extent to which a method that combines In contrast, the results for municipal unemployment rate simulated survey data from urban areas in Mexico with estimates are not reliable because values of unemployment broadly available geospatial indicators from Google Earth rates are low and therefore poorly suited for linear models. Engine and OpenStreetMap can significantly improve esti- The municipal results hold in repeated simulations of alter- mates of labor force participation and unemployment rates. native samples. Models utilizing Basic Geo-Statistical Area Incorporating geospatial information substantially increases (AGEB)–level auxiliary information generate more accurate the accuracy of male and female labor force participation predictions than area-level models specified using the same and unemployment rates at the state level, reducing mean auxiliary data. Overall, integrating survey data and publicly absolute deviation by 50 to 62 percent for labor force par- available geospatial indicators is feasible and can greatly ticipation and 25 to 52 percent for unemployment. Small improve state-level estimates of male and female labor force area estimation using a nested error conditional random participation and unemployment rates, as well as municipal effect model also greatly improves municipal estimates of estimates of male and female labor force participation. This paper is a product of the Poverty and Equity Global Practice and the Social Protection and Jobs Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank. org/prwp. The authors may be contacted at dnewhouse@worldbank.org. The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Produced by the Research Support Team Combining Survey and Geospatial Data Can Significantly Improve Gender- Disaggregated Estimates of Labor Market Outcomes Joshua D. Merfeld*, David Newhouse†, Michael Weber‡, and Partha Lahiri§ JEL codes: J21, C13 Keywords: Small Area Estimation, Data integration, Geospatial Data, Labor Force Participation, Unemployment, Mexico * KDI School of Public Policy and Management and IZA † The World Bank and IZA ‡ The World Bank and IZA § Department of Mathematics and Joint Program in Survey Methodology, University of Maryland 1. Introduction Low female labor force participation can be an important barrier to economic growth. In Mexico, for example, women’s labor force participation is considerably lower than that of men and is also low in comparison to other countries. The female labor force participation rate was just 45 percent in 2019, compared to 77 percent for men, a gap of 32 percentage points. In the OECD, only Turkey and Italy have a lower rate of female labor force participation, while in LAC only Guatemala has a lower value than Mexico. This has important economic impacts. A recent World Bank analysis suggests that if women participated at the same rate as men, per capita income could be as much as 22 percent higher (Inchauste et al., 2021). The same analysis suggests that if Mexico could implement policies to increase the female labor force participation rate by 0.6 percentage points a year – in line with that observed in Spain, Ireland, and Chile – this would help eliminate the gender gap in labor force participation and increase economic growth by 0.4 percent a year, highlighting how increases in female labor force participation are not just an output of the development process, but a driver of it, as well. Trends in female labor force participation are determined by several key factors, including the relationship between households’ economic conditions and female participation, the supply and demand for jobs deemed appropriate for more educated women, national growth strategies, and occupational gender segregation (Klasen, 2018). To better understand from a spatial perspective how policy can address these factors, it is useful to understand where pockets of low rates of female labor participation exist. However, most labor force surveys can only provide adequate estimates for larger areas of aggregation. Mexico’s labor force survey is no exception, as it can only provide estimates for urban and rural areas within states, as well as for select larger municipalities. Since Mexico has 32 states and 2,450 municipalities, reliable estimates of labor force participation – for both women and men – at the level of the municipality could greatly inform the geographic targeting of education, labor market programs, and other measures designed to improve female labor force participation. This is also true of other developing countries. Small area estimation is a branch of statistics that combines survey data with richer auxiliary data to generate more precise and accurate estimates of statistics. It has frequently been applied to measures of poverty and labor market outcomes. An extensive literature has documented that auxiliary data, typically taken from census or other administrative sources, can improve estimates of labor force participation and unemployment (Esteban et al., 2020, Chambers et al., 2016, López- Vizcaino et al., 2015, Ugarte et al., 2009, Molina et al., 2007, Datta et al, 1999). This literature employs a variety of methods, including hierarchical Bayes, empirical Best Predictors, and multinomial logit mixed models, as well as area level models. Intuitively, small area estimation allows surveys to “borrow strength” from auxiliary data that is more geographically comprehensive, and therefore covers unsampled areas. Combining the survey data with model predictions for unsampled areas, derived from the auxiliary data, makes the estimates more accurate and precise. In addition, by modeling the average relationship between the survey and auxiliary data across the entire sample, small area estimation can help correct sampling error even in sampled areas. This method can therefore produce more precise estimates for geographic levels, such as Mexican municipalities, where the labor force survey is too small to generate reliable estimates. 2 At the same time, the rapid proliferation of publicly available “big data” from satellite and crowd- sourcing applications has made geographically comprehensive data freely available, while survey enumeration area geocoordinates or administrative shapefiles required to link surveys to geospatial data are also becoming increasingly available. These developments have sparked a burgeoning literature – ably reviewed in Burke (2021), McBride et al. (2021), and World Bank (2021) – on the use of satellite imagery for economic measurement. Most early work has focused on the ability of geospatial data to predict agricultural yields and crops (Erciulescu et al., 2019, Lobell et al., 2020), household asset wealth and poverty (Jean et al., 2016, Yeh et al., 2020 Steele et al., 2017, Engstrom et al., 2021), and population (Wardrop et al.., 2020, Engstrom et al., 2020). This is the first paper to our knowledge that explores whether combining survey data with geospatial indicators can be used for the purposes of improving estimates of labor market outcomes, such as female and male labor force participation and unemployment, through small area estimation. Demonstrating that this methodology works is important, as many developing countries do not have the requisite data traditionally used for small area estimation, such as up-to- date censuses or detailed administrative data. We show that the incorporation of geospatial data – which is available across the globe – is a feasible alternative when preferred sources of auxiliary data are not available. This paper also contributes to the ongoing discussion over the proper methods for combining survey and geospatial data for the purpose of prediction. Some analysts take an explicitly Bayesian approach (Steele et al., 2017, Pokhriyal and Jacques, 2017, Erciulescu et al., 2019), while others employ an empirical best predictor model (Battese Harter and Fuller, 1988 Masaki et al., 2020). We undertake this exercise in the context of Mexico, which has publicly available data that makes our approach feasible. We focus on three separate levels of aggregation: the state, the municipality, and the AGEB (Área Geoestadistica Básica, i.e. the Basic Geo-Statistical Area). Importantly, the census data has AGEB-level identifiers for all urban AGEBs in the country. This allows us to match geospatial data to AGEBs. In addition, we simulate a random sample from the census before implementing our preferred small area estimation approach. The use of census data allows us to compare the resulting estimates to the “truth,” derived from the census data itself. This type of data is rarely available in developing countries and, as such, Mexico is the perfect context in which to apply our approach. It is important to note that our results on unemployment and labor force participation differ from official rates in two key ways. First, we include only urban AGEBs due to data limitations. Second, we are using census data, not a labor force survey, so the instruments differ. For example, the age range for which employment questions are asked differs across the instruments. However, despite these differences in the instruments, the underlying concepts of labor force participation and unemployment in the census and labor force survey are similar. There is therefore no reason to believe that the findings on the benefits of incorporating geospatial data would not apply to the official definitions as well. This paper implements a sub-area model, which is essentially a unit-level model in which the unit is taken to be the AGEB. In particular, we specify a weighted empirical best predictor model, with conditional random effects specified at the municipality level in our main results. Municipal 3 predictions are then generated by taking the population-weighted mean of the AGEB-level predictions. 1 This approach leverages AGEB-level data, which leads to more accurate and efficient estimates than area-level models specified at the municipal level and requires minor modifications of publicly available software to generate point estimates and confidence intervals. 2 Six main findings emerge: 1. Combining survey and geospatial data substantially improves the precision and accuracy of state-level estimates of both labor force participation (LFP) and unemployment rates. Estimated mean absolute deviation falls by approximately 62 and 50 percent for female and male participation rates, respectively, and by approximately 52 and 25 percent for female and male unemployment rates. 2. When considering municipal estimates of labor force participation rates, incorporating geospatial data significantly improves accuracy and greatly improves precision. In estimates from repeated simulations, estimated mean absolute deviation falls by about 43 percent for women and 53 percent for men, and rank correlation increases by 0.13 for women and 0.11 for men. After adjusting direct survey estimates to equalize coverage rates, estimated mean squared error for male and female LFP falls by a factor of more than four. 3. Because unemployment rates are very low, municipal estimates of unemployment rates based on a linear model are not reliable. Although estimated mean absolute deviation falls for both man and women, relative bias is high, rank correlation falls significantly for women and the estimates exhibit little variation. 4. Accuracy is substantially lower in out-of-sample municipalities than sampled municipalities. 5. When using a sample that simulates fully enumerating sampled AGEBs, small area estimates offer very minor improvements on direct survey estimates. 6. A model specified at the AGEB level generates more accurate estimates than one specified at the municipal level. The improvement due to using AGEB rather than municipal variation is much larger for male LFP, which is harder to predict, than for female LFP. In short, the results support the use of augmenting survey data with geospatial data when estimating state-level statistics and when estimating municipality-level female and male labor force participation rates. When estimating municipal unemployment rates, however, the estimates are not reliable because the sample contains insufficient information to train a linear model. In particular, the sample contains no unemployed men in 44 percent of municipalities and no employed women in 68 percent of municipalities. Modeling unemployment rates using a linear model is problematic in this setting, highlighting the usefulness of software that can conveniently estimate and apply a wider range of non-linear empirical best predictor models. 1 While this is in spirit similar to the sub-area model proposed in Torabi and Rao (2018), the approach outlined in that paper is difficult to implement using publicly available software. 2 Code is available from the authors upon request. 4 2. Data and Methodology a. Census and Survey Data The primary source of data on labor market outcomes is the 2020 Census of Population and Housing, carried out in March 2020 by the Mexican National Institute of Statistics, Geography, and Informatics (INEGI). 3 The census was collected the month before the COVID-19 pandemic led to widespread shutdowns. INEGI publishes census statistics publicly at different geographic levels, with the lowest level being the Área Geoestadistica Básica (AGEB). INEGI publishes aggregate statistics only for urban AGEBs. We use these to generate municipality statistics that are weighted by the population of urban AGEBs. Because only urban AGEBs are included in the analysis, the results in this paper pertain to urban state and municipality-level rates. Urban AGEBs total to around 96.4 million people, or more than 75% of Mexico’s estimated 127.6 million people. We are primarily interested in examining the feasibility and effectiveness of combining geospatial data with survey data to improve municipal estimates of labor force participation rates and unemployment rates, separately for women and men. We also examine the ability of the procedure to improve state-level estimates of these four labor market outcomes. We restrict the analysis to urban AGEBs in order to utilize the census data as a credible benchmark against which to assess the performance of the small area estimates. From the available urban census data, we construct a data set with six different variables: the total number of women in the labor force, the total number of women employed, the total number of women 12 years or older, plus the same three variables for men. 4 Since this is census data, we take these values to be the true values for each AGEB. We then randomly sample AGEBs from the full set of urban AGEBs to form a pseudo survey. We select AGEBs with probability proportional to size within states, which serve as the strata. We then choose the number of AGEBs per stratum (state) using sample sizes similar to INEGI’s labor force surveys. 5 Like INEGI, we assume an average of 2.7 people 12 years of age and older per household when selecting AGEBs. 6 Out of 50,942 AGEBs with non-missing geospatial auxiliary features, 7,642 – or 15.0 percent – are included in the pseudo sample. After selecting AGEBs, we simulate sampling individuals within each selected AGEB. The census data indicate the number of men and women the number of people 12+ living in each AGEB, as well as the number of people 12+ who are active labor force participants as well as the number employed. Since being employed is conditional on being in the labor force, we can use this information to simulate a sample of men and women within each AGEB based on these variables. We essentially construct the total population and randomly select 40 individuals within each AGEB – or all individuals if there are fewer than 40 men and women 12+ in the AGEB. These are then be used to construct gender-specific labor force participation and unemployment rates in the sample. 3 The data are available online here. 4 The labor data are collected for those 12 years of age and older, hence why we use 12+ as our population. 5 More information is available here. One difference between our methodology and theirs – apart from the fact that we do not have access to actual enumeration areas – is that INEGI also includes some larger municipalities as strata. For simplicity, we only use states. 6 This number comes from INEGI’s labor force survey methodology manual in footnote 5. 5 Table 1 presents summary statistics for the 7,642 AGEBs in the sample, covering 1,034 municipalities. The first column presents statistics for the sample – for both women and men – while the second column presents actual AGEB population values from the census. Two striking facts emerge from the table. First, labor force participation for women is considerably lower than for men, both in the sample and in the population. Second, the number of unemployed (labor force minus employed) is very low across both columns. For example, in the sample the implied unemployment rate is less than 1 percent for women and less than 3 percent for men. The fact that the unemployment rate is close to zero makes it far more challenging to predict unemployment rates than labor force participation using a linear model in this sample. We return to this point below. Table 1 - Mean number of individuals across AGEBs (1) (2) Sample Population Female Total 20.73 693.07 In labor force 10.62 356.04 Employed 10.48 351.09 Unemployed 0.14 4.94 Male Total 19.23 640.56 In labor force 14.54 482.42 Employed 14.21 471.19 Unemployed 0.33 11.23 Number of AGEBs 7,733 45,250 Number of municipalities 1,072 1,619 Figure 1 - Appearance in Sample: AGEBs and Municipalities 6 Figure 1 shows the AGEBs and municipalities that appear in the main sample. Panel A shows AGEBs, with individual AGEBs colored based on whether they are in the estimation sample (“Yes”), are not in our estimation sample but are urban (“No”), or are rural AGEBs that are not considered in the analysis (“Not in results”). Panel B shows the same, but for municipalities. Specifically, a municipality is coded as included in the sample if at least one AGEB within that municipality is contained in the sample. As such, a municipality in our sample could have a relatively small proportion of its total population actually included in our sample. There are 1,641 municipalities that contain at least one urban-AGEB with non-missing data. Of these, 1,034 – or 63.0 percent – have at least one AGEB in the sample. b. Auxiliary geospatial data Auxiliary data is drawn from two sources: Google Earth Engine, and Open StreetMap, utilizing the 2020 shapefile of urban AGEBs provided by INEGI. These sources were largely selected because they are publicly available, cover a large portion of the world, and convenient to obtain. However, both contain a large number of candidate predictors that could be plausibly correlated with spatial patterns in labor force participation. From Google Earth engine, we extracted summary statistics by AGEB from six datasets: Nitrogen Dioxide from Sentinel 5P, Normalized Difference Vegetation Index (NDVI) from the Sentinel 2 Multi-spectral Instrument, Nighttime lights from VIIRS, estimated population from WorldPop, land cover classifications from the Copernicus dynamic land cover map, and the year of development, as proxied for by the change in pixels from pervious to impervious surfaces (Gong et al., 2020). All summary statistics were taken over March 2020, except for land cover, which pertains to the period between January and December 2019. From Open StreetMap, indicators were obtained representing the total length and number of highways in each AGEB. Two measures of road length were calculated: The first is the total length of the portions of highways contained within the AGEB, and the second is the total length of all highways that intersect the AGEB, including the portions of the road outside the AGEBs. In addition, the total counts of open StreetMap amenities and points of interest, such as hospitals, offices, schools, churches, and so on, were calculated for each AGEB. A final auxiliary variable, besides the Open StreetMap indicators, is the total area of the AGEB of the AGEB in sq km, which was calculated directly from the AGEB level shapefile provided publicly by INEGI. For each auxiliary variable, municipal variables were constructed by taking either population-weighted averages or simple sums (for counts and area) of the AGEB values. c. Model selection Variables for prediction were selected using the LASSO (Least Absolute Selection and Shrinkage Operator), with the penalty chosen to minimize the Bayesian Information Criteria (BIC) (Zhang et 7 al., 2010). 7 LASSO models were fitted separately to four dependent variables. These are the transformed versions of the four labor market outcomes, AGEB male or female labor force participation and unemployment rates, using the arcsine transformation. The candidate predictors included in the LASSO model include all of the AGEB and municipal level geospatial auxiliary data described in the last section, as well as dummy variables for states. The state dummies were included with no penalty, and were included to both improve the model and to prioritize the selection of variables that predict variation in the labor market outcomes within states. d. Model estimation We begin by constructing two separate estimates of LFP and unemployment in Mexico: 1. Direct estimates using only the pseudo survey of sampled AGEBs, which can only be constructed for in-sample municipalities. 2. Small area estimates using the pseudo survey of sampled AGEBs and the auxiliary geospatial data, which can be constructed for all municipalities, both in and out of sample. These estimates are compared with the full census results along several different dimensions to assess their accuracy, as discussed below. To generate the small area estimates for municipalities, we utilize a standard nested error model at the AGEB level (Battese, Harter, and Fuller, 1988, Jiang and Lahiri, 2006, Molina and Rao, 2010). The transformed version of the labor market outcome at the AGEB level is modeled as a linear function of geospatial auxiliary variables. The dependent variable was transformed using an arcsine transformation, a common transformation for proportions bounded between zero and one. The model used to generate municipal estimates is specified as follows: (1) G( ) = 1 + 2 + 3 + + , s=1,…,S,m=1,…,Ms; a=1,…,Asm, where G( ) denotes the arcsine of one of the four estimated labor market outcomes (male and female LFP, and male and female unemployment rate) of AGEB a within municipality m and state s, derived from the sample. The model is assumed to hold for the full population of AGEBs although it is estimated using the sample. Xsma is a vector of geospatial auxiliary variables specified at the AGEB level, while Xsm is a vector of geospatial auxiliary variables aggregated to the municipal level. Xs is a vector of state dummy variables. β1, β2, and β3 are vectors of regression 2 coefficients, is distributed normally with variance , and is a set of municipality 2) conditional random effects. ~(0, is a residual error term. As noted above, the set of predictor variables was selected using the BIC-minimizing lasso procedure from the full set of candidate variables, with the full set of state dummies Xs included with no penalty. For the model used to generate estimates for states, the specification is similar, except that the conditional random effect is omitted from the model: 7 The BIC was based on the unpenalized regression of selected variables rather than the penalized regression, and when estimating the lasso, standard errors were clustered at the municipality level. 8 (2) G( ) = 1 + 2 + 3 + , s=1,…,S,m=1,…,Ms; a=1,…,Asm. The state-level effect is omitted because of the inclusion of a full set of state dummy variables Xs as independent variables. This mechanically causes the estimated variance of to be zero, due to the lack of variation across states in the residual after controlling for state dummies, which effectively drops from the model. In other words, the state-level estimates are generated from what is essentially a fixed effect estimator instead of a conditional random effect estimator. This is possible when generating state-level estimates, but not municipal level estimates, because the survey is sufficiently large to estimate state level results precisely. 8 To generate the small area point estimates, the procedure simulates 100 draws from a normal distribution for both the random error term and the area effect (or for the state level estimates) for each AGEB j. These estimates are then back-transformed, averaged across simulations, and finally aggregated across AGEBs to generate municipal or state-level estimates. When aggregating across AGEBs, we weight using estimated population from WorldPop. 9 Mean squared error estimates are obtained from a parametric bootstrap procedure, as proposed by Gonzalez-Manteiga et al.(2007) and implemented in the R EMDI package (Kreutzmann et al., 2018), based on the theory developed by Butar (1997) and Butar and Lahiri (2003). When estimating the parameters of models (1) and (2), observations are weighted using normalized inverse probability weights, using the weighting method implemented in the R nlme package (Pinheiro, et al., 2021). Because each municipality is given equal weight when evaluating the estimates, it is necessary to normalize the weights to avoid giving too much weight to more populous municipalities in the sample estimation. 10 We therefore normalize the weights by dividing the inverse probability of selection by the mean inverse probability of selection for the municipality, as follows: −1 = 1 where is the number of AGEBs contained in the sample for ∑ −1 =1 municipality m and is the probability of AGEB a being included in the sample. 11 This normalizes the sum of the weights in each municipality to equal the number of sample observations, which is one recommended method for normalizing weights when estimating conditional random 8 The same software package was used to estimate both municipal and state level results. We experimented with omitting state-level dummies and the results were worse. 9 Using estimated population from WorldPop has the added benefit of being applicable to other contexts. While we have census information that would allow us to aggregate using arguably more “correct” data, this would not be possible in other contexts in which census data is missing or out of date. For this reason, we drop any AGEBs with an estimated population of zero. Only 0.6 percent of AGEBs are dropped. 10 For some policy contexts, it may be desirable to give greater weight to more populous municipalities when evaluating the accuracy of estimates, but in this analysis, we limit consideration to the case where each municipal estimate is given equal weight when evaluating the performance of the estimates as a whole. 11 We construct the sampling weight as the inverse of the probability of selection, using the formula for sampling with replacement, as an approximation for the actual probability of sampling based on our sampling strategy. In particular, the probability of selection is approximated as = 1 − �1 − � , where popsma = population of AGEB a, poptotal = total national population and = (40, ), the number of individuals in the sample. 9 effects models. 12 Below, as a robustness check, we show that giving equal weight to each municipality worsens the predictions. Finally, we ignore heteroscedasticity in the dependent variable, which is the average AGEB labor market outcome taken from the sample. A sample of 40 individuals was taken from each AGEB, except for the few cases where AGEBs contain fewer than 40 working aged persons, which implies that heteroscedasticity is not a first order concern when estimating labor force participation rates. Different rates of labor force participation across sample AGEBs will affect the number of labor force participants used to calculate unemployment rates from the sample, but we do not adjust for this source of heteroscedasticity. 13 Population counts for each AGEB taken from the 2020 census were used as population weights when aggregating AGEBs to municipal estimates. We used a modified version 2.0 of the EMDI package in R, modified to incorporate sample and population weights, to estimate the model (Kreutzmann et al., 2018). 14 e. Evaluating the estimates To evaluate the estimates, we compare each estimate to the population values from the 2020 census. We calculate the following statistics to compare these four separate estimation methods: • Estimated mean squared error. For the direct estimates, this is assumed to be equal to the estimated variance of the mean, estimated using the Horwitz-Thompson approximation.15 For the small area estimates, MSE estimates were generated using the parametric bootstrap procedure. • The median estimated relative standard error across municipalities, where the relative standard error for each municipality is defined as the square root of the estimated mean squared error divided by the point estimate. • Coverage rate: This is the share of municipalities for which the actual census value falls within the 95% confidence interval generated by the estimate. 12 See Rabe-Hesketh and Skondral (2006). In conditional random effect models, unlike in standard random effect models, multiplying the weights by a constant affects the parameter estimates. 13 This is partly because the high number of municipalities in the sample with zero unemployment is a more serious issue than heteroscedasticity when estimating unemployment rates. 14 The number of simulations and bootstraps, L and B, were each set to equal 100. The weighting strategy relies on passing through the specified weights to the lme function in the R nlme package that estimates the conditional random effect model (Pinheiro et al., 2020). In addition, we take a weighted mean instead of a simple mean when aggregating point estimates for AGEBs to generates estimates for municipalities. In the latter case, the aggregation weights are the population of the AGEBs. 15 We use the Horvitz-Thompson approximation of the variance described in Molina and Marhuenda (2015), in which the variance of the sample mean for municipality m is estimated as: � � = ∑ (1 − ) 2 � � =1 Where m is a particular municipality, i is an individual within municipality m, Nm is the number of relevant individuals in the municipality, w is the inverse probability sample weight, and ymi is the binary outcome variable (participation or unemployment) for individual i. 10 • Estimated relative bias: This is the deviation of the estimate from its true value, expressed in percentage terms relative to the true value. • Estimated mean absolute error: This is the absolute value of the difference between the estimated rate and the actual census rate. • Estimated correlations with census value: This is a simple correlation between a point estimate and the census value and is a measure of the accuracy of the prediction. • Estimated rank correlations with census value: This is the spearman rank correlation between a point estimate and the census value, a measure of the accuracy of the predicted rank. Estimated in this context refers to the fact that these statistics are estimated using one sample. In each case, statistics are calculated giving equal weight to each municipality. All comparisons between direct survey estimates and the small area estimates are restricted to in-sample municipalities. This makes for more accurate comparisons, since the direct estimates only pertain to these municipalities. Later, we also show how results for out-of-sample municipalities compare with results for sampled municipalities. f. Repeated simulations A concern with drawing inferences about the performance from one sample is that any particular sample may not be representative. Since we construct this sample using a synthetic population, we can simulate the performance of the estimators across multiple samples. We therefore simulate 100 separate samples and calculate point estimates – for both direct estimates and small area estimates – for each of these samples. These are our preferred measure of the accuracy of small area estimates and direct estimates. However, due to computing time considerations, we do not perform a parametric bootstrap to calculate mean squared error in the simulation. 16 Therefore, our estimate of the efficiency gain due to small area estimation comes from only one sample. g. Second-stage sample size As noted above, a key issue with respect to modeling unemployment rates is that the indicator is very close to zero in Urban Mexico, especially for women. In our sample, for instance, female unemployment is less than 1.5 percent. However, our sample size for each AGEB is only 40 and around half of those are women, for just 20 women per AGEB. Additionally, labor force participation is about 50 percent, meaning that unemployment is necessarily calculated from a sample of sometimes as small as just 10 women. With an average unemployment rate of less than 1.5 percent, the great majority of these AGEBs have an estimated unemployment rate of zero.17 The same is true for men. 16 Drawing the sample, calculating the estimates, and performing the parametric bootstrap takes around two hours for a single sample. Since we simulate 100 different samples, this would take more than eight full days to estimate MSE across all simulations. Instead, we opt to calculate just point estimates – which takes only around 20 minutes for a single sample – and compare the performance of the point estimates. 17 For male unemployment, approximately 72 percent of AGEBs have a direct estimate of zero. For female unemployment, it is 87 percent. 11 The low unemployment rates in the population lead to two issues. First, the estimated R2 of the model is very low, meaning that the linear model predicts little of the variation in unemployment rates in the sample. This is not surprising, given the large amount of sampling error in the AGEB- level unemployment rates in the simple random sample. This does not rule out the possibility that the small area estimates improve on the direct estimates. 18 In this case, however, the large share of sample municipalities with zero unemployment also downwardly biases the variance estimate of the municipal conditional random effect. This in turn leads the predictions to give too much weight to inaccurate predictions from the model, relative to the sample, which in turn generates inaccurate estimates. The most direct approach to this problem would be to implement a non-linear model such as a two-part model (Belotti et al., 2015), although doing so in an empirical best framework with existing software is not straightforward. However, increasing the size of the second stage of the sample may mitigate this issue by reducing the number of municipalities with zero unemployment in the sample. 19 As a robustness check, we calculate small area estimates using a hypothetical full enumeration of sampled AGEBs. In other words, we use the actual census value of the labor market outcomes in each selected AGEB to estimate the model. In many developing countries, it is not uncommon to perform a full listing of all households in selected enumeration areas before drawing the second stage sample. It may therefore be feasible to collect data on low-probability events like unemployment from all adults in selected enumeration areas, as part of a listing exercise. 3. Results A. Model diagnostics Before comparing results against the full census, we check the characteristics of the models themselves. Table 2 shows the number variables selected, the R2 values of the regression, and a variety of other model diagnostics for the lasso and post-lasso results for all four outcome variables. The first panel presents the results for the state-level estimates, while the second and third panels present results for the municipality-level results. Marginal R2 represents the variance explained by the auxiliary variables, while conditional R2 represents the variance explained by both the auxiliary variables and the conditional random effect. The R2 values for the simple random sample that generates municipal level estimates are generally lower than individual level models, particularly for modeling unemployment. 20 However, the conditional R2 of the female LFP model is significantly higher, at approximately 0.3, and EBP models can perform well even in cases when the available covariates are not strong predictors (Marhuenda et al, 2017). 18 This is counterintuitive but follows from Stein’s paradox. Stein’s paradox implies that a shrinkage estimator that takes a weighted average of direct survey estimates for subpopulations and the grand mean, when there are three or more subpopulations, can substantially improve predictions for the subpopulations (Stein, 1964, Efron and Morris, 1977). This is also consistent with Marhuendra et al.(2017), who note that EB estimates can perform well even in cases where the available covariates are not strong. 19 The problem can also be fixed by using a non-linear estimator such as a two-part model (Belotti et al., 2015). R s may increase after adjusting for sampling error (Li and Lahiri, 2018). 20 2 12 Appendix Table A1 shows the actual post-lasso model coefficients for the random sample results. In general, areas that are more urban on average are associated with higher female LFP, as mean vegetation intensity is negatively associated with participation, while the correlation is positive for the number of residential roads and street crossings. Consistent with this, areas that have been urbanized more recently, as indicated by the mean year of switching to impervious surfaces, have lower participation rates all else equal. Furthermore, smaller municipalities (by area), many of which are in Southern Mexico, are correlated with lower female labor force participation all else equal. The relationship between other indicators of urban density and female labor force participation is more complex, however. Areas that are more heterogeneous in terms of vegetation intensity, such as the suburbs, are other things equal associated with higher labor force participation. The relationship between night-time lights and female LFP is another example. The median value of nighttime lights is positively associated with female LFP while the mean value is negatively associated, although the latter is not statistically significant. This nonetheless suggesting that AGEBs with a large number of very bright pixels, perhaps due to the presence of highways, have lower female labor force participation rates. Some of the infrastructure variables are more easily interpretable, as well, but these are rarely significant. The number of schools is significantly positively correlated with higher male LFP but does not show up for any of the other three indicators. Appendix Table A2 decomposes model R2 across different categories of variables. The latter shows that the determinants of male and female LFP differ significantly. For example, the geographic size of the AGEB and municipality, nighttime light luminosity, the mean of the vegetation index, the number of different points of interest recorded in Open StreetMap, and the year that pixels changed to being an impervious surface are important predictors of female LFP. Municipal area is positively associated with female LFP, which makes sense in the Mexican context because many of the smaller municipalities are located in the poorer south of the country where female LFP tends to be lower. Median nighttime luminosity in the municipality is positively associated with female LFP, consistent with nighttime lights indicating economic activity. Mean nighttime lights in the municipality, which conditional on median is a measure of right-skewness or inequality in luminosity, is negatively associated with LFP. This is consistent with women in more developed areas being able to afford not to work. Areas with a greater concentration of vegetation, as proxied by mean NDVI, have lower female LFP, perhaps also reflecting greater wealth in suburban areas with more vegetation. The correlates of male LFP are more difficult to interpret. The share of bare land in the municipality is positively and significantly correlated with male LFP, as is the number of untagged highway points, and the latter may be positively correlated with economic activity. Table 2 - Model diagnostics by dependent variable and sample type Female Male Female Male LFP LFP Unemp Unemp State - Simple Random Sample within Selected AGEBs Number of (non-state) variables selected 41 22 19 10 Marginal R2 0.127 0.061 0.025 0.028 13 Of which due to state dummies 0.055 0.018 0.017 0.022 Conditional R2 0.127 0.061 0.025 0.028 Skewness of state effect N/A N/A N/A N/A Kurtosis of state effect N/A N/A N/A N/A Estimated variance of state effect N/A N/A N/A N/A Estimated household residual N/A N/A N/A N/A Ratio of estimated state effect variance to total N/A N/A N/A N/A variance Municipality - Simple Random Sample within Selected AGEBs Number of (non-state) variables selected 41 22 19 10 2 Marginal R 0.132 0.071 0.038 0.032 Of which due to state dummies 0.055 0.018 0.017 0.022 2 Conditional R 0.293 0.170 0.132 0.077 Skewness of municipal effect -0.102 0.222 4.047 1.011 Kurtosis of municipal effect 4.840 6.168 36.395 5.679 Estimated variance of municipal effect 0.004 0.002 0.0013 0.0008 Estimated household residual 0.017 0.017 0.012 0.017 Ratio of estimated municipal effect variance to 0.186 0.107 0.098 0.046 total variance Municipality - Full Enumeration of Selected AGEBs Number of (non-state) variables selected 33 26 28 39 2 Marginal R 0.277 0.224 0.164 0.158 Of which due to state dummies 0.113 0.060 0.082 0.100 2 Conditional R 0.597 0.451 0.467 0.525 Skewness of municipal effect -0.610 -0.394 2.460 1.613 Kurtosis of municipal effect 6.128 4.779 24.136 12.472 Estimated variance of municipal effect 0.004 0.001 0.0011 0.0015 Estimated variance of household residual 0.005 0.003 0.0020 0.0020 Ratio of estimated municipal effect variance to 0.442 0.292 0.363 0.436 total variance Number of AGEBs 7,733 7,733 7,733 7,733 Number of municipalities 1,072 1,072 1,072 1,072 Dependent variable was transformed using arcsine transformation. We focus here on the municipality-level diagnostics. The marginal R2 values are generally low for the simple random sample, especially for unemployment rates. The low R2 does not necessarily mean that the small area estimates do not improve on the direct estimates. The smaller the sample and the greater the heterogeneity in outcomes across AGEBs, the more likely a small area 14 estimation model is to improve upon sample estimates. 21 However, the low R2 does raise concerns about biased estimates at the tails of the distribution, especially when estimating unemployment rates using the simple random sample. We expect that the combination of low marginal and conditional R2 may lead to particularly inaccurate estimates for male and female unemployment rates when using the simple random sample. Model predictions for labor force participation using the simple random sample look significantly better for LFP than for unemployment, however, especially for women where the marginal and conditional R2 values are 0.132 and 0.293 (as opposed to men where the same R2 values are 0.071 and 0.17). The ratio of the estimated variance of the area effect to total variance is 0.19 for female participation and 0.11 for male participation, while the corresponding ratio for unemployment is about 0.1 for women and about 0.05 for men. Using the full enumeration sample, not surprisingly, dramatically improves the prediction models. Marginal R2 rises to about 0.27 for female LFP and 0.22 for men, and to approximately 0.16 for unemployment rates. The ratio of the variance of the estimated area effect to total variance ranges from 30 to 44 percent, depending on the indicator. Compared with the models developed using the simple random sample, more weight is given to the sample, which more accurately reflects municipal outcomes when sample AGEBs are fully enumerated. The higher R2s of the models reflects the elimination of the considerable noise in the sample data, which drove down the R2s of the models estimated using the simple random sample. Finally, Appendix Figure A1 shows quantile-quantile plots for both components of the residuals, which is useful to check whether they appear to be normally distributed as assumed. The residuals are highly non-normal in most cases, except for the random area effect for male and female LFP. The non-normal random effects for unemployment, as well as non-normality in the household error terms may contribute to biased estimates of outcomes and uncertainty, which we will assess by comparing the small area estimates with the census. Future research can explore alternative transformations, such as the rank transformation used in Masaki et al.(2020), that might help achieve a more normal distribution. B. State-Level Results We first consider estimates at the state level, the level at which the survey is considered to be representative. We present the seven statistics listed in section 2.e, separately for direct and small area estimates, in Table 3. The results are striking: Even though the direct estimates are considered to be reliable at the state level, we see large improvements in all seven statistics, across all four indicators, when incorporating geospatial data. The estimated mean squared error is smaller, the estimated correlation with the full-census benchmark increases, and the estimated absolute deviation from the benchmark decreases. Median relative bias declines for all outcomes. Coverage rates improve, sometimes dramatically. The estimates of unemployment rates improve on the direct estimates on most dimensions, even though the R2 of some models is low. Because the outcomes measured in 21 For example, suppose as in this case that the sample covers 15 percent of the enumeration areas. Even a model that explains 10 percent of the variation in the remaining 85 percent of the population contributes at least the additional 8.5 percent of the total variation in the population not captured by the sample, in addition to any improvement in the estimates for sampled areas due to averaging samples with model predictions. 15 the sample contain a large amount of random measurement error, even models with low R2 can contribute significant amounts of additional information and improve the estimates. Table 3 - State-level Results (1) (2) (3) (4) Male LFP Female Male Female LFP Unemp Unemp Average estimated mean squared error (x1,000) Direct 0.3787 0.1983 0.0067 0.0031 SAE 0.0899 0.1263 0.0055 0.0021 Average estimated mean squared error (x1,000) with equalized coverage rates Direct 0.4048 1.2653 0.0555 0.0105 SAE 0.0898 0.1264 0.0055 0.0021 Coverage rate Direct 0.8750 0.7188 0.5625 0.5000 SAE 0.9375 1.0000 0.9062 0.6875 Estimated mean absolute error Direct 0.0179 0.0193 0.0048 0.0032 SAE 0.0088 0.0073 0.0023 0.0024 Median estimated relative bias Direct 2.1164 -3.0573 -10.7269 -12.3980 SAE -0.0757 -0.4322 1.5223 15.1544 Estimated rank correlation Direct 0.5652 0.7933 0.7911 0.6635 SAE 0.7185 0.9498 0.8867 0.7529 Estimated Pearson correlation Direct 0.6048 0.8141 0.7528 0.6506 SAE 0.7614 0.9583 0.8698 0.5467 States 32 32 32 32 All figures represent unweighted averages or correlation across states. Equalized coverage rates refer to multiplying the estimated MSE of the direct estimates for each state by a constant to achieve the same coverage rate as the small area estimates. Rank or Pearson correlation is the unweighted rank or Pearson correlation between the estimated value and the actual state value. The absolute deviation is the absolute value of the difference between the estimate and the actual state value. Relative bias is 100 times the estimated minus the actual state value, divided by the actual state value. Table A3 in the appendix presents a robustness check which calculates state-level results by aggregating municipality-level estimates up. The key difference between those results and the ones presented in this section are that the ones in Table 3 are estimated at the state level, without a random effect. The estimates in Table A3 are generally slightly worse than or similar to those in Table 3, with the possible exception of one accuracy indicator for male unemployment. The results in Table A3 give no empirical basis to prefer a method that aggregates the municipal estimates 16 with a random effect to one that estimates state results directly without a random effect. The latter has the advantage of being much simpler to estimate, particularly when it comes to measures of uncertainty. C. Municipal-level estimates for sampled municipalities. We now turn to municipality-level statistics, which the survey is not designed to estimate reliably. All reported estimates in this section pertain to in-sample municipalities only, so that the direct and SAE estimators are compared over the same municipalities. We begin by comparing measures of uncertainty for direct estimates and small area estimates. We present these results for all four labor market statistics in Table 4. The estimated mean squared error is always substantially smaller for the small area estimates than the direct estimates. This difference is largest – in percentage terms – for male labor force participation, where the estimated mean squared error is around a third as large as the direct estimate, roughly equivalent to effectively tripling the size of the sample. The second row reports coverage rates, which is the share of municipalities for which the true census value is within 1.96 standard errors of the point estimate. Despite the smaller estimated mean squared error for the small area estimates, the coverage rates are also higher, again across all four indicators. For female labor force participation in particular, there is a significant increase in coverage rates from 87 to 98 percent in the small area estimates. This suggests that uncertainty for the direct estimates is underestimated and uncertainty for the small area estimates may be slightly overestimated. Therefore, for reach of the four indicators, we multiply the MSE of the municipal direct estimates by a constant greater than one to replicate the coverage rate of the small area estimates. When coverage rates are equalized in this way, the estimated MSE falls by a factor of four to five for labor force participation. 22 Table 4 - Measures of Uncertainty and Coverage for Municipal Estimates (1) (2) (3) (4) Male Female Male Female LFP LFP Unemp Unemp Average estimated mean squared error (x1,000) Direct 9.329 5.494 0.288 0.161 SAE 3.331 4.991 0.179 0.093 Average estimated mean squared error (x1,000) with equalized coverage rates Direct 13.895 24.116 N/A N/A SAE 3.340 5.042 0.179 0.094 Median estimated relative SE Direct 0.122 0.156 N/A N/A SAE 0.068 0.143 0.480 0.494 22 Coverage rates cannot be equalized for unemployment rates because of the large share of direct estimates that are zero. 17 Coverage rate Direct 0.971 0.853 0.409 0.215 SAE 0.988 0.990 0.952 0.976 In-sample municipalities 1,072 1,072 1,072 1,072 Only in-sample municipalities are included. The coverage rate is equal to one in a given municipality if the actual census value is within the calculated confidence interval (point estimates +/- 1.96 times the standard error). MSE with equalized coverage rates refer to MSEs after multiplying the estimated variance of the direct estimates by a constant to match the coverage rate of the small area estimates. Relative SE is missing for unemployment due to the large number of zeros in the direct estimates. We note that both male and female unemployment rates are not normally distributed in our sample, even after transforming them, due to the mass at zero. The empirical best predictor method used here assumes a normal distribution. This may contribute to uncertainty if anything being slightly overestimated, as indicated by coverage rates that lie between 95 and 99 percent. We next present statistics intended to measure the accuracy of the estimates, relative to results in the full census. We start with estimated relative bias and absolute deviation in Table 5. These two measures differ along two key dimensions: The former is both relative, in the sense that it gives deviations from low true values greater weight than deviations from high true values, and directional, in the sense that negative and positive bias will cancel out. Estimated absolute deviation on the other hand is both measured in absolute terms and uses an absolute value and is therefore non-directional. It is also important in both cases to distinguish means from medians. Small (census) values of unemployment can lead to very large estimates of relative bias, since the bias formula contains the true value in the denominator. 23 For most policy applications relating to targeting, relative bias may be a misleading measure of accuracy for unemployment rates. This is because relative bias gives absolute discrepancies greater weight in municipalities with low true rates, which does not necessarily correspond to the objective of allocating resources to minimize unemployment. For labor force participation, both estimated mean and median relative bias is modestly higher in the SAE estimates participation than in the direct estimates, though median relative bias is under 1 percent in both cases. For unemployment rates, estimated mean relative bias is much higher, reflecting the lack of explanatory power of the unemployment models, which in turn leads to large relative bias in areas where true unemployment rates are low. However, when it comes to absolute deviation the SAE estimates outperform the direct estimates by a substantial margin, for both LFP and unemployment. For female LFP, for example, the SAE procedure reduces estimated mean absolute error by about 45 percent, from 0.069 to 0.039, and for female unemployment rates the estimated absolute deviation falls by over half. To get a further sense of the accuracy of the estimates, the bottom rows of the table report estimated rank and Pearson correlations. For the labor force participation estimates, the small area estimation increases rank correlation by 0.11 points for men, and by 0.13 points for women, from 0.59 to 0.73. For Pearson correlations the improvement is slightly smaller, 0.08 points for men and 0.11 points 23 For example, the smallest non-zero value for female unemployment is 0.0006349. An estimated unemployment rate of 0.005 would lead to a bias value of 688. 18 for women, but still sizeable. When it comes to the unemployment rate estimates, the estimated rank correlations also show improvements when comparing the SAE estimates to the direct estimates, increasing by 0.9 for men and 0.6 for women (from an admittedly low level). However, the estimated Pearson correlations show a marked decline, falling by 0.22 for men and 0.065 for women. This again reflects errors at the left and right tails of the true unemployment rate distribution, which the model does a very poor job of predicting. Table 5 - Measures of Accuracy for Municipal Estimates (1) (2) (3) (4) Male Female Male Female LFP LFP Unemp Unemp Mean Mean Mean Mean (median) (median) (median) (median) Estimated Relative Bias Direct 0.039 -1.017 -4.363 26.429 (0.306) (-1.288) (-59.069) (-100.000) SAE -0.507 1.423 30.732 99.948 (-0.622) (-0.726) (9.651) (40.522) Estimated mean absolute error Direct 0.061 0.071 0.022 0.017 (0.046) (0.054) (0.015) (0.009) SAE 0.028 0.040 0.009 0.006 (0.022) (0.028) (0.006) (0.005) Estimated Rank correlation Direct 0.471 0.623 0.363 0.324 SAE 0.541 0.750 0.497 0.264 Estimated Pearson correlation Direct 0.493 0.659 0.418 0.385 SAE 0.553 0.770 0.415 0.614 In-sample municipalities 1,072 1,072 1,072 1,072 Only in-sample municipalities are included. Reported statistics are unweighted means, medians (in parentheses), or correlations across municipalities. Relative Bias is defined as 100 times the deviation of the estimated value and the true value divided by the true value. Rank or Pearson correlation is the unweighted rank or Pearson correlation between the estimated value and the actual municipal value. The direct estimates of unemployment actually show negative estimated relative bias. This is likely driven by the fact that most AGEBs with a true unemployment rate of 1 percent are likely to have a sample with an unemployment rate of zero, leading to a consistent underestimate of the true rate. This is clear in Figure 2, which plots bias for male and female unemployment. Values of -100 indicate estimates of zero for unemployment. The direct estimates cluster disproportionately at this lower bound, indicating a large number of zeros. Although this averages out somewhat over municipalities (and different draws, which we return to below), the estimates for a single 19 municipality are often well off the mark. The figure shows that the small area estimates clearly reduce the estimated mean absolute deviation over the direct estimates for all four indicators, especially for small municipalities, despite high mean bias values for unemployment rates. Overall, comparing results in sampled municipalities shows that the small area estimation procedure substantially improves the precision and accuracy of the estimates for labor force participation. Figure 2 – Estimated bias for unemployment D. Out-of-sample municipalities The comparisons in the previous section examined direct estimates and small area estimates for sampled municipalities, but did not address whether out-of-sample predictions are accurate. We first compare selected statistics of these out-of-sample municipalities with the in-sample direct estimates. We present unweighted summary statistics of the estimated MSE, coverage rate, estimated absolute deviation, and estimated rank and Pearson correlation in Table 6. 20 Table 6 - In-sample direct estimates vs. out-of-sample small area estimates (1) (2) (3) (4) Male LFP Female LFP Male Female Unemp Unemp Average estimated mean squared error (x1,000) Direct (in sample only) 9.329 5.494 0.288 0.161 SAE (in sample only) 3.331 4.991 0.179 0.093 SAE (out of sample only) 6.736 10.458 0.379 0.202 Coverage rate Direct (in sample only) 0.971 0.853 0.409 0.215 SAE (in sample only) 0.988 0.990 0.952 0.976 SAE (out of sample only) 0.974 0.956 0.940 0.979 Estimated mean absolute error Direct (in sample only) 0.061 0.071 0.022 0.017 SAE (in sample only) 0.028 0.040 0.009 0.006 SAE (out of sample only) 0.046 0.073 0.013 0.009 Estimated rank correlation Direct (in sample only) 0.471 0.623 0.363 0.324 SAE (in sample only) 0.541 0.750 0.497 0.264 SAE (out of sample only) 0.225 0.312 0.309 0.207 Estimated Pearson correlation Direct (in sample only) 0.493 0.659 0.418 0.385 SAE (in sample only) 0.553 0.770 0.415 0.614 SAE (out of sample only) 0.164 0.298 0.163 0.131 In-sample municipalities 1,072 1,072 1,072 1,072 Out-of-sample municipalities 569 569 569 569 Coverage rate is equal to one if the true census value is within 1.96 standard errors of the point estimate. There are several patterns to note. First, looking at estimated mean squared error, estimates for non-sampled municipalities are approximately half as precise as those from sampled municipalities, and - except for male LFPs - are less precise than the direct estimates for sampled municipalities. Second, for unemployment rates, the coverage rate is markedly better for the out-of-sample small area estimates relative to the direct estimates. Third, small area estimates are far less accurate out of sample than in-sample. For example, for female LFP estimated absolute deviation is approximately twice as large and correlations are nearly twice as low for out-of-sample municipalities compared with in-sample municipalities. Comparisons between survey-based in- sample municipalities and out-of-sample small area estimates are more mixed. With the exception of female LFP, out-of-sample estimates have slightly lower estimated mean absolute deviation than in-sample survey estimates, but far lower correlations. 21 Figure 3 - In sample SAE, out of sample SAE, and direct estimates Figure 3 presents kernel density graphs of deviations from the true value for three separate estimates: in-sample SAE, out-of-sample SAE, and direct estimates (which are only available in- sample). We see the tendency of direct estimates to underestimate the true outcome for unemployment, due to the large number of municipalities with zero unemployment in the sample. The out-of-sample estimates appear to be reasonable when compared to the direct estimates, which are in sample. The in-sample small area estimates clearly perform better than either of the other estimates for LFP, but the difference is less stark between in-sample and out-of-sample SAE for unemployment, especially for female unemployment. Overall, small area estimates for sampled municipalities are far more precise and accurate than both out-of-sample predictions and direct survey estimates. E. Full enumeration of sampled AGEBs As discussed above, a key issue complicating the prediction of unemployment is that unemployment rates in urban Mexico are very low. As a result, the sample contains a large number of municipalities with no unemployed persons, which adversely affects the accuracy of both the direct estimates and small area estimates. The large number of zeroes makes it difficult to predict 22 variation in unemployment rates, and also leads the small area estimation procedure to underestimate the variance of the municipal random effect, which in turn gives these inaccurate predictions more weight relative to the survey data in generating the estimates. Therefore, the small area estimates for unemployment suffer from high levels of relative bias and are more weakly correlated with the census values than the direct estimates. While a non-linear estimation model might improve these estimates, a natural question is whether larger samples might improve the small area estimates for unemployment. This ties into a larger question about the potential benefits of expanding the second stage of samples when combining survey data with alternative data sources. To shed some light on this question, we repeat the small area estimation exercise after simulating a sample in which every household in selected AGEBs is included in the sample. This will reduce measurement error in the dependent variable, which substantially improves the quality of the predictions (Lobell et al., 2021, Engstrom et al., 2021). The exercise in intended to provide an upper bound estimate of the impact of expanding the second stage of the sample. But since listing exercises are common parts of household surveys in many developing countries, it may be possible to include selected questions as part of the listing exercise. The full enumeration results also shed light on the potential benefits of expanding the second stage of samples when combining survey data with alternative data sources. Table 7 - Sample of sampled AGEBs vs. full enumeration of sampled AGEBs (1) (2) (3) (4) Male LFP Female LFP Male Female Unemp Unemp Panel A: All municipalities Average estimated mean squared error (x1,000) Random sample SAE 4.512 6.887 0.248 0.131 Full enumeration SAE 1.306 3.450 0.147 0.049 Coverage rate Random sample SAE 0.983 0.978 0.948 0.977 Full enumeration SAE 0.951 0.943 0.941 0.960 Estimated relative bias (median) Random sample SAE -0.759 -0.816 9.858 56.367 Full enumeration SAE -0.510 -0.275 8.363 12.187 Estimated mean absolute error Random sample SAE 0.034 0.051 0.010 0.007 Full enumeration SAE 0.026 0.042 0.008 0.004 Estimated rank Correlation Random sample SAE 0.436 0.659 0.426 0.241 Full enumeration SAE 0.673 0.748 0.633 0.599 Panel B: In-sample municipalities 23 Average estimated mean squared error (x1,000) Random sample SAE 3.331 4.991 0.179 0.093 Full enumeration SAE 0.869 2.014 0.084 0.033 Full enumeration direct 0.240 0.510 0.043 0.045 Coverage rate Random sample SAE 0.988 0.990 0.952 0.976 Full enumeration SAE 0.970 0.966 0.953 0.965 Full enumeration direct 0.543 0.538 0.537 0.547 Estimated median relative bias Random sample SAE -0.622 -0.726 9.651 40.522 Full enumeration SAE -0.491 -0.325 7.281 10.281 Full enumeration direct -0.041 0.000 0.000 -0.186 Estimated mean absolute error Random sample SAE 0.028 0.040 0.009 0.006 Full enumeration SAE 0.017 0.026 0.006 0.003 Full enumeration direct 0.019 0.030 0.006 0.004 Estimated rank correlation Random sample SAE 0.541 0.750 0.497 0.264 Full enumeration SAE 0.819 0.866 0.805 0.741 Full enumeration direct 0.830 0.852 0.808 0.722 Table 7 presents summary statistics comparing the sample SAE and full enumeration SAE, as well as the direct estimates using the full enumeration of sampled AGEBs. Panel A compares across all municipalities, while Panel B compares only across in-sample municipalities. Not surprisingly, both the small area estimates and the direct estimates using the full enumeration sample perform much better than the small area estimates using the smaller, more typical, sampling strategy. The main finding from this exercise, however, is that the small area estimation only slightly improves on the direct estimates in this case, except for the coverage rate because uncertainty is estimated more accurately. For example, for female LFP, the small area estimates only reduces absolute deviation by 0.3 pp, from 2.9 to 2.6 pp, and only improves rank correlation from 0.86 to 0.875. For other indicators, the benefit of SAE when the second stage of the sample is fully enumerated is even smaller. Figure 4 compares the accuracy of the random sample and full enumeration small area estimates, relative to the full census. The blue dots clearly demonstrate that the random sample SAE predictions for unemployment are close to the overall average, due to the poor predictive power of the unemployment models. In contrast, the orange dots representing the full enumeration unemployment results exhibit far more variation and better track the true rate, as do the LFP results for the random sample. This reflects the absence of variation available in the smaller sample to train an accurate model when predicting unemployment rates. 24 Figure 4 - Sample vs. Full Enumeration Small Area Estimates To sum up, the results show that in most cases the use of this publicly available set of geospatial data improves estimates of labor force statistics in the context of urban Mexico. This improvement is seen at the state level – the level at which the pseudo survey is reliable, as well as at the municipality level, a level at which the pseudo survey is not considered to be reliable. In addition, small area estimation also generates synthetic predictions for non-sampled municipalities, although these are about half as accurate as the estimates for in-sample municipalities, when looking at estimated mean absolute deviation and correlation. When considering in-sample estimates for municipal unemployment obtained from the simple random sample, small area estimates do not always perform well. In this case, the estimated simple correlations between the small area estimates and the full census are markedly lower than those for direct estimates. Because true unemployment rates are close to zero and the sample is small, the simple random sample does not contain sufficient information to generate accurately municipal unemployment estimates with a linear mixed model. This is in part because the method gives too much weight to the inaccurate model predictions vis-à-vis the sample. 24 24 This is not an issue for the state level estimates because even though the model is not predictive, it includes state- level fixed effects which effectively incorporates the sample data into the estimates. 25 4. Robustness checks A. Design-based simulations The results in the previous section use just a single sample to compare results from two estimators: a direct estimator, using just survey results, and a small area estimator, which uses auxiliary information to improve the accuracy and precision of the survey results. However, it is difficult to come to a firm conclusion when using just a single sample. Therefore, we simulate the small area estimates 100 times, using 100 different samples and calculating 100 different point estimates for direct and small area estimates. We do not calculate MSE on each simulation due to computing limitations. Table 8 - Simulation results for municipal estimates (1) (2) (3) (4) Male LFP Female LFP Male Female Unemp Unemp mean mean mean mean (median) (median) (median) (median) Simulated relative bias Direct 0.186 -0.222 0.128 1.451 (0.534) (-0.302) (-58.406) (-100.000) SAE -0.274 1.755 35.627 71.286 (-0.656) (-0.561) (11.734) (24.950) Simulated mean absolute error Direct 0.060 0.070 0.023 0.015 SAE 0.028 0.040 0.009 0.005 Simulated rank correlation Direct 0.443 0.614 0.366 0.361 SAE 0.555 0.742 0.449 0.389 Rank correlations are the average rank correlations across the 100 simulations. Table 8 presents three separate statistics – bias, absolute deviation from the census value, and the correlation with the true value – for direct estimates and small area estimates. Simulated mean and median bias is quite small for both direct estimates and small area estimates for LFP, with the highest rate being 2 percent mean bias for female LFP. The means are always higher than the medians, especially for the small area estimates for unemployment, as very low base rates can lead to very high values in some simulations. 26 In terms of mean absolute error, small area estimates perform markedly better. For LFP, simulated mean absolute error for the small area estimates is between 30 and 50 percent smaller than the direct estimates. For unemployment, the difference is even starker; for both male and female unemployment, the small area estimate’s simulated absolute deviation is less than half the size of the direct estimate’s absolute deviation. The standard deviation across simulations is also markedly lower for SAE, across all four outcomes, with the difference again largest for unemployment. Finally, rank correlations are about 0.1 higher for the small area estimates than the direct estimates for LFP, 0.08 higher for male unemployment, and about the same for female unemployment. Figure 5 - In sample SAE, out of sample SAE, and direct estimates Deviation from truth across 100 simulations Figure 5 plots the deviations from census values across all the simulations. Both male and female unemployment have a mass below zero for the direct estimates, implying an underestimate of the actual value. Moreover, there is a long right tail for the direct estimate. This is even more pronounced in smaller municipalities, for whom the sample size tends to be much smaller (since fewer AGEBs are selected in each simulation). Figure 6 plots the same kernel density estimates, except only for municipalities below the median adult (12+) population. The underestimate for 27 unemployment is more pronounced here, with very little mass at zero, a large mass below zero, and a long right tail. Figure 6 - In sample SAE, out of sample SAE, and direct estimates Municipalities below median population We present one final graph to better understand the variability in the direct and small area estimates across simulations. Figure 7 plots each municipality across the x axis. The y axis is the deviation from truth, with the bars showing the range (maximum value to minimum value) across the 100 simulations for each municipality. The results are stark; the range for direct estimates are markedly larger than for small area estimates. This is especially true for unemployment, with most municipalities having some very large deviations in the direct estimates, but very few showing similar deviations with the small area estimates. It is worth noting that there do appear to be some municipalities that are consistently estimated poorly by the small area estimation. For both male and female LFP, we see a few municipalities that are always overestimated. This of course does not happen with the direct estimates, which are a random sample and thus generally cross zero. 28 Figure 7 - Range of deviation from truth across simulations A natural question is whether we can say anything about the characteristics of the municipalities for which predictions are overestimated. One obvious characteristic is population. We look at municipalities for which the minimum deviation is above 0.1, meaning that the entirety of the range lies quite a bit above zero and that we are consistently overestimating LFP for these municipalities. For male LFP, there are 33 such municipalities. The average number of adults in these 33 municipalities is 3,223. This is in stark contrast to the other municipalities, with a mean of 49,618 adults. Similarly, for female LFP, there are 83 such municipalities, with a mean adult population of 3,898, compared to a mean of 51,071 for the municipalities that are more accurately estimated. It seems that municipalities with less accurate predictions are, on average, much smaller municipalities. Moreover, if we look at simple unweighted correlations, smaller municipalities are also less accurately predicted in general across the simulations. For example, the correlation between the small area estimates for male LFP and the actual census value for municipalities above the median adult population is 0.560, while the correlation for municipalities below the median is just 0.317. We see similar differences for female LFP (0.677 vs. 0.532), male unemployment (0.399 vs. 0.157), and female unemployment (0.286 vs. 0.040). If we weight by the relevant population value, the differences are even larger. 29 One interesting area of future research will be to determine whether these smaller municipalities are simply more idiosyncratic and, thus, harder to predict, or whether they differ structurally in observables ways. If the latter is true, different types of data or methods might improve estimation for the smallest municipalities. In other words, is it possible to improve the estimation of these municipalities, or are there simply weaker systematic relationships between geospatial data and labor market statistics for these areas? B. Area-level model An alternative method for combining survey data and geospatial data is to obtain municipal estimates is an area-level model (Fay and Herriot, 1979). This method is convenient and well- known, but suffers from a few limitations in this context. The main shortcoming is that it requires aggregating the auxiliary data to the municipal level, which discards variation across AGEBs within municipalities. In addition, the method faces challenges to accurately generate estimates of municipal variance, which is an important input into the model. This is particularly true when there is a significant number of zero values, as is the case for unemployment rates in this context. To better understand whether these potential limitations are important in practice, we examine the results of two area-level modeling exercises. Fay-Herriot models can either be assumed to be linear models or the dependent variable can be transformed prior to estimation. For ratios such as labor force participation and unemployment rates, it is common to transform the variable using an arcsine transformation (Halbmeier et al, 2019). Use of such a transformation adds additional complexity to the procedure, but ensures that predicted participation or unemployment rates lie between zero and one. In addition, using the arcsine transformation is more consistent with the AGEB-level model discussed above. For these reasons, we use the transformed model as our main results, but display the results of a linear model in Table A4. We select a model for each outcome by applying LASSO to the full set of municipal candidate geospatial variables, again selecting lambda to minimize BIC. When using the arcsine transformation, the dependent variable is the survey-weighted estimate of the transformed mean outcome variable of interest in each municipality. When using the linear model, no transformation is applied prior to the LASSO. We then calculate the sampling variance of each outcome using the Horvitz-Thompson approximation, as recommended by Halbmeier et al (2019), and fit the model using the restricted information maximum likelihood option of Stata’s fayherriot package. The R2s of the transformed models are quite low: 0.06 and 0.19 for male and female LFP, and only 0.10 and 0.06 for male and female unemployment. Even after reducing the noise associated with the unemployment measure by aggregating to the municipality level, the geospatial variables are weak predictors of unemployment rates and male LFP, although they predict cross-municipal variation in female LFP somewhat better. Table 9 displays the results from the transformed Fay-Herriot (FH) area-level model. 25 We estimate the FH using an arcsine transformation of the target variable, just like with the sub-area 25 The table omits MSE because the State fayherriot command used to estimate the FH model does not output standard errors. Instead, it outputs confidence intervals derived from a bootstrap procedure. Because the confidence intervals are asymmetric, calculation of the MSE is not straightforward. 30 (AGEB-level) model. Across all statistics and labor market indicators -- except one -- the area- level model performs worse than the sub-area model. The exception is the estimated rank correlation for female unemployment, which the AGEB-level model predicts poorly due to the large number of zeroes in the unemployment indicator. The area-level model often performs better than the direct estimates (with absolute deviation, for example), especially for unemployment. Table A4 in the appendix shows the comparison between the transformed and linear Fay-Herriot model. The transformed area-level model generates much more accurate estimates of uncertainty for unemployment and male LFP than the linear area-level model, as the linear model greatly underestimates mean squared error for these indicators, leading to very low coverage rates. However, the results on accuracy are mixed. For example, when examining correlations, the arcsine estimates are more accurate for male LFP but less accurate than the linear model for female LFP. The results suggest that the linear F-H is preferred to the transformed F-H model in the case of female LFP, when the prediction model is better. However, when considering male and female LFP, the AGEB-level model generates more accurate estimates than both area-level model according to all criteria. Table 9 – Comparison of direct, AGEB-level, and municipal-level (FH) (1) (2) (3) (4) Male Female Male Female LFP LFP Unemp Unemp Coverage rate Direct 0.971 0.853 0.409 0.215 AGEB-level model 0.988 0.990 0.952 0.976 Transformed area-level model 0.887 0.823 0.804 0.842 Estimated Relative Bias (median) Direct 0.039 -1.017 -4.363 26.429 (0.306) (-1.288) (-59.069) (-100.000) AGEB-level model -0.507 1.423 30.732 99.948 (-0.622) (-0.726) (9.651) (40.522) Transformed Area-level model 1.382 4.058 -37.108 -53.035 (0.717) (-1.784) (-51.308) (-76.510) Estimated mean absolute error (median) Direct 0.061 0.071 0.022 0.017 (0.046) (0.054) (0.015) (0.009) AGEB-level model 0.028 0.040 0.009 0.006 (0.022) (0.028) (0.006) (0.005) Transformed Area-level model 0.034 0.053 0.015 0.008 (0.025) (0.040) (0.010) (0.006) Estimated rank correlation 31 Direct 0.471 0.623 0.363 0.324 AGEB-level model 0.541 0.750 0.497 0.264 Transformed Area-level model 0.444 0.652 0.294 0.373 Estimated Pearson correlation Direct 0.493 0.659 0.418 0.385 AGEB-level model 0.553 0.770 0.415 0.614 Transformed Area-level model 0.405 0.619 0.209 0.384 Table A4 in the appendix also presents a comparison of an area-level model using the transformed target variable and an area-level model using a linear (untransformed) model. The transformed model generally performs better than the linear model, with the possible exception of some female LFP statistics (coverage rates, absolute deviation, and correlations). However, the AGEB-level model remains the best choice in this context. 5. Conclusion This paper considers the extent to which combining simulated sample data with publicly available geospatial data improves state and municipal estimates of male and female labor force participation and unemployment rates. Results are compared against the full 2020 census. The small area estimation procedure greatly improves the accuracy and precision of state level estimates for all four indicators, as well as municipal estimates of male and female labor force participation. This method can therefore be used to obtain more granular information on municipalities with low LFP, which can lead to a better understanding of the causes of low female LFP and potential policies to address it. The method does not work nearly as well for estimating urban unemployment, because the values for unemployment rates are very low and zero for many clusters in the sample. In this setting, linear mixed models do not perform well. Although rank correlations and mean absolute deviations both improve, the simple correlation with the census value falls and mean relative bias is very high. In all cases except for one, estimates from a model specified at the AGEB level generate more accurate predictions than those obtained from an area-level model, with the lone exception being when considering the rank correlation for estimated female unemployment rates. Not surprisingly, using a hypothetical enumeration of all households in selected clusters dramatically improves the accuracy of the estimates. But in this case the small area estimates offer very minor improvements in accuracy and precision over direct estimates from the fully enumerated sample. Non-linear models such as two-part models may be an appealing alternative for small area estimation of low-probability events in a sample. In the two-part model, the first part models the probability of a positive unemployment rate and the second part models the unemployment rate conditional on a positive rate. This is feasible to incorporate in a small area estimation framework, but significant effort would be required to implement it in existing software. 32 Another important area for further work is to continue to experiment with additional forms of geographically comprehensive auxiliary data to better predict labor market outcomes. This could include, for example, information on building footprints or the presence of different types of businesses. New types of geospatial data are being released each year that can be incorporated into this type of approach. There is room for further work on model diagnostics. The results of this study demonstrate that R2 can be a misleading metric for assessing the usefulness of the model, since even models with relatively low R2s can contribute important information to reduce bias in the predictions (Marhuenda et al., 2017). At the same time, model R2s are much higher in the full enumeration sample, but the small area estimates contribute little additional information. An important research agenda for further work is to better understand which diagnostic indicators, in cases where census data is not available, can provide a rough assessment of the gains from incorporating geospatial data. Another open question is how optimal sample design changes in the presence of free, predictive geospatial data that can be linked to surveys. The evidence presented above indicates that estimates for sampled municipalities are substantially more accurate than non-sampled municipalities, suggesting that surveys should try to cover all target areas if possible. It is not clear, however, how the benefit of small area estimation relates to the sampling structure, and what this implies for optimally structuring household surveys. Importantly, it remains to be seen how well the results here will generalize to other contexts. As such, it will be important to conduct studies to evaluate the methodology in other settings before applying it generally. This is easier said than done, however. In order to validate the methodology, it is best to use a census – or an unusually large survey –as a measure of ground truth. Furthermore, either the survey or ground truth must have either geolocated enumeration areas or, as in this case, highly disaggregated geographic identifiers that can be matched with a shapefile. A final important issue pertains to the appropriate weighting strategy for the conditional random effect models relied upon here and in other small area estimation applications. We make three points regarding weights. The first relates to the choice of whether to give equal weight to each municipality or to weight by population when evaluating the estimates. We consider only the former, as is common in the literature on small area estimation. However, in some policy contexts, there is a strong argument that it is more important to generate accurate estimates for more populous areas than less populous areas. This is a decision best made based on political rather than technical grounds. Second, properly weighing the estimates of a sub-area model such as this one is not trivial. In this case, the weights for each AGEB in the sample were normalized by dividing each weight by its municipal average. This ensured that each municipality was given weight proportional to its sample size, while appropriately giving more weight to more populous AGEBs within municipalities. The latter is important because population weights are used to aggregate AGEB- level estimates to municipalities. Furthermore, in most sub-area models, it would be beneficial to adjust the weights further to correct for heteroscedasticity, to the extent that the number of 33 observations used to construct the sub-area average used as the dependent variable varies across observations. Finally, there are different methods for accounting for weights in conditional random effect models. We used a particular method implemented by Pinheiro et al. (2021), although other methods have also been proposed and implemented. 26 Establishing the pros and cons of different approaches to incorporating weights in small area estimation models is an important topic for further research. 26 See, for example, Skarke and Kreutzmann (2021), who implement the correction for informative sampling proposed by Guadaramma et al.(2018). 34 References Arora, V., Lahiri, P., & Mukherjee, K. (1997). Empirical Bayes estimation of finite population means from complex surveys. Journal of the American Statistical Association, 92(440), 1555-1562. Battese, G. E., Harter, R. M., & Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401), 28- 36. Belotti, F., Deb, P., Manning, W. G., & Norton, E. C. (2015). twopm: Two-part models. The Stata Journal, 15(1), 3-20. Butar, F. B. (1997). Empirical Bayes methods in survey sampling. The University of Nebraska-Lincoln. Butar, F. B., & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112(1-2), 63-76. Burke, M., Driscoll, A., Lobell, D. B., & Ermon, S. (2021). Using satellite imagery to understand and promote sustainable development. Science, 371(6535). Chambers, R., Salvati, N., & Tzavidis, N. (2016). Semiparametric small area estimation for binary outcomes with application to unemployment estimation for local authorities in the UK. Journal of the Royal Statistical Society. Series A (Statistics in Society), 453-479. Efron, B., & Morris, C. (1977). Stein's paradox in statistics. Scientific American, 236(5), 119-127. Erciulescu, A. L., Cruze, N. B., & Nandram, B. (2019). Model-based county level crop estimates incorporating auxiliary sources of information. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(1), 283-303. Engstrom, R., Hersh, J., Newhouse, D., Poverty from Space: Using High Resolution Satellite Imagery for Estimating Economic Well-being, The World Bank Economic Review, forthcoming;, lhab015, https://doi.org/10.1093/wber/lhab015 Esteban, M. D., Lombardía, M. J., López-Vizcaíno, E., Morales, D., & Pérez, A. (2020). Small area estimation of proportions under area-level compositional mixed models. TEST, 29(3), 793-818. Ghosh, M., & Lahiri, P. (1992). A hierarchical Bayes approach to small area estimation with auxiliary information. In Bayesian Analysis in Statistics and Econometrics (pp. 107-125). Springer, New York, NY. Gong, P., Li, X., Wang, J., Bai, Y., Chen, B., Hu, T., ... & Zhou, Y. (2020). Annual maps of global artificial impervious area (GAIA) between 1985 and 2018. Remote Sensing of Environment, 236, 111510. González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., & Santamaría, L. (2008). Bootstrap mean squared error of a small-area EBLUP. Journal of Statistical Computation and Simulation, 78(5), 443-462. Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790-794. Jiang, J., & Lahiri, P. (2001). Empirical best prediction for small area inference with binary data. Annals of the Institute of Statistical Mathematics, 53(2), 217-243. 35 Inchauste,M. G; Tavares,P; Reteguis, N; Moreno Herrera,L; Arceo-Gómez, E; Ríos Cázares, A; Santillán, A; Cadena,Kiyomi E.; Iacovone,L; Saucedo Carranza,C; Anderson,M. Mexico - Gender Assessment (English). Washington, D.C. : World Bank Group. http://documents.worldbank.org/curated/en/377311556867098027/Mexico-Gender-Assessment” Klasen, S. (2019). What explains uneven female labor force participation levels and trends in developing countries?. The World Bank Research Observer, 34(2), 161-197. Kreutzmann, A. K., Pannier, S., Rojas-Perilla, N., Schmid, T., Templ, M., & Tzavidis, N. (2019). The R package emdi for estimating and mapping regionally disaggregated indicators. Journal of Statistical Software, 91. Li, Y., & Lahiri, P. (2019). A simple adaptation of variable selection software for regression models to select variables in nested error regression models. Sankhya B, 81(2), 302-317. Lobell, D. B., Azzari, G., Burke, M., Gourlay, S., Jin, Z., Kilic, T., & Murray, S. (2020). Eyes in the Sky, Boots on the Ground: Assessing Satellite‐and Ground‐Based Approaches to Crop Yield Measurement and Analysis. American Journal of Agricultural Economics, 102(1), 202-219. Lobell, D. B., Di Tommaso, S., Burke, M., & Kilic, T. (2021). Twice Is Nice: The Benefits of Two Ground Measures for Evaluating the Accuracy of Satellite-Based Sustainability Estimates. Remote Sensing, 13(16), 3160. López-Vizcaíno, E., Lombardía, M. J., & Morales, D. (2013). Multinomial-based small area estimation of labour force indicators. Statistical modelling, 13(2), 153-178. Marhuenda, Y., Molina, I., Morales, D., & Rao, J. N. K. (2017). Poverty mapping in small areas under a twofold nested error regression model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4), 1111-1136. Masaki, T., Newhouse, D., Silwal, A. R., Bedada, A., & Engstrom, R. (2020). Small area estimation of non-monetary poverty with geospatial data. McBride, L., Barrett, C. B., Browne, C., Hu, L., Liu, Y., Matteson, D. S., ... & Wen, J. (2021). Predicting poverty and malnutrition for targeting, mapping, monitoring, and early warning (No. 2388-2021-383). Marhuenda, Y., Molina, I., Morales, D., & Rao, J. N. K. (2017). Poverty mapping in small areas under a twofold nested error regression model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 180(4), 1111-1136. Molina, I., Saei, A., & José Lombardía, M. (2007). Small area estimates of labour force participation under a multinomial logit mixed model. Journal of the Royal Statistical Society: Series A (Statistics in Society), 170(4), 975-1000. Molina, I. and Y Marhuenda (2015). R package SAE: Methodology. Available at: https://cran.r- project.org/web/packages/sae/vignettes/sae_methodology.pdf Pinheiro, J., D. Bates, S. DebRoy, D. Sarkar, and R Core Team. "nlme: Linear and nonlinear mixed effects models. R package version 3.1-145." (2020). Pokhriyal, N., & Jacques, D. C. (2017). Combining disparate data sources for improved poverty prediction and mapping. Proceedings of the National Academy of Sciences, 114(46), E9783-E9792. Rabe‐Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(4), 805-827. 36 Steele, J. E., Sundsøy, P. R., Pezzulo, C., Alegana, V. A., Bird, T. J., Blumenstock, J., ... & Bengtsson, L. (2017). Mapping poverty using mobile phone and satellite data. Journal of The Royal Society Interface, 14(127), 20160690. Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Annals of the Institute of Statistical Mathematics, 16(1), 155-160. Ugarte, M. D., Goicoa, T., Militino, A. F., & Sagaseta-López, M. (2009). Estimating unemployment in very small areas. SORT-Statistics and Operations Research Transactions, 49-70. Wardrop, N. A., Jochem, W. C., Bird, T. J., Chamberlain, H. R., Clarke, D., Kerr, D., & Tatem, A. J. (2018). Spatially disaggregated population estimates in the absence of national population and housing census data. Proceedings of the National Academy of Sciences, 115(14), 3529-3537. World Bank. World Development Report 2021: Data for Better Lives. The World Bank, 2021. Yeh, C., Perez, A., Driscoll, A., Azzari, G., Tang, Z., Lobell, D., ... & Burke, M. (2020). Using publicly available satellite imagery and deep learning to understand economic well-being in Africa. Nature communications, 11(1), 1-11. Zhang, Y., Li, R., & Tsai, C. L. (2010). Regularization parameter selections via generalized information criterion. Journal of the American Statistical Association, 105(489), 312-323. 37 Appendix Table A1 - Post-lasso regression coefficients for simple random sample models (1) (2) (3) (4) Female Male Female Male LFP LFP Unemp Unemp b/se b/se b/se b/se Baja California 0.1064** -0.0115 -0.0111 -0.0144 Baja California Sur 0.0495** -0.0263 0.0025 0.0019 Campeche 0.0072 0.0225 0.0040 -0.0065 Coahuila de Zaragoza -0.0318 -0.0515** 0.0031 0.0242* Colima 0.0654** 0.0122 -0.0088 0.0012 Chiapas 0.0248 0.0239 0.0478** 0.0203 Chihuahua 0.0157 -0.0363** -0.0057 0.0096 Ciudad de México 0.0417 0.0172 -0.0008 -0.0203 Durango -0.0180 -0.0330** -0.0088 0.0380** Guanajuato 0.0144 -0.0024 -0.0048 0.0104 Guerrero 0.0854** 0.0092 0.0006 0.0108 Hidalgo 0.0439** -0.0085 -0.0020 -0.0099 Jalisco 0.0517** 0.0222 -0.0227** -0.0266* México -0.0075 -0.0008 -0.0015 0.0060 Michoacán de Ocampo 0.0518** 0.0336* -0.0080 0.0030 Morelos 0.0819** 0.0141 0.0277* 0.0156 Nayarit 0.0994** 0.0113 -0.0115 -0.0163 Nuevo León -0.0037 -0.0123 -0.0073 -0.0003 Oaxaca 0.0917** -0.0013 -0.0108 0.0020 Puebla -0.0005 0.0289* -0.0031 0.0054 Querétaro 0.0596** 0.0483** -0.0129 0.0058 Quintana Roo 0.0659** 0.0464* 0.0055 -0.0115 San Luis Potosí 0.0318* -0.0109 -0.0032 0.0078 Sinaloa 0.0385 -0.0368* -0.0077 -0.0162 Sonora 0.0463* -0.0466** 0.0012 0.0335* Tabasco 0.0014 0.0062 0.0153 0.0425* Tamaulipas 0.0129 -0.0045 0.0089 0.0187 Tlaxcala 0.0222 0.0312* 0.0174 0.0450** Veracruz 0.0350* 0.0031 0.0070 0.0261* Yucatán 0.0219 0.0233 -0.0163 -0.0171 Zacatecas -0.0235 -0.0499** -0.0104 0.0241 Share of AGEB classified as cropland -0.0003 Share of AGEB classified as urban -0.0004** Mean NDVI -0.2772** Median NO2 57.5187 Number of untagged roads 0.0012 Number of platforms/bus stops 0.0001 Number of residential roads 0.0064* Number of secondary roads 0.0030 -0.0067** Number of service roads -0.0009 Number of unclassified roads -0.0025 -0.0004 Length of untagged highways 0.0001 Length of footways 0.0008 -0.0031** -0.0005 Length of living streets 0.0011 Length of paths/trails -0.0007 0.0004 Length of primary highways 0.0008 -0.0003 Length of primary link highways 0.0003 Length of tertiary link roads 0.0027 Number of mini roundabouts 0.0122 Number of emergency escape ramps in -0.0001 municipality Number of roads in municipality 0.0000 Number of trunk roads in municipality 0.0000 Total length of crossings in municipality 0.0009** Total length of motorways in municipality -0.0000** Number of untagged highway points in 0.0016 0.0046 municipality Number of crossings in municipality -0.0017 Number of give way signs in municipality -0.0257* Number of residential highways in municipality 0.0025 Number of townhalls in municipality -0.0032 -0.0037 Number of public marketplaces in municipality -0.0022 Number of places of worship in municipality 0.0067 Number of fountains in municipality 0.0028 Number of miscellaneous companies in -0.0030 municipality Mean nighttime lights in municipality -0.1097 -0.0013 Median nighttime lights in municipality 0.1446* 0.0031 Mean year of switch to impervious surface in -0.0024** municipality Share of bare land cover in municipality 0.0009 0.0109* Share of tree land in municipality 0.0019 Standard deviation of vegetation index in 0.4751** municipality Earliest year of switch to impervious surface in 0.0018 0.0070** municipality Latest year of switch to impervious surface in 0.0094** 0.0078** 0.0010 municipality Log of geographic area of municipality 0.0121** Number of steps in municipality 0.0019** Number of schools in municipality 0.0014** Number of bus stations in municipality -0.0395 Median year of switch to impervious surface in -0.0029 -0.0020 municipality Share of shrub land in municipality -0.0006 Minimum nighttime lights in municipality -0.0006 Number of secondary link roads -0.0080** Number of trunk roads 0.0000 Share of urban land in municipality -0.0006 Std Dev of No2 levels in municipality -0.0100* Intercept -0.0013 Number of observations 7,733 7,733 7,727 7,733 R-squared 0.1258 0.0623 0.0189 0.0220 Dependent variable is transformed rate, transformed using arcsine transformation. * 0.05 ** 0.01 Table A2 – Shapley decompositions of R2: post-lasso regression results Female Male Female Male LFP LFP Unemp. Unemp. State dummies 25.3% 27.1% 62.9% 73.4% Area of municipality 11.5% 0.5% Land Cover 4.6% 27.3% 7.7% 1.2% Vegetation Index 10.2% 0.7% 2.3% Night-time lights 11.3% 1.6% 14.0% 14.9% Pollution 2.0% Year of switching to impervious surface 9.0% 7.7% 2.5% 2.4% Population 0.9% 6.2% 1.4% Highway counts 5.7% 3.5% 3.9% 2.7% Highway lengths 4.2% 7.6% 3.9% 1.6% Points of interest counts 9.5% 13.7% 0.4% Amenity counts 7.9% 5.4% 2.2% Figure A1 - Quantile-quantile plots of residuals using simple random sample Male LFP Female LFP Male unemployment Female unemployment Figure A2 - Quantile-quantile plots of residuals using full enumeration sample Male LFP Female LFP Male unemployment Female unemployment Table A3 – Aggregating municipality point estimates to state level (1) (2) (3) (4) Male LFP Female Male Female LFP Unemp Unemp Estimated mean absolute error Direct 0.0179 0.0193 0.0048 0.0032 SAE – State level 0.0088 0.0073 0.0023 0.0024 SAE – Muni aggregation 0.0091 0.0117 0.0023 0.0023 Median estimated relative bias Direct 2.1164 -3.0573 -10.7269 -12.3980 SAE – State level -0.0757 -0.4322 1.5223 15.1544 SAE – Muni aggregation -0.1279 -1.9771 -0.1216 14.1461 Estimated rank correlation Direct 0.5652 0.7933 0.7911 0.6635 SAE – State level 0.7185 0.9498 0.8867 0.7529 SAE – Muni aggregation 0.7243 0.9380 0.8904 0.7775 Estimated Pearson correlation Direct 0.6048 0.8141 0.7528 0.6506 SAE – State level 0.7614 0.9583 0.8698 0.5467 SAE – Muni aggregation 0.7441 0.9506 0.8884 0.5652 States 32 32 32 32 Table A4 – Fay-Herriot with arcsine transformation (1) (2) (3) (4) Male LFP Female Male Female LFP Unemp Unemp Coverage rates F-H linear model 0.396 0.909 0.001 0.174 F-H arcsine transformation 0.887 0.823 0.804 0.842 Median relative bias F-H linear model 1.435 -6.127 -100.000 -100.000 F-H arcsine transformation 0.717 -1.784 -51.308 -76.510 Median absolute deviation F-H linear model 0.025 0.039 0.021 0.008 F-H arcsine transformation 0.025 0.040 0.010 0.006 Estimated rank correlation F-H linear model 0.319 0.723 0.193 0.356 F-H arcsine transformation 0.444 0.652 0.294 0.373 Estimated Pearson correlation F-H linear model 0.343 0.708 -0.012 0.155 F-H arcsine transformation 0.405 0.619 0.209 0.384