Global Poverty Monitoring Technical Note 19

Rapid Consumption Method and Poverty and Inequality Estimation in Somalia Revisited

Shinya Takamatsu, Nobuo Yoshida and Aphichoke Kotikula

March 2022

Keywords: Poverty and inequality measurement, consumption measurement, two-part models, predictions, survey methods, multiple imputation, missing data, Somalia.

Development Data Group
Development Research Group
Poverty and Equity Global Practice Group

Abstract

This paper presents updated poverty and inequality estimates from the Somalia High Frequency Survey. This survey used the Rapid Consumption Method to collect consumption data quickly in an environment of high insecurity. Its poverty estimation, therefore, requires imputation of skipped consumption modules. Previous poverty estimates did not properly impute consumption, resulting in the imputation of negative total consumption values for some households. This paper uses the Two-Part Multiple Imputation method to address this issue. The assessment of module-level prediction performance demonstrates that the Two-Part Multiple Imputation handles this issue effectively. In addition, this paper adopts the newly updated 2011 purchasing power parities to convert the High Frequency Survey consumption data for global poverty measurement purposes. Lastly, this paper provides new inequality measures to address issues with the previous exercise. The paper finds that new poverty rates are slightly lower than those using the previous method while inequality is higher with the new method.

JEL codes: C81, I32, D63.

Affiliation: Poverty and Equity Global Practice, World Bank.

Corresponding author(s): Shinya Takamatsu (stakamatsu@worldbank.org).

We are grateful for comments received from the Global Poverty Working Group team (the World Bank) and, in particular, from Wendy Karamba, Christoph Lakner, David Newhouse, Minh Cong Nguyen, Roy van der Weide and Marta Schoch.
This paper also benefited from inputs by Arden Finn. The Global Poverty Monitoring Technical Note Series publishes short papers that document methodological aspects of the World Bank’s global poverty estimates. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Global Poverty Monitoring Technical Notes are available at http://iresearch.worldbank.org/PovcalNet/. 1. Introduction Somalia is highly data-deprived and has not had a reliable poverty number for a long time. To overcome this challenge, the World Bank implemented the second wave of the Somali High Frequency Survey (HFS) to better understand livelihoods and vulnerabilities and, especially, to estimate national poverty numbers. The Somalia Wave 2 (HFS2) consumption data were collected using the Rapid Consumption Method (RCM) to save time on the collection process for security reasons. HFS2 was carried out in 2017-18 and achieved greater geographical and population coverage compared to earlier surveys. It is the first survey to include the Somali nomadic population as well as many households in insecure areas and covered 6,384 households from rural and urban areas in the central regions – Jubaland, Puntland, Somaliland and South West – and urban areas in Banadir. Its sample also included nomads and households in Internally Displaced Population (IDP) settlements from urban areas as well as those in IDP host communities. 
In all, the sample covered 16 out of 17 pre-war regions (Figure 1) and was stratified into 57 strata.1 The Primary Sampling Units were randomly drawn proportionate to size, based on the constructed sampling frame using satellite images for rural areas.

Figure 1: Coverage of Somalia HFS2
Source: World Bank Group (2019).

The RCM requires that consumption in all four food and four nonfood noncore modules, which were randomly allocated across households, be computed for each household (Pape et al., 2020; Pape and Wollburg, 2019). However, the method used in Pape and Wollburg (2019) results in imputing negative total household consumption for 3.6 percent of the imputations (21,779 negatives of 609,200 imputations).2 Not only is negative consumption hard to justify as a concept, but it is also problematic in practice: with negative values, neither poverty indexes other than the headcount ratio nor inequality indexes can be calculated.3 To avoid this practical issue, the inequality and distribution estimates in Pape and Wollburg (2019) are calculated using the household-level mean of the 100 imputed per capita consumption values.4 However, the distribution of this mean is very different from the distribution properly estimated using all 100 imputed consumption values. Figure 2 shows that the distribution of the household mean of the 100 imputed consumption values clearly diverges from the distribution of each individual imputation (imputations one and two are shown in Figure 2 as examples).5

Figure 2: Bias from using the mean of imputed consumption
Source: Authors’ estimation using the Somalia HFS2.

1 Middle Juba was not covered due to security concerns.
2 At the module level, the following modification was made after the initial imputation was produced, though it was not enough to avoid negative total consumption predictions. Whenever the sum of the 100 imputed values for an estimated module was negative, an offsetting value (the absolute value of the negative sum divided by 100) was added to each of the 100 predicted values.
According to Pape and Wollburg (2019), the modification was made to avoid negative imputed values but “without affecting the variance.”
3 Not only inequality indexes but also commonly used poverty indexes, such as the poverty gap and squared poverty gap indexes, cannot be calculated, since per capita consumption enters their formulas and takes negative values for some households.
4 In Pape and Wollburg (2019), these statistics were calculated using each household's mean per capita consumption over the imputations. Due to the modification explained in footnote 2, the means of the module-level consumption are non-negative but the distributions are biased.
5 According to the programs, the poverty headcount indexes in Pape & Wollburg (2019) are not based on the mean consumption, but the other statistics are estimated using the mean consumption.

This paper uses the Two-Part Multiple Imputation method to address the negative consumption issue and applies the updated poverty line that reflects the latest available information. The remaining sections of the paper are organized as follows. Section 2 describes revisions of the poverty line, with explanations of Purchasing Power Parities (PPPs). Section 3 discusses shortcomings of the previous method used to impute poverty, including consumption expenditures that were imputed as negative values (Pape & Wollburg, 2019), and proposes a new imputation method to overcome these shortcomings. Section 4 presents the estimates calculated with the new method and compares its performance with the previous method. Section 5 discusses the new poverty and inequality estimates. These estimates have also been incorporated in the World Bank’s global poverty measures since March 2021 (also see Arayavechkit et al., 2021). Section 6 briefly examines the time-saving effects of HFS2 in Somalia, and Section 7 explores the possibility of simplifying the outputs from the new imputation methodology. Section 8 concludes the paper.

2. 
Revisions to setting the poverty line

Poverty estimates for Somalia are calculated using consumption aggregates from HFS2 that are deflated to 2011 using the official Consumer Price Index (CPI) and then converted with the 2011 PPP conversion factor to obtain a consumption aggregate in 2011 PPP US$, as needed for global poverty measurement. Here, the number of poor and the poverty headcount rate refer to households or individuals whose consumption in 2011 PPP US$ is less than the international poverty line of US$1.90 per person per day.

While the previous poverty estimates for Somalia were calculated using the original 2011 PPP conversion factors published by the ICP (Atamanov et al., 2018), the poverty estimates presented in this note have been updated following the publication of revised 2011 PPPs by the ICP (Atamanov et al., 2020). Because of the absence of official Gross Domestic Product (GDP) data, no revised PPP estimate is available for Somalia. To overcome this limitation, PPPs are calculated using the input data that were used in the original 2011 PPPs and an updated regression model (Atamanov et al., 2020).6 As a result, the revised 2011 PPP conversion factor was estimated at 10,817.2 (Somali Shillings per international US$), while the earlier poverty estimation exercise used 10,731.7 as the conversion factor.

6 Non-benchmark countries, such as Somalia, do not participate in the ICP price collection. Their PPPs are imputed using a cross-country relationship. This imputation model is updated with the revised 2011 PPPs, but the input data used to predict the PPP for Somalia remain unchanged. For Somalia, the standard model for ICP non-benchmark countries is used.

2.1. Impact of updating the PPPs on poverty estimates

Table 1 shows the poverty headcount rate measured at the international poverty line of US$1.90 (2011 PPPs) per person per day at the national level and for Mogadishu, other urban areas, rural areas, IDP settlements and nomads, with the revised poverty line.
The table also compares these data with the poverty rates based on the previous estimation (Pape & Wollburg, 2019). Even after updating the PPP, poverty rates remain almost the same at the national and subnational levels, given the marginal increase in the poverty line with the updated PPP.

Table 1: Poverty headcount rate, US$1.90 poverty line

                    National  Mogadishu  Other Urban  Rural   IDP Settlements  Nomadic population
Previous estimate   69.4%     73.7%      60.3%        72.2%   75.6%            71.6%
Updated PPP         69.8%     74.2%      60.8%        72.6%   75.9%            72.1%

Source: Previously published estimates are taken from Pape & Wollburg (2019) (Figure 6 on p. 32). The rest are authors’ own calculations based on HFS2 2017/18.

3. Imputation Method

3.1. Rapid Consumption Methodology in HFS2

The HFS2 adapted its sampling strategy and logistical arrangements, using micro-listing7 and questionnaire design, to limit time on the ground, based on the RCM (Pape & Wollburg, 2019). The RCM worked as follows. The team divided food and nonfood consumption items into one core module and four optional noncore modules. Core items were selected based on their importance for consumption, and the remaining items were divided among the four optional noncore modules. The core module was then assigned to all households, and one of the four optional noncore modules was randomly assigned to each household. This methodology, therefore, reduces the time spent on collecting consumption data by skipping the remaining noncore modules. Finally, multiple imputation techniques are used to impute total consumption and poverty. Pape and Mistiaen (2018) show that this design delivers reliable poverty estimates.

7 The micro‐listing method divides enumeration areas into smaller enumeration blocks using satellite imagery.
Rather than performing a lengthy full listing of all households in the enumeration area, enumerators list only households in one enumeration block, select the household to be interviewed, and then immediately conduct the interview, greatly reducing the time required in the enumeration areas (Pape and Wollburg, 2019).

Another challenge in Somalia’s RCM data is that some households have different missing patterns of consumption data. As described in Pape and Wollburg (2019), 15 percent of households do not have data for the core food, core nonfood and asset (user-cost) modules. In the RCM, all parts of the core modules should be collected for all households. However, due to the poor quality of the consumption data collected in two regions (rural North-East and Jubaland), these consumption data were discarded from the analysis. Only household characteristics were retained in the database for these 15 percent of households.

3.2. MI-MVN: An imputation methodology used in the previous estimation

Why did the imputation method adopted by Pape & Wollburg (2019) produce negative household expenditures? Pape & Wollburg (2019) used Multiple Imputation-Multivariate Normal Regression (MI-MVN).8 MI-MVN is available in Stata (StataCorp, 2019) as the mi impute mvn command, which simultaneously estimates multiple models to predict each of the noncore modules. There are two sources of the negative values. First, large shares of households consume none of the items in the noncore modules, and the simple model used cannot accommodate noncore module consumption that is censored at 0. As explained in Pape & Wollburg (2019), frequently consumed items were included in the core modules, and as a result, many infrequently consumed items are included in noncore modules. Indeed, 29 to 64 percent of households did not consume any item in a given noncore food module, and 27 to 40 percent did not consume any item in a given noncore nonfood module, as seen in Table 2.
Second, MI-MVN adds to the predicted noncore module an error term randomly drawn from a multivariate normal distribution, and this error term can be a large negative value in the situation just described. As a result, the predicted noncore modules became negative for many households, and the sum of expenditures from all modules became negative.

8 The estimation steps are explained in more detail on pp. 17–25 of Pape and Wollburg (2019); more statistical properties and applications of the MI-MVN method can be found in the mi impute mvn help manual (StataCorp, 2019).

Table 2: Comparison of consumption data across actual consumption data, data imputed by MI-MVN, and data imputed by Two-Part MI

                   Actual consumption data            By MI-MVN                          By Two-Part MI
                   Share of   Mean     Mean (exc.     Share of   Mean     Mean (exc.     Share of   Mean     Mean (exc.
Consumption module positive   (all)    zero)          positive   (all)    non-positive)  positive   (all)    zero)
Food:
  Core module      100.0%     0.693    0.693          99.0%      0.684    0.699          100.0%     0.693    0.693
  NC module 1      71.1%      0.065    0.091          74.6%      0.075    0.115          69.2%      0.077    0.112
  NC module 2      55.7%      0.053    0.094          64.3%      0.050    0.106          57.3%      0.061    0.107
  NC module 3      58.9%      0.045    0.076          65.4%      0.044    0.091          58.0%      0.057    0.099
  NC module 4      35.6%      0.032    0.091          54.5%      0.034    0.108          35.9%      0.045    0.125
Nonfood:
  Core module      93.7%      0.191    0.204          92.2%      0.188    0.219          93.3%      0.191    0.205
  NC module 1      59.6%      0.031    0.052          71.0%      0.033    0.057          61.9%      0.035    0.056
  NC module 2      66.5%      0.037    0.055          67.8%      0.040    0.082          66.8%      0.045    0.067
  NC module 3      73.3%      0.049    0.067          71.5%      0.046    0.084          70.8%      0.049    0.069
  NC module 4      71.4%      0.027    0.037          72.1%      0.030    0.050          72.7%      0.030    0.042
Asset module       96.1%      0.024    0.025          95.4%      0.029    0.031          96.0%      0.024    0.025

Source: Authors’ estimation using the Somalia HFS2.
Notes: NC = noncore.
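The second source of negative values can be illustrated with a small simulation. The sketch below is not the authors' code or the Stata routine: it assumes a hypothetical noncore module in which 40 percent of households consume nothing and the rest have log-normal consumption, and it mimics a normal-model imputation with an intercept-only prediction plus a normal error draw. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical noncore module (assumed values): 40% of households consume
# nothing; the others have right-skewed, log-normal consumption.
consumes = rng.random(n) < 0.6
observed = np.where(consumes, rng.lognormal(mean=-2.5, sigma=0.8, size=n), 0.0)

# Intercept-only normal model: prediction = sample mean + normal error.
# This mimics the error draw of a normal-regression imputation and ignores
# that observed consumption is bounded below by zero.
mu = observed.mean()
sigma = observed.std(ddof=1)
imputed = mu + rng.normal(0.0, sigma, size=n)

share_negative = (imputed < 0).mean()
print(f"observed mean: {observed.mean():.3f}  imputed mean: {imputed.mean():.3f}")
print(f"share of negative imputed values: {share_negative:.1%}")
```

Under these assumptions the imputed mean stays close to the observed mean while a sizeable share of the imputed values is negative, which is the pattern described in the text: means can look right even when the imputed distribution is not.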
Multiple imputation (MI) methods such as MI-MVN have commonly been used to impute missing data in general (Rubin, 1996; Little and Rubin, 2019) and consumption or income data in developing countries (Christiaensen et al., 2012; Dang et al., 2017; Douidich et al., 2016; Newhouse et al., 2014; Stifel and Christiaensen, 2007; Yoshida et al., 2015; Yoshida et al., 2020). However, these applications usually estimate total consumption rather than parts of consumption. Estimating parts of consumption is more difficult than estimating total consumption because not every household consumes a given subset of items; the distribution of the subset is therefore irregular.9 These problems apply to the HFS2 consumption data, where large shares of households do not consume items in noncore modules, as seen in Table 2. The problem is exacerbated because the HFS2 tried to allocate consumption items that are less likely to be consumed to the noncore modules.

9 The literature on consumption imputation often uses a log transformation to convert the distribution into a well-behaved distribution like the normal distribution, but no transformation was used in the previous method, probably because of zero consumption.

For these reasons, using a simple imputation model in this situation resulted in both negative predicted values and a divergence between predicted and observed values for households that consumed optional modules. The MI-MVN imputation included zero values in the training data, but the model does not accommodate a distribution that is censored at zero. Thus, the left tail of the predicted values becomes negative, given the normal distribution assumed in the method. In addition, the distributions of nonzero noncore consumption are highly right-skewed (with a long right tail), as is usual in consumption data, but a log transformation could not be used because it would produce missing values for zero consumption.
The impact of these problems on poverty estimates depends on the shapes of the distributions and the level of a poverty line, and we will show this in Section 4. 3.3. A new imputation technique - Two-Part Multiple Imputation method This paper proposes a different method to estimate consumption. As explained above, the estimation method must handle two cases of households that have different missing patterns. The first group of households do not have any consumption data (about 15 percent) at all. The second group of households have data on core consumption items but not on noncore consumption modules, and the missing pattern follows the RCM design. The previous estimation was based on the MI-MVN method, as mentioned above; while this paper proposes a different imputation methodology that does not impute negative values and can produce a distribution censored at 0. The methodology is called a Two-Part Multiple Imputation Technique (Two-Part MI). Given the two different types of missing patterns, Stata’s mi impute chained command was used since it “fills in missing values in multiple variables iteratively by using chained equations, a sequence of univariate imputation methods with fully conditional specification (FCS) of prediction equations. It accommodates arbitrary missing patterns” (StataCorp, 2019, p.140). 
In each of the 100 simulation steps, mi impute chained imputes variables in order from the most observed to the least observed. With this specific missing pattern, it first imputes the missing values in the core food, core nonfood and asset modules one by one, given the already available non-missing modules, and then imputes the missing values in the noncore food and nonfood modules, given the already imputed modules.10 This iteration algorithm is repeated until the estimated parameters converge, and then we move to the next of the 100 simulations.11 Another benefit of the mi impute chained command is that it allows us to use a variety of popular univariate imputation methods, such as the Gaussian normal and logistic regressions used in the Two-Part MI. Therefore, the Two-Part MI with the mi impute chained command can impute missing consumption for the first and second groups of households together.

To model the situation where large shares of households do not consume noncore consumption modules at all, we used the two-part model that explicitly specifies two stages: 1) whether households consume a module or not; and 2) how much they consume of the module if they do (Cameron and Trivedi, 2005). The two-part model is estimated with the MI method so that the estimates preserve the distribution and have good statistical properties (Rubin, 1987). The estimation can be implemented easily using the MI commands in Stata (StataCorp, 2019).

The Somali HFS has four noncore food (and four noncore nonfood) consumption modules (1-4) that are allocated randomly to households. We index households by ij, with i ∈ {1, ..., N} and j ∈ {1, 2, 3, 4}, where i is the household ID, N is the total number of households, and j is the noncore food module allocated. For example, when household i belongs to group j=1, information is collected from this household on noncore module 1, but not on noncore modules 2, 3, and 4 (which are missing). The noncore nonfood modules are estimated in the same way, but they are omitted from the notation for brevity.
10 The previously used MI-MVN also handles these arbitrary missing patterns, like the mi impute chained command, and estimates core and noncore modules simultaneously with similar iteration steps (StataCorp, 2019).
11 Due to these iteration steps within each of the 100 simulations, we get different final coefficients of the predictors in each of the 100 simulations. Presenting all the estimated model parameters with mi impute chained is not feasible; therefore, the regression results without MI are shown in the appendix table.

The first part of the Two-Part MI identifies households with strictly positive or zero consumption in noncore modules. This part of the estimation method follows the steps described in Rubin (1987, pp. 169–170) and StataCorp (2019, pp. 176–181).

1. We use a logistic regression to estimate the probability of strictly positive consumption in noncore module j' ($y_i^{j'} > 0$), using data from households with j = j', where the inverse logit function is:

$$\Pr(y_i^{j'} > 0 \mid x_i) = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)}$$

2. We randomly draw the parameters $\beta$ from the normal posterior distribution and obtain the predicted probability, $\hat{p}_i^{j'}$, for households with j ≠ j'.
3. If $\hat{p}_i^{j'}$ is greater than a number randomly drawn from uniform(0,1), we predict $y_i^{j'} > 0$, and we predict $y_i^{j'} = 0$ otherwise.
4. Steps 2-3 are repeated 100 times.

The second part of the Two-Part MI estimates positive consumption expenditures for households with $y_i^{j'} > 0$. This part follows the steps described in Rubin (1987, pp. 169–170) and StataCorp (2019, pp. 176–181).

1. We estimate a linear regression model using only households that have strictly positive consumption in module j' ($y_i^{j'} > 0$), after transforming consumption with the natural log and assuming the normal linear regression model:

$$\ln y_i^{j'} \mid (y_i^{j'} > 0,\ x_i) \sim N(x_i \gamma,\ \sigma^2)$$

2. We predict consumption for households where j ≠ j' and $y_i^{j'} > 0$ was predicted in the first part. We randomly draw the parameters $\gamma$ and $\sigma^2$ from the posterior distributions specified in the two references cited above.
3. Step 2 is repeated 100 times.
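The two parts can be sketched in a few lines of code for a single module. This is a minimal illustration under stated assumptions, not the Stata mi impute chained implementation used in the paper: the function names (fit_logit, two_part_impute), the synthetic data and all parameter values are hypothetical, the logit posterior is approximated by a normal distribution around the maximum likelihood estimate, and the linear-model draws use the standard noninformative-prior posterior. Chained iteration across modules and survey weights are omitted.

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Newton-Raphson logistic regression; returns coefficients and covariance."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (d - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(hessian)

def two_part_impute(X_obs, y_obs, X_mis, rng, m=100):
    """Return an (m, n_missing) array of non-negative imputed values."""
    # Part 1 model: probability of strictly positive consumption.
    beta_hat, beta_cov = fit_logit(X_obs, (y_obs > 0).astype(float))
    # Part 2 model: log consumption among strictly positive observations.
    Xp, log_y = X_obs[y_obs > 0], np.log(y_obs[y_obs > 0])
    xtx_inv = np.linalg.inv(Xp.T @ Xp)
    gamma_hat = xtx_inv @ Xp.T @ log_y
    resid = log_y - Xp @ gamma_hat
    dof = len(log_y) - Xp.shape[1]

    draws = np.zeros((m, X_mis.shape[0]))
    for k in range(m):
        # Part 1: draw logit parameters, then compare with a uniform draw.
        beta = rng.multivariate_normal(beta_hat, beta_cov)
        prob = 1.0 / (1.0 + np.exp(-X_mis @ beta))
        positive = rng.random(len(prob)) < prob
        # Part 2: draw sigma^2 and gamma, then predict log consumption.
        sigma2 = resid @ resid / rng.chisquare(dof)
        gamma = rng.multivariate_normal(gamma_hat, sigma2 * xtx_inv)
        log_pred = X_mis @ gamma + rng.normal(0.0, np.sqrt(sigma2), len(prob))
        draws[k] = np.where(positive, np.exp(log_pred), 0.0)
    return draws

# Synthetic example: one predictor, module observed for about one-fourth
# of households (all values below are illustrative assumptions).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
consumes = rng.random(n) < 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * X[:, 1])))
y = np.where(consumes, np.exp(-2.5 + 0.5 * X[:, 1] + rng.normal(0, 0.6, n)), 0.0)
assigned = rng.random(n) < 0.25
imputed = two_part_impute(X[assigned], y[assigned], X[~assigned], rng, m=100)

print("minimum imputed value:", imputed.min())
print("share positive, imputed vs. held-out truth:",
      round((imputed > 0).mean(), 3), round((y[~assigned] > 0).mean(), 3))
```

Because zeros are assigned in the first part and positive values are exponentiated in the second, no imputed value can be negative, and the share of positive values is modelled directly rather than emerging from a normal error term.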
The variables used as predictors are taken from those used in Pape & Wollburg (2019) and are shown in Appendix A.1.

For the 15 percent of households without any consumption data, the core modules need to be estimated in addition to the noncore modules. The noncore module estimation follows the same steps as above, but because these households do not have core consumption modules, the core modules are estimated first for these households. As seen in Table 2, very small percentages of households (6 and 4 percent, respectively) did not consume any items in the core nonfood and asset modules among those for whom core consumption was collected, so the estimation of these core modules followed the same two-part model as above. For the core food module, which all households with collected data consumed, only the second part of the two-part steps is used. With the mi impute chained command, the core modules are imputed first according to the algorithm, and we can use all the households together in this command.

Finally, the 100 imputed values from the first and second parts are combined after imputing the core and noncore modules. Since we have four noncore food and four noncore nonfood modules, we conduct the steps for eight modules individually, in addition to the core food, core nonfood and asset modules. Total consumption is the sum of the eight noncore modules (observed or imputed) and the core consumption modules (food, nonfood, and asset).

The final estimates of the statistics of interest are calculated following Rubin’s combination rules (1987, pp. 76–81), using the Stata mi estimate command to obtain the estimates, standard errors and confidence intervals (StataCorp, 2019). For example, a point estimate of the poverty headcount is the mean of the 100 poverty estimates calculated from each of the imputed data sets.
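The combination step can be sketched as follows. This is a generic illustration of Rubin's rules for a scalar statistic, not the Stata mi estimate output; the function name rubin_combine and the three example estimates are hypothetical, and the within-imputation variances are assumed to be supplied by whatever survey-design estimator is appropriate.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine m per-imputation estimates and variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    point = q.mean()                      # combined point estimate
    within = u.mean()                     # average within-imputation variance
    between = q.var(ddof=1)               # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return point, np.sqrt(total)

# Hypothetical headcount ratios from three imputed data sets, each with an
# assumed sampling variance of 0.0004 (standard error of 0.02).
headcounts = [0.68, 0.70, 0.69]
sampling_variances = [0.0004, 0.0004, 0.0004]
point, std_error = rubin_combine(headcounts, sampling_variances)
print(f"headcount: {point:.3f}, standard error: {std_error:.4f}")
```

The total variance adds the between-imputation component to the usual sampling variance, so the combined standard error reflects the uncertainty introduced by imputing the skipped modules.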
Note that our estimates differ from the inequality and distribution estimates in Pape and Wollburg (2019), which used the mean of the 100 imputed consumption values, because the distribution of the household-level mean consumption is different from the distributions of the 100 imputed consumption values.

4. Comparison of estimation results by MI-MVN and Two-Part MI

To compare the prediction results from the two methods, the poverty and inequality indexes using the original MI-MVN method are shown in Table 3 (second column).12 We observe relatively small differences in poverty estimates between columns one and two. By contrast, we observe larger differences in the Gini indexes between the two columns. This is because the Gini indexes by MI-MVN could not be calculated directly from the imputed values due to negative predictions, so the mean of the 100 imputed per capita household consumption values was used to calculate inequality measures such as the Gini index. However, this resulted in the underestimated inequality reported in Pape and Wollburg (2019).13

12 Note that the modification explained in footnote 2 was implemented.
13 It is known that taking the mean of the “multiply” imputed consumption results in a narrower distribution. This means that the calculated poverty and inequality are usually underestimated.

The national poverty rate became slightly lower than with the previous method, but this is due to the change at the upper tail of the entire distributions with a relatively high poverty line. The national poverty headcount rate is 68.6 percent with the Two-Part MI method, while it is 69.8 percent with the MI-MVN method, as shown in Table 3.
Table 3: Summary of poverty and inequality estimates from two estimation methods

                                      Two-Part MI   Original MI-MVN
Headcount ratio (2011 PPP US$1.90)
  National                            68.6%         69.8%
  Mogadishu                           74.5%         74.2%
  Other urban                         59.1%         60.8%
  Rural                               72.2%         72.6%
  IDP                                 74.4%         75.9%
  Nomads                              70.4%         72.1%
Poverty gap*
  National                            0.291
  Mogadishu                           0.277
  Other urban                         0.232
  Rural                               0.363
  IDP                                 0.334
  Nomads                              0.278
Squared poverty gap*
  National                            0.157
  Mogadishu                           0.131
  Other urban                         0.122
  Rural                               0.220
  IDP                                 0.185
  Nomads                              0.139
Gini index**
  National                            0.368         0.344
  Mogadishu                           0.280         0.255
  Other urban                         0.341         0.335
  Rural                               0.447         0.407
  IDP                                 0.373         0.335
  Nomads                              0.344         0.320

Source: Authors’ estimation using the Somalia HFS2.
Notes: * Poverty gap and squared poverty gap indexes are not well defined with negative consumption, so the estimates for the MI-MVN are not shown. ** Inequality indexes such as the Gini and mean consumption from MI-MVN are calculated from the mean of the 100 simulated per capita consumption values, based on the method reported in Pape & Wollburg (2019).

The national Gini coefficient based on the Two-Part MI is 0.368, which is higher than that of the MI-MVN by 0.024. This increase is due to two differences. In addition to the difference in the imputation method, the Gini from MI-MVN was calculated, in accordance with Pape & Wollburg (2019), by first averaging the 100 imputed expenditure values for each household and then estimating the Gini coefficient from these averages. As discussed above, the distribution of the means has a smaller variance than the true distribution and, as a result, the Gini coefficient is underestimated. On the other hand, this paper estimates the Gini coefficient for each of the 100 imputation rounds and takes the average as the point estimate of the Gini coefficient. This correct approach is possible because the expenditures imputed by the Two-Part MI method do not result in negative household expenditures.

4.1. 
Performance of imputation by MI-MVN and Two-Part MI

The HFS2 collects actual consumption data in the core module, one noncore food module, and one noncore nonfood module from each of the four randomly selected groups of sample households. Its time-saving effects come from skipping three noncore food and three noncore nonfood modules for each household in the sample. The missing consumption data are imputed by MI-MVN in Pape and Wollburg (2019) and by Two-Part MI in this paper. Since each of the noncore food and nonfood modules has actual consumption data for about one-fourth of households, we evaluate the prediction performance of the two imputation methods against these data and summarize the findings in Table 2.

First, the module-level estimates indicate that the MI-MVN results are biased because they misestimate the shares of households that consumed items in each of the modules, leading to the underestimation of consumption and, ultimately, an overestimation of poverty. To understand this, we compare predicted values with the actually observed consumption.14 Counting the module label as column one, columns two, five, and eight in Table 2 show the percentages of households that consumed items listed in individual modules before the imputation, with the MI-MVN, and with the Two-Part MI, respectively. The table shows that the MI-MVN severely overestimated the percentages of households that consumed optional noncore modules (column five), compared to the actual consumption (column two), in four out of eight modules. On the other hand, the predicted percentages derived by the Two-Part MI (column eight) closely replicated those in the actual consumption for all modules. This clearly indicates the benefit of using the two-part approach.

14 Consumption in optional noncore modules was actually collected for one-fourth of households. Predicted values should be close to those actually observed data.

Second, the MI-MVN and Two-Part MI both predicted the overall means and the means excluding non-positive values reasonably well.
As seen in the table, the predicted means including zero (and negative values for the MI-MVN) from the two estimation methods (columns six and nine) are mostly close to those for the actual consumption data (column three). Similarly, the means excluding zero from the two methods (columns seven and 10) are close to those for the actual consumption (column four). Therefore, the MI-MVN seemed as good as the Two-Part MI for the mean prediction; however, predicting the means alone is not sufficient, since we would like to predict not only the means but also the distribution.

Third, the following analysis shows that the predicted distribution using the MI-MVN clearly diverges from the observed distribution, while the predicted distribution using the Two-Part MI is much closer to the observed one. Figure 3 shows the predicted distributions of optional noncore food module 1 using the MI-MVN and the Two-Part MI.15 The predicted distribution using the MI-MVN is very different from the actually observed values, even though the estimated means were close, as shown in the previous paragraph. The predicted values excluding zero and negatives also diverge. In contrast, the figure shows that both predicted distributions, including and excluding zero, using the Two-Part MI closely replicated the distributions of the actually observed module 1. The left panel of the figure shows that the existing zeros are handled well, as opposed to the MI-MVN. Though not shown in the main text, the same findings are observed for the other modules (Appendix Figure A.1).

15 The figures for all of the imputed modules are shown in Appendix Figure A.1, and the observations made for noncore food module 1 also apply to the rest of the noncore food and nonfood modules.
Figure 3: Distribution of observed and predicted optional noncore food module 1 by MI-MVN and Two-Part MI estimations

As for the core food, core nonfood and asset modules that were imputed due to poor data quality, the predicted distributions from the MI-MVN look very problematic. Figure 4 shows that the predicted distribution of the core food module using the MI-MVN has negative values, though the predicted distributions above zero look close to the observed values for both the MI-MVN and the Two-Part MI. Turning to the predicted means shown in Table 2, the Two-Part MI replicated the means almost exactly (columns nine and 10), though the MI-MVN’s predictions are slightly off (columns six and seven).

Figure 4: Distribution of observed and predicted food core module by MI-MVN and Two-Part MI

5. Comparison with previously published poverty and inequality estimates

The update of the poverty line with the new PPP has barely changed the estimation of poverty, as shown in Section 2. We next show the effects of switching the imputation method from MI-MVN to Two-Part MI on these estimates. Table 4 summarizes all effects. The poverty headcount rate at the national level was originally 69.4 percent and is 69.8 percent after updating PPPs. Similarly, the poverty headcount rates at the subnational levels remain almost the same. After switching the imputation methodology, poverty headcount rates decrease slightly: by about 1 percentage point at the national level, with almost no change for Mogadishu and rural areas and a decrease of about 1.5 percentage points for the rest.
Table 4: Comparison with previously published poverty headcount rate, US$1.90 poverty line

| | National | Mogadishu | Other Urban | Rural | IDP Settlements | Nomadic population |
|---|---|---|---|---|---|---|
| Previous estimate | 69.4% | 73.7% | 60.3% | 72.2% | 75.6% | 71.6% |
| Updated PPP | 69.8% | 74.2% | 60.8% | 72.6% | 75.9% | 72.1% |
| Paper’s estimate (Two-Part MI; updated PPP) | 68.6% | 74.5% | 59.1% | 72.2% | 74.4% | 70.4% |

Source: Previously published estimates are taken from Pape & Wollburg (2019, Figure 6, p. 32). The rest are authors’ own calculations based on HFS2 2017/18.

Results of the estimation for the full set of poverty and inequality statistics at the national level are shown in Tables 5 and 6. Inequality estimates are higher because the previous methodology underestimated inequality. The national Gini is 36.8 using the Two-Part MI, while it was 34.4 in the previous report using MI-MVN. The difference is mainly due to the fact that, in the previous estimate, the Gini was calculated using the mean of imputed per capita consumption. As discussed in section 2, the distribution of the mean of imputed consumption is narrower than the individual distributions, so the Gini is underestimated. The Gini estimated with the Two-Part MI is free from this issue: because no negative values are imputed, the Gini can be estimated directly by taking the mean of the 100 Gini indexes calculated from each of the 100 imputed distributions.

As discussed in this section, while the correct approach was adopted for the poverty headcount ratio calculation in the previous report, this was not the case in the estimation of other poverty and inequality statistics. As a result, after correcting this estimation procedure and using the Two-Part MI technique, we find that the poverty gap, the poverty severity measure, and the Gini coefficient all increase significantly.
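The effect of averaging before computing the Gini can be seen in a small simulation. The sketch below uses made-up, unweighted data (the actual estimates use survey weights and the HFS2 imputations), and the function and variable names are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def gini(x):
    """Unweighted Gini index of a 1-D array of non-negative values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n)

rng = np.random.default_rng(0)
n_households, n_imputations = 2000, 100

# Hypothetical data: an observed component that is fixed across imputations
# plus an imputed component that varies from one imputation to the next.
observed = rng.lognormal(mean=0.0, sigma=0.5, size=n_households)
imputed = rng.lognormal(mean=-0.5, sigma=0.8, size=(n_imputations, n_households))
total = observed + imputed  # shape: (n_imputations, n_households)

# Approach described for the Two-Part MI: one Gini per imputation, then average.
gini_mean_of_ginis = float(np.mean([gini(row) for row in total]))

# Approach described for the previous estimate: average consumption first, then one Gini.
gini_of_mean = gini(total.mean(axis=0))
```

Because averaging the imputed component across imputations removes most of its household-level variation, `gini_of_mean` comes out below `gini_mean_of_ginis` in this setup, which mirrors the direction of the underestimation discussed above.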
Table 5: Comparison with previously published Gini index estimates

| | National | Mogadishu | Other Urban | Rural | IDP Settlements | Nomadic Population |
|---|---|---|---|---|---|---|
| Previous estimate | 0.344 | 0.255 | 0.335 | 0.407 | 0.335 | 0.320 |
| Paper’s estimate | 0.368 | 0.280 | 0.341 | 0.447 | 0.373 | 0.344 |

Source: The previously published estimate is taken from Pape & Wollburg (2019, p. 35). The other is authors' own calculation based on HFS2 2017/18.

Table 6: Comparison with previously published national estimates

| National | Revised Estimates | Original Estimate |
|---|---|---|
| Headcount ratio (2011 PPP US$1.90) | 68.6% | 69.4% |
| Poverty gap | 29.1% | 28.8% |
| Squared poverty gap | 15.7% | 14.9% |
| Gini index | 36.8% | 34.4% |

Source: The previously published estimates are taken from Pape & Wollburg (2019). The other is authors' own calculation based on HFS2 2017/18.

6. Time-saving effects of the RCM in the Somalia HFS

The RCM saves time because it skips data collection for three noncore food and nonfood modules. The question is: how much time can we save from such skips? Table 2 shows that, on average, 45 and 32 percent of households did not consume any items included in the noncore modules for food and nonfood, respectively, while the percentage of nonconsumption for each module ranges from 27 to 64 percent. If a household did not consume any of the items in a noncore module, the household simply needed to report “no,” so the time saved by skipping these items instead of asking would be very small. Given these considerations, it may be useful in future surveys to conduct pilot tests on the interview time taken per noncore module, which can provide a good sense of the benefits of the RCM.

The decision whether to continue using the RCM for Somalia calls for an evaluation of costs and benefits. In terms of the benefit from time saving, one should consider the time saved during the interview by not collecting three noncore food modules and three noncore nonfood modules, rather than compare it with the total time for a regular survey, which may last three to six hours.
The cost of the RCM, in turn, is the added complication in consumption aggregate imputation, both for poverty estimation and for national accounts purposes.

6.1. Approximation with one simulation

Using a multiple imputation method such as the MI-MVN or the Two-Part MI produces multiply imputed consumption values, and estimates such as poverty and inequality indexes are calculated by taking the means of the corresponding indexes across the 100 imputations.16 Although statistical packages such as Stata provide useful commands that can easily conduct tabulations and various types of regression analysis using the 100 vectors of consumption expenditures, those who are not familiar with multiple imputation might find it difficult to conduct such technical analysis. To make the results more accessible and replicable for a nontechnical audience, this section proposes a method that uses a single vector of household expenditure data, so that such an audience can easily replicate all key poverty and inequality statistics (single imputation approach). With a single vector, users can handle poverty profiling and other analysis as if they were using a regular dataset.17 In this regard, those unfamiliar with MI techniques can easily conduct further analyses.

To pick a vector of household expenditure, we select, from the 100 vectors of consumption estimates imputed by the Two-Part MI method, the one that most closely replicates key poverty and distribution indicators. More specifically, we chose the following indicators: poverty estimates using (i) the national poverty line, (ii) the food poverty line, and (iii) the national bottom 30 percent line; and (iv) the Gini index. The corresponding estimates with the 100 imputations are 69, 49, 30 and 36.8, respectively. We then rank these 100 imputations by the distance between the indicators estimated in each imputation and their averages over the 100 imputations.18 When we rank imputed expenditures, we round off the poverty estimates to the nearest integer and the Gini index to one decimal place. We then pick the vector of the 66th imputation round, as it produces the same rounded numbers, i.e., 69, 49, 30 and 36.8, respectively.

16 The standard errors and confidence intervals are calculated by following Rubin's combination rule (Rubin, 1987, pp. 76–81).
17 Another approach is to use the full set of 100 imputations but to carry out the analyses as if we had non-imputed data. This pooled imputation approach is attractive since the poverty indexes can be replicated perfectly without following Rubin's formula. However, we do not discuss it in this paper since having more than one observation per household may be confusing. The replicability of inequality indicators with the pooled imputation approach is an empirical question to be tested, and the results are shown in Appendix Table A..

6.2. Two-Part MI vs. the two proposed methods

We find that round 66 of the 100 imputations closely replicates the poverty and inequality indices estimated using 100 imputations by Two-Part MI at the national and subnational levels. As seen in Table 7, very small differences are generally found between the full Two-Part MI (column two) and the results of the single imputation in round 66 (column three). Though we only targeted the national poverty rates and the national Gini coefficient when this imputation was selected, the following values can be mostly replicated: the poverty headcount ratios at the subnational levels; the poverty gap and squared poverty gap at the national and subnational levels; and the Gini coefficients at the subnational levels. The levels of consumption can also be closely replicated, since the estimated per capita mean consumption and the means by consumption decile are close between columns two and three of the table. Therefore, using the single imputation approach would be a useful option.
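The selection step of Section 6.1 can be sketched as follows. The distance measure (sum of absolute deviations of the rounded indicators) is an assumption, since the exact metric is not spelled out here, and the indicator values are invented for illustration; only the rounding rule (integers for the poverty rates, one decimal for the Gini) follows the text. The function name `pick_representative` is hypothetical.

```python
import numpy as np

def pick_representative(indicators, decimals):
    """Return the index of the imputation whose rounded indicators are closest
    to the rounded averages over all imputations, plus those rounded averages.

    indicators: array-like of shape (n_imputations, n_indicators)
    decimals:   number of decimals kept for each indicator when rounding
    """
    indicators = np.asarray(indicators, dtype=float)
    target = indicators.mean(axis=0)
    # Round column by column, since indicators may use different precision.
    rounded = np.column_stack(
        [np.round(indicators[:, j], d) for j, d in enumerate(decimals)]
    )
    rounded_target = np.array([np.round(t, d) for t, d in zip(target, decimals)])
    # Assumed distance: sum of absolute deviations from the rounded averages.
    distance = np.abs(rounded - rounded_target).sum(axis=1)
    return int(np.argmin(distance)), rounded_target

# Invented example with five imputations and four indicators:
# three poverty rates (percent) and a Gini index (percent).
toy = [
    [70.2, 50.1, 31.0, 37.4],
    [68.1, 48.2, 29.3, 36.1],
    [69.4, 49.3, 30.2, 36.8],
    [67.9, 47.8, 28.9, 36.0],
    [69.4, 49.6, 30.6, 37.7],
]
best_round, rounded_target = pick_representative(toy, decimals=[0, 0, 0, 1])
```

Applied to the 100 Two-Part MI imputations, a procedure of this kind would return the round whose rounded indicators coincide with the rounded averages; ties, if any, are broken here by taking the first such round, which is a further assumption.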
Although the single imputation underestimates the standard errors, the degree of bias in standard errors is minimal.19

18 The national bottom 30 percent line was used instead of the popular 40 percent line simply because 40 percent is too close to the food line.
19 Standard errors are shown in Appendix Tables A.2 and A.3.

Table 7: Comparison of the full Two-Part MI results with the results from Round 66 and all imputations pooled of the Two-Part imputation

| | Two-Part MI | Imputation Round 66 from Two-Part MI |
|---|---|---|
| Headcount ratio (2011 PPP US$1.90) | | |
| National | 0.686 | 0.690 |
| Mogadishu | 0.745 | 0.731 |
| Other urban | 0.591 | 0.603 |
| Rural | 0.722 | 0.744 |
| IDP | 0.744 | 0.742 |
| Nomads | 0.704 | 0.694 |
| Poverty gap* | | |
| National | 0.291 | 0.294 |
| Mogadishu | 0.277 | 0.278 |
| Other urban | 0.232 | 0.236 |
| Rural | 0.363 | 0.369 |
| IDP | 0.334 | 0.342 |
| Nomads | 0.278 | 0.275 |
| Squared poverty gap* | | |
| National | 0.157 | 0.158 |
| Mogadishu | 0.131 | 0.130 |
| Other urban | 0.122 | 0.123 |
| Rural | 0.220 | 0.220 |
| IDP | 0.185 | 0.193 |
| Nomads | 0.139 | 0.137 |
| Gini index | | |
| National | 0.368 | 0.367 |
| Mogadishu | 0.280 | 0.274 |
| Other urban | 0.341 | 0.338 |
| Rural | 0.447 | 0.451 |
| IDP | 0.373 | 0.367 |
| Nomads | 0.344 | 0.344 |
| Per capita consumption | | |
| National | 1.32 | 1.31 |
| By deciles: | | |
| Poorest | 0.33 | 0.34 |
| 2 | 0.54 | 0.54 |
| 3 | 0.68 | 0.68 |
| 4 | 0.82 | 0.81 |
| 5 | 0.96 | 0.95 |
| 6 | 1.13 | 1.11 |
| 7 | 1.33 | 1.32 |
| 8 | 1.61 | 1.60 |
| 9 | 2.12 | 2.12 |
| Richest | 3.69 | 3.64 |

Source: Authors’ estimation using the Somalia HFS2.

7. Conclusions

The HFS2 consumption data use the RCM to save time due to the security situation; its poverty estimation therefore requires imputation of skipped consumption modules. To fill in the skipped consumption data, Pape and Wollburg (2019) used the MI-MVN imputation method. This paper found that the MI-MVN did not impute the skipped consumption expenditures properly, resulting in negative total consumption values for some households. The existence of negative values is problematic because these values cannot be justified as consumption expenditures, and they do not allow us to calculate poverty and inequality indexes.
Instead, this paper uses the Two-Part MI method to address the negative imputed values and also shows empirical evidence that the Two-Part MI outperforms the MI-MVN. After all changes, the national poverty headcount rate marginally declined from 69.4 percent to 68.6 percent. As for inequality, the Gini coefficient at the national level increased from 34.4 percent to 36.8 percent.

Lastly, this paper investigated whether the results of the Two-Part MI can be approximated by a single imputation from multiple imputations. The result showed that this single imputation approach is promising when we need to disseminate the imputed consumption or poverty data to users who are not familiar with the imputation methods.

There are outstanding empirical questions that could be explored with further work. For example, this paper showed that the Two-Part MI outperforms the MI-MVN, but how should model selection be conducted in this complex setting with 16 models? Also, it is essential to investigate further the time-saving effects of the RCM approach. The RCM saves interview time by skipping part of the consumption modules. Still, since a large share of households did not report any consumption of the items included in the modules, the time-saving effect seems limited. It will be useful to measure the time spent on each noncore module (hence, saved by the RCM) to assess the costs and benefits of the RCM approach. Finally, it is very important to empirically evaluate whether skipping the questions introduces biases in the households’ responses to the rest of the survey questions (Beegle et al., 2012).

8. References

Arayavechkit, T.; Atamanov, A.; Barreto H., K. Y.; Belghith, N. B. H.; Castaneda A., R. A.; Fujs, T. H. M. J.; Dewina, R.; Diaz-Bonilla, C.; Edochie, I. N.; Jolliffe, D. M.; Lakner, C.; Mahler, D. G.; Montes, J.; Moreno H., L. L.; Mungai, R.; Newhouse, D. L.; Nguyen, M. C.; Sanchez C., D. M.; Schoch, M.; Sharma, D.; Simler, K.; Swinkels, R. A.; Takamatsu, S.; Uochi, I.; Viveros M., M. C.; Yonzan, N.; Yoshida, N.; Wu, H. 2021. March 2021 PovcalNet Update: What’s New (English). Global Poverty Monitoring Technical Note, no. 15. Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/654971615585402030/March-2021-PovcalNet-Update-What-s-New

Atamanov, A., D. M. Jolliffe, C. Lakner, and E. B. Prydz. 2018. Purchasing Power Parities Used in Global Poverty Measurement. Global Poverty Monitoring Technical Note Series 5. The World Bank.

Atamanov, A., C. Lakner, D. G. Mahler, T. Baah, S. Kofi, and J. Yang. 2020. The Effect of New PPP Estimates on Global Poverty: A First Look (English). Global Poverty Monitoring Technical Note 12. Washington, D.C.: World Bank Group. http://documents.worldbank.org/curated/en/191631589896884566/The-Effect-of-New-PPPEstimates-on-Global-Poverty-A-First-Look

Beegle, K., J. De Weerdt, J. Friedman, and J. Gibson. 2012. Methods of household consumption measurement through surveys: Experimental results from Tanzania. Journal of Development Economics, 98(1), 3-18.

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge University Press.

Christiaensen, L., P. Lanjouw, J. Luoto, and D. Stifel. 2012. Small area estimation-based prediction methods to track poverty: Validation and applications. Journal of Economic Inequality, 10(2), 267–297. https://doi.org/10.1007/s10888-011-9209-9

Dang, H. A. H., P. F. Lanjouw, and U. Serajuddin. 2017. Updating poverty estimates in the absence of regular and comparable consumption data: Methods and illustration with reference to a middle-income country. Oxford Economic Papers, 69(4), 939–962. https://doi.org/10.1093/oep/gpx020

Douidich, M., A. Ezzrari, R. van der Weide, and P. Verme. 2016. Estimating quarterly poverty rates using labor force surveys: A primer. World Bank Economic Review, 30(3), 475–500. https://doi.org/10.1093/wber/lhv062

Little, R. J. A., and D. B. Rubin. 2019. Statistical analysis with missing data. Vol. 793. John Wiley & Sons.

Newhouse, D., S. Shivakumaran, S. Takamatsu, and N. Yoshida. 2014. How Survey-to-Survey Imputation Can Fail. Policy Research Working Paper No. 6961. World Bank, Washington, DC.

Pape, U. J. 2021. Measuring Poverty Rapidly Using Within-Survey Imputations. Policy Research Working Paper No. 9530. World Bank, Washington, DC.

Pape, U., T. Fujii, and J. Mistiaen. 2020. Measuring Poverty Rapidly Using Within-Survey Imputations (Mimeo).

Pape, U. J., L. Parisotto, V. Phipps-Ebeler, A. J. M. Ralston, L. R. Mueller, T. Nezam, and A. Sharma. 2018. Impact of Conflict and Shocks on Poverty: South Sudan Poverty Assessment 2017 (English) (No. AUS0000204). http://documents.worldbank.org/curated/en/953201537854160003/Impact-of-Conflict-and-Shocks-on-Poverty-South-Sudan-Poverty-Assessment-2017

Pape, U., and P. Wollburg. 2019. Estimation of Poverty in Somalia Using Innovative Methodologies. Policy Research Working Paper No. 8735, Issue February. https://openknowledge.worldbank.org/handle/10986/31267

Rubin, D. B. 1987. Multiple imputation for nonresponse in surveys (Vol. 44, Issue 8). Wiley.

Rubin, D. B. 1996. Multiple Imputation after 18+ Years. Journal of the American Statistical Association, 91(434), 473–489. https://doi.org/10.1080/01621459.1996.10476908

StataCorp. 2019. Stata 16 Base Reference Manual. Stata Press.

StataCorp. 2019. Stata Statistical Software: Release 16. StataCorp LLC.

Stifel, D., and L. Christiaensen. 2007. Tracking poverty over time in the absence of comparable consumption data. World Bank Economic Review, 21(2), 317–341. https://doi.org/10.1093/wber/lhm010

Takamatsu, S., N. Yoshida, R. Ramasubbaiah, and F. Fatima. 2021. Rapid Consumption Method and Poverty and Inequality Estimation in South Sudan Revisited. Global Poverty Monitoring Technical Note No. 18. World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/36540 License: CC BY 3.0 IGO.

World Bank Group. 2019.
Somali Poverty and Vulnerability Assessment: Findings from Wave 2 of the Somali High Frequency Survey. World Bank, Washington, DC.

Yoshida, N., A. S. Munoz, C. K. Lee, M. Brataj, and D. Sharma. 2015. SWIFT Data Collection Guidelines version 2. http://documents1.worldbank.org/curated/en/591711545170814297/pdf/97499-WP-P149557-OUO-9-Box391480B-ACS.pdf

Yoshida, N., X. Chen, S. Takamatsu, K. Yoshimura, S. Malgioglio, and S. Shivakumaran. 2020. The Concept and Empirical Evidence of SWIFT Methodology (Mimeo).

Appendix A: Additional tables

Table A.1: Models from the Two-Part MI estimation
Logit regression results from first part model for noncore food modules

| VARIABLES | (1) food m1>0 | (2) food m2>0 | (3) food m3>0 | (4) food m4>0 |
|---|---|---|---|---|
| quartiles of core food module* = 2 | 0.0853 (0.190) | 0.291 (0.182) | 0.105 (0.185) | 0.311 (0.194) |
| quartiles of core food module* = 3 | 0.476** (0.208) | 0.576*** (0.195) | 0.490** (0.200) | 0.645*** (0.203) |
| quartiles of core food module* = 4 | 1.111*** (0.248) | 0.584*** (0.224) | 0.832*** (0.230) | 1.254*** (0.230) |
| quartiles noncore nonfood* = 2 | -0.0367 (0.201) | 0.217 (0.191) | -0.0679 (0.200) | 0.388* (0.209) |
| quartiles noncore nonfood* = 3 | 0.0101 (0.210) | 0.255 (0.196) | 0.0826 (0.201) | 0.702*** (0.201) |
| quartiles noncore nonfood* = 4 | 0.705*** (0.225) | 0.628*** (0.221) | 0.354 (0.216) | 1.257*** (0.220) |
| quartiles of asset* = 2 | 0.523** (0.216) | 0.317 (0.204) | 0.452** (0.207) | 0.381 (0.233) |
| quartiles of asset* = 3 | 0.745*** (0.223) | 0.640*** (0.202) | 0.712*** (0.211) | 0.576** (0.225) |
| quartiles of asset* = 4 | 0.773*** (0.249) | 1.087*** (0.229) | 0.887*** (0.233) | 1.100*** (0.242) |
| Household size | 0.303*** (0.0457) | 0.0877** (0.0379) | 0.208*** (0.0418) | 0.232*** (0.0397) |
| Share of child in household | 0.499 (0.314) | 0.624** (0.296) | -0.000549 (0.307) | 0.555* (0.306) |
| Share of senior in household | -0.462 (0.826) | -0.293 (0.711) | 2.451*** (0.950) | 0.00577 (0.753) |
| Gender ratio = 1 | -0.0791 (0.148) | -0.260* (0.143) | -0.0202 (0.138) | -0.261* (0.146) |
| Share of employed member = 1 | 0.0464 (0.157) | 0.168 (0.148) | 0.178 (0.147) | 0.0904 (0.153) |
| Share of literate member | -0.0577 (0.160) | -0.135 (0.153) | -0.211 (0.152) | -0.187 (0.152) |
| House type = 2, House | 0.431* (0.252) | 0.0740 (0.241) | 0.185 (0.247) | -0.250 (0.245) |
| House type = 3, Hut | 1.101*** (0.315) | 0.464 (0.297) | 0.386 (0.307) | -0.0330 (0.294) |
| House type = 4, Other | 0.906*** (0.287) | -0.0217 (0.267) | 0.145 (0.276) | -0.294 (0.266) |
| Drinking water = 2, Tap | -0.148 (0.303) | 0.166 (0.292) | 0.0892 (0.310) | -0.110 (0.317) |
| Drinking water = 3, Tap or well | 0.253 (0.262) | 0.611** (0.250) | 0.125 (0.260) | 0.158 (0.255) |
| Drinking water = 4, Delivered | 0.0264 (0.222) | 0.202 (0.210) | 0.271 (0.217) | -0.00944 (0.219) |
| Floor = 2, Mud | -0.748*** (0.191) | -0.373* (0.196) | -0.255 (0.186) | -0.0589 (0.191) |
| Floor = 3, Wood/other | -0.204 (0.219) | 0.198 (0.200) | -0.0789 (0.210) | -0.205 (0.219) |
| Dwelling = 2, Own | 0.164 (0.165) | -0.390** (0.167) | -0.0370 (0.163) | -0.265 (0.165) |
| Dwelling = 3, Provided | 0.658** (0.304) | -0.302 (0.252) | 0.621** (0.284) | 0.335 (0.272) |
| Dwelling = 4, Occupation | 0.501 (0.393) | 0.390 (0.319) | 1.803*** (0.448) | 0.662* (0.341) |
| Location = 2, North-east Urban | -1.189*** (0.266) | -1.130*** (0.259) | -1.158*** (0.265) | -1.041*** (0.273) |
| Location = 3, North-west Urban | -1.236*** (0.286) | -1.504*** (0.287) | -0.493* (0.271) | -0.479* (0.283) |
| Location = 5, North-west Rural | -0.329 (0.568) | -1.215** (0.587) | -0.381 (0.658) | -0.908 (0.733) |
| Location = 6, IDP Settlements | 0.116 (0.311) | 0.525* (0.298) | 0.0218 (0.295) | -0.318 (0.302) |
| Location = 7, Central regions Urban | -0.491* (0.286) | -0.464* (0.258) | -0.399 (0.254) | -0.769*** (0.262) |
| Location = 8, Central regions Rural | -0.00143 (0.343) | -0.184 (0.311) | 0.482 (0.333) | -0.383 (0.333) |
| Location = 11, South West Urban | -0.158 (0.274) | 0.363 (0.265) | 0.516** (0.255) | -0.336 (0.256) |
| Location = 12, South West Rural | 0.298 (0.366) | 0.623* (0.325) | 1.670*** (0.387) | 0.518 (0.318) |
| Location = 13, Nomadic population | -0.473 (0.386) | -0.174 (0.363) | 0.0693 (0.379) | -0.845** (0.382) |
| HH experienced hunger = 1, Rarely | -0.107 (0.156) | 0.0678 (0.150) | -0.0494 (0.149) | 0.0147 (0.152) |
| HH experienced hunger = 2, Often | 0.295 (0.254) | 0.359 (0.235) | 0.278 (0.247) | 0.0201 (0.244) |
| Received remittances = 1, Yes | 0.176 (0.173) | 0.254 (0.168) | 0.303* (0.166) | 0.266* (0.160) |
| Constant | -2.440*** (0.461) | -1.484*** (0.443) | -2.261*** (0.449) | -3.088*** (0.453) |
| Observations | 1,294 | 1,283 | 1,264 | 1,304 |
| Pseudo R2 | 0.140 | 0.119 | 0.127 | 0.146 |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
RHS variables are the same as those used in MI-MVN in Pape & Wollburg (2019).
*The core modules (food, nonfood and assets) in the dependent variables are missing for some households due to poor quality data in two regions, but the quartiles for the households in these regions were calculated using the available data. Pape & Wollburg (2019) avoided missing values in these variables, and the authors followed the same method.

Table A.2 Logit regression results from first part model for noncore nonfood modules

| VARIABLES | (1) Nfood m1>0 | (2) Nfood m2>0 | (3) Nfood m3>0 | (4) Nfood m4>0 |
|---|---|---|---|---|
| quartiles of core food module* = 2 | 0.200 (0.191) | 0.392** (0.197) | 0.625*** (0.202) | 0.145 (0.190) |
| quartiles of core food module* = 3 | 0.603*** (0.214) | 0.752*** (0.216) | 1.039*** (0.225) | 0.618*** (0.206) |
| quartiles of core food module* = 4 | 1.037*** (0.251) | 1.117*** (0.251) | 1.461*** (0.264) | 0.470** (0.233) |
| quartiles noncore nonfood* = 2 | 0.819*** (0.208) | 0.798*** (0.211) | 0.582** (0.228) | 0.630*** (0.201) |
| quartiles noncore nonfood* = 3 | 1.277*** (0.219) | 1.092*** (0.219) | 0.801*** (0.229) | 1.239*** (0.200) |
| quartiles noncore nonfood* = 4 | 1.999*** (0.239) | 1.689*** (0.249) | 1.780*** (0.254) | 1.832*** (0.231) |
| quartiles of asset* = 2 | 0.902*** (0.228) | 0.876*** (0.232) | 0.446* (0.245) | 0.572** (0.224) |
| quartiles of asset* = 3 | 1.115*** (0.238) | 1.147*** (0.228) | 0.510** (0.243) | 0.942*** (0.219) |
| quartiles of asset* = 4 | 1.511*** (0.266) | 1.402*** (0.254) | 0.777*** (0.266) | 1.350*** (0.246) |
| Household size | 0.365*** (0.0453) | 0.294*** (0.0442) | 0.358*** (0.0488) | 0.261*** (0.0430) |
| Share of child in household | 0.506 (0.320) | 0.462 (0.327) | 1.173*** (0.344) | 0.731** (0.313) |
| Share of senior in household | -0.635 (0.969) | -0.723 (0.789) | -0.471 (1.049) | 0.557 (0.758) |
| Gender ratio = 1 | -0.398*** (0.152) | -0.449*** (0.159) | -0.579*** (0.159) | -0.129 (0.155) |
| Share of employed member = 1 | 0.564*** (0.159) | 0.533*** (0.159) | 0.606*** (0.166) | 0.244 (0.158) |
| Share of literate member | -0.137 (0.162) | -0.0771 (0.169) | -0.365** (0.175) | -0.0541 (0.160) |
| House type = 2, House | 0.439* (0.257) | -0.0279 (0.257) | 0.234 (0.268) | 0.339 (0.249) |
| House type = 3, Hut | 0.117 (0.317) | -0.490 (0.324) | -0.287 (0.345) | 0.595* (0.306) |
| House type = 4, Other | 0.583** (0.291) | -0.0424 (0.288) | -0.0458 (0.307) | 0.369 (0.278) |
| Drinking water = 2, Tap | 0.533* (0.316) | 0.159 (0.320) | 0.814** (0.362) | 0.534 (0.343) |
| Drinking water = 3, Tap or well | 1.429*** (0.280) | 0.817*** (0.279) | 0.375 (0.295) | 0.431 (0.270) |
| Drinking water = 4, Delivered | 0.436* (0.226) | 0.476** (0.231) | 0.739*** (0.248) | 0.107 (0.224) |
| Floor = 2, Mud | -0.392** (0.196) | 0.452** (0.225) | 0.276 (0.215) | 0.128 (0.206) |
| Floor = 3, Wood/other | 0.270 (0.223) | 0.206 (0.221) | 0.0450 (0.240) | 0.204 (0.226) |
| Dwelling = 2, Own | 0.245 (0.167) | -0.464*** (0.177) | 0.148 (0.181) | 0.0701 (0.169) |
| Dwelling = 3, Provided | 0.864*** (0.291) | 0.447 (0.298) | 0.747** (0.328) | 0.652** (0.315) |
| Dwelling = 4, Occupation | 0.511 (0.368) | 0.725** (0.349) | 2.203*** (0.609) | 0.595 (0.376) |
| Location = 2, North-east Urban | 0.305 (0.273) | -0.266 (0.271) | -0.154 (0.274) | -0.0510 (0.273) |
| Location = 3, North-west Urban | -0.359 (0.294) | -0.844*** (0.297) | -0.617** (0.301) | -0.581** (0.286) |
| Location = 5, North-west Rural | 0.495 (0.636) | -0.478 (0.629) | 0.829 (0.679) | 0.105 (0.627) |
| Location = 6, IDP Settlements | 0.543* (0.296) | 0.307 (0.311) | 0.771** (0.335) | 0.733** (0.327) |
| Location = 7, Central regions Urban | 0.692** (0.289) | 0.416 (0.288) | 0.756** (0.297) | 0.124 (0.277) |
| Location = 8, Central regions Rural | 0.758** (0.336) | 0.491 (0.344) | 1.206*** (0.375) | 0.198 (0.335) |
| Location = 11, South West Urban | 0.299 (0.264) | 0.293 (0.280) | 0.643** (0.280) | 0.231 (0.274) |
| Location = 12, South West Rural | 0.541 (0.335) | 0.596* (0.333) | 1.704*** (0.379) | 0.864** (0.359) |
| Location = 13, Nomadic population | 1.539*** (0.409) | 1.927*** (0.466) | 2.351*** (0.482) | 1.805*** (0.448) |
| HH experienced hunger = 1, Rarely | 0.220 (0.157) | 0.511*** (0.169) | 0.261 (0.170) | 0.0472 (0.167) |
| HH experienced hunger = 2, Often | 0.468* (0.259) | 0.464* (0.259) | 0.752** (0.304) | -0.276 (0.253) |
| Received remittances = 1, Yes | -0.0655 (0.171) | 0.439** (0.191) | 0.157 (0.188) | 0.0989 (0.176) |
| Constant | -5.740*** (0.518) | -4.265*** (0.515) | -4.904*** (0.533) | -3.957*** (0.472) |
| Observations | 1,294 | 1,283 | 1,264 | 1,304 |
| Pseudo R2 | 0.232 | 0.231 | 0.260 | 0.199 |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
RHS variables are the same as those used in MI-MVN in Pape & Wollburg (2019).
*The core modules (food, nonfood and assets) in the dependent variables are missing for some households due to poor quality data in two regions, but the quartiles for the households in these regions were calculated using the available data. Pape & Wollburg (2019) avoided missing values in these variables, and the authors followed the same method.
Table A.3 OLS regression results from second part model for food module

| VARIABLES | (1) Ln food m1 | (2) Ln food m2 | (3) Ln food m3 | (4) Ln food m4 |
|---|---|---|---|---|
| quartiles of core food module* = 2 | 0.388*** (0.112) | 0.264** (0.128) | 0.268* (0.140) | 0.532*** (0.203) |
| quartiles of core food module* = 3 | 0.628*** (0.122) | 0.442*** (0.134) | 0.540*** (0.149) | 0.809*** (0.207) |
| quartiles of core food module* = 4 | 1.175*** (0.137) | 0.973*** (0.152) | 1.277*** (0.169) | 1.277*** (0.223) |
| quartiles noncore nonfood* = 2 | -0.184 (0.120) | 0.125 (0.136) | -0.00570 (0.152) | -0.0644 (0.231) |
| quartiles noncore nonfood* = 3 | -0.0700 (0.126) | 0.0676 (0.141) | 0.189 (0.150) | 0.155 (0.224) |
| quartiles noncore nonfood* = 4 | 0.0829 (0.132) | 0.317** (0.154) | 0.414** (0.164) | 0.559** (0.232) |
| quartiles of asset* = 2 | -0.0774 (0.130) | -0.0915 (0.147) | -0.0560 (0.157) | 0.0941 (0.259) |
| quartiles of asset* = 3 | -0.0774 (0.134) | -0.0365 (0.145) | -0.0839 (0.158) | 0.181 (0.249) |
| quartiles of asset* = 4 | 0.180 (0.153) | 0.290* (0.167) | 0.0913 (0.177) | 0.0385 (0.264) |
| Household size | -0.0234 (0.0228) | -0.0418 (0.0277) | -0.0182 (0.0283) | -0.0650* (0.0370) |
| Share of child in household | -0.286 (0.184) | -0.308 (0.209) | -0.0798 (0.224) | -0.571* (0.302) |
| Share of senior in household | 0.765 (0.574) | -0.259 (0.567) | 0.271 (0.536) | -1.686** (0.742) |
| Gender ratio = 1 | -0.0554 (0.0842) | -0.0579 (0.0967) | 0.00914 (0.103) | -0.144 (0.141) |
| Share of employed member = 1 | -0.0176 (0.0894) | -0.0937 (0.0999) | 0.176 (0.112) | 0.0383 (0.146) |
| Share of literate member | 0.237*** (0.0885) | -0.0158 (0.106) | -0.144 (0.109) | 0.0685 (0.149) |
| House type = 2, House | 0.140 (0.170) | -0.412** (0.178) | -0.395* (0.208) | -0.406* (0.224) |
| House type = 3, Hut | 0.378** (0.192) | -0.149 (0.204) | -0.000712 (0.241) | 0.0495 (0.275) |
| House type = 4, Other | 0.225 (0.181) | -0.237 (0.191) | -0.155 (0.226) | -0.423* (0.242) |
| Drinking water = 2, Tap | -0.130 (0.177) | -0.0762 (0.196) | 0.00467 (0.208) | -0.0742 (0.300) |
| Drinking water = 3, Tap or well | -0.260* (0.148) | -0.296* (0.163) | 0.205 (0.186) | -0.641** (0.255) |
| Drinking water = 4, Delivered | -0.0382 (0.130) | -0.0857 (0.149) | 0.168 (0.160) | -0.163 (0.208) |
| Floor = 2, Mud | 0.00150 (0.109) | 0.0425 (0.132) | 0.226* (0.132) | 0.120 (0.182) |
| Floor = 3, Wood/other | 0.137 (0.119) | 0.429*** (0.127) | 0.0607 (0.142) | 0.0226 (0.216) |
| Dwelling = 2, Own | 0.0106 (0.0957) | 0.200* (0.117) | 0.0400 (0.125) | -0.429*** (0.156) |
| Dwelling = 3, Provided | -0.0166 (0.146) | 0.358** (0.167) | 0.0548 (0.178) | -0.0547 (0.274) |
| Dwelling = 4, Occupation | -0.254 (0.195) | 0.445** (0.191) | 0.0188 (0.207) | -0.121 (0.315) |
| Location = 2, North-east Urban | 0.160 (0.171) | 0.181 (0.197) | -0.0905 (0.235) | 0.0548 (0.266) |
| Location = 3, North-west Urban | 0.0582 (0.190) | 0.938*** (0.233) | 0.0453 (0.226) | 0.813*** (0.258) |
| Location = 5, North-west Rural | 0.318 (0.390) | 0.611 (0.514) | -0.177 (0.539) | 0.0114 (0.819) |
| Location = 6, IDP Settlements | 0.517*** (0.168) | 0.309 (0.202) | -0.0501 (0.219) | 0.265 (0.289) |
| Location = 7, Central regions Urban | 0.318** (0.153) | 0.0881 (0.183) | 0.0411 (0.200) | 0.556** (0.235) |
| Location = 8, Central regions Rural | 0.461** (0.188) | 1.094*** (0.218) | -0.0768 (0.242) | 1.057*** (0.335) |
| Location = 11, South West Urban | 0.281* (0.147) | 0.318* (0.175) | 0.173 (0.183) | 0.279 (0.245) |
| Location = 12, South West Rural | 0.183 (0.183) | 0.207 (0.208) | -0.434** (0.218) | 0.0613 (0.283) |
| Location = 13, Nomadic population | 0.504** (0.215) | 0.549** (0.246) | -0.0707 (0.274) | 0.939** (0.388) |
| HH experienced hunger = 1, Rarely | 0.0756 (0.0880) | 0.0517 (0.102) | 0.0612 (0.110) | -0.0586 (0.144) |
| HH experienced hunger = 2, Often | 0.159 (0.139) | -0.107 (0.156) | 0.219 (0.174) | 0.220 (0.237) |
| Received remittances = 1, Yes | 0.174* (0.0963) | 0.00515 (0.110) | 0.134 (0.118) | 0.152 (0.142) |
| Constant | -3.749*** (0.271) | -3.384*** (0.314) | -3.713*** (0.337) | -3.485*** (0.436) |
| Observations | 881 | 759 | 719 | 489 |
| R-squared | 0.239 | 0.274 | 0.241 | 0.347 |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
RHS variables are the same as those used in MI-MVN in Pape & Wollburg (2019).
*The core modules (food, nonfood and assets) in the dependent variables are missing for some households due to poor quality data in two regions, but the quartiles for the households in these regions were calculated using the available data. Pape & Wollburg (2019) avoided missing values in these variables, and the authors followed the same method.

Table A.4 OLS regression results from second part model for nonfood module

| VARIABLES | (1) Ln nfood m1 | (2) Ln nfood m2 | (3) Ln nfood m3 | (4) Ln nfood m4 |
|---|---|---|---|---|
| quartiles of core food module* = 2 | 0.0845 (0.0983) | 0.277** (0.121) | 0.263** (0.120) | 0.157* (0.0951) |
| quartiles of core food module* = 3 | 0.281*** (0.105) | 0.565*** (0.128) | 0.430*** (0.128) | 0.273*** (0.0995) |
| quartiles of core food module* = 4 | 0.560*** (0.119) | 1.056*** (0.145) | 0.700*** (0.148) | 0.380*** (0.116) |
| quartiles noncore nonfood* = 2 | 0.584*** (0.113) | 0.354*** (0.134) | 0.0178 (0.129) | 0.146 (0.106) |
| quartiles noncore nonfood* = 3 | 0.569*** (0.123) | 0.330** (0.141) | 0.318** (0.137) | 0.392*** (0.105) |
| quartiles noncore nonfood* = 4 | 0.753*** (0.130) | 0.588*** (0.152) | 0.926*** (0.149) | 0.889*** (0.116) |
| quartiles of asset* = 2 | 0.0452 (0.120) | -0.141 (0.139) | 0.263** (0.130) | -0.0268 (0.115) |
| quartiles of asset* = 3 | 0.0740 (0.123) | 0.428*** (0.139) | 0.355*** (0.133) | -0.00373 (0.115) |
| quartiles of asset* = 4 | 0.411*** (0.139) | 0.702*** (0.160) | 0.616*** (0.154) | 0.431*** (0.126) |
| Household size | -0.0452** (0.0200) | 0.0204 (0.0244) | 0.00473 (0.0252) | -0.0630*** (0.0196) |
| Share of child in household | -0.0115 (0.159) | -0.264 (0.190) | -0.488** (0.199) | -0.317** (0.154) |
| Share of senior in household | -0.472 (0.627) | -0.141 (0.542) | -0.198 (0.662) | -0.788* (0.413) |
| Gender ratio = 1 | 0.0691 (0.0732) | -0.160* (0.0903) | -0.0379 (0.0867) | -0.0816 (0.0723) |
| Share of employed member = 1 | -0.0172 (0.0800) | 0.0770 (0.0956) | 0.226** (0.0959) | 0.157** (0.0767) |
| Share of literate member | -0.0570 (0.0770) | 0.0661 (0.0938) | 0.0980 (0.0941) | 0.0104 (0.0740) |
| House type = 2, House | 0.227 (0.143) | -0.0606 (0.162) | 0.0938 (0.176) | 0.207 (0.131) |
| House type = 3, Hut | 0.280* (0.168) | -0.288 (0.190) | -0.0975 (0.207) | 0.161 (0.155) |
| House type = 4, Other | 0.0622 (0.157) | -0.398** (0.173) | -0.294 (0.191) | -0.0202 (0.141) |
| Drinking water = 2, Tap | 0.225 (0.153) | 0.165 (0.192) | 0.0355 (0.178) | -0.117 (0.144) |
| Drinking water = 3, Tap or well | 0.136 (0.122) | 0.328** (0.153) | 0.0678 (0.156) | 0.00740 (0.123) |
| Drinking water = 4, Delivered | 0.0126 (0.117) | 0.238* (0.138) | 0.220 (0.140) | -0.0321 (0.113) |
| Floor = 2, Mud | 0.128 (0.0942) | 0.0599 (0.118) | 0.00709 (0.111) | 0.0201 (0.0916) |
| Floor = 3, Wood/other | 0.117 (0.101) | 0.165 (0.120) | 0.211* (0.126) | 0.260** (0.104) |
| Dwelling = 2, Own | 0.00393 (0.0835) | 0.110 (0.112) | 0.0774 (0.105) | -0.0961 (0.0826) |
| Dwelling = 3, Provided | 0.0466 (0.126) | -0.131 (0.151) | 0.140 (0.154) | -0.0648 (0.128) |
| Dwelling = 4, Occupation | -0.0554 (0.164) | 0.0852 (0.178) | -0.0919 (0.177) | 0.0197 (0.160) |
| Location = 2, North-east Urban | 0.161 (0.138) | -0.0479 (0.173) | 0.139 (0.179) | 0.136 (0.135) |
| Location = 3, North-west Urban | -0.0713 (0.168) | -0.188 (0.211) | 0.0313 (0.213) | -0.472*** (0.158) |
| Location = 5, North-west Rural | 0.186 (0.358) | 0.809* (0.455) | 0.978** (0.454) | -0.0770 (0.333) |
| Location = 6, IDP Settlements | 0.234 (0.159) | 0.363* (0.192) | 0.0821 (0.187) | -0.0153 (0.148) |
| Location = 7, Central regions Urban | -0.178 (0.133) | 0.234 (0.166) | -0.0359 (0.163) | -0.00583 (0.129) |
| Location = 8, Central regions Rural | 0.212 (0.177) | 0.528*** (0.204) | 0.340 (0.212) | 0.211 (0.170) |
| Location = 11, South West Urban | -0.222 (0.136) | 0.0819 (0.167) | -0.0808 (0.166) | -0.144 (0.128) |
| Location = 12, South West Rural | -0.173 (0.173) | 0.0465 (0.203) | 0.0285 (0.196) | -0.146 (0.157) |
| Location = 13, Nomadic population | 0.0532 (0.184) | 0.517** (0.219) | 0.737*** (0.228) | 0.256 (0.177) |
| HH experienced hunger = 1, Rarely | 0.0745 (0.0737) | 0.128 (0.0910) | 0.247*** (0.0888) | -0.0347 (0.0735) |
| HH experienced hunger = 2, Often | 0.121 (0.120) | 0.0380 (0.148) | -0.0616 (0.142) | -0.151 (0.121) |
| Received remittances = 1, Yes | 0.274*** (0.0820) | 0.235** (0.100) | 0.184* (0.102) | 0.0981 (0.0785) |
| Constant | -4.363*** (0.261) | -4.804*** (0.313) | -4.795*** (0.309) | -4.034*** (0.236) |
| Observations | 738 | 801 | 804 | 859 |
| R-squared | 0.285 | 0.306 | 0.352 | 0.350 |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
RHS variables are the same as those used in MI-MVN in Pape & Wollburg (2019).
*The core modules (food, nonfood and assets) in the dependent variables are missing for some households due to poor quality data in two regions, but the quartiles for the households in these regions were calculated using the available data. Pape & Wollburg (2019) avoided missing values in these variables, and the authors followed the same method.

Table A.5: Full estimation result from Two-Part MI estimation

| | Mean | Std. Err. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|
| Headcount ratio (2011 PPP US$1.90) | | | | |
| National | 0.686 | 0.025 | 0.636 | 0.736 |
| Mogadishu | 0.745 | 0.032 | 0.681 | 0.809 |
| Other Urban | 0.591 | 0.043 | 0.506 | 0.676 |
| Rural | 0.722 | 0.065 | 0.594 | 0.850 |
| IDP Settlements | 0.744 | 0.071 | 0.604 | 0.885 |
| Nomadic population | 0.704 | 0.047 | 0.611 | 0.797 |
| Poverty gap | | | | |
| National | 0.291 | 0.015 | 0.261 | 0.321 |
| Mogadishu | 0.277 | 0.018 | 0.242 | 0.313 |
| Other Urban | 0.232 | 0.026 | 0.181 | 0.282 |
| Rural | 0.363 | 0.041 | 0.281 | 0.444 |
| IDP Settlements | 0.334 | 0.038 | 0.260 | 0.407 |
| Nomadic population | 0.278 | 0.027 | 0.225 | 0.331 |
| Squared poverty gap | | | | |
| National | 0.157 | 0.011 | 0.136 | 0.179 |
| Mogadishu | 0.131 | 0.013 | 0.105 | 0.157 |
| Other Urban | 0.122 | 0.017 | 0.089 | 0.156 |
| Rural | 0.220 | 0.031 | 0.160 | 0.280 |
| IDP Settlements | 0.185 | 0.024 | 0.138 | 0.232 |
| Nomadic population | 0.139 | 0.018 | 0.103 | 0.175 |

| Gini | Coef. | Std. Err. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|
| National | 0.368 | 0.015 | 0.338 | 0.398 |
| Mogadishu | 0.280 | 0.019 | 0.242 | 0.318 |
| Other urban | 0.341 | 0.018 | 0.305 | 0.376 |
| Rural | 0.447 | 0.040 | 0.369 | 0.525 |
| IDP | 0.373 | 0.043 | 0.289 | 0.457 |
| Nomads | 0.344 | 0.027 | 0.291 | 0.396 |

| | Mean | Std. Err. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|
| National | 1.32 | 0.06 | 1.20 | 1.44 |
| Quintiles | | | | |
| Poorest | 0.44 | 0.01 | 0.41 | 0.46 |
| 2 | 0.75 | 0.01 | 0.73 | 0.77 |
| 3 | 1.04 | 0.01 | 1.02 | 1.07 |
| 4 | 1.47 | 0.02 | 1.43 | 1.52 |
| Richest | 2.90 | 0.13 | 2.64 | 3.16 |
| Deciles | | | | |
| Poorest | 0.33 | 0.01 | 0.31 | 0.36 |
| 2 | 0.54 | 0.01 | 0.52 | 0.56 |
| 3 | 0.68 | 0.01 | 0.66 | 0.71 |
| 4 | 0.82 | 0.01 | 0.80 | 0.84 |
| 5 | 0.96 | 0.01 | 0.93 | 0.98 |
| 6 | 1.13 | 0.01 | 1.10 | 1.16 |
| 7 | 1.33 | 0.02 | 1.29 | 1.37 |
| 8 | 1.61 | 0.02 | 1.56 | 1.66 |
| 9 | 2.12 | 0.04 | 2.04 | 2.19 |
| Richest | 3.69 | 0.21 | 3.28 | 4.09 |

Table A.6: Full estimation result from original MI-MVN estimation

| | Mean | Std. Err. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|
| Headcount ratio (2011 PPP US$1.90) | | | | |
| National | 0.698 | 0.026 | 0.648 | 0.749 |
| Mogadishu | 0.742 | 0.034 | 0.675 | 0.809 |
| Other Urban | 0.608 | 0.039 | 0.531 | 0.686 |
| Rural | 0.726 | 0.073 | 0.581 | 0.870 |
| IDP Settlements | 0.759 | 0.071 | 0.620 | 0.899 |
| Nomadic population | 0.721 | 0.047 | 0.629 | 0.813 |
| Poverty gap | | | | |
| National | 0.317 | 0.029 | 0.260 | 0.373 |
| Mogadishu | 0.278 | 0.023 | 0.233 | 0.322 |
| Other Urban | 0.261 | 0.028 | 0.206 | 0.316 |
| Rural | 0.431 | 0.122 | 0.188 | 0.673 |
| IDP Settlements | 0.352 | 0.041 | 0.271 | 0.433 |
| Nomadic population | 0.281 | 0.027 | 0.227 | 0.335 |
| Squared poverty gap | | | | |
| National | 0.226 | 0.100 | 0.027 | 0.425 |
| Mogadishu | 0.139 | 0.019 | 0.102 | 0.175 |
| Other Urban | 0.165 | 0.026 | 0.115 | 0.216 |
| Rural | 0.466 | 0.474 | -0.481 | 1.413 |
| IDP Settlements | 0.208 | 0.030 | 0.149 | 0.268 |
| Nomadic population | 0.143 | 0.020 | 0.104 | 0.182 |

| Gini using mean consumption over households | Coef. | Std. Err. | 95% Conf. Interval (lower) | 95% Conf. Interval (upper) |
|---|---|---|---|---|
| National | 0.344 | 0.014 | 0.316 | 0.372 |
| Mogadishu | 0.255 | 0.018 | 0.221 | 0.290 |
| Other Urban | 0.335 | 0.014 | 0.308 | 0.363 |
| Rural | 0.407 | 0.042 | 0.325 | 0.489 |
| IDP Settlements | 0.335 | 0.029 | 0.278 | 0.392 |
| Nomadic population | 0.320 | 0.030 | 0.260 | 0.379 |

| | Mean |
|---|---|
| National | 1.26 |
| Quintiles | |
| Poorest | 0.469 |
| 2 | 0.763 |
| 3 | 1.025 |
| 4 | 1.388 |
| Richest | 2.674 |
| Deciles | |
| Poorest | 0.363 |
| 2 | 0.575 |
| 3 | 0.700 |
| 4 | 0.825 |
| 5 | 0.952 |
| 6 | 1.098 |
| 7 | 1.274 |
| 8 | 1.502 |
| 9 | 1.946 |
| Richest | 3.402 |

Table A.7: Full estimates with one simulation of 100 that closely replicates the distribution with the 100 simulations (mi==66)
Mean Std. Err. [95% Conf.
Interval] Headcount ratio (2011 PPP US$1.90) National 0.690 0.024 0.643 0.737 Mogadishu 0.731 0.027 0.678 0.784 Other Urban 0.603 0.038 0.528 0.678 Rural 0.744 0.061 0.624 0.865 IDP Settlements 0.742 0.071 0.602 0.881 Nomadic population 0.694 0.047 0.602 0.786 Poverty gap National 0.294 0.015 0.263 0.324 Mogadishu 0.278 0.016 0.245 0.310 Other Urban 0.236 0.025 0.186 0.286 Rural 0.369 0.042 0.286 0.452 IDP Settlements 0.342 0.036 0.270 0.413 Nomadic population 0.275 0.026 0.223 0.326 Squared poverty gap National 0.158 0.011 0.137 0.179 Mogadishu 0.130 0.011 0.108 0.152 Other Urban 0.123 0.016 0.091 0.155 Rural 0.220 0.031 0.159 0.281 IDP Settlements 0.193 0.023 0.149 0.237 Nomadic population 0.137 0.017 0.103 0.170 Gini Coef. Std. Err. [95% Conf. Interval] National 0.367 0.014 0.339 0.394 Mogadishu 0.274 0.018 0.238 0.310 Other Urban 0.338 0.015 0.308 0.367 Rural 0.451 0.040 0.373 0.528 IDP Settlements 0.367 0.023 0.321 0.413 Nomadic population 0.344 0.023 0.298 0.390 Mean National 1.31 Quintiles Poorest 0.440 2 0.744 3 1.031 4 1.459 Richest 2.880 Deciles Poorest 0.339 2 0.541 3 0.679 4 0.808 5 0.953 6 1.109 7 1.318 8 1.601 9 2.122 Richest 3.642 34 Table A.7: Full estimates with the pooled estimation Poverty estimate Mean Std. Err. [95% Conf. Interval] Gini Std. [95% Coef. Err. Conf. Interval] National 0.368 0.013 0.342 0.395 Mogadishu 0.280 0.015 0.250 0.310 Other Urban 0.341 0.014 0.314 0.368 Rural 0.448 0.038 0.374 0.521 IDP Settlements 0.375 0.037 0.303 0.447 Nomadic population 0.344 0.025 0.296 0.392 Mean National 1.32 Mean Quintiles (tc_imp) Poorest 0.435 2 0.751 3 1.042 4 1.470 Richest 2.898 Deciles Poorest 0.334 2 0.536 3 0.684 4 0.818 5 0.959 6 1.125 7 1.331 8 1.610 9 2.113 Richest 3.683 35 Appendix B: Additional Figures Figure A.1: Module-level prediction by MI-MVN 36 37 38 39 40 41 Figure A.2: Module-level prediction by Two-Part MI 42 43 44 45 46 47