Policy Research Working Paper 11034

Predicting Income Distributions from Almost Nothing

Daniel Gerszon Mahler, Marta Schoch, Christoph Lakner, Minh Nguyen, Jose Montes¹

Development Data Group & Poverty and Equity Global Department

January 2025

Abstract

This paper develops a method to predict comparable income and consumption distributions for all countries in the world from a simple regression with a handful of country-level variables. To fit the model, the analysis uses more than 2,000 distributions from household surveys covering 168 countries from the World Bank's Poverty and Inequality Platform. More than 1,000 economic, demographic, and remote sensing predictors from multiple databases are used to test the models. A model is selected that balances out-of-sample accuracy, simplicity, and the share of countries for which the method can be applied. The paper finds that a simple model relying on gross domestic product per capita, under-5 mortality rate, life expectancy, and rural population share gives almost the same accuracy as a complex machine learning model using 1,000 indicators jointly. The method allows for easy distributional analysis in countries with extreme data deprivation where survey data are unavailable or severely outdated, several of which are likely among the poorest countries in the world.

This paper is a product of the Development Data Group and the Poverty and Equity Global Department. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at dmahler@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues.
An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

JEL classification: C53, D31, I32, O10

Keywords: Income, consumption, data deprivation, machine learning, poverty, measurement

¹ All authors are with the World Bank. We are grateful for comments received from Roy van der Weide, Johannes Hoogeveen, Federico Haslop, Andres Fernando Chamorro Elizondo, Zander Prinsloo, Benjamin Stewart, Emi Suzuki, and the Global Poverty Monitoring Working Group of the World Bank. The authors gratefully acknowledge financial support from the UK government through the Data and Evidence for Tackling Extreme Poverty (DEEP) Research Programme.

1 Introduction

Household surveys are needed to measure poverty and design distributional policies, yet in several countries they are not conducted due to low statistical capacity, conflict, or lack of resources. In other cases, household surveys are collected but not shared with researchers and policy makers (Dang et al. 2019; Ekhator-Mobayode and Hoogeveen 2022). Lack of data disproportionately affects poorer countries, meaning that ignoring or not properly accounting for such countries in global analyses biases results.
We address these gaps by developing a method that, through a simple regression, can predict annual income and consumption (henceforth welfare) distributions with a credible distributional shape for each country. The method uses widely available social and economic indicators at the country level and does not require any household survey data for the country of interest. We explicitly search for a simple model that can be easily taken up in applied work and that works for the most data-deprived countries. To estimate such a model, we leverage rich information on welfare distributions from more than 2,000 household surveys available in the World Bank's Poverty and Inequality Platform (PIP), which covers 168 countries, using data between 1991 and 2020. For each of these surveys, we have distributions of daily per-capita household income or consumption expressed in purchasing-power-parity-adjusted dollars. We sequentially remove one of the 168 countries from the sample and predict the excluded country's distributions using the remaining 167 countries and various predictors. As potential predictors, we search among more than 1,000 candidate variables spanning multiple databases and including remote sensing indicators. After repeating this leave-one-out cross-validation for each of the 168 countries with available data, we compare the predicted distributions to the survey-based distributions, favoring models that minimize the prediction error. We find that a model that uses GDP per capita, under-5 mortality, life expectancy, rural population shares, and regional dummies predicts welfare well, and that adding more information does not lead to any relevant gains. National accounts data stand out as the single best predictor of welfare, even in data-deprived countries, though to a lesser extent there.
This suggests that, notwithstanding the gaps between welfare from surveys and national accounts (Deaton 2005; Pinkovskiy and Sala-i-Martin 2016; Deaton and Schreyer 2022; Prydz et al. 2022) and measurement issues with GDP in autocratic countries (Martinez 2022), GDP provides valuable information on welfare in contexts where household surveys are unavailable. However, about half of the countries are so data-deprived that they lack not only an income or consumption distribution but also GDP. For these countries, we use World Bank income groups as a proxy, resulting in two tiers of models depending on whether GDP data are available for the country of interest. These two tiers outperform models based on remote sensing data (e.g., nighttime lights, vegetation), which do not meaningfully lower the out-of-sample error when added to the models. This suggests that, on average, remote sensing data are worse predictors of welfare at the national level than GDP, which is consistent with evidence that remote sensing data might produce household welfare estimates very different from survey-based ones (Van Der Weide et al. 2024). Overall, a relatively simple model using readily available data outperforms more complicated models using costly data.

We implement our preferred methods for all countries to estimate global poverty, benchmarking the results against poverty estimates published by the World Bank. We find that the models generally track poverty rates relatively well, but with notable exceptions. We show that these errors are partly due to poverty estimates not being comparable across countries and within countries over time, but surely also due to modeling errors. On average, our preferred models' predictions of income or consumption are off by around 30%. While this is a large error, it is important to assess it against an appropriate benchmark.
A random forest on all 1,000+ possible predictors does not lower the out-of-sample error, suggesting that much of the remaining error is likely irreducible. Furthermore, at a global scale, a 30% error is small compared with the large observed income differences: the median welfare of the richest country in our sample is more than 100 times that of the poorest country, and the 75th percentile of country medians is 5 times the 25th percentile.

A long-standing literature has tried to overcome data gaps and predict distributions when limited information is available. Survey-to-survey imputations can be used when survey data on welfare are unavailable at a desired point in time but data on correlates of household wellbeing are available together with welfare data from a prior household survey (Stifel and Christiaensen 2007; Roy and Van Der Weide 2024). Alternatively, national accounts data can be used to extrapolate older welfare vectors forward in time (Mahler et al. 2022; Angrist et al. 2021). Others have estimated full distributions when grouped data or summary statistics are available (Chen 2018; Chotikapanich et al. 2012; Eckernkemper and Gribisch 2021; Jorda and Niño-Zarazúa 2019; Hajargasht et al. 2012). Wealth indices have been used to predict full distributions for individual countries that lack consumption or income data but have a Demographic and Health Survey (Filmer and Pritchett 2001; Dang et al. 2019). However, all these methods require at least one survey-based welfare vector, making them inapplicable to countries without any survey data. Remote sensing data and mobile phone data have been used to predict mean welfare, the poverty rate, or another distributional statistic in a country in the absence of survey data (Pinkovskiy and Sala-i-Martin 2016; Blumenstock et al. 2015; Steele et al. 2017; Pokhriyal and Jacques 2017; Lee and Braithwaite 2022; Engstrom et al. 2022).
However, these approaches do not predict full distributions. Given the multiple poverty lines and welfare measures used in practice (see, for example, Jorda et al. 2023; Decerf and Ferrando 2022; Kanbur et al. 2022; Jolliffe and Prydz 2021), estimating a separate model for each relevant welfare metric would jeopardize the simplicity of our approach. Moreover, the remote sensing data needed for these approaches do not stretch back far enough in time to study long-term trends and are not always publicly available, making these methods difficult for practitioners to implement.

The remainder of the paper is structured as follows. Sections 2 and 3 describe the data and method. Sections 4 and 5 present results and robustness checks. Section 6 applies the models to global poverty measurement, and section 7 concludes.

2 Data

Our primary data source is household survey data on disposable income or consumption available in PIP. We use data from 1,989 surveys covering 168 countries for the period 1991 to 2020. We exclude data before the 1990s because their quality was generally worse, particularly for low- and middle-income countries. The data are standardized as far as possible, but differences remain with regard to the method of data collection and whether the welfare aggregate is based on income or consumption. We use per capita household welfare expressed in 2017 purchasing power parity (PPP) dollars. We use PIP's public percentile database (version 20230919), retaining 99 percentiles of each income or consumption distribution. Concretely, we use the values of income or consumption $y$ at which the cumulative distribution function $F(y)$ takes the values $\{0.01, 0.02, \dots, 0.99\}$. That is, we retain the 99 poverty lines that result in poverty rates of {1%, 2%, …, 99%}. The final dataset consists of 196,903 quantile-country-year observations on pairs of daily per-capita welfare and the associated quantile in the distribution.
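As an illustration of this percentile representation, the sketch below extracts the 99 welfare thresholds at which the empirical CDF reaches 1%, 2%, …, 99% from a vector of per-capita welfare values. It is a minimal sketch under simplifying assumptions, not PIP's procedure: it ignores survey weights, which real household survey data require, and relies on the interpolation of Python's standard `statistics.quantiles`. The helper names are hypothetical.

```python
import statistics

def percentile_thresholds(welfare):
    """Welfare values at which the empirical CDF reaches 1%, 2%, ..., 99%.

    Hypothetical helper: treats every observation equally (no survey weights).
    """
    if len(welfare) < 2:
        raise ValueError("need at least two observations")
    # n=100 splits the data into 100 equal-probability groups -> 99 cut points
    return statistics.quantiles(welfare, n=100, method="inclusive")

def quantile_pairs(welfare):
    """Pair each quantile p in {0.01, ..., 0.99} with its welfare threshold y."""
    ps = [k / 100 for k in range(1, 100)]
    return list(zip(ps, percentile_thresholds(welfare)))

# Usage: a toy vector of daily per-capita welfare values
toy = [0.8, 1.5, 2.2, 3.0, 4.1, 5.5, 7.9, 12.0, 20.5, 45.0]
lines = percentile_thresholds(toy)
print(len(lines), lines[49])  # 99 thresholds; lines[49] is the median
```

Each survey thus contributes 99 (p, y) pairs, which is how the quantile-country-year dataset is built up.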
We combine these survey data with various possible country-level predictors of welfare. We use data from the World Bank's World Development Indicators (WDI), one of the largest databases of country-year development indicators, spanning a wide range of topics. The WDI contains around 1,400 indicators covering topics such as health, agriculture, education, climate change, and infrastructure. We also use all data from the International Monetary Fund's World Economic Outlook, which contains dozens of macroeconomic variables, and all data from the UN's World Population Prospects, which contains dozens of variables on population, health, and demographics. We complement GDP data from these sources with estimates from the Maddison Project Database (Bolt and Van Zanden 2024). We also use country and region classifications by the World Bank and data on political rights, civil liberties, and freedom status from Freedom House.

In addition, we use remote sensing data available from Google Earth Engine: nighttime lights, precipitation, temperature, impervious surface, cropland, the normalized difference water, snow, and vegetation indices, and the enhanced vegetation index. While spatial coverage of remote sensing variables is not a concern, as they cover the Earth's entire surface, temporal coverage is at times limited. Nighttime light data, for example, date back only to 1992. To fit into this exercise, the remote sensing data need to be aggregated to the country-year level. We first aggregate them to annual data by calculating the mean, max, min, and standard deviation of each location (e.g., a pixel) over a year. We then aggregate them spatially by taking the mean, max, min, and standard deviation of these annual statistics across all locations in a country. This gives 4 × 4 = 16 features for each type of variable. Nighttime lights are also converted to a per capita level by dividing the sum of lights by the population size.
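The two-step aggregation of a remote sensing variable can be sketched as follows. This is a hypothetical illustration, not the authors' Google Earth Engine pipeline: it assumes each location's observations within a year arrive as a plain list of numbers, uses the population standard deviation (the text does not say which variant), and weights every location equally. Crossing the four temporal statistics with the four spatial statistics yields the 16 features per variable.

```python
import statistics

# The four summary statistics used in both aggregation steps.
# Assumption: "standard deviation" is taken as the population SD.
STATS = {
    "mean": statistics.fmean,
    "max": max,
    "min": min,
    "sd": statistics.pstdev,
}

def country_year_features(pixel_series):
    """Aggregate one remote sensing variable to the country-year level.

    pixel_series: {location_id: [values observed during the year]} (hypothetical layout).
    Returns 16 features named "<temporal stat>_<spatial stat>".
    """
    # Step 1 (temporal): summarize each location's values over the year
    annual = {t: [f(vals) for vals in pixel_series.values()] for t, f in STATS.items()}
    # Step 2 (spatial): summarize each annual statistic across locations,
    # giving every location equal weight (no population weighting)
    return {
        f"{t}_{s}": g(annual[t])
        for t in STATS
        for s, g in STATS.items()
    }

def lights_per_capita(pixel_lights, population):
    """Sum of nighttime lights in the country divided by its population."""
    return sum(pixel_lights) / population

# Usage: two locations, two observations each
feats = country_year_features({"a": [1, 3], "b": [5, 7]})
print(len(feats), feats["mean_sd"])  # "mean_sd" = spatial SD of the locations' annual means
```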
We weight each grid cell equally; hence, the indices reflect, for example, the mean temperature in the territory of a country, not the mean temperature experienced by a person in the country. Given that many of these variables affect welfare through agriculture, it is not clear that population weighting (which would give more weight to urban areas) would make them more closely related to welfare distributions. Nevertheless, we also add population-grid-weighted estimates of temperature and rainfall from Gortan et al. (2024).

Where sensible, we use all variables both in levels and in logs. From the total set of covariates, we remove all that have more than 50% missing values, as these are unlikely to be useful in our intended application, the most data-deprived countries. This leaves us with a total of 1,444 candidate variables for predicting welfare distributions.

3 Method

3.1 Distributional assumptions

To ensure that the predicted cumulative distribution functions (CDFs) are well-behaved, we need to impose a distributional assumption. Though the log-normal distribution is the typical two-parameter distribution used in applied work (see, for example, Bergstrom 2022; Kraay and Van der Weide 2022; Soergel 2021), we find that the log-logistic distribution, also known as the Fisk distribution (Fisk 1961), provides a marginally better fit (see section 5.1). This is consistent with Bresson (2009), who finds the log-logistic distribution to be the best-performing two-parameter distribution. The Fisk distribution is given by

$$F(y) = \frac{1}{1 + \left(\frac{y}{\alpha}\right)^{-\beta}} \tag{1}$$

where $\alpha$ is the scale parameter, which equals the median of the distribution, and $\beta$ is the shape parameter, which equals the inverse of the Gini coefficient (Gini $= 1/\beta$). We are interested in predicting welfare levels, which we can isolate on the left-hand side by using the quantile function of the log-logistic distribution:
$$y = F^{-1}(p) = \alpha \left(\frac{p}{1-p}\right)^{1/\beta} \;\Leftrightarrow\; \ln(y) = \ln(\alpha) + \frac{1}{\beta} \ln\left(\frac{p}{1-p}\right) \tag{2}$$

where $p$ is the quantile of the distribution (i.e., the percentile in our application). Equation 2 thus expresses log welfare as a function of the quantile. This quantile function of the Fisk distribution is convenient because it can be estimated through a simple linear regression, where the covariates included directly in the regression predict $\ln(\alpha)$, the log of the median, while the covariates interacted with $\ln(p/(1-p))$ predict the Gini. With one covariate, such a regression can be written as

$$\ln(y_{p,c,t}) = \gamma_0 + \gamma_1 x_{c,t} + \left(\delta_0 + \delta_1 x_{c,t}\right) \ln\left(\frac{p}{1-p}\right) \tag{3}$$

where $y_{p,c,t}$ is daily per-capita income or consumption at percentile $p$ in country $c$ in year $t$, and $x_{c,t}$ is a covariate of interest. While $x_{c,t}$ is only available at the country-year level, it can be used together with the distributional assumption to generate predictions at the country-year-percentile level. We could instead predict the median and Gini index separately and then recover a distribution under the log-logistic parameterization. A robustness check shows that this variant performs marginally worse (see section 5.1). The advantage of our preferred approach is that it leverages the microdata and accomplishes the predictions in one step.

To illustrate our approach, we use only log GDP per capita as a covariate without interacting it with the percentile term, i.e., setting $\delta_1 = 0$ in equation 3 and thereby implicitly assuming that all countries have the same Gini. Estimating this regression on our 196,903 quantile-country-year observations yields the following result:

$$\ln(\hat{y}_{p,c,t}) = -5.794 + 0.869 \ln(\mathrm{GDP/cap}_{c,t}) + 0.385 \ln\left(\frac{p}{1-p}\right) \tag{4}$$

Hence, the median (in 2017 PPP dollars per person per day) is estimated to be $\hat{\alpha} = \exp(-5.794 + 0.869 \ln(\mathrm{GDP/cap}_{c,t}))$, which for an annual GDP per capita of $20,000 equals $16.6. The $\gamma_1$ estimate implicitly provides a 'passthrough rate' indicating how much of GDP growth passes through to growth in (median) welfare.
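The quantile function in equation (2) and the point estimates in equation (4) can be checked numerically. The sketch below uses only the three reported coefficients (−5.794, 0.869, 0.385); the function names are illustrative. It recovers the $16.6 median at an annual GDP per capita of $20,000 and a passthrough of roughly 0.87% for 1% GDP growth.

```python
import math

def fisk_cdf(y, median, gini):
    """Equation (1): F(y) = 1 / (1 + (y / alpha)^(-beta)), with alpha = median, beta = 1 / Gini."""
    beta = 1.0 / gini
    return 1.0 / (1.0 + (y / median) ** (-beta))

def fisk_quantile(p, median, gini):
    """Equation (2): y = alpha * (p / (1 - p))^(1 / beta), where 1 / beta equals the Gini."""
    return median * (p / (1.0 - p)) ** gini

# Point estimates reported in equation (4); the Gini is common to all countries (delta_1 = 0)
GAMMA0, GAMMA1, DELTA0 = -5.794, 0.869, 0.385

def eq4_median(gdp_per_capita):
    """Predicted median welfare (2017 PPP $ per person per day) from annual GDP per capita."""
    return math.exp(GAMMA0 + GAMMA1 * math.log(gdp_per_capita))

def eq4_percentiles(gdp_per_capita):
    """Predicted welfare at p = 0.01, ..., 0.99 implied by equation (4)."""
    med = eq4_median(gdp_per_capita)
    return [fisk_quantile(k / 100, med, DELTA0) for k in range(1, 100)]

median_20k = eq4_median(20_000)                    # roughly 16.6
passthrough = eq4_median(20_200) / median_20k - 1  # 1.01 ** 0.869 - 1, about 0.0087
print(round(median_20k, 1), round(100 * passthrough, 2))
```

Because $\delta_1 = 0$, the exponent on $p/(1-p)$ is the same 0.385 for every GDP level, so all predicted distributions share one Gini and differ only in their median.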
According to equation 4, whenever GDP per capita grows by 1%, the median grows by 0.87%, which is similar to other estimates in the literature (Prydz et al. 2022; Lakner et al. 2022). The Gini is predicted to be 38.5. For any GDP per capita, we can turn the predictions of welfare at each percentile into a full distribution. Figure 1 provides a graphical example of the predicted distribution for two specific levels of GDP per capita using equation (4).

Figure 1: Illustration of predicted distributions for two levels of GDP per capita. Note: Predicted distributions from the output of equation 4.

PIP includes a mix of consumption and income distributions, with income more commonly used in richer countries. We can predict either an income or a consumption distribution for any model we run by adding an income/consumption dummy and interacting this dummy with $\ln(p/(1-p))$. That said, given that the countries that use income aggregates tend to be wealthier and vice versa, predicting income aggregates for poor countries and consumption aggregates for wealthy countries would mean predicting outside the region of common support and hence involve greater uncertainty.

3.2 Model performance

We use spatial-block leave-one-out cross-validation to compare the out-of-sample errors of different models (Roberts et al. 2017). That is, we sequentially remove all surveys for one of the 168 countries with available survey data and estimate welfare at the 99 percentiles of each distribution of the omitted country by running variants of equation (3) on the remaining 167 countries. We remove all surveys available for a country, rather than one survey at a time, to better simulate the scenario where no data are available for a country.
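The spatial-block leave-one-out procedure can be sketched generically: all surveys of one country are held out together, the model is fit on the remaining countries, and predictions are collected for every percentile of every held-out survey. The `fit`/`predict` callables and the row layout are assumptions for illustration. The toy model used here (the average log welfare at each percentile across training countries) is a placeholder, not equation (3).

```python
from collections import defaultdict

def leave_one_country_out(rows, fit, predict):
    """Spatial-block leave-one-out cross-validation.

    rows: list of dicts with at least "country", "survey", "p" and "log_y"
          (hypothetical layout; covariates would be extra keys).
    fit(train_rows) -> model; predict(model, row) -> predicted log welfare.
    Returns {country: {survey: [(true_log_y, predicted_log_y), ...]}}.
    """
    out = {}
    for country in sorted({r["country"] for r in rows}):
        # Hold out *every* survey of this country, not one survey at a time
        train = [r for r in rows if r["country"] != country]
        held_out = [r for r in rows if r["country"] == country]
        model = fit(train)
        by_survey = defaultdict(list)
        for r in held_out:
            by_survey[r["survey"]].append((r["log_y"], predict(model, r)))
        out[country] = dict(by_survey)
    return out

# Placeholder model (not equation (3)): mean log welfare at each percentile in the training data
def fit_percentile_mean(train):
    sums, counts = defaultdict(float), defaultdict(int)
    for r in train:
        sums[r["p"]] += r["log_y"]
        counts[r["p"]] += 1
    return {p: sums[p] / counts[p] for p in sums}

def predict_percentile_mean(model, row):
    return model[row["p"]]

# Usage: three countries, one survey each, observed only at the median
rows = [
    {"country": "A", "survey": "A-2005", "p": 0.5, "log_y": 1.0},
    {"country": "B", "survey": "B-2010", "p": 0.5, "log_y": 2.0},
    {"country": "C", "survey": "C-2015", "p": 0.5, "log_y": 3.0},
]
cv = leave_one_country_out(rows, fit_percentile_mean, predict_percentile_mean)
print(cv["A"])  # A is predicted from B and C only
```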
At each of the 99 percentiles $p$, we calculate the absolute difference between true log welfare, $\ln(y_{p,c,t})$, and predicted log welfare, $\ln(\hat{y}_{p,c,t})$, and summarize the error of the survey, $L_{c,t}$, as the mean absolute deviation:

$$L_{c,t} = \frac{1}{99} \sum_{p \in P} \left| \ln(y_{p,c,t}) - \ln(\hat{y}_{p,c,t}) \right|, \qquad P = \{0.01, 0.02, \dots, 0.99\}$$

Figure 2 shows a graphical example of this. The blue curve represents the true distribution of Lesotho in 2017 ($y(p)$), the black curve is the result of our GDP-based prediction from equation (4) ($\hat{y}(p)$), and the loss equals the average width of the yellow lines.

Figure 2: Illustration of loss function. Note: Graphical example of the distance between the survey-based welfare distribution and the predicted welfare distribution on a log scale. The yellow bars indicate the absolute deviation between the two distributions at 99 percentiles.

In the main specification, we use the mean absolute deviation rather than the mean squared error because we are interested in minimizing the deviations between true and predicted log welfare while giving equal weight to all deviations. Using the mean squared error would give larger weight to large deviations, which often occur at very low or high percentiles, and the welfare values in the tails are the most susceptible to measurement error.

We calculate the mean absolute deviation for every survey distribution available for a country, average these losses within the country, and then aggregate over all countries as follows:

$$L = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{S_c} \sum_{t \in T_c} L_{c,t}$$

where $T_c$ is the set of surveys for country $c$, $S_c$ is the number of such surveys, and $C$ is the number of countries with surveys. Across countries, the frequency of surveys varies drastically and systematically with income. For example, the average low-income country has 4 surveys between 1991 and 2020, compared with 20 surveys in the average high-income country. To ensure that we do not select models that only work well for countries with many surveys, our aggregation formula weights each survey by the inverse of its country's number of surveys, such that the total weight for each country equals one.
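The two-stage loss can be written in a few lines. This sketch assumes the per-survey percentile vectors are already aligned, and the function names are hypothetical. It shows why averaging within countries first matters: a country with many surveys counts no more than a country with a single survey.

```python
def survey_loss(true_log_y, pred_log_y):
    """Mean absolute deviation between true and predicted log welfare across percentiles."""
    if len(true_log_y) != len(pred_log_y) or not true_log_y:
        raise ValueError("expected two non-empty vectors of equal length")
    return sum(abs(t - p) for t, p in zip(true_log_y, pred_log_y)) / len(true_log_y)

def aggregate_loss(survey_losses_by_country):
    """Average survey losses within each country (weights 1/S_c), then across countries (1/C).

    Scaled by 100, so the result reads roughly as an average error in percent.
    """
    country_means = [sum(ls) / len(ls) for ls in survey_losses_by_country.values()]
    return 100 * sum(country_means) / len(country_means)

# Usage: country A has four surveys, country B only one
losses = {"A": [0.2, 0.2, 0.2, 0.2], "B": [0.4]}
print(aggregate_loss(losses))  # about 30; pooling all five surveys would give about 24
```

On the percent reading: a log error of 0.3 corresponds to a ratio of exp(0.3) ≈ 1.35 when over-predicting and exp(−0.3) ≈ 0.74 when under-predicting, so "30%" is only an approximation.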
We multiply the final loss by 100 so that it approximately equals the average prediction error in percent. Our loss function and the use of a linear model to predict $\ln(y)$ lead us to estimate equation (3) using quantile regression. Quantile (median) regression estimates the parameters by minimizing the sum of absolute errors, which is consistent with our loss function. As a robustness check, we also use OLS, which minimizes the sum of squared errors and hence is consistent with a mean squared error loss function (see section 5.1).

3.3 Model selection

The selection of our preferred model is guided not only by the loss function but also by three additional principles, which we define below: simplicity, usability, and coherence. With respect to simplicity, we prefer models that can be applied easily, which is partly why we use a linear model. One challenge with a linear model is that missing values in the covariates cannot be handled easily. A potential solution is to impute the missing values, but this would make the model intractable. To keep the framework simple, we use only one covariate with missing values and otherwise restrict the model to covariates with at most 1% missing values. This results in two tiers of models: one for when the covariate with more than 1% missing values is available, and one for when it is not. We allow for 1% missing values in other covariates because some of the databases we use exclude certain economies for political reasons (notably Taiwan, China; and Kosovo), yet data for these excluded economies are often available from country-specific sources. So in practice, the tier 2 model can be applied everywhere. To select the primary variable of interest, we compare the errors from regressions of the type shown in equation (4), replacing GDP per capita sequentially with each covariate in our dataset.
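The missingness rule behind the two tiers can be sketched as a screening step plus a dispatch step. This is a hypothetical illustration: column names such as `log_gdp_pc` are invented, missing values are assumed to be encoded as `None`, and only the 1% threshold comes from the text.

```python
def missing_share(values):
    """Share of entries encoded as None."""
    return sum(v is None for v in values) / len(values)

def tier2_covariates(columns, max_missing=0.01):
    """Covariates eligible for every model: at most 1% missing across country-years."""
    return sorted(name for name, vals in columns.items() if missing_share(vals) <= max_missing)

def choose_tier(country_year, tier1_var="log_gdp_pc"):
    """Tier 1 if the single partially missing covariate is observed; otherwise tier 2."""
    return 1 if country_year.get(tier1_var) is not None else 2

# Usage with hypothetical columns over 100 country-years
columns = {
    "u5_mortality": [40.0] * 100,
    "rural_share": [0.5] * 99 + [None],      # 1% missing: still eligible
    "log_gdp_pc": [8.0] * 90 + [None] * 10,  # 10% missing: tier 1 only
}
print(tier2_covariates(columns))           # ['rural_share', 'u5_mortality']
print(choose_tier({"log_gdp_pc": None}))   # 2
```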
By usability, we mean that the model can be applied, and performs relatively well, even in the most data-deprived contexts. The tier 1 covariate is selected based on a balance of predictive accuracy and non-missingness. By construction, our tier 2 model uses only covariates available for virtually all countries and can hence be applied everywhere. Even if the model can be applied everywhere, the relationship between welfare and the covariates may be fundamentally different for data-deprived countries. Martinez (2022), for example, finds that GDP growth is less reliable for authoritarian countries, which also tend to be the ones that do not produce or publish survey data. To ensure that our model works well for data-deprived countries, we check whether our model selection and performance are robust to using only the countries with at most three poverty estimates, which are the 25% most data-deprived countries for which we have some data. These countries tend to be more authoritarian and conflict-ridden but also include a couple of wealthy countries that simply do not share much data (Table A.1). By coherence, we mean that we are willing to tolerate marginal increases in error if doing so results in a model that is easier to rationalize. This is partly why we restrict ourselves to predictions consistent with the log-logistic distribution, but it also matters for the covariates we select. For example, if we find that a model using GDP per capita in 2011 PPPs performs slightly better than a model using GDP per capita in 2017 PPPs, we favor the latter because the welfare distributions are expressed in 2017 PPPs, which makes for a more consistent model. Because we apply these three principles in addition to the model error, value judgments are at times needed to assess when an increase in the error is merited by greater adherence to the three principles.
This complicates model selection somewhat compared with relying solely on the model error. As a robustness check, we use machine learning to select a model guided exclusively by model performance, to test how much accuracy we give up by applying these three additional principles (section 5.2).

4 Results

Our first objective is to select the main variable for the tier 1 model. To that end, we run separate quantile regressions following equation (4), using one candidate covariate at a time in place of $\ln(\mathrm{GDP/cap}_{c,t})$. We evaluate the fit in-sample for reasons of efficiency, since evaluating all models out-of-sample is too computationally intensive. Given that all these models share the same simple structure, overfitting is unlikely to be an issue. The left panel of Figure 3 plots the error from these regressions against the availability of each indicator from 1991 to 2020 for the 218 economies considered by the World Bank. We exclude variables with less than 50% availability, given that they have too much missingness to be useful for our application. We are primarily interested in variables for which no other variable has both a lower error and lower missingness. When using the full sample (panel a), these are the national accounts variables and under-five mortality. The national accounts variables on the frontier are GDP, household final consumption expenditure (HFCE), and gross national income (GNI) (the national accounts variables are expressed in logs and per capita terms throughout). We use versions of these variables expressed in 2017 PPP-adjusted USD even though non-PPP-adjusted versions at times perform slightly better (see the unlabeled orange dots in Figure 3). These variables have the advantage of being highly correlated with welfare but the drawback of not always being available. For example, GNI per capita is available for only 58% of the country-year observations.
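The frontier criterion used for Figure 3 (keep a variable unless another variable has both a lower error and a higher availability) is a Pareto-dominance filter. The sketch below applies it to made-up numbers; the variable names and values are purely illustrative and are not results from the paper.

```python
def frontier(candidates):
    """candidates: {name: (error, availability)}.

    Keep a variable unless some other variable has BOTH a strictly lower error
    and a strictly higher availability, following the criterion in the text.
    """
    keep = []
    for name, (err, avail) in candidates.items():
        dominated = any(
            e < err and a > avail
            for other, (e, a) in candidates.items()
            if other != name
        )
        if not dominated:
            keep.append(name)
    return sorted(keep)

# Made-up numbers for illustration only (not estimates from the paper)
toy = {
    "x1": (34.0, 0.80),
    "x2": (40.0, 1.00),
    "x3": (45.0, 0.90),  # beaten by x2 on both error and availability
    "x4": (33.0, 0.58),
}
print(frontier(toy))  # ['x1', 'x2', 'x4']
```

Variables that survive the filter trade accuracy against coverage, which is why the final choice among them still requires the judgment described in section 3.3.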
Hence, national accounts data can only offer a partial solution for predicting the distribution of welfare. By contrast, under-five mortality is universally available owing to modeling by the UN Population Division.

Figure 3: Covariates' predictability of welfare and availability. Panels: (a) Full sample; (b) Data-deprived countries. Note: Each dot presents output from a separate regression using a particular covariate. The vertical axis shows the error of the regression, and the horizontal axis shows the availability of the covariate across countries. The labeled national accounts variables are in per capita 2017-PPP terms. Panel (a) uses the full sample, while panel (b) uses countries with at most three surveys. Panel (b) evaluates missingness by looking at the country-years without a poverty estimate but with data on the covariate. Variables in the bottom right have high accuracy and high availability. An error of, say, 30 means that the model's prediction of log welfare is off by 0.3 on average, which is approximately equal to 30% (though for large errors, the approximation is less accurate).

In panel (b), we perform the same analysis on the data-deprived sample. Concretely, we look at the performance among countries with at most three welfare distributions and evaluate missingness based on the country-years without data. We do this to get a sense of how the variables could perform in situations where the modeling efforts are most likely to be applied, namely in settings where actual welfare data are unavailable. National accounts data perform relatively worse in this subsample: HFCE and GNI are not even shown in the plot because they are unavailable in more than half of the cases, and GDP per capita is now outperformed by sanitation and electricity access. This could happen because the country-years for which the latter two variables are available are easier to predict than the country-years for which GDP per capita is available.
For example, electricity access is essentially only available after 2000, which could explain its relatively good performance if early welfare aggregates are noisier. Even in the data-deprived case, remote sensing variables, including nighttime lights, do not offer the most accuracy.²

² From this evidence, it is not possible to draw conclusions about the accuracy of remote sensing data in other applications, such as small area or subnational estimation, where many of the variables we consider here are unavailable.

The greater accuracy of variables without complete availability lends itself to the two tiers of models discussed previously: tier 1, the model with a variable without complete availability, and tier 2, the model relying exclusively on variables with complete availability. We test the performance of possible tier 1 variables in Figure 4. To evaluate them, we delete all observations with missing values in any candidate variable, so that all candidates are compared on the same sample. As candidate variables, we use those on the frontier of either panel of Figure 3: GNI, HFCE, GDP, electricity access, access to sanitation, and under-five mortality. We add nighttime lights for reference. Including HFCE and GNI reduces the common sample that can be used for all models, since these variables have many missing observations. Hence, we consider a version that includes these variables (panel a) and a version without them (panel b).

Figure 4: Candidate tier 1 variables. Panels: (a) With HFCE and GNI; (b) Without HFCE and GNI. Note: The figure plots the prediction error (vertical axis) of various predictors (horizontal axis). Observations: N = 132,163 in panel (a); N = 159,385 in panel (b).

GNI performs best in the full sample, marginally outperforming GDP, while HFCE performs best in the data-deprived sample. However, HFCE and GNI are available for less than half of the country-years without poverty data, so a model with GDP has broader use.
For that reason, we select GDP over GNI and HFCE. If we remove those two variables (panel b), GDP clearly outperforms the other contenders in the full sample, but only marginally in the data-deprived sample. Notably, using sanitation access or electricity access gives essentially the same performance. However, these two variables are not well suited to predicting poverty in non-poor countries, where access is universal. Therefore, there are again reasons to prefer GDP as the tier 1 variable in panel (b).

Next, we test whether adding more variables to the tier 1 model would reduce the error, and which variables we should use for the tier 2 model. To that end, we run a quantile regression lasso predicting $\ln(y)$ using equation (3) (Sherwood and Maidman 2017). As possible covariates for tier 2, we use all the variables we gathered with at least 99% availability, all of these variables interacted with $\ln(p/(1-p))$ (such that they also predict the dispersion of the distribution, not just its level), and both of these sets of variables interacted with whether income or consumption is used to measure welfare. For tier 1, we also add the interactions of log GDP per capita with $\ln(p/(1-p))$ and with the welfare type. For both tiers, the lasso selects only three types of variables: mortality rates (infant, under-5, under-40, and under-60), life expectancy (overall and at 80), and the rural population share. In principle, we could use the lasso alone to select the final models, but upon further inspection, we find that the lasso does not select group variables (income groups and regions) in an ideal way, results in a relatively unstable model selection, and does not deal with income/consumption dummies efficiently.³ For those reasons, we use a stepwise model selection with the subset of variables that the lasso identified as inputs, to which we add income groups in tier 2 as a proxy for GDP per capita.
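The stepwise selection can be sketched as greedy forward selection over the lasso-selected candidates. This is a minimal sketch under stated assumptions: `error_of` stands in for the out-of-sample (leave-one-country-out) loss of a model with the given covariates, the stopping threshold `min_gain` is a hypothetical parameter (the text only says that no further variable lowers the error notably), and the error table is invented for illustration.

```python
def forward_stepwise(candidates, error_of, min_gain=0.0):
    """Greedy forward selection.

    candidates: variable names; error_of(tuple_of_names) -> out-of-sample error.
    At each step, add the variable that lowers the error most; stop when the best
    addition improves the error by no more than min_gain.
    Returns the path [(selected_variables, error), ...], starting from the empty model.
    """
    remaining = list(candidates)
    selected = []
    best = error_of(tuple(selected))
    path = [(tuple(selected), best)]
    while remaining:
        err, var = min((error_of(tuple(selected + [v])), v) for v in remaining)
        if best - err <= min_gain:
            break
        selected.append(var)
        remaining.remove(var)
        best = err
        path.append((tuple(selected), best))
    return path

# Invented error table keyed by variable sets (illustration only)
table = {
    frozenset(): 100.0,
    frozenset({"a"}): 50.0, frozenset({"b"}): 60.0, frozenset({"c"}): 70.0,
    frozenset({"a", "b"}): 40.0, frozenset({"a", "c"}): 45.0, frozenset({"b", "c"}): 55.0,
    frozenset({"a", "b", "c"}): 39.9,
}
path = forward_stepwise(["a", "b", "c"], lambda vs: table[frozenset(vs)], min_gain=0.5)
for variables, error in path:
    print(variables, error)
```

The returned path mirrors the kind of sequence plotted in Figure 5: the order in which variables enter and the error after each addition.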
Figure 5 shows the order in which variables enter and the out-of-sample error as each additional variable is included. For tier 1, the variables chosen, in order, are GDP per capita, under-60 mortality, under-5 mortality, and life expectancy. For tier 2, the variables chosen, in order, are under-5 mortality, income groups, rural population share, and life expectancy. After these variables are included, no additional variable lowers the out-of-sample error notably.

Figure 5: Error as variables enter the model. Panels: (a) Tier 1; (b) Tier 2.
Note: The figure plots the order in which variables enter the two tiers and the errors as each variable is added. For example, GDP per capita is the first variable to enter in tier 1, and when it is the only variable in the model, the error is 33.7. The next variable to enter is under-60 mortality, which, when added to the model that includes GDP per capita, lowers the error to 30.3.

3 We find that very small variations in the penalty parameter result in either all or none of the income groups, regions, and their interactions with welfare type being included.

The lasso and stepwise regression did not include interactions. We next test whether adding interactions with the income/consumption dummy, or between the variables and ln(p/(1−p)) (i.e., letting the variables influence the Gini), could lower the error. As we show in the next section, region is the most predictive of the Gini index among all 1,000+ variables. Therefore, we also test whether accuracy improves when we allow the Gini index to vary by region. We try Gini dummies both for all regions and for only the three regions whose Gini indices differ notably from the rest of the world: Latin America & the Caribbean and Sub-Saharan Africa, which are regions with high inequality (Haddad et al. 2024), and Europe & Central Asia, which tends to have lower inequality.
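The stepwise procedure can be sketched generically in Python: at each step, add whichever remaining variable yields the lowest out-of-sample error, and stop once no variable lowers the error notably. This is an illustrative sketch, not the authors' code; `cv_error` is a hypothetical callable that would refit the model on a given variable list and return its cross-validated error, and the toy error function uses made-up numbers.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def forward_stepwise(
    candidates: Sequence[str],
    cv_error: Callable[[List[str]], float],
    min_gain: float = 0.0,
) -> List[Tuple[str, float]]:
    """Greedy forward selection driven by out-of-sample error.

    At each step, add the candidate that gives the lowest cv_error; stop when
    no remaining candidate lowers the error by more than min_gain. Returns the
    entry order with the error after each addition (the data behind a plot
    like Figure 5).
    """
    selected: List[str] = []
    remaining = list(candidates)
    current = cv_error(selected)  # error of the model without covariates
    path: List[Tuple[str, float]] = []
    while remaining:
        errors: Dict[str, float] = {v: cv_error(selected + [v]) for v in remaining}
        best = min(errors, key=lambda v: errors[v])
        if current - errors[best] <= min_gain:
            break  # no remaining variable lowers the error notably
        selected.append(best)
        remaining.remove(best)
        current = errors[best]
        path.append((best, current))
    return path


# Toy error function with made-up gains; in practice this would refit the
# quantile regression and return its leave-one-country-out error.
GAIN = {"x1": 6.0, "x2": 3.0, "x3": 1.0, "x4": 0.0}


def toy_error(variables: List[str]) -> float:
    return 40.0 - sum(GAIN[v] for v in variables)


print(forward_stepwise(list(GAIN), toy_error, min_gain=0.5))
# [('x1', 34.0), ('x2', 31.0), ('x3', 30.0)]
```

Because each step conditions on the variables already selected, a group such as income groups can be treated as a single candidate, which is one way around the lasso's difficulty with group variables noted above.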
Finally, we test whether the Gini should be a second-order polynomial of log GDP per capita, as the Kuznets curve would suggest (Kuznets 1955). Letting the Gini differ across the three regions mentioned above would reduce the error of both tiers (by 1.4 for tier 1 and 1.2 for tier 2). For tier 1, there is additional evidence in favor of adding a dummy for welfare type (income or consumption) and interacting it with GDP per capita. This means that GDP growth has different impacts on income and consumption vectors, in line with existing evidence (Lakner et al. 2022, Mahler et al. 2022, Prydz et al. 2022). This reduces the error of tier 1 further to 26.7. For tier 2, on the other hand, there is no similar case for adding a welfare type dummy or for interacting the income group classifications with welfare type. Lastly, we replace under-60 mortality with the rural population share in the tier 1 model, which increases the error only very marginally (from 26.67 to 26.69). This improves the consistency between the two models, and hence reduces revisions to the predictions if a country moves from one tier to another. In sum, we end up with two almost parallel models, in which GDP per capita is proxied by income group in tier 2. The final out-of-sample errors of the two tiers are 26.7 and 30.1. Table 1 shows the regression output of the final models, estimated on the full sample.
Table 1: Regression output
Outcome variable: log welfare, ln Q(p)

                                                Tier 1               Tier 2
Intercept                                       -1.975*** (0.052)    1.837*** (0.047)
Log GDP per capita (2017 PPP)                   0.393*** (0.004)
Log under-5 mortality (per 1,000 live births)   -0.185*** (0.003)    -0.307*** (0.004)
Life expectancy (years)                         0.016*** (0.000)     0.017*** (0.000)
Rural population share (0-100)                  -0.003*** (0.000)    -0.008*** (0.000)
Income group: low income                                             Base
Income group: lower-middle income                                    0.229*** (0.006)
Income group: upper-middle income                                    0.450*** (0.008)
Income group: high income                                            1.102*** (0.011)
Welfare type (income = 1, consumption = 0)      -3.671*** (0.030)
Welfare type × log GDP per capita               0.392*** (0.003)
ln(p/(1−p))                                     0.354*** (0.002)     0.350*** (0.002)
ln(p/(1−p)) × I[Europe & Central Asia]          -0.045*** (0.002)    -0.034*** (0.003)
ln(p/(1−p)) × I[Latin America & Caribbean]      0.159*** (0.002)     0.166*** (0.003)
ln(p/(1−p)) × I[Sub-Saharan Africa]             0.060*** (0.002)     0.069*** (0.003)
Observations                                    194,626              194,824
Pseudo R2                                       0.7512               0.719

Note: * p < 0.05, ** p < 0.01, *** p < 0.001. Robust standard errors in parentheses. Blank cells indicate that the variable is not included in that tier.

We use two examples to illustrate how these regressions can predict welfare and poverty rates. The first example predicts a consumption distribution for a Tier 1 country (i.e., one with GDP data) in Sub-Saharan Africa. In this case, the predicted log consumption at quantile p, ln Q(p), equals

$$\ln Q(p) = -1.975 + 0.393\ln(\text{GDP}) - 0.185\ln(\text{U5M}) - 0.003\,\text{Rural} + 0.016\,\text{LE} + (0.354 + 0.060)\ln\left(\frac{p}{1-p}\right)$$

where GDP is GDP per capita, U5M is under-5 mortality, Rural is the rural population share, and LE is life expectancy. Setting Q(p) equal to a poverty line z and isolating p, which can then be interpreted as the poverty rate associated with z, yields

$$p(z) = \left[1 + \left(\frac{\exp\left(-1.975 + 0.393\ln(\text{GDP}) - 0.185\ln(\text{U5M}) - 0.003\,\text{Rural} + 0.016\,\text{LE}\right)}{z}\right)^{\frac{1}{0.354+0.060}}\right]^{-1}$$

Using data for Ethiopia in 2021, GDP per capita is $2,319 (2017 PPP), under-5 mortality is 47 per 1,000 live births, the rural share is 78 percent, and life expectancy is 65 years. At the international poverty line of $2.15, this yields p(2.15) = 0.276, that is, a poverty rate of 27.6%.
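The Ethiopia example above, and the Thailand example that follows, can be reproduced with the Python sketch below, which hard-codes the Table 1 coefficients at their printed three-decimal precision. The function names, argument order, and region and income-group codes (for example "SSA" and "UMC") are illustrative choices, not the authors' code. Because the ln(p/(1−p)) terms vanish at p = 0.5, the non-Gini part of each equation is the predicted log median, and the headcount follows from the inversion above.

```python
import math

# Coefficients copied from Table 1 (three decimals, as printed).
INCOME_GROUP = {"LIC": 0.0, "LMC": 0.229, "UMC": 0.450, "HIC": 1.102}  # tier 2 only
GINI_BASE = {1: 0.354, 2: 0.350}
GINI_SHIFT = {
    1: {"ECA": -0.045, "LAC": 0.159, "SSA": 0.060},
    2: {"ECA": -0.034, "LAC": 0.166, "SSA": 0.069},
}


def tier1_log_median(gdp, u5m, life_exp, rural, income=0):
    """Tier 1 log welfare at p = 0.5, where the ln(p/(1-p)) terms vanish."""
    lgdp = math.log(gdp)
    return (-1.975 + 0.393 * lgdp - 0.185 * math.log(u5m) + 0.016 * life_exp
            - 0.003 * rural + income * (-3.671 + 0.392 * lgdp))


def tier2_log_median(income_group, u5m, life_exp, rural):
    """Tier 2 log welfare at p = 0.5; GDP is proxied by the income group."""
    return (1.837 + INCOME_GROUP[income_group] - 0.307 * math.log(u5m)
            + 0.017 * life_exp - 0.008 * rural)


def gini(tier, region):
    """Coefficient on ln(p/(1-p)); regions other than ECA, LAC and SSA use the base."""
    return GINI_BASE[tier] + GINI_SHIFT[tier].get(region, 0.0)


def headcount(log_median, g, z):
    """Solve ln(z) = log_median + g * ln(p/(1-p)) for p, the share below line z."""
    return 1.0 / (1.0 + (math.exp(log_median) / z) ** (1.0 / g))


# Ethiopia 2021: tier 1, consumption, Sub-Saharan Africa, $2.15 line.
eth = headcount(tier1_log_median(2319, 47, 65, 78), gini(1, "SSA"), 2.15)
# Thailand 2021: tier 2, upper-middle income, East Asia & Pacific, $6.85 line.
tha = headcount(tier2_log_median("UMC", 8, 79, 48), gini(2, "EAP"), 6.85)
print(round(eth, 3), round(tha, 3))  # 0.276 0.124
```

Setting income=1 in tier1_log_median switches to the income-distribution version of tier 1, which adds the welfare-type intercept and its interaction with log GDP per capita.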
The second example predicts the income distribution of an upper-middle-income country in East Asia & Pacific that lacks GDP data (i.e., using the Tier 2 model), with U5M, LE, and Rural denoting under-5 mortality, life expectancy, and the rural population share:

$$\ln Q(p) = 1.837 + 0.450 - 0.307\ln(\text{U5M}) + 0.017\,\text{LE} - 0.008\,\text{Rural} + 0.350\ln\left(\frac{p}{1-p}\right)$$

Again, isolating p gives

$$p(z) = \left[1 + \left(\frac{\exp\left(1.837 + 0.450 - 0.307\ln(\text{U5M}) + 0.017\,\text{LE} - 0.008\,\text{Rural}\right)}{z}\right)^{\frac{1}{0.350}}\right]^{-1}$$

Using data for Thailand (an upper-middle-income country in East Asia & Pacific) in 2021, under-5 mortality is 8 per 1,000 live births, life expectancy is 79 years, and the rural population share is 48 percent. Setting z = 6.85, the typical poverty line for upper-middle-income countries, results in p(6.85) = 0.124, that is, a poverty rate of 12.4%.

5 Robustness checks

This section conducts three robustness checks of our model selection. First, we predict welfare distributions using other distributional approaches. Second, we predict distributions flexibly using machine learning on all the indicators we gathered, which helps assess how much we give up by insisting on a simple model. Third, we look at how well the preferred model performs in relevant subsamples to explore whether there are cases where it may be less appropriate.

5.1 Accuracy using other distributional approaches

We try three other ways of predicting full distributions to assess the robustness of our main approach. In all cases, we use the same covariates as in the final two-tier model but change how they translate into full distributions. First, instead of predicting the full distributions directly using the inverse CDF of the log-logistic distribution, we first predict the log of the median and the Gini, and then infer the full distribution from the inverse CDF using the predicted median and Gini. Second, we run OLS regressions rather than quantile regressions, and hence estimate the parameters by minimizing the mean squared error (MSE) rather than the mean absolute deviation (MAD). Third, rather than assuming the distributions are
log-logistic, we assume that they are log-normal. Given that the quantile function of the log-normal distribution is

$$\ln Q(p) = \ln(m) + \sqrt{2}\,\sigma\,\mathrm{erf}^{-1}(2p - 1),$$

where m is the median and σ is the standard deviation of log welfare, we can essentially run the same regressions as before, but instead of interacting covariates with ln(p/(1−p)), we interact them with erf⁻¹(2p − 1). We find that in all three cases the error increases (modestly) compared to our preferred approach (Figure A.1).

5.2 Accuracy using machine learning on all gathered indicators

The errors of our two models (26.7 and 30.1) are substantial on an absolute scale, as they imply that predicted welfare is off by around 30% on average. We test whether the error is high because our model is ultimately too simple to perform well, because of the distributional assumption we impose, or because of irreducible error related to the uncertainty involved in constructing welfare aggregates and converting them to a common currency.

First, to estimate the limitations imposed by the simple model, we predict the median and Gini with a conditional inference random forest (Hothorn et al. 2006) using all of the 1,000+ covariates we gathered. This helps shed light on whether the imposed functional form or the set of covariates considered causes our relatively large error. When doing so, we get an out-of-sample error of 27.9 on the full sample and 36.5 on the data-deprived sample. This is higher than the errors from the tier 1 model (26.7 on the full sample and 31.1 on the data-deprived sample). Though it is certainly possible to improve upon this error using even more covariates or other machine learning methods, this suggests that the simplicity of our model is not the main reason for the error. The intuition for this result is clear from the variable importance plots of the random forest: the main predictor of medians is GDP per capita, while the most predictive variable of the Gini is the region (Figure A.2). Both variables are already included in our preferred models.
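The two distributional assumptions can be compared directly with the Python standard library, since `statistics.NormalDist().inv_cdf(p)` returns Φ⁻¹(p) = √2 erf⁻¹(2p − 1). The sketch below, an illustration with a made-up median and Gini rather than the authors' code, maps a Gini index to each distribution's dispersion parameter using the standard closed forms (G equals the coefficient on ln(p/(1−p)) for the log-logistic, and G = 2Φ(σ/√2) − 1 for the log-normal) and prints quantiles under both assumptions. The two curves share a median and a Gini but allocate dispersion differently across quantiles.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal; inv_cdf(p) equals sqrt(2) * erfinv(2p - 1)


def loglogistic_log_q(median: float, gini: float, p: float) -> float:
    # ln Q(p) = ln(m) + G * ln(p / (1 - p)); the log-logistic shape term is the Gini.
    return math.log(median) + gini * math.log(p / (1.0 - p))


def lognormal_sigma(gini: float) -> float:
    # For a log-normal, G = 2 * Phi(sigma / sqrt(2)) - 1; invert for sigma.
    return math.sqrt(2.0) * PHI.inv_cdf((1.0 + gini) / 2.0)


def lognormal_log_q(median: float, gini: float, p: float) -> float:
    # ln Q(p) = ln(m) + sqrt(2) * sigma * erfinv(2p - 1) = ln(m) + sigma * Phi^{-1}(p).
    return math.log(median) + lognormal_sigma(gini) * PHI.inv_cdf(p)


# Same (made-up) median and Gini under both assumptions.
for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    q_ll = math.exp(loglogistic_log_q(3.2, 0.40, p))
    q_ln = math.exp(lognormal_log_q(3.2, 0.40, p))
    print(f"p={p:.2f}  log-logistic={q_ll:6.2f}  log-normal={q_ln:6.2f}")
```

In the regression version, the covariates are simply interacted with `PHI.inv_cdf(p)` instead of ln(p/(1−p)); the rest of the pipeline is unchanged.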
Second, to assess the potential loss from the distributional assumption, we estimate the hypothetical error if we predicted the median and Gini perfectly and then imposed a log-logistic distribution. This results in errors of 5.6 on the full sample and 6.6 on the data-deprived sample. Hence, around 20% of the final error of our model can be ascribed to our distributional assumption, while the remaining 80% is most likely due to irreducible error.

Some additional statistics can also help clarify the size of the errors of the two models. First, the pseudo R2 values of our final models are 0.75 (tier 1) and 0.72 (tier 2), whereas the R2 of a model without any predictors is 0.15 (i.e., a model that would assign approximately the distribution of the median country to every country). Hence, our models explain most of the variance across and within countries. In addition, countries differ greatly in their welfare levels. The median welfare of the richest country in our sample is more than 100 times that of the poorest country, and even the 75th percentile of country medians is 5 times the 25th percentile. Hence, an error of around 30% would not lead to substantial mis-ranking of a particular country, given the large differences in living standards across the world.

5.3 Accuracy in subsamples

We next explore how our preferred models perform in various settings (Figure 6). If the model performs relatively poorly in a setting, it may be either because the chosen covariates are less related to welfare distributions in this setting or because welfare distributions in that setting are generally noisy and hard to predict. We try to distinguish between these two explanations by also reporting the error of the corresponding machine learning model. If the error in a particular subsample is higher both for our models and for the machine learning model, then welfare is generally hard to predict in this setting.
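The back-of-the-envelope arithmetic behind these statements is easy to check. The Python sketch below assumes, as the text implies, that the reported errors are mean absolute deviations of log welfare multiplied by 100, and that 31.1 is the tier 1 error on the data-deprived sample; the helper names are hypothetical. Under that reading, an error of 26.7 corresponds to a multiplicative miss of exp(0.267), about 31%, consistent with the "around 30%" in the text.

```python
import math


def share_from_distribution(oracle_error: float, model_error: float) -> float:
    """Fraction of the model's error that remains even with a perfect median and Gini."""
    return oracle_error / model_error


def percent_off(log_error_x100: float) -> float:
    """Convert an average absolute log error (scaled by 100) into a percentage miss."""
    return 100.0 * (math.exp(log_error_x100 / 100.0) - 1.0)


# Errors reported in the text (tier 1 model vs. perfect median and Gini).
for sample, model, oracle in (("full", 26.7, 5.6), ("data-deprived", 31.1, 6.6)):
    print(f"{sample}: {share_from_distribution(oracle, model):.0%} from the assumption, "
          f"welfare off by about {percent_off(model):.0f}%")
```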
By contrast, if the machine learning model gives low errors while the errors of our models are high, then the covariates we have selected are not as relevant for this subsample.

Figure 6: Performance by country groups
Note: The out-of-sample error for different groups of countries. FCV refers to whether the country-year is classified as in fragility, conflict, or violence according to the World Bank’s classification.

The errors are lower in data-rich settings, for rich countries, and for more recent welfare aggregates. In general, whenever the error is low, the random forest outperforms our models, while in poor and data-deprived settings the random forest does worse. This suggests that when high-quality information is available, a simple model does not pick up all the relevant information and a more complex model could do better. By contrast, when the data are sparse and of worse quality, there is less to be gained from complex models.

6 Application to global poverty measurement

We apply our model to all countries in the world from 1991 to 2020 to measure global poverty. We use the international poverty line ($2.15/day, which is the median poverty line of low-income countries), the median poverty line of upper-middle-income countries ($6.85), and the median poverty line of high-income countries ($24.35) (Jolliffe et al. 2024). In a handful of cases where data are still missing in the variables we use in the two tiers, we find alternative values.4 We benchmark the results against the official global poverty estimates of the World Bank, which rely on survey data but also on interpolations and extrapolations using growth in national accounts.

4 Under-5 mortality is measured by the UN Inter-agency Group for Child Mortality Estimation, which uses country-level estimates from vital registration, surveys, and censuses and fits a Bayesian B-spline bias-reduction model to create a smooth trend through these estimates (Alkema & New 2014). All 218 economies in the World Bank’s list of economies have data on this indicator. Life expectancy is based on the World Population Prospects 2022, which relies on sources similar to those for under-5 mortality but uses different methods to smooth and fill gaps (United Nations 2022). Life expectancy has no missing values. Rural population shares are based on the 2018 World Urbanization Prospects (United Nations 2018), which relies on national definitions of urban and rural populations and extrapolates from past trends when timely data are missing. The rural share is missing for Kosovo; Taiwan, China; and St. Martin (French part). For St. Martin (French part), we assume the same rural population share as for the Dutch part; the data for Kosovo are taken from Serbia; and for Taiwan, China, we use the share of China. Income groups are defined based on a country’s Gross National Income (GNI) per capita. Countries may have an income group classification even though they do not publish GDP (or GNI) data. This happens when GNI estimates are of sufficient quality to place a country in an income category but not precise enough to release the numerical estimate. The only economy among the World Bank’s 218 economies without an income group classification (since 2015) is the República Bolivariana de Venezuela, which is assumed to carry its last classification (upper-middle income) forward to 2021.

In theory, we can recover predictions of both income and consumption distributions for each country using tier 1. To approximate what countries actually rely upon, we use income distributions for countries in Latin America & the Caribbean and for high-income countries. Results are shown in Figure 7.

Figure 7: Predicted and survey-based global and regional poverty trends
Note: Predicted poverty rates using the two tiers and actual poverty rates from PIP.

Globally, the predictions at the extreme poverty line lie a bit below the actual trend and show a somewhat smaller reduction: the predicted $2.15 poverty rate falls from 26% in 1991 to 7% in 2019, while PIP’s poverty rate falls from 37% to 9% over the same period. At the $6.85 line, the predicted rates are also below PIP, but progress is a bit faster, with the predicted poverty rate going from 67% to 38%, while PIP’s falls from 69% to 47%. At the high-income line, the two are more closely aligned. Despite these differences, the two-tier model predicts the decline in global poverty rather well using a handful of readily available country-level variables.

The predicted poverty rates track the survey-based ones well in all regions except East Asia & Pacific and South Asia, where the predicted rates are notably lower. This is in large part due to China and India, which drive the regional and global results because of their size, not necessarily because the model performs worse there. We briefly discuss both countries, which illustrate issues related to consumption surveys that might also be present elsewhere (Nicoletti et al. 2011).

China’s extreme poverty rate is “only” predicted to have fallen from 24% to 0.6% between 1993 and 2020, while PIP’s survey-based estimates suggest it fell from 63% to 0.1% (Figure 8). This is partly explained by China’s poverty rates not being comparable before and after 2012. The non-comparable (survey-based) poverty rate fell from 8.5% to 2.9% from 2012 to 2013, while the predicted (comparable) poverty rate only fell from 1.5% to 1.3%. This suggests that some of the discrepancies between the predicted and actual poverty rates might be due to the actual poverty rates not being fully comparable within countries over time.
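The gap-filling rules in footnote 4 amount to two simple operations: borrowing a value from a designated proxy economy, and carrying the last observed classification forward. A minimal Python sketch, assuming the data are held in plain dicts and lists (the data layout, function names, and values are hypothetical), is:

```python
from typing import Dict, List, Optional

# Proxy economies for missing rural population shares, as listed in footnote 4.
RURAL_PROXY = {
    "St. Martin (French part)": "Sint Maarten (Dutch part)",
    "Kosovo": "Serbia",
    "Taiwan, China": "China",
}


def fill_rural(shares: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Replace a missing rural share with the share of the designated proxy economy."""
    filled = dict(shares)
    for economy, proxy in RURAL_PROXY.items():
        if economy in filled and filled[economy] is None:
            filled[economy] = shares.get(proxy)
    return filled


def carry_forward(series: List[Optional[str]]) -> List[Optional[str]]:
    """Forward-fill a yearly series, e.g. an income-group classification."""
    out: List[Optional[str]] = []
    last: Optional[str] = None
    for value in series:
        if value is not None:
            last = value
        out.append(last)
    return out


# Made-up shares (percent) for illustration only.
print(fill_rural({"Kosovo": None, "Serbia": 43.0}))  # {'Kosovo': 43.0, 'Serbia': 43.0}
print(carry_forward(["UMC", None, None]))  # ['UMC', 'UMC', 'UMC']
```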
Figure 8: Predicted and survey-based country poverty trends
Note: Predicted poverty rates using the two tiers and actual poverty rates from PIP. When estimates from PIP are not connected with a line, the two estimates are not comparable to each other. Only survey-year observations are used; the regional aggregates from PIP in Figure 7 use these survey-year observations but also interpolations and extrapolations at the country level.

In India, many different extreme poverty rates exist depending on the measure of consumption used and how one extrapolates from the most recent survey (Tarozzi 2007). At the time of this analysis, PIP used a consumption aggregate based on a uniform recall period and modeled poverty after 2011-12 based on estimates from Roy and Van der Weide (2025). In 2004, the predicted poverty rate is 23%, while the one from PIP based on the uniform recall period is 40%. If the modified mixed recall period were used for 2004-05 instead, PIP would have a poverty rate of 28%, much more in line with the predictions from the model. Again, this suggests that some of the differences between the predicted and survey-based poverty rates can be explained by particularities of how poverty is measured in a country.

Figure 8 shows the predicted and survey-based poverty rates for other populous, or poor and data-deprived, countries. In most cases, the two are fairly well aligned, though there are exceptions, such as China and India, just discussed, as well as the República Bolivariana de Venezuela, where the predicted poverty rates are much lower than the survey-based ones. Figures A.3–A.11 in Appendix A show similar plots for the 218 economies.

PIP currently assigns countries without any poverty estimate (or without national accounts data to extrapolate or interpolate from an old poverty estimate) the regional poverty rate. This concerns about 3% of the global population.
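The difference between PIP’s current treatment and the model-based alternative described in Appendix B comes down to which rate is multiplied by the population of an uncovered country. The Python sketch below illustrates this with a single toy region; the record fields, function names, and numbers are made up, and it ignores the year-by-year extrapolation that PIP performs for covered countries.

```python
from typing import Any, Dict, List

# Hypothetical country records: population (millions), region, survey-based
# poverty rate (None if the country has no usable survey or extrapolation),
# and the rate predicted by the two-tier model.
Country = Dict[str, Any]


def regional_average(countries: List[Country], region: str) -> float:
    """Population-weighted poverty rate of the region's countries with survey data."""
    covered = [c for c in countries if c["region"] == region and c["survey_rate"] is not None]
    population = sum(c["pop"] for c in covered)
    return sum(c["pop"] * c["survey_rate"] for c in covered) / population


def poor_millions(countries: List[Country], use_model: bool) -> float:
    """Number of poor: survey rates where available; otherwise the model prediction
    (use_model=True) or the regional average (use_model=False)."""
    total = 0.0
    for c in countries:
        if c["survey_rate"] is not None:
            rate = c["survey_rate"]
        elif use_model:
            rate = c["model_rate"]
        else:
            rate = regional_average(countries, c["region"])
        total += c["pop"] * rate
    return total


toy = [
    {"region": "R", "pop": 100.0, "survey_rate": 0.10, "model_rate": 0.12},
    {"region": "R", "pop": 50.0, "survey_rate": 0.04, "model_rate": 0.05},
    {"region": "R", "pop": 30.0, "survey_rate": None, "model_rate": 0.50},
]
revision = poor_millions(toy, use_model=True) - poor_millions(toy, use_model=False)
print(round(revision, 1))  # 12.6
```

When an uncovered country is much poorer than its covered neighbors, as in the toy region, the regional average understates its poverty and the model-based estimate revises the count upward.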
In Appendix B, we show how PIP’s regional and global poverty rates would change if we used the estimates from our two tiers instead of applying the regional average, as PIP currently does. Global poverty in 2019 would be revised upward by 11 million at the $2.15 line and by 16 million at the $6.85 line. For certain countries, the implications are larger, notably Afghanistan, Somalia, and the Democratic People’s Republic of Korea.

7 Conclusion

This paper proposes a method to estimate welfare distributions in contexts where little data is available. Data deprivation is often a bigger concern in poorer and more fragile countries, where data collection might be limited by active conflict, lack of resources, or institutional fragility, but where efforts to monitor changes in living conditions are essential. While comprehensive surveys of household income and consumption remain the best way to measure household welfare, we offer a simple alternative to estimate welfare distributions when surveys are not available.

The method leverages income or consumption distributions from more than 2,000 household surveys available in the World Bank’s Poverty and Inequality Platform (PIP) for 168 countries, covering the period from 1991 onward. We combine these distributions with more than 1,000 predictors at the country-year level, including remote sensing variables, from various databases. We develop a simple model that predicts distributions for all countries in the world from a regression on country-level data. Guided by variable-selection techniques, we try various versions of such regressions. For each version, we sequentially exclude all surveys for one of the 168 countries in our dataset and predict the distributions of the excluded country using the remaining 167. We then calculate the absolute deviation between the true and predicted welfare at 99 points on each distribution.
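The leave-one-country-out evaluation described above can be sketched in Python as follows. The `fit` and `predict` callables are hypothetical placeholders for whichever regression is being evaluated, the toy model simply averages the training curves, and the factor of 100 is an assumption made so that the output is on the same scale as the errors reported in this paper (for example 26.7). Holding out all surveys of a country at once, rather than single surveys, prevents a country’s own earlier or later surveys from informing its prediction.

```python
from typing import Any, Callable, Dict, List

# One survey: its country and observed log welfare at p = 0.01, ..., 0.99.
Survey = Dict[str, Any]


def loco_error(
    surveys: List[Survey],
    fit: Callable[[List[Survey]], Any],
    predict: Callable[[Any, Survey], List[float]],
) -> float:
    """Leave-one-country-out mean absolute deviation of log welfare, times 100.

    For each country, refit on all other countries' surveys, predict that
    country's 99 log quantiles, and pool the absolute deviations.
    """
    deviations: List[float] = []
    for country in sorted({s["country"] for s in surveys}):
        train = [s for s in surveys if s["country"] != country]
        held_out = [s for s in surveys if s["country"] == country]
        model = fit(train)  # no survey from the held-out country is used here
        for s in held_out:
            pred = predict(model, s)
            deviations.extend(abs(obs - hat) for obs, hat in zip(s["log_q"], pred))
    return 100.0 * sum(deviations) / len(deviations)


# Toy "model": the average log quantile curve of the training surveys.
def fit_mean(train: List[Survey]) -> List[float]:
    return [sum(s["log_q"][i] for s in train) / len(train) for i in range(99)]


def predict_mean(model: List[float], survey: Survey) -> List[float]:
    return model


toy = [{"country": c, "log_q": [v] * 99} for c, v in (("A", 1.0), ("B", 2.0), ("C", 3.0))]
print(loco_error(toy, fit_mean, predict_mean))  # 100.0
```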
Once we have these deviations for all surveys in the 168 countries, we calculate the mean absolute deviation across all surveys and all countries. We find that when predicting distributions with GDP per capita (or income groups if GDP per capita is unavailable), under-5 mortality, life expectancy, and the rural population share, no additional feature significantly reduces the error further. Our preferred model’s predictions of welfare are off by around 30% on average. Though this may sound high, we show that a random forest using all 1,000+ predictors is unable to reduce the error, suggesting that much of the remaining error is likely to be noise.

We demonstrate how to apply our model to predict poverty rates in a given country, and we apply our preferred model to global poverty measurement, comparing it to the World Bank’s poverty estimates. We show that the model generally tracks the official poverty estimates well, with notable exceptions. Where there are exceptions, these may in part be explained by deficiencies in the survey methodology of a particular country, and not just by shortcomings of our model.

References

Alkema, Leontine, and Jin Rou New. 2014. “Global Estimation of Child Mortality Using a Bayesian B-spline Bias-Reduction Model.” The Annals of Applied Statistics: 2122–2149.
Angrist, Noam, Pinelopi Koujianou Goldberg, and Dean Jolliffe. 2021. “Why Is Growth in Developing Countries So Hard to Measure?” Journal of Economic Perspectives 35(3): 215–242.
Bergstrom, Katy. 2022. “The Role of Income Inequality for Poverty Reduction.” World Bank Economic Review 36(3): 583–604.
Bolt, Jutta, and Jan Luiten Van Zanden. 2024. “Maddison-Style Estimates of the Evolution of the World Economy: A New 2023 Update.” Journal of Economic Surveys.
Bresson, Florent. 2009. “On the Estimation of Growth and Inequality Elasticities of Poverty with Grouped Data.” Review of Income and Wealth 55(2): 266–302.
Castaneda Aguilar, R. Andres. 2022. “Pip: Stata Module to Access World Bank’s Global Poverty and Inequality Data (Version 0.3.8).” STATA. https://worldbank.github.io/pip/.
Chen, Yi-Ting. 2018. “A Unified Approach to Estimating and Testing Income Distributions with Grouped Data.” Journal of Business & Economic Statistics 36(3): 438–455.
Chotikapanich, Duangkamon, William E. Griffiths, D. S. Prasada Rao, and Vicar Valencia. 2012. “Global Income Distributions and Inequality, 1993 and 2000: Incorporating Country-Level Inequality Modeled with Beta Distributions.” Review of Economics and Statistics 94(1): 52–73.
Christiaensen, Luc, Peter Lanjouw, Jill Luoto, and David Stifel. 2012. “Small Area Estimation-Based Prediction Methods to Track Poverty: Validation and Applications.” The Journal of Economic Inequality 10(2): 267–297.
Dang, Hai-Anh, Dean Jolliffe, and Calogero Carletto. 2019. “Data Gaps, Data Incomparability, and Data Imputation: A Review of Poverty Measurement Methods for Data-Scarce Environments.” Journal of Economic Surveys 33(3): 757–797.
Datt, Gaurav, Valerie Kozel, and Martin Ravallion. 2003. “A Model-Based Assessment of India’s Progress in Reducing Poverty in the 1990s.” Economic and Political Weekly: 355–361.
Datt, Gaurav, and Martin Ravallion. 2002. “Is India’s Economic Growth Leaving the Poor Behind?” Journal of Economic Perspectives 16(3): 89–108.
Deaton, Angus. 2005. “Measuring Poverty in a Growing World (or Measuring Growth in a Poor World).” The Review of Economics and Statistics 87(1): 1–19.
Deaton, Angus, and Paul Schreyer. 2022. “GDP, Wellbeing, and Health: Thoughts on the 2017 Round of the International Comparison Program.” Review of Income and Wealth 68(1): 1–15.
Decerf, Benoit, and Mery Ferrando. 2022. “Unambiguous Trends Combining Absolute and Relative Income Poverty: New Results and Global Application.” World Bank Economic Review 36(3): 605–628.
Eckernkemper, Tobias, and Bastian Gribisch. 2021. “Classical and Bayesian Inference for Income Distributions Using Grouped Data.” Oxford Bulletin of Economics and Statistics 83(1): 32–65.
Ekhator-Mobayode, Uche E., and Johannes Hoogeveen. 2022. “Microdata Collection and Openness in the Middle East and North Africa.” Data & Policy 4: e31.
Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw. 2003. “Micro-Level Estimation of Poverty and Inequality.” Econometrica 71(1): 355–364.
Engstrom, Ryan, Jonathan Hersh, and David Newhouse. 2022. “Poverty from Space: Using High Resolution Satellite Imagery for Estimating Economic Well-Being.” World Bank Economic Review 36(2): 382–412.
Filmer, Deon, and Lant H. Pritchett. 2001. “Estimating Wealth Effects without Expenditure Data—or Tears: An Application to Educational Enrollments in States of India.” Demography 38(1): 115–132.
Gortan, Marco, Lorenzo Testa, Giorgio Fagiolo, and Francesco Lamperti. 2023. “A Unified Repository for Pre-Processed Climate Data Weighted by Gridded Economic Activity.” Scientific Data 11: 533.
Haddad, Cameron Nadim, Daniel Gerszon Mahler, Carolina Diaz-Bonilla, Ruth Hill, Christoph Lakner, and Gabriel Lara Ibarra. 2024. “The World Bank’s New Inequality Indicator: The Number of Countries with High Inequality.” Policy Research Working Paper Series 10796.
Hajargasht, Gholamreza, William E. Griffiths, Joseph Brice, D. S. Prasada Rao, and Duangkamon Chotikapanich. 2012. “Inference for Income Distributions Using Grouped Data.” Journal of Business & Economic Statistics 30(4): 563–575.
Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15: 651–674.
Jolliffe, Dean, and Espen Beer Prydz. 2021. “Societal Poverty: A Relative and Relevant Measure.” World Bank Economic Review 35(1): 180–206.
Jolliffe, Dean Mitchell, Daniel Gerszon Mahler, Christoph Lakner, Aziz Atamanov, and Samuel Kofi Tetteh Baah. 2024. “Assessing the Impact of the 2017 PPPs on the International Poverty Line and Global Poverty.” World Bank Economic Review, lhae035.
Jorda, Vanesa, and Miguel Niño-Zarazúa. 2019. “Global Inequality: How Large Is the Effect of Top Incomes?” World Development 123: 104593.
Jorda, Vanesa, Miguel Niño-Zarazúa, Laurence Roope, and Finn Tarp. 2023. “Global Income Polarization: Relative and Absolute Perspectives.” WIDER Working Paper 146.
Kanbur, Ravi, Eduardo Ortiz-Juarez, and Andy Sumner. 2022. “The Global Inequality Boomerang.” WIDER Working Paper 27.
Kraay, Aart, and Roy Van der Weide. 2022. “Measuring Intragenerational Mobility Using Aggregate Data.” Journal of Economic Growth 27(2): 273–314.
Kraay, Aart, Christoph Lakner, Berk Ozler, Benoit Decerf, Dean Jolliffe, Olivier Sterck, and Nishant Yonzan. 2023. “A New Distribution Sensitive Index for Measuring Welfare, Poverty, and Inequality.” Policy Research Working Paper 10470. Washington, DC: World Bank Group.
Kuznets, Simon. 1955. “Economic Growth and Income Inequality.” American Economic Review 45(1): 1–28.
Lakner, Christoph, Daniel Gerszon Mahler, Mario Negre, and Espen Beer Prydz. 2022. “How Much Does Reducing Inequality Matter for Global Poverty?” Journal of Economic Inequality 20(3): 559–585.
Lee, Kamwoo, and Jeanine Braithwaite. 2022. “High-Resolution Poverty Maps in Sub-Saharan Africa.” World Development 159: 106028.
Mahler, Daniel Gerszon, R. Andrés Castañeda Aguilar, and David Newhouse. 2022. “Nowcasting Global Poverty.” World Bank Economic Review 36(4): 835–856.
Mahler, Daniel Gerszon, Nishant Yonzan, and Christoph Lakner. 2022. “The Impact of COVID-19 on Global Inequality and Poverty.” Policy Research Working Paper 10198. Washington, DC: World Bank Group.
Martinez, Luis R. 2022. “How Much Should We Trust the Dictator’s GDP Growth Estimates?” Journal of Political Economy 130(10): 2731–2769.
Nicoletti, Cheti, Franco Peracchi, and Francesca Foliano. 2011. “Estimating Income Poverty in the Presence of Missing Data and Measurement Error.” Journal of Business & Economic Statistics 29(1): 61–72.
Pinkovskiy, Maxim, and Xavier Sala-i-Martin. 2016. “Lights, Camera … Income! Illuminating the National Accounts-Household Surveys Debate.” The Quarterly Journal of Economics 131(2): 579–631.
Pokhriyal, Neeti, and Damien Christophe Jacques. 2017. “Combining Disparate Data Sources for Improved Poverty Prediction and Mapping.” Proceedings of the National Academy of Sciences 114(46): E9783–E9792.
Prydz, Espen Beer, Dean Jolliffe, and Umar Serajuddin. 2022. “Disparities in Assessments of Living Standards Using National Accounts and Household Surveys.” Review of Income and Wealth 68: S385–S420.
Roberts, David R., Volker Bahn, Simone Ciuti, Mark S. Boyce, Jane Elith, Gurutzeta Guillera-Arroita, and Severin Hauenstein. 2017. “Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure.” Ecography 40(8): 913–929.
Roy, Sutirtha, and Roy Van der Weide. 2025. “Estimating Poverty for India after 2011 Using Private-Sector Survey Data.” Journal of Development Economics 172: 103386.
Sherwood, B., and A. Maidman. 2017. “rqPen: Penalized Quantile Regression.” R package version 2.
Soergel, Bjoern, Elmar Kriegler, Benjamin Leon Bodirsky, Nico Bauer, Marian Leimbach, and Alexander Popp. 2021. “Combining Ambitious Climate Policies with Efforts to Eradicate Poverty.” Nature Communications 12(1): 2342.
Stifel, David, and Luc Christiaensen. 2007. “Tracking Poverty over Time in the Absence of Comparable Consumption Data.” The World Bank Economic Review 21(2): 317–341.
Tarozzi, Alessandro. 2007. “Calculating Comparable Statistics from Incomparable Surveys, with an Application to Poverty in India.” Journal of Business and Economic Statistics 25(3): 314–336.
UN DESA. 2016. “Transforming Our World: The 2030 Agenda for Sustainable Development.”
United Nations. 2018. “World Urbanization Prospects: The 2018 Revision.” Department of Economic and Social Affairs, Population Division. New York: United Nations.
United Nations. 2022. “World Population Prospects 2022: Methodology of the United Nations Population Estimates and Projections.” Department of Economic and Social Affairs, Population Division. UN DESA/POP/2022/TR/NO. 4.
Van der Weide, Roy, Brian Blankespoor, Chris Elbers, and Peter Lanjouw. 2022. “How Accurate Is a Poverty Map Based on Remote Sensing Data? An Application to Malawi.” Journal of Development Economics 171: 103352.

Appendix A: Additional results

Table A.1: Countries with at most three poverty estimates in PIP
Algeria; Angola; Cabo Verde; Central African Republic; Chad; Comoros; Congo, Rep.; Congo, Dem. Rep.; Gabon; Guyana; Haiti; Iraq; Japan; Kiribati; Lebanon; Lesotho; Liberia; Marshall Islands; Mauritius; Micronesia; Myanmar; Nauru; Nepal; Papua New Guinea; Samoa; São Tomé and Príncipe; Sierra Leone; Solomon Islands; South Sudan; St. Lucia; Sudan; Suriname; Syrian Arab Republic; Timor-Leste; Tonga; Trinidad and Tobago; Turkmenistan; Tuvalu; United Arab Emirates; Vanuatu; Yemen, Rep.; Zimbabwe

Figure A.1: Alternative distributional approaches vis-à-vis baseline approach
Note: Error of alternative approaches minus the error of our baseline approach. ‘Indirect’ refers to estimates from first predicting the median and Gini and then inferring a full log-logistic distribution. ‘OLS’ refers to using an OLS regression rather than a quantile regression. ‘Log-normal’ refers to assuming that the distributions are log-normal rather than log-logistic.

Figure A.2: Variables important for predicting median and Gini. Panels: (a) Median; (b) Gini.
Note: The figure shows the ten most important predictors of the median and Gini out of the 1,000+ variables featuring in our models. The most relevant predictor is scaled to 1.

Figure A.3: Predicted and actual poverty rates (1 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.
Figure A.4: Predicted and actual poverty rates (2 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.5: Predicted and actual poverty rates (3 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.6: Predicted and actual poverty rates (4 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.7: Predicted and actual poverty rates (5 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.8: Predicted and actual poverty rates (6 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.9: Predicted and actual poverty rates (7 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.10: Predicted and actual poverty rates (8 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Figure A.11: Predicted and actual poverty rates (9 of 9)
Note: When poverty rates from PIP are disconnected, estimates are not comparable.

Appendix B: Application to the World Bank’s global poverty and inequality measures

The method we developed lends itself to use in the World Bank’s global poverty measures for countries that do not have any survey data. When regional and global poverty headcounts are calculated, countries without a household survey at any point in time, or without the national accounts data needed for year-to-year extrapolations, are currently assigned the population-weighted average poverty rate of their region. This affects 55 economies (Table B.1).⁵ A few countries, such as Afghanistan, Saudi Arabia, the República Bolivariana de Venezuela, and the Democratic People’s Republic of Korea, account for the largest share of the population without data; across all missing economies, this population adds up to more than 200 million people (Table B.2).
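The current imputation rule can be sketched in a few lines of Python. This is a minimal illustration, not PIP's code: it assumes a single year of data, and the field names (`country`, `region`, `income_group`, `population`, `poverty_rate`) are hypothetical. Because Section B.1 below compares this rule across alternative groupings (PIP region, World Bank region, income group), the sketch takes the grouping as a parameter and adds a leave-one-country-out check that uses the mean absolute error in poverty rates as a stand-in for the paper's own error metric.

```python
from collections import defaultdict


def group_weighted_means(rows, group_key, exclude=None):
    """Population-weighted mean poverty rate per group, using only rows with
    an observed rate and optionally leaving out one country."""
    totals, pops = defaultdict(float), defaultdict(float)
    for r in rows:
        if r["poverty_rate"] is None or r["country"] == exclude:
            continue
        totals[r[group_key]] += r["population"] * r["poverty_rate"]
        pops[r[group_key]] += r["population"]
    return {g: totals[g] / pops[g] for g in pops if pops[g] > 0}


def impute_missing(rows, group_key="region"):
    """Give each country without an estimate its group's weighted mean
    (None if no country in the group has an estimate)."""
    means = group_weighted_means(rows, group_key)
    return {
        r["country"]: r["poverty_rate"] if r["poverty_rate"] is not None
        else means.get(r[group_key])
        for r in rows
    }


def loco_mae(rows, group_key):
    """Leave-one-country-out mean absolute error of the group-average rule,
    evaluated on countries that do have an estimate."""
    errors = []
    for r in rows:
        if r["poverty_rate"] is None:
            continue
        means = group_weighted_means(rows, group_key, exclude=r["country"])
        if r[group_key] in means:  # skip countries that are alone in their group
            errors.append(abs(means[r[group_key]] - r["poverty_rate"]))
    return sum(errors) / len(errors)


# Hypothetical example: country E has no estimate.
rows = [
    {"country": "A", "region": "R1", "income_group": "L", "population": 10.0, "poverty_rate": 0.2},
    {"country": "B", "region": "R1", "income_group": "L", "population": 10.0, "poverty_rate": 0.4},
    {"country": "C", "region": "R1", "income_group": "H", "population": 10.0, "poverty_rate": 0.6},
    {"country": "D", "region": "R2", "income_group": "H", "population": 10.0, "poverty_rate": 0.5},
    {"country": "E", "region": "R1", "income_group": "L", "population": 20.0, "poverty_rate": None},
]
print(impute_missing(rows, "region")["E"], impute_missing(rows, "income_group")["E"])
print(loco_mae(rows, "region"), loco_mae(rows, "income_group"))
```

Swapping `group_key` is all it takes to move from regional to income-group averages, which mirrors the comparison in Figure B.1.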
Table B.1: Economies without household survey data at any point in time in PIP
Afghanistan; American Samoa; Andorra; Antigua and Barbuda; Aruba; Bahamas, The; Bahrain; Barbados; Bermuda; British Virgin Islands; Brunei Darussalam; Cambodia; Cayman Islands; Channel Islands; Cuba; Curaçao; Dominica; Equatorial Guinea; Eritrea; Faeroe Islands; French Polynesia; Gibraltar; Greenland; Grenada; Guam; Hong Kong SAR, China; Isle of Man; Korea, Dem. People’s Rep.; Kosovo; Kuwait; Libya; Liechtenstein; Macao SAR, China; Monaco; Nauru; New Caledonia; New Zealand; Northern Mariana Islands; Oman; Palau; Puerto Rico; Qatar; San Marino; Saudi Arabia; Singapore; Sint Maarten (Dutch part); Somalia; South Sudan; St. Kitts and Nevis; St. Martin (French part); St. Vincent and the Grenadines; Timor-Leste; Turks and Caicos Islands; Venezuela, RB; Virgin Islands (U.S.)

Table B.2: Missing economies ranked by population in 2019
Economy | Population in 2019 (millions)
Afghanistan | 38
Saudi Arabia | 36
Venezuela, RB | 29
Korea, Dem. People’s Rep. | 26
Cambodia | 16
Somalia | 16
Cuba | 11
South Sudan | 10
Hong Kong SAR, China | 8
Libya | 7
Other economies | 40
Total missing | 205

⁵ Table B.1 includes economies that are missing in all years (e.g., the Democratic People’s Republic of Korea) as well as economies that are missing only in some years (e.g., the República Bolivariana de Venezuela). The latter occurs when the national accounts data needed for the extrapolations are unavailable in some years.

Though the World Bank often focuses on poverty at specific poverty lines ($2.15/day and $6.85/day), PIP allows users to query any poverty line. Therefore, any method used for countries without data needs to predict a poverty rate at any poverty line and hence, in practice, recover a full distribution. In addition, full distributions are needed to extrapolate and nowcast poverty and to calculate inequality measures such as the prosperity gap (Kraay et al. 2023). This makes our method particularly suitable.
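To illustrate why a full distribution answers queries at any poverty line, the sketch below assumes welfare follows a log-logistic distribution, the parametric family that the note to Figure A.1 contrasts with a log-normal alternative. It parameterizes the distribution by its median and Gini, as in the ‘Indirect’ approach of Figure A.1, using the standard property that a log-logistic distribution with shape β > 1 has a Gini index of 1/β. This is an illustration under that assumption, not the paper's quantile-regression estimator, and the median and Gini values in the usage lines are hypothetical.

```python
def loglogistic_headcount(z, median, gini):
    """Share of the population below poverty line z, assuming welfare is
    log-logistic with scale = median and shape = 1 / gini.

    Relies on the log-logistic Gini index being 1 / shape, which requires
    0 < gini < 1.
    """
    if not 0.0 < gini < 1.0:
        raise ValueError("gini must lie strictly between 0 and 1")
    if z <= 0:
        return 0.0
    return 1.0 / (1.0 + (z / median) ** (-1.0 / gini))


def loglogistic_quantile(p, median, gini):
    """Welfare level at population share p (0 < p < 1); inverse of the CDF above."""
    return median * (p / (1.0 - p)) ** gini


# Hypothetical country: median of $5.00/day and a Gini of 0.40.
for line in (2.15, 6.85, 10.0):
    share = loglogistic_headcount(line, median=5.0, gini=0.40)
    print(f"${line:.2f}/day: {100 * share:.1f}% poor")
```

Once the two parameters are known, any poverty line (or any quantile) can be evaluated without re-estimating anything, which is the property PIP's user-defined poverty lines require.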
In this appendix, we first show how our model performs relative to the current practice of using regional averages. Next, we compare it with a model that directly predicts poverty rates at the global poverty lines of $2.15 and $6.85 per person per day, to see how much is lost by predicting a full distribution. Third, we explain how the model relates to PIP’s current extrapolation rule. Finally, we show the implications of adopting the model for global and regional poverty and inequality measures.

B.1 Accuracy compared to PIP’s current method

Figure B.1 compares the errors of PIP’s current method with those of the two tiers, for the full sample and for data-deprived countries. We also show the errors of two similar methods: using the average by World Bank region instead of PIP region, and using the income group average. On the full sample, the error of PIP’s method is 18% higher than that of tier 2 and 33% higher than that of tier 1. PIP’s method performs better than using income groups or World Bank regions on the full sample, but for data-deprived countries, income groups outperform PIP regions. This suggests that data-deprived countries resemble their income group more closely than their region. On the data-deprived sample, the error of PIP’s method is 23% and 39% higher than that of tier 2 and tier 1, respectively.

Figure B.1: Accuracy of PIP’s method compared to two tiers
Note: Prediction error of the two tiers, PIP’s current method (“PIP region”), and PIP’s current method applied to two alternative groupings.

B.2 Accuracy compared to model predicting poverty rates directly

One concern with the proposed tiers is that their focus on predicting full distributions may make them less accurate than models that predict a particular poverty rate.
Given that most cross-country poverty comparisons use the international poverty line or another fixed poverty line, such as the typical poverty line of upper-middle-income countries ($6.85), this focus could come at some cost to the accuracy of such work. Figure B.2 shows the errors of our two models when they are evaluated on how well they predict poverty rates at the $2.15 or $6.85 line. The poverty rate implied by each predicted distribution is compared with the true poverty rate at that line, and the error is measured as the mean absolute deviation in poverty rates. We compare these errors with those from fractional logit regressions, where the left-hand-side variable is the true poverty rate and the right-hand-side covariates are the ones used for the two tiers. Fractional logit regressions generalize logit regressions to a left-hand-side variable that can take any value in the unit interval. In all cases, we still evaluate errors using leave-one-country-out cross-validation, and we show the errors for both the full sample and the data-deprived sample.

Figure B.2: Accuracy of proposed method on specific poverty lines
Note: Errors in predicted poverty rates using our two-tier distributional models (“Distributions”) and errors from predicting the poverty rates directly without recovering full distributions (“Direct poverty rate”).

The error of 6.7 for the distributional approach on the full sample at the $2.15 line means that, on average, our model’s predicted $2.15 poverty rates are 6.7 percentage points off. Unsurprisingly, the errors are larger for tier 2 and for the data-deprived sample. Somewhat surprisingly, the errors are almost always higher when the poverty rates are predicted directly through fractional logit regressions. This suggests that little is lost by predicting full distributions; in fact, the error might even be reduced.
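The direct benchmark can be sketched in dependency-free Python. This is a minimal sketch, not the authors' code: it fits a fractional logit in the Papke and Wooldridge sense (a logistic conditional mean estimated by maximizing the Bernoulli quasi-log-likelihood) with Newton-Raphson, and scores it with leave-one-country-out cross-validation, holding out all observations of a country together. Pooling absolute errors across observations and the toy data are assumptions made for illustration.

```python
import math


def logistic(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_fractional_logit(X, y, max_iter=100, tol=1e-10):
    """Maximize sum(y*log(p) + (1-y)*log(1-p)) with p = logistic(x'b) by
    Newton-Raphson. Rows of X must include a constant term; y lies in [0, 1]."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX with weights p(1-p)
        for xi, yi in zip(X, y):
            p = logistic(sum(bj * xj for bj, xj in zip(b, xi)))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for j2 in range(k):
                    info[j][j2] += w * xi[j] * xi[j2]
        step = solve_linear(info, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def predict(b, xi):
    return logistic(sum(bj * xj for bj, xj in zip(b, xi)))


def loco_mad_pp(countries, X, y):
    """Leave-one-country-out mean absolute deviation, in percentage points."""
    errors = []
    for c in sorted(set(countries)):
        train = [i for i, ci in enumerate(countries) if ci != c]
        test = [i for i, ci in enumerate(countries) if ci == c]
        b = fit_fractional_logit([X[i] for i in train], [y[i] for i in train])
        errors.extend(100.0 * abs(predict(b, X[i]) - y[i]) for i in test)
    return sum(errors) / len(errors)


# Hypothetical noiseless data: constant plus one covariate, one row per country.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
X = [[1.0, x] for x in xs]
y = [logistic(-1.0 + 2.0 * x) for x in xs]
countries = ["A", "B", "C", "D", "E", "F"]
b_hat = fit_fractional_logit(X, y)   # recovers roughly [-1.0, 2.0]
mad = loco_mad_pp(countries, X, y)   # near zero because the data are noiseless
print(b_hat, mad)
```

In applied work a GLM routine with a binomial family and robust standard errors would usually replace the hand-written Newton loop; it is spelled out here only to keep the example self-contained.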
It is important to note, though, that the choice of variables in the prediction model is held constant in these comparisons. Had we focused exclusively on predicting poverty at the $2.15 line, other covariates would likely have been chosen, which could have led a model predicting $2.15 poverty rates directly to perform better than shown above.

B.3 Consistency with PIP’s extrapolation rule

PIP currently extrapolates old welfare vectors forward in time by assuming that 1% growth in GDP per capita or HFCE per capita leads to 1% growth in welfare across the distribution when income is used, and 0.7% growth when consumption is used. The coefficients from the tier 1 regression (Table 1) suggest that, all else equal, when GDP per capita grows by 1%, the predicted median grows by 0.393% for consumption vectors and 0.785% (0.393 + 0.392) for income vectors. Yet this ignores that GDP growth is likely to also affect welfare indirectly, through lower under-5 mortality, a lower rural population share, and higher life expectancy (the other three variables). Regressing changes in these variables on growth in GDP per capita suggests that 1% growth in GDP per capita is associated with a 0.82% decline in under-5 mortality, a 0.15 percentage point decline in the rural population share, and a 0.06 increase in life expectancy. Taking these indirect channels into account, our estimates suggest that 1% growth in GDP per capita increases consumption vectors by 0.69% and income vectors by 1.08%, which is very close to PIP’s extrapolation rule of 0.7% and 1%, respectively.

B.4 Implications for global poverty and inequality measurement

Finally, we use our preferred models to predict poverty rates at the international poverty line ($2.15/day) and the upper-middle-income poverty line ($6.85/day) for countries that are missing from PIP.
Specifically, we replace missing headcounts for 1,721 country-year observations with predictions from our tier 1 model when GDP per capita is available (1,092 country-year observations) and from our tier 2 model in the remaining cases (629 observations). Implementing our proposed method increases the global poverty rate in 2019 by about 0.14 percentage points at the $2.15 line and 0.21 percentage points at the $6.85 line, equivalent to an additional 11 million and 16 million poor people globally, respectively (Figure B.3). There is a similarly small estimated increase in global inequality, with the global Gini coefficient estimated at 61.9 instead of 61.8 in 2019. For both global poverty and global inequality, the 1991-2019 trend looks strikingly similar with or without our two tiers, because countries without data make up only 3% of the global population.

The differences are slightly larger at the regional level: the extreme poverty rate increases by 7% in East Asia & Pacific and by 4% in both Latin America & the Caribbean and South Asia (Figure B.4). These increases are driven by populous countries without data for which the two tiers predict higher poverty rates than the regional average. In Afghanistan, the extreme poverty rate in 2019 is 20 percentage points higher (equivalent to 7 million additional poor people); the differences are also notable for Somalia (14 percentage points, 2 million additional poor people), the República Bolivariana de Venezuela (5 percentage points, 1.5 million), Cambodia (7 percentage points, 1.1 million), and the Democratic People’s Republic of Korea (3 percentage points, 0.7 million). These findings suggest that the current method underestimates poverty in data-deprived countries, which tend to be poorer than other countries in their regions. Figure B.5 shows country examples of the poverty rates from the current method and from our proposal.
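The replacement step and its effect on an aggregate headcount can be sketched as follows. This is a simplified illustration for a single year and a single poverty line: the field names (`gdp_pc`, `tier1`, `tier2`, `regional_avg`) are hypothetical, and the tier predictions are taken as given rather than estimated. The only rule taken from the text is the fallback: tier 1 whenever GDP per capita is available, tier 2 otherwise.

```python
def fill_missing(observed, missing):
    """Combine observed rates with tier predictions for missing countries.

    observed: dicts with 'population' and 'rate' (poverty rate as a fraction).
    missing:  dicts with 'population', 'gdp_pc' (None if unavailable),
              and predicted rates 'tier1' and 'tier2'.
    """
    rows = [{"population": o["population"], "rate": o["rate"]} for o in observed]
    counts = {"tier1": 0, "tier2": 0}
    for m in missing:
        tier = "tier1" if m["gdp_pc"] is not None else "tier2"
        counts[tier] += 1
        rows.append({"population": m["population"], "rate": m[tier]})
    return rows, counts


def headcount(rows):
    """Population-weighted poverty rate and number of poor people."""
    pop = sum(r["population"] for r in rows)
    poor = sum(r["population"] * r["rate"] for r in rows)
    return poor / pop, poor


def compare_with_regional_average(observed, missing):
    """Change in the aggregate rate (percentage points) and in the number of
    poor when missing countries get tier predictions instead of
    their regional average ('regional_avg')."""
    baseline = [{"population": o["population"], "rate": o["rate"]} for o in observed]
    baseline += [{"population": m["population"], "rate": m["regional_avg"]} for m in missing]
    filled, counts = fill_missing(observed, missing)
    rate_old, poor_old = headcount(baseline)
    rate_new, poor_new = headcount(filled)
    return 100.0 * (rate_new - rate_old), poor_new - poor_old, counts


# Hypothetical example (populations in millions).
observed = [{"population": 100.0, "rate": 0.1}]
missing = [
    {"population": 50.0, "gdp_pc": 1000.0, "tier1": 0.3, "tier2": 0.5, "regional_avg": 0.1},
    {"population": 50.0, "gdp_pc": None, "tier1": None, "tier2": 0.2, "regional_avg": 0.1},
]
delta_pp, extra_poor, counts = compare_with_regional_average(observed, missing)
print(delta_pp, extra_poor, counts)
```

As a rough cross-check on the magnitudes in the text: with a 2019 world population of roughly 7.7 billion, an increase of 0.14 percentage points corresponds to about 11 million people, and 0.21 points to about 16 million.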
Figure B.3: Global poverty and inequality. Panels: (a) Global poverty rate, 1991-2019; (b) Global inequality, 1991-2019.
Note: Panel (a) compares the poverty rates in PIP (version 20230919_2017_01_02_PROD) with estimates that use the models presented in this paper for countries without a poverty estimate in PIP (dashed lines). The two nearly perfectly overlap. Panel (b) does the same for the global Gini, using estimates from Mahler et al. (2022), updated to reflect PIP version 20230919_2017_01_02_PROD.

Figure B.4: Regional poverty in 2019
Note: Compares regional poverty rates in 2019 using PIP’s current method to deal with missing countries (“From PIP”) and the proposed models from this paper (“Predictions”).

Figure B.5: Country examples comparing current method to proposed alternative
Note: Compares country-level poverty rates for countries without any data in PIP using the regional average poverty rate (“From PIP”) and the proposed models from this paper (“Predictions”). The country predictions are more volatile than the ones implicitly used in PIP because the current procedure averages out country fluctuations by relying on information from the whole region. By contrast, the inputs used in the two tiers at times change drastically from year to year. For example, Somalia’s life expectancy changes from 27 years in 1992 to 51 years in 1993, Cuba moves from lower-middle-income to upper-middle-income status in 2007, and Cambodia’s GDP per capita falls from US$1,717 to US$1,078 between 1993 and 1994.