The World Bank Economic Review, 36(4), 2022, 835–856
https://doi.org/10.1093/wber/lhac017
Article

Downloaded from https://academic.oup.com/wber/article/36/4/835/6750020 by Sectoral Library Rm MC-C3-220 user on 10 December 2023

Nowcasting Global Poverty

Daniel Gerszon Mahler, R. Andrés Castañeda Aguilar, and David Newhouse

Abstract

This paper evaluates different methods for nowcasting country-level poverty rates, including methods that apply statistical learning to large-scale country-level data obtained from the World Development Indicators and Google Earth Engine. The methods are evaluated by withholding measured poverty rates and determining how accurately the methods predict the held-out data. A simple approach that scales the last observed welfare distribution by a fraction of real GDP per capita growth performs nearly as well as models using statistical learning on 1,000+ variables. This GDP-based approach outperforms all models that predict poverty rates directly, even when the last survey is up to five years old. The results indicate that in this context, the additional complexity introduced by applying statistical learning techniques to a large set of variables yields only marginal improvements in accuracy.

JEL classification: C53, D31, I32, O10
Keywords: poverty, nowcasting, machine learning, measurement

1. Introduction

Timely and comparable poverty estimates are vital to assess countries’ development progress. International poverty estimates serve as a public good for researchers and inform the development community on efforts to meet the first Sustainable Development Goal, to end extreme poverty by 2030. Within international development organizations, national development agencies, and NGOs, they also inform the allocation of resources and the development of strategic priorities.

Yet timely and comparable estimates of poverty are lacking for many reasons.
Daniel Gerszon Mahler (corresponding author) is with the World Bank Data Group in Washington, DC. His email address is dmahler@worldbank.org. R. Andrés Castañeda Aguilar is with the World Bank Data Group in Washington, DC. His email address is acastanedaa@worldbank.org. David Newhouse is with the World Bank Data Group in Washington, DC. His email address is dnewhouse@worldbank.org. The research for this article was supported financially by the UK government through the Data and Evidence for Tackling Extreme Poverty (DEEP) Research Programme and by the World Bank through a Research Support Budget grant. The authors thank Aart Kraay, Andres Fernando Chamorro Elizondo, Benjamin Stewart, Benu Bidani, Christoph Lakner, Dean Jolliffe, Lucas Kitzmueller, Marta Schoch, Minh Cong Nguyen, Nishant Yonzan, Nobuo Yoshida, and Samuel Kofi Tetteh Baah for insightful comments. The authors are also grateful for feedback received during the special IARIW-World Bank Conference “New Approaches to Defining and Measuring Poverty in a Growing World,” the CCS-UN Workshop “Nowcasting in International Organizations,” and the 2021 ECINEQ Conference. A supplementary online appendix is available with this article at The World Bank Economic Review website. © 2022 International Bank for Reconstruction and Development / The World Bank. Published by Oxford University Press

In some countries, fragility, conflict, and violence make it difficult to conduct household expenditure surveys altogether, while in other countries, lack of financial resources is the main obstacle. Even when surveys are frequently conducted, the time it takes to field a survey, collect, process, and analyze the data often implies a two-year lag from the time of data collection to the release of international poverty estimates. In some instances, the data are never publicly released due to quality concerns or lack of data transparency.
This was the case with India’s 2017/18 National Sample Survey (Jha 2019) and is the case regularly in the Middle East & North Africa (Ekhator-Mobayode and Hoogeveen 2021).

As a result, the latest published data on poverty often paint an outdated picture of poverty in a country. As of October 2021, on average across the developing world, the most recent survey with international poverty data was from 2014. For this reason, initiatives that reliably and cost-effectively predict timely poverty rates are crucial for informed and effective high-level decision-making.

The objective of this paper is to test various methods to estimate extreme poverty in all countries of the world as of the present year (at the time of writing, 2021) and as of the preceding year (2020). The paper refers to estimates of poverty for the present year as nowcasts and estimates of poverty for the preceding year as nearcasts. For nearcasting, one can rely on all data that are produced with a one-year time lag. For nowcasting, one can only rely on variables that themselves have been nowcasted and variables that are produced with little or no time lag, such as certain remote sensing indicators.

As possible predictors of poverty, more than 1,000 variables from the World Development Indicators, the World Economic Outlook, and the Google Earth Engine are used. All of these predictors are combined with the PovcalNet database, which contains more than 2,000 international poverty estimates covering 168 countries. The models are trained on these past estimates, essentially by pretending a subset of them do not exist, and evaluated by measuring how well they approximate the held-out estimates.

Intuitively, to predict extreme poverty around the world, one might first try models that predict poverty rates directly.
Yet this ignores that prior full distributions of consumption or income (henceforth welfare) are available for most countries. The study explores whether greater accuracy can be obtained by predicting poverty indirectly, by, for example, predicting changes in poverty from the last survey, by predicting growth in mean welfare and applying this growth to scale up the past distribution, or by predicting growth in the mean and in the Gini and applying growth incidence curves to scale and stretch the past distributions.

Findings suggest that models that predict poverty rates directly are outperformed by models that predict growth in mean welfare since the last survey and scale the last distribution by this predicted growth. Though this method assumes that inequality remains unchanged since the last survey, explicitly modeling distributional changes by predicting changes in the Gini does not help. The reason for this is that of the 1,000+ candidate variables, none of them contains notable information about changes in inequality.

The best performing method overall, which predicts growth in mean welfare using a random forest, gives a mean absolute deviation of 3.65 percentage points. This means that on average over all countries with data, the predicted poverty rate evaluated at the international poverty line of $1.90 is 3.65 percentage points from the truth. A simplified model which just uses a fraction of growth in real GDP per capita to scale up the last mean gives a mean absolute deviation of 3.69, about 1 percent worse than the overall best performing model. In other words, conditional on knowing growth in real GDP per capita since the last observed distribution, no other variable contributes significant information about the evolution of poverty rates. Even when the last survey is as much as five years old, extrapolating forward using a fraction of growth in real GDP per capita is superior to predicting poverty rates directly with 1,000+ variables.
For longer extrapolation times, power is lacking to determine which method is superior. The model works well even when GDP growth rates themselves are nowcasted. For these reasons and due to the simplicity of this approach, it is considered the preferred method in this paper.

On the one hand, the relevance of GDP growth for nowcasting poverty is not surprising; the impact of growth on poverty reduction has been well known for decades (Kraay 2006; Ferreira and Ravallion 2009). On the other hand, ample evidence has found large inconsistencies between consumption measured in household surveys and national accounts within and across countries (Ravallion 2003; Deaton 2005; Ferreira, Leite, and Ravallion 2010; Pinkovskiy and Sala-i-Martin 2016; Deaton and Schreyer 2021; Prydz, Jolliffe, and Serajuddin forthcoming) and noted the difficulty of measuring GDP in developing countries (Angrist, Goldberg, and Jolliffe 2022).

Several factors can explain why GDP nonetheless is found to be such an important predictor of poverty. First, discrepancies between levels of GDP and levels of welfare do not directly affect the accuracy of the preferred method, which is based on growth rates from the two data sources. Second, the average discrepancy between growth rates in the two sources is accounted for by only allowing a fraction of growth in real GDP per capita to “pass through” to welfare as measured in household surveys. The preferred method has a passthrough rate of about 0.7 for consumption-based poverty estimates, and a passthrough rate of about 1 for income-based poverty estimates. Third, GDP is probably the statistic in which the global community has invested the most financing and capacity-building to improve quality and ensure cross-country comparability (Angrist, Goldberg, and Jolliffe 2022).
Despite measurement issues and shortcomings, one could expect GDP to have more signaling properties for cross-country measures of average well-being than any other statistic. That said, there are cases where the GDP-based model is less attractive. The GDP-based model works less well for rich countries and for situations where specific components of GDP make up an unusually large or small share. As such, it is certainly possible that other methods or data can improve upon the models presented in this paper. This is particularly the case when the objective is to nowcast poverty for a single country, where relying on microsimulation tools or more granular data will likely be superior as they can be tailored to country-specific contexts. Yet, as such methods are hard to implement consistently across many countries, the present exercise can be attractive when the objective is to compare poverty across a range of countries. The findings contribute to the literature by pointing out that when predicting poverty at the national level, (a) predicting changes from the past distribution is generally superior to predicting poverty rates directly and (b) conditional on knowing real GDP per capita, other variables including publicly available remote sensing data generally carry little additional information. These contributions can extend beyond the problem of nearcasting and nowcasting poverty to any attempt that tries to express global poverty in a given year. Such attempts also require extrapolating or interpolating between poverty estimates, for which the findings may be applicable. To our knowledge, this is the first paper comparing different ways of nowcasting poverty at a global scale. Other papers, such as Chi et al. (2022), Cuaresma et al. (2018), and Moses et al. (2021) have predicted global or near-global poverty rates but not with the purpose of testing different methods. 
For other indicators, nowcasting is a more established exercise, such as nowcasting GDP (Giannone, Reichlin, and Small 2008), inflation (Aruoba and Diebold 2010), and macroeconomic variables more broadly (Giannone et al. 2012). Partially due to the Sustainable Development Goal (SDG) target of ending extreme poverty by 2030, many papers have focused on forecasting global poverty rather than nowcasting it (Hillebrand 2008; Ravallion 2013; Edward and Sumner 2014; Lakner et al. 2022; Sumner and Hoy 2022).

2. Method

This paper employs several methods to predict poverty around the world. This section outlines the methods, their advantages and disadvantages, and other important methodological choices. Three distinct issues will be focused upon: (a) the target variable being predicted and (if not poverty rates directly) how poverty rates are obtained from these predictions, (b) the algorithms used to generate the predictions, and (c) the evaluation criterion used to judge predictive performance. Throughout this paper, when poverty rates are referred to, the international poverty line of $1.90 per day (Ferreira et al. 2016) is used.

Table 1. Target Variables

Target variable | Approach for estimating poverty
(1) Poverty rates | Predicted directly
(2) Changes in poverty rates | Apply the predicted change to the most recent poverty rate
(3) Mean welfare | Scale the past distribution to the predicted mean welfare
(4) Growth in mean welfare | Scale the past distribution by 1 + the predicted growth in mean welfare
(5) Mean welfare and Gini coefficient | (a) Assume the distribution is log-normal or log-logistic or (b) apply a growth incidence curve from the past distribution to match the predicted mean and Gini
(6) Growth in mean welfare and Gini coefficient | (a) Apply the predicted growth in the mean and Gini to the past mean and Gini and assume that the distribution is log-normal or log-logistic or (b) apply a growth incidence curve to the past distribution with the predicted growth in the mean and Gini

Source: Authors’ overview.
Note: Six different combinations of target variables that will be used for the predictions with information on how poverty rates will be backed out based on the target variable(s).

2.1. The Target Variable

When deciding how to predict poverty, it seems intuitive that the target variable to predict should be the poverty rate in each country. Yet such a method ignores that prior poverty estimates exist for most countries and can serve as a baseline from which to predict changes. Behind these prior estimates is a full distribution of welfare which might contain relevant information, for example by distinguishing the near poor and the rich. Taking advantage of these distributions, this paper uses six different target variable combinations (table 1) as explained in more detail in what follows.

(1) Poverty Rates and (2) Changes in Poverty Rates.
Predicting the poverty rate directly is the most intuitive and straightforward option; the poverty rates are the ultimate objective of this paper. Predicting poverty rates directly at the nowcasting year, \(t_n\), has the advantage that it can yield estimates for countries without any previous poverty estimates at all. Predicting changes in poverty from the past survey conducted at time \(t_s\), in contrast, needs some assumptions about poverty levels in countries without data to arrive at global poverty rates. Yet, by utilizing the past survey, one can exploit that there is a past estimate to anchor the analysis around. When predicting changes in poverty, \(\hat{c}_{\mathrm{poverty},t_s,t_n}\), the nowcasted poverty rates are given by

\[ \mathrm{poverty}_{t_n}(\hat{c}_{\mathrm{poverty},t_s,t_n}) = \mathrm{poverty}_{t_s} + \hat{c}_{\mathrm{poverty},t_s,t_n}. \]

When predicting changes from the past survey, the annualized change in the poverty rate in percentage points is used. This avoids having to predict extreme and undefined values, which often occur when predicting the annualized growth in poverty rates, due to countries with poverty rates close to or at zero percent.

(3) Mean Welfare and (4) Growth in Mean Welfare. While predicting changes in poverty rates from the past survey exploits some information available, it still ignores the fact that a whole distribution of welfare was available in the past. Countries with high density around the poverty line are likely to experience different magnitudes of changes in poverty than countries with sparse density around the poverty line. By predicting the mean or growth in the mean and scaling the past distribution to match these predictions, the model takes full advantage of the previous data in the sense that the entire distribution is leveraged.

Method (3) works by scaling the welfare of each household, \(h\), at the last survey by the ratio of the predicted mean at the nowcasting year, \(\hat{\mu}_{t_n}\), and the observed mean at the last survey, \(\mu_{t_s}\). The scaled distribution is used to estimate poverty at time \(t_n\):

\[ \mathrm{poverty}_{t_n}(\hat{\mu}_{t_n}) = F\left[\mathrm{welfare}_{h,t_s}\,\frac{\hat{\mu}_{t_n}}{\mu_{t_s}} < 1.90\right]. \]

Similarly, method (4) works by taking the last observed distribution of welfare and scaling it by the growth in the mean predicted between \(t_s\) and \(t_n\), \(\hat{g}_{\mu,t_s,t_n}\):

\[ \mathrm{poverty}_{t_n}(\hat{g}_{\mu,t_s,t_n}) = F\left[\mathrm{welfare}_{h,t_s}\,(1 + \hat{g}_{\mu,t_s,t_n}) < 1.90\right]. \]

A hypothetical example can be constructed using the latest observed distribution for Botswana from 2016 assuming predicted annualized growth in the mean of 4 percent between 2016 and 2021 (fig. 1). By shifting the distribution to the right reflecting five years of this growth rate, the poverty rate at $1.90 declines from about 15 percent to 9 percent.

Figure 1. Example of Recovery of Poverty Rates from Predictions of Mean Welfare
Source: Authors’ illustration based on data from PovcalNet.
Note: Illustration of how models are implemented when the target variable is mean welfare or growth in mean welfare. This figure uses the last observed distribution from Botswana from 2016 and shows how that distribution is projected forward by a hypothetical prediction of a growth in mean welfare of 4 percent per year.

Predicting growth in the mean has the advantage that the model can be applied to any poverty line. Yet it imposes the assumption that all households have experienced the same growth since the last survey. In other words, it imposes that inequality has not changed since the last survey.

(5) Mean Consumption and Gini Coefficient and (6) Growth in Mean and Growth in Gini. The fifth and sixth methods try to deal with the latter issue by also predicting inequality, either directly or by predicting growth in inequality since the last survey.
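Returning to methods (3) and (4), the rescaling step can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names (`headcount`, `nowcast_from_mean`, `nowcast_from_growth`) and the four-household welfare vector are hypothetical, survey weights are optional, and the predicted annualized growth is assumed to be compounded over the years since the last survey, as in the Botswana example.

```python
import numpy as np

POVERTY_LINE = 1.90  # international poverty line, dollars per day


def headcount(welfare, weights=None, line=POVERTY_LINE):
    """Weighted share of households with welfare strictly below the line."""
    welfare = np.asarray(welfare, dtype=float)
    if weights is None:
        weights = np.ones_like(welfare)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (welfare < line)) / np.sum(weights))


def nowcast_from_mean(welfare, predicted_mean, weights=None, line=POVERTY_LINE):
    """Method (3): scale each household by predicted mean / observed mean."""
    welfare = np.asarray(welfare, dtype=float)
    observed_mean = np.average(welfare, weights=weights)
    return headcount(welfare * predicted_mean / observed_mean, weights, line)


def nowcast_from_growth(welfare, annual_growth, years, weights=None,
                        line=POVERTY_LINE):
    """Method (4): scale each household by 1 + cumulative predicted growth."""
    cumulative_growth = (1.0 + annual_growth) ** years - 1.0
    scaled = np.asarray(welfare, dtype=float) * (1.0 + cumulative_growth)
    return headcount(scaled, weights, line)


# Hypothetical last-survey welfare vector (dollars per day), not real data.
last_survey = [1.0, 1.8, 2.0, 5.0]
print(headcount(last_survey))                        # 0.5
print(nowcast_from_growth(last_survey, 0.04, 5))     # 0.25
```

Because both functions only rescale the last distribution, they agree whenever the predicted mean equals the observed mean times one plus cumulative growth, and the same scaled vector can be evaluated at any poverty line by changing `line`.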
A challenge with this approach is that there are many different measures of inequality and infinitely many ways in which the same level/growth of inequality can materialize. In this paper, the Gini coefficient is used as the measure of inequality due to its popularity, together with some further distributional assumptions to pin down how the Gini shapes the distribution. In particular, two ways of converting Gini predictions into poverty rates are used.

First, the predicted mean and Gini are employed together with known two-parameter distributional shapes, the log-normal distribution, which is frequently used for poverty and inequality analysis (see for example Bourguignon 2003) and the log-logistic distribution (also known as the Fisk distribution, after Fisk 1961).1 These distributions are used to back out the poverty rate given a predicted mean, \(\hat{\mu}_{t_n}\), and predicted Gini, \(\mathrm{gini}_{t_n}\). Formally, poverty with the log-normal distribution is derived as

\[ \mathrm{poverty}_{t_n,\mathrm{lognormal}}(\hat{\mu}_{t_n}, \mathrm{gini}_{t_n}) = \Phi\left(\frac{\ln(1.9) - \ln(\hat{\mu}_{t_n}) + 2\,[\mathrm{erf}^{-1}(\mathrm{gini}_{t_n})]^2}{2\,\mathrm{erf}^{-1}(\mathrm{gini}_{t_n})}\right), \]

where \(\Phi\) is the standard normal cumulative distribution function, and with the log-logistic distribution as

\[ \mathrm{poverty}_{t_n,\mathrm{loglogistic}}(\hat{\mu}_{t_n}, \mathrm{gini}_{t_n}) = \left[1 + \left(\frac{\hat{\mu}_{t_n}\,\sin(\pi\,\mathrm{gini}_{t_n})}{1.9\,\pi\,\mathrm{gini}_{t_n}}\right)^{1/\mathrm{gini}_{t_n}}\right]^{-1}. \]

The resulting poverty rates can be plotted as a function of \(\hat{\mu}_{t_n}\) and \(\mathrm{gini}_{t_n}\) (fig. 2).

Figure 2. Illustration of Log-Normal and Log-Logistic Conversions
Source: Authors’ calculations.
Note: Predicted poverty rate for a given predicted mean welfare, \(\hat{\mu}_{t_n}\), and Gini, \(\mathrm{gini}_{t_n}\), when assuming a log-normal distribution or a log-logistic distribution.

The other method applies a specific growth incidence curve (GIC) from the last observed distribution. GICs plot the growth in welfare as a function of the percentile p of the initial welfare distribution (Ravallion and Chen 2003).
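The two closed-form conversions from a predicted mean and Gini to a poverty rate can be checked numerically. The sketch below is an illustration under the stated log-normal and log-logistic assumptions and uses only the Python standard library; the function names are hypothetical. Since the standard library has no inverse error function, it is obtained from the standard normal quantile function through the identity erf^-1(y) = Phi^-1((1 + y) / 2) / sqrt(2).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def erfinv(y):
    """Inverse error function for 0 < y < 1 via the normal quantile function."""
    return STD_NORMAL.inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def poverty_lognormal(mean, gini, line=1.90):
    """Share below the line if welfare is log-normal with this mean and Gini."""
    e = erfinv(gini)
    z = (math.log(line) - math.log(mean) + 2.0 * e ** 2) / (2.0 * e)
    return STD_NORMAL.cdf(z)


def poverty_loglogistic(mean, gini, line=1.90):
    """Share below the line if welfare is log-logistic (Fisk) with this mean and Gini."""
    ratio = mean * math.sin(math.pi * gini) / (line * math.pi * gini)
    return 1.0 / (1.0 + ratio ** (1.0 / gini))


# Hypothetical inputs: mean welfare of 4 dollars per day and a Gini of 0.40.
print(poverty_lognormal(4.0, 0.40))
print(poverty_loglogistic(4.0, 0.40))
```

Both functions require a Gini strictly between 0 and 1 and a positive mean. For a fixed Gini, the implied poverty rate falls as the mean rises, which gives a quick sanity check on any implementation.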
Downward-sloping GICs reduce inequality and vice versa. Evidence shows that GICs often take on approximately linear and convex forms (Kakwani 1993; Ferreira and Leite 2003; Lakner et al. 2022). By imposing particular functional forms on the GICs, given a predicted mean welfare and predicted Gini, there is only one possible GIC. When using GICs, the nowcasted poverty rates are backed out as follows:

\[ \mathrm{poverty}_{t_n,\mathrm{GIC}}(\hat{\mu}_{t_n}, \mathrm{gini}_{t_n}) = F\left[\mathrm{welfare}_{h,p,t_s}\,(1 + g_{p,t_s,t_n}(\hat{\mu}_{t_n}, \mathrm{gini}_{t_n})) < 1.9\right], \]

where \(g_{p,t_s,t_n}(\hat{\mu}_{t_n}, \mathrm{gini}_{t_n})\) are percentile-specific growth rates, which are determined such that the resulting distribution matches the predicted mean and Gini. In addition, for the linear GIC, there is a requirement that \(g_{p,t_s,t_n} = \beta + \delta p\), while for the convex GIC, there is a condition that \(g_{p,t_s,t_n} = (1 - \alpha)(1 + \gamma) - 1 + [\alpha(1 + \gamma)\mu_{t_s}]/\mu_{p,t_s}\). Here \(\beta\) and \(\delta\) or \(\alpha\) and \(\gamma\) are parameters that are estimated to ensure that the equations hold (Lakner et al. 2022).

The most recent survey from Botswana in 2016 can again be used as the starting point to illustrate how the two GICs impact the shape of the nowcasted distribution (fig. 3). Assuming that the nowcasted mean has grown by 4 percent annually since the last survey and that the nowcasted Gini is 5 percent lower than the last observed Gini, the two GICs will result in different predicted poverty rates as they distribute the growth differently along the distribution.

Figure 3. Illustration of Growth Incidence Curve Conversions
Source: Authors’ illustration based on data from PovcalNet.
Note: Examples of a linear and convex GIC, here applied to the last survey in Botswana from 2016.

The above equations pertain to method (5). Analogous equations can be made with method (6) where the target variables are the growth in the mean and Gini. Throughout the analysis, growth in the Gini, rather than changes in the Gini, is predicted based on preliminary analysis comparing the performance of those two options.

Particular attention will be paid to a submethod under the fourth category, which is a variant of what the World Bank uses to extrapolate poverty in countries and report on Sustainable Development Goal indicator 1.1.1 (to end extreme poverty by 2030). The method is based on the premise that there is a tight relationship between income or expenditure measured in national accounts and income or consumption observed in household surveys. It works by taking the last observed distribution of welfare and scaling the welfare of each household by the growth observed in real GDP per capita from national accounts between the survey and the nowcasting year.2 It assumes that growth observed in national accounts is fully “passed through” to the welfare observed in household surveys and that the only factor informative for changes in poverty is growth in national accounts. In addition, it assumes that growth accrues to everyone equally, that is, without changing the distribution of welfare. This is problematic if growth was pro-poor or pro-rich in the intervening period.

The methods covered are obviously not comprehensive, and other ways of predicting poverty rates exist. Hopefully, the methods chosen cover a mix of the most intuitive options and those that have been applied in prior work.

1 Only two-parameter distributions are used given that only two quantities (the mean and the Gini) are predicted. Under this setup, it is not possible to arrive at a unique solution for three-parameter distributions. For further work it would be interesting to see whether predicting another outcome, for example the median, and applying three-parameter distributions improves upon what is found here.
2 The World Bank uses Household Final Consumption Expenditure (HFCE) whenever available and GDP otherwise, with the exception of countries in Sub-Saharan Africa where only GDP is used (Prydz et al. 2019). Here, focus is on GDP since HFCE nowcasts are not available for many countries.

2.2. Algorithms

In order to predict any of the target variables, a number of frequently used machine-learning algorithms are relied upon, particularly the lasso (Tibshirani 1996), the post-lasso (Belloni et al. 2013), CART random forests (Breiman 2001), conditional inference random forests (Hothorn, Hornik, and Zeileis 2006), and gradient boosting (Friedman 2001). These methods all have in common that they can predict the outcome variable of interest while being agnostic about which variables are relevant for the predictions.

Since the features used suffer from a lot of missing data, it is necessary to find a strategy to deal with this missingness. Simply deleting rows with missing values is not feasible as this would leave no or very few observations. For the conditional inference random forests and gradient boosting, imputation methods embedded in the algorithms to deal with missing data are relied upon. For conditional inference random forests, for example, the algorithm works by sequentially splitting the sample into two based on a variable deemed most predictive of the target variable. The algorithm might judge that a decline in the share of male workers in agriculture is predictive of growth in the mean. For country-years without data on the share of male workers in agriculture, the algorithm will search for the most similar variable in terms of how it relates to the target variable, which in this case could be the share of all workers in agriculture, and split the observations with missing values in the first variable based on the latter.
Such methods of dealing with missing values are not possible or not available in the programs used here for the lasso, post-lasso, and CART random forests. Instead, the entire data set of features will be multiply imputed to avoid missing values altogether (Rubin 1976, 2004; Schafer 1997). For each imputation a predicted value is calculated, and these predictions are then averaged to obtain a final estimate. Although multiple imputation could also be used for the other methods, it adds quite a bit of computing time, so it is only used where necessary.

2.3. Evaluation of Performance

In order to tune the algorithms listed above, compare the performance of the different approaches for nowcasting poverty (table 1), and report the final out-of-sample errors, this paper relies on nested five-fold cross validation (Stone 1974; Iizuka et al. 2003; Varma and Simon 2006). Intuitively, nested cross validation works by iteratively splitting the sample into three different subsamples: a training subsample on which a particular machine-learning method is run multiple times using various parameters, a testing subsample on which the predictions from these various parameter options are compared against each other and the best performer is selected, and a validation subsample on which the best performers across machine-learning methods and target variable options are compared against each other and the final out-of-sample errors are reported. The data points going into these three subsamples are reassigned multiple times to maximize the power of the data. Nested cross validation extends regular k-fold cross validation and reduces two biases of the latter: a bias towards selecting a model with a large tuning grid and a downward bias in the final out-of-sample errors obtained. Supplementary online appendix S1 contains a more thorough and technical discussion of nested cross validation.
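The nested five-fold procedure described above can be sketched as follows. Several choices here are assumptions made for brevity and are not taken from the paper: ridge regression with a closed-form solution stands in for the lasso, random forests, and gradient boosting; the tuning grid is a short list of penalty values; folds are drawn at random over observations; and the inner loop rotates the testing subsample over the four folds left after the validation fold is set aside. All function names are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, penalty):
    """Closed-form ridge coefficients (a stand-in learner, assumed for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)


def mean_abs_dev(y_true, y_pred):
    """Mean absolute deviation between observed and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))


def nested_cv(X, y, grid, k=5, seed=0):
    """Return out-of-sample predictions and their mean absolute deviation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    predictions = np.empty(len(y))
    for i, validation in enumerate(folds):            # outer loop: validation fold
        remaining = [f for j, f in enumerate(folds) if j != i]
        inner_errors = {penalty: [] for penalty in grid}
        for m, testing in enumerate(remaining):       # inner loop: testing fold
            training = np.concatenate(
                [f for n, f in enumerate(remaining) if n != m])
            for penalty in grid:
                beta = ridge_fit(X[training], y[training], penalty)
                inner_errors[penalty].append(
                    mean_abs_dev(y[testing], X[testing] @ beta))
        # Tuning uses only the inner folds; the validation fold is never seen.
        best = min(grid, key=lambda p: np.mean(inner_errors[p]))
        outer_training = np.concatenate(remaining)
        beta = ridge_fit(X[outer_training], y[outer_training], best)
        predictions[validation] = X[validation] @ beta
    return predictions, mean_abs_dev(y, predictions)
```

Because the penalty is chosen without reference to the validation fold, the reported error is not flattered by the tuning step, which is the bias that plain k-fold cross validation with tuning would introduce.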
Once done with nested five-fold cross validation, there will be one out-of-sample estimate of the poverty rate for each household survey (except for the earliest one for each country, as the methods that rely on changes since the last data point will not work in those cases) for each machine-learning method, and for each target variable. To evaluate the performance of the various methods, the primary loss function will be the mean absolute deviation between the predicted and true poverty rates. To ensure that the selected model does not only work well for a few countries that happen to have many poverty estimates, each country is weighted by the inverse of its number of poverty estimates, such that the total weight for each country equals 1.

The mean absolute deviation is used rather than the mean squared error since the objective is to minimize the deviations between the true and predicted poverty rates while giving equal weight to all deviations. Using the mean squared error tends to give more weight to the prediction of outliers. This can be problematic since data incomparabilities can create some strong outliers at times, and the methods should not be judged by how well they predict these outliers. Percentage point deviations are used rather than percentage deviations since the latter sometimes can be very large for countries with low poverty rates. If a country has a poverty rate of 1 percent and a model predicts a poverty rate of 2 percent, the error in percentage terms would be 100 percent which would give this observation a large impact. The focus on percentage point deviations implicitly gives a larger focus on countries with high poverty rates.

More traditional goodness-of-fit measures such as the AIC or BIC cannot be used, given that these cannot be computed for all models.
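The country-weighted loss function described above can be written compactly. This is a minimal sketch with a hypothetical function name and made-up numbers; it assumes one out-of-sample prediction per survey and weights each survey by the inverse of the number of surveys for its country, so that every country carries a total weight of 1.

```python
from collections import Counter


def country_weighted_mad(countries, predicted, observed):
    """Mean absolute deviation in percentage points, with equal country weights."""
    counts = Counter(countries)
    weights = [1.0 / counts[c] for c in countries]
    total_weight = sum(weights)  # equals the number of distinct countries
    weighted_errors = sum(
        w * abs(p - o) for w, p, o in zip(weights, predicted, observed))
    return weighted_errors / total_weight


# Hypothetical example: country A has three surveys, country B has one.
countries = ["A", "A", "A", "B"]
predicted = [10.0, 10.0, 10.0, 20.0]   # predicted poverty rates, percent
observed = [11.0, 12.0, 13.0, 24.0]    # measured poverty rates, percent
print(country_weighted_mad(countries, predicted, observed))
```

In this example the unweighted mean absolute deviation would be 2.5 percentage points, whereas the country-weighted version is about 3.0, because the single survey from country B counts as much as the three surveys from country A combined.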
The use of the mean absolute deviation, however, largely accomplishes the same as these traditional goodness-of-fit measures. The measure of performance used here evaluates the fit out of sample instead of evaluating it in sample while penalizing for model complexity. 3. Data All poverty estimates used in this paper come from PovcalNet, which contains the World Bank’s official country-level, regional, and global estimates of poverty. Most of the data in PovcalNet come from the Global Monitoring Database, which is the World Bank’s repository of harmonized multitopic income and expenditure household surveys used to monitor global poverty. PovcalNet contains more than 2,000 surveys from 168 countries covering 98 percent of the world’s population. Information on the range and number of surveys by country is available in the supplementary online appendix (tables S3.1–S3.4). The data available in PovcalNet are standardized as far as possible but differences exist with regard to the method of data collection and whether the welfare aggregate is based on income or consumption. By relying on PovcalNet, there is consistency with the official numbers used by the World Bank and United Nations for monitoring poverty and inequality. To predict poverty, variables from three databases are relied upon. First, we use the World Bank’s World Development Indicators (WDI), which contain country-year information on nearly 1,300 variables covering a wide range of topics, such as health, agriculture, education, climate change, infrastructure, and more (but as explained below, only a subset of these can be used for the present exercise). Second, we use the IMF’s World Economic Outlook (WEO) database, which contains country-year information on about 50 variables related to macroeconomic outcomes, such as inflation, government debt, and unemployment. 
Third, we use remote sensing data from the Google Earth Engine, particularly data on nighttime lights, rainfall, land surface temperature, impervious surface, cropland, and normalized difference vegetation, snow, and water indices. In contrast to WEO and WDI, the remote sensing data are both more granular spatially and more frequent temporally. To fit into this exercise, they need to be aggregated to the country-year level. They are first aggregated to annual data by calculating the mean, max, min, and standard deviation of each location over a year. Afterwards, they are aggregated spatially by taking the mean, max, min, and standard deviation of the annual data for a country. This gives 16 features for each type of variable (four temporal statistics times four spatial statistics). Some of these combinations will not be relevant for global poverty nowcasting, but all of them are included here to remain agnostic about which ones are relevant.

For all of the variables from the three sources above, the annualized growth rates of the variables between two household surveys for a country are also calculated. For all variables that are expressed as percentages, rates, or indices, the annualized changes between two household surveys for a country are calculated as well.

The only variables removed from what is described above are (a) variables with more than 90 percent missing information in 2020, the nearcasting year at the time of writing, (b) variables with more than 90 percent missing information for country-years with poverty estimates, and (c) variables that are not comparable between countries, such as variables expressed in local currency units. The first two criteria reduce computation time by removing variables with too little information to be relevant.³ The third criterion removes variables where exploiting cross-country variation is not meaningful.

³ An alternative to the first two criteria is to use lagged values whenever a variable has missing information before applying the 90 percent thresholds. Applying this method improved the models that predict poverty rates directly and had a negligible impact on the models that predict from the last survey, and hence could be an alternative way of partially dealing with missing values.

Depending on which month of the year this exercise is carried out, the removal of variables with more than 90 percent missing in the nearcasting year removes a large fraction of the WDI. The reason is that early in the year, most variables do not yet have information for the prior year, meaning that the information used for nearcasting is not much better than the information that can be used for nowcasting. When conducted in July (as the current exercise is), only about 300 WDI variables meet the criterion. All in all, there are a bit more than 1,000 features across the various data sources.

Not all poverty trends within countries are comparable over time due to changes in the survey methodology or the welfare aggregate. This matters for predictions of changes/growth in a target variable. Even if the exact causes of poverty in a country are known, if welfare aggregates are not comparable, then no model will be able to predict the changes in poverty between two surveys. Though this also makes it more difficult to predict levels in the target variables, the problem is arguably larger when predicting changes. Though the sample can be restricted to comparable spells only, the main results will cover all data points, even those that are not comparable over time. As one of the findings will be that nowcasting poverty by predicting changes from a past distribution gives more accurate results than predicting levels directly, the decision to include non-comparable spells makes this result more conservative, as it precisely penalizes the methods found to be superior.
A robustness check restricts the data to comparable spells.

4. Results

4.1. Evaluation of the Performance of the Models

Figure 4 shows the prediction errors across the six combinations of target variables (table 1) by each of the five machine-learning algorithms used. Regardless of which target variable(s) are used, all the predictions are turned into poverty rates and the errors are evaluated over poverty rates. Hence, they are comparable to each other. As one example, the very first bar shows that when predicting poverty rates directly using a CART random forest (abbreviated as carf), the predicted poverty rate is on average 4.99 percentage points off the truth.

Each bar in panels 5 and 6 of fig. 4 has several possible realizations depending on which growth incidence curve or distributional assumption is used to convert means and Gini coefficients into poverty rates. Figure S2.1 in the supplementary online appendix shows the best way of converting predictions of growth/levels in the mean and Gini into poverty rates. The best way of converting predicted levels of the Gini and mean tends to be by using a log-normal distribution, while the best way of converting predictions of growth in the mean and Gini tends to be by applying a linear growth incidence curve. Figure 4 shows the best performing options from fig. S2.1 in panels 5 and 6.

Generating poverty rates by predicting from the last observed survey (right column of fig. 4) gives lower errors than predicting poverty rates or the mean (and Gini) without accounting for changes from the last survey (left column). On the one hand, this is intuitive; the direct models do not account explicitly for past patterns in a country. On the other hand, given that surveys within countries often are incomparable and often have many years between them, lagged information need not provide much additional information.

Among the different ways of predicting from the last survey, predicting changes in poverty rates (panel 2 of fig.
4) performs worse than predicting growth in the mean or mean and Gini (panels 4 and 6). Predicting growth in the Gini along with growth in the mean tends to slightly increase the error. In other words, assuming no changes to inequality works slightly better than trying to predict changes in inequality since the last survey.⁴ The best performing method is a conditional inference random forest which scales the past distribution by a predicted growth in the mean. This method has an error of 3.65 percentage points.

⁴ This is a slightly counterintuitive result which may emerge because the distributional assumptions or growth incidence curves imposed are only one way in which a given Gini coefficient can be implemented.

Figure 4. Performance of Machine Learning Models
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: carf = CART random forest, cirf = conditional inference random forest, grbo = gradient boosting, plas = post-lasso, rlas = regular lasso. Panels 5 and 6 show the best performing options from fig. S2.1.

4.2. Comparison to Models Just Using GDP Growth

It is interesting to compare the best performing machine-learning models with three variants of the model that simply scales the last distribution up according to growth in real GDP per capita (fig. 5). First, we use a model which assumes all growth in GDP per capita passes through to growth in welfare. Second, we use a model which only allows a fraction of GDP per capita growth to pass through to the distribution and where this fraction is estimated through a simple linear regression. The fraction turns out to be 79 percent.
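The GDP-based approach described above can be illustrated with a short Python sketch. It rests on several assumptions that are not stated in the text: households are unweighted, the passthrough rate is applied to each year's growth rate and compounded annually, and the poverty line and welfare values are toy numbers. The function name `nowcast_poverty_rate` is hypothetical; only the 79 percent passthrough rate comes from the paper.

```python
from typing import List


def nowcast_poverty_rate(
    welfare: List[float],
    gdp_growth_by_year: List[float],
    poverty_line: float,
    passthrough: float = 1.0,
) -> float:
    """Scale the last observed welfare distribution and return the headcount (%).

    welfare            -- per capita welfare of each household in the last survey
                          (unweighted here, which is a simplifying assumption)
    gdp_growth_by_year -- real GDP per capita growth rates since the survey,
                          as fractions (0.05 means 5 percent)
    passthrough        -- share of GDP growth assumed to reach household welfare
    """
    if not welfare:
        raise ValueError("welfare distribution is empty")

    # Cumulative scaling factor; every household is scaled by the same
    # factor, so inequality is assumed to stay unchanged.
    factor = 1.0
    for growth in gdp_growth_by_year:
        factor *= 1.0 + passthrough * growth

    n_poor = sum(1 for w in welfare if w * factor < poverty_line)
    return 100.0 * n_poor / len(welfare)


# Toy example: five households, two years of 5 percent growth.
last_survey = [1.0, 1.5, 1.74, 2.5, 4.0]
full = nowcast_poverty_rate(last_survey, [0.05, 0.05], 1.9, passthrough=1.0)
partial = nowcast_poverty_rate(last_survey, [0.05, 0.05], 1.9, passthrough=0.79)
```

In this toy example the household at 1.74 crosses the illustrative line of 1.9 under full passthrough but not under a passthrough rate of 0.79, so the two variants give different headcounts from the same growth data.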
The use of a passthrough rate follows empirical evidence showing that on average, only a fraction of growth in national accounts trickles down to household surveys (Ravallion 2003; Deaton 2005; Pinkovskiy and Sala-i-Martin 2016; Lakner et al. 2022; Prydz, Jolliffe, and Serajuddin forthcoming). Third, we use a model where the passthrough rate is estimated separately by consumption and income welfare aggregates. This is motivated by the fact that this interaction was the first variable to enter in the lasso when predicting growth in the mean. With this method, 71 percent of growth is estimated to pass through to consumption aggregates while 97 percent is estimated to pass through to income aggregates.

These models are also compared with two models that help interpret how large the errors are. First, the predictions are compared to predictions one would get if one were perfectly able to predict the mean. This is done by shifting a distribution from the beginning of a spell such that it matches the mean of the distribution at the end of the spell. It represents the lowest possible prediction error of using method (3) or (4). Second, we use a method which simply uses the lagged poverty rate as the prediction. This scenario is intended to shed light on whether nowcasting is a worthwhile exercise or whether one could simply use the latest official poverty rates as proxies for nowcasts.

Figure 5. Comparing Best Performing Methods to Models Only Using GDP Growth
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: Errors of seven different models. The best performing method takes the minimum bar from fig. 4 while the best method predicting poverty rates directly takes the minimum bar of panel 1 of fig. 4. Using only GDP growth to shift the mean refers to predicting poverty by adjusting mean welfare by the growth in real GDP per capita. The rightmost column reflects a hypothetical scenario of the errors one would get if one were perfectly able to predict growth in the mean.

The best performing machine-learning method only reduces the error by 0.12 percentage points over just using GDP growth, only by 0.05 percentage points if a passthrough rate is used, and only by 0.04 percentage points if an income-/consumption-specific passthrough rate is used. This means that just using GDP per capita growth to nearcast distributions is nearly as accurate as any method using more than 1,000 variables and complex machine-learning methods. This is not because none of the methods work well: if one simply used the latest official poverty rate as the nowcast, the error would increase by nearly 1 percentage point. Rather, it is because nearly all of the variation in growth in mean consumption that can be explained with these 1,000 variables can be explained by growth in GDP per capita. If it were possible to predict all the variation in growth in mean consumption, then the error would nearly halve to 1.91 percentage points. Using growth in GDP per capita to shift the mean gives a better performance than any model trying to predict poverty rates directly using 1,000 variables.

In the supplementary online appendix it is shown that these results hold if only comparable spells are used (figs S2.2 and S2.3). If the root mean squared error is used instead of the mean absolute deviation, predicting poverty rates directly with gradient boosting is relatively more attractive (figs S2.4 and S2.5). Once more, however, the potential gain in accuracy of any machine-learning model hardly merits the extra complexity it adds.

The supplementary online appendix also contains results on the share of poverty trends correctly predicted.
This is a relevant loss function if one cares about whether the situation is improving or worsening and less about the poverty rate itself. With this loss function it is possible to break down the error into trends that are incorrectly predicted as improvements and trends incorrectly predicted as deteriorations. The results with this loss function do not alter the main findings: the models that predict growth in the mean predict most of the trends correctly (around 4 out of 5, to be exact; figs S2.6 and S2.7). Yet, if one wants to minimize the share of trends incorrectly predicted as a decline, these models are no longer the best performing. Some of the models that predict poverty rates directly incorrectly predict trends as a decline half as frequently, but at the cost of incorrectly predicting increases in poverty 5–8 times as frequently.

Figure 6. Variables Important for Predicting Growth in Mean Welfare
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: Top 10 most important features from the conditional inference random forest. For a particular feature, the value is calculated by permuting the feature in the data set and calculating how much the prediction error increases. The importance measure is standardized such that the feature with the highest value gets a value of 1. For completeness, the features important for the other 5 target variables are shown in figs S2.8 and S2.9 in the supplementary online appendix.

4.3. Predictors of Poverty

The fact that just using growth in GDP per capita to extrapolate the mean works well to nowcast poverty invites the question whether there could be another simple model that performs even better.
For the lasso, this can be analyzed by looking at the order in which variables enter. For forests and gradient boosting, it can be analyzed through feature importance measures. Feature importance measures indicate the importance of a particular feature for the accuracy of the predictions. Here the feature importance measures from the conditional inference random forests are used, as this method generally performed best across the machine-learning algorithms (judged by its average rank in fig. 4). For a particular feature, the feature importance is calculated by permuting the feature in the data set and calculating how much the prediction error increases. The importance measure is standardized such that the variable with the greatest importance gets a value of 1.

Plotting the top 10 most important features for predicting growth in the mean reveals that the only six features that are substantively predictive of growth in the mean are all national accounts variables (fig. 6). They are all identical to, or highly correlated with, growth in GDP per capita. The most predictive variable is growth in final consumption expenditure (FCE), which is the sum of two components of GDP, government expenditure and HFCE. The second most predictive variable is growth in HFCE, followed by growth in GDP, gross national income (GNI), and gross domestic income (GDP measured from the income side). Interestingly, employing final consumption expenditure or HFCE instead of GDP does not improve the predictions. The reason is that while HFCE and FCE work better for upper-middle-income and high-income countries, GDP works better for low-income and lower-middle-income countries, which dominate the loss function. The most informative variable not from national accounts is growth in the employment rate. Yet it is clear that just using growth in GDP per capita sums up the information well.
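The permutation-based importance measure described above can be sketched in a few lines of Python. This is a generic illustration, not the authors' implementation: the paper computes the measure from a conditional inference random forest, whereas here `predict` is any stand-in prediction function, and the function names, the number of repeats, and the use of the mean absolute error are assumptions.

```python
import random
from typing import Callable, List


def mean_abs_error(
    predict: Callable[[List[float]], float],
    rows: List[List[float]],
    targets: List[float],
) -> float:
    """Average absolute prediction error over all rows."""
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(
    predict: Callable[[List[float]], float],
    rows: List[List[float]],
    targets: List[float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Increase in error when each feature is shuffled, scaled so the max is 1."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    raw = []
    for j in range(len(rows[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            # Shuffle feature j across rows, leaving other features intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total_increase += mean_abs_error(predict, permuted, targets) - baseline
        raw.append(total_increase / n_repeats)

    top = max(raw)
    # Standardize so that the most important feature gets a value of 1.
    return [x / top for x in raw] if top > 0 else raw
```

A feature that the model does not use at all receives an importance of exactly 0 under this measure, since shuffling it leaves every prediction unchanged.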
One may still wonder whether a slightly more complicated GDP-based model than the one with passthrough rates by income and consumption could perform even better. This could be the case if passthrough rates differ by various contexts beyond the type of the welfare aggregate. Three different models were tried that have passthrough rates by income/consumption and either income group (four categories), World Bank region (seven categories), or the sign of the GDP growth rate (positive or negative). If, for example, growth in real GDP per capita trickles down to growth in welfare at higher or lower rates during recessions, then the last model would perform better. These three models result in mean absolute deviations of 3.65, 3.73, and 3.76 percentage points, respectively (compared to 3.69 when only having passthrough rates separately by income and consumption). Hence, even though the last two models only add a bit more complexity, they lead to overfitting. The model which differentiates passthrough rates by income group, on the other hand, works as well as the best model overall. This result, though, does not extend to using the mean squared error, using only comparable spells, or comparing models by how well they predict growth in the mean, and hence should be interpreted with caution.

A question that remains unanswered is why a model which assumes about 70 percent of GDP growth trickles down to consumption works better than assuming a full transmission as with income vectors. One possible explanation relates to the marginal propensity to consume. The passthrough rate of 0.7 is fully consistent with a behavior where households on average consume all their income until per capita GDP reaches a particular threshold, at which point the marginal propensity to consume declines as per capita GDP rises.
This seems to be consistent with existing evidence that marginal propensities to consume are higher in poorer contexts. Gross, Notowidigdo, and Wang (2020) and Drescher, Fessler, and Lindner (2020) estimate marginal propensities to consume between 0.33 and 0.57 in the United States or euro area, while Crozier and Zavaleta (2022) estimate a marginal propensity to consume of 0.9 in Peru.

A second possible explanation may be related to consumption items not captured well in surveys. This includes items deliberately not captured in consumption aggregates, such as health expenses, which often are excluded on the grounds that they do not increase welfare (Deaton and Zaidi 2002). It also includes items not captured due to non-classical measurement error in consumption, such as food eaten away from home. If the share of spending on such items increases with GDP, then that would generate a lower passthrough rate for consumption.

Finally, it is possible that the different passthrough rates are not related to whether welfare is measured with income or consumption, but that countries with consumption aggregates differ from countries with income aggregates in some unobserved way, and that this is driving the different passthrough rates. When looking at the ratio of mean income to mean consumption for the six countries in PovcalNet that have both at the same time on at least four different occasions, there are clear cases where this ratio increases as income grows (see fig. S3.1 in the supplementary online appendix). This makes it less likely that an omitted variable is behind the differential passthrough rates.

4.4. Exploring Heterogeneity

Though the preceding two subsections suggest that a simple GDP-based model works well on average, a point of interest is whether there are cases where this may not apply.
This is explored here using two different strategies: (a) by comparing how well some of the top models perform in various contexts and (b) by explicitly testing all models separately on rich and poor countries.

Figure 7 plots the mean absolute deviation for three select models (the best overall, the best at predicting poverty rates directly, and the model using a fraction of growth in GDP per capita to scale the mean) as a function of six other variables. In all of these figures, the sample of countries changes over the x-axis. For example, in the first panel on extrapolation time (the time between two household surveys), the countries with an extrapolation time of one year are mostly in Europe and South America, while the countries with an extrapolation time of 10 years tend to be in Sub-Saharan Africa. As a result, looking at the trend for a given line is of little interest. However, the gap between the three lines can be used to explore which method works best in various contexts.

Figure 7. Errors as a Function of Other Variables
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: Local polynomials of the absolute deviations as a function of other variables. The trend for a particular line is not interpretable since the sample of countries changes over the x-axis. Yet the gaps between the lines are interpretable. Only confidence intervals for one model are shown to not clutter the graph. The confidence interval evaluates the uncertainty of the local polynomial fit but does not incorporate the uncertainty of the predicted poverty rates themselves.
It is plausible that predicting changes from the last survey is a good strategy when the last survey is only a couple of years old, while it is a less sound strategy when the last survey is a decade old or more. Extrapolation time does not matter when predicting poverty rates directly, but the average prediction error might still be a function of extrapolation time for the reason mentioned above, namely that the sample of countries changes. Even for extrapolation times of five years, predicting changes from the past survey by just using growth in GDP per capita with a passthrough rate outperforms a model using 1,000+ variables trying to predict the poverty rate directly and performs as well as models predicting growth in the mean using 1,000+ variables (fig. 7). For extrapolation times beyond five years, there are too few observations to say anything with sufficient certainty, but it can be ruled out that just using GDP growth is significantly worse than the other methods.

Deaton and Schreyer (2021) argue that GDP is increasingly becoming detached from national material well-being. This could imply that using growth rates in GDP per capita has become less attractive, relative to predicting poverty rates directly, in recent years. This is not the case in the results presented here (fig. 7). A possible reconciliation between these two findings is that while GDP has become increasingly detached from household surveys, household surveys have become more comparable across rounds within the same country, allowing predictions from one survey to the next not to deteriorate.

One could speculate that using GDP growth to project forward works less well in times of extreme growth in either direction. This does not appear to be the case.
Even during recessions and periods of high growth, predicting changes from the last survey using GDP growth gives a lower error than predicting poverty rates directly (fig. 7). There are cases, though, where using GDP growth is less attractive. This appears to be the case whenever inflation is above 7 percent, when exports as a fraction of GDP are above 60 percent, and when gross capital formation as a share of GDP is very low or very high. In other words, when real GDP growth is likely to be driven by irregular patterns (a large deflator or large specific components), real GDP growth is a weaker predictor of welfare changes.

Another way to test whether the model using only GDP growth works well in different scenarios is to train and evaluate all models on specific subsets of the data. Concretely, all models have been rerun on cases with poverty rates less than or greater than 4 percent, a split which divides the full sample approximately in half. For the poor sample, predicting changes in poverty rates works better than predicting growth in the mean together with distribution neutrality (fig. S2.10). Yet the model using only a fraction of growth in real GDP per capita performs better than all models (fig. S2.11). Though one may be reluctant to conclude that no machine-learning model would work better on this subsample, it is safe to say that the GDP-based model works particularly well in poor settings.

For the rich sample, the story is different. Here, models that predict poverty rates directly now work best (fig. S2.12). This is partially mechanical given that when only trained on estimates between 0 percent and 4 percent, they can only yield predictions in this interval. When looking at how well the GDP-based model performs, it is not far from the models that predict changes from the last survey, but it is notably less attractive than in the poor sample (fig. S2.13).

4.5. From Nearcasting to Nowcasting

All results so far were based on models constrained to using variables available in the nearcasting year. It is not clear how these models perform for nowcasting for two reasons. First, they rely partially on features that may not be available in the nowcasting year. For the GDP-based model, this does not matter given that GDP growth rates themselves are nowcasted by various institutions. This is not the case for the more complicated models or even models just using growth in HFCE, which to our knowledge is not nowcasted across countries. Depending on how well the complicated models predict when missing values are present, this will make them perform worse for nowcasting and may make the preferred model relatively more advantageous for nowcasting. Indeed, when running all methods earlier in the calendar year (recall that earlier in the calendar year the informational space available for nearcasting approximates that of nowcasting), just using GDP growth becomes relatively more advantageous.

The second reason why the precision of nowcasting estimates may differ from nearcasting estimates is that the features with data available in the nowcasting year are based on modeling, extrapolations, or data for only part of the year. Such data are likely less accurate and ultimately less connected to the welfare distribution. The nowcasted growth rates, for example, have not yet been realized and are likely to deviate from the growth rates that eventually will be estimated by national authorities. Some evidence suggests that nowcasted growth rates by the IMF might be too optimistic (Sandefur and Subramanian 2020).

Figure 8. Error from Using Nowcasted Growth Data
Source: Authors’ estimates based on data from PovcalNet and World Economic Outlook (WEO).
Note: Estimates of how far initial growth forecasts from the WEO are from final estimates, here understood as estimates published four years after the year in question (panel a). For example, the point that intersects the vertical axis at 2 suggests that growth estimates launched in April of a given year that try to predict growth of that year are on average 2 percentage points off the growth estimate for that year released four years later. This does not matter for nowcasting poverty (panel b). The relatively flat curve suggests that whether forecasted, nowcasted, nearcasted, or final growth estimates are used for nowcasting poverty does not impact the accuracy of the predictions. The poverty predictions use the growth data with a separate passthrough rate by income and consumption. The errors on the right-hand side are not comparable to the main results given that the sample of spells with WEO nowcasted growth rates is a subset of the full sample.

One can test the extent to which this matters for the preferred model by looking at how well historical growth nowcasts predicted poverty. Here this is done by gathering all growth nowcasts (and forecasts and nearcasts) from the World Economic Outlook back to 1999, the earliest available. Next it is tested how well these growth nowcasts were aligned with final growth rates, defined here as growth rates estimated four years after the year in question. Finally, these nowcasted growth rates are used to see how well they predict poverty using the preferred model. GDP growth nowcasts (and even more so GDP growth forecasts) differ from the final growth rates (fig. 8, panel a). Though this speaks against using growth data for nowcasting poverty, surprisingly, poverty rates are not predicted worse using GDP growth forecasts, nowcasts, or nearcasts (fig. 8, panel b). One possible way to reconcile these two findings is that the quality of GDP data generally is worse for poorer countries.
This means that though modeled GDP nowcasts may differ from what is estimated by national authorities, they are equally good signals of changes to poverty.

Though the purpose of this paper is not to analyze current patterns of global poverty, one can shed light on the challenges when going from nearcasting to nowcasting poverty by looking at the nowcasted global poverty rates from the models. Looking at the global and regional trends in extreme poverty from 2014 to 2021, the nowcasting year at the time of writing, it stands out that the best model that predicts poverty rates directly deviates for the nowcasting year (fig. 9). This is because the particular model used (gradient boosting) turns out not to work well for 2021, where many of the input variables are missing. Though all models could have been rerun only on variables widely available at the nowcasting year, it is not clear that this would make things better as the model then shifts from one year to the next. For that reason, this model is best used only for nearcasting.

Even for nearcasting, looking closely at the East Asia & Pacific panel reveals that methods trying to predict poverty rates directly might yield unlikely trends. The relatively large increase in the regional poverty rate in 2017 corresponds to the year where survey data for China are no longer available. Prior to 2016, survey estimates for China are used. The jump suggests that the model predicts higher poverty for China than the official estimates. It is hard to support a case for poverty in East Asia increasing rapidly from 2016 to 2017 though. This speaks intuitively in favor of models that are anchored in past estimates.

Figure 9. Nowcasted Global Poverty
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: For countries without any prior data, gradient boosting is used to predict poverty rates directly for all models. Note that the scale of the y-axis differs for each graph. Nowcasts at the country level are available in figs S3.2–S3.6 in the supplementary online appendix.

When comparing the model only using GDP growth and the best model overall, it is evident that for most regions the two are nearly aligned. This makes it unlikely that the nowcasted GDP growth rates differ from other trends observed in nowcasted variables. The exception is South Asia, and as a consequence, the world as a whole, where just using GDP per capita suggests a lower poverty rate for India. In other words, other indicators have progressed less fast in India than growth in real GDP per capita.

4.6. Limitations of Using Growth for Nowcasting and Nearcasting

The past four subsections have shown that a model just using growth in real GDP per capita to shift the mean forward is relatively accurate, unlikely to be substantially improved upon by a simple model using another variable, works in a variety of settings (but is less attractive for rich countries and when the makeup of GDP is irregular), and works for nowcasting as well. Yet using growth in real GDP per capita also comes with possible shortcomings. Here three of those will be mentioned.

First, the tight historical relationship between poverty reduction and growth may in part be due to new welfare aggregates being benchmarked against GDP data. When the World Bank, National Statistical Offices, or others create welfare aggregates, there are many assumptions that need to be made, such as the treatment of outliers and the inclusion of particular components. There are cases where those choices have been partially guided by how the country fared on related indicators, most notably GDP per capita.
If this behavior is prevalent, it could create a mechanical relationship between measured poverty and GDP that need not extend to nowcasting. Second, in some cases GDP relies partially on data from the same survey that is used to measure poverty. This would likewise create a mechanical relationship between the two. Third, if GDP numbers are subject to quality concerns, then they may be less relevant for both nearcasting and nowcasting. The fact that using growth rates in real GDP per capita worked well when trained on historical data suggests that historically this has not frequently been the case (or at least not more so than quality concerns with other indicators). Nonetheless, a large discrepancy was found for India between using growth in real GDP per capita and using more explanatory variables (fig. 9). Though this may simply be because true growth in real GDP per capita is less connected with household consumption in India than in other countries, evidence suggests that GDP growth rates in India in recent years are implausibly high (Subramanian 2019). If this is indeed the case, then this serves as another argument against relying only on growth in real GDP per capita.

5. Conclusion

This paper has analyzed how best to nowcast poverty around the world. Statistical learning techniques were applied to the World Bank’s collection of international poverty estimates, using more than 1,000 development indicators as features. It was investigated whether predicting poverty rates directly yields higher accuracy than predicting poverty indirectly, for example by predicting growth in mean welfare and applying this growth to scale up the past distribution, or by predicting growth in the mean and in the Gini and applying growth incidence curves to scale and stretch the past distribution.
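The indirect approach described above, in which growth is applied to scale the last observed welfare distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy welfare vector, the passthrough fraction of 0.7, the 10 percent growth rate, and the poverty line of 1.90 are all assumptions chosen purely for illustration.

```python
def headcount(welfare, weights, poverty_line):
    """Weighted share of observations with welfare below the poverty line."""
    total = sum(weights)
    poor = sum(w for y, w in zip(welfare, weights) if y < poverty_line)
    return poor / total


def scale_distribution(welfare, gdp_growth, passthrough):
    """Scale every welfare value by a fraction (passthrough) of cumulative
    real GDP per capita growth since the last survey (distribution-neutral)."""
    factor = 1.0 + passthrough * gdp_growth
    return [y * factor for y in welfare]


# Hypothetical inputs (assumptions, not values taken from the paper).
welfare = [1.0, 1.5, 1.8, 2.0, 2.5, 4.0, 6.0, 10.0]  # daily welfare per person
weights = [1.0] * len(welfare)                        # equal survey weights
poverty_line = 1.90                                   # illustrative line
gdp_growth = 0.10                                     # cumulative growth since survey
passthrough = 0.7                                     # illustrative fraction

baseline = headcount(welfare, weights, poverty_line)
nowcast = headcount(
    scale_distribution(welfare, gdp_growth, passthrough), weights, poverty_line
)
print(f"baseline headcount: {baseline:.3f}, nowcast headcount: {nowcast:.3f}")
```

In this toy example the observation at 1.8 moves just above the line after scaling, so the headcount ratio falls from three to two of eight observations. Because every value is multiplied by the same factor, the sketch leaves inequality unchanged; the variant that also predicts the Gini would additionally reshape the distribution.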
Findings revealed that a model that simply uses a fraction of growth in real GDP per capita since the last observed household survey to shift the entire distribution performs better than any model that predicts poverty rates directly using all the variables mentioned above, and nearly as well as all models trying to predict growth in the mean and in the Gini using all the variables mentioned above. This suggests that conditional on knowing growth in real GDP per capita, no other variable can substantially increase the predictive accuracy. On the one hand, given the decades-long literature documenting the importance of growth for poverty reduction (Kraay 2006; Ferreira and Ravallion 2009), this is not surprising. Partially as a result of this literature, variants of the GDP-based model have been used both in the literature and for the World Bank’s official global poverty monitoring. On the other hand, several papers have documented important gaps between national accounts data and household survey data, suggesting that GDP might not be that closely connected to welfare as measured in household surveys (Ravallion 2003; Deaton 2005; Ferreira, Leite, and Ravallion 2010; Pinkovskiy and Sala-i Martin 2016; Deaton and Schreyer 2021; Prydz, Jolliffe, and Serajuddin forthcoming).

One way to reconcile this seemingly contradictory conclusion (that a GDP-based model predicts poverty relatively well despite GDP being disconnected from welfare as measured in household surveys) is to note that no model predicts poverty that well in absolute terms. Though the model using growth rates in GDP per capita performs better than nearly all other models, this does not imply that the model’s error is low. The GDP-based model can explain about 97 percent of the out-of-sample variation in poverty rates, but only about 22 percent of the out-of-sample variation in growth in mean welfare. The models using 1,000+ variables can explain at most 28 percent of growth in mean welfare.
In other words, most of the temporal variation cannot be predicted by any information utilized here.

Why are changes in poverty so hard to predict by any model? Five possible reasons come to mind. First, the available features may not be well suited to predict growth in welfare. Though predictors from different databases were relied upon, including remote sensing data, it is possible that the most important features are contained in other data sources, such as mobile phone data or proprietary remote sensing data. Second, other algorithms or ensemble learning might perform better. Third, only data at the country level were considered; it is possible that models specified at the subnational level perform better. Fourth, measurement error and measurement differences in welfare aggregates over time may make it difficult to predict changes. Issues such as whether a recall or diary method is used, the number of consumption items asked about, the treatment of rent and durables, and differential non-response to surveys vary between and within countries and can have a significant bearing on poverty rates (Jolliffe 2001; Beegle et al. 2012; Islam, Newhouse, and Yanez-Pagans 2021). Accounting for these issues would require a detailed database on the methodology used to construct welfare aggregates, which is not currently available. Finally, the fact that the distributions of many countries have a lot of mass around the international poverty line means that even small changes to the distribution of welfare can yield large changes to the poverty rate. This makes it difficult to predict the poverty rate with great precision.

It may be the case that with different features, more granular data, or different models, it is possible to improve upon the preferred model of this paper.
Yet from the analysis conducted here, it remains the case that a very simple model utilizing growth in real GDP per capita to scale a prior welfare distribution is often an appealing method.

Data Availability Statement

The data underlying this article are available at https://github.com/danielmahler/NowcastingGlobalPoverty.

References

Angrist, N., K. P. Goldberg, and D. Jolliffe. 2022. “Why Is Growth in Developing Countries So Hard to Measure?” Journal of Economic Perspectives 35 (3): 215–42.
Arlot, S., and A. Celisse. 2010. “A Survey of Cross-Validation Procedures for Model Selection.” Statistics Surveys 4: 40–79.
Aruoba, S. B., and F. X. Diebold. 2010. “Real-Time Macroeconomic Monitoring: Real Activity, Inflation, and Interactions.” American Economic Review 100 (2): 20–24.
Beegle, K., J. De Weerdt, J. Friedman, and J. Gibson. 2012. “Methods of Household Consumption Measurement through Surveys: Experimental Results from Tanzania.” Journal of Development Economics 98 (1): 3–18.
Belloni, A., and V. Chernozhukov. 2013. “Least Squares After Model Selection in High-Dimensional Sparse Models.” Bernoulli 19 (2): 521–47.
Bergmeir, C., and J. M. Benítez. 2012. “On the Use of Cross-Validation for Time Series Predictor Evaluation.” Information Sciences 191(15 May): 192–213.
Bergmeir, C., R. J. Hyndman, and B. Koo. 2018. “A Note on the Validity of Cross-Validation for Evaluating Autoregressive Time Series Prediction.” Computational Statistics & Data Analysis 120(April): 70–83.
Bourguignon, F. 2003. “The Growth Elasticity of Poverty Reduction: Explaining Heterogeneity across Countries and Time Periods.” In Inequality and Growth: Theory and Policy Implications, edited by T. Eicher and S. Turnovsky. Cambridge: MIT Press.
Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.
Chi, G., H. Fang, S. Chatterjee, and J. E. Blumenstock.
2022. “Micro-Estimates of Wealth for All Low- and Middle-Income Countries.” Proceedings of the National Academy of Sciences 119 (3): e2113658119.
Crozier, S. L., and F. B. Zavaleta. 2022. “The Marginal Propensity to Consume of 2020 COVID-19 Stimulus Payments in Peru.” International Journal of Economics and Finance 14 (3): 115–15.
Cuaresma, J. C., W. Fengler, H. Kharas, K. Bekhtiar, M. Brottrager, and M. Hofer. 2018. “Will the Sustainable Development Goals Be Fulfilled? Assessing Present and Future Global Poverty.” Palgrave Communications 4 (1): 1–8.
Deaton, A. 2005. “Measuring Poverty in a Growing World (or Measuring Growth in a Poor World).” Review of Economics and Statistics 87 (1): 1–19.
Deaton, A., and P. Schreyer. 2021. “GDP, Wellbeing, and Health: Thoughts on the 2017 Round of the International Comparison Program.” Review of Income and Wealth 68 (1): 1–15.
Deaton, A., and S. Zaidi. 2002. “Guidelines for Constructing Consumption Aggregates for Welfare Analysis.” LSMS Working Paper No. 135. World Bank, Washington, DC.
Drescher, K., P. Fessler, and P. Lindner. 2020. “Helicopter Money in Europe: New Evidence on the Marginal Propensity to Consume across European Households.” Economics Letters 195(October): 109416.
Edward, P., and A. Sumner. 2014. “Estimating the Scale and Geography of Global Poverty Now and in the Future: How Much Difference Do Method and Assumptions Make?” World Development 58(June): 67–82.
Ekhator-Mobayode, U. E., and J. Hoogeveen. 2021. “Microdata Collection and Openness in the Middle East and North Africa.” Policy Research Working Paper 9892. World Bank, Washington, DC.
Ferreira, F. H. G., S. Chen, A. Dabalen, Y. Dikhanov, N. Hamadeh, D. Jolliffe, and A. Narayan et al. 2016. “A Global Count of the Extreme Poor in 2012: Data Issues, Methodology and Initial Results.” Journal of Economic Inequality 14 (2): 141–72.
Ferreira, F. H. G., and P. G. Leite. 2003.
“Policy Options for Meeting the Millennium Development Goals in Brazil: Can Micro-Simulations Help?” Policy Research Working Paper 2075. World Bank, Washington, DC.
Ferreira, F. H. G., P. G. Leite, and M. Ravallion. 2010. “Poverty Reduction without Economic Growth?: Explaining Brazil’s Poverty Dynamics, 1985–2004.” Journal of Development Economics 93 (1): 20–36.
Ferreira, F. H. G., and M. Ravallion. 2009. “Poverty and Inequality: The Global Context.” In The Oxford Handbook of Economic Inequality, edited by W. Salverda, B. Nolan and T. Smeeding. Oxford: Oxford University Press.
Fisk, P. R. 1961. “The Graduation of Income Distributions.” Econometrica: Journal of the Econometric Society 29 (2): 171–85.
Friedman, J. H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 29 (5): 1189–232.
Giannone, D., J. Henry, M. Lalik, and M. Modugno. 2012. “An Area-Wide Real-Time Database for the Euro Area.” Review of Economics and Statistics 94 (4): 1000–13.
Giannone, D., L. Reichlin, and D. Small. 2008. “Nowcasting: The Real-time Informational Content of Macroeconomic Data.” Journal of Monetary Economics 55 (4): 665–76.
Gross, T., M. J. Notowidigdo, and J. Wang. 2020. “The Marginal Propensity to Consume over the Business Cycle.” American Economic Journal: Macroeconomics 12 (2): 351–84.
Hillebrand, E. 2008. “The Global Distribution of Income in 2050.” World Development 36 (5): 727–40.
Hothorn, T., K. Hornik, and A. Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics 15 (3): 651–74.
Iizuka, N., M. Oka, H. Yamada-Okabe, M. Nishida, Y. Maeda, N. Mori, and T. Takao et al. 2003. “Oligonucleotide Microarray for Prediction of Early Intrahepatic Recurrence of Hepatocellular Carcinoma after Curative Resection.” Lancet 361 (9361): 923–29.
Islam, T. T., D. Newhouse, and M. Yanez-Pagans. 2021. “International Comparisons of Poverty in South Asia.” Asian Development Review 38 (1): 142–75.
Jha, S. 2019. “Govt Scraps NSO’s Consumer Expenditure Survey over ‘Data Quality’.” Business Standard, November 6.
Jolliffe, D. 2001. “Measuring Absolute and Relative Poverty: The Sensitivity of Estimated Household Consumption to Survey Design.” Journal of Economic and Social Measurement 27 (1–2): 1–23.
Kakwani, N. 1993. “Poverty and Economic Growth with Application to Côte d’Ivoire.” Review of Income and Wealth 39 (2): 121–39.
Kraay, A. 2006. “When Is Growth Pro-Poor? Evidence from a Panel of Countries.” Journal of Development Economics 80 (1): 198–227.
Lakner, C., D. G. Mahler, M. Negre, and E. B. Prydz. 2022. “How Much Does Reducing Inequality Matter for Global Poverty?” Journal of Economic Inequality 20 (3): 559–585.
Moses, M., H. Kharas, M. Miller-Petrie, G. Tsakalos, L. Marczak, S. Hay, and C. Murray et al. 2021. “Global Poverty and Inequality from 1980 to the COVID-19 Pandemic.” SocArXiv x45np, Center for Open Science.
Pinkovskiy, M., and X. Sala-i Martin. 2016. “Lights, Camera... Income! Illuminating the National Accounts-Household Surveys Debate.” Quarterly Journal of Economics 131 (2): 579–631.
Prydz, E. B., D. M. Jolliffe, C. Lakner, D. G. Mahler, and P. Sangraula. 2019. “National Accounts Data Used in Global Poverty Measurement.” World Bank Group Global Poverty Monitoring Technical Note. World Bank, Washington, DC.
Prydz, E. B., D. M. Jolliffe, and U. Serajuddin. forthcoming. “Disparities in Assessments of Living Standards Using National Accounts and Surveys.” Review of Income and Wealth.
Ravallion, M. 2003. “Measuring Aggregate Welfare in Developing Countries: How Well Do National Accounts and Surveys Agree?” Review of Economics and Statistics 85 (3): 645–52.
———. 2013. “How Long Will It Take To Lift One Billion People Out of Poverty?” World Bank Research Observer 28 (2): 139–58.
Ravallion, M., and S.
Chen. 2003. “Measuring Pro-Poor Growth.” Economics Letters 78 (1): 93–99.
Rubin, D. B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–92.
———. 2004. Multiple Imputation for Nonresponse in Surveys, Vol. 81. Hoboken: John Wiley & Sons.
Sandefur, J., and A. Subramanian. 2020. “The IMF’s Growth Forecasts for Poor Countries Don’t Match Its COVID Narrative.” Working Paper 533. Center for Global Development.
Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. New York: CRC Press.
Stone, M. 1974. “Cross-Validatory Choice and Assessment of Statistical Predictions.” Journal of the Royal Statistical Society: Series B (Methodological) 36 (2): 111–33.
Subramanian, A. 2019. “India’s GDP Mis-Estimation: Likelihood, Magnitudes, Mechanisms, and Implications.” Center for International Development Working Paper Series No. 354. Harvard University.
Sumner, A., and C. Hoy. 2022. “The End of Global Poverty: Is the UN Sustainable Development Goal 1 (Still) Achievable?” Global Policy 12 (4): 419–29.
Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series B (Methodological) 58 (1): 267–88.
Varma, S., and R. Simon. 2006. “Bias in Error Estimation When Using Cross-Validation for Model Selection.” BMC Bioinformatics 7 (1): 1–8.