The World Bank Economic Review, 36(4), 2022, 835–856
                                                                               https://doi.org/10.1093/wber/lhac017
                                                                                                                      Article




                                                                                                                                Downloaded from https://academic.oup.com/wber/article/36/4/835/6750020 by Sectoral Library Rm MC-C3-220 user on 10 December 2023
Nowcasting Global Poverty
Daniel Gerszon Mahler, R. Andrés Castañeda Aguilar,
and David Newhouse
Abstract
This paper evaluates different methods for nowcasting country-level poverty rates, including methods that
apply statistical learning to large-scale country-level data obtained from the World Development Indicators
and Google Earth Engine. The methods are evaluated by withholding measured poverty rates and determining
how accurately the methods predict the held-out data. A simple approach that scales the last observed welfare
distribution by a fraction of real GDP per capita growth performs nearly as well as models using statistical
learning on 1,000+ variables. This GDP-based approach outperforms all models that predict poverty rates
directly, even when the last survey is up to five years old. The results indicate that in this context, the additional
complexity introduced by applying statistical learning techniques to a large set of variables yields only marginal
improvements in accuracy.
JEL classification: C53, D31, I32, O10

Keywords: poverty, nowcasting, machine learning, measurement




1. Introduction
Timely and comparable poverty estimates are vital to assess countries’ development progress. Interna-
tional poverty estimates serve as a public good for researchers and inform the development community
on efforts to meet the first Sustainable Development Goal, to end extreme poverty by 2030. Within in-
ternational development organizations, national development agencies, and NGOs, they also inform the
allocation of resources and the development of strategic priorities.
   Yet timely and comparable estimates of poverty are lacking for many reasons. In some countries,
fragility, conflict, and violence make it difficult to conduct household expenditure surveys altogether, while
in other countries, lack of financial resources is the main obstacle. Even when surveys are frequently con-
ducted, the time it takes to field a survey, collect, process, and analyze the data often implies a two-year

Daniel Gerszon Mahler (corresponding author) is with the World Bank Data Group in Washington, DC. His email address
is dmahler@worldbank.org. R. Andrés Castañeda Aguilar is with the World Bank Data Group in Washington, DC. His
email address is acastanedaa@worldbank.org. David Newhouse is with the World Bank Data Group in Washington, DC. His
email address is dnewhouse@worldbank.org. The research for this article was supported financially by the UK government
through the Data and Evidence for Tackling Extreme Poverty (DEEP) Research Programme and by the World Bank through
a Research Support Budget grant. The authors thank Aart Kraay, Andres Fernando Chamorro Elizondo, Benjamin Stewart,
Benu Bidani, Christoph Lakner, Dean Jolliffe, Lucas Kitzmueller, Marta Schoch, Minh Cong Nguyen, Nishant Yonzan, Nobuo
Yoshida, and Samuel Kofi Tetteh Baah for insightful comments. The authors are also grateful for feedback received during
the special IARIW-World Bank Conference “New Approaches to Defining and Measuring Poverty in a Growing World,”
the CCS-UN Workshop “Nowcasting in International Organizations,” and the 2021 ECINEQ Conference. A supplementary
online appendix is available with this article at The World Bank Economic Review website.

© 2022 International Bank for Reconstruction and Development / The World Bank. Published by Oxford University Press


lag from the time of data collection to the release of international poverty estimates. In some instances,
the data are never publicly released due to quality concerns or lack of data transparency. This was the
case with India’s 2017/18 National Sample Survey (Jha 2019) and is the case regularly in the Middle East
& North Africa (Ekhator-Mobayode and Hoogeveen 2021).




   As a result, the latest published data on poverty often paint an outdated picture of poverty in a country.
As of October 2021, on average across the developing world, the most recent survey with international
poverty data was from 2014. For this reason, initiatives that reliably and cost-effectively predict timely
poverty rates are crucial for informed and effective high-level decision-making.
   The objective of this paper is to test various methods to estimate extreme poverty in all countries of the
world as of the present year (at the time of writing, 2021) and as of the preceding year (2020). The paper
refers to estimates of poverty for the present year as nowcasts and estimates of poverty for the preceding
year as nearcasts. For nearcasting, one can rely on all data that are produced with a one-year time lag.
For nowcasting, one can only rely on variables that themselves have been nowcasted and variables that
are produced with little or no time lag, such as certain remote sensing indicators.
   As possible predictors of poverty, more than 1,000 variables from the World Development Indica-
tors, the World Economic Outlook, and the Google Earth Engine are used. All of these predictors are
combined with the PovcalNet database, which contains more than 2,000 international poverty esti-
mates covering 168 countries. The models are trained on these past estimates—essentially by pretend-
ing a subset of them do not exist—and evaluated by measuring how well they approximate the held-out
estimates.
   Intuitively, to predict extreme poverty around the world, one might first try models that predict poverty
rates directly. Yet this ignores that prior full distributions of consumption or income (henceforth welfare)
are available for most countries. The study explores whether greater accuracy can be obtained by predict-
ing poverty indirectly, by, for example, predicting changes in poverty from the last survey, by predicting
growth in mean welfare and applying this growth to scale up the past distribution, or by predicting
growth in the mean and in the Gini and applying growth incidence curves to scale and stretch the past
distributions.
   Findings suggest that models that predict poverty rates directly are outperformed by models that predict
growth in mean welfare since the last survey and scale the last distribution by this predicted growth.
Though this method assumes that inequality remains unchanged since the last survey, explicitly modeling
distributional changes by predicting changes in the Gini does not help. The reason for this is that of the
1,000+ candidate variables, none of them contains notable information about changes in inequality. The
best performing method overall, which predicts growth in mean welfare using a random forest, gives a
mean absolute deviation of 3.65 percentage points. This means that on average over all countries with
data, the predicted poverty rate evaluated at the international poverty line of $1.90 is 3.65 percentage
points from the truth.
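As an illustration of the error metric described above, the following minimal sketch computes a mean absolute deviation in percentage points. The function name and the four country-year values are hypothetical and are not taken from the paper's data.

```python
def mean_absolute_deviation(predicted, observed):
    """Average absolute gap, in percentage points, between predicted
    and observed poverty rates (both expressed in percent)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    gaps = [abs(p - o) for p, o in zip(predicted, observed)]
    return sum(gaps) / len(gaps)


# Hypothetical held-out poverty rates (percent) for four country-years.
observed = [12.0, 40.5, 3.2, 61.0]
predicted = [10.0, 45.5, 3.0, 58.2]

mad = mean_absolute_deviation(predicted, observed)
print(f"Mean absolute deviation: {mad:.2f} percentage points")
```

Because the metric is an unweighted average of absolute gaps, a country with a 5-point error counts the same regardless of its population or poverty level.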
   A simplified model which just uses a fraction of growth in real GDP per capita to scale up the last
mean gives a mean absolute deviation of 3.69, about 1 percent worse than the overall best performing
model. In other words, conditional on knowing growth in real GDP per capita since the last observed
distribution, no other variable contributes significant information about the evolution of poverty rates.
Even when the last survey is as much as five years old, extrapolating forward using a fraction of growth
in real GDP per capita is superior to predicting poverty rates directly with 1,000+ variables. For longer
extrapolation times, power is lacking to determine which method is superior. The model works well even
when GDP growth rates themselves are nowcasted. For these reasons and due to the simplicity of this
approach, it is considered the preferred method in this paper.
   On the one hand, the relevance of GDP growth for nowcasting poverty is not surprising; the impact
of growth on poverty reduction has been well known for decades (Kraay 2006; Ferreira and Ravallion
2009). On the other hand, ample evidence has found large inconsistencies between consumption measured


in household surveys and national accounts within and across countries (Ravallion 2003; Deaton 2005;
Ferreira, Leite, and Ravallion 2010; Pinkovskiy and Sala-i-Martin 2016; Deaton and Schreyer 2021; Prydz,
Jolliffe, and Serajuddin forthcoming) and noted the difficulty of measuring GDP in developing countries
(Angrist, Goldberg, and Jolliffe 2022).




    Several factors can explain why GDP nonetheless is found to be such an important predictor of poverty.
First, discrepancies between levels of GDP and levels of welfare do not directly affect the accuracy of
the preferred method, which is based on growth rates from the two data sources. Second, the average
discrepancy between growth rates in the two sources is accounted for by only allowing a fraction of growth
in real GDP per capita to “pass through” to welfare as measured in household surveys. The preferred
method has a passthrough rate of about 0.7 for consumption-based poverty estimates, and a passthrough
rate of about 1 for income-based poverty estimates. Third, GDP is probably the statistic in which the global
community has invested the most financing and capacity-building to improve quality and
ensure cross-country comparability (Angrist, Goldberg, and Jolliffe 2022). Despite measurement issues
and shortcomings, one could expect GDP to have more signaling properties for cross-country measures
of average well-being than any other statistic.
    That said, there are cases where the GDP-based model is less attractive. The GDP-based model works
less well for rich countries and for situations where specific components of GDP make up an unusually
large or small share. As such, it is certainly possible that other methods or data can improve upon the
models presented in this paper. This is particularly the case when the objective is to nowcast poverty for
a single country, where relying on microsimulation tools or more granular data will likely be superior as
they can be tailored to country-specific contexts. Yet, as such methods are hard to implement consistently
across many countries, the present exercise can be attractive when the objective is to compare poverty
across a range of countries.
    The findings contribute to the literature by pointing out that when predicting poverty at the national
level, (a) predicting changes from the past distribution is generally superior to predicting poverty rates
directly and (b) conditional on knowing real GDP per capita, other variables including publicly available
remote sensing data generally carry little additional information. These contributions can extend beyond
the problem of nearcasting and nowcasting poverty to any attempt that tries to express global poverty
in a given year. Such attempts also require extrapolating or interpolating between poverty estimates, for
which the findings may be applicable.
    To our knowledge, this is the first paper comparing different ways of nowcasting poverty at a global
scale. Other papers, such as Chi et al. (2022), Cuaresma et al. (2018), and Moses et al. (2021) have
predicted global or near-global poverty rates but not with the purpose of testing different methods. For
other indicators, nowcasting is a more established exercise, such as nowcasting GDP (Giannone, Reich-
lin, and Small 2008), inflation (Aruoba and Diebold 2010), and macroeconomic variables more broadly
(Giannone et al. 2012). Partially due to the Sustainable Development Goal (SDG) target of ending ex-
treme poverty by 2030, many papers have focused on forecasting global poverty rather than nowcasting
it (Hillebrand 2008; Ravallion 2013; Edward and Sumner 2014; Lakner et al. 2022; Sumner and Hoy
2022).


2. Method
This paper employs several methods to predict poverty around the world. This section outlines the meth-
ods, their advantages and disadvantages, and other important methodological choices. Three distinct
issues will be focused upon: (a) the target variable being predicted and (if not poverty rates directly)
how poverty rates are obtained from these predictions, (b) the algorithms used to generate the pre-
dictions, and (c) the evaluation criterion used to judge predictive performance. Throughout this paper,
when poverty rates are referred to, the international poverty line of $1.90 per day (Ferreira et al. 2016)
is used.


Table 1. Target Variables

Target variable                                    Approach for estimating poverty

(1) Poverty rates                                  Predicted directly
(2) Changes in poverty rates                       Apply the predicted change to the most recent poverty rate
(3) Mean welfare                                   Scale the past distribution to the predicted mean welfare
(4) Growth in mean welfare                         Scale the past distribution by 1 + the predicted growth in mean
                                                   welfare
(5) Mean welfare and Gini coefficient              (a) Assume the distribution is log-normal or log-logistic or
                                                   (b) apply a growth incidence curve from the past distribution to
                                                   match the predicted mean and Gini
(6) Growth in mean welfare and Gini coefficient    (a) Apply the predicted growth in the mean and Gini to the past
                                                   mean and Gini and assume that the distribution is log-normal or
                                                   log-logistic or (b) apply a growth incidence curve to the past
                                                   distribution with the predicted growth in the mean and Gini


Source: Authors’ overview.
Note: Six different combinations of target variables that will be used for the predictions with information on how poverty rates will be backed out based on the target
variable(s).




2.1. The Target Variable
When deciding how to predict poverty, it seems intuitive that the target variable to predict should be the
poverty rate in each country. Yet such a method ignores that prior poverty estimates from most countries
exist, from which changes could be predicted. Behind these prior estimates is a full distribution of
welfare which might contain relevant information—for example by distinguishing the near poor and the
rich. Taking advantage of these distributions, this paper uses six different target variable combinations
(table 1) as explained in more detail in what follows.

(1) Poverty Rates and (2) Changes in Poverty Rates. Predicting the poverty rate directly is the most intuitive
and straightforward option; the poverty rates are the ultimate objective of this paper. Predicting poverty
rates directly at the nowcasting year, $t_n$, has the advantage that it can yield estimates for countries without
any previous poverty estimates at all. Predicting changes in poverty from the past survey conducted at time
$t_s$, in contrast, needs some assumptions about poverty levels in countries without data to arrive at global
poverty rates. Yet, by utilizing the past survey, one can exploit that there is a past estimate to anchor the
analysis around. When predicting changes in poverty, $\hat{c}_{\mathit{poverty},t_s,t_n}$, the nowcasted poverty rates are given
by
$$\mathit{poverty}_{t_n}(\hat{c}_{\mathit{poverty},t_s,t_n}) = \mathit{poverty}_{t_s} + \hat{c}_{\mathit{poverty},t_s,t_n}.$$
When predicting changes from the past survey, the annualized change in the poverty rate in percentage
points is used. This avoids having to predict extreme and undefined values, which often occur when
predicting the annualized growth in poverty rates, due to countries with poverty rates close to or at zero
percent.
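The contrast between annualized percentage-point changes and annualized growth rates can be made concrete with a small sketch. All function names and numbers are hypothetical. Multiplying the annualized change by the years elapsed and clamping the result to the 0 to 100 range are assumptions made for this illustration, not steps stated in the text.

```python
def annualized_change_pp(poverty_start, poverty_end, years):
    """Annualized change in the poverty rate, in percentage points."""
    return (poverty_end - poverty_start) / years


def annualized_growth(poverty_start, poverty_end, years):
    """Annualized growth rate of the poverty rate.
    Returns None when the starting rate is zero (growth is undefined)."""
    if poverty_start == 0:
        return None
    return (poverty_end / poverty_start) ** (1.0 / years) - 1.0


def nowcast_from_change(last_poverty, predicted_annual_change, years_since_survey):
    """Add the predicted change to the last observed poverty rate.
    Assumption: the annualized change is scaled by the years elapsed
    and the result is clamped to [0, 100]."""
    nowcast = last_poverty + predicted_annual_change * years_since_survey
    return min(max(nowcast, 0.0), 100.0)


# A country moving from 0.1 to 0.4 percent poverty over three years:
# a small change in percentage points, but an extreme growth rate.
print(annualized_change_pp(0.1, 0.4, 3))
print(annualized_growth(0.1, 0.4, 3))

# Starting from zero poverty, the growth rate cannot be computed at all.
print(annualized_growth(0.0, 0.3, 3))

# Nowcast five years after a survey showing 20 percent poverty.
print(nowcast_from_change(20.0, -0.8, 5))
```

The first example shows why growth rates are awkward targets near zero: a 0.3-point rise corresponds to annual growth of almost 60 percent.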

(3) Mean Welfare and (4) Growth in Mean Welfare. While predicting changes in poverty rates from the
past survey exploits some information available, it still ignores the fact that a whole distribution of welfare
was available in the past. Countries with high density around the poverty line are likely to experience
different magnitudes of changes in poverty than countries with sparse density around the poverty line. By
predicting the mean or growth in the mean and scaling the past distribution to match these predictions,
the model takes full advantage of the previous data in the sense that the entire distribution is leveraged.
Method (3) works by scaling the welfare of each household, h, at the last survey by the ratio of the

Figure 1. Example of Recovery of Poverty Rates from Predictions of Mean Welfare




Source: Authors’ illustration based on data from PovcalNet.
Note: Illustration of how models are implemented when the target variable is mean welfare or growth in mean welfare. This figure uses the last observed distribution
from Botswana from 2016 and shows how that distribution is projected forward by a hypothetical prediction of a growth in mean welfare of 4 percent per year.




predicted mean at the nowcasting year, $\hat{\mu}_{t_n}$, and the observed mean at the last survey, $\mu_{t_s}$. The scaled
distribution is used to estimate poverty at time $t_n$:
$$\mathit{poverty}_{t_n}(\hat{\mu}_{t_n}) = F\left[\mathit{welfare}_{h,t_s}\,\frac{\hat{\mu}_{t_n}}{\mu_{t_s}} < 1.90\right].$$
Similarly, method (4) works by taking the last observed distribution of welfare and scaling it by the growth
in the mean predicted between $t_s$ and $t_n$, $\hat{g}_{\mu,t_s,t_n}$:
$$\mathit{poverty}_{t_n}(\hat{g}_{\mu,t_s,t_n}) = F\left[\mathit{welfare}_{h,t_s}\,(1 + \hat{g}_{\mu,t_s,t_n}) < 1.90\right].$$
A hypothetical example can be constructed using the latest observed distribution for Botswana from 2016
assuming predicted annualized growth in the mean of 4 percent between 2016 and 2021 (fig. 1). By shifting
the distribution to the right reflecting five years of this growth rate, the poverty rate at $1.90 declines from
about 15 percent to 9 percent. Predicting growth in the mean has the advantage that the model can be
applied to any poverty line. Yet it imposes the assumption that all households have experienced the same
growth since the last survey. In other words, it imposes that inequality has not changed since the last
survey.
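Methods (3) and (4) can be sketched in a few lines. The welfare vector below is hypothetical (it is not the Botswana distribution), observations are treated as unweighted, and annual growth is compounded over the years elapsed. These are simplifying assumptions for illustration.

```python
def poverty_rate(welfare, line=1.90):
    """Share (percent) of observations with welfare below the line."""
    return 100.0 * sum(w < line for w in welfare) / len(welfare)


def nowcast_by_growth(welfare, annual_growth, years, line=1.90):
    """Method (4): scale every observation by the compounded growth
    in mean welfare, then recompute the poverty rate."""
    factor = (1.0 + annual_growth) ** years
    return poverty_rate([w * factor for w in welfare], line)


def nowcast_by_mean(welfare, predicted_mean, line=1.90):
    """Method (3): scale every observation by the ratio of the
    predicted mean to the observed mean."""
    observed_mean = sum(welfare) / len(welfare)
    factor = predicted_mean / observed_mean
    return poverty_rate([w * factor for w in welfare], line)


# Hypothetical daily welfare (USD per person) at the last survey.
welfare_last_survey = [0.9, 1.2, 1.5, 1.7, 1.8, 2.1, 2.6, 3.4, 5.0, 9.0]

print(poverty_rate(welfare_last_survey))                # 50.0
print(nowcast_by_growth(welfare_last_survey, 0.04, 5))  # 30.0
```

Because every observation is multiplied by the same factor, relative positions and the Gini coefficient are unchanged, which is the distribution-neutrality assumption noted above. The same scaled vector can be evaluated at any other poverty line by changing `line`.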

(5) Mean Consumption and Gini Coefficient and (6) Growth in Mean and Growth in Gini. The fifth
and sixth methods try to deal with the latter issue by also predicting inequality—either directly or by
predicting growth in inequality since the last survey. A challenge with this approach is that there are many
different measures of inequality and infinitely many ways in which the same level/growth of inequality
can materialize. In this paper, the Gini coefficient is used as the measure of inequality due to its popularity,
together with some further distributional assumptions to pin down how the Gini shapes the distribution.
In particular, two ways of converting Gini predictions into poverty rates are used.
   First, the predicted mean and Gini are employed together with known two-parameter distributional
shapes, the log-normal distribution, which is frequently used for poverty and inequality analysis (see for

Figure 2. Illustration of Log-Normal and Log-Logistic Conversions




Source: Authors’ calculations.
Note: Predicted poverty rate for a given predicted mean welfare, $\hat{\mu}_{t_n}$, and Gini, $\mathit{gini}_{t_n}$, when assuming a log-normal distribution or a log-logistic distribution.



example Bourguignon 2003) and the log-logistic distribution (also known as the Fisk distribution, after
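Fisk 1961).1 These distributions are used to back out the poverty rate given a predicted mean, $\hat{\mu}_{t_n}$, and
predicted Gini, $\mathit{gini}_{t_n}$. Formally, poverty with the log-normal distribution is derived as

$$\mathit{poverty}_{t_n,\mathrm{lognormal}}(\hat{\mu}_{t_n}, \mathit{gini}_{t_n}) = \Phi\left(\frac{\ln(1.9) - \ln(\hat{\mu}_{t_n}) + 2\,[\mathrm{erf}^{-1}(\mathit{gini}_{t_n})]^2}{2\,\mathrm{erf}^{-1}(\mathit{gini}_{t_n})}\right),$$

where $\Phi$ is the standard normal cumulative distribution function, and with the log-logistic distribution as

$$\mathit{poverty}_{t_n,\mathrm{loglogistic}}(\hat{\mu}_{t_n}, \mathit{gini}_{t_n}) = \left[1 + \left(\frac{\hat{\mu}_{t_n}\,\sin(\pi\,\mathit{gini}_{t_n})}{1.9\,\pi\,\mathit{gini}_{t_n}}\right)^{1/\mathit{gini}_{t_n}}\right]^{-1}.$$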

The resulting poverty rates can be plotted as a function of $\hat{\mu}_{t_n}$ and $\mathit{gini}_{t_n}$ (fig. 2).
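The two conversions can be sketched with the Python standard library alone. The function names are hypothetical. The inverse error function is obtained from the standard normal quantile function through the identity erf(x) = 2Φ(x√2) − 1, and the returned poverty rates are shares between 0 and 1 rather than percentages.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def erfinv(y):
    """Inverse error function for 0 < y < 1, via the normal quantile."""
    return _STD_NORMAL.inv_cdf((y + 1.0) / 2.0) / math.sqrt(2.0)


def poverty_lognormal(mean, gini, line=1.9):
    """Poverty share implied by a log-normal distribution with the
    given mean and Gini coefficient."""
    e = erfinv(gini)
    z = (math.log(line) - math.log(mean) + 2.0 * e ** 2) / (2.0 * e)
    return _STD_NORMAL.cdf(z)


def poverty_loglogistic(mean, gini, line=1.9):
    """Poverty share implied by a log-logistic (Fisk) distribution
    with the given mean and Gini coefficient."""
    ratio = mean * math.sin(math.pi * gini) / (line * math.pi * gini)
    return 1.0 / (1.0 + ratio ** (1.0 / gini))


# Hypothetical prediction: mean welfare of $5 per day and a Gini of 0.40.
print(poverty_lognormal(5.0, 0.40))
print(poverty_loglogistic(5.0, 0.40))
```

A useful sanity check is that each function returns exactly one half when the poverty line equals the median of the assumed distribution, and that poverty falls as the mean rises with the Gini held fixed.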
   The other method applies a specific growth incidence curve (GIC) from the last observed distribu-
tion. GICs plot the growth in welfare as a function of the percentile p of the initial welfare distribution
(Ravallion and Chen 2003). Downward-sloping GICs reduce inequality and vice versa. Evidence shows
that GICs often take on approximately linear and convex forms (Kakwani 1993; Ferreira and Leite 2003;
Lakner et al. 2022). By imposing particular functional forms on the GICs, given a predicted mean welfare
and predicted Gini, there is only one possible GIC. When using GICs, the nowcasted poverty rates are
backed out as follows:
$$\mathit{poverty}_{t_n,\mathrm{GIC}}(\hat{\mu}_{t_n}, \mathit{gini}_{t_n}) = F\left[\mathit{welfare}_{h,p,t_s}\,\bigl(1 + g_{p,t_s,t_n}(\hat{\mu}_{t_n}, \mathit{gini}_{t_n})\bigr) < 1.9\right],$$

where $g_{p,t_s,t_n}(\hat{\mu}_{t_n}, \mathit{gini}_{t_n})$ are percentile-specific growth rates, which are determined such that the resulting
distribution matches the predicted mean and Gini. In addition, for the linear GIC, there is a requirement
that $g_{p,t_s,t_n} = \beta + \delta p$, while for the convex GIC, there is a condition that
$g_{p,t_s,t_n} = (1 - \alpha)(1 + \gamma) - 1 + [\alpha(1 + \gamma)\mu_{t_s}]/\mu_{p,t_s}$. Here $\beta$ and $\delta$ or $\alpha$ and $\gamma$ are parameters that are estimated to ensure that the
equations hold (Lakner et al. 2022).
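The convex GIC can be illustrated with a short sketch. It reads $\mu_{p,t_s}$ as the welfare level at percentile p (an interpretation, since the text does not define it) and treats each unweighted observation as its own percentile. Under that simplification, applying the convex GIC turns welfare $w$ into $(1+\gamma)[(1-\alpha)w + \alpha\mu_{t_s}]$, so the mean is multiplied by $1+\gamma$ and the Gini by $1-\alpha$, and both parameters have closed-form solutions. The function names and data are hypothetical, and this is not necessarily how the parameters are estimated in the paper.

```python
def gini(values):
    """Gini coefficient of an unweighted sample."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


def apply_convex_gic(welfare, target_mean, target_gini):
    """Transform welfare so that its mean and Gini match the targets,
    using the convex GIC form with closed-form alpha and gamma."""
    mu = sum(welfare) / len(welfare)
    gamma = target_mean / mu - 1.0                 # growth in the mean
    alpha = 1.0 - target_gini / gini(welfare)      # proportional Gini reduction
    return [(1.0 + gamma) * ((1.0 - alpha) * w + alpha * mu) for w in welfare]


# Hypothetical last-survey distribution (USD per person per day).
welfare = [0.9, 1.2, 1.5, 1.7, 1.8, 2.1, 2.6, 3.4, 5.0, 9.0]
mean_now = (sum(welfare) / len(welfare)) * 1.04 ** 5   # 4 percent annual growth, five years
gini_now = 0.95 * gini(welfare)                        # Gini 5 percent lower

nowcast = apply_convex_gic(welfare, mean_now, gini_now)
share_poor = sum(w < 1.9 for w in nowcast) / len(nowcast)
print(share_poor)
```

The linear GIC has no comparably simple closed form, so its two parameters would typically be found numerically.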
    The most recent survey from Botswana in 2016 can again be used as the starting point to illustrate how
the two GICs impact the shape of the nowcasted distribution (fig. 3). Assuming that the nowcasted mean
has grown by 4 percent annually since the last survey and that the nowcasted Gini is 5 percent lower than

1    Only two-parameter distributions are used given that only two quantities—the mean and the Gini—are predicted. Under
     this setup, it is not possible to arrive at a unique solution for three-parameter distributions. For further work it would be

Figure 3. Illustration of Growth Incidence Curve Conversions




Source: Authors’ illustration based on data from PovcalNet.
Note: Examples of a linear and convex GIC, here applied to the last survey in Botswana from 2016.



the last observed Gini, the two GICs will result in different predicted poverty rates as they distribute the
growth differently along the distribution.
   The above equations pertain to method (5). Analogous equations can be made with method (6) where
the target variables are the growth in the mean and Gini. Throughout the analysis, growth in the Gini,
rather than changes in the Gini, is predicted based on preliminary analysis comparing the performance of
those two options.
   Particular attention will be paid to a submethod under the fourth category, which is a variant of what
the World Bank uses to extrapolate poverty in countries and report on Sustainable Development Goal
indicator 1.1.1—to end extreme poverty by 2030. The method is based on the premise that there is a tight
relationship between income or expenditure measured in national accounts and income or consumption
observed in household surveys. It works by taking the last observed distribution of welfare and scaling
the welfare of each household by the growth observed in real GDP per capita from national accounts
between the survey and the nowcasting year.2 It assumes that growth observed in national accounts is
fully “passed through” to the welfare observed in household surveys and that the only factor informative
for changes in poverty is growth in national accounts. In addition, it assumes that growth accrues to
everyone equally, that is, without changing the distribution of welfare. This is problematic if growth was
pro-poor or pro-rich in the intervening period.
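A minimal sketch of this GDP-based extrapolation follows. Setting `passthrough=1.0` corresponds to the full pass-through assumption described above, while values below one (such as the rate of about 0.7 reported in the introduction for consumption-based estimates) let only a fraction of growth reach survey welfare. Applying the fraction to each year's growth rate, the unweighted observations, and the data themselves are assumptions made for illustration.

```python
def nowcast_with_gdp(welfare, gdp_growth_by_year, passthrough=1.0, line=1.90):
    """Scale the last observed welfare distribution by real GDP per
    capita growth and return the poverty rate in percent.

    gdp_growth_by_year: annual growth rates between the survey year
    and the nowcasting year, e.g. 0.03 for 3 percent.
    passthrough: fraction of each year's growth passed to welfare.
    """
    factor = 1.0
    for growth in gdp_growth_by_year:
        factor *= 1.0 + passthrough * growth
    scaled = [w * factor for w in welfare]
    return 100.0 * sum(w < line for w in scaled) / len(scaled)


# Hypothetical last-survey distribution and five years of 3 percent growth.
welfare_last_survey = [0.9, 1.2, 1.5, 1.7, 1.8, 2.1, 2.6, 3.4, 5.0, 9.0]
growth_path = [0.03] * 5

print(nowcast_with_gdp(welfare_last_survey, growth_path, passthrough=1.0))  # 30.0
print(nowcast_with_gdp(welfare_last_survey, growth_path, passthrough=0.7))  # 40.0
```

In this toy example, lowering the pass-through rate leaves one more observation below the line, which shows how sensitive the nowcast can be to that single parameter when many households sit close to the poverty line.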
   The methods covered are obviously not comprehensive, and other ways of predicting poverty rates
exist. Hopefully, the methods chosen cover both a mix of the most intuitive options and those that have
been applied in prior work.

2.2. Algorithms
In order to predict any of the target variables, a number of frequently used machine-learning algorithms
are relied upon, particularly the lasso (Tibshirani 1996), the post-lasso (Belloni et al. 2013), CART random


    interesting to see whether predicting another outcome, for example the median, and applying three-parameter distribu-
    tions improves upon what is found here.
2   The World Bank uses Household Final Consumption Expenditure (HFCE) whenever available and GDP otherwise, with
    the exception of countries in Sub-Saharan Africa where only GDP is used (Prydz et al. 2019). Here, focus is on GDP since
    HFCE nowcasts are not available for many countries.


forests (Breiman 2001), conditional inference random forests (Hothorn, Hornik, and Zeileis 2006), and
gradient boosting (Friedman 2001). These methods all have in common that they can predict the outcome
variable of interest while being agnostic about which variables are relevant for the predictions.
   Since the features used suffer from a lot of missing data, it is necessary to find a strategy to deal with this




missingness. Simply deleting rows with missing values is not feasible as this would leave no or very few
observations. For the conditional inference random forests and gradient boosting, imputation methods
embedded in the algorithms to deal with missing data are relied upon. For conditional inference random
forests, for example, the algorithm works by sequentially splitting the sample into two based on a variable
deemed most predictive of the target variable. The algorithm might judge that a decline in
the share of male workers in agriculture is predictive of growth in the mean. For country-years without
data on the share of male workers in agriculture, the algorithm will search for the most similar variable
in terms of how it relates to the target variable, which in this case could be the share of all workers in
agriculture, and split the observations with missing values in the first variable based on the latter.
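The fallback described above resembles what tree algorithms call surrogate splits. The following is a minimal sketch, assuming each candidate variable has already been reduced to a left/right assignment; the function names and the example variables are hypothetical, and the rule used by the conditional inference forest software may differ.

```python
def pick_surrogate(primary, candidates):
    """Return the name of the candidate split that agrees most often with the primary split.

    primary: list of True/False (left/right), or None where the primary variable is missing.
    candidates: dict mapping a variable name to a list of True/False/None.
    """
    best_name, best_agreement = None, -1.0
    for name, split in candidates.items():
        # Compare only rows where both variables are observed.
        pairs = [(p, s) for p, s in zip(primary, split) if p is not None and s is not None]
        if not pairs:
            continue
        agreement = sum(p == s for p, s in pairs) / len(pairs)
        if agreement > best_agreement:
            best_name, best_agreement = name, agreement
    return best_name


def fill_with_surrogate(primary, surrogate):
    """Use the surrogate's assignment wherever the primary one is missing."""
    return [s if p is None else p for p, s in zip(primary, surrogate)]


# Hypothetical example: the primary split is on male agricultural employment.
male_agri = [True, True, False, None, False, None]
candidates = {
    "all_agri": [True, True, False, True, False, False],
    "rainfall": [False, True, True, False, False, True],
}
best = pick_surrogate(male_agri, candidates)
assignment = fill_with_surrogate(male_agri, candidates[best])
```

In this toy example the split on all agricultural workers reproduces the primary split on every jointly observed row, so it is used to place the two country-years with missing data.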
   Such methods of dealing with missing values are not possible or not available in the programs used here
for the lasso, post-lasso, and CART random forests. Instead, the entire data set of features will be multiply
imputed to avoid missing values altogether (Rubin 1976, 2004; Schafer 1997). For each imputation a
predicted value is calculated, and these predictions are then averaged to obtain a final estimate.
Although multiple imputation could also be used for the other methods, it adds quite a bit of computing
time, so it is only used where necessary.
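The averaging step can be sketched as follows. This is an illustration only: the feature names, the toy prediction rule, and the number of imputations are assumptions, not the models or settings used in the paper.

```python
def pooled_prediction(imputed_datasets, predict):
    """Average the predictions obtained from each imputed version of the features."""
    predictions = [predict(features) for features in imputed_datasets]
    return sum(predictions) / len(predictions)


def toy_model(features):
    # Hypothetical prediction rule standing in for a fitted lasso or forest.
    return 0.7 * features["gdp_growth"] + 0.5 * features["employment_growth"]


# Three imputed versions of one country-year; only the missing feature varies.
imputations = [
    {"gdp_growth": 0.02, "employment_growth": 0.010},
    {"gdp_growth": 0.02, "employment_growth": 0.014},
    {"gdp_growth": 0.02, "employment_growth": 0.006},
]
estimate = pooled_prediction(imputations, toy_model)
```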

2.3. Evaluation of Performance
In order to tune the algorithms listed above, compare the performance of the different approaches for
nowcasting poverty (table 1), and report the final out-of-sample errors, this paper relies on nested five-
fold cross validation (Stone 1974; Iizuka et al. 2003; Varma and Simon 2006). Intuitively, nested cross
validation works by iteratively splitting the sample into three different subsamples: a training subsample
on which a particular machine-learning method is run multiple times using various parameters, a testing
subsample on which the predictions from these various parameter options are compared against each
other and the best performer is selected, and a validation subsample on which the best performers across
machine-learning methods and target variable options are compared against each other and the final out-
of-sample errors are reported. The data points going into these three subsamples are reassigned multiple
times to maximize the power of the data. Nested cross validation is an extension of regular k-fold cross
validation that reduces two biases of the latter: a bias towards selecting a model with a large tuning grid and a
downward bias in the final out-of-sample errors obtained. Supplementary online appendix S1 contains
a more thorough and technical discussion of nested cross validation.
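To make the three-way split concrete, the sketch below generates the index sets for nested cross validation. It is a simplified illustration under stated assumptions: folds are formed by striding rather than random assignment, the inner loop also uses five folds, and any grouping of surveys by country is ignored. The function names are hypothetical.

```python
def k_folds(indices, k):
    """Split a list of indices into k folds by striding (no shuffling)."""
    return [indices[i::k] for i in range(k)]


def nested_splits(n_obs, k_outer=5, k_inner=5):
    """Yield (training, testing, validation) index lists for nested cross validation."""
    outer_folds = k_folds(list(range(n_obs)), k_outer)
    for v, validation in enumerate(outer_folds):
        # Everything outside the validation fold is available for tuning.
        remaining = [i for f, fold in enumerate(outer_folds) if f != v for i in fold]
        for testing in k_folds(remaining, k_inner):
            held_out = set(testing)
            training = [i for i in remaining if i not in held_out]
            yield training, testing, validation


splits = list(nested_splits(25))
```

Parameters would be tuned by fitting on each training set and comparing errors on the matching testing set; the validation set is touched only once the tuning for that outer fold is finished.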
   Once nested five-fold cross validation is complete, there will be one out-of-sample estimate of the poverty
rate for each household survey (except for the earliest one for each country, as the methods that rely on
changes since the last data point will not work in those cases) for each machine-learning method, and for
each target variable. To evaluate the performance of the various methods, the primary loss function will
be the mean absolute deviation between the predicted and true poverty rates. To ensure that the selected
model does not only work well for a few countries that happen to have many poverty estimates, each
country is weighted by the inverse of its number of poverty estimates, such that the total weight for each
country equals 1.
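The country weighting can be illustrated with a short sketch. The function name and the example numbers are hypothetical; poverty rates are in percentage points.

```python
from collections import Counter


def weighted_mad(countries, predicted, actual):
    """Mean absolute deviation in which every country carries a total weight of 1."""
    counts = Counter(countries)
    # Each estimate is weighted by the inverse of its country's number of estimates.
    weights = [1.0 / counts[c] for c in countries]
    weighted_errors = sum(w * abs(p - a) for w, p, a in zip(weights, predicted, actual))
    return weighted_errors / sum(weights)


# Country A has three surveys with errors of 1, 2, and 3; country B has one with an error of 4.
countries = ["A", "A", "A", "B"]
predicted = [10.0, 10.0, 10.0, 20.0]
actual = [11.0, 12.0, 13.0, 24.0]
loss = weighted_mad(countries, predicted, actual)
```

Here the unweighted mean absolute deviation would be 2.5, whereas the weighted version averages country A's mean error of 2 with country B's error of 4.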
   The mean absolute deviation is used rather than the mean squared error since the objective is to min-
imize the deviations between the true and predicted poverty rates while giving equal weight to all devia-
tions. Using the mean squared error tends to give more weight to the prediction of outliers. This can be
problematic since data incomparabilities can create some strong outliers at times, and the methods should
not be judged by how well they predict these outliers. Percentage point deviations are used rather than
percentage deviations since the latter sometimes can be very large for countries with low poverty rates.
If a country has a poverty rate of 1 percent and a model predicts a poverty rate of 2 percent, the error in
percentage terms would be 100 percent, which would give this observation a large impact. The focus on
percentage point deviations implicitly places greater weight on countries with high poverty rates.
    More traditional goodness-of-fit measures such as the AIC or BIC cannot be used, given that these
cannot be computed for all models. The use of the mean absolute deviation, however, largely accomplishes
the same as these traditional goodness-of-fit measures. The measure of performance used here evaluates
the fit out of sample, rather than evaluating it in sample and penalizing for model complexity.


3. Data
All poverty estimates used in this paper come from PovcalNet, which contains the World Bank’s official
country-level, regional, and global estimates of poverty. Most of the data in PovcalNet come from the
Global Monitoring Database, which is the World Bank’s repository of harmonized multitopic income
and expenditure household surveys used to monitor global poverty. PovcalNet contains more than 2,000
surveys from 168 countries covering 98 percent of the world’s population. Information on the range and
number of surveys by country is available in the supplementary online appendix (tables S3.1–S3.4). The
data available in PovcalNet are standardized as far as possible but differences exist with regard to the
method of data collection and whether the welfare aggregate is based on income or consumption. By
relying on PovcalNet, there is consistency with the official numbers used by the World Bank and United
Nations for monitoring poverty and inequality.
   To predict poverty, variables from three databases are relied upon. First, we use the World Bank’s
World Development Indicators (WDI), which contain country-year information on nearly 1,300 variables
covering a wide range of topics, such as health, agriculture, education, climate change, infrastructure, and
more (but as explained below, only a subset of these can be used for the present exercise). Second, we use
the IMF’s World Economic Outlook (WEO) database, which contains country-year information on about
50 variables related to macroeconomic outcomes, such as inflation, government debt, and unemployment.
Third, we use remote sensing data from the Google Earth Engine, particularly data on nighttime lights,
rainfall, land surface temperature, impervious surface, cropland, and normalized difference vegetation,
snow, and water indices. In contrast to WEO and WDI, the remote sensing data are both more granular
spatially and more frequent temporally. To fit into this exercise, they need to be aggregated to the country-
year level. They are first aggregated to annual data by calculating the mean, max, min, and standard
deviation of each location over a year. Afterwards, they are aggregated spatially by taking the mean, max,
min, and standard deviation of the annual data for a country. This gives 16 features for each type of
variable. Some of these combinations will not be relevant for global poverty nowcasting, but all of them
are included here to remain agnostic about which ones are relevant.
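The two-step aggregation that yields 16 features per remote sensing variable can be sketched as follows. The feature naming scheme and the example data are hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev

# The same four statistics are applied first over time and then over space.
STATS = {"mean": fmean, "max": max, "min": min, "sd": pstdev}


def country_year_features(series_by_location):
    """Collapse within-year series for each location into 4 x 4 = 16 country-year features."""
    # Step 1: summarize each location's series over the year.
    annual = {
        t_name: [t_stat(series) for series in series_by_location.values()]
        for t_name, t_stat in STATS.items()
    }
    # Step 2: summarize each annual statistic across locations in the country.
    return {
        f"{s_name}_of_{t_name}": s_stat(values)
        for t_name, values in annual.items()
        for s_name, s_stat in STATS.items()
    }


# Hypothetical nighttime-lights readings for two locations in one country-year.
lights = {"pixel_a": [1.0, 2.0, 3.0], "pixel_b": [3.0, 4.0, 5.0]}
features = country_year_features(lights)
```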
   For all of the variables from the three sources above, the annualized growth rates of the variables
between two household surveys for a country are also calculated. For all variables that are expressed as
percentages, rates, or indices, the annualized changes between two household surveys for a country are
calculated as well. The only variables removed from what is described above are (a) variables with more
than 90 percent missing information in 2020, the nearcasting year at the time of writing, (b) variables
with more than 90 percent missing information for country-years with poverty estimates, and (c) variables
that are not comparable between countries, such as variables expressed in local currency units. The first
two criteria reduce computation time by removing variables with too little information to be relevant.3
The third criterion removes variables where exploiting cross-country variation is not meaningful.
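The annualized growth rates and annualized changes between two household surveys could be computed along the following lines. The text does not spell out the formulas, so compound growth and a simple per-year difference are assumptions here, as are the function names.

```python
def annualized_growth(x_start, x_end, year_start, year_end):
    """Compound annual growth rate of a variable between two survey years."""
    years = year_end - year_start
    return (x_end / x_start) ** (1.0 / years) - 1.0


def annualized_change(x_start, x_end, year_start, year_end):
    """Average change per year, for variables expressed as percentages, rates, or indices."""
    return (x_end - x_start) / (year_end - year_start)


# A level variable rising from 100 to 121 over two years grows about 10 percent a year.
growth = annualized_growth(100.0, 121.0, 2010, 2012)
# A rate falling from 30 to 24 over three years changes by -2 points a year.
change = annualized_change(30.0, 24.0, 2010, 2013)
```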
3   An alternative to the first two criteria is to use lagged values whenever a variable has missing information before applying
    the 90 percent thresholds. Applying this method improved the models that predict poverty rates directly and had a
    negligible impact on the models that predict from the last survey, and hence could be an alternative way of partially
    dealing with missing values.


    Depending on which month of the year this exercise is carried out, the removal of variables with
more than 90 percent missing in the nearcasting year removes a large fraction of the WDI. The reason
is that early in the year, most variables do not yet have information for the prior year, meaning that the
information used for nearcasting is not much better than the information that can be used for nowcasting.
When conducted in July (as the current exercise is), only about 300 WDI variables meet the criterion. All
in all, there are a bit more than 1,000 features across the various data sources.
    Not all poverty trends within countries are comparable over time due to changes in the survey method-
ology or the welfare aggregate. This matters for predictions of changes/growth in a target variable. Even
if the exact causes of poverty in a country are known, if welfare aggregates are not comparable, then no
model will be able to predict the changes in poverty between two surveys. Though this also makes it more
difficult to predict levels in the target variables, the problem is arguably larger when predicting changes.
Though the sample can be restricted to comparable spells only, the main results will cover all data points,
even those that are not comparable over time. As one of the findings will be that nowcasting poverty by
predicting changes from a past distribution gives more accurate results than predicting levels directly, the
decision to include non-comparable spells makes this result more conservative, as it precisely penalizes
the methods found to be superior. A robustness check restricts the data to comparable spells.


4. Results
4.1. Evaluation of the Performance of the Models
Figure 4 shows the prediction errors across the six combinations of target variables (table 1) by each of the
five machine-learning algorithms used. Regardless of which target variable(s) are used, all the predictions
are turned into poverty rates and the errors are evaluated over poverty rates. Hence, they are comparable
to each other. As one example, the very first bar shows that when predicting poverty rates directly using a
CART random forest (abbreviated as carf), then the predicted poverty rate is on average 4.99 percentage
points off the truth.
   Each bar in panels 5 and 6 of fig. 4 has several possible realizations depending on which growth
incidence curve or distributional assumption is used to convert means and Ginis into poverty rates.
Figure S2.1 in the supplementary online appendix shows the best way of converting predictions of
growth/levels in the mean and Gini into poverty rates. The best way of converting predicted levels of
the Gini and mean tends to be by using a log-normal distribution, while the best way of converting pre-
dictions of growth in the mean and Gini tends to be by applying a linear growth incidence curve. Figure 4
shows the best performing options from fig. S2.1 in panels 5 and 6.
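As an illustration of the log-normal conversion, the sketch below uses two standard properties of the log-normal distribution: the Gini equals 2Φ(σ/√2) − 1, and the mean equals exp(μ + σ²/2). The function name and the example values are hypothetical, and the paper's own implementation may differ in detail.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def lognormal_poverty_rate(mean, gini, poverty_line):
    """Share of the population below the poverty line under a log-normal assumption."""
    std_normal = NormalDist()
    # Invert Gini = 2 * Phi(sigma / sqrt(2)) - 1 to recover sigma.
    sigma = sqrt(2) * std_normal.inv_cdf((gini + 1) / 2)
    # Invert mean = exp(mu + sigma^2 / 2) to recover mu.
    mu = log(mean) - sigma ** 2 / 2
    return std_normal.cdf((log(poverty_line) - mu) / sigma)


# Hypothetical predicted mean (same units as the poverty line) and predicted Gini.
rate = lognormal_poverty_rate(mean=5.0, gini=0.4, poverty_line=2.0)
```

Holding the Gini fixed, a higher predicted mean lowers the implied poverty rate, which is how a predicted level of the mean and Gini is turned into a poverty estimate.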
   Generating poverty rates by predicting from the last observed survey (right column of fig. 4) gives lower
errors than predicting poverty rates or the mean (and Gini) without accounting for changes from the last
survey (left column). On the one hand, this is intuitive; the direct models do not account explicitly for past
patterns in a country. On the other hand, given that surveys within countries often are incomparable and
often have many years between them, lagged information need not provide much additional information.
   Among the different ways of predicting from the last survey, predicting changes in poverty rates (panel
2 of fig. 4) performs worse than predicting growth in the mean or mean and Gini (panels 4 and 6).
Predicting growth in the Gini along with growth in the mean tends to slightly increase the error. In other
words, assuming no changes to inequality works slightly better than trying to predict changes in inequality
since the last survey.4 The best performing method is a conditional inference random forest which scales
the past distribution by a predicted growth in the mean. This method has an error of 3.65 percentage
points.

4   This is a slightly counterintuitive result which may emerge because the distributional assumptions or growth incidence
    curves imposed are only one way in which a given Gini coefficient can be implemented.

Figure 4. Performance of Machine Learning Models
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: carf = CART random forest, cirf = conditional inference random forest, grbo = gradient boosting, plas = post-lasso, rlas = regular lasso. Panels 5 and 6 show
the best performing options from fig. S2.1.




4.2. Comparison to Models Just Using GDP Growth
It is interesting to compare the best performing machine-learning models with three variants of the model
that simply scales the last distribution up according to growth in real GDP per capita (fig. 5). First, we use
a model which assumes all growth from GDP per capita passes through to growth in welfare. Second, we
use a model which only allows a fraction of GDP per capita growth to pass through to the distribution
and where this fraction is estimated through a simple linear regression. The fraction turns out to be 79
percent. This follows empirical evidence showing that on average, only a fraction of growth in national
accounts trickles down to household surveys (Ravallion 2003; Deaton 2005; Pinkovskiy and Sala-i-Martin
2016; Lakner et al. 2022; Prydz, Jolliffe, and Serajuddin forthcoming). Third, we use a model where the
passthrough rate is estimated separately by consumption and income welfare aggregates. This is motivated
by the fact that this interaction was the first variable to enter in the lasso when predicting growth in the
mean. With this method, 71 percent of growth is estimated to pass through to consumption aggregates
while 97 percent is estimated to pass through to income aggregates.
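The GDP-based models above can be sketched in a few lines. This is a simplified illustration: the welfare vector is unweighted, the growth rate is taken as cumulative growth since the last survey, and the function name, poverty line, and data are hypothetical. Setting the passthrough to 1 gives the first variant, and 0.79 is the pooled fraction reported above.

```python
def nowcast_poverty_rate(welfare, gdp_growth, passthrough, poverty_line):
    """Scale every welfare value by a fraction of GDP per capita growth and recount the poor."""
    factor = 1.0 + passthrough * gdp_growth
    scaled = [w * factor for w in welfare]
    # Scaling all values by one factor leaves inequality unchanged.
    return sum(w < poverty_line for w in scaled) / len(scaled)


# Hypothetical last observed distribution (daily welfare per person).
last_survey = [1.0, 1.5, 1.8, 2.5, 4.0]
baseline = nowcast_poverty_rate(last_survey, gdp_growth=0.0, passthrough=0.79, poverty_line=1.9)
nowcast = nowcast_poverty_rate(last_survey, gdp_growth=0.10, passthrough=0.79, poverty_line=1.9)
```

With 10 percent cumulative growth and a 79 percent passthrough, welfare is scaled by 1.079, which lifts the third household above the hypothetical line and lowers the poverty rate from 60 to 40 percent.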
    These models are also compared with two models that help interpret how large the errors are. First,
the predictions are compared to predictions one would get if one were perfectly able to predict the mean.
This is done by shifting a distribution from the beginning of a spell such that it matches the mean of the
distribution at the end of the spell. It represents the lowest possible prediction error of using method (3)
or (4). Second, we use a method which simply uses the lagged poverty rate as the prediction. This scenario

Figure 5. Comparing Best Performing Methods to Models Only Using GDP Growth
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: Errors of seven different models. The best performing method takes the minimum bar from fig. 4 while the best method predicting poverty rates directly takes
the minimum bar of panel 1 of fig. 4. Using only GDP growth to shift the mean refers to predicting poverty by adjusting mean welfare by the growth in real GDP per
capita. The rightmost column reflects a hypothetical scenario of the errors one would get if one were perfectly able to predict growth in the mean.




is intended to shed light on whether nowcasting is a worthwhile exercise or whether one could simply use
the latest official poverty rates as proxies for nowcasts.
    The best performing machine-learning method only reduces the error by 0.12 percentage points over
just using GDP growth, only by 0.05 percentage points if a passthrough rate is used, and only by 0.04
percentage points if an income-/consumption-specific passthrough rate is used. This means that just using GDP per capita
growth to nearcast distributions is nearly as accurate as any method using more than 1,000 variables and
complex machine-learning methods. This is not because none of the methods work well—if one simply
used the latest official poverty rate as the nowcast, then the error would increase by nearly 1 percentage point.
Rather, it is because nearly all of the variation in growth in mean consumption that can be explained with
these 1,000 variables can be explained by growth in GDP per capita. If it were possible to predict all the
variation in growth in mean consumption, then the error would nearly halve to 1.91 percentage points.
Using growth in GDP per capita to shift the mean gives a better performance than any model trying to
predict poverty rates directly using 1,000 variables.
    In the supplementary online appendix it is shown that these results hold if only comparable spells are
used (figs S2.2 and S2.3). If the root mean squared error is used instead of the mean absolute deviation,
predicting poverty rates directly with gradient boosting is relatively more attractive (figs S2.4 and S2.5).
Once more, however, the potential gain in accuracy of any machine-learning model hardly merits the extra
complexity it adds.
    The supplementary online appendix also contains results on the share of poverty trends correctly pre-
dicted. This is a relevant loss function if one cares about whether the situation is improving or worsening
and less about the poverty rate itself. With this loss function it is possible to break down the error into
trends that are incorrectly predicted as improvements and trends incorrectly predicted as deteriorations.

Figure 6. Variables Important for Predicting Growth in Mean Welfare
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: Top 10 most important features from the conditional inference random forest. For a particular feature, the value is calculated by permuting the feature in the
data set and calculating how much the prediction error increases. The importance measure is standardized such that the feature with the highest value gets a value of
1. For completeness, the features important for the other 5 target variables are shown in figs S2.8 and S2.9 in the supplementary online appendix.




The results with this loss function do not alter the main findings: the models that predict growth in the
mean predict most of the trends correctly—around 4 out of 5 trends to be exact (figs S2.6 and S2.7). Yet, if
one wants to minimize the share of trends incorrectly predicted as a decline, these models are no longer the
best performing. Some of the models that predict poverty rates directly incorrectly predict trends
as a decline half as frequently, but at the cost of incorrectly predicting increases in poverty 5–8 times
as frequently.
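The trend-based loss function and its breakdown can be sketched as below. The handling of ties (no change counted as a non-decline) and the names used are assumptions made for illustration.

```python
def trend_shares(previous, predicted, actual):
    """Share of poverty trends predicted correctly, and the two kinds of error."""
    counts = {"correct": 0, "false_decline": 0, "false_increase": 0}
    for prev, pred, act in zip(previous, predicted, actual):
        predicted_decline = pred < prev
        actual_decline = act < prev
        if predicted_decline == actual_decline:
            counts["correct"] += 1
        elif predicted_decline:
            # Poverty predicted to fall when it did not.
            counts["false_decline"] += 1
        else:
            # Poverty predicted to rise (or stay flat) when it fell.
            counts["false_increase"] += 1
    n = len(previous)
    return {name: count / n for name, count in counts.items()}


# Hypothetical poverty rates: last survey, model prediction, and measured value.
shares = trend_shares(
    previous=[10.0, 10.0, 10.0, 10.0],
    predicted=[8.0, 12.0, 9.0, 11.0],
    actual=[9.0, 11.0, 12.0, 8.0],
)
```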


4.3. Predictors of Poverty
The fact that just using growth in GDP per capita to extrapolate the mean works well to nowcast poverty
invites the question whether there could be another simple model that performs even better. For the
lasso, this can be analyzed by looking at the order in which variables enter. For forests and gradient
boosting, it can be analyzed through feature importance measures. Feature importance measures indicate
the importance of a particular feature for the accuracy of the predictions. Here the feature importance
measures from the conditional inference random forests are used, as this method generally performed
best across the machine-learning algorithms (judged by its average rank in fig. 4). For a particular feature,
the feature importance is calculated by permuting the feature in the data set and calculating how much
the prediction error increases. The importance measure is standardized such that the variable with the
greatest importance gets a value of 1.
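The permutation measure described above can be sketched as follows. This is a plain permutation importance with a mean absolute deviation loss, averaged over a few random shuffles; the importance routine shipped with conditional inference forest software may differ, and all names here are hypothetical.

```python
import random


def permutation_importance(predict, rows, target, n_repeats=10, seed=0):
    """Standardized increase in mean absolute error when each feature is shuffled."""
    rng = random.Random(seed)

    def mad(predictions):
        return sum(abs(p - t) for p, t in zip(predictions, target)) / len(target)

    baseline = mad(predict(rows))
    raw = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Replace feature j with its shuffled values, leaving other features intact.
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            increases.append(mad(predict(permuted)) - baseline)
        raw.append(sum(increases) / n_repeats)
    top = max(raw)
    # Standardize so that the most important feature gets a value of 1.
    return [r / top if top > 0 else 0.0 for r in raw]


def toy_predict(rows):
    # Hypothetical fitted model that only uses the first feature.
    return [2.0 * row[0] for row in rows]


rows = [[float(i), float((i * 7) % 5)] for i in range(8)]
target = [2.0 * row[0] for row in rows]
importance = permutation_importance(toy_predict, rows, target)
```

Because the toy model ignores the second feature, shuffling it leaves the error unchanged and its importance is zero, while the first feature receives the standardized value of 1.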
   Plotting the top 10 important features for predicting growth in the mean reveals that the only six features
that are substantively predictive of growth in the mean are all national accounts variables (fig. 6). They
are all identical to, or highly correlated with, growth in GDP per capita. The most predictive variable is
growth in final consumption expenditure (FCE), which is the sum of two components of GDP, government
expenditure and HFCE. The second most predictive variable is growth in HFCE, followed by growth in
GDP, Gross National Income (GNI), and gross domestic income (GDP measured from the income side).
Interestingly, employing final consumption expenditure or HFCE instead of GDP does not improve the
predictions. The reason is that while HFCE and FCE work better for upper-middle-income and high-
income countries, GDP works better for low-income and lower-middle-income countries, which dominate
the loss function. The variable most informative not from national accounts is growth in the employment
rate. Yet it is clear that just using growth in GDP per capita sums up the information well.
   One may still wonder whether a slightly more complicated GDP-based model than the one with
passthrough rates by income and consumption could perform even better. This could be the case if
passthrough rates differ by various contexts beyond the type of the welfare aggregate. Three different
models were tried that have passthrough rates by income/consumption and either income group (four
categories), World Bank region (seven categories), or the sign of the GDP growth rate (positive or nega-
tive). If, for example, growth in real GDP per capita trickles down to growth in welfare at higher or lower
rates during recessions, then the last model would perform better. These three models result in mean ab-
solute deviations of 3.65, 3.73, and 3.76, respectively (compared to 3.69 when only having passthrough
rates separately by income and consumption). Hence, even though the last two models only add a bit
more complexity, they lead to overfitting. The model which differentiates passthrough rates by income
group, on the other hand, works as well as the best model overall. This result, though, does not extend
to using the mean squared error, using only comparable spells, or comparing models by how well they
predict growth in the mean, and hence should be interpreted with a grain of salt.
   An unanswered question remains why a model which assumes about 70 percent of GDP growth trickles
down to consumption works better than assuming a full transmission as with income vectors. One possible
explanation relates to the marginal propensity to consume. The passthrough rate of 0.7 is fully consistent
with a behavior where households on average consume all their income until per capita GDP reaches
a particular threshold, at which point the marginal propensity to consume declines as per capita GDP
rises. This seems to be consistent with existing evidence that marginal propensities to consume are higher
in poorer contexts. Gross, Notowidigdo, and Wang (2020) and Drescher, Fessler, and Lindner (2020)
estimate marginal propensities to consume between 0.33 and 0.57 in the United States or euro area, while
Crozier and Zavaleta (2022) estimate a marginal propensity to consume of 0.9 in Peru.
   A second possible explanation may be related to consumption items not captured well in surveys. This
includes items deliberately not captured in consumption aggregates, such as health expenses, which often
are excluded on the grounds that they do not increase welfare (Deaton and Zaidi 2002). It also includes
items not captured due to non-classical measurement error in consumption, such as food eaten away
from home. If the share of spending on such items increases with GDP, then that would generate a lower
passthrough rate for consumption.
   Finally, it is possible that the different passthrough rates are not related to whether welfare is measured
with income or consumption, but that countries with consumption aggregates differ from countries with
income aggregates in some unobserved way, and that this is driving the different passthrough rates. When
looking at the ratio of mean income to mean consumption for the six countries in PovcalNet that have
both at the same time on at least four different occasions, there are clear cases where this ratio increases
as income grows (see fig. S3.1 in the supplementary online appendix). This makes it less likely that an
omitted variable is behind the differential passthrough rates.

4.4. Exploring Heterogeneity
Though the preceding two subsections suggest that a simple GDP-based model works well on average,
a point of interest is whether there are cases where this may not apply. This is explored here using two
different strategies: (a) by comparing how well some of the top models perform in various contexts and
(b) by explicitly testing all models separately on rich and poor countries.
   Figure 7 plots the mean absolute deviation for three select models—the best overall, the best at pre-
dicting poverty rates directly, and the model using a fraction of growth in GDP per capita to scale the
mean—as a function of six other variables. In all of these figures, the sample of countries changes over the
x-axis. For example, in the first panel on extrapolation time (the time between two household surveys),
the countries with an extrapolation time of one year are mostly in Europe and South America, while the
countries with an extrapolation of 10 years tend to be in Sub-Saharan Africa. As a result, looking at the

Figure 7. Errors as a Function of Other Variables




Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: Local polynomials of the absolute deviations as a function of other variables. The trend for a particular line is not interpretable since the sample of countries
changes over the x-axis. Yet the gaps between the lines are interpretable. Only confidence intervals for one model are shown to not clutter the graph. The confidence
interval evaluates the uncertainty of the local polynomial fit but does not incorporate the uncertainty of the predicted poverty rates themselves.




trend for a given line is of little interest. However, the gap between the three lines can be used to explore
which method works best in various contexts.
   It is plausible that predicting changes from the last survey is a good strategy when the last survey is
only a couple of years old, while it is a less sound strategy when the last survey is a decade old or more.
Extrapolation time does not matter when predicting poverty rates directly, but the average prediction
error might still be a function of extrapolation time for the reason mentioned above—that the sample of
countries changes. Even for extrapolation times of five years, predicting changes from the past survey by
just using growth in GDP per capita with a passthrough rate outperforms a model using 1,000+ variables
trying to predict the poverty rate directly and performs as well as models predicting growth in the mean
using 1,000+ variables (fig. 7). For extrapolation times beyond five years, there are too few observations
to say anything with sufficient certainty, but it can be ruled out that just using GDP growth is significantly
worse than the other methods.
   Deaton and Schreyer (2021) argue that GDP is increasingly becoming detached from national material
well-being. This could imply that using growth rates in GDP per capita has become less attractive, relative
to predicting poverty rates directly, in recent years. This is not the case in the results presented here (fig. 7).
A possible reconciliation between these two findings is that while GDP has become increasingly detached
from household surveys, household surveys have become more comparable across rounds within the same
country, allowing predictions from one survey to the next not to deteriorate.
   One could speculate that using GDP growth to project forward works less well in times of extreme
growth in either direction. This does not appear to be the case. Even during recessions and periods of high
growth, predicting changes from the last survey using GDP growth gives a lower error than predicting
poverty rates directly (fig. 7). There are cases, though, where using GDP growth is less attractive. This
appears to be the case whenever inflation is above 7 percent, when exports as a fraction of GDP are
above 60 percent, and when gross capital formation as a share of GDP is very low or very high. In other
words, when real GDP growth is likely to be driven by irregular patterns—a large deflator or large specific
components—then real GDP growth is a less strong predictor of welfare changes.
   Another way to test whether the model using only GDP growth works well in different scenarios is to
train and evaluate all models on specific subsets of the data. Concretely, all models have been rerun on
cases with poverty rates below 4 percent and, separately, on cases with poverty rates above 4 percent, a split
that divides the full sample into two approximately equal parts. For the poor sample (poverty rates above
4 percent), predicting changes in poverty rates works better than
predicting growth in the mean together with distribution neutrality (fig. S2.10). Yet the model using only
a fraction of growth in real GDP per capita performs better than all other models (fig. S2.11). Though one may
be reluctant to conclude that no machine-learning model would work better on this subsample, it is safe
to say that the GDP-based model works particularly well in poor settings.
   For the rich sample, the story is different. Here, models that predict poverty rates directly now work
best (fig. S2.12). This is partially mechanical given that when only trained on estimates between 0 percent
and 4 percent, they can only yield predictions in this interval. When looking at how well the GDP-based
model performs, it is not far from the models that predict changes from the last survey, but it is notably
less attractive than in the poor sample (fig. S2.13).

4.5. From Nearcasting to Nowcasting
All results so far were based on models constrained to using variables available in the nearcasting year.
It is not clear how these models perform for nowcasting for two reasons. First, they rely partially on
features that may not be available in the nowcasting year. For the GDP-based model, this does not matter
given that GDP growth rates themselves are nowcasted by various institutions. This is not the case for
the more complicated models or even models just using growth in HFCE, which to our knowledge is
not nowcasted across countries. Depending on how well the complicated models predict when missing
values are present, this will make them perform worse for nowcasting and may make the preferred model
relatively more advantageous for nowcasting. Indeed, when running all methods earlier in the calendar
year (recall that earlier in the calendar year the informational space available for nearcasting approximates
that of nowcasting), just using GDP growth becomes relatively more advantageous.
    The second reason why the precision of nowcasting estimates may differ from nearcasting estimates
is that the features with data available in the nowcasting year are based on modeling, extrapolations,
or data for only part of the year. Such data are likely less accurate and ultimately less connected to the
welfare distribution. The nowcasted growth rates, for example, have not yet been realized and are likely
to deviate from the growth rates that eventually will be estimated by national authorities. Some evidence
suggests that nowcasted growth rates by the IMF might be too optimistic (Sandefur and Subramanian
2020).

Figure 8. Error from Using Nowcasted Growth Data
Source: Authors’ estimates based on data from PovcalNet and World Economic Outlook (WEO).
Note: Panel a shows how far initial growth forecasts from WEO deviate from final estimates, here understood as estimates published four years after the year in question.
For example, the point that intersects the vertical axis at 2 suggests that growth estimates launched in April of a given year for that same year
are on average 2 percentage points off the growth estimate for that year released four years later. This does not matter for nowcasting poverty (panel b). The
relatively flat curve suggests that whether forecasted, nowcasted, nearcasted, or final growth estimates are used for nowcasting poverty does not impact the accuracy
of the predictions. The poverty predictions use the growth data with a separate passthrough rate by income and consumption. The errors on the right-hand side are
not comparable to the main results given that the sample of spells with WEO nowcasted growth rates is a subset of the full sample.




    One can test the extent to which this matters for the preferred model by looking at how well historical
growth nowcasts predicted poverty. Here this is done by gathering all growth nowcasts (and forecasts and
nearcasts) from the World Economic Outlook back to 1999, the earliest available. Next it is tested how
well these growth nowcasts were aligned with final growth rates, defined here as growth rates estimated
four years after the year in question. Finally, these nowcasted growth rates are used to see how well they
predict poverty using the preferred model.
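    The second of these steps, measuring how far each growth vintage is from the final growth rate, can be sketched as follows. This is a minimal illustration only: the record layout, the field names (`forecast`, `nowcast`, `final`), and the numbers are hypothetical and are not WEO data.

```python
from statistics import mean


def mean_abs_revision(records, vintage):
    """Mean absolute gap, in percentage points, between the growth rate
    published at `vintage` and the final growth rate."""
    gaps = [abs(r[vintage] - r["final"]) for r in records if vintage in r]
    return mean(gaps)


# Made-up growth rates in percent; "final" stands for the estimate
# published four years after the year in question.
records = [
    {"country": "A", "year": 2010, "forecast": 4.0, "nowcast": 3.5, "final": 2.0},
    {"country": "B", "year": 2010, "forecast": 1.0, "nowcast": 2.0, "final": 2.5},
]

revision_by_vintage = {
    vintage: mean_abs_revision(records, vintage)
    for vintage in ("forecast", "nowcast")
}
print(revision_by_vintage)
```

    In this toy example the forecasts are further from the final rates than the nowcasts are, which is the pattern panel a of fig. 8 is meant to display.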
    GDP growth nowcasts (and even more so GDP growth forecasts) differ from the final growth rates
(fig. 8, panel a). Though this speaks against using growth data for nowcasting poverty, surprisingly, poverty
rates are not predicted worse using GDP growth forecasts, nowcasts, or nearcasts (fig. 8, panel b). One
possible way to reconcile these two findings is that the quality of GDP data generally is worse for poorer
countries. This means that though modeled GDP nowcasts may differ from what is estimated by national
authorities, they are equally good signals of changes to poverty.
    Though the purpose of this paper is not to analyze current patterns of global poverty, one can shed light
on the challenges when going from nearcasting to nowcasting poverty by looking at the nowcasted global
poverty rates from the models. Looking at the global and regional trends in extreme poverty from 2014 to
2021, the nowcasting year at the time of writing, it stands out that the best model that predicts poverty
rates directly deviates for the nowcasting year (fig. 9). This is because the particular model used—gradient
boosting—turns out not to work well for 2021 where many of the input variables are missing. Though
all models could have been rerun only on variables widely available at the nowcasting year, it is not clear
that this would make things better as the model then shifts from one year to the next. For that reason,
this model is best used only for nearcasting. Even for nearcasting, looking closely at the East Asia &
Pacific panel reveals that methods trying to predict poverty rates directly might yield unlikely trends. The
relatively large increase in the regional poverty rate in 2017 corresponds to the year where survey data
for China is no longer available. Prior to 2016, survey estimates for China are used. The jump suggests
that the model predicts higher poverty for China than the official estimates. It is hard to support a case
for poverty in East Asia increasing rapidly from 2016 to 2017 though. This speaks intuitively in favor of
models that are anchored in past estimates.

Figure 9. Nowcasted Global Poverty
Source: Authors’ estimates based on data from PovcalNet, World Development Indicators, World Economic Outlook, and Google Earth Engine.
Note: For countries without any prior data, gradient boosting is used to predict poverty rates directly for all models. Note that the scale of the y-axis differs for each
graph. Nowcasts at the country level are available in figs. S3.2–S3.6 in the supplementary online appendix.

   When comparing the model only using GDP growth and the best model overall, it is evident that for
most regions the two are nearly aligned. This makes it unlikely that the nowcasted GDP growth rates differ
from other trends observed in nowcasted variables. The exception is South Asia, and as a consequence,
the world as a whole, where just using GDP per capita suggests a lower poverty rate for India. In other
words, other indicators have progressed more slowly in India than growth in real GDP per capita.

4.6. Limitations of Using Growth for Nowcasting and Nearcasting
The past four subsections have shown that a model just using growth in real GDP per capita to shift
the mean forward is relatively accurate, unlikely to be substantially improved upon by a simple model using
another variable, works in a variety of settings (but is less attractive for rich countries and when the makeup
of GDP is irregular), and works for nowcasting as well. Yet using growth in real GDP per capita also comes
with possible shortcomings. Here three of those will be mentioned.
   First, the tight historical relationship between poverty reduction and growth may in part be due to
new welfare aggregates being benchmarked against GDP data. When the World Bank, National Statistical
Offices, or others create welfare aggregates, there are many assumptions that need to be made, such as the
treatment of outliers and the inclusion of particular components. There are cases where those choices have
been partially guided by how the country fared on related indicators, most notably GDP per capita. If this
behavior is prevalent, such a pattern could create a mechanical relationship between measured poverty
and GDP which need not extend to nowcasting. Second, in some cases GDP relies partially on data from
the same survey that is used to measure poverty. This would likewise create a mechanical relationship
between the two.
   Third, if GDP numbers are subject to quality concerns, then they may likewise be less relevant for
both nearcasting and nowcasting. The fact that using growth rates in real GDP per capita worked well
when trained on historical data suggests that historically this has not frequently been the case (or at least
not more so than quality concerns with other indicators). Nonetheless, a large discrepancy for India when
using growth in real GDP per capita and using more explanatory variables was found (fig. 9). Though this
may simply be because true growth in real GDP per capita is less connected with household consumption
in India than in other countries, evidence suggests that GDP growth rates in India in recent years are
implausibly high (Subramanian 2019). If this is indeed the case, then this serves as another argument
against only relying on growth in real GDP per capita.

5. Conclusion
This paper has analyzed how best to nowcast poverty around the world. Statistical learning techniques
were applied to the World Bank’s collection of international poverty estimates utilizing more than 1,000
development indicators as features. It was investigated whether predicting poverty rates directly had a
higher accuracy than predicting poverty indirectly, by, for example, predicting growth in mean welfare
and applying this growth to scale up the past distribution, or by predicting growth in the mean and in the
Gini and applying growth incidence curves to scale and stretch the past distribution.
    Findings revealed that a model which simply uses a fraction of growth in real GDP per capita since
the last observed household survey to shift the entire distribution performs better than any model that
predicts poverty rates directly using all the variables mentioned above, and nearly as well as all models
trying to predict growth in the mean and in the Gini using all the variables mentioned above. This suggests
that conditional on knowing growth in real GDP per capita, no other variable can substantially increase
the predictive accuracy.
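    The mechanics of such a GDP-based model can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the passthrough of 0.7, the poverty line of 1.9, and the welfare vector are all hypothetical, and applying the passthrough to cumulative growth since the last survey is a simplifying assumption.

```python
def headcount(welfare, weights, line):
    """Weighted share of the population with welfare below the line."""
    total = sum(weights)
    poor = sum(w for y, w in zip(welfare, weights) if y < line)
    return poor / total


def gdp_based_nowcast(welfare, weights, gdp_growth, passthrough, line):
    """Scale every welfare value by a fraction (`passthrough`) of real GDP
    per capita growth since the last survey, then recompute the headcount.
    The shape of the distribution is left unchanged (distribution neutral)."""
    factor = 1.0 + passthrough * gdp_growth
    scaled = [y * factor for y in welfare]
    return headcount(scaled, weights, line)


# Hypothetical last-survey welfare distribution (per person per day).
welfare = [1.0, 1.5, 1.8, 2.5, 4.0, 8.0]
weights = [1.0] * len(welfare)
line = 1.9  # hypothetical poverty line

baseline = headcount(welfare, weights, line)
nowcast = gdp_based_nowcast(
    welfare, weights, gdp_growth=0.10, passthrough=0.7, line=line
)
print(baseline, nowcast)
```

    With 10 percent growth and a passthrough of 0.7, every welfare value is scaled up by 7 percent, which lifts one of the three poor observations above the line in this toy distribution.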
    On the one hand, given the decades-long literature documenting the importance of growth for poverty
reduction (Kraay 2006; Ferreira and Ravallion 2009), this is not surprising. Partially as a result of this
literature, variants of the GDP-based model have been used both in the literature and for the World
Bank’s official global poverty monitoring. On the other hand, several papers have documented important
gaps between national accounts data and household survey data, suggesting that GDP might not be closely
connected to welfare as measured in household surveys (Ravallion 2003; Deaton 2005; Ferreira, Leite,
and Ravallion 2010; Pinkovskiy and Sala-i Martin 2016; Deaton and Schreyer 2021; Prydz, Jolliffe, and
Serajuddin forthcoming).
    One way to reconcile this seemingly contradictory conclusion—that a GDP-based model predicts
poverty relatively well despite GDP being disconnected from welfare as measured in household surveys—
is to note that no model predicts poverty that well in absolute terms. Though the model using growth rates
in GDP per capita performs better than nearly all other models, it does not imply that the model’s error is
low. The GDP-based model can explain about 97 percent of the out-of-sample variation in poverty rates,
but only about 22 percent of the out-of-sample variation in growth in mean welfare. The models using
1,000+ variables can explain at most 28 percent of growth in mean welfare. In other words, most of the
temporal variation cannot be predicted by any information utilized here.
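    The contrast between explaining levels and explaining changes can be illustrated with a small sketch. It assumes the share of variation explained is computed as one minus the ratio of squared prediction errors to squared deviations from the held-out mean, which is an assumption rather than a definition taken from this section. For simplicity it uses changes in poverty rates instead of growth in mean welfare, and all numbers are made up.

```python
def r_squared(actual, predicted):
    """Share of variation in `actual` explained by `predicted`,
    computed as 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


# Made-up poverty rates in percent for four held-out country-years.
last_survey = [50.0, 30.0, 10.0, 2.0]
actual = [46.0, 31.0, 8.0, 1.5]
predicted = [47.0, 29.0, 9.0, 1.8]

# Levels: most variation is between countries, which the anchor captures.
r2_levels = r_squared(actual, predicted)

# Changes since the last survey: the same errors loom much larger.
actual_change = [a - s for a, s in zip(actual, last_survey)]
predicted_change = [p - s for p, s in zip(predicted, last_survey)]
r2_changes = r_squared(actual_change, predicted_change)

print(round(r2_levels, 3), round(r2_changes, 3))
```

    The prediction errors are identical in both calculations. Only the benchmark variation differs, which is why a model anchored in the last survey can explain nearly all of the variation in levels while explaining far less of the variation over time.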
    Why are changes in poverty so hard to predict by any model? Five possible reasons come to mind.
First, the available features may not be well suited to predict growth in welfare. Though predictors from
different databases were relied upon, including remote sensing data, it is possible that the most important
features are contained in other data sources, such as mobile phone data or proprietary remote sensing
data. Second, other algorithms or ensemble learning might perform better. Third, only data at the country
level were considered; it is possible that models specified at the subnational level perform better.
    Fourth, measurement error and measurement differences in welfare aggregates over time may make it
difficult to predict changes. Issues like whether a recall or diary method is used, the number of consumption
items asked about, the treatment of rent and durables, and differential non-response to surveys vary between
and within countries and can have significant bearings on poverty rates (Jolliffe 2001; Beegle et al. 2012;
Islam, Newhouse, and Yanez-Pagans 2021). Accounting for these issues would require a detailed database
on the methodology used to construct welfare aggregates, which is not currently available. Finally, the fact
that the distributions of many countries have a lot of mass around the international poverty line means
that even small changes to the distribution of welfare can yield large changes to the poverty rate. This
makes it difficult to predict the poverty rate with great precision.
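    The last point can be made concrete with a stylized example. The sketch below assumes a lognormal welfare distribution, which is an illustrative choice and not the distributional assumption of this paper; the dispersion parameter, poverty line, and growth rate are likewise hypothetical.

```python
from math import log
from statistics import NormalDist


def lognormal_headcount(median, sigma, line):
    """Share of the population below `line` when welfare is lognormal
    with the given median and log standard deviation `sigma`."""
    return NormalDist().cdf((log(line) - log(median)) / sigma)


def headcount_change(median, sigma, line, growth):
    """Change in the headcount when all welfare values grow by `growth`."""
    before = lognormal_headcount(median, sigma, line)
    after = lognormal_headcount(median * (1.0 + growth), sigma, line)
    return after - before


line = 1.9    # hypothetical poverty line
sigma = 0.5   # hypothetical dispersion of log welfare
growth = 0.02

# Country with a lot of mass around the line (median equal to the line).
near = headcount_change(median=line, sigma=sigma, line=line, growth=growth)
# Country whose median is five times the line.
far = headcount_change(median=5 * line, sigma=sigma, line=line, growth=growth)

print(near, far)
```

    Under these assumptions, 2 percent growth lowers the headcount by more than a percentage point when the median sits at the line, but by a tiny fraction of a percentage point when the median is five times the line. A small error in predicted growth therefore translates into a much larger error in the predicted poverty rate in the first case.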
    It may be the case that with different features, more granular data, or different models, it is possible
to improve upon the preferred model of this paper. Yet from the analysis conducted here, it remains the
case that a very simple model utilizing growth in real GDP per capita to scale a prior welfare distribution
is often an appealing method.



Data Availability Statement
The data underlying this article are available at https://github.com/danielmahler/NowcastingGlobalPoverty.



References
Angrist, N., K. P. Goldberg, and D. Jolliffe. 2022. “Why Is Growth in Developing Countries So Hard to Measure?”
  Journal of Economic Perspectives 35 (3): 215–42.
Arlot, S., and A. Celisse. 2010. “A Survey of Cross-Validation Procedures for Model Selection.” Statistics Surveys 4:
  40–79.
Aruoba, S. B., and F. X. Diebold. 2010. “Real-Time Macroeconomic Monitoring: Real Activity, Inflation, and Inter-
  actions.” American Economic Review 100 (2): 20–24.
Beegle, K., J. De Weerdt, J. Friedman, and J. Gibson. 2012. “Methods of Household Consumption Measurement
  through Surveys: Experimental Results from Tanzania.” Journal of Development Economics 98 (1): 3–18.
Belloni, A., and V. Chernozhukov. 2013. “Least Squares After Model Selection in High-Dimensional Sparse Models.”
  Bernoulli 19 (2): 521–47.
Bergmeir, C., and J. M. Benítez. 2012. “On the Use of Cross-Validation for Time Series Predictor Evaluation.” Infor-
  mation Sciences 191(15 May): 192–213.
Bergmeir, C., R. J. Hyndman, and B. Koo. 2018. “A Note on the Validity of Cross-Validation for Evaluating Autore-
  gressive Time Series Prediction.” Computational Statistics & Data Analysis 120(April): 70–83.
Bourguignon, F. 2003. “The Growth Elasticity of Poverty Reduction: Explaining Heterogeneity across Countries and
  Time Periods.” In Inequality and Growth: Theory and Policy Implications, edited by T. Eicher and S. Turnovsky.
  Cambridge: MIT Press.
Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.
Chi, G., H. Fang, S. Chatterjee, and J. E. Blumenstock. 2022. “Micro-Estimates of Wealth for All Low- and Middle-
  Income Countries.” Proceedings of the National Academy of Sciences 119 (3): e2113658119.
Crozier, S. L., and F. B. Zavaleta. 2022. “The Marginal Propensity to Consume of 2020 COVID-19 Stimulus Payments
  in Peru.” International Journal of Economics and Finance 14 (3): 115–15.
Cuaresma, J. C., W. Fengler, H. Kharas, K. Bekhtiar, M. Brottrager, and M. Hofer. 2018. “Will the Sustainable Devel-
  opment Goals Be Fulfilled? Assessing Present and Future Global Poverty.” Palgrave Communications 4 (1): 1–8.
Deaton, A. 2005. “Measuring Poverty in a Growing World (or Measuring Growth in a Poor World).” Review of
  Economics and Statistics 87 (1): 1–19.
Deaton, A., and P. Schreyer. 2021. “GDP, Wellbeing, and Health: Thoughts on the 2017 Round of the International
  Comparison Program.” Review of Income and Wealth 68 (1): 1–15.
Deaton, A., and S. Zaidi. 2002. “Guidelines for Constructing Consumption Aggregates for Welfare Analysis.” LSMS
  Working Paper No. 135. World Bank, Washington, DC.
Drescher, K., P. Fessler, and P. Lindner. 2020. “Helicopter Money in Europe: New Evidence on the Marginal Propensity
  to Consume across European Households.” Economics Letters 195(October): 109416.
Edward, P., and A. Sumner. 2014. “Estimating the Scale and Geography of Global Poverty Now and in the Future:
  How Much Difference Do Method and Assumptions Make?” World Development 58(June): 67–82.
Ekhator-Mobayode, U. E., and J. Hoogeveen. 2021. “Microdata Collection and Openness in the Middle East and
  North Africa.” Policy Research Working Paper 9892. World Bank, Washington, DC.
Ferreira, F. H. G., S. Chen, A. Dabalen, Y. Dikhanov, N. Hamadeh, D. Jolliffe, and A. Narayan et al. 2016. “A Global
   Count of the Extreme Poor in 2012: Data Issues, Methodology and Initial Results.” Journal of Economic Inequality
   14 (2): 141–72.
Ferreira, F. H. G., and P. G. Leite. 2003. “Policy Options for Meeting the Millennium Development Goals in Brazil:
   Can Micro-Simulations Help?” Policy Research Working Paper 2075. World Bank, Washington, DC.
Ferreira, F. H. G., P. G. Leite, and M. Ravallion. 2010. “Poverty Reduction without Economic Growth?: Explaining
   Brazil’s Poverty Dynamics, 1985–2004.” Journal of Development Economics 93 (1): 20–36.
Ferreira, F. H. G., and M. Ravallion. 2009. “Poverty and Inequality: The Global Context.” In The Oxford Handbook
   of Economic Inequality, edited by W. Salverda, B. Nolan and T. Smeeding. Oxford: Oxford University Press.
Fisk, P. R. 1961. “The Graduation of Income Distributions.” Econometrica: Journal of the Econometric Society 29 (2):
   171–85.
Friedman, J. H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics 29 (5):
   1189–232.
Giannone, D., J. Henry, M. Lalik, and M. Modugno. 2012. “An Area-Wide Real-Time Database for the Euro Area.”
   Review of Economics and Statistics 94 (4): 1000–13.
Giannone, D., L. Reichlin, and D. Small. 2008. “Nowcasting: The Real-time Informational Content of Macroeconomic
   Data.” Journal of Monetary Economics 55 (4): 665–76.
Gross, T., M. J. Notowidigdo, and J. Wang. 2020. “The Marginal Propensity to Consume over the Business Cycle.”
   American Economic Journal: Macroeconomics 12 (2): 351–84.
Hillebrand, E. 2008. “The Global Distribution of Income in 2050.” World Development 36 (5): 727–40.
Hothorn, T., K. Hornik, and A. Zeileis. 2006. “Unbiased Recursive Partitioning: A Conditional Inference Framework.”
   Journal of Computational and Graphical Statistics 15 (3): 651–74.
Iizuka, N., M. Oka, H. Yamada-Okabe, M. Nishida, Y. Maeda, N. Mori, and T. Takao et al. 2003. “Oligonucleotide
   Microarray for Prediction of Early Intrahepatic Recurrence of Hepatocellular Carcinoma after Curative Resection.”
   Lancet 361 (9361): 923–29.
Islam, T. T., D. Newhouse, and M. Yanez-Pagans. 2021. “International Comparisons of Poverty in South Asia.” Asian
   Development Review 38 (1): 142–75.
Jha, S. 2019. “Govt Scraps NSO’s Consumer Expenditure Survey over ‘Data Quality’.” Business Standard,
   November 6.
Jolliffe, D. 2001. “Measuring Absolute and Relative Poverty: The Sensitivity of Estimated Household Consumption
   to Survey Design.” Journal of Economic and Social Measurement 27 (1–2): 1–23.
Kakwani, N. 1993. “Poverty and Economic Growth with Application to Côte d’Ivoire.” Review of Income and Wealth
   39 (2): 121–39.
Kraay, A. 2006. “When Is Growth Pro-Poor? Evidence from a Panel of Countries.” Journal of Development Economics
   80 (1): 198–227.
Lakner, C., D. G. Mahler, M. Negre, and E. B. Prydz. 2022. “How Much Does Reducing Inequality Matter for Global
   Poverty?” Journal of Economic Inequality 20 (3): 559–85.
Moses, M., H. Kharas, M. Miller-Petrie, G. Tsakalos, L. Marczak, S. Hay, and C. Murray et al. 2021. “Global Poverty
   and Inequality from 1980 to the COVID-19 Pandemic.” SocArXiv x45np, Center for Open Science.
Pinkovskiy, M., and X. Sala-i Martin. 2016. “Lights, Camera... Income! Illuminating the National Accounts-
   Household Surveys Debate.” Quarterly Journal of Economics 131 (2): 579–631.
Prydz, E. B., D. M. Jolliffe, C. Lakner, D. G. Mahler, and P. Sangraula. 2019. “National Accounts Data Used in Global
   Poverty Measurement.” World Bank Group Global Poverty Monitoring Technical Note. World Bank, Washington,
   DC.
Prydz, E. B., D. M. Jolliffe, and U. Serajuddin. forthcoming. “Disparities in Assessments of Living Standards Using
   National Accounts and Surveys.” Review of Income and Wealth.
Ravallion, M. 2003. “Measuring Aggregate Welfare in Developing Countries: How Well Do National Accounts and
   Surveys Agree?” Review of Economics and Statistics 85 (3): 645–52.
———. 2013. “How Long Will It Take To Lift One Billion People Out of Poverty?” World Bank Research Observer
   28 (2): 139–58.
Ravallion, M., and S. Chen. 2003. “Measuring Pro-Poor Growth.” Economics Letters 78 (1): 93–99.
Rubin, D. B. 1976. “Inference and Missing Data.” Biometrika 63 (3): 581–92.
———. 2004. Multiple Imputation for Nonresponse in Surveys, Vol. 81. Hoboken: John Wiley & Sons.
Sandefur, J., and A. Subramanian. 2020. “The IMF’s Growth Forecasts for Poor Countries Don’t Match Its COVID
   Narrative.” Working Paper 533. Center for Global Development.
Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. New York: CRC Press.
Stone, M. 1974. “Cross-Validatory Choice and Assessment of Statistical Predictions.” Journal of the Royal Statistical
   Society: Series B (Methodological) 36 (2): 111–33.
Subramanian, A. 2019. “India’s GDP Mis-Estimation: Likelihood, Magnitudes, Mechanisms, and Implications.” Cen-
   ter for International Development Working Paper Series No. 354. Harvard University.
Sumner, A., and C. Hoy. 2022. “The End of Global Poverty: Is the UN Sustainable Development Goal 1 (Still) Achiev-
   able?” Global Policy 12 (4): 419–29.
Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society: Series
   B (Methodological) 58 (1): 267–88.
Varma, S., and R. Simon. 2006. “Bias in Error Estimation When Using Cross-Validation for Model Selection.” BMC
   Bioinformatics 7 (1): 1–8.