Policy Research Working Paper                             9860




                 Nowcasting Global Poverty
                               Daniel Gerszon Mahler
                            R. Andrés Castañeda Aguilar
                                 David Newhouse




Development Data Group
  &
Poverty and Equity Global Practice
November 2021


  Abstract
 This paper evaluates different methods for nowcasting country-level poverty rates,
 including methods that apply statistical learning to large-scale country-level data
 obtained from the World Development Indicators and Google Earth Engine. The methods
 are evaluated by withholding measured poverty rates and determining how accurately
 the methods predict the held-out data. A simple approach that scales the last
 observed welfare distribution by a fraction of real GDP per capita growth – a method
 that departs slightly from current World Bank practice – performs nearly as well as
 models using statistical learning on 1,000+ variables. This GDP-based approach
 outperforms all models that predict poverty rates directly, even when the last survey
 is up to five years old. The results indicate that in this context, the additional
 complexity introduced by applying statistical learning techniques to a large set of
 variables yields only marginal improvements in accuracy.




 This paper is a product of the Development Data Group, Development Economics and the Poverty and Equity Global
 Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to
 development policy discussions around the world. Policy Research Working Papers are also posted on the Web at
 http://www.worldbank.org/prwp. The authors may be contacted at dmahler@worldbank.org, acastanedaa@worldbank.org, and
 dnewhouse@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
                 Nowcasting Global Poverty*
                            Daniel Gerszon Mahler
                         R. Andrés Castañeda Aguilar
                                David Newhouse




      JEL codes: C53, D31, I32, O10.
      Keywords: Poverty, Nowcasting, Machine Learning, Measurement.



  * All authors are with the World Bank. We are grateful for comments received from
Aart Kraay, Andres Fernando Chamorro Elizondo, Benjamin Stewart, Benu Bidani, Christoph
Lakner, Dean Jolliffe, Lucas Kitzmueller, Marta Schoch, Minh Cong Nguyen, Nishant Yonzan,
Nobuo Yoshida, and Samuel Kofi Tetteh Baah. We are also grateful for feedback received
during the special IARIW-World Bank Conference ‘New Approaches to Defining and
Measuring Poverty in a Growing World’, the CCS-UN Technical Workshop ‘Nowcasting in
International Organizations’, and the 2021 ECINEQ Conference. We gratefully acknowledge
financial support from the UK government through the Data and Evidence for Tackling
Extreme Poverty (DEEP) Research Programme and from the World Bank through a Research
Support Budget grant. The code to reproduce the findings of this paper is available at
https://github.com/danielmahler/NowcastingGlobalPoverty.
1    Introduction
Timely and comparable poverty estimates are vital to assess countries’ develop-
ment progress. International poverty estimates serve as a public good for re-
searchers and inform the development community on efforts to meet the ﬁrst
Sustainable Development Goal, to end extreme poverty by 2030. Within inter-
national development organizations, national development agencies, and NGOs,
they also inform the allocation of resources and the development of strategic pri-
orities.
    Yet timely and comparable estimates of poverty are lacking for many reasons.
In some countries, fragility, conflict, and violence make it difficult to conduct
household expenditure surveys at all, while in other countries, lack of monetary
resources is the main obstacle. Even when surveys are conducted frequently,
the time it takes to field a survey and to collect, process, and analyze the data
often implies a two-year lag from data collection to the release of international
poverty estimates. With the world changing at an ever more rapid pace, as illustrated
by the unexpected onset of COVID-19, this lag risks painting an outdated
picture of poverty in a country. As of October 2021, on average across the devel-
oping world, the most recent survey with international poverty data was from
2014. In addition, 16 economies with a population greater than 1 million had
no international poverty estimate at all. For these reasons, initiatives that reliably
and cost-effectively predict what the poverty rate is today are crucial for informed
and effective high-level decision-making.
    The objective of this paper is to test various methods to estimate extreme
poverty in all countries of the world as of the present year and as of the preceding
year – at the time of writing, 2021 and 2020. We will refer to estimates of poverty
for the present year as nowcasts and estimates of poverty for the preceding year
as nearcasts. For nearcasting, one can rely on all data that are produced with less
than a one-year time lag. We will use more than 1,000 variables from the World
Development Indicators, the World Economic Outlook, and the Google Earth En-
gine. For nowcasting, only variables that themselves have been nowcasted by
others are available, as well as variables that are produced with little or no time
lag, such as certain remote sensing indicators.
    We combine all of these possible predictors with the PovcalNet database, which
contains more than 2,000 international poverty estimates covering 168 countries.
We train our models on these past surveys – essentially pretending that a subset of
them does not exist – and evaluate the models by measuring how well they
approximate the true poverty rates of these held-out surveys. The best performing
models are then used to predict global poverty for the nearcasting and nowcasting
years.
    Intuitively, to predict extreme poverty around the world, one would want the
models to predict poverty rates directly. Yet this ignores that prior full distribu-
tions of consumption or income (henceforth welfare) are available for most coun-
tries. We will explore whether greater accuracy can be obtained by predicting
poverty indirectly, by, for example, predicting changes in poverty from the last
survey, by predicting growth in mean welfare and applying this growth to scale
up the past distribution, or by predicting growth in the mean and in the Gini and
applying growth incidence curves to scale and stretch the past distributions.
    For nearcasting, we find that models that predict poverty rates directly are
outperformed by models that predict growth in mean welfare since the last survey
and scale the last distribution by this predicted growth. Though this method
assumes that inequality remains unchanged since the last survey, explicitly modeling
distributional changes by predicting changes in the Gini coefficient does
not help. The models that predict changes in inequality alongside growth in mean
welfare perform slightly worse than models that assume no changes to inequality.
The reason for this is that none of the 1,000+ candidate variables
contains notable information about changes in inequality. The best performing
method overall, which predicts growth in mean welfare using a random forest,
gives a mean absolute deviation of 3.65 percentage points. This means that on av-
erage over all countries for which we have data, the predicted poverty evaluated
at the international poverty line of $1.90 is 3.65 percentage points from the truth.
    The discussion above pertained to models using variables that are not always
available in the nowcasting year. Yet, we ﬁnd that a model which simply uses
a fraction of growth in real GDP per capita to scale up the last mean – a model
which can be used for nowcasting as well – gives a mean absolute deviation of
3.69, about 1 percent worse than the overall best performing model for nearcast-
ing. In other words, conditional on knowing growth in real GDP per capita since
the last observed distribution, no other variable contributes signiﬁcant informa-
tion about the evolution of poverty rates. Due to its simplicity and ability to be
used for both nearcasting and nowcasting, we consider this the preferred method.
We show that even when the last survey is as much as ﬁve years old, extrapo-
lating forward using a fraction of growth in real GDP per capita is superior to
predicting poverty rates directly with 1,000+ variables. For longer extrapolation
times, we lack power to determine which method is superior.
    On the one hand, the relevance of GDP growth for nowcasting poverty is not
surprising; the impact of growth on poverty reduction has been well-known for
decades (Kraay, 2006; Ferreira and Ravallion, 2009). On the other hand, ample ev-
idence has found large inconsistencies between consumption measured in house-
hold surveys and national accounts within and across countries (Ravallion, 2003;
Deaton, 2005; Ferreira et al., 2010; Pinkovskiy and Sala-i Martin, 2016; Prydz et al.,
2021; Deaton and Schreyer, forthcoming) and noted the difﬁculty of measuring
GDP in developing countries (Angrist et al., 2022).
    Several factors can explain why we nonetheless find GDP to be such an important
predictor of poverty. First, discrepancies between levels of GDP and levels
of welfare do not directly affect the accuracy of the method, which is based on
growth rates from the two data sources. Second, the average discrepancy between
growth rates in the two sources is accounted for by only allowing a fraction
of growth in real GDP per capita to ‘pass through’ to welfare as measured
in household surveys. The preferred method has a passthrough rate of about
0.7 for consumption-based poverty estimates, and a passthrough rate of 1 for
income-based poverty estimates. Third, GDP is probably the statistic in which
the global community has invested the most financing and capacity-building
to improve quality and ensure cross-country comparability (see Angrist
et al. (2022) for a discussion of this). Despite measurement issues and shortcomings,
we would expect GDP to carry more signal for cross-country
measures of average well-being than any other statistic. Fourth, although using
growth in real GDP per capita is the best performing method, no method is able
to explain more than a quarter of the variation in growth in mean welfare. Hence,
no model is able to predict a large part of the variation in poverty rates, which may
reflect a substantial amount of random noise in measured welfare resulting from
different definitions of welfare.
    To our knowledge, this is the ﬁrst paper comparing different ways of nowcast-
ing poverty at a global scale. Other papers, such as Chi et al. (2021), Cuaresma
et al. (2018), and Moses et al. (2021) have predicted global or near-global poverty
rates but not with the purpose of testing different methods. For other indicators,
nowcasting is a more established exercise, such as nowcasts of GDP (Giannone
et al., 2008), inﬂation (Aruoba and Diebold, 2010), and macroeconomic variables
more broadly (Giannone et al., 2012). Partially due to the SDG target of ending ex-
treme poverty by 2030, many papers have focused on forecasting global poverty
rather than nowcasting it (Edward and Sumner, 2014; Hillebrand, 2008; Raval-
lion, 2013; Lakner et al., 2020; Sumner and Hoy, 2022).
    Our findings contribute to the literature by pointing out that when predicting
poverty at the national level, (1) predicting changes from the past distribution
is superior to predicting poverty rates directly, and (2) conditional on knowing
real GDP per capita, publicly available remote sensing data carry little additional
information.
    COVID-19 illustrates both the promises and pitfalls of nowcasts. On the one
hand, the pandemic caused an unprecedented temporary stop in the production
of household surveys leading to an important gap in timely data and an impor-
tant role for methods that circumvent the availability of primary data sources. On
the other hand, the nature of the shock casts doubts on whether models trained
on historical data work in a radically changed environment. Partially due to this
uncertainty, the country-level nowcasts we produce should not be understood as
a substitute for country-level nowcast exercises. When the objective is to nowcast
poverty for a single country, relying on other methods, such as microsimulation
tools, or on more granular data will often be superior, as these can be tailored to
country-specific contexts. Yet as those methods are hard to implement consistently
across many countries, the present exercise can be attractive when the objective
is to compare poverty across a range of countries.


2     Method
We will employ several methods to predict poverty around the world. In this
section we outline the methods we use, their advantages and disadvantages, and
other important methodological choices. We will focus on three distinct issues:
(1) The target variable being predicted and (if not poverty rates directly) how
poverty rates are obtained from these predictions, (2) the algorithms used to gen-
erate the predictions, and (3) the evaluation criterion used to judge predictive
performance. Throughout this paper, when we refer to predicting poverty rates,
we are using the international poverty line of $1.90 per day (Ferreira et al., 2016).


2.1   The target variable
When deciding how to predict poverty, it seems intuitive that the target variable
to predict should be the poverty rate in each country. Yet, such a method ignores
that we have prior poverty estimates for most countries, from which we could predict
changes. Behind these prior estimates is a full distribution of welfare,
which might contain relevant information – for example, by distinguishing the
near poor from the rich. Utilizing some of this past information, we will work with
six different target variable combinations as outlined in Table 1 and explained in
more detail in what follows.

                                   Table 1: Target variables

 Target variable                    Approach for estimating poverty
 (1) Poverty rates                  Predicted directly
 (2) Changes in poverty rates       Apply the predicted change to the most recent poverty rate
 (3) Mean welfare                   Scale the past distribution to the predicted mean welfare
 (4) Growth in mean welfare         Scale the past distribution by 1 + the predicted growth in
                                    mean welfare
 (5) Mean welfare and Gini          (i) Assume the distribution is log-normal or log-logistic, or
     coefficient                    (ii) apply a growth incidence curve from the past distribution
                                    to match the predicted mean and Gini
 (6) Growth in mean welfare         (i) Apply the predicted growth in the mean and Gini to the
     and Gini coefficient           past mean and Gini and assume that the distribution is
                                    log-normal or log-logistic, or (ii) apply a growth incidence
                                    curve to the past distribution with the predicted growth in
                                    the mean and Gini

Notes: The table lists the six different combinations of target variables we will use for the
predictions and how poverty rates are backed out based on the target variable(s).


    (1) Poverty rates and (2) changes in poverty rates. Predicting the poverty rate
directly is the most intuitive and straightforward option; the poverty rates are the
ultimate objective of this paper. Predicting poverty rates directly at the nowcasting
year, $t_n$, has the advantage that it can yield estimates for countries without any
previous poverty estimates at all. This is particularly a concern for countries that
do not produce or share household survey data. Predicting changes in poverty
from the past survey conducted at time $t_s$, in contrast, needs some assumptions
about poverty levels in countries without data to arrive at global poverty rates.
Yet, by utilizing the past survey, one can exploit that there is a past estimate to
anchor the analysis around. When we predict changes in poverty,
$\hat{c}_{\text{poverty},t_s,t_n}$, the nowcasted poverty rates are given by

$$\text{poverty}_{t_n}(\hat{c}_{\text{poverty},t_s,t_n}) = \text{poverty}_{t_s} + \hat{c}_{\text{poverty},t_s,t_n} \tag{1}$$

When predicting changes from the past survey, we look at the annualized change
in poverty rates in percentage points. This avoids having to predict extreme
and undeﬁned values, which often occur when predicting annualized growth
in poverty rates, due to countries with poverty rates close to or at zero percent.
    (3) Mean welfare and (4) growth in mean welfare. While predicting changes in
poverty rates from the past survey exploits some past information available, it
still ignores the fact that a whole distribution of welfare was available in the past.
Countries whose distributions have high density around the poverty line are likely
to experience changes in poverty of a different magnitude than countries with sparse
density around the poverty line. By predicting the mean or growth in the mean and
scaling the past distribution accordingly, the model takes full advantage of the
previous data in the sense that the entire distribution is leveraged. Method (3)
works by scaling the welfare of each household, $h$, at the last survey by the ratio
of the predicted mean at the nowcasting year, $\hat{\mu}_{t_n}$, and the observed mean at the
last survey, $\mu_{t_s}$. The adjusted distribution is used to estimate poverty at time $t_n$:

$$\text{poverty}_{t_n}(\hat{\mu}_{t_n}) = F\left[\text{welfare}_{h,t_s}\,\frac{\hat{\mu}_{t_n}}{\mu_{t_s}} < 1.9\right] \tag{2}$$

   Similarly, method (4) works by taking the last observed distribution of welfare
and scaling it by the growth in the mean predicted between $t_s$ and $t_n$, $\hat{g}_{\mu,t_s,t_n}$:

$$\text{poverty}_{t_n}(\hat{g}_{\mu,t_s,t_n}) = F\left[\text{welfare}_{h,t_s}\,(1 + \hat{g}_{\mu,t_s,t_n}) < 1.9\right] \tag{3}$$

Figure 1 shows a hypothetical example of how this works. In this hypothetical
example, we take the latest observed distribution for Botswana from 2016 and
show what would happen if we predicted annualized growth in the survey mean
of 4% between 2016 and 2021. By shifting the distribution to the right reﬂecting
ﬁve years of this growth rate, the poverty rate at $1.90 declines from about 15% to
9%. As the ﬁgure makes clear, predicting growth in the mean has the advantage
that the model can be applied to any poverty line. Yet it imposes the assumption
that all households have experienced the same growth since the last survey. In
other words, it imposes that inequality has not changed since the last survey.
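
    The scaling step of method (4) can be sketched in a few lines of Python. This is
an illustration only and not the code used for the paper: the function names, the
made-up welfare values, the absence of survey weights, and the compounding of an
annualized growth rate over the years since the last survey are all assumptions.

```python
def headcount(welfare, line=1.90):
    """Share of households with welfare strictly below the poverty line."""
    return sum(1 for w in welfare if w < line) / len(welfare)


def nowcast_by_growth(welfare, annual_growth, years, line=1.90):
    """Scale every household by the same cumulative growth factor, then
    recompute the poverty rate (inequality is left unchanged by construction)."""
    # Assumption: annualized growth is compounded over the extrapolation period.
    factor = (1.0 + annual_growth) ** years
    return headcount([w * factor for w in welfare], line)


# Hypothetical daily welfare values in dollars (not real survey data).
survey = [1.0, 1.5, 1.7, 2.2, 3.0, 4.5, 6.0, 9.0, 15.0, 30.0]

print(headcount(survey))                   # poverty rate at the last survey
print(nowcast_by_growth(survey, 0.04, 5))  # after five years of 4% annual growth
```

Because every household is scaled by the same factor, the projected distribution can
be evaluated at any other poverty line simply by changing the line argument.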
    (5) Mean welfare and Gini coefficient and (6) growth in mean welfare and growth in
Gini. The fifth and sixth methods try to deal with the latter issue by also predicting
inequality – either directly or by predicting growth in inequality since the
last survey. A challenge with this approach is that there are many different measures
of inequality and infinitely many ways in which the same level or growth of
inequality can materialize. We will use the Gini coefficient as the measure of
inequality due to its popularity and use some further distributional assumptions to
pin down how the Gini shapes the distribution. In particular, we will use two
ways of converting Gini predictions into poverty rates.

Figure 1: Example of recovery of poverty rates from predictions of mean welfare

Notes: The figure shows how we implement models where the target variable is mean welfare
or growth in mean welfare. In this particular figure, we use the last observed distribution from
Botswana from 2016 and show how that distribution is projected forward by a hypothetical
prediction of a growth in mean welfare of 4% per year.

    First, we will use the predicted mean and predicted Gini together with a
known two-parameter distributional shape that welfare can follow. We will use
the log-normal distribution, which is frequently used for poverty and inequality
analysis (see for example Bourguignon (2003)) and the log-logistic distribution
(also known as the Fisk distribution, after Fisk (1961)). We use either of these
distributions to back out the poverty rate given a predicted mean, $\hat{\mu}_{t_n}$, and predicted
Gini, $\widehat{\text{gini}}_{t_n}$. Formally, we back out poverty with the log-normal distribution as

$$\text{poverty}_{t_n,\text{lognormal}}(\hat{\mu}_{t_n}, \widehat{\text{gini}}_{t_n}) = \Phi\left(\frac{\ln(1.9) - \ln(\hat{\mu}_{t_n}) + 2\left[\operatorname{erf}^{-1}(\widehat{\text{gini}}_{t_n})\right]^2}{2\operatorname{erf}^{-1}(\widehat{\text{gini}}_{t_n})}\right), \tag{4}$$

   and with the log-logistic distribution as:

$$\text{poverty}_{t_n,\text{loglogistic}}(\hat{\mu}_{t_n}, \widehat{\text{gini}}_{t_n}) = \left[1 + \left(\frac{\hat{\mu}_{t_n}\sin(\pi\,\widehat{\text{gini}}_{t_n})}{1.9\,\pi\,\widehat{\text{gini}}_{t_n}}\right)^{1/\widehat{\text{gini}}_{t_n}}\right]^{-1} \tag{5}$$

   Figure 2 shows the resulting poverty rates we predict as a function of $\hat{\mu}_{t_n}$ and
$\widehat{\text{gini}}_{t_n}$.
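
    The two conversions in equations (4) and (5) can be written compactly in Python
using only the standard library. The sketch below is illustrative and not the paper's
code; the function names are hypothetical. It relies on the identity
erf^{-1}(y) = Phi^{-1}((1 + y)/2) / sqrt(2) to obtain the inverse error function from
the standard normal quantile function, and it assumes a Gini strictly between 0 and 1
and a positive mean.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def lognormal_sigma(gini):
    """Log-normal shape parameter implied by a Gini: sigma = 2 * erfinv(gini)."""
    # erfinv(y) = Phi^{-1}((1 + y) / 2) / sqrt(2), so 2 * erfinv(y) is:
    return math.sqrt(2.0) * _STD_NORMAL.inv_cdf((1.0 + gini) / 2.0)


def poverty_lognormal(mean, gini, line=1.90):
    """Equation (4): poverty rate under a log-normal welfare distribution."""
    sigma = lognormal_sigma(gini)
    # sigma**2 / 2 equals 2 * erfinv(gini)**2, the term in the numerator of (4).
    z = (math.log(line) - math.log(mean) + sigma ** 2 / 2.0) / sigma
    return _STD_NORMAL.cdf(z)


def poverty_loglogistic(mean, gini, line=1.90):
    """Equation (5): poverty rate under a log-logistic (Fisk) distribution."""
    ratio = mean * math.sin(math.pi * gini) / (line * math.pi * gini)
    return 1.0 / (1.0 + ratio ** (1.0 / gini))


# Hypothetical inputs: predicted mean of $5 per day and predicted Gini of 0.40.
print(poverty_lognormal(5.0, 0.40))
print(poverty_loglogistic(5.0, 0.40))
```

A quick sanity check on both functions is that the poverty rate equals one half
whenever the implied median of the distribution coincides with the poverty line.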




          Figure 2: Illustration of log-normal and log-logistic conversions
                 (a) Log-normal                                    (b) Log-logistic

Notes: The figures plot the predicted poverty rate for a given predicted mean welfare, $\hat{\mu}_{t_n}$, and
Gini, $\widehat{\text{gini}}_{t_n}$, when assuming a log-normal distribution or a log-logistic distribution.

    The other method applies a specific growth incidence curve (GIC) from the
last observed distribution. GICs plot the growth in welfare as a function of
the percentile, $p$, of the initial welfare distribution (Ravallion and Chen, 2003).
Downwards-sloping GICs reduce inequality and vice versa. Evidence shows that
GICs often take on approximately linear and convex forms (Lakner et al., 2020;
Kakwani, 1993; Ferreira and Leite, 2003). By imposing particular functional forms
on the GICs, given a predicted mean welfare and predicted Gini, there is only one
possible GIC. When using GICs, the nowcasted poverty rates are backed out as
follows:

$$\text{poverty}_{t_n,\text{GIC}}(\hat{\mu}_{t_n}, \widehat{\text{gini}}_{t_n}) = F\left[\text{welfare}_{h,p,t_s}\left(1 + g_{p,t_s,t_n}(\hat{\mu}_{t_n}, \widehat{\text{gini}}_{t_n})\right) < 1.9\right], \tag{6}$$

where $g_{p,t_s,t_n}(\hat{\mu}_{t_n}, \widehat{\text{gini}}_{t_n})$ are percentile-specific growth rates, which are determined
such that the resulting distribution matches the predicted mean and Gini. In addition,
for the linear GIC, there is a requirement that $g_{p,t_s,t_n} = \beta + \delta p$, while for
the convex GIC, there is a condition that
$g_{p,t_s,t_n} = (1 - \alpha)(1 + \gamma) - 1 + [\alpha(1 + \gamma)\mu_{t_s}]/\mu_{p,t_s}$.
Here $\beta$ and $\delta$ or $\alpha$ and $\gamma$ are parameters that are estimated to ensure
that all the above equations hold (Lakner et al., 2020).
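
    For the linear GIC, one way to see how two parameters are pinned down by the
predicted mean and Gini is the following Python sketch. It is a simplified
illustration under stated assumptions, not the procedure of Lakner et al. (2020) or
the paper's code: percentile ranks are taken as mid-point ranks of an unweighted
sample, beta is solved from the mean constraint for each candidate delta, and delta
is then found by bisection on the Gini. All function names are hypothetical.

```python
def gini(values):
    """Gini coefficient of an unweighted sample (rank-based formula)."""
    xs = sorted(values)
    n = len(xs)
    ranked_sum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * ranked_sum / (n * sum(xs)) - (n + 1.0) / n


def linear_gic(welfare, target_mean, target_gini, iterations=200):
    """Find beta and delta such that welfare * (1 + beta + delta * p) has the
    target mean and Gini. Returns (beta, delta, projected welfare)."""
    xs = sorted(welfare)
    n = len(xs)
    ranks = [(i + 0.5) / n for i in range(n)]  # assumed percentile definition
    mean_x = sum(xs) / n
    mean_xp = sum(x * p for x, p in zip(xs, ranks)) / n

    def project(delta):
        # For any delta, the mean constraint determines beta uniquely.
        beta = (target_mean - delta * mean_xp) / mean_x - 1.0
        return beta, [x * (1.0 + beta + delta * p) for x, p in zip(xs, ranks)]

    def gap(delta):
        return gini(project(delta)[1]) - target_gini

    # Bounds on delta within which all projected welfare stays positive.
    lo = -target_mean / (mean_x - mean_xp)
    hi = target_mean / mean_xp
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("target Gini not reachable with a linear GIC")
    for _ in range(iterations):  # bisection keeps gap(lo) <= 0 <= gap(hi)
        mid = (lo + hi) / 2.0
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    delta = (lo + hi) / 2.0
    beta, projected = project(delta)
    return beta, delta, projected


# Hypothetical welfare vector; target: 20% higher mean and 5% lower Gini.
sample = [1, 2, 3, 4, 6, 8, 12, 20]
beta, delta, projected = linear_gic(sample, 1.2 * 7.0, 0.95 * gini(sample))
print(beta, delta)
```

A negative delta corresponds to a downward-sloping GIC, which is what a lower target
Gini requires. The poverty rate then follows by counting the share of projected
welfare values below the poverty line.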
    Figure 3 shows two sample GICs – again using the last survey from Botswana
in 2016 as the starting point – and how they impact the shape of the nowcasted
distribution. Both assume that the nowcasted mean has grown by 4% annually
since the last survey and that the nowcasted Gini is 5% lower than the last ob-
served Gini but distribute this growth differently along the distribution.



             Figure 3: Illustration of growth incidence curve conversions
           (a) Growth incidence curves                   (b) Implications on distributions

Notes: The figures show examples of a linear and convex GIC, here applied to the last survey in
Botswana from 2016.

   The above equations pertain to method (5). Analogous equations can be made
with method (6) where the target variables are the growth in the mean and Gini.
Throughout, we predict growth in the Gini rather than changes in the Gini based
on preliminary analysis comparing the performance of those two options.
    Throughout the analysis we will pay particular attention to a submethod un-
der the fourth category, which is a variant of what the World Bank uses to extrap-
olate poverty in countries and report on Sustainable Development Goal indicator
1.1.1 – to end extreme poverty by 2030. The method is based on the premise that
there is a tight relationship between income or expenditure as measured in na-
tional accounts and income or consumption observed in household surveys. This
method works by taking the last observed distribution of welfare and scaling
the welfare of each household by the growth observed in GDP per capita from
national accounts between the survey and the nowcasting year.1 This model as-
sumes that growth observed in national accounts is fully ‘passed through’ to the
welfare observed in household surveys and that the only factor informative for
changes in poverty is growth in national accounts. In addition, like all methods
under (3) and (4), it assumes that growth accrues to everyone equally, that is,
without changing the distribution of welfare. This is problematic if growth was
pro-poor or pro-rich in the intervening period.
   1 The World Bank uses Household Final Consumption Expenditure (HFCE) whenever avail-
able and GDP otherwise, with the exception of countries in Sub-Saharan Africa where only GDP
is used (Prydz et al., 2019). We focus on GDP here since HFCE nowcasts are not available for many
countries.
    The methods we cover are obviously not comprehensive, and other ways of
arriving at predictions of poverty rates exist. We hope that the methods we have
chosen cover both a mix of the most intuitive options and the ones that have been
applied in prior work.


2.2   Algorithms
In order to predict any of the options presented in Table 1, we rely on a number
of frequently used machine learning algorithms. In particular, we will use the
lasso (Tibshirani, 1996), the post-lasso (Belloni et al., 2013), CART random forests
(Breiman, 2001), conditional inference random forests (Hothorn et al., 2006) and
gradient boosting (Friedman, 2001). These methods all have in common that they
can predict the outcome variable of interest while being agnostic about which
variables are relevant for the predictions.
    Since the variables we use suffer from a lot of missing data, it is necessary to
find a strategy to deal with this missingness. Simply deleting rows with missing
values is not feasible, as this would leave few or no observations. For the
conditional inference random forests and gradient boosting, we rely on imputation
methods embedded in the algorithms to deal with missing data. For conditional
inference random forests, for example, the algorithm works by sequentially
splitting the sample into two based on the variable deemed most predictive of the
target variable. The algorithm might judge that a decline in the
share of male workers in agriculture is predictive of growth in the mean. For
country-years without data on the share of male workers in agriculture, the
algorithm will search for the most similar variable in terms of how it relates to the
target variable, which in this case could be the share of all workers in agriculture,
and split the observations with missing values in the first variable based on the
latter.
    Such methods of dealing with missing values are not possible or not avail-
able in the programs we use for the lasso, post-lasso, and CART random forests,
where we will instead multiply impute the entire data set of features and thereby
avoid missing values altogether (Rubin, 1976, 2004; Schafer, 1997). For each im-
putation we calculate a predicted value and average over all of these to obtain
a ﬁnal estimate. Although this multiple imputation could also be used for the
other methods, it adds quite a bit of computing time, so we will only use it where
necessary.




2.3   Evaluation of performance
In order to tune the algorithms listed above, compare the different approaches
for nowcasting poverty presented in Table 1, and report the ﬁnal out-of-sample
errors, we rely on nested 5-fold cross validation (Stone, 1974; Iizuka et al., 2003;
Varma and Simon, 2006). Intuitively, nested cross-validation works by iteratively
splitting the sample into three different subsamples; a training subsample on
which variants of a particular machine learning method are estimated, a test-
ing subsample on which these variants are compared against each other and the
best performer is selected, and a validation subsample which for each machine
learning method is used to calculate the out-of-sample errors. The data points
going into these three subsamples are reassigned multiple times to maximize the
power of the data. Nested cross-validation is an extension of regular k-fold cross-
validation, which reduces a bias with the latter towards selecting a model with
many tuning options and a downward bias in the ﬁnal out-of-sample errors ob-
tained. Appendix A contains a more thorough and technical discussion of nested
cross-validation.
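    The logic of the nested procedure can be sketched as follows. This is a minimal illustration under our own assumptions: the learner is a toy one-feature ridge regression, the tuning grid and the simulated data are hypothetical, and the model is refit on all non-validation data once the tuning parameter is chosen. It is not the implementation described in Appendix A.

```python
import random

def k_folds(indices, k, rng):
    """Shuffle indices and deal them into k roughly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fit_ridge(xs, ys, lam):
    """One-feature ridge regression through the origin (toy learner)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mad(xs, ys, beta):
    """Mean absolute deviation of the predictions beta * x from y."""
    return sum(abs(y - beta * x) for x, y in zip(xs, ys)) / len(xs)

def nested_cv(x, y, lambdas, k=5, seed=0):
    rng = random.Random(seed)
    take = lambda data, idx: [data[i] for i in idx]
    outer_errors = []
    outer = k_folds(list(range(len(x))), k, rng)
    for v, validation in enumerate(outer):
        rest = [i for f, fold in enumerate(outer) if f != v for i in fold]
        inner = k_folds(rest, k, rng)

        # Inner loop: choose the tuning parameter using training/testing splits only.
        def inner_error(lam):
            errs = []
            for t, testing in enumerate(inner):
                training = [i for f, fold in enumerate(inner) if f != t for i in fold]
                beta = fit_ridge(take(x, training), take(y, training), lam)
                errs.append(mad(take(x, testing), take(y, testing), beta))
            return sum(errs) / len(errs)

        best_lam = min(lambdas, key=inner_error)
        # Outer loop: refit with the chosen parameter, score on the untouched validation fold.
        beta = fit_ridge(take(x, rest), take(y, rest), best_lam)
        outer_errors.append(mad(take(x, validation), take(y, validation), beta))
    return sum(outer_errors) / len(outer_errors)

# Simulated data, for illustration only.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
error = nested_cv(x, y, lambdas=[0.0, 1.0, 10.0])
```

The key property is that the validation fold plays no role in choosing the tuning parameter, so the reported error is not flattered by the tuning step.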
    Once we are done with nested 5-fold cross validation, we will have one out-of-
sample estimate of the poverty rate for each household survey (except for the ear-
liest one for each country, as the methods that rely on changes since the last data
point will not work in those cases), for each machine learning method, and for
each approach to estimating poverty from Table 1. To evaluate the performance
of the various methods, our primary loss function will be the mean absolute
deviation between the predicted and true poverty rate. To ensure that we do not
select models that only work well for a few countries that happen to have many
poverty estimates, we weight each survey by the inverse of the number of surveys
for its country, such that the total weight for each country equals one.
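    A minimal sketch of this weighted loss function, using made-up country labels and poverty rates rather than data from this paper:

```python
from collections import Counter

def weighted_mad(records):
    """records: list of (country, predicted, observed) poverty rates in percent."""
    surveys_per_country = Counter(country for country, _, _ in records)
    total_weight = 0.0
    total_error = 0.0
    for country, predicted, observed in records:
        weight = 1.0 / surveys_per_country[country]  # weights sum to one per country
        total_weight += weight
        total_error += weight * abs(predicted - observed)
    return total_error / total_weight

# Hypothetical example: country A has four surveys, country B has one.
records = [("A", 10.0, 12.0), ("A", 11.0, 12.0), ("A", 9.0, 12.0),
           ("A", 12.0, 12.0), ("B", 30.0, 36.0)]
loss = weighted_mad(records)  # 3.75, versus an unweighted mean of 2.4
```

In the example, country A's four small errors count as much in total as country B's single large error, so the weighted loss (3.75) exceeds the unweighted one (2.4).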
    We use the mean absolute deviation rather than the mean squared error since
we are interested in minimizing the deviations between the true and predicted
poverty rates while giving equal weight to all deviations. Using the mean squared
error tends to give more weight to the prediction of outliers. We think this can be
problematic since data incomparabilities can create some strong outliers at times,
and we do not want to judge the methods by how well they predict these out-
liers. We focus on percentage point deviations rather than percentage deviations,
as the latter sometimes can be very large for countries with low poverty rates. If
a country has a poverty rate of 1% but our model predicts a poverty rate of 2%,
the error in percentage terms would be 100%, which would give this observation
a large impact. Our focus on percentage point deviations thus implicitly places
more weight on countries with high poverty rates.
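    Both choices can be illustrated with a small numerical sketch. The error values are made up for illustration; only the 1% versus 2% case comes from the text.

```python
from math import sqrt

def mean_abs_dev(errors):
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_sq_error(errors):
    return sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical errors in percentage points; both sets have the same total absolute error.
even_errors = [2.0, 2.0, 2.0, 2.0]
one_outlier = [0.0, 0.0, 0.0, 8.0]

mad_even = mean_abs_dev(even_errors)            # 2.0
mad_outlier = mean_abs_dev(one_outlier)         # 2.0
rmse_even = root_mean_sq_error(even_errors)     # 2.0
rmse_outlier = root_mean_sq_error(one_outlier)  # 4.0

# Percentage point versus percentage deviation for the 1% versus 2% example.
true_rate, predicted_rate = 1.0, 2.0
pp_deviation = abs(predicted_rate - true_rate)    # 1.0 percentage point
pct_deviation = 100.0 * pp_deviation / true_rate  # 100.0 percent
```

The mean absolute deviation treats the two error sets identically, whereas a squared-error criterion doubles the measured error of the set containing a single outlier.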
    Once we have selected the model that minimizes the mean absolute deviation
following the details above, we will apply the selected model to predict poverty
in the nearcasting and nowcasting years, which for the present paper are 2020
and 2021.


3    Data
All poverty estimates used in this paper come from PovcalNet, which contains
the World Bank’s ofﬁcial country-level, regional, and global estimates of poverty.
Most of the data in PovcalNet comes from the Global Monitoring Database, which
is the World Bank’s repository of harmonized multitopic income and expenditure
household surveys used to monitor global poverty. PovcalNet contains more
than 2,000 surveys from 168 countries covering 98% of the world’s population.
The data available in PovcalNet are standardized as far as possible but differ-
ences exist with regards to the method of data collection, and whether the wel-
fare aggregate is based on income or consumption. By relying on the PovcalNet
database, we ensure consistency with the ofﬁcial numbers used by the World
Bank and United Nations for monitoring poverty, inequality, and related goals.
    To predict poverty, we rely on variables from three databases. First, the World
Bank’s World Development Indicators, which contain country-year information
on nearly 1,300 variables covering a wide range of topics, such as health, agri-
culture, education, climate change, infrastructure and more (but as we will ex-
plain below, only a small selection of these can be used for the present exercise).
Second, we rely on the IMF’s World Economic Outlook (WEO) database, which
contains country-year information for about 50 variables related to macroeco-
nomic outcomes, such as inﬂation, government debt, and unemployment. Third,
we rely on remote sensing data from the Google Earth Engine, particularly, data
on nighttime lights, rainfall, land surface temperature, impervious surface, crop-
land, and normalized difference vegetation, snow, and water indices. In contrast
to WEO and WDI, the remote sensing data are both more granular spatially and
more frequent temporally. To ﬁt into our framework, they need to be aggregated
to the country-year level. We ﬁrst aggregate them to annual data by calculating
the mean, max, min, and standard deviation of each location over a year. Af-
terwards, we aggregate spatially by taking the mean, max, min, and standard
deviation of the annual data for a country. This gives 16 features for each type
of variable. Some of these combinations will not be relevant for global poverty
nowcasting but we include all of them here – as we do with WDI and WEO – to
remain agnostic about which ones are relevant and which ones are not.
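    The two-step aggregation can be sketched as follows. The pixel identifiers and values are hypothetical, and the use of the population standard deviation is our assumption, since the text does not specify which variant is applied.

```python
from statistics import mean, pstdev

# The four summary statistics applied in both the temporal and the spatial step.
STATS = {"mean": mean, "max": max, "min": min, "sd": pstdev}

def country_year_features(series_by_location):
    """series_by_location: {location_id: [values observed within one year]}."""
    features = {}
    for t_name, t_stat in STATS.items():
        # Step 1: collapse each location's within-year series to one annual value.
        annual = [t_stat(values) for values in series_by_location.values()]
        for s_name, s_stat in STATS.items():
            # Step 2: collapse the annual values across locations in the country.
            features[f"{s_name}_of_annual_{t_name}"] = s_stat(annual)
    return features

# Hypothetical nighttime-lights readings for three locations in one country-year.
lights = {"pixel_1": [1.0, 2.0, 3.0],
          "pixel_2": [4.0, 4.0, 4.0],
          "pixel_3": [0.0, 5.0, 10.0]}
features = country_year_features(lights)  # 4 x 4 = 16 features
```

Each of the four temporal statistics is combined with each of the four spatial statistics, which yields the 16 features per remote sensing variable.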
    For all of the variables from the three sources above we also calculate annual-
ized growth rates of the variables between two household surveys for a country.
For all variables that are expressed as percentages, rates, or indices, we also cal-
culate the annualized change between two household surveys for a country.
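    A minimal sketch of the two transformations, assuming compound annual growth for the growth rate and a simple per-year difference for the change. The text does not state the exact formulas, so both definitions are our assumptions.

```python
def annualized_growth(start_value, end_value, years):
    """Compound annual growth rate between two surveys `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

def annualized_change(start_value, end_value, years):
    """Average per-year change, for variables expressed as percentages, rates, or indices."""
    return (end_value - start_value) / years

# Hypothetical values: GDP per capita rises from 100 to 121 over a two-year spell,
# and an employment share falls from 40% to 34% over a three-year spell.
growth = annualized_growth(100.0, 121.0, 2)   # roughly 0.10 per year
change = annualized_change(40.0, 34.0, 3)     # -2.0 percentage points per year
```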
    The only variables we remove from what is described above are (1) variables
with more than 90% missing information in 2020, the nearcasting year, (2) vari-
ables with more than 90% missing information for country-years with poverty
estimates, and (3) variables that are not comparable between countries, such as
variables expressed in local currency units. The ﬁrst two criteria help speed up
the models by removing variables with too little information to be relevant. The
latter removes variables where exploiting cross-country variation is not meaning-
ful.
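    The three exclusion rules amount to a simple filter, sketched below. The function names and the representation of missing values as `None` are illustrative assumptions.

```python
def share_missing(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def keep_variable(values_in_nearcast_year, values_in_survey_years,
                  comparable_across_countries, threshold=0.9):
    """Keep a variable unless it fails any of the three criteria in the text."""
    return (comparable_across_countries
            and share_missing(values_in_nearcast_year) <= threshold
            and share_missing(values_in_survey_years) <= threshold)

# Hypothetical variable: 19 of 20 countries are missing in the nearcasting year (95%).
sparse_in_2020 = [None] * 19 + [1.0]
well_covered = [1.0] * 18 + [None] * 2
kept = keep_variable(sparse_in_2020, well_covered, True)  # False
```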
    Depending on which month of the year this exercise is carried out, the removal
of variables with more than 90% missing information in the nearcasting year removes a large
fraction of the WDI. The reason is that early in the year, most variables do not yet
have information for the prior year, meaning the information used for nearcasting
is not much better than the information that can be used for nowcasting. When
conducted in July (as the current exercise is), only about 300 WDI variables meet
that criterion. All in all, we are left with a bit more than 1,000 features across the
various data sources.
    Not all poverty trends within countries are comparable over time due to changes
in the survey methodology or the welfare aggregate. This matters for our predic-
tions of changes/growth in a target variable. Even if we knew the exact causes
of poverty in a country, if welfare aggregates are not comparable, then we would
not be able to predict the changes in poverty between two surveys. Though this
also makes it more difﬁcult to predict levels in our target variables, the problem is
arguably larger when predicting changes. Though we can restrict our sample to
comparable spells only, our main results will cover all data points, even the ones
that are not comparable over time. As one of our ﬁndings will be that nowcast-
ing poverty by predicting changes from a past distribution gives more accurate
results than predicting levels directly, the decision to include non-comparable
spells makes this result more conservative, as it precisely penalizes the methods
we ﬁnd superior. As a robustness check we restrict our data to comparable spells
only.




4     Results

4.1   Evaluation of the performance of the models
Figure 4 shows the prediction errors across the six combinations of target vari-
ables from Table 1 by each of the ﬁve machine learning algorithms we use (regular
lasso, post-lasso, CART random forest, conditional inference random forest, and
gradient boosting). Note that regardless of which target variable combination we
used (i.e. which panel of Figure 4), all the predictions are turned into poverty
rates and the errors are evaluated over poverty rates. Hence, they are compara-
ble to each other. As one example, the very ﬁrst bar shows that when predicting
poverty rates directly using a CART random forest (abbreviated as carf ), then we
on average get a predicted poverty rate 4.99 percentage points off the truth.
    Note that each bar in panels 5 and 6 of Figure 4 has several possible realizations
depending on which growth incidence curve or distributional assumption
is used to convert means and Ginis into poverty rates. In Figure B.1 in the Ap-
pendix we show the best way of converting predictions of growth/levels in the
mean and Gini into poverty rates. The best way of converting predicted levels
of the Gini and mean tends to be by using a log-normal distribution, while the
best way of converting predictions of growth in the mean and Gini tends to be
by applying a linear growth incidence curve. Figure 4 shows the best performing
options from Figure B.1 in panels 5 and 6.
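    For the log-normal case, the conversion from a mean and a Gini to a poverty rate has a closed form. The sketch below uses the standard log-normal relationships (the Gini pins down the log standard deviation, and the headcount is the log-normal CDF evaluated at the poverty line); the input values are hypothetical, and this is not necessarily the exact implementation behind Figure B.1.

```python
from math import log, sqrt
from statistics import NormalDist

def lognormal_headcount(mean_welfare, gini, poverty_line):
    """Poverty headcount implied by a log-normal welfare distribution."""
    std_normal = NormalDist()
    # For a log-normal distribution, Gini = 2 * Phi(sigma / sqrt(2)) - 1.
    sigma = sqrt(2.0) * std_normal.inv_cdf((1.0 + gini) / 2.0)
    # With mean mu, the log-mean is ln(mu) - sigma^2 / 2, so
    # P(welfare < z) = Phi(ln(z / mu) / sigma + sigma / 2).
    return std_normal.cdf(log(poverty_line / mean_welfare) / sigma + sigma / 2.0)

# Hypothetical inputs: daily mean of 5.0, Gini of 0.40, poverty line of 1.9.
headcount = lognormal_headcount(5.0, 0.40, 1.9)
headcount_richer = lognormal_headcount(6.0, 0.40, 1.9)
```

Holding the Gini fixed, a higher predicted mean translates into a lower predicted poverty rate, which is how predictions of the mean and Gini are mapped into the poverty rates evaluated in Figure 4 under this distributional assumption.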
    Figure 4 reveals that generating poverty rates by predicting from the last ob-
served survey (right column) gives lower errors than predicting poverty rates or
the mean (and Gini) without accounting for changes from the last survey (left col-
umn). Nearly all methods trying to predict poverty rates directly give a higher
error than the worst method trying to predict from the last survey.
    The poor performance of models trying to predict poverty rates directly is
probably not due to the fact that they have poverty rates on the left-hand side, but
more likely that the models we have used with levels of poverty, mean and the
Gini on the left-hand side do not use country ﬁxed effects or lagged values. Under
one interpretation, this treats these models equally to the models that predict
changes from the last distributions, which likewise do not have country-ﬁxed
effects or lagged values. Such factors are less important when we have already
taken ﬁrst differences, though. Given that surveys within countries often are
incomparable and often have many years between them, lagged information may
not provide much additional information. Yet, to the extent that there are country-
specific effects in levels that are hard to predict, any model that does not use
fixed effects or lagged values will not perform well.

        Figure 4: Testing various conversions of mean and Gini predictions

Notes: carf = CART random forest, cirf = conditional inference random forest, grbo = gradient
boosting, plas = post-lasso, rlas = regular lasso. The figure shows the best performing options
from Figure B.1 in panels 5 and 6.

    Among the different ways of predicting from the last survey, predicting changes
in poverty rates (panel 2) performs worse than predicting growth in the mean or
mean and Gini (panel 4 and 6). Predicting the Gini along with the mean tends
to slightly increase the error. In other words, assuming no changes to inequality
works better than trying to predict changes in inequality since the last survey.2
   2 This is a slightly counterintuitive result that appears to emerge because the models that also
predict the Gini work better on average for countries with low poverty rates but slightly worse
for countries with high poverty rates. The latter effect dominates our loss function. Perhaps this
is because most of the variation of changes in the Gini that our models can pick up is from

The best performing method is a conditional inference random forest which just
predicts growth in the mean, giving an error of 3.65 percentage points (panel 4).


4.2    Comparison to models just using GDP growth
It is interesting to see how the methods above compare with the method that
simply scales the distribution up according to growth in GDP per capita. Such a
model is simpler to understand, simpler to implement, and has the additional ad-
vantage that it can be used for nowcasting as well, given that GDP growth rates
themselves are nowcasted by various institutions.3 The models presented in the
last subsection relied heavily on variables that are not available for nowcasting.
Depending on how well the models predict when missing values are present,
this will make them perform worse for nowcasting. In Figure 5 we compare the
method of only using GDP growth with the best performing machine learning
method, the best performing machine learning method that predicts poverty rates
directly, as well as three other scenarios. These other three scenarios are as fol-
lows. First, a method which only allows a fraction of GDP per capita growth to be
passed through to the distribution and where this fraction is estimated through a
simple linear regression. The fraction passed through turns out to be 79%. This
follows empirical evidence showing that on average, only a fraction of growth in
national accounts trickles down to household surveys (Deaton, 2005; Lakner et al.,
2020; Pinkovskiy and Sala-i-Martin, 2016; Ravallion, 2003). Second, we estimate
this passthrough rate separately by consumption and income welfare aggregates.
This is motivated by the fact that this interaction was the ﬁrst variable to enter
in the lasso predicting growth in the mean. With this method, 71% of growth is
estimated to pass through to consumption aggregates while 97% is estimated to
pass through to income aggregates. Third, we compare the predictions with the
predictions we would get if we were perfectly able to predict the mean. This is
done by shifting a distribution from the beginning of a spell such that it matches
the mean of the distribution at the end of the spell. This is what would happen
if our predictions of the mean or growth in the mean were error free. As such, it
represents the lowest possible prediction error of using method (3) or (4).
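    The passthrough approach can be sketched in a few lines. The welfare vector and poverty line are hypothetical, survey weights are ignored for brevity, and applying the passthrough rate to cumulative growth between the survey year and the target year is our simplifying assumption; only the passthrough rates of 71% and 97% come from the text.

```python
# Passthrough rates reported in the text, by type of welfare aggregate.
PASSTHROUGH = {"consumption": 0.71, "income": 0.97}

def nowcast_headcount(last_welfare, gdp_pc_growth, passthrough, poverty_line):
    """Scale every welfare value by 1 + passthrough * growth, then count the poor."""
    factor = 1.0 + passthrough * gdp_pc_growth
    scaled = [w * factor for w in last_welfare]
    return sum(w < poverty_line for w in scaled) / len(scaled)

# Hypothetical consumption survey (daily welfare per person) and poverty line.
welfare = [1.0, 1.5, 1.8, 2.0, 2.5, 4.0, 6.0, 9.0]
line = 1.9

baseline = nowcast_headcount(welfare, 0.00, PASSTHROUGH["consumption"], line)    # 0.375
after_growth = nowcast_headcount(welfare, 0.10, PASSTHROUGH["consumption"], line)   # 0.25
after_decline = nowcast_headcount(welfare, -0.10, PASSTHROUGH["consumption"], line)  # 0.5
```

Because every welfare value is scaled by the same factor, the shape of the distribution, and hence the Gini, is left unchanged; only the mean moves.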
    We find that the best performing machine learning method only reduces the
error by 0.12 percentage points over just using GDP growth, only by 0.05 percentage
points if a passthrough rate is used, and only by 0.04 if an income/consumption
specific passthrough rate is used. This means that just using GDP per capita
growth to nearcast distributions is nearly as accurate as any method using more
than 1,000 variables and complex machine learning methods. This is not because
the method of first predicting the mean and adjusting the distribution accordingly
does not work – if growth in the mean were predicted perfectly, the mean absolute
deviation would nearly halve to 1.91 percentage points. Rather, it is because nearly
all of the variation in growth in mean consumption that can be explained with
these 1,000 variables can be explained by growth in GDP per capita. The figure
also reveals that just using growth in GDP per capita to shift the mean gives a
better performance than any model trying to predict poverty rates directly using
1,000 variables.

wealthy countries. Note also that if the distributional assumptions or growth incidence curves
we impose are not accurate, then even if we predict the Gini and mean well, it does not mean that
we predict the poverty rate well.
   3 A similar method would be to rely on growth in Household Final Consumption Expenditure
(HFCE), often the largest component of GDP. HFCE is rarely nowcasted and not available for all
countries for nearcasting, making it slightly less attractive from a practical perspective.

Figure 5: Comparing best performing methods to models only using GDP growth

Notes: The figure compares errors of six different models. The best performing method takes
the minimum bar from Figure 4 while the best method predicting poverty rates directly takes
the minimum bar of panel 1 of Figure 4. Using only GDP growth to shift the mean refers to
predicting poverty by adjusting mean consumption by the growth in GDP per capita. The right-
most column reflects a hypothetical scenario of the errors one would get if one was perfectly able
to predict growth in the mean.

    Though the ﬁndings above apply across countries on average, a point of in-
terest is whether there are cases where this may not apply. We explore this in
Figure 6 by plotting the mean absolute deviation for three select models – the
best overall, the best at predicting poverty rates directly, and the best model only
using a fraction of growth in GDP per capita – as a function of extrapolation time
(the time between two household surveys), poverty rates, year of the data, and
annualized growth in real GDP per capita. In all of these ﬁgures, the sample of
countries changes over the x-axis. For example, the countries with an extrapola-
tion time of one year are mostly in Europe and South America, while the countries
with an extrapolation of 10 years tend to be in Sub-Saharan Africa. As a result,
looking at the trend for a given line is of little interest. However, the gap between
the three lines is informative: it shows, for example, which method performs better
for countries with a five-year extrapolation time.



    It is plausible that using changes from the last survey is a good strategy when
the last survey is only a couple of years old, while it is a less sound strategy when
the last survey is a decade old or more. Extrapolation time does not matter when
predicting poverty rates directly, but the average prediction error might still be
a function of extrapolation time for the reason mentioned above – that the sam-
ple of countries changes. Panel (a) of Figure 6 shows that even for extrapolation
times of ﬁve years, predicting changes from the past survey by just using growth
in GDP per capita with a passthrough rate outperforms a model using 1,000+
variables trying to predict the poverty rate directly and performs as well as mod-
els predicting growth in the mean using 1,000+ variables. For extrapolation times
beyond ﬁve years, we have too few observations to say anything with sufﬁcient
certainty, but we can rule out that just using GDP growth is signiﬁcantly worse
than the other methods.
    In panel (b) we look at whether predicting poverty rates directly could be
better for either poor or non-poor countries. Predicting poverty rates directly
appears to be worse for all poverty levels except for countries with poverty rates
around 30%. Particularly for very poor countries and countries with low extreme
poverty, modeling changes through GDP is preferred to predicting poverty rates
directly.
    Deaton and Schreyer (forthcoming) argue that GDP is increasingly becoming
detached from national material well-being. This could imply that using growth
rates in GDP per capita has become less attractive, relative to predicting poverty
rates directly, in recent years. Panel (c) shows that we do not ﬁnd this to be the
case. A possible reconciliation between these two ﬁndings is that, while GDP
has become increasingly detached from household surveys, household surveys
have become more comparable across rounds within the same country, allowing
predictions from one survey to the next not to deteriorate.

                    Figure 6: Errors as a function of other variables
Panels: (a) Extrapolation time; (b) Poverty rates; (c) Year; (d) GDP growth rates

Notes: The ﬁgures plot local polynomials of the absolute deviations as a function of other vari-
ables. The trend for a particular line is not interpretable, since, for example, the sample of coun-
tries with extrapolation time of two years might be very different from the sample of countries
with an extrapolation time of four years. Yet the gap between the lines is interpretable. We only
show conﬁdence intervals for one model to not clutter the graph. The conﬁdence interval evalu-
ates the uncertainty of the local polynomial ﬁt but does not incorporate the uncertainty regarding
the survey-level predicted poverty rates themselves.


   In panel (d) we explore whether using GDP growth rates to project forward
only works for some levels of growth rates, such as only positive growth rates or
only non-extreme growth rates. This could be relevant for years of crises – such as
the economic crises following COVID-19 – where one could suspect that models
trained on historical data may be vulnerable. Yet, we do not find it to be the case
that using GDP growth to project forward is problematic for some growth rates,
suggesting, possibly, that even during recessions such as the one caused by
the pandemic, predicting changes from the last survey might be preferable to
predicting poverty rates directly. In sum, we do not ﬁnd overwhelming evidence
that predicting poverty rates directly could be more attractive when compared to
simply shifting the mean with adjusted GDP growth rates.


4.3   Predictors of poverty
Though the ﬁndings so far suggest that growth in real GDP per capita is the over-
whelming predictor of poverty, we can look at this more systematically by ana-
lyzing the features that are important for the predictions. For the lasso, this can
be analyzed by looking at the order in which variables enter. For forests and
gradient boosting, it can be analyzed through feature importance measures. Fea-
ture importance measures indicate the importance of a particular feature for the
accuracy of the predictions.
    In Figure 7, we use the feature importance measures from the conditional in-
ference random forests, as this method generally performed best across the ma-
chine learning algorithms (judged by its average rank in Figure 4). For a partic-
ular feature, the feature importance is calculated by permuting the feature in the
dataset and calculating how much the prediction error increases. The importance
measure is standardized such that the variable with the highest value gets a value
of 1. The equivalent plots for the other methods look similar.
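    The permutation logic can be sketched generically as follows. The simulated data and the stand-in model are hypothetical, and the sketch does not reproduce the specific importance measure of the conditional inference forest software used in this paper; it only shows the permute, re-predict, and standardize steps described above.

```python
import random

def mean_abs_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    raw = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle feature j across observations, leaving other features intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_abs_error(model, permuted, targets) - baseline)
        raw.append(sum(increases) / n_repeats)
    top = max(raw)
    return [value / top for value in raw]  # the most important feature gets 1

# Simulated data: the target depends strongly on feature 0 and weakly on feature 1.
rng = random.Random(1)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
targets = [3.0 * r[0] + 0.1 * r[1] for r in rows]
model = lambda r: 3.0 * r[0] + 0.1 * r[1]  # hypothetical fitted model
importance = permutation_importance(model, rows, targets)
```

In the simulated example the first feature receives an importance of 1 and the second a value close to zero, mirroring the pattern in which a handful of national accounts variables dominate.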



    The only six features that are substantively predictive of growth in the mean
are all national accounts variables. They are all identical to, or highly correlated
with, growth in GDP. The most predictive variable is growth in final consumption
expenditure, which is the sum of two components of GDP, government expenditure
and HFCE. The second most predictive variable is growth in HFCE,
followed by growth in GDP, GNI, and gross domestic income (GDP measured
from the income side). The most informative variable outside the national accounts
is growth in the employment rate. Yet, it is clear from this figure that just
using growth in GDP per capita sums up the information well. Though growth
in ﬁnal consumption expenditure could be used instead to obtain a slightly lower
error, ﬁnal consumption expenditure is often not nowcasted widely, making it
only feasible to use for nearcasting.

       Figure 7: Variables important for predicting growth in mean welfare




Notes: The variable importance measure comes from the conditional inference random forest. For
a particular feature, the value is calculated by permuting the feature in the dataset and calculating
how much the prediction error increases. The importance measure is standardized such that the
variable with the highest value gets a value of 1.


   For completeness, we show what predicts the other ﬁve target variables in
Figures B.2 and B.3 in the Appendix.


4.4    Implications for global poverty
Though the purpose of this paper is not to analyze current patterns of global
poverty, below we show a couple of ﬁgures on how the methods we employ can
(and, as we will see, cannot) be utilized to get an up-to-date picture of extreme
poverty. Figure 8 shows the global and regional trends in extreme poverty from
2014-2021, the nowcasting year as of the time of writing, using GDP growth to
shift the mean, the best model overall, and the best model predicting poverty
rates directly. For countries without any prior estimate of poverty, the first two
methods cannot be applied, so we use the last method instead.
    Looking at Figure 8, it stands out that the best model that predicts poverty
rates directly deviates for the nowcasting year. This is because the particular
model used – gradient boosting – turns out not to work well for 2021 where many
of the input variables are missing. Though we could have rerun the model only
on variables widely available at the nowcasting year, it is not clear that this would
make things better as the model then shifts from one year to the next. For that
reason, this model is best used only for nearcasting. Even for nearcasting, looking
closely at the East Asia & Pacific panel reveals that methods trying to predict
poverty rates directly might yield unlikely trends. The relatively large increase
in the regional poverty rate in 2017 corresponds to the year where survey data
for China are no longer available. Prior to 2016, the survey estimates for China
are used. The jump suggests that the model predicts a higher poverty rate for China
than the official estimates. It is hard to support a case for poverty in East Asia
increasing rapidly from 2016 to 2017, though. This, we believe, speaks intuitively
in favor of models that are anchored in past estimates.

                          Figure 8: Nowcasted global poverty

Notes: For countries with at least one estimate we use the status quo with separate passthrough
rates by income/consumption and for countries without any prior data we use gradient boosting
to predict poverty rates directly. Note that the scale of the y-axis differs for each graph.

    When comparing the model only using GDP growth and the best model over-
all, it is evident that for most regions the two are nearly aligned. The exception
is South Asia, and as a consequence, the world as a whole, where just using GDP
per capita suggests a lower poverty rate for India. In other words, other indica-
tors have progressed less fast in India than growth in real GDP per capita.
    Globally, for the best model and the model only using GDP growth, we ﬁnd
an increase in extreme poverty of about 0.6 and 0.8 percentage points, respectively, in 2020
but that more than half of this increase will be reverted in 2021. The increases in
2020 were largest in terms of percentage points in Sub-Saharan Africa and South
Asia, but largest relatively speaking in the Middle East & North Africa and Latin
America & Caribbean. The nowcasts suggest that both of these regions may see
further increases in extreme poverty in 2021.
    Figure 9 shows the nowcasted poverty levels around the world using the GDP
growth model. The map gives a clear picture of poverty being concentrated in
Sub-Saharan Africa; in fact, 23 of the 24 poorest countries are found in Africa
(with the Republic of Yemen being the only non-African country). The
landlocked countries in Africa in particular have high levels of poverty, with South Sudan
topping the list with a poverty rate of 82%. High levels of poverty can also be found
in other pockets around the world, for example in Yemen (49%), Afghanistan
(40%), the República Bolivariana de Venezuela (32%), Papua New Guinea (30%),
and Haiti (25%).

                          Figure 9: Global Poverty Map, 2021




Notes: Nowcasts of extreme poverty around the world using the model that shifts the last distri-
bution by a fraction of growth in real GDP per capita. For countries without any prior data we
use gradient boosting to predict poverty rates directly.




4.5    Preferred model and its limitations
Based on the ﬁndings of the past subsections, and due to the simplicity and in-
terpretability of just using GDP growth to project the distribution forward, the
GDP model with a passthrough rate of around 1 for income aggregates and 0.7
for consumption aggregates is our preferred method to nearcast poverty.
    When nowcasting poverty, all methods are likely to perform worse. The ma-
chine learning methods will perform worse for two reasons. (1) If missing values
are not imputed, the number of features will be much smaller, and if missing
values are imputed, the features will be of worse quality because imputations
are not a perfect substitute for actual data. (2) Since the features with data in the
nowcasting year are based on modeling, extrapolations, or data for only part of
the year, they are likely to be less accurate and ultimately less connected to the
welfare distributions.

    Our preferred model of only using growth in real GDP per capita does not
suffer from the ﬁrst issue but it does suffer from the last issue. The nowcasted
growth rates have not yet been realized and are likely to deviate from the growth
rates that eventually will be estimated by national authorities. Hence, they will
likely be less predictive of welfare changes observed in household surveys. Since
our preferred method for nearcasting, in that sense, only suffers from one of the
two extra challenges that the other models suffer from when used for nowcasting,
we think its advantages stand out even more when nowcasting. In fact, when
running all our methods earlier in the calendar year (recall that earlier in the
calendar year the informational space available for nearcasting approximates that
of nowcasting), just using GDP growth becomes relatively more advantageous.
    Using growth in real GDP per capita also comes with risks. If it is the case
that nowcasted estimates of GDP are much more off target than nowcasted es-
timates of other variables, then it is possible that using only GDP growth will
be relatively less attractive for nowcasting than nearcasting. Some evidence suggests
that nowcasted growth rates by the IMF might be too optimistic (Sandefur and
Subramanian, 2020).
    Another concern arises if GDP numbers are subject to quality concerns and are
not deemed credible. If this is the case, then they may likewise be less relevant for
both nearcasting and nowcasting. The fact that using growth rates in real GDP
per capita worked well when trained on historical data suggests that historically
this has not been frequently the case (or at least not more so than quality concerns
with other indicators). Nonetheless, Figure 8 did show a large discrepancy for
India when using growth in real GDP per capita and using more explanatory
variables. Though this may simply be because true growth in real GDP per capita
is less connected with household consumption in India than in other countries,
evidence suggests that GDP growth rates in India in recent years are non-credibly
high (Subramanian, 2019). If this is indeed the case, then this serves as another
argument against only relying on growth in real GDP per capita.
    Despite these arguments against just using national accounts, we think the
arguments in favor – particularly the relatively high accuracy, simplicity, ability
to work for both nearcasting and nowcasting, ability to work with any poverty
line, and ability to avoid large shifts when moving from survey estimates to ex-
trapolations – create a compelling case.




5     Robustness checks
In this section we test some of our main results under different methodological
choices: first by using only comparable survey spells and second by using the
root mean squared error as the loss function.


5.1   Using only comparable survey spells
First, we rerun all results using only comparable survey spells. As we mentioned
earlier in the text, non-comparable spells tend to make it more difficult to predict
changes from the past survey, meaning that we would expect the inferiority of
predicting poverty rates directly to hold even more strongly when restricting the
samples to comparable spells. We show the two main figures from the main text in
Figure B.4 and Figure B.5 in the Appendix. Note that the mean absolute
deviations are not comparable to the equivalent figures in the main text given that the
countries with comparable spells tend to be less poor (implying that the mean
absolute deviations will be lower). However, within Figure B.4 it is still possible
to compare across and within panels.
    Our main results all hold. Predicting poverty rates directly is far from optimal;
predicting changes in poverty is worse than predicting growth in the mean; and
predicting the Gini in addition to growth in the mean does not help above
and beyond the distribution-neutral assumption. Once more, a model which just
applies a passthrough rate to growth in GDP per capita performs nearly as well
as the best of all machine learning methods.
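
As a minimal sketch of the GDP-based approach referred to above, the snippet below scales every value of a hypothetical last-observed welfare vector by a fraction of real GDP per capita growth and recomputes the poverty headcount. The welfare values, the 10% growth rate, the passthrough rate of 0.7 and the poverty line of 1.90 are illustrative assumptions, not estimates from this paper, and the function name is hypothetical.

```python
def nowcast_poverty_rate(welfare, gdp_growth, passthrough, poverty_line):
    """Distribution-neutral nowcast: scale every welfare value by the same factor."""
    factor = 1.0 + passthrough * gdp_growth
    scaled = [w * factor for w in welfare]
    poor = sum(1 for w in scaled if w < poverty_line)
    return poor / len(scaled)


# Hypothetical daily consumption per person at the last survey (illustrative values).
last_survey = [1.0, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0, 6.0, 9.0, 15.0]

# Poverty rate at the time of the survey (no growth applied).
baseline = nowcast_poverty_rate(last_survey, 0.0, 0.7, 1.90)
# Poverty rate after 10% growth in real GDP per capita with an assumed 0.7 passthrough.
nowcast = nowcast_poverty_rate(last_survey, 0.10, 0.7, 1.90)
print(baseline, nowcast)
```

Because every value is multiplied by the same factor, relative inequality (and hence the Gini) is unchanged by construction; only the mean moves.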


5.2   Using the root mean squared error
A more common loss function than the one we applied is the root mean squared
error (RMSE). We did not use the RMSE since it tends to give larger weight to
the prediction of outliers, which in the context of poverty measurement probably
are estimates suffering from measurement error. We do not want to tailor our
models towards these data points. Figure B.6 and Figure B.7 in the Appendix
test our main results using the RMSE. Some of our main results still hold, but
now predicting poverty rates directly with gradient boosting is relatively more
attractive (with an error of 6.86). In fact, it now performs better than shifting
the mean with a fraction of growth in GDP per capita (6.92). Once more, however,
the potential gain in accuracy from any machine learning model hardly merits the
extra complexity it adds. It remains the case that predicting changes in poverty
is worse than predicting growth in the mean, and that trying to predict inequality
as well does not help.
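
To illustrate why the two loss functions can rank methods differently, the sketch below compares how the mean absolute deviation and the RMSE react when a single large error is added to a set of otherwise small errors. The error values are hypothetical and chosen only for illustration.

```python
import math


def mean_absolute_deviation(errors):
    """Average absolute prediction error."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_squared_error(errors):
    """Square root of the average squared prediction error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical prediction errors in percentage points (illustrative only).
typical = [1.0, -2.0, 1.5, -1.0, 2.0]
with_outlier = typical + [20.0]

# Factor by which each loss increases once the outlier is included.
mad_ratio = mean_absolute_deviation(with_outlier) / mean_absolute_deviation(typical)
rmse_ratio = root_mean_squared_error(with_outlier) / root_mean_squared_error(typical)
print(round(mad_ratio, 2), round(rmse_ratio, 2))
```

The RMSE rises proportionally more than the mean absolute deviation, which is the sense in which it gives larger weight to outliers.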


6     Conclusion
In this paper, we have analyzed how best to nowcast poverty around the world.
We applied statistical learning techniques to the World Bank’s collection of inter-
national poverty estimates utilizing more than 1,000 development indicators as
features. We looked at whether predicting poverty rates directly had a higher ac-
curacy than predicting poverty indirectly, by, for example, predicting changes in
poverty from the last survey, by predicting growth in mean welfare and applying
this growth to scale up the past distribution, or by predicting growth in the mean
and in the Gini and applying growth incidence curves to scale and stretch the
past distributions.
    Our ﬁndings revealed that a model which simply uses a fraction of growth in
real GDP per capita since the last observed household survey to shift the entire
distribution performs better than any model that predicts poverty rates directly
using all the variables mentioned above, and nearly as well as all models trying
to predict growth in the mean and in the Gini using all the variables mentioned
above. This suggests that conditional on knowing growth in real GDP per capita,
no other variable can substantially increase the predictive accuracy.
    On the one hand, given the decades-long literature documenting the
importance of growth for poverty reduction (Kraay, 2006; Ferreira and Ravallion, 2009),
this is not surprising. Partially as a result of this literature, variants of the model
we prefer have been used both in the literature and for the World Bank’s official
global poverty monitoring. On the other hand, several papers have documented
important gaps between national accounts data and household survey data,
suggesting that GDP might not be closely connected to welfare as measured in
household surveys (Ravallion, 2003; Deaton, 2005; Ferreira et al., 2010; Pinkovskiy and
Sala-i-Martin, 2016; Prydz et al., 2021; Deaton and Schreyer, forthcoming). What
can explain these seemingly contradictory statements?
    First, although the model using growth rates in GDP per capita performs
better than nearly all other models, this does not imply that the model’s error is low.4
In fact, our preferred model explains only about 25% of the out-of-sample
variation in growth in mean welfare. The models using 1,000+ variables perform at
about the same level or slightly better. In other words, most of the variation we
are trying to predict cannot be predicted by any information we are utilizing.
    Why is poverty so hard to predict by any model? We hypothesize five possible
answers. First, the available features may not be well-suited to predict growth in
welfare. Though we rely on predictors from different databases including remote
sensing data, it is possible that the most important features are contained in other
data sources, such as mobile phone data or proprietary remote sensing data.
Second, it may be that other algorithms or ensemble learning might perform better.
Third, this analysis has only considered data at the country level, and it is
possible that models specified at the subnational level may perform better.
    Fourth, the reason for the seemingly poor performance of all methods may
be measurement error and measurement differences in welfare aggregates
across countries. Issues like whether a recall or diary is used, the number of
consumption items asked about, the treatment of rent and durables, and differential
non-response to surveys vary between and within countries and can have a
significant bearing on poverty rates (Beegle et al., 2012; Jolliffe, 2001). This affects both
poverty rates in the cross-section and changes over time. Accounting for these issues
would require a detailed database on the methodology used to construct welfare
aggregates, which is not currently available. Finally, the fact that the distributions
of many countries have a lot of mass around the international poverty line means
that even small changes to the distribution of welfare can yield large changes to
the poverty rate. This makes it difficult to predict the poverty rate with great
precision.
    Regardless of which, if any, of these hypotheses can explain why no model is
able to explain a significant fraction of the variation in the growth in welfare, a
very simple model utilizing growth in real GDP per capita and a prior welfare
distribution remains an appealing method.

   4 Note that this pertains to the error of country-level nowcasts. The results do not directly
assess the accuracy of the World Bank’s regional and global nowcasts, since a large portion of the
country-specific errors will average out when aggregating across many countries. However, the
results do suggest that there are substantial discrepancies between nowcasts and survey-based
measures for individual countries, and that the country nowcasts should be interpreted with
appropriate caution.


References
Angrist, N., Goldberg, K. P. and Jolliffe, D. (2022). Why is Growth in Developing
  Countries So Hard to Measure? Journal of Economic Perspectives, 35 (3), 215–242. (page
  4)

Arlot, S. and Celisse, A. (2010). A Survey of Cross-Validation Procedures for Model
  Selection. Statistics Surveys, 4, 40–79. (page 34)

Aruoba, S. B. and Diebold, F. X. (2010). Real-Time Macroeconomic Monitoring: Real
  Activity, Inflation, and Interactions. American Economic Review, 100 (2), 20–24. (page 4)

Beegle, K., De Weerdt, J., Friedman, J. and Gibson, J. (2012). Methods of Household
  Consumption Measurement Through Surveys: Experimental Results from Tanzania.
  Journal of Development Economics, 98 (1), 3–18. (page 28)

Belloni, A., Chernozhukov, V. et al. (2013). Least Squares After Model Selection in
  High-Dimensional Sparse Models. Bernoulli, 19 (2), 521–547. (page 11)

Bergmeir, C. and Benítez, J. M. (2012). On the Use of Cross-Validation for Time Series
  Predictor Evaluation. Information Sciences, 191, 192–213. (page 34)

Bergmeir, C., Hyndman, R. J. and Koo, B. (2018). A Note on the Validity of Cross-Validation
  for Evaluating Autoregressive Time Series Prediction. Computational Statistics & Data
  Analysis, 120, 70–83. (page 35)

Bourguignon, F. (2003). The Growth Elasticity of Poverty Reduction: Explaining
  Heterogeneity Across Countries and Time Periods. In T. Eicher and S. Turnovsky (eds.),
  Inequality and Growth: Theory and Policy Implications, Cambridge, MIT Press. (page 8)

Breiman, L. (2001). Random Forests. Machine Learning, 45 (1), 5–32. (page 11)

Chi, G., Fang, H., Chatterjee, S. and Blumenstock, J. E. (2021). Micro-Estimates
  of Wealth for all Low- and Middle-Income Countries. arXiv preprint arXiv:2104.07761.
  (page 4)

Cuaresma, J. C., Fengler, W., Kharas, H., Bekhtiar, K., Brottrager, M. and
  Hofer, M. (2018). Will the Sustainable Development Goals be Fulfilled? Assessing
  Present and Future Global Poverty. Palgrave Communications, 4 (1), 1–8. (page 4)

Deaton, A. (2005). Measuring Poverty in a Growing World (or Measuring Growth in a
  Poor World). The Review of Economics and Statistics, 87 (1), 1–19. (page 4, 17, 27)

Deaton, A. and Schreyer, P. (forthcoming). GDP, Wellbeing, and Health: Thoughts on the 2017
  Round of the International Comparison Program. The Review of Income and Wealth.
  (page 4, 19, 27)

Edward, P. and Sumner, A. (2014). Estimating the Scale and Geography of Global
  Poverty Now and in the Future: How Much Difference Do Method and Assumptions
  Make? World Development, 58, 67–82. (page 4)

Ferreira, F. H. G., Chen, S., Dabalen, A., Dikhanov, Y., Hamadeh, N., Jolliffe,
  D., Narayan, A., Prydz, E. B., Revenga, A., Sangraula, P. et al. (2016). A Global
  Count of the Extreme Poor in 2012: Data Issues, Methodology and Initial Results. The
  Journal of Economic Inequality, 14 (2), 141–172. (page 5)

Ferreira, F. H. G. and Leite, P. G. (2003). Policy Options for Meeting the Millennium
  Development Goals in Brazil: Can Micro-Simulations Help? The World Bank. (page 9)

Ferreira, F. H. G., Leite, P. G. and Ravallion, M. (2010). Poverty Reduction Without
  Economic Growth?: Explaining Brazil’s Poverty Dynamics, 1985–2004. Journal of
  Development Economics, 93 (1), 20–36. (page 4, 27)

Ferreira, F. H. G. and Ravallion, M. (2009). Poverty and Inequality: The Global Context. In
  W. Salverda, B. Nolan and T. Smeeding (eds.), The Oxford Handbook of Economic
  Inequality, Oxford University Press. (page 4, 27)

Fisk, P. R. (1961). The Graduation of Income Distributions. Econometrica: Journal of the
  Econometric Society, pp. 171–185. (page 8)

Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine.
  Annals of Statistics, pp. 1189–1232. (page 11)

Giannone, D., Henry, J., Lalik, M. and Modugno, M. (2012). An Area-Wide Real-Time
  Database for the Euro Area. Review of Economics and Statistics, 94 (4), 1000–1013.
  (page 4)

Giannone, D., Reichlin, L. and Small, D. (2008). Nowcasting: The Real-time Informational
  Content of Macroeconomic Data. Journal of Monetary Economics, 55 (4), 665–676. (page 4)

Hillebrand, E. (2008). The Global Distribution of Income in 2050. World Development,
  36 (5), 727–740. (page 4)

Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased Recursive Partitioning:
  A Conditional Inference Framework. Journal of Computational and Graphical Statistics,
  15 (3), 651–674. (page 11)

Iizuka, N., Oka, M., Yamada-Okabe, H., Nishida, M., Maeda, Y., Mori, N.,
  Takao, T., Tamesa, T., Tangoku, A., Tabuchi, H. et al. (2003). Oligonucleotide
  Microarray for Prediction of Early Intrahepatic Recurrence of Hepatocellular Carcinoma
  After Curative Resection. The Lancet, 361 (9361), 923–929. (page 12, 33)


Jolliffe, D. (2001). Measuring Absolute and Relative Poverty: The Sensitivity of
  Estimated Household Consumption to Survey Design. Journal of Economic and Social
  Measurement, 27 (1-2), 1–23. (page 28)

Kakwani, N. (1993). Poverty and Economic Growth with Application to Côte d’Ivoire.
  Review of Income and Wealth, 39 (2), 121–139. (page 9)

Kraay, A. (2006). When Is Growth Pro-poor? Evidence From A Panel of Countries.
  Journal of Development Economics, 80 (1), 198–227. (page 4, 27)

Lakner, C., Mahler, D. G., Negre, M. and Prydz, E. B. (2020). How Much Does
  Reducing Inequality Matter for Global Poverty? World Bank Group Global Poverty
  Monitoring Technical Note. (page 4, 9, 17)

Moses, M., Kharas, H., Miller-Petrie, M., Tsakalos, G., Marczak, L., Hay, S.,
  Murray, C. and Dieleman, J. L. (2021). Global Poverty and Inequality from 1980 to
  the COVID-19 Pandemic. (page 4)

Pinkovskiy, M. and Sala-i-Martin, X. (2016). Lights, Camera ... Income!
  Illuminating the National Accounts-Household Surveys Debate. The Quarterly Journal of
  Economics, 131 (2), 579–631. (page 4, 17, 27)

Prydz, E. B., Jolliffe, D. M., Lakner, C., Mahler, D. G. and Sangraula, P.
  (2019). National Accounts Data Used in Global Poverty Measurement. World Bank
  Group Global Poverty Monitoring Technical Note. (page 10)

Prydz, E. B., Jolliffe, D. M. and Serajuddin, U. (2021). Mind the Gap: Disparities in
  Assessments of Living Standards Using National Accounts and Surveys. World Bank
  Policy Research Working Paper 9779. (page 4, 27)

Ravallion, M. (2003). Measuring Aggregate Welfare in Developing Countries: How
  Well Do National Accounts and Surveys Agree? The Review of Economics and Statistics,
  85 (3), 645–652. (page 4, 17, 27)

Ravallion, M. (2013). How Long Will It Take To Lift One Billion People Out of Poverty? The
  World Bank Research Observer, 28 (2), 139–158. (page 4)

Ravallion, M. and Chen, S. (2003). Measuring Pro-Poor Growth. Economics Letters, 78 (1),
  93–99. (page 8)

Rubin, D. B. (1976). Inference and Missing Data. Biometrika, 63 (3), 581–592. (page 11)

Rubin, D. B. (2004). Multiple Imputation for Nonresponse in Surveys, vol. 81. John Wiley &
  Sons. (page 11)



Sandefur, J. and Subramanian, A. (2020). The IMF’s Growth Forecasts for Poor
  Countries Don’t Match Its COVID Narrative. Center for Global Development Working
  Paper 533. (page 25)

Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. CRC Press. (page 11)

Stone, M. (1974). Cross-Validatory Choice and Assessment of Statistical Predictions.
  Journal of the Royal Statistical Society: Series B (Methodological), 36 (2), 111–133. (page 12,
  33)

Subramanian, A. (2019). India’s GDP Mis-Estimation: Likelihood, Magnitudes,
  Mechanisms, and Implications. Center for International Development Working Paper Series
  No. 354. (page 25)

Sumner, A. and Hoy, C. (2022). The End of Global Poverty: Is the UN Sustainable
  Development Goal 1 (Still) Achievable? Global Policy. (page 4)

Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the
  Royal Statistical Society: Series B (Methodological), 58 (1), 267–288. (page 11)

Varma, S. and Simon, R. (2006). Bias in Error Estimation when Using Cross-Validation
  for Model Selection. BMC Bioinformatics, 7 (1), 1–8. (page 12, 33)




A       Evaluation of model performance
In order to tune the machine learning algorithms we use, select among the dif-
ferent modeling methods presented in Table 1, and report the ﬁnal out-of-sample
errors, we rely on nested 5-fold cross validation (Stone, 1974; Iizuka et al., 2003;
Varma and Simon, 2006). To this end, we split our data into ﬁve folds (the outer
folds) and split each of these ﬁve folds into ﬁve folds themselves (the inner folds).
Hence, each datapoint is part of two nested folds. For each machine learning al-
gorithm, we tune the models using the inner folds and use the outer folds to select
among the different methods and report the out-of-sample errors.
    In more detail, we take four of the five outer folds, and on this subset, run a
grid of models with various tuning parameters on four of the five inner folds. We
estimate the error on the fifth inner fold, and repeat, sequentially leaving out a
fifth of the already subsetted data. This allows us to find optimal tuning
parameters without having touched the fifth outer fold. We repeat this five times holding
out a different outer fold, thus getting five different optimal tuning parameters,
each of which has not used one outer fold. Thus far we have run 5 ∗ 5 ∗ g models,
where g is the size of the tuning grid. Subsequently, we take the average of the
five optimal tuning parameters, run the optimal model on four of the five outer
folds, estimate the error of that model on the held-out outer fold, and repeat over
the five outer folds. This gives us the out-of-sample error of the tuned model.5
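
The procedure above can be sketched as follows. This is only an illustration under stated assumptions: the model is a one-feature ridge regression with a single tuning parameter, the data are simulated, the inner folds are formed by re-splitting the four retained outer folds, and all function names are hypothetical. It is not the implementation used for the results in this paper.

```python
import random


def make_folds(indices, k, rng):
    """Shuffle indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_ridge(x, y, idx, lam):
    """One-feature ridge regression without intercept (closed form)."""
    sxy = sum(x[i] * y[i] for i in idx)
    sxx = sum(x[i] * x[i] for i in idx)
    return sxy / (sxx + lam)


def mean_abs_error(x, y, idx, slope):
    return sum(abs(y[i] - slope * x[i]) for i in idx) / len(idx)


def cv_error(x, y, folds, lam):
    """Average held-out error over the folds for a fixed tuning parameter."""
    errors = []
    for j, held_out in enumerate(folds):
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        errors.append(mean_abs_error(x, y, held_out, fit_ridge(x, y, train, lam)))
    return sum(errors) / len(errors)


def nested_cv(x, y, grid, k=5, seed=0):
    rng = random.Random(seed)
    outer = make_folds(range(len(x)), k, rng)
    best_params = []
    for j in range(k):
        # Tune using only the four remaining outer folds; fold j is never touched here.
        subset = [i for m, fold in enumerate(outer) if m != j for i in fold]
        inner = make_folds(subset, k, rng)
        best_params.append(min(grid, key=lambda lam: cv_error(x, y, inner, lam)))
    lam_bar = sum(best_params) / k  # average of the five optimal tuning parameters
    # Fit on four outer folds, evaluate on the held-out one, repeat over all five.
    return lam_bar, cv_error(x, y, outer, lam_bar)


# Simulated data (illustrative only): y = 2x + noise.
data_rng = random.Random(1)
x = [data_rng.uniform(-1, 1) for _ in range(100)]
y = [2 * xi + data_rng.gauss(0, 0.5) for xi in x]
lam_bar, oos_error = nested_cv(x, y, grid=[0.0, 0.1, 1.0, 10.0])
print(lam_bar, oos_error)
```

The tuning loop fits 5 ∗ 5 ∗ g models (five outer iterations, five inner folds, g grid values), and the final evaluation adds five more fits.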
    Nested k-fold cross-validation avoids a problem with using regular k-fold
cross-validation for tuning and model selection at the same time. If regular
k-fold cross-validation is used for both of these tasks, then there will be a bias towards
selecting a model with a large tuning grid and a downward bias in the final
out-of-sample errors obtained. To see this, suppose we want to compare two machine
learning methods that in expectation perform equally well. Suppose further
that one of the methods can be run with 50 different tuning parameter options
(g = 50) that all perform equally well in expectation while the other method has
no tuning parameters (g = 1). When using cross-validation for model selection
and model evaluation at the same time, the method with tuning parameters is
likely to be chosen 50 out of 51 times instead of half of the time. In addition, the
error of the final model is likely to be unrealistically low, since it by chance was the
best performing of all iterations. If we instead had evaluated the methods on data not
used for the model selection, the out-of-sample error would not be downward
biased.

    5 For random forests, rather than creating inner folds, we rely on the out-of-bag error. The
out-of-bag error leverages the fact that each tree in a forest is only based on a subset of observations.
To get an out-of-sample error for an observation, the trees in which an observation did not figure
are utilized. This avoids having to run 5 ∗ 5 ∗ g + 5 models, and rather only 5 ∗ g + 1 models.
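
The thought experiment with g = 50 and g = 1 can be checked with a small simulation. The sketch below assumes that every candidate model has the same true error of 1.0 and that each cross-validated error estimate is that value plus independent normal noise with standard deviation 0.1; these numbers and the function name are illustrative assumptions only.

```python
import random


def simulate_selection(n_sims=5000, g=50, seed=0):
    """Share of runs in which the tuned method wins, and mean error of the winner."""
    rng = random.Random(seed)
    tuned_wins = 0
    selected_error_sum = 0.0
    for _ in range(n_sims):
        # All candidates share the same expected error; only the noise differs.
        tuned = [rng.gauss(1.0, 0.1) for _ in range(g)]
        untuned = rng.gauss(1.0, 0.1)
        best_tuned = min(tuned)
        if best_tuned < untuned:
            tuned_wins += 1
        selected_error_sum += min(best_tuned, untuned)
    return tuned_wins / n_sims, selected_error_sum / n_sims


share_tuned, mean_selected_error = simulate_selection()
print(share_tuned, mean_selected_error)
```

Under these assumptions the tuned method is selected in roughly 50 out of 51 runs, and the reported error of the selected model is well below the true error of 1.0, which is the downward bias that nested cross-validation is meant to avoid.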
     An alternative to using nested k-fold cross-validation could be to reserve the
last spell for each country, apply cross-validation on the remaining data to tune
models, and use the last spell for choosing methods and reporting predictive
performance. Reserving the last spell of each country mimics what Bergmeir and
Benítez (2012) call last-block validation. This would leave us with only about 150
data points to evaluate the error, of which more than half have a poverty rate less
than 3%. We think this would result in high uncertainty around the final error,
which might lead us to choose the wrong method (Bergmeir and Benítez, 2012).
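
A minimal sketch of such a last-block split is shown below. The records and the function name are hypothetical; each record is a (country, survey year, poverty rate) tuple, and the most recent record of each country is reserved for evaluation.

```python
def last_block_split(spells):
    """Hold out each country's most recent record; keep earlier ones for tuning."""
    latest = {}
    for country, year, _ in spells:
        latest[country] = max(year, latest.get(country, year))
    train = [s for s in spells if s[1] < latest[s[0]]]
    test = [s for s in spells if s[1] == latest[s[0]]]
    return train, test


# Hypothetical (country, survey year, poverty rate) records, for illustration only.
spells = [
    ("A", 2010, 0.30), ("A", 2014, 0.25), ("A", 2018, 0.21),
    ("B", 2012, 0.05), ("B", 2017, 0.03),
    ("C", 2015, 0.12),
]
train, test = last_block_split(spells)
print(len(train), len(test))
```

Each country contributes exactly one record to the evaluation set, which is why the evaluation sample is limited to roughly the number of countries.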
     An assumption necessary for regular and nested cross-validation to work is
that the residuals are i.i.d (Arlot and Celisse, 2010). This assumption is violated if
there are spatial or temporal patterns in the dataset unaccounted for by the mod-
els. This assumption is likely violated in our methods, particularly so when we
predict the levels of an outcome variable. The reason for this is simple: given that
a country is represented repeatedly in our data, a country’s residual in year t will
likely not be independent of that same country’s residual in t − 1. This could oc-
cur, for example, if part of a country’s welfare aggregate, such as durable goods,
is systematically excluded. This country’s poverty rates would all be higher than
what we would expect had the welfare aggregate been constructed with durable
goods. Supposing none of our features are able to fully pick up this country-
speciﬁc idiosyncrasy, the residuals from this country’s poverty rate predictions
will not be i.i.d. When predicting changes or growth in an outcome variable, it is
likely that some of the spatial and temporal patterns disappear, meaning that the
i.i.d assumption is less of a concern.
     A method to partially overcome the issue of the residuals not being i.i.d would
be to use a temporal block cross-validation – assigning observations to folds in
blocks of time – or a spatial block cross-validation – assigning all estimates for a
country to the same fold. Yet, to the extent that our models are able to explain at
least part of the country- or time-speciﬁc peculiarities, such models would likely
result in less relevant nowcasts. Continuing the example from before, suppose
that our models actually were able to ﬁnd a variable that predicts the relatively
high poverty in our hypothetical country where durable goods are excluded from
the welfare aggregate. Arguably, our nowcasts for this country should include
this country-specific bias, since – had an actual poverty rate been available for
the country in the nowcasting year – it would likely have continued the
country-specific idiosyncrasy. If using spatial block cross-validation, this country would
not have data in the estimation folds and held-out folds at the same time, meaning
that the models that utilize the variable particular to the country’s idiosyncrasies
will perform poorly (supposing it does not work well for other countries),
and a model not utilizing this variable will likely be chosen.
    It is not clear whether this is attractive or not. On one hand, one could argue
that if we want to predict some latent unobserved poverty rate of which actual
estimates are only a noisy signal, then a nowcast which ignores potential biases
from this signal is preferred. On the other hand, the prior estimates of poverty
often carry some authority and are treated as the ground-truth by governments.
Hence continuing the country-speciﬁc particularities into the nowcasting year
may be preferable. Since we are of the latter belief, we will not use blocked cross-
validation. Bergmeir et al. (2018) have shown that as long as the data are ﬁtted
well by the model, cross-validation without any modiﬁcation works well in prac-
tice. This gives us at least some reassurance that i.i.d violations may not be too
problematic.
    That said, when predicting poverty for a country without any prior poverty
estimates, we are likely to get more accurate estimates if using a spatial block
cross-validation. For those countries, models which fit to country-specific
peculiarities are likely to work less well than models that are evaluated on
data fully based on new countries, akin to what a spatial block cross-validation
does. Since we are more concerned with nowcasting poverty for countries with
prior estimates (where, as we have argued, we think blocked spatial cross-validation
is not preferable) than predicting poverty for countries without prior poverty
estimates (which concerns less than 2% of the world’s population), we think this
speaks in favor of not using spatial block cross-validation.
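
For reference, the difference between the regular fold assignment we use and the spatial block assignment discussed above can be sketched as follows. The country labels and function names are hypothetical and serve only to show how the two assignments differ.

```python
import random


def spatial_block_folds(countries, k=5, seed=0):
    """Assign each country, and hence all of its estimates, to exactly one fold."""
    unique = sorted(set(countries))
    random.Random(seed).shuffle(unique)
    fold_of_country = {c: i % k for i, c in enumerate(unique)}
    return [fold_of_country[c] for c in countries]


def random_folds(n, k=5, seed=0):
    """Regular k-fold assignment: observations are dealt to folds at random."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for position, i in enumerate(order):
        folds[i] = position % k
    return folds


# Hypothetical country label for each poverty estimate (illustrative only).
countries = ["A", "A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "F"]
blocked = spatial_block_folds(countries)
regular = random_folds(len(countries))
print(blocked)
print(regular)
```

With the blocked assignment a country never appears in the estimation folds and the held-out fold at the same time, so country-specific idiosyncrasies cannot be learned from that country's own past estimates.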




B     Further results and robustness checks

Figure B.1: Performance of methods predicting level/growth in mean and Gini




Notes: carf = CART random forest, cirf = conditional inference random forest, grbo = gradient
boosting, plas = post-lasso, rlas = regular lasso.




                Figure B.2: Variables important for predictions (1/2)
                                        (a) Poverty rates




                                  (b) Changes in poverty rates




                                     (c) Mean consumption




Notes: The variable importance measure comes from the conditional inference random forest and
measures the total decrease in residual sum of squares from splitting on the variable, averaged
over all trees. The importance measure is standardized such that the variable with the highest
value gets a value of 1.




                Figure B.3: Variables important for predictions (2/2)
                                       (a) Gini coefﬁcient




                                 (b) Growth in Gini coefficient




Notes: The variable importance measure comes from the conditional inference random forest and
measures the total decrease in residual sum of squares from splitting on the variable, averaged
over all trees. The importance measure is standardized such that the variable with the highest
value gets a value of 1.




   Figure B.4: Performance of various methods using only comparable spells




Notes: carf = CART random forest, cirf = conditional inference random forest, grbo = gradient
boosting, plas = post-lasso, rlas = regular lasso.




   Figure B.5: Performance of subset of methods using only comparable spells




Notes: The ﬁgure compares errors of six different models. The best performing method takes
the minimum bar from Figure B.4 while the best method predicting poverty rates directly takes
the minimum bar of panel A of Figure B.4. Using only GDP growth to shift the mean refers to
predicting poverty by adjusting mean consumption by the growth in GDP per capita. The right-
most column reﬂects a hypothetical scenario of the errors one would get if one was perfectly able
to predict growth in the mean.




           Figure B.6: Performance of various methods using the RMSE




Notes: carf = CART random forest, cirf = conditional inference random forest, grbo = gradient
boosting, plas = post-lasso, rlas = regular lasso.




           Figure B.7: Performance of subset of methods using the RMSE




Notes: The ﬁgure compares errors of six different models. The best performing method takes
the minimum bar from Figure B.6 while the best method predicting poverty rates directly takes
the minimum bar of panel A of Figure B.6. Using only GDP growth to shift the mean refers to
predicting poverty by adjusting mean consumption by the growth in GDP per capita. The right-
most column reﬂects a hypothetical scenario of the errors one would get if one was perfectly able
to predict growth in the mean.



