Policy Research Working Paper                        10836




     Measuring the Upper Tail of the Income
           and Wealth Distributions
                                     Andrew Kerr
                                     Mxolisi Zondi




Poverty and Equity Global Practice
July 2024


  Abstract
 This paper describes the challenges of accurately measuring the upper tail of the income and wealth
 distributions in low- and middle-income countries. It reviews the seminal contributions in the
 literature on measuring the top of the income distribution and then more recent work, focusing on the
 challenges in doing so and the solutions that have been proposed. The paper focuses mostly on incomes,
 but it devotes a specific section to examining measurement of the top of the wealth distribution and
 the key differences compared with measuring income. It then identifies gaps in the literature and the
 implications for those undertaking research in this area. Finally, the paper proposes a research
 agenda to close the identified key gaps for practitioners, distinguishing between steps that can be
 undertaken with current datasets and those that require new data.




 This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to
 provide open access to its research and make a contribution to development policy discussions around the world. Policy
 Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted
 at andrew.kerr@uct.ac.za.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
Measuring the Upper Tail of the Income and Wealth Distributions
Andrew Kerr1 and Mxolisi Zondi2




JEL Classification: D31, C81

Keywords: Income Distribution, Wealth Distribution, Top tail, Measurement, Household Surveys




1
 School of Economics and DataFirst, University of Cape Town. andrew.kerr@uct.ac.za
2
 Research Unit on the Economics of Excisable Products, University of Cape Town
We thank Utz Pape, Jed Friedman and Matthew Wai-Poi from the World Bank for helpful comments
and suggestions. Mxolisi Zondi was employed by DataFirst at the University of Cape Town when he
worked on this project.
Introduction
This paper describes the challenges of accurately measuring the upper tail of the income and wealth
distributions in low- and middle-income countries. We first review the seminal contributions in the
literature on measuring the top of the income distribution and then more recent work, focusing on
the challenges in doing so and the solutions that have been proposed. We focus initially on incomes,
but then devote a specific section to examining measurement of the top of the wealth distribution
and the key differences compared with measuring income. We then identify gaps in the literature and
the implications for those undertaking research in this area. Finally, we propose a research agenda to
close the identified key gaps for practitioners, distinguishing between steps that can be undertaken
with current datasets and those that require new data.

Kuznets and Jenks’s (1953) landmark study brought together tax data, newly created national
accounts and population estimates for the United States to estimate shares of income received by the
top percentiles. Similar studies were subsequently undertaken in other countries, but the top of the
distribution was then neglected until interest revived in the last 20 years, thanks to the work of
Piketty (2001, 2003). Following Piketty’s work, researchers have used tax tabulations from official
reports, and the availability of tax microdata, also used by Piketty, has given further impetus to the
research agenda of investigating the top of the income and wealth distributions. A 2022 special issue
of the Journal of Economic Inequality, entitled “Finding the Upper Tail”, shows that this is an area of
active research. We draw on several of the articles from that special issue below, as well as on
Lustig (2020), when discussing the challenges that arise when measuring the top tail of the income
and wealth distributions and the proposed solutions.

Atkinson (2007) provided two motivations for better measurement of the top tail of the income
distribution: to improve knowledge of the capacity of states to raise extra revenue through increased
taxation levels and to measure the extent to which those with high incomes have command or power
over other people, including the ability to pay for what Atkinson calls “elite separation”, which he
emphasized was a bad thing.

Another motivation for better measurement of the top tail of the income distribution is that while
much research acknowledges that the top tail is not well measured in surveys, the extent to which it
is under-captured is not known. One common but very approximate indicator of the problem, available
for many countries, is the difference between total income or consumption in surveys and in national
accounts (Lakner and Milanovic, 2016). Prydz et al. (2022) show that these differences are substantial
and argue that under-estimates of income or consumption in the top tail are part of the explanation,
although how much of the difference under-estimates of the top tail account for, and how this may vary
across countries or time, is not yet clear. The research
discussed in this paper provides evidence from a number of countries that better measurement of the
top tail increases the estimated shares of the top of the distribution and thus can potentially explain
much of the difference between survey and national accounts estimates of total income or
consumption.

Seminal papers
In the absence of representative survey data on income, Kuznets and Jenks’s (1953) pathbreaking
study used aggregated data from tax statistics in the US to measure incomes at the top of the
distribution, as well as population and total national income from national accounts as controls.
Kuznets (1963) compiled similar information from a variety of sources for 18 countries, including 11
developing countries.3 His conclusion was that the share of the top 5% was much larger in
developing countries than in developed countries, although the share of the bottom 60% was similar
in developed and developing countries. Since some of the data came from tax statistics, shares for the
middle and bottom of the distribution were sometimes imputed using parametric methods, for
example by estimating the Pareto coefficient for the top of the income distribution using tax data and
using this to impute the shares for different parts of the rest of the distribution (Kuznets, 1963:15).
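
The kind of Pareto-based calculation described above can be illustrated with a minimal sketch. It is not the procedure used by Kuznets (1963); it only assumes a Pareto upper tail, for which the number of taxpayers above income y is proportional to y^(-alpha). The tax table, the group size and the national-accounts total below are hypothetical, as are the function names.

```python
import math

def pareto_alpha(y_low, n_low, y_high, n_high):
    """Pareto coefficient implied by taxpayer counts above two thresholds.

    Under a Pareto tail the count above income y is proportional to
    y ** (-alpha), so alpha = log(n_low / n_high) / log(y_high / y_low).
    """
    return math.log(n_low / n_high) / math.log(y_high / y_low)

def threshold_for_count(y_low, n_low, alpha, n_target):
    """Income threshold above which n_target taxpayers are expected."""
    return y_low * (n_low / n_target) ** (1.0 / alpha)

def mean_above(threshold, alpha):
    """Mean income above a threshold under a Pareto tail (needs alpha > 1)."""
    if alpha <= 1:
        raise ValueError("mean is undefined for alpha <= 1")
    return threshold * alpha / (alpha - 1.0)

# Hypothetical tax table: 100,000 filers above 50,000 and 25,000 above 100,000.
alpha = pareto_alpha(50_000, 100_000, 100_000, 25_000)

# Threshold and total income of the top 10,000 filers (illustrative group size).
top_threshold = threshold_for_count(50_000, 100_000, alpha, 10_000)
top_total = 10_000 * mean_above(top_threshold, alpha)

# Hypothetical national-accounts control total for household income.
national_income = 20_000_000_000
top_share = top_total / national_income
print(f"alpha={alpha:.2f}, threshold={top_threshold:,.0f}, share={top_share:.3f}")
```

The sketch shows why tabulated tax data plus a national income control total are enough to produce a top share: the two tabulated points pin down the Pareto coefficient, and the coefficient gives the mean income above any threshold in the tail.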

Recent Literature
Piketty et al. (2022) note that after Kuznets’ work in the 1950s and 1960s, household surveys and the
microdata produced from them became much more common. This meant that the entire income
distribution could be examined, rather than the top (or top and middle) only, encouraging research
on poverty as well as inequality in many more countries that undertook representative household
surveys. As a result, the top end of the income distribution was mostly neglected for
several decades, in favor of index measures of inequality that, unlike Kuznets’ work, did not bring into
focus the very top end of the distribution (Milanovic, 2014).

The neglect of the top end of the income distribution ended with Piketty’s publication of long-run
estimates of top income and wealth shares for France for the 20th century (Piketty, 2001, 2003), which
reinvigorated research on the top end of the income and wealth distributions. Piketty (2003) used
data and methods when analysing inequality in France for most of the 20th century that were similar
to those used by Kuznets (1953): national accounts data, published tax tables with aggregated data
on taxpayers in various tax brackets and interpolation using the Pareto distribution. But Piketty
(2003:1007) notes that he could check the reliability of such methods because tax microdata was
available for later periods, in addition to the tax tables.

3
  Alvaredo and Atkinson (2022) note that similar research had already been undertaken for South Africa by
Frankel and Herzfeld (1943), a decade before Kuznets and Jenks' work for the United States, although this was
limited to whites only.

Piketty’s work stimulated the creation of the World Top Incomes Database (WTID) in 2011, which
created top income inequality statistics for 30 countries using similar data to that used in Piketty’s
original work (Alvaredo et al., 2016). WTID was later subsumed into the World Inequality Database,
which extended WTID to create Distributional National Accounts (DINA). The DINA project aims to
provide estimates of the distributions of income and wealth that are consistent with national accounts
and harmonized over time and countries (Alvaredo et al. 2016). To do so requires household survey
data, national accounts data and ideally either tax microdata or aggregated data. And since
comparisons of these data sources often show that household surveys do under-capture incomes and
wealth, some of the research on measuring the top tail of the income distribution and the challenges
of doing so that we discuss in the next section has come from researchers associated with the DINA
project directly.

The Extent of Income Inequality
A reasonable definition of the top tail of the income distribution is the top 10 percent. Using the World
Inequality Database (WID) we briefly describe the share of the top 10% of the income distribution,
using the most recent estimate for each region or country. Across all countries for which there is data,
and excluding extrapolated or interpolated statistics, the share of the top 10% is 47%. It is only 37% in
Europe but is 48% in the US, 56% in Sub-Saharan Africa, 58% in Latin America, 43% in China and 57%
in India. These estimates are from a variety of sources and are of varying quality, but the bottom line
is that the top of the distribution receives a very large share of total income. The next section discusses
the challenges of accurately capturing the top tail using survey and administrative data.

Challenges to Accurately Capturing the Top Tail in Household Surveys
In this section we document the challenges faced by researchers wanting to use household surveys to
measure the top tail of the income distribution. Following this, we focus on challenges when using
other types of data, mostly administrative data such as tax or social security records or house prices.
Administrative data have been used precisely because of the challenges encountered when using
household surveys. However, these types of administrative data are very rare in low-income countries
and are still unusual in middle-income countries. Thus, household surveys are still very important,
which is why we spend some time documenting the challenges associated with using them. In doing
so we follow recent papers by Lustig (2020) and Ravallion (2022), although we complement these
papers in a variety of ways.


In the discussion below we refer mostly to the challenges of measuring the top tail of the income
distribution. But most of the challenges we discuss apply to measuring the wealth distribution as well.
We provide a specific section on how these challenges differ for wealth below.

Unit non-response
One key challenge in measuring the top tail of the income distribution is unit non-response. This
means that a household was included in the sample, but no information was collected on that
household. The main reasons for unit non-response are refusals and non-contact, with refusals likely
to be more common (Ravallion, 2022). Unit non-response is usually classified as being missing
completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR) (Lohr,
2009). These mean, respectively, that non-response is ignorable, that non-response is ignorable
conditional on some covariates or that it is not ignorable, i.e., that it is correlated with the outcome
of interest, here income or wealth.
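
The practical difference between ignorable and non-ignorable non-response can be seen in a small simulation. This is a sketch under stated assumptions rather than evidence from any survey: the lognormal income distribution, the response probabilities and the step-shaped response function are all hypothetical choices made for illustration.

```python
import random

def top_share(incomes, top_fraction=0.10):
    """Share of total income received by the top `top_fraction` of units."""
    ranked = sorted(incomes, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)

rng = random.Random(0)

# Hypothetical population with a right-skewed (lognormal) income distribution.
population = [rng.lognormvariate(7.0, 1.0) for _ in range(50_000)]

# MCAR: every household responds with the same probability.
mcar_sample = [y for y in population if rng.random() < 0.7]

# NMAR: the response probability depends on income itself
# (assumed step function, for illustration only).
def response_prob(income):
    return 0.9 if income < 3_000 else 0.3

nmar_sample = [y for y in population if rng.random() < response_prob(y)]

true_share = top_share(population)
mcar_share = top_share(mcar_sample)
nmar_share = top_share(nmar_sample)
print(f"true={true_share:.3f}  MCAR={mcar_share:.3f}  NMAR={nmar_share:.3f}")
```

Under MCAR the respondents remain a random subset of the population, so the estimated top 10% share stays close to the population value. When the probability of responding falls with income, the respondents under-represent the top and the estimated share is biased downwards unless the weights are corrected.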

We did not find a comprehensive overview of unit non-response levels and trends for the World Bank’s
Living Standards Measurement Surveys (LSMS). Scott et al. (2005) did review response rates from 8
LSMSs in the late 1990s and early 2000s, finding a non-response rate of 11%. Vaessen et al. (2005)
reported an average non-response rate of 2.5% to the Demographic and Health Surveys conducted in
44 developing countries between 1990 and 2000. This, plus the fact that questions about incomes are
probably no more sensitive than questions about health, suggests that unit non-response
rates to surveys that include questions on incomes are likely to be much lower in low-income countries
than in rich countries. Hlasny (2020) uses surveys collated and harmonized by the Luxembourg Income
Study for 38 middle- and high-income countries to show that non-response rates are correlated with
GDP per capita. Countries with PPP GDP per capita of around $10,000 had an average non-response
rate of around 10%, whereas this was around 25% for those with per capita incomes of around
$35,000.

When measuring the top of the income distribution, the concern is the unit non-response rate at the
top of the distribution. Ravallion (2022) notes that direct evidence that rich people are missing from
surveys is scarce. As evidence that they are missing, he cites a paper by Székely and Hilgert (2007),
who used household survey data on 18 Latin American countries and showed that the highest-income
individual in each survey had an income similar to that of managers in medium-sized
companies, meaning that the top of the distribution was missing. Ravallion (2022) notes that one
reason that the rich are assumed to be missing is that rich people have a higher opportunity cost of
time and may therefore be less likely to respond. Choumert-Nkolo et al. (2023) focus on non-response
and assert that high-income households in low- and middle-income countries (LMICs) often live in gated
communities that are much harder to access, resulting in higher non-response rates through
non-contact, although they do not provide evidence for this (reasonable) assertion.

Kennickel (2019) shows that non-response rates in the US Survey of Consumer Finances are much
higher for those at the very top of the income distribution. The survey has two sample frames, one
derived using a regular household survey frame and another for rich households, which is derived
from tax filer data. For the richest stratum of the tax filer derived sample frame, described as “a group
in the upper reaches of the top 1%”, the response rate in 2013 was less than 10%, compared to 66%
in the sample obtained from the regular household sample frame (Kennickel, 2019: 446).

Hlasny and Verme (2018) used data from the Egyptian Income and Expenditure Survey to show that
unit non-response rates are higher in Egyptian governorates with higher mean income per capita. The
National Income Dynamics Study (NIDS) in South Africa obtained a much higher unit non-response rate
than the previously mentioned LMIC surveys: 31%. Non-response rates also varied by the predominant
racial group in the primary sampling unit (PSU). Predominantly white areas had a non-response
rate of 64% while predominantly black areas had a non-response rate of 24% (Leibbrandt
et al. 2009). Since race is strongly positively correlated with income in South Africa, the implication is
that non-response rates are increasing with income, which is a concern when measuring the top of
the income distribution.

Item non-response
We have noted that refusals or non-contacts may limit how well one measures the top of the income
distribution if the rich have high non-response rates. But those individuals who do respond to a survey
may not answer specific questions about incomes, wealth or consumption, since these are more
sensitive. This issue is called item non-response. Like unit non-response, this can be missing
completely at random, or it may be related to other individual characteristics or income levels
themselves. If the latter, then this is a form of selection bias, like the potential bias of unit non-
response.

Bollinger et al. (2019) used linked survey and administrative earnings data from the US to show that
item non-response is not missing completely at random: it is highest at low and high earnings. Such
studies require substantial resources and are thus unfortunately uncommon in LMICs. However,
Flachaire et al. (2022) did obtain such linked survey and income tax data from Uruguay. The authors
show that item non-response was around 10% in the surveys and that it was highest at the bottom of
the income distribution, a surprising result that, taken at face value, suggests that item non-response
may not be as much of a concern when measuring the top of the income distribution as one might
expect.

Many surveys ask for a reason for the item non-response. Both NIDS and Statistics South Africa
household surveys allow “refusal” and “don’t know” answers to earnings questions. In the Statistics
South Africa surveys, “don’t know” responses are overwhelmingly given by proxy respondents, who
are answering on behalf of another household member. In some sense, then, “don’t know” item
non-response is similar to non-contact unit non-response, except that it is a specific household member
who cannot be contacted to answer questions that the responding household member cannot answer on
their behalf.

Lepkowski (2005) and Si et al. (2023) highlight that item non-response for total individual or household
income or wealth may arise from a mix of response and non-response across the several questions asked on
each subject for the same person or household, making measurement of total income, consumption
or wealth even harder. In wave one of South Africa’s National Income Dynamics Study (NIDS) in
2008, 41% of households had at least one of a detailed set of consumption items missing (Finn et al.
2009). We discuss solutions to item non-response in the sections that follow.



Measurement error
Even if individuals respond to a survey and to the questions on incomes, there is still a concern that
the true income of household members is not accurately captured. The obvious example of
measurement error when thinking about the top tail of the income distribution is under-reporting by
respondents who do not want to reveal their true incomes, perhaps because of concerns that these
will be reported to tax collection agencies (such data sharing is illegal in some countries). Other types
of measurement error affecting the top tail could occur when survey questions ask about gross
income, but respondents report income after tax or even after other deductions such as medical
insurance or pension fund contributions (which could be important in the top tail even in low-income
countries and more important in middle-income countries).

It is hard to find direct evidence of under-reporting because this requires matched survey and
administrative data. We noted above that Flachaire et al. (2022) obtained such data for Uruguay. The
authors showed that under-reporting in the household survey they used is, on average, highest at the
top of the tax income distribution, as might be expected, but that there is over-reporting at the bottom
of the distribution. In the top five percentiles, the tax data values are, on average, around 2 to 2.5
times as large as the values reported in the surveys. This is novel evidence that under-reporting is
indeed an important issue when measuring the top tail of the income distribution, although the data
requirements are so exacting that there are, to our knowledge, no other similar studies for LMICs. Bollinger
et al. (2019) also find similar patterns of measurement error in earnings in the US Current
Population Survey (CPS).


Sparseness
Household surveys used to measure the top tail of the wealth or income distribution may have a
sample of only a few thousand households. This makes it possible that no very high-income
households or individuals are sampled, meaning one has a right-truncated distribution. And when such
individuals do appear in the sample, they may look like erroneous outliers (Jenkins, 2017; Lustig,
2020). This can lead to volatility when measuring the top tail of the distribution over time within
countries (Burkhauser et al., 2017) or across countries, meaning it is harder to discern trends or make
cross-country comparisons. But some outliers are genuinely erroneous, so discerning which data are
real and which are not is important.
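
Both consequences of sparseness, missing top households and volatile top shares, can be illustrated with a short sketch. It assumes simple random sampling from a very large population and a Pareto income distribution with a coefficient of 3; the sample sizes, number of replications and tail fraction are hypothetical choices, not values from any survey discussed here.

```python
import random

def top_share(incomes, top_fraction):
    """Share of total income received by the top `top_fraction` of units."""
    ranked = sorted(incomes, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)

def prob_none_sampled(n, tail_fraction):
    """Probability that a simple random sample of n units contains nobody
    from the top `tail_fraction` of a very large population."""
    return (1.0 - tail_fraction) ** n

def share_spread(n, replications, rng, alpha=3.0):
    """Range (max - min) of the estimated top 1% share over repeated samples."""
    shares = [
        top_share([rng.paretovariate(alpha) for _ in range(n)], 0.01)
        for _ in range(replications)
    ]
    return max(shares) - min(shares)

rng = random.Random(1)

# Chance that a 2,000-household sample misses the top 0.01% entirely.
p_missing = prob_none_sampled(2_000, 0.0001)

# Volatility of the top 1% share in small versus large samples.
small_spread = share_spread(2_000, 20, rng)
large_spread = share_spread(50_000, 20, rng)
print(f"P(no top 0.01% household)={p_missing:.2f}")
print(f"spread n=2,000: {small_spread:.3f}  spread n=50,000: {large_spread:.3f}")
```

With a few thousand households the very top is usually absent, and whether one or two high-income households happen to be drawn moves the estimated top share noticeably from one sample to the next.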

Proxy respondents
We highlighted above that item non-response can result from proxy respondents answering “don’t
know” to questions on incomes of other household members. Many household surveys, including the
LSMS, ask most or many questions to the most knowledgeable household member (Kilic et al. 2021,
Doss et al. 2008) or, in the US CPS, to any one household member (Bollinger et al. 2019), rather than
to each member individually.4 In discussing the measurement of consumption, Deaton (2005) notes
that proxy respondents may also answer incorrectly or under-report and that this is likely to be worse
for richer households, where the responding household member may not know about all consumption
of other household members. This means proxy respondents may introduce additional measurement
error, beyond that discussed above.

Hasanbasri et al. (2022) showed that measured wealth inequality is higher when questions about
individual asset ownership are asked to each household member individually, rather than to a
most knowledgeable household member. This research suggests that whether data was collected
from each member or from one member in a household survey will matter when measuring the top
tail of the wealth distribution.




4
  The South African National Income Dynamics Study is one example of a survey which attempted to interview
each household member separately as well as obtaining household information from the oldest woman in the
household. A shorter proxy respondent questionnaire was completed if the individual could not be
interviewed.

Data processing
How survey organizations process the data obtained from respondents is also an important issue for
measuring the top tail of the income distribution. The main concern is top-coding, where the survey
organization does not provide the actual income for individuals close to the top of the distribution but
provides the value of some percentile for all respondents above that value, usually as a way of
mitigating privacy concerns. The extent of top-coding can be large: Burkhauser et al. (2012) document
that 4.6% of individuals in the US CPS live in households in which some source of income was top
coded.
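
The mechanical effect of top-coding on a top income share can be shown with a toy example. The ten incomes and the choice of the 90th percentile as the top-code are hypothetical, and the helper names are illustrative; actual top-coding rules differ across surveys and income sources.

```python
def top_share(incomes, top_fraction=0.10):
    """Share of total income received by the top `top_fraction` of units."""
    ranked = sorted(incomes, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)

def top_code(incomes, percentile):
    """Replace every value above the chosen percentile with that percentile's value."""
    ranked = sorted(incomes)
    cap = ranked[int(percentile * (len(ranked) - 1))]
    return [min(y, cap) for y in incomes], cap

# Hypothetical incomes with one very high earner.
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]

coded, cap = top_code(incomes, 0.90)
share_before = top_share(incomes)  # 1000 out of a total of 1450
share_after = top_share(coded)     # 90 out of a total of 540
print(f"cap={cap}, top 10% share before={share_before:.3f}, after={share_after:.3f}")
```

Because the top-code replaces exactly the values that drive the top share, even a small fraction of top-coded observations can change the measured share substantially.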

In LMICs, where statistical capacity may not be as good as in rich countries, there are other data
processing concerns. McLennan et al. (2021) show that in the Tanzanian Household Budget Survey
(HBS) public data there are very large labor income values generated from individuals reporting high
hourly rates of earnings, which are then multiplied by 8 (hours in a day) and then by 22 days in a
month. These errors are so large that the implied tax revenue is substantially higher than actual tax
revenue collected. McLennan et al. (2021) highlight that the public release documentation for the
Tanzanian HBS contains no information about how such cases are treated. How these were processed
and whether enough data is released publicly to allow alternative assumptions will matter when
measuring the top tail of the distribution.
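
The arithmetic behind this concern is simple: converting an hourly rate to a monthly figure multiplies it by 8 x 22 = 176, so any error in a reported hourly rate is scaled up by the same factor. The sketch below applies that conversion and then compares the tax revenue implied by the survey with an administrative total. The hourly rates, survey weights, flat tax rate and revenue figure are all hypothetical and are not taken from McLennan et al. (2021) or from the Tanzanian HBS.

```python
HOURS_PER_DAY = 8    # conversion factors described in the text
DAYS_PER_MONTH = 22

def monthly_from_hourly(hourly_rate):
    """Monthly labor income implied by an hourly rate."""
    return hourly_rate * HOURS_PER_DAY * DAYS_PER_MONTH

def implied_annual_tax(monthly_incomes, weights, flat_rate):
    """Annual tax revenue implied by weighted survey incomes
    under a hypothetical flat tax rate."""
    annual_income = sum(12 * y * w for y, w in zip(monthly_incomes, weights))
    return flat_rate * annual_income

hourly_rates = [2.0, 3.0, 500.0]   # hypothetical reported hourly earnings
weights = [1_000, 1_000, 1_000]    # hypothetical survey weights
monthly = [monthly_from_hourly(h) for h in hourly_rates]

implied = implied_annual_tax(monthly, weights, flat_rate=0.10)
actual_revenue = 20_000_000        # hypothetical administrative total
ratio = implied / actual_revenue
print(f"monthly={monthly}, implied/actual revenue={ratio:.1f}")
```

A ratio well above one is the kind of external inconsistency that signals a data processing problem, although on its own it cannot say whether the high values are errors or genuine high earners.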

Some of the very large hourly earners in Tanzania might be erroneous outliers, but it is
possible they could be genuine high earners. The solution used by McLennan et al. (2021) is to cap
large values at the 99th percentile of the labor income distribution, which is not suitable when trying
to measure the top tail of the distribution.

A lack of documentation on this and other edits and decisions taken when processing data from
household surveys is likely to be a problem in other LMICs too. Kerr and Wittenberg (2021) note that earnings
are imputed in South Africa’s Quarterly Labour Force Surveys and that these imputations are of poor
quality, but neither the fact that they are undertaken nor how they are undertaken is mentioned in any public
documentation. The obvious recommendation is that survey organizations and statistical offices
should release as much documentation and data as possible without compromising respondent
anonymity.




Lack of income data in household surveys in some LMICs
A key challenge in measuring the top tail of the income distribution in LMICs is that many household
surveys do not collect data on incomes. This is mainly because reported incomes are seen as less
reliable than reported consumption in LMICs, where self-employment and/or agriculture
predominate, which makes accurate reporting of incomes difficult (Deaton, 1997). The concerns
about the reliability of income data collected in poor countries pertain mainly to measuring poverty.
It is possible that income data is more reliable at the top end than at the bottom because agricultural
income is likely to be less important at the top. However, self-employment income is still likely to be
important, although the extent to which more accurate record keeping in formal or at least larger
businesses translates into better income data in surveys is unclear.

Lack of common support
All the challenges discussed thus far can limit the number of observations with incomes close to the
top of the true income distribution. At worst, this can result in a lack of common support between the
top of the distribution in the household survey and the true distribution, meaning that there are simply
no individuals in the surveys with reported incomes near the top of the true income distribution
(Lustig, 2020; Ravallion, 2022). This implies the surveys will not accurately represent the top of the
income distribution. But it is also a concern because some of the solutions to the measurement
challenges that we discuss below require at least some individuals in the surveys at the top of the
distribution.

Table 1 summarizes the challenges described above, in a rough order of their importance when
measuring the top tail of the income or wealth distribution. However, we should stress that the order
is likely to vary in different countries and/or time periods and depend on the level of development of
the country concerned. In the section below outlining the research agenda, we further discuss which
challenges are more important and deserve further investigation.


 Table 1: Measurement Challenges for the Top Tail of the Income and Wealth Distributions

 Challenge                Description
 Unit non-response        High income or wealth individuals or households do not respond to the survey at all
 Item non-response        High income or wealth individuals or households respond to the survey but do not
                          respond to income or wealth questions
 Measurement error        High income or wealth individuals or households respond to the survey but
                          under-report their incomes or wealth
 Sparseness               The survey sample size is relatively small, and few or no high income or wealth
                          individuals are sampled
 Lack of common support   The challenges above mean that there are no individuals with income or wealth in
                          the top tail of the distribution
 Data processing          The survey organization creates measurement error that makes the measured top
                          tail unreliable
 Proxy respondents        High income or wealth individuals do not respond to the survey and other household
                          members answering on their behalf under-state their incomes or wealth


Ex ante solutions for the challenges when using household survey data
Following a distinction made by Ravallion (2022), in this section we discuss ex-ante solutions to
sparseness, unit non-response and item non-response – solutions that can be implemented in survey
design. In the next section we discuss ex-post solutions that use survey and administrative data to
address the general challenge of missing top incomes.

A summary of all the solutions to the challenges of measuring the top tail of the income and wealth
distributions reviewed in this paper, as well as a description of these solutions and key references, can
be found in Table 2 below.

 Table 2: Solutions to Measurement Challenges for the Top Tail of the Income and Wealth Distributions

          Solutions          Challenge   Data Type                 Description                    Key references
 Ex Ante  Oversample rich    Sparsity    Requires tax data with    Use tax data or population     Kennickel (2019)
          households                     address information or    census data to oversample
                                         population census or tax  rich households
                                         data with low geographic
                                         level identifiers

          Improve            Unit non-   None                      Require multiple visits,       Ravallion (2022)
          fieldworker        response                              improve fieldworker training
          protocols

          Questionnaire      Item non-   None                      Design questionnaire to        Lepkowski (2005)
          design             response                              obtain at least some income
          improvement                                              information for those who
                                                                   would otherwise not provide
                                                                   any.

 Ex Post  Reweighting        Unit non-   Survey data only          Use PSU or regional            Korinek et al. (2006),
                             response                              response rates plus an         Ravallion (2022)
                                                                   econometric method to
                                                                   adjust sample design surveys




                                                                                                      11
                                                                  Use admin data to adjust
                                                                  sample weights so that
                                                                                                  Campos-
                                             survey and admin     shares of high earners in the
             Reweighting                                                                          Vazquez and
                                             data                 surveys match the admin
                                                                                                  Lustig (2019)
                                                                  data. Requires common
                                                                  support.

                                                                  Use incomes of those who        Campos-
             Replacing-         Item non-
                                             survey data only     do respond to impute            Vazquez and
             imputation         response
                                                                  incomes for those who don't     Lustig (2019)

                                                                  Estimate parameters for (for
                                                                                                  Jenkins
                                                                  example) a Pareto
             Replacing-                                                                           (2017),
                                                                  distribution using the
             parametric         Various      survey data only                                     Hlasny and
                                                                  bottom X% of the survey
             distribution                                                                         Verme
                                                                  distribution, impute earnings
                                                                                                  (2018)
                                                                  for the top (1-x)%
                                                                                                  Alvaredo
                                                                                                  and
                                                                                                  Londono
                                                                                                  Velez (2013),
                                                                  Replace the top X% of the
                                                                                                  Bach et al.
                                                                  survey distribution using
                                             survey and admin                                     (2009),
             Replacing          Various                           admin data and either
                                             data                                                 Burkhauser
                                                                  parametric or non-
                                                                                                  et al. (2018),
                                                                  parametric methods.
                                                                                                  Czajka
                                                                                                  (2017), Van
                                                                                                  der Weide et
                                                                                                  al. (2018)
                                                                  Endogenously determine a
                                                                  merging income value,
                                                                  above which use admin data
             Reweighting and                 survey and admin                                     Blanchet et
                                Various                           to reweight survey data and
             Replacing                       data                                                 al. (2022)
                                                                  then replace survey with tax
                                                                  data to allow representivity
                                                                  at the very top


Sparseness
One solution to the lack of sampled households or individuals at the very top of the distribution is to
oversample such households. The US Survey of Consumer Finances aims to obtain estimates of the
wealth of households in the US. The survey has two independent samples – one that has a positive
probability of selection for all US households and a second that includes only the rich in the sample
frame, which is derived from tax filings (Kennickel, 2019). Although such a sample design does improve
the sample size at the top of the distribution and thus potentially solves the sparseness problem, the
response rates for the sample drawn from the rich sample frame are still very low, at less than
10% for the wealthiest (Kennickel, 2019). These very low response rates imply that sparseness could
still be an issue. Oversampling is not common in LMICs but is undertaken in some surveys, for
example those measuring wealth in Chile and Uruguay (Gandelman and Lluberas, 2023).

Unit non-response
We noted that a solution to sparseness is to oversample high-income individuals. If high-income
individuals are missing from the data because of unit non-response, then oversampling can also play a
role in ameliorating possible unit non-response bias.

Unit non-response can also be reduced if survey protocols require enumerators to visit a certain
number of times on different days before recording a non-contact (Ravallion, 2022). Another potential
solution to unit non-response that has been discussed in the literature is to provide incentives to
participate in the surveys. Ravallion (2022) points out that this is a concern, especially for measuring the
top tail of the income distribution, because the poor may be more likely to be persuaded than the
rich, which could make non-response bias worse. The impact of incentives on
response rates was investigated by Stecklov et al. (2018), who randomized a small monetary incentive
in an Indian household survey, finding that whilst it did improve response rates it also resulted in those
households receiving the incentive reporting lower consumption. The authors hypothesized that this
was due to respondents exaggerating their poverty in the hopes of future payments from the
surveyors.

Item non-response
Lepkowski (2005) notes that one way of reducing item non-response to income or wealth questions
that ask for a monetary value is to ask “unfolding bracket” questions when a respondent is unwilling
or unable to give a monetary value. Respondents are asked whether their income lies above
or below a certain value and, depending on the answer, further questions about the range in which
income lies are asked. This method was used in the South African National Income Dynamics Study. A simpler
version is a single question in which individuals are asked which bracket their income falls into. This
at least provides some indication of the level of income. It is used by Statistics South Africa in the
Quarterly Labour Force Surveys and General Household Surveys. In the first quarter of 2020 around 20% of
employed individuals gave bracket responses, around 50% gave amounts and 30% refused or answered
“don’t know”.

Ex post Solutions for the Challenges Encountered When Using Household Survey Data
Despite the best efforts of survey organizations to lessen item and unit non-response and sparsity, the
income or wealth data from household surveys may still be missing part of the top of the distribution.
In this case ex-post methods are required to solve missing top incomes. Hlasny and Verme (2022)
characterized ex-post solutions to under-capturing of the top tail of the income distribution in
household surveys as methods that used either replacement or reweighting. Replacement involves
replacing some part of the top of the income distribution in the household surveys parametrically, e.g.
with values from the Pareto distribution with the parameter estimated using the rest of the survey
distribution. Reweighting involves increasing the weights of those at the top of the distribution to
account for unit non-response. We now discuss reweighting for unit non-response as well as
imputation and parametric replacement, both of which can be classified as methods using
replacement. Lustig (2020) highlighted that the replacement and reweighting distinction can also be
applied to solutions that use administrative data, and we discuss solutions using administrative data
in the following section.

Using survey data only
In this section we first discuss reweighting as a solution to non-random unit non-response and
imputation to solve non-random item non-response. These are methods with long histories in the
statistics literature and solve very specific problems with specific solutions. We then discuss using
consumption data and imputing top incomes using the survey data.

Reweighting for unit non-response
Unit non-response is unlikely to be missing completely at random (MCAR). Ravallion (2022) discusses solutions for non-random unit
non-response. One is to find replacement observations, but this is unlikely to solve non-random non-
response, since presumably the same biases affecting original non-response will affect which
households respond as replacements. One benefit of replacement is that it does raise the sample size,
which may be important for studying the top of the income distribution since it can reduce sparseness
problems.

A commonly used method to solve unit non-response is reweighting, where the respondents in
different groups (“weighting classes”) are upweighted to represent themselves and those in the same
group that did not respond. The reweighting groups are often regions (Korinek et al., 2006), which
implies that for the method to be successful non-response must be random within regions, or that
non-response is “ignorable” within regions (Ravallion, 2022). The size of the regions used is likely to
affect the proportion of bias the reweighting can solve, since non-ignorability is likely to be less of a
concern when using smaller geographical regions. The US CPS uses 254 cells for the entire country. It
is common for the PSU, a much smaller region, to be the area used to undertake the reweighting
(Hlasny and Verme (2018) document this for the Arab Republic of Egypt, and Kerr and Wittenberg (2015)
document this for South Africa).
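
The weighting-class adjustment described above can be illustrated with a minimal Python sketch. The record layout (`cell`, `weight`, `responded`) and the toy sample are assumptions made purely for illustration and do not come from any of the surveys cited. The sketch also assumes, as the method requires, that non-response is ignorable within each cell.

```python
from collections import defaultdict


def weighting_class_adjust(sample):
    """Scale up respondents' design weights within each weighting class.

    `sample` is a list of dicts with hypothetical keys:
      'cell'      - the weighting class (e.g. a region or PSU),
      'weight'    - the design weight,
      'responded' - True if the unit responded.
    Returns the respondents only, with adjusted weights.
    """
    total = defaultdict(float)      # design weight of all sampled units per cell
    responding = defaultdict(float)  # design weight of respondents per cell
    for unit in sample:
        total[unit["cell"]] += unit["weight"]
        if unit["responded"]:
            responding[unit["cell"]] += unit["weight"]

    adjusted = []
    for unit in sample:
        if unit["responded"]:
            # Each respondent represents itself plus a share of the
            # non-respondents in the same cell.
            factor = total[unit["cell"]] / responding[unit["cell"]]
            adjusted.append({**unit, "weight": unit["weight"] * factor})
    return adjusted


# Toy sample: cell B has two non-respondents out of three sampled units.
sample = [
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "A", "weight": 100.0, "responded": True},
    {"cell": "B", "weight": 100.0, "responded": True},
    {"cell": "B", "weight": 100.0, "responded": False},
    {"cell": "B", "weight": 100.0, "responded": False},
]
adjusted = weighting_class_adjust(sample)
```

In this toy case the single respondent in cell B ends up carrying three times its design weight. A cell with no respondents at all would simply drop out of the adjusted sample, which is one reason why very small cells can be problematic in practice.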

Korinek et al. (2006) argue that non-response is unlikely to be ignorable and provide an econometric
method to adjust survey weights for selective compliance in responding. Korinek et al. (2007) show
that non-response is positively correlated with incomes in the US CPS and that correcting for this raises
the mean income in the top percentile by around 40%. The authors note that only regional level
response rates are required to implement the method.

The Korinek et al. (2006) method has subsequently been used in several papers to correct for selective
compliance on income. One of these is Hlasny (2020), who used 66 Luxembourg Income Study surveys
for 38 upper- and middle-income countries and collected regional unit non-response rates to
implement the Korinek et al. (2006) corrections. He finds extremely large effects across the countries
for some statistics: the mean top 1% share of income is 6.5% across the 66 surveys but the Korinek et
al. (2006) non-response adjustment increases this to 17% (Hlasny, 2020, appendix A6). There are also
several extreme or even implausible cases: the Italian top 1% share rises from 6% to 44% in the 2008
survey after the unit non-response corrections but only from 5% to 16% in the 2010 survey.

Imputation for item non-response
As in the case of unit non-response, it is possible to ignore item non-response by using only the data
from responders. To the extent that item non-response is ignorable this is reasonable. It would still
mean the sample size is reduced, however, which matters for the variance of any estimates and can
worsen sparsity problems.

But the missing data is very unlikely to be ignorable and so other solutions are required. The most
common solution is imputation. This involves using responses to other questions in the survey and/or
external data to predict earnings for those who did not respond to the question on income. A common
method for incomes is hotdeck imputation, in which the income of a person or household with the
same set of specified characteristics is “donated” to the individual with missing income (Lohr, 2009).

The problem with single imputation methods, including hotdeck imputation, is that they fail to capture
the uncertainty about the true values for those whose values are imputed. Multiple imputation
methods (Rubin, 1987) were developed to overcome this problem. In these methods missing data is imputed
multiple times: for example, in hotdeck multiple imputation 10 donors are chosen randomly instead of one.
The variation across the estimates from the multiple completed datasets captures the uncertainty.
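
A minimal Python sketch of hotdeck multiple imputation follows. The record layout, the imputation cells and the toy data are hypothetical, and donors are drawn by simple random choice within a cell. A fuller implementation would also combine within-imputation and between-imputation variance, whereas this sketch reports only the between-imputation variance of the mean.

```python
import random
from statistics import mean, variance


def hotdeck_multiple_imputation(records, n_imputations=10, seed=0):
    """Return `n_imputations` completed income lists.

    `records` is a list of dicts with hypothetical keys 'cell' (the set of
    matching characteristics) and 'income' (None when missing). Each missing
    income is filled with the income of a randomly chosen donor from the
    same cell. A cell without any donor raises KeyError.
    """
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r["income"] is not None:
            donors.setdefault(r["cell"], []).append(r["income"])

    completed = []
    for _ in range(n_imputations):
        completed.append([
            r["income"] if r["income"] is not None
            else rng.choice(donors[r["cell"]])
            for r in records
        ])
    return completed


def combine_means(completed):
    """Point estimate of the mean and its between-imputation variance."""
    estimates = [mean(c) for c in completed]
    return mean(estimates), variance(estimates)


records = [
    {"cell": "urban", "income": 50.0},
    {"cell": "urban", "income": 70.0},
    {"cell": "urban", "income": None},
    {"cell": "rural", "income": 20.0},
    {"cell": "rural", "income": 30.0},
    {"cell": "rural", "income": None},
]
completed = hotdeck_multiple_imputation(records, n_imputations=10, seed=1)
point_estimate, between_variance = combine_means(completed)
```

With very few donors in a cell, as in this toy example, the same incomes are donated repeatedly. This is the sparsity concern that arises when item non-response is concentrated at the top of the distribution.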

One issue with any imputation method when measuring the top of the income distribution is that
there may be a high fraction of individuals with missing incomes at the top of the income distribution.
In this case, the few individuals with incomes will each donate their incomes to multiple other sample
members. This is a version of the sparsity problem discussed above.

We noted that some surveys allow bracket responses, which is still partial item non-response.
Wittenberg (2008) provides a solution when there are bracket responses. The idea is similar to
reweighting for unit non-response: those who respond with amounts are weighted up so that they
represent themselves and the other respondents who responded in the bracket in which the amount
reported falls. This improves on single imputation methods (such as mid-point or even hotdeck)
because it allows for the uncertainty in the true value, which single imputation methods do not.
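
The bracket reweighting idea can be sketched in Python as follows. This is only an illustration in the spirit of the approach described above, not a reproduction of the Wittenberg (2008) estimator. The bracket boundaries, field names and toy data are all assumptions.

```python
def bracket_of(amount, brackets):
    """Index of the (lower, upper) bracket containing `amount` (upper exclusive)."""
    for i, (lower, upper) in enumerate(brackets):
        if lower <= amount < upper:
            return i
    raise ValueError(f"amount {amount} falls outside all brackets")


def reweight_for_brackets(respondents, brackets):
    """Weight up amount responders to also represent bracket-only responders.

    Each respondent dict has a 'weight' and either an 'amount' (a number)
    or, when only a bracket was reported, 'amount' set to None together
    with a 'bracket' index. Returns amount responders with adjusted weights.
    """
    amount_weight = [0.0] * len(brackets)  # weight of amount responders
    total_weight = [0.0] * len(brackets)   # weight of all responders
    for r in respondents:
        has_amount = r["amount"] is not None
        b = bracket_of(r["amount"], brackets) if has_amount else r["bracket"]
        total_weight[b] += r["weight"]
        if has_amount:
            amount_weight[b] += r["weight"]

    adjusted = []
    for r in respondents:
        if r["amount"] is not None:
            b = bracket_of(r["amount"], brackets)
            factor = total_weight[b] / amount_weight[b]
            adjusted.append({"amount": r["amount"],
                             "weight": r["weight"] * factor})
    return adjusted


brackets = [(0.0, 1000.0), (1000.0, 5000.0), (5000.0, float("inf"))]
respondents = [
    {"amount": 500.0, "weight": 10.0},
    {"amount": 2000.0, "weight": 10.0},
    {"amount": None, "bracket": 1, "weight": 10.0},
    {"amount": 8000.0, "weight": 10.0},
    {"amount": None, "bracket": 2, "weight": 10.0},
    {"amount": None, "bracket": 2, "weight": 10.0},
]
reweighted = reweight_for_brackets(respondents, brackets)
```

Note that a bracket containing bracket-only responders but no amount responders cannot be represented by this adjustment. This is again a form of the sparsity problem if it occurs in the top bracket.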

Imputing the income distribution from consumption data
Household surveys in many LMICs collect only consumption data, since income data is assumed to be
less reliable, particularly for measuring poverty. Chancel et al. (2023) attempt to estimate income
inequality levels and trends in Africa between 1990 and 2010 for countries covering 60% of the
population (and 80%-90% of the population from 1995-2010). But because of the complete lack of
income data or harmonized and publicly available consumption microdata for most of these countries,
Chancel et al. (2023) used the World Bank’s Povcalnet data on shares of consumption
by decile. They then used data for five countries where both income and consumption microdata were
available to impute the income distribution in the rest of the countries. This lack of income
data implies that estimating the income distribution in many African countries, let alone the top end
of this distribution, is impossible with the data that currently exists and is publicly available. As we
discuss below, there are other African countries with surveys that asked about income and
consumption that were not used by Chancel et al. (2023). We also note that limited income data
collection is not prevalent in all LMICs - Latin America has long had income data collected in household
surveys (Lustig, 2020).

Using survey data and parametric distributions estimated from survey data
Imputation for item non-response and reweighting for unit non-response are specific solutions for
specific problems. A more general solution to the more general problem of missing top income
recipients (whether due to sparsity, measurement error, non-random unit non-response etc.) is to
replace the incomes at the top of the distribution with values drawn from a specific distribution. For
example, if top incomes are assumed to have a Pareto distribution it is possible to estimate inequality
measures for the top X% of observations assuming the incomes are Pareto distributed, where the
Pareto parameter is estimated using the survey data above the (100-X)th percentile. An overall measure of
inequality, like the Gini coefficient or the share of the top 1%, is then estimated from the top X
percentiles and the bottom (100-X) percentiles.
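
As an illustration of the parametric step, the Python sketch below fits a Pareto shape parameter to incomes above a chosen threshold by weighted maximum likelihood and derives the implied tail mean and tail Gini coefficient. The function names and the synthetic data are assumptions. The formulas used are the standard ones for a Pareto (type I) distribution with shape parameter alpha: a mean of alpha × threshold / (alpha − 1) and a Gini coefficient of 1 / (2 × alpha − 1), both requiring alpha > 1.

```python
import math


def fit_pareto_tail(incomes, weights, threshold):
    """Weighted maximum-likelihood estimate of the Pareto shape parameter
    using only observations with income >= threshold."""
    weight_sum = 0.0
    log_sum = 0.0
    for y, w in zip(incomes, weights):
        if y >= threshold:
            weight_sum += w
            log_sum += w * math.log(y / threshold)
    return weight_sum / log_sum


def pareto_tail_mean(alpha, threshold):
    """Mean income above the threshold implied by the fitted Pareto tail."""
    if alpha <= 1.0:
        raise ValueError("the Pareto mean is infinite when alpha <= 1")
    return alpha * threshold / (alpha - 1.0)


def pareto_gini(alpha):
    """Gini coefficient within the Pareto tail."""
    if alpha <= 1.0:
        raise ValueError("the Gini formula requires alpha > 1")
    return 1.0 / (2.0 * alpha - 1.0)


# Synthetic tail: quantiles of a Pareto distribution with alpha = 2 above a
# threshold of 100, generated by inverting the distribution function.
threshold = 100.0
n = 10000
incomes = [threshold * (1.0 - (i + 0.5) / n) ** (-1.0 / 2.0) for i in range(n)]
weights = [1.0] * n

alpha_hat = fit_pareto_tail(incomes, weights, threshold)
tail_mean = pareto_tail_mean(alpha_hat, threshold)
tail_gini = pareto_gini(alpha_hat)
```

The estimate depends on the threshold chosen, and when the survey under-captures the top tail the fitted parameter inherits that problem, since it is estimated from the same survey data.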

Jenkins (2017) compares this method of replacing top incomes with methods using tax data and argues
that it does not work very well in the UK for solving under-capturing of top incomes. Hlasny and
Verme (2018) use this method to estimate income inequality in Egypt. In their preferred results Hlasny
and Verme (2018) replace the top 10% of the distribution with a Pareto distribution estimated using
the rest of the distribution, finding that the estimated income Gini coefficient is unchanged by this
replacement of top incomes.



Using survey data and administrative data
As noted above, Hlasny and Verme (2017, 2022) described solutions to under-capturing of the top tail
of the income distribution in household surveys as methods that used either replacement or
reweighting. Lustig (2020) used this distinction to also classify many of the methods that use both
household survey and administrative data as methods using replacement and/or reweighting. In
the context of combining administrative and household survey data, replacement involves replacing
some part of the top of the income distribution in the household surveys with data from another
source. Reweighting involves increasing the weights of those at the top of the distribution, to match
the distribution of income in the administrative data source. We now discuss each of these methods
and provide examples.

Reweighting
Survey calibration means the adjustment of survey weights, usually after unit non-response
adjustment of the weights, so that the weighted population estimates for basic demographics (such
as age, sex or regions) from the household surveys match information from demographic models or
other external sources of population information. These “reweighting” methods have since also been
used to improve estimates of the top tail of the income distribution when administrative data on
earnings has been available. For example, Campos-Vazquez and Lustig (2019) obtained Mexican social
security administrative earnings data between 2000 and 2017. The admin data they had access to is
the proportion of employed individuals in 25 earnings categories representing multiples of the
minimum wage. The authors’ comparison of the survey and admin data clearly shows the under-
capturing of the top tail of the earnings distribution. In 2017 the admin data showed that 9% of formal
sector workers earned more than 10 times the minimum wage, whereas the proportion was only 2%
in the labor force survey. The authors then adjust the survey weights so that the proportion of formal
sector workers in each earnings category in the surveys matches the proportion in the administrative
data.
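
The weight adjustment described above can be sketched as a simple rescaling of weights within earnings categories. The Python example below uses only two hypothetical categories and the 2017 shares quoted above (9% in the admin data and 2% in the survey). It is an illustration of the idea rather than the authors' implementation, which uses 25 categories.

```python
def calibrate_to_admin_shares(survey, admin_shares):
    """Rescale survey weights so that weighted category shares match admin data.

    `survey` is a list of dicts with hypothetical keys 'category' and 'weight'.
    `admin_shares` maps each category to its share in the admin data (the
    shares should sum to one). The total weight is preserved.
    """
    total = sum(r["weight"] for r in survey)
    category_weight = {}
    for r in survey:
        category_weight[r["category"]] = (
            category_weight.get(r["category"], 0.0) + r["weight"]
        )

    # Common support: every category present in the admin data must also
    # contain at least one survey respondent.
    missing = [c for c, share in admin_shares.items()
               if share > 0 and c not in category_weight]
    if missing:
        raise ValueError(f"no survey observations in categories: {missing}")

    return [
        {**r, "weight": r["weight"] * admin_shares[r["category"]] * total
                        / category_weight[r["category"]]}
        for r in survey
    ]


# Toy survey: 2% of workers earn more than 10 minimum wages.
survey = ([{"category": "up_to_10_mw", "weight": 1.0} for _ in range(98)]
          + [{"category": "above_10_mw", "weight": 1.0} for _ in range(2)])
admin_shares = {"up_to_10_mw": 0.91, "above_10_mw": 0.09}

calibrated = calibrate_to_admin_shares(survey, admin_shares)
top_share = (sum(r["weight"] for r in calibrated
                 if r["category"] == "above_10_mw")
             / sum(r["weight"] for r in calibrated))
```

In the toy example each of the two high earners ends up with a weight of 4.5 instead of 1, which shows how heavily the corrected estimates can rest on a handful of survey observations.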


Campos-Vazquez and Lustig (2019) note that the item non-response rate for the earnings questions
in the Mexican labor force survey rose from 5% in 2002 to 30% by 2017. To solve this issue one of the
sets of corrections implemented by the authors is to first undertake hot deck imputation for the item
non-response and then the reweighting explained above. So this set of corrections actually involves
replacement and then reweighting. As well as mechanically increasing the share of individuals in the
top earnings bracket due to the calibration, the corrections make a substantial difference to trends
over time in the top tail. Without imputation and reweighting the earnings for the top 10 percentiles
supposedly shrank by between 25% and 30%, whereas with these corrections they rose by between 1%
and 8%.


Replacement
Replacement using household and administrative data means that some of the incomes in the
household surveys are replaced with incomes in the administrative data. There are various methods
that have been used in this broad approach. One distinction is between those that use parametric and
non-parametric replacement (Lustig 2020). Parametric replacement involves estimating the Gini
coefficient or some other inequality measure for the top x% of the income distribution from tax or
other administrative data by assuming the top incomes follow a Pareto distribution, estimating the
Pareto parameter and using this to obtain the implied Gini coefficient of the top x%. This is very similar
to the methods described above using survey data only. One can then estimate the Gini coefficient for
the bottom (100-x)% of the population using survey data and combine the two to obtain an overall
measure of inequality, using the results of Alvaredo (2011) that the Gini can be decomposed using
shares of the top (say) 1% and the bottom (say) 99%. Alvaredo (2011) used tax data and the Current
Population Survey for the US to show that the increase over time in the Gini coefficient doubled when
adjusting for missing top earners. This method was also used by Alvaredo and Londono Velez (2013)
for Colombia, who showed that when using tax data to replace the top 1% of incomes in the surveys,
inequality levels were substantially higher, and the trend of declining inequality was much smaller
than when using uncorrected household surveys.
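
The combination step can be illustrated with the standard Gini decomposition for two non-overlapping groups, which is assumed here to be the form of the Alvaredo (2011) result: G = G_top × P × S + G_rest × (1 − P) × (1 − S) + S − P, where P is the population share of the top group and S is its income share. The Python sketch below checks the formula against a directly computed Gini on a toy income vector. All names and numbers are illustrative.

```python
def gini(values):
    """Gini coefficient from the mean absolute difference (unweighted)."""
    n = len(values)
    mu = sum(values) / n
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2.0 * n * n * mu)


def combined_gini(gini_top, gini_rest, pop_share_top, income_share_top):
    """Overall Gini from two non-overlapping groups.

    gini_top         - Gini within the top group (e.g. from tax data or a
                       fitted Pareto tail),
    gini_rest        - Gini within the rest (e.g. from survey data),
    pop_share_top    - population share P of the top group,
    income_share_top - income share S of the top group.
    """
    p, s = pop_share_top, income_share_top
    return gini_top * p * s + gini_rest * (1.0 - p) * (1.0 - s) + s - p


# Toy check: the top group is the single highest income.
rest = [1.0, 2.0, 3.0, 4.0]
top = [10.0]
everyone = rest + top

p = len(top) / len(everyone)
s = sum(top) / sum(everyone)
g_combined = combined_gini(gini(top), gini(rest), p, s)
g_direct = gini(everyone)
```

When the top group is very small, P is close to zero and the expression is approximately G_rest × (1 − S) + S, so the overall Gini is driven largely by the top income share.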


Non-parametric replacement with admin data involves either replacing higher income individuals in
the household surveys with individuals from admin data or imputing incomes in surveys using the
incomes of observably similar individuals in admin data. Bach et al. (2009) did this for
Germany, using unit record tax data and the German Socio-Economic Panel to match individuals in the
survey and tax data and replace the survey respondents with similar individuals in the tax data. They
found that the Gini increased by 6 percentage points, and that increases among individuals at the very
top of the income distribution drove the increase in inequality in Germany between 1992 and 2003.
Non-parametric replacement was also undertaken by Burkhauser et al. (2018), who used UK
household surveys but replaced top incomes with cell means from tax data for the UK, which is a
method applied by the UK Department for Work and Pensions. This increased Gini coefficients in most
years between 1994 and 2014 and reduced volatility over the different surveys compared to the
estimates using household survey data only. Czajka (2017) used tax and survey data from Côte d’Ivoire
to implement a similar correction: he adjusted the survey data incomes in the formal private sector in
each percentile by the ratio of tax to survey mean incomes in that percentile. This
adjustment increased the share of the top 1 percent by nearly 50 percent.


Van der Weide et al. (2018) examine inequality in urban Egypt and are concerned with the
under-capturing of top incomes in household surveys. They note that in most developing countries tax data,
even in tabulated form, are not available. The authors also highlight that tax evasion is common in
LMICs, as is a large informal sector, which limits the usefulness of tax data even when it does exist. To
solve the lack of administrative data in Egypt, Van der Weide et al. (2018) use data on urban house
prices, obtained from a private company, to improve estimates of top incomes. There are several steps
undertaken by Van der Weide et al. (2018) to use house prices to replace top incomes, not all of which
seem reliable. One obvious issue is the lack of house prices in the surveys to estimate a relationship
between household income and house prices, which would then be applied to the admin data on
house prices. The lack of house prices means the authors estimate the relationship between
household incomes and housing rents, which are imputed for owner occupied housing. There is also
no detail provided on how rents are imputed. Despite these issues, this is an example of a method
that can be used when tax data are not available. The impact of the adjustments using house price
data to impute incomes is very substantial: the Gini rises by around 30% after the replacement of top
incomes.


Reweighting and replacement
The third solution to missing top incomes that uses household and administrative data is a
combination of the previous two methods. Blanchet et al. (2022) focus on reweighting to solve the
under-capturing of the top tail in household surveys, but they also use replacement. The authors
highlight that in previous work using replacement of household survey top incomes with admin data
(including Alvaredo (2011) for the US, Burkhauser et al. (2018) and Alvaredo and Londono Velez
(2013), all discussed in the previous section), the point in the distribution where survey data
information is replaced with administrative data is chosen arbitrarily (e.g. the top 1%). The novel
contribution of the authors is to endogenously determine the optimal merging point between
household surveys and tax data. Having done so, the authors use survey calibration methods to ensure
that the numbers of individuals in each tax bracket in the surveys and the tax data match.

Whilst the calibration solves non-sampling error problems, Blanchet et al. (2022) note that the method
still leaves small samples at the top of the distribution, which can, as discussed above, result in large
sample to sample variation. If there is a substantial lack of common support simply because of
sampling error, then reweighting cannot solve this either. To make progress Blanchet et al. (2022) use
a replacement step, creating duplicates of observations in the surveys and assigning to each
observation above the merging point the average income for its population share (the population
share is determined by each individual’s weight and the population size) in the tax data. This
replacement step also preserves the distribution of other variables and their relationship with income
in the survey data.

Alvaredo’s (2011) analysis for Argentina assumed that the top 1% was completely missing from the
household surveys, and therefore tax data should be used for the top 1%. Blanchet et al. (2022) point
out that this is an extreme form of reweighting where the weights of the survey respondents are
rescaled, so they represent the bottom 99% and then tax data completely represents the top 1%.
Alvaredo (2011) showed that the Gini was 6-7 percentage points higher when correcting for missing
top incomes using this method for Argentina.


Chancel et al. (2023) estimate inequality and provide distributional national accounts for African
countries between 1990 and 2017. In doing so the authors attempt to overcome several challenges,
as discussed above, including a lack of income data in many African countries, poor quality of national
accounts and the assumed under-representation of the top tail of the distribution. Their starting point
is not raw survey data, but tabulations on consumption per capita from the World Bank’s Povcalnet
database, from which they impute consumption percentiles. They do use household survey microdata
for five African countries, because these surveys asked respondents about income and consumption,
and so are used to determine the relationship between the two over the distribution of consumption,
meaning consumption can then be used to impute incomes where only consumption data is available
from Povcalnet.


Chancel et al. (2023) then use tax data from South Africa and Côte d’Ivoire, the only two African
countries where such data was available to the authors, to determine the extent of under-capturing
of top incomes and then to use this to correct the income distributions for the rest of the countries,
themselves almost all imputed from consumption decile share data derived from surveys. Chancel et
al. (2023) then parameterize the under-capturing of incomes by percentile using the tax data. They
use their results to assume that the bottom 80% of the distribution is unaffected while the top 1% is
adjusted upwards by between 50 and 100%.

This summary of differing methods using survey and administrative data shows that attempts to solve
the under-capturing of the rich in household surveys have common elements, although the
(non-)availability of administrative data sources is a key determinant of what is possible. Increasing access
to tax data, even in summarized form, seems like a key goal if under-capturing of the top tail of the
income distribution is to be solved.


Challenges in Using Administrative Data
The discussion above has highlighted that administrative data can be used in several ways to solve
challenges encountered in measuring the upper tail in household surveys. But these data also come
with their own challenges, particularly in LMICs. In this section we discuss some of these challenges.
Awareness of the challenges of using admin data is not new. Kuznets (1963:12), in comparing top
income shares across countries produced by several different researchers, noted that “it may not be
an exaggeration to say that we deal here not with data on the distribution of income by size but with
estimates or judgments by courageous and ingenious scholars relating to size distribution of income
in the country of their concern.”


Haq (1964) lists several specific concerns in her analysis of income inequality in Pakistan using
aggregated tax data. These concerns have subsequently been highlighted in recent research using
admin data to better measure the top tail of the income distribution. These include the very small
proportion of the population covered by tax data, which was one tenth of one percent in Pakistan at
the time, changes in tax legislation that limited comparability over time (for example changes in the
tax threshold), changes in efficiency of tax collection over time and the extent of the unregistered (i.e.,
informal) sector.


Other issues have also been highlighted in more recent research. Ravallion (2022) cautions that
comparisons of surveys and national accounts data that show lower total incomes in surveys
compared to national accounts could reflect the poor quality of national accounts rather than missing
top incomes. Blanchet et al. (2022) note that in LMICs tax data is almost always in the form of
aggregated tables rather than microdata, which limits what it can be used for, and point out that even
when microdata is available it contains few covariates, which can limit how the data can be used to
improve estimates of the top tail from household surveys.

Kerr (2021) highlights that researchers using and comparing tax and survey data for South Africa
created yearly earnings from household surveys obtained by multiplying monthly earnings by 12 for
those employed at the time the survey was undertaken. This is very different conceptually to tax data
on earnings in the last year, since the surveys exclude those employed at other points in the year, who
are more likely to be low earners. Kerr (2021) shows that, in the South African case, this error makes
the household survey and tax data earnings distributions look closer than they are, and thus
understates the extent of under-capturing of high earners. 5
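
The conceptual difference between annualized survey earnings and tax-year earnings can be illustrated with a toy example. The workers, wages and survey month below are hypothetical and are not drawn from Kerr (2021); the sketch only shows how multiplying monthly earnings by 12 for those employed at the survey date drops some part-year workers and inflates the earnings of others.

```python
from statistics import median

# Hypothetical workers: (monthly wage, months of the year in which employed).
workers = [
    (1000, range(1, 4)),    # low earner, employed January to March only
    (1200, range(6, 10)),   # low earner, employed June to September only
    (1500, range(1, 13)),   # employed all year
    (3000, range(1, 13)),   # employed all year
    (8000, range(1, 13)),   # employed all year
]


def survey_annualized(workers, survey_month):
    """Monthly wage times 12, only for those employed in the survey month."""
    return [wage * 12 for wage, months in workers if survey_month in months]


def tax_year_earnings(workers):
    """Earnings actually received over the year, for everyone who worked."""
    return [wage * len(months) for wage, months in workers]


survey = survey_annualized(workers, survey_month=7)
tax = tax_year_earnings(workers)
print("survey (annualized):", sorted(survey), "median", median(survey))
print("tax year earnings:  ", sorted(tax), "median", median(tax))
```

In this toy case the survey-based distribution has fewer observations and a higher median than the tax-year distribution, because the worker employed only early in the year is missed and the other part-year worker is treated as if employed for 12 months.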

Piketty et al. (2022) note that tax evasion and avoidance can reduce measured top income shares
compared to the true number and that these can both vary across time and countries, making within
and between country comparisons difficult. Jouste et al. (2023) show that an increase in the top
marginal tax rate in Tanzania, from 30% to 40%, which affected the top 1% of earners, may have
resulted in declines in reported incomes. In addition, income in tax data does not include some incomes
that are important at the top of the distribution, like retained earnings in firms (Piketty et al., 2022).
This omission is also likely to be important in middle-income countries, although probably not in
low-income countries with very large informal sectors.

A final challenge is that survey and tax data may not identify the same units, and tax and other forms
of administrative data may not give a welfare measure that economists care about (Ravallion, 2022).
Many countries have joint tax filing for married couples, and this can make it more difficult to use the
tax data to complement surveys. Tax data also usually tell us little about households and thus cannot
easily be used to create a measure of household per capita income. Ravallion (2022) points out that
using tax data to improve surveys can often require researchers to define and use taxable income
from the surveys to match what is available in the tax data, but that taxable income is not an
appropriate measure of welfare, ignores transfers and is thus likely to overestimate income inequality.




Measuring the Upper Tail of the Wealth Distribution
Thus far we have focused on the challenges when measuring the top of the income distribution. But
much of the preceding discussion applies directly to wealth. In this section we focus on aspects of
measuring the top of the wealth distribution where there is not a simple correspondence between
the literatures on measuring the top of the income and wealth distributions.

All of the challenges that we have discussed above in measuring the top of the income distribution
using household surveys also apply to measuring the wealth distribution, although some are even
more of a concern. The ex-ante solutions we discussed above can also be used to solve wealth
challenges, as can the ex-post solutions using survey data. But other solutions that researchers have
used have relied on novel types of data. We discuss these issues in this section.



5
 This is not an issue if a survey asks about earnings over the last year to all individuals, including those not
currently employed, as the US CPS does. In countries where most employed people are employed throughout
the year, this may also not be a substantial concern.

Sources of Wealth Data
Wealth data is collected less frequently than incomes, which are themselves not as common as
consumption, particularly in low-income countries. But wealth surveys do exist. China and India have
household surveys that collect detailed data on assets and liabilities of households: the Household
Income Project (CHIP) for China and the All-India Debt and Investment Survey (AIDIS). There are also
wealth-focused surveys in Latin America: the Chilean Household Financial Survey, Colombian
Household Financial Burden and Financial Education Survey, Mexican National Survey on Household
Finances and the Financial Survey of Uruguayan Households (EFHU) (Gandelman and Lluberas, 2023).
But such surveys are not common in LMICs. One recent effort to improve data on assets is the
World Bank Living Standards Measurement Study‐Plus (LSMS+) program to measure the ownership
of, and rights to, selected physical and financial assets in various African countries (Hasanbasri et al.,
2021). These surveys collect the value of assets but do not measure liabilities, so net wealth cannot
be computed.

Administrative data may provide information on wealth, although this is sometimes harder to utilize
than administrative data on incomes. This is because most tax administration systems do not collect
data on all or most forms of wealth directly. The most common wealth data collected through tax
administration systems is estate records (Piketty and Saez, 2006). Wealth data is also collected when
countries have wealth taxes, but these are not common. Some tax authorities ask questions about
assets, even though they are not taxed, but, given that they are not taxed, the accuracy of the data
can be questioned. To estimate wealth from income tax data, researchers use the capitalization
method, in which capital income and an assumed or observed rate of return are used to estimate the
value of the capital generating the observed capital income (Roine and Waldenstrom, 2015; Saez and
Zucman, 2016).
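
The core arithmetic of the capitalization method can be sketched as follows: for each asset class, estimated wealth is the observed capital income divided by a rate of return. The asset classes, income amounts and rates of return below are hypothetical, and real applications involve many further choices (for example, how rates of return are obtained and how assets that generate no taxable income are treated) that this sketch ignores.

```python
def capitalize_income(capital_income, rates_of_return):
    """Estimate wealth by asset class as capital income / rate of return."""
    wealth = {}
    for asset_class, income in capital_income.items():
        rate = rates_of_return[asset_class]
        if rate <= 0:
            raise ValueError(f"rate of return for {asset_class} must be positive")
        wealth[asset_class] = income / rate
    return wealth


# Hypothetical taxpayer: capital income by asset class from a tax return.
capital_income = {"interest": 500.0, "dividends": 1200.0}
# Hypothetical (assumed) rates of return for each asset class.
rates_of_return = {"interest": 0.05, "dividends": 0.03}

estimated = capitalize_income(capital_income, rates_of_return)
print(estimated, "total:", sum(estimated.values()))
```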

In recent times other less traditional forms of data have been used to measure the top of the wealth
distribution. One source is rich lists compiled by Forbes or other organizations (Piketty et al., 2022;
Bach et al., 2019; Xie and Jin, 2015). These sources provide wealth estimates for the extreme top of
the wealth distribution and can be used as a check on the household survey or administrative data
wealth estimates.

A second set of non-traditional data sources was used by Alstadsæter et al. (2019). These sources were
an HSBC Switzerland leak of customer data, voluntary declarations of hidden assets from tax
amnesties in Norway, Sweden and Denmark and leaked information of shell company owners in
Panama, the Panama Papers.




Concentration of Wealth
Wealth is even more unequally distributed than income. Saez and Zucman (2016) measure wealth
inequality in the United States and find that in 2012 the top 10% of households owned a staggering
87.7% of the total wealth. In LMICs, wealth is also highly concentrated. In South Africa the top 10%
owned 85% of total wealth (Chatterjee et al., 2022), while in Latin America, the World Inequality Report
estimated that the top 10% owned 77% of total household wealth (Chancel et al., 2022). The top 10%
owned 67% of wealth in China (Piketty et al., 2019) and 63% in India (Anand and Thampi, 2016). The
extreme inequality of wealth in many parts of the world means that the challenges researchers
face when measuring the top of the income distribution using household surveys are even more of a
concern for wealth. For example, unit and item non-response and sparseness are much more severe
problems when wealth is concentrated in the hands of fewer people than incomes.

Solutions to Challenges of Measuring the Top of the Wealth Distribution
The solutions to challenges to measuring the top of the wealth distribution are similar to those
described above for income, so we do not discuss the methods themselves again. Instead, we discuss
research that has used these solutions to improve estimates of the wealth distribution in LMICs. We
noted above that wealth data is much less common than income data. This means that there is much
less research on wealth that attempts to improve measurement of the top of the wealth distribution
in LMICs.

Alstadsæter et al. (2018) is an example of using non-traditional data on wealth to supplement
administrative data estimates of the share held by the top of the wealth distribution. The authors used
the data described in Alstadsæter et al. (2019), discussed above, to improve the estimates of the
shares of wealth held by the top 0.01% for 10 countries, although the only LMIC included was the
Russian Federation, where the share held by the top 0.01% more than doubled once offshore wealth
is included. This implies that government-collected administrative data may have important
weaknesses when measuring the top tail of the wealth distribution.

Xie and Jin (2015) provide an example of improving the measurement of the top of the wealth
distribution using replacement and reweighting. The authors assume that the bottom 99.9% of the
Chinese wealth distribution is well represented by the household survey data they use. They then use
a Chinese rich list of the top 1,000 wealthiest Chinese and, assuming that the top can be represented
by the Pareto distribution, estimate the Pareto parameter from the rich list and then use this to
replace the top 0.1% with the values implied by the estimated Pareto parameter. This adjustment
means that the estimated share of wealth held by the top 1% rises from 16% to 35%.
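
A minimal sketch of this type of Pareto replacement is shown below. It is not the procedure of Xie and Jin (2015): the maximum likelihood estimator of the Pareto parameter, the use of the survey's own threshold value, the unweighted survey, and the midpoint quantile rule used to fill in the tail are all assumptions made for illustration, and the data are synthetic.

```python
import math


def pareto_alpha_from_rich_list(wealth_values):
    """Maximum likelihood estimate of the Pareto parameter, taking the
    smallest value on the list as the lower bound of the distribution."""
    x_min = min(wealth_values)
    log_sum = sum(math.log(w / x_min) for w in wealth_values)
    if log_sum <= 0:
        raise ValueError("need at least two distinct positive values")
    return len(wealth_values) / log_sum


def replace_top_with_pareto(survey_values, top_fraction, alpha):
    """Replace the top `top_fraction` of (unweighted) survey values with
    quantiles of a Pareto distribution that starts at the survey threshold."""
    ordered = sorted(survey_values)
    k = max(1, int(round(len(ordered) * top_fraction)))
    threshold = ordered[-k]
    # Pareto quantile function evaluated at midpoints of k equal bins.
    tail = [threshold * (1.0 - (i + 0.5) / k) ** (-1.0 / alpha) for i in range(k)]
    return ordered[:-k] + tail


# Synthetic rich list drawn from exact Pareto quantiles with alpha = 2.
rich_list = [(1.0 - (i + 0.5) / 1000) ** (-1.0 / 2.0) for i in range(1000)]
alpha_hat = pareto_alpha_from_rich_list(rich_list)

# Synthetic survey wealth values; replace the top 1% using alpha = 1.5.
survey_wealth = [float(v) for v in range(1, 1001)]
adjusted = replace_top_with_pareto(survey_wealth, top_fraction=0.01, alpha=1.5)
print(f"estimated alpha: {alpha_hat:.3f}")
```

Values below the threshold are left untouched, so any change in top shares comes only from the replaced tail.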




Chatterjee et al. (2022) combine surveys and income tax data to estimate wealth inequality in South
Africa, using a replacement method. They use incomes from household surveys and tax data and
replace the top of the distribution in the surveys with tax data at the point where survey data incomes
are lower than in the tax data, which they find to be around the 25th to 30th percentile. The authors
then use the income data to capitalize asset classes where capitalization can be applied and use survey
and tax data to estimate capital for other asset classes. The key finding is that South Africa has an
extremely unequal wealth distribution. The top 1% own 55% of total wealth, whilst the net wealth
held by the bottom 50% is negative due to liabilities exceeding assets.

Gaps in the Research on Measuring the Upper Tail of the Income and Wealth
Distributions
In this section we document gaps in the research that has been conducted on measuring the top tail
of the income distribution in LMICs. In the next section we discuss how these gaps can be filled by
outlining a possible research agenda.

Chancel et al. (2023) highlighted the limited availability of household surveys with income data in
Africa, as well as the lack of harmonized consumption data. This is very clearly an important gap: there
is no way to directly investigate the top of the income distribution without income data from surveys.

We have noted that a few papers using tax data from LMICs have shown important differences
between administrative and survey data at the top of the income distribution. But there is a need for
more studies in a wider variety of LMICs to further investigate the top of the income distribution in
surveys and admin data. This requires the availability of administrative tax data, at least in tabulation
form for different income groups. This type of data is not easily accessible and may best be pursued
on a country-by-country basis.

The use of other forms of data to check surveys in the absence of tax data is still uncommon. Van der
Weide et al. (2018) used external house price data to conclude that the Egyptian Income and
Expenditure household survey severely under-estimated inequality. But given the number of strong
assumptions required for their methods to be valid, it is possible that some of the differences may be
due to the inadequacies of the method and/or the administrative house price data used rather than
the surveys.

Researchers finding solutions to the challenges of measuring the top of the income or wealth
distributions that use admin data generally do not investigate why the surveys and admin data find
substantially different incomes at the top of the distribution. To improve the methods for solving the
measurement challenges described above, to establish new methods and to improve the quality of
survey and administrative data it would be useful to better understand why the distributions differ
when they do, especially because the few answers that have been given are for rich countries and are
not consistent, as we now describe.

A priori, unit non-response of high income or wealth individuals would seem to be the most important
factor explaining why surveys under-capture the top of these distributions. Evidence for this
proposition comes from Johansson-Tormod and Klevmarken (2022), who use administrative
data and two wealth surveys from Sweden in 2002 and 2003. The data is unusual because
the authors could match both survey responders and non-responders to the admin data
they accessed. The authors find that the survey underestimates mean wealth within the top
1 percent by 40 percent relative to the admin data. For the top percentile, Johansson-
Tormod and Klevmarken (2022) show that the mean of the top 1% using the survey response
values is very similar to the mean when using the same survey respondents (i.e. excluding non-
respondents) but using the wealth values from the admin data for those survey respondents (which
they can do because the surveys and admin data are linked). But the means for the top 1% are very
different when comparing the survey respondents using their wealth information from the admin data
and the full set of respondents plus non-respondents again using their wealth information from the
admin data. This implies that under-reporting is not a concern, but that unit non-response is.
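
The logic of this three-way comparison can be sketched in code. The record layout (a survey value that is missing for non-respondents, plus an admin value for everyone) and the synthetic data are hypothetical; the sketch only illustrates how linked data allow the gap in the top 1% mean to be split into an under-reporting part and a unit non-response part.

```python
def top_mean(values, top_fraction=0.01):
    """Mean of the top `top_fraction` of values (unweighted)."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(round(len(ordered) * top_fraction)))
    return sum(ordered[:k]) / k


def decompose_top_gap(records, top_fraction=0.01):
    """Split the survey/admin gap in the top mean into two parts.

    Each record is a dict with keys "survey" (None for non-respondents)
    and "admin" (available for everyone).
    """
    respondents = [r for r in records if r["survey"] is not None]
    survey_resp = top_mean([r["survey"] for r in respondents], top_fraction)
    admin_resp = top_mean([r["admin"] for r in respondents], top_fraction)
    admin_all = top_mean([r["admin"] for r in records], top_fraction)
    return {
        # same people, different data source: under-reporting
        "under_reporting": admin_resp - survey_resp,
        # same data source, different people: unit non-response
        "unit_non_response": admin_all - admin_resp,
    }


# Synthetic population: admin wealth 1..1000; the 20 wealthiest do not
# respond, and respondents report their admin value exactly.
records = [
    {"admin": float(w), "survey": float(w) if w <= 980 else None}
    for w in range(1, 1001)
]
gaps = decompose_top_gap(records)
print(gaps)
```

In this synthetic case the under-reporting component is zero and the whole gap is attributed to unit non-response, which mirrors the pattern described above.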

Johansson-Tormod and Klevmarken (2022) use two surveys, finding similar results in both.
One of these is Share. The documentation for the survey indicates that for the Swedish
Share survey the only adjustment of the design weights is calibration to eight age and sex
totals from external demographic data. It is then perhaps unsurprising that unit non-
response is found to be the cause of the differences in the top 1%.

Burkhauser et al. (2017) compare UK household surveys and admin data based on tax records. They
show that the surveys diverge from tax data around the 95th percentile of the income distribution and
argue that under-reporting rather than unit non-response is responsible for this divergence, because
the 95th percentiles and below are similar in both data sources. Burkhauser et al. (2023) provide
further evidence in support of under-reporting being more important than unit non-response in the
UK, in the context of investigating the share of women in the top 1%. They show that the top 1% in
surveys and tax data have very similar observable characteristics, and that these are distinctly
different to the 9 percentiles below the top 1%.

We noted above that Flachaire et al. (2022) found that under-reporting at the top of the distribution
was important in Uruguay using linked survey and tax data, but the authors did not investigate unit
non-response and thus could not examine the importance of unit non-response relative to under-
reporting.

The extent of item non-response to income or wealth household survey questions has not, as far as
we can tell, been the subject of any cross-country research. Campos-Vazquez and Lustig (2019)
showed that in Mexico the labor force survey has shown dramatic increases in item non-response to
labor income questions, from around 6% in 2002 to 30% by 2017, whereas the income and
expenditure survey item non-response has been roughly constant around 3%. The South African
Labour Force Survey item non-response rate for labor income was around 5%-8% in the 2000s. The
publicly available Quarterly Labour Force Survey data from 2010 onwards does not allow for the
creation of reliable item non-response rates due to imputation, but non-public data shows it also
increased to close to 30% in 2020. Broader trends in item (and unit) non-response, especially in
middle-income countries where these rates are much higher, have received little research attention
and seem like an important gap, as does the extent to which imputation improves the estimates of
inequality and the top tail of the income distribution.

A Research Agenda
Using Existing Data
Household surveys
Since we have noted that there is very limited income data in the public domain for some LMICs (e.g.
African countries), one crucial part of the research agenda for measuring the top of the income
distribution would be to harmonize and make easily and publicly available the surveys that already
exist. The World Bank’s publicly available Statistics Online platform is one example of making data
public and more easily available, although this currently seems to include only data on consumption
and not income. It also seems to have been entirely ignored outside the World Bank. 6 The World Bank
is well placed to lead research that harmonizes the data that does exist, and ideally makes it publicly
accessible.

The Rural Income Generation Activities (RIGA) project is one example of a project that did make
available and harmonize household surveys with income data across countries. It was driven by the
World Bank and the Food and Agriculture Organization. 7 The project created harmonized and publicly
available microdata, including incomes, from 35 surveys for 22 countries, most of which were the



6
 On June 26, 2023, there were no references to Statistics Online on Twitter by anyone outside the World Bank
and only three in total. I added a fourth on June 26.
7
    https://www.fao.org/economic/riga/en/

World Bank’s LSMSs. Carletto et al. (2007) describe how the income data across the surveys were
harmonized. It appears that the microdata are available for both rural and urban areas, in which case
this is a potentially valuable source of income data to study the top tail of the income distribution.
Kerr et al. (2019) undertook harmonization of 26 years of household surveys on earnings income in
South Africa and made the harmonized data publicly available. Changes over time in the surveys, the
sampling, and the questionnaires meant that obtaining comparable earnings distributions over time
even for one country required a substantial amount of work.

We have described the method of Korinek et al. (2006) to adjust survey weights to correct for
non-ignorable unit non-response, and we discussed several papers that implemented these corrections.
Ravallion (2022) notes that implementing this method requires regional non-response rates but that
these are seldom available. He argues for the inclusion of PSU-level response rates and notes that this
would be simple to include for survey data producers. Hlasny (2020) documents that some of the
surveys available through the Luxembourg Income Study (LIS), including Brazil and South Africa,
include non-responding households.

Administrative data
The UNU WIDER project “Regional Growth and Development in Southern Africa” resulted in access to
South African tax microdata through the South African National Treasury from 2015. This model of
partnership between international organizations and national governments, including tax offices, has
since been replicated by WIDER in Tanzania, Zambia, Uganda, and Rwanda. 8 Such data can be used to
improve the measurement of the top tail of the income distribution in these countries, especially since
publicly available household survey data on incomes exists for all these countries. Having a larger
number of countries with tax and survey data and undertaking comparisons would allow for more
general conclusions about the extent of the differences between survey and tax data for Africa, rather
than conclusions based on just two countries, as in Chancel et al. (2023).

Using New Sources of Data
The most obvious research that could be undertaken using new data sources is obtaining and analyzing
administrative tax data for a larger number of LMICs than is currently available. Ideally this would be
tax microdata but even tax tabulations would be very useful. The first step would be to compare
estimates of top incomes from surveys and administrative data, and then examine the extent to which
under-reporting, item non-response and unit non-response and other challenges create any
differences between household survey and administrative data.


8
 https://www.wider.unu.edu/project/building-efficient-and-fair-tax-systems-lessons-based-administrative-tax-data

Ravallion (2022) argues that for some research on the top of the distribution, survey data will be
adequate. The evidence we have discussed above suggests that in South Africa and Côte d’Ivoire, a
divergence between survey and tax data starts quite far down the distribution and thus that tax data
are required to accurately measure the top tail. Further work comparing tax (both new and already
existing) and survey data will shed light on the extent to which surveys can be reliably used to measure
top incomes.

Making use of other types of administrative data may also be useful. Further work could use house
price data and improve the methods of Van der Weide et al. (2018). Such data are likely to be available
only in urban areas in low-income countries, and in many middle-income countries as well. House price
data could be complemented with land record data for rural areas, where this exists. The extent to which
the top tail is made up of those who live in rural areas in low-income countries could then be
investigated more comprehensively, a topic on which there has been little research.

A second useful part of a research agenda using new data would link survey and tax microdata at the
individual level, to provide further insight into the reasons for differences across tax and survey data,
specifically the role of unit non-response, measurement error and under-reporting. We are aware of only
one study in an LMIC that uses linked survey and tax data: Flachaire et al. (2021) for Uruguay. The
links were only available for couples with children aged 0-3 so the extent to which this generalizes to
the adult population in Uruguay is unclear. Further research would thus be extremely useful.

A third part of the research agenda would be to undertake future household surveys asking about
incomes or wealth that oversample high-income or high-wealth households or areas. Oversampling
rich households using tax or other forms of administrative data is not common, even in high income
countries, probably because it requires coordination between various public agencies (for example
national statistical offices and tax authorities). In addition, administrative tax data in LMICs may miss
the rich whose incomes are outside the tax system, which may be a large proportion of those at the
top of the income distribution (Haq, 1964; Ravallion, 2022). Nevertheless, it would be useful, perhaps
as a pilot, for an LSMS to oversample high income households using tax data, where this can be
obtained. A less ambitious method of oversampling would be to use population census data to
oversample small areas with high incomes or with characteristics that are correlated with high
incomes, if income data is not collected in the census. A version of this was attempted in the NIDS
panel fifth wave in South Africa to increase the sample size of white and Indian South Africans, by
drawing a top-up sample from enumeration areas with large proportions of these groups, using
estimates from the most recent population census (Branson, 2019).




Most of the work improving measurement of the top tail of the income or wealth distributions uses
administrative data. But there is very little work on how one might improve estimates of incomes or
wealth (and income from that wealth) that are outside the tax system. This is very important in LMICs
where the informal sector is a large share of economic activity (Ravallion, 2022). Ravallion (2022) notes
that if economic growth improves state capacity and this shifts more activity into the formal sector
then this could generate rising measured top income shares, even when there is no actual change.
This does not seem to have been investigated in the research discussed in this paper. Ravallion (2022)
also highlights that there is little evidence on how large the informal or hidden sector share of top
incomes or wealth is in any LMICs. This also seems an important area for further work. Alstadsæter et
al. (2022) used leaked data on Dubai real estate ownership to show that the value of property
ownership in Dubai by LMIC residents is a large share of GDP for several of these LMICs. Such data
would be useful to improve the measurement of the top of the wealth distribution in these countries
and serve as estimates for how much administrative tax data misses important sources of wealth at
the top of the distribution.

Conclusion
There has been a resurgence in research on the measurement of the top tail of the income and wealth
distributions in the last 20 years. Kuznets and Jenks (1953) were the key authors who began this
research agenda, but it declined in importance as household surveys became more widely available
and researchers turned their attention to microdata. Piketty (2001, 2003) put the measurement of the
top tail back onto the research agenda with his long run measurement of top incomes in France, and
subsequently this area of research has exploded.

In this paper we have surveyed this literature, focusing on the challenges that have been identified
when trying to measure the top of the income and wealth distributions, as well as the solutions that
have been used to overcome these challenges. Our focus has been low- and middle-income countries,
although we have discussed research from high-income countries, particularly when no research on a
specific topic has been undertaken in any LMIC.

There are multiple challenges that researchers measuring the top end of the wealth and income
distributions face. We focused first on the challenges when using household survey data, given that
this is the most common form of data on incomes and wealth available in LMICs and that other forms
of data (like administrative tax microdata or even tax tabulations) are uncommon. These include
missing high income or wealth individuals due to non-random non-response (item or unit) or due to
sampling error and measurement error (either from respondents themselves, proxy respondents or
because of data processing errors). We also highlighted that there is a lack of publicly available survey
data with income questions for some LMICs, particularly in Africa.

We then discussed solutions to the challenges encountered when using household surveys that use
survey data and no external data. The first set of solutions are those that can be undertaken before a
survey starts. These include oversampling high income or wealth individuals, better fieldwork
protocols to minimize unit and item non-response, and questions to elicit at least some information
when a respondent does not want to give their exact income or wealth.

The second set of solutions are those that use only household survey data and are used ex-post. These
can be broadly classified as either reweighting or replacement methods and include reweighting for
unit non-response, imputation for item non-response and replacing the top of the distribution using
a parametric distribution estimated from the survey itself, usually the Pareto distribution.

The third set of solutions combines survey data with administrative data, usually tax data, to improve
measurement at the top, where researchers believe incomes are not well captured. These can also be
broadly classified as replacement or reweighting methods. They include replacing top incomes in
surveys with incomes from tax data above a certain threshold or reweighting so that the survey
income or wealth distribution matches that in administrative data. Most solutions using administrative
data use a combination of reweighting and replacement.

The research we have surveyed has resulted in a much better understanding of the challenges when
measuring the upper tail of the income and wealth distributions and improved estimates of the
incomes or wealth in the upper tail. But we have outlined gaps and thus areas in which future research
should concentrate.




References
Alstadsæter, A., Johannesen, N., & Zucman, G. (2018). Who owns the wealth in tax havens?
Macro evidence and implications for global inequality. Journal of Public Economics, 162, 89-
100.

Alstadsæter, A., Johannesen, N., & Zucman, G. (2019). Tax evasion and inequality. American
Economic Review, 109(6), 2073-2103.

Alvaredo, F. (2011). A note on the relationship between top income shares and the Gini
coefficient. Economics Letters, 110(3), 274-277.

Alvaredo, F., & Atkinson, A. B. (2022). Top incomes in South Africa in the twentieth
century. Cliometrica, 16(3), 477-546.

Alvaredo, F., & Londoño Vélez, J. (2013). High incomes and personal taxation in a developing
economy: Colombia 1993-2010. Colombia CEQ Working Paper No. 12.

Anand, I., & Thampi, A. (2016). Recent trends in wealth inequality in India. Economic and
Political Weekly, 59-67.

Atkinson, A. B. (2007). Methodological Issues. In: Atkinson, A. B., Piketty, T. (Eds.), Top Incomes
over the Twentieth Century: A Contrast Between European and English-Speaking Countries.
Oxford University Press, Oxford.

Bach, S., Corneo, G., & Steiner, V. (2009). From bottom to top: the entire income distribution
in Germany, 1992–2003. Review of Income and Wealth, 55(2), 303-330.

Bach, S., Thiemann, A., & Zucco, A. (2019). Looking for the missing rich: Tracing the top tail
of the wealth distribution. International Tax and Public Finance, 26(6), 1234-1258.

Blanchet, T., Chancel, L., Flores, I., & Morgan, M. (2021). Distributional national accounts
guidelines: methods and concepts used in the World Inequality Database. World Inequality Lab.

Blanchet, T., Flores, I., & Morgan, M. (2022). The weight of the rich: improving surveys using
tax data. The Journal of Economic Inequality, 20(1), 119-150.

Bollinger, C. R., Hirsch, B. T., Hokayem, C. M., & Ziliak, J. P. (2019). Trouble in the tails?
What we know about earnings nonresponse 30 years after Lillard, Smith, and Welch. Journal
of Political Economy, 127(5), 2143-2185.

Branson, N. (2019). Adding a top-up sample to the National Income Dynamics Study in South
Africa. NIDS Technical Paper, 8.

Burkhauser, R. V., Feng, S., Jenkins, S. P., & Larrimore, J. (2012). Recent trends in top
income shares in the United States: reconciling estimates from March CPS and IRS tax return
data. Review of Economics and Statistics, 94(2), 371-388.

Burkhauser, R. V., Jenkins, S. P., Hérault, N., & Wilkins, R. (2016). What has been happening to UK
income inequality since the mid-1990s? Answers from reconciled and combined household
survey and tax return data (No. 2016-03). Institute for Social and Economic Research.

Burkhauser, R. V., Hérault, N., Jenkins, S. P., & Wilkins, R. (2018). Survey Under‐Coverage
of Top Incomes and Estimation of Inequality: What is the Role of the UK's SPI
Adjustment? Fiscal Studies, 39(2), 213-240.

Burkhauser, R. V., Hérault, N., Jenkins, S. P., & Wilkins, R. (2023). What Accounts for the
Rising Share of Women in the Top 1 percent? Review of Income and Wealth, 69(1), 1-33.

Campos-Vazquez, R. M., & Lustig, N. (2019). Labour income inequality in Mexico: Puzzles
solved and unsolved. Journal of Economic and Social Measurement, 44(4), 203-219.

Carletto, G., Covarrubias, K., Davis, B., Krausova, M., & Winters, P. (2007). Rural Income
Generating Activities Study: Methodological note on the construction of income
aggregates. Agricultural Sector in Economic Development Service, Food and Agriculture
Organization.

Chancel, L., & Piketty, T. (2019). Indian income inequality, 1922-2015: From British Raj to
Billionaire Raj? Review of Income and Wealth, 65, S33-S62.

Chancel, L., Piketty, T., Saez, E., & Zucman, G. (Eds.). (2022). World inequality report 2022.
Harvard University Press.

Chancel, L., Cogneau, D., Gethin, A., Myczkowski, A., & Robilliard, A. S. (2023). Income
inequality in Africa, 1990–2019: Measurement, patterns, determinants. World
Development, 163, 1-23.

Chatterjee, A., Czajka, L., & Gethin, A. (2022). Wealth Inequality in South Africa, 1993–
2017. The World Bank Economic Review, 36(1), 19-36.

Choumert-Nkolo, J., Santana Tavera, G., & Saxena, P. (2023). Addressing Non-response
Bias in Surveys of Wealthy Households in Low-and Middle-Income Countries: Strategies and
Implementation. The Journal of Development Studies, 1-16.

Czajka, L. (2017). Income Inequality in Côte d'Ivoire: 1985-2014. World Inequality Lab
Working Paper 8-2017

Deaton, A. (1997). The analysis of household surveys: a microeconometric approach to
development policy. World Bank Publications.


Deaton, A. (2005). Measuring poverty in a growing world (or measuring growth in a poor
world). Review of Economics and Statistics, 87(1), 1-19.

Doss, C. R., Grown, C., & Deere, C. D. (2008). Gender and asset ownership: A guide to
collecting individual-level data. World Bank Policy Research Working Paper No. 4704.

Finn, A., Franklin, S., Keswell, M., Leibbrandt, M., & Levinsohn, J. (2009). Expenditure: Report
on NIDS Wave 1. National Income Dynamics Study Technical Paper, 4, 1-33.

Flachaire, E., Lustig, N., & Vigorito, A. (2022). Underreporting of top incomes and inequality:
A comparison of correction methods using simulations and linked survey and tax data. Review
of Income and Wealth.

Frankel, S. H., & Herzfeld, H. (1943). European income distribution in the Union of South
Africa and the effect thereon of income taxation. South African Journal of Economics, 11(2),
121-136.


Gandelman, N., & Lluberas, R. (2023). Wealth in Latin America: Evidence from Chile,
Colombia, Mexico and Uruguay. Review of Income and Wealth. Forthcoming.

Haq, K. (1964). A measurement of inequality in urban personal income distribution in
Pakistan. The Pakistan Development Review, 4(4), 623-664.

Hasanbasri, A., Kilic, T., Koolwal, G., & Moylan, H. (2022). Individual wealth inequality:
Measurement and evidence from low-and middle-income countries. World Bank Working
Paper

Hlasny, V., & Verme, P. (2017). The impact of top incomes biases on the measurement of
inequality in the United States. ECINEQ WP 2017 – 452.

Hlasny, V., & Verme, P. (2018). Top incomes and the measurement of inequality in Egypt. The
World Bank Economic Review, 32(2), 428-455.

Hlasny, V., & Verme, P. (2022). The impact of top incomes biases on the measurement of
inequality in the United States. Oxford Bulletin of Economics and Statistics, 84(4), 749-788.

Jenkins, S. P. (2017). Pareto models, top incomes and recent trends in UK income
inequality. Economica, 84(334), 261-289.



Johansson-Tormod, F., & Klevmarken, A. (2022). Comparing register and survey wealth
data. International Journal of Microsimulation, 15(1), 43-62.

Jouste, M., Barugahara, T. K., Ayo, J. O., Pirttilä, J., & Rattenhuber, P. (2023). Taxpayer
response to greater progressivity.

Kennickell, A. B. (2019). The tail that wags: differences in effective right tail coverage and
estimates of wealth inequality. The Journal of Economic Inequality, 17(4), 443-459.

Kerr, A. (2021). Measuring earnings inequality in South Africa using household survey and
administrative tax microdata (No. 2021/82). WIDER Working Paper.

Kerr, A., Lam, D., & Wittenberg, M. (2019). The Post-Apartheid Labour Market Series v3.3
(PALMS). [dataset].

Kerr, A., & Wittenberg, M. (2015). Sampling methodology and fieldwork changes in the
October Household Surveys and Labour Force Surveys. Development Southern Africa, 32(5),
603-612.

Kerr, A., & Wittenberg, M. (2021). Union wage premia and wage inequality in South
Africa. Economic Modelling, 97, 255-271.

Kilic, T., Moylan, H., & Koolwal, G. (2021). Getting the (Gender-Disaggregated) lay of the land:
Impact of survey respondent selection on measuring land ownership and rights. World
Development, 146.

Korinek, A., Mistiaen, J. A., & Ravallion, M. (2006). Survey nonresponse and the distribution
of income. The Journal of Economic Inequality, 4, 33-55.

Kuznets, S., & Jenks, E. (1953). Shares of Upper Income Groups in Income and Savings.
National Bureau of Economic Research, Cambridge.

Kuznets, S. (1955). Economic Growth and Income Inequality. The American Economic
Review, 45(1), 1-28.

Kuznets, S. (1963). Quantitative aspects of the economic growth of nations: VIII. Distribution
of income by size. Economic Development and Cultural Change, 11(2, Part 2), 1-80.

Lakner, C., & Milanovic, B. (2016). Global Income Distribution: From the Fall of the Berlin Wall
to the Great Recession. The World Bank Economic Review, 30(2), 203-232.

Leibbrandt, M., Woolard, I., & de Villiers, L. (2009). Methodology: Report on NIDS wave
1. Technical paper, 1.

Lepkowski, J. (2005). Non-observation error in household surveys in developing
countries. Department of Economic and Social Affairs, Statistics Division, editor. Household
surveys in developing and transition countries. New York: United Nations, 149-69.

Lohr, S. L. (2009). Sampling: design and analysis. CRC Press.

Luiten, A., Hox, J., & de Leeuw, E. (2020). Survey nonresponse trends and fieldwork effort in
the 21st century: Results of an international study across countries and surveys. Journal of
Official Statistics, 36(3), 469-487.

Lustig, N. (2020). The "Missing Rich" in Household Surveys: Causes and Correction
Approaches (Vol. 520). ECINEQ, Society for the Study of Economic Inequality.

McLennan, D., Noble, M., Wright, G. C., Barnes, H., & Masekesa, F. (2021). Exploring the
quality of income data in two African household surveys for the purpose of tax-benefit
microsimulation modelling: Imputing employment income in Tanzania and Zambia (No.
2021/134). WIDER Working Paper.

Milanovic, B. (2014). The return of “patrimonial capitalism”: a review of Thomas Piketty's
Capital in the twenty-first century. Journal of Economic Literature, 52(2), 519-534.

Morelli, S., & Muñoz, E. (2019). Unit nonresponse bias in the Current Population
Survey. Unpublished paper, City University of New York Graduate Center.
https://www.erciomunoz.org/files/Draft_cps.pdf

Piketty, T. (2001). Les hauts revenus en France au XXe siècle: Inégalités et redistributions,
1901–1998. Paris: Grasset.

Piketty, T. (2003). Income inequality in France, 1901–1998. Journal of Political
Economy, 111(5), 1004-1042.

Piketty, T., & Saez, E. (2003). Income inequality in the United States, 1913–1998. The
Quarterly Journal of Economics, 118(1), 1-41.

Piketty, T., Yang, L., & Zucman, G. (2019). Capital accumulation, private property, and rising
inequality in China, 1978–2015. American Economic Review, 109(7), 2469-2496.

Piketty, T., Saez, E., & Zucman, G. (2022). Twenty years and counting: Thoughts about
measuring the upper tail. The Journal of Economic Inequality, 20(1), 255-264.

Prydz, E. B., Jolliffe, D., & Serajuddin, U. (2022). Disparities in Assessments of Living
Standards Using National Accounts and Household Surveys. Review of Income and
Wealth, 68, S385-S420.

Ravallion, M. (2022). Missing top income recipients. The Journal of Economic
Inequality, 20(1), 205-222.



Roine, J., & Waldenström, D. (2015). Long-run trends in the distribution of income and
wealth. Handbook of Income Distribution, 2, 469-592.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley &
Sons.

Saez, E., & Zucman, G. (2016). Wealth inequality in the United States since 1913: Evidence
from capitalized income tax data. The Quarterly Journal of Economics, 131(2), 519-578.

Scott, K., Steele, D., & Temesgen, T. (2005). Chapter XXIII: Living Standards Measurement
Study Surveys. Household sample surveys in developing and transition countries. United
Nations. New York

Si, Y., Heeringa, S., Johnson, D., Little, R. J., Liu, W., Pfeffer, F., & Raghunathan, T. (2023).
Multiple imputation with massive data: An application to the panel study of income
dynamics. Journal of Survey Statistics and Methodology, 11(1), 260-283.

Stecklov, G., Weinreb, A., & Carletto, C. (2018). Can incentives improve survey data quality
in developing countries?: results from a field experiment in India. Journal of the Royal
Statistical Society Series A: Statistics in Society, 181(4), 1033-1056.

Székely, M., & Hilgert, M. (2007). What's behind the inequality we measure? An investigation
using Latin American data. Oxford Development Studies, 35(2), 197-217.

Vaessen et al. (2005). The Demographic and Health Surveys. Chapter XXII. Household Sample
Surveys in Developing and Transition Countries.

Van Der Weide, R., Lakner, C., & Ianchovichina, E. (2018). Is inequality underestimated in
Egypt? Evidence from house prices. Review of Income and Wealth, 64, S55-S79.

Waltl, S. R., & Chakraborty, R. (2022). Missing the wealthy in the HFCS: micro problems with
macro implications. The Journal of Economic Inequality, 20(1), 169-203.

Wittenberg, M. (2008). Nonparametric estimation when income is reported in bands and at
points. Cape Town: Economic Research Southern Africa Working Paper, (94).

Wittenberg, M. (2017). Measurement of earnings: Comparing South African tax and survey
data. SALDRU Working Paper 212.

Xie, Y., & Jin, Y. (2015). Household wealth in China. Chinese Sociological Review, 47(3), 203-
229.




Yonzan, N., Milanovic, B., Morelli, S., & Gornick, J. (2022). Drawing a line: comparing the
estimation of top incomes between tax data and household survey data. The Journal of
Economic Inequality, 20(1), 67-95.



