Towards a Taxonomy of the Poor in Pakistan



Christina Wieser
Ibrahim Khan1

                                                      This version: July 23, 2024




                                                           Abstract
Poor households are heterogeneous in the circumstances preventing an improvement in their welfare. It is
important to understand the nuances within different types of poor households so that critical pathways out of
poverty that remedy the variegated sets of constraints they face can be identified and acted on through policy
action. This paper attempts to categorize households in the bottom 40 percent of the consumption distribution (B40)
in Pakistan into non-overlapping groups using a non-parametric hierarchical cluster analysis, which allows
for an empirically driven taxonomy of the poor in the country. Using data from the Household Integrated
Economic Survey (HIES) 2018-19, we identify five groups among the B40 and explore their salient household and
occupational attributes through the lens of an asset framework of shared prosperity.




Keywords: poverty, cluster analysis, Pakistan
JEL Classification: I32, C3




1 Christina Wieser (cwieser@worldbank.org) is a Senior Economist in the Poverty and Equity Global Practice at the World Bank. Ibrahim
Khan (mkhan83@worldbank.org) is a Consultant in the Poverty and Equity Global Practice at the World Bank. The authors would like to
thank Moritz Meyer, Maria Qazi, Oscar Eduardo Barriga Cabanillas, and Jon Jellema for their excellent feedback. We declare that we have
no relevant or material financial interests that relate to the research described in this paper. The findings, interpretations, and conclusions
expressed in this work do not necessarily reflect the views of The World Bank Group or any affiliated organizations, its Board of Executive
Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work.

Disclaimer: This work is a product of the staff of the World Bank with external contributions. The findings, interpretations, and conclusions
expressed in this work do not necessarily reflect the views of the Executive Directors of the World Bank or the governments they
represent. The World Bank does not guarantee the accuracy of the data included in this work and does not assume responsibility for any
errors, omissions, or discrepancies in the information, or liability with respect to the use of or failure to use the information, methods,
processes, or conclusions set forth. The boundaries, colors, denominations, links/footnotes, and other information shown in this work do
not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of
such boundaries. The citation of works authored by others does not mean the World Bank endorses the views expressed by those authors
or the content of their works.
    1. Introduction
Pakistan saw an extended period of rising household welfare between 2001 and 2018, later interrupted by a
series of crises. From 2001 to 2018, household real consumption grew by 60 percent, or an annual average of about 3
percent, leading to a sustained decline in poverty from 64.3 percent in 2001 to 21.9 percent in 2018 (World
Bank, 2023). This decline was driven primarily by the expansion of off-farm economic opportunities and increased
out-migration. During this period of improving welfare, inequality remained relatively stable, with a Gini
index of 28.4 in 2018 (World Bank, 2023).

While poverty declined across Pakistan, three challenges persist. First, disparities in welfare across
provinces, across districts, and between rural and urban areas continue. Urban areas have seen faster poverty
reduction due to expanding employment, especially in services and construction, which provide jobs
for lower-income workers. Poverty has become increasingly concentrated among rural households working in
agriculture and among those with low human capital endowments. For instance, in 2018, rural areas, where
two-thirds of the population but four out of five of the poor live, had a poverty rate (28.2 percent) more than
twice as high as that of urban areas (10.9 percent).

Second, female and youth labor force participation remain low. The few women who work have mostly found
low-skill, low-paying jobs. As much as 37 percent of Pakistan’s youth (aged 15 to 24) are not in
employment, education, or training (NEET), and even for young Pakistanis who work, labor market prospects
are grim.

Third, recent years have brought Pakistani households greater difficulties due to multiple back-to-back
crises, including macroeconomic, political, and climate-induced shocks. In particular, the COVID-19 pandemic in
2020 ended Pakistan’s track record of consistent year-on-year poverty reduction, with poverty projected to
increase by 3 percentage points. Furthermore, a precarious macroeconomic situation, with
dangerously low reserves, an exchange rate in free fall, and an ambitious IMF-enforced fiscal consolidation plan,
was compounded by an unprecedented calamity in the form of torrential rains and a combination of riverine,
urban, and flash flooding in the summer of 2022. The resulting surge in inflation put further pressure on poor
Pakistanis. Poverty was projected at 25.3 percent in 2023 (Barriga-Cabanillas et al., forthcoming).

Reducing poverty requires addressing the specific challenges faced by the poor in Pakistan. The poor have low
educational attainment and are usually engaged in low-productivity work. Education contributes to individuals’
productivity, earning potential, and overall wellbeing. Yet Pakistan has enormous gaps in educational
attainment, with poorer individuals left uneducated and the gap between rich and poor widening over the past
20 years. For example, in 2018, only 46 percent of the youth in the bottom quintile could read and write, up
only modestly from 38 percent in 2001. Along with such low educational attainment, poor Pakistanis work in
lower-paying sectors such as agriculture, construction, or low-quality services. Additionally, poverty, geography,
and livelihoods are closely interconnected, resulting in substantially higher poverty rates in rural areas and in
the lagging provinces of Balochistan and Sindh.

However, the poor are not a homogeneous group. They have different characteristics, face different constraints in
escaping poverty, and have opportunities shaped by the economic environment, where they live, and
household-specific factors. This, in turn, means that we need to better understand which groups exist within the
umbrella of the “poor” to allow for differentiated policy interventions. We, therefore, first specify different
groups of the poor based on their characteristics to identify some of the binding constraints each group faces.
Note that this paper does not examine the determinants of poverty; instead, the focus is ex post: to determine, for
the population of poor in the country, how to group them such that their characteristics are interpretable and
may subsequently allow for policy action targeting each group’s specific set of constraints.

For this, we leverage cluster analysis, a statistical technique for grouping a population into a set of subgroups,
called clusters, based on the observed characteristics of households. The groups derived from this
exercise are then given policy meaning through careful interpretation. Cluster analysis is a non-parametric approach
that segments a population (in our case, the bottom 40 percent of the consumption distribution) based on the
similarity and dissimilarity of its members. Clustering is broadly used for unsupervised (unlabeled) problems, and the
development literature has applied it to target and measure poverty characteristics across different
contexts (Rahman et al., 2021). Clusters are identified by partitioning households into groups so as to maximize
similarity within each group while maximizing dissimilarity between groups. The analysis involves three
steps: (i) selection of clustering variables based on previous related work, theory (potential relationship with
poverty), and data availability; (ii) selection of the clustering procedure, the measure of dissimilarity, and the
number of clusters; and (iii) validation and interpretation of the results by defining and labeling the obtained
clusters.

The clustering procedure is a hierarchical method that does not require the number of clusters among the poor
to be specified in advance. Using Ward’s linkage algorithm, we form clusters by successively combining observations
or smaller clusters whose merger minimizes the increase in overall within-cluster variance. Once the algorithm has
determined the clusters, or subgroups of the poor, we profile them to label and interpret the results of the
clustering exercise.

Results of the cluster analysis show that poverty manifests in varied circumstances, and we identify five
meaningful clusters among the bottom 40 percent of the consumption distribution. The first group consists of
ultra-poor households in rural areas that primarily rely on unskilled sharecropping as a means of earning,
supplemented by public social safety nets. The second group consists of poor rural households that are also actively
involved in agriculture, but as owner-cultivators. The third key group of the B40 consists of households in transition
between agriculture and service provision. The fourth group consists of urban households with relatively higher
levels of education within the B40, higher ownership of (productive) assets, and semi-skilled wage jobs in the
industry and services sectors. Lastly, the fifth group consists of poor households, spread across the country, that
find their livelihood in unskilled daily wage labor in construction and other smaller service work; in some ways,
they are the non-agricultural equivalent of our cluster of sharecroppers. This taxonomy of the poor could be helpful
in the larger effort to make more meaningful, empirically informed policies targeted to address the diverse set of
constraints faced by the poor in Pakistan.

The paper is structured as follows. Section 2 describes the data sources and provides a step-by-step explanation
of the methodology applied, Section 3 explores and discusses the results through an asset framework lens, and
Section 4 concludes.




    2. Data and Methodology
Cluster analysis is a statistical technique for grouping a population into a set of meaningful subgroups, called
clusters, based on observed characteristics. Clusters are identified by partitioning units into groups so as to
maximize similarity within each group while maximizing dissimilarity between groups.

Machine learning methods are commonly used for pattern analysis in industry applications such as market
segmentation and, increasingly, in development research. Cluster analysis is distinguished from other machine
learning and econometric methods in that it does not assume a statistical model or structure (Everitt,
Landau, Leese, & Stahl, 2011). Consequently, the data must be carefully pre-processed before the analysis, and
robustness checks must be conducted to test the stability of the clusters identified. Figure 1 outlines the
decision process involved in the analysis (Mooi & Sarstedt, 2017). Care must be taken at each node to ensure
that each methodological decision is congruent with the analysis objective and informed by the structure of
the underlying data.

                                       Figure 1: Steps in a cluster analysis




   2.1 Data
The Pakistan Bureau of Statistics regularly conducts the Household Integrated Economic Survey (HIES). It collects
information on demographic characteristics and indicators of health, education, dwelling conditions, economic
activity, and household welfare. The latest survey round (2018-19) covers 24,809 households across 1,802
primary sampling units (PSUs) in rural and urban areas of the four major provinces and is representative at the
provincial level. For the first time, this round also includes the Federally Administered Tribal Areas (FATA) and
Frontier Regions (FR) as part of Khyber Pakhtunkhwa.



The HIES is designed to measure national and provincial poverty rates, as it includes an extensive consumption
expenditure module that allows for calculating household consumption expenditure per adult equivalent. This
matters because our objective requires identifying monetarily poor households as a precursor to the analysis.
Further, given the extensive set of welfare attributes captured in the survey, the HIES data provide an ideal
setting to develop a taxonomy of the poor.

Many important aspects of poverty, such as the risk of a natural disaster or remoteness, may not be sufficiently
captured by the household survey. To fill these gaps, we source geospatial data from external sources, including
the World Bank’s Rural Accessibility Index, Climate Country Disaster Statistics (CCDR), and the national
agroecological zones defined by the Pakistan Agricultural Research Council. However, since we do not have
geo-coordinates for the households in the HIES sample, we must incorporate the external data at the more
aggregate administrative district level. Although some granularity is lost in this process, the district-level
information can still proxy for broader geographical characteristics.
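As a minimal sketch of the district-level linkage described above, the snippet below attaches district attributes to household records through a shared district identifier and reports households whose district has no match, so gaps in the external data surface instead of being silently dropped. The field names (`district_code`, `access_index`, `flood_risk`) and all values are hypothetical, not actual HIES or external-data variables.

```python
from typing import Dict, List

# Hypothetical district-level attributes keyed by a district identifier
# (standing in for accessibility, disaster exposure, agroecological zone, etc.).
district_data: Dict[str, Dict[str, float]] = {
    "D01": {"access_index": 0.35, "flood_risk": 0.8},
    "D02": {"access_index": 0.52, "flood_risk": 0.1},
}

# Hypothetical household records; only the district code links them to the external data.
households: List[Dict[str, object]] = [
    {"hh_id": 1, "district_code": "D01"},
    {"hh_id": 2, "district_code": "D02"},
    {"hh_id": 3, "district_code": "D01"},
    {"hh_id": 4, "district_code": "D99"},  # district absent from the external source
]


def attach_district_attributes(hhs, districts):
    """Many-to-one merge: every household inherits its district's attributes.

    Returns the merged records and the ids of households with no district match.
    """
    merged, unmatched = [], []
    for hh in hhs:
        attrs = districts.get(hh["district_code"])
        if attrs is None:
            unmatched.append(hh["hh_id"])
            continue
        merged.append({**hh, **attrs})
    return merged, unmatched


merged, unmatched = attach_district_attributes(households, district_data)
```

Every household in the same district receives identical values, which is exactly the loss of granularity the text acknowledges.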


2.2 The unit of analysis
To define clusters, we need to decide on the unit of analysis most appropriate for our context. Determining the
data structure from which the clustering will be derived is a nontrivial consideration. For example, we might
restrict our data to households below the poverty line or to those pre-screened on specific characteristics. Further,
we could run the algorithm nationally or separately for each province. Lastly, we could choose our unit of interest
to be the individual, the household, or even a larger aggregation (PSU). Each of these considerations is ultimately
normative and must be weighed against the policy objective of the analysis, which is to characterize various
target groups for poverty-response interventions.
Considering this, the unit of our analysis is households in the bottom 40 percent of the national consumption
distribution. Several reasons support this approach:
    1. Targeted policy interventions are more commonly carried out at the household level than at the
         individual level; for example, the country's unconditional and conditional cash transfers target eligible
         households. While area-specific interventions are used in circumstances such as disaster response, our
         goal is to provide a more general framing of the different groups experiencing poverty.
    2. Similarly, conducting the analysis at the national level, instead of repeating it for each province,
         provides a larger sample from which to draw inferences without losing granularity, given that we
         include some geographical characteristics as clustering attributes.
    3. Extending the focus from those below the national poverty line (the bottom 22 percent of the
         population) to the “Bottom 40 (B40)” recognizes that poverty is dynamic. Given that the HIES data
         were last collected five years ago and that poverty is estimated, based on a poverty projection model,
         to have increased over the last few years, expanding the sample allows our model to include
         households that may have moved across old thresholds. Furthermore, the B40 is not substantially
         different from, or better off than, the country's poor. Table 1 compares the overall set of households
         in the country to the B40 and the poor on welfare measures. Note that the B40 still score low on
         measures of human capital, accessibility, and consumption, with a monthly average consumption per
         adult equivalent of PKR 3,634 (around 26 USD, in 2018 terms).
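A minimal sketch of how a B40 sample could be flagged in survey microdata. It assumes each household carries a population weight (survey weight times household size, a common convention for person-based distributions that the text does not spell out) and adds households in order of increasing consumption per adult equivalent until the cumulative weighted share reaches 40 percent. The function name and toy data are hypothetical.

```python
import numpy as np


def bottom_share_mask(consumption, weights, share=0.40):
    """Flag households in the bottom `share` of a weighted consumption distribution.

    consumption: consumption per adult equivalent for each household.
    weights: population weights for each household (assumed here to be the survey
             weight multiplied by household size, so shares refer to people).
    """
    consumption = np.asarray(consumption, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort households from poorest to richest.
    order = np.argsort(consumption, kind="stable")
    # Cumulative population share reached once each sorted household is included.
    cum_share = np.cumsum(weights[order]) / weights.sum()
    mask = np.zeros(len(consumption), dtype=bool)
    # Small tolerance so a cumulative share of exactly 0.40 is not lost to rounding.
    mask[order] = cum_share <= share + 1e-12
    return mask


# Hypothetical toy data: five equally weighted households.
b40_demo = bottom_share_mask([5000, 2000, 3000, 8000, 1000], [1, 1, 1, 1, 1])
# flags the two households with consumption 1000 and 2000
```

Whether the household that straddles the 40 percent cutoff is included is a convention; this sketch excludes it.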



                              Table 1: Comparing the welfare of sub-populations
           Variable                                                          Pakistan       B40        Poor
           Monthly household expenditure per adult equivalent (PKR)           6,873        3,634       3,130
           Household size                                                      6.24         7.76        8.25
           Dependency ratio                                                    0.99         1.34        1.46
           Share of adult members with primary education                       52%          31%         26%
           Share of adult members with secondary education                     28%          11%          8%
           Share of working members in very low-skill occupations              21%          34%         39%
           Received any assistance (Zakat/BISP/Govt)                           8.6%        15.9%       18.4%
           Rural PSU                                                          62.0%        79.5%       82.5%
           Accessibility index (district level)                                0.37         0.40        0.41
           Experienced moderate food insecurity in the last year              13.8%        25.0%       30.8%
           Improved sanitation                                                74.7%        66.5%       63.3%
           Observations                                                       24,809        8,789       4,651


2.3 Selecting and pre-processing variables
To avoid unstructured data mining, the choice of attributes on which the algorithm is run is grounded in the
literature and in a theoretical framework of their relevance to poverty analysis, subject to technical
considerations and data availability. To this end, we explore and identify variables that encompass
the following aspects of household welfare:

    1.   Ownership of productive assets
    2.   Human capital
    3.   Livelihoods and nature of economic activities
    4.   Dwelling conditions and access to services
    5.   Exposure to climatic shocks

Note that the choice of themes and variables is not related to “determinants” of poverty but rather to the
attributes that distinguish different types of poor households. Once we have a universe of characteristics for
which data are available within these themes, we take several steps to reach a final set, following the
guidelines in Klopotek & Wierzchon (2018):

    1. The variables must sufficiently differentiate between potential clusters. Within the sample of B40
       households, the selected variables should have enough variation. For example, while we could include
       car ownership as a clustering variable, the percentage of the B40 that own a car is extremely low, so
       this information could not help us identify different groups: the B40 are nearly homogeneous in this respect.



    2. The relation between sample size and the number of clustering variables must be reasonable. While there
       is no generally accepted guideline on the minimum sample size required for a given number of variables
       (or vice versa), the literature broadly suggests that the required number of observations grows
       over-proportionally with every additional clustering variable. In practice, this means that our sample
       should be at least 10 (Qiu & Joe, 2009) to 30 (Dolnicar, Grün, & Leisch, 2016) times the number of
       clustering variables, with continued improvements up to a ratio of 100. Our B40 sample of 8,789
       households suggests that, even at a ratio of 100, we could safely choose 80 to 90 variables. Of course,
       this is likely excessive in our context, as we must also be careful not to overfit the data.

    3. The chosen variables must not be highly correlated. This is related to the previous point: beyond a
       certain point, adding more variables only increases the chance that two or more measures exhibit a
       high degree of pairwise correlation. We examine the pairwise correlations of the preliminary set of
       variables defined by steps 1 and 2; for any pair of attributes whose correlation2 exceeds 0.75, we either
       exclude one or create a merged version.
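The three screening rules can be sketched as a single function. The 5 percent prevalence cutoff for binary variables is a hypothetical choice (the text gives no number), the 0.75 correlation threshold follows the text, and the sketch only drops the later variable of a highly correlated pair rather than merging the two. Variable names in the example are placeholders.

```python
import numpy as np


def screen_variables(X, names, min_prevalence=0.05, max_corr=0.75):
    """Apply the three screening rules to a candidate matrix X (rows = households).

    Rule 1: drop binary variables with prevalence below `min_prevalence` or above
            1 - `min_prevalence` (too little variation to separate clusters).
    Rule 3: for each pair with |correlation| > `max_corr`, keep the first variable
            and drop the later one (merging the pair is the alternative not shown).
    Rule 2 is reported, not enforced: the function returns observations per variable.
    """
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.isin(col, (0.0, 1.0)).all():  # binary column
            p = col.mean()
            if p < min_prevalence or p > 1 - min_prevalence:
                continue
        keep.append(j)

    corr = np.corrcoef(X[:, keep], rowvar=False)
    chosen = []  # positions within `keep`
    for a in range(len(keep)):
        if all(abs(corr[a, b]) <= max_corr for b in chosen):
            chosen.append(a)
    selected = [names[keep[a]] for a in chosen]
    return selected, X.shape[0] / len(selected)


# Toy data: x1 is a linear copy of x0 and x2 is a rare binary indicator.
i = np.arange(100)
X_demo = np.column_stack([i, 2 * i + 1, (i < 2).astype(float), i % 2, np.sin(i)])
selected_vars, obs_per_var = screen_variables(X_demo, ["x0", "x1", "x2", "x3", "x4"])
# selected_vars == ["x0", "x3", "x4"]
```

With the paper's own figures, 8,789 B40 households and 26 retained variables give about 338 observations per variable, above even the ratio of 100.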

Following the above steps, we finalize 26 clustering variables for our analysis. Their descriptions are listed in
Appendix Table A-1.


2.4 Choosing the clustering procedure
Different procedures are designed to optimize specific criteria, such as maximizing the distance between
measures across potential clusters. In practice, the key distinction is between hierarchical and
partitioning methods.

Hierarchical methods are characterized by a “tree-like” structure, with units combining to form clusters and
clusters merging based on similarity. In the agglomerative approach, the most common hierarchical method, the
algorithm begins with N clusters (that is, every unit is its own cluster) and merges them until a single cluster,
the entire sample, remains. The divisive approach works in the opposite direction, from the top down, starting
with the entire sample and successively splitting it.

Partitioning methods rely on an entirely different framework. The most common of these, the k-means
procedure, tries to minimize within-cluster variation to form homogeneous clusters. In this approach, cluster
affiliations can change during the procedure; consequently, it does not build a hierarchy as hierarchical
methods do. While k-means can be superior to hierarchical methods as it is less affected by outliers, it
requires the number of clusters to be specified in advance.

In our context, we do not know the number of clusters a priori. Therefore, we employ an agglomerative
hierarchical clustering procedure. This requires an additional step: specifying the linkage algorithm that
defines the distance from a newly formed cluster to another cluster or unit. Popular algorithms include single
linkage, wherein the distance between two clusters is the shortest distance between any two of their members;
centroid linkage, which computes the distance between the geometric centers of the two clusters; and average
linkage, which defines distance as the average distance between all pairs of units drawn from the two clusters.

2 While the literature recommends a threshold of 0.9, none of our listed variables have a correlation above 0.8,
so the threshold was adjusted down accordingly.
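The three linkage rules can be compared on two toy clusters. This is a minimal sketch using Euclidean distances for readability (the analysis itself uses Gower dissimilarities); the function names and points are hypothetical.

```python
import numpy as np


def pairwise_euclidean(A, B):
    """Euclidean distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def single_linkage(A, B):
    # Shortest distance between any member of A and any member of B.
    return float(pairwise_euclidean(A, B).min())


def average_linkage(A, B):
    # Mean distance over all pairs formed by one member of A and one of B.
    return float(pairwise_euclidean(A, B).mean())


def centroid_linkage(A, B):
    # Distance between the geometric centers (means) of the two clusters.
    return float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))


A = np.array([[0.0, 0.0], [0.0, 2.0]])
B = np.array([[3.0, 0.0], [5.0, 0.0]])
# single: 3.0; average: (3 + 5 + sqrt(13) + sqrt(29)) / 4; centroid: sqrt(17)
```

The same pair of clusters gets three different distances, which is why the choice of linkage can change which clusters merge first.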

Our approach uses Ward’s linkage algorithm. Instead of combining the two closest or most similar objects, this
procedure merges the objects whose combination minimizes the increase in within-cluster variance. Figure 2
represents this algorithm diagrammatically for a set with two clustering variables and seven objects.

Prior research suggests that this approach performs very well, especially with more clustering variables, and
tends to yield clusters of similar size and tightness (Mooi & Sarstedt, 2017). While highly correlated
variables and outliers tend to strongly influence the results of this algorithm, the data pre-processing steps
discussed earlier can assuage these concerns. Moreover, we will conduct robustness checks with alternative
algorithms to test the stability of the cluster solution.
                                     Figure 2: Example of a Ward’s Linkage
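Ward's criterion can be made concrete with a small sketch. For clusters A and B with sizes n_A and n_B and centroids c_A and c_B, merging them raises the within-cluster error sum of squares (ESS) by n_A n_B / (n_A + n_B) times the squared Euclidean distance between c_A and c_B. The code below is a hypothetical illustration on Euclidean toy points, not the authors' Gower-based implementation: it computes this cost, which can be checked against a brute-force ESS difference, and picks the cheapest merge among four toy clusters.

```python
from itertools import combinations

import numpy as np


def ess(points):
    """Error sum of squares: squared distances of points to their cluster mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def ward_merge_cost(A, B):
    """Increase in total within-cluster ESS if clusters A and B are merged."""
    n_a, n_b = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return n_a * n_b / (n_a + n_b) * float(diff @ diff)


def ward_step(clusters):
    """Indices of the pair of clusters whose merge raises ESS the least."""
    return min(combinations(range(len(clusters)), 2),
               key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]))


clusters = [np.array([[0.0, 0.0]]), np.array([[1.0, 0.0]]),
            np.array([[5.0, 5.0]]), np.array([[6.0, 5.0], [6.0, 6.0]])]
best_pair = ward_step(clusters)  # (0, 1): merging these adds only 0.5 to the ESS
```

Repeating `ward_step` and replacing the chosen pair with its union until one cluster remains yields the full Ward hierarchy.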




2.5 Considering measures of similarity or dissimilarity
All the clustering procedures discussed in the previous section rely on measures that express the similarity or
dissimilarity between pairs of objects/units. This can be as straightforward as the simple or squared Euclidean
distance. However, we must take into consideration the form of the selected clustering variables. Euclidean
distance requires all inputs to be on a comparable (continuous) scale, whereas matching coefficients such as the
Simple Matching, Russell-Rao, and Jaccard coefficients are used when all inputs are binary.

As discussed in Section 2.3, our finalized set of input variables includes both continuous and binary variables.
For such mixed input variables, the literature recommends using Gower’s dissimilarity coefficient
(Gower, 1971), a composite of several measures3 that depend on each variable’s scale level and that combines
binary, continuous, and ordinal inputs into a single overall measure of dissimilarity.

This allows us to retain our continuous-scale inputs, thereby not losing the granularity of information available
for the clustering procedure.
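A minimal sketch of Gower's dissimilarity for a mix of continuous and binary variables, following the rules in footnote 3: range-scaled city-block distance for continuous columns and a 0/1 mismatch for binary columns, averaged over variables with equal weights. It treats all binaries as symmetric (the Jaccard variant for asymmetric binaries is not shown), and the variable layout in the example is hypothetical.

```python
import numpy as np


def gower_matrix(X, is_binary):
    """Gower dissimilarity between all rows of X (mixed continuous and binary columns).

    Continuous columns: |x_ik - x_jk| / range_k (city-block distance scaled by range).
    Binary columns: 1 if the two units differ, 0 otherwise (symmetric treatment).
    The overall dissimilarity is the average of the per-variable contributions.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    D = np.zeros((n, n))
    for k in range(p):
        col = X[:, k]
        diff = np.abs(col[:, None] - col[None, :])
        if is_binary[k]:
            D += (diff > 0).astype(float)
        else:
            rng = col.max() - col.min()
            # A constant column adds zero for every pair but still counts in the average.
            if rng > 0:
                D += diff / rng
    return D / p


# Hypothetical columns: household size (continuous), owns land (binary), years of schooling (continuous).
X_demo = [[4, 1, 0], [8, 0, 5], [6, 1, 10]]
D_demo = gower_matrix(X_demo, is_binary=[False, True, False])
# D_demo[0, 1] == (4/4 + 1 + 5/10) / 3
```

Because each contribution lies between 0 and 1, the continuous inputs keep their full granularity without dominating the binaries.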


2.6 Deciding the number of clusters
Our final step requires deciding on the number of clusters, given our input data, choice of clustering
procedure, and similarity measure. This choice has both an empirical and a pragmatic dimension. For the
latter, we need to choose a grouping that works best for our objective: to find clusters of poor households with
distinct characteristics for which differing policies can be defined. While we might find that 15 stable groups
exist among the B40, such a categorization might be too fine-grained to be policy-relevant. We keep this in
mind as we explore the thresholds below.

To guide our decision quantitatively, one approach is to seek a cluster solution such that any additional merging
of clusters would occur only at a markedly larger value of the dissimilarity measure. We can explore this
diagrammatically in a dendrogram (Figure 3). Reading from bottom to top, the height of each horizontal line
represents the Gower dissimilarity at which two clusters merge to form a larger group. The figure suggests that
the largest gaps occur at two or five clusters.4
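The "largest gap" reading of a dendrogram can also be computed from the merge heights. The sketch below assumes the heights come from a monotone hierarchy, as produced by Ward's method, listed in the order the merges occur (for example, the third column of the linkage matrix returned by SciPy's `scipy.cluster.hierarchy.linkage`). The heights in the example are made up and are not those of Figure 3.

```python
import numpy as np


def merge_gaps(heights):
    """Gap in merge height that separates each k-cluster solution from the next merge.

    `heights` are the n-1 merge heights of an agglomerative hierarchy. Returns
    {k: gap}, where gap is the height of the merge that would reduce k clusters
    to k-1 minus the height of the merge that produced the k clusters.
    """
    h = np.sort(np.asarray(heights, dtype=float))[::-1]  # final merge first
    return {k: float(h[k - 2] - h[k - 1]) for k in range(2, len(h) + 1)}


# Hypothetical merge heights for eight units.
gaps_demo = merge_gaps([0.1, 0.2, 0.3, 0.35, 1.5, 1.6, 4.0])
best_k = max(gaps_demo, key=gaps_demo.get)  # 2; the runner-up is 4
```

Ranking the gaps, rather than only taking the largest, mirrors how the text weighs two candidate solutions.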

The literature suggests additional quantitative criteria (“stopping rules”) that can be used to determine the
optimal number of clusters. Two prominent ones are the variance ratio criterion (VRC) (Calinski & Harabasz, 1974)
and the Duda-Hart index (Milligan & Cooper, 1985). Details on the construction of these measures are provided
in the Appendix.




3 If binary variables are used, the coefficient takes the value 1 when two objects do not share a certain characteristic
and 0 otherwise. When all the variables are binary and symmetric, Gower’s dissimilarity coefficient reduces to the
simple matching coefficient expressed as a distance measure instead of a similarity measure (i.e., 1 – SM). If binary
and asymmetric variables are used, Gower’s dissimilarity coefficient equals the Jaccard coefficient expressed as a
distance measure instead of a similarity measure (i.e., 1 – JC). If continuous variables are used, the coefficient is
equal to the city-block distance divided by each variable’s range. Ordinal variables are treated as if they were
continuous, which is reasonable when the scale points are equidistant (Everitt, Landau, Leese, & Stahl, 2011).
4 Imagine a horizontal line at the y-axis value of 75 and count the number of vertical lines it intersects.
                                                Figure 3: Dendrogram




Table 2 presents the VRC and Duda-Hart index measures for cluster solutions. For the VRC, we should generally
choose the number of clusters that maximizes the pseudo-F statistic. However, as the VRC tends to decrease as the
number of clusters grows, we compute the second difference of successive VRC values,
$\omega_k = (VRC_{k+1} - VRC_k) - (VRC_k - VRC_{k-1})$. The number of clusters that minimizes the omega value
indicates the most stable solution5. As the table shows, a five-cluster solution is optimal.
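The pseudo-F statistic and the omega criterion can be sketched as follows. `pseudo_f` computes the Calinski-Harabasz ratio from Euclidean sums of squares on toy data (how the statistic was computed for the Gower-based solution is not detailed in the text), while `omega` takes the second difference of successive VRC values. Applied to the pseudo-F column of Table 2, it reproduces the Omega column up to rounding in the third decimal and selects five clusters. Function names are hypothetical.

```python
import numpy as np


def pseudo_f(X, labels):
    """Calinski-Harabasz variance ratio criterion for a given partition.

    VRC = [B / (k - 1)] / [W / (n - k)], with B and W the between- and
    within-cluster sums of squares.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = len(X), len(groups)
    grand_mean = X.mean(axis=0)
    B = W = 0.0
    for g in groups:
        members = X[labels == g]
        centroid = members.mean(axis=0)
        B += len(members) * float(((centroid - grand_mean) ** 2).sum())
        W += float(((members - centroid) ** 2).sum())
    return (B / (k - 1)) / (W / (n - k))


def omega(vrc):
    """omega_k = (VRC_{k+1} - VRC_k) - (VRC_k - VRC_{k-1}); needs both neighbours."""
    return {k: (vrc[k + 1] - vrc[k]) - (vrc[k] - vrc[k - 1])
            for k in vrc if k - 1 in vrc and k + 1 in vrc}


# Pseudo-F values reported in Table 2.
vrc_table = {2: 940.506, 3: 743.308, 4: 638.294, 5: 596.257, 6: 534.932,
             7: 489.331, 8: 457.308, 9: 427.724, 10: 401.560}
omegas = omega(vrc_table)
best_k = min(omegas, key=omegas.get)  # 5, matching the text
toy_f = pseudo_f([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])  # 200.0
```

The dictionary comprehension makes the limitation in footnote 5 explicit: omega is only defined where both neighbouring VRC values exist.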

Similarly, for the Duda-Hart index, we choose the number of clusters that maximizes the Je(2)/Je(1) value. A
modified version of this rule proposes picking the number that minimizes the pseudo-T-squared value instead (Duda,
Hart, & Stork, 2001). Two and five clusters yield the largest index values, and six clusters yield a local minimum of
the pseudo-T-squared.
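A sketch of the Duda-Hart quantities for a single split, assuming Euclidean sums of squares. Je(1) is the error sum of squares of the cluster being split and Je(2) the combined error sum of squares of the two resulting clusters; the pseudo-T-squared is taken here in the form [Je(1) − Je(2)] / [Je(2) / (n1 + n2 − 2)], which is one common definition rather than necessarily the one behind Table 2. The toy clusters are hypothetical.

```python
import numpy as np


def sse(points):
    """Within-group sum of squared distances to the group mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def duda_hart(parent, child1, child2):
    """Duda-Hart statistics for splitting `parent` into `child1` and `child2`.

    Returns Je(2)/Je(1) and the pseudo T-squared
    [Je(1) - Je(2)] / [Je(2) / (n1 + n2 - 2)].
    """
    je1 = sse(parent)
    je2 = sse(child1) + sse(child2)
    n1, n2 = len(child1), len(child2)
    ratio = je2 / je1
    pseudo_t2 = (je1 - je2) / (je2 / (n1 + n2 - 2))
    return ratio, pseudo_t2


c1 = np.array([[0.0], [1.0]])
c2 = np.array([[10.0], [11.0]])
ratio_demo, t2_demo = duda_hart(np.vstack([c1, c2]), c1, c2)
# Je(1) = 101, Je(2) = 1: ratio = 1/101, pseudo T-squared = 200
```

In this toy case the split is very clean, so the ratio is tiny and the pseudo-T-squared is large; the interpretation of these extremes in a stopping rule depends on which version of the rule is applied.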

In practice, these two measures must be considered in tandem by choosing a value that yields a large VRC and
Je(2)/Je(1), along with a small omega and T-squared. As the table shows, the two- and five-cluster solutions
provide the ideal case. These results also coincide with our interpretation of the dendrogram.

How do we choose between these two cases? Here, as discussed at the beginning of this subsection, we let the
objective of this exercise be the deciding factor. A two-cluster solution for the bottom 40 percent may not be
preferable, as it will likely not yield characteristics granular enough for policy intervention. Put
differently, it is not clear that policies aimed at poverty alleviation can benefit from categorizing poor households
into only two broad types. On the other hand, five groups may provide meaningful variation across dimensions
of interest such as geography or occupational activity, thereby laying the groundwork for targeted policy
interventions that address these households' unique constraints to income growth.




5 One disadvantage of the omega value is that it requires the VRC of the (k−1)-cluster solution, which is not
defined for a single cluster; the minimum number of clusters that can be selected based on this criterion is
therefore three.
                          Table 2: Quantitative criteria for selection of cluster number
        Number of      Variance ratio criterion (VRC)          Duda-Hart index
        clusters       Pseudo F-stat    Omega (ω)              Je(2)/Je(1)       Pseudo T-squared
        2              940.506          n/a                    0.918             485.891
        3              743.308          92.185                 0.893             403.700
        4              638.294          62.977                 0.885             361.633
        5              596.257          -19.288                0.915             243.982
        6              534.932          15.723                 0.913             195.954
        7              489.331          13.579                 0.844             202.748
        8              457.308          2.439                  0.909             155.732
        9              427.724          3.42                   0.897             152.058
        10             401.560          n/a                    0.904             136.741

Given the methodological considerations discussed above, we choose the five-cluster solution produced by agglomerative hierarchical clustering of 26 input variables, using Ward's linkage algorithm and a Gower dissimilarity measure. The following section discusses the results by exploring the characteristics of each group in detail.
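The clustering step can be illustrated with a minimal Python sketch. It is our illustration, not the authors' code, since the software used is not stated in this section. It assumes numpy and scipy, a purely numeric data matrix with no missing values, and equal weights for all columns. Each column contributes its absolute difference divided by its range, a simple form of Gower's (1971) coefficient. For 0/1 indicators this reduces to simple matching. One caveat applies: scipy documents Ward's method as correctly defined only for Euclidean distances, and Gower dissimilarities are not guaranteed to be Euclidean. This pairing, although used in applied work, should therefore be read as an approximation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def gower_matrix(X):
    """Gower dissimilarity with range-normalized absolute differences.

    For 0/1 indicators this reduces to simple matching (0 if equal, 1 if not);
    for continuous columns it is |x_i - x_j| / range. All columns get equal
    weight, and missing values are not handled in this sketch.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    D = np.zeros((n, n))
    for j in range(p):
        col = X[:, j]
        col_range = col.max() - col.min()
        if col_range > 0:  # a constant column contributes zero dissimilarity
            D += np.abs(col[:, None] - col[None, :]) / col_range
    return D / p


def ward_gower_clusters(X, n_clusters):
    """Agglomerative clustering with Ward's linkage on Gower dissimilarities."""
    D = gower_matrix(X)
    # linkage() expects the condensed (upper-triangle) form of the distance matrix
    Z = linkage(squareform(D, checks=False), method="ward")
    # cut the dendrogram so that at most n_clusters flat groups remain
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Hypothetical toy data (not HIES records): two binary indicators and one
# continuous measure for six households forming two obvious groups.
X = [[0, 0, 1.0], [0, 0, 1.1], [0, 0, 0.9],
     [1, 1, 5.0], [1, 1, 5.2], [1, 1, 4.9]]
labels = ward_gower_clusters(X, n_clusters=2)
print(labels)
```

The cluster numbers returned by `fcluster` are arbitrary labels. Only the grouping they define carries meaning.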


3. Results
The non-parametric cluster algorithm divides our B40 households into five mutually exclusive groups. To interpret the solution, we characterize each group using the input variables and other measures, focusing on the variables that differentiate the clusters. For example, if one group has low levels of human capital (education), this information is relevant to our interpretation only if the group differs substantially, or in a notable pattern, from the other groups.

To guide the interpretation, we use an asset framework of shared prosperity and compare differences across the dimensions of the framework (discussed in the following subsection). We also assign each group an informative label that combines situational and occupational household characteristics. Even at this summary level, it is apparent that these groups would require poverty-alleviating interventions that look different and address distinct constraints. We explore each group in detail to bring further nuance to our understanding. The five groups of B40 households are as follows:
    1. Group 1 consists of ultra-poor rural households that primarily rely on earnings from low-skill
        sharecropping, supplemented by public social safety nets.
    2. Group 2 consists of poor rural households actively involved in agriculture as owner-cultivators.
    3. Group 3 consists of households with low labor force participation that receive remittances from a
        migrant household member.
    4. Group 4 consists of urban households with higher education that own (productive) assets and whose
        members work in semi-skilled wage jobs in the industry and services sectors.
    5. Group 5 consists of poor households whose members work in unskilled daily wage labor in construction
        and other smaller service work.

Figure 4 presents the overall distribution of the groups and their distribution within each of the country's four provinces. What matters here is not the overall number or size of each group but its relative share within each province. G2, characterized by on-farm activities, is the largest group in Punjab's B40 population, which is understandable given the concentration of agriculture in the province. Similarly, G1, the most vulnerable, shock-prone remote households, has its highest share in Sindh, which is consistent with the literature given the province's exposure to shocks. Notably, G5 has a high share in KP and Balochistan. This may appear puzzling at first, but in the absence of agriculture or more advanced forms of urban commerce, G5 appears to absorb the households not captured by the other groups, defined by unskilled labor and elementary work across various areas and sectors. This point is important to keep in mind as we explore the asset framework lens.

                               Figure 4: Province-level distribution of poor groups
[Stacked bars by province (Punjab, Sindh, Khyber Pakhtunkhwa, Balochistan); y-axis: number of households, in thousands. Groups: (1) Remote, rural, shock-prone; (2) Rural, on-farm activities; (3) Remittance receiving, low employment; (4) Urban, service industry oriented; (5) Informal elementary occupations.]

In the following subsections, we explore group characteristics by presenting stylized profiles and differences in input variables. Appendix Table A-2 provides more detail, with summary statistics on all input variables for each group, the overall mean for all B40 households, and national averages.
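One way to produce the group-versus-B40 comparisons behind such profiles is sketched below in Python with pandas. This is an illustration under assumptions, not the authors' code. The column names (`cluster`, `owns_motorcycle`, `weight`) are hypothetical, and we assume HIES sampling weights are applied, which this section does not state. A cluster's gap from the pooled mean flags the variables that differentiate it from the rest of the B40.

```python
import pandas as pd


def weighted_cluster_profile(df, cluster_col, variables, weight_col):
    """Weighted mean of each variable by cluster, the pooled mean, and the gap.

    The pooled mean plays the role of the overall B40 mean; a large absolute gap
    flags a variable that differentiates a cluster from the rest of the B40.
    """
    w = df[weight_col]
    weighted = df[variables].mul(w, axis=0)           # value * weight, row by row
    group_weight = w.groupby(df[cluster_col]).sum()   # total weight per cluster
    cluster_means = weighted.groupby(df[cluster_col]).sum().div(group_weight, axis=0)
    pooled_mean = weighted.sum() / w.sum()
    gaps = cluster_means - pooled_mean                # aligned on column names
    return cluster_means, pooled_mean, gaps


# Hypothetical records: cluster label, one binary indicator, survey weight
df = pd.DataFrame({
    "cluster": [1, 1, 2, 2],
    "owns_motorcycle": [1, 0, 1, 1],
    "weight": [3.0, 1.0, 1.0, 1.0],
})
means, pooled, gaps = weighted_cluster_profile(df, "cluster", ["owns_motorcycle"], "weight")
print(gaps)
```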


3.1 Interpreting cluster profiles through the lens of an asset framework
We use the asset framework of shared prosperity (Lopez-Calva & Rodríguez-Castelán, 2016) as our conceptual framework to focus on the distribution of “assets” and on how its components differentiate the ability of different groups to escape poverty.

This framework recognizes that the income of individuals and households depends on their endowments of various types of assets, such as human capital, physical assets, and financial resources, and on how those assets are used and the returns they generate. From a policy perspective, it provides a holistic approach to addressing the multidimensional aspects of poverty and inequality by empowering the poor and vulnerable through better access to, and more effective use of, their assets.

The asset framework of shared prosperity provides a comprehensive understanding of the factors that contribute to or hinder inclusive growth and poverty reduction. It helps policymakers identify and address the barriers that prevent specific populations from fully participating in and benefiting from the development process. The components considered in this framework include:
(i) endowments, such as human capital, physical capital, financial capital, social capital, and natural capital;
(ii) the intensity with which the assets are used, for instance, in the labor market (as opposed to merely owning and accumulating these assets);
(iii) the returns associated with them;
(iv) the transfers that households receive; and
(v) the extent to which external shocks may hinder the process of asset accumulation.
The prices of the basket of goods and services that the household consumes affect returns and transfers. Together, the endowments of assets, their use, returns, and transfers shape household income-generating opportunities.
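The income-generation logic described above can be summarized in a stylized identity. The notation is ours and is an assumption for illustration, not necessarily the exact formulation in Lopez-Calva and Rodríguez-Castelán (2016). For household $i$, let $A_{ik}$ be the endowment of asset type $k$, $\pi_{ik}$ its intensity of use, $R_{ik}$ its return, $T_i$ the transfers received, and $P$ a price index for the household's consumption basket:

$$
y_i = \frac{1}{P}\left(\sum_{k} A_{ik}\,\pi_{ik}\,R_{ik} + T_i\right)
$$

In this reading, shocks act indirectly by destroying endowments or lowering their use or returns. Because prices and returns are not analyzed in this paper (footnote 6), the group profiles focus on $A_{ik}$, $\pi_{ik}$, $T_i$, and shock exposure.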

We discuss how endowments, their utilization, transfers, and shocks⁶ shape the different sets of constraints on income generation that households face in our five identified clusters.

                                 Figure 5: Asset Framework of Shared Prosperity




6 While the literature recognizes that the poor may face different levels of inflation, unemployment, and other consequences of macroeconomic downturns, the absence of meaningful micro-data leads us to set aside the “Prices” and “Returns” components of the framework and focus instead on broader occupational activity.


3.1.1 Group 1: Remote, rural, and shock-prone households
Figure 7 provides summary statistics on measures of endowments across the groups. Household endowments refer to the resources, assets, and capabilities that individuals or families possess or accumulate over time. Households in the first group (G1) are primarily located in rural areas. They exhibit the lowest levels of education, asset ownership, and dwelling conditions, and they are much more likely than other poor groups to live in dwellings with unimproved floors and walls. Overcrowding (the number of household members per room) also appears to be highest for this group, although this may be driven by differences in room sizes between rural and urban dwellings (Figure 6).

On intensity of use, that is, how households use their endowments for income generation (for example, through labor market participation), Figure 8 presents the group profiles. G1 members are primarily involved in wage employment, with a very small share in self-employment. There is an apparent selection into employment sectors: G1 is predominantly agricultural, consistent with its rural status. Moreover, G1 has the highest share of lower-skill work among all groups (Figure 9), consisting primarily of low-skill sharecroppers.

Household income generated from the use of assets is often complemented by transfers. These may include domestic and international remittances, transfers from other households, or public transfers. Our rural sharecroppers have the lowest share of households receiving any remittances but are the most likely of any group to receive social assistance (Figure 11). Most notably, this assistance appears to take the form of cash transfers through the BISP program.⁷

                             Figure 6: Endowment: Accessibility and Remoteness
[Chart by group: share of households in a rural PSU; accessibility index (higher = worse); PSU average distance to school (km); overcrowding (household size/occupied rooms).]

                                  Figure 7: Endowment of Assets, across groups
[Chart by group; y-axis: share of households.]

External shocks encompass any circumstances faced by individual households and communities that can have pernicious consequences for the income-generating capacity of households (World Bank, 2013). These include, but are not limited to, macroeconomic crises, extreme climate-related events, health-related shocks, and even crime and violence. Because the data source contains no information on shock exposure or household coping mechanisms, we use flood and drought risk measures from external sources and match them spatially to our B40 households at the district level. Although these measures exhibit less than ideal variation, we still find a notable trend supporting our profiling of G1. Not only is this group more prone to droughts and floods, but this risk also maps to higher levels of food insecurity, with nearly 2 in 5 households having experienced moderate food insecurity in the last year (Figure 10).
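The district-level matching described above amounts to a many-to-one join of district hazard measures onto household records. A minimal pandas sketch follows. The column names (`district`, `flood_share`) are hypothetical. The sketch also assumes that district identifiers are already harmonized across the survey and the external hazard sources, which in practice may require a crosswalk for boundary changes. The `validate="many_to_one"` check guards against a duplicated district silently duplicating households.

```python
import pandas as pd


def attach_district_risk(households, district_risk, key="district"):
    """Left-join district-level hazard measures onto household records.

    validate="many_to_one" makes pandas raise a MergeError if any district
    appears more than once in the risk table, which would otherwise duplicate
    households. indicator=True records which households found a match.
    """
    merged = households.merge(
        district_risk, on=key, how="left", validate="many_to_one", indicator=True
    )
    unmatched = sorted(merged.loc[merged["_merge"] == "left_only", key].unique())
    return merged.drop(columns="_merge"), unmatched


# Hypothetical inputs: household records and a district-level hazard table
households = pd.DataFrame({"hh_id": [1, 2, 3, 4], "district": ["A", "A", "B", "C"]})
risk = pd.DataFrame({"district": ["A", "B"], "flood_share": [0.10, 0.30]})
merged, unmatched = attach_district_risk(households, risk)
print(merged)
print("Districts without risk data:", unmatched)
```

Households in unmatched districts keep missing hazard values. It is safer to review them explicitly than to let them drop out of later summaries unnoticed.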

                                            Figure 8: Intensity of Use
[Chart by group: share of members actively working; wage income; self-employed; primary industry: agriculture, manufacturing, construction, services sector; share of members with very low-skill occupation. Legend: G1: Rural, remote, shock-prone, landless; G2: Rural, on-farm activities; G3: Off-farm transition; G4: Urban, service industry dependent; G5: Low-skill, informal wage industrial workers.]

7 Interestingly, we find very few poor households reporting having received Zakat. This implausibly low figure may be confounded by the social stigma attached to being a household that receives Zakat. There does not appear to be strong evidence of non-reporting of the BISP transfer; perhaps respondents felt that not reporting it would threaten their likelihood of receiving a cash transfer in the future.
This group faces several key binding constraints to improving its welfare: (i) low levels of human capital and skills, (ii) poor connectivity of rural communities, (iii) associated insufficient public service delivery, (iv) limited agricultural productivity and livelihood diversification, and (v) a high degree of vulnerability to climate change and shocks.
                                                 Figure 9: Intensity of Use in Agriculture
[Chart by group: primary industry: agriculture; household cultivates agricultural land; share of cultivated land owned; share of cultivated land sharecropped.]


3.1.2 Group 2: Rural households involved in on-farm activities
Group 2 (G2) households are comparable to G1 given their similar rural status. On endowments, however, they score much better than G1 on human capital, asset ownership, and dwelling conditions. On some measures they rank highest among the B40 clusters: for example, half of all G2 households own a motorcycle. About one-third of members completed primary school, although this number drops to about 10 percent for secondary-level education. Around 32 percent of households have mobile phones, and nearly half own a sewing machine. This suggests that, while still poor, these households have been able to invest in some productive assets. Sewing machines can form the basis of subsistence production and repair of clothing, and motorcycles (whose ownership is the highest among all clusters) can serve as a proxy for “accessibility to markets”.

This difference in asset endowments is complemented by a divergence in intensity of use. While both groups primarily practice agriculture, G2 households cultivate their own land. This is corroborated by the fact that 70 percent of households in this cluster have self-employed working members, the highest share of any group, with only a small share in elementary occupations (about 20 percent). These characteristics signal more productive, higher-value employment than in sharecropping households (G1). On transfers, around 15 percent of households in this group receive (mostly internal) remittances, and the same percentage receives some form of public assistance. While not as shock-prone as G1, G2 households rank second in their exposure to floods and droughts, with 10 percent of households living in districts within flood perimeters.

                                                      Figure 10: Exposure to Shocks
[Chart by group (1) to (5): percentage of district population in flood perimeter; rescaled frequency of agricultural drought (1984-2022); experienced moderate food insecurity in the last year.]

                                                         Figure 11: Transfers and Assistance
[Chart by group: received remittances from within or outside the country; received any assistance (Zakat/BISP/Govt); received remittance from within the country; received remittance from outside the country; received assistance: Zakat; received assistance: in-kind; received assistance: BISP; household owes loans/is in debt.]

The constraints this group faces in improving incomes are similar to those of smallholder farmers: (i) low levels of human capital, (ii) insufficient yields from traditional low-value farming systems, (iii) weak linkages to regional markets, and (iv) looming climate change threats to the agri-food system (World Bank, 2024).


3.1.3 Group 3: Remittance-receiving households with low employment
Households in Group 3 (G3) have a notably higher share of female heads and rank around the median among the groups on other measures of endowments (Figure 7). It is difficult to say whether G3 households live in peri-urban or rapidly urbanizing (but previously rural) settlements, as outdated government-defined measures of rural and urban areas prevent a closer examination.

Households in off-farm transition appear to have a much higher share of self-employed workers despite having
the lowest share of members participating in the labor market overall.

On transfers, G3 has a notably higher share of households receiving remittances, most of which are internal (Figure 11). This fact, combined with the employment characteristics of household members, suggests that G3 households (off-farm transition) have a “missing working member” who has perhaps migrated to an urban center and sends remittances to the remaining family members in their (rural or peri-urban) home. This framing is consistent with the higher share of female heads and the lower labor market participation in wage jobs, as the missing worker would not be classified as a household member in the data.


3.1.4 Group 4: Urban households involved in service sector opportunities
Group 4 (G4) is most strongly identified by its urban nature: about half of its households live in urban areas, a much larger share than in the B40 overall. Households in this group score highest on asset ownership and educational attainment and have the highest level of improved infrastructure (Figure 7). Most notably, this level of endowment not only ranks highest among the cluster groups but also exceeds the national average (see Appendix Table A-2), which is surprising for a group that is otherwise part of the B40.

Workers from G4 households appear to be spread across the manufacturing and service sectors, with a low share in elementary work, suggesting relatively higher-value service jobs (such as in hospitality, shopkeeping, and small-scale technical work). Households in this group are the least likely to receive government assistance and the least exposed to shocks.

3.1.5 Group 5: Households with working members in elementary service and industry occupations
In this last cluster group, Group 5 (G5), about 80 percent of households are in rural areas, a higher share than
G3 and G4 but lower than G1 and G2. This group is comparable to the group of sharecroppers in its asset
endowment: they score low on human capital and asset ownership and have poor dwelling conditions.

Households in this group have a high share of workers in elementary (low-skill) wage employment, with a notable majority in the construction sector. This suggests that G5 households may find their livelihood in low-skill daily wage labor.

Their likelihood of receiving government transfers is exceeded only by that of the two rural groups, and about 1 in 3 households report being in debt. Despite being less exposed to natural and climate shocks, about 25 percent of households in this group experience moderate food insecurity, second only to the highly shock-prone sharecropping households (G1).


4. Conclusion
Poverty often manifests in varied forms and stems from different circumstances that stunt the income-generating capacity of households. We improve our understanding of Pakistan's poor by combining a non-parametric cluster analysis, an approach that does not rely on an existing deterministic framework of the determinants of welfare or of types of low-income households, with an asset framework lens that allows us to structure and interpret the resulting group profiles. We classify the bottom 40 percent of the country's households by consumption into five mutually exclusive groups.

The first group consists of ultra-poor rural households that rely primarily on unskilled sharecropping for their earnings, supplemented by public social safety nets. They have the poorest living conditions and are at constant risk of further welfare losses due to shocks. They may also be the households worst affected by the impending consequences of climate change.

A second set of poor rural households are also actively involved in agriculture as owner-cultivators. Low levels
of education constrain their potential growth.

The third group consists of households in a transitory space between agriculture and service provision. These may be the households left behind in the rural-urban migration of earners, with more dependents and, in some cases, female heads. Although a high share of these households receive remittances, the remittances are not sufficient to move them out of poverty.

The fourth group presents a puzzling case: urban households with the highest level of education among the B40, the highest ownership of (productive) assets, and semi-skilled wage jobs in the industry and services sectors. This cluster has all the ingredients necessary to move up the income ladder. Their constraints may stem from broader economic malaise, and they may be the first to move out of the B40 once the economy improves.

The fifth group consists of poor households that find their livelihood in unskilled daily wage labor in construction and other minor service work across the country. In some ways, they are the non-agricultural equivalent of our cluster of sharecroppers: very low human capital and very poor dwelling conditions (possibly in slums, or katchi abadis).

It is essential to recognize that a taxonomy of the poor is part of a larger effort to make more meaningful,
empirically informed policies targeted to address the diverse set of constraints faced by the underprivileged in
the country. While the discussion above is only a first step toward this goal, ongoing work seeks to map the salient constraints of each group and connect these to the global development literature on interventions that chart pathways out of poverty.




                                               References
Barriga-Cabanillas, O., Kishwar, S., Meyer, M., Nasir, M., & Qazi, M. (forthcoming). Poverty Projections for Pakistan: Nowcasting and Forecasting. Poverty and Equity Assessment Background Paper.

Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods, 3(1), 1–27.

Dolnicar, S., Grün, B., & Leisch, F. (2016). Increasing sample size compensates for data problems in segmentation studies. Journal of Business Research, 69(2), 992–999.

Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Hoboken: Wiley.

Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis. John Wiley & Sons, Ltd.

Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871.

Klopotek, M., & Wierzchon, S. (2018). Cluster Analysis. In Studies in Big Data, Volume 34. Cham: Springer.

Lopez-Calva, L. F., & Rodríguez-Castelán, C. (2016). Pro-Growth Equity: A Policy Framework for the Twin Goals. Policy Research Working Paper No. 7897. Washington, DC: World Bank.

Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159–179.

Mooi, E., Sarstedt, M., & Mooi-Reci, I. (2017). Cluster Analysis. In Market Research: The Process, Data, and Methods Using Stata (pp. 313–366). Springer Texts in Business and Economics. Singapore: Springer.

Qiu, W., & Joe, H. (2009). clusterGeneration: Random cluster generation (with specified degree of separation). R package version 1.2.7.

Rahman, A. M., Sani, N. S., Hamdan, R., Othman, Z. A., & Abu Bakar, A. (2021). A clustering approach to identifying multidimensional poverty indicators for the bottom 40 percent group. PLoS ONE, 16(8).

World Bank. (2013). World Development Report 2014: Risk and Opportunity: Managing Risk for Development. Washington, DC: World Bank Group.

World Bank. (2023). Macro-Poverty Outlook. World Bank.




                                                   Appendix
                                    Table A-1: List of Clustering Variables
Thematic Area                             Input Variables
Human Capital                             HH has no members with any formal education
                                          Share adult members of HH with primary education
                                          Share adult members of HH with secondary education
                                          Share adult members of HH with postsecondary education
Livelihoods                               At least one HH member with wage income
                                          At least one HH member who is self-employed
                                          HH received remittances from within or outside the country
                                          HH received any assistance (Zakat/BISP/Govt)
                                          At least one HH member with primary ISIC: agriculture
                                          At least one HH member with primary ISIC: livestock
                                          At least one HH member with primary ISIC10: Manufacturing
                                          At least one HH member with primary ISIC10: Construction
                                          At least one HH member with primary ISIC10: Commerce
                                          At least one HH member with primary ISIC10: Transport / Communications
                                          At least one HH member with primary ISIC10: Other Services, Unspecified
Physical Assets and Dwelling Conditions   HH owns Sewing/Knitting Machine
                                          HH owns Motorcycle/Scooter
                                          HH has internet facility
                                          Share of HH members who own a mobile phone
                                          HH floor is made of earth/sand/dung
                                          HH walls are made of mud/mud bricks/wood/bamboo/stones
                                          HH has improved sanitation
Access and Shocks                         HH is in a rural PSU
                                          Overall Accessibility Index (District)
                                          Percent of Population in Flood Extent Perimeter (2022) (District)
                                          Rescaled Frequency of Severe Agri Drought (1984-2022) (District)
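The inputs in Table A-1 mix binary indicators, within-household shares, and district-level indices. The appendix does not state which linkage the hierarchical cluster analysis used, so the following is only a minimal sketch assuming Ward's minimum-variance linkage on inputs that are already on comparable scales. The four-column toy rows are hypothetical households built from a subset of the indicators above, and the function names are illustrative. The naive pairwise search is for intuition only; at the scale of the B40 sample a library routine such as `scipy.cluster.hierarchy.linkage(X, method="ward")` would be the practical choice.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_cluster(data, n_clusters):
    """Naive agglomerative clustering with Ward's minimum-variance criterion.

    Every row starts as its own cluster; at each step the pair whose merger
    adds least to the within-cluster sum of squares is merged:
        cost(A, B) = |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2
    Returns one integer label (0 .. n_clusters - 1) per row.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pa = [data[i] for i in clusters[a]]
            pb = [data[i] for i in clusters[b]]
            weight = len(pa) * len(pb) / (len(pa) + len(pb))
            cost = weight * sq_dist(centroid(pa), centroid(pb))
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # combinations() guarantees b > a, so index a is unchanged
    labels = [0] * len(data)
    # Number clusters by their lowest row index so labels are reproducible.
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical households: [no formal education, wage income, agriculture, rural PSU]
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print(ward_cluster(toy, 3))  # [0, 0, 1, 1, 2, 2]
```

Identical rows merge at zero cost, so each toy pair forms a cluster before any dissimilar households are combined.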




Quantitative Stopping Rules

Calinski and Harabasz's (1974) variance ratio criterion (VRC) for a solution with $n$ objects and $k$ clusters is defined as

$$\mathrm{VRC}_k = \frac{SS_B/(k-1)}{SS_W/(n-k)}$$

where $SS_B$ is the between-cluster sum of squares and $SS_W$ is the within-cluster sum of squares. The change in successive VRC increments, $\omega_k$, is computed as

$$\omega_k = \left(\mathrm{VRC}_{k+1} - \mathrm{VRC}_k\right) - \left(\mathrm{VRC}_k - \mathrm{VRC}_{k-1}\right)$$

and the number of clusters $k$ that minimizes $\omega_k$ is preferred.
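A minimal sketch of the two formulas above in plain Python, assuming unweighted Euclidean sums of squares on inputs already placed on comparable scales (the appendix does not say whether survey weights or rescaling enter the computation). The helper names are hypothetical. `best_k` applies the rule of choosing the $k$ with the smallest $\omega_k$; because $\mathrm{VRC}_1$ would divide by $k - 1 = 0$, $\omega_k$ can only be evaluated from $k = 3$ onward, and only where VRC values for $k - 1$ and $k + 1$ are available.

```python
def sum_squares(data, labels):
    """Between-cluster (SS_B) and within-cluster (SS_W) sums of squares."""
    n, d = len(data), len(data[0])
    grand = [sum(row[j] for row in data) / n for j in range(d)]
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    ss_b = ss_w = 0.0
    for rows in groups.values():
        m = len(rows)
        c = [sum(r[j] for r in rows) / m for j in range(d)]
        ss_b += m * sum((c[j] - grand[j]) ** 2 for j in range(d))
        ss_w += sum((r[j] - c[j]) ** 2 for r in rows for j in range(d))
    return ss_b, ss_w


def vrc(data, labels):
    """Calinski-Harabasz VRC_k = (SS_B / (k - 1)) / (SS_W / (n - k))."""
    n, k = len(data), len(set(labels))
    ss_b, ss_w = sum_squares(data, labels)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


def omega(vrc_by_k, k):
    """omega_k = (VRC_{k+1} - VRC_k) - (VRC_k - VRC_{k-1})."""
    return (vrc_by_k[k + 1] - vrc_by_k[k]) - (vrc_by_k[k] - vrc_by_k[k - 1])


def best_k(vrc_by_k):
    """Return the k with the smallest omega_k among k whose neighbours exist."""
    candidates = [k for k in vrc_by_k if k - 1 in vrc_by_k and k + 1 in vrc_by_k]
    return min(candidates, key=lambda k: omega(vrc_by_k, k))


data = [[0.0], [1.0], [10.0], [11.0]]
print(vrc(data, [0, 0, 1, 1]))  # 200.0
# Hypothetical VRC values for k = 2..5
print(best_k({2: 10.0, 3: 14.0, 4: 15.0, 5: 15.5}))  # 3
```

For the two tight, well-separated pairs, $SS_B = 100$ and $SS_W = 1$, so $\mathrm{VRC}_2 = (100/1)/(1/2) = 200$.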

The Duda-Hart index performs essentially the same calculation as the VRC, but compares the $SS_W$ of a cluster before and after it is split into two. More precisely, the index is the $SS_W$ of the two clusters after the split, $J_e(2)$, divided by the $SS_W$ of the single cluster before the split, $J_e(1)$. The associated pseudo-T-squared statistic additionally takes the number of observations in both groups into account.
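The same quantities can be computed for a single candidate split. This sketch assumes the conventional Duda-Hart pseudo-T-squared, (Je(1) - Je(2)) / (Je(2) / (n1 + n2 - 2)); the appendix does not spell the formula out, so treat it as an assumption, and the function names are hypothetical. Under the convention used, for example, by Stata's `cluster stop`, large Je(2)/Je(1) values and small pseudo-T-squared values indicate more distinct clustering, that is, a sign that the next split is not worth making.

```python
def sse(rows):
    """Sum of squared deviations of rows from their centroid."""
    d = len(rows[0])
    c = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return sum((r[j] - c[j]) ** 2 for r in rows for j in range(d))


def duda_hart(group1, group2):
    """Duda-Hart statistics for splitting one cluster into group1 and group2.

    Je(1): SS_W of the parent cluster (both groups pooled).
    Je(2): SS_W of the two child clusters after the split.
    Returns (Je(2) / Je(1), pseudo-T-squared).
    """
    je1 = sse(group1 + group2)
    je2 = sse(group1) + sse(group2)
    n1, n2 = len(group1), len(group2)
    # Scale the reduction in SS_W by the pooled within-child variance.
    pseudo_t2 = (je1 - je2) / (je2 / (n1 + n2 - 2))
    return je2 / je1, pseudo_t2


ratio, t2 = duda_hart([[0.0], [1.0]], [[10.0], [11.0]])
print(round(ratio, 4), t2)  # 0.0099 200.0
```

Splitting {0, 1, 10, 11} into its two natural pairs cuts SS_W from Je(1) = 101 to Je(2) = 1. The tiny ratio and large pseudo-T-squared signal that this split is worth making rather than keeping the points as one cluster.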




                                 Table A-2: Descriptive Statistics of B40 Groups (Cluster Variables)
Group 1: Remote rural, shock-prone landless; Group 2: Rural, on-farm activities;
Group 3: Remittance-receiving, low employment; Group 4: Urban, service-industry oriented;
Group 5: Informal, elementary occupations.

Variable                                               National    B40    Group 1    Group 2      Group 3         Group 4    Group 5
                                                       average

Owns Sewing/Knitting Machine                             0.594      0.461   0.150      0.495        0.527           0.652      0.408

Owns Motorcycle/Scooter                                  0.540      0.391   0.236      0.521        0.376           0.484      0.289

Has internet facility                                    0.341      0.149   0.053      0.126        0.159           0.281      0.110

Share of members who own a mobile phone                  0.491      0.354   0.267      0.316        0.375           0.420      0.371

Has no members with any formal education                 0.196      0.331   0.756      0.198        0.366           0.051      0.434

Share adult members with primary education               0.516      0.313   0.083      0.313        0.322           0.543      0.248

Share adult members with secondary education             0.278      0.108   0.023      0.101        0.106           0.209      0.079

Share adult members with postsecondary education         0.150      0.038   0.010      0.036        0.030           0.080      0.027

Received remittances from within or outside country      0.182      0.137   0.053      0.173        0.375           0.099      0.064

Received any assistance (Zakat/BISP/Govt)                0.086      0.159   0.255      0.165        0.130           0.111      0.156

At least one member with wage income                     0.668      0.763   0.994      0.589        0.036           0.967      0.988

At least one member who is self-employed                 0.369      0.348   0.112      0.704        0.665           0.265      0.068

At least one member with primary ISIC: agriculture       0.266      0.355   0.972      0.830        0.019           0.033      0.038

At least one member with primary ISIC: livestock         0.258      0.314   0.509      0.861        0.052           0.035      0.075

At least one member with primary ISIC: manufacturing     0.183      0.183   0.057      0.123        0.089           0.383      0.189

At least one member with primary ISIC: construction      0.150      0.239   0.138      0.131        0.009           0.160      0.555

At least one member with primary ISIC: commerce          0.213      0.189   0.029      0.102        0.373           0.315      0.160

At least one member with primary ISIC: transport/comms   0.099      0.122   0.042      0.057        0.156           0.226      0.122

At least one member with primary ISIC: Other Services,   0.177      0.135   0.053      0.074        0.058           0.340      0.110
  Unspecified

HH floor is unimproved                                   0.349      0.608   0.897      0.673        0.537           0.192      0.760

HH walls are unimproved                                  0.182      0.360   0.670      0.344        0.274           0.045      0.501

HH has improved sanitation                               0.747      0.665   0.487      0.636        0.731           0.791      0.652

HH is in a rural PSU                                     0.620      0.795   0.944      0.962        0.736           0.510      0.820

Overall accessibility index: equally weighted            0.363      0.401   0.469      0.459        0.363           0.335      0.384

Percentage of District Population in Flood Perimeter     0.083      0.102   0.193      0.100        0.068           0.080      0.093

Rescaled Freq of Severe Agri Drought (1984-2022)         0.020      0.033   0.073      0.036        0.024           0.014      0.029

Observations                                             25,809     8,789   1,323      2,058        1,100           1,689      2,619
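The entries in Tables A-2 and A-3 appear to be group means: for a binary indicator, the share of households in that column with the attribute (for example, 0.972 means 97.2 percent of Group 1 households have at least one member whose primary ISIC is agriculture). A minimal sketch of how such a profile can be tabulated from household records follows. The field names are hypothetical, and the optional weight argument is included only because survey tabulations commonly apply sampling weights; the appendix does not say whether these tables are weighted.

```python
def cluster_profile(records, group_key, variables, weight_key=None):
    """Mean of each variable within each group, optionally weighted.

    records: one dict per household (field names here are hypothetical).
    weight_key: field holding a sampling weight; None weights households equally.
    Returns ({group: {variable: mean}}, {group: unweighted observation count}).
    """
    totals, weight_sums, counts = {}, {}, {}
    for rec in records:
        g = rec[group_key]
        w = rec[weight_key] if weight_key else 1.0
        counts[g] = counts.get(g, 0) + 1
        weight_sums[g] = weight_sums.get(g, 0.0) + w
        acc = totals.setdefault(g, dict.fromkeys(variables, 0.0))
        for v in variables:
            acc[v] += w * rec[v]
    means = {g: {v: totals[g][v] / weight_sums[g] for v in variables} for g in totals}
    return means, counts


records = [
    {"group": 1, "agri": 1, "weight": 2.0},
    {"group": 1, "agri": 0, "weight": 1.0},
    {"group": 2, "agri": 0, "weight": 1.0},
]
print(cluster_profile(records, "group", ["agri"])[0])  # {1: {'agri': 0.5}, 2: {'agri': 0.0}}

# Observation counts reported for Groups 1-5 in Table A-2 sum to the B40 sample.
assert sum([1323, 2058, 1100, 1689, 2619]) == 8789
```

The final check confirms that the five groups partition all 8,789 B40 households in the sample.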




                                          Table A-3: Additional Profiling Statistics for Cluster Groups


Group 1: Remote rural, shock-prone landless; Group 2: Rural, on-farm activities;
Group 3: Remittance-receiving, low employment; Group 4: Urban, service-industry oriented;
Group 5: Informal, elementary occupations.

Variable                                           National      B40    Group 1         Group 2         Group 3         Group 4    Group 5
                                                   average

Female Head                                          0.101       0.067   0.032           0.052           0.243           0.048      0.028

Dependency Ratio (Rescaled)                          0.117       0.151   0.158           0.143           0.216           0.119      0.148

Owns Agricultural Land                               0.077       0.062   0.027           0.089           0.084           0.048      0.056

Owns non-Agricultural Land                           0.025       0.027   0.008           0.022           0.050           0.023      0.031

Share of members actively working                    0.499       0.541   0.735           0.638           0.316           0.491      0.505

Primary Industry: Services Sector                    0.369       0.322   0.079           0.167           0.523           0.558      0.302

Share of members with very low skill occupation      0.208       0.343   0.596           0.238           0.050           0.244      0.523

Household cultivates Agri land                       0.218       0.236   0.359           0.644           0.048           0.036      0.057

Share of cultivated land owned                       0.684       0.593   0.196           0.707           0.000           0.000      0.000

Share of cultivated land share cropped               0.181       0.301   0.706           0.186           0.000           0.000      0.000

Received remittance from within country              0.115       0.101   0.045           0.117           0.305           0.061      0.050

Received remittance from outside country             0.072       0.040   0.009           0.062           0.083           0.041      0.014

Received assistance: Zakat                           0.007       0.009   0.007           0.003           0.010           0.011      0.013

Received assistance: In-kind                         0.043       0.023   0.012           0.027           0.019           0.027      0.025

Received assistance: BISP                            0.077       0.150   0.248           0.162           0.121           0.101      0.143

HH owes loans/is in debt                             0.213       0.279   0.236           0.278           0.343           0.247      0.295

Experienced moderate food insecurity in last year    0.138       0.250   0.385           0.231           0.248           0.165      0.265

PSU Avg Distance to School (km)                      1.827       1.769   2.024           1.915           1.638           1.587      1.716

Overcrowding: Household Size/Occupied Rooms          3.239       4.467   5.068           4.371           4.224           4.290      4.502

Observations                                         25,809      8,789   1,323           2,058           1,100           1,689      2,619




                                          Table A-4: Summary Statistics of K-Means Clustering K=5


                                  Variable                                 B40     K-Cluster 1   K-Cluster 2   K-Cluster 3   K-Cluster 4   K-Cluster 5

HH owns Sewing/Knitting Machine                                            0.461      0.270         0.501        0.500         0.708         0.287

HH owns Motorcycle/Scooter                                                 0.391      0.334         0.501        0.376         0.511         0.243

HH has internet facility                                                   0.149      0.028         0.097        0.184         0.348         0.059

Share of HH members who own a mobile phone                                 0.354      0.286         0.325        0.369         0.442         0.336

HH has no members with any formal education                                0.331      1.000         0.000        0.000         0.000         1.000

Share adult members of HH with primary education                           0.313      0.000         0.391        0.384         0.724         0.000

Share adult members of HH with secondary education                         0.108      0.000         0.072        0.056         0.471         0.000

Share adult members of HH with postsecondary education                     0.038      0.000         0.016        0.016         0.187         0.000

HH received remittances from within or outside country                     0.137      0.148         0.122        0.154         0.134         0.115

HH received any assistance (Zakat/BISP/Govt)                               0.159      0.112         0.129        0.196         0.106         0.209

At least one HH member with wage income                                    0.763      0.705         0.740        0.786         0.788         0.771

At least one HH member who is self-employed                                0.348      0.342         0.448        0.344         0.372         0.239

At least one HH member with primary ISIC: agriculture                      0.355      0.444         0.426        0.312         0.215         0.412

At least one HH member with primary ISIC: livestock                        0.314      0.447         0.445        0.261         0.190         0.278

At least one HH member with primary ISIC10: Manufacturing                  0.183      0.107         0.175        0.212         0.246         0.144

At least one HH member with primary ISIC10: Construction                   0.239      0.214         0.238        0.269         0.198         0.244

At least one HH member with primary ISIC10: Commerce                       0.189      0.107         0.202        0.222         0.262         0.117

At least one HH member with primary ISIC10: Transport and Communications   0.122      0.073         0.107        0.150         0.142         0.108

At least one HH member with primary ISIC10: Other Services, Unspecified    0.135      0.078         0.130        0.140         0.252         0.076

HH floor is made of earth/sand/dung                                        0.608      0.775         0.577        0.581         0.348         0.782

HH walls are made of mud/mud bricks/wood/bamboo/stones                     0.360      0.446         0.296        0.333         0.189         0.553

HH has improved sanitation                                                 0.665      0.423         0.590        0.768         0.804         0.634

HH is in a rural PSU                                                       0.795      0.910         0.848        0.758         0.679         0.818

Overall accessibility index: equally weighted educ, health, mkt            0.401      0.630         0.633        0.265         0.329         0.285

Percentage of District Population in Flood Extent Perimeter                0.102      0.042         0.039        0.128         0.097         0.174

Rescaled Freq of Severe Agri Drought (1984-2022)                           0.033      0.067         0.068        0.013         0.015         0.020

Observations                                                               8,789      1,324         1,478         2,842         1,358         1,787
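K-means assigns each household to the nearest of K centroids and re-estimates the centroids until assignments stop changing. Table A-4 shows that the resulting K = 5 partition separates B40 households exactly on the no-formal-education indicator (1.000 in K-Clusters 1 and 5, 0.000 in K-Clusters 2 to 4). The sketch below is a plain Lloyd's algorithm for illustration only; initializing from the first K rows is an assumption made for reproducibility, and the paper's initialization, seeds, and input scaling are not stated here.

```python
def kmeans(data, k, max_iter=100):
    """Lloyd's algorithm for k-means.

    Initial centroids are simply the first k rows (an assumption chosen for
    reproducibility, not necessarily the paper's initialization).
    Returns (labels, centroids).
    """
    centroids = [list(row) for row in data[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


data = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
labels, centers = kmeans(data, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Unlike the hierarchical approach, k-means fixes K in advance, which is why it serves here as a comparison for the five groups rather than as a way to choose their number.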



