Towards a Taxonomy of the Poor in Pakistan
Christina Wieser
Ibrahim Khan1
This version: July 23, 2024

Abstract
Poor households are heterogeneous in the circumstances preventing an improvement in their welfare. It is important to understand the nuances within different types of poor households so that critical pathways out of poverty that remedy the variegated sets of constraints they face can be identified and acted on through policy action. This paper attempts to categorize the bottom 40th consumption percentile of households (B40) in Pakistan into different non-overlapping groups using a non-parametric hierarchical cluster analysis, which allows for an empirically driven taxonomy of the poor in the country. Using data from the Household Integrated Economic Survey (HIES) 2018-19, we identify five groups among the B40 and explore their salient household and occupational attributes through the lens of an asset framework of shared prosperity.

Keywords: poverty, cluster analysis, Pakistan
JEL Classification: I32, C3

1 Christina Wieser (cwieser@worldbank.org) is a Senior Economist in the Poverty and Equity Global Practice at the World Bank. Ibrahim Khan (mkhan83@worldbank.org) is a Consultant in the Poverty and Equity Global Practice at the World Bank. The authors would like to thank Moritz Meyer, Maria Qazi, Oscar Eduardo Barriga Cabanillas, and Jon Jellema for their excellent feedback. We declare that we have no relevant or material financial interests that relate to the research described in this paper. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank Group or any affiliated organizations, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work.
Disclaimer: This work is a product of the staff of the World Bank with external contributions. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of the Executive Directors of the World Bank or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work and does not assume responsibility for any errors, omissions, or discrepancies in the information, or liability with respect to the use of or failure to use the information, methods, processes, or conclusions set forth. The boundaries, colors, denominations, links/footnotes, and other information shown in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries. The citation of works authored by others does not mean the World Bank endorses the views expressed by those authors or the content of their works.

1. Introduction

Pakistan saw an extended period of rising household welfare between 2001 and 2018, later interrupted by a series of crises. From 2001 to 2018, household real consumption grew by 60 percent, or an annual average of 3 percent, leading to a sustained decline in poverty from 64.3 percent in 2001 to 21.9 percent in 2018 (World Bank, 2023). This decline was achieved primarily through the expansion of off-farm economic opportunities and increased out-migration. During the period of improved welfare, inequality levels remained relatively stable, with a Gini index of 28.4 in 2018 (World Bank, 2023).

While poverty declined across Pakistan, three challenges persist. First, disparities in welfare levels across provinces, districts, and the rural-urban divide continue. Urban areas have seen a faster pace of poverty reduction due to expanding employment, especially in services and construction, which provide employment for lower-income workers.
Poverty has increasingly become concentrated among rural households working in agriculture and among those with low human capital endowments. For instance, in 2018, rural areas (where two-thirds of the population but four out of five of the poor live) had a poverty rate (28.2 percent) more than twice as high as urban areas (10.9 percent). Second, female and youth labor force participation remain low. Those (few) women who work have found mostly low-skill, low-paying jobs. As much as 37 percent of Pakistan’s youth (aged 15 to 24) are not in employment, education, or training (NEET), and even for young Pakistanis who work, labor market prospects are grim. Third, recent years have presented Pakistani households with greater difficulties due to multiple back-to-back crises, such as macroeconomic, political, and climate-induced shocks. In particular, the COVID-19 pandemic in 2020 marked an end to Pakistan’s track record of consistent year-on-year poverty reduction, with a projected increase in poverty of 3 percentage points. Furthermore, a precarious macroeconomic situation, with dangerously low reserves, an exchange rate in free fall, and an ambitious IMF-enforced fiscal consolidation plan, was compounded by an unprecedented calamity in the form of torrential rains and a combination of riverine, urban, and flash flooding in the summer of 2022. The resulting surge in inflation put further pressure on poor Pakistanis. Poverty was projected at 25.3 percent in 2023 (Barriga-Cabanillas et al., forthcoming).

Reducing poverty requires addressing the specific challenges faced by the poor in Pakistan. The poor have low educational attainment and are usually engaged in low-productivity work. Education contributes to individuals’ productivity, earning potential, and overall wellbeing. Still, Pakistan has enormous gaps in educational attainment, with poorer individuals left uneducated and a widening gap over the past 20 years between rich and poor.
For example, in 2018, only 46 percent of the youth in the bottom quintile could read and write, increasing only modestly from 38 percent in 2001. Along with such low educational attainment, poor Pakistanis work in lower-paying sectors such as agriculture, construction, or low-quality services. Additionally, poverty, geography, and livelihoods are interconnected, resulting in substantially higher poverty rates in rural areas and the lagging provinces of Balochistan and Sindh.

However, the poor are not a homogeneous group. They have different characteristics and face different constraints in escaping poverty, and their opportunities are shaped by the economic environment, where they live, and household-specific factors. This, in turn, means that we need to better understand what different groups exist within the umbrella of the “poor” to allow for differentiated policy interventions. We, therefore, first specify different groups of the poor based on their characteristics to identify some binding constraints each specific group faces. Note that this paper does not examine determinants of poverty; instead, the focus is ex-post: to determine, for the population of poor in the country, how to group them such that their characteristics are interpretable and subsequently may allow for policy action targeting each group’s specific set of constraints.

For this, we leverage cluster analysis, a statistical technique for grouping a population into a set of subgroups called clusters based on the observed characteristics of households. The different groups derived from this exercise are then given policy meaning through careful interpretation. Cluster analysis is a non-parametric approach to segmenting a population (in our case, the bottom 40 percent of the consumption distribution) based on the similarity and dissimilarity of its members.
Clustering is broadly used across different applications of unsupervised (unlabeled) problems, and the development literature has used it to target and measure poverty characteristics across different contexts (Rahman, et al., 2021). Clusters are identified by partitioning households into groups to maximize their similarity within each group while maximizing the dissimilarity between groups. The analysis involves three steps: (i) selection of clustering variables based on previous related works, theory (potential relationship with poverty), and data availability; (ii) selection of the clustering procedure, measures of dissimilarity, and the number of clusters; and (iii) validation and interpretation of the results by defining and labeling the obtained clusters.

The clustering procedure is a hierarchical method, which does not require the number of clusters among the poor to be defined in advance. Using a Ward's linkage clustering algorithm, we form clusters by combining the observations or smaller clusters whose merger minimizes the increase in the overall within-cluster variance. Once the clusters or subgroups of the poor are determined by the algorithm, they are profiled to label and interpret the results of the clustering exercise.

Results of the cluster analysis show that poverty manifests in varied circumstances, and we identify five meaningful clusters among the bottom 40 percent of the consumption distribution. The first group consists of ultra-poor households in rural areas that primarily rely on unskilled sharecropping as a means of earning, supplemented by public social safety nets. The second group consists of poor rural households that are also actively involved in agriculture, as owner-cultivators. The third key group of the bottom 40 consists of households in transition between agriculture and service provision. The fourth group consists of urban households that, among the B40, have higher levels of education and higher ownership of (productive) assets, and whose members are in semi-skilled wage jobs in the industry and services sectors.
Lastly, the fifth group consists of poor households, spread across the country, that find their livelihood in unskilled daily wage labor in construction and other smaller service work; in some ways, they are the non-agricultural equivalent of our cluster of sharecroppers. This taxonomy of the poor could be helpful in the larger effort to make more meaningful, empirically informed policies targeted to address the diverse set of constraints faced by the poor in Pakistan.

The paper is structured as follows. Section 2 describes the data sources and provides a step-by-step explanation of the methodology applied, Section 3 explores and discusses the results through an asset framework lens, and Section 4 concludes.

2. Data and Methodology

Cluster analysis is a statistical technique for grouping a population into a set of meaningful subgroups, called clusters, based on observed characteristics. Clusters are identified by partitioning units into groups to maximize the similarity within each group while maximizing the dissimilarity between groups. Machine learning methods of this kind are commonly used for pattern analysis in industry applications such as market segmentation and, increasingly, in development research. Cluster analysis is distinguished from other machine learning and econometric methods in that there is no assumption of a statistical model or structure (Brian, Landau, Leese, & Stahl, 2011). Consequently, pre-processing steps must be undertaken to prepare the data for analysis. Additionally, robustness checks must be conducted to test the stability of the clusters identified. Figure 1 outlines the decision processes involved in the analysis (Mooi & Marko Sarstedt, 2017). Care must be taken at each node to ensure that each methodological decision is congruent with the analysis objective and informed by the structure of the underlying data.
Figure 1: Steps in a cluster analysis

2.1 Data

The Pakistan Bureau of Statistics regularly conducts the Household Integrated Economic Survey (HIES). It collects information on demographic characteristics and indicators of health, education, dwelling conditions, economic activity, and household welfare. The latest survey round (2018-19) covers 24,809 households across 1,802 primary sampling units (PSUs) in rural and urban areas of the four major provinces and is representative at the provincial level. For the first time, this round also includes the Federally Administered Tribal Areas (FATA) and Frontier Regions (FR) as part of Khyber Pakhtunkhwa.

The HIES is designed to measure national and provincial poverty rates as it includes an extensive consumption expenditure module that allows for calculating adult equivalent household consumption expenditure. Note that our objective requires the identification of monetary poor households as a precursor to the analysis. Further, given the extensive set of welfare attributes captured in this survey, the HIES data is an ideal setting to develop a taxonomy of the poor.

Many important aspects of poverty may not be sufficiently captured by the household survey, such as the risk of a natural disaster or remoteness. To address the constraints of such missing information, we source geospatial data from external sources, including the World Bank’s Rural Accessibility Index, Climate Country Disaster Statistics (CCDR), and the national Agroecological zones defined by the Pakistan Agriculture Research Council. However, since we do not have geo-coordinates of the households in the HIES sample, we must incorporate the external data at the more aggregate administrative district level. While some granularity of information is lost in this, we can still use this information to proxy for broader geographical characteristics.
2.2 The unit of analysis

To define clusters, we need to decide on the unit of analysis most appropriate for our context. Determining the data structure from which the clustering will be derived is a nontrivial consideration. For example, we might restrict our data to just those below the poverty line or to households pre-screened on specific characteristics. Further, we could run the algorithm nationally or separately for the provinces. Lastly, we could choose our unit of interest to be the individual, the household, or even a larger aggregation (PSU). Each of these considerations is ultimately normative and must be weighed against the policy objective of the analysis, which is to characterize various target groups for poverty-response interventions.

Considering this, the unit of our analysis is households in the bottom 40 percent of the national consumption distribution. Several reasons support this approach:

1. Households form a common level at which targeted policy interventions are carried out relative to individuals; for example, the country's unconditional and conditional cash transfers target eligible households. While area-specific interventions are used in circumstances such as disaster response, our goal is to provide a more general framing of different groups experiencing poverty.
2. Similarly, instead of repeating the analysis for different provinces, conducting it at the national level allows for a larger sample to draw inferences from without losing granularity, given that we include some geographical characteristics as clustering attributes.
3. Extending the focus from those below the national poverty line (comprising the bottom 22 percent of the population) to the “Bottom 40 (B40)” recognizes that poverty is dynamic.
Given that the last update of the HIES data occurred five years ago and poverty is assumed to have increased over the last few years based on a poverty projection model, expanding the sample allows our model to be more inclusive of households that may have moved across old thresholds.

Furthermore, the B40 is not substantially different from, or better off than, the country's poor. Table 1 compares the overall set of households in the country to the B40 and the poor on welfare measures. Note that the B40 still score low on measures of human capital, accessibility, and consumption, with a monthly average per adult equivalent consumption of PKR 3,634 (around 26 USD, in 2018 terms).

Table 1: Comparing the welfare of sub-populations.

Variable                                                  Pakistan   B40      Poor
Per Adult Household Expenditure (PKR)                     6,873      3,634    3,130
Household Size                                            6.24       7.76     8.25
Dependency Ratio                                          0.99       1.34     1.46
Share of adult members with primary education             52%        31%      26%
Share of adult members with secondary education           28%        11%      8%
Share of working members in very low-skill occupation     21%        34%      39%
Received any Assistance (Zakat/BISP/Govt)                 8.6%       15.9%    18.4%
Rural PSU                                                 62.0%      79.5%    82.5%
Accessibility Index (District-Level)                      0.37       0.40     0.41
Experienced moderate food insecurity in the last year     13.8%      25.0%    30.8%
Improved sanitation                                       74.7%      66.5%    63.3%
Observations                                              24,809     8,789    4,651

2.3 Selecting and pre-processing variables

The choice of the attributes on which the algorithm is run reflects technical considerations and data availability, and is grounded in the literature and in a theoretical framework based on the attributes' relevance to poverty analysis. This grounding prevents the exercise from drifting into unstructured data mining. To this end, we explore and identify variables that encompass the following aspects of household welfare:

1. Ownership of productive assets
2. Human capital
3. Livelihoods and nature of economic activities
4. Dwelling conditions and access to services
5. Exposure to climatic shocks

Note that the choice of themes and variables is not related to “determinants” of poverty but rather to the attributes that distinguish different types of poor households. Once we have a universe of characteristics for which data is available within these themes, several steps are undertaken to reach a final set. To this end, we follow the guidelines set out in Klopotek & Wierzchon (2018) in the following manner:

1. The variables must sufficiently differentiate between potential clusters. Within the sample of B40 households, the selected variables should have enough variation. For example, while we could include car ownership as a clustering variable, the percentage of the B40 that own a car is extremely low. Because households are nearly homogeneous on this attribute, it could not help us identify different groups.
2. The relation between the sample size and the number of clustering variables must be reasonable. While there is no generally accepted guideline on the minimum sample size required for a particular number of variables (or vice versa), the literature broadly suggests that the required number of observations increases over-proportionally with every additional clustering variable. What this means in practice is that our sample should be at least 10 (Qiu & Joe, 2009) to 30 (Dolnicar, Grün, & Leisch, 2016) times the number of clustering variables, with continued improvements up to 100. In our context, our B40 sample size is 8,789, which suggests that we can safely choose 80-90 variables for our analysis (8,789 / 100 ≈ 88). Of course, this is likely excessive in our context, as we also must be careful not to overfit the data.
3. The chosen variables must not be highly correlated. This is related to our earlier point; beyond a certain point, adding variables will only add to the possibility of two or more measures exhibiting a high degree of pairwise correlation.
We examine the pairwise correlation of our preliminary set of variables defined by steps 1 and 2; for any pair of attributes showing a correlation above 0.75,2 we either exclude one or create a merged version.

Following the above steps, we finalize 26 clustering variables for our analysis. Their descriptions are listed in Appendix Table A-1.

2.4 Choosing the clustering procedure

Different procedures are designed to optimize specific criteria, such as maximizing the distance between measures across potential clusters. Practically, a key distinction should be made between hierarchical and partitioning methods. Hierarchical methods are characterized by a “tree-like” structure, with units combining to form clusters and clusters merging based on similarity. In the agglomerative approach, the most common hierarchical method, the algorithm begins with N clusters (that is, every unit is a cluster) and merges until we are left with one cluster, the entire sample itself. The divisive clustering approach does the same but in the other direction, from top to bottom.

Partitioning methods rely on an entirely different framework. The most common of these, the k-means procedure, tries to minimize within-cluster variation to form homogeneous clusters. In this approach, cluster affiliations can change; consequently, it does not build a hierarchy as in the previous method. While k-means can be superior to hierarchical methods as it is less affected by outliers, it requires us to pre-specify the number of clusters to be formed.

In our context, we do not know the number of clusters a priori. Therefore, we employ an (agglomerative) hierarchical clustering procedure for our objective. This requires an additional step: specifying the linkage algorithm that defines the distance from a newly formed cluster to another cluster or unit.
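As a rough illustration of the agglomerative procedure described above, the sketch below starts with every unit as its own cluster and repeatedly merges the pair whose merger increases the within-cluster sum of squares the least (the variance-minimizing rule mentioned in the Introduction). It is a toy example under stated assumptions, not the implementation used in the paper: the function names and the seven points are hypothetical, distances are Euclidean on two continuous variables, and the naive search over all pairs is only practical for very small samples.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def ward_cost(a, b):
    """Increase in the total within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def agglomerate(points, n_clusters):
    """Start with one cluster per point; merge the cheapest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose merger is cheapest under the Ward criterion.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Seven hypothetical objects measured on two clustering variables.
toy_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 20)]
toy_groups = agglomerate(toy_points, n_clusters=3)
```

Stopping the loop at different values of `n_clusters` corresponds to cutting the same merge hierarchy at different heights, which is why the number of clusters does not have to be fixed before the procedure is run.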
Some of the more popular algorithms include single linkage, wherein the distance between two clusters is the shortest distance between any two of their members; centroid linkage, which computes the distance between the geometric centers of the two clusters; and average linkage, which defines distance as the average distance between all pairs of units across the two clusters.

Our approach will utilize the Ward's linkage algorithm. Instead of combining the two closest or most similar objects, this procedure merges the objects whose combination minimizes the increase in within-cluster variance. Figure 2 diagrammatically represents this algorithm in a set with two clustering variables and seven objects. Prior research suggests that this approach performs very well, especially with more clustering variables, and tends to yield clusters of a similar size and tightness (Mooi & Marko Sarstedt, 2017). While highly correlated variables and outliers tend to strongly influence the results of this algorithm, our data pre-processing steps discussed earlier can assuage these concerns. Moreover, we will conduct robustness checks with alternate algorithms to test the stability of the cluster solution.

2 While the literature recommends 0.9, none of our listed variables have a correlation above 0.80, so the threshold was adjusted down accordingly.

Figure 2: Example of a Ward's Linkage

2.5 Considering measures of similarity or dissimilarity

All the clustering procedures discussed in the previous section rely on measures that express the similarity or dissimilarity between pairs of objects/units. This can be something as straightforward as the simple or squared Euclidean distance. However, we must take into consideration the form of the selected clustering variables. Euclidean distance would require all inputs to be on a comparable (continuous) scale. Similarly, matching coefficients such as the Simple Matching, Russell-Rao, and Jaccard coefficients are utilized for cases where all inputs are binary variables.
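To make the binary matching coefficients named above concrete, the following sketch computes each of them as a dissimilarity (one minus the similarity coefficient) for two households described by five yes/no attributes. The function names and the attribute vectors are hypothetical and serve only as an illustration of the standard definitions.

```python
def simple_matching_distance(a, b):
    """1 - SM: share of binary attributes on which the two units disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a, b):
    """1 - JC: like simple matching, but joint absences (0, 0) are ignored."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1 - both / either


def russell_rao_distance(a, b):
    """1 - RR: only joint presences count as agreement, out of all attributes."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    return 1 - both / len(a)


# Two hypothetical households described by five binary attributes
# (for example: owns land, owns livestock, has electricity, ...).
household_a = [1, 0, 0, 1, 0]
household_b = [1, 0, 0, 0, 0]

sm = simple_matching_distance(household_a, household_b)  # 1 disagreement out of 5
jc = jaccard_distance(household_a, household_b)          # 1 shared out of 2 present
rr = russell_rao_distance(household_a, household_b)      # 1 shared out of 5
```

The three coefficients differ only in how they treat attributes that both households lack, which matters when most binary attributes are rare in the sample.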
As discussed in Section 2.3, our finalized set of input variables includes both continuous and binary variables. In this case of mixed input variables, the literature recommends using Gower’s dissimilarity coefficient (Gower, 1971), which is a composite of several measures3 depending on each variable’s scale level and combines all binary, continuous, and ordinal inputs into one value that is an overall measure of dissimilarity. This allows us to retain our continuous-scale inputs, thereby not losing the granularity of information available for the clustering procedure.

2.6 Deciding the number of clusters

Our final step requires deciding on the number of clusters subject to our input data, choice of clustering procedure, and similarity measure. This choice has both an empirical and a pragmatic dimension. For the latter, we need to choose a grouping that works best for our objective: to find clusters of poor with distinct characteristics for which differing policies can be defined. While we might find that 15 stable groups exist amongst the B40, such a categorization might be too fine-grained to be policy-relevant. We keep this in mind as we explore the thresholds ahead.

To quantitatively guide our decision, one approach is to seek a cluster solution such that any additional merging of clusters would happen at a significantly higher value, or “distance,” of the dissimilarity measure. We can explore this diagrammatically in a dendrogram (Figure 3). Reading from bottom to top, the horizontal lines represent the Gower dissimilarity at which clusters merge to form a larger group. The figure suggests the largest gap is at two or five clusters.4

The literature suggests additional quantitative criteria (“stopping rules”) that can be used to determine the optimal cluster number. Two prominent ones are the variance ratio criterion (VRC) (Calinski & Harabasz, 1974) and the Duda-Hart index (G.W & Cooper, 1985).
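As a concrete illustration of the Gower coefficient adopted as the dissimilarity measure, the sketch below computes pairwise dissimilarities for records that mix continuous and binary variables. It assumes the standard equal-weight form of the coefficient (range-scaled absolute differences for continuous variables, simple mismatches for binary ones, averaged over variables) with no missing values; the function name, the variable-type labels, and the three toy households are hypothetical and are not taken from the HIES data.

```python
def gower_matrix(rows, kinds):
    """Pairwise Gower dissimilarities for records with mixed variable types.

    rows  : list of equal-length tuples, one per household
    kinds : one entry per variable, either "continuous" or "binary"
    """
    n_vars = len(kinds)
    # Range of each continuous variable, used to rescale differences to [0, 1].
    ranges = {
        k: max(r[k] for r in rows) - min(r[k] for r in rows)
        for k in range(n_vars)
        if kinds[k] == "continuous"
    }

    def pair(a, b):
        total = 0.0
        for k in range(n_vars):
            if kinds[k] == "continuous":
                # Range-scaled absolute difference; 0 if the variable is constant.
                total += abs(a[k] - b[k]) / ranges[k] if ranges[k] else 0.0
            else:
                # Binary variable: 1 if the two households differ, 0 otherwise.
                total += 0.0 if a[k] == b[k] else 1.0
        return total / n_vars  # every variable carries equal weight

    return [[pair(a, b) for b in rows] for a in rows]


# Hypothetical households: (household size, owns land, lives in rural PSU).
toy_households = [(4, 1, 1), (8, 0, 1), (12, 0, 0)]
toy_kinds = ["continuous", "binary", "binary"]
toy_gower = gower_matrix(toy_households, toy_kinds)
```

Because every component is scaled to lie between 0 and 1, the resulting dissimilarity is also bounded between 0 (identical households) and 1 (households that differ maximally on every variable), which keeps continuous and binary inputs on a comparable footing.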
Details on the construction of the VRC and the Duda-Hart index are provided in the Appendix.

3 If binary variables are used, the coefficient takes the value 1 when two objects do not share a certain characteristic and 0 otherwise. When all the variables are binary and symmetric, Gower’s dissimilarity coefficient reduces to the simple matching coefficient expressed as a distance measure instead of a similarity measure (i.e., 1 – SM). If binary and asymmetric variables are used, Gower’s dissimilarity coefficient equals the Jaccard coefficient expressed as a distance measure instead of a similarity measure (i.e., 1 – JC). If continuous variables are used, the coefficient is equal to the city-block distance divided by each variable’s range. Ordinal variables are treated as if they were continuous, which is fine when the scale is equidistant (Brian, Landau, Leese, & Stahl, 2011).
4 Imagine a horizontal line at the y axis value of 75 and count the number of vertical lines it intersects.

Figure 3: Dendrogram

Table 2 presents the VRC and Duda-Hart index measures for cluster solutions. For the VRC, generally, we should choose the number of clusters that maximizes the pseudo-F-statistic. However, as the VRC mechanically decreases with a larger number of clusters, we compute the successive difference in VRC values, $\omega_k = (\mathrm{VRC}_{k+1} - \mathrm{VRC}_k) - (\mathrm{VRC}_k - \mathrm{VRC}_{k-1})$, where $k$ is the number of clusters. The number that minimizes the omega value indicates the most stable solution.5 As the table shows, a 5-cluster solution is optimal.

Similarly, we choose the number that maximizes the Je(2)/Je(1) value for the Duda-Hart index. A modified version of this index proposes picking the number that minimizes the pseudo-T-squared value instead (Duda, Hart, & Stork, 2001). Two and five clusters yield the largest index value, and six have the lowest T squared. In practice, these two measures must be considered in tandem by choosing a value that yields a large VRC and Je(2)/Je(1), along with a small omega and T-squared.
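The omega column of Table 2 can be reproduced from the reported pseudo-F values. The sketch below assumes that omega is the second difference of the VRC, that is, omega at k equals (VRC at k+1 minus VRC at k) minus (VRC at k minus VRC at k-1); this definition matches the tabulated values up to rounding. The function and variable names are hypothetical.

```python
def omega(vrc):
    """Second difference of the VRC for every k that has both neighbours.

    vrc : dict mapping the number of clusters k to its pseudo-F statistic
    """
    return {
        k: (vrc[k + 1] - vrc[k]) - (vrc[k] - vrc[k - 1])
        for k in sorted(vrc)
        if k - 1 in vrc and k + 1 in vrc
    }


# Pseudo-F (VRC) values as reported in Table 2.
table2_vrc = {
    2: 940.506, 3: 743.308, 4: 638.294, 5: 596.257, 6: 534.932,
    7: 489.331, 8: 457.308, 9: 427.724, 10: 401.560,
}

omegas = omega(table2_vrc)
best_k = min(omegas, key=omegas.get)  # number of clusters with the smallest omega
```

Omega is only defined for solutions that have a neighbour on both sides, so with VRC values for 2 to 10 clusters it is available for 3 to 9 clusters, which is why the omega cells for 2 and 10 clusters are empty in the table.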
As we can see from the table, the two- and five-cluster solutions provide the ideal case. These results also coincide with our interpretation of the dendrogram. How do we choose between these two cases? Here, as discussed at the beginning of this subsection, we rely on the objective of this exercise to decide. Splitting the bottom 40 percent into two groups may not be preferable, as two groups will likely not yield characteristics granular enough for policy intervention. Put differently, it is not clear that policies aimed at poverty alleviation can benefit from categorizing poor households into only two broad types. On the other hand, five groups may provide meaningful variation across dimensions of interest such as geography or occupational activity, thereby providing the groundwork for targeted policy interventions to address these households' unique constraints to income growth.

5 One disadvantage of using the omega value is that it is not defined for a one-cluster solution, so the minimum number of clusters that can be selected based on these criteria is three.

Table 2: Quantitative criteria for selection of cluster number

             Variance ratio criterion (VRC)     Duda-Hart Index
Number of
Clusters     Pseudo F-Stat    Omega (ω)         Je(2)/Je(1)    Pseudo T-squared
2            940.506                            0.918          485.891
3            743.308          92.185            0.893          403.700
4            638.294          62.977            0.885          361.633
5            596.257          -19.288           0.915          243.982
6            534.932          15.723            0.913          195.954
7            489.331          13.579            0.844          202.748
8            457.308          2.439             0.909          155.732
9            427.724          3.42              0.897          152.058
10           401.560                            0.904          136.741

Given all our discussed methodological considerations, we choose the five-cluster solution produced by (agglomerative) hierarchical clustering of 26 input variables, using the Ward's linkage algorithm and a Gower dissimilarity measure. The following section will discuss the results by exploring the characteristics of each group in detail.

3. Results

The non-parametric cluster algorithm divides our B40 households into five mutually exclusive groups. To interpret the solution, we need to characterize each group using input variables or other measures. It is essential to focus on variables that differentiate the clusters. For example, suppose one group has low levels of human capital (education). This information is only relevant to our interpretation insofar as the group differs substantially, or in a notable pattern, from other groups. To guide the interpretation, we use an asset framework of shared prosperity, comparing differences across dimensions of the framework (discussed in the following subsection).

Furthermore, we identify each group by a meaningful, informative label. The group labels describe defining features as a combination of situational and occupational household characteristics. It is also immediately apparent that these groups, even when defined most summarily, would necessitate poverty-alleviating interventions that look different and address distinct constraints. We will explore each group in detail to bring further nuance to our understanding. The five key groups of B40 households are as follows:

1. Group 1 consists of ultra-poor households in rural areas that primarily rely on earnings from low-skill sharecropping, supplemented by public social safety nets.
2. Group 2 consists of poor rural households characterized by active involvement in agriculture as owner-cultivators.
3. Group 3 consists of households with low labor force participation receiving remittances from a migrant household member.
4. Group 4 consists of urban households with higher education, who own (productive) assets, and whose members work in semi-skilled wage jobs in the industry and services sectors.
5. Group 5 consists of poor households whose members work in unskilled daily wage labor in construction and other smaller service work.

Figure 4 presents the overall distribution of each group, as well as across four provinces of the country.
What is important to note here is not the overall number or size of the group but rather the relative distribution within each province. We note that G2, characterized by on-farm activities, is the largest group in Punjab’s B40 population. This is understandable, given the concentration of agriculture in the province. Similarly, G1, the most vulnerable shock-prone remote households, have the highest share in Sindh. Given the shock exposure of the province, this is consistent with the literature. One interesting thing to note is the high share of G5 in KP and Balochistan. While this might appear puzzling at first, one must realize that in the absence of agriculture or other forms of more advanced urban commerce, G5 almost appears to absorb the residuals of the other groups, defined by unskilled labor and elementary work across various areas and sectors. Remembering this point as we explore the asset framework lens will be important.

Figure 4: Province-level distribution of poor groups
[Figure 4 layout: vertical axis, number of households in thousands (0 to 5,000); horizontal axis, Punjab, Sindh, Khyber Pakhtunkhwa, Balochistan; groups shown: (1) Remote, rural, shock-prone; (2) Rural, on-farm activities; (3) Remittance receiving, low employment; (4) Urban, service industry oriented; (5) Informal elementary occupations.]

In the following subsections, we will explore group characteristics by presenting profiling and input variable differences that are more stylized. More detail is provided in Appendix Table A-2, which reports summary statistics for each group for all input variable measures, the overall mean for the sample of all B40 households, and the national averages.
3.1 Interpreting cluster profiles from the lens of an asset framework

We use the asset framework of shared prosperity (Lopez-Calva & Rodríguez-Castelán, 2016) as our conceptual framework to focus on the distribution of “assets” and how its components differentiate between different groups’ ability to escape poverty. This framework recognizes that the income of individuals and households depends on the endowments, utilization, and return of various types of assets, such as human capital, physical assets, and financial resources. From a policy perspective, it provides a holistic approach to address the multidimensional aspects of poverty and inequality, empowering the poor and vulnerable through better access to and effective use of their assets.

The asset framework of shared prosperity provides a comprehensive understanding of the factors contributing to or hindering inclusive growth and poverty reduction. It helps policymakers identify and address the barriers that prevent specific populations from fully participating in and benefiting from the development process. The components considered in this framework include (i) endowments, such as human capital, physical capital, financial capital, social capital, and natural capital; (ii) the intensity with which the assets are used, for instance, in the labor market (as opposed to just owning and accumulating these assets); (iii) the returns associated with them; (iv) transfers that they receive; and (v) to what extent external shocks may hinder the process of asset accumulation. The prices of the basket of goods and services that the household consumes impact returns and transfers. The endowments of assets, their use, returns, and transfers shape household income-generating opportunities.
We discuss how endowments, their utilization, transfers, and shocks⁶ shape the different sets of constraints to income generation households face in our five identified clusters.

Figure 5: Asset Framework of Shared Prosperity

⁶ While it is recognized in the literature that the poor may face different levels of inflation, unemployment and other consequences of macroeconomic downturns, due to the absence of meaningful micro-data, we do not consider the “Prices” and “Returns” components of the framework and choose to focus on the broader occupational activity.
3.1.1 Group 1: Remote, rural, and shock-prone households

Figure 7 provides summary statistics on measures of endowments across the groups. Household endowments refer to the resources, assets, and capabilities individuals or families possess or accumulate over time. Households in the first group (G1) are primarily located in rural areas. They exhibit the lowest levels of education and asset ownership and the worst dwelling conditions, with the likelihood of unimproved floors and walls much higher than in other poor groups. Overcrowding (the number of household members per room) also appears to be highest for this group, but this may be driven by differences in room sizes between rural and urban dwellings (Figure 6).

On intensity of use, that is, how households use their endowments for income generation, such as through labor market participation, Figure 8 presents the group profiles. G1 members are primarily involved in wage employment, with a very small share in self-employment. There is an apparent selection into employment sectors: G1 is predominantly agricultural, consistent with its rural status. Moreover, G1 has the highest share of members in low-skill work among all groups (Figure 9), consisting primarily of low-skill sharecroppers.

Household income, generated from using assets, is often complemented by transfers.
These transfers may include domestic and international remittances, transfers from other households, or public transfers. Our rural sharecroppers have the lowest share of households receiving any remittances but are the most likely of any group to receive social assistance (Figure 11). Most notably, this assistance appears to consist of cash transfers through the BISP program.⁷

Figure 6: Endowment: Accessibility and Remoteness
[Bar chart by group. Measures: rural PSU; accessibility index (higher = worse); PSU average distance to school (km); overcrowding (household size/occupied rooms).]

Figure 7: Endowment of Assets, across groups
[Bar chart by group. Vertical axis: share of households.]

External shocks encompass any circumstances that individual households and communities face that can have pernicious consequences for the income-generating capacity of households (World Bank, 2013). These include, but are not limited to, macroeconomic crises, extreme climate-related events, health-related shocks, and even crime and violence. Given the absence of information on shock exposure and household coping mechanisms in the data source, we use flood and drought risk measures from external sources, spatially matched to our B40 households at the district level. While these measures exhibit less than ideal variation, we still find a notable pattern supporting our profiling of G1. Not only is this group more exposed to droughts and floods, but this risk also maps to higher levels of food insecurity, with nearly 2 in 5 households having experienced moderate food insecurity in the last year (Figure 10).
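The district-level matching described above can be sketched as a simple key-based lookup. Everything in the example below is hypothetical: the district names, risk values, household records, and the function names `attach_district_risk` and `mean_by_group` are invented for illustration and do not come from the HIES or from the external risk sources used in the paper. The point is only that every household inherits the risk measure of its district, which is also why such measures vary less than household-level data would.

```python
from collections import defaultdict

# Hypothetical district-level risk measures (illustrative values only).
DISTRICT_RISK = {
    "District A": {"flood_share": 0.20, "drought_freq": 0.07},
    "District B": {"flood_share": 0.05, "drought_freq": 0.01},
}

# Hypothetical household records with a district identifier and a cluster group.
HOUSEHOLDS = [
    {"hh_id": 1, "district": "District A", "group": 1},
    {"hh_id": 2, "district": "District A", "group": 2},
    {"hh_id": 3, "district": "District B", "group": 4},
    {"hh_id": 4, "district": "District C", "group": 5},  # no risk data available
]


def attach_district_risk(households, district_risk):
    """Copy each district's risk measures onto the households located in it."""
    matched = []
    for household in households:
        risk = district_risk.get(household["district"])
        row = dict(household)
        # Households in districts without risk data keep a missing value.
        row["flood_share"] = risk["flood_share"] if risk else None
        row["drought_freq"] = risk["drought_freq"] if risk else None
        matched.append(row)
    return matched


def mean_by_group(rows, field):
    """Average a matched field within each group, skipping missing values."""
    values = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            values[row["group"]].append(row[field])
    return {group: sum(v) / len(v) for group, v in values.items()}


matched = attach_district_risk(HOUSEHOLDS, DISTRICT_RISK)
print(mean_by_group(matched, "flood_share"))
```

Households 1 and 2 receive identical risk values because they share a district, regardless of their group; this is the limited variation the text refers to.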
This group faces several key binding constraints to improving their welfare: (i) low levels of human capital and skills, (ii) poor connectivity (rural communities), (iii) associated insufficient public service delivery, (iv) limited agricultural productivity and livelihood diversification, and (v) a high degree of vulnerability to climate change and shocks.

Figure 8: Intensity of Use
[Bar chart by group. Measures: wage income; self employed; primary industry: agriculture; primary industry: manufacturing; primary industry: construction; primary industry: services sector; share of members actively working; share of members with very low skill occupation. Legend: G1: Rural, remote, shock-prone, landless; G2: Rural, on farm activities; G3: Off farm transition; G4: Urban, service industry dependent; G5: Low skill, informal wage industrial workers.]

Figure 9: Intensity of Use in Agriculture
[Bar chart by group. Measures: primary industry: agriculture; household cultivates agricultural land; share of cultivated land owned; share of cultivated land sharecropped.]

⁷ Interestingly, we find very low numbers of poor households reporting having received Zakat. This value, while improbable, could be confounded by the social stigma that may be attached to being a household that receives Zakat. There does not appear to be comparable under-reporting of the BISP transfer, perhaps because respondents felt that non-response would threaten their likelihood of receiving a cash transfer in the future.

3.1.2 Group 2: Rural households involved in on-farm activities

Group 2 (G2) households are comparable to G1 given their similar rural status. On endowments, they score much better than G1 on human capital, asset ownership, and dwelling conditions.
In fact, on some measures they rank the highest among B40 clusters; for example, half of all households in G2 own a motorcycle. About one-third of adult members completed primary school, although this number drops to about 10 percent for secondary-level education. Around 32 percent of household members own a mobile phone, and nearly half of households own a sewing machine. This suggests that, while still poor, these households have been able to invest in some productive assets. Sewing machines can form the basis of subsistence production and repair of clothing, and motorcycle ownership (the highest among all clusters) serves as a proxy for accessibility to markets.

This difference in asset endowment levels is complemented by the divergence in intensity of use. While both groups primarily practice agriculture, G2 households cultivate their own land. This is corroborated by the fact that 70 percent of households in this cluster have self-employed working members, the highest of any group, with only a small share of members in elementary occupations (about 20 percent). These characteristics signal more productive, higher-value employment compared to sharecropping households (G1). On transfers, around 15 percent of households in this group receive (mostly internal) remittances, and about the same percentage receives some form of public assistance. While not as shock-prone as G1, G2 households rank second in their exposure to floods and droughts, with 10 percent of households living in districts within flood perimeters.

The constraints this group faces in improving incomes are similar to those of smallholder farmers: (i) low levels of human capital, (ii) insufficient yields from traditional low-value farming systems, (iii) weak linkages to regional markets, and (iv) looming climate change threats to the agri-food system (World Bank, 2024).

Figure 10: Exposure to Shocks
[Bar chart by group. Measures: percentage of district population in flood perimeter; rescaled frequency of agricultural drought (1984-2022); experienced moderate food insecurity in last year.]

Figure 11: Transfers and Assistance
[Bar chart by group. Measures: received remittances from within or outside country; received any assistance (Zakat/BISP/Govt); received remittance from within country; received remittance from outside country; received assistance: Zakat; received assistance: in-kind; received assistance: BISP; HH owes loans/is in debt.]

3.1.3 Group 3: Remittance-receiving households with low employment

Households in Group 3 (G3) have a notably higher share of female heads and rank near the median among the groups on other measures of endowments (Figure 7). It is harder to say whether G3 households are located in peri-urban or rapidly urbanizing, but previously rural, settlements, as outdated (government-defined) measures of rural and urban status prevent a closer examination. Households in off-farm transition appear to have a much higher share of self-employed workers despite having the lowest share of members participating in the labor market overall. On transfers, we note that G3 has a significantly higher share of households receiving remittances, most of which are internal (Figure 11). This fact, combined with the employment characteristics of household members, allows us to make the case that G3 households (off-farm transition) have a “missing working member” who has perhaps migrated to an urban center and sends remittances to the remaining family members in their (rural or peri-urban) home.
This framing is consistent with the higher share of female heads and lower labor market participation in wage jobs, as the missing worker would not be classified as a household member in the data.

3.1.4 Group 4: Urban households involved in service sector opportunities

Group 4 (G4) is most strongly identified by its urban nature, with about half of households living in urban areas, a much larger share than in the overall B40. Households score the highest on asset ownership and educational attainment and have the highest level of improved infrastructure (Figure 7). What is most notable is that this level of endowment not only ranks highest among the cluster groups but is also comparable to, and on some measures higher than, the national average (see Appendix Table A-2), which is surprising for a group that is part of the B40. Workers from G4 households are spread across the manufacturing and service sectors, with a low share in elementary work, suggesting relatively higher-value service jobs (such as hospitality, shopkeeping, and small-scale technical work). Households in this group are the least likely to receive government assistance and the least exposed to shocks.

3.1.5 Group 5: Households with working members in elementary service and industry occupations

In this last cluster group, Group 5 (G5), about 80 percent of households are in rural areas, a higher share than G3 and G4 but lower than G1 and G2. This group is comparable to the group of sharecroppers in its asset endowment: households score low on human capital and asset ownership and have poor dwelling conditions. Households in this category have a high share of workers in elementary wage jobs, with a majority in the construction sector. This suggests that G5 households may find their livelihood in low-skill daily wage labor. They are more likely to receive government transfers than G3 and G4, ranking only below the two rural groups, and about 1 in 3 households report being in debt.
Despite being less exposed to natural and climate shocks, about 25 percent of households in this group experience moderate food insecurity, second only to the highly shock-prone sharecropping households (G1).

4. Conclusion

Poverty manifests in varied forms that stem from different circumstances that stunt the income-generating capacity of households. Through non-parametric cluster analysis, an approach that does not rely on an existing deterministic framework of the determinants of welfare or types of low-income households, combined with an asset framework lens that allows us to structure and interpret the resulting group profiles, we improve our understanding of Pakistan’s poor. We classify the bottom 40 percent of the country’s households by consumption into five mutually exclusive groups.

The first of these are ultra-poor households in rural areas, who primarily rely on unskilled sharecropping as a means of earning, supplemented by public social safety nets. They have the poorest living conditions and are at constant risk of further welfare losses due to shocks. They may also be those worst affected by the impending consequences of climate change. A second set of poor rural households is also actively involved in agriculture, but as owner-cultivators. Low levels of education constrain their potential growth. Third is a group of households in a transitory space between agriculture and service provision. Perhaps these are the households left behind in the rural-urban migration of earners, with more dependents and, in some cases, female heads. Despite the high percentage of households receiving remittances, remittances are not sufficient to move them out of poverty. The fourth group, households in urban areas, presents the puzzling case of households with the highest level of education among the B40, the highest ownership of (productive) assets, and semi-skilled wage jobs in the industry and services sectors.
This cluster has all the ingredients necessary to move up the income ladder. Their constraints may result from broader economic weakness, and they may be the first to move out of the B40 once the economy improves. The fifth group consists of poor households that find their livelihood in unskilled daily wage labor in construction and other minor service work spread across the country. In some ways, they are the non-agricultural equivalent of our cluster of sharecroppers, with very low human capital and very poor dwelling conditions (possibly slums/katchi abadis).

It is essential to recognize that a taxonomy of the poor is part of a larger effort to make more meaningful, empirically informed policies targeted to address the diverse set of constraints faced by the underprivileged in the country. While the discussion above only provides a brief glimpse towards this goal, ongoing work seeks to map, for each group, their salient constraints and connect these to the global development literature on interventions that chart pathways out of poverty.

References

Barriga-Cabanillas, O., Kishwar, S., Meyer, M., Nasir, M., & Qazi, M. (forthcoming). Poverty Projections for Pakistan: Nowcasting and Forecasting. Poverty and Equity Assessment Background Paper.
Calinski, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods, 3(1), 1–27.
Dolnicar, S., Grün, B., & Leisch, F. (2016). Increasing sample size compensates for data problems in segmentation studies. Journal of Business Research, 69(2), 992–999.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Hoboken: Wiley.
Everitt, B. S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster Analysis. John Wiley & Sons, Ltd.
Gower, J. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871.
Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159–179.
Klopotek, M., & Wierzchon, S. (2018). Cluster Analysis. In Studies in Big Data, Volume 34. Cham: Springer.
Lopez-Calva, L. F., & Rodríguez-Castelán, C. (2016). Pro-Growth Equity: A Policy Framework for the Twin Goals. Policy Research Working Paper; No. 7897.
Mooi, E., Sarstedt, M., & Mooi-Reci, I. (2017). Cluster Analysis. In Springer Texts in Business and Economics (pp. 313–366). Singapore: Springer.
Qiu, W., & Joe, H. (2009). clusterGeneration: Random cluster generation (with specified degree of separation). R package version 1.2.7.
Rahman, A. M., Sani, N. S., Hamdan, R., Othman, Z. A., & Abu Bakar, A. (2021). A clustering approach to identifying multidimensional poverty indicators for the bottom 40 percent group. PLoS ONE, 16(8).
World Bank. (2013). World Development Report: Risk and Opportunity - Managing Risk for Development. Washington, DC: World Bank Group.
World Bank. (2023). Macro-Poverty Outlook. World Bank.

Appendix

Table A-1: List of Clustering Variables

| Thematic Area | Input Variables |
|---|---|
| Human Capital | HH has no members with any formal education |
| Human Capital | Share adult members of HH with primary education |
| Human Capital | Share adult members of HH with secondary education |
| Human Capital | Share adult members of HH with postsecondary education |
| Livelihoods | At least one HH member with wage income |
| Livelihoods | At least one HH member who is self-employed |
| Livelihoods | HH received remittances from within or outside the country |
| Livelihoods | HH received any assistance (Zakat/BISP/Govt) |
| Livelihoods | At least one HH member with primary ISIC: agriculture |
| Livelihoods | At least one HH member with primary ISIC: livestock |
| Livelihoods | At least one HH member with primary ISIC10: Manufacturing |
| Livelihoods | At least one HH member with primary ISIC10: Construction |
| Livelihoods | At least one HH member with primary ISIC10: Commerce |
| Livelihoods | At least one HH member with primary ISIC10: Transport / Communications |
| Livelihoods | At least one HH member with primary ISIC10: Other Services, Unspecified |
| Physical Assets and Dwelling Conditions | HH owns Sewing/Knitting Machine |
| Physical Assets and Dwelling Conditions | HH owns Motorcycle/Scooter |
| Physical Assets and Dwelling Conditions | HH has internet facility |
| Physical Assets and Dwelling Conditions | Share of HH members who own a mobile phone |
| Physical Assets and Dwelling Conditions | HH floor is made of earth/sand/dung |
| Physical Assets and Dwelling Conditions | HH walls are made of mud/mud bricks/wood/bamboo/stones |
| Physical Assets and Dwelling Conditions | HH has improved sanitation |
| Access and Shocks | HH is in a rural PSU |
| Access and Shocks | Overall Accessibility Index (District) |
| Access and Shocks | Percent of Population in Flood Extent Perimeter (2022) (District) |
| Access and Shocks | Rescaled Frequency of Severe Agri Drought (1984-2022) (District) |

Quantitative Stopping Rules

• The Calinski and Harabasz (1974) variance ratio criterion (VRC) is defined, for a solution with $n$ objects and $k$ clusters, as:

\[ \mathrm{VRC}_k = \frac{SS_B/(k-1)}{SS_W/(n-k)} \]

where $SS_B$ is the sum of the squares between the clusters and $SS_W$ is the sum of the squares within the clusters. The difference in VRC values, $\omega_k$, is computed as:

\[ \omega_k = (\mathrm{VRC}_{k+1} - \mathrm{VRC}_k) - (\mathrm{VRC}_k - \mathrm{VRC}_{k-1}) \]

• The Duda-Hart index essentially performs the same calculation as the VRC but compares the $SS_W$ in a pair of clusters to be split both before and after this split. More precisely, the index is the $SS_W$ in the two clusters ($Je(2)$) divided by the $SS_W$ in one cluster ($Je(1)$). The pseudo-T-squared takes the number of observations in both groups into account.
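To make the first stopping rule concrete, the sketch below computes the variance ratio criterion (the between-cluster sum of squares per k−1 degrees of freedom divided by the within-cluster sum of squares per n−k degrees of freedom) and the second-difference statistic built from VRC values at adjacent k. It is a plain-Python illustration on toy data, not the routine used for the paper's analysis; the function names `variance_ratio_criterion` and `omega` are hypothetical. The statistic is often read by preferring the k with the smallest value, which we state here as a common convention rather than as the paper's rule.

```python
def _mean(vectors):
    """Component-wise mean of a list of equal-length numeric vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def _squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def variance_ratio_criterion(points, labels):
    """VRC = (SS_between / (k - 1)) / (SS_within / (n - k))."""
    n = len(points)
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("VRC requires 2 <= k < n")
    grand_mean = _mean(points)
    ss_between = 0.0
    ss_within = 0.0
    for members in clusters.values():
        centroid = _mean(members)
        ss_between += len(members) * _squared_distance(centroid, grand_mean)
        ss_within += sum(_squared_distance(p, centroid) for p in members)
    # Note: perfectly tight clusters (ss_within == 0) make the ratio undefined.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def omega(vrc_by_k):
    """Second difference of VRC for every k that has both neighbours."""
    return {
        k: (vrc_by_k[k + 1] - vrc_by_k[k]) - (vrc_by_k[k] - vrc_by_k[k - 1])
        for k in vrc_by_k
        if k - 1 in vrc_by_k and k + 1 in vrc_by_k
    }


# Toy one-dimensional example with two well-separated clusters.
toy_points = [[0.0], [2.0], [10.0], [12.0]]
toy_labels = [0, 0, 1, 1]
toy_vrc = variance_ratio_criterion(toy_points, toy_labels)
print(toy_vrc)
```

In the toy example the between-cluster sum of squares is 100 and the within-cluster sum of squares is 4, so the criterion equals (100/1)/(4/2) = 50.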
Table A-2: Descriptive Statistics of B40 Groups (Cluster Variables)

| Variable | National Average | B40 | Group 1: Remote rural, shock-prone landless | Group 2: Rural, on farm activities | Group 3: Remittance receiving, low employment | Group 4: Urban, service industry oriented | Group 5: Informal, elementary occupations |
|---|---|---|---|---|---|---|---|
| Owns Sewing/Knitting Machine | 0.594 | 0.461 | 0.150 | 0.495 | 0.527 | 0.652 | 0.408 |
| Owns Motorcycle/Scooter | 0.540 | 0.391 | 0.236 | 0.521 | 0.376 | 0.484 | 0.289 |
| Has internet facility | 0.341 | 0.149 | 0.053 | 0.126 | 0.159 | 0.281 | 0.110 |
| Share of members who own a mobile phone | 0.491 | 0.354 | 0.267 | 0.316 | 0.375 | 0.420 | 0.371 |
| Has no members with any formal education | 0.196 | 0.331 | 0.756 | 0.198 | 0.366 | 0.051 | 0.434 |
| Share adult members with primary education | 0.516 | 0.313 | 0.083 | 0.313 | 0.322 | 0.543 | 0.248 |
| Share adult members with secondary education | 0.278 | 0.108 | 0.023 | 0.101 | 0.106 | 0.209 | 0.079 |
| Share adult members with postsecondary education | 0.150 | 0.038 | 0.010 | 0.036 | 0.030 | 0.080 | 0.027 |
| Received remittances from within or outside country | 0.182 | 0.137 | 0.053 | 0.173 | 0.375 | 0.099 | 0.064 |
| Received any assistance (Zakat/BISP/Govt) | 0.086 | 0.159 | 0.255 | 0.165 | 0.130 | 0.111 | 0.156 |
| At least one member with wage income | 0.668 | 0.763 | 0.994 | 0.589 | 0.036 | 0.967 | 0.988 |
| At least one member who is self employed | 0.369 | 0.348 | 0.112 | 0.704 | 0.665 | 0.265 | 0.068 |
| At least one member with primary ISIC: agriculture | 0.266 | 0.355 | 0.972 | 0.830 | 0.019 | 0.033 | 0.038 |
| At least one member with primary ISIC: livestock | 0.258 | 0.314 | 0.509 | 0.861 | 0.052 | 0.035 | 0.075 |
| At least one member with primary ISIC: manufacturing | 0.183 | 0.183 | 0.057 | 0.123 | 0.089 | 0.383 | 0.189 |
| At least one member with primary ISIC: construction | 0.150 | 0.239 | 0.138 | 0.131 | 0.009 | 0.160 | 0.555 |
| At least one member with primary ISIC: commerce | 0.213 | 0.189 | 0.029 | 0.102 | 0.373 | 0.315 | 0.160 |
| At least one member with primary ISIC: transport/comms | 0.099 | 0.122 | 0.042 | 0.057 | 0.156 | 0.226 | 0.122 |
| At least one member with primary ISIC: Other Services | 0.177 | 0.135 | 0.053 | 0.074 | 0.058 | 0.340 | 0.110 |
| HH floor is unimproved | 0.349 | 0.608 | 0.897 | 0.673 | 0.537 | 0.192 | 0.760 |
| HH walls are unimproved | 0.182 | 0.360 | 0.670 | 0.344 | 0.274 | 0.045 | 0.501 |
| HH has improved sanitation | 0.747 | 0.665 | 0.487 | 0.636 | 0.731 | 0.791 | 0.652 |
| HH is in a rural PSU | 0.620 | 0.795 | 0.944 | 0.962 | 0.736 | 0.510 | 0.820 |
| Overall accessibility index: equally weighted | 0.363 | 0.401 | 0.469 | 0.459 | 0.363 | 0.335 | 0.384 |
| Percentage of District Population in Flood Perimeter | 0.083 | 0.102 | 0.193 | 0.100 | 0.068 | 0.080 | 0.093 |
| Rescaled Freq of Severe Agri Drought (1984-2022) | 0.020 | 0.033 | 0.073 | 0.036 | 0.024 | 0.014 | 0.029 |
| Observations | 25,809 | 8,789 | 1,323 | 2,058 | 1,100 | 1,689 | 2,619 |

Table A-3: Additional Profiling Statistics for Cluster Groups

| Variable | National Average | B40 | Group 1: Remote rural, shock-prone landless | Group 2: Rural, on farm activities | Group 3: Remittance receiving, low employment | Group 4: Urban, service industry oriented | Group 5: Informal, elementary occupations |
|---|---|---|---|---|---|---|---|
| Female Head | 0.101 | 0.067 | 0.032 | 0.052 | 0.243 | 0.048 | 0.028 |
| Dependency Ratio (Rescaled) | 0.117 | 0.151 | 0.158 | 0.143 | 0.216 | 0.119 | 0.148 |
| Owns Agricultural Land | 0.077 | 0.062 | 0.027 | 0.089 | 0.084 | 0.048 | 0.056 |
| Owns non-Agricultural Land | 0.025 | 0.027 | 0.008 | 0.022 | 0.050 | 0.023 | 0.031 |
| Share of members actively working | 0.499 | 0.541 | 0.735 | 0.638 | 0.316 | 0.491 | 0.505 |
| Primary Industry: Services Sector | 0.369 | 0.322 | 0.079 | 0.167 | 0.523 | 0.558 | 0.302 |
| Share of members with very low skill occupation | 0.208 | 0.343 | 0.596 | 0.238 | 0.050 | 0.244 | 0.523 |
| Household cultivates Agri land | 0.218 | 0.236 | 0.359 | 0.644 | 0.048 | 0.036 | 0.057 |
| Share of cultivated land owned | 0.684 | 0.593 | 0.196 | 0.707 | 0.000 | 0.000 | 0.000 |
| Share of cultivated land share cropped | 0.181 | 0.301 | 0.706 | 0.186 | 0.000 | 0.000 | 0.000 |
| Received remittance from within country | 0.115 | 0.101 | 0.045 | 0.117 | 0.305 | 0.061 | 0.050 |
| Received remittance from outside country | 0.072 | 0.040 | 0.009 | 0.062 | 0.083 | 0.041 | 0.014 |
| Received assistance: Zakat | 0.007 | 0.009 | 0.007 | 0.003 | 0.010 | 0.011 | 0.013 |
| Received assistance: In-kind | 0.043 | 0.023 | 0.012 | 0.027 | 0.019 | 0.027 | 0.025 |
| Received assistance: BISP | 0.077 | 0.150 | 0.248 | 0.162 | 0.121 | 0.101 | 0.143 |
| HH owes loans/is in debt | 0.213 | 0.279 | 0.236 | 0.278 | 0.343 | 0.247 | 0.295 |
| Experienced moderate food insecurity in last year | 0.138 | 0.250 | 0.385 | 0.231 | 0.248 | 0.165 | 0.265 |
| PSU Avg Distance to School (km) | 1.827 | 1.769 | 2.024 | 1.915 | 1.638 | 1.587 | 1.716 |
| Overcrowding: Household Size/Occupied Rooms | 3.239 | 4.467 | 5.068 | 4.371 | 4.224 | 4.290 | 4.502 |
| Observations | 25,809 | 8,789 | 1,323 | 2,058 | 1,100 | 1,689 | 2,619 |

Table A-4: Summary Statistics of K-Means Clustering (K=5)

| Variable | B40 | K-Cluster 1 | K-Cluster 2 | K-Cluster 3 | K-Cluster 4 | K-Cluster 5 |
|---|---|---|---|---|---|---|
| HH owns Sewing/Knitting Machine | 0.461 | 0.270 | 0.501 | 0.500 | 0.708 | 0.287 |
| HH owns Motorcycle/Scooter | 0.391 | 0.334 | 0.501 | 0.376 | 0.511 | 0.243 |
| HH has internet facility | 0.149 | 0.028 | 0.097 | 0.184 | 0.348 | 0.059 |
| Share of HH members who own a mobile phone | 0.354 | 0.286 | 0.325 | 0.369 | 0.442 | 0.336 |
| HH has no members with any formal education | 0.331 | 1.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| Share adult members of HH with primary education | 0.313 | 0.000 | 0.391 | 0.384 | 0.724 | 0.000 |
| Share adult members of HH with secondary education | 0.108 | 0.000 | 0.072 | 0.056 | 0.471 | 0.000 |
| Share adult members of HH with postsecondary education | 0.038 | 0.000 | 0.016 | 0.016 | 0.187 | 0.000 |
| HH received remittances from within or outside country | 0.137 | 0.148 | 0.122 | 0.154 | 0.134 | 0.115 |
| HH received any assistance (Zakat/BISP/Govt) | 0.159 | 0.112 | 0.129 | 0.196 | 0.106 | 0.209 |
| At least one HH member with wage income | 0.763 | 0.705 | 0.740 | 0.786 | 0.788 | 0.771 |
| At least one HH member who is self employed | 0.348 | 0.342 | 0.448 | 0.344 | 0.372 | 0.239 |
| At least one HH member with primary ISIC: agriculture | 0.355 | 0.444 | 0.426 | 0.312 | 0.215 | 0.412 |
| At least one HH member with primary ISIC: livestock | 0.314 | 0.447 | 0.445 | 0.261 | 0.190 | 0.278 |
| At least one HH member with primary ISIC10: Manufacturing | 0.183 | 0.107 | 0.175 | 0.212 | 0.246 | 0.144 |
| At least one HH member with primary ISIC10: Construction | 0.239 | 0.214 | 0.238 | 0.269 | 0.198 | 0.244 |
| At least one HH member with primary ISIC10: Commerce | 0.189 | 0.107 | 0.202 | 0.222 | 0.262 | 0.117 |
| At least one HH member with primary ISIC10: Transport and Communications | 0.122 | 0.073 | 0.107 | 0.150 | 0.142 | 0.108 |
| At least one HH member with primary ISIC10: Other Services, Unspecified | 0.135 | 0.078 | 0.130 | 0.140 | 0.252 | 0.076 |
| HH floor is made of earth/sand/dung | 0.608 | 0.775 | 0.577 | 0.581 | 0.348 | 0.782 |
| HH walls are made of mud/mud bricks/wood/bamboo/stones | 0.360 | 0.446 | 0.296 | 0.333 | 0.189 | 0.553 |
| HH has improved sanitation | 0.665 | 0.423 | 0.590 | 0.768 | 0.804 | 0.634 |
| HH is in a rural PSU | 0.795 | 0.910 | 0.848 | 0.758 | 0.679 | 0.818 |
| Overall accessibility index: equally weighted educ, health, mkt | 0.401 | 0.630 | 0.633 | 0.265 | 0.329 | 0.285 |
| Percentage of District Population in Flood Extent Perimeter | 0.102 | 0.042 | 0.039 | 0.128 | 0.097 | 0.174 |
| Rescaled Freq of Severe Agri Drought (1984-2022) | 0.033 | 0.067 | 0.068 | 0.013 | 0.015 | 0.020 |
| Observations | 8,789 | 1,324 | 1,478 | 2,842 | 1,358 | 1,787 |
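A quick arithmetic check can help when reading partition tables such as Table A-4: because the clusters are mutually exclusive, their sizes should add up to the B40 total, and an observation-weighted average of the cluster means should land close to the B40 mean. The sketch below does this for the rural PSU row. It assumes the reported means are simple (unweighted) sample averages; if survey weights were applied, the pooled figure need not match exactly. The helper name `pooled_mean` is hypothetical.

```python
# Values copied from Table A-4 (K-means, K=5): observations and rural PSU share.
B40_OBSERVATIONS = 8789
B40_RURAL_SHARE = 0.795
CLUSTER_SIZES = [1324, 1478, 2842, 1358, 1787]
CLUSTER_RURAL_SHARES = [0.910, 0.848, 0.758, 0.679, 0.818]


def pooled_mean(sizes, means):
    """Observation-weighted average of cluster means."""
    total = sum(sizes)
    return sum(size * mean for size, mean in zip(sizes, means)) / total


sizes_match = sum(CLUSTER_SIZES) == B40_OBSERVATIONS
pooled_rural = pooled_mean(CLUSTER_SIZES, CLUSTER_RURAL_SHARES)
gap = abs(pooled_rural - B40_RURAL_SHARE)

print(f"cluster sizes sum to B40 total: {sizes_match}")
print(f"pooled rural share: {pooled_rural:.3f} (reported B40: {B40_RURAL_SHARE})")
```

The cluster sizes sum to 8,789, and the pooled rural share differs from the reported B40 value only at the third decimal place, which is within what rounding of the published figures could produce.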