Welfare Dynamics in Colombia: Results from Synthetic Panels

This study explores short-run transitions between poverty, vulnerability, and the middle class, using synthetic panels constructed from multiple rounds of Colombia's Integrated Household Survey (Gran Encuesta Integrada de Hogares, in Spanish). The paper reports results from two approaches to defining a vulnerability line: the first employs a nonparametric and parsimonious model, while the second uses a fully parametric regression model with covariates. The estimation results suggest a vulnerability line in the range of $8 to $13 per person per day in 2005 purchasing power parity dollars. Using an average vulnerability line of $10 per person per day, subsequent estimates of welfare dynamics suggest that, during the past decade, 20 percent of the Colombian population experienced downward mobility and 24 percent experienced upward mobility. Furthermore, upward mobility increases with education and is lower for female-headed households.


Introduction
Colombia's recent record of solid economic growth led to significant reductions in poverty and improvements in social indicators from 2002 to 2016. During this period, the extreme poverty rate more than halved, falling from 17.7 percent to 8.5 percent, while moderate poverty fell from 49.7 percent to 28.0 percent, as shown in Figure 1. Ideally, a researcher would use a longitudinal survey or panel data to analyze welfare dynamics or income mobility. However, in many developing countries panel data sets are not readily available, span few periods, or suffer from "non-random" attrition, which limits researchers' capacity to study questions such as which factors help households escape or remain in poverty (Dang and Lanjouw, 2013; Bourguignon and Moreno, 2015). To overcome the absence of panel data or longitudinal surveys, authors such as Deaton (1985), Deaton and Paxson (1994) and Pencavel (2007) have proposed methodologies to construct pseudo-panels by following similar age cohorts across multiple cross-section surveys. Nevertheless, as argued by Dang et al. (2014), these methodologies typically rely on having several rounds of cross-section surveys and do not allow mobility to be analyzed at a more disaggregated level than the cohort. In addition, Fields and Viollaz (2013) argue that pseudo-panel methodologies might not perform well in predicting income mobility in some cases. 1
… be classified as vulnerable. Notice, however, that the methodology proposed by Lopez-Calva and Ortiz-Juarez (2014) requires at least two waves of longitudinal information or panel data, which are not currently available in Colombia.
Colombia has two publicly available surveys designed to follow households across multiple periods: (i) the Encuesta Longitudinal de Protección Social (ELPS) 4, prepared by the Colombian statistics office (DANE), and (ii) the Encuesta Longitudinal Colombiana (ELCA) 5, developed by the Universidad de los Andes. However, in practice only one wave of the ELPS is publicly available (making it a cross-section for practical purposes) and, while two waves of the ELCA are available, the survey does not properly capture incomes (the official welfare measure used for estimating extreme and moderate monetary poverty), raising concerns about the comparability of poverty estimates obtained using the ELCA and the GEIH. Therefore, alternative strategies, such as the one proposed in this paper, are required to assess the dynamics of households living in poverty, vulnerability or the middle class.
Using cross-section information from multiple rounds of the Gran Encuesta Integrada de Hogares (GEIH), this paper constructs a synthetic panel for Colombia and, based on the methodology proposed by Dang et al. (2016), estimates the vulnerability lines relevant for Colombia during the period 2008-2016 and calculates the transitions between poverty, vulnerability, and the middle class. Additionally, the paper presents several sensitivity analyses to better inform the crucial decision of choosing a vulnerability line. The results suggest a threshold of US$10 a day in 2005 PPP (i.e., US$13.2 a day in 2011 PPP) as the vulnerability line for Colombia. The "monetary welfare" dynamics suggest that roughly 56 percent of the Colombian population remained in the same income category, 20 percent experienced downward mobility, and the remaining 24 percent experienced upward mobility. Furthermore, we observe that the rates of escaping poverty and vulnerability into the middle class, and of escaping poverty into vulnerability, increase with the level of education and are lower for female household heads than for their male counterparts.
The rest of this study is organized as follows. The next section discusses the methodology proposed by Dang et al. (2014) to construct synthetic panels from cross-section surveys to analyze poverty dynamics 6, with an application to Colombia. The third section describes the main characteristics of the data available for Colombia during the 2008-2016 period and, more importantly, of the relevant sample used for this study, as well as how the window width of the synthetic panel was defined. Section 4 presents a sensitivity analysis of the vulnerability lines in Colombia using different base years and alternative methods to estimate the vulnerability lines. The fifth section shows preliminary results and briefly discusses welfare dynamics across multiple potential states (i.e., poor, vulnerable and middle class). The last section presents some final remarks.
4 https://www.dane.gov.co/index.php/estadisticas-por-tema/pobreza-y-condiciones-de-vida/encuesta-longitudinal-deproteccion-social-elps
5 https://encuestalongitudinal.uniandes.edu.co/en/
6 In this paper, the term poverty dynamics refers to joint probabilities, not to conditional probabilities; for example, the joint probability of being poor in t and being poor in t+1.

Methodology
A proper study of welfare dynamics typically entails demanding data requirements. It is necessary to follow the same observation (household or individual) for at least two, or preferably multiple, periods. However, panel or longitudinal data sets are hard to come by, especially in developing countries, while "snapshots" of welfare captured in cross-section surveys are far more common (Dang and Lanjouw, 2013). This paper relies on a synthetic panel approach to provide point estimates of income mobility in Colombia using as few as two rounds of cross-section surveys.
Moreover, this study extends the typical analysis of transitions into and out of poverty to a more general setup of household movements across different income groups (poor, vulnerable and middle class).
The approach is intended to overcome the lack of available panel data by constructing a "synthetic panel" using only time-invariant individual and household characteristics from multiple rounds of the Gran Encuesta Integrada de Hogares (GEIH) of Colombia, and by exploiting this information to estimate the vulnerability lines necessary for the analysis of welfare dynamics. The following subsections first explain the estimation of the vulnerability line and then present an overview of the methodology used to study the transitions across poverty, vulnerability and the middle class.

Vulnerability lines
Researchers and policy makers are sometimes interested in studying more than transitions into and out of poverty. In Colombia, where a large share of the population has escaped poverty during the last decade (despite the country's recent exposure to downside risks from volatile commodity prices), there is increasing interest in identifying the dynamics into vulnerability (i.e., the population that is out of poverty but at risk of falling back into it) and into the middle class. In this context, the discussion on the estimation of a vulnerable group in Colombia is relevant from both a technical and a public policy perspective.
Since true panel data are not available in Colombia, this paper relies on the approach of Dang and Lanjouw (2017) to estimate vulnerability lines, using as few as two rounds of cross sections and moderate assumptions. Dang and Lanjouw (2017) define the vulnerability line $V_1$ such that a specified proportion of the population with a consumption level above this line in period 1 will fall below the poverty line $z$ in period 2. This proportion is referred to as the "insecurity index", $P_1$, since the population with income levels above the vulnerability line could be regarded as "secure". Given a value for the insecurity index $P_1$, $V_1$ satisfies:

$$ P(y_{i2} < z \mid y_{i1} \geq V_1) = P_1 $$
In addition, the vulnerability line can be linked to the notion of a "vulnerable" population, which has incomes above the poverty line but still below the vulnerability line in period 1. The likelihood that this population falls into poverty in period 2 is the "vulnerability index" $P_2$, and the corresponding vulnerability line $V_2$ satisfies:

$$ P(y_{i2} < z \mid z \leq y_{i1} < V_2) = P_2 $$
Both the insecurity index and the vulnerability index provide operational measures of households' vulnerability to poverty, but while the vulnerability index focuses on the population in the middle of the income (or consumption) distribution, the insecurity index focuses on households at the top of the distribution. Figure 2, taken from Dang and Lanjouw (2017), shows the differences between the two indexes and how they relate.
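To make the two definitions concrete, the sketch below computes both indexes for a candidate line when paired period-1 and period-2 incomes are available, for example from a true panel or from simulated draws. It is a minimal illustration, not the estimation procedure of Dang and Lanjouw (2017); the function names and the toy data are hypothetical, and the poverty line z and candidate line v are assumed to be in the same real units as incomes.

```python
from typing import Sequence


def insecurity_index(y1: Sequence[float], y2: Sequence[float], z: float, v: float) -> float:
    """Share of households at or above v in period 1 that are poor (below z) in period 2."""
    secure = [b for a, b in zip(y1, y2) if a >= v]
    if not secure:
        raise ValueError("no households at or above the candidate line v")
    return sum(b < z for b in secure) / len(secure)


def vulnerability_index(y1: Sequence[float], y2: Sequence[float], z: float, v: float) -> float:
    """Share of households with z <= income < v in period 1 that are poor in period 2."""
    vulnerable = [b for a, b in zip(y1, y2) if z <= a < v]
    if not vulnerable:
        raise ValueError("no households between the poverty line z and v")
    return sum(b < z for b in vulnerable) / len(vulnerable)


# Hypothetical per capita daily incomes for six households in two periods.
y1 = [0.5, 1.2, 1.5, 1.8, 2.5, 3.0]
y2 = [0.4, 0.9, 1.6, 0.8, 2.2, 0.95]
print(insecurity_index(y1, y2, z=1.0, v=2.0))     # 0.5
print(vulnerability_index(y1, y2, z=1.0, v=2.0))  # 0.6666666666666666
```

In the toy data, one of the two "secure" households falls below the poverty line, while two of the three vulnerable households do, illustrating why the vulnerability index is usually the larger of the two.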

Overview of the framework
As an introduction to the synthetic panel methodology proposed by Dang and Lanjouw (2016), this section summarizes the framework used to construct synthetic panel data from two rounds of cross-sectional data. Assume there are two rounds of cross-sectional surveys such that $y_{ij}$ is the income of household $i = 1, 2, \ldots, N_j$ in survey round $j = 1, 2$, with sample size $N_j$. Now, let $x_{ij}$ be a vector of household characteristics. These variables can be time-invariant (e.g., gender, ethnicity, language, place of birth), variables that can be easily recalled for round 1 in round 2 (e.g., household heads' age or education), or retrospective regressors.
Using these variables, the linear projection of household i's income (or consumption) $y_{ij}$ on household characteristics $x_{ij}$ for each survey round $j$ is given by:

$$ y_{ij} = \beta_j' x_{ij} + \varepsilon_{ij} \qquad (1) $$

If we are only interested in studying poverty dynamics (with incomes $y_{i1}$, $y_{i2}$ and the poverty line $z$ expressed in real terms), we are interested in quantities such as

$$ P(y_{i1} < z \text{ and } y_{i2} > z) \qquad (2) $$

which represents the percentage of households that are poor in the first period but non-poor in the second period (see Appendix A for a more detailed explanation). When we are interested in studying the dynamics between poverty and vulnerability, we are instead interested in quantities such as

$$ P(y_{i1} < z \text{ and } z \leq y_{i2} < V) \qquad (3) $$

which represents the percentage of households that are poor in the first period and vulnerable in the second period, where $V$ is the vulnerability line. In total, there are nine combinations of income categories (three in each period) when two periods are considered (see Appendix B for a more detailed explanation).
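The nine combinations can be laid out as a 3x3 matrix of joint probabilities. The sketch below builds that matrix from paired incomes, which is only possible with true panel data (or simulated pairs); it illustrates what the synthetic panel is trying to recover rather than how it does so. Function names and data are hypothetical, and incomes exactly equal to a line are assigned to the higher category by assumption.

```python
from typing import List, Sequence

LABELS = ("poor", "vulnerable", "middle class")


def category(y: float, z: float, v: float) -> int:
    """0 = poor (y < z), 1 = vulnerable (z <= y < v), 2 = middle class (y >= v)."""
    if y < z:
        return 0
    if y < v:
        return 1
    return 2


def joint_transitions(y1: Sequence[float], y2: Sequence[float],
                      z: float, v: float) -> List[List[float]]:
    """Cell [r][c] = share of households in category r in period 1 and c in period 2."""
    n = len(y1)
    m = [[0.0] * 3 for _ in range(3)]
    for a, b in zip(y1, y2):
        m[category(a, z, v)][category(b, z, v)] += 1.0 / n
    return m


# Hypothetical paired incomes for five households.
y1 = [0.5, 0.5, 1.5, 1.5, 3.0]
y2 = [0.5, 1.5, 0.5, 3.0, 3.0]
m = joint_transitions(y1, y2, z=1.0, v=2.0)
print(m[0][1])  # poor in period 1 and vulnerable in period 2, as in quantity (3): 0.2
```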
In the absence of true panel data, we need to use synthetic panels to study mobility, making two standard assumptions. Following Dang and Lanjouw (2013), the first assumption is that the underlying populations sampled in survey rounds 1 and 2 are the same in terms of the time-invariant household characteristics. The second is that $\varepsilon_{i1}$ and $\varepsilon_{i2}$ have a bivariate normal distribution 7 with (partial) correlation coefficient $\rho$ and standard deviations $\sigma_{\varepsilon 1}$ and $\sigma_{\varepsilon 2}$, respectively. If $\rho$ is known, Dang and Lanjouw (2013) propose to estimate quantity (3) by

$$ P(y_{i1} < z \text{ and } z \leq y_{i2} < V) = \Phi_2\left(\frac{z - \beta_1' x_{i2}}{\sigma_{\varepsilon 1}}, \frac{V - \beta_2' x_{i2}}{\sigma_{\varepsilon 2}}, \rho\right) - \Phi_2\left(\frac{z - \beta_1' x_{i2}}{\sigma_{\varepsilon 1}}, \frac{z - \beta_2' x_{i2}}{\sigma_{\varepsilon 2}}, \rho\right) $$

where $\Phi_2(\cdot)$ stands for the bivariate normal cumulative distribution function (cdf). A key element in the analysis of income mobility is the estimation of the correlation coefficient. Since $\rho$ is unknown in most contexts, it can be approximated based on asymptotic theory following the approach proposed by Dang and Lanjouw (2016). The procedure requires aggregating all the variables to the cohort level, where cohorts are formed by each combination of the values of the time-invariant characteristics (including age, gender, and education). The partial correlation coefficient can then be estimated as

$$ \rho = \frac{\rho_{y_{c1} y_{c2}} \sqrt{\operatorname{var}(y_{i1}) \operatorname{var}(y_{i2})} - \beta_1' \operatorname{var}(x_i) \beta_2}{\sigma_{\varepsilon 1} \sigma_{\varepsilon 2}} $$

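In this expression, $\rho_{y_{c1} y_{c2}}$ is the simple correlation of cohort-level incomes between the two rounds, and the estimates of $\beta_j$ correspond to the linear projection of household income (or consumption) on household characteristics, with the cohort-level quantities aggregated at the year-of-birth level for survey rounds $j = 1, 2$.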
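The two formulas above can be chained: estimate $\rho$ from a cohort-level correlation, then plug it into the bivariate normal expression for quantity (3). The sketch below is a minimal, dependency-free illustration under stated assumptions: the partial correlation is taken to have the form $\rho = (\rho_{y_{c1} y_{c2}} \sqrt{\operatorname{var}(y_{i1}) \operatorname{var}(y_{i2})} - \beta_1' \operatorname{var}(x_i) \beta_2) / (\sigma_{\varepsilon 1} \sigma_{\varepsilon 2})$; the bivariate normal cdf is computed by Simpson's rule rather than a library routine; the clamp on $\rho$ is a safeguard added here, not part of the method; and all function names and inputs are hypothetical. Incomes and lines are assumed to be on a common scale (for example, logs).

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Simple correlation, e.g. between cohort-average incomes in rounds 1 and 2."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_rho(rho_c: float, var_y1: float, var_y2: float,
                beta1: Sequence[float], beta2: Sequence[float],
                var_x: List[List[float]], sigma1: float, sigma2: float) -> float:
    """(rho_c * sqrt(var_y1 * var_y2) - beta1' var_x beta2) / (sigma1 * sigma2)."""
    k = len(beta1)
    quad = sum(beta1[r] * var_x[r][c] * beta2[c] for r in range(k) for c in range(k))
    rho = (rho_c * math.sqrt(var_y1 * var_y2) - quad) / (sigma1 * sigma2)
    # Safeguard added for this sketch: keep rho strictly inside (-1, 1).
    return max(-0.999, min(0.999, rho))


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a: float, b: float, rho: float, n: int = 2000) -> float:
    """Phi_2(a, b, rho) as the integral over x < a of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)),
    approximated with Simpson's rule on [-8, a]; n must be even."""
    lo = -8.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += w * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def prob_poor_then_vulnerable(x2, beta1, beta2, sigma1, sigma2, rho, z, v):
    """Quantity (3) for one household with round-2 characteristics x2."""
    m1 = sum(b * x for b, x in zip(beta1, x2))  # beta_1' x_{i2}
    m2 = sum(b * x for b, x in zip(beta2, x2))  # beta_2' x_{i2}
    a = (z - m1) / sigma1
    return bvn_cdf(a, (v - m2) / sigma2, rho) - bvn_cdf(a, (z - m2) / sigma2, rho)


# Illustrative inputs only.
rho = partial_rho(0.75, 2.0, 2.0, [1.0], [1.0], [[1.0]], 1.0, 1.0)
print(rho)  # 0.5
print(prob_poor_then_vulnerable([1.0], [0.2], [0.3], 0.8, 0.8, rho, z=0.0, v=1.0))
```

Averaging the household-level probability over all round-2 households would give the population share that is poor in period 1 and vulnerable in period 2.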

Data
This section of the paper discusses the characteristics of the main source of information used to construct the synthetic panel for Colombia during the 2008-2016 period, the Gran Encuesta Integrada de Hogares (GEIH), and is divided into three subsections. The first part shows the main characteristics of the GEIH, the cross-sectional national household survey that provides the data to build each wave of the synthetic panel. After analyzing the main characteristics of the population in Colombia during the period 2008-2016, the second part describes the relevant sample: all households included in the surveys from 2008 to 2016 whose heads were born between 1948 and 1973. The rationale behind this selection is that household heads in this cohort who were interviewed in 2008 (the earliest wave of the GEIH analyzed) are expected to have completed their education (at least 25 years old) and still be part of the labor force (younger than 60 years). As noted by Lucchetti (2017), the selection of these household heads also avoids life-cycle events that may invalidate the time-invariance assumption. Once this cohort is fixed, the methodology follows the same cohort of individuals across time. This section ends with a discussion on the selection of the optimal window of analysis to build the synthetic panel, that is, the distance in years between two cross-sections of the survey. This paper argues that this window should not be longer than two years for Colombia, since over longer gaps the characteristics of the households are likely to change significantly, violating the assumption that the characteristics associated with the income generation function are time-invariant.

Main source of information: The GEIH
The GEIH has national coverage with the following levels of temporal and geographical … Eastern, Central, Pacific and Bogota), for municipal head towns and for populated centers and dispersed rural areas, and for the national total by zone (municipal head towns, and populated centers and dispersed rural areas). (iv) Yearly: by capital city with its metropolitan area, by large region and by area (municipal head towns, and populated centers and dispersed rural areas), and by department.

The characteristics of the synthetic panel
Following Dang and Lanjouw (2013) and Dang et al. (2014), it is important to verify that the distributions of the time-invariant variables are similar across survey rounds, since the proposed approach relies on the assumption that both surveys represent the same population and that income can be modeled based on such time-invariant characteristics. Table 1 suggests that the restricted sample reproduces the official moderate poverty estimates for Colombia very closely during the period of analysis. The average difference in poverty rates between the full and the restricted samples is 0.40 percentage points from 2008 to 2016. This suggests that our estimation sample adequately reflects the Colombian population's poverty rates as measured in the unrestricted cross-sections.
The second step is to assess whether the GEIH rounds are strictly comparable. Our findings suggest that the survey rounds do not suffer from serious comparability issues, especially regarding the potentially time-invariant variables used in the income model. We focus on household heads, who represent 28 percent of the population, to implement the synthetic-panel methodology. In addition, this paper uses the survey design of the GEIH to improve the precision of the estimates presented in this section. The same cohort of individuals is followed across time to implement the methodology proposed by Dang and Lanjouw (2016).
The variables chosen to construct the synthetic panels are the following: birth year (cohort group), gender, and educational attainment (level). Only 0.04 percent of the values for these variables are missing. 9 Given that one of the time-invariant characteristics chosen for the analysis is the education level of the household head, we restrict the sample to individuals from 25 to 68 years of age. 10 This decision is made to avoid truncation in the educational attainment variable and to guarantee the representativeness of household heads (the implicit assumption being that by 25 years of age the average Colombian should have completed his or her education). In addition, restricting the household head's age to a specific range is a standard procedure to keep household composition stable over different periods. Moreover, the population tends to become more educated over time, although the share of household heads with no formal education remained relatively constant across all the periods of analysis.
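Footnote 10's rule, that the age window shifts with the survey year so that the same birth cohorts are followed, can be expressed as a small helper. The defaults (base year 2008, ages 25 to 60) follow footnote 10; the function names are hypothetical. With these defaults the window in 2016 is 33 to 68, which matches the upper bound of 68 years quoted above.

```python
from typing import Tuple


def age_window(survey_year: int, base_year: int = 2008,
               min_age: int = 25, max_age: int = 60) -> Tuple[int, int]:
    """Age range that tracks the same birth cohorts in a later survey year."""
    shift = survey_year - base_year
    return min_age + shift, max_age + shift


def in_sample(age: int, survey_year: int) -> bool:
    """True if a household head of this age belongs to the tracked cohorts."""
    lo, hi = age_window(survey_year)
    return lo <= age <= hi


for year in (2008, 2010, 2012, 2016):
    print(year, age_window(year))
# 2008 (25, 60)
# 2010 (27, 62)
# 2012 (29, 64)
# 2016 (33, 68)
```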

Defining the relevant window of analysis
To define the time interval between two cross-sections over which the assumption of time invariance holds, the different characteristics of the household heads are formally tested (see Table C in Appendix C). This procedure applies a t-test of the means of each time-invariant characteristic at different periods to determine whether they differ statistically. The results suggest that the assumption of time invariance is less plausible when comparing pairs of surveys more than two years apart. For instance, … The assumption of time invariance seems to be consistent with the empirical findings but will not necessarily hold for other education levels. For instance, for the share of household heads with only primary education (Table C.3), differences are statistically significant in most cases when comparing pairs of surveys more than one year apart. In sum, the results show that differences in the education levels of household heads across years are significant, which calls for caution and a conservative reliance on the assumption that characteristics are time-invariant across periods. Strictly speaking, the results of the means tests suggest that for Colombia the cross-section surveys should not be more than two years apart.
9 As part of the analysis, we identify per capita income outliers using the Blocked Adaptive Computationally efficient Outlier Nominators (BACON) algorithm. After applying this method, 0.1 percent of the sample observations are classified as outliers, mostly in 2008 (approximately 1 percent of the 2008 sample).
10 As the age range should be kept fixed over time for all the different cohorts (i.e., adjusting for the year difference between the survey rounds), we should use the age range 25-60 for 2008, 27-62 for 2010, 29-64 for 2012, and so on.
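The means test described above can be sketched as a two-sample (Welch) t-test on a binary indicator, for example whether a household head has only primary education, in two survey rounds. This is a simplified illustration under stated assumptions: it ignores the GEIH survey weights and design (which the paper does use), treats observations as independent, and approximates the p-value with the standard normal distribution, which is reasonable for samples as large as the GEIH's. The data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, two-sided p-value) for H0: mean(a) == mean(b).

    Unweighted; the p-value uses a normal approximation (large samples).
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    t = (ma - mb) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Hypothetical indicators (1 = head has only primary education) in two rounds.
round_a = [1] * 50 + [0] * 50
round_b = [1] * 30 + [0] * 70
t, p = welch_t_test(round_a, round_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```

A small p-value, as in this toy example, would flag the characteristic as not time-invariant across the two rounds being compared.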

Vulnerability lines
Identifying the vulnerable group usually relies on estimating "appropriate" lines that classify the population into different categories, similar to the definition of poverty. However, in contrast with poverty, where one threshold (the poverty line) is typically enough to split the population into poor and non-poor, vulnerability may require two lines. First, we need to define the cutoff point that represents the lower bound of the vulnerable group, which in practice usually coincides with the poverty line, meaning that people or households who graduate from poverty (i.e., achieve incomes above the poverty line) do not immediately become part of the middle class but instead remain vulnerable.
Second, and probably most relevant, is the upper bound of the vulnerable group, which we call the "vulnerability line" and which typically represents the lower bound of the middle class.
Once poverty and vulnerability lines are defined, individuals or households can be classified as poor when their incomes are below the poverty line, as vulnerable when their incomes are above the poverty line but below the vulnerability line, or as middle class when their incomes are above the vulnerability line (and implicitly above the poverty line, since the vulnerability line is higher than the poverty line). Notice that the economic literature (Atkinson and Brandolini, 2013; Lopez-Calva and Ortiz-Juarez, 2014) usually treats the upper bound of the vulnerable group as the lower bound of the middle class. In addition, any proposal to empirically estimate a vulnerable group requires accepting the implicit assumption that it is possible not only to formulate a relevant concept of class but also to identify these categories with empirical methods (Lopez-Calva and Ortiz-Juarez, 2014).
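The classification rule above can be sketched as a function that returns the population shares in each category for a single cross-section. The optional weights stand in for survey expansion factors (an assumption of this sketch), the function name is hypothetical, and incomes exactly equal to a line are assigned to the higher category.

```python
from typing import Dict, Optional, Sequence


def category_shares(incomes: Sequence[float], z: float, v: float,
                    weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Weighted population shares that are poor, vulnerable and middle class."""
    if v <= z:
        raise ValueError("the vulnerability line must exceed the poverty line")
    if weights is None:
        weights = [1.0] * len(incomes)
    total = sum(weights)
    shares = {"poor": 0.0, "vulnerable": 0.0, "middle class": 0.0}
    for y, w in zip(incomes, weights):
        if y < z:
            shares["poor"] += w / total
        elif y < v:
            shares["vulnerable"] += w / total
        else:
            shares["middle class"] += w / total
    return shares


print(category_shares([0.5, 1.5, 2.5, 3.0], z=1.0, v=2.0))
# {'poor': 0.25, 'vulnerable': 0.25, 'middle class': 0.5}
```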
Although there is currently no consensus in the literature on the best methodology to estimate the vulnerability line, the Government of Colombia has been exploring the economic security approach, based on the criterion of vulnerability to poverty, to identify the upper bound of the vulnerable group (or the lower bound of the middle class) following the Lopez-Calva and Ortiz-Juarez approach (see Pavon and Perez, 2016). 11 Even though these results, based on the ELCA, were informative, they raised several concerns, since the poverty figures not only differed from official estimates but also reflected a different poverty line (international poverty lines were used in this exercise). In addition, the Lopez-Calva and Ortiz-Juarez (2014) methodology requires longitudinal information or panel data, which are not currently available in Colombia.
It is important to note that, for the Dang and Lanjouw (2017) approach, no closed-form solution for the vulnerability line can be obtained from the equations of the "insecurity" and "vulnerability" indexes. However, given household income in both periods, the poverty line $z$, and a pre-determined value for either the insecurity or the vulnerability index, it is possible to solve empirically for the vulnerability line. The construction of the vulnerability lines can be approached as a two-step process: the first step is to identify the appropriate poverty line (which is usually given, for example, the international poverty line); the second step is to iterate upward from the given poverty line until we reach a value of the vulnerability line that yields the specified vulnerability index.
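The second step, iterating upward from the poverty line, can be sketched as a simple grid search over candidate lines V = z + k·step that stops at the first V whose vulnerability index is at or below the target. This is a minimal illustration that assumes paired period-1 and period-2 incomes are available (in the paper they come from the synthetic panel rather than observed pairs) and that the index falls as V rises, as is usually expected; the step size, the cap of ten times the poverty line, the function names and the data are arbitrary choices for the example.

```python
from typing import Optional, Sequence, Tuple


def vulnerability_index(y1: Sequence[float], y2: Sequence[float],
                        z: float, v: float) -> Optional[float]:
    """P(y2 < z | z <= y1 < v), or None if nobody lies between z and v."""
    group = [b for a, b in zip(y1, y2) if z <= a < v]
    if not group:
        return None
    return sum(b < z for b in group) / len(group)


def find_vulnerability_line(y1: Sequence[float], y2: Sequence[float], z: float,
                            target: float, step: float = 0.05,
                            max_multiple: float = 10.0) -> Optional[Tuple[float, float]]:
    """Return (V, index) for the first grid value with index <= target, or None."""
    k = 1
    while True:
        v = z + k * step  # multiply rather than accumulate to limit rounding drift
        if v > max_multiple * z:
            return None  # target never reached below the cap
        idx = vulnerability_index(y1, y2, z, v)
        if idx is not None and idx <= target:
            return v, idx
        k += 1


y1 = [1.1, 1.3, 1.5, 1.7, 1.9, 2.5, 3.0, 4.0]
y2 = [0.9, 0.8, 1.2, 0.95, 1.5, 2.0, 3.0, 4.0]
print(find_vulnerability_line(y1, y2, z=1.0, target=0.5))  # V close to 2.55, index 0.5
```

Running the search for several targets produces the set of vulnerability lines, one per vulnerability index, that the next paragraph refers to.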
This method produces a set of vulnerability lines for a set of vulnerability indexes, leaving policy makers or society with the challenge of choosing one value from that set. The rule of thumb under this approach for developing countries is to set a vulnerability index between 15 and 30 percent (or as dictated by social development objectives). This identification difficulty is not specific to this approach. For …

Estimation of the vulnerability line
Poverty lines are sensitive to the base year used for their estimation. These lines are drawn from particular welfare distributions that change over time given the underlying development process in which many … increases. This event seems to happen when the vulnerability index is within 30-32 percent, which represents an increment of around two times the poverty line.
11 The vulnerability-to-poverty approach can be divided into three steps. The first stage focuses on identifying the characteristics associated with transitions into and out of poverty. The second stage models the probability of falling into poverty as a function of a series of observable variables, using a logistic model. The third stage uses the variables that explain the probability of falling into poverty to predict the expected income associated with each level of probability, which makes it possible to identify the level of income associated with a 10 percent probability of falling into poverty (per the stylized facts reported by Cruces et al., 2011).
12 Lawrence (1984), Blackburn and Bloom (1987), Horrigan and Haugen (1988), Kosters and Ross (1989), Birdsall, Graham and Pettinato (2000), D'Ambrosio et al. (2002), and Atkinson and Brandolini (2011), to cite a few.

An alternative approach to define the vulnerability line
This section presents the results of implementing the Hertova, Lopez-Calva and Ortiz-Juarez (2010) approach to identify vulnerable households. The idea is to shed some light on the robustness of the results found so far. Note, however, that the empirical method was adapted here to find a vulnerability line.
In recent years, authors like Atkinson and Brandolini (2013) and Lopez-Calva and Ortiz-Juarez (2014) have proposed studying vulnerability by anchoring the concept to the risk of falling into poverty. In line with Ravallion (2010), these authors suggest that although the population in developing countries seems to have been escaping poverty in recent decades, some of the households moving beyond the poverty threshold are still highly vulnerable and only marginally better off than their "poor" counterparts. In this context, the key element in defining the middle class would be how safe income-based middle-class households are from falling back into poverty.
In the case of Colombia, one of the limitations in applying the methodology proposed by Lopez-Calva and Ortiz-Juarez (2014) is that there is no publicly available longitudinal survey that allows mapping different probabilities of falling into poverty to specific levels of income or consumption. 13 Therefore, this paper takes the approach proposed by Hertova, Lopez-Calva and Ortiz-Juarez (2010). The authors use cross-section survey data to determine the vulnerable population; we adapt this methodology to find the level of comparable income associated with a 10 percent risk of falling into poverty (as suggested by Lopez-Calva and Ortiz-Juarez, 2014).
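The three stages described in footnote 11 can be illustrated with a deliberately simplified, one-covariate version: fit a logit of a 0/1 indicator of falling into poverty on a covariate x by Newton-Raphson, find the value x* at which the predicted probability equals 10 percent, and map x* to expected income through a linear regression of income on the same covariate. All names and data are hypothetical. The paper's specifications use several covariates, and how the falling-into-poverty indicator is built from cross-section data is not shown, so this sketches the mechanics only.

```python
import math
from typing import Sequence, Tuple


def fit_logit(x: Sequence[float], d: Sequence[int], iters: int = 50) -> Tuple[float, float]:
    """Newton-Raphson MLE for P(d = 1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += di - p          # score for the intercept
            g1 += (di - p) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: theta += I^{-1} g, with I the (2x2) information matrix.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def income_at_risk(x, d, y, risk: float = 0.10) -> Tuple[float, float]:
    """Return (x*, expected income at x*) where the predicted risk equals `risk`."""
    a, b = fit_logit(x, d)
    x_star = (math.log(risk / (1.0 - risk)) - a) / b
    c0, c1 = fit_ols(x, y)
    return x_star, c0 + c1 * x_star


# Hypothetical data: covariate, fell-into-poverty indicator, and income.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
d = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
y = [0.8, 0.9, 1.1, 1.6, 1.2, 1.9, 1.4, 2.3, 2.6, 3.0]
print(income_at_risk(x, d, y))
```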
Note that, in contrast with the regressions in Section 3, the set of control variables or observable characteristics is not limited to time-invariant characteristics. Table 4 shows the estimated vulnerability lines for four alternative specifications:
• Specification 1: only includes time-invariant characteristics of the household head (i.e., education level, year of birth and gender).
• Specification 2: adds a set of variables associated with labor market outcomes (i.e., employment status, sector of the economy, type of employment).
• Specification 3: adds household characteristics such as household size and access to basic services.
• Specification 4: adds to specification 3 additional controls for exposure to shocks, such as losing a job.
The results suggest that the vulnerability line is sensitive to the model specification. For instance, models controlling for a larger set of variables tend to produce lower income estimates associated with a 10 percent risk of falling into poverty. 14 A plausible explanation is that models including more control variables better capture the elements associated with the probability of falling back into poverty; since the model already controls for these elements, the level of income associated with a given risk of transition into poverty is lower.

Identifying the vulnerability line
Even though there is no objective method to identify the vulnerability line, the final choice seeks to be informed by the results of the previous exercises. The sensitivity analysis suggests that the vulnerability index converges to within 30-32 percent across time. This …
We would like to offer some further reflections about identifying the vulnerability line. This task depends to a large extent on the specific context of the country and on subjective judgment. 16 Thus it can be useful to combine details from the contextual background with findings from previous studies to construct the vulnerability line. This process helps ensure that different economic and societal factors are taken into account. Our discussion above did so and suggests that a vulnerability index of 30 percent could be appropriate for Colombia. We will also examine other robustness checks in future research, for example, by investigating mobility patterns when the vulnerability lines are varied. 17
It is also possible to break down the analysis by subgroups based on observable population characteristics such as gender and education level. When comparing welfare dynamics among male and female household heads, we observe that overall mobility is similar across genders and over time (Figures 4-7).
13 Recent advances in synthetic panel techniques, such as that of Bourguignon and Moreno (2015), may be applied to address this issue. Other approaches have been proposed that aim to construct some measure of income mobility by averaging the error terms of the household consumption model in some way (see, e.g., Stampini et al. (2016) and Lucchetti (2017)), but we would caution against such approaches since these studies do not offer an underlying theory that supports doing so.
14 We find the same patterns as Lopez-Calva and Ortiz-Juarez (2014).
However, female household heads are slightly less likely to escape poverty in every period than their …
In terms of welfare dynamics, the population with the highest level of education (tertiary) was significantly less mobile during the period of analysis (on average, approximately 70 percent of this population stayed in the same income category between pairs of years from 2008 to 2016). In addition, the populations with primary, middle school and secondary education showed similar levels of immobility (on average, 55 percent of households remained immobile across pairs of years). Finally, households whose heads had no education showed lower overall mobility (on average, almost 60 percent remained in the same income category between pairs of years).
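Immobility, upward mobility and downward mobility, the shares discussed here and in the introduction, can be read off a 3x3 matrix of joint transition probabilities: the diagonal gives immobility, cells above the diagonal give upward moves, and cells below it give downward moves. The sketch assumes rows (period 1) and columns (period 2) ordered poor, vulnerable, middle class; the function name is hypothetical and the matrix values are made up for illustration, not estimates from this paper.

```python
from typing import Sequence, Tuple


def mobility_summary(m: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (immobile, upward, downward) shares from a joint transition matrix
    whose categories are ordered from lowest to highest."""
    k = len(m)
    immobile = sum(m[i][i] for i in range(k))
    upward = sum(m[i][j] for i in range(k) for j in range(k) if j > i)
    downward = sum(m[i][j] for i in range(k) for j in range(k) if j < i)
    return immobile, upward, downward


m = [[0.20, 0.08, 0.01],
     [0.10, 0.25, 0.13],
     [0.02, 0.08, 0.13]]
immobile, upward, downward = mobility_summary(m)
print(f"{immobile:.2f} {upward:.2f} {downward:.2f}")  # 0.58 0.22 0.20
```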

Results: "Monetary welfare" dynamics
Figure 8 shows the rate at which the poor escaped poverty and moved into vulnerability.
It presents not only the overall rates but also rates by observable characteristics of the household head, such as gender, level of education and age. For instance, upward mobility from poverty 19 towards vulnerability is slightly higher among households headed by males than by females. More importantly, these rates increase with the level of education of the household head. In particular, households whose head has no education are substantially less likely to move up the ladder than any other education group. The same figure shows that the highest rates of escaping poverty occurred during the 2010-2012 interval. The results for households that fell into poverty mirror those above: the likelihood of falling into poverty is higher for households whose heads are less educated, female, or younger. 20 Finally, Figure 9 shows upward mobility from vulnerability to the middle class 21 by gender and education level of the household head. Similarly, households with a male head are slightly more likely to move from vulnerability into the middle class, and the rate of upward mobility from vulnerability to the middle class also increases with the education level of the household head.
Note also that no single period clearly stands out as one in which upward mobility from vulnerability into the middle class was consistently higher than in other periods.
19 Upward mobility from poverty to vulnerability is the ratio of the population who move out of poverty into vulnerability to the sum of the population who transition out of poverty (either to vulnerability or to the middle class) and the population who remain in poverty.
20 We do not discuss movements from poverty to the middle class, given the small sample size. Results are available upon request.
21 Upward mobility from vulnerability to the middle class is the ratio of the population who move out of vulnerability into the middle class to the sum of the population who transition out of vulnerability (either to poverty or to the middle class) and the population who remain in vulnerability.
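Footnotes 19 and 21 define the upward mobility rates as conditional probabilities: the joint probability of a given move divided by the share of the population starting in the origin category, which is the row sum of a 3x3 joint transition matrix. This contrasts with the joint probabilities that footnote 6 associates with the term poverty dynamics. A minimal sketch, assuming rows and columns ordered poor, vulnerable, middle class; the function name and matrix values are made up for illustration and are not estimates from this paper.

```python
from typing import Sequence

POOR, VULNERABLE, MIDDLE = 0, 1, 2


def conditional_rate(m: Sequence[Sequence[float]], origin: int, destination: int) -> float:
    """P(destination in period 2 | origin in period 1), computed from joint probabilities."""
    row_total = sum(m[origin])  # stayers plus all movers out of the origin category
    if row_total == 0:
        raise ValueError("no population in the origin category")
    return m[origin][destination] / row_total


m = [[0.20, 0.08, 0.01],
     [0.10, 0.25, 0.13],
     [0.02, 0.08, 0.13]]
print(round(conditional_rate(m, POOR, VULNERABLE), 4))    # 0.2759
print(round(conditional_rate(m, VULNERABLE, MIDDLE), 4))  # 0.2708
```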

Final remarks
This study contributes to filling the gap in the empirical literature currently limiting an evidence-

Education level
No education  6.6  6.9  6.9  6.9  7.1  6.9  6.7  6.6  6.6
a The vulnerability index is the share of the vulnerable population in the first period that becomes poor in the second period. The vulnerable population is understood here as those people with per capita income above the poverty line but below the corresponding vulnerability line.
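Written as a conditional probability, the vulnerability index in the note above is the chance of being poor in the second period given vulnerability in the first. As a sketch, write $z_j$ for the poverty line and $v_j$ for the vulnerability line in round $j$ ($v_j$ is notation assumed here, not the source's):

$$
VI = \Pr\left(y_{i2} < z_2 \mid z_1 \le y_{i1} < v_1\right) = \frac{\Pr\left(z_1 \le y_{i1} < v_1,\; y_{i2} < z_2\right)}{\Pr\left(z_1 \le y_{i1} < v_1\right)}
$$

The denominator is the entire first-period vulnerable population, so it includes those who stay vulnerable, those who fall into poverty, and those who move up to the middle class.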

Appendix A: Welfare dynamics with a poverty line
When there is only one poverty line of interest, $z_j$ (with both incomes, $y_{i1}$ and $y_{i2}$, and the poverty line expressed in real terms), it is possible to represent the four relevant states using a 2-by-2 matrix, such as the one depicted in Table A.1. However, when only repeated cross-sections are available, it is not straightforward to construct the transitions in Table A.1, since the same household cannot be observed in multiple periods, as it would be with panel data.
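For contrast, when genuine panel data are available, the four cells of a table like Table A.1 can be counted directly. A minimal sketch, assuming a hypothetical list of `(y1, y2)` real incomes for the same households and optional survey weights; the function and argument names are illustrative.

```python
from collections import defaultdict


def poverty_transition_matrix(panel, z1, z2, weights=None):
    """2x2 poverty transition shares computed from true panel data.

    panel:   sequence of (y1, y2) real incomes for the same household in rounds 1 and 2
    z1, z2:  poverty lines for rounds 1 and 2, in the same real units
    weights: optional survey weights (one per household; defaults to 1)
    Returns a dict keyed by (state in round 1, state in round 2).
    """
    if weights is None:
        weights = [1.0] * len(panel)
    cells = defaultdict(float)
    for (y1, y2), w in zip(panel, weights):
        s1 = "poor" if y1 < z1 else "non-poor"
        s2 = "poor" if y2 < z2 else "non-poor"
        cells[(s1, s2)] += w
    total = sum(cells.values())
    states = ("poor", "non-poor")
    # Normalize weighted counts into population shares for all four cells.
    return {(a, b): cells[(a, b)] / total for a in states for b in states}


households = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
print(poverty_transition_matrix(households, z1=2.0, z2=2.0))
```

With repeated cross-sections this direct count is impossible, because each `(y1, y2)` pair never appears together; that is the gap the synthetic-panel probabilities fill.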
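Then, for example, the percentage of households that are poor in the first round and non-poor in the second round could be estimated using the probability $\Pr(y_{i1} < z_1 \text{ and } y_{i2} > z_2)$. The main difficulty with repeated cross-sections is that the researcher cannot observe $y_{i1}$ and $y_{i2}$ for the same household. However, with income in round $j$ written as $y_{ij} = \beta_j' x_{ij} + \varepsilon_{ij}$, it is possible to express this probability as a function of the joint distribution of the error terms $\varepsilon_{i1}$ and $\varepsilon_{i2}$, which captures the correlation between the parts of household consumption in the two periods that are not explained by the household characteristics:

$$\Pr(y_{i1} < z_1 \text{ and } y_{i2} > z_2) = \Pr(\varepsilon_{i1} < z_1 - \beta_1' x_{i2} \text{ and } \varepsilon_{i2} > z_2 - \beta_2' x_{i2})$$

where $\beta_1' x_{i2}$ applies the first-round coefficients to the characteristics of the households observed in the second round. Importantly, this expression can be operationalized by assuming that the error terms follow a bivariate normal distribution and using $\Phi_2(\cdot)$ to denote the bivariate normal cumulative distribution function (cdf):

$$\Pr(y_{i1} < z_1 \text{ and } y_{i2} > z_2) = \Phi_2\left(\frac{z_1 - \beta_1' x_{i2}}{\sigma_{\varepsilon_1}}, -\frac{z_2 - \beta_2' x_{i2}}{\sigma_{\varepsilon_2}}, -\rho\right)$$

where $\sigma_{\varepsilon_j}$ is the standard deviation of $\varepsilon_{ij}$ and $\rho$ is the correlation coefficient between $\varepsilon_{i1}$ and $\varepsilon_{i2}$. A key element in the analysis of income mobility is therefore the estimation of $\rho$, which is likely to be non-negative, using one of the following alternatives:

i. First, the simplest case occurs if $\rho$ is known, since the estimation of the bivariate normal distribution then becomes relatively straightforward. However, this is not the typical case: the true value of $\rho$ is unknown in many contexts, as in Colombia, where panel data are not available.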
ii. Second, we can obtain upper and lower bounds of mobility by assuming minimum and maximum values for the correlation, for instance $\rho = 0$ and $\rho = 1$. In the first case ($\rho = 0$), the researcher implicitly assumes zero correlation between the error terms, so first-round income is predicted for each household $i$ in the second round by randomly drawing, with replacement, from the empirical distribution of the first-round estimated residuals. In the second case ($\rho = 1$), the implicit assumption is that the idiosyncratic shocks are perfectly and positively correlated, which adds more "persistence" or "stickiness" to the income vector.
iii. Third, we can identify a range of values for $\rho$ from a group of comparable countries with actual panel data.²² This method would also allow us to refine the upper and lower bounds that $\rho$ could take. Moreover, instead of a range of values, we could adopt a single value for $\rho$ based on information from comparable sources.
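iv. Finally, it is possible to obtain an approximation of $\rho$ based on asymptotic theory, following the approach proposed by Dang and Lanjouw (2016). The procedure involves aggregating all the variables at the cohort level and estimating the following cohort-level equation for each survey round $j = 1, 2$:

$$y_{cj} = \beta_j' x_{cj} + \varepsilon_{cj}$$

Then the partial correlation coefficient can be estimated as follows,

$$\rho = \frac{\rho_{y_{c1} y_{c2}} \sqrt{\operatorname{var}(y_{c1}) \operatorname{var}(y_{c2})} - \beta_1' \operatorname{var}(x_c)\, \beta_2}{\sigma_{\varepsilon_1} \sigma_{\varepsilon_2}}$$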
Notice that the estimates of $\beta_j$ correspond to the linear projection of household income (or consumption) on household characteristics aggregated at the cohort level for survey round $j = 1, 2$.

22 Lucchetti (2017)

Given the two relevant lines and three alternative states of interest, it is possible to represent mobility as a three-by-three matrix with nine potential scenarios of income mobility, such as the example in Table B.1. One useful property of the matrix in Table B.1 is that it allows income immobility to be established directly by summing the cells on the main diagonal (which correspond to the share of households that remain in the same state in the initial and final periods). Similar to the discussion in section 2.1, it is possible to express the transition probabilities as functions of the joint distribution of the error terms: (a) Poor-Poor: Poor-Middle Class: Vulnerable-Vulnerable:

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Each box presents the difference in the percentage of female household heads between two given years. As the grey becomes darker, the difference in the percentage of female household heads between the two years becomes more statistically significant. Source: Own estimations based on GEIH from 2008 to 2016.

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Each box presents the difference in the percentage of household heads without education between two given years.
As the grey becomes darker, the difference in the percentage of household heads without education between the two years becomes more statistically significant. Source: Own estimations based on GEIH from 2008 to 2016.

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Each box presents the difference in the percentage of household heads with primary education between two given years. One of the years is reported in the row, and the other in the column. As the grey becomes darker, the difference in the percentage of household heads with primary education between the two years becomes more statistically significant. Source: Own estimations based on GEIH from 2008 to 2016.

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Each box presents the difference in the percentage of household heads with middle school education between two given years. One of the years is reported in the row, and the other in the column. As the grey becomes darker, the difference in the percentage of household heads with middle school education between the two years becomes more statistically significant. Source: Own estimations based on GEIH from 2008 to 2016.

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Each box presents the difference in the percentage of household heads with secondary education between two given years. One of the years is reported in the row, and the other in the column. As the grey becomes darker, the difference in the percentage of household heads with secondary education between the two years becomes more statistically significant. Source: Own estimations based on GEIH from 2008 to 2016.

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Each box presents the difference in the percentage of household heads with tertiary education between two given years. One of the years is reported in the row, and the other in the column.
As the grey becomes darker, the difference in the percentage of household heads with tertiary education between the two years becomes more statistically significant. Source: Own estimations based on GEIH from 2008 to 2016.

Standard errors in parentheses. * p<0.1, ** p<0.05, *** p<0.01. Note: Results are restricted to the sample of households whose heads were born between 1948 and 1973. Labor market controls include employment status, sector of the economy, and type of employment. Household characteristics include household size and access to basic services such as water, electricity, and sewage. Exposure to shocks includes job loss and its cause. Source: Own estimations based on GEIH from 2008 to 2016.
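The bivariate normal expressions of the synthetic-panel approach (the poor-to-non-poor probability in Appendix A and the nine cells of the three-by-three matrix in Appendix B) can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not the authors' code: incomes and lines are assumed to be in logs, the predicted values $\beta_j' x_{i2}$ and the residual standard deviations $\sigma_{\varepsilon_j}$ are assumed already estimated, and the names `bivariate_normal_cdf`, `cell_probability`, and `transition_matrix`, as well as the Simpson-rule integration of $\Phi_2$, are our own choices. Each cell is obtained from $\Phi_2$ by inclusion-exclusion over the rectangle defined by the poverty line `z` and a vulnerability line `v`.

```python
import math


def norm_cdf(x):
    """Standard normal cdf; math.erf maps +/-inf to +/-1, so infinities work."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bivariate_normal_cdf(a, b, rho, n=2000):
    """Phi_2(a, b, rho) = P(U1 < a, U2 < b) for standard normals with correlation rho.

    Integrates phi(t) * Phi((b - rho t) / sqrt(1 - rho^2)) over t in (-inf, a],
    truncated to [-8, min(a, 8)], with composite Simpson's rule (n even).
    """
    if a == -math.inf or b == -math.inf:
        return 0.0
    if rho >= 1.0:  # perfectly positively correlated shocks
        return norm_cdf(min(a, b))
    if rho <= -1.0:  # perfectly negatively correlated shocks
        return max(0.0, norm_cdf(a) + norm_cdf(b) - 1.0)
    if n % 2:
        n += 1
    lower, upper = -8.0, min(a, 8.0)
    if upper <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (upper - lower) / n
    total = 0.0
    for k in range(n + 1):
        t = lower + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * norm_pdf(t) * norm_cdf((b - rho * t) / s)
    return total * h / 3.0


def cell_probability(xb1, xb2, s1, s2, rho, lo1, hi1, lo2, hi2):
    """P(lo1 <= y1 < hi1 and lo2 <= y2 < hi2) for one household,
    assuming y_j = xb_j + e_j with (e_1/s1, e_2/s2) standard bivariate normal."""
    A1, B1 = (hi1 - xb1) / s1, (lo1 - xb1) / s1
    A2, B2 = (hi2 - xb2) / s2, (lo2 - xb2) / s2

    def F(u, w):
        return bivariate_normal_cdf(u, w, rho)

    # Inclusion-exclusion over the four corners of the rectangle.
    return F(A1, A2) - F(A1, B2) - F(B1, A2) + F(B1, B2)


def transition_matrix(xb1, xb2, s1, s2, rho, z, v):
    """3x3 matrix: rows = round-1 state, columns = round-2 state
    (poor, vulnerable, middle class), given poverty line z < vulnerability line v."""
    cuts = [-math.inf, z, v, math.inf]
    return [
        [cell_probability(xb1, xb2, s1, s2, rho,
                          cuts[i], cuts[i + 1], cuts[j], cuts[j + 1])
         for j in range(3)]
        for i in range(3)
    ]


# Hypothetical household (illustrative log values only), swept over rho bounds.
for rho in (0.0, 0.5, 1.0):
    m = transition_matrix(xb1=1.0, xb2=1.2, s1=0.6, s2=0.6, rho=rho, z=0.8, v=1.6)
    print(rho, round(m[0][1] + m[0][2], 4))  # poor in round 1, non-poor in round 2
```

Setting `rho` to 0 and 1 reproduces the two bounds of alternative ii, and the poor-to-non-poor probability equals $\Phi_2$ evaluated at the arguments in Appendix A. Population-level shares would then be (weighted) averages of these household-level probabilities over the second-round sample.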