Policy Research Working Paper 10147

Recovering Income Distribution in the Presence of Interval-Censored Data

Gustavo Canavire-Bacarreza
Fernando Rios-Avila
Flavia Sacco-Capurro

Poverty and Equity Global Practice
August 2022

Abstract

This paper proposes a method to analyze interval-censored data, using multiple imputation based on a heteroskedastic interval regression approach. The proposed model aims to obtain a synthetic data set that can be used for standard analysis, including standard linear regression, quantile regression, or poverty and inequality estimation. The paper presents two applications to show the performance of the method. First, it runs a Monte Carlo simulation to show the method's performance under the assumption of multiplicative heteroskedasticity, with and without conditional normality. Second, it uses the proposed methodology to analyze labor income data in Grenada for 2013-20, where the salary data are interval-censored according to the salary intervals prespecified in the survey questionnaire. The results obtained are consistent across both exercises.

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at gcanavire@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Recovering Income Distribution in the Presence of Interval-Censored Data*

Gustavo Canavire-Bacarreza†, Fernando Rios-Avila‡, Flavia Sacco-Capurro§

Keywords: interval-censored data, Monte Carlo simulation, heteroskedastic interval regression, wages
JEL Codes: C150, C340, J3

* The findings, interpretations, and conclusions expressed in this paper do not necessarily reflect the views of the World Bank, the Executive Directors of the World Bank or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. The authors would like to thank Ximena del Carpio, Leonardo Luchetti, Carlos Ospino, Daniel Mahler and the participants at the 5th IZA Labor Statistics Workshop: The Measurement of Incomes, Living Costs and Standards of Living and the 2022 Stata Conference for helpful comments and suggestions.
† The World Bank, gcanavire@worldbank.org
‡ Levy Institute at Bard College, f.rios.a@gmail.com
§ The World Bank, fsaccocapurro@worldbank.org

1. Introduction

Labor force surveys are a useful data source to understand employment dynamics in both developing and developed countries. These surveys provide vast information on labor market status at a higher frequency than living conditions surveys.
In some cases, they are the only source of information to describe and examine the structure of labor markets. In the Latin America and Caribbean region, countries like Bolivia, Costa Rica, Ecuador, Jamaica, Mexico, Peru, and Uruguay collect their labor force surveys quarterly, as opposed to the yearly basis of most household and living standards surveys.

One of the key features of these labor surveys is that they provide information on the wages and salaries of workers. This makes it possible to estimate job market trends and obtain inequality measures of labor income among workers. However, the full income distribution in many countries cannot be retrieved because labor income is reported in brackets. Because of this, the estimation of inequality or poverty measures, as well as regression-type analysis, is difficult. This is the case of the labor force survey for all countries in the Organization of Eastern Caribbean States (OECS). This is not unique to the Caribbean region. Countries like Colombia, Germany, Australia, New Zealand, Bosnia and Herzegovina, North Macedonia, and Serbia, among others, have similar data collection protocols for their microcensus (Walter and Weimer 2018). In the U.S., the Current Population Survey (CPS) collects detailed family income only once a year, in the March supplement, but collects family income in brackets on a monthly basis.

One argument in favor of using interval-censored questions to collect information on income is the higher response rate compared to questions that ask respondents to report exact amounts (Wang et al., 2013). This happens because income information is considered "sensitive": people are reluctant to report actual earnings and may choose not to respond to those questions at all (Moore et al., 2000; Hagenaars and De Vos, 1988). Field tests conducted in the past have shown that asking follow-up income questions in a series of unfolding brackets achieves superior results in terms of response rates for income amounts, as was the case of the National Health Interview Survey (NHIS) and the Behavioral Risk Factor Surveillance System Survey (BRFSS), both administered by the Centers for Disease Control and Prevention of the United States (Angelov and Ekström 2018, Yan et al. 2018). However, even though this form of data collection solves the problem of underreporting or misreporting, it raises a problem for recovering the full wage (income) distribution, which is key to understanding and analyzing inequality.

To better use the information from these types of surveys, we propose an imputation approach to simulate the distribution of the data that is only available in brackets. The method extends the imputation approach described in Royston (2007) by considering heteroskedastic errors to model the conditional distribution of the censored data. The estimated conditional distribution is then used to impute the data using random draws from it. Once the imputed data are obtained, standard aggregation methods (Rubin, 1987) can be used to analyze the censored data as if it were fully observed. For example, the imputed data can be used to calculate poverty or inequality measures, as well as to perform regression analysis. To demonstrate the flexibility of this approach, we use a Monte Carlo simulation to analyze the sensitivity of our method. As an empirical example, we use the approach to analyze wage inequality in Grenada utilizing the country's Labor Force Survey.
Other approaches exist in the literature and have been used to analyze this kind of data. To measure income inequality with right-censored (top-coded) data, Jenkins et al. (2011) propose multiple-imputation methods for estimation and inference, where censored observations are imputed using draws from a flexible parametric model fitted to the censored distribution, such as the generalized beta of the second kind (GB2), Singh-Maddala, or Dagum distributions. Chen (2017) provides a generalized approach for the estimation of parametric income distributions using grouped data, showing its consistency through complementary simulation results. More recently, Walter and Weimer (2018) propose an iterative kernel density algorithm that generates pseudo samples from the interval-censored income variable to estimate poverty and inequality indicators. While the interval regression approach we propose fits with the models described in Chen (2017), Jenkins et al. (2011), and Walter and Weimer (2018), these papers focus on recovering the unconditional distribution of income, without considering the relationship with explanatory variables.

Zhou et al. (2017) and Hsu et al. (2021) propose methodologies for the estimation of conditional quantile regressions using interval-censored data under different distributional assumptions. While these approaches can be used to analyze interval-censored data, they focus only on estimating conditional quantile regressions and require specialized software that is not readily available. In contrast, the method we propose can be applied not only to the estimation of conditional quantile regressions, but also to the estimation of unconditional distribution statistics.

Other studies, like the one proposed by Han et al. (2020), construct new measures of the income distribution and estimate poverty in the U.S. using data from the monthly Current Population Survey (CPS). They address the problem of censored income data using draws from the empirical income distribution observed in the last March supplement. A similar method is proposed by Parolin and Wimer (2020), who produce monthly updates of the Supplemental Poverty Measure (SPM) rates with demographic data from the CPS and poverty data from the previous March supplement of the CPS. However, these studies seek to obtain income estimates using the uncensored distribution of previous years, which is not always available with other data sources, like the ones analyzed in this paper. Büttner and Rässler (2008) propose a multiple imputation approach, similar to ours, to analyze wages from the German Institute for Employment Research (IAB) employment sample. While their method focuses on the analysis of top-coded data, we expand the approach to analyze data with a more general censoring structure.

The paper is organized as follows. Section 2 introduces the model and the econometric issues associated with the imputation method; Section 3 provides a Monte Carlo simulation exercise to analyze the performance of the methodology; Section 4 uses the methodology to analyze labor income distribution changes in Grenada using the 2013-2020 series of the Labor Force Survey. Section 5 concludes.

2. Methodology

To address the problem of interval-censored data, we propose a multiple imputation approach based on a heteroskedastic interval-regression model.
An interval-regression model is a generalization of the Tobit model that allows using a mixture of censored and completely observed data, even if the censoring thresholds are unique to each individual. The goal of the model is to find a set of parameters that maximizes the probability that, given a set of characteristics, the predicted latent earnings fall within the declared earnings thresholds. Imputations are obtained using random draws from the estimated conditional distributions.

2.1. Interval regression model

Assume that (log) earned income $y_i$ has a data generating process such that:

$$y_i = \mu(x_i) + \sigma(x_i)\,e_i \qquad (1)$$

where $e_i$ is a homoskedastic i.i.d. error, with mean 0 and standard deviation 1, that is independent of the characteristics $x_i$. $\mu(x_i)$ and $\sigma(x_i)$ are flexible functions of $x_i$: $\mu(x_i)$ represents the conditional mean of $y_i$, and $\sigma(x_i)$ is a strictly positive function that represents the conditional standard deviation of $y_i$. Following Machado and Santos-Silva (2019), the conditional mean $\mu(x_i)$ captures the location shift effects of characteristics on the outcome, whereas $\sigma(x_i)$ captures the scale shifts, which relate to how much of the spread is explained by differences in characteristics. Following the standard setup of interval-regression models (Stewart, 1983), we impose the assumption that $e_i$ follows a standard normal distribution, so that $y_i \mid x_i$ is also normally distributed with mean $\mu(x_i)$ and standard deviation $\sigma(x_i)$:¹

$$e_i \sim N(0,1) \;\;\Rightarrow\;\; y_i \mid x_i \sim N\big(\mu(x_i),\,\sigma(x_i)\big) \qquad (2)$$

Under this assumption, equation (1) can be estimated via maximum likelihood by maximizing the following function:

$$L_i\big(\mu(x_i),\sigma(x_i)\big) = f\big(y_i \mid \mu(x_i),\sigma(x_i)\big) = \frac{1}{\sigma(x_i)}\,\phi\!\left(\frac{y_i - \mu(x_i)}{\sigma(x_i)}\right) \qquad (3a)$$

$$\hat\mu(x),\,\hat\sigma(x) = \arg\max \sum_{i=1}^{N} \log(L_i) \qquad (3b)$$

¹ While this assumption is unnecessary for the estimation of standard linear regression models, imposing some distributional assumption on the errors is necessary when estimating models via maximum likelihood. Nevertheless, as described in McDonald, Stoddard and Walton (2018), it is possible to relax this assumption using more flexible distributions.

Under these conditions, and assuming a flexible enough model specification to capture the conditional mean and conditional variance, estimating equation (1) allows us to recover the whole distribution of the dependent variable $y$. When $y_i$ is fully observed, this variable can be directly used for estimating any measure of poverty or inequality, or to analyze the relationship between observed characteristics and the outcome $y$, using standard statistical methods.

Often, however, due to survey design, one may only have access to data reported in brackets. In other words, rather than observing $y_i$, one may only observe that the income reported by individual $i$ lies between some lower ($y_i^l$) and upper ($y_i^u$) thresholds, which may be different for each individual. In this case, unless $y_i^l = y_i^u$, the likelihood function defined by equations (3a) and (3b) is not defined. An alternative for estimating a model with this type of data is the use of what is known as interval regression. Interval regression is a generalization of censored regression estimators like the Tobit model (see Cameron and Trivedi (2010, ch. 16) for a discussion of censored regressions), where the data can be a mixture of left-censored, right-censored, interval-censored, or fully observed observations. For simplicity, we refer to the case with interval-censored data. When the data is interval-censored, rather than modeling the outcome itself, the approach focuses on modeling the probability that an individual reports income to be within the underlying income brackets:

$$P\big(y_i^l \le y_i < y_i^u \mid x_i\big) \qquad (4)$$
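To fix ideas, the following is a minimal sketch, written in Python rather than the Stata implementation used in the paper, of the data structure the model is meant to handle: a latent outcome generated as in equation (1) of which only the bracket $[y_i^l, y_i^u)$ containing it is recorded. The functional forms, coefficients, and bracket cut-offs are illustrative assumptions, not those of any particular survey.

```python
# Sketch: simulate the d.g.p. in equation (1) and keep only the bracket
# (y_lo, y_hi) that a survey questionnaire would record.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

x1 = rng.binomial(1, 0.5, n)                 # illustrative covariates
x2 = rng.chisquare(5, n) / 5
e = rng.standard_normal(n)                   # homoskedastic error, mean 0, sd 1

mu = 1.0 + 1.0 * x1 + 1.0 * x2               # mu(x): conditional mean
sigma = np.exp(-0.5 + 0.5 * x1 + 0.2 * x2)   # sigma(x) > 0: conditional sd
y = mu + sigma * e                           # latent (log) earnings, eq. (1)

# Interval censoring: only the bracket containing y is observed.
cuts = np.array([-np.inf, 0.0, 1.0, 2.0, 3.0, 4.0, np.inf])  # hypothetical cut-offs
idx = np.searchsorted(cuts, y, side="right") - 1
y_lo, y_hi = cuts[idx], cuts[idx + 1]        # observed thresholds y^l, y^u
```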
Using the data generating process (d.g.p.) defined by equation (1) and the normality assumption on the error $e_i$, equation (4) can be rewritten as:

$$P\big(y_i^l \le y_i < y_i^u \mid x_i\big) = P\!\left(e_i < \frac{y_i^u - \mu(x_i)}{\sigma(x_i)}\right) - P\!\left(e_i < \frac{y_i^l - \mu(x_i)}{\sigma(x_i)}\right) \qquad (5a)$$

$$= \Phi\!\left(\frac{y_i^u - \mu(x_i)}{\sigma(x_i)}\right) - \Phi\!\left(\frac{y_i^l - \mu(x_i)}{\sigma(x_i)}\right) \qquad (5b)$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Using equation (5b), the log-likelihood function that is maximized to identify the parameters $\mu(x)$ and $\sigma(x)$ is defined as:

$$L_i\big(\mu(x_i),\sigma(x_i)\big) = \Phi\!\left(\frac{y_i^u - \mu(x_i)}{\sigma(x_i)}\right) - \Phi\!\left(\frac{y_i^l - \mu(x_i)}{\sigma(x_i)}\right) \quad \text{if the data is interval-censored} \qquad (6a)$$

$$L_i\big(\mu(x_i),\sigma(x_i)\big) = \Phi\!\left(\frac{y_i^u - \mu(x_i)}{\sigma(x_i)}\right) \quad \text{if the data is left-censored} \qquad (6b)$$

$$L_i\big(\mu(x_i),\sigma(x_i)\big) = 1 - \Phi\!\left(\frac{y_i^l - \mu(x_i)}{\sigma(x_i)}\right) \quad \text{if the data is right-censored} \qquad (6c)$$

$$L_i\big(\mu(x_i),\sigma(x_i)\big) = \frac{1}{\sigma(x_i)}\,\phi\!\left(\frac{y_i - \mu(x_i)}{\sigma(x_i)}\right) \quad \text{if the data is fully observed} \qquad (6d)$$

which can be used to obtain estimates for $\mu(x)$ and $\sigma(x)$ using maximum likelihood estimation.

2.2. Model imputation

As previously described, when dealing with interval-censored data, we have limited access to the observed distribution of the variable of interest. This is in contrast with standard multiple imputation analysis, where the variable of interest is fully unobserved. This distinction has implications for the imputation strategy because it determines the appropriate draw of the imputed error.

Consider the d.g.p. stated in equation (1) and define $y_i^*$ to be the true but unobserved variable of interest. By definition, if the data is interval-censored, the range of values that can potentially be used to impute $y_i^*$ is bounded between the lower and upper thresholds of a given interval. In addition, conditional on the observed characteristics $x_i$ and the parameters $\mu(x_i)$ and $\sigma(x_i)$, this implies that the unobserved error $e_i^*$ is also bounded:

$$e_i^* \in \left[\frac{y_i^l - \mu(x_i)}{\sigma(x_i)},\; \frac{y_i^u - \mu(x_i)}{\sigma(x_i)}\right) \qquad (7)$$

Furthermore, under the assumption that $e_i$ follows a standard normal distribution, we can impute values for $e_i^*$ by simply taking random draws $\tilde e_i$ from a truncated standard normal distribution:

$$\tilde e_i = \Phi^{-1}(\tilde u_i), \quad \text{where } \tilde u_i \sim U\!\left(\Phi\!\left(\frac{y_i^l - \mu(x_i)}{\sigma(x_i)}\right),\; \Phi\!\left(\frac{y_i^u - \mu(x_i)}{\sigma(x_i)}\right)\right) \qquad (8)$$

where $\Phi^{-1}(\tilde u_i)$ corresponds to the $\tilde u_i$-th quantile of the standard normal distribution. Finally, the imputed value for the outcome of interest $y_i^*$ is given by:

$$\tilde y_i = \mu(x_i) + \tilde e_i\,\sigma(x_i) \qquad (9)$$

Because the population parameters $\mu(x_i)$ and $\sigma(x_i)$ are unknown, we use the sample equivalents that are estimated using the interval regression estimator via maximum likelihood.² To account for the uncertainty of the regression estimation, we obtain random draws from the following joint normal distribution:

$$\begin{pmatrix} \tilde\beta_\mu \\ \tilde\beta_\sigma \end{pmatrix} \sim N\!\left( \begin{pmatrix} \hat\beta_\mu \\ \hat\beta_\sigma \end{pmatrix},\; \Omega^* \right); \qquad \Omega^* = \hat\Omega\,\frac{N}{\tilde\chi^2}; \qquad \tilde\chi^2 \sim \chi^2_N \qquad (10)$$

where $\hat\Omega$ is the ML variance-covariance matrix estimate, $N$ is the number of observations in the sample, and $\tilde\chi^2$ is a random draw from a chi-squared distribution with $N$ degrees of freedom. Finally, the imputation for $y_i^*$ will be given by:

$$\tilde y_i = \tilde\mu(x_i) + \tilde e_i\,\tilde\sigma(x_i) \qquad (11a)$$

$$\tilde e_i = \Phi^{-1}(\tilde u_i), \quad \text{where } \tilde u_i \sim U\!\left(\Phi\!\left(\frac{y_i^l - \tilde\mu(x_i)}{\tilde\sigma(x_i)}\right),\; \Phi\!\left(\frac{y_i^u - \tilde\mu(x_i)}{\tilde\sigma(x_i)}\right)\right) \qquad (11b)$$

where $\tilde\mu(x_i)$ and $\tilde\sigma(x_i)$, constructed from the parameter draws in equation (10), are used in (11a) and (11b) instead of $\hat\mu(x_i)$ and $\hat\sigma(x_i)$ to account for the role of the estimated parameters on the error $\tilde e_i$.

² For numerical purposes, it is also important to emphasize that $\sigma(x_i)$ is not estimated directly; $\ln\sigma(x_i)$ is estimated instead.

In summary, the imputation algorithm is as follows (a sketch of these steps appears after the list):

1. Estimate the parameters associated with $\mu(x)$ and $\sigma(x)$ using a heteroskedastic interval regression approach via maximum likelihood, as well as the variance-covariance matrix $\hat\Omega$.
2. Obtain $\tilde\chi^2$ from a random draw from $\chi^2_N$, and estimate $\Omega^*$.
3. Obtain a random draw for $\tilde\beta_\mu$ and $\tilde\beta_\sigma$ from $N\big((\hat\beta_\mu,\hat\beta_\sigma),\,\Omega^*\big)$.
4. Obtain random draws for $\tilde e_i$, for each observation $i$, conditional on $\tilde\mu(x_i)$ and $\tilde\sigma(x_i)$.
5. Get the full sample of imputed data $\tilde y_i$.
6. Repeat steps 2-5 M times and obtain M sets of imputed samples.

Steps 2-4 correspond to simulating from the posterior distribution, similar to what is described in Gelman et al. (2014).
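The following sketch translates steps 1-6 into Python for a linear index in $\mu(x)$ and in $\ln\sigma(x)$. It is illustrative only: the paper's implementation is the Stata command -intreg_mi-, the optimizer's inverse Hessian is used here as a stand-in for the ML variance-covariance matrix, and fully observed points (equation 6d) are omitted for brevity.

```python
# Illustrative sketch of the imputation algorithm (steps 1-6); assumes a
# linear index for mu(x) and for ln sigma(x). Not the authors' -intreg_mi-.
import numpy as np
from scipy import optimize, stats

def neg_loglik(theta, X, y_lo, y_hi):
    """Negative log-likelihood of the heteroskedastic interval regression
    (equations 6a-6c); y_lo / y_hi may be -inf / +inf for open brackets."""
    k = X.shape[1]
    mu = X @ theta[:k]
    sigma = np.exp(X @ theta[k:])                # ln sigma(x) is modeled, so sigma > 0
    p = (stats.norm.cdf((y_hi - mu) / sigma)
         - stats.norm.cdf((y_lo - mu) / sigma))
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

def impute(X, y_lo, y_hi, M=20, seed=1):
    """Return M imputed samples of the interval-censored outcome."""
    rng = np.random.default_rng(seed)
    n, k = X.shape

    # Step 1: ML estimates and (approximate) variance-covariance matrix
    res = optimize.minimize(neg_loglik, np.zeros(2 * k),
                            args=(X, y_lo, y_hi), method="BFGS")
    theta_hat, Omega = res.x, np.asarray(res.hess_inv)

    samples = []
    for _ in range(M):
        # Step 2: rescale the vcov with a chi-squared draw (equation 10)
        Omega_star = Omega * n / rng.chisquare(n)
        # Step 3: draw the parameter vector
        theta_m = rng.multivariate_normal(theta_hat, Omega_star)
        mu_m = X @ theta_m[:k]
        sigma_m = np.exp(X @ theta_m[k:])
        # Step 4: truncated-normal error draws (equations 8 and 11b)
        u_lo = stats.norm.cdf((y_lo - mu_m) / sigma_m)
        u_hi = stats.norm.cdf((y_hi - mu_m) / sigma_m)
        u = np.clip(rng.uniform(u_lo, u_hi), 1e-10, 1 - 1e-10)
        e_m = stats.norm.ppf(u)
        # Step 5: imputed outcome (equation 11a)
        samples.append(mu_m + sigma_m * e_m)
    # Step 6: M imputed samples
    return samples
```

With the simulated data from the earlier sketch, `impute(np.column_stack([np.ones(n), x1, x2]), y_lo, y_hi)` would return a list of M arrays, each a complete synthetic version of the censored outcome.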
2.3. Model estimation and inference

Once the M imputed data sets have been obtained, statistical analysis can be done by independently implementing the desired model estimation across all M imputed samples. The aggregation and summary of the M estimated models can then be done by applying the combination rules described in Rubin (1987).

Let $\theta$ be the set of parameters of interest, and let $\hat\theta_m$ and $\hat V_m$ be the set of estimated coefficients and the corresponding variance-covariance matrix obtained using simulated sample $m$. The multiple imputation estimate $\hat\theta_{MI}$ for the parameter of interest is given by:

$$\hat\theta_{MI} = \frac{1}{M}\sum_{m=1}^{M}\hat\theta_m \qquad (13)$$

whereas the variance-covariance estimate $\hat V_{MI}$ is given by:

$$\hat V_{MI} = \frac{1}{M}\sum_{m=1}^{M}\hat V_m + \frac{M+1}{M}\,\frac{1}{M-1}\sum_{m=1}^{M}\big(\hat\theta_m - \hat\theta_{MI}\big)\big(\hat\theta_m - \hat\theta_{MI}\big)' \qquad (14)$$

3. Simulation studies

3.1. Setup

We examine the performance of our proposed estimator under several simulation scenarios, using data structures with explicit multiplicative heteroskedasticity, similar to the ones proposed in Machado and Santos-Silva (2019), and with a varying coefficient model structure, as in Hsu, Wen and Chen (2021). In both cases, the goal is to simulate data that would show heterogeneity when using conditional quantile regressions for the estimation. This structure is flexible enough to also allow the estimation of other distribution-based regressions, such as unconditional quantile regressions (Firpo, Fortin and Lemieux, 2009) and recentered influence function (RIF) regressions in general (Rios-Avila, 2020).

The first set of simulations is designed to study the performance of the estimator under the assumption of multiplicative heteroskedasticity, assuming the following functional form:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i\,\sigma(x_{1i}, x_{2i}) \qquad (15)$$

where $x_1 \sim Bernoulli(0.5)$ and $x_2 \sim \chi^2_5/5$. Following Machado and Santos-Silva (2019), we use two different functional forms for $\sigma(x_1, x_2)$:

$$\sigma_1(x_1, x_2) = \gamma_0 + \gamma_1 x_1 + \gamma_2 x_2 \qquad (16a)$$

$$\sigma_2(x_1, x_2) = e^{\,\gamma_0 + \gamma_1 x_1 + \gamma_2 x_2} \qquad (16b)$$

In both cases, we require $\sigma(x_1, x_2)$ to be strictly positive. The first case, equation (16a), imposes the assumption of linear heteroskedasticity and provides a closed-form solution for the corresponding quantile coefficients. The second option, equation (16b), guarantees the standard deviation to be strictly positive, but does not have a closed-form solution for the corresponding conditional quantile regression coefficients. As described in Machado and Santos-Silva (2019), this data generating process also guarantees that quantiles will not cross, and thus the corresponding coefficients can be estimated directly using standard conditional quantile regression estimators.

Using this data structure, we consider four different distributions for the error $e_i$: normal, logistic, chi-squared with 5 degrees of freedom, and uniform. All of them were adjusted to have mean 0 and standard deviation 1. Whereas the first two distributions are meant to show how sensitive the estimator is to the normality assumption, the third and fourth aim to show how sensitive the results are to cases where the error has a skewed distribution or a distribution with limited range. With these considerations, the data generating processes are defined as:

$$y_i = x_{1i} + x_{2i} + e_i\,(1 - 0.5\,x_{1i} + 0.2\,x_{2i}) \qquad (17a)$$

$$y_i = x_{1i} + x_{2i} + e_i\,e^{\,0.6 - 0.5\,x_{1i} + 0.2\,x_{2i}} \qquad (17b)$$
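As a reference for how this design translates into data, the following is a small sketch of the d.g.p. in equations (17a) and (17b), with the four error distributions standardized to mean 0 and standard deviation 1. It is written in Python for illustration, whereas the simulations in the paper were run in Stata, and the constants reflect the equations as reconstructed above.

```python
# Sketch of the first simulation design: equations (17a)/(17b) with four
# standardized error distributions (mean 0, standard deviation 1).
import numpy as np

rng = np.random.default_rng(2022)
n = 1_000

x1 = rng.binomial(1, 0.5, n)
x2 = rng.chisquare(5, n) / 5

def draw_error(kind):
    """Standardized homoskedastic errors used in the simulations."""
    if kind == "normal":
        return rng.standard_normal(n)
    if kind == "logistic":
        return rng.logistic(0, np.sqrt(3) / np.pi, n)     # sd of logistic is scale*pi/sqrt(3)
    if kind == "chi2":
        return (rng.chisquare(5, n) - 5) / np.sqrt(10)    # chi2(5): mean 5, variance 10
    if kind == "uniform":
        return rng.uniform(-np.sqrt(3), np.sqrt(3), n)    # variance (b-a)^2/12 = 1
    raise ValueError(kind)

e = draw_error("normal")
y_linear = x1 + x2 + e * (1 - 0.5 * x1 + 0.2 * x2)          # equation (17a)
y_expon = x1 + x2 + e * np.exp(0.6 - 0.5 * x1 + 0.2 * x2)   # equation (17b)
```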
The second set of simulations uses a data generating process following a varying coefficient approach, based on the percentile an observation belongs to. In this setup, we assume that $\tau_i$ is defined by a random draw from a uniform distribution, and that $y_i$ is given by:

$$y_i = \beta_0(\tau_i) + \beta_1(\tau_i)\,x_{1i} + \beta_2(\tau_i)\,x_{2i} \qquad (18)$$

Following Hsu et al. (2021), the coefficients $\beta(\tau)$ are defined as:

$$\beta_0(\tau) = 1 + 0.5\,\Phi^{-1}(\tau); \quad \beta_1(\tau) = 0.4 + 1.2\,\Phi^{-1}(\tau); \quad \beta_2(\tau) = 0.6 + 0.5\,\Phi^{-1}(\tau) \qquad (19a)$$

$$\beta_0(\tau) = \beta_1(\tau) = \beta_2(\tau) = 0.5\,\big(1 + \Phi^{-1}(\tau) - \log(1-\tau)\big) \qquad (19b)$$

Equation (19a) imposes a structure that is similar to the multiplicative normality under linear heteroskedasticity (equation 17a), whereas equation (19b) imposes a skewed conditional distribution of the outcome.

In all scenarios, we assume that the data is subject to interval censoring, such that $y_i^l = \lfloor y_i \rfloor$ and $y_i^u = \lceil y_i \rceil$, where $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ represent the nearest integers below and above $y_i$, respectively. In addition, we also assume that if $y_i < -1$ or $y_i > 10$, the lower or upper threshold, respectively, will be undefined.

For the implementation and analysis, we use 2,500 replications, with a sample size of 1,000 observations. Replications using sample sizes of 500 and 2,000 are provided in the appendix, with results that are qualitatively similar. We focus on the comparison of conditional quantile regressions for the 10th, 50th and 90th quantiles, as well as for the 10th, 50th and 90th unconditional quantiles. Quantile regressions were estimated using the fast algorithm developed in Chernozhukov et al. (2022) and implemented via the Stata command -qrprocess-, whereas the unconditional quantile regressions were estimated following Firpo, Fortin and Lemieux (2009) and implemented via the Stata command -rifhdreg- (Rios-Avila, 2020). The simulation was implemented using -parallel- (Vega Yon and Quistorff, 2019). Finally, our imputation method is implemented with a new user-written program, -intreg_mi-, which is available upon request.

While population parameters for the conditional quantile regressions exist for some of the data generating processes, there are no closed-form solutions for the population parameters corresponding to the RIF regressions. Because of this, we take the average estimates using fully observed data to represent the population parameters for the calculation of the relevant statistics. Thus, by construction, the bias of the model estimations using fully observed data is zero.

3.2. Results

Tables 1 to 3 provide a summary of the results for the Monte Carlo simulations using the different data generating processes. In each table, we present the bias of our imputation procedure, treating the average parameters obtained with fully observed data as if they were the asymptotic population parameters. For both conditional and unconditional quantile regressions, the bias observed using the multiple imputation data is small when the homoskedastic error is assumed to follow a symmetric bell-shaped distribution, regardless of the type of heteroskedasticity implied by the d.g.p. When the errors follow a chi-squared or uniform distribution, we observe some bias, especially for the lower quantile coefficients. The bias, however, is considerably smaller if the data generating process assumes a functional form with exponential heteroskedasticity. Finally, the results using the varying coefficient structure reveal low bias in both cases. In all simulations, the bias magnitude did not depend on the sample size (see appendix).

In terms of the mean absolute error (MAE), we present the ratio between the MAE for the imputed data and the MAE for the fully observed data.
Except for cases when the bias is large, the MAE for the imputed data is somewhat smaller than the one using fully observed data, by almost 10%. It is possible that this gain in the precision of the point estimate may be simulation specific, since we are indirectly using parametric structures for the estimation of the quantile regressions. In terms of standard errors ratio, which compares the average standard errors of the imputed data to fully observed data, we observe that the standard errors for imputed data are about 15% larger on average, than the standard errors based on fully observed data. This is expected given the information loss due to the nature of the interval censored data. Table 1. Monte Carlo Simulation: N=1000, Linear Heteroskedasticity ~normal ~logistic ~Chi2 ~uniform = + ∗ MAE StErr MAE StErr MAE StErr MAE StErr TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 2.011 0.008 -0.014 0.235 1.955 -0.030 -0.094 0.156 1.848 0.321 3.886 0.915 2.094 0.258 2.073 0.690 CQR-Q10 x2 0.798 0.002 -0.030 0.155 0.803 -0.006 -0.063 0.134 0.831 0.123 1.490 0.377 0.786 0.052 0.293 0.296 cons -2.381 -0.008 0.006 0.292 -2.247 0.022 -0.084 0.207 -1.989 -0.471 5.425 1.035 -2.572 -0.264 1.946 0.749 x1 1.000 0.001 -0.055 0.070 1.003 -0.001 -0.056 0.086 1.168 0.020 -0.029 0.094 0.996 0.004 -0.053 0.043 CQR-Q50 x2 1.001 -0.002 -0.055 0.071 0.997 -0.001 -0.025 0.088 0.969 0.000 -0.048 0.084 0.999 0.001 -0.074 0.041 cons -0.001 0.002 -0.042 0.072 -0.002 0.002 -0.060 0.088 -0.385 0.023 -0.032 0.096 0.005 -0.005 -0.042 0.045 x1 -0.009 0.000 -0.049 0.103 0.041 0.005 -0.066 0.096 -0.051 -0.011 -0.051 0.048 -0.097 -0.008 -0.019 0.146 CQR-Q90 x2 1.199 0.001 -0.041 0.110 1.191 0.000 -0.067 0.109 1.215 -0.004 -0.054 0.047 1.216 -0.001 0.024 0.145 cons 2.383 -0.001 -0.059 0.099 2.252 0.007 -0.061 0.095 2.486 -0.003 -0.048 0.046 2.573 -0.042 0.057 0.141 x1 2.097 0.008 -0.005 0.159 1.915 -0.021 -0.065 0.114 1.840 0.188 0.903 0.287 2.539 0.236 0.760 0.459 UQR-Q10 x2 0.611 0.001 -0.021 0.060 0.602 -0.008 -0.067 0.047 0.684 0.079 0.312 0.252 0.651 0.027 0.075 0.191 cons -2.537 -0.006 0.006 0.130 -2.342 0.016 -0.063 0.108 -2.370 -0.174 0.557 0.332 -3.046 -0.163 0.352 0.401 x1 1.006 0.000 -0.083 0.130 1.026 -0.007 -0.059 0.159 1.165 0.039 -0.011 0.161 0.945 0.012 -0.070 0.066 UQR-Q50 x2 0.929 0.000 -0.058 0.110 0.919 -0.002 -0.061 0.128 0.945 0.000 -0.067 0.144 0.921 0.007 -0.096 0.062 cons 0.131 0.001 -0.069 0.122 0.120 0.004 -0.072 0.140 -0.199 0.020 0.001 0.137 0.190 -0.004 -0.076 0.073 x1 0.052 -0.001 -0.076 0.103 0.106 0.004 -0.061 0.108 0.014 -0.004 -0.055 0.034 -0.003 -0.004 -0.045 0.166 UQR-Q90 x2 1.466 0.001 -0.053 0.192 1.492 -0.004 -0.089 0.213 1.455 -0.004 -0.087 0.111 1.484 -0.015 0.052 0.231 cons 2.263 -0.001 -0.075 0.129 2.134 0.010 -0.063 0.134 2.369 0.000 -0.060 0.068 2.314 0.010 0.003 0.176 Table 2 Monte Carlo Simulation: N=1000, exponential Heteroskedasticity ~normal ~logistic ~Chi2 ~uniform = + ∗ MAE StErr MAE StErr MAE StErr MAE StErr TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 1.639 -0.001 -0.032 0.168 1.603 -0.009 -0.066 0.143 1.536 -0.030 0.301 0.484 1.691 -0.012 0.228 0.320 CQR-Q10 x2 0.743 0.004 -0.040 0.173 0.758 0.003 -0.078 0.162 0.788 0.018 0.258 0.465 0.726 0.046 0.501 0.268 cons -1.280 -0.003 -0.026 0.160 -1.209 -0.013 -0.081 0.129 -1.072 -0.050 0.434 0.456 -1.384 -0.008 0.352 0.344 x1 1.000 0.000 -0.076 0.115 0.999 0.000 -0.045 0.158 1.102 0.025 0.009 0.152 1.001 -0.001 -0.123 0.045 CQR-Q50 x2 
0.999 0.000 -0.060 0.114 0.998 -0.003 -0.027 0.158 0.959 -0.009 -0.012 0.154 1.002 0.007 -0.104 0.043 cons 0.000 0.001 -0.078 0.119 0.004 0.003 -0.032 0.162 -0.204 0.053 0.195 0.154 0.000 -0.008 -0.105 0.045 x1 0.364 -0.002 -0.066 0.170 0.392 0.009 -0.062 0.154 0.331 -0.003 -0.075 0.053 0.306 0.014 0.148 0.270 CQR-Q90 x2 1.255 -0.001 -0.065 0.162 1.239 -0.002 -0.040 0.157 1.269 -0.007 -0.078 0.035 1.274 -0.012 0.152 0.253 cons 1.279 0.001 -0.066 0.164 1.216 0.012 -0.061 0.147 1.340 -0.010 -0.083 0.047 1.386 -0.040 0.268 0.266 x1 1.613 -0.002 -0.184 0.256 1.478 0.018 -0.097 0.258 1.273 0.071 0.059 0.265 1.734 -0.125 0.171 0.089 UQR-Q10 x2 0.582 0.001 -0.097 0.107 0.565 0.006 -0.052 0.100 0.648 -0.003 -0.106 0.180 0.670 -0.084 0.219 0.042 cons -1.533 -0.001 -0.135 0.274 -1.390 -0.019 -0.043 0.256 -1.417 -0.042 -0.138 0.348 -1.843 0.200 0.496 0.167 x1 1.003 0.002 -0.072 0.246 1.025 -0.015 -0.053 0.273 1.099 0.037 0.045 0.284 0.922 0.072 0.152 0.181 UQR-Q50 x2 0.850 0.001 -0.079 0.183 0.853 -0.009 -0.074 0.200 0.860 -0.031 0.037 0.219 0.839 0.042 0.077 0.139 cons 0.170 0.000 -0.079 0.207 0.152 0.016 -0.047 0.221 0.023 0.059 0.191 0.233 0.245 -0.077 0.144 0.182 x1 0.430 -0.002 -0.047 0.077 0.446 -0.006 -0.068 0.050 0.383 -0.020 -0.075 0.000 0.429 0.009 -0.010 0.129 UQR-Q90 x2 1.624 -0.001 -0.124 0.484 1.612 -0.024 -0.105 0.433 1.585 -0.064 -0.055 0.295 1.630 0.041 -0.083 0.558 cons 1.212 0.002 -0.134 0.323 1.190 0.027 -0.127 0.290 1.305 0.066 -0.073 0.163 1.217 -0.043 -0.078 0.377 Table 3 Monte Carlo Simulation: N=1000, Varying coefficient structure Type 1 Type 2 MAE StErr MAE StErr = () TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 -1.140 0.000 -0.030 0.103 -0.092 0.011 0.033 0.154 CQR-Q10 x2 -0.035 0.010 -0.038 0.120 -0.086 0.010 0.016 0.156 cons 0.356 -0.010 -0.046 0.182 -0.086 -0.043 0.178 0.170 x1 0.404 -0.002 -0.052 0.067 0.845 -0.009 -0.021 0.056 CQR-Q50 x2 0.601 -0.001 -0.050 0.065 0.841 -0.004 -0.016 0.045 cons 0.998 0.003 -0.031 0.085 0.853 0.029 0.011 0.059 x1 1.938 0.001 -0.048 0.098 2.282 -0.001 -0.045 0.045 CQR-Q90 x2 1.236 -0.005 -0.038 0.098 2.280 -0.010 -0.067 0.054 cons 1.644 0.004 -0.046 0.139 2.309 -0.002 -0.066 0.073 x1 -1.211 0.002 -0.046 0.170 -0.097 0.018 -0.027 0.174 UQR-Q10 x2 -0.044 0.002 -0.023 0.072 -0.078 0.012 -0.031 0.157 cons 0.482 -0.004 -0.025 0.066 -0.074 -0.056 0.078 0.162 x1 0.418 -0.003 -0.059 0.119 0.900 -0.008 -0.023 0.031 UQR-Q50 x2 0.535 -0.003 -0.071 0.112 0.737 -0.005 -0.015 0.026 cons 0.899 0.007 -0.061 0.124 0.757 0.017 -0.031 0.028 x1 1.982 0.001 -0.079 0.150 2.214 -0.002 -0.039 0.051 UQR-Q90 x2 1.296 0.002 -0.060 0.136 2.321 0.001 -0.022 0.077 cons 1.778 -0.003 -0.093 0.141 2.499 -0.006 -0.031 0.057 4. Wage inequality in Grenada This illustration focuses on an empirical application of our proposed method for the case of Grenada, focusing on the description of wage inequality trends in the country between 2013 and 2020 using the annual Labor Force Survey (LFS). This survey provides the only source of information that can be used to describe the status of the labor market and the distribution of labor income in the country. One major limitation of this survey, however, is the collection of labor income data. Compared to standard household surveys or labor force surveys in most developed countries, labor income recorded in the LFS in Grenada is only available in brackets. Furthermore, there is a large proportion of the employed population who do not declare their labor income. 
Table 4 provides an overview of the labor income distribution across time.

Table 4. Labor income distribution by year (percent)

Year          2013   2014   2015   2016   2017   2018   2019   2020
<200           3.0    1.2    3.7    3.5    1.4    0.2    0.0    0.4
200-399        6.9    5.8    6.3    5.3    4.1    1.6    1.2    1.1
400-799       15.4   15.9   12.3   14.2   13.7    9.0    8.3   10.3
800-1199      19.1   20.0   18.3   18.7   21.1   20.4   23.8   24.6
1200-1999     17.7   17.4   13.9   13.1   18.4   14.7   14.9   15.9
2000-3999     15.6   11.3   11.2   11.5   10.5    9.7   12.8   11.8
4000-5999      2.6    2.4    2.4    2.2    2.2    1.6    1.2    2.1
6000+          2.0    1.2    0.6    0.6    0.7    1.0    1.0    0.5
Not stated    17.7   24.8   31.3   30.9   27.9   41.8   36.7   33.2

In this case, we face two types of problems. On the one hand, we only have access to interval-censored data, which is insufficient to analyze changes in the distribution of earnings in the country; on the other hand, we have an increasing proportion of individuals who do not declare income.

We apply the imputation procedure previously described to address both problems, estimating the interval-censored regression for each year with a set of household-level characteristics and job type characteristics. The sample of interest includes all adults who declared to be employed, even if they did not state their income. We make the simplifying assumption that people who did not state income are randomly distributed conditional on observed characteristics. To account for the fact that characteristics may differ across those who did or did not state their incomes, an inverse probability weighting strategy is used to estimate the interval regression model. Finally, the imputation procedure is implemented as discussed in section 3, but assuming no lower and upper bounds for the imputed wages. Nevertheless, the maximum imputed wage for those who do not state their income is capped at the maximum predicted among those who declare their income. In all cases, imputed earnings are adjusted by inflation.

Figure 1. Average Monthly Earnings by Year and Gender
[Line chart of average monthly earnings, 2013-2020, for all workers, men, and women.]

The results suggest that after a small decline in average real monthly earnings from 2013 to 2016, there was a slight improvement in the following two years, with a small decline in 2019 and average wages remaining at stable levels in 2020, despite the Covid-19 pandemic.⁷ The results also suggest that the gender earnings gap has shown a somewhat increasing trend between 2013 and 2019, although it is predicted to decline a little in 2020.

⁷ This estimate does not take into account the decline in labor force participation observed during the pandemic.

Figure 2. Selected Quantiles and Gini Coefficient across Years
[Monthly earnings (log scale) at the 10th, 50th, and 90th quantiles (left axis) and the Gini coefficient in Gini points (right axis), 2013-2020.]

In terms of inequality, the estimates suggest that it has declined substantially across the years. The estimated Gini coefficient fell from 44.2 Gini points in 2015 to 34.1 in 2019, with a significant increase in 2020. This decline in inequality seems to have been driven by faster growth in the lower and middle sections of the wage distribution and a small decline in the upper section of the distribution.
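As an illustration of how such distributional statistics can be obtained from the imputed data, the sketch below computes a weighted Gini coefficient on each of the M imputed samples and pools the results following Rubin (1987), as in equations (13) and (14). The function names, the Python language, and the placeholder inputs (`imputed_samples`, `weights`, `per_sample_variances`) are illustrative assumptions; the actual estimates in this section were produced in Stata.

```python
# Sketch: estimate the Gini coefficient on each imputed sample and pool the
# M estimates with Rubin's combination rules (equations 13 and 14).
import numpy as np

def gini(income, weights=None):
    """Weighted Gini coefficient from the Lorenz curve (trapezoid rule)."""
    income = np.asarray(income, dtype=float)
    w = np.ones_like(income) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(income)
    income, w = income[order], w[order]
    p = np.cumsum(w) / w.sum()                        # cumulative population share
    L = np.cumsum(w * income) / np.sum(w * income)    # cumulative income share
    p0 = np.concatenate(([0.0], p[:-1]))
    L0 = np.concatenate(([0.0], L[:-1]))
    return 1.0 - np.sum((p - p0) * (L + L0))          # 1 - 2 * area under Lorenz curve

def rubin_pool(estimates, variances):
    """Pool point estimates and variances across M imputations."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    M = len(estimates)
    point = estimates.mean()                          # equation (13)
    within = variances.mean()
    between = estimates.var(ddof=1)
    total_var = within + (M + 1) / M * between        # equation (14)
    return point, np.sqrt(total_var)

# Hypothetical usage with M imputed (log) earnings vectors and, e.g.,
# bootstrap variances of the Gini computed within each sample:
# ginis = [gini(np.exp(y_m), weights) for y_m in imputed_samples]
# gini_hat, gini_se = rubin_pool(ginis, per_sample_variances)
```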
5. Conclusion

In this paper, we present an imputation strategy that can be used to analyze interval-censored data. Our method proposes that a flexible enough interval regression model can be used to impute interval-censored data, which makes it possible to recover the full distribution of the data, which can then be analyzed using standard statistical methods.

The main limitation of our strategy is the assumption of conditional normality, which is required for the estimation of the interval regression model using standard software. The principles of the imputation approach, however, could be extended to allow for more flexible moment specifications, as well as error distributions. Nevertheless, the Monte Carlo simulation suggests that as long as the latent error has a symmetric bell-shaped distribution, regression analysis using the imputed data shows small bias, with performance that is comparable to analyzing the uncensored data. Furthermore, when the heteroskedasticity structure is given by an exponential function, biases are small even when the latent error follows a skewed or bounded distribution.

For the specific case of Grenada, we only had access to interval-censored data, which is insufficient to analyze changes in the distribution of earnings in the country, together with an increasing proportion of individuals who do not declare income. We applied the imputation procedure to address both problems, estimating the interval-censored regression for each year with a set of household-level characteristics and job type characteristics. The results suggest that earned income inequality in the country has declined, which coincides with other economic performance indicators in the country.

References

Angelov, A. G., & Ekström, M. (2018). Maximum likelihood estimation for survey data with informative interval censoring. AStA Advances in Statistical Analysis, 103(2), 217-236.

Büttner, T., & Rässler, S. (2008). Multiple imputation of right-censored wages in the German IAB Employment Sample considering heteroscedasticity.

Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics: Methods and Applications. Cambridge University Press.

Chen, Y.-T. (2017). A unified approach to estimating and testing income distributions with grouped data. Journal of Business & Economic Statistics, 1-18.

Chen, Y., & Zhao, Y. (2021). Efficient sparse estimation on interval-censored data with approximated L0 norm: Application to child mortality. PLoS ONE, 16(4), e0249359. https://doi.org/10.1371/journal.pone.0249359

Chernozhukov, V., Fernández-Val, I., & Melly, B. (2022). Fast algorithms for the quantile regression process. Empirical Economics, 62, 7-33. https://doi.org/10.1007/s00181-020-01898-0

Demirtas, H., Freels, S. A., & Yucel, R. M. (2008). Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: A simulation assessment. Journal of Statistical Computation and Simulation, 78(1), 69-84.

Firpo, S., Fortin, N. M., & Lemieux, T. (2009). Unconditional quantile regressions. Econometrica, 77(3), 953-973. http://www.jstor.org/stable/40263848

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: Chapman & Hall/CRC.

Hagenaars, A., & De Vos, K. (1988). The definition and measurement of poverty. Journal of Human Resources, 23, 211-221. http://dx.doi.org/10.2307/145776

Han, J., Meyer, B. D., & Sullivan, J. X. (2020). Income and Poverty in the COVID-19 Pandemic (NBER Working Paper No. 27729). National Bureau of Economic Research.

Hsu, C.-Y., Wen, C.-C., & Chen, Y.-H. (2021). Quantile function regression analysis for interval censored data, with application to salary survey data. Japanese Journal of Statistics and Data Science. DOI: 10.1007/s42081-021-00113-3

Jann, B. (2003). The Swiss Labor Market Survey 1998 (SLMS 98). Schmollers Jahrbuch: Zeitschrift für Wirtschafts- und Sozialwissenschaften, 123(2), 329-335. https://nbn-resolving.org/urn:nbn:de:0168-ssoar-409467
Jenkins, S., Burkhauser, R., Feng, S., & Larrimore, J. (2011). Measuring inequality using censored data: A multiple-imputation approach to estimation and inference. Journal of the Royal Statistical Society, Series A (Statistics in Society), 174(1), 63-81.

McDonald, J., Stoddard, O., & Walton, D. (2018). On using interval response data in experimental economics. Journal of Behavioral and Experimental Economics, 72, 9-16.

Machado, J. A. F., & Santos Silva, J. M. C. (2019). Quantiles via moments. Journal of Econometrics, 213(1), 145-173.

Moore, J. C., Stinson, L., & Welniak, E. (2000). Income measurement error in surveys: A review. Journal of Official Statistics, 16, 331-362.

Parolin, Z., & Wimer, C. (2020). Forecasting estimates of poverty during the COVID-19 crisis. Poverty and Social Policy Brief, 4(8).

Rios-Avila, F. (2020). Recentered influence functions (RIFs) in Stata: RIF regression and RIF decomposition. The Stata Journal, 20(1), 51-94. https://doi.org/10.1177/1536867X20909690

Royston, P. (2007). Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. The Stata Journal, 7, 445-464.

Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Yan, T., Qu, L., Li, Z., & Yuan, A. (2018). Conditional kernel density estimation for some incomplete data models. Electronic Journal of Statistics, 12(1), 1299-1329. https://doi.org/10.1214/18-EJS1423

Vega Yon, G. G., & Quistorff, B. (2019). parallel: A command for parallel computing. The Stata Journal, 19(3), 667-684. https://doi.org/10.1177/1536867X19874242

Walter, P., & Weimer, K. (2018). Estimating poverty and inequality indicators using interval censored income data from the German Microcensus (Diskussionsbeiträge No. 2018/10).

Wang, X., Chen, M.-H., & Yan, J. (2013). Bayesian dynamic regression models for interval censored survival data with application to children dental health. Lifetime Data Analysis, 19, 297-316. https://doi.org/10.1007/s10985-013-9246-

Zhou, X., Feng, Y., & Du, X. (2017). Quantile regression for interval censored data. Communications in Statistics - Theory and Methods, 46(8), 3848-3863. DOI: 10.1080/03610926.2015.1073317

Chen, Y.-T. (2018). A unified approach to estimating and testing income distributions with grouped data. Journal of Business & Economic Statistics, 36(3), 438-455.

Appendix

Table A1.
Monte Carlo Simulation: N=500, Linear Heteroskedasticity ~normal ~logistic ~Chi2 ~uniform = + ∗ MAE StErr MAE StErr MAE StErr MAE StErr TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 2.005 0.023 -0.013 0.249 1.962 -0.028 -0.070 0.177 1.843 0.330 2.664 0.807 2.093 0.265 1.397 0.587 CQR-Q10 x2 0.808 0.006 -0.036 0.162 0.802 -0.004 -0.041 0.146 0.838 0.123 0.816 0.335 0.783 0.052 0.249 0.249 cons -2.385 -0.025 -0.021 0.303 -2.251 0.020 -0.068 0.227 -1.988 -0.481 3.696 0.918 -2.563 -0.274 1.395 0.638 x1 1.003 0.001 -0.063 0.062 0.998 -0.001 -0.031 0.080 1.160 0.021 -0.042 0.095 1.009 0.000 -0.059 0.054 CQR-Q50 x2 1.004 0.000 -0.057 0.063 0.999 0.000 -0.039 0.080 0.971 0.000 -0.044 0.082 0.996 0.000 -0.069 0.055 cons -0.004 0.000 -0.062 0.064 0.005 0.001 -0.040 0.081 -0.379 0.023 -0.044 0.104 -0.002 -0.002 -0.064 0.056 x1 -0.011 0.000 -0.050 0.105 0.040 0.005 -0.070 0.106 -0.047 -0.012 -0.050 0.054 -0.090 -0.010 -0.021 0.120 CQR-Q90 x2 1.205 -0.001 -0.064 0.112 1.190 -0.004 -0.069 0.119 1.212 -0.002 -0.069 0.056 1.209 0.000 0.040 0.118 cons 2.376 0.000 -0.053 0.103 2.255 0.011 -0.063 0.104 2.488 -0.006 -0.054 0.052 2.569 -0.039 0.038 0.115 x1 2.071 0.022 0.017 0.165 1.907 -0.018 -0.024 0.114 1.829 0.201 0.715 0.295 2.461 0.264 0.724 0.437 UQR-Q10 x2 0.613 0.005 -0.029 0.074 0.594 -0.006 -0.063 0.049 0.687 0.087 0.262 0.258 0.631 0.038 0.096 0.220 cons -2.528 -0.016 0.021 0.149 -2.327 0.012 -0.051 0.114 -2.366 -0.190 0.436 0.348 -2.983 -0.196 0.333 0.413 x1 1.023 0.002 -0.058 0.129 1.035 -0.009 -0.065 0.158 1.169 0.039 -0.025 0.151 0.956 0.011 -0.081 0.064 UQR-Q50 x2 0.941 0.003 -0.074 0.106 0.934 -0.001 -0.053 0.127 0.965 -0.004 -0.056 0.137 0.921 0.011 -0.093 0.058 cons 0.111 -0.003 -0.077 0.116 0.107 0.004 -0.051 0.136 -0.218 0.023 -0.004 0.131 0.183 -0.006 -0.080 0.068 x1 0.042 0.000 -0.064 0.098 0.100 0.006 -0.077 0.104 0.018 -0.006 -0.055 0.039 0.000 -0.004 -0.049 0.164 UQR-Q90 x2 1.470 -0.005 -0.057 0.180 1.469 -0.004 -0.074 0.207 1.430 -0.006 -0.074 0.109 1.481 -0.012 0.042 0.220 cons 2.267 0.001 -0.046 0.118 2.158 0.010 -0.062 0.125 2.389 0.002 -0.057 0.064 2.313 0.009 -0.006 0.167 Table A2 Monte Carlo Simulation: N=500, exponential Heteroskedasticity ~normal ~logistic ~Chi2 ~uniform = + ∗ MAE StErr MAE StErr MAE StErr MAE StErr TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 1.643 0.000 -0.053 0.172 1.608 -0.008 -0.091 0.159 1.535 -0.029 0.217 0.422 1.692 -0.011 0.244 0.258 CQR-Q10 x2 0.749 0.004 -0.076 0.174 0.762 0.003 -0.076 0.175 0.788 0.018 0.196 0.404 0.728 0.046 0.372 0.211 cons -1.287 -0.004 -0.057 0.169 -1.219 -0.013 -0.077 0.147 -1.071 -0.049 0.253 0.400 -1.383 -0.011 0.344 0.275 x1 1.001 0.001 -0.068 0.105 0.999 0.000 -0.039 0.146 1.101 0.022 -0.035 0.139 0.997 0.000 -0.124 0.052 CQR-Q50 x2 1.002 -0.002 -0.066 0.103 1.002 -0.005 -0.042 0.143 0.958 -0.010 -0.003 0.139 0.998 0.006 -0.117 0.050 cons -0.003 0.001 -0.085 0.106 -0.002 0.006 -0.054 0.150 -0.202 0.055 0.090 0.140 0.003 -0.008 -0.121 0.052 x1 0.363 0.000 -0.066 0.173 0.392 0.004 -0.073 0.166 0.329 -0.003 -0.059 0.062 0.307 0.011 0.124 0.218 CQR-Q90 x2 1.254 -0.004 -0.054 0.165 1.238 -0.004 -0.068 0.168 1.261 -0.006 -0.082 0.058 1.271 -0.014 0.144 0.201 cons 1.278 0.003 -0.045 0.165 1.217 0.015 -0.069 0.160 1.346 -0.015 -0.067 0.061 1.385 -0.037 0.161 0.213 x1 1.596 -0.001 -0.143 0.204 1.466 0.022 -0.082 0.203 1.288 0.054 -0.057 0.209 1.724 -0.103 -0.007 0.081 UQR-Q10 x2 0.581 0.001 -0.080 0.099 0.564 0.006 -0.052 0.094 0.654 -0.011 
-0.073 0.149 0.671 -0.073 0.047 0.049 cons -1.527 -0.002 -0.090 0.235 -1.388 -0.022 -0.052 0.217 -1.427 -0.027 -0.135 0.288 -1.836 0.175 0.146 0.158 x1 1.020 0.004 -0.061 0.235 1.041 -0.016 -0.053 0.266 1.118 0.038 0.015 0.276 0.928 0.072 0.029 0.175 UQR-Q50 x2 0.870 -0.002 -0.075 0.172 0.870 -0.010 -0.059 0.196 0.870 -0.027 -0.008 0.212 0.846 0.043 0.018 0.133 cons 0.143 0.002 -0.057 0.188 0.125 0.020 -0.038 0.206 0.003 0.055 0.095 0.217 0.234 -0.076 0.043 0.166 x1 0.424 0.002 -0.042 0.087 0.436 -0.006 -0.066 0.062 0.373 -0.016 -0.074 0.011 0.421 0.005 -0.027 0.132 UQR-Q90 x2 1.598 0.005 -0.083 0.400 1.604 -0.021 -0.108 0.367 1.572 -0.053 -0.089 0.253 1.606 0.031 -0.086 0.471 cons 1.239 -0.006 -0.085 0.262 1.206 0.025 -0.120 0.238 1.325 0.049 -0.102 0.135 1.241 -0.032 -0.099 0.308 Table A3 Monte Carlo Simulation: N=500, Varying coefficient structure Type 1 Type 2 MAE StErr MAE StErr = () TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 -1.133 0.002 -0.030 0.110 -0.078 0.010 0.027 0.139 CQR-Q10 x2 -0.036 0.008 -0.045 0.119 -0.078 0.009 0.036 0.146 cons 0.359 -0.009 -0.067 0.169 -0.094 -0.041 0.134 0.152 x1 0.401 -0.001 -0.044 0.064 0.852 -0.009 -0.044 0.052 CQR-Q50 x2 0.598 0.000 -0.052 0.059 0.848 -0.004 -0.002 0.042 cons 1.002 0.001 -0.051 0.070 0.847 0.028 0.005 0.051 x1 1.933 0.001 -0.050 0.100 2.296 0.001 -0.040 0.047 CQR-Q90 x2 1.234 -0.005 -0.039 0.099 2.276 -0.010 -0.046 0.067 cons 1.649 0.004 -0.045 0.136 2.305 -0.001 -0.055 0.086 x1 -1.183 0.003 -0.063 0.152 -0.085 0.013 -0.015 0.166 UQR-Q10 x2 -0.045 0.003 -0.046 0.075 -0.077 0.014 -0.018 0.149 cons 0.471 -0.005 -0.029 0.066 -0.077 -0.053 0.036 0.152 x1 0.419 0.000 -0.061 0.122 0.914 -0.005 -0.026 0.037 UQR-Q50 x2 0.542 -0.003 -0.053 0.109 0.746 -0.003 -0.025 0.030 cons 0.892 0.004 -0.067 0.118 0.742 0.015 -0.030 0.032 x1 1.953 0.003 -0.055 0.123 2.215 0.000 -0.041 0.050 UQR-Q90 x2 1.291 0.002 -0.044 0.130 2.305 -0.003 -0.033 0.076 cons 1.800 -0.004 -0.061 0.112 2.517 -0.005 -0.037 0.049 Table A4. 
Monte Carlo Simulation: N=2000, Linear Heteroskedasticity ~normal ~logistic ~Chi2 ~uniform = + ∗ MAE StErr MAE StErr MAE StErr MAE StErr TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 2.012 0.001 -0.011 0.228 1.956 -0.032 -0.058 0.140 1.846 0.315 5.759 1.028 2.097 0.254 3.135 0.790 CQR-Q10 x2 0.797 0.000 -0.040 0.150 0.813 -0.007 -0.060 0.121 0.831 0.120 2.261 0.414 0.785 0.053 0.433 0.343 cons -2.379 -0.002 -0.028 0.283 -2.253 0.025 -0.059 0.188 -1.990 -0.462 7.931 1.152 -2.576 -0.261 2.864 0.859 x1 1.000 0.001 -0.056 0.076 1.000 -0.001 -0.042 0.095 1.162 0.020 -0.016 0.090 0.997 0.001 -0.077 0.038 CQR-Q50 x2 1.000 0.000 -0.057 0.076 1.001 -0.001 -0.044 0.099 0.967 0.000 -0.066 0.086 1.001 0.001 -0.077 0.035 cons -0.001 0.000 -0.059 0.076 -0.001 0.002 -0.047 0.096 -0.379 0.024 -0.025 0.087 0.002 -0.002 -0.078 0.040 x1 -0.013 0.001 -0.045 0.099 0.040 0.008 -0.073 0.092 -0.059 -0.011 -0.055 0.045 -0.099 -0.010 -0.007 0.180 CQR-Q90 x2 1.204 -0.001 -0.047 0.107 1.193 -0.002 -0.055 0.099 1.207 -0.003 -0.059 0.041 1.217 0.000 0.036 0.178 cons 2.380 -0.002 -0.057 0.096 2.251 0.008 -0.058 0.088 2.497 -0.003 -0.048 0.042 2.577 -0.042 0.108 0.177 x1 2.102 0.004 -0.011 0.174 1.928 -0.018 -0.091 0.123 1.849 0.168 1.117 0.282 2.605 0.208 0.772 0.504 UQR-Q10 x2 0.611 0.001 -0.032 0.059 0.615 -0.010 -0.061 0.050 0.683 0.073 0.482 0.241 0.672 0.018 0.062 0.176 cons -2.537 -0.004 0.006 0.128 -2.357 0.014 -0.080 0.109 -2.374 -0.157 0.725 0.318 -3.104 -0.135 0.370 0.415 x1 0.997 0.002 -0.071 0.136 1.014 -0.010 -0.070 0.162 1.141 0.046 0.068 0.171 0.941 0.013 -0.069 0.065 UQR-Q50 x2 0.918 0.000 -0.069 0.112 0.914 -0.003 -0.066 0.131 0.932 0.000 -0.080 0.151 0.924 0.007 -0.093 0.063 cons 0.144 -0.001 -0.064 0.129 0.134 0.006 -0.055 0.147 -0.174 0.017 0.001 0.143 0.190 -0.003 -0.066 0.077 x1 0.051 0.001 -0.047 0.104 0.104 0.006 -0.073 0.110 0.010 -0.005 -0.051 0.035 -0.006 -0.003 -0.043 0.169 UQR-Q90 x2 1.478 -0.001 -0.062 0.203 1.502 -0.002 -0.079 0.220 1.463 0.001 -0.083 0.127 1.474 -0.011 0.060 0.251 cons 2.249 0.000 -0.065 0.142 2.126 0.008 -0.082 0.144 2.362 -0.003 -0.043 0.083 2.326 0.008 -0.010 0.190 Table A5 Monte Carlo Simulation: N=2000, exponential Heteroskedasticity ~normal ~logistic ~Chi2 ~uniform = + ∗ MAE StErr MAE StErr MAE StErr MAE StErr TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 1.639 0.000 -0.053 0.162 1.603 -0.009 -0.076 0.130 1.536 -0.030 0.442 0.541 1.693 -0.014 0.254 0.390 CQR-Q10 x2 0.744 0.004 -0.051 0.169 0.758 0.002 -0.086 0.150 0.786 0.017 0.308 0.512 0.726 0.044 0.721 0.333 cons -1.280 -0.004 -0.043 0.153 -1.210 -0.013 -0.099 0.114 -1.072 -0.048 0.749 0.503 -1.386 -0.005 0.312 0.420 x1 0.999 0.001 -0.072 0.127 1.001 0.000 -0.047 0.170 1.103 0.024 0.071 0.157 1.001 -0.001 -0.118 0.038 CQR-Q50 x2 1.000 0.000 -0.060 0.122 0.999 -0.003 -0.012 0.170 0.957 -0.008 -0.011 0.165 1.001 0.007 -0.129 0.033 cons 0.000 -0.001 -0.044 0.129 0.000 0.004 -0.042 0.169 -0.203 0.053 0.375 0.161 -0.002 -0.008 -0.118 0.038 x1 0.363 -0.002 -0.041 0.165 0.397 0.006 -0.060 0.142 0.332 -0.003 -0.066 0.047 0.310 0.014 0.137 0.338 CQR-Q90 x2 1.256 -0.002 -0.062 0.161 1.241 -0.002 -0.088 0.142 1.265 -0.003 -0.076 0.028 1.276 -0.012 0.161 0.317 cons 1.279 0.002 -0.053 0.160 1.211 0.016 -0.075 0.140 1.341 -0.013 -0.063 0.045 1.383 -0.040 0.365 0.333 x1 1.621 -0.001 -0.219 0.316 1.485 0.022 -0.097 0.314 1.266 0.081 0.396 0.326 1.735 -0.141 0.500 0.106 UQR-Q10 x2 0.587 0.000 -0.086 0.123 0.570 0.006 -0.064 0.115 0.643 0.002 
-0.112 0.211 0.675 -0.091 0.542 0.037 cons -1.542 -0.001 -0.131 0.326 -1.401 -0.020 -0.049 0.301 -1.406 -0.052 -0.049 0.412 -1.850 0.216 1.038 0.181 x1 0.991 0.002 -0.082 0.258 1.014 -0.016 -0.038 0.289 1.082 0.036 0.143 0.298 0.916 0.070 0.329 0.196 UQR-Q50 x2 0.843 -0.001 -0.086 0.194 0.842 -0.010 -0.046 0.219 0.844 -0.030 0.114 0.241 0.831 0.041 0.157 0.156 cons 0.184 0.002 -0.083 0.228 0.167 0.019 -0.046 0.243 0.046 0.059 0.339 0.264 0.256 -0.074 0.330 0.200 x1 0.433 0.000 -0.028 0.072 0.455 -0.007 -0.051 0.046 0.388 -0.023 -0.071 -0.009 0.437 0.012 0.004 0.135 UQR-Q90 x2 1.630 -0.001 -0.148 0.553 1.632 -0.027 -0.114 0.514 1.606 -0.081 -0.010 0.348 1.647 0.047 -0.100 0.682 cons 1.201 0.001 -0.158 0.379 1.167 0.030 -0.143 0.358 1.282 0.083 -0.019 0.199 1.194 -0.049 -0.093 0.472 Table A6 Monte Carlo Simulation: N=2000, Varying coefficient structure Type 1 Type 2 MAE StErr MAE StErr = () TRUE Bias Ratio Ratio TRUE Bias Ratio Ratio x1 -1.131 0.003 -0.074 0.101 -0.085 0.010 0.024 0.167 CQR-Q10 x2 -0.041 0.010 -0.030 0.116 -0.090 0.011 0.021 0.158 cons 0.358 -0.012 -0.060 0.187 -0.087 -0.043 0.208 0.184 x1 0.400 0.000 -0.041 0.072 0.849 -0.009 -0.023 0.059 CQR-Q50 x2 0.599 -0.002 -0.057 0.072 0.845 -0.006 -0.027 0.047 cons 0.999 0.004 -0.058 0.098 0.850 0.030 0.040 0.065 x1 1.937 0.000 -0.045 0.098 2.294 -0.007 -0.045 0.041 CQR-Q90 x2 1.240 -0.006 -0.055 0.094 2.293 -0.015 -0.069 0.047 cons 1.640 0.005 -0.055 0.134 2.291 0.006 -0.075 0.063 x1 -1.219 0.000 -0.087 0.194 -0.089 0.017 -0.030 0.179 UQR-Q10 x2 -0.049 0.002 -0.037 0.069 -0.079 0.014 -0.009 0.159 cons 0.492 -0.003 -0.039 0.067 -0.076 -0.057 0.165 0.166 x1 0.410 -0.003 -0.061 0.124 0.897 -0.007 -0.016 0.027 UQR-Q50 x2 0.528 -0.006 -0.060 0.116 0.733 -0.006 -0.028 0.022 cons 0.908 0.010 -0.066 0.129 0.765 0.016 -0.037 0.027 x1 1.999 0.002 -0.097 0.190 2.249 -0.002 -0.045 0.052 UQR-Q90 x2 1.314 0.002 -0.061 0.159 2.348 -0.003 -0.031 0.081 cons 1.747 -0.003 -0.109 0.191 2.458 -0.002 -0.044 0.072