WPS7028


Policy Research Working Paper                     7028




    GLS Estimation and Empirical Bayes
     Prediction for Linear Mixed Models
with Heteroskedasticity and Sampling Weights
        A Background Study for the POVMAP Project

                              Roy van der Weide




Development Research Group
Poverty and Inequality Team
September 2014
Policy Research Working Paper 7028


  Abstract
 This note adapts results by Huang and Hidiroglou (2003)                            the poverty mapping approach put forward by Elbers et al.
 on Generalized Least Squares estimation and Empirical                              (2003). The estimators presented here have been imple-
 Bayes prediction for linear mixed models with sampling                             mented in version 2.5 of POVMAP, the custom-made
 weights. The objective is to incorporate these results into                        poverty mapping software developed by the World Bank.




  This paper is a product of the Poverty and Inequality Team, Development Research Group. It is part of a larger effort by
  the World Bank to provide open access to its research and make a contribution to development policy discussions around
  the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The author may be
  contacted at rvanderweide@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
 GLS Estimation and Empirical Bayes Prediction for
  Linear Mixed Models with Heteroskedasticity and
                Sampling Weights
                      A background study for the POVMAP project


                                 Roy van der Weide∗




Keywords: linear mixed models, small area estimation, Empirical Bayes, sampling weights,
poverty, inequality

JEL Classification: I32, C31, C43, C53


  ∗
    World Bank Research Department. Email: rvanderweide@worldbank.org. A big thank you goes
to Chris Elbers for providing comments on an earlier version of this note.
1         Introduction
The poverty mapping approach put forward by Elbers et al. (2003; henceforward ELL)
makes it possible to estimate poverty and inequality at a highly disaggregated level.
Depending on the geography of the country of interest, estimates of poverty might be
obtained for areas as small as a city or community, which greatly facilitates the targeting
of the poor among other applications (see e.g. Elbers et al., 2007). ELL achieve this
by means of a massive out-of-sample prediction exercise that “imputes” income or
consumption data for every household recorded in a population census. Once estimates
of consumption are available for all households in the population this data can then
be aggregated at almost any desired level of aggregation. The household consumption
model used for prediction is estimated to data from a household income survey where
the independent variables are restricted to those that are available in both the survey
and the census.
    A linear mixed model is assumed which is standard in the small area estimation
literature (see e.g. Rao, 2003). Spatial correlation between the residuals is accounted
for by means of a nested error structure that consists of a random area eﬀect and an
idiosyncratic household eﬀect. ELL believed their approach would be most convincing
if the assumptions about the errors are kept to a minimum. Speciﬁcally, the household
errors are allowed to be heteroskedastic and by default no assumptions are made about
the shape of the error distribution functions.
    The ELL approach has been applied to obtain poverty maps in over 60 countries
worldwide. Part of this success may arguably be attributed to its implementation in
POVMAP, a custom-made software package developed by the World Bank that can be
downloaded from the public domain at no cost.1 The POVMAP project has made ELL,
a computationally intensive approach, available to a large audience of applied users
(and has thereby greatly lowered the threshold for adopting ELL). The ﬁrst version
of POVMAP (i.e. POVMAP 1.0) ran under MS-DOS. A graphical user interface was
added with the second version (POVMAP 2.0). Both versions of POVMAP closely
follow the procedures from the original ELL publication.
    A decade has past since the original publication, a good time to take stock of new
developments. The developments that we will focus on in this note are Empirical Bayes
(EB) prediction married with the ELL approach (see Molina and Rao, 2010) while
accounting for unequal sampling probabilities in the income survey (see Huang and
Hidiroglou, 2003). EB prediction utilizes the survey data to narrow down the random
area eﬀects while non-EB prediction (i.e. conventional ELL) makes no such attempt.
    1
        POVMAP2 can be downloaded from: iresearch.worldbank.org/PovMap/



                                               2
As such, EB will only make a diﬀerence for areas that are represented in the survey
(for other areas EB reduces to conventional ELL prediction).2
    The objective of this note is to adapt the results by Huang and Hidiroglou (2003) on
EB prediction for generalized linear mixed models with sampling weights to the ELL
framework. Note that while the original paper by Molina and Rao (2010) implements
EB prediction by assuming homoskedastic errors, this assumption is easily relaxed, as
can be seen in this note. The introduction of sampling weights (as probability weights)
also concerns the estimation of the model parameters, which in this case involves a
modiﬁcation to Generalized Least Squares (GLS). Our note functions as a background
study for a milestone upgrade of POVMAP to version 2.5.
    Following Huang and Hidiroglou (2003) and Molina and Rao (2010) we assume nor-
mally distributed errors when we present Empirical Bayes prediction. For a treatment
of EB under less restrictive assumptions, see the recent study by Elbers and van der
Weide (2014). In POVMAP 2.5, the user will have a choice between normal EB (Molina
and Rao, 2010) and non-normal non-EB (ELL). The relative performance of these two
options will depend on: (a) the size of the random area eﬀect; (b) the number of small
areas represented in the survey; and (c) the degree of non-normality of the errors. Nor-
mal EB prediction is expected to do well if there are relatively large random area eﬀects,
if many of the small areas are covered by the survey, while the error distrbutions can
be reasonably well approximated by a normal distribution.
    The outline of the note is as follows. Section 2 introduces the model framework and
some notation. Section 3 presents the modiﬁcation to the GLS estimator due to the
introduction of probability weights. EB prediction is presented in Section 4, where we
explicitly allow both for sampling weights and heteroskedasticity.


2       Model and notation
Suppose that the (log) consumption data can be described by the following nested error
regression model:
                              yah = β T xah + ua + εah ,                           (1)

where the subscript ah refers to household h in area a, where yah denotes (log) household
per capita consumption, xah denotes a vector containing m independent variables, and
where ua and εah represent the area error and the household speciﬁc error with zero
    2
    The challenge is to identify the conditional distribution for the area error. When both the area and
the household errors are normally distributed, it follws that the area error conditional on the survey
data will also be normally distributed. If we allow the errors to be non-normally distributed however,
then working out the conditonal distribution will no longer be a trivial exercise (see Elbers and van
der Weide, 2014).


                                                   3
                                  2       2
means and variances denoted by σu   and σε,ah , respectively. The two errors are assumed
                                          2
independent from each other. Note that σε,ah    is permitted to vary between households,
        2
while σu is assumed to be a constant. For ease of exposition, we will assume that the
variance parameters are known.3
   Let na denote the number of households sampled in area a, so that n =              a na
denotes the total sample size. Let wah denote the sampling weight for household ah.
Let us also deﬁne W as the diagional matrix with the sampling weights wah along the
diagonal (sorted by area), and deﬁne Ω as a diagonal matrix with the following matrices
                                               h wah
along its diagonal (sorted by area): Ωa =          2  Ina , where Ina denotes the identity
                                               h wah
matrix of dimension na .
   We will at times also represent the model in matrix notation:

                                          y = Xβ + u + ε.                                            (2)

Let R = E [εεT ] denote the diagonal matrix with the household error variances σε,ah     2

along the diagonal (sorted by area). We will denote the diagonal block of R correspond-
ing to area a by Ra . Similarly, let Q = E [uuT ] be the block-diagonal matrix where the
                            2
blocks are given by Qa = σu   1na 1T
                                   na , where 1na denotes the unit vector of length na .




3       Estimation of β using GLS
You and Rao (2002) derive a GLS estimator for β with sampling weights under the
                    2       2
assumption that σε,ah   = σε  for all households by solving weighted moment conditions.
Huang and Hidiroglou (2003) have relaxed this assumption by permitting heteroskedas-
                              2
ticity, i.e. a non-constant σε,ah . Their GLS estimator reduces to the estimator of You
and Rao (2002) if one were to insert constant variances (which we will conﬁrm).
    The weighted GLS estimator for β from Huang and Hidiroglou (2003) satisﬁes:

                                   ˆw = (X T V
                                   β                        ˆ −1 y,
                                             ˆ −1 X )−1 X T V                                        (3)
                                              w               w


with:
                                                  ˆ + ΩQ,
                                        ˆw = W −1 R
                                        V              ˆ                                             (4)

where the two matrices W and Ω are functions of the sampling weights only (see Section
2).
    3                                      2        2        2
    See the Annex for the estimation of σu   and σε   = E [σε,ah ]. For the estimation of the conditional
           2
variances σε,ah we refer the reader to Elbers et al. (2003).




                                                   4
                   ˆw can be estimated by:
   The variance of β

                    ˆw ] = (X T V
                var[β                           ˆ −1 V
                                ˆ −1 X )−1 (X T V    ˆV             ˆ −1 X )−1 ,
                                                       ˆ −1 X )(X T V                    (5)
                                  w              w      w            w


with:
                                            ˆ + Q.
                                         ˆ =R
                                         V      ˆ                                        (6)

Note that Vˆ and Vˆw are two diﬀerent matrices. Also note that β   ˆw reduces to the
conventional GLS estimator if we insert constant sampling weights.


3.1                                              ˆw ]
                                      ˆw and var[β
        Expanding the expressions for β
In this subsection we will attempt to further work out the expressions for β ˆw and var[βˆw ]
with the objective to ease implementation. Note that β    ˆw is a function of V
                                                                              ˆw−1
                                                                                   . We will
drop the “hat” to ease notation. Due to the block-diagonal nature of Vw , we have that
              −1
its inverse Vw   too will be block-diagonal where its blocks solve the inverse of the blocks
of Vw .
                                                   ˆw as follows:
     This allows us to re-write the expression for β

                     ˆw = (X T V −1 X )−1 X T V −1 y
                     β                                                                   (7)
                                w              w
                                                   −1
                                       T −1                   T −1
                          =           Xa Va,w Xa             Xa Va,w ya ,                (8)
                                  a                      a


where Va,w denotes the area a block of Vw , and where Xa and ya denote the corre-
sponding area a “blocks” of X and y , respectively, containing only the rows from area
a. To further expand this expression let us work out the inverse of Va,w . Note that
            −1
Va,w = Wa      Ra + Ωa Qa , where Wa and Ra are both diagonal matrices of dimension
                      2
na with wah and σε,ah     along their diagonal, respectively. Recall that Qa is deﬁned as
        2
Qa = σu   1na 1T
               na , where 1na denotes the unit vector of length na .
    It will be convenient to represent the blocks Va,w as follows:

                                          2           wah
                           Va,w = Ra,w + σu         h
                                                       2
                                                             1na 1T
                                                                  na ,                   (9)
                                                    h wah

                                                                                        2
                                                                                       σε,ah
where Ra,w is a diagonal matrix of dimension na with diagonal elements given by        wah
                                                                                             .
The inverse of Va,w is then seen to solve:

                        −1    −1             γa,w        −1           −1
                      Va,w = Ra,w −           −1
                                                        Ra,w 1na 1T
                                                                  na Ra,w ,             (10)
                                         1T
                                          na Ra,w 1na




                                               5
where:
                                                           2
                                                          σu
                       γa,w =                        2                                                                 (11)
                                        2 +       h wah         −1        −1
                                       σu             wah
                                                          (1T
                                                            na Ra,w 1na )
                                                  h
                                                                 2
                                                                σu
                                  =                                                         −1 .                       (12)
                                        2 +          2                           wah
                                       σu         h wah            h wah          2
                                                                               h σε,ah


                               −1                                            −1
   Given this expression for Vw   , let us work out what this means for X T Vw  X and
  T −1
X Vw y separately, and then put these back together to obtain the alternative repre-
              ˆw . We begin with X T V −1 X .
sentation for β                        w


     −1              T −1
X T Vw  X =         Xa Va,w Xa
                a

                     T        −1              γa,w               −1           −1
           =        Xa       Ra,w −        T   −1 1
                                                                Ra,w 1na 1T
                                                                          na Ra,w Xa
                a
                                          1na Ra,w na
                                                                                 −1                                  −1
                     T −1                   −1
                                                                                Ra,w                                Ra,w
           =        Xa Ra,w Xa − γa,w (1T             T
                                        na Ra,w 1na )Xa                          −1
                                                                                                    1na 1T
                                                                                                         na          −1
                                                                                                                               Xa
                a
                                                                            1T
                                                                             na Ra,w 1na                        1T
                                                                                                                 na Ra,w 1na

                                  wah                                      wah
           =                       2
                                          xah xT
                                               ah − γa,w                    2
                                                                                       ¯a,w x
                                                                                       x    ¯Ta,w      ,
                a        h
                                  σε,ah                              h
                                                                           σε,ah

with:
                                                  1                    wah
                             ¯a,w =
                             x                   wah                    2
                                                                                    xah .                              (13)
                                                  2
                                               h σε,ah         h
                                                                       σε,ah
                                                             −1
By similar logic we obtain the following expression for X T Vw  y:

               −1                         wah                                         wah
          X T Vw  y=                       2
                                                       xah yah − γa,w                  2
                                                                                                   x    ¯a,w
                                                                                                   ¯a,w y       ,      (14)
                         a         h
                                          σε,ah                                 h
                                                                                      σε,ah

with:
                                                  1                    wah
                             ¯a,w =
                             y                   wah                    2
                                                                                    yah .                              (15)
                                                  2
                                               h σε,ah         h
                                                                       σε,ah
                                                  −1            −1
   Combining the expressions we obtained for X T Vw  X and X T Vw  y yields the fol-
                      ˆ
lowing expression for βw :
                                                                                                           −1
          ˆw =                         wah                                      wah
          β                             2
                                               xah xT
                                                    ah        − γa,w             2
                                                                                              ¯a,w x
                                                                                              x    ¯Ta,w
                       a      h
                                       σε,ah                               h
                                                                                σε,ah

                                       wah                                      wah
               ×                        2
                                               xah yah − γa,w                    2
                                                                                             ¯a,w y
                                                                                             x    ¯a,w     .
                       a      h
                                       σε,ah                               h
                                                                                σε,ah


                                                          6
                                   2       2                   2
If we assume constant variance σε,ah   = σε  , we have that σε   drops from the equation
altogether, in which case our expression for β ˆw is seen to coincide with the expression
obtained by You and Rao (2002) under the same assumptions.
    Let us next try to re-write the expression for the variance of βˆw in a way that will
make it easier to compute. Due to the block-diagonal nature of both Vw and V , it
                 ˆw ] can be written as:
follows that var[β

        ˆw ] = (X T V −1 X )−1 (X T V −1 V V −1 X )(X T V −1 X )−1
    var[β            w               w      w            w
                                       −1                                                         −1
                          T −1                     T −1        −1                    T −1
             =           Xa Va,w Xa               Xa Va,w Va Va,w Xa                Xa Va,w Xa         ,
                     a                        a                                a


where for ease of notation we have dropped the “hat” from the right-hand-side (RHS).
                                        T −1
Note that we have already expanded Xa    Va,w Xa when we revisited the expression for
ˆw , which leaves only X V Va V Xa . Let us ﬁrst examine the matrix V −1 Va V −1 .
β                        T  − 1    −1
                         a a,w    a,w                                      a,w   a,w
Writing out the matrix multiplication yields:

  −1      −1    −1      −1     2 −1            −1                    γa,w           −1           −1      −1
Va,w Va Va,w = Ra,w Ra Ra,w + σu Ra,w 1na 1T
                                           na Ra,w −              T   −1 1
                                                                                   Ra,w 1na 1T
                                                                                             na Ra,w Ra Ra,w
                                                                 1na Ra,w na

                2       −1           −1               γa,w          −1      −1           −1
             − σu γa,w Ra,w 1na 1T
                                 na Ra,w −             −1
                                                                   Ra,w Ra Ra,w 1na 1T
                                                                                     na Ra,w
                                                  1T
                                                   na Ra,w 1na

                2         γa,w          −1           −1           −1
             − σu      T    −1 1
                                       Ra,w 1na 1T            T
                                                 na Ra,w 1na 1na Ra,w
                      1na Ra,w  na
                                 2
                       γa,w           −1           −1      −1           −1
             +           −1
                                     Ra,w 1na 1T                    T
                                               na Ra,w Ra Ra,w 1na 1na Ra,w
                   1T
                    na Ra,w 1na

                2              γa,w          −1           −1           −1
             + σu γa,w          −1
                                            Ra,w 1na 1T            T
                                                      na Ra,w 1na 1na Ra,w .
                           1T
                            na Ra,w 1na


After rearranging terms we obtain:

                   −1      −1    2               −1           −1
                 Va,w Va Va,w = σu (1 − γa,w )2 Ra,w 1na 1T                T
                                                          na Ra,w + Ba Ra Ba ,


where:
                               −1             γa,w         −1           −1
                         Ba = Ra,w −           −1
                                                          Ra,w 1na 1T
                                                                    na Ra,w .                          (16)
                                          1T
                                           na Ra,w 1na




                                                  7
                     T −1        −1
Inserting this into Xa Va,w Va Va,w Xa yields:

 T −1        −1        2               T −1                                            −1
Xa Va,w Va Va,w Xa = σ u (1 − γa,w )2 Xa Ra,w 1na                                 1T             T        T
                                                                                   na Ra,w Xa + Xa Ba Ra Ba Xa
                                                                             2
                             2                2                 wah
                    =       σu (1    − γa,w )                    2
                                                                                 ¯a,w x
                                                                                 x    ¯Ta,w
                                                           h
                                                                σε,ah
                                        2
                                      wah
                    +                  2
                                                               ˇa,h xah xT
                                                               w         ah − γa,w x    ˇT
                                                                                   ¯a,w x a,w − γa,w x    ¯T
                                                                                                     ˇa,w x        2
                                                                                                                       ¯a,w x
                                                                                                            a,w + γa,w x    ¯Ta,w ,
                                 h
                                      σε,ah            h


where:
                                              ˇa,w =
                                              x                        ˇa,h xa,h ,
                                                                       w
                                                                 h

with:                                                                    2
                                                                       wah
                                                                        2
                                                                       σε,ah
                                                ˇa,h =
                                                w                          2
                                                                         wah
                                                                                  .
                                                                          2
                                                                       h σε,ah

   Putting the terms together gives us the following elaborate expression for the vari-
        ˆw :
ance of β
                                         −1                                                                             −1
    ˆw ] =
var[β                T −1
                    Xa Va,w Xa                              T −1
                                                           Xa           −1
                                                              Va,w Va Va,w Xa                          T −1
                                                                                                      Xa Va,w Xa
                a                                  a                                              a
                                                                                          
                                                                         2
                         2                                     wah
          = C          σu (1 − γa,w )2                         2
                                                                             x    ¯T
                                                                             ¯a,w x a,w
                                                                                         CT
                    a                                  h
                                                               σε,ah
                                   2
                                 wah
          + C                     2
                                                           ˇa,h xah xT
                                                           w         ah − γa,w x    ˇT
                                                                               ¯a,w x a,w − γa,w x    ¯T
                                                                                                 ˇa,w x        2
                                                                                                        a,w + γa,w x    ¯T
                                                                                                                   ¯a,w x a,w     CT .
                    a        h
                                 σε,ah            h


where:
                                                                                                               −1
                                      wah                                                 wah
             C=                        2
                                                  xah xT
                                                       ah        − γa,w                    2
                                                                                                  x    ¯T
                                                                                                  ¯a,w x a,w        .
                        a    h
                                      σε,ah                                           h
                                                                                          σε,ah

3.2      Probability weighted OLS nested as a special case
The weighted OLS estimator is eﬀectively obtained by setting Vw = σ 2 W and V =
σ 2 In , where σ 2 denotes the variance of the total error term. This yields the following
expression for β  ˜w :

                 ˜w = (X T W X )−1 X T W y
                 β
                                                                        −1

                        =                     wah xah xT
                                                       ah
                                                                                                       T
                                                                                              wah xah yah .
                                     a   h                                         a      h



                                                                  8
The corresponding variance solves:

    ˜w ] = σ 2 (X T W X )−1 X T W 2 X (X T W X )−1
var[β
                                                −1                                                              −1
                 2
           = σ                    wah xah xT
                                           ah
                                                                      2
                                                                     wah xah xT
                                                                              ah                  wah xah xT
                                                                                                           ah        .
                      a       h                         a        h                        a   h


Note that in Stata this estimator can be obtained by including the sampling weights as
“probability weights” in the regular regression function (without using robust standard
errors).


4       Empirical Bayes prediction assuming normality
Here we are interested in identifying the distribution of the area random error ua condi-
tional on the residuals ea for the households sampled from area a.4 This task is greatly
simpliﬁed by assuming that both ua and εah are normally distributed, as is done by
Huang and Hidiroglou (2003) and You and Rao (2002). It then follows that the dis-
tribution of ua conditional on ea too will be normal. What remains is to identify the
mean and variance of this distribution.
    Huang and Hidiroglou (2003) oﬀer an estimate of the conditional mean E [ua |ea ] for
the general linear mixed model. Applying their results to our nested error regression
                                                   2
model with potentially non-constant variances σε,ah   , we obtain the following:

                                                            h wah       2 T     −1
                              E [ua |ea ] = u
                                            ˆa =               2
                                                                       σu 1na Va,w ea ,                         (17)
                                                            h wah


where ea = (ea1 , . . . , eana )T denotes the vector of area a residuals coming out of the
(weighted) GLS regression. Note that we dropped the “hat” from the RHS to ease
                                                             −1
notation. Substituting the expression we derived for Va,w       (see eq. (10)) into eq. (17)
yields:

                            wah         2 T    −1                   γa,w         −1           −1
           u
           ˆa =           h
                             2
                                       σu 1na Ra,w −             T   −1 1
                                                                                Ra,w 1na 1T
                                                                                          na Ra,w ea
                          h wah                                 1na Ra,w na
                                    −1
                              1Tna Ra,w ea
                = γa,w              −1
                              1T
                               na Ra,w 1na
                                      wah
                                                 
                                   h    2
                                       σε,ah
                                               eah
                = γa,w                   wah
                                                   .
                                           2
                                       h σε,ah

    4
   For ease of exposition we will treat the residuals ea as if they were observed data, i.e. as if β was
known. In practice of course we will be working with estimates of ea .


                                                        9
    Huang and Hidiroglou (2003) unfortunately do not oﬀer an estimate of the variance
                     ˆa , which is what we would need to implement Empirical Best
of ua conditional on u
estimation. One way to compute var[ua |uˆa ] is to appeal to the law of total variance:

                                               ˆa ]] + var[E [ua |u
                         var[ua ] = E [var[ua |u                  ˆa ]]
                                   = E [var[ua |u           ua ].
                                                ˆa ]] + var[ˆ

                                                                             wah             wah
               ua ] it will be convenient to deﬁne αah =
To compute var[ˆ                                                              2
                                                                             σε,ah
                                                                                      /(      2
                                                                                           h σε,ah ):


                              ua ] = var[γa,w
                          var[ˆ                           αah eah ]
                                                     h
                                          2
                                    =    γa,w var[ua     +            αah εah ]
                                                                 h

                                       2        2                     2   2
                                    = γa,w     σu +                  αah σε,ah    .
                                                             h


Inserting this into eq. (18) gives us:

                                          2    2          2                 2   2
                      E [var[ua |u
                                 ˆa ]] = σu − γa,w       σu +              αah σε,ah .                  (18)
                                                                       h

                                                                              2      2
    It can be veriﬁed that under the assumption of constant variance σε,ah       = σε  , we have
                                                              σ 2
        2           2   2                 2    2        2
that σu   + h αah     σε,ah simpliﬁes to σu + σε    h αah = γa,w . In this case the conditional
                                                                u

                                                                   2
variance is seen to take the form: var[ua |u   ˆa ] = (1 − γa,w )σu  , which coincides with the
expression derived by You and Rao (2002) under the same assumptions. Interestingly,
                                                          2
var[ua |uˆa ] will be of the same form when we allow σε,ah   to vary but assume the sampling
weights to be constant. This representation of the conditonal variance does not apply
                                                      2
to the more general case however where both σε,ah          and the sampling weights will vary
across households.




                                               10
                             2      2
5      Annex: Estimation of σu and σε
A variation of Henderson’s method III estimator for the variance parameters that per-
mits the use of sampling weights can be found in Huang and Hidiroglou (2003). Let us
deﬁne (borrowing notation from Huang and Hidiroglou, 2003):

SSE =                       ¯a,w )2 −
                 wah (yah − y                     wah (yah − y            ¯a,w )T ×
                                                             ¯a,w )(xah − x
           ah                            ah
                                                                     −1
                                                            T
       ×                      ¯a,w )(xah − x
                   wah (xah − x            ¯a,w )                                          ¯a,w )(xah − x
                                                                                wah (yah − y            ¯a,w )
            ah                                                            ah
                                                                                                                           
                                                                           −1

    t2 = tr                                    ¯a,w )T
                                   ¯a,w )(xah − x
                        wah (xah − x                                                  2
                                                                                     wah (xah − x            ¯a,w )T 
                                                                                                ¯a,w )(xah − x
                   ah                                                           ah
                                                                     
                                       −1

    t3 = tr            wah xah xT
                                 ah
                                                    2
                                                   wah xah xT
                                                            ah
                                                               
                   ah                       ah
                                                                                    
                                       −1

    t4 = tr            wah xah xT
                                 ah                (       wah )2 x    ¯T
                                                                  ¯a,w x a,w ,
                                                                            
                   ah                         a        h


where y                                ¯a,w = h wah xah /( h wah ) denote the weighted
       ¯a,w = h wah yah /( h wah ) and x
mean of yah and xah , respectively. (Note however that these weighted mean variables
are diﬀerent from those deﬁned in the main text for they use diﬀerent weights.)
                                                    2      2
    The estimators for the unconditional variances σu and σε can then be obtained as:

  2                      SSE
ˆε,w
σ    =                              2
                                   wah
            ah wah −                        − t2
                                 h
                           a     h wah

                      2                                                                  −1
  2         ah   wah yah −(     ah   wah yah xT
                                              ah )              ah   wah xah xT
                                                                              ah   (          ah   wah yah xah ) − (   ah   wah − t3 )ˆ 2
                                                                                                                                      σε,w
ˆu,w
σ      =                                                                                                                                     .
                                                                        ah wah − t4


    You and Rao (2002) ﬁnd that the use of sampling weights makes little diﬀerence
for the estimation of the variance parameters (they opt for leaving out the sampling
weights for this purpose).




                                                           11
References
Elbers, C., Fujii, T., Lanjouw, P., Ozler, B. and Yin, W. (2007). Poverty alleviation
  through geographic targeting: How much does disaggregation help? Journal of
  Development Economics, 83, 198–213.

Elbers, C., Lanjouw, J. and Lanjouw, P. (2003). Micro-level estimation of poverty and
  inequality. Econometrica, 71, 355–364.

Elbers, C. and van der Weide, R. (2014). Estimation of normal mixtures in a nested
  error model with an application to small area estimation of poverty and inequality.
  Policy Research World Bank Working Paper, no. 6962.

Huang, R. and Hidiroglou, M. (2003). Design consistent estimators for a mixed linear
 model on survey data. Joint Statistical Meetings, Section on Survey Research Methods
 1897–1904.

Molina, I. and Rao, J. (2010). Small area estimation of poverty indicators. Canadian
 Journal of Statistics, 38, 369–385.

Rao, J. (2003). Small area estimation. London: Wiley.

You, Y. and Rao, J. (2002). A pseudo-empirical best linear unbiased prediction ap-
  proach to small area estimation using survey weights. Canadian Journal of Statistics,
  30, 431–439.




                                          12