Policy Research Working Paper                        11039




    Exploring Geospatial-Based Approaches
  to Develop a Pre-Census National Sampling
              Frame in Armenia
                             Saida Ismailakhunova
                              Avralt-Od Purevjav
                           Tsenguunjav Byambasuren
                                 Sarchil Qader




Poverty and Equity Global Department
January 2025
Policy Research Working Paper 11039


  Abstract
 The lack of a workable and accurate national sampling                              the pre-census enumeration areas tool, leveraging a semi-au-
 frame is one of the methodological constraints in conduct-                         tomatic and resource-efficient approach and the most recent
 ing representative national surveys. It undermines policy                          data available. The generated national sampling frame offers
 and research efforts in many developing countries, partic-                         several advantages over the existing potential sampling
 ularly those experiencing significant internal displacement                        frames in the country, which are based on census settle-
 and relocation due to territorial and military conflicts. This                     ments, electoral precincts, grid sampling, and old census
 paper addresses this issue by constructing the first digitized                     enumeration areas. This new frame is applied to conducting
 national sampling frame in Armenia, a conflict-affected                            the World Bank’s “Listening to Armenia” survey, demon-
 developing country, where reliable and accessible national                         strating its potential for other socioeconomic surveys in the
 sampling frames for household and individual surveys are                           country. The method can also be employed to generate and
 severely limited. The first-ever sensible digitized urban and                      update national sampling frames in other countries more
 rural classification boundaries in the country have been cre-                      efficiently, given that manually generating both the digi-
 ated for the sampling frame. The paper constructs a national                       tized and non-digitized national sampling frames requires
 sampling frame that meets international standards based on                         substantial financial and non-financial resources.




 This paper is a product of the Poverty and Equity Global Department. It is part of a larger effort by the World Bank to
 provide open access to its research and make a contribution to development policy discussions around the world. Policy
 Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted
 at sismailakhunova@worldbank.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
    Exploring Geospatial-Based Approaches to Develop a Pre-Census
                National Sampling Frame in Armenia


                                               Saida Ismailakhunova
                                                Avralt-Od Purevjav
                                             Tsenguunjav Byambasuren
                                                   Sarchil Qader1




JEL Classification: C55, C83, C80

Keywords: Sampling frame, Enumeration areas, Census, Geospatial, Household surveys, Developing countries




1   Ismailakhunova: Poverty and Equity Global Practice, The World Bank, Tbilisi, Georgia (email:
sismailakhunova@worldbank.org); Purevjav: Poverty and Equity Global Practice, The World Bank, Washington, DC 20433,
United States (email: apurevjav@worldbank.org); Byambasuren: Dyson School of Applied Economics and Management, Cornell
University, 438 Warren Hall, Ithaca, NY 14853, United States & Poverty and Equity Global Practice, The World Bank,
Washington, DC 20433, United States (email: tb497@cornell.edu); Qader: WorldPop, University of Southampton, University
Road, Southampton SO17 1BJ, United Kingdom (email: s.qader@soton.ac.uk). Acknowledgments: We thank Obert Pimhidzai,
Ambar Narayan, Carolin Geginat, Juan Muñoz, Walker Kosmidou-Bradley, Narine Tadevosyan, Hans Peterson, and the UNICEF
Armenia team for helpful feedback and comments. We are also grateful to CRRC-Armenia for sharing the sampling frame based
on the 2011 census settlements and the 2023 electoral precincts.
1 Introduction

The national sampling frame, a list of primary sampling units (PSUs) such as enumeration areas from a recent
census, is essential for collecting reliable data. This is a global challenge in conducting representative surveys
necessary for high-quality research and policy analysis with minimum bias in many developing countries,
especially in conflict-affected nations. Most countries, including the Republic of Armenia, have national
sampling frames based on census enumeration areas developed for past population censuses. However, the
census frames are often outdated, non-digital, incomplete, and restrictive or hardly accessible to parties other
than local governmental organizations, such as researchers, academics, and international organizations.
These issues are particularly severe in Armenia as the country recently experienced a large-scale refugee crisis
and dramatic change in population distribution across regions over time due to the Nagorno-Karabakh
Conflict. 2 The lack of accessible and workable population sampling frames due to these challenges
undermines the initiatives and efforts to collect surveys for statistical, policy, and research purposes.

This paper first systematically assesses the existing sampling frames in the country, focusing on those that
leverage census settlements—villages in rural areas, towns in urban areas, districts in the capital city,
Yerevan—and electoral precincts as PSUs. The strengths and limitations of these sampling frames and
potential solutions to those challenges and drawbacks have also been discussed. To overcome the limitations
of those existing and accessible national sampling frames in the country, this paper thus constructs an
innovative national sampling frame based on pre-census enumeration areas (pre-EAs) using a novel approach
and the most recent population estimates.

The primary challenge with using census settlements as PSUs is the relatively few and large-sized PSUs,
making the sampling extremely challenging and costly even in a stratified two-stage cluster sampling design
via increasing the monetary and non-monetary cost of household listing. Especially, in this frame, Yerevan
is split into only 12 districts with large populations and extensive territories. All districts will likely be selected
in the first stage of the two-stage stratified cluster sampling design according to the probability proportional
to size (PPS) method. In the second stage, thus, the survey conductor is likely to conduct the household
listing, the ideal and preferred approach for the second stage of the two-stage sampling, in the entire Yerevan,
which would take the costs and efforts required in the Census. So, under this sampling frame, household
listing is almost impossible for household surveys. 3 Electoral precincts, on the other hand, offer relatively




2 Armenia  and Azerbaijan have been in a territorial conflict over the Nagorno-Karabakh area which has a majority ethnic-Armenian
population but was controlled by Azerbaijan before the collapse of the Soviet Union in the late 1980s. Following the collapse of
the Soviet Union, the local parliament of Nagorno-Karabakh voted to become part of Armenia, which led to an ethnic clash
between the two countries. Then the first Nagorno-Karabakh War started and ended in 1994 with an agreement stating that the
area remains part of Azerbaijan but run by a self-declared republic of ethnic Armenians, backed by the Armenian government.
However, Azerbaijan regained power over the area in 2020, and more than half of the estimated 120,000 ethnic Armenians fled
from Nagorno-Karabakh to Armenia since the ceasefire (https://www.bbc.com/news/world-europe-66852070).
3 Although it is not ideal and not encouraged, one might consider using a random walk approach to select households in the

second stage of the two-stage sampling as a last resort under pressing constraints. However, sample households will likely be
concentrated in particular areas or demographic groups under this approach if the random walk is not properly executed, which
happens most of the time, because the enumerators can walk around the starting point. Listing the households is preferred over
random walk but requires time and financial resources. Survey conductors tend to use random walks if subject to those constraints,
like the Listening to Armenia Survey. If the starting point and skipping steps are randomly selected, the random walk approach
would be essentially the same as random sampling with equal probabilities. However, households selected via random walk can
be concentrated around certain areas, leaving out most PSU territories. Consequently, representativeness suffers, and only a sub-
population might be sampled. Due to these and other issues of the random walk method that can introduce serious biases, it is


                                                                2
more PSUs with smaller sizes; however, cartography or boundary information is often unavailable for this
type of sampling frame like in Armenia and the PSU sizes are still quite large, indicating a costly procedure
for household listing as well. 4 Hence, alternative cost-effective national sampling frames based on reasonably-
sized PSUs with digital boundaries are worthwhile and necessary to facilitate conducting household surveys
essential for policy making and research analysis.

The national sampling frame is commonly based on three sources: Census enumeration areas (EAs), 5 sub-
national administrative boundaries, and a gridded population sampling frame. The census is conducted every
5-10 years or more, depending on financial and administration costs and the nature of the census questions.
For example, Armenia’s sampling frame based on census settlements relies on the 2011 Census, which is
severely outdated due to demographic changes associated with population displacements and mobility across
regions in response to territorial conflicts. 6 The Committee of the Republic of Armenia (ArmStat) conducted
a Population Census using a combined approach of administrative data and sampled census in 2022,
providing a potential up-to-date national sampling frame. Due to the COVID-19 pandemic, the census was
held in a hybrid format, which was the case in many countries. However, this frame, based on a sample of
25% of the addresses in the State Population Register (SPR), is restrictive and unreachable. Because this
dataset is inaccessible, it is crucial to have an alternative methodology to update the dataset from the previous
census. 7 One of the early techniques to overcome this challenge is grid sampling, where cells—rectangular
areas with population estimates—serve as PSUs. The sample frame relies on millions of grid cells publicly
available from different data sources, such as the Gridded Population of the World (GPW), which can be
accessed through platforms like Google Earth Engine. The grid size varies across data sources, and the choice
of the grid size and data provider differs across countries. The grid sampling, however, is subject to some
issues. First, grid boundaries are unnatural and can go through buildings and disregard visible features.
Second, although the spatial size of the grids is equal, the population size of the grids is unequal. Sparsely
populated grids need to be collapsed, and densely populated grids need to be segmented (Qader et al., 2020,
2021). 8

This paper thus provides Armenia’s first digital national sampling frame, successfully developed using various
innovative geospatial techniques and datasets. Multiple datasets and maps have also been produced to
facilitate the ground survey data collection. The proposed national sampling frame, where PSUs or EAs are
automatically created, offers several advantages over other sampling frames (Qader et al., 2021). First, the
EAs have natural boundaries like streets and natural elements like rivers. Second, administrative boundaries
are strictly respected because the EAs are nested within administrative boundaries by design. While some




not widely accepted as a standard approach. This paper does not discuss the method in detail since the focus is on the national
sampling frame.
4 The sampling frame that relies on the list of electoral precincts has been used by the “Caucasus Research Resource Center –

Armenia” Foundation (CRRC-Armenia), an independent, non-partisan research center, for the Caucasus Barometer Biennial
Survey that studies citizens’ views on democracy, governance, political reforms, and socio-economic developments in the region.
5 The enumeration areas are the operational geographic units for the collection and dissemination of census data and are often

used as a national sampling frame for various types of socio-economic surveys.
6 Figure A.1 visually demonstrates an example of significantly changing population distribution over time to show the magnitude

of the shifting population, which indicates the limitation of sampling frames with outdated population information.
7 Several attempts have been made to access the State Population Register (SPR) but failed to obtain the sampling frame based on

SPR addresses. An alternative approach thus has been explored due to the time constraint.
8 A sampling frame based on the traditional grid sampling method has been explored. However, several challenges are encountered

for this approach. For example, it is challenging to identify the grids' settlement type (rural or urban). Identifying administrative
units, e.g., districts or villages, to which a particular grid belongs is also challenging. In addition, this strategy has other limitations,
such as grid boundaries cutting buildings or settlements.


                                                                     3
automatically created boundaries can be unnatural, creating EAs via an automatic procedure avoids any
geometric errors (e.g., pockets, disjoint sections, and overlap) and is much more resource-efficient than
manual creation by a human cartographer. 9 Third, the population estimates are generally homogeneous, i.e.,
neither too small nor too large, even though they require further adjustments. Although some basic
comparisons were carried out to evaluate the gridded population outputs in this paper, comprehensive
validation of gridded population estimates is out of the scope of this paper due to a lack of data, time, and
resources.

The datasets used to create a semi-automatic mapping of pre-EAs include high-resolution gridded
populations, the spatial distribution of settled areas, and publicly available natural and administrative
boundaries from numerous sources such as OpenStreetMap (OSM) and WorldPop. The paper also
implements cross-validation exercises to examine the applicability of the proposed sampling frame and
compare it with other existing sampling frames in the country. For this purpose, several datasets have been
leveraged, including (i) the 2011 Census with spatial information on marzes 10 and settlement type (urban or
rural), (ii) Census settlements that are based on the 2011 Census, (iii) 2023 electoral precincts, and (iv) other
population data from the ArmStat at the aggregate level, such as marz-settlement type (urban and rural
regions).

This paper contributes to the literature and the field of survey sampling in several ways. First, the paper
develops a national sampling frame in Armenia based on pre-census enumeration areas (pre-EAs) as an
exemplary case to demonstrate the applicability of the semi-automated spatial technique which can benefit
many other countries. The method has been implemented and tested in several countries such as Somalia,
which lacks a digital national sampling frame (Qader et al., 2020, 2021), Cameroon, which has a special
demand for a customized national sampling frame for refugees, the Democratic Republic of Congo (Qader
et al., 2023), and Burkina Faso (Qader et al., 2022). The tool has also been piloted in Uzbekistan as the first
case in Central Asia. However, the implementation of the approach in Armenia is much more extensive.

Second, the national sampling frame created in this paper contributes to the survey collection in Armenia.
Depending on their availability, using various sampling frames within the country makes it challenging for
researchers, policy makers, and users of household and individual survey datasets to compare results across
surveys. This paper thus provides a methodological contribution by offering a national sampling frame based
on publicly available datasets, which contributes to avoiding the use of frames with different population
estimates. This supports all individuals and organizations conducting household surveys by providing a
standardized frame that ensures consistency across surveys and decentralizes the sources of the national
sampling frame.

Third, a rigorous evaluation of various sampling frames, including the new national sampling frame, provided
in this paper contributes to the survey sampling literature and practice in Armenia. To the best of our
knowledge, no study has systematically compared various sampling frames yet, especially in Armenia. Our




9 The manual approach to delineate enumeration areas (EAs) can also be challenging due to security restrictions on accessing

fragile, conflict-affected, and violent areas, like COVID-19 restrictions.
10 Armenia is subdivided into eleven administrative divisions. Of these, ten are provinces, known as marzer (մարզեր) or in the

singular form marz (մարզ) in Armenian. The marz is the first administrative unit in Armenia, which is equivalent to a region or
state. Yerevan is treated separately and granted special administrative status as the country's capital.


                                                              4
sampling frame does not necessarily substitute the existing sampling frames but could complement them,
particularly the national census frame, and can be used as an alternative.

Finally, this approach creates demand for public datasets like OpenStreetMap. The existence of such high-
quality, public, geospatial data can return tens to hundreds of millions of dollars in societal value, even
without counting their indirect benefits (Hansen and Schrøder, 2019).

2 Existing and Accessible Sampling Frames

The household survey is one of the most popular methods for collecting population and socioeconomic
indicators. An accurate sampling frame is essential to carry out household and other surveys at the national
level. These frames are frequently created and updated during the national housing and population census.
Some countries may not have a digital national sampling frame because their census was not digitized and
more seriously some might not even have a physical map of the census frame, or it might be not accessible.
Additionally, the frames might be outdated because demographic factors change quickly, and most censuses
are conducted every ten years. In Armenia, there is no workable and accessible map or cartographic
information that can be used for a national sampling frame. It is a significant obstacle to conducting
nationally representative socioeconomic surveys. There is also no up-to-date digitalized national sample
frame in the country.

The last census completed in Armenia was conducted in 2011 (ArmStat, 2024), and there are currently no
up-to-date enumeration areas that can be employed as national sampling frames for representative
socioeconomic surveys. The spatial resolution of the census results in the current use is restricted to the
provincial or district level (2nd and 3rd administrative units), making it difficult to understand how people are
distributed at the facility, sub-district, or neighborhood levels—the levels where policy interventions are
typically provided. In developing countries, creating a sampling frame for surveys using representative
community samples usually involves manually delineating small geographic areas (or enumeration areas) on
high-resolution satellite imagery. This is a standard method employed by multiple National Statistical Offices
(NSOs) but is logistically complex, and requires substantial material resources, Geographic Information
System (GIS) experts, and training (Qader et al., 2020, 2021). In addition, the process is extremely time-
consuming and expensive and often leads to postponing the survey. For example, it can take 2-3 years to
carry out a survey (Grosh and Glewwe, 1995). These factors demand a fast and inexpensive sampling frame
and population enumeration methodologies.

In this paper, a digital national sampling frame has been created using a semi-automatic and cost-effective
approach. However, background on the existing sampling frames in the country is provided before discussing
the new sampling frame. Three existing datasets that can serve as a potential sampling frame for household
surveys include the 2022 Census with addresses from the State Population Register (SPR), the 2011 Census
with settlements (villages in rural areas, towns in non-Yerevan urban regions, and districts in Yerevan), and
2023 election data with electoral precincts.

The ArmStat conducted a Population Census in November 2022 using a combined approach based on
administrative data from the State Population Register (SPR) and a sample of 25% of the addresses in the
SPR (ArmStat, 2022). In this dataset, primary sampling units (PSUs) are the workload of SPR addresses for
each enumerator during the census. Although these PSUs do not have identifiable boundaries like the
traditional census enumeration areas (EAs), this dataset can be used as a sampling frame since they cover
approximately 93% of all addresses in the country. The household listing in this dataset is as of October


                                                       5
2022; however, the addresses are distant because the EAs are large. Regardless of its strengths and limitations,
this frame is inaccessible as the data is only available on a restricted computer at ArmStat (Pettersson, 2023).

Therefore, our focus is on the latter two datasets, including the 2011 settlement-based frame and the 2023
electoral precinct-based frame, and their advantages and disadvantages are described below.

2.1 Census Settlements

The most granular spatial information in this dataset is “settlement”. There are 1,037 settlements (980
villages, 45 towns, and 12 districts in Yerevan). The main advantage of this frame is that it identifies slightly
over 1,000 geographical areas in the country, which is better than having only regions or marzes.

Suppose that 400 settlements were selected in the first stage of the two-stage stratified cluster sampling design
as primary sampling units (PSUs), and all 12 districts in Yerevan are likely to be selected via probabilities
proportional to size (PPS) approach in the first stage due to the large population in these large districts.
Supposedly, if 10 households were randomly selected from each district in the second stage, the sample size
from Yerevan would be 120 households, only 3% of the total sample of 4,000 households. However,
approximately 35% of the population and 38% of total households live in Yerevan according to the 2011
Armenia Census. One can select a disproportionate number of households from each PSU in the second
stage to account for the variations in the size of the PSUs in the first stage. For example, selecting 100
households from each district in Yerevan would yield 1,200 households from Yerevan, which is 30% of the
sample.

It is better than having spatial information only at the level of administrative units or strata (strata is defined
as a combination of marz and settlement types–urban or rural status–following other official and major
surveys) like in several other easily accessible datasets. The primary and more serious disadvantage of this
sampling frame, however, is the large-sized settlements, especially in Yerevan. Although the identification
information is unavailable, the Census frame from the Committee of the Republic of Armenia (ArmStat) has
about 12,000 enumeration areas. The settlements in this frame are thus 12 times larger than the Census
enumeration areas on average. Using a few large PSUs would also conceptually undermine the two-stage
sampling design, converging to a one-stage design. Large “settlements” thus must be segmented into smaller
areas to make workable PSUs. 11 For example, large-sized PSUs were manually segmented into smaller PSUs
in the past, such as in Nepal (Central Bureau of Statistics of Nepal, 1996); however, this traditional or manual
method is costly and time-consuming in practice. Thus, an innovative technique has been proposed in this
paper to divide these large areas into smaller areas.

Another problem with this potential sampling frame is that the population based on the 2011 Census in this
sample frame is outdated and misallocated. The outdated data is generally not a severe concern for national
sampling frames because any survey fielded before the 2022 Armenia Census could be based on the 2011
Census frame. However, this problem can be significant in Armenia, given that it has been experiencing
substantial household displacement and domestic migration due to territorial conflicts in recent years, i.e.,
the current spatial distribution of the population might be significantly different from that during the




11If these large census settlements were divided into grids, challenges related to the traditional gridded sampling approach would
be raised, such as building-cutting boundaries.


                                                                6
previous census. One can update the population using population growth at some aggregate level. If the
population is updated at a more granular level than administrative units, changes in the population
distribution across areas can at least partly resolve problems due to outdated data. However, any adjustments
will not change the PSU’s probability of being selected in the first stage if the adjustments are made at the
administrative units or more aggregate level, such as regions. The reason is that the PPS is specific across
administrative units, and any monotonic transformation of PSU size within the administrative units cannot
change the PSU’s selection probability. Therefore, the population and number of households should be
adjusted at a level more granular than administrative units.

2.2 Electoral Precincts

The data on electoral precincts is publicly available and generally trustworthy as the official general election
committee publishes the data. There are two advantages to using this dataset as a sampling frame for
household surveys. First, the data is regularly updated and current. Second, there are 1,992 electoral precincts,
more than the 1,037 settlements in the 2011 Census settlement data. Therefore, the spatial information is
relatively more granular than the 2011 Census data, and problems associated with a few large-sized PSUs are
less severe than those under a potential sampling frame based on census settlements.

However, like the “settlements” in the 2011 Census data, employing electoral precincts as PSUs also poses
a problem associated with large-sized PSUs. It is also worth mentioning that there are also some small-sized
electoral precincts, which are less problematic. As stated in Pettersson (2023), the size of voting point areas
in Armenia ranges from 7 addresses to 1,200 addresses. The small voting point (VP) areas can be combined
with a neighboring area or neighboring areas, and large VP areas can be divided into smaller segments. The
procedure for merging multiple VP areas should be straightforward; however, segmenting large areas could
be costly as it requires some spatial analysis and probably cartographic tasks.

Additionally, the boundaries of PSUs are needed because the enumerator might go beyond the targeted area
if they do not know the boundaries. This feature is absent in both sampling frames under investigation in
this section since there is no boundary (neither digitally nor physically) available for the Census settlements
and electoral precincts. However, it is more disadvantageous for the precinct-based sampling frame because
precincts are relatively smaller in area than settlements. The probability of enumerators walking into the non-
selected PSUs that neighbor with the selected PSU is thus higher for precincts. Enumerators can walk outside
the boundary for some very small electoral precincts, such as the one with 7 addresses, if not provided with
a proper map while doing the fieldwork.

The pros and cons of a frame based on the 2023 electoral precincts show that it is relatively better than the
frame based on the 2011 Census settlements. Thus, the 2023 electoral precincts have been further evaluated
as a sampling frame for representative individual- and household-level surveys by exploring the data on
electoral precincts. The size of the electoral precinct is measured by the number of voters or adult population,
which leaves out children or individuals younger than 18 years old. Table 1 shows the distribution of strata
size using total and adult population. The strata are similarly defined as earlier, i.e., the combination of marz
and settlement type—urban or rural status. The 2020 population is obtained from the WorldPop predicted
based on the 2011 Census (Bondarenko et al., 2020). Then the 2020 population has been predicted further
to 2022 as shown in Column 1 using strata-level population growth obtained from the Committee of the
Republic of Armenia (ArmStat). The estimated total population in 2022 in this data is 2.9 million, close to
the official statistics from ArmStat. Column 2 displays the 2023 number of voters (adult population who are
18 or older) obtained from the electoral precinct-based sampling frame. Column 3 presents the difference
between the population and the adult population in subsequent years. Although these two population

                                                       7
numbers are in different years, there are some irregular instances where the adult population exceeds the
total population by approximately 15,000 people, for example, in rural areas of Ararat.

                         Table 1: Population and number of voters across strata
                                                               Population
                                                                              Number of voters        Difference
                                                             (2022 estimate
                        Settlement                                            (18 or older, 2023   between total and
 Marz name                                  Strata ID        based on 2020
                           type                                                 election data)      adult population
                                                             WorldPop data)
                                                                   (1)               (2)             (3) = (1) - (2)
 Aragatsotn               Urban                 1                26,738            27,847                -1,109
 Aragatsotn               Rural                 2                98,949            82,622                16,327
 Ararat                   Urban                 3                72,294            56,730                15,564
 Ararat                   Rural                 4               136,288            150,805              -14,517
 Armavir                  Urban                 5                82,953            76,205                 6,748
 Armavir                  Rural                 6               183,009            137,823               45,186
 Gegharkunik              Urban                 7                65,902            61,823                 4,079
 Gegharkunik              Rural                 8               161,812            109,803               52,009
 Kotayk                   Urban                 9               137,493            126,680               10,813
 Kotayk                   Rural                10               108,771            94,876                13,895
 Lori                     Urban                11               124,050            134,962              -10,912
 Lori                     Rural                12                87,532            76,157                11,375
 Shirak                   Urban                13               133,620            128,691                4,929
 Shirak                   Rural                14                96,856            77,832                19,024
 Syunik                   Urban                15                90,205            65,044                25,161
 Syunik                   Rural                16                44,350            33,042                11,308
 Tavush                   Urban                17                49,859            39,214                10,645
 Tavush                   Rural                18                69,943            58,907                11,036
 Vayots Dzor              Urban                19                16,160            16,811                 -651
 Vayots Dzor              Rural                20                31,501            25,678                 5,823
 Yerevan                  Urban                21              1,098,866           824,317              274,549
 Armenia                                                       2,917,151          2,405,869             511,282
Notes: The 2023 election data on the number of voters or adult population and other spatial information are obtained
from the CRRC-Armenia. Column (3) presents the difference between the 2022 population updated based on
WorldPop data and the 2023 number of voters or adult population from the election data.

As shown in Column 2 of Table 1, the total number of voters or adult population is 2.4 million, which is
quite close to the total population. It indicates approximately 18% of the population is children below 18
years old. However, children below 18 years of age account for around 23%-24% of Armenia’s population
in other datasets. So, Armenia’s total number of adult populations from different and official data sources
has been investigated to compare the aggregate numbers. Table 2 reports the findings. The share of the
adult population from the 2011 Armenia Census (population 0-17 years old) is 77%, and the adult population
share based on the combination of 2022 World Bank data (population 0-14 years old) and the 2011 Armenia
Census (population 15-17 years old) is 76%, indicating that election data overstates the number of adult
populations by 4%-5%.

Despite these discrepancies, the PSU size measured by the number of adults or voters is not a serious concern
since the total and adult populations across strata or administrative units are strongly and positively correlated
with the correlation coefficient ������������ = 0.996 (������������-value = 0.000).




                                                         8
                                 Table 2: Share of adult population in different datasets
                                     Adult population                Adult population                  Number of voters
                                     (18 or older, 2011   (18 or older, 2022 World Bank data &         (18 or older, 2023
                                     Armenia Census)              2011 Armenia census)                   election data)
                                             (1)                            (2)                                (3)
     Share in total population              77%                            76%                                81%

Notes: The 2011 Census data used in Column 1 reports age-specific population, and the share of the population with
18 or older is shown.

In addition to the absolute value of PSU size, the distribution of the size measure across PSUs is also
important. Figure 1 thus plots the distribution of the 2023 adult population across electoral precincts. It is
ideal to have equally sized PSUs or the PSU size to be evenly distributed in the sample frames, and the
population in census frames tends to be normally distributed with few small and large PSUs. However, the
distribution of voters across electoral precincts is U-shaped. The precinct size ranges from 10 to 2,061 voters,
with the mean (median) size of 1,208 (1,399), confirming that merging and segmentation are necessary to
make the electoral precinct-based frame workable, consistent with the distribution of households in
Pettersson (2023).

     Figure 1: The distribution of voters or adult population across electoral precincts in Armenia in
                                                    2023




Finally, no other major and official surveys use electoral precincts for sampling frame purposes in Armenia.
Major surveys such as the Demographic and Health Survey (DHS) for Armenia leverage sample frames based
on enumeration areas. 12 Table 3 lists the sampling frames of the major surveys in Armenia. The lack of a




12   Refer to Appendix A of Armenia’s DHS                 2015-2016   Report   for   details   about   its   sample   design
(https://dhsprogram.com/pubs/pdf/FR325/FR325.pdf).


                                                            9
usable and accessible sample frame based on enumeration areas demands a workable and up-to-date national
sampling frame in the country. Thus, given the unsuitability of the existing sample frames, this paper
develops a national sampling frame based on pre-defined enumeration areas (pre-EAs) in Armenia that
conceptually prevails over the existing sampling frames evaluated in this section.

                           Table 3: Sampling frames of major surveys in Armenia
 Surveys                                                              PSUs in the sampling frames
 Demographic and Health Survey (DHS)                Enumeration areas in the Armenia Population and Housing Census
 Integrated Living Conditions Survey (ILCS)                      Population census enumeration areas
 UNICEF Multiple Indicator Cluster Survey                   Currently in search of a suitable sampling frame
     (MICS, first ever in Armenia)
Notes: The PSUs stand for primary sampling units.


3 Data and Methods

3.1 Input Datasets

This section describes the datasets obtained from different sources to establish the national sampling frame
and support collecting field data in Armenia.

3.1.1 Gridded Population

The gridded population for Armenia is obtained from WorldPop (Bondarenko et al., 2020) and is based on
the 2020 population census or projection-based estimates for 2020. The population data contain an estimated
total number of people per grid cell (panel (a) Figure 2). The data at a resolution of 3 arcs (about 100 meters
at the equator) can be downloaded in Geotiff format with the projection of Geographic Coordinate System,
WGS84. The estimated number of people is in units of one pixel. The values marked with “NoData” denote
regions mapped as unpopulated according to the results of the Built-Settlement Growth Model (BSGM)
developed by Nieves et al. (2020). The WorldPop gridded population dataset was produced by disaggregating
the projected subnational population totals into grid cells using machine learning techniques and a variety of
geospatial layers derived from satellite imagery (Stevens et al., 2015).

3.1.2 Digitized Features Visible from the Ground

The outline of enumeration areas (EAs) should be positioned in line with prominent visible ground features
to facilitate ground-based data collection. Extensive digitized visible ground features are needed to ensure
that the outline of the pre-EA meets the criteria during the automatic creation of the national sample frame
in Armenia. Thus, extensive digital features, both natural and man-created, mostly from OpenStreetMap
(OSM) are compiled. Panel (b) of Figure 2 illustrates information obtained from the OSM data, including
road networks, waterways, and railways. This data provides modifiable and updatable inputs to enable
multiple iterations of EA generation. The figure depicts the input datasets where the two datasets have not
been combined yet. The subsequent figures show how the input datasets have been used to divide the country
with their estimated total population.




                                                          10
                     Figure 2: Population estimates and digitized visible ground features




Notes: Panel (a) shows the gridded population estimates at ∼100×100 meters, while panel (b) depicts the digitized
visible ground features from OSM. Basemap: ESRI Satellite Imagery.

3.1.3 Settlement Boundary and Classes

Determining the precise location and boundary of each settlement is essential for creating pre-EAs and
providing field direction. The settlement boundaries can facilitate identifying the outline of settled areas and
preventing the mixing of pre-EAs among large and settled areas. Thus, the settlement boundary has been
created and the settlements have been classified into urban and rural classes. First, using the Global Human
Settlement Layers (GHSL) product, settlements across the country are identified and converted to a vector
layer to generate the settlement boundary. Then, the settlements are classified into urban and rural classes
with the help of GHSL classes. The GHSL provides various settlement information in different spatial and
temporal resolutions with various informative classes. 13 The following two datasets are acquired from GHSL
for settlement delineation.

GHS-BUILT-S R2023A: The spatial raster dataset known as GHS-BUILT-S illustrates the distribution of
built-up (BU) surface estimates in five-year intervals between 1975 and 2030, together with two functional
use components: the overall BU surface and the non-residential (NRES) BU surface (Schiavina et al., 2023).
Panel (a) of Figure 3 displays the data, which is created by spatially and temporally interpolating five
observed sets of multiple-sensor and multi-platform satellite images (Landsat and Sentinel 2).

GHS-SMOD R2023A: Settlements have been delineated and classified globally by the GHS Settlement
Model layers (GHS-SMOD) using the logic of cell cluster population size, population, and built-up area



13   https://human-settlement.emergency.copernicus.eu/index.php


                                                            11
densities as recommended by the United Nations Statistical Commission and defined according to stage I of
the Degree of Urbanization (Eurostat, 2021). The built-up surface, land layer, and a 1km2 population spatial
raster dataset are the inputs for the GHSL SMOD. As shown in panel (b) of Figure 3, the GHS SMOD
classifies the 1km2 grid cell at the first hierarchical level into three spatial entities, including “urban center”,
“urban cluster”, and “rural grid cells” (Schiavina et al., 2023).

                                  Figure 3: Global human settlement layer




Notes: Panel (a) presents the built-up area, and panel (b) shows the SMOD classes. Basemap: ESRI Satellite Imagery.

3.1.4 Administrative Boundary

It is necessary to nest the produced enumeration area (EA) boundary inside administrative borders. The
generated EA boundary should be nested within administrative boundaries. The HDX website
(https://data.humdata.org/) provided the necessary administrative border, which is based on the 2011
census, for Armenia.

3.2 Semi-automatic Creation of pre-EAs

This section describes the semi-automated process employed to generate the first digital national sampling
frame in Armenia. The national sampling frame is automatically created, although some manual adjustments

                                                        12
may be required to improve the outputs due to inadequate and poor-quality input datasets. The process is
categorized into three main parts presented in the following sections.

3.2.1 Urban and Rural Classification

People living in and around cities are referred to as the urban population, which usually has a fairly high
population density. In contrast, a rural population is dispersed over vast areas of land and is found in
developing regions. The overall population and geographic area are primary determinants of the size of the
enumeration areas (EAs). A balance between maximum population and area constraints is needed to produce
EAs of manageable size. Because of the disparities in population density and distribution between urban and
rural administrative units, different criteria should be applied while forming EAs.

Armenia does not have well-defined boundaries to separate urban and rural areas. The following approaches
have been used to establish the border between Armenia’s urban and rural classes as a first step in developing
the national sampling frame:
    1. Take the GHS-SMOD dataset and extract all urban classes; combine them into a single class.
    2. Convert the raster dataset from the combined classes to polygon vector format. These polygons
       represent the urban area.
    3. Extract the built-up area with values greater than 0 from the GHS-BUILT-S data.
    4. Create polygons from the raster built-up area. The country’s settlement boundaries and extent are
       represented in this output.
    5. Apply a 50-meter buffer to output 4 to account for recent urban growth and prevent cutting
       structures at the edge of the settlements.
    6. Intersect output 5 with the administrative boundary in Armenia.
    7. Make the necessary manual and automatic adjustments when necessary. For example, an
       administrative boundary should be regarded as urban if more than 90% of areas are urban (e.g.,
       Yerevan).
    8. Any polygon in output 7 that crosses the output 2 boundary is an urban area. The rural class
       encompasses the remaining portion of the administrative boundaries.

3.2.2 The preEA Tool

The “preEA” is a powerful and flexible package that has been developed by WorldPop in close collaboration
with GeoData and consultation with various governments, UN agencies, and experts around the world
(Qader et al., 2021, 2023). Since its development is ongoing, the tool is not yet available to the public. It has
been designed as a user-friendly QGIS plugin using the Python programming language. The implementation
of the preEA tool is fully automatic; however, some preparation processing steps should be taken before
using the tool. These processes will be elucidated below.

Input Data Preparation: The data preparation consists of three steps. First, the input boundary datasets
are reprojected into the projected UTM WGS 1984 coordinate system. The preEA tool is compatible with
such projection to ensure that the output units are in familiar area units such as meters or feet. Second, the
digitized boundary (roads, waterways, and railways) is masked by the extent of the administrative boundary.
Third, to prevent the creation of sliver polygons in the outputs, double lines (such as motorways) are merged

                                                       13
in the road datasets. It was accomplished by utilizing a 25-meter distance on Merge Divided Road in ArcGIS
Pro. A single line will be created from any roads that fall within 25 meters of each other and have the same
road code or class (Figure 4).

                                     Figure 4: Preparation of road dataset




Notes: Panel (a) shows the original road data, while panel (b) presents the road data after applying the Merge divided
road technique. Basemap: ESRI Satellite Imagery.

Create Uncrossable Features: Enumerators should avoid crossing major obstacles when collecting data
on the ground to increase efficiency and save costs. For this purpose, different features were set uncrossable:

(1) Uncrossable lines: In the OSM line features, the name class of the digitized features is recorded in the
“fclass” column. After a considerable visual assessment, road classes such as primary, secondary, trunk, and
tertiary, and waterway classes such as river were extracted, combined, and defined as uncrossable lines.

(2) Uncrossable settlement boundary: It is always preferred if enumeration areas (EAs) from different cities, towns,
or villages are kept apart from one another. Three procedures are used to establish uncrossable settlement
boundaries to prevent this. First, the Zonal Stat technique was used to summarize the total population in
each settlement polygon that was created in section 3.2.1. Second, a visual inspection approach was utilized
to identify the minimum population threshold to define the uncrossable settlement boundaries.
Consequently, all settlement boundaries with a population of more than 200 were declared uncrossable.
Third, the polygons containing more than 200 people were transformed into lines and merged with additional
uncrossable lines to get the final uncrossable features.

Implementing the preEA Tool: In the first part of the process, the preEA tool divides the region into
small geographic units by polygonizing all the input feature datasets. The user needs to define the hard
constraints, which include maximum population and maximum area. Apart from the hard constraints on
population and area, it is necessary to input administrative boundaries and uncrossable features, which the
pre-EA boundaries formed must adhere to. In addition, the user should define soft constraints, including
minimum length shared boundaries and different weighting coefficients, to guarantee that pre-EAs are
created that meet user expectations and global standards. Following the establishment of all the parameters,


                                                         14
the small geographic units will be merged until one of the hard constraints is satisfied. The output is in vector
format, and each generated pre-EA has necessary attribute information from the administrative boundary,
total estimated population, area, and unique IDs (Random IDs and ID numbering using serpentine
technique). Table 4 presents the parameter calibration used in the preEA tool. The user can adjust the
merging process’s priority parameters. For instance, when the population in the tool is given more weight,
the created pre-EAs’ total population will be homogeneous and near the maximum population threshold. In
contrast, raising the shape factor coefficient can achieve a more compact shape.

                                Table 4: Parameter calibration in the preEA tool
                                                           Minimum
                      Maximum                                                  Area             Population       Share factor
      Class                           Maximum area       length shared
                      population                                            (coefficient)      (coefficient)     (coefficient)
                                                           boundaries
 Urban                   1000             10km2              20%                  0                 1                  2
 Rural                   800              10km2              20%                  0                 1                  2


Post-Processing: The quality of the input datasets primarily undermines the quality of the outputs from the
preEA tool. Some of the generated pre-EA outputs need to be assessed and modified because of the
limitation of the input datasets and the imposing of various constraints in the tool. This is also the primary
rationale behind the output’s designation as pre-EA rather than EA. For example, pre-EAs with negligible
populations and geographic areas were automatically merged with their neighbor via employing the
“Eliminate Small Area” tool in the preEA tool package. Manual modifications were applied to pre-EAs
spanning large geographic areas with high populations.

3.3 Further Manual Adjustments

The national sampling frame created using the automatic approach requires additional modifications for
several reasons. First, some adjustments are needed in areas where the existing datasets were inadequate to
create smaller EAs. Second, the output contains several polygons with positive populations but no settlement
of residents. The pre-EAs with positive populations that clearly fall within non-residential areas such as
offices, buildings under construction, 14 recreational centers, factories, manufacturing zones, and agricultural
lands are manually adjusted.

Appendix B discusses the use of boundary information produced by the approach for creating field and
offline maps to supplement the survey data collection process.

4 Results: A National Sampling Frame – Pre-EAs for Armenia

This section presents the sampling frame in which the pre-census enumeration areas (pre-EAs) serve as
primary sampling units (PSUs).




14 Note that the judgments are based on satellite images from Google Street Maps available at the time of this study, i.e., April

2024. So, settlements and residential buildings recently built might have been overlooked in the construction of this sampling
frame. However, these measurement errors are more likely to be random, so this limitation should not introduce systematic bias
in the frame.


                                                               15
4.1 Urban and Rural Classification

Armenia’s first accessible and usable urban and rural digitized border has been produced in this paper
(Figure 5). According to the generated boundary and WorldPop gridded population data for 2020, urban
areas account for 20% of the land area, while rural areas cover the remaining 80%. In terms of population,
more than 60% of Armenians live in urban areas, with the remaining less than 40 residing in rural areas.

                         Figure 5: Generated urban and rural areas in Armenia




              Notes: The map shows areas classified as urban areas in red and rural areas in yellow
              and how those are nested within the subregion boundaries in Armenia
              (https://data.humdata.org/dataset/geoboundaries-admin-boundaries-for-armenia).
              Basemap: OSM Standard.


4.2 Population Estimates

A critical ingredient in creating EAs and sampling frames is the availability of granular information on the
number and location of the population, which helps to set EA size at a suitable level for enumerators to


                                                        16
cover the area in a reasonable timeframe. The grid-level population estimates are obtained from WorldPop
data, which provides population density estimated using a global high-resolution population model
(WorldPop, 2024). The population count is available from the 2011 Census data from the Committee of the
Republic of Armenia (ArmStat) with limited spatial information, including marz and settlement types.
Despite the limited spatial information in the census data, the population estimates from WorldPop have
been compared with the actual population count at the aggregate level to examine the accuracy of the
estimated population. The population distribution at the marz and urban/rural levels are highly comparable.
The correlation coefficient between populations from the two sources is 0.99 (SE: 0.05, p-value: 0.00) for
marz populations and 1 for urban/rural populations. Figure 6 compares the “urban” 15 population at the
marz level. Figure 7 compares the “total” population at the urban/rural level. These suggest that the
population estimates from the WorldPop data are consistent with the actual population, at least at the marz
and urban/rural levels. More importantly, the automatic creation of urban and rural boundaries has produced
an accurate classification.

                                   Figure 6: Urban population in marzes , 2011
                   (a) Absolute number                                              (b) Share in total




                                   Figure 7: Urban and rural population, 2011
                   (a) Absolute number                                              (b) Share in total




15The “total” population was not compared because marz names were missing for rural settlements in the WorldPop data used
for this analysis. However, there is hardly a reason for rural populations to vary significantly across the two data sources.


                                                             17
4.3 Armenia’s National Sampling Frame

Figure 8 illustrates the pre-EAs created in this paper using the method described in Section 3, showing that
the pre-EAs cover the entire territory and the boundaries are drawn without any geometric mistakes. In the
first stage of automatic production of the pre-EA, the preEA tool produced 130,378 building blocks (panel
(a)). After the merging process, 7,413 pre-EAs were delineated for the entire Armenia of which 3,813 were
in urban and 3,600 in rural areas (panel (b)). Nearly 60% of the pre-EAs, specifically 4,354, have a population
strictly greater than zero after the manual adjustments, most of which, have an estimated population of
between 100 and 1,000 people. The remaining 3,059 pre-EAs are unsettled, i.e., their population is zero as
described in Section 3. Thus, although the map covers the entire country, approximately 41% of the area in
the map would not be accounted for in the sampling designs because the probability of being selected is zero
for empty PSUs. However, all individuals are included.

                                         Figure 8: PreEA tool outputs




Notes: Panel (a) depicts building blocks before merging, while panel (b) shows pre-EA outputs after merging. Basemap:
OSM Standard.

Next, the PSU size is characterized to examine the sampling frame and analyze the population distribution
across pre-EAs, focusing on those with a positive population. Figure 9 plots the distribution of 2022
population estimates across pre-EAs in the country.




                                                         18
                    Figure 9: Distribution of population across pre-EAs in Armenia




Notes: The figure presents the distribution of population across pre-defined enumeration areas (pre-EAs) in 2022. The
extreme values of the population have been winsorized at the 1st and 99th percentiles to account for the potential
outliers, i.e., we set the low (high) values at the 1st percentile (99th percentile).

The population distribution is fairly normal: 90% of the PSUs have a population below 1,156, and the
remaining 10% have a population between 1,156 and 2,274 after winsorizing at the 1st and 99th percentiles
to avoid potential outliers. The population of the pre-EA is not winsorized in the sample frame, and it is our
suggestion to adjust those extreme values as the population might have been overestimated in those areas.
However, the users of this sampling frame can make their judgments regarding such values. Despite some
outliers at the top part of the distribution, the population across pre-EA is relatively more evenly distributed
than the adult population across electoral precincts, which displays U-shaped patterns as shown in Figure 1.

Figure 10 depicts examples of pre-EAs with positive populations that fall within non-residential areas. Since
these pre-EAs do not contain residents, the frame is adjusted by setting the population estimates of those
pre-EAs to zero. Before this modification, there were 1,601 pre-EAs already with zero populations, and the
population of 1,458 additional pre-EAs was changed to zero. Thus, a total of 3,059 pre-EAs are considered
as unsettled areas. After setting population estimates to zero for these pre-EAs, the population estimates of
the remaining pre-EAs with positive populations have been adjusted by multiplying them with a strata-level
factor to match the strata-level population counts.




                                                         19
  Figure 10: Examples of pre-EAs that are located in non-residential areas yet have a population
                                       greater than zero




Notes: Basemap: ESRI Satellite Imagery.

Certain international guidelines must be followed during the development of pre-EA boundaries, as shown
in Figure 11. Within the specified administrative limits, the pre-EA boundaries must be nested (panel (a)).
Rivers and major roads are examples of uncrossable features that the pre-EA outlines must respect (panel
(b)). The pre-EA boundaries should be aligned with ground-visible features such as roads and infrastructures.
Panel (c) shows examples of the pre-EA boundaries in urban areas, whereas the pre-EA boundaries in rural




                                                     20
areas are shown in panel (d). The figures demonstrate how well the pre-EA’s outlines followed the discernible
ground features. 16

                                  Figure 11: The outline of pre-EA boundaries




Notes: Basemap: OSM Standard and ESRI Satellite Imagery.

4.4 The First Application: Listening to Armenia

This section presents an application of the proposed national sampling frame based on pre-EAs in Armenia,
specifically describing the sampling design of the World Bank Group’s “Listening to Armenia” survey
(L2Arm). The application of the proposed sampling frame in the L2Arm survey has proven to be a successful
approach. It allowed for a nationally representative sample that met the objectives of capturing urban and
rural distinctions across all regions of Armenia. The innovative use of pre-EAs optimizes resource allocation
and ensures the feasibility of the survey within time and budget constraints.




16There are no boundaries of electoral precincts and Census settlements in Armenia, which is one of the limitations of those
potential sampling frames in the country, as discussed in Section 2.


                                                            21
4.4.1 Overview of the Survey

Listening to South Caucasus (L2SC) is an expansion of a collaborative effort that has been conducted in
multiple countries in the Europe and Central Asian region. This initiative aims to comprehensively monitor
the views and well-being of a representative group of people as the government introduces social and
economic reforms that affect every business and citizen. By reflecting on the experience of this group over
the years, the study provides an up-to-date understanding of how policies reflect on people’s daily lives. The
study comprises a nationally representative baseline survey and a high-frequency panel survey of a subset of
the baseline participant households. The information collected through the L2SC initiative informs reform
efforts directly by raising the profile of citizens’ views and enabling in-depth economic analysis.

While the L2SC survey covers Armenia and Georgia, this paper focuses on the baseline survey in Armenia—
Listening to Armenia (L2Arm)—where the new national sampling frame based on pre-EAs has been
proposed. 17

4.4.2 Sampling Design

The sampling design optimizes the spatial allocation of the household sample to provide valid
representativeness at the national level for both urban and rural areas. A two-stage stratified cluster sampling
design is employed to select participating households, ensuring a balanced sample distribution across
regions 18 and accounting for differences between urban and rural areas, survey budgets, and discrepancies in
population estimates. The L2Arm survey’s implementation highlighted the robustness of the sampling frame,
as it successfully captured the population distribution across diverse geographic and demographic strata. The
use of probability proportional to size (PPS) sampling ensured that the selection process was equitable and
aligned with population estimates, further validating the practicality of the proposed approach.

In the first stage, a certain number of primary sampling units (PSUs) will be selected in each urban and rural
stratum (urban and rural areas within each administrative region). In the second stage, the ultimate sampling
units or the secondary sampling units (SSUs)—households in the case of L2SC—are randomly selected
within each PSU. The survey is then implemented among the selected households. Given that our focus in
this paper is on the first stage of the two-stage procedure, the sampling frame used for the survey in the first
stage is highlighted.

Sampling Frame: As mentioned, the sampling frame is based on pre-census enumeration areas (pre-EAs)
providing the most accurate information on the geographic distribution of the population across Armenia.
This allows for the most precise formulation of a sample design.

Sample Size: The objective of any sample design is to achieve the highest precision in indicators of interest
given survey parameters. The sampling design aims to efficiently allocate the given PSUs across strata and




17 The Listening to Georgia (L2Geo) leverages the national sampling frame based on the 2014 Census with population updated

by 2022 and 9,514 enumeration areas as PSUs. The preference to use comparable frames in Georgia and Armenia was one of the
motivations for developing a new national sampling frame in Armenia based on pre-EAs in this paper.
18 A balanced sample refers to a sample where the means of covariates are equal to the population means of those covariates,

ensuring that the sample is representative of the population of interest (Brown et al., 2015).


                                                            22
the households across PSUs. For L2Arm, 400 PSUs are allocated across strata proportional to the
population 19 (i.e., implicit allocation) with some adjustments. Ten households are targeted within each PSU. 20
Table 5 presents the proposed baseline sample allocation.

In the first stage of the two-stage stratified cluster sampling design, PSUs in each stratum are randomly
selected using systematic random sampling with probability proportional to size (PPS), size being the
estimated population of the pre-EA. This method assigns each PSU’s likelihood of selection based on the
PSU’s size within the stratum. Population size, rather than the number of households, is used due to a lack
of data on the number of households at the PSU level in the sampling frame. Thus, each PSU’s likelihood
of selection corresponds to the percentage of the stratum population residing in the PSU. In the second
stage, a set number of households are randomly selected from each chosen PSU. 21

            Table 5: Baseline sample design based on proportional sample allocation (2022)
                                                                                               Baseline
                       Stratum                          Estimated         Allocated        Target N of HHs
                                                                                                              Number of HHs
                                                      Population Size   Number of PSUs         PSU size
 Yerevan                                                1,098,866             147                 10                1470
 Aragatsotn - Urban                                       26,738               4                  10                 40
 Aragatsotn - Rural                                       98,949               13                 10                 130
 Ararat - Urban                                           72,294               10                 10                 100
 Ararat - Rural                                          186,983               25                 10                 250
 Armavir - Urban                                          82,953               11                 10                 110
 Armavir - Rural                                         183,703               25                 10                 250
 Gegharkunik - Urban                                      65,902               9                  10                 90
 Gegharkunik - Rural                                     162,809               22                 10                 220
 Kotayk - Urban                                          137,493               18                 10                 180
 Kotayk - Rural                                          116,364               16                 10                 160
 Lori - Urban                                            124,050               17                 10                 170
 Lori - Rural                                             87,532               12                 10                 120
 Shirak - Urban                                          133,620               18                 10                 180
 Shirak - Rural                                           96,856               13                 10                 130
 Syunik - Urban                                           90,205               12                 10                 120
 Syunik - Rural                                           44,350               6                  10                 60
 Tavush - Urban                                           49,859               7                  10                 70
 Tavush - Rural                                           69,943               9                  10                 90
 Vayots Dzor - Urban                                      16,160               2                  10                 20
 Vayots Dzor - Rural                                      31,501               4                  10                 40
 Total                                                  2,977,130             400                                   4000

Notes: The population estimates at the stratum level are by the end of 2022 and match with the population statistics
from the Statistical Committee of the Republic of Armenia. Refer to https://armstat.am/file/doc/99538403.xlsx for
the urban population and https://armstat.am/file/doc/99538413.xlsx for the rural population.




19 It is preferred to use total number of households as the size. However, there is no available information on the number of
households in each pre-EA.
20 Although 400 PSUs and 10 households in each PSU, with a total target sample of 4,000 households, are targeted, the sample

size can be smaller than the target if the total number of households in the selected PSU is less than the target or due to non-
responsiveness.
21 Appendix C describes the calculation of survey weights.




                                                               23
The approach proposed in this paper can be applied to other countries across the Europe and Central Asia
(ECA) region and beyond. The method can help survey implementation in countries where a reliable national
sampling frame is unavailable which often causes the World Bank, UN, and other international organizations
to face critical challenges in preparing socioeconomic surveys. In those countries, the National Statistical
Offices (NSOs) are often unwilling to share their census sampling frame with international organizations
although the sample frame might be available. Even if the census sampling frame is available, census EAs
are often large and cover large population areas which require substantial resources and time to conduct the
household listing. The resource and time-efficient approach we propose thus can save a significant amount
of budget, which could be invested in other areas, in all sampling stages.

5 Discussion

The national sampling frame based on pre-EAs provides several benefits that the existing and accessible
potential sampling frames cannot offer. However, it could also introduce some practical challenges and
methodological limitations. This section thus now discusses the additional benefits and potential concerns
and proposes alternative solutions that have been previously tested to resolve some of those challenges in
other developing countries where pre-EAs have been employed, such as Somalia (Qader et al., 2021) and the
Democratic Republic of the Congo (Qader et al., 2023).

Population Estimates as a Measure of EA Size: The main challenge of using the proposed national
sampling frame for household surveys is that the size of the pre-EA is measured by population estimates,
which can differ from the actual population and thus introduce bias in the probability of EAs being selected
in the first stage of the two-stage design.

Although it has been made clear that this study does not address the validation of the gridded population
estimates, it is important to provide some clarity to understand the limitations of the data. The automatic
creation of a national sampling frame requires granular information about the population to ensure that the
output sampling units are manageable. This level of granularity does not exist in the existing census data in
Armenia. Instead, gridded population data is leveraged. In developing countries, several gridded population
datasets with various spatial resolutions are accessible, including Gridded Population of the World (GPWv4)
(CIESIN, 2016), WorldPop (Bondarenko et al., 2020), High-Resolution Settlement Layer (HRSL) (Facebook
and CIESIN, 2016), Demobase Population datasets (Azar et al., 2013), Global Human Settlement Population
Grid (GHS-POP) (JRC, 2015), Global Rural-Urban Mapping Project (GRUMP) (CIESIN, 2011) and
LandScan (Dobson et al., 2000). The quality of the input data model, which includes census data, satellite-
derived covariates, and statistical model selection, is primarily responsible for the accuracy and quality of the
gridded population data. The WorldPop-constrained gridded population data for Armenia in 2020 has been
used to create the national sampling frame. It was WorldPop’s most recent gridded population available. It
implies that there may be notable differences between the population’s size and distribution in 2020 and
today. Since the user can update the national sampling frame’s population based on their desired population
such as population registry, this should not be a major concern.

This paper established an accessible and usable urban and rural classification in Armenia for the first time,
and it fits within the concept of a national sampling frame. The output might not reflect the reality, despite
a significant correlation between the aggregated population estimates from the census and gridded
population estimates for urban and rural areas, as discussed in Section 5. It is primarily because the GHSL
SMOD divides the world into urban and rural classes using gridded populations and built-up areas developed
using various data sources. The algorithms used to generate the input data for both datasets and the date of
the satellite imagery from which the covariates are extracted introduce certain biases. The severity of bias

                                                       24
due to using population estimates as PSU size depends on the size of the discrepancy between the actual
population and population estimates and whether the difference is systematic.

Our findings clearly show that the pre-EAs’ total populations vary, ranging from zero to a certain population
size. In the preEA tool, the user has the option to define various constraints. The maximums of population
and geographic areas are the two main hard constraints. These maximum thresholds may differ depending
on the goal of the work or from one country to another. The primary goal of establishing maximum
thresholds for both population and area is to strike a balance between the two limits to prevent the creation
of unmanageable pre-EAs in areas with sparse populations. When one of these constraints is met, the
aggregation will cease during the merging procedure. The size of pre-EAs in uninhabited areas (indicated by
gridded population areas) is simply determined by the maximum geographic area; if the maximum area is
reached, the aggregation will cease. As a result, several preEAs with zero or low population values might
have been created. The main benefit of this strategy is that it eliminates the need to include uninhabited areas
in your sampling surveys, which will result in significant time and cost savings. However, as the method
primarily depends on gridded population data, a few inhabited areas may be overlooked if the data is
unreliable. This can be resolved in the tool by ignoring the geographic area constraint, but doing so creates
enormous pre-EAs that may be difficult to enumerate, especially in rural areas.

Model estimates of gridded population data can be enhanced using a reliable approach and sufficient
resources and data to identify non-residential buildings. Despite significant efforts, this still presents a
challenge because of the complexity of the problems (such as the structures' resemblance and the coexistence
of residential and commercial tenants in the same building). As a result, non-residential areas are not excluded
in the population prediction of several gridded population datasets. Consequently, certain pre-EAs that are
located within non-residential areas exhibit positive population values.

It should be mentioned that when the pre-EAs were verified against high-resolution satellite imagery base
maps from ESRI and Google, many of these validations were carried out by observation. Since the dates of
the satellite imagery were not considered, these evaluations might not be impartial. Therefore, without a
comprehensive ground check, these assumptions cannot be verified.

Digitized Elements and Boundaries: Digitalized elements, both natural and man-made, are essential for
the automatic generation of the national sample frame. The method in this paper made use of
OpenStreetMap’s (OSM) extensive digital line availability on roads, railways, and waterways. However, the
pre-EAs that were created exceeded the specified thresholds, such as population and geographic area,
because of the poor quality and incomplete spatial coverage of the existing digitized boundaries. The primary
causes of this are (i) incomplete and (ii) disconnected lines. If any of the natural and artificial features still
need to be fully digitized, more work must be done either manually or automatically. A line should never be
left open, and it should always be connected to other features as much as feasible. This is because
disconnected lines will not be polygonized during the polygonization process, which leads to the creation of
large pre-EAs. In addition to spatial coverage, the quality of the OSM attribute data is critical for the
automatic creation of the national sample frame. Several line features, including major roads, rivers, and
other impediments, were designated as uncrossable to facilitate the collecting of ground data and increase
efficiency. The sole source that can determine the kinds of features on the ground is the attribute information.
An inaccurate feature class in the attribute table may result in the national sampling frame’s uncrossable
features being set incorrectly.

The semi-automatic approach creates pre-EA with digitized visible ground features that are unlikely to
intersect with buildings or other structures. But occasionally, this might happen. The administrative boundary


                                                       25
or the poorly entered visible digital lines may be the cause for this. Given that administrative boundaries
cannot be changed or modified without consulting with pertinent government agencies, the user should
exercise caution when determining the reasons behind cutting buildings and other structures. Figure 12, for
example, shows a pre-EA output with an outline that cuts buildings. This is due to the administrative
boundary of the municipality and thus cannot be modified.

                        Figure 12: An example of pre-EA boundary-cutting buildings




This method has solely utilized publicly accessible natural and man-made features, and settlement
boundaries, such as OSM and GHSL, for reproducibility and worldwide application. Nonetheless, several
government agencies can provide input datasets such as roads and waterways with higher quality and greater
geographic coverage. In addition, future studies could also investigate leveraging the more modern and
comprehensive commercial road network (Strano et al., 2017). If inadequate spatial coverage is a major
concern, an alternative approach would be to use “mapathons”—a coordinated mapping event—to enhance
the current open-source data on roads and rivers before implementing this method.

One of the main challenges for collecting high-quality surveys in Armenia was the lack of boundaries for
PSUs. To the best of our knowledge, it is questionable whether there is any digital map of census enumeration
areas at the Statistical Committee of the Republic of Armenia (ArmStat). 22 Thus, this paper creates the first




22Although it is hardly accessible, the ArmStat has a physical map of census enumeration areas. It is however uncertain if there is
a digital version and what information is embedded in the map.


                                                                26
accessible digitized national sampling frame for the country. The boundaries of the PSUs and other
administrative units in our sampling frame offer several benefits. For example, it helps to avoid potential
mistakes of households outside of the selected to be listed and covered in the survey data. If this potential
mistake is not random, it might severely undermine the quality of the subsequent analysis using the collected
data. However, this error could be non-random and more systematic as this mistake can be more frequently
made for PSUs with larger areas and thus longer boundaries. So, this feature could play a critical role from
the quality control perspective. 23

It may be challenging to directly compare the resources and budget of our automatic method to manual
approaches because many countries do not release a breakdown of the costs associated with conducting
censuses at different phases, especially when it comes to the resources needed to manually digitize national
census enumeration areas. For example, the 2010 census mapping effort in Zambia was projected to cost US
$7 million and take almost two years to complete (United Nations Secretariat, 2007). If the pre-census sample
frame in Armenia had been digitalized manually, we would have needed to dedicate significant financial
resources to the extensive training of a team of cartographers to teach them how to digitize all the units on
high-resolution satellite imagery. In addition, all pre-enumeration areas were to be manually digitized by the
team according to all specified requirements. It would have taken a lot of work to verify quality and correct
several geometric errors because this was hand-drawn. The entire procedure is obviously time- and resource-
consuming. Nonetheless, it took less than two months to complete the automatic creation of the pre-census
sampling frame in Armenia, including the manual correction required due to a lack of spatial input data. One
specialist carried out the entire procedure. The significant labor, time, and cost savings achieved by
developing an automatic pre-census sample frame could be invested in various aspects of national surveys
and census preparation.

Despite limitations, the method was successfully implemented, and a national sampling frame was produced
for Armenia. From financial, timeframe, and technological perspectives, the method performed better than
the conventional manual techniques. Delineating a nationwide sampling frame by hand has historically
required years and massive financial resources. In addition, the manual method is prone to several geometric
issues, including gaps, overlaps, pockets, and disjoints since it is carried out by humans. These geometric
problems could lead to bias being introduced into the frame and eventually into the collected data.
Nevertheless, there are no geometric problems with the automatic method. This approach has several
benefits over the gridded population sampling frame. Several researchers have directly employed gridded
population data as a sample frame (Thomson et al., 2017; Cajka et al., 2018; Qader et al., 2020). The design
of the sampling units is the primary distinction between our approach and the gridded population sampling
frame. Houses and other structures are cut because the grid’s outline is not in line with features that can be
seen on the ground. In contrast, the preEA tool creates pre-EA boundaries that adhere to observable
elements like natural borders and roads.




23 A digital map of the enumeration areas also provides a significant amount of spatial information, such as settlement areas with
residents within the PSU and their centroids, which can be used, for example, as starting points for random walks. If the household
listing is possible for each pre-EA and all household addresses are available, a random walk procedure and thus starting point
information are unnecessary. However, complete household listing is not available or hard to obtain in some developing countries,
and the household listing operation—visiting each of the selected PSUs and listing all residential households in those PSUs—is
costly and time-consuming. So, although not encouraged, some surveys employ random walks as a last resort to select households
in the second stage if subject to pressing constraints, and thus, the digital map supports such a procedure.


                                                                27
Potential Applications of the Method in Different Countries: The lack of a national sample frame poses
serious implementation-stage issues for many household surveys. An up-to-date digitized national sampling
frame may not be available in some countries. The sampling frame might exist in other countries, but NSOs
are unwilling to provide the World Bank and other international agencies access to the data. In other
situations, the frame is based on the census enumeration areas, and because of their large spatial unit,
conducting the second and third stages of sampling will cost a significant amount of money to achieve the
necessary household size. Furthermore, if the sample selection is based on census enumeration areas, the
household listing for the selected sampling units would also need additional time and resources because of
their extensive spatial coverage and huge population totals. Our proposed method and strategy can be used
to create a new national sampling frame if one of the previously described difficulties arises in future surveys
conducted in other countries.

6 Conclusion

This paper develops a new national sampling frame in the Republic of Armenia, serving as an example of
developing nations with limited access to workable sampling frames for representative individual and
household surveys. Specifically, pre-census enumeration areas (pre-EAs) have been created using an
innovative approach for the automatic delineation of a national sampling frame, which offers strengths in
various dimensions over other potential sources of sampling frames, such as those based on old census
enumeration areas, census settlements, electoral precincts, and traditional gridded sampling.

The national sampling frame constructed in this paper divides the country into about 7,500 pre-EAs, most
of which have population estimates strictly greater than zero. These population estimates are most recently
updated and homogeneous, mostly between 100 and 1,000 people. The digitized enumeration areas with
clear boundaries support the household selection, e.g., by enabling to avoid selecting households outside the
selected PSUs.

Several methodological and practical contributions have been made to the survey sampling literature and
entities that collect and leverage representative surveys, such as researchers and policy makers. The paper
first extends the application of the semi-automatic approach for creating national sampling frames by
generating the first digitized frame based on pre-EAs in Armenia as an alternative to traditional approaches
to delineating national sampling frames. Our exercise and analysis on the new national sampling frame
showcase the applicability of pre-EAs in other countries with similar challenges in the sampling frame.
Second, our national sampling frame contributes to the survey collectors and users of household and
individual surveys in Armenia by providing a standardized and decentralized frame. Third, the existing
sample frames in the country have been systematically evaluated in comparison against the proposed frame
by discussing their strengths and limitations. This suggests that our frame complements the existing potential
sampling frames and can serve as an alternative.

The paper concludes with a caveat and a direction for future research. A new and workable national sampling
frame has been proposed in this paper, resolving a challenge often faced by survey conductors in the first
stage of the two-stage sampling design. However, solutions to the challenges faced in the second stage of
the two-stage design, e.g., strategies to perform household listing, are outside the scope of this paper. So,
future research can explore innovative ways to list the households in the country, for example, when using
the sampling frame proposed in this paper.




                                                      28
References

ArmStat. 2022. “The Necessity of Population Census Conduction and Ensuring Its Legal Basis.”
 https://www.armstat.am/file/article/introduction.pdf.

ArmStat. 2024. “The Results of 2011 Population Census of the Republic of Armenia (Indicators of the
 Republic of Armenia).” https://armstat.am/en/?nid=532.

Azar, Derek, Ryan Engstrom, Jordan Graesser, and Joshua Comenetz. 2013. “Generation of Fine-Scale
 Population Layers using Multi-Resolution Satellite Imagery and Geospatial Data.” Remote Sensing of
 Environment, 130: 219–232. https://doi.org/10.1016/j.rse.2012.11.022.

Bondarenko, Maksym, David Kerr, Alessandro Sorichetta, and Andrew Tatem. 2020. “Census/Projection-
 Disaggregated Gridded Population Datasets for 189 Countries in 2020 using Built-Settlement Growth
 Model (BSGM) Outputs.” University of Southampton https://dx.doi.org/10.5258/SOTON/WP00684
 [Dataset].

Brown, J. A., B. L. Robertson, and Trent McDonald. 2015. Spatially Balanced Sampling: Application to
  Environmental Surveys. Procedia Environmental Sciences, 27: 6-9.
  https://doi.org/10.1016/j.proenv.2015.07.108.

Cajka, James, Safaa Amer, Jamie Ridenhour, and Justine Allpress. 2018. “Geo-Sampling in Developing
 Nations.” International Journal of Social Research Methodology, 21(6): 729–746.
 https://dx.doi.org/10.1080/13645579.2018.1484989.

Center for International Earth Science Information Network - CIESIN - Columbia University,
 International Food Policy Research Institute - IFPRI, The World Bank, and Centro Internacional de
 Agricultura Tropical - CIAT. 2011. “Global Rural-Urban Mapping Project, Version 1 (GRUMPv1):
 Population Count Grid.” Palisades, New York: NASA Socioeconomic Data and Applications Center
 (SEDAC). https://doi.org/10.7927/H4VT1Q1H.

Center for International Earth Science Information Network - CIESIN - Columbia University. 2016.
 “Gridded Population of the World, Version 4 (GPWv4): Administrative Unit Center Points with
 Population Estimates.” Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC).
 https://doi.org/10.7927/H4F47M2C.

Central Bureau of Statistics of Nepal. 1996. “Nepal Living Standard Survey – 1995: Basic Information
 Document.” https://microdata.worldbank.org/index.php/catalog/2301/download/34437.

Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN -
  Columbia University. 2016. “High Resolution Settlement Layer (HRSL).” Source imagery for HRSL ©
  2016 DigitalGlobe. https://www.ciesin.columbia.edu/data/hrsl/.

Dobson, Jerome E., Edward A. Bright, Phillip R. Coleman, Richard C. Durfee, and Brian A. Worley. 2000.
 “LandScan: A Global Population Database for Estimating Populations at Risk.” Photogrammetric


                                                   29
  Engineering and Remote Sensing, 66(7): 849–857. https://www.asprs.org/wp-
  content/uploads/pers/2000journal/july/2000_jul_849-857.pdf.

Eurostat. 2021. “Applying the Degree of Urbanisation: A Methodological Manual to Define Cities, Towns
 and Rural Areas for International Comparisons.” Publications Office of the European Union.
 https://dx.doi.org/10.2785/706535.

European Commission, Joint Research Centre (JRC). 2015. “GHS-POP R2015A - GHS population grid,
 derived from GPW4, multitemporal (1975, 1990, 2000, 2015) - OBSOLETE RELEASE.” European
 Commission, Joint Research Centre (JRC). https://data.europa.eu/89h/jrc-ghsl-
 ghs_pop_gpw4_globe_r2015a [Dataset].

Grosh, Margaret E., and Paul Glewwe. 1995. “A Guide to Living Standards Measurement Study Surveys
 and their Data Sets.” LSMS Working Paper No. 120.
 https://documents1.worldbank.org/curated/ar/270551468764720584/pdf/multi-page.pdf.

Hansen, Henning S., and Lise Schrøder. 2019. “The Societal Benefits of Open Government Data with
 Particular Emphasis on Geospatial Information.” In Electronic Government and the Information Systems
 Perspective: 8th International Conference, EGOVIS 2019, Linz, Austria, August 26–29, 2019, Proceedings 8. eds.
 by A. Kő, E. Francesconi, G. Anderst-Kotsis, A. Tjoa, and I. Khalil: Springer International Publishing,
 31-44. https://doi.org/10.1007/978-3-030-27523-5_3.

Nieves, Jeremiah J., Maksym Bondarenko, Alessandro Sorichetta, Jessica E. Steele, David Kerr, Alessandra
 Carioli, Forrest R. Stevens, Andrea E. Gaughan, and Andrew J. Tatem. 2020. “Predicting Near-Future
 Built-Settlement Expansion using Relative Changes in Small Area Populations.” Remote Sensing, 12(10):
 1545. https://www.mdpi.com/2072-4292/12/10/1545.

Pettersson, Hans. 2023. “Sample Design and Sample Size for the Armenia MICS 2024.” UNICEF
  Consultancy Report. Unpublished manuscript.

Qader, Sarchil H., Andrew Harfoot, Mathias Kuepie, Edith Darin, Sabrina Juran, Attila Lazar, and Andrew
 Tatem. 2022. “National Automatic preEnumeration Areas (preEAs) in Burkina Faso (2019), Version
 1.0.” WorldPop, University of Southampton.
 https://data.worldpop.org/repo/wopr/BFA/preEAs/v1.0/.

Qader, Sarchil H., Heather Chamberlain, Mathias Kuepie, Freja K. Hunt, Attila Lazar, and Andrew Tatem.
 2023. “Field Testing of Pre-Enumeration Areas Created using Semi-Automated Delineation Approach,
 Democratic Republic of Congo.” WorldPop, University of Southampton.
 https://dx.doi.org/10.5258/SOTON/WP00759.

Qader, Sarchil H., Veronique Lefebvre, Andrew J. Tatem, Utz Pape, Warren Jochem, Kristen Himelein,
 Amy Ninneman, Philip Wolburg, Gonzalo Nunez-Chaim, Linus Bengtsson, et al. 2020. “Using Gridded
 Population and Quadtree Sampling Units to Support Survey Sample Design in Low-Income Settings.”
 International Journal of Health Geographics, 19: 1–16. https://dx.doi.org/10.1186/s12942-020-00205-5.




                                                       30
Qader, Sarchil, Veronique Lefebvre, Andrew Tatem, Utz Pape, Kristen Himelein, Amy Ninneman, Linus
 Bengtsson, and Tomas Bird. 2021. “Semi-Automatic Mapping of Pre-Census Enumeration Areas and
 Population Sampling Frames.” Humanities and Social Sciences Communications, 8(1): 1–14.
 https://dx.doi.org/10.1057/s41599-020-00670-0.

Schiavina, M., M. Melchiorri, M. Pesaresi, P. Politis, S.M. Carneiro Freire, L. Maffenini, P. Florio, D.
  Ehrlich, K. Goch, A. Carioli, J. Uhl, P. Tommasi, and T. Kemper. 2023. “GHSL Data Package 2023.”
  Publications Office of the European Union. https://dx.doi.org/10.2760/098587.

Stevens, Forrest R., Andrea E. Gaughan, Catherine Linard, and Andrew J. Tatem. 2015. “Disaggregating
  Census Data for Population Mapping using Random Forests with Remotely- Sensed and Ancillary Data.”
  PLoS ONE, 10(2): e0107042. https://doi.org/10.1371/journal.pone.0107042.

Strano, Emanuele, Andrea Giometto, Saray Shai, Enrico Bertuzzo, Peter J. Mucha, and Andrea Rinaldo.
  2017. “The Scaling Structure of the Global Road Network.” Royal Society Open Science, 4(10): 170590.
  https://doi.org/10.1098/rsos.170590.

Thomson, Dana R., Forrest R. Stevens, Nick W. Ruktanonchai, Andrew J. Tatem, and Marcia C. Castro.
 2017. “GridSample: An R Package to Generate Household Survey Primary Sampling Units (PSUs) from
 Gridded Population Data.” International Journal of Health Geographics, 16: 1–19.
 https://dx.doi.org/10.1186/s12942-017-0098-4.

United Nations Secretariat. 2007. “Report of the Sub-regional Workshop on Census Cartography and
 Management.” ESA/STAT/AC.144/L.3.

WorldPop. 2024. “WorldPop Data.” https://www.worldpop.org/datacatalog/.




                                                    31
Appendix A: Additional Figures

          Figure A.1: Change in population distribution in an area between 2011 and 2024




Appendix B: Field and Offline Maps

To take advantage of the boundaries and other spatial information, the survey conductor can create field
maps of selected pre-EAs for enumerators to feed their navigation when they collect the survey. Even in the
presence of the field maps, it could still be challenging for enumerators to navigate themselves in the selected
pre-EA especially, when the enumerators are not familiar with the area and cannot eyeball the boundaries
from the information provided in the physical maps, like street addresses and some information about
churches and schools. A potential solution to this problem could be offline maps, which inform the
enumerators of their location live and signal if they overstep outside the selected pre-EAs. In many
developing countries like Armenia, access to the internet is a substantial challenge, especially in rural and
remote areas, so enumerators can benefit from the offline maps that operate well with the minimum
requirement of only access to satellite. So, enumerators should be able to navigate smoothly and avoid the
risk of going beyond the boundaries of selected PSUs unless they are, for example, under the tunnel or
between narrow alleys in the mountain. In total, 400 field paper maps and 400 georeferenced offline maps
were created using QGIS software.

Multiple settled areas are probably present in various pre-EAs. Map definitions of these settled areas may be
helpful for ground navigation. Several actions have been taken to display the settled region on the field maps.
Administrative boundaries with incorporated urban and rural areas were intersected with settlement
boundaries. Zonal Statistic Polygons in QGIS were then used to determine the population sum for the
resultant intersected polygons based on the gridded population data. A point layer was created from the
output. Additionally, the settlement locations’ maximum population values inside each pre-EA were
extracted. The settlement with the highest population numbers is shown by the black circle surrounding
settlement sites on the field map. This can help the enumerator to find the most densely populated area as a
starting point. The X and Y coordinates were calculated for all the settlement points and were shown on the
field map (Figures B.1 and B.2).




                                                      32
Figure B.1: An example of a detailed field paper map in rural areas




Figure B.2: An example of a detailed field paper map in urban areas




                                33
Appendix C: Survey Weights

Sampling weights account for the fact that different members of the population have different probabilities
of being selected for interviews, represent various numbers of people in the overall population, and are
necessary when computing the representative statistics at the level of the domain. Sampling weights are also
adjusted accounting for non-responsive rates given the survey design if required.

The dataset will have two sets of weights, including household and individual weights. The household weights
are the inverse probability of selection of households and are calculated from the following two components
in our two-stage sampling design. The first component is the sampling weight (inverse probability of
selection) of PSU within the stratum, and the second component is the sampling weight of selection of
households within the PSU. For calculating individual weights, the third component, sampling weights of
individuals within the household, is added to compute the household weights. Each of the components is
calculated as follows:

Component 1: The inverse probability of selection of PSU within the stratum by using PPS (Probability
Proportional to Size) is calculated as:

                                                                              1               ������������������������������������������������������������������������������������������������
                                                      ������������������������������������ =
                                                   ������������                                 =                                      ,
                                                                        ������������������������������������������������ ������������������������������������������������ × ������������������������������������������������

where ������������                                                                           ������������������������������������ is the probability of selection of PSU
                   ������������������������������������ is the sampling weight of PSU within the stratum, ������������
within the stratum, ������������������������������������������������ is the number of selected PSUs within the stratum, ������������������������������������������������ is the size of selected PSU,
and ������������������������������������������������������������������������������������������������ is the population of the stratum. The size measure can be the population, the number of
households, the number of electors, or the school attendance, while the number of households would be the
preferred option in most household surveys.

Component 2: The inverse probability of selection of household within PSU is calculated as:

                                                                                       1                  ������������ℎℎ������������������������������������
                                                    ������������ℎℎ������������������������������������ =                             =                         ,
                                                                                ������������ℎℎ������������������������������������        ������������ℎℎ������������������������������������

where ������������ℎℎ������������������������������������ is the sampling weight of household within PSU, ������������ℎℎ������������������������������������ is the probability of selection of
household within PSU, ������������ℎℎ������������������������������������ is the number of sampled (interviewed) households within PSU, and ������������ℎℎ������������������������������������
is the total number of households within PSU.

Component 3: The inverse probability of selection of individual within the household is calculated by:

                                                                         1                   ������������������������������������������������ℎℎ ������������������������������������������������ℎℎ
                                           ������������������������������������������������ℎℎ =                        =                        =                  ,
                                                                  ������������������������������������������������ℎℎ         ������������������������������������������������ℎℎ          1

where ������������������������������������������������ℎℎ is the sampling weight of individual within the household, ������������������������������������������������ℎℎ is the probability of selection
of individual within the household, ������������������������������������������������ℎℎ is the number of sampled (interviewed) individuals within the


                                                                                       34
household, equal to 1, as only one individual was allowed to be interviewed from each household, and ������������������������������������������������ℎℎ
is the size of the household surveyed (asked and recorded during the interview).

Based on these components, household and individual weights are calculated as:

                                                          ������������������������������������ × ������������ℎℎ������������������������������������ ,
                                              ������������ℎℎ = ������������

                                              ������������������������������������������������ = ������������ℎℎ × ������������������������������������������������ℎℎ .




                                                                   35