Public Disclosure Authorized
HIGH-FREQUENCY PHONE SURVEY (HFPS) - PHASE 2
SAMPLING DESIGN, WEIGHTING, AND ESTIMATION*
December 2022
After implementing Phase 1 of the High-Frequency Phone Survey (HFPS) project in Latin America and the Caribbean (LAC)
in 2020, the World Bank conducted Phase 2 in 2021 to continue assessing the socio-economic impacts of the COVID-19
pandemic on households. This new phase, conducted in partnership with the UNDP LAC Chief Economist office, included
two waves. Wave 1 covered 24 countries¹ and Wave 2 covered 22 countries. Of these countries, 13 participated in Phase 1
and the rest joined in Phase 2.
The 13 countries from Phase 1 are Argentina, Bolivia, Colombia, Costa Rica, Chile, Dominican Republic, Ecuador, El Salvador,
Guatemala, Honduras, Mexico, Paraguay and Peru. In these countries, Phase 2 Wave 1 tried to recontact households and
individuals who had responded in Phase 1 Wave 1 in 2020 and added a fresh supplement sample to compensate for
attrition nonresponse. Phase 2 Wave 2 tried to recontact all respondents to Phase 2 Wave 1 and also incorporated a
supplement sample. Countries that joined in Phase 2 (in addition to those of Phase 1) are Antigua & Barbuda, Belize,
Brazil, Dominica, Guyana, Haiti, Jamaica, Nicaragua, Panama, Saint Lucia and Uruguay.²
This document describes the sampling design, the weighting and the appropriate procedure to estimate indicators for the LAC
HFPS Phase 2 surveys. For the sake of clarity, countries participating in Phase 1 in 2020 will be called “Original Countries”. The
countries added in Phase 2 in 2021 will be called “Added Countries”.
As in the 2020 phase, Phase 2 estimates represent households with a landline or at least one cell phone and individuals
18 years of age or above with an active cell phone number or a landline at home.
1. HFPS PHASE 2 SAMPLING DESIGN
Phase 2 Wave 1 samples for the Original Countries included two components: a) a panel formed by respondents to
Phase 1 Wave 1, and b) a fresh supplement sample of phone numbers to compensate for attrition between Phase 1 Wave
1 and Phase 2 Wave 1, and to slightly increase the overall sample size.³ The samples of the Added Countries were entirely
new, as described below.
Phase 2 Wave 2 samples included two components in all countries: a) a panel formed by respondents to Phase 2
Wave 1, and b) a supplementary sample of phone numbers to compensate for attrition between Phase 2 Wave 1 and
Phase 2 Wave 2.
Phase 2 Waves 1 and 2 aimed to achieve between 800 and 3,000 completed interviews per country. Table 1 displays the
number of complete interviews and response rate by country.
* This document was prepared by Ramiro Flores Cruz, partner at Sistemas Integrales and World Bank consultant on survey methodology and sampling, with the
financial support from the Latin American and Caribbean Regional Vice Presidency.
1 The LAC HFPS Phase 2 project originally covered 23 countries; Brazil was added later. The Brazil survey was inspired by the broader regional project, but
followed country-specific characteristics. See World Bank and UNDP (2022) here: https://microdata.worldbank.org/index.php/catalog/4533 . This note describes
the methodology applicable to the other 23 countries.
2 Antigua and Barbuda and Brazil were not included in Phase 2 Wave 2.
3 Ecuador, which is one of the Original Countries, is an exception since it used a fully fresh sample of phone numbers.
Table 1. HFPS Phase 2, Waves 1 & 2. Complete interviews and response rate by country.

| Country | Type of country | Complete interviews in Phase 2 Wave 1 | Response rate in Phase 2 Wave 1 | Complete interviews in Phase 2 Wave 2 | Response rate in Phase 2 Wave 2 |
|---|---|---|---|---|---|
| Argentina | Original | 1,216 | 13.0% | 1,321 | 13.5% |
| Bolivia | Original | 1,272 | 35.9% | 1,312 | 35.0% |
| Chile | Original | 1,212 | 21.1% | 1,329 | 14.2% |
| Colombia | Original | 1,221 | 33.9% | 1,688 | 36.7% |
| Costa Rica | Original | 805 | 16.0% | 905 | 15.2% |
| Dominican Republic | Original | 1,205 | 26.6% | 1,364 | 32.6% |
| Ecuador (a) | Original | 951 | 18.9% | 1,032 | 29.2% |
| El Salvador | Original | 818 | 21.9% | 812 | 26.0% |
| Guatemala | Original | 1,207 | 23.9% | 1,521 | 24.6% |
| Honduras | Original | 1,021 | 23.3% | 1,004 | 28.9% |
| Mexico | Original | 2,625 | 9.6% | 2,511 | 10.5% |
| Paraguay | Original | 1,076 | 21.3% | 1,061 | 34.0% |
| Peru | Original | 1,212 | 21.8% | 1,724 | 26.8% |
| Antigua & Barbuda (b) | Added | 790 | 37.1% | | |
| Belize | Added | 816 | 40.6% | 898 | 49.2% |
| Dominica | Added | 861 | 37.6% | 879 | 47.2% |
| Guyana | Added | 785 | 38.7% | 875 | 46.1% |
| Haiti | Added | 2,814 | 36.8% | 2,361 | 43.9% |
| Jamaica | Added | 829 | 27.8% | 871 | 38.3% |
| Nicaragua | Added | 833 | 33.1% | 865 | 36.9% |
| Panama | Added | 815 | 23.3% | 1,335 | 36.4% |
| Saint Lucia | Added | 835 | 40.0% | 860 | 43.6% |
| Uruguay | Added | 816 | 21.8% | 930 | 26.2% |

(a) Ecuador was part of HFPS Phase 1 in 2020, so it is considered an Original Country in this table. However, its Phase 2 Wave 1 sample was entirely new and included no
panel, so its sample weighting followed the same procedures as for the Added Countries.
(b) Antigua and Barbuda was covered in Phase 2 Wave 1 only.
The fresh samples in the Original and Added Countries have the same Random Digit Dialing (RDD) dual-frame design
as the Phase 1 samples and were all selected from the same RDD sampling frames of phone numbers used in Phase 1.⁴
4 For a full description of Phase 1 sampling design see HFPS Phase 1 Technical Note on Sampling, Weighting and Estimation, available at
https://documents1.worldbank.org/curated/en/336371631859678760/pdf/COVID-19-High-Frequency-Phone-Surveys-in-Latin-America-Technical-Note-on-Sampling-Design-Weighting-and-Estimation.pdf
The RDD methodology generates virtually all possible telephone numbers in the country under the national telephone
numbering plan and draws a random sample of numbers. This method guarantees full coverage of telephone numbers,
eliminating coverage bias with respect to the population with a phone.⁵
First, in each country, a large first-instance sample of numbers was selected in each frame (mobiles and landlines), with
an allocation ranging from 0 percent landlines and 100 percent cell phones to 20 percent landlines and 80 percent cell
phones (landline and cell telephone numbers are distinguished by their prefixes). Landline numbers were included with
a small share of the total sample in nearly all countries to cover landline-only households and persons, which have a
low prevalence in most Latin American countries, and to achieve more accurate sex and age sample distributions.⁶
The landline frame in each country was geographically grouped by department, province, or state, and the sample of
landlines was selected with proportionate allocation across these strata. Geographic stratification of cell phones was only
possible in Argentina, Bolivia and Mexico since these are the only countries where mobile number prefixes are linked to
districts (provinces, departments or states⁷).
The first-instance samples of landline and cell phone numbers were then screened through an automated process to
identify the active numbers. The active numbers were cross-checked with business registries (based on yellow page
directories and websites) to identify and remove business numbers not eligible for this survey.
A smaller second-instance sample was then selected from the active residential numbers identified in the first-instance
sample and was delivered to the country operations team to be contacted and interviewed.
2. PHASE 2 WAVE 1
2.1. PHASE 2 WAVE 1 WEIGHTING PROCEDURES FOR THE HFPS ORIGINAL COUNTRIES
HFPS Phase 2 Wave 1 has three units of analysis: households, adult individuals (18 years of age and older) and children 6
through 17 years of age. Weights were computed for each sample unit and should be used according to the estimate of
interest.
The weighting process for the Original Countries included five steps:
5 Given that the HFPS used a sampling frame of telephone numbers, results are technically only about the population with a phone and exclude the population with
no phone.
6 Survey methodology literature and experience show that cell phone survey respondents are more likely to be male and younger than landline phone respondents
due to both cell phone ownership patterns and differential response rates, with females and seniors less likely to answer a call from an unknown number. Even
though the underrepresentation of females and seniors in a 100 percent cell phone sample can be compensated via nonresponse weighting adjustment and
calibration, the more unbalanced the sample, the larger the weighting adjustments needed and, hence, the larger the standard errors of the final survey estimates.
The inclusion of landline telephone numbers improves the sex and age representation in the sample and, therefore, the weighting adjustments will be smaller and
so will be the standard errors of the survey estimates.
7 Note that the prefix of a cell phone number does not fully guarantee the location of the individual, since a person may move from one
region to another and keep the original number. Respondents’ actual location was collected in the survey.
1. Calculation of the inclusion probabilities of landline and cell phone numbers.
2. Computation of design weights for households and individuals.
3. Attrition nonresponse weighting adjustment for the panel sample component, and first-time nonresponse
adjustment for the supplement sample component.
4. Calibration of household, adult and child/adolescent weights, using external ancillary data from official sources
(adjusted by national phone coverage).
5. Weight trimming and recalibration.
Step 1: Inclusion probabilities of landline and cell phone numbers
Inclusion probabilities of landline and cell phone numbers depend on the implemented sampling design and its features:
stratification, sample size per stratum, frame size per stratum and the selection method applied in each stratum. The
HFPS sample design does not include any clustering.
The size of the Phase 2 Wave 1 overall (cell phones and landlines) selected sample of phone numbers (i.e., before any
fieldwork activities) in each of the Original Countries equals the Phase 1 Wave 1 overall selected sample of phone numbers
plus the size of the Phase 2 Wave 1 overall supplement fresh sample of phone numbers.
$$n^{P2W1} = n^{P1W1} + n^{sup}$$
where
$n^{P2W1}$ is the size of the Phase 2 Wave 1 overall selected sample of phone numbers;
$n^{P1W1}$ is the size of the Phase 1 Wave 1 overall selected sample of phone numbers; and
$n^{sup}$ is the size of the Phase 2 Wave 1 overall selected fresh supplement sample of numbers.
The calculation of the inclusion probabilities for the Phase 2 Wave 1 sample follows procedures analogous to the ones used in
Phase 1 for a dual-frame phone sample.
A first-instance⁸ sample was selected in each of the two frames (mobile number and landline number frames) with
simple random selection without replacement. This selection was made from the entire frame for mobile numbers and
within geographic strata for landlines. The selected numbers were then screened and classified into active and inactive.
8 We use the term “instance” in place of “sampling phase” to avoid confounding “sampling phase” with the “HFPS phase”. A sampling phase occurs when a subsample
of elements is selected out of a larger sample of the same type of elements.
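The first-instance inclusion probabilities of cell phone and landline numbers are⁹
$$\pi^{(1)}_{\text{cell},i} = \frac{n^{(1)}_{\text{cell}}}{N_{\text{cell}}}, \qquad \pi^{(1)}_{\text{land},hi} = \frac{n^{(1)}_{\text{land},h}}{N_{\text{land},h}}$$
where
$\pi^{(1)}_{\text{cell},i}$ is the Phase 2 Wave 1 first-instance inclusion probability of the i-th cell phone number;
$n^{(1)}_{\text{cell}} = n^{(1)}_{\text{cell},A} + n^{(1)}_{\text{cell},I}$ is the size of the Phase 2 Wave 1 first-instance sample of cell phones, composed of
$n^{(1)}_{\text{cell},A}$ active cell phones and $n^{(1)}_{\text{cell},I}$ inactive cell phones;
$N_{\text{cell}}$ is the cell phone frame size, the total number of all possible cell phones according to the national
numbering plan;
$\pi^{(1)}_{\text{land},hi}$ is the Phase 2 Wave 1 first-instance inclusion probability of the i-th landline number
in stratum h;
$n^{(1)}_{\text{land},h} = n^{(1)}_{\text{land},h,A} + n^{(1)}_{\text{land},h,I}$ is the size of the Phase 2 Wave 1 first-instance sample of landlines in stratum h, composed
of $n^{(1)}_{\text{land},h,A}$ active landlines and $n^{(1)}_{\text{land},h,I}$ inactive landlines; and
$N_{\text{land},h}$ is the landline frame size in stratum h, the total number of all possible landline numbers
according to the national numbering plan.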
Next, two second-instance samples were selected systematically out of the first-instance samples of active cell and
active landline telephone numbers. The second-instance inclusion probabilities of cell phones and landlines, conditional
on being selected in the first instance, are
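$$\pi^{(2|1)}_{\text{cell},i} = \frac{n^{(2)}_{\text{cell},A}}{n^{(1)}_{\text{cell},A}}, \qquad \pi^{(2|1)}_{\text{land},hi} = \frac{n^{(2)}_{\text{land},h,A}}{n^{(1)}_{\text{land},h,A}}$$
where
$\pi^{(2|1)}_{\text{cell},i}$ is the second-instance inclusion probability of the i-th active cell phone number
conditional on being selected in the first instance;
$n^{(2)}_{\text{cell},A}$ is the size of the second-instance sample of active cell phones, drawn from the $n^{(1)}_{\text{cell},A}$ active cell phones
identified in the first instance;
$\pi^{(2|1)}_{\text{land},hi}$ is the second-instance inclusion probability of the i-th active landline number in stratum h
conditional on being selected in the first instance; and
$n^{(2)}_{\text{land},h,A}$ is the size of the second-instance sample of active landlines in stratum h, drawn from the $n^{(1)}_{\text{land},h,A}$ active
landlines identified in the first instance.
The final inclusion probabilities of cell phones and landlines are the product of their first-instance inclusion probabilities
and the conditional second-instance inclusion probabilities. With $n^{(1)}$ the first-instance sample size and $N$ the frame size,
$$\pi_{\text{cell},i} = \frac{n^{(1)}_{\text{cell}}}{N_{\text{cell}}} \cdot \frac{n^{(2)}_{\text{cell},A}}{n^{(1)}_{\text{cell},A}} = \frac{n^{(2)}_{\text{cell},A}}{N_{\text{cell}}\,\hat{r}_{\text{cell}}}, \qquad \pi_{\text{land},hi} = \frac{n^{(1)}_{\text{land},h}}{N_{\text{land},h}} \cdot \frac{n^{(2)}_{\text{land},h,A}}{n^{(1)}_{\text{land},h,A}} = \frac{n^{(2)}_{\text{land},h,A}}{N_{\text{land},h}\,\hat{r}_{\text{land},h}}$$
where $\hat{r}_{\text{cell}} = n^{(1)}_{\text{cell},A} / n^{(1)}_{\text{cell}}$ and $\hat{r}_{\text{land},h} = n^{(1)}_{\text{land},h,A} / n^{(1)}_{\text{land},h}$ denote the rate of active phones estimated in the first instance.¹⁰ Hence, the unconditional inclusion
probabilities of the second-instance active numbers, $\pi_{\text{cell},i}$ and $\pi_{\text{land},hi}$, can be expressed as the ratio between
the active numbers selected in the second instance and an estimate of the total active numbers, $N_{\text{cell}}\,\hat{r}_{\text{cell}}$ or $N_{\text{land},h}\,\hat{r}_{\text{land},h}$.
9 Inclusion probabilities of cell phones do not show a stratum index since most cell phone samples were not stratified for the reasons stated above. Only cell phone
samples for Argentina, Bolivia, and Mexico were stratified.
10 The $\hat{r}$ estimates are highly precise due to the very large size of the first-instance samples.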
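The two-instance inclusion probabilities described in Step 1 can be illustrated with a minimal Python sketch. The class name, field names and the counts in the example are hypothetical and chosen only for illustration; they are not HFPS figures. The sketch treats one frame (or one landline stratum) at a time and checks that the product form and the ratio form of the final probability agree.

```python
from dataclasses import dataclass


@dataclass
class FrameSample:
    """Counts for one frame or one landline stratum (hypothetical inputs)."""
    frame_size: int       # all possible numbers under the numbering plan
    first_instance: int   # numbers drawn in the first instance
    first_active: int     # first-instance numbers screened as active
    second_instance: int  # active numbers drawn in the second instance


def inclusion_probabilities(s: FrameSample) -> dict:
    """Return first-instance, conditional second-instance and final probabilities."""
    pi_first = s.first_instance / s.frame_size
    pi_second_given_first = s.second_instance / s.first_active
    # Share of active numbers estimated from the first-instance screening.
    active_rate = s.first_active / s.first_instance
    return {
        "pi_first": pi_first,
        "pi_second_given_first": pi_second_given_first,
        "active_rate": active_rate,
        # Product of the first-instance and conditional second-instance probabilities.
        "pi_final": pi_first * pi_second_given_first,
        # Equivalent form: second-instance sample over estimated total active numbers.
        "pi_final_ratio_form": s.second_instance / (s.frame_size * active_rate),
    }


# Illustrative counts only.
cell = FrameSample(frame_size=100_000_000, first_instance=500_000,
                   first_active=150_000, second_instance=20_000)
probs = inclusion_probabilities(cell)
```

For landlines the same function would be applied separately within each geographic stratum.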
Step 2: Design weights for households, adults and children & adolescents
The selection probabilities of households and adult individuals are based on the inclusion probabilities of the cell phones
and landlines through which they can be reached. Therefore, the computation of household and individual selection
probabilities should account for multiple chances of selection and for the overlapping between the cell phone and
landline frames.
A multiplicity adjustment was therefore made to eliminate the over-representation of sample households and individuals
that could be reached through more telephone numbers than other households and individuals, and thereby remove the
risk of multiplicity sampling bias. For this purpose, the survey collected information about the number of cell phones
and landlines in the respondent households through the following questions:¹¹
1. How many working cell phones in total are owned by the persons in your household, including you?
2. Is there any working landline in your household?
3. How many working landlines are there in your household currently?
Finally, household and individual design weights, $w_{0j}$ and $w_{0k}$ respectively, were calculated as the inverse of the
multiplicity-adjusted selection probabilities.¹²
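As a rough illustration of the multiplicity adjustment, the sketch below computes a household design weight from the answers to the three questions above. The exact formula is given in the Phase 1 technical note and is not reproduced here; this sketch assumes, as a common approximation for small inclusion probabilities, that the household selection probability is the sum of the inclusion probabilities of all its working numbers across both frames. The function name and the example probabilities are hypothetical.

```python
def household_design_weight(pi_cell: float, pi_land: float,
                            n_cells: int, n_landlines: int) -> float:
    """Inverse of an approximate multiplicity-adjusted household selection probability.

    Assumption (not taken from the source): with small inclusion probabilities,
    the chance of reaching a household is approximately the sum of the inclusion
    probabilities of all its working cell phones and landlines.
    """
    pi_household = n_cells * pi_cell + n_landlines * pi_land
    if pi_household <= 0:
        raise ValueError("household must be reachable through at least one number")
    return 1.0 / pi_household


# A household with three working cell phones gets a smaller weight than one with a single phone.
w_one_phone = household_design_weight(pi_cell=0.001, pi_land=0.002, n_cells=1, n_landlines=0)
w_three_phones = household_design_weight(pi_cell=0.001, pi_land=0.002, n_cells=3, n_landlines=0)
```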
Weighting of data on children & adolescents
HFPS Phase 2 Waves 1 and 2 collected specific data about a randomly selected child or adolescent 6 through 17 years
of age in each interviewed household. Wave 2 also collected information for a randomly selected child between 0 and
5 years of age. To implement this, the questionnaire first collected a roster of all children and adolescents living in each
respondent household and selected one at random.
The child/adolescent weight is based on his/her probability of selection within the household, conditional on his/her
household being selected in the sample. Hence
$$\pi^{\text{cell}}_{jn} = \pi^{\text{cell}}_{j} / M_j$$
$$\pi^{\text{land}}_{hjn} = \pi^{\text{land}}_{hj} / M_j$$
where
$\pi^{\text{cell}}_{jn}$ is the selection probability of the n-th child/adolescent in the j-th household when the household is
contacted through a cell phone;
$\pi^{\text{cell}}_{j}$ is the selection probability of the j-th household when contacted through a cell phone, adjusted for
multiplicity of working cell phones in the household;
$M_j$ is the number of eligible children and adolescents (6-17 years old) in the j-th household;
$\pi^{\text{land}}_{hjn}$ is the selection probability of the n-th child/adolescent of the j-th household in stratum h when the
household is contacted through a landline; and
$\pi^{\text{land}}_{hj}$ is the selection probability of the j-th household when contacted through a landline, adjusted for multiplicity
of working landlines in the household.
The children and adolescents’ design weight $w_{0n}$ is the inverse of the above selection probabilities, $w_{0n} = 1/\pi^{\text{cell}}_{jn}$ or $w_{0n} = 1/\pi^{\text{land}}_{hjn}$,
depending on whether the sample household was reached via a cell phone or a landline.
11 In the Caribbean countries, where a large share of persons have more than one active cell phone number, the questionnaire also asked the respondent about the
number of cell phone numbers owned individually. This data was used to adjust individual weights for multiplicity.
12 For more details on the computing of design weights see HFPS Phase 1 Technical Note on Sampling, Weighting and Estimation.
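The child/adolescent design weight can be written as a short Python sketch. The function name and the example values are hypothetical; the household selection probability is assumed to be already adjusted for multiplicity.

```python
def child_design_weight(pi_household: float, n_eligible_children: int) -> float:
    """Design weight of the one child/adolescent selected at random in a household.

    pi_household: multiplicity-adjusted household selection probability for the
                  frame (cell or landline) through which the household was reached.
    n_eligible_children: number of eligible children and adolescents on the roster.
    """
    if n_eligible_children < 1:
        raise ValueError("household has no eligible child or adolescent")
    # One child is picked with equal probability among the eligible ones.
    pi_child = pi_household / n_eligible_children
    return 1.0 / pi_child


# Illustrative values: the selected child in a household with three eligible
# children carries three times the household design weight.
w_child = child_design_weight(pi_household=0.002, n_eligible_children=3)
```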
Step 3: Nonresponse weighting adjustment
When a phone number is called, it is not always possible to carry out an interview. Nonresponse occurs because of a
number of constraints. Most common are that nobody answers the call (no contact), the respondent is unwilling to
cooperate (refusal), or language barriers exist.
As in Phase 1, five main strategies were implemented during fieldwork to minimize nonresponse:
a. An SMS text message was sent to the sample cell phone numbers a couple of days before calling to inform the
phone holder that a survey firm would reach out and to encourage him or her to answer.
b. In most countries, the sample was released to the teams implementing the data collection over successive
replicates to keep nonresponse closely monitored.
c. Stringent calling protocols were put in place and monitored to ensure a minimum number of attempts on
different days and times (5 to 10 attempts depending on the country).
d. The survey offered monetary and non-monetary incentives in most countries to those who cooperated (e.g.,
gift cards and phone credit with an equivalent monetary value between US$3 and US$11). Incentives were
somewhat higher for panel units.
e. In some countries, the most experienced interviewers recontacted the numbers classified as a “Refusal” to
convert them into a “Complete interview”.
As described earlier, Phase 2 Wave 1 samples of the Original Countries had two components: a) a panel of respondents
to Phase 1 Wave 1, and b) a supplement fresh sample of phone numbers. These two sample components were subject
to different subsequent response mechanisms. The panel component was affected by first-time nonresponse in
Phase 1 Wave 1 plus attrition nonresponse in Phase 2 Wave 1. In contrast, the supplement sample incorporated in Phase 2
Wave 1 was only subject to first-time nonresponse. These two different response mechanisms were accounted for in the
nonresponse weighting adjustments described below.
A. Nonresponse adjustment of the panel sample component
A1. First-time nonresponse adjustment
The Phase 2 Wave 1 panel component was initially affected by first-time nonresponse in Phase 1 Wave 1. Therefore, the
design weights of responding panel households and individuals were adjusted to compensate for the nonresponse that
occurred in Phase 1 Wave 1 to reduce potential nonresponse bias. A class-based nonresponse adjustment was used by
crossing all categories of auxiliary variables that were known to be correlated with the likelihood of responding and were
available for both respondents and nonrespondents. Given that the survey used RDD sampling, the information in the
sampling frame was limited and the only variables available for respondents and non-respondents were the type of
phone number (landline or cell phone) and the corresponding geographic region (known for landlines in all countries and
for cell phones only in Argentina, Bolivia and Mexico).
The weighting class nonresponse adjustment was based on the inverse of the weighted response rate estimate in each
class, which is the ratio of the sum of the design weights of all units (respondents and nonrespondents) in class c to the
sum of the design weights of respondents in that class.
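$$f^{hh}_{c} = \frac{\sum_{j \in c,\; j \in R \cup NR} w_{0j}}{\sum_{j \in c,\; j \in R} w_{0j}}; \qquad f^{ind}_{c} = \frac{\sum_{k \in c,\; k \in R \cup NR} w_{0k}}{\sum_{k \in c,\; k \in R} w_{0k}}$$
where $f^{hh}_{c}$ is the nonresponse adjustment factor that should be applied to responding households in
class c, and $f^{ind}_{c}$ is the nonresponse adjustment factor for responding individuals in that class. R and NR
indicate the responding and nonresponding units, respectively.
Thus, the nonresponse adjusted weights for responding households and individuals are
$$w_{1j} = w_{0j}\, f^{hh}_{c}; \qquad w_{1k} = w_{0k}\, f^{ind}_{c}$$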
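The weighting-class adjustment can be sketched in a few lines of Python. The data layout (tuples of class label, design weight and response flag) and the example values are hypothetical; in the HFPS the classes were formed from phone type and, where available, geographic region.

```python
from collections import defaultdict


def class_adjustment_factors(units):
    """units: list of (class_id, design_weight, responded) tuples."""
    total = defaultdict(float)       # design weights of respondents and nonrespondents
    responding = defaultdict(float)  # design weights of respondents only
    for class_id, weight, responded in units:
        total[class_id] += weight
        if responded:
            responding[class_id] += weight
    # A class without respondents cannot be adjusted and would need to be collapsed.
    return {c: total[c] / responding[c] for c in total if responding[c] > 0}


def adjust_weights(units):
    """Return (class_id, adjusted_weight) for responding units only."""
    factors = class_adjustment_factors(units)
    return [(c, w * factors[c]) for c, w, responded in units if responded]


# Illustrative sample: one of four cell numbers and one of two landlines responded.
sample = [
    ("cell", 100.0, True), ("cell", 100.0, False),
    ("cell", 100.0, False), ("cell", 100.0, False),
    ("landline", 50.0, True), ("landline", 50.0, False),
]
factors = class_adjustment_factors(sample)
adjusted = adjust_weights(sample)
```

After the adjustment, the respondents' weights add up to the design-weight total of the full sample within each class.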
A2. Attrition nonresponse adjustment
In Phase 2 Wave 1, the sample panel component was subject to attrition nonresponse between Phase 1 Wave 1 and
Phase 2 Wave 1. Table 2 shows the number of respondents to Phase 1 Wave 1, how many of these responded to Phase 2
Wave 1 and the resulting recontact and attrition rates.¹³
Table 2. Phase 2 Wave 1 - Original Countries. Complete interviews and attrition rate by country based on
Phase 1 Wave 1 outcomes.

| Country | Type of country | (1) Complete interviews in Phase 1 Wave 1 | (2) Complete interviews in the panel | (3) Recontact rate ph1w1/ph2w1 | (4) Attrition rate ph1w1/ph2w1 |
|---|---|---|---|---|---|
| Argentina | Original | 1,001 | 197 | 19.7% | 80.3% |
| Bolivia | Original | 1,075 | 363 | 33.8% | 66.2% |
| Chile | Original | 1,000 | 262 | 26.2% | 73.8% |
| Colombia | Original | 1,000 | 268 | 26.8% | 73.2% |
| Costa Rica | Original | 801 | 143 | 17.9% | 82.1% |
| Dominican Republic | Original | 748 | 113 | 15.1% | 84.9% |
| Ecuador (a) | Original | 1,227 | | | |
| El Salvador | Original | 804 | 129 | 16.0% | 84.0% |
| Guatemala | Original | 806 | 135 | 16.7% | 83.3% |
| Honduras | Original | 807 | 141 | 17.5% | 82.5% |
| Mexico | Original | 2,109 | 251 | 11.9% | 88.1% |
| Paraguay | Original | 715 | 57 | 8.0% | 92.0% |
| Peru | Original | 1,000 | 236 | 23.6% | 76.4% |

(a) Ecuador was part of HFPS Phase 1 in 2020, so it is considered an Original Country in this table. However, its Phase 2 Wave 1 sample was entirely new and included no
panel, so its sample weighting should follow the same procedures as for the Added Countries.
13 Since in Phase 2 Wave 1 the project had no access to the Phase 1 Wave 1 respondent names, it was not possible to confirm that respondents to Phase 2 Wave 1 were the
same as those interviewed in Phase 1 Wave 1. Therefore, a matching algorithm was used to identify which respondents were highly likely to be the same as in
Phase 1 Wave 1. A household was considered to be the same as in Phase 1 Wave 1 if it had the same household ID, the household head's sex was the same, the
household head's age differed by two years or less, and at least one of the following variables had the same value: geographic region, urban/rural area, number of
adult members, number of children, household head's educational attainment and whether the household had a landline phone. Only those households meeting
these conditions were labeled as panel respondents, and this partly explains the large attrition rates between Phase 1 Wave 1 and Phase 2 Wave 1. In contrast, in
Phase 2 Wave 2 the survey had access to the Phase 2 Wave 1 respondent names, so no matching algorithm was needed and attrition rates between Phase 2 Wave
1 and Phase 2 Wave 2 were significantly lower (see Section 3).
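The matching rule in footnote 13 can be expressed as a small Python predicate. This is only a sketch of the stated conditions: the field names and the example records are hypothetical and do not correspond to actual HFPS variable names.

```python
# Variables of which at least one must coincide between the two interviews.
AT_LEAST_ONE_OF = ("region", "urban_rural", "n_adults", "n_children",
                   "head_education", "has_landline")


def is_panel_match(p1: dict, p2: dict) -> bool:
    """True when a Phase 2 Wave 1 record is treated as the same Phase 1 Wave 1 household."""
    return (
        p1["household_id"] == p2["household_id"]
        and p1["head_sex"] == p2["head_sex"]
        and abs(p1["head_age"] - p2["head_age"]) <= 2
        and any(p1[key] == p2[key] for key in AT_LEAST_ONE_OF)
    )


# Hypothetical records for illustration.
phase1 = {"household_id": "H-001", "head_sex": "F", "head_age": 44, "region": "North",
          "urban_rural": "urban", "n_adults": 2, "n_children": 1,
          "head_education": "secondary", "has_landline": False}
same_household = dict(phase1, head_age=45, n_children=2)   # head one year older
different_head = dict(phase1, head_age=52)                 # age gap above two years

match_ok = is_panel_match(phase1, same_household)
match_rejected = is_panel_match(phase1, different_head)
```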
Attrition nonresponse weighting adjustment involved the following steps.
1. Identify a set of variables collected in Phase 1 Wave 1 as potential predictors of response in Phase 2 Wave 1.
2. Examine any missing data patterns in those preselected variables.
3. Impute any item missing values in the potential predictors identified in Phase 1 Wave 1 using a sequential regression
imputation procedure.
4. Run two alternative techniques and choose the more efficient one, i.e., the one that yields smaller standard errors
for the key survey estimates:
4a. Run a random forest algorithm over the potential predicting variables identified in Phase 1 Wave 1 to
determine the best predictors of Phase 2 Wave 1 response, and form a set of response cells defined by the
interactions of the identified response predictors.
4b. Use a response propensity score adjustment by fitting a logistic regression model, with the preselected
variables from Phase 1 Wave 1 as independent variables and a dummy response indicator in Phase 2 Wave
1 as the outcome variable. Predict a response propensity for each respondent and nonrespondent unit
in Phase 2 Wave 1. Stratify all sample units (respondents and nonrespondents) based on their response
propensities to create equal-sized adjustment classes.
5. Since the response propensity model proved more efficient, compute the weighted median of the estimated
propensities in each adjustment class.
6. Adjust the weights of the respondent sample units using the inverse of the median propensity in each adjustment
class.
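The estimated response propensity can be written as
$$\hat{\rho}_i = \frac{\exp\left(\hat{\beta}_0 + \sum_{q=1}^{p} \hat{\beta}_q x_{iq}\right)}{1 + \exp\left(\hat{\beta}_0 + \sum_{q=1}^{p} \hat{\beta}_q x_{iq}\right)}$$
where
$\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})$ is the vector of p independent variables for unit i considered for the model; and
$\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p)$ are the estimated logistic regression coefficients: the intercept and the coefficients
corresponding to the p independent variables.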
The preselected variables considered for the response propensity logistic regression models were sex, age and educational
attainment of the household head, urban/rural location, number of household members, number of rooms, number of
cell phones, landline ownership, type of phone contacted (landline or cell phone) and whether any adult in the household
went without food for an entire day due to lack of money or other resources. These variables were assessed for each
country through a backward stepwise procedure, keeping in the final propensity model the ones with a regression
coefficient significant at the .05 level.
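The propensity-class part of the attrition adjustment (stratifying units by estimated propensity and adjusting respondents by the inverse of the class median) can be sketched as follows. The sketch assumes the response propensities have already been predicted by a fitted logistic regression, which is not shown; the data layout, the number of classes and the example values are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("empty input")


def propensity_class_adjustment(units, n_classes=5):
    """units: list of dicts with keys 'weight', 'propensity' and 'responded'.

    Returns {unit index: attrition-adjusted weight} for respondents only.
    """
    # Sort respondents and nonrespondents together by estimated propensity.
    order = sorted(range(len(units)), key=lambda i: units[i]["propensity"])
    adjusted = {}
    for c in range(n_classes):
        # Equal-sized adjustment classes over all sample units.
        start = c * len(order) // n_classes
        stop = (c + 1) * len(order) // n_classes
        members = order[start:stop]
        if not members:
            continue
        median = weighted_median([units[i]["propensity"] for i in members],
                                 [units[i]["weight"] for i in members])
        for i in members:
            if units[i]["responded"]:
                adjusted[i] = units[i]["weight"] / median
    return adjusted


# Illustrative panel units with already-predicted propensities.
panel = [
    {"weight": 10.0, "propensity": 0.10, "responded": False},
    {"weight": 10.0, "propensity": 0.20, "responded": True},
    {"weight": 10.0, "propensity": 0.25, "responded": False},
    {"weight": 10.0, "propensity": 0.40, "responded": True},
    {"weight": 10.0, "propensity": 0.50, "responded": True},
    {"weight": 10.0, "propensity": 0.80, "responded": False},
]
attrition_adjusted = propensity_class_adjustment(panel, n_classes=2)
```

Using the class median rather than each unit's own propensity limits the influence of very small predicted propensities on the adjusted weights.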
B. Nonresponse adjustment of the supplement sample component
The supplement sample weights adjustment follows the same procedure as the first-time nonresponse adjustment
described for the panel component in point A.1 above.
Step 4: Weight calibration
Finally, the nonresponse adjusted weights for both respondent households and individuals were calibrated to reflect the
totals by region, sex, age and educational attainment available from external national official sources, adjusted by phone
coverage. This last adjustment has two objectives:
• To further reduce potential nonresponse biases that were not addressed by the nonresponse adjustment in Step
3, by using auxiliary variables from external sources. This can be achieved as long as the calibration auxiliaries are
correlated with both nonresponse and the study variables.
• To improve the precision of the survey estimators (i.e., reduce the sampling variances), as long as the auxiliaries
are correlated with the study variables of interest.¹⁴
Calibration works by minimizing a measure of the distance between the input weights (nonresponse adjusted weights
in this case) and the calibrated weights, under the constraint that the calibrated weighted total of each auxiliary
variable in the sample equals the corresponding total from the external source. Unlike the nonresponse adjustment,
weight calibration requires auxiliary variables only for respondents.
The available auxiliary variables were total households by region and adult population by region, sex, age group and
educational attainment. These were all categorical variables, the region variable had many categories in most countries,
and the overall samples were rather small. Under these circumstances, raking was the most suitable calibration method.
The HFPS applied a raking calibration using a logit distance function since this generally yielded a closer fit to
the calibration auxiliaries.
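To illustrate how raking matches weighted sample margins to external totals, the following Python sketch implements classic multiplicative raking (iterative proportional fitting). It is a simplified stand-in and not the HFPS procedure: the logit distance function used in the survey, which bounds the adjustment factors, is not implemented here. The function name, the variables and the control totals are hypothetical.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-10):
    """Classic multiplicative raking (iterative proportional fitting).

    weights:    input (nonresponse-adjusted) weights, one per respondent
    categories: {variable: list of category labels, one per respondent}
    targets:    {variable: {category: external control total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for var, labels in categories.items():
            # Current weighted total in each category of this variable.
            current = {}
            for wi, label in zip(w, labels):
                current[label] = current.get(label, 0.0) + wi
            # Scale weights so each category total matches its control total.
            factors = {label: targets[var][label] / current[label] for label in current}
            w = [wi * factors[label] for wi, label in zip(w, labels)]
            max_gap = max(max_gap, max(abs(f - 1.0) for f in factors.values()))
        if max_gap < tol:
            break  # all margins already match their controls
    return w


# Illustrative respondents and control totals (both margins sum to 100).
sex = ["F", "F", "M", "M"]
region = ["North", "South", "North", "South"]
controls = {"sex": {"F": 60.0, "M": 40.0}, "region": {"North": 50.0, "South": 50.0}}
raked = rake([1.0, 2.0, 1.0, 1.0], {"sex": sex, "region": region}, controls)
```

In practice the control totals would be the official population figures adjusted for phone coverage, and dedicated survey software would typically be used.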
The final weights for respondent households and individuals can then be expressed as:
$$w_{j} = w_{0j}\, f^{hh}_{c}\, g_{j}; \qquad w_{k} = w_{0k}\, f^{ind}_{c}\, g_{k}$$
where
$w_{0j}$ is the design weight for the j-th household;
$f^{hh}_{c}$ is the nonresponse adjustment factor for households in class c;
$g_{j}$ is the calibration factor for the j-th household;
$w_{0k}$ is the design weight for the k-th individual;
$f^{ind}_{c}$ is the nonresponse adjustment factor for individuals in class c; and
$g_{k}$ is the calibration factor for the k-th individual.
14 This objective was not addressed in this survey since it would have entailed computing a large set of replicate weights (with bootstrap or jackknife replication
methods), which could be confusing for the final user.
Tables 3 and 4 show the data sources used for calibrating the weights in each Original Country. Population totals taken
from these sources were adjusted for telephone coverage based on the national phone coverage rates published by the
United Nations International Telecommunication Union (ITU).
Table 3. Original Countries. Sources of auxiliary data used for weight calibration by region, sex and age.

| Country | Data source used for weight calibration (region, sex and age) |
|---|---|
| Argentina | Instituto Nacional de Estadística y Censos. Proyecciones de Población 2021. |
| Bolivia | Instituto Nacional de Estadística. Proyecciones de Población 2021. |
| Chile | Instituto Nacional de Estadística. Estimaciones y Proyecciones de Población 2021. |
| Colombia | Departamento Administrativo Nacional de Estadística. Proyecciones de Población 2021. |
| Costa Rica | Instituto Nacional de Estadística y Censos. Proyecciones de Población 2021. |
| Dominican Rep. | Oficina Nacional de Estadística. Proyecciones de Población 2021. |
| Ecuador | Instituto Nacional de Estadística y Censos. Proyecciones de Población 2020. |
| El Salvador | Dirección General de Estadística y Censos. Proyecciones de Población 2021. |
| Guatemala | Instituto Nacional de Estadística de Guatemala. Proyecciones de Población 2021. |
| Honduras | Instituto Nacional de Estadística. Proyecciones de Población 2021. |
| Mexico | Instituto Nacional de Estadística y Geografía. Censo Nacional de Población 2020. |
| Paraguay | Instituto Nacional de Estadística. Proyecciones de Población 2021. |
| Peru | Instituto Nacional de Estadística e Informática. Estimaciones y Proyecciones de Población 2021. |
Table 4. Original Countries. Sources of auxiliary data used for weight calibration by educational attainment.

| Country | Data source used for weight calibration (educational attainment) |
|---|---|
| Argentina | Instituto Nacional de Estadística y Censos. Censo Nacional de Población, Hogares y Viviendas 2010. |
| Bolivia | Instituto Nacional de Estadística. Encuesta de Hogares 2020. |
| Chile | Instituto Nacional de Estadística. Censo de Población y Vivienda 2017. |
| Colombia | Departamento Administrativo Nacional de Estadística. Censo Nacional de Población y Vivienda 2018. |
| Costa Rica | Instituto Nacional de Estadística y Censos. Censo Nacional de Población y Vivienda 2011. |
| Dominican Rep. | - |
| Ecuador | Instituto Nacional de Estadística y Censos. Encuesta Nacional de Empleo, Desempleo y Subempleo, diciembre 2019. |
| El Salvador | - |
| Guatemala | Instituto Nacional de Estadística de Guatemala. Censo Nacional de Población y Vivienda 2018. |
| Honduras | - |
| Mexico | Instituto Nacional de Estadística y Geografía. Censo Nacional de Población 2020. |
| Paraguay | Instituto Nacional de Estadística. Encuesta Permanente de Hogares 2019. |
| Peru | Instituto Nacional de Estadística e Informática. Censo de Población y Vivienda 2017. |
Step 5: Weight trimming
The distributions of the resulting household and individual weights were examined in each country to decide whether any
trimming was needed. Weight trimming sought to reduce excess variation in the final weights introduced by nonresponse
adjustments and calibration, thus mitigating the inflation of the standard errors of the estimates due to weighting.
The trimming process took the largest weights within each of the age-sex-region-education subgroups and reduced their
value to the next largest value of the weights. The number of weights trimmed was generally small in all countries. These
trimmed weights were then recalibrated to the population control distributions.
On the other hand, trimming may also change estimates if done carelessly, particularly if the value of a variable with a
large weight value is itself large. Therefore, the trimming process included a sensitivity analysis to assess whether large
changes in estimates might occur as weights were trimmed.
Trimming was carried out in a series of rounds within each subgroup. Each time the largest weights in a round were
trimmed, a set of key survey estimates was recomputed for the overall sample. If the relative change in an estimate was
more than 2% of its value before any trimming was done, that trimming step was not applied. This limit was reached for
only very few estimates in very few countries. The trimming process thus sought to reduce unnecessary weight variation
while avoiding significant changes in key survey estimates.
Finally, trimmed weights were recalibrated to the same population totals used in the first calibration described in Step 4.
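The trimming rounds and the 2% sensitivity check described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used by the HFPS team: the function names, the use of weighted means as the "key survey estimates", and the `max_rounds` parameter are all hypothetical, and the recalibration that follows trimming is omitted.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def trim_round(weights, groups):
    """Within each subgroup, lower the largest weight(s) to the next largest value."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    trimmed = list(weights)
    for idx in members.values():
        distinct = sorted({weights[i] for i in idx}, reverse=True)
        if len(distinct) < 2:
            continue  # all weights equal in this subgroup: nothing to trim
        top, cap = distinct[0], distinct[1]
        for i in idx:
            if weights[i] == top:
                trimmed[i] = cap
    return trimmed


def trim_with_check(weights, groups, key_vars, max_rounds=3, tol=0.02):
    """Apply trimming rounds; reject a round if any key estimate moves by more
    than `tol` relative to its value before any trimming (assumed non-zero)."""
    baseline = [weighted_mean(y, weights) for y in key_vars]
    current = list(weights)
    for _ in range(max_rounds):
        candidate = trim_round(current, groups)
        if candidate == current:
            break  # no further trimming possible
        changes = [abs(weighted_mean(y, candidate) - b) / abs(b)
                   for y, b in zip(key_vars, baseline)]
        if max(changes) > tol:
            break  # this round would shift a key estimate too much
        current = candidate
    return current


weights = [1.0, 1.2, 5.0, 1.0, 1.1, 1.3]
groups = ["a", "a", "a", "b", "b", "b"]
stable = [1, 1, 1, 1, 1, 1]        # estimate unaffected by trimming
sensitive = [0, 0, 100, 0, 0, 0]   # large value carried by the largest weight
print(trim_with_check(weights, groups, [stable], max_rounds=1))
print(trim_with_check(weights, groups, [sensitive], max_rounds=1))
```

In the second call the only large value sits on the largest weight, so trimming would change the estimate by far more than 2% and the round is rejected, which is the situation the sensitivity analysis is meant to catch.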
2.2. PHASE 2 WAVE 1 WEIGHTING PROCEDURES FOR THE HFPS ADDED COUNTRIES
The sample design and weighting procedures in the Added Countries are the same ones used for the Original Countries in
Phase 1 Wave 1. For a detailed description, refer to HFPS Phase 1 Technical Note on Sampling, Weighting and Estimation.
Tables 5 and 6 show the data sources used for calibrating the weights in each Added Country.
Table 5. Added Countries. Sources of the auxiliary data used for weight calibration by region, sex and age.
Country Data source used for weight calibration (region, sex and age)
Antigua & Barbuda Statistics Division, Ministry of Finance and Corporate Governance. Population Projections 2021.
Belize Statistical Institute of Belize. Population Projections 2020.
Dominica Central Statistics Office of Dominica. Population Projections 2021.
Guyana Bureau of Statistics of Guyana. Population Projections 2017.
Haiti United Nations Statistics Division. Population Projections 2020.
Jamaica Statistical Institute of Jamaica. Population Projections 2019.
Nicaragua United Nations Statistics Division. Population Projections 2020.
Panama Instituto Nacional de Estadística y Censo. Estimaciones y Proyecciones de Población 2021.
Saint Lucia Central Statistical Office. Population Estimates and Projections 2018.
Uruguay Instituto Nacional de Estadística. Estimaciones y Proyecciones de Población 2021.
Table 6. Added Countries. Sources of the auxiliary data used for weight calibration by educational attainment.
Country Data source used for weight calibration (educational attainment)
Antigua & Barbuda -
Belize Statistical Institute of Belize. 2010 Population Census.
Dominica -
Guyana -
Haiti -
Jamaica The Planning Institute of Jamaica. Jamaica Survey of Living Conditions 2017.
Nicaragua -
Panama Instituto Nacional de Estadística y Censo. Censo Nacional de Población 2010.
Saint Lucia -
Uruguay -
Note: In those countries with no available recent data about educational attainment, weight calibration was only done by region, sex and age.
3. PHASE 2 WAVE 2
HFPS Phase 2 Wave 2 has four units of analysis: households, adult individuals (18 years of age and older), children 0
through 5 years of age, and children 6 through 17 years of age.15 Weights were computed for each sample unit and should be
used according to the estimate of interest.
The size of the Phase 2 Wave 2 overall (cell phones and landlines) selected sample of phone numbers (i.e., before any fieldwork
activities) in each of the Original Countries equals the Phase 1 Wave 1 overall selected sample of phone numbers, plus the Phase
2 Wave 1 overall supplement fresh sample, plus the Phase 2 Wave 2 overall supplement fresh sample of phone numbers.
\[ n_{P2W2} = n_{P1W1} + n^{s}_{P2W1} + n^{s}_{P2W2} \]
where
$n_{P2W2}$ is the size of the Phase 2 Wave 2 overall selected sample of phone numbers in each Original Country;
$n_{P1W1}$ is the size of the Phase 1 Wave 1 overall selected sample of phone numbers;
$n^{s}_{P2W1}$ is the size of the Phase 2 Wave 1 overall selected fresh supplement sample of numbers; and
$n^{s}_{P2W2}$ is the size of the Phase 2 Wave 2 overall selected fresh supplement sample of numbers.
These three sample components were subject to different response mechanisms. The Phase 1 Wave 1 sample was affected by
first-time nonresponse in Phase 1 Wave 1, plus attrition nonresponse in Phase 2 Wave 1 and in Phase 2 Wave 2. The Phase 2
Wave 1 fresh supplement sample was subject to first-time nonresponse in Phase 2 Wave 1 and attrition nonresponse
in Phase 2 Wave 2. Finally, the fresh supplement sample incorporated in Phase 2 Wave 2 was only subject to first-time
nonresponse. These response mechanisms were accounted for in the nonresponse weighting adjustments through a set
of procedures analogous to the ones described in Section 2 above.
On the other hand, the size of the Phase 2 Wave 2 overall selected sample of phone numbers in the Added Countries
equals the Phase 2 Wave 1 overall selected sample of phone numbers, plus the Phase 2 Wave 2 overall fresh supplement
sample of phone numbers.
15 In Ecuador, the units of analysis were households, adult individuals, children 0 through 4 years of age, and children 5 to 17 years of age.
\[ n_{P2W2} = n_{P2W1} + n^{s}_{P2W2} \]
where
$n_{P2W2}$ is the size of the Phase 2 Wave 2 overall selected sample of phone numbers in each Added Country;
$n_{P2W1}$ is the size of the Phase 2 Wave 1 overall selected sample of phone numbers; and
$n^{s}_{P2W2}$ is the size of the Phase 2 Wave 2 overall selected fresh supplement sample of numbers.
These two sample components also went through different response mechanisms. The Phase 2 Wave 1 sample was affected by
first-time nonresponse in Phase 2 Wave 1 and attrition nonresponse in Phase 2 Wave 2. The Phase 2 Wave 2 supplement
sample was only subject to first-time nonresponse in Phase 2 Wave 2.
Again, these response mechanisms were considered in the nonresponse weighting adjustments through procedures
analogous to the ones described in Section 2.
Minority populations
HFPS Phase 2 Wave 2 produced survey results for minority groups (indigenous and afro-descendant populations) in
seven countries: Bolivia, Dominican Republic, Colombia, Guatemala, Mexico, Panama and Peru. In order to attain minority
samples sufficiently large to allow for reliable estimates, the HFPS had to sample minority populations at a sampling rate
higher than for the primary samples in five countries: Bolivia, Dominican Republic, Colombia, Panama and Peru. For this
purpose, the HFPS executed a screening operation in each of these countries.16
The screening process consisted of calling an additional random sample of numbers and applying a short questionnaire
to identify respondents who were part of the HFPS minority groups. Respondents classified as part of a minority
population were then interviewed using the main survey questionnaire. These were then added to the primary sample
of respondents, and therefore, in these five countries the Phase 2 Wave 2 final number of interviewed households and
individuals is larger than in Phase 2 Wave 1.
16 Given the high prevalence of minority populations in Guatemala and Mexico, their main samples included a number of interviewed minority households and
individuals large enough for achieving reliable estimates, and no additional samples were needed. The project conducted a separate data collection exercise in
Ecuador to obtain indicators of the Venezuelan immigrants in the country. The sampling strategy is also different from the HFPS Phase 2 strategy described here.
Results of such effort will be made available under a separate publication.
Table 7. Phase 2 Wave 2 - Original & Added Countries. Complete interviews and attrition rate by country.
Columns: (1) Complete interviews in Phase 2 Wave 1; (2) Complete interviews in the panel; (3) Recontact rate ph2w1/ph2w2; (4) Attrition rate ph2w1/ph2w2.
Country Type of country (1) (2) (3) (4)
Argentina Original 1,216 595 48.9% 51.1%
Bolivia Original 1,272 622 48.9% 51.1%
Chile Original 1,212 482 39.8% 60.2%
Colombia Original 1,221 675 55.3% 44.7%
Costa Rica Original 805 354 44.0% 56.0%
Dominican Republic Original 1,205 531 44.1% 55.9%
Ecuador a Original 951 490 51.5% 48.5%
El Salvador Original 818 288 35.2% 64.8%
Guatemala Original 1,207 593 49.1% 50.9%
Honduras Original 1,021 355 34.8% 65.2%
Mexico Original 2,625 850 32.4% 67.6%
Paraguay Original 1,076 594 55.2% 44.8%
Peru Original 1,212 584 48.2% 51.8%
Antigua & Barbuda b Added 790
Belize Added 816 431 52.8% 47.2%
Dominica Added 861 452 52.5% 47.5%
Guyana Added 785 431 54.9% 45.1%
Haiti Added 2,814 1,487 52.8% 47.2%
Jamaica Added 829 373 45.0% 55.0%
Nicaragua Added 833 362 43.5% 56.5%
Panama Added 815 415 50.9% 49.1%
Saint Lucia Added 835 415 49.7% 50.3%
Uruguay Added 816 373 45.7% 54.3%
a Ecuador was part of HFPS Phase 1 in 2020, so it is considered an Original Country in this table. However, its Phase 2 Wave 1 sample was entirely new and included no
panel, so its sample weighting should follow the same procedures as for the Added Countries.
b Antigua and Barbuda was covered in Phase 2 Wave 2 only.
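The recontact and attrition rates reported in Table 7 are consistent with dividing the panel completes by the Phase 2 Wave 1 completes and taking the complement. A minimal Python sketch of that arithmetic, using the Argentina and Mexico rows (the function name is illustrative and is not part of the HFPS datasets):

```python
def panel_rates(completes_wave1, completes_panel):
    """Return (recontact rate, attrition rate) as proportions."""
    recontact = completes_panel / completes_wave1
    return recontact, 1.0 - recontact


for country, wave1, panel in [("Argentina", 1216, 595), ("Mexico", 2625, 850)]:
    recontact, attrition = panel_rates(wave1, panel)
    print(f"{country}: recontact {recontact:.1%}, attrition {attrition:.1%}")
# Argentina: recontact 48.9%, attrition 51.1%
# Mexico: recontact 32.4%, attrition 67.6%
```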
Table 8. Phase 2 Wave 2. Complete minority interviews by country.
Columns: (1) Complete minority interviews from the primary sample; (2) Screened numbers; (3) Complete minority interviews from the screening operation; (4) Total minority complete interviews.
Country (1) (2) (3) (4)
Bolivia 475 1,112 129 604
Dominican Rep. 766 1,471 167 933
Colombia 529 7,059 312 841
Guatemala 639 - - 639
Mexico 726 - - 726
Panama 480 2,209 349 829
Peru 296 13,220 422 718
As shown in Table 8, the total sample of minority respondents in the five countries is formed by two components: minority
cases already present in the primary sample (column 1) and minority cases interviewed in the screening operation (column
3). Both minority components have the same base weights based on the total combined sample of minority cases. The
base weights of the first minority component were adjusted for attrition nonresponse between Phase 2 waves 1 and 2,
whereas the base weights of the second minority component were adjusted for first-time nonresponse.
Finally, the adjusted weights of the combined sample of minority respondents were calibrated to minority totals by sex,
age and educational attainment available from official sources. Table 9 shows the data sources used for calibrating the
weights of the minority population samples in the five countries. The datasets published by the LAC HFPS Phase 2 project
contain the sampling weights for both the main and minority samples in the same data structure (see Box 1).
Table 9. Phase 2 Wave 2 - Minority Populations. Sources of the auxiliary data used for weight calibration by sex, age and educational attainment.
Country Data source used for weight calibration (region, sex, age and educational attainment)
Bolivia Instituto Nacional de Estadística. Encuesta de Hogares. 2021.
Colombia Departamento Administrativo Nacional de Estadística. Encuesta Nacional de Calidad de Vida. 2021.
Dominican Rep. Vanderbilt University. LAPOP Survey. 2020.
Panama Instituto Nacional de Estadística y Censo. Encuesta de Propósitos Múltiples. 2021.
Peru Instituto Nacional de Estadística e Informática. Encuesta Nacional de Hogares. 2020.
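Calibrating adjusted weights to known population totals on several margins is often done by raking (iterative proportional fitting). The Python sketch below illustrates that general technique on a toy sample. It is an assumption made for illustration only: this section does not state which calibration algorithm the HFPS used, and the function name, variable names, and totals are hypothetical.

```python
def rake(weights, margins, targets, max_iter=50, tol=1e-8):
    """Rake weights so that weighted totals match `targets` on every margin.

    margins: dict mapping margin name -> list of category labels, one per respondent
    targets: dict mapping margin name -> {category: population total}
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for name, categories in margins.items():
            # Current weighted total in each category of this margin
            totals = {}
            for wi, c in zip(w, categories):
                totals[c] = totals.get(c, 0.0) + wi
            # Scale each category to its population total
            factors = {c: targets[name][c] / totals[c] for c in totals}
            w = [wi * factors[c] for wi, c in zip(w, categories)]
            largest_adjustment = max(
                largest_adjustment, max(abs(f - 1.0) for f in factors.values())
            )
        if largest_adjustment < tol:
            break  # all margins already match their targets
    return w


# Toy example: four respondents, two margins (hypothetical totals)
margins = {"sex": ["m", "m", "f", "f"], "age": ["18-39", "40+", "18-39", "40+"]}
targets = {"sex": {"m": 60.0, "f": 40.0}, "age": {"18-39": 50.0, "40+": 50.0}}
calibrated = rake([1.0, 1.0, 1.0, 1.0], margins, targets)
print(calibrated)
```

After raking, the weighted totals by sex and by age both reproduce the target totals, while the relative sizes of the input weights within each cell are preserved as far as the margins allow.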
Box 1. Non-minority and minority population data structure
Combining two samples selected at quite different sampling rates (the primary sample and the
screening sample) in one single sample -and dataset- implies having quite variable weights, which
makes estimates for the overall population (including non-minority and minority populations) less
efficient. That is, the standard errors for a given sample size will be generally larger than with units
selected at a single rate[1]. Therefore, two alternatives were considered to organize the survey data: a)
use only the primary sample to obtain overall population estimates and use the total minority sample
separately to produce minority population estimates, and b) use a unique sample, including the primary
and the screening samples, to produce overall population estimates and obtain minority estimates by
filtering the total minority sample cases.
The first alternative would produce more efficient overall estimates, although with a smaller sample.
Conversely, the second alternative would produce less efficient overall estimates yet with a larger
sample. Minority estimates would be the same with either alternative. On the other hand, the second
alternative would allow a much simpler and (end-user) friendly approach to handle and analyze the
final publicly-available datasets.
To further inform this decision, a simulation was run under different minority population prevalences,
varying sample sizes and different sampling rates. The exercise showed that given the minority
population prevalences and the sample sizes in the five countries of interest (see Table 8), the second
alternative yielded less efficient but slightly more precise estimates for most variables. Consequently,
and also considering its simplicity, the second alternative was pursued in preparation of the LAC HFPS
Phase 2 Wave 2 datasets.
[1] Lower efficiencies are expressed through larger sample design effects.
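The efficiency loss from variable weights discussed in Box 1 can be illustrated with Kish's approximation of the design effect due to unequal weighting, deff = n Σw² / (Σw)². The Python sketch below is only an illustration of that standard formula with made-up weights; it is not the simulation referred to in Box 1, and the function names are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def effective_sample_size(weights):
    """Nominal sample size deflated by the weighting design effect."""
    return len(weights) / kish_design_effect(weights)


equal = [2.0, 2.0, 2.0, 2.0]      # single sampling rate
variable = [1.0, 1.0, 1.0, 5.0]   # one unit sampled at a much lower rate
print(kish_design_effect(equal))     # 1.0
print(kish_design_effect(variable))  # 1.75
```

With equal weights the design effect is 1, and it grows as the weights become more variable, so the same number of interviews behaves like a smaller simple random sample.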
4. ESTIMATION AND SAMPLING ERRORS
When estimating sampling errors (expressed in the sampling variances, standard errors, coefficients of variation and
confidence intervals) for statistics such as means, proportions, ratios and regression parameters, sample design features
and weighting need to be accounted for. If not, sampling error estimates will be biased: standard errors and coefficients
of variation would usually be understated, and confidence intervals would be narrower than they should be.
The two most usual approaches to estimating sampling errors for survey data are 1) the Taylor Series Linearization (TSL)
of the estimator and the corresponding approximation to its variance, or 2) the use of resampling variance estimation
techniques such as Balanced Repeated Replication (BRR), Jackknife Repeated Replication (JRR) and bootstrap. Stata
and other statistical software packages use the TSL method as the default for estimating survey data sampling errors.
To determine the precision of an HFPS estimate, the data user can estimate the corresponding sampling error using the
Stata code in Annex 1. This script delivers the point estimate, the standard error, the 95% confidence interval and the
coefficient of variation accounting for the sample design features and weighting.
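To make the linearization idea concrete, the following Python sketch computes a weighted mean and its Taylor-linearized standard error for a stratified design in which each respondent is treated as its own primary sampling unit and no finite population correction is applied. These are assumptions chosen to mirror a design declared with weights and strata only; the function and variable names are hypothetical, and the Stata code in Annex 1 remains the reference procedure.

```python
import math
from collections import defaultdict


def svy_mean(y, w, strata):
    """Weighted mean and linearized standard error under stratification."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized variable of the ratio estimator sum(w*y) / sum(w)
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]
    by_stratum = defaultdict(list)
    for zi, h in zip(z, strata):
        by_stratum[h].append(zi)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least two observations")
        z_bar = sum(zs) / n_h
        # Between-unit variability of the linearized variable within the stratum
        variance += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in zs)
    return mean, math.sqrt(variance)


ages = [25, 40, 33, 58, 47, 62]
weights = [1.5, 1.0, 2.0, 1.2, 0.8, 1.1]
strata = ["cell", "cell", "cell", "landline", "landline", "landline"]
mean, se = svy_mean(ages, weights, strata)
print(f"mean = {mean:.2f}, se = {se:.2f}, cv = {100 * se / mean:.1f}%")
```

With a single stratum and equal weights the result reduces to the familiar simple random sampling standard error, which is a quick way to sanity-check the function.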
The standard error is the square root of the sampling variance. The coefficient of variation is a relative measure of the
standard error, calculated as the ratio between the standard error and the point estimate (it is usually expressed in
percentage terms). As a rule of thumb, estimates with coefficients of variation of 1 percent or lower are considered to have
a very high level of precision. Coefficients of variation between 1 and 3 percent are generally classified as very good, from 3
to 5 percent as good, 5 to 10 percent as acceptable, and 10 to 15 percent as large. Above 15 percent is classified as too large
and the corresponding estimate is considered unreliable.
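The rule of thumb above can be written as a small lookup. This Python sketch is illustrative only; the function names are hypothetical, and assigning a value that falls exactly on a boundary (3, 5, 10 or 15 percent) to the lower band is an assumption, since the text does not say which band a boundary value belongs to.

```python
def coefficient_of_variation(estimate, standard_error):
    """Coefficient of variation in percent."""
    return 100.0 * standard_error / abs(estimate)


def classify_cv(cv_percent):
    """Map a coefficient of variation (in percent) to the precision labels above."""
    bands = [
        (1.0, "very high precision"),
        (3.0, "very good"),
        (5.0, "good"),
        (10.0, "acceptable"),
        (15.0, "large"),
    ]
    for upper_limit, label in bands:
        if cv_percent <= upper_limit:
            return label
    return "too large (unreliable)"


cv = coefficient_of_variation(estimate=50.0, standard_error=1.0)
print(cv, classify_cv(cv))  # 2.0 very good
```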
REFERENCE LITERATURE
Heeringa, S., West, B., and P. Berglund. (2017). Applied Survey Data Analysis (Second Edition). New York, Taylor & Francis
Group.
Lohr, S. and J. Rao. (2006). Estimation in Multiple-Frame Surveys, Journal of the American Statistical Association, 101,
1019−1030.
Lohr, S. (2011). Alternative Survey Sample Designs: Sampling with Multiple Overlapping Frames, Survey Methodology, 37,
197−213. Statistics Canada.
Skinner, C. and J. Rao. (1996). Estimation in Dual-Frame Surveys with Complex Designs, Journal of the American Statistical
Association, 91, 349−356.
Thompson, S. (2012). Chapter 15: Network Sampling and Link-Tracing Designs, in Sampling. New York, Wiley.
Valliant, R., Dever J., and F. Kreuter. (2016). Practical Tools for Designing and Weighting Sample Surveys. New York, Springer.
ANNEX 1
Stata Code for Weighted Estimates and Sampling Error Computation
Cross-sectional data
This annex provides examples of the Stata syntax for computing estimates and their corresponding sampling errors
(measured by standard errors, confidence intervals, and coefficients of variation), accounting for the HFPS sample design
and weighting. For more details, data users are referred to the online Stata manual for the svy command
(http://www.stata.com/manuals15/svy.pdf).
The following examples are based on Phase 2 Wave 1 data.
To specify the sample design features in any of the HFPS Phase 2 Wave 1 datasets, use command:
svyset [pweight = w_hh_ph2w1], strata(stratum)
* For household-level estimates in Phase 2 Wave 1, use weight w_hh_ph2w1
* For individual-level estimates in Phase 2 Wave 1 (such as the estimates for persons 18+ below), use weight w_ind_ph2w1
Numeric variables (means):
To estimate the mean age of the population 18+, use command:
svy: mean u03_03
estat cv
To estimate the mean age of the population 18+ by gender, use command:
svy: mean u03_03, over(u03_04)
estat cv
To estimate the mean age of the population 18+ who did not work in the week prior to the interview, use command:
svy, subpop(if u05_01 == 2): mean u03_03
estat cv
Categorical variables (proportions):
To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview, use
command:
svy: tab u05_01, se ci cv
To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview by sex, use
command:
svy: tab u05_01 u03_04, col se ci cv
To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview among
males, use command:
svy, subpop(if u03_04 == 1): tab u05_01, se ci cv
Linear regression:
To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2, use command:
svy: regress y x1 x2
To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2 and two categorical
variables x3 and x4, use command:
svy: regress y x1 x2 i.x3 i.x4