COVID-19
HIGH-FREQUENCY PHONE SURVEY (HFPS)
IN LATIN AMERICA
TECHNICAL NOTE ON SAMPLING DESIGN,
WEIGHTING, AND ESTIMATION *
September 2021
The COVID-19 High-Frequency Phone Survey (HFPS) 2020 was conducted in 13 Latin American countries: Argentina,
Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico,
Paraguay, and Peru. It followed a panel sample over three waves of data collection in 12 countries and over four waves
in Ecuador.1 All waves spanned from May to August 2020 and each wave’s collection period lasted about ten days on
average. The survey was administered to one adult per household. Each respondent was presented with both individual
and household-level questions.
All national samples were based on a dual frame of cell and landline phones, and selected as a one-stage probability
sample, with geographic stratification of landline numbers. The samples were generated through a Random Digit Dialing
(RDD) process covering all cell and landline telephone numbers active at the time of the sample selection.
Survey estimates represent households with a landline or at least one cell phone and individuals of 18 years of age or
above who have an active cell phone number or a landline at home.
1. Sampling design
The RDD methodology generates virtually all possible telephone numbers in the country under the national telephone
numbering plan and then draws a random sample of numbers. This method guarantees full coverage of the population
with a phone.2
First, in each country, a large first-phase sample was selected in each frame of numbers, with an allocation ranging from
0 percent landlines and 100 percent cell phones to 20 percent landlines and 80 percent cell phones (landline and cell
telephone numbers are distinguished by their prefixes). Landline numbers were included with a small share of the total
sample in nearly all countries for two reasons: to cover the landline-only households and individuals, who have a low
prevalence in most Latin American countries; and to achieve more accurate sex and age sample distributions.3
* This note was prepared by Ramiro Flores Cruz, partner at Sistemas Integrales and World Bank consultant on survey methodology and sampling, with the financial
support from the Latin American and Caribbean Regional Vice Presidency.
1 Ecuador HFPS had a sample design different to the other HFPS countries since it was based on respondents to the 2019 Human Mobility and Host Community
Survey (EPEC by its acronym in Spanish), which collected phone numbers in the field. For more details about EPEC’s sample design see Muñoz, Juan; Muñoz, Jose;
Olivieri, Sergio. 2020. Big Data for Sampling Design: The Venezuelan Migration Crisis in Ecuador. Policy Research Working Paper; No. 9329. World Bank, Washington, DC.
https://openknowledge.worldbank.org/handle/10986/34175
2 Given that the HFPS used a sampling frame of telephone numbers, results represent the population with at least one active phone and exclude the population with
no phone.
3 Survey methodology literature and experience show that cell phone survey respondents are more likely to be male and younger than landline phone respondents due
to both cell phone ownership patterns and differential response rates, with females and seniors less likely to answer a call from an unknown number. The underrepresentation
of females and seniors in a 100 percent cell phone sample can be compensated via nonresponse weighting adjustment and calibration. That said, the
more unbalanced the sample, the larger the weighting adjustments needed; hence, standard errors in the final survey estimates are larger. The inclusion of landline
telephone numbers improves the sex and age representation in the sample. As such, the weighting adjustments and the standard errors of the survey estimates will
both be smaller.
www.worldbank.org /BancoMundial @BancoMundialLAC
The landline frame in each country was geographically stratified by department, province, or state, and the sample of
landlines was selected with proportionate allocation across these strata. Geographic stratification of cell phones was only
done in Argentina, Bolivia, and Mexico.4 It is important to note that the HFPS sample design allows for obtaining precise
estimates at the country level only. Some subnational estimates may have large sampling errors.5
The first-phase samples of landline and cell phone numbers were then screened through an automated process to identify
the active numbers. The active numbers were then cross-checked with business registries (based on yellow page
directories and websites) to identify and remove business numbers not eligible for this survey.
A smaller second-phase sample6 was then selected from the active residential numbers identified in the first-phase sample
and was delivered to each country operations team to be contacted by the interviewers.7
HFPS sample sizes
The HFPS was conducted in three waves (four in Ecuador). Table 1 shows the wave 1 final sample size per country and the allocation between
both frames.8 In the first wave, when a cell phone was called, the person who answered was interviewed as long as he or she
was 18 years of age or above. When a landline number was called, the interviewer asked to talk to any household member
18 years of age or older. In both cell phone and landline calls, the respondent was asked individual and household questions.
Landlines are 10 to 15 percent of the sample in eight countries, 20 percent in two, and 0 percent in three (Table 1).
Wave 1 respondents were recontacted to be interviewed in the second and third waves. The questionnaires across waves
included different questions but kept core questions considered key to longitudinal analysis.
4 Geographic stratification of cell phones was feasible only in these three countries because only there can cell phone number prefixes be linked to the district
(department, province, or state) in which they were issued.
5 Annex 1 shows how to compute sampling errors for different estimates using Stata.
6 Note that the selection of phone numbers involves two sampling phases, and not two sampling stages. The HFPS involves only one sampling stage.
7 Furthermore, the second-phase sample was delivered in batches to the country teams during fieldwork. Delivering large lists of numbers could have facilitated the
“misuse” of the sample by easily replacing non-answering numbers, raising nonresponse rates and potentially increasing nonresponse biases.
8 The HFPS samples are element samples (i.e., they have one sampling stage), so the design effects are about 1 and the effective sample sizes are similar to the
nominal sizes. In contrast, multi-stage cluster samples typically have design effects larger than 1 and the effective sample sizes are smaller than the nominal sizes,
generating larger standard errors.
Table 1. Sample size and allocation to cell phones and landlines in HFPS Wave 1
Country Sample size Cell phones Landlines
Argentina 1,000 85% 15%
Bolivia 1,000 100% 0%
Chile 1,000 80% 20%
Colombia 1,000 85% 15%
Costa Rica 800 90% 10%
Dominican Republic 800 85% 15%
Ecuador 1,200 85% 15%
El Salvador 800 90% 10%
Guatemala 800 90% 10%
Honduras 800 100% 0%
Mexico 2,000 80% 20%
Paraguay 800 100% 0%
Peru 1,000 90% 10%
2. Weighting
The HFPS has two sample units: households and individuals. Sampling weights were computed for each unit and should
be used according to the estimate of interest. The weighting process involves four steps:
1. Calculation of the inclusion probabilities of landline and cell phone numbers.
2. Computation of design weights for households and individuals.
3. Nonresponse weighting adjustment.
4. Calibration of individual and household weights, using external data from official sources (adjusted for the
national phone coverage).
In the second and third HFPS waves, household and individual weights were further adjusted for attrition nonresponse
from Wave 1 to 2 and from Wave 1 to 3.
Step 1: Inclusion probabilities of landline and cell phone numbers
A first-phase sample was selected in each of the two frames (cell phone number and landline number frames) with
simple random selection without replacement from the entire frame or within geographic strata. The selected numbers
were then screened and classified into active and inactive.
The first-phase inclusion probabilities of cell phone and landline numbers are9

$$\pi_{i(1)}^{C} = \frac{n_{(1)}^{C}}{N^{C}}, \qquad \pi_{hi(1)}^{L} = \frac{n_{h(1)}^{L}}{N_{h}^{L}}$$

where
$\pi_{i(1)}^{C}$ is the first-phase inclusion probability of the i-th cell phone number;
$n_{(1)}^{C} = n_{A(1)}^{C} + n_{I(1)}^{C}$ is the size of the first-phase sample of cell phones, composed of $n_{A(1)}^{C}$ active cell phones and
$n_{I(1)}^{C}$ inactive cell phones;
$N^{C}$ is the cell phone frame size, the total number of all possible cell phones according to the national
numbering plan;
$\pi_{hi(1)}^{L}$ is the first-phase inclusion probability of the i-th landline number in stratum h;
$n_{h(1)}^{L} = n_{hA(1)}^{L} + n_{hI(1)}^{L}$ is the size of the first-phase sample of landlines in stratum h, composed of $n_{hA(1)}^{L}$ active landlines and
$n_{hI(1)}^{L}$ inactive landlines; and
$N_{h}^{L}$ is the landline frame size in stratum h, the total number of all possible landline numbers according to the national
numbering plan.
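Next, two second-phase samples were selected systematically out of the first-phase samples of active cell and active
landline telephone numbers. The second-phase inclusion probabilities of cell phones and landlines are

$$\pi_{i(2)|(1)}^{C} = \frac{n_{A(2)}^{C}}{n_{A(1)}^{C}}, \qquad \pi_{hi(2)|(1)}^{L} = \frac{n_{hA(2)}^{L}}{n_{hA(1)}^{L}}$$

where $n_{A(1)}^{C}$ and $n_{hA(1)}^{L}$ are the numbers of active cell phones and of active landlines in stratum h identified in the first phase, and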
9 Inclusion probabilities of cell phones do not show a stratum index since most cell phone samples were not stratified for the reasons stated above. Only cell phone
samples for Argentina, Bolivia, and Mexico were stratified.
$\pi_{i(2)|(1)}^{C}$ is the second-phase inclusion probability of the i-th active cell phone number conditional on being selected in
the first phase;
$n_{A(2)}^{C}$ is the size of the second-phase sample of active cell phones;
$\pi_{hi(2)|(1)}^{L}$ is the second-phase inclusion probability of the i-th active landline number in stratum h conditional on being
selected in the first phase; and
$n_{hA(2)}^{L}$ is the size of the second-phase sample of active landlines in stratum h.
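The unconditional inclusion probabilities of the second-phase active cell phones and landlines are

$$\pi_{i}^{C} = \pi_{i(1)}^{C} \times \pi_{i(2)|(1)}^{C} = \frac{n_{(1)}^{C}}{N^{C}} \times \frac{n_{A(2)}^{C}}{n_{A(1)}^{C}} = \frac{n_{A(2)}^{C}}{N^{C} \times R_{A(1)}^{C}}$$

$$\pi_{hi}^{L} = \pi_{hi(1)}^{L} \times \pi_{hi(2)|(1)}^{L} = \frac{n_{h(1)}^{L}}{N_{h}^{L}} \times \frac{n_{hA(2)}^{L}}{n_{hA(1)}^{L}} = \frac{n_{hA(2)}^{L}}{N_{h}^{L} \times R_{hA(1)}^{L}}$$

where $R_{A(1)} = n_{A(1)} / n_{(1)}$ is the rate of active phones estimated in the first phase, computed separately for cell phones and for each landline stratum.10 Hence, the unconditional inclusion probabilities
of the second-phase active numbers, $\pi_{i}^{C}$ and $\pi_{hi}^{L}$, can be expressed as the ratio between the active numbers selected
in the second phase and an estimate of the total active numbers in the frame, $N \times R_{A(1)}$.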
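As a purely illustrative check of the two-phase logic, the worked example below uses hypothetical figures (not HFPS values) for an unstratified cell phone sample: a frame of 100,000,000 possible numbers ($N^{C}$), a first-phase sample of 500,000 numbers ($n_{(1)}^{C}$) of which 200,000 are active ($n_{A(1)}^{C}$), and a second-phase sample of 10,000 active numbers ($n_{A(2)}^{C}$). $R_{A(1)}^{C}$ denotes the estimated rate of active numbers.

```latex
\begin{align*}
\pi_{i(1)}^{C} &= \frac{n_{(1)}^{C}}{N^{C}} = \frac{500{,}000}{100{,}000{,}000} = 0.005 \\
\pi_{i(2)|(1)}^{C} &= \frac{n_{A(2)}^{C}}{n_{A(1)}^{C}} = \frac{10{,}000}{200{,}000} = 0.05 \\
R_{A(1)}^{C} &= \frac{n_{A(1)}^{C}}{n_{(1)}^{C}} = \frac{200{,}000}{500{,}000} = 0.4 \\
\pi_{i}^{C} &= 0.005 \times 0.05 = 0.00025 = \frac{10{,}000}{100{,}000{,}000 \times 0.4}
\end{align*}
```

Under these assumed figures, each sampled active cell phone number would represent 1 / 0.00025 = 4,000 active numbers in the frame.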
Step 2: Design weights for households and individuals
The selection probabilities of households and individuals aged 18 years and older are based on the inclusion probabilities
of the cell phones and landlines through which they can be reached. Therefore, the computation of household and
individual weights should account for multiple chances of selection and for the overlap between the cell phone and
landline frames. This multiplicity weighting removes the over-representation in the sample of households and
individuals that can be reached through more telephone numbers than others, and thus removes the
associated multiplicity sampling bias.
10 $R_{A(1)}$ estimates are highly precise due to the very large size of the first-phase samples.
Multiplicity adjustment
Multiplicity arises when a household has a larger selection probability because it can be selected through
several sample elements (telephone numbers). Households with more than one cell phone or more than one landline
number are over-represented in sample designs like this. As a result, their selection probabilities need to be adjusted to
account for this increased chance of selection. The multiplicity-adjusted household selection probabilities in each frame
are computed as

$$\pi_{j}^{C} = m_{j}^{C} \times \pi_{i}^{C}, \qquad \pi_{hj}^{L} = m_{j}^{L} \times \pi_{hi}^{L}$$

where
$\pi_{j}^{C}$ is the selection probability of the j-th household when contacted through a cell phone, adjusted for multiplicity of
working cell phones in the household;
$m_{j}^{C}$ is the number of working cell phones in the j-th household;
$\pi_{hj}^{L}$ is the selection probability of the j-th household in stratum h when contacted through a landline, adjusted for
multiplicity of working landlines in the household;
$m_{j}^{L}$ is the number of working landlines in the j-th household; and
$\pi_{i}^{C}$ and $\pi_{hi}^{L}$ are the unconditional inclusion probabilities of cell phone and landline numbers from Step 1.
Therefore, if a household has $m_{j}^{C}$ cell phones, its chance of being selected through a cell phone is $m_{j}^{C}$ times that of a household
with only one cell phone. The same applies to landlines, in which case the multiplicity factor is $m_{j}^{L}$. Since the
number of cell phones and landlines in a household is unknown at the time of the sample design, it must be asked
during the interview.
The probability of an individual being selected through a cell phone equals the inclusion probability of his or her cell
phone number. On the other hand, the probability of an individual being selected through a landline equals the selection
probability of his or her household, conditional on the number of working landlines in the household, over the number of
individuals aged 18 years and older in the household.
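$$\pi_{k}^{C} = \pi_{i}^{C}, \qquad \pi_{hk}^{L} = \frac{\pi_{hj}^{L}}{e_{j}}$$

where
$\pi_{k}^{C}$ is the selection probability of the k-th individual when contacted through a cell phone, equal to the inclusion probability $\pi_{i}^{C}$ of his or her cell phone number;
$\pi_{hk}^{L}$ is the selection probability of the k-th individual in stratum h when contacted through a landline in the j-th
household, whose multiplicity-adjusted selection probability is $\pi_{hj}^{L}$; and
$e_{j}$ is the number of eligible persons 18 years of age or older in the j-th household.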
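To make the distinction between household and individual probabilities concrete, here is a short worked example with hypothetical values (not HFPS figures): a landline number inclusion probability of 0.0001 and a cell phone number inclusion probability of 0.00025, for a household with one working landline, two working cell phones, and four eligible adults. The symbols $m^{L}$, $m^{C}$, and $e$ stand for those three household counts.

```latex
\begin{align*}
\text{Household via landline:} \quad & m^{L} \times 0.0001 = 1 \times 0.0001 = 0.0001 \\
\text{Individual via landline:} \quad & \frac{0.0001}{e} = \frac{0.0001}{4} = 0.000025 \\
\text{Household via cell phone:} \quad & m^{C} \times 0.00025 = 2 \times 0.00025 = 0.0005 \\
\text{Individual via cell phone:} \quad & 0.00025
\end{align*}
```

The household is twice as likely to be reached by cell phone as a one-phone household, while each adult is reachable only through his or her own cell phone number, so the individual cell phone probability carries no multiplicity factor.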
Overlapping sampling frames
Households and individuals with both cell and landline telephones (dual cases) have a higher probability of being selected
than those with only cell phones or only landlines. The following diagram displays the overlapping pattern of the cell
phone and landline sampling frames.
Figure 1. Partially overlapping frames
[Diagram: frame A (cell phone frame) and frame B (landline frame) overlap, defining three domains: cell phone only, dual, and landline only.]
In order to adjust the selection probabilities for multiplicity, it is essential to collect relevant information during the interview.
It is necessary to know the domain ownership of the sample households and individuals, plus the number of cell
phones and landlines in the sample households. For this purpose, the HFPS questionnaire included the following three
questions:
1. How many working cell phones in total are owned by the persons in your household, including you?
2. Is there any working landline in your household?
3. How many working landlines are there in your household currently?
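By knowing the domain of ownership, the selection probability for each sample unit can be calculated based on the
following probability property

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

where $P(A \cap B) = P(A) \times P(B)$, given that A and B are independent.

In general, in a dual-frame telephone sample design

$$\pi = \begin{cases} \pi^{C} & \text{if the sample unit is cell phone only} \\ \pi^{L} & \text{if the sample unit is landline only} \\ \pi^{C} + \pi^{L} - \pi^{C} \times \pi^{L} & \text{if the sample unit is dual} \end{cases}$$

where $\pi^{C}$ and $\pi^{L}$ are the selection probabilities of the sample units (households or individuals) in domains cell phone
only and landline only.

In the specific HFPS setting (with overlapping frames and multiplicity)

Selection probabilities for households are

$$\pi_{j} = \begin{cases} \pi_{j}^{C} & \text{if the household is cell phone only} \\ \pi_{hj}^{L} & \text{if the household is landline only} \\ \pi_{j}^{C} + \pi_{hj}^{L} - \pi_{j}^{C} \times \pi_{hj}^{L} & \text{if the household has both cell phone and landline} \end{cases}$$

Selection probabilities for individuals are

$$\pi_{k} = \begin{cases} \pi_{k}^{C} & \text{if the individual is cell phone only} \\ \pi_{hk}^{L} & \text{if the individual is landline only} \\ \pi_{k}^{C} + \pi_{hk}^{L} - \pi_{k}^{C} \times \pi_{hk}^{L} & \text{if the individual has both cell phone and landline telephones} \end{cases}$$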
Household and individual design weights, $w_{0j}$ and $w_{0k}$ respectively, are the inverse of the above selection probabilities:

$$w_{0j} = \frac{1}{\pi_{j}}, \qquad w_{0k} = \frac{1}{\pi_{k}}$$
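The household design weight computation can be sketched in Stata as follows. This is a minimal illustration under stated assumptions, not the code used to build the HFPS weights: the dataset, the variable names (hh_id, pi_cell, pi_land, m_cell, m_land, p_c, p_l, pi_hh, w0_hh), and the probabilities are all hypothetical. A household with no working phone in a frame gets a zero probability in that frame, so the union formula reduces to the single-frame probability.

```stata
* Hypothetical data: three responding households
* pi_cell, pi_land: unconditional inclusion probabilities of a cell/landline number
* m_cell, m_land:   working cell phones and landlines reported in the interview
clear
input byte hh_id double pi_cell double pi_land byte m_cell byte m_land
1 0.00025 0.00010 2 0
2 0.00025 0.00010 0 1
3 0.00025 0.00010 3 1
end

* Multiplicity-adjusted household selection probability in each frame
generate double p_c = m_cell * pi_cell
generate double p_l = m_land * pi_land

* Union of the two frames: P(A) + P(B) - P(A)*P(B)
generate double pi_hh = p_c + p_l - p_c * p_l

* Design weight is the inverse of the selection probability
generate double w0_hh = 1 / pi_hh

list hh_id p_c p_l pi_hh w0_hh
```

In this toy data, household 1 is cell phone only, household 2 is landline only, and household 3 is dual, so the three branches of the household formula are each exercised once.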
Step 3: Nonresponse adjustment
When a phone number is called, it is not always possible to carry out an interview. Nonresponse occurs because of a number
of constraints. Most common are that nobody answers the call (no contact), the respondent is unwilling to cooperate
(refusal), or language barriers exist.
Five main strategies were implemented to minimize nonresponse:
a. The survey management team sent SMS text messages to the sample cell phone numbers before calling to
announce that a survey firm would call and to encourage the phone holder to answer.
b. In most countries, the sample was released to the country operations teams over successive replicates to keep
nonresponse monitored and under the central management team’s control.
c. Stringent calling protocols were put in place and monitored to ensure a minimum number of attempts on different
days and times (5 to 10 attempts depending on the country).
d. The survey offered monetary and non-monetary incentives in most countries to those who cooperated (e.g., gift
cards and phone credit).
e. In some countries, the most experienced interviewers recontacted the numbers classified as a “Refusal” to convert
them into a “Complete interview”.
These actions enabled response rates that were higher than those of similar RDD sample surveys. Wave 1 response rates and recontact
rates in waves 2 and 3 varied across countries. The highest wave 1 response rates were in Bolivia and Ecuador, while the
lowest were in Argentina and Mexico.
The survey attempted to recontact all respondents from wave 1 in waves 2 and 3. Table 2 displays wave 1 response rates
and recontact rates for wave 1 to wave 2 and wave 1 to wave 3.
Table 2. HFPS 2020 response and recontact rates by country and wave
Country Wave 1 response rate Recontact rate W1/W2 Recontact rate W1/W3
Argentina 14.6% 70.3% 63.7%
Bolivia 34.4% 62.3% 66.1%
Chile 21.3% 62.2% 68.4%
Colombia 26.6% 73.0% 63.8%
Costa Rica 25.7% 79.4% 82.1%
Dominican Republic 28.4% 83.4% 82.7%
Ecuador 44.6% 83.7% 69.5%
El Salvador 28.7% 77.7% 75.1%
Guatemala 29.2% 77.5% 78.9%
Honduras 20.7% 68.2% 64.6%
Mexico 14.4% 59.0% 54.6%
Paraguay 18.4% 68.0% 63.9%
Peru 28.5% 84.1% 82.1%
The design weights of responding households and individuals were adjusted to compensate for nonresponse and thus reduce
potential nonresponse bias on the survey estimates. A class-based nonresponse adjustment was used. Classes were
formed by crossing all categories of auxiliary variables that were known to be correlated with the likelihood of responding
and were available for both respondents and nonrespondents. Given that the survey used RDD sampling, the information
in the sampling frame was limited. The only variables available for respondents and nonrespondents were the type of
phone number (landline or cell phone) and the corresponding geographic region (known for landlines in all countries and
for cell phones only in Argentina, Bolivia and Mexico).
The weighting class nonresponse adjustment is based on the inverse of the weighted response rate estimate in each
class. This is the ratio of the sum of the design weights of all units (respondents and nonrespondents) in class c to the
sum of the design weights of respondents in that class:

$$a_{jc} = \frac{\sum_{j \in R_{c}} w_{0j} + \sum_{j \in NR_{c}} w_{0j}}{\sum_{j \in R_{c}} w_{0j}}, \qquad a_{kc} = \frac{\sum_{k \in R_{c}} w_{0k} + \sum_{k \in NR_{c}} w_{0k}}{\sum_{k \in R_{c}} w_{0k}}$$

where $a_{jc}$ is the nonresponse adjustment factor that should be applied to responding households in class c, and $a_{kc}$ is the
nonresponse adjustment factor for responding individuals in that class. $R$ and $NR$ indicate the responding and nonresponding
units, respectively.
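The weighting-class adjustment described above can be sketched in Stata on a toy dataset. Everything here is hypothetical and for illustration only: the six sampled units, the two classes, and the variable names (unit_id, nrclass, responded, w0, sum_all, sum_resp, a_c, w1) are assumptions, not HFPS variables.

```stata
* Hypothetical data: six sampled units in two nonresponse classes
* responded = 1 for respondents, 0 for nonrespondents; w0 = design weight
clear
input byte unit_id byte nrclass byte responded double w0
1 1 1 2000
2 1 0 2000
3 1 1 1000
4 2 1 5000
5 2 0 5000
6 2 0 5000
end

* Sum of design weights over all units and over respondents, by class
bysort nrclass: egen double sum_all  = total(w0)
bysort nrclass: egen double sum_resp = total(w0 * responded)

* Adjustment factor: inverse of the weighted response rate in the class
generate double a_c = sum_all / sum_resp

* Nonresponse-adjusted weight, defined for respondents only
generate double w1 = w0 * a_c if responded == 1

list unit_id nrclass responded w0 a_c w1
```

A useful property to check is that, within each class, the adjusted weights of respondents add up to the design weights of all sampled units, so respondents carry the weight of the nonrespondents in their class.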
Thus, the nonresponse adjusted weights for responding households and individuals are

$$w_{1j} = w_{0j} \times a_{jc}, \qquad w_{1k} = w_{0k} \times a_{kc}$$
Step 4: Calibration of individual and household weights
Finally, the weights for the responding households and individuals were calibrated to reflect the total population with
a phone by sex, age, and region, as available from external national official sources. This last adjustment has two objectives:
- To further reduce potential nonresponse biases that were not addressed by the nonresponse adjustment in Step
3, by using auxiliary variables from external sources. This can be achieved as long as the calibration auxiliaries are
correlated with nonresponse and the study variables.
- To improve the precision of estimators (i.e., reduce the sampling variances), as long as the auxiliaries are correlated
with the study variables of interest.11
Calibration works by minimizing a measure of the distance between the input weights (nonresponse adjusted weights
in this case) and the calibrated weights, under the constraint that the weighted sample totals of the auxiliary variables equal
the corresponding population totals from the external source. Unlike the nonresponse adjustment, weights calibration requires
auxiliary variables only for respondents.
Among the existing calibration techniques, the HFPS applied the raking method, using the logit distance function. This
method was most suitable given that all available auxiliary variables (region, sex, and age groups) were categorical, the
region variable had many categories in most countries, and the overall samples were rather small.
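The idea of raking can be illustrated with a short Stata sketch. Note the assumptions: this is classical raking (iterative proportional fitting over the margins), not the bounded logit-distance variant that the HFPS applied, and the four respondents, the control totals, and all variable names (sex, agegrp, w1, t_sex, t_age, w, g) are hypothetical.

```stata
* Hypothetical data: four respondents with nonresponse-adjusted weights w1
clear
input byte sex byte agegrp double w1
1 1 100
1 2 150
2 1 120
2 2 130
end

* Assumed population control totals (population with a phone)
generate double t_sex = cond(sex == 1, 290, 310)
generate double t_age = cond(agegrp == 1, 250, 350)

* Alternate between the two margins until the weights stabilize
generate double w = w1
forvalues iter = 1/50 {
    * Scale weights so the sex margin matches its control totals
    bysort sex: egen double s_sex = total(w)
    quietly replace w = w * t_sex / s_sex
    * Scale weights so the age margin matches its control totals
    bysort agegrp: egen double s_age = total(w)
    quietly replace w = w * t_age / s_age
    drop s_sex s_age
}

* Calibration factor relative to the input weight
generate double g = w / w1
list sex agegrp w1 w g
```

A fixed number of iterations is used here for simplicity; a production routine would normally stop on a convergence tolerance and might bound the calibration factors, which is what a logit distance function achieves.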
The final weights for responding households and individuals can then be expressed as

$$w_{j} = w_{0j} \times a_{jc} \times g_{j}, \qquad w_{k} = w_{0k} \times a_{kc} \times g_{k}$$
11 This objective was not addressed in this survey since it would have entailed computing a large set of replicate weights (with bootstrap or jackknife replication
methods), which could be confusing for the final user.
where
$w_{0j}$ is the design weight for the j-th household;
$a_{jc}$ is the nonresponse adjustment factor for households in class c;
$g_{j}$ is the calibration factor for the j-th household;
$w_{0k}$ is the design weight for the k-th individual;
$a_{kc}$ is the nonresponse adjustment factor for individuals in class c; and
$g_{k}$ is the calibration factor for the k-th individual.
Table 3 shows the data sources used for calibrating the weights in each country. Population totals by sex, age, and region
taken from these sources were further adjusted for telephone coverage, using the national phone coverage rates published
by the International Telecommunication Union (ITU) of the United Nations.
Table 3. Data sources for the auxiliary data used for weight calibration
Country              Data source used for weight calibration
Argentina            Instituto Nacional de Estadísticas y Censos. Proyecciones elaboradas en base al Censo Nacional de Poblaciones, Hogares y Viviendas 2010.
Bolivia              Instituto Nacional de Estadística. Proyecciones de Población. 2020.
Chile                Instituto Nacional de Estadística. Estimaciones y Proyecciones de Población de Chile 1992-2050.
Colombia             Departamento Administrativo Nacional de Estadística. Proyecciones de Población Nacional para el Periodo 2018-2070.
Costa Rica           Centro Centroamericano de Población. Proyecciones Distritales de Población de Costa Rica 2000-2050.
Dominican Republic   Oficina Nacional de Estadística. Población Estimada y Proyectada para el Período 1950-2100.
Ecuador              World Bank. Ecuador Sociodemographic and Labor Force Survey for Population in Human Mobility-EPEC (2019).
El Salvador          Centro Centroamericano de Población. Proyecciones de Población de El Salvador. 2000-2050.
Guatemala            Instituto Nacional de Estadística. Proyecciones Nacionales 1950-2050.
Honduras             Instituto Nacional de Estadística. Proyecciones de Población 2013-2015.
Mexico               Consejo Nacional de Población. Proyecciones de la Población de México y de las Entidades Federativas, 2016-2050.
Paraguay             Dirección General de Estadística, Encuestas y Censos. Proyección de la población nacional por sexo y edad, 2000-2025. Revisión 2015.
Peru                 Instituto Nacional de Estadística e Informática. Estimaciones y Proyecciones de Población. Boletín Especial Nº 21 y 22.
3. Estimation and Sampling Errors
When analyzing the data, it is essential to compute and assess the precision of the survey estimates, i.e., the magnitude
of their sampling error. Sampling errors can be expressed through the sampling variances, standard errors, coefficients of
variation,12 and confidence intervals, although all these may also include part of the non-sampling errors.
When estimating sampling errors for means, proportions, ratios, and linear and nonlinear regression parameters, HFPS
sample design features and weighting need to be accounted for. If these are not considered, standard statistical software
will treat the sample as a simple random sample, which would lead to biased estimates of sampling variances.
The two most common approaches for estimating sampling errors for complex sample data are: (1) the Taylor Series
Linearization (TSL) of the estimator and the corresponding approximation to its variance, or (2) the use of resampling
variance estimation techniques, such as balanced repeated replication (BRR), jackknife repeated replication (JRR), and
bootstrap. Stata and other statistical software packages use the TSL method as the default for estimating sampling
errors.
Annex 1 indicates the Stata script that should be used to account for the HFPS sample design and weighting when computing
an estimate based on cross-sectional data (i.e., based on one wave only).
As mentioned, the HFPS has a panel design and the survey attempted to recontact all respondents from wave 1 in waves
2 and 3. Because the same sample units are observed across waves, panel surveys yield more precise estimates of
the change in an indicator between successive waves. Sequential cross-sectional surveys,
where each wave’s sample includes different households and individuals, can also track changes over time. In this case,
however, change estimates are less precise (i.e., have a larger sampling error) than with a panel survey.
Thus, the HFPS panel should be able to determine more precisely whether a decrease or increase in a given indicator over
time is statistically significant. It should ideally be able to detect small changes between two waves.
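Under these conditions, the design-based variance of the change estimate for the indicator of interest $\theta$ is
given by

$$\mathrm{Var}(\hat{\Delta}) = \mathrm{Var}(\hat{\theta}_{1}) + \mathrm{Var}(\hat{\theta}_{2}) - 2\rho\sqrt{\mathrm{Var}(\hat{\theta}_{1}) \times \mathrm{Var}(\hat{\theta}_{2})}$$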
12 The standard error is the square root of the sampling variance. The coefficient of variation is a relative measure of the standard error and is calculated as the ratio
between the standard error and the point estimate (it is usually expressed in percentage terms). As a rule of thumb, estimates with coefficients of variation of 1
percent or lower are considered to have a very high level of precision. Coefficients of variation between 1 and 3 percent are generally classified as very good, from 3 to
5 percent as good, from 5 to 10 percent as acceptable, and from 10 to 15 percent as large. Above 15 percent is classified as too large and the corresponding estimate
is considered unreliable.
where
$\hat{\theta}_{1}$ is the cross-section estimate of the indicator of interest in wave 1;
$\hat{\theta}_{2}$ is the cross-section estimate of the indicator of interest in wave 2;
$\hat{\Delta} = \hat{\theta}_{2} - \hat{\theta}_{1}$ is the estimate of net change of $\theta$ between waves 1 and 2; and
$\rho$ is the correlation between the two wave indicators.
The above expression shows how the sampling variance of the change estimate (for indicator $\theta$) is reduced. The precision
of the change estimate is thus increased due to the existing correlation between $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$. Since respondents in a panel
are the same in waves 1 and 2, the correlation between $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$ is expected to be positive. The larger the correlation,
the more precise the change estimate.13
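A short numerical illustration of this gain, using hypothetical values only: suppose both wave estimates have a standard error of 0.02 (sampling variance 0.0004) and the correlation between them, written here as $\rho$, is 0.6. The panel case is compared with independent cross-sections ($\rho = 0$).

```latex
\begin{align*}
\text{Panel } (\rho = 0.6): \quad & 0.0004 + 0.0004 - 2 \times 0.6 \times \sqrt{0.0004 \times 0.0004} = 0.00032,
  \quad \text{standard error} = \sqrt{0.00032} \approx 0.0179 \\
\text{Independent samples } (\rho = 0): \quad & 0.0004 + 0.0004 = 0.0008,
  \quad \text{standard error} = \sqrt{0.0008} \approx 0.0283
\end{align*}
```

Under these assumed values, the panel design would cut the standard error of the change estimate by roughly a third, so a smaller change between waves could be declared statistically significant.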
Annex 2 includes the Stata code for testing the change of an indicator between any two HFPS waves, accounting for
both the sample design features and the panel overlap. The test output shows the change point estimate, plus the corresponding
standard error, t-score, p-value, and 95% confidence interval.
13 The magnitude of $\rho$ depends on each particular indicator of interest $\theta$.
Annex 1
Stata Code for Weighted Estimates and Sampling Error Computation
Cross-sectional data
This annex provides a set of examples of the Stata syntax for computing estimates and their corresponding sampling
errors (measured by standard errors, confidence intervals, and coefficients of variation), accounting for the HFPS sample
design and weighting. For more details, data users are referred to the online Stata manual for the svy command
(http://www.stata.com/manuals15/svy.pdf).
To specify the sample design features in any of the HFPS datasets, use command:
svyset [pweight=w_hh_w1]
* Use weight w_hh_w1 for household-level estimates in wave 1
* (w_hh_w2 for wave 2 and w_hh_w3 for wave 3).
* Use weight w_ind_w1 for individual-level estimates in wave 1
* (w_ind_w2 for wave 2 and w_ind_w3 for wave 3).
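Several of the examples in this annex refer to persons 18+, that is, to individual-level estimates. Following the weight guidance above, a reasonable reading is that the design should be re-declared with the individual weight before running them. A minimal sketch for wave 1, using only variable names that appear in this annex:

```stata
* Individual-level estimates in wave 1 (for example, mean age of persons 18+)
svyset [pweight=w_ind_w1]
svy: mean q03_07
```

The svyset declaration stays in effect until it is changed, so household-level and individual-level estimates each need their own svyset call in the same session.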
Numeric variables (means):
To estimate the mean age of the population 18+, use command:
svy: mean q03_07
estat cv
To estimate the mean age of the population 18+ by gender, use command:
svy: mean q03_07, over(q03_03)
estat cv
To estimate the mean age of the population 18+ who did not work in the week prior to the interview, use command:
svy, subpop(if q07_01==2): mean q03_07
estat cv
Categorical variables (proportions):
To estimate the frequency distribution of persons 18+ according to their level of concern that a family member could fall
seriously ill because of COVID-19, use command:
svy: tab q10_01, se ci cv
To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview by status
in employment, use command:
svy: tab q07_01 q07_05, col se ci cv
To estimate the frequency distribution of households on whether they received money from the government or NGOs,
among households where a member lost his or her job since the beginning of the quarantine, use command:
svy, subpop(if q07_20==1): tab q11_04, se ci cv
Linear regression:
To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2, use command:
svy: regress y x1 x2
To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2 and two categorical
variables x3 and x4, use command:
svy: regress y x1 x2 i.x3 i.x4
Annex 2
Stata Code for Testing Changes between HFPS Waves
Panel data
This annex provides the Stata code for testing the change of an indicator between any two HFPS waves, accounting for
both the sample design and the panel overlap.
The following example is based on the first two waves of the Colombia HFPS for testing the change in the proportion of
persons 18+ who worked for at least one hour in the week before the HFPS interview.
The variable of interest is originally named q07_01 in Wave 1 and p07_01 in Wave 2. Rename it as d07_01 in both
waves, so it has the same name in both datasets.
Rename the weights variables w_ind_w1 in Wave 1 and w_ind_w2 in Wave 2 as w_ind.
In both datasets, keep variables caso_se, w_ind, estrato, ola, and the variable to be tested d07_01.
Save both data sets as new files with new names.
use HFPS_COL_W1_2020.dta, clear
rename q07_01 d07_01
rename w_ind_w1 w_ind
keep caso_se w_ind estrato ola d07_01
save HFPS_COL_W1_2020_prime.dta, replace
use HFPS_COL_W2_2020.dta, clear
rename p07_01 d07_01
rename w_ind_w2 w_ind
keep caso_se w_ind estrato ola d07_01
save HFPS_COL_W2_2020_prime.dta, replace
“Stack” the two resulting datasets, combining them into a single dataset.
use HFPS_COL_W1_2020_prime.dta, clear
set more off
append using HFPS_COL_W2_2020_prime.dta, force
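Test the change in d07_01 for the full population 18+:
* Recode 2 (did not work) to 0 so that the mean of d07_01 is the proportion who worked
replace d07_01 = 0 if d07_01 == 2
* Setting caso_se as the sampling unit lets svy account for the panel overlap between waves
svyset caso_se [pweight=w_ind]
svy: mean d07_01, over(ola)
lincom [d07_01]2-[d07_01]1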
Test the change in d07_01 among women 18+ (the sex variable q03_03 must have been kept in both datasets before stacking):
replace d07_01 = 0 if d07_01 == 2
svyset caso_se [pweight=w_ind]
svy, subpop(if q03_03==2): mean d07_01, over(ola)
lincom [d07_01]2-[d07_01]1
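An equivalent way to obtain the change estimate, sketched here as an alternative rather than as the note's own method, is a survey-weighted regression of the indicator on the wave variable: the coefficient on wave 2 equals the difference between the two weighted means, and its linearized standard error should be very close to the lincom result. This avoids depending on how over() labels its coefficients, which may differ across Stata releases. The toy stacked dataset below is hypothetical; only the variable names follow the annex.

```stata
* Hypothetical stacked panel: four respondents observed in waves 1 and 2
clear
input int caso_se byte ola byte d07_01 double w_ind
1 1 1 1000
2 1 0 1500
3 1 1 1200
4 1 0  900
1 2 0 1100
2 2 0 1500
3 2 1 1300
4 2 1  950
end

* Respondent identifier as sampling unit, so both waves of a person are linked
svyset caso_se [pweight=w_ind]

* Coefficient on 2.ola is the wave 2 minus wave 1 change in the proportion
svy: regress d07_01 i.ola
```

For a subpopulation, the same call would take the subpop() option, for example svy, subpop(if q03_03==2): regress d07_01 i.ola, provided the subpopulation variable is present in the stacked data.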