COVID-19 HIGH-FREQUENCY PHONE SURVEY (HFPS) IN LATIN AMERICA
TECHNICAL NOTE ON SAMPLING DESIGN, WEIGHTING, AND ESTIMATION *
September 2021

The COVID-19 High-Frequency Phone Survey (HFPS) 2020 was conducted in 13 Latin American countries: Argentina, Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Paraguay, and Peru. It followed a panel sample over three waves of data collection in 12 countries and over four waves in Ecuador.¹ All waves were fielded between May and August 2020, and each wave's collection period lasted about ten days on average.

The survey was administered to one adult per household. Each respondent answered both individual-level and household-level questions.

All national samples were based on a dual frame of cell and landline phones and were selected as one-stage probability samples, with geographic stratification of landline numbers. The samples were generated through a Random Digit Dialing (RDD) process covering all cell and landline telephone numbers active at the time of the sample selection. Survey estimates represent households with a landline or at least one cell phone, and individuals 18 years of age or older who have an active cell phone number or a landline at home.

1. Sampling design

The RDD methodology generates virtually all possible telephone numbers in the country under the national telephone numbering plan and then draws a random sample of numbers. This method guarantees full coverage of the population with a phone.²

First, in each country, a large first-phase sample was selected in each frame of numbers, with an allocation ranging from 0 percent landlines and 100 percent cell phones to 20 percent landlines and 80 percent cell phones (landline and cell telephone numbers are distinguished by their prefixes).
Landline numbers were given a small share of the total sample in ten of the 13 countries (Table 1) for two reasons: to cover landline-only households and individuals, who have a low prevalence in most Latin American countries; and to achieve more accurate sex and age sample distributions.³

* This note was prepared by Ramiro Flores Cruz, partner at Sistemas Integrales and World Bank consultant on survey methodology and sampling, with financial support from the Latin American and Caribbean Regional Vice Presidency.
¹ The Ecuador HFPS had a different sample design from the other HFPS countries, since it was based on respondents to the 2019 Human Mobility and Host Community Survey (EPEC, by its acronym in Spanish), which collected phone numbers in the field. For more details about EPEC's sample design, see Muñoz, Juan; Muñoz, Jose; Olivieri, Sergio. 2020. Big Data for Sampling Design: The Venezuelan Migration Crisis in Ecuador. Policy Research Working Paper; No. 9329. World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/34175
² Given that the HFPS used a sampling frame of telephone numbers, results represent the population with at least one active phone and exclude the population with no phone.
³ Survey methodology literature and experience show that cell phone survey respondents are more likely to be male and younger than landline respondents. This is due to both cell phone ownership patterns and differential response rates, with females and seniors less likely to answer a call from an unknown number. The underrepresentation of females and seniors in a 100 percent cell phone sample can be compensated for via nonresponse weighting adjustment and calibration. That said, the more unbalanced the sample, the larger the weighting adjustments needed and, hence, the larger the standard errors of the final survey estimates. Including landline telephone numbers improves the sex and age representation in the sample, so both the weighting adjustments and the standard errors of the survey estimates are smaller.

www.worldbank.org /BancoMundial @BancoMundialLAC

The landline frame in each country was geographically stratified by department, province, or state, and the sample of landlines was selected with proportionate allocation across these strata. Cell phones were geographically stratified only in Argentina, Bolivia, and Mexico.⁴ The HFPS sample design yields precise estimates at the country level only. Some subnational estimates may have large sampling errors.⁵

The first-phase samples of landline and cell phone numbers were then screened through an automated process to identify the active numbers. The active numbers were cross-checked against business registries (based on yellow page directories and websites) to identify and remove business numbers, which were not eligible for this survey.

A smaller second-phase sample⁶ was then selected from the active residential numbers identified in the first-phase sample and was delivered to each country operations team to be contacted by the interviewers.⁷

HFPS sample sizes

The HFPS was conducted in three waves (four in Ecuador). Table 1 shows the wave 1 final sample size per country and the allocation between both frames.⁸ In the first wave, when a cell phone was called, the person who answered was interviewed as long as he or she was 18 years of age or older. When a landline number was called, the interviewer asked to talk to any household member 18 years of age or older. In both cell phone and landline calls, the respondent was asked individual and household questions. Landlines are 10 to 15 percent of the sample in eight countries, 20 percent in two, and 0 percent in three (Table 1).
Wave 1 respondents were recontacted to be interviewed in the second and third waves. The questionnaires differed across waves but kept a set of core questions considered key to longitudinal analysis.

⁴ Geographic stratification of cell phones was feasible only in these three countries because only there can cell phone number prefixes be linked to the district (department, province, or state) in which they were issued.
⁵ Annex 1 shows how to compute sampling errors for different estimates using Stata.
⁶ Note that the selection of phone numbers involves two sampling phases, not two sampling stages. The HFPS involves only one sampling stage.
⁷ Furthermore, the second-phase sample was delivered in batches to the country teams during fieldwork. Delivering large lists of numbers at once could have facilitated "misuse" of the sample, since non-answering numbers could easily be replaced with new ones, raising nonresponse rates and potentially increasing nonresponse bias.
⁸ The HFPS samples are element samples (i.e., they have one sampling stage), so the design effects are about 1 and the effective sample sizes are similar to the nominal sizes. In contrast, multi-stage cluster samples typically have design effects larger than 1 and effective sample sizes smaller than the nominal sizes, which generates larger standard errors.

Table 1. Sample size and allocation to cell phones and landlines in HFPS Wave 1

Country              Sample size   Cell phones   Landlines
Argentina            1,000         85%           15%
Bolivia              1,000         100%          0%
Chile                1,000         80%           20%
Colombia             1,000         85%           15%
Costa Rica           800           90%           10%
Dominican Republic   800           85%           15%
Ecuador              1,200         85%           15%
El Salvador          800           90%           10%
Guatemala            800           90%           10%
Honduras             800           100%          0%
Mexico               2,000         80%           20%
Paraguay             800           100%          0%
Peru                 1,000         90%           10%

2. Weighting

The HFPS has two sample units: households and individuals. Sampling weights were computed for each unit and should be used according to the estimate of interest. The weighting process involves four steps:

1. Calculation of the inclusion probabilities of landline and cell phone numbers.
2. Computation of design weights for households and individuals.
3. Nonresponse weighting adjustment.
4. Calibration of individual and household weights, using external data from official sources (adjusted for the national phone coverage).

In the second and third HFPS waves, household and individual weights were further adjusted for attrition (nonresponse from Wave 1 to Wave 2 and from Wave 1 to Wave 3).

Step 1: Inclusion probabilities of landline and cell phone numbers

A first-phase sample was selected in each of the two frames (the cell phone number frame and the landline number frame) by simple random selection without replacement, from the entire frame or within geographic strata. The selected numbers were then screened and classified as active or inactive.

The first-phase inclusion probabilities of cell phone and landline numbers are⁹

$$\pi^{C}_{i(1)} = \frac{n^{C}_{(1)}}{N^{C}}, \qquad \pi^{L}_{hi(1)} = \frac{n^{L}_{h(1)}}{N^{L}_{h}}$$

where
$\pi^{C}_{i(1)}$ is the first-phase inclusion probability of the i-th cell phone number;
$n^{C}_{(1)}$ is the size of the first-phase sample of cell phones, composed of $n^{C}_{A(1)}$ active cell phones and $n^{C}_{I(1)}$ inactive cell phones;
$N^{C}$ is the cell phone frame size, the total number of all possible cell phone numbers according to the national numbering plan;
$\pi^{L}_{hi(1)}$ is the first-phase inclusion probability of the i-th landline number in stratum h;
$n^{L}_{h(1)}$ is the size of the first-phase sample of landlines in stratum h, composed of $n^{L}_{hA(1)}$ active landlines and $n^{L}_{hI(1)}$ inactive landlines; and
$N^{L}_{h}$ is the landline frame size in stratum h, the total number of all possible landline numbers in that stratum according to the national numbering plan.
Next, two second-phase samples were selected systematically from the first-phase samples of active cell phone numbers and active landline numbers. The second-phase inclusion probabilities of cell phones and landlines are

$$\pi^{C}_{i(2)|(1)} = \frac{n^{C}_{A(2)}}{n^{C}_{A(1)}}, \qquad \pi^{L}_{hi(2)|(1)} = \frac{n^{L}_{hA(2)}}{n^{L}_{hA(1)}}$$

where
$\pi^{C}_{i(2)|(1)}$ is the second-phase inclusion probability of the i-th active cell phone number, conditional on being selected in the first phase;
$n^{C}_{A(2)}$ is the size of the second-phase sample of active cell phones;
$\pi^{L}_{hi(2)|(1)}$ is the second-phase inclusion probability of the i-th active landline number in stratum h, conditional on being selected in the first phase; and
$n^{L}_{hA(2)}$ is the size of the second-phase sample of active landlines in stratum h.

⁹ Inclusion probabilities of cell phones do not show a stratum index since most cell phone samples were not stratified, for the reasons stated above. Only the cell phone samples for Argentina, Bolivia, and Mexico were stratified.

The unconditional inclusion probabilities of the second-phase active cell phones and landlines are

$$\pi^{C}_{i} = \pi^{C}_{i(1)} \times \pi^{C}_{i(2)|(1)} = \frac{n^{C}_{(1)}}{N^{C}} \times \frac{n^{C}_{A(2)}}{n^{C}_{A(1)}} = \frac{n^{C}_{A(2)}}{N^{C} \times R^{C}_{A(1)}}$$

$$\pi^{L}_{hi} = \pi^{L}_{hi(1)} \times \pi^{L}_{hi(2)|(1)} = \frac{n^{L}_{h(1)}}{N^{L}_{h}} \times \frac{n^{L}_{hA(2)}}{n^{L}_{hA(1)}} = \frac{n^{L}_{hA(2)}}{N^{L}_{h} \times R^{L}_{hA(1)}}$$

where $R^{C}_{A(1)} = n^{C}_{A(1)} / n^{C}_{(1)}$ and $R^{L}_{hA(1)} = n^{L}_{hA(1)} / n^{L}_{h(1)}$ are the rates of active phones estimated in the first phase.¹⁰

Hence, the unconditional inclusion probabilities of the second-phase active numbers, $\pi^{C}_{i}$ and $\pi^{L}_{hi}$, can be expressed as the ratio between the number of active numbers selected in the second phase and an estimate of the total number of active numbers in the frame, $N \times R_{A(1)}$.

¹⁰ $R_{A(1)}$ estimates are highly precise due to the very large size of the first-phase samples.

Step 2: Design weights for households and individuals

The selection probabilities of households and individuals aged 18 years and older are based on the inclusion probabilities of the cell phones and landlines through which they can be reached. Therefore, the computation of household and individual weights should account for multiple chances of selection and for the overlap between the cell phone and landline frames.
This multiplicity weighting removes the over-representation of households and individuals that can be reached through more telephone numbers than others, and thus removes the corresponding multiplicity sampling bias from the estimates.

Multiplicity adjustment

Multiplicity arises when a household has a larger selection probability because it can be selected through more than one sample element (telephone number). Households with more than one cell phone or more than one landline number are over-represented in sample designs like this one. As a result, their selection probabilities need to be adjusted to account for this increased chance of selection.

The multiplicity-adjusted household selection probabilities in each frame are computed as

$$\pi^{C}_{j} = \pi^{C}_{i} \times m^{C}_{j}, \qquad \pi^{L}_{hj} = \pi^{L}_{hi} \times m^{L}_{j}$$

where
$\pi^{C}_{j}$ is the selection probability of the j-th household when contacted through a cell phone, adjusted for the multiplicity of working cell phones in the household;
$m^{C}_{j}$ is the number of working cell phones in the j-th household;
$\pi^{L}_{hj}$ is the selection probability of the j-th household in stratum h when contacted through a landline, adjusted for the multiplicity of working landlines in the household; and
$m^{L}_{j}$ is the number of working landlines in the j-th household.

Therefore, if a household has $m^{C}_{j}$ cell phones, its chance of being selected through a cell phone is $m^{C}_{j}$ times that of a household with only one cell phone. The same applies to landlines, in which case the multiplicity factor is $m^{L}_{j}$. Since the number of cell phones and landlines in a household is unknown at the time of the sample design, it has to be asked during the interview.
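
The two-phase inclusion probabilities of Step 1 and the multiplicity adjustment above can be illustrated with a minimal Python sketch. The frame and sample sizes are made-up numbers chosen for easy arithmetic, and the function and variable names are hypothetical; none of them come from the HFPS data or its production code.

```python
def unconditional_inclusion_prob(frame_size, n_first, n_active_first, n_active_second):
    """Unconditional inclusion probability of an active number kept in phase 2."""
    pi_first = n_first / frame_size                            # first-phase probability
    pi_second_given_first = n_active_second / n_active_first   # conditional second phase
    return pi_first * pi_second_given_first


def household_prob_in_frame(pi_number, n_working_numbers):
    """Multiplicity-adjusted household selection probability within one frame."""
    return pi_number * n_working_numbers


# Hypothetical cell phone frame (illustrative values only, not HFPS figures).
N_C = 10_000_000   # all possible cell numbers under the numbering plan
n1_C = 100_000     # first-phase sample of numbers
nA1_C = 40_000     # numbers found active in the first phase
nA2_C = 8_000      # active numbers selected in the second phase

pi_cell = unconditional_inclusion_prob(N_C, n1_C, nA1_C, nA2_C)

# Equivalent form: second-phase sample over the estimated number of active numbers.
active_rate = nA1_C / n1_C
pi_cell_alt = nA2_C / (N_C * active_rate)

# A household with three working cell phones is three times as likely to be reached.
pi_household = household_prob_in_frame(pi_cell, 3)

print(pi_cell, pi_cell_alt, pi_household)
```

Both forms give the same probability, which is the point of the ratio interpretation in Step 1: only the second-phase sample size, the frame size, and the estimated active rate are needed.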
The probability of an individual being selected through a cell phone equals the inclusion probability of his or her cell phone number. The probability of an individual being selected through a landline, on the other hand, equals the multiplicity-adjusted selection probability of his or her household divided by the number of individuals aged 18 years and older in the household:

$$\pi^{C}_{k} = \pi^{C}_{i}, \qquad \pi^{L}_{hk} = \frac{\pi^{L}_{hj}}{e_{j}}$$

where
$\pi^{C}_{k}$ is the selection probability of the k-th individual when contacted through a cell phone;
$\pi^{L}_{hk}$ is the selection probability of the k-th individual in stratum h when contacted through a landline in the j-th household; and
$e_{j}$ is the number of eligible persons 18 years of age or older in the j-th household.

Overlapping sampling frames

Households and individuals with both cell and landline telephones (dual cases) have a higher probability of being selected than those with only cell phones or only landlines. Figure 1 displays the overlapping pattern of the cell phone and landline sampling frames.

Figure 1. Partially overlapping frames: frame A (cell phone frame) and frame B (landline frame) define three domains: cell phone only, dual, and landline only.

To adjust the selection probabilities for multiplicity and frame overlap, the relevant information has to be collected during the interview: the domain (cell phone only, landline only, or dual) to which each sample household and individual belongs, plus the number of cell phones and landlines in the sample households. For this purpose, the HFPS questionnaire included the following three questions:

1. How many working cell phones in total are owned by the persons in your household, including you?
2. Is there any working landline in your household?
3. How many working landlines are there in your household currently?

Once the domain is known, the selection probability for each sample unit can be calculated from the following probability property:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

where $P(A \cap B) = P(A) \times P(B)$, given that A and B are independent.

- In general, in a dual-frame telephone sample design:

$$\pi = \begin{cases} \pi^{C} & \text{if the sample unit is cell phone only} \\ \pi^{L} & \text{if the sample unit is landline only} \\ \pi^{C} + \pi^{L} - \pi^{C}\pi^{L} & \text{if the sample unit is dual} \end{cases}$$

where $\pi^{C}$ and $\pi^{L}$ are the selection probabilities of the sample units (households or individuals) in the cell phone only and landline only domains.

- In the specific HFPS setting (with overlapping frames and multiplicity):

Selection probabilities for households are

$$\pi_{j} = \begin{cases} \pi^{C}_{j} & \text{if the household is cell phone only} \\ \pi^{L}_{hj} & \text{if the household is landline only} \\ \pi^{C}_{j} + \pi^{L}_{hj} - \pi^{C}_{j}\pi^{L}_{hj} & \text{if the household has both cell phone and landline} \end{cases}$$

Selection probabilities for individuals are

$$\pi_{k} = \begin{cases} \pi^{C}_{k} & \text{if the individual is cell phone only} \\ \pi^{L}_{hk} & \text{if the individual is landline only} \\ \pi^{C}_{k} + \pi^{L}_{hk} - \pi^{C}_{k}\pi^{L}_{hk} & \text{if the individual has both cell phone and landline telephones} \end{cases}$$

Household and individual design weights, $w_{0j}$ and $w_{0k}$ respectively, are the inverse of the above selection probabilities:

$$w_{0j} = \frac{1}{\pi_{j}}, \qquad w_{0k} = \frac{1}{\pi_{k}}$$

Step 3: Nonresponse adjustment

When a phone number is called, it is not always possible to carry out an interview. Nonresponse occurs for a number of reasons. The most common are that nobody answers the call (no contact), the respondent is unwilling to cooperate (refusal), or language barriers exist.

Five main strategies were implemented to minimize nonresponse:

a. The survey management team sent SMS text messages to the sample cell phone numbers before calling, to announce that a survey firm would reach out and to persuade the phone holder to answer.
b. In most countries, the sample was released to the country operations teams in successive replicates, to keep nonresponse monitored and under the central management team's control.
c. Stringent calling protocols were put in place and monitored to ensure a minimum number of attempts on different days and at different times (5 to 10 attempts, depending on the country).
d. In most countries, the survey offered monetary and non-monetary incentives to those who cooperated (e.g., gift cards and phone credit).
e. In some countries, the most experienced interviewers recontacted the numbers classified as a "Refusal" to convert them into a "Complete interview".

These actions yielded response rates that were higher than those of similar RDD sample surveys. Wave 1 response rates and recontact rates in waves 2 and 3 varied across countries. Wave 1 response rates were highest in Ecuador and Bolivia and lowest in Mexico and Argentina. The survey attempted to recontact all wave 1 respondents in waves 2 and 3. Table 2 displays wave 1 response rates and recontact rates from wave 1 to wave 2 and from wave 1 to wave 3.

Table 2. HFPS 2020 response and recontact rates by country and wave

Country              Wave 1 response rate   Recontact rate W1/W2   Recontact rate W1/W3
Argentina            14.6%                  70.3%                  63.7%
Bolivia              34.4%                  62.3%                  66.1%
Chile                21.3%                  62.2%                  68.4%
Colombia             26.6%                  73.0%                  63.8%
Costa Rica           25.7%                  79.4%                  82.1%
Dominican Republic   28.4%                  83.4%                  82.7%
Ecuador              44.6%                  83.7%                  69.5%
El Salvador          28.7%                  77.7%                  75.1%
Guatemala            29.2%                  77.5%                  78.9%
Honduras             20.7%                  68.2%                  64.6%
Mexico               14.4%                  59.0%                  54.6%
Paraguay             18.4%                  68.0%                  63.9%
Peru                 28.5%                  84.1%                  82.1%

The design weights of responding households and individuals were adjusted to compensate for nonresponse and thus reduce potential nonresponse bias in the survey estimates.
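
Before turning to the adjustment itself, the Step 2 design weights that it starts from can be sketched in Python. This is a minimal illustration under stated assumptions: the two number-level inclusion probabilities are invented, and the helper names are hypothetical rather than taken from the HFPS processing code.

```python
def union_prob(p_cell, p_land):
    """P(A or B) for independent selection through the two frames."""
    return p_cell + p_land - p_cell * p_land


def household_design_weight(pi_cell, pi_land, n_cell, n_land):
    """Inverse of the household selection probability, with multiplicity."""
    p_cell = pi_cell * n_cell   # zero when the household has no cell phone
    p_land = pi_land * n_land   # zero when the household has no landline
    return 1.0 / union_prob(p_cell, p_land)


def individual_design_weight(pi_cell, pi_land, has_cell, n_land, n_adults):
    """Inverse of the individual selection probability."""
    p_cell = pi_cell if has_cell else 0.0
    p_land = pi_land * n_land / n_adults   # household probability shared among adults
    return 1.0 / union_prob(p_cell, p_land)


# Hypothetical number-level inclusion probabilities (not HFPS values).
PI_CELL = 0.002
PI_LAND = 0.001

# Dual household: two working cell phones, one landline, four adults.
w_hh_dual = household_design_weight(PI_CELL, PI_LAND, n_cell=2, n_land=1)
w_ind_dual = individual_design_weight(PI_CELL, PI_LAND, has_cell=True, n_land=1, n_adults=4)

# Cell-phone-only household with a single phone.
w_hh_cell_only = household_design_weight(PI_CELL, PI_LAND, n_cell=1, n_land=0)

print(round(w_hh_dual, 2), round(w_ind_dual, 2), round(w_hh_cell_only, 2))
```

Setting the probability of the missing frame to zero makes the union formula collapse to the single-frame cases, so one expression covers the cell phone only, landline only, and dual domains.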
A class-based nonresponse adjustment was used. Classes were formed by crossing all categories of auxiliary variables that were known to be correlated with the likelihood of responding and were available for both respondents and nonrespondents. Given that the survey used RDD sampling, the information in the sampling frame was limited. The only variables available for respondents and nonrespondents were the type of phone number (landline or cell phone) and the corresponding geographic region (known for landlines in all countries and for cell phones only in Argentina, Bolivia, and Mexico).

The weighting-class nonresponse adjustment is based on the inverse of the weighted response rate estimate in each class. This is the ratio of the sum of the design weights of all units (respondents and nonrespondents) in class c to the sum of the design weights of respondents in that class:

$$a_{jc} = \frac{\sum_{j \in R_c} w_{0j} + \sum_{j \in NR_c} w_{0j}}{\sum_{j \in R_c} w_{0j}}, \qquad a_{kc} = \frac{\sum_{k \in R_c} w_{0k} + \sum_{k \in NR_c} w_{0k}}{\sum_{k \in R_c} w_{0k}}$$

where $a_{jc}$ is the nonresponse adjustment factor that should be applied to responding households in class c, and $a_{kc}$ is the nonresponse adjustment factor for responding individuals in that class. $R_c$ and $NR_c$ indicate the responding and nonresponding units in class c, respectively.

Thus, the nonresponse-adjusted weights for responding households and individuals are

$$w_{1j} = w_{0j} \times a_{jc}, \qquad w_{1k} = w_{0k} \times a_{kc}$$

Step 4: Calibration of individual and household weights

Finally, the weights for the responding households and individuals were calibrated to reflect the total population with a phone by sex, age, and region, as available from external national official sources. This last adjustment has two objectives:

- To further reduce potential nonresponse biases that were not addressed by the nonresponse adjustment in Step 3, by using auxiliary variables from external sources. This can be achieved as long as the calibration auxiliaries are correlated with both nonresponse and the study variables.
- To improve the precision of estimators (i.e., reduce the sampling variances), as long as the auxiliaries are correlated with the study variables of interest.¹¹

Calibration works by minimizing a measure of the distance between the input weights (the nonresponse-adjusted weights in this case) and the calibrated weights, under the constraint that the calibrated weights sum to the external totals of each auxiliary variable. Unlike the nonresponse adjustment, weight calibration requires auxiliary variables only for respondents.

Among the existing calibration techniques, the HFPS applied the raking method, using the logit distance function. This method was the most suitable given that all available auxiliary variables (region, sex, and age groups) were categorical, the region variable had many categories in most countries, and the overall samples were rather small.

The final weights for responding households and individuals can then be expressed as

$$w_{j} = w_{0j} \times a_{jc} \times g_{j}, \qquad w_{k} = w_{0k} \times a_{kc} \times g_{k}$$

where
$w_{0j}$ is the design weight for the j-th household;
$a_{jc}$ is the nonresponse adjustment factor for households in class c;
$g_{j}$ is the calibration factor for the j-th household;
$w_{0k}$ is the design weight for the k-th individual;
$a_{kc}$ is the nonresponse adjustment factor for individuals in class c; and
$g_{k}$ is the calibration factor for the k-th individual.

¹¹ This objective was not addressed in this survey since it would have entailed computing a large set of replicate weights (with bootstrap or jackknife replication methods), which could be confusing for the final user.

Table 3 shows the data sources used for calibrating the weights in each country.
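
The weighting-class adjustment of Step 3 and the calibration of Step 4 can be sketched together in Python. This is a toy illustration, not the HFPS procedure: the units, weights, and population targets are invented, the names are hypothetical, and the calibration shown is classical raking (iterative proportional fitting) rather than the logit-distance variant that the HFPS applied.

```python
from collections import defaultdict


def nonresponse_factors(units):
    """Per class: design-weight total of all units over that of respondents."""
    all_w = defaultdict(float)
    resp_w = defaultdict(float)
    for u in units:
        all_w[u["cls"]] += u["w0"]
        if u["responded"]:
            resp_w[u["cls"]] += u["w0"]
    return {c: all_w[c] / resp_w[c] for c in resp_w}


def rake(weights, cats, targets, max_iter=100, tol=1e-10):
    """Classical raking: rescale weights to each margin in turn until stable."""
    w = list(weights)
    for _ in range(max_iter):
        worst = 0.0
        for var, totals in targets.items():
            sums = defaultdict(float)
            for wi, c in zip(w, cats):
                sums[c[var]] += wi
            for i, c in enumerate(cats):
                f = totals[c[var]] / sums[c[var]]
                w[i] *= f
                worst = max(worst, abs(f - 1.0))
        if worst < tol:   # every margin already matched in this pass
            break
    return w


# Toy sample: classes are the phone type; auxiliaries exist only for respondents.
units = [
    {"cls": "cell", "w0": 100.0, "responded": True, "sex": "F", "age": "18-39"},
    {"cls": "cell", "w0": 100.0, "responded": True, "sex": "M", "age": "18-39"},
    {"cls": "cell", "w0": 100.0, "responded": False},
    {"cls": "cell", "w0": 100.0, "responded": False},
    {"cls": "landline", "w0": 50.0, "responded": True, "sex": "F", "age": "40+"},
    {"cls": "landline", "w0": 50.0, "responded": True, "sex": "M", "age": "40+"},
    {"cls": "landline", "w0": 50.0, "responded": False},
]

factors = nonresponse_factors(units)
respondents = [u for u in units if u["responded"]]
w1 = [u["w0"] * factors[u["cls"]] for u in respondents]   # nonresponse-adjusted weights

# Invented population totals (already restricted to the population with a phone).
targets = {"sex": {"F": 320.0, "M": 280.0}, "age": {"18-39": 350.0, "40+": 250.0}}
w_final = rake(w1, respondents, targets)
g = [wf / wn for wf, wn in zip(w_final, w1)]   # implied calibration factors

print(factors, [round(x, 2) for x in w_final])
```

The ratio of final to nonresponse-adjusted weights plays the role of the calibration factor in the final-weight expression. The nonresponse step needs the class of every sampled unit, while raking uses the auxiliaries of respondents only.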
Population totals by sex, age, and region taken from the sources listed in Table 3 were further adjusted for telephone coverage, using the national phone coverage rates published by the International Telecommunication Union (ITU) of the United Nations.

Table 3. Data sources for the auxiliary data used for weight calibration

Country              Data source used for weight calibration
Argentina            Instituto Nacional de Estadística y Censos. Proyecciones elaboradas en base al Censo Nacional de Población, Hogares y Viviendas 2010.
Bolivia              Instituto Nacional de Estadística. Proyecciones de Población. 2020.
Chile                Instituto Nacional de Estadística. Estimaciones y Proyecciones de Población de Chile 1992-2050.
Colombia             Departamento Administrativo Nacional de Estadística. Proyecciones de Población Nacional para el Periodo 2018-2070.
Costa Rica           Centro Centroamericano de Población. Proyecciones Distritales de Población de Costa Rica 2000-2050.
Dominican Republic   Oficina Nacional de Estadística. Población Estimada y Proyectada para el Período 1950-2100.
Ecuador              World Bank. Ecuador Sociodemographic and Labor Force Survey for Population in Human Mobility-EPEC (2019).
El Salvador          Centro Centroamericano de Población. Proyecciones de Población de El Salvador. 2000-2050.
Guatemala            Instituto Nacional de Estadística. Proyecciones Nacionales 1950-2050.
Honduras             Instituto Nacional de Estadística. Proyecciones de Población 2013-2015.
Mexico               Consejo Nacional de Población. Proyecciones de la Población de México y de las Entidades Federativas, 2016-2050.
Paraguay             Dirección General de Estadística, Encuestas y Censos. Proyección de la población nacional por sexo y edad, 2000-2025. Revisión 2015.
Peru                 Instituto Nacional de Estadística e Informática. Estimaciones y Proyecciones de Población. Boletín Especial Nº 21 y 22.

3. Estimation and Sampling Errors

When analyzing the data, it is essential to compute and assess the precision of the survey estimates, i.e., the magnitude of their sampling error. Sampling errors can be expressed through sampling variances, standard errors, coefficients of variation,¹² and confidence intervals, although all of these may also include part of the non-sampling errors.

When estimating sampling errors for means, proportions, ratios, and linear and nonlinear regression parameters, the HFPS sample design features and weighting need to be accounted for. If they are not, standard statistical software will treat the sample as a simple random sample, which leads to biased estimates of the sampling variances.

The two most common approaches for estimating sampling errors for complex sample data are: (1) the Taylor Series Linearization (TSL) of the estimator and the corresponding approximation to its variance, or (2) resampling variance estimation techniques, such as balanced repeated replication (BRR), jackknife repeated replication (JRR), and the bootstrap. Stata and other statistical software packages use the TSL method as the default for estimating sampling errors.

Annex 1 provides the Stata commands that should be used to account for the HFPS sample design and weighting when computing an estimate based on cross-sectional data (i.e., based on one wave only).

As mentioned, the HFPS has a panel design, and the survey attempted to recontact all wave 1 respondents in waves 2 and 3. Because the same sample units are observed in successive waves, a panel survey yields more precise estimates of the change in an indicator between waves. Sequential cross-sectional surveys, where each wave's sample includes different households and individuals, can also track changes over time. In that case, however, change estimates are less precise (i.e., have a larger sampling error) than with a panel survey.
Thus, the HFPS panel should be able to determine more precisely whether a decrease or increase in a given indicator over time is statistically significant. Ideally, it should be able to detect small changes between two waves.

Under these conditions, the design-based variance of the change estimate for the indicator of interest is given by

$$Var(\hat{\Delta}) = Var(\hat{\theta}_{1}) + Var(\hat{\theta}_{2}) - 2\,\rho\,\sqrt{Var(\hat{\theta}_{1})\,Var(\hat{\theta}_{2})}$$

where
$\hat{\theta}_{1}$ is the cross-section estimate of the indicator of interest in wave 1;
$\hat{\theta}_{2}$ is the cross-section estimate of the indicator of interest in wave 2;
$\hat{\Delta} = \hat{\theta}_{2} - \hat{\theta}_{1}$ is the estimate of the net change of $\theta$ between waves 1 and 2; and
$\rho$ is the correlation between the two wave indicators.

The above expression shows how the sampling variance of the change estimate (for indicator $\theta$) is reduced: the precision of the change estimate increases because of the correlation between $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$. Since respondents in a panel are the same in waves 1 and 2, the correlation between $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$ is expected to be non-zero and, typically, positive. The larger the correlation, the more precise the change estimate.¹³

Annex 2 includes the Stata code for testing the change of an indicator between any two HFPS waves, accounting for both the sample design features and the panel overlap. The test output shows the point estimate of the change, plus the corresponding standard error, t-score, p-value, and 95% confidence interval.

¹² The standard error is the square root of the sampling variance. The coefficient of variation is a relative measure of the standard error and is calculated as the ratio between the standard error and the point estimate (it is usually expressed in percentage terms). As a rule of thumb, estimates with coefficients of variation of 1 percent or lower are considered to have a very high level of precision. Coefficients of variation between 1 and 3 percent are generally classified as very good, from 3 to 5 percent as good, from 5 to 10 percent as acceptable, and from 10 to 15 percent as large. Above 15 percent is classified as too large, and the corresponding estimate is considered unreliable.
¹³ The magnitude of $\rho$ depends on each particular indicator of interest $\theta$.

Reference literature

Heeringa, S., West, B., and P. Berglund. (2017). Applied Survey Data Analysis (Second Edition). New York, Taylor & Francis Group.
Lohr, S. and J. Rao. (2006). Estimation in Multiple-Frame Surveys, Journal of the American Statistical Association, 101, 1019-1030.
Lohr, S. (2011). Alternative Survey Sample Designs: Sampling with Multiple Overlapping Frames, Survey Methodology, 37, 197-213. Statistics Canada.
Skinner, C. and J. Rao. (1996). Estimation in Dual-Frame Surveys with Complex Designs, Journal of the American Statistical Association, 91, 349-356.
Thompson, S. (2012). Chapter 15: Network Sampling and Link-Tracing Designs, in Sampling. New York, Wiley.
Valliant, R., Dever, J., and F. Kreuter. (2016). Practical Tools for Designing and Weighting Sample Surveys. New York, Springer.

Annex 1
Stata Code for Weighted Estimates and Sampling Error Computation
Cross-sectional data

This annex provides a set of examples of the Stata syntax for computing estimates and their corresponding sampling errors (measured by standard errors, confidence intervals, and coefficients of variation), accounting for the HFPS sample design and weighting.
For more details, data users are referred to the online Stata manual for the svy command (http://www.stata.com/manuals15/svy.pdf).

To specify the sample design features in any of the HFPS datasets, use command:

    svyset [pweight=w_hh_w1]
    * Use weight w_hh_w1 for household-level estimates in wave 1 (w_hh_w2 for wave 2 and w_hh_w3 for wave 3)
    * Use weight w_ind_w1 for individual-level estimates in wave 1 (w_ind_w2 for wave 2 and w_ind_w3 for wave 3)

Numeric variables (means):

To estimate the mean age of the population 18+, use command:

    svy: mean q03_07
    estat cv

To estimate the mean age of the population 18+ by gender, use command:

    svy: mean q03_07, over(q03_03)
    estat cv

To estimate the mean age of the population 18+ who did not work in the week prior to the interview, use command:

    svy, subpop(if q07_01==2): mean q03_07
    estat cv

Categorical variables (proportions):

To estimate the frequency distribution of persons 18+ according to their level of concern that a family member could fall seriously ill because of COVID-19, use command:

    svy: tab q10_01, se ci cv

To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview by status in employment, use command:

    svy: tab q07_01 q07_05, col se ci cv

To estimate the frequency distribution of households on whether they received money from the government or NGOs, among households where a member lost his or her job since the beginning of the quarantine, use command:

    svy, subpop(if q07_20==1): tab q11_04, se ci cv

Linear regression:

To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2, use command:

    svy: regress y x1 x2

To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2 and two categorical variables x3 and x4, use command:

    svy: regress y x1 x2 i.x3 i.x4

Annex 2
Stata Code for Testing Changes between HFPS Waves
Panel data

This annex provides the Stata code for testing the change of an indicator between any two HFPS waves, accounting for both the sample design and the panel overlap.

The following example is based on the first two waves of the Colombia HFPS and tests the change in the proportion of persons 18+ who worked for at least one hour in the week before the HFPS interview.

The variable of interest is originally named q07_01 in Wave 1 and p07_01 in Wave 2. Rename it as d07_01 in both waves, so that it has the same name in both datasets. Rename the weight variables w_ind_w1 in Wave 1 and w_ind_w2 in Wave 2 as w_ind. In both datasets, keep variables caso_se, w_ind, estrato, ola, and the variable to be tested, d07_01. To run the test among women shown at the end of this annex, also keep the sex variable q03_03 in both datasets (if it has a different name in Wave 2, rename it to q03_03 first). Save both datasets as new files with new names.

    use HFPS_COL_W1_2020.dta, clear
    rename q07_01 d07_01
    rename w_ind_w1 w_ind
    * Add q03_03 to the keep list if a subpopulation test by sex will be run
    keep caso_se w_ind estrato ola d07_01
    save HFPS_COL_W1_2020_prime.dta, replace

    use HFPS_COL_W2_2020.dta, clear
    rename p07_01 d07_01
    rename w_ind_w2 w_ind
    * Add the sex variable (named q03_03) to the keep list here as well if needed
    keep caso_se w_ind estrato ola d07_01
    save HFPS_COL_W2_2020_prime.dta, replace

"Stack" the two resulting datasets, combining them into a single dataset.
    use HFPS_COL_W1_2020_prime.dta, clear
    set more off
    append using HFPS_COL_W2_2020_prime.dta, force

Test the change in d07_01 for the full population 18+:

    * Recode "No" (2) to 0 so that the mean of d07_01 is the proportion who worked
    replace d07_01 = 0 if d07_01 == 2
    * Declaring the respondent identifier caso_se as the sampling unit accounts for the panel overlap
    svyset caso_se [pweight=w_ind]
    svy: mean d07_01, over(ola)
    * In Stata 16 or later, the same test is: lincom _b[c.d07_01@2.ola] - _b[c.d07_01@1.ola]
    lincom [d07_01]2-[d07_01]1

Test the change in d07_01 among women 18+ (requires q03_03 in the stacked dataset):

    * This recode has no further effect if it was already run above
    replace d07_01 = 0 if d07_01 == 2
    svyset caso_se [pweight=w_ind]
    svy, subpop(if q03_03==2): mean d07_01, over(ola)
    lincom [d07_01]2-[d07_01]1
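
The precision gain that this test captures through the panel overlap can be illustrated with a short Python sketch of the variance expression in Section 3. The standard errors and the correlation below are made-up values, and the function name is hypothetical; the sketch only evaluates the formula and does not replace the design-based test above.

```python
import math


def change_standard_error(se1, se2, rho):
    """Standard error of (wave 2 estimate - wave 1 estimate) given their correlation."""
    variance = se1 ** 2 + se2 ** 2 - 2.0 * rho * se1 * se2
    return math.sqrt(max(variance, 0.0))   # guard against tiny negative rounding


# Hypothetical standard errors of a proportion in two waves (2 percentage points each).
se_wave1 = 0.02
se_wave2 = 0.02

se_independent = change_standard_error(se_wave1, se_wave2, rho=0.0)   # separate samples
se_panel = change_standard_error(se_wave1, se_wave2, rho=0.5)         # overlapping panel

print(round(se_independent, 4), round(se_panel, 4))
```

With these illustrative inputs, a correlation of 0.5 brings the standard error of the change back down to that of a single wave, so a smaller change between waves can be detected than with two independent cross-sections of the same size.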