HIGH-FREQUENCY PHONE SURVEY (HFPS) - PHASE 2
SAMPLING DESIGN, WEIGHTING, AND ESTIMATION*
December 2022

After implementing Phase 1 of the High-Frequency Phone Survey (HFPS) project in Latin America and the Caribbean in 2020, the World Bank conducted Phase 2 in 2021 to continue assessing the socio-economic impacts of the COVID-19 pandemic on households. This new phase, conducted in partnership with the UNDP LAC Chief Economist office, included two waves. Wave 1 covered 24 countries[1] and Wave 2 covered 22 countries. Of these countries, 13 participated in Phase 1 and the rest joined in Phase 2.

The 13 countries from Phase 1 are Argentina, Bolivia, Colombia, Costa Rica, Chile, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Paraguay and Peru. In these countries, Phase 2 Wave 1 tried to recontact households and individuals who had responded in Phase 1 Wave 1 in 2020 and added a fresh supplement sample to compensate for attrition. Phase 2 Wave 2 tried to recontact all respondents to Phase 2 Wave 1 and also incorporated a supplement sample.

The countries that joined in Phase 2 (in addition to those of Phase 1) are Antigua & Barbuda, Belize, Brazil, Dominica, Guyana, Haiti, Jamaica, Nicaragua, Panama, Saint Lucia and Uruguay.[2]

This document describes the sampling design, the weighting and the appropriate procedure for estimating indicators from the LAC HFPS Phase 2 surveys. For the sake of clarity, countries that participated in Phase 1 in 2020 are called "Original Countries". The countries added in Phase 2 in 2021 are called "Added Countries".

As in the 2020 phase, Phase 2 estimates represent households with a landline or at least one cell phone, and individuals 18 years of age or above with an active cell phone number or a landline at home.

1. HFPS PHASE 2 SAMPLING DESIGN

Phase 2 Wave 1 samples for the Original Countries included two components: a) a panel formed by respondents to Phase 1 Wave 1, and b) a fresh supplement sample of phone numbers to compensate for attrition between Phase 1 Wave 1 and Phase 2 Wave 1, and to slightly increase the overall sample size.[3] The samples of the Added Countries were entirely new, as described below.

Phase 2 Wave 2 samples included two components in all countries: a) a panel formed by respondents to Phase 2 Wave 1, and b) a supplement sample of phone numbers to compensate for attrition between Phase 2 Wave 1 and Phase 2 Wave 2.

Phase 2 Waves 1 and 2 aimed to achieve between 800 and 3,000 completed interviews per country. Table 1 displays the number of complete interviews and the response rate by country.

* This document was prepared by Ramiro Flores Cruz, partner at Sistemas Integrales and World Bank consultant on survey methodology and sampling, with financial support from the Latin American and Caribbean Regional Vice Presidency.
[1] While the LAC HFPS Phase 2 project originally covered 23 countries, Brazil was added later. The Brazil survey was inspired by the broader regional project but followed country-specific characteristics. See World Bank and UNDP (2022) here: https://microdata.worldbank.org/index.php/catalog/4533 . This note describes the methodology applicable to the 23 countries.
[2] Antigua and Barbuda and Brazil were not included in Phase 2 Wave 2.
[3] Ecuador, which is one of the Original Countries, is an exception since it used a fully fresh sample of phone numbers.

Table 1. HFPS Phase 2, Waves 1 & 2. Complete interviews and response rate by country.
| Country | Type of country | Complete interviews in Phase 2 Wave 1 | Response rate in Phase 2 Wave 1 | Complete interviews in Phase 2 Wave 2 | Response rate in Phase 2 Wave 2 |
|---|---|---|---|---|---|
| Argentina | Original | 1,216 | 13.0% | 1,321 | 13.5% |
| Bolivia | Original | 1,272 | 35.9% | 1,312 | 35.0% |
| Chile | Original | 1,212 | 21.1% | 1,329 | 14.2% |
| Colombia | Original | 1,221 | 33.9% | 1,688 | 36.7% |
| Costa Rica | Original | 805 | 16.0% | 905 | 15.2% |
| Dominican Republic | Original | 1,205 | 26.6% | 1,364 | 32.6% |
| Ecuador (a) | Original | 951 | 18.9% | 1,032 | 29.2% |
| El Salvador | Original | 818 | 21.9% | 812 | 26.0% |
| Guatemala | Original | 1,207 | 23.9% | 1,521 | 24.6% |
| Honduras | Original | 1,021 | 23.3% | 1,004 | 28.9% |
| Mexico | Original | 2,625 | 9.6% | 2,511 | 10.5% |
| Paraguay | Original | 1,076 | 21.3% | 1,061 | 34.0% |
| Peru | Original | 1,212 | 21.8% | 1,724 | 26.8% |
| Antigua & Barbuda (b) | Added | 790 | 37.1% | | |
| Belize | Added | 816 | 40.6% | 898 | 49.2% |
| Dominica | Added | 861 | 37.6% | 879 | 47.2% |
| Guyana | Added | 785 | 38.7% | 875 | 46.1% |
| Haiti | Added | 2,814 | 36.8% | 2,361 | 43.9% |
| Jamaica | Added | 829 | 27.8% | 871 | 38.3% |
| Nicaragua | Added | 833 | 33.1% | 865 | 36.9% |
| Panama | Added | 815 | 23.3% | 1,335 | 36.4% |
| Saint Lucia | Added | 835 | 40.0% | 860 | 43.6% |
| Uruguay | Added | 816 | 21.8% | 930 | 26.2% |

(a) Ecuador was part of HFPS Phase 1 in 2020, so it is considered an Original Country in this table. However, its Phase 2 Wave 1 sample was entirely new and included no panel, so its sample weighting followed the same procedures as for the Added Countries.
(b) Antigua and Barbuda was covered in Phase 2 Wave 1 only.

The fresh samples in the Original and Added Countries have the same Random Digit Dialing (RDD) dual-frame design as the Phase 1 samples and were all selected from the same RDD sampling frames of phone numbers used in Phase 1.[4]

[4] For a full description of the Phase 1 sampling design see HFPS Phase 1 Technical Note on Sampling, Weighting and Estimation, available at https://documents1.worldbank.org/curated/en/336371631859678760/pdf/COVID-19-High-Frequency-Phone-Surveys-in-Latin-America-Technical-Note-on-Sampling-Design-Weighting-and-Estimation.pdf

The RDD methodology generates virtually all possible telephone numbers in the country under the national telephone numbering plan and draws a random sample of numbers. This method guarantees full coverage of telephone numbers, eliminating coverage bias with respect to the population with a phone.[5]

First, in each country, a large first-instance sample of numbers was selected in each frame (mobiles and landlines), with an allocation ranging from 0 percent landlines and 100 percent cell phones to 20 percent landlines and 80 percent cell phones (landline and cell telephone numbers are distinguished by their prefixes). In nearly all countries, landline numbers were included as a small share of the total sample, both to cover landline-only households and persons, which have a low prevalence in most Latin American countries, and to achieve more accurate sex and age sample distributions.[6]

The landline frame in each country was geographically grouped by department, province, or state, and the sample of landlines was selected with proportionate allocation across these strata. Geographic stratification of cell phones was only possible in Argentina, Bolivia and Mexico, since these are the only countries where mobile number prefixes are linked to districts (provinces, departments or states[7]).

The first-instance samples of landline and cell phone numbers were then screened through an automated process to identify the active numbers. The active numbers were cross-checked with business registries (based on yellow page directories and websites) to identify and remove business numbers not eligible for this survey.
A smaller second-instance sample was then selected from the active residential numbers identified in the first-instance sample and was delivered to the country operations team to be contacted and interviewed.

2. PHASE 2 WAVE 1

2.1. PHASE 2 WAVE 1 WEIGHTING PROCEDURES FOR THE HFPS ORIGINAL COUNTRIES

HFPS Phase 2 Wave 1 has three units of analysis: households, adult individuals (18 years of age and older) and children 6 through 17 years of age. Weights were computed for each sample unit and should be used according to the estimate of interest.

[5] Given that the HFPS used a sampling frame of telephone numbers, results strictly refer to the population with a phone and exclude the population with no phone.
[6] Survey methodology literature and experience show that cell phone survey respondents are more likely to be male and younger than landline respondents, due to both cell phone ownership patterns and differential response rates, with females and seniors less likely to answer a call from an unknown number. Even though the underrepresentation of females and seniors in a 100 percent cell phone sample can be compensated via nonresponse weighting adjustment and calibration, the more unbalanced the sample, the larger the weighting adjustments needed and, hence, the larger the standard errors of the final survey estimates. Including landline numbers improves the sex and age representation in the sample, so the weighting adjustments are smaller and so are the standard errors of the survey estimates.
[7] Note that a cell phone prefix does not fully guarantee the location of the individual, since people can move from one region to another and keep their original number. Information on respondents' actual location was collected in the survey.

The weighting process for the Original Countries included five steps:
1. Calculation of the inclusion probabilities of landline and cell phone numbers.
2. Computation of design weights for households and individuals.
3. Attrition nonresponse weighting adjustment for the panel sample component, and first-time nonresponse adjustment for the supplement sample component.
4. Calibration of household, adult and child/adolescent weights, using external ancillary data from official sources (adjusted by national phone coverage).
5. Weight trimming and recalibration.

Step 1: Inclusion probabilities of landline and cell phone numbers

Inclusion probabilities of landline and cell phone numbers depend on the implemented sampling design and its features: stratification, sample size per stratum, frame size per stratum and the selection method applied in each stratum. The HFPS sample design does not include any clustering.

The size of the Phase 2 Wave 1 overall (cell phones and landlines) selected sample of phone numbers (i.e., before any fieldwork activities) in each of the Original Countries equals the size of the Phase 1 Wave 1 overall selected sample of phone numbers plus the size of the Phase 2 Wave 1 overall fresh supplement sample of phone numbers:

$$n_{\text{ph2w1}} = n_{\text{ph1w1}} + n_{\text{ph2w1}}^{\text{sup}}$$

where
$n_{\text{ph2w1}}$ is the size of the Phase 2 Wave 1 overall selected sample of phone numbers;
$n_{\text{ph1w1}}$ is the size of the Phase 1 Wave 1 overall selected sample of phone numbers; and
$n_{\text{ph2w1}}^{\text{sup}}$ is the size of the Phase 2 Wave 1 overall selected fresh supplement sample of numbers.

The calculation of the inclusion probabilities for the Phase 2 Wave 1 sample follows procedures analogous to the ones used in Phase 1 for a dual-frame phone sample. A first-instance[8] sample was selected in each of the two frames (mobile number and landline number frames) by simple random selection without replacement. This selection was made from the entire frame for mobile numbers and within geographic strata for landlines.
The selected numbers were then screened and classified into active and inactive.

[8] We use the term "instance" in place of "sampling phase" to avoid confusing "sampling phase" with "HFPS phase". A sampling phase occurs when a subsample of elements is selected out of a larger sample of the same type of elements.

The first-instance inclusion probabilities of cell phone and landline numbers are[9]

$$\pi_{1i}^{\text{cell}} = \frac{n_{1}^{\text{cell}}}{N^{\text{cell}}}, \qquad \pi_{1hi}^{\text{land}} = \frac{n_{1h}^{\text{land}}}{N_{h}^{\text{land}}}$$

where
$\pi_{1i}^{\text{cell}}$ is the Phase 2 Wave 1 first-instance inclusion probability of the i-th cell phone number;
$n_{1}^{\text{cell}}$ is the size of the Phase 2 Wave 1 first-instance sample of cell phones, composed of $n_{1,a}^{\text{cell}}$ active cell phones and $n_{1,in}^{\text{cell}}$ inactive cell phones;
$N^{\text{cell}}$ is the cell phone frame size, the total number of all possible cell phone numbers according to the national numbering plan;
$\pi_{1hi}^{\text{land}}$ is the Phase 2 Wave 1 first-instance inclusion probability of the i-th landline number in stratum h;
$n_{1h}^{\text{land}}$ is the size of the Phase 2 Wave 1 first-instance sample of landlines in stratum h, composed of $n_{1h,a}^{\text{land}}$ active landlines and $n_{1h,in}^{\text{land}}$ inactive landlines; and
$N_{h}^{\text{land}}$ is the landline frame size in stratum h, the total number of all possible landline numbers according to the national numbering plan.

[9] Inclusion probabilities of cell phones do not show a stratum index since most cell phone samples were not stratified for the reasons stated above. Only the cell phone samples for Argentina, Bolivia, and Mexico were stratified.

Next, two second-instance samples were selected systematically out of the first-instance samples of active cell and active landline telephone numbers. The second-instance inclusion probabilities of cell phones and landlines, conditional on being selected in the first instance, are
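$$\pi_{2i|1}^{\text{cell}} = \frac{n_{2}^{\text{cell}}}{n_{1,a}^{\text{cell}}}, \qquad \pi_{2hi|1}^{\text{land}} = \frac{n_{2h}^{\text{land}}}{n_{1h,a}^{\text{land}}}$$

where
$\pi_{2i|1}^{\text{cell}}$ is the second-instance inclusion probability of the i-th active cell phone number conditional on being selected in the first instance;
$n_{2}^{\text{cell}}$ is the size of the second-instance sample of active cell phones, drawn from the $n_{1,a}^{\text{cell}}$ active cell phones found in the first instance;
$\pi_{2hi|1}^{\text{land}}$ is the second-instance inclusion probability of the i-th active landline number in stratum h conditional on being selected in the first instance; and
$n_{2h}^{\text{land}}$ is the size of the second-instance sample of active landlines in stratum h, drawn from the $n_{1h,a}^{\text{land}}$ active landlines found in the first instance.

The final inclusion probabilities of cell phones and landlines are the product of their first-instance inclusion probabilities and the conditional second-instance inclusion probabilities:

$$\pi_{i}^{\text{cell}} = \pi_{1i}^{\text{cell}} \, \pi_{2i|1}^{\text{cell}} = \frac{n_{1}^{\text{cell}}}{N^{\text{cell}}} \cdot \frac{n_{2}^{\text{cell}}}{n_{1,a}^{\text{cell}}} = \frac{n_{2}^{\text{cell}}}{N^{\text{cell}} \, \hat{r}^{\text{cell}}} = \frac{n_{2}^{\text{cell}}}{\hat{N}_{a}^{\text{cell}}}$$

$$\pi_{hi}^{\text{land}} = \pi_{1hi}^{\text{land}} \, \pi_{2hi|1}^{\text{land}} = \frac{n_{1h}^{\text{land}}}{N_{h}^{\text{land}}} \cdot \frac{n_{2h}^{\text{land}}}{n_{1h,a}^{\text{land}}} = \frac{n_{2h}^{\text{land}}}{N_{h}^{\text{land}} \, \hat{r}_{h}^{\text{land}}} = \frac{n_{2h}^{\text{land}}}{\hat{N}_{h,a}^{\text{land}}}$$

where $\hat{r}$ denotes the rate of active phones estimated in the first instance (the number of active numbers divided by the first-instance sample size, for cell phones overall and for landlines within stratum h).[10] Hence, the unconditional inclusion probabilities of the second-instance active numbers, $\pi_{i}^{\text{cell}}$ and $\pi_{hi}^{\text{land}}$, can be expressed as the ratio between the number of active numbers selected in the second instance and an estimate of the total number of active numbers in the frame, $\hat{N}_{a} = N \hat{r}$.

[10] The $\hat{r}$ estimates are highly precise due to the very large size of the first-instance samples.

Step 2: Design weights for households, adults and children & adolescents

The selection probabilities of households and adult individuals are based on the inclusion probabilities of the cell phones and landlines through which they can be reached. Therefore, the computation of household and individual selection probabilities should account for multiple chances of selection and for the overlap between the cell phone and landline frames. A multiplicity adjustment was therefore made so that sample households and individuals that could be reached through more telephone numbers than others were not over-represented, removing this source of multiplicity sampling bias.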
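As a recap of the quantity that the household and individual selection probabilities build on, the sketch below computes the Step 1 two-instance inclusion probability of an active phone number and checks that it equals the second-instance sample size divided by the estimated number of active numbers in the frame. The function name and all counts are hypothetical and chosen only for illustration; they are not HFPS figures, and the multiplicity adjustment itself is not shown.

```python
def final_inclusion_probability(frame_size, n_first, n_first_active, n_second):
    """Unconditional inclusion probability of an active number.

    frame_size      -- all possible numbers under the numbering plan (N)
    n_first         -- first-instance sample size (active + inactive)
    n_first_active  -- active numbers found in the first-instance sample
    n_second        -- second-instance sample drawn from the active numbers
    """
    if not (0 < n_second <= n_first_active <= n_first <= frame_size):
        raise ValueError("need 0 < n_second <= n_first_active <= n_first <= frame_size")
    pi_first = n_first / frame_size                 # simple random sampling without replacement
    pi_second_given_first = n_second / n_first_active  # selection among active numbers
    return pi_first * pi_second_given_first


# Hypothetical counts, for illustration only.
frame_size = 10_000_000
n_first = 200_000
n_first_active = 50_000
n_second = 10_000

pi = final_inclusion_probability(frame_size, n_first, n_first_active, n_second)

# Equivalent form: second-instance size over the estimated total of active numbers.
active_rate = n_first_active / n_first          # estimated rate of active phones
estimated_active_total = frame_size * active_rate
pi_as_ratio = n_second / estimated_active_total

base_weight = 1.0 / pi   # weight of a phone number before any multiplicity adjustment
```

Both forms give the same probability, which is why the very large first-instance sample matters: it determines how precisely the active rate, and hence the denominator, is estimated.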
To support the multiplicity adjustment, the survey collected information about the number of cell phones and landlines in the respondent households through the following questions:[11]

1. How many working cell phones in total are owned by the persons in your household, including you?
2. Is there any working landline in your household?
3. How many working landlines are there in your household currently?

Finally, household and individual design weights, $w_{0j}$ and $w_{0k}$ respectively, were calculated as the inverse of the multiplicity-adjusted selection probabilities.[12]

[11] In the Caribbean countries, where a large share of persons have more than one active cell phone number, the questionnaire also asked the respondent about the number of cell phone numbers owned individually. These data were used to adjust individual weights for multiplicity.
[12] For more details on the computation of design weights see HFPS Phase 1 Technical Note on Sampling, Weighting and Estimation.

Weighting of data on children & adolescents

HFPS Phase 2 Waves 1 and 2 collected specific data about a randomly selected child or adolescent 6 through 17 years of age in each interviewed household. Wave 2 also collected information for a randomly selected child between 0 and 5 years of age. To implement this, the questionnaire first collected a roster of all children and adolescents living in each respondent household and selected one at random.

The child/adolescent weight is based on his/her probability of selection within the household, conditional on his/her household being selected in the sample. Hence

$$\pi_{n}^{\text{cell}} = \frac{\pi_{j}^{\text{cell}}}{C_{j}}, \qquad \pi_{hn}^{\text{land}} = \frac{\pi_{hj}^{\text{land}}}{C_{j}}$$

where
$\pi_{n}^{\text{cell}}$ is the selection probability of the n-th child/adolescent in the j-th household when the household is contacted through a cell phone;
$\pi_{j}^{\text{cell}}$ is the selection probability of the j-th household when contacted through a cell phone, adjusted for multiplicity of working cell phones in the household;
$C_{j}$ is the number of eligible children and adolescents (6-17 years old) in the j-th household;
$\pi_{hn}^{\text{land}}$ is the selection probability of the n-th child/adolescent of the j-th household in stratum h when the household is contacted through a landline; and
$\pi_{hj}^{\text{land}}$ is the selection probability of the j-th household when contacted through a landline, adjusted for multiplicity of working landlines in the household.

The children and adolescents' design weight $w_{0n}$ is the inverse of the above selection probabilities,

$$w_{0n} = \frac{1}{\pi_{n}^{\text{cell}}} \quad \text{or} \quad w_{0n} = \frac{1}{\pi_{hn}^{\text{land}}},$$

depending on whether the sample household was reached via a cell phone or a landline.

Step 3: Nonresponse weighting adjustment

When a phone number is called, it is not always possible to carry out an interview. The most common reasons for nonresponse are that nobody answers the call (no contact), that the respondent is unwilling to cooperate (refusal), or that language barriers exist.

As in Phase 1, five main strategies were implemented during fieldwork to minimize nonresponse:

a. An SMS text message was sent to the sample cell phone numbers a couple of days before calling, to announce that a survey firm would reach out and to encourage the phone holder to answer.
b. In most countries, the sample was released to the data collection teams in successive replicates to keep nonresponse closely monitored.
c. Stringent calling protocols were put in place and monitored to ensure a minimum number of attempts on different days and at different times (5 to 10 attempts depending on the country).
d. The survey offered monetary and non-monetary incentives in most countries to those who cooperated (e.g., gift cards and phone credit with an equivalent monetary value between US$3 and US$11). Incentives were somewhat higher for panel units.
e. In some countries, the most experienced interviewers recontacted the numbers classified as a "Refusal" to convert them into a "Complete interview".

As described earlier, Phase 2 Wave 1 samples of the Original Countries had two components: a) a panel of respondents to Phase 1 Wave 1, and b) a fresh supplement sample of phone numbers. These two sample components were subject to different response mechanisms. The panel component was affected by first-time nonresponse in Phase 1 Wave 1 plus attrition nonresponse in Phase 2 Wave 1. In contrast, the supplement sample incorporated in Phase 2 Wave 1 was only subject to first-time nonresponse. These two response mechanisms were accounted for in the nonresponse weighting adjustments described below.

A. Nonresponse adjustment of the panel sample component

A1. First-time nonresponse adjustment

The Phase 2 Wave 1 panel component was initially affected by first-time nonresponse in Phase 1 Wave 1. Therefore, the design weights of responding panel households and individuals were adjusted to compensate for the nonresponse that occurred in Phase 1 Wave 1 and so reduce potential nonresponse bias.

A class-based nonresponse adjustment was used. Classes were formed by crossing all categories of auxiliary variables that were known to be correlated with the likelihood of responding and were available for both respondents and nonrespondents.
Given that the survey used RDD sampling, the information in the sampling frame was limited, and the only variables available for both respondents and nonrespondents were the type of phone number (landline or cell phone) and the corresponding geographic region (known for landlines in all countries and for cell phones only in Argentina, Bolivia and Mexico).

The weighting class nonresponse adjustment was based on the inverse of the weighted response rate estimate in each class, which is the ratio of the sum of the design weights of all units (respondents and nonrespondents) in class c to the sum of the design weights of respondents in that class:

$$f_{c}^{\text{hh}} = \frac{\sum_{j \in R_c} w_{0j} + \sum_{j \in NR_c} w_{0j}}{\sum_{j \in R_c} w_{0j}}; \qquad f_{c}^{\text{ind}} = \frac{\sum_{k \in R_c} w_{0k} + \sum_{k \in NR_c} w_{0k}}{\sum_{k \in R_c} w_{0k}}$$

where $f_{c}^{\text{hh}}$ is the nonresponse adjustment factor that should be applied to responding households in class c, and $f_{c}^{\text{ind}}$ is the nonresponse adjustment factor for responding individuals in that class. $R_c$ and $NR_c$ indicate the responding and nonresponding units in class c, respectively.

Thus, the nonresponse adjusted weights for responding households and individuals are

$$w_{1j} = w_{0j} \, f_{c}^{\text{hh}}, \qquad w_{1k} = w_{0k} \, f_{c}^{\text{ind}}$$

A2. Attrition nonresponse adjustment

In Phase 2 Wave 1, the panel sample component was subject to attrition nonresponse between Phase 1 Wave 1 and Phase 2 Wave 1. Table 2 shows the number of respondents to Phase 1 Wave 1, how many of these responded to Phase 2 Wave 1, and the resulting recontact and attrition rates.[13]

Table 2. Phase 2 Wave 1 - Original Countries. Complete interviews and attrition rate by country based on Phase 1 Wave 1 outcomes.
| Country | Type of country | (1) Complete interviews in Phase 1 Wave 1 | (2) Complete interviews in the panel | (3) Recontact rate ph1w1/ph2w1 | (4) Attrition rate ph1w1/ph2w1 |
|---|---|---|---|---|---|
| Argentina | Original | 1,001 | 197 | 19.7% | 80.3% |
| Bolivia | Original | 1,075 | 363 | 33.8% | 66.2% |
| Chile | Original | 1,000 | 262 | 26.2% | 73.8% |
| Colombia | Original | 1,000 | 268 | 26.8% | 73.2% |
| Costa Rica | Original | 801 | 143 | 17.9% | 82.1% |
| Dominican Republic | Original | 748 | 113 | 15.1% | 84.9% |
| Ecuador (a) | Original | 1,227 | | | |
| El Salvador | Original | 804 | 129 | 16.0% | 84.0% |
| Guatemala | Original | 806 | 135 | 16.7% | 83.3% |
| Honduras | Original | 807 | 141 | 17.5% | 82.5% |
| Mexico | Original | 2,109 | 251 | 11.9% | 88.1% |
| Paraguay | Original | 715 | 57 | 8.0% | 92.0% |
| Peru | Original | 1,000 | 236 | 23.6% | 76.4% |

(a) Ecuador was part of HFPS Phase 1 in 2020, so it is considered an Original Country in this table. However, its Phase 2 Wave 1 sample was entirely new and included no panel, so its sample weighting followed the same procedures as for the Added Countries.

[13] Since in Phase 2 Wave 1 the project had no access to the Phase 1 Wave 1 respondent names, it was not possible to confirm that respondents to Phase 2 Wave 1 were the same persons who had been interviewed in Phase 1 Wave 1. Therefore, a matching algorithm was used to identify which respondents were highly likely to be the same as in Phase 1 Wave 1. A household was considered to be the same as in Phase 1 Wave 1 if it had the same household ID, the household head's sex was the same, the household head's age differed by two years or less, and at least one of the following variables had the same value: geographic region, urban/rural area, number of adult members, number of children, household head's educational attainment and whether the household had a landline phone. Only households meeting these conditions were labeled as panel respondents, which partly explains the large attrition rates between Phase 1 Wave 1 and Phase 2 Wave 1. In contrast, in Phase 2 Wave 2 the survey had access to the Phase 2 Wave 1 respondent names, so no matching algorithm was needed and attrition rates between Phase 2 Wave 1 and Phase 2 Wave 2 were significantly lower (see Section 3).

Attrition nonresponse weighting adjustment involved the following steps.

1. Identify a set of variables collected in Phase 1 Wave 1 as potential predictors of response in Phase 2 Wave 1.
2. Examine any missing data patterns in those preselected variables.
3. Impute any item missing values in the potential predictors identified in Phase 1 Wave 1 using a sequential regression imputation procedure.
4. Run two alternative techniques and choose the more efficient one, i.e., the one that yields smaller standard errors for the key survey estimates:
   4a. Run a random forest algorithm over the potential predictors identified in Phase 1 Wave 1 to determine the best predictors of Phase 2 Wave 1 response, and form a set of response cells defined by the interactions of the identified response predictors.
   4b. Use a response propensity score adjustment by fitting a logistic regression model, with the preselected variables from Phase 1 Wave 1 as independent variables and a dummy response indicator in Phase 2 Wave 1 as the outcome variable. Predict a response propensity for each respondent and nonrespondent unit in Phase 2 Wave 1. Stratify all sample units (respondents and nonrespondents) based on their response propensities to create equal-sized adjustment classes.
5. The response propensity model proved more efficient, so the weighted median of the estimated propensities was computed in each adjustment class.
6. Adjust the weights of the respondent sample units using the inverse of the median propensity in each adjustment class.
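Steps 4b to 6 above can be sketched as follows. This is a minimal illustration, not the HFPS production code: it assumes the response propensities have already been estimated by some fitted model, uses five classes as an arbitrary choice, and all names (`weighted_median`, `attrition_adjusted_weights`) and input values are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def attrition_adjusted_weights(weights, propensities, responded, n_classes=5):
    """Propensity-class adjustment: respondents' weights are divided by the
    weighted median propensity of their class; nonrespondents get weight 0."""
    n = len(weights)
    # Rank all units (respondents and nonrespondents) by estimated propensity
    # and cut the ranking into equal-sized classes.
    order = sorted(range(n), key=lambda i: propensities[i])
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // n

    adjusted = [0.0] * n
    for c in range(n_classes):
        members = [i for i in range(n) if classes[i] == c]
        if not members:
            continue
        median_p = weighted_median(
            [propensities[i] for i in members],
            [weights[i] for i in members],
        )
        for i in members:
            if responded[i]:
                adjusted[i] = weights[i] / median_p
    return adjusted, classes


# Hypothetical panel of 10 units with equal input weights.
weights = [1.0] * 10
propensities = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5]
responded = [True, False, True, False, True, False, True, False, True, False]

adjusted, classes = attrition_adjusted_weights(weights, propensities, responded)
```

Using the class median rather than each unit's own predicted propensity keeps a few very small propensities from producing extreme weights, at the cost of a coarser adjustment.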
The estimated response propensity can be written as

$$\hat{p}_{i} = \frac{\exp(\mathbf{x}_{i}'\hat{\boldsymbol{\beta}})}{1 + \exp(\mathbf{x}_{i}'\hat{\boldsymbol{\beta}})}$$

where $\mathbf{x}_{i}'\hat{\boldsymbol{\beta}} = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_p x_{ip}$; $\mathbf{x}_{i}$ is the vector of p independent variables for unit i considered for the model; and $\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)$ denotes the estimated logistic regression coefficients (an intercept plus one coefficient per independent variable).

The preselected variables considered for the response propensity logistic regression models were sex, age and educational attainment of the household head, urban/rural location, number of household members, number of rooms, number of cell phones, landline ownership, type of phone contacted (landline or cell phone) and whether any adult in the household went without food for an entire day due to lack of money or other resources. These variables were assessed for each country through a backward stepwise procedure, keeping in the final propensity model those with a regression coefficient significant at the .05 level.

B. Nonresponse adjustment of the supplement sample component

The adjustment of the supplement sample weights follows the same procedure as the first-time nonresponse adjustment described for the panel component in point A1 above.

Step 4: Weight calibration

Finally, the nonresponse adjusted weights for both respondent households and individuals were calibrated to reflect the totals by region, sex, age and educational attainment available from external national official sources, adjusted by phone coverage. This last adjustment has two objectives:

• To further reduce potential nonresponse biases that were not addressed by the nonresponse adjustment in Step 3, by using auxiliary variables from external sources. This can be achieved as long as the calibration auxiliaries are correlated with both nonresponse and the study variables.
• To improve the precision of the survey estimators (i.e., reduce the sampling variances), as long as the auxiliaries are correlated with the study variables of interest.[14]

Calibration works by minimizing a measure of the distance between the input weights (the nonresponse adjusted weights in this case) and the calibrated weights, under the constraint that the calibrated weights reproduce the known totals of each auxiliary variable from the external source. Unlike the nonresponse adjustment, weight calibration requires auxiliary variables only for respondents.

The available auxiliary variables were total households by region, and adult population by region, sex, age group and educational attainment. These were all categorical variables, the region variable had many categories in most countries, and the overall samples were rather small. Under these circumstances, raking was the most suitable calibration method. The HFPS applied a raking calibration using a logit distance function, since this generally gave a more exact fit to the calibration auxiliaries.

[14] This objective was not addressed in this survey since it would have entailed computing a large set of replicate weights (with bootstrap or jackknife replication methods), which could be confusing for the final user.

The final weights for respondent households and individuals can then be expressed as:

$$w_{j} = w_{0j} \times f_{c}^{\text{hh}} \times g_{j}, \qquad w_{k} = w_{0k} \times f_{c}^{\text{ind}} \times g_{k}$$

where
$w_{0j}$ is the design weight for the j-th household;
$f_{c}^{\text{hh}}$ is the nonresponse adjustment factor for households in class c;
$g_{j}$ is the calibration factor for the j-th household;
$w_{0k}$ is the design weight for the k-th individual;
$f_{c}^{\text{ind}}$ is the nonresponse adjustment factor for individuals in class c; and
$g_{k}$ is the calibration factor for the k-th individual.

Tables 3 and 4 show the data sources used for calibrating the weights in each Original Country.
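The raking idea in Step 4 can be illustrated with a short sketch. Note the substitution: this is classical iterative proportional fitting, which rescales weights one margin at a time, not the logit-distance raking the HFPS applied (that variant additionally bounds the adjustment factors). The function name `rake`, the category labels and the target totals are all hypothetical.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting over categorical margins.

    weights    -- input (nonresponse adjusted) weights, one per respondent
    categories -- {variable name: list of category codes, one per respondent}
    targets    -- {variable name: {category code: known population total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_gap = 0.0
        for name, codes in categories.items():
            for level, target in targets[name].items():
                members = [i for i, code in enumerate(codes) if code == level]
                current = sum(w[i] for i in members)
                if current <= 0:
                    raise ValueError(f"no weighted respondents in {name}={level}")
                # Track the largest relative gap seen before adjusting.
                max_gap = max(max_gap, abs(current - target) / target)
                factor = target / current
                for i in members:
                    w[i] *= factor
        if max_gap < tol:   # every margin already matched in this pass
            break
    return w


# Hypothetical sample of six adults and made-up control totals.
input_weights = [1.0] * 6
categories = {
    "sex": ["F", "F", "M", "M", "M", "M"],
    "region": ["N", "S", "N", "S", "N", "S"],
}
targets = {
    "sex": {"F": 50.0, "M": 50.0},
    "region": {"N": 40.0, "S": 60.0},
}

calibrated = rake(input_weights, categories, targets)
calibration_factors = [c / w for c, w in zip(calibrated, input_weights)]
```

The ratio of each calibrated weight to its input weight plays the role of the calibration factor in the final-weight expression above. Raking only needs the marginal totals of each auxiliary variable, which is why it suits small samples with many region categories better than calibrating to a full cross-classification.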
Population totals taken from the sources listed in Tables 3 and 4 were adjusted for telephone coverage based on the national phone coverage rates published by the United Nations International Telecommunication Union (ITU).

Table 3. Original Countries. Sources of auxiliary data used for weight calibration by region, sex and age.

| Country | Data source used for weight calibration (region, sex and age) |
|---|---|
| Argentina | Instituto Nacional de Estadística y Censos. Proyecciones de Población 2021. |
| Bolivia | Instituto Nacional de Estadística. Proyecciones de Población 2021. |
| Chile | Instituto Nacional de Estadística. Estimaciones y Proyecciones de Población 2021. |
| Colombia | Departamento Administrativo Nacional de Estadística. Proyecciones de Población 2021. |
| Costa Rica | Instituto Nacional de Estadística y Censos. Proyecciones de Población 2021. |
| Dominican Rep. | Oficina Nacional de Estadística. Proyecciones de Población 2021. |
| Ecuador | Instituto Nacional de Estadística y Censos. Proyecciones de Población 2020. |
| El Salvador | Dirección General de Estadística y Censos. Proyecciones de Población 2021. |
| Guatemala | Instituto Nacional de Estadística de Guatemala. Proyecciones de Población 2021. |
| Honduras | Instituto Nacional de Estadística. Proyecciones de Población 2021. |
| Mexico | Instituto Nacional de Estadística y Geografía. Censo Nacional de Población 2020. |
| Paraguay | Instituto Nacional de Estadística. Proyecciones de Población 2021. |
| Peru | Instituto Nacional de Estadística e Informática. Estimaciones y Proyecciones de Población 2021. |

Table 4. Original Countries. Sources of auxiliary data used for weight calibration by educational attainment.

| Country | Data source used for weight calibration (educational attainment) |
|---|---|
| Argentina | Instituto Nacional de Estadística y Censos. Censo Nacional de Población, Hogares y Viviendas 2010. |
| Bolivia | Instituto Nacional de Estadística. Encuesta de Hogares 2020. |
| Chile | Instituto Nacional de Estadística. Censo de Población y Vivienda 2017. |
| Colombia | Departamento Administrativo Nacional de Estadística. Censo Nacional de Población y Vivienda 2018. |
| Costa Rica | Instituto Nacional de Estadística y Censos. Censo Nacional de Población y Vivienda 2011. |
| Dominican Rep. | - |
| Ecuador | Instituto Nacional de Estadística y Censos. Encuesta Nacional de Empleo, Desempleo y Subempleo, diciembre 2019. |
| El Salvador | - |
| Guatemala | Instituto Nacional de Estadística de Guatemala. Censo Nacional de Población y Vivienda 2018. |
| Honduras | - |
| Mexico | Instituto Nacional de Estadística y Geografía. Censo Nacional de Población 2020. |
| Paraguay | Instituto Nacional de Estadística. Encuesta Permanente de Hogares 2019. |
| Peru | Instituto Nacional de Estadística e Informática. Censo de Población y Vivienda 2017. |

Step 5: Weight trimming

The distributions of the resulting household and individual weights were examined in each country to decide whether any trimming was needed. Weight trimming sought to reduce excess variation in the final weights introduced by the nonresponse adjustments and calibration, thus mitigating the inflation of the standard errors of the estimates due to weighting.

The trimming process took the largest weights within each of the age-sex-region-education subgroups and reduced their value to the next largest value of the weights. The number of weights trimmed was generally small in all countries. These trimmed weights were then recalibrated to the population control distributions.

However, trimming can also change estimates if done carelessly, particularly when a unit with a large weight also has a large value of the variable of interest. Therefore, the trimming process included a sensitivity analysis to assess whether large changes in estimates might occur as weights were trimmed. Trimming was carried out in a series of rounds within each subgroup.
Each time the larger weights in a round were trimmed, a set of key survey estimates was recomputed for the overall sample. If the relative change in an estimate exceeded 2% of its value before any trimming was done, the trimming step was not applied. This limit was reached for only very few estimates in very few countries. The trimming process thus sought to reduce unnecessary weight variation while avoiding significant changes in key survey estimates. Finally, trimmed weights were recalibrated to the same population totals used in the first calibration described in Step 4.

2.2. PHASE 2 WAVE 1 WEIGHTING PROCEDURES FOR THE HFPS ADDED COUNTRIES

The sample design and weighting procedures in the Added Countries are the same ones used for the Original Countries in Phase 1 Wave 1. For a detailed description, refer to the HFPS Phase 1 Technical Note on Sampling, Weighting and Estimation. Tables 5 and 6 show the data sources used for calibrating the weights in each Added Country.

Table 5. Added Countries. Sources of the auxiliary data used for weight calibration by region, sex and age.

Country            Data source used for weight calibration (region, sex and age)
Antigua & Barbuda  Statistics Division, Ministry of Finance and Corporate Governance. Population Projections 2021.
Belize             Statistical Institute of Belize. Population Projections 2020.
Dominica           Central Statistics Office of Dominica. Population Projections 2021.
Guyana             Bureau of Statistics of Guyana. Population Projections 2017.
Haiti              United Nations Statistics Division. Population Projections 2020.
Jamaica            Statistical Institute of Jamaica. Population Projections 2019.
Nicaragua          United Nations Statistics Division. Population Projections 2020.
Panama             Instituto Nacional de Estadística y Censo. Estimaciones y Proyecciones de Población 2021.
Saint Lucia        Central Statistical Office.
                   Population Estimates and Projections 2018.
Uruguay            Instituto Nacional de Estadística. Estimaciones y Proyecciones de Población 2021.

Table 6. Added Countries. Sources of the auxiliary data used for weight calibration by educational attainment.

Country            Data source used for weight calibration (educational attainment)
Antigua & Barbuda  -
Belize             Statistical Institute of Belize. 2010 Population Census.
Dominica           -
Guyana             -
Haiti              -
Jamaica            The Planning Institute of Jamaica. Jamaica Survey of Living Conditions 2017.
Nicaragua          -
Panama             Instituto Nacional de Estadística y Censo. Censo Nacional de Población 2010.
Saint Lucia        -
Uruguay            -

Note: In those countries with no available recent data about educational attainment, weight calibration was only done by region, sex and age.

3. PHASE 2 WAVE 2

HFPS Phase 2 Wave 2 has four units of analysis: households, adult individuals (18 years of age and older), children 0 through 5 years of age, and children 6 through 17 years of age.[15] Weights were computed for each sample unit and should be used according to the estimate of interest.

The size of the Phase 2 Wave 2 overall (cell phones and landlines) selected sample of phone numbers (i.e., before any fieldwork activities) in each of the Original Countries equals the Phase 1 Wave 1 overall selected sample of phone numbers, plus the Phase 2 Wave 1 overall supplement fresh sample, plus the Phase 2 Wave 2 overall supplement fresh sample of phone numbers:

$$n_{ph2w2} = n_{ph1w1} + n^{s}_{ph2w1} + n^{s}_{ph2w2}$$

where $n_{ph2w2}$ is the size of the Phase 2 Wave 2 overall selected sample of phone numbers in each Original Country; $n_{ph1w1}$ is the size of the Phase 1 Wave 1 overall selected sample of phone numbers; $n^{s}_{ph2w1}$ is the size of the Phase 2 Wave 1 overall selected fresh supplement sample of numbers; and $n^{s}_{ph2w2}$ is the size of the Phase 2 Wave 2 overall selected fresh supplement sample of numbers.
These three sample components were subject to different response mechanisms. The Phase 1 Wave 1 sample was affected by first-time nonresponse in Phase 1 Wave 1, plus attrition nonresponse in Phase 2 Wave 1 and attrition nonresponse in Phase 2 Wave 2. The Phase 2 Wave 1 fresh supplement sample was subject to first-time nonresponse in Phase 2 Wave 1 and attrition nonresponse in Phase 2 Wave 2. Finally, the fresh supplement sample incorporated in Phase 2 Wave 2 was only subject to first-time nonresponse. These response mechanisms were accounted for in the nonresponse weighting adjustments through a set of procedures analogous to the ones described in Section 2 above.

In the Added Countries, by contrast, the size of the Phase 2 Wave 2 overall selected sample of phone numbers equals the Phase 2 Wave 1 overall selected sample of phone numbers, plus the Phase 2 Wave 2 overall fresh supplement sample of phone numbers:

$$n_{ph2w2} = n_{ph2w1} + n^{s}_{ph2w2}$$

where $n_{ph2w2}$ is the size of the Phase 2 Wave 2 overall selected sample of phone numbers in each Added Country; $n_{ph2w1}$ is the size of the Phase 2 Wave 1 overall selected sample of phone numbers; and $n^{s}_{ph2w2}$ is the size of the Phase 2 Wave 2 overall selected fresh supplement sample of numbers.

These two sample components also went through different response mechanisms. The Phase 2 Wave 1 sample was affected by first-time nonresponse in Phase 2 Wave 1 and attrition nonresponse in Phase 2 Wave 2. The Phase 2 Wave 2 supplement was only subject to first-time nonresponse in Phase 2 Wave 2. Again, these response mechanisms were considered in the nonresponse weighting adjustments through procedures analogous to the ones described in Section 2.

[15] In Ecuador, the units of analysis were households, adult individuals, children 0 through 4 years of age, and children 5 through 17 years of age.
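To illustrate how the different response mechanisms could accumulate in the weights of each sample component, the following Python sketch applies one inverse-response-rate factor per mechanism to a base weight. It is only a sketch under simplifying assumptions: a single adjustment cell per component, a hypothetical base weight and hypothetical response rates, and made-up component and mechanism names. The actual adjustments are the ones described in Section 2.

```python
from math import prod

# Response mechanisms that each Original Country sample component has gone
# through by Phase 2 Wave 2, following the description in the text.
# Component and mechanism names are hypothetical labels for this sketch.
MECHANISMS = {
    "ph1w1_panel": ["first_time_ph1w1", "attrition_ph2w1", "attrition_ph2w2"],
    "ph2w1_supplement": ["first_time_ph2w1", "attrition_ph2w2"],
    "ph2w2_supplement": ["first_time_ph2w2"],
}


def adjusted_weight(base_weight, response_rates):
    """Inflate a base weight by the inverse of each response rate in turn."""
    if any(not 0 < rate <= 1 for rate in response_rates):
        raise ValueError("response rates must lie in (0, 1]")
    return base_weight / prod(response_rates)


def component_weights(base_weight, rates):
    """Return the nonresponse-adjusted weight of each sample component.

    `rates` maps each mechanism name to its (assumed) response rate.
    """
    return {
        component: adjusted_weight(base_weight, [rates[m] for m in mechanisms])
        for component, mechanisms in MECHANISMS.items()
    }


if __name__ == "__main__":
    # Hypothetical response rates, for illustration only.
    example_rates = {
        "first_time_ph1w1": 0.25,
        "attrition_ph2w1": 0.50,
        "attrition_ph2w2": 0.50,
        "first_time_ph2w1": 0.25,
        "first_time_ph2w2": 0.25,
    }
    for name, weight in component_weights(100.0, example_rates).items():
        print(name, weight)
```

The sketch shows why the oldest panel component tends to carry the largest adjustment: it accumulates one factor for each wave it had to survive, which is one source of the weight variation that trimming later addresses.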
Minority populations

HFPS Phase 2 Wave 2 produced survey results for minority groups (indigenous and afro-descendant populations) in seven countries: Bolivia, Dominican Republic, Colombia, Guatemala, Mexico, Panama and Peru. To attain minority samples large enough to allow for reliable estimates, the HFPS had to sample minority populations at a higher sampling rate than the primary samples in five countries: Bolivia, Dominican Republic, Colombia, Panama and Peru. For this purpose, the HFPS executed a screening operation in each of these countries.[16] The screening process consisted of calling an additional random sample of numbers and applying a short questionnaire to identify respondents who were part of the HFPS minority groups. Respondents classified as part of a minority population were then interviewed using the main survey questionnaire. These respondents were then added to the primary sample of respondents; therefore, in these five countries the final number of interviewed households and individuals in Phase 2 Wave 2 is larger than in Phase 2 Wave 1.

[16] Given the high prevalence of minority populations in Guatemala and Mexico, their main samples included a number of interviewed minority households and individuals large enough for achieving reliable estimates, and no additional samples were needed. The project conducted a separate data collection exercise in Ecuador to obtain indicators of the Venezuelan immigrants in the country. The sampling strategy of that exercise also differs from the HFPS Phase 2 strategy described here. Its results will be made available in a separate publication.

Table 7. Phase 2 Wave 2 - Original & Added Countries. Complete interviews and attrition rate by country.
                                (1)                   (2)                   (3)              (4)
Country               Type of   Complete interviews   Complete interviews   Recontact rate   Attrition rate
                      country   in Phase 2 Wave 1     in the panel          ph2w1/ph2w2      ph2w1/ph2w2
Argentina             Original  1,216                 595                   48.9%            51.1%
Bolivia               Original  1,272                 622                   48.9%            51.1%
Chile                 Original  1,212                 482                   39.8%            60.2%
Colombia              Original  1,221                 675                   55.3%            44.7%
Costa Rica            Original  805                   354                   44.0%            56.0%
Dominican Republic    Original  1,205                 531                   44.1%            55.9%
Ecuador a             Original  951                   490                   51.5%            48.5%
El Salvador           Original  818                   288                   35.2%            64.8%
Guatemala             Original  1,207                 593                   49.1%            50.9%
Honduras              Original  1,021                 355                   34.8%            65.2%
Mexico                Original  2,625                 850                   32.4%            67.6%
Paraguay              Original  1,076                 594                   55.2%            44.8%
Peru                  Original  1,212                 584                   48.2%            51.8%
Antigua & Barbuda b   Added     790
Belize                Added     816                   431                   52.8%            47.2%
Dominica              Added     861                   452                   52.5%            47.5%
Guyana                Added     785                   431                   54.9%            45.1%
Haiti                 Added     2,814                 1,487                 52.8%            47.2%
Jamaica               Added     829                   373                   45.0%            55.0%
Nicaragua             Added     833                   362                   43.5%            56.5%
Panama                Added     815                   415                   50.9%            49.1%
Saint Lucia           Added     835                   415                   49.7%            50.3%
Uruguay               Added     816                   373                   45.7%            54.3%

a Ecuador was part of HFPS Phase 1 in 2020, so it is considered an Original Country in this table. However, its Phase 2 Wave 1 sample was entirely new and included no panel, so its sample weighting should follow the same procedures as for the Added Countries.
b Antigua and Barbuda was covered in Phase 2 Wave 1 only.

Table 8. Phase 2 Wave 2. Complete minority interviews by country.

                (1)                   (2)               (3)                       (4)
Country         Complete minority     Screened numbers  Complete minority         Total minority
                interviews from the                     interviews from the       complete interviews
                primary sample                          screening operation
Bolivia         475                   1,112             129                       604
Dominican Rep.
                766                   1,471             167                       933
Colombia        529                   7,059             312                       841
Guatemala       639                                                               639
Mexico          726                                                               726
Panama          480                   2,209             349                       829
Peru            296                   13,220            422                       718

As shown in Table 8, the total sample of minority respondents in the five countries is formed by two components: minority cases already present in the primary sample (column 1) and minority cases interviewed in the screening operation (column 3). Both minority components have the same base weights, based on the total combined sample of minority cases. The base weights of the first minority component were adjusted for attrition nonresponse between Phase 2 Waves 1 and 2, whereas the base weights of the second minority component were adjusted for first-time nonresponse. Finally, the adjusted weights of the combined sample of minority respondents were calibrated to minority totals by sex, age and educational attainment available from official sources. Table 9 shows the data sources used for calibrating the weights of the minority population samples in the five countries.

The datasets published by the LAC HFPS Phase 2 project contain the sampling weights for both the main and minority samples in the same data structure (see Box 1).

Table 9. Phase 2 Wave 2 - Minority Populations. Sources of the auxiliary data used for weight calibration by sex, age and educational attainment.

Country         Data source used for weight calibration (region, sex, age and educational attainment)
Bolivia         Instituto Nacional de Estadística. Encuesta de Hogares 2021.
Colombia        Departamento Administrativo Nacional de Estadística. Encuesta Nacional de Calidad de Vida 2021.
Dominican Rep.  Vanderbilt University. LAPOP Survey 2020.
Panama          Instituto Nacional de Estadística y Censo. Encuesta de Propósitos Múltiples 2021.
Peru            Instituto Nacional de Estadística e Informática. Encuesta Nacional de Hogares 2020.

Box 1.
Non-minority and minority population data structure

Combining two samples selected at quite different sampling rates (the primary sample and the screening sample) into one single sample and dataset implies having quite variable weights, which makes estimates for the overall population (including non-minority and minority populations) less efficient. That is, the standard errors for a given sample size will generally be larger than with units selected at a single rate[1].

Therefore, two alternatives were considered to organize the survey data:
a) use only the primary sample to obtain overall population estimates and use the total minority sample separately to produce minority population estimates, and
b) use a single sample, including the primary and the screening samples, to produce overall population estimates and obtain minority estimates by filtering the total minority sample cases.

The first alternative would produce more efficient overall estimates, although with a smaller sample. Conversely, the second alternative would produce less efficient overall estimates, yet with a larger sample. Minority estimates would be the same with either alternative. In addition, the second alternative would allow a much simpler and more user-friendly approach to handling and analyzing the final publicly available datasets.

To further inform this decision, a simulation was run under different minority population prevalences, varying sample sizes and different sampling rates. The exercise showed that, given the minority population prevalences and the sample sizes in the five countries of interest (see Table 8), the second alternative yielded less efficient but slightly more precise estimates for most variables. Consequently, and also considering its simplicity, the second alternative was pursued in preparing the LAC HFPS Phase 2 Wave 2 datasets.

[1] Lower efficiencies are expressed through larger sample design effects.

4.
ESTIMATION AND SAMPLING ERRORS

When estimating sampling errors (expressed as sampling variances, standard errors, coefficients of variation and confidence intervals) for statistics such as means, proportions, ratios and regression parameters, sample design features and weighting need to be accounted for. Otherwise, sampling error estimates will be biased: standard errors and coefficients of variation will usually be understated, and confidence intervals will be narrower than they should be.

The two most usual approaches to estimating sampling errors for survey data are 1) the Taylor Series Linearization (TSL) of the estimator and the corresponding approximation to its variance, or 2) the use of resampling variance estimation techniques such as Balanced Repeated Replication (BRR), Jackknife Repeated Replication (JRR) and bootstrap. Stata and other statistical software packages use the TSL method as the default for estimating survey data sampling errors.

To determine the precision of an HFPS estimate, the data user can estimate the corresponding sampling error using the Stata code in Annex 1. This script delivers the point estimate, the standard error, the 95% confidence interval and the coefficient of variation, accounting for the sample design features and weighting. The standard error is the square root of the sampling variance. The coefficient of variation is a relative measure of the standard error, calculated as the ratio between the standard error and the point estimate (it is usually expressed in percentage terms).

As a rule of thumb, estimates with coefficients of variation of 1 percent or lower are considered to have a very high level of precision. Coefficients of variation between 1 and 3 percent are generally classified as very good, from 3 to 5 percent as good, 5 to 10 percent as acceptable, and 10 to 15 percent as large.
Above 15 percent is classified as too large and the corresponding estimate is considered unreliable.

REFERENCE LITERATURE

Heeringa, S., West, B., and P. Berglund. (2017). Applied Survey Data Analysis (Second Edition). New York, Taylor & Francis Group.
Lohr, S. and J. Rao. (2006). Estimation in Multiple-Frame Surveys, Journal of the American Statistical Association, 101, 1019-1030.
Lohr, S. (2011). Alternative Survey Sample Designs: Sampling with Multiple Overlapping Frames, Survey Methodology, 37, 197-213. Statistics Canada.
Skinner, C. and J. Rao. (1996). Estimation in Dual-Frame Surveys with Complex Designs, Journal of the American Statistical Association, 91, 349-356.
Thompson, S. (2012). Chapter 15: Network Sampling and Link-Tracing Designs, in Sampling. New York, Wiley.
Valliant, R., Dever, J., and F. Kreuter. (2016). Practical Tools for Designing and Weighting Sample Surveys. New York, Springer.

ANNEX 1
Stata Code for Weighted Estimates and Sampling Error Computation

Cross-sectional data

This annex provides examples of the Stata syntax for computing estimates and their corresponding sampling errors (measured by standard errors, confidence intervals, and coefficients of variation), accounting for the HFPS sample design and weighting. For more details, data users are referred to the online Stata manual for the svy command (http://www.stata.com/manuals15/svy.pdf). The following examples are based on Phase 2 Wave 1 data.
To specify the sample design features in any of the HFPS Phase 2 Wave 1 datasets, use command:

    svyset [pweight=w_hh_ph2w1], strata(stratum)
    *For household-level estimates in Phase 2 Wave 1 use weight w_hh_ph2w1
    *For individual-level estimates in Phase 2 Wave 1 use weight w_ind_ph2w1

Numeric variables (means):

To estimate the mean age of the population 18+, use command:

    svy: mean u03_03
    estat cv

To estimate the mean age of the population 18+ by gender, use command:

    svy: mean u03_03, over(u03_04)
    estat cv

To estimate the mean age of the population 18+ who did not work in the week prior to the interview, use command:

    svy, subpop(if u05_01==2): mean u03_03
    estat cv

Categorical variables (proportions):

To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview, use command:

    svy: tab u05_01, se ci cv

To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview by sex, use command:

    svy: tab u05_01 u03_04, col se ci cv

To estimate the frequency distribution of persons 18+ on whether they worked in the week prior to the interview among males, use command:

    svy, subpop(if u03_04==1): tab u05_01, se ci cv

Linear regression:

To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2, use command:

    svy: regress y x1 x2

To estimate the regression coefficients of a continuous variable y on two continuous variables x1 and x2 and two categorical variables x3 and x4, use command:

    svy: regress y x1 x2 i.x3 i.x4
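The coefficients of variation produced by the Stata commands in this annex can be read against the rule of thumb given in Section 4. As a complement to the Stata examples, the Python sketch below computes a coefficient of variation from a point estimate and its standard error and assigns the corresponding precision label. The function names are hypothetical, and the treatment of values falling exactly on a threshold (assigned to the more precise class) is an assumption: the text only fixes "1 percent or lower" and "above 15 percent".

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the coefficient of variation in percent: 100 * SE / |estimate|."""
    if estimate == 0:
        raise ValueError("the CV is undefined for a zero point estimate")
    return 100.0 * standard_error / abs(estimate)


def precision_label(cv_percent):
    """Map a CV (in percent) to the rule-of-thumb classes of Section 4.

    Assumption: a value exactly on a threshold goes to the more precise class.
    """
    if cv_percent < 0:
        raise ValueError("the CV cannot be negative")
    if cv_percent <= 1:
        return "very high"
    if cv_percent <= 3:
        return "very good"
    if cv_percent <= 5:
        return "good"
    if cv_percent <= 10:
        return "acceptable"
    if cv_percent <= 15:
        return "large"
    return "too large (unreliable)"


if __name__ == "__main__":
    # Hypothetical estimate: a mean of 40.0 with a standard error of 1.0.
    cv = coefficient_of_variation(40.0, 1.0)
    print(cv, precision_label(cv))
```

The point estimate and standard error passed to these functions should come from a design-based estimation such as the svy commands above, since unweighted standard errors would understate the coefficient of variation.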