Reducing Bias in Phone Survey Samples: Effectiveness of Reweighting Techniques Using Face-to-Face Surveys as Frames in Four African Countries

Several developing countries are currently implementing phone surveys in response to immediate data needs to monitor the socioeconomic impact of COVID-19. However, phone surveys are often subject to coverage and non-response bias that can compromise the representativeness of the sample and the external validity of the estimates obtained from the survey. Using data from high-frequency phone surveys in Ethiopia, Malawi, Nigeria, and Uganda, this study investigates the magnitude and source of biases present in these four surveys and explores the effectiveness of techniques applied to reduce bias. Varying levels of coverage and non-response bias are found in all four countries. The successfully contacted samples in these four countries were biased toward wealthier households with higher living standards. Left unaddressed, this bias would result in biased estimates from the interviewed sample that do not fully reflect the situation of poorer households in the country. However, phone survey biases can be substantially reduced by applying survey weight adjustments using information from the representative survey from which the sample is drawn. Applying these methods to the four surveys resulted in a substantial reduction in bias, although the bias was not fully eradicated. This highlights one of the potential advantages of drawing phone survey samples from existing face-to-face, representative surveys over random digit dialing or using lists from telecom providers where such adjustment methods can be more limited.


Policy Research Working Paper 9676
This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at kmcgee@worldbank.org.

Introduction
The outbreak of the COVID-19 virus has had far-ranging impacts across the globe. In addition to the serious health effects, COVID-19 has caused widespread economic disruption. The epidemic has resulted in closure of schools and markets, disruption in public services, banning of social gatherings and restriction of mobility and transportation of goods and people. Given these wide-ranging impacts of the pandemic, governments, policy makers and researchers are in dire need of just-in-time and reliable data to monitor both the health and the socio-economic impact of the epidemic on livelihoods and well-being. However, amid a pandemic it is difficult to deploy enumerators to the field to collect information from households and communities through face-to-face (F2F) interviews given the risk of transmission to both interviewer and respondent. As a solution to this challenge and to address the urgent demand for rapid and informative data, high frequency phone surveys (HFPS) of households are being implemented in many developing countries. The improvement of phone penetration in these countries in recent years, as well as the existence of recently conducted representative household surveys that contain re-contact information for some or all household members, has created opportunities to conduct phone surveys.
While phone surveys have distinct advantages that are particularly relevant during the pandemic, they also suffer from drawbacks. One of the most substantial weaknesses of phone surveys is that they are often subject to several sources of bias that can compromise the representativeness of the sample and the external validity of the estimates obtained from the survey (Himelein et al, 2020). This study investigates the magnitude and source of bias in recent telephone surveys and explores the effectiveness of techniques to reduce it. Using telephone surveys in Ethiopia, Malawi, Nigeria and Uganda that were sampled from recently completed representative F2F surveys, this study first explores the sources of bias introduced at each stage of the phone survey selection and interview process by comparing the profile of characteristics of the F2F and the phone survey samples. After examining the source and extent of bias, it then assesses the effectiveness of reweighting adjustment methods to correct for these biases. These adjustment methods are especially powerful when using samples drawn from existing representative face-to-face surveys, like the Living Standards Measurement Study (LSMS) surveys, which contain a wealth of detailed information on all sampled units. While similar adjustments can be performed under other sampling approaches, such as sampling from a list of numbers from a telecom provider or random digit dialing, the scope of the adjustments is more limited in those approaches since they often lack the detailed information on sampled units that is available in a representative F2F survey. This could present one advantage of sampling from representative F2F surveys over other sampling approaches.
The remainder of this paper will proceed as follows. In Section 2, the benefits and limitations of phone surveys will be briefly reviewed including the different sources of bias and methods available to reduce the bias. In Section 3, data and methods will be discussed. Section 4 presents the results and Section 5 concludes.

Phone Surveys
Even though F2F surveys have been the main mode of data collection in developing countries for many years, there is also a growing list of successfully implemented phone surveys (see Tomlinson et al, 2009; Ballivian et al, 2015; Dabalen et al, 2016; Lau et al, 2019; Fu et al, 2015; Himelein and Kastelic, 2015; and Leo et al, 2015). Phone surveys have an array of advantages over typical F2F surveys, primary among them cost. F2F surveys require interviewers to travel to the household's or individual's location to conduct the interview, which introduces substantial travel and logistical costs, particularly for national surveys. Phone surveys, on the other hand, require no travel, and interviewers can conduct all their calls from one location. In the era of the coronavirus pandemic, another advantage of phone surveys has been realized: they eliminate the need for contact with others (either through travel or conducting the F2F interview) and thereby carry minimal risk of COVID-19 exposure for interviewers and respondents alike. This advantage, contrasted with the COVID-19 risks associated with F2F data collection, has led to a proliferation of phone surveys in the developing world during the coronavirus pandemic.
However, phone surveys also suffer from some disadvantages when compared to F2F surveys. One of the main disadvantages (and the focus of this study) is that phone surveys often suffer from coverage and nonresponse bias that erode the representativeness of the interviewed sample. Coverage bias results from the typical exclusion of the segment of the population that does not have access to a telephone (either mobile or landline). Although mobile phone penetration is increasing in developing countries, in many countries there is still a sizable share of the population that lacks access and would not be represented in typical phone survey samples. There are likely substantial differences between this uncovered portion of the sample and the covered portion that will introduce bias in the results obtained.
Phone surveys also commonly suffer from nonresponse bias, whereby the pool of sampled households or individuals who are not successfully interviewed is substantially different from those interviewed. Nonresponse is a common feature of any survey, including F2F surveys, but the potential for nonresponse is typically substantially higher for phone surveys. One reason nonresponse is often higher for phone surveys is the difficulty of contacting respondents over the phone. Successful contact is subject to mobile network availability and quality, whether the respondent's mobile phone is charged and operational, whether the respondent answers a call from an unknown number, etc. After successful contact, nonresponse can also result from refusal of the respondent to participate in the interview or a breakoff mid-way through the interview. These two sources of nonresponse are also present in F2F surveys but could be more substantial in phone surveys since respondents might be less responsive to an unseen interviewer over the phone than to an interviewer they meet in person. Bias due to this nonresponse will be introduced if the nonresponse is nonrandom, with some segments of the population being more likely not to respond. Given the additional sources of nonresponse common to phone surveys, there is also a higher potential for systematic bias due to that nonresponse.

Methods to Counteract Bias
While coverage and nonresponse bias are real concerns with phone surveys, there are methods to attempt to reduce the bias. The methods available will vary according to the type of frame used for the survey. There are three predominant types of frames used for telephone surveys: (1) existing representative F2F surveys that collected phone numbers of respondents (for example, LSMS-ISA, DHS, HBS, etc.), (2) a list of phone numbers from a telecom provider, and (3) a list of numbers generated randomly (random digit dialing, or RDD) (Himelein et al, 2020). While there are advantages and disadvantages to each of these approaches, 5 there are some distinct advantages to using an existing F2F survey when it comes to counteracting coverage and nonresponse bias. The F2F survey that the phone sample is drawn from will typically contain detailed information across a wide variety of domains not just for interviewed households but also for households that were ineligible (i.e. did not have access to a mobile phone) and those that were not successfully interviewed (i.e. those that did not respond or refused). This wealth of information can be harnessed to more effectively reduce bias in the phone survey sample. RDD surveys and surveys utilizing network provider frames often do not have the luxury of detailed information on the ineligible and nonresponding sample, and thus any bias adjustments employed for those surveys are potentially inferior to those from representative F2F surveys. 6
5 See Himelein et al (2020) for a more complete comparison of the advantages and disadvantages of these three sampling approaches.
6 One disadvantage of using a representative F2F survey as a frame (compared to RDD and list-based frames) is that F2F surveys are often a two-stage clustered design with census enumeration areas typically serving as the clusters from which households/respondents are selected. This clustered design introduces design effects that reduce the effective sample size and increase the size of the overall sample required to achieve the desired level of precision (relative to a simple random sample). These design effects are at least partially retained in a phone sample drawn from the clustered F2F survey sample. These imported design effects imply that a larger sample is required to achieve the same level of precision for a phone survey drawn from a clustered F2F survey than for an RDD or list-based phone survey (which is not clustered).
The most prominent method to achieve a reduction in bias in telephone surveys is through adjustments to the survey weights. In order to implement these weighting adjustments, additional information is required to understand and attempt to correct for the bias. External information on the known demographic composition of the general population (e.g. from a population census) can be compared with that of the interviewed sample and the weights adjusted to counteract any imbalance. Such external information is usually limited to basic demographics. More detailed information from recent representative F2F surveys (e.g. LSMS, Demographic and Health Surveys, Household Budget Surveys, etc.) could similarly be compared with the interviewed sample. However, for RDD and telecom list-based phone surveys, information on these characteristics must also be captured in the phone survey, which is not always feasible in a phone survey format where the scope (in terms of question complexity) and length (in terms of interview time) are much reduced compared to a typical F2F survey. For example, it is not feasible to collect all information necessary to construct an aggregate measure of household consumption expenditures, the prevailing welfare measure in many developing countries. Phone surveys utilizing existing representative surveys as a frame have much more scope for effective bias adjustments through reweighting as a result of the rich set of information available for the representative sample that serves as a frame. There are a variety of approaches to counteract the biases present in phone surveys utilizing the detailed information from the representative F2F survey, though three prominent methods are explored here.
The first is through weighting class adjustments, whereby the phone survey sample is divided into cells that represent different "classes" of respondents (Little, 1986). These cells are typically constructed by crossing several characteristics that may be correlated with the likelihood of nonresponse (e.g. gender, age, education, location, etc.). The phone survey response rate is then calculated for each of these cells and the weights for interviewed households in the cell increased by the inverse of the response rate.
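As an illustration only, the weighting class adjustment can be sketched in a few lines of Python. The cell labels, weights, and response flags below are made-up values, and the sketch assumes a weighted response rate within each cell (an unweighted rate is an equally plausible reading of the description); it also assumes every cell contains at least one respondent.

```python
from collections import defaultdict

def weighting_class_adjust(units):
    """Inflate respondents' weights by the inverse of their cell's response rate.

    units: list of dicts with keys 'cell', 'base_weight', 'responded'.
    Assumes each cell has at least one respondent (otherwise cells must be merged).
    """
    total = defaultdict(float)      # sum of base weights of all sampled units per cell
    responding = defaultdict(float) # sum of base weights of respondents per cell
    for u in units:
        total[u["cell"]] += u["base_weight"]
        if u["responded"]:
            responding[u["cell"]] += u["base_weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            rate = responding[u["cell"]] / total[u["cell"]]  # weighted response rate
            adjusted.append({**u, "adj_weight": u["base_weight"] / rate})
    return adjusted

# Hypothetical sample: two cells (in practice cells cross several characteristics).
units = [
    {"cell": "urban", "base_weight": 100.0, "responded": True},
    {"cell": "urban", "base_weight": 100.0, "responded": True},
    {"cell": "urban", "base_weight": 100.0, "responded": False},
    {"cell": "rural", "base_weight": 200.0, "responded": True},
    {"cell": "rural", "base_weight": 200.0, "responded": False},
]
adjusted = weighting_class_adjust(units)
for u in adjusted:
    print(u["cell"], round(u["adj_weight"], 1))
```

Because the rate is weighted, the adjusted weights of respondents in each cell add up to the base-weight total of all sampled units in that cell, so nonrespondents are "represented" by respondents from the same class.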
Another method is propensity score adjustments, whereby the probability of response is modeled (typically using a probit or logit model) using the response status from the phone survey as well as observable household/individual characteristics for the entire sample (both responders and nonresponders) derived from the representative survey (Little, 1986). The probability of response for each sampled unit is then predicted using the model parameters, and the inverse of the predicted response probability (or propensity) is applied to the base weights. This approach could introduce extreme weight adjustments and thereby increase the variance of the weights (and of the estimates obtained). However, this risk can be reduced by forming response classes based on the predicted probability (e.g. probability deciles) and using the average predicted probability within each class as the basis for the adjustment factor. The propensity modeling approach is particularly attractive for telephone surveys drawn from representative surveys that contain a diverse set of characteristics that can be incorporated into the propensity model.
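A minimal sketch of the propensity approach with response classes follows. Everything in it is illustrative: the data are simulated, the covariates (urban, owns_tv, hh_size) are hypothetical, and the logit model is fitted with a plain gradient-ascent loop rather than a statistical package, simply to keep the example self-contained.

```python
import math
import random

def predict(beta, x):
    """Logistic response propensity for covariate vector x (beta[0] is the intercept)."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, y, lr=0.5, iters=1000):
    """Fit a logit model by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for x, yi in zip(X, y):
            err = yi - predict(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

def class_factors(probs, n_classes=10):
    """Group units into propensity classes; factor = 1 / mean propensity in the class."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    factors = [0.0] * len(probs)
    n = len(order)
    for c in range(n_classes):
        members = order[c * n // n_classes:(c + 1) * n // n_classes]
        if members:
            mean_p = sum(probs[i] for i in members) / len(members)
            for i in members:
                factors[i] = 1.0 / mean_p
    return factors

# Simulated frame: covariates are known for responders and nonresponders alike.
random.seed(0)
X, responded = [], []
for _ in range(200):
    urban, owns_tv = random.randint(0, 1), random.randint(0, 1)
    hh_size = random.randint(1, 10) / 10
    X.append([urban, owns_tv, hh_size])
    true_p = 0.35 + 0.2 * urban + 0.25 * owns_tv  # made-up response mechanism
    responded.append(1 if random.random() < true_p else 0)

beta = fit_logit(X, responded)
probs = [predict(beta, x) for x in X]
factors = class_factors(probs)            # deciles by default
base_weights = [100.0] * len(X)
adjusted = [w * f for w, f, r in zip(base_weights, factors, responded) if r]
```

Using one factor per class, rather than each unit's own inverse propensity, caps the number of distinct adjustment factors at the number of classes, which is what limits extreme weights.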
A further method that can be employed is calibration for nonresponse (Lundström & Särndal, 1999; Andersson & Särndal, 2016). Calibration is a weight adjustment technique utilized in many surveys whereby weights are adjusted so that the weighted sample composition matches known characteristics of the underlying population. Typically, auxiliary data from population censuses are utilized for these calibrations. However, this approach can be extended to also include sample-based estimates from a representative survey used as a frame (Andersson & Särndal, 2016). In this case, the weights for the interviewed sample will be calibrated across all considered characteristics such that the weighted profile of characteristics obtained from the interviewed sample closely matches that from the full representative survey sample. Provided the characteristics included in the calibration are closely associated with nonresponse, the calibration will serve to counteract the nonresponse bias. There are many complex calibration models that could be used (and are beyond the scope of this paper), but most are subject to an algorithm that minimizes the distance between the uncalibrated and calibrated weights and therefore might result in a lower increase in variation of the adjusted weights compared to the propensity modeling approach.
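To make the idea of calibration concrete, the sketch below uses raking (iterative proportional fitting), one simple and widely used calibration variant for categorical characteristics. It is not claimed to be the calibration model used in any of the surveys discussed here; the variables (area, tv) and the target totals, which stand in for weighted totals from the full representative sample, are hypothetical.

```python
def rake(units, targets, weight_key="weight", max_iter=100, tol=1e-10):
    """Adjust weights so weighted totals match target margins for each variable.

    units:   list of dicts holding categorical characteristics and a starting weight.
    targets: {variable: {category: target weighted total}}; the margins must be
             mutually consistent (same grand total for every variable).
    """
    weights = [u[weight_key] for u in units]
    for _ in range(max_iter):
        max_change = 0.0
        for var, totals in targets.items():
            # Current weighted total in each category of this variable.
            current = {cat: 0.0 for cat in totals}
            for u, w in zip(units, weights):
                current[u[var]] += w
            # Scale every unit by target / current for its category.
            for i, u in enumerate(units):
                factor = totals[u[var]] / current[u[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return weights

# Hypothetical interviewed sample (one record per cell for brevity).
respondents = [
    {"area": "urban", "tv": "yes", "weight": 120.0},
    {"area": "urban", "tv": "no",  "weight": 80.0},
    {"area": "rural", "tv": "yes", "weight": 60.0},
    {"area": "rural", "tv": "no",  "weight": 140.0},
]
# Hypothetical benchmark totals, e.g. estimated from the full F2F sample.
targets = {
    "area": {"urban": 150.0, "rural": 250.0},
    "tv":   {"yes": 130.0, "no": 270.0},
}
calibrated = rake(respondents, targets)
```

In this toy case the interviewed sample over-represents urban and TV-owning households, and raking shifts weight toward rural households without a TV until both margins agree with the benchmark.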

High Frequency Phone Surveys - Ethiopia, Malawi, Nigeria, & Uganda
In order to examine bias in phone surveys and the effectiveness of bias adjustment methods, this study utilizes four recent high frequency phone surveys (HFPS) in Ethiopia, Malawi, Nigeria, and Uganda. These four surveys were implemented as part of a broader initiative by the World Bank to support implementation of high frequency phone surveys in several developing countries to monitor the socioeconomic impact of the COVID-19 pandemic. The baseline rounds of the Ethiopia and Nigeria HFPS were conducted in April, shortly after the rapid spread of the pandemic and the introduction of safety protocols and restrictions. The baseline rounds of Malawi and Uganda were implemented shortly thereafter in May and June. The four HFPSs contacted households drawn from the most recent round of the Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA). The LSMS-ISA are longitudinal, face-to-face, representative surveys conducted in partnership with National Statistics Offices in the respective countries. In Ethiopia, the fourth round of the Ethiopia Socioeconomic Survey (ESS) conducted in 2018/19 served as the frame for the HFPS. The fifth round of the Integrated Household Panel Survey (IHPS) conducted in 2019 served as the frame for the Malawi HFPS. The sixth round of the Uganda National Panel Survey (UNPS) fielded in 2019/20 served as the frame for the Uganda HFPS. Lastly, the fourth round of the General Household Survey - Panel (GHS-Panel) in 2018/19 served as the frame for the Nigeria HFPS.
The four LSMS-ISA surveys that served as the frames for the HFPS are panel surveys that follow the same households over time. To facilitate recontact and tracking in the subsequent wave of the survey, interviewers collected phone numbers of up to 4 household members and 2 non-household member reference persons. It is this contact information that enabled seamless implementation of the HFPS surveys in each country. However, the availability of contact details varies between the countries, largely due to much higher mobile phone penetration in Nigeria than in Ethiopia, Malawi, and Uganda. Figure 3.1 summarizes the LSMS-ISA samples, contact information availability, and selection process for the HFPS in each country. The share of LSMS-ISA households with phone numbers (i.e. the coverage rate in Figure 3.1) ranges from 73 percent in Malawi and Uganda up to 99 percent in Nigeria. This at least partially reflects the much higher mobile phone penetration in Nigeria compared with the other three countries. 7 Figure 3.2 presents estimates of phone ownership in the four countries. Nigeria has the highest share of households who own a mobile phone at 76 percent, followed by Uganda (69 percent), Malawi (59 percent), and Ethiopia (48 percent). In all four countries, the LSMS-ISA sample households with contact information (row B in Figure 3.1) served as the frame for the HFPS.
7 Included in the share of the LSMS-ISA sample with phone numbers are households that did not provide a phone number of a household member but did provide the phone number of at least one reference person. As a result, the share with phone numbers does not necessarily reflect whether the household has a phone, but whether they provided any phone number of a member or reference person.

Survey Sample
The selection of the HFPS sample of households proceeded differently in the four countries. In Nigeria, 3,000 households were selected from the frame of 4,934 GHS-Panel households with contact details. The target number of successfully interviewed households to produce nationally representative estimates with reasonable precision was 1,800. Given the large amount of auxiliary information available in the GHS-Panel for these households, a balanced sampling approach using the cube method (Tille, 2006) was adopted for the selection of the 3,000 households for the HFPS. 8 For the first round of the Nigeria HFPS, all 3,000 sampled households were contacted and 69 percent of them were successfully reached by the interviewers (2,057 households). Of those contacted, 94 percent or 1,950 households were successfully interviewed. This yielded an overall crude response rate 9 for the sample of 65 percent.
8 The variables considered in the cube sampling approach are the same variables included in the bias correction calibration below (displayed in Table 3.2).
9 Response and contact rates reported here are "crude" in that they do not cleanly correspond to AAPOR standard definitions. This crude calculation is retained for the sake of simplicity.
For the Ethiopia HFPS, to obtain representative estimates at the national, urban, and rural level, the target sample size for the HFPS was 3,300 households: 1,300 in rural and 2,000 households in urban areas. To account for non-response and attrition, all of the 5,374 households with contact information in the ESS were called in round 1 of the HFPS. In rural areas, 1,413 households owning a phone and 771 households with a reference phone number were contacted. In urban areas, 3,213 households owning a phone and 24 households with a reference phone number were contacted. A total of 3,249 households (2,271 urban and 978 rural households) were fully interviewed, yielding an overall crude response rate of 60 percent (67% for urban and 50% for rural households).
For both Malawi and Uganda, the entire sample of households from the latest round of the LSMS-ISA which had contact information served as the HFPS sample. In Malawi, the initial HFPS sample consisted of 2,337 households while for Uganda it consisted of 2,421 households. The crude contact and response rate was very high in Uganda, where 93 percent (2,259) of sampled households were successfully interviewed. Malawi's contact and response rates were slightly higher than those of Nigeria and Ethiopia, with 74 percent (1,729) of sampled households being successfully interviewed.
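The crude rates quoted above are simple ratios of household counts, and can be reproduced with a short calculation. The sketch below uses only the overall counts reported in the text; the dictionary layout and function name are illustrative.

```python
def crude_rate(numerator, denominator):
    """Crude rate: a plain ratio of household counts (not an AAPOR-standard rate)."""
    return numerator / denominator

# Sampled and fully interviewed households, as reported in the text.
surveys = {
    "Nigeria":  {"sampled": 3000, "interviewed": 1950},
    "Ethiopia": {"sampled": 5374, "interviewed": 3249},
    "Malawi":   {"sampled": 2337, "interviewed": 1729},
    "Uganda":   {"sampled": 2421, "interviewed": 2259},
}
response_rates = {
    country: crude_rate(counts["interviewed"], counts["sampled"])
    for country, counts in surveys.items()
}

# Nigeria also reports the intermediate contact stage (2,057 households reached).
nigeria_contact_rate = crude_rate(2057, 3000)
nigeria_interviewed_given_contact = crude_rate(1950, 2057)

for country, rate in response_rates.items():
    print(f"{country}: {rate:.1%}")
```

The overall crude response rate for Nigeria is the product of the contact rate and the share of contacted households that were interviewed, which is why the two stages can be examined separately as sources of nonresponse.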

Sampling Weight and Bias Adjustments
Sampling weights were calculated in all four surveys following similar methods. The sampling weight for the most recent round of the LSMS-ISA survey served as the starting point in all four countries since these weights produce representative estimates from the full sample of households in those F2F surveys. These weights were then inflated through basic ratio adjustments at each stage of the selection and interview process from the frame to the interviewed sample (depicted in Figure 3.1). These ratio adjustments preserve the original sum of weights from the full sample of the LSMS-ISA survey. These naïve ratio adjustments do not take into account any nonrandom bias contained in the sample.
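A minimal sketch of such a chain of ratio adjustments is shown below. It assumes a single overall ratio at each stage (the actual surveys may apply the ratios within strata or domains), and the weights and eligibility flags are invented for illustration.

```python
def ratio_adjust(weights, retained):
    """Inflate the weights of retained units so the total sum of weights is preserved.

    weights:  current weights for all units at this stage (0.0 for units already dropped).
    retained: booleans, True if the unit survives to the next stage.
    """
    total = sum(weights)
    kept = sum(w for w, r in zip(weights, retained) if r)
    factor = total / kept
    return [w * factor if r else 0.0 for w, r in zip(weights, retained)]

# Hypothetical F2F sample of five households with their survey weights.
f2f_weights = [100.0, 150.0, 200.0, 250.0, 300.0]

# Stage 1: full F2F sample -> frame (households with contact information).
has_contact_info = [True, True, False, True, True]
frame_weights = ratio_adjust(f2f_weights, has_contact_info)

# Stage 2: frame -> successfully interviewed households.
interviewed = [True, False, False, True, True]
final_weights = ratio_adjust(frame_weights, interviewed)

print(sum(f2f_weights), sum(frame_weights), sum(final_weights))
```

Every retained household is scaled by the same factor at each stage, which is exactly why this step cannot correct for nonrandom loss: it restores the weight total but leaves the composition of the retained sample unchanged.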
The ratio adjustments were then followed by the weighting adjustments to reduce bias in the interviewed sample relative to the F2F representative sample. The weighting adjustment implemented in each country serves to counteract both coverage and nonresponse bias simultaneously. However, the approaches taken at this stage vary in the four countries. For the Ethiopia, Malawi, and Uganda HFPS, the bias adjustment was conducted using a propensity model approach following Himelein (2014). The probability of household response was modeled using a logistic regression. The characteristics included in the logit response probability model are presented in Table 3.2 and were selected due to observed bias in the interviewed sample along these dimensions. Full results of the logit model are provided in Appendix Table 3.1. The predicted probability of response from the model parameters was obtained for each household, the inverse of which was the basis for the bias adjustment factor. However, in order to prevent extreme weights due to the correction factor, the predicted probabilities were sorted into deciles and the mean inverse response probability within each decile taken as the final weight adjustment factor.
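One way to write down the decile-based adjustment described above is given below. The notation is introduced here for illustration and does not appear in the source: x_i is the vector of household characteristics, \hat{\beta} the estimated logit coefficients, D_d the set of households in propensity decile d, n_d their number, d(i) the decile of household i, and w_i^{ratio} the weight after the ratio adjustments. The text does not state whether the mean is taken over all sampled households in the decile or only over respondents; the sketch assumes the former.

```latex
\hat{p}_i = \frac{1}{1 + \exp\left(-x_i^{\top}\hat{\beta}\right)}, \qquad
f_d = \frac{1}{n_d} \sum_{i \in D_d} \frac{1}{\hat{p}_i}, \qquad
w_i^{\text{adj}} = w_i^{\text{ratio}} \, f_{d(i)}
```

Note that the mean of the inverse probabilities is at least as large as the inverse of the mean probability (by Jensen's inequality), so this factor is not identical to one built from the average predicted probability within a class, although the two are close when propensities within a decile are similar.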
In Nigeria, the bias adjustment was performed using the calibration approach whereby the weights were adjusted (minimizing the distance between the original and calibrated weights) to achieve the same weighted estimates across selected characteristics in the GHS-Panel sample. Table 3.2 contains the list of characteristics that were used in the calibration model. These characteristics were selected for inclusion in the calibration due to observed bias in the interviewed sample across these dimensions. Adjusting the weights to match the same profile of characteristics obtained from the fully representative GHS-Panel sample will counteract the bias along these dimensions. However, one drawback to the calibration approach is that it can be difficult to consider a large set of characteristics and achieve convergence in the calibration model. Following the bias adjustments, the weights were trimmed at the 2nd and 98th percentiles to prevent outlier weights that would result in a substantial increase in the variance of the weights. Lastly, the weights were post-stratified to match population totals in each country (following the same poststratification approach utilized in the LSMS-ISA weight calculation). For full details on the sampling and weight calculation for Ethiopia and Nigeria, see Ambel et al (2020) and McGee et al (2020), respectively. Full documentation as well as the microdata for these four surveys can be found on the World Bank's Microdata Catalogue 10 or on the LSMS website.
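The final two steps, trimming and post-stratification, can be sketched as follows. The percentile rule (linear interpolation), the strata, and the population totals are assumptions made for the example; the source does not specify them.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def trim_weights(weights, lower_q=2, upper_q=98):
    """Cap weights at the lower and upper percentile cut-offs."""
    lo, hi = percentile(weights, lower_q), percentile(weights, upper_q)
    return [min(max(w, lo), hi) for w in weights]

def post_stratify(weights, strata, population_totals):
    """Scale weights within each stratum so they sum to the stratum's population total."""
    sums = {}
    for w, s in zip(weights, strata):
        sums[s] = sums.get(s, 0.0) + w
    return [w * population_totals[s] / sums[s] for w, s in zip(weights, strata)]

# Hypothetical bias-adjusted weights with one extreme value.
weights = [50.0, 90.0, 100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 900.0]
strata = ["urban", "rural"] * 5
population_totals = {"urban": 4000.0, "rural": 6000.0}  # hypothetical totals

trimmed = trim_weights(weights)
final = post_stratify(trimmed, strata, population_totals)
```

Trimming pulls the extreme weight of 900 down to the 98th-percentile cut-off, and post-stratification then rescales each stratum so the weighted household counts agree with the external totals.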

Method of Analysis
The analysis undertaken in this study involves two steps. First, the magnitude and different sources of bias introduced in the phone survey selection and interview process are identified. Several subsamples are compared, reflecting the eligibility, selection, and interview process for the HFPS in the four countries. To measure the bias, pairwise comparisons of weighted means across a wide variety of household characteristics are carried out. In order to assess the extent of coverage bias, differences between the full F2F survey sample and the eligible sample for the HFPS (i.e. households with contact information) are examined. To measure sample representativeness in Nigeria, the profile of the selected sample is compared with the frame of eligible households. Lastly, to estimate nonresponse bias, the samples of households successfully contacted and of those successfully interviewed are compared with the overall HFPS sample. Comparing the profile of characteristics for these samples will provide a more refined impression of the bias introduced throughout the course of the phone survey process. This comparative analysis will also reveal at what stage in the selection process the bias is the most substantial.
After conducting the comprehensive assessment of bias, an assessment of the reweighting adjustments to counteract bias is conducted to see how effective these methods were at reducing bias. Weighted estimates for the diverse set of characteristics examined in the first analysis are presented for the full representative F2F survey as well as the HFPS. The F2F survey characteristics serve as the benchmark and are compared with the profile of characteristics obtained from the HFPS sample applying the weights that include the bias adjustments (as described in Section 3.1.3). Comparing these two sets of weighted characteristics will show how much bias remains in the sample after the correction. However, the relative effectiveness of the bias adjustments is further analyzed by computing a second set of HFPS weights which exclude the bias adjustments. The process of calculating these "unadjusted" weights follows exactly the same steps as the "adjusted" weights and only excludes the bias adjustment. The unadjusted weights serve to simulate the extent of bias that would have been present without the bias adjustments. Comparing the relative deviations from the F2F sample profile will provide an indication of how effective the bias adjustment techniques employed were in reducing the extent of bias in the HFPS sample. Finally, the advantages of drawing phone survey samples from representative F2F surveys are highlighted based on the performance shown using adjusted weights when compared to unadjusted weights.
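The comparison of adjusted and unadjusted weights against the F2F benchmark can be illustrated with a toy calculation. The data below are invented (a single 0/1 characteristic such as TV ownership), and the "share of bias removed" is a summary introduced here for illustration rather than a statistic reported in the study.

```python
def weighted_mean(values, weights):
    """Weighted mean of a characteristic."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Full F2F sample: 0/1 indicator for a hypothetical characteristic.
f2f_values = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
f2f_weights = [100.0] * 10
benchmark = weighted_mean(f2f_values, f2f_weights)  # 0.30 in this toy example

# Interviewed phone sample: households with the characteristic are over-represented.
hfps_values = [1, 1, 0, 1, 0, 0]
unadjusted_weights = [1000.0 / 6] * 6  # ratio adjustment only
adjusted_weights = [120.0, 120.0, 640.0 / 3, 120.0, 640.0 / 3, 640.0 / 3]  # with bias adjustment

bias_unadjusted = weighted_mean(hfps_values, unadjusted_weights) - benchmark  # about 0.20
bias_adjusted = weighted_mean(hfps_values, adjusted_weights) - benchmark      # about 0.06

# Illustrative summary: fraction of the unadjusted deviation eliminated by the adjustment.
share_removed = 1 - abs(bias_adjusted) / abs(bias_unadjusted)
print(f"unadjusted bias {bias_unadjusted:.2f}, adjusted bias {bias_adjusted:.2f}, "
      f"share removed {share_removed:.0%}")
```

Both weight sets sum to the same total, so any difference in the estimates comes from how the weight is distributed across households, not from its overall level.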
All the pair-wise comparisons of characteristic means in the analysis below are conducted following the same approach. All means are weighted using either the LSMS-ISA weights or weights calculated from the HFPS and the sampling properties of each survey are accounted for (clustering, stratification, etc.). An adjusted Wald test is used to determine if differences in the means between the different subsamples are significant.

Sources of Bias
Before examining how to correct for bias, it is important to first understand the sources of bias that enter into the phone survey sample and how that bias affects the composition of the sample. Tables 4.1a to 4.3b present weighted means for a wide array of characteristics 11 from the four LSMS-ISA F2F surveys across several different subsamples ending ultimately with the sample of successfully interviewed households in the HFPS for each country. Comparing the means across these subsamples will indicate (1) at what stage any bias is entering into the samples and (2) across what set of characteristics the samples are biased in relation to the F2F surveys. Figure 4.1 also presents estimated bias (measured by deviation from the LSMS-ISA mean) in graphical form for selected characteristics. Table 4.1a presents results comparing characteristics for the full F2F sample (columns 1, 4, 7, and 10) as well as the portion of the F2F sample that has contact information (columns 2, 5, 8, and 11) for the samples from all four countries. The portion of the sample with contact information serves as the frame of eligible households to be selected for the phone survey sample. Therefore, any differences between these two samples would indicate the presence of coverage bias brought on by exclusion of households who are ineligible (i.e. have no phone number to be contacted on). Columns 3, 6, 9, and 12 present differences between the two samples (with significant differences indicated with "*" and highlighted in red).

Coverage bias
Looking down the variety of characteristics contained in Table 4.1a for Nigeria (column 9), it is evident that there are no substantial differences between the full sample and phone survey frame, suggesting coverage bias is not a major concern for Nigeria. This is not surprising, however, since the share of households without contact information is very small (less than 1%), reflecting the relatively high penetration of mobile phones in the country.
However, the results presented in columns 3, 6 and 12 (Table 4.1a) provide a clear indication of coverage bias in the Ethiopia, Malawi and Uganda samples. Households with contact information are generally richer (as measured by per capita consumption expenditure), more likely to own key assets (TVs, refrigerators, and mobile phones), more likely to live in dwellings with improved features such as a modern roof and floor, improved water source and toilet facilities, and electricity, as well as more likely to have a financial account. In addition, the household heads in the eligible sample tend to be better educated and more likely to working in more formal wage employment (in Ethiopia and Malawi). However, when the Ethiopia sample is disaggregated into urban and rural areas (the two domains for that sample) in Table 4.1b, many of the indications of coverage bias are not observed, particularly for the urban sample. This suggests that coverage bias is not a serious concern in urban areas, but may be a more serious concern in rural areas where mobile phone penetration is much lower and as a result 38 percent of ESS rural households are ineligible (compared to just 7% of urban households).

Sample representativeness
The Nigeria HFPS included a sample selection performed on the frame (GHS-Panel households with a phone number) to reach the phone survey sample (see Section 3.1.1 above for more details on the Nigeria sample selection). This is in contrast to Ethiopia, Malawi, and Uganda where the entire frame served as the phone survey sample in those countries. As a result of this additional sampling step in Nigeria, it is important to examine just how representative the sample selected is. Unlike the issues of coverage and response, no distortions in the representativeness of the sample are expected from this sampling selection, but it is nonetheless useful to confirm this is the case. Table 4.2 presents the mean characteristics for the full GHS-Panel sample, the frame (households with contact information), and the selected sample for the Nigeria HFPS (columns 1, 2, and 3, respectively). Column 4 also contains the difference between the mean characteristics observed in the sample and the frame (with significance indicated as above). There are no significant differences for any characteristics between the sample and the frame, suggesting that the balanced sampling approach performed well, and the Nigeria HFPS sample retains the representativeness present in the frame.

Non-response bias
The last source of bias to examine is non-response. Non-response as considered here includes cases from the HFPS sample that were either (1) not successfully contacted or (2) contacted but not successfully interviewed (refused or broke off mid-interview). These two sources of non-response (non-contact and non-interview) will be considered separately here. Table 4.3a presents the comparison of the weighted mean characteristics for the HFPS sample (columns 1, 6, 11, and 16), the sample of households successfully contacted in the HFPS (columns 2, 7, 12, and 17), and the sample of households successfully interviewed in the HFPS (columns 4, 9, 14, and 19) for the four countries. Columns 3, 8, 13, and 18 present the differences in characteristics between the successfully contacted sample and the overall HFPS sample, which represent non-response bias due to the inability to contact sampled households. This reflects non-response due to poor network availability, respondents' mobile phones being switched off or non-functioning, respondents' unwillingness to answer a call from an unknown number, etc. For all four countries there appears to be some substantial bias introduced at this stage. Compared to the HFPS sample, successfully contacted households were more likely to reside in urban areas (for Ethiopia and Malawi), were wealthier (in terms of consumption expenditures, asset ownership, and housing), and had heads that were better educated. For Ethiopia, Malawi, and Uganda, the bias due to non-contact appears to further magnify the coverage bias, with many of the same indicators showing significant bias in the same direction as that for coverage bias. Non-response bias due to non-contact does appear to be less substantial for Uganda, with fewer significant differences in column 18 compared to the other countries' samples. However, this is largely a product of the much higher successful contact rate in Uganda compared to the others.
When disaggregating between the urban and rural HFPS sample in Ethiopia (shown in Table 4.3b), the extent of the detected bias due to non-contact is reduced for both samples. However, bias due to non-contact is detected across more indicators in the urban sample, suggesting that bias due to non-contact could be more prevalent for that sample. Nevertheless, the direction of the bias (in favor of wealthier households) is consistent across the urban and rural samples in Ethiopia.
Lastly, columns 5, 10, 15, and 20 in Table 4.3a present estimates of bias due to non-interview of successfully contacted households in the four countries' samples. This represents any bias due to respondent refusal to participate in the survey or a break-off during the interview. For all four countries, the bias introduced at this stage appears to be minimal, with no significant difference observed for any indicators. This is unsurprising, however, since refusals and break-offs were relatively uncommon in Ethiopia, Malawi, Uganda, and Nigeria (representing only 3%, 1%, 5%, and 1% of successfully contacted households, respectively). Minimal bias due to non-interview was also observed in both the urban and rural HFPS samples in Ethiopia in Table 4.3b.

Overall bias
The results so far have examined the separate sources of bias introduced at each stage of the phone survey selection process. Columns 3, 6, 9, and 12 in Table 4.4a present the cumulative bias for the Ethiopia, Malawi, Nigeria, and Uganda HFPS interviewed samples. It is clear that the successfully interviewed HFPS samples in all four countries are biased across many different household characteristics. Overall, the bias is skewed towards urban households (except in Uganda) as well as households that are relatively better off in terms of material well-being. Household heads that are successfully interviewed are better educated and more likely to be working in formal (wage) employment compared to the overall LSMS-ISA sample. These results are mirrored in Table 4.4b when disaggregating the Ethiopia sample into the urban and rural components.
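The relationship between the stage-specific and cumulative bias can be written as a simple telescoping identity. The notation below is introduced here for illustration only and does not appear in the original tables: $\bar{y}_{S}$ denotes the weighted mean of a characteristic in subsample $S$, and the identity assumes that the same F2F survey weights are used at every stage.

```latex
\begin{align*}
\underbrace{\bar{y}_{\text{interviewed}} - \bar{y}_{\text{F2F}}}_{\text{cumulative bias}}
  ={}& \underbrace{\left(\bar{y}_{\text{frame}} - \bar{y}_{\text{F2F}}\right)}_{\text{coverage}}
   + \underbrace{\left(\bar{y}_{\text{sample}} - \bar{y}_{\text{frame}}\right)}_{\text{sample selection}} \\
  &+ \underbrace{\left(\bar{y}_{\text{contacted}} - \bar{y}_{\text{sample}}\right)}_{\text{non-contact}}
   + \underbrace{\left(\bar{y}_{\text{interviewed}} - \bar{y}_{\text{contacted}}\right)}_{\text{non-interview}}
\end{align*}
```

Under this notation, the sample selection term is zero by construction in Ethiopia, Malawi, and Uganda, where the entire frame served as the phone survey sample.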

Testing Effectiveness of Weighting Adjustments
The results above have clearly demonstrated the bias present in the sample of successfully interviewed households in the HFPS in the four countries. This bias would compromise the external validity of the results obtained from the HFPS which would not be fully representative of the general population. Furthermore, the bias in all four countries skews towards households that are generally better off and therefore would not accurately reflect the situation for poorer households, a segment of the population that is likely most vulnerable to the COVID-19 crisis.
However, since the samples for these four surveys are derived from representative samples in the ESS, IHPS, GHS-Panel and UNPS, there is potential to reduce the bias by harnessing the detailed information available in the F2F surveys to conduct the weighting adjustments described in Section 3.1.2. Implementing these weighting adjustments will likely counteract the bias but will not fully eliminate it. This section presents results from an assessment of the effectiveness of the adjustments performed on the HFPS in each country. Table 4.5a presents three different characteristic profiles representing: (i) the full representative LSMS-ISA samples applying usual survey weights (columns 1, 6, 11, and 16), (ii) the HFPS samples applying weights that include the bias reweighting adjustment (columns 2, 7, 12, and 17), and (iii) the HFPS samples applying weights that do not include the bias reweighting adjustment (columns 4, 9, 14, and 19).
Comparing the relative gaps between the representative profile in (i) with the profile of characteristics obtained in (ii) and (iii) will provide an indication both of the effectiveness of the bias adjustment as well as the extent of bias that remains after the adjustment. The difference between the profile of characteristics in the representative sample and the HFPS sample using the adjusted and unadjusted weights was estimated and presented in Table 4.5a.
Examining first the difference between the representative sample and the HFPS sample using the unadjusted weights (columns 5, 10, 15, and 20), in Ethiopia and Malawi there is substantial bias across the majority of characteristics. For Nigeria and Uganda, the bias is still present, though across fewer indicators than in Ethiopia and Malawi. The results here using the HFPS unadjusted weights largely mirror the results from above (using the LSMS-ISA weights), indicating that there is substantial bias.
Looking at the results in columns 3, 8, 13, and 18, which present the difference between the F2F sample and the HFPS sample using the HFPS weight with the bias adjustment, there is a substantial reduction in the bias. For Nigeria (column 13) there are zero indicators that are significantly different from the GHS-Panel sample, suggesting that the bias correction using the calibration approach has been very effective at reducing the bias. In Ethiopia, Malawi, and Uganda, there are still some indications of bias; however, the bias has been substantially reduced following the weighting adjustments. Only a handful of the characteristics that were significantly biased when using the unadjusted weights are also biased when using the adjusted weights. For those characteristics that do still exhibit bias after applying the adjustment, the bias is reduced in all cases. This illustrates the effectiveness of the bias reduction methods employed in the HFPS in all four countries, but also highlights that these adjustments do not fully counteract the bias, particularly in the Ethiopia, Malawi, and Uganda HFPS.

The results in Table 4.5a have provided evidence that the bias adjustment methods employed in the HFPS in the four countries have been effective at reducing the bias present in the interviewed sample. The benefits can be further illustrated by examining not just the point estimates for the considered characteristics but also the confidence intervals. Figures 4.2 to 4.8 plot the point estimates and 95 percent confidence intervals for selected characteristics for the full LSMS-ISA (F2F) sample, HFPS sample with adjusted weights, and HFPS sample with unadjusted weights. In the figures, all estimates are standardized by subtracting the F2F survey mean (so the F2F mean will always be zero) in order to allow comparison across indicators. For household size and head age, the estimates were further standardized by dividing by the F2F mean. Thus, the confidence intervals represent the percent deviation from the mean for these two variables.
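The standardization used for the figures can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `standardize`, its `relative` flag, and the numeric inputs are hypothetical, and the confidence interval bounds are taken as given rather than estimated from survey data.

```python
def standardize(estimate, ci_low, ci_high, f2f_mean, relative=False):
    """Center an estimate and its confidence interval on the F2F mean.

    With relative=True, the centered values are also divided by the
    F2F mean, giving the deviation as a proportion of that mean (the
    treatment described for household size and head age).
    """
    centered = [x - f2f_mean for x in (estimate, ci_low, ci_high)]
    if relative:
        centered = [x / f2f_mean for x in centered]
    return tuple(centered)


# Hypothetical share-type indicator: F2F mean 0.50, HFPS estimate 0.62.
share_dev = standardize(0.62, 0.58, 0.66, f2f_mean=0.50)

# Hypothetical household size: F2F mean 5.0, HFPS estimate 4.5.
size_dev = standardize(4.5, 4.2, 4.8, f2f_mean=5.0, relative=True)

print([round(x, 3) for x in share_dev])
print([round(x, 3) for x in size_dev])
```

After this transformation, an interval that covers zero indicates an HFPS estimate that cannot be distinguished from the F2F mean at the chosen confidence level, which is what makes the indicators comparable on a single axis.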
Looking through the figures, it is clear that the adjustments for bias result in substantially improved alignment with the LSMS-ISA survey point estimates and confidence intervals. Although the level of reduction in the bias achieved varies across indicators, the adjustment for nearly all indicators does result in a shift of HFPS estimates towards the LSMS-ISA. For many indicators, the reduction in the bias from the adjustments is substantial, for example literacy in Figure 4.2. As expected, the bias adjustment was especially effective for the indicators that were included in the bias adjustments in each country (listed in Table 3.2). However, for other indicators, the reduction in bias is more modest. For example, the housing characteristics in Figure 4.5 show substantial bias without the adjustment to the weights, and the bias is significantly reduced after applying the adjustments, but there still remains a gap between the point estimates for the F2F and HFPS samples (though not statistically significant).
Comparing the results for the four countries, there is remarkable consistency in the direction of the bias across the indicators, though the magnitude of the bias varies. The bias is always towards households that are better off in terms of consumption expenditures and general living standards. This is especially evident in Figure 4.4 for consumption expenditure quintiles where the bias follows a linear pattern with the largest bias among the poorest and richest households and in opposing directions (negatively biased for the poorest and positively biased for the richest). In this case the bias adjustment is quite effective in all four countries (and was included in the adjustment model).
While Figures 4.2 to 4.8 illustrate the effectiveness of bias adjustments for dichotomous variables, Figure 4.9 goes a step further to show how effective the adjustments are for continuous welfare measures: consumption expenditures for Ethiopia, Nigeria and Uganda and wealth index for Malawi. Comparing the distribution for the full LSMS-ISA sample with that of the HFPS without the bias adjustments, modest bias can be seen in all four countries, particularly in Ethiopia and Malawi. However, the HFPS sample distribution with the bias weighting adjustments aligns much more closely with the LSMS-ISA distribution, demonstrating the effectiveness of the bias corrections across the distribution of continuous measures.
Although the bias adjustments have broadly been shown to substantially reduce bias in the estimated mean of the examined indicators, one concern is that the adjustments to weights will result in a higher variance in the weights and thus larger standard errors. Large differences between the LSMS-ISA and HFPS interviewed sample could result in extreme outlier weights following the weighting adjustments. Larger standard errors would compromise the precision of the estimates and, if the standard errors are too large, will limit the usefulness of the data. In order to counteract this potential, an additional step (outlined in Section 3.1.2) is implemented in the response propensity correction factors by sorting into deciles according to the correction factor value and applying the mean correction factor within each decile. The impact of the adjustments on standard errors can be observed in Figures 4.2 to 4.8 by comparing the width of the confidence intervals of the HFPS sample estimates with and without the bias weighting adjustments. Although some expansion of the width of the confidence intervals can be observed, in all cases the difference is minor or imperceptible. Therefore, it appears the bias corrections implemented in the four countries have not come at the cost of substantially larger standard errors.
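The decile step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `smooth_by_decile`, the toy correction factors, the use of equal-sized groups by rank, and the use of an unweighted mean within each group are all assumptions made here.

```python
def smooth_by_decile(factors, n_groups=10):
    """Replace each correction factor with the mean factor of its decile.

    Households are ranked by correction factor and split into n_groups
    groups of (nearly) equal size; every household in a group receives
    that group's unweighted mean factor.
    """
    n = len(factors)
    order = sorted(range(n), key=lambda i: factors[i])
    smoothed = [0.0] * n
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue  # fewer households than groups
        group_mean = sum(factors[i] for i in members) / len(members)
        for i in members:
            smoothed[i] = group_mean
    return smoothed


# Hypothetical correction factors for 20 households, one extreme outlier.
factors = [1.0 + 0.1 * i for i in range(19)] + [9.0]
smoothed = smooth_by_decile(factors)

print(max(factors), max(smoothed))
```

In this toy example the largest factor is pulled down because it is averaged with the other member of its decile, while the total of the factors is unchanged. This illustrates how the step dampens extreme weights without altering the overall scale of the adjustment.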
Overall, the results from this analysis have demonstrated the effectiveness of the weight adjustment techniques employed in the Ethiopia, Malawi, Nigeria, and Uganda HFPS for reducing bias in the interviewed samples. Absent these adjustments, the phone survey samples would suffer from substantial bias that would compromise the representativeness of the results obtained from the phone survey sample. The reduction in the bias from these methods highlights the distinct advantage of phone surveys taken from representative F2F household survey samples. Although difficult to effectively simulate here, the adjustments possible for RDD and telecom list-based phone surveys would likely be less effective at reducing bias than adjustments made using the rich information from the F2F representative surveys used in these four countries. Therefore, it might be expected that the effectiveness of bias adjustments in RDD and telecom list-based surveys would fall between the unadjusted and adjusted results considered here. The effectiveness of adjustments under these other phone survey sample approaches will vary depending on the availability of external information on demographic and economic characteristics of the general population and the extent to which corresponding information can be captured in the phone survey itself.

Conclusion
Several developing countries are currently implementing phone surveys in response to immediate data needs to monitor the socioeconomic impact of COVID-19. In addition to being a safe alternative during the pandemic, phone surveys have several other logistical advantages. However, they are often subject to coverage and non-response bias that can compromise the representativeness of the sample and the external validity of the estimates obtained from the survey. These biases can be more relevant to developing countries where a considerable share of the population lacks access to a phone and connectivity problems are pervasive. Using data from high frequency phone surveys in Ethiopia, Malawi, Nigeria, and Uganda, this study investigated the magnitude and source of the biases and explored the effectiveness of the techniques applied to reduce them. The study demonstrated the advantages of sampling from representative face-to-face surveys to adjust for these biases.
The study finds substantial coverage bias in Ethiopia, Malawi, and Uganda. The profile of households with contact information tended to be considerably different from the representative F2F survey. Households in the phone survey frame were more likely to be urban and richer and more likely to own key assets and to live in dwellings with improved features such as a modern roof and floor, improved water source and toilet facilities, and electricity. Coverage bias was not much of a concern in Nigeria, but this can be largely attributed to relatively higher mobile phone penetration in the country compared to Ethiopia, Malawi, and Uganda.
However, a more serious problem, common to all four countries, is non-response bias due to unsuccessful contact with the respondent. The successfully contacted sample was biased towards wealthier households with higher living standards. This bias can be largely attributed to the difficulties contacting respondents as a result of poor network reliability, the respondent's phone being turned off or unpowered, or the respondent not picking up the phone. This shows that even though higher phone penetration can lead to lower coverage bias, there is still substantial potential for bias due to unsuccessful contact with respondents in these countries. On the other hand, non-interview (refusals and break-offs) did not introduce much additional bias in any of the countries.
The overall bias found is substantial and widespread across different characteristics. The bias nearly always tends to favor wealthier households and thus poorer households are underrepresented. This direction in the bias, left unaddressed, would result in biased estimates from the interviewed sample that do not fully reflect the situation of poorer households in the country. This is a population of critical interest to policy makers since poorer households are likely most vulnerable to the negative impacts of the COVID-19 crisis. Counteracting this bias is therefore essential to ensure that the results more closely reflect the reality of the poor and provide accurate information to policy makers.
The study has shown that these phone survey biases can be substantially reduced by applying survey weight adjustments using information from the representative F2F survey from which the sample is drawn. This was demonstrated using a wide array of demographic and socioeconomic variables that are often included in standard household consumption and well-being measurement surveys. While these bias adjustments did not fully eradicate the bias across all dimensions, they were highly effective at reducing bias. This highlights one advantage to drawing phone survey samples from existing face-to-face, representative surveys rather than from RDD or lists from telecom providers where such adjustment methods can be more limited.
Phone surveys are likely to be widely considered as alternative data collection platforms in developing countries, especially in emergency situations. National statistical offices have unique opportunities to implement these techniques using their recently implemented representative F2F surveys that can serve as frames for phone surveys.

Note: Coefficients presented with standard errors in parentheses. Significance denoted * p<0.05, ** p<0.01, *** p<0.001.