Policy Research Working Paper 10179

Capturing Sensitive Information from Difficult-to-Reach Populations: Evidence from a Novel Internet-Based Survey in Yemen

Sharad Tandon
Tara Vishwanath

Poverty and Equity Global Practice
September 2022

Abstract

As conflicts across the globe escalate and data collection in these settings becomes more sensitive, policy makers and researchers are forced to turn to alternative methods for accurately collecting vital information. This paper assesses the ability of novel and anonymous internet-based surveys to elicit sensitive information in the Republic of Yemen's conflict by comparing identical sensitive and non-sensitive questions in an internet survey to a concurrent mobile phone survey. There were significant differences between the modalities in all the sensitive questions, with a greater share of respondents expressing sensitive views in the internet survey. The differences between modalities were larger for sensitive questions than for non-sensitive questions, and all the differences were qualitatively identical for subsets of the sample that are underrepresented in internet surveys. Overall, the results suggest that internet surveys can be an effective tool to use in conjunction with other techniques to acquire information that would otherwise be difficult to collect.

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at standon3@worldbank.org or tvishwanath@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Capturing Sensitive Information from Difficult-to-Reach Populations: Evidence from a Novel Internet-Based Survey in the Republic of Yemen∗

By Sharad Tandon‡ and Tara Vishwanath§

Keywords: Conflict; Measurement; Yemen
JEL Classification: D12; I31; I38; O10; O53

∗ The views expressed here are those of the authors and may not be attributed to the World Bank.
‡ The World Bank, 1818 H St. NW, Washington, D.C. 20433, USA, standon3@worldbank.org.
§ The World Bank, 1818 H St. NW, Washington, D.C. 20433, USA, tvishwanath@worldbank.org.

Section 1. Introduction

It is challenging to collect critical welfare information in the growing number of conflict-affected regions across the world (e.g., ACLED 2021). Aside from largely not being able to conduct nationally representative household surveys (e.g., Corral et al. 2020), there are also difficulties in coaxing respondents to truthfully respond to sensitive questions.
Fully truthful answers about conflict-related issues might be obscured by social desirability bias, where individuals might not like to admit support for certain groups that have been accused of human rights abuses (e.g., Fisher 1993); individuals are often targeted for opinions and actions that are contrary to the wishes of the official and de facto authorities in fragile and conflict settings (e.g., Human Rights Watch 2021); and even benign information on the humanitarian situation in these settings might be strictly controlled so as to not contradict official humanitarian assessments (e.g., Favari et al. 2022). As the extreme poor become more concentrated in fragile and conflict settings (e.g., World Bank 2018), sensitive questions about the ways in which the conflict intersects with individuals' lives are becoming arguably more important to accurately capture than information used to build traditional welfare measures, such as monetary poverty (e.g., Favari et al. 2022).

We investigate the possibility that anonymous and novel internet-based surveys might better elicit sensitive information in conflict settings. In particular, we investigate surveys using Random Domain Intercept Technology (RDIT), which invites a subset of internet users who reach wrong or dated web pages to complete a short and anonymous survey.1 Unlike either mobile phone or face-to-face surveys, these internet surveys do not capture any information about respondents. In such a setting, it is possible that respondents might worry less about the social desirability of their answers, about being targeted for their opinions, or about any of the other issues that might limit respondents' willingness to respond to sensitive questions.

Although the internet surveys used here are indeed anonymous, we investigate whether respondents on the ground actually believe that is the case and are more willing to provide sensitive information on average. We ask a number of questions about sensitive behaviors and viewpoints in the internet survey that we would expect to be underreported in less anonymous modalities. We then estimate whether a larger share of internet respondents were willing to choose the sensitive answer than in a concurrent mobile phone survey asking identical questions, and further compare these differences in sensitive questions to differences in non-sensitive questions. Importantly, this empirical approach does not assess whether all respondents in the internet survey are willing to reveal sensitive information, as some share of individuals might never openly respond to sensitive questions regardless of the modality.

We find that there are in fact large differences in all sensitive questions between the two modalities, with respondents in the internet-based survey being much more likely to choose sensitive responses than in the mobile phone survey.

1 Specifically, the surveys are performed by a company named RIWI, which purchases the domain names of old websites and those of spellings that are close to popular websites. A proprietary algorithm then determines which of the internet users that reach RIWI-owned websites are offered the chance to participate in an RDIT survey.
The results illustrate that the modalities were significantly different in instances where responses might potentially threaten the safety of respondents, such as naming parties responsible for violence witnessed by respondents; and the modalities were different in instances where potential responses were not widely socially acceptable, such as whether violence might ever be moral, or preferences over the targeting of humanitarian assistance that might be viewed by some as selfish.

On average, the share choosing the sensitive option in the internet survey was approximately 16.2 percentage points greater than in the mobile phone survey, with a maximum difference of 29.4 percentage points and a minimum difference of 9.9 percentage points. Importantly, given the relatively large percentage point differences in all sensitive questions, some of the percent differences between the modalities were especially large for sensitive viewpoints to which almost nobody admitted in the mobile phone survey.

The large differences in response to sensitive questions stand in stark contrast to the much smaller differences between the modalities in less sensitive questions. When asked whether respondents witnessed violence over the course of the conflict or about respondents' subjective feelings about safety in the past month, the responses were very similar, with an average absolute difference between the modalities of 4.75 percentage points. This is less than one-third of the average difference between the modalities in the questions that were more sensitive.

Furthermore, the attrition patterns are also consistent with the respondents in the internet-based surveys believing in the anonymity of the modality. Respondents were offered one question at a time and had to provide a response before being posed the next question. Overall attrition rates were not systematically different for sensitive and non-sensitive questions, a difference we would expect if respondents did not believe their responses were anonymous. Furthermore, attrition rates at each question were nearly identical in regions controlled by the de facto authorities in Sana'a (DFA) and the internationally recognized government (IRG), despite the fact that some of the questions were potentially more sensitive in DFA-controlled regions.

However, the ability of novel internet-based surveys to effectively elicit sensitive information relies on two critical issues. First, it is important to characterize how internet users differ from the general population to properly analyze and caveat the results. The results illustrate that the internet surveys are biased in exactly the ways one would expect- the internet respondents are slightly more educated and better off economically than mobile phone respondents, who themselves are slightly better educated than the general population. However, all results are qualitatively identical when restricting the sample in both the internet and the mobile phone surveys to be more comparable.

Second, it is important to investigate the degree to which the low response rate in the internet survey and the potentially different motives for non-response than in other modalities might be biasing the results. Although we cannot fully reject the possibility that there is unobserved sample selection, the results are consistent with the sample selection not being different from more commonly used modalities.
The answers to non-sensitive questions are similar between the internet and mobile phone surveys- both overall and when making the internet and mobile phone samples more comparable demographically; and the internet surveys are also able to identify many of the major ways in which the conflict has impacted households that have been independently reported by other sources (e.g., FAO 2017; World Bank 2017; WFP 2019; etc.).

Combined, these results suggest that novel internet-based surveys can be another important tool used to more accurately collect sensitive information. In the case of the Republic of Yemen and the strict restrictions on questionnaires, novel internet surveys are one of the only modalities in which some sensitive questions can be asked.2 Used alongside indirect survey methods, such as list experiments, endorsement experiments, and randomized response design (e.g., Glynn 2013; Blair et al. 2013; Blair et al. 2015; etc.), these surveys can expand the range of questions and types of information collected on sensitive topics; and used alongside embedding key informants in the community (e.g., Blattman et al. 2016), these surveys can be used to direct the content of and validate key informant surveys. Given the difficulty of fully validating questions on sensitive topics, the broad alignment of a large number of techniques, potentially including internet-based surveys depending on the context, is likely the best evidence on which to base policy decisions.

The rest of the paper is structured as follows. Section 2 describes existing methods typically used to collect sensitive information; Section 3 describes the difficulties in collecting a broad range of information in the Republic of Yemen's conflict; Section 4 describes the internet survey and the mobile phone survey used in the analysis; Section 5 describes the empirical strategy; Section 6 reports the empirical results; and Section 7 concludes.

Section 2. Background on Capturing Sensitive Information

There are a wide variety of techniques to try and capture sensitive information from respondents, and they broadly fall into two categories. First, there are indirect methods where researchers can infer population-level responses to sensitive questions and protect the anonymity of the respondent through subtle changes in survey questions or through the introduction of randomization of responses to survey questions. These methods include endorsement experiments (e.g., Lyall et al. 2013), list experiments (e.g., Imai 2011; Blair and Imai 2012; Glynn 2013), and randomized response design (e.g., Blair et al. 2015). In the case of endorsement experiments and list experiments, a control group is used to compare the responses of the treatment group with a slightly changed questionnaire; and in the case of randomized response design, by accounting for the known probability of the response being uninformative to the sensitive question, the researcher can infer the response to sensitive questions at the population level.

However, these indirect methods have their limitations.

2 See both the background section on the Republic of Yemen and the data section for a discussion of the difficulties we had inserting some sensitive questions in a mobile phone survey of Yemeni households conducted outside of the country.
Indirect approaches are limited in the types of things that can be asked and the specificity with which respondents can describe sometimes complicated viewpoints;3 the results of some of these approaches are highly dependent on the wording of questions and can significantly vary from experiment to experiment;4 and the inability to identify responses at the individual level limits the ability to further investigate the causes and consequences of sensitive decisions. As a consequence of some of these challenges, the estimates from indirect methods are often not very precise in the few instances that one has the ability to validate the results (e.g., Rosenfeld et al. 2015; Kramon and Weghorst 2019; etc.).

But in addition to these indirect approaches, there is a second and more qualitative approach where an individual embeds themselves in communities to build a better rapport with respondents and populations of interest, who then might reveal things that they might not have in a more typical questionnaire (e.g., Blattman et al. 2016). But this approach can be time-consuming, costly, and difficult to perform on a large scale;5 and even if one builds rapport with respondents, given the lack of anonymity, it is still possible that respondents continue to underreport sensitive behaviors.6

However, it is also important to note that many of the techniques designed to capture sensitive information have been used in more traditional settings to address issues not able to be adequately covered by a wide range of traditional data collection. As mentioned in the Introduction, fragile and conflict settings both increase the potential need to capture sensitive information and pose significant additional challenges beyond those experienced in traditional settings. For example, embedding individuals in communities to ask about personal involvement in the conflict is potentially dangerous for enumerators, and no amount of trust may enable individuals to admit opinions and actions that could place them in trouble with authorities (e.g., Human Rights Watch 2021).

3 For example, an endorsement experiment can be used to try and infer support for a particular group, but cannot be used to understand what that support consisted of, the frequency of support, and so on.
4 In the mobile phone survey analyzed here, we also included an endorsement experiment that resulted in a likely underestimate of support for parties to the conflict; and in the internet-based survey used here, we performed a series of endorsement and list experiments to evaluate support for parties to the conflict and found inconsistent results. It is difficult to know exactly why the results were not robust. For example, it is possible that some of the actions that were endorsed by parties to the conflict were deemed too important to be impacted by partisan views, but regardless of the reason, the results were not robust across wordings, contexts, and modalities.
5 For example, in Blattman et al. (2016), only a subset of the households that were part of the randomized control trial were able to be interviewed in this way.
6 There is another approach that is direct and anonymous from the enumerator, where respondents are administered self-interviewing surveys through audio recordings and tablets. However, the respondent is made aware that their responses and identifying information will be accessible to researchers, and thus their responses are not anonymous to everybody. Although these approaches have led to increases in the reporting of sensitive responses in some instances, these increases are potentially spurious and have been questioned (e.g., Park et al. 2022).
Additionally, many of the indirect methods have primarily been utilized in relatively stable environments, and some have suggested that sudden shocks and crises might limit the ability of these methods to precisely infer sensitive viewpoints (e.g., Kramon and Weghorst 2019; etc.).

Given the difficulty in validating and eliciting sensitive opinions and the added difficulties of collecting such information in fragile and conflict settings, it is important to rely on a number of different methods. In the remaining sections of the paper, we try and validate anonymous internet-based surveys that, if they prove valuable, offer advantages in conflict settings over other approaches and can be used alongside these other techniques to provide policy makers and stakeholders a more complete characterization of important, sensitive issues.

Section 3. The Increased Need to Collect Sensitive Information in the Republic of Yemen's Conflict

We analyze the validity of anonymous internet surveys in the midst of the Republic of Yemen's conflict. Importantly, this is a setting that illustrates how traditionally sensitive behaviors become more prevalent and how a wide range of routine data can become sensitive in a conflict setting. These sensitivities arise from the complex nature of the conflict, the resulting humanitarian crisis, and the large humanitarian response.

The conflict in the Republic of Yemen involves a number of different actors and motivations. The northern parts of the country- including the pre-conflict capital and territories where at least two-thirds of the population live- have been under the control of Houthi forces since late 2014 and early 2015; these forces have become known as the de facto authorities (DFA) in Sana'a (e.g., World Bank 2017). The southern parts of the country are under the control of the internationally recognized government (IRG), which has been based abroad in Riyadh and in Aden at various points of the conflict (e.g., OCHA 2019); a coalition of nine foreign governments from the Middle East and North Africa intervened beginning in March 2015 in support of the IRG, targeting DFA positions with air strikes and supporting IRG ground troops along the border of DFA-controlled territory (e.g., OCHA 2016); and both a violent secessionist movement and terrorist forces vie for control of different portions of the IRG-controlled territory (e.g., OCHA 2021).

As a result of the complex and multifaceted war that continues to unfold in the Republic of Yemen, the country has experienced a large increase in violence following the initial escalation in March 2015 (e.g., OCHA 2017; Tandon and Vishwanath 2020; etc.). However, the large number of parties to the conflict and the large amount of violence increase the sensitivity of data collection surrounding the conflict. For example, it might be difficult for individuals to openly or without qualification support certain groups in the conflict that are involved in a large amount of violence, particularly violence that affects civilians (e.g., BBC 2022). When discussing the conflict itself, traditional modalities of data collection might not fully and accurately describe the feelings of the entire population.
Furthermore, the conflict has been accompanied by both a humanitarian crisis and a forced displacement crisis. Immediately following the escalation of the conflict, the prevalence of the population with poor food access escalated from approximately 10 percent of the population prior to the conflict to nearly 60 percent in the first nationally representative assessment conducted at the end of 2016 (e.g., FAO 2017; Favari et al. 2022); and in the months following the escalation of the conflict in March 2015, approximately 10 percent of the population became forcibly displaced according to official figures, with roughly equal numbers becoming newly displaced and returning in the years that have followed (e.g., TFPM 2017; OCHA 2019; D'Souza et al. 2022a; Favari et al. 2022; etc.). Moreover, these stressors have aggravated a number of longstanding problems in the Republic of Yemen that are too sensitive to measure with traditional modalities of data collection, including reports of increased domestic and gender-based violence in the country (e.g., Oxfam 2020; the Guardian 2021; etc.); and have also led to an increase in other illegal activities, including reports of diversion of humanitarian assistance (e.g., Salisbury 2017).

In the face of such a volatile environment, data collection on all issues has become sensitive in parts of the country. In particular, DFA authorities have begun to tightly control the collection of data in regions under their control.7 This control over data collection has further made respondents in DFA-controlled regions hesitant to respond or fully express themselves in a wide variety of surveys (e.g., Almoayad et al. 2020; etc.). Thus, even what would normally be non-sensitive issues in non-conflict settings can actually become sensitive.

The inability to perform surveys at all across the entire country due to the conflict, the inability of these surveys to address all issues that are critical to policy makers, and the inability of individuals to respond truthfully to those questions even if they were posed have all created significant information gaps for the humanitarian and development response. Aside from intermittent food security assessments that are performed approximately every two years and are painstakingly negotiated with DFA authorities, and remote food security monitoring via mobile phone, there is very little household-level data collected in the country (e.g., IPC 2018; WFP 2019; IPC 2020; IPC 2022). This leaves significant gaps as to how households are able to cope with the humanitarian disaster, how the private sector has adapted to the decline in capacity and the many conflict-related macroeconomic shocks, all the ways in which the conflict intersects with Yemenis' lives, and the country that Yemenis would like to see emerge from the conflict (e.g., World Bank 2022a; World Bank 2022b).8 Importantly, truthful answers to many of these questions are inhibited both by fears about safety and also by social desirability bias.

7 Although the exact conditions placed on data collection vary from survey to survey and organization to organization, the Social Fund for Development tried to conduct a survey on the humanitarian situation in DFA-controlled regions, but was forced to abandon the survey due to a large number of restrictions. Among these restrictions, the DFA would not permit any questions to be asked about the conflict itself or anything that might traditionally be deemed sensitive, there were a large number of restrictions that interfere with the sampling and could bias the results (intentionally or not), and DFA-appointed individuals needed to accompany the survey teams. See Aghajanian and Ghorpade (2022) for details.
8 There continue to be some anonymous key informant interviews that shed some light on these issues. For example, see ICG (2021) and ACAPS (2020) for descriptions of how the private sector has survived following the de-banking of Yemeni firms in international financial systems and how food supply chains are currently working in the country, respectively.
Section 4a. Data

In order to assess the potential of anonymous internet-based surveys to elicit more sensitive information than more commonly used survey modalities, we fielded both an internet-based survey and a mobile phone survey. For the internet-based survey, we performed a Random Domain Intercept Technology (RDIT) survey, which delivers anonymous opt-in surveys to internet users. Specifically, the company that performs RDIT surveys has purchased a number of internet domain names that are close spellings of commonly used websites or are expired websites that are no longer in operation. Internet users that accidentally reach one of these misspelled or expired websites are offered the chance to participate in an anonymous survey through a series of proprietary algorithms.9

We fielded an RDIT survey for the month between January 11 and February 10, 2019. The survey was offered to 198,049 internet users, of which 5,198 completed the entire survey.10 Respondents are first informed that they have randomly been selected to participate in a survey that is voluntary and completely anonymous. In the empirical analysis, we only use completed surveys when comparing responses in the internet survey to the mobile phone survey.11,12

We also performed a mobile phone survey in April 2019. Households were first reached through random digit dialing and were given an initial automated interactive voice response (IVR) survey, where they were asked about the region in which they live and whether they would be willing to respond to an in-person mobile phone survey.

9 RDIT cannot be blocked by state surveillance or internet control and evades firewalls by operating on hundreds of thousands of rotating domains simultaneously, as opposed to surveys offered through social media that operate on a single website (e.g., RIWI 2016).
10 Some of the questions are only asked to subsets of respondents based on a response to a previous question; and the internet survey also has a module that is randomized across respondents, where one-third of respondents are posed each one of three modules. Thus, the sample sizes of questions that are compared between the surveys vary. In all cases the number of observations is listed in the analysis.
11 The results are qualitatively identical if we use all partial answers.
12 The survey also keeps track of limited amounts of metadata of respondents who are offered to participate in the survey. This information includes the browser type (e.g., Chrome, Android, etc.), the version of the browser type, the operating system, the version of the operating system, the type of device (e.g., smart phone, desktop, etc.), and the region associated with the IP address. However, the region can depend on the internet infrastructure in each country, where, for example, all IP addresses are associated with the capital city in some countries. In order to get accurate information on the region, it is important to validate this region with self-reported regions in the survey.
Then, 750 households were given the final survey that included identical questions fielded in the internet survey. The share of households in the final survey that lived in DFA- and IRG-controlled territory was equal to their share in the population, and we further limited the share that lived in the largest cities in each region (Sana'a and Aden) to the share of the total population that lived in each city.

In both surveys, we include 11 identical questions in three broad groups, which are listed in Tables 1 and 2. First, we collect demographic information related to the likelihood of responding to an internet-based survey, including age, location, highest school grade completed, subjective assessments of income, and employment in the past month; second, we collect information on non-sensitive questions that are not obviously related to the likelihood of responding to an internet-based survey; and third, we collect information on sensitive questions to which respondents in traditional surveys might not answer truthfully.

For the sensitive questions, we vary the source of sensitivity- broken up into those potentially sensitive due to social desirability bias and those potentially sensitive due to both social desirability bias and safety concerns. For example, the question asking who the respondent held accountable for the violence they had witnessed is likely sensitive due to safety concerns or the potential judgement of others for favoring another party in the conflict; and the question on whether it could ever be morally justified to engage in violence is likely sensitive because it is not socially acceptable to all to support violence. The source of sensitivity for each sensitive question is listed in Table 2.13

Additionally, for each sensitive question, Table 2 identifies the response that is likely the most sensitive for respondents to choose. For example, when asked which party to the conflict the respondent held most responsible for the violence they witnessed, the least sensitive option would be to avoid naming a party and to list "Other parties involved in the conflict." Alternatively, choosing any other party would potentially be sensitive, with the authority in control of their region potentially being the most sensitive.

Importantly, in our initial questionnaire drafts, we had originally included questions that were deemed too sensitive to even ask in a mobile phone survey, where the survey firm was worried about high attrition and about drawing the attention of authorities. These questions included those about whether individuals had supported groups in the conflict, what type of support they might have provided, and who was responsible for the start of the war.

13 There are also potential variations in the level of sensitivity of the sensitive questions. For example, one question asks whether the respondent disagrees with a humanitarian targeting plan, where the sensitivity could come from a belief that humanitarian agencies are wrong in their assessments, which is not necessarily sensitive to all individuals, or from the opinion that assistance should not be targeted at those most in need if the respondent does not receive assistance in those scenarios, which is a response that is potentially socially unacceptable to some. These potential ambiguities, or the lack of any ambiguity, are described in Table 2.
The inability to even pose basic questions about the conflict further illustrates the need for alternative modalities in such settings to more fully inform stakeholders.

Section 4b. Data Issues

Before analyzing the degree to which the anonymity of internet-based surveys allows respondents to choose more sensitive responses, it is important to address three important data issues. First, although the surveys are identified as being anonymous, it is important to further investigate the degree to which respondents might have believed in the guarantee of anonymity. By analyzing attrition rates at particular questions in the internet survey, we are better able to infer respondents' beliefs about anonymity. Respondents were offered questions one at a time, and were only offered an additional question after responding to each question already posed. If respondents did not believe the survey was anonymous, we might expect a higher share to drop out after being posed a sensitive question but before answering it.

However, Figure 1 reports the attrition rate at each question and illustrates that this was not the case.14 Although the large sample size in the survey and the resulting narrow 95 percent confidence intervals illustrate that the small differences in attrition rates across questions are statistically significant at conventional significance levels, there was not a systematic difference in attrition rates between sensitive questions and the other questions in the survey. Some of the attrition rates of sensitive questions were higher than those of non-sensitive questions, and some of them were lower; the question with the highest attrition rate- the question asking about subjective assessments of income- is non-sensitive; and Figure 2 illustrates that the attrition rate was nearly identical between DFA- and IRG-controlled regions at every question, despite the potentially higher sensitivity of some of the questions in DFA-controlled regions.

14 The attrition rates for demographic and non-sensitive questions are reported in the left panel of the figure; and the attrition rates for sensitive questions are reported in the right panel. Within each panel, the attrition rates from left to right are reported in the order in which the questions were given to respondents.
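To make the attrition comparison concrete, the following is a minimal sketch of how per-question attrition rates and their 95 percent confidence intervals can be computed from a respondent-level file. The data are simulated, and the column name, question positions, and sensitivity flags are hypothetical rather than taken from the actual survey.

```python
import numpy as np
import pandas as pd

# A minimal simulation: one row per respondent offered the survey, recording
# how many questions were answered before dropping out (0 = never started,
# 11 = completed). Positions of sensitive items are assumed for illustration.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({"answered": rng.integers(0, 12, size=n)})
sensitive_positions = {8, 9, 10, 11}  # hypothetical slots of sensitive items

for q in range(1, 12):
    at_risk = int((df["answered"] >= q - 1).sum())  # respondents who reached question q
    dropped = int((df["answered"] == q - 1).sum())  # respondents who quit at question q
    rate = dropped / at_risk
    se = np.sqrt(rate * (1 - rate) / at_risk)       # normal-approximation standard error
    lo, hi = rate - 1.96 * se, rate + 1.96 * se     # 95 percent confidence interval
    kind = "sensitive" if q in sensitive_positions else "non-sensitive"
    print(f"Q{q:2d} ({kind:13s}): attrition {rate:.3f} [{lo:.3f}, {hi:.3f}]")
```

With real data, comparing these rates across the sensitive and non-sensitive groups (and across DFA- and IRG-controlled regions) reproduces the comparison plotted in Figures 1 and 2.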
Second, internet users differ from the general population and from the samples of other more commonly used survey modalities in important ways that can limit the representativeness of the results. By quantifying the differences between internet surveys, other survey modalities, and the general population, we can begin to caveat the results and to provide important robustness checks for analysis that relies upon internet-based surveys. Although it is difficult to fully quantify differences between the internet surveys and the entire population given the lack of traditional data in the country, we are able to compare difficult-to-adjust demographic variables in the internet survey to nationally representative data from before the conflict, including age, sex, highest education level achieved, and population by governorate.15

The summary statistics from each survey are presented in Table 3 and suggest exactly what one might expect of internet surveys- respondents to the internet surveys are younger, less likely to be female, more concentrated in urban areas, and tend to be more educated than the general population. However, mobile phone surveys are biased in the exact same ways, with little difference between the mobile phone surveys and internet surveys in the share of respondents that are female and the average age of respondents. But the respondents to the internet survey do tend to be even more skewed towards better education, higher incomes, and a higher likelihood of working than respondents to the mobile phone survey.16,17

These differences suggest that one of the most important caveats of analyses using internet-based surveys is the need to account for the overrepresentation of certain groups in the survey modality. Given the difficulty of trying to accurately measure the income and employment status of the population to properly re-weight the surveys to make them more representative of the general population, we focus on ensuring that all differences between the internet and mobile phone surveys are qualitatively identical even when restricting the samples of each survey to be more comparable. For example, in our analysis estimating the difference in responses to sensitive questions between the two modalities, we further check whether that difference survives restricting both modalities only to lower income households, those located in less urban governorates, older respondents, and so on. Similarly, if one were to use the modality to estimate changes in response to large shocks, it is important to separately estimate the change for households that are over- and underrepresented in internet-based surveys.

And lastly, we also have to address the possibility that the significant non-response rates in the internet-based surveys might result in significant non-observable sample selection that also limits the generalizability of the results. Although the response rates of random digit dialing mobile phone surveys similar to the one used here are also very low,18 it is important to better understand and verify that the low response rates are not driving the differences between the internet and mobile phone surveys.

15 Despite the onset of a forced displacement crisis beginning in March 2015, official figures suggest that approximately 10 percent of the population was currently displaced at the time of the surveys analyzed here, there likely has been limited migration outside the governorate where households lived prior to the escalation of the conflict, and there has also been a high share of displaced households that did not move outside their district (e.g., TFPM 2017; OCHA 2021; D'Souza et al. 2022a; D'Souza et al. 2022b).
16 The largest difference in the table is in the share of non-working respondents who are not currently looking for work. Given the much smaller difference in the shares of households that are working, the difference is driven by the omitted category- a larger share of internet respondents who are not working and are not looking for work (e.g., students, retired, etc.). Although this is difficult to precisely interpret, it could be consistent with individuals who are not working or looking for work having more time to use the internet.
17 Although there are no comparison questions in the 2014 HBS for the subjective income assessment and the question on employment, it is likely that the mobile phone respondents have higher incomes and are more likely to work than the general population. Although this is difficult to precisely establish, reports derived from the most recent household surveys in the Republic of Yemen suggest that mobile phone respondents have better food access than the general population (e.g., IPC 2020; Favari et al. 2022; IPC 2022). However, the face-to-face survey that was conducted with the least amount of political and conflict interference at the end of 2016 suggests that food access of mobile phone users was similar to that of a survey that was attempting to be nationally representative in a difficult environment (e.g., FAO 2017; WFP 2017; etc.). Only in later years, as the political interference in the face-to-face food security assessments likely increased, did the wedge between food security assessments and the food access of mobile phone users become apparent (e.g., Favari et al. 2022).
18 Although we are unable to calculate the response rates exactly due to how the RDD survey was conducted, benchmark response rates in such surveys are 15-20 percent (e.g., World Bank 2020), which are levels at which the same non-response issues are critical for mobile phone surveys as well.
We investigate the potential of non-response driving the results in two ways. First, we compare the internet surveys to pre-conflict and other post-conflict data sources to ensure that the surveys are detecting at least some of the large changes in well-being that have been independently reported by other data sources. The internet surveys corroborate a number of these independently reported patterns. Food insecurity drastically rose following the start of the conflict, from approximately 10 percent of the population to between 40 and 60 percent of the population during the time period under analysis (e.g., IPC 2015; Favari et al. 2022), and 58.5 percent of internet respondents report having had trouble purchasing food in the market in the month before the survey;19 underemployment drastically escalated following the start of the conflict, with the majority of households critically relying on humanitarian assistance as the primary source to afford necessary purchases (e.g., IPC 2017; IPC 2018; WFP 2019; Favari et al. 2022), and only 18 percent of internet respondents reported working three or more weeks in the month before the survey; and violence has drastically escalated since the start of the conflict and affected more regions of the country by the time period under analysis here than in the first months of the escalation (e.g., Tandon and Vishwanath 2020; OCHA 2021), and 77 percent of respondents in the internet survey either witnessed violent incidents themselves or know someone who has witnessed violence. Combined, these results are consistent with sample selection not precluding the internet surveys from detecting many of the large changes that have occurred since the start of the conflict.

Second, in the basic empirical strategy that we present in the next section, we also investigate whether differences in unobservable sample selection between internet and mobile phone surveys make it too difficult to compare the two modalities. Specifically, we compare the difference between internet and mobile phone surveys in sensitive questions to the difference in non-sensitive questions, where there is less motivation to answer differently between the two modalities. A small difference between the modalities in the non-sensitive questions would further be consistent with the sample selection between the two modalities being similar.

19 The question distinguished whether food was available for purchase in the market. In total, 50.5 percent of respondents had trouble purchasing food in the past month even though food was available to purchase, 7.9 percent had trouble purchasing food because food was not available to purchase, and 41.5 percent did not have trouble purchasing food from the market in the past month.
Section 5. Baseline Empirical Strategy

As discussed above, we assess the ability of novel and anonymous internet-based surveys to elicit more sensitive responses relative to mobile phone surveys by comparing the difference between the modalities in sensitive questions to the difference in non-sensitive questions. First, we create a series of indicator variables from both the sensitive and non-sensitive questions, and then stack the responses from the internet survey and from the mobile phone survey for each individual question. For sensitive questions, we make the answer to each indicator function the more sensitive option as identified in Table 2 (i.e., an indicator equaling one if the individual named a party to the conflict as being responsible for violence witnessed and did not choose the other option). We then estimate the following specification for each individual question:

(1)   $\text{Indicator}^j_i = \beta^j_0 + \beta^j_1 \, \text{Internet}_i + \varepsilon^j_i$

where $\text{Indicator}^j_i$ denotes the response to indicator function $j$ for individual $i$; and $\text{Internet}_i$ denotes an indicator equal to one if the response was from the internet survey. In this specification, $\beta^j_0$ denotes the share of the respondents to the mobile phone survey that answered affirmatively to question $j$, and $\beta^j_1$ denotes how much larger the share was for the respondents to the internet survey. If the internet surveys were better able to elicit sensitive information, we would expect $\beta^j_1 > 0$ for sensitive questions and the magnitude of $\beta^j_1$ to be larger for sensitive questions than for non-sensitive questions (i.e., $\beta^{sens}_1 > |\beta^{non\text{-}sens}_1|$).20

Importantly, as discussed in the Introduction, an estimate of $\beta^j_1 > 0$ for sensitive questions can still be consistent with a segment of internet respondents that resist choosing sensitive options in the internet survey. Rather, what the specifications are estimating is whether there is a larger share of respondents that are willing to admit sensitive opinions or viewpoints in the internet surveys than in the mobile phone surveys. Thus, similar to other techniques designed to elicit sensitive information from respondents, these estimates could still be consistent with underreporting of sensitive behaviors and viewpoints.

20 Although we estimate the difference separately in the main text, in the appendix we jointly estimate the difference across questions and formally compare the difference for sensitive and non-sensitive questions. See Appendix 1.
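As a concrete illustration of specification (1), the following is a minimal sketch of the estimation as a linear probability model on stacked data from the two modalities. The data are simulated, and the column names, sample split, and illustrative shares are hypothetical; this is not the authors' actual estimation code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical stacked data for one question j: one row per respondent,
# pooling both modalities. 'chose_sensitive' is the indicator coded as in
# Table 2; 'internet' equals one for internet responses, zero for phone.
stacked = pd.DataFrame(
    {"internet": np.r_[np.ones(5198, int), np.zeros(750, int)]}
)
share = np.where(stacked["internet"] == 1, 0.45, 0.25)  # illustrative shares only
stacked["chose_sensitive"] = rng.binomial(1, share)

# Linear probability model: the intercept estimates beta_0^j (the mobile
# phone share choosing the sensitive option) and the coefficient on
# 'internet' estimates beta_1^j (how much larger that share is online).
result = smf.ols("chose_sensitive ~ internet", data=stacked).fit(cov_type="HC1")
print(result.params)
print(result.conf_int())
```

Estimating the model question by question, as in the paper, amounts to repeating this regression for each stacked indicator.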
Section 6a. Differences in the Responses to Sensitive and Non-Sensitive Questions

The empirical results are consistent with the novel and anonymous internet-based surveys used here being able to elicit more sensitive information than mobile phone surveys. Table 4 reports estimates of specification (1) for all non-demographic information that is common between the two surveys. The estimates for sensitive questions are reported in columns (1)-(4); and the estimates for non-sensitive questions are reported in columns (5)-(8).

A number of important patterns emerge from Table 4. First, for sensitive questions, respondents were significantly more likely to choose the sensitive option in the internet survey than in the mobile phone survey. The estimates of β1 in columns (1)-(4) of Table 4 are all greater than zero, large in magnitude, and statistically significant at the one percent level. The largest difference between the modalities is 29.4 percentage points, and for that question one can rule out a difference smaller than 24.4 percentage points at conventional significance levels;21 and the smallest difference between the two modalities in the sensitive questions is 9.9 percentage points, and for that question one can rule out a difference smaller than 6.4 percentage points.22

Second, in addition to the percentage point differences being large in magnitude, the percent differences are also large. For columns (1)-(3) of Table 4, the percent increase in the share choosing the sensitive option ranges between 36.4 and 62.6 percent. However, the percent increase in column (4)- the share that did not care about the targeting of humanitarian assistance as long as they themselves received assistance- was especially large. Essentially nobody in the mobile phone survey admitted to having this preference. But a non-negligible share- 9.9 percent- admitted to it in the internet survey, which represents a 990 percent increase. Thus, even in the case where the percentage point difference between the surveys was one of the smallest among sensitive questions, the percent increase was the largest.

Third, the difference between the modalities in the sensitive questions was significantly larger than the difference in non-sensitive questions. The average difference between the modalities for sensitive questions in columns (1)-(4) was 3.4 times greater than the average of the absolute differences in columns (5)-(8);23 the largest percentage point difference in the sensitive questions (column 1) was over twice as large as the largest difference for the non-sensitive questions (column 6); the smallest difference in the sensitive questions was still 9.9 percentage points (columns 3 and 4), while the smallest difference for the non-sensitive questions was essentially zero (columns 5 and 8); and the differences in the sensitive questions were all statistically significant at the 1 percent level, while two of the four differences in the non-sensitive questions were not statistically different from zero at conventional significance levels and were very low in magnitude.24

And fourth, despite the strong average difference reported across sensitive questions in Table 4, the results also illustrate that the difference between modalities was not equal across the sensitive questions. The largest difference between the modalities was in the question that was potentially sensitive due to both social desirability bias and the possibility of the respondent's safety being threatened- the question asking respondents to name the party most responsible for the violence they witnessed. The point estimate was almost twice as large as that of the sensitive question with the second largest difference between the modalities- the question on whether violence could ever be morally justified; and the 95 percent confidence interval of the estimate of the difference in naming the party responsible did not overlap with the 95 percent confidence interval of the differences in any of the other sensitive or non-sensitive questions in columns (2)-(8).25 Although it is difficult to make strong inferences from a single question, the potential difference in responses based on the source of sensitivity is an issue that needs to be further investigated.

21 This is the lower bound of the 95 percent confidence interval on the estimate of β1 in column (1) of Table 4.
22 This is the lower bound of the 95 percent confidence interval on the estimate of β1 in column (3) of Table 4.
23 The average of columns (1)-(4) was 16.2 percentage points; the average of the absolute differences in columns (5)-(8) was 4.75 percentage points.
24 The larger difference in sensitive than non-sensitive questions between the modalities survives more formal comparisons as well. See Appendix 1.
25 See Appendix 1 for a formal specification illustrating the larger difference by adding a triple interaction term to the formal comparison made in Appendix 1.
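To spell out the arithmetic behind the percentage point and percent comparisons above, a short worked example follows; the roughly 1 percent mobile phone base in the second line is inferred from the reported 990 percent increase rather than read directly off Table 4.

```latex
% Percentage point difference vs. percent difference between modalities:
\Delta_{\mathrm{pp}} = p_{\mathrm{internet}} - p_{\mathrm{phone}}, \qquad
\Delta_{\%} = \frac{p_{\mathrm{internet}} - p_{\mathrm{phone}}}{p_{\mathrm{phone}}} \times 100.
% With a mobile phone share of roughly 1 percent and a 9.9 point gap:
\Delta_{\mathrm{pp}} = 9.9 \text{ points}, \qquad
\Delta_{\%} \approx \frac{9.9}{1.0} \times 100 = 990 \text{ percent}.
```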
Importantly, the results survive an important robustness check: the differences between the modalities in sensitive questions are not being driven by the observable differences in the internet and mobile phone samples. Table 5 illustrates that when restricting the samples in the internet and mobile phone surveys to be more comparable using all the demographic information available, the sensitive questions continue to have much larger differences between the modalities than the non-sensitive questions. Specifically, Table 5 illustrates that these results survive restricting the sample to those groups that are underrepresented in the internet survey- to only those with above the median age in the internet survey (above 26), to only female respondents, to only those who do not live in the two largest cities in the country, to those whose subjective income levels are not sufficient to live comfortably, to those who worked less than three weeks in the past month, and to those whose highest school level completed was either primary school or below.26

26 The robustness of the baseline results when restricting the sample further survives in the more formal comparison in Appendix 1. Furthermore, the results are qualitatively identical when interacting respondent characteristics with the main effect and including the whole sample.
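The following is a minimal sketch of the Table 5 exercise: re-estimating the modality gap within each group that is underrepresented online. The data are simulated, and every column name, cutoff, and subgroup definition is hypothetical, chosen only to mirror the restrictions described in the text.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 6000

# Hypothetical pooled data carrying the demographics used in the Table 5
# restrictions; the illustrative shares are not taken from the paper.
d = pd.DataFrame({
    "internet": rng.integers(0, 2, n),
    "age": rng.integers(18, 60, n),
    "female": rng.integers(0, 2, n),
    "major_city": rng.integers(0, 2, n),
    "low_education": rng.integers(0, 2, n),
})
d["chose_sensitive"] = rng.binomial(1, 0.25 + 0.15 * d["internet"])

# Re-estimate the modality gap within each underrepresented group.
subsets = {
    "age above 26": d["age"] > 26,
    "female respondents": d["female"] == 1,
    "outside largest cities": d["major_city"] == 0,
    "primary school or below": d["low_education"] == 1,
}
for label, mask in subsets.items():
    fit = smf.ols("chose_sensitive ~ internet", data=d[mask]).fit(cov_type="HC1")
    print(f"{label:24s}: beta_1 = {fit.params['internet']:.3f}")
```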
Section 6b. Inferring Potential Sources of Sensitivity and Discussion

Although the results illustrate strong differences between internet and mobile phone surveys in the responses to sensitive questions, the differences between the surveys can potentially be further used to make inferences about the sources of sensitivity, which potentially has implications for the conflict itself and the type of country that Yemenis might want to see emerge from the conflict. Specifically, we investigate differences in the magnitude of the baseline estimates across regions where the sources of sensitivity potentially differ.

As discussed in the background section, given the overall greater sensitivity around data collection in DFA-controlled regions, it is possible that many of the sensitive questions might be more sensitive in DFA-controlled regions as well. If that were the case, we might expect there to be significantly larger deviations in sensitive questions between the internet and mobile phone surveys in DFA-controlled regions.

However, we illustrate that this is surprisingly not the case. Table 6 re-estimates specification (1) for all sensitive questions, but further allows the difference between the modalities to vary based on whether the respondent lives in regions under DFA control. The estimates are low in magnitude, vary in sign, and are not precisely estimated.

Although these results are difficult to precisely interpret, they are consistent with questions being equally sensitive across the entire country. It is possible that individuals in regions controlled by the IRG might worry about the DFA eventually taking over more territory or the entire country, and do not want there to be a record of them being critical of the DFA; or it is possible that the primary driver of the sensitivity could be that individuals are afraid of naming groups responsible for violence they witnessed because they do not want to be seen as being implicitly supportive of another group. More investigation can help to address some of these uncertainties and better understand the sources of sensitivity in the Republic of Yemen's conflict, potentially using internet surveys along with the other techniques described in Section 2.
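The Table 6 exercise augments specification (1) with an interaction term. The following is a minimal sketch of that regression on simulated data; the column names and shares are hypothetical, and the sketch is not the authors' actual code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 6000

# Hypothetical pooled data; 'dfa' flags respondents in DFA-controlled regions.
d = pd.DataFrame({
    "internet": rng.integers(0, 2, n),
    "dfa": rng.integers(0, 2, n),
})
d["chose_sensitive"] = rng.binomial(1, 0.25 + 0.15 * d["internet"])

# Specification (1) augmented with a DFA interaction: the coefficient on
# 'internet:dfa' asks whether the modality gap is larger in DFA-controlled
# regions, which is the comparison reported in Table 6.
fit = smf.ols("chose_sensitive ~ internet * dfa", data=d).fit(cov_type="HC1")
print(fit.params[["internet", "dfa", "internet:dfa"]])
```

An interaction estimate near zero, as the paper reports, is consistent with the questions being similarly sensitive in both regions.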
Section 7. Conclusion

We investigate the ability of novel and anonymous internet-based surveys to potentially elicit more sensitive information from respondents by comparing the results of an internet survey to a concurrent mobile phone survey. Importantly, we illustrate how collecting sensitive information becomes significantly more important in conflict settings- both because a wide range of typically sensitive behaviors become more important, and because a wider range of information that is critical to well-being and the conflict itself becomes difficult to collect. The results demonstrate that there were stronger differences in the responses to sensitive questions than in non-sensitive questions, and the differences were driven by internet respondents being more likely to choose the sensitive option. Combined, the results suggest that anonymous internet surveys can be an additional tool to collect information that individuals would likely underreport in other survey modalities.

However, there are a number of caveats to these results and future work to be done. First, although internet respondents are more likely to choose the sensitive option in this setting, that does not mean that there is not still a subset of internet respondents that are purposely not admitting to sensitive preferences, opinions, and actions. Additional work needs to be done to investigate what share of individuals might still not fully respond to sensitive questions in these types of surveys, potentially by comparing responses to internet surveys to observable outcomes (e.g., election outcomes, market outcomes, etc.) or by using internet surveys along with key informant interviews and other techniques to try and better triangulate how people feel about sensitive issues.

Additionally, this is a single conflict setting in which pre-conflict mobile phone usage and internet usage were high relative to conflicts in other parts of the world.27 Furthermore, the surveys used here were small, and larger and more detailed surveys could better vary the source of the sensitivity, could better compare different issues where the source of sensitivity is similar but the degree of sensitivity might vary substantially, and so on. Additional work can shed light on these and other issues, and further illustrate how these results might generalize to other conflict settings.

27 For example, see estimates of penetration of mobile phones and internet in the World Development Indicators at https://datatopics.worldbank.org/world-development-indicators/.

References

ACAPS. 2020. "Yemen Food Supply Chain." Report, ACAPS, Amman, Jordan.

ACLED. 2021. "Global Conflict and Disorder Patterns 2021." Report, ACLED, https://acleddata.com/acleddatanew/wp-content/uploads/2020/02/Global-Conflict-and-Disorder-Patterns_MSC_Feb11_Final.pdf.

Aghajanian, A., and Y. Ghorpade. 2022. "Yemen Human Development Survey- Report." Report, World Bank.

Almoayad, S., E. Favari, S. Halabi, S. Krishnaswamy, A. Music, and S. Tandon. 2020. "Using Remote and Non-Traditional Data Collection to Better Understand the Current State of Education in Yemen." Report, World Food Programme and World Bank. https://reliefweb.int/report/yemen/education-yemen-utilizing-remote-and-non-traditional-data-collection-better-understand.

BBC. 2014. "How Yemen's Capital Sana'a Was Seized by Houthi Rebels." Report, BBC, https://www.bbc.com/news/world-29380668.

BBC. 2022. "Yemen: Why is the War There Getting More Violent?" Report, BBC, https://www.bbc.com/news/world-middle-east-29319423.

Blair, G., C. Fair, N. Malhotra, and J. Shapiro. 2013. "Poverty and Support for Militant Politics: Evidence from Pakistan." American Journal of Political Science 57, 30-48.

Blair, G., and K. Imai. 2012. "Statistical Analysis of List Experiments." Political Analysis 20, 47-77.

Blair, G., K. Imai, and Y. Zhou. 2015. "Design and Analysis of Randomized Response Technique." Journal of the American Statistical Association 110 (511), 1304-1319.

Blattman, C., J. Jamison, T. Koroknay-Palicz, K. Rodrigues, and M. Sheridan. 2016. "Measuring the Measurement Error: A Method to Qualitatively Validate Survey Data." Journal of Development Economics 120, 99-112.

Corral, P., A. Irwin, N. Krishnan, D. Mahler, and T. Vishwanath. 2020. "Fragility and Conflict: On the Front Lines of Poverty Reduction." Report, World Bank.

D'Souza, A., E. Favari, S. Krishnaswamy, and S. Tandon. 2022a. "Consequences of Forced Displacement: Evidence from Yemen." Policy Research Working Paper WPS10176, World Bank, Washington, D.C.

D'Souza, A., E. Favari, S. Krishnaswamy, and S. Tandon. 2022b. "How Does Violence Force Displacement in Active Conflict? Evidence from Yemen." Policy Research Working Paper WPS10177, World Bank, Washington, D.C.

FAO. 2017. "Emergency Food and Nutrition Security Assessment." Report, Food and Agriculture Organization of the United Nations, Rome, Italy.

Favari, E., S. Krishnaswamy, and S. Tandon. 2022. "Surviving in the Time of War." Report, World Bank and the World Food Programme, Washington, D.C. and Rome.

Fisher, R. 1993. "Social Desirability Bias and the Validity of Indirect Questioning." Journal of Consumer Research 20 (2), 303-315.

Glynn, A. 2013. "What Can We Learn with Statistical Truth Serum? Design and Analysis of the List Experiment." Public Opinion Quarterly 77, 159-172.

The Guardian. 2020. "Four Journalists in Yemen Sentenced to Death for Spying." Report, Sana'a, https://www.theguardian.com/world/2020/apr/11/four-journalists-in-yemen-sentenced-to-death-for-spying.
"Women Face Rising Violence Amid War in Yemen." Report, The Guardian, https://www.theguardian.com/global-development/2021/feb/22/he-treated-me-as-a-slave-women-face-rising-violence-amid-war-in-yemen.

Human Rights Watch. 2016. "Yemen: Abusive Detention Rife Under Houthis." Report, Human Rights Watch, https://www.hrw.org/news/2016/11/17/yemen-abusive-detention-rife-under-houthis.

Human Rights Watch. 2021. "World Report 2021." Report, Human Rights Watch, New York.

ICG. 2017. "Brokering a Ceasefire in Yemen's Economic Conflict." Report, International Crisis Group, New York, https://www.crisisgroup.org/middle-east-north-africa/gulf-and-arabian-peninsula/yemen/231-brokering-ceasefire-yemens-economic-conflict.

IPC. 2017. "IPC Map of Yemen: March – July, 2017." Integrated Food Security Phase Classification Report, Food Security Cluster, Humanitarian Response, Yemen.

IPC. 2018. "IPC Map of Yemen: December 2018 – January 2019." Integrated Food Security Phase Classification Report, Food Security Cluster, Humanitarian Response, Yemen.

IPC. 2020. "Acute Food Insecurity Situation October – December 2020 and Projection for January – June 2021." Integrated Food Security Phase Classification Report, Food Security Cluster, Humanitarian Response, Yemen.

IPC. 2022. "Acute Food Insecurity Situation January – May 2022 and Projection for June – December 2022." Integrated Food Security Phase Classification Report, Food Security Cluster, Humanitarian Response, Yemen.

Imai, K. 2011. "Multivariate Regression Analysis for the Item Count Technique." Journal of the American Statistical Association 106 (494), 407-416.

Kramon, E., and K. Weghorst. 2019. "(Mis)Measuring Sensitive Attitudes with the List Experiment." Public Opinion Quarterly 83, 236-263.

Lyall, J., G. Blair, and K. Imai. 2013. "Explaining Support for Combatants during Wartime: A Survey Experiment in Afghanistan." American Political Science Review 107, 679-705.

OCHA. 2016. "Yemen: 2017 Humanitarian Needs Overview." Report, United Nations Office for the Coordination of Humanitarian Affairs, Sana'a, Yemen.

OCHA. 2017. "Yemen: 2018 Humanitarian Needs Overview." Report, United Nations Office for the Coordination of Humanitarian Affairs, Sana'a, Yemen.

OCHA. 2018. "Yemen: 2018 Humanitarian Needs Overview." Report, United Nations Office for the Coordination of Humanitarian Affairs, Sana'a, Yemen.

OCHA. 2019. "Yemen: 2019 Humanitarian Response Plan." Report, United Nations Office for the Coordination of Humanitarian Affairs, Sana'a, Yemen, https://reliefweb.int/report/yemen/2019-yemen-humanitarian-response-plan-january-december-2019-enar.

OCHA. 2021. "Humanitarian Needs Overview 2021." Report, Sana'a, Yemen, https://reliefweb.int/sites/reliefweb.int/files/resources/Yemen_HNO_2021_Final.pdf.

Oxfam. 2020. "Connecting Voices and Action to End Violence Against Women and Girls in the MENA Region." Report, Oxfam and Naseej, https://www.oxfamitalia.org/naseej/.

Park, D., S. Aggarwal, D. Jeong, N. Kumar, J. Robinson, and A. Spearot. 2022. "Private but Misunderstood? Evidence on Measuring Intimate Partner Violence via Self-Interviewing in Rural Liberia and Malawi." Policy Research Working Paper WPS10124, World Bank, Washington, D.C.

RIWI. 2016. "Predicting the Vote Share in the 2016 Presidential Election." Report, RIWI, https://riwi.com/wp-content/uploads/2016/12/2016USElection_PredictionMethodology_FI

Rosenfeld, B., K. Imai, and J. Shapiro. 2015.
"An Empirical Validation Study of Popular Survey Methodologies for Sensitive Questions." American Journal of Political Science 60 (3), 783-802.

Salisbury, P. 2017. "Yemen's War Profiteers Are Potential Spoilers of the Peace Process." Report, Sana'a Center for Strategic Studies, Sana'a, Yemen.

TFPM. 2017. "Yemen: TFPM – 14th Report." Report, IOM and UNHCR, Sana'a, Yemen, https://reliefweb.int/report/yemen/yemen-task-force-population-movement-tfpm-14th-report-may-2017-enar.

Tandon, S., and T. Vishwanath. 2020. "The Evolution of Poor Food Access Over the Course of the Conflict in Yemen." World Development 130, 104922.

WFP. 2017. "Yemen mVAM Bulletin, March 2017." Report, World Food Programme, Cairo, Egypt, https://reliefweb.int/report/yemen/yemen-mvam-bulletin-20-march-2017-food-security-indicators-are-poor-raymah-ad-dali-and.

WFP. 2019. "Yemen mVAM Bulletin, March 2019." Report, World Food Programme, Cairo, Egypt, https://vam.wfp.org/sites/mvam_monitoring/yemen.html.

Wiggins, S., S. Levine, M. Allen, M. Elsamahi, V. Krishnan, I. Mosel, and N. Patel. 2021. "Livelihoods and Markets in Protracted Conflict: A Review of Evidence and Practice." SPARC Knowledge Report, Supporting Pastoralism and Agriculture in Recurrent and Protracted Crises (SPARC), https://www.sparc-knowledge.org/node/51/printable/pdf.

World Bank. 2017. "Yemen Poverty Notes." Report, World Bank, Washington, D.C., https://www.worldbank.org/en/…poverty-notes-june-2017.

World Bank. 2018. "Piecing Together the Poverty Puzzle: Poverty and Shared Prosperity 2018." Report, World Bank, Washington, D.C.

World Bank. 2020. "Mobile Phone Surveys of Households to Assess the Impacts of COVID-19: Guidelines on Sampling Design." Report, World Bank, https://documents1.worldbank.org/c…on-Sampling-Design.pdf.

World Bank. 2022a. "Country Engagement Note for the Republic of Yemen." Country Engagement Note, World Bank, https://documents1.worldbank.org/curated/en/351911650316441159/p…Country-Engagement-Note-for-the-Period-FY22-FY23.pdf.

World Bank. 2022b. "Glimmers of Hope During Dark Times: Country Economic Memorandum for the Republic of Yemen." Country Economic Memorandum, World Bank.

Table 1. Demographic and Non-Sensitive Questions in both the Internet and Mobile Phone Surveys

Demographic questions related to the likelihood of responding to an internet survey:
- What is your age? (Number)
- What is your sex? (Female; Male)
- What district do you live in? (Open response)
- What is the highest level of education you completed? (No schooling; Less than primary; High school; Post-secondary vocational; Bachelor's; Master's or higher)
- What best describes your household income? (Living comfortably; Difficult to get by; Critical: trouble even paying for food)
- Did you work outside of the household in the past month? (Yes, less than 3 weeks; Yes, 3 weeks or more; No, I am looking for work; No, I am not looking for work)

Non-sensitive questions:
- Have you witnessed shootings or bombings during the war? (Yes; No, but I know someone who has; No, not at all)
- Compared to last month, how safe do you feel in your area? (Safer; No significant change; Less safe)

Table 2. Sensitive Questions in Both Internet and Phone Surveys

For each question, the response options are listed, followed by the explanation of the question's sensitivity and the option(s) respondents would most try to avoid.

Question 1: Which party do you hold most responsible for any violence you've witnessed?
- Responses: Specific parties to the conflict; Other parties involved in the conflict.
- Sensitivity: Potential safety concerns arising from supporting parties in the conflict, and potential social desirability bias from supporting groups involved in violence against civilians.
- Option(s) respondents would most try to avoid: All options aside from "Other parties."
Question 2: Do you agree or disagree with the following statement? "The use of violence for a noble cause can be morally justified."
- Responses: Agree; Disagree.
- Sensitivity: Potential social desirability bias, where respondents might be hesitant to admit they could support violence in certain instances.
- Option respondents would most try to avoid: Agree.

Question 3: Many people in all governorates of Yemen do not have enough food. Donors have identified 3 governorates that will receive double the food assistance of the rest of the country. Would you support or oppose this plan?
- Responses: Support this plan; Oppose this plan.
- Sensitivity: Potential social desirability bias, where respondents might be hesitant either to admit that they disbelieve humanitarian authorities and their assessment of where need is greatest, or to admit that they would not support more assistance going to those with the most needs.
- Option respondents would most try to avoid: Oppose this plan.

Question 4: Which of these food assistance scenarios would you most support?
- Responses: Equal food assistance across the entire country; Does not matter as long as your household receives assistance.
- Sensitivity: Potential social desirability bias, where people might not like to admit that they only care about receiving food assistance themselves.
- Option respondents would most try to avoid: Does not matter as long as your household receives assistance.

Notes: The question about the group the respondent holds responsible for violence was part of a rotating module offered to one-third of internet respondents, which was 1,889 households; all mobile phone respondents were posed the question. Furthermore, the question was only posed to individuals who had witnessed violence or knew someone who had during the conflict, which was 77 and 82 percent of respondents in the internet and mobile phone surveys, respectively. Additionally, the last question, about which of the food assistance scenarios the respondent would most support, was only asked of households that opposed the plan of higher food assistance in three governorates, which was 37 and 28 percent of the entire sample in the internet and mobile phone surveys, respectively.

Table 3. Differences in Demographic Variables Between Internet, Mobile Phone, and the Household Budget Surveys
                                             (1)        (2)        (3)          (4)            (5)
                                          Internet    Mobile    Population   Difference     Difference
                                                      Phone     (2014 HBS)   (Col. 1 -      (Col. 1 -
                                                                              Col. 2)        Col. 3)

Age                                         29.31      30.04      36.37      -0.733*        -7.069***
                                           [0.17]     [0.41]     [0.13]      [0.445]        [0.209]
Share Female                                0.20       0.19       0.53        0.011         -0.327***
                                           [0.01]     [0.01]     [0.00]      [0.016]        [0.007]
Region: Aden                                0.17       -          0.03        -              -
                                           [0.01]
Region: Amanat al Asimah                    0.10       -          0.11        -              -
                                           [0.00]
Income: Critically Poor                     0.21       0.27       -          -0.055***       -
                                           [0.01]     [0.02]                 [0.018]
Income: Difficulty                          0.51       0.62       -          -0.112***       -
                                           [0.01]     [0.02]                 [0.020]
Worked Less than 3 Weeks in Past Month      0.30       0.21       -           0.086***       -
                                           [0.01]     [0.02]                 [0.020]
Worked More than 3 Weeks in Past Month      0.18       0.19       -          -0.008          -
                                           [0.01]     [0.02]                 [0.019]
Did Not Work in Past Month but Looking      0.29       0.53       -          -0.240***       -
                                           [0.01]     [0.02]                 [0.024]
Below Primary                               0.11       0.18       0.51       -0.076***      -0.401***
                                           [0.00]     [0.02]     [0.00]      [0.018]        [0.006]
Primary                                     0.08       0.20       0.24       -0.119***      -0.167***
                                           [0.00]     [0.02]     [0.00]      [0.018]        [0.005]
Secondary                                   0.55       0.39       0.17        0.162***       0.382***
                                           [0.01]     [0.02]     [0.00]      [0.023]        [0.007]
University or Above                         0.23       0.20       0.08        0.034*         0.154***
                                           [0.01]     [0.02]     [0.00]      [0.019]        [0.006]

Notes: This table reports summary statistics of demographic information contained in the internet survey, the mobile phone survey, and, for the variables that are available, the 2014 Household Budget Survey (HBS). The mobile phone survey was stratified by governorate to match population shares, so its regional shares are omitted from the table; the shares of the population in Aden and Sana'a in the HBS are exact population figures calculated from survey weights, not estimates. Columns (4) and (5) report the differences in the summary statistics between the internet survey and the mobile phone survey and the HBS, respectively. The HBS estimates are weighted to be nationally representative, and both the internet and mobile phone surveys are unweighted.

Table 4. Baseline Specification: Difference between Internet and Mobile Phone Surveys by Sensitivity of the Question

Columns (1)-(4), sensitive questions: (1) Identified Groups Responsible for Violence Witnessed; (2) Violence Could be Morally Justified; (3) Disagree With Humanitarian Targeting; (4) Only Want Assistance Themselves. Columns (5)-(8), non-sensitive questions: (5) Less Safe in Past Month; (6) No Change in Safety in Past Month; (7) Personally Witnessed Violence During Conflict; (8) Had Not Personally Witnessed Violence But Know Someone Who Has.

                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.294***   0.157***   0.099***   0.099***    0.001     -0.145***  -0.041**   -0.003
               [0.025]    [0.017]    [0.018]    [0.010]    [0.018]    [0.022]    [0.020]    [0.014]
Constant       0.502***   0.235***   0.272***   0.010      0.207***   0.599***   0.698***   0.118***
               [0.023]    [0.016]    [0.017]    [0.007]    [0.015]    [0.019]    [0.017]    [0.012]
Observations   1,731      5,898      5,898      2,121      2,589      2,589      2,591      2,591
R-squared      0.086      0.011      0.004      0.009      0.000      0.017      0.002      0.000

Notes: This table estimates the difference in responses to sensitive and non-sensitive questions between the internet and mobile phone surveys. Columns (1)-(4) report the differences in sensitive questions, with the sensitive option equaling one in each question; and columns (5)-(8) report the differences in non-sensitive questions. The total number of respondents in both surveys combined is 5,898, but due to the questionnaire design, some questions were only asked of a subset of households. Details of the skip patterns are reported in the Data section and the appendix. Robust standard errors are reported and the observations are unweighted.
*** denotes statistical significance at the 1 percent level; ** denotes statistical significance at the 5 percent level; and * denotes statistical significance at the 10 percent level.

Table 5. Baseline Estimates when Restricting the Samples of the Internet and Mobile Phone Surveys to be More Comparable

Columns, identical in every panel. Sensitive questions: (1) Identify Groups; (2) Violence Moral; (3) Disagree Targeting; (4) Want Assist. Non-sensitive questions: (5) Less Safe; (6) No Change in Safety; (7) Witnessed Violence; (8) Know Someone (Violence).

Panel 1: Restrict respondents to above median age in internet surveys (26 years old)
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.416***   0.119***   0.130***   0.060***    0.038     -0.158***  -0.007     -0.024
               [0.053]    [0.034]    [0.040]    [0.011]    [0.038]    [0.046]    [0.045]    [0.030]
Constant       0.422***   0.191***   0.337***   0.000      0.185***   0.630***   0.671***   0.121***
               [0.046]    [0.030]    [0.036]    [0.000]    [0.030]    [0.037]    [0.036]    [0.025]
Observations    327       1,097      1,096       489        491        491        491        491

Panel 2: Restrict to female respondents
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.333***   0.245***   0.064      0.098***    0.034     -0.145***  -0.051      0.000
               [0.058]    [0.037]    [0.042]    [0.031]    [0.040]    [0.050]    [0.047]    [0.033]
Constant       0.478***   0.179***   0.291***   0.026      0.187***   0.560***   0.687***   0.119***
               [0.052]    [0.033]    [0.039]    [0.025]    [0.034]    [0.043]    [0.040]    [0.028]
Observations    331       1,182      1,182       410        510        510        510        510

Panel 3: Restrict to respondents outside of the two largest cities
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.279***   0.165***   0.082***   0.101***   -0.019     -0.124***  -0.054**   -0.002
               [0.029]    [0.019]    [0.020]    [0.012]    [0.020]    [0.024]    [0.023]    [0.016]
Constant       0.510***   0.234***   0.280***   0.012      0.230***   0.573***   0.673***   0.128***
               [0.025]    [0.017]    [0.018]    [0.008]    [0.017]    [0.020]    [0.019]    [0.014]
Observations   1,266      4,373      4,372      1,531      1,990      1,990      1,992      1,992

Panel 4: Restrict sample to those with subjective income assessment of either difficult to get by or critically poor
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.272***   0.099***   0.126***   0.065***    0.026     -0.104***  -0.018      0.015
               [0.028]    [0.019]    [0.020]    [0.011]    [0.020]    [0.024]    [0.022]    [0.015]
Constant       0.503***   0.235***   0.275***   0.012      0.207***   0.595***   0.704***   0.105***
               [0.024]    [0.017]    [0.018]    [0.008]    [0.016]    [0.020]    [0.018]    [0.012]
Observations   1,362      4,394      4,393      1,683      1,965      1,965      1,967      1,967

Panel 5: Restrict sample to those who did not work at least three weeks in past month
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.316***   0.162***   0.077***   0.089***   -0.012     -0.154***  -0.044*    -0.013
               [0.033]    [0.023]    [0.024]    [0.015]    [0.024]    [0.028]    [0.026]    [0.019]
Constant       0.474***   0.227***   0.290***   0.018      0.220***   0.602***   0.700***   0.131***
               [0.031]    [0.021]    [0.023]    [0.013]    [0.021]    [0.025]    [0.023]    [0.017]
Observations   1,293      4,659      4,659      1,677      1,944      1,944      1,945      1,945

Panel 6: Restrict sample to those whose highest grade level completed was primary school or below
                 (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
internet       0.404***   0.234***   0.033      0.164***   -0.002     -0.254***  -0.091**   -0.023
               [0.051]    [0.037]    [0.036]    [0.036]    [0.034]    [0.044]    [0.043]    [0.031]
Constant       0.460***   0.280***   0.274***   0.039      0.173***   0.632***   0.677***   0.140***
               [0.045]    [0.033]    [0.033]    [0.027]    [0.028]    [0.036]    [0.034]    [0.025]
Observations    332       1,134      1,134       341        536        536        537        537

Notes: This table replicates the estimates in Table 4 but restricts both the internet and mobile phone surveys to specific demographic groups to make the samples more comparable. Each panel corresponds to restricting the samples to a group that is underrepresented in the internet survey, using all demographic information reported in Table 3. Columns (1)-(4) report the differences in sensitive questions, with the sensitive option equaling one in each question; and columns (5)-(8) report the differences in non-sensitive questions. Robust standard errors are reported and the observations are unweighted. *** denotes statistical significance at the 1 percent level; ** denotes statistical significance at the 5 percent level; and * denotes statistical significance at the 10 percent level.

Table 6. Differences between Modalities in Sensitive Questions by Authority in Control

Columns: (1) Identified Groups Responsible for Violence Witnessed; (2) Violence Could be Morally Justified; (3) Disagree With Humanitarian Targeting; (4) Only Want Assistance Themselves.

                              (1)        (2)        (3)        (4)
DFA Controlled x Internet    0.010     -0.050      0.025     -0.006
                            [0.051]    [0.035]    [0.037]    [0.021]
DFA Controlled              -0.025      0.015     -0.036     -0.000
                            [0.045]    [0.032]    [0.034]    [0.015]
Internet                     0.287***   0.178***   0.084***   0.101***
                            [0.035]    [0.025]    [0.027]    [0.014]
Constant                     0.514***   0.227***   0.292***   0.011
                            [0.032]    [0.023]    [0.025]    [0.011]
Observations                 1,731      5,898      5,897      2,121
R-squared                    0.086      0.012      0.005      0.009

Notes: This table replicates the estimates in Table 4 but allows the difference between internet and phone surveys to vary by the authority that controls the region: either Houthi forces or the internationally recognized government. Robust standard errors are reported and the observations are unweighted. *** denotes statistical significance at the 1 percent level; ** denotes statistical significance at the 5 percent level; and * denotes statistical significance at the 10 percent level.

Figure 1. Attrition Rates for Each Question in the Internet-Based Survey

[Figure: point estimates of attrition rates by question, with upper and lower bounds of the 95 percent confidence intervals; vertical axis runs from 0.00 to 0.15.]

Notes: This figure reports attrition rates in the internet survey after respondents are offered the listed question but before they answer. Questions are listed in the order they were asked. Attrition rates for non-sensitive questions are in black, and attrition rates for sensitive questions are in red.

Figure 2. Attrition Rates for Each Question in the Internet-Based Survey by Region of Control

2a. Regions Controlled by Houthi Forces

[Figure: attrition rates, with 95 percent confidence intervals, for each question in the order asked: District, Income, Education, Labor, Moral, Agree Targeting, Want Assistance, Safety, Witness Violence, Naming Party; vertical axis runs from 0.00 to 0.15.]

2b. Regions Controlled by the Internationally Recognized Government

[Figure: same layout as panel 2a.]

Notes: These figures report attrition rates in the internet survey after respondents are offered the listed question but before they answer. Questions are listed in the order they were asked.
Attrition rates for non-sensitive questions are in black, and attrition rates for sensitive questions are in red, with sensitivity defined by Tables 1 and 2. The top panel reports attrition rates for respondents living in regions controlled by Houthi forces; and the bottom panel reports attrition rates for respondents living in regions controlled by the internationally recognized government. The figures cannot report attrition rates for the question on governorate because that question is used to identify the controlling authority.

Appendix 1. Formal Comparison of Differences Between Modalities by the Sensitivity of the Question

Specification (1) in the main text estimates the difference between internet and mobile phone surveys question by question, and compares the differences across those question-by-question estimates. Here, we corroborate those findings by estimating the differences formally. All patterns discussed in the main text can be demonstrated using this more formal estimation approach.

Specifically, we stack the indicator variables from all questions, with answers from both modalities, and estimate the following specification:

$$Ind_i^j = \beta_0 + \beta_1 Internet_i + \beta_2 Sensitive^j + \beta_3 Sensitive^j \times Internet_i + \varepsilon_i^j$$

where all variables are the same as in the main text, and $Sensitive^j$ denotes an indicator for whether question $j$ is sensitive, as defined by Table 2. The estimate of $\beta_3$ measures how much larger the average difference between the two modalities is in sensitive questions than in non-sensitive questions, where we would expect $\beta_3 > 0$ if the internet surveys were able to elicit more sensitive information.1

Additionally, we estimate how the baseline estimate varies based on the source of the sensitivity or based on whether the respondent lived in DFA-controlled territory. Specifically, we also estimate the following specification:

$$Ind_i^j = \beta_0 + \beta_1 Internet_i + \beta_2 Sensitive^j + \beta_3 Char_i^j + \beta_4 Sensitive^j \times Internet_i + \beta_5 Sensitive^j \times Char_i^j + \beta_6 Char_i^j \times Internet_i + \beta_7 Char_i^j \times Internet_i \times Sensitive^j + \varepsilon_i^j$$

where all variables are the same as above, and $Char_i^j$ denotes an indicator equaling one either if the response comes from the question asking respondents to name the individual parties responsible for the violence they witnessed, or if the individual lives in DFA-controlled regions. In the above specification, $\beta_7$ measures how much larger the difference between modalities is for the given characteristic of the respondent or question.

1 Differences in the sample sizes of the questions lead to a weighting of questions that differs from simply averaging the mean difference by question in Table 4 in the main text. Regardless of whether the questions are re-weighted or unweighted, the results are qualitatively identical, and the unweighted results are reported in the Appendix.
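To make the estimation concrete, the sketch below shows one way the stacked specifications could be run as linear probability models with heteroskedasticity-robust standard errors. It is a minimal illustration only: the input file and the column names (response, internet, sensitive, char) are hypothetical placeholders, not the authors' actual replication code.

```python
# Minimal sketch of the stacked estimation, assuming a long-format data set
# with one row per respondent-question pair. File and column names are
# hypothetical placeholders, not the authors' replication materials.
import pandas as pd
import statsmodels.formula.api as smf

# response:  1 if the respondent chose the affirmative option to question j
# internet:  1 if the answer comes from the internet (RDIT) survey
# sensitive: 1 if question j is sensitive, as defined in Table 2
# char:      1 for the respondent/question characteristic of interest
#            (e.g., living in a DFA-controlled region)
df = pd.read_csv("stacked_responses.csv")

# Baseline stacked specification: `internet * sensitive` expands to
# internet + sensitive + internet:sensitive, so the coefficient on
# internet:sensitive corresponds to beta_3 above.
base = smf.ols("response ~ internet * sensitive", data=df).fit(cov_type="HC1")

# Extended specification: the coefficient on the triple interaction
# internet:sensitive:char corresponds to beta_7 above.
ext = smf.ols("response ~ internet * sensitive * char", data=df).fit(cov_type="HC1")

print(base.summary())
print(ext.summary())
```

Because the outcome is a binary indicator, an ordinary least squares fit with robust standard errors mirrors the linear probability approach used throughout the paper's tables.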
Appendix 1, cont. Empirical Estimates Formally Comparing Differences in the Modalities by the Sensitivity of the Question

Dependent variable: indicator equaling one if the respondent answers affirmatively (pooling all questions together).

                                      (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
Internet                            -0.047***  -0.046***  -0.040*    -0.050***  -0.020*    -0.056***  -0.092***  -0.047***  -0.059***
                                    [0.011]    [0.015]    [0.024]    [0.012]    [0.012]    [0.014]    [0.022]    [0.011]    [0.015]
Sensitive                           -0.116***  -0.132***  -0.117***  -0.109***  -0.111***  -0.127***  -0.109***  -0.181***  -0.113***
                                    [0.014]    [0.018]    [0.031]    [0.015]    [0.014]    [0.018]    [0.027]    [0.014]    [0.020]
Internet x Sensitive                 0.138***   0.145***   0.160***   0.137***   0.090***   0.148***   0.220***   0.162***   0.149***
                                    [0.015]    [0.021]    [0.034]    [0.017]    [0.017]    [0.020]    [0.031]    [0.016]    [0.022]
Internet x Sensitive x Safety         -          -          -          -          -          -          -         0.180***    -
                                                                                                                 [0.028]
Internet x Sensitive x DFA Control    -          -          -          -          -          -          -          -        -0.033
                                                                                                                            [0.031]
Constant                             0.405***   0.402***   0.388***   0.401***   0.403***   0.413***   0.406***   0.405***   0.411***
                                    [0.009]    [0.013]    [0.021]    [0.010]    [0.010]    [0.013]    [0.018]    [0.009]    [0.014]
Sample restriction                  Entire     Age        Sex        Region     Income     Employment Education  Entire     Entire
                                    Sample     Restriction                                 Status                Sample     Sample
Observations                        26,008     12,959     5,145      19,506     19,696     20,066     5,087      26,008     26,008
R-squared                            0.003      0.004      0.005      0.003      0.003      0.003      0.013      0.047      0.004

Note: This table estimates how much larger the difference in responses to a series of yes-no questions is in an internet survey relative to a mobile phone survey, allowing the estimate to vary discontinuously based on the sensitivity of the question. All questions are stacked with responses from both the internet and phone surveys, and the differences between the modalities are estimated for both sensitive and non-sensitive questions. Column (1) estimates the full specification; columns (2)-(7) restrict the samples in the internet and mobile phone surveys to be more similar; column (8) further compares the difference between the modalities based on the source of the sensitivity (safety or social desirability bias); and column (9) further compares the difference between modalities based on the controlling authority in the region in which respondents live. All non-sensitive questions have been re-defined such that the share responding affirmatively is higher for the internet survey than the mobile phone survey, so that the interaction term reports how much larger the share choosing the sensitive option in the internet survey is in sensitive questions relative to the absolute difference between modalities in the non-sensitive questions. Robust standard errors are reported in parentheses. *** denotes statistical significance at the 1 percent level; ** denotes statistical significance at the 5 percent level; and * denotes statistical significance at the 10 percent level.
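As a reading aid (our arithmetic, using only the column (1) point estimates from the table above), the fitted shares choosing the affirmative option combine as follows:

$$\begin{aligned}
\text{phone, non-sensitive:} \quad & \hat{\beta}_0 = 0.405 \\
\text{internet, non-sensitive:} \quad & \hat{\beta}_0 + \hat{\beta}_1 = 0.405 - 0.047 = 0.358 \\
\text{phone, sensitive:} \quad & \hat{\beta}_0 + \hat{\beta}_2 = 0.405 - 0.116 = 0.289 \\
\text{internet, sensitive:} \quad & \hat{\beta}_0 + \hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3 = 0.405 - 0.047 - 0.116 + 0.138 = 0.380
\end{aligned}$$

The internet-phone gap is therefore 0.380 - 0.289 = 0.091 for sensitive questions versus -0.047 for non-sensitive questions, and the interaction estimate of 0.138 is exactly the difference between these two gaps.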