Policy Research Working Paper 10165

Inferring COVID-19 Vaccine Attitudes from Twitter Data
An Application to the Arabic Speaking World

Roy van der Weide

Development Economics
Development Research Group
September 2022

Abstract

This study investigates whether Twitter data can be used to infer attitudes towards COVID-19 vaccination with an application to the Arabic speaking world. At first glance, anti-vaccine sentiment estimated from Twitter data is surprisingly low in comparison to estimates obtained from survey data. Only about 3 percent of Twitter accounts in our database are identified as anti-COVID-vaccination (compared to 20 to 30 percent of survey respondents). This bias is resolved when: (1) filtering out accounts belonging to organizations that make up a significant share of the discourse on Twitter, and (2) adjusting for the fact that the population of Twitter users is biased towards more educated individuals. The most effective messages on the anti-vaccine side highlight claims that the vaccine causes serious life-threatening side effects. In the pro-vaccine camp, tweets containing content showing public figures receiving the vaccine are found to have the largest reach by far.

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may be contacted at rvanderweide@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Inferring COVID-19 Vaccine Attitudes from Twitter Data: An Application to the Arabic Speaking World [1]

Roy van der Weide [2]

Keywords: COVID-19, health behavior, Twitter, network analysis
JEL codes: I12, I15, D85, O33

[1] Excellent research assistance was provided by Joey Shea. The authors are grateful to Alexei Abrahams for providing the Twitter data and to Renos Vakis, Mohamad Chatila, and Zeina Afif for sharing the COVID survey data. The authors also thank Johannes Hoogeveen, Ha Minh Nguyen, and Roberta Gatti for their helpful comments and suggestions. Financial assistance was received from the State and Peacebuilding Trust-Fund (SPF). The views expressed here are the authors’ and do not reflect those of the World Bank, its Executive Directors, or the countries they represent.
[2] The World Bank: rvanderweide@worldbank.org

1. Introduction

Can Twitter data be used to monitor attitudes towards COVID-19 vaccination? How do estimates of vaccination hesitancy rates derived from Twitter data compare to estimates obtained from more conventional survey data? Who are the most effective messengers on Twitter and what are the most effective messages used on either side of the discourse (supporters versus opponents of COVID vaccination)? We investigate these questions in an application to the Middle East for which we have both conventional survey data on COVID vaccination attitudes and a database of tweets that feature the most prominent COVID hashtag in Arabic spanning the period from mid-December 2020 until mid-April 2021.
To identify a given Twitter account’s attitude towards COVID vaccination (for all accounts observed in our database), we adopt the following approach. First, we identify the clusters of user accounts observed in our data. (Clusters are groups of users who are more prone to interacting with each other on Twitter. The resulting clusters, while identified purely from data on retweet behavior, tend to be divided along geographic and ideological lines.) Second, we identify the 50-100 most influential accounts from each cluster. These influencers represent the central nodes in the network of Twitter accounts. Third, we manually inspect the Twitter feeds of these influencers for explicit references to COVID vaccination that allow us to label them as pro-vaccination, anti-vaccination, or neutral with respect to COVID vaccination. Fourth, and finally, we iteratively impute the vaccination attitude of the other users in our database (i.e. those who participated in the discourse on COVID vaccination on Twitter) based on their retweeting of users whose vaccination attitude has already been established. Here we rely on the assumption that a retweet without any added comment can be interpreted as an endorsement.

We find that COVID vaccination hesitancy on Twitter in the Middle East is estimated at just below 20 percent, which is comparable in magnitude to hesitancy rates estimated from survey data, provided that the appropriate filters are applied. At first glance, Twitter-based estimates of hesitancy appear notably downward biased; when anti-vaccine attitudes are evaluated over the full population of Twitter accounts that engaged in the discourse on COVID vaccination, hesitancy is estimated at around 3 percent. This bias is fully accounted for, however, when we: (1) evaluate hesitancy over Twitter accounts belonging to individuals (i.e.
filtering out accounts belonging to organizations, which make up a significant share of the discourse on Twitter), and (2) adjust for the fact that the population of Twitter users is biased towards more educated individuals. With regard to effective messengers and messages in the discourse on COVID vaccination, we find that the most effective tweets on the anti-vaccine side highlight claims that the vaccine causes serious life-threatening side effects, while on the pro-vaccine side, tweets showing public figures receiving the vaccine are found to be most effective. Finally, state-news accounts are found to receive the overwhelming majority of retweets in our database, making them the most effective messengers.

The remainder of the paper is organized as follows. Sections 2 and 3 introduce our data and methodology, respectively. The main empirical observations derived from our database of Twitter data on COVID vaccination are presented in Section 4. In Section 5 we contrast our Twitter-based estimates of vaccination hesitancy with estimates obtained from more conventional survey data and rationalize the divergence. Finally, Section 6 concludes.

2. Data

Between December 2020 and April 2021, we used Twitter’s open API to download tweets featuring the most prominent COVID hashtag in Arabic: #كورونا. As our interest is in COVID vaccination attitudes in the region, we filtered the database for vocabulary related to vaccination. Specifically, we compiled a list of 24 vaccination-words in Arabic (shown in Table 1) and identified the sub-sample of tweets that feature at least one of these words.
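As an illustration, the vocabulary filter described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used for the paper: the tweet representation (a dict with a "text" field) and the function names are hypothetical, and only a handful of the Table 1 terms are included.

```python
# Hypothetical sketch of the vocabulary filter; not the authors' code.
# Assumption: each tweet is a dict with a "text" field.

# A few entries from the paper's vaccination vocabulary (Table 1).
VACCINE_TERMS = ["لقاح", "مصل", "التلقيح", "المناعة"]


def mentions_vaccination(text, terms=VACCINE_TERMS):
    """Return True if the text contains at least one vocabulary term."""
    return any(term in text for term in terms)


def filter_vaccine_tweets(tweets, terms=VACCINE_TERMS):
    """Keep only the tweets whose text features at least one term."""
    return [tweet for tweet in tweets if mentions_vaccination(tweet["text"], terms)]
```

Plain substring matching is used here so that prefixed forms (for example, a term preceded by the Arabic definite article) are also caught. The trade-off is that short terms may match inside unrelated words, so a production filter would likely need tokenization or manual review of matches.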
English | Arabic
Vaccine | لقاح
Vaccine | مصل
Treatment | علاج
Immunization against | تحصين ضد
Doctors Syndicate | نقابة أطباء
Sinopharm | سينوفارم
Vaccination | التلقيح
Possible side effects | والأعراض الجانبية المحتملة
Clinical trials | التجارب السريرية
Immunity | المناعة
Medicines | أدوية
G42 UAE company | شركة G42 الإماراتية
G42 UAE Healthcare Company | شركة G42 الإماراتية للرعاية الصحية
Chinese vaccine | اللقاح الصيني
Pfizer vaccine | لقاح فايزر
Sputnik vaccine | لقاح سبوتنيك

Table 1: Vaccination vocabulary

Our data is organized into two sub-periods. The first spans from mid-December through to mid-February and contains approximately 730,000 tweets and 230,000 user accounts engaging in the vaccine discourse. Approximately 4,000 influencers accounted for 95% of the discourse. The second period runs from late February to early April and contains approximately 500,000 tweets and 190,000 user accounts. About 5,100 influencers accounted for 95% of the overall discourse.

Twitter shares a select number of tweet and user account attributes, including tweet content, number of likes, number of retweets, user Twitter handle, and number of followers. Attributes such as location (i.e. country of residence, nationality), whether the user is an individual or an organization, and the profession of the user (i.e. government official, news, academic, blogger) are not included. We coded these attributes ourselves for the 100 most influential user accounts in our database.

The primary user attribute of interest in our study is the user’s expressed attitude towards COVID vaccination. We identified this attribute for 687 influencers by screening their Twitter feeds and classifying their position with respect to COVID vaccination into one of three categories: (a) anti-vaccination, (b) pro-vaccination, and (c) neutral.
User accounts are classified as anti-vaccination if at least one of their tweets expresses overt and explicit rejection of COVID vaccines or vaccination more broadly. Examples of anti-vaccination content include tweets that point to cases of severe sickness or death post-vaccination, tweets that urge people to rely on alternative immunity-boosting techniques to fight COVID instead of taking a vaccine, and content arguing that large pharmaceutical companies were responsible for manufacturing widespread fear of COVID in order to make profits. Conspiratorial content (for example, tweets claiming the vaccine was a plot by Bill Gates to inject the world with microchips) is also classified as anti-vaccination.

Users are classified as pro-vaccination if at least one of their tweets expresses overt and explicit endorsement or support of a COVID vaccine or vaccination program. For example, any content (written, photographic, or video) indicating that an individual has received a COVID vaccine was classified as pro-vaccination. Other cases include tweets where the account applauds their country’s vaccine rollout, shares content of high-profile figures encouraging vaccination, or shares science-based information about COVID vaccines or vaccination more broadly.

Less obvious cases include instances where accounts post and/or retweet news articles about the vaccine. In these instances, additional tweets are manually investigated to ascertain whether the account ever explicitly expressed support for or rejection of a vaccine or vaccination strategy. If no overt and explicit sentiment could be identified, then the account is classified as “neutral.” User accounts are also classified as neutral when they refer to vaccination only cursorily, as a detail within the context of another issue.
For example, in the case of the trending hashtag “the vaccine is for the Lebanese first” in Lebanon, certain accounts use the hashtag without commenting on vaccination explicitly. Instead, these tweets may comment on racism, corruption, or political divisions in the country.

3. Methodology

This section discusses (1) how we imputed vaccination attitudes for 301,783 user accounts (about 96 percent of the 314,380 users in our database) using the hard-coded vaccination attitudes we observed for the 687 influencers, and (2) how we identified the arguments used by users to support their views towards vaccination.

Imputing vaccination attitudes

For the imputation of attitudes towards COVID vaccination, we rely on the assumption that retweets without the addition of qualifications (i.e. excluding quote-retweets) signify endorsements. We first hard-code vaccination attitude as a numeric indicator taking the values 1 (pro-vaccination), 0 (neutral), and -1 (hesitant towards vaccination) for a subset of influential user accounts. Next, we adopt an iterative process where we infer user vaccination attitudes conditional on the attitudes of users that have been established in prior iterations. The attitudes of the initial set of users (i.e., the “seed influencers”) are coded based on a manual inspection of their Twitter feeds. We continue to iterate until convergence is reached, i.e. until there are no remaining users whose vaccination attitude can be inferred.

Once the attitudes for the “seed influencers” are obtained, the exact iterative steps are:

1. Identify the set of users that satisfy both of the following two criteria: (1) their vaccination attitude has not yet been inferred, and (2) they retweeted at least once a vaccination-related tweet in our database from a user account whose vaccination attitude has been inferred.
2. For each user identified in step 1, evaluate the mean value over the vaccination attitudes of the user accounts they retweeted.
3. Add the users from steps 1-2 to the users for whom we have estimates of vaccination attitude and return to step 1 (until convergence).

Note that the imputed attitude towards vaccination can attain any value between -1 and 1 (given that it is obtained as the mean value over the attitudes of endorsed users). If a given user has retweeted 10 users with observed vaccination attitudes consisting of 8 hesitant users, 1 neutral user and 1 pro-vaccination user, then the imputed attitude for this user will take the value of (0.8 * -1) + (0.1 * 0) + (0.1 * 1) = -0.7. The empirical density of imputed attitudes (not reported here) is found to be concentrated at the values -1, 0, and 1, with comparatively little density in between these values. We round the imputed vaccination attitudes to the nearest integer values in order to classify each user into one of the three groups.

Figure 1 plots the interactions between influencers for the first time-period (mid-December 2020 until mid-February 2021) as a network graph. The nodes represent the user accounts, while the “edges” connecting the nodes represent retweets (i.e. two nodes are connected when the two corresponding users retweeted each other). The cluster of nodes colored black shows the set of users who are identified/inferred as being hesitant towards COVID vaccination.

Figure 1: Network graph of user accounts

Arguments

To identify the arguments used by hesitant user accounts versus users who are supportive of vaccination, we manually inspected at least 200 of the most influential tweets from each side and identified the arguments used. The total of 437 hard-coded tweets accounts for just below 20 percent of all retweets. The list of arguments used is shown in Table 2. Each tweet is assigned to one or more arguments.
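Returning briefly to the imputation procedure described earlier in this section, the three iterative steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the function names and input layout are hypothetical, the mean is taken over distinct retweeted accounts, unrounded means are propagated to later iterations, and ties at exactly ±0.5 are assigned to the neutral group. None of these details is specified in the text.

```python
# Hypothetical sketch of the iterative imputation; not the authors' code.

def impute_attitudes(seed_attitudes, retweets):
    """
    seed_attitudes: dict mapping user -> hand-coded attitude in {-1, 0, 1}.
    retweets: iterable of (retweeter, retweeted_user) pairs, plain retweets only.
    Returns a dict mapping user -> attitude in [-1, 1].
    """
    # Assumption: each retweeted account counts once per retweeter.
    endorsed = {}
    for retweeter, author in retweets:
        endorsed.setdefault(retweeter, set()).add(author)

    attitudes = dict(seed_attitudes)
    while True:
        newly_inferred = {}
        for user, authors in endorsed.items():
            if user in attitudes:
                continue  # Step 1, criterion (1): attitude already known.
            known = [attitudes[a] for a in authors if a in attitudes]
            if known:  # Step 1, criterion (2): retweeted a user with a known attitude.
                # Step 2: mean attitude over the endorsed users.
                newly_inferred[user] = sum(known) / len(known)
        if not newly_inferred:
            break  # Convergence: no further attitudes can be inferred.
        attitudes.update(newly_inferred)  # Step 3.
    return attitudes


def classify(attitude):
    """Round to the nearest of -1, 0, 1 (assumption: exact ties go to neutral)."""
    if attitude > 0.5:
        return 1
    if attitude < -0.5:
        return -1
    return 0
```

Users who never retweeted an account with a known attitude, directly or through a chain of retweets, are simply absent from the returned dictionary, which mirrors the fact that attitudes could be imputed for most but not all users in the database.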
We use this data to evaluate, for each argument, the share of retweets it has garnered.

ANTI-VACCINATION ARGUMENT
0 Tweet unavailable; Twitter account takedown
1 Politics
2 Conspiracy theories
3 Vaccines are untested & unsafe
4 Freedom of choice

PRO-VACCINATION ARGUMENT
0 Tweet unavailable; Twitter account takedown
1 Vaccines are tested & safe
2 Vaccines are effective & needed
3 Sharing vaccination information
4 Heralding vaccine creation & distribution
5 A king or celebrity or politician took it
6 Politics, religion, humor

Table 2: Arguments used on both sides (pro-vaccination versus anti-vaccination users)

4. Sentiment towards COVID vaccination on Twitter

4.1 How large is anti-vaccine sentiment on Twitter in the Middle East?

We first sought to understand the size of the pro-vaccine and anti-vaccine camps in the region. Overall, we find that only about 3 percent of Twitter accounts demonstrate overt and explicit anti-vaccine attitudes. In contrast, about 89 percent of accounts communicate distinct positive views towards the vaccine. About 8 percent of accounts are classified as neutral towards the vaccine. The estimates are summarized in the left panel of Figure 2. The right panel of Figure 2 shows that the retweet shares are of a similar magnitude: pro-vaccine influencer accounts garnered 91 percent of retweets, neutral influencer accounts garnered 6 percent of retweets, and anti-vaccine influencer accounts garnered 3 percent of retweets. By far, the most retweeted tweets in our sample expressed positive attitudes towards vaccination.

Figure 2: Share of influencers by vaccination attitude (left) and share of retweets (right)

At first glance, 3 percent of anti-vaccine user accounts appears low when compared to estimates of vaccination hesitancy obtained from other sources. We will examine this divergence and make an attempt to rationalize it in Section 5.

4.2 How effective is the messaging on either side?
The most retweeted tweets are from user accounts that champion COVID vaccination, as shown in Figure 3, which labels the 40 most retweeted tweets by their attitude towards vaccination. Consistent with this observation, anti-vaccine accounts are found to have fewer followers than user accounts that support vaccination against COVID (see Figure 4). In sum, user accounts that are opposed to vaccination are both smaller in number and have a comparatively smaller following.

Figure 3: Number of retweets garnered by top-40 tweets

Figure 4: Kernel density of number of followers

This does not necessarily imply that hesitant users are less effective in their messaging than vaccination champions. To evaluate the effectiveness of messaging for both sides, we implement a kernel regression of the (log) number of garnered retweets on the (log) number of followers for each group separately. This allows us to compare effectiveness conditional on a user’s number of followers. The result is plotted in Figure 5. It follows that anti-vaccine accounts garner a comparatively larger number of retweets for a given number of followers. The kernel regression also shows that anti-vaccine users are under-represented in the population of user accounts with in excess of 250 thousand followers, which is where the largest number of retweets are garnered (by pro-vaccine user accounts). In other words, while the anti-vaccine community on Twitter may be small, its members are effective in their messaging (although their messages are unlikely to resonate outside of their community). The observation that anti-vaccine groups garner more retweets given their number of followers, and consequently stimulate more user engagement, is consistent with findings from earlier studies (Germani and Biller-Andorno 2021; Puri, Coomes, Haghbayan and Gunaratne 2020).
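To make the comparison concrete, a kernel regression of this kind can be sketched as below. The text does not state which estimator, kernel, or bandwidth was used, so the Nadaraya-Watson estimator with a Gaussian kernel and a fixed bandwidth is an assumption made here for illustration, as are the function names and the input layout.

```python
# Hypothetical sketch of a kernel regression by group; not the authors' code.
import math


def kernel_regression(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each grid point (Gaussian kernel).

    Assumes every grid point lies close enough to the data that the
    kernel weights do not all underflow to zero.
    """
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((g - xi) / bandwidth) ** 2) for xi in x]
        weighted_sum = sum(w * yi for w, yi in zip(weights, y))
        fitted.append(weighted_sum / sum(weights))
    return fitted


def fit_by_group(accounts, grid, bandwidth=0.5):
    """Fit log(retweets) on log(followers) separately for each group.

    accounts: iterable of (followers, retweets, group) with positive counts.
    grid: points on the log-followers axis at which to evaluate the curves.
    """
    by_group = {}
    for followers, retweets, group in accounts:
        xs, ys = by_group.setdefault(group, ([], []))
        xs.append(math.log(followers))
        ys.append(math.log(retweets))
    return {
        group: kernel_regression(xs, ys, grid, bandwidth)
        for group, (xs, ys) in by_group.items()
    }
```

Evaluating both curves on a common grid of log-follower values is what allows the two groups to be compared at the same audience size. Accounts with zero followers or zero retweets would need separate handling before taking logs.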
It should be noted that both sides (supporters versus opponents of vaccination) arguably operate in echo chambers, with accounts from each camp producing content that is repeatedly reshared and recycled within their camps. While pro-vaccine tweets have a larger overall reach compared to their anti-vaccine counterparts, anti-vaccine content circulates more aggressively within its self-contained echo chamber. Earlier studies similarly find that user content rarely interacts with the opposing camp, demonstrating the segregation of these communities (Puri, Coomes, Haghbayan and Gunaratne 2020; Schmidt, Zollo et al. 2018).

Figure 5: Kernel regression of (log) retweets on (log) followers

4.3 What are the most effective arguments used by either side?

We next inspect the arguments used by both anti-vaccine and pro-vaccine accounts to understand which messaging was most effective insofar as it had the largest reach. Figure 6 ranks the arguments used by anti-vaccine users based on the number of retweets these arguments garnered. The most effective message adopted by anti-vaccine users features news media coverage of side-effects and/or people becoming sick (in some cases fatally) after taking the vaccine. Arguments based on conspiracy theories come in second, but still attract almost 40 percent of retweets in our sample. Arguments referencing political issues capture just below 10 percent of retweets. Also noteworthy is that arguments that reference freedom of choice attract a negligible share of retweets in our sample. Our results complement earlier studies on anti-vaccine attitudes (Jamison, Amelia et al. 2020; Troiano and Nardi 2021; Kata 2010).

Figure 6: Number of retweets by argument (anti-vaccine camp)

Figure 7 shows the arguments used by the pro-vaccine camp. It can be seen that tweets showing public figures (i.e.
politicians, members of royal families, celebrities) taking the vaccine, whether via photos, videos or a written admission, overwhelmingly achieved the largest reach (garnering upwards of 200,000 retweets in our sample). The argument with the second largest reach in the pro-vaccine camp was content containing basic information about COVID, the vaccine or government regulations in response to the pandemic. This content does not express an explicit opinion on the vaccine, but instead provides information regarding the vaccine rollout, such as where to find the nearest vaccine center, who is eligible to receive the vaccine, and the number of recently arrived doses. Arguments stating that vaccines are tested and safe to use rank fourth in terms of the number of retweets garnered, while arguments stating that vaccinations are effective and needed to end the pandemic rank sixth.

Figure 7: Number of retweets by argument (pro-vaccine camp)

The significant presence of content featuring popular public figures promoting vaccination is notable, as this form of messaging has long been identified as an effective tool to increase positive vaccine attitudes. Studies have found that pro-vaccine content featuring well-known and popular individuals positively impacts the willingness to receive the vaccine and is one of the most effective kinds of messaging (Puri, Coomes, Haghbayan and Gunaratne 2020).

4.4 Who are the most effective messengers?

To understand which accounts are the most effective messengers, we also hard-coded whether an account belongs to a government official, journalist or news outlet, think-tank, academic, blogger, etc., for the 100 most influential user accounts (which account for the large majority of tweets in our database). A tabulation of the number of retweets garnered by type of user account is shown in Figure 8. State news accounts are found to have received the overwhelming majority of retweets (with approximately 250,000 retweets).
Non-state-news accounts have had a fraction of the reach by comparison. Among the non-state-news accounts, bloggers are found to have the greatest reach (with approximately 40,000 retweets), ahead of government officials (with approximately 30,000 retweets), independent news (with approximately 25,000 retweets), and academics (with approximately 10,000 retweets).

Figure 8: Number of retweets by type of influencer

5. Why is opposition among Twitter users lower than among Facebook users?

Our Twitter-based estimates suggest that opposition to COVID vaccination is low, at approximately 3 percent of user accounts that engaged on COVID vaccination in the MENA region. How does this compare to estimates obtained from other data sources? How might different platforms offer competing versions of vaccine opposition within a studied population? And what might potential discrepancies tell us about the use of social media as an effective public health messaging tool?

We will compare our Twitter-based estimates of vaccine hesitancy to estimates from two alternative sources of data: (a) Facebook survey data on vaccine hesitancy in Lebanon, the West Bank, and Gaza, and (b) a more conventional phone survey on vaccine hesitancy in Lebanon. We will refer to these two data sources as Facebook-survey and phone-survey. Figure 9 shows estimates of the anti-vaccination rates derived from both surveys in one graph. The Facebook-survey estimates of vaccine hesitancy are 21 percent for Lebanon, 17 percent for the West Bank, and 25 percent for Gaza. The phone-survey estimates vaccine hesitancy in Lebanon at approximately 26 percent. While our Twitter-based estimates correspond to a different population, the MENA region at large, the gap in estimated hesitancy rates is too large in magnitude to be explained by the discrepancy in country compositions. What then would rationalize this divergence?
Figure 9: Vaccine hesitancy rates by country (survey estimates)

In this section we will consider the following candidate explanations for why the vaccination opposition rate estimated from Twitter data is visibly lower than hesitancy rates estimated from Facebook-survey and phone-survey data: (1) Twitter accounts include both individuals and organizations, whereas the two surveys collect data for individuals, (2) Twitter users are a self-selected sub-sample biased towards higher levels of education, higher income and increased political engagement, and (3) the vaccine rollout during the period in which the data for our sample was collected was uneven and limited, and users therefore may not yet have developed strong views on vaccination, nor the incentive to express these views publicly on Twitter.

Possible biases we are unable to address at this point include: (a) recent studies have found that greater Facebook usage is positively correlated with greater COVID vaccination hesitancy, which may introduce a positive bias in hesitancy estimated from Facebook survey data (see e.g. Figure 17 in Eurofound, 2021), (b) individuals may be more comfortable sharing vaccination attitudes anonymously (in a survey) than publishing their view in the public domain (on Twitter), and (c) solicited views expressed through a survey or interview differ from unsolicited views expressed publicly and readily. Note that these potential biases may run in opposite directions.

5.1 Twitter accounts include both individuals and organizations

Figure 10 shows that organizations account for about 65 influencer accounts in our sample, in comparison to persons, which accounted for about 38 influencer accounts. When only considering accounts belonging to persons, our Twitter-based estimate of vaccine opposition increases to 19 percent (Figure 11), which is remarkably close to the vaccine hesitancy estimated from the survey data.
It follows that vaccine opposition among organizations on Twitter is near zero. Organizations in our sample consist of accounts representing news organizations, NGOs, private companies, law firms, government entities, and international bodies, among others. These entities are more likely to express overtly positive attitudes towards vaccination, given the global institutional push to promote COVID vaccination. Many of these organizations included government entities and official news sources, both of which are obligated to share official information from the WHO and their domestic health ministry. Accounts representing other organizations may arguably be reluctant to share anti-vaccine content due to the possibility of reputational damage or deplatforming. [3]

Figure 10: Number of influencers by type (person vs. organization)

Figure 11: Share of influencers by vaccination attitude: all accounts (left) vs. individuals (right)

[3] https://help.twitter.com/en/rules-and-policies/medical-misinformation-policy

5.2 The Twitter population is a selective subsample that is more educated

The population of individuals who engage on Twitter is plausibly biased towards higher levels of education (and income) when compared to the general population. Under the assumption that anti-vaccine attitudes are lower among more educated individuals, this self-selection bias may help explain why Twitter-based estimates of COVID vaccination opposition are comparatively low. To test this hypothesis, we use the data from both the Facebook- and phone-surveys to estimate vaccine hesitancy separately for each of four individual education categories recorded in the surveys: (i) no education, (ii) primary education, (iii) secondary education, and (iv) completed tertiary education. The result, shown in Figure 12, confirms that vaccine hesitancy indeed declines with education (with the exception of Gaza).
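The tabulation by education category amounts to a grouped share. A minimal sketch is given below, assuming hypothetical respondent records with an "education" label and a boolean "hesitant" flag; the field names, the category labels, and the absence of survey weights are all assumptions made for illustration and do not reflect the actual survey files.

```python
# Hypothetical sketch of hesitancy rates by education; not the authors' code.
# Assumption: unweighted respondent records with illustrative field names.

EDUCATION_LEVELS = ["none", "primary", "secondary", "tertiary"]  # ordered low to high


def hesitancy_rate(respondents):
    """Share of respondents flagged as hesitant; None for an empty group."""
    respondents = list(respondents)
    if not respondents:
        return None
    return sum(1 for r in respondents if r["hesitant"]) / len(respondents)


def hesitancy_by_education(respondents):
    """Hesitancy rate for each of the four education categories."""
    respondents = list(respondents)
    return {
        level: hesitancy_rate(r for r in respondents if r["education"] == level)
        for level in EDUCATION_LEVELS
    }


def hesitancy_at_or_above(respondents, min_level):
    """Hesitancy rate among respondents with at least the given education level."""
    cutoff = EDUCATION_LEVELS.index(min_level)
    return hesitancy_rate(
        r for r in respondents
        if EDUCATION_LEVELS.index(r["education"]) >= cutoff
    )
```

The last helper restricts the sample to respondents at or above a chosen education level, which is the kind of restriction that makes a survey sample more comparable to the more educated Twitter population.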
When we restrict the sample of survey respondents to those who completed a secondary degree or higher, the hesitancy rates for Lebanon and the West Bank are estimated between 16 and 19 percent (compared to up to 25 percent when evaluated over the full sample of individuals). These rates come remarkably close to the opposition rate of 19 percent estimated from Twitter data when the Twitter accounts are restricted to those belonging to individuals (i.e. filtering out accounts belonging to organizations).

Figure 12: Vaccine hesitancy versus education level (survey estimates)

5.3 Vaccine rollout is slow in MENA such that discourse may yet have to pick up

The vaccine rollout in MENA has been comparatively slow. In Saudi Arabia, the daily number of vaccine doses administered per 100 people did not reach more than 0.05 in January 2021. Throughout February, this rate remained low before ticking up at the end of the month. The rate increased throughout March, reaching a peak of 0.46 vaccine doses administered per 100 people on March 25. [4] Vaccination rates in both Egypt and Lebanon remained extremely low throughout January, February and March, with Egypt never exceeding 0.01 daily doses administered per 100 people during this time period and Lebanon never exceeding 0.11. [5] Kuwait’s rollout remained low throughout January, but ticked up during February and March, with the daily number of vaccine doses administered per 100 people ranging from 0.16 to 0.30 during that period. [6] The United Arab Emirates is the outlier among these states, with relatively high rates of vaccination at the beginning of January that persisted throughout February and March. [7]

The delayed rollout of vaccines in MENA could lead to either an increase or a decline in opposition over time. Users may develop stronger views on vaccination as vaccines are being rolled out. Fridman et al.
(2021) observed a decline in the intention to seek a COVID vaccine in a six-month study between March and August 2020. Alternatively, when vaccinations are shown to be beneficial both inside and outside MENA (where vaccinations have been rolled out in larger numbers and COVID cases are seen to be coming down), opposition towards vaccination may decline.

Figure 13: Share of influencers by vaccination attitude: Dec-Feb (left) vs. Feb-Apr (right)

To assess whether vaccine opposition in MENA has changed over time, we estimate opposition rates for two periods of time. The first period corresponds to December 2020 to February 2021, while the second period runs from February 2021 to April 2021. Figure 13 plots the results. The percentage of accounts (retweet share) tagged as demonstrating negative attitudes towards the vaccine is 3% (3%) in the first period and 3% (4%) in the second. In sum, the increase in vaccine opposition over time is found to be negligible. Given the disparate rates of vaccine administration in each of the countries from which our data was collected during this period, it is difficult to assess the effects of the vaccine rollout itself on anti-vaccine sentiment.

[4] https://ourworldindata.org/coronavirus/country/saudi-arabia
[5] https://ourworldindata.org/coronavirus/country/lebanon; https://ourworldindata.org/coronavirus/country/egypt
[6] https://ourworldindata.org/coronavirus/country/kuwait
[7] https://ourworldindata.org/coronavirus/country/united-arab-emirates

6. Concluding remarks

This study investigates whether Twitter data can be used to infer attitudes towards COVID vaccination with an application to the Arabic speaking world. Our objectives are three-fold. First, to estimate the prevalence of anti-vaccine attitudes using Twitter data. Second, to identify which messengers and what arguments are found to be most effective.
Third, to assess how Twitter-based estimates of vaccine hesitancy compare to estimates obtained from survey data.

The following findings stand out. At first glance, anti-vaccine sentiment estimated from Twitter data is surprisingly low in comparison to estimates obtained from survey data. Only about 3 percent of Twitter accounts in our database are identified as anti-COVID-vaccination (compared to 20 to 30 percent of survey respondents). The most effective messages on the anti-vaccine side highlight claims that the vaccine causes serious life-threatening side effects. In the pro-vaccine camp, tweets containing content showing public figures receiving the vaccine are found to have the largest reach by far. This observation sits well with the earlier study by Alatas et al. (2019), which finds that Twitter endorsements of vaccination by celebrities are most effective. The study furthermore finds that celebrity endorsements can influence beliefs about vaccination. With respect to messengers in our application to the MENA region, we find that state-news accounts receive the overwhelming majority of retweets in our database.

To explain why our estimates of anti-vaccine attitudes are visibly lower than survey-based estimates, which are found to range approximately between 20 and 30 percent, we consider three candidate sources of bias. First, Twitter accounts include both individuals and organizations, whereas survey respondents consist of individuals only. The organizations in our sample are primarily news organizations, NGOs, private companies, law firms, government entities, and international bodies, all of which are more likely to express overtly positive attitudes towards vaccination. Individuals, on the other hand, are more likely to hold anti-vaccine attitudes and express them explicitly online. Second, Twitter users are a selective sub-sample of the population biased towards higher levels of education, higher income and greater political engagement.
Third, the vaccine rollout during the period in which the data for our sample were collected was slow and uneven. Users may therefore not have developed strong views on vaccination during this period, nor have had the incentive to express these views on Twitter.

Most of the difference between the Twitter-based and survey-based estimates of COVID vaccine hesitancy can be accounted for by restricting Twitter accounts to those belonging to individuals. Using the survey data, we furthermore find that vaccine hesitancy declines with the respondent’s level of education. Restricting the survey samples to individuals who completed a secondary degree or higher, we obtain survey-based estimates of vaccine hesitancy that are of the same magnitude as Twitter-based estimates of hesitancy (among accounts belonging to individuals).

Three possible directions for future work include: (a) extending this analysis to other regions of the world, starting with the Twitter discourse on COVID vaccination in English (which will be an order of magnitude larger in volume), (b) extending the analysis to other social media platforms, including the discourses observed on public Facebook pages, and (c) studying the effectiveness of COVID vaccination campaigns.

Bibliography

Alatas, V., Chandrasekhar, A., Mobius, M., Olken, B. and C. Paladines (2019), “When celebrities speak: A nationwide Twitter experiment promoting vaccination in Indonesia.” NBER Working Paper 25589.

Eurofound (2021), “Living, working and COVID-19 (Update April 2021): Mental health and trust decline across EU as pandemic enters another year.” April 2021 Factsheet.

Fridman, A., Gershon, R. and A. Gneezy (2021), “COVID-19 and vaccine hesitancy: A longitudinal study.” PLOS ONE 16(4): e0250123. https://doi.org/10.1371/journal.pone.0250123

Germani, F. and N. Biller-Andorno (2021), “The anti-vaccination infodemic on social media: A behavioral analysis.” PLOS ONE 16(3): e0247642.
https://doi.org/10.1371/journal.pone.0247642

Jamison, A., Broniatowski, D., Smith, M., et al. (2020), “Adapting and Extending a Typology to Identify Vaccine Misinformation on Twitter.” American Journal of Public Health 110(S3): S331-S339. doi:10.2105/AJPH.2020.305940

Kata, A. (2010), “A postmodern Pandora's box: Anti-vaccination misinformation on the Internet.” Vaccine 28(7): 1709-1716.

Meadows, C., Tang, L. and W. Liu (2019), “Twitter message types, health beliefs, and vaccine attitudes during the 2015 measles outbreak in California.” American Journal of Infection Control 47(11): 1314-1318. doi:10.1016/j.ajic.2019.05.007

Puri, N., Coomes, E., Haghbayan, H. and K. Gunaratne (2020), “Social media and vaccine hesitancy: new updates for the era of COVID-19 and globalized infectious diseases.” Human Vaccines & Immunotherapeutics 16(11): 2586-2593. doi:10.1080/21645515.2020.1780846

Schmidt, A., Zollo, F., Scala, A., Betsch, C. and W. Quattrociocchi (2018), “Polarization of the vaccination debate on Facebook.” Vaccine 36(25): 3606-3612.

Troiano, G. and A. Nardi (2021), “Vaccine hesitancy in the era of COVID-19.” Public Health 194: 245-251.