Policy Research Working Paper 10024

Displacement and Return in the Internet Era
How Social Media Captures Migration Decisions in Northern Syria

Erin Walk, Kiran Garimella, Fotini Christia

Social Sustainability and Inclusion Global Practice
April 2022

Abstract

Starting in 2011, the Syrian civil war has resulted in the displacement of over 80% of the Syrian population. This paper analyzes how the widespread use of social media has recorded migration considerations for Syrian refugees using social media text and image data from three popular platforms (Twitter, Telegram, and Facebook). Leveraging survey data as a source of ground truth on the presence of IDPs and returnees, it uses topic modeling and image analysis to find that areas without return have a higher prevalence of violence-related discourse and images while areas with return feature content related to services and the economy. Building on these findings, the paper first uses mixed effects models to show that these results hold pre- and post-return as well as when migration is quantified as monthly population flows. Second, it leverages mediation analysis to find that discussion on social media mediates the relationship between violence and return in months where there are fewer violent events. Monitoring refugee return in war prone areas is a complex task and social media may provide researchers, aid groups, and policymakers with tools for assessing return in areas where survey or other data is unavailable or difficult to obtain.

This paper is a product of the Social Sustainability and Inclusion Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at ewalk@mit.edu.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Displacement and Return in the Internet Era: How Social Media Captures Migration Decisions in Northern Syria

Erin Walk*, Kiran Garimella*, Fotini Christia++
* MIT IDSS
++ MIT Political Science and MIT IDSS

JEL Classification: F22, F52
Keywords: Social media, Displacement, Refugee return, Syria, Migration, Civil War

This paper was commissioned by the World Bank Social Sustainability and Inclusion Global Practice as part of the activity “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts.” The activity is task managed by Audrey Sacks and Susan Wong with assistance from Stephen Winkler. This work is part of the program “Building the Evidence on Protracted Forced Displacement: A Multi-Stakeholder Partnership”. The program is funded by UK aid from the United Kingdom’s Foreign, Commonwealth and Development Office (FCDO), it is managed by the World Bank Group (WBG) and was established in partnership with the United Nations High Commissioner for Refugees (UNHCR). The scope of the program is to expand the global knowledge on forced displacement by funding quality research and disseminating results for the use of practitioners and policy makers.
This work does not necessarily reflect the views of FCDO, the WBG or UNHCR.

1 Introduction

The Syrian civil war, raging for almost a decade, has led to some of the largest population displacement of our times. Host nations, even those that had followed an open-border policy (such as Turkey, Jordan and Lebanon), have seen their citizens become wary if not outright hostile toward refugees (Kirişci (2014), Nielsen (2016), Aktas et al. (2018), Getmansky et al. (2018), Yahya et al. (2018)). The European Union is militantly protecting its borders against new migrant flows and Turkey has announced the projected return of over one million Syrian refugees in a 20-mile-deep safe zone in northern Syria that includes the areas of Idlib, Tel Abyad, and Azaz (Hoffman and Makovsky (2021), Gall (2019)). The COVID-19 pandemic-induced border closures and the spread of the disease in crowded refugee camps have further increased calls for refugee return. Despite these hopes, in 2020 only 467,000 Syrian refugees returned home, while 1.8 million were newly displaced (NRC (2021)). Though UNHCR intention surveys indicate that most refugees would like to return at some point, many are unwilling to do so under the current conditions, which are further impacted by the large number of IDPs, and report that they have given up hope of being able to return in the next 5 to 10 years (NRC (2021)). Furthermore, many young and educated Syrians in the capital city of Damascus indicate that they would choose to emigrate if given the opportunity (Jalabi (2021)).

This paper analyzes refugee return and internal displacement in the internet era with a focus on understanding how the widespread use of social media captures considerations on return for Syrian refugees. In doing so, it also considers displacement in Syria as a lens through which to view the difficulties returning refugees may face. Though many have analyzed attitudes around return with surveys (Ghosn et al. (2021), Alrababa’h et al. 
(2020), Krishnan (2020)), this paper is, to our knowledge, the first exploration using observational social media text and image data on the prospects for refugee return in northern Syria. Northern Syria is a territory still militarily contested by the Syrian Opposition with the support of Turkey; the Assad regime with the support of Russia; and the Kurdish Autonomous Authority with the support of the US. Neighboring Turkey has also been vocal in plans for re-settling refugees in Turkish-controlled areas of Syria (Hoffman and Makovsky (2021)). The paper draws on original social media data from the three most popular social media platforms in Syria: Telegram, Facebook, and Twitter. Our focus is on attitudes, behaviors, and information dissemination around topics of internal displacement and refugee return. To identify discussions and conditions that reflect internal displacement and return we use unsupervised machine learning (ML) methods to cluster topics of interest from the text and image data. We also run ‘seeded’ topic models, models run on subsets of the data filtered for keywords related to displacement, local governance, service provision, and return. Using insights from the topic models we run two sets of analysis. First, we use mixed effects models to identify changes in topic discussion post-return. Second, we use mediation analysis to understand the role that social media plays in the relationship between local violence and return. We find that discussions of violence are more prevalent in areas with neither returnees nor IDPs, with regime military action, air strike warnings and the anti-ISIS campaign 29–52% more prevalent,1 whereas discussions of services and the economy are more prevalent in areas with returnees and IDPs (an increase between 16% and 73%). 
Images posted in return groups are more likely to be goods for sale including cars, motorcycles, and other miscellaneous items (indicating active trade and commerce, between 79% and 700% more prevalent in areas with only returnees than areas with neither IDPs nor returnees). In non-return areas, images are more likely to be violence-related including militants and tanks (up to 113% more prevalent than in areas with IDPs and returnees). These results hold when enabling correlation between topics and adding time and location effects, as shown using mixed effects models. Furthermore, using more granular data on return and displacement flows we show that increased return is associated with economy topics and negatively associated with violence topics, while increased displacement is positively associated with violence topics. Finally, we use causal mediation analysis to show that in months with lower violence levels increased discussion of violent events mediates the relationship between violence and return.

1 These percentages reflect the percent difference in mean topic proportion between areas without returnees and IDPs and those with.

In summary, our paper makes the following contributions: First, it shows the benefits of social media data, including images and messages, for fleshing out motives and discussions around displacement and return. Most research to date has been done using surveys and on the ground interviews which are costly and not always possible. By combining survey and other data with social media data, future research can leverage the precision of the former with the ease of remote access of the latter (Singh et al. (2019)). Second, we combine text analysis with image analysis to see to what degree images dovetail with the discourse and what additional information we can extract from this medium.
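The percent differences quoted in the results summary can be illustrated with a small calculation. The sketch below is only an illustration of "percent difference in mean topic proportion": the function name, the choice of which group serves as the baseline, and the per-location proportions are all hypothetical and are not taken from the paper's data or code.

```python
from statistics import mean


def percent_difference(group, baseline):
    """Percent difference of mean(group) relative to mean(baseline).

    Assumption: the baseline group is the denominator; the paper does not
    spell out which group it divides by.
    """
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; percent difference is undefined")
    return 100.0 * (mean(group) - base) / base


# Hypothetical per-location proportions of a violence-related topic.
neither = [0.12, 0.15, 0.09]    # areas with neither returnees nor IDPs
with_both = [0.08, 0.10, 0.06]  # areas with both returnees and IDPs

diff = percent_difference(neither, with_both)
print(f"Topic is {diff:.0f}% more prevalent in areas with neither group")
```

With these made-up inputs the two means are 0.12 and 0.08, so the topic comes out 50% more prevalent in the first group.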
The rest of the paper is organized as follows: First, we discuss the context for our project including the situation in Syria, reasons for displacement and ongoing violence, and relevant literature on displacement, attitudes around return, and social media use. Next, we outline our research design and data collection, including procedures for choosing groups as well as text and image analysis methods. We then present our results and discussion, outlining our findings on return and internal displacement from topic models, images, mixed effects, and causal mediation analysis. We conclude with program and policy implications and suggestions for next steps.

2 The Syrian Context

Since the start of the Syrian civil war in March 2011, Syria has seen unprecedented levels of internal displacement, movement of refugees, and civilian casualties. The war began as a protest movement in 2011 to remove President Bashar Al-Assad from power, a movement that coincided with Arab Spring protests that removed the Egyptian and Tunisian presidents. Prior to the civil war there were already high levels of perceived corruption and diminished trust in public institutions (Bank (2017)). By 2012, the protests had evolved into militarized violence. With the war in its 10th year, over 400,000 Syrians have been killed and 13.2 million Syrians are refugees, asylum seekers, and internally displaced people, accounting for one sixth of the global total and 80% of the total Syrian population (UNHCR (2020)). Of these 13.2 million, 6.6 million are registered refugees, about two thirds of whom reside in Turkey and five sixths of whom reside in countries bordering Syria, the majority not in refugee camps (UNHCR (2021)). The war has exacted a heavy economic and social toll both on Syria and its neighboring countries (Bank (2020b), Bank (2017)). The Syrian civil war is also complex in terms of foreign intervention. 
Since the early stages of the war many foreign parties have been involved in the conflict including Russia, Iran, and the Lebanese militia Hezbollah on behalf of the regime (Phillips (2016)), and states in the Gulf and Turkey on behalf of competing opposition factions. The US has intervened in support of the Kurdish Syrian Democratic Forces (SDF) and International Coalition partners against ISIS. Meanwhile, foreign fighters and mercenaries have joined the conflict primarily on behalf of jihadist groups including ISIS (Mitts (2019)). Syrians have been displaced by air strikes and conflict in their home cities, including takeover by militants and regime forces, a situation made even more volatile by the constantly changing factions and areas of control. There has also been a notable decrease in economic opportunity due to destruction of cities and roads as well as sanctions aimed at weakening the regime.2 The coalescing of these various groups and interests makes the situation, and return prospects, in northern Syria incredibly complex. Turkey has faced a lack of stability and domestic political strife over the influx of refugees since 2011, yet as the situation drags on many refugees seem less likely to return. In 2017 the annual Syrians Barometer poll conducted by Murat Erdoğan indicated that 17% of Syrians would not return under any conditions, and in 2019 this had increased to 52% (Erdogan (2019)). Many more indicate they would only return if the war ended and a governance structure which they favored was put into place. In order to stem the flow of refugees, and in some instances force return, Turkey has taken strong offensive actions which limit international humanitarian aid through both stifling opportunity and igniting humanitarian concerns over treatment of Kurds.3 In many areas governing and aid provision are overseen by the governors of neighboring Turkish provinces (Hoffman and Makovsky (2021), Al-Hilu (2019)). 
In Idlib, most of the structure is provided by former Al-Qa’eda affiliated Hayat Tahrir Al Sham, with some collaboration with Turkey. The segmentation of areas of control and the closure of border crossings other than Bab al-Hawa make the work of humanitarian actors in Syria quite difficult. Many actors have channeled aid for their own purposes to control goods provision, and the burden of provisioning aid falls overwhelmingly on Syrian workers who may also be targets for attack (Hall and Todman (2021)). Though satisfaction with the impact of humanitarian aid has increased, organizations still face challenges with regard to incorporating refugees into aid processes and decision making, funding local organizations who can target aid, and coordinating aid provision (Voluntas (2019)). Since the defeat of ISIS by the International Coalition forces in Ar-Raqqa in October 2017 the war in Syria seems to be nearing its end. However, though 75% of displaced Syrians would like to one day return to their former homes (UNHCR (2019)), ongoing violence has resulted in continued displacement (NRC (2021)). Syrians in Damascus, especially the young and educated, indicate they would be interested in emigrating if the opportunity presented itself, but they are unable to do so for monetary and other reasons (Jalabi (2021)). Furthermore, desire to return is linked to strong ties with one’s home country and hometown, ties which are likely to erode the longer the war drags on and the longer individuals are displaced (Ghosn et al. (2021)). Return to Syria in general has been infrequent and selective, though better security and service access increases returns (Bank (2019)). Within Syria some governorates including Deir-ez-Zour, Ar-Raqqa, and Al-Hasakeh have lost a large share of their population (all over 25%) whereas others such as Idlib and Rural Damascus have gained inhabitants (Bank (2019)). 
In the 2015-2016 Syrian Refugees and Host Communities Surveys (SRHCS) most refugees reported less than a week to prepare to leave, with the majority ending up in a neighboring country (Krishnan (2020)). Refugees in neighboring countries are less likely to be highly educated, with only 1% having completed a university degree and most working in wage jobs and construction. In Lebanon, manufacturing, construction, and agriculture are the only sectors available for refugee employment. More Syrian men are able to work in Syria, but refugees have better access to resources than IDPs and residents in Syrian governorates with a high level of conflict. Conversely, refugee children often have worse access to education. Housing conditions in Syria are poor, especially in Idlib, and many returnees lack documents (Bank (2019)). More recently the COVID-19 pandemic has deeply impacted refugee communities in the aforementioned locations. Refugees in host countries were more likely to be living below the poverty line pre-pandemic and refugees were often more highly impacted due to reliance on wage work (Bank (2020a)). Within Syria, Syrians living in all regions are subject to a lack of services including electricity and water, and 60% of Syrians are food insecure (WFP (2021)).

2 Documentation of U.S. sanctions on Syria: https://home.treasury.gov/policy-issues/financial-sanctions/sanctions-programs-and-country-information/syria-sanctions
3 “German NGO scraps Syria project over claims it would aid Turkey’s ethnic cleansing in Afrin,” https://www.kurdistan24.net/en/story/23475-German-NGO-scraps-Syria-project-over-claims-it-would-aid-Turkey%27s-ethnic-cleansing-in-Afrin.

3 Relevant Literature

The situation of Syrian refugees is not unique, with over 80% of global refugees hosted by developing countries bordering the regions refugees flee from (UNHCR (2020)). 
These host countries struggle with service provision and maintaining the requisite bureaucracy to determine refugee legal status, while also facing backlash and anti-refugee sentiment among their own citizens (Lazarev and Sharma (2017), Alrababa’h et al. (2021), Chu et al. (2019), Braithwaite et al. (2019), Bradley (2013)). Refugees may be displaced from their home country for many reasons including economic, political, and environmental factors, and their destination depends on similar factors in potential host countries (Arias et al. (2014), Lischer (2005), Rüegger (2013), Davenport et al. (2003), Zolberg et al. (1989), Martin et al. (2019)). Kunz (1973) codes displacement as anticipatory, when the refugee is prepared to leave before the situation in their home country has degraded, or acute, when refugees are forced to leave due to government failure, violence, or environmental conditions (Kunz (1973)). The relative importance of different displacement conditions varies depending on region and political climate (Singh et al. (2020)). Given area politics, we may expect similarities in displacement between Syria and Iraq, where those topics which generated the most buzz prior to displacement were related to politics, insecurities, and infrastructure (Martin and Singh (2018)). Barring local conditions, violence is often considered the key driver of displacement (Davenport et al. (2003), Zolberg et al. (1989)). In a 2020 CARE survey conducted through interviews with Syrian IDPs, 99% said they had been displaced due to violence or fighting (Hoffman and Makovsky (2021)). The type and prevalence of violence also matters, in that state sponsored violence and genocide are more likely to lead to refugees whereas civil wars are associated more highly with internal displacement (Steele (2019), Moore and Shellman (2006)). 
Individuals who witness, but do not necessarily directly experience, violence are more likely to delay leaving if they receive community support and experience post-traumatic growth (Schon (2019)). Enabling return for the displaced requires a good understanding of how the displaced view their possibilities and options around return (Metivier et al. (2018), Camarena and Hägerdal (2020)) as well as the broader effects of such return on postwar politics (Fabbe et al. (2019)) or on security (Van Leeuwen and Van Der Haar (2016), Camarena (2016), Schwartz (2019)). Willingness to confront danger on the path to return may also be impacted by refugees’ experiences of violence prior to return and belief in their ability to adapt to circumstances on the ground (Ghosn et al. (2021)). As a result, refugees who spent more time in their home country before fleeing, and were thus more likely to have been victims of violence, are more likely to return. In the Syrian context, refugees who are single and male are more likely to return since family ties often play a large role in these decisions (Bank (2019)).

3.1 Return Considerations

Knowledge of the factors which initiate displacement is key to understanding when refugees may return. Individuals are more likely to return once there are economic opportunities, services, and social networks in place to support their transition, the same features which may have prompted movement away from a region when they became unavailable (Aymerich and Zeyneloglu (2019), Hoogeveen et al. (2017), Arias et al. (2014)). In a survey study conducted on 3,003 displaced Syrian families in Lebanon, Alrababa’h et al. (2020) weigh the influence of “pull” factors, which draw individuals back to their former homes, and “push” factors which encourage them to leave their host communities. 
In doing so they find that conditions in a refugee’s home country and access to information about those conditions are a primary driver in decision making (Alrababa’h et al. (2020)). Indeed, economic factors may be even more vital to return choices than sentiment towards or connection with their home country (Camarena and Hägerdal (2020)). Refugees with strong ties to their home country may still choose to return as regular visitors rather than permanently if their new residence offers more attractive economic opportunities. Additionally, refugees are less likely to return to areas with ethnic mixing when ethnic tensions were a factor in initial violence and displacement (Camarena and Hägerdal (2020)). Refugees have complex relationships with their host communities as well as with their home locations. Displacement can be part of a strategy of war, with militants manipulating refugee and IDP movement to gain access to resources or ethnically cleanse certain areas (Lischer (2008)). As part of these strategies refugees may be militarized, spreading conflict to their new locations (Lischer (2008)). More positively, in areas with low public service provision the introduction of refugees to a community may bring goods and services which benefit all the residents (Zhou et al. (2021)). Such community benefits mitigate negative sentiment towards refugee settlements. Identifying effective ways to measure return practices is not straightforward as there are no systematic data sources on refugee return, and many current reports rely on surveys to test the underlying assumptions governing return decisions (Ghosn et al. (2021)). Given the importance of information for decisions around return (Alrababa’h et al. 
(2020)) and gauging living conditions in the home country, leveraging social media sources is of critical relevance because they offer an immediate assessment of returnee preferences and thoughts on return as well as a way to measure how refugees access and consume information that would be pertinent to their return.

Relevance to the call: Our paper contributes to work on refugee resettlement and attitudes from other researchers in this call including Bove et al. (2021), Kaplan (2021), and Parry and Aymerich (2021). Bove et al. (2021) note that refugees are more likely to return to areas with UN peacekeeping missions, as they provide a sense of stability and help with service provision. Local governments and militias in Syria may provide similar services in certain regions (Bove et al. (2021)). Additionally, local peacekeeping agreements, if crafted with the aid of those from all sides, may facilitate safer and lasting return. Such agreements may even work in cases of high tension, such as perceived ISIL affiliation of IDPs, if the IDPs are part of the discussion or trusted leaders work as negotiators for them (Parry and Aymerich (2021)). Finally, strong religious beliefs may increase the likelihood that individuals stay in their home communities even in the presence of violence (Kaplan (2021)), as well as making them more likely to encourage others to return. Our paper augments these studies, which focus on specific factors, by considering return as a whole in terms of discussions occurring in these communities. As a result, we can determine if the factors outlined here are present in social media discourse and to what extent discussion of them differs.

3.2 Social Media and Information Access

Beginning with the protests in 2011, social media has played an integral role in the Syrian civil war leading researchers to refer to it as the “most socially mediated civil conflict in history” (Lynch et al. (2014)). 
In the early stages of conflict platforms such as Twitter, YouTube, and Facebook were used by civilian activists to organize and share images of the protests (Freelon et al. (2015)). As the protests evolved into war, many of the key players became involved on social media. Extremist groups have used Twitter to spread their ideology (Klausen (2015), Wei et al. (2016)), as well as sectarian hate speech (Siegel and Badaan (2020), Abdo (2015)), and propaganda (Chatfield et al. (2015)), while Syrian Opposition Forces have used Facebook to communicate their war narrative (Crilley (2017)). The presence of perspectives from all sides of the conflict leads Gohdes (2020) to note that “the Syrian conflict is one of the first conflicts where lines between offline and on-line conflict engagement have become blurred” (Gohdes (2020)). Outside of the Syrian context, social media data is increasingly used alongside, or even in place of, public opinion surveys, often with highly similar results (Schober et al. (2016)). In addition to Twitter and Facebook, the role of which has been more widely studied (Khamis et al. (2012), Metzger and Siegel (2019)), we extend analysis to include data from Telegram “an encrypted platform that is harder for governments to monitor” (Mitts (2019)). Much of the research on Telegram has focused on its use by ISIS (Prucha (2016), McDowell-Smith et al. (2017), Yayla and Speckhard (2017)), though Telegram’s privacy policies also support civilians and protest movements worldwide (Urman et al. (2020)). Though much work has been dedicated to the role of social media in constructing violence narratives, it also plays a vital role in migration movement (Frouws et al. (2016), Miconi (2020), Sánchez-Querubín and Rogers (2018)). Refugees share information about routes and conditions in potential host countries to ease the journey for future displaced peoples (Frouws et al. (2016)). 
In interviews conducted with 44 young refugee or immigrant Syrian social media users, Miconi (2020) finds that these platforms are used for not only staying connected to war developments but also facilitating resettlement in the host country. A similar Facebook specific study notes that members of the diaspora turn to the platform to maintain social ties (Ramadan (2017)). Given the ubiquity of social media in the conflict, it follows that Syrians use social media not just for the war effort or to learn of migration routes but also to assess local conditions for return. In acknowledgement of the convenience and importance of social media, a recent line of inquiry involves combining social media data with traditional variables to predict displacement (Singh et al. (2019), Abrishamkar et al. (2018)). Singh et al. (2019) rely on Twitter data from Iraq to assess when violent events are taking place using sentiment analysis of tweets with the hashtag “ISIS”. In combining this with traditional movement variables, they improved the accuracy of displacement predictions. To show the flexibility of such techniques, they are adapting a Spanish language model for use on the Venezuelan crisis (Singh et al. (2020)). Similarly, Abrishamkar et al. (2018) adapted a variety of techniques to get a signal of violence from news articles, and used this signal as a factor in predicting displacement (Abrishamkar et al. (2018)). By assessing topic variations in return and non-return areas and areas with and without IDPs, and noting differences in discussion of violence, governance, and services, we show that social media reveals information about return decisions and builds a foundation for adapting such techniques to consider signals which favor return, as well as bringing in new aspects such as image analysis. 
4 Research Design and Data Collection

Our analysis relies on novel data collected from Telegram, Twitter, and Facebook in the latest stage of the war beginning after the recapture of Ar-Raqqa from ISIS on 17 October 2017 and going through 1 December 2020. Group selection processes focus on entities which are located in Syria or primarily discussing issues related to Syria. The success of work on predicting displacement using social media conversation (Abrishamkar et al. (2018)) as well as conclusions that social media can be an effective tool for measuring public opinion (Schober et al. (2016)) indicate that users discuss salient events and opinions on social media and thus we anticipate differences in discussion in areas with and without return or IDPs. Analysis is limited to messages posted in Arabic, as messages in the native language are more predictive of displacement (Singh et al. (2020)). We augment our collection of social media data with more traditional indicators gathered from surveys and interviews by the REACH resource center (REACH (2020)), the UN Office for the Coordination of Humanitarian Affairs (OCHA, OCHA (2021)), and the Armed Conflict Location and Event Data Project (ACLED, Raleigh et al. (2010)).

The REACH resource center presents monthly data dating back to 2018 on a variety of indicators gathered from interviews with individuals living in communities across Syria (REACH (2020)). Not all communities have data points in every report, but any community which is mentioned at least once is included in the data resulting in a total of 3,548 locations. All communities considered in the paper have community level postal codes per the Syrian census.4 Our main indicators are whether or not the community has returnees and/or hosts internally displaced people (IDPs) as reported by the community contacts. We use this dataset as ground truth to build our findings from the social media data. 
However, such information remains limited as there is no indicator for the scale of return. For communities in which return first occurred during data collection we note that month as the ‘return date’. There are no communities in the dataset which first hosted IDPs during the data collection period. Return communities from the REACH data are broadly similar across metrics from the 2004 census such as population, distance from the nearest large city in the sub-district, distance from the border, agricultural employment, and primary ethnicity as noted in Appendix 9.2. Areas with only IDPs have a slightly lower average population in Al-Hasakeh, Aleppo, and Idlib. Though census features are likely outdated after 11 years of war, there is not a more recent official data source for population comparison. The distances are calculated using border and city coordinates from the census and Syria shape files.

Figure 1: Maps of REACH location mentions which are also within the census data. (a) All, coded by return type. (b) Locations with return during collection, coded by return year.

To supplement the REACH data we use data from OCHA on population flows, which is aggregated data collected by several humanitarian aid partners (OCHA (2021)). In total this dataset contains information on 2,882 different locations, of which 1,550 have at least one month of return information and 1,983 have at least one month of displacement information, a map of which is available in Appendix 8.1. Out of 24 available months between January 2019 and the end of our data collection in December 2020, the average location appears three times in the returnee data and six times in the displacement data. Finally, we use data from the Armed Conflict Location and Event Data Project (ACLED), which records all the violent events in a given location taken from newspapers and other sources, in the mediation analysis component (Raleigh et al. (2010)).
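The ‘return date’ coding described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the record layout, the community codes, and in particular the rule used to decide that return "first occurred during data collection" (an earlier report without returnees must exist) are assumptions made for the example.

```python
from collections import defaultdict


def return_dates(records):
    """Map each community to the first month in which returnees were reported.

    Assumption: a community counts as having return "first occur during data
    collection" only if an earlier report for it showed no returnees.
    """
    by_location = defaultdict(list)
    for location, month, has_returnees in records:
        by_location[location].append((month, has_returnees))

    dates = {}
    for location, reports in by_location.items():
        reports.sort()  # 'YYYY-MM' strings sort chronologically
        seen_without_return = False
        for month, has_returnees in reports:
            if has_returnees:
                if seen_without_return:
                    dates[location] = month
                break  # only the first month with returnees matters
            seen_without_return = True
    return dates


# Hypothetical monthly reports: (community code, month, returnees present?)
records = [
    ("C1001", "2018-03", False),
    ("C1001", "2018-06", True),
    ("C1001", "2018-07", True),
    ("C2002", "2018-01", True),   # returnees already present at first report
    ("C3003", "2019-02", False),  # never reports returnees
]
result = return_dates(records)
print(result)
```

Under this rule only the hypothetical community C1001 receives a return date; communities that already had returnees at their first report, or never report any, are left out.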
4 A link to the areas which appear in the dataset is available here: https://docs.google.com/spreadsheets/d/12eFw5iS24saWNcQGDVJYYsKtJX-cZ0O-bZhATEhGEMs/edit?usp=sharing

4.1 Group Selection on Social Media

To examine refugee displacement and return on these platforms, we created parallel processes to identify, collect data from, and analyze accounts, channels, and groups posting public messages about the Syrian conflict on Twitter, Telegram, and Facebook. While this process aimed to limit selection effects to the extent possible, the different natures of the platforms ultimately mean that our samples comprise different populations, and the differences between search functionalities meant there was no single starting point. Our process is replicable and resembles other selection processes common in the use of social media data, so we present source specific models in Appendix 9 to show the selection effects inherent in the study of social media users on any single platform. We limited our analysis to Arabic, the predominant language of use among all actors in the conflict.5 Our final dataset contains messages in Arabic from 657 public channels and groups on Telegram, 2,106 public Twitter accounts, and 2,124 public Facebook groups and pages.

4.2 Identifying Location Specific Messages

The first step of our process was attributing message sets to certain locations. The functionalities of Twitter, Telegram, and Facebook make it difficult to verify the location that users post from in all but a few cases. To understand location specific discussion, we searched all of the collected messages for 2004 Syrian census locations using string matching on the names in Arabic.
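The string matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gazetteer entries, the postal codes (`C0001` and so on), the diacritic normalization, and the word-boundary rule are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical gazetteer: Arabic place name -> made-up postal code.
GAZETTEER = {
    "إدلب": "C0001",   # Idlib
    "منبج": "C0002",   # Manbij
    "سراقب": "C0003",  # Saraqib
}

def normalize(text: str) -> str:
    """Drop Arabic diacritics and tatweel so spelling variants compare equal."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return text.replace("\u0640", "")

def find_locations(message: str, gazetteer: dict) -> set:
    """Return the codes of all gazetteer names that occur in the message."""
    norm = normalize(message)
    hits = set()
    for name, code in gazetteer.items():
        # Assumption: require non-letter boundaries to limit false positives.
        # This misses names with attached prefixes such as "و" or "ب".
        pattern = r"(?<!\w)" + re.escape(normalize(name)) + r"(?!\w)"
        if re.search(pattern, norm):
            hits.add(code)
    return hits
```

A boundary rule of this kind trades recall for precision. Short names that occur inside longer words are a common source of the false positive matches mentioned in footnote 7.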
We augmented this dataset with locations from the GeoNames API6, which includes common misspellings or dialectic spellings of locations.7 The Location Mentions dataset, comprised of the location mentioning messages associated with the REACH data, includes messages from 770 unique locations, 54 of which are mentioned more than 10,000 times, 229 of which have returnees, and 129 of which have IDPs. A heat map showing the distribution of mentions of locations overall, locations with returnees, and locations with IDPs by source is in Appendix 8.1, as are maps with locations mentioned by returnee status and year of return. The majority of locations mentioned in the Location Mentions dataset are in northern Syria, in Idlib, Aleppo, and northern Ar-Raqqa. This is an outcome of REACH data emphasis, as can be seen in Section 4, Figure 1. For 218 of these locations, return first occurred during the time of data collection; these locations form the Return Date dataset. Of these locations, 119 have both returnees and IDPs. In the overall REACH data it is also true that most locations with IDPs also have returnees (26% of locations have IDPs and no return).

Table 1 shows the size of the datasets and the percentage of messages within each dataset that are from areas with returnees and IDPs. For example, in the main Location Mentions dataset there are 2,360,559 messages, of which 1,264,871 are from Telegram. Of the total messages, 16% (370,345 messages) are from areas with only return, 5% are from areas with only IDPs, 62% are from areas with neither, and 17% are from areas with both. Of the messages from locations where return occurred during the time of data collection, 53% (411,694 messages) are from pre-return and 47% are from post-return.

5 To truncate the dataset to Arabic, we first obtained the language of a message using langid (Lui and Baldwin (2012)), an off the shelf tool for detecting language from text. The tool is based on pre-trained machine learning models and can detect over 90 languages. Before beginning data analysis, we performed basic cleaning on the text data using Nielsen’s stemmer (Nielsen (2017)), but prevented the stemming of proper nouns like key locations and political figures. Next, we removed stopwords using a base list of Arabic stop words and some words in Syrian dialect (my brother, where, why, how, etc.), and the words ‘channel’, ‘subscribe’, ‘Telegram’ and ‘Twitter’. The base list of Arabic stop words is from the mohataher arabic-stop-words repository on GitHub, https://github.com/mohataher/arabic-stop-words/blob/master/list.txt

6 http://www.geonames.org/

7 We removed any locations which had the same names as governorates to improve granularity. A few top locations which were disproportionately represented due to false positive matches, ‘mil’, ‘san’, and ‘ada’, were removed.

Table 1: The fraction of messages from return and IDP areas by source for each dataset. The first five data columns describe the Location Mentions dataset; the last two describe the Return Date dataset.

Source | Total | % IDP | % Returnee | % Neither | % Both | Return Date Total | % Pre
Total | 2,360,559 | 5% | 16% | 62% | 17% | 769,745 | 53%
Telegram | 1,264,871 | 5% | 16% | 64% | 15% | 391,356 | 57%
Facebook | 676,580 | 4% | 15% | 60% | 21% | 241,964 | 46%
Twitter | 419,108 | 8% | 15% | 58% | 19% | 136,425 | 56%

We also compiled a set of groups which mention location information in the name of the group or group description (the Researcher Coded dataset). Such groups include @National.Defense.in.maharda on Facebook, @newsmanbij on Twitter, and @saraqib2017n on Telegram. We use this smaller set of 173,906 messages as a robustness check for our results since it includes information we may be missing by using messages with location mentions.
For example, in sales messages individuals might not include their location if they are already posting in a group where the location is apparent. Such messages would not appear in the Location Mentions and Return Date datasets but would appear in the Researcher Coded dataset. For the image analysis we used all images posted in the Researcher Coded dataset groups, a final image set of 24,304 images. Of these images, 921 are from areas with both IDPs and returnees, 9,001 are from areas with only returnees, and 14,382 are from areas with neither. None of the Researcher Coded groups are from areas with only IDPs.

To better understand displacement and return flows rather than solely IDP settlement, we incorporate monthly data from OCHA which is compiled from a variety of humanitarian actors following methodology agreed to by an inter-agency IDP task force (OCHA (2021)). This data contains information on displacement out of and return to communities starting in January 2019. Though this dataset, as discussed above, is somewhat smaller than the REACH dataset, especially in terms of return, it has increased granularity regarding the size of return and displacement movements. For the seeded return and displacement datasets we associated the data with location by postal code. This resulted in a displacement percentage dataset of 41,434 messages from 316 locations and a return percentage dataset of 62,434 messages from 184 locations.

4.3 Methods

Our primary analysis uses unsupervised tools for text and image analysis. For text analysis, we use structural topic modeling (Roberts et al. (2014)) and seeded topic models using words from word2vec (Mikolov et al. (2013)). For image analysis, we use feature extraction and clustering to obtain ‘visual topics.’ Structural topic models identify topics in text data, assigning a probability of belonging in a topic to each word.
We run two unseeded models. The Location Mentions model uses a four-way covariate identifying locations as ‘IDP only,’ ‘Returnee Only,’ ‘Both,’ and ‘Neither,’ which allows for comparability and for exploring heterogeneity. The Return Date model uses a dichotomous return variable for pre- and post-return. The results below display our analysis on models with 20 topics, chosen to maximize topic coherence both mathematically and through observation of models with different topic numbers (Mimno et al. (2011)). For all structural topic model analysis, expert annotators labelled each topic by looking at the 30 most salient keywords in that topic, both in terms of frequency (“F”) and in terms of frequency and exclusivity (“FREX”). Throughout our results, we present the English translations of the topic labels.

For the image analysis we use ResNet-50 (He et al. (2016)), a convolutional neural network pre-trained on the ImageNet dataset (Deng et al. (2009)), to extract image features, which we then cluster using k-means clustering. Our final analysis uses 30 clusters.

The mixed effects models include results from ‘seeded’ topic models obtained by using a subset of messages filtered for keywords about services, local governance, or displacement. The goal of these seeded models is to pull out additional insights which may be obscured in the Location Mentions and Return Date models. For seeded analysis, we worked with a Syrian researcher who provided an initial list of words focused on services, local governance, return, and displacement. We then used word2vec (Mikolov et al. (2013)) with Continuous Bag of Words (CBOW) embeddings to find the 20 most contextually similar terms for each keyword.8 The resulting list, in Appendix 9.3, was manually filtered for relevance, and the expanded keyword list was used to collect all messages that included any of the words.
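As an illustration of the keyword expansion step, the sketch below ranks vocabulary terms by cosine similarity to each seed word and then keeps messages containing any expanded keyword. It is a simplified stand-in, not the authors' pipeline: the two-dimensional toy vectors are invented, whereas in the paper the vectors come from a trained word2vec CBOW model, and the manual relevance filtering is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_keywords(seeds, embeddings, top_n=20):
    """Add the top_n most similar vocabulary terms for each seed keyword."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in embeddings:
            continue
        scored = sorted(
            ((cosine(embeddings[seed], vec), word)
             for word, vec in embeddings.items() if word != seed),
            reverse=True,
        )
        expanded.update(word for _, word in scored[:top_n])
    return expanded

def filter_messages(messages, keywords):
    """Keep messages that contain at least one keyword as a token."""
    return [m for m in messages if keywords & set(m.split())]

# Hypothetical toy embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "services": [1.0, 0.1],
    "electricity": [0.9, 0.2],
    "water": [0.8, 0.3],
    "shelling": [0.0, 1.0],
}
```

With these toy vectors, expanding the seed `services` with `top_n=2` adds `electricity` and `water` but not `shelling`, so only messages mentioning one of the first three terms would enter the seeded dataset.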
The seeded return model includes 672 locations with 477,581 messages from areas without return and 419,090 messages from areas with return, and the seeded displacement model includes 583 locations with 55,737 messages from areas with IDPs and 125,158 messages from areas without. Both models were run with 15 topics.

4.4 Mixed Effects Modeling

For the Return Date model as well as the seeded return model, we run a mixed effects model to identify changes in topic prevalence pre- and post-return while both enabling correlation between topics and accounting for location characteristics and time shocks which occur across all messages. Similarly, for the seeded displacement and seeded return models we run mixed effects models to identify the association between population displacement or return and the percentage of topic discussion. We model the outcome percent $Y_{ij}$ as a function of return in the previous month, with varying intercepts based on subdistrict, month, and topic. We model this as:

$$Y_{ij} = X_{ij}^{T}\beta + Z_{ij}^{T}\delta_j + \epsilon_{ij} \qquad (1)$$

In our specification, $X_{ij}$ is either a dichotomous variable indicating whether there is return in location $i$ in month $j$ or a continuous variable indicating the percent of the 2004 population displaced from or returned to location $i$ in month $j$. $Z_{ij}$ is a matrix including the group-specific intercept, which takes into account the topic, the subdistrict of location $i$, and the month $j$, as well as the values for return, which can vary by topic.9

4.5 Causal Mediation Analysis

Drawing on the importance of information access in return decisions (Alrababa’h et al. (2020)), we explore whether discussions on social media have a causal effect on return. We hypothesize that a high level of violence in a location in a given month affects whether people will return in the following month, with more robust access to information as the causal mechanism.
8 A note on CBOW word embeddings: for a given keyword, the objective of a word embedding is to find a high dimensional representation such that similar words are placed close by in the high dimensional space. Hence, two words which are used in a similar context will be similar in the word embedding space. We also experimented with other word embedding techniques such as FastText (Bojanowski et al. (2016)), which led to less diversity within the list of relevant terms. We ran trials using stemmed and tokenized words, and ultimately used tokenized un-stemmed text with bigrams. We kept stop words for context.

9 We chose this model based upon the hierarchical structure of our data, which naturally had time and location groups. However, the time and location effects did not end up affecting results, either alone or combined, in any of the cases, indicating that the primary variation is between topic groups as well as in the relationship between topic groups and return.

Specifically, a high level of violent events and a higher percentage of total messages about violence will prevent return in the following month because refugees will have access to information about undesirable conditions.

When discretizing the treatment variable we considered the distribution of violent events in different locations. These distributions, a sample of which can be seen on the right of Figure 2, vary greatly, but many are left skewed. However, they have more balanced distributions than overall events, shown on the left of Figure 2. In order to standardize the binary cutoff we use the median because it has a similar meaning across the various distributions and is less susceptible to outliers than the mean. Furthermore, we consider location quantiles rather than month quantiles because return decisions are often motivated by family or other social ties that are location specific (Bank (2019)).
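The location-specific median cutoff can be sketched in a few lines of Python. This is an illustrative reading of the description above rather than the authors' code: the function name, the data layout, and the choice to treat months strictly above the median as high violence are assumptions.

```python
from collections import defaultdict
from statistics import median

def binarize_by_location(events):
    """Map {(location, month): event_count} to {(location, month): 0 or 1}.

    A month is flagged 1 when its count exceeds the median count for that
    same location (assumption: strictly greater than the median).
    """
    counts_by_location = defaultdict(list)
    for (location, _month), count in events.items():
        counts_by_location[location].append(count)
    cutoffs = {loc: median(c) for loc, c in counts_by_location.items()}
    return {
        key: int(count > cutoffs[key[0]])
        for key, count in events.items()
    }
```

Because each location has its own cutoff, a month with 10 events can count as high violence in a usually quiet subdistrict while a month with 50 events counts as low violence in a subdistrict where 65 is typical. That is the sense in which the median has a similar meaning across distributions.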
Thus, one can see return not as returnees making a decision to return and then choosing a location with low violence, but rather as returnees choosing a return location and then deciding whether or not to return based on the conditions in that location.

Figure 2: Distribution of all monthly event counts in the ACLED data, left, and monthly ACLED event counts from a random subset of subdistricts, right.

Using mediation models relies on sequential ignorability. This means, first, that given the observed pre-treatment confounders the treatment assignment is statistically independent of potential outcomes and potential mediators, and second, that there are no unmeasured pre-treatment or post-treatment covariates that confound the relationship between the levels of violence discussion and whether a location has return or displacement. The additional location covariates are district, pre-war population from the 2004 census, and the percentage of people employed in agriculture according to the 2004 census.10 Though the first assumption could be violated by purposeful violence initiated with the goal of population displacement, it is a reasonable one within the given time frame. One such potential violation would be extreme factions displacing citizens from a city as part of a program of ethnic cleansing or to access resources, as discussed by Lischer (2008). The defeat of ISIS in 2017 as well as the rise of Turkey as a key player in the north with a vested interest in enabling return makes purposeful forced displacement unlikely, especially given the range of locations included (Hoffman and Makovsky (2021)). Though Turkey has participated in ethnic cleansing in northern Syria, its program involves re-settling other refugees in these areas. Imai et al. (2011) provide a way to test these assumptions with sensitivity analysis in the no interaction context.

10 Agriculture may also approximate how urban or rural a location is.

Data is grouped by month (20 months, indexed by j) and subdistrict (52 subdistricts, indexed by i).
Not all subdistricts are represented in all months, but all are represented in at least two months.11 There are 19 districts represented and the average number of months of data per subdistrict is just under 10, with a median of 9.

To model the mediation effects we use the mediation package (Tingley et al. (2014)). This technique fits mediator and outcome models, then generates mediator predictions under control and treatment conditions. The outcome model is used to make predictions and calculate the quantities of interest outlined below. The outcome is a binary indicator of whether there is or is not return in a given month:

$$Y_{ij} = \alpha_1 + \beta_1 T_{ij} + \gamma M_{ij} + \delta T_{ij} M_{ij} + \psi_1 X_i^{T} + \epsilon_{1i} \qquad (2)$$

Here the outcome $Y$ is return from the OCHA data, the treatment $T$ is violent events in the preceding month from the ACLED data, the mediator $M$ is discussion of violence in the previous month on social media, and the covariates $X_i$ are log population, agriculture, and district. In addition, we include an interaction term between the mediator and the treatment. We would expect the treatment to interact with the mediator in that a complete absence of violent events in a location would make it unlikely that there is any news about such events. The mediator is a continuous variable, so we use a linear model for the mediator model with the same variables as above:

$$M_{ij} = \alpha_2 + \beta_2 T_{ij} + \psi_2 X_i^{T} + \epsilon_{2i} \qquad (3)$$

In our outcome we have indirect or ‘causal mediation’ effects, ACME ($\delta_t$), the effect of changing the mediator status on the outcome given the same treatment (Imai et al. (2011)). In addition to this, we have the Average Direct Effect (ADE, $\xi_i$), the effect of changing the treatment status while maintaining the same mediator value, and the overall Average Treatment Effect (ATE).12

5 Results

In this section, we present our findings on differences in content sharing in return and non-return areas and areas with and without IDPs.
In Section 5.1, we assess differences in the prevalence of text topics and image clusters.13 In Section 5.2, we add a time element by running a mixed effects model on topic prevalence pre- and post-return to determine if return coincides with changes in discussion. In that section we also consider the quantity of displacement from and return to a region and how this is associated with discourse. Finally, we consider the role which social media information may play in return decisions using mediation analysis.

11 Higher month cutoffs lead to similar results. Two was chosen as the cutoff to minimize loss of data while keeping the violent events quantile measure meaningful.

12 When including interaction terms, the mediation package creates confidence intervals using Quasi-Bayesian estimation and robust standard errors (Imai et al. (2011)).

13 Researcher Coded analysis is in Appendix 9.1.

14 The 99% significance level applies for all results discussed in the remainder of this section using Bonferroni correction for multiple hypothesis testing unless otherwise noted.

Overall, we see that areas with return discuss local governance, job opportunities, and non-operational war issues more, and the differences are statistically significant at the 99% level accounting for Bonferroni correction for multiple hypothesis testing.14 In the Location Mentions dataset in Figure 3, the topic related to the economy is nearly 75% more prevalent in areas with both returnees and IDPs as compared to those with neither (+0.011 points), and the one related to goods, numbers and services is slightly over 25% more prevalent (+0.008).15 Those related to politics are also more prevalent, by between 16 and 100 percent (+.004 to +.05 points). War related violence topics, especially those related to the regime, are less prevalent in return areas (-.01 to -.05 points).
Images also support these conclusions. Images shared in accounts from areas with only returnees are between 79% and 700% more likely to be goods for sale, including cars, motorcycles and other items, while images from areas with neither returnees nor IDPs are more likely to be groups of militants (+.026, 113% compared to Both) or tanks (+.022, 49%), as seen in Figure 6.

Adding granularity by considering changes pre- and post-return, we use mixed effects models to find that topics which are more represented post-return include goods and services related topics in both the Return Date and seeded return models (+4% and +1.3% respectively), Figure 7. In addition, violence related topics including air strike topics are less represented both post-return and when there is more return in a given location (-1.8% Return Date, -3.8% seeded return with return percentage), Figures 7 and 8. Increased displacement, on the other hand, is associated with violence topics including anti-ISIS campaign and air strikes (+0.33 and +0.38 respectively), Figure 8.

In general, the clearest signal from social media is on violence-related issues, suggesting that returnees and IDPs can easily look to public groups and pages for information about ongoing conflict to inform return choices. Unsurprisingly, security considerations come first. Building on this insight, we use causal mediation analysis to reveal that information sharing on social media mediates the relationship between violent events in an area and return to that area, Figure 9.

5.1 Content in Return and non-Return Areas

We first consider a model run on the Location Mentions dataset. This model was run with a ‘four way’ covariate including information about returnees and IDPs. The covariate equaled 1 for areas with both, 2 for areas with returnees and without IDPs, 3 for areas with IDPs and without returnees, and 4 for areas with neither.
For a holistic analysis, we considered the prevalence of image clusters given the same ‘four way’ covariate. Images are from the Researcher Coded groups, in which no locations with only IDPs are represented. The Location Mentions STM model is dominated by violence related topics, motivating the use of the models discussed later in this section to attain additional insights. The most common words for each topic are in Figure 4, and the Arabic versions are available in Appendix 8.2. For the image analysis, sample clusters can be seen in Figure 5.

As shown in Figure 3, discussion around organized war violence, such as “Regime military action, Air strike”, is more represented in areas without return and without IDPs, with an increase of 0.045 between neither and both (52% more prevalent). Other violence related topics such as “Air strike warning” (0.05, 29%), “Liberation army, Regime military” (0.029, 49%), and “anti-ISIS campaign” (0.01, 33%) are also more prevalent in areas without returnees or IDPs. This result supports research indicating that individuals are unlikely to want to return to areas with ongoing violence and military campaigns, especially if they were exposed to violence prior to leaving (Ghosn et al. (2021)). Image sharing aligns with the message outcomes, with the two clearly violence related topics ‘Militants, army’ (+.026, +113% compared to Both) and ‘Construction vehicles, tanks’ (+.022, 49% compared to Both) more represented in areas without returnees or IDPs, as can be seen in Figure 6.

15 For the rest of the paper, percents are calculated as the percent difference between the topic proportion for the covariate level with the lower value and the topic proportion for the covariate level with the higher value. For example, a topic with mean proportion 0.05 in an area without returnees and 0.1 in an area with returnees would be noted as a 100% increase in areas with returnees.
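The percent convention in footnote 15 amounts to a relative difference taken against the lower of the two topic proportions. A minimal Python sketch, with a hypothetical function name and the footnote's own numbers as the example:

```python
def percent_difference(lower: float, upper: float) -> float:
    """Relative increase from the lower topic proportion to the upper one, in percent."""
    if lower <= 0:
        raise ValueError("lower proportion must be positive")
    return (upper - lower) / lower * 100.0

# Footnote 15 example: 0.05 without returnees, 0.1 with returnees.
example = percent_difference(0.05, 0.1)
```

Reported point differences such as +0.011 are the numerator `upper - lower`, so the same absolute gap produces a larger percentage for a rare topic than for a common one.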
Figure 3: Topic prevalence based on presence of returnees and IDPs, Location Mentions dataset. The horizontal axis is the estimated topic proportion (0 to 0.25) for each of the 20 topics listed in Figure 4, shown separately for Return Only, IDPs Only, Neither, and Both.

The prevalence of violence-related discussion overall in the Location Mentions models, accounting for 7 of the 20 topics in Figure 4, outlines the persistent nature of violence in the daily lives of Syrians. Social media, especially public groups and channels, serve the population through announcements documenting daily violence and military movements which are ultimately useful for both those currently living in Syria and those choosing whether or not to return. Discussion in areas with only IDPs is often more similar to discussion in areas with both returnees and IDPs than to discussion in areas with neither. Though images are often used alongside, or in the place of, text to convey a message, explicitly violence related topics are less represented in the image analysis than in the text. This may be the result of filtering choices by the different platforms to remove violent or disturbing imagery.16

For most violence related topics, discussion in areas with only IDPs is more similar to discussion in areas with both returnees and IDPs, whereas discussion in areas with just returnees is more similar to discussion in areas with neither returnees nor IDPs.
Individuals fleeing violence often have little choice in their destination (Bank (2019)), but if fleeing violence are unlikely to remain in areas where it is prevalent. Returnees, on the other hand, are returning in part because of ties to their home town (Ghosn et al. (2021)) and thus may be willing to endure more uncertainty to be there. In fact, a UNHCR monitoring survey of voluntary returnees found that 54% returned to Syria to reunite with their families (Hoffman and Makovsky (2021)).

The central role that violence plays also underscores the importance of using multiple platforms. As can be seen in Appendix 9, the majority of violence related content is overrepresented on Telegram, while much of the economic and job related content is overrepresented on Facebook. By using a variety of data sources we are getting a richer image of the situation on the ground.

16 Twitter transparency site: https://transparency.twitter.com/ and Facebook Community Standards: https://transparency.fb.com/policies/community-standards/

1. Regime military action, Air strike (237,592 messages)
F: rural, Idlib, side, cannon, rural_Idlib, Rikh, shelling
FREX: bombardment, countryside_Idlib, Idlib, artillery shelling, shot_war, hit_cannon, shot_rikh

2. Description, names (41,649 messages)
F: Muhammad, Hamad, Sheikh, Hassan, Ahmed, good, girl
FREX: I know, Master, I saw, Thank God, I said, praised_praised, Hadal

3. Idlib, roads, governance (74,131 messages)
F: City, camp, cities, road, Idlib, Idlib, bridge
FREX: Boil, Kfar Takh, Kfar Takh_Rima, Mahal_City, Idlib_City, City_Sarmed, place_religion

4. anti-ISIS campaign (230,185 messages)
F: Monastery, Sri, East, Dimkar, SDF, Sri_Dimkar, Raqqa
FREX: Brif_Deir, Ain_Issa, rural_Raqqa, rural_Hasakah, Lazar_East, Raqqa_Sri, Demqar_arrested

5. Air strike warning (120,025 messages)
F: Accurate, reach, possible, accurate, possible_reach, flying, war
FREX: Accurate, accurate, possible_reach, flying, accurate_accurate, war_flying, sanctity

6. Aleppo news (24,223 messages)
F: Aleppo, countryside, west, rural Aleppo, leave, countryside of Aleppo, points
FREX: Rural_Aleppo, Aleppo_west, city_of_Aleppo, Aleppo_side, friend_friend, Aleppo_road, Tarnab

7. Liberation army, Regime military (40,165 messages)
F: Army, Mujahid, Liberation, Erh, Idlib, walking, armed
FREX: Mahar_Brief, Telegram, shutter_telegram, telegram_bridge, army_controlling, walk_er, er_complex

8. News reports (145,970 messages)
F: Confidential, sketch, corridor, class, confidential, across report, NA
FREX: attachment_report, inherit, inherit_press, rent, shutter_telegram, ether, sight_attachment

9. War news, war reporting (61,719 messages)
F: War, east, military, Damascus, army, Erh, side
FREX: Eastern, war_centered, eastern_Aust, eastern_Marj, east_bombing, east_rear, conscience_askar

10. Economy, weather (269,595 messages)
F: Stair, price, stoop, lower, east, side, west
FREX: rate, precipitation, light, gasoline, partial, rise, wet

11. Afrin, Turkish-Kurdish politics (87,141 messages)
F: Afrin, occupy, hero, mercenary, debtor, witness, Reikh
FREX: mercenaries, hero_honor, occupied_mercenaries, Rech_martyrdom, Team_hamza, mercenary_occupation, martyrdom_hero

12. Foreign intervention (128,413 messages)
F: undercover, America, anchor, confidential, let, united, saddle
FREX: please, Washington, Jin_constitution, ok_please, Tayyib_Ardagh, Rais_Rajab, United_Rick

13. ME politics, Assad, Lebanon (87,026 messages)
F: people, chief, company, Arabs, Lebanon, party, council
FREX: Mr. President, People's Assembly, Palestinian, Ba'ath_Arab, Netni, of the Ba'ath Party, Arab_Shter

14. Horoscopes, description (93,816 messages)
F: Pregnancy, new, good, pressure, partner, horoscope, cancer
FREX: cancer, pregnancy_neese, your time, neez_pressure, pressure_carry, can be, NA

15. Air strikes, damage, civilians (125,279 messages)
F: Bird, war, Idlib, bombing, bird_war, explosive, fun
FREX: explosive, explosive, merry_plane, barrel, wounded man, Idlib summary, barrel bomb

16. Goods, coronavirus, numbers (167,692 messages)
F: Corn, numeral, manager, company, servant, new, reward
FREX: press_cren, new_log, send, new_press, send_cren, kern_rising, sterilization

17. Regime military actions (74,977 messages)
F: Assad, countryside, window, ower, cannon, do shield, NA
FREX: to fall, Fee_Shield, Pearl_Tiger, Fall_Of, Bomb_Mine, Exhaust_Trade, Land_Mine

18. Religion, texts (219,306 messages)
F: same, much, all, debt, less, earth, wealth
FREX: I want, make, you turn, you were, dict, let down, location

19. Children, civilians (79,470 messages)
F: child, civilian, bombardment, circle, hospital, witness, city
FREX: Ratif_Haseel, child_Mohammed, patient_hospital, city_Arab, transfer_hospital, Kafr Batn, hospital_hospital

20. Crimes, investigations (45,349 messages)
F: condition, kill, person, carry, poison, investigation, prison
FREX: betrayed, crime_murder, confess, grimm, thousand_breaths, disappear, rape

Figure 4: Location Mentions Topics with four way covariate. “F” indicates words that are most frequent in each topic. “FREX” indicates words that are both frequent in and exclusive to each topic. Message numbers are the total number of messages from all sources combined for that topic. All results are translated from Arabic. Arabic version in Appendix 8.2.

Topics in the Location Mentions model which are more prevalent in areas with returnees and IDPs (Figure 3) are largely related to foreign affairs, governance, politics, and the economy, such as “Foreign intervention” (both +0.05, 100%), “ME politics, Assad, Lebanon” (return +0.006, 17%), “Idlib, roads, governance” (both +0.004, 16%), “Economy, weather” (both +0.011, 73%) and “Goods, coronavirus, numbers” (both +0.008, 28%). Discussion of religion is also more prevalent in areas with return, suggesting that results from Bove et al. (2021) are also upheld in Syria.
For the economy and goods topics, unlike the violence topics, areas with only returnees are more similar to areas with both returnees and IDPs than areas with only IDPs are. The prevalence of economic discussion in these areas supports research from Camarena and Hägerdal (2020) on the salience of economic considerations in driving longer term return. In addition to text discussion, areas with only returnees have more images of items, which seem to be goods for sale (+0.06, +188% compared to neither), cars (+0.019, +79%), and motorcycles (+0.021, +700%), as seen in Figure 6. This is similar to the buzz driving displacement in Iraq, where the key drivers are politics, insecurities and infrastructure (Singh et al. (2020)). Overall, factors for return in Syria appear to be similar to those for displacement, but in reverse.

The higher prevalence of content related to local governance and goods provision in return and IDP areas, contrasted with the higher prevalence of violent content in non-return and non-IDP areas, indicates that these are important pillars for establishing a healthy community which individuals feel safe returning to. For robustness, we also note that these results are reflected in the Researcher Coded models in Appendix 9.1.

Figure 5: Sample image clusters. (a) Militants. (b) Gatherings and Protests.

Figure 6: Prevalence of images by topic, Return and Non-return areas. The horizontal axis is the estimated topic proportion (0 to 0.2) for each image cluster, shown separately for Both, Return Only, and Neither. We do not show any error bars because all images were used.
Different social media platforms serve disparate populations, and thus have a higher prevalence of different kinds of content. As shown in Appendix 9, Telegram has more of the items, motorcycles, and cars images, whereas tanks and construction vehicles and militants are more represented on Facebook. Twitter includes more people, gatherings and protests. Similarly, Telegram has more violence-related messages, including those related to air strikes and regime military action. Heterogeneity by source may be linked to uses of the platforms by different types of groups, as explored in Walk et al. (2021). Twitter is used frequently by journalists and activists hoping to reach a foreign audience, thus images of protests may incite action beyond Syria. Facebook and Telegram are used more for day-to-day activity and discussion, and Telegram has a large number of groups for buying and selling goods.

5.2 Time and Population Granularity: Mixed Effects

In addition to the above models, we consider a model run exclusively on locations in which return occurred within the time frame under consideration, the Return Date model. For this model we examine the impact of return on discussion proportion using a return fixed effect, random intercepts for topic, location, and time, and random slopes which allow the impact of return to vary by topic. This model allows for correlation between topics, which is likely given similarities between topics, and thus removes multiple hypothesis testing issues. We run the same mixed effects model on the return seeded messages. Finally, we run mixed effects models on the seeded return and displacement topics with a fixed effect for return percentage and displacement percentage respectively. These percentages are found by taking the OCHA population flow data and dividing by the population of each community per the 2004 census.
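The return and displacement percentages described above amount to a simple normalization of monthly flows by a fixed baseline population. A minimal sketch of that step follows. The function name, the month keys, and the population figures are hypothetical, and the actual OCHA and census data layouts are not specified in the text.

```python
from typing import Dict


def flow_percentage(monthly_flow: Dict[str, int], census_population: int) -> Dict[str, float]:
    """Express monthly population flows as a percentage of a community's
    baseline (census) population."""
    if census_population <= 0:
        raise ValueError("census_population must be positive")
    return {
        month: 100.0 * count / census_population
        for month, count in monthly_flow.items()
    }


# Hypothetical community: 20,000 residents in the census, two months of recorded returns.
example = flow_percentage({"2019-03": 500, "2019-04": 1250}, census_population=20_000)
print(example)
```

Because the baseline is a 2004 census count, a percentage computed this way is a scaling device rather than a share of the current population, and it could in principle exceed 100 for communities that have grown or received large inflows.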
Figure 7: Fixed effects for topic changes based on population displacement. (a) Return Date. (b) Seeded Return.

First we consider the pre- and post-return models (Figure 7). Similar to the results in the previous section, the topics more represented post-return include economy-related topics, namely ‘Shops, economy’ (4% increase) in the Return Date model in Figure 7a and ‘Schools, shops, services’ (+1.3%) in the seeded return model in Figure 7b. Both models also include a crimes topic which is more represented post-return (+0.8% Return Date, +2.4% seeded), potentially indicating that crimes are more ‘newsworthy’ and likely to be investigated in these areas. Such investigations could also be a sign of more stable governance. Pre-return topics in the Return Date model skew defense-related, including ‘Air strike, Idlib’ (-1.8%), ‘Military control’ (-0.8%), and ‘Idlib, hospitals, roads’ (-2.9%). Though ‘Idlib, hospitals, roads’ has a slight infrastructure bent, the top words include ‘team,’ which may refer to military squads, ‘transport,’ ‘liberator’ and ‘expand,’ which are military transport related. Pre-return topics in the seeded return model do not provide such a clear picture. Though ‘Foreign intervention, Turkish-Kurdish politics’ (-0.1%) is war-related, the other topics such as ‘Weather’ and ‘Religion, land ownership,’ of which the most representative messages are Quran passages, are typical of casual online discussion. This may be due in part to the fact that we attempted to remove most violence-related content from this model through the keyword filtering process.

Figure 8: Fixed effects for topic changes based on population displacement. (a) Seeded Return, Return Flow. (b) Seeded Displacement, Displacement.

Associating the seeded models with information on return and displacement flows respectively, we find that an increased level of return is associated with increased discussion of ‘Prices, goods, horoscopes’ (+15.3%) in the seeded return model.
Increased return is also negatively associated with discussion of the two air strike topics, ‘Air strike, civilian injury’ and ‘Air strike, Idlib, Assad’ (-3.8% and -3.7% respectively). The disparity between this model and the above model based upon a binary return date shows the importance of obtaining more granular information about population flows. In the seeded displacement model, increased displacement is positively associated with the air strike topic (+0.33%) and anti-ISIS campaign (+0.38%), another military-centered topic. Increased displacement is negatively associated with the ‘Cities, roads’ topic (-0.4%) which, unlike the ‘Idlib, hospitals, roads’ topic above, is primarily infrastructure-related. Top words for this topic include ‘city’, ‘road’, ‘intersection’, and ‘bridge.’ Though the pre- and post-return mixed effects models, population flow mixed effects models, and Location Mentions model in the previous section all lead to similar conclusions, the slight differences between effects indicate the importance of considering a variety of datasets on return and the difficulties in measuring return in a systematic manner.

6 Causal Mediation Effects

Based upon our findings that violence is heavily featured in discussion and military violence specifically is negatively associated with return, we consider whether discussion of violence has a mediating effect in the relationship between violence and return. We find that increased discussion of events on social media plays a statistically significant role in the return outcome in cases where the number of violent events is below the subdistrict median. As can be seen in Figure 9, when the number of violent events is high for that subdistrict, whether or not violence is being reported on social media does not play a role in population flows.
On the other hand, when the location does not have an above-average amount of violence, the quantity of discussion of violent events has a significant effect. In this case, a 1% decrease in discussion of violent events corresponds with a 2.6% lower return rate. In both cases the direct effect is larger than the mediation effect, with an above-median amount of violent events corresponding to 7.5% lower return rates in control and 5.5% in treatment. The overall proportion of the effect that is mediated is 29% in the control case, whereas in the treatment case it is only 7%, meaning that the quantity of discussion of violence accounts for 29% of the overall decrease in return when violence is below the subdistrict median.17

Figure 9: Average Causal Mediation Effects, Average Direct Effects, and overall Average Treatment Effect of social media on the relationship between violence and population flows with 95% confidence intervals. Social media has a mediating effect in the control case, but the effect is not statistically significant in the treatment case.

Discussion on social media may be a proxy for higher levels of event coverage overall. Higher levels of violence could indicate that there is more information readily available, whether through articles, social media, or word of mouth, and thus discussion on social media does not change behaviors. However, when that is not the case, more discussion of violence on social media could increase the salience of certain events. Thus, even if there is not a significantly high level of violence for that location in the given month, the discussion signals to potential returnees that the area is not safe and people choose not to return.

To approximate the model’s susceptibility to deviation from the sequential ignorability assumption, we use the same model as above run without the interaction term.
Sensitivity analysis indicates how much the mediation effect would change based on different levels of correlation between the outcome error and the mediator error. In Figure 10, the right image plots this correlation value, ρ, on the x-axis and ACME on the y-axis. From this chart, we can see that under both the control and treatment conditions the direction of the ACME holds provided that the correlation is greater than -0.1. This threshold is not very large, so these results could be sensitive to violation of the assumption. We also consider an alternative form of sensitivity analysis based on $R^2$ (Imai et al. (2010)). Appendix Figure 21 shows the proportion of unexplained variance that is explained by an unobserved pre-treatment confounder, $R^{2*}$, on the top row and the proportion of the original variance explained by the same unobserved confounder, $\tilde{R}^2$, on the bottom row. On the left we see the case where the correlation is negative, meaning the confounder affects the outcome and mediator in opposite directions, and on the right where it is positive. In both cases conclusions about the sign of the ACME are robust to the confounder. In the case where the correlation is negative the results are not perfectly robust, in that the ACME could equal zero; however, the confounder would have to explain a large proportion of the variance for this to be true. These results show that, for the average case with no interaction effects, even if the sequential ignorability assumption is violated it is unlikely to affect the overall conclusions.

Figure 10: Effects for the model without interactions, left, and susceptibility of the ACME to change based on sensitivity parameter ρ, right. Graph is only shown for the control, since the two are identical in the no interaction case.

17 The causal mediation effect and proportion mediated are also significant when return outcome is quantified as the log number of returnees rather than as a dichotomous variable.
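The proportion-mediated figures reported in this section relate the mediated effect to the total effect. A minimal sketch of that ratio follows, under the simplifying assumption that the total effect is the sum of one average causal mediation effect (ACME) and one average direct effect (ADE). When a treatment-mediator interaction is present, as in the main model here, the total effect pairs the ACME under one condition with the ADE under the other, so this sketch should not be expected to reproduce the reported percentages. The function name and the inputs are hypothetical.

```python
def proportion_mediated(acme: float, ade: float) -> float:
    """Share of the total effect that passes through the mediator,
    assuming total effect = ACME + ADE (no interaction)."""
    total = acme + ade
    if total == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return acme / total


# Illustrative inputs only (not the paper's estimates):
# a mediated effect of -2.0 points and a direct effect of -6.0 points.
share = proportion_mediated(acme=-2.0, ade=-6.0)
print(f"{share:.0%}")
```

The ratio is only interpretable as a share when the mediated and direct effects have the same sign. If they point in opposite directions, the result can fall outside the range from 0 to 1.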
7 Policy and Program Implications

According to the web tracking company Alexa.com, Facebook and Telegram are in the top 10 most used websites in Syria. Making use of the popularity of social media, we show the benefits of using social media data to augment traditional survey methods in assessing conditions and attitudes around internal displacement and refugee return in Syria. Combining a variety of data types, including survey data, images, and messages from multiple social media platforms, opens a window into more casual and everyday discussions occurring in these areas, at a scale which surveys cannot provide. Most research on refugee return (Hoffman and Makovsky (2021)) highlights the importance of practitioners having access to the right type of information, and social media provides one such (biased) source of regular and longitudinal information. We also reveal what topics of conversation are the most prevalent when speaking about locations, including violence, governance, and the economy, and supplement these findings with support from images. Following from these findings, the policy implications fit into two broad areas: investing in social media monitoring and on-the-ground policies.

Surveys are expensive to conduct, and data collection such as that done by REACH or OCHA requires many contacts on the ground. The onset of the novel coronavirus has revealed the importance of leveraging methods for fully remote research inquiries: from design to data collection and analysis. We find that messages posted on social media complement survey data and can be used to understand certain questions regarding refugee return. By leveraging these results, policymakers and program managers can augment current data sources and even supplant them in areas where contacts are not readily made or available.
To address biases, it is important to consider what populations are being reached with data collection and what populations may be left out, such as those without internet access. In addition, researchers should note the ‘public’ nature of social media, and thus choose projects for which public opinions and attitudes are salient. For example, research focusing on highly sensitive or personal issues may benefit from a different data source. Finally, just as in a survey, it is necessary to consider that an individual’s public expression of their views may differ from their personal actions and motives. Using a variety of sources can help mitigate some bias issues, as different sources have different policies on censoring violent material and thus may be used for different purposes.

Our research showcases the granularity at which social media can provide insights. For instance, as one would expect, we find that there is a clear, statistically significant difference in mentions of violence in areas with more return. This trend prevails even while looking at specific subdistricts, thus indicating the geographic granularity at which social media can provide a signal. Another interesting observation from our analysis is the role of Turkey and its mentions on social media. The high prevalence of topics related to Turkey, Kurdish politics, extremism and factions suggests that these are key issues impacting return in northern Syria. Social media data provides a window into discussions around violence and trust in these areas, including the role of foreign powers and the politics involved in the conflict.

Apart from security, discussion of governance, goods provision, and relief organizations is more prevalent in areas to which people have returned. An important difference between return and non-return areas seems to be access to services as well as markets for goods in places of return.
Thus, it is important to think of policies that stabilize the provision of such services and provide for the flow of goods. Those interested in the economic needs of Syrians can look to groups online, including online markets and job postings, for information about what goods are in high demand as well as current inflation levels. Images in these groups also indicate a clear trend that mobility is important for returnees: there is a high prevalence of bike and car sales in areas with return. Ultimately, social media data provides a relatively comprehensive picture of the situation on the ground in Syria, especially when a variety of sources are used. By identifying groups and pages in different regions, program managers can both monitor ongoing events and witness the impact of interventions on discussion and attitudes in real time. Such monitoring is less costly than survey and other methods, and feedback is more immediate.

Next steps. Most of the analysis in this paper focused on descriptive analysis of trends that can be gleaned from publicly available social media data. Even though the descriptive analyses presented in this paper are valuable (Munger et al. (2021)), such insights might be even more relevant for policy makers in a predictive setting. This is one of the areas on which future research may focus. The clear trends in various types of analysis indicating the presence of refugee return and IDPs give us hope that such trends can be used to predict whether future return will occur in a location just by looking at trends in social media discussion. The implication from causal mediation that information sharing on social media plays a role in return outcomes provides additional support for this goal. Given the longitudinal nature of the data, we have the opportunity to clearly identify and quantify changes in tone of the conversation, particularly around the time of return in various locations.
Many sentiment analysis tools for Arabic rely on messages from platforms such as Yelp and thus are unlikely to capture the nuances of opinion in a conflict zone. For this project we ran exploratory analysis with polyglot, which categorizes each word as positive or negative, but averaging the results of such word counts can lead to incoherent results, since the sentiment of a sentence may not be captured by the positive or negative nature of its individual words, as seen in Appendix 9.4 (Chen and Skiena (2014)). Future research may train such a tool on conflict-related social media messages using expert annotators to verify tone. This could be used to study potential conflicts or differences in opinions between returnees and the population already living in an area in order to understand the assimilation of returnees or IDPs in a location. Finally, researchers may wish to explore the sentiment towards refugees in neighboring countries such as Turkey and Lebanon and how social media discussion responds to increased migration. Such work could be augmented with satellite data and other mobility-related data from Google Maps to identify differences in locations where returnees settle and where settlements already exist.

References

Abdo, G. (2015). Salafists and sectarianism: Twitter and communal conflict in the Middle East. Center for Middle East Policy at Brookings.
Abrishamkar, S., F. Khonsari, A. An, J. X. Huang, and S. McGrath (2018). Mining large-scale news articles for predicting forced migration.
Aktas, V., Y. K. Tepe, and R. S. Persson (2018). Investigating Turkish university students’ attitudes towards refugees in a time of civil war in neighboring Syria. Current Psychology, 1–10.
Al-Hilu, K. (2019, 10). Afrin under Turkish control: Political, economic and social transformations.
Alrababa’h, A., A. Dillon, S. Williamson, J. Hainmueller, D. Hangartner, and J. Weinstein (2021).
Attitudes toward migrants in a highly impacted economy: Evidence from the Syrian refugee crisis in Jordan. Comparative Political Studies 54(1), 33–76.
Alrababa’h, A., D. Masterson, M. Casalis, D. Hangartner, and J. Weinstein (2020). The dynamics of refugee return: Syrian refugees and their migration intentions.
Arias, M. A., A. M. Ibáñez, and P. Querubin (2014). The desire to return during civil war: Evidence for internally displaced populations in Colombia. Peace Economics, Peace Science and Public Policy 20(1), 209–233.
Aymerich, O. and S. Zeyneloglu (2019). House damage revisited: How type of damage and perpetrating actor affect intentions and actions of IDPs in Iraq. International Migration 57(2), 65–79.
Bank, W. (2017, July). The toll of war: The economic and social consequences of the conflict in Syria.
Bank, W. (2019, Feb). The mobility of displaced Syrians: An economic and social analysis.
Bank, W. (2020a, Dec). Compounding misfortunes: Changes in poverty since the onset of COVID-19 on Syrian refugees and host communities in Jordan, the Kurdistan Region of Iraq and Lebanon.
Bank, W. (2020b, June). The fallout of war: The regional consequences of the conflict in Syria.
Barberá, P. (2015). Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Analysis 23(1), 76–91.
Bojanowski, P., E. Grave, A. Joulin, and T. Mikolov (2016). Enriching word vectors with subword information. CoRR abs/1607.04606.
Bove, V., J. Di Salvatore, and L. Elia (2021). “What it takes to return: UN peacekeeping and the safe return of displaced people”, Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group.
Bradley, M. (2013). Refugee repatriation: Justice, responsibility and redress. Cambridge University Press.
Braithwaite, A., T. S. Chu, J. Curtis, and F. Ghosn (2019).
Violence and the perception of risk associated with hosting refugees. Public Choice 178(3), 473–492.
Camarena, K. R. (2016). Returning home and worsening the war: The causal effect of refugee return on civil conflict intensity. Technical report, mimeo.
Camarena, K. R. and N. Hägerdal (2020). When do displaced persons return? Postwar migration among Christians in Mount Lebanon. American Journal of Political Science 64(2), 223–239.
Chatfield, A. T., C. G. Reddick, and U. Brajawidagda (2015). Tweeting propaganda, radicalization and recruitment: Islamic State supporters multi-sided Twitter networks. In Proceedings of the 16th Annual International Conference on Digital Government Research, dg.o ’15, New York, NY, USA, pp. 239–249. Association for Computing Machinery.
Chen, Y. and S. Skiena (2014). Building sentiment lexicons for all major languages. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 383–389.
Chu, T. S., F. Ghosn, M. Simon, A. Braithwaite, and M. Frith (2019). The journey home: Flight related factors on refugee decisions to return. APSA.
Crilley, R. (2017). Seeing Syria: The visual politics of the National Coalition of Syrian Revolution and Opposition Forces on Facebook. Middle East Journal of Culture and Communication 10(2-3), 133–158.
Davenport, C., W. Moore, and S. Poe (2003). Sometimes you just have to leave: Domestic threats and forced migration, 1964-1989. International Interactions 29(1), 27–55.
Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE.
Erdogan, M. (2019). Syrians barometer 2019.
Fabbe, K., C. Hazlett, and T. Sınmazdemir (2019). A persuasive peace: Syrian refugees’ attitudes towards compromise and civil war termination. Journal of Peace Research 56(1), 103–117.
Freelon, D., M. Lynch, and S. Aday (2015).
Online fragmentation in wartime: A longitudinal analysis of tweets about Syria, 2011–2013. The ANNALS of the American Academy of Political and Social Science 659(1), 166–179.
Frouws, B., M. Phillips, A. Hassan, and M. A. Twigt (2016). Getting to Europe the WhatsApp way: The use of ICT in contemporary mixed migration flows to Europe. Writing Technologies eJournal.
Gall, C. (2019). Turkey’s radical plan: Send a million refugees back to Syria. The New York Times, 2019.
Getmansky, A., T. Sınmazdemir, and T. Zeitzoff (2018). Refugees, xenophobia, and domestic conflict: Evidence from a survey experiment in Turkey. Journal of Peace Research 55(4), 491–507.
Ghosn, F., T. S. Chu, M. Simon, A. Braithwaite, M. Frith, and J. Jandali (2021). The journey home: Violence, anchoring, and refugee decisions to return. American Political Science Review, 1–17.
Gohdes, A. R. (2020). Repression technology: Internet accessibility and state violence. American Journal of Political Science.
Hall, N. and W. Todman (2021, April). Lessons learned from a decade of humanitarian operations in Syria.
He, K., X. Zhang, S. Ren, and J. Sun (2016, 06). Deep residual learning for image recognition. pp. 770–778.
Hoffman, M. and A. Makovsky (2021, May).
Hoogeveen, J. G., M. Rossi, and D. Sansone (2017). Leaving, staying, or coming back? Migration decisions during the northern Mali conflict. The World Bank.
Imai, K., L. Keele, D. Tingley, and T. Yamamoto (2011). Unpacking the black box of causality: Learning about causal mechanisms from experimental and observational studies. American Political Science Review 105(4), 765–789.
Imai, K., L. Keele, and T. Yamamoto (2010). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science 25(1), 51–71.
Jalabi, S. (2021, May). Attitudes toward emigration in the Syrian capital of Damascus: A survey in three neighborhoods.
Kaplan, O. (2021, June).
“Superstitions and civilian displacement: Evidence from the Colombian conflict”, Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group.
Khamis, S., P. B. Gold, and K. Vaughn (2012). Beyond Egypt’s “Facebook revolution” and Syria’s “YouTube uprising”: Comparing political contexts, actors and communication strategies. Arab Media & Society 15(spring), 1–30.
Kirişci, K. (2014). Syrian refugees and Turkey’s challenges: Going beyond hospitality. Brookings Washington, DC.
Klausen, J. (2015). Tweeting the jihad: Social media networks of Western foreign fighters in Syria and Iraq. Studies in Conflict & Terrorism 38(1), 1–22.
Krishnan, Nandini & Russo Riva, F. . S. D. . V. T. (2020, July). The lives and livelihoods of Syrian refugees in the Middle East: Evidence from the 2015-16 surveys of Syrian refugees and host communities in Jordan, Lebanon and Kurdistan, Iraq.
Kunz, E. F. (1973). The refugee in flight: Kinetic models and forms of displacement. International Migration Review 7(2), 125–146.
Lazarev, E. and K. Sharma (2017). Brother or burden: An experiment on reducing prejudice toward Syrian refugees in Turkey. Political Science Research and Methods 5(2), 201.
Lischer, S. K. (2005). Dangerous sanctuaries: Refugee camps, civil war, and the dilemmas of humanitarian aid. Cornell University Press.
Lischer, S. K. (2008, 10). Security and displacement in Iraq: Responding to the forced migration crisis. International Security 33(2), 95–119.
Lui, M. and T. Baldwin (2012). langid.py: An off-the-shelf language identification tool. In Proceedings of the ACL 2012 System Demonstrations, pp. 25–30.
Lynch, M., D. Freelon, and S. Aday (2014). Syria’s socially mediated civil war. Universitäts-und Landesbibliothek Sachsen-Anhalt.
Martin, S. and L. Singh (2018). Data analytics and displacement: Using big data to forecast mass movement of people.
In C. Maitland (Ed.), Digital Lifeline?: ICTs for Refugees and Displaced Persons. MIT Press.
Martin, S. F., R. Davis, G. Benton, and Z. Waliany (2019). International responsibility-sharing for refugees: Perspectives from the MENA region. Geopolitics, History and International Relations 11(1), 59–91.
McDowell-Smith, A., A. Speckhard, and A. S. Yayla (2017). Beating ISIS in the digital space: Focus testing ISIS defector counter-narrative videos with American college students. Journal for Deradicalization (10), 50–76.
Metivier, S., D. Stefanovic, and N. Loizides (2018). Struggling for and within the community: What leads Bosnian forced migrants to desire community return? Ethnopolitics 17(2), 147–164.
Metzger, M. M. and A. A. Siegel (2019). When state-sponsored media goes viral: Russia’s use of RT to shape global discourse on Syria. Working paper.
Miconi, A. (2020). News from the Levant: A qualitative research on the role of social media in Syrian diaspora. Social Media + Society 6(1), 2056305119900337.
Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
Mimno, D., H. Wallach, E. Talley, M. Leenders, and A. McCallum (2011). Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 262–272.
Mitts, T. (2019). From isolation to radicalization: Anti-Muslim hostility and support for ISIS in the West. American Political Science Review 113(1), 173–194.
Moore, W. H. and S. M. Shellman (2006). Refugee or internally displaced person? To where should one flee? Comparative Political Studies 39(5), 599–622.
Munger, K., A. M. Guess, and E. Hargittai (2021). Quantitative description of digital media: A modest proposal to disrupt academic publishing. Journal of Quantitative Description: Digital Media 1.
Nielsen, R. A. (2017).
Deadly clerics: Blocked ambition and the paths to jihad. Cambridge University Press.
Nielsen, S. Y. (2016). Perceptions between Syrian refugees and their host community. Turkish Policy Quarterly 15(3), 99–106.
NRC (2021, Mar). Syria: Another decade of crisis on the horizon expected to displace millions more.
OCHA (2021). Syrian Arab Republic: IDP movements and IDP spontaneous return movements data.
Parry, J. and O. Aymerich (2021). “IDPs with perceived ISIL affiliation: Can local peace agreements facilitate their return?”, Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group.
Phillips, C. (2016). The battle for Syria: International rivalry in the new Middle East. Yale University Press.
Prucha, N. (2016). IS and the jihadist information highway–projecting influence and religious identity via Telegram. Perspectives on Terrorism 10(6), 48–58.
Raleigh, C., A. Linke, H. Hegre, and J. Karlsen (2010). Introducing ACLED: An armed conflict location and event dataset: Special data feature. Journal of Peace Research 47(5), 651–660.
Ramadan, R. (2017). Questioning the role of Facebook in maintaining Syrian social capital during the Syrian crisis. Heliyon 3(12), e00483.
REACH (2018-2020). Humanitarian situation overview of Syria (HSOS).
Roberts, M. E., B. M. Stewart, D. Tingley, C. Lucas, J. Leder-Luis, S. K. Gadarian, B. Albertson, and D. G. Rand (2014). Structural topic models for open-ended survey responses. American Journal of Political Science 58(4), 1064–1082.
Rüegger, S. (2013). Refugee flows, transnational ethnic linkages and conflict diffusion: Evidence from the Kosovo refugee crisis. In presentation at the RRPP annual conference, Belgrade, Serbia.
Schober, M. F., J. Pasek, L. Guggenheim, C. Lampe, and F. G. Conrad (2016). Social media analyses for social measurement. Public Opinion Quarterly 80(1), 180–211.
Schon, J. (2019).
Motivation and opportunity for conflict-induced migration: An analysis of Syrian migration timing. Journal of Peace Research 56(1), 12–27.
Schwartz, S. (2019). Home, again: Refugee return and post-conflict violence in Burundi. International Security 44(2), 110–145.
Siegel, A. A. and V. Badaan (2020). #No2Sectarianism: Experimental approaches to reducing sectarian hate speech online. American Political Science Review 114(3), 837–855.
Singh, L., K. Donato, A. Arab, T. A. Belon, A. Fraifeld, S. Fulmer, D. Post, and Y. Wang (2020). Identifying meaningful indirect indicators of migration for different conflicts.
Singh, L., L. Wahedi, Y. Wang, Y. Wei, C. Kirov, S. Martin, K. Donato, Y. Liu, and K. Kawintiranon (2019). Blending noisy social media signals with traditional movement variables to predict forced migration. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1975–1983.
Steele, A. (2019). Civilian resettlement patterns in civil war. Journal of Peace Research 56(1), 28–41.
Sánchez-Querubín, N. and R. Rogers (2018). Connected routes: Migration studies with digital devices and platforms. Social Media + Society 4(1), 2056305118764427.
Tingley, D., T. Yamamoto, K. Hirose, L. Keele, and K. Imai (2014). mediation: R package for causal mediation analysis.
UNHCR (2019). Fifth regional survey on Syrian refugees’ perceptions and intentions on return to Syria.
UNHCR (2020, Jun). 1 per cent of humanity displaced: UNHCR global trends report.
UNHCR (2021). Syria regional refugee response. https://data.unhcr.org/en/situations/syria#_ga=2.202775441.1694891000.1621886375-1110293497.1621297771.
Urman, A., J. C.-t. Ho, and S. Katz (2020). “No central stage”: Telegram-based activity during the 2019 protests in Hong Kong.
Van Leeuwen, M. and G. Van Der Haar (2016). Theorizing the land–violent conflict nexus. World Development 78, 94–104.
Voluntas (2019, June).
State of the Syria crisis response: Assessing humanitarian and development challenges.
Walk, E., E. Parker-Magyar, K. Garimella, A. Akbiyik, and F. Christia (2021). Social media narratives on conflict from northern Syria. Unpublished Working Paper.
Wei, Y., L. Singh, and S. Martin (2016). Identification of extremism on Twitter. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1251–1255. IEEE.
WFP (2021, Feb). Twelve million Syrians now in the grip of hunger, worn down by conflict and soaring food prices: World Food Programme.
Yahya, M., C. E. for International Peace, and C. M. E. Center (2018). Unheard Voices: What Syrian Refugees Need to Return Home. Carnegie Endowment for International Peace.
Yayla, A. S. and A. Speckhard (2017). Telegram: The mighty application that ISIS loves. May 9, 332–0.
Zhou, Y.-Y., G. Grossman, and S. Ge (2021, June). “When refugee exposure improves local development and public goods provision: Evidence from Uganda”, Unpublished Working paper. Commissioned as part of the “Preventing Social Conflict and Promoting Social Cohesion in Forced Displacement Contexts” Series. Washington, DC: World Bank Group.
Zolberg, A. R., A. Suhrke, and S. Aguayo (1989). Escape from violence: Conflict and the refugee crisis in the developing world. Oxford University Press on Demand.

8 Appendix

8.1 Location Mention Distribution

Figure 11: Quantity of location mentions on different platforms. [Panels: All, With Return, With IDPs; platforms: Facebook, Twitter, Telegram.]

Figure 12: Maps of locations in Location Mentions dataset. (a) Locations mentioned in the Location Mentions dataset by IDP and returnee status. (b) Return locations mentioned in Location Mentions dataset by year of return. (c) Locations mentioned in Location Mentions and in the OCHA dataset, green is areas with return and red is those with displacement.
8.2 Top Words in Topic Model, English and Arabic

1. Regime military action, Air strike (237,592 messages)
   F: ‫ قصف‬,‫ ريخ‬,‫ بريف_إدلب‬,‫ مدفع‬,‫ جنب‬,‫ إدلب‬,‫بريف‬
   FREX: ‫ مدفع_يسدف‬,‫ يسدف_ريخ‬,‫ حرب_يسدف‬,‫ تقصف_مدفع‬,‫ دلب_جنب‬,‫ بريف_دلب‬,‫تقصف‬
2. Description, names (41,649 messages)
   F: ‫ بنت‬,‫ خير‬,‫ أحمد‬,‫ حسن‬,‫ شيخ‬,‫ حمد‬,‫محمد‬
   FREX: ‫ هدل‬,‫ حمد_حمد‬,‫ قلتل‬,‫ حمدلل‬,‫ شفت‬,‫ سيدن‬,‫بعرف‬
3. Idlib, roads, governance (74,131 messages)
   F: ‫ جسر‬,‫ دلب‬,‫ إدلب‬,‫ طريق‬,‫ مدن‬,‫ مخيم‬,‫مدين‬
   FREX: ‫ محل_دين‬,‫ مدين_سرمد‬,‫ دلب_مدين‬,‫ محل_مدين‬,‫ كفرتخ_ريم‬,‫ كفرتخ‬,‫سلق‬
4. anti-ISIS campaign (230,185 messages)
   F: ‫ لرقة‬,‫ سري_ديمقر‬,‫ قسد‬,‫ ديمقر‬,‫ شرق‬,‫ سري‬,‫دير‬
   FREX: ‫ ديمقر_تعتقل‬,‫ لرقة_سري‬,‫ لزر_شرق‬,‫ بريف_لحسكة‬,‫ بريف_لرقة‬,‫ عين_عيسى‬,‫بريف_دير‬
5. Air strike warning (120,025 messages)
   F: ‫ حرب‬,‫ تحلق‬,‫ ممكن_تصل‬,‫ دقيقت‬,‫ ممكن‬,‫ تصل‬,‫دقيق‬
   FREX: ‫ حرمة‬,‫ حرب_تحلق‬,‫ دقيق_دقيق‬,‫ تحلق‬,‫ ممكن_تصل‬,‫ دقيقت‬,‫دقيق‬
6. Aleppo news (24,223 messages)
   F: ‫ نقط‬,‫ ريف_حلب‬,‫ ترك‬,‫ بريف_حلب‬,‫ غرب‬,‫ ريف‬,‫حلب‬
   FREX: ‫ ترنب‬,‫ طريق_حلب‬,‫ صديق_صديق‬,‫ حلب_جنب‬,‫ مدين_حلب‬,‫ حلب_غرب‬,‫ريف_حلب‬
7. Liberation army, Regime military (40,165 messages)
   F: ‫ مسلح‬,‫ مشي‬,‫ إدلب‬,‫ إره‬,‫ تحرير‬,‫ محر‬,‫جيش‬
   FREX: ‫ مجمع_إره‬,‫ مشي_إره‬,‫ جيش_يسيطر‬,‫ بتلغر_جسر‬,‫ شتر_بتلغر‬,‫ بتلغر‬,‫محر_بريف‬
8. News reports (145,970 messages)
   F: ‫ تقرير‬,‫ عبر‬,‫ سرية‬,‫ أخب‬,‫ صيل‬,‫ رسم‬,‫سري‬
   FREX: ‫ مرفق_بصر‬,‫ أثير‬,‫ شتر_تلغر‬,‫ رينت‬,‫ نرث_برس‬,‫ نرث‬,‫تقرير_مرفق‬
9. War news, war reporting (61,719 messages)
   F: ‫ جنب‬,‫ إره‬,‫ جيش‬,‫ دمشق‬,‫ عسكر‬,‫ شرق‬,‫حرب‬
   FREX: ‫ ضمير_عسكر‬,‫ شرق_مؤخر‬,‫ قصف_شرق‬,‫ شرقية_مرج‬,‫ شرقية_أسط‬,‫ حرب_مركز‬,‫شرقية‬
10. Economy, weather (269,595 messages)
   F: ‫ غرب‬,‫ جنب‬,‫ شرق‬,‫ خفض‬,‫ رتف‬,‫ سعر‬,‫درج‬
   FREX: ‫ رطب‬,‫ يرتفع‬,‫ جزئي‬,‫ بنز‬,‫ نخف‬,‫ هطل‬,‫سعر‬
11. Afrin, Turkish-Kurdish politics (87,141 messages)
   F: ‫ ريخ‬,‫ شهد‬,‫ مدين‬,‫ مرتزق‬,‫ بطل‬,‫ حتل‬,‫عفرين‬
   FREX: ‫ ستشه_بطل‬,‫ مرتزق_حتل‬,‫ فرق_حمز‬,‫ ريخ_ستشه‬,‫ حتل_مرتزقت‬,‫ بطل_شرف‬,‫مرتزقت‬
12. Foreign intervention (128,413 messages)
   F: ‫ أردغ‬,‫ متحد‬,‫ ترك‬,‫ سرية‬,‫ رسي‬,‫ ريك‬,‫سري‬
   FREX: ‫ متحد_ريك‬,‫ رئيس_رجب‬,‫ طيب_أردغ‬,‫ رجب_طيب‬,‫ لجن_دستر‬,‫ شنطن‬,‫أردغ‬
13. ME politics, Assad, Lebanon (87,026 messages)
   F: ‫ مجلس‬,‫ حزب‬,‫ لبن‬,‫ عرب‬,‫ سرية‬,‫ رئيس‬,‫شعب‬
   FREX: ‫ عرب_شتر‬,‫ لحزب_بعث‬,‫ نتني‬,‫ بعث_عرب‬,‫ فلسطيني‬,‫ مجلس_شعب‬,‫سيد_رئيس‬
14. Horoscopes, description (93,816 messages)
   F: ‫ سرط‬,‫ برج‬,‫ شريك‬,‫ ضغط‬,‫ جيد‬,‫ جديد‬,‫حمل‬
   FREX: ‫ فكن‬,‫ يمك‬,‫ ضغط_حمل‬,‫ نيز_ضغط‬,‫ قتك‬,‫ حمل_نيز‬,‫سرط‬
15. Air strikes, damage, civilians (125,279 messages)
   F: ‫ مرح‬,‫ متفجر‬,‫ طير_حرب‬,‫ قصف‬,‫ إدلب‬,‫ حرب‬,‫طير‬
   FREX: ‫ برميل_متفجر‬,‫ ملخص_إدلب‬,‫ أصيب_رجل‬,‫ ببرميل‬,‫ طير_مرح‬,‫ ميل_متفجر‬,‫متفجر‬
16. Goods, coronavirus, numbers (167,692 messages)
   F: ‫ إجر‬,‫ جديد‬,‫ خدم‬,‫ شرك‬,‫ مدير‬,‫ عدد‬,‫كرن‬
   FREX: ‫ تعقيم‬,‫ كرن_رتفع‬,‫ يرس_كرن‬,‫ جديد_برس‬,‫ يرس‬,‫ تسجيل_جديد‬,‫برس_كرن‬
17. Regime military actions (74,977 messages)
   F: ‫ درع‬,‫ أدى‬,‫ مدفع‬,‫ مدين‬,‫ نفج‬,‫ ريف‬,‫أسد‬
   FREX: ‫ لغم_أرض‬,‫ نفج_مفخخ‬,‫ نفج_لغم‬,‫ أدى_لسقط‬,‫ درر_تغر‬,‫ رسم_درر‬,‫لسقط‬
18. Religion, texts (219,306 messages)
   F: ‫ ثرة‬,‫ أرض‬,‫ يقل‬,‫ دين‬,‫ جميع‬,‫ كثير‬,‫نفس‬
   FREX: ‫ مرقع‬,‫ خذل‬,‫ ديكت‬,‫ كنتم‬,‫ ثرتن‬,‫ يصنع‬,‫أريد‬
19. Children, civilians (79,470 messages)
   F: ‫ مدين‬,‫ شهد‬,‫ مشفى‬,‫ أطف‬,‫ قصف‬,‫ مدني‬,‫طفل‬
   FREX: ‫ مشفى_مشفى‬,‫ كفربطن‬,‫ نقل_مشفى‬,‫ مدين_عرب‬,‫ مريض_مشفى‬,‫ طفل_محمد‬,‫رتف_حصيل‬
20. Crimes, investigations (45,349 messages)
   F: ‫ سجن‬,‫ تحقيق‬,‫ تسم‬,‫ تقل‬,‫ شخص‬,‫ قتل‬,‫شرط‬
   FREX: ‫ غتص‬,‫ ختف‬,‫ ألف_نسم‬,‫ جريم‬,‫ عترف‬,‫ جريم_قتل‬,‫مغدر‬
Figure 13: Words for the topics in the Location Mentions model with four way covariate, stemmed Arabic.

1. Insurgency, anti-ISIS campaign (27,374 messages)
   F: east, monastery, QSD, countryside, Raqqa, walking, tunnel
   FREX: Free_control, Sayed_Quneitra, staircase, Hassi_Qurayt, frost_rift, send_welcome, Hasakah_paralyzed
2. Air strike, civilian witness (50,657 messages)
   F: plane, war, plane_war, arch, bombing, city, merry
   FREX: airplane_merry, war_slashing, barrel_blowning, explosive_tendency, sdf_airplane, war_clopping, barrel
3. Shops, economy (61,613 messages)
   F: company, price, electric, expand, oil, cut, race
   FREX: origin_monitor, benz, name_rise, rise_origin, observe_name, price, exchange_rate
4. Names, martyrs (102,790 messages)
   F: Muhammad, Hamad, Hassan, Sheikh, Ahmed, hero, Mustafa
   FREX: Muhammad_Muhammad, Hamad_Mohammed, Hamad_Hamad, Muhammad_Hamad, Aleppo_Hamad, Aleppo_Mohammed, Muhammad_Sahb
5. Aleppo news (65,426 messages)
   F: Aleppo, countryside, rural_Aleppo, countryside_Aleppo, friend, west, old man
   FREX: Kafr_Hamar, Aleppo_Janb, Aleppo_West, Rural_Aleppo, City_Aleppo, Aleppo_Aleppo, Aleppo_Sri
6. Idlib, hospitals, roads (28,301 messages)
   F: city, Idlib, cities, hospital, road, center, team
   FREX: Syrian_Idlib, hit_hit, hit_eye, city_team, hospital_transport, clothed, receive, NA
7. Crimes, children (10,227 messages)
   F: carry, condition, Afrin, meteor, capture, shop, military
   FREX: carry_neez, neez_press, press_carry, military condition, arrest_to, draw_to, section_condition
8. Regime military action (54,338 messages)
   F: Assad, countryside, cannon, walk, side, drive, east
   FREX: Assad_engine, led_to_fall, engine_deek, wounded_arrayed, engine_feteer, killed_injured, managed_to_destroyed, NA
9. Air strike warning (30,049 messages)
   F: accurate, possible, reach, fly, possible_reach, minute, war
   FREX: Take_Kafr, Idlib_city, Hamra, Kafr_Hamra, Sarm_war, flying_east, flying
10. Office, government (53,397 messages)
   F: chief, company, council, director, master, minister, GM
   FREX: Mr. President, Ba'ath_Arab, Baath_Party, Arab_Shooter, Decree_legislation, Legislation, Decree
11. Medicine, health (5,887 messages)
   F: heart, you, stranger, girl, infant, breath, man
   FREX: I know, I saw, I said, I went, it became, you know, a muscle
12. ME politics, Turkey (37,221 messages)
   F: brigade, America, Russia, brigade, leave, united, Erdogan
   FREX: Erdogan, President_Recep, Recep_Tayyip, Washington, Jeff, rick_sri, Recep_Erdogan
13. Religion (22,350 messages)
   F: arab, religion, much, soul, hurricane, earth, say
   FREX: Quran, serial, less, Jesus, Holy Qur'an, poem, tenth century
14. Camps, displacement, violence (26,890 messages)
   F: armor, damascus, poison, Assad, kill, fight, go out
   FREX: Countryside_Daraa, Karak_east, last_newer, Daraa_east, Rint, rural_Homs, Daraa_west
15. Days, weather (23,915 messages)
   F: stairs, west, sea, lower, side, east, flat
   FREX: Push, stair_stair, extend_lower, west_tall, fall_directed, extended_affected, inclined_stair, NA
16. Military control (14,237 messages)
   F: army, military, erh, liberation, armed, leave, Arab
   FREX: army_army, army_dominated, army_dominated, armored, ere_army, entered_army, organized_victory
17. Politics news (23,767 messages)
   F: Kurd, Lebanon, Israel, captive, company, Israel, new
   FREX: Israel, corn_rose, new_press, Israel, party_master, stinky, capturing
18. Air strike Idlib (42,930 messages)
   F: Idlib, countryside, side, shelling, countryside_Idlib, Idlib_side, artillery
   FREX: Telegram_Bridge, Telegram, Shutter_Telegram, Idlib_Syrian, Buy_Telegram, Sarm_East, shotgun_Confirmed
19. Horoscope, description (47,480 messages)
   F: good, new, person, maybe, chances, adverb, analyze
   FREX: Imper_pregnancy, your condition, your surroundings, friend's_sign, good_day, your time, your situation
20. Turkish-Kurdish politics (38,263 messages)
   F: people, secret, wealth, Kurdish, Assad, demagogue, occupy
   FREX: Kurdistan, abundance, our people, Reich_Satsheh, Thartan, Kurdish_Council, Dict
Figure 14: Words for the topics in the Return Date model with a pre- and post-return covariate, English translation.

1. Insurgency, anti-ISIS campaign (27,374 messages)
   F: ‫ نفج‬,‫ مشي‬,‫ لرقة‬,‫ ريف‬,‫ قسد‬,‫ دير‬,‫شرق‬
   FREX: ‫ لحسكة_مشل‬,‫ مرسل_رحب‬,‫ صقيع_رتف‬,‫ حسي_قريت‬,‫ درج_بحر‬,‫ سيد_قنيطر‬,‫لحر_سيطر‬
2. Air strike, civilian witness (50,657 messages)
   F: ‫ مرح‬,‫ مدين‬,‫ قصف‬,‫ ريخ‬,‫ طير_حرب‬,‫ حرب‬,‫طير‬
   FREX: ‫ ببرميل‬,‫ حرب_تسدف‬,‫ سدف_طير‬,‫ ميل_متفجر‬,‫ يسدف_ريخ‬,‫ حرب_يسدف‬,‫طير_مرح‬
3. Shops, economy (61,613 messages)
   F: ‫ ريس‬,‫ قتص‬,‫ نفط‬,‫ أسع‬,‫ كهرب‬,‫ سعر‬,‫شرك‬
   FREX: ‫ سعر_صرف‬,‫ سعر‬,‫ رصد_مسم‬,‫ يرتفع_أصل‬,‫ مسم_يرتفع‬,‫ بنز‬,‫أصل_رصد‬
4. Names, martyrs (102,790 messages)
   F: ‫ مصطفى‬,‫ بطل‬,‫ أحمد‬,‫ شيخ‬,‫ حسن‬,‫ حمد‬,‫محمد‬
   FREX: ‫ محمد_صحب‬,‫ حلب_محمد‬,‫ حلب_حمد‬,‫ محمد_حمد‬,‫ حمد_حمد‬,‫ حمد_محمد‬,‫محمد_محمد‬
5. Aleppo news (65,426 messages)
   F: ‫ شيخ‬,‫ غرب‬,‫ صديق‬,‫ ريف_حلب‬,‫ بريف_حلب‬,‫ ريف‬,‫حلب‬
   FREX: ‫ حلب_سري‬,‫ حلب_حلب‬,‫ مدين_حلب‬,‫ بريف_حلب‬,‫ حلب_غرب‬,‫ حلب_جنب‬,‫كفر_حمر‬
6. Idlib, hospitals, roads (28,301 messages)
   F: ‫ فريق‬,‫ مركز‬,‫ طريق‬,‫ مشفى‬,‫ مدن‬,‫ دلب‬,‫مدين‬
   FREX: ‫ لتلق‬,‫ ملبس‬,‫ نقل_مشفى‬,‫ فريق_مدن‬,‫ إخب_عين‬,‫ حدث_إخب‬,‫سري_دلب‬
7. Crimes, children (10,227 messages)
   F: ‫ عسكر‬,‫ محل‬,‫ قبض‬,‫ نيز‬,‫ عفرين‬,‫ شرط‬,‫حمل‬
   FREX: ‫ قسم_شرط‬,‫ برسم_برسم‬,‫ إلق_قبض‬,‫ شرط_عسكر‬,‫ ضغط_حمل‬,‫ نيز_ضغط‬,‫حمل_نيز‬
8. Regime military action (54,338 messages)
   F: ‫ شرق‬,‫ محر‬,‫ جنب‬,‫ مشي‬,‫ مدفع‬,‫ ريف‬,‫أسد‬
   FREX: ‫ تمكنت_تدمير‬,‫ قتل_جرح‬,‫ محر_فطير‬,‫ جرحى_صفف‬,‫ محر_ديخ‬,‫ أدى_لسقط‬,‫أسد_محر‬
9. Air strike warning (30,049 messages)
   F: ‫ حرب‬,‫ دقيقت‬,‫ ممكن_تصل‬,‫ تحلق‬,‫ تصل‬,‫ ممكن‬,‫دقيق‬
   FREX: ‫ تحلق‬,‫ تحلق_شرق‬,‫ سرم_حرب‬,‫ كفر_حمرة‬,‫ حمرة‬,‫ إدلب_مدينة‬,‫تصل_كفر‬
10. Office, government (53,397 messages)
   F: ‫ جتم‬,‫ زير‬,‫ سيد‬,‫ مدير‬,‫ مجلس‬,‫ سرية‬,‫رئيس‬
   FREX: ‫ مرسم‬,‫ تشريع‬,‫ مرسم_تشريع‬,‫ عرب_شتر‬,‫ لحزب_بعث‬,‫ بعث_عرب‬,‫سيد_رئيس‬
11. Medicine, health (5,887 messages)
   F: ‫ رجل‬,‫ نفس‬,‫ طفل‬,‫ بنت‬,‫ غريب‬,‫ كنت‬,‫قلب‬
   FREX: ‫ عضل‬,‫ بتعرف‬,‫ يصير‬,‫ رحت‬,‫ قلتل‬,‫ شفت‬,‫بعرف‬
12. ME politics, Turkey (37,221 messages)
   F: ‫ أردغ‬,‫ متحد‬,‫ رسي‬,‫ سرية‬,‫ شنطن‬,‫ ريك‬,‫سري‬
   FREX: ‫ طيب_أردغ‬,‫ ريك_سري‬,‫ جيفر‬,‫ شنطن‬,‫ رجب_طيب‬,‫ رئيس_رجب‬,‫أردغ‬
13. Religion (22,350 messages)
   F: ‫ يقل‬,‫ أرض‬,‫ إسل‬,‫ نفس‬,‫ كثير‬,‫ دين‬,‫عرب‬
   FREX: ‫ قرن_عشر‬,‫ قصيد‬,‫ قرآن_كريم‬,‫ مسيح‬,‫ أقل‬,‫ مسلسل‬,‫قرآن‬
14. Camps, displacement, violence (26,890 messages)
   F: ‫ خرج‬,‫ صيل‬,‫ قتل‬,‫ أسد‬,‫ تسم‬,‫ دمشق‬,‫درع‬
   FREX: ‫ درع_غرب‬,‫ بريف_حمص‬,‫ رينت‬,‫ درع_شرق‬,‫ أخر_مستجد‬,‫ كرك_شرق‬,‫بريف_درع‬
15. Days, weather (23,915 messages)
   F: ‫ متسط‬,‫ شرق‬,‫ جنب‬,‫ خفض‬,‫ بحر‬,‫ غرب‬,‫درج‬
   FREX: ‫ تميل_درج‬,‫ تأثر_متد‬,‫ تقعت_مدير‬,‫ غرب_تدل‬,‫ متد_خفض‬,‫ درج_درج‬,‫أرص‬
16. Military control (14,237 messages)
   F: ‫ عرب‬,‫ ترك‬,‫ مسلح‬,‫ تحرير‬,‫ إره‬,‫ عسكر‬,‫جيش‬
   FREX: ‫ تنظيم_نصر‬,‫ دخل_جيش‬,‫ إره_جيش‬,‫ مدرع‬,‫ جيش_يسيطر‬,‫ سيطر_جيش‬,‫جيش_جيش‬
17. Politics news (23,767 messages)
   F: ‫ جديد‬,‫ إسر_ئيل‬,‫ سرية‬,‫ إسر‬,‫ ئيل‬,‫ لبن‬,‫كرن‬
   FREX: ‫ إسر‬,‫ نتني‬,‫ لحزب_سيد‬,‫ ئيل‬,‫ جديد_برس‬,‫ كرن_رتفع‬,‫إسر_ئيل‬
18. Air strike Idlib (42,930 messages)
   F: ‫ مدفع‬,‫ إدلب_جنب‬,‫ بريف_إدلب‬,‫ قصف‬,‫ جنب‬,‫ بريف‬,‫إدلب‬
   FREX: ‫ مسدف_تأكدت‬,‫ سرم_شرق‬,‫ إشتر_تلغر‬,‫ إدلب_سري‬,‫ شتر_بتلغر‬,‫ بتلغر‬,‫بتلغر_جسر‬
19. Horoscope, description (47,480 messages)
   F: ‫ حلل‬,‫ ظرف‬,‫ فرص‬,‫ ربم‬,‫ شخص‬,‫ جديد‬,‫جيد‬
   FREX: ‫ ضعك‬,‫ قتك‬,‫ يمك_جيد‬,‫ برج_صديق‬,‫ محيطك‬,‫ يمك‬,‫أبر_حمل‬
20. Turkish-Kurdish politics (38,263 messages)
   F: ‫ حتل‬,‫ ديمقر‬,‫ أسد‬,‫ كرد‬,‫ ثرة‬,‫ سري‬,‫شعب‬
   FREX: ‫ ديكت‬,‫ مجلس_كرد‬,‫ ثرتن‬,‫ ريخ_ستشه‬,‫ شعبن‬,‫ ثرة‬,‫كردست‬
Figure 15: Words for the topics in the Return Date model with a pre- and post-return covariate, stemmed Arabic.

8.3 User sampling processes for Twitter, Telegram and Facebook

Twitter data. We used a multi-step process to collect data from Twitter accounts focused on Syria. First, we compiled a list of 304 well-known accounts focused on Syria, using a set of keywords as a guide. The complete list of words we used is provided in Appendix 9.3.
From there, we searched through these accounts to see the Twitter “lists” they belonged to (Barberá, 2015), and from these took twelve large and comprehensive lists of users focused on Syria, often curated by journalists and other Syria watchers. We then collected information on the 5,000 most followed accounts from the sub-sample of users generated through this process, and again performed the same filtration. Finally, we manually combed through the user list to ensure that foreign media or political accounts were excluded from the analysis, and that all included accounts explicitly focused on Syria. This process produced 4,061 accounts, of which 2,106 were actively posting after October 17th, 2017, were public facing, and had posted in Arabic.

Telegram data. On Telegram, we collected data only from public channels (one-to-many conversation) and groups (many-to-many conversation).18 Unlike Twitter, there are no publicly aggregated lists of Telegram users, and there is no existing strategy in the literature to collect Telegram data at scale. We devised a two-step strategy that similarly built on an initial, manually compiled list followed by network connections. First, we searched the same 118 keywords for publicly available channels.19 Next, we obtained all posts made in these 269 groups and channels using the Telethon API for Telegram.20 Within these posts, we collected all mentions or links to additional groups or channels, which gave us 1,530 accounts. Finally, we manually filtered all these accounts to include only those identified as purportedly Syrian-run, focused on Syria, and which post within our timeframe about northern Syria in Arabic. This produced a dataset of 657 groups and channels. For each post, we gathered the date of posting along with the number of views the post received.21

18 In the rest of the paper, we do not make a distinction between groups and channels. We refer to all of them as channels. The majority of the groups included were groups acting as marketplaces for buying and selling goods. This type of communication is best suited to a group as all participants are able to post goods for sale and inquire about the goods others are selling. See, e.g., https://www.cnn.com/2018/02/20/middleeast/us-weapons-telegram-syria-intl/index.html.
19 These are crowd-sourced collections of public channels like: https://lyzem.com, https://tgstat.com/, and https://tele.me. These lists are generally not exhaustive.

Facebook data. Finally, to collect data from Facebook, we used the CrowdTangle API.22 Using the API, we first searched through the CrowdTangle database for posts containing the same set of keywords as for Telegram. Then, we sorted the resulting data according to frequency of keyword mentions by various accounts. A Syrian researcher examined the resulting top 4,000 accounts and determined that 2,124 of them were Syria-focused, Syrian run, and concentrated either on any of the seven governorates in our study or on Syria in its entirety. For these 2,124 accounts, we used the API to get all the posts made during the period Oct 2017 to Dec 2020. For each message we collected account information, post time, interactions, and engagement.

Twitter, Facebook, and Telegram’s functionality make it difficult to verify the location that an account posts from. For all three platforms, we first filtered accounts in these lists according to whether their accounts’ self-reported locations or descriptions mentioned Syria or any Syrian governorates in English or Arabic, and then performed a manual check to exclude foreign accounts. This process prevented the inclusion of foreign media or political accounts not exclusively focused on Syria. Our dataset thus includes self-described Syrian users posting in Arabic from inside and outside of Syria.
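The automatic pre-filter described above (keep an account if its self-reported location or description mentions Syria or a Syrian governorate in English or Arabic, then pass the survivors to a manual check) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the term list, the `location` and `description` field names, and the helper names are all hypothetical, since the paper does not publish the exact strings or account schema it used.

```python
import unicodedata

# Hypothetical term list for illustration only; the paper's actual list of
# matched strings (Syria plus governorate names) is not given.
LOCATION_TERMS = [
    "syria", "سوريا", "سورية",
    "aleppo", "حلب",
    "idlib", "إدلب",
    "raqqa", "الرقة",
]


def normalize(text):
    """Case-fold and apply NFKC so compatibility forms compare equal to base letters."""
    return unicodedata.normalize("NFKC", text or "").casefold()


def mentions_syria(account, terms=LOCATION_TERMS):
    """True if the location or description field contains any of the terms."""
    haystack = " ".join(
        normalize(account.get(field)) for field in ("location", "description")
    )
    return any(normalize(term) in haystack for term in terms)


def prefilter(accounts):
    """Automatic step only; a manual review of the kept accounts would follow."""
    return [account for account in accounts if mentions_syria(account)]


# Toy accounts (invented) covering an English match, an Arabic match, and a miss.
accounts = [
    {"handle": "a", "location": "Idlib, Syria", "description": ""},
    {"handle": "b", "location": None, "description": "أخبار حلب وريفها"},
    {"handle": "c", "location": "London", "description": "World news"},
]
kept = prefilter(accounts)
kept_handles = [account["handle"] for account in kept]
```

Plain substring matching of this kind over-includes (a foreign outlet that merely names Syria in its description would pass) and misses spelling variants, which is consistent with the text's reliance on a subsequent manual check.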
However, there is no reason to believe that accounts on any of the platforms, from identified Syrian users posting about Syria, are more or less likely to be inside or outside of the country. Qualitatively, many opposition news sites feature the work of internal correspondents working with editors outside of Syria.

20 See: https://docs.telethon.dev/en/latest/
21 Telegram, being a chat application, does not have public like or share counts. However, messages can be forwarded across public channels and to private chats. The ‘views’ metric captures the total number of views a message accrued over all of Telegram, thus indicating its global popularity (similar to the number of retweets or likes on Twitter).
22 https://github.com/CrowdTangle/API/wiki CrowdTangle is officially a part of Facebook and indexes a large fraction of popular, publicly available groups and pages on Facebook.

9 Models by Source

[Figure 16 plot; panels: Facebook All, Twitter All, Telegram All; x-axis: Estimated topic proportion (0 to 0.15); topics shown: Regime military action, Air strike; Air strike warning; Religion, texts; Description, names; Foreign intervention; Goods, coronavirus, numbers; Liberation army, Regime military; ME politics, Assad, Lebanon; anti-ISIS campaign; Crimes, investigations; Children, civilians; Economy, weather; Idlib, roads, governance; News reports; Aleppo news; Afrin, Turkish-Kurdish politics; Regime military actions; Air strikes, damage, civilians; War news, war reporting; Horoscopes, description.]
Figure 16: Effect of source, Location Mentions model.

[Figure 17 plot; panels: Telegram, Facebook, Twitter; x-axis: Estimated topic proportion (0 to 0.2); image topics shown: Buildings; People misc.; Items; Gatherings, meetings, suits; Solo people, mostly children, injuries; Militants, army; Graphics images; Gatherings, protests; Rubble, dead; Solo people, mostly men; Construction vehicles, tanks; Solo people, camouflage gear; Landscapes; Cars (personal); Cartoons, logos; Writing, white background; Graphics images; Politicians, Assad; Fires at night; Writing; Writing; Writing, white background; Bikes (motorcycles).]
Figure 17: Effect of source, image analysis. No error bars because all images were used.

9.1 Additional Topic Prevalence: Researcher Coded Models

[Figure 18 plot; panels: No Returnees, Returnees; x-axis: Estimated topic proportion (0 to 0.2); topics shown: Armed factions, occupation; Defense, locations; Names, martyrs; War news, alerts; Employment, announcements; Air strikes, Al-Nusra; Economy, shops; Media channels, news; Kurds, governance, relief organizations; Hospital, jobs, organizations.]
Figure 18: Topic prevalence based on presence of returnees, Researcher Coded model.

9.2 Return Location Information

Table 2: Return type for REACH data also in the 2004 census, aggregated by governorate

    four_way     governorate     # locations  population    ethnicity_prim  agriculture  distance_border  distance_city
0   Both         Al-Hasakeh      46           2118.347826   sunnitr         0.778043     23.113008        15.825619
1   Both         Aleppo          117          1412.401709   kurdish         0.502222     15.810422        11.310419
2   Both         Ar-Raqqa        13           3054.384615   sunnitr         1.296154     79.466114        16.943789
3   Both         Deir-ez-Zor     24           6759.583333   sunnitr         0.967917     57.303180        15.555733
4   Both         Hama            3            3057.666667   sunnifam        0.273333     25.016009        2.148169
5   Both         Idlib           88           3025.363636   sunnifam        1.406364     21.087119        6.768372
6   IDP Only     Al-Hasakeh      26           298.384615    kurdish         0.528462     23.389464        18.939988
7   IDP Only     Aleppo          25           694.520000    sunnitr         0.116400     20.622087        11.302980
8   IDP Only     Ar-Raqqa        5            45552.800000  sunnitr         8.126000     84.653970        19.458138
9   IDP Only     Idlib           1            752.000000    alawi           2.610000     0.560642         2.401188
10  Neither      Al-Hasakeh      333          1588.927928   kurdish         0.413273     17.430056        16.013479
11  Neither      Aleppo          500          1388.202000   sunnitr         0.483860     18.767451        10.409180
12  Neither      Ar-Raqqa        174          1672.810345   sunnitr         0.827701     61.080247        21.508317
13  Neither      Dar’a           95           5710.273684   sunnifam        1.621474     27.380196        8.402183
14  Neither      Deir-ez-Zor     95           8441.663158   sunnitr         1.996316     70.134879        15.369757
15  Neither      Hama            91           4050.439560   sunnifam        1.766593     48.037720        8.946009
16  Neither      Homs            19           10122.631579  sunnifam        1.799474     30.391614        4.971255
17  Neither      Idlib           290          2853.072414   sunnifam        0.997207     28.069215        8.170862
18  Neither      Rural Damascus  28           23745.785714  sunnifam        3.254643     27.706400        2.912928
19  Return Only  Al-Hasakeh      145          1207.262069   sunnitr         0.371655     19.016981        16.457821
20  Return Only  Aleppo          106          2119.820755   sunnitr         0.770849     17.654791        14.355594
21  Return Only  Ar-Raqqa        74           2142.986486   sunnitr         1.042568     55.250437        18.310287
22  Return Only  Dar’a           5            15159.800000  sunnifam        2.752000     16.774116        5.463571
23  Return Only  Deir-ez-Zor     23           4904.173913   sunnitr         2.830000     89.920148        22.121496
24  Return Only  Hama            12           3287.666667   alawi           1.505000     40.083442        7.163808
25  Return Only  Idlib           31           5852.612903   sunnifam        1.250000     36.161500        7.304762

Table 3: Return year for REACH data also in the 2004 census, aggregated by governorate

    year         governorate     # locations  population    ethnicity_prim  agriculture  distance_border  distance_city
0   2018         Al-Hasakeh      28           4008.607143   kurdish         0.739286     21.322755        14.893916
1   2018         Aleppo          42           4140.047619   sunnitr         1.499286     18.556167        12.029480
2   2018         Ar-Raqqa        29           2566.310345   sunnitr         1.743448     58.448747        19.715763
3   2018         Dar’a           5            15159.800000  sunnifam        2.752000     16.774116        5.463571
4   2018         Deir-ez-Zor     20           4883.450000   sunnitr         2.724000     90.729891        22.330015
5   2018         Hama            11           3544.454545   alawi           1.641818     36.359788        6.791851
6   2018         Idlib           22           6643.681818   sunnifam        1.590909     37.477727        8.070008
7   2019         Al-Hasakeh      13           824.153846    kurdish         0.368462     17.564086        12.040481
8   2019         Aleppo          58           1223.120690   sunnifam        0.233793     15.215984        12.379789
9   2019         Ar-Raqqa        44           1894.727273   sunnitr         0.598636     54.026432        17.361503
10  2019         Deir-ez-Zor     3            5042.333333   sunnitr         3.536667     84.521861        20.731374
11  2019         Idlib           31           2130.258065   sunnifam        1.156452     24.306679        7.917267
12  2020         Al-Hasakeh      122          1097.573770   sunnitr         0.424016     18.179761        16.403270
13  2020         Aleppo          112          1079.812500   kurdish         0.478125     17.187616        13.235720
14  2020         Ar-Raqqa        13           3054.384615   sunnitr         1.296154     79.466114        16.943789
15  2020         Deir-ez-Zor     24           6759.583333   sunnitr         0.967917     57.303180        15.555733
16  2020         Hama            3            3057.666667   sunnifam        0.273333     25.016009        2.148169
17  2020         Idlib           62           3709.177419   sunnifam        1.478387     21.469372        6.019038
18  pre-2018     Al-Hasakeh      28           558.500000    sunnitr         0.445000     27.762843        19.271699
19  pre-2018     Aleppo          11           2199.090909   kurdish         0.944545     12.211644        12.667772
20  pre-2018     Ar-Raqqa        1            790.000000    sunnitr         0.250000     16.355649        19.297956
21  pre-2018     Hama            1            463.000000    sunnifam        0.000000     81.043640        11.255345
22  pre-2018     Idlib           4            1373.750000   sunnifam        0.000000     16.888717        6.477131

9.3 Search Words for Data Collection and Seed Words

services: الكهرباء, السعر, الأسعار, الكيلوغرام, كيلو, الصرف_الصحي, السباكة, البناء, الإعمار, اعادة_الاعمار, الوظائف, التوظيف, المصانع, افتتاح, مستشفى, مشفيات, المواد_الغذائية, الزراع, عامل, بقايا, ترميم, الفقر, جوع, بيع_وشراء, مركز, اسواق, سوق, محل, بلدية, مجلس_مدينة, مجلس_محلي, التعليم, النظافة, الخدمات_البلدية_ازالة, ازالة, القمامة, الخدمات_الفنية, الاتصالات, مواصلات, طرق, جسور, القيود_المدنية, السجلات_المدنية, الضرائب, الرسوم
displacement: هرب, فتح_حدود, قافلة_الأمل, نزوح_مدنيين, مدنيين, النزوح, النازحين
return: العودة, العائدين, تصريح, حواجز, حاجز, باسبورت, جواز_السفر, تجنيد, العودة الى_بيوتهم, العودة_الى_وطنهم, العودة_الى_مدينتهم
Figure 19: Seeding keywords for return/services and displacement seeded models. These lists were expanded with word2vec and then used to create message sets.

Keywords for data collection across platforms are as follows.
For each key term without an obvious Syrian signifier, a Syrian researcher appended “Syria” or one of the seven governorates in our study. Keywords focus on key locations, conflict trends, and local governance and economic conditions. All search terms translated from Arabic:

• “Syrian War, Syrian Civil War, Bashar al-Assad, Idlib, Aleppo, Raqqa, Qamishli, Deir Al Zour, Hasaka, Hama, Arab Spring, Azaz, Olive Branch, Euphrates Shield, Syrian Civil War, Syrian news, Syrian Democratic Forces, SDF, Syrian National Army, Kurdistan Workers Party, Islamic State of Iraq and the Levant, ISIS, YPG, People’s Protection Units, Mujahideen, Ahrar al-Sham, Tahrir al-Sham, Free Syrian Army, Russia in Syria, Regime, Iran in Syria, smuggling, Syrian Lira, Syrian currency, (Province) markets, Azaz markets, Afrin markets, Situation of displaced persons in areas outside of government control, Haramain camp, Atmh camp, Mahmodla camp, Tall Abyad Camp, IDP camps in (Province), Syrian returnees, Local council in (Location), Cases of return to (Location), Autonomous Administration, People’s municipality in (Location), Civil council in (Location), Military council in (Location), Eid in Syria, Ramadan in Syria, Ramadan in (Province), Holiday Sweets in (Province), Education in (Province), Schools in (Province), Azaz, Tabqa, Baghouz, Albukamal, Al Hasakah, Al Shadadi, Al Mayadeen, Al Houl, Semalka, Semalka Border Crossing, Bab Al-Hawa, Manbij, Jarablus, Al Dana, Bab Al Howa, Afrin, Saraqib, Ma’aret al Nauman, Kobani, Ein al Arab, Atareb, Al bab, Jarablus, Azaz, Jarablos, Aldana, Al Raee”

9.4 Sentiment Analysis of Messages

Columns: Message (Arabic), Message (English), Polarity, Sentiment.

Message 1
Arabic: #مرصد_اخبار_الثورة_السورية عاجل| قوات سوريا الديمقراطية تعتقل أقرباء للجيش الحر قدموا لقضاء إجازة عيد الأضحى من تركيا إلى مدينة منبج. #منبج | #سوريا @mirsadakhbaralthawratalsuwria
English: #Syrian_Revolution_News_Observatory Urgent| SDF arrests relatives of the Free Army who came to spend the Eid al-Adha holiday from Turkey to the city of Manbij. #Manbij | #Syrian Arab Republic @mirsadakhbaralthawratalsuwria
Polarity: 0. Sentiment: Neutral.

Message 2
Arabic: رسالة الآن من أهل كناكر إلى ثوار الشمال يذكر إنَّ عصابات الأسد تبدأ بإقتحام بلدة كناكر بريف دمشق الجنوبي بعد ثمانية أيام على محاصرة البلدة مطالبة العصابات بتسليمها مطلوبين لها و كمية من السلاح. اللَّهم كن عوناً لإخواننا و ثبتهم حسبنا اللَّه و نعم الوكيل يلعن روحك يا حافظ https://t.co/6prdccutmg
English: A message now from the people of Kanaker to the rebels of the North It is reported that Assad's gangs begin storming the town of Kanaker in the southern countryside of Damascus, eight days after the besieging of the town, demanding that the gangs hand over wanted men and a quantity of weapons. Oh God, be of help to our brothers and make them steadfast.
Polarity: 0. Sentiment: Neutral.

Message 3
Arabic: rt @ramashh0: جرائم عبدالحليم لا تنتهي بدءًا من تعاونه مع حافظ لتسليم القنيطرة مروراً بمجازر حماة وحلب ودفنه للنفايات النووية في الصحراء السورية وصولاً لتعديله للدستور ليتمكن بشار من تسلم الحكم، وكل ما فعله خدام بعد انشقاقه هو التمتع بالأموا [...]
English: rt @ramashh0: Abdel Halim’s crimes do not end, starting with his cooperation with Hafez to hand over Quneitra, through the massacres of Hama and Aleppo, and his burial of nuclear waste in the Syrian desert, to his amendment of the constitution so that Bashar can take over the rule, and all that Khaddam did after his defection was to enjoy the money he stole. it now!! https://t.co/c0i5hybwvw
Polarity: 0. Sentiment: Neutral.

Message 4
Arabic: #عاجل || نظام الأسد يعتقل بعض العائدين إلى الغوطة الشرقية و الذين تم تهجيرهم سابقاً للشمال السوري في شهر آذار الماضي .
English: #urgent || The Assad regime arrests some of the returnees to Eastern Ghouta, who were previously displaced to northern Syria last March.
Polarity: 0. Sentiment: Neutral.

Message 5
Arabic: حمص : غارات جوّية من طيران الأجرام الأسدي و طيران الغُزاة الروس استهدفت قرى “سليم والحمرات وعزالدين وديرفول” بالريف الشمالي... لمتابعة مزيد من الاخبار والفيديو والصور الملحقة و حتى يصلك كل جديد , اشترك بشبكة #ديري_نيوز الإخبارية على تلغرام من خلال الرابط التالي : https://telegram.me/derynews
English: chickpeas : Air raids from the warplanes of the Assad regime and the planes of the Russian invaders targeted the villages of “Salim, Hamrat, Ezzedine and Derful” in the northern countryside... :// telegram.me/derynews
Polarity: 0.0125. Sentiment: Positive.

Figure 20: Examples of sentiment miscoding for messages. A human annotator would code all of these as negative.

Figure 21: Alternative measure of sensitivity analysis assessing proportion of unobserved variance explained by unobserved confounder, $R^{2*}$ (top), and proportion of original variance explained by unobserved confounder, $\tilde{R}^{2}$ (bottom). Shown for negative correlation (left) and positive correlation (right). Results shown for control only since control and treatment are identical in the no-interaction case.
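Returning to the miscoding examples in Figure 20, the following minimal sketch shows how a numeric polarity score might be turned into the three labels in that figure. The cutoff rule and the `neutral_band` parameter are assumptions made for illustration; the appendix reports only the scores and resulting labels, not the thresholds or the sentiment tool's internals.

```python
def label_from_polarity(polarity, neutral_band=0.0):
    """Map a polarity score to a coarse sentiment label.

    neutral_band is a hypothetical tolerance around zero; with the default
    of 0.0, any strictly positive score is labeled Positive.
    """
    if polarity > neutral_band:
        return "Positive"
    if polarity < -neutral_band:
        return "Negative"
    return "Neutral"


# Polarity scores reported for the five messages in Figure 20.
figure_20_polarities = [0, 0, 0, 0, 0.0125]

# Under the assumed rule, this reproduces the labels shown in the figure.
labels = [label_from_polarity(p) for p in figure_20_polarities]

# Widening the neutral band (0.05 is an arbitrary illustrative value)
# relabels the fifth message, but none becomes Negative.
banded = [label_from_polarity(p, neutral_band=0.05) for p in figure_20_polarities]
```

Under either setting, no message receives the Negative label a human annotator would assign, so the miscoding originates in the polarity scores themselves rather than in where the cutoffs are placed.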