Policy Research Working Paper 10583

Geospatial Analysis of Displacement in Afghanistan

Anais Dahmani-Scuitti
Erwin Knippenberg
Walker Kosmidou-Bradley
Johanna Lee Belanger

Poverty and Equity Global Practice
October 2023

Abstract

Given increasing levels of displacement due to conflict and climate change, it is important to establish robust monitoring systems. This paper explores how remote sensing data, particularly geospatial data, can be leveraged to monitor displacement flows. It draws lessons from northeastern Afghanistan, namely the 2018 drought, which is considered one of the worst in decades. The analysis identifies displacement patterns by combining displacement data from the International Organization for Migration Displacement Tracking Matrix with nighttime lights. The results suggest that the cumulated displacement movements from 2018 to 2020 can be proxied by trends in nighttime light imagery. Settlements with higher net inflows of displaced persons between 2018 and 2020 have comparatively larger nighttime light growth. Allowing for nonlinearity suggests decreasing marginal returns of displacement on nighttime lights, as settlements showing the largest expansion of nighttime lights are those with the lowest displacement inflows. The model uses data on nighttime lights to predict whether a settlement was a net receiver of displacement flows during 2018–20 and correctly classifies 63.2 percent of the settlements as net inflow or net outflow. This study provides a proof of concept to test whether population displacements can be proxied using geospatial data trained on administrative records in a data-scarce environment, where real-time insights can inform humanitarian assistance. This work was done before the political crisis of August 2021.

This paper is a product of the Poverty and Equity Global Practice.
It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at adahmaniscuitti@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Geospatial Analysis of Displacement in Afghanistan∗

Anais Dahmani-Scuitti +, Erwin Knippenberg, Walker Kosmidou-Bradley, Johanna Lee Belanger ∗∗

JEL Classification: C15, C21, O12, O15, L94, R23
Keywords: migration, displacement, geospatial analysis, econometric regressions, nighttime light

Policy Statement

Robust monitoring systems are crucial in humanitarian interventions, particularly in the most vulnerable areas. These areas, prone to natural disasters and riven by conflict, are often hard to reach, making on-the-ground data collection logistically difficult, time-consuming, and expensive. This paper explores how remote sensing data, particularly geospatial data, can be used to monitor displacement flows in a conflict zone affected by drought. Using the 2018 drought in Northeastern Afghanistan as a case study, it shows how nighttime lights collected by satellite can be used to quantify the displacement of populations.
These results can be used to predict whether a settlement was a net receiver of displacement flows in 2018-2020, providing insights that can inform humanitarian assistance.

∗ We declare that we have no relevant or material financial interests that relate to the research described in this paper. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of the World Bank or any affiliated organizations, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work. This research was supported through the World Bank Budget (BB) for the Afghanistan Poverty and Equity Program FY 21-22 as part of its core analytical activities.
+ Corresponding author, adahmaniscuitti@worldbank.org
∗∗ Dahmani-Scuitti: World Bank, Poverty & Equity Global Practice, Washington DC, USA; Knippenberg: World Bank, Poverty & Equity Global Practice, Washington DC, USA; Kosmidou-Bradley: World Bank, Poverty & Equity Global Practice, Dakar, Senegal; Lee Belanger: World Bank, Poverty & Equity Global Practice, Washington DC, USA.

1. Introduction

An estimated 5.5 million people were internally displaced in Afghanistan in the summer of 2021, according to the International Organization for Migration (IOM, 2021). The 2020 Global Report ranks Afghanistan among the top five countries with high levels of internal displacement due to conflict and violence, and first for the number of Internally Displaced Persons (IDPs) due to natural disasters (IDMC, 2020a, 11-12). Important geographic disparities can be observed, with regions such as Herat, Helmand, Badghis, or Kandahar hosting many IDPs (Figure 1). Since 2017, 350,000 to 480,000 people have been displaced annually due to ongoing conflict and violence (UNOCHA, 2020; UNOCHA, 2021). A developing drought that threatens millions with food insecurity compounds an already precarious political context.
At the same time, fragile contexts like Afghanistan have relatively scarce on-the-ground data due to constraints on data collection. This paper explores how remote sensing data, particularly geospatial data, can be leveraged to monitor displacement flows in Afghanistan.

It seeks to draw lessons from the 2018 drought, which up to that point was considered one of the worst in decades. Between mid-2017 and mid-2018, Afghanistan recorded a precipitation deficit of 70 percent and insufficient snowfall, an important water source for irrigation (Oxfam, 2018). The 2018 drought was dubbed the worst drought in two decades and affected 22 of the 34 provinces in the country. About 13.5 million Afghan people faced heightened food insecurity in 2018, with lost livelihoods, livestock deaths, and forced displacement into urban areas (FAO, 2019). An estimated 371,000 people left their homes, including 120,000 in Badghis province, or about a quarter of its population. They congregated in the provincial capital Qala-e-Naw to receive emergency humanitarian assistance.

To identify patterns of displacement, the team combined displacement data from the IOM Displacement Tracking Matrix (DTM) with Nighttime Lights (NTL) data obtained from NOAA's VIIRS satellite.[3] This exploratory study was regionally focused on capturing displacement in and around Badghis. The results suggest that cumulated displacement movement over 2018-2020 can be proxied by trends in NTL imagery. Settlements with higher net inflows of displaced persons between 2018 and 2020 have larger NTL growth than others, and the results hold with additional robustness checks. Allowing for non-linearity suggests diminishing returns to displacement flows; settlements with the largest level of NTL expansion are those with the lowest level of displacement inflows. The model uses NTL data to predict whether a settlement was a net receiver of displacement flows in 2018-2020 and finds promising results.
The paper uses a fixed effect model to predict the total inflow of displaced persons and estimate whether a settlement has a positive or negative net inflow, i.e., whether it was an inbound area or served as an outbound area that people fled. We find a significant and concave relationship between the average NTL and the cumulated net inflow of displaced population over 2018-2020. The reverse model correctly classifies 63.2 percent of these settlements, a promising baseline of accuracy that could be improved upon with more training data.

This study provides a proof of concept to test whether population displacements can be proxied by variations in geospatial data. It contributes to the literature on geospatial data and displacement by combining remote sensing data and IOM administrative data in Afghanistan, a country that has experienced large-scale displacement and where security and accessibility constraints make ground-truth data sparse. Geospatial data offers a unique opportunity to track the evolution of indicators through time-series satellite images in hard-to-reach areas. With this work, we aim to complement the literature on displacement by providing an example of how to proxy for displacement flows in situations where data are inconsistent and hard to collect. The methodology can be scaled and tailored to the country level to monitor the effects of drought on displacement.

The rest of the paper is organized as follows. Section 2 summarizes the literature on geospatial analysis to track displacement. Section 3 outlines the datasets leveraged for the analysis. Section 4 presents the analytical results and robustness checks. Section 5 illustrates how these results can be used to predict displacement levels based on geospatial data alone. Section 6 concludes.

Figure 1. IDPs located in Afghanistan (as of December 2020)
Source: WB staff estimation using IOM DTM 2020

2. Literature Review

A growing strand of literature is seeking to track displacement using satellite imagery. Our work on displacement flows and NTL echoes the seminal work of Giada et al. (2003), which used satellite images of refugee settlements to estimate refugee populations. The identification of new refugee settlements has also been enabled by improved remote sensing (UNITAR, 2011), which can provide detailed maps of settlements (Wang et al., 2015; Pelizari et al., 2018). Reviews of previous research on remote sensing in conflict and human rights work have been compiled by Marx & Goward (2013), Witmer (2015), and Quinn et al. (2018).

Our study directly complements existing humanitarian work using nighttime lights (NTL) imagery to monitor displacement crises. Li and Li (2014) explored the spatial and temporal patterns of nighttime lights in the Syrian Arab Republic, finding a moderate correlation between NTL loss and the number of IDPs in each province. Witmer and O'Loughlin (2011) analyzed fluctuations in the NTL levels of cities within the Caucasus region of the Russian Federation and Georgia between 1992 and 2009 to detect conflict-related events, including large flows of populations. Their findings confirmed that such satellite data could detect large displacement movements. Quinn et al. (2018) showed that machine learning algorithms can return precise estimates of forcibly displaced people and geographic structures. They demonstrated how machine learning with satellite imagery could support humanitarian operations by providing objective assessments of the situation in the aftermath of a natural disaster.

Humanitarian agencies have also explored alternative indicators, notably the destruction of houses, to estimate the size and duration of migration.
The Internal Displacement Monitoring Centre (IDMC) and Human Rights Watch (HRW) contributed to the global research on displacement by using indicators of housing destruction and flooding gathered through satellite and aerial imagery. They demonstrated the viability of such an approach in Türkiye and the Arab Republic of Egypt. The methodology was especially effective in urban settings, where the availability of images over time allows for time-series analysis to track the construction and destruction of buildings. In a 2018 paper, HRW documented how Rohingya villages were bulldozed in Myanmar (HRW, 2018). The high frequency of satellite images made it possible to track the evolution of destruction over time.

In terms of real-time monitoring, this study contributes to the work produced by the humanitarian-development nexus to predict displacement flows using variations in satellite imagery-derived data. For example, the UNHCR Winter Cell aimed at forecasting migration flows with weather data, as harsh winters tend to impede movements across borders and within countries. The team collaborated with national migration agencies and meteorological offices and monitored social media to produce daily weather reports and provide information on possible effects on camps, border controls, and transportation.

This paper contributes to the literature on geospatial data and displacement by combining remotely sensed data and IOM administrative data in Afghanistan, where timely, on-the-ground data can be hard to gather. Our study provides a deep dive into Badghis province, hard hit by the 2018 drought, by focusing on dynamics at the settlement level, the smallest administrative unit possible. Geospatial data offers a unique opportunity to track the evolution of indicators through time in hard-to-reach areas (e.g., due to conflicts and natural disasters).
This work capitalizes on administrative data from IOM, which provides an exhaustive overview of the overall flow of displaced populations in the region at the settlement level. Combining administrative data and satellite images illustrates the insights that can be gained in a data-scarce context.

3. The Data

3.1. Displacement and NTL data

This study exploits a unique set of data from IOM, the Displacement Tracking Matrix (DTM December 2020), to identify detailed displacement patterns in the Badghis region following the 2018 drought. The IOM data for this analysis corresponds to four rounds of Baseline Mobility Assessments (BMA) conducted in Badghis between January 2018 and December 2020. The information was collected at the settlement level through focus group interviews with Key Informants and includes information on settlements' location (GPS) and on inflows and outflows of the displaced population. These flows include both IDPs arriving from other settlements and returnees, that is, former IDPs returning to their settlements of origin.

The DTM data is a detailed and unique source of information on displacement patterns in Afghanistan. Compared to the Afghanistan Income Expenditure and Labor Force Surveys, the IOM dataset directly focuses on settlements with a mobile population, ensuring a full representation of displacement in the country. Compared to data from other humanitarian agencies such as the United Nations Office for the Coordination of Humanitarian Affairs (UNOCHA), it is also the only source of information that includes IDPs (both from conflict and natural shocks), returnees from abroad, and migrants, regardless of their legal status and reasons for displacement.[1] In addition, while UNOCHA supplies more frequent information on flows of internal displacement at the district level, the information cannot be disaggregated at the village level.
IOM did not collect data in Badghis prior to 2018, suggesting that there was limited evidence of major displacement flows before then and justifying our focus on the 2018-2020 period. Limitations in terms of measurement error are nevertheless important to keep in mind, and they further make the case for better data on displacement in Afghanistan. Because IOM settlements lack clearly defined geospatial borders, the possibility that settlement buffers overlap cannot be excluded. In denser areas, overlapping buffers could duplicate the population summarized, so that specific outcomes are artificially inflated by appearing multiple times for what should be a unique settlement. Displacement numbers might therefore also be overestimated by counting the same population multiple times. This, however, is not an important issue for our analysis, which focuses on the evolution of displacement flows in settlements rather than on absolute values. In addition, the survey relies on self-reported estimates, which can be associated with important measurement errors. Here again, assuming that the bias remains constant over time, this issue is unlikely to affect the analysis dramatically.

The IOM Displacement Tracking Matrix (DTM December 2020) allowed us to identify 294 settlements across the seven districts of Badghis for which, as of December 2020, IOM had collected data on inflows and outflows of persons (Figure 2).[2] Key indicators refer to IDP arrivals (Afghans who fled other settlements and presently reside in the assessed location), returning IDPs (previous IDPs who returned to their location of origin), returnees from abroad (Afghans who returned to Afghanistan after having spent at least six months abroad), and out migrants (Afghans who migrated from the assessed settlement to another country). Summary statistics are provided in Table 1.

[1] UNOCHA data only focuses on conflict-induced IDPs, while UNHCR data reports numbers for documented returns.
[2] Out of these 294 settlements, IOM collected the first flow data in 2018 for 248 settlements, while 7 settlements started being surveyed in 2019, 19 were first surveyed before June 2020, and 20 were surveyed in December 2020. This indicates that, prior to these dates, there were no significant inflows of displaced population.

Among these 294 settlements, 247 (84%) were surveyed four times between 2018 and 2020, 7 (2%) were surveyed three times, 20 (7%) were surveyed twice, and 20 (7%) recently emerged in the northern bound of the province and were surveyed by IOM between June and December 2020 (round 11 of data); see Figure 3. We then have 1,069 observations (with multiple data points per settlement).

Table 1. Summary statistics of key variables in Badghis settlements across 2018-2020

Variables with variation across period
VARIABLES                Observations               Mean      Std. Dev.   Min     Max
                         (settlement-period couples)
IDPs arrivals            1,069                      1048.3    5236.9      0       81545
IDPs returning           1,069                      315.5     360.8       0       4116
Returnees from abroad    1,069                      416.6     441.7       0       3397
Out-migration            1,069                      441.5     355.8       0       2552
NTL                      9,702                      0.309     0.14        0.03    2.215

Variables at settlement level, constant across period
VARIABLES                Number of settlements      Mean      Std. Dev.   Min     Max
Net Inflow               294                        3254.91   18706.1     -6690   231486
Agric2020                293                        70.66     20.46       0       100

Note: NTL stands for Nighttime Light. There are 294 settlements, among which 84% were surveyed four times over the 2018-2020 period, 2% were surveyed three times, 7% were surveyed twice, and 7% were surveyed once (in 2020). Hence, the total number of observations is 1,069. The Net Inflow variable may take a negative value: the net inflow of persons is negative if the population migrating out of the settlement (internationally or within Afghanistan) is larger than the number of displaced persons settling in.

Figure 2. Map of IOM settlement buffers in Badghis
Figure 3. Zoom on a settlement that emerged between June and December 2020 (in green)
Source: WB staff computation using IOM DTM 2020.

Second, each settlement was associated with its average monthly NTL value within a 1km buffer, from January 2018 to January 2020. The imagery is obtained from the Day/Night Band (DNB) of NOAA's VIIRS platform, known as Nighttime Light (NTL). This sensor has a ground resolution of approximately 750 meters by 750 meters. NTL from VIIRS, and from its predecessor, the Defense Meteorological Satellite Program (DMSP), captures low-light emissions from Earth: "These include sources that indicate aspects of human activity, like city lights, gas flares, fishing boats, and agricultural fires, while also capturing other nighttime lights phenomena such as auroras." (WB, 2020).[3] This data is available for every month in the studied period (2018-2020), with the exception of June, owing to seasonal issues with reflectance and the DNB sensor.[4] The team then created a buffer with a 1km radius around the IOM settlements in Badghis and extracted the average monthly NTL value within these areas to capture the monthly evolution of nighttime lights from 2018 to 2020. An overview of the average NTL value across IOM settlements for the entire year 2020 is displayed in Figure 5. Summary statistics are available in Table 1.
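The buffer extraction described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the team's actual workflow: the function name `buffer_mean`, the planar coordinates in meters, and the small grid of NTL values are all hypothetical, and a real pipeline would work on georeferenced VIIRS rasters with a zonal-statistics tool.

```python
import math

def buffer_mean(raster, cell_size_m, center_xy_m, radius_m):
    """Mean of the raster cells whose centers lie within radius_m of a point.

    Assumption: raster is a list of rows in a planar (projected) coordinate
    system, and cell (row, col) has its center at
    ((col + 0.5) * cell_size_m, (row + 0.5) * cell_size_m).
    Returns None when no cell center falls inside the buffer.
    """
    cx, cy = center_xy_m
    values = []
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            x = (col + 0.5) * cell_size_m
            y = (row + 0.5) * cell_size_m
            if math.hypot(x - cx, y - cy) <= radius_m:
                values.append(value)
    if not values:
        return None
    return sum(values) / len(values)

# Hypothetical 5x5 grid of monthly NTL values at 750 m resolution (made up).
ntl_grid = [
    [0.20, 0.25, 0.30, 0.25, 0.20],
    [0.25, 0.40, 0.60, 0.40, 0.25],
    [0.30, 0.60, 1.20, 0.60, 0.30],
    [0.25, 0.40, 0.60, 0.40, 0.25],
    [0.20, 0.25, 0.30, 0.25, 0.20],
]

# Hypothetical settlement GPS point, placed at the center of the middle cell.
settlement_xy = (1875.0, 1875.0)

# 1 km radius: keeps the middle cell and its four direct neighbors (750 m away),
# and excludes the diagonal neighbors (about 1,061 m away).
mean_ntl = buffer_mean(ntl_grid, 750.0, settlement_xy, 1000.0)
print(mean_ntl)
```

With a 750 m sensor resolution, a 1 km buffer covers only a handful of cells, which is one reason the choice of radius can matter and why checks with wider radii are informative.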
The IOM data does not contain information on the actual size of settlements; hence we focus on the average NTL level across a 1km-by-1km cell centered on the GPS point collected by IOM. This relies on the assumption that IOM staff members recorded a central GPS point, around which NTL activity is likely representative of the urbanization dynamic. As a reference, it takes roughly 10 minutes for a person to walk 1km. While the absence of solid data on settlement boundaries is a caveat, robustness checks will be run using a 2km and a 5km radius.

Figure 4. Visualization of average NTL in January 2020 in Badghis
Figure 5. Average NTL across the year 2020 in IOM settlements in Badghis (1km radius)
Note: WB staff computation. The mean NTL across all settlements in Badghis is 0.309, the standard deviation is 0.14, the minimum level is 0.03, and the maximum is 2.215, over a total of 9,702 cells. There were no negative NTL values for the period being studied.

3.2. Main regressor

To identify settlements receiving many displaced persons (IDPs, returnees, and migrants), we look at the cumulative net inflow of persons in a settlement from December 2018 to December 2020 in the IOM data.[5] Net Inflow corresponds to the total net inflow of persons in a given settlement between December 2018 and December 2020, defined as the total inflows of persons minus the total outflows. Figure 6 shows the density of the Net Inflow variable across settlements in Badghis, and summary statistics of the variable can be observed in Table 1.[6] While the underlying displacement data comes only from settlements sampled by IOM and does not offer a comprehensive sample, the net displacement for many of these settlements is at or near zero, which suggests that many of them had a stable population that can serve as a baseline (see Figure 6).

[3] For additional data on VIIRS DNB and NTL in general, please see the World Bank's Open Night Lights tutorial.
[4] There are hence 9,702 observations for NTL values, whereby 9,702 = 3x11x294.
[5] Displacement data in the Badghis region was only collected by IOM in December 2018, June 2019, and June 2020.
[6] Among the 247 settlements that have displacement information for 4 periods between 2018 and 2020, around half served as overall net receivers over 2018-20 (with Net Inflow >0) and half as net outbound areas (Net Inflow <0). The data suggests a lock-in effect, whereby 73 percent of settlements that witnessed a total cumulative net outflow had consistent outflows across all four periods. Similarly, 67 percent of settlements that were net receivers of displacement inflows over the 2018-2020 period had consistent inflows during the whole period. As there would not be a lot of heterogeneity to exploit by looking at the dynamics of the 247 settlements, we focus the analysis on the cumulative displacement flows, which allows us to include a larger set of settlements.

Figure 6. Distribution of the cumulative net inflow of persons in Badghis
Source: WB staff computations.

3.3. Controls

The correlation between displacement patterns and NTL growth could be affected by several factors related to economic growth, such as the initial urbanization level of the settlement. One could be concerned that displacement flows to hosting areas are not random, as IDPs may prefer to settle in urbanized areas, where access to services is easier. Large flows of displaced persons could tend to converge on large settlements with a specific NTL growth pattern, hence skewing our interpretation of the data. There is evidence that displaced Afghan people flow to cities, where security and availability of services are perceived to be greater (IDMC, 2020b; EASO, 2020). IDPs affected by conflict tend to flee rural areas for their regional centers (Samuel Hall/NRC/IDMC, 2018).[7] As the findings of that study confirm, IDPs assume that urban areas are safer and that they would be better able to cope in these areas, where employment opportunities, services, and humanitarian aid are more readily accessible. Furthermore, IDPs seem to prefer to stay closer to their places of origin. Social ties could also be at play, whereby migrants tend to congregate in places where IDPs have already settled in order to capitalize on the network effect.

[7] According to the study by Samuel Hall/NRC/IDMC conducted in 2017 in the provinces of Kabul, Herat, Kandahar, Kunduz, and Nangarhar, 92% of the respondents in the southwest of the country had moved to Kandahar city, 91% in the west to Herat city, and 76% in the east to Jalalabad (Samuel Hall/NRC/IDMC, 2018, 20).

Thus, it is likely that large displacement flows converge on already large settlements with specific NTL growth patterns. To prevent overweighting existing urban centers with high baseline NTL values, we control for population estimates in 2017 (pre-drought levels). The team therefore incorporated 2017 population data from the Government of Afghanistan (population originally from WorldPop, see Section 4.2).

In addition, we attempt to proxy for the type of economic activity present in the settlement, as the correlation between displacement flows and NTL growth might depend on the share of agricultural labor and on urbanization levels. IOM started collecting data on the percentage of settlement income derived from each sector (e.g., agriculture) as of June 2020. While it is unfortunate that this data does not exist for the previous rounds, this caveat can be mitigated by controlling for the relative importance of agriculture in the settlement in late 2020, which provides information on the general level of urbanity in the settlement over the 2018-2020 period. We create the variable Agric2020, which associates each settlement with the proportion of its income obtained through agriculture (farming, crop production, etc.) and livestock (cattle, sheep, fish farming, etc.). Summary statistics are presented in Table 1. On average, settlements derived 71% of their income from agricultural activities, with a minimum of 0% and a maximum of 100%.

We also tried to include additional measures of economic activity, such as the average income level in the settlement and the share of employment in the service sectors (both only available for 2020). None of these yielded significant results, and they were hence not included in the regressions. Yet limitations in the data available reduce our ability to fully disentangle the interaction between displacement flows, economic growth, and NTL evolution. One should therefore refrain from making strong causal inferences, as this paper mainly captures complex correlations.

4. Regression Analysis

In this analysis, we regress NTL growth on the total level of displacement in the settlement, represented by the cumulative net inflow of displaced persons over the 2018-2020 period. The linear regression returns information on the general evolution of NTL, i.e., whether and how the cumulative net inflow of displaced persons correlates with higher NTL growth.

4.1. Benchmark NTL growth

This section explores the NTL growth rate observed in Badghis, i.e., the NTL evolution over time, before accounting for potential deviations due to displacement patterns. Growth in NTL indicates improvements in overall electrification or an increase in other ambient light sources, such as fires, in previously electrified areas, and an increase in or the appearance of electrification in areas with little or no historic light emittance.
This is consistent with Li et al. (2020), who used global NTL time-series data (1992–2018) to show a trend of both increasing NTL and continuous spatial expansion of high-luminance pixels, in urban centers and fringe locations alike. This indicates that the NTL growth rate for a given area will primarily reflect an increase in overall average NTL in an already electrified area, or the presence of new electrification in areas with improved infrastructure or in newly inhabited areas. While there is often growth and expansion of NTL over time, baseline NTL levels remain relatively constant due to existing infrastructure and outdoor lighting, with fluctuations over time reflecting shifts in population or economic activity.

To control for population size and level of economic activity, the analysis includes a benchmark population estimate (2017) and a measure of the share of agricultural income at the settlement level. Keeping in mind that the cumulative net inflow is based on the 2018 to 2020 flows of persons, the 2017 population levels allow us to proxy for initial urbanization levels and to find the relevant benchmark for each settlement.

The first column of Table 2 displays the linear regression of NTL levels on their past values at the settlement level, controlling for district fixed effects. As seen previously, there are 294 settlements spread across seven districts. Let $\mathrm{NTL}_{i,j,t}$ denote the average NTL in the 1km buffer around settlement $i$ in district $j$ at date $t$ (each month from Jan 2018 to Jan 2020). The following regression looks at how NTL levels naturally grow through time in settlements within Badghis province, accounting for the initial population in 2017, the share of community income derived from agricultural and livestock activities in late 2020, time seasonality ($\mu_t$, the time fixed effect), and the fixed effect for district $j$ ($\delta_j$):

$$\mathrm{NTL}_{i,j,t} = \beta_0 + \beta_1\,\mathrm{NTL}_{i,j,t-1} + \beta_2\,\mathrm{Pop2017}_{i} + \beta_3\,\mathrm{Agric2020}_{i} + \delta_j + \mu_t + \varepsilon_{i,j,t} \qquad (1)$$

Table 2 (column 1) shows that the benchmark growth of NTL levels is positive on average, i.e., a 1-unit increase in past NTL value in an IOM settlement always raises the current NTL value.

Quadratic robustness. Appendix C expands the analysis with a quadratic model. The estimated growth rate of NTL in Badghis settlements is positive and concave, as settlements with higher NTL in period t-1 will experience larger NTL in period t. The quadratic regression suggests that NTL growth follows a concave shape; that is, settlements with the largest expansion are those with the lowest level of displacement inflow to start with. On average, the natural growth of NTL levels is such that a 1-unit increase in past NTL value in an IOM settlement always raises the current NTL, as long as past NTL values are smaller than 2.2, which is always true in the Badghis sample.[8]

4.2. Impact of displacement flows on NTL growth

New population flows settling in a location are likely to cause deviations from the benchmark NTL growth, e.g., through the expansion of settlements, growing service needs, or changes in economic activity. High inflows of displaced population are associated with a growing need for basic electricity services such as indoor and street lighting. In addition, they could directly impact settlements' economic activity, e.g., by modifying the distribution of the labor force across sectors or through a direct increase in the settlements' population. As NTL levels are traditionally used to monitor human activities, see e.g., Li et al.
(2018), important influxes of persons are expected to impact the evolution of NTL levels. This paper therefore does not assume a fixed growth rate of NTL but rather compares growth rates across settlements and provides insights on why some settlements have higher growth rates than others, reflecting an additional inflow of migrants but also increased economic activity.

A linear regression at the settlement level allows us to check for correlation between NTL growth and overall displacement flows between 2018 and 2020.[9] The following regression looks at how NTL growth across time is affected by Net Inflow, the cumulative net inflow of persons in a settlement over the 2018-2020 period. The regression controls for the initial population in 2017, the share of community income derived from agricultural and livestock activities in late 2020, time seasonality ($\mu_t$, the time fixed effect), and the fixed effect for district $j$ ($\delta_j$). For readability, let $X := \text{Net Inflow}$. The regression is hence:

$$\mathrm{NTL}_{i,t} = \beta_0 + \beta_1\,\mathrm{NTL}_{i,t-1} + \beta_2\,X_i + \beta_3\,X_i \cdot \mathrm{NTL}_{i,t-1} + \beta_4\,\mathrm{Pop2017}_{i} + \beta_5\,\mathrm{Agric2020}_{i} + \delta_j + \mu_t + \varepsilon_{i,t} \qquad (3)$$

The NTL growth therefore follows

$$\frac{d^2\,\mathrm{NTL}_{i,t}}{d\,\mathrm{NTL}_{i,t-1}\; dX} = \beta_3 \qquad (4)$$

$$\frac{d\,\mathrm{NTL}_{i,t}}{dX} = \beta_2 + \beta_3\,\mathrm{NTL}_{i,t-1} \qquad (5)$$

The linear regression shows that settlements with a higher net inflow of displaced population experience larger NTL growth (Table 2, column 2). From equation (4), since the estimated $\beta_3$ is positive, $\frac{d\,\mathrm{NTL}_{i,t}}{d\,\mathrm{NTL}_{i,t-1}}$ is increasing with Net Inflow. In other words, an increase in the net inflow of displaced population (for inbound areas, with positive Net Inflow) or a reduction in net outflows (for outbound areas, with negative Net Inflow) would be associated with larger NTL growth, while a marginal increase in outflows lowers NTL levels. Second, a marginal increase in the net inflow of displaced population would raise NTL levels, provided the settlement already recorded some level of human activity.
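As a worked illustration of equation (5), the sketch below evaluates the marginal effect of Net Inflow at a few lagged NTL levels. It assumes the rounded column (2) coefficients reported in Table 2 and the sample minimum, mean, and maximum NTL from Table 1; the function and variable names are hypothetical, and the threshold it computes from rounded coefficients is only approximate.

```python
# Coefficients as reported (rounded) in Table 2, column 2.
BETA_2 = -5.40e-07  # Net Inflow
BETA_3 = 1.88e-06   # (Lagged NTL) * Net Inflow

def marginal_effect_of_inflow(lagged_ntl):
    """Equation (5): d NTL_t / dX = beta_2 + beta_3 * NTL_{t-1}."""
    return BETA_2 + BETA_3 * lagged_ntl

# Lagged NTL level at which the marginal effect changes sign:
# beta_2 + beta_3 * NTL = 0  <=>  NTL = -beta_2 / beta_3.
threshold = -BETA_2 / BETA_3
print("sign-change threshold:", threshold)

# Evaluate at the sample minimum, mean, and maximum NTL from Table 1.
for ntl in (0.03, 0.309, 2.215):
    print(ntl, marginal_effect_of_inflow(ntl))
```

Under these assumptions the effect is negative at the sample minimum and positive at the sample mean and maximum, which matches the reading that additional inflows raise NTL only where some baseline level of light already exists.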
A marginal increase in the net inflow of displaced population increases the NTL levels if equation (5) is positive, that is, if $NTL_{i,t-1} > 0.28$. As a reference, all settlements studied recorded an NTL value above 0.28 at least once during the 2018-2020 period. [10] In other words, settlements receiving a higher number of displaced persons experience higher NTL growth if the nighttime light level is not too small to start with.

Table 2. Regression of NTL on net inflows of persons in settlement (cumulated over 2018-2020)

Dependent variable: NTL level

| VARIABLES | (1) | (2) |
|---|---|---|
| β1; Lagged NTL | 0.780*** (0.00738) | 0.744*** (0.00805) |
| β2; Net Inflow | | -5.40e-07*** (9.38e-08) |
| β3; (Lagged NTL)*Net Inflow | | 1.88e-06*** (1.97e-07) |
| Observations | 8,497 | 8,497 |
| Number of settlements | 293 | 293 |
| District Fixed Effect | Yes | Yes |
| Time Fixed Effects | Yes | Yes |

Standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.
Note: The settlements are spread across seven districts. Regressions control for population in settlement in 2017, the share of settlement income derived from agricultural activities in 2020, and a constant.

[8] The derivative of the quadratic benchmark regression (equation (2), Appendix C) with respect to past NTL is positive if and only if $NTL_{i,j,t-1} < 1.194/0.534$.
[9] The average NTL is not available for June 2020.
[10] NTL levels around Badghis settlements ranged between 0.03 and 2.215 during the 2018-2020 period (see Table 1).

Quadratic robustness. Expanding the previous analysis with a quadratic regression returns statistically significant results. The results suggest a concave relationship between displacement flows and NTL growth, i.e., diminishing marginal returns of displacement flows on NTL. When comparing settlements of equal population size in 2017, settlements showing the largest impact of displacement flows on NTL expansion received the lowest cumulated displacement inflows. Quadratic results are, however, more complex to interpret and depend on initial parameters; they are displayed in Appendix C.
Yet, several situations exist whereby a marginal increase in the net inflow of displaced persons within a settlement would increase the NTL levels.

5. Can We Use NTL Growth to Predict Displacement?

Having established a positive correlation between NTL growth and displacement flows, we aim to test whether overall migration patterns can be predicted using NTL data. The model described below yields promising results, as it correctly classifies 63.2 percent of settlements as either a net receiver or a net sender of displaced persons over 2018-2020.

First, we create the binary variable Inbound, which equals 1 if the settlement was a net receiver of displaced persons (Net Inflow ≥ 0) over the 2018-2020 period. The Inbound variable takes value 0 if it is an outbound area (Net Inflow < 0). Out of the 294 settlements, 149 were net receivers (50.7%), and 145 served as outbound areas from which people mostly left (49.3%).

Second, for settlement i in district j, we use a reversed analysis, whereby we regress the net inflow of displaced population (outcome variable Y) on the average NTL level, its square, the average month-to-month change in NTL, the 2017 initial population level, the share of income derived from agriculture (in 2020), and the district fixed effect $\delta_j$. Let us define the regressor $\Delta NTL_{i,j,t} = NTL_{i,j,t} - NTL_{i,j,t-1}$, with $\text{mean}(NTL_{i,j})$ being the average of $NTL_{i,j,t}$ across all periods between January 2018 and January 2020. The regression can be written as

$$Y_{i,j} = \beta_0 + \beta_1 \,\text{mean}(NTL_{i,j}) + \beta_2 \,[\text{mean}(NTL_{i,j})]^2 + \beta_3 \,\text{mean}(\Delta NTL_{i,j}) + \beta_4 \text{Pop}^{2017}_{i,j} + \beta_5 \text{Agri}^{2020}_{i,j} + \delta_j + \varepsilon_{i,j} \quad (8)$$

Table 3 shows that average NTL is a good predictor of the total net inflow of displaced population in a settlement across 2018-2020. A marginal increase in the average NTL by one unit would raise the cumulated inflow over the period by 129,797 persons. This relationship seems to be concave, at the 95 percent confidence level.

Table 3.
Regression of Total inflow on average NTL levels and growth

| VARIABLES | (1) Y = Net Inflow |
|---|---|
| mean(NTL) | 129,797*** (46,556) |
| [mean(NTL)]² | -86,305** (35,639) |
| mean(Δ NTL) | 231,355 (161,720) |
| Observations | 293 |
| R-squared | 0.238 |
| District Fixed Effect | Yes |

Standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.
Note: Regressions control for population in settlement (2017), the share of settlement income derived from agricultural activities (2020), and a constant.

This methodology allows us to predict whether a settlement is an Inbound area from the coefficients in Table 3. We thereby construct the variable Predicted Inbound, which equals 1 if the predicted values for Net Inflow are positive, and 0 if they are negative. We then compare the actual Inbound classification with the predicted outcome. If the null hypothesis is H0: outbound settlement, the model has a false positive rate of 40 percent and a false negative rate of 34 percent. Overall, 63.2 percent of settlements are correctly classified. Given limited accessibility, this level of accuracy can inform preliminary assessments and prior resource mobilization, which can then be confirmed with on-the-ground validation when feasible. There is always a tradeoff between false positives and false negatives, in terms of policy preference. With limited resources, a model that minimizes false positives would allow the concentration of resources where they are most needed.

Table 4. Inference Predicted Inbound using NTL variations

| Predicted settlement type | Actual: 0 = Outbound | Actual: 1 = Inbound | Total |
|---|---|---|---|
| 0 = Outbound | 87, True Negative (1-α = .60) | 51, False Negative (β = .34) | 138 |
| 1 = Inbound | 57, False Positive (α = .40) | 98, True Positive (1-β = .66) | 155 |
| Total | 144 | 149 | 293 |

Source: Authors' computation.

The reverse-engineering method hence yields promising results, yet the methodology needs to be improved, which underscores the need to collect high-frequency data on displacement and socio-economic outcomes at the disaggregated level.
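The error rates quoted for Table 4 can be recomputed directly from the four cell counts. The short sketch below does so; the variable names are hypothetical and the counts are copied from the table, so it is only a consistency check rather than a re-estimation of the model.

```python
# Cell counts copied from Table 4 (rows: predicted class, columns: actual class).
true_negative = 87    # predicted outbound, actually outbound
false_negative = 51   # predicted outbound, actually inbound
false_positive = 57   # predicted inbound, actually outbound
true_positive = 98    # predicted inbound, actually inbound

actual_outbound = true_negative + false_positive
actual_inbound = false_negative + true_positive
total = actual_outbound + actual_inbound

# Rates are conditional on the actual class, with H0: outbound settlement.
false_positive_rate = false_positive / actual_outbound
false_negative_rate = false_negative / actual_inbound
true_positive_rate = true_positive / actual_inbound
accuracy = (true_negative + true_positive) / total

print(f"false positive rate: {false_positive_rate:.2f}")
print(f"false negative rate: {false_negative_rate:.2f}")
print(f"true positive rate:  {true_positive_rate:.2f}")
print(f"accuracy:            {accuracy:.3f}")
```

The recomputed rates round to the α, β and 1-β values shown in the table, and the share of correctly classified settlements comes out at about 63 percent.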
Including other NTL-related regressors did not improve the performance of the model, nor did the inclusion of additional controls on economic activities (average settlement income and share of employment in services, in 2020). [11] To date, alternative statistical methodologies do not yield better predictive power. For example, machine learning algorithms such as the least absolute shrinkage and selection operator (Lasso) or the random forest model do not outperform the current analysis, due to the small sample size (294 settlements). This paper works as a proof of concept, and calls for an expansion of the analysis to the whole of Afghanistan, as a means to increase both statistical power and representativeness.

6. Conclusion

This work investigated whether geospatial data could be used to proxy for displacement flows and settlement expansion, using NTL data. By focusing on Badghis province, it provided a relevant proof of concept for a province that was particularly affected by recent climatic shocks, including the 2018 drought. We find that large displacement movements can indeed be observed in NTL satellite imagery. Settlements with a high net inflow of displaced persons over the 2018-2020 timespan (high inflow receivers) have larger NTL growth than others. Results remain robust when looking at the net inflow of persons as a share of the existing population. Accounting for population in 2017 as a proxy for initial levels of urbanization does not change our findings. We can use these insights to predict which settlements are experiencing a large inflow of displaced people, with a true positive rate of 66 percent. This paper provides a first basis for better understanding the drivers and spatial characteristics of settlement growth. Future expansions of the methodology could draw on additional sources of geospatial data (e.g., shifts in vegetation density due to drought).
Evidence from the ground suggests that there are regional differences across displacement patterns based on geography, administration, and the like. Future work would investigate patterns between NTL growth of significant locations country-wide, within the context of recent IOM work on determinant factors to return.

[11] The regressions were run including the average, median, standard deviation, minimum and maximum of NTL in settlements during the year 2018, the year 2020 and the difference between the two.

Abbreviations and Definitions
DTM Displacement Tracking Matrix
IOM International Organization for Migration
IDP Internally Displaced Person
NTL Nighttime Lights

Author Contributions. Anais Dahmani-Scuitti: Conceptualization, methodology, formal analysis, data curation, writing – original draft, writing – review & editing. Erwin Knippenberg: Conceptualization, formal analysis, writing – original draft, writing – review & editing, supervision. Walker Kosmidou-Bradley: Conceptualization, methodology, software, data curation. Johanna Lee Belanger: Conceptualization, methodology, software, data curation, writing – original draft.

Acknowledgments. Authors are thankful to the International Organization for Migration (IOM) for sharing the Displacement Tracking Matrix (DTM). The team would also like to thank IOM staff members of the Afghanistan office, namely Michael Speir, Henry Kwesi Kwenin, Sebastian Boonstra, Caleb Ikyernum and Saboor Sultani for their technical support in handling the DTM data. This paper benefited from input from Razilya Shakirova, and from guidance of Cesar Cancho, Andrew Dabalen, Andrea Mario Dall’Olio and Ghazala Mansuri. Finally, the team is grateful for comments received at the Data for Policy 2021 conference.

Conflicts of Interest. We declare that we have no relevant or material financial interests that relate to the research described in this paper.
The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of the World Bank or any affiliated organizations, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy of the data included in this work.

Funding. This research was supported through the World Bank Bank Budget (BB) for the Afghanistan Poverty and Equity Program FY 21-22 as part of its core analytical activities.

Data availability statement. The data that support the findings of this study are based on two main sources, for satellite imagery and displacement data respectively. The Nighttime Lights (NTL) data were obtained from the NASA/NOAA partnership through the Visible Infrared Imaging Radiometer Suite (VIIRS) satellite, openly available at https://earthdata.nasa.gov/learn/backgrounders/nighttime-lights. All displacement data used for this study, along with the agricultural indicator in 2020, were collected by IOM through their Displacement Tracking Matrix (DTM). The data originate from the Baseline Mobility Assessment (BMA) and the Community Based Need Assessment surveys, openly accessible at https://displacement.iom.int/afghanistan. Any extraction, translation, reproduction, and distribution, in any form, or by any means, electronic, mechanical, photocopying, or otherwise, requires the explicit prior written permission of IOM. Source: “International Organization for Migration (IOM), December 2021, Displacement Tracking Matrix (DTM)”. The population data come from WorldPop.

References

Beine MA and Jeusette L (2019) A Meta-Analysis of the Literature on Climate Change and Migration. Discussion Paper Series. IZA DP No. 12639. Available at https://docs.iza.org/dp12639.pdf (accessed 10 December 2021).

Cattaneo C, Beine M, Fröhlich CJ, Kniveton D, Martinez-Zarzoso I, Mastrorillo M, Millock K, Piguet E and Schraven B (2019) Human migration in the era of climate change.
Review of Environmental Economics and Policy, 13(2), 189-206. https://www.journals.uchicago.edu/doi/pdf/10.1093/reep/rez008.

European Asylum Support Office (EASO) (2020) Afghanistan: Key socio-economic indicators focus on Kabul City, Mazar-e Sharif and Herat City. Available at https://www.easo.europa.eu/sites/default/files/publications/2020_08_EASO_COI_Report_Afghanistan_Key_Socio_Economic_Indicators_Forcus_Kabul_Citry_Mazar_Sharif_Herat_City.pdf (accessed 10 December 2021).

Famine Early Warning Systems Network (FEWS NET) (2021) Afghanistan Food Security Classification. Available at https://fews.net/fews-data/333?tid=2&page=1 (accessed 10 December 2021).

Giada S, De Groeve T, Ehrlich D and Soille P (2003) Information extraction from very high-resolution satellite imagery over Lukole refugee camp, Tanzania. International Journal of Remote Sensing, 24(22), 4251-4266. https://doi.org/10.1080/0143116021000035021.

Humanitarian Data Exchange (HDX) (2021) Afghanistan - Conflict Induced Displacements in 2020. Available at https://data.humdata.org/dataset/afghanistan-conflict-induced-displacements-in-2020 (accessed 10 December 2021).

Human Rights Watch (HRW) (2018) Burma: Scores of Rohingya Villages Bulldozed. New Satellite Images Show Destruction Indicating Obstruction of Justice. Available at https://www.hrw.org/news/2018/02/23/burma-scores-rohingya-villages-bulldozed (accessed 10 December 2021).

Internal Displacement Monitoring Centre (IDMC) (2020a) Global Report on Internal Displacement 2020. Available at https://www.internal-displacement.org/sites/default/files/publications/documents/2020-IDMC-GRID.pdf.

Internal Displacement Monitoring Centre (IDMC) (2020b) A different kind of pressure: the cumulative effects of displacement and return in Afghanistan. Available at https://www.internal-displacement.org/sites/default/files/publications/documents/202001-afghanistan-cross-border-report.pdf.
International Organization for Migration (IOM) (2021) IOM’s Comprehensive Action Plan for Afghanistan and Neighbouring Countries. Available at https://www.crisisresponse.iom.int/sites/default/files/uploaded-files/IOM%20Comprehensive%20Action%20Plan%20-%20Afghanistan%20and%20Neighbouring%20Countries%20final_LR.pdf (accessed 10 December 2021).

Li X and Li D (2014) Can night-time light images play a role in evaluating the Syrian Crisis? International Journal of Remote Sensing, 35(18), 6648-6661. https://doi.org/10.1080/01431161.2014.971469.

Li X, Zhou Y, Zhao M and Zhao X (2020) A harmonized global nighttime light dataset 1992–2018. Scientific Data, 7(1), 1-9. https://doi.org/10.1038/s41597-020-0510-y.

Marx A and Goward S (2013) Remote sensing in human rights and international humanitarian law monitoring: Concepts and methods. Geographical Review, 103(1), 100-111. https://doi.org/10.1111/j.1931-0846.2013.00188.x.

Oxfam (2018) HHs needs assessment in drought affected villages and districts in Herat and Badghis provinces. Available at https://www.humanitarianresponse.info/en/operations/afghanistan/assessment/hhs-needs-assessment-drought-affected-villages-and-districts-herat (accessed 10 December 2021).

Pelizari PA, Spröhnle K, Geiß C, Schoepfer E, Plank S and Taubenböck H (2018) Multi-sensor feature fusion for very high spatial resolution built-up area extraction in temporary settlements. Remote Sensing of Environment, 209, 793-807. https://doi.org/10.1016/j.rse.2018.02.025.

Quinn JA, Nyhan MM, Navarro C, Coluccia D, Bromley L and Luengo-Oroz M (2018) Humanitarian applications of machine learning with remote-sensing data: review and case study in refugee settlement mapping. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376(2128). https://doi.org/10.1098/rsta.2017.0363.

Samuel Hall / NRC / IDMC (2018) Escaping war: where to next? A research study on the challenges of IDP protection in Afghanistan.
Available at https://www.nrc.no/globalassets/pdf/reports/escaping-war---where-to-next/nrc_idp_escaping-war_where-to-next.pdf (accessed 10 December 2021).

United Nations Institute for Training and Research (UNITAR) (2011) UNOSAT Brief 2011—satellite applications for human security. Geneva, Switzerland: United Nations Institute for Training and Research. Available at https://reliefweb.int/sites/reliefweb.int/files/resources/UNOSAT_Brief_Sat_App_for_Human_Sec_2011_0.pdf (accessed 10 December 2021).

United Nations Office for the Coordination of Humanitarian Affairs (UNOCHA) (2020) Afghanistan: Snapshot of Population Movements - January to December 2019. Available at https://reliefweb.int/sites/reliefweb.int/files/resources/afg_population_movement_snapshot_20200115.pdf (accessed 10 December 2021).

United Nations Office for the Coordination of Humanitarian Affairs (UNOCHA) (2021) Afghanistan: Snapshot of Population Movements - January to December 2020. Available at https://reliefweb.int/report/afghanistan/afghanistan-snapshot-population-movements-january-december-2020-23-jan-2021 (accessed 10 December 2021).

Wang S, So E and Smith P (2015) Detecting tents to estimate the displaced populations for post-disaster relief using high resolution satellite imagery. International Journal of Applied Earth Observation and Geoinformation, 36, 87-93. https://doi.org/10.1016/j.jag.2014.11.013.

Witmer FD (2015) Remote sensing of violent conflict: eyes from above. International Journal of Remote Sensing, 36(9), 2326-2352. https://doi.org/10.1080/01431161.2015.1035412.

Witmer FD and O'Loughlin J (2011) Detecting the effects of wars in the Caucasus regions of Russia and Georgia using radiometrically normalized DMSP-OLS nighttime lights imagery. GIScience & Remote Sensing, 48(4), 478-500. https://doi.org/10.2747/1548-1603.48.4.478.

World Bank (WB) (2020) Remotely-sensed data of Nighttime lights. In Open Night Lights tutorial.
Available at https://worldbank.github.io/OpenNightLights/tutorials/mod1_2_introduction_to_nighttime_light_data.html (accessed 10 December 2021).

Appendix - Geospatial analysis of displacement in Afghanistan, a case study
Anais Dahmani-Scuitti [1], Erwin Knippenberg, Walker Kosmidou-Bradley, Johanna Lee Belanger (Poverty & Equity Global Practice, The World Bank)

Appendix A: variables definitions

Definitions of IOM variables

Returnees are Afghan nationals who have returned to Afghanistan in the assessed location after having spent at least six months abroad. This includes both documented returnees (Afghans who were registered refugees in host countries and requested voluntary return with UNHCR and relevant national authorities) and undocumented returnees (Afghans who returned spontaneously or were deported from host countries, irrespective of whether or not they were registered refugees with UNHCR and relevant national authorities).

Arrival IDPs are Afghans who fled from other settlements in Afghanistan and have arrived and presently reside at the assessed location (host community), as a result of, or in order to avoid, the effects of armed conflict, generalized violence, human rights violations, protection concerns, or natural and human-made disasters.

Net Inflow = Returnees + IDPs

Returned IDPs are Afghans who have returned to their home or place of origin in the assessed location or settlement from which they had fled as IDPs in the past, as a result of, or in order to avoid, the effects of armed conflict, generalized violence, human rights violations, protection concerns, or natural and human-made disasters.

Fled IDPs are Afghans who have fled from an assessed location or settlement within which they previously resided and now currently reside in a different settlement in Afghanistan, as a result of, or in order to avoid, the effects of armed conflict, generalized violence, human rights violations, protection concerns, or natural and human-made disasters.
Construction of total inflow

We construct the variable Net Inflow as the cumulative net inflow (labeled net_inflow) at the final date (December 2020). For each settlement i, the variable is hence defined as

$$\text{Net Inflow}_i = \sum_{t=2018}^{2020} \text{local net inflow}_{i,t}$$

with

$$\text{local net inflow}_{i,t} = \text{returnees from abroad}_{i,t} + \text{returning IDPs}_{i,t} + \text{new IDPs in settlement}_{i,t} - (\text{Out Migrants}_{i,t} + \text{Fled IDPs}_{i,t})$$

Appendix B: Geospatial Data and Methodology

The geospatial methodology used for this analysis extracts monthly descriptive statistics of Nighttime Lights data to individual buffers around IOM settlement locations. This process is known in Geographic Information Systems (GIS) as “zonal statistics”, and is a commonly used method for aggregating raster, or gridded, data (e.g., NTL, precipitation) into zones or areas of interest to the user (e.g., buffers, administrative districts). This general approach was used in Google Earth Engine, a free cloud computing platform and data repository that enables large-scale analyses of satellite imagery and other data, including the extraction of zonal statistics.

[1] Corresponding author, adahmaniscuitti@worldbank.org

For the purposes of exploring drought-driven displacement, we kept the analysis constrained to Badghis Province in Northwestern Afghanistan, but the same approach is easily replicated in other regions or even country-wide. The input data used in this analysis were the UN IOM DTM dataset from 2020, and the Nighttime Lights data were monthly composites from January 2018 to December 2020. [2] To replicate the methodology, one may open the Google Earth Engine script at the universal link [3] or download the script directly from GitHub [4]. In the GitHub repository there is a folder titled “NTL,” which contains the script that extracts monthly NTL to the buffers, as well as a script for extracting a monthly average of NTL data for the entire area of interest.
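To make the idea of zonal statistics concrete, here is a minimal, self-contained sketch in plain Python. It is not the Google Earth Engine script referenced in this appendix: the function name, the grid layout, and the planar kilometre coordinates are all assumptions made for illustration. It averages the raster cells whose centres fall within a circular buffer around a settlement point.

```python
import math


def buffer_mean(raster, cell_size_km, point_xy_km, radius_km):
    """Mean of raster cells whose centres lie within radius_km of a point.

    Assumed (hypothetical) layout: `raster` is a list of rows, the origin is
    the upper-left corner, x grows with the column index and y with the row
    index, all in planar kilometres. Cells holding NaN are skipped.
    """
    px, py = point_xy_km
    values = []
    for row_index, row in enumerate(raster):
        for col_index, value in enumerate(row):
            # Coordinates of the centre of this cell.
            x = (col_index + 0.5) * cell_size_km
            y = (row_index + 0.5) * cell_size_km
            inside = math.hypot(x - px, y - py) <= radius_km
            if inside and not math.isnan(value):
                values.append(value)
    # No valid cell inside the buffer: report a missing value.
    return sum(values) / len(values) if values else float("nan")


# Toy 4x4 grid with 0.5 km cells and a 0.5 km buffer around the grid centre.
grid = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(buffer_mean(grid, 0.5, (1.0, 1.0), 0.5))
```

Repeating such a computation for every settlement and every monthly composite would yield a settlement-by-month panel of the kind used in the regressions; changing `radius_km` corresponds to changing the buffer size.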
The user may then update the boundaries of the analysis by simply uploading a new dataset and preserving the object name ’table’ so that the script applies the same processes to the new area of interest. The user would also update the export location to ensure that the file saves to a location on their personal Google Drive account (or alter the code to save it as an Earth Engine ’asset’, or dataset). The output of the methodology is a panel dataset in the form of a CSV (unless otherwise specified in the code), which can be exported from Drive onto one’s local machine for further statistical analysis.

Appendix C: Quadratic model

C.1. Benchmark NTL growth (quadratic)

Table 5 displays the linear and quadratic regression of NTL levels on its past values, at the settlement level, controlling for district fixed effects. As seen previously, there are 294 settlements, spread across 7 districts. Let us define $NTL_{i,j,t}$ as the average NTL in the 1km buffer around settlement i in district j, at date t (each month from Jan 2018 to Jan 2020). The following regression looks at how NTL levels naturally grow over time in settlements within Badghis province, accounting for the initial population in 2017, the share of community income derived from agricultural and livestock activities in late 2020, time seasonality ($\mu_t$, the time fixed effect) and the fixed effect for district j ($\delta_j$). Equation-1 in the main text assumes a linear evolution of NTL, while equation-2 is quadratic:

$$NTL_{i,j,t} = \beta_0 + \beta_1 NTL_{i,j,t-1} + \beta_2 \left(NTL_{i,j,t-1}\right)^2 + \beta_3 \text{Pop}^{2017}_{i} + \beta_4 \text{Agri}^{2020}_{i} + \delta_j + \mu_t + \varepsilon_{i,j,t} \quad (2)$$

The NTL evolution is such that a marginal increase in past NTL value modifies the current NTL by $\hat{\beta}_1 + 2\hat{\beta}_2 NTL_{i,j,t-1}$. The estimated relationship between current and past NTL in Badghis settlements is positive and concave: settlements with higher NTL in period t-1 experience higher NTL in period t, but at a decreasing rate.
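As a worked illustration, the marginal effect implied by equation (2) can be evaluated with the rounded quadratic estimates reported in Table 5 ($\hat{\beta}_1 = 1.194$, $\hat{\beta}_2 = -0.269$). The figures below use these rounded values only, so the last digits are approximate:

```latex
\frac{\partial NTL_{i,j,t}}{\partial NTL_{i,j,t-1}}
  = \hat{\beta}_1 + 2\hat{\beta}_2 \, NTL_{i,j,t-1}
  \approx 1.194 - 0.538 \, NTL_{i,j,t-1},
\qquad
\frac{\partial NTL_{i,j,t}}{\partial NTL_{i,j,t-1}} > 0
  \iff NTL_{i,j,t-1} < \frac{1.194}{0.538} \approx 2.22 .
```

This is close to the cutoff of about 2.2 cited for the Badghis sample; the slightly different denominator used in the accompanying footnote presumably comes from unrounded coefficients.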
The quadratic regression suggests that NTL growth follows a concave shape; that is, settlements with the largest expansion are those with the lowest level of displacement inflow to start with. On average, the natural growth of NTL levels is such that a 1-unit increase in the past NTL value of an IOM settlement always raises the current NTL, as long as past NTL values are smaller than 2.2, which is always true in the Badghis sample. [5]

Table 5. Linear and quadratic regression of NTL on its lagged value and district fixed effects

Dependent variable: NTL levels

| VARIABLES | (1) Linear | (2) Quadratic |
|---|---|---|
| β1; Lagged NTL | 0.780*** (0.00738) | 1.194*** (0.0205) |
| β2; (Lagged NTL)² | | -0.269*** (0.0125) |
| Observations | 8,497 | 8,497 |
| Number of settlements | 293 | 293 |
| District Fixed Effects | Yes | Yes |
| Time Fixed Effects | Yes | Yes |

Standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.
Note: Regressions control for population in settlement in 2017, the share of settlement income derived from agricultural activities in 2020, and a constant.

[2] Due to reflectance issues from collection, NTL from June are unavailable for most of Afghanistan. As such, there was no NTL data used for the month of June 2018-2020.
[3] Belanger, J. Google Earth Engine Script. https://code.earthengine.google.com/4b069482a54f1fb6041d2e2cff25834d.
[4] Stewart, B., Chamorro, A.E., Martine, L. and Belanger, J. World Bank - GEE_Zonal. GitHub Repository. https://github.com/worldbank/GEE_Zonal/.
[5] The derivative of equation (2) with respect to past NTL is positive if and only if $NTL_{i,j,t-1} < 1.194/0.534$.

C.2. Quadratic impact of displacement flows on NTL growth

Expanding the previous analysis with a quadratic regression returns statistically significant results. While linear regressions return information on general trends, quadratic regressions account for non-linear dynamics. Hence, the coefficients β3 and β5 of the quadratic regression are of particular interest, as they represent the correlation between high inflows of persons and the NTL growth.
Defining X = Net Inflow, the regression is hence

$$NTL_{i,t} = \beta_0 + \beta_1 NTL_{i,t-1} + \beta_2 X_i + \beta_3 X_i \cdot NTL_{i,t-1} + \beta_4 NTL_{i,t-1}^2 + \beta_5 X_i \cdot NTL_{i,t-1}^2 + \beta_6 X_i^2 + \beta_7 X_i^2 \cdot NTL_{i,t-1} + \beta_8 X_i^2 \cdot NTL_{i,t-1}^2 + \beta_9 \text{Pop}^{2017}_{i} + \beta_{10} \text{Agri}^{2020}_{i} + \delta_j + \mu_t + \varepsilon_{i,t} \quad (6)$$

One obtains that

$$\frac{d NTL_{i,t}}{dX} = \beta_2 + \beta_3 NTL_{i,t-1} + \beta_5 NTL_{i,t-1}^2 + 2 X_i \left(\beta_6 + \beta_7 NTL_{i,t-1} + \beta_8 NTL_{i,t-1}^2\right) \quad (7)$$

Table 6. Regression of NTL on net inflows of persons in settlement (cumulated over 2018-2020)

Dependent variable: Y = NTL level

| VARIABLES | (1) Linear | (2) Quadratic |
|---|---|---|
| β1; Lagged NTL | 0.744*** (0.00805) | 1.162*** (0.0220) |
| β4; (Lagged NTL)² | | -0.337*** (0.0140) |
| β2; Net Inflow | -5.40e-07*** (9.38e-08) | -2.10e-06*** (4.65e-07) |
| β3; (Lagged NTL)*Net Inflow | 1.88e-06*** (1.97e-07) | 6.36e-06*** (1.62e-06) |
| β5; (Lagged NTL)²*Net Inflow | | 4.44e-06*** (1.05e-06) |
| Observations | 8,497 | 8,497 |
| Number of settlements | 293 | 293 |
| District Fixed Effect | Yes | Yes |
| Time Fixed Effects | Yes | Yes |

Standard errors in parentheses, *** p<0.01, ** p<0.05, * p<0.1.
Note: The settlements are spread across seven districts. Regressions control for population in settlement in 2017, the share of settlement income derived from agricultural activities in 2020, and a constant.

The quadratic regression displays a concave relationship between displacement flows and NTL growth, which suggests diminishing marginal returns of displacement flows on NTL. When comparing settlements of equal population size in 2017, settlements that show the largest impact of displacement flows on NTL expansion are those that received the lowest cumulated levels of displacement inflows.

There exist several situations whereby a marginal increase in the net inflow of displaced persons within a settlement would increase the NTL levels. Figure 7 displays the set of combinations of Net Inflow and past NTL values such that a marginal increase in the net total inflow of displaced persons hosted over 2018-2020 increases the current NTL level. As can be seen in the green shaded area, a marginal increase in Net Inflow would raise the NTL as long as the net inflow is smaller than f(NTL), with f(.)
a quadratic function defined below the graph. [6]

Note: Let $f(NTL) \equiv 37310.9 - 17647.1/NTL^2 + 53445.4/NTL$. Equation (7) is positive when $\text{Net Inflow}_i < f(NTL_{i,t-1})$.
Figure 7. Conditions for an increase in net displacement inflow to raise the NTL level (green shaded area).

Inbound areas. In settlements that have a positive net inflow of displaced population, a marginal increase of net inflow would raise the NTL level whenever the inflow of displaced persons is low enough, as long as there is some minimum level of human activity. [7] It will, however, always lower the NTL level if the settlement serves as an important inbound area with a large enough net inflow of displaced population. [8] In outbound areas, a marginal increase in the net inflow of displaced persons would raise the NTL levels whenever the net inflow is negative enough, as we will have that $\text{Net Inflow}_i < f(NTL_{i,t-1}) < 0$. [9]

Outbound areas. In settlements that serve as outbound areas for a small number of displaced persons, a marginal decrease in outflows (i.e., Net Inflow becoming less negative) would lower the NTL level if the past NTL level is small enough. Indeed, equation (7) is negative whenever $f(NTL_{i,t-1}) < \text{Net Inflow}_i < 0$. In other words, in outbound settlements with small outflows, there exists a range of NTL (smaller than 0.28) such that a reduction in the total outflows (i.e., a marginal increase in net inflow) would reduce the NTL levels.

Robustness checks can be found in the appendix, whereby we modify the size of the catchment area around settlement GPS points to construct alternative NTL averages, and check whether initial urbanization levels affect the relationship between NTL growth and displacement. In addition, we found no evidence of a correlation between displacement flows and initial NTL levels. Remembering that NTL observations represent the average NTL levels across

[6] We have that $f(NTL_{i,t-1}) < 0$ for all $NTL_{i,t-1} \in \, ]0; 0.28[$.
The function f is increasing from 0 to 0.67, and decreasing from 0.67 onward, with f(0.66) = 77,255.6. In addition, equation (7) is negative when $NTL_{i,t-1} = 0$, and $\lim_{NTL \to +\infty} f(NTL) = 37{,}071$.
[7] This is because equation (7) is positive only if $\text{Net Inflow}_i < f(NTL_{i,t-1})$ with $f(x) \equiv 37310.9 - 17647.1/x^2 + 53445.4/x$. A sufficient condition for equation (7) to be positive is NTL > 0.28, while it is negative if Net Inflow > 77,255.
[8] Equation (7) can be written as $\frac{d NTL_{i,t}}{dX} = -2.10\mathrm{e}{-06} + 6.36\mathrm{e}{-06}\, NTL_{i,t-1} + 4.44\mathrm{e}{-06}\, NTL_{i,t-1}^2 + 2 X_i \left(-5.99\mathrm{e}{-11}\, NTL_{i,t-1}^2\right)$. Equation (7) is positive if $0 < X_i < f(NTL_{i,t-1}) \equiv 37310.9 - 17647.1/NTL_{i,t-1}^2 + 53445.4/NTL_{i,t-1}$. However, the numerator is positive as long as $NTL_{i,t-1} > 0.28$, and equation (7) is negative if $NTL_{i,t-1} = 0.28$.
[9] As can be seen in the lower left part of Figure 7, one gets that $\lim_{NTL_{i,t-1} \to 0} f(NTL_{i,t-1}) = -\infty$. Hence, whenever Net Inflow is sufficiently negative, we find that equation (7) is positive for all $NTL_{i,t-1} > 0$.

a 1km radius around the settlement GPS point, any increase in the NTL value reported could represent either a variation in the intensity of nighttime light, or could capture a spatial extension (still within the 1km radius). Our data do not allow us to identify directly whether NTL growth originates from a spatial expansion or a variation in intensity. This question could be further investigated through an analysis of the relationship between NTL growth and urban expansion.

Appendix D: Differences based on initial NTL levels

Using a Jenks decomposition to identify natural breaks in the distribution of settlements’ initial NTL level as of January 1st, 2018, settlements are classified into two groups (high or low initial NTL) to measure the urbanization starting point. On average, settlements had an initial NTL level of 0.217, with a minimum of 0.175 and a maximum of 1.058.
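For two groups, a Jenks-style natural break can be found by exhaustive search: sort the values, try every split point, and keep the one that minimizes the total within-group sum of squared deviations. The sketch below illustrates this idea in plain Python; the function name and the toy values are hypothetical, and it is not necessarily the implementation used for the classification reported in this appendix.

```python
def two_class_jenks_break(values):
    """Split values into a low and a high group with minimal within-group variance.

    Returns (break_value, low_group, high_group), where break_value is the
    largest value assigned to the low group.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values to form two groups")

    def sum_sq_dev(group):
        # Sum of squared deviations from the group mean.
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_split, best_cost = 1, float("inf")
    for split in range(1, len(data)):
        cost = sum_sq_dev(data[:split]) + sum_sq_dev(data[split:])
        if cost < best_cost:
            best_split, best_cost = split, cost

    low, high = data[:best_split], data[best_split:]
    return low[-1], low, high


# Toy initial NTL levels (made-up numbers, for illustration only).
break_value, low_group, high_group = two_class_jenks_break(
    [0.20, 0.90, 0.18, 1.00, 0.21, 0.19]
)
print(break_value, low_group, high_group)
```

Because minimizing the within-group sum of squares is equivalent to maximizing the between-group sum of squares for a fixed sample, this search matches the usual description of the Jenks criterion for two classes.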
A Jenks decomposition is used to identify two groups of settlements (higher or lower initial NTL), based on the distribution of NTL levels (on Jan 1st, 2018) across the Badghis settlements. The Jenks method creates two groups such that the variance within each group is minimized, while the variance between groups is maximized. The categorization is such that 289 settlements are assigned to the low NTL group, with a minimum of 0.175 and a maximum of 0.414. On the other hand, 5 settlements are classified into the higher initial NTL group, with a minimum NTL of 0.474 and a maximum of 1.058.

Table C.1. Summary statistics of NTL levels as of Jan 1st, 2018

| Sample | Number of settlements | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Whole sample (294 settlements) | 294 | 0.217 | 0.074 | 0.175 | 1.058 |
| Group “Lower NTL” | 289 | 0.209 | 0.030 | 0.175 | 0.414 |
| Group “Higher NTL” | 5 | 0.694 | 0.232 | 0.474 | 1.058 |

Figure C.1. Kernel density of initial NTL level (Jan 1st, 2018) – top 1% excluded.
[Figure: kernel density of NTL levels on Jan 1st, 2018, bottom 99%; x-axis: NTL level on 01Jan2018; y-axis: density; kernel = epanechnikov, bandwidth = 0.0020.]
Source: Authors’ computations

The impact that displacement flows may have on NTL growth does depend on the initial level of NTL, as urbanized areas do not seem to be impacted by inflows of displaced persons. We run a linear and a quadratic regression following equation (3) and equation (6) on two different samples: the 145 observations with high initial NTL levels (5 settlements across 2018-2020), and the 8,381 observations with lower initial NTL levels (289 settlements). The linear and quadratic regressions show that larger displacement inflows are always correlated with larger NTL growth in settlements with higher initial NTL levels, at the 95 percent confidence level; see columns 3 and 4 of Table C.2.
The linear and quadratic regression analyses on the settlements starting with lower initial NTL levels return mixed results. Column (1) of Table C.2 shows that equation (4) is positive, i.e., NTL growth increases with the Total Inflow, as the interaction term is positive. The linear regression suggests that a marginal increase in net inflows of persons would increase the NTL in places with medium-high levels of human activity, i.e., with NTL above 0.30.

The quadratic regression displays interesting patterns:
- In outbound settlements, a reduction in the net outflows of displaced persons raises the NTL level in areas with substantial human activity. A sufficient condition for a marginal decrease in outflows (i.e., an increase in Total Inflow) to be associated with an increase in NTL levels is that past NTL be larger than 0.9 (see footnote 10).
- In inbound areas, a marginal increase in net inflows of persons would increase the NTL in places with low levels of human activity, i.e., with NTL between 0 and 0.29.

Table C.2.
Regression of NTL on net inflows of persons in settlements, on two different samples (higher vs lower initial NTL levels)

Y = NTL levels. Standard errors are reported in parentheses below each coefficient's value in the same cell.

| VARIABLES | Small initial NTL: (1) Linear | Small initial NTL: (2) Quadratic | Large initial NTL: (3) Linear | Large initial NTL: (4) Quadratic |
|---|---|---|---|---|
| β1: Lagged NTL | 0.563*** (0.00947) | 0.917*** (0.0210) | 0.570*** (0.0911) | 1.201*** (0.297) |
| β4: (Lagged NTL)² | | -0.393*** (0.0147) | | -0.392** (0.164) |
| β2: Total Inflow | -9.09e-07*** (7.78e-08) | -4.09e-06*** (3.87e-07) | 1.55e-05** (6.09e-06) | -1.55e-05 (0.000128) |
| β3: (Lagged NTL) × Total Inflow | 3.01e-06*** (1.67e-07) | 1.34e-05*** (1.40e-06) | 4.04e-06 (5.61e-06) | 0.000347 (0.000285) |
| β5: (Lagged NTL)² × Total Inflow | | 3.44e-06*** (9.64e-07) | | -9.81e-05 (0.000161) |
| β6: (Total Inflow)² | | 1.39e-11*** (0) | | -7.17e-10 (6.01e-09) |
| β7: (Lagged NTL) × (Total Inflow)² | | -3.20e-11*** (0) | | -1.32e-08 (1.29e-08) |
| β8: (Lagged NTL)² × (Total Inflow)² | | -5.30e-11*** (0) | | 3.55e-09 (7.24e-09) |
| β9: Population in 2017 | 1.06e-05*** (6.93e-07) | 7.21e-06*** (6.44e-07) | 7.83e-05** (3.28e-05) | -1.92e-05 (2.04e-05) |
| β10: % agriculture in 2020 | -6.40e-05* (3.40e-05) | -1.85e-05 (3.11e-05) | 0.00769*** (0.00270) | 0.00641*** (0.00207) |
| β0: Constant | 0.109*** (0.00596) | -5.30e-11*** (0) | -0.812** (0.403) | 0 (0) |
| Observations | 8,352 | 8,381 | 145 | 145 |
| Number of settlements | 288 | 289 | 5 | 5 |
| District fixed effects | Yes | Yes | Yes | Yes |
| Time fixed effects | Yes | Yes | Yes | Yes |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

10 $\frac{d\,NTL_{i,t}}{d\,X_i} = -4.09 \times 10^{-6} + 1.34 \times 10^{-6}\,NTL_{i,t-1} + 3.44 \times 10^{-6}\,NTL_{i,t-1}^2 + 2 X_i \left(1.39 \times 10^{-11} - 3.20 \times 10^{-11}\,NTL_{i,t-1} - 5.30 \times 10^{-11}\,NTL_{i,t-1}^2\right) \equiv h(NTL) + 2 X_i f(NTL)$, with f(NTL) positive if NTL is in [0, 0.29] and h(NTL) positive if NTL > 0.9.

Appendix D. Robustness tests

Variation in settlements' buffer sizes

Robustness checks are run by modifying the monthly NTL values associated with each settlement, through changes in the size of the catchment area around the settlements' GPS points (from a 1km radius to a 2km and a 5km radius).
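The buffer-size variation described above amounts to averaging NTL pixel values within a chosen distance of each settlement's GPS point. The sketch below illustrates the idea on a toy set of cells; it is an assumption-laden illustration, not the extraction procedure used in the paper. In particular, the function name `mean_ntl_within_radius`, the planar kilometre coordinates, and the rule that a cell counts when its centre lies inside the radius are all hypothetical choices.

```python
import math
from typing import Iterable, Tuple

# A raster cell as (x_km, y_km, ntl_value); coordinates are assumed to be
# cell centres in a projected (planar) system measured in kilometres.
Cell = Tuple[float, float, float]


def mean_ntl_within_radius(
    cells: Iterable[Cell], centre: Tuple[float, float], radius_km: float
) -> float:
    """Average the NTL values of cells whose centre lies within radius_km."""
    cx, cy = centre
    inside = [
        value
        for x, y, value in cells
        if math.hypot(x - cx, y - cy) <= radius_km
    ]
    if not inside:
        raise ValueError("no cell centre falls inside the requested radius")
    return sum(inside) / len(inside)


if __name__ == "__main__":
    # Toy cells, NOT real NTL data.
    toy_cells = [(0.0, 0.0, 1.0), (1.5, 0.0, 0.5), (4.0, 0.0, 0.2), (6.0, 0.0, 9.0)]
    settlement = (0.0, 0.0)
    for radius in (1.0, 2.0, 5.0):
        print(radius, mean_ntl_within_radius(toy_cells, settlement, radius))
```

Widening the radius pulls in more distant cells, so the average can move in either direction depending on what surrounds the settlement, which is why the regressions are re-estimated for each buffer size.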
As discussed earlier, we do not have information on the actual settlements' borders, and must rely on averaging NTL values around the given GPS location. While the study focused on a 1km-by-1km cell, this section extracts the average NTL values observed within a 2km and a 5km radius around the IOM settlements' GPS points. As a reference, a 2km distance would take roughly a 20-minute walk, while 5km would be completed in 1 hour. Summary statistics are available in Table 5.0.1.

Table 5.0.1. Summary statistics of NTL levels, average across 2km and 5km radius around GPS points

| Variable | Observations | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| NTL (average across 2km radius) | 9,702 | 0.31 | 0.11 | 0.04 | 1.63 |
| NTL (average across 5km radius) | 9,702 | 0.31 | 0.14 | 0.03 | 1.85 |

Source: Authors' computation

Linear results are robust to the construction of the NTL average based on a 2km and 5km radius instead of 1km, and the 5km radius results suggest an even stronger positive correlation between Net Inflow and NTL growth. The 2km radius analysis suggests that a marginal increase in Net Inflow would increase the NTL levels in settlements which had past NTL levels larger than 0.64. The 5km radius analysis shows that a marginal rise in Net Inflow would always be associated with an increase in NTL levels.

Quadratic results become more complex when using the NTL average based on a 2km and 5km radius instead of 1km, as some results differ. In inbound areas:
- Using the 2km radius, a marginal increase in Net Inflow would increase the NTL level when the initial level of human activity is low enough (for NTL below 0.22), while the 5km radius analysis suggests that this is true (for NTL below 0.31), provided that the settlement is an important receiver of net inflows, i.e., the Net Inflow is high enough.
  o Using the 2km radius average for NTL values, equation (7) can be decomposed as $\frac{d\,NTL_{i,t}}{d\,X_i} \equiv h(NTL_{i,t-1}) + X_i \cdot f(NTL_{i,t-1})$, since $\frac{d\,NTL_{i,t}}{d\,X_i} = -3.44 \times 10^{-6} + 1.97 \times 10^{-5}\,NTL_{i,t-1} - 2.37 \times 10^{-5}\,NTL_{i,t-1}^2 + 2 X_i \left(9.40 \times 10^{-12} - 5.95 \times 10^{-11}\,NTL_{i,t-1} + 7.66 \times 10^{-11}\,NTL_{i,t-1}^2\right)$.
One finds that f(NTL) is positive whenever NTL < 0.22 or NTL > 0.56. The function h(NTL) is positive when NTL is in [0.25; 0.79].
  o Using the 5km radius average for NTL values, equation (7) can be decomposed as $\frac{d\,NTL_{i,t}}{d\,X_i} = -5.1 \times 10^{-6} + 1.87 \times 10^{-5}\,NTL_{i,t-1} - 5.75 \times 10^{-5}\,NTL_{i,t-1}^2 + 2 X_i \left(1.88 \times 10^{-11} - 6.05 \times 10^{-11}\,NTL_{i,t-1}\right) \equiv h(NTL_{i,t-1}) + X_i \cdot f(NTL_{i,t-1})$, with $X_i = \text{Net Inflow}_i$. One finds that $f(NTL_{i,t-1}) > 0$ whenever $NTL_{i,t-1} < 0.31$, and that the function h is always negative. Hence, equation (7) is positive whenever past NTL levels are below 0.31 and the Net Inflow is large enough.
- Using the 2km radius, a marginal increase in Net Inflow would increase the NTL level in areas that already have some level of human activity (NTL > 0.56) as soon as the inflow of displaced persons is high enough, while the 5km radius analysis does not confirm this finding. Using the 5km NTL average, there exists no level of Net Inflow high enough for an increase in net inflow to raise the NTL levels, even in areas with some level of human activity.

In outbound areas, a marginal increase in the net inflow of displaced persons would raise the NTL levels whenever the net outflow is large enough, in areas with past NTL levels high enough (above 0.56 for the 2km analysis, and above 0.31 for the 5km analysis).
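The sign conditions discussed above follow from the roots of the polynomials h and f in past NTL. The sketch below, offered as an illustration rather than the authors' procedure, recomputes the roots of f from the bracketed coefficients quoted in the text for the 2km and 5km buffers, and evaluates the marginal effect in the form h + 2·X·f used in the written-out derivatives. The helper names `real_roots`, `poly` and `marginal_effect` are hypothetical.

```python
import math
from typing import List, Sequence


def real_roots(c0: float, c1: float, c2: float = 0.0) -> List[float]:
    """Real roots of c0 + c1*x + c2*x**2, in ascending order."""
    if c2 == 0.0:
        return [] if c1 == 0.0 else [-c0 / c1]
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0.0:
        return []
    s = math.sqrt(disc)
    return sorted([(-c1 - s) / (2.0 * c2), (-c1 + s) / (2.0 * c2)])


def poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x**2 + ... at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def marginal_effect(
    ntl_lag: float, inflow: float, h: Sequence[float], f: Sequence[float]
) -> float:
    """d NTL / d X written as h(NTL_lag) + 2 * X * f(NTL_lag)."""
    return poly(h, ntl_lag) + 2.0 * inflow * poly(f, ntl_lag)


# Coefficients as quoted in the text (ascending powers of lagged NTL).
H_2KM = (-3.44e-06, 1.97e-05, -2.37e-05)
F_2KM = (9.40e-12, -5.95e-11, 7.66e-11)
F_5KM = (1.88e-11, -6.05e-11)

if __name__ == "__main__":
    print("2km f roots:", [round(r, 2) for r in real_roots(*F_2KM)])
    print("5km f root:", [round(r, 2) for r in real_roots(*F_5KM)])
    # Hypothetical settlement: low past NTL, very large net inflow.
    print("effect:", marginal_effect(0.10, 1_000_000, H_2KM, F_2KM))
```

The roots of f come out at roughly 0.22 and 0.56 for the 2km buffer and 0.31 for the 5km buffer, matching the thresholds quoted in the text. Since f opens upward in the 2km case, it is positive outside the two roots.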
Table 5.0.2. Regression of NTL on net inflows of persons in settlements, 2km and 5km radius

Y = NTL (x-km radius). Standard errors are reported in parentheses after each coefficient.

| VARIABLES | x = 2km: (1) Linear | x = 2km: (2) Quadratic | x = 5km: (3) Linear | x = 5km: (4) Quadratic |
|---|---|---|---|---|
| β1: Lagged NTL | 0.878*** (0.00591) | 0.839*** (0.0212) | 0.816*** (0.00741) | 0.916*** (0.0260) |
| β4: (Lagged NTL)² | | 0.0320* (0.0184) | | -0.155*** (0.0242) |
| β2: Total Inflow | 4.90e-07*** (8.45e-08) | -3.44e-06*** (6.25e-07) | -4.42e-08 (7.70e-08) | -5.10e-06*** (4.37e-07) |
| β3: (Lagged NTL) × Total Inflow | -7.66e-07*** (1.66e-07) | 1.97e-05*** (2.79e-06) | 6.86e-07*** (1.29e-07) | 1.87e-05*** (1.63e-06) |
| β5: (Lagged NTL)² × Total Inflow | | -2.37e-05*** (2.92e-06) | | -5.75e-06*** (1.35e-06) |
| β6: (Total Inflow)² | | 9.40e-12*** (3.46e-12) | | 1.88e-11*** (2.49e-12) |
| β7: (Lagged NTL) × (Total Inflow)² | | -5.95e-11*** (0) | | -6.05e-11*** (9.03e-12) |
| β8: (Lagged NTL)² × (Total Inflow)² | | 7.66e-11*** (0) | | 0 (0) |
| β9: Population in 2017 | 3.70e-06*** (4.18e-07) | 3.77e-06*** (4.29e-07) | 9.56e-06*** (6.58e-07) | 7.15e-06*** (6.42e-07) |
| β10: % of agriculture in 2020 | -3.25e-05 (2.38e-05) | -3.06e-05 (2.38e-05) | 6.02e-05* (3.47e-05) | 0.000108*** (3.35e-05) |
| β0: Constant | -0.0240*** (0.00397) | -0.0125* (0.00669) | -0.00620 (0.00525) | -0.0233*** (0.00811) |
| Observations | 8,497 | 8,497 | 8,497 | 8,497 |
| Number of settlements | 293 | 293 | 293 | 293 |
| District fixed effects | Yes | Yes | Yes | Yes |
| Time fixed effects | Yes | Yes | Yes | Yes |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Does displacement affect NTL differently based on initial urbanization levels?

This section studies further how displacement flows may impact NTL growth differently based on the initial level of human activity and urbanization, e.g., depending on whether IDPs have to set up a camp or can settle in an already existing urban center.
While the initial regressions attempted to control for this by including the initial population level (in 2017), this section goes further by allowing the relationship between NTL and displacement to depend on initial NTL levels (as of January 1st, 2018), or by replacing the main independent variable to account for displacement inflows as a share of the existing population. There is, however, no clear evidence of a correlation between displacement flows and initial NTL levels. No correlation was observed between total displacement outflows and initial NTL levels, and the dynamic observed in inbound settlements seems to be mostly driven by outliers (Figure 5.0; see footnote 11).

Figure 5.0. Scatter plot of total inflow, by initial NTL level
[Figure: two scatter-plot panels, "Outbound settlements" (top 1% of outflows excluded) and "Inbound settlements"; x-axis: total net inflow of displaced; y-axis: NTL on Jan 2018.]
Source: Authors' computations.

The impact of displacement flows on NTL growth might depend on the relative size of the settlement. We hence test the robustness of our results when changing the main independent variable to account for the net inflow of displaced population as a share of the total population in the settlement. Displaced households arriving in a new location might have a very different effect on NTL growth (through changes in urbanization and human activity), depending on whether they have to build a new settlement or settle in an already urbanized area. In particular, the speed of infrastructure construction and access to additional services is likely to differ, hence resulting in different NTL growth.

Variables

For the sake of robustness, the analysis is therefore completed by replacing the independent variable with two alternative measures of net inflow of persons as a proportion of settlement population.
First, the variable Average Inflow Share accounts for the average share of displaced population hosted in a given period (i.e., the average net inflow of displaced persons hosted in a settlement as a percentage of the total population in a given period). Figure 5.1 shows the density of the Average Inflow Share variable across settlements in Badghis, while summary statistics can be observed in Table 5.1. Second, the variable Net Inflow Share is constructed as the sum of the net inflow of displaced persons divided by the cumulated population in the settlement (both summed over each observation available for 2018-2020). Figure 5.2 shows the density of the Net Inflow Share variable across settlements in Badghis, while summary statistics can be observed in Table 5.2.

Figure 5.1. Distribution of average net inflow of displaced persons in settlements (as share of population)
Source: Authors' computation based on IOM DTM 2020.

11 The dynamic remains unchanged when including the top 1% outflows, i.e., settlements with a net total inflow below -5000 (more than 5000 individuals having fled the settlement, in net terms).

Table 5.1. Summary statistics for the cumulative net inflow in Badghis settlements

| Variable | Nb settlements | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Average Inflow Share | 294 | .0018038 | .280172 | -1.065 | .8400924 |

Note: The net inflow of persons can be negative if the population that migrates out of the settlement (either internationally or within Afghanistan) is higher than the number of displaced persons settling in.

Figure 5.2. Distribution of the total net inflow of persons in settlements (as % of population)
[Figure: kernel density of the total share of net arrivals in settlements over 2018-2020; x-axis: total share of displaced individuals; y-axis: density; kernel = epanechnikov, bandwidth = 0.0425.]
Source: Authors' computation based on IOM DTM 2020.

Table 5.2. Summary statistics for the cumulative net inflow as a share of cumulated population

| Variable | Nb settlements | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Net Inflow Share | 294 | 0.079 | 0.194 | -0.331 | 0.881 |

Source: Authors' computation based on IOM DTM 2020.

Results

There is a risk that NTL growth depends on the size of the displaced population relative to the total settlement population. We therefore test the robustness of our results by first accounting for the average net influx of displaced persons, as a share of settlement population. That is, we first replace our main independent variable Net Inflow by the variable Average Inflow Share, representing the cumulative net inflow of persons as a share of population in the settlement (see Section 4.1 for descriptive statistics). The results of the linear regression following equation (3) are displayed in column 1 of Table 5.3, and those of the quadratic regression following equation (6) in column 2. Then, results are replicated using the Net Inflow Share (rather than the average), in columns 3 and 4.

The linear regression is robust to the inclusion of the alternative variables: a marginal increase in the net inflow of displaced population (relative to settlement population) does increase the NTL, as long as the past NTL level is not too small. Assuming a linear relationship, equation (5) is positive as long as past NTL levels are larger than 0.24, see column (1) of Table 5.3. An increase in the average net inflow of displaced persons (as a proportion of population) raises the NTL level as long as the past NTL level is not too small. Similarly, an increase in the total net inflow of displaced persons (as a proportion of cumulated population) raises the NTL level, i.e., equation (5) is positive, as long as past NTL levels are larger than 0.22. As a reference, all IOM settlements in Badghis recorded an NTL value above both thresholds at least once during 2018-2020 (see footnote 12).

Table 5.3.
Regression of NTL on average and total share of displaced persons as share of population

Y = NTL levels. Standard errors are reported in parentheses after each coefficient.

| VARIABLES | X = Average Inflow Share: (1) Linear | X = Average Inflow Share: (2) Quadratic | X = Net Inflow Share: (3) Linear | X = Net Inflow Share: (4) Quadratic |
|---|---|---|---|---|
| β1: Lagged NTL | 0.674*** (0.00890) | 1.214*** (0.0222) | 0.646*** (0.00961) | 1.211*** (0.0226) |
| β4: (Lagged NTL)² | | -0.385*** (0.0143) | | -0.426*** (0.0154) |
| β2: Total Inflow | -0.0799*** (0.00613) | 0.0179 (0.0123) | -0.0772*** (0.00822) | 0.0765** (0.0309) |
| β3: (Lagged NTL) × Total Inflow | 0.337*** (0.0173) | -0.254*** (0.0593) | 0.363*** (0.0185) | -0.590*** (0.136) |
| β5: (Lagged NTL)² × Total Inflow | | 0.742*** (0.0717) | | 1.188*** (0.121) |
| β6: (Total Inflow)² | | -0.0796*** (0.0217) | | -0.184*** (0.0482) |
| β7: (Lagged NTL) × (Total Inflow)² | | 0.527*** (0.102) | | 1.047*** (0.195) |
| β8: (Lagged NTL)² × (Total Inflow)² | | -0.736*** (0.115) | | -1.285*** (0.159) |
| β9: Population in 2017 | 8.80e-06*** (7.66e-07) | 5.17e-06*** (7.47e-07) | 8.68e-06*** (7.66e-07) | 4.97e-06*** (7.43e-07) |
| β10: % agriculture in 2020 | 8.28e-05** (4.07e-05) | 0.000108*** (3.91e-05) | 7.05e-05* (4.08e-05) | 0.000108*** (3.92e-05) |
| β0: Constant | 0.0561*** (0.00633) | -0.106*** (0.00852) | 0.0638*** (0.00647) | -0.101*** (0.00857) |
| Observations | 8,497 | 8,497 | 8,497 | 8,497 |
| Number of settlements | 293 | 293 | 293 | 293 |
| District fixed effects | Yes | Yes | Yes | Yes |
| Time fixed effects | Yes | Yes | Yes | Yes |

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

12 NTL levels around Badghis settlements ranged between 0.03 and 2.215 during the 2018-2020 period (see Table 3.3).

The robustness analysis using quadratic regressions on both NTL and displacement measures yields mixed results (columns 2 and 4 of Table 5.3). There are several cases whereby a marginal increase in the Average Inflow Share (relative to cumulated population) increases the NTL levels, e.g., in inbound settlements with large average net inflow and past NTL levels within a medium-low range, or in important outbound settlements with both low and high past levels of NTL.
(See footnote 13.)
- In inbound settlements, a sufficient condition for equation (7) to be positive is that the past NTL level lies between 0.35 and 0.50. As a reference, this corresponds to around 20% of observations in Badghis. Another sufficient condition is that past NTL lies within [0.22; 0.50] and that the average inflow share is sufficiently large.
- An increase in the Average Inflow Share (average net inflow of displaced persons relative to settlement population) will reduce the NTL as soon as the past NTL and the Average Inflow Share are high enough. A sufficient condition for equation (7) to be negative is that the Average Inflow Share (Xi) is high enough and the past NTL level is higher than 0.5. Around one quarter of Badghis settlements reached that NTL threshold at least once during the 2018-2019 period.
- In settlements that serve as important outbound areas, a reduction in the average outflow share (an increase in the average inflow share) would increase the NTL both in areas that have a very low level and in areas that have a medium-high level of human activity. In outbound areas, a sufficient condition for a marginal increase in the Average Inflow Share to increase the NTL is that the average outflow is large enough, in areas with low past NTL levels (below 0.22) or high NTL levels (larger than 0.5).

There are several cases whereby a marginal increase in the total net inflow (relative to cumulated population) increases the NTL levels, e.g., in already urbanized outbound areas with high NTL, and in important inbound areas with medium human activity levels.
- In inbound areas, a sufficient condition for equation (7) to be positive is that past NTL lies within [0.26, 0.56]. It is negative if past NTL levels are larger than 0.55 and inflows are large enough. That is, a marginal increase in the total inflow of displaced persons as a share of cumulated population would decrease the NTL in important inbound areas that already have a medium-high level of human activity.
(See footnote 14.)
- As before, a low past NTL value (below 0.26) or a medium-high one (above 0.56) associated with a large negative Net Inflow Share (important outbound area) also results in equation (7) being positive. That is, in urbanized settlements from which people flee massively, a marginal decrease in the number of people leaving the settlement (i.e., a net increase in Net Inflow Share) will be associated with an increase in NTL levels.

13 When $X_i$ = Average Inflow Share, equation (8) becomes $\frac{d\,NTL_{i,t}}{d\,X_i} = -0.254\,NTL_{i,t-1} + 0.736\,NTL_{i,t-1}^2 + 2 X_i \left(-0.0796 + 0.527\,NTL_{i,t-1} - 0.736\,NTL_{i,t-1}^2\right)$. The term between brackets is positive when $NTL_{i,t-1} \in [0.22; 0.50]$. In addition, $-0.254\,NTL_{i,t-1} + 0.736\,NTL_{i,t-1}^2 > 0$ when $NTL_{i,t-1} > 0.35$.

14 $\frac{d\,NTL_{i,t}}{d\,X_i} = 0.0765 - 0.59\,NTL_{i,t-1} + 1.188\,NTL_{i,t-1}^2 + 2 X_i \left(-0.184 + 1.047\,NTL_{i,t-1} - 1.285\,NTL_{i,t-1}^2\right) = h(NTL) + 2 X_i f(NTL)$, with f(NTL) > 0 if NTL is in [0.26; 0.56] and h(NTL) > 0 for all NTL.

Appendix E. Alternative data sources

IOM's CBNA collects information on arrival IDPs twice a year based on key informant interviews at the level of settlements, which are aggregated into districts and regions. In 2020, new displacements were reported across 1,069 settlements in 7 districts of the Badghis province. In CBNA, respondents are asked to indicate the percentages of IDPs that arrived at a settlement due to conflict and natural disasters; in the dataset, displacements are grouped into four categories: conflict, conflict-natural disasters, natural disasters, and natural disasters-conflict. OCHA's dataset (HDX 2021) includes information on conflict-induced displacements at the level of districts, collected through the alert system; additionally, OCHA reports the dates of displacements. In 2020, there were 66 incidents (dates) of displacement due to conflict, and those IDPs arrived in 2 districts of the Badghis province. Comparisons of IOM and OCHA data on new displacements were performed for the districts reported by both organizations.
IOM reported more conflict-induced displacements than OCHA in the Ab Kamari district but no settlements with only conflict-induced arrival IDPs in the Qala-e-Naw district. At the same time, IOM's numbers of IDPs due to conflict and conflict-natural disasters exceed OCHA's numbers of conflict-induced IDPs in both districts. Effective comparisons of IOM and OCHA data on displacements may not be possible due to differences in their data collection methodologies.

Period of comparison

IOM's CBNA contains data on new displacements in Afghanistan during 2018-2020; the dataset that we currently have includes 4 periods: January-December 2018, January-June 2019, January-June 2020, and January-December 2020. The OCHA data on conflict-induced displacements are available for 2016-2020.

Table A1. New displacements in Badghis province reported by IOM and OCHA.

| Organization | 2016 | 2017 | 2018 | Jan-June 2019 | July-Dec 2019 | Jan-June 2020 | July-Dec 2020 |
|---|---|---|---|---|---|---|---|
| IOM | - | - | 168197 (see footnote 15) | 32983 | - | 19747 | 24819 |
| OCHA | 15157 | 27361 | 9918 | 24113 | 2053 | 1810 | 6072 |

A comparative analysis of IOM and OCHA data on IDPs in the Badghis province was performed for January-December 2020, except for the correlation analysis, which was done for 2019-2020.

Description of IOM data on displacement

IOM's CBNA provides information on internally displaced people (IDPs) twice a year based on key informant interviews at the levels of settlements, districts, and provinces in Afghanistan. In particular, for the period of January-December 2020, the data are available for 1,069 settlements and 7 districts of the Badghis province. Summary statistics for new displacements in 2020 are presented in Table A2. The largest numbers of IDPs arrived in the Bala Murghab (22,491 new IDPs; on average, 91 people in each of 247 settlements) and Qala-e-Naw (12,289 new IDPs; on average, 90 persons in each of 136 settlements) districts of the Badghis province.

Table A2.
The summary statistics for Arrival IDPs (new displacements) in Badghis province in 2020 (see footnote 16).

| Districts | Mean | Std. Dev. | Freq. (# of settlements) |
|---|---|---|---|
| Ab Kamari | 7.5846774 | 25.656578 | 248 |
| Bala Murghab | 91.05668 | 283.35422 | 247 |
| Ghormach | 13.2 | 39.750008 | 20 |
| Jawand | 16.623077 | 65.459647 | 130 |
| Muqur | 21.822485 | 46.781577 | 169 |
| Qadis | 15.058824 | 31.127211 | 119 |
| Qala-e-Naw | 90.360294 | 204.55805 | 136 |
| Total | 41.689429 | 162.17494 | 1,069 |

15 The number of ArrivalIDPs2018 in IOM's 2018 CBNA is too large and most likely includes displacements in previous periods (not only IDPs that arrived in 2018). This number cannot be compared with OCHA's number.

16 Stata command: .tab ADM2NameEnglish, sum (ArrivalIDPs2020).

In 2020, 44,566 IDPs arrived in the Badghis province for various reasons. Table A3 shows the breakdown of the numbers of IDPs that arrived due to conflict, natural disasters, and their combinations (see footnote 17). It additionally specifies the numbers of IDPs due to conflict (9,504) and due to conflict and conflict-natural disasters (30,571).

Table A3. IDPs arrived in Badghis province in 2020.

| Districts | Arrival IDPs due to: conflict, conflict-natural disasters, natural disasters, and natural disasters-conflict | Arrival IDPs due to: conflict and conflict-natural disasters | Arrival IDPs due to: conflict |
|---|---|---|---|
| Ab Kamari | 1,881 | 1,503 | 154 |
| Bala Murghab | 22,491 | 16,715 | 9,005 |
| Ghormach | 264 | 257 | 84 |
| Jawand | 2,161 | 189 | 0 |
| Muqur | 3,688 | 2,207 | 240 |
| Qadis | 1,792 | 1,190 | 21 |
| Qala-e-Naw | 12,289 | 8,510 | 0 |
| Total | 44,566 | 30,571 | 9,504 |

Description of OCHA data on displacement

OCHA reports conflict-induced displacements for districts and provinces of Afghanistan. The OCHA dataset additionally includes dates of displacement. According to OCHA, in 2020, there were 66 incidents (dates) of displacement in the Badghis province (Table A4). Table A4 presents summary statistics for conflict-induced IDPs based on displacement incidents (not comparable to IOM data because IOM's summary statistics are for settlements).

Table A4. The summary statistics for conflict-induced IDPs in 2020.

| Districts | Mean | Std. Dev. | Freq. (# of displacement incidents) |
|---|---|---|---|
| Ab Kamari | 20 | 0 | 1 |
| Qala-e-Naw | 120.95385 | 93.139417 | 65 |
| Total | 119.42424 | 93.251861 | 66 |

As reported by OCHA, in 2020, conflict-induced IDPs settled in two districts of the Badghis province: Qala-e-Naw (7,862 people) and Ab Kamari (20 people) (Table A5).

17 The IOM questionnaire asks respondents to indicate the percentages of IDPs that arrived at a settlement due to conflict and natural disasters, which are then grouped in the dataset into 4 categories: conflict, conflict-natural disasters, natural disasters, and natural disasters-conflict.

Table A5. Conflict-induced IDPs settled in Badghis province in 2020.

| Districts | Displaced individuals (or arrival IDPs) |
|---|---|
| Ab Kamari | 20 |
| Qala-e-Naw | 7,862 |
| Total | 7,882 |

Comparison of IOM and OCHA data

IOM's CBNA reports arrival IDPs for settlements (in 2020, 1,069 settlements in the Badghis province), which are then aggregated into districts (7 districts in the region). OCHA's data only contain information for districts; in 2020, conflict-induced IDPs arrived in two districts of the Badghis province. Table A6 compares displacements reported by OCHA to those recorded by IOM for the two districts that are included in OCHA's dataset. For example, OCHA reported the arrival of 7,862 IDPs due to conflict in the Qala-e-Naw district. According to IOM, settlements in this district reported 8,510 IDPs that arrived due to conflict and conflict-natural disasters (but 0 IDPs when looking only at settlements that reported the arrival of IDPs due to conflict alone). This hints that OCHA data may also include displacement due to natural disasters along with those due to conflict, even though the former is not specified.

Table A6. Comparison of OCHA and IOM's data on displacements in Badghis province in 2020.
| Districts | OCHA: conflict | IOM: conflict; conflict-natural disasters; natural disasters; natural disasters-conflict | IOM: conflict; conflict-natural disasters | IOM: conflict |
|---|---|---|---|---|
| Ab Kamari | 20 | 1,881 | 1,503 | 154 |
| Qala-e-Naw | 7,862 | 12,289 | 8,510 | 0 |
| Other districts | 0 | 30,396 | 20,558 | 9,350 |
| Total | 7,882 | 44,566 | 30,571 | 9,504 |

Column headers give the reason for displacement recorded by each organization.

Correlation analysis of OCHA and IOM's data on internal displacement was conducted based on a limited number of comparable observations (districts) (Table A7). Pearson's correlation coefficient is 0.8162 (p<0.01), which, however, may not be meaningfully interpreted because of too few observations.

Table A7. Arrival IDPs in 2019-2020

| Periods | Districts | IOM | OCHA |
|---|---|---|---|
| Jan-June 2019 | Ab Kamari | 1071 | 0 |
| Jan-June 2019 | Bala Murghab | 20445 | 17335 |
| Jan-June 2019 | Ghormach | 273 | 0 |
| Jan-June 2019 | Jawand | 1127 | 0 |
| Jan-June 2019 | Muqur | 1288 | 0 |
| Jan-June 2019 | Qadis | 1503 | 0 |
| Jan-June 2019 | Qala-e-Naw | 11432 | 6778 |
| Jan-June 2020 | Ab Kamari | 972 | 20 |
| Jan-June 2020 | Bala Murghab | 11011 | 0 |
| Jan-June 2020 | Ghormach | 0 | 0 |
| Jan-June 2020 | Jawand | 700 | 0 |
| Jan-June 2020 | Muqur | 1686 | 0 |
| Jan-June 2020 | Qadis | 763 | 0 |
| Jan-June 2020 | Qala-e-Naw | 4615 | 1790 |
| July-Dec 2020 | Ab Kamari | 909 | 0 |
| July-Dec 2020 | Bala Murghab | 11480 | 0 |
| July-Dec 2020 | Ghormach | 264 | 0 |
| July-Dec 2020 | Jawand | 1461 | 0 |
| July-Dec 2020 | Muqur | 2002 | 0 |
| July-Dec 2020 | Qadis | 1029 | 0 |
| July-Dec 2020 | Qala-e-Naw | 7674 | 6072 |

Changes in the numbers of Arrival IDPs (stock of IDPs) in 2018-2020

In 2020, the number of IDPs (protracted IDPs) in the Badghis province decreased by 24,327 compared to 2018 (Table A8). This decrease was driven by IDPs leaving the Qala-e-Naw (-67,888) and Ab Kamari (-693) districts. In the other districts, the numbers of IDPs increased compared with those in 2018. These numbers should be interpreted with caution, as the government of Afghanistan disputes IOM's numbers of protracted IDPs.

Table A8. Changes in the numbers of IDPs in 2020 vs. 2018 according to IOM data

| Districts | 2018 | 2020 | Absolute change | Percentage change |
|---|---|---|---|---|
| Ab Kamari | 7,806 | 7,113 | -693 | -8.9 |
| Bala Murghab | 9,985 | 44,118 | 34,133 | 341.8 |
| Ghormach | | 1,146 | 1,146 | - |
| Jawand | 3,319 | 6,138 | 2,819 | 84.9 |
| Muqur | 7,203 | 12,319 | 5,116 | 71.0 |
| Qadis | 8,069 | 9,109 | 1,040 | 12.9 |
| Qala-e-Naw | 233,900 | 166,012 | -67,888 | -29.0 |
| Total | 270,282 | 245,955 | -24,327 | -9.0 |
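The absolute and percentage changes in Table A8 can be recomputed directly from the 2018 and 2020 columns. The sketch below is illustrative only: the dictionary layout and the function name `change` are assumptions made for this example, and Ghormach is left out because no 2018 figure is reported for it.

```python
from typing import Dict, Tuple

# (2018, 2020) IDP counts by district, copied from Table A8.
# Ghormach is omitted: its 2018 value is not reported.
IDP_COUNTS: Dict[str, Tuple[int, int]] = {
    "Ab Kamari": (7806, 7113),
    "Bala Murghab": (9985, 44118),
    "Jawand": (3319, 6138),
    "Muqur": (7203, 12319),
    "Qadis": (8069, 9109),
    "Qala-e-Naw": (233900, 166012),
    "Total": (270282, 245955),
}


def change(before: int, after: int) -> Tuple[int, float]:
    """Return (absolute change, percentage change rounded to one decimal)."""
    absolute = after - before
    return absolute, round(100.0 * absolute / before, 1)


if __name__ == "__main__":
    for district, (y2018, y2020) in IDP_COUNTS.items():
        absolute, percent = change(y2018, y2020)
        print(f"{district}: {absolute:+,d} ({percent:+.1f}%)")
```

Running the check reproduces the table's figures, for instance -693 (-8.9%) for Ab Kamari and -24,327 (-9.0%) for the province total.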