Policy Research Working Paper 10683

The Welfare Cost of Drought in Sub-Saharan Africa

Jon Gascoigne, Sandra Baquie, Katja Vinha, Emmanuel Skoufias, Evie Calcutt, Varun Kshirsagar, Conor Meenan, Ruth Hill

Poverty and Equity Global Practice
January 2024

Abstract

This paper quantifies the impact of drought on household consumption for five main agroecological zones in Africa, developing vulnerability (or damage) functions of the relationship between rainfall deficits and poverty. Damage functions are a key element in models that quantify the risk of extreme weather and the impacts of climate change. Although these functions are commonly estimated for storm or flood damages to buildings, they are less often available for income losses from droughts. The paper takes a regional approach to the analysis, developing standardized hazard definitions and methods for matching hazard and household data, allowing survey data from close to 100,000 households to be used in the analysis. The damage functions are used to quantify the impact of historical weather conditions on poverty for eight countries, highlighting the risk to poverty outcomes that weather variability causes. National poverty rates are 1–12 percent higher, depending on the country, under the worst weather conditions relative to the best conditions observed in the past 13 years. This amounts to an increase in the total poverty gap that ranges from US$4 million to US$2.4 billion (2011 purchasing power parity).

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at rhill@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

The Welfare Cost of Drought in Sub-Saharan Africa

Jon Gascoigne (Centre for Disaster Protection) ⓡ Sandra Baquie (World Bank) ⓡ Katja Vinha (Consultant) ⓡ Emmanuel Skoufias (National University of Singapore and World Bank) ⓡ Evie Calcutt (World Bank) ⓡ Varun Kshirsagar (Consultant) ⓡ Conor Meenan (Centre for Disaster Protection) ⓡ Ruth Hill (World Bank and Centre for Disaster Protection)¹

JEL Codes: Q54, I32
Project Code: P177432
Keywords/topics: Shocks and vulnerability to poverty, drought

¹ ⓡ indicates randomized author order using the American Economic Association author randomization tool, https://www.aeaweb.org/journals/policies/random-author-order/search?RandomAuthorsSearch%5Bsearch%5D=XxSZtJmQOh5k. This paper started as a result of a conversation with Jon Gascoigne in early 2020. Jon understood better than any of us the need for a multidisciplinary team to understand how to model the welfare impacts of drought. Collaborating across disciplines can be a challenge, and negotiating those discussions requires good grace, humility, a strong focus on relationship, and a genuine interest in understanding another’s view and expertise. Jon exemplified these characteristics in all of his work, and true to form, brought them to this project too.
We think it is very fitting that the randomization tool placed him as first author on this paper early in its drafting. We are saddened he didn’t get to see the final product and to know just how grateful we all were to him and the example he set us. We thank Stefanie Brunelin and Odyssia Ng for being part of the early conversation on this paper and we thank Bhavin Thakrar and Paul Wilson for working on parts of the simulations data exercise. 1. Introduction Climate disasters reverse gains and limit progress in efforts to reduce poverty. Global extreme poverty is increasingly concentrated in Sub-Saharan Africa (SSA) where drought is a major risk, a risk that is increasing in many parts of the continent because of climate change. Investments to reduce the impact of drought on welfare are thus essential to poverty reduction efforts in the region. Quantifying the welfare needs that arise when climate shocks occur is an important part of targeting investments to reduce their impact (Clarke and Dercon 2016). This requires vulnerability or damage functions that relate losses in welfare to climate conditions (Auffhammer 2018). While these are widely available for the impact of floods and cyclones on buildings, this is not the case when it comes to quantifying the impact of drought on household consumption in Sub-Saharan Africa. This paper pilots a regional approach to estimating vulnerability functions for drought and consumption using methods widely employed in the microeconomic literature (Dell, Jones, and Olken 2014). The analysis uses survey data from close to 100,000 households across nine countries and five livelihood zones, allowing considerable variation in climatic conditions to be utilized in estimation. The result is stable estimates that are not driven by any one climate event. We find that consumption is reduced by 10-20 percent in the worst weather observations in our surveys (corresponding to about a 1 in 10-year drought event). 
We use the estimated vulnerability functions and the full hazard distribution in each household location to calculate the risk drought poses to welfare outcomes. The results indicate that risk is large. Poverty is 1–12 percent higher under the worst weather conditions relative to the best conditions observed in the past 13 years. This amounts to an increase in the total national poverty gap that ranges from US$4 million to US$2.4 billion (2011 PPP) across countries. To our knowledge, these are the first cross-country estimates of the poverty impact of drought in Africa.

There are several ways in which damage functions for rainfall deficits have been estimated in the climate economics literature. Auffhammer (2018) provides a review and examples of the approaches applied to US farm incomes. The approach used for estimating the impact of extreme weather events relies on observed data to identify the causal impact of weather changes on outcomes of interest (Dell, Jones, and Olken 2014). Data on hazards are matched to household observations in survey data using information on the geographic location of a household, and a fixed effects regression is run to identify the impact of weather in determining economic outcomes.

Although not undertaken with the purpose of estimating vulnerability functions, several studies have used this method to estimate the impact of drought on welfare. For example, in Africa, Hill and Porter (2017) look at the impact of a water requirement satisfaction index (WRSI) on consumption and poverty, and Hill and Mejia-Mantilla (2017) look at the impact of WRSI on household income and consumption in Uganda. In Malawi, Baquie and Fuje (2020) look at the impact of rainfall on consumption and poverty. Wineman, Mason, Ochieng and Kirimi (2019) examine the relationship between rainfall and income, calorie consumption and poverty in Kenya. Baez, Kshirsagar and Skoufias (2019) look at the impact of rainfall-defined dry spells on nutrition outcomes.
Outside Africa, Kochhar and Knippenberg (2023) look at the impact of NDVI, rainfall and temperature on consumption and poverty in Afghanistan. Studies that use estimated impacts of drought to predict the welfare cost of other weather events include Hill and Porter (2017), Porter and White (2016), and Kochhar and Knippenberg (2023).²

This study uses the same methods as these papers but takes a regional approach (as in Baez, Kshirsagar and Skoufias 2019) with the objective of developing robust estimates of the relationship between drought conditions and consumption that are not specific to one context or event. This relationship can then be used to predict the impact of drought for other events or locations. The estimates presented in this paper are within the range of results found in country-specific papers. For example, a moderate (about a 1 in 10 year) drought is predicted to reduce consumption by 15 percent and 9 percent in Uganda and Ethiopia, respectively, akin to the 10 percent loss we estimate for crop cultivating households in maize and highland zones in this paper.

² Anttila‐Hughes and Sharma (2015) take the same approach for storms.

Taking a regional approach requires the use of standardized hazard definitions and methods for matching hazard and household data. This is done by considering the main hazard measures and techniques used in comparable country studies, and systematically assessing the choice of hazard, the hazard-household matching approach, and the functional form used in analysis. Specifically, we use gridded historical data on precipitation, vegetation, evapotranspiration, and soil moisture, using measures such as the Normalized Difference Vegetation Index (NDVI), the Standardized Precipitation Evapotranspiration Index (SPEI), and WRSI. These measures reflect the hazard to livelihoods from lack of rain in different ways.
Some are direct measures of climate conditions (e.g., precipitation, soil moisture, and SPEI), others aggregate climate data through crop models (WRSI), and others measure vegetation outcomes (NDVI). The gridded hazard measures are then merged to harmonized household surveys from nine countries (Ethiopia, Lesotho, Malawi, Mauritania, Mozambique, Niger, Nigeria, Zambia, and Zimbabwe) with location coordinates for either the household or the community. The surveys used are the ones that are also used for official poverty estimates. A consistent approach to temporal matching was used to take into account the timing of the survey and agricultural seasons. Different approaches to spatial merging were tested. There are important methodological insights from the analysis that emerge. First, we find that soil moisture and greenness measures provide more consistently reliable results than evapotranspiration or rainfall measures. This is an important finding given most of the extant literature is based on evapotranspiration or rainfall metrics. Second, we find that a linear specification performs adequately for the types of moderate shocks that are well-represented in the survey data used. Third, we find that a hazard measure based on 20 km or 50 km radius around each household performs better than those that use smaller radii, perhaps because a larger radius allows local market effects to also be considered. The relationship between hazard measures and welfare outcomes is estimated separately for different livelihood zones across SSA. Separate estimation provides needed flexibility, given that drought conditions manifest themselves differently across livelihood zones. Important differences in impacts between agroecological zones and between land and livestock owners are observed. 
However, the downside of a regional approach is a limited ability to explore the heterogeneity of impacts across households in detail without significant investments in data harmonization, and we leave additional work on the heterogeneity of impacts as something to explore further in future analysis. There are limitations to the analysis undertaken that should be borne in mind when interpreting results. First, estimates capture the direct impact of local drought conditions and not impacts beyond the local area, such as those that might arise from disruption of markets or price impacts. The degree to which this is a concern depends on the relative importance of direct and indirect impacts. The evidence is mixed on this. Hallegatte et al. (2016) suggest food prices are an important channel of the welfare impacts of climate change. However, Artuc et al. (2023) model trade between locations and find the main impact is the direct production effect, not the impact on prices. To the extent this is the case the local estimates presented in this paper may be quite close to aggregate impacts. Secondly, the empirical estimates are subject to “survivor bias” in that consumption is only measured for households still in existence after the shock has occurred. If households move or disintegrate (perhaps due to the death of one or more household members) because of drought conditions, the estimated impacts could be over or underestimated. This is much more likely to be the case for severe drought conditions, whereas those that are captured in our surveys tend to be less extreme events that are unlikely to cause movement or disintegration of households. This brings us to the third potential source of bias in the results. Although the analysis uses survey data across many weather conditions in the analysis, there are very few observations in the survey data that experience drought conditions below two standard deviations from the mean. 
This limits the degree to which non-linearities can be estimated. It also means there is limited empirical support for predicting the welfare costs of extreme events. A final source of bias comes from measurement error in the hazard data. Although care was taken to use the best available data and match hazard and household data carefully, any measurement error in the hazard variable will attenuate the estimated coefficients, thereby underestimating the welfare impacts. Additional uncertainty is present given that the distribution of hazards is changing over time with climate change and historical data form the basis of all simulations presented in this paper.

The models built in this paper are an important first step toward constructing vulnerability or damage functions as used in the catastrophe risk modeling literature, and can inform disaster response planning and financing. However, these limitations mean that care is needed in interpreting the results of the analysis. We envisage that the set of tools developed herein will provide the basis for refining the analysis as more data become available, and expanding coverage to other regions. Accompanying this paper is code designed to extract a large set of potential shock measures and determine which of these are consistent with changes in the welfare measures (out-of-sample). The modular code used is intended to be transparent, to aid in replicability, and to be available for others to improve upon. The goal is to encourage both learning and common standards, which in turn contribute to the implementation of drought-responsive policies that are objective and transparent.

The paper is structured as follows. Section 2 presents the empirical framework for assessing the relationship between welfare and hazards in rural areas where rainfall-dependent agricultural production is dominant, and where rainfall deficits (droughts) thus may have very serious welfare consequences.
It sets out the approach used to estimate the vulnerability function and an exceedance probability curve that describes the welfare losses likely to occur because of rainfall deficits of increasing severity. Section 3 discusses in detail the historical hazard database and the household survey data used, the merging of the hazard data to the household survey data, and the specifications used for the hazard measures. Section 4 presents estimates of vulnerability functions. Section 5 shows how these vulnerability functions can be used to project welfare losses and presents some exceedance probability curves. Section 6 offers some conclusions. 2. An Empirical Framework for an Ex Ante Assessment of the Welfare Costs of Drought A catastrophe risk modeling framework uses three components to generate an ex ante assessment of the costs of a catastrophe: hazard, exposure, and vulnerability function. Hazards capture data on the possible weather outcomes that may occur in a given place, such as possible tropical cyclones. Exposure captures what could be affected by the hazard in that location, such as the number and type of buildings. A vulnerability function provides an estimate of the cost of damage that will occur as a result of a given hazard and a given exposure. In the example of tropical cyclones and buildings, it would indicate the cost of damage for a given windspeed affecting a given type of building. Combining the three components allows development of a probabilistic impact curve, which indicates how the cost of hazard varies with the probability of the hazard for a specific place, given the expected hazards and the exposed assets. This section of the paper shows how this framework can be applied to generate a probabilistic impact curve for drought in SSA. For drought in SSA, the hazard is rainfall deficits during the growing season. 
As noted in the introduction, there are different ways of measuring this hazard, and the next section discusses the relevant data in more detail, including the time period for which historical data are available and use of the data to generate an estimate of the probability distribution of hazards in the near future. Exposure in the current case is not about physical infrastructure and its location and characteristics, but about households that may be impacted differently by rainfall deficits depending on their location and characteristics. The data on households are also described in more detail in the next section. The vulnerability function in this case is the income or welfare losses among households engaged in different livelihoods, which result from lower yields, lower demand for labor, and the increased prices that may follow. The rest of this section focuses on the estimation of the vulnerability function and how it is used to derive a probabilistic impact curve. The estimation of the vulnerability function is widely present in the microeconometric literature, even though it is not referred to as such or used for generating estimates of probabilistic impact.

2.1. Estimating a vulnerability function

Household welfare at any given time may be summarized in general by the function below:

$$W_{hjt} = f(H_{jt}, X_{hjt}, E_{hjt}) + \varepsilon_{hjt}, \qquad (1)$$

where $W_{hjt}$ denotes the measure of welfare (in the current analysis, the log of per capita expenditures of household $h$ in locality or cluster $j$ at time $t$), $H_{jt}$ is the measure of the hazard at time $t$ at the level of the geographic cluster $j$, $X_{hjt}$ denotes household observable characteristics, $E_{hjt}$ denotes an observable variable summarizing the environment of households, and $\varepsilon_{hjt}$ is an error term summarizing the influence of all unobservable factors on welfare.³ For the purposes of this study, welfare is measured by the monetary value of total consumption expenditures or food and non-food consumption.⁴

The main objective of a vulnerability function is to summarize the relationship between the level of $H_{jt}$ and the level of $W_{hjt}$. If the function $f$ were known a priori, it would be simple to plug in values for $H_{jt}$, $X_{hjt}$, and $E_{hjt}$ to determine the value of the welfare outcome $W_{hjt}$. In the absence of information on the function $f$, methods need to be employed to estimate it. This necessity gives rise to a fundamental trade-off between the prediction accuracy and the interpretability of the model used (James et al. 2013). For example, machine learning methods, such as random forests, bagging, boosting, and support vector machines, treat $f$ as a black box. None of these methods is especially concerned about the exact form and interpretability of the function $f$, provided that it yields accurate predictions for the welfare outcome. On the other hand, approximating the function $f$ with the average value of the welfare outcome $W_{hjt}$ conditional on specific values of the explanatory variables $H_{jt}$, $X_{hjt}$, and $E_{hjt}$, i.e., $f(\eta, x, e) = \mathbb{E}[W_{hjt} \mid H_{jt} = \eta, X_{hjt} = x, E_{hjt} = e]$, or the conditional expectation function (CEF), is typically easier to interpret and is useful for deeper insights into the causal relationship between hazard and welfare (Angrist and Pischke 2009). These advantages, however, may come at the expense of predictive accuracy (in-sample or out-of-sample).⁵

With these considerations in mind, we adopt the CEF approach for estimating vulnerability functions. To keep the analysis simple and tractable, the approach begins with an ad hoc specification of the controls used in the regression model, using a linear approximation to the CEF. This is then extended to a nonlinear specification for the hazard, and to allow for potential heterogeneity of the hazard’s impacts on welfare based on some easily and commonly observable household characteristics and environmental features.
For any given welfare measure $W_{hjt}$, a log-linear approximation to equation (1) above yields

$$W_{hjt} = \beta_0 + \beta_1 X_{hjt} + \beta_2 H_{jt} + \gamma_1 E_{hjt} + \mu_d + \lambda_{ct} + \varepsilon_{hjt}, \qquad (2)$$

where $\mu_d$ denotes region- or district-level fixed effects and $\lambda_{ct}$ denotes country-survey year effects. The district-level fixed effects control for fixed spatial characteristics, whether observed or unobserved, and thus disentangle the hazard measure from many possible sources of omitted variable bias. The year effects further control for any common trends and thus help ensure that the relationships of interest are identified from idiosyncratic local shocks. Thus the empirical approach is very much in line with the strong identification properties of the weather-shock approach summarized by Dell, Jones, and Olken (2014). It should be noted that with district fixed effects, $\mu_d$, included in the specification, the coefficients $\gamma_1$ can be estimated only if the values of $E_{hjt}$ vary within districts. Otherwise, the district dummies will absorb the effects of these environmental factors.

³ The elements of X can include the age and education level of the household head; the ethnicity, religion, total number, age, and gender composition of household members; ownership of assets such as livestock, phone, radio, TV, etc.; and characteristics of the household residence, including type and quality of water and sanitation facilities, main source of energy, and type of floor and material of construction for walls and roof.

⁴ Focusing on non-monetary dimensions of welfare such as the height for age z-score of children (HAZ) or the weight for height z-score (WHZ), Skoufias et al. (2023) explore an alternative specification of the vulnerability function that allows consideration of the effects of hazards through the income as well as the environmental channels. In this specification, exposure to weather-related hazards may affect the value of agricultural income earned (the income channel), as well as the health status of household members (environmental channel).

⁵ In data-constrained environments like the ones we study, complex (black-box) machine learning models are unlikely to provide the gains in accuracy that justify the loss in interpretability.

The impact of drought on consumption expenditures is the cumulative effect of a household’s crop loss and a series of entitlement failures in a local economy that are triggered by this initial loss (Sen 1981; Devereux 2007). Devereux (2007) characterizes a sequence of four entitlement failures: first, production fails as a result of the rains failing; then labor markets fail as households are less and less able to find work opportunities on other farms or in off-farm activities; then commodity markets fail as grain prices increase and prices of liquid assets decrease. Finally, transfers fail as households cannot rely on the support of others in their network who face the same constraints in meeting everyday basic needs. The formulation outlined here is agnostic as to how rainfall is impacting consumption. However, the spatial measure of drought used and the necessity of incorporating district fixed effects focus the analysis more on picking up production failures and failures in the immediate local labor and commodity markets in which the drought is measured. The impact of drought through commodity markets that span a larger geographic area or transfer networks over a larger geographic space will not be picked up.

If higher levels of $H_{jt}$ indicate rainfall sufficiency, it is anticipated that $\hat{\beta}_2 > 0$, such that lack of rainfall and droughts have a negative effect on welfare (especially when a monetary welfare measure is used, such as consumption expenditures per adult equivalent).
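As a rough illustration of how a specification like equation (2) can be estimated, the sketch below simulates household data and fits the regression by ordinary least squares with district and survey-year dummies. It is not the code that accompanies this paper. The function names `simulate_households` and `fit_fixed_effects`, the simulated parameter values, and the use of a single household control are assumptions made for the example, and environmental variables are left out for brevity.

```python
import numpy as np


def simulate_households(n_districts=30, n_years=4, n_households=25,
                        beta_hazard=0.10, beta_x=0.20, seed=0):
    """Simulate log consumption with district and year effects (illustrative values)."""
    rng = np.random.default_rng(seed)
    district = np.repeat(np.arange(n_districts), n_years * n_households)
    year = np.tile(np.repeat(np.arange(n_years), n_households), n_districts)
    # The hazard is measured at the district-year level, so all households in a
    # district share the same value within a survey year.
    hazard_by_cell = rng.normal(size=(n_districts, n_years))
    hazard = hazard_by_cell[district, year]
    x = rng.normal(size=district.size)                   # one household control
    mu = rng.normal(size=n_districts)[district]          # district fixed effects
    lam = rng.normal(scale=0.2, size=n_years)[year]      # survey-year effects
    eps = rng.normal(scale=0.3, size=district.size)      # idiosyncratic error
    w = 1.0 + beta_x * x + beta_hazard * hazard + mu + lam + eps
    return w, hazard, x, district, year


def fit_fixed_effects(w, hazard, x, district, year):
    """OLS with district and year dummies; returns (coef on x, coef on hazard)."""
    w = np.asarray(w, dtype=float)
    district = np.asarray(district)
    year = np.asarray(year)
    columns = [np.asarray(x, dtype=float), np.asarray(hazard, dtype=float)]
    # A full set of district dummies plays the role of the intercept.
    for d in np.unique(district):
        columns.append((district == d).astype(float))
    # Drop the first year to avoid perfect collinearity with the district dummies.
    for y in np.unique(year)[1:]:
        columns.append((year == y).astype(float))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, w, rcond=None)
    return coef[0], coef[1]


w, hazard, x, district, year = simulate_households()
coef_x, coef_hazard = fit_fixed_effects(w, hazard, x, district, year)
print(f"estimated hazard coefficient: {coef_hazard:.3f}")
```

Because the simulated hazard varies across years within each district, its coefficient remains identified after the district dummies are included. If the hazard took the same value for every observation in a district, the dummies would absorb it entirely.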
The extent to which hazards experienced by households in the sample are associated with sufficiently significant declines in their welfare depends in part on the degree to which there is variability in the exposure to and intensity of the shocks experienced. For example, if the value of $H_{jt}$ is the same for all households in the sample and not just for those within a given cluster, then there is no way to identify the parameter $\beta_2$. Hence, it is critical to derive estimates of $\hat{\beta}_2$ based on data that include significant variability in $H_{jt}$. This task is a particular challenge when analyzing the impact of drought because these events are spatially correlated and not localized. Increasing the number of years and the number of countries included increases the variation in $H_{jt}$. For this reason, the analysis prioritizes inclusion of countries where multiple rounds of survey data were available. In addition, by estimating this relationship at the level of the livelihood zone and not the country, the analysis increases the number of years and variation in hazards available.

Based on the general but unknown function $f$, the effect of exposure to the hazard on welfare depends on the specific values of the determinants of welfare summarized by the variables $H_{jt}$, $X_{hjt}$, and $E_{hjt}$, i.e.,

$$\frac{\partial W_{hjt}}{\partial H_{jt}} = \frac{\partial f(H_{jt} = \eta, X_{hjt} = x, E_{hjt} = e)}{\partial H_{jt}}.$$

The linear specification in regression equation (2) assumes a constant effect of the hazard on welfare (i.e., $\partial W_{hjt} / \partial H_{jt} = \beta_2$) that is independent of the level of the hazard and of specific household characteristics $X_{hjt}$ and environmental factors $E_{hjt}$. It is important to relax this assumption in two ways: first, by allowing the impact of the hazard to vary by the size of the hazard deviation; and second, by allowing the impact of the hazard to vary by characteristics of the household.

To explore the functional form of the hazard, several different approaches were adopted.
Two approaches were visual: a simple plot of welfare against hazard and a binned scatterplot of the residualized welfare against the residualized hazard. Another approach estimated linear, quadratic, and cubic hazard specifications, as well as the best fractional polynomial specification, and compared the fit and plotted residuals.⁶ A final approach explored further nonlinearities with step and piecewise linear functional forms. This involved segmenting the continuous hazard distributions into 5 (or 10) bins for use as a set of indicator variables to capture the hazard intensity. Results for linear, quadratic and cubic specifications are shown. Parsimonious specifications that best described the underlying relationship, especially in the hazard range where 90 percent of the observed events occurred, were used for the simulations.

⁶ In Stata, the binscatter2 command was used to visualize the overall relationship between hazard and welfare. The fp and fp plot commands were used to explore the fit and residuals of different polynomial specifications.

It is important to bear in mind that sensitivity tests for the specification of the vulnerability function are based on the hazards and extreme events contained within the sample used to estimate the vulnerability function. A maintained assumption is that the functional form estimated based on the observed and probably limited set of historic events is also valid for extrapolated events outside the range of the sample. As King and Zeng (2006) point out, extreme counterfactuals that are far from the range of hazards contained in the estimation sample are more likely to be model dependent and thus quite misleading. This is because different model specifications (e.g., linear versus quadratic) may yield comparable losses within the limited range of the estimation sample, while in reality the welfare losses may be very different for extreme hazard values that are beyond the range of the estimation sample.
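The comparison of functional forms described above can be sketched as follows. This is a simplified stand-in, in Python rather than Stata, for the kind of checks the text describes. It fits polynomial specifications of increasing degree, compares them with the Bayesian information criterion (a fit measure chosen here for illustration, not one named in the paper), and computes a step function from quantile bins of the hazard. The function names and simulated data are hypothetical.

```python
import numpy as np


def fit_polynomial(hazard, welfare, degree):
    """Least-squares polynomial in the hazard; returns (coefficients, BIC)."""
    hazard = np.asarray(hazard, dtype=float)
    welfare = np.asarray(welfare, dtype=float)
    coefs = np.polyfit(hazard, welfare, degree)       # highest power first
    resid = welfare - np.polyval(coefs, hazard)
    n, k = welfare.size, degree + 1
    bic = n * np.log(np.sum(resid ** 2) / n) + k * np.log(n)
    return coefs, bic


def fit_step_function(hazard, welfare, n_bins=5):
    """Mean welfare within quantile bins of the hazard (a step specification)."""
    hazard = np.asarray(hazard, dtype=float)
    welfare = np.asarray(welfare, dtype=float)
    edges = np.quantile(hazard, np.linspace(0.0, 1.0, n_bins + 1))
    # Interior edges only, so that bin indices run from 0 to n_bins - 1.
    bin_index = np.digitize(hazard, edges[1:-1])
    means = np.array([welfare[bin_index == b].mean() for b in range(n_bins)])
    return edges, means


# Simulated example: a mildly curved relationship between hazard and log welfare.
rng = np.random.default_rng(1)
hazard = rng.normal(size=2000)
welfare = 0.10 * hazard + 0.05 * hazard ** 2 + rng.normal(scale=0.1, size=2000)

for degree in (1, 2, 3):
    _, bic = fit_polynomial(hazard, welfare, degree)
    print(f"degree {degree}: BIC = {bic:.1f}")

edges, bin_means = fit_step_function(hazard, welfare, n_bins=5)
print("bin means:", np.round(bin_means, 3))
```

A lower BIC indicates a better trade-off between fit and parsimony. The bin means give a model-free check on whether the chosen polynomial tracks the data over the hazard range that is well represented in the sample.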
Some of these extrapolation concerns are addressed in section 5 of the paper.

The impact of the hazard is allowed to vary by characteristics of the household by estimating a model that includes interaction terms of the hazard with specific household characteristics and/or some or all environmental characteristics, as in equation (3):

$$W_{hjt} = \beta_0 + \beta_1 X_{hjt} + \beta_2 H_{jt} + \beta_3 (H_{jt} \times X_{hjt}) + \gamma_1 E_{hjt} + \gamma_2 (H_{jt} \times E_{hjt}) + \mu_d + \lambda_{ct} + \varepsilon_{hjt} \qquad (3)$$

In this specification, it is expected that $\hat{\beta}_3 > 0$ for a household characteristic or asset that mitigates the negative impact of the shock on welfare (e.g., irrigated land) (or $\hat{\gamma}_2 > 0$ for an environmental characteristic that mitigates the negative impact of the shock on welfare). Among a number of characteristics that were tried, two were consistently significant across specifications: land for cropping livelihood zones and tropical livestock units (TLUs) for pastoral livelihood zones. We present results by land ownership and TLUs as appropriate, but future work should include other characteristics.

Acknowledging the cost in terms of some out-of-sample predictive accuracy, we opted for the benefits of a simple (or ad hoc) specification that allows for a more systematic and transparent comparison of the differences in estimates resulting from the hazard measure used, the specification of the functional form of the vulnerability function, and the potential heterogeneity of impacts. Country-level applications building on the findings of this study would benefit from the adoption of data-driven shrinkage or regularization methods, such as Lasso, that are known to improve out-of-sample predictive accuracy.

2.2. Estimating an exceedance probability curve

A common way to communicate risk (as calculated by combining data for hazard and exposure, or value at risk, with vulnerability functions) is via exceedance probabilities (EPs) and their inverse measure, return periods.
An EP curve describes the probability that various levels of loss will be exceeded, in this case the welfare loss that will occur because of rainfall deficits of increasing severity. Another way to understand the EP is to consider its inverse measure: return period = 1/(exceedance probability). For example, a loss with an annual exceedance probability of 1 percent, which means that there is only a 1 percent chance of facing a worse loss, is equivalent to a loss with a return period of 1 in 100 years. A longer return period suggests a lower probability that the impact from an extreme hazard will occur in any single year.

For each hazard value in a given location, the loss is estimated using the vulnerability function and the characteristics of the households in the exposure data. To generate an EP curve, losses are ranked and combined with the probability of these drought events occurring. This calculation can be based on historical information (“as if” such events were to happen again today with current exposures) or many thousands of years of simulated/synthetic events. This paper focuses on the consumption losses that occur for below-average years, and it estimates an EP curve for both total consumption losses and consumption losses that push people into poverty or push those who were already poor further into poverty (i.e., that increase the poverty gap). Section 5 describes how this is done in practice and presents these curves.

Catastrophe risk modeling has developed in the insurance and reinsurance sector over the past 30 years in response to potential insolvency-inducing events, such as Hurricane Andrew in 1992 in the US. Such analytics exploit increasing geographical data that captures exposure value at risk, predominantly the built environment, as well as improved representation of hazards.
Investment in research and technology in the field has been driven by the inadequacies of the historical record, mainly insurance claims data, for extreme events, and specifically by the inability of these data to robustly capture the statistics of catastrophic loss on insured portfolios. Unlike automotive or health claims, which are numerous and lend themselves to more reliable curve fitting for frequency and severity distributions, the small number of records of low-frequency, high-severity loss-making events does not provide sufficient evidence for such actuarial analysis. While EP curves can be constructed from 20 years of observed historical loss data, they are limited to low-return-period estimation, are highly erratic and sensitive to data addition, and almost certainly underestimate loss levels, especially when considering average annual loss (Verisk 2013). Such records are rarely representative, especially in the extreme tail of the EP curve, which has few if any points to define it and is of primary importance to insurers and reinsurers. That is, history is not a good predictor of all possible future events.

Moreover, in low-income regions of the world, there is often acute data scarcity around historical evidence of natural hazards. The resulting short time series of hazards hampers the estimation of the EP curve. The increasing availability of remotely sensed weather information from satellites has helped provide global coverage for hazard-related data sets that now range from 10 years to 40 years in length. The range of variables inferred from satellite data has also expanded to include new ones such as soil moisture, as new sensors come online and provide reliable training and validation data.
A key component of catastrophe modeling is developing statistical and physics-based methodologies, which vary by hazard type, to extend limited historical catalogs of events to many thousands of years of probabilistic events in order to adequately capture frequency, extreme severities, and full geographical location. This process is computationally intensive, and great care must be taken to ensure that resultant event sets are constrained by observed physical processes; for example, definition of hazard parameter distributions of climatological characteristics must maintain spatial and temporal correlations. Model calibration and validation against historical information are a valuable check, but inevitably assumptions are required on data processing and sampling steps, ideally supported by academic reference (ABI 2011). Though the event-by-event loss calculations of catastrophe modeling are computationally intensive, especially for hazards that demand high geographic resolution (which can be less than 10 m for flood), these calculations also allow differing aspects of model uncertainty to be quantified and communicated. 7 Only such a large time series of synthetic but nevertheless realistic events can generate enough samples of losses to robustly define the remote tail of EP curves that are of interest to the insurance industry, such as 1-in-100-year and 1-in-250-year return periods. Taking such a probabilistic loss perspective also ensures the EP is smoothed out by having many thousands of data points. For the purposes of this paper, a 10,000-year stochastic catalog of synthetic simulations was generated across Malawi at a spatial resolution of 0.05° and dekadal temporal resolution for several hazard variables (including soil moisture). 
8 These 10,000 years of hazard events are all potential versions of what could happen in the next 12 months and can be combined with vulnerability and exposure information to estimate tail behavior at longer return periods and so give a fuller risk perspective. This step requires making assumptions about the relationship between hazard and impact, using historical data as a starting point. In a data-scarce environment, it can be very hard to validate the performance of the vulnerability function in the tails. This paper is a starting point, and more investigation is required.

As noted above, these 10,000 years of hazard events reflect events that are likely to occur under current (and near-current) climate conditions, rather than forecasting climate conditions 100 years (for example) in the future. In this way catastrophe models differ significantly from general circulation models/global climate models (GCMs), as they have been built to answer different questions about risk.

7 LMA (Lloyd’s Market Association), “Understanding Uncertainty in Catastrophe Modelling for Non-Catastrophe Modellers,” https://www.lmalloyds.com/lma/readfile.aspx?iDocumentStorageKey=cc44f6be-b83f-4cf9-903c-e802c1f312a8&iFileTypeCode=PDF&iFileName=Understanding%20uncertainty%20in%20cat%20modelling%20for%20non-cat%20modellers.

8 Given the computational requirements to generate such a stochastic catalog, this exercise was completed for only one country.

Such simulated hazard event sets focus upon a primary damage discriminator, such as wind speed for storm or ground shaking for earthquake, which is subsequently translated into impact on exposure through damage functions describing vulnerability. A criticism of catastrophe models is that they are not forward looking regarding climate change.
9 Since the consequences of climate change are already present, however, any recent historical record for atmospheric-related natural hazards contains a climate change signal, alongside other potential modalities, such as the El Niño–Southern Oscillation (ENSO). Model sensitivity to these climate aspects can be explored in a catastrophe modeling environment and considered as another dimension of uncertainty. In recent years, “climate conditioning” of catastrophe models for differing time horizons and emissions scenarios (such as in Bates [2023]) has received much attention and is an active research frontier, attracting many new entrants to natural hazard risk modeling (see Fielder et al. 2021). Model adjustment can range from a simple shifting of event frequencies in stochastic hazard tables to much fuller incorporation of and alignment with GCM approaches.

It must be remembered that there may be considerable uncertainty around future climate patterns (such as rainfall across the Sahel), and it may be challenging to adequately parameterize them without suggesting inappropriate levels of exactitude. Scenario modeling is one typical approach under high uncertainty. Additionally, exposure and vulnerability components of risk (such as urbanization, informal settlements, and flooding) can change much faster than climate trends. Since this work increasingly falls under financial regulation and disclosure regimes (e.g., NGFS 2022), explicit climate modeling for catastrophic risks will see much growth in the near future, with translation of approaches to low-income regions through use of underlying global data sets.

This paper’s analysis focuses on past and current weather patterns and does not include an estimation of future changes due to global warming.

3. Data on Hazards and Welfare

3.1.
Hazard data

For the analysis described in this paper, we constructed a database of historical hazard data for all SSA using remotely sensed satellite data and indices that were reputable and publicly available and that had high levels of spatial and historical coverage. The considered hazard variables include rainfall, vegetation cover, evapotranspiration, soil moisture, and the WRSI. Given there is no single definition of drought, these hazard data were used to develop different measures of climatic conditions as a proxy for drought. These measures range from simple ones based on the amount and timing of precipitation during the growing season in a defined area, to more complex composite measures combining precipitation with other information capturing the impact of increased temperatures on water availability and agricultural stress at a location. Appendix B provides more details on all the drought measures considered and the sources of the data. 10 Table 3.1 provides a brief overview of the measures used in the analysis. These measures by and large focus on the first three months of the growing season.

9 See for example Purdy (2021).

10 A workshop was held at the beginning of the work described in this paper to ensure the right measures were included (in addition to those summarized in appendix B).

Table 3.1. Summary of range of hazard measures derived from remotely sensed data

| Data type | Source | Hazard measures |
|---|---|---|
| Rainfall | CHIRPS global rainfall data set from US Geological Survey (NASA) | Standardized anomaly of cumulative rainfall (based on the first three months of the growing season) |
| Vegetation (greenness) | NDVI product of MODIS vegetation indices | NDVI standardized anomaly (based on the average over the first three months of the growing season) |
| Evapotranspiration | SPEI generated by the Climatic Research Unit of the University of East Anglia | Level of three-month SPEI at the end of the first three months of the growing season |
| Soil moisture | Soil Water Index from Copernicus Global Land Service | Soil moisture standardized anomaly (based on the average over the first three months of the growing season) |
| Availability of water for crops | FEWS NET | End of season WRSI as anomaly (as a ratio of the WRSI to the median historical WRSI) |

Source: World Bank.
Note: CHIRPS = Climate Hazards Group InfraRed Precipitation with Station; FEWS NET = Famine Early Warning Systems Network; MODIS = Terra Moderate Resolution Imaging Spectroradiometer; NASA = National Aeronautics and Space Administration; NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; WRSI = Water Requirement Satisfaction Index.

The historical coverage of the data differs. In general, the analysis uses data from the period 2000/01 to 2019/20 at dekadal or monthly intervals to reflect the shocks faced within the context of a changing climate. For example, the NDVI data utilized are a product of the MODIS (Terra Moderate Resolution Imaging Spectroradiometer) satellites and are available for February 2000 and after. However, the precipitation data are from the CHIRPS (Climate Hazards Group InfraRed Precipitation with Station) global rainfall data set and are available from 1981. Only the soil moisture data cover a shorter period, starting in 2007. Despite the limited history, however, soil moisture data are rapidly proving useful in estimating drought conditions, and they have a memory effect that rainfall data lack.

The hazard data were extracted at a uniform grid resolution of 0.05° by 0.05° (about 5 km by 5 km near the equator) and a crop coverage mask (Copernicus Global Land Cover Layers, 2019) was applied to exclude hazard values in areas that would not affect crop yields. Data on seasonal timing were taken from the UN Food and Agriculture Organization (FAO) for each country.
To aggregate the hazard data to an appropriate geographic unit for merging with the welfare data, additional data were incorporated in the data set, including country and administrative area boundaries from the World Bank Group, Global Administrative Areas (GADM), and Global Administrative Unit Layers (GAUL) of the FAO.

The standardized anomaly measure is defined as (level – mean for local area)/standard deviation for the local area, where the mean and standard deviation for the local area are calculated based on the 20 years of hazard data, or the maximum data available in the case of soil moisture. The local area is defined by aggregating the data for a 20 km buffer of the household location or for the administrative area if the former is not available.

The unconditional correlation between different hazard measures is not particularly high. Table 3.2 presents results for each livelihood zone used in the analysis. The independent variation across these hazard measures shows that they capture different atmospheric conditions, which explains the difficulty of identifying a single measure that captures drought events well. But as the analysis will show, in a regression framework the results are more consistent across most hazard measures (with some exceptions). This suggests that despite capturing different atmospheric conditions, the hazard measures are reasonably effective at capturing situations that make households worse off.

Table 3.2.
Unconditional correlation between different hazard measures by livelihood zone

| Livelihood zone | Measure | Total seasonal rainfall | Soil moisture | SPEI | NDVI |
|---|---|---|---|---|---|
| Maize | Soil moisture | 0.43 | | | |
| Maize | SPEI | 0.47 | 0.26 | | |
| Maize | NDVI | 0.48 | 0.63 | 0.32 | |
| Maize | WRSI | 0.21 | 0.13 | 0.12 | 0.16 |
| Highland | Soil moisture | 0.49 | | | |
| Highland | SPEI | 0.44 | 0.28 | | |
| Highland | NDVI | 0.32 | 0.51 | 0.18 | |
| Highland | WRSI | 0.22 | 0.15 | 0.18 | 0.11 |
| Roots and lowlands | Soil moisture | 0.00 | | | |
| Roots and lowlands | SPEI | 0.30 | 0.12 | | |
| Roots and lowlands | NDVI | 0.22 | 0.43 | 0.07 | |
| Roots and lowlands | WRSI | 0.15 | -0.11 | 0.01 | 0.02 |
| Pastoral | Soil moisture | 0.46 | | | |
| Pastoral | SPEI | 0.45 | 0.23 | | |
| Pastoral | NDVI | 0.57 | 0.54 | 0.28 | |
| Pastoral | WRSI | 0.29 | 0.09 | 0.24 | 0.26 |
| Agropastoral | Soil moisture | 0.32 | | | |
| Agropastoral | SPEI | 0.51 | 0.39 | | |
| Agropastoral | NDVI | 0.33 | 0.60 | 0.31 | |
| Agropastoral | WRSI | 0.26 | -0.11 | 0.17 | 0.10 |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; WRSI = Water Requirement Satisfaction Index.

3.2. Hazard simulations data

For the analysis, we asked Verisk’s AIR Worldwide (AIR) to produce 10,000-year drought-related stochastic catalogs of synthetic simulations for Malawi at a spatial resolution of 0.05° and dekadal temporal resolution. Stochastic catalogs are designed to provide a view of hazard that goes beyond the data available in the historical record. AIR used historical data on rainfall to produce three such catalogs: for precipitation, soil moisture, and vegetation index.

To generate the stochastic catalogs, AIR used data similar to that used by the World Bank team in generating the historical hazard time series. CHIRPS data were used for precipitation; the Soil Water Index at a depth of 40 cm from the Copernicus Global Land Service was used for soil moisture; and the NDVI for vegetation cover was from MODIS. The resulting catalogs were validated by comparing the synthetic time series with the historical data in Malawi.
The validation results show that the catalogs preserve the intrinsic properties in terms of spatial and temporal correlations, produce a realistic view of hazard, and are consistent with historical observations while allowing for the creation of new extremes. For the purposes of this paper, the approach was tested using the stochastic catalog for soil moisture in Malawi. Further details on the methodology used to generate the 10,000-year catalog and the validation conducted are in appendix C.

3.3. Welfare data

The main welfare measures examined are constructed from household consumption surveys. To provide a comparable measure of welfare across countries, we use the aggregate that is used for national poverty measures and that forms the basis for global poverty measurement. Consumption aggregates are converted into 2011 purchasing power parities (PPPs) for this analysis. 11

We selected nine countries representing geographic variation in SSA for the analysis of the relationship between droughts and monetary welfare (consumption). They include Ethiopia in Eastern Africa; Mauritania, Niger, and Nigeria in Western Africa; and Lesotho, Malawi, Mozambique, Zambia, and Zimbabwe in Southern Africa. For five of these countries data are available for multiple years. The countries, years, and outcome measures used in the analysis are listed in Table 3.3.

To allow for ease of matching rainfall patterns to consumption in the first analysis, the selected countries are all from parts of the continent where there is a clear main season of production. Selection was also designed to include countries that had multiple survey rounds that could be pooled for analysis and that would provide sufficient numbers for analysis across livelihood zones. The empirical analysis is carried out by livelihood zone, which entails pooling these countries and estimating over like areas of production within them. The zones used are those defined in Dixon et al. (2019).
They allow for the required level of analysis. The authors provide gridded data for the livelihood zones estimated for 2015, which is within our analysis period. We further aggregated the zones estimated for 2015 into five zones. Table 3.4 indicates how the countries included match to the livelihood zones used in the analysis. Table 3.5 reports descriptive statistics for the variables used in the analysis by country and survey year.

Table 3.3. Household surveys and welfare measures included in the analysis

| Country (years) | Welfare measure |
|---|---|
| Ethiopia (2010/11, 2014/15) | Total expenditure per adult equivalent; Food expenditure per adult equivalent; Nonfood expenditure per adult equivalent |
| Lesotho (2017/18) | Total expenditure per capita |
| Malawi (2010/11, 2016/17, 2019/20) | Total expenditure per capita; Total food expenditure per capita; Total nonfood expenditure per capita; Household dietary diversity: based on 12 food categories; Household food consumption score: based on food types consumed and frequency of consumption in a seven-day period (maximum 120) |
| Mauritania (2014, 2019/20) | Total expenditure per capita |
| Mozambique (2014) | Total expenditure per capita; Total food expenditure per capita; Total nonfood expenditure per capita |
| Niger (2011, 2014, 2018) | Total expenditure per capita; Total food expenditure per capita; Total nonfood expenditure per capita |
| Nigeria (2011, 2013, 2016) | Total expenditure per capita |
| Zambia (2015) | Total expenditure per capita |
| Zimbabwe (2017) | Total expenditure per capita |

Source: World Bank.

11 In most countries, the aggregates take spatial variation in prices into account and where this is done, this is retained. In addition, if the consumption aggregate used for national poverty measurement uses per adult equivalent instead of per capita, this is also retained. This makes the aggregate different from the ones reported in the World Bank’s Poverty and Inequality Platform.

Table 3.4.
Matching of household surveys to livelihood zones

| Livelihood zone | Ethiopia | Lesotho | Malawi | Mauritania | Mozambique | Niger | Nigeria | Zambia | Zimbabwe | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Maize mixed | 2,913 | 377 | 26,172 | 0 | 2,610 | 0 | 0 | 5,580 | 12,614 | 50,266 |
| Highland | 10,651 | 2,444 | 0 | 0 | 227 | 0 | 29 | 0 | 1,695 | 15,046 |
| Roots and lowland | 0 | 0 | 0 | 0 | 0 | 0 | 5,705 | 0 | 0 | 5,705 |
| Agropastoral | 0 | 0 | 0 | 0 | 0 | 2,315 | 2,278 | 0 | 0 | 4,593 |
| Pastoral | 0 | 0 | 0 | 11,463 | 0 | 2,178 | 0 | 0 | 0 | 13,641 |

Source: World Bank.
Note: Livelihood zones are generated from Dixon et al. (2019). Agropastoral and pastoral regions in Ethiopia are not included in the analysis. Pastoral includes pastoral and arid pastoral oasis farming systems. Highland includes highland perennial and highland mixed farming systems. Roots and lowland combines root and tuber crop, cereal root crop, and humid lowland tree crop farming systems.

Table 3.5. Descriptive statistics for variables by year and country

| Year | Country | Average per capita consumption (a) (2011 US$ PPP) | Share of households with land ownership (b) | Share of households with educated head (b) | Share of households with female head | Share of households interviewed during the lean season | Share of households in locations considered remote |
|---|---|---|---|---|---|---|---|
| 2011 | Ethiopia | 950 | 0.98 | 0.37 | 0.15 | 0.34 | 0.68 |
| 2016 | Ethiopia | 1,227 | 0.96 | 0.38 | 0.17 | 0.35 | 0.69 |
| 2017 | Lesotho | 1,281 | 0.79 | 0.16 | 0.39 | 0.48 | 0.44 |
| 2010 | Malawi | 590 | 0.94 | 0.45 | 0.20 | 0.42 | 0.21 |
| 2016 | Malawi | 543 | 0.90 | 0.47 | 0.27 | 0.58 | 0.19 |
| 2019 | Malawi | 499 | 0.90 | 0.50 | 0.29 | 0.42 | 0.22 |
| 2014 | Mauritania | 1,782 | 0.44 | 0.36 | 0.27 | 0.96 | 0.54 |
| 2019 | Mauritania | 1,755 | 0.42 | 0.36 | 0.33 | 0.00 | 0.55 |
| 2014 | Mozambique | 626 | 0.96 | 0.49 | 0.22 | 0.12 | 0.67 |
| 2011 | Niger | 672 | 1.00 | 0.05 | 0.04 | 0.50 | 0.16 |
| 2014 | Niger | 715 | 0.99 | 0.06 | 0.08 | 0.19 | 0.14 |
| 2018 | Niger | 869 | 0.88 | 0.06 | 0.13 | 0.42 | 0.16 |
| 2011 | Nigeria | 738 | 0.82 | 0.46 | 0.08 | 0.55 | 0.19 |
| 2013 | Nigeria | 873 | 0.84 | 0.43 | 0.08 | 0.54 | 0.19 |
| 2016 | Nigeria | 859 | 0.84 | 0.46 | 0.06 | 0.54 | 0.19 |
| 2015 | Zambia | 486 | 0.89 | 0.65 | 0.18 | 0.67 | 0.72 |
| 2017 | Zimbabwe | 950 | 0.68 | 0.47 | 0.34 | 0.40 | 0.69 |

Source: World Bank.
Notes: a. Or per adult equivalent consumption, depending on the country. b.
See appendix B, table B.2 for country-specific definitions used.

3.4. Merging hazard and welfare data

Linking the data on droughts to household welfare outcomes is a key step in assessing the poverty impact of drought risk. Hazard measures are available at a daily frequency over the period 2000/01–2019/20, whereas household surveys typically collect welfare information for the period around the date of the survey (food consumption during last seven days, height for age of child at the time of the survey, etc.) at four- or five-year intervals. This difference requires choices regarding the period over which the hazard measure is used in relation to the date of the survey and welfare measure.

An additional consideration is whether the hazard experienced should be directly associated with the household’s geographic location, or associated with a broader administrative or geographic area such as the district or the province. Relating welfare to hazards experienced at a fine geographic level, ideally the household level, allows the impact on agricultural production, water, and disease prevalence to be reflected while reducing measurement error for the experienced hazard. On the other hand, measuring rainfall shock for the larger geographic area could pick up some impacts of rainfall on agricultural prices and wages that would not be picked up by a very localized shock measure. Frequently there is a limit on what can be selected for analysis, as GPS coordinates at the household or the cluster/village level might not be collected (or made publicly available for reasons of confidentiality). Moreover, when the level of spatial aggregation within a country is very high, there is less variation in shock measures, which reduces the statistical power available for identifying the impact of the shock on welfare. A more local shock measure focuses the analysis on the income impact and on consumption impacts in the immediate market only.
If GPS coordinates at the household or cluster level are available, there is a choice regarding the spatial level of aggregation of the drought measure. For example, the value of the hazard may be merged directly at the household or cluster level, or the household’s geographic location may be matched with a spatially aggregated hazard measure such as an average of the hazard measure within a 10 km, 20 km, or 50 km radius from the household’s location. Figure 3.1 illustrates this approach: the squares represent the 5 km by 5 km grids for which the hazard measures used in this analysis are available; the centroid of the square in the center of the figure (the grid with the star) represents the location of the household or the cluster; and the shaded squares represent the grids used to spatially aggregate the hazard measure for a 10 km radius (darker shade) and a 20 km radius (lighter shade) around the household’s location. 12

12 Specifically, for any given radius the spatial aggregation includes grid cells with at least half of the grid cell within the radius, and grid cells whose centroid is within the radius.

Figure 3.1. Aggregation of hazards in grids within a 10 km or 20 km radius

Source: World Bank.
Note: Squares represent the 5 km by 5 km grids for which the hazard measures are available; the (starred) centroid of the square represents the household or cluster location; shaded squares represent the grids used to spatially aggregate the hazard measure for a 10 km radius (darker shade) and a 20 km radius (lighter shade) around the household’s location.

The results presented in this paper are based on matching a household’s geographic location with a spatially aggregated hazard measure within a 20 km radius. In the absence of any GPS data at the household or cluster level, the hazard measures are aggregated to the geographic or closest administrative level that is coded in the household survey.
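The radius-based aggregation described above can be sketched in a few lines of Python. This is an illustrative simplification, not the procedure used in this paper: it applies only the centroid rule (a grid cell is included when its centroid lies within the radius) and omits the additional rule that admits cells with at least half of their area inside the radius. The function names, the great-circle distance formula, and the example values are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for the distance calculation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_hazard_within_radius(household, cells, radius_km=20.0):
    """Average the hazard over grid cells whose centroid is within radius_km.

    household: (lat, lon) of the household or cluster.
    cells: iterable of (centroid_lat, centroid_lon, hazard_value).
    """
    lat0, lon0 = household
    values = [
        value
        for lat, lon, value in cells
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
    if not values:
        raise ValueError("no grid cell centroid falls within the radius")
    return sum(values) / len(values)


# Hypothetical cells near the equator: 0.1 degrees of longitude is roughly 11 km.
example_cells = [(0.0, 0.0, 1.0), (0.0, 0.1, 3.0), (0.0, 0.3, 100.0)]
print(mean_hazard_within_radius((0.0, 0.0), example_cells, radius_km=20.0))
```

Changing `radius_km` to 10 or 50 in the call reproduces the kind of sensitivity check across radii that the text discusses.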
We chose the level of spatial aggregation so as to provide a roughly consistent area of spatial aggregation for surveys where matching was done using GPS data and administrative data. Initial analysis suggested that this level of spatial aggregation was appropriate. We compared the sensitivity of the relationship between welfare and hazard measures aggregated using different radii in the case of Malawi, a country that has GPS coordinates for surveyed households and multiple years of data. The statistical significance of results increased for radii of up to 20 km and then started to decrease again with larger radii, suggesting that a radius of 20 km is able to reflect the local shock to a household’s crop yields and is also large enough to capture some localized impacts on prices and wages in the local market area. However, as shown in section 4.2, for the full data set, which covers a larger geographic area, it is not clear that 20 km works better in every setting, and there is very little difference between the results across aggregating radii.

3.5. Using standardized anomalies for analysis

The hazard measures generated are in levels as well as standardized anomalies (z-scores). 13 The latter assume that differences in the level of the hazard from the historic average (i.e., level changes) matter not in an absolute sense but in proportion to an area’s usual variation. Welfare measures are only partly affected by the relation between hazards and crop yields and are the outcome of various ex ante and ex post risk-coping strategies developed by households over the years for the purpose of reducing income fluctuations and/or protecting their level of welfare (e.g., consumption). Hence the extent to which the experienced hazard deviates from what households expect (i.e., the historic mean value of the hazard) appears to be a better specification than the actual level of the hazard.
Thus it is anticipated that the specification of shocks as standardized anomalies (in standard deviation units or z-scores) is likely to yield more sensible welfare impact estimates than those obtained using the level of the hazard (i.e., rainfall in millimeters). In the case of droughts, a positive anomaly means the hazard experienced was less severe than the historical mean, and a negative anomaly means it was more severe.

13 Some hazard measures such as the WRSI cannot be expressed as a standardized anomaly because by construction they are top coded (for example, the top value of WRSI is 100 percent). However, WRSI can be expressed as an anomaly (i.e., the ratio of current value with the mean or median).

When an analysis that specifies the hazard as a level includes enough years of welfare data, controlling for fixed effects at some spatial level (e.g., district level) transforms the hazard and all other control variables into deviations from the mean (Dell, Jones, and Olken 2014). In cases such as ours, where the sample of welfare data contains a limited number of observations over time, this is not the case, and it is preferable to use a full time series of hazard data prior to regression analysis to generate a standardized anomaly. 14

Another advantage offered by the standardization of the deviation from the local mean is that it offers the opportunity to gauge the magnitude of the shock experienced across a variety of geographic contexts. We use z-scores for the hazard anomalies, which means that in addition to being differenced from the mean, the hazard variable is scaled by the standard deviation. This approach assumes that changes in the hazard matter not in an absolute sense, but in proportion to an area’s usual variation. This assumption is useful in taking some production and livelihood differences into account across space. A 3 mm decrease in rain does not have the same impact in wet and dry regions.
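The point about wet and dry regions can be made concrete with a short Python sketch of the standardized anomaly, (level – local mean)/local standard deviation. The rainfall histories below are hypothetical, the function name is introduced here for illustration, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the choice is not specified in the text.

```python
from statistics import mean, stdev


def standardized_anomaly(level, history):
    """Return the z-score of `level` relative to the local historical series."""
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return (level - mean(history)) / stdev(history)


# Hypothetical seasonal rainfall histories in millimeters.
wet_history = [900.0, 1000.0, 1100.0, 950.0, 1050.0]  # mean 1000 mm
dry_history = [90.0, 100.0, 110.0, 95.0, 105.0]        # mean 100 mm

# The same 3 mm shortfall relative to each local mean.
z_wet = standardized_anomaly(mean(wet_history) - 3.0, wet_history)
z_dry = standardized_anomaly(mean(dry_history) - 3.0, dry_history)

print(f"wet region z-score: {z_wet:.3f}")  # about -0.04
print(f"dry region z-score: {z_dry:.3f}")  # about -0.38
```

In this illustration the identical absolute deficit is ten times larger in standard deviation units in the dry region, which is the sense in which the z-score expresses a shock in proportion to an area's usual variation.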
Considering variations relative to an area’s usual variation is more important when there is no control for this in the hazard measure being used (e.g., total rainfall, soil moisture). It is arguably less important when using an outcome measure that already takes crop choice into account, such as NDVI or WRSI.

Table 3.6 presents the means and standard deviations for the hazard variables used in the analysis. The means indicate lower than average years for many of the hazards across the livelihood zones. The standard deviations highlight substantial variation around these means. It is this variation that the analysis exploits.

14 Consider, for example, a cross-sectional household data set with three survey rounds and an identifier for the same districts (or clusters, though that is less likely) in each round. Using fixed effects for districts (clusters) transforms all the variables as deviations from the sample district (cluster) mean over all the household observations across the years in the same district (cluster) within the sample. With three survey rounds over the 20-year period analyzed, the district-specific sample mean of any given hazard measured is likely to differ significantly from the district mean estimated over a 20-year period. As a consequence, estimates of the welfare impacts of shocks are likely to differ significantly depending on whether the analysis uses the specification of the hazard as a level along with the fixed effects method or it uses the specification of the hazard as a standardized anomaly from the local normal (over the 20-year period). In fact, in the extreme case in which only one cross-sectional round of survey data is available, the specification of the hazard as a standardized anomaly is the only sensible option, since in this instance, use of the level of hazard with the fixed effects for districts (for example) would simply express the hazard as a deviation from the district mean of the hazard values experienced in the same survey round by other households in the survey.

Table 3.6. Mean and standard deviation of hazards used in the analysis

| Hazard variable | Maize | Highland | Roots | Pastoral | Agropastoral |
|---|---|---|---|---|---|
| Growing season | | | | | |
| Soil moisture | -0.312 (0.929) | -0.322 (1.126) | -0.293 (0.769) | 0.098 (1.086) | -0.359 (1.000) |
| NDVI | -0.284 (0.949) | -0.349 (1.038) | -0.472 (0.923) | 0.334 (1.046) | 0.058 (0.955) |
| SPEI | 0.111 (0.721) | -0.345 (0.820) | -0.600 (0.850) | 0.367 (0.834) | 0.067 (1.183) |
| Rainfall | 0.035 (1.135) | -0.378 (1.170) | -0.291 (0.909) | 0.417 (0.980) | -0.002 (1.183) |
| WRSI | 99.050 (9.785) | 98.860 (9.789) | 99.339 (2.118) | 101.488 (11.026) | 100.866 (10.109) |
| Rainy season (used in pastoral and agropastoral specifications) | | | | | |
| Soil moisture | | | | 0.693 (0.497) | 0.610 (0.173) |
| NDVI | | | | 0.396 (1.038) | 0.161 (0.852) |
| Rainfall | | | | 0.205 (0.942) | 0.334 (0.826) |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; WRSI = Water Requirement Satisfaction Index. Countries and surveys included are: Ethiopia (2011, 2016), Lesotho (2017), Malawi (2010, 2016, 2019), Mauritania (2014, 2019/20), Mozambique (2014), Niger (2018), Nigeria (2011, 2013, 2016), Zambia (2015), and Zimbabwe (2017).

4. Results

4.1. Main results

Tables 4.1 to 4.3 show results for continuous measures of hazard anomalies for the three cropping livelihood zones of maize, highland, and roots and lowland. Each table shows results for linear, quadratic, and cubic measures of soil moisture, vegetation (NDVI), evapotranspiration (SPEI), total seasonal rainfall, and WRSI; notes to each table indicate which countries and years are included. The impact of the hazard is estimated and presented separately for households with and without land to reflect the difference in the degree to which households are likely to be impacted by the hazard.
15 There are three important conclusions from these results. First, not all measures perform equally well at capturing the occurrence of the shock, judged by the extent to which welfare is positively and significantly correlated with good weather conditions and negatively affected by weather shocks for households with land. Of the five measures, soil moisture and NDVI appear to give more precise and consistent results across specifications, with soil moisture offering the most consistent results across livelihood zones. SPEI, total seasonal rainfall, and WRSI provide much less consistent results. Total seasonal rainfall often provides counterintuitive results, as does SPEI in some specifications. WRSI is often not significant.

Second, although nonlinearities are known to be present in the relationship between weather and agricultural production, we do not observe a consistent nonlinear pattern across regression results. This could simply reflect the fact that we are looking at the relationship between weather and consumption rather than agricultural production, where nonlinear effects are well established (e.g., Lobell, Schlenker, and Costa-Roberts 2011). It could also reflect the fact that nonlinearities are already part of the index construction in some cases. For example, WRSI weights losses in rainfall differently based on when they occur, and NDVI, by capturing vegetation, should already capture the nonlinear impact of weather conditions on yields. Another reason for not observing a consistent nonlinear pattern could be that we have few household survey observations at extreme values of the anomaly, which is where observations would be needed to pick up nonlinear effects. In the maize livelihood zone some nonlinearity is present: there is a stronger relationship between hazard anomalies and welfare in the middle of the distribution (where anomalies and therefore welfare changes are smaller) and a smaller impact of anomalies on welfare at the tails of the distribution.
A steeper relationship is also observed in the quadratic and cubic specifications for the roots and lowland livelihood zone, but with no tapering off for negative values (Figures 4.1 and 4.2). Despite being significant in this case, the squared and cubic terms do not indicate much nonlinearity in the relationship. Nonlinearities are much more important when using the total rainfall measure; this makes sense because this measure does not reflect any nonlinear relationship between rainfall and crop production in how it is constructed. In these cases, the nonlinearity indicates a larger impact on welfare as the anomaly becomes bigger and more negative, and a smaller impact on welfare as the anomaly becomes more positive, even turning negative for very large values of total rainfall that may reflect waterlogging and flooding.

Third, for soil moisture, negative impacts are felt by both landed and landless households, with no consistent pattern of who is most affected. For the highland zone, and for the NDVI measure in the roots and lowland livelihood zone, there is a strong and significant impact on landed households but little impact on landless households. Overall, though, the relatively similar impact in many specifications suggests that using weather variation within a 20 km radius captures the impact of the weather on immediate labor markets.

[15] The term “hazard” as used here includes the full distribution of the weather anomaly used, that is, positive and negative events, not just events that cause welfare losses according to the definition of hazard.

Table 4.1.
Results for maize livelihood zone

| | Soil moisture (1) | Soil moisture (2) | Soil moisture (3) | NDVI (4) | NDVI (5) | NDVI (6) | SPEI (7) | SPEI (8) | SPEI (9) | Total seasonal rainfall (10) | Total seasonal rainfall (11) | Total seasonal rainfall (12) | WRSI (13) | WRSI (14) | WRSI (15) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Landed households | | | | | | | | | | | | | | | |
| Hazard | 0.055*** | 0.059*** | 0.119*** | 0.024** | 0.034*** | 0.071*** | 0.019 | 0.019 | 0.062*** | -0.019 | -0.017 | 0.012 | 0.001 | 0.012 | -0.015 |
| | [3.852] | [4.097] | [4.792] | [2.122] | [2.755] | [4.120] | [1.616] | [1.547] | [2.843] | [-1.570] | [-1.407] | [0.701] | [0.685] | [1.169] | [-0.433] |
| Hazard² | | 0.022* | 0.011 | | 0.020*** | -0.001 | | 0.003 | -0.004 | | -0.022*** | -0.015** | | -0.000 | 0.000 |
| | | [1.831] | [0.839] | | [2.807] | [-0.082] | | [0.243] | [-0.250] | | [-3.021] | [-1.964] | | [-1.162] | [0.653] |
| Hazard³ | | | -0.028*** | | | -0.021*** | | | -0.021*** | | | -0.011** | | | -0.000 |
| | | | [-2.954] | | | [-3.814] | | | [-2.600] | | | [-2.205] | | | [-0.863] |
| Landless households | | | | | | | | | | | | | | | |
| Hazard | 0.043** | 0.038* | 0.032 | 0.013 | 0.030* | 0.077** | -0.028** | -0.041** | 0.071 | -0.045*** | -0.042*** | -0.075** | -0.003*** | 0.000 | -0.016 |
| | [2.239] | [1.844] | [0.875] | [0.801] | [1.696] | [2.441] | [-2.006] | [-2.249] | [1.550] | [-3.161] | [-3.128] | [-2.526] | [-3.064] | [0.056] | [-0.422] |
| Hazard² | | 0.002 | 0.004 | | 0.029** | 0.005 | | -0.022 | -0.039** | | -0.033** | -0.039** | | -0.000 | 0.000 |
| | | [0.101] | [0.181] | | [2.189] | [0.347] | | [-1.305] | [-2.213] | | [-2.114] | [-2.571] | | [-0.444] | [0.374] |
| Hazard³ | | | 0.007 | | | -0.023** | | | -0.041*** | | | 0.012 | | | -0.000 |
| | | | [0.399] | | | [-2.352] | | | [-2.917] | | | [1.215] | | | [-0.406] |
| Obs. | 50,266 | 50,266 | 50,266 | 50,242 | 50,242 | 50,242 | 50,266 | 50,266 | 50,266 | 50,266 | 50,266 | 50,266 | 48,506 | 48,506 | 48,506 |
| R-sq | 0.362 | 0.363 | 0.364 | 0.358 | 0.358 | 0.359 | 0.361 | 0.361 | 0.361 | 0.361 | 0.362 | 0.362 | 0.370 | 0.371 | 0.371 |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; WRSI = Water Requirement Satisfaction Index. Results are from a weighted regression such that each household is assigned its survey-specific population weight. Countries and years included are Ethiopia (2011, 2016), Lesotho (2017), Malawi (2010, 2016, 2019), Mozambique (2014), Zambia (2015), and Zimbabwe (2017). Dependent variable is the log of per capita (or adult equivalent) consumption.
Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, agroecological zone indicators, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects for district (Ethiopia, Lesotho, Malawi) and province (Mozambique, Zambia, Zimbabwe). Robust t-statistics are shown in brackets. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent. When a variable is significant, the coefficient and standard error are in bold font.

Table 4.2. Results for highland livelihood zone

| | Soil moisture (1) | Soil moisture (2) | Soil moisture (3) | NDVI (4) | NDVI (5) | NDVI (6) | SPEI (7) | SPEI (8) | SPEI (9) | Total seasonal rainfall (10) | Total seasonal rainfall (11) | Total seasonal rainfall (12) | WRSI (13) | WRSI (14) | WRSI (15) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Landed households | | | | | | | | | | | | | | | |
| Hazard | 0.045*** | 0.044*** | 0.044* | 0.020* | 0.017 | 0.025 | 0.041** | 0.001 | -0.049* | 0.030** | 0.017 | 0.031 | 0.003* | 0.017 | 0.065 |
| | [4.495] | [4.350] | [1.653] | [1.649] | [1.154] | [1.382] | [2.426] | [0.060] | [-1.649] | [2.575] | [1.172] | [1.307] | [1.782] | [1.332] | [0.902] |
| Hazard² | | -0.004 | -0.004 | | -0.003 | -0.013 | | -0.055*** | -0.041** | | -0.017* | -0.021* | | -0.000 | -0.001 |
| | | [-0.427] | [-0.333] | | [-0.338] | [-0.917] | | [-3.940] | [-2.496] | | [-1.895] | [-1.813] | | [-1.147] | [-0.783] |
| Hazard³ | | | -0.000 | | | -0.005 | | | 0.024** | | | -0.005 | | | 0.000 |
| | | | [-0.007] | | | [-0.853] | | | [2.153] | | | [-0.694] | | | [0.699] |
| Landless households | | | | | | | | | | | | | | | |
| Hazard | -0.005 | -0.006 | -0.030 | -0.002 | -0.027 | -0.026 | 0.020 | -0.011 | 0.031 | 0.017 | 0.016 | 0.042 | 0.002 | 0.021 | 0.111 |
| | [-0.214] | [-0.231] | [-0.533] | [-0.063] | [-0.565] | [-0.556] | [0.765] | [-0.294] | [0.600] | [0.747] | [0.622] | [0.876] | [1.046] | [1.368] | [1.577] |
| Hazard² | | -0.002 | -0.000 | | -0.023 | -0.024 | | -0.031 | -0.111** | | 0.001 | -0.009 | | -0.000 | -0.001 |
| | | [-0.100] | [-0.002] | | [-0.852] | [-0.496] | | [-1.080] | [-2.233] | | [0.062] | [-0.358] | | [-1.303] | [-1.425] |
| Hazard³ | | | 0.010 | | | -0.001 | | | -0.051* | | | -0.009 | | | 0.000 |
| | | | [0.523] | | | [-0.045] | | | [-1.746] | | | [-0.625] | | | [1.308] |
| Obs. | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 15,046 | 12,159 | 12,159 | 12,159 |
| R-sq | 0.265 | 0.265 | 0.265 | 0.260 | 0.260 | 0.261 | 0.261 | 0.266 | 0.268 | 0.262 | 0.263 | 0.263 | 0.258 | 0.259 | 0.259 |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; WRSI = Water Requirement Satisfaction Index. Results are from a weighted regression such that each household is assigned its survey-specific population weight. Countries and years included are Ethiopia (2011, 2016), Lesotho (2017), Mozambique (2014), and Zimbabwe (2017). Dependent variable is the log of per capita (or adult equivalent) consumption. Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects for district (Ethiopia, Lesotho) and province (Mozambique, Zimbabwe). Robust t-statistics are shown in brackets. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent. When a variable is significant, the coefficient and standard error are in bold font.

Table 4.3.
Results for roots and lowland livelihood zone

| | Soil moisture (1) | Soil moisture (2) | Soil moisture (3) | NDVI (4) | NDVI (5) | NDVI (6) | SPEI (7) | SPEI (8) | SPEI (9) | Total seasonal rainfall (10) | Total seasonal rainfall (11) | Total seasonal rainfall (12) | WRSI (13) | WRSI (14) | WRSI (15) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Landed households | | | | | | | | | | | | | | | |
| Hazard | 0.152*** | 0.092*** | 0.136*** | 0.069*** | 0.092*** | 0.091*** | 0.082*** | 0.063* | 0.027 | 0.069*** | 0.036* | 0.022 | 0.019 | -1.337*** | -3.865 |
| | [8.123] | [3.668] | [4.314] | [4.269] | [4.214] | [3.038] | [2.634] | [1.785] | [0.527] | [3.406] | [1.766] | [0.744] | [1.360] | [-2.927] | [-0.237] |
| Hazard² | | -0.07*** | -0.149*** | | 0.022 | 0.023 | | -0.017 | 0.013 | | -0.041*** | -0.036*** | | 0.007*** | 0.033 |
| | | [-2.801] | [-4.316] | | [1.496] | [1.419] | | [-0.715] | [0.379] | | [-3.894] | [-3.504] | | [2.967] | [0.196] |
| Hazard³ | | | -0.051*** | | | 0.001 | | | 0.023 | | | 0.005 | | | -0.000 |
| | | | [-2.668] | | | [0.092] | | | [0.999] | | | [0.839] | | | [-0.154] |
| Landless households | | | | | | | | | | | | | | | |
| Hazard | 0.166*** | 0.154*** | 0.165*** | 0.036 | 0.013 | -0.001 | 0.027 | 0.112 | 0.022 | 0.021 | 0.015 | -0.010 | 0.050** | -2.006** | 11.437 |
| | [3.698] | [3.391] | [2.970] | [0.971] | [0.236] | [-0.016] | [0.442] | [1.254] | [0.209] | [0.543] | [0.429] | [-0.148] | [2.281] | [-2.266] | [0.441] |
| Hazard² | | -0.032 | -0.037 | | -0.018 | 0.005 | | 0.069 | 0.167* | | -0.008 | -0.010 | | 0.011** | -0.126 |
| | | [-0.579] | [-0.571] | | [-0.486] | [0.097] | | [1.264] | [1.677] | | [-0.323] | [-0.370] | | [2.335] | [-0.476] |
| Hazard³ | | | -0.007 | | | 0.014 | | | 0.077 | | | 0.010 | | | 0.000 |
| | | | [-0.157] | | | [0.466] | | | [1.373] | | | [0.600] | | | [0.514] |
| Obs. | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 5,705 | 2,206 | 2,206 | 2,206 |
| R-sq | 0.175 | 0.179 | 0.182 | 0.157 | 0.158 | 0.158 | 0.154 | 0.155 | 0.157 | 0.156 | 0.160 | 0.160 | 0.107 | 0.115 | 0.115 |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; WRSI = Water Requirement Satisfaction Index. Results are from a weighted regression such that each household is assigned its survey-specific population weight. One country, Nigeria, is included for years 2011, 2013, and 2016. Dependent variable is the log of per capita (or adult equivalent) consumption.
Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, whether the household was surveyed during the lean season, survey year, country, country × survey year, and zone fixed effects. Robust t-statistics are shown in brackets. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent. When a variable is significant, the coefficient and standard error are in bold font.

Figure 4.1. Landed household results for soil moisture and NDVI, by livelihood zone
Panels: a. Maize livelihood zone, soil moisture; b. Maize livelihood zone, NDVI; c. Highland livelihood zone, soil moisture; d. Highland livelihood zone, NDVI; e. Roots & lowland livelihood zone, soil moisture; f. Roots & lowland livelihood zone, NDVI
Source: World Bank.
Note: Red dotted lines indicate range encompassing 90 percent of the sample. Graphs outlined in red have significant quadratic or cubic terms.

Figure 4.2. Landed household results for SPEI and total rainfall, by livelihood zone
Panels: a. Maize livelihood zone, SPEI; b. Maize livelihood zone, total seasonal rainfall; c. Highland livelihood zone, SPEI; d. Highland livelihood zone, total seasonal rainfall; e. Roots & lowland livelihood zone, SPEI; f. Roots & lowland livelihood zone, total seasonal rainfall
Source: World Bank.
Note: Red dotted lines indicate range encompassing 90 percent of the sample. Graphs outlined in red have significant quadratic or cubic terms.

Tables 4.4 and 4.5 show results for pastoral and agropastoral areas, respectively. The choice of hazard and functional form is influenced by in-depth work undertaken on Mauritania (Hill and Vinha 2022), one of the pastoral countries in the sample. This work highlights the importance of using Tropical Livestock Unit (TLU) ownership as a way of distinguishing between household types.
It also highlights the importance of including households living in areas classified as urban, as many of these households were found to have significant TLU ownership. Also, because many pastoral households in particular live in locations without a growing season per se, the hazards are constructed over the whole rainy season.

The results for pastoral areas show that poor weather impacts consumption much more strongly for households with fewer animals. Those with more animals are able to smooth their consumption in the face of poor weather. The results for agropastoralists are harder to interpret. The same pattern of worse impacts for households with fewer animals holds for agropastoralists, but weather shocks appear to have a positive effect on landless households. Only results for NDVI are shown, but no measure provided sensible results. Further work, and probably additional data, are needed to better model and understand the impact of weather shocks on agropastoralist welfare in Sub-Saharan Africa, including considering cumulative impacts of rainfall losses across seasons and temperature impacts on cattle.

Table 4.4.
Results for pastoral zone in Western Africa

| | NDVI (1) | NDVI (2) | NDVI (3) | Soil moisture (4) | Soil moisture (5) | Soil moisture (6) | Total seasonal rainfall (7) | Total seasonal rainfall (8) | Total seasonal rainfall (9) |
|---|---|---|---|---|---|---|---|---|---|
| No TLU | | | | | | | | | |
| Hazard | 0.069** | 0.065*** | 0.130*** | 0.120*** | 0.050 | 0.014 | 0.014 | 0.019 | 0.111 |
| | [2.463] | [2.966] | [3.510] | [2.809] | [0.929] | [0.157] | [0.518] | [0.667] | [1.632] |
| Hazard² | | 0.004 | 0.056* | | 0.079 | 0.022 | | -0.008 | 0.025 |
| | | [0.126] | [1.832] | | [1.304] | [0.313] | | [-0.227] | [0.662] |
| Hazard³ | | | -0.037*** | | | 0.059 | | | -0.049* |
| | | | [-2.782] | | | [0.681] | | | [-1.659] |
| Few (less than 2.1) TLU | | | | | | | | | |
| Hazard | 0.087*** | 0.065*** | 0.120*** | 0.110*** | 0.072** | 0.060 | 0.073** | 0.078** | 0.156** |
| | [2.899] | [3.656] | [4.102] | [2.841] | [2.378] | [1.233] | [2.432] | [2.579] | [2.552] |
| Hazard² | | 0.024 | 0.062*** | | 0.047 | 0.030 | | -0.013 | 0.005 |
| | | [1.049] | [2.891] | | [1.185] | [0.506] | | [-0.658] | [0.235] |
| Hazard³ | | | -0.028*** | | | 0.016 | | | -0.040* |
| | | | [-3.379] | | | [0.375] | | | [-1.914] |
| Many (more than 2.1) TLU | | | | | | | | | |
| Hazard | 0.020 | 0.022 | -0.011 | 0.016 | 0.087** | 0.078 | 0.001 | 0.009 | 0.028 |
| | [1.008] | [1.074] | [-0.382] | [0.433] | [2.355] | [1.592] | [0.047] | [0.330] | [0.520] |
| Hazard² | | -0.009 | -0.019 | | -0.076* | -0.090 | | -0.027 | -0.021 |
| | | [-0.954] | [-1.352] | | [-1.895] | [-1.538] | | [-1.522] | [-1.153] |
| Hazard³ | | | 0.013* | | | 0.013 | | | -0.005 |
| | | | [1.867] | | | [0.264] | | | [-0.284] |
| Obs. | 13,641 | 13,641 | 13,641 | 13,641 | 13,641 | 13,641 | 13,641 | 13,641 | 13,641 |
| R-sq | 0.417 | 0.418 | 0.422 | 0.415 | 0.416 | 0.416 | 0.415 | 0.415 | 0.416 |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; TLU = Tropical Livestock Unit. Results are from a weighted regression such that each household is assigned its survey-specific population weight. Countries and years included are Mauritania (2014, 2019/20) and Niger (2018). Dependent variable is the log of per capita (or adult equivalent) consumption. Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects. Robust t-statistics are shown in brackets. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

Table 4.5.
Results for agropastoral zone in Western Africa

| | NDVI (1) | NDVI (2) | NDVI (3) |
|---|---|---|---|
| Landed households | | | |
| No TLU | | | |
| Hazard | -0.046 | -0.047 | -0.031 |
| | [-0.817] | [-0.843] | [-0.329] |
| Hazard² | | -0.002 | -0.001 |
| | | [-0.068] | [-0.025] |
| Hazard³ | | | -0.009 |
| | | | [-0.224] |
| Few (less than 2.1) TLU | | | |
| Hazard | 0.052* | 0.052* | 0.064 |
| | [1.658] | [1.693] | [1.240] |
| Hazard² | | 0.022 | 0.022 |
| | | [1.034] | [0.970] |
| Hazard³ | | | -0.006 |
| | | | [-0.292] |
| Many (more than 2.1) TLU | | | |
| Hazard | 0.039 | 0.039 | -0.010 |
| | [0.908] | [0.906] | [-0.192] |
| Hazard² | | -0.010 | -0.008 |
| | | [-0.235] | [-0.215] |
| Hazard³ | | | 0.032 |
| | | | [1.044] |
| Landless households | | | |
| Hazard | -0.212*** | -0.214*** | 0.112 |
| | [-3.393] | [-3.553] | [0.453] |
| Hazard² | | 0.018 | 0.023 |
| | | [0.265] | [0.352] |
| Hazard³ | | | -0.186 |
| | | | [-1.391] |
| Obs. | 4,593 | 4,593 | 4,593 |
| R-sq | 0.132 | 0.133 | 0.137 |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index. Results are from a weighted regression such that each household is assigned its survey-specific population weight. Countries and years included are Niger (2018) and Nigeria (2011, 2013, 2016). Dependent variable is the log of per capita (or adult equivalent) consumption. Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects. Robust t-statistics are shown in brackets. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

4.2. Exploring different areas over which to aggregate the hazard data

Initial analysis suggested that a 20 km level of spatial aggregation of hazards was appropriate. This radius is able to reflect the local shock to a household’s crop yields and may be large enough to capture some localized impacts on prices and wages in the local market area. We tested this level of aggregation again at the end of the analysis for all countries and found that the level of spatial aggregation made very little difference.
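The idea of aggregating a gridded hazard over a radius around each household can be sketched as follows. This is a minimal illustration only: it assumes a simple unweighted mean over grid-cell centers within the radius, which may differ from the aggregation actually used, and the function names, the input format, and the example cells are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a standard approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_hazard_within_radius(household, grid_cells, radius_km):
    """Average the hazard over grid cells whose centers lie within radius_km.

    household:  (lat, lon) of the household or survey cluster (assumed input format)
    grid_cells: iterable of (lat, lon, hazard_value) tuples (assumed input format)
    Returns None when no cell center falls inside the radius.
    """
    lat, lon = household
    values = [value for (cell_lat, cell_lon, value) in grid_cells
              if haversine_km(lat, lon, cell_lat, cell_lon) <= radius_km]
    return sum(values) / len(values) if values else None


# Illustrative cells roughly 11 km, 33 km, and 56 km east of a household on the equator.
cells = [(0.0, 0.1, -1.0), (0.0, 0.3, 0.0), (0.0, 0.5, 2.0)]
home = (0.0, 0.0)
hazard_20km = mean_hazard_within_radius(home, cells, 20)  # uses only the nearest cell
hazard_50km = mean_hazard_within_radius(home, cells, 50)  # uses the two nearest cells
```

Rerunning the same function with a different `radius_km` is one simple way to produce the alternative hazard series that a sensitivity check of this kind compares.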
In some specifications, a larger level of spatial aggregation increases the estimated impact of the shock, although not significantly, without affecting statistical significance (table 4.6). In future work it would be interesting to explore much larger areas of aggregation if this is possible (i.e., if enough variation in the hazard is left and GPS data are available) to see if this affects the results.

Table 4.6. Results for different areas of spatial aggregation of hazard data

| | Maize (GPS coordinate countries only), 10 km | Maize (GPS coordinate countries only), 20 km | Maize (GPS coordinate countries only), 50 km | 20 km radius, Maize | 20 km radius, Highland | 20 km radius, Roots | 50 km radius, Maize | 50 km radius, Highland | 50 km radius, Roots |
|---|---|---|---|---|---|---|---|---|---|
| Landed households | | | | | | | | | |
| Hazard | 0.066*** | 0.075*** | 0.086*** | 0.055*** | 0.045*** | 0.152*** | 0.056*** | 0.045*** | 0.154*** |
| | [4.252] | [4.608] | [4.765] | [3.852] | [4.495] | [8.123] | [3.722] | [4.495] | [8.487] |
| Landless households | | | | | | | | | |
| Hazard | 0.109*** | 0.112*** | 0.123*** | 0.043** | -0.005 | 0.166*** | 0.045** | -0.004 | 0.159*** |
| | [4.165] | [4.212] | [4.451] | [2.239] | [-0.214] | [3.698] | [2.276] | [-0.176] | [4.017] |
| Obs. | 34,824 | 34,739 | 35,212 | 50,266 | 15,046 | 5,705 | 50,739 | 15,060 | 5,856 |
| R-sq | 0.138 | 0.137 | 0.133 | 0.362 | 0.265 | 0.175 | 0.357 | 0.260 | 0.177 |

Source: World Bank.
Note: Results are from a weighted regression such that each household is assigned its survey-specific population weight. Dependent variable is the log of per capita (or adult equivalent) consumption. The hazard variable used is the soil moisture index. Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, agroecological zone indicators, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects for district. Robust t-statistics are shown in brackets. Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

4.3.
Exploring food and nonfood consumption impacts separately

Four of the nine countries (Ethiopia, Malawi, Mozambique, and Niger) had separate aggregates of food and nonfood consumption in the data sets available for analysis. We conducted the same analysis separately on food and nonfood consumption for these countries to assess whether the main impact was on food consumption or whether households protected food consumption by reducing nonfood consumption; the latter result is found, for example, in Kochhar and Knippenberg (2023) using a similar method. Tables 4.7 and 4.8 present results for the maize and highland livelihood zones, respectively. Results are shown using just the soil moisture variable. For both livelihood zones, the impacts of the hazard variables on food consumption are larger than impacts on nonfood consumption. Nonfood consumption is impacted only among landed households. Further work with data from additional countries would be informative in understanding whether this pattern holds.

Table 4.7.
Food and nonfood consumption, maize livelihood zone (soil moisture)

| | Total consumption, all countries in zone (linear) | Total consumption, all countries in zone (quadratic) | Total consumption, all countries in zone (cubic) | Total consumption, countries with disaggregated information (linear) | Total consumption, countries with disaggregated information (quadratic) | Total consumption, countries with disaggregated information (cubic) | Food consumption (linear) | Food consumption (quadratic) | Food consumption (cubic) | Nonfood consumption (linear) | Nonfood consumption (quadratic) | Nonfood consumption (cubic) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Landed households | | | | | | | | | | | | |
| Hazard | 0.055*** | 0.059*** | 0.119*** | 0.058*** | 0.060*** | 0.131*** | 0.081*** | 0.082*** | 0.158*** | 0.037* | 0.041** | 0.101*** |
| | [3.852] | [4.097] | [4.792] | [3.731] | [3.900] | [4.681] | [4.754] | [4.680] | [5.581] | [1.903] | [2.168] | [2.751] |
| Hazard² | | 0.022* | 0.011 | | 0.024* | 0.012 | | 0.011 | -0.002 | | 0.030* | 0.019 |
| | | [1.831] | [0.839] | | [1.753] | [0.809] | | [0.729] | [-0.152] | | [1.758] | [1.009] |
| Hazard³ | | | -0.028*** | | | -0.032*** | | | -0.035*** | | | -0.027* |
| | | | [-2.954] | | | [-3.041] | | | [-3.740] | | | [-1.902] |
| Landless households | | | | | | | | | | | | |
| Hazard | 0.043** | 0.038* | 0.032 | 0.060*** | 0.053* | 0.063 | 0.093*** | 0.100*** | 0.112** | 0.028 | 0.009 | 0.029 |
| | [2.239] | [1.844] | [0.875] | [2.593] | [1.910] | [1.161] | [3.942] | [3.610] | [2.121] | [0.950] | [0.255] | [0.407] |
| Hazard² | | 0.002 | 0.004 | | 0.005 | 0.002 | | 0.022 | 0.018 | | -0.013 | -0.019 |
| | | [0.101] | [0.181] | | [0.195] | [0.071] | | [0.739] | [0.546] | | [-0.384] | [-0.515] |
| Hazard³ | | | 0.007 | | | -0.002 | | | -0.003 | | | -0.008 |
| | | | [0.399] | | | [-0.067] | | | [-0.125] | | | [-0.269] |
| Obs. | 50,266 | 50,266 | 50,266 | 31,671 | 31,671 | 31,671 | 31,671 | 31,671 | 31,671 | 31,671 | 31,671 | 31,671 |
| R2 | 0.362 | 0.363 | 0.364 | 0.363 | 0.363 | 0.365 | 0.281 | 0.281 | 0.284 | 0.387 | 0.388 | 0.389 |

Source: World Bank.
Note: Results are from a weighted regression such that each household is assigned its survey-specific population weight. Countries and years included are Ethiopia (2011, 2016), Malawi (2010, 2016, 2019), and Mozambique (2014). Dependent variable is the log of per capita (or adult equivalent) consumption. Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, agroecological zone indicators, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects for district (Ethiopia, Lesotho, Malawi) and province (Mozambique, Zambia, Zimbabwe). Robust t-statistics are shown in brackets.
Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

Table 4.8. Food and nonfood consumption, highland livelihood zone (soil moisture)

| | Total consumption, all countries in zone (linear) | Total consumption, all countries in zone (quadratic) | Total consumption, all countries in zone (cubic) | Total consumption, countries with disaggregated information (linear) | Total consumption, countries with disaggregated information (quadratic) | Total consumption, countries with disaggregated information (cubic) | Food consumption (linear) | Food consumption (quadratic) | Food consumption (cubic) | Nonfood consumption (linear) | Nonfood consumption (quadratic) | Nonfood consumption (cubic) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Landed households | | | | | | | | | | | | |
| Hazard | 0.045*** | 0.044*** | 0.044* | 0.044*** | 0.043*** | 0.044 | 0.054*** | 0.052*** | 0.037 | 0.042*** | 0.043*** | 0.060* |
| | [4.495] | [4.350] | [1.653] | [4.402] | [4.227] | [1.600] | [4.547] | [4.168] | [1.195] | [3.365] | [3.413] | [1.720] |
| Hazard² | | -0.004 | -0.004 | | -0.005 | -0.005 | | -0.010 | -0.005 | | 0.004 | -0.001 |
| | | [-0.427] | [-0.333] | | [-0.515] | [-0.422] | | [-1.038] | [-0.420] | | [0.395] | [-0.086] |
| Hazard³ | | | -0.000 | | | -0.000 | | | 0.006 | | | -0.007 |
| | | | [-0.007] | | | [-0.040] | | | [0.520] | | | [-0.558] |
| Landless households | | | | | | | | | | | | |
| Hazard | -0.005 | -0.006 | -0.030 | -0.012 | -0.012 | -0.053 | 0.013 | 0.012 | -0.080 | -0.026 | -0.025 | -0.016 |
| | [-0.214] | [-0.231] | [-0.533] | [-0.441] | [-0.450] | [-0.740] | [0.454] | [0.408] | [-1.218] | [-0.718] | [-0.685] | [-0.153] |
| Hazard² | | -0.002 | -0.000 | | -0.013 | -0.009 | | -0.014 | -0.002 | | -0.008 | -0.012 |
| | | [-0.100] | [-0.002] | | [-0.578] | [-0.384] | | [-0.530] | [-0.091] | | [-0.275] | [-0.365] |
| Hazard³ | | | 0.010 | | | 0.015 | | | 0.035 | | | -0.004 |
| | | | [0.523] | | | [0.696] | | | [1.624] | | | [-0.117] |
| Obs. | 15,046 | 15,046 | 15,046 | 10,877 | 10,877 | 10,877 | 10,877 | 10,877 | 10,877 | 10,868 | 10,868 | 10,868 |
| R2 | 0.265 | 0.265 | 0.265 | 0.268 | 0.268 | 0.268 | 0.333 | 0.334 | 0.334 | 0.228 | 0.228 | 0.228 |

Source: World Bank.
Note: Results are from a weighted regression such that each household is assigned its survey-specific population weight. Countries and years included are Ethiopia (2011, 2016) and Mozambique (2014). Dependent variable is the log of per capita (or adult equivalent) consumption. Other variables included and not shown are education of household head, gender of household head, whether the household location is remote, whether the household was surveyed during the lean season, survey year, country, country × survey year, and administrative fixed effects for district (Ethiopia, Lesotho) and province (Mozambique, Zimbabwe). Robust t-statistics are shown in brackets.
Significance level: * = 10 percent, ** = 5 percent, *** = 1 percent.

5. Simulating a Probabilistic Impact Curve for Welfare Losses from Drought

The parameter estimates from the regression model in equation (3) (section 2) can be used to predict welfare for alternate hazard values. Let $\hat{y}_{h}^{s}$ be the predicted level of welfare of household $h$ if weather had been $H_{s}$ in that year instead of $H_{t}$. Using the linear specification with heterogeneous impacts, equation (3), for year $t$ produces

$$\hat{y}_{h}^{s} = \hat{\beta}_0 + \hat{\beta}_1 X_{h} + \hat{\beta}_2 H_{s} + \hat{\beta}_3 (H_{s} \ast \mathbb{1}_{h}) + \hat{\gamma} \mathbb{1}_{h} + \mu_d + \delta_t, \tag{4}$$

which can be written as:

$$\hat{y}_{h}^{s} = \hat{y}_{h}^{t} + \hat{\beta}_2 (H_{s} - H_{t}) + \hat{\beta}_3 ((H_{s} - H_{t}) \ast \mathbb{1}_{h}), \tag{5}$$

where $\hat{y}_{h}^{t}$ is the corresponding prediction under the observed hazard $H_{t}$.

Using this approach, welfare can be simulated for many different hazard values. This method assumes that the estimated vulnerability function is also valid for the range of simulated hazards. This assumption holds only if the simulated values of the hazard lie within the range of hazards experienced by households within the districts and years used for estimating equation (3). Indeed, the relationship estimated in equation (3) is representative only for the range of values included in the regression. Simulating events more extreme than those in the regression sample would require strong assumptions to extrapolate this relationship. Hence, understanding the ex ante distribution of welfare impacts of future climate events requires not just the ability to identify the relationship between welfare outcomes and hazard variables; it also requires the range of hazard values for which the impact is identified to be commensurate with the range of hazard values that might occur in the future. More extreme hazard values are less likely to manifest, and it is unlikely that survey data were collected when and where these extreme hazard values were last realized. For this reason, the above approach is not likely to provide a useful prediction of the number of people falling into poverty as a consequence of an unusually severe drought that is spread over many districts and/or countries.
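The counterfactual shift described for equation (5) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the two coefficients are made-up values rather than estimates from the tables, and the optional `support` check is an added convenience that simply refuses hazard values outside the range used in estimation.

```python
import math


def simulate_log_consumption(log_cons, hazard_obs, hazard_alt, landed,
                             beta2, beta3, support=None):
    """Shift log consumption from the observed hazard to a counterfactual hazard.

    Computes y_alt = y + beta2 * (H_alt - H_obs) + beta3 * (H_alt - H_obs) * landed,
    the linear form described for equation (5). `landed` is a 0/1 indicator.
    `support` is an optional (min, max) range of hazards seen in estimation.
    """
    if support is not None:
        low, high = support
        if not low <= hazard_alt <= high:
            raise ValueError("counterfactual hazard lies outside the estimation range")
    delta = hazard_alt - hazard_obs
    return log_cons + beta2 * delta + beta3 * delta * landed


# Hypothetical coefficients, chosen only for illustration (not taken from the tables).
BETA2, BETA3 = 0.04, 0.01

# One landed household observed at a hazard z-score of 0.5, re-simulated at -1.5.
y_alt = simulate_log_consumption(math.log(2.5), 0.5, -1.5, landed=1,
                                 beta2=BETA2, beta3=BETA3, support=(-2.0, 2.0))
consumption_alt = math.exp(y_alt)  # convert back from logs to consumption levels
```

Repeating the call for every household and every alternate hazard draw yields one simulated consumption distribution per draw; the range check reflects the caution above about extrapolating beyond the hazards observed in the regression sample.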
To assess the degree to which the welfare impacts of future hazard events can accurately be simulated, and thus the dangers associated with extreme counterfactuals minimized (King and Zeng 2006), we undertake two tests. The first assesses the degree to which the method used for identifying the impact of hazards on welfare reduces the variation in the hazard measure reflected in the analysis. The second assesses whether the extreme values of the hazards are within or outside the range of hazards used in estimation of $\hat{\beta}_2$ and $\hat{\beta}_3$.

The district-level fixed effects are very useful for minimizing the sources of omitted variable bias, while year effects are useful for ensuring that the relationship between hazard and welfare is identified from idiosyncratic local shocks, and not aggregate or covariate shocks. However, the identification derived from the use of fixed effects typically comes at a cost. The estimates of the coefficients $\hat{\beta}_2$ and $\hat{\beta}_3$ derived using fixed effects are based on variation within districts and within years, and thus do not account for the potentially large variation that may prevail between districts and between years. Thus they are unable to account for the variation in welfare that may result from covariate shocks, such as an El Niño or La Niña causing common rainfall conditions across large geographic areas that include multiple districts within a country or even across countries. The loss in variation that comes from this approach can be assessed by comparing the histogram of the hazard measure with the histogram of the within-residualized hazard measures (Mummolo and Peterson 2018), where the within-residualized hazard measure is the vector of residuals from a regression of the hazard on district and year effects (as well as any other binary variables). In simple terms, this vector represents the variation in the hazard that is used to estimate the coefficients $\beta_2$ and $\beta_3$ in the fixed effects model.
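A small sketch of the within-residualization idea follows. It is an assumption-laden illustration rather than the procedure actually used: it approximates the residuals from a regression on district and year effects by repeatedly removing district means and year means (alternating demeaning), it ignores any other binary controls, and all names and data are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def demean_by(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group] for value, group in zip(values, groups)]


def within_residualize(hazard, district, year, sweeps=50):
    """Approximate residuals of the hazard after removing district and year effects.

    Alternating demeaning converges toward the two-way fixed effects residual;
    for a balanced panel a single sweep is already exact.
    """
    resid = list(hazard)
    for _ in range(sweeps):
        resid = demean_by(resid, district)
        resid = demean_by(resid, year)
    return resid


def sd_reduction(hazard, district, year):
    """Share by which the standard deviation falls once fixed effects are removed."""
    resid = within_residualize(hazard, district, year)
    return 1.0 - pstdev(resid) / pstdev(hazard)


# Illustrative balanced panel: two districts observed in two years.
district = ["A", "A", "B", "B"]
year = [1, 2, 1, 2]
hazard = [1.5, 3.0, -1.0, 1.0]
reduction = sd_reduction(hazard, district, year)
```

The returned share is the same kind of quantity as the percentage reductions in standard deviation discussed in the text, although the toy panel here is far smaller and more strongly dominated by district and year effects than real survey data would be.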
These two hazard distributions can be compared by comparing the standard deviation of each distribution. For the household surveys, the standard deviation of the hazard falls by an average of 28 percent (9–42 percent) when country-year and district fixed effects are included. The full set of reductions by indicator and livelihood zone is given in table 5.1. Although the reduction in variation is small, the importance of the larger covariate shock for welfare could still be significant, and as underscored above, this is not captured through this approach.

Table 5.1. Reduction in standard deviation of hazard anomalies resulting from the inclusion of fixed effects

| Livelihood zone | Soil moisture | NDVI | SPEI | Total rainfall |
|---|---|---|---|---|
| Maize (GS) | 32% | 29% | 23% | 36% |
| Highland (GS) | 23% | 27% | 33% | 29% |
| Roots and lowland (GS) | 9% | 7% | 42% | 23% |
| Agropastoral (RS) | 35% | 28% | | 28% |
| Pastoral (RS) | 40% | 36% | | 27% |

Source: World Bank.
Note: NDVI = Normalized Difference Vegetation Index; SPEI = Standardized Precipitation Evapotranspiration Index; GS = growing season; RS = rainy season.

To assess whether the extreme values of the hazards are within or outside the range of hazards experienced by households within the districts and years used, we assessed the percentage of households in a given livelihood zone that experienced a hazard value more than two standard deviations lower than the mean or more than three standard deviations lower than the mean. The results for soil moisture for maize, highland, and roots and lowland livelihood zones and for NDVI for agropastorals and pastorals are shown in table 5.2. Assessing the common support between the range of possible hazards and those used in estimating coefficients $\hat{\beta}_2$ and $\hat{\beta}_3$ can help address concerns regarding the plausibility of the welfare impact predictions that use the fixed effects estimates. In sum, while moderate deviations in rainfall are well reflected in the data used to estimate $\hat{\beta}_2$ and $\hat{\beta}_3$, more extreme deviations are less well reflected in the data.
Table 5.2. Share of households with an extreme hazard outcome for soil moisture

| Livelihood zone | Hazard z-score < -3 | Hazard z-score < -2 | Negative hazard z-score |
|---|---|---|---|
| Soil moisture (GS) | | | |
| Maize | 0 | 0.01 | 0.63 |
| Highland | 0 | 0.05 | 0.64 |
| Roots and lowland | 0 | 0.03 | 0.63 |
| NDVI (RS) | | | |
| Agropastoral | 0 | 0.00 | 0.40 |
| Pastoral | 0 | 0.01 | 0.38 |

Source: World Bank.
Note: GS = growing season; RS = rainy season.

We generated the alternate draws of the hazard in two ways. The first used the historical weather distribution to select the hazard values for all households. This provided a first assessment of what an exceedance probability (EP) curve would look like, but it is based on a limited set of historical data points (in the case of soil moisture, only 13 years of data). The second way used the data set of 10,000 possible weather simulations for Malawi developed by the risk modeling firm AIR, which was based on the historical pattern of rainfall in Malawi. This catalog is available only for Malawi but suggests the degree of precision that modeling the weather offers as compared to using historical weather data.

The generated hazard values are used with parameter estimates from tables 4.1 to 4.5 to simulate what consumption would be for each household under these weather conditions. Once consumption has been simulated for a given draw of the weather, variables that describe the simulated consumption distribution can be generated. We focus particularly on: (i) the poverty headcount rate, defined as the share of households whose simulated consumption falls below the extreme poverty line of $1.90, and (ii) the total poverty gap, defined as the difference between consumption and the $1.90 poverty line added up for all households whose consumption falls below the poverty line. The increases in the poverty rate and total poverty gap for eight of the nine countries included in this analysis are shown in figures 5.1 and 5.2. Niger was not included given the low percentage of the population in livelihood zones for which there are robust results.
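The two summary measures can be computed from a simulated consumption distribution as sketched below. This is an illustrative sketch only: it assumes that simulated welfare is stored as the log of daily per capita consumption in the same units as the poverty line, that each household carries a population weight, and that the gap is reported per day; the function name and the sample values are hypothetical.

```python
import math

POVERTY_LINE = 1.90  # extreme poverty line in 2011 PPP dollars per person per day


def poverty_measures(log_consumption, weights, line=POVERTY_LINE):
    """Return (headcount rate, total poverty gap) for simulated log consumption.

    log_consumption: simulated log of daily per capita consumption (assumed units)
    weights:         population weight attached to each household (assumed input)
    """
    total_weight = sum(weights)
    poor_weight = 0.0
    total_gap = 0.0
    for log_c, weight in zip(log_consumption, weights):
        consumption = math.exp(log_c)  # regressions use logs, so convert to levels
        if consumption < line:
            poor_weight += weight
            total_gap += weight * (line - consumption)
    return poor_weight / total_weight, total_gap


# Four illustrative households with equal weights.
simulated = [math.log(c) for c in (1.00, 1.50, 2.00, 3.00)]
headcount, gap = poverty_measures(simulated, [1.0, 1.0, 1.0, 1.0])
```

Evaluating these two numbers once per weather draw gives the series from which the change in poverty between the best and worst draws can be read off; converting the daily gap into an annual dollar total would require a further scaling step that is not shown here.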
Each country comprises more than one livelihood zone, so each country draws on more than one set of parameter estimates. For the simulation results, the linear relationship is used with the soil moisture hazard for countries with a growing season. For Mauritania we use the rainy season NDVI. The linear model was chosen because it was a parsimonious specification that best described the underlying relationship, especially in the hazard range where 90 percent of the observed events occurred.

Poverty is 1–12 percent higher (an average of 6 percent) under the worst weather conditions relative to the best conditions observed in the last 13 years,[16] and 1–5 percent (an average of 3 percent) higher under the worst conditions relative to average. This is a very large impact, and it could be an underestimate given that it does not incorporate the impact of drought on broader market conditions, or the impact on any households that moved or disintegrated because of the shock. Any measurement error in the hazard variable would also attenuate the estimated impact. The shock is largest in Nigeria.

The difference in the total poverty gap between the worst and best weather conditions ranges from US$4 million to US$2.4 billion (2011 PPP). This provides an indication of the cost of meeting the increase in welfare needs present under poor weather conditions. The actual cost would be much higher given that it is not possible to target support in this way. Finding ways to reduce the impact of rainfall on household income and consumption through other means, such as investing in soil conservation and water management practices or strengthening the ability of households to earn income in nonaffected sectors in bad years, would help reduce this cost, although it is not possible to say by how much.

[16] Nineteen years for Mauritania (given that we use only NDVI data for Mauritania because of the livelihood zones it encompasses, and we have a longer NDVI time series).

Figure 5.1. The impact of drought on rural poverty rates (historical CDF)
Panels: a. Ethiopia; b. Lesotho; c. Malawi; d. Mauritania; e. Mozambique; f. Nigeria; g. Zambia; h. Zimbabwe
Source: World Bank.
Note: CDF = cumulative distribution function. Each point in the graph shows the poverty rate that would obtain were the weather conditions of that year to be experienced again now, everything else equal. The share of the rural population covered by livelihood zones included in the analysis varies from country to country: 80 percent in Ethiopia, 100 percent in Lesotho, 91 percent in Malawi, 55 percent in Mauritania, 70 percent in Mozambique, 55 percent in Nigeria, 67 percent in Zimbabwe, and 82 percent in Zambia.

Figure 5.2. Total increase in the poverty gap caused by drought (historical CDF)
Panels: a. Ethiopia; b. Lesotho; c. Malawi; d. Mauritania; e. Mozambique; f. Nigeria; g. Zambia; h. Zimbabwe
Source: World Bank.
Note: CDF = cumulative distribution function; PPP = purchasing power parity. These graphs show the estimated change in the poverty gap due to drought × the poverty line × the rural population. The share of the rural population covered by livelihood zones included in the analysis varies from country to country: 80 percent in Ethiopia, 100 percent in Lesotho, 91 percent in Malawi, 55 percent in Mauritania, 70 percent in Mozambique, 55 percent in Nigeria, 67 percent in Zimbabwe, and 82 percent in Zambia.

Figure 5.3 shows how the results change when moving from weather data in the historical distribution to simulated hazard data. As discussed in subsection 2.2, use of simulations should provide more precision on the shape of the exceedance probability curve at the tail. More information is provided on the shape of this curve at the tail, but the main takeaway is unchanged, largely because the parameters were not estimated to change at larger weather shocks.
This result suggests that the ability to predict large deviations in welfare for very rare and extreme weather events is limited without a more informed vulnerability function in the tail, and hence that the historical and simulated distributions provide similar insights into the risk of falling into poverty as a result of a drought. Even so, the difference in poverty between the best and worst scenarios is 5.3 percentage points using the historical data series, whereas with the synthetic data set the spread is nearly three times as large, at 14.5 percentage points. Similarly, relying on the 12 years of historical data yields a maximum increase in the poverty gap of US$315 million (2011 PPP), whereas with the synthetic data the maximum welfare loss is more than double, at US$652 million (2011 PPP). Even with the limitations of the estimated vulnerability function for extreme weather events, the simulated hazard sets capture potential states of the world, especially in terms of the distribution of events across the country, and so provide a more robust picture of possible welfare losses. If the underlying assumptions used to generate the simulated hazard reflect possible near-future weather patterns, then the use of simulated hazard data sets can improve preparedness for non-extreme events.

Figure 5.3. Impact on poverty rate and poverty gap using hazard simulations in Malawi (simulated CDF)
a. Poverty rate b. Poverty gap
Source: World Bank.
Note: CDF = cumulative distribution function; PPP = purchasing power parity. Analysis shown includes 91 percent of Malawi’s rural population.

6. Conclusion

This paper developed standardized vulnerability curves for household welfare for five main livelihood zones in Sub-Saharan Africa.
The methodology combines methods and insights from two literatures: the economics literature that estimates the impact of weather shocks on household welfare, and the catastrophe risk modeling literature that has traditionally estimated the ex ante distribution of the physical costs associated with anomalous weather events. This exercise required matching household survey data on consumption to data on hazards at a scale not previously attempted. It also required standardizing the approach to matching, hazard definition, and estimation across countries.

The exercise yielded insights into the types of hazard measures that are most useful for identifying the impact of drought on household livelihoods, and into the methods used to match data. The analysis highlighted the value of measures that closely capture the impact of drought on crop yields without requiring crop models to aggregate welfare data, and the value of aggregating over a larger geographical space that considers broader market impacts of drought (where data allow). The analysis showed the merits of estimating functions across country boundaries, but it also revealed the limits of using standardized data sets in seeking to understand the heterogeneity of drought impacts across household types.

We find that consumption is reduced by 10–20 percent in the worst weather observations in our surveys (corresponding to about a 1-in-10-year drought event). We use the estimated vulnerability functions and the full hazard distribution in each household location to calculate the cost drought poses to welfare outcomes. The results indicate that risk is large. Poverty is 1–12 percent higher under the worst weather conditions relative to the best conditions observed in the last 13 years. This amounts to an increase in the total national poverty gap that ranges from US$4 million to US$2.4 billion (2011 PPP) across countries.
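The calculation summarized above can be illustrated with a minimal sketch. It is not the estimation code used for this paper: it assumes a damage function that is linear in log consumption, ignores household survey weights, and uses a hypothetical slope (`beta`), poverty line, and population. Only the final step, the poverty gap index multiplied by the poverty line and the population, follows the definition given in the note to figure 5.2.

```python
import numpy as np


def simulate_poverty(consumption, hazard_by_year, beta, poverty_line, population):
    """Poverty rate, poverty gap index, and total gap for each weather year.

    consumption    : (n_households,) baseline annual consumption per person
                     under normal weather (hazard = 0), in PPP dollars.
    hazard_by_year : (n_years, n_households) standardized hazard at each
                     household location; negative values are drier than normal.
    beta           : ASSUMED slope of log consumption on the hazard.
    poverty_line   : annual poverty line per person, same units as consumption.
    population     : number of people represented by the sample.
    """
    consumption = np.asarray(consumption, dtype=float)
    hazard_by_year = np.asarray(hazard_by_year, dtype=float)

    # Linear damage function on log consumption (an assumption of this sketch).
    simulated = consumption[None, :] * np.exp(beta * hazard_by_year)

    poor = simulated < poverty_line
    rate = poor.mean(axis=1)

    # Poverty gap index: mean shortfall as a share of the line, zero for the non-poor.
    shortfall = np.where(poor, (poverty_line - simulated) / poverty_line, 0.0)
    gap_index = shortfall.mean(axis=1)

    # Total gap = gap index x poverty line x population.
    total_gap = gap_index * poverty_line * population
    return rate, gap_index, total_gap


# Hypothetical inputs: three households, two weather years (normal and dry).
rate, gap_index, total_gap = simulate_poverty(
    consumption=[80.0, 110.0, 200.0],
    hazard_by_year=[[0.0, 0.0, 0.0], [-2.0, -2.0, -2.0]],
    beta=0.1,
    poverty_line=100.0,
    population=1_000_000,
)
worst_minus_best_rate = rate.max() - rate.min()
worst_minus_best_gap = total_gap.max() - total_gap.min()
```

Running the function once per historical (or simulated) weather year and sorting the results gives the kind of distribution plotted in the figures; the spread between the largest and smallest values corresponds to the worst-versus-best comparison reported in the text.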
Estimates are more precise for countries with a large share of production in maize, highland, and pastoral areas. Potential biases in estimating impacts mean this is likely an underestimate of the true cost.

The cost of weather shocks on welfare provides an indication of the size of the additional need that adaptive social protection systems or access to financial services would have to meet. The cost of weather shocks on consumption could be reduced by investments in adaptation policies such as soil management and tree cover, or by other adaptive farming techniques. Investments in development that enable households to switch into less rainfall-dependent economic activities, either when droughts occur or as part of broader structural transformation, would also reduce the welfare cost of drought. Further work is needed to estimate how different investments would reduce the welfare cost of drought. This type of work is needed to help prioritize investments.

Statistical analyses that rely only on observed historical data are generally limited by their short observation periods. In the context of drought risk, the patterns and severity of future drought events might differ from what has been observed historically. Stochastic data sets provide a synthetic view of all events that could occur in a given year, together with their likelihood of occurring. These data sets typically allow for a more reliable analysis of drought risk, especially for events that occur less frequently. Use of a stochastic data set for Malawi showed how such an analysis can be carried out, but further work is needed to produce and use more reliable analyses of spatial and temporal correlations of drought hazard. A further extension of this work would incorporate climate trends into the analysis of welfare impacts in order to estimate the costs of longer-run welfare impacts (Baquie and Foucault 2023).

References

ABI (Association of British Insurers). 2011. Industry Good Practice for Catastrophe Modelling. London: ABI. https://www.abi.org.uk/globalassets/sitecore/files/documents/publications/public/migrated/solvency-ii/industry-good-practice-for-catastrophe-modelling.pdf.
Angrist, J. D., and J. S. Pischke. 2009. Mostly Harmless Econometrics. Princeton, NJ: Princeton University Press.
Anttila-Hughes, J., and M. Sharma. 2015. “Linking Risk Models to Microeconomic Indicators.” Policy Research Working Paper 7359, World Bank, Washington, DC. http://hdl.handle.net/10986/22235.
Artuc, E., G. Porto, and B. Rijkers. 2023. Crops, Conflict, and Climate Change. World Bank Mimeo.
Auffhammer, Maximilian. 2018. “Quantifying Economic Damages from Climate Change.” Journal of Economic Perspectives 32 (4): 33–52. https://doi.org/10.1257/jep.32.4.33.
Baez, Javier E., Varun Kshirsagar, and Emmanuel Skoufias. 2019. “Adaptive Safety Nets for Rural Africa: Drought-Sensitive Targeting with Sparse Data.” Policy Research Working Paper Series, December. https://ideas.repec.org//p/wbk/wbrwps/9071.html.
Baquie, S., and G. Foucault. 2023. “Background Note on Bringing Climate Change into Vulnerability Analysis.” Unpublished background paper. World Bank.
Baquie, Sandra, and Habtamu Neda Fuje. 2020. “Vulnerability to Poverty Following Extreme Weather Events in Malawi.” Policy Research Working Paper Series, October. https://ideas.repec.org//p/wbk/wbrwps/9435.html.
Bates, Paul D., James Savage, Oliver Wing, Niall Quinn, Christopher Sampson, Jeffrey Neal, and Andrew Smith. 2023. “A Climate-Conditioned Catastrophe Risk Model for UK Flooding.” Natural Hazards and Earth System Sciences 23 (2): 891–908. https://nhess.copernicus.org/articles/23/891/2023/.
Blanchard, Antoine, and Luis Sousa. 2021. “Probabilistic Hazard Analysis for Drought in Sub-Saharan Africa: Stochastic Precipitation, Soil Moisture, and Vegetation Index Datasets for Malawi.” AIR Worldwide.
Clarke, Daniel J., and Stefan Dercon. 2016. Dull Disasters? How Planning Ahead Will Make a Difference. Oxford University Press.
Dell, M., B. Jones, and B. Olken. 2014. “What Do We Learn from the Weather? The New Climate–Economy Literature.” Journal of Economic Literature 52 (3): 740–98. http://dx.doi.org/10.1257/jel.52.3.740.
Devereux, S. 2007. “The Impact of Droughts and Floods on Food Security and Policy Options to Alleviate Negative Effects.” Agricultural Economics 37 (s1): 47–58.
Dixon, John, Dennis P. Garrity, Jean-Marc Boffa, Timothy O. Williams, Tilahun Amede, Christopher Auricht, Rosemary Lott, and George Mburathi, eds. 2019. Farming Systems and Food Security in Africa: Priorities for Science and Policy under Global Change. Routledge.
Fiedler, T., A. J. Pitman, K. Mackenzie, N. Wood, C. Jakob, and S. E. Perkins-Kirkpatrick. 2021. “Business Risk and the Emergence of Climate Analytics.” Nature Climate Change 11: 87–94. https://www.nature.com/articles/s41558-020-00984-6.
Hallegatte, Stephane, Mook Bangalore, Laura Bonzanigo, Marianne Fay, Tamaro Kane, Ulf Narloch, Julie Rozenberg, David Treguer, and Adrien Vogt-Schilb. 2016. Shock Waves: Managing the Impacts of Climate Change on Poverty. Washington, DC: World Bank. https://doi.org/10.1596/978-1-4648-0673-5.
Hill, R., and C. Porter. 2017. “Vulnerability to Drought and Food Price Shocks: Evidence from Ethiopia.” World Development 96: 65–77.
Hill, R., and K. Vinha. 2022. “Mauritania and Vulnerability to Drought.” Unpublished paper, World Bank.
Hill, Ruth, and Carolina Mejia-Mantilla. 2017. With a Little Help: Shocks, Agricultural Income, and Welfare in Uganda. Policy Research Working Papers. The World Bank. https://doi.org/10.1596/1813-9450-7935.
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. Springer.
King, G., and L. Zeng. 2006. “The Dangers of Extreme Counterfactuals.” Political Analysis 14: 131–59.
Kochhar, Nishtha, and Erwin Knippenberg. 2023. “Droughts and Welfare in Afghanistan.” Policy Research Working Paper 10272, World Bank, Washington, DC.
Lobell, David B., Wolfram Schlenker, and Justin Costa-Roberts. 2011. “Climate Trends and Global Crop Production Since 1980.” Science 333 (6042): 616–20. https://doi.org/10.1126/science.1204531.
Mummolo, J., and E. Peterson. 2018. “Improving the Interpretation of Fixed Effects Regression Results.” Political Science Research and Methods 6 (4): 829–35. doi:10.1017/psrm.2017.44.
NGFS (Network for Greening the Financial System). 2022. “Physical Climate Risk Assessment: Practical Lessons for the Development of Climate Scenarios with Extreme Weather Events from Emerging Markets and Developing Economies.” https://www.ngfs.net/sites/default/files/media/2022/09/02/ngfs_physical_climate_risk_assessment.pdf.
Porter, Catherine, and Emily White. 2016. “Potential for Application of a Probabilistic Catastrophe Risk Modelling Framework to Poverty Outcomes: General Form Vulnerability Functions Relating Household Poverty Outcomes to Hazard Intensity in Ethiopia.” Policy Research Working Paper 7717, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/854961468184437420/pdf/WPS7717.pdf.
Purdy, Meghan. 2021. “Cat Models: Convenient, Familiar – But Flawed for Projecting Physical Climate Risk.” Medium, June 14, 2021. https://medium.com/jupiterintel/cat-models-f0fc7e890bef.
Sen, A. 1981. Poverty and Famines: An Essay on Entitlement and Deprivation. Oxford University Press.
Skoufias, E., et al. 2023. “Climatic Variability and Child Undernutrition in Sub-Saharan Africa Redux.” World Bank Mimeo.
Verisk. 2013. “Modeling Fundamentals: What Is AAL?” March 25, 2013. https://www.air-worldwide.com/publications/air-currents/2013/Modeling-Fundamentals--What-Is-AAL-/.
Wineman, Ayala, Nicole M. Mason, Justus Ochieng, and Lilian Kirimi. 2017. “Weather Extremes and Household Welfare in Rural Kenya.” Food Security 9 (2): 281–300. https://doi.org/10.1007/s12571-016-0645-z.

Appendix A.
Exposure Data

The exposure data layers are static inputs for the drought/welfare modeling exercise: unlike the hazard components, they are fixed in the analysis. These exposure data layers contain information on various spatial characteristics, as set out in table A.1.

Table A.1. Information on exposure data

Data set: Uniform Resolution Grid (URG)
Description: A country x, y grid (where extents are buffered to 2° from country boundary).

Data set: World Bank – country administrative boundaries
Description: World Bank–approved administrative boundaries (Admin0) (and polygons) including international boundaries, disputed areas, coastlines, and lakes.
Source and shape files: https://datacatalog.worldbank.org/dataset/world-bank-official-boundaries
Citation: World Bank Basemap Collection, February 2020.

Data set: GADM – Database of Global Administrative Areas (Admin0–Admin6)
Description: GADM is a database of the location of the world's administrative areas (boundaries). Administrative areas in this database include countries, counties, departments, etc. and cover every country in the world. For each area some attributes are provided, including the name and in some cases variant names.
Source and shape files: https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_shp.zip
Citation: Global Administrative Areas. 2012. GADM database of Global Administrative Areas, version 2.0.

Data set: GAUL – Global Administrative Unit Layers (Gaul0–Gaul1)
Description: Information is provided at two spatial levels: national (Gaul0) and subnational (Gaul1). The reference spatial layers are derived from the GAUL data set, implemented by the UN Food and Agriculture Organization (FAO) within the CountrySTAT and Agricultural Market Information System (AMIS) projects.
Source and shape files: https://data.europa.eu/euodp/en/data/dataset/jrc-10112-10004/resource/609a472e-7e26-4b63-8003-17298cb45e2a; https://mars.jrc.ec.europa.eu/asap/files/gaul1_asap.zip
Citation: Europa.EU http://data.europa.eu/eli/dec/2011/833/oj.

Data set: Crop coverage – Copernicus Global Land Cover Layers
Description: Mapped crop coverage data layer (2019 vintage).
Source and shape files: https://land.copernicus.eu/global/products/lc
Citation: Buchhorn, M., B. Smets, L. Bertels, B. De Roo, M. Lesiv, N.-E. Tsendbazar, M. Herold, and S. Fritz. Copernicus Global Land Service: Land Cover 100m: Collection 3: epoch 2019: Globe 2020.

Data set: Population – GHS-POP R2019A
Description: This spatial raster data set depicts the distribution of population, expressed as the number of people per cell.
Source and shape files: https://ghsl.jrc.ec.europa.eu/ghs_pop2019.php
Citation: GHS population grid multi-temporal (1975–1990–2000–2015).

Data set: Travel Time to Markets – International Food Policy Research Institute (IFPRI)
Description: The data set includes five data layers representing travel time to the nearest market of five sizes (population of 20,000, 50,000, 100,000, 250,000, and 500,000) respectively, on five arc-minute grids (~9 km) across SSA (2016-01-29).
Source and shape files: https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/YKDWJD/UFOGZS&version=2.2
Citation: HarvestChoice, International Food Policy Research Institute (IFPRI). 2016. “Travel Time to Markets in Africa South of the Sahara.” https://doi.org/10.7910/DVN/YKDWJD, Harvard Dataverse, V2.

Data set: Global Agro-Ecological Zones (GAEZ)
Description: GAEZ generates large databases of (i) natural resource endowments relevant for agricultural uses; (ii) spatially detailed results of individual land utilization type assessments in terms of suitability and attainable yields; (iii) spatially detailed results of estimate/actual yields of main food and fiber commodities for all rain-fed and irrigated cultivated areas; and (iv) spatially detailed yield and production gaps, also for main food and fiber commodities.
Source and shape files: http://pure.iiasa.ac.at/id/eprint/13290
Citation: Fischer, G., F. O. Nachtergaele, S. Prieler, E. Teixeira, G. Toth, H. van Velthuizen, L. Verelst, and D. Wiberg. 2012. Global Agro-Ecological Zones (GAEZ v 3.0): Model Documentation. Laxenburg, Austria: IIASA (International Institute for Applied Systems Analysis) and Rome, Italy: FAO.

Data set: Livelihood zones
Description: Map of farming systems in Africa in 2015. The farming systems analysis integrates an extensive range of spatial data, administrative statistics, assessment reports, and expert knowledge in order to update the African component of the 2001 FAO/World Bank farming systems analysis. Data sources include GAEZ FAO/IIASA, FAOSTAT, and HarvestChoice.
Source and shape files: Personal copy of data from Christopher M. Auricht (https://auricht.com/)
Citation: Dixon, John, Dennis P. Garrity, Jean-Marc Boffa, Timothy O. Williams, Tilahun Amede, Christopher Auricht, Rosemary Lott, and George Mburathi, eds. 2019. Farming Systems and Food Security in Africa: Priorities for Science and Policy Under Global Change. Routledge. (See figure 2.2b.)

Source: World Bank compilation.

Appendix B. Hazard Data

Table B.1. Information on hazard data

ERA5
Hazard measure: Precipitation (modeled)
Full name: European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 5th Generation
Description: ERA5 climate reanalysis gives a numerical description of the recent climate, produced by combining models with observations. Total precipitation is the accumulated liquid and frozen water, comprising rain and snow, that falls to the Earth’s surface. It is the sum of large-scale precipitation and convective precipitation.
Temporal coverage: 1979 to present
Spatial coverage: Global
Resolution: Reanalysis: 0.25° (~28 km)
Output information: 24-hour aggregation since the beginning of the forecast
Data set source: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview
Data set reference/citation: Hersbach, H., B. Bell, P. Berrisford, G. Biavati, A. Horányi, J. Muñoz Sabater, J. Nicolas, C. Peubey, R. Radu, I. Rozum, D. Schepers, A. Simmons, C. Soci, D. Dee, and J.-N. Thépaut. 2018. “ERA5 Hourly Data on Single Levels from 1979 to Present.” Copernicus Climate Change Service (C3S) Climate Data Store (CDS).

CHIRPS
Hazard measure: Precipitation
Full name: Climate Hazards Group InfraRed Precipitation with Station data
Description: The CHIRPS data archive is a quasi-global (50S–50N) gridded 0.05° resolution time series covering precipitation from 1981 to near-real time.
Temporal coverage: 1981 to near-real time (precipitation time series)
Spatial coverage: Global
Resolution: 0.05° (~5.5 km)
Output information: mm per unit of time
Data set source: https://data.chc.ucsb.edu/products/CHIRPS-2.0/africa_dekad/tifs/; https://earlywarning.usgs.gov/fews/product/128
Data set reference/citation: Funk, C.C., P. J. Peterson, M. F. Landsfeld, D. H. Pedreros, J. P. Verdin, J. D. Rowland, B. E. Romero, G. J. Husak, J. C. Michaelsen, and A. P. Verdin. 2014. “A Quasi-Global Precipitation Time Series for Drought Monitoring.” US Geological Survey Data Series 832, ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/docs/USGS-DS832.CHIRPS.pdf.

SPEI
Hazard measure: Evapotranspiration
Full name: Standardized Precipitation-Evapotranspiration Index
Description: The SPEI is a multi-scalar drought index based on climatic data. It is a ratio of actual to potential evapotranspiration.
Temporal coverage: 1901–2018
Spatial coverage: Global
Resolution: 0.5° (~55 km)
Output information: Z-values, monthly
Data set source: https://spei.csic.es/
Data set reference/citation: Sergio M. Vicente-Serrano, Instituto Pirenaico de Ecología, Zaragoza, Spain. Santiago Beguería, Estación Experimental de Aula Dei, Zaragoza, Spain.

WRSI
Hazard measure: Availability of water for crops
Full name: Water Requirement Satisfaction Index
Description: This spatially explicit index is an indicator of crop performance based on the availability of water to the crop during a growing season.
Temporal coverage: 2000–2021
Spatial coverage: Regional
Resolution: 0.1° (~11 km)
Output information: WRSI per raster cell; current WRSI data give range of 0–100 percent of total
Data set source: https://edcftp.cr.usgs.gov/project/fews/dekadal/
Data set reference/citation: Verdin, J., and R. Klaver. 2002. “Grid-cell-based Crop Water Accounting for the Famine Early Warning System.” Hydrological Processes 16 (8): 1617–30. DOI: 10.1002/hyp.1025.
Additional information/notes: Range of WRSI values: < 50: Failure; 50–60: Poor; 60–80: Mediocre; 80–95: Average; 95–99: Good; 99–100: Very Good; 0: N/A. The following values are flags (and not specific WRSI values in the normal range 0–100): 253: No start (late); 254: Yet to start.

SWI10
Hazard measure: Soil moisture
Full name: Average of Soil Water Index over a 10-day period
Description: The Soil Water Index quantifies the moisture condition at various depths in the soil. It is mainly driven by precipitation via the process of infiltration. Various T-values are available: 1, 5, 10, 15, 20, 40, 60, 100. The T-value simulates the infiltration time into deeper soil layers and hence indicates qualitatively the depth of the SWI estimate. T40 was selected as being generally representative, based on expert advice.
Temporal coverage: 2007–2021
Spatial coverage: Global
Resolution: 0.1° (~11 km)
Output information: An index describing the relation between surface soil moisture and profile soil moisture as a function of time (percent)
Data set source: https://land.copernicus.eu/global/products/swi
Data set reference/citation: Copernicus Service information 2018.

NDVI
Hazard measure: Vegetation (greenness)
Full name: Normalized Difference Vegetation Index
Description: The Terra Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Indices Monthly (MOD13C2) Version 6.1 product provides a vegetation index value at a per pixel basis. The NDVI is referred to as the continuity index to the existing National Oceanic and Atmospheric Administration Advanced Very High Resolution Radiometer (NOAA-AVHRR)-derived NDVI.
Temporal coverage: 2000–2021
Spatial coverage: Global
Resolution: 0.05° (~5.5 km)
Output information: Vegetation density, monthly
Data set source: https://lpdaac.usgs.gov/products/mod13c2v061/
Data set reference/citation: Didan, K. 2021. MODIS/Terra Vegetation Indices Monthly L3 Global 0.05Deg CMG V061 [data set]. NASA EOSDIS Land Processes DAAC. Accessed 2021-06-24 from https://doi.org/10.5067/MODIS/MOD13C2.061.

Source: World Bank compilation.

Table B.2.
Definitions used for household descriptive variables from country surveys

| Country | Land ownership (0 if none, 1 if any) | Education |
|---|---|---|
| Ethiopia | Ownership of land plots | Literacy |
| Lesotho | Ownership or cultivation of land | Level: primary complete |
| Malawi | Land area cultivated | Can read and write |
| Mauritania | Hectares owned | Literacy |
| Mozambique | Number of plots | Can read and write |
| Niger | Land area cultivated | Level: any |
| Nigeria | Land area cultivated | Level: primary complete |
| Zambia | Land area cultivated | Level: primary complete |
| Zimbabwe | Ownership of any arable land | Level: primary complete |

Source: World Bank.
Note: Country-specific definitions for land ownership and education are based on survey information.

Appendix C. Hazard Simulations

The Centre for Disaster Protection and the World Bank asked Verisk’s AIR Worldwide (AIR) to develop a series of stochastic catalogs of relevant climate variables in Malawi. These would support the ongoing effort to develop a standardized approach to ex ante estimation of the welfare impacts of moderate and severe droughts in SSA, complementing the ongoing empirical analysis. The catalogs would allow a forward-looking assessment of possible future events that goes beyond reliance on the history of past events (i.e., would allow a more robust probabilistic view of future events). AIR’s methodology, data sources (table C.1), data validation, and results are set out in Blanchard and Sousa (2021) and summarized below, with extracts largely taken directly from the paper.

Developing the Stochastic Catalogs for Malawi

Under the scope of this project, AIR developed stochastic catalogs of three drought-related variables for Malawi: precipitation, soil moisture, and vegetation index. The 10,000-year stochastic catalog produced for each variable includes 10,000 samples of annual time histories for the variable of interest, which all reflect conditions that are likely under the current (and near-current) climate.

Table C.1.
Data sets used to generate the stochastic catalogs

| Variable | Description | Data set | Resolution, frequency, period covered | Source |
|---|---|---|---|---|
| Atmospheric moisture | Atmospheric reanalysis of the global climate | ECMWF Reanalysis v5 (ERA5) | 0.25°, three-hourly, 1979–2018 | ECMWF |
| Precipitation | Atmospheric precipitation estimates from rain gauge and satellite observations | CHIRPS 2.0 | 0.05°, dekadal (10-day), 2000–2020 | CHIRPS |
| Soil moisture | Soil Water Index (SWI) at a depth of 40 cm | SWI10-40 | 0.1°, dekadal, 2007–present | Copernicus Global Land Service |
| Vegetation index | Normalized Difference Vegetation Index | MOD13C2 Version 6 | 0.05°, monthly, 2000–2020 | MODIS |

Source: Blanchard and Sousa 2021.
Note: CHIRPS = Climate Hazards Group InfraRed Precipitation with Station; ECMWF = European Centre for Medium-Range Weather Forecasts; MODIS = Terra Moderate Resolution Imaging Spectroradiometer.

The approach is based on random sampling of stochastic perturbations of historical data to generate stochastic realizations of precipitation values over Malawi. The stochastic catalogs of NDVI and SWI are then derived from the stochastic precipitation catalog.

More specifically, the 10,000-year stochastic precipitation catalog has a spatial resolution of 0.05° and dekadal temporal resolution. It was developed using random sampling of stochastic perturbations of atmospheric moisture data from ERA5 (European Centre for Medium-Range Weather Forecasts Reanalysis v5) between 1979 and 2018, at a native resolution of 0.25° and three-hour intervals (as an input to the process). While CHIRPS data could not be used in place of ERA5 at the modeling stage, they were used to augment a climatological adjustment to reflect “real” rainfall readings and climatology compared to ERA5. The result is 250 independent stochastic realizations of what a 40-year precipitation time series might be (250 × 40 = 10,000 simulated years).
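The structure of such a catalog can be illustrated with a toy example. The sketch below is not AIR's method, which perturbs ERA5 atmospheric moisture fields and then adjusts to CHIRPS climatology; it simply multiplies a hypothetical 40-year record of annual totals by random lognormal factors. The function name, the perturbation spread `sigma`, and the input data are all assumptions made for illustration.

```python
import numpy as np


def toy_precipitation_catalog(historical, n_realizations=250, sigma=0.15, seed=0):
    """Build a toy stochastic catalog by perturbing historical annual totals.

    historical : (n_years, n_cells) historical precipitation totals.
    sigma      : ASSUMED spread of the multiplicative perturbation.
    Returns an array of shape (n_realizations, n_years, n_cells).
    """
    historical = np.asarray(historical, dtype=float)
    rng = np.random.default_rng(seed)
    n_years = historical.shape[0]

    # One lognormal factor per realization-year, shared by all grid cells so
    # that each year's spatial pattern is preserved. The mean parameter is
    # chosen so that the factor has expected value 1.
    factors = rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma,
                            size=(n_realizations, n_years, 1))
    return historical[None, :, :] * factors


# Hypothetical 40-year record on 3 grid cells (values in mm).
rng = np.random.default_rng(1)
historical = rng.uniform(600.0, 1200.0, size=(40, 3))
catalog = toy_precipitation_catalog(historical)

# 250 realizations x 40 years can be read as 10,000 simulated years.
simulated_years = catalog.reshape(-1, catalog.shape[-1])
```

Sharing one factor across grid cells is a deliberate simplification that keeps spatial correlations intact; a realistic catalog would also need to preserve temporal correlations and local empirical distributions, as discussed under Validation.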
The pseudo-precipitation values generated from ERA5 (prior to the adjustment for CHIRPS) were subsequently used to generate the SWI and NDVI catalogs, using statistical relationships built on the empirical correlations known to exist between precipitation and these two variables. Each stochastic realization of NDVI and SWI was thus obtained from (and is consistent with) the stochastic realizations of precipitation.

Validation

Stochastic catalogs are designed to provide a view of hazard that goes beyond the data available in the historical record. In developing such catalogs, care must be exercised to avoid compromising the ability to realistically represent the physical processes governing the natural phenomena being modeled.

Several diagnostic figures were generated using the rainfall catalog to show that the catalog created statistically robust stochastic realizations. Specifically, these realizations (i) preserved the intrinsic properties in terms of spatial and temporal correlations; (ii) were consistent with the empirical distributions of precipitation at each individual location; and (iii) produced a realistic view of hazard, consistent with historical observations while allowing for the creation of new extremes. Validation diagnostics included several comparisons:

- The synthetic rainfall time series was compared with historical data (CHIRPS satellite rainfall data and gauge rainfall data) across different spatial aggregations; this included comparing the distribution of monthly rainfall averages for each month in the stochastic catalog (figure C.1) with the CHIRPS data set. The result shows that the stochastic catalog captures historical trends well and extends the tails of the monthly distributions, suggesting the presence of previously unseen extremes that are consistent with the physical processes at play.
- The cumulative distribution functions (CDFs) for historical and simulated hazards were compared to assess the relationship between frequency of an event’s occurrence and its severity.
- The spatial distributions of rainfall averaged over various time scales were compared.

Figure C.1. Distribution of monthly rainfall averages aggregated at the country and regional levels
Source: Blanchard and Sousa 2021.
Note: PRCP = precipitation. Each illustrated monthly distribution extends over its overall value range, with the top, middle, and bottom notches denoting the 90 percent, 50 percent, and 10 percent quantiles, respectively. The inset shows the driest months only (June through September).

For the soil moisture catalog, in addition to considering spatial-temporal correlations and consistency with individual empirical distributions, a key objective was to ensure that the correlations between the variables, as obtained from the stochastic time series, were consistent with their empirical counterparts. Diagnostic figures (figure C.2) were produced that highlighted good agreement between historical and simulated data in terms of both frequency and amplitude; they also showed that seasonal trends and associated spatial patterns were adequately captured in the catalogs.

In summary, the stochastic catalogs produce a realistic view of hazard, which is consistent with historical observations while allowing for the creation of new extremes.

Figure C.2. Distribution of monthly SWI averages aggregated at the country and regional levels
Source: Blanchard and Sousa 2021.
Note: SWI = Soil Water Index. For each illustrated monthly distribution, the top, middle, and bottom notches denote the 90 percent, 50 percent, and 10 percent quantiles, respectively.

Appendix D.
Drought Hazard: Stochastic versus Historical Perspectives

A comparison of exceedance probability (EP) curves derived from the empirical and stochastic time series shows the inability of empirical data to capture all possible events that may occur in the future, including extreme events. This is evident from figure D.1, where the historical record of precipitation (represented by the empirical EP curve) is but one of a range of possible future outcomes (represented by the range of stochastic EP curves). For SWI (figure D.2), in some cases the historical EP curve falls outside the range of stochastic results. This is expected, however, as the SWI catalog is derived from the stochastic precipitation catalog rather than from the empirical SWI data directly. This approach was used so that the correlations between the stochastic SWI and precipitation time series would robustly reflect their empirical correlations.

Figure D.1. EP curve for total precipitation aggregated at the country level for the entire year
Source: Blanchard and Sousa 2021.
Note: EP = exceedance probability; PRCP = precipitation. Horizontal axis is shown in logarithmic scale to highlight return periods of up to 20 years, which are more relevant to this application.

Figure D.2. EP curve for total SWI aggregated at the country level for the entire year
Source: Blanchard and Sousa 2021.
Note: EP = exceedance probability; SWI = Soil Water Index. Horizontal axis is shown in logarithmic scale to highlight return periods of up to 20 years, which are more relevant to this application.

Comparing the stochastic and historical time series of z-scores over the last 20 years for rainfall shows that the stochastic catalog accounts for a range of future outcomes that would be neglected in an analysis relying solely on the historical record. The reduced range of stochastic variability for the SWI catalog is the result of the reduced variability in the historical time series of this variable.
(Figure D.3 shows only the last eight years for visual clarity, but these are representative of the entire range.)

Figure D.3. Stochastic catalog versus historical z-score time series of precipitation and SWI
Source: Blanchard and Sousa 2021.
Note: PRCP = precipitation; SWI = Soil Water Index.

This analysis highlights the ability of the stochastic approach to complement the historical view by producing a realistic range of possible future outcomes that goes beyond what can be inferred from the historical record.
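The comparisons described in this appendix can be sketched in a few lines. The example below is illustrative only and is not the code behind figures D.1 to D.3: it assumes a Weibull plotting position for the empirical probabilities, treats drought as the low tail of annual totals, and uses hypothetical function names and input values.

```python
import numpy as np


def drought_ep_curve(annual_totals):
    """Empirical probability and return period of totals at or below each value."""
    x = np.sort(np.asarray(annual_totals, dtype=float))  # driest year first
    n = x.size
    prob = np.arange(1, n + 1) / (n + 1)  # Weibull plotting position (assumed)
    return x, prob, 1.0 / prob


def stochastic_envelope(catalog_totals):
    """Lowest and highest value at each rank across realizations.

    catalog_totals : (n_realizations, n_years) annual totals.
    """
    ranked = np.sort(np.asarray(catalog_totals, dtype=float), axis=1)
    return ranked.min(axis=0), ranked.max(axis=0)


def z_scores(series):
    """Standardize a series by its own mean and sample standard deviation."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std(ddof=1)


# Hypothetical four-year record of country-level annual precipitation (mm).
totals, prob, return_period = drought_ep_curve([900.0, 700.0, 800.0, 1000.0])

# Hypothetical catalog of two realizations, three years each.
low, high = stochastic_envelope([[3.0, 1.0, 2.0], [6.0, 5.0, 4.0]])
```

Plotting the historical curve against the envelope of the stochastic curves shows whether the historical record sits inside the range of simulated outcomes at each return period.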