Policy Research Working Paper 10768

A Data-Driven Approach for Early Detection of Food Insecurity in Yemen’s Humanitarian Crisis

Steve Penson, Mathijs Lomme, Zacharey Carmichael, Alemu Manni, Sudeep Shrestha, Bo Pieter Johannes Andrée

Poverty and Equity Global Practice
May 2024

Abstract

The Republic of Yemen is enduring the world’s most severe protracted humanitarian crisis, compounded by conflict, economic collapse, and natural disasters. Current food insecurity assessments rely on expert evaluation of evidence with limited temporal frequency and foresight. This paper introduces a data-driven methodology for the early detection and diagnosis of food security emergencies. The approach optimizes for simplicity and transparency, and pairs quantitative indicators with data-driven optimal thresholds to generate early warnings of impending food security emergencies. Historical validation demonstrates that warnings can be reliably issued before sharp deterioration in food security occurs, using only a few critical indicators that capture inflation, conflict, and agricultural productivity shocks. These indicators signal deterioration most accurately at five months of lead time. The paper concludes that simple data-driven approaches show a strong capability to generate reliable food security warnings in Yemen, highlighting their potential to complement existing assessments and enhance lead time for effective intervention.

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at spenson@worldbank.org and bandree@worldbank.org.
The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Steve Penson (a, *), Mathijs Lomme (a, b), Zacharey Carmichael (a), Alemu Manni (c), Sudeep Shrestha (b), Bo Pieter Johannes Andrée (a, *) (footnote 1)

Keywords: Agriculture and Food Security, Crisis, Early Warning Systems, Food Price Analysis, Vulnerability, Economic Monitoring

JEL: C01, C14, C25, C53, O10

Footnote 1: (a) World Bank, (b) ACAPS, (c) FAO, (*) email: spenson@worldbank.org and bandree@worldbank.org. Funding by the World Bank’s Food Systems 2030 (FS2030) Multi-Donor Trust Fund program (grants TF0C0728 and TF0C0828) is gratefully acknowledged. This paper has been prepared as background to the Joint Monitoring Report (JMR), a multi-partner monitoring initiative in Yemen. We would like to thank the peer reviewers on the Quality Enhancement Review for the Yemen Data Driven Identification of Food Security Crises through the JMR, Alan Fuchs, Alexandra Christina Horst, and José Lopez; and the peer reviewers on the Decision Review meeting on Expansion of Real Time Food and Energy Price Monitor, Sergiy Zorya, Nick Haan, and Kamau Wanjohi, whose comments helped improve this paper. The authors would like to thank members of the JMR Core Development Team, including teams at ACAPS, UNICEF, FAO, WFP, WHO and the World Bank.
Thanks also to those who contributed to the contextual analysis and methodology development and review, including Francesca Marini, Artavazd Hakobyan, Faiza Hesham Hael Ahmed, Nic Parham, Maliha Hussein, Oleg Bilukha, Elijah Odundo, Ismail Kassim, Peter Hailey, Dan Maxwell, Rebecca Semmes, Riham Abuismail, Fawad Raza, Gaurav Singhal, Hussein Gadain, Andres Chamorro, Alia Jane Aghajanian, Olaf De Groot, Emily Henderson, Patrick Vercammen, Felix Leger and Seb Fouquet. Finally, we would like to thank participants in the Joint Monitoring Report workshops, including the Yemen Social Fund for Development (SFD), as well as humanitarian, development, and donor partners who have supported the research throughout. This paper reflects the views of the authors and does not reflect the official views of the World Bank, its Executive Directors, or the countries they represent.

1 Introduction

Despite considerable humanitarian assistance, the food crisis in the Republic of Yemen remains one of the world's most dire humanitarian catastrophes. Yemen is currently home to the fifth-largest population in the world experiencing crisis levels of acute food insecurity (FSIN and Global Network Against Food Crises, 2023). The latest country-wide multi-partner Integrated Food Security Phase Classification (IPC) assessment, conducted at the end of 2022 (IPC, 2022), estimated that 17 million people were in a food crisis or worse, meaning that the population was unable to meet minimum dietary needs without resorting to irreversible coping strategies. Food security can be assessed across four dimensions, requiring that food is available, that individuals can access this food, that food supply and access are stable, and that food provides adequate nutrition (Food and Agriculture Organization (FAO), 2008). Acute food insecurity arises when these dimensions are severely impaired.
The food security challenges faced by Yemen are significant, and many interrelated factors can worsen acute food insecurity. Humanitarian disasters stem from intricate connections among conflict, poverty, extreme weather, climate, and food price shocks (Misselhorn, 2005; Headey, 2011; Singh, 2012; D’Souza and Jolliffe, 2013), exacerbated by enduring structural factors (Maxwell and Fitzpatrick, 2012), and ultimately lead to high levels of acute malnutrition and mortality in vulnerable populations. The prevalence of severe acute malnutrition in Yemen has increased the population's vulnerability to health issues and diseases, such as cholera, stunting, wasting, and a variety of both physical and mental health consequences. Prior to the 2015 escalation of conflict, Yemen already had one of the world’s highest malnutrition levels. The situation has been aggravated by escalating conflict and economic decline, and more recently by the overwhelming impact of the COVID-19 pandemic and the war in Ukraine. During this time, many aid projects, including emergency food assistance, WASH services, and malnutrition treatment programs that are highly dependent on continued funding from donor partners (UNICEF, 2020), have been disrupted periodically by funding shortfalls.

Malnutrition has particularly severe impacts on children, leading to long-term declines in cognitive development and potentially enduring health issues. The assessment of Black et al. (2013) on maternal and child undernutrition in low-income and middle-income countries revealed that nearly half of child deaths worldwide were linked to undernutrition. Gatti et al. (2023) estimated that severe food price shocks in the MENA region in 2021-2022 resulted in hundreds of thousands of children facing long-term consequences, including stunted growth.
Additionally, beyond immediate health consequences and loss of life, severe food crises inflict lasting harm on the children of affected families, resulting in intergenerational adverse health and educational outcomes (Galler & Barrett, 2001; Veenendaal et al., 2013; Galler & Rabinowitz, 2014; Asfaw, 2016).

Recognizing these costs, the international community has responded to Yemen’s food crisis with enormous humanitarian aid. According to World Bank data on aid and official development assistance, Yemen received 8 billion dollars in official development assistance alone in 2018. At the $1.90 poverty line, this was sufficient to pay for almost 90% of the annual expenses of the population FAO estimates to have been malnourished that year. In a comprehensive review of 2020 aid programs, Ghorpade and Ammar (2021) estimated that the combined reach of humanitarian and development programs was enough to cover, and in fact exceed, the entire Yemeni population.

Alongside aid, there has recently been a growing emphasis on prevention and targeted intervention, as it is often more cost-effective and sustainable to prevent humanitarian catastrophes than solely to respond to them (Meerkatt, Kolo, & Renson, 2015; Mechler, 2016). Moving from reactive to proactive aid requires investment in close monitoring to enable early detection and rapid response when new food security risks emerge (Maxwell & Hailey, 2020). Different early warning and food security information systems already exist to support and inform humanitarian and development programming, including FEWS NET and the Integrated Food Security Phase Classification (IPC). To date, IPC analyses have provided the primary and common means for tracking food insecurity risks. These major analyses, however, require significant resources and time to conduct and are typically updated on an annual or, at best, a semi-annual basis.
Large-scale household surveys are infrequent due to access constraints, security issues, and lack of funding. For instance, the World Bank has not carried out a Living Standards Measurement Study and poverty assessment in Yemen since the war broke out in 2015. While comprehensive analyses are vital for informing programming, targeted humanitarian interventions require more frequent monitoring to mitigate potentially fast-moving developments. The need to enhance the current food security monitoring processes in Yemen is well documented and highlighted, for instance, by the IPC Famine Review (Maxwell et al., 2022).

To contribute to an improved capacity to predict when, where and how food insecurity escalates, this paper explores data-driven approaches for the early detection and diagnosis of food insecurity emergencies in Yemen. The approach optimizes for simplicity and transparency, and pairs quantitative indicators with data-driven optimal thresholds to generate early warnings of impending food insecurity emergencies.

Data-driven approaches to forecasting impending emergencies were pioneered by, for instance, Mellor (1986), who emphasized economic vulnerabilities, crop failures, and price signals as key indicators of famine. This provides a modeling template that remains in place today. Further insights into price signals and economic deterioration specifically were provided by Seaman and Holt (1980), Cutler (1984) and Khan (1994) in the context of the Ethiopian famine of 1972-1974 and the 1984-1985 famine in Niger, and by Andrée (2022), who forecasts severe food insecurity in 191 countries based on macro-economic data. More recently, machine learning and time series methods have been employed for prediction at a granular level, as demonstrated by Andrée et al., who predicted future local food crises using data on food prices, agricultural productivity shocks, and conflict. Using the same data, Wang et al.
model transitions across lower and higher food insecurity phases. Related approaches have since been developed to provide machine-learning driven high-frequency monitoring of food security (Martini et al., 2022).

The proposed methodology in this paper builds upon the existing food security modeling literature, with a specific focus on parsimony, simplicity, and transparency. The proposed approach is a response to recent calls emphasizing the importance of simplicity and transparency in food security modeling (e.g., Baylis et al., 2021; Zhou et al., 2022; McBride et al., 2022). The approach is also informed by the indicator-driven alert system developed by Somalia’s Food Security and Nutrition Analysis Unit (FSNAU), which provides automated monitoring capabilities. The proposed methodology extends this system by optimizing alert thresholds at different levels of tolerance for false alerts, benchmarking different indicators, and selecting optimal approaches based on historical validation. This results in a lightweight but effective food security monitoring system that can cater to different targeting strategies simultaneously and supplement existing food insecurity assessments.

Historical validation of the warnings demonstrates that food insecurity emergencies can be reliably detected before they occur, using only a few critical indicators to capture inflation, conflict, and agricultural productivity shocks. The paper concludes that the simple data-driven approaches show a strong capability to detect impending food security emergencies, highlighting their potential to complement existing assessments and enhance lead time for effective, timely, and proactive response.

The paper is structured as follows. Section 2 introduces the data. Section 3 develops a framework for the validation and calibration of indicators and thresholds, with particular attention to balancing false positives and false negatives. Section 4 presents key results, and section 5 concludes.
Additional results are found in the supplementary appendices.

2 Data

2.1 Target variable: Emergency outbreaks

This paper aims to predict transitions into critical states of food insecurity with sufficient lead time for action, solely using readily observable indicators. The strategy is to pair the indicators with optimized thresholds to issue reliable warnings before major escalations in food insecurity occur. This work particularly draws on the World Bank's research on Predicting Food Crises (Andrée et al., 2020).

Official IPC data is only available from 2018 to the present. Since the objective is to calibrate indicators to a historical time series of food insecurity situations, historical IPC-compatible data was gathered from FEWS NET covering periodic assessments conducted in 333 districts in Yemen from October 2014 to July 2023. The data quantifies food insecurity using the IPC-compatible analytical framework categorizing the severity of food insecurity and recommending risk mitigation policies (Hillbruner and Moloney (2012) provide a review of the process). The IPC scale distinguishes five phases of food insecurity: (1) minimal/none, (2) stressed, (3) crisis, (4) emergency, and (5) famine/catastrophe. When food insecurity reaches crisis levels, the IPC scale advises a significant policy shift. Specifically, for conditions of stress (2) and below, the focus is on risk management, while at crisis level (IPC Phase 3) and above, it shifts to urgent action to mitigate outcomes (IPC, 2021).

FEWS NET IPC data are reported at a sub-national livelihood zone level. To obtain a consistent time series, the data were mapped to a comparable district level using a spatial overlay, and population density was calculated using Meta 2018 high resolution population density maps (Meta, 2024). Table 1 shows summary statistics.

Table 1: Summary statistics of FEWS NET IPC classifications from 2014 to 2023. Frequency of FEWS NET IPC observations: 8,646. Number of districts: 333.
| FEWS NET IPC phase adjusted for aid | Frequency (n=8,646) |
|---|---|
| 1 | 0 |
| 2 | 678 |
| 3 | 4213 |
| 4 | 3751 |
| 5 | 4 |

| FEWS NET IPC phase transition adjusted for aid | Frequency (n=8,464) |
|---|---|
| 1 to 2 | 0 |
| 2 to 3 | 194 |
| 3 to 4 | 444 |
| 4 to 5 | 4 |

| % of time each district spent in IPC Phase 3+ | Frequency (n=333) |
|---|---|
| 0% - 20% | 9 |
| 20% - 40% | 0 |
| 40% - 60% | 0 |
| 60% - 80% | 30 |
| 80% - 100% | 294 |

The summary statistics reveal several important insights:
• 92% of IPC observations were in IPC Phase 3+.
• 7% of IPC observations could be classified as IPC escalations when compared to the previous period.
• 88% of districts spent 80%-100% of the time in IPC Phase 3+.

Given that most observations fall into IPC Phase 3 or above, the focus of the application is on preventing the transition from IPC Phase 3 to 4. The goal is to issue warnings of impending IPC Phase 4, bolstering prevention efforts. Figure 1 shows the FEWS NET IPC phase distribution between 2014 and 2023.

Figure 1: FEWS NET IPC phase distribution. The phase data are net of humanitarian impacts; the values indicate the five phases of food insecurity: (1) minimal/none, (2) stressed, (3) crisis, (4) emergency, and (5) famine/catastrophe.

2.2 Food security risk indicators

To identify key indicators of worsening food security, datasets with comprehensive spatial and temporal coverage were reviewed. A thorough review of 26 available datasets was carried out to select food security risk indicators. These indicators were assessed against several criteria:
• Data quality: The data quality of the indicator.
• Relation to food security/nutrition: Whether the indicator is related to food security or nutrition.
• Risk/outcome indicator analysis: Whether the indicator is a risk indicator or an outcome indicator.
• Suitability: Whether the indicator is suitable for food security risk alert modeling based on expert consultation.

The goal is to leverage this data to spot signs of impending emergencies through historical comparison, issuing alerts at two specific thresholds.
These thresholds are adjusted to balance the trade-off between false positives and false negatives effectively. Table 2 highlights the food security risk indicators used by the analysis. These indicators align with food and nutrition security dimensions used by IPC: food access, food availability and food stability (IPC, 2021). Direct measures of nutrition and food utilization were not available.

Table 2: Food security indicators. For each indicator, the dimension corresponds to those recognized by the IPC framework. The method and window have been selected based on historical calibration. Alert and alarm indicate thresholds at which warnings are issued, with alarms indicating critical risks compared to alerts.

| Indicator | Description | Dimension | Method | Window | Alert | Alarm |
|---|---|---|---|---|---|---|
| Food prices | Average of top 5 performing food items (YER) | Access | Percentage change from exponential moving average (EMA) | 4 months | 7.0%-14.2% increase | >14.2% increase |
| Fuel prices | Average of petrol and diesel price (YER) | Access | Percentage change from moving average (MA) | 4 months | 15.8%-35.1% increase | >35.1% increase |
| Exchange rate | YER to USD exchange rate | Access | Relative Strength Index (RSI) | 8 months | 67.1-75.2 | >75.2 |
| Drought | Standardized Precipitation Index (SPI) | Availability | SPI | 1 month | -0.88 to -0.12 | <-0.88 |
| Conflict | District and neighboring districts averaged conflict fatalities | Stability | RSI of the EMA | 14-month RSI over 12-month EMA | 64.5-90.2 | >90.2 |
| Displacement | Summed displacements (from and to) | Stability | RSI of the EMA | 14-month RSI over 6-month EMA | 55.1-68.7 | >68.7 |

Below, each data source is discussed and the indicators constructed from it are detailed. Details on the formulas used to implement the different indicator methods are available in Annex I.

2.2.1 Food prices

In Yemen, the affordability of key food commodities is a critical indicator of food security levels.
Rising food prices, currency devaluation, disruptions in public salary payments, and diminished job opportunities have significantly decreased purchasing power, leaving more people unable to meet their basic needs (ACAPS, Mercy Corps, 2020; ACAPS, 2023).

Several food price indicators were explored to capture these drivers, including individual food item analysis, custom food item baskets and the humanitarian Standard Minimum Expenditure Basket (SMEB) (Cash and Markets Working Group Yemen, 2022). The analysis uses the World Bank’s Real Time Food Prices (RTFP) data (footnote 2), derived from WFP surveys, which capture monthly food prices at a district level (Andrée, 2021; Andrée, 2023a; Andrée, 2023b; Adewopo et al., 2024). This methodology integrates actual data and machine learning estimates to monitor food prices continuously in real time, filling in gaps where direct market data collection was not possible due to access issues.

Three methods were tested to analyze food prices: the percentage deviation from the Moving Average (MA) and the percentage deviation from the Exponential Moving Average (EMA), both across time-windows ranging from 1 to 12 months, and the Relative Strength Index (RSI), across time-windows ranging from 6 to 14 months. Both YER and USD food prices were modeled, but separating these as two indicators worked better.

In addition to the five food categories in the SMEB, prices for individual RTFP food items were analyzed. Results indicated a strong predictive power of certain food categories, specifically imported foods such as beans, millet, sorghum, sugar, and wheat flour. A 5-item basket consisting of these food items reached almost 10% lower loss than the SMEB basket. This resulted in our selection of the average price of these five food items as our target indicator. The optimal method and time-window found was the percentage deviation from a four-month EMA.

Footnote 2: RTFP: https://microdata.worldbank.org/index.php/catalog/study/WLD_2021_RTFP_v02_M
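The percentage deviation from a four-month EMA can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact formulas are given in Annex I, so the smoothing factor 2 / (window + 1), the seeding of the EMA with the first observation, and the comparison of each price with the EMA of the same month are all assumptions. The function names and the example prices are hypothetical; the 7.0% and 14.2% cut-offs are the food price alert and alarm thresholds listed in Table 2.

```python
def ema(values, window):
    """Exponential moving average, seeded with the first observation.

    Assumption: smoothing factor alpha = 2 / (window + 1).
    """
    alpha = 2.0 / (window + 1)
    smoothed = [float(values[0])]
    for value in values[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def pct_deviation_from_ema(prices, window=4):
    """Percentage deviation of each monthly price from its EMA."""
    trend = ema(prices, window)
    return [100.0 * (p - t) / t for p, t in zip(prices, trend)]


def warning_level(value, alert, alarm):
    """Map an indicator value to 'none', 'alert' or 'alarm'."""
    if value > alarm:
        return "alarm"
    if value >= alert:
        return "alert"
    return "none"


# Hypothetical basket price (YER) for one district: flat, then a jump.
prices = [100, 100, 100, 100, 115]
deviation = pct_deviation_from_ema(prices, window=4)
level = warning_level(deviation[-1], alert=7.0, alarm=14.2)
print(round(deviation[-1], 2), level)
```

Comparing the price with an EMA that already includes the current month dampens the measured deviation; a variant that lags the EMA by one month would react more strongly to the same price jump, so the choice matters when reusing the published thresholds.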
For detailed statistical outcomes, refer to Annex II.

2.2.2 Fuel prices

Fuel affordability remains a critical issue affecting food security in Yemen, with rising fuel prices increasing food distribution costs and, consequently, food prices (ACAPS, 2023). Yemen's dependence on fuel imports makes it susceptible to fluctuations in international oil prices, impacting internal fuel prices and driving up overland transportation costs and food prices. Additionally, higher fuel prices escalate the cost of living, pushing more people below the basic needs affordability threshold.

Modeling employed the World Bank’s RTEP data (footnote 3) (Andrée, 2021; Andrée, 2023a; Andrée, 2023c), derived from WFP surveys, capturing monthly fuel prices at the district level. For the fuel price analysis, the same methods were used as for the food price indicators. Individually, diesel and petrol showed high predictive accuracy compared with gas prices as well as all other indicators. To capture robust signals from the data, a fuel basket was specified consisting of equal parts petrol and diesel. The percentage deviation from the 4-month EMA emerged as the most effective method. For detailed statistical outcomes, refer to Annex II.

2.2.3 Exchange rate

Exchange rate volatility significantly influences food price changes, particularly in areas controlled by the Internationally Recognized Government (IRG) of Yemen. In January 2020, the Central Bank of Yemen (CBY) in Sana’a prohibited the use of new Yemeni rial (YER) banknotes issued by the IRG-controlled CBY in Aden, leading to a dual currency system. The printing of new YER banknotes by CBY Aden to finance the IRG's budget deficit has depreciated the YER against the USD, increasing the cost of goods and services for households. This depreciation affects purchasing power and, consequently, food security, underscoring the importance of continuous monitoring (ACAPS, 2023).
The new notes are not accepted in areas controlled by Ansar Allah (AA), resulting in a dual currency system with a pronounced inflation differential.

Two data sources for the exchange rate were considered: a Telegram source recording rates several times a week in both Aden and Sana’a, and the World Bank’s RTFX data (Andrée, 2021; Andrée, 2023a; Andrée, 2023d), derived from WFP surveys, which provides monthly averaged rates per governorate. The World Bank’s Real Time Exchange Rates (RTFX) data (footnote 4) offered more detail, better data coverage, and better modeling results, and was selected as the exchange rate indicator for the analysis. The same indicators were constructed as for the fuel and food prices. Analysis over 6 to 14-month time windows revealed an 8-month RSI as most effective in signaling escalation risks. For comprehensive statistical details, see Annex II.

2.2.4 Drought

Drought significantly impacts food security in Yemen, exacerbating water scarcity, reducing crop yields and livestock productivity, increasing reliance on expensive food imports, and intensifying the humanitarian crisis. It can also cause displacement and resource conflicts, further destabilizing the region. Imported food accounts for 83% of Yemenis' daily caloric intake (ACAPS, 2023), which makes drought a relatively less impactful food security risk than food prices or exchange rates. Drought nevertheless remains critical due to the role of agriculture in over half of households' employment and the significance of cash crops like Qat for income and expenditure, affecting household purchasing power (UNDP, 2022).

Footnote 3: RTEP: https://microdata.worldbank.org/index.php/catalog/study/WLD_2023_RTEP_v01_M
Footnote 4: RTFX: https://microdata.worldbank.org/index.php/catalog/study/WLD_2023_RTFX_v01_M

For drought analysis, rainfall data (Funk et al., 2015), Normalized Difference Vegetation Index (NDVI) (USGS, 2023), and Standardized Precipitation Index (SPI) (Guttman, 2007) were evaluated.
Various methods, including Z-score (mean and median), moving averages of the anomaly rates, and the SPI, were tested. Crop calendar information for Yemen was used (FAO-FSIS, Government of Yemen, 2018) to determine the critical months for rainfall in crop growth, and indicators were tested that fixed the signal at 0 in the non-crop months. Z-scores that took the crop calendar into account generally gave improved results. However, the best results came from the SPI method applied to year-round data, which outperformed the variant restricted to crop-growing months. The predefined general thresholds recommended for the SPI are -1.3 to indicate severe droughts and -1.6 to indicate extreme droughts. The optimized thresholds given by our model are higher (less negative) than the standard SPI thresholds. For detailed statistical outcomes, refer to Annex II.

2.2.5 Conflict

Conflict affects general security and the movement of people, crucial for agricultural and fisheries production and market access. Globally in 2017, 60% of undernourished people and 79% of stunted children lived in conflict-affected areas (FAO, IFAD, UNICEF, WHO, and WFP, 2017). Studies indicate that conflict's impact on food security and nutrition worsens with prolonged conflict and weak institutional response (Holleman, Jackson, Sanchez, & Vos, 2017), as seen in Yemen.

ACLED data, tracking monthly conflict incidents and fatalities at the district level, was utilized. Conflict intensity was gauged by fatalities and incidents per district. Incidents include battles, explosions/remote violence, or violence against civilians. The analysis considered conflict at both the district and surrounding district levels to capture exposure to nearby conflict, defined as occurring within either one or two levels of proximity. Methods tested for both direct and neighboring district conflict data included percentage deviations from MA and EMA, and the RSI of these EMAs.
For neighboring conflict, the averages over the neighboring districts were constructed first, before calculating the moving averages over time. The interpretation of this is that the RSI measures whether the EMA (compounding trend) of exposure to local/regional conflict is surging or easing.

Of the indicators tested, neighboring conflict within one level of proximity outperformed both local and two-level proximity indicators. The best performing was neighboring conflict fatalities using RSI over a 14-month period based on the EMA of the last 12 months. This indicator picks up on escalations, and peaks in reaction to the compounding impacts of sustained violence. Detailed statistical results are available in Annex II.

2.2.6 Displacement

Displacement amplifies demand for food and services in host areas and often signifies severe livelihood disruptions. Research indicates significant impacts of forced migration on wages, household income, consumption, wellbeing measures, and employment in both origin and destination communities (Calderón-Mejía & Ibáñez, 2016; Foged & Peri, 2016; Kreibaum, 2016; Maystadt & Duranton, 2019; George & Adelaja, 2022; Esen & Binatli, 2017; Ruiz & Silva, 2015). The UN International Organization for Migration's Displacement Tracking Matrix (IOM-DTM) provides monthly data on displacements to and from districts since 2014.

In modeling displacement, total displacements to, from, and the aggregate of both were examined, using the same analytical methods as for conflict modeling. The combined total of displacements to and from a district was chosen as the indicator, with the RSI over an EMA identified as the optimal method. The analysis employed a 6-month time window for the EMA calculation and a 14-month time window for the RSI calculation. For detailed statistical outcomes, refer to Annex II.
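The "RSI of the EMA" construction used for the conflict and displacement indicators can be sketched as below. This is an illustrative sketch only: the authors' formulas are in Annex I, so the EMA smoothing factor 2 / (window + 1) and the use of simple, unsmoothed sums of gains and losses in the RSI (rather than, for example, Wilder's smoothing) are assumptions. The function names and the example series are hypothetical.

```python
def ema(values, window):
    """Exponential moving average, seeded with the first observation.

    Assumption: smoothing factor alpha = 2 / (window + 1).
    """
    alpha = 2.0 / (window + 1)
    smoothed = [float(values[0])]
    for value in values[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def rsi(values, window):
    """Relative Strength Index over the last `window` month-on-month changes.

    RSI = 100 * gains / (gains + losses), which is the same quantity as
    100 - 100 / (1 + RS) with RS = gains / losses. Entries are None until
    enough history exists; a perfectly flat window maps to 50 (assumption).
    """
    result = [None] * len(values)
    for t in range(window, len(values)):
        changes = [values[i] - values[i - 1] for i in range(t - window + 1, t + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        total = gains + losses
        result[t] = 50.0 if total == 0 else 100.0 * (gains / total)
    return result


def rsi_of_ema(values, ema_window, rsi_window):
    """Smooth the raw series with an EMA, then take the RSI of that trend."""
    return rsi(ema(values, ema_window), rsi_window)


# Hypothetical monthly fatalities, already averaged over neighboring districts.
fatalities = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 4, 3, 2, 3, 5, 8, 12, 18, 25, 31, 40]
signal = rsi_of_ema(fatalities, ema_window=12, rsi_window=14)
print(signal[-1])
```

Values above 50 indicate that the smoothed trend has mostly been rising over the RSI window. The latest value would then be compared with the calibrated thresholds in Table 2, for example 64.5 to 90.2 for a conflict alert and above 90.2 for a conflict alarm.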
3 Methods

To set and test thresholds for food security indicators, a binary target variable was created from the IPC phases, adjusting for humanitarian impacts. Areas in crisis (IPC Phase 3) that would reach emergency status (IPC Phase 4) without aid were marked as emergencies. This approach aims to predict intervention needs rather than outcomes. Thus, IPC Phases ≤3 were coded as 0 (non-emergency) and ≥4 as 1 (emergency). A focused sample was then taken that highlights district escalations, targeting instances where the prior period's target was 0, ensuring only genuine food security escalations are tracked. Figure 2 displays the data. Thresholds for each indicator were then fine-tuned to best front-run these new food insecurity emergencies.

Figure 2: Distribution of the target variable. Left: distribution of the binarized indicator used to calibrate alerts. Right: distribution of escalation events, used to validate the early warning capability.

Notation-wise, this can be detailed as follows. The target variable $y$ takes the value 1 if a food security emergency is observed, defined as IPC Phase 4 or 5, and 0 otherwise. For convenience, 0 and 1 can be referred to generically as "class labels," with observations corresponding to food security emergencies being the "positive class." $y$ is defined as a column vector containing the class labels, encompassing all districts and months where an IPC rating was observed. $x$ is the food security indicator, defined as a corresponding vector with the indicator values used to generate warnings. $\hat{y}$ is defined as the corresponding column vector of binary predictions generated according to the following threshold rule:

$$\hat{y} = \begin{cases} 1 & \text{if } x > c \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

Here $c$ is defined as a numeric threshold. Warnings for time $t + 1$ are always generated with information available at time $t$. In other words, $c$ represents the threshold at which an observed value for any given indicator is high enough to predict that the IPC rating in the next time step will be at least IPC Phase 4.

The threshold is optimized (see Table 2 for values) to minimize a loss function that evaluates the ability to front-run transitions into emergencies by validating against a specific subset of data: district/month observations with an IPC phase rating of 4 and above, provided the prior rating was below 4. More specifically, let $y^{*}$ and $\hat{y}^{*}$ be defined as the sub-vectors of $y$ and $\hat{y}$ that include the entries preceded by non-emergencies. A conformable vector of ones, denoted $\iota$, and a scalar weight, denoted $w$, were used to evaluate predictive performance using this prediction loss function:

$$L = w\,\frac{y^{*\prime}(\iota - \hat{y}^{*})}{y^{*\prime} y^{*}} + (1 - w)\,\frac{(\iota - y^{*})^{\prime}\,\hat{y}^{*}}{(\iota - y^{*})^{\prime}(\iota - y^{*})} = w\,\mathrm{FNR}^{*} + (1 - w)\,\mathrm{FPR}^{*} \qquad (2)$$

This loss function is a weighted average of the False Positive Rate (FPR) and False Negative Rate (FNR) measured against emergency escalations. $L$ is oriented so that lower values correspond to better predictions. In the special cases where $w$ is equal to the rate of occurrence of the positive class or equal to 0.5, the value of $1 - L$ equals the standard Accuracy and the common Balanced Accuracy rates, respectively. The weight factor $w$ determines whether the FNR or the FPR is penalized more. Values of $w$ that place increasing weights on false negatives are used to determine thresholds to generate warnings for different risk levels. The settings correspond to:
• $w$ = ⅓: failing to recognize an emergency (false negative) is half as costly as raising a false warning
• $w$ = ½: failing to recognize an emergency (false negative) is just as costly as raising a false warning
• $w$ = ⅔: failing to recognize an emergency (false negative) is twice as costly as raising a false warning

For each risk indicator, the weight factor $w$ = ½ was used to find an optimal method and time-window. The solution spaces per indicator were trimmed down following several filtering steps.
First, we filter out all solutions where the Loss ($w$ = ½) ≥ 0.5 and where the FPR ≥ 0.5. This excludes calibration results that are uninformative, or that attain a low average loss by raising warnings most of the time, potentially leading to warning fatigue. Solutions where FNR > 3 × FPR, and solutions where FPR > 2 × FNR, were removed as well, to avoid extremely unbalanced error profiles that may be undesirable for similar reasons. Finally, two method-specific filtering steps were followed to remove implausible solutions. First, for RSI indicators, solutions involving threshold values below 50, which indicate decreases instead of increases, were excluded. Second, for Z-score methods, solutions involving threshold values above 0.5 were excluded. The Z-scores are used for the drought indicators, where positive values indicate an increase relative to the benchmark values, which is the opposite of what these indicators are intended to capture. After these filtering steps, the best-performing method/time-window combination is selected, and it is then used to determine the two thresholds.

Having found an optimal method and time-window, weight factors of 1/3 and 2/3 were used to determine threshold values that minimize loss in two scenarios. For alerts indicating heightened risks, emphasis was on reducing false negatives ($w$ = ⅔), leading to more frequent warnings to catch potential emergencies early. For alarms indicating critical risks, the focus was on lowering false positives ($w$ = ⅓), resulting in fewer but more certain warnings. The two-step approach ensures that alarms are generated from the same distribution as alerts, only at higher risk levels. This means that alerts capture the majority of potential emergencies, while the subset of more conservative alarms captures those that are most likely to escalate.

4 Results

Two sets of results were produced from the analysis.
Initially, the indicators were examined individually to evaluate their predictive capabilities and to establish thresholds for generating alerts (signaling elevated risks) and alarms (indicating critical risks). Following the identification of optimized thresholds, a multivariate analysis was conducted. This analysis aimed to determine the most effective way to combine these optimized indicators, assess the incremental accuracy of each indicator, and ascertain the ideal number of indicators for monitoring. This approach clarifies each indicator's contribution to an overall risk assessment and supports the development of a robust model that can accurately predict areas at risk of worsening food insecurity.

4.1 Individual indicators

Table 3 presents the univariate validation results of the six main indicators from Table 2. The statistical validation underscores varied performance and differential sensitivity and specificity across indicators.

Recall that alarms are issued based on a more conservative calibration of thresholds compared to alerts. This trades off true positives against false positives, aiming to minimize the occurrence of false alarms while maintaining the capacity to detect true emergencies.

Among the indicators, price data emerge as the most important group for signaling deteriorating food security conditions. Food prices are the most predictive indicator of lower risk levels (alerts) and demonstrate a relatively balanced performance, with a moderate FPR of 0.27 and FNR of 0.33 for alerts, and a significantly reduced FPR of 0.05 for alarms, highlighting a successful calibration towards conservatism for critical risks. This is mirrored by the loss values, which shift from 0.31 for alerts to a more favorable 0.28 for alarms, indicating an effective rebalancing between sensitivity and specificity at critical risk levels.
Fuel prices produce similar alerts but emerge as the more predictive indicator for critical risks (alarms). Fuel price alarms result in a lower FNR (0.45 as opposed to 0.73 for food prices), with a higher FPR (0.16, compared to 0.05 for food price alarms). Together, this provides an improved calibration for critical risks (loss value of 0.26).

Table 3: Food security risk indicator validation. Summary of statistical validation for food security indicators, delineating calibrated thresholds for 'Alerts' and 'Alarms'. 'Alerts' are optimized to lower false negatives, enhancing the frequency of warnings for early emergency detection. 'Alarms' focus on reducing false positives to ensure the reliability of critical risk warnings, facilitating targeted and timely responses.

| Metric | Food prices, Alert | Food prices, Alarm | Fuel prices, Alert | Fuel prices, Alarm | Exchange rates, Alert | Exchange rates, Alarm |
|---|---|---|---|---|---|---|
| Method | Percentage of EMA | Percentage of EMA | Percentage of MA | Percentage of MA | RSI | RSI |
| True positives | 298 | 121 | 286 | 246 | 321 | 248 |
| False positives | 1,144 | 234 | 1,196 | 687 | 1,923 | 1,464 |
| True negatives | 3,131 | 4,041 | 3,079 | 3,588 | 2,352 | 2,811 |
| False negatives | 146 | 323 | 158 | 198 | 123 | 196 |
| Kappa | 0.20 | 0.24 | 0.18 | 0.26 | 0.10 | 0.09 |
| Accuracy | 0.73 | 0.88 | 0.71 | 0.81 | 0.57 | 0.65 |
| Balanced Accuracy | 0.70 | 0.61 | 0.68 | 0.70 | 0.64 | 0.61 |
| F1 Score | 0.32 | 0.30 | 0.30 | 0.36 | 0.24 | 0.23 |
| Precision | 0.21 | 0.34 | 0.19 | 0.26 | 0.14 | 0.14 |
| FPR | 0.27 | 0.05 | 0.28 | 0.16 | 0.45 | 0.34 |
| FNR | 0.33 | 0.73 | 0.36 | 0.45 | 0.28 | 0.44 |
| Loss (Alert, w=2/3; Alarm, w=1/3) | 0.31 | 0.28 | 0.33 | 0.26 | 0.33 | 0.38 |

| Metric | Drought, Alert | Drought, Alarm | Conflict, Alert | Conflict, Alarm | Displacement, Alert | Displacement, Alarm |
|---|---|---|---|---|---|---|
| Method | SPI | SPI | RSI of EMA | RSI of EMA | RSI of EMA | RSI of EMA |
| True positives | 258 | 129 | 228 | 105 | 212 | 129 |
| False positives | 1,492 | 647 | 1,217 | 557 | 952 | 548 |
| True negatives | 2,783 | 3,628 | 3,058 | 3,718 | 3,323 | 3,727 |
| False negatives | 186 | 315 | 216 | 339 | 232 | 315 |
| Kappa | 0.10 | 0.10 | 0.11 | 0.09 | 0.15 | 0.13 |
| Accuracy | 0.64 | 0.80 | 0.70 | 0.81 | 0.75 | 0.82 |
| Balanced Accuracy | 0.62 | 0.57 | 0.61 | 0.55 | 0.63 | 0.58 |
| F1 Score | 0.24 | 0.21 | 0.24 | 0.19 | 0.26 | 0.23 |
| Precision | 0.15 | 0.17 | 0.16 | 0.16 | 0.18 | 0.19 |
| FPR | 0.35 | 0.15 | 0.28 | 0.13 | 0.22 | 0.13 |
| FNR | 0.42 | 0.71 | 0.49 | 0.76 | 0.52 | 0.71 |
| Loss (Alert, w=2/3; Alarm, w=1/3) | 0.40 | 0.34 | 0.42 | 0.34 | 0.42 | 0.32 |

Exchange rates, analyzed through RSI, exhibit the highest FPR among all indicators, reaching 0.45 for alerts and decreasing to 0.34 for alarms. Despite this reduction, the high initial FPR points to a significant over-triggering tendency, potentially leading to alarm fatigue. Nevertheless, its loss values, from 0.33 for alerts to 0.38 for alarms, indicate moderate predictive value, with room for improving specificity. Our further analysis shows that the indicator performs much better when taking a North/South divide into consideration.

Drought indicators, utilizing the SPI at default thresholds, showed very low FPRs at both the alert (0.05) and alarm (0.03) levels, suggesting a very conservative approach that unfortunately results in a high FNR of 0.94 for alerts and 0.96 for alarms. This extreme conservatism, reflected in its loss values (0.64 for alerts and 0.34 for alarms), suggests standard threshold values may be under-predicting true risks, limiting their utility in timely emergency responses. Optimizing the thresholds improved the results considerably, lowering the alert loss to 0.40 and keeping the alarm loss at 0.34 with a more balanced error mixture, and reaching loss values similar to those of the conflict and displacement indicators.

Conflict and displacement indicators, employing the RSI of the EMA of local and neighboring conflict fatalities, highlight a different challenge. With notably high FNRs at the alert level (0.49 for conflict and 0.52 for displacement), which increase to 0.76 and 0.71 for alarms respectively, these indicators show a propensity to under-predict emergencies. Their loss values reflect this, with a significant reduction in predictive error from alerts to alarms for conflict (from 0.42 to 0.34) and displacement (from 0.42 to 0.32), driven by lower FPRs and better performance at the critical risk level.
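The error rates and loss values in Table 3 follow directly from the confusion counts. As a worked check, the short Python sketch below recomputes the food price figures from the counts reported in the table, using the weighted loss from Section 3. The function names are illustrative only.

```python
def error_rates(tp, fp, tn, fn):
    """Return (FPR, FNR) from confusion-matrix counts."""
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    return fpr, fnr


def weighted_loss(fpr, fnr, w):
    """Weighted average of error rates: w * FNR + (1 - w) * FPR."""
    return w * fnr + (1.0 - w) * fpr


# Food price counts as reported in Table 3.
alert_fpr, alert_fnr = error_rates(tp=298, fp=1144, tn=3131, fn=146)
alarm_fpr, alarm_fnr = error_rates(tp=121, fp=234, tn=4041, fn=323)

alert_loss = weighted_loss(alert_fpr, alert_fnr, w=2 / 3)  # alerts use w = 2/3
alarm_loss = weighted_loss(alarm_fpr, alarm_fnr, w=1 / 3)  # alarms use w = 1/3

# Balanced accuracy equals one minus the loss at w = 1/2.
alert_balanced_accuracy = 1.0 - weighted_loss(alert_fpr, alert_fnr, w=0.5)

print(round(alert_fpr, 2), round(alert_fnr, 2), round(alert_loss, 2))
print(round(alarm_fpr, 2), round(alarm_fnr, 2), round(alarm_loss, 2))
print(round(alert_balanced_accuracy, 2))
```

The rounded outputs match the food price columns of Table 3: an alert FPR of 0.27, FNR of 0.33, and loss of 0.31, an alarm FPR of 0.05, FNR of 0.73, and loss of 0.28, and an alert Balanced Accuracy of 0.70.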
To examine the impact of slight adjustments in thresholds, kernel density plots for the six indicators were generated (Figure 3), based on continuous values, with orange lines for alert triggers and red lines for critical alarms. The plots reveal that alerts trigger more frequently, and allow some deterioration in indicator values to occur before escalating to alarms.

Figure 3: Kernel density plots for the six indicators. Orange line = alert. Red line = alarm. Kernel densities are estimated smooth approximations of the distributions and may indicate some mass at impossible values, such as RSI values outside of the 0-100 range.

Figure 4: Historic food security warnings 2009 to 2024. The graph displays the percentage of districts for which alerts and alarms are issued. Alerts are plotted in a cumulative sense in that they remain in place when the risks transition into alarms. Annex III contains a detailed timeline outlining critical food security events from 2014 to 2023, as compiled by food security experts. Accompanying this timeline are enlarged versions of the figure below, focusing on specific periods to provide clearer insights.

Figure 4 illustrates the nationwide historical distribution of districts receiving alerts and alarms over time, indicating that alarms, which signal higher risk levels, are generally issued more conservatively and follow after alerts have already indicated preceding risk levels. Several periods stand out. Notably, the period of 2010-2011 was marked by a sharp currency devaluation reflecting Yemen's severe internal conflict and the onset of the Arab Spring protests. The subsequent period from 2011 to 2014 appears relatively stable, with a sporadic issuance of drought alerts. It is crucial to acknowledge that displacement data only begin in 2014, and conflict data in 2015, implying that the full spectrum of potential alerts and alarms prior to these years is not captured.
From 2014 onward, the displacement and conflict data contribute to a stark rise in the issuance of alerts and alarms, with the food and fuel price data highlighting additional risks during the Saudi-led port blockade in 2015. A significant increase in currency devaluation and inflation from 2017, coupled with escalating conflict, culminated in a peak in alarm issuance in 2018. Lastly, a widespread inflationary surge across all price indicators triggered another spike in alarms around 2022. While the major food insecurity periods are marked by escalations in economic and conflict indicators, the drought alarms do not center on any specific year, and the highest count of alarms occurs when economic and conflict risks coincide with periods of drought.

4.2 Multivariate indicator analysis

The univariate validation exercise confirmed the relevance of each indicator while also underscoring the limited predictive capability of relying on any single one. In practice, monitoring multiple indicators raises the question of how best to integrate these measures and evaluate the relative value of additional indicators. Annex IV includes correlation matrices for the various indicators, revealing some correlations between different warnings. Notably, food and fuel prices show a stronger correlation with each other, whereas displacement and conflict warnings are more closely related to one another. This indicates that combining either a conflict or displacement indicator with food prices may yield a more comprehensive overview than pairing food and fuel price indicators.

To systematically evaluate this, the six indicators were analyzed using a Logit model against the target indicator. The analysis focused on how the integration of individual indicators enhanced the model's performance in predicting the target indicator. To explore the incremental accuracy of each indicator, we conducted (inverse) Recursive Feature Elimination (RFE).
Leveraging the outcomes from the Generalized Linear Models (GLM), we ranked the indicators by their importance, with the most influential indicator positioned first. We then forecasted the outcome, determining the optimal threshold for dichotomizing the outcome into binary results and computing the Balanced Accuracy. Subsequently, the second-ranked indicator was incorporated, and the procedure was iterated, recalculating the Balanced Accuracy with each additional indicator until the metric was derived for the full set of indicators. The findings are depicted in Figure 5. The analysis indicates that sequentially adding indicators in order of importance improves predictive performance, with the most substantial improvements observed upon adding the initial indicators. However, performance saturates or may even drop beyond a certain point, indicating that combining multiple indicators may lead to over-fitting.

Table 4 presents the regression outcomes for the GLM employing RFE. The initial three models are based on: (1) normalized continuous indicator data, (2) binary alert data, and (3) binary alarm data. A Brier score nearing zero and a pseudo R-squared value (computed as 1 − log loss / uninformative log loss) of approximately 0.6 indicate that these simple models possess a commendable predictive capability regarding actual escalations in food security.

Table 4: Foundational GLM results. Regression results for three models: (1) normalized continuous indicator data, (2) binary alert data, and (3) binary alarm data. To assess model performance, the Brier score, pseudo R-squared, and a weighted average of error types, optimizing the probability cut-off used for classification, were calculated.
Columns: (1) GLM Continuous; (2) GLM Alert; (3) GLM Alarm. Entries are coefficients with standard errors in parentheses, listed in column order; an indicator that was eliminated from a model has no entry for that model.

Food price: 3.805*** (0.298); 1.061*** (0.138); 0.812*** (0.149)
Fuel price: 0.737*** (0.137); 1.453*** (0.139)
Exchange rate: 1.213*** (0.119); 0.868*** (0.113)
Displacement: 0.261* (0.153)
Conflict: -0.048 (0.160)
Drought: 0.888*** (0.111); 0.343** (0.140)
Pseudo R2: 0.574; 0.621; 0.615
Brier: 0.082; 0.074; 0.075
Loss w=1/2: 0.298; 0.252; 0.275
Loss w=1/3: 0.289; 0.275; 0.256
Loss w=2/3: 0.308; 0.230; 0.295
Note: *p<0.1; **p<0.05; ***p<0.01

The results in Table 4 help in understanding the interplay of different indicators when used jointly to track overall food insecurity risks. Strikingly, when using continuous indicator values and employing RFE, the best forecasting power was achieved when only using the food price indicator. In contrast, Models (2) and (3), which use binary indicators with optimized thresholds as inputs, reach considerably lower loss and utilize more indicators. Combining alerts yields the lowest loss values (for instance 0.23 for w=2/3, which is considerably below the univariate scores in Table 3). In conclusion, the dichotomization into alerts and alarms not only provides more interpretable warnings than the level readings of the continuous indicators, it also helps improve forecasting in a simple linear modeling framework. Furthermore, the correlation matrices in Annex IV reveal that the dichotomization reduces correlations between the data, which ensures that fewer warnings are raised simultaneously.

Figure 5: Marginal accuracy of basic indicators. The vertical axis plots cross-validated balanced accuracy using a Generalized Linear Model (GLM); the horizontal axis shows how prediction performance evolves as indicators are combined. From right to left, indicators are dropped in order of significance, the first indicator thus being the most dominant. Results are for Model (2) (left) and Model (3) (right).
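The Brier score and pseudo R-squared reported for these models can be computed from predicted probabilities as in the Python sketch below. It assumes that the "uninformative log loss" is the log loss of a constant prediction equal to the observed base rate, which is one common convention; the paper does not spell out this detail. The probabilities and labels are made up for illustration.

```python
import math


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def log_loss(y, p):
    """Average negative log-likelihood of binary outcomes (p strictly in (0, 1))."""
    total = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p))
    return -total / len(y)


def pseudo_r2(y, p):
    """1 - log loss / uninformative log loss.

    Assumption: the uninformative benchmark predicts the base rate everywhere.
    """
    base_rate = sum(y) / len(y)
    uninformative = log_loss(y, [base_rate] * len(y))
    return 1.0 - log_loss(y, p) / uninformative


# Hypothetical outcomes and model probabilities.
y = [0, 0, 0, 1]
p = [0.1, 0.1, 0.2, 0.7]
print(round(brier_score(y, p), 4), round(pseudo_r2(y, p), 3))
```

A model that only ever predicts the base rate scores a pseudo R-squared of zero under this definition, so values near 0.6 indicate a large reduction in log loss relative to that benchmark.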
Figure 6: Marginal accuracy of all indicators. The vertical axis plots cross-validated balanced accuracy using a GLM; the horizontal axis shows how prediction performance evolves as indicators are combined. From right to left, indicators are dropped in order of significance, the first indicator thus being the most dominant. The predictors include all alerts and alarms with interaction terms for the IRG/AA divide. The optimal number of indicators is identified where the graph reaches maximum balanced accuracy. Results are for Model (7) (left) and Model (8, final) (right).

Given the current geopolitical and economic context in Yemen, characterized by a division between areas controlled by IRG and AA, the study explored region-specific dynamics. Models (4) to (6) in Table 5 build upon the frameworks of Models (1) to (3), integrating region-specific interaction effects. Following this integration, RFE was employed to eliminate non-informative predictors and optimize predictions for out-of-sample data. The findings highlight a significant interaction effect across the IRG/AA divide. For the models utilizing alerts and alarms, these interactions center on the inflation and drought indicators. Critically, the extremely parsimonious alerts Model (5) suggests that food security in the IRG areas is driven by inflation, while food security in the AA areas is driven by droughts. This outcome aligns with the existence of a dual currency system in the country, in which currency devaluation is rampant in IRG areas, while the AA areas are associated with more agriculture and greater susceptibility to droughts. It is also worth noting that the alarms Model (6) tracks multiple dimensions of food security, and now outperforms the alerts model, as shown by the better loss values, suggesting that the use of alerts alone in Model (5) leads to an oversimplified representation of true food security drivers.
Table 5: GLM results with an IRG/AA interaction. Regression results for Models (4) to (6), extended using a regional interaction effect across the IRG/AA divide, and optimized using recursive feature elimination. To assess model performance, the Brier score and pseudo R-squared were calculated, together with a weighted average of error types, optimizing the probability cut-off used for classification.

Columns: (4) GLM Continuous IRG/AA; (5) GLM Alerts IRG/AA; (6) GLM Alarms IRG/AA. Entries are coefficients with standard errors in parentheses, listed in column order; a predictor that was eliminated from a model has no entry for that model.

Food price: -3.706*** (0.781)
Fuel price: -6.659*** (1.266)
Exchange rate: 6.829*** (1.101)
Displacement: 7.608*** (1.097)
Conflict: 4.619*** (0.829); 1.195*** (0.459)
Drought: -4.507*** (1.163); 3.782*** (0.271); 3.498*** (0.404)
Food price with IRG/AA interaction: 6.186*** (0.968); 1.971*** (0.133); 5.228*** (0.836)
Fuel price with IRG/AA interaction: 7.430*** (1.526); 1.751*** (0.155)
Exchange rate with IRG/AA interaction: -6.434*** (1.231); 1.643*** (0.145); 1.012*** (0.135)
Displacement with IRG/AA interaction: -7.702*** (1.254)
Conflict with IRG/AA interaction: -4.070*** (0.953); -1.502** (0.587)
Drought with IRG/AA interaction: 4.287*** (1.385); -3.460*** (0.317); -4.038*** (0.546)
Pseudo R2: 0.597; 0.603; 0.640
Brier: 0.078; 0.077; 0.068
Loss w=1/2: 0.271; 0.277; 0.253
Loss w=1/3: 0.242; 0.25; 0.219
Loss w=2/3: 0.299; 0.304; 0.286
Note: *p<0.1; **p<0.05; ***p<0.01

In our final comprehensive analysis, we combined both alerts and alarms and applied RFE across all predictors and interaction effects. Models (1) to (6) revealed that, while each indicator has a foundation in theory and is predictive on a univariate basis, optimal forecasting specifications typically result from discarding certain variables altogether. We therefore created a simple meta indicator that sums over the different alerts and alarms, to signal the number of warnings of any type. We applied a 4-month MA to capture possible delays in impacts. Using this additional indicator, we apply RFE across all possible interactions.
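A minimal Python sketch of the meta indicator described above is given below. It assumes a trailing (backward-looking) 4-month window, with shorter windows at the start of the series averaged over the months available; the paper does not specify these details. The warning series and function names are hypothetical.

```python
def meta_indicator(warnings_by_indicator):
    """Count the warnings of any type raised in each month."""
    return [sum(month) for month in zip(*warnings_by_indicator)]


def trailing_ma(series, window=4):
    """Trailing moving average; early months use the values available so far."""
    smoothed = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1): t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical monthly binary warnings for one district.
food = [0, 1, 1, 0, 0]
fuel = [0, 0, 1, 1, 0]
conflict = [1, 0, 1, 0, 0]

meta = meta_indicator([food, fuel, conflict])
meta_ma = trailing_ma(meta, window=4)
print(meta, meta_ma)
```

Because the window looks backwards, a burst of warnings keeps the smoothed count elevated for several months after it ends, which is one way to reflect delayed impacts.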
The results are in Table 6 and the RFE results for these models are in Figure 6. As depicted in Figure 6, the model's prediction performance improves significantly with the inclusion of the initial set of predictors and then flattens. Model (7) identifies the optimal combination of predictors, while Model (8) is designed to select the best predictors within the constraint that the meta-indicators for alerts and alarms are maintained. This approach ensures that all dimensions are kept in the model and is justified by the theoretical relevance of each indicator to food security and their demonstrated univariate significance. Retaining a relationship with all indicators in the model, despite causing a potential marginal decrease in historical prediction performance, guarantees the inclusion of essential food security aspects in future analyses, thereby enhancing the model's resilience to evolving food security trends. The pseudo R-squared (0.698), Brier score (0.057), and Balanced Accuracy (0.824) values of Model (8) are identical to three decimal places to those of the purely statistically optimized Model (7), and both models yield a significant improvement over Models (1) to (6). This observation underscores that incorporating additional dimensions of food security can be done without incurring a measurable performance trade-off, capturing broader food security considerations without affecting overall model effectiveness.

Table 6: Comprehensive assessment and final GLM specification. Regression results for two models that nest Models (5) and (6). Model (7) employs recursive feature elimination across all predictors, and Model (8) only across interaction effects. To assess model performance, the Brier score and pseudo R-squared were calculated, together with a weighted average of error types, optimizing the probability cut-off used for classification.
Columns: (7) GLM combined IRG/AA, Alerts; (7) GLM combined IRG/AA, Alarms; (8) GLM combined IRG/AA keeping base indicators, Alerts; (8) GLM combined IRG/AA keeping base indicators, Alarms. Entries are coefficients with standard errors in parentheses, listed in column order; a predictor that was eliminated from a model has no entry for that model. Fit statistics are listed for Model (7) and then Model (8).

Meta indicator: 2.369*** (0.207); -0.634* (0.342); 3.042*** (0.441)
Food price: -4.538*** (0.861); -4.401*** (0.877)
Fuel price: 0.785*** (0.213); 0.811*** (0.214)
Exchange rate: -1.331*** (0.378); -0.971** (0.434)
Displacement: no entries
Conflict: -2.713*** (0.629); 1.332** (0.667); -1.980*** (0.479)
Drought: 2.196*** (0.494); 2.006*** (0.526); 2.537*** (0.509); 1.921*** (0.534)
Meta indicator with IRG/AA interaction: 0.601*** (0.131); -2.898*** (0.310); 1.299*** (0.398); -3.648*** (0.533)
Food price with IRG/AA interaction: 6.786*** (0.913); 6.643*** (0.928)
Fuel price with IRG/AA interaction: 1.023*** (0.226); 0.972*** (0.226)
Exchange rate with IRG/AA interaction: 3.649*** (0.490); -0.510** (0.223); 3.219*** (0.551); -0.542** (0.222)
Conflict with IRG/AA interaction: 3.264*** (0.707); -1.980** (0.793); 2.456*** (0.558); -0.478** (0.877)
Drought with IRG/AA interaction: -1.521*** (0.582); -2.502*** (0.685); -1.911*** (0.596); -2.458*** (0.690)
Pseudo R2: 0.698; 0.698
Brier: 0.057; 0.057
Loss w=1/2: 0.176; 0.176
Loss w=1/3: 0.167; 0.168
Loss w=2/3: 0.186; 0.184
Note: *p<0.1; **p<0.05; ***p<0.01

4.3 Estimates of populations in emergency

Using the GLM, it is possible to express the combined risk assessment in terms of the expected number of people in areas at risk of experiencing IPC Phase 4+ conditions. For each district and time period, the GLM gives a probability between 0 and 1. Specifically, the total population per district can be multiplied by the GLM-modeled probability to calculate the population-weighted average IPC Phase 4+ risk. This in turn can be scaled to match historical population counts in IPC Phase 4+ areas. To estimate the expected number of people in areas at risk of experiencing a deterioration into IPC Phase 4, Model (8) was chosen.

The following notation clarifies the calculation. Let $\bar{p}_t = \sum_{i=1}^{N} pop_i \, \hat{p}_{i,t} / \sum_{i=1}^{N} pop_i$ be the population-weighted IPC Phase 4+ probability, where $pop_i$ is the population of district $i$, $\hat{p}_{i,t}$ is the GLM-modeled probability for district $i$ at time $t$, and $N$ is the number of districts.
Using the binary FEWS NET IPC Phase 4+ data $y_{i,t}$, the share of the population in IPC Phase 4+ areas can be calculated as $s_t = \sum_{i=1}^{N} pop_i \, y_{i,t} / \sum_{i=1}^{N} pop_i$. A linear rescaling is then required between the population-weighted probability $\bar{p}_t$ and $s_t$ to remove any bias stemming from the different scales that result from optimizing for balanced accuracy, rather than minimizing RMSE against the population totals directly. To rescale between the FEWS NET data and the GLM, a single scaling parameter $\beta$ is calculated using Least Squares. To provide some robustness to outliers or spikes in the modeled data, the scaling parameter is calculated using a centered moving average of the modeled probabilities:

$$s_t = \beta \times L\big(MA(\bar{p}_t, 3), k\big) \tag{3}$$

in which $MA(\bar{p}_t, 3)$ denotes a 3-period centered moving average of $\bar{p}_t$ and $L(\cdot, k)$ is a $k$-period lag operator. The regression is fitted for $k = 1, \dots, 12$, and the optimal lag $\hat{k}$ is selected by minimizing MAE. The resulting $\hat{\beta}$ is then used to calculate the final population predicted to be in areas experiencing IPC Phase 4+:

$$\widehat{pop}^{4+}_{t} \mid \hat{\beta} = \hat{\beta} \times L\big(MA(\bar{p}_t, 3), \hat{k}\big) \times \sum_{i=1}^{N} pop_i, \qquad \hat{k} = \arg\min_{k \in \{1, \dots, 12\}} \mathrm{MAE}\Big(s_t,\ \hat{\beta}_k \times L\big(MA(\bar{p}_t, 3), k\big)\Big) \tag{4}$$

Figure 7: Availability of IPC assessments. IPC assessments are contingent on sufficient evidence for high-confidence declarations, with no phases declared when information availability does not meet confidence criteria. The graph therefore highlights the likely access issues that may have also impacted FEWS NET assessments.

This final metric enables the monitoring of how alerts and alarms collectively correspond to the expected number of individuals in areas at risk of reaching IPC Phase 4+ conditions. To calculate this, we utilized $s_t$, incorporating data from both FEWS NET (starting from 2014) and IPC (from 2019 onwards). The data integration was performed due to several challenges with the underlying assessment data. Notably, in regions controlled by AA, we observed minimal variation in FEWS NET data across assessments after the COVID-19 pandemic.
This is likely due to the assessment process, which involves projecting a most likely scenario and then, during the next assessment cycle, adjusting it based on new evidence. Reduced access and information scarcity have likely led to minimal adjustments, possibly causing the initial pre-pandemic assessment to carry over into later analyses. Concurrently, IPC data indicated considerably lower proportions of the population in IPC Phase 4+ areas since inception. It is important to note that IPC assessments are contingent on sufficient evidence for high-confidence declarations, with no phases declared when information availability does not meet confidence criteria. The sporadic availability of IPC data, particularly the absence of comprehensive coverage in AA areas highlighted in Figure 7, is evidence of the access issues that may have contributed to the persistence of initial IPC Phase 4 ratings in FEWS NET reports. Consequently, we noted that our modeled population estimates line up closely with FEWS NET figures in the pre-COVID-19 period, while lining up more closely with the IPC data in the period after.

Recognizing the potential biases and coverage issues that lead to discrepancies in both data sets, we applied Last Observation Carried Forward (LOCF) imputation to district-level IPC data. The observed population share $s_t$ was then calculated using FEWS NET data for the period before IPC data availability and a simple average of both datasets afterwards, as shown in the accompanying visuals. $s_t$ thus represents an estimated percentage of populations in areas experiencing IPC Phase 4, giving equal weight to the last known IPC and last known FEWS NET assessments. We then determined the optimal scaling and lag parameters, finding the optimal lag value to be 5. This suggests our final estimate can predict food security conditions most reliably five months in advance, indicating that alerts and alarms can preemptively signal deterioration into IPC Phase 4 conditions by at least five months.
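The scaling and lag selection described in this section can be sketched in Python as follows. The sketch assumes complete monthly series, a least-squares fit through the origin for the single scaling parameter, and a centered 3-period moving average that uses shorter windows at the ends of the series. These are simplifications made for illustration, and all names and data are hypothetical.

```python
def pop_weighted(probs_by_district, pops):
    """Population-weighted average probability per period."""
    total = sum(pops)
    n_periods = len(probs_by_district[0])
    return [sum(pop * probs[t] for pop, probs in zip(pops, probs_by_district)) / total
            for t in range(n_periods)]


def centered_ma3(series):
    """3-period centered moving average (shorter windows at the ends)."""
    return [sum(series[max(0, t - 1): t + 2]) / len(series[max(0, t - 1): t + 2])
            for t in range(len(series))]


def fit_scale(observed, modeled, lag):
    """Least-squares slope through the origin of observed[t] on modeled[t - lag]."""
    pairs = [(observed[t], modeled[t - lag]) for t in range(lag, len(observed))]
    beta = sum(s * m for s, m in pairs) / sum(m * m for _, m in pairs)
    mae = sum(abs(s - beta * m) for s, m in pairs) / len(pairs)
    return beta, mae


def select_lag(observed, modeled, max_lag=12):
    """Pick the lag with the lowest MAE; returns (lag, beta, mae)."""
    smoothed = centered_ma3(modeled)
    results = []
    for k in range(1, max_lag + 1):
        if len(observed) - k < 2:  # too few overlapping points to fit
            break
        beta, mae = fit_scale(observed, smoothed, k)
        results.append((mae, k, beta))
    mae, k, beta = min(results)
    return k, beta, mae


# Synthetic check: the observed share is twice the smoothed model, two months later.
modeled = [0.1, 0.2, 0.4, 0.3, 0.2, 0.5, 0.6, 0.4, 0.3, 0.2, 0.1, 0.3, 0.5, 0.4]
smoothed = centered_ma3(modeled)
observed = [0.0, 0.0] + [2 * smoothed[t - 2] for t in range(2, len(modeled))]
lag, beta, mae = select_lag(observed, modeled, max_lag=4)
weighted = pop_weighted([[0.5, 1.0], [0.0, 0.5]], pops=[100, 300])
print(lag, round(beta, 3), weighted)
```

On the synthetic series the procedure recovers the built-in lag of two periods and a scaling factor of two. Multiplying the scaled, lagged probability by the total population then gives the estimated number of people in areas at risk.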
Figure 8 presents the detailed comparison of IPC Phase 4+ figures from IPC and FEWS NET sources against the model's estimates of the population at risk of reaching IPC Phase 4+ conditions, based on data-driven alerts and alarms. Notably, the trends in the data-driven visuals precede the official declarations, aligning with the previously identified optimal lead time of five months in IRG areas and four months (not significantly different from five) in AA areas. Specifically, in areas controlled by AA, the model's results closely align with FEWS NET data before 2020 and more closely follow IPC data thereafter, although our indicators point to an overall lower severity than what is suggested by either FEWS NET or IPC data. It is important to note that for the entirety of 2022, there are no IPC assessments and a correspondingly lower confidence in the FEWS NET phases. The modeled results in turn showed a sharp increase in estimates for the IRG areas in 2022 compared to 2020. This could similarly be attributed to overall uncertainties associated with the official IPC phase data not meeting confidence thresholds in both 2020 and 2022, as shown in Figure 7, making the relative increase captured by the model estimates less apparent in the official data. Finally, by 2023, the model estimates decreased, highlighting a period with notably fewer alerts and alarms, whereas the gap between FEWS NET and IPC data widened, particularly in IRG areas.

The difference between FEWS NET and IPC data highlights the importance of analyzing not only the ultimate population estimates from official or modeled sources but also monitoring the underlying alerts and alarms. This approach provides a fuller understanding of the factors driving food security. Overall, the findings indicate that combining indicators can reasonably express IPC phase exposure, although there are challenges related to the availability of, and potential biases in, the underlying phase data.
While the final population estimate is a valuable meta-indicator, the results emphasize the significance of evaluating the contributions of individual indicators to improve real-time food security monitoring and enable prompt interventions.

Figure 8: Estimated population in areas experiencing IPC Phase 4+. Comparison between official IPC Phase 4+ figures, FEWS NET, and the calculated estimation of people at risk of experiencing a deterioration into IPC Phase 4+ in IRG and AA areas.

5 Discussion and conclusions

This study presents an advancement in the methodology for predicting food insecurity, particularly in crisis-stricken regions like Yemen. The data-driven approach is a significant step forward from traditional, resource-intensive assessment techniques, which often suffer from limited frequency and foresight. By leveraging quantitative indicators alongside data-driven thresholds for early warning signals, this research offers a systematic and transparent method to anticipate food insecurity emergencies. The selection of indicators, focusing on inflation, conflict, and agricultural productivity, underscores the multifaceted nature of food insecurity, acknowledging that no single factor can encapsulate the complex dynamics at play.

In terms of predictive value and importance, food and fuel prices stand out due to their balanced performance across alert and alarm levels, effectively managing the trade-off between minimizing false alarms and detecting critical emergencies. In contrast, the conflict and displacement indicators, with a more cautious approach, risk missing early signs of food security crises but raise fewer false alarms and perform better at higher risk levels. Exchange rates are prone to higher false positive rates at the alert level, suggesting a need for refined area-specific calibration to improve relevance.
The analysis highlights a complex scenario where the utility of each indicator is defined by its balance between sensitivity and specificity. Proper threshold calibration is essential: price signals show superior effectiveness, but a focus on exchange rates alone leads to a high count of false alarms. Whereas drought indicators at default values fall short, the historical validation results show that at optimized thresholds they do provide additional predictive power. Specifically, when combining indicators in a single model, results show that inflation data and drought data have area-specific impacts along the North/South divide. This insight is crucial for policy makers and stakeholders to devise targeted responses to food security threats.

The findings from historical validation underline the efficacy of the chosen indicators, with notable precision in forecasting emergencies five months ahead of time. This lead time is critical for mobilizing resources and implementing interventions to mitigate the impact of predicted crises. However, the study also highlights the challenges of data availability and the need for careful interpretation of model outputs, especially in regions with limited access or where political and economic instability may affect data reliability.

The availability of nutrition outcome indicators in particular presents significant challenges, primarily due to the lack of consistent, high-quality data collection, especially in crisis-affected areas like Yemen. These indicators are critical for understanding the direct impact of food insecurity on population health and well-being, yet they are often underprioritized in data collection efforts. Investment in the collection and analysis of nutrition outcome indicators should be considered a priority. Strengthening data infrastructure and capacity for regular, detailed nutritional assessments can significantly enhance the accuracy and effectiveness of food insecurity predictions.
Integrating these outcome indicators with the modeled results from data-driven approaches offers a more holistic view of food security emergencies, enabling targeted interventions that address both the underlying causes and the immediate impacts of food insecurity.

The overall findings suggest that data-driven indicators successfully signal prevailing food security risks, and offer valuable contributions to real-time food security monitoring in the intervals between IPC assessments. This capability enhances the ability to track emerging food security challenges promptly, facilitating timely interventions.

The data-driven methodology introduced in this paper marks a significant contribution to the field of food security assessment, offering a scalable and transparent approach to early warning systems. The successful historical validation of the model in Yemen, despite its status as the epicenter of a severe humanitarian crisis, demonstrates the robustness and potential applicability of this approach in other contexts. The ability to detect food insecurity deterioration well in advance provides a valuable window for preemptive action, potentially saving lives and reducing the human and financial costs of food crises.

Nonetheless, this study also brings to light the intrinsic limitations of data-driven approaches, such as dependency on the quality and availability of data and the necessity for continuous refinement of model parameters to adapt to changing realities on the ground. Future research should focus on enhancing the model's predictive accuracy, expanding the set of indicators to include more granular data points, and exploring the integration of this methodology with traditional assessment techniques to create a more comprehensive and dynamic early warning system.
In conclusion, while significant challenges remain, the promising results of this research highlight the potential of simple, data-driven approaches to play a complementary role alongside existing methods, ultimately contributing to a more proactive and effective response to food insecurity crises worldwide.

Annex I: Formulas

The following formulas have been used for the different food security risk indicators.

Exponential moving average (EMA)

A moving average that weighs recent observations more heavily than older observations. The formula is:

\[ \mathrm{EMA}(x_t) = \frac{x_t + w_1 x_{t-1} + w_2 x_{t-2} + \dots + w_t x_0}{1 + w_1 + w_2 + \dots + w_t} \tag{5} \]

where $x_t$ is the value of the food security risk indicator at period $t$ and $w_t$ is the weight factor applied at lag $t$, defined as:

\[ w_t = (1 - \alpha)^t \tag{6} \]

Here $\alpha$ is the smoothing factor, which by default is determined by the attached time-window $n$, $\alpha = 2/(1+n)$. In the modeling of the food security risk indicators, both this default value and manually chosen smoothing factors have been used.

Long-term average

The long-term average (or Simple Moving Average, SMA) is defined as

\[ \mathrm{SMA}(x_t, n) = \frac{x_{t-n+1} + x_{t-n+2} + \dots + x_t}{n} \tag{7} \]

In the formulas, $n$ is the selected time-window and $x_t$ is the value of the food security risk indicator at period $t$.

Relative Strength Index (RSI)

Two series are required, both based on the change in the food security risk indicator: one for the upward change $U$ and one for the downward change $D$:

\[ U_t = \max(x_t - x_{t-1},\, 0), \qquad D_t = \max(x_{t-1} - x_t,\, 0) \tag{8} \]

The EMA is then calculated for both $U$ and $D$. This gives the relative strength factor (RS):

\[ \mathrm{RS} = \frac{\mathrm{EMA}(U)}{\mathrm{EMA}(D)} \tag{9} \]

Finally, the Relative Strength Index (RSI) is defined as:

\[ \mathrm{RSI} = 100 - \frac{100}{1 + \mathrm{RS}} \tag{10} \]

RSI is used in the analysis of financial markets, and one of the benefits of using the RSI is that it is always between 0 and 100.
Values above 70 are usually interpreted as an indication of a strong upward trend and values below 30 as a strong downward trend.

Z-score mean

The Z-score (or standard score) represents the distance between the measured value and the population mean, expressed in units of the standard deviation:

\[ z = \frac{x - \mu}{\sigma} \tag{11} \]

where $\mu$ is the mean of the population, $\sigma$ is the standard deviation of the population and $x$ is the value of the food security risk indicator.

Z-score median

This follows the same procedure as the Z-score (mean), but instead of subtracting the population mean, the population median is subtracted. This approach can be used to filter out large spikes in the population data.

Standardized Precipitation Index (SPI)

An index widely used to characterize meteorological drought on a range of timescales. The idea is similar to the Z-score approach; in the SPI case, however, the rainfall data are fitted to a distribution, usually a gamma distribution, which is then transformed to a normal distribution. An SPI value is interpreted as the number of standard deviations by which the observation deviates from the long-term mean. Several timescales can be used for SPI; for the model, both SPI1 and SPI3 are used, with a one-month and a three-month time period, respectively.

Annex II: Extended indicator validation

Table A 1a: Food price indicator statistics

The table shows validation results for the best result per price indicator. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.
| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sorghum (1 KG) | Percentage from MA | 5 | 1.108 | 250 | 668 | 3607 | 194 | 0.817 | 0.272 | 0.367 | 0.275 | 0.156 | 0.437 | 0.297 |
| FPI (Top 5)*** | Percentage from EMA | 4 | 1.070 | 298 | 1144 | 3131 | 146 | 0.727 | 0.207 | 0.316 | 0.201 | 0.268 | 0.329 | 0.298 |
| Rice (Imported, 1 KG) | Percentage from EMA | 4 | 1.098 | 272 | 1054 | 3221 | 172 | 0.740 | 0.205 | 0.307 | 0.194 | 0.247 | 0.387 | 0.317 |
| Beans (Red Kidney, 1 KG) | Percentage from MA | 5 | 1.132 | 255 | 979 | 3296 | 189 | 0.752 | 0.207 | 0.304 | 0.192 | 0.229 | 0.426 | 0.327 |
| Sugar (1 KG) | Percentage from MA | 8 | 1.142 | 231 | 867 | 3408 | 213 | 0.771 | 0.210 | 0.300 | 0.191 | 0.203 | 0.480 | 0.341 |
| Lentils (1 KG) | Percentage from MA | 9 | 1.050 | 339 | 1981 | 2294 | 105 | 0.558 | 0.146 | 0.245 | 0.104 | 0.463 | 0.236 | 0.350 |
| Millet (1 KG) | Percentage from EMA | 12 | 1.130 | 242 | 1071 | 3204 | 202 | 0.730 | 0.184 | 0.275 | 0.157 | 0.251 | 0.455 | 0.353 |
| Wheat Flour (1 KG) | Percentage from EMA | 10 | 1.152 | 247 | 1139 | 3136 | 197 | 0.717 | 0.178 | 0.270 | 0.149 | 0.266 | 0.444 | 0.355 |
| SMEB | Percentage from MA | 6 | 1.162 | 218 | 871 | 3404 | 226 | 0.768 | 0.200 | 0.284 | 0.174 | 0.204 | 0.509 | 0.356 |
| Eggs (1 Dozen) | Percentage from MA | 9 | 1.106 | 307 | 1754 | 2521 | 137 | 0.599 | 0.149 | 0.245 | 0.107 | 0.410 | 0.309 | 0.359 |
| Salt (1 KG) | Percentage from MA | 11 | 1.124 | 272 | 1649 | 2626 | 172 | 0.614 | 0.142 | 0.230 | 0.091 | 0.386 | 0.387 | 0.387 |
| Wheat (1 KG) | Percentage from MA | 9 | 1.178 | 193 | 1021 | 3254 | 251 | 0.730 | 0.159 | 0.233 | 0.110 | 0.239 | 0.565 | 0.402 |
| Onions (1 KG) | RSI | 12 | 65.331 | 169 | 921 | 3354 | 275 | 0.747 | 0.155 | 0.220 | 0.100 | 0.215 | 0.619 | 0.417 |
| Oil (Vegetable, 1L) | Percentage from MA | 5 | 1.062 | 250 | 1875 | 2400 | 194 | 0.562 | 0.118 | 0.195 | 0.046 | 0.439 | 0.437 | 0.438 |
| Livestock (Two-year-old male) | Percentage from MA | 5 | 1.022 | 254 | 2042 | 2233 | 190 | 0.527 | 0.111 | 0.185 | 0.033 | 0.478 | 0.428 | 0.453 |
| Tomatoes (1 KG) | Percentage from MA | 6 | 1.062 | 246 | 1967 | 2308 | 198 | 0.541 | 0.111 | 0.185 | 0.034 | 0.460 | 0.446 | 0.453 |
| Peas (1 KG) | Percentage from MA | 7 | 1.018 | 237 | 1965 | 2310 | 207 | 0.540 | 0.108 | 0.179 | 0.027 | 0.460 | 0.466 | 0.463 |
| Potatoes (1 KG) | RSI | 11 | 60.521 | 130 | 1170 | 3105 | 314 | 0.686 | 0.100 | 0.149 | 0.010 | 0.274 | 0.707 | 0.490 |

Table A 2b: Food basket price indicator statistics

The table shows validation results for the best result per method applied to basket indicators. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.

| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FPI (Top 5)*** | Percentage from EMA | 4 | 1.070 | 298 | 1144 | 3131 | 146 | 0.727 | 0.207 | 0.316 | 0.201 | 0.268 | 0.329 | 0.298 |
| FPI (Top 5) | Percentage from MA | 5 | 1.096 | 284 | 1107 | 3168 | 160 | 0.732 | 0.204 | 0.310 | 0.195 | 0.259 | 0.360 | 0.310 |
| SMEB | Percentage from MA | 6 | 1.162 | 218 | 871 | 3404 | 226 | 0.768 | 0.200 | 0.284 | 0.174 | 0.204 | 0.509 | 0.356 |
| SMEB | Percentage from EMA | 9 | 1.130 | 236 | 1171 | 3104 | 208 | 0.708 | 0.168 | 0.255 | 0.131 | 0.274 | 0.468 | 0.371 |
| SMEB | RSI | 9 | 77.355 | 173 | 1098 | 3177 | 271 | 0.710 | 0.136 | 0.202 | 0.072 | 0.257 | 0.610 | 0.434 |

Table A 3: Fuel price indicator statistics

The table shows validation results for the best result per method. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.
| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fuel (Petrol, 1L) | Percentage from MA | 3 | 1.168 | 287 | 1014 | 3261 | 157 | 0.752 | 0.221 | 0.329 | 0.219 | 0.237 | 0.354 | 0.295 |
| Fuel (Petrol, 1L) | Percentage from EMA | 3 | 1.124 | 296 | 1124 | 3151 | 148 | 0.730 | 0.208 | 0.318 | 0.203 | 0.263 | 0.333 | 0.298 |
| Fuel Average (Petrol, Diesel)*** | Percentage from MA | 4 | 1.351 | 246 | 687 | 3588 | 198 | 0.812 | 0.264 | 0.357 | 0.263 | 0.161 | 0.446 | 0.303 |
| Fuel (Diesel, 1L) | Percentage from MA | 5 | 1.263 | 261 | 946 | 3329 | 183 | 0.761 | 0.216 | 0.316 | 0.207 | 0.221 | 0.412 | 0.317 |
| Fuel Average (Petrol, Diesel) | Percentage from EMA | 2 | 1.054 | 317 | 1497 | 2778 | 127 | 0.656 | 0.175 | 0.281 | 0.153 | 0.350 | 0.286 | 0.318 |
| Fuel (Diesel, 1L) | Percentage from EMA | 3 | 1.162 | 262 | 1063 | 3212 | 182 | 0.736 | 0.198 | 0.296 | 0.181 | 0.249 | 0.410 | 0.329 |
| Fuel Average (Petrol, Diesel) | RSI | 6 | 88.978 | 210 | 761 | 3514 | 234 | 0.789 | 0.216 | 0.297 | 0.193 | 0.178 | 0.527 | 0.353 |
| Fuel (Diesel, 1L) | RSI | 8 | 78.557 | 210 | 838 | 3437 | 234 | 0.773 | 0.200 | 0.282 | 0.172 | 0.196 | 0.527 | 0.362 |
| Fuel (Petrol, 1L) | RSI | 12 | 78.156 | 202 | 783 | 3492 | 242 | 0.783 | 0.205 | 0.283 | 0.176 | 0.183 | 0.545 | 0.364 |
| Fuel (Gas, 1L) | Percentage from EMA | 3 | 1.052 | 177 | 1234 | 3041 | 267 | 0.682 | 0.125 | 0.191 | 0.056 | 0.289 | 0.601 | 0.445 |
| Fuel (Gas, 1L) | Percentage from MA | 4 | 1.072 | 169 | 1203 | 3072 | 275 | 0.687 | 0.123 | 0.186 | 0.051 | 0.281 | 0.619 | 0.450 |
| Fuel (Gas, 1L) | RSI | 12 | 58.918 | 150 | 1272 | 3003 | 294 | 0.668 | 0.105 | 0.161 | 0.020 | 0.298 | 0.662 | 0.480 |

Table A 4: Exchange rate indicator statistics

The table shows validation results for the best result per method. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.
| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RTFX Exchange Rate | RSI | 8 | 67.134 | 321 | 1923 | 2352 | 123 | 0.566 | 0.143 | 0.239 | 0.097 | 0.450 | 0.277 | 0.363 |
| RTFX Exchange Rate | Percentage from MA | 7 | 1.056 | 211 | 1298 | 2977 | 233 | 0.676 | 0.140 | 0.216 | 0.083 | 0.304 | 0.525 | 0.414 |

Table A 5: Drought indicator statistics

The table shows validation results for the best result per method. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. All NDVI indicators had been removed by the filtering rules. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.

| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SPI | SPI | 1 | -0.120 | 258 | 1492 | 2783 | 186 | 0.644 | 0.147 | 0.235 | 0.100 | 0.349 | 0.419 | 0.384 |
| Rainfall (Crop calendar) | MA of Z-score (Median) | 3 | 0.451 | 228 | 1408 | 2867 | 216 | 0.656 | 0.139 | 0.219 | 0.084 | 0.329 | 0.486 | 0.408 |
| Rainfall | MA of Z-score (Mean) | 3 | 0.150 | 284 | 2023 | 2252 | 160 | 0.537 | 0.123 | 0.206 | 0.058 | 0.473 | 0.360 | 0.417 |
| Rainfall | EMA of Z-score (Mean) | 1 | -0.475 | 191 | 1198 | 3077 | 253 | 0.693 | 0.138 | 0.208 | 0.077 | 0.280 | 0.570 | 0.425 |
| SPI (Crop calendar) | SPI | 1 | -0.112 | 165 | 952 | 3323 | 279 | 0.739 | 0.148 | 0.211 | 0.089 | 0.223 | 0.628 | 0.426 |
| Rainfall (Crop calendar) | EMA of Z-score (Median) | 6 | 0.271 | 158 | 939 | 3336 | 286 | 0.740 | 0.144 | 0.205 | 0.082 | 0.220 | 0.644 | 0.432 |
| Rainfall (Crop calendar) | EMA of Z-score (Mean) | 6 | 0.066 | 165 | 1008 | 3267 | 279 | 0.727 | 0.141 | 0.204 | 0.078 | 0.236 | 0.628 | 0.432 |
| Rainfall | EMA of Z-score (Median) | 5 | 0.210 | 206 | 1411 | 2864 | 238 | 0.651 | 0.127 | 0.200 | 0.061 | 0.330 | 0.536 | 0.433 |
| Rainfall | MA of Z-score (Median) | 5 | 0.451 | 274 | 2128 | 2147 | 170 | 0.513 | 0.114 | 0.193 | 0.040 | 0.498 | 0.383 | 0.440 |

Table A 6: Conflict indicator statistics

The table shows validation results for the best result per method. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.
| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fatalities (1deg Neighbor)*** | RSI of EMA | [14, 12] | 64.529 | 228 | 1217 | 3058 | 216 | 0.696 | 0.158 | 0.241 | 0.114 | 0.285 | 0.486 | 0.386 |
| Fatalities (2deg Neighbor) | RSI of EMA | [14, 12] | 56.914 | 305 | 2051 | 2224 | 139 | 0.536 | 0.129 | 0.218 | 0.071 | 0.480 | 0.313 | 0.396 |
| Counts (1deg Neighbor) | RSI of EMA | [14, 12] | 72.345 | 216 | 1223 | 3052 | 228 | 0.693 | 0.150 | 0.229 | 0.100 | 0.286 | 0.514 | 0.400 |
| Counts (2deg Neighbor) | RSI of EMA | [14, 12] | 86.172 | 210 | 1175 | 3100 | 234 | 0.701 | 0.152 | 0.230 | 0.102 | 0.275 | 0.527 | 0.401 |
| Fatalities | RSI of EMA | [14, 12] | 53.307 | 169 | 902 | 3373 | 275 | 0.751 | 0.158 | 0.223 | 0.104 | 0.211 | 0.619 | 0.415 |
| Fatalities (2deg Neighbor) | Percentage from EMA | 1 | 1.108 | 205 | 1318 | 2957 | 239 | 0.670 | 0.135 | 0.208 | 0.073 | 0.308 | 0.538 | 0.423 |
| Fatalities (2deg Neighbor) | Percentage from MA | 1 | 1.108 | 205 | 1318 | 2957 | 239 | 0.670 | 0.135 | 0.208 | 0.073 | 0.308 | 0.538 | 0.423 |
| Fatalities (1deg Neighbor) | Percentage from MA | 12 | 1.006 | 184 | 1234 | 3041 | 260 | 0.683 | 0.130 | 0.198 | 0.063 | 0.289 | 0.586 | 0.437 |
| Fatalities (1deg Neighbor) | Percentage from EMA | 12 | 1.012 | 167 | 1169 | 3106 | 277 | 0.694 | 0.125 | 0.188 | 0.054 | 0.273 | 0.624 | 0.449 |
| Counts (1deg Neighbor) | Percentage from MA | 12 | 1.275 | 160 | 1255 | 3020 | 284 | 0.674 | 0.113 | 0.172 | 0.034 | 0.294 | 0.640 | 0.467 |
| Counts (2deg Neighbor) | Percentage from MA | 12 | 1.058 | 203 | 1798 | 2477 | 241 | 0.568 | 0.101 | 0.166 | 0.014 | 0.421 | 0.543 | 0.482 |
| Counts (1deg Neighbor) | Percentage from EMA | 12 | 1.162 | 155 | 1353 | 2922 | 289 | 0.652 | 0.103 | 0.159 | 0.016 | 0.316 | 0.651 | 0.484 |
| Counts (2deg Neighbor) | Percentage from EMA | 11 | 1.006 | 178 | 1840 | 2435 | 266 | 0.554 | 0.088 | 0.145 | -0.011 | 0.430 | 0.599 | 0.515 |

Table A 7: Displacement indicator statistics

The table shows validation results for the best result per method. Indicator methods are optimized for an equal weighted average of FPR and FNR (Loss) with restrictions explained in the text. Results are ordered by Loss (best on top). The chosen indicator is marked by ***.
| Indicator | Method | Window | Boundary | TP | FP | TN | FN | Accuracy | Precision | F1 | Kappa | FPR | FNR | Loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Displacements (From, to)*** | RSI of EMA | [14, 6] | 55.110 | 212 | 952 | 3323 | 232 | 0.749 | 0.182 | 0.264 | 0.148 | 0.223 | 0.523 | 0.373 |
| Displacements (To) | RSI of EMA | [12, 5] | 50.100 | 173 | 914 | 3361 | 271 | 0.749 | 0.159 | 0.226 | 0.107 | 0.214 | 0.610 | 0.412 |
| Displacements (To) | Percentage from EMA | 2 | 1.010 | 103 | 1096 | 3179 | 341 | 0.695 | 0.086 | 0.125 | -0.014 | 0.256 | 0.768 | 0.512 |

Annex III: Key food security events 2014 to 2023

Table A 8: Timeline of food security events

The key events and dates were compiled based on expert consultation. Those who contributed have been mentioned in the acknowledgements. The timeline provides an indicator that can be used to cross-check the performance of the alerts and alarms depicted in the visuals in this paper; see Table A 9.

| Date | Timeline of key events in Yemen relevant to the food security situation |
| --- | --- |
| Sep-14 | AA forces, allied with forces loyal to former Yemeni president Ali Abdullah Saleh, seize Sana’a. |
| Sep-14 | Suspension of public social welfare cash transfers to 1.5 million beneficiaries. |
| Mar-15 | AA/Saleh forces advance south towards Taizz, Lahj and Aden. Saudi-led coalition launches military intervention in Yemen. Suspension of oil and gas production and export. |
| Jul-15 | Saudi-backed government forces retake Aden. |
| May-16 | UNVIM inspection mechanism installed, which allowed ships to come in. |
| Aug-16 | Suspension of public payroll payments due to insufficient stock of rial banknotes. CBY Aden no longer able to issue letters of credit to finance import of essential commodities due to lack of foreign currency reserves. |
| Sep-16 | President Hadi transfers CBY headquarters from Sana’a to Aden. Yemen is left with two competing Central Banks. |
| Jul-17 | Yemen experiences the world’s largest outbreak of cholera, recording 1 million cases and 2,000 related deaths in 2017. |
| Nov-17 | The coalition imposes a complete blockade on Yemen, in retaliation for an AA missile fired into Saudi Arabia. The blockade is then lifted in January 2018. |
| Mar-18 | Saudi Arabia allocates a deposit of USD 2 billion to CBY Aden to try to stabilize the rial and finance imports of food (unutilized due to the lack of implementation mechanism and capacity). |
| Jun-18 | The Saudi-led coalition and coalition-backed forces launch an offensive on the port city of Al Hodeidah, the country’s main entry point for commercial food and fuel and humanitarian aid. |
| Oct-18 | The value of the Yemeni rial versus the USD in the local market reaches its lowest record: 800 YER per 1 USD. |
| Oct-18 | IRG implements Decree 75, which limits fuel importers to only those approved by the IRG-run Economic Committee. |
| Nov-18 | CBY Aden rolls out a revised version of the letters of credit system, with an accompanying below-market exchange rate that helped stem the rapid depreciation of the rial between July and October 2018. |
| Nov-18 | CBY Sana’a demands that banks issue cheques rather than hard cash if importers wish to open letters of credit with CBY Aden (in relation to the CBY Aden requirement to deposit the equivalent in rial of the credit in USD). |
| Dec-18 | UN-facilitated Stockholm agreement (between the AA and IRG), including agreement on Hodeidah port and imports. |
| Mar-19 | CBY Sana’a places restrictions on banks from opening letters of credit with CBY Aden for food importers headquartered in Sana’a. |
| Mar-19 | Fuel crisis: competition over fuel revenues between the AA and IRG led to reduced fuel imports through Al Hodeidah and to increased fuel prices. |
| Jun-19 | IRG bans fuel imports from Omani and Iraqi ports as well as Al Hamriya Port in Sharjah, UAE. |
| Jul-19 | IRG introduces Decree 49 for fuel import regulation, requiring fuel importers to pay import taxes and customs fees to Aden. |
| Aug-19 | The Southern Transitional Council (STC), backed by the UAE, takes control of Aden and Zinjibar. |
| Sep-19 | Competition over fuel revenues between the AA and IRG led to a second fuel crisis and international pressure on the IRG to lessen restrictions. |
| Nov-19 | IRG and the STC sign the Riyadh Agreement aimed at ending the fighting in the south. |
| Dec-19 | The AA extend a ban on new rial banknotes (issued by CBY Aden after 2016) in their controlled areas, prohibiting citizens from using the new banknotes. |
| Jun-19 | Humanitarian funding requirements are the highest ever recorded in Yemen (USD 4.19 billion) and actual funding the highest ever received (USD 3.64 billion – 87% of the total). |
| Apr-20 | First confirmed case of COVID-19 declared in Yemen. The pandemic affects remittances inflow from Yemenis abroad. |
| Jun-20 | Yemen donors pledging conference for the HRP 2020 is hosted by Saudi Arabia. Only USD 1.35 billion (50%) out of USD 3.38 billion required is pledged. |
| Jun-20 | The IRG announces the suspension of fuel imports via Al Hodeidah in response to the AA withdrawing up to YER 45 billion from the ‘special account’ at CBY Al Hodeidah destined for payment of public salaries. |
| Dec-20 | President Hadi announces a reshuffle of the Cabinet (as settled in the Riyadh Agreement), which comes to include southern separatists. The value of the rial in IRG areas momentarily falls to a record low of 917 YER per 1 USD. |
| Jan-21 | The USD 2 billion Saudi support to CBY Aden for staple food import financing is nearly depleted. IRG suspends fuel imports through Al Hodeidah, after average imports from September to December 2020. |
| Feb-21 | The AA launch an intensive offensive to take over Marib governorate. |
| Mar-21 | COVID-19 total cases double in one month. |
| Jul-21 | Exchange rate reaches YER 1,000 per USD. |
| Aug-21 | Yemen gets USD 665 million of IMF reserves in new SDR allocation. |
| Oct-21 | AA advance in Marib. |
| Nov-21 | Aden raises price of bread in IRG areas, citing rising production costs including high price of wheat flour. |
| Dec-21 | A rapid improvement in the Yemeni rial following change in CBY leadership. |
| Feb-22 | Russia's invasion of Ukraine increases wheat prices by 30%. |
| Apr-22 | Truce announced; 41% decrease in casualty numbers. |
| Jun-22 | FAO announces a decline in world food prices for the third consecutive month. |
| Oct-22 | Truce fails. |
| Feb-23 | KSA deposits USD 1bn in CBY Aden, impacting IRG areas. |
| Jul-23 | WFP cuts food rations in Yemen by 35%. |

Table A 9: Cross-reference of alerts and alarms against key events reported during the expert consultation

Please see Table A 8 for detailed events.

Columns: Food security risk indicators [figure]; Food security event description.

March 2015

The advancement of AA/Saleh forces towards the south prompts a military intervention by the Saudi-led coalition in Yemen, resulting in the suspension of oil and gas production and export activities. This period is characterized by a steep increase in risks, as reflected in escalations across most monitored indicators. Importantly, the conflict indicator rises prior to March, followed by additional warnings triggered by the displacement indicator and then the fuel price indicator.

July 2015

Prior to the retaking of Aden by Saudi-backed government forces, the conflict indicator is positive at alarm levels, indicating critical risks. Again, the elevation in the conflict indicator preceded the elevation in the displacement indicator. Following the retaking of Aden by Saudi-backed government forces, a notable decrease is observed in the number of escalations across several of the monitored indicators. The graph visualizes well how critical risks indicated by alarms subside first, while heightened risks indicated by the alerts persist for a longer period.

September 2016
President Hadi's decision to transfer the CBY headquarters from Sana'a to Aden results in the establishment of two competing Central Banks within Yemen. This period is characterized by a noticeable elevation in the exchange rate indicator, with the increase persisting over the subsequent months but with limited translation into critical alarms.

November 2018
The CBY in Aden implements a revised letters of credit system, featuring a below-market exchange rate to curb the rapid depreciation of the Yemeni rial between July and October 2018. These efforts towards macroeconomic stabilization contribute to a reduction in risks related to all price indicators. Food and fuel price alarms decline first, followed by rapid easing of alerts. Exchange rate alerts linger for a longer period. Throughout the entire period, conflict alerts persist in a fraction of areas.

June 2020
A fuel crisis erupts as the IRG announces the halt of fuel imports through Al Hodeidah, reacting to the AA's withdrawal of up to YER 45 billion from the 'special account' at CBY Al Hodeidah, funds that were earmarked for public salary payments. This crisis leads to minor escalations in price indicators, particularly due to disruptions in food supply chains, that result in several price indicator alerts.

December 2020
Following President Hadi's announcement of a cabinet reshuffle at the CBY in Aden, the Yemeni rial's value in IRG-controlled areas briefly plummeted to 917 YER per USD. The initial rapid appreciation of the Yemeni rial after the cabinet changes led to a temporary decrease in monitored risks. However, a subsequent depreciation of the rial ensued, renewing concerns that spill over from the exchange rate alone onto the food and fuel price indicator alerts.

July 2021
The exchange rate escalates to YER 1,000 per USD, marking a continued depreciation of the Yemeni rial.
Sustained currency devaluation contributes to persistence in the exchange rate alarms as well as spikes in food and fuel price indicators, thereby elevating the overall level of food insecurity concerns. This period illustrates that an increase in alarms, which signify critical risks, may be preceded by a longer period of alerts, which signify that risks are already elevated.

February 2022
The invasion of Ukraine by Russia results in a 30% increase in international wheat prices. Much of this increase occurred prior to the invasion date, as global markets were pricing in the risks against the backdrop of warnings by various agencies of the looming threat. The alarms visual provides an important example of the capability of indicators to front-run developments. Right after the invasion, another spike in alerts is triggered. The global price rise triggers a significant spike in the domestic food price indicators, reflected by the elevation of alerts and alarms and attributed to the surge in imported wheat costs and disruptions in food import channels.

April 2022
A truce is announced, leading to a 41% decrease in casualty numbers, reflected by subsiding conflict alerts. The earlier escalation in food and fuel price related food security concerns following the Russian invasion of Ukraine is alleviated by the truce, contributing to a decrease in the overall level of monitored risks related to prices.

October 2022
The previously announced truce fails. Despite the collapse of the truce, there was no significant uptick in conflict intensity, nor were there notable changes in food prices. The sustained period of relative calm ensured that the level of food insecurity concern remained stable after the truce's dissolution.
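
The pattern described in several of the entries above, where alerts build up before alarms and linger after alarms subside, can be illustrated with a minimal two-threshold sketch. Everything in it is an assumption made for illustration: the function names `classify` and `alert_lead`, the monthly series, and the threshold values of 10 and 25 are hypothetical and are not the data-driven optimal thresholds estimated in the paper.

```python
from typing import List, Optional


def classify(values: List[float], alert_threshold: float,
             alarm_threshold: float) -> List[str]:
    """Label each observation 'none', 'alert' (elevated risk) or 'alarm' (critical risk)."""
    if alarm_threshold < alert_threshold:
        raise ValueError("alarm threshold must not be below the alert threshold")
    labels = []
    for v in values:
        if v >= alarm_threshold:
            labels.append("alarm")
        elif v >= alert_threshold:
            labels.append("alert")
        else:
            labels.append("none")
    return labels


def alert_lead(labels: List[str]) -> Optional[int]:
    """Count consecutive 'alert' periods immediately before the first 'alarm'."""
    if "alarm" not in labels:
        return None
    i = labels.index("alarm") - 1
    lead = 0
    while i >= 0 and labels[i] == "alert":
        lead += 1
        i -= 1
    return lead


# Hypothetical monthly indicator values (e.g. year-on-year price change, in %).
series = [2.0, 6.0, 9.0, 12.0, 18.0, 26.0, 31.0, 22.0, 11.0, 4.0]
labels = classify(series, alert_threshold=10.0, alarm_threshold=25.0)

print(labels)
print("alert periods before first alarm:", alert_lead(labels))
```

In this toy series the indicator spends two periods at alert level before the first alarm, and another two at alert level after the alarms end, mirroring the ordering noted for July 2015 and November 2018.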
Annex IV: Correlation between individual indicators

Table A 10: Correlations between continuous indicator data
               Food price  Fuel price  Exchange rate  Displacement  Conflict  Drought
Food price              1        0.58           0.27          0.27      0.19    -0.05
Fuel price           0.58           1              0          0.41      0.24    -0.38
Exchange rate        0.27           0              1             0      0.08     0.16
Displacement         0.27        0.41              0             1      0.31     -0.2
Conflict             0.19        0.24           0.08          0.31         1    -0.18
Drought             -0.05       -0.38           0.16          -0.2     -0.18        1

Table A 11: Correlations between alerts
               Food price  Fuel price  Exchange rate  Displacement  Conflict  Drought
Food price              1        0.59           0.14           0.4      0.36     0.19
Fuel price           0.59           1            0.1          0.33      0.23     0.17
Exchange rate        0.14         0.1              1         -0.06      0.02    -0.05
Displacement          0.4        0.33          -0.06             1      0.43     0.24
Conflict             0.36        0.23           0.02          0.43         1      0.2
Drought              0.19        0.17          -0.05          0.24       0.2        1

Table A 12: Correlations between alarms
               Food price  Fuel price  Exchange rate  Displacement  Conflict  Drought
Food price              1         0.4           0.19          0.11      0.04     0.03
Fuel price            0.4           1           0.02          0.49      0.37     0.31
Exchange rate        0.19        0.02              1          -0.1     -0.12    -0.02
Displacement         0.11        0.49           -0.1             1      0.42     0.29
Conflict             0.04        0.37          -0.12          0.42         1     0.31
Drought              0.03        0.31          -0.02          0.29      0.31        1
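
As a reading aid for the correlation tables, the short sketch below transcribes the values of Table A 10 and ranks the indicator pairs by the absolute size of their correlation. The helper names `is_symmetric` and `ranked_pairs` are introduced here for illustration only; the sketch does not reproduce how the correlations were estimated, which would require the underlying indicator series.

```python
from typing import List, Tuple

NAMES = ["Food price", "Fuel price", "Exchange rate",
         "Displacement", "Conflict", "Drought"]

# Values transcribed from Table A 10 (continuous indicator data).
TABLE_A10 = [
    [1.00, 0.58, 0.27, 0.27, 0.19, -0.05],
    [0.58, 1.00, 0.00, 0.41, 0.24, -0.38],
    [0.27, 0.00, 1.00, 0.00, 0.08, 0.16],
    [0.27, 0.41, 0.00, 1.00, 0.31, -0.20],
    [0.19, 0.24, 0.08, 0.31, 1.00, -0.18],
    [-0.05, -0.38, 0.16, -0.20, -0.18, 1.00],
]


def is_symmetric(matrix: List[List[float]]) -> bool:
    """A correlation matrix must equal its own transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def ranked_pairs(matrix: List[List[float]],
                 names: List[str]) -> List[Tuple[str, str, float]]:
    """List each off-diagonal pair once, sorted by absolute correlation (largest first)."""
    pairs = [(names[i], names[j], matrix[i][j])
             for i in range(len(names)) for j in range(i + 1, len(names))]
    return sorted(pairs, key=lambda p: -abs(p[2]))


print("symmetric:", is_symmetric(TABLE_A10))
for a, b, r in ranked_pairs(TABLE_A10, NAMES)[:3]:
    print(f"{a} / {b}: {r:+.2f}")
```

Ranking by absolute value keeps strong negative relationships, such as the one between fuel price and drought, visible alongside the positive ones.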