Policy Research Working Paper 9886

Estimating Food Price Inflation from Partial Surveys

Bo Pieter Johannes Andrée

Development Data Group & Fragility, Conflict and Violence Global Theme
December 2021

Abstract

The traditional consumer price index is often produced at an aggregate level, using data from few, highly urbanized, areas. As such, it poorly describes price trends in rural or poverty-stricken areas, where large populations may reside in fragile situations. Traditional price data collection also follows a deliberate sampling and measurement process that is not well suited for monitoring during crisis situations, when price stability may deteriorate rapidly. To gain real-time insights beyond what can be formally measured by traditional methods, this paper develops a machine-learning approach for imputation of ongoing subnational price surveys. The aim is to monitor inflation at the market level, relying only on incomplete and intermittent survey data. The capabilities are highlighted using World Food Programme surveys in 25 fragile and conflict-affected countries where real-time monthly food price data are not publicly available from official sources. The results are made available as a data set that covers more than 1200 markets and 43 food types. The local statistics provide a new granular view on important inflation events, including the World Food Price Crisis of 2007–08 and the surge in global inflation following the 2020 pandemic. The paper finds that imputations often achieve accuracy similar to direct measurement of prices. The estimates may provide new opportunities to investigate local price dynamics in markets where prices are sensitive to localized shocks and traditional data are not available.

This paper is a product of the Development Data Group, Development Economics and the Fragility, Conflict and Violence Global Theme.
It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may be contacted at bandree@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Produced by the Research Support Team

Estimating Food Price Inflation from Partial Surveys

By Bo Pieter Johannes Andrée*

JEL: C01, C14, C25, C53, O10.
Keywords: Inflation, Food Security, Financial Stability, Machine Learning.

* The World Bank, Development Economics, Data Group, Analytics and Tools Unit. The author may be contacted at bandree(at)worldbank.org. The findings, interpretations, and conclusions expressed in this paper are entirely those of the author. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. This work is part of the program “Building the Evidence on Protracted Forced Displacement: A Multi-Stakeholder Partnership”.
The program is funded by UK aid, is managed by the World Bank Group (WBG), and was established in partnership with the United Nations High Commissioner for Refugees (UNHCR). The scope of the program is to expand the global knowledge on forced displacement by funding quality research and disseminating results for the use of practitioners and policy makers. This work does not necessarily reflect the views of the UK government, the WBG or UNHCR. The author would like to thank Paolo Verme, Aart Kraay, Phoebe Spencer, Olivier Dupriez, Kamwoo Lee, Andres Chamorro, Nadia Piffaretti, Benjamin Stewart, Roberta Gatti, Daniel Lederman, and Talip Kilic for detailed feedback, inputs and rich discussions on early results.

Low inflation and price stability have historically been associated with improved growth and development outcomes (The World Bank, 2019; Ha et al., 2019b,a). Inflation disproportionately taxes the purchasing power of low incomes (Easterly and Fischer, 2001), in contrast with the proportional income effects of economic growth itself (Dollar and Kraay, 2002). As such, price stability and low inflation are crucial to achieving the Sustainable Development Goals, and price data are critical for economic monitoring. Food prices specifically play a critical role in determining the immediate ability of households to consume food (Andreyeva et al., 2010; Waterlander et al., 2019) and the capacity of farmers to plan the investments needed for reliable future production of food (Koomen et al., 2015; Andrée et al., 2017; Diogo et al., 2017). Food price inflation thus serves as an important metric to inform economic policy and is closely watched by both economists and humanitarians during times of financial turmoil and economic crisis.1 The traditional figures on inflation used for these purposes have primarily been produced at an aggregate level using data from a few highly urbanized areas.
As such, they do not directly describe price evolution in rural or poverty-stricken areas, or in regions where vast populations are internally displaced and live in fragile situations.2 For example, food crises are often characterized by great spatial heterogeneity and geographic specificity (Maxwell et al., 2020), while traditional price indicators do not provide insights beyond the major (urban) markets where prices are formally measured.3 Traditional indicators are also often produced with delay, or not at all, particularly during crises when economic variables deteriorate rapidly.4 Recognizing these limitations, international institutions have invested heavily in subnational price survey systems. However, the capacity to monitor inflation continues to be severely limited by challenges related to data gathering, such as lack of resources, loss of access to markets, or disturbances in ground operations during crises and conflict outbreaks. Despite tremendous effort, missing data remain pervasive, which poses problems when one wants to compute real-time inflation measured over multiple price series. Whenever a single price point is absent, the required index cannot be computed without making assumptions about missing data.

1 The significance of price inflation as a signal of deteriorating food security can be highlighted by noting that the recent crises in South Sudan and the Republic of Yemen are characterized by high inflation. Historically, the Bengal famine of 1943 followed a period of hyperinflation, starvation in the Weimar Republic occurred alongside record hyperinflation, the 1980 Uganda famine occurred after double- to triple-digit inflation rates, the 1992 famine in Southern Somalia was preceded by a year of triple-digit inflation, and the 1998 famine in southern Sudan was preceded by years of double-digit inflation. The use of food price indicators to track famine risks is documented more substantively by Seaman and Holt (1980); Cutler (1984); Khan (1994); Andrée et al. (2020); Wang et al. (2020).

2 Naturally, there are other critiques of inflation calculations, some as old as the methodologies themselves. Some accusations of both upward and downward biases are discussed by Reinsdorf et al. (2009). Often, conflicting views on inflation can be traced back to differences in (baskets of) goods and data sampling locations, so for some applications improved clarity can be gained by defining indexes more narrowly, which the results of the paper may also help enable.

3 See also the recent economic update for South Sudan (World Bank, 2021). The work compares food prices subnationally and finds that increasing prices are the most significant factor driving recent food insecurity, but documents strong spatial heterogeneity both in market price dynamics and in their relation to food insecurity, and provides examples of markets where prices are more sensitive to localized shocks.

4 For instance, the International Financial Statistics (IFS) database of the IMF reports monthly price data at the Consumer Price Index (CPI) component level, but few developing countries report food price data without substantial delays. Of the 25 countries analyzed by the paper, none presently reports real-time monthly official food price data; half had not reported food price data for the preceding 12 months. Annual statistics are similarly far from complete. Of all countries in the World Bank's WDI, only 60% (40% of the analyzed countries) had an inflation figure for the most recent 2019–2020 period. The IFS data reported similarly on only half of the countries. This was last checked on August 31, 2021; WDI data identifier FP.CPI.TOTL.ZG, and IFS data identifier PCPI PC PP PT.
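A minimal sketch of why a single gap blocks the index, assuming the simplest possible index: an unweighted basket total compared across two periods. The function name `basket_change` and the quotes are hypothetical and not taken from the paper's data or code.

```python
def basket_change(prev_prices, curr_prices):
    """Percent change in an unweighted basket total between two periods.

    A missing quote (None) in either period leaves the basket total
    undefined, so the function returns None rather than guessing.
    """
    if any(p is None for p in prev_prices + curr_prices):
        return None
    return 100.0 * (sum(curr_prices) / sum(prev_prices) - 1.0)


# Hypothetical quotes for three food items in one market, two consecutive months.
print(basket_change([10.0, 20.0, 30.0], [11.0, 22.0, 33.0]))  # about 10.0
print(basket_change([10.0, 20.0, 30.0], [11.0, None, 33.0]))  # None
```

Any rule that turns the `None` into a number (carrying the old quote forward, dropping the item, interpolating) is exactly the kind of assumption about missing data referred to above.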
To overcome some of the shortcomings associated with traditional systems, this paper develops an approach for real-time imputation of survey data drawing on multiple imputation and ensemble learning ideas. The imputation is generated as an ensemble prediction from multiple completed price trajectories. Each price trajectory is the result of a stochastic equation chain of machine learning models that leverage correlations between prices of individual food items and similarities between markets to predict missing local price quotes. The idea is similar to bootstrap aggregation (bagging) of base learners (Breiman, 1996), except that randomization in the base learners is achieved not by random sampling but by a stochastic simulation of chained model predictions. By augmenting incomplete and intermittent survey data with reliable predictions, subnational food price trends can be monitored continuously. The paper highlights the new price monitoring capabilities using surveys from the World Food Programme (WFP) gathered in 25 fragile and conflict-affected countries. The final estimates of food price inflation documented in this paper are shown to capture important local inflation events related to conflict, as well as known food crises, including the World Food Price Crisis of 2007–08 documented by Baffes et al. (2008) and Conceição and Mendoza (2009), and the surge in inflation that followed the 2020 pandemic and the subsequent expansion in the global monetary base. The results include inflation estimates for a large number of countries where these data have recently not been available, and the analysis documents new insights into the characteristics of recent and past inflation events. The granular high-frequency results provide an alternative price estimate that is particularly relevant in data-poor, lower-income regions, where the capacity to maintain in-depth price monitoring programs that rely on traditional CPI methods is often limited.
The focus on low-income countries contrasts with past work, which has focused more on data-rich and higher-income areas. For example, past related work has developed forecasting methods for future CPI (Joutz, 1997; Gavin and Mandal, 2002) and for future commodity spot and futures prices at global markets (Ahumada and Cornejo, 2016; Ouyang et al., 2019), as well as methods for continuously updating current expectations of future country-level CPI inflation using information from alternative sources (Modugno, 2011; Seabold and Coppola, 2015). Finally, the methods proposed by the paper could also be applied to enhance other data gathering programs, improve their cost-effectiveness by replacing a subset of expensive surveys with targeted predictions, and advance broader economic monitoring in data-poor regions.

The remainder of the paper is structured as follows. The next section provides a brief overview of the type of price surveys that can be used to deploy the new monitoring capabilities and discusses some basic challenges encountered when trying to read inflation from the raw data. Section II details the imputation strategy and Section III presents results. For those interested in analyzing the subnational price and inflation estimates further, all results for the 25 countries analyzed are available to be explored interactively.5 Section IV provides concluding remarks and makes several recommendations for future data gathering.

I. Food Price Survey Data

Subnational food prices have been surveyed in many countries for years by humanitarians to inform their country operations.
Well-known databases are those from the WFP, FEWS NET and the Food and Agriculture Organization (FAO).6 The paper focuses on raw monthly data from the WFP, but parts of the discussion, and particularly the methods developed here, could apply to similar data sets.7

The paper gathered all end-of-August data available from the WFP Vulnerability Analysis and Mapping (VAM) unit as of September 21, 2021. The WFP data report prices as measured in different market locations throughout each country. Price quotes are supplied by WFP country and regional offices, and by local partners including the FAO. In total, the database covers over 2,000 markets across 99 countries and reports monthly data on the prices of goods that vary by country and market. Price monitoring dates back to the early-to-mid 2000s in most countries, while in a few countries monitoring began as early as the 1990s. The number of markets varies strongly by country, ranging from 1 to well over 100. The methods developed by the paper are particularly useful when there are multiple markets, as otherwise official CPI data may provide sufficient, if not better, insight. The goods for which prices are collected, and the manner in which these prices are measured, in turn vary widely across markets and may change over time. For example, prices of some goods are collected at the retail level in some markets, while in others they are collected at the wholesale level and include discounts typical for large purchases. Generally, prices are measured in nominal local currency per unit of commodity (e.g. shillings per 10 kg of maize), but in some countries USD quotes on selected goods may exist alongside local currency quotes. As a result, the full data set is highly heterogeneous and challenging to work with. For example, while it contains common staples like rice, sorghum, and maize that are monitored in many countries, there are also items specific to only one country.

5 With the publication of the paper, new monthly food price estimates by product and market, using the methods and data described in this paper, will be made available on the World Bank Microdata Library. They can be viewed and downloaded at https://doi.org/10.48529/2ZH0-JF55 or, currently, https://microdata.worldbank.org/index.php/catalog/4218. The citation is Andrée (2021).

6 There are also country-specific data gathered by FSNAU (Somalia) and CLiMIS (South Sudan). More recently, the International Food Policy Research Institute has piloted gathering high-frequency prices in several countries.

7 For instance, estimates have also been developed for Papua New Guinea in collaboration with IFPRI in a separate pilot project. Results can be requested from the author.

The focus of the paper is purely on monitoring food price inflation in a reasonably cross-comparable manner.8 The strategy has been to extract from the very large price data set a stable set of commodity prices that are defined as homogeneously as possible across countries, that are as widely available as possible across markets, and that have the best coverage over time. The period of analysis starts in January 2007, or at the first subsequent date at which data were available. After carefully examining the prevalence of price data across commodities, markets, and time for each country, 43 foods were identified for which price data are reasonably abundant across multiple markets in at least one country. Tables A2 and A3 list the selected food items by country. The foods are either staples, agricultural produce, or dairy products.9 Aside from non-foods, the selection excludes only fish and meat products, due to the very high dietary heterogeneity in these foods.
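The availability screen behind the item selection can be illustrated with a hypothetical rule set. The sketch below assumes two illustrative thresholds, a minimum share of observed market-month cells and a minimum number of markets with any data; the names `select_items`, `min_coverage`, `min_markets`, the threshold values, and the toy panel are all assumptions, not the paper's actual rules.

```python
def select_items(panel, min_coverage=0.3, min_markets=2):
    """Keep food items whose price data are reasonably abundant.

    `panel` maps an item name to a list of per-market monthly series, in
    which None marks a missing quote. Both thresholds are illustrative.
    """
    selected = []
    for item, markets in panel.items():
        cells = [quote for series in markets for quote in series]
        if not cells:
            continue
        # Share of market-month cells with an observed quote.
        coverage = sum(q is not None for q in cells) / len(cells)
        # Number of markets with at least one observed quote.
        n_markets = sum(any(q is not None for q in series) for series in markets)
        if coverage >= min_coverage and n_markets >= min_markets:
            selected.append(item)
    return selected


# Hypothetical panel: three markets, four months.
panel = {
    "maize": [[50, 52, None, 55], [48, None, 51, 53], [None, None, 49, 50]],
    "cassava": [[None, None, None, 30], [None] * 4, [None] * 4],
}
print(select_items(panel))  # ['maize']
```

Because both thresholds are computed over the chosen window, shortening the period of analysis changes which items pass, which is consistent with the selection depending on the period of analysis.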
The resulting country-specific baskets thus consist mostly of staples, often similar in nature to the type of foods traded at global markets. For example, maize, sorghum, millet, wheat, and vegetable oil, to name a few common food items, are also tracked in the World Bank Commodity Price Data (The Pink Sheet), which is used to construct the World Bank Food Price Index used to track international food price developments. In most cases, the different food items can substitute for one another to certain degrees, or may act as complements, so that the prices of most foods will be strongly associated with the price developments of others. For example, in Afghanistan there are only four food items: bread, rice, wheat and wheat flour. These prices are likely strongly associated. First, rice and wheat are both cereals and likely to move together in price, as either will replace the other as a source of carbohydrates to a certain degree if their price ratio diverges too strongly. Second, the price of wheat is likely a good predictor of the price of wheat flour, which in turn is used to make bread. Thus, a regression chain that cycles over the individual food items can make good imputations for all products as soon as the price of one of these goods is available. When more food items are observed, the chances increase that, for each item with a missing price quote, at least one other item can be found that carries very strong predictive power over its price.10

Table A1 summarizes the raw data availability in each country. The table shows that the data of the analyzed countries on average include 7 food items measured across 27 markets, with around half of the survey data missing. The number of markets is generally a strong improvement in geographic detail when compared to traditional price indicators. For example, World Bank (2021) analyzes official CPI at the subnational component level in South Sudan. The analysis highlights the limits of relying purely on primary data collection, as the indexes are only compiled for the three major urban areas and not for any rural areas. This is despite an estimated 80% of the country's population living in rural areas, and the fact that only a fraction of the remaining urban population buys goods at these three markets. Many urban households rely on markets that are less connected to international markets than the primary markets in the capital city.

8 In several countries, the data cover non-food items that include common household items, prevailing day labor wages and local exchange rates, but generally the availability of such data is highly specific to the selected area. The proposed statistical methods, however, do not necessarily discriminate between types of commodities and could be applied to generate country-specific results on non-foods.

9 The items covered are bread, cooking oil, pulses, rice, sugar, wheat, wheat flour, beans, groundnuts, maize, millet, sorghum, bananas, cassava flour, maize flour, onions, potatoes, tomatoes, cassava, cocoyam, cowpeas, eggs, milk, peas, plantains, sesame, cabbage, carrots, cucumbers, dates, fonio, garlic, oranges, tomato paste, watermelons, yogurt, bulgur, cheese, chickpeas, lentils, maize meal, gari, and parsley. The selection of these items has been fully codified in a set of flexible rules around minimum data-availability thresholds. The selected items therefore depend on the period of analysis. Generally, if the focus is on recent data, without requiring a long history, a higher number of food items could be selected. The code to scrape and run the price monitor has been made available as open source and can be customized for improved country-specific results.
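Coverage of price levels overstates how much inflation can be read directly from raw surveys, because a change rate needs quotes at both ends of a fixed interval. A minimal sketch, assuming a single hypothetical market series in which only every other month is surveyed (None marks a gap); the function names are illustrative.

```python
def coverage(series):
    """Share of months with an observed quote (None marks a gap)."""
    return sum(q is not None for q in series) / len(series)


def matched_coverage(series, lag):
    """Share of months t >= lag in which both t and t - lag are observed,
    i.e. where a change rate over a fixed interval can be read directly
    from the raw quotes."""
    pairs = [(series[t - lag], series[t]) for t in range(lag, len(series))]
    return sum(a is not None and b is not None for a, b in pairs) / len(pairs)


# Hypothetical market series over 24 months: only even months are surveyed.
series = [100 + t if t % 2 == 0 else None for t in range(24)]

print(coverage(series))                  # 0.5
print(matched_coverage(series, lag=1))   # 0.0  (no month-on-month change)
print(matched_coverage(series, lag=12))  # 0.5  (year-on-year changes)
```

The pattern is contrived, but it shows that a series with half of its levels observed can still yield no month-on-month inflation at all without imputation.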
In total, the application uses data from 676 different markets. There are an additional 547 markets without observations, which are markets where WFP tracks commodities that are not covered by the application. These locations are not modeled but are spatially interpolated separately. Table A1 provides a country breakdown of data availability for the country-product combinations analyzed by the paper. Data coverage of around 70%, measured as the percentage of all time × location cells with data, is relatively high, while 30% can be considered relatively low. Naturally, these figures depend on the selection of markets, food items, and the period of analysis. Higher data coverage could be achieved by focusing on a shorter time period, working with fewer markets, or tracking a very narrow basket. The statistics are thus not representative of the WFP database, but reflect the data selection of the paper, which seeks to balance reasonable data coverage and data availability.

The raw price data of these selected food items remain a challenging source to deal with. Available data are regularly contaminated by outliers, which could be due to incorrect survey entries (e.g. a misplaced digit). Moreover, many local price series are incomplete, and price quotes of different products may become available at different times and locations. The data availability constraints become more problematic when the interest is in tracking inflation, which requires price quotes to be matched with historical quotes using a fixed time interval. For any given country, if the data selection were made such that the data coverage was 100%, the selection would collapse to zero cases unless the focus were on an extremely narrow selection of data points that introduces strong sampling biases. Such a selection can also only be made ex post, so it is only useful for historical analysis of prices and not for real-time monitoring.

10 There may also be nonlinear patterns that carry important information about unobserved prices. First, a change in the price ratio between two prices may indicate that a certain food of interest will likely trade at an elevated price ratio to some other food. Second, governments may use price controls, so certain prices may remain fixed for long periods. This can, for example, be observed in the bread prices in Afghanistan (Andrée, 2021). These fixed prices regularly break once input prices have risen too sharply, and such level shifts in the data can signal that all price levels must seek new equilibria. In that regard, there may also be time-varying relations between various prices that, for example, change depending on key level shifts in the prices of certain foods.

II. Methods

To overcome some of the shortcomings associated with the raw data, this section develops an approach for real-time imputation. The idea is to create an ensemble prediction from multiple completed market-level price trajectories, produced by simulating stochastic equation chains of predictive machine learning models that leverage correlations between prices of individual food items and similarities between markets. The completed data can then be used to construct regional or local food price indexes.

A. The missing data problem

Table 1 visualizes the general missing data problem.

Table 1. Example of the missing data problem.

t    A        B          C
1    $a_1$    $b_1$
2             $b_2$      $c_2$
3    $a_3$               $c_3$
4    $a_4$    $b_4^*$
5
6             $b_6$      $c_6$

Note: Example of the missing data problem: three hypothetical vectors A, B, and C that represent price series, with elements $a_t$, $b_t$ and $c_t$ being individual price quotes indexed by time periods $t$. Blank entries represent missing observations. The challenge is to estimate change rates $\Delta P$ of the basket price vector $P = A + B + C$ that spans all $t = 1, \dots, 6$. Element $b_4^*$ is an example outlier price which needs to be removed and replaced with an estimate.
Source: Example has been prepared by the author for this paper.

The literature has put forward many solutions for time series interpolation and extrapolation. It is useful to discuss briefly why some standard tools cannot be used reliably here. Simple data gaps in univariate time series can often be linearly interpolated or filled with last-observation-carried-forward imputation. Both approaches can be adequate if just one observation is missing in a sequence. When several values are missing in a row, the results may rapidly become unrealistic. For example, $a_4$ can only be carried forward, since $a_5$ and $a_6$ are missing. This introduces a lag with which price increases are observed. When prices are rising, last-observation-carried-forward imputation introduces a downward bias in the price average. For example, using this method we would approximate $\hat{p}_6 \approx a_4 + b_6 + c_6 \ll a_6 + b_6 + c_6$ when $a_6 \gg a_4$. As shall become clear from the results, price levels in low-income countries can move fast in either direction, so that old price data quickly lose their relevance.11

Kalman filtering is a popular method that is known to produce results that are optimal in common settings. These methods are discussed at length by Durbin and Koopman (2013). The state of the methodologies at present, however, is such that the current application brings together too many issues to derive a state space specification that generalizes across all data situations. Essentially, a good approach may be specified for narrow studies of prices within any given country individually, but likely not across all prices and all countries.12

B. The predictive imputation framework

The literature on missing data often distinguishes between different mechanisms of missing data: MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random).
This taxonomy, introduced by Rubin (1976) and well-explained by van Buuren (2012), becomes important when missing data occurs at the covariate side, and different corrections are needed for unbiased inference depending on the type of missing data problem. In the current application, it is required that the data is at least MAR and possibly MNAR and that the value of one food item can be accurately predicted by making reference to prices of other food items and the location in space and time.13 As such, the 11 Closely related interpolation methods are plagued by similar issues. For example, linear interpolation takes information from the future to the past. The linear interpolation estimate for c5 can only be made after c6 has already been observed. Alternatively, moving average imputation based on past values results in biases when used for real-time monitoring. When prices are rising, a moving average imputation of a6 would be based on an average of lower prices and a real-time monitoring system using moving average imputation would regularly suggest false trend reversals. These simple interpolation approaches are also highly influenced by outliers. Interpolating linearly from b2 to b6 , through outlier b∗ 4 , replicates information into two new outliers b∗ ∗ 3 and b5 . More flexible spline methods are known to introduce spurious cycles in time series data and may result in explosive extrapolation due to their quadratic or higher order nature, particularly in the presence of outliers. 12 Optimality in a Kullback-Leibler or Root-Mean-Square sense is only under restrictive specification assumptions which require tedious design effort and diagnosing. Generally, Kalman filtering estimates time varying states and uses current state estimates and observations to estimate future states. When observations are missing, the imputation will rely fully on the transition equation. This means that the dynamic properties deteriorate rapidly during periods of large data gaps. 
This may in part be tackled by considering multivariate state space approaches, so that realistic predictions for one sequence can be made by utilizing recent information from another sequence. Fast multivariate filtering implementations that could be deployed at scale are explored, for example, by Koopman and Durbin (2000). Because of outliers and general non-Gaussian behavior, non-Gaussian outlier models may need to be considered; the univariate context is explored, for example, by Shephard (1994) and Koopman et al. (2019). The above methods all rely on restrictive linearity assumptions; nonlinear filtering is explored by Wan and Van Der Merwe (2000). Deciding between approaches is complicated by the variability in the quality and quantity of data, the patterns of missingness, and the dynamic properties of the data-generating process. The considerations are likely unique to each country-specific study. Finally, time-series predictions are based on historical information and are not designed to smooth data that is simultaneously missing in multiple trajectories by exploiting contemporaneous relations between intermittently observed time series with limited overlap.

13 From the traditional missing data perspective, the MCAR situation is an easy problem, as it effectively implies that simply dropping incomplete cases will not bias subsequent inference. On the other side of the spectrum are MNAR problems, in which the probability of a case being missing varies systematically for reasons that are unknown. In this case, data is needed that explains why certain observations are missing. From an inference perspective, MNAR problems are not even salvaged by predictive regression-based methods that make predictions for missing data based on other covariates. These methods simply amplify existing correlations and, acting as low-pass filters, reduce variability in the data. This leads to over-confident inferences. Stochastic methods that specify the mechanism of missingness and estimate a distribution for each missing data point are then needed (see van Buuren (2012)). In the current paper, this taxonomy is less relevant, as the interest is in obtaining the single most likely value for a missing data point and not in correcting the standard errors of some final regression model. From the perspective of a pure prediction problem, the MCAR problem is actually the hardest, as it implies that no useful information exists to make predictions. Instead, prediction requires that the data is MAR or MCAR and that the value of a missing data point can accurately be predicted using the values of other covariates. The only word of caution is then that the predicted values are based on patterns in the observed data and underestimate the total variance in the unobserved data. Section B.B4 develops methods to model the conditional heteroskedastic variance throughout the data to provide a plausible estimate of price variance at lower time frames.

focus is entirely on specifying highly accurate prediction models for the expected prices.

A standard regression-based prediction approach would fill missing entries in A using data from (B, C), fill missing entries in B using information from (A, C), and so on. Two main issues need to be solved to move forward with this idea. First, few matchable entries may exist. For example, only 8 out of 24 entries are missing in table 1, so the data is 67% complete. However, there are zero cases with elements in A, B and C that can be contemporaneously matched, so a regression cannot be fit even though price information is relatively abundant. Second, if missing entries in A are updated, the information would certainly be relevant for the imputations in B and C. The information held in the imputations in B and C, on the other hand, may lead one to find a better model to adjust the imputations in A, and so on. The problems of simultaneity and few matchable entries are solved using a chained equations approach. The application here essentially adapts the Multiple Imputation using Chained Equations framework, discussed extensively by Rubin (1996); van Buuren and Groothuis-Oudshoorn (2011); van Buuren (2012); Murray (2018), for predictive accuracy, using ensemble methods to produce a stable prediction for missing data. In particular, the approach starts by estimating the following regression function:

$$(a_1, a_3, a_4) = f(b_1, \hat b_3, \hat b_4^{*}, \hat c_1, c_3, \hat c_4). \tag{1}$$

In this regression, $\hat b_4^{*}$ is an estimate that has replaced the outlier $b_4$, and $\hat b_3$, $\hat c_1$, $\hat c_4$ are previously generated imputations. These values need to be initialized at some starting value; a suitable initialization and outlier-replacement method is discussed in section E. For now, assume these elements are initialized with some plausible guesswork. After estimating the regression, the function $\hat f$ can be used to update entries $t = 2, 5, 6$ in variable A by generating the predictions

$$(\hat a_2, \hat a_5, \hat a_6) = \hat f(b_2, \hat b_5, b_6, c_2, \hat c_5, c_6). \tag{2}$$

Next, a new regression can be constructed with the elements $(b_1, b_2, b_6)$ from B on the left-hand side and the matching entries from A and C on the right-hand side. As seen in table 1, the updated values $\hat a_2$ and $\hat a_6$ generated with equation 2, using patterns learned in equation 1, would now be on the right-hand side, so the covariates have been improved by the previous modeling step. This process can be repeated iteratively until all elements have been updated several times and new updates no longer improve subsequent predictions. For each food item $d \in \{1, \dots, D\}$, in this example $D = 3$, this involves a chain of regression functions $\hat f_{d,i}$, where $i \in \{1, \dots, I\}$ indexes the updating iteration.
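The chained-equations loop, together with the ensemble average that the next paragraph formalizes as equation (3), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy values are hypothetical, each link of the chain is a small ridge regression standing in for the elastic net and cubist models of section C, missing entries start at the column mean plus a uniform disturbance whose width is 10% of the observed range, and outlier replacement is not modeled. All function names are illustrative.

```python
# Illustrative sketch only: toy data, ridge links and function names are hypothetical.
import random
from statistics import mean


def solve(mat, vec):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(vec)
    aug = [list(row) + [v] for row, v in zip(mat, vec)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ridge_fit(X, y, lam=1e-2):
    """Ridge regression with an unpenalized intercept; returns a prediction function."""
    k = len(X[0])
    xbar = [mean(row[j] for row in X) for j in range(k)]
    ybar = mean(y)
    Xc = [[row[j] - xbar[j] for j in range(k)] for row in X]
    yc = [v - ybar for v in y]
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(k)]
    beta = solve(xtx, xty)
    intercept = ybar - sum(b * m for b, m in zip(beta, xbar))
    return lambda row: intercept + sum(b * x for b, x in zip(beta, row))


def chained_imputation(data, iterations=8, seed=0, jitter=0.10):
    """One regression chain. `data` maps item name -> list of values, None = missing."""
    rng = random.Random(seed)
    names = list(data)
    n = len(data[names[0]])
    filled = {}
    for name in names:  # start missing entries at the mean plus a uniform disturbance
        obs = [v for v in data[name] if v is not None]
        width = jitter * (max(obs) - min(obs))
        filled[name] = [v if v is not None else mean(obs) + rng.uniform(-width / 2, width / 2)
                        for v in data[name]]
    for _ in range(iterations):          # updating iterations i = 1, ..., I
        for target in names:             # cycle over food items d = 1, ..., D
            others = [o for o in names if o != target]
            rows = [t for t in range(n) if data[target][t] is not None]
            f_hat = ridge_fit([[filled[o][t] for o in others] for t in rows],
                              [data[target][t] for t in rows])
            for t in range(n):           # only missing entries are overwritten
                if data[target][t] is None:
                    filled[target][t] = f_hat([filled[o][t] for o in others])
    return filled


def ensemble_impute(data, M=5, iterations=8):
    """Average the final-iteration imputations of M independently seeded chains."""
    chains = [chained_imputation(data, iterations, seed=m) for m in range(M)]
    return {name: [mean(chain[name][t] for chain in chains) for t in range(len(data[name]))]
            for name in data}


# Toy log prices with the missingness pattern of the A, B, C example (None = not surveyed).
prices = {
    "A": [2.30, None, 2.48, 2.56, None, None],
    "B": [3.00, 3.09, None, 3.26, None, 3.40],
    "C": [None, 2.48, 2.56, None, None, 2.77],
}
imputed = ensemble_impute(prices, M=5, iterations=8)
print([round(v, 3) for v in imputed["A"]])
```

Observed entries pass through unchanged; only the gaps are re-predicted in each pass, which is what lets imputations made for one item feed the regressions of the others.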
The iterative sequence of regressions involved in updating the imputations is referred to as a regression chain, because the predictions made by the regression fitted in the previous step feed into the inputs of the next regression model. The result after $I$ iterations will differ each time, depending on several factors, such as the values at which the missing data in the first regression equation are initialized, the order in which the algorithm cycles through the data, and random elements of modeling (for example, taking bootstrap samples or using stochastic methods to generate synthetic training data). Hence, the process is repeated $M$ times, thus simulating a function space indexed by $\hat f_{d,i,m}$. Because the properties of the simulated values for missing data change throughout the chain, the chained equations method makes it possible to find prediction models beyond what could be estimated with observed data alone. More details are provided in appendix B.B1. An ensemble predictor is finally constructed by averaging the $M$ prediction results from the regression models at the tail of the simulation chain, $\hat f_{d,I,m}$. In particular, after generating $M$ imputations, the final imputation for a price element $\hat x_t \in (A, B, C)$ is generated by calculating the ensemble average

$$\hat x_t = \frac{1}{M} \sum_{m=1}^{M} \hat x_t^{(i=I,\,m)}. \tag{3}$$

Since there are stochastic elements to the iterative algorithm, each prediction at iteration $I$ is generated from a different prediction model. Increasing $M$ improves both the stability of the stochastic result and the accuracy of the ensemble prediction. This is similar to bootstrap aggregating, a prediction improvement technique central to the Random Forest algorithm of Breiman (2001), in which multiple random simple base learners are combined to improve stability and accuracy by canceling out random prediction errors.
The key difference is that bootstrap aggregation produces multiple learners by taking random draws of the training data, while the randomization in learners at iteration $I$ results from different stochastic simulations of chained model predictions; see section B.B2 for more details on the derivation of the stochastic ensemble predictor.

C. The regression specification

The methods are applied to each country individually, so the desired flexibility of the prediction models varies. Moreover, the properties of the data change throughout the simulation, so the desired flexibility of the model should change accordingly, gradually exchanging robustness for flexibility.14 The paper considers two implementations depending on data availability in the country. When data is scarce, an elastic net model is used (implemented by Hastie et al. (2021)), which helps reduce the impact of uninformative predictors (Friedman et al., 2010). When data is abundant, a cubist regression is used following Quinlan (1992); Witten et al. (2016).15 This is a piece-wise linear model that combines decision trees, boosting, and neighborhood smoothing to capture smoothed versions of Random Forest-type nonlinearities (Kuhn et al., 2012).16 The model differs from a Random Forest regression by using simple linear regressions at terminal nodes, thereby producing smoother transitions, which are more typical for numeric data, and enabling short-range out-of-sample extrapolation based on local regressions. Due to their linear nature, these local extrapolations generally remain reasonably stable, as opposed to, say, neural network predictions that can rapidly turn explosive outside observed data intervals. Cubist models have done well on a variety of spatially oriented prediction problems, often reaching accuracy not far below that of deep learning methods while maintaining full model interpretability (Morellos et al., 2016; Ng et al., 2019; Sbahi et al., 2021). While the short-range extrapolation capabilities of cubist models are useful in the current setting, the most beneficial feature for this application is that Cubist runs much faster than its M5 cousin or other boosting or ensembling methods such as Gradient Boosting Machines and eXtreme Gradient Boosting (Hagenauer et al., 2019).

14 See for example the discussions in (Andrée et al., 2019; Andrée, 2020) on the relation between the sample size, strength of nonlinearity, and model size. Dynamic assumptions about the behavior of the process being modeled can be imposed by using non-parametric approaches whose parameterizations can arise flexibly. However, being overly flexible and over-fitting $f$ would lead to very poor predictions. To a certain degree, the ensemble prediction counters some of the prediction errors that arise from over-fitting. For example, similar to a Random Forest algorithm, if the prediction errors of models $m = 1, \dots, M$ at step $I$ are uncorrelated, the ensemble simply cancels them out, thereby increasing robustness as $M$ increases. Regardless, over-fitting can be problematic. It is particularly important to avoid an overly flexible model in the first estimation step. When $f$ is overly flexible, an incorrect initialization can be learned, particularly if there is a pattern of missingness that can locally be correlated to the levels in other prices. For example, if missing entries in A occur during elevated levels in B, then initializing missing entries in A at the unconditional mean and using a flexible model to parameterize $f$ can lead to a model that simply learns to use the mean of A as a predictor within distinct regions of elevated levels in B. In such a situation, the imputations do not improve across updating iterations; essentially, the model memorizes the initialization. To avoid the algorithm getting stuck, it is best to introduce a stochastic element around a reasonably correct initialization and to use techniques that can control the flexibility of $f$ across the iterations. In addition, it may be preferable to use only models for $f$ that produce reasonably smooth transitions in nonlinearity across levels, rather than very sharp cut-offs in local data associations. Since the entries on the covariate side are updated iteratively, more nuances could also be fit by the function $f$ as the algorithm iterates. This means that while the flexibility of $f$ should be relatively low in the first step, it could be beneficial to increase it across iterations. Finally, a variable selection mechanism is also advisable, as the importance of predictive features may change across iterations; see (Graham et al., 2012).

15 In total there are 186 different regression problems (the number of food items summed across countries), 72 of which are solved with a linear model. Thus, 61% of problems are solved nonlinearly. Both models achieve good results; for instance, in the Republic of Yemen and the Syrian Arab Republic most of the regression problems are solved linearly, but the cross-validated accuracy results are still good. The exact rules that determine whether a linear or nonlinear model is used are codified based on human judgment and are best viewed in the source code that has been made available. It would be possible to decide this by letting both models compete in a cross-validation exercise, but the runtime would increase beyond what is practical. The guiding principle has been that when few markets are available, or long temporal gaps exist, and the results need to be extrapolated far across the data dimensions, the linear model is used.

16 Cubist is an extension of M5 regression trees that incorporates pruning, neighborhood smoothing and boosting. Essentially, it uses a computationally efficient strategy to recursively partition the data space and fit simple piece-wise linear prediction models within each partition, whose predictions are combined using neighborhood averaging of local model predictions. The advantages over M5 are that it can produce smoother transitions across numeric outputs and has a much faster runtime, both of which are highly important to the current application. The advantage over Random Forest is that the cubist model has linear regressions at terminal nodes and so it can extrapolate slightly out of range, while Random Forests can only interpolate using medians or averages of typical values associated with ranges of the input data.

Two regression specifications are considered. Both regression models simply perform a spatio-temporal interpolation, leveraging temporal price trends, geographic proximity, and spatial trends, to make time-varying spatial interpolations between prices of related food items. More precisely, the linear model is of the following form:

$$P_A = \beta_0 + P_{-A}\beta_1 + X\beta_2 + G\gamma_1 + S\gamma_2 + \varepsilon, \tag{4}$$

while the cubist model approximates the nonlinear function:

$$P_A = f(P_{-A}, X, S) + \varepsilon. \tag{5}$$

In these equations, $P_A$ is a vector that stacks all the prices of commodity A observed at all market × time combinations within a country. Similarly, $P_{-A}$ is a matrix that holds the prices of all other food items. In the previous example, it would simply bind the columns (B, C), and the iterations would cycle over specifications according to the scheme $P_B \sim P_{-B} + \varepsilon_B$ and $P_C \sim P_{-C} + \varepsilon_C$. In this example there are just three price vectors, but table A1 shows that the application considers problems with up to 15 price predictors.
Prices on both sides of the equation are modeled in logarithms, which is omitted from the notation here. The vector $\beta_1$ captures the linearized relationships between log prices, e.g., the price ratios. The matrix $G$ contains group dummies for market-level and administrative-level fixed effects, and the matrix $S$ contains seasonal dummies. Finally, $X$ is a matrix of additional covariates. These include logarithmic coordinates to capture spatial trends, and price trend features engineered from (A, B, C, ...) that capture important temporal variation. The price trend features are engineered as follows. First, the individual market price trajectories in (A, B, C, ...) that have been observed with at least 95% data coverage are selected, and their remaining (at most 5%) missing price points are imputed with a seasonal Kalman filter using a Basic Structural Model. Second, a commodity-specific country price trend is constructed by Kalman interpolating all market trajectories and taking a weighted average based on data coverage. In particular, trajectories with above-75th-percentile data coverage are averaged, weighting by normalized data coverage rates. As an example, if three markets with above-75th-percentile data coverage respectively have 50%, 75% and 100% data coverage, the weights are (0, 0.5, 1).

Particularly within the matrix $X$ there can be highly correlated predictors, while the matrix $G$ may contain multiple identical indicators, for instance if there is only one market within an administrative unit. The following hard-coded variable selection rules are used. First, linear combinations are removed to avoid perfect collinearity among the dummies (the dummy variable trap). Second, highly collinear variables with a correlation above 0.95 are removed by iteratively recalculating the correlation matrix and removing the variable with the highest overall correlation. Finally, variables with near-zero variance are removed.17 All predictors are centered and scaled.
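Two of the preprocessing rules above lend themselves to a short illustration: the coverage-weighted country trend, whose weights reproduce the (0, 0.5, 1) example, and the hard-coded collinearity and near-zero-variance filters. The sketch below is a hypothetical Python rendering, not the paper's code. It assumes that "normalized data coverage" means min-max rescaling (which is what the worked example implies), resolves each pair above 0.95 by dropping the member with the larger mean absolute correlation (one common reading of "removing the variable with the highest overall correlation"), and uses an arbitrary variance tolerance. The Kalman interpolation step and the linear-combination check are omitted.

```python
# Illustrative sketch only: function names, tolerance and toy features are hypothetical.
from statistics import mean, pstdev


def coverage_weights(coverages):
    """Min-max normalized coverage rates used as averaging weights."""
    lo, hi = min(coverages), max(coverages)
    return [(c - lo) / (hi - lo) for c in coverages]


def country_trend(series, weights):
    """Weighted average of complete market trajectories (one list per market)."""
    total = sum(weights)
    return [sum(w * s[t] for w, s in zip(weights, series)) / total
            for t in range(len(series[0]))]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def drop_collinear(features, cutoff=0.95):
    """Recompute correlations and drop one variable at a time while max |r| > cutoff."""
    kept = dict(features)
    while len(kept) > 1:
        names = list(kept)
        corr = {(a, b): abs(pearson(kept[a], kept[b]))
                for a in names for b in names if a != b}
        (a, b), r = max(corr.items(), key=lambda kv: kv[1])
        if r <= cutoff:
            break
        mean_abs = {v: mean(corr[(v, o)] for o in names if o != v) for v in (a, b)}
        del kept[max(mean_abs, key=mean_abs.get)]
    return kept


def drop_near_zero_variance(features, tol=1e-8):
    return {k: v for k, v in features.items() if pstdev(v) > tol}


def center_and_scale(features):
    out = {}
    for k, v in features.items():
        m, s = mean(v), pstdev(v)
        out[k] = [(x - m) / s for x in v]
    return out


weights = coverage_weights([50, 75, 100])   # -> [0.0, 0.5, 1.0]
features = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.1],   # nearly a rescaled copy of x1
    "x3": [5, 3, 4, 1, 2],
    "x4": [1, 1, 1, 1, 1],      # constant
}
selected = center_and_scale(drop_near_zero_variance(drop_collinear(features)))
print(sorted(selected))
```

The collinearity filter runs before the variance filter, following the order of the rules in the text; the constant feature therefore survives the first step (its correlations are set to 0) and is removed by the second.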
Note that equation 4 is estimated using an elastic net model that uses L1 and L2 penalties to shrink the predictor space; see again (Friedman et al., 2010; Hastie et al., 2021). The cubist regression specification of equation 5 is essentially an observation-specific counterpart of the linear model. As such, only the regional dummies are removed. The cubist model can partition the data and fit local regressions. Since the coordinates are supplied, the model can learn spatial fixed effects using adaptive neighborhood sizes, as well as spatial interaction effects, by partitioning on coordinates rather than relying on explicit spatial dummies; see again (Quinlan, 1992; Kuhn et al., 2012; Witten et al., 2016).

D. Validation of imputation accuracy

There are no countries in the data where the food price data is complete and the true inflation of a basket of food items is fully known. This makes validation against true data difficult.18 Cross-validation techniques are used to adjust the flexibility and assess the predictive accuracy of the models in each iteration. The elastic net model optimizes over the standard L1 and L2 penalties and the mixing parameter.19 The cubist model tunes over the neighborhood size used for smoothing and the number of boosting iterations, using a grid of all Neighborhood × Committees = (2, 4, 8) × (1, 25, 50, 100) combinations. As usual, training and out-of-sample validation use mutually exclusive draws of observations. A 4-fold cross-validation is used to keep the computational burden of the application at manageable levels.20

In addition to the observed prices, the training data contains a small draw of 10% of the imputations generated from the previous regression estimates. These synthetic data points help balance prediction performance in thinner regions of the sample space. When the training data consists only of actual observations, the predictions may be biased toward purely observed value ranges. Adding synthetic cases using previous predictions makes the estimation problem slightly more representative of the missing data. The validation sample always consists only of actual data points, thus excluding imputations.21 The model tuning focuses on a Normalized Mean-Absolute-Error criterion.

17 Food price series with zero variance do not occur in the data, but the cross-validation sampling might in theory land a draw that has extremely low price variation, particularly if the code were applied to other countries.

18 For instance, if a set of countries where the WFP data are complete were available, an interesting exercise would be to construct the true food basket price and accompanying inflation. Then, in this same sample, a random subset of the data could be removed to simulate a pattern of missing observations, and the final basket price could be imputed and compared to the true data.

19 L1 penalizes the likelihood by the sum of absolute coefficients, and L2 by the sum of squared parameter values, thereby discouraging large parameter estimates but having very different impacts when redundant parameters approach 0. In particular, L1 penalization can set parameter estimates exactly to 0, as the penalty remains influential when a parameter approaches 0. The mixing parameter determines whether only L1, only L2, or a mix of both penalties is used, balancing between the well-known Lasso and Ridge regressions, or mixing them (elastic net).

20 Note that a total of $\text{countries} \times M \times I \times \bar{D}_{\text{country}} \times (\text{folds} + 1) \times |\Theta|$ regressions need to be estimated, with $\theta \in \Theta$ indexing the different tuning configurations and $\bar{D}_{\text{country}}$ being the average number of food items in a country.
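The training/validation design of this subsection can be sketched as follows. This is a hypothetical illustration, not the paper's code: observations are represented by integer row indices, one 10% draw of previously imputed rows is added to every training fold (an assumption), and validation folds contain observed rows only. The `regression_count` helper evaluates the formula of footnote 20; the example plugs in figures quoted in this paper (25 countries, 186 food-item problems in total, M = 5, I = 8, 4 folds) under the simplifying assumption that every problem used the 12-configuration cubist grid, whereas 72 problems actually used the elastic net, whose grid size is not stated here.

```python
# Illustrative sketch only: helper names and index ranges are hypothetical.
import random
from fractions import Fraction
from itertools import product

# Cubist tuning grid from the text: Neighborhood x Committees.
GRID = list(product((2, 4, 8), (1, 25, 50, 100)))   # 12 configurations


def cv_splits(observed, imputed, folds=4, synth_share=0.10, seed=0):
    """Yield (train, valid) index lists for k-fold cross-validation.

    Validation folds contain only observed rows; each training set is topped
    up with the same random draw of synth_share of the previously imputed rows.
    """
    rng = random.Random(seed)
    obs = list(observed)
    rng.shuffle(obs)
    imputed = list(imputed)
    synthetic = rng.sample(imputed, k=round(synth_share * len(imputed)))
    for k in range(folds):
        valid = obs[k::folds]
        held_out = set(valid)
        train = [i for i in obs if i not in held_out] + synthetic
        yield train, valid


def regression_count(countries, M, I, avg_items, folds, n_configs):
    """Footnote 20: countries x M x I x D_bar x (folds + 1) x |Theta|."""
    return countries * M * I * avg_items * (folds + 1) * n_configs


splits = list(cv_splits(observed=range(40), imputed=range(40, 60)))
total = regression_count(25, 5, 8, Fraction(186, 25), folds=4, n_configs=len(GRID))
print(int(total))   # 446400 under the stated assumption
```

Using `Fraction` for the average number of food items per country keeps the count exact; with the stated inputs it reduces to 186 × 5 × 8 × 5 × 12.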
The reason for this criterion is that the MAE is less affected by outliers than the common Root-Mean-Squared Error measure and has a stable interpretation across applications.22 Note that an attempt is made to replace outliers with missing values and impute them, but there is no guarantee that all outliers are captured, so an outlier-robust validation metric is a safe option.

There are $M$ validation results for each food item, but the interest is primarily in the robustness of the final imputed price index, which combines all the predictions. The validation results are therefore condensed for better presentation. In particular, a cross-comparable confidence score is constructed from the individual validation estimates.23 The metric is defined as follows. First, a normalized MAE for food item $d$ is constructed as the ratio of the MAE of the model for that food item to the MAE obtained by a simple mean prediction. Since each MAE estimate represents an average percentage error due to the log nature of the price data, the individual MAE values are averaged geometrically:

$$\mathrm{NMAE}_d = \frac{\left(\prod_{m=1}^{M} \mathrm{MAE}_{m,d}\right)^{1/M}}{\mathrm{MAE}_d|\mu}, \tag{6}$$

where $\mathrm{MAE}_{m,d}$ is a cross-validation estimate of MAE using the standard formula for MAE, and $\mathrm{MAE}_d|\mu$ is the MAE calculated using observed data and the unconditional mean. Since the true data range is not observed, and averaging is known to improve ensemble performance, the quantity from equation 6 is likely a conservative estimate.24 The focus next is on the quantity $1 - \mathrm{NMAE}$, which is the share of the total absolute variation in the demeaned data explained by the imputation model. The $D$ values are averaged as follows:

$$\text{CV-score} = \left[ Z^{-1}\!\left( \frac{\sum_{d=1}^{D} w_d\, Z\!\left(\sqrt{1 - \mathrm{NMAE}_d}\right)}{\sum_{d=1}^{D} w_d} \right) \right]^2, \tag{7}$$

where $Z$ is the Fisher Z-transformation and $Z^{-1}$ its inverse, and $w_d$ are the relative weights of the price components in the final price index.

21 Note that throughout the simulation, the quality of the synthetic training data improves. Such techniques are also being explored elsewhere. For example, Lee and Braithwaite (2020) use a regression chain of image recognition models and feature-based models that update one another's training data, which improved their learning results. In the current paper, $f$ is parameterized using a piece-wise linear approach, and adding bootstrap draws from previous imputations helped improve validation performance on actual data by allowing the models to train on denser representative example data near the edges of the sample space. This helps stabilize extrapolations outside of the sample space.

22 With simple arithmetic one can show that $\mathrm{MAE} \le \mathrm{RMSE} \le \sqrt{n}\,\mathrm{MAE}$, which reveals that the upper limit of the RMSE varies with sample size and has different interpretations across applications.

23 $M \times D \times I$ cross-validation metrics can be computed in each country application. The result at iteration $I$ is used to diagnose the quality of imputations, but the full validation sequence provides a diagnostic to determine a relevant value for $I$. In particular, the cross-validation metrics should improve throughout the simulation. The number of iterations for stochastic imputation is often determined by observing whether the means and variances of the predictions start changing in a purely random fashion. In the current case, the stopping criterion that the cross-validation performance does not improve further is a slightly less vague criterion. The multiple imputation implementation of van Buuren and Groothuis-Oudshoorn (2011) has default values of $I = M = 5$. In the current application, diagnosing the performance indicated that $I = 8$ was usually sufficient; $M = 5$ was kept due to the high computational load of the application.
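The two validation formulas can be computed as below. This is a sketch under stated assumptions rather than the paper's code: it assumes the Fisher transform is applied to $\sqrt{1 - \mathrm{NMAE}_d}$ (treating $1 - \mathrm{NMAE}_d$ as an $R^2$, so that its square root plays the role of a correlation) and that the back-transformed weighted average is squared again. The clamping below 1 is an added safeguard, because the Fisher transform is undefined at 1, and the example inputs are hypothetical.

```python
# Illustrative sketch only: inputs are hypothetical; the sqrt-then-square reading is an assumption.
import math
from statistics import geometric_mean, mean


def nmae(cv_maes, observed):
    """Geometric mean of the M cross-validated MAEs, divided by the MAE of
    predicting every observed (log) price with the unconditional mean."""
    mu = mean(observed)
    baseline = mean(abs(y - mu) for y in observed)
    return geometric_mean(cv_maes) / baseline


def cv_score(nmaes, weights, eps=1e-12):
    """Fisher-z weighted average of sqrt(1 - NMAE_d), transformed back and squared."""
    z = []
    for v in nmaes:
        r = min(math.sqrt(max(1.0 - v, 0.0)), 1.0 - eps)   # keep atanh finite
        z.append(math.atanh(r))
    z_bar = sum(w * zi for w, zi in zip(weights, z)) / sum(weights)
    return math.tanh(z_bar) ** 2


# Hypothetical example: two food items, M = 5 cross-validated MAEs each.
observed_log_prices = [2.1, 2.3, 2.2, 2.6, 2.4]
scores = [nmae([0.05, 0.06, 0.05, 0.07, 0.06], observed_log_prices),
          nmae([0.04, 0.05, 0.04, 0.05, 0.05], observed_log_prices)]
print(cv_score(scores, weights=[1, 1]))
```

When all items share the same NMAE, the score collapses to $1 - \mathrm{NMAE}$, which is a quick sanity check on the transform and its inverse.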
The final score from equation 7 roughly has the interpretation of the average $R^2$ of the food price index, using a robust calculation of out-of-sample errors.25 The units of measurement are harmonized so that each food item has equal weight in the index once scaled to a comparable unit of measurement. Specifically, the food-item-specific predictions are aggregated into food price indexes by summing the prices of all foods in the basket, after bringing the prices to comparable units of measurement (1 kg for foods, 1 liter for fluids and vegetable oils, 1 dozen for single packaged eggs, and 1 unit for foods that come in other units of measurement, such as some fruits that come in bundles). As a simple example, if the index consists of 1 kg of sorghum and 100 kg of maize, the latter price is multiplied by 0.01 and an equally weighted index is constructed with the result.26 Note that from a nutritional perspective, some foods should be weighted more strongly to model a food price index that reflects preferable consumption ratios.27 The simple approach here just ensures that an item measured in larger quantities does not dominate the final index simply because it has a larger price range, or dominate the validation result simply because its unit of measurement is small. This is not ideal, but it is sensible, given that the expenditure shares for specific food items used to construct a traditional CPI are not widely available in the countries analyzed by the paper. Recall that the estimation exercise of the paper is in fact motivated by the unavailability of traditional data. Tables A2 and A3 list for all countries the food-specific Index Weights used to scale price levels to correspond to prices for comparable units of measurement.

24 Note that $\mathrm{MAE}|\mu$ only uses the observed data range, while the true data range is likely larger. As such, the estimated $\mathrm{NMAE}_d$ is likely larger than the true value, since the denominator is underestimated in equation 6. There are further related challenges to the validation, due to the fact that the general objective of imputing unobserved data can only be validated using observations. Some further rationale for the divergence metric and how it relates to the objective is provided in section B.B3.

25 Note that $(1 - \mathrm{NMAE}) \ge (1 - \mathrm{NRMSE})$, where the second value equals the out-of-sample calculation of the $R^2$. Keep in mind, however, that the data for validation may still contain outliers, see again section B.B3, so that (1 − NRMSE) > (1 − NRMSE). Regardless, when NMAE is low, as in nearly all countries as the results will show, and particularly in the outlier case, the value (1 − NMAE) approaches the out-of-sample $R^2$ given by (1 − NRMSE). Moreover, when NMAE is small, model fit is good and average bias must be small, which means that 1 − NMAE approaches the in-sample $R^2$, which is the squared Pearson correlation between the predictions and the observations. It is well known that the arithmetic mean of multiple correlation coefficients underestimates the total correlation, and that the distribution of the correlation coefficient must first be normalized before averaging to obtain a less biased estimate, particularly when the number of coefficients is small. See (Corey et al., 1998) on this matter and on the use of the Fisher Z-transformation in this context.

26 The standardization of the units of measurement is automated and works by parsing the text in the WFP database that describes the food items and inferring multipliers that standardize the prices to comparable units. In particular, each text string is parsed as amount × unit, and a multiplier for food item $d$ is calculated as $w_d = \frac{1}{\text{amount} \times c}$, where $c$ is a conversion factor from the unit to either kilograms or liters. For example, the text string "10 Pounds of Rice" would result in a multiplier of $w_{\text{rice}} = \frac{1}{10 \times 0.453592} = 0.2204624$, while "1 kg of wheat" would simply result in a weight of $w_{\text{wheat}} = 1$. This strategy is usually reasonable and prevents a food commodity measured in bags of 100 kg from dominating the food basket price, but it is obviously not smart enough to deal with all situations effectively. A specific rule deals with eggs, which may be described in some countries as 12 units, while in other countries eggs are measured as 1 dozen. In the case of '12 units', a conversion to 1 dozen is made. All the price estimates from the paper are provided for analysis, and the index weights are provided with the index estimates. If it is suspected that conversion factors may affect the final inflation rate estimates, researchers are encouraged to construct their own indexes relevant to the studies at hand using the individual food price estimates.

27 For example, 1 kg of salt and 1 kg of sorghum are given equal weight in the simple index generated here, whereas the latter may carry more dietary importance. The reason for using equal weights is that it takes expert judgment to define weights based on dietary needs, which is not easily automated given the heterogeneity of the data. In an ideal world, future price survey programs would be accompanied by surveys on household expenditure shares or food-specific trading volumes.

E. The initialization

A two-step approach is taken to initialize the regression chain. First, univariate imputation methods are used to pre-impute the starting values of missing entries.28 The iterative modeling then initializes randomly around the pre-imputed values by adding a disturbance term drawn from a uniform distribution centered around 0 and scaled to 10% of the range in the data. The variance of the prices is stabilized with logarithms, so this disturbance term affects the initialization roughly evenly across levels.

28 First, minor gaps of up to 3 consecutive months are filled with a univariate seasonal Kalman smoother using a Basic Structural Model. Then food-market combinations with at least 67% data coverage are completed using the same method. Next, all market trajectories completed in this way are averaged to construct a preliminary country mean. All other market-level series with > 50% data coverage are then imputed using predictions from a Generalized Additive Model with the observed price points as the dependent variable and the preliminary country mean as the predictor. The parameters are estimated by Restricted Maximum Likelihood. The country average is then recalculated, and if there are any further markets with missing data but > 20% data coverage, the same method is applied again. Any market with < 20% data coverage is pre-imputed using an inverse distance weighting interpolation based on the other imputations. The spatial interpolation uses a neighborhood size cutoff, which is selected using cross-validation.

It is important to clean the data of outliers before calculating the pre-imputations and applying the imputation algorithm. In particular, outliers can lead to explosive predictions or generally bad model fit, depending on the types of learning methods used. Since the outliers need to be removed from incomplete time series, standard time series methods, such as those relying on non-parametric smoothers with adaptive bandwidths, cannot be applied. The approach here first applies a linear interpolation to the raw data, then calculates returns and detects outliers in the returns sequence using the approach by Boudt et al. (2008). The outliers are then turned into missing data points, which are replaced by imputation. In most cases, severe outliers are simply measurement errors.
Minor outliers that simply reflect the natural extreme price variation of illiquid markets should already be reasonably stabilized by the log transformations.

III. Results

The methods are applied country by country to 25 FCS countries. The number of results is large; all commodity-specific validation and prediction results are available in the online repository of this paper. Since no official food CPI is available for several countries in the sample over the periods of analysis, this section discusses some of the highlights.

The cross-validation results of the final price estimates are summarized in the last column of table A4 and highlight that, even on a monthly basis, the market-level price predictions are fairly accurate. The Fisher-average CV-score across countries is 0.85. The lowest score, 0.69, still indicates a reasonably well-informed prediction, given the natural volatility in prices, which makes direct measurement similarly prone to error.

To better understand the determinants of prediction reliability, a simple linear regression was performed, with results in table A5. This indicated that the share and relative dimension of the available data (number of foods, number of markets) are not linear predictors of prediction accuracy; instead, the volatility and strength of inflation determine how well the statistical methods impute missing data. Jointly, average inflation and volatility explain almost half of the variation in cross-validation performance. The signs of the coefficients suggest that when volatility increases, prediction performance deteriorates and data collection becomes more important. When the inflation trend is strong relative to volatility, the price trend is clearer in the data and imputation accuracy increases, suggesting that robust inflation tracking is possible even when little ground truth data is available.
Purely based on the signs, there is some indication that an increased number of food items is beneficial and an increased number of markets makes the prediction task more difficult. This is in line with the idea that a higher number of food items increases the chances that at least one strongly related food item can be leveraged for prediction, while an increased number of markets increases spatial heterogeneity in the data, thereby increasing the difficulty of accurately predicting all local price levels.

The subnational results are aggregated into national food price indexes by calculating a simple average price, giving equal weight to each market within the country. The imputation focuses on the 676 markets where prices are observed, and spatially interpolates the results onto the full set of markets using Shepard's inverse distance weighting algorithm with a cross-validated number of neighbors. The country average price indexes are thus based on the full set of markets, but are essentially derived as a weighted average of the imputed markets, with more weight given to markets that are more densely surrounded by other markets where WFP maintains or has maintained price monitoring operations.

Table A4 also summarizes the monthly changes in the price indexes over the entire study period using three simple metrics. First, the monthly returns in the price index are annualized. This gives the average annual rate at which food prices have inflated across all available time periods. The second column provides a maximum drawdown figure, which is the largest negative percentage price change that occurred from peak to trough over the full time period. Apart from price changes, economists are frequently also interested in price uncertainty. The third column in table A4 contains the annualized standard deviation of monthly price returns, which is the average price volatility over the entire period of study.
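The three summary metrics in table A4, together with the Shepard-type spatial interpolation used for aggregation, can be sketched as follows. These are hypothetical helpers, not the paper's code. Annualization is assumed to multiply the mean monthly log return by 12 and the monthly standard deviation by √12, a common convention that the text does not confirm. The interpolation uses inverse-distance weights with power 2 over the k nearest markets; the paper selects the neighborhood by cross-validation, whereas k is fixed here.

```python
# Illustrative sketch only: annualization convention, power and k are assumptions.
import math
from statistics import mean, stdev


def monthly_log_returns(index):
    return [math.log(b / a) for a, b in zip(index, index[1:])]


def annualized_return(index):
    return 12 * mean(monthly_log_returns(index))


def annualized_volatility(index):
    return math.sqrt(12) * stdev(monthly_log_returns(index))


def max_drawdown(index):
    """Largest peak-to-trough decline, as a (negative) fraction."""
    peak, worst = index[0], 0.0
    for p in index:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst


def shepard_idw(target, markets, k=3, power=2):
    """Inverse-distance-weighted price at `target` from the k nearest (xy, price) pairs."""
    nearest = sorted((math.dist(target, xy), price) for xy, price in markets)[:k]
    if nearest[0][0] == 0:
        return nearest[0][1]
    weights = [d ** -power for d, _ in nearest]
    return sum(w * p for w, (_, p) in zip(weights, nearest)) / sum(weights)


index = [100.0, 110.0, 99.0, 121.0]   # hypothetical monthly food price index
print(annualized_return(index), max_drawdown(index), annualized_volatility(index))
```

The drawdown of this toy index is the 10% fall from 110 to 99, even though the index ends above its starting level, which is why drawdown and average inflation are reported side by side.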
For example, in Afghanistan prices have increased at an average annual rate of 5.76%, the largest deflationary event was a peak-to-trough price move of −43.86%, and the annualized standard deviation of monthly price changes was 8.69%. This simple aggregation of results highlights that food price inflation is problematic in many FCS countries. Inflation-targeting countries often aim for a positive inflation rate just below 2%; only 6 of the 25 analyzed countries meet this criterion for food when measured over the entire study period. Moreover, in 7 countries, food price increases have been so strong that they outweighed price uncertainty. Historical famines have frequently been associated with high price increases that led to speculative purchasing of food as an investment (Gráda, 2007). Investors typically aim for high-return, low-volatility assets and often track the performance of investments by focusing on risk-adjusted returns (Markowitz, 1991). A commonly applied metric is the Sharpe ratio, which is the annualized return divided by the annualized standard deviation of returns. The ratio is a z-score type of metric that describes the excess return received for the extra volatility endured when holding an asset. In general, for an asset to be considered investment-grade, the ratio should be above 1, so that the price increase outweighs the price uncertainty.29 The Sharpe ratios can be obtained from table A4 by dividing the first column by the third. It is important to note that this ratio is constructed in hindsight and does not fully characterize all the factors involved in an investment decision, nor does it take into account the scarcity of information and the complexity of the market environment during inflation events.
Nevertheless, it provides a simple and well-understood way to standardize price changes, so that their significance can be compared across markets with different typical volatilities. For example, a 4% inflation surge is more significant if the typical price change over that period is 1% than if it is 10%. Contrasting these ratios highlights that some inflation events have been more significant than others. For example, in Afghanistan, the ratio between price increase and price uncertainty in local currency, measured over all time steps, equals 0.66. At this value, inflation is positive but price uncertainty outweighs the price increase, so there is no strong incentive to hold on to food to hedge against currency depreciation risks. This contrasts with Sudan, South Sudan and the Syrian Arab Republic, where the increase in food prices has not only been extremely high, but has also outweighed food price uncertainty by roughly a factor of 2. Looking at a single year instead, food prices in Afghanistan rose by 14.09% annualized in 2020, which seems low compared to some other high inflation events, but is highly significant compared to the relatively low price volatility over that same period (inflation was 2.67 times volatility).

29 As a reference, the average Sharpe ratio of the dollar-denominated S&P 500 index has been around 1 over the past 25 years, while well-performing hedge funds often achieve annual Sharpe ratios of 1.5.

Inflation-targeting countries also aim for prices to be stable over short time frames. Tables A6 and A7 condense the high-frequency results by presenting annualized figures, with an FCS-wide average included at the bottom. In total, price estimates for 1223 markets and 43 foods underlie these figures.
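The summary metrics behind tables A4, A6 and A7 can be sketched in a few lines. This is a hedged illustration under stated assumptions: returns are simple monthly returns, annualization is geometric and volatility is scaled by the square root of 12 (conventions the text does not spell out), the Sharpe ratio is computed without subtracting a risk-free rate, as in the paper's definition, and the cross-country FCS-wide figure is taken as a geometric mean of gross rates, matching the text's description of it as a geometric average. The price series and all function names are hypothetical.

```python
import math
import statistics

def annualized_return(monthly_returns):
    """Geometric annualization of simple monthly returns: (prod(1 + r))^(12/n) - 1."""
    growth = math.prod(1.0 + r for r in monthly_returns)
    return growth ** (12.0 / len(monthly_returns)) - 1.0

def annualized_volatility(monthly_returns):
    """Sample standard deviation of monthly returns, scaled by sqrt(12)."""
    return statistics.stdev(monthly_returns) * math.sqrt(12.0)

def max_drawdown(prices):
    """Largest peak-to-trough decline of a price index, as a negative fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst

def sharpe_ratio(monthly_returns):
    """Annualized return divided by annualized volatility (no risk-free rate)."""
    return annualized_return(monthly_returns) / annualized_volatility(monthly_returns)

def geometric_average_rate(rates):
    """Geometric mean of annual rates across countries (assumes every rate > -100%)."""
    return math.prod(1.0 + r for r in rates) ** (1.0 / len(rates)) - 1.0

# Hypothetical monthly food price index for one country (illustrative only)
prices = [100.0, 104.0, 102.0, 110.0, 99.0, 105.0]
returns = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
print(annualized_return(returns), max_drawdown(prices), sharpe_ratio(returns))
```

As a consistency check on the text, dividing Afghanistan's annualized return of 5.76% by its annualized volatility of 8.69% gives about 0.66, the ratio discussed for Afghanistan.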
Of all year × country combinations for which estimates are produced (361), only 48 annualized inflation estimates fall within the 0–2% targeting range, while 51 estimates cross a critical inflation threshold of 50%. The FCS-wide results highlight the elevated inflation levels during the period associated with the World Food Price Crisis of 2007–08 and in the aftermath of the pandemic. The annualized monthly price change in 2008 peaks at a geometric average of 19.52% across the full set of countries, while remaining mostly within a modest range of several percentage points over the 2009 to 2018 period. The estimated average inflation rate then spikes again to 27.46% in the year of the pandemic, while the preliminary 2021 inflation estimate of 22.65% remains well above the levels of the World Food Price Crisis of 2007–08. The results in table A7 highlight that price uncertainty has remained more constant. A striking result is that average price uncertainty was higher during the World Food Price Crisis of 2007–08, peaking at an annualized rate of 13.92%, while reaching 10.52% during the pandemic shock and even dropping to 8.1% in the preliminary 2021 estimate. This means that the recent high inflation event was not only stronger in terms of price increase, but also more significant relative to the natural price volatility of each period. From this perspective, it is again interesting to transform the estimates into Sharpe ratios. Both the World Food Price Crisis of 2007–08 and the post-pandemic inflation surge were associated with prices increasing relative to price uncertainty, and this development is more pronounced in the post-pandemic inflation event. For example, from 2009 to 2018 the FCS-wide average annual Sharpe ratio for food prices was 0.55, but during 2007–08 it averaged 1.30.
During 2020, this ratio reaches 2.70, while the preliminary annualized 2021 estimate sits at 2.80, meaning that returns in food prices outweigh the price volatility risks of holding food as an asset by nearly a factor of 3. The ratio of inflation to price uncertainty is thus roughly double that of the World Food Price Crisis of 2007–08.

The monthly price estimates indicate that prices can rise and fall dramatically within very short periods in FCS countries. Figure A1 presents estimated monthly year-on-year inflation on a line chart for the Republic of Yemen, together with intra-month price ranges on a candle chart. The intra-month volatility algorithm is detailed in section B.B4 and uses an autoregressive conditional heteroskedasticity (ARCH-type) model to estimate the time-varying properties of the monthly price variance process. The estimates are used to calculate the Expected Shortfall of the price returns, which is used to construct the wicks and bodies of the candles. The open values are defined as the estimated conditional expectation of monthly prices; the highs are the average intra-month price levels in the worst 50% of estimated conditional price increase events; and the lows are the average intra-month price levels in the worst 50% of estimated conditional price decrease events. The colored parts of the candles are the estimated price ranges within which the majority of monthly prices are estimated to lie, with red candles indicating months in which end-of-month prices closed below the start-of-month price estimates. Strong inflationary events are thus visualized as consecutive green candles, and periods of high volatility as candles with large wicks. Results for all other countries are available in figures A2 to A4, which highlight several notable high inflation events.
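The candle construction described above can be illustrated with a simplified sketch. The model actually used is detailed in section B.B4; here it is assumed, for illustration only, to be a GARCH(1,1) recursion with Gaussian innovations and fixed, hypothetical parameters (no estimation step; in practice a package such as rugarch, cited in the references, would fit them). Under normality, the average of the worst 50% of outcomes on either side of the conditional mean lies sigma·sqrt(2/π) away from it, which the sketch applies to log prices as a simplification. All function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

def garch11_variances(residuals, omega, alpha, beta):
    """Conditional variances from a GARCH(1,1) recursion with given parameters.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1],
    started at the unconditional variance omega / (1 - alpha - beta).
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2

def upper_tail_mean_offset(sigma, tail=0.5):
    """E[X - mu | X above its (1 - tail) quantile] for X ~ N(mu, sigma^2)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - tail)
    return sigma * std_normal.pdf(z) / tail

def candle_low_high(expected_log_price, sigma, tail=0.5):
    """Low/high price levels as tail averages; the normal is symmetric, so one offset serves both."""
    offset = upper_tail_mean_offset(sigma, tail)
    return math.exp(expected_log_price - offset), math.exp(expected_log_price + offset)

# Hypothetical demeaned monthly log returns and GARCH parameters (illustrative only)
residuals = [0.02, -0.05, 0.08, -0.01]
variances = garch11_variances(residuals, omega=0.0002, alpha=0.1, beta=0.85)
low, high = candle_low_high(math.log(100.0), math.sqrt(variances[-1]))
```

A large shock in the previous month raises the next conditional variance, which widens the wicks; this is how periods of high volatility show up as long-wicked candles.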
The World Food Price Crisis of 2007–08 is visible as a price uptick in most countries, most notably in Afghanistan, Burkina Faso, Chad, Haiti, Niger and Somalia. In some of the charts, this price action seems modest because it is dwarfed by more recent price movements; in these instances, it may be more useful to look at prices on a logarithmic scale. Prices have also visibly surged in many countries in the year following the pandemic.

IV. Discussion and concluding remarks

Food price inflation is an important metric to inform economic policy and is closely watched by both economists and humanitarians, yet official figures are often missing, lack spatial detail, or are published with delay. More importantly, crisis situations, and vulnerable populations in general, are often characterized by high geographic specificity, while traditional price indicators do not provide insights beyond the major (urban) markets where prices are formally measured. Traditional CPI data in these situations, if available, may be insufficiently disaggregated into distinct price estimates to be put into a relevant context. Recognizing this, international institutions have invested substantially in subnational price surveys. However, the capacity to monitor inflation has continued to be severely limited by challenges related to missing data. To overcome some of these shortcomings, this paper proposed an approach for real-time imputation of survey data drawing on multiple imputation and ensemble learning ideas. The paper highlighted the new price monitoring capabilities using survey data gathered in 25 fragile and conflict-affected countries. The final estimates of food price inflation documented in this paper were shown to accurately capture important inflation events, including the World Food Price Crisis of 2007–08 and the surge in inflation that followed the 2020 pandemic and the subsequent expansion of the global monetary base.
The paper used out-of-sample validation techniques to estimate the reliability of the augmented data. The share of missing data (20.4% to 79.18%), the number of markets (3 to 77), and the number of food items (3 to 16) varied widely across the country-specific applications, but the imputation methods were shown to remain robust across these aspects, even when data coverage was relatively low. A linear regression relating cross-validated prediction performance to key properties of the country applications showed that data coverage itself was not a correlate of prediction accuracy. Instead, the strength of inflation and the natural volatility in prices are strong predictors of imputation accuracy.

The accuracy of the imputations was judged by estimating the total price variation explained by the imputation models. On average across countries, the models predicted 85% of the observed price variation. In individual countries, the errors were in the range of 5% to 30% of observed prices. This puts the accuracy of the imputations in a range similar to that of direct price measurement at major urban markets in countries with well-established CPI methods. In particular, direct measurement of prices is prone to error with respect to true price levels simply because of the natural volatility in prices. For example, Lebow and Rudd (2003) estimated that measurement error in CPI change rates in the United States, a country with strong statistical capacity, could have been as high as 0.3 to 1.4 percentage points over a period in which change rates were typically in the 3%–6% range, which similarly places the measurement error of official CPI changes at roughly 5% to 45% of true values.

An important result is thus that as long as incomplete and intermittent survey data have a completeness rate of at least 20% to 40%, they can be augmented reliably with predictions to monitor subnational food price trends continuously.
This provides a previously nonexistent capacity to generate insights beyond the major (urban) markets where prices are formally measured with traditional methods. In situations with great spatial heterogeneity and localized vulnerabilities, such as in fragile environments or during crises, this can provide important new insight. Additionally, in markets where prices are very sensitive to localized shocks, the statistical estimates may provide new opportunities to investigate local price dynamics with confidence similar to what measured prices would otherwise provide.

A cross-country analysis of inflation trends, and a comparison with inflation trends in global markets such as those captured by the FAO or World Bank Food Price Index, would be interesting for future analyses, but is beyond the scope of this paper, as it would require dealing with differences in exchange rates. In a few countries, such as the Republic of Yemen, local unofficial exchange rates are also surveyed by WFP. In a separate analysis, the methods of this paper have been applied to track prices of 23 items covering foods and non-foods in the Republic of Yemen.30 This showed that it is possible to track dollar-denominated prices, broadly for the five food categories that comprise the FAO index, and to draw comparisons with price events in global markets when the exchange rate is also monitored. In this paper, it is simply noted that figures A2 to A4 provide some visual guidance on how global price change events propagate at the national level, as nearly all the country graphs display inflation events in 2020. The figures indicate that there may be substantial cross-country heterogeneity in the timing and amplitude of the price surges, which likely relates to differences in factors such as whether a country is a net importer or exporter of food and how the local currency is managed during times of crisis.
Finally, since not all products matter equally for household well-being, future analysis may use the estimates developed in this paper to explore food-specific price dynamics or to produce inflation estimates based on food baskets that incorporate information on expenditure shares.

Deploying statistical methods to enhance data gathering may help improve price estimates more widely. For instance, additional investments could be directed toward broadening the scope of data gathering to include additional sampling locations and food items, rather than toward strengthening the data coverage of existing narrow monitoring operations. This could give a more complete view of subnational inflation than can be achieved by data gathering alone. Moreover, gathering data on additional food items can help produce more accurate predictors for missing observations. The imputations primarily utilize contemporaneous relations between the prices of different items, so priority in data gathering should be given to ensuring that in each month the prices of at least some items at some markets are observed. Ensuring that some data are available in most time periods is more important than ensuring that most of the data are available in some time periods. In addition, predictions are more reliable in high inflation episodes and less reliable in high volatility episodes, so data gathering processes may also use statistical methods to produce real-time price estimates and use the results in turn to determine how much new data gathering is needed.

30 Results and adapted code are available from the author upon request.

References

Ahumada, H. and Cornejo, M. (2016). Forecasting food prices: The case of corn, soybeans and wheat. International Journal of Forecasting, 32(3):838–848.
Andrée, B. P. J. (2020). Theory and Application of Dynamic Spatial Time Series Models. Rozenberg Publishers and Tinbergen Institute, Amsterdam.
Andrée, B. P. J. (2021). Monthly food price estimates by product and market. In WLD 2021 RTFP v02 M, Version 2021-12-02. World Bank Microdata Library, Washington, DC.
Andrée, B. P. J., Chamorro, A., Spencer, P., Koomen, E., and Dogo, H. (2019). Revisiting the relation between economic growth and the environment; a global assessment of deforestation, pollution and carbon emission. Renewable and Sustainable Energy Reviews, 114:109221.
Andrée, B. P. J., Diogo, V., and Koomen, E. (2017). Efficiency of second-generation biofuel crop subsidy schemes: Spatial heterogeneity and policy design. Renewable and Sustainable Energy Reviews, 67:848–862.
Andrée, B. P. J., Kraay, A., Chamorro, A., Spencer, P., and Wang, D. (2020). Predicting Food Crises. World Bank Policy Research Working Papers.
Andreyeva, T., Long, M. W., and Brownell, K. D. (2010). The impact of food prices on consumption: A systematic review of research on the price elasticity of demand for food.
Baffes, J., Mitchell, D., Riordan, E. M., Streifel, S., Timmer, H., and Shaw, W. (2008). Global Economic Prospects: Commodities at the Crossroads 2009. World Bank Publications, Washington, DC.
Baillie, R. T., Bollerslev, T., and Mikkelsen, H. O. (1996). Fractionally integrated generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 74(1):3–30.
Baillie, R. T., Han, Y. W., and Kwon, T.-G. (2002). Further Long Memory Properties of Inflationary Shocks. Southern Economic Journal, 68(3):496.
Boudt, K., Peterson, B., and Croux, C. (2008). Estimation and decomposition of downside risk for portfolios with non-normal returns. The Journal of Risk, 11(2):9.
Breiman, L. (1996). Bagging predictors. Machine Learning.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Chang, C.-L., McAleer, M., and Tansuchat, R. (2012). Modelling Long Memory Volatility In Agricultural Commodity Futures Returns. Annals of Financial Economics, 07(02):1250010.
Conceição, P. and Mendoza, R. U. (2009). Anatomy of the global food crisis. Third World Quarterly.
Corey, D. M., Dunlap, W. P., and Burke, M. J. (1998). Averaging correlations: Expected values and bias in combined Pearson rs and Fisher's z transformations. Journal of General Psychology, 125(3):245–261.
Cutler, P. (1984). Famine forecasting; Prices and peasant behaviour in Northern Ethiopia. Disasters, 8(1):48–56.
Diogo, V., Reidsma, P., Schaap, B., Andrée, B. P. J., and Koomen, E. (2017). Assessing local and regional economic impacts of climatic extremes and feasibility of adaptation measures in Dutch arable farming systems. Agricultural Systems, 157:216–229.
Dollar, D. and Kraay, A. (2002). Growth Is Good for the Poor. Journal of Economic Growth.
Durbin, J. and Koopman, S. J. (2013). Time Series Analysis by State Space Methods. Oxford University Press.
Easterly, W. and Fischer, S. (2001). Inflation and the Poor. Journal of Money, Credit and Banking, 33(2):160.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22.
Gavin, W. T. and Mandal, R. J. (2002). Predicting inflation: food for thought. The Regional Economist, (Jan.):4–9.
Ghalanos, A. (2020). Introduction to the rugarch package. (Version 1.4-3).
Gráda, C. Ó. (2007). Making Famine History. Journal of Economic Literature, 45:5–38.
Graham, J. W., Van Horn, M. L., and Taylor, B. J. (2012). Dealing with the Problem of Having Too Many Variables in the Imputation Model. In Missing Data, pages 213–228. Springer New York, New York, NY.
Ha, J., Ivanova, A., Ohnsorge, F., and Unsal, F. (2019a). Inflation: Concepts, Evolution, and Correlates. Policy Research Working Paper.
Ha, J., Kose, M. A., and Ohnsorge, F. L. (2019b). Understanding Inflation in Emerging and Developing Economies. Policy Research Working Paper.
Hagenauer, J., Omrani, H., and Helbich, M. (2019). Assessing the performance of 38 machine learning models: the case of land consumption rates in Bavaria, Germany. International Journal of Geographical Information Science, 33(7):1399–1419.
Hastie, T., Qian, J., and Tay, K. (2021). An Introduction to glmnet.
Joutz, F. L. (1997). Forecasting CPI Food Prices: An Assessment. American Journal of Agricultural Economics, 79(5):1681–1685.
Khan, M. (1994). Market-based early warning indicators of famine for the pastoral households of the Sahel. World Development, 22(2):189–199.
Koomen, E., Diogo, V., Dekkers, J., and Rietveld, P. (2015). A utility-based suitability framework for integrated local-scale land-use modelling. Computers, Environment and Urban Systems, 50:1–14.
Koopman, S. J. and Durbin, J. (2000). Fast filtering and smoothing for multivariate state space models. Journal of Time Series Analysis, 21(3):281–296.
Koopman, S. J., Lit, R., and Nguyen, T. M. (2019). Modified efficient importance sampling for partially non-Gaussian state space models. Statistica Neerlandica, 73(1):44–62.
Kuhn, M., Weston, S., Keefer, C., and Coulter, N. (2012). Cubist Models For Regression.
Lebow, D. E. and Rudd, J. B. (2003). Measurement Error in the Consumer Price Index: Where Do We Stand? Journal of Economic Literature, 41(1):159–201.
Lee, K. and Braithwaite, J. (2020). High-Resolution Poverty Maps in Sub-Saharan Africa. arXiv, 2009.00544.
Markowitz, H. M. (1991). Foundations of Portfolio Theory. The Journal of Finance, 46(2):469.
Maxwell, D., Khalif, A., Hailey, P., and Checchi, F. (2020). Determining famine: Multi-dimensional analysis for the twenty-first century. Food Policy, 92:101832.
Modugno, M. (2011). Nowcasting Inflation using High Frequency Data. Working Paper Series.
Morellos, A., Pantazi, X. E., Moshou, D., Alexandridis, T., Whetton, R., Tziotzios, G., Wiebensohn, J., Bill, R., and Mouazen, A. M. (2016). Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosystems Engineering, 152:104–116.
Murray, J. S. (2018). Multiple Imputation: A Review of Practical and Theoretical Findings. Statistical Science, 33(2):142–159.
Ng, W., Minasny, B., Montazerolghaem, M., Padarian, J., Ferguson, R., Bailey, S., and McBratney, A. B. (2019). Convolutional neural network for simultaneous prediction of several soil properties using visible/near-infrared, mid-infrared, and their combined spectra. Geoderma, 352:251–267.
Ouyang, H., Wei, X., and Wu, Q. (2019). Agricultural commodity futures prices prediction via long- and short-term time series network. Journal of Applied Economics, 22(1):468–483.
Quinlan, J. R. (1992). Learning With Continuous Classes. Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, pages 343–348.
Reinsdorf, M. and Triplett, J. E. (2009). A Review of Reviews: Ninety Years of Professional Thinking About the Consumer Price Index. In Price Index Concepts and Measurement, pages 17–83. National Bureau of Economic Research, Inc.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3):581–592.
Rubin, D. B. (1996). Multiple Imputation After 18+ Years. Journal of the American Statistical Association, 91(434):473.
Sbahi, S., Ouazzani, N., Hejjaj, A., and Mandi, L. (2021). Neural network and cubist algorithms to predict fecal coliform content in treated wastewater by multi-soil-layering system for potential reuse. Journal of Environmental Quality, 50(1):144–157.
Seabold, S. and Coppola, A. (2015). Nowcasting prices using Google trends: an application to Central America. Policy Research Working Paper Series.
Seaman, J. and Holt, J. (1980). Markets and Famines in the Third World. Disasters, 4(3):283–297.
Shephard, N. (1994). Partial non-Gaussian state space. Biometrika, 81(1):115–131.
The World Bank (2019). Inflation in Emerging and Developing Economies: Evolution, Drivers, and Policies. The World Bank.
van Buuren, S. (2012). Flexible Imputation of Missing Data.
van Buuren, S. and Groothuis-Oudshoorn, K. (2011). Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3):1–67.
Wan, E. A. and Van Der Merwe, R. (2000). The unscented Kalman filter for nonlinear estimation. In IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium, AS-SPCC 2000, pages 153–158. Institute of Electrical and Electronics Engineers Inc.
Wang, D., Andrée, B. P. J., Chamorro, A. F., and Spencer, P. G. (2020). Stochastic modeling of food insecurity. World Bank Policy Research Working Papers.
Waterlander, W. E., Jiang, Y., Nghiem, N., Eyles, H., Wilson, N., Cleghorn, C., Genç, M., Swinburn, B., Mhurchu, C. N., and Blakely, T. (2019). The effect of food price changes on consumer purchases: a randomised experiment. The Lancet Public Health, 4(8):e394–e405.
Witten, I. H., Frank, E., Hall, M. A., and Pal, C. J. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Inc.
World Bank (2021). South Sudan Economic Update, June 2021: Pathways to Sustainable Food Security. Technical report, World Bank, Washington, DC.

Tables and Figures

Table A1: Summary of raw food price data.

Country | Currency | Markets | Items | Data Coverage | Start date
Afghanistan | AFN | 9/40 | 4 | 69.27% | Jan-07
Burkina Faso | XOF | 63/65 | 3 | 47.79% | Jan-07
Burundi | BIF | 61/68 | 7 | 35.93% | Jan-07
Cameroon | XAF | 12/51 | 11 | 20.82% | Jan-07
Central Afr. Rep. | XAF | 18/40 | 3 | 36.89% | Jan-08
Chad | XAF | 35/56 | 3 | 39.17% | Jan-07
Congo, Rep. | XAF | 5/11 | 7 | 44.47% | May-10
Congo, Dem. Rep. | CDF | 26/83 | 10 | 35.43% | Nov-07
Gambia, The | GMD | 15/28 | 11 | 41.91% | Jan-07
Guinea-Bissau | XOF | 3/45 | 7 | 56.63% | Feb-16
Haiti | HTG | 9/9 | 6 | 68.83% | Jan-07
Iraq | IQD | 18/18 | 13 | 48.02% | May-11
Lao PDR | LAK | 17/17 | 5 | 46.04% | Feb-12
Lebanon | LBP | 26/26 | 15 | 61.49% | Mar-12
Liberia | LRD | 18/24 | 3 | 49.44% | Mar-07
Mali | XOF | 77/126 | 6 | 58.37% | Jan-07
Mozambique | MZN | 25/52 | 7 | 63.13% | Jan-07
Myanmar | MMK | 36/165 | 3 | 43.00% | Apr-07
Niger | XOF | 68/79 | 4 | 79.60% | Jan-07
Nigeria | NGN | 33/35 | 16 | 27.59% | May-12
Somalia | SOS | 18/28 | 4 | 55.49% | Jan-07
South Sudan | SSP | 9/20 | 8 | 52.35% | Jan-07
Sudan | SDG | 14/14 | 3 | 58.01% | Jan-07
Syrian Arab Rep. | SYP | 36/91 | 15 | 56.99% | Aug-11
Yemen, Rep. | YER | 24/24 | 12 | 45.48% | Nov-08
Average | | 27/49 | 7 | 51.47% |

Note: Currency is the local currency in which prices are measured. Markets is the number of markets from which data are used, out of all known market locations for which predictions are made. Items is the number of food items for which data are used to construct the food price index. Data Coverage is the total number of price observations as a share of all market × time combinations, where markets refers to the first number in the Markets column. Start date is the month in which the estimated price index starts.
Source: The statistics have been prepared by the author for this paper based on end-of-August (2021) food price data from the World Food Programme.

Table A2: Summary of food price index components.
Country | Index Component
Afghanistan | Bread - Retail (1 kg, Index Weight = 1), Rice (Low Quality) - Retail (1 kg, Index Weight = 1), Wheat - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1)
Burkina Faso | Beans (Niebe) - Retail (1 kg, Index Weight = 1), Maize (White) - Retail (1 kg, Index Weight = 1), Millet - Retail (1 kg, Index Weight = 1)
Burundi | Rice (Low Quality, Local) - Retail (1 kg, Index Weight = 1), Beans - Retail (1 kg, Index Weight = 1), Maize (White) - Retail (1 kg, Index Weight = 1), Bananas - Retail (1 kg, Index Weight = 1), Cassava Flour - Retail (1 kg, Index Weight = 1), Onions - Retail (1 kg, Index Weight = 1), Sweet Potatoes - Retail (1 kg, Index Weight = 1)
Cameroon | Oil (Palm) - Retail (1 L, Index Weight = 1), Rice (Local) - Wholesale (90 kg, Index Weight = 0.01), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (Niebe) - Wholesale (90 kg, Index Weight = 0.01), Maize - Wholesale (90 kg, Index Weight = 0.01), Sorghum (Red) - Wholesale (90 kg, Index Weight = 0.01), Bananas - Retail (12 kg, Index Weight = 0.08), Potatoes - Retail (1 kg, Index Weight = 1), Cassava (Fresh) - Retail (5 kg, Index Weight = 0.2), Cocoyam (Macabo) - Retail (20 kg, Index Weight = 0.05), Plantains - Retail (1 kg, Index Weight = 1)
Central Afr. Rep. | Oil (Palm) - Retail (1 L, Index Weight = 1), Rice - Retail (1 kg, Index Weight = 1), Maize - Retail (1 kg, Index Weight = 1)
Chad | Maize (White) - Retail (1 kg, Index Weight = 1), Millet - Retail (1 kg, Index Weight = 1), Sorghum (Red) - Retail (1 kg, Index Weight = 1)
Congo, Rep. | Bread - Retail (1 kg, Index Weight = 1), Oil (Palm) - Retail (1 L, Index Weight = 1), Rice (Mixed, Low Quality) - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (White) - Retail (1 kg, Index Weight = 1), Groundnuts (Shelled) - Retail (1 kg, Index Weight = 1), Cassava Flour - Retail (1 kg, Index Weight = 1)
Congo, Dem. Rep. | Oil (Palm) - Retail (1 L, Index Weight = 1), Rice (Local) - Retail (1 kg, Index Weight = 1), Sugar - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans - Retail (1 kg, Index Weight = 1), Maize - Retail (1 kg, Index Weight = 1), Cassava Flour - Retail (1 kg, Index Weight = 1), Maize Flour - Retail (1 kg, Index Weight = 1), Cassava (Cossette) - Retail (1 kg, Index Weight = 1), Plantains - Retail (1 kg, Index Weight = 1)
Gambia, The | Oil (Vegetable) - Retail (1 L, Index Weight = 1), Rice (Small Grain, Imported) - Retail (1 kg, Index Weight = 1), Sugar - Retail (1 kg, Index Weight = 1), Beans (Dry) - Retail (1 kg, Index Weight = 1), Groundnuts (Shelled) - Retail (1 kg, Index Weight = 1), Millet - Retail (1 kg, Index Weight = 1), Bananas - Retail (1 kg, Index Weight = 1), Onions - Retail (1 kg, Index Weight = 1), Tomatoes - Retail (1 kg, Index Weight = 1), Milk - Retail (1 kg, Index Weight = 1), Carrots - Retail (1 kg, Index Weight = 1)
Guinea-Bissau | Oil (Vegetable, Imported) - Retail (1 L, Index Weight = 1), Rice (Imported) - Retail (1 kg, Index Weight = 1), Sugar - Retail (1 kg, Index Weight = 1), Wheat - Retail (1 kg, Index Weight = 1), Millet - Retail (1 kg, Index Weight = 1), Sorghum - Retail (1 kg, Index Weight = 1), Fonio - Retail (1 kg, Index Weight = 1)
Haiti | Oil (Vegetable, Imported) - Retail (1 Gallon, Index Weight = 0.26), Rice (Tchako) - Retail (1 Marmite, Index Weight = 0.37), Sugar (White) - Retail (1 Marmite, Index Weight = 0.37), Wheat Flour (Imported) - Retail (1 Marmite, Index Weight = 0.37), Beans (Black) - Retail (1 Marmite, Index Weight = 0.37), Maize Meal (Local) - Retail (1 Marmite, Index Weight = 0.37)
Iraq | Bread (Khoboz) - Retail (1 Unit, Index Weight = 1), Oil (Vegetable) - Retail (1 L, Index Weight = 1), Rice - Retail (1 kg, Index Weight = 1), Sugar - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (White) - Retail (1 kg, Index Weight = 1), Potatoes - Retail (1 kg, Index Weight = 1), Tomatoes - Retail (1 kg, Index Weight = 1), Eggs - Retail (1 Unit, Index Weight = 12), Milk (Powder) - Retail (1 kg, Index Weight = 1), Dates - Retail (1 kg, Index Weight = 1), Cheese (Local) - Retail (1 kg, Index Weight = 1), Lentils - Retail (1 kg, Index Weight = 1)

Note: Weights have been rounded to two digits. For an index with one item measured in 1 kg and another measured in 100 kg, the latter receives a weight of only 0.01, which has the same effect as rescaling its unit of measurement to 1 kg and weighting both items equally.
Source: The statistics have been prepared by the author for this paper based on price data selected from end-of-August (2021) food price data from the World Food Programme.

Table A3: Summary of food price index components (continued).

Country | Index Component
Lao PDR | Oil (Soybean) - Retail (1 L, Index Weight = 1), Rice (Glutinous, Second Quality) - Retail (1 kg, Index Weight = 1), Sugar (Brown) - Retail (1 kg, Index Weight = 1), Eggs - Retail (1 Unit, Index Weight = 12), Garlic (Small) - Retail (1 kg, Index Weight = 1)
Lebanon | Bread (Pita) - Retail (1 kg, Index Weight = 1), Oil (Sunflower) - Retail (5 L, Index Weight = 0.2), Rice (Imported, Egyptian) - Retail (1 kg, Index Weight = 1), Sugar (White) - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (White) - Retail (1 kg, Index Weight = 1), Eggs - Retail (30 pcs, Index Weight = 0.33), Milk (Powder) - Retail (900 G, Index Weight = 1.11), Cabbage - Retail (1 kg, Index Weight = 1), Cucumbers (Greenhouse) - Retail (1 kg, Index Weight = 1), Tomatoes (Paste) - Retail (1.3 kg, Index Weight = 0.77), Bulgur (Brown) - Retail (1 kg, Index Weight = 1), Cheese (Picon) - Retail (160 G, Index Weight = 6.25), Chickpeas - Retail (1 kg, Index Weight = 1), Lentils (Red) - Retail (1 kg, Index Weight = 1)
Liberia | Oil (Palm) - Retail (1 Gallon, Index Weight = 0.26), Rice (Imported) - Retail (50 kg, Index Weight = 0.02), Cassava (Fresh) - Retail (50 kg, Index Weight = 0.02)
Mali | Rice (Local) - Retail (1 kg, Index Weight = 1), Beans (Niebe) - Retail (1 kg, Index Weight = 1), Groundnuts (Shelled) - Retail (1 kg, Index Weight = 1), Maize - Retail (1 kg, Index Weight = 1), Millet - Retail (1 kg, Index Weight = 1), Sorghum - Retail (1 kg, Index Weight = 1)
Mozambique | Oil (Vegetable, Local) - Retail (1 L, Index Weight = 1), Rice (Imported) - Retail (1 kg, Index Weight = 1), Sugar (Brown, Local) - Retail (1 kg, Index Weight = 1), Wheat Flour (Local) - Retail (1 kg, Index Weight = 1), Groundnuts (Small, Shelled) - Retail (1 kg, Index Weight = 1), Maize (White) - Retail (1 kg, Index Weight = 1), Maize Meal (White, Without Bran) - Retail (1 kg, Index Weight = 1)
Myanmar | Oil (Palm) - Retail (1 L, Index Weight = 1), Pulses - Retail (1 kg, Index Weight = 1), Rice (Low Quality) - Retail (1 kg, Index Weight = 1)
Niger | Rice (Imported) - Retail (1 kg, Index Weight = 1), Maize - Retail (1 kg, Index Weight = 1), Millet - Retail (1 kg, Index Weight = 1), Sorghum - Retail (1 kg, Index Weight = 1)
Nigeria | Bread - Retail (1 Unit, Index Weight = 1), Oil (Palm) - Retail (750 ML, Index Weight = 1.33), Rice (Imported) - Wholesale (50 kg, Index Weight = 0.02), Groundnuts (Shelled) - Wholesale (100 kg, Index Weight = 0.01), Maize (White) - Wholesale (100 kg, Index Weight = 0.01), Millet - Wholesale (100 kg, Index Weight = 0.01), Sorghum (White) - Wholesale (100 kg, Index Weight = 0.01), Bananas - Retail (1.3 kg, Index Weight = 0.77), Maize Flour - Retail (1.3 kg, Index Weight = 0.77), Cassava Meal (Gari, Yellow) - Wholesale (100 kg, Index Weight = 0.01), Cowpeas (White) - Wholesale (100 kg, Index Weight = 0.01), Eggs - Retail (30 pcs, Index Weight = 0.33), Milk - Retail (20 G, Index Weight = 50), Oranges - Retail (400 G, Index Weight = 2.5), Watermelons - Retail (2.1 kg, Index Weight = 0.48), Gari (White) - Wholesale (100 kg, Index Weight = 0.01)
Somalia | Oil (Vegetable, Imported) - Retail (1 L, Index Weight = 1), Rice
(Imported) - Retail (1 kg, Index Weight = 1), Maize (White) - Retail (1 kg, Index Weight = 1), Sorghum (Red) - Retail (1 kg, Index Weight = 1) South Sudan Oil (Vegetable) - Retail (1 L, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (Red) - Retail (1 kg, Index Weight = 1), Groundnuts (Shelled) - Retail (1 kg, Index Weight = 1), Maize (White) - Retail (3.5 kg, Index Weight = 0.29), Millet (White) - Retail (3.5 kg, Index Weight = 0.29), Sorghum (White, Imported) - Retail (3.5 kg, Index Weight = 0.29), Sesame - Retail (3.5 kg, Index Weight = 0.29) Sudan Wheat - Wholesale (90 kg, Index Weight = 0.01), Millet - Retail (3.5 kg, Index Weight = 0.29), Sorghum - Retail (3 kg, Index Weight = 0.33) Syrian Arab Rep. Bread (Bakery) - Retail (1.1 kg, Index Weight = 0.91), Oil - Retail (1 L, Index Weight = 1), Rice - Retail (1 kg, Index Weight = 1), Sugar - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (White) - Retail (1 kg, Index Weight = 1), Tomatoes - Retail (1 kg, Index Weight = 1), Eggs - Retail (30 pcs, Index Weight = 0.33), Dates - Retail (1 kg, Index Weight = 1), Yogurt - Retail (1 kg, Index Weight = 1), Bulgur - Retail (1 kg, Index Weight = 1), Cheese - Retail (1 kg, Index Weight = 1), Chickpeas (Yellow) - Retail - Retail (1 kg, Index Weight = 1), Lentils - Retail (1 kg, Index Weight = 1), Parsley - Retail (1 Packet, Index Weight = 2) Yemen, Rep. 
Oil (Vegetable) - Retail (1 L, Index Weight = 1), Rice (Imported) - Retail (1 kg, Index Weight = 1), Sugar - Retail (1 kg, Index Weight = 1), Wheat - Retail (1 kg, Index Weight = 1), Wheat Flour - Retail (1 kg, Index Weight = 1), Beans (Kidney Red) - Retail (1 kg, Index Weight = 1), Onions - Retail (1 kg, Index Weight = 1), Potatoes - Retail (1 kg, Index Weight = 1), Tomatoes - Retail (1 kg, Index Weight = 1), Eggs - Retail (1 Unit, Index Weight = 12), Peas (Yellow, Split) - Retail (1 kg, Index Weight = 1), Lentils - Retail (1 kg, Index Weight = 1) Note: Weights have been rounded to two digits. Note that for index with one item measured in 1 kg and one item measured in 100 kg, the latter has only a weight of 0.01, which has the same effect as scaling the latter unit of measurement to 1 kg and weighting both equally. Source: The statistics have been prepared by the author for this paper based on price data selected using end-of-August (2021) food price data from World Food Program. 30 ´ ANDREE DECEMBER - 2021 Table A4—Summary of estimation results. Country Avg. inflation Max. draw-down Avg. volatility CV-score Afghanistan 5.76% -44.02% 8.69% 0.87 Burkina Faso 4.08% -37.74% 15.25% 0.77 Burundi 3.90% -27.74% 13.26% 0.77 Cameroon 1.25% -20.40% 7.01% 0.95 Central Afr. Rep. 1.07% -29.92% 19.36% 0.75 Chad 4.17% -45.92% 19.02% 0.69 Congo, Rep. 1.56% -24.17% 11.76% 0.87 Congo, Dem. Rep. 7.63% -16.67% 8.64% 0.87 Gambia, The 4.92% -15.31% 10.52% 0.71 Guinea-Bissau 1.62% -18.56% 14.51% 0.81 Haiti 7.60% -34.82% 11.98% 0.82 Iraq -1.07% -25.58% 5.12% 0.90 Lao PDR 0.83% -3.44% 1.56% 0.93 Lebanon 20.35% -21.96% 14.07% 0.92 Liberia 11.67% -17.95% 10.57% 0.93 Mali 2.46% -23.97% 8.63% 0.84 Mozambique 9.10% -29.33% 9.40% 0.84 Myanmar 1.82% -38.76% 11.83% 0.74 Niger 4.03% -23.71% 9.84% 0.87 Nigeria 7.82% -20.89% 7.44% 0.92 Somalia 5.54% -48.67% 12.95% 0.83 South Sudan 43.03% -29.55% 25.11% 0.87 Sudan 47.16% -30.24% 18.93% 0.87 Syrian Arab Rep. 35.25% -23.49% 17.01% 0.88 Yemen, Rep. 
9.55% -23.84% 14.13% 0.78 Note: The first three columns respectively report average annualized inflation, maximum draw-down, and average annual realized volatility in percentages. The final column reports the cross-validated confidence score that ranges from 0 to 1 for the final food price index using the calculations from the paper. Additional cross-validation statistics can be found on the World Bank Data Catalog page where a live version of the data base is maintained. Source: The statistics have been prepared by the author for this paper based on end-of-August (2021) food price data from World Food Program. ESTIMATING FOOD PRICE INFLATION 31 Table A5—Linear decomposition of CV -score. Dependent variable: CV -score OLS log Number of markets −0.034 (0.028) log Number of food items 0.021 (0.032) Number of markets per food item 0.003 (0.005) Data completeness −0.019 (0.092) Inflation rate 0.450∗∗ (0.138) Volatility −1.305∗∗ (0.361) Max. draw-down 0.024 (0.138) Constant 1.018∗∗∗ (0.098) Observations 25 R2 0.627 Adjusted R2 0.474 Residual Std. Error 0.052 (df = 17) F Statistic 3.792 (df = 7; 17) Note: ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01 Note: Simple linear regression estimates that decompose the cross-validation performance score into key characteristics of the imputation problem. Percentage covariates are modeled as numeric digits of the same unit as the dependent variable, e.g. a monthly volatility of 2% enters the regression as a value of 0.02. The simple results highlight that prediction performance is not significantly related to data dimensions of the problem, instead high volatility and high inflation are determinants of imputation accuracy. Jointly, these two variables explain almost half of the in-sample variation. The signs of the coefficients suggest that when volatility increases, prediction performance deteriorates and data collection becomes more important. 
When the inflation trend is strong relative to volatility, the price trend is clearer in the data and imputation accuracy increases, suggesting that robust inflation tracking is possible even when little ground-truth data are available.
Source: The statistics have been prepared by the author for this paper based on end-of-August (2021) food price data from the World Food Programme.

Table A6—Realized Annualized Food Price Inflation.

Country | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021∗
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Afghanistan | 38.04 | 55.36 | -29.02 | 2 | 4.44 | 18.49 | 0.95 | -1.14 | 1.31 | -0.14 | 1.29 | 1.56 | 1.17 | 14.09 | 2.8
Burkina Faso | 27.63 | -11.96 | 13.69 | -1.57 | 16.35 | -5.15 | -8 | -0.68 | -1.02 | 1.39 | 14.46 | -9.27 | -12.64 | 21.61 | 47.93
Burundi | 7.31 | 1.31 | 15.57 | 3.53 | -6.19 | 23.57 | -2.75 | -9.7 | 21.06 | 23.54 | -3.83 | -11.18 | 15.5 | 1.13 | -16.2
Cameroon | 3.12 | -0.51 | 2.1 | 1.58 | 1.82 | 4.82 | -2.3 | -2.62 | -1.9 | -6.23 | 5.15 | -3.24 | 0.05 | 13.91 | 7.62
Central Afr. Rep. | 2.02 | 11.31 | -3.27 | 3.63 | -0.39 | -4.79 | -1.41 | -12.37 | 38.04 | -21.28 | -6.38 | 5.8 | 2.24 | 4.61 | 14.03
Chad | -11.1 | 62.25 | -7.59 | -26.54 | 57.55 | -5.1 | 7.08 | 1.41 | 0.09 | -27.3 | 18.13 | -12.55 | 0.85 | 25.46 | 31.07
Congo, Rep. | .. | .. | .. | 4.08 | -5.65 | 9.85 | -5.97 | 14.92 | 12.94 | -14.66 | 14.78 | -14.77 | -1.13 | 6.35 | 1.36
Congo, Dem. Rep. | 10.64 | 21.35 | 7.83 | -1.64 | 28.96 | 2.14 | -8.04 | -0.03 | -2.52 | 12.94 | 24.37 | 3.82 | 11.56 | 8.26 | 0.6
Gambia, The | 1.85 | 16.01 | -8.69 | 2.28 | 7.44 | 5.16 | 2.04 | 3.08 | 1.2 | 18.81 | -11.58 | 20.73 | 0.65 | 24.28 | -5.77
Guinea-Bissau | .. | .. | .. | .. | .. | .. | .. | .. | -3.33 | 4.83 | -12.24 | 14.76 | -2.75 | -5.84 | 28.79
Haiti | 2.24 | 28.17 | -17.21 | 6.06 | 3.6 | 11.58 | -11.57 | 1.71 | 30.81 | 3.15 | -0.5 | 5.46 | 45.86 | 1.19 | 25.98
Iraq | .. | .. | .. | .. | -0.35 | -0.17 | 1.34 | -6.99 | 0.28 | -2.93 | -7.45 | -8.21 | 1.84 | 8.07 | 7.08
Lao PDR | 0.22 | 0.31 | 0.39 | 0.68 | 0.58 | 0.16 | 1.6 | 2.11 | -0.49 | -1.02 | 0.71 | 0.6 | 5.88 | 0.73 | -0.23
Lebanon | .. | .. | .. | .. | .. | 7.32 | 1.12 | -1.96 | -14.64 | 2.13 | 4.74 | 0.43 | 18.01 | 131.39 | 237.14
Liberia | 20.5 | 26.59 | 3 | 9.86 | 18.7 | 12.72 | -8.2 | 11.18 | -7.25 | 19.8 | 6.11 | 33.26 | 10.31 | 15.92 | 9.83
Mali | 6.54 | 5.1 | 1.11 | -2.43 | 19.5 | -4.07 | -5.42 | -1.75 | 2.06 | 1.13 | 2.76 | 2.76 | -5.3 | 4.51 | 21.2
Mozambique | 36.47 | 25.8 | -9.16 | 23.36 | 6.11 | -0.14 | -3.3 | -3.5 | 19.9 | 74.51 | -26.05 | -0.85 | 11.3 | 12.88 | 0.55
Myanmar | 6.67 | -8.33 | -19.68 | 12.4 | -5.37 | -8.49 | 4.57 | -6.89 | 16.43 | 13.26 | -2.25 | 2.31 | 4.81 | 3.25 | 45.03
Niger | 3 | 33.49 | -0.12 | -7.91 | 17.46 | 2.71 | 1.22 | -10.76 | -1.91 | 3.55 | 4.58 | -5.93 | -1.47 | 10.15 | 31.23
Nigeria | .. | .. | .. | .. | 24.31 | 2.84 | 4.28 | -2.01 | 1.6 | 37.98 | 2.24 | -7.63 | -2.21 | 24.84 | 27.67
Somalia | 46.67 | 54.55 | -18.55 | 12.62 | 2.83 | -24.26 | 1.18 | 6.74 | -5.3 | 9.58 | -1.49 | -1.4 | 5.52 | 11.35 | 9.6
South Sudan | -2.23 | 21.42 | 18.47 | 15.26 | 69.16 | 2.54 | -9.32 | 15.87 | 120.68 | 451.01 | 63.75 | 33.22 | 31.43 | 63.63 | 20.17
Sudan | 23.83 | 57.15 | 28.14 | 1.79 | 38.1 | 29.91 | 31.92 | 23.1 | 3.71 | 13.96 | 71.59 | 119.75 | 62.1 | 271.37 | 72.13
Syrian Arab Rep. | .. | .. | .. | .. | 13.32 | 11.14 | 80.45 | 0.27 | 66.91 | 19.73 | -11.96 | -12.1 | 36.18 | 245.09 | 74.74
Yemen, Rep. | .. | 8.77 | 10.33 | 5.13 | 19.71 | -13.76 | 2.91 | 1.09 | 10.88 | 6.11 | 13.3 | 30.1 | 4.37 | 31.81 | 10.96
FCS | 11.4 | 19.52 | -1.71 | 2.71 | 13.04 | 2.66 | 1.91 | 0.55 | 9.83 | 13.63 | 4.82 | 5.14 | 8.63 | 27.46 | 22.65

Note: Figures are annualized month-on-month price changes, in percent; ".." indicates that no estimate is reported for that year. Monthly price data are maintained on the World Bank Data Catalog page associated with the paper. FCS is a geometric average of country rates. The 2021∗ figures are based on end-of-August data.
Source: Statistics have been prepared by the author for this paper.

Table A7—Realized Annualized Food Price Volatility.

Country | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021∗
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Afghanistan | 7.94 | 20.49 | 11.42 | 9.59 | 4.84 | 6.09 | 2.39 | 2.37 | 2.65 | 1.98 | 2.04 | 2.58 | 3.23 | 5.82 | 3.19
Burkina Faso | 13.92 | 23.34 | 23.37 | 13.05 | 10.97 | 19.97 | 15.63 | 15.99 | 11.97 | 12.17 | 12.2 | 17.68 | 8.08 | 10.19 | 5.67
Burundi | 9.63 | 10.22 | 10.03 | 11.8 | 17.04 | 14.66 | 12.77 | 8.7 | 22.07 | 17.99 | 8.18 | 12.22 | 15.08 | 9.26 | 7.38
Cameroon | 2.44 | 1.91 | 1.68 | 3.55 | 5.09 | 12.06 | 11.45 | 6.1 | 6.92 | 12.87 | 11.03 | 8.99 | 6.44 | 7.62 | 4.34
Central Afr. Rep. | 34.06 | 26.11 | 17.97 | 23.4 | 19.73 | 14.48 | 26.54 | 16.88 | 25.05 | 9.95 | 19.78 | 12.06 | 14.28 | 12.02 | 17.3
Chad | 14.16 | 24.52 | 26.96 | 23.26 | 14.74 | 21.57 | 19.76 | 19.09 | 15.07 | 15.07 | 20.04 | 17.04 | 11.1 | 15.32 | 9.53
Congo, Rep. | .. | .. | .. | 2.79 | 9.75 | 10.34 | 17.2 | 5.97 | 19.66 | 13 | 12.23 | 14.23 | 16.59 | 9.01 | 3.36
Congo, Dem. Rep. | 2.32 | 8.23 | 15.22 | 9.79 | 4.26 | 3.33 | 5.14 | 5.22 | 3.86 | 7.83 | 14.4 | 8.19 | 7.47 | 9.41 | 5.55
Gambia, The | 10.81 | 7.43 | 7.48 | 4.12 | 5.44 | 4.14 | 8.76 | 4.51 | 7.84 | 13.54 | 13.98 | 13.19 | 16.68 | 14.97 | 6.57
Guinea-Bissau | .. | .. | .. | .. | .. | .. | .. | .. | 10.59 | 12.58 | 13.52 | 13.07 | 11.35 | 19.18 | 15.73
Haiti | 9.45 | 21.15 | 9.77 | 20.62 | 10.8 | 10.22 | 6.59 | 3.32 | 6.27 | 7.03 | 3.28 | 2.48 | 10.88 | 20.32 | 5.79
Iraq | .. | .. | .. | .. | 3.18 | 3.88 | 3.3 | 6.13 | 4.75 | 3.35 | 6.4 | 4.54 | 5.7 | 7.01 | 5.87
Lao PDR | 0.82 | 0.92 | 0.81 | 0.94 | 0.92 | 1.33 | 1.09 | 2.2 | 0.78 | 0.95 | 0.4 | 1.65 | 2.69 | 3.12 | 1.98
Lebanon | .. | .. | .. | .. | .. | 3.71 | 3.39 | 3.08 | 4.86 | 4.54 | 6.42 | 1.79 | 8.5 | 13.48 | 5.29
Liberia | 9.73 | 10.64 | 19.55 | 13.73 | 7.72 | 5.77 | 10.02 | 12.51 | 8.08 | 8.58 | 10.69 | 16.01 | 9.29 | 9.29 | 7.32
Mali | 7.55 | 12.67 | 10.76 | 6.78 | 4.56 | 10.78 | 5.23 | 6.1 | 9.07 | 5.69 | 6.46 | 10.83 | 6.2 | 6.46 | 2.68
Mozambique | 9.41 | 5.38 | 10.92 | 7.47 | 5.97 | 5.35 | 4.38 | 4.99 | 9.88 | 12.14 | 7.89 | 4.07 | 6.75 | 7.22 | 8.62
Myanmar | 5.16 | 17.45 | 24.57 | 15.26 | 15.03 | 15.15 | 8.47 | 9.02 | 11.94 | 5.55 | 6.58 | 8.09 | 5.51 | 6.7 | 9.06
Niger | 4.96 | 15.12 | 10.96 | 11.75 | 6.13 | 11.91 | 8.6 | 4.58 | 4.92 | 11.06 | 13.73 | 8.68 | 3.75 | 13.97 | 5.71
Nigeria | .. | .. | .. | .. | 13.27 | 6.58 | 4.91 | 6.31 | 9.18 | 7.44 | 8.44 | 4.1 | 5.74 | 5.96 | 4.63
Somalia | 8.66 | 32.58 | 12.66 | 10.19 | 15.94 | 10.86 | 7.03 | 7.35 | 7.15 | 9.25 | 7.5 | 4.76 | 2.35 | 10.41 | 4.12
South Sudan | 10.19 | 6.08 | 26.43 | 11.19 | 25.72 | 29.82 | 16.24 | 28.53 | 26.45 | 27.4 | 24.64 | 34.47 | 14.02 | 12.66 | 28.83
Sudan | 14.5 | 23.13 | 11.24 | 16.94 | 11.55 | 22.91 | 12.48 | 23.65 | 11.93 | 11.13 | 20.34 | 19.55 | 16.41 | 14.93 | 11.1
Syrian Arab Rep. | .. | .. | .. | .. | 7.26 | 9.8 | 22.19 | 15.21 | 7.39 | 13.55 | 4.96 | 8.89 | 14.47 | 11.03 | 21.09
Yemen, Rep. | .. | 3.77 | 18.45 | 15.28 | 23.15 | 11.87 | 7.84 | 11.28 | 21.82 | 15.38 | 7.41 | 17.58 | 7.78 | 9.51 | 5.85
FCS | 9.55 | 13.92 | 13.99 | 11.4 | 10.38 | 10.9 | 9.87 | 9.34 | 10.6 | 10.27 | 10.34 | 10.44 | 9.12 | 10.52 | 8.1

Note: Figures are annualized standard deviations of month-on-month price changes, in percent; ".." indicates that no estimate is reported for that year. Monthly price data are maintained on the World Bank Data Catalog page associated with the paper. FCS is a geometric average of country rates. The 2021∗ figures are based on end-of-August data.
Source: Statistics have been prepared by the author for this paper.

[Figure A1 panels: Yemen, Rep. (2008-07-01 / 2021-08-01); top: estimated food basket price (Open, High, Low, Close); bottom: Price Inflation (Year on year, %).]
Figure A1. Estimated food prices, inflation and intra-month volatilities in Yemen.
Note: Price of food baskets in local currency, local-market average, January 2015 = 1. The top chart shows Open, High, Low and Close price estimates of the total food basket price. The food basket consists of an average across 24 markets of the retail prices of Cooking Oil (Vegetable, 1 L), Rice (Imported, 1 kg), Sugar (1 kg), Wheat (1 kg), Wheat Flour (1 kg), Beans (Kidney Red, 1 kg), Onions (1 kg), Potatoes (1 kg), Tomatoes (1 kg), Eggs (12 Units), Peas (Yellow, Split, 1 kg) and Lentils (1 kg). The bottom chart shows monthly food price inflation as the year-on-year percentage increase in Close prices.
Source: Figure prepared by the author for this paper. Live versions of all graphs, including graphs at the subnational level, are maintained in the online data repository.
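The weighting rule in the Table A3 note and the summary statistics behind Tables A4, A6 and A7 can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper's code: each item's index weight is the reciprocal of its quoted quantity in kg; the basket price is the weighted sum of item prices, normalized to a base month; annualized inflation compounds the mean month-on-month change over 12 months; and annualized volatility scales the standard deviation of month-on-month changes by the square root of 12. All item names, unit sizes and prices below are hypothetical.

```python
import math

# Hypothetical quoted quantities (kg per price quote); not taken from Table A3.
UNIT_KG = {"rice": 1.0, "maize": 100.0, "oil": 5.0}


def index_weights(unit_kg):
    """Index weight = 1 / quoted quantity, so every price is rescaled to 1 kg."""
    return {item: 1.0 / qty for item, qty in unit_kg.items()}


def basket_series(prices_by_month, weights, base_month=0):
    """Weighted sum of item prices per month, normalized so the base month equals 1."""
    raw = [sum(weights[k] * month[k] for k in weights) for month in prices_by_month]
    return [value / raw[base_month] for value in raw]


def monthly_changes(series):
    """Month-on-month relative price changes."""
    return [series[t] / series[t - 1] - 1.0 for t in range(1, len(series))]


def annualized_inflation(series):
    """Compound the mean month-on-month change to a 12-month rate (assumed definition)."""
    changes = monthly_changes(series)
    mean = sum(changes) / len(changes)
    return (1.0 + mean) ** 12 - 1.0


def annualized_volatility(series):
    """Sample std. dev. of month-on-month changes times sqrt(12) (assumed definition)."""
    changes = monthly_changes(series)
    mean = sum(changes) / len(changes)
    var = sum((c - mean) ** 2 for c in changes) / (len(changes) - 1)
    return math.sqrt(var) * math.sqrt(12.0)


def max_drawdown(series):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak, worst = series[0], 0.0
    for value in series:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


if __name__ == "__main__":
    w = index_weights(UNIT_KG)  # maize quoted per 100 kg gets weight 0.01
    months = [
        {"rice": 2.0, "maize": 150.0, "oil": 10.0},
        {"rice": 2.1, "maize": 160.0, "oil": 10.0},
        {"rice": 2.0, "maize": 140.0, "oil": 9.5},
    ]
    index = basket_series(months, w)
    print(index, max_drawdown(index))
```

Because the basket is normalized to its base month, using a weighted sum or a weighted mean gives the same index; only the relative weights matter.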
[Figure A2 panels: Afghanistan (2007-01-01 / 2021-08-01); Burkina Faso (2007-01-01 / 2021-08-01); Burundi (2007-01-01 / 2021-08-01); Cameroon (2007-01-01 / 2021-08-01); Central African Republic (2007-05-01 / 2021-08-01); Chad (2007-01-01 / 2021-08-01); Congo (2010-05-01 / 2021-08-01); Democratic Republic of the Congo (2007-07-01 / 2021-08-01).]
Figure A2. Estimated food prices and intra-month volatilities in Afghanistan, Burkina Faso, Burundi, Cameroon, the Central African Republic, Chad, the Republic of the Congo and the Democratic Republic of Congo.
Note: Price of food baskets in local currency, local-market average, January 2015 = 1.
Source: Figure prepared by the author for this paper. Live versions of all graphs, including graphs at the subnational level, are maintained in the online data repository.

[Figure A3 panels: Gambia (2007-01-01 / 2021-08-01); Guinea-Bissau (2015-01-01 / 2021-08-01); Haiti (2007-01-01 / 2021-08-01); Iraq (2011-03-01 / 2021-08-01); Lao People's Democratic Republic (2007-01-01 / 2021-08-01); Lebanon (2012-01-01 / 2021-08-01); Liberia (2007-01-01 / 2021-08-01); Mali (2007-01-01 / 2021-08-01).]
Figure A3. Estimated food prices and intra-month volatilities in the Gambia, Guinea-Bissau, Haiti, Iraq, Lao People's Democratic Republic, Lebanon, Liberia and Mali.
Note: Price of food baskets in local currency, local-market average, January 2015 = 1.
Source: Figure prepared by the author for this paper. Live versions of all graphs, including graphs at the subnational level, are maintained in the online data repository.
[Figure A4 panels: Mozambique (2007-01-01 / 2021-08-01); Myanmar (2007-03-01 / 2021-08-01); Niger (2007-01-01 / 2021-08-01); Nigeria (2011-09-01 / 2021-08-01); Somalia (2007-01-01 / 2021-08-01); South Sudan (2007-01-01 / 2021-08-01); Sudan (2007-01-01 / 2021-08-01); Syrian Arab Republic (2011-03-01 / 2021-08-01).]
Figure A4. Estimated food prices and intra-month volatilities in Mozambique, Myanmar, Niger, Nigeria, Somalia, Sudan, South Sudan and the Syrian Arab Republic.
Note: Price of food baskets in local currency, local-market average, January 2015 = 1.
Source: Figures prepared by the author for this paper. Live versions of all graphs, including graphs at the subnational level, are maintained in the online data repository.

Mathematical Appendix

B1. Prediction strategy

The imputation strategy used here is an adaptation of Multiple Imputation by Chained Equations (see van Buuren and Groothuis-Oudshoorn (2011); van Buuren (2012); Murray (2018)). In particular, the original approach has been adapted to allow specifying the stochastic properties of the initialization and to keep track of cross-validation throughout the imputation process. The code base has also been adapted to allow parallel computation. The adapted code base has been made available; see the links provided in the footnotes of the introduction of the main paper. Apart from these practical modifications, the algorithm is relatively standard.

The algorithm produces multiple likely data sets, which are then pooled into a single imputation. The number of imputations is $M = 5$, and the $m$-th imputed price data set is $P^{(m)}$, where $m \in \{1, \dots, M\}$. Let $P_{-d} = (P_1, \dots, P_{d-1}, P_{d+1}, \dots, P_D)$ denote the collection of the $D - 1$ variables in $P$ other than $P_d$. Note that for a given $P_d$, the set $P_{-d}$ may itself be incomplete, the variables in $P_{-d}$ may be correlated with one another, the relationship between $P_d$ and $P_{-d}$ could be complex, and $P_d$ may also depend on $h$ other available regressors $X = (X_1, \dots, X_h)$.
Assume that the hypothetically complete price data $P$ are a partially observed random sample from the $D$-variate distribution $P(P \mid \theta)$, and that this multivariate distribution is completely specified by the unknown parameter vector $\theta$. The objective is thus to obtain estimates for $\theta$. The algorithm estimates a posterior distribution of $\theta$ by iteratively sampling from the conditional distributions

$$
\begin{aligned}
&P(P_1 \mid P_{-1}, X; \theta_1) \\
&\quad\vdots \\
&P(P_D \mid P_{-D}, X; \theta_D)
\end{aligned}
\tag{B1}
$$

The parameters $\theta_1, \dots, \theta_D$ are specific to the respective conditional densities. Starting from a draw from the marginal distributions (in the current application modeled using univariate time-series methods), the $i$-th iteration of chained equations is a Gibbs sampler that successively draws

$$
\begin{aligned}
\theta_1^{*(i)} &\sim P\big(\theta_1 \mid P_{1,obs}, P_2^{(i-1)}, \dots, P_D^{(i-1)}, X\big) \\
P_1^{*(i)} &\sim P\big(P_1 \mid P_{1,obs}, P_2^{(i-1)}, \dots, P_D^{(i-1)}, X, \theta_1^{*(i)}\big) \\
&\;\;\vdots \\
\theta_D^{*(i)} &\sim P\big(\theta_D \mid P_{D,obs}, P_1^{(i)}, \dots, P_{D-1}^{(i)}, X\big) \\
P_D^{*(i)} &\sim P\big(P_D \mid P_{D,obs}, P_1^{(i)}, \dots, P_{D-1}^{(i)}, X, \theta_D^{*(i)}\big)
\end{aligned}
\tag{B2}
$$

where $P_d^{(i)} = (P_{d,obs}, P_d^{*(i)})$ is the imputed version of price variable $d$ at iteration $i$. The imputations of the previous iteration, $P_d^{*(i-1)}$, enter the next imputation $P_d^{*(i)}$ through the other price variables $P_{-d}^{*(i)}$. At each iteration, $P_d^{*(i-1)}$ can also be used to generate synthetic cases by adding a random draw of conditional expectations for missing entries to the dependent side of the next regression on the same price variable. The process is iterated 8 times; alternatively, a stopping criterion can be guided by tracking prediction-validation criteria across iterations $i$. The imputations and their updates are obtained by drawing conditional means from the posterior distribution of penalized linear regressions or Cubist regressions.
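The chained-equations loop of equation B2 can be sketched in a few lines of NumPy. This is a simplified illustration, not the adapted code base referenced in the paper: each conditional model is a ridge regression (one form of penalized linear regression) with an intercept; the initialization draws from each column's observed values rather than from univariate time-series models; and the stochastic step adds Gaussian noise with the residual standard deviation instead of drawing $\theta_d$ from a full posterior. The function name `mice_ridge` and its parameters are hypothetical; the defaults mirror the $M = 5$ imputations and 8 iterations stated in the text.

```python
import numpy as np


def mice_ridge(P, n_iter=8, n_imputations=5, alpha=1.0, seed=0):
    """Simplified chained-equations imputation with ridge regressions.

    P: 2-D array (rows = market-month observations, columns = price variables),
       with np.nan marking missing entries; every column needs some observed values.
    Returns the list of M completed data sets and their element-wise average.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    miss = np.isnan(P)
    n_rows, n_cols = P.shape
    completed = []
    for _ in range(n_imputations):
        X = P.copy()
        # Initialization: random draws from each column's observed values
        # (the paper instead initializes with univariate time-series models).
        for d in range(n_cols):
            X[miss[:, d], d] = rng.choice(P[~miss[:, d], d], size=miss[:, d].sum())
        for _ in range(n_iter):
            for d in range(n_cols):
                if not miss[:, d].any():
                    continue
                # Design matrix: intercept plus the current values of P_{-d}.
                A = np.column_stack([np.ones(n_rows), np.delete(X, d, axis=1)])
                obs = ~miss[:, d]
                penalty = alpha * np.eye(A.shape[1])
                penalty[0, 0] = 0.0  # leave the intercept unpenalized
                beta = np.linalg.solve(A[obs].T @ A[obs] + penalty, A[obs].T @ X[obs, d])
                resid = X[obs, d] - A[obs] @ beta
                sigma = resid.std(ddof=1) if obs.sum() > 1 else 0.0
                # Stochastic update: conditional mean plus Gaussian noise.
                n_miss = miss[:, d].sum()
                X[miss[:, d], d] = A[miss[:, d]] @ beta + rng.normal(0.0, sigma, n_miss)
        completed.append(X)
    return completed, np.mean(completed, axis=0)
```

Observed entries are never overwritten, so only the gaps differ across the $M$ completed data sets; averaging them corresponds to the pooling step discussed in Section B2. Swapping the ridge step for a tree-based learner such as Cubist would change only the inner regression.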
Note that because of the randomness in the Cubist algorithm itself, the randomness of the initialization, and the randomness in the synthetic data, substantial randomness remains across successive iterations, which allows the sequence to visit a wide variety of likely prediction models for the missing values. For instance, the prediction models can extrapolate beyond the range of the training data and thus generate synthetic cases with properties different from the observed data; together with further sources of randomness, such as parameter tuning and the random components of the Cubist model, this means that each iteration may weaken, amplify, or change local correlations between the columns of the data in the next imputation iteration. Hence, while a pre-determined linear regression specification artificially amplifies the relations between the columns of the data by reinforcing its own learning pattern throughout the iterations, the stochastic method allows the sequence to find prediction models for the missing values beyond what seemed likely from the initially observed data alone.

B2. Ensemble

Let $Q$ be the quantity of interest, here a simple linear difference calculated from fixed-weight averages of $P$. The ultimate goal of the multiple imputation strategy is to obtain an estimate $\hat{Q}$ such that

$$E(\hat{Q} \mid P) = Q, \tag{B3}$$

with $P$ denoting the true (complete) price data. Since $P$ is only partially observed, $Q$ is unknown. The uncertainty in the estimate $\hat{Q}$ therefore depends on what is known about $P_{mis}$. Since $P_{mis}$ can only be recreated with uncertainty from the information in $P_{obs}$, the idea is to summarize the distribution of $Q$ under varying estimates of $P_{mis}$.
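Why the pooled estimate can be computed either way, and why that relies on $Q$ being linear in prices, can be checked numerically. The sketch below is hypothetical throughout: three items with made-up fixed weights, one price missing in the previous period, and Gaussian draws standing in for draws from $P(P_{mis} \mid P_{obs})$. For a difference of fixed-weight averages, averaging the $M$ per-imputation estimates gives the same number as evaluating $Q$ on the averaged imputed data; for a ratio-based inflation rate the two generally differ.

```python
import random

WEIGHTS = [1.0, 0.5, 2.0]  # hypothetical fixed item weights


def basket(prices):
    """Fixed-weight average of item prices."""
    return sum(w * p for w, p in zip(WEIGHTS, prices)) / sum(WEIGHTS)


def q_difference(prev, curr):
    """Linear quantity of interest: change in the fixed-weight average."""
    return basket(curr) - basket(prev)


def q_ratio(prev, curr):
    """Non-linear alternative: relative change in the fixed-weight average."""
    return basket(curr) / basket(prev) - 1.0


def draw_imputations(prev_obs, missing_idx, M=5, seed=0):
    """Hypothetical stand-in for M draws of the missing previous-period price."""
    rng = random.Random(seed)
    draws = []
    for _ in range(M):
        filled = list(prev_obs)
        filled[missing_idx] = rng.gauss(4.0, 0.5)
        draws.append(filled)
    return draws


def pooled_vs_plugin(q, draws, curr):
    """Return (mean of Q over imputations, Q evaluated at the mean imputation)."""
    M = len(draws)
    pooled = sum(q(p, curr) for p in draws) / M
    mean_prices = [sum(p[j] for p in draws) / M for j in range(len(curr))]
    return pooled, q(mean_prices, curr)


prev_obs = [2.0, None, 3.0]  # item 1 was not surveyed in the previous period
curr = [2.2, 4.5, 3.3]
draws = draw_imputations(prev_obs, missing_idx=1)
print(pooled_vs_plugin(q_difference, draws, curr))  # equal up to floating-point rounding
print(pooled_vs_plugin(q_ratio, draws, curr))       # generally not equal
```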
In other words, given what has been observed in $P_{obs}$, the possible values of $Q$ have a posterior distribution $P(Q \mid P_{obs})$, which can in turn be decomposed into two parts:

$$P(Q \mid P_{obs}) = \int P(Q \mid P_{obs}, P_{mis})\, P(P_{mis} \mid P_{obs})\, dP_{mis}. \tag{B4}$$

Here, $P(Q \mid P_{obs}, P_{mis})$ is the posterior distribution of inflation in the hypothetically complete price data, and $P(P_{mis} \mid P_{obs})$ is the posterior distribution of the missing price data given the observed price data. Suppose that $P(P_{mis} \mid P_{obs})$ is used to draw various likely price data sets for $P_{mis}$, denoted $\dot{P}_{mis}$. The associated inflation $Q$ can then be calculated from $(\dot{P}_{mis}, P_{obs})$. By repeating this process multiple times, one obtains the posterior distribution of $Q$, and equation B4 implies that the posterior mean of $Q$ equals the expectation over all draws:

$$E(Q \mid P_{obs}) = E\big[E(Q \mid P_{obs}, P_{mis}) \mid P_{obs}\big]. \tag{B5}$$

This suggests that when $\hat{Q}^{(m)}$ is the estimate obtained from the $m$-th imputation, the combined estimate using all imputations is the ensemble average

$$\hat{Q} = \frac{1}{M}\sum_{m=1}^{M} \hat{Q}^{(m)} = Q\Big(\frac{1}{M}\sum_{m=1}^{M} \hat{P}^{(m)}\Big), \tag{B6}$$

where the second equality follows from the linearity of the simple linear difference formulation of $Q$.

B3. Performance against the unobserved prediction objective

Let $p^d_{it}$ be a possibly observed price quote for food item $d \in \{1, \dots, D\}$ at location $i \in \{1, \dots, N\}$ and time $t \in \{1, \dots, T\}$. Let $p^d \ni p^d_{it}$ be the possibly incomplete vector of prices for food item $d$ obtained by stacking all $N \times T$ entries. $P = (p^1, \dots, p^D)$ is the $NT \times D$ matrix that collects all possible price points and consists of observed and missing parts, $P_{obs} = (p^1_{obs}, \dots, p^D_{obs})$ and $P_{mis} = (p^1_{mis}, \dots, p^D_{mis})$.

Suppose the true prices are generated by some process that is only partially observed, and with error. In particular, for every individual price signal $d \in \{1, \dots, D\}$, the focus is on a $T$-period sequence $\{\pi^d_t\}_{t=1}^{T}$ that is a subset of the realized path of the stochastic sequence $\pi^d := \{\pi^d_t\}_{t \in \mathbb{Z}}$. Suppose that $\{\pi^d_t\}_{t=1}^{T}$ is unobserved, but that there is an observed sequence

$$p^d_{obs} := \{p^d_{t,obs} = M^d(\pi^d_t)\}. \tag{B7}$$

In this equation, $M^d$ is a function that describes how price data of commodity $d$ are measured; it may produce measurement error in the form of additive outliers as well as data gaps. The important distinction is thus that true prices $\pi^d$ are assumed to exist, but it is only possible to partially observe $p^d_{obs}$ by surveying, which inadvertently introduces errors and missing entries. The sequence $p^d = (p^d_{obs}, p^d_{mis})$ can thus be split into missing values $p^d_{mis}$ and possibly contaminated observations $p^d_{obs}$. The aim is two-fold: to proxy $\pi^d_{obs}$ with an estimate $\hat{p}^d_{obs}$ by filtering outliers from $p^d_{obs}$, and to proxy $\pi^d_{mis}$ with an estimate $\hat{p}^d_{mis}$ by filling in entries for $p^d_{mis}$ based on the information contained in $\hat{p}^d_{obs}$. Since the true targets are unobserved, a direct criterion is difficult to establish, but the objective can be summarized as minimizing the divergence $\lVert \hat{p}^{d=1,\dots,D} - \pi^{d=1,\dots,D} \rVert$. This divergence is in turn estimated with an $L_1$-norm-based metric for the prediction function that generates $\hat{p}^d_{mis}$, validated by the out-of-sample predictions it makes for the outlier-filtered $\hat{p}^d_{obs}$, which serves as a proxy for $\pi^d_{obs}$.

B4. Intra-month estimates

The price-level estimates are accompanied by intra-month price range estimates, represented as an Open-High-Low-Close time series:

$$
\begin{pmatrix} \hat{O}_t \\ \hat{H}_t \\ \hat{L}_t \\ \hat{C}_t \end{pmatrix}
=
\begin{pmatrix}
E[P_t \mid F_{t-1}] \\
P_{t-1} + E_{\Delta\alpha>0.50}[P_t \mid F_{t-1}] \\
P_{t-1} + E_{\Delta\alpha<0.50}[P_t \mid F_{t-1}] \\
P_t
\end{pmatrix},
\tag{B8}
$$

where $P$ is the imputed price series and $E_{\Delta\alpha}$ is the expected change in the $\alpha$-percentile cases. The combined results can be plotted on a candlestick chart.
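A sketch of how equation B8 turns a conditional distribution of month-on-month changes into a single candle. It assumes, for illustration only, that the conditional distribution $F_{t-1}$ is represented by a sample of simulated price changes $P_t - P_{t-1}$ (in the paper this distribution comes from the fitted ARMA-fiGARCH model), that $E_{\Delta\alpha>0.50}$ and $E_{\Delta\alpha<0.50}$ are the means of the draws above and below the median, and that changes are additive in price units. The function name `ohlc_from_draws` is hypothetical.

```python
import random
import statistics


def ohlc_from_draws(p_prev, p_close, changes):
    """One Open-High-Low-Close candle following the structure of equation B8.

    p_prev:  previous month's imputed price P_{t-1}
    p_close: current month's imputed price P_t (the Close)
    changes: draws of P_t - P_{t-1} from the conditional distribution F_{t-1};
             they must not all be equal, or the upper/lower halves are empty
    """
    med = statistics.median(changes)
    upper = [c for c in changes if c > med]
    lower = [c for c in changes if c < med]
    open_ = p_prev + statistics.fmean(changes)  # E[P_t | F_{t-1}]
    high = p_prev + statistics.fmean(upper)     # mean outcome in the upper half
    low = p_prev + statistics.fmean(lower)      # mean outcome in the lower half
    return open_, high, low, p_close


rng = random.Random(42)
draws = [rng.gauss(0.2, 1.0) for _ in range(10_000)]  # hypothetical simulated changes
print(ohlc_from_draws(p_prev=10.0, p_close=10.4, changes=draws))
```

With this reading, the wicks sit at the average outcome of the upper and lower halves of the conditional distribution, which matches the interpretation of the candles given in the text.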
The majority of price action can then be assumed to have occurred within the body of the candles, while the wicks indicate, respectively, the average of the highest 50% of intra-month prices and the average of the lowest 50% of intra-month prices. The first three quantities in equation B8 are estimated by modeling the time-varying distribution of the month-on-month inflation process as an autoregressive moving average process with fractionally integrated generalized autoregressive conditional heteroskedasticity (ARMA-fiGARCH), following Baillie et al. (1996). This yields a time-varying density

$$F_t = (\mu_t, \sigma_t, \vartheta), \tag{B9}$$

where $\mu_t$ is a conditional mean process defined as an ARMA$(p, q)$ process,

$$\mu_t = c + \sum_{j=1}^{p} \phi_j \mu_{t-j} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t, \tag{B10}$$

the conditional variance is specified as a fractionally integrated GARCH process of order $(p, d, q)$, and $\vartheta$ is a vector of the remaining parameters of the distribution. The conditional variance is defined as follows. First, let the standard GARCH$(p, q)$ process be defined as

$$\sigma_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2, \tag{B11}$$

with $\sigma_t^2$ the conditional variance, $\omega$ an intercept, and $L$ the back-shift operator, where $\alpha(L) = \sum_{j=1}^{q} \alpha_j L^j$ and $\beta(L) = \sum_{j=1}^{p} \beta_j L^j$. This model has an ARMA representation of the squared process:

$$(1 - L)\phi(L)\varepsilon_t^2 = [1 - \alpha(L) - \beta(L)]\varepsilon_t^2 = \omega + [1 - \beta(L)](\varepsilon_t^2 - \sigma_t^2), \tag{B12}$$

with $\phi(L) = \sum_{j=0}^{\max(p,q)-1} \phi_j L^j$ and $\phi_0 = 1$. The fractionally integrated GARCH is obtained by replacing the first-difference operator $(1 - L)$ with the truncated fractional difference operator

$$(1 - L)^d \approx \sum_{k=0}^{K} \frac{\Gamma(d + 1)}{\Gamma(k + 1)\,\Gamma(d - k + 1)} (-L)^k, \qquad K = 1000, \tag{B13}$$

where the truncation at $K = 1000$ approximates the infinite expansion. Ignoring the approximation error due to the truncation, at $d = 0$ the model equals the standard GARCH, in which volatility shocks decay at an exponential rate. Similarly, when $d = 1$, the AR polynomial of the GARCH has a unit root and the model equals the integrated GARCH, in which shocks persist forever.
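The truncated operator in equation B13 can be evaluated with the standard recursion for binomial-series coefficients, which avoids evaluating Gamma functions at negative arguments. The sketch below (function names hypothetical) returns the coefficients $w_k$ of $(1 - L)^d = \sum_k w_k L^k$ up to the truncation lag $K$, cross-checks them against the Gamma-function form, and shows the two limiting cases discussed in the text: $d = 0$ leaves a series unchanged and $d = 1$ gives the ordinary first difference.

```python
import math


def frac_diff_weights(d, K=1000):
    """Coefficients w_0..w_K of the truncated expansion (1 - L)^d = sum_k w_k L^k.

    Uses w_0 = 1 and w_k = w_{k-1} * (k - 1 - d) / k, which equals
    (-1)^k * Gamma(d + 1) / (Gamma(k + 1) * Gamma(d - k + 1)) for non-integer d.
    """
    w = [1.0]
    for k in range(1, K + 1):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def frac_diff_weight_gamma(d, k):
    """Direct Gamma-function form of w_k (valid only when d is not an integer)."""
    return (-1) ** k * math.gamma(d + 1) / (math.gamma(k + 1) * math.gamma(d - k + 1))


def apply_filter(x, w):
    """Apply (1 - L)^d to a series x, using only the lags available at each t."""
    return [sum(w[k] * x[t - k] for k in range(min(t, len(w) - 1) + 1))
            for t in range(len(x))]


print(frac_diff_weights(0.0, K=3))  # 1 followed by zeros: standard GARCH case
print(frac_diff_weights(1.0, K=3))  # 1, -1, then zeros: integrated GARCH case
print(frac_diff_weights(0.4)[:4])   # 0 < d < 1: slowly (hyperbolically) decaying weights
```

For $0 < d < 1$ the weights shrink roughly like $k^{-1-d}$ rather than geometrically, which is why a long truncation lag such as $K = 1000$ is needed.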
When there are level shifts in the volatility process, an integrated GARCH usually describes the data better than the standard GARCH. Shifts in the volatility process may stem, for example, from price controls. However, the unconditional variance is undefined in this model, which is theoretically difficult to conceive. The fractionally integrated GARCH that results under values $0 < d < 1$ gives the volatility process hyperbolic memory, so that volatility shifts gradually. Such long-memory volatility features have been observed widely in both agricultural commodities (Chang et al., 2012) and general inflationary shocks (Baillie et al., 2002). Finally, the log likelihood requires specifying the remaining parameters in $\vartheta$. The model is estimated using the Generalized Error Distribution:

$$G(z; \vartheta) = \frac{\nu\, e^{-0.5\left|\frac{z-\mu}{\lambda}\right|^{\nu}}}{2^{1+\nu^{-1}}\,\lambda\,\Gamma(\nu^{-1})}, \tag{B14}$$

with $\vartheta = (\mu, \lambda, \nu)$ as the parameter vector that defines location, scale, and shape. The distribution is symmetric and unimodal, so the location parameter defines the mode, median, and mean of the distribution. The distribution generalizes the Normal distribution, which it equals when $\nu = 2$, but also allows for higher or lower kurtosis. As $\nu$ decreases below 2, the distribution becomes more peaked with fatter tails; when $\nu = 1$, it is the Laplace distribution. As $\nu$ increases above 2, the distribution flattens, tending to the Uniform distribution as $\nu \to \infty$. The conditional volatility estimates can be used to calculate the Expected Shortfall by integrating the Value-at-Risk over the tail:

$$ES_\alpha(X) = -\frac{1}{\alpha}\int_0^{\alpha} VaR_\gamma(X)\, d\gamma, \tag{B15}$$

with $VaR_\alpha$ being the $(1 - \alpha)$ quantile of the estimated returns distribution. Since the conditional return distribution is time-varying and fully specified by the model in B9, the time-varying Expected Shortfall can be estimated by calculating the time-varying $VaR_{\alpha,t} = \mu_t + \hat{\sigma}_{t|t-1}\, G^{-1}(\alpha)$, where $G^{-1}$ is the inverse cumulative distribution function (quantile function) of the Generalized Error Distribution.
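A standard-library sketch of the two ingredients just described: the density in equation B14 and the integral over $\gamma$ in equation B15. Because the standard library offers no GED quantile function, the CDF is tabulated on a fine grid by the trapezoid rule and inverted by linear interpolation, and the integral over $\gamma$ uses the midpoint rule. The sketch returns the signed lower-tail mean, i.e. the average of the quantiles $G^{-1}(\gamma)$ for $\gamma \in (0, \alpha)$; sign conventions for reporting ES as a loss differ, so the leading minus sign in B15 is left to the caller. The time-varying version assumes, as is common in GARCH software but not stated in the text, that $G$ is standardized to zero mean and unit variance, which requires $\lambda = \sqrt{2^{-2/\nu}\Gamma(1/\nu)/\Gamma(3/\nu)}$. All function names are hypothetical, and the grid is sized for $\nu \ge 1$; heavier tails would need a finer grid.

```python
import bisect
import math


def ged_pdf(z, mu=0.0, lam=1.0, nu=2.0):
    """Generalized Error Distribution density as written in equation B14."""
    u = abs((z - mu) / lam)
    return nu * math.exp(-0.5 * u ** nu) / (2 ** (1 + 1 / nu) * lam * math.gamma(1 / nu))


def unit_variance_scale(nu):
    """Scale lambda that gives unit variance (assumption: G is standardised)."""
    return math.sqrt(2 ** (-2 / nu) * math.gamma(1 / nu) / math.gamma(3 / nu))


def cdf_table(mu, lam, nu, n=20000):
    """Tabulate the CDF with the trapezoid rule on a grid whose ends sit where
    the density has fallen to roughly exp(-40) of its peak."""
    half = lam * 80.0 ** (1 / nu)  # solves 0.5 * u**nu = 40 for u, scaled by lam
    xs = [mu - half + 2 * half * i / n for i in range(n + 1)]
    ps = [ged_pdf(x, mu, lam, nu) for x in xs]
    cdf = [0.0]
    for i in range(1, n + 1):
        cdf.append(cdf[-1] + 0.5 * (ps[i] + ps[i - 1]) * (xs[i] - xs[i - 1]))
    return xs, [c / cdf[-1] for c in cdf]  # renormalise the truncated mass to 1


def quantile(p, xs, cdf):
    """Approximate G^{-1}(p) by linear interpolation in the tabulated CDF."""
    i = min(max(bisect.bisect_left(cdf, p), 1), len(cdf) - 1)
    c0, c1 = cdf[i - 1], cdf[i]
    w = 0.0 if c1 == c0 else (p - c0) / (c1 - c0)
    return xs[i - 1] + w * (xs[i] - xs[i - 1])


def lower_tail_mean(alpha, mu=0.0, lam=1.0, nu=2.0, m=2000):
    """(1/alpha) * integral_0^alpha G^{-1}(gamma) d gamma, by the midpoint rule."""
    xs, cdf = cdf_table(mu, lam, nu)
    return sum(quantile((j + 0.5) * alpha / m, xs, cdf) for j in range(m)) / m


def conditional_tail_mean(alpha, mu_t, sigma_t, nu):
    """Time-varying version: location-scale shift of the standardised tail mean."""
    return mu_t + sigma_t * lower_tail_mean(alpha, 0.0, unit_variance_scale(nu), nu)


# Normal case (nu = 2, lam = 1): close to -sqrt(2/pi), about -0.798
print(lower_tail_mean(0.5, 0.0, 1.0, 2.0))
```

Tabulating the CDF once and reusing it for every $\gamma$ keeps the cost to a single grid pass plus cheap binary searches, instead of solving for each quantile separately.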
The quantity $E\Delta_\alpha P_t$ is then estimated by the empirical equivalents of $ES_{\alpha,t}$. The algorithms are implemented following Ghalanos (2020). The autoregressive orders of both the ARMA and GARCH processes are kept at 1, and the moving average orders are selected using the AICc, allowing up to three lags.
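The order selection described above can be sketched with the small-sample corrected criterion $\mathrm{AICc} = -2\log\mathcal{L} + 2k + \frac{2k(k+1)}{n-k-1}$ for a model with $k$ estimated parameters fitted to $n$ observations. In practice the log-likelihoods would come from the fitted ARMA(1, q)-fiGARCH models. The values in the usage example are hypothetical placeholders, and the function names are illustrative.

```python
def aicc(loglik: float, k: int, n: int) -> float:
    """Corrected Akaike information criterion for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return -2.0 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def select_ma_order(fits: dict[int, tuple[float, int]], n: int) -> int:
    """Return the candidate MA order q with the lowest AICc.

    fits maps each candidate q (here 0..3) to (log-likelihood, number of
    estimated parameters) of the corresponding fitted model.
    """
    return min(fits, key=lambda q: aicc(*fits[q], n))


# Hypothetical fit results for q = 0..3 on n = 120 monthly observations.
fits = {0: (-200.0, 5), 1: (-195.0, 6), 2: (-194.5, 7), 3: (-194.4, 8)}
print({q: round(aicc(ll, k, 120), 3) for q, (ll, k) in fits.items()})
print(select_ma_order(fits, 120))
```

The correction term grows quickly as $k$ approaches $n$, which penalizes extra moving average lags more heavily on the short monthly series typical of partial surveys than the plain AIC would.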