Estimating Food Price Inflation from Partial Surveys

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


Policy Research Working Paper 9886
The traditional consumer price index is often produced at an aggregate level, using data from a few highly urbanized areas. As such, it poorly describes price trends in rural or poverty-stricken areas, where large populations may reside in fragile situations. Traditional price data collection also follows a deliberate sampling and measurement process that is not well suited for monitoring during crisis situations, when price stability may deteriorate rapidly. To gain real-time insights beyond what can be formally measured by traditional methods, this paper develops a machine-learning approach for imputation of ongoing subnational price surveys. The aim is to monitor inflation at the market level, relying only on incomplete and intermittent survey data. The capabilities are highlighted using World Food Programme surveys in 25 fragile and conflict-affected countries where real-time monthly food price data are not publicly available from official sources. The results are made available as a data set that covers more than 1200 markets and 43 food types. The local statistics provide a new granular view on important inflation events, including the World Food Price Crisis of 2007-08 and the surge in global inflation following the 2020 pandemic. The paper finds that imputations often achieve accuracy similar to direct measurement of prices. The estimates may provide new opportunities to investigate local price dynamics in markets where prices are sensitive to localized shocks and traditional data are not available.
This paper is a product of the Development Data Group, Development Economics and the Fragility, Conflict and Violence Global Theme. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may be contacted at bandree@worldbank.org.
1 The significance of price inflation as a signal of deteriorating food security can be highlighted by noting that the recent crises in South Sudan and the Republic of Yemen are characterized by high inflation. Historically, the Bengal famine of 1943 followed a period of hyper-inflation, starvation in the Weimar Republic occurred alongside record hyper-inflation, the 1980 Uganda famine occurred after double- to triple-digit inflation rates, the 1992 famine in Southern Somalia was preceded by a year of triple-digit inflation, and the 1998 famine in southern Sudan was preceded by years of double-digit inflation. The use of food price indicators to track famine risks is documented more substantively by Seaman and Holt (1980); Cutler (1984); Khan (1994); Andrée et al. (2020); Wang et al. (2020).
2 Naturally, there are other critiques around inflation calculations, some as old as the methodologies themselves. Some accusations about both upward and downward biases are discussed by Reinsdorf et al. (2009). Often, conflicting views on inflation can be traced back to differences in (baskets of) goods and data sampling locations, and so for some applications improved clarity can be gained by more narrowly defining indexes, which the results of the paper may also help enable.
3 See also the recent economic update for South Sudan (World Bank, 2021). The work compares food prices subnationally and finds that increasing prices are the most significant factor driving recent food insecurity, but documents strong spatial heterogeneity in both market price dynamics and relations to food insecurity, and provides examples of markets where prices are more sensitive to localized shocks.
4 For instance, the International Financial Statistics (IFS) database of the IMF reports monthly price data at Consumer Price Index (CPI) component level, but few developing countries report food price data without substantial delays. Of the 25 countries analyzed by the paper, none presently reports real-time monthly official food price data; half had not reported food price data going back 12 months. Annual statistics are similarly far from complete. Of all countries in the World Bank's WDI, only 60% (40% of the analyzed countries) had an inflation figure for the most recent 2019-2020 period. The IFS data reported similarly on only half of the countries. This was last checked on August 31, 2021; WDI data identifier FP.CPI.TOTL.ZG, and IFS data identifier PCPI PC PP PT.

in subnational price survey systems. However, the capacity to monitor inflation continues to be severely limited by challenges related to data gathering that include such issues as lack of resources, loss of access to markets, or disturbances in ground operations during crises and conflict outbreaks. Despite tremendous effort, missing data remains pervasive, which poses problems when one wants to compute real-time inflation measured over multiple price series. Whenever a single price point is absent, the required index cannot be computed without making assumptions about missing data.
To overcome some of the shortcomings associated with traditional systems, this paper develops an approach for real-time imputation of survey data drawing on multiple imputation and ensemble learning ideas. The imputation is generated as an ensemble prediction from multiple completed price trajectories. Each price trajectory is the result of a stochastic equation chain of machine learning models that leverage correlations between prices of individual food items and similarities between markets to predict missing local price quotes. The idea is similar to a bootstrap aggregation (bag) of base learners (Breiman, 1996), only now randomization in base learners is not achieved by random sampling but from a stochastic simulation of chained model predictions. By augmenting incomplete and intermittent survey data with reliable predictions, subnational food price trends can be monitored continuously.
The paper highlights the new price monitoring capabilities using surveys from the World Food Programme (WFP) gathered in 25 fragile and conflict-affected countries. The final estimates of food price inflation documented in this paper are shown to capture important local inflation events related to conflict, as well as known food crises including the World Food Price Crisis of 2007-08 documented by Baffes et al. (2008) and Conceição and Mendoza (2009), and the surge in inflation that followed the 2020 pandemic and subsequent expansion in the global monetary base. The results include inflation estimates for a large number of countries where these data have recently not been available, and the analysis of the results documents new insights into different characteristics of recent and past inflation events.
The granular high-frequency results provide an alternative price estimate that is particularly relevant in data-poor, lower-income regions, where the capacity to maintain in-depth price monitoring programs that rely on traditional CPI methods is often limited. The focus on low-income countries is in contrast with past work that, instead, has had a stronger focus on data-rich and higher-income areas. For example, past related work has developed forecasting methods for future CPI (Joutz, 1997; Gavin and Mandal, 2002), future commodity spot and futures prices at global markets (Ahumada and Cornejo, 2016; Ouyang et al., 2019), or methods for continuously updating current expectations of future country-level CPI inflation using information from alternative sources (Modugno, 2011; Seabold and Coppola, 2015). Finally, the methods proposed by the paper could also be applied to enhance other data gathering programs, improve their cost-effectiveness by replacing a subset of expensive surveys with targeted predictions, and advance broader economic monitoring in data-poor regions.
The remainder of the paper has been structured as follows. The next section provides a brief overview of the type of price surveys that can be used to deploy the new monitoring capabilities and discusses some basic challenges encountered when trying to read inflation from the raw data. Section II details the imputation strategy and section III presents results. For those interested in analyzing the subnational price and inflation estimates further, all results for the 25 countries analyzed are available to be interactively explored. 5 Section IV provides concluding remarks and makes several recommendations for future data gathering.

I. Food Price Survey Data
Subnational food prices have been surveyed in many countries for years by humanitarians to inform their country operations. Well-known databases are those from the WFP, FEWS NET and the Food and Agriculture Organization (FAO). 6 The paper focuses on raw monthly data from the WFP, but parts of the discussion, and particularly the methods developed here, could apply to similar data sets. 7 The paper gathered all end-of-August data available from the WFP Vulnerability Analysis and Mapping (VAM) unit as of September 21, 2021. The WFP data reports prices as measured in different market locations throughout each country. Price quotes are supplied by WFP country and regional offices, and local partners including the FAO. In total, the database covers over 2,000 markets across 99 countries, and reports monthly data on the prices of goods that vary by country and market. Price monitoring dates back to the early or mid 2000s in most countries, while in a few countries monitoring began as early as the 1990s.
The number of markets varies strongly by country, ranging from 1 to well over 100. The methods developed by the paper are particularly interesting when there are multiple markets, as otherwise official CPI data may provide sufficient, if not better, insight. The goods for which prices are collected, and the manner in which these prices are measured, in turn vary widely across markets and may change over time. For example, prices of some goods are collected at the retail level in some markets, while for others they are collected at the wholesale level and include discounts typical for large purchases. Generally, prices are measured in nominal local currency per unit of commodity (e.g. Shillings per 10 kg of maize), but in some countries USD quotes on selected goods may exist alongside local currency quotes. As a result, the full data set is highly heterogeneous and challenging to work with. For example, while it contains common staples like rice, sorghum, and maize that are monitored in many countries, there are also items specific to only one country.
The focus of the paper is purely on monitoring food price inflation, in a reasonably cross-comparable manner. 8 The strategy has been to extract from the very large set of price data a stable set of commodity prices that are defined as homogeneously as possible across countries, and that are as widely available as possible across markets, while having the best coverage over time. The period of analysis starts in January 2007, or the next first date at which data was available. After carefully examining the prevalence of price data across commodities, markets, and time for each country, 43 foods were identified for which price data are reasonably abundant across multiple markets in at least one country. Tables A2 and A3 list the selected food items by country. The foods are either staples, agricultural produce, or dairy products. 9 Aside from non-foods, the selection excludes only fish and meat products due to the very high dietary heterogeneity in these foods. The resulting country-specific baskets thus consist mostly of staples, often similar in nature to the type of foods traded at global markets. For example, maize, sorghum, millet, wheat, and vegetable oil, to name a few common food items, are also tracked in the World Bank Commodities Price Data (The Pink Sheet) that is used to construct the World Bank Food Price Index used to track international food price developments. In most cases, the different food items can substitute for one another to certain degrees, or may act as complements, so that the prices of most foods will be strongly associated with the price developments in others. For example, in Afghanistan there are only four food items: bread, rice, wheat and wheat flour. These prices are likely strongly associated. First, rice and wheat are both cereals and likely to move together in price, as either will replace the other as a source of carbohydrates to a certain degree if their price ratios diverge too strongly.
The price of wheat is likely a good predictor of the price of wheat flour, which in turn is used to make bread. Thus, a regression chain that cycles over the individual food items can make good imputations for all products as soon as the price of one of these goods is available. When more food items are observed, the chances increase that for each item with a missing price quote, at least one other item can be found that carries very strong predictive power over its price. 10
Table A1 summarizes the raw data availability in each country. The table shows that the data of the analyzed countries on average include 7 food items measured across 27 markets, with around half of the survey data missing. The number of markets is generally a strong improvement in geographic detail when compared to traditional price indicators. For example, World Bank (2021) analyzes official CPI at the subnational component level in South Sudan. The analysis highlights the limits of relying purely on primary data collection, as the indexes are only compiled for the three major urban areas and not for any rural areas. This is despite an estimated 80% of the country's population living in rural areas, and the fact that only a fraction of the remaining urban population buys goods at these three markets. Many urban households rely on markets that are less connected to international markets than primary markets in the capital city.
In total, the application uses data from 676 different markets. There are an additional 547 markets without observations, which are markets where WFP tracks commodities that are not covered by the application. These locations are not modeled but are spatially interpolated separately. Table A1 provides a country breakdown of data availability for the country-product combinations analyzed by the paper. Data coverage of around 70%, as a percent of all time × location cells with data, is relatively high, while 30% can be considered relatively low. Naturally, these figures depend on the selection of markets, food items, and the period of analysis. Higher data coverage could be achieved by focusing on a shorter time period, working with fewer markets, or tracking a very narrow basket. The statistics are thus not representative of the WFP database, but represent the data selection of the paper, which seeks to balance reasonable data coverage and data availability.
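The coverage statistic described above, the share of time × location cells that hold a quote, can be illustrated with a minimal sketch. The market names and prices below are hypothetical and serve only to show the calculation; they are not taken from the WFP data.

```python
# Hypothetical monthly quotes for one food item: market -> prices (None = no quote).
quotes = {
    "market_a": [100.0, None, 104.0, 107.0, None, None],
    "market_b": [98.0, 99.0, None, 105.0, 108.0, 111.0],
    "market_c": [None, None, 101.0, None, None, 112.0],
}


def coverage(panel):
    """Share of time x location cells that hold an observed quote."""
    cells = [price for series in panel.values() for price in series]
    observed = sum(price is not None for price in cells)
    return observed / len(cells)


share = coverage(quotes)
print(f"coverage: {share:.0%}")
```

In this toy panel 10 of 18 cells are observed, which falls between the "relatively low" and "relatively high" coverage levels mentioned in the text.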
The raw price data of these selected food items remains a challenging source to deal with. Available data is regularly contaminated by outliers, which could be due to incorrect survey entries (e.g. a misplaced digit). Moreover, many local price series are incomplete and price quotes of different products may become available at different times and locations. The data availability constraints become more problematic when the interest is in tracking inflation, which requires price quotes to be matched with historical quotes using a fixed time interval. For any given country, if the data selection was made such that the data coverage was 100%, the selection would collapse to zero cases unless the focus would be on an extremely narrow selection of data points that introduces strong sampling biases. Such a selection can also only be made ex-post and so it is only useful for historical analysis of prices and not for real-time monitoring.
10 There may also be nonlinear patterns that carry important information about unobserved prices. First, a change in price ratio between two prices may be an indicator that a certain food of interest will likely trade against an elevated price ratio to some other food. Second, governments may use price controls and so certain prices may remain fixed for long periods. This can for example be observed in the bread prices in Afghanistan (Andrée, 2021). These fixed prices regularly break once input prices have risen too sharply, and such level shifts in the data can signal that all price levels must seek new price equilibria. In that regard, there may also be time-varying relations between various prices that, for example, change depending on key level shifts in the prices of certain foods.
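To make the matching requirement concrete, the following minimal sketch computes change rates over a fixed interval and returns no value whenever either the current or the historical quote is absent. The function name `change_rates`, the lag of three periods, and the prices are illustrative assumptions, not part of the paper's code.

```python
def change_rates(prices, lag):
    """Change rate against the quote `lag` periods earlier.

    Returns None wherever the current or the matched historical quote is missing.
    """
    rates = []
    for t, price in enumerate(prices):
        base = prices[t - lag] if t >= lag else None
        if price is None or base is None:
            rates.append(None)
        else:
            rates.append(price / base - 1.0)
    return rates


# Hypothetical series with two missing quotes (None).
prices = [100.0, None, 104.0, 110.0, 112.0, None, 120.0]
rates = change_rates(prices, lag=3)
computable = [t for t, r in enumerate(rates) if r is not None]
print(computable)
```

Two missing quotes out of seven are enough to leave only two of the four candidate periods with a computable rate, because each gap invalidates both the period in which it occurs and the period that would use it as a base.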

II. Methods
To overcome some of the shortcomings associated with the raw data, this section develops an approach for real-time imputation. The idea is to create an ensemble prediction from multiple completed market-level price trajectories, produced by simulating stochastic equation chains of predictive machine learning models that leverage correlations between prices of individual food items and similarities between markets. The completed data can then be used to construct regional or local food price indexes.
A. The missing data problem
First, table 1 visualizes the general missing data problem.
Table 1 [table body not recoverable; surviving final-row entries: $b_6$, $c_6$]
Note: Example of the missing data problem: three hypothetical vectors A, B, and C that represent price series, with elements $a_t$, $b_t$ and $c_t$ being individual price quotes indexed by time periods $t$. Blank entries represent missing observations. The challenge is to estimate change rates $\Delta P$ of the basket price vector $P = A + B + C$ that spans all $t = 1, \dots, 6$. Element $b^*_4$ is an example outlier price which needs to be removed and replaced with an estimate. Source: Example has been prepared by the author for this paper.
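The pattern of observed and missing quotes in this example can be laid out explicitly. The sketch below is an assumption pieced together from the entries that the surrounding discussion refers to (which quotes are carried forward, imputed, or flagged as an outlier); it is not a copy of Table 1.

```python
# Assumed layout of the three series for t = 1, ..., 6 (True = a quote is recorded).
observed = {
    "A": [True, False, True, True, False, False],  # a_2, a_5, a_6 missing
    "B": [True, True, False, True, False, True],   # b_3, b_5 missing; b_4 is the outlier
    "C": [False, True, True, False, False, True],  # c_1, c_4, c_5 missing
}

# Number of blank entries across the three series.
n_missing = sum(not flag for flags in observed.values() for flag in flags)

# Periods in which all three quotes are recorded, so that the basket price
# P = A + B + C could be computed (or a regression row matched) directly.
complete_periods = [
    t + 1 for t in range(6) if all(observed[name][t] for name in observed)
]

print(n_missing, complete_periods)
```

Under this assumed layout no period has all three quotes recorded at once, so neither the basket price nor a contemporaneous regression can be computed from the raw data alone.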
The literature has put forward many solutions for time series interpolation and extrapolation. It is useful to discuss briefly why some standard tools cannot reliably be used. Simple data gaps in univariate time series can often be linearly interpolated or solved with last-observation-carried-forward imputation. Both approaches can be adequate if just one observation is missing in a sequence. When several values are missing in a row, the results might rapidly become unrealistic. For example, $a_4$ can only be carried forward since $a_5$ and $a_6$ are missing. This introduces a lag with which price increases are observed. When prices are rising, a last-observation-carried-forward imputation introduces a downward bias in the price average; for example, using this method we would approximate $\hat{p}_6 \approx a_4 + b_6 + c_6 \ll a_6 + b_6 + c_6$ when $a_6 \gg a_4$. As shall become clear from the results, price levels in low-income countries can move fast in either direction so that old price data quickly loses its relevance. 11
Kalman filtering is a popular method that is known to produce results that are optimal in common settings. These methods are discussed at length by Durbin and Koopman (2013). The state of the methodologies at present, however, is such that the current application brings together too many issues to derive a state space specification that generalizes across all data situations. Essentially, a good approach may be specified for narrow studies of prices within any given country individually, but likely not across all prices and all countries. 12
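The downward bias of last-observation-carried-forward imputation under rising prices can be checked with a few lines of arithmetic. The prices below are hypothetical, and `locf` is an illustrative helper rather than a function from the paper's code.

```python
def locf(series):
    """Last-observation-carried-forward: fill each gap with the latest earlier quote."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical rising prices for item A; the quotes at t = 5 and t = 6 are withheld.
true_a = [10.0, 11.0, 12.0, 13.0, 16.0, 20.0]
seen_a = [10.0, 11.0, 12.0, 13.0, None, None]
b6, c6 = 8.0, 5.0  # observed quotes for B and C at t = 6

p6_true = true_a[5] + b6 + c6        # basket price using the actual a_6
p6_locf = locf(seen_a)[5] + b6 + c6  # basket price using a_4 carried forward
bias = p6_locf - p6_true

print(p6_true, p6_locf, bias)
```

The carried-forward basket price understates the true basket price by exactly the amount that item A rose after its last observed quote, which is the lag effect described in the text.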

B. The predictive imputation framework
The literature on missing data often distinguishes between different mechanisms of missing data: MCAR (missing completely at random), MAR (missing at random), and MNAR (missing not at random). This taxonomy, introduced by Rubin (1976) and well explained by van Buuren (2012), becomes important when missing data occurs at the covariate side, and different corrections are needed for unbiased inference depending on the type of missing data problem. In the current application, it is required that the data is at least MAR and possibly MNAR and that the value of one food item can be accurately predicted by making reference to prices of other food items and the location in space and time. 13 As such, the focus is entirely on specifying highly accurate prediction models for the expected prices.
11 Closely related interpolation methods are plagued by similar issues. For example, linear interpolation takes information from the future to the past. The linear interpolation estimate for $c_5$ can only be made after $c_6$ has already been observed. Alternatively, moving average imputation based on past values results in biases when used for real-time monitoring. When prices are rising, a moving average imputation of $a_6$ would be based on an average of lower prices and a real-time monitoring system using moving average imputation would regularly suggest false trend reversals. These simple interpolation approaches are also highly influenced by outliers. Interpolating linearly from $b_2$ to $b_6$, through outlier $b^*_4$, replicates information into two new outliers $b^*_3$ and $b^*_5$. More flexible spline methods are known to introduce spurious cycles in time series data and may result in explosive extrapolation due to their quadratic or higher order nature, particularly in the presence of outliers.
12 Optimality in a Kullback-Leibler or Root-Mean-Square sense holds only under restrictive specification assumptions which require tedious design effort and diagnosing. Generally, Kalman filtering estimates time-varying states and uses current state estimates and observations to estimate future states. When observations are missing, the imputation will rely fully on the transition equation. This means that the dynamic properties deteriorate rapidly during periods of large data gaps. This may in part be tackled by considering multivariate state space approaches so that realistic predictions for one sequence can be made by utilizing recent information from another sequence. Fast multivariate filtering implementations that could be deployed at scale are for example explored by Koopman and Durbin (2000). Due to outliers, and general non-Gaussian behavior, non-Gaussian outlier models may need to be considered. The univariate context is for example explored by Shephard (1994); Koopman et al. (2019). The above methods all rely on restrictive linearity assumptions. Nonlinear filtering is explored by Wan and Van Der Merwe (2000). Deciding between approaches is complicated by the variability in quality and quantity of data, the patterns of missingness, and the dynamic properties of the data-generating process. The considerations are likely unique to each country-specific study. Finally, predictions with time-series methods are based on historical information and not designed to smooth data that is simultaneously missing in multiple trajectories based on contemporaneous relations between intermittently observed time series with limited overlap.
13 From the traditional missing data perspective, the MCAR situation is an easy problem as it effectively implies that simply dropping incomplete cases will not bias subsequent inference. On the other side of the spectrum are MNAR problems in which the probability of a case being missing systematically varies for reasons that are unknown. In this case, data is needed that explains why certain observations are missing. From an inference perspective, MNAR problems are not even salvaged by predictive regression-based methods that make predictions for missing data based on other covariates. These methods simply amplify existing correlations and, acting as low-pass filters, reduce variability in the data. This leads to over-confident inferences. Stochastic methods that specify the mechanism of missingness and estimate a distribution for each missing data point are then needed (see van Buuren (2012)). In the current paper, this taxonomy is less relevant as the interest is in obtaining the single most likely value for a missing data point and not in corrections in standard errors of some final regression model. From this perspective of a pure prediction problem, the MCAR problem is actually the hardest as it implies that no useful information exists to make predictions. Instead, prediction requires that the data is MAR or MNAR and that the values of a missing data point can accurately be predicted using the values of other covariates. The only word of caution is then that the predicted values are based on patterns in the observed data and underestimate total variance in the unobserved data. Section B.B4 develops methods to model the conditional heteroskedastic variance throughout the data to provide some plausible estimate of price variance at lower time frames.
A standard regression-based prediction approach would fill missing entries in A using data from (B, C), fill missing entries in B using information from (A, C), and so on. There are now two main issues that need to be solved to move forward with this idea. First, few matchable entries may exist. For example, only 8 out of 24 entries are missing in table 1, so the data is 67% complete. However, there are zero cases with elements in A, B and C that can be contemporaneously matched, so a regression cannot be fit even though price information is relatively abundant. Second, if missing entries in A are updated, then the information would certainly be relevant for the imputations in B and C. The information held in imputations in B and C, on the other hand, may lead one to find a better model to adjust the imputations in A and so on. The problems of simultaneity and few matchable entries are solved using a chained equations approach. The application here essentially adapts the Multiple Imputation using Chained Equations framework discussed extensively by Rubin (1996); van Buuren and Groothuis-Oudshoorn (2011); van Buuren (2012); Murray (2018), for predictive accuracy using ensemble methods to produce a stable prediction for missing data. In particular, the approach starts by estimating the following regression function:

$$(a_1, a_3, a_4) = f\big((b_1, \hat{b}_3, \hat{b}^*_4), (\hat{c}_1, c_3, \hat{c}_4)\big). \tag{1}$$

In this regression, $\hat{b}^*_4$ is an estimate that has replaced outlier $b^*_4$ and $\hat{b}_3$, $\hat{c}_1$, $\hat{c}_4$ are previously generated imputations. These values need to be initialized around some initial value. A suitable initialization and outlier-replacement method will be discussed. For now, assume these elements are initialized with some plausible guesswork. After estimating the regression, the function $\hat{f}$ can be used to update entries $t = 2, 5, 6$ in variable A by generating the predictions

$$(\hat{a}_2, \hat{a}_5, \hat{a}_6) = \hat{f}\big((b_2, \hat{b}_5, b_6), (c_2, \hat{c}_5, c_6)\big). \tag{2}$$

Next, a new regression with the elements $(b_1, b_2, b_6)$ from B on the left-hand side and matching entries from A, C on the right-hand side can be constructed. As seen in table 1, the updated values $\hat{a}_2$, $\hat{a}_6$ generated with equation 2, using patterns learned in equation 1, would now be on the right-hand side, so the covariates have been improved by the previous modeling step. This process can be repeated iteratively until all elements have been updated several times and new updates do not improve subsequent predictions any further. For each food item $d \in \{1, \dots, D\}$, in this example $D = 3$, this involves a chain of regression functions $\hat{f}_{d,i}$, where $i \in \{1, \dots, I\}$ indexes the updating iteration. The iterative sequence of regressions involved in updating the imputations is referred to as a regression chain because the predictions made by the regression fitted in the previous step feed into the inputs of the next regression model. Depending on several factors, such as the values at which the missing data values in the first regression equation are initialized, the order in which the algorithm cycles through the data, and random elements of modeling (for example, taking bootstrap samples or using stochastic methods to generate synthetic training data), the result after $I$ iterations will be different each time. Hence, the process is repeated $M$ times, thus simulating a function space indexed by $\hat{f}_{d,i,m}$. Because the properties of simulated values for missing data change throughout the chain, the chained equations method makes it possible to find various prediction models beyond what could be estimated with observed data alone. More details are provided in appendix B.B1.
An ensemble predictor is finally constructed by averaging the $M$ prediction results from the regression model at the tail of the simulation chain, $\hat{f}_{d,I,m}$. In particular, after generating $M$ imputations, the final imputation for a price element $\hat{x}_t \in (A, B, C)$ is generated by calculating the ensemble average

$$\hat{x}_t = \frac{1}{M} \sum_{m=1}^{M} \hat{f}_{d,I,m}(\cdot),$$

where $(\cdot)$ stands for the matched entries of the other price series at time $t$. Since there are stochastic elements to the iterative algorithm, each prediction at iteration $I$ will be generated from a different prediction model. Increasing $M$ improves both the stability of the stochastic result and the accuracy of the ensemble prediction. This is similar to bootstrap aggregating, a prediction improvement technique central to the Random Forest algorithm of Breiman (2001), in which multiple random simple base learners are combined to improve stability and accuracy by canceling out random prediction errors. The key difference is that bootstrap aggregation produces multiple learners by taking random draws of the training data, while the randomization in learners at iteration $I$ results from different stochastic simulations of chained model predictions; see section B.B2 for more details on the derivation of the stochastic ensemble predictor.
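The chained updating and the final ensemble average can be sketched in a few dozen lines. This is a deliberately simplified illustration and not the paper's implementation: each item is predicted with a univariate least-squares fit on the average of the other items, rather than with an elastic net or cubist model; randomness enters only through a noisy initialization and a random cycling order; and the panel, the helper names (`ols`, `run_chain`, `ensemble_impute`), and all parameter values are hypothetical.

```python
import random


def ols(xs, ys):
    """Slope and intercept of a univariate least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # no variation in the covariate: fall back to the mean
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def run_chain(data, iterations, rng):
    """One stochastic regression chain; returns completed copies of all series."""
    names = list(data)
    periods = range(len(data[names[0]]))
    filled = {}
    for d in names:  # noisy initialization around the mean of the observed quotes
        seen = [v for v in data[d] if v is not None]
        mean = sum(seen) / len(seen)
        filled[d] = [v if v is not None else mean * rng.uniform(0.9, 1.1)
                     for v in data[d]]
    for _ in range(iterations):
        for d in rng.sample(names, len(names)):  # random cycling order
            others = [name for name in names if name != d]
            x = [sum(filled[o][t] for o in others) / len(others) for t in periods]
            rows = [t for t in periods if data[d][t] is not None]
            slope, intercept = ols([x[t] for t in rows], [data[d][t] for t in rows])
            for t in periods:
                if data[d][t] is None:  # only missing entries are updated
                    filled[d][t] = intercept + slope * x[t]
    return filled


def ensemble_impute(data, iterations=5, chains=20, seed=0):
    """Average the completed series at the tail of `chains` independent chains."""
    rng = random.Random(seed)
    runs = [run_chain(data, iterations, rng) for _ in range(chains)]
    return {d: [sum(run[d][t] for run in runs) / chains
                for t in range(len(data[d]))] for d in data}


# Hypothetical panel of three co-moving prices over 12 periods with gaps.
base = [10.0 + t for t in range(12)]
panel = {
    "A": [None if t in {1, 4, 5, 9} else base[t] for t in range(12)],
    "B": [None if t in {2, 4, 7, 11} else 2.0 * base[t] for t in range(12)],
    "C": [None if t in {0, 3, 4, 8} else base[t] + 5.0 for t in range(12)],
}
result = ensemble_impute(panel)
```

Observed quotes pass through unchanged, while every gap, including the period in which all three items are missing, receives the average of the chain-end predictions. Replacing `ols` with a more flexible learner, or adding bootstrap resampling inside `run_chain`, would bring the sketch closer to the randomization sources listed in the text.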

C. The regression specification
The methods are applied to each country individually, so the desired flexibility of the prediction models varies. Moreover, the properties of the data change throughout the simulation, so the desired flexibility of the model should change accordingly, gradually exchanging robustness for flexibility. 14 The paper considers two implementations depending on data availability in the country. When data are scarce, an elastic net model is used (implemented by Hastie et al. (2021)), which helps reduce the impact of uninformative predictors (Friedman et al., 2010). When data are abundant, a cubist regression is used following Quinlan (1992) and Witten et al. (2016). 15 This is a piece-wise linear model that combines decision trees, boosting, and neighborhood smoothing to capture smoothed versions of Random Forest-type nonlinearities (Kuhn et al., 2012). 16 The model differs from a Random Forest regression by using simple linear regressions at terminal nodes, which results in smoother transitions that are more typical for numeric data and enables short-range out-of-sample extrapolation based on local regressions. Due to their linear nature, these local extrapolations generally remain reasonably stable, in contrast to, say, neural network predictions that can rapidly turn explosive outside observed data intervals. Cubist models have done well on a variety of spatially oriented prediction problems, often reaching accuracy not far below that of deep learning methods while maintaining full model interpretability (Morellos et al., 2016; Ng et al., 2019; Sbahi et al., 2021). While the short-range extrapolation capabilities of cubist models are useful in the current setting, the most beneficial feature for this application is that the model runs much faster than its M5 cousin or other boosting or ensembling methods such as Gradient Boosting Machines and eXtreme Gradient Boosting methods (Hagenauer et al., 2019).

Two regression specifications are considered. Both regression models simply perform a spatio-temporal interpolation, leveraging temporal price trends, geographic proximity, and spatial trends, to make time-varying spatial interpolations between prices of related food items. More precisely, the linear model is given by equation 4, while the cubist model approximates the nonlinear function in equation 5. In these equations, P_A is a vector that stacks all the prices of commodity A observed at all markets × time combinations within a country. Similarly, P_{−A} is a matrix that has the prices of all other food items. In the previous example, it would simply bind columns (B, C), and the iterations would simply cycle over specifications according to the scheme P_B ∼ P_{−B} + ε_B and P_C ∼ P_{−C} + ε_C. In this example, there are just three price vectors, but table A1 shows that the application considers problems with up to 15 price predictors.

14 See for example the discussions in (Andrée et al., 2019; Andrée, 2020) on the relation between the sample size, strength of nonlinearity, and model size. Dynamic assumptions about the behavior of the process being modeled can be imposed by using non-parametric approaches whose parameterizations can arise flexibly. However, being overly flexible and over-fitting f would lead to poor predictions. To a certain degree, the ensemble prediction will counter some of the prediction errors that arise from over-fitting. For example, similar to a Random Forest algorithm, if the prediction errors of models m = 1, ..., M at step I are uncorrelated, the ensemble simply cancels them out, thereby increasing robustness when M increases. Regardless, over-fitting can be problematic. It is particularly important to avoid an overly flexible model in the first estimation step. When f is an overly flexible model, an incorrect initialization can be learned, particularly if there is a pattern of missingness that can locally be correlated to the levels in other prices. For example, if missing entries in A occur during elevated levels in B, then initializing missing entries in A at the unconditional mean and using a flexible model to parameterize f can lead to a model that simply learns to use the mean of A as a predictor within distinct regions of elevated levels in B. In such a situation, the imputations do not improve well across updating iterations. Essentially, across iterations, the model memorizes the initialization. To avoid the algorithm getting stuck, it is best to introduce a stochastic element around a reasonably correct initialization and use techniques that can control the flexibility of f across the iterations. In addition, it may be preferable to use only models for f that result in reasonably smooth transitions in nonlinearity across levels, rather than in very sharp cut-offs in local data associations. Since the entries on the covariate side are updated iteratively, there could also be more nuances fit by the function f as the algorithm iterates. This means that while the flexibility of f should be relatively low in the first step, it could be beneficial to increase it across iterations. Finally, a variable selection mechanism is also advisable as the importance of predictive features may change across iterations, see (Graham et al., 2012).

15 In total there are 186 different regression problems (number of food items summed across countries), 72 of which are solved with a linear model. Thus, 61% of problems are solved nonlinearly. Both models achieve good results. For instance, in the Republic of Yemen and the Syrian Arab Republic most of the regression problems are solved linearly, but the cross-validated accuracy results are still good. The exact rules to determine whether a linear or nonlinear model is used are codified based on human judgment and are best viewed in the source code that has been made available. It is possible to determine this by letting both models compete in a cross-validation exercise, but the runtime increases beyond what is practical. The guiding principle has been that the linear model is used when few markets are available, or long temporal gaps exist, and the results need to be extrapolated far across the data dimensions.

16 Cubist is an extension of M5 regression trees that incorporates pruning, neighborhood smoothing and boosting. Essentially, it uses a computationally efficient strategy to recursively partition the data space and fit simple piece-wise linear prediction models within each partition, whose predictions are combined using neighborhood averaging of local model predictions. The advantages over M5 are that it can produce smoother transitions across numeric outputs and has a much faster runtime, both of which are of high importance to the current application. The advantages over Random Forest are that the cubist model has linear regressions at terminal nodes and so can extrapolate slightly out of range, while Random Forests can only interpolate using medians or averages of typical values associated with ranges of the input data.
Prices on both sides of the equation are modeled in logarithms, which is omitted from the notation here. The vector β_1 captures the linearized relationships between log prices, e.g. the price ratios. The matrix G contains group dummies for market-level and administrative-level fixed effects, and the matrix S contains seasonal dummies. Finally, X is a matrix of additional covariates. These include logarithmic coordinates to capture spatial trends, and price trend features engineered from (A, B, C, ...) that capture important temporal variation. The price trend features are engineered in two steps. First, the individual market price trajectories contained in (A, B, C, ...) that have been observed with at least 95% data coverage are selected, and the up to 5% of missing price points are imputed with a seasonal Kalman filter using a Basic Structural Model. Second, a commodity-specific country price trend is constructed by Kalman interpolating all market trajectories and taking a weighted average based on data coverage. In particular, trajectories with data coverage above the 75th percentile are averaged, weighting by normalized data coverage rates. As an example, if there are three markets with data coverage above the 75th percentile that respectively have 50%, 75% and 100% data coverage, the weights are (0, 0.5, 1).
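The displayed forms of equations 4 and 5 are not legible in this version of the text. Going only by the symbol definitions given in this section, they plausibly resemble the sketch below. The coefficient names β_2, β_3 and β_4, the ordering of the terms, and the exact argument list of f are assumptions made for illustration and are not taken from the paper.

```latex
\begin{align}
P_A &= P_{-A}\,\beta_1 + G\,\beta_2 + S\,\beta_3 + X\,\beta_4 + \varepsilon_A, \tag{4} \\
P_A &= f\left(P_{-A},\, G,\, S,\, X\right) + \varepsilon_A. \tag{5}
\end{align}
```

Under this reading, the cycle over food items replaces A by B, C, and so on, in line with the scheme P_B ∼ P_{−B} + ε_B described above.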
Particularly within the matrix X there can be highly correlated predictors, while the matrix G may contain multiple identical indicators, for instance if there is only one market within an administrative unit. The following hard-coded variable selection rules are used. First, exact linear combinations are removed to avoid redundant dummies. Second, highly collinear variables with a correlation above 0.95 are removed by iteratively recalculating the correlation matrix and removing the variable with the highest overall correlation. Finally, variables with near-zero variance are removed. 17 All predictors are centered and scaled.
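The correlation and variance rules above can be illustrated with a minimal sketch in plain Python. The function names, the variance tolerance, and the use of the mean absolute correlation as the measure of "highest overall correlation" are assumptions for illustration. Unlike the order given in the text, constant columns are dropped first here so that correlations are always defined.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Plain Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def prune_predictors(columns, max_corr=0.95, min_var=1e-8):
    """columns maps a predictor name to its list of values.

    Assumption: near-zero variance columns are dropped first, then the
    correlation matrix is recomputed after every removal.
    """
    kept = {k: v for k, v in columns.items() if pvariance(v) > min_var}
    while True:
        names = list(kept)
        corr = {
            (a, b): abs(pearson(kept[a], kept[b]))
            for i, a in enumerate(names)
            for b in names[i + 1:]
        }
        if not corr or max(corr.values()) <= max_corr:
            return kept
        # Candidates are the variables involved in any offending pair.
        offenders = sorted({n for pair, c in corr.items() if c > max_corr for n in pair})

        def overall(name):
            vals = [c for pair, c in corr.items() if name in pair]
            return sum(vals) / len(vals)

        # Remove the candidate with the highest mean absolute correlation.
        del kept[max(offenders, key=overall)]
```

For example, given two nearly proportional predictors, one unrelated predictor and one constant column, the sketch keeps one of the proportional pair and the unrelated predictor.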
Note that equation 4 is estimated using an elastic net model that uses L1 and L2 penalties to shrink the predictor space, see again (Friedman et al., 2010; Hastie et al., 2021). The cubist regression specification of equation 5 is essentially an observation-specific counterpart of the linear model. As such, only the regional dummies are removed. The cubist model can partition the data and fit local regressions. Since the coordinates are supplied, the model can learn spatial fixed effects using adaptive neighborhood sizes, as well as spatial interaction effects, by partitioning by coordinates rather than relying on explicit spatial dummies, see again (Quinlan, 1992; Kuhn et al., 2012; Witten et al., 2016).

D. Validation of imputation accuracy
There are no countries in the data where the food price data are complete and the true inflation of a basket of food items is fully known. This makes validation against true data difficult. 18 Cross-validation techniques are used to adjust the flexibility and assess the predictive accuracy of the models in each iteration. The elastic net model optimizes over the standard L1 and L2 penalties and the mixing parameter. 19 The cubist model tunes over the neighborhood size used for smoothing and the number of boosting iterations, using a grid of all Neighborhood × Committees = (2, 4, 8) × (1, 25, 50, 100) combinations.
As usual, training and out-of-sample validation use mutually exclusive draws of observations. A 4-fold validation is used to limit the computational burden of the application to manageable levels. 20
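As a minimal sketch of the tuning setup described above, the snippet below enumerates the cubist grid and builds four mutually exclusive validation folds. The function names and the random fold assignment are illustrative assumptions; the paper's own implementation may assign folds differently.

```python
import random
from itertools import product

NEIGHBORHOOD = (2, 4, 8)          # neighborhood sizes used for smoothing
COMMITTEES = (1, 25, 50, 100)     # boosting iterations


def cubist_grid():
    """All Neighborhood x Committees combinations to be evaluated."""
    return list(product(NEIGHBORHOOD, COMMITTEES))


def k_fold_indices(n_obs, k=4, seed=0):
    """Split observation indices into k mutually exclusive validation folds.

    Returns a list of (train_indices, validation_indices) pairs.
    """
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        splits.append((train, folds[i]))
    return splits
```

With these settings each cubist problem evaluates 12 parameter combinations on 4 folds, that is, 48 model fits per tuning round.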

ANDRÉE DECEMBER -2021
In addition to the observed prices, the training data contains a small draw of 10% of the imputations generated from the previous regression estimates. These synthetic data points help balance prediction performance in thinner regions of the sample space. When the training data consists only of actual observations, the predictions may be biased toward purely observed value ranges. Adding synthetic cases using previous predictions makes the estimation problem slightly more representative of the missing data. The validation sample always consists only of actual data points, thus excluding imputations. 21 The model-tuning focuses on a Normalized Mean-Absolute-Error criterion. The reason for this criterion is that the MAE is less impacted by outliers than the common Root-Mean-Squared Error measure and has a stable interpretation across applications. 22 Note that an attempt will be made to replace outliers by missing data values and impute them, but there is no guarantee that all outliers are captured, so an outlier-robust prediction validation metric is a safe option.

There are M validation results for each food item, but the interest is primarily in the robustness of the final imputed price index, which combines all the predictions. The validation results are condensed for better presentation. In particular, a cross-comparable confidence score is constructed from the individual validation estimates. 23 The metric is defined as follows. First, a normalized MAE for food item d is constructed as the ratio of the MAE of the model for that food item to the MAE obtained by a simple mean prediction. Since each MAE estimate represents an average percentage error rate due to the log nature of the price data, the individual MAE values are averaged geometrically (equation 6). In equation 6, MAE_{m,d} is a cross-validation estimate of MAE using the standard formula for MAE, and MAE_{d|µ} is the MAE calculated using observed data and the unconditional mean. Since the true data range is not observed, and averaging is known to improve ensemble performance, the quantity from equation 6 is likely a conservative estimate. 24 The focus next is on the quantity 1 − NMAE, which is the share of the total absolute variation in the demeaned data explained by the imputation model. The D values are averaged as in equation 7, where Z is the Fisher Z-transformation and Z^{−1} its inverse, and w are the relative weights of the price component in the final price index. The final score from equation 7 roughly has the interpretation of the average R^2 of the food price index, using a robust calculation of out-of-sample errors. 25

The units of measurement are harmonized so that each food item has equal weight in the index once scaled to a comparable unit of measurement. Specifically, the food item specific predictions are aggregated into food price indexes by summing the prices of all foods in the basket, after bringing the prices to comparable units of measurement (1 kg for foods, 1 liter for fluids and vegetable oils, 1 dozen for single packaged eggs, 1 unit for foods that come in other units of measurement, such as some fruits that come in bundles). As a simple example, if the index consists of 1 kg of sorghum and 100 kg of maize, the latter price is multiplied by 0.01 and an equally weighted index is constructed with the result. 26 Note that from a nutritional perspective, some foods should be weighted more strongly to model a food price index that reflects preferable consumption ratios. 27 The simple approach here is just to ensure that an item measured in larger quantities does not dominate the final index simply because it has a larger price range, or dominate the validation result simply because the unit of measurement is small. This is not ideal, but sensible, given that expenditure shares for specific food items used to construct traditional CPI are not widely available in the countries analyzed by the paper. Recall that the estimation exercise of the paper is in fact motivated by the unavailability of traditional data.

21 Note that throughout the simulation, the quality of synthetic training data improves. Such techniques are also being explored elsewhere. For example, Lee and Braithwaite (2020) use a regression chain of image recognition models and feature-based models that update one another's training data, which improved their learning results. In the current paper, f is parameterized using a piece-wise linear approach, and adding bootstrap draws from previous imputations helped improve validation performance on actual data by allowing the models to train on denser representative example data near the edges of the sample space. This helps stabilize extrapolations outside of the sample space.

22 With simple arithmetic one can show that MAE ≤ RMSE ≤ √n MAE, which reveals that the upper limit of RMSE varies with sample size and has different interpretations across applications.

23 M × D × I cross-validation metrics can be computed in each country application. The result at iteration I is used to diagnose the quality of imputations, but the full validation sequence provides a diagnostic to determine a relevant value for I. In particular, throughout the simulation, the cross-validation metrics should improve. The number of iterations for stochastic imputation is often determined by observing whether the means and variances of the predictions start changing in a purely random fashion. In the current case, the stopping criterion that the cross-validation performance does not improve further is a slightly less vague criterion. The multiple imputation implementation of van Buuren and Groothuis-Oudshoorn (2011) has default values of I = M = 5. In the current application, diagnosing the performance indicated that I = 8 was usually sufficient. M = 5 was kept due to the high computational load of the application.

24 As such, the estimated NMAE_d is likely larger than the true value since the denominator is underestimated in equation 6. There are further related challenges to the validation due to the fact that the general objective of imputing unobserved data can only be validated using observations. Some further rationale for the divergence metric and how it relates to the objective is provided in section B.B3.

25 Note that (1 − NMAE) ≥ (1 − NRMSE), where the second value equals the out-of-sample calculation of the R^2. Keep in mind, however, that the data for validation may still contain outliers, see again section B.B3, so that (1 − NMAE) > (1 − NRMSE). Regardless, when NMAE is low, such as in nearly all countries as the results will show and particularly in the outlier case, the value (1 − NMAE) approaches the out-of-sample R^2 given by (1 − NRMSE). Moreover, when NMAE is small, model fit is good and average bias must be small, which means that 1 − NMAE approaches the in-sample R^2, which is the squared Pearson correlation between the predictions and the observations. It is well known that the arithmetic mean of multiple correlation coefficients underestimates the total correlation and that the distribution of the correlation coefficient must first be normalized before averaging to obtain a less biased estimate, particularly when the number of coefficients is small. See (Corey et al., 1998) on this matter and the use of the Fisher Z-transformation in this context.

26 The standardization in the units of measurement is automated and works by parsing the text in the WFP database that describes the food items and inferring multipliers that standardize the prices to comparable units. In particular, each text string is parsed as amount × unit, and a multiplier for food item d is calculated as w_d = (1/amount) × c, where c is a conversion factor from the unit to either kilograms or liters. For example, the text string "10 Pounds of Rice" would result in a multiplier of w_rice = (1/10) × (1/0.453592) = 0.2204624, while "1 kg of wheat" would simply result in a weight of w_wheat = 1. This strategy is usually (footnote continued in section E)
Tables A2 and A3 list for all countries the food-specific Index Weights used to scale price levels to correspond to prices for comparable units of measurement.
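The multiplier rule described in footnote 26 can be sketched as follows. The regular expression, the function name and the contents of the conversion table are hypothetical; only the pound-to-kilogram factor 0.453592 and the two worked examples come from the text, and the actual WFP text strings may be formatted differently.

```python
import re

# Kilograms (or liters) per one quoted unit. Only the "pounds" entry is taken
# from the text; the remaining entries are illustrative assumptions.
BASE_PER_UNIT = {"kg": 1.0, "l": 1.0, "g": 0.001, "pounds": 0.453592}


def unit_multiplier(description):
    """Multiplier that rescales a quoted price to 1 kg or 1 liter.

    Expects strings of the hypothetical form '<amount> <unit> of <item>'.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)", description)
    if match is None:
        raise ValueError("cannot parse amount and unit from: " + description)
    amount = float(match.group(1))
    unit = match.group(2).lower()
    return 1.0 / (amount * BASE_PER_UNIT[unit])


print(round(unit_multiplier("10 Pounds of Rice"), 7))   # 0.2204624
print(unit_multiplier("1 kg of wheat"))                 # 1.0
```

A price quoted for 100 kg is scaled by 0.01 under the same rule, which matches the sorghum and maize example above.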

E. The initialization
A two-step approach is taken to initialize the regression chain. First, univariate imputation methods are used to pre-impute the starting values of missing entries. 28 The iterative modeling then initializes randomly around the pre-imputed values by adding an additive disturbance term to the initialization drawn from a uniform distribution centered around 0, scaled to 10% of the range in the data. The variance of the prices is stabilized with logarithms, so this disturbance term impacts the initialization roughly evenly across levels.
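The random initialization around the pre-imputed values can be sketched as below. The function name is hypothetical, and it is an assumption that "10% of the range" refers to the full width of the uniform draw (so the half-width is 5% of the observed range) and that the range is taken over observed log prices only.

```python
import random


def jitter_initialization(log_prices, pre_imputed, scale=0.10, seed=0):
    """Return starting values for the regression chain.

    log_prices  : list of floats, with None marking a missing entry
    pre_imputed : list of floats from the univariate pre-imputation step
    scale       : width of the uniform disturbance as a share of the range
    """
    rng = random.Random(seed)
    observed = [p for p in log_prices if p is not None]
    half_width = 0.5 * scale * (max(observed) - min(observed))
    start = []
    for price, guess in zip(log_prices, pre_imputed):
        if price is None:
            # Missing entry: pre-imputed value plus a zero-centered disturbance.
            start.append(guess + rng.uniform(-half_width, half_width))
        else:
            # Observed entries are never perturbed.
            start.append(price)
    return start
```

Because the prices are in logarithms, a disturbance of fixed width corresponds to a roughly constant relative perturbation across price levels.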
26 (continued) reasonable and avoids that a food commodity measured in bags of 100 kg dominates the food basket price, but it is obviously not smart enough to deal with all situations effectively. A specific rule is specified to deal with eggs, which may be described in some countries as 12 units, while in other countries eggs are measured as 1 dozen. In the case of '12 units', a conversion to 1 dozen is made. All the price estimates from the paper are provided for analysis, and the index weights are provided with the index estimates. If it is suspected that conversion factors may have an impact on the final inflation rate estimates, researchers are encouraged to construct their own indexes relevant to the studies at hand using the individual food price estimates.

27 For example, 1 kg of salt and 1 kg of sorghum are given equal weight in the simple index generated here, whereas the latter may carry more dietary importance. The reason for using equal weights is that it takes expert judgment to define weights based on dietary needs, which is not easily automated given the heterogeneity of the data. In an ideal world, future price survey programs would be accompanied by surveys on household expenditure shares or food-specific trading volumes.

28 First, minor gaps of up to 3 consecutive months are filled with a univariate seasonal Kalman smoother using a Basic Structural Model. Then food-market combinations with at least 67% data coverage are completed using the same method. Next, all market trajectories completed in this way are averaged to construct a preliminary country mean. All other market-level series with > 50% data coverage are then imputed using predictions from a Generalized Additive Model with observed price points as the dependent variable and the preliminary country mean as the predictor. The parameters are estimated by Restricted Maximum Likelihood. The country average is then recalculated, and if there are any further markets with missing data but > 20% data coverage, the same method is applied again. Any market which has < 20% data coverage is pre-imputed using an inverse distance weighting interpolation based on the other imputations. The spatial interpolation uses a neighborhood size cutoff which is selected using cross-validation.

It is important to clean the data from outliers before calculating the pre-imputations and applying the imputation algorithm. In particular, outliers can lead to explosive predictions or generally bad model fit depending on the types of learning methods used. Since the outliers need to be removed from incomplete time series, standard time series methods such as those relying on non-parametric smoothers with adaptive bandwidths cannot be applied. The approach here first applies a linear interpolation to the raw data, then calculates returns and detects outliers in the returns sequence using the approach by Boudt et al. (2008). The outliers are then turned into missing data points which will be replaced by imputation. In most cases, severe outliers are simply measurement errors. Minor outliers that simply reflect the natural extreme price variation of illiquid markets should already be reasonably stabilized by the log transformations.
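The outlier screening steps (interpolate, compute returns, flag, set to missing) can be illustrated with the sketch below. It does not implement the Boudt et al. (2008) procedure. As a simplified stand-in, it flags an observation when the return into it and the return out of it are both extreme, in robust z-score terms based on the median absolute deviation, and of opposite sign. The threshold and function names are assumptions, and level shifts or an extreme final observation are not flagged by this rule.

```python
from statistics import median


def linear_interpolate(series):
    """Fill interior None gaps linearly; leading and trailing gaps stay None."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
    return out


def screen_outliers(log_prices, threshold=5.0):
    """Return a copy of log_prices with isolated spikes replaced by None."""
    filled = linear_interpolate(log_prices)
    idx = [i for i in range(1, len(filled))
           if filled[i] is not None and filled[i - 1] is not None]
    returns = {i: filled[i] - filled[i - 1] for i in idx}
    if not returns:
        return list(log_prices)
    centre = median(returns.values())
    mad = median(abs(r - centre) for r in returns.values())
    if mad == 0:
        return list(log_prices)
    # Robust z-scores; 1.4826 rescales the MAD to a standard deviation
    # under normality.
    z = {i: (r - centre) / (1.4826 * mad) for i, r in returns.items()}
    cleaned = list(log_prices)
    for i in idx:
        nxt = z.get(i + 1)
        spike = (nxt is not None and abs(z[i]) > threshold
                 and abs(nxt) > threshold and z[i] * nxt < 0)
        if spike and cleaned[i] is not None:
            cleaned[i] = None  # to be replaced later by imputation
    return cleaned
```

Requiring a reversal avoids discarding the valid observation that follows a spike, which a rule based on single returns would also flag.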

III. Results
The methods are applied country-by-country to 25 FCS countries. The number of results is large; all commodity-specific validation and prediction results are available in the online repository of this paper. Since for multiple countries in the sample no official food CPI is available for the periods of analysis, this section discusses some of the highlights.
The cross-validation results of the final price estimates are summarized in the last column of table A4 and highlight that, even on a monthly basis, the market-level price predictions are fairly accurate. The Fisher-average CV-score across countries is 0.85. The lowest result, at 0.69, is still a reasonably well-informed prediction given the natural volatility in prices, which makes direct measurement similarly prone to error.
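To make the construction of such a score concrete, the sketch below follows the verbal description in section D: a geometric mean of the model-specific MAE values divided by the MAE of a mean prediction, followed by a weighted Fisher Z average of 1 − NMAE across food items. The exact form of equations 6 and 7 is not reproduced here, so applying the transformation directly to 1 − NMAE, the weight normalization and the function names are all assumptions.

```python
from math import atanh, exp, log, tanh


def geometric_mean(values):
    return exp(sum(log(v) for v in values) / len(values))


def normalized_mae(model_maes, mae_mean_prediction):
    """NMAE for one food item: geometric mean of the M cross-validated MAE
    values, relative to the MAE of predicting the unconditional mean."""
    return geometric_mean(model_maes) / mae_mean_prediction


def confidence_score(nmae_by_item, weights):
    """Weighted Fisher Z average of (1 - NMAE) across food items.

    Requires every 1 - NMAE to lie strictly between -1 and 1.
    """
    total = sum(weights[d] for d in nmae_by_item)
    z_bar = sum(weights[d] / total * atanh(1.0 - nmae_by_item[d])
                for d in nmae_by_item)
    return tanh(z_bar)
```

With two equally weighted items scoring 0.9 and 0.7, the Fisher average lies slightly above the arithmetic mean of 0.8, which reflects the correction for the downward bias of simple averaging discussed in the footnotes of section D.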
To better understand the determinants of prediction reliability, a simple linear regression was performed, with results in table A5. This indicated that the share and relative dimension of available data (number of foods, number of markets) are not linear predictors of prediction accuracy; instead, the volatility and strength of inflation determine how well the statistical methods impute missing data. Jointly, average inflation and volatility explain almost half of the variation in cross-validation performance. The signs of the coefficients suggest that when volatility increases, prediction performance deteriorates and data collection becomes more important. When the inflation trend is strong relative to volatility, the price trend is clearer in the data and imputation accuracy increases, suggesting that robust inflation tracking is possible even when little ground truth data is available. Purely based on signs, there is some indication that an increased number of food items is beneficial and an increased number of markets makes the prediction task more difficult. This is in line with the idea that a higher number of food items increases the chances that at least one strongly related food item can be leveraged for prediction, while an increased number of markets increases spatial heterogeneity in the data, thereby increasing the difficulty of accurately predicting all local price levels.
The subnational results are aggregated into national food price indexes by calculating a simple average price giving equal weight to each market within the country. The imputation focuses on the 676 markets where prices are observed, and spatially interpolates the results onto the full set of markets using Shepard's inverse distance weighting algorithm with a cross-validated number of neighbors. The country average price indexes are thus based on the full set of markets, but are essentially derived as a weighted average of the imputed markets with more weight given to markets that are more densely surrounded by other markets where WFP maintains or has maintained price monitoring operations. Table A4 also summarizes the monthly changes in the price indexes over the entire study period using three simple metrics. First, the monthly returns in the price index are annualized. This gives the average rate at which food prices have inflated on an annual basis across all available time periods. The second column provides a maximum draw-down figure, which is the largest negative percentage price change that occurred from top to bottom over the full time period. Apart from price changes, economists are frequently also interested in price uncertainty. The third column in table A4 contains the annualized standard deviation in monthly price returns, which is the average price volatility over the entire period of study. For example, in Afghanistan prices have increased at an average annual rate of 5.76%, the largest deflationary event was a measured price move of −43.86% from top to bottom, and the average standard deviation with which price changes fluctuated was 8.69%. The simple aggregation of results highlights that food price inflation is problematic in many FCS countries. Inflation-targeting countries often aim for a positive inflation rate just below 2%. Only 6 out of the 25 analyzed countries meet this criterion for food when measured over the entire study period.
Moreover, in 7 countries, active price increases in food have been so strong that they outweighed price uncertainty.
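The three summary metrics can be sketched for a monthly index series as follows. The annualization conventions used here (compounding the mean monthly return over 12 months, and scaling the monthly standard deviation by the square root of 12) are common choices but are assumptions; the paper does not state which convention underlies table A4, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_index(index_levels):
    """Return (annualized return, maximum draw-down, annualized volatility)
    for a list of monthly price index levels."""
    returns = [b / a - 1.0 for a, b in zip(index_levels, index_levels[1:])]
    annual_return = (1.0 + mean(returns)) ** 12 - 1.0
    annual_vol = stdev(returns) * sqrt(12)
    # Maximum draw-down: largest percentage fall from a running peak.
    peak, max_drawdown = index_levels[0], 0.0
    for level in index_levels:
        peak = max(peak, level)
        max_drawdown = min(max_drawdown, level / peak - 1.0)
    return annual_return, max_drawdown, annual_vol
```

For an index that moves 100, 110, 99, 120, the maximum draw-down is the fall from 110 to 99, or −10%.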
Historical famines have frequently been associated with high price increases that led to speculative purchasing of food as an investment (Gráda, 2007). Investors typically aim for investments in high-return low-volatility assets and often track the performance of investments by focusing on risk-adjusted returns (Markowitz, 1991). A commonly applied metric is the Sharpe ratio, which is the annualized return divided by the annualized standard deviation of returns. The ratio is a z-score type of metric that describes the excess return received for the extra volatility endured when holding an asset. In general, for an asset to be investment-grade, one likes the ratio to be above 1 so that the price increase outweighs the price uncertainty. 29 The Sharpe ratios can be obtained from table A4 by dividing the first column by the third. It is important to note that this ratio is constructed in hindsight and does not fully characterize all the factors involved with an investment decision, nor does it take into consideration the scarcity of information and the complexity of the market environment during inflation events. Nevertheless, it provides a simple and well-understood way to standardize price changes in a way that allows comparing the significance of price changes across markets associated with different typical volatilities. For example, a 4% inflation surge is more significant if the typical price change over that time period is 1% as opposed to, say, 10%. Contrasting these ratios highlights that some inflation events have been more significant than others. For example, in Afghanistan, the average ratio between price increase and price uncertainty in local currency as measured over all time steps equals 0.66. At this value, inflation is positive but price uncertainty outweighs price increase, so there is no strong incentive to hold on to food to hedge against currency depreciation risks.
This contrasts with Sudan, South Sudan and the Syrian Arab Republic, where the increase in food prices has not only been extremely high, but has also outweighed food price uncertainty by roughly a factor of 2. Alternatively, food prices in Afghanistan rose by 14.09% annualized in 2020, which seems low when compared to some other high inflation events, but is highly significant when compared to the relatively low price volatility over that same time period (inflation was 2.67 times volatility).
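As a worked check of the ratio as defined in the text (annualized return divided by annualized standard deviation, with no risk-free rate subtracted), the Afghanistan figures quoted from table A4 reproduce the stated value of 0.66. The function name is illustrative.

```python
def sharpe_ratio(annual_return, annual_volatility):
    """Annualized return divided by annualized standard deviation of returns."""
    return annual_return / annual_volatility


# Afghanistan, full study period: 5.76% average annual inflation and
# 8.69% annualized volatility.
print(round(sharpe_ratio(5.76, 8.69), 2))  # 0.66

# A 4% surge measured against 1% versus 10% typical price variation.
print(sharpe_ratio(4.0, 1.0), sharpe_ratio(4.0, 10.0))  # 4.0 0.4
```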
Inflation-targeting countries also aim for prices to be stable over short time frames. Tables A6 and A7 condense the high frequency results by presenting annualized figures. An FCS-wide average is included at the bottom. In total, price estimates for 1223 markets and 43 foods underlie these figures. Of all year × country combinations for which estimates are produced (361), only 48 annualized inflation rate estimates fall within the 0-2% targeting range, while 51 estimates cross a critical inflation threshold of 50%. The FCS-wide results highlight the elevated inflation levels during the period associated with the World Food Price Crisis of 2007-08 and the aftermath of the pandemic. The annualized monthly price change in 2008 peaks at a geometric average of 19.52% across the full set of countries, while remaining mostly within a modest range of several percentage points over the 2009 to 2018 period. The estimated average inflation rate then spikes again at 27.46% in the year of the pandemic, while the preliminary 2021 inflation estimate of 22.65% remains well above levels of the World Food Price Crisis of 2007-08. The results in table A7 highlight that price uncertainty has remained more constant. A striking result is that average price uncertainty was higher during the World Food Price Crisis of 2007-08, peaking at an annualized rate of 13.92%, while hitting 10.52% during the pandemic shock and even dropping to 8.1% in the preliminary 2021 estimate. This means that the recent high inflation event was not only stronger in terms of price increase, but also more significant when compared to natural price volatilities in the two periods. From this perspective it is again interesting to transform the estimates to Sharpe ratios. Both the World Food Price Crisis of 2007-08 and the post-pandemic inflation surge were associated with prices increasing relative to price uncertainty, and this price development is more pronounced in the post-pandemic inflation event.
For example, from 2009 to 2018 the FCS-wide average annual Sharpe ratio for food prices was 0.55, but during 2007-08 it averaged 1.30. During 2020, this ratio hits 2.70, while the preliminary annualized 2021 estimate sits at 2.80, meaning that returns in food prices outweigh price volatility risks of holding food as an asset by nearly a factor of 3. This means the ratio of inflation to price uncertainty is roughly double that of the World Food Price Crisis of 2007-08.
The monthly price estimates indicate that prices can rise and fall dramatically within very short time periods in FCS countries. Figure A1 presents estimated monthly year-on-year inflation on a line chart for the Republic of Yemen together with intra-month price ranges on a candle chart. The intra-month volatility algorithm is detailed in section B.B4 and uses an autoregressive conditional heteroskedasticity model to estimate the time-varying properties of the monthly price variance process. The estimates are used to calculate Expected Shortfall in the price returns, which is used to construct the wicks and bodies of the candles.
The open values are defined as the estimated conditional expectation of monthly prices, the highs are the average intra-month price levels in the worst 50% of estimated conditional price increase events, the lows are equal to the average intra-month price levels in the worst 50% of estimated conditional price decrease events. The colored parts of the candles are the estimated price ranges within which the majority of monthly prices are estimated to be, with red candles indicating months in which the end-of-month prices closed below the start-of-month price estimates. Strong inflationary events are thus visualized as consecutive green candles. Periods of high volatility are visualized by candles with large wicks. Results for all other countries are available in figures A2 to A4. The charts in figures A2 to A4 highlight that there have been several notable high inflation events. The World Food Price Crisis of 2007-08 is visible as a price up-tick in most countries, most notably in Afghanistan, Burkina Faso, Chad, Haiti, Niger and Somalia. In some of the charts, the price action seems modest as it is suppressed by recent price action. In these instances, it may be more useful to look at prices on a logarithmic chart. Prices have also visibly surged in many countries in the year following the pandemic.

IV. Discussion and concluding remarks
Food price inflation is an important metric to inform economic policy and is closely watched by both economists and humanitarians, yet official figures are often missing, lacking in spatial detail, or published with delay. More important, crisis situations, or vulnerable populations in general, are often characterized by high geographic specificity while traditional price indicators do not provide insights beyond the major (urban) markets where prices are formally measured. Traditional CPI data in these situations, if available, may be insufficiently unbundled into distinct price estimates for it to be put into a relevant context. Recognizing this, international institutions have invested substantially in subnational price surveys. However, the capacity to monitor inflation has continued to be severely limited by challenges related to missing data. To overcome some of these shortcomings, this paper proposed an approach for real-time imputation of survey data drawing on multiple imputation and ensemble learning ideas.
The paper highlighted the new price monitoring capabilities using survey data gathered in 25 fragile and conflict-affected countries. The final estimates of food price inflation documented in this paper were shown to accurately capture important inflation events including the World Food Price Crisis of 2007-08 and the surge in inflation that followed the 2020 pandemic and subsequent expansion in the global monetary base.
The paper used out-of-sample validation techniques to estimate the reliability of the augmented data. The share of missing data (20.4% to 79.18%), the number of markets (3 to 77), and the number of food items (3 to 16) varied widely across the country-specific applications, but the imputation methods were shown to remain robust across these aspects, even when data coverage was relatively low. A linear regression of the cross-validated prediction performance on key properties of the country applications showed that data coverage itself was not a correlate of prediction accuracy. Instead, the strength of inflation and the natural volatility in prices were strong predictors of the imputation accuracy.
The accuracy of the imputations was judged by estimating the total price variation explained by the imputation models. On average, across the countries, the models predicted 85% of the observed price variation. In individual countries, the errors were in the 5% to 30% range of observed prices. This puts the accuracy of imputations in a range similar to that of direct price measurement at major urban markets in countries with well-established CPI methods. In particular, direct measurement of prices is prone to error with respect to true price levels simply because of the natural volatility in prices. For example, Lebow and Rudd (2003) estimated that measurement error in CPI change rates in the United States, a country with strong statistical capacities, could have been as high as 0.3 to 1.4 percentage points over a period in which the change rates were typically in a 3% to 6% range, which similarly places the measurement error of official CPI change in a 5% to 45% range of true values.
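One possible reading of how the relative error range for official CPI figures is obtained is to divide the smallest error by the largest change rate and the largest error by the smallest change rate. The short Python calculation below works through that arithmetic; the variable names are illustrative, and the pairing of bounds is an assumption about the derivation rather than a statement of the calculation used in the paper.

```python
# Measurement error bounds, in percentage points, and typical CPI change
# rates, in percent, as quoted in the text.
error_low, error_high = 0.3, 1.4
rate_low, rate_high = 3.0, 6.0

# Assumption: the narrowest relative error pairs the smallest error with the
# largest rate, and the widest pairs the largest error with the smallest rate.
smallest_share = error_low / rate_high   # 0.3 / 6
largest_share = error_high / rate_low    # 1.4 / 3

print(f"{smallest_share:.1%} to {largest_share:.1%}")
```

Under this reading the lower bound is 5% and the upper bound is a little under 47%, which is of the same order as the range quoted in the text.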
An important result is thus that as long as incomplete and intermittent survey data have at least a 20% to 40% rate of completeness, they can be augmented reliably with predictions to monitor subnational food price trends continuously. This contributes to a previously non-existing capacity to provide insights beyond the major (urban) markets where prices are formally measured with traditional methods. In situations with great spatial heterogeneity and localized vulnerabilities, such as in fragile environments or during crises, this can provide important new insight. Additionally, in those markets where prices are very sensitive to localized shocks, the statistical estimates may provide new opportunities to investigate local price dynamics with a confidence similar to what one would otherwise have obtained with measured prices.
ANDRÉE
DECEMBER 2021

A cross-country analysis of inflation trends, and comparison with inflation trends at global markets such as captured by the FAO or World Bank Food Price Index, would be interesting for future analyses, but is beyond the scope of the paper as it would require dealing with the differences in exchange rates. In a few countries, such as the Republic of Yemen, local unofficial exchange rates are also surveyed by WFP. In a separate analysis, the methods of the paper have been applied to track prices of 23 items covering foods and non-foods in the Republic of Yemen (footnote 30). This highlighted that it is possible to track dollar-denominated prices, broadly for the five food categories that comprise the FAO index, and draw comparisons with price events in global markets when the exchange rate is also monitored. In this paper, it is simply noted that figures A2 to A4 provide some visual guidance on how global price change events propagate at the national level, as nearly all the country graphs display inflation events in 2020. The figures indicate that there may be substantial cross-country heterogeneity in the timing and amplitude of the price surges, which likely relates to differences in factors such as whether a country is a net importer or exporter of food and how the local currency is managed during times of crisis. Finally, since not all products matter equally for household well-being, future analysis may use the estimates developed by this paper to explore food-specific price dynamics or produce inflation estimates based on food baskets that incorporate information on expenditure shares.
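To illustrate the last point, the following Python sketch combines item-level price indices into a basket index using expenditure shares. It is a simple fixed-weight (Laspeyres-type) average offered under stated assumptions: the function `basket_index`, the item names, the index values, and the shares are all hypothetical and are not taken from the paper or its data set.

```python
def basket_index(item_indices, shares):
    """Fixed-weight basket index.

    item_indices: dict mapping item name -> list of index values (base = 100)
    shares:       dict mapping item name -> expenditure share (any positive scale)
    """
    if set(item_indices) != set(shares):
        raise ValueError("items and shares must cover the same products")
    total = sum(shares.values())
    n_periods = len(next(iter(item_indices.values())))
    # Weighted average of the item indices in every period.
    return [
        sum(shares[item] / total * item_indices[item][t] for item in shares)
        for t in range(n_periods)
    ]


# Made-up item-level indices for three months.
item_indices = {
    "rice": [100.0, 110.0, 121.0],
    "beans": [100.0, 100.0, 105.0],
    "oil": [100.0, 120.0, 120.0],
}
# Hypothetical expenditure shares versus equal weighting.
shares = {"rice": 0.5, "beans": 0.3, "oil": 0.2}
weighted = basket_index(item_indices, shares)
equal = basket_index(item_indices, {item: 1.0 for item in item_indices})
print(weighted)
print(equal)
```

Comparing the share-weighted and equally weighted series shows how much the choice of basket can move the headline figure when item-level inflation rates diverge.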
Deploying statistical methods to enhance data gathering may help improve price estimates more widely. For instance, additional investments could be redirected to broaden the scope of data gathering to include additional sampling locations and food items rather than be used to strengthen data coverage of existing narrow monitoring operations. This could give a more complete view on subnational inflation than can be achieved by pure data gathering alone. Moreover, gathering data on additional food items can help produce more accurate predictors for missing observations. The imputations primarily utilize contemporaneous relations between the prices of different items, and so priority in data gathering should be given to ensuring that in each month prices of at least some items at some markets are observed. Ensuring that some data are available in most time periods is more important than ensuring that most of the data are available in some time periods. In addition, predictions are more reliable in high inflation episodes and less reliable in high volatility episodes, so data gathering processes may also use statistical methods to produce real-time price estimates and use the results in turn to determine how much new data gathering is needed.

Note: The first column reports the local currency in which prices are measured, the second column reports the number of markets from which data is used as a fraction of all known market locations for which predictions are made, the third column reports the number of food items for which data is used to construct the food price index, the fourth column reports the total number of price observations as a share of all market × time combinations where markets is the first number in the third column, and the final column reports when the estimated price index starts.
Source: The statistics have been prepared by the author for this paper based on end-of-August (2021) food price data from the World Food Programme.
9.55% -23.84% 14.13% 0.78

Note: The first three columns respectively report average annualized inflation, maximum draw-down, and average annual realized volatility in percentages. The final column reports the cross-validated confidence score, which ranges from 0 to 1, for the final food price index using the calculations from the paper. Additional cross-validation statistics can be found on the World Bank Data Catalog page where a live version of the database is maintained.
Source: The statistics have been prepared by the author for this paper based on end-of-August (2021) food price data from the World Food Programme.

Note: *p<0.1; **p<0.05; ***p<0.01
Note: Simple linear regression estimates that decompose the cross-validation performance score into key characteristics of the imputation problem. Percentage covariates are modeled as numeric digits of the same unit as the dependent variable, e.g. a monthly volatility of 2% enters the regression as a value of 0.02. The simple results highlight that prediction performance is not significantly related to data dimensions of the problem; instead, high volatility and high inflation are determinants of imputation accuracy. Jointly, these two variables explain almost half of the in-sample variation. The signs of the coefficients suggest that when volatility increases, prediction performance deteriorates and data collection becomes more important. When the inflation trend is strong relative to volatility, the price trend is clearer in the data and imputation accuracy increases, suggesting that robust inflation tracking is possible even when little ground truth data is available.
Source: The statistics have been prepared by the author for this paper based on end-of-August (2021) food price data from the World Food Programme.

Monthly price data is maintained at the World Bank Data Catalog page associated with the paper. FCS is a geometric average of country rates. The 2021* figures are based on end-of-August data.
Source: Statistics have been prepared by the author for this paper.

$p^{d}_{mis}$ by filling in entries for $p^{d}_{mis}$ based on the information contained in $p^{d}_{obs}$. Since the true targets are unobserved, a direct criterion is difficult to establish, but the objective can be summarized as minimizing the divergence $\lVert p^{d=1,\dots,D} - \pi^{d=1,\dots,D} \rVert$, in turn estimated with an $L_1$-norm based metric for the prediction function that generates $p^{d}_{mis}$, validated by the out-of-sample predictions it makes for the outlier-filtered $p^{d}_{obs}$ that serves as a proxy for $\pi^{d}_{obs}$.
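The validation idea in the preceding paragraph can be sketched as follows: hide some of the observed prices, impute them, and score the predictions with an $L_1$-type error. The Python fragment below is a minimal illustration under stated assumptions. The `mean_imputer` is a deliberately naive stand-in and not the ensemble method of the paper, and the normalized error in `holdout_l1_error`, the holdout share, and the price series are hypothetical choices.

```python
import random


def mean_imputer(column):
    """Naive stand-in imputer: fill missing entries (None) with the observed mean."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def holdout_l1_error(column, imputer, holdout_share=0.25, seed=0):
    """Hide a share of the observed prices, impute them, and return the
    normalized L1 error: sum |predicted - actual| / sum |actual|."""
    rng = random.Random(seed)
    observed_idx = [i for i, v in enumerate(column) if v is not None]
    if len(observed_idx) < 2:
        raise ValueError("need at least two observed prices")
    # Hold out at least one price but always leave at least one for fitting.
    n_hold = min(len(observed_idx) - 1,
                 max(1, int(len(observed_idx) * holdout_share)))
    held = set(rng.sample(observed_idx, n_hold))
    masked = [None if i in held else v for i, v in enumerate(column)]
    filled = imputer(masked)
    abs_error = sum(abs(filled[i] - column[i]) for i in held)
    return abs_error / sum(abs(column[i]) for i in held)


# Made-up monthly prices for one item at one market; None marks a missing survey.
prices = [10.0, None, 11.0, 10.5, None, 12.0, 11.5, None, 12.5]
error = holdout_l1_error(prices, mean_imputer)
print(f"normalized out-of-sample L1 error: {error:.3f}")
```

Because the held-out prices were actually observed, the error is a proxy for how well the same prediction function would fill the entries that are truly missing.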

B4. Intra-month estimates
The price-level estimates are accompanied by intra-month price range estimates represented as an Open-High-Low-Close time series.
where $P$ is the imputed price series and $E\Delta_{\alpha}$ is the expected change in the $\alpha$-percentile cases. The combined results can be plotted on a candle chart. The majority of price action can then be assumed to have occurred within the body of the candles, while the wicks indicate, respectively, the average price of the highest 50% of intra-month prices and the average price of the lowest 50% of intra-month prices. The first three quantities in equation B8 are estimated by modeling the time-varying distribution of the month-on-month inflation process as an autoregressive moving average process with fractionally integrated generalized autoregressive conditional heteroskedasticity (ARMA-fiGARCH) following Baillie et al. (1996). This is a time-varying density

(B9) $F_t = (\mu_t, \sigma_t, \vartheta)$

where $\mu_t$ is a conditional mean process defined as an ARMA($p$, $q$) process

(B10) $\mu_t = c + \sum_{j=1}^{p} \phi_j \mu_{t-j} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t,$

and the conditional variance is specified as a fractionally integrated GARCH process of order ($p$, $d$, $q$), and $\vartheta$ is a vector of remaining parameters of the distribution. The conditional variance is defined as follows. First, let the standard GARCH($p$, $q$) process be defined as

(B11) $\sigma_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2$

with $\sigma_t^2$ as the conditional variance, $\omega$ an intercept, and $L$ the back-shift operator with $\alpha(L) = \sum_{j=1}^{q} \alpha_j L^j$ and $\beta(L) = \sum_{j=1}^{p} \beta_j L^j$. This model has an ARMA representation of the squared process (equation B12) with $\phi(L) = \sum_{j}^{\max(p,q)-1} \phi_j L^j$. The fractionally integrated GARCH is obtained by replacing the first-difference operator $(1 - L)$ with a truncated fractional difference operator

(B13) $(1 - L)^d = \sum_{k=0}^{K=1000 \sim \infty} (-1)^k \frac{\Gamma(d+1)}{\Gamma(k+1)\,\Gamma(d-k+1)} L^k.$
Ignoring the approximation error due to the truncation, at d = 0 the model equals the standard GARCH in which volatility shocks decay at an exponential rate. Similarly, when d = 1, the AR polynomial of the GARCH has a unit root and the model equals the integrated GARCH in which shocks persist forever.
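The two limiting cases can be checked numerically from the coefficients of the truncated expansion in (B13). The Python sketch below computes the coefficient on $L^k$ in the binomial expansion of $(1 - L)^d$ with a standard recursion, which is equivalent to the gamma-function ratio with an alternating sign $(-1)^k$. The function name `frac_diff_weights` and the truncation length used in the example are illustrative assumptions, not the paper's implementation.

```python
def frac_diff_weights(d, K):
    """Coefficients w_0..w_K of L^k in the expansion of (1 - L)^d.

    Uses the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k, which avoids
    evaluating gamma functions at non-positive integers.
    """
    weights = [1.0]
    for k in range(1, K + 1):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# d = 0: the operator is the identity, so only the k = 0 weight is non-zero.
print(frac_diff_weights(0.0, 4))
# d = 1: the ordinary first difference 1 - L (unit root case).
print(frac_diff_weights(1.0, 4))
# 0 < d < 1: weights never vanish but shrink slowly with the lag.
print(frac_diff_weights(0.5, 4))
```

For fractional values of $d$ the weights decay at a hyperbolic rather than exponential rate, which is why a long truncation such as $K = 1000$ is needed to approximate the infinite sum.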
When there are level shifts in the volatility process, an integrated GARCH usually better describes the data