Policy Research Working Paper 10758

Comparative Analysis of AI-Predicted and Crowdsourced Food Prices in an Economically Volatile Region

Julius Adewopo, Bo Pieter Johannes Andrée, Helen Peter, Gloria Solano-Hermosilla, Fabio Micale

Development Data Group & Agriculture and Food Global Practice
April 2024

Abstract
High-frequency monitoring of food commodity prices is important for assessing and responding to shocks, especially in fragile contexts where timely and targeted interventions for food security are critical. However, national price surveys are typically limited in temporal and spatial granularity. It is cost prohibitive to implement traditional data collection at frequent timescales to unravel spatiotemporal price evolution across market segments and at subnational geographic levels. Recent advancements in data innovation offer promising solutions to address the paucity of commodity price data and guide market intelligence for diverse development stakeholders. The use of artificial intelligence to estimate missing price data and a parallel effort to crowdsource commodity price data are both unlocking cost-effective opportunities to generate actionable price data. Yet, little is known about how the data from these alternative methods relate to independent ground truth data. To evaluate if these data strategies can meet the long-standing demand for real-time intelligence on food affordability, this paper analyzes open-source daily crowdsourced data (104,931 datapoints) from a recently published data set in Nature Journal, relative to complementary ground truth sample. The paper subsequently compares these data to open-source monthly artificial intelligence–generated price data for identical commodities over a 36-month period in northern Nigeria, from 2019 to 2022. The results show that all the data sources share a high degree of comparability, with variation across commodity and market segments. Overall, the findings provide important support for leveraging these new and innovative data approaches to enable data-driven decision-making in near real time.

This paper is a product of the Development Data Group, Development Economics and the Agriculture and Food Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at jadewopo@worldbank.org and bandree@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Julius Adewopo (1,2), Bo Pieter Johannes Andrée (1), Helen Peter (2), Gloria Solano-Hermosilla (3), Fabio Micale (4)

Affiliations
1. Development Data Group, World Bank, Washington, DC, USA
2. International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
3. Universidad Pablo de Olavide, Sevilla, Spain
4. European Commission Joint Research Center (EC-JRC), Ispra, Italy

Corresponding author: Julius Adewopo (jadewopo@worldbank.org)

JEL Classification: Q11 – Aggregate Supply and Demand Analysis: Prices
Keywords: Food Price, Crowdsourcing, Artificial Intelligence, Ground truth, Data

Introduction
Food insecurity poses a lingering threat to national development across low- and middle-income countries (LMICs), and the recent inflationary trend has severely undercut food affordability for millions of households (World Bank, 2023; Headey & Ruel, 2023). The recent Shared Prosperity Report (World Bank, 2022) reveals that, globally, 658 million individuals live below the international poverty line of $2.15 per day. Furthermore, estimates suggest that between 690 million and 783 million people worldwide experienced hunger in 2022 (FAO, IFAD, UNICEF, WFP and WHO, 2023; FSIN and Global Network Against Food Crises, 2023). Among these, over 258 million are in a food crisis and are forced to meet minimum dietary needs by employing irreversible coping strategies, including the liquidation of livelihood assets (FSIN and Global Network Against Food Crises, 2023).

The international community's response to the global food crisis has encompassed significant humanitarian aid to directly protect livelihoods, as well as increased investment in resilience and prevention. Transitioning from reactive to proactive aid requires rigorous monitoring to gather information for early detection and to inform responses to emerging food security risks (Lentz & Maxwell, 2022). Food and consumer price inflation significantly affect household food affordability, and both are leading indicators of food insecurity (Andrée, 2021b; Easterly & Fischer, 2021; Waterlander, et al., 2019). Generally, food price data reflect market pulse and sentiment, and capture reactions to shocks and threats.
Monitoring of commodity prices and food price inflation is an important tool for early warning and rapid assessment of risks to markets and food security (Baquedano, 2015; Andrée, 2021a). However, regional or national systems that support frequent tracking of changes in food prices across market segments are rare in LMICs (Adewopo, et al., 2021; Zeug, et al., 2017; Galtier, et al., 2014). For ex-ante and ex-post assessment of food security and market functioning, it is crucial to monitor food prices at sub-national levels, especially in fragile contexts where the majority of households live at the brink of poverty and there is limited capacity or opportunity for local safety nets. In most LMICs, price monitoring and reporting systems for major staple commodities face various intractable challenges, including planning time, costs, and logistical constraints on collecting data at relevant granularity (i.e., market segments, commodity sub-types, spatial and temporal scales) to derive national and sub-national intelligence on market and price dynamics (FAO, 2017; Galtier, et al., 2014). Yet, governmental institutions, development agencies, and market stakeholders rely on such intelligence to decide on appropriate development and humanitarian interventions or investments. The underlying commodity price data are typically generated by designated national agencies or third-party institutions, through snapshot surveys of markets or arguably non-transparent proxy methods (Green, et al., 2013; Kalkuhl, et al., 2016). Proper food system planning and appropriate intervention require robust data and credible insights for timely, targeted action and reflexive impact assessment.
At best, current official data offer a lagging, rear-view picture of food security status and changes; at worst, they misrepresent or under-represent critical nuances of market signals in space and time, eroding the confidence of decision-makers and data users. Within the past decade, researchers have proposed and tested various innovative approaches to bridge the food price data gap in fragile contexts, with mixed results (Kisan, 2015; Zeug, et al., 2017). The applicability of these approaches can be judged against core elements of data innovation, including replicability, synchronicity, validity, and velocity over time and space. Data crowdsourcing and artificial intelligence (AI) are two rapidly emerging, independently evolving approaches that are considered promising for timely monitoring of market prices and for supporting food security assessment.

Generally, the data crowdsourcing approach leverages so-called “citizen science” principles to engage volunteers (paid or unpaid) to submit data at defined or undefined intervals, which can then be aggregated within or across locations and time. The approach is premised on the notion that the diversity of the volunteer crowd fosters the collection of rich data with minimal or no bias, because data submission is typically decentralized and independent, representing a robust quasi-sampling of the population at scale (Arbia, et al., 2023; Solano-Hermosilla, et al., 2022). Beyond its core value proposition as a cost-effective and timely approach to collecting price data, crowdsourcing also offers ancillary benefits. For instance, it fosters an inclusive digital and data ecosystem, where citizens are not merely data consumers or data subjects but are also empowered as data curators and recognized as active stakeholders in data-driven decision making for the food system.
This approach was fully developed and tested under an initiative called Food Price Crowdsourcing in Africa (FPCA), in which more than 1,000 volunteer citizens submitted food price data over a 3-year period (2019–2021) across a region of Nigeria where food insecurity is prevalent (Thomas, 2023; World Bank, 2023). The outputs provided one of the first near-real-time datasets to reveal the actual impact of the COVID-19 pandemic on food prices and, indirectly, on food affordability at the sub-national level (Adewopo, et al., 2021; Adewopo, et al., 2022).

In parallel to data crowdsourcing, the recent leap in the application of AI has revealed new possibilities for food system intelligence, including back-casting, now-casting, and forecasting. By leveraging predictive algorithms, AI systems now perform complex data operations to generate new datasets from sparse data points (Nguyen, et al., 2023; Savage, 2023; Lu, et al., 2021; Gavin & Mandal, 2002; Aldoseri, et al., 2023). While AI applications for complex analysis are not new, embedding modern analytical models and coupling them in real time with multi-channel data or information pipelines has unlocked advanced capabilities of AI algorithms. One such revolutionary capability is the monthly imputation of price data, which was developed for 25 countries (Andrée, 2021a; Andrée & Pape, 2023) and is now deployed by the World Bank to cover over 2,100 markets across 36 countries and to monitor food prices, energy prices, and unofficial parallel market exchange rates.[1] The Real-Time Prices (RTP) dataset is created through a resource-intensive Markov Chain Monte Carlo framework involving hundreds of thousands of models to address missing data points in underlying surveys gathered from country systems, the World Food Programme, and the Food and Agriculture Organization.
To keep outputs up to date as new data arrive, the algorithm employs a fast tree-based approach known as Cubist. This method integrates local models that capture relationships within specific regions of the feature space, similar to the concept of local receptive fields in convolutional neural networks (CNNs).

[1] Andrée, B. P. J. (2021). Monthly food price estimates by product and market (Version 2024-01-25). WLD_2021_RTFP_v02_M. Washington, DC: World Bank Microdata Library. https://microdata.worldbank.org/index.php/catalog/4483
Andrée, B. P. J. (2024). Monthly energy price estimates by product and market (Version 2024-01-25). WLD_2021_RTFP_v02_M. Washington, DC: World Bank Microdata Library. https://microdata.worldbank.org/index.php/catalog/6134
Andrée, B. P. J. (2024). Monthly food price estimates by product and market (Version 2024-01-25). WLD_2021_RTFP_v02_M. Washington, DC: World Bank Microdata Library. https://microdata.worldbank.org/index.php/catalog/6160

As these approaches evolve independently, they present an unprecedented opportunity to sidestep the challenges associated with traditional enumerator-based price data surveillance, including long temporal lags (sometimes gaps of more than 5 years), coarse spatial resolution, and poor coverage of market segments (Zeug, et al., 2017). Further, as these data systems mature, they can complement enumerator-led efforts by enriching national data systems with high-frequency price monitoring. However, for their outputs to be considered useful, it is imperative to validate them against standard enumerator-submitted data across market locations and over time. Because these innovative data approaches and sources are nascent, it is uncertain whether their estimates of commodity prices credibly reflect market signals and actual price levels. Both approaches are characterized by inherent limitations.
For instance, the crowdsourcing approach can arguably be influenced by the vagaries of independent volunteers who may submit spurious or outdated price data at will, circumventing the built-in controls that enforce the integrity of the data collection system (Solano-Hermosilla, et al., 2022; Arbia, et al., 2023). Similarly, model-based price estimates from AI algorithms can be subject to various accuracy issues and uncertainties due to parameterization and bias (Aldoseri, et al., 2023; Andrée, 2021a). Therefore, independent ground-truthing against enumerator-led data, and evidencing (dis)agreement between these data sources, is imperative to guide further innovation and stakeholder engagement in advancing real-time assessment of food price changes, affordability, and food security.

Study Area
We focused this comparative study on three (3) major states in the core northern region of Nigeria (Figure 1), where the geographic and temporal coverage of the World Bank’s RTP (henceforth, AI-estimated) food prices overlaps with crowdsourced price submissions. With a combined population of 30.5 million, the focal states account for 15% of the national population (UNFPA, 2023) and approximately 6.5% of national GDP (BudgIT, 2022). Characterized by low literacy and historically high levels of insecurity, this region has faced lingering threats of food insecurity, and the pressing need for relevant interventions motivates alternative high-frequency surveillance of food commodity prices and timely insights into overall market conditions. In the period preceding the data crowdsourcing efforts, this region was under severe warning for insecurity due to terrorism, armed banditry, and religious crises. Historically, agricultural production has played a major role in the economy of the three states, with cultivation of major staple crops such as maize, cowpea, soybeans, and local rice.
The geographic coverage extends through the Northern and Southern Guinea savanna agroecological zones; therefore, the major sowing and harvesting season occurs between June and October annually, and most grain crops are cultivated under rain-fed and irrigated conditions. The major markets are usually active throughout the year, vary in size, and typically feature vendors ranging from wholesalers to retailers. Smaller markets open periodically (such as village markets proximate to farm gates), while the major cities feature small and mid-sized local retailers (including kiosks, shops, and supermarkets). Because of active trade routes within the region, people (buyers and sellers) and goods move between markets, and vendors can own multiple stores within and across markets, potentially influencing spatial price transmission.

Figure 1: Map of the study area, covering three (3) states in the core northern region of Nigeria, showing georeferenced market locations where ground truth commodity prices were submitted by volunteer crowd and trained enumerators, in conjunction with markets where an artificial intelligence (AI) algorithm was used to estimate prices.

Results
The results of this analysis are presented in two (2) layers (Table 1) to (i) elucidate the correlative relationship between crowdsourced and enumerator-submitted ground truth data over an 8-month period, and (ii) assess the relationship between crowdsourced and AI-estimated price data over a 3-year period, exploring nuances of the relationship by sub-national geography, market segment, and commodity sub-type. Price data were available for both maize and imported rice across all datasets. The AI-estimated data track these commodities as white maize (100kg) and imported rice (100kg).
Although the AI-imputed monthly data are based on the limited monthly survey data available from WFP during the period (15.4% of the datapoints), the cross-validation metric for the commodities ranged from 0.87 to 0.97, indicating high confidence in model estimates. In contrast to the AI-based price data, the crowdsourced and ground truth data differentiated the commodities into additional sub-types (yellow maize, white maize, Indian rice, Thailand rice), with units of measurement that are typically smaller than those of the AI-estimated prices and closer to the quantities in which households buy food. These units include standard packaging weights (100kg, 50kg, 10kg, 1kg) and local measures (kwano, mudu, tiyya, etc.). The relative distribution of packaging units was previously published along with the data post-sampling method, which showed that most of the submitted prices (81.9%) were based on local measures/weights (Arbia, et al., 2023). To enable comparison, all prices were converted to unit price per kilogram (i.e., ₦/kg). Note that this is a simple rescaling that does not control for possible discounts factored into prices for larger quantities, such as those tracked by the AI-estimated data. The AI-estimated data contain monthly Open, High, Low, and Close (OHLC) price estimates, while the crowdsourced and enumerator data are available at intra-month intervals.

For ease of comprehension, we first present an overall summary of the correlative relationships between paired price datasets (by commodity and sub-type), based on monthly averages. We hone in on the results for enumerator versus crowdsourced commodity prices to position the crowdsourced data relative to enumerator-submitted ground truth data. We then position the AI-estimated data relative to the crowdsourced data, leveraging the longer time interval for which these data are available.
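The per-kilogram rescaling described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' processing code: the helper name `price_per_kg` and the unit labels are hypothetical, only the standard packaging weights named in the text are built in, and local measures (kwano, mudu, tiyya) must be supplied with kg equivalents by the user, because no weights for them are given here. As in the text, no adjustment for bulk-purchase discounts is made.

```python
# Hypothetical helper: rescale a submitted price to naira per kilogram (₦/kg).
# Only the standard packaging weights named in the text are built in.
STANDARD_UNIT_KG = {"100kg": 100.0, "50kg": 50.0, "10kg": 10.0, "1kg": 1.0}


def price_per_kg(price_naira, unit, local_unit_kg=None):
    """Convert a price quoted for `unit` into ₦/kg by simple division.

    `local_unit_kg` maps local measures (e.g. "mudu") to their weight in kg.
    Those weights are NOT provided here and must come from field calibration.
    No adjustment is made for quantity discounts on larger packages.
    """
    weights = dict(STANDARD_UNIT_KG)
    if local_unit_kg:
        weights.update(local_unit_kg)
    if unit not in weights:
        raise KeyError(f"no kg weight known for unit {unit!r}")
    kg = weights[unit]
    if kg <= 0:
        raise ValueError("unit weight must be positive")
    return price_naira / kg


# Usage: a 100kg bag quoted at ₦16,000 is ₦160/kg.
print(price_per_kg(16000, "100kg"))  # 160.0
# The 1.5 kg weight for "mudu" below is a placeholder, not a measured value.
print(price_per_kg(900, "mudu", {"mudu": 1.5}))  # 600.0
```

Raising an error for unknown units, rather than guessing a weight, keeps unconverted local measures from silently entering the ₦/kg series.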
The focus is initially on yellow maize prices, which show a trend similar to that of white maize prices, to elucidate important ramifications of the observed correlative relationships between ground truth and crowdsourced prices at different time intervals. We then contrast monthly maize prices (yellow and white maize) in the crowdsourced data with prices for the same commodity in the AI-estimated dataset (Figure 2). Additional insights for rice (as a whole and by sub-type) are presented in the appendix.

Figure 2: Comparison of data volume, frequency, and temporal trend of maize grain prices from AI-estimated and ground truth sources in the northern region of Nigeria over three years (2019–2021) (a), and during the validation period (Mar. 2021 – Oct. 2021) (b) and (c).

Ground Truth Enumerator versus Crowdsourced Commodity Prices: During the 8-month validation period, the trend and progression of daily averaged prices were comparable between enumerator-submitted and crowdsourced ground truth data, notwithstanding indicative differences in daily price ranges and mean values (Figures 2 and 3a-c). Specifically, the overall mean prices from both sources were not significantly different (p=0.40): ground truth and crowdsourced maize prices averaged ₦216.25 and ₦229.75, respectively, and rice prices averaged ₦612.25 and ₦615.88, respectively. For yellow maize, high variability was observed in the daily means of both the crowdsourced data (₦156 to ₦296) and the enumerator data (₦120 to ₦374). However, this variability attenuated as submissions were averaged into weekly and monthly datapoints (Figure 3b-c). As both datasets were averaged to coarser temporal granularity (i.e., daily => weekly => monthly), the cohesion and similarity of price signals became more evident, and the correlation coefficient improved by 36%, from 0.69 at the daily timestep to 0.94 at the monthly timestep.
Further, at the monthly timestep (Figure 3c), 86% of the variability in ground truth prices is represented in the crowdsourced data (p<0.001, α=0.05). Correspondingly, the co-variability explained at weekly and daily timesteps is lower, likely owing to noisier data at those intervals.

Table 1: Relationship between ground truth reference prices from enumerators (Gr), crowdsourced (Cr), and artificial intelligence (AI) estimated prices for maize and rice. The Gr-Cr comparison was based on intraday data submissions over an 8-month period (Mar - Oct 2021), while the Cr-AI comparison was based on 3 years of data (2019-2021), juxtaposing monthly average Cr prices with monthly AI-estimated closing prices. The values in the table represent the overall mean prices for each period, the corresponding correlation coefficient (R), and the coefficient of determination (r2), indicating the direction and strength of the relationship between the paired price datasets.

                      8-month validation (Gr ~ Cr)        3-year comparison (Cr ~ AI)
Commodity  Sub-type   Gr (₦/kg)  Cr (₦/kg)  R     r2      Cr (₦/kg)  AI (₦/kg)  R     r2
Maize      Yellow     219.94     232.76     0.94  0.86**  160.72     131.15     0.99  0.98**
           White      212.68     227.05     0.94  0.86**
Rice       Thailand   649.31     625.50     0.78  0.55*   434.72     429.85     0.94  0.88**
           Indian     575.34     606.13     0.96  0.91*

* denotes significant relationship, P-value <0.01; ** denotes significant relationship, P-value <0.001. All tests performed at α = 0.05. Cr ~ AI values are reported per commodity, since the AI-estimated data do not distinguish sub-types.

[Figure 3 panels: (a) price (₦/kg) over time; (b) ground truth price (₦/kg) relationship. Daily: R=0.69, r2 = 0.48, P<0.0001; weekly: R=0.89, r2 = 0.79, P<0.0001; monthly: R=0.94, r2 = 0.86, P=0.0005]
Figure 3. Co-evolution (a) and relationship (b) of yellow maize prices submitted by trained local enumerators (ground truth) and volunteer crowd (crowdsource) over an 8-month period (Mar 2021 – Oct 2021) within the northern region of Nigeria. The cohesion of the price signal and the relationship improved as intraday datapoints were averaged to daily, weekly, and monthly time intervals.
R denotes the correlation coefficient, while r2 denotes the coefficient of determination, indicating the degree of co-variability.

Crowdsourced versus AI-Estimated Commodity Prices: Given the preceding observation that price signals from the crowdsourced and ground truth data were credibly cohesive at the monthly timestep, and given that AI-generated prices are currently available as monthly estimates, we focused further analysis on the correlation of monthly averages of crowdsourced data with AI-based estimates over a longer timeframe (3 years). The correlative analysis shows that the variability of monthly maize prices was predominantly similar between both data sources, with near-perfect co-variance over the 36-month period (r2 = 0.94; p<0.001; Table 1 and Figures 4a and 4b). The mean monthly crowdsourced prices mostly followed the same trend as AI-based estimates, with a slight exception around June 2021, when the AI-estimated closing price declined by 2.7% while the crowdsourced price rose steeply by 9.3%, relative to the previous month. The co-evolution of prices from both sources resumed in the following month, trending downwards by 25.3% and 12.4% for crowdsourced and AI-estimated prices, respectively, over the subsequent five (5) months. Notwithstanding month-to-month variations and inherent temporal gaps in the crowdsourced price data, the overall monthly average price per kg of maize was predominantly higher than the average AI-estimated closing price (Table 1; Figure 3a). Subsequent filtering of crowdsourced maize prices by farmgate, wholesale (100kg packages), and retail markets shows differences in mean prices during the period, of ₦154.77, ₦161.23, and ₦164.03, respectively, all higher than the overall average AI-estimated closing price (₦131/kg) but closer to the average AI-estimated “high” monthly price (₦142.08; data not shown in this paper).
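The month-over-month comparison above (a 2.7% decline in one source against a 9.3% rise in the other) and the multi-month declines that followed rest on simple percentage-change arithmetic. A minimal Python sketch follows; the function names and the example price series are hypothetical, and compounding monthly changes into a multi-month change is an assumption made here for illustration rather than a procedure stated in the paper.

```python
from math import isclose, prod


def pct_change(previous, current):
    """Percentage change from `previous` to `current`."""
    if previous == 0:
        raise ZeroDivisionError("previous value must be non-zero")
    return (current / previous - 1.0) * 100.0


def month_over_month(series):
    """Percentage change between each pair of consecutive monthly values."""
    return [pct_change(a, b) for a, b in zip(series, series[1:])]


def compound(pcts):
    """Combine consecutive percentage changes into one overall change."""
    return (prod(1.0 + p / 100.0 for p in pcts) - 1.0) * 100.0


# Hypothetical monthly average prices (₦/kg), not data from the paper.
prices = [150.0, 164.0, 158.0, 149.0, 140.0, 131.0, 125.0]
monthly = month_over_month(prices)
# Compounding the monthly steps recovers the end-to-end change.
assert isclose(compound(monthly), pct_change(prices[0], prices[-1]))
# The same helper gives the relative gain in R reported for yellow maize.
print(round(pct_change(0.69, 0.94), 1))  # 36.2
```

Percentage changes compound multiplicatively, so monthly changes cannot simply be summed to obtain a five-month change.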
[Figure 4 panel statistics: R: 0.97 | r2: 0.94 | P-value: < 0.001]
Figure 4: Temporal trend (a) and relationship (b) between monthly average prices of maize submitted by volunteer crowd (crowdsourced) and estimated by artificial intelligence (AI-estimated) within a fragile context in the northern region of Nigeria during a 3-year period (2021-2023). Crowdsourced prices represent the monthly post-sampled average of intraday submissions by volunteers across >100 geolocated market points, while AI-estimated prices represent the average estimated monthly closing price over four (4) geolocated markets within the study region.

Unraveling Relationships by Location, Commodity Type, and Market Segments
Disaggregating the data to a lower geographic level (admin layer 2, also called “States”), by commodity sub-type, and by source market segment shows consistent agreement between crowdsourced and AI-estimated prices, in line with the overall aggregation above. First, mean monthly crowdsourced prices were comparable to AI-estimated prices by location (₦154/kg vs ₦156/kg in Kaduna, ₦152/kg vs ₦139/kg in Kano, and ₦148/kg vs ₦134/kg in Katsina). The crowdsourced prices also varied by market segment, mainly following the order Farmgate (₦155/kg) < Wholesale (₦161/kg) < Retail (₦164/kg) for maize, with the average farmgate price closest to the average AI-estimated closing price (₦131/kg). However, there was no major difference between maize sub-types (₦154/kg for yellow maize and ₦150/kg for white maize). Beyond these indicative differences and similarities in average prices, the fitted linear relationships show slight differences in the co-variance of prices from both sources, yet the relationship remained consistently strong within each state and for both maize sub-types (0.72 ≤ R ≤ 0.98, p<0.0001; Figure 5a; Table 1).
Similarly, for market segments, the covariation of prices from both data sources was consistently high across farmgate, retail, and wholesale selling points, with r2 values between 0.84 and 0.89 (Figure 5b).

[Figure 5a panel statistics: R=0.97, r2 = 0.93; R=0.97, r2 = 0.94; R=0.78, r2 = 0.60; R=0.85, r2 = 0.72; R=0.96, r2 = 0.92; R=0.98, r2 = 0.97; all P<0.0001]
Figure 5a: Relationship between crowdsourced and AI-estimated prices of maize, disaggregated by state and commodity sub-type within a fragile context in the northern region of Nigeria. The intraday volunteer-submitted crowdsourced prices were collected over a 3-year period (2019-2021) and averaged into monthly values. The AI-estimated monthly closing prices were averaged over four (4) market locations within the study region. r2 denotes the coefficient of determination, and the significance of the relationship is tested at α=0.05.

[Figure 5b panel statistics: r2 = 0.87, 0.84, 0.89 (all P<0.001); NCr = 18,529, 35,843, 498]
Figure 5b: Relationship between crowdsourced and AI-estimated prices of maize across three (3) major market segments within a fragile context in the northern region of Nigeria. The intraday volunteer-submitted crowdsourced prices were collected over a 3-year period (2019-2021) and averaged into monthly values. The AI-estimated monthly closing prices were averaged over four (4) market locations within the study region. NCr denotes the number of intraday crowdsourced datapoints included in the analysis during the period.

Discussion
The innovative crowdsourcing of food commodity prices and the deployment of AI algorithms for price imputation and estimation are both compelling strategies for achieving high-frequency food price surveillance and near-real-time assessment of market signals, enabling proactive anticipation of threats and responsive action to shocks, particularly in fragile contexts.
Both strategies are relatively nascent and are evolving independently, with promising outputs so far. At the core, we seek to address two questions that are fundamental to gaining the confidence of stakeholders (including the data science community) in using and further embedding the outputs of either or both approaches in analytical work. Our first question asks whether there is a credible relationship between conventionally collected ground truth price data and the crowdsourced price data. The second asks whether there is consistent agreement, over a longer timeframe, between the price data generated by the two innovative data gathering methods. While both questions appear straightforward, they are not trivial. Answering them requires the intentional set-up of a data collection methodology that aligns at least six (6) major elements: (i) national context, (ii) timeline, (iii) geographic locations, (iv) market segments, (v) commodity types/sub-types, and (vi) packaging units. Therefore, our ability to elucidate the relationship between these datasets rested solely on the coincidence of the focal geography/markets (Figure 1), the co-occurrence of the selected commodities, and the intersection of temporal data granularity (i.e., intraday > daily > monthly). This correlative analysis of crowdsourced and AI-estimated price datasets yields important insight into the inherent credibility of both approaches relative to ground truth prices. New frontiers in data innovation typically promise data volume, variety, velocity, and high-throughput processing, but they raise reasonable questions about their trustworthiness and accuracy in realistically representing ground observations under rapidly changing or high-entropy conditions, such as volatile markets in fragile situations.
It is imperative to ensure that new data approaches meet minimum requirements to establish their validity and dependability relative to extant conventional approaches. Since FPCA was implemented in two (2) discontinuous phases, there were two short gaps (4 months and 6 months) in the temporal trend of the crowdsourced data within the 3-year period (Figure 3a), yet the data trajectory reflects major shifts in prices. This comparative assessment provides compelling evidence of both the validity and the complementarity of crowdsourced and AI-estimated price data in a demonstrably challenging geographic context. At the first level, the magnitude of agreement between the prices curated through both data sources exceeded initial expectations. As new efforts emerge to apply crowdsourcing to generate high-frequency, large-volume data (Manners, et al., 2022; Minet, et al., 2017), there remains a lingering and prevalent notion that citizen volunteers are unlikely (or unable) to submit data as reliable as data from trained enumerators (Zeug, et al., 2017). During the period, a total of 2,355 and 102,842 individual price datapoints were submitted by ground truth enumerators and the volunteer crowd, respectively, within the focal region. Both datasets were originally conveyed with different pre-defined local packaging units and market segments; however, the majority of datapoints (72%) were submitted from retail market sources. The strong agreement between the two price datasets dispels doubts about the usefulness and validity of high-frequency data sourced directly from citizens or market actors. Beyond the previously highlighted limitations of new-generation data imputation methods, AI-based algorithms can be constrained by a paucity of training data with which to consistently validate model outputs.
Due to the limited availability of food price data for model calibration in fragile contexts, it is reasonable to expect some instability in model behavior for the northern region of Nigeria, depending on intrinsic model robustness, which may lead to the imputation of implausible price outputs, especially over a long timeframe spanning multiple seasons. Notably, the period was defined by a market-disruptive shock of calamitous proportions, the COVID-19 pandemic, which triggered unanticipated episodic price behavior. The market price effects during this period were duly reflected in the trend of the crowdsourced data, as previously reported (Adewopo, et al., 2021), and the strong correlation with AI-estimated prices suggests that the underlying algorithm effectively modeled both the periodic shifts and the indicative monthly changes in prices. It is noteworthy that the inherent price signals in the crowdsourced data became more observable as multiple intraday submissions were averaged into daily, weekly, and monthly values. This is attributable to the tendency of averaged price values to converge toward the typical (central) price, which dilutes the effect of outliers and increases the signal-to-noise ratio. The emergence of an observable price signal from the plume of crowdsourced datapoints can be valuable for various purposes in food system analysis and decision support. At the basic level, the detectability of price signals addresses a major need to track seasonal price dynamics, and can support the development of agile analytical systems that link commodity prices to ancillary socio-economic and ecological factors. Importantly, the temporal granularity of commodity price monitoring systems should match the relevant temporal scale of the target market-focused policies or food system interventions.
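To illustrate the averaging step discussed above, the sketch below groups hypothetical intraday submissions, represented as (timestamp, price in ₦/kg) pairs, into daily, ISO-week, and calendar-month buckets and takes the arithmetic mean of each bucket. It is a minimal Python illustration, not the authors' R code; the function names and sample prices are hypothetical. It also shows that a single atypical submission is diluted as the averaging window widens: it moves the mean for its own day further from the typical price than it moves the monthly mean.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def period_key(ts: datetime, period: str):
    """Map a submission timestamp to the calendar bucket used for averaging."""
    if period == "daily":
        return ts.date()
    if period == "weekly":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # (ISO year, ISO week number)
    if period == "monthly":
        return (ts.year, ts.month)
    raise ValueError(f"unknown period: {period!r}")


def aggregate(records, period):
    """Average (timestamp, price) pairs within each calendar bucket."""
    buckets = defaultdict(list)
    for ts, price in records:
        buckets[period_key(ts, period)].append(price)
    return {key: mean(prices) for key, prices in sorted(buckets.items())}


# Hypothetical intraday rice submissions in NGN/kg; the 700.0 entry is atypical.
records = [
    (datetime(2021, 3, 1, 9), 500.0),
    (datetime(2021, 3, 1, 14), 520.0),
    (datetime(2021, 3, 1, 17), 700.0),
    (datetime(2021, 3, 2, 10), 510.0),
    (datetime(2021, 3, 2, 16), 490.0),
]

daily = aggregate(records, "daily")      # 1 Mar: ~573.33, 2 Mar: 500.0
weekly = aggregate(records, "weekly")    # {(2021, 9): 544.0}
monthly = aggregate(records, "monthly")  # {(2021, 3): 544.0}
```

Without the atypical entry, the 1 March mean would be 510.0 and the monthly mean 505.0, so the outlier shifts the daily value by about 63 ₦/kg but the monthly value by only 39 ₦/kg. The sketch uses unweighted means; the post-sampling weights described by Arbia, et al. (2023) are not modeled here.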
High-frequency datapoints make it possible to aggregate price data at whatever time scale is required for data-driven insights and decision-making in fragile national or regional contexts. The monthly AI-based estimates are currently generated across 36 countries, and comparable survey data available from different organizations could support scaling up such approaches in other countries. The strong correlation between AI-estimated prices and crowdsourced data over an extended timeframe (3 years), triangulated against enumerator-led data, lends credence to the ability of both approaches to capture on-the-ground conditions. The very strong agreement between the price data from both sources indicates the potential to couple a human-centered price monitoring system with an algorithm-dependent system, evolving into a hybrid high-frequency price monitoring system bolstered by a built-in multi-temporal validation architecture. Specifically, crowdsourced data could feed into such an envisioned system as georeferenced intraday price datapoints, providing a spatially rich stream of training and testing data to the AI algorithm pipeline and potentially enhancing model performance and the credibility of data outputs. Conversely, the AI algorithm could be leveraged to generate relatively coarser price data and insights at multiple administrative levels across countries (or regionally), beyond the geographic coverage of crowdsourcing efforts, especially in contexts where crowdsourcing is ineffective or yields sparse data. The value of both innovative data methods extends beyond understanding current or past trends in commodity prices.
It is also a significant milestone towards forecasting emergent food crises, given the richness of the data flow for testing various assumptions about price movements, investment outlook, and the overall resilience or vulnerability of households to shocks and stressors (Andrée, et al., 2020; Wang, et al., 2022; Gatti, et al., 2023). Finally, the finding that aggregating prices from a variety of packaging units did not compromise the relationship between crowdsourced and AI-estimated prices can be useful for adapting future efforts to scale either of the innovative data curation methods. Generally, the AI-estimated prices are generated for wholesale package units (i.e. 100 kg packages) and represent wholesale prices (Andrée, 2021a). Meanwhile, the crowdsourced price data (at level 0) were submitted for a variety of selling packages in the farmgate, retail, and wholesale market segments, and the dataset contains information that differentiates the packaging units (Solano-Hermosilla, et al., 2021; Arbia, et al., 2023). Our results align with the previous finding that the price per unit of commodity (i.e. ₦/kg) is usually lower in the wholesale segment than in the retail segment (Hirvonen, et al., 2021); however, the correlation between AI-estimated prices and the wholesale subset of the crowdsourced data remained strong (r² = 0.93). Realistically, markets in LMICs are predominantly characterized by sellers and buyers that transact commodities in different packaging units within or between stores or sheds. Therefore, to be agile and viable, price monitoring systems must be sufficiently robust to track prices across market segments and diverse packaging units, while extracting market signals based on price per unit, regardless of the source.
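The per-unit normalization that makes prices quoted for different packages comparable can be sketched as follows. This is a minimal Python illustration under stated assumptions: the unit names and kilogram factors in the hypothetical `KG_PER_UNIT` table are placeholders (the study relied on the standard conversion units documented by Arbia, et al. (2023), which are not reproduced here), and the sample submissions are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical kilograms per package; NOT the study's conversion table.
KG_PER_UNIT = {"kg": 1.0, "bag_25kg": 25.0, "bag_50kg": 50.0, "bag_100kg": 100.0}


def price_per_kg(package_price: float, unit: str) -> float:
    """Convert a price quoted for one package into NGN per kilogram."""
    try:
        kg = KG_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"no conversion factor for unit {unit!r}") from None
    return package_price / kg


def mean_price_per_kg_by_segment(submissions):
    """Average NGN/kg within each market segment, whatever the package size."""
    by_segment = defaultdict(list)
    for segment, unit, package_price in submissions:
        by_segment[segment].append(price_per_kg(package_price, unit))
    return {segment: mean(values) for segment, values in by_segment.items()}


# Hypothetical submissions: (market segment, package unit, price per package in NGN)
submissions = [
    ("wholesale", "bag_100kg", 45000.0),
    ("wholesale", "bag_50kg", 23000.0),
    ("retail", "kg", 520.0),
    ("retail", "bag_25kg", 12500.0),
]

print(mean_price_per_kg_by_segment(submissions))
# {'wholesale': 455.0, 'retail': 510.0}
```

Once every price is expressed per kilogram, quotes from different packages can be pooled or compared segment by segment; raising an error for an unknown unit, rather than silently skipping it, keeps unconverted prices from leaking into the averages.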
In essence, the highly positive results from this analysis, obtained without distinguishing between the source market segments and/or packaging units associated with the prices, indicate the plausibility of the crowdsourced and AI-estimated price data for overall market surveillance. As both innovative approaches [co-]evolve, the findings from this study provide instructive evidence to support further development of these methods towards bridging extant data gaps in food price monitoring systems within LMICs, and contribute to the advancement of data-rich analytics, including scalable nowcasting and forecasting of food insecurity over time and at regional, national, or sub-national levels (Andrée, 2022).

Methods

This paper leveraged previously published food price datasets, available in different public repositories (Andrée, 2021a; Solano-Hermosilla, et al., 2021; Arbia, et al., 2023; Andrée & Pape, 2023), which were coupled with unpublished metadata (mainly the ID of the data submitter: volunteer crowd or trained enumerator) and wrangled to establish linkages across data sources and generate new evidence regarding the validity of the high-frequency data outputs from the newer methods (i.e. crowdsourcing and AI-based imputation). Briefly, the FPCA crowdsourced price data were submitted by ~1,200 volunteer citizens (the so-called “crowd”) who were onboarded and encouraged to submit prices of six (6) major commodities whenever they leisurely or purposefully visited the nearest market within or beyond their locality (Adewopo, et al., 2022). The crowdsourced data were spatially rich because volunteers submitted georeferenced datapoints from many (>100) market locations and across nine (9) market types, which all fall within three major market segment classes: farmgate (nearest to production locations), retail, and wholesale (Arbia, et al., 2023; Solano-Hermosilla, et al., 2022).
We accessed the monthly AI-estimated prices with the version date of 09/26/2023 from the World Bank microdata library. The AI-estimated data were generated by imputing sparse, incomplete, or infrequent price data gathered from multiple sources and provided by the Humanitarian Data Exchange (HDX) in partnership with the World Food Programme (WFP). The imputation uses hybrid machine learning models that integrate local relationships within specific regions of the feature space, similar to the concept of local receptive fields in convolutional neural networks (CNNs), to support investigation of local price dynamics in markets where prices are sensitive to localized shocks and traditional data are not available (Andrée, 2021a). We computed basic statistical metrics and implemented a dual-layered correlative analysis. The first layer focused on the relationship between daily crowdsourced prices and ground truth prices, and this cascaded into a second-layer assessment of the relationship between monthly averages of crowdsourced prices and monthly AI estimates. Although the AI estimates include open, high, low, and close (OHLC) prices, we opted to use closing prices because they often represent the prevailing market sentiment within each period. The raw intraday crowdsourced dataset (level 0 data) was downloaded and queried for the commodities common to both the actual data submissions and the AI estimates (rice and maize). Both commodities are major staples that are commonly traded in the region and nationally. The ground truth data collection was implemented in parallel with the submission of data by crowd volunteers during the final 8 months of the FPCA initiative (EC-JRC, 2022). Twelve (12) enumerators resident at different locations dispersed across the focal region (Figure 1) were duly trained with a data collection protocol. Visiting the nearest market daily, the enumerators submitted observed or transactional data through a preconfigured data collection survey instrument on a smartphone.
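The dual-layered design described above can be sketched as two inner joins: daily means of crowd and enumerator submissions paired on matching dates, and monthly crowd means paired with the AI monthly closing price. This is a minimal Python illustration, not the authors' R workflow; the record fields (`date`, `role`, `price_per_kg`) and the OHLC dictionary layout are hypothetical and do not reflect the schemas of the published files.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def mean_by(records, key_fn):
    """Group records by key_fn and average their per-kg prices."""
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec["price_per_kg"])
    return {key: mean(prices) for key, prices in groups.items()}


def pair_on_common_keys(a, b):
    """Inner join of two {key: value} series; unmatched periods are dropped."""
    return [(key, a[key], b[key]) for key in sorted(a.keys() & b.keys())]


def layer1_daily_pairs(submissions):
    """Layer 1: daily crowd mean vs daily enumerator mean on matching dates."""
    crowd = [r for r in submissions if r["role"] == "crowd"]
    enumerators = [r for r in submissions if r["role"] == "enumerator"]
    return pair_on_common_keys(
        mean_by(crowd, lambda r: r["date"]),
        mean_by(enumerators, lambda r: r["date"]),
    )


def layer2_monthly_pairs(submissions, ai_ohlc):
    """Layer 2: monthly crowd mean vs AI-estimated monthly closing price."""
    crowd = [r for r in submissions if r["role"] == "crowd"]
    crowd_monthly = mean_by(crowd, lambda r: (r["date"].year, r["date"].month))
    ai_close = {month: bar["close"] for month, bar in ai_ohlc.items()}
    return pair_on_common_keys(crowd_monthly, ai_close)


# Hypothetical inputs in NGN/kg.
submissions = [
    {"date": date(2021, 3, 1), "role": "crowd", "price_per_kg": 500.0},
    {"date": date(2021, 3, 1), "role": "crowd", "price_per_kg": 520.0},
    {"date": date(2021, 3, 1), "role": "enumerator", "price_per_kg": 505.0},
    {"date": date(2021, 3, 2), "role": "crowd", "price_per_kg": 515.0},
]
ai_ohlc = {(2021, 3): {"open": 480.0, "high": 530.0, "low": 470.0, "close": 512.0}}

print(layer1_daily_pairs(submissions))
# [(datetime.date(2021, 3, 1), 510.0, 505.0)]
print(layer2_monthly_pairs(submissions, ai_ohlc))
# one pair for (2021, 3): crowd mean ~511.67 vs AI close 512.0
```

The 2 March crowd submission has no enumerator counterpart, so it drops out of the first layer but still contributes to the monthly crowd mean in the second layer.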
Multiple enumerators were enlisted to improve coverage of multiple market locations and to minimize potential human bias and artefacts in the data. It should be noted that the different unit packages for the prices submitted by the volunteer crowd and enumerators reflect the market reality that commodities are usually transacted in diverse unit packages, which are standardized across various market segments and geographies. The price per unit (₦/kg) for each submitted datapoint was then computed with reference to standard conversion units (Arbia, et al., 2023). Prior to implementing the two-step analysis, the raw intraday crowdsourced and enumerator-submitted prices were checked for obvious outliers, based on expert observation and prior knowledge of realistic price ranges for each commodity during the time period. Outliers were identified with the Tukey outlier detection method (Tukey, 1977; Dastjerdy, et al., 2023) by calculating the interquartile range and applying the fence rule to filter values. Spurious datapoints (1.3% of the entire dataset) were excluded from further analysis. Next, the intraday crowdsourced and enumerator-submitted prices were aggregated into daily, weekly, and monthly averages for each commodity type/sub-type. To assess the degree of closeness between the prices from both sources, we conducted pairwise Pearson's correlation analysis and fitted a linear trend between paired price datasets, as described in the literature (Taylor, 1990; Senthilnathan, 2019). While the correlation coefficient (R) indicates the existence, direction, and strength of a linear relationship between the prices, the coefficient of determination (r²) indicates the proportion of variance in one price series accounted for by the linear trend fitted on the other, based on the co-variability of both prices (Equations 1 and 2). R ranges from -1 to +1, from a perfectly negative to a perfectly positive linear relationship, whereas r² ranges from 0 to 1, directly indicating the proportion of variance explained.
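The screening and agreement statistics described in this section can be sketched in a few functions: Tukey's fences on the interquartile range, Pearson's R in the computational form of Equation 1, r² as in Equation 2 (one minus the residual sum of squares of a least-squares line over the total sum of squares), and the paired-sample t statistic used to compare mean prices. This is a minimal Python illustration, not the authors' R code. The fence multiplier k = 1.5 is the conventional Tukey default and is an assumption, since the text does not state the multiplier used; quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics as R's default quantile definition does. The p-value for the t statistic would come from a t distribution with n − 1 degrees of freedom (for example, via R's `t.test(x, y, paired = TRUE)`), which the Python standard library does not provide.

```python
import math
from statistics import mean, quantiles, stdev


def tukey_filter(values, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def pearson_r(x, y):
    """Pearson correlation in the computational form of Equation 1."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def r_squared(x, y):
    """Equation 2: 1 - SS_res / SS_tot for the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    intercept = y_bar - slope * x_bar
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_bar) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def paired_t(x, y):
    """Paired-sample t statistic and degrees of freedom for mean(x - y) = 0."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


print(tukey_filter([500, 505, 510, 515, 520, 2000]))  # [500, 505, 510, 515, 520]
x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
print(round(pearson_r(x, y) ** 2, 6), round(r_squared(x, y), 6))  # 0.6 0.6
print(paired_t([10, 12, 14], [9, 10, 13]))  # t approximately 4.0, with 2 degrees of freedom
```

For a simple linear fit with an intercept, r² from Equation 2 equals the square of R from Equation 1, which the second usage line illustrates.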
The correlation and regression analyses were performed between the crowdsourced and enumerator prices at daily, weekly, and monthly timesteps, using relevant R packages and functions. Generally,

$$R = \frac{n\sum_{i=1}^{k} x_i y_i - \left(\sum_{i=1}^{k} x_i\right)\left(\sum_{i=1}^{k} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{k} x_i^2 - \left(\sum_{i=1}^{k} x_i\right)^2\right]\left[n\sum_{i=1}^{k} y_i^2 - \left(\sum_{i=1}^{k} y_i\right)^2\right]}} \qquad (1)$$

$$r^2 = 1 - \frac{\sum_{i=1}^{k} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{k} \left(y_i - \bar{y}\right)^2} \qquad (2)$$

where n is the number of observations, x is the prices submitted from source 1, y is the prices submitted from source 2, i is the timepoint in the data collection period, k is the last timepoint in the data sequence, x_i is the price from data source 1 at the i-th timepoint, y_i is the price from data source 2 at the i-th timepoint, ŷ_i is the value of y_i predicted by the linear trend fitted on x_i, and ȳ is the mean value of y. Also, following previous literature (Xu, et al., 2017; Snedecor & Cochran, 1989), we conducted paired-sample Student's t-tests to assess whether the differences between mean prices from the data sources were significant during the period. The tests were conducted under the null hypothesis that the mean price from one data source does not differ from that of the other. Inferential test statistics were evaluated at a significance level (α) of 0.05, and the null hypotheses were rejected or retained based on the final p-values.

Data Availability

Generally, the datasets for the analyses were previously published and are accessible through two (2) major public repositories. The overall crowdsourced price data were published on the data portal of the European Commission Joint Research Center (EC-JRC, 2022; Solano-Hermosilla, et al., 2021), while the AI-estimated price data are available through the World Bank's microdata library.² The initial correlative analysis between crowdsourced and ground truth prices was based on the raw (level 0) intraday submissions, subset by internal identifier (ID) records that differentiated volunteer crowds from trained enumerators in the data pool. While the full pre-processed crowdsourced datasets in the public database are presented as daily averages for ease of use and understanding by public users (EC-JRC, 2022), detailed intraday datasets for the second major wave of data collection (in 2021) are accessible through another repository on the EC-JRC DataM portal (Solano-Hermosilla, et al., 2021). The full monthly AI-estimated prices are presented as open, high, low, and close prices for each country in the aforementioned repository.

² The search tags are, respectively, RTFP, RTEP and RTFX for Real-Time Food Prices, Real-Time Energy Prices, and Real-Time foreign eXchange Rates. Links to the dedicated data sets have been provided in the main text, and all data sets can be retrieved by querying RTP (Real-Time Prices): https://microdata.worldbank.org/index.php/catalog/?page=1&sk=RTP&ps=15

Code Availability

Data analysis was conducted with R within the RStudio environment (R Core Team, 2013). The entire code is saved on GitHub to reproduce the results and charts, and can be accessed at https://github.com/PJNation/FoodPriceAnalytics.

References

Adewopo, J., Solano-Hermosilla, G., Micale, F., & Colen, L. (2022). Crowdsourced data reveal threats to household food security in near real-time during COVID-19 pandemic. In J. McDermott, & J. Swinnen (Eds.), COVID-19 and global food security: Two years later (pp. 40-45). International Food Policy Research Institute (IFPRI). doi:10.2499/9780896294226_05

Adewopo, J., Solano-Hermosilla, G., Colen, L., & Micale, F. (2021). Using crowd-sourced data for real-time monitoring of food prices during the COVID-19 pandemic: Insights from a pilot project in northern Nigeria. Global Food Security, 29, 100523. doi:10.1016/j.gfs.2021.100523

Aldoseri, A., Al-Khalifa, N., & Hamouda, A. (2023). Re-Thinking Data Strategy and Integration for Artificial Intelligence: Concepts, Opportunities, and Challenges. Applied Sciences, 13(12), 7082. doi:10.3390/app13127082

Andrée, B.P.J. (2021a). Estimating food price inflation from partial surveys. World Bank, Development Data Group. The World Bank Group. Washington, D.C. Retrieved 01/05/2024, from https://openknowledge.worldbank.org/server/api/core/bitstreams/7bb07f51-6a93-5207-a43f-eb563b24d87f/content

Andrée, B.P.J. (2021b). Monthly food price inflation estimates by country (Version 2023-12-11). Washington, D.C., USA. Retrieved 10/11/2023, from World Bank Microdata Library: https://microdata.worldbank.org/index.php/catalog/4509

Andrée, B.P.J. (2022). Machine Learning Guided Outlook of Global Food Insecurity Consistent with Macroeconomic Forecasts. Policy Research Working Papers. Washington, D.C., USA. doi:10.1596/1813-9450-10202

Andrée, B.P.J., & Pape, U. (2023). Machine Learning Imputation of High Frequency Price Surveys in Papua New Guinea. 10559. Washington, D.C.: World Bank. Retrieved 02/01/2024, from http://hdl.handle.net/10986/40410

Andrée, B.P.J., Chamorro, E., Andres, F., Kraay, A., Spencer, P., & Wang, D. (2020). Predicting Food Crises (English). Policy Research working paper, no. WPS 9412. Washington, D.C., United States. Retrieved 02/01/2024, from http://documents.worldbank.org/curated/en/304451600783424495/Predicting-Food-Crises

Arbia, G., Solano-Hermosilla, G., Nardelli, V., Micale, F., Genovese, G., Amerise, I., & Adewopo, J. (2023). From mobile crowdsourcing to crowd-trusted food price in Nigeria: statistical pre-processing and post-sampling. Scientific Data, 10(446). doi:10.1038/s41597-023-02211-1

Baquedano, F. (2015). Developing a price warning indicator as an early warning tool - a compound growth approach. GIEWS – Global Information and Early Warning System on Food and Agriculture. Rome, Italy. Retrieved from https://www.fao.org/fileadmin/user_upload/foodprice/docs/resources/a-i7550e.pdf

BudgIT. (2022). State of States 2022 Edition. [Okeowo, Gabriel; Fatoba, Iyanuoluwa, eds]. BudgIT.
Retrieved 01/02/2024, from https://yourbudgit.com/wp-content/uploads/2022/10/2022-State-of-states_Official.pdf

Dastjerdy, B., Saeidi, A., & Heidarzadeh, S. (2023). Review of Applicable Outlier Detection Methods to Treat Geomechanical Data. Geotechnics, 3, 375-396. doi:10.3390/geotechnics3020022

Dou, Z., Stefanovski, D., David, G., Lindem, M., Rozin, P., Chen, T., & Chao, A. M. (2020). Household Food Dynamics and Food System Resilience Amid the COVID-19 Pandemic: A Cross-National Comparison of China and the United States. Frontiers in Sustainable Food Systems, 4. doi:10.3389/fsufs.2020.577153

Easterly, W., & Fischer, S. (2021). Inflation and the Poor. Journal of Money, Credit and Banking, 33(2), 160.

EC-JRC. (2022). Food Price Crowdsourcing in Africa - FPCA. Retrieved 01/03/2024, from European Commission Joint Research Center Data Portal: https://datam.jrc.ec.europa.eu/datam/mashup/FP_NGA/index.html; Last accessed: 01/05/2024

FAO. (2017). Building Agricultural Market Information Systems: A literature review. Rome, Italy. Retrieved 01/07/2024, from http://www.fao.org/3/a-i7151e.pdf

FAO, IFAD, UNICEF, WFP and WHO. (2023). The State of Food Security and Nutrition in the World 2023. Urbanization, agrifood systems transformation and healthy diets across the rural–urban continuum. Rome, Italy. doi:10.4060/cc3017en

FSIN and Global Network Against Food Crises. (2023). Global Report on Food Crises 2023 - GRFC2023. Rome. Retrieved 03/01/2024, from https://www.fsinplatform.org/sites/default/files/resources/files/GRFC2023-compressed.pdf

Galtier, F., David-Benz, H., Subervie, J., & Egg, J. (2014). Agricultural market information systems in developing countries: New models, new impacts. Cahiers Agricultures, 23, 232-244. doi:10.1684/agr.2014.0716

Gatti, R., Lederman, D., Islam, A., Bennet, F., & Andrée, B. (2023). Altered Destinies: The Long-Term Effects of Rising Prices and Food Insecurity in the Middle East and North Africa. Middle East and North Africa Economic Update.
Washington, D.C., USA. Retrieved 02/29/2024, from https://elibrary.worldbank.org/doi/abs/10.1596/978-1-4648-1974-2

Gavin, W., & Mandal, R. (2002). Predicting inflation: food for thought. The Regional Economist, Jan, 4-9.

Green, R., Cornelsen, L., Dangour, A., Turner, R., Shankar, B., Mazzocchi, M., & Smith, R. (2013). The effect of rising food prices on food consumption: systematic review with meta-regression. BMJ, 346(f3703). doi:10.1136/bmj.f3703

Headey, D., & Ruel, M. (2023). Food Inflation and Child Undernutrition in low and middle income countries. Nature Communications, 14(5761). doi:10.1038/s41467-023-41543-9

Hirvonen, K., Minten, B., Mohammed, B., & Seneshaw, T. (2021). Food prices and marketing margins during the COVID-19 pandemic: Evidence from vegetable value chains in Ethiopia. Agricultural Economics, 52(3), 407-421. doi:10.1111/agec.12626

Joutz, F. (1997). Forecasting CPI Food Prices: An Assessment. American Journal of Agricultural Economics, 79(5), 1681-1685.

Kalkuhl, M., von Braun, J., & Torero, M. (2016). Volatile and Extreme Food Prices, Food Security, and Policy: An Overview. In M. Kalkuhl, J. von Braun, & M. Torero (Eds.), Food price volatility and its implications for food security and policy. Cham, Switzerland: Springer. doi:10.1007/978-3-319-28201-5

Kisan, G. (2015). Review of global food price databases: Overlaps, gaps and opportunities to improve harmonization. Food Security Information Network. Retrieved 01/10/2024, from https://reliefweb.int/report/world/review-global-food-price-databases-overlaps-gaps-and-opportunities-improve

Lentz, E., & Maxwell, D. (2022). How do information problems constrain anticipating, mitigating, and responding to crises? International Journal of Disaster Risk Reduction, 81(103242). doi:10.1016/j.ijdrr.2022.103242

Lu, Y., Shen, M., Wang, H., & Wang, X. (2021). Machine Learning for Synthetic Data Generation:. Journal of Latex Class Files, 14(8).
Retrieved from https://arxiv.org/pdf/2302.04062.pdf

Manners, R., Adewopo, J., Niyibituronsa, M., Remans, R., Ghosh, A., Schut, M., et al. (2022). Leveraging Digital Tools and Crowdsourcing Approaches to Generate High-Frequency Data for Diet Quality Monitoring at Population Scale in Rwanda. Front. Sustain. Food Syst., 5. doi:10.3389/fsufs.2021.804821

Minet, J., Curnel, Y., Gobin, A., Goffart, J., Melard, F., Tychon, B., et al. (2017, November). Crowdsourcing for agricultural applications: A review of uses and opportunities for a farmsourcing approach. Computers and Electronics in Agriculture, 142(Part A), 126-138. doi:10.1016/j.compag.2017.08.026

Nguyen, T., Nguyen, H., Lee, J., Wang, Y., & Tsai, C. (2023). The consumer price index prediction using machine learning approaches: Evidence from the United States. Heliyon, 9(10), e20730. doi:10.1016/j.heliyon.2023.e20730

R Core Team. (2013). R: A language and environment for statistical computing. R Found. Stat. Comput. Vienna, Austria.

Savage, N. (2023, 04 27). Synthetic data could be better than real data. doi:10.1038/d41586-023-01445-8

Schneider, J., Zabel, F., & Mauser, W. (2022). Global inventory of suitable, cultivable and available cropland under different scenarios and policies. Nature Scientific Data, 9(527). doi:10.1038/s41597-022-01632-8

Schneider, K., Fanzo, J., Haddad, L., Herrero, M., Moncayo, J., Herforth, A., et al. (2023). The state of food systems worldwide in the countdown to 2030. Nature Food, 4, 1090-1110. doi:10.1038/s43016-023-00885-9

Senthilnathan, S. (2019). Usefulness of Correlation Analysis. doi:10.2139/ssrn.3416918

Snedecor, G., & Cochran, W. (1989). Statistical Methods (8th ed.). Ames: Iowa State University Press.

Solano-Hermosilla, G., Adewopo, J., Gorrín-González, C., Micale, F., Arbia, G., & Nardelli, V. (2021). FPCA-II. Food Price Crowdsourcing Africa-expansion. Sevilla, Spain: European Commission, Joint Research Centre (EC-JRC) [Dataset].
Retrieved from http://data.europa.eu/89h/f3bc86b0-be5f-4441-8370-c2ccb739029e

Solano-Hermosilla, G., Barreiro-Hurle, J., Adewopo, J., & Gorrin-Gonzalez, C. (2022). Increasing engagement in price crowdsourcing initiatives: Using nudges in Nigeria. World Development, 152(105818). doi:10.1016/j.worlddev.2022.105818

Subbaraman, B., & Goodwin, E. (2017). FAO AMIS Nigeria. Final Report. Contract Report, Knoema Corporation, Agricultural Data Exchange, Kaduna.

Taylor, R. (1990). Interpretation of the correlation coefficient: A Basic Review. JDMS, 1, 35-39. doi:10.1177/875647939000600106

Thomas, A. (2023). Food Insecurity in Nigeria: Food Supply Matters. IMF Country Report No 2023/094 and Special Issue SIP/2023/018. 27p. IMF Country Report, International Monetary Fund (IMF). Retrieved 01/02/2024, from https://www.imf.org/-/media/Files/Publications/Selected-Issues-Papers/2023/English/SIPEA20230

Tukey, J. (1977). Exploratory Data Analysis; Addison-Wesley series in behavioral science-quantitative methods. Reading, MA, USA: Addison-Wesley.

UNFPA. (2023). World Population Dashboard Nigeria. United Nations. Retrieved 01/03/2024, from United Nations Population Fund Dashboard: https://www.unfpa.org/data/world-population/NG

Wang, D., Andrée, B.P.J., Chamorro, A., & Girouard, S. (2022). Transitions into and out of food insecurity: A probabilistic approach with panel data evidence from 15 countries. World Development, 159. doi:10.1016/j.worlddev.2022.106035

Waterlander, W. E., Jiang, Y., Nghiem, N., Eyles, H., Wilson, N., Cleghorn, C., et al. (2019). The effect of food price changes on consumer purchases: a randomised experiment. The Lancet Public Health, 4(8), e394-e405.

World Bank. (2022). Poverty and Shared Prosperity 2022: Correcting Course. Overview Booklet. World Bank Group. Washington, D.C.

World Bank. (2023). Food Security Update. Washington, D.C.: The World Bank.
Retrieved 01/02/2024, from https://thedocs.worldbank.org/en/doc/40ebbf38f5a6b68bfc11e5273e1405d4-0090012022/related/Food-Security-Update-XCVII-December-14-23.pdf

Xu, M., Fralick, D., Zheng, Z., Wang, B., Tu, M., & Feng, C. (2017). The Differences and Similarities Between Two-Sample T-Test and Paired T-Test. Shanghai Arch Psychiatry, 29(3), 184-188. doi:10.11919/j.issn.1002-0829.217070

Zeug, H., Zeug, G., Bielski, C., Solano-Hermosilla, G., & M'barek, R. (2017). Innovative Food Price Collection in Developing Countries, Focus on Crowdsourcing in Africa. European Union. Luxembourg: Office of the European Union. EUR 28247 EN. https://publications.jrc.ec.europa.eu/repository/handle/JRC103294

Author contributions

Adewopo, J. [Idea, Analysis, Lead Author], Andrée, B.P.J. [Idea, Paper writing and review], Peter, H. [Data collection, revision of manuscript], Solano-Hermosilla, G. [Data collection, review of manuscript], Micale, F. [Data collection, review of manuscript].

Funding Support

Funding by the Federal Ministry for Economic Cooperation and Development (BMZ, Germany) as part of the World Bank’s Food Systems 2030 (FS2030) Multi-Donor Trust Fund program (grants TF073570 and TF0C0728) is gratefully acknowledged. We also acknowledge initial funding for ground truth data collection from the European Commission Joint Research Center (EC-JRC) and support from the Agropolis Foundation for the research mobility under the Louis Malassis International Scientific Prize 2019.

Acknowledgments

The authors thank Tomaso Ceccarelli and Liesbeth Colen for comments and insights on previous outputs from this work. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Competing interests

We declare no competing or conflicting interests regarding this manuscript.
Supplementary Materials

[Figure: Appendix 1, panels a and b]

Appendix 1: Co-evolution (Panel a) and relationship (Panel b) of Thailand rice prices submitted by trained local enumerators (ground truth) and the volunteer crowd (crowdsourced) over an 8-month period (Mar 2021 – Oct 2021) within the northern region of Nigeria. The coherence of the price signal and the strength of the relationship improved progressively as intraday datapoints were averaged to daily, weekly, and monthly time intervals. R denotes the correlation coefficient, while r² denotes the coefficient of determination, indicating the measure of co-variability between both datasets.

[Figure: Appendix 2, panel a (temporal trend) and panel b (relationship), annotated R: 0.94 | r²: 0.88, p-value < 0.001]

Appendix 2: Temporal trend (a) and relationship (b) between monthly average prices of rice (imported) submitted by the volunteer crowd (crowdsourced) and estimated by artificial intelligence (AI-estimated) within a fragile context in the northern region of Nigeria during a 3-year period (2021-2023). Crowdsourced prices represent the monthly post-sampled average of intraday submissions by volunteers across over 100 geolocated market points, while AI-estimated prices represent the average estimated monthly closing price over four (4) geolocated markets within the study region.

[Figure: Appendix 3, six panels by State and commodity subtype (panel labels not recoverable), annotated R = 0.91, r² = 0.82; R = 0.93, r² = 0.87; R = 0.96, r² = 0.92; R = 0.95, r² = 0.90; R = 0.94, r² = 0.89; R = 0.95, r² = 0.91; p < 0.0001 for all panels]

Appendix 3: Relationship between crowdsourced and AI-estimated prices of rice (imported), disaggregated by State and commodity subtype within a fragile context in the northern region of Nigeria. The intraday volunteer-submitted crowdsourced prices were collected over a 3-year period (2019-2021) and averaged into monthly values. The AI-estimated monthly closing prices were averaged over four (4) market locations within the study region. r² denotes the coefficient of determination, and the significance of the relationship is tested at α = 0.05.