Estimating Local Agricultural GDP across the World



Introduction
According to the Food and Agriculture Organization of the United Nations, at least 2.5 billion people depend on the agricultural sector for their livelihood, and the sector provides a key source of employment and income for poor and vulnerable people (FAO, 2013, 2019, 2021). Yet, economic statistics of the agricultural sector are frequently produced at a national or lower administrative level and may not adequately capture local variation. Furthermore, a spatial mismatch may exist between the geographic unit of interest, such as the natural area of a river, and the administrative area. Lastly, local conditions can pose challenges to measurement across the world. Agricultural land area is approximately five billion hectares, and access to data capture and reporting in fragile, conflict and violence states may not allow current or complete geographic coverage.
Detailed agricultural data are critical to examining a wide range of agricultural issues including technology and land use (e.g. Bella and Irwin, 2002; Luijten, 2003; Staal et al., 2002; Samberg et al., 2016), exposure to natural hazards (e.g. Murthy et al., 2015) and patterns and productivity of economic development (e.g. Nelson, 2002; Elhorst and Strijker, 2003; Gollin et al., 2014; Reddy and Dutta, 2018). Carrão et al. (2016) examine the exposure of people and economic activity to drought using measures of physical elements (e.g. cropland and livestock). Rentschler and Salhab (2020) find that low and middle-income countries have 89% of the global flood-exposed population and that almost 600 million poor people are directly exposed to the risk of intense flooding. Vesco et al. (2021) examine linkages between climate variability and agricultural production as well as conflict. They find that climate variability contributes to an increase in the spatial concentration of agricultural production within countries. Furthermore, in countries with a high share of agricultural employment in the national workforce, they find this combined effect increases the likelihood of conflict onset. To better target rural development strategies for economic growth and poverty reduction, as well as conserve the natural resource base for long-term sustainable development, we need to accurately delineate the spatial distribution of agricultural resources and production activities (Wood et al., 1999).
One method to partially address the spatial mismatch between administrative units and other geographic units, such as areas affected by natural hazards, uses the gridded (raster) data format, which provides an intermediate and consistent unit for disaggregation and aggregation (e.g. UNISDR, 2011). Data-disaggregation methods can use detailed data to inform local-level estimates from data aggregated over large areas (e.g. see review in Pratesi et al., 2015). Several spatial data products from global models are available to estimate population at a local level (see review in Leyk et al., 2019).
Previous evidence-based risk analyses take advantage of global data of hazards to estimate exposure of population and economic activity (e.g. Gunasekera et al., 2015, 2018; Ward et al., 2020; Rentschler and Salhab, 2020). Gross Domestic Product (GDP) is a critical economic indicator in the measurement and monitoring of a country's economy, and it is typically available only at national and occasionally sub-national levels. Regional indicators provide the variation necessary to forecast regional GDP (Lehmann and Wohlrabe, 2015) and food security (Andree et al., 2020). Previous efforts to estimate local GDP use high resolution spatial auxiliary information such as luminosity or population data to provide local variation.
Methods by Nordhaus (2006), World Bank and UNEP (2011), Kummu et al. (2018) and Murakami and Yamagata (2019) took advantage of gridded population data, which are the result of a model disaggregating the most detailed level of population data into grids (e.g. see review in Leyk et al., 2019). However, wealth is not evenly distributed among people nor infrastructure (Berg et al., 2018). In fact, the divide between the rich and poor is even widening in our time (Dabla-Norris et al., 2015). The method used in World Bank and UNEP (2011) stratifies the population by rural and urban, yet the definition of these geographic areas can vary based on the selection of the population model (Leyk et al., 2019). These measurements matter in application to stylized facts such as the strong negative correlation between a country's level of urbanization and the size of its agricultural sector (Roberts et al., 2017). The assumption of a uniform distribution of labor in agriculture is another key concern (Gollin et al., 2014). Other methods used land cover such as vegetation and built-up indices, but did not incorporate types of agriculture like cropland and livestock (Gunasekera et al., 2015; Goldblatt et al., 2019).
Other methods to estimate GDP at a local level take advantage of the lights at night dataset. Doll et al. (2006) and Elvidge et al. (2009) found nighttime lights to provide a uniform, consistent, and independent estimate for economic activity, and several other studies (e.g. Chen and Nordhaus, 2011; Henderson et al., 2012; Ghosh et al., 2010; Bundervoet et al., 2015; Wang et al., 2019; Eberenz et al., 2020; Wang and Sun, 2021) utilized this striking correlation between luminosity and economic activities to estimate economic output on the ground. While night light is a good reflection of economic activities in manufacturing and urban areas, night light data may not capture agricultural activity, because they require areas to emit light. Bundervoet et al. (2015) suggest that agricultural indicators rather than rural population could improve the estimation of GDP given the importance of agriculture in many of the economies in their sample of Africa. Gibson et al. (2021) find that night time lights data are a poor predictor of economic activity in low population density rural areas.
In this paper, we present a high resolution gridded Agricultural GDP (henceforth AgGDP) dataset that is produced through a spatial allocation model by distributing national and sub-national statistics to 5 arc-minute pixels based on satellite-derived information of constituents of AgGDP, including forestry, hunting, and fishing, as well as cultivation of crops and livestock production.¹ We make two main contributions. First, we construct a global dataset of gridded AgGDP. This entails a massive effort of data collection and integration. We extend and apply the cross-entropy framework developed in the Spatial Production Allocation Model (SPAM) for crops, which pioneered the use of cross-entropy optimization in spatial allocation (You and Wood, 2003; You et al., 2014, 2018; Yu et al., 2020). We construct and integrate global datasets of the components of agricultural GDP as priors and then reconcile the values with the regional account statistics using cross-entropy optimization. Second, we contribute to efforts assessing the exposure of economic activity to natural hazards with a focus on agricultural GDP. Significant progress has been made in measuring physical assets such as built-up area and in estimating hazards to quantify the exposure of those assets to natural hazards. However, the spatial distribution of agricultural GDP is less known. We therefore apply these data to inform efforts quantifying the population and agricultural GDP at risk from drought and water scarcity.
The rest of this paper is structured as follows. The next section provides a detailed description of the methodology and data. Then, we present the model results and data. Next, we discuss the results along with validation, followed by usage notes from a fitness-for-use perspective. Finally, we provide concluding remarks.

Methodology and data
Following the composite structure of agricultural GDP, we disaggregate the national and sub-national statistics into a global grid through a cross-entropy allocation model. Given the availability of data and the global scope, our efforts varied on adjusting official statistics and creating priors for different components. Below we discuss the construction of each component, AgGDP statistics and the allocation model, followed by the global natural hazards data. Given the spatial resolution and year of reference of the input data for the crop value of production, we estimate AgGDP for the year 2010 on 5 arc-minute grids (approximately 10 x 10 km) across the world.

Construction of components
For each pixel, we construct an estimated value of production based on high spatial resolution information of the five components that serve as priors in the modeling process: crop, livestock, forestry, fishing, and hunting. Given the lack of information on the hunting component, we disaggregate the forestry component into two parts: timber and non-timber products of forestry. The non-timber products of forestry include an even distribution of hunting. The construction of the five components is described below in four subsections: crop, livestock, forestry (timber and non-timber) and fishing.

¹ Agriculture, forestry, and fishing corresponds to ISIC divisions 1-3 and includes forestry, hunting, and fishing, as well as cultivation of crops and livestock production.

Crop value of production
The crop component in the gridded AgGDP is generated by multiplying the quantity of production from the global SPAM 2010 version 1 dataset² (You et al., 2018) with the producer prices at the country level from FAOSTAT (FAO, 2016) for each crop and then summing over crops.³ As mentioned earlier, SPAM is a cross-entropy model, which calculates a plausible allocation of crop areas and production to approximately 10 km pixels, based on agricultural statistics at national and sub-national levels, combined with gridded layers of cropland, irrigated areas, population density and potential crop areas and yields (Yu et al., 2020). SPAM's output distinguishes between 42 crops (33 individual crops, 9 aggregated crops) that together add up to practically all cultivated crops in a country, with four parameters: production, yield, physical area and harvest area.
For aggregated SPAM crops (such as other cereals, other pulses, vegetables, fruits, etc.), we computed their prices by taking the weighted average of their components, as follows:

$$\mathrm{Price}_{J_{agg}} = \frac{\sum_{j \in J_{agg}} \mathrm{price}_j \, \mathrm{prod}_j}{\sum_{j \in J_{agg}} \mathrm{prod}_j}$$

where $J_{agg}$ is the aggregated crop group, $j$ is any crop that belongs to $J_{agg}$, $\mathrm{Price}_{J_{agg}}$ is the price of the aggregated crop group, $\mathrm{price}_j$ is the price of crop $j$, and $\mathrm{prod}_j$ is the production of $j$.
For each grid, the value of crop production is thus:

$$\mathrm{Cropval}_i = \sum_{j} \mathrm{prod}_{i,j} \, \mathrm{price}_j$$

where $\mathrm{Cropval}_i$ is the value of total crop production in pixel $i$, $\mathrm{prod}_{i,j}$ is the production of crop $j$ in pixel $i$, and $\mathrm{price}_j$ is the price of crop $j$. A map of global gridded crop production value as a prior is shown in Figure 1.
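The two crop formulas can be illustrated with a minimal Python sketch. The crop names, prices and quantities below are hypothetical placeholders rather than FAOSTAT or SPAM values, and the function names are introduced here only for illustration.

```python
def aggregated_price(prices, production):
    """Production-weighted average price of an aggregated crop group."""
    total = sum(production[c] for c in prices)
    if total == 0:
        raise ValueError("aggregated group has no production")
    return sum(prices[c] * production[c] for c in prices) / total


def crop_value(pixel_production, price):
    """Value of crop production in one pixel: sum over crops of prod * price."""
    return sum(q * price[crop] for crop, q in pixel_production.items())


# Hypothetical inputs (illustrative only).
group_prices = {"oats": 150.0, "rye": 200.0}   # USD per tonne
group_prod = {"oats": 300.0, "rye": 100.0}     # tonnes at national level
other_cereals_price = aggregated_price(group_prices, group_prod)

prices = {"maize": 180.0, "other_cereals": other_cereals_price}
pixel = {"maize": 10.0, "other_cereals": 4.0}  # tonnes produced in one pixel
value = crop_value(pixel, prices)
```

With these placeholder numbers the aggregated price is 162.5 USD per tonne and the pixel value is 2,450 USD.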

Livestock production
Livestock accounts for an estimated 40% of the global value of agriculture output and plays an important role in ensuring the livelihoods and food security for over one-sixth of the world's population (FAO, 2018). Yet, it is still under rapid expansion as the global demand for animal-sourced products such as meat, milk, eggs, and hides continues to grow (Herrero and Thornton, 2013). While species and quantities of livestock raised vary among regions and farmers, five primary species (cattle, sheep, goats, pigs, and chicken) prevail worldwide and provide essential products for human consumption.
We calculate the component of livestock production in gridded AgGDP based on the distribution maps of the above five primary species from the Gridded Livestock of the World (Robinson et al., 2014; Gilbert et al., 2018) and FAOSTAT's value of production of livestock products (including meat, milk, eggs, honey and wool) (FAO, 2020). To facilitate comparison, the animal-specific density numbers are converted to one animal type by using International Livestock Units (Eurostat, 2018), as shown in Table 1. Then the densities of the animal equivalent values are multiplied by pixel areas to get the count of animals per grid, which is used to distribute FAOSTAT's national value of production proportionally and obtain the livestock production prior for each pixel:

$$\mathrm{lsval}_i = \mathrm{lsval}_x \times \frac{\mathrm{lsnum}_i}{\sum_{i' \in X} \mathrm{lsnum}_{i'}}$$

where $\mathrm{lsval}_i$ is the total value of livestock production in pixel $i$; $\mathrm{lsval}_x$ is the value of livestock production (meat, milk, eggs, honey and wool) that is reported at the national level; $\mathrm{lsnum}_i$ is the total number of equivalent animals in pixel $i$; and $X$ is a set including all pixels that fall within the boundary of a nation.

² Available at www.mapSPAM.info
³ As for the producer price, ideally, we need sub-national level figures, but such a dataset is not available globally. Therefore, we use FAOSTAT's national producer prices.
A map of global gridded livestock production value as a prior is shown in Figure 2.
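As a hedged illustration of the livestock prior, the sketch below converts species densities to equivalent animals and distributes a national value in proportion to them. The livestock-unit coefficients are illustrative placeholders and are not the values of Table 1; the pixel data and the national total are likewise hypothetical.

```python
# Illustrative livestock-unit coefficients; placeholders, NOT the Table 1 values.
LSU = {"cattle": 1.0, "sheep": 0.1, "goats": 0.1, "pigs": 0.3, "chicken": 0.01}


def animal_equivalents(density, pixel_area_km2):
    """Convert head per km2 by species into equivalent animals in the pixel."""
    return sum(LSU[s] * d for s, d in density.items()) * pixel_area_km2


def allocate_proportionally(national_value, weights):
    """Split a national total across pixels in proportion to the weights."""
    total = sum(weights)
    if total == 0:
        return [0.0 for _ in weights]
    return [national_value * w / total for w in weights]


# Two hypothetical pixels: (densities in head per km2, pixel area in km2).
pixels = [
    ({"cattle": 10.0, "sheep": 20.0}, 80.0),
    ({"pigs": 5.0, "chicken": 100.0}, 80.0),
]
lsnum = [animal_equivalents(d, a) for d, a in pixels]
lsval = allocate_proportionally(1_000_000.0, lsnum)  # hypothetical national value
```

By construction the pixel values sum to the national value, so the prior preserves the reported total.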

Forestry production and hunting
Forest resources have been utilized by people since the advent of civilization (Hossain et al., 2008). Even now, over a billion people still rely on forest resources for food security and income generation to some extent (FAO, 2018). In the world's least developed regions, 34 countries depend on fuelwood to provide more than 70% of energy, among which 13 nations require 90% of energy (FAO, 2018).
The contribution of forest production to AgGDP can be classified into two broad types: wood (logging) products and non-wood forest products. Wood (logging) products are the most exploited commodities in the forestry sector. The trees are cut down to be the raw materials for producing timber and pulp, which are further processed and converted into a number of derivatives, such as construction materials and paper products. Non-wood forest products are defined by the Food and Agriculture Organization of the United Nations (FAO).⁴ It is estimated that millions of households around the world depend on non-wood forest products for their livelihood. Some 80% of people in the developing world use these products in their everyday life (Sorrenti, 2016).

For a complete assessment of forest production priors, this study takes both wood and non-wood products into consideration. The gridded non-wood forest products dataset used in this study was jointly developed by Resources for the Future and the World Bank (Siikamäki et al., 2015) through an approach of meta-regression modeling, which integrates over 100 estimates at various locations from a literature review and multifold information on ecological and socioeconomic factors. The value of non-wood forest products is resampled to the 5 arc-minute grid cell size and converted to 2010 USD for consistency with other AgGDP components. As part of non-timber products, we include hunting with an even distribution across units and time given the lack of information.
The value of wood products per pixel is calculated based on forest loss from year 2010 to year 2011 excluding loss due to fire, with an assumption that the forests were mainly cut down for timber production. The Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover map (Friedl et al., 2010) for year 2011 is overlaid on top of that for year 2010 to detect the area that has changed from forest to non-forest.⁵ However, forest loss due to fire should be removed because it does not result in wood products. Thus, fire information for year 2010 is obtained from the NASA Fire Information for Resource Management System (FIRMS) (NASA, 2018) and areas that experienced forest fire are eliminated. After the identification of forest area change in each pixel, the value of wood production at national level is taken from an FAO-led project (Lebedys and Yanshu, 2014) and proportionally disaggregated to arrive at a pixel-wise value of wood products as follows:

$$\mathrm{Woodval}_i = (\mathrm{forestval}_x - \mathrm{nonwoodval}_x) \times \frac{\mathrm{forestloss}_i}{\sum_{i' \in X} \mathrm{forestloss}_{i'}}$$

where $\mathrm{Woodval}_i$ is the value of wood products in pixel $i$; $\mathrm{forestval}_x$ is the value of forest products reported at national level; $\mathrm{nonwoodval}_x$ is the value of non-wood products at national level, which is derived from Siikamäki et al. (2015); $\mathrm{forestloss}_i$ is the area of forest loss excluding loss to fire in pixel $i$; and, again, $X$ is a set including all pixels that fall within the boundary of a nation.

A map of global gridded wood forest production value as a prior is shown in Figure 3.
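The wood-products prior can be sketched as follows. As a simplifying assumption, each pixel here carries a single land-cover class per year and a single burned flag, whereas the actual procedure overlays finer MODIS and FIRMS layers; all names and numbers are hypothetical.

```python
FOREST, NON_FOREST = 1, 0


def forest_loss_area(cover_2010, cover_2011, burned, pixel_area):
    """Area that changed from forest to non-forest, skipping burned pixels."""
    return [
        area if (c0 == FOREST and c1 == NON_FOREST and not fire) else 0.0
        for c0, c1, fire, area in zip(cover_2010, cover_2011, burned, pixel_area)
    ]


def wood_value(forestval_x, nonwoodval_x, forestloss):
    """Distribute national wood value (forest minus non-wood) by forest loss."""
    national_wood = forestval_x - nonwoodval_x
    total = sum(forestloss)
    if total == 0:
        return [0.0 for _ in forestloss]
    return [national_wood * a / total for a in forestloss]


# Hypothetical five-pixel country.
cover_2010 = [FOREST, FOREST, FOREST, NON_FOREST, FOREST]
cover_2011 = [NON_FOREST, NON_FOREST, FOREST, NON_FOREST, NON_FOREST]
burned = [False, True, False, False, False]   # second pixel was lost to fire
area_km2 = [80.0, 80.0, 80.0, 80.0, 20.0]

loss = forest_loss_area(cover_2010, cover_2011, burned, area_km2)
woodval = wood_value(500.0, 100.0, loss)
```

In this example the burned pixel receives no wood value even though its land cover changed, which mirrors the exclusion of fire-related loss described above.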

Fishery production
Fish makes up approximately 17% of animal-sourced protein in the human diet worldwide (Mathiesen, 2018). The fishery industry supports the livelihood of 12% of world population by creating 200 million jobs along its value chain. In the global trade system, 80 billion USD worth of fish is exported from developing countries and it plays a crucial role in promoting local economic development (Kelleher et al., 2009).

We estimate both freshwater inland fisheries and marine production values using the FISHSTAT (FAO, 2009) data with a classification based on the fish production categories. The inland fishery production value is the result of disaggregating corresponding country level statistics in proportion to areas of inland water bodies in pixels. The distribution of inland water bodies is obtained from the ESA-CCI (Lamarche et al., 2017). Thus, the value of inland fishing production in each grid is calculated as follows:

$$\mathrm{fishval}_i = \mathrm{freshval}_x \times \frac{\mathrm{waterbody}_i}{\sum_{i' \in X} \mathrm{waterbody}_{i'}}$$

where $\mathrm{fishval}_i$ is the value of fishery production in pixel $i$; $\mathrm{freshval}_x$ is the value of freshwater fish production at the national level, which is aggregated from FISHSTAT; $\mathrm{waterbody}_i$ is the area of water bodies in pixel $i$; and $X$ is a set including all pixels $i$ that fall within the boundary of a nation $x$.

⁴ These products are "goods of biological origin other than wood derived from forests, other wooded land and trees outside forests", including foods (nuts, fruits, mushrooms, etc.), food additives (herbs, spices, sweeteners, etc.), fibers (for construction, furniture, clothing, etc.), and plant and animal products with chemical, medical, cosmetic or cultural value.
⁵ The measurement is limited to detection of land cover change from satellite and will likely account for selective harvesting or forest degradation.

The value of marine fisheries production is based on its proximity to fish landing ports weighted by a composite indicator that gives equal weight to the number of visits and the sum of the vessel hold of fishing vessels. We use the port database from the World Port Index (National Geospatial-Intelligence Agency, 2019) and the number of port visits with a vessel hold of fishing vessels from Hosch et al. (2019) to create a composite variable as the prior based on the sum (for each port) of the number of visits (each event in the database) and total vessel hold at the port. The geographic coverage of the ports is calculated for each port using the minimum port distance provided in Hosch et al. (2019). Any distance greater than 150 km is capped at 150 km.
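The value of marine fishing production in each grid is calculated as follows:

$$\mathrm{marineval}_i = \mathrm{marineval}_x \times \frac{\mathrm{portindex}_i}{\sum_{i' \in X} \mathrm{portindex}_{i'}}$$

where $\mathrm{marineval}_i$ is the value of fishery production in pixel $i$; $\mathrm{marineval}_x$ is the value of marine fish production at the national level; $\mathrm{portindex}_i$ is an equal weighted composite index of the number of visits in pixel $i$ and the total vessel hold in pixel $i$; and $X$ is a set including all pixels $i$ that fall within the boundary of a nation $x$.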
A map of global gridded fishery production value as a prior is shown in Figure 4.
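A rough sketch of the marine prior is given below. Several choices are assumptions made only for illustration: each indicator is scaled by its maximum before the equal-weight average, a pixel accumulates the index of every port whose coverage radius contains it, and distances are planar. The port records are hypothetical.

```python
import math

MAX_RADIUS_KM = 150.0  # distances above this are capped, as in the text


def composite_index(port, max_visits, max_hold):
    """Equal-weight composite of visits and vessel hold (max-scaling assumed)."""
    return 0.5 * port["visits"] / max_visits + 0.5 * port["hold"] / max_hold


def pixel_port_index(pixel_xy, ports):
    """Sum the composite index of all ports whose coverage reaches the pixel."""
    max_visits = max(p["visits"] for p in ports)
    max_hold = max(p["hold"] for p in ports)
    total = 0.0
    for p in ports:
        radius = min(p["min_port_distance_km"], MAX_RADIUS_KM)
        if math.dist(pixel_xy, p["xy"]) <= radius:
            total += composite_index(p, max_visits, max_hold)
    return total


# Hypothetical ports with planar coordinates in km.
ports = [
    {"xy": (0.0, 0.0), "visits": 100, "hold": 2000.0, "min_port_distance_km": 300.0},
    {"xy": (500.0, 0.0), "visits": 50, "hold": 2000.0, "min_port_distance_km": 100.0},
]
pixel_centres = [(100.0, 0.0), (450.0, 0.0), (300.0, 0.0)]
portindex = [pixel_port_index(xy, ports) for xy in pixel_centres]

national_marine_value = 700.0  # hypothetical
total_index = sum(portindex)
marineval = [national_marine_value * w / total_index for w in portindex]
```

The third pixel lies outside both coverage radii and therefore receives no marine value, while the first port's radius is capped at 150 km despite a reported distance of 300 km.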

AgGDP Statistics and Linked Grids
Tremendous effort has been made to collect and organize national and sub-national statistics from various sources of national ministries or from reports. However, not every country publishes its agricultural GDP figures at the sub-national (regional) level. [...] that maintains global geographic layers with a consistent and comprehensively unified coding system (FAO, 2015). Then, we overlay the GAUL administrative boundaries on the grid network to assign the corresponding codes of the administrative units to each grid. For areas where sub-national AgGDPs have different administrative areas than GAUL, the GAUL areas are merged or split to match the sub-national AgGDP area.

After constructing all the components, we define a spatial allocation model in a cross-entropy framework following You et al. (2014) to allocate administrative statistics to 5 arc-minute pixels.⁸ National and sub-national AgGDP values are used as a constraint, while the distribution of crop, livestock, fishery, and forestry production (hunting is included in non-timber products of forestry) is used to create priors for estimating pixel-level AgGDP. Measurement units are unified using deflators and exchange rates.⁹
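To make the role of priors and constraints concrete, the following sketch solves a simplified version of such an allocation. It assumes that sub-national units do not overlap and that the only constraints are the national total and the available sub-national totals; under those assumptions the minimum cross-entropy shares have a closed form (the prior is rescaled within each constrained unit), so no numerical solver is needed. This is an illustration under stated assumptions, not the implementation used for the dataset, and all names and numbers are hypothetical.

```python
import math


def cross_entropy(shares, prior):
    """Cross-entropy (Kullback-Leibler) distance of shares from the prior."""
    return sum(s * math.log(s / p) for s, p in zip(shares, prior) if s > 0)


def allocate_shares(prior, unit_of_pixel, unit_share):
    """Minimum cross-entropy shares given sub-national shares of the total.

    prior         -- positive prior shares per pixel, summing to 1
    unit_of_pixel -- sub-national unit identifier of each pixel
    unit_share    -- {unit: share of national AgGDP} for units with data
    """
    prior_by_unit = {}
    for p, u in zip(prior, unit_of_pixel):
        prior_by_unit[u] = prior_by_unit.get(u, 0.0) + p
    known = sum(unit_share.values())
    rest_prior = sum(v for u, v in prior_by_unit.items() if u not in unit_share)
    shares = []
    for p, u in zip(prior, unit_of_pixel):
        if u in unit_share:
            # Pixels in a unit with data keep prior proportions inside the unit.
            shares.append(unit_share[u] * p / prior_by_unit[u])
        else:
            # Remaining pixels share what is left of the national total.
            shares.append((1.0 - known) * p / rest_prior)
    return shares


prior_values = [4.0, 1.0, 3.0, 2.0]                  # hypothetical prior AgGDP
prior = [v / sum(prior_values) for v in prior_values]
units = ["A", "A", "B", "B"]
shares = allocate_shares(prior, units, {"A": 0.7})   # unit A reports 70%
national_aggdp = 1000.0
pixel_aggdp = [national_aggdp * s for s in shares]
```

Any other set of shares that satisfies the same constraints, for example [0.35, 0.35, 0.15, 0.15], lies farther from the prior in the cross-entropy sense. With overlapping units or additional constraints the closed form no longer applies and a numerical optimizer would be required.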

The first step is to transform all real-value parameters into corresponding probabilities. Let $S_i$ be the share of the total agricultural GDP allocated to pixel $i$ within a country $x$. $\mathrm{AgGDP}_{i,x}$ is the agricultural GDP allocated to pixel $i$ in country $x$ and $X$ is a set including all pixels that fall within the boundary of a nation. Therefore:

$$S_i = \frac{\mathrm{AgGDP}_{i,x}}{\sum_{i' \in X} \mathrm{AgGDP}_{i',x}}$$

[...] where we assume hunting occurs in areas with equal probability.

⁶ Regional Gross Domestic Product (RGDP) can be estimated following the production, income or expenditure approaches. However, RGDP is not typically compiled using the expenditure approach due to the scarcity of data such as inter-regional purchases and sales, or regional exports/imports. On the production and income approaches, the estimate of market activities is typically from the production approach, whereas the estimate of non-market industries is from the income approach.
⁷ The European Union developed a standard for administrative levels: The Classification of Territorial Units for Statistics (NUTS, for the French nomenclature d'unités territoriales statistiques).
⁸ A comprehensive presentation of the cross-entropy method is in Rubinstein and Kroese (2004).
Theoretically, the sum of these components should be close to the official values obtained from the World Development Indicators. We make sure that the official AgGDP values are guaranteed to be no less than the sum of all five components of agricultural GDP. Therefore, we first sum up all prior estimations of AgGDP.
Then, we rescale the prior AgGDP to be consistent with the official AgGDP value: [...]

Then we calculate the prior for $S_i$ as a probability by normalizing PriorAgGDP: [...]

Finally, we formulate a cross-entropy model in the following mathematical optimization framework: [...]

subject to the following three conditions: [...]

where $i = 1, 2, 3, \ldots$ are pixel identifiers within the allocation unit (e.g. Brazil), and $k = 1, 2, 3, \ldots$ are identifiers for sub-national units. [...] preserving constraint (e.g. Tobler, 1979) that ensures the sum of all allocated AgGDP values is equal to the total AgGDP of the country. The next equation (14) sets the sum of all allocated AgGDP within those sub-national units with available data to be equal to the corresponding sub-national AgGDP values. The last equation (15) is a natural constraint for the percentage of AgGDP, which is also the probability in the cross-entropy model. The modeling framework is flexible in that more constraints can be added if more data are available and/or more reasonable assumptions on how AgGDP should be spatially disaggregated are discovered.¹⁰ Last but not least, we multiply the total regional agricultural GDP by the probability in the cross-entropy model to derive the final pixel level agricultural GDP: [...]

Figure 5 shows the results of the SPEI.
The Water Crowding Index (WCI) is a measure of water scarcity that considers the local population, expressed as the annual water availability per capita (Falkenmark, 1986, 2013). Veldkamp et al. (2015) model the global water crowding index with return periods. We take the mean of any pixels of the ensemble WCI with a 10 year return period within an agricultural GDP pixel.

Following the literature (e.g. Arnell, 2003; Alcamo et al., 2007; Kummu et al., 2010; Veldkamp et al., 2015), we categorize the WCI into four categories: absolute is less than 500 m³/capita per year; severe is less than or equal to 1,000 m³/capita per year; moderate is less than or equal to 1,700 m³/capita per year; and low is the remainder (Figure 6). Then, we evaluate water shortage events using a threshold of 1,700 m³/capita per year with a return period of 10 years.

[...] to 10 km.¹² The spatial extent and quantity distribution of AgGDP over the world are in agreement with general knowledge of agricultural technology adoption and suitability, with well-known agricultural nations, such as India, China and the United States, standing out as regions with high AgGDP. A number of European countries also exhibit high agricultural GDP values, which is likely due to the benefit of adopting mechanized farming and technological facilitation, considering that the shares of agricultural land and agrarian population are relatively low in these well-developed places. Countries in Sub-Saharan Africa remain low in agricultural production, as indicated by low-value pixels sparsely spreading over the continent. Within the continent, agricultural production activities primarily take place in geographic areas with suitability.

Night time lights
The correlation of AgGDP with night light varies across world regions, as night light requires areas to emit light ([...]).

The exposure to drought is not uniform across the world. High income countries have less population and agricultural GDP exposed to drought, for each number of extremely dry years, than countries in other income categories (Figure 8). The top ten countries in agricultural GDP exposed to an extreme drought from 2000 to 2009 include the large economies in the agriculture sector such as China, India, the United States and the Russian Federation (Table B1).
However, other countries have a high share of their agricultural GDP exposed to an extreme drought (Table B2). The top 10 countries in 2010 population exposed to dry areas include countries with the largest economies in the agriculture sector as noted above, but the list includes countries such as the Democratic Republic of Congo, Tanzania and Uganda (Table B3).

¹³ They use a Random Forest-based dasymetric redistribution method.

Across the world, high income countries have less population and agricultural GDP in areas of absolute or severe categories of the Water Crowding Index compared to countries in other income categories (Figure 9). The top ten countries of agricultural GDP exposed to the Water Crowding Index include large economies in the agriculture sector such as China, India, Pakistan, Indonesia and Nigeria (Table B4). However, several countries have a high share of their agricultural GDP exposed to the Water Crowding Index (Table B5). The top 10 countries in 2010 population exposed to dry areas include countries with the largest economies in the agriculture sector as noted above, but the list also includes countries such as Bangladesh, the Arab Republic of Egypt and Mexico (Table B6).
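Exposure totals of this kind amount to classifying each pixel by its WCI value and summing AgGDP (or population) by category. A minimal sketch follows, using the four WCI categories defined earlier in the paper; the pixel values are hypothetical.

```python
def wci_category(m3_per_capita):
    """Classify annual water availability per capita into the four WCI classes."""
    if m3_per_capita < 500:
        return "absolute"
    if m3_per_capita <= 1000:
        return "severe"
    if m3_per_capita <= 1700:
        return "moderate"
    return "low"


def exposure_by_category(wci, values):
    """Sum pixel values (e.g. AgGDP) within each WCI category."""
    totals = {"absolute": 0.0, "severe": 0.0, "moderate": 0.0, "low": 0.0}
    for w, v in zip(wci, values):
        totals[wci_category(w)] += v
    return totals


# Hypothetical pixels: WCI in m3/capita per year and AgGDP in USD.
wci = [300.0, 1000.0, 1700.0, 2500.0]
aggdp = [10.0, 20.0, 30.0, 40.0]
totals = exposure_by_category(wci, aggdp)

# AgGDP in pixels at or below the 1,700 m3/capita per year threshold.
exposed = totals["absolute"] + totals["severe"] + totals["moderate"]
```

Dividing the exposed total by the country total gives the share of AgGDP exposed, which is how a small economy can rank high on share while ranking low on absolute exposure.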

Validation
A true validation of the predictive accuracy of this model involves data collection and construction of agricultural gross regional product in different pixels and testing those independent observations against the predicted values. The regional product data are generally constructed at the administrative level rather than the pixel level, so validation would have to be done on an aggregation of model predictions. Few countries provide the data required to assess prediction accuracy and thereby examine the internal validity of the disaggregation, and the data collection would be extremely costly and time-consuming. For the case of Brazil, Thomas et al. (2019) examine the predictive accuracy of three models to disaggregate agricultural GDP spatially: cross-entropy, rural population and spatial regression. The cross-entropy and spatial regression models outperform a naive rural population AgGDP model as measured by the Mean Absolute Deviation (MAD) and Root Mean Square Error (RMSE).¹⁴ While the spatial regression performs the best, meeting global data requirements that allow high enough degrees of freedom is a challenge.
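For reference, the two accuracy measures can be computed as in the sketch below. It assumes that MAD denotes the mean absolute difference between predicted and observed values at the administrative level; the numbers are hypothetical and unrelated to the Brazil results.

```python
import math


def mean_absolute_deviation(predicted, observed):
    """Mean of absolute differences between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def root_mean_square_error(predicted, observed):
    """Square root of the mean squared difference."""
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    return math.sqrt(mse)


# Hypothetical AgGDP by administrative unit (model sums versus official values).
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
mad = mean_absolute_deviation(predicted, observed)
rmse = root_mean_square_error(predicted, observed)
```

Because RMSE squares the errors before averaging, it penalizes a few large misallocations more heavily than MAD does.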
¹⁴ Specifically, the MAD and RMSE for each model are respectively: the rural population density model (28,744 and 25,397), cross-entropy spatial allocation (8,249 and 18,347) and spatial disaggregation from a regression on agricultural production (7,214 and 16,673).
Given these data requirements and challenges, we compare the cross-entropy model to another spatial allocation model based on rural population at the country level. Then we extend a comparison of both models at the global level by mapping the correlation.

One advantage of the cross-entropy model is the volume-preserving pycnophylactic property, which ensures that the sum of the gridded data equals the original value and allows the possibility to include all information that is available from mixed levels of data (e.g. You et al., 2014). However, this presents a challenge in terms of an assessment of a global model. Previous work on gridded data products includes evaluations of accuracy. Typically, studies evaluate the internal accuracy of the model exploiting multiple geographic levels of data (as mentioned above in Thomas et al., 2019). Similarly, Van Boeckel et al. (2011), who examine duck data in Thailand, conclude that input levels do matter, especially the importance of the presence of administrative level 1 data. Robinson et al. (2014) evaluate the livestock model in Brazil and find a positive association between the model accuracy and the administrative level of the training data used in the model. They also illustrate this inverse relationship of prediction accuracy and level of intensification in the case of chickens in Europe (Robinson et al., 2014). At the local cell level, previous models of land or population have compared results to independent local data (Siebert et al., 2002) or identified errors of omission in a gridded population model using the locations from household surveys (e.g. Tiecke et al., 2017).
Since we cannot perform an evaluation of prediction accuracy for all countries, we compare the global cross-entropy model with another allocation model, which is similar to the global assessment of maize and rice production in You et al. (2014). For the comparison, we construct a proportional allocation model using rural population density following the method in Thomas et al. (2019) for the case of Brazil.¹⁵ Then, we can test the similarity of the two maps. Following Levine et al. (2009), we assume a normal distribution over the 2 million land pixels and perform a paired Student's t-test of the null hypothesis that both maps are identical. This test allows us to examine whether the mean difference in the corresponding pixel value from one map to another is greater than would be expected by chance alone. The t-test statistic tells us that we cannot reject the null hypothesis, which provides some evidence of similarity between the two models using all the global pixels. Figure 10 displays three global maps: the two models and their Spearman correlation.¹⁶ We exclude from the analysis areas with values that are less than 200,000. The map shows areas of both high and low correlation, as the inputs of the models draw on the relationship of agriculture from a production value or a (rural) population perspective.
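The two comparison statistics can be sketched in plain Python as follows. The paired t statistic is computed over corresponding pixels, and the Spearman correlation is shown for a single flattened 3x3 window; the actual analysis used a raster routine in R, so this is only an illustration with hypothetical values and function names.

```python
import math


def paired_t_statistic(a, b):
    """t statistic of the mean pixel-wise difference between two maps."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3x3 windows from the two maps, flattened row by row.
cross_entropy_map = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
rural_pop_map = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0, 9.0]
t_stat = paired_t_statistic(cross_entropy_map, rural_pop_map)
rho = spearman(cross_entropy_map, rural_pop_map)
```

In this toy window the pixel-wise differences cancel out, giving a t statistic of zero even though the maps are not identical pixel by pixel, which is why a failure to reject the null hypothesis offers only partial evidence of similarity and is complemented by the local correlation map.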
The cross-entropy model can also propagate errors from the ancillary data that are inputs to the components. For the SPAM model, the CGIAR network held expert consultation and validation workshops according to each crop and subsequently incorporated their feedback with modifications of the priors used in the model (You et al., 2014). The authors of Gridded Livestock of the World (GLW) note that regional differences in accuracy (i.e. RMSE values) are the result of the variation of production intensity and thus dependence on the initial conditions of the land upon which the prediction variables are mainly drawn (spatial agro-ecological variables) (Robinson et al., 2014). Lastly, the models integrate higher spatial resolution data to inform the spatial disaggregation procedures, which is subject to the modifiable areal unit problem (MAUP) (Openshaw, 1981).

We provide descriptive statistics of the data and modeling from a fitness-for-use perspective (e.g. Leyk et al., 2019). The data are most appropriate for applications at global, continental and regional scales (You and Wood, 2006). Decisions regarding the use of this version over smaller spatial extents should be carefully considered in relation to the underlying assumptions and characteristics of a particular area. However, as the spatial refinement of ancillary data advances along with greater currency, coverage and representativeness, we expect validation possibilities to increase and inform a better understanding of the uncertainty and the associated fitness-for-use. Also, we intend to improve spatial and temporal coverage when it is feasible.
The data disaggregation model from source to target level does impose spatial relationships and is subject to error (Li et al., 2007). The measurement of GDP is also challenging (Angrist et al., 2021), especially agricultural production (Carletto et al., 2015). The level of uncertainty associated with these results includes the thematic, spatial and temporal accuracies. Below, we discuss these data and modeling issues in relation to two aspects: regional accounts and the regional components of AgGDP (mainly crop, fishery, forest, and livestock production values) that are priors in the cross-entropy model, and the outcome of the cross-entropy model.
15 We use the 2010 Gridded Population of the World version 4 from Center for International Earth Science Information Network - CIESIN - Columbia University (2017), adjusted to the United Nations World Population Prospects and then restricted to the rural area defined by the Global Human Settlement grid for 2015: namely, "Rural cluster", "Low Density Rural grid cell", or "Very low density rural grid cell".
16 The raster correlation in R performs a simple moving window correlation between two grids with a 3x3 pixel window.

3.2.1 Regional accounts
We collect regional accounts by sector from various sources into a global database. The data are not balanced over time nor at the geographic level. The variation in the reference year of the regional accounts data influences the temporal balance of the database. This mismatch can influence the regional distribution of the agricultural GDP, which may differ from that of the target reference year of 2010. Given that climate 17, and specifically rainfall, is an important input to crop and livestock production and may contribute to variation across years (Stanimirova et al., 2019; Zhang et al., 2020), we attempt to reduce this source of error by averaging over multiple years when data are available, similar to You et al. (2014). However, this does not eliminate this mismatch.
The availability of data varies when grouped by World Bank income level (low or lower middle, upper middle and high income). The average absolute temporal difference (ATD), defined as the mean difference in years between the reference year of the regional accounts and the target year (2010), is higher in the low and lower middle income group. Likewise, the mean deviation of the share of AgGDP by country over the year(s) is larger in the low or lower middle income group than in the high income group. The global regional account database includes national and subnational units at various administrative levels. 18 Following Robinson et al. (2014) in their assessment of Gridded Livestock of the World (GLW) 2.0, we summarize the average spatial resolution (ASR) of the input regional data, which is the square root of the land area divided by the number of administrative units. We find that on average the ASR decreases from high to low income groups.
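The two summary measures above reduce to short formulas. The sketch below is illustrative only: the function names and the example values are hypothetical, and it reads the ASR definition as the square root of the ratio of land area to the number of administrative units, which is an assumption about the wording above.

```python
import math

TARGET_YEAR = 2010  # target reference year stated in the text


def absolute_temporal_difference(reference_years, target_year=TARGET_YEAR):
    """Mean absolute gap, in years, between reference years and the target year."""
    if not reference_years:
        raise ValueError("need at least one reference year")
    return sum(abs(y - target_year) for y in reference_years) / len(reference_years)


def average_spatial_resolution(land_area_km2, n_units):
    """Square root of the mean area per administrative unit (assumed reading), in km."""
    if n_units <= 0:
        raise ValueError("need at least one administrative unit")
    return math.sqrt(land_area_km2 / n_units)


# Hypothetical country: accounts from three reference years, four administrative units.
atd = absolute_temporal_difference([2008, 2010, 2013])
asr = average_spatial_resolution(10_000.0, 4)
print(atd, asr)
```

Both measures would then be averaged within each income group to support the kind of comparison reported in the text.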

Components
Another source of uncertainty is indirect temporal inaccuracy propagated from the input datasets of the components, which are modeled. We discuss all five components of agricultural GDP: crop, livestock, forest, fish and hunting. The SPAM model (You et al., 2014) is a result of several gridded modeled datasets including rural population density from the Global Rural-Urban Mapping Project (GRUMP) Alpha version (Balk et al., 2006). Likewise, the Gridded Livestock of the World v2.0 includes rural population density in 2006 (GRUMP) along with other predictors such as precipitation (Hijmans et al., 2005) and a modeled travel time to places with 50,000 inhabitants circa 2000 (Nelson, 2008). Anderson et al. (2015) find variation in their examination of global data products of cropping systems models. For livestock, we transform the five major livestock types into international values from livestock products (namely, meat, milk, eggs, honey and wool). The forest component (non-wood products, wood products) relies on a remote sensing model to estimate forest loss. With regard to the non-timber values, limitations of the sources present two challenges. The estimates use simple averages from the literature and accordingly assume that the value of a hectare of forest is uniform across the world and that the sample of forests drawn from the literature for the study is representative of the world (Siikamäki et al., 2015). The fishing model relies on proximity and association with ports or water bodies. 19 Finally, since we do not incorporate any information on hunting, the result is an even distribution across units and time.
Another source of uncertainty is the geographic distribution of the components. Ideally, we would use subnational prices; however, this was not feasible, and the results do not reflect subnational price variation, including in administrative units with higher variation of prices due to the heterogeneity of distinct urban and rural areas.
17 For a discussion on climate yield factors see Block et al. (2008).
18 This also includes cases where administrative units at the same level are merged to match the geography of the regional accounts data.
19 The freshwater case does not account for any variation, whereas the marine port locations incorporate variation on vessel holds.

Conclusions
Natural hazards impact both lives and livelihoods, and the frequency and severity of disasters will likely increase in a changing climate. Socio-economic estimates at the local level inform disaster preparations about the exposure of physical assets and production to natural hazards and have implications for food security. Increased frequency and severity of natural hazards such as floods, droughts and cyclones are also likely to impact agricultural production systems, with wide-ranging effects including loss of life, harvest or livestock and damage to infrastructure.
Significant advancement in the spatial allocation of indicators such as population has occurred in the past 10 years (e.g. Leyk et al., 2019). The advantages of gridded data as a common spatial unit of integration and of the cross-entropy models are clear. These common units allow us to examine within-country characteristics, especially in the case of spatial data that do not conform with each other, such as administrative boundaries and natural hazards, to inform analyses with local estimates. We present a novel data set that disaggregates the national and regional accounts of the agriculture sector across areas using a model that draws on ancillary data including satellite data. This allows us to estimate local agricultural GDP, especially in countries that have a relatively higher share of agricultural activity in the entire economy. Then, we examine the exposure of agricultural GDP in areas with at least one extreme drought during 2000 to 2009, where nearly 1.2 billion people live, and find an estimated US$432 billion of agricultural GDP in 2010.

These data are the result of data collection and collaboration across multiple entities to ensure the most current and widest coverage. However, persistent challenges to data collection remain, including limited geographic levels and temporal lag at low frequencies. Also, the reference year and spatial resolution of the local AgGDP estimates are limited by the contemporaneous availability of the economic statistics and components such as the crop production model. We often have to weigh fitness-for-use against accuracy: the model has lower average spatial resolution in areas where we have little data; however, these same areas may benefit from the availability of these estimates to inform policy. Predictions depend on the availability and quality of the training data on which the model is based, and the modeling process is flexible enough to update individual countries as data become available. In the near future, we hope to increase the currency and number of countries with subnational data as updates of the regional account data and of the component models upon which the model relies become available.

We appreciate the support of the World Bank Strategic Research Program on Big Data.