Eyes in the Sky, Boots on the Ground: Assessing Satellite- and Ground-Based Approaches to Crop Yield Measurement and Analysis

Downloaded from https://academic.oup.com/ajae/advance-article-abstract/doi/10.1093/ajae/aaz051/5607565 by guest on 31 March 2020

David B. Lobell, George Azzari, Marshall Burke, Sydney Gourlay, Zhenong Jin, Talip Kilic, and Siobhan Murray

Understanding the determinants of agricultural productivity requires accurate measurement of crop output and yield. In smallholder production systems across low- and middle-income countries, crop yields have traditionally been assessed based on farmer-reported production and land areas in household/farm surveys, occasionally by objective crop cuts for a sub-section of a farmer's plot, and rarely using full-plot harvests. In parallel, satellite data continue to improve in terms of spatial, temporal, and spectral resolution needed to discern performance on smallholder plots. This study evaluates ground- and satellite-based approaches to estimating crop yields and yield responsiveness to inputs, using data on maize from Eastern Uganda. Using unique, simultaneous ground data on yields based on farmer reporting, sub-plot crop cutting, and full-plot harvests across hundreds of smallholder plots, we document large discrepancies among the ground-based measures, particularly among yields based on farmer-reporting versus sub-plot or full-plot crop cutting. Compared to yield measures based on either farmer-reporting or sub-plot crop cutting, satellite-based yield measures explain as much or more variation in yields based on (gold-standard) full-plot crop cuts. Further, estimates of the association between maize yield and various production factors (e.g., fertilizer, soil quality) are similar across crop cut- and satellite-based yield measures, with the use of the latter at times leading to more significant results due to larger sample sizes.
Overall, the results suggest a substantial role for satellite-based yield estimation in measuring and understanding agricultural productivity in the developing world.

Key words: Agricultural productivity, crop yield estimation, crop cutting, maize, remote sensing, Uganda.

JEL codes: C83, Q12.

David B. Lobell is a professor in the Department of Earth System Science, and the Center on Food Security and the Environment (FSE), Stanford University. George Azzari is chief technology officer at Atlas AI and Marshall Burke is an assistant professor, both in the Department of Earth System Science and the FSE, Stanford University. Sydney Gourlay is a survey specialist at Living Standards Measurement Study (LSMS), Development Data Group (DECDG). Zhenong Jin is an assistant professor in the Department of Bioproducts and Biosystems Engineering, University of Minnesota. Talip Kilic is a senior economist and Siobhan Murray is a technical specialist, both at LSMS, DECDG, the World Bank. The lead principal investigators were Lobell and Kilic on the Stanford University and the World Bank front, respectively. Field data collection for MAPS II (2016) was financed by the World Bank Innovations in Big Data Analytics Program, the World Bank Trust Fund for Statistical Capacity Building–Innovations in Development Data Window, and the CGIAR Standing Panel on Impact Assessment. Terra Bella (now Skysat) provided free high-resolution satellite imagery for the MAPS remote sensing tasking area for research purposes. MAPS I and MAPS II were both implemented using the World Bank Survey Solutions Computer-Assisted Personal Interviewing (CAPI) platform. The research team would like to thank the dedicated management and field staff of the Uganda Bureau of Statistics regarding fieldwork implementation; Mr. Wilbert Drazi Vundru for Survey Solutions programming, fieldwork supervision and survey data quality control; and Ms. Madeline Lisaius for help with image processing. The authors thank the Global Innovation Fund and USAID/BFS for additional funding. Correspondence may be sent to: dlobell@stanford.edu.

Amer. J. Agr. Econ. 00(0): 1–18; doi: 10.1093/ajae/aaz051
© The Author(s) 2019. Published by Oxford University Press on behalf of the Agricultural and Applied Economics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Improving the productivity of smallholder farmers is widely considered to be one of the most effective avenues for reducing their poverty and food insecurity (Byerlee et al. 2007). With agriculture contributing up to 69% of rural household income in Africa (Davis et al. 2017), and given high rates of expected poverty reduction associated with agricultural growth (Dorosh and Thurlow 2018), such productivity improvements remain a longstanding goal in many African countries. Similarly, at the international level, doubling the productivity and incomes of smallholders has been identified as a key target within the United Nations' Sustainable Development Goal (SDG) 2 of Ending Hunger.

Accurate measurements of crop production, cultivated area, and yield are at the heart of official agricultural statistics and are key to monitoring progress towards national and international development goals, including SDG 2. Further, the survey data underlying these outcomes are frequently used by agricultural economists to investigate a vast array of policy-relevant research topics, including (a) the scale-productivity relationship (Larson et al. 2014; Julien et al. 2019); (b) agricultural productivity impacts of fertilizer use (Harou et al. 2017), soil quality (Berazneva et al. 2018), land misallocation (Restuccia and Santaeulalia-Llopis 2017), and sustainable land management practices (Arslan et al. 2015); (c) farm- and household-level impacts of exposure to extreme weather events (Wineman et al. 2017; McCarthy et al. 2018); (d) the extent and cost of gender differences in agricultural productivity (O'Sullivan et al. 2014; Kilic et al. 2015); (e) the relationships between agricultural and welfare outcomes at the household- and/or individual-level (Carletto, Corral, and Guelfi 2017; Darko et al. 2018); and (f) the comparative effects of agricultural versus non-agricultural growth on poverty reduction (Dorosh and Thurlow 2018; Ivanic and Martin 2018).

The most common way to assess outcomes related to the productivity of smallholder farmers, including land productivity (e.g., crop yields), is by using information collected through in-person interviews for household and farm surveys. For example, the household surveys supported by the World Bank Living Standards Measurement Study–Integrated Surveys on Agriculture (LSMS-ISA) initiative measure plot areas with handheld GPS units and solicit farmer-reported information on crop production and input use, among other topics, at the plot level. These data, together with the multi-topic information solicited by these surveys, have informed a burgeoning field of development research on Africa over the last decade.

Compared to the body of methodological research that has shown severe systematic biases in farmer-reported plot area measures (Carletto et al. 2017) and that has underlined the increasing use of GPS-based plot area measurement in national household surveys, there is a dearth of evidence on the accuracy of farmer-reported crop production. It is, however, known that the process of soliciting farmer-reported production information is mediated by complexities that include (a) potential recall bias; (b) tendency to round off numbers; (c) the use of non-standard measurement units; (d) various conditions and states of crop harvest; and (e) partial/early crop harvests, among others (Carletto, Jolliffe, and Banerjee 2015). In fact, the emerging body of evidence from various smallholder production systems across Africa has revealed the systematic measurement errors in self-reported crop production (Gourlay, Kilic, and Lobell 2017; Desiere and Jolliffe 2018; Abay et al. 2019) and their non-negligible implications for questions at the heart of agricultural economics, including the scale-productivity relationship. These findings further highlight the critical need to improve the accuracy of methods used to measure land productivity.

A less common but also well-established approach to measuring crop yields is to physically harvest a sub-section of a farmer's plot, also known as crop cutting (Fermont and Benson 2011). Crop cutting provides a more objective way to measure grain production for part of a plot, but heterogeneity within a plot can make crop cut yields sensitive to the precise location and size of the crop cut sub-plot vis-à-vis the entire plot (Fermont and Benson 2011). An alternative is to harvest the entire plot, which avoids most of the problems of the prior methods and is therefore frequently considered the "gold standard" yield measurement (Casley and Kumar 1988; Fermont and Benson 2011). However, full plot harvests require a substantial amount of labor and coordination with farmer harvest schedules, which makes them costly and difficult to scale.

Given the limitations of existing approaches, recent work has explored the ability of satellite data to track crop yields. Burke and Lobell (2017) showed that 1 m resolution data from Terra Bella's Skysat sensors (now owned by Planet Labs) were useful for mapping maize yields for farms in western Kenya. This usefulness was measured both by correlation of satellite-based yield estimates with traditional ground-based yield measures, as well as by the ability of satellite-based yields to detect positive yield associations with fertilizer and hybrid seed inputs. This latter aspect was considered especially important since (a) ground-based yield measures are inevitably imperfect themselves, and (b) detecting response to inputs or some other aspect of farm management is a common motivation for collecting plot-level yield data in the first place. However, objective ground-based measures of productivity were unavailable in Burke and Lobell (2017), which limited the ability to understand the relative extent of measurement error in ground- versus satellite-based measures.

Here, using unique gold-standard data from full-plot harvests across hundreds of smallholder fields, we assess the ability of satellite-based approaches to measure plot-level maize yields on African smallholder farms and to understand how yields respond to productivity-enhancing factors such as soil quality. The analysis uses data from Eastern Uganda from the 2016 round of MAPS: Methodological Experiment on Measuring Maize Productivity, Soil Fertility and Variety, a survey experiment implemented during the first rainy season of 2016 (June–October) in 45 enumeration areas within a 400 square kilometer area spanning the Iganga and Mayuge districts of Eastern Uganda, the leading maize-producing region of the country.

The analysis extends the work presented in Burke and Lobell (2017) in at least three substantial ways. First, the Ugandan maize systems are considerably more subsistence-focused and heterogeneous than the Kenyan counterparts in Burke and Lobell (2017), with generally smaller plot sizes, lower input use, greater prevalence of under-canopy intercrops such as beans and groundnuts, and frequent occurrence of over-canopy intercrops such as cassava and bananas. Thus, Uganda represents a different and, in many ways, more challenging environment in which to test satellite-based crop yield measurement approaches.

Second, whereas Burke and Lobell (2017) relied on farmer self-reported data on maize production, this paper uses objective measures based on survey field team harvests of maize grain for 64 m² subplots within each plot ("crop cuts"), as well as whole plot harvests for a random half of our sample ("full plot harvests"). Thus, we can compare different ground-based measures with each other, and with the satellite data.

Third, the study uses data from the Copernicus program's Sentinel-2A satellite, which has coarser spatial resolution but more spectral bands than the Skysat sensor used in Burke and Lobell (2017). Furthermore, whereas Skysat data are currently only available for a small fraction of the Earth's surface each day, Sentinel-2A and its recently launched sister satellite Sentinel-2B each capture imagery every ten days for the entire land surface of the Earth, with an effective five-day repeat for the Sentinel-2 duo since June 2017. These images are quickly made available to the public at no cost. For these reasons, Sentinel-2 represents an attractive option for estimating yields over large regions.1

1 Burke and Lobell (2017) focused on field campaigns in 2014 and 2015, before Sentinel-2 was operational.

All plot-level measures of maize yield, including farmer self-reported production per hectare (SR), sub-plot crop cut production per hectare (CC), full plot crop production per hectare (FP), and variants of remotely sensed production per hectare (RS), rely on GPS-based plot areas. All such measures are also compared to each other using standard statistical approaches, and are used to study the sensitivity of the associations between maize yield and various production factors measured through a combination of a household survey and extensive soil sampling. Overall, we find that SR yields exhibited significant positive bias when compared to CC or FP yields, with an average yield in SR more than double the other two. Although CC yields agreed well with FP in terms of the overall yield distribution, the correlation between CC and FP yields across fields was relatively low, with CC able to capture roughly one-quarter of the variability in FP yields. The RS yields exhibited significant correlations with the ground measures, in some cases exceeding CC yields in the ability to capture variation in FP yields. Moreover, RS yields exhibit correlations with different production factors (e.g., fertilizer, soil quality) that are very similar to those for CC and FP yields, further indicating that RS yields provide a meaningful measure of land productivity.

The paper is organized as follows. The next section describes the data, while the following section presents the comparisons among ground-based yield measures, as well as between ground- and satellite-based yield measures, and the results from the estimations of maize yield regressions for each yield variant of interest. The last section discusses these results and summarizes the main conclusions.

Data

MAPS: Methodological Experiment on Measuring Maize Productivity, Soil Fertility and Variety is a two-round household panel survey that was conducted in Eastern Uganda to test the relative accuracy of subjective approaches to data collection vis-à-vis objective survey methods for maize yield measurement, soil fertility assessment, and maize variety identification. Both survey rounds were implemented by the Uganda Bureau of Statistics, with technical and financial assistance provided by an inter-agency partnership that was led by the World Bank Living Standards Measurement Study (LSMS).

Sampling Design and Fieldwork

Analysis in this paper focused on Round II of MAPS in 2016, building on sampling from earlier MAPS I in 2015. In Round I, a sample of 75 enumeration areas (EAs) was selected in Eastern Uganda, the top maize-producing region of the country, using the 2014 Population and Household Census (PHC) EA frame. We focus on 45 EAs distributed across a 400 square kilometer remote sensing tasking area spanning the Iganga and Mayuge districts (figure 1). Fieldwork was conducted from June to October 2016, and field teams attempted to track and re-interview 540 households that had been interviewed in Round I within the tasking area. Overall, 489 of the 540 households were successfully re-interviewed.2 As in MAPS I, one maize plot was selected from each household for crop cutting and variety identification components.

2 In total, 34 out of 51 households that we did not interview in MAPS II were not interviewed because they were not cultivating maize in the first season of 2016. The remaining 17 households can be broken down as follows: 5 households could not be tracked or were outside of the tracking area defined as the Iganga and Mayuge districts; 4 households had suffered total crop loss prior to post-planting interview; 7 households had already harvested their maize by the post-planting interview; and 1 household refused. Gourlay, Kilic, and Lobell (2017) report that attrition bias is not a concern.

Figure 1 Study region in Eastern Uganda
Note: Three images show Sentinel-2 images and dates used in the study. Polygons indicate outlines of plots where surveys/crop cuts were performed.

MAPS II implemented full-plot crop cutting for a random sub-sample of plots, and increased the area for sub-plot crop cutting (from 4x4m to 8x8m) on each plot. These decisions were anchored in the concerns around intra-plot variability of maize yields. Given the enhancements in the scope of crop cutting data in MAPS II and the interest in the validation of satellite-based approaches to yield estimation, we rely solely on the MAPS II data on 463 households/plots for which sub-plot crop cutting data are available. The only exception, as explained below, is the plot-level data on soil fertility, which is sourced from MAPS I. Table 1 provides a breakdown of 463 plots in accordance with pure stand versus (type of) intercropped cultivation status.

Table 1. Distribution of MAPS II Plots by Cultivation Status
Pure stand | Intercropped: Maize-Legume | Intercropped: Maize-Cassava | Intercropped: Maize-Legume-Cassava | Intercropped: Maize-Other
124 | 119 | 161 | 52 | 7

Three visits were made to each household during MAPS II. During the (first) post-planting visit, enumerators solicited information on (a) demographic and socio-economic attributes of household members; (b) household dwelling characteristics and ownership of durable assets and agricultural implements; and (c) area, cultivation pattern, management, pre-harvest labor and seed inputs for all maize plots that were cultivated during the reference rainy season.3 Following the completion of the household post-planting interview, each enumerator visited the maize plot that was selected in accordance with the protocol detailed in the previous section. At that time, plot boundaries were mapped with a handheld GPS device and crop-cut sub-plots were set up for later harvesting and weighing. The crop cut sub-plot location was chosen at random, in accordance with the protocol detailed by Gourlay, Kilic, and Lobell (2017) and in line with international best practices. During the (second) crop cutting visit, the enumerator harvested the crop cut sub-plots to obtain objectively measured harvest quantities, as detailed in the subsequent section. Finally, during the (third) post-harvest visit, farmer-reported information on total plot-specific maize production, non-labor inputs and harvest labor inputs was solicited for all maize plots that were cultivated during the reference season. The post-harvest visit was scheduled within a two-month period following the completion of each household's harvest.

3 A parcel is conceptualized as a continuous piece of land under a common tenure system, while a plot is defined as a continuous piece of land on which a unique crop or a mixture of crops is grown under a uniform, consistent crop management system, not split by a path of more than one meter in width, and with boundaries defined in accordance with the crops grown and the operator. Therefore, a parcel can be made up of one or more plots. This distinction is key since for the purposes of within-farm analysis of agricultural productivity, the ideal is to capture within-parcel, plot area measurements linked with plot-level measurement of agricultural production.

Key Measurement Domains and Methods

Plot area measurement. After walking the perimeter of a given plot with the plot manager to identify the boundaries, the enumerators re-paced the perimeter and measured the area with a Garmin eTrex 30 handheld GPS device. The area was recorded on the questionnaire in square meters, and the raw GPS track outline was stored. The competing yield measures in our study are all anchored in GPS-based plot area measurement. In MAPS II, the median plot size was 0.11 hectare (ha; roughly one-quarter of an acre), with 46% below 0.10 ha and 17% below 0.05 ha.

Soil fertility assessment. The soil quality index, based on lab analyses of soil samples obtained from the sampled plot locations, is used in our analysis to gauge the possibility of recovering the expected coefficients in production function estimations that use satellite-based yields as dependent variables. Gourlay, Kilic, and Lobell (2017) provide details on the collection of soil samples at each plot location in MAPS I. Briefly, four soil samples were collected at random locations within each plot and were subjected to spectral soil analysis. The resulting data were used to construct a composite soil quality index (SQI), following Mukherjee and Lal (2014). Given the data limitations, the constructed index focuses on nutrient storage capacity but ignores the other two components of soil
quality identified by Mukherjee and Lal (2014) related to root development and water storage.4

Ground-based maize yield measurement. We construct both farmer self-reported (SR) and crop-cut based yield estimates. For the farmer estimates, plot managers were asked to report their estimate of maize harvest at the parcel-plot-level during the post-harvest visit, as described in Gourlay, Kilic, and Lobell (2017). Each plot manager was allowed to report production in non-standard measurement units, and the dry grain-equivalent harvest quantities in kilograms were calculated by using a conversion factor database developed by UBOS.5

To complement SR estimates, we also obtained two crop-cut based measures of plot-level yields. Crop cutting has been recognized as the gold standard for yield measurement since the 1950s by the Food and Agriculture Organization of the United Nations (FAO). Gourlay, Kilic, and Lobell (2017) review the potential concerns regarding yield measurement based on crop cutting and detail the way in which the MAPS approach to crop cutting and its hands-on supervision overcame them.

In this study, one 8x8m sub-plot (divided into four 4x4m quadrants) was laid on each plot. Each subplot was cordoned off until harvest and was supervised by the EA-specific crop cut monitor between the post-planting and the crop cutting visits. Each plot manager was asked not to harvest any crop from the sub-plots until the crop cutting visit, and not to manage the sub-plot any differently than the rest of the plot. These messages, first communicated by the enumerator, were intended to be enforced by the local crop cut monitors.6 The shelled maize harvests tied to each of the four adjacent 4x4m quadrants were weighed in the field and then reweighed at a central location in Kampala under strict supervision following additional drying. At the time of the final weighing, the moisture content of each sample was captured to standardize all crop cut sample weights used for our analyses at 12% moisture. The MAPS II plot-level maize production estimates based on sub-plot crop cutting are computed by multiplying the crop cut production from the 64 m² area covered by the 8x8m sub-plot by the ratio of the entire GPS-based plot area in square meters to 64 m².

Furthermore, half of the target household population within each of the pure stand and intercropped domains was selected at random for a full-plot (FP) crop cut. This rare approach to crop production measurement entails harvesting the entire plot area, shelling the resulting harvest, weighing it in the field, and capturing its moisture level. This operation was conducted by the enumerators with help from the EA-specific crop cut monitor and the crop cut assistant(s) recruited from within the households. On the MAPS II plots selected for full-plot harvest, the harvest of the designated 8x8m subplot was weighed separately from the full-plot harvest to allow for comparative yield analysis. The full-plot harvests were only weighed in the EAs, as their transport to, and additional drying and reweighing at, a central location was deemed logistically infeasible. Moisture readings taken from the maize grain harvested from the full plot harvests were used to standardize the production quantity to 12% moisture. A total of 211 plots had full-plot harvests. Gourlay, Kilic, and Lobell (2017) detail the approach to full plot harvests. Although farmers were not told the final weight of their harvest, it is likely that the process of harvesting and bagging the maize improved their self-reported production values compared to plots without full plot harvests. Therefore, the analyses that use self-reported maize production per hectare rely only on 252 plots without a full plot harvest.
4 The PCA-based soil quality index was constructed for the full Ground-based SR and FP yields were de- MAPS 1 sample, and therefore analyzes the correlation of soil prop- rived by dividing the reported or measured erties and crop cutting yields on a larger sample than MAPS 2. 5 Refer to Gourlay, Kilic, and Lobell (2017) for more informa- mass of maize production by the area corre- tion regarding the conversion factors used in expressing farmer- sponding to the GPS-based plot area, or 64m2, reported production information in kilogram-equivalent terms. in the case of the 8x8m crop cut sub-plot. 6 The lack of statistically significant differences between average CC and FP yields is a finding in support of the assumption that the crop cut sub-plot areas were not managed differently with respect Satellite-based yield measurement. Images to the rest of the plot. Following the first visit to the sampled house- holds, the supervision of the crop cut sub-plots were conducted on from Sentinel-2A, processed to top-of- a weekly basis by the local crop cut monitors, who were tasked with visiting the sampled households and sub-plot locations to en- sure that the farmers were clear regarding our request for consis- tency in management practices on the crop cut sub-plot vis- a-vis weekly progress reports, none of which referred to any suspected the rest of the plot. During the fieldwork, the field teams submitted instances of differential management of crop cut sub-plot areas. Lobell et al. Eyes in the Sky, Boots on the Ground 7 Table 2. Spectral Vegetation Indices (VIs) Employed in This Study Name Equation Equation using Reference Sentinel-2 bands NDVI (RNIR – RRED) / (B8 – B4) / (B8 þ B4) (Rouse et al. 1973) (Normalized (RNIR þ RRED) Difference Downloaded from https://academic.oup.com/ajae/advance-article-abstract/doi/10.1093/ajae/aaz051/5607565 by guest on 31 March 2020 Vegetation Index) GCVI (RNIR / RGREEN) – 1 (B8/B3) - 1 (Gitelson et al. 
2003) (Green Chlorophyll Vegetation Index) MTCI (RNIR—R705) / (B8-B5) / (B5 – B4) (Dash and Curran 2004) (MERIS Terrestrial (R705—RRED) Chlorophyll Index) NDVI705 (RNIR – R705) / (B8 – B5) / (B8 þ B5) (Gitelson et al. 2003) (Red-Edge NDVI705) (RNIR þ R705) NDVI740 (RNIR – R740) / (B8 – B6) / (B8 þ B6) (Gitelson et al. 2003) (Red-Edge NDVI740) (RNIR þ R740) Note: R refers to reflectance, and B refers to the corresponding sentinel-2 band number used to compute the VI. atmosphere reflectance (Level -1 C), were trained on several images in the region, includ- accessed within the Google Earth Engine ing those used in Burke and Lobell (2017). platform. Sentinel-2A is a polar orbiting sat- Satellite-based yields were then derived in ellite carrying a Multi-Spectral Instrument two ways, following Burke and Lobell (2017). (MSI), which acquires images at $10:30 a.m. First, “calibrated” remote sensing yields local time for each location on the Earth’s (RS_cal) were obtained from a regression land surface roughly every ten days. The MSI model of FP yields on VI values measured on measures radiation reflected from the Earth’s May 30 and June 19, 2016, using only pure surface in 13 separate wavelength intervals stand maize plots that were at least 0.1ha in called “bands”, with a spatial resolution of size. Since FP yields are expensive to obtain 10m for the visible and near-infrared bands, and cannot be considered as part of large- and 20m to 60m for other bands. For this scale operations, an alternative version of the study, three relatively cloud-free images were calibrated remote sensing yield was obtained available during the growing season, on April (RS_cal_cc), which used CC, rather than FP 30, May 30, and June 19, 2016. Sentinel-2B, yields to calibrate the model. These models which is identical to Sentinel-2A but stag- can be specified as follows: gered by five days, was launched in 2017 and so is not included in this study. 
Clouds and shadows were masked from the ð1 Þ RS cal Model : FP Yieldi ¼ aFP;i Sentinel images using a random forest classi- þ bVI ;FP;1 Ã VI May 30;i fier trained on points visually selected from images throughout the region. Five vegetation þ bVI ;FP;2 Ã VI June 19;i indices (VIs) that are commonly used in the literature were then calculated for each pixel þ eFP;i using the equations shown in table 2. The av- erage value of all bands and VIs within each ð2 Þ RS cal cc Model : CC Yieldi ¼ aCC;i plot polygon were then extracted for each im- þ bVI ;CC;1 Ã VI May 30;i age date for further analysis, averaging across all pixels with at least half of their area over- þ bVI ;CC;2 Ã VI June 19;i lapping with the plot. In addition, for compari- son with the Sentinel-2A images, an image þ eCC;i acquired by Planet Lab’s Skysat sensor on May 29, 2016 was accessed. Skysat measures where i denotes plot, and a and e denote radiance in blue, green, red, and near-infrared regression-specific constant and error term, re- channels at a 1m resolution. As with the spectively. The calibration was done using only Sentinel-2 data, clouds and shadows were purestand plots since ground-based objective masked using a random forest classifier yield estimates were not available for non- 8 August 2019 Amer. J. Agr. Econ. maize crops on intercropped plots. The restric- including log of plot area, log of distance to tion in terms of plot area was driven by smaller household (km), presence of cover crops, log plots having larger problems with geolocation of seed planted (kg), use of inorganic fertil- accuracies and mixed pixels in Sentinel-2. 
All izer, log of household labor days and hired la- VIs shown in table 2 were tested and are dis- bor days, number of hired laborers, soil cussed below, with the preferred model using quality index (SQI), and household attributes, the MERIS Terrestrial Chlorophyll Index including wealth index, agricultural asset in- Downloaded from https://academic.oup.com/ajae/advance-article-abstract/doi/10.1093/ajae/aaz051/5607565 by guest on 31 March 2020 (MTCI). Yields were estimated as a linear dex, dependency ratio, household size, head function of VI as shown in equations (1)–(2). of household age, gender, and years of educa- Quadratic models were also considered but tion, and whether the manager was the survey gave poorer out-of-sample performance. respondent. For regressions including inter- The second satellite-based approach was to cropped plots, two additional variables were estimate “uncalibrated” yields (RS_scym) by included: a binary variable indicating the pres- using the scalable crop yield mapper (SCYM) ence of an intercrop, and a variable indicating approach (Lobell et al. 2015). In this approach, the log of the intercrop seed rate (i.e., the ra- a crop model and local daily weather data were tio of quantity of seed planted to quantity of used to simulate crop growth and yield for vari- seed that the farmer estimates would have ous realistic combinations of on-farm manage- been planted if the plot was pure stand). ment, such as sow date, seeding density, and Although we include a rich set of controls in fertilizer rate. The simulated values of total can- our regressions, it is possible that omitted vari- opy nitrogen on the dates with available images ables may be affecting yields, and therefore were then translated into MTCI using pub- the estimated coefficients should not be inter- lished relationships (Schlemmer et al. 2013), preted as causal. 
Instead, the primary goal of this analysis is to use independently measured ð3Þ MTCI ¼ 3:05 þ 0:789 Ã canopyN variables—many of which (such as fertilizer or soil quality) are known to affect productivity in a wide array of cropping systems—to fur- where canopyN is the simulated amount of ther evaluate satellite-based yield measures. total nitrogen in aboveground biomass after This is especially helpful in cases where the subtracting the nitrogen in the grain (which is ground-based yield measures are thought to invisible to the sensor). As in the calibrated be error-prone, or when output is measured approach, the yields are then regressed on MTCI, except in the case of SCYM the re- only for one crop on an intercropped field. gression uses simulated yield and MTCI rather than actual values. In this way, SCYM Results avoids reliance on any ground data for cali- bration, which is why it is referred to as an “uncalibrated” approach. Given the unique co-occurrence of three dif- Both types of satellite-based yield estimates ferent ground-based yield measures in this were tested in two complementary ways. First, study, we begin by comparing these measures the yields were compared directly with the to each other. We then describe the compari- ground-based estimates across both purestand son of satellite and ground measures of maize and intercropped plots. However, given that yield for purestand maize fields, where the ground-based estimates are subject to (differ- comparison is most straightforward because ent types of) measurement error and neglect a maize harvest alone defines the productivity potentially substantial amount of production of the plot. Comparisons are then presented from non-maize crops, the direct comparisons for intercropped fields where ground-based between the two yield measures is not a measures provide only a partial measure of straightforward test of the satellite-based crop output. Finally, we present results of yields. 
That is, some of the discrepancy will regressing the various yield measures on dif- also be due to errors in the ground-based esti- ferent production factors, both with and with- mates, or discrepancies in the types of outputs out including intercropped fields. that are measured. As a second form of evalu- Comparison of Ground-Based Yield ation, we performed regressions of yield on Measures different production factors for both ground- based and satellite-based yields and compared The distributions of yields from the three the resulting coefficients. Specifically, we ground-based approaches are displayed in regressed yields on key plot characteristics, figure 2a and summarized in table 3. Both Lobell et al. Eyes in the Sky, Boots on the Ground 9 (a) (b) 5 Field Size 1.0 Self−Report (SR) ● ● SR Yield (Mg/ha) ● < 0.05 ha Crop Cut Harvest (CC) 4 ●● ● ● ● >= 0.05 ha ● ● ● Full Plot harvest (FP) 3 ● ●● ● ● ● ● ● ● ●● ● ● ●●● ● ● ● ● ● ● ● 0.8 2 ● ● ●● ● ● ●● ●● ● ● ● ●● ● ● ● ●● ● ●● ●● ● ●● ●● ● ● ● ● ●●● ● ● ● ● Downloaded from https://academic.oup.com/ajae/advance-article-abstract/doi/10.1093/ajae/aaz051/5607565 by guest on 31 March 2020 ● ● ●● ● ● ● ●● ● Correlation ●● ●● ● ● ●● ● 1 ●● ● ● ● ●●● ● ● ● ●●●●● ● ● ● ● ●● ●● ● ●●●● ● ● ● ●● ●●● ● ● ● ● ●●● ● ●● ● ● ● ● ● ●●●● ● ●● ● ●● ● All Fields, r = 0.04 ●● ●●● ● ● ● ●●● ●● ●●●● ●●● ● ● ● ●● ● ● ● ● ● ●● ●● ●●● ● ●● ● ● ● ● ● Fields >= 0.05 ha, r = 0.28 ●●● ● ● ●● ● ●● ● ●●● 0 ● ●● ● 0.6 0 1 2 3 4 5 Density CC Yield (Mg/ha) 0.4 (c) 5 ● FP Yield (Mg/ha) 4 ● 0.2 3 ● ● ● ● ● ●● 2 ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● Correlation ● ●● ● ● ● ●● 1 ●● ● ● ●●● ● ●● ● ● ● ● ● ●● ● ●● ● ● ●● ● ●● ● ● ● ● ● ●●●● ● ● ● ● ● ● ●● ●● ● ●● ●●● ●● ●●●● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● All Fields, r = 0.51 ● ● ● ●●●● ● ●●● ● ● ● ● ● ● 0.0 ● ● ● ●● ●● ●● ● ● ● ●● ●● ● ● ● ● ● ● ●●● ● ● ● ● ●●● ● ● ●●●● ●● ● ● ● ●●● Fields >= 0.05 ha, r = 0.57 0 ●● ● ● ●● ● 0 1 2 3 4 5 6 0 1 2 3 4 5 Yield (t/ha) CC Yield (Mg/ha) Figure 2. 
Yield distributions for ground-based measures Note: (a) Vertical bars at bottom indicate the mean yield for each measurement approach. (b) Scatter plot of SR and CC yields for all plots, and, separately, for plots above 0.05ha in size (black points). (c) Scatter plot of FP and CC yields. objective, harvest-based approaches show (0.11 ha or 1,100 m2) or 4% of the average very similar distributions, with a mean CC plot size. The effect of this heterogeneity yield of 0.73 metric tons per hectare (t/ha) appears to be greater in intercropped plots, and a mean FP yield of 0.68 t/ha. These dif- as the correlation between CC and FP yields ferences were not statistically significant is higher on pure stand maize plots (r ¼ 0.70). (p > 0.2). In contrast, the farmer self-reported The more subjective SR yields show al- (SR) yields contained many more high yield- most no correspondence (r ¼ 0.04) with the ing values, including 11 (out of 252 total) crop cutting-based measures (figure 2b). plots with SR yield greater than 5 t/ha. The Because correlations may be heavily influ- highest SR yields tended to occur on very enced by errors on especially small fields, small plots, with 8 of these 11 being on plots figure 2b also reports correlations that are smaller than 0.05 ha. The average SR yield of based on the exclusion of plots with areas 1.83 t/ha was significantly higher, and indeed below 0.05ha. Despite the increase in the more than double, that for CC and FP yields. correlation coefficient to 0.28, still less than Given that SR, CC, and FP yields are com- 10% of the variation in CC yields is cap- peting ground-based measures, a useful ques- tured by SR yields. tion is how well correlated they are across different plots. Correlation between CC and Comparison of Ground- and Satellite-Based FP yields was significant (p < 0.01) but only Yield Measures on Pure Stand Plots 0.51 overall (figure 2c). 
If one views full-plot crop cutting as the “gold standard” of We begin the evaluation of satellite VIs by ground-based measures, this indicates that presenting the performance of the calibrated 8x8m crop cuts capture only roughly one- models (in terms of adjusted R2) using differ- quarter of the variability in actual plot yields. ent sources of ground-based yields for cali- These discrepancies reflect the substantial bration, as well as different types of VIs intra-plot heterogeneity of yields in these sys- (figure 3). Satellite-based yields were esti- tems. The 64 m2 area of the crop cuts, despite mated for all plots that did not contain clouds requiring a costly and ambitious effort, are on either May 30 or June 19 (397 out of 463 roughly just 6% of the median plot size total plots). 10 August 2019 Amer. J. Agr. Econ. Table 3. Summary Statistics of the Different Ground-Based Yield Measures All Pure stand Intercropped Yields (in kg/ha) mean median mean Median mean median Self-Reported 1,826 784 1,878 1,039 1,805 685 (SR) Downloaded from https://academic.oup.com/ajae/advance-article-abstract/doi/10.1093/ajae/aaz051/5607565 by guest on 31 March 2020 Sub-Plot Crop 728 595 827 725 692 571 Cutting (CC) Full Plot Crop 676 511 842 740 623 472 Cutting (FP) Different Different Different Different Different Different means? Distributions? means? Distributions? means? Distributions? SR vs. CC *** *** *** *** ** *** CC vs. FP À À À À À À consistently outperformed the other VIs on 1.0 Yield Measure Veg. Index both image dates. The MTCI was designed to SR NDVI be sensitive to canopy chlorophyll concentra- CC GCVI 0.8 FP MTCI tion (Dash and Curran 2004), which is likely a good proxy for yield in the low nutrient set- adjusted R2 0.6 ● ting of Uganda. 
Perhaps more importantly, MTCI is much less sensitive to atmospheric 0.4 ● conditions than other VIs such as NDVI or ● ● ● ● GCVI (Curran and Dash 2005) because it ● ● 0.2 ● ● ● ● ● uses the difference in reflectance between ● ● ● ● ● ● ● ● ● ● ● ● ● two nearby bands that will be similarly af- ● ● ● ● ● 0.0 ● ● ● ● ● fected by atmospheric scattering. In both images, significant amounts of haze are evi- 0.00 0.02 0.04 0.06 0.08 0.10 dent above many of the plot sites in both the Min field size (ha) raw reflectance and NDVI or GCVI images. However, the MTCI images exhibit much Figure 3. Adjusted R2 of regressions of yields lower sensitivity to haze (see online supple- vs. VI, by VI type and type of ground-based mentary appendix figure A1). (d) Finally, a yield measure substantial fraction of FP yield variability is Note: Models were run for successive subsets of data by excluding plots be- captured by VIs, with the MTCI-based model low indicated plot size. Results for some VIs in table 2 are not displayed for capturing 55% of yield variability on plots of clarity, but consistently performed worse than GCVI and MTCI. at least 0.10ha. Notably, this value is greater than the amount of FP yield variability cap- Four important features are evident in fig- tured by CC yields on these plots (adjusted ure 3: (a) Adjusted R2 values were generally R2 ¼ 47%), indicating that satellite meas- higher between VIs and FP yields than be- ures are better correlated with full plot har- tween VIs and CC or SR yields, which is con- vests than the crop cuts on those same sistent with the notion that full-plot crop fields. Performance using only May 30 or cutting provides a better measure of plot- June 19 was similar but slightly worse than level productivity. 
(b) Adjusted R2 tended to the model using both dates (37% and 49% improve when excluding the smallest plot of yield variation explained for each date, sizes, consistent with the results in Burke and respectively), as shown in online supple- Lobell (2017). A likely explanation for this is mentary appendix figure A2. the increased importance of georeferencing One potential concern with the calibrated errors and mixed pixels on the smallest of models is that they are unduly influenced by plots. For example, a 0.05 ha plot covers an sowing date differences between fields. For area of just five 10x10m Sentinel-2 pixels, and example, if rains came early, such that fields most of these pixels are likely to span the planted early in the season had higher yields, edge of the plot and contain some contribu- but also more mature plants at an earlier tion from neighboring plots. (c) The MTCI stage, the correlation between crop yields Lobell et al. Eyes in the Sky, Boots on the Ground 11 (a) (b) (c) 4 Full Plot Calibrated 4 Crop Cut Calibrated 4 Uncalibrated (SCYM) R 2 = 0.58 R 2 = 0.28 R 2 = 0.54 CC yield (Mg/ha) FP yield (Mg/ha) FP yield (Mg/ha) 3 3 3 2 2 2 1 1 1 Downloaded from https://academic.oup.com/ajae/advance-article-abstract/doi/10.1093/ajae/aaz051/5607565 by guest on 31 March 2020 0 0 0 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 RS_CAL yield (Mg/ha) RS_CAL_CC yield (Mg/ha) RS_SCYM yield (Mg/ha) Figure 4. Plot yield comparisons Note: Comparison of (a) full plot yields vs. predictions from a remote sensing model calibrated to full plot yields, (b) crop cut yields vs. predictions from a re- mote sensing model calibrated to crop cut yields, and (c) full plot yields vs. “uncalibrated” remote sensing yield estimates, which are based on calibration to crop model simulations. All panels show results for pure stand maize plots at least 0.1 ha in size, which are the subset of plots used to calibrate the models in (a) and (b). 
and the satellite measures would appear positive but could simply arise from plants being at different stages in the season. For our dataset, farmers reported the month of sowing and whether they sowed in the first or second half of the month. These reported sowing dates exhibited a weak negative correlation with yields, with r = −0.22 for all purestand maize fields and r = −0.44 for purestand fields larger than 0.1 ha. Moreover, the agreement between VI and yields was not unduly influenced by omitting particular sowing dates from model testing, as shown in online supplementary appendix figure A3. Specifically, removing fields with different sow dates had a negligible impact on the correlation between satellite and full-plot yields, with the exception of one influential field sown in February, which achieved a very high yield and whose removal reduced the adjusted R² by roughly 15 percentage points. Nonetheless, even after removing this field the model still explained a highly significant 43% of yield variation in the remaining fields.

At first glance, the results discussed above imply that measuring FP yields will result in a superior calibrated model, given that the adjusted R² for the FP model is more than twice that for the CC model when focusing on the performance of pure stand plots larger than 0.10 ha. Individual field predictions are shown in figure 4a and b, along with the calibration statistics. Interestingly, though, the coefficients of the two regressions were very similar, with the model calibrated to CC yields having a slightly lower range of predicted yields. As a result, this model did nearly as well predicting FP yields (R² = 0.54) as the model calibrated to FP yields.

This finding suggests that although CC yields are noisier measures of plot-level productivity compared to FP yields, this noise is mostly random and does not significantly bias the estimated coefficients in a model to predict yields from satellite data. Thus, one can expect models calibrated using CC yields (which are much more feasible and common than FP yields) to have lower R² but similar out-of-sample accuracy for predicting true plot productivity as models calibrated with FP yields.

The "uncalibrated" estimates, obtained from a regression of simulated yields versus simulated MTCI on these same dates, resulted in a nearly identical R² to models calibrated with FP yields (R² = 0.54, figure 4c). The uncalibrated estimates did exhibit significant bias, with a tendency to overestimate yields by roughly 1 t/ha, because none of the simulated yields were as low as the lowest of the observed FP yields. Nonetheless, the high correlation between uncalibrated estimates and true FP yields indicates that ground calibration is not a prerequisite for capturing a large fraction of spatial yield variability with satellite data.

The "calibrated" and "uncalibrated" models can be viewed as two extremes of using available ground data, with the calibrated model using all purestand maize fields with cloud-free imagery, and the uncalibrated model using only model simulations. In practice, an important question is how much accuracy is retained as the size of the calibration dataset, and the associated costs of field work, is reduced. To explore this further, we randomly selected a subset of fields larger than 0.1 ha to train a calibrated model, and then tested the model on FP yields from fields not used in the calibration. This was repeated for different sizes of the training subset, with the average performance plotted as a function of training sample size (figure 5). We alternatively evaluated predictions on fields above 0.1 ha as well as all purestand fields. Results indicate that although the out-of-sample performance continues to improve for additional samples, training on 10 fields does nearly as well as training on 25. Also evident in figure 5, and consistent with the discussion above, is the fact that training on CC results in nearly identical performance as the FP model when tested on FP yields. Training on SR yields, in contrast, results in large root mean square errors because of the substantial bias associated with SR yields (figure 5).

Figure 5. Sample size training effects
Axes: RMSE (t/ha, left) and R² (right) against training size (number of fields), by yield source for calibration (SR, CC, FP), for all fields and for fields > 0.1 ha.
Note: The effect of training sample size on the out-of-sample root mean square errors (RMSE, left) and squared correlation (R², right) for predicting FP yields, using models trained on SR, CC, or FP yields. Dashed lines indicate results for purestand maize fields larger than 0.1 ha, while solid lines show results for all purestand fields.

The superior performance of MTCI is noteworthy, especially given that several of the most recent satellite sensors, which possess higher spatial resolution than Sentinel-2, lack the red edge bands needed to calculate MTCI. In this study, we fortuitously had access to a relatively cloud-free image acquired by Terra Bella's Skysat sensor on May 29, one day before a Sentinel-2 image. Skysat was used in Burke and Lobell (2017), and in the context of smallholder mapping has the particularly attractive feature of 1 m spatial resolution. Particularly for the small plot sizes in Uganda, we anticipated that the 1 m resolution would offer substantial benefits compared to the 10 m resolution of Sentinel-2's main bands, and the 20 m resolution of Sentinel-2's red edge bands. Surprisingly, we found that Sentinel-2 and Skysat performed very similarly when using GCVI for both, even though many plots contained only a few Sentinel-2 pixels (online supplementary appendix figure A4). The large boost in performance when using MTCI with Sentinel-2 therefore more than outweighed any loss in accuracy from using coarser resolution. This result may be specific to the particular atmospheric conditions, time of growing season, and characteristics of the study site, and therefore we caution against overweighting the benefits of spectral versus spatial resolution. Nonetheless, it is an informative comparison made possible by having two images so close in time over a study site with large amounts of quality ground-based data.

Comparison of Ground- and Satellite-Based Yield Measures on All Maize Plots

Of interest in agricultural regions such as Uganda, where maize is typically intercropped with other species, is how well satellite measures can capture the performance of mixed-crop plots. Of course, ground-based yield measures are readily beset by challenges from intercropping (Carletto, Jolliffe, and Banerjee 2015). In crop cutting applications, pure stand plots are typically prioritized due to (a) differences in harvest calendars of crops on intercropped plots (e.g., maize versus root/tuber crops, such as cassava as in our study, whose harvests may span an extended period; take place on a needs basis; and cut across agricultural seasons); (b) the
difficulty of conducting multiple crop harvests on intercropped plots and accommodating crop-specific post-harvest processing and drying needs prior to weighing. And in the analysis of surveys soliciting farmer-reported information on crop production, the yield for each crop on an intercropped plot is computed by dividing the production of each crop by either (a) the entire plot area, (b) the plot area multiplied by the farmer-reported share of the plot area cultivated with the crop, or (c) the plot area multiplied by the ratio between the farmer-reported seed use under intercropping and hypothetical seed use under pure stand cultivation.

Figure 6. Comparison of calibrated remote sensing yields vs. full plot harvests for different types of intercropped plots
Panels (FP yield vs. RS yield): (a) Legume intercrop, r = 0.28 for all fields and r = 0.48 for fields >= 0.1 ha; (b) Cassava intercrop, r = 0.12 for all fields and r = 0.35 for fields >= 0.1 ha; (c) Both Legume and Cassava, r = −0.19 for all fields and r = 0.06 for fields >= 0.1 ha.
Note: (a) maize intercropped with only legumes (beans, groundnuts), (b) maize intercropped with only cassava, and (c) maize intercropped with both legumes and cassava. All panels show remote sensing yields based on calibration to FP yields in purestand maize plots at least 0.1 ha in size (model shown in figure 4a). RS yields tend to be higher than FP yields in intercropped fields since the latter do not account for production from the other crops.

In our study, the ground-based measures of yield (SR, CC, and FP) were obtained only for maize, irrespective of the pure stand versus intercropped cultivation status. The secondary crop harvests were not considered in our crop cutting operation primarily due to the above referenced reasons that typically lead to the prioritization of pure stand plots in crop cutting applications. In turn, we compared the satellite-based yield measures to FP for different types of plots, grouped based on the presence and type of intercropping (figure 6). The performance on plots intercropped with legumes (beans or groundnuts) was significantly lower than on pure stand plots, with roughly 20% of yield variability captured for plots at least 0.10 ha in size (figure 6a). Maize yield estimates were even worse on plots intercropped with cassava (figure 6b) or both legumes and cassava (figure 6c), with less than 10% of the maize yield variability captured by the satellite estimates. The relatively better performance for legume intercrops presumably reflects the fact that both beans and groundnuts grow close to the ground, below the maize crop, whereas cassava intercrops often include very mature cassava plants that exceed the maize crop in height.

The worse performance for satellite-based maize yields on intercropped compared to pure stand plots makes sense, since non-maize crops can be a large contributor to the light reflected from the canopy and measured by satellite sensors, especially in the case of intercrops such as cassava that overhang maize plants. However, in these situations it is doubtful that the yield of maize is the best measure of land productivity. In the absence of other ground-based measures of productivity, we turn instead to assessing the sensitivity of the relationships between yield and factors of production to the choice of the ground- versus satellite-based yield variant.

Assessment of Inter-Relationships between Maize Yields and Factors of Production

Pure stand plot-level maize yield regressions resulted in similar coefficients for models using CC, FP, and satellite-based yields (table 4). The coefficients for the three factors of production of interest (plot area, soil quality index, and incidence of inorganic fertilizer use) are visualized in figure 7a. As also noted by Gourlay, Kilic, and Lobell (2017), the regression using SR yields resulted in a much stronger negative coefficient for plot area than the objective ground-based measures, indicating that the conventional wisdom of an inverse relationship between farm size and productivity may be an artifact of measurement error. While the relationship between soil quality and any one of CC, FP, and satellite-based yields was positive and statistically significant at least at the 5% level, the coefficient associated with soil quality failed to be statistically significant in the regression using SR yields. In line with the results of the CC and FP yield regressions, the relationship between fertilizer use and any one of the calibrated or uncalibrated satellite-based yields was positive and statistically significant at the 1% level.

The regressions for all plots, including both pure stand and intercropped plots, show qualitatively similar coefficients, as depicted in figure 7b and online supplementary appendix table A1. The satellite-based regressions still find a significant positive association with soil quality, whereas the coefficients on fertilizer remain positive but become statistically insignificant. A possible explanation for this result is that cassava biomass, which influences the satellite-based yield estimates on intercropped plots, is similar to maize in its responsiveness to soil quality, but less responsive to inorganic fertilizer. In comparison to regressions using FP yields, those using either CC or satellite-based yields generally had smaller confidence intervals for coefficient values, which reflects the fact that full plot harvests were only performed on 211 plots, whereas sub-plot crop cutting was done for all 463, and satellite estimates were available on 397.

Table 4. Regression Coefficients for Pure Stand Plots Using Different Yield Measures
Dependent variable: maize yield, by yield type

| Variable | Self-report (1) | Crop-cut (2) | Full plot (3) | RS_cal_fp (4) | RS_cal_cc (5) | RS_scym (6) |
|---|---|---|---|---|---|---|
| Log Plot Area (GPS, ha) | −1.94*** (0.42) | −0.08 (0.07) | −0.23 (0.14) | −0.15** (0.06) | −0.10** (0.04) | −0.11** (0.05) |
| Log Plot Distance from Dwelling (GPS, km) | 0.10 (0.33) | −0.04 (0.06) | −0.19 (0.12) | −0.11** (0.05) | −0.0024 | −0.05 (0.04) |
| Cover Crops Present Prior to Planting^a | −0.35 (0.99) | 0.01 (0.20) | 0.26 (0.48) | −0.08 (0.15) | −0.08 (0.11) | −0.04 (0.13) |
| Log Maize Seed Planting Rate (kg/ha) | 1.19** (0.48) | 0.09 (0.08) | 0.18 (0.14) | 0.14** (0.07) | 0.10** (0.05) | 0.10* (0.05) |
| Inorganic Fertilizer Application^a | 0.56 (1.14) | 0.35** (0.17) | 0.98*** (0.28) | 0.34** (0.13) | 0.26*** (0.09) | 0.33*** (0.11) |
| Log Household Labor Days | 0.56* (0.30) | 0.05 (0.06) | −0.01 (0.10) | 0.05 (0.04) | 0.04 (0.03) | 0.05 (0.03) |
| Log Hired Labor Days | 0.27 (0.42) | −0.01 (0.06) | −0.03 (0.10) | −0.11** (0.05) | −0.07** (0.03) | −0.09** (0.04) |
| No Hired Labor^a | 0.13 (0.96) | −0.24 (0.16) | 0.09 (0.26) | −0.06 (0.12) | −0.02 (0.09) | −0.05 (0.10) |
| Soil Quality Index | 1.36 (2.64) | 1.11** (0.45) | 1.84** (0.82) | 1.44*** (0.36) | 1.02*** (0.25) | 1.14*** (0.30) |
| Wealth Index | 0.46 (0.39) | 0.09 (0.07) | −0.05 (0.12) | −0.0045 | −0.06 (0.04) | −0.06 (0.04) |
| Agricultural Asset Index | 0.43 (0.32) | −0.01 (0.06) | 0.09 (0.10) | 0.08* (0.04) | 0.05 (0.03) | 0.06 (0.03) |
| Dependency Ratio | −0.16 (0.35) | 0.01 (0.06) | 0.01 (0.10) | −0.01 (0.05) | −0.02 (0.03) | −0.02 (0.04) |
| Household Size | −0.04 (0.11) | 0.01 (0.02) | 0.02 (0.04) | 0.01 (0.02) | 0.003 (0.01) | 0.003 (0.01) |
| Manager = Respondent^a | 0.07 (0.83) | 0.03 (0.16) | −0.05 (0.38) | 0.02 (0.13) | 0.04 (0.09) | 0.07 (0.11) |
| Received Crop-Production Related Extension Services^a | −0.08 (0.69) | −0.16 (0.12) | 0.26 (0.19) | 0.06 (0.09) | 0.06 (0.06) | 0.09 (0.08) |
| Female^a | −0.20 (0.73) | −0.09 (0.13) | −0.04 (0.25) | −0.02 | −0.16** (0.07) | −0.18** (0.09) |
| Age (Years) | −0.03 (0.02) | −0.004 (0.004) | 0.003 (0.01) | −0.001 (0.003) | −0.002 (0.002) | −0.001 (0.003) |
| Years of Education | −0.09 (0.07) | −0.01 (0.01) | 0.03 (0.02) | 0.01 (0.01) | 0.002 (0.01) | −0.003 (0.01) |
| Constant | −4.35 (3.25) | −0.12 (0.60) | −2.01* (1.08) | −0.76 (0.47) | −0.21 (0.33) | 0.95** (0.39) |
| Observations | 73 | 124 | 51 | 105 | 105 | 105 |
| R² | 0.4 | 0.19 | 0.47 | 0.4 | 0.39 | 0.37 |
| Adjusted R² | 0.19 | 0.05 | 0.17 | 0.27 | 0.27 | 0.24 |
| Residual Std. Error | 2.25 (df = 54) | 0.54 (df = 105) | 0.55 (df = 32) | 0.37 (df = 86) | 0.25 (df = 86) | 0.31 (df = 86) |
| F Statistic | 1.96** (df = 18; 54) | 1.33 (df = 18; 105) | 1.55 (df = 18; 32) | 3.17*** (df = 18; 86) | 3.11*** (df = 18; 86) | 2.78*** (df = 18; 86) |

Note: ^a denotes a dummy variable. Asterisks ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively. Standard errors appear in parentheses.

Figure 7. Summary of regression coefficients for three relevant factors using six different models corresponding to six yield measures. Error bars show +/- two standard deviations of the mean estimate.
Panels: (a) Purestand and (b) All fields; regression coefficients for plot area, soil quality, and fertilizer, by yield source (SR, CC, FP, RS_CAL_FP, RS_CAL_CC, RS_SCYM).

Discussion and Conclusions

Despite the importance of agriculture for rural livelihoods, poverty alleviation, and food security across the developing world, household and farm surveys collecting micro data on agriculture exhibit substantial cross-country heterogeneity in terms of access policies, use of international best practice survey methods and dissemination standards, and data quality (Carletto, Jolliffe, and Banerjee 2015). Given the rapid advances in the availability of 10-meter or sub-10-meter spatial resolution satellite imagery, the demand is increasing for understanding how these advances can be leveraged to measure and understand agricultural outcomes with greater accuracy and higher spatial resolution.

Although there is a concerted push to showcase the value of geospatial applications for monitoring and evaluation efforts in the agriculture sector, and for tracking the progress towards the SDGs, multi-disciplinary research efforts aimed at assessing the accuracy and feasibility of the proposed applications, particularly in smallholder production systems, are scant. If validated, satellite-based remote sensing, combined with georeferenced household and farm survey data that could serve as "ground truth", could dramatically enhance not only our ability to fill the data gaps, but also our understanding of the linkages between development and human welfare. The field of agricultural economics, too, has a stake in these developments, given
the wide range of research applications in low- and low-middle income contexts that continue to rely on household and farm survey data, and the emerging evidence on systematic measurement errors in farmer-reported crop production estimates that may have a bearing on fundamental relationships in smallholder production systems (Gourlay, Kilic, and Lobell 2017; Desiere and Jolliffe 2018; Abay et al. 2019; Wossen et al. 2019).

Taking advantage of a unique range of ground-based plot-level maize yield measures based on farmer-reporting, sub-plot crop cutting, and full-plot harvests that were collected as part of a methodological survey experiment conducted in Eastern Uganda, our study showcases the accuracy and empirical utility of satellite-based approaches to plot-level maize yield estimation in smallholder production systems with a median plot size of approximately one-tenth of a hectare. The satellite-based yield estimates include those that are (a) anchored in a calibration model that relates maize yields from full-plot harvests to MTCI values on multiple dates on a subset of pure stand maize plots that were at least 0.1 ha in size; (b) based on the same calibration model that uses sub-plot crop cut, as opposed to full-plot, yield; and (c) based solely on crop model simulations, without reliance on any ground-based yield measure. While (a) and (b) are identified as "calibrated" variants of remotely-sensed maize yields, (c) is framed as the "uncalibrated" counterpart.

The accuracy of the satellite-based maize yield estimates is found to be very encouraging. The availability of over 200 full plot harvests, which is very rare because of their cost, is a unique situation with which to test satellite estimates, and we find that both calibrated and uncalibrated approaches capture roughly half of the variance in full plot harvests when restricting the analysis to where both ground and satellite approaches are measuring the same output (pure stand plots), and where the satellite pixels corresponding to the plot are less likely to be contaminated by neighboring plots (plots > 0.10 hectare). The uncalibrated approach exhibits, however, a strong tendency to overestimate yields, but adequately captures spatial variation in yield. In fact, the satellite-based estimates explained slightly more variance in full plot harvests than sub-plot crop cuts performed within the plots.

In addition, satellite-based estimates can faithfully reproduce the associations between yield and key production factors such as soil quality and fertilizer use, even when including plots of all sizes and those that are intercropped. The significance levels of the coefficients informed by the satellite-based measures are often even higher than those underlined by the full plot harvests. The cross-sectional nature of our data limits the ability to interpret regression coefficients as the causal effect of a factor on yields. Nonetheless, the fact that factors expected to affect yields (i.e., soil quality, fertilizers) are associated as strongly with satellite-based yield measures as with ground-based yield measures indicates that the errors in both yield measures are of similar magnitude. This finding emphasizes that an imperfect correlation between satellite measures and full plot harvests reflects errors in ground-based estimates as well as those in satellite-based estimates. Moreover, the regression results suggest that even if satellite-based measures are less accurate than full plot harvests, the greater sample size can compensate for any loss in accuracy.

Also noteworthy is the fact that satellite-based models calibrated to CC yields perform similarly to those calibrated to FP yields, in terms of both agreement with FP yields and estimation of yield response to soil quality and fertilizer. These results indicate that although CC yields are imperfect approximations of actual yields, the errors do not substantially bias remote sensing calibrations. Thus, sub-plot crop cutting appears to be a suitable replacement for full-plot harvests when the latter are not possible. We also found that even using just 10 fields of either FP or CC yields for calibration results in accuracies approaching that of the full model. In addition, we show that crop model simulations can be used as a replacement for ground-based measures if the potential bias in estimated yields is recognized and acceptable. The bias may also be reduced in the future, although that is beyond the scope of the current paper.

Overall, our findings suggest that remote sensing approaches to measuring crop yields, particularly when calibrated based on crop cutting operations on the ground, can offer more accurate and precise measurements compared to farmer reporting. At the plot level, future models can be trained with sub-plot crop cutting on a subsample of plots identified in a household/farm survey, and subsequently used to estimate crop yields on the remaining plots that are not subject to crop cutting as part of the same survey.

Our results corroborate and extend those in Burke and Lobell (2017), despite differences in the study region and the sensors used. Burke and Lobell reported higher R² between satellite estimates and self-reported yields on purestand maize fields (~0.4 vs. ~0.2 in this study), which could reflect the fact that farmers in the commercial fields of western Kenya have more accurate estimates of their yields than the more subsistence farmers of Eastern Uganda. Unlike in Burke and Lobell (2017), this study had the benefit of objective ground-based measures, including 8×8 m crop cuts and full plot harvests, which revealed the low accuracy of self-reports in this region. Similar to Burke and Lobell (2017), this study found that the correlations between yields and different production factors were very similar whether using satellite-based yields or the preferred ground-based yield measures, though in this study a wider range of factors, including objective soil measurements, was considered.

Even though our study emphasized measuring plot-level yields, many applications, such as forecasting regional food supply or assessing local conditions for insurance payouts, require accuracy at more aggregate scales. Our

Policy Inference and the Inverse Size-Productivity Relationship in Agriculture. Journal of Development Economics 139: 171–84.
Arslan, A., N. McCarthy, L. Lipper, S. Asfaw, A. Cattaneo, and M. Kokwe. 2015. Climate Smart Agriculture? Assessing the Adaptation Implications in Zambia. Journal of Agricultural Economics 66 (3): 753–80.
Berazneva, J., L. McBride, M. Sheahan, and D. Güereña. 2018. Empirical Assessment of Subjective and Objective Soil Fertility Metrics in East Africa: Implications for Researchers and Policy Makers. World Development 105: 367–82.
Burke, M., and D.B. Lobell. 2017. Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems. Proceedings of the National Academy of Sciences 114 (9): 2189–94.
Byerlee, D., A. De Janvry, E. Sadoulet, R. Townsend, and I. Klytchnikova. 2007. World Development Report, 2008: Agriculture for Development. Washington DC: World Bank.
results suggest that the integration of georefer- Carletto, C., P. Corral, and A. Guelfi. 2017. enced micro survey data on agriculture, such Agricultural Commercialization and as that from the LSMS-ISA, with the expand- Nutrition Revisited: Empirical Evidence ing, publicly-available high-resolution satellite from Three African Countries. Food imagery, will provide a tool to generate the Policy 67: 106–18. landscape-scale data needed for these aggre- Carletto, C., S. Gourlay, S. Murray, and A. gate estimates. Not only could these outputs Zezza. 2017. Cheaper, Faster, and More be used in national and international monitor- than Good Enough: Is GPS the New ing efforts, they should be expected to create Gold Standard in Land Area an unparalleled scope for research on entire Measurement. Survey Research Methods landscapes of agricultural plots. Collectively, 11 (3): 235–65. these measurement tools will allow more rapid Carletto, C., D. Jolliffe, and R. Banerjee. feedback on the effectiveness of different 2015. From Tragedy to Renaissance: efforts to raise productivity, which in turn can Improving Agricultural Data for Better enable more effective agricultural and devel- Policies. Journal of Development Studies opment policy. 51 (2): 133–48. Casley, D., and K. Kumar. 1988. The Collection, Analysis and Use of Supplementary Material Monitoring and Evaluation Data. Washington DC: The World Bank. Supplementary materials are available at Curran, P.J., and J. Dash. 2005. Algorithm American Journal of Agricultural Economics Theoretical Basis Document ATBD 2.22 online. Chlorophyll Index. Southampton-UK: University of Southampton. Darko, F.A., A. Palacios-Lopez, T. Kilic, and References J. Ricker-Gilbert. 2018. Micro-Level Welfare Impacts of Agricultural Abay, K.A., G.T. Abate, C.B. Barrett, and T. Productivity: Evidence from Rural Bernard. 2019. Correlated Non-Classical Malawi. Journal of Development Studies Measurement Errors, “Second Best” 54 (5): 915–32. 18 August 2019 Amer. 
Dash, J., and P.J. Curran. 2004. The MERIS Terrestrial Chlorophyll Index. International Journal of Remote Sensing 25 (23): 5403–13.
Davis, B., S. Di Giuseppe, and A. Zezza. 2017. Are African Households (Not) Leaving Agriculture? Patterns of Households’ Income Sources in Rural Sub-Saharan Africa. Food Policy 67: 153–74.
Desiere, S., and D. Jolliffe. 2018. Land Productivity and Plot Size: Is Measurement Error Driving the Inverse Relationship? Journal of Development Economics 130: 84–98.
Dorosh, P., and J. Thurlow. 2018. Beyond agriculture versus non-agriculture: decomposing sectoral growth–poverty linkages in five African countries. World Development 109: 440–451.
Fermont, A., and T. Benson. 2011. Estimating Yield of Food Crops Grown by Smallholder Farmers: A Review in the Uganda Context. Washington, DC: IFPRI Discussion Paper 01097 1–57.
Gitelson, A.A., Y. Gritz, and M.N. Merzlyak. 2003. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. Journal of Plant Physiology 160: 271–282.
Gourlay, S., T. Kilic, and D. Lobell. 2017. Could the Debate Be over? Errors in Farmer-Reported Production and Their Implications for Inverse Scale–Productivity Relationship in Uganda. doi: 10.1596/1813-9450-8192.
Harou, A.P., Y. Liu, C.B. Barrett, and L. You. 2017. Variable Returns to Fertiliser Use and the Geography of Poverty: Experimental and Simulation Evidence from Malawi. Journal of African Economies 26 (3): 342–71.
Ivanic, M., and W. Martin. 2018. Sectoral Productivity Growth and Poverty Reduction: National and Global Impacts. World Development 109 (C): 429–39.
Julien, J.C., B.E. Bravo-Ureta, and N.E. Rada. 2019. Assessing Farm Performance by Size in Malawi, Tanzania, and Uganda. Food Policy 84: 153–64.
Kilic, T., A. Palacios-Lopez, and M. Goldstein. 2015. Caught in a Productivity Trap: A Distributional Perspective on Gender Differences in Malawian Agriculture. World Development 70: 416–63.
Larson, D.F., K. Otsuka, T. Matsumoto, and T. Kilic. 2014. Should African Rural Development Strategies Depend on Smallholder Farms? An Exploration of the Inverse-Productivity Hypothesis. Agricultural Economics 45 (3): 355–67.
Lobell, D.B., D. Thau, C. Seifert, E. Engle, and B. Little. 2015. A Scalable Satellite-Based Crop Yield Mapper. Remote Sensing of Environment 164: 324–33.
McCarthy, N., T. Kilic, A. De la Fuente, and J. Brubaker. 2018. Shelter from the Storm? Household-Level Impacts of, and Responses to, the 2015 Floods in Malawi. Economics of Disasters and Climate Change 2 (3): 237–58.
Mukherjee, A., and R. Lal. 2014. Comparison of Soil Quality Index Using Three Methods. PLoS One 9 (8): e105981.
O’Sullivan, M., A. Rao, R. Banerjee, K. Gulati, and M. Vinez. 2014. Levelling the Field: Improving Opportunities for Women Farmers in Africa. Washington DC: World Bank Group.
Restuccia, D., and R. Santaeulalia-Llopis. 2017. Land Misallocation and Productivity. NBER Working Paper No. 23128.
Rouse, J.W., R.H. Haas, J.A. Schell, and D.W. Deering. 1973. Monitoring Vegetation Systems in the Great Plains with ERTS. In Proceedings of the Third Earth Resources Technology Satellite-1 Symposium, Washington, DC, USA, eds. Freden, S.C., Mercanti, E.P. Washington, DC: NASA.
Schlemmer, M., A. Gitelson, J. Schepers, R. Ferguson, Y. Peng, J. Shanahan, and D. Rundquist. 2013. Remote Estimation of Nitrogen and Chlorophyll Contents in Maize at Leaf and Canopy Levels. International Journal of Applied Earth Observation and Geoinformation 25: 47–54.
Wineman, A., N. Mason, J. Ochieng, and L. Kirimi. 2017. Weather Extremes and Household Welfare in Rural Kenya. Food Security 9 (2): 281–300.
Wossen, T., T. Abdoulaye, A. Alene, P. Nguimkeu, S. Feleke, I.Y. Rabbi, M.G. Haile, and V. Manyong. 2019. Estimating the Productivity Impacts of Technology Adoption in the Presence of Misclassification. American Journal of Agricultural Economics 101 (1): 1–16.