Policy Research Working Paper 10445

Measuring Global Economic Activity Using Air Pollution

Irene Ezran
Stephen D. Morris
Martín Rama
Daniel Riera-Crichton

Latin America and the Caribbean Region
Office of the Chief Economist
May 2023

Abstract

This paper uses satellite readings of nitrogen dioxide (NO2) air pollution, a byproduct of combustion, to improve the measurement of global economic activity. The proposed approach improves upon night light measures for countries where data manipulation, conflict, or other factors have led to poor national accounts. The paper also shows that existing country rankings of gross domestic product accuracy over the past 15 years are unreliable, even among advanced economies. For example, the paper shows that during COVID, in France, the UK and Spain gross domestic product in 2020 was underreported by 76, 181, and 205 basis points respectively. The methodological contribution extends previous error-measurement frameworks, which suffer from errors-in-variables biases, with an objective, data-driven identification strategy exploiting the plausibly orthogonal measurement errors between nitrogen dioxide and night lights, which are measured at different times.

This paper is a product of the Office of the Chief Economist, Latin America and the Caribbean Region. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at drieracrichton@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly.
The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Measuring Global Economic Activity Using Air Pollution∗

Irene Ezran† Stephen D. Morris‡ Martín Rama† Daniel Riera-Crichton§

Keywords: Nitrogen Dioxide, GDP, Night Lights.
JEL Codes: E01, E32, F44, Q53.

∗ Materials for online publication appear following the end of the paper. The authors gratefully acknowledge Lok Lamsal and Barry Lefer of NASA for assistance with nitrogen dioxide data and Christopher Elvidge and Virgilo Galdo for assistance with night lights. Sam Bazzi provided useful comments. Any remaining errors are our own.
† The World Bank. ‡ Amazon. § The World Bank. Corresponding author. Email: drieracrichton@worldbank.org

1 Introduction

This paper uses open access, daily, global NASA satellite measurements of nitrogen dioxide (NO2) to provide estimates of business cycle movements that redress long-recognized shortfalls in national accounts and other satellite estimates such as night light measures. Over the last decade, global satellite readings of night lights have revolutionized our ability to estimate economic activity in situations where government data is unreliable, or where subnational data is simply unavailable (Henderson et al., 2012). However, these readings only offer information content beyond GDP in countries with the most inaccurate statistical systems (Chen and Nordhaus, 2011), they collect data at night when economic activity is at its lowest, and they have notoriously misestimated important cyclical movements. For instance, during the financial crisis, when on-the-ground data estimated GDP in the US to fall 8.9% below trend, night lights were 1% above trend.1 In contrast, US NO2 was 7.6% below trend at the end of 2009, clearly reflecting the recession. In fact, we will show that NO2, a direct byproduct of combustion that stays near its source for relatively long periods and is always measured during the daytime, when most economic activities occur, has explanatory power for annual economic fluctuations in addition to both lights and all of the most prominent ground level signals available. Moreover, while each of these signals and GDP are only available at a lag, NO2 is published online just a few days after measurement. Finally, we show that NO2’s unique explanatory power applies to all countries, including advanced economies.

Our methodological contribution is an objective, data-driven identification strategy for the error-measurement framework utilized by Chen and Nordhaus (2011) and Henderson et al. (2012). Error-measurement frameworks typically require the analyst to impose prior beliefs concerning data accuracy to overcome bias from errors-in-variables. We exploit the fact that NO2 and night lights are measured at different times to achieve identification through exogeneity in measurement errors. We scrutinize this identifying assumption at length, leveraging overidentifying restrictions from household surveys (Pinkovskiy and Sala-i-Martin, 2016). Ultimately, we show the growth elasticity of NO2 is constant across nations. Finally, we substantiate our approach by showing that an NO2 composite estimate for economic activity is close to GDP in the U.S., a placebo in which data is widely thought to be accurate. However, it rebuts GDP from China, where official data has long been considered suspect (Rawski, 2001).

From the outset, we stress that NO2 is not infallible, and there are other key complementarities between NO2 and lights. In particular, NO2 captures a specific type of economic activity associated with the burning of fossil fuels.
While combustion is responsible for about 84% of world energy production, trend NO2 has been declining for several years in the world’s wealthiest nations due to focused decarbonization. Thus, lights remain preferable for studies of trend economic growth in nations with poor quality data. Our results concern the separate, complementary, and critically important issue of off-trend, business cycle fluctuations, globally. Our core contribution is to show that regardless of any given change in trend air pollution, the off-trend component of NO2 has material information content for annual fluctuations, everywhere.

1 While there are instances where lights reflect abrupt downturns (Elvidge et al., 2020; Beyer et al., 2021), we show that such instances are the exception rather than the norm, as lights only predict annual fluctuations in 20% of countries, and have no additional explanatory power when ground level signals are available.

Applications. We consider three categories of applications. First, we demonstrate the value-added of NO2 in the context of existing results. For example, we show that Martinez (2019)’s lights-based results on global instances of possible data manipulation over 1992-2008 do not hold in an updated sample of annual fluctuations over 2005-2020. However, the results are restored if we use NO2. Thus, there are measurable benefits to using NO2 in global applications.

Second, we give five illustrative global examples where NO2 resolves significant GDP errors.

Hyperinflation. Beginning with the largest errors in the sample, we find that the reduction in GDP in the República Bolivariana de Venezuela in 2019 (−43%), and rise in Zimbabwe in 2009 (+47%) were significantly overstated, each perhaps twice as large as the actual change in economic activity.

Conflict. There is quantifiable evidence of informal economic activity increasing during the recent civil wars in the Syrian Arab Republic and the Republic of Yemen.
Specifically, the fall in economic activity during either conflict was about 10 percentage points less precipitous than indicated by GDP.

Oil Supply. Azerbaijan reported a 30% spike in GDP due to the 2006 opening of an oil pipeline. We show this export-line accounting increase in GDP was not matched by a commensurate increase in economic activity, which was closer to 20%. Nigeria reported falling GDP in 2016 due in part to falling oil prices (-1.63%). We find economic activity actually rose (+0.74%), likely attributable to its shadow economy.

Financial Crises. Many factors including widespread tax evasion led to the 2011 Greek debt crisis, during which GDP fell sharply (-11%). We find that the fall in economic activity was less steep (-8%), a difference perhaps due to the same hidden income that helped catalyze the crisis. On the other hand, evaluating the 2018 Argentine monetary crisis, we find no misstatement.

Multinationals. A natural question is whether GDP is ever significantly erroneous in countries with otherwise very reliable data. One example is Ireland’s GDP, which rose by 22% in 2015 due to the inclusion of intangible intellectual property from multinational corporations. The measurable increase in economic activity was at least 1 to 7 percentage points less.

Third, we use the fact that NO2 uniquely has information content beyond GDP for all countries to create two new global rankings of GDP accuracy.

Objective Data Quality. Existing rankings of GDP data quality are at least partially subjective. We assemble a new world ranking of GDP misstatement by country over 2005-2020, which ranges from 0.25 to 5 percentage points per annum on average, and is in line with Henderson et al. (2012)’s estimate of trend growth errors up to 3 percentage points. We find that existing subjective rankings are misleading, and our objective ranking implies a significant reshuffling.

GDP Errors during the 2020 Pandemic.
We reconsider reported GDP during the 2020 COVID-19 crisis and create a ranking of misstatement in this year specifically. We find that even for countries with very high-quality statistical systems including the United Kingdom, the unprecedented official GDP statistics reported during 2020 seem unlikely to be entirely accurate. We ultimately conclude that 2020 GDP systematically understated economic activity, globally.

Finally, we close by discussing other avenues for research we have no space for in this proof-of-concept work, including subnational and sub-annual estimates, and nowcasting.

Related Literature. There is a wealth of economic research using satellite data, as summarized by Donaldson and Storeygard (2016). The primary signal which has been used towards measuring economic activity is the radiance of night lights. Levin et al. (2020) provides a comprehensive review of this literature. Alternative satellite data such as high-resolution imagery has been used to measure world poverty (Jean et al., 2016) and detect urban markets (Baragwanath et al., 2019). Moreover, there have been many examples of using satellite air pollution data in the environmental and health literature (Jayachandran, 2009). However, to our knowledge the only previous study using air pollution data to measure cyclical economic activity is Morris and Zhang (2018), which uses NO2 to infer economic activity in China. The current study represents a more comprehensive and independent effort to carry this seed of an idea to full fruition, globally, with a distinct methodology, and in new applications.

Our research design builds on scientists’ long-held observation that there is a correlation between NO2 and economic activity. For example, several scientific studies observed reductions in NO2 during the 2009 Financial Crisis (Lin and McElroy, 2011; Castellanos and Boersma, 2012; Russell et al., 2012; Itahashi et al., 2014).
It should come as no surprise that scientists also measured an abrupt downturn in NO2 during the 2020 pandemic (Gkatzelis et al. (2021) and references therein). While the reduction in economic activity causing these reductions is often implicit (e.g. Keller et al. (2021)), none of the scientific literature has attempted to measure a formal statistical relationship between NO2 and economic activities globally.2

2 Our search revealed just two scientific studies quantifying correlations between NO2 and official measures of economic activity, and the focal points of these analyses are particular places or times. First, Montgomery and Holloway (2018) present correlations between trend economic growth and NO2 in a study of 38 cities from 2005 to 2011. Second, Wei et al. (2020) note a correlation between NO2 and economic activities in select U.S. cities during the COVID-19 crisis. No other work generalizes these results globally, develops an NO2-based measure of economic activity, or considers our applications.

NO2 may also be interpreted as a new variant in big data efforts to quantify cyclical fluctuations. For example, across countries in Sub-Saharan Africa, Buell et al. (2021) present a suite of high-frequency indicators, including Google search trends and mobile payments. Similarly, across ZIP codes in the United States, Chetty et al. (2020) build a database that uses anonymized data from private companies to proxy consumer spending, business revenues and employment rates nearly in real-time. Likewise, Goolsbee and Syverson (2021) and others rely on mobility data. The advantage of air pollution data over any of these alternative sources is that satellite NO2 readings have global coverage, are available in real-time, are entirely open access, and are now reaching their maturity, with comprehensive data back over 15 years.

Outline of the Paper. The paper proceeds as follows: Section 2 provides background on satellite measurements of NO2. Section 3 describes the data in further detail, including summary statistics and other basic characteristics. In Section 4, we describe the methodology, including our objective procedure for identification. Section 5 showcases the baseline results, and Section 6 applies the methods in several applications. Section 7 concludes.

2 Satellite Measurement of Nitrogen Dioxide

Background. We use air pollution as a measure of economic activity. However, not all air pollution data, including that measured at ground stations, is suited to such a task. First, ground-level air pollution monitoring stations have an uneven spatial distribution. Measuring pollution at an urban intersection is not the same as measuring it in the countryside. Additionally, these stations are sparsely located, particularly in developing countries, where grid cells may contain just one station or none at all. Second, even when measured, these readings may subsequently be doctored due to political motives, such as meeting mandatory targets of energy intensity (Ghanem and Zhang, 2014).

These features are some of the primary reasons why “top-down” measurement of air pollution from satellites has become prevalent in scientific studies (Lin and McElroy, 2011; Wang et al., 2012). NO2 specifically has been of keen interest to scientists, and top-down measurements have proven fruitful in several contexts. For example, they have been used to infer atmospheric densities over China, where monitoring station data is suspect (Richter et al., 2005). An air pollutant with serious implications for the environment and health, near-Earth NO2 is primarily due to human economic activities involving combustion (e.g., vehicles, power plants, and factories).

Still, the availability of satellite readings of other air pollutants more familiar to some people, including carbon dioxide (CO2), raises the question of why satellite readings of air pollution have not previously been used to infer economic activity.
The reason is that there may be significant differences between concentrations measured by satellites (stock) and emissions more indicative of ongoing economic activity (flow). For many air pollutants including CO2, there may be substantial movement between the source of emissions and place of measurement due to wind, necessitating a sophisticated atmospheric chemical transport model (e.g. Middleton (1995)).

Why NO2? It is specifically the fact that NO2 stays in place, and is thus directly indicative of combustion below it, which makes it a preferable signal of economic activity among all air pollutants. Indeed, scientific papers have supported the veracity of satellite retrievals of NO2 densities in particular (Geddes et al., 2016). The short atmospheric life of NO2 (less than a day) means that densities are closely correlated to emissions (Seinfeld and Pandis, 2016).3 While these densities are also possibly subject to other errors, such as non-human sources (e.g., soil emissions, wildfires, and lightning), these have comparatively small effects. In this paper, we use near-Earth densities of NO2, which are closely linked with anthropogenic emissions due to NO2’s low transport, and apply a statistical procedure to account for measurement errors.

Sample. We make use of vertical column densities (VCDs) of tropospheric NO2 from the Ozone Monitoring Instrument (OMI), a nadir spectrometer onboard NASA’s Earth Observing System (EOS) Aura satellite (Lamsal et al., 2008, 2010). The units of densities are molecules per square centimeter, typically on the order of 1e15. The column has a high spatial resolution of 0.1° × 0.1° (latitude by longitude), which is equivalent to 13 km × 24 km at the nadir, i.e. directly downwards towards the center of the Earth in the field of view.4 The height of the columns is the troposphere, about the maximum cruising altitude of commercial aircraft, and more indicative of ground-level human activity than readings higher in the atmosphere.
OMI is a spectrometer, a scientific instrument which uses backscattered solar visible and ultraviolet radiation to deduce aerosol densities. Consequently, measurements always occur during daylight, since solar wavelengths are required for a spectrometer to function. This is inherently opposite to the conditions required to measure night lights, a key feature we make use of later towards identification. The satellite on which OMI is mounted, Aura, is part of the “A-Train” afternoon constellation, a sun-synchronous line of satellites moving in a pole-to-pole route. These satellites are oriented as such purposefully to provide data from multiple instruments simultaneously at the same time each day, crossing the equator around 1:30 PM local time.

3 NO2 is in fact so easy to measure relative to other air pollutants that it has been used to validate measures of these other atmospheric species, like CO2, over China (Berenzin et al., 2013). It is also well-known that NO2 serves as an effective proxy for human-made nitrogen oxides (NOx = NO + NO2) (Leue et al., 2001; Velders et al., 2001). During the daytime, a chemical process converts nitric oxide (NO) into NO2 (and vice versa) within minutes. It is common practice to use measurements of NO2 densities as a proxy for human-made NOx.

4 Although the grid is of uniform size in degrees, due to the curvature of the Earth, the geographic area of this grid varies in the field of view, i.e. swath. For instance, longitude lines approach each other when moving from the equator to either pole. In the analysis, we convert the degree grid from latitude and longitude coordinates to geographic area using elementary trigonometric principles and weight the measured density by geographic area, consistent with accepted practices.
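The area weighting of the 0.1° × 0.1° grid described in the footnote on grid cell size can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes a spherical Earth with a mean radius of 6371 km, and the function names `cell_area_km2` and `area_weighted_mean` are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; not a value taken from the paper


def cell_area_km2(lat_center_deg, cell_deg=0.1):
    """Area of a cell_deg x cell_deg latitude-longitude cell on a sphere."""
    lat = np.radians(np.asarray(lat_center_deg, dtype=float))
    half = np.radians(cell_deg) / 2.0
    dlon = np.radians(cell_deg)
    # Exact spherical area between two parallels and two meridians.
    return EARTH_RADIUS_KM**2 * dlon * (np.sin(lat + half) - np.sin(lat - half))


def area_weighted_mean(density, lat_center_deg, cell_deg=0.1):
    """Area-weighted average of gridded densities, skipping missing (NaN) cells."""
    density = np.asarray(density, dtype=float)
    area = cell_area_km2(lat_center_deg, cell_deg)
    valid = ~np.isnan(density)
    return float(np.sum(density[valid] * area[valid]) / np.sum(area[valid]))


# Hypothetical readings (units of 1e15 molecules/cm2) at the equator and at 60 degrees north.
example_mean = area_weighted_mean([1.0, 4.0], [0.0, 60.0])
```

A cell centered at 60 degrees latitude covers half the area of an equatorial cell of the same angular size, so the high-latitude reading receives half the weight in the country average.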
NASA’s EOS-Aura OMI science team at Goddard Space Flight Center retrieves, filters, and publishes the data in easily usable format, at daily frequency up through just a few days earlier, with informative illustrations, videos, and context. At the time of writing, easily accessible city-level OMI NO2 VCDs with visualizations were available at the link below.

Data and Visualizations:5 https://so2.gsfc.nasa.gov/no2/no2_index.html

Finally, we note that while alternative NO2 VCDs from other satellites are available back to the mid-1990s, accumulating all into one analysis leads to technical inconsistencies due to differences in instruments. Thus, the fact that OMI has at this point consistently made readings using the same instrumentation for over 15 years is the crux of why a systematic global macroeconomic study like ours is just now possible. We provide additional information, and other NO2 satellite data past, present, and future, in Online Appendix A.1.

5 (Lamsal et al., 2021). Alternative ways to access the complete raw data set, such as to compute averages over all country area used later in this paper, are available through NASA’s Earth Science Data Systems (ESDS) at https://earthdata.nasa.gov/ and GES DISC tool at https://disc.gsfc.nasa.gov/. Visualization tools are available at https://worldview.earthdata.nasa.gov/ and https://giovanni.gsfc.nasa.gov/giovanni/.

3 Data

3.1 Basic Characteristics

Summary Statistics. We begin with some simple summary statistics for the OMI NO2 data we utilize. Table 1 Panel A compares the data set, which consists of country average VCDs per area, to GDP as reported in v. 10.0 of the Penn World Tables. Also included is night lights per area, formed from data joining older DMSP-OLS sensors with newer VIIRS sensors.6 It is apparent that the primary differences between NO2 and lights are that NO2 has an overall lesser average growth rate, and is more coarsely gridded. With respect to the latter point, as Donaldson and Storeygard (2016) note (p. 192), though there may be a tendency to equate resolution with information value, this may not always be the case; as we will show, this difference in grid size does not diminish the explanatory power of NO2 versus lights at the country-level. Moreover, other NO2 sensors present and future have higher resolution if a more fine-grained analysis is of interest.7 The lesser average growth rate of NO2 is at least in part due to heterogeneity in decarbonization efforts across countries, as we now discuss.

6 We discuss the construction of the night lights data set we use, along with all technical definitions, in Online Appendix B.1. In short, we use inter-calibrated DMSP-OLS data until the end of 2013, when the improved stray light-corrected, stable-lights VIIRS product becomes available (see Mills et al. (2013) for this data product and Gibson et al. (2020) for why VIIRS is preferable to DMSP-OLS when available). With respect to GDP, the most recent vintage of the Penn World Tables v. 10.0 ends in 2019. To compute GDP in 2020, we used World Bank data on 2020 growth for all 81 countries with data available at the time of writing. Thus, we consider an unbalanced panel.

7 The resolution of lights is 30 arc-seconds for the DMSP-OLS database and less for VIIRS. An arc-second is one sixtieth of an arc-minute, which is one-sixtieth of a degree of latitude or longitude. Thus, OMI’s resolution of 0.1° × 0.1° is about 360 arc-seconds, or 12 times as coarse as night lights. An alternative European Space Agency NO2 instrument, TROPOMI, with data beginning in 2017, is about 3 times as coarse as night lights. The upcoming TEMPO instrument for NO2 has resolution near lights. See Online Appendix A.1 for more information.

Table 1: Summary Statistics: Annual Percent Growth, 2005-2020.

A. Overall.
Variable   Countries  Years   Mean   Std. Dev.  Minimum  Maximum  Lat. × Lon.
%∆GDP      178        05-20   3.06    5.02      -36.39    59.53
%∆NO2      178        05-20   0.37    8.71      -32.60   101.17   0.1° × 0.1°
%∆Lights   178        05-20   9.44   42.22      -53.24   545.27   < 0.01° × 0.01°

B. Means, By-Data Quality Grade.
Grade  Countries  %∆GDP  %∆NO2  %∆Lights
A      16         1.10   -2.14   2.70
B      13         1.63   -1.28   3.33
C      109        3.12    0.67   8.27
D      27         4.50    1.15  16.81
E      10         4.33    1.75  24.79
All    175*       3.06    0.37   9.44

Notes. Henceforth, “NO2” or “Lights” means per area. “GDP” is real GDP at constant 2017 prices. *3 countries out of the full sample of 178 appearing in Panel A are ungraded, and therefore do not appear in Panel B: Curaçao, Palestinian Territory, and Sint Maarten. Country lists by-grade are reported later in Table 7.

Data Quality Grades. Table 1 Panel B breaks down the mean growth rates in Panel A by country data quality grade. These grades are provided in the Penn World Tables on the basis of countries’ capacity for compiling accurate statistics. For consistency, we use Chen and Nordhaus (2011)’s expanded version with grade “E” meaning those countries having basically no statistical organizations.8 In general, higher grades are associated with more developed economies. It is no coincidence that grade A and B countries have trend NO2 declining over time, which is primarily due to decarbonization efforts, akin to the theory of an environmental Kuznets curve. This is unlike lights, which are associated with upwards trends over time across all grades, and with conditional convergence. So, when considering economic growth, i.e. trend changes in output, night lights are most useful. Critically, this conclusion flips when we consider off-trend fluctuations in NO2, which we now show are indicative of business cycles.

8 An alternative ranking from the World Bank favored by Henderson et al. (2012) is more careful to rank countries regarding GDP accuracy specifically, but is only available for 114 countries in the sample and importantly contains no Penn World Tables “A” graded countries. These two rankings are positively correlated, but there are exceptions. Later when we report substantive results regarding data quality grades in Section 6, we also show they are robust to this alternative.

3.2 Time Series Characteristics

Figure 1 presents the example of China, to provide some basic intuition regarding NO2’s superior ability to capture cyclical movements. The left-hand panels show NO2, while the right-hand panels show lights. The top row shows raw data, while the bottom row shows the cyclical component. All panels cover the same period, from January 1, 2015 to January 1, 2021, which includes two significant downturns: August 24, 2015’s “Black Monday”, when the Shanghai main share index lost 8.49% of its value, and January 1, 2020, a rough estimate for the beginning of the COVID-19 pandemic. Both downturns are clearly evident in the raw data and the cyclical component of NO2, which fell significantly in their aftermath. However, the same may not be said for lights; whereas Black Monday is evident in the cyclical component, the pandemic is not. Thus, regardless of the fact that NO2 is overall downward-trending, the cyclical component seems to contain more information regarding changes in economic activity in this case.

Figure 2 changes the perspective to average world NO2, and back to the 2009 Financial Crisis. From this perspective as well, the potential for NO2 to indicate the timing of global business cycles is evident; for instance, the ramifications of the 2015 episode in China are also visible in world data, as are both the 2009 and 2020 crises.9 Thus, a hypothesis we wish to test is whether NO2 reflects cyclical economic activity better than lights globally.

3.3 Cross-Sectional Characteristics

Thus far we have emphasized time series fluctuations. But NO2 also contains important cross-sectional information.
While NO2 has broader applications than just the COVID-19 pandemic, this is nonetheless an example of a particularly stark reduction in economic activity where we may see geographic heterogeneity clearly.

China. Figure 3 depicts the change in NO2 in China, provinces outlined, between the first quarter of 2019 and 2020. Within the area of most significant downturns (blue) is Hubei province, containing Wuhan. Amongst all world countries, China is notable for how uniform the reduction in NO2 was in its populated regions, reflective of the efficacy of lockdowns in spurring collective action. For reference, China had some of the highest global concentrations of NO2 before the crisis. World maps of NO2 in 2019 previous to the pandemic, and lockdown effects in 2020, are depicted in Online Appendix A.3.

Europe. Figure 4 depicts the change in NO2 in Europe between the second quarter of 2019 and 2020, to coincide with its slightly later timing of shutdowns. Shutdown effects are particularly apparent in urban centers, with only sporadic increases in NO2. While many of the decreases appear to be large, making inferences requires a statistical framework. For example, we will show that the significant decline in NO2 in the U.K. is still not enough to corroborate its remarkable reported -10.37% change in GDP. We provide a close-up of northern Italy in a two week interval around its stringent lockdown and compare it with Wuhan in Online Appendix A.4. There, we also show that reductions in NO2 are correlated with formal indexes of lockdown stringency.

United States. Figure 5 depicts the U.S. It is evident that shutdown effects were strong on the coasts (blue), but the same was not true everywhere, especially the south, where there are uniform areas of increasing NO2 (red). This is distinct from the implicit collective action in China and may not be explained by seasonal effects, since the difference is taken from a year earlier. A possibility is that, for social reasons correlated with political views, this is indicative both of more lax shutdown orders and resistance to lockdown orders when mandated (Bazzi et al., 2021). In Online Appendix A.5, we corroborate this thesis by showing that county-level 2016 presidential election voting for Donald Trump was associated with less reduction in NO2.

9 NO2 was already below trend before the pandemic due in part to the 2019 manufacturing recession, which is more clearly seen in U.S.-specific data in Online Appendix A.2.

[Figure 1: four panels, each with “Black Monday” and “Pandemic” marked on a horizontal axis running from 2015 to 2021. Top row: “NO2: Data” (vertical column density, 1e15 mol/cm2) and “Lights: Data” (radiance, W/(cm2-sr)). Bottom row: “NO2: Cyclical Component” and “Lights: Cyclical Component” (% difference from trend).]

Figure 1: China: NO2 and Night Lights, Data and Cyclical Component. Notes: Underlying NO2 data is daily OMI VCD readings and night lights are monthly VIIRS radiance (stable lights, stray-light corrected). For NO2, 365 day daily rolling average is used while for lights, 12-month rolling average is used. For both, trend is calculated using Hamilton (2018) filter.

[Figure 2: single panel, NO2 % difference from trend, with “Financial Crisis” and “Pandemic” marked on a horizontal axis running from 2008 to 2021.]

Figure 2: NO2: World Average, Cyclical Component. Notes. 365 day rolling average. “Financial Crisis” date is Sep. 17, 2008 failure of Lehman Brothers. “Pandemic” is marked Jan. 1, 2020. Trend component calculated using Hamilton (2018) filter.
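The notes to Figures 1 and 2 state that the trend is computed with the Hamilton (2018) filter. A minimal sketch of that regression-based filter is given below. The horizon `h` and lag count `p` used here (24 and 12 for monthly data) and the simulated series are assumptions for illustration; the text does not report the settings actually used.

```python
import numpy as np


def hamilton_filter(series, h, p):
    """Split a series into trend and cycle with the Hamilton (2018) regression.

    The value at date t+h is regressed on a constant and the p most recent
    values as of date t; fitted values are the trend, residuals the cycle.
    """
    y = np.asarray(series, dtype=float)
    n = y.size
    t = np.arange(p - 1, n - h)  # dates with p lags and an h-ahead target
    X = np.column_stack([np.ones(t.size)] + [y[t - k] for k in range(p)])
    target = y[t + h]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    trend = np.full(n, np.nan)
    cycle = np.full(n, np.nan)
    trend[t + h] = X @ coef
    cycle[t + h] = target - X @ coef
    return trend, cycle


# Hypothetical monthly log series: slow growth, small noise, and a temporary 10% dip.
rng = np.random.default_rng(0)
months = np.arange(120)
log_no2 = 0.002 * months + rng.normal(0.0, 0.005, months.size)
log_no2[100:106] -= 0.10
trend, cycle = hamilton_filter(log_no2, h=24, p=12)
```

The first `h + p - 1` observations have no trend estimate because the regression needs `p` lags and an `h`-step-ahead target. Because the dip is not predictable from data two years earlier, it shows up in `cycle` rather than in `trend`.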
[Figure 3: map of China with provinces outlined and major cities labeled. NO2 in China: difference between 2020 Q1 and 2019 Q1 (unit: 1e15 molecules/cm2; color scale from -10 to 10).]

Figure 3: China: Business Cycle Frequency Effects of COVID-19 Shutdown on NO2.

[Figure 4: map of Europe with major cities labeled. NO2 in Europe: difference between 2020 Q2 and 2019 Q2 (unit: 1e15 molecules/cm2; color scale from -3 to 3).]

Figure 4: Europe: Business Cycle Frequency Effects of COVID-19 Shutdown on NO2.

[Figure 5: map of the United States with major cities labeled. NO2 in the United States: difference between 2020 Q2 and 2019 Q2 (unit: 1e15 molecules/cm2; color scale from -2 to 2).]

Figure 5: United States: Business Cycle Frequency Effects of COVID-19 Shutdown on NO2.

4 Empirical Framework

To draw a comparison with night lights, we use its core empirical framework. After introducing it, we explain its key identification problem, and our new data-based strategy to overcome it.

4.1 Chen and Nordhaus (2011)-Henderson et al. (2012) Model

Here we review the error-measurement framework introduced by Chen and Nordhaus (2011) and Henderson et al.
(2012), which revolves around the fact that there is classical measurement error in GDP.10 The variance of this error varies by country data quality grade.

10 We use the notation of Henderson et al. (2012) with minor changes. Chen and Nordhaus (2011) named the model (p. 8,590).

Suppressing subscripts for now, for countries with data quality grade $g = A, \ldots, E$ we write:

$$z = y + \varepsilon_{z,g} \tag{1}$$

where $z$ is log GDP measured in national accounts, $y$ is log unobservable true output or income, and the variance of $\varepsilon_{z,g}$ is denoted $\sigma_{z,g}^2$. A log signal of economic activity $x$ (e.g. NO2 or lights) is related to $y$ via:

$$x = \beta y + \varepsilon_x. \tag{2}$$

Measurement error in the signal is assumed uncorrelated with the measurement error in GDP, which is defensible for satellite signals. $\beta$ is the growth elasticity of the signal. Combining the last two equations:

$$z = \psi x + [\varepsilon_{z,g} - (1/\beta)\varepsilon_x], \tag{3}$$

where $\psi = 1/\beta$ is the inverse growth elasticity. Due to errors-in-variables (correlation between $x$ and $\varepsilon_x$), the OLS estimator $\hat{\psi}$ is biased. Nonetheless, this is still a best-fit relationship for producing the best-fit values, called a proxy, $\hat{z} = \hat{\psi} x$. In all regressions, the model we estimate when constructing proxies is the following, with all subscripts now explicit for completeness:

$$z_{ijt} = \psi x_{ijt} + a_j \cdot t + b_i + c_j + d_t + e_{ijt} \tag{4}$$

Above, $e_{ijt}$ is an error, for time $t$, place $j$, and sensor $i$. $d_t$ is a time fixed effect and $c_j$ is a place fixed effect, both of which are systematically included in Chen and Nordhaus (2011) and Henderson et al. (2012). Our controls are distinct in two respects. First, our newer night lights sample contains observations from both the historical DMSP-OLS sensor from 2005-2013, and the more modern VIIRS sensor from 2014-2020. Thus, we include a sensor fixed effect, $b_i$. Second, we systematically include a country-specific time trend $a_j \cdot t$ unless otherwise noted. Thus, the specifications of primary interest to us involve off-trend fluctuations, unlike previous
lights studies, which primarily employ country-specific time trends as robustness tests.

The goal of the error-measurement framework is to compute an optimal composite estimate $\hat{y}$ of output using both reported GDP $z$ and the signal-derived proxy $\hat{z}$. The weights $\lambda_g$ on GDP which yield this optimal composite are,

$$\hat{y} = \lambda_g z + (1 - \lambda_g)\hat{z}, \tag{5}$$

$$\lambda_g = \frac{\sigma_x^2 \sigma_y^2}{\sigma_{z,g}^2(\beta^2 \sigma_y^2 + \sigma_x^2) + \sigma_x^2 \sigma_y^2}. \tag{6}$$

Say the variance of GDP measurement errors $\sigma_{z,g}^2$ is relatively large for one grade compared to another. Then, the weight on GDP, $\lambda_g$, intuitively falls relative to the weight on the proxy, $1 - \lambda_g$. And yet, while intuitive, the model is subject to a fundamental identification problem which prevents $\lambda_g$ from being calculated. We now describe the problem, and our solution.

4.2 Objective Identification by Instrumental Variables

The identification problem may be summarized as follows. Given there are $G$ distinct categories of data quality grades, the data supply $2 + G$ moments.

$$\mathrm{cov}(x, z) = \beta \sigma_y^2 \tag{7}$$

$$\mathrm{var}(x) = \beta^2 \sigma_y^2 + \sigma_x^2 \tag{8}$$

$$\mathrm{var}(z_g) = \sigma_y^2 + \sigma_{z,g}^2 \tag{9}$$

And yet, there are $3 + G$ parameters which must be estimated to compute composite estimates: $\beta, \sigma_y^2, \sigma_x^2, \sigma_{z,1}^2, \ldots, \sigma_{z,G}^2$. Thus, the model is under-identified, and one additional restriction must be imposed. Henderson et al. (2012)’s approach is to fix the signal-to-noise ratio for one group using a priori assumptions, which without loss of generality we may assume to be grade A: $\phi_A = \sigma_y^2 / (\sigma_y^2 + \sigma_{z,A}^2)$. Relatedly, Chen and Nordhaus (2011)’s approach is to fix $\sigma_{z,A}^2$ directly.

We advance a data-driven approach which allows the analyst to estimate the model without imposing subjective assumptions. Note that for signal $x$ we may write,

$$x = \beta z + (\varepsilon_x - \beta \varepsilon_{z,g}).$$
(10)

If we were to try to estimate the above relationship by OLS, there would be bias, due to the correlation of $z$ with $\varepsilon_{z,g}$.11 However, we are now working in the context of not just one, but two satellite signals, NO2 and night lights, and there is good reason to believe the measurement errors in the two are uncorrelated. As we will explain, many of the sources of measurement error in either signal are due to the timing of their measurement, and are inherently not shared. If this is so, and labeling the other signal (the one that is not $x$) as $w$, we have an additional $(3 + G)$th moment from estimating Equation 10 by instrumental variables to identify the growth elasticity, $\beta$.

$$\beta = \mathrm{cov}(x, w)/\mathrm{cov}(z, w) \tag{11}$$

11 This is exactly analogous to the bias in the estimated proxy discussed previously (Equation 3).

Threats to identification include the fact that obstructions like clouds affect both lights and NO2. In the analysis, we explain at length the rationale behind our identifying assumption and provide tests of overidentifying restrictions.

5 Baseline Results

Figure 6 sets the stage for our baseline results by simply plotting the relationship between log GDP and log NO2 per area, net of country and year fixed effects. As indicated by the statistically significant best-fit line, there is a positive relationship in the data, and a linear model captures the relationship well. We now utilize the empirical framework described in the previous section to formally solidify the relationship.

5.1 Proxy

Table 2 presents results from regressions of log GDP on a log signal. Thus, the reported coefficient is $\psi$, and the fitted values are proxies. Column 1 uses NO2 as a signal. There are 178 countries in the unbalanced panel over 2005-2020, resulting in a sample of 2,752 observations. The difference between this regression and Figure 6 is that we now include a country-specific time trend (Equation 4). Thus, we seek explanatory power for off-trend annual fluctuations.
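As an illustration of what "net of country and year fixed effects" means in Figure 6, the following minimal sketch removes unit and period means from a simulated panel and then fits the slope. It rests on several assumptions that are not taken from the paper: the panel is balanced (the paper's panel is unbalanced), the simulated magnitudes are arbitrary, no country-specific time trend is removed, and the signal is simulated without measurement error. The function names `two_way_demean`, `within_slope`, and `simulate_panel` are hypothetical.

```python
import random


def two_way_demean(values):
    """Remove unit and period means from a balanced panel.

    values[i][t] is the observation for unit i in period t. The transform
    v_it - mean_i - mean_t + grand_mean nets out additive unit and period
    fixed effects when every unit is observed in every period.
    """
    n_units, n_periods = len(values), len(values[0])
    unit_means = [sum(row) / n_periods for row in values]
    period_means = [sum(values[i][t] for i in range(n_units)) / n_units
                    for t in range(n_periods)]
    grand_mean = sum(unit_means) / n_units
    return [[values[i][t] - unit_means[i] - period_means[t] + grand_mean
             for t in range(n_periods)] for i in range(n_units)]


def within_slope(z, x):
    """OLS slope of two-way demeaned z on two-way demeaned x."""
    z_dm, x_dm = two_way_demean(z), two_way_demean(x)
    num = sum(zv * xv for zr, xr in zip(z_dm, x_dm) for zv, xv in zip(zr, xr))
    den = sum(xv * xv for xr in x_dm for xv in xr)
    return num / den


def simulate_panel(psi, n_units=200, n_periods=16, seed=0):
    """Simulate z_it = psi * x_it + unit effect + period effect + noise.

    All magnitudes here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    period_x = [rng.gauss(0, 0.1) for _ in range(n_periods)]
    period_z = [rng.gauss(0, 0.1) for _ in range(n_periods)]
    x, z = [], []
    for _ in range(n_units):
        unit_x, unit_z = rng.gauss(0, 1.0), rng.gauss(0, 1.0)
        x_row = [unit_x + period_x[t] + rng.gauss(0, 0.2)
                 for t in range(n_periods)]
        z_row = [psi * x_row[t] + unit_z + period_z[t] + rng.gauss(0, 0.05)
                 for t in range(n_periods)]
        x.append(x_row)
        z.append(z_row)
    return z, x


z_panel, x_panel = simulate_panel(psi=0.348)
psi_hat = within_slope(z_panel, x_panel)
print(f"within slope: {psi_hat:.3f}")
```

Because the simulated signal carries no measurement error, the recovered slope should sit close to the value passed in as `psi`; with a noisy signal the errors-in-variables bias discussed around Equation 3 would reappear.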
Robust standard errors, clustered by country, indicate NO2 is strongly statistically significant. The within-country adjusted R-squared of 0.805 indicates substantial explanatory power.

Column 2 repeats the same exercise, now using night lights as a signal, for the same sample.12 While night lights do have some explanatory power for cyclical fluctuations, their statistical significance is lower than that of NO2.

12 See Online Appendix B.1 for details, and correspondence with the baseline estimates in Henderson et al. (2012).

Figure 6: Raw Data, Net of Country and Year Fixed Effects. Vertical axis: ln(Real GDP); horizontal axis: ln(NO2). Notes. Annual data for 178 countries, 2005-2020. Unbalanced panel: total sample is 2,752. Slope is ψ = 0.348 and robust standard error, clustered by country, is 0.061 (statistically significant at 1% level). Here and in what follows “NO2” continues to mean per area.

For context, in Columns 3-5 we repeat the exercise again, but now using ground level signals which have less broad availability than the satellite signals. Column 3 uses ground station measured CO2 emissions from combustion sources, as reported by the International Energy Agency. As previously noted, while satellite data is not subject to politically-motivated manipulation, this data may be. Moreover, at the time of writing, it was available for a slightly smaller sample of countries, and only until 2019. Column 4 contains log electricity consumption (units are kWh). Column 5 uses household survey response means from the World Bank’s PovcalNet database. These means aggregate income or

Table 2: Baseline Results.
Dependent variable in all columns: ln(GDP)

                 (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
ln(NO2)          0.113***                                          0.116***  0.123***  0.110***  0.071***
                 (0.025)                                           (0.025)   (0.025)   (0.025)   (0.019)
ln(Lights)                 0.030*                                  0.031**   0.020     0.022     0.020
                           (0.016)                                 (0.016)   (0.015)   (0.015)   (0.013)
ln(CO2)                              0.079***                                0.077***  0.065***  0.066**
                                     (0.028)                                 (0.026)   (0.024)   (0.031)
ln(Electricity)                                0.103***                                0.081***  0.060***
                                               (0.025)                                 (0.022)   (0.016)
ln(Survey)                                               0.328***                                0.349***
                                                         (0.069)                                 (0.051)
Observations     2,752     2,752     2,472     2,548     2,303     2,752     2,472     2,371     2,071
Countries        178       178       174       170       154       178       174       167       146
Years            05-20     05-20     05-19     05-19     05-19     05-20     05-20     05-19     05-19
R-Squared        0.805     0.803     0.816     0.811     0.866     0.806     0.821     0.824     0.882

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Notes. All specifications include country and year fixed effects, and a country-specific time trend. When night lights are used, fixed effects are included for either DMSP-OLS (pre-2014) or VIIRS (2014-2020) sensor. Both NO2 and lights are per area. Robust standard errors, clustered by country, are reported in parentheses. R-Squared is adjusted and within all fixed effects.

consumption distributional data from more than 1900 household surveys spanning 1967-2019 and 168 economies.13 As a caveat for this data, it is worth noting that not all these surveys are comparable in design and sampling methods. There are also sizable discrepancies between national accounts and household surveys.14

In Columns 6-9, we investigate the inclusion of multiple signals at once. Column 6 shows that both NO2 and lights have explanatory power when included jointly. However, when any of the additional signals are included in Columns 7-9, the statistical significance of lights disappears. Thus, NO2 is always a useful predictor of cyclical output, even when additional widely-used signals such as electricity consumption are available. The same cannot be said about lights.

Differences by Grade.
Table 3 Column 1 replicates Table 2 Column 6, in which both NO2 and lights were used as signals. Then, in Columns 2-4, we break down the results by data quality grade.15 Lights only have statistically significant explanatory power for cyclical fluctuations in the 37 out of 178 countries with the lowest data quality, consistent with the findings in Chen and Nordhaus (2011). On the other hand, the effect of NO2 is statistically significant across all grades. We reject the null hypothesis that the effect is the same across grades for lights at the 10% level (p-value of 0.053), but fail to do so at any conventional significance level for NO2.

13 Chen et al.; see Chen and Ravallion (2010) and Pinkovskiy and Sala-i Martin (2016) for representative applications.

14 See Deaton (2005) for a discussion of the sources of these discrepancies.

15 We consolidate A and B, and D and E, into joint groups to be more comparable with the larger sample in grade C.

Table 3: Differences By Country Data Quality Grade.

Dependent variable in all columns: ln(GDP)

                 (1)       (2)       (3)       (4)
Grades           All       A, B      C         D, E
ln(NO2)          0.116***  0.135***  0.079**   0.121**
                 (0.025)   (0.039)   (0.032)   (0.055)
ln(Lights)       0.031**   -0.013    0.016     0.059**
                 (0.016)   (0.012)   (0.023)   (0.029)
H0: Same Effect All Grades (F-Stat. [p-val.])
  NO2:           0.69 [0.503]
  Lights:        3.00 [0.053]
Observations     2,752     464       1,681     562
Countries        178       29        109       37
Year             05-20     05-20     05-20     05-20
R-squared        0.806     0.807     0.784     0.814

Notes. All specifications include country and year fixed effects, and country time trends. When night lights are used, fixed effects for either OLS (average over DMSP satellites; pre-2014) or VIIRS (2014-2020) are included. Robust standard errors, clustered by country, are shown in (parentheses) and p-values in [brackets]. R-Squared is adjusted and within fixed effects. Three countries, Curaçao, Palestinian Territory, and Sint Maarten, are ungraded.
NO2 is the only signal which consistently has explanatory power for cyclical economic fluctuations, even when other signals are available, across all countries, and with a common effect size.

Data Manipulation. Martinez (2019) uses night lights to quantify the global extent of data manipulation in GDP. This is achieved using the Freedom in the World (FiW) index of autocracy to augment the error-measurement framework for proxy estimation.16 Building on this insight, the FiW index is now added to the baseline specification in Table 2 in levels, squared, and as an interaction with lights. The coefficient on the interaction, known as the autocracy gradient, is the key parameter. When positive, this is indicative of data manipulation, because GDP increases more per unit increase in light radiance in autocracies than in democracies, and there is no reasonable alternative explanation but manipulation.

16 Freedom in the World (FiW) index: https://freedomhouse.org/report/freedom-world.

Table 4 Column 1 replicates Martinez (2019)’s results for annual fluctuations over 1992-2008. The interaction between lights and the FiW index is positive and strongly statistically significant, indicating data manipulation. However, lights are evidently not robust to different samples, which can obscure this result. For instance, Column 2 extends the time sample through 2013, which is the end of the DMSP-OLS sample. In this case, the interaction has a gradient of zero. And, when we change the sample to 2005-2020 in Column 3 using VIIRS data, the sample we use in our baseline results, the gradient becomes negative. One possibility is that this spurious result is in some way due to the inclusion of the 2020 downturn. For example, it has been speculated that manipulation is more likely to occur during recessions in China (Wallace, 2016).
In Column 4, we investigate this possibility by including an interaction dummy for whether the year is 2009 or 2020, which we designate a crisis, and find no difference.

The results change, however, when we use NO2 as the signal of economic activity instead of lights. In Column 5, we re-estimate the same specification used in Column 3 through 2020, simply replacing lights with NO2. Again, a positive interaction appears, indicative of the data manipulation previously seen in lights over 1992-2008. Moreover, in Column 6, we see that there is statistically significant evidence of misreporting during crisis years specifically, consistent with theory. Thus, using NO2 has measurable benefits in global applications.

Trend Economic Growth. Despite these findings supporting NO2, night lights are highly complementary, and applicable in different contexts. In Online Appendix B.2, we repeat the analysis described in Tables 2, 3, and 4, but removing country-specific time trend controls. Thus, predictive power for trend economic growth is now under consideration. One difference is that the overall explanatory power of night lights remains, regardless of what additional signals are available. Moreover, examining the differences by grade, we find that lights now explain trend growth not only in grade D and E nations, but C as well. It is also unclear how to interpret NO2-based estimates in either case, given the known differences in trends across nations that are omitted; perhaps as a result, the explanatory power of NO2 falls. Finally, we also find that there is lights-based evidence of data manipulation over 2005-2020 in terms of trend economic growth, and measurable differences in crisis years. Thus, in studies of economic growth in developing nations, lights are preferable. But in studies of global cyclical economic activity like ours, NO2 is preferable, and upon it we base the remaining analysis.
5.2 Identifying the Growth Elasticity of NO2

We are interested in annual fluctuations. Given that NO2 is a preferable signal for this purpose, we now wish to construct a composite estimate of economic activity using it, specifically. In order to do so, we need to estimate the growth elasticity of NO2, previously denoted β (Equation 2). As previously discussed, to retrieve this from a regression of log NO2 (x) on log GDP (z), we need an instrument which is correlated with true output, but which has measurement errors uncorrelated with both GDP and NO2. Since Table 2 Columns 2-5 can be interpreted as the

Table 4: Data Manipulation.

Dependent variable in all columns: ln(GDP)

                              (1)       (2)       (3)       (4)       (5)       (6)
ln(Lights)                    0.168***  0.152***  0.046**   0.046**
                              (0.046)   (0.047)   (0.019)   (0.019)
ln(NO2)                                                               0.062*    0.061*
                                                                      (0.033)   (0.034)
FiW                           0.015     0.015     0.036     0.039     -0.601    -0.622
                              (0.021)   (0.029)   (0.034)   (0.034)   (0.411)   (0.416)
FiW^2                         -0.002    -0.003    -0.008    -0.008    -0.007    -0.007
                              (0.004)   (0.005)   (0.005)   (0.005)   (0.005)   (0.005)
Manipulation: Lights
ln(Lights)×FiW                0.008*    -0.000    -0.004**
                              (0.004)   (0.005)   (0.002)
ln(Lights)×FiW: No Crisis                                   -0.004**
                                                            (0.002)
ln(Lights)×FiW: Crisis                                      -0.005**
                                                            (0.002)
Manipulation: NO2
ln(NO2)×FiW                                                           0.018
                                                                      (0.012)
ln(NO2)×FiW: No Crisis                                                          0.019
                                                                                (0.012)
ln(NO2)×FiW: Crisis                                                             0.022**
                                                                                (0.011)
Observations                  2,855     3,703     2,622     2,622     2,622     2,622
Countries                     170       170       170       170       170       170
Years                         92-08     92-13     05-20     05-20     05-20     05-20
Within R-Squared              0.744     0.732     0.807     0.821     0.821     0.822

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Notes. All specifications include country and year fixed effects, as well as a country-specific time trend. Sensor fixed effects included for lights when sample goes beyond 2013. Robust standard errors, clustered by country, are reported in parentheses. Column 1 replicates Martinez (2019), August 2019 version, Table 2, Column 6.
Eight countries included in the previous 178 country baseline have no Freedom in the World index available in this time sample: Aruba, Anguilla, Cabo Verde, Cayman Islands, Montserrat, Serbia, Sint Maarten, Turks and Caicos Islands. Columns 4 and 6 include a dummy interaction for crisis years, which are designated to be 2009 and 2020. In both, an interaction of this dummy with FiW and FiW^2 is also included as an unreported control.

first stage from two stage least squares, lights, CO2, electricity, and surveys are all relevant.

Table 5 Column 1 presents an OLS estimate of β = 0.168. This baseline estimate is presumably attenuated due to errors-in-variables. Our hypothesis is that night lights are the most promising instrument, since they are also available for the full sample, but always measured at a different time from NO2, and so there is reason to believe their measurement errors are uncorrelated. Column 2 proceeds using night lights as an instrument, finding a larger estimate of β = 0.262, and corroborating the hypothesis of attenuation in the OLS estimate. Tests for relevance and weak identification indicate that lights are both relevant and not weak.

Threats to Identification. Our primary identifying assumption is that the measurement errors in NO2 and lights are uncorrelated. At face value this seems likely, as the two sensors always make measurements at different times: NO2 during the day, and lights at night. However, there is room to critique any exogeneity assumption, which is inherently untestable. In the present case, there may be some concern that meteorological factors such as cloud cover are a confounding factor, since neither NO2 nor lights are measurable if cloud cover is too dense.

We defend our exogeneity assumption in three ways. First, as previously noted, OMI makes NO2 readings around 1:30 PM local time. Perhaps by coincidence, the VIIRS instrument makes night lights readings around 1:30 AM (Elvidge et al., 2013).
Therefore, readings could not be further apart, and there is more than ample opportunity for random changes in cloud cover due to constant, wind assisted movement. Second, there are a variety of other sources of significant measurement error which affect one sensor, but not the other because of their different timing. For example, auroral activity is a significant concern for night lights retrievals but not NO2. Third, there are other potential instruments available which corroborate the lights-based estimate.

In Table 5 Column 3, we appeal to the results of Pinkovskiy and Sala-i Martin (2016), who showed that measurement errors in PovcalNet survey means are uncorrelated with those of lights. Thus, these represent another potential instrument, in which we can have some particular confidence. Survey means cover a smaller sample of countries than satellite-based signals, so they are perhaps not the best instrument for our purposes. However, by including both lights and survey means as instruments, we now have an overidentified model, and can test those overidentifying restrictions with a standard Sargan (1958)-Hansen (1982) J-statistic. We find that it is not possible to reject the null hypothesis that both overidentifying restrictions are valid, with a p-value of 0.273. Of course, validity of overidentifying restrictions does not necessarily imply instrument exogeneity (e.g. Deaton (2010)). But we may also see that the estimate of β changes only marginally with the inclusion of this additional instrument.

Robustness Checks. In Table 5, Columns 4 and 5, we further scrutinize our lights-based estimate in Column 2 by including electricity and CO2 as additional instruments. Again, we may not reject the validity of overidentifying restrictions. Moreover, all instruments are jointly

Table 5: Instrumental Variables Identification of Growth Elasticity of NO2 (β).
Dependent variable in all columns: ln(NO2)

                                              (1)       (2)       (3)       (4)       (5)
ln(GDP)                                       0.141***  0.262**   0.198***  0.226***  0.271***
                                              (0.031)   (0.116)   (0.068)   (0.071)   (0.070)
Instruments
  ln(Lights)                                            ✓         ✓         ✓         ✓
  ln(Survey)                                                      ✓         ✓         ✓
  ln(Electricity)                                                           ✓         ✓
  ln(CO2)                                                                             ✓
Relevance
  Kleibergen-Paap rk LM statistic                       14.208    16.933    18.491    26.188
  (H0: Underidentified)                                 [<0.001]  [<0.001]  [<0.001]  [<0.001]
Weak Identification
  Cragg and Donald (1993) Wald F statistic              200.7     574.1     521.7     496.1
  Kleibergen and Paap (2006) rk F statistic             16.0      30.9      28.1      33.8
Overidentifying Restrictions
  Sargan (1958)-Hansen (1982) J statistic                         1.423     4.204     4.916
  (H0: All Exogenous)                                             [0.273]   [0.122]   [0.178]
Observations                                  2,752     2,752     2,303     2,226     2,071
Countries                                     178       178       154       149       146
Year                                          05-20     05-20     05-19     05-19     05-19
R-squared                                     0.049     0.013     0.045     0.039     0.035

Notes. All specifications include country and year fixed effects. When using nights lights, fixed effects for either OLS sensor (average over DMSP satellites; pre-2014) or VIIRS (2014-2020) included. Robust standard errors, clustered by country, are reported in (parentheses) and p-values in [brackets]. R-Squared is adjusted and within all fixed effects, where applicable.

Table 6: Optimal Weighting on GDP in Composite Estimate, λ.

Country Data Quality Grade (g)                               A       B       C       D       E
Variance of GDP Measurement Error ($\sigma_{z,g}^2$; 1e-3)   0.73    2.90    4.94    12.99   28.69
Weight on GDP in Composite Estimate ($\lambda_g$)            0.91    0.72    0.60    0.36    0.20
Observations                                                 240     208     1,697   412     150
Countries                                                    16      13      109     27      10

Notes. All specifications use baseline β = 0.262 from instrumental variables estimation (Table 5). Then, $\sigma_{z,g}^2$ is estimated individually by grade g = A, ..., E, which yields $\lambda_g$ according to Equation 6.

relevant and not weak. Though the specification with all instruments in Column 5 covers a smaller sample than our original Column 2, which used only lights as an instrument, we reach an estimate of β = 0.271, full circle back to our initial lights-based estimate in Column 2 of β = 0.262.
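The logic of the instrumental variables estimates can be illustrated with a small simulation of Equations 1, 2, 10 and 11. This is only a sketch under stated assumptions, not the paper's estimation code: the variances of true output, NO2 error and GDP error are set to magnitudes of the order reported in Section 5.3 and Table 6 (grade C), while the second signal's loading on output (0.5) and its error variance (5e-3) are hypothetical. All measurement errors are drawn independently, which imposes the identifying assumption by construction rather than testing it.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(beta, n=100_000, seed=1):
    """Draw reported GDP z, signal x, and a second signal w from latent y."""
    rng = random.Random(seed)
    sd_y = 7.74e-3 ** 0.5   # latent output (illustrative magnitude)
    sd_z = 4.94e-3 ** 0.5   # GDP measurement error (illustrative magnitude)
    sd_x = 5.48e-3 ** 0.5   # signal measurement error (illustrative magnitude)
    sd_w = 5.0e-3 ** 0.5    # second-signal error: hypothetical
    y = [rng.gauss(0, sd_y) for _ in range(n)]
    z = [yi + rng.gauss(0, sd_z) for yi in y]            # Equation 1
    x = [beta * yi + rng.gauss(0, sd_x) for yi in y]     # Equation 2
    w = [0.5 * yi + rng.gauss(0, sd_w) for yi in y]      # hypothetical loading
    return z, x, w


z, x, w = simulate(beta=0.262)

# OLS of x on z is attenuated because z contains its own measurement error.
beta_ols = cov(x, z) / cov(z, z)
# Equation 11: instrumenting z with w removes the attenuation, provided the
# errors in w are uncorrelated with the errors in x and z.
beta_iv = cov(x, w) / cov(z, w)

print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

In this setup the OLS slope converges to β multiplied by the signal-to-noise ratio of reported GDP, so it falls below the value of β used in the simulation, whereas the ratio of covariances in Equation 11 recovers it up to sampling noise.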
Finally, we note that we have already shown the relationship between GDP and NO2 to be in common across nations in Table 3, and use β = 0.262 henceforth.

5.3 Composite Estimate

With the estimated growth elasticity of NO2 pinned down, we have enough moments to estimate the weighting $\lambda_g$ on GDP in the NO2 composite estimate, by grades $g = A, \ldots, E$. To do so, we first obtain two estimates for the whole sample of countries, conditional on $\beta$: the variance of true, latent economic activity $\sigma_y^2 = \mathrm{cov}(x, z)/\beta = 7.74\text{e-}3$ and NO2 measurement error $\sigma_x^2 = \mathrm{var}(x) - \beta^2 \sigma_y^2 = 5.48\text{e-}3$. With this information, we may infer the country grade-specific variance of GDP measurement error $\sigma_{z,g}^2 = \mathrm{var}(z_g) - \sigma_y^2$ for each grade A through E, and compute country grade-specific GDP weights $\lambda_A, \ldots, \lambda_E$ using Equation 6.

Table 6 reports the results. As we move from grade A countries toward E, the variance of measurement error in GDP rises, as expected. As a result, the weight on GDP in the composite estimate containing NO2 falls, consistent with intuition there should be less weight on GDP when it is more erroneous. The weight on GDP is never 1, even for Grade A countries, indicating there is always some weight on the NO2 proxy.

Implications and Context. The fact that the NO2 proxy has information value beyond GDP even for grade A countries is important. Generally, it is thought that GDP from countries in this category encapsulates all available information. Using a fully data-dependent approach, we have shown that, for A-rated countries like the U.S., fully 9% of the optimal measure should come from satellite measures of NO2. For B ratings, including Spain and Germany, 28% of the composite estimate comes from NO2. In C-rated countries, including China, that figure rises to 40%. In D and E, including Zimbabwe and the Republic of Yemen, the NO2 weights reach 64% and 80%, respectively.
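The calculation behind Table 6 can be traced with a few lines of code. The sketch below is an illustration, not the authors' code: it takes the rounded values reported in the text and in Table 6 (β = 0.262, the two whole-sample variances, and the grade-specific GDP error variances) as inputs and applies Equation 6. The function names `variances_from_moments` and `gdp_weight` are hypothetical.

```python
def variances_from_moments(beta, cov_xz, var_x):
    """Back out the whole-sample variances from Equations 7 and 8."""
    sigma_y2 = cov_xz / beta                  # variance of latent output
    sigma_x2 = var_x - beta ** 2 * sigma_y2   # variance of signal error
    return sigma_y2, sigma_x2


def gdp_weight(beta, sigma_y2, sigma_x2, sigma_z2):
    """Weight on reported GDP in the composite estimate (Equation 6)."""
    signal = sigma_x2 * sigma_y2
    return signal / (sigma_z2 * (beta ** 2 * sigma_y2 + sigma_x2) + signal)


# Rounded inputs as reported in Section 5.3 and Table 6.
BETA = 0.262
SIGMA_Y2 = 7.74e-3
SIGMA_X2 = 5.48e-3
SIGMA_Z2 = {"A": 0.73e-3, "B": 2.90e-3, "C": 4.94e-3,
            "D": 12.99e-3, "E": 28.69e-3}

weights = {grade: gdp_weight(BETA, SIGMA_Y2, SIGMA_X2, sigma_z2)
           for grade, sigma_z2 in SIGMA_Z2.items()}

for grade, lam in weights.items():
    print(f"grade {grade}: weight on GDP {lam:.2f}, on NO2 proxy {1 - lam:.2f}")
```

Because the inputs are rounded, the resulting weights need not match Table 6 to the last digit, but they should agree to within roughly 0.02 and decline from grade A to grade E. `variances_from_moments` is included only to show how the two whole-sample variances follow from the moments; the raw moments themselves are not reported in this section.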
The literature on night lights finds weightings on lights-based proxies which are considerably lower. We have found that for grades B and C, 28% to 40% of the composite measure should come from NO2. In comparison, Henderson et al. (2012) find that only about 15% of the optimal composite estimate should come from night lights for countries in these groups. Chen and Nordhaus (2011) find an even lower weight of 2% for grade C countries, and 0% for grades A and B. For grades D and E, we found that 64% to 80% of the composite should come from NO2. Henderson et al. (2012) find a lesser weighting of about 52% for lights in comparable countries, while Chen and Nordhaus (2011)’s estimate is 37% or lower.

Sanity Checks. The fact that such a high weighting should come from the NO2 proxy, even in countries with very high data quality grades, may be surprising. However, it is by no means unreasonable. The intuition is that the proxy is a close fit to GDP for very high data quality countries, so any weighting between GDP and the NO2 proxy largely replicates GDP.

Figure 7 provides a different perspective to convey this basic intuition. It does so by comparing composite estimates for China (Grade C, left column) versus the United States (Grade A, right column). In the top row are time series for reported GDP and the composite estimate. In the bottom row is a clearer representation of the same data which depicts in which years these two measures coincide (the 45 degree line) versus when they do not.17 Here, the units are normalized to percentage change by year. For China, we find large deviations of the composite estimate from reported GDP specifically during the 2009 Financial Crisis and the 2020 COVID-19 pandemic.
In both cases, we find that reported GDP overstates true output during periods of crisis, consistent with other studies (Koech et al., 2012; Morris and Zhang, 2018).18 Overall, we find that growth was 1.57 percentage points lower than reported in 2009 and 1.28 percentage points lower than reported in 2020.

Figure 7’s implications for the U.S., which is a placebo where we might expect data to be accurate, are different. In this case, reported GDP and the composite are overlaid, even in crisis years (black “09” and “20”). Note that there is a higher weighting on GDP in the composite estimate simply because the U.S. is grade A. To make a fully objective comparison, we can also calculate a normalized composite estimate as though the U.S. were grade C, using the notation:

Red *: Normalized results as though country were in grade C, when not.

17 The dark best-fit line in such figures will always be weakly flatter than the 45 degree line due to the inherent property of reversion towards the mean arising from the regression estimation of the proxy. The flatter the line, the more erroneous the data.

18 There is broad disagreement concerning whether GDP in China overstates or understates economic activity. This is perhaps attributable to the fact that China’s reported GDP series are overly smooth, consistent with both (Nakamura et al., 2016). Our conclusion that the cyclical component of GDP overstates economic activity during crisis years is not necessarily inconsistent with conclusions of understatement, including by Clark et al. (2020), who do not study the cyclical component.

[Figure 7 panels. Top row: "China: Time Series" and "U.S.A.: Time Series", each plotting Reported Log Real GDP and the Composite Estimate by year, 2005-2020. Bottom row: "China" and "U.S.A."]
[Figure 7, bottom-row panels: Composite Estimate plotted against Reported % Chg. Real GDP, with points labeled by two-digit year, a Composite=Reported line, and Composite Fitted Values.]

            China                               US
            Reported  Composite  Difference     Reported  Composite  Difference
2009        7.54      5.97       1.57           -2.57     -2.45      -0.11
  *                                                       -2.06      -0.51
2020        1.92      0.64       1.28           -3.55     -3.65      0.10
  *                                                       -4.00      0.46

Figure 7: Reported GDP versus NO2 Composite: % change in China and US. Notes. The top row contains time series for log GDP or the composite estimate. The bottom row contains percentage change in each, viewed as scatter plot, by year. China is a grade C country (GDP weight $\lambda_C = 0.60$) and US is grade A (GDP weight $\lambda_A = 0.91$). Red asterisks (*) indicate what the composite estimate, and the resulting difference for the US would be if it were treated as grade C.

We find that even in this case (red “09” and “20”), the relative differences between reported and composite estimates are small. Thus, the NO2 composite estimate is completely reasonable.

6 Applications

We now turn to a number of applications of the NO2 composite estimate. The difference $\hat{y} - z$ between the composite estimate $\hat{y}$ and reported GDP $z$ is the amount of misstatement in GDP. When misstatement is positive, this indicates GDP understates true economic activity; when it is negative, this indicates overstatement. The standard deviation of misstatement is,

$$\mathrm{SD}(\text{Misstatement}) = (1 - \lambda_g)\,\mathrm{RMSE}, \tag{12}$$

where RMSE is the root mean square error from the regression to obtain the proxy. An ordering of all world countries based on this standard deviation provides a natural objective ranking of GDP accuracy. Such a world ranking is uniquely made possible by NO2, which both has data for all world countries, and explanatory power for fluctuations in each.
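Equation 12 follows directly from Equation 5: the composite minus reported GDP equals (1 − λ) times the proxy minus reported GDP, so misstatement is the proxy regression residual scaled by (1 − λ). The sketch below checks this numerically. The four observations are made-up values chosen only so the residuals average to zero, the RMSE is computed without a degrees-of-freedom correction (an assumption, since the paper does not state which convention it uses), and the function names are hypothetical.

```python
def composite(lam, reported, proxy):
    """Composite estimate from Equation 5: lam * z + (1 - lam) * z_hat."""
    return [lam * z + (1.0 - lam) * z_hat for z, z_hat in zip(reported, proxy)]


def rmse(residuals):
    """Root mean square of residuals (no degrees-of-freedom correction)."""
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


def misstatement_sd(lam, residuals):
    """Equation 12: (1 - lam) times the RMSE of the proxy regression."""
    return (1.0 - lam) * rmse(residuals)


# Hypothetical log GDP (reported) and fitted proxy values for one country.
reported = [0.02, -0.01, 0.03, 0.00]
proxy = [0.01, 0.01, 0.01, 0.01]
residuals = [z - z_hat for z, z_hat in zip(reported, proxy)]

lam_c = 0.60  # weight on GDP for a grade C country (Table 6)
misstatement = [y_hat - z
                for y_hat, z in zip(composite(lam_c, reported, proxy), reported)]

print(misstatement_sd(lam_c, residuals))
print(rmse(misstatement))
```

The two printed numbers coincide, which is the content of Equation 12. The same scaling gives a rough check on Figure 7: multiplying the U.S. 2009 difference of -0.11 under grade A by (1 − 0.60)/(1 − 0.91) gives about -0.49, close to the -0.51 shown for the grade C normalization, with the gap plausibly due to rounding.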
Table 7 presents estimates of this standard deviation over 2005-2020, by data quality grade, ranked from highest variability to lowest. Thus, those at the top of the list have the most erroneous GDP data. At the bottom of the table, the overall means of standard deviations increase from grade A to E, indicating that our estimates corroborate the existing grades on average, and are therefore not unreasonable. Specifically, the errors range from 0.25 to 5 percentage points on average, in line with Henderson et al. (2012)’s estimate of trend growth errors up to 3 percentage points. But at the same time, there are individual instances where our objective ordering represents a significant reordering of the existing subjective rankings. For example, Algeria, the most accurate of grade D countries, would be ranked as more accurate than Ireland, the most inaccurate of grade A countries. Overall, we judge that the Republic of Yemen had the most inaccurate GDP data over this period, and Australia the most accurate.

One potential issue with the Penn World Tables A through E grades is that they may not be reflective of GDP data accuracy. To further explore the implications of our rankings, we also considered the World Bank’s subjective ranking of GDP accuracy for the 114 countries in our sample of 178 which it grades. The analysis is presented in Online Appendix B.3. While there is a positive correlation between the Penn World Tables and World Bank subjective rankings, there are important differences. Regardless, our ordering entails a significant reordering of the World Bank rankings as well.

In the following section, we consider some of the countries with the most inaccurate GDP data at the top of Table 7, in each grade. In many cases, the ranking is due in part to a single, cyclical episode of especially erroneous data. As we have previously shown, being able

Table 7: Objective Data Quality: Standard Deviation of GDP Misstatement, Percentage Points, 2005-2020.
Rank A                 B                 C                          D                           E
1    Ireland 1.03      Greece 1.85       Venezuela, R.B. 9.25       Zimbabwe 10.74              Yemen 13.41
2    Denmark 0.37      Argentina 1.45    Syrian Arab Rep. 7.27      Central African Rep. 8.02   Congo 8.86
3    Luxembourg 0.31   Uruguay 1.17      Azerbaijan 5.25            Liberia 6.61                Chad 6.64
4    Netherlands 0.27  Spain 1.00        Equatorial Guinea 5.24     Anguilla 5.05               Angola 5.95
5    UK 0.24           Portugal 0.92     Qatar 5.07                 Comoros 4.37                Iraq 3.39
6    US 0.22           New Zealand 0.91  Maldives 3.59              Togo 4.36                   Djibouti 3.30
7    Italy 0.21        Singapore 0.89    Sierra Leone 3.58          Malta 4.16                  Uganda 2.23
8    Japan 0.21        Finland 0.85      Côte d’Ivoire 3.54         Belarus 3.64                Sudan 2.13
9    Sweden 0.20       Poland 0.61       Turks and Caicos 3.39      Cyprus 3.59                 Myanmar 2.03
10   Norway 0.19       Chile 0.57        Antigua and Barbuda 3.33   Suriname 3.23               Niger 1.97
11   Canada 0.18       Germany 0.53      Lebanon 2.91               Mongolia 3.23
12   Belgium 0.15      Israel 0.29       Estonia 2.71               Guinea 3.21
13   Austria 0.15      Korea, Rep. 0.29  Latvia 2.71                Namibia 2.76
14   France 0.14                         Grenada 2.55               Lesotho 2.53
15   Switzerland 0.11                    Benin 2.38                 Turkmenistan 2.44
16   Australia 0.09                      Iceland 2.31               Bhutan 2.12
17                                       Hungary 2.27               Tajikistan 2.05
18                                       Aruba 2.21                 Saudi Arabia 1.98
19                                       Ukraine 2.18               Montserrat 1.90
20                                       Armenia 2.14               Mozambique 1.84
21                                       Lithuania 2.07             Congo Dem. Rep. 1.72
22                                       Nigeria 2.05               Cambodia 1.57
23                                       Cayman Islands 1.99        Haiti 1.52
24                                       Romania 1.96               Lao PDR 1.43
25                                       Croatia 1.94               Uzbekistan 1.31
26                                       Gabon 1.92                 Guyana 1.31
27                                       Slovenia 1.91              Algeria 0.99
28                                       Iran, Islamic Rep. 1.84
29                                       Zambia 1.83
30                                       Brazil 1.79
31                                       Gambia, The 1.78
32                                       Nicaragua 1.68
33                                       Fiji 1.65
34                                       Trinidad and Tobago 1.64
35                                       Dominica 1.60
36                                       Peru 1.57
37                                       Senegal 1.55
38                                       Türkiye 1.55
39                                       Ecuador 1.53
40                                       Ghana 1.50
41                                       Montenegro 1.42
42                                       China 1.42
...                                      ...
107                                      Guatemala 0.51
108                                      Indonesia 0.49
109                                      Costa Rica 0.43
Mean 0.25              0.87              1.61                       3.25                        4.99

to consider cyclical episodes like these is the core advantage of NO2 versus lights.
Moreover, in each case there is at least one country in a data quality grade for which lights have no explanatory power, and for which NO2 is thus uniquely suited.

6.1 Instances Where GDP Is Particularly Suspect

Economic Crises and Hyperinflation. First, we consider the cases of the República Bolivariana de Venezuela and Zimbabwe, which are at the top of the list in grades C and D respectively. A shared feature of these countries is that they both experienced periods of significant economic distress, hyperinflation, and social unrest. Figure 8 depicts the annual percentage change in reported GDP versus the composite estimate for these countries. The República Bolivariana de Venezuela’s data indicate tremendous losses in GDP during a period of crisis from 2016-2019, culminating in a remarkable -43.08% change in GDP in 2019. While we confirm a large downturn in that year, we find that this figure significantly overstates the true decline, which was likely closer to -27.71%. To understand the source of this error, it is worth noting that while the data were compiled and published by the IMF, the Venezuelan government has refused to cooperate with IMF authorities in the calculation of GDP since 2004.¹⁹ The fact that 2019 followed several years of large declines raises the question of whether several years of underestimating the extent of the recession culminated in one year in which authorities were forced to reconcile previous errors. Zimbabwe, on the other hand, reported a remarkably high growth rate of 46.70% in 2009, which is especially surprising given the effects of the 2009 Financial Crisis elsewhere in the same year. We find that the true growth in output was closer to 19.89%. We also considered a grade C normalization for Zimbabwe, which weights GDP more heavily (*). Even in this case, our estimated composite would still only be 29.81%.
To explain this, we note that over 60% of Zimbabwe’s economy may be a shadow economy (Schneider and Enste, 2013). Our finding suggests that this segment of the economy may have responded more elastically to the 2009 global downturn than the measurable, formal economy.

Conflict. Figure 9 considers the Syrian Arab Republic and the Republic of Yemen, which are at or near the top of the list in Table 7, in grades C and E respectively, and both endured civil war during this period. Conflict has direct consequences for macroeconomic performance (Hönig, 2021) and is also known to cause changes in NO2 (Lelieveld et al., 2015). Both in the Syrian Arab Republic over 2012-2013, and in the Republic of Yemen in 2015, we find that actual output was likely higher than reported in GDP. This may be due to informal economies, characterized by unincorporated businesses that cannot be counted in GDP, and which are known to arise during conflict (Peksen and Early, 2020; Galdo et al., 2021). It is especially important to measure such uncounted income since civil wars are much more likely to occur in low-income countries where GDP is most unreliable (Blattman and Miguel, 2010).

¹⁹ The IMF’s estimate differs somewhat from the Penn World Tables, at -35%, closer to our estimate.

Figure 8: Hyperinflation. [Scatter plots for Venezuela and Zimbabwe of the composite estimate (vertical axis) against the reported % change in real GDP (horizontal axis), with points labeled by two-digit year. Rows marked * use the grade C normalization.]

Country     Year  Reported  Composite  Difference
Venezuela   2019    -43.08     -27.71      -15.37
Zimbabwe    2009     46.70      19.89       26.82
            *                   29.81       16.89

Oil Supply. Figure 10 depicts Azerbaijan and Nigeria, both grade C countries which experienced issues pertaining to oil in the sample. Azerbaijan reported world-leading GDP growth of 29.61% in 2006 and 22.68% in 2007.
Much of this was due to the completion of the Baku-Tbilisi-Ceyhan oil pipeline, which began moving oil pumped in Azerbaijan to world markets. While this no doubt resulted in a sudden increase in income, the question we may ask with satellites is how much of this is reflected in actual, measurable economic activity within Azerbaijan, rather than in the ledger income entries upon which GDP is based. We find evidence of somewhat less steep increases in economic activity, at 20.44% in 2006 and 16.43% in 2007. Nigeria, Africa’s largest economy, on the other hand experienced the first decline in its GDP in 25 years in 2016. This was primarily attributed to a decline in oil prices. We find that the reported decline of -1.63% may not have occurred; in fact, the economy may have grown by up to 0.74% that year. We note that, similar to Zimbabwe, over 60% of Nigeria’s overall economic activity is thought to arise from shadow economies (Omodero, 2019). At the same time, we find that the actual increase in Nigerian output during the 2009 Financial Crisis may have been 5.05%, which is 2.68 percentage points less than reported, perhaps also due to similar concerns.

Figure 9: Conflict. [Scatter plots for Syria and Yemen of the composite estimate (vertical axis) against the reported % change in real GDP (horizontal axis), with points labeled by two-digit year. Rows marked * use the grade C normalization.]

Country     Year  Reported  Composite  Difference
Syria       2012    -30.57     -20.94       -9.63
            2013    -30.52     -21.12       -9.39
Yemen       2015    -36.32     -13.71      -22.62
            *                  -24.88      -11.44

Figure 10: Oil Supply. [Scatter plots for Azerbaijan and Nigeria of the composite estimate (vertical axis) against the reported % change in real GDP (horizontal axis), with points labeled by two-digit year.]

Country     Year  Reported  Composite  Difference
Azerbaijan  2006     29.61      20.44        9.18
            2007     22.68      16.43        6.26
Nigeria     2009      7.73       5.05        2.68
            2016     -1.63       0.74       -2.37

Figure 11: Financial Distress. [Scatter plots for Greece and Argentina of the composite estimate (vertical axis) against the reported % change in real GDP (horizontal axis), with points labeled by two-digit year. Rows marked * use the grade C normalization.]

Country     Year  Reported  Composite  Difference
Greece      2011    -10.70      -7.95       -2.75
            *                   -6.80       -3.90
Argentina   2018     -2.51      -1.55       -0.86
            *                   -1.15       -1.36

6.2 More Nuanced Cases

We now consider instances of GDP errors for countries within grades A and B, which are less obvious, and yet still measurable.

Financial Distress. Figure 11 presents the example of financial distress in Greece and Argentina, which consequently are at the top of the list among grade B countries in Table 7. First is the Greek government debt crisis, which reached a crescendo in 2011-2012 and amounted to an overall reduction in GDP of 25% over 2008-2013 (Gourinchas et al., 2017). We find that the officially reported change in economic activity of -10.70% in 2011 was in reality closer to -7.95%, and would be even more moderate, at -6.80%, if Greece were treated as a grade C country. One possible explanation could be related to high levels of informality. In particular, a greater fraction of output comes from the self-employed in Greece relative to other nations, making tax evasion an important concern (Alstadsæter et al., 2019). The 2018 Argentine monetary crisis, however, tells a different story. In this case, we find very modest differences between reported changes in GDP and the composite estimate. This is the case regardless of how we treat the weight on GDP and despite the fact that tax evasion is also a problem in Argentina.

Figure 12: Multinationals. [Scatter plots for Ireland and the UK of the composite estimate (vertical axis) against the reported % change in real GDP (horizontal axis), with points labeled by two-digit year. Rows marked * use the grade C normalization.]

Country     Year  Reported  Composite  Difference
Ireland     2015     22.46      20.80        1.65
            *                   15.15        7.31
UK          2020    -10.37      -9.96       -0.41
            *                   -8.56       -1.81

Multinationals. In Figure 12, we finally turn to countries in the highest data quality grade, A. At the top of the list in Table 7 is Ireland, which in 2015 reported an outlier 22.46% growth rate in GDP. This was due to the counting of intellectual property from multinational corporations attracted by relatively low tax rates. In this case, the reported figure causes us to pause and reflect on whether GDP, which is inflated by such intangible capital, measures what we want it to: economic activity. The composite measure allows us to measure this more directly, as only economic activities cause NO2. Because Ireland is a grade A country, the composite estimate places only a small weight on NO2, so our initial estimate of 20.80% does not differ much from the reported figure. However, if we treat Ireland as a grade C country, we obtain a smaller composite estimate of 15.15%, fully 7.31 percentage points less than the reported rate. In this instance, we conclude that GDP may not be an accurate reflection of true economic activity. For comparison, we place Ireland side-by-side with the UK, which did not have the same experience pertaining to multinationals, but is similar in grade and other characteristics. During the 2020 COVID-19 crisis, the UK reported an astounding -10.37% change in GDP. The composite estimate suggests the actual change may have been somewhere in the range of -9.96% to -8.56%, depending on how we treat the weighting on GDP. To interpret this finding, it would be useful to compare it with differences elsewhere.
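The differences quoted in this section can be reproduced with simple arithmetic. The sketch below is illustrative Python using the values printed beneath Figure 12. It assumes the sign convention is reported growth minus the composite estimate, which is inferred from the tabulated numbers rather than stated explicitly, and the names `FIGURE12` and `differences` are hypothetical.

```python
# (country, year): (reported, baseline composite, grade C composite),
# all in % growth, transcribed from the table beneath Figure 12.
FIGURE12 = {
    ("Ireland", 2015): (22.46, 20.80, 15.15),
    ("UK", 2020): (-10.37, -9.96, -8.56),
}


def differences(reported, baseline, grade_c):
    """Reported growth minus each composite estimate, in percentage points.

    Assumed convention: a positive value means reported GDP growth exceeds
    the composite estimate; a negative value means it falls short of it.
    """
    return round(reported - baseline, 2), round(reported - grade_c, 2)


for (country, year), values in FIGURE12.items():
    base_gap, norm_gap = differences(*values)
    print(f"{country} {year}: baseline gap {base_gap:+.2f} pp, "
          f"grade C gap {norm_gap:+.2f} pp")
```

Recomputed gaps may differ from the tabulated ones in the last digit (for example, Ireland's baseline gap), presumably because the published inputs are themselves rounded. Treating a grade A country as grade C widens the gap in both cases.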
We now create a country ranking of the difference between reported GDP and the composite estimate for context.

6.3 Global Economic Activity during the 2020 Pandemic

During the early phases of the pandemic, it was widely reported in the media and noted in scientific studies that NO2 was falling globally, and that this was indicative of declines in economic activity (Carlton, 2020; Venter et al., 2020; Liu et al., 2020; Mesas-Carrascosa et al., 2020; Zhao et al., 2020). We now use the composite estimate to understand to what degree these measurements overturn officially reported GDP during this unprecedented year. This is done for the full subsample of 81 countries for which the World Bank had compiled 2020 GDP at the time of writing. Table 8 presents the results, broken down into instances of understatement (Panel A) and overstatement (Panel B). In each panel, the countries are listed from largest to smallest absolute difference. For each country, we compute a percentile: the percentage of other years in the 2005-2020 sample with larger absolute differences. Thus, smaller percentiles indicate that the difference reported for this country is particularly large, and may be deemed an unusually sizable error.²⁰ Finally, we also compute a normalized difference as though every country were grade C (*). Here we show just the top 10 countries and selected others in each panel. Full versions of both the understatement and overstatement lists, as well as a visual depiction of the normalizations versus the baseline, are presented in Online Appendix B.4.

Understatement. Table 8 Panel A lists the largest instances of understatement, in order. Beginning at the top of the list with #1, we find that grade C Belize’s reported -15.05% contraction in GDP was not a fully accurate depiction of the downturn, which was likely more muted at -10.55%.
The percentile of 0.00 indicates that this was the greatest difference between reported GDP and the composite estimate for Belize over the whole 2005-2020 sample. Peru at #3 is also notable because it experienced one of the worst per capita death tolls from COVID-19 worldwide. Notwithstanding, we find that the formidable -11.81% figure reported in GDP significantly understates economic activity. Similar conclusions can be made for many other countries down the list. Further down, at #31, is the UK, previously depicted in Figure 12. While the absolute difference is smaller than for some other less highly-rated countries, the -0.41 percentage point difference for the UK is still a 0.00 percentile figure, meaning it was unusually large. Overall, there is systematic evidence of underreporting.

Overstatement. Table 8 Panel B lists instances of overstatement, from largest to smallest. Notable instances include #4 China, previously depicted in Figure 7. While China experienced a quicker re-opening from COVID-19 lockdowns than some other countries, we find that the overall reported increase in GDP of 1.92% is likely an overstatement of the true increase in economic activity. Figure 7 also depicted the US, seen at #34 in this list. While the 2020 statistic of -3.55% was technically an overstatement, we find that the composite estimate of -3.65% was very close; in fact, the difference between reported GDP and the composite was larger in 40% of the other years in the sample. Thus, we conclude reported GDP was correct in this case.

²⁰ Percentile may not be directly interpreted as a p-value, but does provide a qualitative sense of how much of an outlier a given difference is. It is important to consider percentile in the interpretation of under- or overstatement because, except in knife’s edge cases where GDP happens to coincide with the proxy, a composite estimate will always technically imply a difference.

Table 8: Growth in 2020 (%).

A. Understatement.
Rank  Country         Grade  Reported  Composite  Difference  Difference*  Percentile
   1  Belize          C        -15.05     -10.55       -4.49                     0.00
   2  Namibia         D         -8.30      -4.04       -4.26        -2.69        0.00
   3  Peru            C        -11.81      -7.82       -3.99                     0.00
   4  Mongolia        D         -6.06      -2.09       -3.96        -2.50        6.67
   5  Philippines     C         -9.80      -5.94       -3.86                     0.00
   6  Malta           D         -7.13      -3.86       -3.27        -2.06       13.33
   7  Botswana        C         -8.23      -5.28       -2.95                     6.67
   8  India           C         -7.37      -4.67       -2.70                     0.00
   9  Honduras        C         -9.39      -6.77       -2.63                     0.00
  10  Morocco         C         -7.38      -4.93       -2.45                     6.67
 ...
  31  United Kingdom  A        -10.37      -9.96       -0.41        -1.81        0.00
 ...
  42  Romania         C         -3.72      -3.71       -0.01                    93.33

B. Overstatement.
Rank  Country         Grade  Reported  Composite  Difference  Difference*  Percentile
   1  Serbia          C         -0.91      -2.62        1.71                     0.00
   2  Türkiye         C          1.58       0.14        1.44                    20.00
   3  Egypt           C          1.48       0.12        1.35                    13.33
   4  China           C          1.92       0.63        1.29                    13.33
   5  Vietnam         C          2.78       1.51        1.27                     0.00
   6  Ukraine         C         -4.17      -5.36        1.19                    60.00
   7  Lithuania       C         -0.77      -1.95        1.18                    26.67
   8  Belarus         D         -0.97      -1.99        1.03         0.65       73.33
   9  Korea, Rep.     B         -0.92      -1.88        0.96         1.36        0.00
  10  Albania         C         -3.26      -4.13        0.87                    13.33
 ...
  34  United States   A         -3.55      -3.65        0.10         0.46       40.00
 ...
  39  Poland          B         -2.77      -2.78        0.01         0.02       93.33

Notes: Percentile is the percent of years 2005-2019 for which the absolute difference between reported data and the composite estimate for the country was larger. Difference* is the difference under the grade C normalization and is shown only for countries not already graded C. In Panel A, Understatement, 19 out of 42 countries (45%) have a percentile of 0.00. In Panel B, Overstatement, 6 out of 39 countries (15%) have a percentile of 0.00.

Interpretation. While there are nearly as many countries in the list of overstatement (39) as understatement (42), it is worth noting that percentiles for overstatement are in general larger than those for understatement, and there are few instances of zeros. Specifically, only 6 of the 39 overstatement countries (15%) are associated with a percentile of 0.00, whereas 19 of the 42 understatement countries (45%) are.
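The percentile reported in Table 8 can be sketched in a few lines. The Python below is an illustrative reading of the table notes, not the authors' implementation. The function name `percentile_larger` is hypothetical, and `hypothetical_history` is made-up data used only to exercise the function. The percentile values and panel counts are transcribed from Table 8.

```python
def percentile_larger(current_diff, other_year_diffs):
    """Percent of other years whose absolute difference exceeds the current one."""
    larger = sum(abs(d) > abs(current_diff) for d in other_year_diffs)
    return round(100 * larger / len(other_year_diffs), 2)


# Hypothetical reported-minus-composite differences for 2005-2019 (15 years).
# These numbers are NOT from the paper; they only illustrate the calculation.
hypothetical_history = [0.10, -0.25, 0.05, 0.30, -0.15, 0.20, -0.05, 0.12,
                        -0.33, 0.08, 0.18, -0.22, 0.02, 0.27, -0.45]
example = percentile_larger(-0.41, hypothetical_history)
print(example)  # only one past year (-0.45) is larger in absolute value

# With 15 comparison years, percentiles can only be multiples of 100/15.
# Every percentile printed in Table 8 sits on that grid.
TABLE8_PERCENTILES = [0.00, 6.67, 13.33, 20.00, 26.67, 40.00, 60.00, 73.33, 93.33]
grid = {round(100 * k / 15, 2) for k in range(16)}
all_on_grid = all(p in grid for p in TABLE8_PERCENTILES)
print(all_on_grid)

# Share of countries at percentile 0.00 in each panel, from the table notes.
share_under = round(100 * 19 / 42)  # Panel A, understatement
share_over = round(100 * 6 / 39)    # Panel B, overstatement
print(share_under, share_over)
```

Because there are only 15 comparison years, a percentile of 0.00 means that no other year in the sample had a larger gap, and the smallest nonzero value is 6.67.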
Thus, for nearly half of the countries in the understatement list, this was the largest instance of misreporting over the entire sample of 2005-2020. Altogether, there is more convincing evidence of understatement of economic activity across countries during 2020 than of overstatement. A potential explanation for this is that the crisis dramatically changed the way economic activities were conducted (for example, by video conference), in ways that may have been difficult to quantify in GDP. However, since the primary source of world electricity is fossil fuel combustion, such activities would be measured indirectly in NO2 by satellites. The same is true of changes in automobile traffic. While no signal is perfect, there is at least evidence in NO2 that many countries’ GDP in 2020 understated the extent of ongoing activities, many of them unmeasurable otherwise.

7 Conclusion

In this paper, we have presented a global analysis of cyclical economic activity using NO2, and shown that this signal may be used to compute improved measures of economic activity in comparison with GDP alone. In contrast with the exceedingly short list of existing globally-available signals like night lights, NO2 uniquely has reliable explanatory power for off-trend business cycle fluctuations. It also has information content beyond GDP for all countries.

Limitations. As in any single study, there are a number of possible limitations deserving further investigation. First, heterogeneity in production technologies across countries could mean that NO2 is a better signal of economic activity in some countries than others. We have found by formal statistical test that the relationship between NO2 and GDP is stable across countries, supporting our presumption of a fixed growth elasticity (Table 3). Yet, investigating dimensions of heterogeneity may be an important concern in other applications. Second, production technology may change over time. Since the time sample we use, 2005-2020, is relatively short, this may not be a great concern in our context. For instance, Morris and Zhang (2018) find no technical change in the NO2-GDP relationship for China over a sample of similar length. However, the issue may be pertinent for longer horizons.

Future Work. There are a number of other possible applications not considered here which may represent fruitful avenues for future work. First, there is the problem of subnational measurement of GDP, which we have thus far not considered in a formal sense, and which could lead to informative comparisons with previous night lights studies doing the same (Hodler and Raschky, 2014; Henderson et al., 2018). While the high resolution of NO2 is naturally amenable to such an application, this raises other concerns, like discontinuities near borders (Pinkovskiy, 2017). Second, another application of the data is towards improving the time resolution of low frequency GDP data by interpolation. Especially in developing nations where GDP is only released annually, the potential to reliably estimate at higher frequency is valuable. This may also allow for synergies with previous work; high-frequency estimates of economic activity at the district level in South Asia supported a granular assessment of the impact of several major shocks, including a surge in conflict in Afghanistan, the demonetization experiment in India, and massive earthquakes in Nepal (Beyer et al., 2018; Chodorow-Reich et al., 2020). In such applications, the issue of disentangling seasonality appears to be the key challenge. Third, with respect to GDP, the complexity of data collection gives rise to a lag between the end of the period to be evaluated and the availability of even the most preliminary of estimates. This “waiting period” may be substantial: one to two months among advanced economies and a quarter to two quarters among emerging markets.
Daily NO2 data is made freely, publicly available by NASA online just a few days after measurement. Given the quick availability of this resource, there is also the potential for applications to real-time nowcasting, and the reduction in GDP revisions, with substantial benefits for policymaking.

References

Alstadsæter, Annette, Niels Johannesen, and Gabriel Zucman, “Tax evasion and inequality,” American Economic Review, 2019, 109 (6), 2073–2103.
Baragwanath, Kathryn, Ran Goldblatt, Gordon Hanson, and Amit K Khandelwal, “Detecting urban markets with satellite imagery: An application to India,” Journal of Urban Economics, 2019, p. 103173.
Bazzi, Samuel, Martin Fiszbein, and Mesay Gebresilasse, ““Rugged individualism” and collective (In) action during the COVID-19 pandemic,” Journal of Public Economics, 2021, 195, 104357.
Berenzin, EV, IB Konovalov, P Ciais, A Richter, S Tao, G Janssens-Maenhout, M Beekmann, and Ernst Detlef Schulze, “Multiannual changes of CO2 emissions in China: Indirect estimates derived from satellite measurements of tropospheric NO2 columns,” Atmospheric Chemistry and Physics, 2013, 13, 9415–9438.
Beyer, Robert CM, Esha Chhabra, Virgilio Galdo, and Martin Rama, Measuring districts’ monthly economic activity from outer space, The World Bank, 2018.
Beyer, Robert CM, Sebastian Franco-Bedoya, and Virgilio Galdo, “Examining the economic impact of COVID-19 in India through daily electricity consumption and nighttime light intensity,” World Development, 2021, 140, 105287.
Blattman, Christopher and Edward Miguel, “Civil war,” Journal of Economic Literature, 2010, 48 (1), 3–57.
Buell, Brandon, Reda Cherif, Carissa Chen, Hyeon-Jae Seo, Jiawen Tang, and Nils Wendt, “Impact of COVID-19: Nowcasting and Big Data to Track Economic Activity in Sub-Saharan Africa,” IMF Working Paper, 2021.
Carlton, Jim, “Coronavirus Offers a Clear View of What Causes Air Pollution,” The Wall Street Journal, 2020.
Castellanos, Patricia and K Folkert Boersma, “Reductions in Nitrogen Oxides Over Europe Driven by Environmental Policy and Economic Recession,” Scientific Reports, 2012, 2 (1), 1–7.
Chen, Shaohua and Martin Ravallion, “The developing world is poorer than we thought, but no less successful in the fight against poverty,” The Quarterly Journal of Economics, 2010, 125 (4), 1577–1625.
Chen, Shaohua, Martin Ravallion, and Prem Sangraula, “PovcalNet,” Available at http://iresearch.worldbank.org/PovcalNet/home.aspx.
Chen, Xi and William D. Nordhaus, “Using Luminosity Data as a Proxy for Economic Statistics,” Proceedings of the National Academy of Sciences, 2011, 108 (21), 8589–8594.
Chetty, Raj, John N Friedman, Nathaniel Hendren, Michael Stepner, and The Opportunity Insights Team, The economic impacts of COVID-19: Evidence from a new public database built using private sector data number w27431, National Bureau of Economic Research, 2020.
Chodorow-Reich, Gabriel, Gita Gopinath, Prachi Mishra, and Abhinav Narayanan, “Cash and the economy: Evidence from india’s demonetization,” The Quarterly Journal of Economics, 2020, 135 (1), 57–103.
Clark, Hunter, Maxim Pinkovskiy, and Xavier Sala i Martin, “China’s GDP growth may be understated,” China Economic Review, 2020, 62 (C).
Cragg, John G and Stephen G Donald, “Testing identifiability and specification in instrumental variable models,” Econometric Theory, 1993, pp. 222–240.
Deaton, Angus, “Measuring Poverty in a Growing World (or Measuring Growth in a Poor World),” Review of Economics and Statistics, 2005, 87 (1), 1–19.
Deaton, Angus, “Instruments, randomization, and learning about development,” Journal of economic literature, 2010, 48 (2), 424–55.
Deaton, Angus and Alan Heston, “Understanding PPPs and PPP-Based National Accounts,” American Economic Journal: Macroeconomics, 2010, 2 (4), 1–35.
Donaldson, Dave and Adam Storeygard, “The View From Above: Applications of Satellite Data in Economics,” Journal of Economic Perspectives, 2016, 30 (4), 171–98.
Elvidge, Christopher D, Kimberly E Baugh, Mikhail Zhizhin, and Feng-Chi Hsu, “Why VIIRS data are superior to DMSP for mapping nighttime lights,” Proceedings of the Asia-Pacific Advanced Network, 2013, 35 (0), 62.
Elvidge, Christopher D, Tilottama Ghosh, Feng-Chi Hsu, Mikhail Zhizhin, and Morgan Bazilian, “The Dimming of Lights in China during the COVID-19 Pandemic,” Remote Sensing, 2020, 12 (17), 2851.
Galdo, Virgilio, Gladys Lopez Acevedo, and Martin Rama, “Conflict and the composition of economic activity in Afghanistan.” IZA Journal of Development and Migration, 2021, 12 (1), pp.
Geddes, Jeffrey A, Randall V Martin, Brian L Boys, and Aaron van Donkelaar, “Long-term trends worldwide in ambient NO2 concentrations inferred from satellite observations,” Environmental health perspectives, 2016, 124 (3), 281–289.
Ghanem, Dalia and Junjie Zhang, “‘Effortless Perfection:’ Do Chinese cities manipulate air pollution data?,” Journal of Environmental Economics and Management, 2014, 68 (2), 203–225.
Gibson, John, Susan Olivia, and Geua Boe-Gibson, “Night Lights in Economics: Sources and Uses,” Études et Documents, 2020, 1.
Gkatzelis, Georgios I, Jessica B Gilman, Steven S Brown, Henk Eskes, A Rita Gomes, Anne C Lange, Brian C McDonald, Jeff Peischl, Andreas Petzold, Chelsea R Thompson et al., “The global impacts of COVID-19 lockdowns on urban air pollution: A critical review and recommendations,” Elementa: Science of the Anthropocene, 2021, 9 (1).
Goolsbee, Austan and Chad Syverson, “Fear, lockdown, and diversion: Comparing drivers of pandemic economic decline 2020,” Journal of Public Economics, 2021, 193, 104311.
Gourinchas, Pierre-Olivier, Thomas Philippon, and Dimitri Vayanos, “The analytics of the Greek crisis,” NBER Macroeconomics Annual, 2017, 31 (1), 1–81.
Hamilton, James D, “Why You Should Never Use the Hodrick-Prescott Filter,” Review of Economics and Statistics, 2018, 100 (5), 831–843.
Hansen, Lars Peter, “Large sample properties of generalized method of moments estimators,” Econometrica: Journal of the Econometric Society, 1982, pp. 1029–1054.
Henderson, J. Vernon, Adam Storeygard, and David N. Weil, “Measuring Economic Growth from Outer Space,” American Economic Review, 2012, 102 (2), 994–1028.
Henderson, J Vernon, Tim Squires, Adam Storeygard, and David Weil, “The global distribution of economic activity: nature, history, and the role of trade,” The Quarterly Journal of Economics, 2018, 133 (1), 357–406.
Hodler, Roland and Paul A Raschky, “Regional favoritism,” The Quarterly Journal of Economics, 2014, 129 (2), 995–1033.
Hönig, Tillman, “The Legacy of Conflict: Aggregate Evidence from Sierra Leone,” Working Paper, 2021.
Itahashi, S, I Uno, H Irie, J-I Kurokawa, and T Ohara, “Regional modeling of tropospheric NO2 vertical column density over East Asia during the period 2000–2010: comparison with multisatellite observations,” Atmospheric Chemistry and Physics, 2014, 14 (7), 3623–3635.
Jayachandran, Seema, “Air quality and early-life mortality evidence from Indonesia’s wildfires,” Journal of Human resources, 2009, 44 (4), 916–954.
Jean, Neal, Marshall Burke, Michael Xie, W Matthew Davis, David B Lobell, and Stefano Ermon, “Combining satellite imagery and machine learning to predict poverty,” Science, 2016, 353 (6301), 790–794.
Johnson, Simon, William Larson, Chris Papageorgiou, and Arvind Subramanian, “Is Newer Better? Penn World Table Revisions and Their Impact on Growth Estimates,” Journal of Monetary Economics, 2013, 60 (2), 255–274.
Keller, Christoph A, Mathew J Evans, K Emma Knowland, Christa A Hasenkopf, Sruti Modekurty, Robert A Lucchesi, Tomohiro Oda, Bruno B Franca, Felipe C Mandarino, M Valeria Díaz Suárez et al., “Global impact of COVID-19 restrictions on the surface concentrations of nitrogen dioxide and ozone,” Atmospheric Chemistry and Physics, 2021, 21 (5), 3555–3592.
Kleibergen, Frank and Richard Paap, “Generalized reduced rank tests using the singular value decomposition,” Journal of econometrics, 2006, 133 (1), 97–126.
Koech, Janet, Jian Wang et al., “China’s slowdown may be worse than official data suggest,” Economic Letter, 2012, 7.
Lamsal, LN, RV Martin, A Van Donkelaar, EA Celarier, EJ Bucsela, KF Boersma, R Dirksen, C Luo, and Y Wang, “Indirect validation of tropospheric nitrogen dioxide retrieved from the OMI satellite instrument: Insight into the seasonal variation of nitrogen oxides at northern midlatitudes,” Journal of Geophysical Research: Atmospheres, 2010, 115 (D5).
Lamsal, LN, RV Martin, A Van Donkelaar, M Steinbacher, EA Celarier, E Bucsela, EJ Dunlea, and JP Pinto, “Ground-level nitrogen dioxide concentrations inferred from the satellite-borne Ozone Monitoring Instrument,” Journal of Geophysical Research: Atmospheres, 2008, 113 (D16).
Lamsal, Lok N, Nickolay A Krotkov, Alexander Vasilkov, Sergey Marchenko, Wenhan Qin, Eun-Su Yang, Zachary Fasnacht, Joanna Joiner, Sungyeon Choi, David Haffner et al., “Ozone Monitoring Instrument (OMI) Aura nitrogen dioxide standard product version 4.0 with improved surface and cloud treatments,” Atmospheric Measurement Techniques, 2021, 14 (1), 455–479.
Lelieveld, Jos, Steffen Beirle, Christoph Hörmann, Georgiy Stenchikov, and Thomas Wagner, “Abrupt recent trend changes in atmospheric nitrogen dioxide over the Middle East,” Science advances, 2015, 1 (7), e1500498.
Leue, C., Mark Wenig, T. Wagner, Oliver Klimm, Ulrich Platt, and Bernd Jähne, “Quantitative Analysis of NOx Emissions from GOME-Satellite Image Sequences,” Journal of Geophysical Research, 03 2001, 106, 5493–5506.
Levin, Noam, Christopher CM Kyba, Qingling Zhang, Alejandro Sánchez de Miguel, Miguel O Román, Xi Li, Boris A Portnov, Andrew L Molthan, Andreas Jechow, Steven D Miller et al., “Remote sensing of night lights: A review and an outlook for the future,” Remote Sensing of Environment, 2020, 237, 111443.
Lin, J-T and Michael Brendon McElroy, “Detection from space of a reduction in anthropogenic emissions of nitrogen oxides during the Chinese economic downturn,” Atmospheric Chemistry and Physics, 2011, 11 (15), 8171–8188.
Liu, Fei, Aaron Page, Sarah A Strode, Yasuko Yoshida, Sungyeon Choi, Bo Zheng, Lok N Lamsal, Can Li, Nickolay A Krotkov, Henk Eskes et al., “Abrupt decline in tropospheric nitrogen dioxide over China after the outbreak of COVID-19,” Science Advances, 2020, 6 (28), eabc2992.
Martinez, Luis R, “How Much Should We Trust the Dictator’s GDP Growth Estimates?,” Revise and Resubmit, Journal of Political Economy, 2019.
Mesas-Carrascosa, Francisco-Javier, Fernando Pérez Porras, Paula Triviño-Tarradas, Alfonso García-Ferrer, and Jose Emilio Meroño-Larriva, “Effect of lockdown measures on atmospheric nitrogen dioxide during SARS-CoV-2 in Spain,” Remote Sensing, 2020, 12 (14), 2210.
Middleton, DR, “Modelling air pollution transport and deposition,” 1995.
Mills, Stephen, Stephanie Weiss, and Calvin Liang, “VIIRS day/night band (DNB) stray light characterization and correction,” in “Earth Observing Systems XVIII,” Vol. 8866 International Society for Optics and Photonics 2013, p. 88661P.
Montgomery, Anastasia and Tracey Holloway, “Assessing the relationship between satellite-derived NO2 and economic growth over the 100 most populous global cities,” Journal of Applied Remote Sensing, 2018, 12 (4), 042607.
Morris, Stephen D and Junjie Zhang, “Validating China’s Output Data Using Satellite Observations,” Macroeconomic Dynamics, 2018, 23 (8), 3327–3354.
Nakamura, Emi, Jón Steinsson, and Miao Liu, “Are Chinese growth and inflation too smooth? Evidence from Engel curves,” American Economic Journal: Macroeconomics, 2016, 8 (3), 113–44.
Omodero, Cordelia Onyinyechi, “The financial and economic implications of underground economy: The Nigerian perspective,” Academic Journal of Interdisciplinary Studies, 2019, 8 (2), 155–155.
Peksen, Dursun and Bryan Early, “Internal Conflicts and Shadow Economies,” Journal of Global Security Studies, 2020, 5 (3), 463–477.
Pinkovskiy, Maxim and Xavier Sala i Martin, “Lights, camera... income! Illuminating the national accounts-household surveys debate,” The Quarterly Journal of Economics, 2016, 131 (2), 579–631.
Pinkovskiy, Maxim L, “Growth discontinuities at borders,” Journal of Economic Growth, 2017, 22 (2), 145–192.
Rawski, Thomas G, “China by the numbers: How reform affected Chinese economic statistics,” China Perspectives, 2001, 33, 25–34.
Richter, Andreas, John P Burrows, Hendrik Nüß, Claire Granier, and Ulrike Niemeier, “Increase in tropospheric nitrogen dioxide over China observed from space,” Nature, 2005, 437 (7055), 129–132.
Russell, AR, LC Valin, and RC Cohen, “Trends in OMI NO2 observations over the United States: effects of emission control technology and the economic recession,” Atmospheric Chemistry and Physics, 2012, 12 (24), 12197–12209.
Sargan, John D, “The estimation of economic relationships using instrumental variables,” Econometrica: Journal of the Econometric Society, 1958, pp. 393–415.
Schneider, Friedrich and Dominik H Enste, The shadow economy: An international survey, Cambridge University Press, 2013.
Seinfeld, John H and Spyros N Pandis, Atmospheric chemistry and physics: from air pollution to climate change, John Wiley & Sons, 2016.
Sinton, Jonathan E, “Accuracy and reliability of China’s energy statistics,” China Economic Review, 2001, 12 (4), 373–383.
Velders, Guus J. M., Claire Granier, Robert W. Portmann, Klaus Pfeilsticker, Mark Wenig, Thomas Wagner, Ulrich Platt, Andreas Richter, and John P. Burrows, “Global Tropospheric NO2 Column Distributions: Comparing Three-Dimensional Model Calculations with GOME Measurements,” Journal of Geophysical Research: Atmospheres, 2001, 106 (D12), 12643–12660.
Venter, Zander S, Kristin Aunan, Sourangsu Chowdhury, and Jos Lelieveld, “COVID-19 lockdowns cause global air pollution declines,” Proceedings of the National Academy of Sciences, 2020, 117 (32), 18984–18990.

Wallace, Jeremy L, “Juking the stats? Authoritarian information problems in China,” British Journal of Political Science, 2016, 46 (1), 11–29.

Wang, SW, Qiang Zhang, DG Streets, KB He, RV Martin, LN Lamsal, D Chen, Y Lei, and Z Lu, “Growth in NOx emissions from power plants in China: bottom-up estimates and satellite observations,” Atmospheric Chemistry and Physics, 2012, 12 (10), 4429–4447.

Wei, Zigang, Shobha Kondragunta, Kai Yang, Hai Zhang, and Brian McDonald, “Correlating Economic Activity Indicators and Tropospheric Column Nitrogen Dioxide during COVID-19 Pandemic in the United States,” 2020.

Zhao, Yanbin, Kun Zhang, Xiaotian Xu, Huizhong Shen, Xi Zhu, Yanxu Zhang, Yongtao Hu, and Guofeng Shen, “Substantial changes in nitrogen dioxide and ozone after excluding meteorological impacts during the COVID-19 outbreak in mainland China,” Environmental Science & Technology Letters, 2020, 7 (6), 402–408.

Online Appendix

Contents

A NO2: Context and Characteristics
A.1 OMI Details and Alternative Sources
A.2 US Downward Trend and Cycles
A.3 World Geographic Heterogeneity
A.4 COVID-19 Lockdown Stringency and NO2
A.5 US Political Landscape and Collective Inaction
B Other Results
B.1 Night Lights Baseline
B.2 Results without Country Time Trend
B.3 Alternative Country Grades
B.4 2020 Misstatement
References

List of Figures

S1 NO2: United States
S2 NO2: World Concentrations of NO2 as of Year-End 2019
S3 NO2: World Difference Between First Half of 2019 and 2020
S4 COVID-19 Shutdowns and NO2 in Hubei Province and Northern Italy
S5 COVID-19 Lockdown Stringency Index and NO2, by-Country
S6 U.S. County-Level COVID-19 Change in NO2 and 2016 Election Results
S7 Data Quality Grades Correspondence
S8 Effect on Composite Estimate for 2020 when Normalized to C-Grade Weight

List of Tables

S1 Updated Data and Assumptions in Henderson et al. (2012) Baseline
S2 Effect of Extending DMSP-OLS Baseline with More Recent VIIRS Data
S3 Proxy: No Country Time Trend
S4 Proxy: By-Country Data Quality Grade, No Country Time Trend
S5 Data Manipulation: Role of Country Time Trend
S6 Growth in 2020 (%): Understatement
S7 Growth in 2020 (%): Overstatement

A NO2: Context and Characteristics

A.1 OMI Details and Alternative Sources

This paper uses OMI NO2 VCDs from NASA. Like night lights, NO2 is not measurable in certain circumstances, such as when there is too much cloud cover. In all results, we exclude observations flagged for misreadings from other aerosols or for misleading snow-covered backgrounds, as well as observations in which clouds exceed 30% of the field of view. These choices are consistent with standards in scientific studies to avoid spurious conclusions. Averaging over days and using data quality flags correctly are critical to drawing proper conclusions from VCDs.
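The screening rule described in Section A.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code behind the paper's results: the `Pixel` record, its field names, and the single `flagged` indicator are hypothetical stand-ins for the quality flags of the actual OMI product, and averaging first within days and then across days is only one plausible ordering.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Optional

# Hypothetical record for one satellite pixel; field names are illustrative
# and do not correspond to the variable names in the OMI data files.
@dataclass
class Pixel:
    vcd: float             # NO2 vertical column density (1e15 molecules/cm2)
    cloud_fraction: float  # share of the field of view covered by cloud, 0..1
    flagged: bool          # True if an aerosol or snow-background flag is raised

MAX_CLOUD_FRACTION = 0.30  # threshold stated in Section A.1


def keep(p: Pixel) -> bool:
    """Retain a pixel only if it is unflagged and not too cloudy."""
    return (not p.flagged) and p.cloud_fraction <= MAX_CLOUD_FRACTION


def daily_mean(pixels: Iterable[Pixel]) -> Optional[float]:
    """Mean VCD over the valid pixels of one day; None if none survive."""
    valid = [p.vcd for p in pixels if keep(p)]
    return mean(valid) if valid else None


def period_mean(days: Iterable[List[Pixel]]) -> Optional[float]:
    """Average the daily means, skipping days with no valid pixels."""
    daily = [m for m in (daily_mean(d) for d in days) if m is not None]
    return mean(daily) if daily else None


# Made-up example: the cloudy and flagged pixels are dropped before averaging.
example_days = [
    [Pixel(2.0, 0.10, False), Pixel(9.0, 0.80, False)],
    [Pixel(4.0, 0.20, False), Pixel(7.0, 0.10, True)],
    [Pixel(5.0, 0.90, False)],
]
example_result = period_mean(example_days)
```

Days on which every pixel fails the screen contribute nothing to the average rather than entering as zeros, which is why averaging over many days matters for obtaining a stable reading.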
NASA currently expects OMI to operate at least until 2022, and likely beyond. However, it is not the only available satellite data resource for NO2. For example, the earlier European Space Agency (ESA) instrument GOME, with lower spatial resolution than OMI, was in service from 1995 to 2011 (Geddes et al., 2016). More recent is the ESA’s TROPOMI instrument, which was launched in 2017 and has a higher spatial resolution (7 km × 7 km) than OMI (Veefkind et al., 2012). We opt for OMI data in this study not only because the time sample is longer, but also because the NO2 product uses the same algorithm for the entire, uninterrupted time period, which is not the case for all satellites. Other data products are available for inferring higher-resolution data from OMI in subsamples (Laughner et al., 2018), and near-future technologies, including NASA’s TEMPO instrument, promise a sharper spatial resolution of about 2 km × 4.5 km.

A.2 US Downward Trend and Cycles

Figure S1 presents both the 365-day rolling average of NO2 VCDs for the US and its cyclical component. There are two points worth emphasizing. First, the US is one of the developed nations in the sample with a distinctly downward trend in NO2 (see Table 1, Panel B). In all the analyses, we include country-specific time trends. Thus, we are considering just the cyclical component of the data, which has no trend and is an indicator of cyclical economic activity rather than trend growth. Second, the mild 2019 “manufacturing recession” is evident in US cyclical NO2 data. This is part of the reason why world NO2 concentrations were already below trend before 2020 (Figure 2).

A.3 World Geographic Heterogeneity

Figure S2 shows the clear correspondence between global economic activity and NO2 as of year-end 2019. The figure also suggests that some developing countries (e.g. China and India) may be subject to more pollution for the same amount of output.
This is another reason why controlling for country-specific trends is important.

Figure S1: NO2: United States. (Top panel: NO2 data. Bottom panel: NO2 cyclical component, in % difference from trend. Both panels cover 2008 to 2021, with the financial crisis and the pandemic marked.)
Notes. Data are 365-day rolling averages. Trend component calculated using the Hamilton (2018) filter.

Figure S3 shows COVID-19 shutdown effects, and may be compared with Figure S2 to get an idea of where reductions were greatest. Viewing the world all at once yields less clear takeaways than viewing individual nations, because magnitudes differ by orders across countries. Figures 3-5 in the main text provide closeups of China, Europe, and the US within this figure, and offer clearer intuition and takeaways.

A.4 COVID-19 Lockdown Stringency and NO2

A basic premise underlying some of our results is that COVID-19 lockdowns immediately caused declines in economic activity that were then measurable in NO2. In Figure S4, these declines are first clearly apparent in Hubei province (containing Wuhan), simply by comparing the two-week difference before and after the first lockdown (Panel A). Similar outcomes follow other quarantines, including in northern Italy (Panel B). To underscore the fact that lockdowns lead to immediate, measurable declines in NO2, we compare the World Bank’s measure of COVID-19 lockdown stringency with changes in NO2 for the countries in our sample.[1] Figure S5 depicts the negative correlation we find between stringency and NO2 VCDs in China, Italy, the US, and France. In each case, increases in stringency are associated with declines in NO2, while reductions in stringency are associated with increases, consistent with the scientific literature (Gkatzelis et al., 2021).
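The NO2 series plotted against the stringency index in Figure S5 is defined in the figure note as [ln(91-day RA of NO2) − ln(91-day RA of NO2 in prior year)] × 100. A minimal sketch of that transformation is given below. It assumes a gap-free daily series and takes “prior year” to mean a lag of exactly 365 days; both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Optional


def rolling_mean(x: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` days; None until enough data exist."""
    out: List[Optional[float]] = []
    total = 0.0
    for i, v in enumerate(x):
        total += v
        if i >= window:
            total -= x[i - window]  # drop the day that left the window
        out.append(total / window if i >= window - 1 else None)
    return out


def yoy_log_change(daily: List[float], window: int = 91,
                   lag: int = 365) -> List[Optional[float]]:
    """100 * [ln(RA_t) - ln(RA_{t-lag})], where RA is the rolling average."""
    ra = rolling_mean(daily, window)
    out: List[Optional[float]] = []
    for t, cur in enumerate(ra):
        prev = ra[t - lag] if t >= lag else None
        if cur is None or prev is None:
            out.append(None)
        else:
            out.append(100.0 * (math.log(cur) - math.log(prev)))
    return out


# Made-up series: a level of 10 in year one and 8 in year two.
example_series = [10.0] * 365 + [8.0] * 365
example_change = yoy_log_change(example_series)
```

Because the change is measured in log points, a fall from 10 to 8 appears as roughly −22.3 rather than −20, a distinction that matters for declines as large as those observed around lockdowns.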
A.5 US Political Landscape and Collective Inaction

There seems to be a geographic correlation in the United States between political affiliation and the extent of COVID-19 lockdowns, as proxied by changes in NO2. There is good reason to think that shutdowns may have been less forceful in Republican-leaning districts, and that adherence to lockdown orders was weaker specifically among those who also exhibited stronger support for the Republican Party between 2000 and 2016 (Bazzi et al., 2021). As a basic test of this “collective inaction” thesis, we regressed the percentage change in NO2 from 2019 Q2 to 2020 Q2 on the margin of Democratic victory in the 2016 Presidential election at the county level. If the thesis is correct, we should obtain a negative sign on the effect, since this would mean more Democratic-leaning counties were associated with stronger lockdown measures. We obtain the following estimates (robust standard errors in parentheses),

$$\%\Delta \mathrm{NO}_2 = \underset{(0.428)^{***}}{-9.837} \;-\; \underset{(0.010)^{***}}{0.046} \cdot (\text{Margin of Dem. over Rep.}) + \text{Error}.$$

[1] Availability of the stringency index: https://ourworldindata.org/grapher/covid-stringency-index

Figure S2: NO2: World Concentrations of NO2 as of Year-End 2019. (World map of mean NO2 in 2019, in 1e15 molecules/cm2, with major cities labeled.)
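The county-level regression in Section A.5 is a bivariate least-squares fit with heteroskedasticity-robust standard errors. The sketch below shows one way such estimates can be computed. It is an illustration only: the paper states that standard errors are robust but does not say which variant is used, so the HC1 small-sample correction here is an assumption, and the five data points are invented solely to make the arithmetic easy to check.

```python
import math
from typing import List, Tuple


def ols_robust(x: List[float], y: List[float]) -> Tuple[float, float, float, float]:
    """Fit y = a + b*x by OLS; return (a, b, se_a, se_b) with HC1 errors."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(u * v for u, v in zip(x, y))

    # (X'X)^{-1} for the design matrix X = [1, x], written out for the 2x2 case.
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # "Meat" of the sandwich estimator: X' diag(e^2) X.
    m00 = sum(e * e for e in resid)
    m01 = sum(e * e * xi for e, xi in zip(resid, x))
    m11 = sum(e * e * xi * xi for e, xi in zip(resid, x))
    meat = [[m00, m01], [m01, m11]]

    def matmul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    cov = matmul(matmul(inv, meat), inv)
    scale = n / (n - 2)  # HC1 degrees-of-freedom correction (assumed variant)
    return a, b, math.sqrt(scale * cov[0][0]), math.sqrt(scale * cov[1][1])


# Invented data with true intercept 1 and slope 2, and residuals that are
# exactly orthogonal to the regressors.
example_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
example_e = [1.0, -1.0, 0.0, -1.0, 1.0]
example_y = [1.0 + 2.0 * xi + ei for xi, ei in zip(example_x, example_e)]
example_fit = ols_robust(example_x, example_y)
```

In the paper's application, `x` would be the county-level Democratic margin and `y` the percentage change in NO2; a negative slope with a small robust standard error is what the collective inaction thesis predicts.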
Figure S3: NO2: World Difference Between First Half of 2019 and 2020. (World map of the NO2 difference between 2020 H1 and 2019 H1, in 1e15 molecules/cm2, with major cities labeled.)

Figure S4: COVID-19 Shutdowns and NO2 in Hubei Province and Northern Italy. (Panel A. Hubei Province, China: two-week difference in NO2 before and after the Wuhan lockdown, by prefecture. Panel B. Italy: two-week difference in NO2 before and after the lockdown in northern Italy, by region. Units: 1e15 molecules/cm2.)
N = 3,073 and R² = 0.01. The constant is negative both because of the overall downward trend in NO2 and because of the overall nationwide decline in NO2 in 2020 due to the pandemic. The effect of the Democratic margin is negative, consistent with the collective inaction thesis. While this result is interesting, the claim is incomplete and important robustness checks remain. For instance, we make no effort to disentangle whether these differences are due to more lax orders or to more resistance to orders. These questions are beyond the scope of our analysis.

Figure S5: COVID-19 Lockdown Stringency Index and NO2, by-Country. (Four panels: China, Italy, United States of America, and France. Each plots the yearly NO2 percent change on the left axis and the stringency index on the right axis, from October 2019 to April 2021; the Wuhan and northern Italy lockdowns are marked.)
Notes. NO2: percentage change in the 91-day rolling average (RA), calculated as [ln(91-day RA of NO2) − ln(91-day RA of NO2 in prior year)] × 100. Stringency Index: World Bank, Our World in Data.

B Other Results

B.1 Night Lights Baseline

In this subsection we show how the incremental introduction of updated data, satellite instruments, and technical adjustments (including removal of stray lights and inter-calibration) takes us from Henderson et al. (2012)’s results to the night lights baseline we present in this paper (Table 2, Column 2).
Figure S6: U.S. County-Level COVID-19 Change in NO2 and 2016 Election Results. (Panel A. U.S.: Change in NO2 During COVID-19 Outbreak: county map of the percent change in NO2 between 2020 Q2 and 2019 Q2, in bins from below -20 to above 20. Panel B. U.S.: 2016 Presidential Election Results: county map of the 2016 presidential election voting margin, in percent, in bins from Democrat +60-90 to Republican +60-90.)

Before presenting this correspondence, we describe the difference between instruments and satellites, which is sometimes confusing. We make use of two separate night lights data sets. First, we use data derived from the US Air Force Defense Meteorological Satellite Program’s (DMSP) Operational Linescan System (OLS) sensors, i.e. instruments. These fly on board six separate satellites in our 1992-2013 sample, named F-10, F-12, F-14, F-15, F-16, and F-18. For several years, there are OLS observations from two separate satellites; for example, F-12 and F-14 overlap over 1997-1999. Second, we use data from the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument, which has flown on board the single NASA/NOAA SNPP satellite since its launch in 2011. While the available data sample for VIIRS technically begins in 2012, the more reliable stray light-corrected version begins in 2014, which is the sample we utilize.[2] Throughout, we use the acronyms OLS and VIIRS to distinguish between instruments; instrument and satellite correspond one-to-one only in the case of VIIRS.

With this in mind, Table S1 Column HSW is a reference to Henderson et al. (2012)’s baseline result, which appears in their Table 1, Column 1 (p. 1,012) and Table 2, Column 1 (p. 1,015). This estimate used the now dated v. 6.1 of the Penn World Tables (PWT), which is revised between vintages (Johnson et al., 2013).
Table S1 Column 2 is our replication of the same specification, but updated using the newest available version of the PWT, v. 10.0. Column 3 uses the same model, but now with since-available inter-calibrated OLS data.[3] Column 4 repeats the same exercise, but extended to 2013, the end of the OLS data we include in the analysis. This is the first instance of the estimated lights coefficient changing notably, and R-squared also increases. Column 5 repeats this specification, but includes satellite fixed effects. These are advisable since variability across satellite measurements, even within the OLS instrument, may be substantial, owing to differences in sensor settings across satellites; this approach was followed by Chen and Nordhaus (2011). The sample size increases as a result. Both the estimated coefficient and R-squared decrease, offsetting the increases obtained from the larger sample. Column 6 starts the sample in 2005, where our NO2 data begins.

Table S2 Column 1 estimates the same relationship, but using the stray light-corrected VIIRS data set, which begins in 2014. VIIRS addresses many of the shortcomings of the DMSP-OLS database (Gibson et al., 2020) and has also been used to estimate GDP (Chen and Nordhaus, 2019). With this corrected, more accurate data, we observe a comparatively smaller estimated lights coefficient and weaker explanatory power than with DMSP-OLS. There are no satellite fixed effects because VIIRS is on board a single satellite. Column 2 compares results from only the brightest lights in city cores. Column 3 compares results with NOAA’s preferable stable (i.e. non-ephemeral) lights product, which increases explanatory power and significance, and which we use for the remaining specifications. Column 4 joins the previous DMSP-OLS specification from Table S1 Column 5 with Column 3. Column 5 joins the previous DMSP-OLS specification from Table S1 Column 6, which starts in 2005 where OMI NO2 data begins, with Column 3. Column 6 now uses satellite averages, for comparability with Henderson et al. (2012). This is equivalent to Table S3 Column 2, below. Including country-specific time trends finally yields Table 2, Column 2, the baseline night lights specification used in the text.

[2] Stray light is light that reaches the spacecraft while it observes parts of the Earth that are currently dark. This occurs due to the Earth’s curvature, and typically at high latitudes. Such measurements must be corrected before concluding that a retrieved luminosity is indeed man-made (Elvidge et al., 2017).

[3] Inter-calibration is a statistical method necessary to process the data before it may be used. It is necessary because the digital numbers reported in the OLS data set are, for technical reasons, not directly proportional to actual luminosity. We use the methodology described by Elvidge et al. (2014).

Table S1: Updated Data and Assumptions in Henderson et al. (2012) Baseline.

| | HSW | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dependent variable | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) |
| ln(lights/area) | 0.277** | 0.287*** | 0.284*** | 0.334*** | 0.237*** | 0.201*** |
| | (0.031) | (0.059) | (0.058) | (0.037) | (0.045) | (0.032) |
| PWT Version | 6.1 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| Raw OLS | ✓ | ✓ | | | | |
| Inter-Calibrated OLS | | | ✓ | ✓ | ✓ | ✓ |
| Satellite Average | ✓ | ✓ | ✓ | ✓ | | |
| Satellite Fixed Effects | | | | | ✓ | ✓ |
| Observations | 3,015 | 3,034 | 3,034 | 3,934 | 6,169 | 2,205 |
| Countries | 188 | 180 | 180 | 180 | 180 | 180 |
| Years | 92-08 | 92-08 | 92-08 | 92-13 | 92-13 | 05-13 |
| Within Country R-Sq. | 0.769 | | | | | |
| Within R-Squared | 0.209 | 0.139 | 0.137 | 0.192 | 0.128 | 0.118 |

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Notes. “HSW” denotes Henderson et al. (2012)’s baseline result, which appears in their Table 1, Column 1 (p. 1,012) and Table 2, Column 1 (p. 1,015). All specifications include country and year fixed effects. Robust standard errors, clustered by country, are in parentheses. Both reported R-Squared statistics are adjusted for sample size; “Within R-Squared” is within all fixed effects. Note that Penn World Tables v. 6.1 had a country count of 188, while v. 10.0 has a reduced count of 183. In this analysis, Hong Kong, Macao, and Taiwan are included as part of China for consistency with existing shape files commonly used to process worldwide data. We adjust all Penn World Tables numbers accordingly, and thus obtain a final data set of 183 − 3 = 180 countries with both GDP and night lights data available.

Table S2: Effect of Extending DMSP-OLS Baseline with More Recent VIIRS Data.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Dependent variable | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) |
| ln(lights/area) | 0.055*** | 0.169*** | 0.218*** | 0.228*** | 0.124*** | 0.109*** |
| | (0.017) | (0.050) | (0.049) | (0.030) | (0.028) | (0.027) |
| PWT Version | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 | 10.0 |
| Stray Light-Corr. VIIRS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Top 3 Clusters (City Core) | | ✓ | | | | |
| Stable Lights (VIIRS, ’14-’20) | | | ✓ | ✓ | ✓ | ✓ |
| Inter-Calibrated (OLS, ’92-’13) | | | | ✓ | ✓ | ✓ |
| Satellite Average (OLS) | | | | | | ✓ |
| Satellite Fixed Effects | | | | ✓ | ✓ | |
| Observations | 1,194 | 1,188 | 1,194 | 7,363 | 3,399 | 2,752 |
| Countries | 180 | 179 | 180 | 180 | 180 | 178 |
| Years | 14-20 | 14-20 | 14-20 | 92-20 | 05-20 | 05-20 |
| Within R-Squared | 0.032 | 0.075 | 0.125 | 0.151 | 0.079 | 0.068 |

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Notes. All specifications include country and year fixed effects. Robust standard errors, clustered by country, are in parentheses. p-value for the null hypothesis that the estimated effect differs across night lights data sets in brackets. “Within R-Squared” is adjusted and within all fixed effects. In comparison with Table S1 Column 6, which presented results using DMSP-OLS alone, Column 6 uses the same OLS specification but extends the sample through 2020 using VIIRS. This column becomes Table S3 Column 2. Note that two countries are dropped in comparison with Column 5, Bermuda and the British Virgin Islands, because they are smaller than the resolution of OMI and hence not included in the NO2 sample. In Column 6, the satellite average is computed across OLS sensors only.
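Every column in Tables S1 and S2 regresses ln(GDP) on ln(lights/area) with country and year fixed effects. For intuition, the sketch below computes such a coefficient through the two-way within transformation. It is a simplified illustration, not the estimation code behind the tables: it assumes a balanced panel (the paper's panel is not balanced, so a dummy-variable or iterative approach would be needed in practice), it omits satellite fixed effects and clustered standard errors, and the function names and numbers are invented.

```python
from typing import List

Panel = List[List[float]]  # panel[i][t]: country i, year t (balanced)


def two_way_demean(panel: Panel) -> Panel:
    """Remove country means and year means, then add back the grand mean."""
    n, periods = len(panel), len(panel[0])
    row = [sum(r) / periods for r in panel]
    col = [sum(panel[i][t] for i in range(n)) / n for t in range(periods)]
    grand = sum(row) / n
    return [[panel[i][t] - row[i] - col[t] + grand for t in range(periods)]
            for i in range(n)]


def fe_slope(x: Panel, y: Panel) -> float:
    """Slope of y on x after sweeping out country and year fixed effects."""
    xd, yd = two_way_demean(x), two_way_demean(y)
    num = sum(a * b for xr, yr in zip(xd, yd) for a, b in zip(xr, yr))
    den = sum(a * a for xr in xd for a in xr)
    return num / den


# Invented panel: ln(GDP) = country effect + year effect + 0.3 * ln(lights).
example_lights = [[1.0, 2.0, 4.0], [2.0, 2.5, 3.0], [0.5, 1.5, 1.0]]
country_effect = [10.0, 20.0, 30.0]
year_effect = [0.0, 1.0, 2.0]
example_gdp = [[country_effect[i] + year_effect[t] + 0.3 * example_lights[i][t]
                for t in range(3)] for i in range(3)]
example_beta = fe_slope(example_lights, example_gdp)
```

The example recovers the slope of 0.3 even though the country and year effects are far larger than the lights signal, which is why the tables report R-squared “within” the fixed effects rather than overall.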
B.2 Results without Country Time Trend

Table S3 conducts the same estimation as Table 2 in the text, but now omitting country-specific time trends. Thus, whereas Table 2 depicted the business cycle, i.e. cyclical relationships between GDP and each signal, Table S3 presents trends, i.e. longer-run economic growth relationships. In particular, the baseline night lights specification in Column 2 is described in Section B.1 above, and is equivalent to Table S2 Column 6. The basic difference when considering trend economic growth is that night lights maintain statistically significant explanatory power across all specifications, whereas NO2 eventually drops off. This is likely because GDP and lights trend upward across most countries over time, whereas NO2 trends downward in many countries during the sample (Table 1, Panel B). Thus, lights are a valuable indicator of economic growth, while NO2 is perhaps preferable when considering cyclical fluctuations.

Table S3: Proxy: No Country Time Trend.

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Dependent variable | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) |
| ln(NO2) | 0.348*** | | | | | 0.307*** | 0.165*** | 0.164*** | 0.080 |
| | (0.061) | | | | | (0.060) | (0.055) | (0.060) | (0.053) |
| ln(Lights) | | 0.109*** | | | | 0.100*** | 0.053*** | 0.049** | 0.055** |
| | | (0.027) | | | | (0.027) | (0.020) | (0.021) | (0.023) |
| ln(CO2) | | | 0.316*** | | | | 0.281*** | 0.246*** | 0.221*** |
| | | | (0.035) | | | | (0.036) | (0.035) | (0.037) |
| ln(Electricity) | | | | 0.207*** | | | | 0.052 | 0.035 |
| | | | | (0.055) | | | | (0.046) | (0.049) |
| ln(Survey) | | | | | 0.468*** | | | | 0.345*** |
| | | | | | (0.083) | | | | (0.059) |
| Observations | 2,752 | 2,752 | 2,472 | 2,548 | 2,303 | 2,752 | 2,472 | 2,371 | 2,071 |
| Countries | 178 | 178 | 174 | 170 | 154 | 178 | 174 | 167 | 146 |
| Years | 05-20 | 05-20 | 05-19 | 05-19 | 05-19 | 05-20 | 05-20 | 05-19 | 05-19 |
| R-Squared | 0.049 | 0.068 | 0.313 | 0.159 | 0.256 | 0.105 | 0.341 | 0.344 | 0.490 |

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Notes. All specifications include country and year fixed effects, but no country-specific time trend. When using night lights, fixed effects for either OLS (pre-2014) or VIIRS (2014-2020) are included. Robust standard errors, clustered by country, are in parentheses. R-Squared is adjusted and within all fixed effects.

Table S4 repeats the exercise by data quality grade, again without country time trends; it is comparable with Table 3 in the text, which includes time trends. Column 1 is equivalent to Column 6 in Table S3. From a trend growth perspective, NO2 is primarily useful in higher-graded countries, whereas lights are primarily useful for lower-graded countries. The basic difference when including time trends in Table 3 is that NO2 also has explanatory power for cyclical fluctuations in nations with lower-graded statistical systems.

Table S4: Proxy: By-Country Data Quality Grade, No Country Time Trend.

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| Dependent variable | ln(GDP) | ln(GDP) | ln(GDP) | ln(GDP) |
| Grades | All | A, B | C | D, E |
| ln(NO2) | 0.307*** | 0.230*** | 0.172** | 0.230 |
| | (0.060) | (0.079) | (0.073) | (0.143) |
| ln(Lights) | 0.100*** | 0.055 | 0.084** | 0.134** |
| | (0.027) | (0.037) | (0.042) | (0.059) |
| Observations | 2,752 | 464 | 1,681 | 562 |
| Countries | 178 | 29 | 109 | 37 |
| Years | 05-20 | 05-20 | 05-20 | 05-20 |
| R-Squared | 0.105 | 0.105 | 0.056 | 0.105 |

Notes. All specifications include country and year fixed effects. When using night lights, fixed effects for either OLS sensor (average over DMSP satellites; pre-2014) or VIIRS (2014-2020) are included. A country-specific time trend is included where indicated. Robust standard errors, clustered by country, are reported in parentheses. R-Squared is adjusted and within all fixed effects. Three countries, Curaçao, Palestine, and Sint Maarten, are ungraded.

Table S5 repeats the analysis pertaining to data manipulation summarized in Table 4, now considering the effects of omitting country-specific time trends. Column 1 of Table S5 replicates Column 3 of Table 4, which showed no evidence of data manipulation with respect to cyclical fluctuations. This is indicated by the counterintuitive negative interaction between lights and the FiW index of autocracy. However, when we omit the country time trend in Column 2, the positive interaction reappears. We also find that there is a difference between crisis and non-crisis years in this case. Thus, the issues we uncover with respect to lights’ unreliability concern off-trend fluctuations, specifically.

Table S5: Data Manipulation: Role of Country Time Trend.

| | (1) | (2) | (3) |
|---|---|---|---|
| Dependent variable | ln(GDP) | ln(GDP) | ln(GDP) |
| ln(Lights) | 0.046** | 0.066* | 0.066* |
| | (0.019) | (0.038) | (0.038) |
| FiW | 0.036 | 0.085 | 0.088 |
| | (0.034) | (0.060) | (0.061) |
| FiW² | -0.008 | -0.020** | -0.020** |
| | (0.005) | (0.009) | (0.009) |
| Manipulation | | | |
| ln(Lights)×FiW | -0.004** | 0.009** | |
| | (0.002) | (0.004) | |
| ln(Lights)×FiW: No Crisis | | | 0.009** |
| | | | (0.004) |
| ln(Lights)×FiW: Crisis | | | 0.011*** |
| | | | (0.004) |
| Country Time Trend | YES | NO | NO |
| Observations | 2,622 | 2,622 | 2,622 |
| Countries | 170 | 170 | 170 |
| Years | 05-20 | 05-20 | 05-20 |
| Within R-Squared | 0.807 | 0.128 | 0.130 |

Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Notes. All specifications include country, year, and sensor fixed effects. Robust standard errors, clustered by country, are in parentheses. Eight countries included in the previous 178-country baseline have no Freedom in the World index available in this time sample: Aruba, Anguilla, Cabo Verde, Cayman Islands, Montserrat, Serbia, Sint Maarten, and Turks and Caicos Islands. Column 3 includes a dummy interaction for crisis years, which are designated to be 2009 and 2020. In this case, a dummy interaction with FiW and FiW² is also included as an unreported control.

B.3 Alternative Country Grades

Henderson et al. (2012) note that it is difficult to disentangle which countries actually have poor quality national accounts from the Penn World Tables letter grades used in the main text. They instead turn to World Bank grades for a subsample of countries. Here, we show how our primary results regarding objective data quality rankings in Table 7 would change using this alternative grading scheme for the subset of 114 graded countries. Figure S7 presents the results in two scatter plots, where marker size is proportional to the number of countries. In Panel A, we show that while the Penn World Tables and World Bank grades have a strong positive correlation, there are important differences between the two rankings. In Panel B, we show how our objective measure compares with the alternative World Bank ranking. While also positively correlated, our ranking still represents a significant reordering of countries. Thus, our qualitative results discussed with respect to our objective rankings would be the same if we conducted the discussion against this alternative grading scheme.

Figure S7: Data Quality Grades Correspondence. (Panel A. Penn World Tables versus World Bank: Penn World Tables grade, A to E, against World Bank data quality grade, 0 to 10. Panel B. Our Objective Measure (Binned to Deciles+1) versus World Bank: binned objective data quality grade, 0 to 10, against World Bank data quality grade, 0 to 10.)
Notes. 178 countries have PWT grades (A is the highest quality) and 114 have World Bank grades (10 is the highest quality). The World Bank grades include no PWT A-graded countries. Panel A: World Bank grade averages, by Penn World Tables grade. Averages are E: 2.8. D: 3.5. C: 5.2. B: 7.5. A: N/A. Panel B: our objective measure orders every country by standard deviation of GDP misstatement (Table 7). We classify countries into deciles+1 (11 groups) to allow a direct comparison with the World Bank’s grades (0-10).

B.4 2020 Misstatement

Table 8 in the text lists an abbreviated set of countries with understatement or overstatement in 2020. Tables S6 and S7 expand Table 8 Panels A and B, respectively, to show understatement and overstatement across all countries for which 2020 GDP data were available at the time of writing.
Figure S8 depicts graphically the relationship between the columns “Difference” and normalized “Difference*” in Tables S6 and S7. Plotted are the ISO codes for each nation, as given in those tables. Along the 45 degree diagonal, in black, are C-graded countries, which receive no normalization. To the left of the origin (0,0) are countries with understated output in 2020 as reported in GDP, while to the right are instances of overstatement. From the skew of the figure, it is clear that instances of understatement are in general larger. When an ISO code in red is above the diagonal, normalization implies that GDP is either less understated or more overstated. Below the diagonal, GDP is either more understated or less overstated. For instance, the A-graded UK (ISO=“GBR”) appears at (-0.41, -1.81). Thus, on the basis of its grading, its reported GDP understates true output growth by 0.41 percentage points. However, if we normalized it to grade C, that understatement would be even larger, at 1.81 percentage points. Before normalization, the UK seems to have significantly less understatement than grade D Malta (MLT), with an original difference of -3.27. But when normalized, Malta has a difference of -2.06, much closer to the UK. This figure thus communicates how much conclusions about relative under- or overreporting depend on how countries are classified based on data grades. Other countries where conclusions change markedly are A-graded countries with overstatement, including Luxembourg (LUX), Ireland (IRL), Norway (NOR), Netherlands (NLD), and Denmark (DNK), near the top of the figure.

References

Bazzi, Samuel, Martin Fiszbein, and Mesay Gebresilasse, ““Rugged individualism” and collective (in)action during the COVID-19 pandemic,” Journal of Public Economics, 2021, 195, 104357.

Chen, Xi and William D. Nordhaus, “Using Luminosity Data as a Proxy for Economic Statistics,” Proceedings of the National Academy of Sciences, 2011, 108 (21), 8589–8594.
Chen, Xi and William D. Nordhaus, “VIIRS Nighttime Lights in the Estimation of Cross-Sectional and Time-Series GDP,” Remote Sensing, 2019, 11 (9), 1057.

Elvidge, Christopher D., Feng-Chi Hsu, Kimberly E. Baugh, and Tilottama Ghosh, “National Trends in Satellite-Observed Lighting,” Global Urban Monitoring and Assessment Through Earth Observation, 2014, 23, 97–118.

Elvidge, Christopher D., Kimberly Baugh, Mikhail Zhizhin, Feng Chi Hsu, and Tilottama Ghosh, “VIIRS Night-Time Lights,” International Journal of Remote Sensing, 2017, 38 (21), 5860–5879.

Geddes, Jeffrey A, Randall V Martin, Brian L Boys, and Aaron van Donkelaar, “Long-Term Trends Worldwide in Ambient NO2 Concentrations Inferred from Satellite Observations,” Environmental Health Perspectives, 2016, 124 (3), 281–289.

Table S6: Growth in 2020 (%): Understatement.

 #  ISO  Country          Grade  Reported  Composite  Difference  Difference*  Percentile
 1  BLZ  Belize           C        -15.05     -10.55       -4.49                     0.00
 2  NAM  Namibia          D         -8.30      -4.04       -4.26        -2.69        0.00
 3  PER  Peru             C        -11.81      -7.82       -3.99                     0.00
 4  MNG  Mongolia         D         -6.06      -2.09       -3.96        -2.50        6.67
 5  PHL  Philippines      C         -9.80      -5.94       -3.86                     0.00
 6  MLT  Malta            D         -7.13      -3.86       -3.27        -2.06       13.33
 7  BWA  Botswana         C         -8.23      -5.28       -2.95                     6.67
 8  IND  India            C         -7.37      -4.67       -2.70                     0.00
 9  HND  Honduras         C         -9.39      -6.77       -2.63                     0.00
10  MAR  Morocco          C         -7.38      -4.93       -2.45                     6.67
11  ECU  Ecuador          C         -8.07      -5.84       -2.23                     0.00
12  MEX  Mexico           C         -8.84      -6.89       -1.96                     0.00
13  COL  Colombia         C         -7.09      -5.21       -1.89                     0.00
14  TUN  Tunisia          C         -8.99      -7.12       -1.87                     0.00
15  MYS  Malaysia         C         -5.78      -3.92       -1.86                     0.00
16  ARG  Argentina        B        -10.43      -8.58       -1.85        -2.62        6.67
17  LKA  Sri Lanka        C         -3.96      -2.35       -1.60                     0.00
18  BHR  Bahrain          C         -5.98      -4.43       -1.55                     0.00
19  ESP  Spain            B        -11.47     -10.03       -1.45        -2.05        0.00
20  THA  Thailand         C         -6.39      -5.17       -1.22                     6.67
21  ISL  Iceland          C         -6.88      -5.79       -1.10                    33.33
22  URY  Uruguay          B         -6.08      -5.03       -1.05        -1.49        0.00
23  ZAF  South Africa     C         -7.21      -6.19       -1.02                     0.00
24  CHL  Chile            B         -6.17      -5.25       -0.92        -1.30        0.00
25  SAU  Saudi Arabia     D         -4.21      -3.39       -0.82        -0.52       66.67
26  CRI  Costa Rica       C         -4.69      -3.89       -0.80                     6.67
27  PRT  Portugal         B         -7.87      -7.19       -0.68        -0.96       26.67
28  SGP  Singapore        B         -5.53      -4.96       -0.57        -0.80       33.33
29  IDN  Indonesia        C         -2.12      -1.63       -0.49                    13.33
30  MOZ  Mozambique       D         -1.27      -0.86       -0.41        -0.26       66.67
31  GBR  United Kingdom   A        -10.37      -9.96       -0.41        -1.81        0.00
32  HRV  Croatia          C         -8.45      -8.16       -0.30                    86.67
33  SVK  Slovakia         C         -5.34      -5.13       -0.21                    66.67
34  CZE  Czech Republic   C         -5.78      -5.60       -0.18                    60.00
35  FRA  France           A         -8.59      -8.42       -0.17        -0.76        0.00
36  ITA  Italy            A         -9.35      -9.19       -0.17        -0.74       20.00
37  AUT  Austria          A         -6.94      -6.79       -0.15        -0.67       13.33
38  CAN  Canada           A         -5.55      -5.49       -0.07        -0.29       60.00
39  BEL  Belgium          A         -6.53      -6.47       -0.06        -0.27       66.67
40  MKD  North Macedonia  C         -4.65      -4.60       -0.05                    86.67
41  BRA  Brazil           C         -4.46      -4.41       -0.04                    86.67
42  ROU  Romania          C         -3.72      -3.71       -0.01                    93.33

Notes: Percentile is the percent of years 2005-2019 for which the absolute difference between reported data and the composite estimate for the country was larger.

Table S7: Growth in 2020 (%): Overstatement.

 #  ISO  Country          Grade  Reported  Composite  Difference  Difference*  Percentile
 1  SRB  Serbia           C         -0.91      -2.62        1.71                     0.00
 2  TUR  Turkey           C          1.58       0.14        1.44                    20.00
 3  EGY  Egypt            C          1.48       0.12        1.35                    13.33
 4  CHN  China            C          1.92       0.63        1.29                    13.33
 5  VNM  Vietnam          C          2.78       1.51        1.27                     0.00
 6  UKR  Ukraine          C         -4.17      -5.36        1.19                    60.00
 7  LTU  Lithuania        C         -0.77      -1.95        1.18                    26.67
 8  BLR  Belarus          D         -0.97      -1.99        1.03         0.65       73.33
 9  KOR  South Korea      B         -0.92      -1.88        0.96         1.36        0.00
10  ALB  Albania          C         -3.26      -4.13        0.87                    13.33
11  BGR  Bulgaria         C         -3.90      -4.59        0.69                    46.67
12  JOR  Jordan           C         -1.51      -2.19        0.68                    40.00
13  FIN  Finland          B         -2.81      -3.42        0.61         0.86       40.00
14  NZL  New Zealand      B         -1.16      -1.74        0.58         0.82       20.00
15  NIC  Nicaragua        C         -2.05      -2.56        0.51                    66.67
16  CYP  Cyprus           D         -5.24      -5.68        0.44         0.28       86.67
17  ISR  Israel           B         -2.44      -2.88        0.44         0.62       20.00
18  HUN  Hungary          C         -5.24      -5.67        0.43                    53.33
19  EST  Estonia          C         -2.73      -3.15        0.42                    73.33
20  LUX  Luxembourg       A         -1.32      -1.73        0.41         1.81        6.67
21  IRL  Ireland          A          2.45       2.04        0.40         1.77       33.33
22  RUS  Russia           C         -2.96      -3.34        0.38                    66.67
23  BIH  Bos. and Herz.   C         -4.30      -4.66        0.35                    66.67
24  NOR  Norway           A         -1.30      -1.64        0.34         1.48        0.00
25  NLD  Netherlands      A         -3.82      -4.14        0.32         1.42        0.00
26  PRY  Paraguay         C         -0.67      -0.99        0.31                    53.33
27  GTM  Guatemala        C         -1.58      -1.88        0.30                    60.00
28  DNK  Denmark          A         -2.77      -3.06        0.29         1.28       20.00
29  DEU  Germany          B         -5.41      -5.69        0.28         0.40       46.67
30  GRC  Greece           B         -8.38      -8.64        0.26         0.36       80.00
31  SVN  Slovenia         C         -6.28      -6.46        0.19                    93.33
32  CHE  Switzerland      A         -3.03      -3.21        0.18         0.79        0.00
33  SWE  Sweden           A         -3.03      -3.16        0.12         0.54       40.00
34  USA  United States    A         -3.55      -3.65        0.10         0.46       40.00
35  JPN  Japan            A         -5.17      -5.26        0.09         0.41       46.67
36  LVA  Latvia           C         -3.68      -3.77        0.09                    93.33
37  AUS  Australia        A         -2.47      -2.56        0.09         0.39       20.00
38  NGA  Nigeria          C         -1.82      -1.88        0.06                    86.67
39  POL  Poland           B         -2.77      -2.78        0.01         0.02       93.33

Notes: Percentile is the percent of years 2005-2019 for which the absolute difference between reported data and the composite estimate for the country was larger.

Figure S8: Effect on Composite Estimate for 2020 when Normalized to C-Grade Weight
[Scatter plot of country ISO codes. Horizontal axis: Difference. Vertical axis: Difference*. Panel labels: Understatement (left) and Overstatement (right). Region labels: More Overstated, Less Understated, Less Overstated, More Understated.]
Notes: Difference = Reported - Composite. Difference* is defined analogously, but uses the composite based on the assumption of a grade C data weighting λ = 0.60 for all countries. See Tables S6 and S7 for the correspondence between ISO codes and country names and for the underlying values.

Gibson, John, Susan Olivia, and Geua Boe-Gibson, “Night Lights in Economics: Sources and Uses,” Études et Documents, 2020, 1.
Gkatzelis, Georgios I, Jessica B Gilman, Steven S Brown, Henk Eskes, A Rita Gomes, Anne C Lange, Brian C McDonald, Jeff Peischl, Andreas Petzold, Chelsea R Thompson et al., “The Global Impacts of COVID-19 Lockdowns on Urban Air Pollution: A Critical Review and Recommendations,” Elementa: Science of the Anthropocene, 2021, 9 (1).

Hamilton, James D, “Why You Should Never Use the Hodrick-Prescott Filter,” Review of Economics and Statistics, 2018, 100 (5), 831–843.

Henderson, J. Vernon, Adam Storeygard, and David N. Weil, “Measuring Economic Growth from Outer Space,” American Economic Review, 2012, 102 (2), 994–1028.

Johnson, Simon, William Larson, Chris Papageorgiou, and Arvind Subramanian, “Is Newer Better? Penn World Table Revisions and Their Impact on Growth Estimates,” Journal of Monetary Economics, 2013, 60 (2), 255–274.

Laughner, Joshua L, Qindan Zhu, and Ronald C Cohen, “The Berkeley High Resolution Tropospheric NO2 Product,” Earth System Science Data, 2018, 10 (4), 2069–2095.

Veefkind, JP, I Aben, K McMullan, H Förster, J De Vries, G Otter, J Claas, HJ Eskes, JF De Haan, Q Kleipool et al., “TROPOMI on the ESA Sentinel-5 Precursor: A GMES Mission for Global Observations of the Atmospheric Composition for Climate, Air Quality and Ozone Layer Applications,” Remote Sensing of Environment, 2012, 120, 70–83.
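Supplementary sketch: checking the Figure S8 normalization. The normalization from Difference to Difference* can be checked with a few rows from Tables S6 and S7. The sketch below rests on an assumption that is not stated in this section: that the composite is a weighted average, λ × reported + (1 - λ) × satellite-based estimate, with a grade-specific weight λ. Under that assumption, Difference = (1 - λ) × (reported - satellite-based estimate), so the ratio Difference*/Difference should equal (1 - 0.60)/(1 - λ) and be roughly constant within a grade. The function names and the implied weights are illustrative only and are not values reported by the authors.

```python
from statistics import mean

LAMBDA_C = 0.60  # grade C weight, as stated in the notes to Figure S8

# (ISO, grade, Difference, Difference*) copied from Tables S6 and S7
ROWS = [
    ("GBR", "A", -0.41, -1.81),
    ("LUX", "A", 0.41, 1.81),
    ("NOR", "A", 0.34, 1.48),
    ("ARG", "B", -1.85, -2.62),
    ("ESP", "B", -1.45, -2.05),
    ("KOR", "B", 0.96, 1.36),
    ("NAM", "D", -4.26, -2.69),
    ("MNG", "D", -3.96, -2.50),
    ("MLT", "D", -3.27, -2.06),
]


def ratio_by_grade(rows):
    """Average Difference*/Difference within each data grade."""
    by_grade = {}
    for _iso, grade, diff, diff_star in rows:
        by_grade.setdefault(grade, []).append(diff_star / diff)
    return {grade: mean(values) for grade, values in by_grade.items()}


def implied_weight(ratio, lambda_c=LAMBDA_C):
    """Hypothetical grade weight implied by the assumed weighted-average form.

    ratio = (1 - lambda_c) / (1 - lambda_g)  =>  lambda_g = 1 - (1 - lambda_c) / ratio
    """
    return 1.0 - (1.0 - lambda_c) / ratio


ratios = ratio_by_grade(ROWS)
weights = {grade: implied_weight(r) for grade, r in ratios.items()}

for grade in sorted(ratios):
    print(f"grade {grade}: ratio {ratios[grade]:.2f}, implied weight {weights[grade]:.2f}")
```

With these rows the ratio is close to 4.4 for grade A, 1.4 for grade B, and 0.63 for grade D, which is consistent with a single scaling factor per grade. That is why A-graded countries move furthest from the diagonal in Figure S8, while D-graded countries move toward zero.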