Policy Research Working Paper 10916

How Well Did Real-Time Indicators Track Household Welfare Changes in Developing Countries during the COVID-19 Crisis?

David Newhouse, Rachel Swindle, Shun Wang, Joshua D. Merfeld, Utz Pape, Kibrom Tafere, Michael Weber

Development Economics, Poverty and Equity Global Practice & Human Capital Project
September 2024

Abstract

This paper investigates the extent to which real-time indicators derived from internet search, cell phones, and satellites predict changes in household socioeconomic indicators across approximately 300 administrative level-1 regions in 20 countries during the COVID-19 crisis. Measures of changes in socioeconomic status in each region are taken from high-frequency phone surveys. When using the first wave of data, fielded between April and August 2020, models selected using the least absolute shrinkage and selection operator explain 37 percent of the cross-regional variation in the share of households reporting declines in total income and 34 percent of the share of respondents reporting work stoppages since the onset of the crisis. Real-time indicators explain a lower amount of the within-region variation in income losses and current employment over time, with an R² of 15 percent for current employment and 22 to 26 percent for the prevalence of income declines. When limiting the sample to urban regions, real-time indicators are far more effective at explaining within-region variation in income losses and current employment, with R² values of approximately 0.54 and 0.38, respectively. Income gains, self-reported food insecurity, social distancing behavior, and child school engagement are more difficult to predict, with R² values ranging from 0.06 to 0.17. Google search terms related to food, money, jobs, and religion were the most powerful predictors of work stoppage and income declines in the first survey wave, while those related to food, exercise, and religion better tracked changes in income declines and employment over time. Google mobility measures are also strong predictors of changes in employment and the prevalence of specific types of income declines. In general, satellite data on vegetation, pollution, and nighttime lights are far less predictive. Google mobility and search data, and to a lesser extent vegetation and pollution data, can provide a meaningful signal of regional economic distress and recovery, particularly during the early phases of a major crisis such as COVID-19.

This paper is a product of the Development Economics, the Poverty and Equity Global Practice, and the Human Capital Project. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at dnewhouse@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

How Well Did Real-Time Indicators Track Household Welfare Changes in Developing Countries during the COVID-19 Crisis?¹

David Newhouse, Rachel Swindle, Shun Wang, Joshua D.
Merfeld, Utz Pape, Kibrom Tafere, Michael Weber

JEL: D31, D74, I31, O15
Keywords: COVID-19; Well-being; Work; Income; Google mobility; NO2; Google search

¹ The team would like to thank the Korea Trust Fund for Economic and Peace-Building Transitions (KTF) for providing funding. The KTF, supported by the Ministry of Economy and Finance, Republic of Korea, is a global fund administered by the World Bank to finance critical development operations and analysis in situations of fragility, conflict, and violence.

1. Introduction

COVID-19 triggered one of the most precipitous and dramatic shocks to households and regional economies since the end of World War II. Within weeks of the World Health Organization’s declaration of COVID-19 as a global pandemic on March 11, 2020, the virus had spread across most of the world.² As the crisis unfolded, there was an immediate need for fast, accurate data on the economic effects of the pandemic. The World Bank Group worked with national statistical offices to field phone surveys in dozens of member countries, with the goal of expeditiously measuring how the pandemic was impacting households. These surveys asked respondents about directional income changes, both for total and various sources of income. In addition, they asked about access to critical services and resources, food security, employment status, behavioral adjustments, and overall well-being.

Insights from these survey data have proven to be invaluable for documenting the impact of the shocks on households in many developing countries and informing policy responses: they are enabling more precise targeting in assistance and recovery programs, whether in the form of direct cash transfers to residents or interventions to assist firms in the hardest-hit sectors. However, fielding phone surveys takes critical time, which is not always an available resource in the midst of a major crisis, and requires substantial resources to sustain over longer periods.
Real-time indicators – like satellite images, pollution data, and Google mobility data – are immediately available at low cost. Hence, it is not surprising to see that, for example, mobility data from Google has become popular for understanding community responses to the pandemic. When lockdowns were implemented, Google mobility data showed corresponding declines in activity in commercial areas and increases in time spent at residential spaces. These data are available in close to real time and can be used to document compliance with mobility restrictions. Similarly, near real-time data on nitrogen dioxide pollution showed less pollution in some areas in the aftermath of the shock (Masaki et al., 2020). The varied speed of the onset of the crisis and the progression of recovery in different countries, and regions within them, raises the natural question of how well these real-time indicators track the economic indicators collected in the phone surveys, and which real-time indicators are most informative.

² Director-General’s opening remarks at the media briefing on COVID-19, 11 March 2020. https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-COVID-19---11-march-2020.

This study combines a large set of near real-time indicators with the World Bank’s high-frequency phone surveys to assess the ability of the indicators to predict changes in a variety of proxies for household economic well-being. In addition to Google Mobility data, Google search trends, night-time lights, vegetation and pollution data are used to develop predictive models of trends in a variety of outcomes measured in household surveys.
First, the study examines the ability of these real-time indicators to predict variation in the severity of the initial shock, as measured by the prevalence of self-reported decreases in income and job losses since the start of the crisis, recorded in the first survey wave that each country fielded between April and August 2020. Second, the study examines the ability of the real-time-indicators to predict variation within regions – typically states or provinces – over time as the impact of the crisis evolved. While the use of real-time data is relatively new in economic research, several studies provide insight into best practices for predictive modeling and how this data can contribute to poverty research. Many real-time measures, such as Google mobility, night lights from satellites, nitrogen dioxide (NO2 ), and Google search of certain keywords, have the potential to predict traditional well-being measures. Google mobility has been recently used to predict industrial production growth (Sampi & Jooste, 2020) and economic growth (Putra & Arini, 2020) during the pandemic. The Google mobility index shows how visits to places are changing over time in each geographic region. The index is measured for six categories of places, including workplaces, residential, grocery and pharmacy, transit stations, retail and recreation, as well as parks. The data are not representative, however, as they are only collected from smartphone users that have location services turned on. Economists increasingly use night lights detected by satellites as a proxy for economic indicators (Chen & Nordhaus, 2011; Gibson et al., 2021; Henderson et al., 2012; Ishizawa et al., 2017; Mellander et al., 2015). In developing countries with low-quality economic and social accounting systems, or when satellite images are more frequently available than such data, the satellite-based nightlight data may provide valuable information on social and economic development. 
In addition to the widely used night lights, other types of remotely sensed data have recently been explored in economics. For example, Castellanos & Boersma (2012) find that NO2 is correlated with economic activities and thus can be used to measure the magnitude of social and economic activities (Keola & Hayakawa, 2021). The Normalized Difference Vegetation Index (NDVI) explains about 4 percent of the variation in per capita consumption across Sri Lanka (Engstrom et al., 2021). Daytime satellite imagery has also been shown to predict 15 to 17 percent of changes in village-level asset indices extracted from Living Standards Measurement Surveys in Sub-Saharan Africa. However, that rises to 35 percent when predicting an index of changes in asset ownership at the village level, and to 50 percent when that index of changes is aggregated to the district level (Yeh et al., 2020). Khachiyan et al. (2021) find that models trained on satellite imagery can explain between 32 and 46 percent of decadal changes in income and population in the United States.

Data from search engines have also been used to assess people’s real-time well-being without relying on survey questionnaires. Google Trends and its local equivalents (such as Baidu Trends in China and Naver Trends in the Republic of Korea), based on search frequency, provide an index for search intensity (or relative popularity) by topic or term over time in a certain geographical area. A number of recent studies have used data from Google Trends to explore the changes in psychological well-being during the COVID-19 pandemic (Brodeur et al., 2021; Foa et al., 2020; Ma et al., 2021). However, these studies focus on measuring the impact of lockdowns on search behavior and do not compare search data with large-scale phone survey data.
In addition, data from search engines are not nationally representative, since they are only available for areas with access to the internet and are collected only from users of Google for online search, while data from satellites are affected by cloud cover, raising questions about their ability to track economic changes more broadly.

In this paper, we exploit a set of real-time indicators from different sources to predict changes in welfare in developing countries during the COVID-19 crisis. Six main findings emerge:

1. The selected real-time indicators predict a significant share of the cross-regional variation in the initial impacts of the crisis. Across about 300 regions in 20 countries, eight Google search terms explain a combined 36 percent of the variation in the share of households reporting total income declines since the start of the crisis between April and August 2020. Meanwhile, during the same period, 10 Google search indicators explain a combined 34 percent of the variation in the share of respondents reporting job loss since the start of the crisis.

2. Compared with initial impacts, selected real-time indicators explain less of the variation in the prevalence of income declines and employment between April 2020 and February 2021. The R² values for these models are still substantial, however: they range from 22 to 26 percent for the incidence of income declines, and the R² is 15 percent for current employment.

3. When limiting the sample to urban regions only, real-time indicators are far more successful in explaining within-region variation in income declines and employment between April 2020 and February 2021. The R² values for these models range from 29 to 55 percent for the incidence of income declines, and the R² is 33 percent for current employment. Improved predictive performance in urban regions appears to be partly due to the greater predictive accuracy of Google search terms.

4.
Google mobility measures are strongly associated with work stoppage and changes in current employment, as well as declines in farm, non-farm enterprise, and wage income, during both the initial phase and the subsequent period.

5. Google search terms related to food, money, jobs, and religion were also strong predictors of income declines and work stoppage during the initial stage of the crisis. Google searches related to food, assistance, religion, and in some cases exercise were strong predictors of changes in the prevalence of income declines and employment as the crisis evolved.

6. In general, Google search and mobility information are far more predictive of income and employment changes than satellite data from vegetation, pollution, and night-time lights. However, vegetation levels strongly predict a measure of food insecurity and the share of students that quit school, while pollution is a moderately strong predictor of work stoppage, current employment, and non-farm and wage changes. Night-time lights were not selected by LASSO in any of the predictive models.

Taken together, our findings suggest that these real-time indicators, particularly Google mobility and search terms, predict a respectable amount of the variation in changes in well-being. The signs – positive or negative correlation – are often intuitive, especially when explaining variation over time. Real-time indicators were better able to predict cross-sectional variation in initial crisis impacts, measured as retrospective changes, than intertemporal variation in outcomes over repeated waves of survey data. In addition, the real-time indicators were better suited for predicting income losses than gains. The importance of different indicators also varies widely by context, highlighting the benefits of training models that combine information from several different sources.
Such models may be a promising avenue for monitoring changes in a region’s economic circumstances, particularly in the midst of sudden, large shocks when more traditional survey data are not (yet) widely available. The remainder of this paper is organized as follows: section 2 introduces the data and empirical methodology, section 3 describes trends in measures of well-being, section 4 presents the results of our analysis, and section 5 concludes this study. 2. Data and Empirical Methodology The analysis uses two types of data: (1) high-frequency phone surveys measuring household well-being during the COVID-19 crisis, and (2) real-time metrics of mobility, internet search activity, pollution, and night-time lights. We use the regionally aggregated high-frequency phone survey (HFPS) data as the dependent variable for the prediction based on real-time metrics. 2.1 Household Well-Being Measures from the HFPS The HFPS data cover 59 developing countries. 3 The process of fielding the surveys for each country was done in conjunction with national statistics offices to ensure appropriate weighting and population representation. Enumerators contacted households by phone and followed up with individuals in subsequent waves. Surveys were either carried out by recontacting respondents from a previously fielded nationally representative survey or through random digit dialing (RDD). Lower-income countries primarily opted for the former, while middle-income countries with more widespread phone coverage were more likely to use RDD. In cases where previous survey respondents were re-contacted, weights were constructed to account for the selection bias due to only interviewing households with phones who responded to the survey. 
However, respondents tend to be household heads, and the weights did not correct for biases due to non-random selection of respondents within the household (Brubaker et al., 2021; Kugler et al., 2021). Therefore, while household level outcomes such as self-reported income declines are plausibly representative, individual level outcomes such as employment mainly apply to household heads. See World Bank (2023) for more information on the HFPS.

³ The HFPS includes 8 countries from East Asia and Pacific (Indonesia, Cambodia, the Lao People’s Democratic Republic, Myanmar, Mongolia, the Philippines, Solomon Islands, and Viet Nam), 8 from Europe and Central Asia (Armenia, Bulgaria, Croatia, Georgia, Poland, Romania, Tajikistan, and Uzbekistan), 14 from Latin America and the Caribbean (Argentina, Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Paraguay, Peru, and Saint Lucia), 5 from the Middle East and North Africa (the Arab Republic of Egypt, Iraq, the West Bank and Gaza, Tunisia, and the Republic of Yemen), and 24 from Sub-Saharan Africa (Burkina Faso, the Central African Republic, Chad, the Democratic Republic of Congo, the Republic of Congo, Djibouti, Ethiopia, Gabon, Ghana, Kenya, Madagascar, Malawi, Mali, Mauritius, Mozambique, Nigeria, São Tomé and Príncipe, Senegal, Sierra Leone, Somalia, South Sudan, Uganda, Zambia, and Zimbabwe). Among them, three countries (Djibouti, Kenya, and Mauritius) do not have a subnational region code, and thus are omitted from the analysis.

The outcome variables taken from the HFPS are listed in Table 1 and include income change, employment, food security, and prevention measures such as social distancing and quitting schools. Work stoppage is measured in the first wave of the phone survey in each country, and is constructed based on the questions “Was the respondent working at the time the survey was conducted?” and “Was the respondent working before the pandemic?”.
In particular, work stoppage is set equal to 1 if a respondent reported that they were working prior to the pandemic but not currently working at the time of the survey, 0 if they were working prior to the pandemic and were currently working at the time of the survey, and missing if they were not working prior to the pandemic. Income change is measured by asking respondents whether their household’s income has increased, decreased, or stayed the same since the start of the pandemic (first wave only) or since the last survey wave (all subsequent waves). When looking at the evolution of the crisis over time, we combined these two questions into a single measure, in order to include all waves. Income change is further asked separately for each income source including farm, non- farm, wages, and pensions. The question for total income change is “Has your Total Household Income changed since the pandemic started?” There are 5 categories of answers: 1 = Increased, 2 = Stayed the same, 3 = Decreased, 4 = Not received, 9 = Do not know. We create a variable on income loss, where “Decreased” is recoded as 1, “Do not know” and “Not received” as missing, and “Stayed the same” or “Increased” as 0. We also create a variable on income gain, where “Increased” is recoded as 1, “Do not know” and “Not received” as missing, and “Stayed the same” or “Decreased” as 0. The creation of variables for different income sources, farm income, non-farm income and wage income follows the same rule. Not all income questions were asked in each survey, so the sample of available regions varies by indicator. Food security is another key metric of household well-being, which is measured using a series of questions on household constraints accessing food. 
The analysis uses two questions: (1) “in the last week did you or another adult in the household skip a meal due to lack of money or resources?”; and (2) “was there a time in the last week when you or another adult in the household went an entire day without food due to lack of money or other resources?”. The responses are binary, 1 for “Yes” and 0 for “No”.

We also consider measures related to social distancing and school attendance. For social distancing, the survey question is “Have you adopted social distancing/self-isolation?” We use two measures of educational disruption. When analyzing wave 1, we construct a measure of school dropout based on “Have children in the HH attended school (primary/secondary) before school closures?” and “Have children in the HH engaged in ANY learning or educational services since school closures?”. Because the first of these two questions is generally not asked after the first wave, we only consider the latter of those questions when analyzing all waves of data.

We take the mean of each well-being measure at the regional (state/province) level by month to create a panel of regions.⁴ Appendix Table 1 shows the number of regional observations by month in 2020. All variables have data beginning in April 2020. However, the panel data are highly unbalanced, with most observations concentrated in June, July, and August, implying that the HFPS surveys covered most countries in those three months. The summary statistics of the regional-level HFPS well-being measures for wave 1 and all waves in 2020 are reported in panels A and B of Appendix Table 1.

2.2 Explanatory Variables

The main explanatory variables are a set of real-time indicators including night-time lights, air pollution, and EVI (a vegetation index) measured from satellites, Google mobility data, and the popularity of Google search of certain keywords. These are listed in Table 2.
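As an illustration of how the Section 2.1 outcomes are built before they are matched to these explanatory variables, the recoding rules and the region-by-month averaging can be sketched as follows. This is a minimal sketch, not the authors' code: the function names and the row keys (`region`, `month`, `weight`, `code`) are hypothetical, and only the response codes (1, 2, 3, 4, 9) come from the text.

```python
from collections import defaultdict

# Response codes for the total-income question, as listed in Section 2.1.
INCREASED, SAME, DECREASED, NOT_RECEIVED, DONT_KNOW = 1, 2, 3, 4, 9


def income_loss(code):
    """1 if income decreased, 0 if it stayed the same or increased, else missing."""
    if code == DECREASED:
        return 1
    if code in (INCREASED, SAME):
        return 0
    return None  # "Not received" and "Do not know" are treated as missing


def income_gain(code):
    """1 if income increased, 0 if it stayed the same or decreased, else missing."""
    if code == INCREASED:
        return 1
    if code in (SAME, DECREASED):
        return 0
    return None


def work_stoppage(worked_before, works_now):
    """1 if working before the pandemic but not now, 0 if working in both
    periods, missing (None) if not working before the pandemic."""
    if not worked_before:
        return None
    return 0 if works_now else 1


def regional_means(rows, recode):
    """Household-weighted mean of a recoded outcome by (region, month).

    rows: iterable of dicts with hypothetical keys region, month, weight, code.
    Rows whose recoded value is missing are skipped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in rows:
        value = recode(row["code"])
        if value is None:
            continue
        key = (row["region"], row["month"])
        weighted_sum[key] += row["weight"] * value
        weight_total[key] += row["weight"]
    return {key: weighted_sum[key] / weight_total[key] for key in weighted_sum}


# Made-up example rows, for illustration only.
example_rows = [
    {"region": "A", "month": "2020-06", "weight": 2.0, "code": DECREASED},
    {"region": "A", "month": "2020-06", "weight": 1.0, "code": SAME},
    {"region": "A", "month": "2020-06", "weight": 1.0, "code": DONT_KNOW},
    {"region": "B", "month": "2020-07", "weight": 1.0, "code": INCREASED},
]
loss_share = regional_means(example_rows, income_loss)
gain_share = regional_means(example_rows, income_gain)
```

The same `regional_means` helper would apply to the source-specific income variables (farm, non-farm, wage), since the text states that they follow the same recoding rule.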
Summary statistics of real-time indicators for wave 1 and all waves in 2020 are reported in Appendix Table 2A and 2B, respectively.

⁴ Each survey wave is assigned to a unique month.

The measures of night-time lights, air pollution, and EVI are all obtained from Google Earth Engine. The night-time lights data are taken from the VIIRS monthly composites produced by the Earth Observation Group at the Colorado School of Mines.⁵ To measure pollution, we use nitrogen dioxide (NO2) concentrations measured by the Sentinel-5P satellite.⁶ We expect that NO2 levels increase in response to human activity, namely burning fossil fuels and biomass, which may have fallen during the crisis. EVI was taken from the Landsat 8 Collection 1 Tier 1 composite.⁷

The Google search index at subnational levels reflects the relative popularity of any given search term. The total search volume for each term is divided by total Google search volume for that week and subnational region. The search index for each term is rescaled such that the week-subnational region observation with the highest relative search score, for all weeks in that country, is assigned a value of 100 with all other weeks expressed relative to this peak. The baseline (weeks with a value of 100) itself is a mechanism for rescaling search term data such that values are comparable across regions and over time.⁸ We aggregate data into monthly averages, taking the average of the weekly data points for each subnational district.

Google restricts the quantity of data that can be downloaded in one batch. Therefore, the datasets are extracted in multiple rounds to cover all subnational districts for all countries with HFPS data. This process requires indexing all downloads against a common administrative district which serves as a reference point. The reference district is included in all rounds of downloads such that the resulting search terms are comparable regardless of whether they are pulled in the same round.
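The rescaling and monthly averaging of the search index described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the dictionary layout, and the example shares are hypothetical, and it takes the weekly search shares as given rather than reproducing how they are downloaded or chained across batches.

```python
from collections import defaultdict


def monthly_search_index(weekly_shares, country_of):
    """Rescale weekly search shares to a 0-100 index and average by month.

    weekly_shares: dict mapping (region, month, week) to the share of all
        Google searches in that region-week accounted for by one term.
    country_of: dict mapping each region to its country.
    Returns a dict mapping (region, month) to the monthly index.
    """
    # Peak share for the term across all regions and weeks in each country.
    peak = defaultdict(float)
    for (region, _month, _week), share in weekly_shares.items():
        country = country_of[region]
        peak[country] = max(peak[country], share)

    # Express each week relative to the country peak (peak = 100),
    # then average the weekly values within each region-month.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (region, month, _week), share in weekly_shares.items():
        country_peak = peak[country_of[region]]
        index = 100.0 * share / country_peak if country_peak > 0 else 0.0
        totals[(region, month)] += index
        counts[(region, month)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up shares for one term in two regions of one country.
example_shares = {
    ("A", "2020-04", 1): 0.02,
    ("A", "2020-04", 2): 0.04,  # country-wide peak, rescaled to 100
    ("B", "2020-04", 1): 0.01,
    ("B", "2020-04", 2): 0.01,
}
example_index = monthly_search_index(example_shares, {"A": "X", "B": "X"})
```

Because every week is divided by the same country-level peak, the resulting values are comparable across regions within a country, which is the property the text relies on.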
This reference-district approach is based on Abay et al. (2020).

⁵ https://developers.google.com/earth-engine/datasets/catalog/NOAA_VIIRS_DNB_MONTHLY_V1_VCMSLCFG
⁶ “Sentinel-5P NRTI NO2: Near Real-Time Nitrogen Dioxide”, Google Earth Engine. https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_NRTI_L3_NO2
⁷ https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_8DAY_EVI
⁸ To be more precise, the index for a particular month and term is constructed as follows: $G_{t,m,r} = \frac{1}{4} \sum_{w=1}^{4} \left( 100 \cdot \frac{S_{t,w,m,r}}{S^{Max}_{t,c}} \right)$, where $G_{t,m,r}$ represents the subnational Google search index for term $t$ in month $m$ and subnational region $r$, $S_{t,w,m,r}$ represents the share of all search volume in a particular week $w$ (in month $m$ for region $r$) for that term, and $S^{Max}_{t,c}$ is the maximum of that share, across regions and weeks, in country $c$.

The Google mobility index shows how visits to places are changing over time in each geographic region. Google mobility data is measured against a baseline established by Google researchers and is consistent across countries. The baseline is for a “normal” day so, for example, Monday mobility data is measured as the deviation from the “normal” Monday baseline and similarly for other weekdays. The “normal” day reflects the median value from five weeks of pre-crisis data from January 3 through February 6, 2020. The data are aggregated to monthly values at the first subnational administrative division in each country. The index is measured for six places, including workplaces, residential, grocery and pharmacy, transit stations, retail and recreation, as well as parks.

The real-time indicators at the levels of subnational administrative division by month are merged with the aggregated HFPS panel data at the same administrative level, yielding the panel data used for the analysis.

2.3 Multiple Imputation

Because Google mobility data are missing for several regions, we employ multiple imputation techniques to fill them.
Google does not disclose at least one of the six mobility indicators (Residential, Retail and Recreation, Workplace, Transit Stations, Groceries/Pharmacies, and Parks) for 559 of the 1,096 region-wave observations. Using multiple imputation techniques allows us to include these observations in the models, which generates more precise estimates of the relationship between outcomes and other non-mobility predictors.

To fill in missing Google mobility indicators, we create 100 imputations using imputation by chained equations, based on the other mobility indicators, night-time lights, NO2, and vegetation index (EVI). The chained equations approach carries out imputation iteratively for each of the Google Mobility variables, each time substituting values drawn from the conditional distribution for missing values, until convergence is achieved. Because we carry out analysis on two different samples, one for initial impacts and one for the subsequent evolution of the crisis, we perform two imputation exercises: one only for data from the first wave, and one for data for all waves. In the first case, mobility values are imputed for between 65 and 135 of the 311 total regions, depending on the Google mobility indicator. When using the data for all waves, we demean the data using each region’s mean, to isolate the variation over time, and impute the demeaned variables. For this case, mobility values are imputed for between 315 and 540 of the 1,096 total region-waves.

A final issue is that of weights. The outcome variables are means of household indicators taken at the regional level, although these are demeaned when predicting the evolution of the shock. When constructing these regional averages, we use household weights. Because the number of observations varies by region, country, and survey weight, it is appropriate to correct for heteroscedasticity.
We therefore weight each region-wave observation by the number of households in the original phone survey data used to construct the average for that region.⁹ This gives greater weight to regions for which the outcomes are measured with greater precision. These weights are applied both in model selection and in model estimation.

2.4 LASSO Model Selection

To construct predictive models, we use LASSO to select which variables among the real-time indicators best predict changes in income, work status, food insecurity, behavioral changes, and educational disruptions. When predicting initial impacts, the dependent variable is the regional average, and when predicting the subsequent evolution of the crisis the dependent variable and the predictors are demeaned using the regional mean over multiple waves. The full set of real-time indicators is included as candidate variables. We use a variant of the LASSO that selects the penalty parameter lambda to minimize the Bayesian Information Criterion (BIC), following Zhang et al. (2010). When selecting models to predict in the first wave, both the LASSO and the resulting post-LASSO estimates are clustered on country, and when predicting changes over time, the LASSO and the post-LASSO estimates are clustered on country-wave.¹⁰

It is not straightforward to apply LASSO to multiply imputed data (Du et al., 2020), and different approaches have been proposed. For the purpose of maintaining simplicity and to avoid overfitting, we select the model using only data from one arbitrarily selected imputation, and report that R². In addition, however, we report the mean R² from 100 models, one for each imputation, each selected independently using LASSO and then estimated using OLS. The latter is our preferred measure of R².

⁹ We do not give greater weight to more populous regions, conditional on sample size.
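To make the idea of choosing the LASSO penalty by BIC in Section 2.4 concrete, the following is a minimal sketch using plain coordinate descent. It is not the Stata routine used in the paper and simplifies in several ways that should be read as assumptions: observations are unweighted and unclustered, the variables are assumed to be already centred (so there is no intercept), the BIC is computed from the LASSO residuals rather than from a post-LASSO OLS fit, and all names and the toy data are hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=200):
    """Minimise (1/2n) * sum((y - X b)^2) + lam * sum(|b_j|).

    X is a list of rows; columns of X and y are assumed to be centred.
    Returns the coefficient list and the residuals.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residuals when all coefficients are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta, residual


def bic(residual, n_selected):
    """Gaussian BIC up to a constant: n * log(RSS / n) + k * log(n)."""
    n = len(residual)
    rss = max(sum(r * r for r in residual), 1e-12)  # guard against log(0)
    return n * math.log(rss / n) + n_selected * math.log(n)


def select_by_bic(X, y, lambdas):
    """Return (bic, lambda, selected column indices) for the BIC-minimising penalty."""
    best = None
    for lam in lambdas:
        beta, residual = lasso_coordinate_descent(X, y, lam)
        selected = [j for j, b in enumerate(beta) if b != 0.0]
        score = bic(residual, len(selected))
        if best is None or score < best[0]:
            best = (score, lam, selected)
    return best


# Toy data: the outcome depends strongly on column 0 and only weakly on column 1.
X_toy = [[-2.5, 1.0], [-1.5, -1.0], [-0.5, 0.0], [0.5, 0.0], [1.5, -1.0], [2.5, 1.0]]
y_toy = [-4.88, -2.92, -1.2, 0.8, 3.08, 5.12]
best_bic, best_lambda, selected_columns = select_by_bic(X_toy, y_toy, [1.0, 0.1, 0.01, 0.001])
```

In this toy example the smaller penalties admit the weak second predictor, but the extra `log(n)` term in the BIC outweighs the small reduction in the residual sum of squares, so the criterion settles on the penalty that keeps only the first column. A post-LASSO step would then re-estimate the selected variables by OLS, as the text describes.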
¹⁰ The BIC is calculated based on the unpenalized variables, which is appropriate since we are using LASSO to select the model rather than for post-estimation. The calculation of the BIC takes into account clustering, as described in Stata.

After selecting variables by applying LASSO to a single imputation, we estimate the post-LASSO OLS regressions across all 100 imputations, take the average of the point estimates, and estimate the variance using Rubin’s rule, as is standard when using multiple imputations.¹¹ To examine the predictions in the first wave, we estimate the following equation in the HFPS wave 1 data:

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$,

where $i$ indicates the subnational region. $Y_i$ and $X_i$ denote the HFPS well-being measure and the LASSO-selected set of real-time indicators, respectively. $\epsilon_i$ is a random disturbance term, clustered on country.

For the long-term analysis, we consider the following OLS model using the demeaned variables,

$\tilde{Y}_{iw} = \beta_0 + \beta_1 \tilde{X}_{iw} + \epsilon_{iw}$,

where $\tilde{Y}_{iw}$ and $\tilde{X}_{iw}$ denote the demeaned HFPS well-being measure and the set of demeaned real-time indicators, respectively, for wave $w$ in region $i$. $\epsilon_{iw}$ is a random disturbance term, clustered on country-wave. Using the demeaned variables is equivalent, at least in terms of the coefficient estimates, to a model in levels that includes regional fixed effects in the estimation. The resulting R² measures provide a measure of how well the selected real-time indicators explain the variation in outcomes within regions across time.

3. Trend Analysis

3.1 Trend of HFPS Well-Being Measures

Figure 1 presents the trend of five well-being measures in 2020, averaging over countries in the sample. Not all measures are available for each country in each wave, so the composition of countries varies by measure and month. In addition, the first wave of the survey asks about changes since the pandemic began, while subsequent months ask about changes since the previous survey wave for that country. As the surveys start in April, we cannot report changes in the first quarter.
The figure shows that the share of respondents reporting income loss is very high in April, implying that people’s income was highly affected by the pandemic in the first few months of the outbreak in the countries surveyed. However, the proportion of respondents reporting a decrease drops continuously until around September, at which point it begins to rise again through the end of the year. This pattern holds for all income types. Consistent with the general trend of income loss, the share of working people rose between April and September, with some variation in the last quarter. In summary, people experienced a large shock to well-being in April. Subsequent trends suggest another weaker shock in the last quarter of 2020, although this is difficult to conclude with certainty because of the changing composition of the regions in the sample.

11 We use the mi estimate command in Stata for this.

Figure 1. Trend of HFPS Well-being Measures

3.2 Trend of Real-Time Measures

Figure 2 shows the trend of night-time lights, NO2, and EVI. This sample includes all administrative regions in the survey data, but we are now able to go back to the beginning of the year. Night-time lights first increased from January to April, then decreased through August, followed by an increase from September to November. The data on NO2, on the other hand, show a decline starting in February, prior to the crisis, and the recovery lags that of night-time lights by a couple of months. EVI, in contrast, increases in the spring and declines in the late fall, which could be indicative of seasonal patterns, since the pandemic may be less likely to affect a vegetation index.

Figure 2. Trend of Night Light, NO2, and EVI

Figure 3 presents the trend of Google mobility measures. Google mobility shows a largely consistent pattern with the income loss measures.
All five measures of outside mobility (mobility in workplaces, grocery and pharmacy, transit stations, retail and recreation, and parks) show a V-shape: they decreased from February to April and started to recover from May onwards. In contrast, residential mobility shows an inverted V-shape, peaking in April. The mobility measures clearly show the worldwide lockdown in response to the first wave of infections. The mobility drop in April is largest for transit stations and smallest for grocery and pharmacy. The pattern is consistent with many countries’ mobility restriction policies, which were mostly relaxed for daily shopping but strictly maintained for public transportation. Although mobility outside residential areas starts to recover from April onwards, it never returns to February levels before the end of 2020, except for mobility in grocery and pharmacy. The trend of mobility in parks shows a W-shape with the two lowest values in April and November, representing the two waves of mobility restrictions and the recoveries that followed.

Figure 3. Trend of Google Mobility Index

Figure 4 presents the trend of Google search terms in six panels. Among the various keywords about food, the search for “Bread” increases sharply from March to April but decreases in September. The searches for “Wheat” and “Milk” show a similar trend to “Bread”, though peaking at a lower value. Other items do not show a clear trend. A clear inverted-U shape is found for the searches for “money” and “unemployment”, peaking in April and March 2020, respectively. A significant drop in the searches for “salary”, “jobs”, “job posting”, “LinkedIn”, “vacancy”, and “employment” starts around March, showing the initial shock of the pandemic. Some of them slightly recover after April, but most remain low.
The search for “online school” shows an inverse-U shape, peaking in June, while other searches related to children show possible effects of the beginnings of the pandemic, though not nearly as stark as online schooling searches. The searches for “prayer”, “Quran”, and “Bible” increase rapidly from February to April, in response to the announcement of the pandemic, though it is worth noting that Ramadan started in late April in 2020.

Figure 4. Trend of Google Search Items (Panel A: Food items; Panel B: Food assistance; Panel C: Money; Panel D: Employment; Panel E: Children and schooling; Panel F: Religion)

4. Predicting Well-Being Outcomes Using Real-Time Indicators

Given that the trends shown in the previous section suggest that the real-time indicators may be able to predict the economic impact of the shock, this section investigates the predictive performance of the real-time indicators in multivariate regression models. In the first sub-section we show the results for the early stage of the pandemic. These results use the real-time indicators to predict retrospective questions on income loss and work stoppage since the start of the crisis, exploiting only the first wave of data for each country. In the second sub-section we present the results that predict demeaned variables using the panel of surveys. The final sub-section presents results from Shapley decomposition of R2s that describe which types of variables are contributing the most predictive power to the different models.

4.1 Predicting Variation in the Early Impacts of the Pandemic

Table 3 reports the cross-sectional regression results of predicting income loss and work stoppage by real-time indicators using the first wave of data collected between April and August 2020. 12 Column 1 shows the results for the share of households reporting total income declines. Our preferred measure of overall model fit is the mean R2 over the full set of one hundred imputations, which is 0.37.
More frequent Google searches for “money” and “jobs”, and less frequent searches for “free food” are statistically significant correlates of the prevalence of total income declines. Columns 2-4 show the results for different types of income. The predictive power for a loss of farm income, nonfarm income, and wage income is considerably lower, at 0.09, 0.13, and 0.24, respectively. The predictive power of real-time indicators for job loss is slightly lower than for total income but substantially higher than for the different types of income (average R2 = 0.34).

The signs of the coefficients when analyzing the initial wave are not always intuitive. For example, searches for “bread” are negatively correlated with total income declines, searches for “Food Bank” are negatively correlated with the prevalence of wage income declines, and searches for “Free Food” are negatively correlated with the prevalence of work stoppage. This likely reflects correlations between Google search behavior and initial conditions of the regions, such as their overall level of development. For example, residents in poorer areas may be more likely to search for food banks or free food in response to an adverse shock, even if those shocks were more prevalent in more affluent areas. Similarly, searches for “jobs” and “unemployment” are negatively correlated with the prevalence of total income declines in the first wave, suggesting that areas hit less hard by the crisis, in terms of the prevalence of income declines, may have been more likely to search for unemployment insurance or new jobs. More generally, the strong correlations in the analysis of the first wave partly reflect the role of pre-crisis regional characteristics that are correlated with both the real-time indicators and the intensity of the shock.

The correlations with mobility data are more intuitive, as workplace mobility was negatively correlated with the prevalence of wage income declines.
This reflects the fact that the mobility data is measured as a change from a pre-crisis baseline. Google searches for “job postings” are positively correlated with wage income declines, as one would expect. Google searches for the “Quran” were also positively correlated with work stoppage, suggesting that increased religiosity may have played a role as a coping mechanism for those that stopped work in the early stages of the crisis.

12 Each survey wave is associated with a month, and the real-time indicators for that month were used.

4.2 Predicting Variation over Time during the Pandemic

Table 4 reports the results of demeaned OLS regressions to predict income loss and working status by real-time indicators using all waves of HFPS data. Demeaning the data by region effectively controls for time-invariant characteristics of the region. The second-to-bottom row of column 1 shows that the predictive power for total income loss since the previous survey is 0.22. The variation explained by real-time indicators for the three subcategories of income loss is slightly larger than that for total income loss, as shown in columns 2-4. The R2 for farm income loss, nonfarm income loss, and wage income loss is 0.25, 0.26, and 0.23, respectively. The last column presents the predictive power for whether the respondent was working when the HFPS survey was administered. The mean R2 is 0.15.

The results in Table 4 imply that the prevalence of income loss and working status during the survey period is meaningfully associated with real-time indicators during the pandemic. However, this association is weaker than the cross-sectional association in wave 1, because it does not include any of the correlation between real-time indicators and time-invariant characteristics of the region such as the region’s level of development prior to the crisis. The signs of the statistically significant variables generally make intuitive sense when predicting variation over time.
Increased Google searches for “Milk” and “Bible” are a reaction to negative shocks and are therefore positively correlated with income loss and negatively correlated with current work. Google searches for “Gym” are positively associated with total income declines, suggesting that exercise may be a source of stress relief when income declines. Searches for “Jobs” are positively correlated with farm income declines and wage income declines. Google searches for “Free meal” are positively correlated with the prevalence of income declines and “Cheap food” is negatively correlated with employment. Searches for “online school” are negatively associated with the prevalence of non-farm and wage income declines, suggesting that households in harder hit areas are less able to invest in online schooling.

Since an individual’s income might rise and fall along with changes in infection risk as well as stringency measures over time during the pandemic, we also investigate how well real-time indicators can predict income gains during the pandemic, in addition to income loss as shown in Table 4. 13 The columns in Table 5 show the predictive power for total income gain, farm income gain, nonfarm income gain, and wage income gain, respectively. The predictive power for non-farm and farm income gain is highest (R2 = 0.17 and 0.16), followed by wage income gain (R2 = 0.12), and total income gain (R2 = 0.06). The signs of the statistically significant correlations are generally opposite to those for income declines, as one would expect. For example, searches for the “Quran” and “Bible” are negatively correlated with gains in total and non-farm incomes, while searches for “gym” are negatively correlated with non-farm income. Increases in pollution are also positively associated with wage income gains.

As shown in Table 6, predictive models for adults skipping meals as the dependent variable have similar explanatory power to those for income gain.
Column 1 shows that a higher level of NO2 is associated with a smaller incidence of skipping meals, while additional searches for “bible” are positively associated with skipping meals. The average explained variation over the 100 imputations is 0.16. When evaluating the model using data from another question on food security (“adults going a whole day without food”), the explanatory power is slightly lower (R2 = 0.13). In that model, higher vegetation is associated with a greater tendency to skip meals for a full day, as are Google searches for “Milk”, which is consistent with the latter as an indicator of economic distress.

The real-time indicators can also predict people’s social distancing and schooling behaviors during the pandemic, as reported in Table 7. The predictive power for social distancing is low, however, as the R2 is only 0.13. The predictive power is slightly higher for quitting school, with four variables explaining 16 percent of the variation. However, none of the predictors is statistically significant in either model.

13 There are three possible values for the income change variables: increase, decrease, and no change. As such, “increase” is not the complement of “decrease”.

Tables 3a through 3d in the appendix report results on models predicting the prevalence of reported income declines and current employment separately for subsamples of urban and rural regions. These results are therefore analogous to those reported in Tables 3 and 4, using models selected and estimated using only data from urban and rural regions. Urban regions are defined as those for which half or more of the weighted survey respondents in the region are classified as urban residents, while the rest are considered rural. In wave 1, 66 out of 146 (45 percent) of the regions in the sample are considered urban according to this definition. Across all waves, 223 out of 599 (37 percent) are.
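The urban/rural rule stated above (a region is urban when half or more of its weighted respondents are urban residents) can be written as a short sketch. The function name `classify_region` and the respondent records are hypothetical; only the threshold and the reported counts come from the text.

```python
def classify_region(respondents):
    """Classify a region from (survey_weight, is_urban) pairs for its respondents."""
    total_weight = sum(weight for weight, _ in respondents)
    urban_weight = sum(weight for weight, is_urban in respondents if is_urban)
    # "Half or more" of weighted respondents makes the region urban
    return "urban" if urban_weight / total_weight >= 0.5 else "rural"

# Hypothetical regions
region_a = classify_region([(1.0, True), (2.0, False), (3.0, True)])   # weighted share 4/6
region_b = classify_region([(1.0, True), (3.0, False)])                # weighted share 1/4

# Shares reported in the text, rounded to whole percentages
wave1_share = round(100 * 66 / 146)
all_waves_share = round(100 * 223 / 599)
```

Because the rule uses survey weights rather than raw counts, a region with few but heavily weighted urban respondents can still be classified as urban.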
14 The LASSO model selection and estimation in each case are carried out only for the relevant subsample of regions, drawing on multiple imputations conducted in the full sample. Tables 3a-3d generally show much stronger predictive performance of the model in urban regions. In wave 1, the model explains 48 percent of the variation in reported income decreases on average across the imputations in urban regions, as opposed to only 14 percent in rural regions. However, predictions in rural areas are more accurate for the reported declines in farm income in wave 1, explaining 35 percent of the variation in the decline, with Google search data for assistance, school, and student being statistically significant predictors. When looking at variation within urban regions over time, the model explains 55 percent of the variation in reported declines in total income, with changes in Google searches for employment and online school having significant predictive power. The models for predicting variation over time within rural regions are much less accurate, with R2 values ranging from 0.10 for total income to 0.25 for wage income loss. These results suggest that Google search data in particular is better suited to predicting changes in welfare in urban areas than rural areas.

4.3 Decomposing the Explanatory Power of Predictors

Turning back to the models estimated on the full sample of areas, we conduct Shapley R2 decompositions to better understand the role of different types of variables in explaining variation in the model. Table 8 reports the results, which show the share of the R2 that is accounted for by ten different sets of variables.
The sets are the Google mobility data, Google searches related to Food (“Milk”, “Bread”, “Cheap food”), Money (“Money”), Jobs (“Jobs”, “Salary”, “Unemployment”, “Vacancy”), Assistance (“Assistance”, “Free Food”, “Free Meal”, and “Food Bank”), Exercise (“Exercise” and “Gym”), Religion (“Bible”, “Quran”, “Prayer”), and School (“Student”, “Online School”, “Online Education”). The final two sets of variables are EVI (vegetation) and NO2 (a measure of pollution), taken from satellites.

14 These figures for both wave 1 and the full sample are obtained from each multiple imputation replication.

The results show that mobility variables, when selected by LASSO, are generally powerful predictors of income and employment changes. When examining the initial impact, they explain between 27 percent (for farm income declines) and 40 percent (for work stoppage) of the explained variation. For the subsequent evolution, they explain between 34 percent (nonfarm income decrease) and 46 percent (current employment) of the explained variation.

The importance of different Google search terms varies considerably by indicator. When predicting initial impacts in the first wave, searches related to food, money, and jobs account for 80 percent of the explained variation. When examining the same variable (the prevalence of total income declines) over time, important variables include food, exercise, and religion, which combined account for over 90 percent of the explained variation. When modeling work stoppage in the initial wave, searches related to food, money, and religion contribute 36 percent of the variation. When modeling variation over time in current employment, searches related to food and religion contribute 27 percent of the variation, while jobs and stopping school contribute 5 percent each. In general, Google mobility and search terms explain a much larger share of the variation than the satellite-based measures.
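The group-level Shapley decomposition of R2 discussed in Section 4.3 can be sketched as follows. This is a generic illustration, not the routine used to produce Table 8: each group's contribution is its marginal gain in OLS R2 averaged over all orderings of the groups. The data, the three group names, and the helper names (`ols_r2`, `shapley_r2`) are made up for the example, and OLS is solved with a small hand-written elimination to keep the sketch dependency-free.

```python
import math
from itertools import combinations
from math import factorial

def ols_r2(columns, y):
    """R2 from an OLS regression of y on an intercept plus the given columns."""
    n = len(y)
    y_mean = sum(y) / n
    tss = sum((v - y_mean) ** 2 for v in y)
    if not columns:
        return 0.0
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    M = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    beta = [M[r][k] / M[r][r] for r in range(k)]
    rss = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))
    return 1.0 - rss / tss

def shapley_r2(groups, y):
    """Shapley share of R2 for each named group of columns."""
    names = list(groups)
    p = len(names)

    def r2_of(subset):
        return ols_r2([col for name in subset for col in groups[name]], y)

    shares = {}
    for g in names:
        others = [name for name in names if name != g]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                total += weight * (r2_of(subset + (g,)) - r2_of(subset))
        shares[g] = total
    return shares

# Hypothetical data for 40 region-wave observations
n = 40
groups = {
    "mobility": [[math.sin(i) for i in range(n)]],
    "search": [[math.cos(0.7 * i) for i in range(n)],
               [math.sin(1.3 * i + 0.5) for i in range(n)]],
    "satellite": [[math.cos(2.1 * i + 1.0) for i in range(n)]],
}
y = [0.5 * groups["mobility"][0][i] + 0.3 * groups["search"][0][i]
     + 0.2 * math.sin(2.9 * i) for i in range(n)]

shares = shapley_r2(groups, y)
full_r2 = ols_r2([col for cols in groups.values() for col in cols], y)
```

By construction the group shares add up to the R2 of the full model, which is what allows the shares to be reported as percentages of explained variation.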
Night-time lights, although included in the candidate set of variables, were not selected by LASSO for any of the 18 models listed in Table 8. There are a couple of cases, however, where EVI and NO2 contribute significant variation, particularly EVI in the case of stopping school and not eating all day. NO2 is a significant determinant in particular of non-farm enterprise income gains, which is suggestive of a particularly strong association between household enterprises and mobility. Combined, the Google mobility measures and NO2 explain 51 percent of the variation in the prevalence of non-farm income losses and 59 percent of the prevalence of non-farm income gains.

5. Conclusion and Discussion

This paper assesses the ability of real-time indicators to predict household survey-based well-being measures. Reference outcomes are provided by high-frequency phone surveys administered from the second quarter of 2020 to February 2021. The real-time data are publicly available and derived from the internet (Google mobility and Google search) or satellites (night lights, air pollution, and EVI).

The survey data confirm that the pandemic had a profound impact on people’s well-being, particularly in the beginning of the pandemic in early 2020. Subsequent recovery is observed for many well-being indicators from mid-2020 onwards. While no single variable performs well in predicting changes in well-being, various combinations of indicators from Google search terms, mobility, and satellite data have considerable explanatory power. These predictive models explain the largest share of the variation when measuring the prevalence of income declines and work stoppage since the onset of the crisis in the initial wave of the survey. This is partly due to the large variation across different regions and countries in economic disruption caused by the initial impact of COVID-19.
However, the real-time indicators are also correlated with predetermined regional characteristics, like the overall level of development, that were themselves correlated with the intensity of the shock. When predicting variation within regions across time, however, real-time indicators maintain substantial explanatory power. For example, the indicators can explain about a quarter of the variation in the prevalence of income declines across regions, and about 15 percent of the variation in current employment, income gain, food security, and the share of households with children that stopped school.

The real-time indicators considered here are significantly stronger predictors of the prevalence of reported income declines across urban regions, where the indicators explain 55 percent of the prevalence of income declines across regions. The models suggest that Google search terms are more powerful at distinguishing differences in the prevalence of reported income declines across urban regions than rural regions. This is consistent with the fact that most Google search queries originate from urban areas, due to greater connectivity.

Of the real-time indicators, Google mobility data and search data for food, jobs, assistance, exercise, and religion were generally most predictive, though relative importance varies from indicator to indicator. Google search variables accounted for all of the explained variation of the prevalence of total income decrease, both in the first wave and over time, and at least 40 percent of the variation of all indicators related to income and employment. Measures of vegetation (EVI) and pollution (NO2) were also predictive in some cases. However, most models contain a variety of different indicators, highlighting the benefits of training a model that can combine information from disparate sources.

Further research could expand on this work in several ways.
First, indicators extracted from pre-crisis satellite imagery could be added to the model, to see if they improve predictive power. Second, Google search terms are likely affected by seasonal patterns, and accounting for these could also improve predictive power. Third and more broadly, it would be worthwhile exploring additional real-time indicators, such as satellite imagery that measures the pace and characteristics of new building construction in urban areas, and agricultural outcomes in rural areas. Call-detail records from mobile phone operators may also be very useful for this type of nowcasting. Fourth, further work can better understand which indicators and methods perform better in different contexts, as well as how model accuracy depends on the use of different training data to estimate the models. Non-parametric machine learning methods such as extreme gradient boosting may also predict better, particularly with large datasets. Finally, it would also be useful to expand the scope of this type of exercise beyond the specific COVID-19 crisis.

More research is needed to better understand the potential biases associated with different real-time indicators in different contexts. For example, Google search terms and mobility measures may not accurately reflect changes in areas where internet connectivity is low and may be more accurate in reflecting sharp downturns during a crisis than gradual progress. Google mobility measures may also suffer from selection bias since they only capture movement for a subset of mobile phone owners. Because of these concerns, we believe that these types of methods require further testing, evaluation, validation, and verification before being implemented to generate experimental official statistics.

After better understanding when, where, and why nowcasts work well, these types of methods have the potential to produce extremely useful estimates, particularly in informing policy responses to unanticipated shocks.
The results are therefore sufficiently promising to warrant prioritizing further investment in piloting and testing different predictors and methods to generate nowcasts for these and other related socioeconomic outcomes.

References

Abay, K. A., Tafere, K., & Woldemichael, A. (2020). Winners and losers from COVID-19: Global evidence from Google Search. Policy Research Working Paper 9268. World Bank Group.
Baek, C., McCrory, P. B., Messer, T., & Mui, P. (2020). Unemployment effects of stay-at-home orders: Evidence from high frequency claims data. IRLE Working Paper No. 101–20.
Banks, J., & Xu, X. (2020). The mental health effects of the first two months of lockdown during the COVID-19 pandemic in the UK. Fiscal Studies, 41(3), 685–708.
Banks, Fancourt, & Xu, X. (2021). Mental health and the COVID-19 pandemic. In J. F. Helliwell, R. Layard, J. Sachs, J.-E. De Neve, L. Aknin, & S. Wang (Eds.), World happiness report 2021 (pp. 107–130). New York: United Nations Sustainable Development Solutions Network.
Brodeur, A., Clark, A. E., Fleche, S., & Powdthavee, N. (2021). COVID-19, lockdowns and well-being: Evidence from Google Trends. Journal of Public Economics, 193, 104346.
Castellanos, P., & Boersma, K. F. (2012). Reductions in nitrogen oxides over Europe driven by environmental policy and economic recession. Scientific Reports, 2, 265.
Chen, X., & Nordhaus, W. D. (2011). Using luminosity data as a proxy for economic statistics. PNAS, 108(21), 8589–8594.
Coibion, O., Gorodnichenko, Y., & Weber, M. (2020a). Labor markets during the COVID-19 crisis: A preliminary view. NBER Working Paper w27017.
Coibion, O., Gorodnichenko, Y., & Weber, M. (2020b). The cost of the COVID-19 crisis: Lockdowns, macroeconomic expectations, and consumer spending. BFI Working Paper 2020–60.
Cotofan, M., De Neve, J.-E., Golin, M., Kaats, M., & Ward, G. (2021). Work and well-being during COVID-19: Impact, inequalities, resilience, and the future of work. In J. F. Helliwell, R. Layard, J.
Sachs, J.-E. De Neve, L. Aknin, & S. Wang (Eds.), World happiness report 2021 (pp. 153–190). New York: United Nations Sustainable Development Solutions Network.
Engstrom, R., Hersh, J., & Newhouse, D. (2021). Poverty from space: Using high resolution satellite imagery for estimating economic well-being. The World Bank Economic Review, lhab015. https://doi.org/10.1093/wber/lhab015
Foa, R., Gilbert, S., & Fabian, M. O. (2020). COVID-19 and subjective well-being: Separating the effects of lockdowns from the pandemic. Cambridge, United Kingdom: Bennett Institute for Public Policy. http://dx.doi.org/10.2139/ssrn.3674080
Gibson, J., Olivia, S., Boe-Gibson, G., & Li, C. (2021). Which night lights data should we use in economics, and where? Journal of Development Economics, 149, 102602.
Google LLC (2021). Google COVID-19 community mobility reports. https://www.Google.com/COVID19/mobility/. Accessed on June 25, 2021.
Helliwell, J. F., Huang, H., Wang, S., & Norton, M. (2021). World happiness during COVID-19. In J. F. Helliwell, R. Layard, J. Sachs, J.-E. De Neve, L. Aknin, & S. Wang (Eds.), World happiness report 2021 (pp. 13–56). New York: United Nations Sustainable Development Solutions Network.
Henderson, J. V., Storeygard, A., & Weil, D. N. (2012). Measuring economic growth from outer space. American Economic Review, 102(2), 994–1028.
Ishizawa, O. A., Miranda, J. J., & Zhang, H. (2017). Understanding the impact of windstorms on economic activity from night lights in Central America. World Bank Policy Research Working Paper 8124.
Keola, S., & Hayakawa, K. (2021). Do lockdown policies reduce economic and social activities? Evidence from NO2 emissions. The Developing Economies, 59(2), 178–205.
Khachiyan, A., Thomas, A., Zhou, H., Hanson, G. H., Cloninger, A., Rosing, T., & Khandelwal, A. (2021). Using neural networks to predict micro-spatial economic growth (No. w29569). National Bureau of Economic Research.
Kim, B., & Zhao, Y. (2020).
Psychological suffering owing to lockdown or fear of infection? Evidence from the COVID-19 outbreak in China. Discussion Paper Series 2008, Institute of Economic Research, Korea University.
Ma, M., Wang, S., & Wu, F. (2021). COVID-19 and well-being: Lessons from East Asia. In J. F. Helliwell, R. Layard, J. Sachs, J.-E. De Neve, L. Aknin, & S. Wang (Eds.), World happiness report 2021 (pp. 57–90). New York: United Nations Sustainable Development Solutions Network.
Masaki, T., Nakamura, S., & Newhouse, D. (2020). How is the COVID-19 crisis affecting nitrogen dioxide emissions in Sub-Saharan Africa? World Bank.
Mellander, C., Lobo, J., Stolarick, K., & Matheson, Z. (2015). Night-time light data: A good proxy measure for economic activity? PLoS ONE, 10(10), e0139779.
Putra, R. A. A., & Arini, S. (2020). Measuring the economics of a pandemic: How people mobility depict economics? An evidence of people’s mobility data towards economic activities. Working Paper.
Sampi, J., & Jooste, C. (2020). Nowcasting economic activity in times of COVID-19: An approximation from the Google Community Mobility Report. Policy Research Working Paper No. 9247. Washington, DC: World Bank.
Taquet, M., Luciano, S., Geddes, J. R., & Harrison, P. J. (2021). Bidirectional associations between COVID-19 and psychiatric disorder: Retrospective cohort studies of 62 354 COVID-19 cases in the USA. The Lancet Psychiatry, 8(2), 130–140.
Taquet, M., Geddes, J. R., Husain, M., Luciano, S., & Harrison, P. J. (2021). 6-month neurological and psychiatric outcomes in 236 379 survivors of COVID-19: A retrospective cohort study using electronic health records. The Lancet Psychiatry, 8(5), 416–427.
World Bank Group (2023). Household Monitoring Systems to Track the Impacts of the COVID-19 Pandemic. Washington, D.C. Available at: https://www.worldbank.org/en/topic/poverty/brief/high-frequency-monitoring-surveys
Zhang, Y., Li, R., & Tsai, C.-L. (2010).
Regularization parameter selections via generalized information criterion. Journal of the American Statistical Association, 105(489), 312–323.

Table 1: Outcome variables
Outcome | Survey question
Work stoppage (wave 1) | “Was the respondent working at the time the survey was conducted?” and “Was the respondent working before the pandemic?”
Employment | “Was the respondent working at the time the survey was conducted?”
Total Income Change | “Has your Total Household Income changed since the pandemic started?”
Farm Income Change | “Has income from farming, fishing, or livestock changed since the onset of the crisis / last round of survey?”
Non-farm Income Change | “Has income from farming, fishing, or livestock changed since last round of survey?”
Wage Income Change | “Has your Wage Income changed since the last round of the survey?”
Skip meal | “In the last week did you or another adult in the household skip a meal due to lack of money or resources?”
Not eat for a day | “Was there a time in the last week when you or another adult in the household went an entire day without food due to lack of money or other resources?”
Social Distancing | “Have you adopted social distancing/self-isolation?”
Quit school | “Have children in the HH attended school (primary/secondary) before school closures?” and “Have children in the HH engaged in ANY learning or educational services since school closures?”
Educational Engagement | “Have children in the HH engaged in ANY learning or educational services since school closures?”

Table 2.
Candidate Real-Time Indicators
Category | Description
Google Mobility Data
Workplaces | Mobility trends for places of work
Residential | Mobility trends for places of residence
Grocery and pharmacy | Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies
Transit stations | Mobility trends for places like public transport hubs, such as subway, bus, and train stations
Retail and recreation | Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theatres
Parks | Mobility trends for places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens
Google Search Data
Food | “Bread”, “Cheap Food”, “Milk”
Money | “Money”
Jobs | “Jobs”, “Salary”, “Unemployment”, “Vacancy”
Assistance | “Assistance”, “Free Food”, “Free Meal”, “Food Bank”
Exercise | “Exercise”, “Gym”
Religion | “Bible”, “Quran”, “Prayer”
School | “Student”, “Online School”, “Online Education”
Satellite Data
EVI | Enhanced Vegetation Index (Landsat)
No2 | Nitrogen Dioxide (Sentinel 5-Precursor)
NTL | Night-time lights (VIIRS)

Table 3.
Predicting Income Loss in Wave 1 Using Real-Time Indicators % Total income % Farm income % Nonfarm % Wage income % Work decrease decrease income decrease decrease Stoppage GS_Food_Bank -0.003 -0.058 (1.12) (3.12)** GS_Bread -0.042 -0.016 (3.13)** (1.61) GS_Money 0.053 0.057 0.018 (2.74)* (2.08) (1.02) GS_Jobs -0.245 -0.250 -0.043 -0.052 -0.068 (2.50)* (2.03) (0.49) (1.05) (1.13) GS_Childcare -0.009 (0.62) GS_Unemployment -0.043 -0.037 -0.015 -0.005 (3.09)** (2.67)* (2.12) (0.32) GS_School -0.025 (0.76) GS_Student 0.051 0.020 (1.61) (1.25) Mobil_Residential 0.004 0.005 (0.91) (2.09) EVI 0.185 (1.88) Mobil_Workplace -0.002 (3.41)** GS_Job_Posting 0.025 (2.32)* GS_Employment 0.039 (1.04) GS_Prayer 0.026 (0.97) GS_Bible 0.067 0.016 (1.96) (0.60) No2 -1.723 (2.33)* Mobil_Parks -0.001 (1.43) GS_Free_Food -0.012 (3.76)** GS_Milk 0.017 (0.98) GS_Quran 0.056 (2.30)* _cons 0.544 0.512 0.585 0.318 0.244 (16.96)** (5.95)** (12.47)** (10.85)** (5.46)** R2 0.36 0.08 0.12 0.25 0.36 R2_Mean 0.37 0.09 0.13 0.24 0.34 N 258 210 293 254 308 Notes: * p<0.05; ** p<0.01. Results show multiple imputation estimation of OLS regressions, with standard errors clustered on country. Unit of observation is admin-1 region. R2 refers to R2 from the arbitrary selected imputation used to select the model. R2_Mean refers to mean R2 over 100 imputations, each estimated using different lasso-selected models. Models were selected using BIC-minimizing lasso clustered on country. GS refers to Google search, Mobil to mobility, EVI to enhanced vegetation index, and No2 to Nitrogen Dioxide 30 Table 4. 
Predicting Income Losses Within Region Across Time

% Total income loss | % Farm income loss | % Nonfarm income loss | % Wage income loss | % Current work

GS_Free_Food 0.008 0.011 0.010 0.014 -0.007
(0.80) (0.76) (1.19) (2.13) (1.77)
GS_Milk 0.078 0.085 0.083 0.055 -0.030
(2.67)* (2.36)* (2.26)* (1.53) (2.48)*
GS_Groceries 0.010 0.003
(0.43) (0.09)
GS_Money 0.023 0.031 0.003 -0.001
(1.30) (1.37) (0.15) (0.04)
GS_Bible 0.075 0.038 0.107 0.055 -0.039
(2.28)* (1.52) (4.38)** (2.25)* (2.43)*
GS_Quran 0.011 -0.025
(0.92) (1.99)
GS_Exercise 0.019 0.019
(0.85) (0.76)
GS_Gym 0.085 0.067 0.073 0.015 -0.018
(2.34)* (1.43) (1.69) (0.44) (1.13)
EVI -0.236
(1.09)
Mobil_Residential -0.016 -0.002
(2.03) (0.60)
Mobil_RetailRec -0.007 -0.007 -0.003
(2.02) (1.94) (1.38)
Mobil_Transit 0.001 -0.001 -0.000 0.001
(0.44) (0.24) (0.12) (0.93)
Mobil_GrocPharm -0.004 -0.003 0.002
(1.71) (1.50) (1.99)
GS_Free_Meal 0.049 0.027
(6.42)** (2.82)*
GS_Jobs 0.091 0.085 0.064 -0.018
(2.34)* (2.07) (2.46)* (0.95)
GS_Prayer 0.013
(0.45)
No2 -4.251 -2.082 1.923
(4.62)** (2.85)* (2.81)*
Mobil_Workplace 0.007 0.005
(1.67) (1.30)
GS_Online_School -0.033 -0.031
(3.25)** (4.69)**
GS_Student -0.014
(1.07)
Mobil_Parks -0.001
(1.78)
GS_Food_Bank 0.005
(0.81)
GS_Cheap_Food -0.014
(2.52)*
GS_Assistance 0.009
(0.91)
GS_Bread 0.008
(0.75)
GS_Childcare -0.002
GS_Job_Posting -0.015
(1.30)
GS_School 0.029
(2.01)
_cons -0.005 -0.006 -0.004 -0.001 0.001
(1.45) (1.34) (0.94) (0.22) (0.94)
R2 0.16 0.28 0.28 0.24 0.20
R2_Mean 0.22 0.25 0.26 0.23 0.15
N 561 490 655 619 1,069

* p<0.05; ** p<0.01. See notes to Table 2

Table 5.
Predicting Income Gains Within Region Across Time

Total income gain | Farm income gain | Nonfarm income gain | Wage income gain

Mobil_Parks 0.000 0.000
(0.24) (0.55)
GS_Bread -0.011
(0.80)
GS_Groceries -0.015
(1.79)
GS_Money -0.000 -0.015 0.001 -0.010
(0.01) (1.11) (0.09) (1.35)
GS_Vacancy -0.008
(1.17)
GS_Unemployment -0.018 -0.019 -0.018
(1.65) (0.70) (1.35)
GS_Quran -0.023 0.010
(3.44)** (1.06)
Mobil_Transit 0.002 0.002 0.001
(1.18) (2.00) (1.62)
GS_Free_Meal -0.053
(2.46)*
No2 2.225 0.848
(2.59)* (1.79)
GS_Wheat 0.008
(0.75)
GS_Jobs -0.048 -0.010
(1.24) (0.51)
GS_Student 0.003 0.001
(0.26) (0.14)
GS_Online_School 0.018 0.011
(2.28)* (1.87)
GS_Bible -0.041 -0.020
(3.33)** (1.41)
GS_Gym -0.032
(2.64)*
GS_Free_Food -0.010
(1.46)
GS_Salary -0.009
(0.86)
GS_Linkedin 0.033
(1.41)
_cons 0.002 0.003 0.003 0.001
(1.94) (1.18) (1.79) (0.66)
R2 0.06 0.18 0.18 0.11
R2_Mean 0.06 0.16 0.17 0.12
N 561 490 655 619

* p<0.05; ** p<0.01

Table 6. Predicting Food Insecurity Within Region Across Time

Skip meal | Skip meal all day

No2 -1.493 -0.992
(2.68)* (1.56)
EVI 0.125 0.263
(1.00) (2.35)*
Mobil_Residential -0.001
(0.33)
Mobil_Transit -0.001 -0.001
(0.99) (1.50)
Mobil_GrocPharm -0.001
(1.82)
GS_Milk 0.012 0.020
(0.73) (3.27)**
GS_Groceries -0.003 -0.001
(0.45) (0.26)
GS_Money 0.016 -0.000
(1.96) (0.06)
GS_Jobs 0.022 -0.008
(1.69) (0.60)
GS_Student -0.017
(1.53)
GS_Prayer -0.010
(1.28)
GS_Bible 0.027 0.011
(3.27)** (1.36)
Mobil_Parks -0.000
(0.20)
Mobil_Workplace 0.001
(1.47)
GS_Unemployment -0.004
(0.83)
GS_Gym 0.017
(1.55)
_cons 0.002 0.002
(0.64) (1.40)
R2 0.19 0.15
R2_Mean 0.16 0.13
N 788 893

* p<0.05; ** p<0.01

Table 7.
Predicting Social Distancing and Schooling Changes Within Region Across Time

EVI 0.121 -1.646
(0.91) (2.71)*
Mobil_Parks 0.000
(0.25)
Mobil_RetailRec -0.001
(0.31)
Mobil_Workplace -0.000
(0.01)
GS_Assistance -0.013
(1.25)
GS_Groceries 0.012
(2.01)
GS_Money 0.003
(0.15)
GS_Salary -0.073
(1.68)
GS_Linkedin 0.009
(0.61)
GS_Vacancy -0.021
(1.08)
GS_Student -0.024
(1.61)
GS_Prayer -0.006
(0.49)
GS_Quran -0.026
(1.13)
GS_Exercise -0.017
(0.81)
No2 -4.307
(1.47)
_cons 0.009 0.750
(1.66) (19.81)**
R2 0.15 0.09
R2_Mean 0.13 0.09
N 675 890

* p<0.05; ** p<0.01

Table 8: Results from Shapley decompositions of R2 by outcomes

Indicators | Mobility | Search: Food | Search: Money | Search: Jobs | Search: Assistance | Search: Exercise | Search: Religion | Search: School | Search Total | Geo: EVI | Geo: No2 | R2 | Mean R2

Initial impact
Total income decrease 37% 20% 22% 10% 4% 6% 100% 0.36 0.37
Farm income decrease 27% 73% 73% 0.08 0.09
Nonfarm income decrease 80% 20% 100% 0.12 0.13
Wage income decrease 28% 6% 15% 15% 24% 2% 62% 9% 0.25 0.24
Work stoppage 40% 18% 11% 7% 11% 2% 49% 10% 0.36 0.34

Evolution of crisis
Total income decrease 30% 7% 33% 29% 100% 0.16 0.22
Farm income decrease 44% 14% 5% 3% 21% 5% 9% 57% 0% 0.28 0.25
Nonfarm income decrease 34% 12% 3% 7% 10% 15% 3% 50% 17% 0.28 0.26
Wage income decrease 43% 14% 1% 4% 3% 2% 12% 10% 46% 10% 0.24 0.23
Current employment 46% 10% 1% 5% 5% 2% 17% 5% 45% 9% 0.20 0.15
Total income gain 3% 31% 1% 24% 41% 97% 0.06 0.06
Farm income gain 31% 3% 2% 63% 68% 0.18 0.16
Nonfarm income gain 32% 1% 1% 12% 7% 14% 7% 42% 27% 0.18 0.17
Wage income gain 20% 10% 21% 6% 18% 13% 68% 18% 0.11 0.12
Skip meal 55% 5% 7% 2% 12% 2% 28% 9% 8% 0.19 0.16
Don’t eat all day 17% 12% 0% 2% 6% 4% 24% 42% 18% 0.15 0.13
Social Distance 22% 6% 1% 34% 3% 6% 18% 68% 11% 0.14 0.15
Educational Engagement 0% 92% 8% 0.09 0.09

Note: Missing values for Google Mobility data were imputed 100 times using chained multiple imputation. Cells show results from Shapley decomposition of R2 of results from one imputation.
R2 refers to R2 using that imputation, while mean R2 refers to mean R2 over 100 imputations using different lasso-selected models. Food includes “Cheap Food”, “Bread”, and “Milk”. Money refers to “Money”. Jobs includes “Jobs”, “Salary”, “Unemployment”, and “Vacancy”. Assistance includes “Assistance”, “Free Food”, “Free Meal”, and “Food Bank”. Exercise includes “Exercise” and “Gym”. Religion includes “Bible”, “Quran”, and “Prayer”. School includes “Student”, “Online School”, and “Online Education”. EVI is vegetation index, No2 is Nitrogen Dioxide, and NTL is night time lights.

Appendix Table 1. Summary Statistics of HFPS Well-being Measures

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| Panel A: wave 1 | | | | | |
| Income loss | 636 | 0.726 | 0.241 | 0 | 1 |
| Farm income loss | 437 | 0.690 | 0.258 | 0 | 1 |
| Nonfarm income loss | 536 | 0.802 | 0.199 | 0 | 1 |
| Wage income loss | 447 | 0.526 | 0.232 | 0 | 1 |
| Stop working | 679 | 0.334 | 0.220 | 0 | 1 |
| Panel B: all waves | | | | | |
| Income loss | 1,343 | 0.693 | 0.345 | 0 | 1 |
| Farm income loss | 983 | 0.603 | 0.303 | 0 | 1 |
| Nonfarm income loss | 1,203 | 0.622 | 0.296 | 0 | 1 |
| Wage income loss | 1,106 | 0.398 | 0.237 | 0 | 1 |
| Currently working | 1,970 | 0.581 | 0.239 | 0 | 1 |
| Farm income gain | 983 | 0.068 | 0.130 | 0 | 1 |
| Nonfarm income gain | 1,203 | 0.066 | 0.116 | 0 | 1 |
| Wage income gain | 1,106 | 0.067 | 0.102 | 0 | 1 |
| Skip meal | 1,400 | 0.352 | 0.259 | 0 | 1 |
| Skip meal for one whole day | 1,563 | 0.135 | 0.157 | 0 | 1 |
| Social distance | 1,002 | 0.870 | 0.235 | 0 | 1 |
| Quit school | 1,024 | 0.317 | 0.309 | 0 | 1 |

Appendix Table 2A. Summary Statistics of Real-time Well-being Measures in Wave 1

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| Night light | 549 | 1.512 | 4.435 | 0.098 | 58.793 |
| NO2 (10^-2) | 597 | 0.017 | 0.013 | -0.002 | 0.139 |
| EVI | 653 | 0.294 | 0.122 | 0.023 | 0.609 |
| Google mobility | | | | | |
| Grocery and pharmacy | 364 | -23.831 | 21.411 | -74.484 | 53.452 |
| Parks | 358 | -34.055 | 43.503 | -91.333 | 427.769 |
| Residential | 322 | 16.070 | 8.841 | -4.161 | 37.867 |
| Retail and recreation | 376 | -42.650 | 25.560 | -86.645 | 110.968 |
| Transit stations | 317 | -52.784 | 23.937 | -90.258 | 118.593 |
| Workplaces | 402 | -27.308 | 17.867 | -69.097 | 21.433 |
| Google search | | | | | |
| Food bank | 579 | 0.191 | 2.231 | -0.116 | 35.436 |
| Free food | 579 | 0.156 | 1.426 | -0.155 | 13.896 |
| Free meal | 579 | 0.066 | 1.184 | -0.107 | 15.255 |
| Cheap food | 579 | -0.036 | 0.738 | -0.164 | 7.748 |
| Food supply | 579 | 0.301 | 1.944 | -0.110 | 15.221 |
| Assistance | 579 | 0.120 | 1.011 | -0.606 | 4.847 |
| Bread | 579 | 0.895 | 1.671 | -0.735 | 5.977 |
| Milk | 579 | 0.490 | 1.278 | -1.012 | 3.464 |
| Grocery | 579 | 0.091 | 1.077 | -0.384 | 7.687 |
| Wheat | 579 | 0.512 | 1.508 | -0.496 | 6.095 |
| Money | 579 | 0.621 | 1.207 | -1.085 | 4.579 |
| Salary | 579 | 0.009 | 0.888 | -0.791 | 3.359 |
| Jobs | 577 | -0.189 | 0.782 | -0.951 | 3.181 |
| Childcare | 579 | 0.208 | 1.421 | -0.230 | 8.899 |
| Job posting | 579 | -0.050 | 0.764 | -0.262 | 5.879 |
| Temporary job | 579 | 0.051 | 1.117 | -0.095 | 17.291 |
| Linkedin | 577 | 0.062 | 0.970 | -0.613 | 4.937 |
| Vacancy | 575 | 0.034 | 0.936 | -0.386 | 5.984 |
| Unemployment | 575 | 0.282 | 1.134 | -0.358 | 7.700 |
| Employment | 575 | -0.120 | 0.836 | -0.809 | 3.707 |
| School | 572 | 0.106 | 0.959 | -0.996 | 3.027 |
| Student | 575 | 0.215 | 1.079 | -0.824 | 4.728 |
| Online education | 575 | 0.262 | 1.700 | -0.180 | 12.557 |
| Online school | 575 | 0.397 | 1.934 | -0.260 | 16.821 |
| Prayer | 575 | 0.918 | 1.538 | -0.823 | 5.199 |
| Bible | 573 | 0.476 | 1.230 | -0.861 | 4.318 |
| Quran | 574 | 0.315 | 1.452 | -0.401 | 6.780 |
| Exercise | 574 | 0.683 | 1.575 | -0.765 | 5.064 |
| Gym | 573 | -0.092 | 0.799 | -0.791 | 2.919 |

Appendix Table 2B. Summary Statistics of Real-time Well-being Measures in 2020

| Variable | Obs | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- | --- |
| Night light | 1,553 | 1.456 | 4.821 | 0 | 70.152 |
| NO2 (10^-2) | 1,671 | 0.019 | 0.021 | -0.002 | 0.549 |
| EVI | 1,802 | 0.284 | 0.127 | -0.032 | 0.609 |
| Google mobility | | | | | |
| Grocery and pharmacy | 934 | -19.039 | 20.310 | -74.484 | 79.839 |
| Parks | 935 | -30.401 | 44.442 | -91.452 | 427.769 |
| Residential | 843 | 14.271 | 8.389 | -5.967 | 37.867 |
| Retail and recreation | 961 | -37.026 | 24.111 | -86.645 | 110.968 |
| Transit stations | 816 | -47.092 | 24.561 | -90.258 | 118.593 |
| Workplaces | 1,084 | -22.785 | 16.312 | -69.097 | 21.433 |
| Google search | | | | | |
| Food bank | 1,586 | 0.028 | 1.418 | -0.116 | 35.436 |
| Free food | 1,586 | 0.051 | 1.109 | -0.155 | 13.896 |
| Free meal | 1,586 | 0.018 | 0.961 | -0.107 | 15.255 |
| Cheap food | 1,586 | -0.091 | 0.544 | -0.164 | 8.185 |
| Food supply | 1,586 | 0.091 | 1.323 | -0.110 | 15.221 |
| Assistance | 1,586 | 0.089 | 0.987 | -0.606 | 4.912 |
| Bread | 1,586 | 0.660 | 1.498 | -0.735 | 5.977 |
| Milk | 1,586 | 0.325 | 1.224 | -1.012 | 3.464 |
| Grocery | 1,586 | 0.018 | 0.924 | -0.384 | 7.687 |
| Wheat | 1,586 | 0.404 | 1.401 | -0.496 | 6.213 |
| Money | 1,586 | 0.501 | 1.101 | -1.085 | 4.579 |
| Salary | 1,586 | -0.096 | 0.811 | -0.791 | 3.359 |
| Jobs | 1,584 | -0.296 | 0.717 | -0.951 | 4.104 |
| Childcare | 1,586 | 0.070 | 1.092 | -0.230 | 8.899 |
| Job posting | 1,586 | -0.061 | 0.764 | -0.262 | 6.388 |
| Temporary job | 1,586 | 0.018 | 1.041 | -0.095 | 17.291 |
| Linkedin | 1,580 | -0.001 | 0.945 | -0.613 | 6.019 |
| Vacancy | 1,574 | 0.020 | 0.905 | -0.386 | 5.984 |
| Unemployment | 1,574 | 0.274 | 1.166 | -0.358 | 9.649 |
| Employment | 1,574 | -0.162 | 0.822 | -0.809 | 3.707 |
| School | 1,571 | 0.049 | 0.899 | -0.996 | 4.343 |
| Student | 1,574 | 0.125 | 1.016 | -0.824 | 4.728 |
| Online education | 1,574 | 0.154 | 1.476 | -0.180 | 15.073 |
| Online school | 1,574 | 0.246 | 1.605 | -0.260 | 16.821 |
| Prayer | 1,574 | 0.655 | 1.467 | -0.823 | 5.199 |
| Bible | 1,570 | 0.351 | 1.201 | -0.861 | 4.318 |
| Quran | 1,571 | 0.052 | 1.051 | -0.401 | 6.780 |
| Exercise | 1,571 | 0.560 | 1.541 | -0.765 | 5.064 |
| Gym | 1,569 | -0.140 | 0.816 | -0.791 | 2.919 |

Appendix Table 3a: Predicting Income Loss in Urban Regions in Wave 1 Using Real-Time Indicators

income_decrease_mean | farminc_decrease_mean | nonfarminc_decrease_mean | wageinc_decrease_mean

-0.002
(2.08)
-0.058
(5.90)**
0.027
(1.53)
-0.010
(0.97)
-0.213 -0.133 -0.046
(2.13) (1.04) (0.72)
-0.017 0.076
(0.93) (2.40)*
-0.027 -0.026
(2.82)* (1.04)
0.016
(0.48)
0.031
(1.58)
-0.074
(3.60)**
-0.059
(0.51)
0.002
(0.15)
0.120 0.075 0.095
(2.79)* (1.55) (2.19)*
-0.000
(0.11)
-0.067
(1.88)
0.041
(1.46)
-0.024
(0.81)
0.072
(2.01)
0.034
(1.00)
0.550 0.638 0.650 0.383
(17.02)** (17.85)** (13.85)** (10.32)**
0.51 0.26 0.06 0.14
0.48 0.26 0.05 0.21
162 121 170 151

Appendix Table 3b: Predicting Income Loss in Wave 1 in Rural Regions Using Real-Time Indicators

income_decrease_mean | farminc_decrease_mean | nonfarminc_decrease_mean | wageinc_decrease_mean | Labo_stop

Mobil_RetailRec 0.002
(1.54)
Mobil_Workplace -0.004 -0.001 -0.002
(2.13) (0.56) (3.66)**
GS_Jobs -0.069
(0.67)
GS_Unemployment -0.054 -0.039
(3.78)** (2.06)
Mobil_Residential 0.003
(0.60)
GS_Assistance 0.163
(2.42)*
GS_School -0.197
(3.98)**
GS_Student 0.108
(2.76)*
Mobil_Parks 0.000 -0.002
(0.31) (1.75)
GS_Money 0.084
(2.26)*
GS_Gym -0.115
(1.38)
Nighttime_Lights 0.051
(1.21)
GS_Bread 0.013
(0.48)
_cons 0.543 0.595 0.566 0.393 0.238
(8.74)** (5.10)** (9.82)** (22.25)** (3.91)**
R2 0.20 0.39 0.28 0.12 0.15
R2_Mean 0.14 0.35 0.26 0.14 0.16
N 96 89 123 103 125

Appendix Table 3C: Predicting Income Losses Within Urban Regions Across Time

Total_IncomeLoss | Farm_IncomeLoss | NonFarm_IncomeLoss | Wage_IncomeLoss | CurrentWork

No2 -4.766 -7.275 -3.440 4.707
(1.27) (2.55)* (1.46) (3.59)**
Nighttime_Lights 0.015 0.022 0.022
(1.02) (2.27)* (1.09)
Mobil_Residential 0.022 0.010 0.015 -0.004
(2.64)* (1.33) (3.06)** (0.83)
Mobil_Workplace -0.001 -0.007
(0.23) (2.25)*
GS_Free_Meal 0.001 0.042
(0.17) (9.87)**
GS_Cheap_Food -0.019
(1.99)
GS_Bread -0.073 0.029
(2.97)* (1.96)
GS_Milk 0.066 0.097 0.063 0.006
(1.51) (2.30)* (2.02) (0.25)
GS_Groceries -0.001 0.014
(0.06) (2.44)*
GS_Wheat 0.031 -0.002 0.050
(1.96) (0.07) (2.57)*
GS_Money -0.005 0.027 -0.010
(0.20) (1.95) (1.05)
GS_Salary 0.034 -0.010
(1.19) (0.75)
GS_Job_Posting 0.017 -0.012
(0.96) (1.20)
GS_Linkedin 0.005 -0.033
(0.24) (1.19)
GS_Employment 0.043 -0.017
(2.39)* (1.39)
GS_Student 0.022 0.054 -0.002
(0.61) (1.52) (0.09)
GS_Online_School -0.044 -0.030 0.001
(4.32)** (4.79)** (0.19)
GS_Bible 0.031 0.099 0.044 -0.034
(1.57) (2.73)* (1.77) (2.69)*
GS_Exercise 0.006 0.012 0.015 -0.035
(0.45) (0.41) (0.98) (3.86)**
GS_Gym 0.049 0.042 0.035 0.022
(1.49) (0.93) (0.81) (0.65)
GS_Free_Food 0.030 0.004 -0.012
(2.72)* (0.39) (2.03)
GS_Assistance 0.031 0.000 -0.002
(0.94) (0.02) (0.18)
GS_Vacancy -0.047 0.011
(1.99) (0.82)
GS_Online_Education 0.040
(1.55)
EVI 0.014
(0.03)
GS_Quran -0.051 -0.030
(2.06) (1.69)
Mobil_Parks -0.001
(3.12)**
Mobil_RetailRec 0.001
(0.58)
Mobil_GrocPharm 0.002
(2.57)*
GS_Jobs -0.044
(1.75)
GS_Childcare -0.000
(0.08)
GS_Unemployment 0.001
(0.12)
GS_Prayer 0.024
(2.17)*
_cons -0.011 -0.006 -0.005 -0.009 0.000
(1.54) (0.59) (0.84) (1.55) (0.29)
R2 0.54 0.38 0.29 0.38 0.38
R2_Mean 0.55 0.37 0.29 0.38 0.33
N 293 257 330 330 576

Appendix Table 3D: Predicting Income Losses Within Rural Regions Across Time

Total_IncomeLoss | Farm_IncomeLoss | NonFarm_IncomeLoss | Wage_IncomeLoss | CurrentWork

Mobil_RetailRec -0.001 -0.003 -0.002
(1.29) (0.65) (0.85)
GS_Groceries 0.022
(1.56)
EVI -0.560
(1.89)
Mobil_GrocPharm -0.004 -0.003 -0.003 0.001
(1.83) (1.27) (1.73) (0.58)
GS_Jobs 0.214 0.097 0.059
(2.71)* (1.16) (1.32)
GS_Prayer 0.057 0.070
(1.71) (2.55)*
GS_Gym 0.132 0.142 0.015 -0.032
(1.70) (1.45) (0.28) (1.33)
No2 -4.127 -1.550 1.307
(4.36)** (2.00) (1.64)
Mobil_Transit -0.001 -0.000 0.002
(0.27) (0.14) (1.23)
Mobil_Workplace 0.006 0.007
(1.40) (1.54)
GS_Free_Food 0.029 0.093
(0.88) (3.80)**
GS_Milk 0.164 0.116 -0.046
(2.84)* (2.41)* (2.86)*
GS_Student -0.009 -0.019
(0.40) (1.47)
GS_Bible 0.079 -0.057
(1.92) (2.61)*
GS_School 0.012 0.044
(0.39) (2.75)*
GS_Online_School -0.032
(2.15)
GS_Assistance 0.016
(0.96)
GS_Job_Posting -0.073
(2.48)*
_cons -0.001 -0.011 -0.001 0.003 0.002
(0.38) (2.23) (0.17) (1.27) (0.65)
R2 0.06 0.18 0.27 0.28 0.12
R2_Mean 0.10 0.18 0.24 0.25 0.12
N 268 233 325 289 493

Note: Rural regions are defined as regions where half or more of weighted survey respondents were classified as residing in rural areas.
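The rural and urban splits in Appendix Tables 3a to 3D rest on the classification rule stated in the note above. The following is a minimal sketch of that rule in Python, assuming respondent-level records that carry a survey weight and a rural flag. The function name `classify_regions`, the tuple layout, and the sample values are hypothetical illustrations and are not taken from the paper's data or code.

```python
from collections import defaultdict


def classify_regions(respondents, threshold=0.5):
    """Label each region "rural" or "urban" from weighted respondent records.

    respondents: iterable of (region, weight, is_rural) tuples (assumed layout).
    A region is rural when the weighted share of rural respondents is at
    least `threshold`, i.e. "half or more" with the default of 0.5.
    Weights are assumed to be positive.
    """
    rural_weight = defaultdict(float)
    total_weight = defaultdict(float)
    for region, weight, is_rural in respondents:
        total_weight[region] += weight
        if is_rural:
            rural_weight[region] += weight

    labels = {}
    for region, total in total_weight.items():
        share = rural_weight[region] / total
        labels[region] = "rural" if share >= threshold else "urban"
    return labels


# Hypothetical sample: region, survey weight, rural flag.
sample = [
    ("A", 2.0, True), ("A", 1.0, False),   # rural share 2/3
    ("B", 1.0, True), ("B", 3.0, False),   # rural share 1/4
    ("C", 1.0, True), ("C", 1.0, False),   # rural share exactly 1/2
]
labels = classify_regions(sample)
```

Region C illustrates the boundary case: a weighted rural share of exactly one half counts as rural under the "half or more" wording.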