Policy Research Working Paper 9961

Nudging in the Time of the Coronavirus
Evidence from an Experimental Tax Trial in Albania at the Onset of a Global Pandemic

Jonathan Karver
Hilda Shijaku
Christoph Ungerer

Macroeconomics, Trade and Investment Global Practice & Poverty and Equity Global Practice
March 2022

Abstract

This paper presents the results of a randomized controlled trial testing the effectiveness of taxpayer communications informed by behavioral science in inducing business payroll tax compliance at the onset of the COVID-19 pandemic. In March 2020, an experimental tax trial targeting 5,423 firms was implemented, coinciding with the national lockdown due to the global pandemic. The Albanian tax authority sent postal letters to employers and selected employees highlighting a suspicion that wages were under-declared to avoid personal income tax withholding. Employers and employees suspected of under-declaring were randomly assigned to receive a soft-tone letter (highlighting the social importance of contributing through taxes), a strong-tone letter (highlighting the penalties associated with under-declaring), or none (forming a control group against which the impact of receiving the letters could be estimated). For employers receiving soft-tone letters, the study finds large, statistically significant increases on subsequent payroll declarations (by as much as 10 percent relative to the control group), which gradually attenuate over the following six months. No statistically significant effects are found for letters sent to employees or strong-tone letters. The findings highlight (i) the importance of framing of communications as well as the importance of smart selection of letter recipients for taxpayer communication campaigns, (ii) which type of taxpayer communications were most effective in the context of the COVID-19 pandemic, and (iii) the role that randomized controlled trials and behavioral science can play in strengthening the effectiveness of government policy, particularly for public revenue mobilization.

This paper is a product of the Macroeconomics, Trade and Investment Global Practice and the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at jkarver@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.
Produced by the Research Support Team

Nudging in the Time of the Coronavirus: Evidence from an Experimental Tax Trial in Albania at the Onset of a Global Pandemic

March 2022¹

Jonathan Karver† Hilda Shijaku‡ Christoph Ungerer‡

Keywords: Albania, tax compliance, informality, behavioral insights, behavioral science, tax evasion

JEL Classification: H26, H30, H32, D90, Z18

¹ The authors are incredibly grateful to the dedicated staff at the General Directorate of Taxation who collaborated on the design, implementation, and evaluation of this study. The authors would like to extend a special acknowledgement to Delina Ibrahimaj, Ceno Klosi, Gjergi Duro, and Orald Lani, who were fundamental to the successful design and implementation of the project. The authors are also grateful for excellent feedback received during the presentation of the paper at the 7th Shadow Conference, particularly from Edgar Castro and Gabriel Tourek.
† World Bank, Poverty & Equity Global Practice. Corresponding author (jkarver@worldbank.org)
‡ World Bank, Macroeconomics, Trade, and Investment Global Practice

I. Introduction

Assuring an efficient and effective tax system is easier said than done. The goal is particularly challenging in low- and middle-income countries, where enforcement capacity is limited and the resources (political, financial, and technological) that tax administrations can devote to innovation are constrained. Complicating matters further, standard policy instruments used to improve compliance, including strategic reforms to legislation and tax administration, might fall short of securing efficient and effective tax systems. The onset of the COVID-19 pandemic made the goal harder still by requiring a delicate balance between compassion for the difficult circumstances faced by many taxpayers and the need to finance economic support packages for those most impacted by the crisis.
In this challenging environment, an understanding of human behavior and the ways in which different stakeholders interact with policies and systems can make the goal of more efficient and effective taxation more attainable. Many studies have shown that behaviorally informed notifications (applying insights from behavioral science to improve the way tax administrations communicate with taxpayers) can lead to more honest and timely tax declarations and payments (e.g., see Hallsworth et al. 2017; Kettle et al. 2016; and Mascagni et al. 2017), but less is known about what works in a global crisis like COVID-19. The comparative advantage of this “nudging” approach (relative to others informed by behavioral science² and the tax literature more generally) is that it can typically be implemented at a very low (financial and political) cost while generating almost immediate evidence on short-term changes in behavior. The approach involves strategically engaging with taxpayers through targeted notifications or reminders to induce a change in behavior, namely through simplification of technical language, clear calls to action, and highlighting aspects of taxes that might typically be overlooked (e.g., the enforcement capacity of the tax authorities or how taxes finance public goods). This approach has typically considered two types of framing of nudges, one strong (hard)-tone and one soft-tone, each motivated by the assumption that in some contexts (or for some segments of taxpayers) the binding constraint to compliance is related to perceptions of enforcement (and, as such, audit probability), while in others the binding constraint is related to low tax morale, a lack of knowledge about the role of taxes, or perceptions of the behavior of others, among other factors.
Testing the two approaches against each other provides useful evidence of where tax administrations should invest resources in engagement approaches (in terms of motivating voluntary, rather than enforced, compliance), particularly in crisis contexts. This approach further relies on the collection of diagnostic information regarding existing perceptions about tax compliance and institutional trust, which motivate the type of language and framing that ex-ante is expected to have the most meaningful impact (see, for example, Hernandez et al. 2019).

² For example, through the simplification of the choice architecture faced by taxpayers when carrying out their obligations (e-filing, simplified websites, simplified tax forms).

As is the case in many low- and middle-income countries, particularly in the Western Balkans (World Bank, 2019), under-declaration of personal income tax withholding is common and results from perverse incentives for employees to work informally. This problem commonly materializes as envelope or shadow employment (Schneider & Enste, 2000), which can take the form of shadow workers (employees at a firm with no contract, and thus nonexistent in the eyes of the authorities) or shadow wages (some nonnegligible proportion of an employee's wages goes undeclared to reduce a firm's tax burden and/or share of social security or other contributions). Albania is no exception to this problem, and the tax authorities suspect that many firms under-declare pay for their workers based on an evaluation of market wages for select sectors.

Shadow (or envelope) wages are not unique to Albania; they represent important development challenges across the globe (Medina & Schneider, 2019) and their consequences are multifaceted (Schneider & Enste, 2000). For one, they put substantial strain on public revenue collection, which limits the reach of the government in providing public services and financing welfare programs through effective redistribution.
For another, they undermine worker welfare through their adverse impact on social security contributions and unemployment insurance, the latter of which is particularly relevant during economic crises. Workers “unknown” to tax authorities are unable to claim benefits to address unexpected changes to welfare resulting from job loss or reduction of working hours (or contribute to pensions for retirement). Moreover, it becomes more difficult (or impossible) for workers to resolve labor disputes given unfair treatment from employers, unlawful termination, discrimination, or compensation from injuries or death sustained in the workplace. Since the types of benefits (regular or ad-hoc) provided to workers when loss of income occurs are partly financed by different types of taxes, highlighting the social and private costs of evasion can be effective at changing behavior, particularly when information gaps exist or at times when the role of the state becomes increasingly important (e.g., during a global pandemic). Our study set out to explore if targeted notifications to suspicious taxpayers (those suspected of underreporting wages for some of their employees) work to change behavior, and whether the language (e.g., threat of punishment vs. elevating tax morale) and/or the target audience (employee vs. employer) matters in these communications. Our study is motivated by three questions: (i) can letters addressed to suspected taxpayers improve declaration behavior and increase payroll declarations? (ii) Which type of communications works best, notifications emphasizing the enforcement capacity of tax authorities or those meant to increase intrinsic and extrinsic motivation for compliance? And (iii) do letters work better when addressed to the employer or when addressed to the workers? Firms in Albania submit monthly payroll declarations to the tax authority, which determines tax and social security obligations due. 
This monthly interaction with tax authorities (given the limited reach of the latter to monitor the behavior of firms) provides ample opportunities for taxpayers to partly evade their obligations, and provides a justification to intervene when suspicious behavior is identified. While we posed these questions prior to the onset of the COVID-19 pandemic, the answers themselves are even more relevant in this unique environment.

To this end, we designed and implemented a randomized controlled trial (RCT) in collaboration with the General Directorate of Taxation (GDT) in early 2020 to test the effectiveness of a notification strategy directed at taxpayers suspected of under-declaring (firms and their employees). Implementation coincided with the onset of the COVID-19 pandemic, which presented a unique opportunity to evaluate the impact of notifications when both firms and their employees were faced with new (and unforeseen) challenges. Our results highlight a substantial, sustained impact on payroll declarations (magnitude of up to 10%) of behaviorally-informed letters to firm owners. This impact was mostly driven by differential declaration amounts induced by the soft-tone letters.

The remainder of the paper is organized as follows: section II reviews the most recent literature on nudging for voluntary tax compliance; section III describes the data used and the methods applied to design, implement and evaluate the randomized controlled trial with the tax authorities; section IV describes the results from this exercise; section V discusses the limitations of the work; and section VI discusses the implications of the findings for policy and future research.

II. Literature

The literature on the application of insights from behavioral science to tax compliance is relatively new, though it builds on assumptions of taxpayer self-interest and puts into question the notion of the rational, fully informed taxpayer.
Traditional models of tax compliance (based on expected utility theory (EUT)) assume that taxpayers are rational and act selfishly to optimize their utility (e.g., see Allingham and Sandmo, 1972 and Yitzhaki, 1974). In other words, taxpayers minimize the amount of taxes they pay given the audit probability they face, and tax authorities have complete information about taxpayers (as well as the resources to carry out enforcement activities when risks are identified). However, the empirical evidence fails to support EUT in several respects (see Pudney et al., 2000; Dhami and Al-Nowaihi, 2007), leading to new approaches to enforcing tax compliance that assume taxpayers are only partly rational and make decisions based on other factors such as tax morale, limited attention, and the credibility of enforcement actions from tax authorities (including how much information they actually possess about individual firms), among others.

These new approaches (not necessarily directly informed by behavioral science, but nonetheless applying many of the shared tools) have been documented through the widespread use of tax experiments (information field experiments), which highlight the role of communication (or, more generally, engagement) by the tax authorities directed at taxpayers to induce a change in behavior without the need for more aggressive (and thus costly) interventions, such as audits and the execution of penalties and fines (Mascagni, 2018). The assumption of this approach is that the mere communication of information (e.g., that behavior is being monitored) is enough to change behavior, since it influences taxpayers' perception that they could face consequences. In the wake of this experimental tax work, various studies introducing key elements from behavioral science into the tax compliance/evasion function have been carried out in low- to high-income countries (Hallsworth, 2015; Hallsworth et al., 2017; Rees-Jones, 2018; Cranor et al., 2020).
In practice, this approach assumes that taxpayers are quasi-rational (see Engström et al., 2015), but they are first and foremost human, meaning that their decisions are a function of their knowledge (awareness of tax obligations), attitudes and beliefs (about institutions, the behavior of other taxpayers, etc.), and their cognition (their mental capacity to carry out their obligations, which can be confusing, time-consuming, and frustrating) (Goodnow-Dalton et al. 2021). Moreover, the approach acknowledges that tax authorities rarely have the necessary resources to carry out all the audits they would like, or even the complete information to fully assess the risk of certain taxpayers (Pomeranz & Vila-Belda 2019). This body of evidence points to important (and generally, cost- effective) impacts of targeted communication from tax authorities, highlighting how the effectiveness of different approaches (language applied, channels used) is highly sensitive to the context. For example, evidence points to the highly effective use of communications centered around deterrence messages (Hernandez et al., 2017), social norms (Hallsworth et al., 2017), national pride (Hallsworth et al., 2017), and public goods provision (Mascagni et al., 2017) relative to no messages or in many cases standard messages without any behaviorally informed cues (status quo framing and language used by tax authorities). While substantial evidence exists suggesting there are gains from using behaviorally informed notification strategies to induce taxpayer compliance, a more systematic handling of the evidence puts into question the reach of this approach in the long-run, as well as the expected magnitude of these impacts. 
Antinyan & Asatryan (2020) conducted a meta-analysis of 45 randomized controlled trials applying nudging approaches to tax compliance and found that effects from these experiments are small in magnitude and limited to deterrence (hard or strong tone) messages meant to influence perceived audit probabilities and the private costs of noncompliance (fines). Antinyan & Asatryan's findings are important given that the approach continues to be applied across countries, and the relevance of such nudges in even more challenging environments for tax authorities still needs to be validated.

III. Data & Methods

Our study is based on a pilot experiment designed and implemented jointly with the General Directorate of Taxation (GDT) in early 2020, covering 5,423 registered firms (and over 8,000 targeted employees) across select occupation groups, identified by the authorities as having previously declared suspiciously low pay for a subset of employees on their payroll. This risk assessment is carried out on a quarterly basis by the Risk Management Directorate of the GDT based on average wages defined for select occupations. There is evidence of bunching of wages at or near the minimum wage, and this tendency carries over into professions where the minimum wage is substantially below the expected (market) wage. As such, the GDT assesses these tendencies and communicates with firms and their employees about this suspicious reporting.

Setting & taxpayer sample

The sample of taxpayers included in our study is not a random sample of firms but rather a subset of firms subject to the personal income tax (PIT) that were assessed to be at risk of under-declaring wages by the Risk Management Directorate of the GDT. The risk assessment is carried out on a quarterly basis, and the assessment to identify the sample of firms in our study was carried out in early 2020.
This exercise identified individual taxpayers (employees) with a declared wage substantially below the market wage (defined through tax records from the entire universe of taxpayers) for a given sector and occupation based on monthly declarations in the fourth quarter of 2019. The exercise is carried out by the GDT and evaluates whether the difference between the reported and expected (market) wage is large enough to generate suspicion, and these suspicions are validated with additional information on employees and their firm. The assessment targeted a total of 21 professions ranging from non-technical (e.g., van driver) to technical (e.g., pharmacist) occupations, none of which had been targeted in previous campaigns.³ The risk assessment resulted in the identification of over 8,000 employees⁴ across the 5,423 targeted firms. For 2,132 (39%) of these firms, the owner herself/himself was identified as an employee with suspiciously low pay.

The 5,423 targeted firms⁵ are not representative of all registered (and tax paying) firms in Albania, though they cover key sectors where tax evasion is assumed to be substantial, including accommodation/food service and professional services. This unrepresentative targeting is by design, since the firms targeted were determined by suspicious reporting for certain employees given their profession. Most of the professions targeted by the Risk Management Directorate exist in firms across sectors (and within a given targeted firm, more than one profession could be targeted), whereas some are highly specialized. In terms of sectoral composition of the targeted firms, 46 percent could be classified under professional services, 30 percent under wholesale and retail trade, 15 percent under manufacturing, 5 percent under construction, and 4 percent under transport.
Meanwhile, in terms of targeted employees, 20 percent of all targeted employees are classified as “Administrators”, 11 percent are classified as computer technicians, and all other professions captured less than 10 percent of the total. For 85 percent of firms, only one employee was targeted; for the remaining 15 percent of firms, an average of four employees were targeted (for one firm, as many as 209 employees were targeted).

³ While the targeting strategy was relatively new at the time of our experiment, previous campaigns had been carried out with standard letters and a unique set of professions (and thus employees). All indications suggest that employers and employees targeted in our experiment had not been directly or indirectly exposed to this type of campaign in the past.
⁴ In some cases, employees were employed with more than one firm, but these represent less than 1% of all employees targeted.
⁵ Of these firms, 165 are excluded from the results below because they submitted no declarations in 2020 filing periods (targeting was done based on December 2019 filing data). There are no statistical differences in treatment assignment for those excluded and those included in the analysis.

Experimental design

Jointly with the GDT, we designed a tax experiment in the form of a randomized controlled trial (RCT). While an existing notification strategy was in place at the time the study was initiated, there was no impact evaluation framework in place to causally evaluate the impact of letters from the GDT to suspicious taxpayers on declaration behavior (namely, the level of PIT withholding reported to the authorities). The experiment centered around the random assignment of firms (employers and a subset of their employees) to receive one of two variants of letters (treatment groups) or no letter at all (the control group).
We opted against adding a further group assigned to the original (unmodified) letter, since the interest of the authorities was to improve the presentation and framing of their letters in such a way that the impact was expected to be unambiguously positive,⁶ but also to answer an important question regarding the type of framing that is most appropriate in the specific setting (strong-tone vs. soft-tone). The restriction to three experimental groups was further motivated by a relatively small sample size and the need for the experiment to be clustered at the firm level.⁷

The experiment was designed for treatment status to be clustered at the firm level; that is, the employer would receive the exact same assignment as employees targeted within the firm (the subset of employees targeted as having suspiciously low pay was deterministic of the firm targeted in the experiment). We had initially considered a non-clustered design, wherein firms would first be randomly assigned to one of the three treatment groups (no letter, strong-tone letter, and soft-tone letter), after which targeted employees would be independently assigned to no letter or one of the two letter tones. This design was ultimately forgone given a lack of statistical power from the available sample of 8,000 employees (and the number of potential combinations of treatment at two levels). Such a design would have allowed for an additional element to induce compliance (employee bargaining power).
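The clustered design described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GDT's actual procedure: the firm and employee identifiers are made up, the mapping of employees to firms is arbitrary, and dealing shuffled firms round-robin into arms is just one simple way to obtain near-equal group sizes.

```python
import random
from collections import Counter

# Arm labels follow the paper's three experimental groups.
ARMS = ("control", "strong_tone", "soft_tone")


def assign_firms(firm_ids, seed=0):
    """Shuffle firms, then deal them round-robin into near-equal arms."""
    rng = random.Random(seed)
    shuffled = list(firm_ids)
    rng.shuffle(shuffled)
    return {firm: ARMS[k % len(ARMS)] for k, firm in enumerate(shuffled)}


def assign_employees(employee_to_firm, firm_arm):
    """Clustered design: each targeted employee inherits the firm's arm."""
    return {emp: firm_arm[firm] for emp, firm in employee_to_firm.items()}


# Hypothetical identifiers; only the count of 5,423 firms comes from the paper.
firm_ids = [f"firm_{n}" for n in range(5423)]
firm_arm = assign_firms(firm_ids, seed=2020)

# Hypothetical employee-to-firm mapping (8,000 is a round stand-in for "over 8,000").
employee_to_firm = {f"emp_{n}": firm_ids[n % len(firm_ids)] for n in range(8000)}
employee_arm = assign_employees(employee_to_firm, firm_arm)

arm_counts = Counter(firm_arm.values())
print(arm_counts)
```

Because assignment happens once at the firm level and is then copied to employees, an employer and the targeted employees of that firm can never end up in different arms, which is the property the clustered design was meant to guarantee.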
Unfortunately, implementation challenges with the IT systems resulted in a non-clustered, albeit random (by chance), treatment assignment for employees and employers (independent of one another), which we discuss in section V.⁸ As a result, we were able to estimate experimental impacts for letters assigned to employers or employees (independent of one another) as well as quasi-experimental impacts for letters assigned to employers and employees (or a different combination of the two, including the variation in tone). However, given anecdotal information that most employees would not have received letters directed to them (despite being delivered to the place of business, most employees would have been at home at the time letters were delivered due to the national lockdown), the focus of the experiment is on the random assignment of firms (employers) to one of three treatment groups.

⁶ By behavioralized, we mean elements of presentation, framing, and language that are typically associated with human-centered design: including a clear call to action, providing information in the simplest language possible, avoiding heavy legal language, and making the notifications visually appealing and as personalized as possible. Non-behavioralized thus refers to the standard notifications sent by public agencies that put disproportionate weight
⁷ Power calculations carried out by the team recommended that only 3 treatment arms at most be considered to assure an MDE consistent with the literature.
⁸ Random assignment was carried out by the GDT team rather than the World Bank team due to time considerations.

The GDT had previously (as recently as November 2019) carried out a targeted notification campaign to address underreporting of wages, though evidence of its impact was ambiguous given the lack of a control group. In support of the GDT's ongoing strategy and in an effort to find marginal improvements in their targeted campaigns, we designed two alternative behaviorally-informed postal letters (based on an original letter considered by the GDT, directed at either employers or employees), one soft-tone and one strong-tone, to reflect a tax morale vs. deterrence approach (respectively) to improving voluntary compliance. The pilot experiment involved random assignment to a control group (no letter) and one of two letter groups (soft- vs. strong-tone letter), with one third of firms (and approximately one third of targeted employees) in each.

Both alternative letters took the original letter used by the GDT as their starting point and incorporated a number of behavioral insights that have proven to be effective across settings (Goodnow-Dalton et al. 2021), including the simplification of language (e.g., removal of unnecessary legal language and the inclusion of a call to action), the format of the letters themselves (to make these more visually appealing to the reader in a way that directed attention to specific messages), and the inclusion of key language centered around either enforcement (strong-tone) or the role of taxes in public goods provision (soft-tone). In particular, the strong-tone letter included language around monitoring and verifying of salaries of taxpayers relative to the market, and continuous monitoring and rectification of irregularities and reactions from taxpayers. The soft-tone letter included a message about the social norms of wage declaration and the consequences of under-reporting in the form of detriment to worker welfare, as well as a message about responsibility and civic duty (i.e., tax morale). The full text of each of the two letters to employers (translated to English from Albanian) is included in the Annex.⁹

⁹ Letters directed to employees were essentially the same as those directed to employers; the only difference was the framing of costs/benefits of compliance/evasion in terms of individual welfare (e.g., “You should be aware that a lower declared wage is detrimental to your welfare: it deprives you of future retirement, health and unemployment benefits that your hard work has earned you and lowers the quality of public services that are meant to benefit you, your family, and your community”).

A total of 3,618 letters were sent to employers in the first half of March 2020, approximately two weeks before the deadline to submit the payroll declaration for February 2020 (the due date for February was March 20th). Letters were sent out by the tax authority on March 2nd, and delivery occurred between March 5th and around March 13th. No data is available on the delivery date (or success) of individual letters, but these were printed, sealed and sent out by regional directorates, so distance to the capital (Tirana) should not have predicted the delivery date. As part of normal procedure, the GDT also shared electronic pop-ups on the e-filing portal (viewed when a targeted account was accessed) to reinforce the messages of the letters (pop-ups were assigned to the same random sample of taxpayers, and their tone/content matched that of the letters). Given that the first COVID-19 case in Albania was made public on March 8th and the lockdown due to the pandemic began on March 10th, it is likely that a similar share of letters was delivered before drastic changes in business activity began as after.

Empirical model & estimation strategy

The longitudinal nature of the administrative data allows us to evaluate short- to long-term impacts of the letters on taxpayer behavior. Given that PIT withholding is declared monthly by firms and that the letters were sent only once (in March 2020), there is value in understanding both changes in immediate behavior and changes in long-term behavior, what in other contexts could be viewed as a habit shift (a permanent shift in declaration behavior). Our main estimator of interest, given the experimental nature of the trial, is the intent to treat (ITT), which captures the differential impact associated with those assigned to receive one of the two letters highlighting suspicion of under-declaration of wages. Unlike in other recent studies (e.g., Hernandez et al. 2019), data on the success of delivery of these letters (i.e., among those assigned to receive the letters, which taxpayers actually received them) was unavailable, so the local average treatment effect (LATE), which simplifies to the average treatment effect on the treated (ATT) given that treatment non-compliance is one-sided only, cannot be estimated.

Let $Y_{it}$ be the outcome of interest measuring tax compliance (e.g., the payroll amount declared) for each firm $i$ at time $t$. Continuing with the example of aggregate payroll amount at the firm level (the sum of wages reported for the firm in each month), we have:

$$Y_{it} = f(X_{it}, \alpha_{it}, \beta_{it}, \gamma_{it}, \varepsilon_{it}) \quad \forall\, i = 1, 2, \ldots, N \text{ firms and } t = 0, 1, \ldots, T \text{ periods} \tag{1}$$

where X is a vector of firm characteristics that dictate the level of declarations (e.g., firm output, number of employees, tax rate to which each firm is subjected, which is in and of itself a function of firm size but also sector, etc.)
and are partially observed by the tax authorities; α is the audit probability estimated by the firm (which can vary across firms depending on the risk preferences of owners and accountants and on past enforcement against their own firm or within their sector more generally, and can be influenced by increased enforcement activities on the whole by the tax administration); β is some measure of employee influence over the tax reporting decisions of the firm (bargaining power of individual employees); γ is meant to capture tax morale (the intrinsic and extrinsic willingness and motivation to pay tax); and ε represents contextual factors within which a firm operates that are likely to influence its declaration behavior (e.g., perceived social norm of compliance among its immediate competitors and partners). The parameters α, β, γ, ε are not observed by the tax authorities. The effect of each on compliance measures is, in theory, unambiguously positive, except for β (employees might wish to have less declared on their behalf when they have more bargaining power due to present bias and aversion to paying taxes). Strategic reminders, based on the literature, can influence α, β, γ, ε directly (and X indirectly) through the impact they can have on attitudes, beliefs, and perceptions.

Starting from (1), our interest in this study is to evaluate the impact of one-off strategic reminders to taxpayers engaged in suspicious declaration behavior (in PIT withholding) on various measures of compliance captured over multiple declaration periods. Given the tremendous amount of fluctuation in reported levels of payroll during this period (due to temporary business closures or reductions in operating capacity), our main estimation approach uses a random effects model with a full set of time and treatment interactions to attempt to isolate the true effect of the letters for a given period.
Unlike studies applying fixed or random effects or difference-in-difference estimation where treatment is staggered or continuous over a longer period of time (i.e., more than two periods), our estimation can simply consider the full set of time period-letter assignment interactions to identify treatment effects for specific filing periods, holding all other interactions constant. 10 To this end, we consider two linear regressions (for simplification, we focus on the log of the continuous outcome of the total wage bill at the firm level, though our results also look at the log of employment and log of the average wage per employee in the firm):

$$\log(Y_{it}) = \alpha + \beta_1 \text{Post}_i + \beta_2 \text{StrongTone}_i + \beta_3 \text{SoftTone}_i + \beta_4 (\text{Post} \times \text{StrongTone})_i + \beta_5 (\text{Post} \times \text{SoftTone})_i + \gamma + \varepsilon_{it} \tag{2}$$

$$\log(Y_{it}) = \alpha + \beta_1 \lambda_i + \beta_2 \text{StrongTone}_i + \beta_3 \text{SoftTone}_i + \sum_{t=1}^{T} \beta_4 \tau_t \text{StrongTone}_i + \sum_{t=1}^{T} \beta_5 \tau_t \text{SoftTone}_i + \gamma + \varepsilon_{it} \tag{3}$$

∀ i=1,2,…N firms and t=(0, t) periods

where α is the intercept, StrongTone the assignment to the hard tone (strong) letter, SoftTone the assignment to the soft tone (soft) letter, γ is a fixed (or random) firm effect and ε the error term. β4 and β5 represent the intent-to-treat estimates of the treatment effect of the letters (soft and strong) on the outcome of interest, which in the case of the (log) wage bill represent the percentage increase in the wage bill given random assignment to each of the letters relative to the control condition (no letter). In (2), Post is a dummy capturing the “endline” period of the two-period regression (pre vs. post); in (3), λ is a vector of time (filing period) dummies with their corresponding coefficients β1 and τ is the filing period dummy for each filing period from 1 to T (December).

10 A recent paper by Callaway & Sant’Anna (2020) discusses estimation strategies to capture treatment effects through a difference-in-difference estimator when there are multiple time periods, variation in treatment timing, and when assignment is not statistically random.
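To make the role of β4 and β5 in equation (2) concrete, the following is a minimal sketch on simulated data, not the authors' estimation code. It assumes a balanced two-period panel, in which the interaction coefficients from a pooled or firm fixed effects regression with these dummies coincide with a double difference of group means; the random effects estimates reported in the paper, computed on real data with attrition, may differ. The function name `did_estimates`, the arm coding, and every simulated number are hypothetical.

```python
import numpy as np


def did_estimates(log_y_pre, log_y_post, arm):
    """Double-difference intent-to-treat estimates relative to the control group.

    arm coding (an assumption of this sketch):
    0 = no letter, 1 = strong-tone letter, 2 = soft-tone letter.
    """
    change = np.asarray(log_y_post) - np.asarray(log_y_pre)
    arm = np.asarray(arm)
    control_change = change[arm == 0].mean()
    return {
        "post": control_change,                                    # analogue of beta_1
        "post_x_strong": change[arm == 1].mean() - control_change, # analogue of beta_4
        "post_x_soft": change[arm == 2].mean() - control_change,   # analogue of beta_5
    }


# Simulated firms: a common 30 log-point drop at endline and an assumed
# 10 log-point soft-letter effect (illustrative values only).
rng = np.random.default_rng(0)
n_firms = 6000
arm = rng.integers(0, 3, size=n_firms)
firm_level = rng.normal(11.25, 1.0, size=n_firms)
log_pre = firm_level + rng.normal(0.0, 0.2, size=n_firms)
log_post = (firm_level - 0.30 + 0.10 * (arm == 2)
            + rng.normal(0.0, 0.2, size=n_firms))

estimates = did_estimates(log_pre, log_post, arm)
for name, value in estimates.items():
    print(f"{name}: {value:+.3f}")
```

Because each firm's level cancels in the pre/post change, the firm effect γ drops out of this calculation, which is why only the change in the log wage bill enters.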
Equation (2) is equivalent to measuring the short-term, cross-sectional impact of the notifications on tax declarations, which is the effect size that typically gets discussed in the literature (the one-off impact of the nudge on behavior based on an outcome at a specific moment in time compared to a baseline). It is a random effects estimation with interaction effects but ignores intermediate periods between the baseline and endline (when a gap exists). Given the timing of the intervention (delivery of letters would have occurred within one week of the national emergency being declared), it is problematic to rely on a cross-sectional estimation of impacts: the first behavior recorded post-intervention coincided with multiple changes to the environment wherein taxpayers were operating, and thus with a substantial amount of noise in the data. Equation (3) measures the longer-term, longitudinal impact of the one-time notifications on declaration behavior, and thus is subject to time variant and invariant fixed effects. Equation (3) is effectively a difference-in-difference estimation that accounts for both time-variant and time-invariant impacts from the one-off letters, and this is our preferred estimation strategy, though we also report the impacts identified by estimating (2) above.

The focus of our analysis is on the total wage bill reported by the firm, for a handful of reasons. Focusing on individual wages (of targeted employees) runs the risk of ignoring externalities in declaration behavior that are applied across the board for all employees or for a sample of untargeted employees. This is particularly true if the number of employees declared by the firm changes (new employees enrolled, existing employees dropped), and a reaction is still there (compensating changes in total employment by paying the surviving employees more or less).
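The longitudinal specification in equation (3) can be sketched in the same spirit. The example below is an illustration under stated assumptions rather than the paper's procedure: it uses pooled least squares (via `numpy.linalg.lstsq`) instead of the random effects estimator, a simulated balanced panel, and hypothetical names (`build_design`, `fit_dynamic_effects`) and effect sizes. Each period-by-letter coefficient measures the gap between a letter group and the control group in that filing period, net of the gap at baseline.

```python
import numpy as np


def build_design(period, arm, n_periods):
    """Period dummies, letter dummies, and all period x letter interactions.

    period: integer filing period per row (0 = baseline).
    arm: 0 = no letter, 1 = strong-tone letter, 2 = soft-tone letter.
    """
    strong = (arm == 1).astype(float)
    soft = (arm == 2).astype(float)
    columns = {"const": np.ones(len(period)), "strong": strong, "soft": soft}
    for t in range(1, n_periods):
        in_period = (period == t).astype(float)
        columns[f"period{t}"] = in_period
        columns[f"period{t}:strong"] = in_period * strong
        columns[f"period{t}:soft"] = in_period * soft
    names = list(columns)
    return np.column_stack([columns[name] for name in names]), names


def fit_dynamic_effects(log_y, period, arm, n_periods):
    """Pooled least-squares fit; returns a name -> coefficient mapping."""
    X, names = build_design(period, arm, n_periods)
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    return dict(zip(names, coef))


# Simulated balanced panel; every number here is an illustrative assumption.
rng = np.random.default_rng(1)
n_firms, n_periods = 6000, 5
firm_arm = rng.integers(0, 3, size=n_firms)
firm_effect = rng.normal(0.0, 1.0, size=n_firms)
time_shock = np.array([0.0, 0.0, -0.10, -0.30, -0.15])  # common shock by period
soft_path = np.array([0.0, 0.0, 0.07, 0.10, 0.02])      # assumed soft-letter effect

period = np.repeat(np.arange(n_periods), n_firms)
arm = np.tile(firm_arm, n_periods)
log_y = (11.25 + time_shock[period] + soft_path[period] * (arm == 2)
         + np.tile(firm_effect, n_periods)
         + rng.normal(0.0, 0.2, size=period.size))

effects = fit_dynamic_effects(log_y, period, arm, n_periods)
for t in range(1, n_periods):
    print(t, round(effects[f"period{t}:soft"], 3), round(effects[f"period{t}:strong"], 3))
```

With only two periods the design collapses to the dummies in equation (2), which is one way to see (2) as a special case of (3).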
We nonetheless test the impact of the letters on additional firm-level measures (for example, the total number of employees and the average wage of employees) in addition to the total payroll declared by the firm. Given the expected differential response of the two types of firms described above (those where the owner is targeted as an employee and those where they are not), the analysis will address potential heterogeneity by estimating impacts for these two subgroups.

IV. Results

In our study, we focus on two main sets of results: (i) the cross-sectional impacts of the letters on declaration behavior (as measured by the amount of PIT withholding declared, in addition to the secondary measures discussed above) and (ii) the longitudinal impacts (and thus, persistence) over time. An evaluation of the persistence of impacts (understood as either a continued divergence in behavior or simply a lack of convergence) is useful for many reasons. For one, most evidence in the literature focuses on short-term impacts and highlights the limited reach of nudging to induce the change in attitudes that is required for permanent changes in behavior. For another, the pandemic created a very uncertain environment for businesses, so focusing on a longer-term impact of these letters is useful to rule out a clear failure of external validity. In addition to cross-sectional vs. longitudinal impacts of the letters on key compliance outcomes, we also evaluate these impacts independently for two types of firms: those where the owner of the firm is also an employee targeted as reporting suspiciously low pay, and those where this is not the case. This distinction is important because ex-ante, the reaction to the notifications could differ substantially based on the dual role of the firm owner, and also because it highlights the heterogeneity of firms covered in our experiment.
Regression analysis estimating the causal effect of letters sent to employers highlights the short-term impact of soft-tone letters on payroll declarations, and no impact from strong-tone letters. In the filing periods with due dates after the March implementation of the trial (March and April periods), the soft letter increased declared payroll by between 7 and 10 percent relative to the group of employers receiving no letter. Below, we discuss these results in greater detail.

Evaluating impacts on firm payroll

Figure 1 below provides a visualization of the sustained impact of the two letters (across all firms) over a 12-month period, particularly the soft tone letter. What stands out most from this figure is the muted reduction in the firm wage bill between March and April for the group assigned to the soft letter, which displays the largest gap over the 12-month period. This gap fluctuates over time and begins to close (particularly compared to firms assigned to the strong letter), while hinting towards a growing gap between firms assigned to either of the two letters relative to those assigned to neither beginning in September. What this trend illustrates is that the reduction in declared wages in March (reflecting the drop in business activity in the first month of the pandemic) is sensitive to the language of the letter, and that the return to declaring normal wage levels (beginning in November) is only achieved by firms assigned to receive a notification letter. The gap in payroll declarations between the group of taxpayers receiving the soft letter and no letter was sustained throughout most of the period of evaluation (10 declaration periods post letter delivery).
This finding provides some evidence of persistent impacts in the long term: After an initial shift in declaration amounts in the second and third filing periods after letters were disbursed, declaration levels of the group not assigned to notifications failed to converge with the soft-letter group. Below, we evaluate these trends by estimating equations (2) and (3).

Figure 1. Indexed wage bill by treatment group over time
[Figure: line chart. Y-axis: Index (January=100). X-axis: month, January to December. Series: No letter, Soft letter, Strong letter.]
Source: Authors’ calculations. Figure plots the aggregate payroll declared by the firm given random assignment to the employer letter. Values are indexed relative to January wage levels (wage in January=100 for all treatment groups).

i. Cross sectional impacts of letters on declaration behavior

Table 1 below reports the difference in difference estimators on the treatment effect between the baseline (January) and each subsequent period, ignoring the time trend in payroll declarations over the 12-month period (which is incredibly noisy, as shown above). The results highlight the differential impact of the soft letter on the payroll reported by the firm, and no impact of the strong letter (and in fact an immediate negative, albeit marginally significant impact) relative to the group assigned to receive no letter. The important findings from this estimation are the strong negative time effect in the first few months after the onset of the pandemic (reflecting a drop in economic activity, and thus wages paid out) and the isolated effect of the letters. Payroll declarations dropped between 4 and 30 percent between March and July (significant at the 1% level except for the July declaration month), returning to a statistically identical amount beginning in August and even increasing by 3% in December (albeit only significant at the 10% level).
The first post-implementation filing period (February, where the deadline for reporting was March 20th, about one week after the onset of the pandemic) in theory would not have been influenced by the pandemic (given that February activity was not affected by the pandemic), but the letters could have influenced the behavior of firms that had not yet submitted their February declaration at the time the letter was received. In fact, we find a small, marginally significant negative impact of the strong tone letter (3% reduction in payroll reported, significant at the 10% level). Given that treatment groups were balanced across those who had and who had not submitted a declaration for February by the time the letters were delivered around March 5th, this result is interesting as it might point to backfire effects of strong language at times of elevated uncertainty among taxpayers. We will discuss more of this in section VI.

Table 1. Cross-sectional treatment effects on payroll reported by the firm

Declaration period considered for endline:

| | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| soft letter | -0.02 | -0.02 | -0.02 | -0.02 | -0.02 | -0.02 | -0.02 | -0.02 | -0.01 | -0.01 | -0.01 |
| | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] |
| strong letter | -0.06 | -0.06 | -0.06 | -0.07 | -0.06 | -0.07 | -0.07 | -0.07 | -0.05 | -0.05 | -0.05 |
| | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] | [0.05] |
| post=1 | 0.01 | -0.11*** | -0.30*** | -0.15*** | -0.07*** | -0.04* | -0.03 | -0.02 | 0.00 | 0.00 | 0.03* |
| | [0.01] | [0.02] | [0.04] | [0.03] | [0.02] | [0.02] | [0.02] | [0.02] | [0.02] | [0.02] | [0.02] |
| soft letter # post=1 | -0.01 | 0.06** | 0.10** | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 | -0.01 | 0.02 | 0.00 |
| | [0.01] | [0.03] | [0.05] | [0.04] | [0.03] | [0.04] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] |
| strong letter # post=1 | -0.03* | 0.03 | 0.05 | 0.02 | 0.02 | 0.01 | 0.01 | 0.01 | -0.00 | -0.01 | -0.03 |
| | [0.01] | [0.03] | [0.05] | [0.04] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] |
| Constant | 11.25*** | 11.25*** | 11.25*** | 11.25*** | 11.25*** | 11.25*** | 11.25*** | 11.25*** | 11.24*** | 11.24*** | 11.24*** |
| | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] | [0.03] | [0.04] | [0.04] | [0.04] |
| Observations | 10342 | 10225 | 10005 | 10013 | 10019 | 10005 | 9943 | 9912 | 9835 | 9796 | 9738 |

Source: Authors' calculations using historical tax declaration records. Random effects model estimated. Robust standard errors in brackets. *** p<0.01, ** p<0.05, * p<0.10

ii. Longitudinal impacts of letters on firm payroll declarations

The illustrative evidence in Figure 1 above is further supported by the estimation of equation (3), which accounts for both time-variant and time-invariant factors. Figure 2 below (full results in Table A1 of the Annex) reports the treatment effects from the random assignment of firms to the soft or strong tone letters (relative to the control group) on the log wage bill (as such, coefficients capture the percent difference in the total wage bill). The coefficients of interest here are the interaction between the month of declaration and the random assignment, which capture the impact of each letter at a specific declaration period relative to the control group at baseline (January). 11 We observe large, statistically significant impacts of the soft letter across all firms in March (6.9% impact on the wage bill relative to no notification in January, p<0.01) and April (10.2% impact on the wage bill relative to no notification in January, p<0.05), but no impacts from the strong letter.

11 Not shown in the table are the time effects or the time invariant treatment effects, which highlight that there are no statistically significant differences in payroll across treatment assignment at baseline (January), and that there is a statistically significant drop in payroll declared (relative to January) that persists through November. Full results are provided in the annex.

Figure 2. Dynamic treatment effects on payroll reported by firm
[Figure: two panels, Soft letter and Strong letter. Y-axis: Marginal effect on payroll (w.r.t. control). X-axis: Filing period, Feb to Dec.]
Source: Authors' calculations using historical tax declaration records. Coefficients from random effects model plotted.

Evaluating impacts on employment & average wages

The impacts on the firm wage bill highlighted above can be decomposed into two changes: a change (likely positive) in the average wage per employee and a change (likely negative) in the total number of workers employed (reported) by the firm. The letters were meant to address suspiciously low wages for select employees, so the expected impact of these letters was an increase in the total wage bill due to a higher wage reported for targeted employees. However, at the time of the study there was anecdotal evidence that some firms continued to report wages for employees no longer employed by the firm (administrative error in filing), so some employees would be “let go” as a result of these letters. Moreover, a rational decision by the firm would also be to terminate employees either because of the letters (to hide the suspiciously low pay from the authorities) or as a result of the pandemic (lower firm operations, so some employees get let go). For simplicity, we focus on the dynamic treatment effects (using the entire series of data) rather than the cross-sectional ones. Figure 3 below presents the dynamic treatment effects on the total number of employees (in logs) reported by the firm.
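The decomposition described above can be written as an identity in logs. This is a sketch: $W_{it}$, $N_{it}$ and $\bar{w}_{it}$ are notation introduced here (not the paper's) for the firm's declared wage bill, number of declared employees, and average declared wage per employee.

```latex
W_{it} = N_{it}\,\bar{w}_{it}
\quad\Longrightarrow\quad
\log W_{it} = \log N_{it} + \log \bar{w}_{it}

% When the three logged outcomes are regressed on the same right-hand-side
% variables over the same sample, least-squares coefficients on any given
% term (for example, a letter x filing-period interaction) add up likewise:
\hat{\beta}^{\,W} = \hat{\beta}^{\,N} + \hat{\beta}^{\,\bar{w}}
```

The additivity is exact for ordinary least squares on a common sample; with a random effects estimator, or with samples that differ across outcomes, it should be read as an approximation. It nonetheless shows why a positive effect on the average wage can coexist with a smaller effect on the wage bill when reported employment falls.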
Interestingly enough, the impacts from treatment have an immediate effect on behavior that is independent of the impact of the global pandemic: The interaction between the soft letter and the February filing period (pre-pandemic employment, though with a due date after the pandemic started and the letters were received) is strong and negative (1.6% reduction in employees reported, p<0.05), particularly for firms where the owners themselves were not targeted employees (NE in Table A2 of the Annex) (2.4%, p<0.01). A similar effect in the same filing period is observed on the number of untargeted employees in firms where the owners themselves are not targeted employees (1.9% reduction, p<0.05). The reduction in overall employment originating from the soft tone letter continues in the March filing period, though only among non-owner employed firms (3.1% reduction, p<0.01). The impact of the soft tone letter on the total number of untargeted employees becomes positive towards the end of the series, where the treatment effect fluctuates between 12 and 13 percent from September through November (p<0.05). As for the strong letter, impacts are only observed for select segments of firms and employees, and only beginning in the April filing period. Contrary to the soft tone letter, these impacts are positive throughout the series. For example, the number of untargeted employees reported increases by 9.5 percent in the April filing period for owner employed (OE in Table A2 in the Annex) firms (p<0.05). Among targeted employees across all firms, the strong letter leads to a net increase of between 1.1 and 2.7 percent in the period spanning April to December (p<0.05 in April and May, p<0.01 thereafter). These impacts appear to be driven primarily by non-owner employed firms (NE in the table).
What this suggests, given the null effect of the strong tone letter on the overall firm payroll, is that firms exposed to the strong tone letter (relative to no letter at all) maintained more of their targeted employees on the payroll but without increasing the average wage of these employees (expanded upon below).

Figure 3. Dynamic treatment effects on employment reported by the firm
[Figure: two panels, Soft letter and Strong letter. Y-axis: Marginal effect on employment (w.r.t. control). X-axis: Filing period, Feb to Dec.]
Source: Authors' calculations using historical tax declaration records. Coefficients from random effects model plotted.

The impact of the letters on total employment (given the impacts on total payroll displayed in Figure 2) is suggestive of a temporary upward shift in the average declared wage (in the first half of the series) that returns to normal as the total employment balances out. Figure 4 below presents the dynamic treatment effects on the average wage (in logs) reported by the firm. In terms of the impacts of the letters on all employees, the expected positive net impact of the soft tone letters on the average wage can be seen clearly for the filing period of March (interaction between the soft tone letter and the March filing period), where wages are 7.5 percent higher (p<0.05) relative to control firms in January. This impact is mostly driven by the relative increase in wages of untargeted employees in non-owner employed firms. For example, the impact of the soft tone letter in the March filing period is 9.1 percent among non-owner employed firms (p<0.05).
Beyond April (where a 13 percent impact on the average wage of untargeted employees is observed), some impacts are observed in August (6.7% increase, p<0.05) and in November (albeit marginally significant). In terms of the strong tone letter, the only impact on the average wage is observed for untargeted employees in owner-employed firms (OE in the table), and this impact is strong and negative (13.3% reduction, p<0.05).

Figure 4. Dynamic treatment effects on average wage reported by the firm
[Figure: two panels, Soft letter and Strong letter. Y-axis: Marginal effect on avg. wage (w.r.t. control). X-axis: Filing period, Feb to Dec.]
Source: Authors' calculations using historical tax declaration records. Coefficients from random effects model plotted.

Considering heterogeneity of impacts based on sector of activity

Beyond the differential impact of the notification letters due to characteristics of the firm in terms of whether the owners themselves were explicitly targeted for suspicion of under-declaration, an important factor to consider in this context is the sector of economic activity of the firms subject to these notifications. Some industries would have been deemed essential (e.g., food services) and others non-essential (e.g., entertainment venues), which would have influenced the operational capacity of firms and with it (a) access to postal mail and (b) the ability to retain employees. How these factors influence reactions to strategic notifications is seemingly ambiguous: on the one hand, essential businesses (such as grocery stores) that would have stayed open and not necessarily limited capacity would have for the most part “received the message” (employers and employees), but have been too busy to react.
On the other hand, non-essential businesses (such as travel agencies) were much less likely to get the message in the first place, but also potentially more sensitive to the messages because of the uncertainty of their business and operational capacity in the short- and medium-term. While our data only permits us to look at very aggregate indicators of the type of economic activity, it nonetheless illustrates the heterogeneity of impacts based on firm characteristics. The estimation of equation (2) above by aggregated sector of economic activity (professional services, wholesale/retail trade, construction, manufacturing, and transport) reveals that sensitivity to these notifications is conditional. For example, the impact of the soft tone letter on the total wage bill appears to be driven almost entirely by firms classified as engaging in professional services, which overall would be less essential than wholesale/retail trade (for example). The impact of the soft letter on the wage bill of firms in professional services was 11 and 18 percent in the March and April filing periods, respectively, and significant at the 5% level, whereas the impact for all other firms was 3-4 percent and statistically insignificant. A potential explanation for this differential impact is that accountants for professional services firms were spending more time managing their tax returns than day-to-day business operations when letters were received.

V. Limitations

While our study provides strong evidence that notification letters to firms engaged in suspicious behavior can have an impact on declaration behavior during a global crisis (particularly, soft tone letters), more evidence is needed to eliminate doubts about the channel and magnitude of the impact as well as its persistence in the long-term.
An initial concern originating from implementation challenges to the experiment -the challenges in clustering treatment assignment at the firm level -while posing minimal risks to the validity of the results, should nonetheless be addressed in future research. As we discussed above, the original design of the experiment stipulated that targeted employees within a given firm would receive the same letter assignment as their employers (if a firm owner received a soft tone letter, all targeted employees would receive a similarly toned letter). While this design was not ideal -clustering does not allow us to identify employer- employee bargaining and the differential impact of letters to employees vs. letters to their employers -it nonetheless provided an additional guarantee that the message was reaching the intended audience (either directly to the employer, who makes the filing decisions, or indirectly to the employee, who can influence the filing decisions of the employer). The result of this specific implementation challenge was that we effectively ended up with two independent experiments: (i) employers randomly assigned to one of three treatment conditions and (ii) employees assigned (randomly, by chance 12) to one of three treatment conditions. In the absence of a global pandemic, this would need to be considered more explicitly, but given the unique environment, this scenario likely had a limited effect on the treatment effects since most employees did not even get a chance to see these letters when it mattered most (more on this below). If anything, this would mean that the treatment effects are a lower bound estimate of the letters since presumably employees would further influence the behavior of employers making the filing decisions (given fear of punishment from the authorities, concern over individual welfare, or both). 
Beyond the implementation challenges that originated prior to the global pandemic, there are some additional considerations from this study; specifically, it is crucial to remain mindful of the very specific timing and context in which the experiment took place. The impact of the different types of messaging may have been uniquely affected by the pandemic. First, letters to employers and employees sent to their workplace may not have been effective because employees were not at their workplace during the initial stages of the pandemic (and the same could be said for many employers), and therefore did not read the letters. In a context in which workers are now gradually returning to their workplace, it may turn out that letters directed at employees will be more effective than what was found in this experiment, particularly given the increased relevance and importance of social protection in worker welfare. Unlike other studies evaluating the impacts of tax experiments, ours was unable to assess the success of the letter delivery (both for employers and employees), which is problematic since letters might have been delivered at a time when the intended audiences were not present. This would suggest that our impact estimates are likely a lower bound estimate of the true effect of the letters. Second, businesses managed their payroll more actively than usual during the pandemic. Firms responded to the crisis by letting go of employees or reducing work hours or pay, and then re-hiring as lockdowns were lifted and business activity started recovering. This meant that payroll was a more significant concern, and fewer firms just submitted a standard automated payroll declaration. This may have led to a stronger than usual response of payroll declarations to the employer letters, though it may also have reduced persistence of payroll treatment effects.

12 This by chance random assignment was confirmed through balance tests.
Third, in a crisis context, enforcement messages (which were relatively muted compared to the literature) may have been less effective than they would be in more normal times, while appeals to social solidarity may have been relatively more effective. In a context in which businesses are struggling to make ends meet in the face of the pandemic, threatening messages enforcing tax collection may not have appeared credible and may have in fact hardened the resolve of businesses not to cooperate with the government. Moreover, the language in the strong letter was relatively tame, in the sense that enforcement actions to be carried out in the face of noncompliance were abstract, not referring to many actions beyond monitoring of behavior (the same was true of original letters used by the GDT in past campaigns). For example, in a study in Poland summarized by Hernandez et al. (2017), enforcement toned language was much more concrete, and thus potentially more credible (e.g., referring to Enforcement Orders and freezing of salaries and bank accounts). In contrast, given a national tragedy, businesses may have been particularly receptive to messages appealing to their social duty to contribute to revenue mobilization to fight the pandemic. Fourth, given the role of the tax authority in delivering wage subsidies during the early stages of the pandemic, its communications may have had greater than usual weight in terms of influencing business declaration behavior, and for some types of firms more than others. Notably, during the lockdowns in the initial phase of the pandemic, the tax authority was responsible for administering wage subsidies paid out on basis of the declared payroll of businesses. 
This meant that letters sent out by the tax authorities were possibly read more carefully than in normal times, particularly for businesses that were disproportionately impacted by the pandemic (e.g., professional services compared to wholesale and retail trade, where we saw some heterogeneity in impacts, discussed earlier). Finally, the crisis has likely increased noise in the tax data, making it more difficult to identify a statistically significant impact of the letters on payroll declaration behavior. For all these reasons, further research is needed to fully determine how context-dependent the results of the RCT presented in this study are.

VI. Discussion

This study provides a unique opportunity to evaluate the impact of strategic notifications to taxpayers in the context of a global pandemic. Letters were delivered at a time when businesses were forced to drastically cut their operations and likely downsize (both in terms of employees and production overall), and business (and thus accounting) practices likely shifted dramatically. This may have led to a stronger than usual response of payroll declarations to the soft-toned employer letters (and weaker than usual reaction to enforcement messages, which could have been viewed as non-credible or insensitive) particularly since tax morale (which is influenced by factors such as trust in institutions and perceptions regarding accountability and reciprocity) was likely to be sensitive to the changes occurring during the pandemic. However, it is safe to assume that the unrealized impact of these letters on compliance is a lower bound estimate of what could be expected in normal times because the costs of compliance (in terms of loss of liquidity) are elevated with high levels of uncertainty and reductions in the productive capacity of most firms and the effectiveness of the letters limited due to the high likelihood of letters not making it into the right hands.
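The delivery part of this argument can be made explicit with the standard relation between intent-to-treat (ITT) and treatment-on-the-treated (ATT) effects under one-sided non-compliance. As a sketch, let $\pi$ denote the share of assigned firms whose letter actually reached a decision-maker; $\pi$ is notation introduced here, it was not measured in the trial, and the relation assumes that a letter affects declarations only if it is received.

```latex
\text{ITT} = \pi \cdot \text{ATT}
\quad\Longrightarrow\quad
\text{ATT} = \frac{\text{ITT}}{\pi},
\qquad 0 < \pi \le 1 \;\Rightarrow\; |\text{ATT}| \ge |\text{ITT}|
```

For instance, under a purely hypothetical delivery rate of $\pi = 0.8$, an ITT of 10 percent would correspond to an ATT of 12.5 percent. Because $\pi$ is unobserved, only the ITT is identified from these data.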
In addition to uncovering important impacts of the letters on firm behavior, the experiment yielded substantial monetary gains with a single round of notifications in a very short period. Back of the envelope calculations suggest that letters sent to employers raised an additional 932,000 USD in PIT revenues that would not have been recovered in the absence of the letters. Our results suggest that in crisis situations, positively framed language highlighting the benefits from tax compliance is more effective at generating immediate (positive) reactions from taxpayers while having meaningful impacts on revenues. Future efforts should be made to test the differential impact of soft vs. strong-tone language in a post-COVID environment to validate these findings. Moreover, our study was unable to identify how the employer-employee dynamic influences tax compliance. Future studies (in Albania and beyond) should explore this dynamic at the firm level (including bargaining and collusion) to improve the efficiency of tax administrations in dealing with the underreporting of wages in low enforcement capacity environments.

References

Antinyan, A. & Asatryan, Z. (2020). "Nudging for Tax Compliance: A Meta-Analysis," CESifo Working Paper Series 8500, CESifo.
Allingham, M.G., Sandmo A., 1972. Income tax evasion: A theoretical analysis. Journal of Public Economics 1, 323-38.
Callaway, B. & Sant’Anna, P. (2020). Difference-in-Differences with multiple time periods, Journal of Econometrics (in press).
Cranor, T., Goldin, J., Homonoff, T. and Moore, L., 2020. Communicating Tax Penalties to Delinquent Taxpayers: Evidence from a Field Experiment, National Tax Journal 73(2), 331-36.
Dalton, Abigail Goodnow; Manning, L.; Jamison, J.; Sen, I.; Karver, J.; Castaneda Nunez, J.; Guedes, L.; Mujica Estevez, S. 2021. Behavioral Insights for Tax Compliance. eMBeD brief. Washington, D.C.: World Bank Group.
Dhami, Sanjit and Ali al-Nowaihi. 2007. Why do people pay taxes? Prospect theory versus expected utility theory. Journal of Economic Behavior and Organization 64, 171-92.
Engström, P., Nordblom, K., Ohlsson, H. & Persson, A. (2015), ‘Tax Compliance and Loss Aversion’, American Economic Journal: Economic Policy 7(4), 132-164.
Hallsworth, M., 2015. The use of field experiments to increase tax compliance, Oxford Review of Economic Policy, 30(4), 658-679.
Hallsworth, M., List, J., Metcalfe, R., Vlaev, I., 2017. The behavioralist as tax collector: Using natural field experiments to enhance tax compliance, Journal of Public Economics 148, 14-31.
Hernandez, M., Jamison, J., Korczyc, E., Mazar, N. & Sormani, R. 2017. Applying behavioral insights to improve tax collection: experimental evidence from Poland. Washington, D.C.: World Bank Group.
Hernandez, M.; Karver, J.; Negre, M.; Perng, J. 2019. Promoting Tax Compliance in Kosovo with Behavioral Insights. Washington, D.C.: World Bank Group.
Mascagni, G., Nell, C., & Monkam, N. (2017). One Size Does Not Fit All: A Field Experiment on the Drivers of Tax Compliance and Delivery Methods in Rwanda (January 2017). ICTD Working Paper 58.
Mascagni, G. (2018). "From The Lab To The Field: A Review Of Tax Experiments," Journal of Economic Surveys, vol. 32(2), pages 273-301, April.
Medina, L. and Schneider, F. (2019), Shedding Light on the Shadow Economy: A Global Database and the Interaction with the Official One. CESifo Working Paper 7981, December 2019. Munich.
Pomeranz, D. and Vila-Belda, J. (2019). Taking state-capacity research to the field: Insights from collaborations with tax authorities. Annual Review of Economics 11, 755-781.
Pudney S.E., Pyle D, Saruc T., 2000. Income tax evasion: An experimental approach. In: MacDonald A, Pyle D., Illicit activity: The economics of crime and tax fraud. Ashgate: Aldershot.
Schneider, F., & Enste, D. (2000). Shadow Economies: Size, Causes, and Consequences. Journal of Economic Literature, 38 (1), 77-114.
Retrieved September 6, 2021, from http://www.jstor.org/stable/2565360 World Bank (2019). Western Balkans Labor Market Trends 2019. Washington, DC: World Bank Group Yitzhaki S., 1974. A note on income tax evasion: A theoretical analysis. Journal of Public Economics 3, 201-202. 20 Annex Figure A1. Employer letter, soft tone MINISTRIA E FINANCAVE DHE EKONOMISE DREJTORIA E PËRGJITHSHME E TATIMEVE Drejtoria e Tatimeve Tirane Drejtoria e Shërbimit të Tatimpaguesve Nr. Prot Tiranë , më .03.2020 TO: ___________ NIPT: ___________ ADRESA: __________________________ Correct the wages you report for your [occupation] as soon as possible! Dear Ms./Mr. [last name], Our monitoring of your tax returns gives us reason to believe that the salaries you declare for your [occupation] are not reflecting the level that you are in fact paying. The wages you report for [occupation] place your firm among a minority of employers that are paying substantially below the national average. You should be aware that declaring lower wages for your employees is detrimental to their welfare: it deprives them of future retirement, health and unemployment benefits that their hard work has earned them while lowering the quality of public services meant to benefit your business, family, and community. As taxpayers of this country, we have a responsibility and civic duty to comply with our tax obligations. Please reach out directly to the GDT by calling XX-XXX-XXXX for further information or so we can help you addresses this issue. We are committed to protecting the rights of all taxpayers in Albania, but we need your cooperation to do so. We truly appreciate your contribution as a taxpayer and citizen and thank you for your attention. Regional Director XX XX [digital signature] 21 Figure A2. Employer letter, strong tone MINISTRIA E FINANCAVE DHE EKONOMISE DREJTORIA E PËRGJITHSHME E TATIMEVE Drejtoria e Tatimeve Tirane Drejtoria e Shërbimit të Tatimpaguesve Nr. 
Prot Tiranë , më .03.2020 TO: ___________ NIPT: ___________ ADRESA: __________________________ Correct the wages you report for your [occupation] as soon as possible! Dear Mr./Ms. [last name], Our monitoring of your tax returns gives us reason to believe that the salaries you declare for your [occupation] are not reflecting the level that you are in fact paying. The wages you report for [occupation] imply you are paying them substantially below the national average. The GDT is continuously verifying salaries by comparing those declared by employers to those observed in the market. Please be advised that the GDT will continue to monitor the filing of your salaries and address irregularities as permitted by the law. Please take action as soon as possible to address this issue, as we will be monitoring your response to this letter. Please reach out directly to the GDT by calling XX-XXX-XXXX for further information or so we can help you addresses this issue. We are committed to protecting the rights of all taxpayers in Albania, but we need your cooperation to do so. We truly appreciate your contribution as a taxpayer and citizen and thank you for your attention. Regional Director XX XX [digital signature] 22 Table A1. 
Dynamic treatment effects on payroll reported by firm

Each cell reports the coefficient followed by its standard error in brackets. Periudha is the period (month) variable.

| Interaction effect | All firms | Non-owner employed | Owner employed |
|---|---|---|---|
| soft letter # Periudha=2002 | -0.004 [0.016] | -0.007 [0.021] | 0.003 [0.026] |
| soft letter # Periudha=2003 | 0.069*** [0.026] | 0.058* [0.032] | 0.085* [0.046] |
| soft letter # Periudha=2004 | 0.102** [0.049] | 0.074 [0.065] | 0.148** [0.073] |
| soft letter # Periudha=2005 | 0.018 [0.039] | -0.015 [0.049] | 0.072 [0.063] |
| soft letter # Periudha=2006 | 0.033 [0.033] | -0.006 [0.036] | 0.100 [0.065] |
| soft letter # Periudha=2007 | 0.035 [0.035] | -0.012 [0.036] | 0.115 [0.072] |
| soft letter # Periudha=2008 | 0.036 [0.034] | -0.005 [0.038] | 0.104 [0.067] |
| soft letter # Periudha=2009 | 0.027 [0.031] | -0.016 [0.036] | 0.100* [0.057] |
| soft letter # Periudha=2010 | -0.001 [0.031] | -0.045 [0.037] | 0.074 [0.056] |
| soft letter # Periudha=2011 | 0.028 [0.030] | 0.003 [0.035] | 0.069 [0.054] |
| soft letter # Periudha=2012 | 0.014 [0.031] | -0.026 [0.035] | 0.080 [0.059] |
| strong letter # Periudha=2002 | -0.020 [0.014] | -0.019 [0.015] | -0.024 [0.029] |
| strong letter # Periudha=2003 | 0.034 [0.028] | 0.044 [0.032] | 0.017 [0.054] |
| strong letter # Periudha=2004 | 0.038 [0.051] | 0.018 [0.067] | 0.071 [0.076] |
| strong letter # Periudha=2005 | 0.020 [0.037] | 0.019 [0.044] | 0.020 [0.068] |
| strong letter # Periudha=2006 | 0.011 [0.032] | 0.000 [0.033] | 0.027 [0.066] |
| strong letter # Periudha=2007 | 0.003 [0.033] | -0.031 [0.035] | 0.060 [0.067] |
| strong letter # Periudha=2008 | -0.002 [0.033] | -0.019 [0.035] | 0.025 [0.066] |
| strong letter # Periudha=2009 | -0.004 [0.030] | -0.006 [0.030] | -0.003 [0.062] |
| strong letter # Periudha=2010 | -0.007 [0.029] | -0.012 [0.029] | -0.000 [0.060] |
| strong letter # Periudha=2011 | -0.010 [0.030] | -0.003 [0.031] | -0.025 [0.062] |
| strong letter # Periudha=2012 | -0.029 [0.031] | -0.027 [0.031] | -0.035 [0.066] |
| Constant | 11.240*** [0.035] | 11.564*** [0.047] | 10.705*** [0.046] |
| Observations | 57583 | 36377 | 21206 |

Source: Authors' calculations using historical tax declaration records. Random effects model estimated. 2001 refers to January 2020; 2002 to February 2020, and so on.
Robust standard errors in brackets. *** p<0.01, ** p<0.05, * p<0.10

Table A2. Dynamic treatment effects on employment reported by the firm

Each cell reports the coefficient followed by its standard error in brackets. NE = non-owner employed; OE = owner employed (as in Table A1).

| Interaction effect | Total employment: All | Total employment: NE | Total employment: OE | Targeted employees: All | Targeted employees: NE | Targeted employees: OE | Untargeted employees: All | Untargeted employees: NE | Untargeted employees: OE |
|---|---|---|---|---|---|---|---|---|---|
| soft letter # Periudha=2002 | -0.016** [0.008] | -0.024*** [0.009] | -0.007 [0.016] | -0.000 [0.003] | -0.001 [0.004] | -0.000 [0.001] | -0.014 [0.010] | -0.019** [0.009] | 0.007 [0.030] |
| soft letter # Periudha=2003 | -0.017 [0.011] | -0.031*** [0.012] | 0.004 [0.020] | 0.002 [0.004] | -0.001 [0.006] | 0.006 [0.006] | -0.001 [0.013] | -0.018 [0.012] | 0.066* [0.038] |
| soft letter # Periudha=2004 | 0.008 [0.016] | -0.001 [0.021] | 0.018 [0.024] | 0.002 [0.006] | -0.003 [0.010] | 0.008 [0.007] | 0.028 [0.020] | 0.019 [0.022] | 0.072 [0.047] |
| soft letter # Periudha=2005 | -0.002 [0.015] | -0.008 [0.019] | 0.005 [0.026] | 0.003 [0.007] | -0.002 [0.011] | 0.008 [0.007] | 0.014 [0.019] | 0.005 [0.020] | 0.052 [0.051] |
| soft letter # Periudha=2006 | 0.012 [0.016] | 0.004 [0.018] | 0.024 [0.029] | 0.007 [0.007] | 0.002 [0.010] | 0.012 [0.009] | 0.024 [0.019] | 0.012 [0.020] | 0.080 [0.053] |
| soft letter # Periudha=2007 | 0.005 [0.017] | -0.006 [0.019] | 0.019 [0.031] | 0.010 [0.007] | 0.008 [0.011] | 0.011 [0.008] | 0.010 [0.020] | -0.006 [0.021] | 0.079 [0.055] |
| soft letter # Periudha=2008 | 0.010 [0.018] | -0.001 [0.020] | 0.026 [0.033] | 0.009 [0.008] | 0.009 [0.012] | 0.007 [0.008] | 0.019 [0.021] | 0.003 [0.022] | 0.087 [0.056] |
| soft letter # Periudha=2009 | 0.012 [0.018] | -0.005 [0.020] | 0.038 [0.034] | 0.012 [0.008] | 0.016 [0.012] | 0.006 [0.008] | 0.027 [0.020] | 0.002 [0.021] | 0.131** [0.056] |
| soft letter # Periudha=2010 | 0.016 [0.018] | 0.000 [0.021] | 0.039 [0.033] | 0.010 [0.008] | 0.014 [0.013] | 0.005 [0.008] | 0.030 [0.021] | 0.008 [0.022] | 0.120** [0.057] |
| soft letter # Periudha=2011 | 0.020 [0.018] | 0.006 [0.021] | 0.040 [0.033] | 0.010 [0.008] | 0.016 [0.013] | 0.001 [0.008] | 0.033 [0.022] | 0.010 [0.023] | 0.127** [0.061] |
| soft letter # Periudha=2012 | 0.013 [0.019] | -0.003 [0.022] | 0.037 [0.034] | 0.011 [0.008] | 0.017 [0.013] | 0.003 [0.007] | 0.025 [0.022] | 0.006 [0.023] | 0.108* [0.061] |
| strong letter # Periudha=2002 | -0.011 [0.008] | -0.015* [0.009] | -0.006 [0.015] | 0.002 [0.002] | 0.002 [0.004] | 0.001 [0.001] | -0.005 [0.010] | -0.013 [0.010] | 0.030 [0.030] |
| strong letter # Periudha=2003 | -0.010 [0.010] | -0.014 [0.012] | -0.005 [0.019] | 0.005 [0.004] | 0.004 [0.005] | 0.006 [0.006] | 0.000 [0.012] | -0.013 [0.012] | 0.054 [0.038] |
| strong letter # Periudha=2004 | 0.009 [0.016] | 0.008 [0.021] | 0.010 [0.024] | 0.011** [0.005] | 0.013* [0.008] | 0.009 [0.007] | 0.031 [0.019] | 0.017 [0.021] | 0.095** [0.047] |
| strong letter # Periudha=2005 | -0.008 [0.016] | -0.012 [0.020] | -0.003 [0.027] | 0.013** [0.006] | 0.014* [0.009] | 0.010 [0.007] | 0.010 [0.019] | -0.005 [0.020] | 0.076 [0.051] |
| strong letter # Periudha=2006 | -0.012 [0.016] | -0.015 [0.019] | -0.007 [0.029] | 0.018*** [0.006] | 0.019** [0.009] | 0.016* [0.008] | -0.002 [0.020] | -0.019 [0.020] | 0.072 [0.055] |
| strong letter # Periudha=2007 | -0.018 [0.017] | -0.019 [0.020] | -0.016 [0.032] | 0.019*** [0.006] | 0.023** [0.009] | 0.014* [0.008] | -0.010 [0.021] | -0.029 [0.021] | 0.069 [0.059] |
| strong letter # Periudha=2008 | -0.004 [0.018] | -0.008 [0.020] | 0.002 [0.034] | 0.022*** [0.007] | 0.028*** [0.010] | 0.014* [0.008] | -0.002 [0.021] | -0.018 [0.022] | 0.069 [0.060] |
| strong letter # Periudha=2009 | -0.006 [0.018] | -0.009 [0.020] | -0.002 [0.034] | 0.022*** [0.007] | 0.031*** [0.010] | 0.011 [0.008] | 0.003 [0.021] | -0.017 [0.021] | 0.084 [0.059] |
| strong letter # Periudha=2010 | 0.001 [0.018] | -0.013 [0.021] | 0.023 [0.033] | 0.025*** [0.008] | 0.036*** [0.012] | 0.011 [0.008] | 0.011 [0.022] | -0.013 [0.023] | 0.105* [0.059] |
| strong letter # Periudha=2011 | -0.006 [0.018] | -0.013 [0.021] | 0.004 [0.033] | 0.026*** [0.007] | 0.039*** [0.012] | 0.009 [0.007] | 0.001 [0.022] | -0.016 [0.023] | 0.072 [0.064] |
| strong letter # Periudha=2012 | -0.009 [0.019] | -0.007 [0.022] | -0.012 [0.034] | 0.027*** [0.008] | 0.041*** [0.012] | 0.008 [0.007] | 0.000 [0.023] | -0.008 [0.024] | 0.041 [0.065] |
| Observations | 61798 | 38208 | 23590 | 54812 | 31655 | 23157 | 47350 | 37246 | 10104 |

Source: Authors' calculations using historical tax declaration records. Random effects model estimated. Table only shows interaction effects.
Robust standard errors in brackets. *** p<0.01, ** p<0.05, * p<0.10

Table A3. Dynamic treatment effects on average wage reported by the firm

Each cell reports the coefficient followed by its standard error in brackets. NE = non-owner employed; OE = owner employed (as in Table A1).

| Interaction effect | All employees: All | All employees: NE | All employees: OE | Targeted employees: All | Targeted employees: NE | Targeted employees: OE | Untargeted employees: All | Untargeted employees: NE | Untargeted employees: OE |
|---|---|---|---|---|---|---|---|---|---|
| soft letter # Periudha=2002 | 0.012 [0.015] | 0.014 [0.020] | 0.008 [0.022] | 0.008 [0.014] | 0.009 [0.019] | 0.009 [0.021] | 0.021 [0.027] | 0.033 [0.033] | -0.010 [0.043] |
| soft letter # Periudha=2003 | 0.075*** [0.025] | 0.074** [0.031] | 0.075* [0.042] | 0.048* [0.027] | 0.050 [0.037] | 0.042 [0.040] | 0.101*** [0.038] | 0.091** [0.038] | 0.133 [0.099] |
| soft letter # Periudha=2004 | 0.091* [0.046] | 0.064 [0.062] | 0.133* [0.069] | 0.055 [0.051] | 0.044 [0.076] | 0.071 [0.060] | 0.131** [0.066] | 0.138* [0.073] | 0.111 [0.151] |
| soft letter # Periudha=2005 | 0.017 [0.035] | -0.013 [0.045] | 0.066 [0.056] | 0.011 [0.039] | 0.002 [0.055] | 0.025 [0.053] | 0.012 [0.050] | 0.030 [0.054] | -0.038 [0.117] |
| soft letter # Periudha=2006 | 0.024 [0.029] | -0.009 [0.032] | 0.081 [0.058] | 0.016 [0.030] | -0.012 [0.038] | 0.058 [0.051] | 0.059 [0.041] | 0.067* [0.038] | 0.025 [0.124] |
| soft letter # Periudha=2007 | 0.033 [0.030] | -0.012 [0.031] | 0.108* [0.062] | 0.014 [0.028] | -0.039 [0.030] | 0.092* [0.054] | 0.066 [0.043] | 0.053 [0.039] | 0.092 [0.125] |
| soft letter # Periudha=2008 | 0.023 [0.029] | -0.011 [0.032] | 0.079 [0.055] | 0.004 [0.029] | -0.048 [0.033] | 0.081 [0.052] | 0.051 [0.036] | 0.067** [0.034] | 0.004 [0.105] |
| soft letter # Periudha=2009 | 0.009 [0.025] | -0.023 [0.030] | 0.061 [0.045] | 0.001 [0.025] | -0.027 [0.032] | 0.042 [0.042] | 0.041 [0.035] | 0.035 [0.035] | 0.062 [0.092] |
| soft letter # Periudha=2010 | -0.013 [0.026] | -0.045 [0.032] | 0.040 [0.044] | -0.023 [0.023] | -0.045 [0.029] | 0.012 [0.041] | 0.034 [0.036] | 0.034 [0.039] | 0.039 [0.085] |
| soft letter # Periudha=2011 | 0.004 [0.024] | -0.009 [0.029] | 0.027 [0.042] | -0.011 [0.023] | -0.020 [0.029] | 0.006 [0.038] | 0.060* [0.035] | 0.068* [0.039] | 0.041 [0.084] |
| soft letter # Periudha=2012 | -0.006 [0.025] | -0.031 [0.029] | 0.036 [0.047] | -0.008 [0.026] | -0.031 [0.031] | 0.029 [0.046] | 0.039 [0.033] | 0.039 [0.033] | 0.039 [0.090] |
| strong letter # Periudha=2002 | -0.009 [0.013] | -0.002 [0.014] | -0.020 [0.027] | -0.018 [0.015] | -0.015 [0.019] | -0.026 [0.027] | -0.003 [0.026] | 0.038 [0.027] | -0.133** [0.067] |
| strong letter # Periudha=2003 | 0.042 [0.027] | 0.057* [0.031] | 0.018 [0.050] | 0.035 [0.028] | 0.064* [0.035] | -0.014 [0.047] | 0.052 [0.039] | 0.064 [0.041] | 0.011 [0.097] |
| strong letter # Periudha=2004 | 0.039 [0.048] | 0.025 [0.064] | 0.063 [0.073] | 0.024 [0.052] | 0.047 [0.075] | -0.012 [0.066] | 0.046 [0.070] | 0.089 [0.078] | -0.092 [0.157] |
| strong letter # Periudha=2005 | 0.026 [0.034] | 0.029 [0.041] | 0.021 [0.061] | -0.000 [0.040] | 0.023 [0.053] | -0.038 [0.059] | 0.009 [0.046] | 0.052 [0.049] | -0.128 [0.114] |
| strong letter # Periudha=2006 | 0.021 [0.028] | 0.015 [0.028] | 0.032 [0.059] | -0.007 [0.030] | -0.005 [0.034] | -0.015 [0.054] | 0.037 [0.043] | 0.059 [0.043] | -0.045 [0.114] |
| strong letter # Periudha=2007 | 0.022 [0.028] | -0.012 [0.029] | 0.082 [0.058] | -0.021 [0.029] | -0.026 [0.025] | -0.016 [0.062] | 0.016 [0.041] | 0.005 [0.041] | 0.031 [0.104] |
| strong letter # Periudha=2008 | 0.003 [0.027] | -0.011 [0.028] | 0.026 [0.055] | -0.026 [0.028] | -0.025 [0.027] | -0.030 [0.057] | -0.010 [0.039] | 0.010 [0.040] | -0.080 [0.096] |
| strong letter # Periudha=2009 | -0.001 [0.024] | -0.001 [0.023] | 0.000 [0.050] | -0.026 [0.026] | -0.008 [0.025] | -0.055 [0.052] | -0.012 [0.036] | 0.016 [0.036] | -0.103 [0.094] |
| strong letter # Periudha=2010 | -0.002 [0.023] | 0.003 [0.023] | -0.011 [0.050] | -0.037 [0.023] | -0.003 [0.018] | -0.089* [0.052] | 0.005 [0.037] | 0.042 [0.039] | -0.112 [0.093] |
| strong letter # Periudha=2011 | -0.004 [0.025] | 0.007 [0.025] | -0.022 [0.052] | -0.051* [0.026] | -0.018 [0.027] | -0.104* [0.053] | 0.029 [0.039] | 0.054 [0.041] | -0.048 [0.093] |
| strong letter # Periudha=2012 | -0.017 [0.026] | -0.018 [0.024] | -0.014 [0.057] | -0.036 [0.026] | -0.000 [0.020] | -0.090 [0.059] | 0.003 [0.036] | 0.013 [0.035] | -0.032 [0.098] |
| Observations | 57583 | 36377 | 21206 | 51139 | 30719 | 20420 | 38589 | 28862 | 9727 |

Source: Authors' calculations using historical tax declaration records. Table only shows interaction effects.
Robust standard errors in brackets. *** p<0.01, ** p<0.05, * p<0.10
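
As a reading aid for the annex tables, the following minimal sketch shows how a reported coefficient and its standard error can be turned into an approximate percent effect, a 95 percent confidence interval, and significance stars. It rests on two assumptions that the tables do not state: that the outcomes are in natural logs (the Table A1 constant of about 11.2 and the abstract's "as much as 10 percent" are consistent with this), and that a two-sided normal approximation is adequate for the tests. The dictionary labels and function names are illustrative only; the three coefficient and standard error pairs are copied from the "All firms" column of Table A1.

```python
import math

# Coefficient and robust standard error pairs copied from Table A1,
# "All firms" column. Labels are illustrative, not names from the paper.
TABLE_A1_ALL_FIRMS = {
    "soft letter, period 2003": (0.069, 0.026),
    "soft letter, period 2004": (0.102, 0.049),
    "strong letter, period 2004": (0.038, 0.051),
}


def pct_effect(coef):
    """Percent change implied by a log-point coefficient (assumes a log outcome)."""
    return 100.0 * (math.exp(coef) - 1.0)


def conf_int_95(coef, se):
    """Normal-approximation 95% confidence interval, in log points."""
    return coef - 1.96 * se, coef + 1.96 * se


def stars(coef, se):
    """Significance stars from a two-sided normal approximation (assumption)."""
    z = abs(coef / se)
    if z >= 2.576:
        return "***"
    if z >= 1.960:
        return "**"
    if z >= 1.645:
        return "*"
    return ""


for label, (coef, se) in TABLE_A1_ALL_FIRMS.items():
    low, high = conf_int_95(coef, se)
    print(
        f"{label}: {pct_effect(coef):.1f}% "
        f"[{pct_effect(low):.1f}%, {pct_effect(high):.1f}%] {stars(coef, se)}"
    )
```

Under these assumptions, the April 2020 soft-letter coefficient of 0.102 corresponds to an increase of roughly 10.7 percent in declared payroll. Because the published coefficients and standard errors are rounded to three decimals, stars recomputed this way can differ from the printed ones for estimates that sit close to a threshold.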