Policy Research Working Paper 10940

Tax Compliance in Romania

Monica Robayo-Abril
Georgiana Balaban
Marcin Wronski

Poverty and Equity Global Practice
October 2024

A verified reproducibility package for this paper is available at http://reproducibility.worldbank.org.

Abstract

This paper uses statistical matching techniques to assess tax compliance and underreporting of labor income in Romania, overall and for different population groups, including among minimum wage workers, to understand the distributional implications and its links with minimum wage policy and design. Understanding the extent and distribution of tax evasion is relevant for enhancing domestic tax capacity, its redistributive impacts, and the links with social policy, including minimum wage policy. Estimating the average underreporting of income is challenging due to the significant underrepresentation of top incomes in survey data. After censoring, the average underreporting of income is 6 percent. When looking at the distribution of tax evasion, the analysis also shows significant underreporting of income in the bottom half of the income distribution. The results show that tax-reported income at the median of the income distribution equals only 90 percent of the true (survey) income, and at the 25th percentile, this share is 83 percent. Women are more tax compliant than men. Tax compliance varies across sectors of the economy, regions of the country, and demographic groups. Transport, construction, and food and accommodation are the sectors of the economy with the lowest tax compliance. The underreporting of income results in lower fiscal capacity for the country and may also lower the efficiency of means-tested social assistance. The underreporting of income significantly increases the share of minimum wage earners, which may impact the minimum wage policy.
This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at mrobayo@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Monica Robayo-Abril (World Bank), Georgiana Balaban (Romanian Ministry of Finance), Marcin Wronski (World Bank and SGH Warsaw School of Economics)

JEL classification: H24, H26, D31, J31, J38

Keywords: tax compliance; labor market; minimum wage; tax data; EU-SILC; imputation; Romania

This paper was prepared as part of a collaboration between the World Bank and the Ministry of Finance in Romania to inform the tax compliance agenda in the country. The document was part of the Poverty Program for Romania in the World Bank's Global Poverty and Equity Practice.
The authors are grateful for comments received from Madalina Avram (Romanian Ministry of Finance), Anne Brockmeyer (Senior Economist, World Bank), Reena Badiani (Senior Economist, World Bank) and George Stefan (The Bucharest University of Economic Studies).

I. Introduction

This paper aims to assess the level of tax compliance and underreporting of income in Romania, including among minimum wage workers, and to understand the distributional implications and the links with minimum wage policy and design. This study reflects a collaboration between the Romanian Ministry of Public Finance and the World Bank to evaluate the extent to which taxpayers, particularly those earning minimum wages, comply with tax obligations and report their income accurately to tax authorities. It also examines the distributional implications of tax compliance and underreporting of income in Romania by assessing how tax compliance affects the distribution of tax burdens and benefits across different income groups, including minimum wage earners. By doing so, the study can provide insights into how tax policies and design can address inequalities and promote greater equity. Furthermore, the study aims to identify the links between minimum wage policy and tax compliance, including the impact of the minimum wage policy on tax compliance and underreporting of income. Finally, the study assesses how tax policies can be designed to support minimum wage earners and promote greater tax compliance.

Understanding the extent and distribution of tax evasion is relevant for several reasons. First, improving tax collection is a critical component of domestic revenue mobilization. It can help the country address its fiscal deficits and outlook; enhancing domestic tax capacity is also critical for strengthening social protection and human capital development.
Among other factors, the country's response to the pandemic has required significant public expenditure, putting pressure on its fiscal resources. According to the IMF Fiscal Monitor Database of Country Fiscal Measures in Response to the COVID-19 Pandemic, Romania's additional spending and forgone revenue to protect society and the economy from the pandemic shock equaled 3.6% of its GDP. [1] Therefore, the government must prioritize domestic revenue mobilization (DRM) to enhance its fiscal position. To achieve this, Romania must address tax compliance, tax evasion, and tax transparency issues. By improving tax compliance and reducing tax evasion, the government can generate more revenue, reduce the need for borrowing, and increase the country's fiscal space. For instance, as part of the Recovery and Resilience Plan, Romania is receiving financial resources for the modernization and digitalization of the Tax Administration (ANAF). In return, ANAF committed to increasing its revenue by 2.5 percentage points of GDP by the end of 2025 compared to 2019 and simultaneously reducing the VAT gap by 5 percentage points by Q2 2026 compared to 2019. Through increased domestic resources, Romania can invest more in social programs that protect the most vulnerable populations and support human capital development. These investments can include education, health, and training programs to improve the country's productivity and long-term economic growth.

Second, tax evasion has significant redistributive impacts on the incidence and distribution of taxes; therefore, reducing tax evasion is crucial for ensuring the effectiveness of the tax system and promoting greater income equality. Tax evasion not only reduces the amount of taxes collected but also skews the distribution of taxes, potentially leading to increased income inequality.
Tax evasion can lead to significant differences in the actual tax burden on individuals with similar income levels, which can affect the efficacy of taxes and their distributional consequences (Bendek & Lelkes, 2011; Nygård et al., 2018; Alstadsæter et al., 2019; Argentiero et al., 2021; Leenders et al., 2023). This makes estimating income tax evasion and understanding its distributional implications essential. Tax evasion can also reduce households' personal income tax payments by a substantial amount, which can have implications for poverty and inequality estimates, especially if income underreporting is taken into account. Studies in other countries have shown that income inequality may also rise significantly due to tax evasion, as high-income households tend to evade taxes proportionately more than low-income households (Alstadsæter et al., 2019; Guyton et al., 2021; Leenders et al., 2023). This means that the burden of taxes falls disproportionately on lower-income households, leading to greater income inequality. Reducing tax evasion requires not only improving tax administration and compliance mechanisms, but also addressing underlying social and economic factors that contribute to tax evasion. By doing so, policy makers can promote greater equity and social welfare, improve fiscal sustainability, and ensure that the tax system serves its redistributive function effectively.

[1] Additional spending included partially covering the wages of employees of companies impacted by the shock, purchase of medical equipment and supplies, bonuses for health personnel, targeted measures for the tourism sector, and discounts for corporate income tax payments. Moreover, pandemic loan guarantees were worth 4.0% of the GDP [IMF, 2021].
Furthermore, tax evasion has important links with social policy, as it may reduce the tax system's progressivity, making it less effective in redistributing income from higher-income to lower-income individuals. The assessment of benefits for some programs may rely on a person's tax return. In Romania, child benefits are universal, granting the same amount regardless of the number of children, but other social benefits are means-tested. The list of means-tested benefits includes family allowance, child-care benefits, and social aid for guaranteed minimum income. More than half a million people receive means-tested social benefits (Adăscăliţei et al., 2020). Tax evasion renders targeting social benefits less effective, as there may be "leakages" to ineligible recipients who underreport their incomes. This can result in benefit fraud and lead to fiscal losses for the government and reduced social welfare for eligible recipients. Ignoring tax evasion can be misleading regarding the distributive and fiscal effects of social benefits and the tax system. It can reduce the tax system's progressivity as high-income payers may not pay their share. This exercise contributes to another analytical study currently being conducted by the World Bank for Romania: a fiscal incidence analysis (CEQ) that evaluates the distributive impacts of the overall fiscal system. The main results of the CEQ will be featured in the upcoming World Bank Public Expenditure Review.

In particular, understanding the interaction between labor tax evasion and the minimum wage policy is crucial for optimal policy making. Payroll tax evaders may be overrepresented among minimum wage earners, meaning that a large share of those who report a minimum wage may be underreporting their income and receiving, on average, much more of their actual income as "envelope wages", that is, non-declared cash paid in addition to their official wage.
This can lead to a spike at the minimum wage in the wage distribution and undermine the effectiveness of the minimum wage policy. Envelope wages remain an important problem in Central and Eastern European transition economies despite some improvements. Empirical research confirmed that the spike at the minimum wage is partially explained by the underreporting of income (tax evaders are overrepresented among minimum wage workers) in Hungary (Tonin, 2011), Latvia (Gavoille & Zasova, 2023), and Poland (Kiełczewska et al., 2021).

Therefore, assessing tax compliance and underreporting of income, particularly among minimum wage workers, is essential for promoting effective social policy and designing effective minimum wage policies. It can help ensure that the benefits of social programs are targeted to those eligible and promote greater equity in the tax system. It can also inform policy design by identifying the factors contributing to tax evasion and developing measures to reduce them, promoting a more effective and equitable fiscal system.

The document is organized as follows. Section II explains the methodology and data sources used. Section III presents the main results of the study for Romania. Finally, Section IV concludes.

II. Methodology and Data

The literature uses several methods to assess tax evasion, each with advantages and limitations. To obtain a more comprehensive understanding of the extent of tax evasion in Romania, it is essential to understand which methods or combinations of these methods are more reliable.

One method compares tax returns and tax audit data, although it is important to note that audits are not always conducted randomly. Analyzing tax audit data can provide detailed information on noncompliance. This information can be enriched by linking the audit data to population census data. However, this method is not always feasible, given the restrictions on access to audit data.
Another standard method in the literature to estimate tax evasion is to use alternative sources such as income surveys, household budget surveys with consumption data, or discrepancies in economic statistics. For instance, it is possible to compare the incomes reported in administrative tax records with those reported in income surveys, assuming that tax evaders have no incentives to conceal their true income when responding to an income survey. It is also possible to investigate tax evasion based on data on food spending and estimates of the link between income and food spending (the Engel curve). For example, Kukk & Staehr (2014) show that in Estonia, the reported total income of households with business income above 20% of total income must be multiplied by 2.6 to attain the same propensity for food consumption as households of wage earners. In this sense, households with business income underreport 62% of their 'true' total income. Another method compares the consumption patterns of public sector employees (where envelope wages are not an issue due to legal constraints) with those of private sector employees. Kiełczewska et al. (2021) apply this method to Poland. They find that 12% of the employees receive envelope wages and that the problem is most acute near the minimum wage threshold. Lopez-Luzuriaga et al. (2023) apply this method in Ecuador using income tax and online billing data. They find that the underreporting of income among private sector workers is 7%-9%, corresponding to 3% of the GDP. Envelope wages are most common in small companies.

Alternatively, discrepancies in economic statistics can also reveal information about tax evasion. This approach involves comparing actual tax revenue with the National Accounts, which can help identify discrepancies indicative of tax evasion. Recently, a novel strand of the literature on tax evasion developed based on leaked lists of tax evaders (e.g., Alstadsæter et al., 2019).
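As a worked illustration of the Engel-curve result for Estonia cited above, the multiplier of 2.6 can be converted into an underreported share. The symbols $Y^{r}$ (reported income), $Y^{*}$ (true income), and $k$ (the multiplier) are notation introduced here for illustration, and the step assumes that the scaled figure equals true income:

```latex
\[
Y^{*} = k\,Y^{r}
\quad\Longrightarrow\quad
\frac{Y^{*} - Y^{r}}{Y^{*}} = 1 - \frac{1}{k}
= 1 - \frac{1}{2.6} \approx 0.615 ,
\]
```

which rounds to the 62% quoted in the text.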
Studies based on Eurobarometer data show that in 2007, the share of employees receiving envelope wages in Romania stood at 22%, the highest among transition economies. However, a study based on Eurobarometer data for 2019 shows that approximately 5% of the employees receive envelope wages in Romania, a middling share compared to other CEE economies (Horodnic & Williams, 2021). A survey conducted in 2020 on a representative sample of students revealed that 31% of the Romanian students working during their studies received envelope wages, and 38% would accept envelope wages when having job offers. Moreover, 37% of students prefer envelope wages to obtain higher take-home income (Horodnic et al., 2020). Barth & Onegdal (2018) present evidence that approximately 5% of employees in EU countries receive envelope payments (part of their income is not reported to the tax authority). Di Nola et al. (2019) find high cross-country heterogeneity in the share of employees receiving envelope payments. According to their results, this share is 15% in Romania, which is relatively high. Williams and Horodnic (2017) also found that the percentage of workers receiving unreported income is higher in Eastern Europe than in Western Europe.

Medina and Schneider (2019) estimated the size of the shadow economy for 158 countries from 1991 to 2015. Their results show that, in the case of Romania, the informal economy has gradually decreased from 36% to 23%, except during the global financial crisis. Davidescu & Schneider (2017) identify the main drivers of the shadow economy in the case of Romania: (i) unemployment rate, (ii) high share of self-employment, (iii) indirect taxation, and (iv) lack of trust in authorities. Davidescu & Schneider (2019) argue that the minimum wage acts as a long-term supporting factor of the grey economy as enterprises will seek alternative methods of circumventing tax authorities, and at least part of them will pay their employees in cash (envelope wage).
Davidescu (2022) identifies a significant impact of minimum wage increases on the magnitude of the shadow economy, but only in the long term. Stancu et al. (2020), based on the dynamic multiple indicators – multiple causes (DYMIMIC) approach, estimate the share of the shadow economy in Romania's GDP at over 15%. According to the authors, this share has declined since the early 1990s, when it was close to 25%. Schneider (2015) estimates the share of the shadow economy in the GDP to be over 25%, the second highest value (after Bulgaria) in Central and Eastern Europe.

In this paper, we follow the second approach: to determine the extent of tax evasion, we compare the distribution of "true" incomes with the distribution of "reported" incomes. Several challenges arise, however, and it is necessary to select and match the appropriate datasets. Following the literature, we assume that true income is revealed in income or labor surveys such as the European Union Statistics on Income and Living Conditions (EU-SILC), the Household Budget Survey (HBS), or the Labor Force Survey (LFS), while "reported income" is the income reported to the tax authority. However, the main challenge with this approach is that there is no single data set that includes both the true and the reported income of individuals. Official tax return data only capture reported income and do not provide information on undeclared or tax-exempt income, meaning that they cannot accurately measure "true" income. Conversely, survey data do not provide reliable information on the income reported to tax authorities. Therefore, it is necessary to match datasets to obtain a comprehensive understanding of tax evasion. This paper is the first study linking administrative and survey data to investigate the issue of envelope wages and measure tax compliance in Romania.
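The comparison of the "reported" and "true" income distributions described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `reported_to_true_ratio` and the toy numbers are hypothetical, and each distribution is ranked separately, so the ratio compares percentiles of the two distributions and not the same individuals.

```python
from statistics import quantiles


def reported_to_true_ratio(reported, true, percentiles=(25, 50, 75)):
    """Ratio of reported to true income at selected percentiles.

    Each distribution is ranked separately, so this compares the two
    distributions, not the same individuals.
    """
    # quantiles(..., n=100) returns the 99 cut points P1..P99.
    rep_q = quantiles(reported, n=100, method="inclusive")
    true_q = quantiles(true, n=100, method="inclusive")
    return {p: rep_q[p - 1] / true_q[p - 1] for p in percentiles}


# Toy numbers for illustration only (not data from the paper).
true_income = [1000, 2000, 3000, 4000, 5000]      # survey ("true") income
reported_income = [800, 1800, 2700, 4000, 5000]   # tax ("reported") income

ratios = reported_to_true_ratio(reported_income, true_income)
# Underreporting share at each percentile is one minus the ratio.
underreporting = {p: 1 - r for p, r in ratios.items()}
```

A ratio below one at a given percentile indicates that reported income falls short of survey income at that point of the distribution.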
In the case of Romania, the following data sources were available for this study: administrative tax records, the EU-LFS (European Union Labor Force Survey), and the EU-SILC (European Union Statistics on Income and Living Conditions). The administrative tax records collected by the Romanian Ministry of Public Finance included a full sample of monthly tax records for all employees from 2020 to 2021. Additionally, the records for self-employed individuals were available for 2019 and 2020. [2] The EU-LFS and EU-SILC household survey data were also available for this study. The EU-SILC data covered the years 2020-2021, which included the income years 2019 and 2020. The EU-LFS data covered the years 2019-2021. The EU-LFS and EU-SILC data were also valuable as they allowed for comparing estimates of reported and true incomes. Using a combination of administrative records and survey data, this study can better understand tax evasion in Romania.

The administrative records are an essential and rich data source, providing detailed information on tax payments and potential estimates of reported income. Brockmeyer (2019) discusses the types of administrative data used in empirical economic research, as well as the upsides and downsides of administrative tax data. Our data is a tax register covering all employees who received labor income in 2020-2021. The value of income is reported by a third party, the employer. The tax administration extracted the de-identified data under a formal agreement. The tax administration also provided a detailed and comprehensive description of the tax forms and variables in the dataset. Tax administrative records cover the entire formal sector, providing a more comprehensive sample than survey data. The data provided detailed information on labor income and COVID-19 support programs. The monthly number of taxpayers varies between 6.9 million and 7.4 million.
This data is less prone to selective non-reporting at the top of the income distribution and contains detailed information collected frequently. Electronic collection of tax data reduces errors, improving the data's accuracy. Tax data measures income variables with high precision. In contrast, survey data might be subject to self-reported measures provided by respondents. However, tax data may lack demographic information on individuals or households unless it is merged with other administrative or survey data. Also, it does not contain information on capital income or other key worker characteristics, such as education and occupation (only the NACE code of the company is recorded). The income of employees is reported by the employer, who has fewer incentives to misreport and is often more tightly monitored than the employees. This increases the data quality. In our data, labor income is measured at the individual level. Therefore, we cannot investigate tax evasion at the household level.

Although the quality of tax data is usually high, it still may be plagued by missing variables, zero values that are challenging to interpret, and technical errors. Administrative tax data may also be challenging to extract from the administrative databases because they are often not integrated and not optimized for data extraction (often, the tax administration is interested only in checking a single taxpayer's income). Künn (2015) documents the increasing usage of linked administrative and survey data in economic research and discusses the challenges of data linking. Given the very large volume of information for approximately 7 million taxpayers, different identifiers were provided for the same individual, making the process of aggregating monthly tax data for each individual into an annual income measure quite challenging.

[2] Though available, data on the self-employed was not used for this analysis as the focus was on employees.
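The aggregation step described above, from monthly tax records to one annual income figure per individual, can be sketched as follows. This is a hedged illustration only: the field names (`person_id`, `employer_id`, `month`, `gross_salary`) and the identifier crosswalk are hypothetical and do not describe the actual layout of the tax files.

```python
from collections import defaultdict


def annual_gross_income(monthly_records, id_crosswalk=None):
    """Sum gross salary over all months and employers for each person.

    id_crosswalk (hypothetical) maps alternative identifiers of the same
    individual to one canonical identifier before aggregation.
    """
    id_crosswalk = id_crosswalk or {}
    totals = defaultdict(float)
    for rec in monthly_records:
        # Fall back to the raw identifier when no alias is recorded.
        person = id_crosswalk.get(rec["person_id"], rec["person_id"])
        totals[person] += rec["gross_salary"]
    return dict(totals)


# Toy records for illustration only; field names are assumptions.
records = [
    {"person_id": "A1", "employer_id": "E1", "month": "2020-01", "gross_salary": 2500.0},
    {"person_id": "A1", "employer_id": "E2", "month": "2020-01", "gross_salary": 500.0},
    {"person_id": "A1-bis", "employer_id": "E1", "month": "2020-02", "gross_salary": 2500.0},
    {"person_id": "B7", "employer_id": "E3", "month": "2020-01", "gross_salary": 2230.0},
]
crosswalk = {"A1-bis": "A1"}  # two identifiers that refer to the same person

annual = annual_gross_income(records, crosswalk)
```

Resolving identifiers before summing matters: without the crosswalk, the second identifier of the same person would be counted as a separate, lower-income taxpayer.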
In Romania, several income categories are exempt from personal income tax and, therefore, are not included in the definition of income used here. Exemptions include social benefits other than pensions and temporary work incapacity benefits, such as allowances, indemnities, social insurance benefits, and scholarships. Invalidity pensions for the 1st degree of invalidity and benefits for war invalids are also exempt. Additionally, income from damages, insurance payouts, and punitive damages is tax-free, and there are other exempt categories. [3]

In addition to critical socioeconomic information, the EU-SILC data provides valuable information on gross and net labor incomes. Still, it is important to acknowledge its limitations, particularly regarding non-response bias and recall errors. The EU-SILC (Statistics on Income and Living Conditions) survey relies on voluntary participation, whereas anyone with taxable income must file a tax return. This means there may be an under-sampling of households with high incomes, leading to underestimating top incomes and inequality. Furthermore, self-reported incomes in the EU-SILC may be subject to recall errors, with respondents potentially misremembering their income sources over the past year (the reference year). Incomes in the EU-SILC are also subject to imputations by the National Statistical Office, which might introduce some inaccuracies. Despite these concerns, the EU-SILC remains the only reliable source of household self-reported incomes, with sufficient sample size and detailed demographic information. As a result, we base our analysis on EU-SILC data to estimate true income.

[3] These encompass income from inheritance or donation, real estate transfers under specific conditions, and income from diplomatic missions and international organizations. Exemptions also extend to consultancy income under non-reimbursable agreements, subsidized interest on credits, and sports prizes from European, World, and Olympic championships.
Another primary concern with the labor income data in the EU-SILC is the lack of information on monthly incomes; to overcome this, we perform the analysis with annual rather than monthly income estimates. The income information is available only on an annual basis and does not correspond to the same period as the labor market status and additional labor variables. Yearly income measures cannot be used as a substitute for monthly income measures because yearly income may accumulate in only a few months of employment. Therefore, to accurately assess monthly income, we must consider the duration spent in different employment statuses during the year. The retrospective primary economic status data provides some information that can help divide the income into monthly parts, but to do so, we need access to longitudinal data. As a result of the different periods, the income divided by 12 may differ from the monthly income of the job at the time of the interview in the previous or current year. This issue is particularly significant for individuals with unstable careers, including job changes and unemployment interruptions, as the varying periods can result in substantial biases. To address these limitations, we decided to analyze yearly incomes instead of monthly incomes by aggregating administrative tax data, which is available every month, to a yearly level.

The EU-LFS data could not be used for this study as it only has information on the wage distribution in deciles rather than specific point estimates. This poses a challenge when analyzing the data, as point estimates are often needed to estimate labor income accurately. To overcome this issue, one option is to use imputation techniques to fill in the missing values. Specifically, we could use the EU-SILC data, which provides detailed demographic characteristics and income information, to impute the missing point estimates on the LFS data.
This can help provide a more accurate representation of the wage distribution and enable a more nuanced analysis. However, it is important to carefully consider the imputation process's potential biases and limitations and evaluate the quality of the resulting estimates, especially when several data sources (including administrative tax data) need to be imputed.

In this study, we match income data from the 2020 tax administrative dataset with the 2021 EU-SILC (reference income year 2020) using data matching techniques; then, we assess the degree of tax compliance and possible overestimation of the share of minimum wage earners in the tax data. Following the literature, we assume that the "true" income is reported in the survey, while "reported" income is reported to the tax authority. After imputing tax income into the EU-SILC, we assess the degree of tax evasion and the impact of tax evasion on the share of minimum wage earners.

Before matching the administrative tax data and the EU-SILC datasets, we must evaluate the comparability of the two datasets regarding the target population and income. This involves determining whether the datasets are compatible and suitable for comparison. The extent of comparability determines the accuracy of the results and the conclusions drawn from the analysis. Therefore, it is important to ensure that the datasets are matched appropriately and that the analysis considers any differences in the target population or income that may affect the results. Both data sources are compared in Table 1.

Table 1. The main characteristics of data sources - Tax Administrative Data vs. EU-SILC.
| | Tax administrative data | EU-SILC |
|---|---|---|
| Participation | Compulsory (includes all employees who paid tax based on their declared labor income) | Voluntary |
| Coverage | Covers the full labor income distribution | Underreporting at the top is usually a problem in the survey data |
| Tax evasion | Data is impacted by tax evasion, as the informal economy is not covered | Participants report their total income earned in the formal and informal economy |
| Data quality | The employers report income to the tax office | Income is self-reported; recall errors and rounding may be a problem |
| Income definition [4] | Gross salary, 24 monthly files | Gross labor income, annual |
| Data size | 6.9 – 7.4 million taxpayers monthly | 16,630 individuals; 5,758 with positive labor income |
| Source | Ministry of Public Finance | Harmonized data from Eurostat based on data collected by the National Statistics Office |

Source: Own elaboration

[4] Although EU-SILC income was initially collected in local currency, it is recorded in euros in the data set shared with researchers. The income in tax data is recorded in Romanian leu (RON). We converted tax income into euros using the exchange rate provided in the database.

To compare the coverage of both data sources, we compare the aggregate value of labor income in the national accounts, the administrative tax data, and the EU-SILC. One option to check for potential biases in the SILC is to compare aggregate income in the SILC against the National Accounts figure of the Central Statistical Office. A study on the micro-macro gap between the EU-SILC data and national accounts shows that coverage of household disposable income reported in Romania was the lowest in the European Union. The estimated total sum of EU-SILC disposable income was 33% (+/- 10%) of national accounts' gross household sector disposable income before any modifications. This share was higher than 50% in all remaining countries; it was higher than 75% in many countries.
The coverage of wages and salaries in Romania in 2014 equaled 65% and was also the lowest in the European Union. It was nearly 90% or even above 100% in most countries. This situation might be explained by the high number of subsistence farmers and self-employed, but also by incentives to set up a microenterprise and pay 3% on turnover (below the threshold of EUR 1 million per year and without the obligation to have a full-time employee until 2023) rather than have a labor contract subject to a tax wedge of 43% of gross income. The micro-macro gap at the EU level is caused mainly by measurement errors and conceptual differences (Törmälehto, 2019).

The outcomes of this comparison are presented in Table 2. The total compensation of employees in national accounts equals €87.0 billion. The estimate based on the EU-SILC data, which does not include social security contributions paid by the employer, is approximately €75.8 billion. The estimate based on the tax data, which does not include the self-employed or social security contributions paid by the employer, is €67.0 billion. Although these estimates are not fully comparable, differences are probably small enough to be explained by varying data coverage.

Following the literature, we assume that employees report their total income in survey data, including envelope payments and income earned in the informal economy. The higher value of aggregate compensation of employees in the EU-SILC than in the tax data is consistent with this assumption. Then, we compare the "true income" reported in the EU-SILC with the "reported income," which is determined by the tax income imputed in the EU-SILC dataset based on common characteristics of individuals available in both data sources. The comparison outcomes indicate the possible degree of underreporting of income in tax data.

Table 2. Comparison of data on labor income in the national accounts, tax data, and EU-SILC.
 | 2020 National accounts | 2020 Tax admin data | 2021 EU-SILC (2020 income year) 5
Coverage | Covers both employees and self-employed; includes social security contributions paid by the employer | Covers only employees | Covers both employees and self-employed
Aggregate compensation of group (employees and/or self-employed) | 87.0 B € (annual) | 67.0 B € (annual) (only employees, lack of self-employed 6) | 75.8 B € (both employees and self-employed); 74.5 B € (employees)
Source: Own elaboration based on Eurostat, 7 tax data, and the EU-SILC.

5 Gross labor income.
6 According to the EU-SILC, the share of self-employed in the economically active population is approximately 16%. The vast majority of self-employed are farmers. The vast majority of self-employed do not employ workers.
7 Compensation of employees: https://ec.europa.eu/eurostat/databrowser/view/tec00013/default/table?lang=en

Since this method of estimating tax compliance requires merging multiple data sources, proper methods for statistical matching must also be assessed, given the data availability and the country context; data fusion techniques are receiving increasing interest from researchers and statistical agencies (Donatiello et al., 2016a; Lamarche et al., 2020). When common identifiers are available, researchers can apply record linkage. For example, Lopez-Luzuriaga (2023) recently applied record linkage to match data on consumption from the electronic billing system with income tax data in Ecuador and investigate the frequency of envelope payments in the country. When common identifiers are unavailable, researchers rely on statistical matching (Yam & King, 2020; Lewaa et al., 2021). Statistical matching allows a synthetic data source to be derived from various sources that do not contain the same units or a common identifier. The matching relies on common characteristics available in both data sets. Statistical matching techniques have been used by Robayo-Abril and Rude (2024) in the Romanian context.
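As a rough illustration of the comparison in Table 2, the aggregates can be expressed as shares of one another. The sketch below only reuses the figures reported in the table; the variable and function names are illustrative assumptions and are not taken from the paper's reproducibility package.

```python
# Aggregate labor income from Table 2, in billions of euros (2020 income year).
NATIONAL_ACCOUNTS = 87.0   # compensation of employees, incl. employer contributions
TAX_DATA = 67.0            # employees only, excl. employer contributions
EU_SILC_ALL = 75.8         # employees and self-employed
EU_SILC_EMPLOYEES = 74.5   # employees only


def share_of(reference: float, value: float) -> float:
    """Return value as a percentage of reference, rounded to one decimal."""
    return round(100.0 * value / reference, 1)


tax_vs_na = share_of(NATIONAL_ACCOUNTS, TAX_DATA)
silc_vs_na = share_of(NATIONAL_ACCOUNTS, EU_SILC_ALL)
tax_vs_silc = share_of(EU_SILC_EMPLOYEES, TAX_DATA)

print(f"Tax data as a share of national accounts: {tax_vs_na}%")
print(f"EU-SILC as a share of national accounts: {silc_vs_na}%")
print(f"Tax data as a share of EU-SILC (employees): {tax_vs_silc}%")
```

Because the three sources differ in coverage, these shares are indicative only; the last one compares like with like most closely, since both figures refer to employees.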
The data sources used in this research do not share common identifiers. Therefore, we cannot apply record linkage and must use statistical matching techniques. We imputed tax income into the EU-SILC data set based on common characteristics. In tax data, employees' gross salary is reported by their employers. Our income concept is total gross labor income. If an individual has more than one employer, we sum the income from all employment contracts. The EU-SILC captures various income sources and distinguishes labor and non-labor income. In the EU-SILC data, we also use total gross labor income as our income concept. Both tax data and EU-SILC record gender, year of birth, region, 8 sector of the economy, 9 and employment status (full-time vs. part-time). Although the list of common variables is not long, this is not necessarily a limitation. Too many matching variables can generate unnecessary noise (Donatiello et al., 2016b).

8 In the EU-SILC, the information is representative at the NUTS-2 level, while in the tax data, the information is at the NUTS-3 level. The more detailed classification in tax data was aggregated to match the classification in the EU-SILC.
9 In the EU-SILC, the information on the sector of the economy is provided using a simplified two-digit NACE classification. The tax data offers a more detailed four-digit NACE classification. As in the case of regions, we aggregate the more detailed classification of sectors in the tax data to match the simpler classification in the EU-SILC.

Among data fusion approaches, we utilize model-based methodologies. Following the approach of Bacher and Prander (2018), we first estimate an empirical functional relationship between X and Z using the donor dataset, which in our case is the tax administrative data, as follows:

\[ X = f(Z; \theta; \varepsilon) \]

The functional form of $f(Z; \theta; \varepsilon)$ depends on the population parameters $\theta$ and the imprecision with which Z might have been measured, denoted as $\varepsilon$. An example of such a functional form could be a linear regression:

\[ X = \beta_0 + \beta_1 Z_1 + \dots + \beta_p Z_p + \varepsilon \]

where $\varepsilon$ is assumed to have a normal distribution with a mean value of 0 and a variance of $\sigma^2$. Based on the estimated model parameters, missing values are estimated by:

\[ \hat{X} = f(Z; \theta'; \varepsilon') = \beta'_0 + \beta'_1 Z_1 + \dots + \beta'_p Z_p + \varepsilon' \]

It is crucial to note that data fusion methods depend on the conditional independence assumption, which requires the relationship between the overlapping variable Z and the missing variable X in the donor dataset to hold in the recipient dataset as well. This assumption is difficult to test empirically and may not always hold. Selecting overlapping variables Z that have high explanatory power for X can help address this issue. 10 Donatiello et al. (2016a) demonstrate that the conditional independence assumption is fulfilled when at least one variable Z exists that is perfectly correlated with either X or Y.

An important limitation of model-based methodologies is that they rely on the functional form assumed in the first step of the estimation procedure. This form is imposed by the researcher rather than driven by the data, and it might not reflect the true underlying relationship (Donatiello et al., 2016a). The results may therefore depend on the model specification, and the choice of covariates for the imputation model can significantly affect them.

The study employs multiple imputation models, following Rubin's (1986) approach of concatenating datasets, and draws on statistical matching methods inspired by imputations commonly used for missing observations (Bacher & Prander, 2018). The methodology, as outlined by Lewaa et al. (2021), encompasses parametric, nonparametric, and mixed methods.
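The model-based step described above can be sketched in a few lines. This is a minimal illustration under strong simplifying assumptions, not the paper's implementation: it uses a single numeric overlapping variable (a hypothetical age variable), made-up donor data, and a hand-written least-squares fit, whereas the paper's models use several covariates and survey weights.

```python
import random
import statistics


def fit_ols(z: list[float], x: list[float]) -> tuple[float, float, float]:
    """Fit x = b0 + b1*z + e on the donor data; return (b0, b1, sigma)."""
    z_bar, x_bar = statistics.fmean(z), statistics.fmean(x)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    b1 = sxy / sxx
    b0 = x_bar - b1 * z_bar
    residuals = [xi - (b0 + b1 * zi) for zi, xi in zip(z, x)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (sum(r * r for r in residuals) / (len(z) - 2)) ** 0.5
    return b0, b1, sigma


def impute(z_recipient: list[float], b0: float, b1: float,
           sigma: float, rng: random.Random) -> list[float]:
    """Predict the missing variable and add a normally distributed error draw."""
    return [b0 + b1 * zi + rng.gauss(0.0, sigma) for zi in z_recipient]


# Hypothetical donor data (tax records): age and annual gross income in euros.
donor_age = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
donor_income = [6100.0, 6900.0, 8100.0, 8900.0, 10100.0, 10900.0]

# Hypothetical recipient data (survey respondents): only age is observed.
recipient_age = [28.0, 41.0, 55.0]

b0, b1, sigma = fit_ols(donor_age, donor_income)
imputed_income = impute(recipient_age, b0, b1, sigma, random.Random(42))
print(imputed_income)
```

Adding the error draw, rather than using the fitted value alone, keeps the imputed values from being less dispersed than the donor data.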
Using Rubin's (1986) methodology, matching variables are identified to append two datasets, focusing on variables that are present in both data sources, are relevant to the target variable, and exhibit similar distributions across sources (Serafino & Tonkin, 2017). Harmonization of overlapping variables is initially conducted, followed by lasso regressions to determine the most significant variables for explaining income. Subsequently, the datasets are concatenated based on selected variables, and various imputation models are employed, including linear regression, predictive mean matching (PMM), and truncated regression. The best-performing imputation model, determined based on alignment with the observed distribution, consistency of results, and performance, is the predictive mean matching (PMM) technique. Furthermore, a comparison of model specifications reveals that a weighted PMM slightly outperforms the unweighted PMM.

10 This can be affected by the lack of covariates such as educational attainment in the tax administrative data.

By imputing multiple values of tax income, we recognize the uncertainty of the matching process. Predictive mean matching usually imputes fewer implausible values than other imputation techniques. It is also recommended if the distribution of the original variable is skewed, which is true in the case of income data. We imputed ten values of tax income into the EU-SILC data set based on the ten closest neighbors (k=10). The imputation was handled in Stata using the multiple imputation module. EU-SILC includes survey weights, which increase the representativeness of the sample. We use the weights in the estimations reported below. We evaluate the performance of the different imputation methods by comparing several moments of the distribution of the imputed tax income to those of the observed tax income. There are several possibilities for evaluating the performance of statistical matching techniques (Leulescu and Agafitei, 2013).
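The matching step of predictive mean matching can be sketched as follows. This is a simplified illustration and not the Stata routine used in the paper: it assumes a single covariate, ignores survey weights, and does not redraw the regression parameters between imputations, which a full multiple-imputation procedure would do. All data and names are hypothetical.

```python
import random
import statistics


def fit_line(z: list[float], x: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope of x on z."""
    z_bar, x_bar = statistics.fmean(z), statistics.fmean(x)
    slope = (sum((a - z_bar) * (b - x_bar) for a, b in zip(z, x))
             / sum((a - z_bar) ** 2 for a in z))
    return x_bar - slope * z_bar, slope


def pmm_impute(z_donor: list[float], x_donor: list[float],
               z_recipient: list[float], k: int,
               rng: random.Random) -> list[float]:
    """Impute one value per recipient from its k nearest donors."""
    b0, b1 = fit_line(z_donor, x_donor)
    donor_pred = [b0 + b1 * z for z in z_donor]
    imputed = []
    for z in z_recipient:
        target = b0 + b1 * z
        # Indices of the k donors whose predicted value is closest to the target.
        nearest = sorted(range(len(z_donor)),
                         key=lambda i: abs(donor_pred[i] - target))[:k]
        # The imputed value is an observed donor value, never a model prediction.
        imputed.append(x_donor[rng.choice(nearest)])
    return imputed


# Hypothetical donors: workers under 30 all earn the annual minimum wage.
z_donor = [float(age) for age in range(20, 50)]
x_donor = [5531.0 if age < 30 else 200.0 * age for age in z_donor]
z_recipient = [22.0, 35.0, 48.0]

rng = random.Random(0)
# Ten imputations with k = 10, mirroring the settings reported in the text.
imputations = [pmm_impute(z_donor, x_donor, z_recipient, 10, rng)
               for _ in range(10)]
print(imputations[0])
```

Because every imputed value is copied from a real donor, spikes in the donor distribution (such as the mass at the minimum wage) carry over to the imputed data, which a pure regression prediction would smooth away.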
We start by comparing the imputed mean and standard deviation to the observed mean and standard deviation of tax income. We then compare the distribution across population subgroups (by income percentiles), which is crucial because it shows how the method works in the context of distributional analyses. The chosen imputation method should effectively maintain the marginal distributions of the variables before and after imputation. However, results based on matched datasets should be interpreted with caution, as they can be sensitive to the variables included in the imputation and to the specific objectives of the imputation, such as estimating the marginal distribution of taxable income rather than estimating joint distributions.

III. What is the extent and distribution of tax compliance in Romania?

The imputation results show that the final imputed values using predictive mean matching (PMM) are reasonably close to those observed in the tax admin data. The distribution of income in the EU-SILC, tax data, and imputed EU-SILC is presented in Table 3. The mean income values in tax data and imputed EU-SILC are reasonably close. 11 The mean tax income is 1.2% lower than the mean survey income, and the mean imputed tax income is 6.6% higher than the mean survey income. As discussed below, our estimate of tax evasion in Romania is close to that of other countries in the region.

Table 3. The distribution of survey income, tax income, and imputed tax income (2020 €).
Statistic | Annual income based on survey data (EU-SILC, gross labor income) | Tax annual income, observed value (July 2020 * 12) | Imputed tax annual gross income (imputed EU-SILC)
Mean (without censoring) | 11,300 | 11,164 | 12,051
Mean (with censoring) | 10,863 | . | 10,191
P1 | 2,968 | 322 | 1,364
P5 | 5,223 | 1,374 | 5,531
P10 | 5,741 | 2,321 | 5,531
P25 | 7,419 | 5,531 | 6,186
P50 | 9,539 | 7,813 | 8,644
P75 | 13,779 | 13,376 | 13,862
P90 | 18,019 | 21,446 | 21,823
P95 | 22,258 | 28,416 | 29,108
P99 | 29,678 | 57,253 | 56,975
Note: To limit the impact of the underrepresentation of top incomes in survey data on the estimate of mean tax compliance, we censor the distribution of survey income (“true income”) and imputed tax income (“reported income”) at the 99th percentile of the distribution of survey income.
Source: Observed and imputed values based on EU-SILC and tax admin data. P1-P99 refers to the first and 99th income percentiles based on the uncensored distribution.

11 Theoretically, we can imagine a scenario in which the distribution of imputed tax income is completely different from the distribution of income observed in the survey. In such a case, the outcomes of the imputation would be difficult to interpret.

A comparison of the distributions of "true" income based on EU-SILC and the imputed tax admin income reflects the underreporting of income in tax data at lower income quintiles and the underreporting of top incomes in survey data. At the median, the gap between “true” and “reported” income is 9%; thus, tax compliance is 91%. The mean imputed tax income is higher than the mean survey income. This results from a significant underreporting of top incomes in EU-SILC data. If we censor the income distribution at the 99th percentile of the distribution of labor income in EU-SILC data, the mean imputed tax income is lower than the mean survey income. The gap between mean “reported” and mean “true” income is 6%. 12 Due to challenges arising from the underreporting of top incomes in EU-SILC data, 13 our preference is to assess tax compliance by comparing median incomes. This approach mitigates complications associated with data discrepancies and provides a clearer perspective on tax compliance levels.
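The compliance figures quoted in the text follow directly from Table 3 as the ratio of imputed tax income ("reported") to survey income ("true"). The short sketch below reproduces that arithmetic for selected rows; the dictionary layout and function name are illustrative assumptions.

```python
# (survey income, imputed tax income) in 2020 euros, taken from Table 3.
TABLE_3 = {
    "P10": (5741, 5531),
    "P25": (7419, 6186),
    "P50": (9539, 8644),
    "P75": (13779, 13862),
    "P90": (18019, 21823),
    "Mean (with censoring)": (10863, 10191),
}


def compliance_rate(true_income: float, reported_income: float) -> float:
    """Reported income as a percentage of true income, rounded to one decimal."""
    return round(100.0 * reported_income / true_income, 1)


rates = {label: compliance_rate(true, reported)
         for label, (true, reported) in TABLE_3.items()}

for label, rate in rates.items():
    print(f"{label}: compliance {rate}%, gap {round(100.0 - rate, 1)}%")
```

Ratios above 100% at P75 and P90 do not indicate over-compliance; they reflect the underrepresentation of top incomes in the survey discussed in the text.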
We present the income distributions in the (original) EU-SILC (gross labor income) and imputed tax income in the EU-SILC in Figure 1. We censored the distributions at the 99th percentile to keep the graphs readable. The rounding of income is an obvious problem in the EU-SILC data. Many respondents do not provide the exact income value, only a rounded value. When comparing the distribution of income observed in EU-SILC with the imputed tax income, we observe a spike in values corresponding to the minimum wage (the annual value of the minimum wage for employees without higher education is 5,531 EUR; the annual value of the minimum wage for employees in the construction sector is 7,441 EUR). However, the spike in the imputed tax income data is significantly higher than that observed in EU-SILC data. The spike at the value of the minimum wage may be seen as evidence that the minimum wage is effective in providing a floor constraining the value of wages. Without the minimum wage, the underreporting of income would probably be higher (the minimum wage establishes a minimum value of the official salary, so less remains for an envelope payment), or some employees would be completely informal.

12 After applying censoring at income equal to €29,678, the mean survey income becomes €10,863 and the mean imputed tax income (tax data imputed into the survey) is €10,191.
13 The value of income at P90 in tax data is 19% higher than in the survey, and at P95, the income reported in tax data is 28% higher. At P99, income in tax data is nearly twice as high as in survey data (see Table 3).

In the bottom half of the distribution, imputed tax income (“reported income”) is significantly lower than survey income (“true income”). This probably reflects the underreporting of income in the tax data. At the 75th percentile of the distribution, the income value is approximately equal in all three data sources.
In the top quarter of the distribution, tax income and imputed tax income are significantly higher than survey income. This may reflect the underreporting of top incomes in survey data or the systematic undercoverage of richer households (e.g., Brzeziński et al., 2020; Flachaire et al., 2022). The survey might completely miss respondents from the extreme right tail of the income distribution either because the survey design does not target high-income earners or because the wealthiest individuals refuse to participate.

Figure 1. The distribution of observed gross labor income in the EU-SILC.
[Figure: histogram of annual income (horizontal axis, 0 to 60,000) against the fraction of observations (vertical axis, 0 to .25) for two series, EU-SILC and imputed tax income.]
Note: The first red line indicates the annual minimum wage for employees without higher education, and the second red line indicates the annual minimum wage for employees in the construction sector. Both distributions are censored at the 99th percentile to keep the graph readable.
Source: Own estimation based on tax data, EU-SILC, and imputation methods.

Our calculation of annual tax income relies on the tax income reported in July multiplied by 12. This decision is informed by the fact that, in July, the mean income closely aligns with the annual mean income. Notably, despite the challenges posed by COVID-19, Romania witnessed a 4% increase in mean labor income from January to November 2020. Additionally, variations in the number of employees throughout the year, peaking in the summer and dropping in the winter, impact data matching. The characteristics of employees also undergo seasonal changes, making the evolving employee pool more significant in our context than fluctuations in median/mean labor income values. When estimating the gap between median "true" and median "reported" income based on tax data for February 2020, we observed an increase in average tax compliance to 96%, while tax compliance at the median decreased to 85%.
Considering these factors, we assert that July provides the most representative proxy for annual income, and therefore, we maintain our main estimate based on this month as the baseline.

Bendek & Lelkes (2011) compare the “true income” reported in the household budget survey and “reported income” from the sample of tax declarations in Hungary and find that the average rate of underreporting of income in tax data is 9%–13%. The degree of underreporting is higher among the self-employed than among employees. The underreporting is highest at the bottom and the top of the income distribution. The underreporting of true income in tax data reduces the tax system's progressivity. Paulus (2015) compares the "true income" reported in labor force survey data and income reported in tax declarations in Estonia. The study found that about 12% of total employment income is not reported to the tax authority, primarily due to the partial underreporting of income (envelope payments). Kiełczewska et al. (2021) compare the "true income" reported in the household budget survey and "reported income" from tax declarations in Poland. According to their results, 31% of microenterprise employees and 12% of all employees in Poland receive envelope payments. Envelope payments equal 6% of Poland's labor income and 1.6% of the Polish GDP.

Research done for other countries clearly shows that underreporting of top incomes in survey data significantly biases measured income inequality downwards. In Table 4, we compare the measures of income inequality in the EU-SILC dataset, tax data, and imputed EU-SILC and find that this is also the case in Romania. Our reference income variable is gross labor income. We restrict our sample to employees because we do not have complete administrative tax data for the self-employed. We measure income inequality on the individual level.
Income inequality is significantly higher in tax and imputed tax data than in survey data. In the original (weighted) EU-SILC data, the Gini index equals only 0.2510. For the imputed tax income in the EU-SILC data set, the Gini index is nearly 50% higher and equals 0.3699. The underestimation of inequality is exceptionally high at the top. The top 10% income share equals 30.30% in the imputed EU-SILC and only 20.66% in the EU-SILC. The top 1% income share increases from 2.9% to 7.2%. 14 Income inequality based on the imputed tax income in the EU-SILC is comparable to but slightly lower than in the tax data. The slightly higher inequality in tax data may reflect a better representation of top-income earners (having different characteristics than EU-SILC respondents) in the tax dataset. These outcomes show that survey data severely underestimate Romania's true income inequality level.

Table 4. Income inequality in Romania (gross labor income, only employees).
Indicator | Survey data (EU-SILC) | Tax admin data (July 2020 * 12) | Imputed tax income (imputed EU-SILC)
Gini index | 0.2510 | 0.4321 | 0.3699
Theil index | 0.1010 | 0.3679 | 0.2728
Bottom 50% income share | 32.79% | 21.87% | 25.76%
Middle 40% income share | 46.55% | 45.38% | 43.94%
Top 10% income share | 20.66% | 32.75% | 30.30%
Top 1% income share | 2.91% | 8.28% | 7.19%
Source: Own estimation based on the EU-SILC and tax data.

14 To get a sense of magnitudes, Atkinson et al. (2011) and Burkhauser et al. (2012) have analyzed income data for the United States and found that the survey-based estimates of the top 1% income share are lower by several percentage points compared to the estimates based on tax return data.

The discrepancy between reported income and true income, a common issue in tax data, indicates the extent to which people underreport their income on their tax returns, that is, the extent of tax evasion; in Romania, this discrepancy is larger among low-income individuals.
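For readers less familiar with the indicators in Table 4, the following sketch shows how the Gini index, the Theil index, and a top income share can be computed from a list of individual incomes. It is an unweighted illustration on made-up numbers; the estimates in Table 4 use survey weights, which this sketch deliberately omits.

```python
import math


def gini(incomes: list[float]) -> float:
    """Gini index for non-negative incomes, using the sorted-rank formula."""
    xs = sorted(incomes)
    n = len(xs)
    rank_weighted_sum = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted_sum / (n * sum(xs)) - (n + 1) / n


def theil(incomes: list[float]) -> float:
    """Theil T index; requires strictly positive incomes."""
    mean = sum(incomes) / len(incomes)
    return sum((x / mean) * math.log(x / mean) for x in incomes) / len(incomes)


def top_share(incomes: list[float], fraction: float) -> float:
    """Share of total income held by the richest `fraction` of individuals."""
    xs = sorted(incomes, reverse=True)
    k = max(1, round(len(xs) * fraction))
    return sum(xs[:k]) / sum(xs)


# Hypothetical incomes: nine people earn 1 unit, one person earns 11 units.
sample = [1.0] * 9 + [11.0]
print(gini(sample), theil(sample), top_share(sample, 0.10))
```

With perfectly equal incomes both indices are zero, and missing top earners lower all three measures, which is the mechanism behind the gap between the survey and tax columns of Table 4.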
The reported income, here the imputed tax income, is the income that individuals or companies report on their tax returns. In contrast, the survey income, which we treat as the true income, is the actual income earned by individuals or companies. The results show that underreporting is exceptionally high for individuals in the bottom half of the income distribution, where income is likely easier to hide. At the median, the gap between reported and true income is roughly 10% (9.4% based on Table 3), which means that people report only about 90% of their "true" income. At the 25th percentile, the gap is 17%, indicating that people in the lower quartile report only 83% of their true income. At the 10th percentile, the gap is smaller but still 4%, which means that even those with the lowest incomes report less than their true income. Here, we should note that at the bottom of the income distribution, the possibility of paying employees via envelope payments is limited because the official salary cannot fall below the minimum wage. Otherwise, the formal employment relationship would be lost.

There could be several reasons for underreporting, including tax evasion, an informal economy, and criminal activities. Individuals may underreport incomes to the tax authorities to reduce their tax liability. Alternatively, there may be a high incidence of informal economy activities, including "envelope payments," where individuals are paid in cash, and the payment is not recorded. Criminal activities, such as drug trafficking or money laundering, may also result in underreporting of income.

The data and method used in this paper are also not well suited to capturing the underreporting of income at the top of the distribution because top earners tend to be underrepresented in the survey data. While surveys can provide helpful information about income trends and distributions, they may not accurately capture the incomes of the wealthiest individuals, leading to potential gaps and biases in the data.
Wealthy individuals may be missing from the sample (due to coverage errors, sparseness, or unit non-response) or, even if they are included, their income information may be missing (due to item nonresponse), underreported, or censored (Lustig, 2019; Ravallion, 2022; Yonzan et al., 2022). To partly control for this problem, we censor the distributions of survey income and imputed income at the 99th percentile of income in the EU-SILC (see Bendek & Lelkes, 2011). In this case, the mean income in the EU-SILC is 10,863 EUR, and the mean imputed tax income is 10,191 EUR (Table 3). The average underreporting is thus about 6%, equal to the estimate of Kiełczewska et al. (2021) for Poland (6%) and below the estimates of Bendek & Lelkes (2011) for Hungary (9%–13%) and Paulus (2015) for Estonia (12%). Thus, Romania is not an outlier in the regional context.

We use the ratio of median imputed tax income ("reported income") to median survey income ("true income") to measure tax (non)compliance. 15 The larger the difference between the median incomes, the higher the non-compliance with tax laws. This approach assumes that the survey income represents the "true" income earned, while the imputed tax income represents the income that individuals or companies report to the tax authorities. Using this ratio to measure tax compliance/noncompliance has several advantages. First, it provides a straightforward and easy-to-calculate measure that can be used to compare tax compliance across different groups or regions. Second, it is less affected by outliers or extreme values than other measures of income, such as the mean or total income. However, this approach also has some limitations. For example, it assumes that the survey data accurately captures the "true" income earned, which may not always be true. Additionally, the ratio may not capture all forms of non-compliance with tax laws, such as the misreporting of deductions.
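The median-based measure can be sketched as follows. Since EU-SILC estimates are weighted, the example uses a simple weighted median; the particular weighted-median convention, the function names, and the toy data are assumptions made for illustration and may differ from the routine used in the paper.

```python
def weighted_median(values: list[float], weights: list[float]) -> float:
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


def median_compliance(reported: list[float], true: list[float],
                      weights: list[float]) -> float:
    """Median reported income divided by median true income."""
    return weighted_median(reported, weights) / weighted_median(true, weights)


# Hypothetical respondents: survey ("true") and imputed tax ("reported") income.
true_income = [9000.0, 10000.0, 8000.0]
reported_income = [8000.0, 9000.0, 7000.0]
survey_weights = [1.0, 1.0, 1.0]

ratio = median_compliance(reported_income, true_income, survey_weights)
print(f"Compliance at the median: {100.0 * ratio:.1f}%")
```

Note that the two medians are computed separately, so the ratio compares the middle of each distribution rather than the reported and true income of the same person.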
We do not use the ratio between mean imputed tax income and mean survey income as the measure of tax (non)compliance because, as discussed above, the mean value of income in imputed data is driven upwards by top incomes not represented in the survey data.

15 The ratio is expressed as: imputed tax income / survey income.

Tax compliance levels vary across different sectors of the economy, with the lowest compliance (measured at median income) seen in transport (32% gap), construction (26% gap), and accommodation and food services (18% gap). We report the tax (non)compliance across the sectors of the economy in Table 5. Because the EU-SILC sample size imposes a constraint on the analysis and estimates for smaller sectors of the economy may not be reliable, we discuss only outcomes for branches employing at least 5% of employees in the EU-SILC data. Our results show that tax compliance is lower than 100% in all sectors of the economy: the median imputed tax income is lower than the median survey income, indicating underreporting of income. The lowest tax compliance was observed in transport, where the reported income was 32% lower than the true income. Similarly low compliance rates were seen in construction (26% gap) and accommodation and food services (18% gap), which is unsurprising since these sectors are known for having a high degree of informal employment. To reduce informality in the construction sector, the government introduced a differentiated, higher minimum wage as of 2019 and, at the same time, granted very generous fiscal facilities: a PIT break, an exemption from the 10% health insurance contribution, and a suspension of the transfer to the second pension pillar (3.75%); for employers, the work insurance contribution was reduced from 2.25% to 0.33%. The public administration sector also showed relatively low compliance (18% gap).
Because the public administration sector is typically associated with formal employment and compliance with tax regulations, the finding of relatively low compliance in this sector is puzzling. It could be due to the relatively small sample size of employment in this sector in the EU-SILC data or to the performance of the imputation among subgroups, either of which may affect the reliability of the estimates. It is also possible that factors specific to the public administration sector make it difficult for public servants to recall the exact amount earned: in the public sector, several bonuses and allowances (representing approximately 20% of total income earned) are granted depending on the number of days worked in a month. In 2020, due to the pandemic, a large part of public servants worked remotely, which reduced their wages, as these bonuses were contingent on physical presence.

Table 5. Tax compliance across the sectors of the economy (measured at median income)
Economic sector | NACE code | Share of employees in the EU-SILC data | Tax compliance (median imputed tax income / median survey income)
Agriculture | a | 3.55% | 92.33%
Mining, manufacturing, electricity, and water supply | b–e | 28.66% | 85.13%
Construction | f | 9.13% | 74.06%
Trade | g | 16.92% | 96.67%
Transport | h | 7.30% | 68.37%
Accommodation and food service | i | 2.67% | 81.79%
Information and communication | j | 3.69% | 65.72%
Financial services | k | 1.63% | 76.93%
Real estate, professional, rental | l–n | 6.37% | 84.18%
Public administration | o | 5.90% | 81.81%
Education | p | 4.72% | 94.05%
Human health | q | 6.34% | 87.81%
Source: Own elaboration based on imputation outcomes.

To investigate tax compliance further, we adopt a “cell approach” (Bendek & Lelkes, 2011) and estimate the tax compliance in 40 cells composed of gender (2), regions (4), and age groups (5). The comparison of “true” and “reported” income across subgroups of the population allows for a better understanding of tax evasion patterns.
We report the outcomes of this exercise in Tables 6 and 7. Our results show that tax compliance is higher among women than men. In all “women” cells, the tax compliance (measured at the median of the income distribution) is higher than 90%; in some cells, the reported income is even higher than the survey income. In the case of men, in all cells except one, the tax compliance is lower than 90%; on average, the tax compliance is close to 80%, and in some cells, the tax compliance is lower than 70%. D’Attoma et al. (2017) previously identified similar gender differences in tax compliance in the US, UK, Sweden, and Italy based on tax compliance experiments. According to their results, women are more compliant than men.

Table 6. Tax compliance: “cell” approach, men (measured at median income)
Age/region | <30 | 30–39 | 40–49 | 50–59
RO1 (North) | 85.09% | 66.82% | 69.71% | 67.58%
RO2 (East) | 81.56% | 80.34% | 83.88% | 85.88%
RO3 (South) | 87.50% | 74.71% | 72.99% | 74.28%
RO4 (West) | 87.51% | 89.88% | 75.24% | 79.53%
Source: Own estimation based on the imputed EU-SILC data.

Table 7. Tax compliance: "cell" approach, women (measured at median income)
Age/region | <30 | 30–39 | 40–49 | 50–59
RO1 (North) | 111.38% | 92.07% | 93.91% | 91.52%
RO2 (East) | 96.84% | 95.00% | 96.80% | 111.93%
RO3 (South) | 105.33% | 101.04% | 103.25% | 109.95%
RO4 (West) | 127.60% | 97.92% | 98.90% | 110.19%
Source: Own estimation based on the imputed EU-SILC data.

Our results show that tax compliance is highest among workers younger than 30 and older than 60, possibly due to selection bias, whereby people with better jobs tend to stay longer in the labor market. The research also reveals that tax compliance varies across regions, with Macroregion One (North) having the lowest compliance rate. In contrast, the compliance rate is medium in Macroregion Three (South) and higher in the other two regions.
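The cell approach amounts to grouping individuals by gender, region, and age group and computing the median ratio within each group. The sketch below illustrates this on a few made-up records. The field names, the unweighted medians, and the 60+ label for the fifth age group (which the tables above do not show) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def age_group(age: int) -> str:
    """Assign one of five age groups; the '60+' label is an assumption."""
    if age < 30:
        return "<30"
    if age < 40:
        return "30-39"
    if age < 50:
        return "40-49"
    if age < 60:
        return "50-59"
    return "60+"


def cell_compliance(records: list[dict]) -> dict[tuple[str, str, str], float]:
    """Median imputed tax income over median survey income per cell, in %."""
    cells: dict[tuple[str, str, str], tuple[list, list]] = defaultdict(
        lambda: ([], []))
    for record in records:
        key = (record["gender"], record["region"], age_group(record["age"]))
        cells[key][0].append(record["survey_income"])
        cells[key][1].append(record["imputed_tax_income"])
    return {key: 100.0 * median(reported) / median(true)
            for key, (true, reported) in cells.items()}


# Hypothetical individuals, not taken from the paper's data.
records = [
    {"gender": "M", "region": "RO1", "age": 34,
     "survey_income": 10000, "imputed_tax_income": 7000},
    {"gender": "M", "region": "RO1", "age": 38,
     "survey_income": 12000, "imputed_tax_income": 8000},
    {"gender": "F", "region": "RO1", "age": 27,
     "survey_income": 8000, "imputed_tax_income": 8800},
]

result = cell_compliance(records)
for cell, rate in sorted(result.items()):
    print(cell, f"{rate:.2f}%")
```

With 40 cells and a few thousand employed respondents, some cells contain few observations, so cell-level ratios are noisier than the overall median ratio.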
This indicates that regional differences in factors affecting tax compliance, such as the size of the informal economy or the quality of tax administration, may exist. Further investigation may be necessary to understand these regional differences and address the compliance issues.

Finally, we discuss the impact of non-compliance on the share of minimum-wage earners. Estimating the share of minimum wage earners is important for several reasons. Firstly, it allows for assessing the prevalence of low-wage work in the economy, a crucial indicator of income inequality and poverty. Secondly, it can inform policy decisions regarding minimum wage setting and targeting social protection programs for low-income households. Additionally, understanding the characteristics of minimum wage earners, such as their age, gender, education level, and employment industry, can provide insights into the broader labor market dynamics and help identify vulnerable groups needing support.

The proportion of minimum wage earners in Romania is notably high when estimated from tax data. Over a quarter of all employees are minimum wage earners, and in some sectors, this share exceeds 50%. Based on administrative tax data, previous research estimated the share of minimum wage earners in Romania in 2020–2021 and described their profile (Robayo-Abril, Zamfir, and Wroński, 2024). According to these results, more than 25% of all employees earn the minimum wage. The proportion of minimum wage earners varies across branches of the economy and regions of the country. The share of minimum wage earners is exceptionally high in microenterprises (fewer than ten employees), where 67% of all employees are minimum wage earners. In construction and in accommodation and food services, more than 50% of employees earn only the minimum wage; in the transport and storage sector, this share is over 40%. The proportion of minimum wage earners is higher in low-income regions.
The share of minimum wage earners is highest among the youngest and oldest employees. However, the estimated share of minimum-wage earners in tax data may be subject to measurement error due to tax evasion and unreported income; in particular, it may be biased upwards. Workers and employers may underreport actual earnings to avoid paying taxes or to receive government benefits. This behavior could lead to an upward bias in estimating the share of minimum wage earners in the tax data. Moreover, envelope payments, typically made in cash and not reported to tax authorities, are more prevalent among workers in low-wage occupations, further complicating the measurement of the share of minimum-wage earners based on tax data. It is theoretically possible that some employees earn the minimum wage formally and additional income informally, e.g., by receiving envelope payments.

To assess the reliability of the estimate of the share of minimum wage earners based on tax data, we compare the share of minimum wage earners in survey data, tax data, and imputed survey data. We define minimum wage earners in the same way as Robayo-Abril, Zamfir & Wroński (2024); we cover the separate minimum wage for the construction sector, and we do not cover the separate minimum wage for employees with higher education. 16 Due to the significant gap in the share of part-time workers in EU-SILC and tax data (5% and over 20%, respectively 17), we focus only on full-time employees. The outcome of this comparison is presented in Table 8.

16 Thus, we define minimum wage workers as those earning between 95% of the minimum wage for employees without higher education and 105% of the minimum wage for employees with higher education and those who earn between 95% and 105% of the minimum wage for the construction sector. Robayo-Abril, Zamfir, and Wroński (2024) describe the institutional setting of minimum wage legislation in Romania in detail.
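The classification into income bands around the minimum wage can be sketched as below. The annual minimum wage values are those quoted earlier in the text (5,531 EUR in general and 7,441 EUR in construction). As a simplification, the sketch applies a plain 95%–105% band around the relevant minimum wage and ignores the higher-education minimum wage mentioned in footnote 16, whose euro value is not given in the text; the function names and sample workers are hypothetical.

```python
from collections import Counter

GENERAL_MINIMUM_WAGE = 5531.0       # annual, EUR, employees without higher education
CONSTRUCTION_MINIMUM_WAGE = 7441.0  # annual, EUR, construction sector


def classify(annual_income: float, sector: str) -> str:
    """Place a full-time employee below, within, or above the 95-105% band."""
    if sector == "construction":
        minimum_wage = CONSTRUCTION_MINIMUM_WAGE
    else:
        minimum_wage = GENERAL_MINIMUM_WAGE
    if annual_income < 0.95 * minimum_wage:
        return "below"
    if annual_income <= 1.05 * minimum_wage:
        return "minimum_wage"
    return "above"


def band_shares(workers: list[tuple[float, str]]) -> dict[str, float]:
    """Percentage of workers in each band."""
    counts = Counter(classify(income, sector) for income, sector in workers)
    return {band: 100.0 * counts[band] / len(workers)
            for band in ("below", "minimum_wage", "above")}


# Hypothetical full-time employees: (annual income in EUR, sector).
workers = [
    (5531.0, "trade"),
    (7441.0, "construction"),
    (5531.0, "construction"),
    (9539.0, "trade"),
]
shares = band_shares(workers)
print(shares)
```

The same income can fall in different bands depending on the sector: 5,531 EUR is the minimum wage in trade but lies below the 95% threshold in construction.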
In July 2020, a much larger proportion of full-time employees earned the minimum wage or less in the tax administration data than in the EU-SILC. In the tax administration data, only a small fraction (2.7%) of all full-time employees earned less than 95% of the minimum wage, and about 24.7% earned between 95% and 105% of the minimum wage. Thus, the share of minimum wage earners (those earning up to 105% of the minimum wage) is 27.4%. In the EU-SILC data, this share is significantly lower, at 11.95%. In the imputed EU-SILC, the share lies between the EU-SILC and tax data values, at 23.04%. The results presented in Table 8 suggest that the share of minimum wage earners estimated from tax data may be biased upwards. However, even if the actual share of minimum wage earners is lower than in the tax data, it is still among the highest in the European Union.

The lower share of minimum wage earners in the EU-SILC and imputed EU-SILC data may also be explained by the fact that income is measured annually. Thus, some employees who earned the minimum wage for only part of the year do not fall within the minimum wage band over the whole year.

Our estimates of the share of minimum wage earners in the survey data are close to estimates based on other surveys. For example, Eurostat, based on EU-SILC 2018, estimated the share of minimum wage earners in Romania to be 13% for men and 18% for women. Based on the Structure of Earnings Survey 2018, Eurostat estimated the share of minimum wage earners to be 13%.18

17 The higher share of part-time workers in tax data may be driven by COVID-related reductions in working time, which are not recorded in the EU-SILC.
18 https://ec.europa.eu/eurostat/statistics-explained/index.php?title=File:Proportion_of_employees_earning_less_than_105_%25_of_the_minimum_wage,_2018_(%25).png

Table 8. The share of minimum wage earners in survey data, imputed survey data, and tax data (only full-time employees).
                        EU-SILC    Imputed EU-SILC    Tax data (July 2020)
Less than 95% of MW       5.11%              5.44%                    2.7%
95–105% of MW             6.84%              17.6%                   24.7%
More than 105% of MW     88.06%             76.96%                   72.6%

Note: MW = minimum wage.
Source: Own estimation based on the imputed EU-SILC data.

IV. Conclusion

This paper assesses tax compliance and the underreporting of income in Romania, with a specific emphasis on minimum wage workers. The study aims to understand the distributional implications of tax compliance and its links with minimum wage policy and design. The paper highlights that tax evasion has significant redistributive impacts, affecting the incidence and distribution of taxes. Reducing tax evasion is crucial for ensuring the effectiveness of the tax system and promoting greater income equality.

The study also emphasizes the links between tax evasion and social policy, particularly regarding the distribution of social benefits. Tax evasion can undermine the progressivity of the tax system, leading to fiscal losses and reduced social welfare for eligible recipients. The research also explores the interaction between labor tax evasion and minimum wage policy, emphasizing that payroll tax evaders may be overrepresented among minimum wage earners. This raises concerns about the effectiveness of minimum wage policy and points to the need to address tax compliance issues among this group. The study underscores the importance of addressing tax compliance, evasion, and transparency issues to strengthen the fiscal position and generate more revenue.

The study provides an estimated range for non-compliance and underreporting of income in Romania, both overall and for different population groups, and explores the characteristics associated with non-compliance. It examines the extent and distribution of tax evasion among different income groups, regions, and sectors of the economy. The goal is to identify the factors contributing to tax evasion and to develop measures to reduce it.
To investigate tax evasion in Romania, we match tax administrative data with survey data (EU-SILC). We impute tax income to EU-SILC and compare the value of imputed tax income (reported income) with survey income (true income). According to our results, tax compliance at the median of the distribution is approximately 94%; it decreases to 83% at the 25th percentile and increases to 96% at the 10th percentile. Average tax compliance (after censoring of top incomes) is 94%, so the average underreporting of income is 6%. This result is close to the estimate for Hungary (9%–13%) by Bendek & Lelkes (2011), the estimate for Poland (6%) by Kiełczewska et al. (2021), and the estimate for Estonia (12%) by Paulus (2015).

Tax compliance varies across sectors of the economy. It is particularly low in transport, construction, and food and accommodation. Women are more tax-compliant than men. Tax compliance is highest among the youngest and oldest employees and lowest among those of prime working age. Tax compliance also varies across regions of the country; it is lowest in the North and highest in the East and West.

The share of minimum wage earners in the tax data is inflated by the underreporting of income. According to tax data, 27.5% of employees are minimum wage earners. According to the raw EU-SILC data, this share is 13.5%. After imputing tax income into the EU-SILC data, the share of minimum wage earners is significantly higher than in the raw survey data, at 22.3%.

Underreporting of tax income is an important challenge for public policy in Romania. The discrepancy between tax data and survey data should be further investigated. Tax evasion weakens the fiscal capacity of the country. Underreporting of income in the bottom half of the income distribution may result in an incorrect allocation of means-tested benefits and limit the efficiency of social policy.
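The percentile comparison summarized in this section can be illustrated with a short sketch. It assumes that compliance at a given percentile is measured as the ratio of the quantile of reported (imputed tax) income to the same quantile of survey income; the paper's exact estimator, the survey weights, and the censoring of top incomes are not reproduced here. The function names and the income values are hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, for 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    weight = position - low
    return ordered[low] * (1 - weight) + ordered[high] * weight


def compliance_ratio(reported, survey, q):
    """Reported (tax) income as a share of survey income at quantile q."""
    return quantile(reported, q) / quantile(survey, q)


# Made-up incomes, chosen only to illustrate the calculation.
survey_income = [1000, 2000, 3000, 4000, 5000]
reported_income = [900, 1660, 2700, 3800, 4900]

median_compliance = compliance_ratio(reported_income, survey_income, 0.50)
p25_compliance = compliance_ratio(reported_income, survey_income, 0.25)
median_underreporting = 1 - median_compliance
```

Under this definition, underreporting at a percentile is simply one minus the compliance ratio, which is how an average compliance of 94% corresponds to an average underreporting of 6%.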
The estimates of the share of minimum wage earners based on the tax data should be treated cautiously. Our results suggest that approximately one-quarter of the minimum wage earners receive envelope wages and earn more than the minimum wage.

References

Adăscăliţei, D., Rat, C. and Spătari, M. (2020). Improving Social Protection in Romania. Friedrich-Ebert-Stiftung Romania.
Alstadsæter, A., Johannesen, N. and Zucman, G. (2019). Tax Evasion and Inequality. American Economic Review, 109(6), 2073–2103.
Argentiero, A., Casal, S., Mittone, L. and Morreale, A. (2021). Tax evasion and inequality: some theoretical and empirical insights. Economics of Governance, 22, 309–320.
Atkinson, A. B., Piketty, T. and Saez, E. (2011). Top Incomes in the Long Run of History. Journal of Economic Literature, 49(1), 3–71.
Bacher, J., & Prandner, D. (2018). Datenfusion in der sozialwissenschaftlichen Wahlforschung–Begründeter Verzicht oder ungenutzte Chance? Theoretische Vorüberlegungen, Verfahrensüberblick und ein erster Erfahrungsbericht. Österreichische Zeitschrift für Politikwissenschaft, 47(2), 61–76.
Barth, E., & Ognedal, T. (2018). Tax Evasion in Firms. LABOUR, 32(1), 23–44.
Bendek, D. and Lelkes, E. (2011). The Distributional Implications of Income Under-Reporting in Hungary. Fiscal Studies, 32(4), 539–560.
Brzeziński, M., Sałach, K. and Wroński, M. (2020). Wealth inequality in Central and Eastern Europe: evidence from joined survey and rich lists' data. Economics of Transition and Institutional Change, 28(4), 637–660.
D’Attoma, J., Volintiru, C., & Steinmo, S. (2017). Willing to share? Tax compliance and gender in Europe and America. Research & Politics, 4(2). https://doi.org/10.1177/2053168017707151
Di Nola, A., Kocharkov, G., & Vasilev, A. (2019). Envelope wages, hidden production, and labor productivity. The BE Journal of Macroeconomics, 19(2), 1–30.
Donatiello, G., D’Orazio, M., Frattarola, D., Rizzi, A., Scanu, M., & Spaziani, M. (2016a). The statistical matching of EU-SILC and HBS at ISTAT: Where do we stand for the production of official statistics? In DGINS–Conference of the Directors General of the National Statistical Institutes (pp. 2627).
Donatiello, G., D’Orazio, M., Frattarola, D., Rizzi, A., Scanu, M., & Spaziani, M. (2016b). The role of the conditional independence assumption in statistically matching income and consumption. Statistical Journal of the IAOS, 32(4), 667–675.
Flaichere, E., Lustig, N. and Vigorito, A. (2022). Underreporting of Top Incomes and Inequality: A Comparison of Correction Methods using Simulations and Linked Survey and Tax Data. Review of Income and Wealth, Early View.
Gavoille, N. and Zasova, A. (2023). Minimum wage spike and income underreporting: A back-of-the-envelope-wage analysis. Journal of Comparative Economics, 51(1), 372–402.
Guyton, J., Langetieg, P., Reck, D., Rish, M. & Zucman, G. (2021). Tax Evasion at the Top of the Income Distribution: Theory and Evidence. NBER WP 28542, National Bureau of Economic Research.
Horodnic, I. A., Williams, C. C. and Ianole-Călin, R. (2020). Does higher cash-in-hand income motivate young people to engage in under-declared employment. Eastern Journal of European Studies, 11(2), 48–69.
Horodnic, I. A. and Williams, C. C. (2021). Cash wage payments in transition economies: Consequences of envelope wages. IZA World of Labor 2021: 280.
International Monetary Fund (2021). Fiscal Monitor Database of Country Fiscal Measures in Response to the COVID–19 Pandemic (October 2021). https://www.imf.org/en/Topics/imf-and-covid19/Fiscal-Policies-Database-in-Response-to-COVID-19 (accessed on April 24, 2023).
Kiełczewska, A., Kośny, M. and Sawulski, J. (2021). Skala płacenia pod stołem w Polsce. Warszawa: Polski Instytut Ekonomiczny.
Kukk, M. and Staehr, K. (2014). Income underreporting by households with business income: evidence from Estonia. Post-Communist Economies, 26(2), 257–276.
Künn, S. (2015). The challenges of linking survey and administrative data. IZA World of Labor 2015: 214.
Lamarche, P., Oehler, F. & Rioboo, I. (2020). European household's income, consumption, and wealth. Statistical Journal of the IAOS, 36(4), 1175–1188.
Leenders, W., Lejour, A., Rabaté, S. and van ’t Riet, M. (2023). Offshore tax evasion and wealth inequality: Evidence from a tax amnesty in the Netherlands. Journal of Public Economics, 217, 104785.
Leulescu, A., & Agafitei, M. (2013). Statistical matching: a model based approach for data integration. Eurostat-Methodologies and Working papers, 10-2.
Lewaa, I., Hafez, M. S. and Ismail, M. A. (2021). Data integration using statistical matching techniques: A review. Statistical Journal of the IAOS, (Preprint), 1–20.
Lopez-Luzuriaga, A., Calijuri, M., Pessino, C., Schächtele, S., Gonzalez, U. & Chamorro, C. (2023). Detecting Envelope Wages with E-billing Information. International Tax and Public Finance. https://doi.org/10.1007/s10797-023-09811-y
Lustig, N. (2019). The “Missing Rich” in Household Surveys: Causes and Correction Approaches. Working Paper No. 75. CEQ Institute, Tulane University.
Murray, J. S. (2018). Multiple Imputation: A Review of Practical and Theoretical Findings. Statistical Science, 33(2), 142–159.
Nygård, O. E., Slemrod, J. and Thoresen, T. O. (2019). Distributional Implications of Joint Tax Evasion. The Economic Journal, 129(620), 1894–1923.
Paulus, A. (2015). Tax evasion and measurement error: An econometric analysis of survey data linked with tax records. ISER Working Paper No. 2015-10, University of Essex.
Ravallion, M. (2022). Missing Top Income Recipients. Journal of Economic Inequality, 20(1), 205–222.
Robayo, M. and Rude, B. (2024). Statistically Matching Income and Consumption Data: The Case of Romania. Forthcoming, World Bank.
Robayo-Abril, A., Zamfir, M. & Wronski, M. (2024). Simulating Aggregate and Distributional Effects of Minimum Wages Increases in Romania: Evidence from Survey and Administrative Data. Forthcoming.
Rubin, D. B. (1987). Multiple Imputation for Non-response in Surveys. New York: Wiley.
Schneider, F. (2015). Size and Development of the Shadow Economy of 31 European and 5 Other OECD Countries from 2003 to 2015: Different Developments. Linz: Johannes Kepler University.
Tonin, M. (2011). Minimum wage and tax evasion: Theory and evidence. Journal of Public Economics, 95(11-12), 1635–1651.
Törmälehto, V.-M. (2019). Reconciliation of EU statistics on income and living conditions (EU-SILC) data with national accounts. Statistical Working Papers, Luxembourg: Eurostat.
Williams, C. C., & Horodnic, I. A. (2017). Evaluating the illegal employer practice of under-reporting employees’ salaries. British Journal of Industrial Relations, 55(1), 83–111.
Yang, S. and Kim, J. S. (2020). Statistical data integration in survey sampling: a review. Japanese Journal of Statistics and Data Science, 3, 625–650.
Yonzan, N., Milanovic, B., Morelli, S. and Gornick, J. (2022). Drawing a Line: Comparing the Estimation of Top Incomes between Tax and Household Survey Data. Journal of Economic Inequality, 20(1), 67–95.