Sex-disaggregating Tax Administrative Data: Experience from Colombia’s Tax and Customs Authority

Knowledge Note, November 2024

Gender and Tax Dialogue: A series on Gender and Fiscal

This series delves into the often-overlooked intersection of gender and taxation, revealing how tax policies and their administration can impact men and women differently across the globe. Each knowledge note will explore the subtleties of tax policy, tax and customs administrations, and tax compliance through a gender lens, uncovering both challenges and opportunities for creating an efficient and progressive tax system. Such a system is important for facilitating women’s economic participation and agency. Join us as we unpack the complexities of gender equality in taxation, highlighting transformative stories and innovative strategies that aim to promote inclusive economic growth and sustainable development.

This Knowledge Note was prepared by the Integrating Gender Equality into Tax Reform project team of the World Bank’s Global Tax Program, which is led by Ceren Ozer (Senior Economist, Fiscal Policy and Sustainable Growth Unit, MTI).

Luis Fernando Gamboa Niño (Former Advisor to the Director General of DIAN), Luis Carlos Reyes (Minister of Commerce, Industry and Tourism of Colombia and former Director General of DIAN), Ana Maria Tribin (Senior Economist, World Bank) and Hitomi Komatsu (Gender Economist, World Bank) authored this Note. Authors would like to thank Eduardo Iriondo for the excellent research assistance and DIAN’s former and current staff, Javier Ávila Mahecha, Adriana Yaneth Plazas Cadena, Irayda Ximena Lara Chaves, Juan Guillermo Caicedo Useche, Olga Adriana Puerto Gonzalez, María Fernanda Osorio Moreno, Pastor Hamleth Sierra Reyes, and David Gustavo Suarez Castellanos for sharing their knowledge and experience in the interviews carried out during April and May of 2024. Authors would also like to thank Alejandro Montoya (DIAN), and Rafael Munoz Moreno (Lead Country Economist), Melise Jaud (Senior Economist), Anne Brockmeyer (Senior Economist), and Thiago Scot (Economist) from the World Bank for peer reviewing this Note.

The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Attribution
Please cite the work as follows: “Gamboa, Luis Fernando; Reyes, Luis Carlos; Tribin, Ana Maria; and Komatsu, Hitomi (2024). Sex-disaggregating Tax Administrative Data: Experience from Colombia’s Tax and Customs Authority. Gender and Tax Dialogue Knowledge Note. Washington, DC.: World Bank.”

Cover photo: Dookh Press / World Bank. Report’s design: Giannina Raffo, Adrián Lizaldre.

EXECUTIVE SUMMARY

The 2022 Colombian Tax Reform gave the National Tax and Customs Authority (DIAN) the mandate to conduct gender-focused studies. Subsequently, DIAN established institutional structures, developed strategies for sex-disaggregation in the taxpayer registry database and personal income tax return forms, and analyzed disaggregated tax data. This Knowledge Note documents DIAN’s experience in doing so by outlining the institutional strategy, methodologies used in the past and at present, and challenges encountered in the process. It aims to offer lessons for other revenue administrations and government agencies planning to sex-disaggregate and analyze tax data.

1. Tax data disaggregation by women and men

Over the years, DIAN has undertaken numerous initiatives to integrate gender-focused analysis into tax data, but many of these efforts have faced significant limitations.
DIAN is collaborating with the National Civil Registry of Colombia, the institution responsible for civil registration, to identify the sex of taxpayers in the tax administrative data under a restrictive information agreement between the two agencies. This agreement explicitly states who can access the data and how it can be used. However, data sharing between the two organizations is limited because there is no legal framework for inter-agency exchange of private information. Under the restrictive agreement, DIAN cannot access the entire Civil Registry database or use it for economic analysis. Instead, it needs to validate each tax filer’s sex individually through the available online system, which makes validating already registered individuals too time-consuming. Hence, validation of taxpayers’ sex is restricted to newly registered individuals. For this reason, DIAN opted to develop its own methods for sex-disaggregation.

The most recent strategy by DIAN to disaggregate personal income tax data by sex (women and men) involved:
1) Merging the taxpayer database and pension data using the national ID numbers (42.8% of individuals could be matched);
2) Using an ID number rule, which reveals the sex of individuals who obtained their ID before the year 2000, when men and women were assigned different series of numbers (a further 34.6% of individuals could be classified); and
3) Using an algorithm designed to categorize taxpayers based on a list of names historically attributable to each sex in Colombia. This name-based algorithm is applied to individuals born after 2000, for whom the ID rule could not be used. The model assigns a probability of being male or female based on characteristics of the first and second given names, such as letter structure and number of consonants.

This is the method that DIAN uses to conduct and publish gender analysis of tax administrative data.

2. Data disaggregation by male, female, non-binary, and transgender in the tax registry database and tax returns

In addition to classifying income taxpayers into women or men using the above method, DIAN has, since 2022, invited taxpayers to voluntarily report their sex according to one of four categories (male, female, non-binary, and transgender) in both the tax registry database and tax returns. In 2023, out of 5.4 million taxpayers, about 1 million individuals voluntarily declared their sex. However, the option for self-declaration of sex during registration has been discontinued due to the sensitive nature of collecting this information. Instead, since 2024, DIAN has retrieved this data from identity documents in the National Civil Registry database for new registrants, under a restrictive information-sharing agreement.

The disaggregated data with four sex categories has not yet been fully processed for analysis, but this work is underway. DIAN is in the process of integrating and validating the information, which will enable future studies to incorporate taxpayers’ self-reported sex.

Current efforts for disaggregation have focused on the regular income tax regime. In the next phase, this approach will likely be applied to other tax regimes, such as the simplified tax regime, and extended to additional disaggregation, such as by marital status, that could further enrich the understanding of tax data by sex.

3. Lessons learnt

Some of the lessons learnt from DIAN’s experience in sex-disaggregation among income taxpayers are as follows:
• The legal mandate given to DIAN through the 2022 Tax Reform and the commitment and leadership of the former Director General of DIAN, who established an institutional strategy (including setting up the Unit for Differential Focus and Gender (PLURAL)), created the institutional space to study gender issues.
• Competent technical staff who led the development of the methodologies significantly contributed to the progress made in sex-disaggregating and analyzing data.
• Inter-agency collaboration in data sharing is essential. If DIAN had a legal mandate to fully access taxpayers’ sex in the National Civil Registry, it would not need to develop its own probabilistic allocation models. Full access would also reduce duplication of effort and avoid inaccurate predictions and inconsistencies in the data.
• One challenge in asking taxpayers to self-report their sex (male, female, non-binary, and transgender) is its voluntary nature: fewer than 20% of taxpayers completed this question in their tax returns.
• Some individuals expressed discomfort or annoyance when they were asked about their sex (male, female, non-binary, and transgender) during in-person tax registrations. This led to the discontinuation of this practice for tax registrations altogether.
• Given these challenges, DIAN decided to develop its own methods of predicting taxpayers’ sex by merging taxpayer and pension data, applying the ID rule, and using taxpayers’ first names, as described above. However, these methods can produce prediction errors.
• The handling of sensitive data requires designing protocols and tools that guarantee adequate care and the possibility of data recovery in case of data loss.

1. INTRODUCTION

In 2022, Colombia’s National Tax and Customs Authority (DIAN) was given the mandate through Article 90 of Law 2277 to establish the necessary information for studies, cross-referencing, and statistical analyses with a gender focus. Many of the statistics on gender wage gaps and income distribution in Colombia are calculated using household surveys (mainly the Colombian Great Integrated Household Survey, GEIH [1]).
While household surveys cover low-income individuals and informal workers who do not pay taxes, one disadvantage of these surveys is the underrepresentation of the wealthiest individuals. In contrast, DIAN’s data cover formal taxpayers and high-income individuals, which is important for measuring gender income gaps, particularly at the top of the income distribution. [2]

This Knowledge Note aims to document DIAN’s experience in sex-disaggregating income taxpayer data and provide examples of the use of disaggregated data for policy analysis. It offers lessons for other revenue authorities and government agencies planning to sex-disaggregate and analyze administrative tax data. It summarizes the institutional strategies, methodologies used, and challenges encountered in this process based on interviews with experts and government officials. We use the term “sex” to mean biological sex at birth unless explicitly stated otherwise.

Studies documenting revenue authorities’ efforts to identify the sex of taxpayers are limited. In a survey of OECD countries, 25 of the 43 countries have disaggregated data for analysis, and tax authorities in only 16 countries can retrieve this data from the tax returns of the personal income tax and social security contributions (OECD 2023). The Africa Tax Administration Forum (ATAF 2021) reports that revenue authorities in ATAF countries do not collect taxpayers’ sex, except for Uganda, although the Rwanda Revenue Authority has recently collected this information and used it for policy analysis (Ruganintwali 2023).

Sex-disaggregated tax data can provide insights on gender differences in the income distribution and in the incomes of high-net-worth individuals. Women comprise 11% of the top 0.1% of labor income earners in the U.S. (Piketty, Saez, and Zucman 2018) and 11% and 20% of the top 0.1% of total income earners in Denmark and Spain, respectively (Atkinson, Casarico, and Voitchovsky 2018).
In Honduras, women make up 30% of the top 0.1% of income earners with undistributed corporate profits (Del Carmen et al. 2024), which is similar to Colombia, where the female share is 28.2% (DIAN 2024). Further disaggregation by the type of income reported to authorities (e.g. labor income, undistributed corporate profits, rental income) can indicate the source of gender inequality and whether inequality persists over time. Regarding the tax burden, Lin and Slemrod (2024) find that women face lower effective tax rates in the U.S. federal income tax because of women’s lower incomes and the progressivity of the tax system, while Del Carmen et al. (2024) find no gender difference in Honduras.

[1] GEIH is a survey on household demographics, income, employment, and housing.
[2] DIAN sets thresholds for tax declarations related to income, assets, purchases, consumption, deposits, and transfers from the previous year, and therefore the tax data do not include low-income individuals. Tax data also do not include underreported income. For the 2022 fiscal year, DIAN (2023) reported that about 5.4 million people filed income in their tax returns, representing a 21% increase (equivalent to 1.08 million taxpayers) compared to the 2021 fiscal year. Approximately 2.7 million of the 5.4 million filers had non-zero taxes to pay on their income after deductions.

Some governments regularly analyze and publish sex-disaggregated data for policy analysis. The Government of Canada analyzes tax expenditures on an annual basis using personal income tax data, which shows that women tend to benefit from deductions for childcare expenses and disability support while men benefit from exemptions for capital gains (Government of Canada 2021). The Government of Ireland also publishes the Annual Budget, which shows gender differences in the implications of specific tax deductions (Government of Ireland 2024). There are also efforts by regional tax organizations.
For example, ATAF is providing support to revenue authorities to collect sex-disaggregated tax data through the publication of its flagship report, the African Tax Outlook (ATAF 2022).

Until recently, despite isolated efforts by DIAN researchers for more than a decade, no institutional strategy had been established to disaggregate tax data by sex (women and men). In 2021, DIAN began coordinating and centralizing initiatives to include sex-disaggregated data to conduct analyses that promote inclusive and equitable policies. Additionally, steps have been taken to collect self-reported gender identity (male, female, non-binary, and transgender).

2. METHODOLOGIES AND INSTITUTIONAL STRATEGIES FOR DATA DISAGGREGATION OVER TIME

Efforts to disaggregate tax data within DIAN went through several stages over time, as outlined in Table 1. There were also legal reforms that gave DIAN the mandate to collect and analyze sex-disaggregated data, and institutional strategies that were established for this purpose. These are discussed in this section.

Table 1: Legal reforms, institutional strategies, and methodologies for sex-disaggregation of tax administrative data over time

| Phase | Description | Challenges |
|---|---|---|
| Early 2010s | There was no institutional strategy for data disaggregation by sex. Isolated efforts by researchers. | Lack of coordination and an institutional strategy. |
| Mid-2010s | Two-phase process using ID number rules and name-based algorithm was used for sex classification. | Errors in classification due to uncommon and foreign names. |
| DIAN management in 2021 | DIAN’s management took the initiative to study gender implications using sex-disaggregated tax statistics. | None specifically noted for this phase. |
| 2022 Colombian Tax Reform | 2022 Colombian Tax Reform gave DIAN the mandate to conduct gender-focused studies. It also mandates DIAN to collect taxpayers’ sex in the tax returns. | None specifically noted for this phase. |
| Two working groups within DIAN in 2022 | Two working groups (the RUT team and the analytics and economic studies team) were formed to handle data disaggregation. | None specifically noted for this phase. |
| Disaggregation and gender analysis since 2022 | A three-step process was developed to disaggregate by women and men using taxpayer and pension data, ID identification rules, and a [name-based algorithm]. | Manual process for new registrants; limited by data compatibility. |
| Voluntary declaration option in tax returns and tax registration in 2022 | DIAN introduced a voluntary sex declaration option (male, female, non-binary, transgender) in tax returns and RUT forms. The option for self-declaration in the RUT form was discontinued in 2024. | Low completion rate and potential inaccuracies. |
| Establishment of Unit for Differential Focus and Gender (PLURAL) in DIAN in 2023 | PLURAL Unit was created to ensure that DIAN’s internal policies and work conditions are free of gender biases. It also conducts analyses of taxes with a gender focus. | Institutional adjustments to new gender policies. |
| National Civil Registry Agreement in 2024 | An agreement with the National Civil Registry was established to use data for sex classification, though DIAN’s access is restricted to validation purposes only. DIAN cannot access the entire Civil Registry data for analysis or publication. | Privacy concerns and data-sharing limitations. |
| Use of disaggregated data | DIAN uses sex-disaggregated data for internal analysis and policy evaluation. | Data currently being processed for publication. |

2.1. Disaggregation process in the mid-2010s

There have been several decentralized efforts for sex-disaggregation of data. These began in the mid-2010s, when access to microdata from tax returns that included the identification numbers and names of taxpayers became available. Additionally, these efforts proceeded without any formal information exchange agreement with other government entities to obtain sex data from their databases.
The disaggregation process in the mid-2010s took place in two phases.

• Identification number rule: In the first phase, a specific rule on the identification numbers of people who were issued an ID before 2000 (Decree 1695, 1971) was used. According to this rule, identification documents with numbers in the range of 1,000,000 to 20,000,000 or 70,000,001 to 100,000,000 correspond to men, and those in the range of 20,000,001 to 70,000,000 correspond to women. [3] This rule allowed for an initial classification by sex. [4] This method could not be used for individuals born after 2000, to whom the ID rule was not applied.
• Algorithm using names: The second phase used an algorithm that identified common characteristics in male and female names and determined patterns within them. For example, names ending with the letter “A” tend to correspond to women in Colombia, while those with the first name “Juan” are commonly men. Using this technique, sex was assigned to the remaining cases. However, approximately 2% of individuals remained without a clear classification due to uncommon names, foreign names, or names used interchangeably between men and women. This code is no longer in use within DIAN because other methods, such as the identification rule, are more efficient.

Efforts to disaggregate data within DIAN were discontinued between the late 2010s and 2021. During this period, academic studies by Ávila-Mahecha (2016) and Londoño-Vélez & Ávila-Mahecha (2024) provided important insights. However, much of the previous work on sex-disaggregation was not fully integrated into institutional knowledge, as it was carried out by individual researchers and faced technical challenges. As a result, some of the earlier efforts were lost, and the institutional processes were not maintained.

[3] Additionally, for the identity card (document for minors), which consists of 11 characters, the penultimate digit determines the sex: even for males and odd for females.
[4] This technique has also been used for other taxable years. The recovery percentage was 80.09% of the sample using the consecutive number of the ID card for the year 2019, 76.85% for 2020, 72.33% for 2021, and 64.76% for 2022. The recovery percentage decreases in recent years because young tax filers were not assigned an ID number under this rule and the share of young filers increased during this period.

2.2. Disaggregation methods since 2022

DIAN’s Department of Economic Studies and the Analytics Department developed a statistical model to predict the sex of income taxpayers in the Unified Tax Registry (RUT), using name-based algorithms (among other methodologies) to classify sex from taxpayers’ names. The RUT is a DIAN-managed system that identifies, locates, and classifies all individuals and companies required to pay taxes, file income and asset tax returns, and fulfill other tax obligations. Registration in the RUT is mandatory for filing income tax returns. DIAN also has access to additional information about individuals’ sex through a previous agreement with the Pension and Parafiscal Management Unit (UGPP). [5]

The disaggregation model, developed by the Analytics Department at DIAN since 2022, was inspired by the one used in the mid-2010s. Three steps were taken to disaggregate data by women and men:

1. Merging taxpayer and pension data: This step used two databases: the registry of all taxpayers (RUT database) and the pension database (UGPP). The RUT database for the year 2022 comprises 18.3 million observations, of which 42.8% could be cross-referenced with UGPP information (see Figure 1). In these cases, the sex classification from UGPP was kept and the data were merged based on ID numbers (this covered 7.8 million taxpayers, corresponding to 70% of income declarants in 2020).
2. ID identification rule: Next, the ID card identification rule, which reveals sex for individuals issued an ID before 2000, was used. This enabled an additional 34.6% of the database to be classified.
3. Name-based algorithm technique: For the remaining cases, a generalized linear model for sex classification by name was developed using an algorithm technique (see Annex 1). This method was used for individuals born after 2000, for whom the ID rule could not be applied. The model assigns a probability of being male or female based on characteristics of the first and second given names, such as letter structure and number of consonants. [6] The model had 7,908,772 records in its training phase, achieving a 95% fit.

Finally, after all the exercises, only 0.17% of the original RUT database remained unclassified. This is the data that DIAN uses to conduct the sex-disaggregated analysis of tax data that it publishes.

[5] Unit responsible for monitoring and controlling contributions made to the Social Protection System.
[6] Colombians commonly have more than one given name and, in many cases, the position of each name indicates sex. For example, “Jose Maria” is a man’s name and “Maria José” is a woman’s name.

Figure 1: Analytics model disaggregation. 2022.

This disaggregation model could theoretically be applied to pre-2018 tax data. However, there were significant modifications to the structure of income tax reporting, including changes in wealth, deductions, and cost reporting, as well as adjustments in processing methodologies. This makes it difficult to compare the pre-2018 data with current data and can lead to discrepancies over time. Consequently, analyses have been restricted to gross income and income tax payments. DIAN intends to eventually disaggregate data from earlier years and carry out corresponding analyses using the limited comparable variables.

2.3. Self-declaration of sex in tax returns and registration

The 2022 Colombian tax reform [7] gave DIAN the legal mandate to conduct gender-focused studies, specifying that “DIAN will establish the necessary information to be disclosed in tax returns, allowing (the institution) the collection of information necessary for conducting studies, data analysis, statistical analysis with a gender focus, and proposing reductions in structural inequalities” (Ley 2277, 2022, Art. 90). The tax reform also mandates DIAN to collect taxpayers’ sex in the tax returns.

[7] Reform aimed at modifying a series of taxes to increase national revenue.

Since the 2022 tax reform, DIAN has collected information on the sex of the taxpayer in both the tax registry database (i.e. the RUT) and tax returns.

1) Starting in the 2022 fiscal year, taxpayers can voluntarily indicate their gender (male, female, non-binary, and transgender) in the electronic tax return system. Out of 5,438,850 taxpayers, around 1 million individuals did so in 2023. This data has not yet been fully processed for analysis, but processing is in progress. DIAN is actively working on integrating and validating this information.
2) For tax registrations, taxpayers could voluntarily self-report their gender by choosing one of four categories (namely, male, female, non-binary, and transgender); this option was incorporated in the RUT forms in October 2023. The RUT registration is a one-time process that can be completed either in person or online. However, the self-declaration of sex has been discontinued for RUT registrations because taxpayers became noticeably upset when they were asked about their sex in person by tax officials. Since then, DIAN has retrieved this information for new registrants from the identity documents in the National Civil Registry, under a restrictive information agreement with the National Civil Registry.

2.4. Institutional strategy established since 2021

a) DIAN’s management in 2021

Around 2021, DIAN’s management led the initiative to study statistical behavior and gender implications using tax statistics by focusing on an institutional strategy to disaggregate data by sex. Initially, an agreement was made to do so by collaborating with the National Civil Registry of Colombia [8], but progress on data sharing was slow because the Civil Registry raised privacy concerns. For this reason, DIAN subsequently opted to develop its own methods and models as well as an institutional strategy to support these efforts.

[8] Entity responsible for civil registration and identification of Colombians.

b) Working groups on sex disaggregation of tax data

Following the 2022 tax reform, working groups in DIAN were organized to determine the best way to sex-disaggregate data. Eventually, two working groups, the RUT team and the analytics and economic studies team, were formed to handle the data disaggregation. The RUT team focused on negotiating an inter-institutional agreement with the National Civil Registry. Meanwhile, the analytics and economic studies team worked on developing a model to disaggregate sex using names and identity card numbers.

c) Agreement with the National Civil Registry in 2024

After a lengthy process, a restrictive information agreement was reached between DIAN and the National Civil Registry at the beginning of 2024 for the use of Civil Registry data. The agreement explicitly states who can access data, what can be accessed, and how it can be used. But because DIAN cannot access the entire Civil Registry database, it needs to validate tax filers’ sex individually through the available online system, which makes validation of already registered individuals too time-consuming. DIAN retrieves the sex of new registrants from the identity documents in the National Civil Registry database [9] and records it in the tax registry database, the RUT.
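Because registry validation covers only new registrants, existing taxpayers are classified with the three-step approach of Section 2.2 (pension-data match, then the ID number rule, then a name-based model). The ordering of those steps can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not DIAN's code: the function names, the `pension_sex` lookup, and the sample records are hypothetical, and `sex_from_name` is a toy stand-in that encodes only the two name patterns mentioned in Section 2.1 (first name "Juan"; names ending in "a"), not the generalized linear model of Annex 1. The ID ranges are the ones quoted in Section 2.1.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# ID number ranges quoted in Section 2.1 (IDs issued before 2000).
MALE_RANGES = ((1_000_000, 20_000_000), (70_000_001, 100_000_000))
FEMALE_RANGES = ((20_000_001, 70_000_000),)


def sex_from_id_number(id_number: int) -> Optional[str]:
    """Return 'M' or 'F' if the ID falls in a sex-specific range, else None."""
    if any(lo <= id_number <= hi for lo, hi in MALE_RANGES):
        return "M"
    if any(lo <= id_number <= hi for lo, hi in FEMALE_RANGES):
        return "F"
    return None


def sex_from_name(first_name: str) -> Optional[str]:
    """Toy name heuristic (hypothetical stand-in, NOT the Annex 1 model)."""
    name = first_name.strip().lower()
    if name == "juan":
        return "M"
    if name.endswith("a"):
        return "F"
    return None


def classify(
    id_number: int, first_name: str, pension_sex: Dict[int, str]
) -> Tuple[Optional[str], str]:
    """Apply the three steps in order and record which step decided."""
    if id_number in pension_sex:          # step 1: pension-data match
        return pension_sex[id_number], "pension"
    sex = sex_from_id_number(id_number)   # step 2: ID number rule
    if sex is not None:
        return sex, "id_rule"
    sex = sex_from_name(first_name)       # step 3: name-based fallback
    if sex is not None:
        return sex, "name"
    return None, "unclassified"


# Made-up records for illustration only.
pension_sex = {1_012_345_678: "F"}
taxpayers = [
    (1_012_345_678, "Laura"),  # found in the pension lookup
    (35_000_000, "Carmen"),    # ID in the female range
    (79_500_000, "Pedro"),     # ID in the male range
    (1_100_000_001, "Juan"),   # outside the ranges, decided by name
    (1_100_000_002, "Alex"),   # no step applies
]
results = [classify(i, n, pension_sex) for i, n in taxpayers]
by_source = Counter(source for _, source in results)
```

Recording the step that produced each assignment makes it straightforward to report what share of records each step classified and what share remained unclassified, as the text does for the 2022 RUT database.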
Because it would be too time-consuming to verify this information for existing taxpayers, DIAN had to develop its own methods for sex-disaggregation. [10]

d) Establishment of Unit for Differential Focus and Gender (PLURAL) in DIAN

Simultaneously, through Circular number 000004 dated May 12, 2023, “the DIAN Director established the Unit for Differential Focus and Gender (PLURAL) within the entity, in order to undertake work plans, such as verifying that employment, compensation, promotion, welfare, health, safety, and labor flexibility, as well as governance and leadership, are developed without gender biases” (DIAN, 2023, Resolución 133). The initial efforts and objectives of PLURAL focused on gender equality within DIAN. Currently, PLURAL focuses on ensuring the absence of gender biases within DIAN, promoting a tax policy with a gender focus, and conducting statistical studies to identify and address detected gender gaps (DIAN, 2024).

[9] Historically, the National Civil Registry has limited its data to recording sex (male or female) based on the sex reported in the birth certificate. Since 2022, the Civil Registry has allowed individuals to change their sex to transgender or non-binary through a specific process. First, they must change the sex on their birth certificate and then submit a written request to the Civil Registry with the updated birth certificate to apply for the change. As of March 13, 2024, 100 ID cards with these new options had been issued in Bogotá, although not all of the holders necessarily file tax returns.
[10] Soon, a web service will automatically obtain the data from the National Civil Registry when entering the ID number.

3. USE OF DISAGGREGATED DATA

The sex-disaggregated data is available at DIAN for internal use and analysis. The databases contain variables such as assets (gross and net), liabilities, income (gross and net), costs and deductions, income tax, occasional profits, payable tax, and final balance.
From this data, information can be extracted on sector (through the main economic activity), geographical location (via the sectional address [11]), identification document, name, and, thanks to the disaggregation, sex, among other sub-variables. The Analytics Department delivered the results to the Department of Economic Studies, thus facilitating their incorporation into future analyses and studies that contribute to a better understanding of fiscal dynamics with a gender perspective.

Once an institutional strategy for cross-referencing information sources was implemented at DIAN, it became possible to begin coding income databases for longitudinal studies that include measurements of economic mobility, differences in marginal tax rates, and income and wealth inequality, among other topics. Some of these have been implemented by the Economic Studies Department in internal unpublished statistical analyses, which include variables of wealth, income, and tax obligations, disaggregated by sex and economic sector.

[11] Set of numbers identifying the geographical location of companies or individuals in Colombia, specifically indicating the location of their tax office. This information is included in the tax data.

3.1. Working paper by Ávila-Mahecha (2016)

The data obtained from sex disaggregation was used by Ávila-Mahecha (2016) in a DIAN working paper. [12] This document highlights gender differences in income, wealth, and level of debt. For the year 2014, in Colombia, the average wealth of female tax filers represented 92% of that of men, and their gross income averaged only 58% of male income. Regarding sectoral analyses, there are gender differences in some subsectors. Men are overrepresented in the construction and the mining and quarrying subsectors, while women predominate in the household employment and real estate subsectors. [13]

[12] The paper was prepared in the Department of Operational Analysis Management of DIAN.
[13] The author separated individuals into age groups (youth, middle-aged, and elderly) using the ID number sequence. Among both men and women, the middle-aged group has the highest assets and wealth compared to the youth and elderly. Gender income gaps vary by age. Elderly men have 75% more gross income than elderly women. Among middle-aged individuals, men earn 79.6% more than women, and among the youth, men earn 64.4% more than women.

3.2. Paper by Londoño-Vélez & Ávila-Mahecha (2024)

These data have also been used by Londoño-Vélez & Ávila-Mahecha (2024), who employ various databases, including a longitudinal panel of individual tax declarations covering the fiscal years 1993-2016 in Colombia. The authors study behavioral responses to personal wealth taxes in Colombia using this data and names in the Panama Papers to investigate the diversion of funds to tax havens. They find evidence that taxpayers immediately decrease declared wealth in response to a wealth tax. Regarding gender differences, they find that the proportion of men is higher among individuals named in the Panama Papers (63.4% among those named and 56.2% among those not named).

3.3. Analysis by PLURAL

PLURAL’s initial medium-term objectives include publishing tax statistics with a gender perspective for use by researchers and civil society, developing documents on gender income gaps, and studying the tax and customs regimes to identify gender gaps and propose adjustments through public policies. PLURAL is also working to ensure that DIAN’s internal policies and work conditions do not have gender biases.

Current efforts have focused on identifying the income of taxpayers under the regular income tax regime. In the next phase, this approach will likely be applied to other tax regimes, such as the simplified tax regime and assets, among others, as well as to additional analyses that could further enrich the understanding of tax data by sex.
These initiatives include the analysis of marital status and age of taxpayers and sectoral studies that examine how the dynamics of different industries and economic areas can affect men and women differently.

The PLURAL group has published sex-disaggregated data in two statistical reports. The first report analyzes the wealthiest individuals in Colombia (DIAN, 2024b). For 2021, the wealthiest 5% of adults in the country comprised around two million people. Of these, 50.1% were men and 49.9% were women. The proportion of women decreases as groups with greater wealth are analyzed. Of the wealthiest 1% of adults in Colombia, 45.7% were women and 54.3% were men. Finally, the wealthiest 0.01% of adults in Colombia comprises 70% men and 30% women.

Figure 2: Percentage of men and women in each wealth group. Colombia 2021
[Chart: shares of men and women, on a 0% to 100% axis, for the Top 0.01%, Top 0.1%, Top 1%, and Top 5% wealth groups.]
Source: DIAN (2024b). The data uses income tax returns of individuals and assimilated residents for the 2021 tax year (Form 210).¹⁴

¹⁴ The wealth groups were constructed based on net worth (box 31 of Form 210).

When analyzing income by sex, a similar trend is observed: of the top 5% of adults with the highest incomes in Colombia, approximately 60% are men and 40% are women. In the top 1% with the highest incomes, 65% are men and 35% are women; the share of men increases to 72% (28% for women) in the top 0.1% and to 76% (24% for women) in the top 0.01%.

Figure 3: Percentage of men and women in each highest income group. Colombia 2021
[Chart: shares of men and women, on a 0% to 100% axis, for the Top 0.01%, Top 0.1%, Top 1%, Top 5%, and Top 8% income groups.]
Source: DIAN (2024b). The data uses income tax returns of individuals and assimilated residents for the 2021 tax year (Form 210).¹⁵

The subsequent report published by the PLURAL group focuses on an analysis of possible gender biases present in tax legislation and their implications.
This report examines gender gaps that may exist in the tax system, identifying both explicit biases, where women and men are treated differently, and implicit biases, which arise because of gender differences in incomes or employment patterns. Additionally, the report provides a comparison and examples of how these gaps and biases have manifested in both national and international contexts (DIAN, 2024a). Currently, work is underway on a future analysis of direct and indirect taxes. These studies aim to foster reflections that may influence the formulation of public policies or, at the very least, serve as a basis for discussions and support legislative proposals in the country.

¹⁵ The income groups were constructed based on the total gross income (sum of boxes 32 + 43 + 58 + 74 + 99 + 104 + 107 + 108 + 111 of Form 210).

4. LESSONS LEARNED AND RECOMMENDATIONS

4.1. Political commitment and competence of technical staff

The legal mandate through the 2022 Tax Reform and the former Director General’s commitment and leadership created the institutional space to study gender issues. This includes establishing an institutional strategy, such as setting up the Unit for Differential Focus and Gender (PLURAL) and working groups on negotiations with the National Civil Registry and on data disaggregation. Competent technical staff who led the development of the methodologies also contributed significantly to the progress made in sex-disaggregating and analyzing data.

4.2. Inter-agency data sharing

Inter-agency collaboration in data sharing is essential. If DIAN had a legal mandate to fully access taxpayers’ sex in the National Civil Registry, it would not need to develop its own probabilistic allocation models, which would reduce duplication of effort and avoid inaccurate predictions and inconsistencies in the data. Negotiations between DIAN and the National Civil Registry were difficult, and reaching an agreement took a long time.
The National Civil Registry argued that sharing sensitive information could violate privacy and that its improper use could cause discrimination. Consequently, a restrictive information agreement was signed to individually validate identity documents through an online system and confirm the sex of new registrants in the RUT. Minor inconveniences have been identified, such as cases of people receiving a new ID number, which generates duplication problems. Additionally, only data for adults are shared.

4.3. Voluntary declaration and sensitivity of information

One of the main challenges with asking taxpayers to self-report their sex in the tax declaration is its voluntary nature. In its initial implementation, fewer than 20% of taxpayers declared their sex. The distinction between male, female, non-binary and transgender in the box can cause confusion or discomfort among individuals, reducing their willingness to complete it. Certain individuals expressed annoyance when questioned about their sex during in-person registration in the RUT.¹⁶ Additionally, the questions were asked in cubicles that were not private, which could also reduce willingness to respond.

¹⁶ The registration for the RUT is done only once and remains registered for all tax procedures, including the annual income tax declaration.

4.4. Name-based algorithm technique

Name-based algorithm models present challenges, such as the need for clear rules for identification or for learning databases. Also, some individuals have uncommon names, making prediction difficult. The model's margin of error can also result in incorrect assignments. Despite these challenges, in the absence of explicit data or an information agreement, a name-based allocation model seems to be the best option for disaggregating data by sex.
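A minimal sketch of a name-based allocation of this kind is given below, assuming that a labelled learning database of first names with known sex is available. It is not DIAN's model: the function names, the thresholds `min_count` and `min_share`, and the toy records are all hypothetical. The sketch leaves a record unassigned when a name is too rare or too ambiguous, which is one way to limit incorrect assignments.

```python
from collections import Counter, defaultdict


def train_name_model(labelled):
    """Count how often each first name appears with each sex in labelled data."""
    counts = defaultdict(Counter)
    for name, sex in labelled:
        counts[name.strip().upper()][sex] += 1
    return counts


def predict_sex(counts, name, min_count=5, min_share=0.95):
    """Return the majority sex for a name, or None if the evidence is weak.

    min_count and min_share are hypothetical thresholds: names seen fewer
    than min_count times, or whose majority share is below min_share,
    are left unassigned.
    """
    seen = counts.get(name.strip().upper())
    if not seen:
        return None  # name never observed in the learning database
    total = sum(seen.values())
    sex, hits = seen.most_common(1)[0]
    if total < min_count or hits / total < min_share:
        return None  # too rare or too ambiguous to assign
    return sex


# Toy learning database (invented for illustration only).
labelled = (
    [("MARIA", "F")] * 40
    + [("CARLOS", "M")] * 30
    + [("ALEX", "M")] * 6
    + [("ALEX", "F")] * 4
    + [("YURLEIDIS", "F")]
)
model = train_name_model(labelled)
results = {
    name: predict_sex(model, name)
    for name in ["Maria", "carlos", "Alex", "Yurleidis", "Zoraida"]
}
print(results)
```

In this toy run the common names are assigned, while the ambiguous name, the name seen only once, and the unseen name stay unassigned and would need another source, such as a registry lookup or manual verification.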
DIAN recommends verifying the data with available sources for more accurate information, and it considers it essential to complete all phases of verification and to carry out random checks to ensure data integrity. Yet, interviewees of this study agreed that the ideal scenario would be for the entity responsible for registering people's sex in the birth registry, the National Civil Registry, to be the only one responsible for collecting this information.¹⁷

4.5. Ensuring data integrity

Maintaining data integrity presents challenges, such as ensuring consistency across different storage locations and managing access permissions. Also, technical issues during data migration or hardware upgrades can lead to data loss, as seen when files for 2014 were deleted. The major problem lies in the potential for data to be irretrievably lost if not backed up properly. It is advisable to promote the use of file storage systems with recovery mechanisms so that data cannot be permanently deleted by accident and to ensure that sensitive data is not at risk.

¹⁷ The following people were interviewed in April and May 2024: Javier Ávila Mahecha, Adriana Yaneth Plazas Cadena, Irayda Ximena Lara Chaves, Juan Guillermo Caicedo Useche, Olga Adriana Puerto Gonzalez, María Fernanda Osorio Moreno, Pastor Hamleth Sierra Reyes, and David Gustavo Suarez Castellanos.

REFERENCES

ATAF (2022). Are tax policies developed to reduce gender inequality in ATAF member countries?: A desk review. Pretoria, South Africa: ATAF. From: https://events.ataftax.org/.

Atkinson, A. B., A. Casarico, and S. Voitchovsky. 2018. “Top Incomes and the Gender Divide.” Journal of Economic Inequality 16: 225–56.

Ávila-Mahecha, J. (2016). Diferencias de género en la riqueza, ingresos y rentas de las personas naturales en Colombia. DIAN Dirección General.

Decreto 1695. 1 de septiembre de 1971 (Colombia).

Del Carmen, Giselle, Santiago Garriga, Wilman Nuñez, Thiago De Gouvea Scot De Arruda. 2024. “Two Decades of Top Income Shares in Honduras.” Policy Research Working Paper WPS 10722, World Bank Group, Washington, DC.

Departamento Administrativo Nacional de Estadística. (2012). Colombia. Accessed at: https://www.dane.gov.co/files/investigaciones/fichas/glosario_GEIH13.pdf

Departamento Administrativo Nacional de Estadística. (2018). Colombia. Accessed at: https://www.dane.gov.co/index.php/estadisticas-por-tema/demografia-y-poblacion/censo-nacional-de-poblacion-y-vivenda-2018/cuantos-somos

Departamento Administrativo Nacional de Estadística. (2020). Colombia. Accessed at: https://www.dane.gov.co/files/investigaciones/genero/informes/Informe-participacion-mujer-mercado-laboral.pdf

Departamento Administrativo Nacional de Estadística. (2024). Colombia. Accessed at: https://www.dane.gov.co/files/operaciones/GEIH/bol-GEIHMLS-oct-dic2023.pdf

Dirección de Impuestos y Aduanas Nacionales, Circ 000004. 12 de mayo de 2023 (Colombia).

Dirección de Impuestos y Aduanas Nacionales, Res 133. 4 de septiembre de 2023 (Colombia).

Dirección de Impuestos y Aduanas Nacionales. (2024a). Elementos conceptuales para una tributación con enfoque de género. Retrieved from: https://www.dian.gov.co/dian/cifras/Informesespeciales/03-Elementos-Conceptuales-para-una-Tributacion-con-Enfoque-de-Genero.pdf

Dirección de Impuestos y Aduanas Nacionales. (2024b). Estadísticas de ingreso y riqueza en clave de género: un zoom en las personas más ricas de Colombia. Retrieved from: https://www.dian.gov.co/dian/cifras/Informesespeciales/02-Estadisticas-de-Ingreso-y-Riqueza-en-Clave-de-Genero-PLURAL.pdf

Ley 2277, Art. 90. 13 de diciembre de 2022 (Colombia).

Lin, E. Y., and J. Slemrod. 2024. “Gender Tax Difference in the U.S. Income Tax.” International Tax and Public Finance (Mar.): 1–33.

Londoño-Vélez, J., & Avila-Mahecha, J. (2024). Behavioral responses to wealth taxation: Evidence from Colombia (No. w32134). National Bureau of Economic Research.

Observatorio Colombiano de las Mujeres. (s.f.). Colombia. Accessed at: https://observatoriomujeres.gov.co/es/EconomicAutonomy

Registraduría Nacional del Estado Civil de Colombia. (2024). Colombia. Accessed at: https://www.registraduria.gov.co/-Quienes-somos-670-.html

Ruganintwali, P.B. (2023). Gender Equality in Rwanda and in the Rwanda Revenue Authority (RRA). https://www.imf.org/-/media/Files/Topics/Fiscal/Revenue-Portal/Gender/gender-equality-in-rwanda-and-in-the-rra.ashx

Unidad de Gestión Pensional y Parafiscales. (2024). Colombia. Accessed at: https://www.ugpp.gov.co/nuestraentidad/somos/misionyvision

World Development Indicators. World Bank. (2018). Retrieved from: https://databank.worldbank.org/source/world-development-indicators

ANNEX 1
METHODOLOGY MODEL DISAGGREGATION 2022

Sex Identification Process

• Cross-referencing with administrative databases: The information is cross-referenced with other administrative databases from entities that have the sex using the national ID.
• Name-based algorithm: For the remaining cases, a name-based algorithm is used to determine the sex based on the individual's name characteristics.
• Identification using ID rule: For cases not resolved in the previous step, the identification rule is applied for IDs issued before the year 2000.
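The three identification steps in Annex 1 can be read as a cascade in which each record is passed to the next method only if the previous one did not resolve it. The sketch below illustrates that idea under several assumptions: the record fields (`id`, `first_name`, `id_issue_year`), the helper names, and the ID number ranges are hypothetical placeholders, not DIAN's actual rules or data. The order of the second and third steps is also an assumption, so the steps are passed as a list that can be reordered.

```python
def registry_step(registry):
    """Cross-reference: look up sex by national ID in an administrative source."""
    return lambda rec: registry.get(rec["id"])


def id_rule_step(ranges, cutoff_year=2000):
    """ID rule: applies only to IDs issued before cutoff_year.

    ranges is a list of (low, high, sex) tuples; the values used in the
    example below are invented placeholders, not the real numbering rule.
    """
    def step(rec):
        year = rec.get("id_issue_year")
        if year is None or year >= cutoff_year:
            return None
        number = int(rec["id"])
        for low, high, sex in ranges:
            if low <= number <= high:
                return sex
        return None
    return step


def name_step(name_to_sex):
    """Name-based step, reduced here to a lookup table for brevity."""
    return lambda rec: name_to_sex.get(rec["first_name"].upper())


def identify_sex(rec, steps):
    """Run the steps in order and report which one resolved the record."""
    for label, step in steps:
        sex = step(rec)
        if sex is not None:
            return sex, label
    return None, "unresolved"


steps = [
    ("registry", registry_step({"1001": "F"})),
    ("id_rule", id_rule_step([(1, 499, "M"), (500, 999, "F")])),
    ("name", name_step({"CARLOS": "M"})),
]
records = [
    {"id": "1001", "first_name": "Ana", "id_issue_year": 2005},
    {"id": "700", "first_name": "Luz", "id_issue_year": 1990},
    {"id": "2002", "first_name": "Carlos", "id_issue_year": 2010},
    {"id": "3003", "first_name": "Yurleidis", "id_issue_year": 2012},
]
outcomes = [identify_sex(rec, steps) for rec in records]
for rec, outcome in zip(records, outcomes):
    print(rec["id"], outcome)
```

Recording which step resolved each record, as the second element of the returned pair does here, makes it possible to run random checks separately on the records assigned by the less reliable methods.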