Policy Research Working Paper 11248

Monitoring Global Aid Flows: A Novel Approach Using Large Language Models

Xubei Luo, Arvind Balaji Rajasekaran, Andrew Conner Scruggs

Development Finance Vice Presidency
November 2025

Abstract

Effective monitoring of development aid is the foundation for assessing the alignment of flows with their intended development objectives. Existing reporting systems, such as the Organisation for Economic Co-operation and Development's Creditor Reporting System, provide standardized classification of aid activities but have limitations when it comes to capturing new areas like climate change, digitalization, and other cross-cutting themes. This paper proposes a bottom-up, unsupervised machine learning framework that leverages textual descriptions of aid projects to generate highly granular activity clusters. Using the 2021 Creditor Reporting System data set of nearly 400,000 records, the model produces 841 clusters, which are then grouped into 80 subsectors. These clusters reveal 36 emerging aid areas not tracked in the current Creditor Reporting System taxonomy, allow unpacking of "multi-sectoral" and "sector not specified" classifications, and enable estimation of flows to new themes, including World Bank Global Challenge Programs, International Development Association-20 Special Themes, and Cross-Cutting Issues. Validation against both Creditor Reporting System benchmarks and International Development Association commitment data demonstrates robustness. This approach illustrates how machine learning and the new advances in large language models can enhance the monitoring of global aid flows and inform future improvements in aid classification and reporting. It offers a useful tool that can support more responsive and evidence-based decision-making, helping to better align resources with evolving development priorities.

This paper is a product of the Development Finance Vice Presidency.
It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at xluo@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Authorized for distribution by Francisco Galrao Carneiro, Manager, Development Finance Vice Presidency, World Bank Group

JEL codes: F35, C38, C55
Keywords: Foreign Aid; Classification Methods: Cluster Analysis; Modeling and Analysis

1 We are grateful to Paulo S. Baioni and Vasanth Kumar Vikram Prakash for their substantive contributions to the technical foundation of this paper, and to Damodaran Venugopal Kothandan, Ani Popiashvili, Sunny Yujuan Sun, Sasha Segeda, and colleagues from the World Bank Group Information and Technology Solutions team and the Technology Innovation Office for their support in the development of the AI and machine learning algorithms.
We thank Akihiko Nishio for overall guidance, and Francisco Galrao Carneiro for valuable suggestions and expert insights throughout the study; Enrique Blanco Armas, Michelle Leonore Fortune, Ashish Makkar, Stela Mocan, Raju Singh, and Long Hanhua Wang for their productive discussions; and all colleagues who participated in the Monitoring Global Aid Flows: An AI & Machine Learning Approach workshop and the Global Aid Data Workshop for their feedback. The content is solely the responsibility of the authors.

I. Introduction

Monitoring development aid contributes to improving the alignment of financial flows with their intended development objectives. As global development priorities evolve, policy makers require monitoring systems that can track commitments not only to traditional sectors such as health, education, and agriculture, but also to cross-cutting and newly emerged thematic domains like climate change, digitalization, and global pandemics. Without the ability to capture financial support to these dynamic areas, aid data risks underrepresenting the full scope of development efforts, potentially contributing to a misalignment between global goals, such as the Sustainable Development Goals (SDGs), and the operational priorities of development cooperation.

The Organisation for Economic Co-operation and Development (OECD) Creditor Reporting System (CRS) 2 is the most widely used and comprehensive database of official aid flows, with over 4 million activity-level records reported since 2000.
Its standardized "purpose codes," designed by the OECD Development Assistance Committee (DAC), enable comparability across financial flows from different donors and over time, making the CRS an indispensable tool for researchers, policy makers, and development practitioners. This categorization follows a top-down approach: the purpose codes indicating which sector an aid activity is intended to support (for example, health or agriculture) are predetermined by the DAC before the reporting period begins. 3 Because of this structured and consistent coding system, the CRS is highly reliable for tracking flows to traditional sector categories. At the same time, it faces inherent challenges when applied to new or cross-cutting priorities. Activities related to new priorities are often dispersed across multiple sectors or grouped into residual categories such as "multi-sectoral/cross-cutting" or "sector not specified." As the Stockholm Environment Institute (2025) describes, "it can be very difficult to compile a picture of financial support for some issues that are of high importance on both national and international agendas, for example sustainable oceans, because they are not in the coding system of the CRS. So, support for activities related to oceans are likely to be split across quite a few sector categories and cannot easily be identified or brought together in any comprehensive way." This reflects a structural shortcoming in how aid activities are classified, pointing to a broader challenge in producing a full and coherent picture of aid flows to emerging thematic areas.

This paper proposes a complementary and novel approach to address this challenge. We develop a bottom-up, unsupervised machine learning framework that leverages recent advances in machine learning and large language models (LLMs) to maximize the value of the textual descriptions accompanying each aid activity in the CRS dataset.
By embedding and clustering the text from nearly 400,000 records from the 2021 dataset, the framework organically re-organizes the CRS data into new groups of similar activities in a coherent manner. Our framework produces 841 granular clusters, which we then group into 80 sub-sectors. To enhance interpretability, recent advances in LLMs are used to generate meaningful names for the machine-derived clusters.

Our approach makes three main contributions. First, it identifies 36 emerging areas of aid activity that are absent from the current CRS taxonomy, including food security, digitalization, and pandemics. Second, it unpacks ambiguous categories, reallocating records reported as "multi-sectoral" or "not specified" into more precise clusters. Third, it enables estimation of commitments to development priority areas, including the World Bank Group's Global Challenge Programs (GCPs) and International Development Association (IDA)-20 Special Themes and Cross-Cutting Issues.

2 See: Converged Statistical Reporting Directives for the Creditor Reporting System (CRS) and the Annual DAC Questionnaire (OECD, 2023), https://one.oecd.org/document/DCD/DAC/STAT(2023)9/FINAL/en/pdf
3 In OECD CRS reporting, "the sector of destination of a contribution is determined by the reporting entity answering the question 'which specific area of the recipient's economic or social structure is the transfer intended to foster?' and assigning a 'purpose code' which reflects this. The sector classification does not refer to the type of goods or services provided by the donor. Sector specific education or research activities (e.g., agricultural education) or construction of infrastructure (e.g., agricultural storage) should be reported under the sector to which they are directed, not under education, construction, etc." (see: OECD, 2019, https://web-archive.oecd.org/2019-03-28/80638-purposecodessectorclassification.htm).
Validation against both CRS benchmarks and IDA commitments data underscores the robustness of the methodology.

The remainder of this paper is structured as follows. Section II reviews the literature on aid monitoring and machine learning applications in development. Section III outlines the methodology, including data processing, clustering, and validation. Section IV presents results and compares them to existing CRS classifications and IDA commitments. Section V discusses our key findings and their policy implications. Section VI concludes by highlighting the broader significance of machine learning and LLMs for the future of aid monitoring and reporting.

II. Literature Review

Effective monitoring of development aid has long been recognized as complex, resource-intensive, and sensitive to the priorities of different stakeholders (Flogstad & Hagen, 2017; Djankov et al., 2009; Easterly & Pfutze, 2008). Donors and agencies use varied systems for classifying and reporting aid flows, which complicates comparability across organizations. For instance, Flogstad and Hagen (2017) show that bilateral agencies track climate finance differently according to internal policies, making cross-agency comparisons difficult. The fragmented nature of donor operations and differences in reporting practices contribute substantially to the challenges of measuring aid flows consistently.

The OECD CRS is the most widely used and comprehensive database for official aid flows, with standardized purpose codes that provide comparability across donors and time (Nunnenkamp et al., 2013). Its top-down classification system is robust, but adapting purpose codes to new or cross-cutting themes can be slow and resource-intensive (Toetzke, Banholzer, & Feuerriegel, 2022).
The original purpose code taxonomy was created alongside the CRS in 1973, and since 2010, only two new purpose codes have been added: "Refugees/asylum seekers in donor countries" in 2018 (OECD, 2019) and "Covid-19 control" in 2021 (Casadevall Bellés and Calleja, 2024). As Grollmen et al. (2018) and WHO (2015) note, inconsistencies sometimes emerge when different entities interpret the reporting standards differently, particularly in health. Pincet et al. (2019) similarly highlight the importance of developing complementary instruments to maximize the value of CRS data.

Against this backdrop, scholars have applied machine learning techniques to aid data, broadly following two approaches. The first, more common, is a supervised learning approach, where models are trained on pre-labeled data to identify aid flows to predefined topics. For example, Pincet et al. (2019) estimate support to the SDGs, Dixit et al. (2022) focus on global public goods for health, and Borst et al. (2022) and Toetzke, Stünzi, & Egli (2022) apply supervised methods to estimate climate finance. While effective for specific purposes, these models are limited in scope and demand substantial amounts of human effort to create the training datasets. A model trained to identify climate finance therefore cannot be repurposed to track flows to digitalization or other themes without re-training on new labeled datasets.

The second approach is unsupervised learning, which allows algorithms to organically group aid activities into clusters without pre-labeled training data. This approach was pioneered in the CRS context by Toetzke, Banholzer, & Feuerriegel (2022), who applied unsupervised methods to CRS data from 2000–2019, generating 173 "activity clusters." These revealed emerging themes, such as "youth empowerment" and "conservation of wetlands," that were not explicitly captured by CRS purpose codes.
The strength of unsupervised learning lies in its ability to detect new trends directly from the data, without requiring categories to be predefined. However, such models are statistical in nature and require validation by subject matter experts (SMEs) to ensure coherence and accuracy. For example, Toetzke et al. (2022) engaged nine domain experts to review their machine-generated activity clusters for accuracy and coherence.

Our project contributes to this literature by extending the exploratory, bottom-up approach of Toetzke, Banholzer, & Feuerriegel (2022). We apply unsupervised learning to the 2021 CRS dataset to generate 841 clusters, grouped into 80 sub-sectors, and complement this with advances in LLMs to name the machine-derived clusters. This not only helps to identify emerging aid areas but also allows us to unpack multi-sector and unallocated aid, and to estimate flows to cross-cutting themes such as the World Bank Group's GCPs and IDA20 Special Themes and Cross-Cutting Issues. In this way, our study demonstrates how machine learning can enhance the use of CRS data, offering policy makers and practitioners new tools to extract additional insights from existing information.

III. Methodology

At the pilot stage, we explored three potential machine-learning frameworks: the "Fast Clustering" 4 method, the "HDBSCAN" 5 method, and the "k-means" 6 method, using a sample of 50,000 records from the 2021 CRS dataset. Each approach combined clustering with LLMs to generate names for the machine-derived clusters (see Figure 1), in addition to the different tools, platforms, and services for data processing (see table A.1 in the appendix for more details). We compared the performance of the frameworks by applying a qualitative 'grading' rubric 7 to a random sample of 220 records. The "Fast Clustering" framework performed best during this evaluation, producing clusters that were more internally coherent than those of the other two methods.
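The threshold-based "fast clustering" idea can be illustrated with a minimal sketch. This is not the production pipeline: the actual framework operates on sentence embeddings of the CRS text fields with calibrated thresholds, whereas the function name `fast_cluster`, the toy 2-D vectors, and the parameter values below are purely illustrative assumptions.

```python
import numpy as np

def fast_cluster(embeddings, threshold=0.9, min_cluster_size=2):
    """Greedy threshold-based clustering: each unassigned record seeds a
    candidate cluster of all records whose cosine similarity to the seed
    meets the threshold; groups below min_cluster_size go to an outlier bucket."""
    # Normalize rows so that a dot product equals cosine similarity.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unassigned = set(range(len(X)))
    clusters, outliers = [], []
    while unassigned:
        seed = min(unassigned)
        sims = X @ X[seed]
        members = [i for i in sorted(unassigned) if sims[i] >= threshold]
        if len(members) >= min_cluster_size:
            clusters.append(members)
        else:
            outliers.extend(members)
        unassigned -= set(members)
    return clusters, outliers

# Toy 2-D "embeddings": two tight groups plus one stray record.
vecs = np.array([[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 0.98], [-1.0, 0.0]])
clusters, outliers = fast_cluster(vecs, threshold=0.9)
```

In contrast to k-means, no cluster count is fixed in advance, and records too dissimilar to any seed fall into the outlier bucket, mirroring the "Other" cluster described in Step 5 below.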
We then applied the chosen framework to the entire 2021 CRS dataset, covering 395,306 activity-level records. Each record contains textual descriptions of the project or activity and the corresponding financial commitment. 8

4 "Fast clustering" allows for an efficient clustering of records into groups based on a pre-defined "similarity-score" threshold between records. See: Rajendran et al., 2017.
5 The "HDBSCAN" method is a hierarchical density-based clustering algorithm which allows for finding clusters of differing densities. See McInnes et al., 2017.
6 The "k-means" approach is the most popular clustering method and the one used by Toetzke et al. (2022) in their paper. It clusters all records into a pre-defined number (k) of groups.
7 Each record in the sample was assigned a score from 0 to 4 based on the degree of coherence between its textual description and the name of its assigned ML-generated cluster.
8 Please see Section 1 of the Appendix for more details of the 7-step machine learning methodology.

Figure 1: Snapshot of ML Methodology to Process OECD CRS Data

Step 1 – Data Preparation. We download the 2021 CRS data 9 from the OECD website 10 and assign a unique identifier to each record for traceability.

Step 2 – Language Harmonization. Non-English text in project descriptions is translated to English to ensure comparability. This includes "Project Title", "Purpose Name", "Short Description" and "Long Description".

Step 3 – Preprocessing. We remove irrelevant information such as country names, agency identifiers, and generic placeholder text (e.g., "no description," "multi-sector") to ensure that clustering focuses on substantive content.

Step 4 – Embedding. Textual descriptions are converted into numerical vectors using a pre-trained sentence embedding model.
This model is trained to represent sentences in a way that captures their underlying meaning, allowing the algorithm to cluster together records with similar content instead of simply matching exact words.

9 It should be noted that for this paper we use the CRS 2021 reporting cycle data that was updated in January 2024. The CRS historical data is updated periodically by the OECD and partners.
10 See: OECD Data Explorer, Creditor Reporting System (CRS), https://www.oecd.org/en/publications/creditor-reporting-system_22180907.html.

Step 5 – Clustering. Records are grouped into similar clusters using the "Fast Clustering" algorithm. Parameters such as similarity thresholds and minimum cluster size are calibrated through multiple iterations to balance granularity with interpretability. Our final model produced 841 activity clusters. Records that did not meet similarity thresholds were grouped in an outlier cluster ("Other"). 11

Step 6 – Naming Clusters. To improve interpretability, we use LLMs to generate descriptive names for each cluster. For each cluster, representative records are provided to the model, which proposes a concise label (see the Appendix for prompts and process).

Step 7 – Validation. A sample of clusters is manually validated by reviewing records within clusters to confirm coherence with the generated names. This ensures alignment between machine-generated outputs and subject-matter understanding. 12

Our primary iteration for analysis generated 841 clusters covering a wide range of aid activities. For example, agriculture-related records were grouped into clusters such as Irrigation Infrastructure, Crop Production, and Pastoral Livestock Farming. In health, the model identified granular categories such as Health Personnel Training, Vaccine Development and Production, Tuberculosis Control, and Malaria Control and Prevention.

IV. Results

Validation

Our machine learning framework produced 841 activity clusters from the 2021 CRS dataset.
In this section, we present validation exercises and illustrate how the new clusters provide deeper insights into the global aid architecture compared to existing CRS classifications. The discussion is structured around (i) internal validation against CRS benchmarks, (ii) external validation against IDA commitment data, and (iii) illustrative findings from the clusters.

1. Internal Validation

Due to the statistical and unpredictable nature of unsupervised learning models, separate runs of the same AI/ML model on the same dataset will not reproduce exactly the same clusters. A key test of the framework is therefore whether the generated clusters are broadly consistent across executions while also maintaining granularity. To assess the stability of our model, we compared the semantic similarity between the results of the framework across two different executions. Similarly named clusters from the two executions were paired, and a 'cosine similarity score' 13 was generated for each pair. In this comparison, we found that only one matched cluster pair scored under 80%, while more than three-quarters (627 out of 822 matched cluster pairs) exceeded a 90% similarity score (figure 2). This shows strong consistency between iterations, suggesting that the clustering process is stable despite some minor, expected variations between rounds.

11 For a full discussion of the clustering process, see the Methods note in Section 1 of the Appendix.
12 Furthermore, we manually validated the quality of the names produced in step 6 for each cluster, and we edited 48 problematic cluster names. Some examples of these problematic names were names which were overly technical or included jargon (ex: "ICASS Bill Payments"), names which were overly complex (ex: "Gender-responsive Climate Finance And Energy Efficiency"), and names which were overly generic (ex: "Maternal").
We then re-named these clusters after manually examining a sub-set of records in each cluster and validating the topic of those clusters (in these examples, the names were changed to "Bill Payments for Donor Office Costs", "Gender-responsive Climate Finance", and "Technical Assistance for Maternal and Child Health", respectively).
13 In the context of machine learning, cosine similarity evaluation is used to assess the similarity between clusters.

Figure 2: Frequency distribution of cosine similarity scores between two rounds of AI/ML clustering

  Cosine similarity score of paired clusters | Number of paired clusters
  ≤ 0.80                                     |   1
  (0.80, 0.825]                              |   2
  (0.825, 0.85]                              |  22
  (0.85, 0.875]                              |  36
  (0.875, 0.90]                              | 135
  (0.90, 0.925]                              | 178
  (0.925, 0.95]                              | 199
  > 0.95                                     | 250

We also conducted manual validation by reviewing a random sample of records across ten clusters. In almost all cases, records were both coherent within clusters and consistent with the LLM-generated labels. For example, in the Vaccine Development and Production cluster, nearly all projects were directly related to immunization research, procurement, or delivery, with minimal unrelated content.

To better understand how our results compare with the existing CRS classification, we manually assigned each ML-generated cluster to a CRS sector group. 14 Each cluster was assigned to a CRS sector group based on the name of the generated cluster and the inclusion criteria of the CRS sectors according to the OECD. For example, the cluster named "Higher Education Scholarships" was assigned to the Education sector group, while the cluster named "Solar Photovoltaic Power Generation" was assigned to the Energy sector group. We then summed up the number of records, as well as the dollar amount of commitments, for all clusters assigned to each sector.
This allows us to gain a broad sense of how the new ML-generated organization of the 2021 CRS data compares to the original CRS categorization. Figure 3 presents the results of this benchmarking analysis.

14 For more details on this validation exercise, see Section II of the Appendix.

Figure 3: Comparison of estimation of CRS Sector Groups using ML-Generated Clusters to Original CRS Sector Groups (2021 data, US$ millions in 2021 prices). The figure comprises two charts: "Difference in Total Commitment Flows by Sector (Original CRS vs ML-Generated Sectors)" and "Difference in Total Record Count by Sector (Original CRS vs ML-Generated Sectors)", each plotting original and ML-generated totals alongside the percentage difference for each sector.

The comparison illustrates that our ML-generated clusters broadly align with the high-level CRS sector groupings. For most sectors, the difference between the grouping of the ML-generated clusters and the CRS sectors is within 20%, whether measured by dollar amount or by number of records, as illustrated by the two charts in figure 3. For example, in many large sectors, such as Education, Government & Civil Society, and Health, we see very small differences in the number of records (1%, 5%, and 1%, respectively). This supports the overall accuracy of the newly generated clusters while leaving room for the differences needed to extract new information.
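At its core, this benchmarking is an aggregate-and-compare exercise: sum commitments under each classification and compute the per-sector difference. A minimal sketch with made-up records follows; the tuple layout, sector labels, and dollar values are illustrative assumptions, not the CRS schema.

```python
from collections import defaultdict

# Hypothetical records: (original CRS sector, ML-assigned sector, commitment in US$ millions).
records = [
    ("Health", "Health", 120.0),
    ("Health", "Health", 80.0),
    ("Multi-Sector", "Health", 40.0),   # a multi-sector record the ML framework reallocates
    ("Education", "Education", 200.0),
]

def totals(records, key_index):
    """Sum commitments by the sector label found at key_index."""
    out = defaultdict(float)
    for rec in records:
        out[rec[key_index]] += rec[2]
    return dict(out)

crs_totals = totals(records, 0)   # totals under the original CRS sectors
ml_totals = totals(records, 1)    # totals under the ML-generated sectors

def pct_diff(sector):
    """Percent difference per sector, as plotted in the benchmarking charts."""
    orig = crs_totals.get(sector, 0.0)
    new = ml_totals.get(sector, 0.0)
    return 100.0 * (new - orig) / orig if orig else float("inf")
```

In this toy example the ML reallocation of one multi-sector record raises the Health total from 200 to 240, a 20% difference, while Education is unchanged; the same comparison applied per sector yields the bars in figure 3.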
Most of the large differences lie in the small sectors, such as Debt Relief, where we observe a 68% difference in the number of records between our ML-generated clusters and the original CRS sector classification.

2. External Validation

Beyond internal tests, we also validated the framework against an independent benchmark: the International Development Association's (IDA's) commitments across sectors. This provides a way to test whether the new clusters align with a donor's commitments to sectors from a data source outside the OECD CRS. For this exercise, we use the IDA lending by sector report, which provides the Major Sector as well as the Sector (disaggregated view) mapping for IDA operations. We matched the Major Sectors directly to the ML-generated clusters where possible and used the Sectors (disaggregated) where the Major Sectors did not have a direct pair. Where neither Major Sectors nor Sectors had a matching pair with our ML-generated results, we treated them as residual sectors without a pair. Four IDA Sectors did not have a corresponding pair and were left as residual, standalone sectors. The mapping is provided in Table 1.

We see a similar distribution across larger sectors such as Health, Energy, Agriculture, Education, and WASH. However, there are also significant differences, notably for Social Infrastructure/Social Protection and Government & Civil Society/Public Administration. This may be because these domains as produced by our ML framework are broader than the World Bank sector definitions (as in the case of Social Infrastructure/Social Protection) or vice versa (Government & Civil Society/Public Administration). We also have categories such as Humanitarian Aid and Multi-Sector/Cross-Cutting as part of the machine framework results, which are not part of the World Bank's sector list.
This validation demonstrates that unsupervised clustering, despite being data-driven, can approximate donor-reported thematic allocations with a high degree of accuracy. Importantly, this shows that our framework does not displace CRS reporting but rather complements it by enabling additional thematic tracking without requiring new reporting burdens.

Table 1: Comparison of ML Generated Sectors with IDA Commitments Data

  ML Generated Sectors (OECD CRS-IDA)   Commitments (US$, millions) | WB's IDA Major Sectors/Sectors 1/   Commitments (US$, millions)
  Other Social Infrastructure Services    6,881 | Social Protection              5,504
  Health                                  5,188 | Health                         5,019
  Energy                                  4,444 | Energy & Extractives           4,113
  Agriculture, Forestry, and Fishing      3,257 | Agriculture                    3,787
  Education                               2,986 | Education                      3,156
  WASH                                    2,017 | Water/Sanit/Waste              2,254
  Humanitarian Aid                        1,635 |
  Transport & Storage                     1,554 | Transportation                 1,859
  Govt. & Civil Society                   1,302 | Public Admin                   4,464
  Multi-Sector/Cross-Cutting              1,141 |
  Trade                                     842 | Trade                          1,140
  Banking & Financial Services              808 | Financial Sector               1,180
  General Budget Support & Assistance       587 |
  Communications Infrastructure             345 | ICT Infrastructure               656
  Business & Other Services                 287 |
  Tourism                                   185 | Tourism                          130
  Industry, Mining & Construction           151 | Mining                           111
  Population and Reproductive Health         98 |
  Refugee in Donor Countries                 68 |
  Sectors not Specified                      47 |
  Debt Relief                                46 |
  Housing Construction                       24 | Information & Communication      473
  Grand Total                            33,871 |                               33,871

1/ In order to map the relevant IDA sector categories to the ML-generated sectors, we first looked at the level 2 (aggregated) Major Sectors and matched them to the relevant sector pair in the ML-generated sector column (e.g., Health, Energy, Social Protection). Where we could not find an exact match at level 2 in the IDA dataset, we used level 1 (disaggregated) sectors (e.g., ICT Infrastructure, Mining). In these cases, the appropriate deductions were made from the level 2 Major Sectors to which the level 1 sectors belonged.

V. Key Findings

Our machine learning framework yielded four sets of findings that complement the CRS by adding granularity and thematic depth. These findings illustrate how the approach helps identify emerging areas of aid, unpack multi-sectoral aid, unpack aid that is unallocated by sector, and generate estimates for cross-cutting priorities that are difficult to monitor using existing purpose codes alone.

1. Emerging Areas Not Tracked in CRS

The first key finding is the identification of 36 emerging aid areas that are not explicitly captured by the current CRS taxonomy. Examples include digitalization, food security, and pandemic preparedness. These topics are dispersed across multiple CRS purpose codes, making them difficult to track systematically through the existing system. By clustering records based on textual descriptions, our framework provides a way to systematically monitor these emerging priorities.

The 841 machine-generated clusters covered a wide range of topics, many of which were highly specific. We manually classified all 841 activity clusters into 80 sub-sectors. The design of these sub-sectors was informed by both the OECD CRS and World Bank sector taxonomies, but also adapted to align with the content of the clusters themselves. Of these 80 sub-sectors, 36 track topics not currently systematically monitored in the existing CRS sector classification system, such as "Food Security", "General Climate", "Private Sector Mobilization", "Digitalization", "Pandemics", and "Employment Creation".

Figure 4: Methodology for creating sub-sectors.

Policy relevance: This finding shows how CRS data can be leveraged to monitor new themes without altering the official purpose code structure, thus reducing reporting burdens while increasing analytical value.

2. Unpacking Multi-Sectoral Aid

The second finding concerns activities classified as Multi-Sectoral/Cross-Cutting in CRS.
Current CRS reporting standards, by their nature, have difficulty tracking flows to multi-sector or cross-cutting areas (Pitt et al., 2018). These categories are broad by design, but their heterogeneity limits their usefulness for policy makers. For example, a project relating to both health and education may be classified into the Health sector, the Education sector, or Multi-Sector/Cross-Cutting, depending on the judgment of the reporting donor. Our model re-assigns such records into more specific clusters. For example, aid activities previously grouped as "multi-sectoral" were redistributed into clusters such as Health and Climate, General Women and Youth, Education and Health, Rural Development, Urban Development, General Environment, and more. Figure 5 presents the amount of commitments attributed to these multi-sectoral sub-sectors by the machine learning framework. This totals $59 billion, with General Women and Youth the largest cross-cutting area ($15 billion), followed by Rural Development ($10 billion), Other Multi-Sector ($10 billion), and General Climate ($6 billion).

Figure 5: ML-generated Multi-Sector/Cross-Cutting Sub-Sectors (US$ millions in 2021 prices). [Bar chart of commitments by multi-sector/cross-cutting sub-sector.]

Policy relevance: This re-mapping improves the interpretability of multi-sectoral commitments and provides policy makers with clearer insights into the actual nature of spending.

3. Unpacking Aid Unallocated by Sector

The third finding involves aid flows for which no sectoral focus was reported. In the 2021 CRS data and original sector classifications, 9,001 records are classified into the "Sector not Specified" CRS sector group. These activities account for a sizable share of reported commitments but provide limited analytical value in their raw form.
Our machine learning framework grouped the majority of these records (5,839 of 9,001, or 65%) into clusters with a clear sectoral focus, while the rest (3,162 records) fell into our outlier sub-sector. 15 Table 2 presents the major clusters into which the machine learning algorithm grouped the records originally classified as “Sector not Specified” in the 2021 CRS data. Table 2 shows that a large share of these records were related to the rescheduling or refinancing of disbursements; the machine learning algorithm distributed many of the other records into areas such as banking and financial services, industry, democratic participation and elections, governance reform and support, social protection and dialogue, and language education.

15 We created one outlier sub-sector for clusters for which we were unable to identify any overarching or unifying theme. This includes the following four clusters: Others, Miscellaneous Grants, Unallocated Contributions, and Disbursement Rescheduling and Refinancing. The latter cluster includes records focused on refinancing loans or changing disbursement timelines, without any clear sectoral focus.
Table 2: Top 10 ML-Cluster Destinations for CRS “Sectors not Specified” Records 16

ML-Generated Cluster | Corresponding Author-Created Sub-Sector | Count of Records | Sum of Commitments ($US millions)
Disbursement Rescheduling and Refinancing | Sectors Not Specified | 591 | (11.90)
Nonferrous Metals Production | Industry, Mining & Construction | 319 | 9.14
Program Support and Operational Costs | Administrative Costs | 298 | 248.20
Strategic Financial Support | Other Banking & Financial Services | 231 | 4.14
Participatory Governance and Local Economic Development | Democratic Participation/Elections | 203 | 1,140.52
Multi-Sectoral Financing | Other Multi-Sector | 189 | 1,538.10
International Election Observation and Democratic Participation | Democratic Participation/Elections | 176 | 341.03
Local Development and Governance Support | Governance Reform and Support | 173 | 43.34
Social Dialogue Support | Social Protection and Dialogue | 171 | 43.00
English Language Skills Development | Language Education and Research | 169 | 116.32

Policy relevance: By reducing the share of aid recorded under the category of “Sectors not Specified”, this approach enables more accurate attribution of activities to their relevant sectors, thereby enhancing the clarity and usefulness of aid data.

4. Estimating Flows to Cross-Cutting and Thematic Priorities

Finally, we use our framework to estimate financial support to cross-cutting themes high on the international agenda. We conducted this exercise for the five IDA20 Special Themes, the four IDA Cross-Cutting Issues, and the six WB Global Challenge Programs (GCPs). Table 3 shows the 26 ML-generated clusters mapped to the IDA20 Special Theme of “Climate Change”. Clusters mapped to this thematic issue include Agricultural Policy and Climate Risk, Solar Energy Solutions and Applications, and Water Policy and Climate Change. These 26 clusters total 16,937 records and over $25 billion in commitments (see Table 3).
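The theme-level totals are obtained by summing record counts and commitments over the clusters an expert mapped to each theme. A minimal sketch of that aggregation, using plain Python with illustrative data (the record structure and variable names are our own assumptions, not the CRS schema):

```python
# Illustrative records: (ML-assigned cluster name, commitment in $US millions).
records = [
    ("Climate Change Adaptation Strategies", 2546.49),
    ("Renewable Energy Investment", 1200.00),
    ("Basic Education", 500.00),
    ("Renewable Energy Investment", 1057.94),
]

# Expert mapping of cluster names to the theme (a small subset, for illustration).
climate_clusters = {
    "Climate Change Adaptation Strategies",
    "Renewable Energy Investment",
}

# Aggregate every record whose cluster was mapped to the theme.
matched = [amount for cluster, amount in records if cluster in climate_clusters]
estimate = sum(matched)   # total commitments attributed to the theme
count = len(matched)      # number of underlying records
```

Because whole clusters are mapped rather than individual records, records with thematic relevance that sit in clusters with non-thematic names are missed, which is why these estimates are best read as lower bounds.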
The full list of the mapping of all clusters to these thematic priorities is available upon request. Figure 6 presents the results of this analysis, including the estimated aid flows to each thematic area based on this methodology. 17 Based on these estimates, over $25 billion of commitments in 2021 were directed toward the IDA20 Special Theme “Climate Change”, while over $42 billion was committed toward the GCP of “Enhanced Health Emergency Prevention, Preparedness and Response.” 18

16 Table 2 only presents the subset of records classified as “Sectors not Specified” in the CRS data and includes the ten clusters with the most records from this CRS sector group. Information on the entirety of those ML-generated clusters, which also include records from other CRS sector groups, is available upon request.

17 The limitations of these methods should be noted, particularly the likelihood that our estimates understate the total flows to these thematic issues. For example, while the majority of records in “Agricultural Productivity and Climate Adaptation” may be related to climate, we cannot determine how many records in other clusters, despite having non-climate-related names, also pertain to climate.

Table 3: Mapping of ML-Generated Clusters to IDA20 Special Theme “Climate Change”

ML-Generated Cluster Mapped to “Climate Change” | Count of Records | Commitments ($US millions)
Climate Change Adaptation Strategies | 1,891 | 2,546.49
Agricultural Productivity and Climate Adaptation | 1,504 | 1,174.87
Climate-smart Agricultural Financing | 1,407 | 1,973.13
Agricultural Development and Solar Energy Utilization | 1,378 | 1,050.39
Renewable Energy Investment | 1,132 | 2,257.94
Hydropower Plant Security and Efficiency | 844 | 3,747.35
Solar Photovoltaic Power Generation | 765 | 2,577.26
Agricultural Development and Climate Monitoring | 765 | 306.44
Agricultural Sector Development and Climate Resilience | 714 | 594.90
Renewable Energy Development and Efficiency | 688 | 250.77
Renewable Energy Transition and Efficiency | 675 | 418.21
Green Energy Transition and Efficiency | 578 | 1,377.08
Climate Change Adaptation and Disaster Risk Reduction | 539 | 1,250.10
Disaster Risk and Climate Change Adaptation | 535 | 1,261.91
Climate Policy Implementation and Biodiversity Conservation | 513 | 445.48
Climate Adaptation in Agriculture | 457 | 286.98
Water Supply Systems and Energy Generation | 385 | 1,034.19
Water Policy and Climate Change | 361 | 511.99
Solar and Geothermal Energy Development | 316 | 730.35
Health Policy and Environmental Improvement | 286 | 86.43
Agricultural Policy and Climate Risk | 245 | 247.76
Climate-smart Agriculture and Food Security | 234 | 161.39
Food Security and Climate Health Research | 229 | 40.03
Solar Energy Solutions and Applications | 223 | 443.73
Gender-responsive Climate Finance | 214 | 489.84
Environmental Health Risk Reduction | 59 | 3.50
Total Climate Change Estimate | 16,937 | 25,268.49

18 The estimates created for these high-level thematic issues are approximate and likely understate flows to these areas.
A future area of work is to create more accurate estimates for these thematic issues.

Figure 6: Estimation of 2021 Commitments to WB GCPs, IDA20 Special Themes, and IDA20 Cross-Cutting Issues (in $US billions, 2021 prices)

Policy relevance: This demonstrates how bottom-up clustering can complement CRS classifications by enabling the monitoring of financial support to cross-cutting thematic areas that are high on the international agenda and not easily captured by existing purpose codes.

VI. Conclusion

These results demonstrate how machine learning methods can extract additional value from aid data and help mitigate existing challenges in aid monitoring. Our bottom-up, unsupervised methodology organically groups aid activities, enabling the identification of emerging themes that are not easily captured through a top-down classification system. This re-grouping also provides a way to unpack activities previously coded as “Multi-Sector/Cross-Cutting” or “Sector Not Specified”, categories which, while useful, limit analytical depth. The granularity of our ML-generated clusters further allows for high-level estimation of commitments to thematic areas that are otherwise difficult to track. In this report, we applied the framework to the World Bank Group’s GCPs, the IDA20 Special Themes, and the IDA Cross-Cutting Issues. The resulting estimates illustrate how a data-driven, complementary approach can enrich the monitoring of global aid flows.

Despite these contributions, several limitations should be acknowledged. Because unsupervised clustering does not produce a single “correct” grouping, it is impossible to validate each of the 395,306 records individually. To address this, we compared subsets of machine-generated clusters against CRS sector classifications and validated them at the sectoral level using both financial volume and record counts.
While these checks support the internal validity of the approach at a high level, uncertainty remains at the level of the individual record. In addition, when we scale up the analysis to include additional years of CRS data, the machine-generated clusters will also change. A key strength of this organic, bottom-up methodology is its ability to adaptively capture unexpected patterns in the data, which might be stifled if outcomes from previous iterations were rigidly enforced. However, this also means that the results from the ML framework are not perfectly replicable, and incorporating additional data will lead to different clusters than those produced in previous iterations. Therefore, the outcomes from an expanded dataset may not be directly comparable to those from this initial one-year exercise. This poses challenges for longitudinal analysis and for tracking changes over time, as each addition of data will reshape the clusters.

While our unsupervised method allows us to estimate funding to thematic areas, such as those identified by the World Bank Group’s GCPs, the IDA20 Special Themes, and the IDA Cross-Cutting Issues, these estimates are inherently high-level and likely understate the true volume of flows to these areas. This is because the estimates are based on aggregating clusters rather than assigning each record individually to specific thematic issues. For example, while we may assume that most or all records in the cluster named “Climate Adaptation in Agriculture” are climate related, there is no way to be sure that projects with climate adaptation aspects are not also included in other clusters without climate-explicit names. As such, it is likely that our method underestimates the number of records that are relevant to each theme.

These limitations point to opportunities for refinement. Future work could combine unsupervised clustering with record-level tagging to improve accuracy in thematic estimation.
For example, records could be directly classified against predefined themes (e.g., jobs, digitalization, the IDA Special Themes) in order to provide more precise and replicable results.

From a policy perspective, the framework illustrates how existing CRS data can be leveraged without altering official reporting requirements. By uncovering emerging topics, unpacking residual categories, and generating thematic estimates, machine learning provides a flexible complement to the CRS. This can strengthen monitoring of global aid flows in real time and support bilateral donors, multilateral institutions, and policy makers in efforts to align resources with evolving development priorities. It can also provide recipient countries with better insights into aid flows, helping to inform dialogue and coordination around external financing.

References

Annen, K. & Kosempel, S. (2009). Foreign aid, donor fragmentation, and economic growth. The B.E. Journal of Macroeconomics, 9(1), Article 33. Available at: http://www.bepress.com/bejm/vol9/iss1/art33

Borst, J., Wencker, T. & Niekler, A. (2022). Using text classification with a Bayesian correction for estimating overreporting in the Creditor Reporting System on climate adaptation finance. arXiv preprint. Available at: https://arxiv.org/pdf/2211.16947.pdf

Casadevall Bellés, S. & Calleja, R. (2024). The evolution of the ODA accounting rules. CGD Note No. 376. Center for Global Development, 28 June. Available at: https://www.cgdev.org/publication/evolution-oda-accounting-rules

Development Co-operation Directorate, Development Assistance Committee, Working Party on Development Finance Statistics. (2021). Converged statistical reporting directives for the Creditor Reporting System (CRS) and the Annual DAC Questionnaire. OECD. Available at: https://one.oecd.org/document/DCD/DAC/STAT(2020)44/FINAL/En/pdf

Dixit, S., Mao, W., McDade, K.K. & Schäferhoff, M. (2022).
Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques. Unpublished manuscript.

Djankov, S., Montalvo, J.G. & Reynal-Querol, M. (2009). Aid with multiple personalities. Journal of Comparative Economics, 37(2), 217–229. https://doi.org/10.1016/j.jce.2008.09.005

Dupriez, O. (2018). An empirical comparison of machine learning classification algorithms. World Bank. Available at: http://pubdocs.worldbank.org/en/666731519844418182/PRT-OD-presentation-V2.pdf

Easterly, W. & Pfutze, T. (2008). Where does the money go? Best and worst practices in foreign aid. Journal of Economic Perspectives, 22(2), 29–52.

Flogstad, C. & Hagen, R.J. (2017). Aid dispersion: Measurement in principle and practice. World Development, 97, 232–250. https://doi.org/10.1016/j.worlddev.2017.04.022

Goldblatt, R., Stuhlmacher, M., Hostert, P. & Fensholt, R. (2017). Using Landsat and nighttime lights for supervised pixel-based image classification of urban land cover. Remote Sensing of Environment, 205, 253–275. https://doi.org/10.1016/j.rse.2017.11.026

Gounden, C., Irvine, J. & Wood, R. (2015). Promoting food security through improved analytics. Procedia Engineering, 107, 335–336. https://doi.org/10.1016/j.proeng.2015.06.089

Gualberti, G. & Wilcks, J. (2021). Measuring official development finance for digitalisation. OECD iLibrary. Available at: https://www.oecd-ilibrary.org/sites/0b50096a-en/index.html

Jean, N., Burke, M., Xie, M., Davis, W.M., Lobell, D.B. & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894

Kimura, H., Mori, Y. & Sawada, Y. (2012). Aid proliferation and economic growth: A cross-country analysis. World Development, 40(1), 1–10.

Lu, X., Wrathall, D., Sundsøy, P., Iqbal, A., Wetter, E., Qureshi, T. & Bengtsson, L. (2016).
Unveiling hidden migration and mobility patterns in climate stressed regions: A longitudinal study of six million anonymous mobile phone users in Bangladesh. Global Environmental Change, 38, 1–7. https://doi.org/10.1016/j.gloenvcha.2016.02.002

McInnes, L., Healy, J. & Astels, S. (2017). hdbscan: Hierarchical density-based clustering. Journal of Open Source Software, 2(11), 205. https://doi.org/10.21105/joss.00205

McKenzie, D. & Sansone, D. (2017). Man vs. machine in predicting successful entrepreneurs: Evidence from a business plan competition in Nigeria. World Bank Policy Research Working Paper. Available at: http://documents.worldbank.org/curated/en/968231513116778571

Marineau, J.F. (2023). Disaggregating foreign aid: What have we learned from research on sub-national foreign aid. Commonwealth Review of Political Science, 6(1), Article 3.

Micah, A.E., Solorio, J., Stutzman, H. & others. (2022). Development assistance for human resources for health, 1990–2020. Human Resources for Health, 20, 51. https://doi.org/10.1186/s12960-022-00744-x

Mohri, M., Rostamizadeh, A. & Talwalkar, A. (2018). Foundations of Machine Learning. MIT Press.

Nunnenkamp, P., Öhler, H. & Thiele, R. (2013). Donor coordination and specialization: Did the Paris Declaration make a difference? Review of World Economics, 149(3), 537–563.

OECD. (2009). Comparative study of data reported to the OECD Creditor Reporting System (CRS) and to the Aid Management Platform (AMP). OECD. Available at: http://www.oecd.org/3dac/stats/43908328.pdf

OECD. (2019). CRS reporting checklist (DCD/DAC/STAT(2019)27). OECD Development Co-operation Directorate, Development Assistance Committee.

OECD. (2021). Purpose codes: Sector classification. OECD. Available at: https://www.oecd.org/development/financing-sustainable-development/development-finance-standards/purposecodessectorclassification.htm

Parthasarathy, R., Rao, V. & Palaniswamy, N. (2017).
Deliberative inequality: A text-as-data study of Tamil Nadu's village assemblies. World Bank Policy Research Working Paper. Available at: http://documents.worldbank.org/curated/en/582551498568606865

Pincet, A., Okabe, S. & Pawelczyk, M. (2019). Linking aid to the Sustainable Development Goals – A machine learning approach. OECD Working Paper. Available at: https://www.oecd-ilibrary.org/docserver/4bdaeb8c-en.pdf

Pitt, C., Grollman, C., Martinez-Alvarez, L., Arregoces, L. & Borghi, J. (2018). Tracking aid for global health goals: A systematic comparison of four approaches applied to reproductive, maternal, newborn, and child health. The Lancet Global Health, 6(9), 859–874.

Rajendran, E., Kalaiprasath, R. & Udayakumar, R. (2017). A fast-clustering algorithm for high-dimensional data. International Journal of Civil Engineering and Technology, 8(10), 1220–1227.

Stockholm Environment Institute. (2025). About the OECD’s CRS data. Available at: https://aid-atlas.org/about/data

Toetzke, M., Banholzer, N. & Feuerriegel, S. (2022). Monitoring global development aid with machine learning. Nature Sustainability, 5, 533–541. https://doi.org/10.1038/s41893-022-00874-z

Toetzke, M., Stünzi, A. & Egli, F. (2022). Consistent and replicable estimation of bilateral climate finance. Nature Climate Change, 12(10), 897–900. https://doi.org/10.1038/s41558-022-01482-7

World Health Organization. (2015). State of inequality: Reproductive maternal newborn and child health: Interactive visualization of health data. WHO.

Appendix: Supplementary Methodological Notes

I. Pilot Phase and ML-Framework Selection

The analysis was conducted using the Creditor Reporting System (CRS) dataset from the Organisation for Economic Co-operation and Development (OECD). The dataset provides project-level information on global development aid flows as reported by participating donor countries to the OECD.
Aid activities are updated by donor organizations at the end of each year, and the dataset is updated by the OECD annually. Each activity within the OECD CRS dataset includes a project title, a short description, a long description, and a purpose name. Textual descriptions are usually written in the official language of the donor organization.

In our initial exploration, three different approaches were considered (see Table A.1). A preliminary run on a random sample of 50,000 English records from 2021 was conducted with each, in order to evaluate each approach’s quality of outcomes, technical evaluation options, and machine learning training inputs.

Table A.1: Considered Machine Learning Frameworks

The authors then manually graded the quality of clustering using a sub-sample of 220 records and their corresponding ML-generated cluster assignments. Each record in the sub-sample was assigned a score from 0 to 4 based on the degree of coherence between its textual description and the name of its assigned ML-generated cluster. Approach 2 (Fast Clustering Methodology) was then chosen over Approach 1 (HDBSCAN) and Approach 3 (K-Means Methodology) for the following reasons:

1) The Fast-Clustering methodology performed noticeably better during manual evaluations than the HDBSCAN methodology;
2) The Fast-Clustering methodology, unlike K-Means, which requires pre-determining the number of sub-sectors, offers the advantage of organically deriving the number of sub-sectors from the data itself, aligning with the project’s goal of creating a truly bottom-up understanding of aid activities;
3) The Fast-Clustering methodology allows for the creation of a true outlier sector, while the K-Means approach does not; and
4) The Fast-Clustering methodology allows three parameters to be tuned to calibrate the size and number of clusters generated, which the K-Means methodology does not.
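The organic derivation of the number of clusters in point 2 can be made concrete. Below is a simplified, pure-NumPy sketch of the fast-clustering idea; the actual pipeline uses the sentence-transformers implementation on real sentence embeddings, and the function name and toy data here are our own:

```python
import numpy as np

def fast_cluster(embeddings, threshold=0.80, min_community_size=2):
    """Simplified sketch of fast clustering: each point's "community" is the
    set of points whose cosine similarity to it exceeds `threshold`;
    communities smaller than `min_community_size` are dropped, and overlapping
    communities are deduplicated greedily. (The pipeline in this paper uses
    threshold=0.80 and min_community_size=50 on 384-dimensional embeddings.)"""
    # Normalize rows so dot products equal cosine similarities.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    communities = []
    for i in range(len(X)):
        members = np.where(sim[i] >= threshold)[0]
        if len(members) >= min_community_size:
            communities.append(set(members.tolist()))
    # Greedy dedup: keep the largest communities; drop already-assigned points.
    communities.sort(key=len, reverse=True)
    assigned, unique = set(), []
    for comm in communities:
        comm = comm - assigned
        if len(comm) >= min_community_size:
            unique.append(sorted(comm))
            assigned |= comm
    return unique  # leftover points remain outliers, unlike in K-Means

# Two tight groups plus one outlier: the number of clusters (two) is derived
# from the data itself, and point 4 is left unassigned as an outlier.
pts = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99], [-1.0, -1.0]])
clusters = fast_cluster(pts, threshold=0.95, min_community_size=2)
```

By contrast, K-Means would force all five points, including the outlier, into a pre-specified number of clusters.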
Table A.2 provides additional details on the individual components of this chosen approach.

Table A.2: Components of Chosen Machine Learning Framework – Approach 2 (Fast Clustering)

Compute – Azure Databricks: a unified analytics platform offered by Microsoft Azure, developed in collaboration with Databricks. It provides a managed Apache Spark environment for big data processing and analytics.

Embedding – all-MiniLM-L6-v2: a sentence-transformer model that maps sentences and paragraphs to a 384-dimensional dense vector space.

Clustering – Fast Clustering Algorithm: an algorithm tuned for large datasets. In a large list of sentences, it searches for local communities; a local community is a set of highly similar sentences.

Language – Azure Translator Service: a cloud-based neural machine translation service that is part of the Azure AI services family and can be used with any operating system.

Naming Service (LLM Model) – Azure OpenAI (GPT-4): a fully managed service that allows developers to easily integrate OpenAI models into their applications.

Subsequently, the sample was enlarged to all aid activities in 2021, a total of 395,306 records, to further validate, calibrate, and improve the machine learning framework. In Section III of this paper, we broadly outlined this finalized, seven-step, ML-clustering framework. Here, we provide additional methodological notes, expanding on the technical aspects of each step:

Step 1 – Data Preparation. The CRS data is manually downloaded from the OECD site and placed in the Azure Blob as “.CSV” files. Since the CRS data does not have any unique identifier, we generate and assign a random unique number to each CRS record in this first step for back-mapping and identification purposes.

Step 2 – Language. “Fasttext-langdetect” is an open-source library that works as a wrapper for the language detection model trained on fastText by Facebook. 19 The library can recognize 250+ languages.
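The filtering logic of this step can be sketched as follows. The `detect_fn` stub below stands in for `ftlangdetect.detect`, which returns a dict such as `{"lang": "en", "score": 0.97}`; the record keys are our own illustrative assumptions, not the CRS column names:

```python
def split_by_language(records, detect_fn, text_key="long_description"):
    """Partition records into English ones (kept as-is) and non-English ones
    (to be sent to the translation service)."""
    english, needs_translation = [], []
    for rec in records:
        # fastText models expect single-line input, so newlines are flattened.
        result = detect_fn(rec[text_key].replace("\n", " "))
        (english if result["lang"] == "en" else needs_translation).append(rec)
    return english, needs_translation

# Stub detector standing in for ftlangdetect.detect (avoids the model download):
def fake_detect(text):
    return {"lang": "en" if "project" in text else "fr", "score": 0.99}

recs = [{"long_description": "water project"}, {"long_description": "projet d'eau"}]
en, other = split_by_language(recs, fake_detect)
```

In the pipeline, the non-English partition is then passed to the Azure AI Translator Service, as described below.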
The columns “Project Title”, “Purpose Name”, “Short Description” and “Long Description” within the CRS data were used for this analysis. The open-source language detection is first run on these columns, and non-English values are filtered separately. The Azure AI Translator Service is used for the actual translation of non-English text. Azure Translator Service is a cloud-based neural machine translation service that is part of the Azure AI services family and can be used with any operating system.

Step 3 – Preprocessing. Names and keywords such as "resilience", "resilient", "resiliency", "no description", "semi-aggregates", "non-sector allocable", "aid-activity", "usaid", "multi-sector", "aid activities", "multisector", "USAID", "aid", "activity", "developing countries", "multisectoral", "africa", "antarctica", "asia", "europe", "north america", "south america", "australia", "african", "portuguese", "asian", "american", "european", "australian" are removed from the descriptive text columns. These words are removed wherever they appear in the key text columns used for the analysis; the “Sector Name” column (the fifth column) is also considered. The key columns are then merged into a single column for easier analysis. Flair, a powerful NLP library, is used to remove geographical information from the text, including country and city names, multilateral and bilateral agency names, and other defined keywords. Flair allows for the application of state-of-the-art natural language processing (NLP) models to a text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical data, and sense disambiguation and classification, with support for a rapidly growing number of languages.

19 See: https://fasttext.cc/docs/en/language-identification.html.

Step 4 – Embedding. The “all-MiniLM-L6-v2” model is used as the sentence-transformer model for embedding.
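A sketch of how the merged text column from Step 3 feeds this model follows. The record keys are illustrative stand-ins for the CRS column names; the commented call shows the sentence-transformers API, which produces one 384-dimensional vector per text:

```python
def build_embedding_text(record):
    """Merge the key CRS text columns (Step 3) into the single string that is
    embedded in Step 4. Keys are illustrative, not the exact CRS headers."""
    keys = ("project_title", "purpose_name", "short_description", "long_description")
    parts = [record.get(k, "").strip() for k in keys]
    return " ".join(p for p in parts if p)  # skip empty columns

texts = [build_embedding_text({
    "project_title": "Solar mini-grids",
    "purpose_name": "Energy generation, renewable sources",
    "short_description": "",
    "long_description": "Off-grid solar power for rural clinics",
})]

# With sentence-transformers installed, the embedding step is then:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embeddings = model.encode(texts)  # array of shape (len(texts), 384)
```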
It maps sentences and paragraphs to a 384-dimensional dense vector space, which is required for our task of clustering textual descriptions.

Step 5 – Clustering. Clustering is done on the embeddings generated in the previous step through the “Fast Clustering” algorithm, calibrated for large datasets (50,000 sentences in less than 5 seconds). The algorithm searches for local communities (sets of highly similar sentences) within the large dataset. We configure the cosine-similarity threshold that defines when sentences count as similar. The algorithm also allows users to define the minimal size for a local community and thereby derive either large coarse-grained clusters or small fine-grained clusters. Table A.3 presents the parameters chosen for the model. Key parameters include:

1) Threshold: sentence pairs with a cosine similarity above this value are considered similar; a high threshold will only find extremely similar sentences, while a lower threshold will also find sentences that are less similar;
2) Min_Community_Size: only communities with at least this number of sentences will be returned;
3) Batch_Size: a technical parameter whereby the total observations are separated into batches and processed in parallel; and
4) Sort Max Size: the maximum size for a community.

Cosine similarity thus serves as the metric for deciding which data points count as “nearest” neighbors: points with higher similarity are treated as nearest neighbors, while points with lower similarity are not. Based on the above settings, around 600–800 clusters are generated with the 2021 CRS data. Once the clusters are generated, we map them back to the original CRS data using each record’s unique identifier.

Table A.3: Key Parameters for Fast Clustering Algorithm

Threshold | Min_Community_Size | Batch_Size | Sort Max Size
0.80 | 50 | 1024 | 100

Step 6 – Naming. For each cluster, six representative sentences are identified and provided to Azure OpenAI to generate the name for that cluster.
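A sketch of the naming request follows. The prompt wording is our own illustration, not the production prompt; the commented call shows the general shape of an Azure OpenAI chat-completion request:

```python
def build_naming_prompt(representative_sentences):
    """Assemble a prompt asking the LLM to name a cluster from its
    representative sentences (wording is illustrative)."""
    bullets = "\n".join(f"- {s}" for s in representative_sentences)
    return (
        "The following sentences describe aid activities grouped into one cluster.\n"
        "Reply with a short, specific name (at most six words) for the cluster.\n"
        + bullets
    )

prompt = build_naming_prompt([
    "Rehabilitation of rural feeder roads",
    "Periodic road maintenance in remote districts",
])

# The pipeline sends such a prompt to the Azure OpenAI chat completions
# endpoint, broadly along these lines (deployment_name is a placeholder):
#   client.chat.completions.create(
#       model=deployment_name,
#       messages=[{"role": "user", "content": prompt}])
```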
We leverage the gpt-4-32k (0613) model, which names each cluster using the six representative sentences.

Step 7 – Postprocessing. Only the required columns are selected and written to the final output file.

II. Machine-Generated Clusters

Our machine learning framework grouped all records in the 2021 CRS data into 841 distinct clusters with unique names. In order to gain insight into how this new machine learning categorization compared to the original CRS sector classification, our subject matter experts manually classified each cluster into an existing CRS sector. This validation exercise is discussed in Section IV, “Results Validation”. We then also grouped each cluster into new “Sub-Sectors”. This grouping was informed by the OECD CRS and World Bank sector taxonomies, as well as the judgment of the authors. This approach allows the identification of emerging trends that are not immediately evident in the full list of 841 machine-generated clusters.

Table A.4 presents the full list of the 80 new sub-sectors. It includes the total number of records and commitments aggregated into each sub-sector from the clusters that were grouped together. Additionally, the table indicates whether these sub-sectors represent topics that are already tracked by existing CRS sectors or whether they highlight new emerging topics. The full table of all 841 machine-generated clusters and their mapping to both the CRS Sector Groups and our new, author-created sub-sectors is available upon request.

Table A.4: Newly Created Sub-Sectors
Sub-Sector Aggregated from ML-Generated Clusters | Tracked in CRS Sectors? | Count of Records | Sum of Commitments ($US Millions)
NGO's, Volunteers, Partnerships | No | 6,576 | 503.71
Administrative Costs | Yes | 20,350 | 10100.18
Agriculture | Yes | 7,793 | 6228.81
Forestry | Yes | 3,216 | 2407.23
Pest Control and Animal Health | No | 2,316 | 1382.98
Fisheries | Yes | 2,175 | 1777.93
Food Security | No | 4,062 | 3967.01
Agriculture and Climate | No | 7,217 | 6241.31
Private Sector Mobilization | No | 1,254 | 1418.98
Other Banking & Financial Services | Yes | 4,855 | 16116.14
Business & Other Services | Yes | 2,640 | 2085.01
Digitalization | No | 2,875 | 2211.82
Other ICT Infrastructure | Yes | 5,373 | 1378.01
Construction | Yes | 286 | 203.74
Debt Relief | Yes | 849 | 1105.65
Language Education and Research | No | 1,617 | 364.16
Global Leadership Development | No | 2,421 | 808.36
Tertiary Education | Yes | 5,698 | 4981.72
Basic Education | Yes | 4,006 | 2507.82
Secondary and Vocational Training | Yes | 5,354 | 3951.48
International Student Mobility | No | 1,187 | 343.99
Specialized Training | No | 1,591 | 1308.42
Other Education | Yes | 8,967 | 5969.56
Research | Yes | 1,539 | 755.38
Non-renewable Energy | Yes | 1,883 | 6748.16
Other Energy | Yes | 3,131 | 9410.81
Renewable Energy | Yes | 5,606 | 12836.88
General Budget Support & Assistance | Yes | 1,459 | 8105.13
Domestic Resource Mobilization | No | 3,316 | 3721.15
Governance Reform and Support | Yes | 2,643 | 2848.70
Democratic Participation/Elections | Yes | 12,110 | 3847.48
Macroeconomic Policies | No | 2,845 | 3434.51
Fragility | Yes | 6,498 | 3109.72
Security Systems | No | 8,342 | 718.87
Judicial Systems | No | 3,319 | 218.55
Public Sector Capacity Building | Yes | 3,634 | 1966.36
Combating Illicit Flows | No | 2,050 | 616.47
Narcotics Control | No | 17,405 | 668.44
Law Enforcement | No | 5,186 | 344.35
Human Rights Protection | No | 6,952 | 5763.82
Migrant Support and Protection | No | 1,807 | 2015.04
Customs and Borders | No | 1,548 | 347.89
Health and Climate | No | 1,064 | 189.53
Health Policy | Yes | 2,803 | 1302.57
Vaccinations | No | 1,029 | 400.33
Professional Health Development | Yes | 1,014 | 287.04
Nutrition | No | 2,135 | 463.85
Health Technical Assistance, Training, Capacity Building | No | 2,769 | 1605.21
Health Financing | Yes | 3,695 | 10849.83
Health Systems | Yes | 10,232 | 5033.18
Other Health | Yes | 9,294 | 5837.86
Pandemics | No | 13,181 | 42214.49
HIV/AIDS | Yes | 14,200 | 6932.11
Maternal and Child Health | No | 5,564 | 2879.80
Reproductive Health | Yes | 7,038 | 3282.34
Emergency Food Assistance | Yes | 4,808 | 11128.44
Emergency Response | Yes | 11,243 | 22127.62
Emergency Preparation | Yes | 4,887 | 7051.62
Industry, Mining & Construction | Yes | 5,793 | 3717.64
Other Multi-Sector | Yes | 5,231 | 9621.84
Rural Development | No | 6,280 | 9695.48
Urban Development | No | 3,245 | 5972.97
General Women and Youth | No | 15,649 | 15167.99
Education and Health | No | 4,046 | 2059.04
General Climate | No | 3,540 | 6060.33
General Environment | Yes | 8,180 | 4256.57
Social Protection and Dialogue | Yes | 9,183 | 9176.65
Employment Creation | No | 2,163 | 394.07
Cultural Heritage Preservation | No | 2,865 | 263.24
Labor Rights | No | 4,955 | 845.08
Refugees | Yes | 2,521 | 9122.02
Sectors not Specified | Yes | 3,162 | 1821.55
Tourism | Yes | 1,475 | 816.34
Trade | Yes | 3,598 | 6972.78
Transport Infrastructure | Yes | 2,679 | 8560.36
Non-land Transportation | Yes | 1,924 | 2209.46
Land Transportation | Yes | 3,262 | 12054.62
Logistics and Transportation Costs | No | 5,268 | 3190.68
WASH | Yes | 10,359 | 11188.27
Sustainable Water and Waste | No | 1,649 | 859.74

III. Mapping of Clusters to Thematic Areas

The granular nature of the ML-generated clusters enables the high-level estimation of financial support for thematic issues, such as the WBG GCPs, the IDA20 Special Themes, and the IDA Cross-Cutting Issues. Our subject matter experts manually identified clusters that were specifically related to these broad thematic areas based on the ML-generated name of each cluster. The identified clusters were then aggregated to create estimates of the financial support going to each of these thematic areas in 2021. These estimates are presented in Figure 6 and discussed in Section V, “Key Findings”. The full list of all ML-generated clusters and their mapping to the select thematic issues is available upon request.
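For reference, the seven steps described in this appendix can be summarized in a schematic orchestration. Every component below is a stub standing in for the corresponding service (fasttext-langdetect, Azure Translator, Flair preprocessing, all-MiniLM-L6-v2, Fast Clustering, GPT-4 naming), and all names are ours:

```python
def run_pipeline(records, detect, translate, preprocess, embed, cluster, name_cluster):
    """Schematic sketch of the seven-step framework; each argument is a
    stand-in for the component described in the appendix."""
    # Step 1 - assign a synthetic unique id, since CRS records lack one.
    for uid, rec in enumerate(records):
        rec["uid"] = uid
    # Step 2 - detect language; translate non-English text.
    for rec in records:
        if detect(rec["text"])["lang"] != "en":
            rec["text"] = translate(rec["text"])
    # Step 3 - strip boilerplate keywords and geographic entities.
    texts = [preprocess(rec["text"]) for rec in records]
    # Step 4 - embed the merged text.
    embeddings = embed(texts)
    # Step 5 - fast clustering over the embeddings.
    clusters = cluster(embeddings)  # list of lists of record indices
    # Step 6 - name each cluster from up to six representative sentences.
    names = [name_cluster([texts[i] for i in members[:6]]) for members in clusters]
    # Step 7 - map cluster names back to records via the unique id.
    return [{"uid": records[i]["uid"], "cluster": names[c]}
            for c, members in enumerate(clusters) for i in members]

# Toy run with stub components:
out = run_pipeline(
    records=[{"text": "solar energy project"}, {"text": "projet d'énergie solaire"}],
    detect=lambda t: {"lang": "en" if t.isascii() else "fr"},
    translate=lambda t: "solar energy project",
    preprocess=lambda t: t.replace("project", "").strip(),
    embed=lambda texts: texts,       # stand-in for sentence embeddings
    cluster=lambda emb: [[0, 1]],    # stand-in for fast clustering
    name_cluster=lambda sents: "Solar Energy",
)
```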