Policy Research Working Paper 11248

Monitoring Global Aid Flows: A Novel Approach Using Large Language Models

Xubei Luo, Arvind Balaji Rajasekaran, Andrew Conner Scruggs

Development Finance Vice Presidency
November 2025

Abstract

Effective monitoring of development aid is the foundation for assessing the alignment of flows with their intended development objectives. Existing reporting systems, such as the Organisation for Economic Co-operation and Development's Creditor Reporting System, provide standardized classification of aid activities but have limitations when it comes to capturing new areas like climate change, digitalization, and other cross-cutting themes. This paper proposes a bottom-up, unsupervised machine learning framework that leverages textual descriptions of aid projects to generate highly granular activity clusters. Using the 2021 Creditor Reporting System data set of nearly 400,000 records, the model produces 841 clusters, which are then grouped into 80 subsectors. These clusters reveal 36 emerging aid areas not tracked in the current Creditor Reporting System taxonomy, allow unpacking of "multi-sectoral" and "sector not specified" classifications, and enable estimation of flows to new themes, including World Bank Global Challenge Programs, International Development Association-20 Special Themes, and Cross-Cutting Issues. Validation against both Creditor Reporting System benchmarks and International Development Association commitment data demonstrates robustness. This approach illustrates how machine learning and the new advances in large language models can enhance the monitoring of global aid flows and inform future improvements in aid classification and reporting. It offers a useful tool that can support more responsive and evidence-based decision-making, helping to better align resources with evolving development priorities.

This paper is a product of the Development Finance Vice Presidency.
It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at xluo@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Authorized for distribution by Francisco Galrao Carneiro, Manager, Development Finance Vice Presidency, World Bank Group

JEL codes: F35, C38, C55
Keywords: Foreign Aid; Classification Methods: Cluster Analysis; Modeling and Analysis

1 We are grateful to Paulo S. Baioni and Vasanth Kumar Vikram Prakash for their substantive contributions to the technical foundation of this paper, and to Damodaran Venugopal Kothandan, Ani Popiashvili, Sunny Yujuan Sun, Sasha Segeda, and colleagues from the World Bank Group Information and Technology Solutions team and the Technology Innovation Office for their support in the development of the AI and machine learning algorithms.
We thank Akihiko Nishio for overall guidance, and Francisco Galrao Carneiro for valuable suggestions and expert insights throughout the study; Enrique Blanco Armas, Michelle Leonore Fortune, Ashish Makkar, Stela Mocan, Raju Singh, and Long Hanhua Wang for their productive discussions; and all colleagues who participated in the Monitoring Global Aid Flows: An AI & Machine Learning Approach workshop and the Global Aid Data Workshop for their feedback. The content is solely the responsibility of the authors.

I. Introduction

Monitoring development aid contributes to improving the alignment of financial flows with their intended development objectives. As global development priorities evolve, policy makers require monitoring systems that can track commitments not only to traditional sectors such as health, education, and agriculture, but also to cross-cutting and newly emerged thematic domains like climate change, digitalization, and global pandemics. Without the ability to capture financial support to these dynamic areas, aid data risks underrepresenting the full scope of development efforts, potentially contributing to a misalignment between global goals, such as the Sustainable Development Goals (SDGs), and the operational priorities of development cooperation.

The Organisation for Economic Co-operation and Development (OECD) Creditor Reporting System (CRS) 2 is the most widely used and comprehensive database of official aid flows, with over 4 million activity-level records reported since 2000.
Its standardized "purpose codes," designed by the OECD Development Assistance Committee (DAC), enable comparability across financial flows from different donors and over time, making the CRS an indispensable tool for researchers, policy makers, and development practitioners. This categorization follows a top-down approach: the purpose codes indicating which sector an aid activity is intended to support (for example, health or agriculture) are predetermined by the DAC before the reporting period begins. 3 Because of this structured and consistent coding system, the CRS is highly reliable for tracking flows to traditional sector categories. At the same time, it faces inherent challenges when applied to new or cross-cutting priorities. Activities related to new priorities are often dispersed across multiple sectors or grouped into residual categories such as "multi-sectoral/cross-cutting" or "sector not specified." As the Stockholm Environment Institute (2025) describes, "it can be very difficult to compile a picture of financial support for some issues that are of high importance on both national and international agendas, for example sustainable oceans, because they are not in the coding system of the CRS. So, support for activities related to oceans are likely to be split across quite a few sector categories and cannot easily be identified or brought together in any comprehensive way." This reflects a structural shortcoming in how aid activities are classified, pointing to a broader challenge in producing a full and coherent picture of aid flows to emerging thematic areas.

This paper proposes a complementary and novel approach to address this challenge. We develop a bottom-up, unsupervised machine learning framework that leverages recent advances in machine learning and large language models (LLMs) to maximize the value of the textual descriptions accompanying each aid activity in the CRS dataset.
By embedding and clustering the text from nearly 400,000 records from the 2021 dataset, the framework organically re-organizes the CRS data into new groups of similar activities in a coherent manner. Our framework produces 841 granular clusters, which we then group into 80 sub-sectors. To enhance interpretability, recent advances in LLMs are used to generate meaningful names for the machine-derived clusters.

Our approach makes three main contributions. First, it identifies 36 emerging areas of aid activity that are absent from the current CRS taxonomy, including food security, digitalization, and pandemics. Second, it unpacks ambiguous categories, reallocating records reported as "multi-sectoral" or "not specified" into more precise clusters. Third, it enables estimation of commitments to development priority areas, including the World Bank Group's Global Challenge Programs (GCPs) and International Development Association (IDA)-20 Special Themes and Cross-Cutting Issues.

2 See: Converged Statistical Reporting Directives for the Creditor Reporting System (CRS) and the Annual DAC Questionnaire (OECD, 2023), https://one.oecd.org/document/DCD/DAC/STAT(2023)9/FINAL/en/pdf
3 In OECD CRS reporting, "the sector of destination of a contribution is determined by the reporting entity answering the question 'which specific area of the recipient's economic or social structure is the transfer intended to foster?' and assigning a 'purpose code' which reflects this. The sector classification does not refer to the type of goods or services provided by the donor. Sector specific education or research activities (e.g., agricultural education) or construction of infrastructure (e.g., agricultural storage) should be reported under the sector to which they are directed, not under education, construction, etc." (see: OECD, 2019, https://web-archive.oecd.org/2019-03-28/80638-purposecodessectorclassification.htm).
Validation against both CRS benchmarks and IDA commitments data underscores the robustness of the methodology.

The remainder of this paper is structured as follows. Section II reviews the literature on aid monitoring and machine learning applications in development. Section III outlines the methodology, including data processing, clustering, and validation. Section IV presents results and compares them to existing CRS classifications and IDA commitments. Section V discusses our key findings and their policy implications. Section VI concludes by highlighting the broader significance of machine learning and LLMs for the future of aid monitoring and reporting.

II. Literature Review

Effective monitoring of development aid has long been recognized as complex, resource-intensive, and sensitive to the priorities of different stakeholders (Flogstad & Hagen, 2017; Djankov et al., 2009; Easterly & Pfutze, 2008). Donors and agencies use varied systems for classifying and reporting aid flows, which complicates comparability across organizations. For instance, Flogstad and Hagen (2017) show that bilateral agencies track climate finance differently according to internal policies, making cross-agency comparisons difficult. The fragmented nature of donor operations and differences in reporting practices contribute substantially to the challenges of measuring aid flows consistently.

The OECD CRS is the most widely used and comprehensive database for official aid flows, with standardized purpose codes that provide comparability across donors and time (Nunnenkamp et al., 2013). Its top-down classification system is robust, but adapting purpose codes to new or cross-cutting themes can be slow and resource-intensive (Toetzke, Banholzer, & Feuerriegel, 2022).
The original purpose code taxonomy was created alongside the CRS in 1973, and since 2010, only two new purpose codes have been added: "Refugees/asylum seekers in donor countries" in 2018 (OECD, 2019) and "Covid-19 control" in 2021 (Casadevall Bellés and Calleja, 2024). As Grollmen et al. (2018) and WHO (2015) note, inconsistencies sometimes emerge when different entities interpret the reporting standards differently, particularly in health. Pincet et al. (2019) similarly highlight the importance of developing complementary instruments to maximize the value of CRS data.

Against this backdrop, scholars have applied machine learning techniques to aid data, broadly following two approaches. The first, more common, is a supervised learning approach, where models are trained on pre-labeled data to identify aid flows to predefined topics. For example, Pincet et al. (2019) estimate support to the SDGs, Dixit et al. (2022) focus on global public goods for health, and Borst et al. (2022) and Toetzke, Stünzi, & Egli (2022) apply supervised methods to estimate climate finance. While effective for specific purposes, these models are limited in scope and demand substantial amounts of human effort to create the training datasets. A model trained to identify climate finance therefore cannot be repurposed to track flows to digitalization or other themes without re-training on new labeled datasets.

The second approach is unsupervised learning, which allows algorithms to organically group aid activities into clusters without pre-labeled training data. This approach was pioneered in the CRS context by Toetzke, Banholzer, & Feuerriegel (2022), who applied unsupervised methods to CRS data from 2000–2019, generating 173 "activity clusters." These revealed emerging themes, such as "youth empowerment" and "conservation of wetlands," that were not explicitly captured by CRS purpose codes.
The strength of unsupervised learning lies in its ability to detect new trends directly from the data, without requiring categories to be predefined. However, such models are statistical in nature and require validation by subject matter experts (SMEs) to ensure coherence and accuracy. For example, Toetzke et al. (2022) engaged nine domain experts to review their machine-generated activity clusters for accuracy and coherence.

Our project contributes to this literature by extending the exploratory, bottom-up approach of Toetzke, Banholzer, & Feuerriegel (2022). We apply unsupervised learning to the 2021 CRS dataset to generate 841 clusters, grouped into 80 sub-sectors, and complement this with advances in LLMs to name the machine-derived clusters. This not only helps to identify emerging aid areas but also allows us to unpack multi-sector and unallocated aid, and to estimate flows to cross-cutting themes such as the World Bank Group's GCPs and IDA20 Special Themes and Cross-Cutting Issues. In this way, our study demonstrates how machine learning can enhance the use of CRS data, offering policy makers and practitioners new tools to extract additional insights from existing information.

III. Methodology

At the pilot stage, we explored three potential machine-learning frameworks: the "Fast Clustering" 4 method, the "HDBSCAN" 5 method, and the "k-means" 6 method, using a sample of 50,000 records from the 2021 CRS dataset. Each approach combined clustering with LLMs to generate names for the machine-derived clusters (see Figure 1), in addition to the different tools, platforms, and services for data processing (see table A.1 in the appendix for more details). We compared the performance of the frameworks by applying a qualitative 'grading' rubric 7 to a random sample of 220 records. The "Fast Clustering" framework performed best during this evaluation, producing clusters that were more internally coherent than those of the other two methods.
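The threshold-based "fast clustering" idea can be illustrated with a minimal sketch. This is not the production pipeline: the actual framework operates on sentence embeddings of the CRS text fields with calibrated thresholds, whereas the function name `fast_cluster`, the toy 2-D vectors, and the parameter values below are purely illustrative assumptions.

```python
import numpy as np

def fast_cluster(embeddings, threshold=0.9, min_cluster_size=2):
    """Greedy threshold-based clustering: each unassigned record seeds a
    candidate cluster of all records whose cosine similarity to the seed
    meets the threshold; groups below min_cluster_size go to an outlier bucket."""
    # Normalize rows so that a dot product equals cosine similarity.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    unassigned = set(range(len(X)))
    clusters, outliers = [], []
    while unassigned:
        seed = min(unassigned)
        sims = X @ X[seed]
        members = [i for i in sorted(unassigned) if sims[i] >= threshold]
        if len(members) >= min_cluster_size:
            clusters.append(members)
        else:
            outliers.extend(members)
        unassigned -= set(members)
    return clusters, outliers

# Toy 2-D "embeddings": two tight groups plus one stray record.
vecs = np.array([[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 0.98], [-1.0, 0.0]])
clusters, outliers = fast_cluster(vecs, threshold=0.9)
```

In contrast to k-means, no cluster count is fixed in advance, and records too dissimilar to any seed fall into the outlier bucket, mirroring the "Other" cluster described in Step 5 below.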
We then applied the chosen framework to the entire 2021 CRS dataset, covering 395,306 activity-level records. Each record contains textual descriptions of the project or activity and the corresponding financial commitment. 8

4 "Fast clustering" allows for an efficient clustering of records into groups based on a pre-defined "similarity-score" threshold between records. See: Rajendran et al., 2017.
5 The "HDBSCAN" method is a hierarchical density-based clustering algorithm which allows for finding clusters of differing densities. See McInnes et al., 2017.
6 The "k-means" approach is the most popular clustering method and the one used by Toetzke et al. (2022) in their paper. It clusters all records into a pre-defined number (k) of groups.
7 Each record in the sample was assigned a score from 0 to 4 based on the degree of coherence between its textual description and the name of its assigned ML-generated cluster.
8 Please see Section 1 of the Appendix for more details of the 7-step machine learning methodology.

Figure 1: Snapshot of ML Methodology to Process OECD CRS Data

Step 1 – Data Preparation. We download the 2021 CRS data 9 from the OECD website 10 and assign a unique identifier to each record for traceability.

Step 2 – Language Harmonization. Non-English text in project descriptions is translated to English to ensure comparability. This includes "Project Title", "Purpose Name", "Short Description" and "Long Description".

Step 3 – Preprocessing. We remove irrelevant information such as country names, agency identifiers, and generic placeholder text (e.g., "no description," "multi-sector") to ensure that clustering focuses on substantive content.

Step 4 – Embedding. Textual descriptions are converted into numerical vectors using a pre-trained sentence embedding model.
This model is trained to represent sentences in a way that captures their underlying meaning, allowing the algorithm to cluster together records with similar content instead of simply matching exact words.

9 It should be noted that for this paper we use the CRS 2021 reporting cycle data that was updated in January 2024. The CRS historical data is updated periodically by the OECD and partners.
10 See: OECD Data Explorer, Creditor Reporting System (CRS), https://www.oecd.org/en/publications/creditor-reporting-system_22180907.html.

Step 5 – Clustering. Records are grouped into similar clusters using the "Fast Clustering" algorithm. Parameters such as similarity thresholds and minimum cluster size are calibrated through multiple iterations to balance granularity with interpretability. Our final model produced 841 activity clusters. Records that did not meet similarity thresholds were grouped in an outlier cluster ("Other"). 11

Step 6 – Naming Clusters. To improve interpretability, we use LLMs to generate descriptive names for each cluster. For each cluster, representative records are provided to the model, which proposes a concise label (see the Appendix for prompts and process).

Step 7 – Validation. A sample of clusters is manually validated by reviewing records within clusters to confirm coherence with the generated names. This ensures alignment between machine-generated outputs and subject-matter understanding. 12

Our primary iteration for analysis generated 841 clusters covering a wide range of aid activities. For example, agriculture-related records were grouped into clusters such as Irrigation Infrastructure, Crop Production, and Pastoral Livestock Farming. In health, the model identified granular categories such as Health Personnel Training, Vaccine Development and Production, Tuberculosis Control, and Malaria Control and Prevention.

IV. Results

Validation

Our machine learning framework produced 841 activity clusters from the 2021 CRS dataset.
In this section, we present validation exercises and illustrate how the new clusters provide deeper insights into the global aid architecture compared to existing CRS classifications. The discussion is structured around (i) internal validation against CRS benchmarks, (ii) external validation against IDA commitment data, and (iii) illustrative findings from the clusters.

1. Internal Validation

Due to the statistical and unpredictable nature of unsupervised learning models, separate runs of the same AI/ML model on the same dataset will not reproduce exactly the same clusters. A key test of the framework is therefore whether the generated clusters are broadly consistent across executions while also maintaining granularity. To assess the stability of our model, we compared the semantic similarity between the results of the framework across two different executions. Similarly named clusters from the two executions were paired, and a 'cosine similarity score' 13 was generated for each pair. In this comparison, we found that only one matched cluster pair scored under 80%, while more than three-quarters (627 out of 822 matched cluster pairs) exceeded a 90% similarity score (figure 2). This shows strong consistency between iterations, suggesting that the clustering process is stable despite some minor, expected variations between rounds.

11 For a full discussion of the clustering process, see the Methods note in Section 1 of the Appendix.
12 Furthermore, we manually validated the quality of the names produced in step 6 for each cluster, and we edited 48 problematic cluster names. Some examples of these problematic names were names which were overly technical or included jargon (ex: "ICASS Bill Payments"), names which were overly complex (ex: "Gender-responsive Climate Finance And Energy Efficiency"), and names which were overly generic (ex: "Maternal").
We then re-named these clusters after manually examining a sub-set of records in each cluster and validating the topic of those clusters (in these examples, the names were changed to "Bill Payments for Donor Office Costs", "Gender-responsive Climate Finance", and "Technical Assistance for Maternal and Child Health", respectively).
13 In the context of machine learning, cosine similarity evaluation is used to assess the similarity between clusters.

Figure 2: Frequency distribution of cosine similarity scores between two rounds of AI/ML clustering

  Cosine similarity score of paired clusters | Number of paired clusters
  ≤ 0.80                                     |   1
  (0.80, 0.825]                              |   2
  (0.825, 0.85]                              |  22
  (0.85, 0.875]                              |  36
  (0.875, 0.90]                              | 135
  (0.90, 0.925]                              | 178
  (0.925, 0.95]                              | 199
  > 0.95                                     | 250

We also conducted manual validation by reviewing a random sample of records across ten clusters. In almost all cases, records were both coherent within clusters and consistent with the LLM-generated labels. For example, in the Vaccine Development and Production cluster, nearly all projects were directly related to immunization research, procurement, or delivery, with minimal unrelated content.

To better understand how our results compare with the existing CRS classification, we manually assigned each ML-generated cluster to a CRS sector group. 14 Each cluster was assigned to a CRS sector group based on the name of the generated cluster and the inclusion criteria of the CRS sectors according to the OECD. For example, the cluster named "Higher Education Scholarships" was assigned to the Education sector group, while the cluster named "Solar Photovoltaic Power Generation" was assigned to the Energy sector group. We then summed up the number of records, as well as the dollar amount of commitments, for all clusters assigned to each sector.
This allows us to gain a broad sense of how the new ML-generated organization of the 2021 CRS data compares to the original CRS categorization. Figure 3 presents the results of this benchmarking analysis.

14 For more details on this validation exercise, see Section II of the Appendix.

Figure 3: Comparison of estimation of CRS Sector Groups using ML-Generated Clusters to Original CRS Sector Groups (2021 data, US$ millions in 2021 prices). The figure comprises two charts: "Difference in Total Commitment Flows by Sector (Original CRS vs ML-Generated Sectors)" and "Difference in Total Record Count by Sector (Original CRS vs ML-Generated Sectors)", each plotting original and ML-generated totals alongside the percentage difference for each sector.

The comparison illustrates that our ML-generated clusters broadly align with the high-level CRS sector groupings. For most sectors, the difference between the grouping of the ML-generated clusters and the CRS sectors is within 20%, whether measured by dollar amount or by number of records, as illustrated by the two charts in figure 3. For example, in many large sectors, such as Education, Government & Civil Society, and Health, we see very small differences in the number of records (1%, 5%, and 1%, respectively). This supports the overall accuracy of the newly generated clusters while leaving room for the differences needed to extract new information.
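At its core, this benchmarking is an aggregate-and-compare exercise: sum commitments under each classification and compute the per-sector difference. A minimal sketch with made-up records follows; the tuple layout, sector labels, and dollar values are illustrative assumptions, not the CRS schema.

```python
from collections import defaultdict

# Hypothetical records: (original CRS sector, ML-assigned sector, commitment in US$ millions).
records = [
    ("Health", "Health", 120.0),
    ("Health", "Health", 80.0),
    ("Multi-Sector", "Health", 40.0),   # a multi-sector record the ML framework reallocates
    ("Education", "Education", 200.0),
]

def totals(records, key_index):
    """Sum commitments by the sector label found at key_index."""
    out = defaultdict(float)
    for rec in records:
        out[rec[key_index]] += rec[2]
    return dict(out)

crs_totals = totals(records, 0)   # totals under the original CRS sectors
ml_totals = totals(records, 1)    # totals under the ML-generated sectors

def pct_diff(sector):
    """Percent difference per sector, as plotted in the benchmarking charts."""
    orig = crs_totals.get(sector, 0.0)
    new = ml_totals.get(sector, 0.0)
    return 100.0 * (new - orig) / orig if orig else float("inf")
```

In this toy example the ML reallocation of one multi-sector record raises the Health total from 200 to 240, a 20% difference, while Education is unchanged; the same comparison applied per sector yields the bars in figure 3.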
Most of the large differences lie in the small sectors, such as Debt Relief, where we observe a 68% difference in the number of records between our ML-generated clusters and the original CRS sector classification.

2. External Validation

Beyond internal tests, we also validated the framework against an independent benchmark: the International Development Association's (IDA's) commitments across sectors. This provides a way to test whether the new clusters align with a donor's commitments to sectors from a data source outside the OECD CRS. For this exercise, we use the IDA lending by sector report, which provides the Major Sector as well as the Sector (disaggregated view) mapping for IDA operations. We matched the Major Sectors directly to the ML-generated clusters where possible and used the Sectors (disaggregated) where the Major Sectors did not have a direct pair. Where neither Major Sectors nor Sectors had a matching pair with our ML-generated results, we treated them as residual sectors without a pair. Four IDA Sectors did not have a corresponding pair and were left as residual, standalone sectors. The mapping is provided in Table 1.

We see a similar distribution across larger sectors such as Health, Energy, Agriculture, Education, and WASH. However, there are also significant differences, notably for Social Infrastructure/Social Protection and Government & Civil Society/Public Administration. This may be because these domains as produced by our ML framework are broader than the World Bank sector definitions (as in the case of Social Infrastructure/Social Protection) or vice versa (Government & Civil Society/Public Administration). We also have categories such as Humanitarian Aid and Multi-Sector/Cross-Cutting as part of the machine framework results, which are not part of the World Bank's sector list.
This validation demonstrates that unsupervised clustering, despite being data-driven, can approximate donor-reported thematic allocations with a high degree of accuracy. Importantly, this shows that our framework does not displace CRS reporting but rather complements it by enabling additional thematic tracking without requiring new reporting burdens.

Table 1: Comparison of ML Generated Sectors with IDA Commitments Data

  ML Generated Sectors (OECD CRS-IDA)   Commitments (US$, millions) | WB's IDA Major Sectors/Sectors 1/   Commitments (US$, millions)
  Other Social Infrastructure Services    6,881 | Social Protection              5,504
  Health                                  5,188 | Health                         5,019
  Energy                                  4,444 | Energy & Extractives           4,113
  Agriculture, Forestry, and Fishing      3,257 | Agriculture                    3,787
  Education                               2,986 | Education                      3,156
  WASH                                    2,017 | Water/Sanit/Waste              2,254
  Humanitarian Aid                        1,635 |
  Transport & Storage                     1,554 | Transportation                 1,859
  Govt. & Civil Society                   1,302 | Public Admin                   4,464
  Multi-Sector/Cross-Cutting              1,141 |
  Trade                                     842 | Trade                          1,140
  Banking & Financial Services              808 | Financial Sector               1,180
  General Budget Support & Assistance       587 |
  Communications Infrastructure             345 | ICT Infrastructure               656
  Business & Other Services                 287 |
  Tourism                                   185 | Tourism                          130
  Industry, Mining & Construction           151 | Mining                           111
  Population and Reproductive Health         98 |
  Refugee in Donor Countries                 68 |
  Sectors not Specified                      47 |
  Debt Relief                                46 |
  Housing Construction                       24 | Information & Communication      473
  Grand Total                            33,871 |                               33,871

1/ In order to map the relevant IDA sector categories to the ML-generated sectors, we first looked at the level 2 (aggregated) Major Sectors and matched them to the relevant sector pair in the ML-generated sector column (e.g., Health, Energy, Social Protection). Where we could not find an exact match at level 2 in the IDA dataset, we used level 1 (disaggregated) sectors (e.g., ICT Infrastructure, Mining). In these cases, the appropriate deductions were made from the level 2 Major Sectors to which the level 1 sectors belonged.

V. Key Findings

Our machine learning framework yielded four sets of findings that complement the CRS by adding granularity and thematic depth. These findings illustrate how the approach helps identify emerging areas of aid, unpack multi-sectoral aid, unpack aid that is unallocated by sector, and generate estimates for cross-cutting priorities that are difficult to monitor using existing purpose codes alone.

1. Emerging Areas Not Tracked in CRS

The first key finding is the identification of 36 emerging aid areas that are not explicitly captured by the current CRS taxonomy. Examples include digitalization, food security, and pandemic preparedness. These topics are dispersed across multiple CRS purpose codes, making them difficult to track systematically through the existing system. By clustering records based on textual descriptions, our framework provides a way to systematically monitor these emerging priorities.

The 841 machine-generated clusters covered a wide range of topics, many of which were highly specific. We manually classified all 841 activity clusters into 80 sub-sectors. The design of these sub-sectors was informed by both the OECD CRS and World Bank sector taxonomies, but also adapted to align with the content of the clusters themselves. Of these 80 sub-sectors, 36 track topics not currently systematically monitored in the existing CRS sector classification system, such as "Food Security", "General Climate", "Private Sector Mobilization", "Digitalization", "Pandemics", and "Employment Creation".

Figure 4: Methodology for creating sub-sectors.

Policy relevance: This finding shows how CRS data can be leveraged to monitor new themes without altering the official purpose code structure, thus reducing reporting burdens while increasing analytical value.

2. Unpacking Multi-Sectoral Aid

The second finding concerns activities classified as Multi-Sectoral/Cross-Cutting in CRS.
Current CRS reporting standards, by their nature, have difficulty tracking flows to multi-sector or cross-cutting areas (Pitt et al., 2018). These categories are broad by design, but their heterogeneity limits their usefulness for policy makers. For example, a project relating to both health and education may be classified into the Health sector, the Education sector, or Multi-Sector/Cross-Cutting, depending on the judgment of the reporting donor. Our model re-assigns such records into more specific clusters. For example, aid activities previously grouped as "multi-sectoral" were redistributed into clusters such as Health and Climate, General Women and Youth, Education and Health, Rural Development, Urban Development, General Environment, and more. Figure 5 presents the amount of commitments attributed to these multi-sectoral sub-sectors by the machine learning framework. This totals $59 billion, with General Women and Youth the largest cross-cutting area ($15 billion), followed by Rural Development ($10 billion), Other Multi-Sector ($10 billion), and General Climate ($6 billion).

Figure 5: ML-generated Multi-Sector/Cross-Cutting Sub-Sectors (US$ millions in 2021 prices). [Bar chart of commitments by multi-sector/cross-cutting sub-sector.]

Policy relevance: This re-mapping improves the interpretability of multi-sectoral commitments and provides policy makers with clearer insights into the actual nature of spending.

3. Unpacking Aid Unallocated by Sector

The third finding involves aid flows for which no sectoral focus was reported. In the 2021 CRS data and original sector classifications, 9,001 records are classified into the "Sector not Specified" CRS sector group. These activities account for a sizable share of reported commitments but provide limited analytical value in their raw form.
Our machine learning framework grouped the majority of these records (5,839 of 9,001, or 65%) into clusters with a clear sectoral focus, while the rest (3,162 records) fell into our outlier sub-sector. 15 Table 2 presents the major clusters into which the machine learning algorithm grouped the records originally classified as “Sector not Specified” in the 2021 CRS data. Table 2 shows that a large share of these records were related to the rescheduling or refinancing of disbursements; the machine learning algorithm distributed many of the other records into areas such as banking and financial services, industry, democratic participation and elections, governance reform and support, social protection and dialogue, and language education.

15 We created one outlier sub-sector for clusters for which we were unable to identify any overarching or unifying theme. This includes the following four clusters: Others, Miscellaneous Grants, Unallocated Contributions, and Disbursement Rescheduling and Refinancing. The latter cluster includes records focused on refinancing loans or changing disbursement timelines, without any clear sectoral focus.
Table 2: Top 10 ML-Cluster Destinations for CRS “Sectors not Specified” Records 16

ML-Generated Cluster | Corresponding Author-Created Sub-Sector | Count of Records | Sum of Commitments ($US millions)
Disbursement Rescheduling and Refinancing | Sectors Not Specified | 591 | (11.90)
Nonferrous Metals Production | Industry, Mining & Construction | 319 | 9.14
Program Support and Operational Costs | Administrative Costs | 298 | 248.20
Strategic Financial Support | Other Banking & Financial Services | 231 | 4.14
Participatory Governance and Local Economic Development | Democratic Participation/Elections | 203 | 1,140.52
Multi-Sectoral Financing | Other Multi-Sector | 189 | 1,538.10
International Election Observation and Democratic Participation | Democratic Participation/Elections | 176 | 341.03
Local Development and Governance Support | Governance Reform and Support | 173 | 43.34
Social Dialogue Support | Social Protection and Dialogue | 171 | 43.00
English Language Skills Development | Language Education and Research | 169 | 116.32

Policy relevance: By reducing the share of aid recorded under the category of “Sectors not Specified”, this approach enables more accurate attribution of activities to their relevant sectors, thereby enhancing the clarity and usefulness of aid data.

4. Estimating Flows to Cross-Cutting and Thematic Priorities

Finally, we use our framework to estimate financial support to cross-cutting themes high on the international agenda. We conducted this exercise for the five IDA20 Special Themes, the four IDA Cross-Cutting Issues, and the six WB Global Challenge Programs (GCPs). Table 3 shows the 26 ML-generated clusters mapped to the IDA20 Special Theme of “Climate Change”. Clusters mapped to this thematic issue include Agricultural Policy and Climate Risk, Solar Energy Solutions and Applications, and Water Policy and Climate Change. These 26 clusters total 16,937 records and over $25 billion in commitments (see Table 3).
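The theme-level totals are obtained by summing record counts and commitments over the clusters an expert mapped to each theme. A minimal sketch of that aggregation, using plain Python with illustrative data (the record structure and variable names are our own assumptions, not the CRS schema):

```python
# Illustrative records: (ML-assigned cluster name, commitment in $US millions).
records = [
    ("Climate Change Adaptation Strategies", 2546.49),
    ("Renewable Energy Investment", 1200.00),
    ("Basic Education", 500.00),
    ("Renewable Energy Investment", 1057.94),
]

# Expert mapping of cluster names to the theme (a small subset, for illustration).
climate_clusters = {
    "Climate Change Adaptation Strategies",
    "Renewable Energy Investment",
}

# Aggregate every record whose cluster was mapped to the theme.
matched = [amount for cluster, amount in records if cluster in climate_clusters]
estimate = sum(matched)   # total commitments attributed to the theme
count = len(matched)      # number of underlying records
```

Because whole clusters are mapped rather than individual records, records with thematic relevance that sit in clusters with non-thematic names are missed, which is why these estimates are best read as lower bounds.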
The full list of the mapping of all clusters to these thematic priorities is available upon request. Figure 6 presents the results of this analysis, including the estimated aid flows to each thematic area based on this methodology. 17 Based on these estimates, over $25 billion of commitments in 2021 were directed toward the IDA20 Special Theme “Climate Change”, while over $42 billion was committed toward the GCP of “Enhanced Health Emergency Prevention, Preparedness and Response.” 18

16 Table 2 only presents the subset of records classified as “Sectors not Specified” in the CRS data and includes the ten clusters with the most records from this CRS sector group. Information on the entirety of those ML-generated clusters, which also include records from other CRS sector groups, is available upon request.

17 The limitations of these methods should be noted, particularly the likelihood that our estimates understate the total flows to these thematic issues. For example, while the majority of records in “Agricultural Productivity and Climate Adaptation” may be related to climate, we cannot determine how many records in other clusters, despite having non-climate-related names, also pertain to climate.

Table 3: Mapping of ML-Generated Clusters to IDA20 Special Theme “Climate Change”

ML-Generated Cluster Mapped to “Climate Change” | Count of Records | Commitments ($US millions)
Climate Change Adaptation Strategies | 1,891 | 2,546.49
Agricultural Productivity and Climate Adaptation | 1,504 | 1,174.87
Climate-smart Agricultural Financing | 1,407 | 1,973.13
Agricultural Development and Solar Energy Utilization | 1,378 | 1,050.39
Renewable Energy Investment | 1,132 | 2,257.94
Hydropower Plant Security and Efficiency | 844 | 3,747.35
Solar Photovoltaic Power Generation | 765 | 2,577.26
Agricultural Development and Climate Monitoring | 765 | 306.44
Agricultural Sector Development and Climate Resilience | 714 | 594.90
Renewable Energy Development and Efficiency | 688 | 250.77
Renewable Energy Transition and Efficiency | 675 | 418.21
Green Energy Transition and Efficiency | 578 | 1,377.08
Climate Change Adaptation and Disaster Risk Reduction | 539 | 1,250.10
Disaster Risk and Climate Change Adaptation | 535 | 1,261.91
Climate Policy Implementation and Biodiversity Conservation | 513 | 445.48
Climate Adaptation in Agriculture | 457 | 286.98
Water Supply Systems and Energy Generation | 385 | 1,034.19
Water Policy and Climate Change | 361 | 511.99
Solar and Geothermal Energy Development | 316 | 730.35
Health Policy and Environmental Improvement | 286 | 86.43
Agricultural Policy and Climate Risk | 245 | 247.76
Climate-smart Agriculture and Food Security | 234 | 161.39
Food Security and Climate Health Research | 229 | 40.03
Solar Energy Solutions and Applications | 223 | 443.73
Gender-responsive Climate Finance | 214 | 489.84
Environmental Health Risk Reduction | 59 | 3.50
Total Climate Change Estimate | 16,937 | 25,268.49

18 The estimates created for these high-level thematic issues are approximate and likely understate flows to these areas.
A future area of work is to create more accurate estimates for these thematic issues.

Figure 6: Estimation of 2021 Commitments to WB GCPs, IDA20 Special Themes, and IDA20 Cross-Cutting Issues (in $US billions, 2021 prices)

Policy relevance: This demonstrates how bottom-up clustering can complement CRS classifications by enabling the monitoring of financial support to cross-cutting thematic areas that are high on the international agenda and not easily captured by existing purpose codes.

VI. Conclusion

These results demonstrate how machine learning methods can extract additional value from aid data and help mitigate existing challenges in aid monitoring. Our bottom-up, unsupervised methodology organically groups aid activities, enabling the identification of emerging themes that are not easily captured through a top-down classification system. This re-grouping also provides a way to unpack activities previously coded as “Multi-Sector/Cross-Cutting” or “Sector Not Specified”, categories which, while useful, limit analytical depth. The granularity of our ML-generated clusters further allows for high-level estimation of commitments to thematic areas that are otherwise difficult to track. In this report, we applied the framework to the World Bank Group’s GCPs, the IDA20 Special Themes, and the IDA Cross-Cutting Issues. The resulting estimates illustrate how a data-driven, complementary approach can enrich the monitoring of global aid flows.

Despite these contributions, several limitations should be acknowledged. Because unsupervised clustering does not produce a single “correct” grouping, it is impossible to validate each of the 395,306 records individually. To address this, we compared subsets of machine-generated clusters against CRS sector classifications and validated them at the sectoral level using both financial volume and record counts.
While these checks support the internal validity of the approach at a high level, uncertainty remains at the level of the individual record. In addition, when we scale up the analysis to include additional years of CRS data, the machine-generated clusters will also change. A key strength of this organic, bottom-up methodology is its ability to adaptively capture unexpected patterns in the data, which might be stifled if outcomes from previous iterations were rigidly enforced. However, this also means that the results from the ML framework are not perfectly replicable, and incorporating additional data will lead to different clusters than those produced in previous iterations. Therefore, the outcomes from an expanded dataset may not be directly comparable to those from this initial one-year exercise. This poses challenges for longitudinal analysis and for tracking changes over time, as each addition of data will reshape the clusters.

While our unsupervised method allows us to estimate funding to thematic areas, such as those identified by the World Bank Group’s GCPs, the IDA20 Special Themes, and the IDA Cross-Cutting Issues, these estimates are inherently high-level and likely understate the true volume of flows to these areas. This is because the estimates are based on aggregating clusters rather than assigning each record individually to specific thematic issues. For example, while we may assume that most or all records in the cluster named “Climate Adaptation in Agriculture” are climate related, there is no way to be sure that projects with climate adaptation aspects are not also included in other clusters without climate-explicit names. As such, it is likely that our method underestimates the number of records that are relevant to each theme.

These limitations point to opportunities for refinement. Future work could combine unsupervised clustering with record-level tagging to improve accuracy in thematic estimation.
For example, records could be directly classified against predefined themes (e.g., jobs, digitalization, the IDA Special Themes) in order to provide more precise and replicable results.

From a policy perspective, the framework illustrates how existing CRS data can be leveraged without altering official reporting requirements. By uncovering emerging topics, unpacking residual categories, and generating thematic estimates, machine learning provides a flexible complement to the CRS. This can strengthen monitoring of global aid flows in real time and support bilateral donors, multilateral institutions, and policy makers in efforts to align resources with evolving development priorities. It can also provide recipient countries with better insights into aid flows, helping to inform dialogue and coordination around external financing.

References

Annen, K. & Kosempel, S. (2009). Foreign aid, donor fragmentation, and economic growth. The B.E. Journal of Macroeconomics, 9(1), Article 33. Available at: http://www.bepress.com/bejm/vol9/iss1/art33

Borst, J., Wencker, T. & Niekler, A. (2022). Using text classification with a Bayesian correction for estimating overreporting in the Creditor Reporting System on climate adaptation finance. arXiv preprint. Available at: https://arxiv.org/pdf/2211.16947.pdf

Casadevall Bellés, S. & Calleja, R. (2024). The evolution of the ODA accounting rules. CGD Note No. 376. Center for Global Development, 28 June. Available at: https://www.cgdev.org/publication/evolution-oda-accounting-rules

Development Co-operation Directorate, Development Assistance Committee, Working Party on Development Finance Statistics. (2021). Converged statistical reporting directives for the Creditor Reporting System (CRS) and the Annual DAC Questionnaire. OECD. Available at: https://one.oecd.org/document/DCD/DAC/STAT(2020)44/FINAL/En/pdf

Dixit, S., Mao, W., McDade, K.K. & Schäferhoff, M. (2022).
Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques. Unpublished manuscript.

Djankov, S., Montalvo, J.G. & Reynal-Querol, M. (2009). Aid with multiple personalities. Journal of Comparative Economics, 37(2), 217–229. https://doi.org/10.1016/j.jce.2008.09.005

Dupriez, O. (2018). An empirical comparison of machine learning classification algorithms. World Bank. Available at: http://pubdocs.worldbank.org/en/666731519844418182/PRT-OD-presentation-V2.pdf

Easterly, W. & Pfutze, T. (2008). Where does the money go? Best and worst practices in foreign aid. Journal of Economic Perspectives, 22(2), 29–52.

Flogstad, C. & Hagen, R.J. (2017). Aid dispersion: Measurement in principle and practice. World Development, 97, 232–250. https://doi.org/10.1016/j.worlddev.2017.04.022

Goldblatt, R., Stuhlmacher, M., Hostert, P. & Fensholt, R. (2017). Using Landsat and nighttime lights for supervised pixel-based image classification of urban land cover. Remote Sensing of Environment, 205, 253–275. https://doi.org/10.1016/j.rse.2017.11.026

Gounden, C., Irvine, J. & Wood, R. (2015). Promoting food security through improved analytics. Procedia Engineering, 107, 335–336. https://doi.org/10.1016/j.proeng.2015.06.089

Gualberti, G. & Wilcks, J. (2021). Measuring official development finance for digitalisation. OECD iLibrary. Available at: https://www.oecd-ilibrary.org/sites/0b50096a-en/index.html

Jean, N., Burke, M., Xie, M., Davis, W.M., Lobell, D.B. & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794. https://doi.org/10.1126/science.aaf7894

Kimura, H., Mori, Y. & Sawada, Y. (2012). Aid proliferation and economic growth: A cross-country analysis. World Development, 40(1), 1–10.

Lu, X., Wrathall, D., Sundsøy, P., Iqbal, A., Wetter, E., Qureshi, T. & Bengtsson, L. (2016).
Unveiling hidden migration and mobility patterns in climate stressed regions: A longitudinal study of six million anonymous mobile phone users in Bangladesh. Global Environmental Change, 38, 1–7. https://doi.org/10.1016/j.gloenvcha.2016.02.002

McInnes, L., Healy, J. & Astels, S. (2017). hdbscan: Hierarchical density-based clustering. Journal of Open Source Software, 2(11), 205. https://doi.org/10.21105/joss.00205

McKenzie, D. & Sansone, D. (2017). Man vs. machine in predicting successful entrepreneurs: Evidence from a business plan competition in Nigeria. World Bank Policy Research Working Paper. Available at: http://documents.worldbank.org/curated/en/968231513116778571

Marineau, J.F. (2023). Disaggregating foreign aid: What have we learned from research on sub-national foreign aid. Commonwealth Review of Political Science, 6(1), Article 3.

Micah, A.E., Solorio, J., Stutzman, H. & others. (2022). Development assistance for human resources for health, 1990–2020. Human Resources for Health, 20, 51. https://doi.org/10.1186/s12960-022-00744-x

Mohri, M., Rostamizadeh, A. & Talwalkar, A. (2018). Foundations of Machine Learning. MIT Press.

Nunnenkamp, P., Öhler, H. & Thiele, R. (2013). Donor coordination and specialization: Did the Paris Declaration make a difference? Review of World Economics, 149(3), 537–563.

OECD. (2009). Comparative study of data reported to the OECD Creditor Reporting System (CRS) and to the Aid Management Platform (AMP). OECD. Available at: http://www.oecd.org/3dac/stats/43908328.pdf

OECD. (2019). CRS reporting checklist (DCD/DAC/STAT(2019)27). OECD Development Co-operation Directorate, Development Assistance Committee.

OECD. (2021). Purpose codes: Sector classification. OECD. Available at: https://www.oecd.org/development/financing-sustainable-development/development-finance-standards/purposecodessectorclassification.htm

Parthasarathy, R., Rao, V. & Palaniswamy, N. (2017).
Deliberative inequality: A text-as-data study of Tamil Nadu's village assemblies. World Bank Policy Research Working Paper. Available at: http://documents.worldbank.org/curated/en/582551498568606865

Pincet, A., Okabe, S. & Pawelczyk, M. (2019). Linking aid to the Sustainable Development Goals – A machine learning approach. OECD Working Paper. Available at: https://www.oecd-ilibrary.org/docserver/4bdaeb8c-en.pdf

Pitt, C., Grollman, C., Martinez-Alvarez, L., Arregoces, L. & Borghi, J. (2018). Tracking aid for global health goals: A systematic comparison of four approaches applied to reproductive, maternal, newborn, and child health. The Lancet Global Health, 6(9), 859–874.

Rajendran, E., Kalaiprasath, R. & Udayakumar, R. (2017). A fast-clustering algorithm for high-dimensional data. International Journal of Civil Engineering and Technology, 8(10), 1220–1227.

Stockholm Environment Institute. (2025). About the OECD’s CRS data. Available at: https://aid-atlas.org/about/data

Toetzke, M., Banholzer, N. & Feuerriegel, S. (2022). Monitoring global development aid with machine learning. Nature Sustainability, 5, 533–541. https://doi.org/10.1038/s41893-022-00874-z

Toetzke, M., Stünzi, A. & Egli, F. (2022). Consistent and replicable estimation of bilateral climate finance. Nature Climate Change, 12(10), 897–900. https://doi.org/10.1038/s41558-022-01482-7

World Health Organization. (2015). State of inequality: Reproductive maternal newborn and child health: Interactive visualization of health data. WHO.

Appendix: Supplementary Methodological Notes

I. Pilot Phase and ML-Framework Selection

The analysis was conducted using the Creditor Reporting System (CRS) dataset from the Organisation for Economic Co-operation and Development (OECD). The dataset provides project-level information on global development aid flows as reported by participating donor countries to the OECD.
Aid activities are updated by donor organizations at the end of each year, and the dataset is updated by the OECD annually. Each activity within the OECD CRS dataset includes a project title, a short description, a long description, and a purpose name. Textual descriptions are usually written in the official language of the donor organization.

In our initial exploration, three different approaches were considered (see Table A.1). A preliminary run on a random sample of 50,000 English records from 2021 was conducted with each, in order to evaluate each approach’s quality of outcomes, technical evaluation options, and machine learning training inputs.

Table A.1: Considered Machine Learning Frameworks

The authors then manually graded the quality of clustering using a sub-sample of 220 records and their corresponding ML-generated cluster assignments. Each record in the sub-sample was assigned a score from 0 to 4 based on the degree of coherence between its textual description and the name of its assigned ML-generated cluster. Approach 2 (Fast Clustering Methodology) was then chosen over Approach 1 (HDBSCAN) and Approach 3 (K-Means Methodology) for the following reasons:

1) The Fast-Clustering methodology performed noticeably better during manual evaluations than the HDBSCAN methodology;
2) The Fast-Clustering methodology, unlike K-Means, which requires pre-determining the number of sub-sectors, offers the advantage of organically deriving the number of sub-sectors from the data itself, aligning with the project’s goal of creating a truly bottom-up understanding of aid activities;
3) The Fast-Clustering methodology allows for the creation of a true outlier sector, while the K-Means approach does not; and
4) The Fast-Clustering methodology allows three parameters to be tuned to calibrate the size and number of clusters generated, which the K-Means methodology does not.
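The organic derivation of the number of clusters in point 2 can be made concrete. Below is a simplified, pure-NumPy sketch of the fast-clustering idea; the actual pipeline uses the sentence-transformers implementation on real sentence embeddings, and the function name and toy data here are our own:

```python
import numpy as np

def fast_cluster(embeddings, threshold=0.80, min_community_size=2):
    """Simplified sketch of fast clustering: each point's "community" is the
    set of points whose cosine similarity to it exceeds `threshold`;
    communities smaller than `min_community_size` are dropped, and overlapping
    communities are deduplicated greedily. (The pipeline in this paper uses
    threshold=0.80 and min_community_size=50 on 384-dimensional embeddings.)"""
    # Normalize rows so dot products equal cosine similarities.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    communities = []
    for i in range(len(X)):
        members = np.where(sim[i] >= threshold)[0]
        if len(members) >= min_community_size:
            communities.append(set(members.tolist()))
    # Greedy dedup: keep the largest communities; drop already-assigned points.
    communities.sort(key=len, reverse=True)
    assigned, unique = set(), []
    for comm in communities:
        comm = comm - assigned
        if len(comm) >= min_community_size:
            unique.append(sorted(comm))
            assigned |= comm
    return unique  # leftover points remain outliers, unlike in K-Means

# Two tight groups plus one outlier: the number of clusters (two) is derived
# from the data itself, and point 4 is left unassigned as an outlier.
pts = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.99], [-1.0, -1.0]])
clusters = fast_cluster(pts, threshold=0.95, min_community_size=2)
```

By contrast, K-Means would force all five points, including the outlier, into a pre-specified number of clusters.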
Table A.2 provides additional details on the individual components of this chosen approach.

Table A.2: Components of Chosen Machine Learning Framework – Approach 2 (Fast Clustering)

Compute – Azure Databricks: a unified analytics platform offered by Microsoft Azure, developed in collaboration with Databricks. It provides a managed Apache Spark environment for big data processing and analytics.

Embedding – all-MiniLM-L6-v2: a sentence-transformer model that maps sentences and paragraphs to a 384-dimensional dense vector space.

Clustering – Fast Clustering Algorithm: an algorithm tuned for large datasets. In a large list of sentences, it searches for local communities; a local community is a set of highly similar sentences.

Language – Azure Translator Service: a cloud-based neural machine translation service that is part of the Azure AI services family and can be used with any operating system.

Naming Service (LLM Model) – Azure OpenAI (GPT-4): a fully managed service that allows developers to easily integrate OpenAI models into their applications.

Subsequently, the sample was enlarged to all aid activities in 2021, a total of 395,306 records, to further validate, calibrate, and improve the machine learning framework. In Section III of this paper, we broadly outlined this finalized, seven-step, ML-clustering framework. Here, we provide additional methodological notes, expanding on the technical aspects of each step:

Step 1 – Data Preparation. The CRS data is manually downloaded from the OECD site and placed in the Azure Blob as “.CSV” files. Since the CRS data does not have any unique identifier, we generate and assign a random unique number to each CRS record in this first step for back-mapping and identification purposes.

Step 2 – Language. “Fasttext-langdetect” is an open-source library that works as a wrapper for the language detection model trained on fastText by Facebook. 19 The library can recognize 250+ languages.
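The filtering logic of this step can be sketched as follows. The `detect_fn` stub below stands in for `ftlangdetect.detect`, which returns a dict such as `{"lang": "en", "score": 0.97}`; the record keys are our own illustrative assumptions, not the CRS column names:

```python
def split_by_language(records, detect_fn, text_key="long_description"):
    """Partition records into English ones (kept as-is) and non-English ones
    (to be sent to the translation service)."""
    english, needs_translation = [], []
    for rec in records:
        # fastText models expect single-line input, so newlines are flattened.
        result = detect_fn(rec[text_key].replace("\n", " "))
        (english if result["lang"] == "en" else needs_translation).append(rec)
    return english, needs_translation

# Stub detector standing in for ftlangdetect.detect (avoids the model download):
def fake_detect(text):
    return {"lang": "en" if "project" in text else "fr", "score": 0.99}

recs = [{"long_description": "water project"}, {"long_description": "projet d'eau"}]
en, other = split_by_language(recs, fake_detect)
```

In the pipeline, the non-English partition is then passed to the Azure AI Translator Service, as described below.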
The columns “Project Title”, “Purpose Name”, “Short Description” and “Long Description” within the CRS data were used for this analysis. The open-source language detection is first run on these columns, and non-English values are filtered separately. The Azure AI Translator Service is used for the actual translation of non-English text. Azure Translator Service is a cloud-based neural machine translation service that is part of the Azure AI services family and can be used with any operating system.

Step 3 – Preprocessing. Names and keywords such as "resilience", "resilient", "resiliency", "no description", "semi-aggregates", "non-sector allocable", "aid-activity", "usaid", "multi-sector", "aid activities", "multisector", "USAID", "aid", "activity", "developing countries", "multisectoral", "africa", "antarctica", "asia", "europe", "north america", "south america", "australia", "african", "portuguese", "asian", "american", "european", "australian" are removed from the descriptive text columns. These words are removed wherever they appear in the key text columns used for the analysis; the “Sector Name” column (the fifth column) is also considered. The key columns are then merged into a single column for easier analysis. Flair, a powerful NLP library, is used to remove geographical information from the text, including country and city names, multilateral and bilateral agency names, and other defined keywords. Flair allows for the application of state-of-the-art natural language processing (NLP) models to a text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical data, and sense disambiguation and classification, with support for a rapidly growing number of languages.

19 See: https://fasttext.cc/docs/en/language-identification.html.

Step 4 – Embedding. The “all-MiniLM-L6-v2” model is used as the sentence-transformer model for embedding.
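A sketch of how the merged text column from Step 3 feeds this model follows. The record keys are illustrative stand-ins for the CRS column names; the commented call shows the sentence-transformers API, which produces one 384-dimensional vector per text:

```python
def build_embedding_text(record):
    """Merge the key CRS text columns (Step 3) into the single string that is
    embedded in Step 4. Keys are illustrative, not the exact CRS headers."""
    keys = ("project_title", "purpose_name", "short_description", "long_description")
    parts = [record.get(k, "").strip() for k in keys]
    return " ".join(p for p in parts if p)  # skip empty columns

texts = [build_embedding_text({
    "project_title": "Solar mini-grids",
    "purpose_name": "Energy generation, renewable sources",
    "short_description": "",
    "long_description": "Off-grid solar power for rural clinics",
})]

# With sentence-transformers installed, the embedding step is then:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embeddings = model.encode(texts)  # array of shape (len(texts), 384)
```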
It maps sentences and paragraphs to a 384-dimensional dense vector space, which is required for our task of clustering textual descriptions.

Step 5 – Clustering. Clustering is done on the embeddings generated in the previous step through the “Fast Clustering” algorithm, calibrated for large datasets (50,000 sentences in less than 5 seconds). The algorithm searches for local communities (sets of highly similar sentences) within the large dataset. We configure the cosine-similarity threshold that defines when sentences count as similar. The algorithm also allows users to define the minimal size for a local community and thereby derive either large coarse-grained clusters or small fine-grained clusters. Table A.3 presents the parameters chosen for the model. Key parameters include:

1) Threshold: sentence pairs with a cosine similarity above this value are considered similar; a high threshold will only find extremely similar sentences, while a lower threshold will also find sentences that are less similar;
2) Min_Community_Size: only communities with at least this number of sentences will be returned;
3) Batch_Size: a technical parameter whereby the total observations are separated into batches and processed in parallel; and
4) Sort Max Size: the maximum size for a community.

Cosine similarity thus serves as the metric for deciding which data points count as “nearest” neighbors: points with higher similarity are treated as nearest neighbors, while points with lower similarity are not. Based on the above settings, around 600–800 clusters are generated with the 2021 CRS data. Once the clusters are generated, we map them back to the original CRS data using each record’s unique identifier.

Table A.3: Key Parameters for Fast Clustering Algorithm

Threshold | Min_Community_Size | Batch_Size | Sort Max Size
0.80 | 50 | 1024 | 100

Step 6 – Naming. For each cluster, six representative sentences are identified and provided to Azure OpenAI to generate the name for that cluster.
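A sketch of the naming request follows. The prompt wording is our own illustration, not the production prompt; the commented call shows the general shape of an Azure OpenAI chat-completion request:

```python
def build_naming_prompt(representative_sentences):
    """Assemble a prompt asking the LLM to name a cluster from its
    representative sentences (wording is illustrative)."""
    bullets = "\n".join(f"- {s}" for s in representative_sentences)
    return (
        "The following sentences describe aid activities grouped into one cluster.\n"
        "Reply with a short, specific name (at most six words) for the cluster.\n"
        + bullets
    )

prompt = build_naming_prompt([
    "Rehabilitation of rural feeder roads",
    "Periodic road maintenance in remote districts",
])

# The pipeline sends such a prompt to the Azure OpenAI chat completions
# endpoint, broadly along these lines (deployment_name is a placeholder):
#   client.chat.completions.create(
#       model=deployment_name,
#       messages=[{"role": "user", "content": prompt}])
```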
We leverage the gpt-4-32k (0613) model, which names each cluster using the six representative sentences.

Step 7 – Postprocessing. Only the required columns are selected and written to the final output file.

II. Machine-Generated Clusters

Our machine learning framework grouped all records in the 2021 CRS data into 841 distinct clusters with unique names. In order to gain insight into how this new machine learning categorization compared to the original CRS sector classification, our subject matter experts manually classified each cluster into an existing CRS sector. This validation exercise is discussed in Section IV, “Results Validation”. We then also grouped each cluster into new “Sub-Sectors”. This grouping was informed by the OECD CRS and World Bank sector taxonomies, as well as the judgment of the authors. This approach allows the identification of emerging trends that are not immediately evident in the full list of 841 machine-generated clusters.

Table A.4 presents the full list of the 80 new sub-sectors. It includes the total number of records and commitments aggregated into each sub-sector from the clusters that were grouped together. Additionally, the table indicates whether these sub-sectors represent topics that are already tracked by existing CRS sectors or whether they highlight new emerging topics. The full table of all 841 machine-generated clusters and their mapping to both the CRS Sector Groups and our new, author-created sub-sectors is available upon request.

Table A.4: Newly Created Sub-Sectors
Sub-Sector Aggregated from ML-Generated Clusters | Tracked in CRS Sectors? | Count of Records | Sum of Commitments ($US Millions)
NGO's, Volunteers, Partnerships | No | 6,576 | 503.71
Administrative Costs | Yes | 20,350 | 10100.18
Agriculture | Yes | 7,793 | 6228.81
Forestry | Yes | 3,216 | 2407.23
Pest Control and Animal Health | No | 2,316 | 1382.98
Fisheries | Yes | 2,175 | 1777.93
Food Security | No | 4,062 | 3967.01
Agriculture and Climate | No | 7,217 | 6241.31
Private Sector Mobilization | No | 1,254 | 1418.98
Other Banking & Financial Services | Yes | 4,855 | 16116.14
Business & Other Services | Yes | 2,640 | 2085.01
Digitalization | No | 2,875 | 2211.82
Other ICT Infrastructure | Yes | 5,373 | 1378.01
Construction | Yes | 286 | 203.74
Debt Relief | Yes | 849 | 1105.65
Language Education and Research | No | 1,617 | 364.16
Global Leadership Development | No | 2,421 | 808.36
Tertiary Education | Yes | 5,698 | 4981.72
Basic Education | Yes | 4,006 | 2507.82
Secondary and Vocational Training | Yes | 5,354 | 3951.48
International Student Mobility | No | 1,187 | 343.99
Specialized Training | No | 1,591 | 1308.42
Other Education | Yes | 8,967 | 5969.56
Research | Yes | 1,539 | 755.38
Non-renewable Energy | Yes | 1,883 | 6748.16
Other Energy | Yes | 3,131 | 9410.81
Renewable Energy | Yes | 5,606 | 12836.88
General Budget Support & Assistance | Yes | 1,459 | 8105.13
Domestic Resource Mobilization | No | 3,316 | 3721.15
Governance Reform and Support | Yes | 2,643 | 2848.70
Democratic Participation/Elections | Yes | 12,110 | 3847.48
Macroeconomic Policies | No | 2,845 | 3434.51
Fragility | Yes | 6,498 | 3109.72
Security Systems | No | 8,342 | 718.87
Judicial Systems | No | 3,319 | 218.55
Public Sector Capacity Building | Yes | 3,634 | 1966.36
Combating Illicit Flows | No | 2,050 | 616.47
Narcotics Control | No | 17,405 | 668.44
Law Enforcement | No | 5,186 | 344.35
Human Rights Protection | No | 6,952 | 5763.82
Migrant Support and Protection | No | 1,807 | 2015.04
Customs and Borders | No | 1,548 | 347.89
Health and Climate | No | 1,064 | 189.53
Health Policy | Yes | 2,803 | 1302.57
Vaccinations | No | 1,029 | 400.33
Professional Health Development | Yes | 1,014 | 287.04
Nutrition | No | 2,135 | 463.85
Health Technical Assistance, Training, Capacity Building | No | 2,769 | 1605.21
Health Financing | Yes | 3,695 | 10849.83
Health Systems | Yes | 10,232 | 5033.18
Other Health | Yes | 9,294 | 5837.86
Pandemics | No | 13,181 | 42214.49
HIV/AIDS | Yes | 14,200 | 6932.11
Maternal and Child Health | No | 5,564 | 2879.80
Reproductive Health | Yes | 7,038 | 3282.34
Emergency Food Assistance | Yes | 4,808 | 11128.44
Emergency Response | Yes | 11,243 | 22127.62
Emergency Preparation | Yes | 4,887 | 7051.62
Industry, Mining & Construction | Yes | 5,793 | 3717.64
Other Multi-Sector | Yes | 5,231 | 9621.84
Rural Development | No | 6,280 | 9695.48
Urban Development | No | 3,245 | 5972.97
General Women and Youth | No | 15,649 | 15167.99
Education and Health | No | 4,046 | 2059.04
General Climate | No | 3,540 | 6060.33
General Environment | Yes | 8,180 | 4256.57
Social Protection and Dialogue | Yes | 9,183 | 9176.65
Employment Creation | No | 2,163 | 394.07
Cultural Heritage Preservation | No | 2,865 | 263.24
Labor Rights | No | 4,955 | 845.08
Refugees | Yes | 2,521 | 9122.02
Sectors not Specified | Yes | 3,162 | 1821.55
Tourism | Yes | 1,475 | 816.34
Trade | Yes | 3,598 | 6972.78
Transport Infrastructure | Yes | 2,679 | 8560.36
Non-land Transportation | Yes | 1,924 | 2209.46
Land Transportation | Yes | 3,262 | 12054.62
Logistics and Transportation Costs | No | 5,268 | 3190.68
WASH | Yes | 10,359 | 11188.27
Sustainable Water and Waste | No | 1,649 | 859.74

III. Mapping of Clusters to Thematic Areas

The granular nature of the ML-generated clusters enables the high-level estimation of financial support for thematic issues, such as the WBG GCPs, the IDA20 Special Themes, and the IDA Cross-Cutting Issues. Our subject matter experts manually identified clusters that were specifically related to these broad thematic areas based on the ML-generated name of each cluster. The identified clusters were then aggregated to create estimates of the financial support going to each of these thematic areas in 2021. These estimates are presented in Figure 6 and discussed in Section V, “Key Findings”. The full list of all ML-generated clusters and their mapping to the select thematic issues is available upon request.
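For reference, the seven steps described in this appendix can be summarized in a schematic orchestration. Every component below is a stub standing in for the corresponding service (fasttext-langdetect, Azure Translator, Flair preprocessing, all-MiniLM-L6-v2, Fast Clustering, GPT-4 naming), and all names are ours:

```python
def run_pipeline(records, detect, translate, preprocess, embed, cluster, name_cluster):
    """Schematic sketch of the seven-step framework; each argument is a
    stand-in for the component described in the appendix."""
    # Step 1 - assign a synthetic unique id, since CRS records lack one.
    for uid, rec in enumerate(records):
        rec["uid"] = uid
    # Step 2 - detect language; translate non-English text.
    for rec in records:
        if detect(rec["text"])["lang"] != "en":
            rec["text"] = translate(rec["text"])
    # Step 3 - strip boilerplate keywords and geographic entities.
    texts = [preprocess(rec["text"]) for rec in records]
    # Step 4 - embed the merged text.
    embeddings = embed(texts)
    # Step 5 - fast clustering over the embeddings.
    clusters = cluster(embeddings)  # list of lists of record indices
    # Step 6 - name each cluster from up to six representative sentences.
    names = [name_cluster([texts[i] for i in members[:6]]) for members in clusters]
    # Step 7 - map cluster names back to records via the unique id.
    return [{"uid": records[i]["uid"], "cluster": names[c]}
            for c, members in enumerate(clusters) for i in members]

# Toy run with stub components:
out = run_pipeline(
    records=[{"text": "solar energy project"}, {"text": "projet d'énergie solaire"}],
    detect=lambda t: {"lang": "en" if t.isascii() else "fr"},
    translate=lambda t: "solar energy project",
    preprocess=lambda t: t.replace("project", "").strip(),
    embed=lambda texts: texts,       # stand-in for sentence embeddings
    cluster=lambda emb: [[0, 1]],    # stand-in for fast clustering
    name_cluster=lambda sents: "Solar Energy",
)
```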