Policy Research Working Paper 10772

From Survey to Big Data
The New Logistics Performance Index

Jean-François Arvis
Daria Ulybina
Christina Wiederer

Macroeconomics, Trade and Investment Global Practice
May 2024

Abstract

The World Bank has published the Logistics Performance Index since 2007. The Logistics Performance Index used to be based exclusively on perception ratings from a global survey of logistics professionals. In 2023, it was augmented with key performance indicators derived from massive global international shipment tracking data (data on container shipping, air cargo, and postal logistics). The new set of indicators measure the speed and connectivity of international supply chains. This paper presents the data sources, rationale, and production of the indicators. It does not discuss the findings from the new indicators, nor does it introduce additional empirical work. The paper complements the 2023 issue of Connecting to Compete, the companion report to the Logistics Performance Index.

This paper is a product of the Macroeconomics, Trade and Investment Global Practice. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at jarvis1@worldbank.org, cwiederer@worldbank.org, and dulybina@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors.
They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

From Survey to Big Data: The New Logistics Performance Index

Jean-François Arvis, Daria Ulybina, and Christina Wiederer[1]

JEL: F10, R40

[1] The authors are all in the World Bank’s Trade and Regional Integration Unit (ETIRI) and can be reached at jarvis1@worldbank.org, dulybina@worldbank.org, cwiederer@worldbank.org.

Table of Contents

Abstract
Abbreviations
1. Introduction
    What is the World Bank Logistics Performance Index (LPI)?
    Moving towards Big Data
    Digitization and new opportunities: massive tracking data
2. Data sources
    Universal Postal Union (Postal data)
    Cargo iQ (Air cargo data)
    TradeLens (Container tracking data)
    MDST and Marine Traffic (Container ship tracking data)
3. Definition and processing of KPIs
    From tracking data to country indicators
    Definition of the KPI
    Level of availability and country level aggregation
    Data cleaning and consistency
    Potential biases with the data
    Potential for more indicators from the existing data
4. Conclusions and future directions
References
Appendix 1: Container tracking data from TradeLens
Appendix 2: Postal data from the Universal Postal Union (UPU)
Appendix 3: Air cargo tracking data from Cargo iQ
Appendix 4: Container ship tracking datasets
    MDS Transmodal
    Marine Traffic

List of tables

Table 1: Events in the UPU database
Table 2: KPI derived from tracking data in the LPI 2023
Table 3: Sample container voyage
Table 4: Rules that consistent events must follow (TradeLens dataset)
Table 5: Number of observations by event type (TradeLens dataset)
Table 6: Data provisioning requirements by role TradeLens event categories
Table 7: TradeLens event categories
Table 8: List of timestamps (TradeLens dataset)
Table 9: EDI message names and definitions

List of figures

Figure 1: UPU coverage by region and income group
Figure 2: Cargo iQ events sequence
Figure 3: Country coverage of Cargo iQ dataset by WB region and income group
Figure 4: TradeLens supply chain framework
Figure 5: TradeLens dataset: Number of observations by week
Figure 6: Count of consignments by number of events (TradeLens)
Figure 7: High-level network structure of Tradelens dataset
Figure 8: Share of null values in 2019 parcel postal records of destination countries
Figure 9: Network representation of the UPU dataset
Figure 10: Sequence of data events (Cargo iQ dataset)
Figure 11: Maritime dataset country coverage by WB region and income group

Abbreviations

AIS       Automatic Identification System
B2B       Business to Business
e-AWB     Electronic Airway Bill
EMSEVT    Express Mail Service EVenTs
IATA      International Air Transport Association
IMO       International Maritime Organization
KPI       Key performance indicator
LPI       Logistics Performance Index
MOP       Master Operating Plan
OE        Office of Exchange
TE        Transport equipment
TEU       Twenty-foot equivalent unit
ULCV      Ultra-large carrier vessel
UPU       Universal Postal Union

1. Introduction

Logistics combines operations and services related to the movement and storage of goods. Logistics services include freight forwarding; third-party logistics; customs brokerage; road transportation; port, airport and rail operations; and warehousing and refrigeration. Logistics is a critical contribution to economic activity within and across borders.
Efficient logistics helps reduce trade costs and is essential to trade and regional integration. Low international connectivity, inadequate logistics infrastructure, low-quality logistics services, and heavy trade procedures at and behind the border raise logistics costs. Logistics costs amount to 8% of GDP in the United States but rise to 15%-20% in many middle-income countries and up to 30% in low-income countries, landlocked or island states.[2]

Although logistics is mainly a B2B activity, its efficiency depends on policy interventions. Logistics activities extend over space and time, require specific infrastructure (e.g., ports), involve government controls (e.g., customs clearance), and have a large environmental or spatial footprint (e.g., warehouses, multimodal terminals). Logistics performance, a concept introduced by the World Bank (Arvis et al. 2007), is a cross-cutting concern for the public sector and international organizations. It combines interventions in the following areas: infrastructure and provision of infrastructure services, including ports (often in public-private partnership) or land corridors; regulations and market incentives; and border management.[3] Logistics also requires specific attention as a contribution to broader concerns such as spatial planning (e.g., logistics zones), skills and training, urban policies (distribution in urban environments), and especially decarbonization (“green logistics”).[4]

The World Bank introduced the Logistics Performance Index (LPI) in 2007 as a set of country indicators to inform policy makers and practitioners.

What is the World Bank Logistics Performance Index (LPI)?

The World Bank’s Logistics Performance Index is a comprehensive index covering the entire supply chain for between 139 and 160 countries in the 2007 to 2023 editions.
It is based on a survey of nearly 1,000 logistics professionals worldwide and is useful for comparing performance across countries and for identifying and prioritizing broad reform areas within countries. The index is based on numerical ratings from 1 (weakest) to 5 (strongest). The Logistics Performance Index is a weighted average of six components:

1. Efficiency of the clearance process
2. Quality of trade- and transport-related infrastructure
3. Ease of arranging competitively priced international shipments
4. Competence and quality of logistics services
5. Ability to track and trace consignments
6. Frequency with which shipments reach the consignee within the scheduled or expected delivery time

[2] Banomyong et al. 2022.
[3] Arvis et al. 2007; World Bank 2010.
[4] McKinnon 2015.

Respondents rate up to eight countries along those six dimensions. The countries are selected according to the country of operation of the respondent. The country selection engine combines randomness and trade patterns. The LPI indicates how easy respondents perceive it to be to establish efficient and reliable connections between trading countries. Conceptually, the LPI is a revealed supply chain accessibility metric. The LPI expanded on the concept of logistic friendliness introduced by Daley and Murphy (1999). These authors used surveys to measure the attractiveness of cities in the United States as regional hubs of logistics activities.

The choice of a survey-based instrument was in part dictated by necessity. When the LPI was created, "hard" supply chain data was scarce and hardly comparable across economies. Initially, the LPI survey covered lead time to trade or in customs for different modes, with a focus on the respondent’s country of operation. Yet the dispersion of results for a given country was higher than for the perception-based LPI, and global surveys were found inadequate to measure lead time or delays.
The LPI survey was discontinued after the 2023 edition to focus on more general attributes of logistics.

The Logistics Performance Index (LPI) is the primary supply chain benchmark for policymakers and has motivated reforms since 2007. The LPI has been used as a component indicator in several high-level dashboards, including the Sustainable Development Goals and those of the World Economic Forum.[5]

Moving towards Big Data

The perception-based approach had shortcomings. LPI scores reflect a quantification of qualitative perceptions on an ordinal scale and are thus subject to noise. Sampling is nonrandom since respondents choose whether to participate, possibly leading to selection bias. This creates difficulties when comparing small changes between countries and within countries over time. An additional issue relates to year-on-year comparisons: respondents grade performance on a qualitative scale that could be affected by indexing issues, meaning that respondents’ standards of what constitutes “high” or “low” efficiency and quality of logistics services can differ. In each edition, several countries could not be featured in the LPI because too few assessments from logistics service providers based outside the country in question were obtained. Survey fatigue was a growing issue.

In recent years, supply chain data has changed from scarce to abundant. Digitalization of supply chains expanded swiftly. Shipments are tracked from origin to destination in many industries and across borders. This has been driven by the rapid digitalization of logistics processes (e.g., Internet of Things) and the growing need for traceability, in parallel with the fast pace of automation in public agencies (e.g., port community system, customs automation). Consequently, global datasets are available, albeit mode-specific.
The 2023 edition introduces quantitative key performance indicators (KPIs), based on shipment tracking datasets, measuring the speed of trade around the world. The KPIs are derived from millions of international movements of containers, aviation shipments, or postal parcels. The KPIs measure delays at ports and airports or assess international connectivity (e.g., number of international connections). These KPIs, measured in days or simple counts, are relatable to policy makers and practitioners concerned with the performance of key logistics hubs or gateways, such as ports or airports.

[5] See https://sdgs.un.org/goals and https://www.weforum.org/publications/the-global-enabling-trade-report-2016/

Digitization and new opportunities: massive tracking data

Since the first release of the Logistics Performance Index in 2007, the data ecosystem of supply chains has changed radically. Tracking data is now abundant and, at least for some modes, comes from globally consistent sources. Much of this change is driven by the digitization of supply chains, in which shipments and assets are located in real time. This knowledge allows the owner of the goods, or its agent, to make timely decisions on purchasing, inventory management, routing or rerouting. Digitization has transformed supply chain management techniques, improving operational efficiency and flexibility and reducing costs across supply chains.[6]

Data interoperability across supply chain participants is essential to supply chain digitization: third-party logistics operators should exchange data seamlessly with carriers (e.g., shipping lines). Industry initiatives to normalize supply chain data or create shared tracking repositories are critical to the fast adoption of digitization. For instance, in the area of container shipping, the Digital Container Shipping Association[7] promotes common standards, while the now discontinued TradeLens was an attempt at a common digital platform.
Digitization of supply chain operations generates massive, granular, high-frequency datasets. The volume of information on cargo, vessel and vehicle movements in time and space offers new opportunities to analyze supply chains and logistics services provision at the national level and globally.

To construct the new set of indicators for the LPI 2023, the World Bank collaborated with external data providers in the logistics industry. The data comprised high-frequency, micro-level datasets, namely:

(i) dwell time and connectivity of containerized trade based on consignment activities from TradeLens;
(ii) speed of international parcel movements from the Universal Postal Union (UPU);
(iii) air cargo tracking from Cargo iQ (IATA);
(iv) container lines shipping service deployment information from MDS Transmodal; and
(v) turnaround time at port generated from worldwide containership port calls from Automatic Identification System (AIS) data provider Marine Traffic.

The data accessed is pure tracking data: a series of events for a given shipment, each with a time stamp and a location (e.g., port or airport code). It does not include the price paid for freight and logistics services, nor the value or nature of the merchandise. The partners’ databases may aggregate waybill or manifest information where monetary information is available. As this information is commercially sensitive, it has been excluded from the partnerships with the World Bank.

The tracking data is exhaustive regarding long-distance international trade (the number of observations is in the tens of millions). It covers three major types of international trade: container trade, air cargo and parcels. Bulk shipping is not included. However, bulk logistics is much less representative of countries’ performance than container trade. The available tracking data is less representative of cross-border trade between countries in the same trading bloc than of long-distance trade.
Cross-border trade is carried by trucks (semi-trailers) or rail. Tracking systems for trucks and freight trains exist only at the country or regional level. Lacking a global repository, these modes (road and rail) are not yet included in the new KPIs. However, corridor performance information is available from container tracking data to/from inland destinations.

[6] Supply chain digitalization has been reviewed many times, e.g., https://supplychaindigital.com/digital-supply-chain/mckinsey-5-point-plan-on-supply-chain-digital-transformation.
[7] https://dcsa.org/newsroom/streamlining-international-trade

The rest of the paper is organized as follows. Section 2 summarizes the data sources and presents the indicators. Section 3 presents the production process of the indicators. Appendices describe the datasets, their cleaning methodology and descriptive statistics.

2. Data sources

Universal Postal Union (Postal data)

The volume of e-commerce has surged in the past decade. According to McKinsey projections,[8] by 2030, cross-border e-commerce in goods will grow to between $1 trillion and $2 trillion of merchandise value from its current $300 billion, resulting in significant changes in supply chains. The volume of e-commerce activities was equivalent to 30% of global GDP in 2019,[9] so its role in economic development cannot be overlooked.

Information on postal services, including data on postal infrastructure, processing volumes, and speed, makes it possible to assess the performance of logistics, customs, and delivery services at the country level. In many countries, postal networks constitute an essential infrastructure and play a key role in guaranteeing access to universal communication services and global value chains. This is particularly valuable for the youth, women, and small businesses in vulnerable, marginalized, and rural areas.
According to research conducted by UPU in collaboration with UN Global Pulse, various characteristics of the international postal network are closely related to countries’ economic and social advancement, playing an important role in their trade development.[10]

The extent to which international e-commerce is interconnected with postal services was pointed out by a McKinsey report[11] published in March 2022, specifying that the UPU is responsible for handling about two-thirds of all letter-parcel (up to 2 kilograms) deliveries across borders. The Universal Postal Union (UPU) also manages international express shipments, which account for about 5%-10% of international e-commerce volumes, and the rest is delivered by commercial operators. Therefore, information collected by UPU is a source of comprehensive data for over 190 member countries, and probably the best source of information on e-commerce trade.

UPU develops and maintains technical standards and Electronic Data Interchange (EDI) message specifications used in the exchange of electronic information between postal services. UPU’s databases of EDI messages contain detailed information on volumes, frequencies, key cross-border activities and other tracking data on postal items. This information is available via the EMSEVT (Express Mail Service EVenTs) messaging standard, which is used to track parcels (packages up to 30 kg), letters (letter-post items and packages up to 2 kg), and express mail flows in the UPU network.

[8] https://www.mckinsey.com/industries/travel-logistics-and-infrastructure/our-insights/signed-sealed-and-delivered-unpacking-the-cross-border-parcel-markets-promise
[9] https://unctad.org/news/global-e-commerce-jumps-267-trillion-covid-19-boosts-online-sales
[10] Hristova et al. 2016
[11] https://www.mckinsey.com/~/media/mckinsey/industries/travel%20logistics%20and%20infrastructure/our%20insights/the%20endgame%20for%20postal%20networks%20how%20to%20win%20in%20the%20age%20of%20e%20commerce/the_endgame_for_postal_networks_how_to_win_in_the_age_of_e-commerce.pdf

Table 1: Events in the UPU database

Message ID   Event description
Exporting events
A            Posting/Collection
B            Arrival at outward office of exchange
C            Departure from outward office of exchange
Importing events
D            Arrival at inward office of exchange
E            Held by Import Customs
F            Departure from inward office of exchange
G            Arrival at delivery office
H            Attempted/Unsuccessful delivery
I            Final delivery
Transit events
J            Arrival at transit office of exchange
K            Departure from transit office of exchange

The processing and handling sequence of cross-border shipments is reported in Table 1. For an e-commerce item, after a shopper places an order, the shipper hands the item to the origin post (code A). The post inducts the item into its domestic network, where it passes through several handling, sortation, and transport processes (code B). At the origin Office of Exchange (OE), the item is placed in a receptacle for international dispatch to the destination OE, in which it departs from the country of origin (code C). After two potential transit events (codes J and K), the item arrives at the destination (code D), where it is unloaded and handed over to the destination post. The E event denotes the process of separating items from the bundle in which they were shipped, as well as item retrieval and clearance through customs. The destination post inducts the item into its domestic network for processing and potential relocation to the delivery office, from which final delivery to the customer takes place (code I).
Unsuccessful deliveries are instead recorded using event H. Each of the handoffs described above is supported by data capture methods such as barcode scanning and computerized entries. The different supply chain partners use this data to generate and exchange the applicable EDI messages in compliance with agreed standards and business rules designed by UPU.

With about 1 billion records per year, EMSEVT provides data for building quality-of-service KPIs that postal operators use to assess their efficiency. The time difference between A and B measures collection time; the difference between B and C is the processing time at origin; C-D captures international transit; and D-F is the dwell time at the destination office of exchange, including customs clearance. The difference between D and H/I captures the performance of postal activities at destination. For the LPI, the focus was on availability and performance at destination, making events D and H/I the primary events of interest for assessing the performance of postal services and last-mile deliveries. This indicator also has the best country coverage. The KPIs published in 2023 retain only parcel data, not letter data.

The dataset was constructed for the whole calendar year 2019 and aggregated to the country level by computing, for each parcel, the time difference between events and summarizing the lead time of parcel movements with statistics such as the mean, median and interquartile range. The sample comprised countries with more than a hundred unique inbound parcel shipments. The sample included 132 countries from all World Bank geographical regions and income groups (see Figure 1). Forty percent of low-income countries were represented in the postal dataset, while other income categories were represented at between 61 and 68 percent.
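The leg definitions and the country-level aggregation described above can be sketched as follows. This is a minimal illustration rather than the authors' production code: the function names, the dictionary layout of a parcel's events, and the choice to end the destination leg at the earlier of H or I are assumptions made here for the example. The threshold parameter mirrors the "more than a hundred" inbound parcels rule.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median, quantiles

# Legs named in the text, as (start code, end code) pairs of EMSEVT events.
LEGS = {
    "collection": ("A", "B"),
    "origin_processing": ("B", "C"),
    "international_transit": ("C", "D"),
    "destination_oe_dwell": ("D", "F"),
}

def days_between(start, end):
    """Elapsed days between two timestamps; None if one is missing or the order is wrong."""
    if start is None or end is None or end < start:
        return None
    return (end - start).total_seconds() / 86400

def leg_durations(events):
    """events: dict mapping an event code ('A'..'K') to its timestamp (hypothetical layout)."""
    legs = {name: days_between(events.get(a), events.get(b))
            for name, (a, b) in LEGS.items()}
    # Assumption: the D-to-H/I leg ends at the earlier of H (attempt) or I (delivery).
    ends = [events[c] for c in ("H", "I") if c in events]
    legs["destination_delivery"] = days_between(
        events.get("D"), min(ends) if ends else None)
    return legs

def country_summary(parcels, min_parcels=100):
    """parcels: iterable of (destination_country, events) pairs."""
    by_country = defaultdict(list)
    for country, events in parcels:
        d = leg_durations(events)["destination_delivery"]
        if d is not None:
            by_country[country].append(d)
    summary = {}
    for country, values in by_country.items():
        if len(values) > min_parcels:  # keep countries with enough inbound parcels
            q1, _, q3 = quantiles(values, n=4)
            summary[country] = {"n": len(values), "mean": mean(values),
                                "median": median(values), "iqr": q3 - q1}
    return summary
```

Medians and interquartile ranges are less sensitive than means to the long right tail that delivery times typically have, which may be one reason several statistics are reported side by side.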
The geographical coverage was more evenly distributed, with all but one region having over half of the countries represented in postal statistics. Only Sub-Saharan Africa had less than 50 percent of its members represented in UPU’s information. A detailed overview of this dataset is available in Appendix 2.

Figure 1: UPU coverage by region and income group
[Two bar charts. First panel: percentage of WB countries covered in each region (East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa). Second panel: percentage of all WB countries covered in each income category (High income, Low income, Lower middle income, Upper middle income).]

Cargo iQ (air cargo data)

In addition to the postal flow dataset, the LPI 2023 includes indicators on air cargo logistics. The operational framework of the air cargo industry, including cargo handling by carriers and airports, is rooted in the same EDI protocol as the postal network: the main events of the supply chain follow a similar logical ordering (Figure 2). The dataset was provided by Cargo iQ, a nonprofit group affiliated with the International Air Transport Association (IATA) that was created in 1997 to develop a system of shipment planning and performance monitoring for air cargo based on common business process and milestone definitions.[12] Cargo iQ focuses on the digitalization and exchange of the e-AWB (electronic airway bill). Cargo iQ brings together over 60 participants, including forwarders, air carriers, ground handling companies, road carriers and airports, which work together to define the standards for shared processes and planning and to control and evaluate the performance of cargo shipments. Most major carriers participate in the initiative. Annually, Cargo iQ collects over 110 million data lines, of which 12 million are airport-to-airport shipments.
These records, covering information for about 650 airports and 184 countries and accounting for 45 percent of global air freight volume, were used to construct the aviation pillar of the LPI 2023.

Figure 2: Cargo iQ events sequence
DEP (shipment departure from origin / last departure point) → ARR (shipment arrival at transit / destination) → RCF (cargo received in warehouse at transit / destination) → NFD (notification of readiness for delivery of cargo to consignee/agent) → DLV (cargo delivered to consignee/agent)

Cargo iQ’s reporting system captures the path of air cargo shipments defined by a master operating plan (MOP) consisting of 19 milestones.[13] These events along the supply chain capture the origin-destination movement of cargo between and at origin airports, destination airports and transit locations. Figure 2 shows the milestones at the destination airport that were available for the LPI 2023. A shipment is tracked from the departure of the flight (DEP) to its arrival (ARR) and check-in at a warehouse at the destination airport (RCF), followed by the advisory to the consignee of the freight’s arrival (NFD) and the consignee’s final collection of the freight from the carrier at the destination airport (DLV).

As with the other shipment tracking databases, it is the carrier’s responsibility to enter the data in the system. The time differences between the milestones provide information on the reliability and performance of individual carriers, freighters and operators and, at the aggregated level, of individual airports and countries. To avoid leakage of sensitive information on specific carriers operating on lanes with a total market share exceeding 80 percent, or on lanes where the total number of carriers is below 3, 46 countries were excluded from the final dataset, bringing the total number of countries reported in the LPI 2023 aviation pillar to 141.
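One possible reading of the confidentiality rule above is sketched below. The paper does not spell out the exact procedure, so everything here is an assumption: the function name, the tuple layout of a shipment record, the interpretation of the 80 percent threshold as the share of the largest carrier on a lane, and the use of shipped volume as the basis for shares. How flagged lanes translate into excluded countries is not covered.

```python
from collections import defaultdict

def sensitive_lanes(shipments, max_share=0.80, min_carriers=3):
    """Flag lanes where one carrier dominates or too few carriers operate.

    shipments: iterable of (lane, carrier, volume) tuples (hypothetical layout),
    where lane is any hashable key such as an (origin, destination) pair.
    """
    volume = defaultdict(lambda: defaultdict(float))
    for lane, carrier, vol in shipments:
        volume[lane][carrier] += vol

    flagged = set()
    for lane, by_carrier in volume.items():
        total = sum(by_carrier.values())
        # Assumed reading: compare the largest carrier's share with the threshold.
        top_share = max(by_carrier.values()) / total if total > 0 else 1.0
        if len(by_carrier) < min_carriers or top_share > max_share:
            flagged.add(lane)
    return flagged
```

Flagging at the lane level before aggregating keeps a published country statistic from being traceable to a single carrier's operations.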
[12] https://www.cargoiq.org/value-proposition
[13] https://www.iata.org/en/programs/cargo/cargoiq/

The indicators extracted from Cargo iQ are based on a pair of milestones, NFD and DLV, i.e., the timestamps of the notification of the documents’ readiness and of the final handing over of the physical shipment to the freight forwarder. These milestones have the highest compliance levels. The time elapsed between events NFD and DLV was computed for each e-AWB recorded in the system at the destination country, provided the time difference was valid (both timestamps have to exist and the time difference between them has to be positive). Destination country aggregates, presented as part of the aviation pillar of natural unit indicators, were constructed from these time differences derived for each e-AWB. The country-level aggregates computed were means, medians, and absolute square deviations.

Figure 3 shows the country coverage of the final subset of the Cargo iQ dataset. As was the case with the survey-based LPI, low-income countries have the least coverage (only a quarter of low-income countries (LICs) are available in the aviation statistics). Geographical coverage is lower overall for Cargo iQ than for UPU. East Asia & Pacific, South Asia and Sub-Saharan Africa all have about 35-40 percent of countries covered. Overall, slightly less than half of the World Bank member countries have records sourced from Cargo iQ information on aviation supply chains and logistics. For a more detailed overview of Cargo iQ data, refer to Appendix 3.
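The validity condition and the first two country aggregates can be written compactly as below. The notation is introduced here for illustration and does not appear in the paper: i indexes e-AWBs, c a destination country, and t denotes a milestone timestamp. The dispersion measure is left out because the text does not define it precisely. The snippet assumes the amsmath package for \text and \operatorname.

```latex
\[
\Delta_i = t_i^{\mathrm{DLV}} - t_i^{\mathrm{NFD}}, \qquad
V_c = \bigl\{\, i : \mathrm{dest}(i) = c,\;
      t_i^{\mathrm{NFD}} \text{ and } t_i^{\mathrm{DLV}} \text{ both recorded},\;
      \Delta_i > 0 \,\bigr\}
\]
\[
\bar{\Delta}_c = \frac{1}{|V_c|} \sum_{i \in V_c} \Delta_i, \qquad
\tilde{\Delta}_c = \operatorname{median}\{\Delta_i : i \in V_c\}
\]
```

Under this notation, an e-AWB with a missing milestone or a non-positive difference simply falls outside the set V_c and does not contribute to either aggregate.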
Figure 3: Country coverage of Cargo iQ dataset by WB region and income group
[Two bar charts. First panel: percentage of countries covered in each WB region (East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa). Second panel: percentage of countries covered in each income group (High income, Low income, Lower middle income, Upper middle income).]

TradeLens (container tracking data)

TradeLens, started in 2018 and discontinued in 2023, was a data and document sharing platform bringing together shipping lines and other participants in container logistics. The project was a collaboration between IBM and GTD Solution, a division of shipping conglomerate Maersk. TradeLens brought together over 1,000 entities involved in the global supply chain, including over 200 ports and terminals and over 15 customs authorities, and by mid-2022 was facilitating the information exchange of about 65% of containerized trade.

TradeLens data model

TradeLens used a data model with three related classes: Consignments, Transport Equipment and Shipments. It tracked consignments, transport equipment (containers) and shipments, while managing the identifiers and relationships between them. The TradeLens data structure allowed a consignment to be sequentially in multiple transport equipment (ship, rail, barges, truck), along with other consignments. It also allowed transport equipment to be part of multiple consignments.

The LPI 2023 dataset retrieval for this report was defined by the data provider for a period of 6 months between May 1, 2022, and October 31, 2022. The sample contained timestamps for 11 events for four transport modes (Ocean, Road, Barge, and Rail) and two load statuses (Full or Empty), associated with over 3 million unique tracked consignments (container voyages) and over 30 million observations in total.
The dataset covers over 11,000 UN/LOCODEs 14 (including destinations, origins, and live locations, i.e., the locations of specific events' timestamps).

14 https://unece.org/trade/cefact/unlocode-code-list-country-and-territory

Process framework and definitions of phases

The conceptual framework for processing the data is presented in Figure 4 and in more detail in Appendix 1. The dataset contained a sequence of events and their timestamps for each consignment. On average, there were 9.8 events associated with each consignment. Additional static information on each consignment includes UN/LOCODEs for origin and for destination.

Compared to aviation and postal tracking data, the container data is more complex. Unlike for UPU or Cargo iQ, events do not happen in a strict sequential order. Transshipments and land corridors at origin and destination increase the number of possible combinations. The sequence of a container is as follows (possible but not compulsory legs are in italics):

1. An empty container is sent from a depot to the exporter's premises where it is stuffed (sending from the depot is a registered event, not the stuffing)
2. The full container arrives at an inland facility
3. The container leaves the inland facility (by rail, barge, or by default truck)
4. Processes 2-3 are repeated at an intermediate facility
5. The full container arrives at the port of export
6. The container is loaded on vessel
7. The vessel departs
8. The vessel arrives at transshipment port
9. The container is unloaded from vessel
10. The container is loaded on vessel
11. The vessel departs the transshipment port
12. Processes 8-11 are repeated at other transshipment port
13. The vessel arrives at destination port
14. The container leaves the destination port
15. The container arrives at an inland facility
16. The container leaves the facility
17. Processes 15-16 are repeated at other inland facility
18. The empty container arrives back at a depot 15

For each consignment, the raw data was partitioned into phases of the supply chain processes: export, transshipment, and import. Hence the export phase included events numbered 1 through 7 in the sequence above, the transshipment phase encompassed events 8 through 12 (and as many repeated transshipment port visits as were documented), and the import phase covered stages 13 until the last record existing for the specified consignment.

The export phase was identified as the collection of events, at consignment level, that precede the first occurrence of loaded on vessel or vessel departure at the location of origin (country level). Usually, these include Gate-out (Empty) and Gate-in (Full) events, with the latter event's location being close to the vessel's departure and loading locations. This phase was the source of LPI KPIs such as export dwell time and export corridor time (defined in more detail in the next section).

With the same principle in mind, the import phase corresponds to all events occurring after the last occurrence of Vessel Arrival or Discharge from Vessel at the location of destination (country level). Events in this category include Gate out (Full) and Gate in (Empty) (return of an empty container), contributing to indicators such as import dwell time and import corridor time.

The final phase is associated with transshipments and is defined as the collection of events occurring between the export and import phases (between the first occurrence of Loaded on Vessel/Departure and the last occurrence of Arrival/Discharge from vessel) where the active UN/LOCODE is different from the UN/LOCODE of origin and destination. All three phases include timestamps for Rail and Barge legs of the voyage in some cases. Using the definitions above, a collection of natural unit indicators was constructed for each phase of the process, using the timestamps of the events to compute time differences for segments of the consignment voyage.
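The phase definitions above can be sketched as a small labelling function. This is an illustration under assumptions, not the procedure used for the report: the tuple layout, the helper name `assign_phases` and the extra "boundary" label (for vessel events at origin and destination that fall in none of the three phases) are hypothetical. Country-level matching uses the first two characters of a UN/LOCODE, which are the country code.

```python
# Event names follow the TradeLens event types named in the paper; the tuple
# layout and the "boundary" label are assumptions made for this sketch.
EXPORT_END = {"Actual loaded on vessel", "Actual vessel departure"}
IMPORT_START = {"Actual vessel arrival", "Actual discharge from vessel"}


def assign_phases(events, origin, destination):
    """Label each (event_name, locode) pair, given in time order, with a phase."""
    # First loading or departure in the origin country ends the export phase.
    first_load = next(
        (i for i, (name, loc) in enumerate(events)
         if name in EXPORT_END and loc[:2] == origin[:2]),
        None,
    )
    # Last arrival or discharge in the destination country starts the import phase.
    last_arrival = next(
        (i for i in range(len(events) - 1, -1, -1)
         if events[i][0] in IMPORT_START and events[i][1][:2] == destination[:2]),
        None,
    )
    labels = []
    for i, (name, loc) in enumerate(events):
        if first_load is not None and i < first_load:
            labels.append("export")
        elif last_arrival is not None and i > last_arrival:
            labels.append("import")
        elif (first_load is not None and last_arrival is not None
              and first_load < i < last_arrival
              and loc not in (origin, destination)):
            labels.append("transshipment")
        else:
            labels.append("boundary")
    return labels


# Simplified voyage with one transshipment (locations taken from the paper's sample).
voyage = [
    ("Actual gate out", "UAYUZ"),
    ("Actual gate in", "UAYUZ"),
    ("Actual loaded on vessel", "UAYUZ"),
    ("Actual vessel departure", "UAYUZ"),
    ("Actual vessel arrival", "OMSLL"),
    ("Actual discharge from vessel", "OMSLL"),
    ("Actual loaded on vessel", "OMSLL"),
    ("Actual vessel departure", "OMSLL"),
    ("Actual vessel arrival", "INVTZ"),
    ("Actual discharge from vessel", "INVTZ"),
    ("Actual gate out", "INVTZ"),
    ("Actual gate in", "INVTZ"),
]
phases = assign_phases(voyage, origin="UAYUZ", destination="INVTZ")
```

Time differences for dwell and corridor indicators would then be computed within each labelled phase.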
Once the time differences for specific segments were derived, they were aggregated by UN/LOCODE and by country to produce the KPIs. More details on the TradeLens data model, the data cleaning and validation methodology, and the KPI construction methodology are available in Appendix 1.

15 Different depot than the one from leg 1.

Figure 4: TradeLens supply chain framework

MDST and MarineTraffic (container ship tracking data)

Container shipping operates along regular scheduled services, with several ships operating the same route. A comparison is often made with (line) bus operations, although ship services look more like loops. Similar to code share flights, container services operate in code share between shipping alliances to allow regular day of the week calls. To see whether two ports (or countries) are connected, one has to look not just at direct connections but at the full loop service. To take an example, China and the Netherlands have many services between them, but none is direct, as the service will make intermediate calls (e.g., Singapore, Gulf, Southern Europe).

Source: CMA-CGM

The LPI 2023 uses container services data by MDS Transmodal (MDST). MDST is an independent consultancy that collects and aggregates transport-related data. A dataset of aggregates for country pairs and countries for January-June 2022 was derived from MDST's Containership Databank, which covers shipping schedules and volumes offered on liner shipping routes. The shipping KPIs in the LPI 2023 focus on the deployed capacity and number of services of ports for a specific country, as well as on the list of countries that are connected through direct liner shipping services (connectivity indicator). This data is also used by UNCTAD's Liner Shipping Connectivity Index (LSCI), also provided by MDST. The LSCI is a combination of several KPIs, such as the volume of container trade or the number of lines and services for a specific economy.
When it comes to maritime connectivity, the new LPI data does not aggregate like the LSCI but refers to the number of international direct connections, to be consistent with the KPIs for other transport modes.

In addition, the LPI 2023 uses information on port calls of container ships acquired from MarineTraffic. The port call dataset from MarineTraffic is based on Automatic Identification System (AIS) messages and on port and ship data from open-source datasets and a database made available through the International Maritime Organization (IMO). Fields available in the port calls dataset include:

• Type of the event: arrival or departure
• UN/LOCODE / Port Name
• Date / Time in UTC
• Vessel particulars:
  o IMO number
  o MMSI (Maritime Mobile Service Identity) number
  o Vessel name
  o Vessel type
  o Draught at the time of the recording
  o Length, width, and max draught

The dataset was prepared using MarineTraffic port call data covering over 5,000 container ships calling at over 800 ports during the first two quarters of 2022. Based on estimated time differences between recorded arrivals at and departures from port facilities, an indicator of turnaround time per port was constructed. In addition, ship data, including capacity in twenty-foot equivalent units (TEU), is available. Ship types range from small feeders with capacity of up to 1,000 TEU to ultra large container vessels with capacity starting at 14,501 TEU. Additional details regarding this data source are in Appendix 4.

3. Definition and processing of KPIs

From tracking data to country indicators

The micro-logistics datasets used for the LPI 2023 consist of large numbers of voyages of individual shipments (e.g., containers or parcels). For shipping data, they contain the port call sequence. The raw tracking data consists of the following fields:

• ID of the voyage
• Origin, destination
• Coded location of the event: tracking data localize the event by port, airport, or more generally UN/LOCODE.
• Time of the event
• Type of the event (structured categories, specific to the dataset): e.g., container loaded on ship, parcels arriving at the bureau of destination, port arrival etc.

The individual voyage statistics are sliced to create the lead time between two consecutive events, either at the same location (e.g., container stay or dwell time at a port) or at two different locations. The data is then combined across voyages going through this location (if the consecutive events are at the same place), or through a pair of locations (if the consecutive locations are different).

The data is processed according to a growing level of aggregation. First, the information is aggregated by trade lane: statistics of the lead time of a shipment or cargo unit from origin to destination, or of key logistics steps, for instance dwell time at gateways (ports, airports, postal bureaus) at origin or destination. The indicators in the LPI 2023 were selected with the data partners based on their ease of interpretation for final users.

The survey-based LPI captures the ease of establishing reliable and effective supply chain connections to export target countries. The survey does this by pooling the information from the six LPI components, as assessed by logistics and freight forwarding professionals. LPI respondents rate countries according to their supply chain experience, drawing on their macro-level qualitative knowledge of them. Micro-level tracking data, however, informs about the performance outcomes along the LPI dimensions except for affordability, as cost information is not available in this data. Attributes of supply chain performance accessible from the data include two main categories.

The first category comprises indicators of timeliness and reliability. Domestic and international supply chains have a skewed lead time distribution with a long tail.
Therefore, the significance of lead time at a port or along a corridor is captured not just by its average but by how likely much longer delays are. The shape of the right side of the distribution, i.e., the length of the "tail", reflects reliability. Corresponding KPIs include:

• Mean/median lead time for legs of the supply chain (between two subsequent events), with a focus on time at and within the border.
• Indicators of reliability as statistical dispersion, such as the interquartile range, defined as the difference between the 75th and 25th percentiles.

The second category includes indicators of connectivity. The structure of the network matters. More connections mean more options to access markets, also contributing to reliability and performance. This is similar to human mobility, where more options to reach destinations improve individuals' access to jobs or services. Connectivity indicators are provided by mode: container shipping, aviation, postal. They mostly consist of simple counts of connections in the network.

Definition of the KPI

The set of indicators below focuses on (i) delays experienced at the same place, or dwell time at ports or airports, (ii) delays in motion, and (iii) connectivity information. From a policy standpoint, dwell time is an important KPI. Time spent at hubs and gateways can be improved by productivity investments, smoother information flows, or simplification of processes. Furthermore, dwell time at these hubs and gateways shows considerably more dispersion than international freight transport. While planes and ships have consistent travel times (at least under normal circumstances), lead time at key facilities, or to some extent along land corridors, is affected by a high degree of dispersion that impacts the logistics of the consignee at destination. The KPIs produced for the report are listed in Table 2.
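As an illustration of the slicing and of the timeliness and reliability statistics described in this section, the sketch below cuts each voyage into consecutive event pairs, pools the resulting lead times by location (dwell) or by pair of locations (transit), and reports the median and interquartile range. The record layout, the placeholder location codes and the helper names are assumptions made for the example; the quartile convention is the default of Python's `statistics.quantiles`, which may differ from the one used for the report.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median, quantiles

# Hypothetical tracking records: voyage id -> list of (timestamp, location, event).
# "XXAAA" and "YYBBB" are placeholder codes, not real UN/LOCODEs.
voyages = {
    "v1": [("2022-05-01T00:00", "XXAAA", "gate in"),
           ("2022-05-02T00:00", "XXAAA", "loaded on vessel"),
           ("2022-05-07T00:00", "YYBBB", "vessel arrival")],
    "v2": [("2022-05-03T00:00", "XXAAA", "gate in"),
           ("2022-05-05T00:00", "XXAAA", "loaded on vessel"),
           ("2022-05-10T00:00", "YYBBB", "vessel arrival")],
    "v3": [("2022-05-04T00:00", "XXAAA", "gate in"),
           ("2022-05-08T00:00", "XXAAA", "loaded on vessel"),
           ("2022-05-13T00:00", "YYBBB", "vessel arrival")],
    "v4": [("2022-05-06T00:00", "XXAAA", "gate in"),
           ("2022-05-16T00:00", "XXAAA", "loaded on vessel"),
           ("2022-05-21T00:00", "YYBBB", "vessel arrival")],
}


def slice_voyage(events):
    """Yield (key, days) for each pair of consecutive events.

    The key is a single location for dwell time (both events at the same
    place) or a pair of locations for a leg between two places.
    """
    ordered = sorted(events)  # ISO timestamps sort chronologically
    for (t0, loc0, _), (t1, loc1, _) in zip(ordered, ordered[1:]):
        delta = datetime.fromisoformat(t1) - datetime.fromisoformat(t0)
        key = loc0 if loc0 == loc1 else (loc0, loc1)
        yield key, delta.total_seconds() / 86400


def summarize(all_voyages):
    """Pool lead times across voyages and compute median and IQR per key."""
    pooled = defaultdict(list)
    for events in all_voyages.values():
        for key, days in slice_voyage(events):
            pooled[key].append(days)
    stats = {}
    for key, values in pooled.items():
        if len(values) < 2:
            continue  # too few observations for quartiles
        q1, _, q3 = quantiles(values, n=4)
        stats[key] = {"median": median(values), "iqr": q3 - q1}
    return stats


kpis = summarize(voyages)
```

In this toy sample the dwell times at the first location are 1, 2, 4 and 10 days: the single 10-day stay barely moves the median but widens the interquartile range, which is why dispersion is reported alongside the central value.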
Table 2: KPIs derived from tracking data in the LPI 2023

| Data provider | KPI name | Definition and significance | Period | Unit | Why it matters |
|---|---|---|---|---|---|
| Cargo iQ | Number of partners | Count of distinct origin country partners per destination country | Q1-2 2022 | Number of countries | Air cargo connectivity metric |
| Cargo iQ | Aviation dwell time | Time difference between event NFD (Notification of Readiness for Delivery of Cargo) and DLV (Cargo Delivered to Consignee) at destination country. Median and quartiles are provided | Q1-2 2022 | Days | Efficiency of handling and clearance and notification to consignee |
| UPU | Number of partners | Count of distinct number of country partners | 2019 | Number of countries | Postal connectivity |
| UPU | Postal delivery time | Median time difference between event EMD (Arrival at inwards office of exchange) and events EMH (Unsuccessful delivery) or EMI (Final delivery) at destination country. Median and quartiles are provided | 2019 | Days | Efficiency of clearance and postal logistics at destination |
| MDST | Number of services | Total number of maritime services (operated through liner shipping companies on a predefined rotation) between the two countries | Q1-2 2022 | Number of services | Availability of services and frequency of connections |
| MDST | Number of alliances | Count of number of alliances per destination country | Q1-2 2022 | Number of alliances | Competition between services |
| MDST | Number of partners | Count of distinct number of country partners per country | Q1-2 2022 | Number of countries | Shipping connectivity metric |
| MarineTraffic | Turnaround time | Time difference between first instance of arrival and last instance of departure for consecutive repeated port visits (if any), calculated for each port call. Aggregated directly from port call time differences to countries over a 6-month period (2022-01 to 2022-06) | Q1-2 2022 | Days | Proxy of the performance of the ship to shore interface (incl. handling by terminal operator) |
| TradeLens | Import dwell time (port of entry) | Time spent at the same location, defined as UN/LOCODE, before delivery to the client. Dwell time at port of entry. Mean, median and quartiles are provided | May 1 to October 31, 2022 | Days | Critical KPI resulting from several factors including goods clearance, removal and land services and to some extent terminal and multimodal performance |
| TradeLens | Import dwell time (consolidated) | Time spent at the same location, defined as UN/LOCODE, before delivery to the client. Dwell times at port of entry and intermediate inland locations are combined and statistically aggregated for the same country of destination. Mean, median and quartiles are provided | May 1 to October 31, 2022 | Days | Critical KPI resulting from several factors including goods clearance, removal and land services and to some extent terminal and multimodal performance |
| TradeLens | Export dwell time (port of departure) | Time spent at the same location, defined as UN/LOCODE, since expedition and before ship loading. Dwell time at port of departure. Mean, median and quartiles are provided | May 1 to October 31, 2022 | Days | Same as import dwell time but more representative of domestic logistics |
| TradeLens | Export dwell time (consolidated) | Time spent at the same location, defined as UN/LOCODE, since expedition and before ship loading. Dwell times at port of departure and intermediate inland locations are combined and statistically aggregated for the same country of origin. Mean, median and quartiles are provided | May 1 to October 31, 2022 | Days | Same as import dwell time but more representative of domestic logistics |
| TradeLens | Corridor import and export lead time | Time spent between locations, whether in motion or idle. This time is combined across legs and aggregated by country of export or country of import. Mean, median and quartiles are provided | May 1 to October 31, 2022 | Days | Representative of road or rail corridor performance, excluding multimodal transfers en route, which are included in dwell time |

Level of availability and country level aggregation

The KPIs are available at the level of a location or pair of locations (typically a UN/LOCODE). For the official report, the data was further aggregated at country level by taking averages, medians and other statistics of the natural unit indicators generated for all locations in the country (or pair of countries for corridor data). Port-specific data is available for just under 500 ports and can be made available for academic purposes upon request. Consignment-specific data cannot be made publicly available due to confidentiality concerns and data sharing agreements with partnering organizations.

For landlocked countries, the observations serving to estimate the dwell time at port of entry are separate from the observations related to shipments going to the country of transit. Some countries classified as landlocked developing countries in our report are not considered to be such in a global, less strict definition. For example, Azerbaijan, a country that formally has access to the Caspian Sea and operates its largest port (Baku), was classified as a landlocked country. This data feature stems from the nature of our datasets: all of the maritime-related data focused solely on container shipping. Our indicators are thus relevant to ports and terminals that are able to handle containerships and ignore dry bulk, liquid bulk, LNG/oil, and Ro/Ro terminals. In the future, we plan to include additional KPIs that cover the performance of port facilities that handle dry bulk operations.

Data cleaning and consistency

Tracking datasets are not perfect. Events may be missing or duplicated in the sequence of individual consignments. Sequences may be truncated.
Table 3 includes the sequence for a consignment going from Odessa in Ukraine to India in transit to Nepal. In between, the container is transshipped in Istanbul, Salalah, and Colombo. However, the transshipment in Istanbul includes duplications (highlighted in grey: the discharge and loading events are recorded at both TRIST and TRAMR). To account for this type of issue, cleaning procedures are implemented prior to slicing the data to produce indicators. Anomalous events are eliminated based on rules that "normal" events should logically follow (Table 4). This ensures that the slicing isolates consecutive event pairs.

Table 3: Sample container voyage

| event | location | date | FULL_STATUS |
|---|---|---|---|
| Actual gate out | UAYUZ | 2021-10-29-12.59.00.000000 | Empty |
| Actual gate in | UAYUZ | 2021-10-30-14.14.00.000000 | Full |
| Actual loaded on vessel | UAYUZ | 2021-10-31-21.59.00.000000 | Full |
| Actual vessel departure | UAYUZ | 2021-11-01-03.49.00.000000 | Full |
| Actual vessel arrival | TRIST | 2021-11-07-16.30.00.000000 | Full |
| Actual discharge from vessel | TRIST | 2021-11-07-23.42.00.000000 | Full |
| Actual discharge from vessel | TRAMR | 2021-11-07-23.42.07.000000 | Full |
| Actual loaded on vessel | TRIST | 2021-11-17-03.51.00.000000 | Full |
| Actual loaded on vessel | TRAMR | 2021-11-17-03.51.46.000000 | Full |
| Actual vessel departure | TRIST | 2021-11-17-08.15.00.000000 | Full |
| Actual vessel arrival | OMSLL | 2021-11-27-04.18.00.000000 | Full |
| Actual discharge from vessel | OMSLL | 2021-11-27-14.15.00.000000 | Full |
| Actual loaded on vessel | OMSLL | 2021-12-01-21.25.17.000000 | Full |
| Actual vessel departure | OMSLL | 2021-12-02-00.16.00.000000 | Full |
| Actual vessel arrival | LKCMB | 2021-12-05-07.38.00.000000 | Full |
| Actual discharge from vessel | LKCMB | 2021-12-05-21.09.51.000000 | Full |
| Actual loaded on vessel | LKCMB | 2021-12-09-03.26.28.000000 | Full |
| Actual vessel departure | LKCMB | 2021-12-09-07.18.00.000000 | Full |
| Actual vessel arrival | INVTZ | 2021-12-15-12.09.00.000000 | Full |
| Actual discharge from vessel | INVTZ | 2021-12-16-05.28.00.000000 | Full |
| Actual gate out | INVTZ | 2021-12-22-06.11.00.000000 | Full |
| Actual gate in | NPBRG | 2022-02-03-09.15.00.000000 | Full |
| Actual gate out | NPBRG | 2022-02-17-18.16.00.000000 | Full |
| Actual gate in | NPBRG | 2022-02-18-09.52.00.000000 | Empty |

Table 4: Rules that consistent events must follow (TradeLens dataset)

| Prior event | Prior location | Event | Next location | Posterior event |
|---|---|---|---|---|
| Vessel arrival | same | 'Actual discharge from vessel' | same | Gate out; Rail departure; Loaded on vessel; Barge departure |
| Gate in; Rail arrival; Discharge from vessel; Barge arrival | same | 'Actual loaded on vessel' | same | Vessel departure |
| Vessel departure | previous | 'Actual vessel arrival' | same | Discharge from vessel |
| Loaded on vessel | same | 'Actual vessel departure' | next | Vessel arrival |
| Gate out | previous | 'Actual gate in' | same | Gate out; Rail departure; Loaded on vessel |
| Gate in; Discharge from vessel; Rail arrival | same | 'Actual gate out' | next | Gate in |
| Rail departure | previous | 'Actual rail arrival' | same | Gate out; Loaded on vessel; Barge departure |
| Gate in; Discharge from vessel; Barge arrival | same | 'Actual rail departure' | next | Rail arrival |
| Gate in; Discharge from vessel; Rail arrival | same | 'Actual barge departure' | next | Barge arrival |
| Barge departure | previous | 'Actual barge arrival' | next | Gate out; Loaded on vessel; Rail departure |

Potential biases with the data

Despite the data being massive and provided by reputable industry sources, the data used in this analysis may imperfectly reflect the reality on the ground. Potential issues include:

• Although there are rigorous procedures to input the time stamps, the process is not fully automated in some countries and may depend on practices by local operators, more so for aviation and postal data than for shipping. The very few countries for which there is a strong suspicion this is the case have not been included, as per the recommendation of the data partners (see appendixes by mode on the cleaning process). The published KPIs only use data that the data partners consider reliable.
• The shipping data by TradeLens is a very large sample of movements of containers (in the double-digit millions).
There may be a selection bias in that more efficient operators tend to use advanced digital tracking solutions. There is also a possibility of bias due to some shipping lines being more present in the sample than others. However, expert opinion is that it is very unlikely. Furthermore, expert reading of the dwell time data across ports suggests that the dwell time data is consistent with local knowledge when available.
• In some cases, the end of the import process registered in the data may not be the actual delivery to the importer. For instance, small developing economies, including island states, may not move containers inland but rather unstuff them and store the goods in bonded facilities. The current system does not detect these old-fashioned logistics practices, which are progressively disappearing.
• Representativeness of transport corridor data for landlocked countries: For rich landlocked countries in Europe, moving containers inland is not representative of their international supply chain. In Europe, goods are cleared mostly at entry into the EU. Thus lead time data from container tracking is not representative of the cross-border processes of these countries and may exaggerate the actual delays. For instance, trucking of imports to landlocked countries in Europe is not directly measured given direct trucking and clearance at the port of entry. In developing landlocked countries, trucks (or rail) are going to inland ICDs, 16 thus the data is representative.

16 Except for some landlocked developing economies, such as the West Bank and Gaza, or SACU countries, where goods are cleared at the port of entry.

• Finally, the concept used in some countries to measure delays may differ from the definitions used here to ensure global comparability. For instance, in many places, containers may be moved for processing from the terminal to satellite facilities in the same general location.
Our definition consolidates the time spent at all facilities, not just at the port terminal.

Potential for more indicators from the existing data

The current data can support more indicators than those in Table 2. The following indicators could be considered in the future:

1. How time at destination depends on origin. Trade procedures and transfer of information or payment at destination may be influenced by the country of origin. An indicator could be a weighted variance of dwell time depending on origin.
2. Efficiency of domestic logistics (KPI based on the logistics of empty containers). The tracking data covers the responsibility of the international logistics operators, not logistics done by shippers upstream or consignees downstream in the supply chain. Supply chain practices by the latter may vary. However, container data include information on the movement of empty containers that proxies the time taken to stuff export containers or deliver full import containers at destination. The lead time for an empty container to come back full is an indication of the efficiency of domestic logistics.
3. Cold chain logistics. Refrigerated containers (reefers) are important for food trade. Differences in lead time for reefers compared to ordinary containers could be informative, especially on the export side.
4. Accessibility indicators, computed as a weighted average over origin countries of the lead time to the destination country.

4. Conclusions and future directions

The experience of the 2023 analytics of the large tracking datasets is encouraging. KPIs derived from Big Data provide new insights (e.g., dwell time) that are more actionable than the traditional LPI. The following directions could reinforce the value of the KPIs. The first is making the production sustainable by strengthening the data partnerships with industry.
The discontinuation of TradeLens means that other data partnerships are needed to access comparable data, either direct partnerships with the shipping lines or purchasing the data from commercial providers. The second direction is to explore the potential of new indicators from the same dataset (see section 3).

A third area is producing the established overall LPI on a 1-5 scale from Big Data. It is possible that the full set of KPIs can help accurately predict the survey-based LPI. A respondent to the classical LPI survey is influenced by his/her actual experience of timeliness, speed and delays of supply chains when rating a partner country. This information is objectively captured by the new KPIs. In statistical terms, the tracking data may constitute a "sufficient statistic" of the LPI.

Finally, establishing more data partnerships and obtaining more data are desirable. Having global or large-scale sources on international rail or road movements or truck movements would enrich the set of indicators. Since shipment tracking databases compile waybills or manifests, they include information on the nature and value of goods and freight costs, which could add new insights to the current ones focused on speed of trade and connectivity.

References

Anson, J. and Helble, M. (2014): Postal economics and statistics for strategy analysis – the long view. Development strategies for the postal sector: an economic perspective, 19-40.

Arvis, J.F., Mustra, M.A., Panzer, J., Ojala, L. and Naula, T. (2007): Connecting to Compete: Trade Logistics in the Global Economy. World Bank, Washington DC.

Arvis, J.-F., Mustra, M.A., Ojala, L., Shepherd, B., and Saslavsky, D. (2010): Connecting to Compete 2010. Trade Logistics in the Global Economy. The Logistics Performance Index and Its Indicators. Washington, DC: The World Bank.

Arvis, J.-F., Mustra, M.A., Ojala, L., Shepherd, B., and Saslavsky, D. (2012): Connecting to Compete 2012. Trade Logistics in the Global Economy. The Logistics Performance Index and Its Indicators. Washington, DC: The World Bank.

Arvis, J.-F., Saslavsky, D., Ojala, L., Shepherd, B., Busch, C., and Raj, A. (2014): Connecting to Compete 2014. Trade Logistics in the Global Economy. The Logistics Performance Index and Its Indicators. Washington, DC: The World Bank.

Arvis, J.-F., Saslavsky, D., Ojala, L., Shepherd, B., Busch, C., Raj, A., and Naula, T. (2016): Connecting to Compete 2016. Trade Logistics in the Global Economy. The Logistics Performance Index and Its Indicators. Washington, DC: The World Bank.

Arvis, J.-F., Ojala, L., Wiederer, C., Shepherd, B., Raj, A., Dairabayeva, K., and Kiiski, K. (2018): Connecting to Compete 2018. Trade Logistics in the Global Economy. The Logistics Performance Index and Its Indicators. Washington, DC: The World Bank.

Arvis, J.-F., Ojala, L., Shepherd, B., Wiederer, C., and Ulybina, D. (2023): Connecting to Compete 2023. Trade Logistics in an Uncertain Global Economy. The Logistics Performance Index and Its Indicators. Washington, DC: The World Bank.

Banomyong, R., Varadejsatitwong, P. and Julagasigorn, P. (2022): Benchmarking the National Logistics Costs: a Case of 49 Countries. Conference Paper, The 12th International Conference on Logistics & Transport 2022, "A New Era in Supply Chain Intelligence Systems: A Brave New World", Krabi, Thailand.

Boffa, M. (2019): E-commerce and the cost of waiting, https://www.sites.google.com/site/mauroboffaphd/research

Hristova, D., Rutherford, A., Anson, J., Luengo-Oroz, M. and Mascolo, C. (2016): The International Postal Network and Other Global Flows as Proxies for National Wellbeing, https://doi.org/10.1371/journal.pone.0155976

McKinnon, A.C., Browne, M., Piecyk, M. and Whiteing, A.E. (2015): Green logistics: improving the environmental sustainability of logistics. London: Kogan Page.

Murphy, P.R. and Daley, J.M. (1999): Revisiting logistical friendliness: perspectives of international freight forwarders. Journal of Transportation Management, 11(1), 65-72. doi: 10.22237/jotm/922925160

World Bank (2010): Trade and transport facilitation assessment: A practical toolkit for country implementation. Washington, DC: World Bank Group. http://documents.worldbank.org/curated/en/967151468325281350/Trade-and-transport-facilitation-assessment-a-practical-toolkit-for-country-implementation

World Bank (2017): Logistics competencies, skills, and training: An assessment toolkit (mimeo).

Appendix 1: Container tracking data from TradeLens

TradeLens, started in 2018 and discontinued in 2023, was a data and document sharing platform bringing together shipping lines and other participants in container logistics. TradeLens used the IBM Blockchain Platform, a permissioned blockchain system that offers immutability, privacy, and traceability of shipping documents. However, unlike the anonymous blockchains used in cryptocurrencies, the TradeLens blockchain defined its members as "Trust Anchors", who are known to the network based on their cryptographic identities. TradeLens' data model and access control schema were aligned with the UN/CEFACT Supply Chain Reference Data Model.

Initial processing

The size of the consolidated source dataset was just below 31.5 million observations, with around 3 million unique consignment identifiers. The dataset was received from TradeLens via a shared repository of 351 csv files. The extraction was done for consignments with at least one event recorded between May 1, 2022, and October 31, 2022. As a consequence, the time span of the dataset was wider than the 6 months: Figure 5 summarizes the count of observations for each week available. Timestamps recorded before May 1 and after October 31, 2022 (highlighted in grey in Figure 5) were excluded from subsequent processes.
[Figure 5: TradeLens dataset: Number of observations by week. Vertical axis: count of observations (millions); horizontal axis: week, from before 5/1/2022 to after 11/1/2022.]

The dataset's fields and their definitions (based on the December 2022 version 1.6 of the TradeLens API documentation and data model) are described in Table 7. These fields comprised Consignment Identifier, Origin Location Port (UN/LOCODE format), Destination Location Port (UN/LOCODE format), Location Value (UN/LOCODE format), Event Occurrence time, Full Status and Event name.

The combination of the Consignment ID and the Event occurrence time formed a unique identifier of each record. Therefore, records that duplicated both the Consignment ID and the Timestamp were dropped. The 380,539 observations without information on the status of the shipment were eliminated from the dataset. Additionally, we excluded all Consignment IDs that had only one observation. There were 37,500 of these records. This initial cleaning process resulted in 29,692,153 observations and 3,025,565 unique consignments tracked. The dataset referenced events at about 11,000 different UN/LOCODEs. Table 5 contains a summary of the number of observations by event type and by status of the container being transported (full or empty).
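The initial cleaning steps described above can be sketched as follows. The dictionary keys, the sample rows and the choice to keep the first record among duplicates are assumptions made for this illustration; the actual field names are those listed in Table 7, and the order in which the steps were applied in the original processing is not specified beyond the description above.

```python
from collections import Counter

# Hypothetical observations; keys are illustrative stand-ins for the dataset's fields.
rows = [
    {"consignment_id": "C1", "timestamp": "2022-05-02T10:00", "status": "Empty", "event": "Actual gate out"},
    {"consignment_id": "C1", "timestamp": "2022-05-02T10:00", "status": "Empty", "event": "Actual gate out"},
    {"consignment_id": "C1", "timestamp": "2022-05-03T09:00", "status": "Full", "event": "Actual gate in"},
    {"consignment_id": "C2", "timestamp": "2022-05-04T08:00", "status": "Full", "event": "Actual gate in"},
    {"consignment_id": "C3", "timestamp": "2022-05-05T08:00", "status": None, "event": "Actual vessel arrival"},
    {"consignment_id": "C3", "timestamp": "2022-05-06T08:00", "status": "Full", "event": "Actual discharge from vessel"},
]


def clean(observations):
    """Apply the three initial cleaning steps to a list of observations."""
    # 1. Drop records duplicating (Consignment ID, timestamp); the first is kept.
    seen = set()
    deduplicated = []
    for row in observations:
        key = (row["consignment_id"], row["timestamp"])
        if key not in seen:
            seen.add(key)
            deduplicated.append(row)
    # 2. Drop records without a Full/Empty status.
    with_status = [r for r in deduplicated if r["status"] in ("Full", "Empty")]
    # 3. Drop consignments left with a single observation.
    counts = Counter(r["consignment_id"] for r in with_status)
    return [r for r in with_status if counts[r["consignment_id"]] > 1]


cleaned = clean(rows)
```

In this example only consignment C1 survives: C2 has a single observation, and C3 is reduced to a single observation once the record without status is removed.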
Table 5: Number of observations by event type (TradeLens dataset)

| Event | Status: Empty | Status: Full | Status: N/A | Total |
|---|---|---|---|---|
| Actual barge arrival | 1 | 78,083 | 2,670 | 80,754 |
| Actual barge departure | 2,858 | 70,334 | 2,523 | 75,715 |
| Actual discharge from vessel | 49 | 4,795,724 | 84,871 | 4,880,644 |
| Actual gate in | 2,162,635 | 3,346,099 | 9,340 | 5,518,074 |
| Actual gate out | 2,271,041 | 2,861,318 | 6,903 | 5,139,262 |
| Actual loaded on vessel | 207 | 5,139,571 | 89,745 | 5,229,523 |
| Actual rail arrival | 3,037 | 392,467 | 155 | 395,659 |
| Actual rail departure | 37,464 | 353,566 | 103 | 391,133 |
| Actual vessel arrival | 427 | 4,589,840 | 454 | 4,590,721 |
| Actual vessel departure | 598 | 4,991,800 | 12 | 4,992,410 |
| Empty transport equipment interchanged | - | - | 183,763 | 183,763 |
| Total | 4,478,317 | 26,618,802 | 380,539 | 31,477,658 |

With the goal of maximizing the number of observations used for further analysis and the relevance of the event information for the derived indicators, we excluded all observations with the transport event type “Empty transport equipment interchanged”. We created two joint events from the combination of the most widely available event types: loading of transport equipment on the vessel together with the vessel’s departure, and the vessel’s arrival together with the discharge of transport equipment from it (described in more detail further below).

Data model

Containers are the units being moved, and the vast majority of events in the TradeLens database belong to the container level (e.g., actual Gate In, actual Discharge from Vessel). The term container encompasses a broad range of equipment: from intermodal shipping containers of all sizes to single-unit cardboard boxes. Formally, containers belong to the Transport Equipment (TE) objects and are used to track the movement of traded goods from one place to another. Thus, the physical container participates in multiple transport equipment operations over time.
The dataset’s focus is on consignments and their routes (sequences of events, timestamps, and their locations), which are associated with a consignment object of the TradeLens data model (Consignment ID) through transport equipment objects. In the data model, the logistical movement of traded objects is designed to be handled through one or more consignments. In other words, a consignment represents a link between what is being transported and how it is being transported (i.e., in full, partial or multiple containers). A consignment is the foundational object of the transport and logistics model.

TradeLens’ platform design assigned a specific function to the consignment identifier, which is an aggregator concept for transport equipment (containers). Thus, the route of a consignment represented the aggregate of all its transport equipment routes. Operationally, the route translated directly into a list of ports and locations that the consignment or container visited. In cases where the consignment was divided among more than one container, the dataset provided the latest date and corresponding event available. Typically, container-level information within a consignment was approximately the same (i.e., +/- a few hours). When two containers were routed differently between their origin and destination ports, or even on different vessels, they comprised two separate consignments.

On average, each consignment was associated with 10 different events, with specific locations, status, and timestamps for each. The histogram in Figure 6 shows the distribution of consignments by number of linked events.

Figure 6: Count of consignments by number of events (TradeLens)

Most attributes of each consignment in the dataset belong to transport equipment and are defined at the different modal legs of the voyage, while only a few are static consignment attributes.
The fields Origin and Destination, provided in UN/LOCODE format, formed static attributes of each individual consignment that remain unchanged throughout the whole route of the consignment and associated containers. The dynamic (or transport event) attributes included the status of the container (full or empty), the event type, the “live” location of the event (UN/LOCODE of event occurrence) and the timestamp of the event’s occurrence in ISO 8601 format.

Transport events were designed to communicate planned, estimated, and actual operational routes and the progress towards their completion. However, the dataset received and used contained only actual event types (Table 7). Each ocean transport leg comprised four types of events: the loading of the container on the vessel and its discharge from the vessel, as well as the vessel’s departure and arrival. Differentiation between modes of transport was available through event categories such as truck, rail, barge, and ocean vessel.

The submission of transport event information was assigned at different levels of authority to a variety of participants that were subscribed (associated) through the platform to the transport events. Table 6 maps the types of participants and their roles for any given consignment or transport equipment, and the conditions under which the participant would be playing that role.
M denotes mandatory fields that had to be provided by the assigned participant in all applicable scenarios, while C (Conditional) made it conditional for the participant to provide the data: if the data and scenario were relevant and applicable to the shipment/consignment/transport equipment, and if the data was available to the participant, then the participant had to provide it.

Table 6: Data provisioning requirements by role

Participant roles (column headers, as far as recoverable): Ocean Carrier, NVOCC, Origin Marine Terminal, Trans-shipment Terminal, Destination Marine Terminal, Origin Inland Terminal, Destination Inland Terminal, Depot, Rail Operator, Barge Operator, Truck Operator, Feeder Operator, Inland Aggregator, PCS, Transport Service Provider; remaining header fragments read “Participant”, “Inland Service P” and “TSI / id”.

| Transport/TE Event | Requirement codes by role (source order, empty cells omitted) |
|---|---|
| Actual gate out | C C C C C C C M M C C C C C |
| Actual gate in | C C C C C C C M M C C C C C |
| Actual rail departure | C C M C C C C C C C |
| Actual rail arrival | C C M C C C C C C C |
| Actual barge departure | C C M C C C C C C C C C |
| Actual barge arrival | C C M C C C C C C C C C |
| Actual loaded on vessel | C M M C M M M |
| Actual vessel departure | C M M C M M M |
| Actual vessel arrival | C M M C M M M |
| Actual discharge from vessel | C M M C M M M |

TradeLens event categories

Table 7: TradeLens event categories

| Transport/TE Event | Transport mode (Status) | Description |
|---|---|---|
| Actual rail departure | Rail (Full) | Notification that the packed transport equipment has departed by train from a terminal or inland location. |
| Actual rail arrival | Rail (Full) | Notification that the packed transport equipment has arrived by train to a terminal or inland location. |
| Actual barge departure | Barge (Full) | Notification that the packed transport equipment has departed by barge from a location. |
| Actual barge arrival | Barge (Full) | Notification that the packed transport equipment has arrived by barge at a location. |
| Departure-Loaded | Ocean/Container (Full) | Actual loaded on vessel: The notification that the transport equipment has been loaded on vessel. Actual vessel departure: The actual time of departure of the packed transport equipment from a terminal, based on vessel departure from a berth. |
| Arrival-Discharge | Ocean/Container (Full) | Actual discharge from vessel: The notification that the transport equipment has been discharged from vessel. Actual vessel arrival: The actual arrival time of the transport equipment at a terminal, based on vessel arrival at a berth. |
| Actual gate in | Truck (Full or Empty) | The actual time of arrival of a truck carrying the transport equipment at a gate, including all actual gate in events, both on export and import side and for full and empty transport equipment truck moves. The common types of such moves include: (1) actual time of arrival of empty transport equipment for export load at stuffing site; (2) actual time of arrival of full transport equipment at export terminal; (3) actual time of arrival of packed transport equipment at import inland stripping location; (4) actual time of arrival of empty transport equipment at depot (terminal or inland location) after stripping. |
| Actual gate out | Truck (Full or Empty) | The actual time of departure of a truck carrying the transport equipment, including all actual gate out events, both on export and import side and for full and empty transport equipment truck moves. The common types of such moves include: (1) actual time of departure of empty transport equipment from depot; (2) actual time of departure of packed transport equipment from stuffing site at inland location; (3) actual time of departure of packed transport equipment from import terminal; (4) actual time of departure of empty transport equipment at stripping location. |

Further processing

Transport events of vessel arrival and container discharge from vessel were consolidated into a single event type (Arrival-Discharge); the events of loading a container on a vessel and the vessel’s departure were combined to form a synthetic Departure-Loaded event type.
The consolidation process took existing vessel activities (Departures and Arrivals) as an anchor, which was gap-filled with container-associated events (Loaded and Discharge) when the anchor activity for the specific leg was unavailable. Finally, we verified that all consignment locations were uniquely identified by origin and destination ports (i.e., no consignment had more than one unique origin location and destination location). An additional 643 consignment IDs whose location of origin was the same as the location of destination were excluded from the dataset.

Extraction of process phases and rule settings

Event types and statuses were grouped into three thematic phases based on the most frequently observed sequences of event-status occurrences and the TradeLens data model. Two of the three phases included at least two different modes of transport: only the synthetic event types (Departure-Loaded and Arrival-Discharge) existed across all three phases. Specific rule-based definitions are described visually in Figure 4.

Table 8 provides the exact ruleset used to clean and transform the dataset. The ruleset sets three pairs of boundary timestamps that define the beginning and the end of each phase of the shipping process. Each phase comprises both the predefined boundary events and all the events (with corresponding timestamps and live locations) occurring between them. Additional data cleaning and validation procedures aimed to eliminate repeated or irrelevant records at all phases of the shipment, including those occurring after the destination location of the consignment had been reached. As the histogram of consignments by number of events (Figure 6) shows, there are a few extreme outliers, with a few consignments having a number of observations seven times larger than the average observed in the dataset.
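The consolidation into synthetic events described under Further processing can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the dictionary layout, the function name `consolidate`, and the choice to treat each (consignment, location) pair as one leg are assumptions.

```python
# Synthetic event type -> (vessel-level anchor event, container-level fallback event).
SYNTHETIC = {
    "Departure-Loaded": ("Actual vessel departure", "Actual loaded on vessel"),
    "Arrival-Discharge": ("Actual vessel arrival", "Actual discharge from vessel"),
}

# Hypothetical events for one consignment.
EVENTS = [
    {"consignment": "C1", "location": "NLRTM",
     "event": "Actual loaded on vessel", "time": "2022-05-03T10:00:00"},
    {"consignment": "C1", "location": "NLRTM",
     "event": "Actual vessel departure", "time": "2022-05-03T18:00:00"},
    {"consignment": "C1", "location": "SGSIN",
     "event": "Actual discharge from vessel", "time": "2022-06-01T06:00:00"},
]


def consolidate(events):
    """Build one synthetic event per (consignment, location, synthetic type).

    The vessel event is the anchor; the container event fills the gap only
    when the anchor is missing. Repeated events of the same type at the same
    location keep the last timestamp seen (a simplification of this sketch).
    """
    by_leg = {}
    for e in events:
        by_leg.setdefault((e["consignment"], e["location"]), {})[e["event"]] = e["time"]
    out = []
    for (cid, loc), times in by_leg.items():
        for synthetic, (anchor, fallback) in SYNTHETIC.items():
            t = times.get(anchor, times.get(fallback))
            if t is not None:
                out.append({"consignment": cid, "location": loc,
                            "event": synthetic, "time": t})
    return out


synthetic_events = consolidate(EVENTS)
for s in synthetic_events:
    print(s["location"], s["event"], s["time"])
```

At NLRTM the vessel departure is available and serves as the anchor; at SGSIN the vessel arrival is missing, so the discharge timestamp fills the gap.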
The validation rule put in place defined the end of the exporting phase as the point where either the full container has been loaded on the vessel or the vessel with the full transport equipment on board has departed the terminal at the “live” location, the latter being a location at or in close proximity to the origin port associated with the consignment. An event location is considered to be at the origin if its 2-letter country code (ISO-2 code, extracted from the first two characters of the UN/LOCODE) is the same as the ISO-2 code of the origin country. Close proximity was defined using a common-borders matrix, with additional manual adjustment of the shared border information for European countries.

Table 8: List of timestamps (TradeLens dataset)

| Phase | Timestamp [I] | Timestamp [II] |
|---|---|---|
| Export phase | Date/Event of the first occurrence of Full observation at the live location (loc) at or in close proximity (*) to the country of origin (orig) | Date/Event of the first occurrence of Departure-Loaded (Full) event |
| Import phase | Date/Event of the last occurrence of Arrival-Discharge (Full) event type | Date/Event of the last occurrence at the live location (loc) “at or in close proximity” (*) to the country of destination (dest) |
| Transshipments | Date/Event of the first occurrence of Departure-Loaded (Full) event type | Date/Event of the last occurrence of Arrival-Discharge (Full) event type |

Length of voyage leg (in days): Timestamp [II] - Timestamp [I]

After the boundary observations of the phases were extracted for each consignment route, the dataset was split into three subsets at the phase boundaries using a common consignment identifier, and the events/timestamps that occurred outside the specified boundaries were removed.
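As an illustration of the export-phase row of Table 8, the sketch below computes Timestamp [II] - Timestamp [I] in days for a single consignment. The `NEIGHBOURS` lookup is a hypothetical stand-in for the common-borders matrix, which is not reproduced in the paper, and the tuple layout and function names are likewise assumptions.

```python
from datetime import datetime

# Hypothetical common-borders lookup (ISO-2 code -> bordering ISO-2 codes).
NEIGHBOURS = {"NL": {"BE", "DE"}}


def at_or_near(locode, country):
    """True if the UN/LOCODE lies in `country` or in a bordering country."""
    iso2 = locode[:2]  # the first two characters of a UN/LOCODE are the ISO-2 code
    return iso2 == country or iso2 in NEIGHBOURS.get(country, set())


def export_phase_days(events, origin_locode):
    """Length of the export phase in days, or None if a boundary is missing.

    events: (iso_time, locode, status, event_type) tuples for one consignment.
    Timestamp [I]: first Full observation at or near the origin country.
    Timestamp [II]: first Departure-Loaded (Full) event at or near the origin.
    """
    country = origin_locode[:2]
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[0]))
    full_near = [e for e in ordered if e[2] == "Full" and at_or_near(e[1], country)]
    start = full_near[0] if full_near else None
    end = next((e for e in full_near if e[3] == "Departure-Loaded"), None)
    if start is None or end is None:
        return None
    delta = datetime.fromisoformat(end[0]) - datetime.fromisoformat(start[0])
    return delta.total_seconds() / 86400


# Hypothetical consignment with origin port NLRTM.
EVENTS = [
    ("2022-05-01T00:00:00", "DEDUS", "Full", "Actual gate out"),
    ("2022-05-02T06:00:00", "NLRTM", "Full", "Actual gate in"),
    ("2022-05-04T12:00:00", "NLRTM", "Full", "Departure-Loaded"),
    ("2022-06-01T06:00:00", "SGSIN", "Full", "Arrival-Discharge"),
]
print(export_phase_days(EVENTS, "NLRTM"))
```

The first event counts toward the export phase because, under the hypothetical lookup, Germany borders the Netherlands. The import and transshipment rows of Table 8 would follow the same pattern with their own boundary events.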
This process generated the following numbers of unique consignment identifiers for each phase: 2,121,771 for the export phase, 2,176,961 for the import phase, and 2,356,138 for the transshipment phase. In the subsequent cleaning process, each event for each consignment ID received “next” event and timestamp information. Using these pairs of events, the time difference between the two events was calculated. These time differences (available in hours and days) were used as direct inputs to the LPI 2023 Key Performance Indicators (KPIs).

New indicators

To validate the new datasets and the soundness of the processing methodology, a logical ruleset was implemented. The rules included checking whether all “live” locations of the transshipment phase were different from the specified origin and destination UN/LOCODEs, as well as checking that the locations of the first and last observations in the export and import phases matched the origin and destination locations provided by the user with the consignment identifiers. Other validation rules included checks for repeated event types and verification of the expected sequences of supply chain and transportation events. Python code used for processing the data is available on request.

The output indicators derived were classified as belonging to one of the three phases of the consignment voyage: export, import, and transshipment, as illustrated in Figure 4. Unless specified otherwise, all indicators here and in the subsequent sections of the report are provided in days. The import phase indicators encompass the time spent in the destination country of the consignment, and the export phase indicators describe the time spent at the origin location of the consignment.
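The pairing of each event with its “next” event, described above, can be sketched in a few lines of Python. The tuple layout and the function name `event_pairs` are hypothetical; the sketch only shows the technique of ordering events within a consignment and differencing consecutive timestamps.

```python
from datetime import datetime
from itertools import groupby


def event_pairs(events):
    """Pair each event with the next event of the same consignment.

    events: (consignment_id, iso_time, event_type) tuples in any order.
    Returns one dict per consecutive pair, with the gap in hours and days.
    """
    ordered = sorted(events, key=lambda e: (e[0], datetime.fromisoformat(e[1])))
    pairs = []
    for cid, group in groupby(ordered, key=lambda e: e[0]):
        group = list(group)
        for cur, nxt in zip(group, group[1:]):
            gap = datetime.fromisoformat(nxt[1]) - datetime.fromisoformat(cur[1])
            hours = gap.total_seconds() / 3600
            pairs.append({"consignment": cid, "event": cur[2],
                          "next_event": nxt[2], "hours": hours, "days": hours / 24})
    return pairs


# Hypothetical events, deliberately out of order.
EVENTS = [
    ("C1", "2022-05-02T12:00:00", "Departure-Loaded"),
    ("C1", "2022-05-01T00:00:00", "Actual gate in"),
    ("C2", "2022-05-05T00:00:00", "Actual gate in"),
]
pairs = event_pairs(EVENTS)
for p in pairs:
    print(p)
```

A consignment with a single event, such as C2 here, produces no pair and therefore contributes no time difference.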
Both export and import phases can be split into two elements. The first is the import and export dwell time. Import dwell time is the time it took, at the location of the ship’s arrival, to unload, unstuff, or move a full container from the ship to another mode of transport. Export dwell time is the time it took to load a consignment on a ship, measured from the consignment’s arrival at the departure location via rail, barge, or truck until its departure from the facility on a containership. These dwell time indicators follow a consignment at the specific loading or unloading locations, which can be either the given destinations and origins or locations in close proximity to them. The second element is the consolidated dwell time of a container, defined as dwell time at destination (for import) or origin (for export) combined with the time spent at inland multimodal clearance facilities and on container transportation to or from loading/unloading facilities, if these differ from the predefined departure location.

The transshipment phase indicators were not used as part of the time difference indicators because of their high correlation with the geographical distance between the origin and destination locations. Instead, a simple count of the number of transshipment locations per origin-destination pair was provided in addition to the four indicators described above.

Lastly, an additional indicator, defined as corridor import dwell time, took into account the full length of the importing stage of the consignment for landlocked countries: from the point of arrival at the destination until the last observed timestamp for the consignment. The indicator is representative of road or rail corridor performance excluding multimodal transfers en route, which are included in the dwell time. The estimated time difference was representative of the time to import for corridors serving landlocked countries, based on the lead time between the destination and port of import UN/LOCODEs.
Each of the indicators described above was derived separately for each consignment, using only the shipping phases that passed validation. To illustrate this approach to data cleaning, consider a consignment with only one event in the importing phase. Because of the missing observations in the import stage, this consignment is not included in the import-based indicators (i.e., dwell times). However, its events may still be used in the export dwell time indicators, provided that its exporting timestamps are consistent with the validation ruleset. The time differences between the timestamps of each consignment’s events were aggregated by UN/LOCODE and by pairs of UN/LOCODEs; the mean, median, and quartile distributions of the series are provided for download from the LPI website.

The directed networks in Figure 7 were constructed based on TradeLens data on the cumulative number of consignments and average transshipment lengths for all World Bank defined country regional groupings.

Figure 7: High-level network structure of TradeLens dataset
First panel: Weighted directed network constructed from the TradeLens dataset. The weight and the size of the node are proportional to the number of consignments recorded cumulatively between May 1 and October 31, 2022.
Second panel: Weighted directed network constructed from the TradeLens dataset. The weight and the size of the node are proportional to the time spent in the overall transshipment phase recorded cumulatively between May 1 and October 31, 2022.

Appendix 2: Postal data from the Universal Postal Union (UPU)

Table 9: EDI message names and definitions

The Universal Postal Union (UPU), the provider of the postal data for the LPI 2023, is a specialized agency of the United Nations that coordinates postal policies, standards, and data collection for its 192 members.
The UPU has amassed vast quantities of information, with over 6 billion parcels sent by post annually, domestically and internationally, and over 5.5 billion letter-post items exchanged internationally. The dataset that was accessed and processed for the LPI originated from EDI (Electronic Data Interchange) protocol records exchanged between postal operators about each tracked postal item circulating in the international postal network. This EDI message specification is referred to as EMSEVT (Express Mail Service EVenTs); the current version 3 of the specification was adapted for the purposes of tracing all tracked postal mailing items, including parcels (up to 30 kg), letters (up to 2 kg) and express deliveries. In 2019 alone, UPU registered over 1 billion EMSEVT messages exchanged domestically and internationally, each message having potentially up to eight required event types: key timestamps defining the movement of a parcel item from posting/collection to final delivery.

The processing and handling sequence of cross-border shipments is shown in Table 9. For an e-commerce item, after a shopper places an order, the shipper hands the item over to the origin post (code EMA). The post inducts the item into its domestic network, where it passes through several handling, sortation, and transport processes (code EMB). At the origin Office of Exchange (OE), the item is placed in a receptacle for international dispatch to the destination OE, and it departs from the country of origin (code EMC). After a few potential transiting events (coded EMJ-EMK), the item arrives at the destination (code EMD), where it is unloaded and handed to the destination post. The EME event denotes the process of separating different items from the bundle (receptacle) that they were shipped in, item retrieval, and clearance through customs.
The destination post then inducts it into its domestic network for processing and potential relocation to the delivery office, from which the final delivery to the customer happens (code EMI). Unsuccessful deliveries are recorded using event EMH.

Each of the different handoffs described above is supported by data capture methods such as barcode scanning and computerized entries. Different supply chain partners use this data to generate and exchange the applicable EDI messages in compliance with agreed standards and business rules designed by the UPU. For tracked items, postal operators provide the track-and-trace information with respect to the outward and inward tracked letter-post items. Operators are encouraged to observe indicative targets associated with the transmission of postal item event information, in exchange for having their partners’ information available to them. UPU encourages the following benchmarks: first, 90% of parcels that receive an EMC (departure from office of exchange) event should have an EMD event transmitted within 24 hours of the event time and date; and second, 90% of parcels that receive an EMD event should have an EMH and/or an EMI event transmitted within 48 hours of the event time and date.

According to the UPU catalogue, the most reliable fields are EMC, EMD, EMH and EMI. Since the time difference between EMB and EMC events would be representative of the length of the transit between two countries and would heavily depend on the distance between the two locations, the time difference between events EMD and EMH/EMI was used to derive the postal delivery time for the LPI 2023. Computed time differences for each postal item were aggregated to the country level using medians, means, standard deviations and decile distributions. The results shed light on the performance of postal services at destination, especially last-mile deliveries.
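A minimal sketch of the delivery-time computation described above, using only the Python standard library, is given below. The item layout, sample values, and function names are hypothetical. Taking the later of EMH and EMI when both are present is an assumption of this sketch, and decile computation is omitted for brevity.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, stdev


def delivery_hours(events):
    """Hours from EMD to EMH/EMI for one item, or None if not computable.

    events: dict mapping event code -> ISO 8601 timestamp.
    """
    emd = events.get("EMD")
    finals = [events[c] for c in ("EMH", "EMI") if events.get(c)]
    if not emd or not finals:
        return None
    end = max(datetime.fromisoformat(t) for t in finals)  # assumed: latest wins
    hours = (end - datetime.fromisoformat(emd)).total_seconds() / 3600
    return hours if hours >= 0 else None  # negative gaps are discarded


def by_destination(items):
    """Aggregate item-level delivery times to destination-country statistics."""
    hours = defaultdict(list)
    for item in items:
        h = delivery_hours(item["events"])
        if h is not None:
            hours[item["dest"]].append(h)
    return {
        dest: {"n": len(v), "median": median(v), "mean": mean(v),
               "stdev": stdev(v) if len(v) > 1 else None}
        for dest, v in hours.items()
    }


# Hypothetical tracked items for one destination country.
ITEMS = [
    {"dest": "KE", "events": {"EMD": "2019-03-01T08:00:00", "EMI": "2019-03-02T08:00:00"}},
    {"dest": "KE", "events": {"EMD": "2019-03-01T08:00:00", "EMH": "2019-03-02T08:00:00",
                              "EMI": "2019-03-03T08:00:00"}},
    {"dest": "KE", "events": {"EMD": "2019-03-01T08:00:00", "EMH": "2019-03-04T08:00:00"}},
    {"dest": "KE", "events": {"EMC": "2019-03-01T08:00:00", "EMI": "2019-03-05T08:00:00"}},
]
stats = by_destination(ITEMS)
print(stats["KE"])
```

The fourth item has no EMD event and is skipped, which mirrors how missing events reduce the usable sample.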
Data on parcels is relevant in assessing the value of time and reliability of e-commerce in the destination country. It also helps assess the quality of postal infrastructure and speed of delivery, since large parcel flows usually require more efficient facilities.17 The reliability of delivery operations in these countries was estimated with statistical measures such as central tendency and dispersion in lead time. The studied sample comprised countries with more than a hundred unique inbound parcel shipments; this sample included 132 countries from all World Bank geographical regions and income groups. Forty percent of low-income countries were represented in the postal dataset, while other income categories had approximately equal representation in postal statistics, varying between 61 and 68 percent for each income group. The geographical coverage was more evenly distributed, with all but one region having over half of its countries represented in postal statistics. Only Sub-Saharan Africa had less than 50 percent of its countries represented in UPU’s information.

The route of each tracked item can be universally described using the EDI protocol and the EMSEVT message specification: the list of common event types and their definitions is provided in Table 9. It is the EMSEVT messaging specification that enables the visibility of tracked postal items for customers and receiving postal operators. EMSEVT records provide essential data for building the quality-of-service KPIs that postal operators use to assess their performance. The focus on parcel flows was in line with previous literature that used the same dataset and found a strong statistical correlation between parcel flows and international trade flows for the same commodity groups (Anson et al. 2014).
The parcel sample extraction was designed to assess the significance of lead time (i.e., delivery time) as a factor of supply chain performance. By constructing several measures of central tendency and dispersion in lead time, the reliability of logistics operations was estimated.

An additional specification collected by UPU and used in this study pertains to messages classified as PREDES (PRE-advice of DESpatch) in postal standards for recording and exchanging information. The PREDES message notifies the destination about an incoming dispatch and shipment of mail receptacles (e.g., bags, trays) of the same mail category. PREDES is created at the origin Office of Exchange and is sent to the destination Office of Exchange to pre-advise the receiving party about the incoming dispatch. Information extracted from the PREDES dataset was used for deriving the connectivity indicator and for the volume-based network analysis of postal logistics (see Box 1). Connectivity, or the number of direct postal connections, was used to reveal the spatial patterns of logistics and accessibility for specific bilateral postal connections.

Data cleaning and filtering procedures were applied at the extraction of the dataset to limit its size and avoid retrieving invalid or irrelevant observations. Based on UPU’s postal manual, the key mandatory event types were retrieved for each tracked item and aggregated to bilateral country-level statistics for 2019. Events EMC, EMD, and EMH and/or EMI were the mandatory event types for reporting. For each of these events, the attributes Item ID, destination country, event date and time, and reporting office of exchange were available. Based on these attributes, the information on the origin and destination countries was retrieved.
The sample was defined using the following rules, applied per unit of observation (a uniquely identified tracking number):

• Mail class can be only C (parcel) and the receiving date of message had to be between January 1, 2019, and December 21, 2019.
• Pairs of consecutive events (EMA through EMI, excluding the optional EMK and EMJ events) had to be recorded consistently and in correct sequential order: no negative time difference between event types was extracted. Observations associated with a uniquely identified tracking number could still be used partially: in other words, if an item had either EMA or EMB events missing but correctly identified EMD and EMC events, the time difference between the correctly recorded timestamps was included in the country-level aggregations and was part of the resulting dataset.
• Events EMH and EMI were consolidated in a way that minimized loss of datapoints. If one of them was missing, we extracted whichever of the EMH or EMI timestamps was available; if both were available, the latest one was taken. The output of this logic was coded as the EMH/I event.
• To differentiate between domestic and international postal flows, the ISO-2 code of the sender location (usually the first two alphanumeric characters) had to be different from the ISO-2 code of the receiving entity.
• The aggregation from tracking identifiers to countries of destination and country lanes was computed during the extraction process.
• Subsequent extraction also eliminated country lanes with low volumes of exchanges (fewer than 99 records in one year). These were identified using the PREDES database and excluded from the construction of the postal pillar of the LPI 2023.

17 Boffa 2019

The cleaning processes described above yielded a considerable share of missing observations in the parcel network for unique destination country-month pairings in 2019. Figure 8 provides an example of the prevalence of empty or null values for event EMD and for events EMH/I, cumulatively for the full year 2019, for different regional country groups.

Figure 8: Share of null values in 2019 parcel postal records of destination countries
[Figure: two series by region, percent of EMH/EMI null events and percent of EMD null events. Value pairs in extracted order: Total 7.4%, 6.5%; Sub-Saharan Africa 21.4%, 12.2%; South Asia 28.1%, 6.3%; North America 8.1%, 11.3%; Middle East & North Africa 8.4%, 9.3%; Latin America & Caribbean 12.5%, 7.8%; Europe & Central Asia 6.6%, 5.4%; East Asia & Pacific 5.9%, 4.3%.]

Figure 9: Network representation of the UPU dataset

Box 1. Universal Postal Union (UPU) network and its characteristics

A network representation of the dataset is provided in Figure 9. We constructed a weighted directed graph based on the number of parcel items recorded in international EMSEVT messages in 2019. The color gradation of the nodes (light blue to purple) represents differences in the out-degree metric, while the size of each node reflects its in-degree value. The network captures 159 countries connected by 13,229 edges in total. Connections between countries are weighted using the total number of parcels exchanged for which the EMSEVT message had a non-missing EMD event. The weight information is conveyed using both the color gradation (black to red) and the thickness of the line between two nodes. More than three quarters of all connections exist in both directions, which is reflected in the network’s reciprocity value of 0.78 (0-1 scale). Overall, 53 percent of all potential direct connections are realized in the parcel dataset. Because this network is moderately dense, its diameter, or the largest geodesic distance, is expected to be small. In this case, no country is more than three “steps” (or transit locations) away from any other. This suggests that the network is relatively compact, and that information in the shape of parcels may travel quickly.
On average, each country has 83 partners to whom it sends and from whom it receives EMSEVT parcel-specific information.

Appendix 3: Air cargo tracking data from Cargo iQ

The new supply chain tracking indicators in the LPI 2023 include indicators pertaining to the aviation sector of logistics services. The operational framework of the air cargo industry, including cargo handling by carriers and airports, is rooted in the same EDI protocol as that of the UPU: the main events of the supply chain follow a similar logical ordering, shown in the sequence of data events in Figure 10. The dataset was provided by Cargo iQ, a nonprofit affiliated with IATA and created in 1997. Cargo iQ’s reporting system was designed to capture the entire path of air cargo shipments, defined by a master operating plan (MOP) consisting of 19 milestones. These milestones represent timestamped events along the supply chain that capture the movement of cargo between and at origin airports, destination airports and any transit locations. Figure 10 shows only the milestones at the destination airport that were available for the LPI 2023.

A shipment, commonly identified through an e-AWB (electronic air waybill), is tracked through the system from the departure of the flight with cargo (DEP) through its arrival (ARR) and check-in at a warehouse at the destination airport (RCF), followed by the advisory to the consignee of the freight’s arrival (NFD) and the consignee’s final collection of the freight from the carrier at the destination airport (DLV). This information is captured in real time and entered into the system by a responsible party. For all five milestones, it is the carriers’ responsibility to enter the information in a timely, consistent, and accurate manner. Because of the reliance on manual data entry, some of the data collected by Cargo iQ was not usable owing to errors, omissions, and systematic errors.
The approach taken to identify, evaluate and effectively exclude erroneous observations is described below.

Figure 10: Sequence of data events (Cargo iQ dataset)

| Milestone | Description |
|---|---|
| DEP | Shipment Departure from Origin / Last Departure Point |
| ARR | Shipment Arrival at Transit / Destination |
| RCF | Cargo Received in Warehouse at Transit / Destination |
| NFD | Notification of Readiness for Delivery of Cargo |
| DLV | Cargo Delivered to Consignee |

To compile the most representative and least error-prone collection of relevant milestones, the LPI team extracted an anonymized set of indicators for destination airports and analyzed the patterns of missing values and trends in data validation failures. The most frequent data quality issue, detected in all of the milestones and the majority of airports, was a negative time difference in the minimum values of the sample. Negative values can be attributed to an incorrect sequence of entered milestones, usually a result of human error. Because this data feature was prevalent in all sampled airports, negative values were excluded from the main data query. Another problem detected in the time difference calculation of ARR-DLV results was the discrepancy between the count of recorded air waybills and the number of pairs of events with both milestones (such as ARR and DLV) available. A pattern of systematically larger averages than medians was observed, suggesting a right skew of the data distribution. However, there were a few airports for which the skewness pattern did not hold, suggesting variability and significant qualitative differences in the nature and origins of outliers.

Cargo iQ provided the following explanation of some idiosyncrasies identified in the screening of decile distributions of destination airports.
In particular, the large discrepancy in the number of observations across geographical regions, most visible at smaller airports located primarily in Africa, is a result of significantly lower membership rates among African carriers than among American, European, or Asian carriers.

To construct a summarized tracking dataset (measured in natural units), the time differences between milestones were computed for each shipment and then aggregated by pairs of origin-destination airports (airport lanes), by pairs of origin-destination countries (country lanes), and by destination airports and destination countries. The dataset covered all four quarters of 2019 and the first two quarters of 2022. All records were aggregated for the lead time components using medians, means, and decile distributions. Additional variables extracted and used for validation and data cleaning were the total number of air waybills and the number of carriers per shipping lane and per destination country.

Finally, the summarized tracking dataset was constructed from the selection of milestones and query parameters that passed quality assurance requirements and were recommended by Cargo iQ. Specifically, it was advised to rely more on records of the DLV and NFD milestones and less on records of other event types such as ARR, owing to a lower rate of compliance in providing ARR information. Thus, the final indicator of dwell time at the destination airport was calculated from the NFD and DLV milestones, with DLV indicating the final handover of the physical shipment to the freight forwarder. Additionally, the data excluded all observations with negative time differences between the defined event types, trimming the left tail of the data distributions. The cleaning procedure was built into the queries that were executed against the dataset.
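The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the queries actually run against the Cargo iQ dataset: the record layout (`origin`, `destination`, `carrier`, `nfd`, `dlv`), the function names, and the ISO 8601 timestamp format are all assumptions made for the example. Zero-length dwell times are kept and only negative ones are dropped, which is also an assumption. The optional `min_carriers` argument is a hypothetical hook for screening lanes by the number of carriers.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, quantiles


def dwell_hours(nfd, dlv):
    """Hours elapsed between the NFD and DLV timestamps (ISO 8601 strings)."""
    delta = datetime.fromisoformat(dlv) - datetime.fromisoformat(nfd)
    return delta.total_seconds() / 3600.0


def summarize_lanes(shipments, min_carriers=1):
    """Aggregate NFD-to-DLV dwell times by (origin, destination) lane."""
    lanes = defaultdict(lambda: {"hours": [], "carriers": set(), "awbs": 0})
    for s in shipments:
        lane = lanes[(s["origin"], s["destination"])]
        lane["awbs"] += 1                      # every air waybill is counted
        lane["carriers"].add(s["carrier"])
        if s.get("nfd") and s.get("dlv"):      # both milestones must be present
            hours = dwell_hours(s["nfd"], s["dlv"])
            if hours >= 0:                     # drop negative time differences
                lane["hours"].append(hours)

    summary = {}
    for key, lane in lanes.items():
        hours = lane["hours"]
        if not hours or len(lane["carriers"]) < min_carriers:
            continue
        summary[key] = {
            "awb_count": lane["awbs"],
            "carrier_count": len(lane["carriers"]),
            "n_valid": len(hours),
            "mean": mean(hours),
            "median": median(hours),
            # quantiles() needs at least two points on older Python versions
            "deciles": quantiles(hours, n=10, method="inclusive")
            if len(hours) > 1 else [],
        }
    return summary
```

Comparing `awb_count` with `n_valid` for a lane gives the kind of discrepancy between recorded air waybills and usable milestone pairs that the screening step looked at, and a mean well above the median signals a right-skewed lane.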
The extraction from the dataset focused on the five key events that had the best coverage at the destination stage of the shipment. To eliminate the chance of disclosing identifiable information and to ensure the anonymity of members, observations for carriers operating on lanes with limited competition (two or fewer participating carriers) or no competition were excluded. In other words, for bilateral e-AWB records and aggregates, lanes that recorded fewer than three operating carriers were dropped. This rule alone eliminated over half of the observations in the bilateral country dataset. However, viewed from a wider perspective, eliminating such a large share of bilateral observations (over half) did not result in a comparable number of countries being excluded.

The summarized tracking dataset from Cargo iQ was constructed for the selected time differences and aggregated using statistics closely resembling those in the UPU specifications, i.e., means, deciles, standard deviations, and counts of non-empty positive observations.

Appendix 4: Container ship tracking datasets

MDS Transmodal

MDS Transmodal (MDST) is a UK-based independent consultancy focusing on the international transport sector, in particular on freight transport including shipping, ports, road, rail, logistics, and distribution. MDST collects and aggregates transport-related data and maintains several databases related to freight transportation. A quarterly dataset of aggregates for country pairs and countries, covering the period between January and June 2022, was derived from MDST's containership database, which covers shipping schedules and offered volumes on liner shipping routes.
Indicators available as part of the partnership agreement with MDST also include the number of services, the number of operators, the number of alliances, and the average annual frequency of shipping services, as well as statistics (average, maximum, minimum) on the number of deployed ships, ship sizes, and ship age. This information was used to derive indicators of logistics performance related to (1) maritime connectivity (the number of partners connected via liner shipping services); (2) service availability (the total number of liner shipping services between the two countries, operated by liner shipping companies on a predefined rotation); and (3) service competitiveness (proxied by the number of alliances that operate on any given rotation between countries). These indicators were combined with Marine Traffic variables (covered in the next section). All indicators stemming from these two datasets are referred to as maritime indicators; their global coverage is approximately 52 percent of World Bank member countries (country coverage is shown in Figure 11).

Marine Traffic

The port calls dataset originating from Marine Traffic is a high-quality collection of records processed from Automatic Identification System (AIS) messages and enriched with proprietary information on (i) ports and (ii) ships, the latter sourced from the IMO (International Maritime Organization) ship registry. Each observation in the dataset is a record of a "port call": a geolocated timestamp of an arrival at or a departure from an individual port.
The dataset includes additional variables on the characteristics of ships, including their international identifiers (IMO and MMSI (Maritime Mobile Service Identity) numbers), as well as information on each ship's length, width, load capacity (the number of TEU it can carry), standard draught and draught at the time of the event, and ship type, ranging from smaller feeders with a capacity not exceeding 1,000 TEU to ULCVs (Ultra Large Container Vessels) of 14,501 TEU and above.

Figure 11: Maritime dataset country coverage by WB region and income group
[Two bar charts of the share of countries covered (%). First panel, by World Bank region: East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, and Sub-Saharan Africa, with overall coverage marked for reference. Second panel, by income group: High income, Low income, Lower middle income, and Upper middle income.]

The analysis was conducted using records of port calls prepared for the World Bank by Marine Traffic, covering over 5,000 containerships calling at over 1,000 ports worldwide. The information available includes timestamps of arrivals and departures reported via AIS signals received by terrestrial and satellite receivers. The natural unit indicators for the LPI 2023 were derived by aggregating containership port calls from January 1, 2022, to December 31, 2022; roughly 1 million new observations are added per year. Based on the estimated time differences between recorded arrivals at and departures from port facilities, an indicator of turnaround time per port was constructed. More broadly, turnaround time served as the primary input for aggregating from ship-specific port calls to the port, country, and regional levels.
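As an illustration of how a turnaround-time indicator can be built from port call records, the sketch below computes the time between arrival and departure for each call and aggregates it at the port or country level. It is a minimal sketch under stated assumptions: the field names (`port`, `country`, `arrival`, `departure`), the ISO 8601 timestamp format, and the use of the median as the aggregate are choices made for the example and are not taken from the Marine Traffic data specification.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def turnaround_hours(call):
    """Hours between the recorded arrival and departure of one port call."""
    arrival = datetime.fromisoformat(call["arrival"])
    departure = datetime.fromisoformat(call["departure"])
    return (departure - arrival).total_seconds() / 3600.0


def aggregate_turnaround(port_calls, level="port"):
    """Median turnaround time grouped by `level` ("port" or "country")."""
    groups = defaultdict(list)
    for call in port_calls:
        # Skip calls missing either timestamp (assumed cleaning rule)
        if not call.get("arrival") or not call.get("departure"):
            continue
        groups[call[level]].append(turnaround_hours(call))
    return {key: median(values) for key, values in groups.items()}
```

Calling `aggregate_turnaround(calls, level="country")` pools all ship-specific calls within a country, which is one simple way to move from the port level to higher levels of aggregation; weighting by ship size or call volume would be an alternative design.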