Policy Research Working Paper 10296

A Metadata Schema for Data from Experiments in the Social Sciences∗

Jack Cavanagh†, Jasmin Claire Fliegner‡, Sarah Kopper†, Anja Sautmann§

Development Economics, Development Research Group, February 2023

Abstract: The use of randomized controlled trials (RCTs) in the social sciences has greatly expanded, resulting in newly abundant, high-quality data that can be reused to perform methods research in program evaluation, to systematize evidence for policymakers, and for replication and training purposes. However, potential users of RCT data often face significant barriers to discovery and reuse. This paper proposes a metadata schema that standardizes RCT data documentation and can serve as the basis for one—or many, interoperable—data catalogs that make such data easily findable, searchable, and comparable, and thus more readily reusable for secondary research. The schema is designed to document the unique properties of RCT data. Its set of fields and associated encoding schemes (acceptable formats and values) can be used to describe any dataset associated with a social science RCT. The paper also makes recommendations for implementing a catalog or database based on this metadata schema.

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at asautmann@worldbank.org.

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Produced by the Research Support Team.

Keywords: Randomized controlled trials, metadata, data publication, secondary research, trial registration
JEL codes: C10, C81, C90

∗ We would like to thank David Rhys Bernard, Mercè Crosas, Maya Duru, Benjamin Morse, Julian Gautier, Steven Glazerman, Jakob Hennig, Maria Ruth Jones, Jessaca Spybrook, Wendy Thomas, Gabriel Tourek, James Turitto, Keesler Welch, and Lars Vilhuber for providing detailed feedback on the metadata schema, and Caitlin Brown and Rachel Griffith for providing helpful comments on the paper. We would also like to thank Davi Bhering, Simon Cooper, Michael Gibson, Sabhya Gupta, Katharina Kaeppel, Daniela Muhaj, Isabela Salgado, Sheral Shah, and Selva Swetha for helping us test the schema with data from 29 RCTs. We also thank Mehmood Asghar, Barbara Bierer, Olivier Dupriez, Julie Goldman, Rebecca Li, Katherine McNeill, Amy Nurnberger, Limor Peer, Matthew Welch, and Julie Wood for support and feedback at various stages of the project.
Supplementary materials, including proposed controlled vocabularies, can be found in the associated GitHub repository: https://github.com/sakopper/rct_metadata_schema.
† J-PAL/MIT, email: jcavanagh@povertyactionlab.org and skopper@povertyactionlab.org
‡ The University of Manchester, email: jasmin.fliegner@manchester.ac.uk
§ Development Economics Research Group, World Bank, email: asautmann@worldbank.org.

1 Introduction

The use of randomized controlled trials (RCTs) in the social sciences has greatly expanded over the past two decades, from economics and political science to public health and education.1 In parallel, journals, funders, policy organizations, and organizations promoting open science have emphasized that original research data be publicly accessible for others to analyze and use. As a result of this concerted push, hundreds of original RCT datasets from a wide variety of contexts and populations are already published and in principle accessible to researchers, with the potential to benefit numerous areas of research.

However, there remain to date significant barriers to the discovery and use of existing RCT data. Published datasets are scattered across data repositories, journal and university websites, and researcher homepages. In many data repositories, filter options and documentation fields are broad, often consisting of free-form text fields, and data quality and provenance are hard to assess. RCT data are often published with a focus on replicating the analysis in a specific paper, and the properties specific to RCTs and features of the data have to be pieced together from inspecting datasets, data appendices, and readme files. Combining datasets across studies is hindered by a lack of harmonization and documentation. Any work in this regard by individual researchers is lost to the next person who wants to conduct a similar study.

In this paper, we propose a metadata schema that can serve as the basis for a catalog of RCT data (or, more broadly, any social science experimental data). The metadata schema defines a set of fields with encoding schemes that can be used to describe datasets associated with social science experiments. The encoding scheme defines the acceptable formats and values to complete the field, such as free text entry, dates, numeric values, or controlled vocabularies (multiple choice options).2 The proposal organizes fields into thematic modules and specifies whether a field is optional or mandatory.

A core objective of this proposal is standardization. Standardizing the documentation of specific types of data allows harmonization, aggregation, and cross-referencing. Data repositories that follow a standardized schema can be easily made searchable by internet search engines such as Google Dataset Search. As much as possible, our schema therefore conforms with common data description standards (such as those of the Data Documentation Initiative (DDI)), and with existing schemata and catalogs for experiments as well as survey data from the social sciences (such as the World Bank’s Microdata Catalog, the AEA RCT Registry, and ClinicalTrials.gov). In addition, we added fields that we considered particularly useful for secondary research.

1 We use the term RCT to describe an experiment that uses randomization to assign some intervention or treatment to participants but is conducted outside of a controlled environment (such as a laboratory).
2 We follow the definitions in ISO (2021).
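To make the pairing of a field with an encoding scheme and a cardinality concrete, a minimal sketch in Python follows; all names and settings in it are illustrative assumptions rather than the schema’s actual definitions (those appear in Appendix A).

# Minimal sketch of how two schema fields might be represented.
# Field names, encoding labels, and cardinality settings are
# illustrative assumptions, not the schema's authoritative definitions.
field_country = {
    "name": "Country",
    "encoding": "controlled vocabulary",  # e.g., ISO country codes
    "mandatory": True,                    # assumption
    "repeatable": True,                   # a study may span several countries
}
field_abstract = {
    "name": "Abstract",
    "encoding": "free text",
    "mandatory": True,                    # assumption
    "repeatable": False,
}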
Datasets documented with the schema can be easily searched or filtered by many different criteria, from the type and time period of intervention, to features of the randomized research design (such as stratification or clustering), to contents of the data (such as how take-up/treatment compliance was measured). These options do not currently exist in commonly used data repositories in the social sciences (e.g. Harvard Dataverse, openICPSR, or the World Bank Microdata Catalog).

We formulated a set of principles to guide decision-making in selecting the final set of fields. These aim to balance the effort to contribute metadata, the usefulness of the metadata for different research purposes, and the complexity of the information collected. For example, we wanted to make the entry of new metadata straightforward for the majority of RCT datasets, while providing enough flexibility to describe unusual research designs or unique data properties.3 In addition to our proposal, we also formulated a set of recommendations for implementing a catalog or database based on this metadata schema.

3 We also aimed to make the schema usable to describe laboratory experiments, although this is not the main focus.

With this schema, we hope to provide a public good to the research community that facilitates the creation of interoperable catalogs, with the goal of speeding up scientific discovery, reducing duplication of effort, and creating a systematic overview of existing RCT research. Research facilitated by RCT data catalogs on external validity, impact heterogeneity, and generalizability of policy impacts beyond the immediate study populations has the potential to help policymakers, funders, or impact investors make decisions about policy programs. Reuse of RCT data can in turn bolster investments in primary data collection and spur methods improvements that will make future experiments faster and more robust. Better access to RCT data can also offer new research opportunities for scholars who do not have the resources to undertake costly primary data collection themselves. Finally, original data is a citable contribution to science independent of an associated paper or report. Enhancing the visibility of the data and the data citation helps ensure that those responsible for creating the data receive credit (some of whom may not be co-authors of the academic study).

The next section gives a brief overview of areas of research that make use of secondary RCT data. Section 3 describes the process undertaken to create the metadata schema. Section 4 walks through the schema in detail, and section 5 describes considerations for creating a metadata catalog based on the schema.

2 Uses of Secondary Experimental Data in Research

RCT data have many properties that make them useful for testing hypotheses and garnering insights that were not necessarily the focus of the original study. Randomization provides a credible exogenous source of variation in the data. Many RCTs collect representative data on large populations, often of groups that are underrepresented (e.g. because they are not part of the formal economy) or of particular interest for policy research (e.g. eligible for certain benefits). RCT datasets often contain indicators and variables of high policy relevance and may use innovative measurement methods such as lab-in-the-field preference measures.
In addition, as RCTs as a method mature, researchers have turned their attention to consolidating and systematizing the RCT-based evidence, as well as expanding and improving experimental methodology and econometric analysis methods. In short, many emerging areas of secondary research could benefit from improved access, systematic cataloging, and harmonized documentation of RCT data. Here we provide a brief overview of research that reuses RCT data and informed the development of the schema.

Combining evidence. Meta-analysis techniques such as Bayesian hierarchical models (BHM) can increase external validity and generalizability of experimental results. Recent examples include Bandiera et al. (2021), who combine 16 laboratory and field experiments to estimate the impact of performance pay on women, and Meager (2019), who estimates the impact of microcredit on income-generating activities and consumption. When available data is not catalogued, meta-studies run the risk of overlooking less prominent studies.4

4 An issue potentially exacerbated by biases inherent to the publication process (Andrews and Kasy, 2019).

Assessing estimation and prediction methods. In seminal work, LaLonde (1986) used experiments as a benchmark to assess the bias in non-experimental estimation methods. The literature that followed (Fraker and Maynard, 1987; Dehejia and Wahba, 1999, 2002; Glazerman et al., 2003, and many more) has matured to the point of being able to draw conclusions about the full distribution of bias (e.g. Chaplin et al., 2018). Researchers have also used experimental data for making out-of-sample predictions and evaluating external validity, e.g. by comparing different prediction methods or quantifying site selection bias (e.g. Hotz et al., 2005; Allcott, 2015; Gechter et al., 2019).

Measurement. Researchers use existing data to validate methods of measurement for important concepts and indicators. Recent work has for example examined sources of measurement error in agricultural data (e.g. Beegle et al., 2012; Rosenzweig and Udry, 2019) and measurement methods for women’s agency (e.g. Donald et al., 2020; Jayachandran et al., 2021).

Estimating structural models. Structural models can exploit the experimental variation for identification and enrich experimental data, for example by evaluating the role of underlying preferences and behavioral factors for take-up decisions as a means to conduct welfare analysis or make predictions for the effects of new policies (e.g. Todd and Wolpin, 2006, 2010; Meghir et al., 2019; Guiteras et al., 2019).

Statistical learning and machine learning. A rapidly growing literature applies machine learning methods to RCT data. Examples include regularization methods to discipline covariate selection or the identification of treatment effect heterogeneity (e.g. Chernozhukov et al., 2018), and new sampling methods, e.g. multi-arm bandits and related adaptive experimental algorithms (e.g. Dimakopoulou et al., 2018; Caria et al., 2021; Kasy and Sautmann, 2021). Existing RCT datasets can be used to optimize algorithms, check large- or small-sample behavior, and simulate thousands of trials to improve speed and evaluate performance without incurring costs or burdening subjects.

Epistemology, RCT methodology, and research transparency.
Researchers have begun to examine large sets of studies to understand the “political economy” of designing, conducting, and publishing experiments (e.g., Andrews and Oster (2019) on external validity bias; Gechter and Meager (2022) on the role of pre-existing infrastructure for site selection; Höffler (2017) on how data publication influences citation rates; Anderson and Magruder (2017) on how to reduce the number of false discoveries; or Christensen and Miguel (2018) on the adherence to transparent research practices).

Summary statistics. RCT data can help both researchers and the broader public to better understand underrepresented populations or get an overview of the body of experimental evidence. Especially in low-income contexts, RCT data deliver a detailed picture of populations that are typically not well-represented in available data – neither government data, such as (formal) labor market statistics, nor private data, such as bank records. RCT data can be used to extract stylized facts, conduct exploratory research or power calculations, and more. For example, the non-profit AidGrade (2019) compiled a database of standardized effect sizes and standard errors to facilitate simple forms of comparative analysis (e.g. Vivalt, 2015, 2019).

The metadata schema is designed to benefit all these different applications by
• enabling filtering datasets on criteria such as unit of randomization or intervention assignment strategy;
• collecting information specific to RCTs such as interventions, treatment arms, or treatment compliance;
• facilitating the combination of multiple RCT datasets by documenting features such as available covariates, time period covered in the data, or inclusion/exclusion criteria;
• recording external resources such as registry entries, ethics review protocols, and academic publications, and documenting information such as whether a pre-analysis plan exists or who funded or partnered in the implementation of the RCT.

3 Creating the Metadata Schema

We followed the process outlined in ISO (2021) in creating the metadata schema. Throughout, we were advised by a group of data scientists and data librarians from Harvard Dataverse, Harvard Medical School, Harvard Business School, MIT, ISPS/Yale, DDI, Vivli, and the World Bank Microdata Catalog. We started by reviewing the types of research that use RCT and experimental data (as summarized in section 2), conducting a survey with researchers who have re-used RCT data or expressed interest in methods research,5 and researching existing metadata schemata for social science and experimental data. The four main sources of metadata fields we ended up using host some of the largest collections of information on existing RCTs and RCT data (at the time of writing).

5 Expressions of interest were collected through the Research Methods Initiative of Innovations for Poverty Action (IPA) and the Global Poverty Research Lab, or through surveys of affiliated researchers of the Abdul Latif Jameel Poverty Action Lab (J-PAL).

For information related to survey data, we focused on schemata based on the Data Documentation Initiative (DDI), an international standard for documenting survey data (DDI, 2021), in particular the fields used in the Harvard Dataverse and the International Household Survey Network (IHSN) template of the World Bank Microdata Catalog. The Harvard Dataverse is an accredited, cross-disciplinary data repository that enables any researcher or institution to publish and archive data and code.
J-PAL and IPA maintain a Dataverse data collection called the “Datahub for Field Experiments in Economics and Public Policy” that currently hosts over 200 RCT datasets. The repository builds on a suite of tools for the publication of scholarly data (King, 2007), and its metadata schema is mapped to the DDI Codebook. The schema is organized in blocks that can be used to customize the metadata documentation in individual data collections (Harvard Dataverse, 2021). We considered all blocks used by the J-PAL/IPA Datahub.

The World Bank Microdata Library hosts survey and other data from multiple institutions, including the World Bank’s own research departments, which publish their data under the World Bank’s Open Data Policy. It aggregates multiple named collections, including those of the Development Economics Research Group (DECRG), the Development Impact Evaluation Unit (DIME), and the Strategic Impact Evaluation Fund (SIEF),6 and all metadata can be accessed through an Application Programming Interface (API) (The World Bank, 2022). The World Bank’s IHSN microdata template has four sections compatible with DDI – Document Description, Study Description, Datasets, and Variable Groups – and an External Resources section compatible with the Dublin Core metadata standard (IHSN, 2022). We considered all of these for the schema and adopted many usage recommendations from Dupriez et al. (2021).

6 Impact evaluation often means RCT in this context but can also mean other rigorous analysis methods aimed at estimating causal impacts.

Most existing data catalogs do not cover fields related specifically to the design of RCTs. For these, we turned to two important trial registries. The AEA RCT Registry of the American Economic Association is likely the most complete record of past and ongoing RCTs in economics, including unpublished studies (AEA RCT Registry, 2022). The registry metadata contains information specific to RCTs that is not typically included in survey schemata, such as the intervention, randomization method, outcome measures, or the reviewing IRB, although these contents are mostly stored in free text fields and there is no crosswalk with DDI or other metadata standards. The US National Library of Medicine and the National Institutes of Health maintain the trial registry and results database ClinicalTrials.gov for clinical trials in the United States (McCray and Ide, 2000). All clinical studies of drugs and devices controlled by the Food and Drug Administration (FDA) must be registered here. The registry has detailed metadata field definitions and maintains an API feed (ClinicalTrials.gov, 2022). Many repositories draw from or link to this registry (e.g. Vivli, 2022; ISRCTN, 2022). The fields on interventions, study arms, and outcome measures provided a model for our schema.

Roughly 270 metadata fields were under consideration for inclusion in the final schema. Some metadata schemata provided valuable insights even if their fields were ultimately not adopted; the full list of schemata considered can be found in the GitHub repository for this project. The following principles for the RCT metadata schema guided decision-making on (i) which metadata fields to include, (ii) the encoding scheme for each field including whether to create a controlled vocabulary (multiple choice options), and (iii) the cardinality of each field (i.e., whether the field is optional vs. mandatory and if it can be repeated):
1. The primary purpose of the schema is to provide information that helps identify RCT datasets for secondary research, i.e., using the previously collected data for new studies.
2. The schema primarily describes the design of the data collection and content of the data, not the academic study or analysis results.
3. Preference is given to DDI-compliant fields over other existing fields, and to existing fields over newly created ones.
4. Field definitions should make information comparable across studies.
5. The level of detail collected must balance usefulness for the purpose above with the effort required to create an RCT metadata record for contributors.
6. The schema must balance ease of use with completeness.

Note that principles 1 and 2 set the schema apart from data repositories that focus on making the analysis in the original study replicable (and thus primarily record aspects of the data that were already exploited for research). Item 2 also excludes information such as estimated treatment effect sizes, which depend on the analysis method applied. Items 3 and 4 aim to make the schema interoperable with existing catalogs. Item 4 led us to sometimes amend the original field definitions and provide controlled vocabularies wherever possible (see below). Item 5 ruled out information that could be difficult to obtain or verify, especially ex post (such as information on intervention cost or budget). Item 6 means that the defaults of the schema focus on the most common data structures, while optional free text fields allow for supplying additional information.

Iterations of the metadata schema were extensively reviewed and tested by the authors, supported by a group of J-PAL staff, an external group of experts in RCT data reuse or metadata schemata, and the advisory group mentioned above. Part of the testing consisted of completing the metadata fields for a set of 29 RCTs. Most of these test datasets are hosted on the J-PAL Dataverse and were chosen based on aspects of their design, with the aim of testing both “typical” RCTs and “edge cases” (for a full list, see the supplementary material on the GitHub repository for this project). The purpose was both to test the metadata schema in its entirety, including the sequencing and structure of the fields, and to select, develop, and test associated controlled vocabularies. We also tested whether the field definitions were easy to understand, and whether there was any difficulty obtaining the requested information for a given RCT. Tester feedback led us for example to provide a definition of what constitutes a “dataset”; see section V. Data below.

A Note on the Controlled Vocabularies

Though not formally part of the schema, the controlled vocabularies (CVs) are an important part of our proposal. CVs are primarily used for filtering, and as with multiple-choice survey questions, they need to “partition” the space of possible options, meaning the set of individual entries must cover the entire universe of options without overlap (i.e., without options that could fit two or more entries). Moreover, entries need to be balanced, in the sense that they need to be specific enough to help users narrow down the set of studies of interest, but broad enough so that each entry applies to more than a small number of studies. The catalog testing process7 led to numerous adaptations to existing CVs, both removal/aggregation of entries and addition of new entries. Two examples are the fields Kind of Data and Mode of data collection (CVs G and I in Appendix B).

7 Testers were asked to comment on the overall suitability of each CV for documenting RCTs, as well as the individual options within each CV. Testers could also suggest new CV entries. For test fills, testers both provided free-form text responses and selected all applicable choices from each existing CV (if any) in order to evaluate the CV’s coverage as well as potential ambiguities.
These fields are part of the DDI and appear in many schemata including the IHSN, which uses a truncated version of the DDI CV (Dupriez et al., 2021). However, testers found the IHSN CV not well-suited to describe administrative data or data from laboratory or “lab-in-the-field” experiments. This led us to add back relevant items from the DDI CV as well as create new entries. In some cases we developed entirely new CVs, such as CV E in Appendix B, which describes available types of covariates at the cluster or group level. We see some of the newly proposed CVs as under development. In these cases, we provide a version 0.9 in the GitHub repository of this project, with the aim of updating to version 1.0 based on a larger body of RCT datasets.

4 The Metadata Schema in Detail

In what follows, we give an overview of the metadata schema, divided into modules I to VII. Appendix A contains a corresponding table of all metadata fields, with the CVs we suggest listed in Appendix B. In this section, we provide supplementary information about the schema for back-end users or contributors who create metadata schema entries, data users who are perusing a catalog based on this schema or reading individual metadata entries, and data stewards and application programmers integrating the schema into an existing application or building a new catalog, also called catalog owners. We included illustrative examples that may be helpful for establishing a standardized way to describe an RCT’s research design. Some details serve users who are less familiar with the conventions and practices for social science RCTs; researchers who routinely work with RCT data may find the information in Appendices A and B sufficient.

For each metadata field, the table in Appendix A contains a name, a short description, an encoding scheme, and the “cardinality” of the field, that is, whether the field is optional or mandatory, and whether it is unique or “repeatable”. We also included suggestions for implementation for an adopting organization (see section 5 for more). A crosswalk with other metadata schemata is posted in the GitHub repository. Note that in many cases we adopted existing fields but modified the wording of definitions for clarity in the context of social science experiments.

What Constitutes an RCT Metadata Record? An RCT is an experiment combined with data collection on the subjects or experimental units. It is principally defined by the study population and unit of randomization, the intervention, and the randomization procedure used to create comparable treatment arms. The schema is designed assuming that each top-level metadata record corresponds to exactly one study or RCT, in which a set of interventions were randomized in a sample representative of some population described in section II. We describe how to delineate the one or more individual datasets that are part of an RCT below in section V on “Data”.
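Before turning to the individual modules, the following sketch shows the overall shape such a record might take, with one entry per module; the key names are our shorthand, and the authoritative fields, encoding schemes, and cardinalities are those in Appendix A.

# Sketch of one top-level record: one record per RCT, with one entry per
# module. Key names are illustrative shorthand, not the schema's fields.
rct_record = {
    "basic_information": {},           # I: citation information, abstract, topic, version
    "study_population": {},            # II: location, inclusion/exclusion, randomization unit
    "outcomes_and_interventions": {},  # III: outcome measures, interventions, study arms
    "study_design": {},                # IV: sampling, covariates, targeted effects, compliance
    "data": [],                        # V: one entry per dataset (repeatable)
    "ethics_and_transparency": {},     # VI: ethics review, funders/partners, registration
    "external_resources": [],          # VII: data locations, publications, codebooks (repeatable)
}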
I Basic Information

The basic information fields in the schema provide summary information on the RCT. The section contains all the information needed to cite RCT data, including the study title as well as author/data owner names and affiliations. Original data is a contribution to science separate from publications based on the data. Contributors may consider crediting a larger or different set of individuals from the academic article. If the data is stored in a repository or other citable location, the metadata record should include the same citation information that is also provided with the data. Similarly, the abstract describes the purpose, nature, and scope of the RCT and data, and may contain more or different information from the paper abstract. The topic classification uses a CV to describe the area of research. While we tested CVs such as CESSDA, IHSN, the World Bank themes, and the J-PAL/IPA sectors, ultimately an organization adopting the metadata schema may choose a topic CV based on its own requirements and use case. For example, an economics journal may choose to use the JEL codes (AEA, 2022). The version and version date fields provide version control for the metadata record.

II Study Population

The study population section provides information related to the study as a whole and who was included in it, covering location, study population, and study sample. Contributors can choose the country of intervention from a CV (ISO country code), with the option to add free-text detail on geographical coverage, as well as any inclusion and exclusion criteria for the intervention studied. In a policy context, these might be formal eligibility criteria for a social program or benefit; the researchers might also apply other research-related conditions for inclusion into the treatment and control groups. Jointly, the geographical information and inclusion/exclusion criteria describe the sampling frame from which the randomization units in the treatment and control groups are (randomly) drawn (see also section IV on sampling method).

The second set of fields concerns the unit of randomization as the primary unit of statistical analysis (as opposed to the unit of observation; see below). A randomization unit can be an individual experimental unit or a group (a cluster). Contributors are asked to choose the randomization unit from a controlled vocabulary. The CV we recommend expands the DDI CV considerably to account for frequently occurring units of randomization in social science experiments, such as households, businesses, or schools. The CV allows for separate description of physical units (e.g., a production line, a classroom) and administrative or legal units (e.g., all employees of a firm, all students at the same grade level). This distinction can matter, for example, for interventions that exhibit physical spillovers: health interventions are often assigned at the school, classroom, or grade level (see e.g. Parker et al., 2021).

With individual-level randomization, the unit of randomization is typically the same as the unit of observation. With cluster-level randomization, each randomization unit may contain several observation units or even different types of observation units.8

8 The unit – or units – of observation are recorded in section V. Data below; how randomization units are sampled is in section IV. Study Design; and how randomization units are assigned to treatment arms is detailed in section III. Outcomes and Interventions.

A unit counts as a targeted randomization unit or cluster if it was intended for inclusion in the study (either to receive an intervention or in the control group), even if the intervention was ultimately not offered or received as intended, or if no outcomes were measured.
A unit counts as an actual randomization unit if at least one outcome was measured for one observation unit within the cluster post-intervention. There may be a variety of reasons for a discrepancy between actual and targeted sample sizes. This could be random variation (e.g. patients or job seekers visiting a facility on a given date), but also implementation errors. Note that even if a targeted unit is assigned the experimental intervention as planned, there may be non-compliance, i.e. subjects may not take up the intervention or may circumvent or counteract it. Compliance is covered in section IV; the actual study sample size should include non-compliers. The fields in this section pool observations across waves and treatment arms to provide high-level information on the size of the study. For a breakdown by arm see section III.

Cluster-randomized studies may randomly assign different units of observation, say, buyers and sellers. Note, however, that the unit of randomization is “one level higher”, i.e. (for example) the market in which buyers and sellers interact. Even though individual buyers and sellers are randomized into treatment, the random variation used in the analysis comes from the different treated shares on both sides of the market (in the field “Study was designed to analyze” in section IV, contributors would in this case report “general equilibrium effects”). Similarly, in the cross-over design in Lopez et al. (2022) (see below), patients arriving at clinics on different days received different treatment arms. However, the level of randomization is the clinic. In the edge case where an academic study uses two separate randomization procedures and presents the treatment effects from each separately – i.e. uses the randomization at different levels for identification – contributors may choose to create two metadata records, which can be linked in section VII. Otherwise we recommend reporting the higher-level unit of randomization (e.g. the market vs. the buyer or seller).

Example: The study “Targeting the Poor: Evidence from a Field Experiment in Indonesia” by Alatas et al. (2012) compared different methods for targeting aid to poor households. To create their sample, the authors chose three provinces in Indonesia, then randomly selected 640 villages from those provinces (stratified by geographic location, see below). The choice of the three provinces should be described as part of the inclusion/exclusion criteria. The description should also include that larger villages with more than 100 households per sub-village on average were excluded from sampling in one district. The randomization unit in this study was the village, while the unit of observation was the household. Both the targeted and actual sample sizes were 640 villages and sub-villages and 5,756 households.
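A sketch of how the study population fields might be completed for this example; the field names are illustrative shorthand, while the values are taken from the description above.

# Module II sketch for Alatas et al. (2012); field names are illustrative.
alatas_study_population = {
    "countries": ["IDN"],  # ISO code for Indonesia
    "inclusion_exclusion": (
        "Sample drawn from three provinces chosen by the authors; in one "
        "district, villages averaging more than 100 households per "
        "sub-village were excluded."
    ),
    "unit_of_randomization": "village",
    "targeted_randomization_units": 640,
    "actual_randomization_units": 640,
    # The 5,756 households are units of observation, recorded in module V.
}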
III Outcomes and Interventions

This section concerns the tested interventions and outcomes. The first set of fields describes the outcome measures collected. The fields in this section are repeated for each outcome variable. Contributors provide a short free text name, a category chosen from a controlled vocabulary, an optional free-text description, and a yes/no answer as to whether the outcome was measured at least once prior to the intervention (“at baseline”). Baseline outcome measures are relevant for treatment effect estimates but can also serve a range of secondary research purposes, such as summary statistics of the study population. We tested different CVs for the outcome categories but ultimately decided against making a recommendation before carrying out further testing with more (and more varied) RCTs. The free text description field can be used to provide additional information such as the unit of measurement, data format and type, or any transformations applied. This is especially useful when comparing data from different studies.

The next part of the module records the actual interventions and their assignment to study arms. A social science RCT may test the effects of a policy change, encouragement, information, or other process or action. In laboratory experiments, experimental treatments can include complex variation of the incentives, game forms, or information provided that govern the interactions of participants. An intervention is any experimental manipulation of the participants’ environment. An arm is a randomly selected subgroup of participants that receives none, one, or multiple interventions as part of the study. A canonical RCT consists of one treatment arm receiving the intervention and a control group arm that does not receive the intervention. In practice, RCT designs are often more complex, with multiple study arms and different interventions or intervention levels. The schema proceeds by listing all interventions and arms, and then matching none, one, or multiple interventions to each arm. This structure closely follows ClinicalTrials.gov; to our knowledge, no metadata schema used in the social sciences provides this level of detail.9

9 For example, the AEA RCT Registry and YARD only have free text fields to describe the intervention(s) and do not differentiate between interventions and arms.

Contributors are first asked to list the study interventions by assigning a short name, classifying the intervention type, and then providing an optional free text description. The CV for the intervention type is still in development, as the CVs tested did not provide the right balance between broad classifications and detailed options for the kinds of interventions that occur frequently in social science RCTs. Note that the intervention type is one of three key fields that either alone or in combination help users narrow down the content of the study; the other two are the topic area (under I. Basic Information) and the outcome measures (see above). All three may be different: for example, Oster and Thornton (2012) randomize the distribution of menstrual cups in Nepal; they measure take-up by direct recipients as well as individuals in their social networks as the outcome; and the aim of the research is to understand peer effects in technology adoption. Finalizing the CVs for these fields will require testing with a larger body of studies. The new CVs will be posted on the GitHub repository for this project.

Next, contributors can select the intervention assignment strategy from a CV and then optionally provide a more complete description. The CV options are adapted from ClinicalTrials.gov and include parallel, factorial, and cross-over assignment, as well as the option “other”.10 Contributors can also provide additional information, such as whether the random assignment was carried out using stratification.

10 Note that we dropped “single group assignment” and “sequential assignment” as these assignment strategies do not create a comparison group and therefore do not constitute an RCT by the common understanding of this term in the social sciences.
We encourage listing out all stratification variables. Stratification and other procedures aimed at improving balance, such as re-randomization, typically reduce variance but also need to be accounted for in the analysis.

Finally, the section records the study arms. Contributors can give an identifying name to each arm, list the targeted and actual number of randomization units in the arm, and then indicate which intervention(s) this group received (if any: a control group may receive no interventions). This allows users to back out, for example, how many subjects received the same intervention across treatment arms, and (for factorial designs) which intervention combinations are observed in the data.

Some experimental designs are common in the social sciences but less so in clinical trials. We make some recommendations on how to describe these designs using the options provided. One example is randomized phase-in, meaning that the intervention is randomly assigned to start at different times for different experimental arms, and the comparison between the arms (already) receiving the intervention and the arms not (yet) receiving the intervention in each period is used to estimate the treatment effect. We recommend using the “crossover design” option, defining one arm for each study group that starts the intervention at a different time, and explaining the timing of the phase-in and duration for which each arm receives the intervention in the free-form text field.

Example: In Barrera-Osorio et al. (2020), 101 private secondary schools in Uganda were randomly assigned to receive per-student vouchers from the government, 51 starting in the 2011 school year, and 50 starting in the 2012 school year. The authors use the difference in intervention start date to estimate the short-term impact of the public-private partnership program on student enrollment and performance. Other examples of studies using phase-in designs are given in Bouguen et al. (2020).

In more standard cross-over designs, each arm receives different (possibly all) interventions sequentially, and only their order is randomly assigned. Such designs are frequently used in laboratory experiments. A cross-over experiment with two treatment conditions A and B might then be defined as having two arms and two interventions, where both arms receive both A and B (but one arm receives intervention A first and the other receives intervention B first).

Example: In Lopez et al. (2022), the two clinic-based interventions consisted of discount vouchers for a free course of malaria treatment, given either to physicians to pass on to patients at their discretion (“doctor voucher”), or directly to patients before the consultation with the physician (“patient voucher”). The days on which each clinic received either no intervention, the doctor voucher intervention, or the patient voucher intervention were selected based on a randomized schedule. This design could be described as a cross-over design with 60 arms (each clinic) that each received both interventions. The rotational calendar that was used can be described in the free-text field.
This cross-over design is unusual in that every clinic was randomized into a different schedule.

Factorial (cross-randomization) designs combine two (or more) types of interventions with different levels or intensities, leading to arms that each receive different combinations of (levels of) interventions. The simplest factorial designs have two interventions, and the “levels” might simply consist of either receiving the intervention, or not receiving it. The four treatment arms then receive A and B, only A, only B, or neither. More complex factorial designs might involve multiple levels of each type of intervention.

Example: Cohen et al. (2015) cross-randomized several subsidy levels (from 0% to 92%) for malaria medication and malaria tests to understand how the availability of affordable testing affected demand for malaria treatment. This design may be recorded by defining one intervention for each subsidy level and each subsidized good that appears in the study. Factorial assignment encompasses “fractional” factorial designs, which may drop some of the cells that would be created by a full cross-combination of all intervention levels.

To some degree, the definition of arms and interventions is up to the metadata contributor. For example, some policies or programs consist of a bundle of different types of interventions, such as a health consultation combined with a discount on a health product. An RCT may only test some combinations of these components; e.g., a family planning consultation with or without a discount on birth control, but not a discount without a consultation. Such designs are technically closer to a parallel design than a factorial design, since the effect of the discount cannot be assessed independently of the effect of the consultation. That said, contributors may still choose to define the two interventions “discount” and “consultation” (as in a factorial design) rather than “consultation plus discount” and “consultation.” This can make sense especially if the two intervention types differ substantively and data users might be looking for interventions of only one type. Contributors should still select the intervention assignment strategy that best fits (in this case, parallel).

Recently, adaptive experimental designs have received increased attention in the social sciences. One type of experiment uses the information learned during early observations or waves in the experiment to alter the assignment shares of the different treatment arms (e.g., Kasy and Sautmann, 2021; Caria et al., 2021). Even though the arm size is ex ante unknown, we recommend using the “factorial” or “parallel” CV entries, using the arm sizes targeted in each stage of the adaptive design, and the free-text field to describe the adaptive assignment strategy. Other experiments study the optimal treatment of a given experimental unit over time, see e.g. Almirall et al. (2014). This may include randomized changes to the treatment over time. Here, the contributor might choose the option “other” and describe the assignment strategy in the free text field.

The last two fields in this section record the overall time period of all interventions. The timing of the individual intervention is not recorded separately in order to reduce the burden of completing the intervention fields, but if the timing of an intervention is important to the design, contributors can either use the free-text field or choose to define separate interventions that distinguish time of treatment receipt.
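To make the intervention/arm structure concrete, here is a sketch of how the randomized phase-in in Barrera-Osorio et al. (2020) might be encoded following the cross-over recommendation above; the field names are illustrative shorthand.

# Sketch of the interventions/arms structure for the phase-in design in
# Barrera-Osorio et al. (2020). Field names are illustrative shorthand.
phase_in_example = {
    "interventions": [
        {"name": "voucher",
         "type": None,  # intervention-type CV is still in development (see above)
         "description": "Per-student government vouchers for private secondary schools"},
    ],
    "assignment_strategy": "crossover",  # recommended option for randomized phase-in
    "assignment_description": (
        "Randomized phase-in: vouchers start in the 2011 school year for "
        "one arm and in the 2012 school year for the other."
    ),
    "arms": [
        {"name": "start 2011", "targeted_units": 51, "interventions": ["voucher"]},
        {"name": "start 2012", "targeted_units": 50, "interventions": ["voucher"]},
    ],
}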
Defining separate time-indexed interventions may be appropriate in phase-in or cross-over designs as above; if the intervention de facto changes over time (e.g. a remedial tutoring program with a changing curriculum); or if treatment effects are expected to differ significantly based on timing or length of the treatment.

IV Study Design

This module provides further information on the research design of the study. The first field asks whether the study builds on or extends a prior RCT. This could be the case if a new outcome is measured, or an intervention is added, but the randomization and sample of at least some of the original study are retained. This information can be important for assessing statistical power or identifying related data sources.

The next set of fields records how the randomization units in the study were sampled from the sampling frame described by the fields in section II. Contributors define the sampling type using the associated CV detailed in Appendix B and can then (optionally) describe the sampling method. When the sampling strategy is simple, the “type” field may be sufficient. For example, an RCT may include the entire population in a location – all students in a district’s schools, etc. – in which case the sampling method is “1. Total universe (population)”; no further explanation is required. When sampling was carried out in multiple stages that involve different forms of probability selection (Option 2.5: Probability - Multistage), or using a mixed strategy (Option 4: Mix of probability and non-probability sampling), providing a description of the process is very helpful to data users. Note that sampling type refers to the method for sampling randomization units. If different, information on the sampling of the observational units may be provided in section V. The Study sampling method: Description field also allows contributors to provide more detail on how the sample size was chosen (e.g., a sample determined by ex ante power calculations vs. an RCT at scale), and why targeted and actual numbers of randomization units may not be the same (see above).

Example: The Indonesia study by Alatas et al. (2012) sampled randomization units using a stratified multistage design. The authors randomly selected 640 villages from the included three provinces based on a 30/70 urban/rural split, and then randomly selected one sub-village (neighborhood) from each village. On their own, the two sampling stages could be described as “2.3.1 Probability - Stratified: Disproportional stratified” and “2.1 Probability - Simple random,” respectively. In the CV, the contributor should select “2.5 Probability - Multistage” and describe the two selection stages for the village and sub-village in the free text field. Additional helpful context for the Study sampling method: Description field could include that, even though the targeted and actual sample sizes are equal, five of the originally selected villages were replaced prior to the randomization for various reasons.

The other elements of the Study Design section provide additional information relevant for the variability and external validity of the treatment effect estimates. First, contributors can select the types of covariates available in the data from a CV; the schema asks for types of covariates rather than variable-level information so as not to overly burden contributors.
After testing different CVs, we propose a CV adapted from GESIS (Hoffmeyer-Zlotnik, 2016) to describe individual-level covariates, and a new vocabulary to describe covariates at the cluster level, such as household or other group-level characteristics. Information on covariates is needed for meta-analyses and can also be useful for methodological research. For instance, Tabord-Meehan (2018) uses data from an experiment on increasing charitable donations by Karlan and Wood (2017) to demonstrate a new method of adaptive stratification that uses information from a first experimental wave to select “stratification trees” in the second wave.

The schema next includes a new optional field that describes which forms of treatment effects the study was designed to analyze. This includes the “intent to treat” effect, average treatment effect, and local average treatment effect or average treatment effect on the treated. Note that the latter two imply that compliance with the treatment assignment is known (see below). A study that is designed to analyze “4. Heterogeneous treatment effects or effects by subgroup” is powered to detect treatment effects in each population subgroup. The researchers may conduct disproportional stratified sampling and oversample subgroups that constitute a small share of the population in order to estimate treatment effects in these subgroups. A study designed to measure “5. General equilibrium effects” might measure outcomes for groups other than the directly affected group and randomize the interventions at the market level, rather than the individual level. An example might be to measure the effects of providing the unemployed with job search assistance on salaries and firms. A study that captures “6. Spillovers or externalities” measures effects of treating one unit on other units in the vicinity. This is often done by varying the share of treated units within a cluster and requires collecting outcome data on untreated units.

Example: Crépon et al. (2013) randomly varied the share of unemployed job seekers in a city receiving a job placement assistance program in order to study displacement effects on those not receiving the program.

These design features are specific to social science RCTs, and to our knowledge there exists to date no CV for them. We consider this CV under development.

The last field of this section describes compliance with the randomized intervention assignment. In some situations, treatment assignment is not identical with treatment receipt or take-up. Those assigned to the intervention may not actually receive it, and conversely those not assigned to it may nonetheless gain access.

Example: Imperfect compliance is particularly common in so-called encouragement designs. In an experiment with around 1,500 small firms in Tajikistan, Okunogbe and Pouliquen (2022) vary whether firms are trained and provided assistance for filing their taxes electronically in order to estimate the effect of e-filing on tax payments and other outcomes. About 60% of the control, but 93% of the treatment group adopt e-filing. This is an example of “two-sided non-compliance”:11 some firms in the treatment group do not use e-filing while many in the control group do.

11 See Angrist and Pischke (2009) for the terminology of one-sided and two-sided compliance.

The degree of treatment compliance is important for external validity and potential selection effects and has also been used in methodological research; for example, Bernard et al. (2022) use imperfect-compliance RCTs to estimate the bias of observational methods in practice.
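A sketch of how the free-form compliance description (discussed next) might read for the e-filing study above; the variable name is illustrative, and the free-text content paraphrases the study description.

# Sketch of a free-form compliance description for Okunogbe and Pouliquen
# (2022); the variable name is illustrative shorthand.
compliance_description = (
    "Two-sided non-compliance: firms assigned to the e-filing training could "
    "decline to adopt e-filing, and control firms could adopt it on their own. "
    "About 93% of the treatment group and 60% of the control group adopted "
    "e-filing. (How take-up was measured would also be described here.)"
)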
Partial compliance is more common in the social sciences than in laboratory or medical trials, which are typically closely controlled. At the same time, it is often difficult to unambiguously define and measure. We ask contributors to use the free-form compliance description field to explain what forms of non-compliance are in principle possible, how compliance was measured, and what the actual rates of non-compliance are.

V Data

The data section of the metadata schema describes the actual data available to users, arranged in one or more datasets. For data with restricted access, access modalities can be described in the External Resources section. Data that are not accessible to anyone but the original researchers should not be described.

More than one dataset may be associated with a given RCT. For example, the contributors may have collected census or administrative data for a larger sample than the ultimate study population, or specific information on two different populations affected by the RCT (e.g. buyers and sellers of a good). Datasets are distinct from data files; a dataset may be broken up into several files and even stored in multiple locations, for instance restricted-access GPS data vs. publicly accessible de-identified survey responses.12

12 Conversely, a data file may contain several datasets (e.g. an Excel file with several sheets).

In general, a set of records may constitute its own dataset if it contains information central to the study, such as a separate outcome measure or the sampling frame, and (i) consists of observational units from a distinct study population (or sample from the study population) or (ii) is based on an independent data source (e.g. with a specific mode of data collection). What delineates a dataset is ultimately up to the metadata contributor, but we recommend keeping the number of datasets to the minimum needed to describe the data well for users; often the data associated with a given RCT can be characterized as a single dataset. For example, the same dataset may contain several rounds of data collection, and the metadata schema allows multiple “cycles” (including repeated cross-sections). Even data from different sources can often be treated as part of the same dataset.

In some cases it can be useful to define a separate dataset to describe the data in sufficient detail. For example, two measures of the same outcome, such as measures of crop productivity obtained through in-person audits and satellite imagery, may have large discrepancies in coverage and diverging numbers of observations. In cluster-randomized studies, data may be available both at the cluster and individual levels, and these datasets should be described separately if each contains primary outcome measures. Similarly, a full census in the study area followed by sampling a subset of the population for the interventions and endline data collection warrants defining separate datasets for each data collection round.

For each dataset, contributors are asked to define a short description or name, then provide information on the types and number of observational units, first in total and then, further below, per arm. This information is collected at the dataset level in order to allow for multiple units of observation, for example outcomes measured at the teacher and the student levels. The per-arm fields are optional to allow description of data collected prior to treatment assignment (e.g. a study population census). A unit counts as a targeted observation unit if it was selected or intended for data collection. This may be an estimated number.
A unit counts as a targeted observation unit if it was selected or intended for data collection. This may be an estimated number. A 12 Conversely, a data file may contain several datasets (e.g. an excel file with several sheets). 17 unit counts as an actual observation unit if at least one outcome was measured for it. With clustered randomization designs, the targeted and actual numbers of randomization and observation units may all be different. Targeted and actual number of observations may differ in particular if there is survey attrition, which gives an indication of data quality and can help answer methodological questions (e.g. when researching study designs intended to limit attrition). Contributors are also asked to define the time method (i.e. whether the data contains one or more cross- sectional samples or panel data) and the number of cycles (i.e. waves or rounds of data collection), list the modes of data collection in detail, describe the sampling method, and specify whether there are sampling weights. Note that researchers may technically carry out separate power calculations and devise sampling procedures for different units of observation within the same study. In order to collect this information in one place, any information related to power calculations (including the determination of the targeted number of units of observation) should be included in the field IV.3 “Study sampling method: Description”. The method of sampling for the randomization unit is included in IV.2-3, but the method of sampling the unit of observation within the unit of randomization and any discrepancy between targeted and actual units of observation is recorded here in field V.1.I. A free text field allows contributors to provide data collection notes, which may describe for example how the observational units in the dataset were sampled, when during the experiment data was collected (e.g. at baseline, midline, or endline), and what quality controls were in place for the data collection. Finally, contributors are asked to provide information on the timing of data collection by specifying the time period covered in each data cycle, and, if different (e.g. in retrospective surveys) the dates of data collection. VI Ethics and Research Transparency This section allows contributors to provide information that may be relevant to the legitimacy, external validity, credibility, and robustness of the study and its data. This includes information on ethics review conducted, specifically the reviewing institution(s) and protocol number(s), as well as information on funding or supporting bodies and implementation partners. Here, survey firms, collaborating government agencies, and other parties can be named. Contributors can also indicate whether any documentation is available to users on the ethics of the study, such as consent forms or a structured ethics appendix, as proposed by Asiedu et al. (2021), and to what degree records related to registration or pre-specification exist. The CV includes options such as “pre-results acceptance” for studies accepted into a journal based on the pre-specified research design, and “populated pre-analysis plan” for cases where the researchers produced a document containing the analysis exactly as pre-specified, typically separate from the research paper (see Banerjee et al. (2020)). The CVs for both fields 18 were newly developed. 
VII External Resources

This section provides information on external resources available for the study, including the location of the described data. For each resource, contributors can provide a type and free-form description, citation, link (DOI/URL), and information on the access policy for this resource. If available, contributors should not only point to the data itself, but also to resources describing the data and study, such as academic publications, reports, codebooks, ethics protocols, etc.

The controlled vocabulary for the resource type is adapted from the World Bank Microdata Catalog but allows as additional types "database or data repository entry" (to describe the locations of the study's datasets), "trial registration", "pre-analysis plan", "populated pre-analysis plan", and "research ethics documentation". The last four entries complement the information in section VI. Data from a single RCT may be stored in different locations (e.g. replication data in a repository like Harvard Dataverse, access-restricted identifying data with the researcher, administrative data with a separate data provider); each resource should be linked. Conversely, different external resources, such as the data itself along with replication code, are sometimes stored in the same location or with the same citation information. The CV allows contributors to enter the information once, tick all options that apply under type, and provide additional detail in the description.

5 Creating a Metadata Catalog Based on the Metadata Schema

We close with a few considerations for implementing a catalog based on the proposed schema, with a focus on maximizing the quality, consistency, and usability of the metadata and catalog.

Catalog: An RCT metadata catalog consists principally of a back-end data entry interface; front-end search, filtering, and display functions; and an underlying database of metadata fields. Catalog owners may consider modifying the schema. For users interested in adopting only a subset of the metadata fields, we made an effort to minimize dependencies across modules (with the exception of core properties of the RCT such as the study arms, which appear in several modules). Catalog owners may also be interested in expanding the catalog beyond recording the properties of the data, for example by adding fields for analysis results, such as standardized treatment effect sizes, or contextual information, such as intervention costs.13 To fully support interoperability and bulk uses of the metadata, we recommend an API to access and download the catalog contents. This functionality also allows the catalog to be read by data search engines like Google Dataset Search, increasing reach and accessibility.

13 These are not part of our schema because they often require a separate effort: verification (e.g. by replicating the analysis), modification (e.g. by standardizing effect sizes or harmonizing analysis approaches), and possibly additional research/estimation of data not in the public domain (such as the fixed and variable intervention costs).
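A minimal sketch of such bulk access follows, assuming a hypothetical catalog with a REST-style API; the base URL, route, query parameters, and response shape are all illustrative, since the schema does not prescribe an API surface.

```python
import requests

# Hypothetical endpoint: the schema does not define an API, so the URL, route,
# parameters, and response envelope used here are purely illustrative.
BASE_URL = "https://rct-catalog.example.org/api/v1"

def fetch_studies(country: str, page: int = 1) -> list[dict]:
    """Download catalog entries filtered by country of intervention."""
    resp = requests.get(
        f"{BASE_URL}/studies",
        params={"country": country, "page": page},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]  # assumed response envelope

# Bulk use: harvest entries for a meta-analysis, or mirror them into another catalog.
studies = fetch_studies(country="IDN")
```

Exposing the same fields and encoding schemes through such an endpoint is what would allow several interoperable catalogs to exchange or mirror entries.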
Since the CVs for this schema are under development, we recommend that early adopters use free text fields after "other" options to collect information for potential expansions or modifications of the CVs.14 Finally, user accounts allow contributors to track and edit catalog entries after they have been posted, data owners to claim entries, and front-end users to save searches or individual records, while a data review system helps ensure data quality and avoid version control issues, duplication of entries, and so-called "ghost entries". For this purpose, incomplete entries could be regularly flagged for purging.

14 The authors encourage submitting suggestions for such changes through the GitHub repository for this project.

Back-end: The data entry interface for contributors should minimize the time and effort (and hence potential for errors) required to complete an entry, to help maintain data quality. This includes making use of existing information where available. Especially for bulk entry of existing RCT data, a sophisticated implementation could draw on existing public records where available, such as the APIs of ClinicalTrials.gov and the World Bank Microdata Catalog,15 or the downloadable AEA RCT Registry data. New catalog entries could also be partially pre-filled by scraping web data from external resources such as registry entries and academic articles. This could best be achieved by beginning the data entry with the links to (some) external resources (which would also permit a duplication check within the catalog). Within a metadata entry, information filled in early on can provide inputs to later fields, such as pre-filling the treatment arms recorded in the "outcomes and interventions" section (III) in the "data" section (V). The catalog can also prompt contributors for information based on earlier inputs, such as suggesting entries for ethics documentation (section VI) or prior studies (section IV) in the "external resources" section (VII). Automated cross-checks and data validation can additionally be applied to numeric and multiple-choice entries. We make specific suggestions in the programming notes in Appendix A. These features minimize errors and duplication of effort and improve consistency. Incomplete entries should be saved automatically at regular intervals to prevent data loss.

15 This is facilitated by the crosswalk we created to link individual fields in our schema to those in other schemata; see this project's GitHub repository.

An intuitive interface and extensive user support can also help with data entry. For example, CVs should be implemented as multiple-choice radio buttons or "select all that apply" tick boxes; at the end of an entry in a loop, the navigation should allow users to choose between adding another entry or moving to the next field. In addition to the field and CV definitions in the Appendix, the interface could provide help "bubbles" containing longform instructions and detailed examples that could be drawn from this article. This could be especially useful for describing the intervention assignment and study design in sections III and IV.

Front-end: Careful interface design for front-end users can facilitate a quick overview of multiple datasets as well as individual RCTs. An individual study view could pull selected information from various modules and arrange it for better reading. An example is the cross-reference between arms and interventions (i.e., which arms received which interventions), which would be most intuitively displayed as a table.
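A sketch of how a front end might assemble that table from the arm-to-intervention cross-references follows; the arm and intervention names below are invented, and the record shapes are illustrative stand-ins for catalog entries.

```python
# Illustrative records: which interventions each arm receives (cf. field III.6.D).
arms = {
    "Control": [],
    "Subsidy": ["Price subsidy"],
    "Subsidy + SMS": ["Price subsidy", "SMS reminder"],
}
interventions = ["Price subsidy", "SMS reminder"]

# Render a simple text table with one row per arm and one column per intervention.
width = max(len(i) for i in interventions) + 2
header = "Arm".ljust(16) + "".join(i.ljust(width) for i in interventions)
rows = [
    arm.ljust(16) + "".join(("X" if i in given else "-").ljust(width)
                            for i in interventions)
    for arm, given in arms.items()
]
print("\n".join([header] + rows))
```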
Ideally, users can also save searches or selected studies (see above) and visualize various metadata fields for this set (e.g. show the share of entries with a specific value for CV fields). Front-end users should also have access to the same longform/help information as back-end users to facilitate interpretation of the data.

The most important usability features for a catalog are the search and filter options available. Boolean operators AND, OR, and NOT improve filtering within CVs or across fields; for example, they could allow users to find all studies outside of a specific country, or studies that cover both early childhood and primary education. Mathematical operators (>, ≤, etc.) on dates and numerical entries can for example help find studies within certain time periods, or of a certain sample size. A WYSIWYG mask for constructing a search within and across fields helps first-time users, while a free-text entry field supporting advanced search functions makes the exact criteria applied replicable for other users, e.g. for meta-analysis purposes.

Just as with the metadata schema itself, standardization and free access are paramount for fostering the reuse of RCT data and promoting equitable access. We therefore encourage catalog implementers to make user access free and to use open-source programming to allow other catalog owners to adopt useful features.

References

AEA (2022). JEL Classification System/EconLit Subject Descriptors. URL: https://www.aeaweb.org/econlit/jelCodes.php?view=jel (07/22/2022).

AEA RCT Registry (2022). The American Economic Association's registry for randomized controlled trials. URL: https://www.socialscienceregistry.org (12/09/2022).

AidGrade (2019). AidGrade. URL: http://www.aidgrade.org/ (08/25/2019).

Alatas, V., A. Banerjee, R. Hanna, B. A. Olken, and J. Tobias (2012). Targeting the poor: Evidence from a field experiment in Indonesia. American Economic Review 102 (4), 1206–40.

Allcott, H. (2015). Site selection bias in program evaluation. Quarterly Journal of Economics 130, 1117–1165.

Almirall, D., I. Nahum-Shani, N. E. Sherwood, and S. A. Murphy (2014). Introduction to SMART designs for the development of adaptive interventions: with application to weight loss research. Translational Behavioral Medicine 4 (3), 260–274.

Anderson, M. L. and J. Magruder (2017). Split-sample strategies for avoiding false discoveries. Technical Report 23544.

Andrews, I. and M. Kasy (2019). Identification of and correction for publication bias. American Economic Review 109 (8), 2766–2794.

Andrews, I. and E. Oster (2019). A simple approximation for evaluating external validity bias. Economics Letters 178, 58–62.

Angrist, J. D. and J.-S. Pischke (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.

Asiedu, E., D. Karlan, M. Lambon-Quayefio, and C. Udry (2021). A call for structured ethics appendices in social science papers. Proceedings of the National Academy of Sciences 118 (29), e2024570118.

Bandiera, O., G. Fischer, A. Prat, and E. Ytsma (2021). Do women respond less to performance pay? Building evidence from multiple experiments. American Economic Review: Insights 3 (4), 435–54.

Banerjee, A., E. Duflo, A. Finkelstein, L. F. Katz, B. A. Olken, and A. Sautmann (2020). In praise of moderation: Suggestions for the scope and use of pre-analysis plans for RCTs in economics. Technical report, National Bureau of Economic Research.

Barrera-Osorio, F., P. de Galbert, J. Habyarimana, and S. Sabarwal (2020).
The impact of public-private partnerships on private school performance: Evidence from a randomized controlled trial in Uganda. Economic Development and Cultural Change 68 (2).

Beegle, K., C. Carletto, and K. Himelein (2012). Reliability of recall in agricultural data. Journal of Development Economics 98 (1), 34–41. Symposium on Measurement and Survey Design.

Bernard, D., G. Bryan, S. Chabé-Ferret, J. de Quidt, J. Fliegner, and R. Rathelot (2022). How biased are observational methods in practice? Accumulating evidence using randomised controlled trials with imperfect compliance. Ongoing work.

Bouguen, A., Y. Huang, M. Kremer, and E. Miguel (2020). Using randomized controlled trials to estimate long-run impacts in development economics. Annual Review of Economics 68 (2).

Caria, S., G. Gordon, M. Kasy, S. Quinn, S. Shami, and A. Teytelboym (2021). An adaptive targeted field experiment: Job search assistance for refugees in Jordan. Working paper.

Chaplin, D. D., T. D. Cook, J. Zurovac, J. S. Coopersmith, M. M. Finucane, L. N. Vollmer, and R. E. Morris (2018). The internal and external validity of the regression discontinuity design: A meta-analysis of 15 within-study-comparisons. Journal of Policy Analysis and Management 2 (37), 403–429.

Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal 21 (1), C1–C68.

Chernozhukov, V., M. Demirer, E. Duflo, and I. Fernandez-Val (2018). Generic machine learning inference on heterogenous treatment effects in randomized experiments. arXiv e-prints, arXiv:1712.04802v3.

Christensen, G. and E. Miguel (2018). Transparency, reproducibility, and the credibility of economics research. Journal of Economic Literature 56 (3), 920–80.

ClinicalTrials.gov (2022). ClinicalTrials.gov is a database of privately and publicly funded clinical studies conducted around the world. URL: https://clinicaltrials.gov (12/09/2022).

Cohen, J., P. Dupas, and S. Schaner (2015). Price subsidies, diagnostic tests, and targeting of malaria treatment: Evidence from a randomized controlled trial. American Economic Review 105 (2), 609–45.

Crépon, B., E. Duflo, M. Gurgand, R. Rathelot, and P. Zamora (2013). Do labor market policies have displacement effects? Evidence from a clustered randomized experiment. The Quarterly Journal of Economics 128 (2), 531–580.

DDI (2021). Document, Discover and Interoperate. URL: https://ddialliance.org/ (08/27/2019).

Dehejia, R. H. and S. Wahba (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association 94 (448), 1053–1062.

Dehejia, R. H. and S. Wahba (2002). Propensity score-matching methods for nonexperimental causal studies. The Review of Economics and Statistics 84 (1), 151–161.

Dimakopoulou, M., Z. Zhou, S. Athey, and G. Imbens (2018). Estimation considerations in contextual bandits. arXiv e-prints, arXiv:1711.07077v4.

Donald, A., G. Koolwal, J. Annan, K. Falb, and M. Goldstein (2020). Measuring women's agency. Feminist Economics 26 (3), 200–226.

Dupriez, O., D. M. Sanchez Castro, and M. Welch (2021). Quick reference guide for data archivists. URL: https://guide-for-data-archivists.readthedocs.io (02/22/2021).

Fraker, T. and R. Maynard (1987). The adequacy of comparison group designs for evaluations of employment-related programs. The Journal of Human Resources 22 (2), 194–227.

Gechter, M. and R.
Meager (2022). Combining experimental and observational studies in meta-analysis: A mutual debiasing approach. Working paper.

Gechter, M., C. Samii, R. Dehejia, and C. Pop-Eleches (2019). Evaluating ex ante counterfactual predictions using ex post causal inference. arXiv e-prints, arXiv:1806.07016v2.

Glazerman, S., D. M. Levy, and D. Myers (2003). Nonexperimental versus experimental estimates of earnings impacts. The Annals of the American Academy of Political and Social Science 589, 63–93.

Guiteras, R., J. Levinsohn, and A. M. Mobarak (2019). Demand estimation with strategic complementarities: Sanitation in Bangladesh. CEPR Discussion Paper No. DP13498.

Harvard Dataverse (2021). Dataverse documentation v. 5.3. URL: https://guides.dataverse.org/en/5.3/ (02/22/2021).

Höffler, J. (2017). Replication and economics journal policies. American Economic Review 107 (5), 52–55.

Hoffmeyer-Zlotnik, J. H. P. (2016). Standardisation and harmonisation of socio-demographic variables.

Hotz, V. J., G. W. Imbens, and J. H. Mortimer (2005). Predicting the efficacy of future training programs using past experiences at other locations. Journal of Econometrics 125, 241–270.

IHSN (2022). DDI Metadata Editor (Nesstar Publisher 4.0.10). URL: https://ihsn.org/software/ddi-metadata-editor (07/22/2022).

ISO (2021). ISO/TC46/SC11N800R1 Building a metadata schema – where to start. URL: https://committee.iso.org/files/live/sites/tc46sc11/files/documents/N800R1%20Where%20to%20start-advice%20on%20creating%20a%20metadata%20schema.pdf (02/15/2021).

ISRCTN (2022). ISRCTN Registry. URL: https://www.isrctn.com/ (09/12/2022).

Jayachandran, S., M. Biradavolu, and J. Cooper (2021). Using machine learning and qualitative interviews to design a five-question women's agency index.

Karlan, D. and D. H. Wood (2017). The effect of effectiveness: Donor response to aid effectiveness in a direct mail fundraising experiment. Journal of Behavioral and Experimental Economics 66, 1–8.

Kasy, M. and A. Sautmann (2021). Adaptive treatment assignment in experiments for policy choice. Econometrica 89 (1), 113–132.

King, G. (2007). An introduction to the Dataverse network as an infrastructure for data sharing. Sociological Methods & Research 36 (2), 173–199.

LaLonde, R. J. (1986). Evaluating the econometric evaluation of training programs with experimental data. American Economic Review 76, 604–620.

Lopez, C., A. Sautmann, and S. Schaner (2022). Does patient demand contribute to the overuse of prescription drugs? American Economic Journal: Applied Economics 14 (1), 225–60.

McCray, A. T. and N. C. Ide (2000). Design and implementation of a national clinical trials registry. Journal of the American Medical Informatics Association 7 (3), 313–323.

Meager, R. (2019). Understanding the average impact of microcredit expansions: A Bayesian hierarchical analysis of seven randomized experiments. American Economic Journal: Applied Economics 11, 57–91.

Meghir, C., A. M. Mobarak, C. D. Mommaerts, and M. Morten (2019, July). Migration and informal insurance: Evidence from a randomized controlled trial and a structural model. Working Paper 26082, National Bureau of Economic Research.

Okunogbe, O. and V. Pouliquen (2022). Technology, taxation, and corruption: Evidence from the introduction of electronic tax filing. American Economic Journal: Economic Policy 14 (1), 341–72.

Oster, E. and R. Thornton (2012). Determinants of technology adoption: Peer effects in menstrual cup take-up. Journal of the European Economic Association 10 (6), 1263–1293.

Parker, K., M. Nunns, Z.
Xiao, T. Ford, and O. C. Ukoumunne (2021). Characteristics and practices of school-based cluster randomised controlled trials for improving health outcomes in pupils in the United Kingdom: A methodological systematic review. BMC Medical Research Methodology 21 (1), 152.

Rosenzweig, M. R. and C. Udry (2019). External validity in a stochastic world: Evidence from low-income countries. The Review of Economic Studies.

Tabord-Meehan, M. (2018). Stratification trees for adaptive randomization in randomized controlled trials. arXiv e-prints, arXiv:1806.05127.

The World Bank (2022). Microdata library. URL: https://microdata.worldbank.org (12/09/2022).

Todd, P. E. and K. I. Wolpin (2006). Assessing the impact of a school subsidy program in Mexico: Using a social experiment to validate a dynamic behavioral model of child schooling and fertility. American Economic Review 96 (5), 1384–1417.

Todd, P. E. and K. I. Wolpin (2010). Structural estimation and policy evaluation in developing countries. Annual Review of Economics 2 (1), 21–50.

Vivalt, E. (2015). Heterogeneous treatment effects in impact evaluation. American Economic Review 105 (5), 467–70.

Vivalt, E. (2019). Specification searching and significance inflation across time, methods and disciplines. Oxford Bulletin of Economics and Statistics 81 (4), 797–816.

Vivli (2022). A global clinical research data sharing platform. URL: https://vivli.org/ (09/12/2022).

Appendices

A The Metadata Schema

Below is the proposed schema in full. Each entry names the field, provides a short description, and gives (in square brackets) the encoding scheme and the "cardinality" of the field. The encoding scheme describes whether the field permits only controlled entries, free text, numeric values, etc. The cardinality specifies the minimum and maximum number of entries: a minimum of 0 means the field is optional, and a maximum of n means the field can be repeated multiple times. For example, a field with cardinality (0..1) is optional and unique (such as the abstract of the study), whereas a field with cardinality (1..n) must contain at least one entry and can contain multiple (such as the list of authors). Fields may be designated as optional if we consider them useful for potential secondary uses of the data, but they may not be applicable in all cases or not known to the contributor. Where given, the programming notes that follow an entry are not part of the schema but suggest implementation details for an RCT data catalog at the back and front end, such as data entry support and display options.

As described in the text, many metadata fields below are adapted or taken from existing schemata, and where possible we kept field definitions similar to their sources to facilitate the mapping of fields. Sections I (Basic Information), II (Study Population), V (Data) and VII (External Resources) are in large parts similar or identical to World Bank IHSN fields or the underlying DDI fields. Section III (Outcomes and Interventions) borrows extensively from ClinicalTrials.gov. Some AEA RCT Registry fields are referenced in section VI (Ethics and Research Transparency). All sections except I contain at least some new fields, and section IV (Study Design) is new in many parts. The GitHub repository contains a full crosswalk between schemata.

Table Legend
Bolded text denotes a set of repeatable questions (a "loop").
Encoding scheme: CV denotes a controlled vocabulary.
Cardinality: 0..1 is optional and non-repeatable; 0..n is optional and repeatable; 1..1 is mandatory and non-repeatable; 1..n is mandatory and repeatable.
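To illustrate how a catalog back end might operationalize these conventions, the sketch below enforces the cardinality codes from the legend and implements one of the numeric cross-checks suggested in the programming notes (per-arm sample sizes summing to the study total). Function and variable names are illustrative, not part of the schema.

```python
# Cardinality codes from the table legend, mapped to (minimum, maximum) entries;
# None means the field is repeatable without an upper bound.
CARDINALITY = {
    "0..1": (0, 1),    # optional and non-repeatable
    "0..n": (0, None), # optional and repeatable
    "1..1": (1, 1),    # mandatory and non-repeatable
    "1..n": (1, None), # mandatory and repeatable
}

def check_cardinality(code: str, values: list) -> bool:
    """Return True if the number of entries satisfies the field's cardinality."""
    lo, hi = CARDINALITY[code]
    return lo <= len(values) and (hi is None or len(values) <= hi)

def check_arm_sum(study_total: int, per_arm_counts: list[int]) -> bool:
    """Cross-check from the programming notes: per-arm actual sample sizes
    should sum to the actual study sample size."""
    return sum(per_arm_counts) == study_total

assert check_cardinality("1..n", ["author 1", "author 2"])  # e.g. list of authors
assert not check_cardinality("1..1", [])                    # missing mandatory field
assert check_arm_sum(980, [500, 480])
```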
The Metadata Schema

I. Basic Information

I.1. Title. The name of the study. [Free text; 1..1]
I.2. Authors/owners: The person(s), corporate body, or agency responsible for the substantive and intellectual content of the data. This list may differ from the authors named on an associated paper or grant.
I.2.A. Authors/owners: Name. Use "surname, first name" format. [Free text; 1..n] Loop: assign unique ID to reference each author.
I.2.B. Authors/owners: Affiliation. Author's affiliated institution at the time of data creation. Can be the same as above if the owner is an agency. [Free text; 1..n]
I.3. Abstract. A summary describing the purpose, nature, and scope of the RCT and data collection, special characteristics of its contents, and major subject areas covered. [Free text; 0..1]
I.4. Topic classification. The broad substantive topic(s) covered by the data. [CV; 1..n] Back end: format as "select all that apply."
I.5. Version. Version number of the study entry at the appropriate level. [Numeric; N/A] Automatically generated.
I.6. Version date. Version date of the study entry at the appropriate level. [Format: YYYY-MM-DD; N/A] Automatically generated.

II. Study Population

II.1. Country of intervention. The country or countries in which the intervention was implemented, even if the study did not cover the entire country. [CV: ISO country codes; 1..n] Back end: format as "select all that apply."
II.2. Geographical coverage. The geographic level at which the data is representative, within the country of intervention and conditional on the inclusion/exclusion criteria. Provides the total geographic scope of the data and, if needed, additional geographic selection criteria. Entries may be region or state names, along with qualifiers such as "urban areas only", etc. Note that a study can for example have national coverage even when some districts are not included, as long as all districts were eligible for sampling as part of the sampling strategy. [Free text; 1..1] Reference IV.2-3 "Study sampling method".
II.3. Inclusion/exclusion criteria. The criteria to determine eligibility for inclusion in the study and randomized assignment. In general, it should be possible to tell from the country, geographical coverage, unit of randomization, and inclusion/exclusion criteria whether a given individual or unit (hypothetical or real) is a member of the population that is the object of the research and from which the sample was drawn. [Free text; 1..1] Reference IV.2-3 "Study sampling method".
II.4. Unit of randomization. The level of treatment assignment: individuals, locations, facilities, groups, etc. Also referred to as the level of clustering. The level of treatment assignment/unit of randomization may be the same as the unit of observation. [CV A; 1..1]
II.5. Unit of randomization: Targeted study sample size. The targeted number of randomization units pooled over all study arms and periods or phases of random assignment (waves). Include if an approximate target was used. [Numeric; 0..1]
II.6. Unit of randomization: Actual study sample size. The actual number of randomization units pooled over all study arms and periods or phases of random assignment (waves). Count only randomization units for which an outcome of one observational unit was measured at least once across all post-intervention data collection cycles. [Numeric; 1..1]

III. Outcomes and Interventions
III.1. Outcomes: Measurements used to determine the effect of an intervention/treatment/program on experimental subjects or units. Please repeat the information for each main outcome measure.
III.1.A. Outcome: Name. A brief descriptive name used to refer to the outcome measure. [Free text; 1..n] Loop: assign unique ID to reference each outcome.
III.1.B. Outcome: Category. The broad category of the specific outcome measure. [CV; 1..n] CV under development; collect responses for updating the CV.
III.1.C. Outcome: Description. Additional information about the outcome measure, such as importance to the analysis (e.g., primary vs. secondary outcome), unit of measurement (e.g. meters), format/data type (e.g. categorical), distribution class (for numeric outcomes, e.g. count, binary, real numbers), range of possible values (e.g. 0-100), as well as a description of how the outcome was constructed (if relevant). [Free text; 0..n]
III.1.D. Outcome: Collected pre-treatment? Is a measurement of this outcome available before any treatment or notification of treatment took place ("at baseline")? [CV: Yes/No; 1..n]
III.2. Interventions: An intervention is defined as a process or action that is the focus of an RCT or experiment. The intervention may be a policy change (such as the right to buy an amount of subsidized food), an experimental condition (such as a high or low cost of contributing to a public good in a lab experiment), an encouragement, nudge, or information treatment (such as text messages or TV ads), etc. Different variants of a process or action are a distinct intervention if they are separately randomly assigned. Receiving no treatment is not an intervention. Please repeat the information for each intervention tested in the study.
III.2.A. Intervention: Name. A brief descriptive name used to refer to the intervention. [Free text; 1..n] Loop: assign unique ID to reference each intervention.
III.2.B. Intervention: Type. The category or type of intervention. [CV; 1..n] CV under development; collect "other" responses for updating the CV.
III.2.C. Intervention: Description. Free text description of the details of the intervention. [Free text; 0..n]
III.3. Intervention assignment strategy. The strategy used for assigning interventions to study arms. [CV B; 1..1]
III.4. Assignment strategy description. A description of the intervention assignment strategy. If relevant, provide details such as the timing of the different interventions in a given study arm in more complex designs such as phase-in and crossover. If the treatment assignment was carried out using stratified randomization, please explain here how the strata were formed, and if possible, name the stratification variables. [Free text; 0..1]
III.5. Number of arms. The number of subgroups of participants in the randomized trial that receive none, one, or several specific interventions (i.e., arms) according to the trial's protocol. For a trial with multiple periods or phases of random assignment (waves) that have different numbers of arms, the maximum number of arms from all periods or phases. [Numeric; 1..1] Back end: restrict to integers > 1.
III.6. Arms: Subgroups of participants that receive none, one, or several specific interventions according to the trial's protocol. Please repeat the information for each study arm.
III.6.A. Arm: Name. A brief descriptive name used to refer to the study arm. [Free text; 1..n] Loop: assign unique ID to reference each arm.
Use III.5 to generate the required number of arms; pre-populate with generic names, e.g. Arm 1, Arm 2.
III.6.B. Arm: Targeted sample size. The targeted number of randomization units assigned to this study arm across all periods or phases of random assignment (waves). [Numeric; 0..n] Back end: restrict to integers > 0; cross-check sum with II.5.
III.6.C. Arm: Actual sample size. The actual number of randomization units assigned to this study arm across all periods or phases of random assignment (waves). Count only randomization units for which at least one outcome of one observation unit was measured post intervention. [Numeric; 1..n] Back end: restrict to integers ≥ 0; cross-check sum with II.6.
III.6.D. Arm: Interventional cross-reference. Indicate which interventions are provided in this arm of the study. [Free text; 1..n] Implement as checkboxes using III.2.A via the unique intervention IDs generated.
III.7. Intervention start date. The first date when the administration of any of the interventions (after random assignment) began. Please enter the earliest start date of all interventions. If any element of the date is unspecified, use "X" as input. [Format: YYYY-MM-DD; 1..1] Back end: give examples, e.g. 2016-05-XX or 202X-XX-XX.
III.8. Intervention end date. The last date when the administration of any of the interventions ended. Please enter the last end date of all interventions. If any element of the date is unspecified, use "X" as input. [Format: YYYY-MM-DD; 1..1]

IV. Study Design

IV.1. Prior work. Does this study extend or rely on any prior study? Examples are collecting additional outcomes for interventions randomly assigned in a previous study, expanding the sample, or adding a treatment arm. [CV: Yes/No/Unknown; 1..1] If "yes" is selected, reference VII. External Resources for information on the prior study.
IV.2. Study sampling method: Type. The type of sampling method used to select the randomization units to be included in the experiment. If sampling is performed in several stages, please select "Probability – multistage," or "Mix of probability and non-probability sampling" and provide additional details in the description field. [CV C; 1..1]
IV.3. Study sampling method: Description. An overall description of the procedure for sampling the randomization units included in the study; if the sampling was performed in several stages, consider listing them out with an explanation. Include a description of the method used to obtain the targeted number of randomization and observation units (e.g., power calculations), along with any information related to the sampling that is relevant to users comparing targeted and actual units of randomization. [Free text; 0..1] Back end: reference unit of observation information in V.1.I.
IV.4. Covariates: Individual. Please select all individual-level covariate categories included in this study. [CV D; 0..n] Back end: format as "select all that apply"; if no option is selected, ask user to confirm.
IV.5. Covariates: Group. Please select all cluster- or group-level covariate categories included in this study. [CV E; 0..n] Back end: format as "select all that apply"; if no option is selected, ask user to confirm.
IV.6. Study was designed to analyze. Please select all types of treatment effects the study was designed to measure or analyze (i.e., the randomization was designed accordingly and the data includes the necessary information, such as intervention take-up). [CV F; 0..n] Back end: format as "select all that apply"; if no option is selected, ask user to confirm; collect "other" responses for updating the CV.
IV.7. Compliance. Please describe what forms of noncompliance with any of the interventions are possible or observed, and, if available, how treatment compliance is measured in the data and what the take-up rates are. Noncompliance occurs when not all units take up or receive the assigned intervention, or when at least some units receive an intervention they were not assigned. [Free text; 0..1] Reference the implications of the selected options "LATE or TOT" and "ATE" for compliance in IV.6.

V. Data

V.1. Datasets: Information about the datasets included in this study and the methodology employed in data collection. Datasets are distinct from data files. A set of records may constitute a separate dataset if it contains information central to the analysis, such as an outcome measure, and (i) consists of observational units from a distinct study population or (ii) comes from an independent data source or mode of data collection. Please repeat the following elements for each dataset.
V.1.A. Dataset: Name. A brief descriptive name used to refer to the dataset. [Free text; 1..n] Loop: assign unique ID to reference each dataset.
V.1.B. Dataset: Unit of observation. The basic unit of analysis or observation that the dataset describes. The unit of observation may be the same as the unit of randomization. [CV A; 1..n] Back end: multiple choice (1 option per dataset).
V.1.C. Dataset: Unit of observation: Targeted sample size. The targeted number of observation units pooled over all study arms and periods or phases of random assignment (waves). Include if an approximate target was used. [Numeric; 0..n] Back end: restrict to integers > 0.
V.1.D. Dataset: Unit of observation: Actual sample size. The actual number of observation units included in the dataset. [Numeric; 1..n] Back end: restrict to integers > 0.
V.1.E. Dataset: Kind of data. Please select all types of data included in the dataset. [CV G; 1..n] Back end: format as "select all that apply."
V.1.F. Dataset: Time method. The time method or time dimension of the dataset. [CV H; 1..n] Back end: multiple choice (1 option per dataset).
V.1.G. Dataset: Number of cycles. How many cycles (data collection or measurement rounds) are in the dataset? [Numeric; 1..n] Back end: restrict to integers > 0; cross-validate with V.1.F (e.g. panel data vs. only 1 included cycle).
V.1.H. Dataset: Mode of data collection. The manner(s) in which the interview was conducted or information was gathered. [CV I; 0..n] Back end: format as "select all that apply."
V.1.I. Dataset: Observational unit sampling method: Description. A description of the procedure used to select the observational units within the randomization units if the unit of observation is different from the unit of randomization. If sampling was performed in several stages, consider listing them out with an explanation. Include any information related to the sampling that is relevant to users comparing targeted and actual units of observation. [Free text; 0..n] Back end: reference power calculations used to determine targeted number of observations in IV.3.
V.1.J. Dataset: Sampling weights. The sampling procedures used may make it necessary to apply weights to produce accurate statistical results. Are sampling weights included in this dataset? [CV: Yes/No; 1..n]
V.1.K. Dataset: Notes on data collection. Brief description of the data collection or compilation. Include any relevant information such as which of the dataset's cycles was collected pre-treatment, during treatment, or post-treatment; reasons for differences between time period covered by the data and dates of data collection; quality assurance protocols such as number of call-backs; etc. [Free text; 0..n]
V.1.L. Dataset: Cycles: Information on the time period covered by the data and, if different, period of data collection in each cycle. These are often identical but may differ in retrospective surveys or administrative data. Please repeat the information for each cycle (wave or round) included in this dataset.
V.1.L.i. Dataset: Cycle: Cycle name. A brief descriptive name used to refer to the cycle (data collection or measurement round), such as study population census, baseline, endline, etc. [Free text; 1..n] Loop: use unique dataset ID and assign unique ID to reference each cycle within the dataset.
V.1.L.ii. Dataset: Cycle: Start of time period covered. Start date of the time period covered by the data in this data collection cycle. If any element of the date is unspecified, use "X" as input. [Format: YYYY-MM-DD; 1..n]
V.1.L.iii. Dataset: Cycle: End of time period covered. End date of the time period covered by the data in this data collection cycle. If any element of the date is unspecified, use "X" as input. [Format: YYYY-MM-DD; 1..n]
V.1.L.iv. Dataset: Cycle: Start of data collection. Start date of the data collection, if different from the start date of the time period covered by this data collection cycle. If any element of the date is unspecified, use "X" as input. [Format: YYYY-MM-DD; 0..n]
V.1.L.v. Dataset: Cycle: End of data collection. End date of the data collection, if different from the end date of the time period covered by this data collection cycle. If any element of the date is unspecified, use "X" as input. [Format: YYYY-MM-DD; 0..n]
V.1.M. Dataset: Arms: Please repeat this information for each treatment arm of this study.
V.1.M.i. Dataset: Arm: Name. A brief descriptive name used to refer to the study arm. [Free text; 1..n] Loop: use unique arm ID and unique dataset ID to reference each arm within the dataset; cross-check/pre-fill with arm names in III.6.A.
V.1.M.ii. Dataset: Arm: Targeted number of observational units. The targeted number of observational units in this arm. Include if an approximate target was used. [Numeric; 0..n] Cross-check sum with V.1.C.
V.1.M.iii. Dataset: Arm: Actual number of observational units. The actual number of observational units in this arm included in the dataset. Leave empty if this dataset does not have experimental arms (e.g. study population census prior to randomization). [Numeric; 0..n] Cross-check sum with V.1.D.

VI. Ethics and Research Transparency

VI.1. Ethics Review: Include information on any ethics review conducted.
VI.1.A. Ethics Review: Reviewing institution. The name or hosting institution of the ethics review body. [Free text; 0..n]
VI.1.B. Ethics Review: Review protocol number. IRB protocol number or case reference. [Free text; 0..n]
VI.2. Research ethics documentation. Select all documentation available discussing the ethics of the research or documenting the consent process. [CV J; 0..n] Back end: format as "select all that apply"; collect "other" responses for updating the CV.
VI.3. Registration/pre-specification. Was the experiment registered or pre-specified? Select all documentation available with time-stamped/version-controlled records. [CV K; 0..n] Back end: format as "select all that apply"; collect "other" responses for updating the CV.
VI.4. Funding agency/sponsor. The source(s) of funds for production of the work. Please list all organizations (local, national, or international) that have materially contributed, in cash or in kind, to the data collection or compilation. [Free text; 0..n]
VI.5. Implementation partner. Other parties or persons that have played a significant role in implementing the interventions or collecting the data. Please name individuals' affiliations and roles in their organization at the time of implementation. [Free text; 0..n]

VII. External Resources

VII.1. Resources: Information on any related materials. Include the location(s) of the data, separating locations with different access conditions, as well as other information helpful to data users, such as related publications, information on prior work/related studies, questionnaires or codebooks, and any ethics documentation or research-transparency related records. Cross-reference/prefill information in IV.1 (prior studies the work extends or builds on), V.1 (listed datasets), VI.2 (research ethics documentation), and VI.3 (registration/pre-specification).
VII.1.A. External resource: Type. Please select all external resource types included in this location or citation. [CV L; 1..n] Check boxes to select all that apply.
VII.1.B. External resource: Description. A brief description or name of the resource(s). [Free text; 0..n]
VII.1.C. External resource: Citation. Complete bibliographic reference containing all of the elements of a citation that can be used to cite the work following a standard format such as APA, MLA, Chicago, etc. [Free text; 1..n]
VII.1.D. External resource: Link (DOI/URL). The DOI or, if DOI is not available, URL of the resource. Leave blank if neither is available. [Free text; 0..n]
VII.1.E. External resource: Access policy. Is access to the resource restricted in any way? If known, provide a description of the restrictions and/or the process for accessing the resource. [Free text; 0..n]

B The Controlled Vocabularies

Below is a list of the controlled vocabularies for text fields in the metadata schema, labeled alphabetically for referencing. Each vocabulary contains parent categories and detailed child categories, along with notes on the CV options. In some controlled vocabularies, the parent category can be selected, whereas in others, the user has to select one of the child categories (following the conventions of the source CV); this is indicated by the use of italics for the parent. Options added to existing CVs are indicated by underlined text. If a CV is labeled as modified, but no entries are underlined (as in Controlled vocabulary "D. Covariates: Individual"), this indicates that some categories were dropped or consolidated or that the notes were edited or added.

Table Legend
italics: Parent categories in italics cannot be selected and are displayed for organizational purposes only. Selecting one of the child categories is required.
underline: Underlined fields were added or modified from the original source.

Table 2: Controlled Vocabularies

A. Unit of Observation/Randomization (Source: Adapted from DDI)
1. Individual: Any individual person, irrespective of demographic characteristics, professional, social or legal status, or affiliation.
1.1 Political/social leader
1.2 Health provider: e.g. doctors, nurses, midwives, etc.
1.3 Patient
1.4 Education provider: e.g. teachers, principals, etc.
1.5 Student
1.6 Farmer
1.7 Employee
1.8 Business owner
1.9 Voter
1.10 Public servant
1.11 Parent
1.12 Other
2. Organization or legal entity: Any kind of formal administrative and functional structure; includes associations, institutions, agencies, businesses, political parties, schools, etc.
2.1 Firm or business
2.2 Legal or administrative division of a firm or business: e.g. department
2.3 Farm or agricultural business
2.4 School
2.5 Legal or administrative division of a school: e.g. subjects, cohorts, grades
2.6 University/college
2.7 Legal or administrative division of a university/college: e.g. majors, cohorts
2.8 Hospital, health clinic or doctor's office
2.9 Other organization or legal entity
3. Family: Two or more people related by blood, marriage (including step-relations), or adoption/fostering, or who identify as a couple, and who may or may not live together.
3.1 Nuclear family
3.2 Extended family
3.3 Parent(s) with dependent children
3.4 Couples
3.5 Other
4. Household: A person or group of people who share common living arrangements or certain amenities, resources, or facilities. This may include pooling some or all of their income and wealth and collectively consuming certain types of goods and services, mainly housing and food.
5. Housing unit: A house, apartment, mobile home, group of rooms, or single room that is occupied (or intended for occupancy) as separate living quarters in which the occupants live and eat separately from other building occupants.
6. Other group: Two or more individuals assembled together or having some unifying relationship.
7. Event/process: Any type of incident, occurrence, or activity. Events are usually one-time, individual occurrences, with a limited or short duration. Examples: criminal offenses, riots, meetings, elections, sports competitions, terrorist attacks, natural disasters like floods, etc. Processes typically take place over time, and may include multiple "events" or gradual changes that ultimately lead, or are projected to lead, to a particular result. Examples: court trials, criminal investigations, political campaigns, medical treatments, education, athletes' training, etc.
8. Geographic unit: Any entity that can be spatially defined as a geographic area, with either natural (physical) or administrative boundaries.
8.1 Physical division of a firm or business: e.g. plants, production lines
8.2 Physical division of a school or university/college: e.g. classrooms, buildings
8.3 Agricultural plot or physical unit: e.g. stable, greenhouse
8.4 Census tract, zip code, or other neighborhood-level administrative unit based on geographic division
8.5 Village, community, or other town-level geographic division
8.6 District, province, or other upper-level geographic division
9. Time unit: Any period of time: year, week, month, day, or bimonthly or quarterly periods, etc.
10. Text unit: Books, articles, any written piece/entity.
11. Other

B. Intervention Assignment Strategy (Source: Adapted from ClinicalTrials.gov)
1. Parallel: Arms are assigned to one (or no) intervention in parallel for the duration of the intervention(s).
2. Factorial: Two or more interventions are partially or fully cross-randomized to arms and evaluated in parallel.
3. Crossover: Arms are assigned to different interventions or combinations of interventions (including no intervention) during different phases of the study.
4. Other

C. Study Sampling Method (Source: DDI)
1. Total universe (population): All units (individuals, households, organizations, etc.) of a target population are included in the randomization. For example, if the target population is defined as the members of a trade union, all union members are invited to participate in the study. Also called "census" if the entire population of a regional unit (e.g. a country) is selected.
2. Probability: All units (individuals, households, organizations, etc.) of a target population have a non-zero probability of being included in the randomization sample and this probability can be accurately determined. Use this broader term if a more specific type of probability sampling is not known or is difficult to identify.
2.1 Simple random: All units of a target population have an equal probability of being included in the randomization sample. Typically, the entire population is listed in a "sample frame", and units are then chosen from this frame using a random selection method.
2.2 Systematic random: A fixed selection interval is determined by dividing the population size by the desired sample size. A starting point is then randomly drawn from the sample frame, which normally covers the entire target population. From this starting point, units for the randomization sample are chosen based on the selection interval. Also known as interval sampling.
2.3 Stratified: The target population is subdivided into separate and mutually exclusive segments (strata) that cover the entire population. Independent random samples are then drawn from each segment. For example, in a national public opinion survey the entire population is divided into two regional strata: East and West. After this, randomization units are drawn from within each region using simple or systematic random sampling. Use this broader term if the specific type of stratified sampling is not known or difficult to identify.
2.3.1 Stratified: Proportional stratified: The target population is subdivided into separate and mutually exclusive segments (strata) that cover the entire population. Independent random samples are then drawn from each segment, and the number of units chosen from each stratum is proportional to the population size of the stratum when viewed against the entire population.
2.3.2 Stratified: Disproportional stratified: The target population is subdivided into separate and mutually exclusive segments (strata) that cover the entire population. In disproportional sampling the number of units chosen from each stratum is not proportional to the population size of the stratum when viewed against the entire population. The number of sampled randomization units from each stratum can be equal, optimal, or can reflect the purpose of the study, like oversampling of different subgroups of the population.
2.4 Cluster: The target population is divided into naturally occurring segments (clusters) and a probability sample of the clusters is selected. Data are then collected from all units within each selected cluster. Sampling is often clustered by geography or time period.
Use this broader term if a more specific type of cluster sampling is not known or is difficult to identify.
2.4.1 Cluster: Simple random: The target population is divided into naturally occurring segments (clusters) and a simple random sample of the clusters is selected for randomization. Data are then collected from all units within each selected cluster. For example, for a sample of students in a city, a number of schools would be chosen using the random selection method, and then all of the students from every sampled school would be included.
2.4.2 Cluster: Stratified random: The target population is divided into naturally occurring segments (clusters); next, these are divided into mutually exclusive strata and a random sample of clusters is selected from each stratum. Data are then collected from all units within each selected cluster. For example, for a sample of students in a city, schools would be divided into two strata by school type (private vs. public); schools would then be randomly selected from each stratum, and all of the students from every sampled school would be included.
2.5 Multistage: Sampling is carried out in stages using smaller and smaller units at each stage, and all stages involve a probability selection. The type of probability sampling procedure may be different at each stage. For example, for a sample of students in a city, schools are randomly selected in the first stage. A random sample of classes within each selected school is drawn in the second stage. Students are then randomly selected from each of these classes in the third stage.
3. Non-probability: The selection of randomization units (individuals, households, organizations, etc.) from the target population is not based on random selection. It is not possible to determine the probability of each element to be sampled. Use this broader term if the specific type of non-probability is not known, difficult to identify, or if multiple non-probability methods are being employed.
3.1 Availability: The sample selection is based on the units' accessibility/relative ease of access. They may be easy to approach, or may themselves choose to participate in the study (self-selection). Researchers may have particular target groups in mind but they do not control the sample selection mechanism. Also called "convenience" or "opportunity" sampling.
3.2 Purposive: Randomization units are specifically identified, selected and contacted for the information they can provide on the researched topic. Selection is based on different characteristics of the independent and/or dependent variables under study, and relies on the researchers' judgement. The study authors, or persons authorized by them, have control over the sample selection mechanism and the universe is defined in terms of the selection criteria. Also called "judgement" sampling. Some types of purposive sampling are typical/deviant case, homogeneous/maximum variation, expert, or critical case sampling.
3.3 Quota: The target population is subdivided into separate and mutually exclusive segments according to some predefined quotation criteria. The distribution of the quotation criteria (gender/age/ethnicity ratio, or other characteristics, like religion, education, etc.) is intended to reflect the real structure of the target population or the structure of the desired study population. Non-probability samples are then drawn from each segment until a specific number of randomization units has been reached.
3.4 Respondent assisted: Randomization units are identified from a target population with the assistance of units already selected (adapted from "Public Health Research Methods", ed. Greg Guest, Emily E. Namey, 2014). A typical case is snowball sampling, in which the researcher identifies a group of units that matches a particular criterion of eligibility. The latter are asked to recruit other members of the same population that fulfill the same criterion of eligibility (sampling of specific populations like migrants, etc.).
4. Mix of probability and non-probability sampling: Sample design that combines probability and non-probability sampling within the same sampling process. Different types of sampling may be used at different stages of creating the randomization sample. For example, for a sample of minority students in a city, schools are randomly selected in the first stage. Then, a quota sample of students is selected within each school in the second stage. If separate samples are drawn from the same target population using different sampling methods, the type of sampling procedure used for each sample should be classified separately.
5. Other

D. Covariates: Individual (Source: Adapted from GESIS)
1. Sex
2. Age
3. Race/ethnicity
4. Religion
5. Citizenship
6. Marital status/registered partnership
7. Education
8. Labor status
8.1 Description of employment
8.2 Description of professional activity
8.3 Professional status
8.4 Attachment to the labor force
8.5 Previous employment
9. Income
10. Other

E. Covariates: Higher (Source: New CV)
1. Housing/property characteristics or amenities
2. Demographics of household members or household structure
3. Household assets - ownership or debt
4. Household income
5. Farm characteristics
6. Demographic characteristics of town, village or other governmental unit
7. Geographic characteristics of town, village or other governmental unit
8. Ethno-political characteristics of town, village, or other governmental unit
9. Crime, violence, or legal enforcement indicators
10. Firm-level characteristics
11. School characteristics
12. Hospital or clinic characteristics
13. Other

F. Study was designed to analyze (Source: New CV)
1. ITT: The data allows estimation of the effect of being assigned to treatment, also called intent to treat effect or ITT (i.e., treatment assignment is recorded in the data; the default).
2. LATE or TOT: The data allows estimation of the effect of receiving treatment, also called local average treatment effect (LATE) or effect of treatment on the treated (TOT) (i.e., treatment compliance or take-up is recorded in the data).
3. ATE: The study allows identification of the average effect of treatment in the study population, also called average treatment effect or ATE (i.e., treatment compliance is automatic/perfect; this may be the case for e.g. laboratory experiments).
4. Heterogeneous treatment effects or effects by subgroup: The study was designed to allow for the identification of heterogeneous treatment effects or effects by subgroup for one or more covariates.
5. General equilibrium effects: The randomization was designed to be able to identify general equilibrium effects (e.g., cluster randomization to measure cluster-level effects on prices, labor market outcomes, etc.).
6. Spillovers or externalities: The study was designed to measure spillover effects or externalities caused by the intervention (e.g., cluster randomization with varying saturation and data collected on everyone in the cluster).
7. Interaction effect of different interventions: The study's interventions were assigned to arms in a way that allows the analysis of interaction effects (e.g., factorial designs).
8. Effect of varying treatment intensity: The study was designed such that distinct arms were assigned different intensities of a broader intervention (e.g., a cash transfer that has $20, $40, and $60 arms).
9. Other: Any other design features that permit estimating the effect of an intervention on units in the study population in a specific way.

G. Kind of Data (Source: Adapted from DDI Definition)
1. Sample survey data: Survey data collected from a sample of an underlying population.
2. Census/enumeration data: Data that covers a complete population.
3. Administrative records data: Information collected, used, and stored primarily for administrative (i.e., operational) rather than research purposes.
4. Aggregate data: Data at a level of aggregation higher than the units represented in the study, such as country or state-level average household income.
5. Clinical data: Data either collected during the course of ongoing patient care or as part of a formal clinical trial program.
6. Event/transaction data: Data that describes an event or transaction, such as data recording sales/business transactions.
7. Observation data/ratings: Data collected as they occur (for example, observing behaviors, events, etc.), without attempting to manipulate any of the independent variables.
8. Process-produced data: Paradata or process metadata: information about data cleaning and transformation processes.
9. Time budget diaries: Data collected from respondent-produced diaries that contain information on their time use.
10. Choice experiments for preference elicitation
10.1 Incentivized: Data produced from choice experiments with real-world incentives.
10.2 Hypothetical: Data produced from hypothetical choice experiments (i.e., those that do not have any real-world implications for the respondents).
11. Economic games with participant interaction: Laboratory or "lab-in-the-field." Data collected from laboratory or lab-in-the-field games played by the respondents, such as dictator or trust games, with real-world incentives.
12. Measurement and tests
12.1 Educational: Assessment of knowledge, skills, aptitude, or educational achievement by means of specialized measures or tests. Includes standardized testing.
12.2 Physical: Assessment of physical properties of living beings, objects, materials, or natural phenomena. For example, blood pressure, heart rate, body weight and height, as well as time, distance, mass, temperature, force, power, speed, GPS data on physical movement and other physical parameters or variables, like geospatial data.
12.3 Psychological: Assessment of personality traits or psychological/behavioral responses by means of specialized measures or tests. For example, objective tests like self-report measures with a restricted response format, or projective methods allowing free responses, including word association, sentence or story completion, vignettes, cartoon test, thematic apperception tests, role play, drawing tests, inkblot tests, choice ordering exercises, etc.
13. Textual data: Data taken or coded from texts, including but not limited to documents, reports, or speeches.
14. Other

H. Time Method (Source: Adapted from ADA)
1. One-time cross-sectional data
2. Repeated cross-sectional data
3. Panel: Datasets that contain baseline and endline surveys tracking the same participants are included here.
H. Time Method (Source: Adapted from ADA)
1. One-time cross-sectional data
2. Repeated cross-sectional data
3. Panel
Datasets that contain baseline and endline surveys tracking the same participants are included here.
4. Does not apply (administrative data or similar)
5. Other

I. Mode of Data Collection (Source: Adapted from DDI)
1. Interview
A pre-planned communication between two (or more) people - the interviewer(s) and the interviewee(s) - in which information is obtained by the interviewer(s) from the interviewee(s). If group interaction is part of the method, use “Focus group”.
1.1 Face-to-face interview
Data collection method in which a live interviewer conducts a personal interview, presenting questions and entering the responses. Use this broader term if neither CAPI nor PAPI applies, or if it is not known whether the interview was CAPI or PAPI.
1.1.1 Face-to-face: CAPI/CAMI
Computer-assisted personal interviewing. Data collection method in which the interviewer reads questions to the respondents from the screen of a computer, laptop, or a mobile device such as a tablet or smartphone, and enters the answers on the same device. The administration of the interview is managed by a specifically designed program/application.
1.1.2 Face-to-face: PAPI
Paper-and-pencil interviewing. The interviewer uses a traditional paper questionnaire to read the questions and enter the answers.
1.2 Telephone interview
Interview administered over the telephone. Use this broader term if not CATI, or if it is not known whether the interview was CATI.
1.2.1 Telephone: CATI
Computer-assisted telephone interviewing. The interviewer asks questions as directed by a computer; responses are keyed directly into the computer, and the administration of the interview is managed by a specifically designed program.
1.2.2 Telephone: PATI
The interviewer uses a traditional paper questionnaire to read the questions and enter the answers; the survey is conducted over the telephone.
1.3 Email
Interviews conducted via e-mail, usually consisting of several e-mail messages that allow the discussion to continue beyond the first set of questions and answers, or the first e-mail exchange.
1.4 Web-based
An interview conducted via the Internet. Examples include interviews conducted within online forums or using web-based audio-visual technology that enables the interviewer(s) and interviewee(s) to communicate in real time.
2. Self-administered questionnaire
Self-administered questionnaires include knowledge tests and preference elicitation.
2.1 Paper
Self-administered survey using a traditional paper questionnaire delivered and/or collected by mail (postal services), by fax, or in person by either the interviewer or the respondent.
2.2 Email
Self-administered survey in which questions are presented to the respondent in the text body of an e-mail or as an attachment to an e-mail, but not as a link to a web-based questionnaire. Responses are also sent back via e-mail, in the e-mail body or as an attachment.
2.3 SMS/MMS
Self-administered survey in which the respondents receive the questions incorporated in SMS (text messages) or MMS (messages including multimedia content) and send their replies in the same format.
2.4 Web-based
Computer-assisted web interviewing (CAWI). Data are collected using a web questionnaire, produced with a program for creating web surveys. The program can customize the flow of the questionnaire based on the answers provided and can allow the questionnaire to contain pictures, audio and video clips, links to different web pages, etc. (adapted from Wikipedia).
2.5 CASI
Computer-assisted self-interviewing (CASI). Respondents enter the responses into a computer (desktop, laptop, Palm/PDA, tablet, etc.) by themselves. The administration of the questionnaire is managed by a specifically designed program/application, but there is no real-time data transfer as in CAWI; the answers are stored on the device used for the interview. The questionnaire may be fixed-form or interactive. Includes VCASI (video computer-assisted self-interviewing), ACASI (audio computer-assisted self-interviewing), and TACASI (telephone audio computer-assisted self-interviewing).
3. Self-administered writings and/or diaries
Narratives, stories, diaries, and written texts created by the research subject.
3.1 Email
Narratives, stories, diaries, and written texts submitted via e-mail messages.
3.2 Paper
Narratives, stories, diaries, and written texts created and collected in paper form.
3.3 Web-based
Narratives, stories, diaries, and written texts gathered from Internet sources, e.g., websites, blogs, and discussion forums.
4. Observation
Research method that involves collecting data as they occur (for example, by observing behaviors, events, etc.), without attempting to manipulate any of the independent variables.
4.1 Field observation
Observation conducted in a natural environment. Note: “Field observation” is defined as interactions not designed by the researcher.
4.1.1 Participant field observation
Type of field observation in which the researcher interacts with the subjects and often plays a role in the social situation under observation.
4.1.2 Non-participant field observation
Observation conducted in a natural, non-controlled setting without any interaction between the researcher and his/her subjects.
4.2 Laboratory observation
Observation conducted in a controlled, artificially created setting. Note: “Laboratory observation” is defined as researcher-designed economic games between participants.
4.2.1 Computer interactions: Participant
Computer-based economic games in which the researcher interacts with the subjects and often plays a role in the situation under observation.
4.2.2 Computer interactions: Non-participant
Computer-based economic games that are conducted without any interaction between the researcher and his/her subjects.
4.2.3 Computer interactions: Bot participant
Computer-based economic games in which a bot interacts with the subjects and often plays a role in the situation under observation.
4.3.1 In-person interactions: Participant
Type of laboratory observation in which the researcher interacts with the subjects and often plays a role in the social situation under observation. Example: observation of children’s play in a laboratory playroom with the researcher taking part in the play.
4.3.2 In-person interactions: Non-participant
Type of laboratory observation conducted without any interaction between the researcher and his/her subjects.
5. Recording
Registering by mechanical or electronic means, in a form that allows the information to be retrieved and/or reproduced; for example, images or sounds on disc or magnetic tape.
6. Content coding
As a mode of secondary data collection, content coding applies coding techniques to transform qualitative data (textual, video, audio, or still-image) originally produced for other purposes into quantitative data (expressed in unit-by-variable matrices) in accordance with pre-defined categorization schemes.
7. Aggregation
Statistics that relate to broad classes, groups, or categories. The data are averaged, totaled, or otherwise derived from individual-level data, and it is no longer possible to distinguish the characteristics of individuals within those classes, groups, or categories. For example, the number and age group of the unemployed in specific geographic regions, or national-level statistics on the occurrence of specific offences, originally derived from the statistics of individual police districts.
8. Other
Use if the mode of data collection is known but not found in the list.
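The Mode of Data Collection vocabulary is up to three levels deep, so a useful catalog filter should expand a broad term to all of its descendants (filtering on “Interview” should also match records coded 1.1.1 or 1.2.1). A minimal sketch of that expansion, assuming the simple dotted-prefix code convention used above, follows; the vocabulary excerpt is truncated.

```python
# Illustrative sketch: expanding a broad "Mode of Data Collection" code to
# all of its descendants, so a facet filter on "1" (Interview) also matches
# records tagged with narrower terms such as "1.1.1" (Face-to-face: CAPI/CAMI).
# The vocabulary dict is truncated for brevity.

MODE_OF_COLLECTION = {
    "1": "Interview",
    "1.1": "Face-to-face interview",
    "1.1.1": "Face-to-face: CAPI/CAMI",
    "1.1.2": "Face-to-face: PAPI",
    "1.2": "Telephone interview",
    "1.2.1": "Telephone: CATI",
    "2": "Self-administered questionnaire",
    "2.4": "Web-based",
}

def descendants(code, vocabulary):
    """All codes equal to `code` or nested beneath it (dotted-prefix match)."""
    return [c for c in vocabulary if c == code or c.startswith(code + ".")]

print(descendants("1.1", MODE_OF_COLLECTION))
# -> ['1.1', '1.1.1', '1.1.2']
```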
J. Research ethics documentation (Source: New CV)
1. IRB protocol
2. Description of consent process
3. Consent forms text or dialogue
4. Record of consent in the data
5. Structured ethics appendix
See Asiedu et al. (2021).
6. Other

K. Registration/pre-specification (Source: New CV)
1. Trial registration
Entry in any trial registry.
2. Trial pre-registration
Pre-registration in any trial registry.
3. WHO-accredited clinical trial registry
Any entry (pre- or post-registration) in a WHO-accredited clinical trial registry.
4. Pre-analysis plan
Registered/time-stamped pre-analysis plan.
5. Pre-results acceptance
Pre-results acceptance at an academic journal.
6. Public pre-results document
Other public pre-results proposal or document.
7. Populated pre-analysis plan
Populated pre-analysis plan separate from the research paper.
8. Other

L. External Resources Types (Source: IHSN)
1. Database or data repository entry
Location of data included in this study.
2. Document
2.1 Administrative
This includes materials such as the survey budget, grant agreements with sponsors, lists of staff and interviewers, etc.
2.2 Analytical
This includes documents that present analytical output (academic papers, etc.). It does not include the descriptive survey report.
2.3 Questionnaire
This includes the actual questionnaire(s) used in the field.
2.4 Reference
Any reference documents that are not directly related to the specific dataset but that provide background information regarding methodology, etc. For international standard surveys, this may include, for example, the generic guidelines provided by the survey sponsor.
2.5 Report
Survey reports, studies, and other reports that use the data as the basis for their findings.
2.6 Technical
Methodological documents related to survey design, interviewer’s and supervisor’s manuals, editing specifications, data entry operator’s manuals, tabulation and analysis plans, etc.
2.7 Other
Miscellaneous items.
3. Pre-analysis plan
Pre-analysis plan, if separate from the trial registration.
4. Populated pre-analysis plan
Populated pre-analysis plan, if separate from the trial registration/pre-analysis plan.
5. Research ethics documentation
Any documentation related to research ethics, such as IRB or other ethics review protocols, the consent process, consent forms, a structured ethics appendix, etc.
6. Program
Programs generated during data entry and analysis (data entry, editing, tabulation, and analysis). Include replication files here.
7. Table
Tabulations, such as confidence intervals, that may not be included in a general report.
8. Audio
Audio files.
9. Map
Any cartographic information.
10. Photo
11. Video
Video files provided as additional visual information.
12. Website
Link to related website(s).
13. Other
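Finally, to illustrate how the External Resources Types list might be used when implementing a catalog, the sketch below attaches typed, linked resources to a study record so that, for example, replication programs and questionnaires are machine-distinguishable. The class, field names, and URLs are hypothetical illustrations, not part of the schema.

```python
# Illustrative sketch (class and field names are our own, not the schema's):
# attaching typed external resources to a study record using codes from the
# External Resources Types CV.

from dataclasses import dataclass

EXTERNAL_RESOURCE_TYPES = {
    "1": "Database or data repository entry",
    "2.3": "Document: Questionnaire",
    "6": "Program",
    "12": "Website",
}  # truncated

@dataclass
class ExternalResource:
    type_code: str   # code from the External Resources Types CV
    title: str       # free-text label
    uri: str         # persistent link to the resource

    def __post_init__(self):
        if self.type_code not in EXTERNAL_RESOURCE_TYPES:
            raise ValueError(f"Unknown resource type: {self.type_code}")

# Hypothetical example entries for one study record.
resources = [
    ExternalResource("6", "Replication files", "https://example.org/replication.zip"),
    ExternalResource("2.3", "Baseline questionnaire", "https://example.org/baseline.pdf"),
]
for r in resources:
    print(EXTERNAL_RESOURCE_TYPES[r.type_code], "->", r.uri)
```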