Mapping the Landscape of Transactions: The Governance of Business Relations in Latin America

To what extent do firms use trust, law, and third-parties to ensure fulfillment of agreements to transact? How do they combine these mechanisms to form transactional governance structures? How do answers to these questions vary across countries? Generating the relevant data requires constructing a survey question answerable by any firm, anywhere. The question is administered in six South American countries. Applied to the resultant data, latent class analysis (LCA) estimates classes that correspond to the transactional governance structures that firms employ to support implementation of agreements. Without imposing an a priori model, LCA discovers meaningful governance structures. Bilateralism appears in all governance structures. Law is never used alone. Bilateralism and formal institutions are sometimes complements, never substitutes. Within-country regional variation in the use of bilateralism and law exceeds cross-country variation. LCA provides the posterior probabilities that each firm uses each governance structure, facilitating testing hypotheses consequent on Williamson's discriminating-alignment agenda.


I. Introduction
To what extent do firms use different mechanisms, such as trust, law, and third-parties, to support the fulfillment of their agreements to buy and sell goods and services? How do firms combine these mechanisms? How do answers to these questions vary across countries? Williamson (1979) refers to such combinations of mechanisms as transactional governance structures. In the 40 years since he introduced this concept, there has been little progress in ascertaining which governance structures are most commonly used. This is surprising given the consensus in the literature that levels of development are associated with the effectiveness of arrangements for enforcing agreements (Williamson 1985; North 1990; Greif 2001). Although the quantitative importance of such arrangements to growth and productivity is extremely difficult to pin down, there is ample evidence that it is large. 1 The absence of evidence on the use of different arrangements for enforcing agreements (that is, transactional governance structures) is also troubling, because any diagnosis of the causes of a country's ailments will depend on an assessment of the behavior of its firms in critical areas of activity. Given the absence of any absolute standard to guide such an assessment, rigorously comparative, cross-country data are necessary for such a diagnosis, as for example implemented by the World Governance Indicators or Doing Business. While there are many datasets on individual elements of governance, such as the strength of legal institutions or country trust levels, there is no clear overall picture capturing how firms enforce agreements. Stated starkly, there currently exists no systematic methodology or data that would allow us to draw any conclusion on, for example, whether firms in La Paz, Bolivia, rely more or less on legal institutions or trust than firms in La Paz, Argentina, or, indeed, La Paz, Arizona.
The aim of this paper is to fill this gap in the literature. The paper presents and analyzes the responses of 3,430 firms to a newly designed survey question that is posed to representative samples of firms in six South American countries. Given the use of survey weights, our estimates characterize the use of governance structures in a very large part of economic activity. The application here is to a small number of countries, but the methodology is sufficiently general that it could be applied in any country or region in the world, producing results that are exactly comparable to those presented below.

1 It is notoriously difficult to quantify the relationship between arrangements for enforcing agreements and economic performance, largely because of the issue addressed in this paper, data availability. Nevertheless, there is enough evidence to conclude that understanding the process of development could benefit from more focus on transactional arrangements and the quantitative consequences of problems with such arrangements. Nunn (2007) shows that contract enforcement explains more of the pattern of trade than physical capital and skilled labor combined. In a cross-country study, Kovac and Spruk (2016) find that a country with a one standard deviation higher level of transaction costs is 44% poorer as a result. Boehm (2018) calculates, under a set of conservative assumptions, that on average countries would grow by 18% if enforcement costs were zero. Boehm and Oberfeld (2018) find that a feasible reduction in court congestion alone in India would increase productivity by 5%. In a small survey of Romanian firms, Murrell and Paun (2010) asked about the willingness-to-pay for first-best institutions and found that the transaction costs of exchange are as high as 23% of value added. Even for a very developed and organized market, US used-car wholesaling, Larsen (2018) found that the surplus lost in the bargaining process represents on average 14% of the first-best surplus.

A (transactional) governance structure is the "institutional framework within which the integrity of a transaction is decided" (Williamson 1979: 240). Such governance structures are only a sub-part of the overall governance of the firm, but in this paper we confine our attention only to those aspects of governance relating to transactions in goods and services. 2 They comprise both tools for problem prevention and tools to provide solutions should problems arise. A governance structure is a coordinated combination of different mechanisms that together encourage the fulfillment of agreements to transact. Such mechanisms might be trust, hit-men, legally enforceable contracts, etc. No existing studies obtain a comprehensive, consistent, cross-country picture on how these mechanisms are combined within governance structures. 3 One reason for this lacuna is surely that there exists no general theoretical framework that predicts which particular combinations of mechanisms one should expect to observe often, and which can be safely ignored.
The absence of a tight theory to guide the empirics has important consequences for how research must proceed. The pertinent survey question must ask about a comprehensive variety of individual mechanisms. The question must have a very general structure that resonates with the concerns of all types of firms, and makes responding feasible whatever the nature of any firm. Therefore, a first contribution of this paper is the construction of a new survey question, together with a demonstration that it elicits reliable data from representative samples of firms in many countries. The paper establishes the validity of the question and of the responses it elicits through examples of the productive use of the resultant data in practical settings.
Additionally, the absence of a tight theoretical framework means that data cannot be collected on governance structures, per se, but must instead focus on the use of individual mechanisms that are themselves inputs into different governance structures. The challenge then is understanding which individual mechanisms are combined by firms to produce coherent governance structures. Thus, a crucial task in building a comprehensive picture is to find those combinations that exist in practice. That is, one must apply an exploratory statistical technique that isolates archetypes of governance structures, effectively substituting empirics for what usually would be a theoretical exercise. This type of statistical investigation is very common in general, and is usually approached using latent variables. In our case, the governance structures themselves are viewed as the values, or classes, of a categorical latent variable. Exploring the characteristics of the classes thus discovered is a key part of the investigation. In an examination of Hungarian data, Mike and Kiss (2019) have shown that latent class analysis (LCA) is particularly suitable for this task. 4 Thus, a second contribution of this paper is to develop the application of LCA for the discovery and estimation of governance structures, and apply it to the six-country dataset. We characterize the governance structures, estimate the prevalence of each, and show how the use of each varies across countries, producing results that are exactly comparable across countries. Our characterization of governance structures is data-driven: it does not rely on an a priori conception of which governance structures are used in practice.
A third contribution is to provide illustrative examples of the use of the information generated by our method. We present descriptive statistics on the variation of the use of governance structures across different subsets of firms, for example, in different sectors, or for firms of varying sizes, or for firms with different types of management practices. These are descriptive in the sense that they do not isolate the ceteris paribus causal effects of single variables. We do, however, use techniques previously developed for LCA that produce consistent estimates of the descriptive parameters. The examples point the way to potential areas of further research and possible policy conclusions.
A fourth contribution is to develop a set of tools for other researchers who can use our data and estimates of governance structures to advance their own research agendas. 5 We provide a dataset of the predicted posterior probabilities that each firm in our sample uses each estimated archetypal governance structure. These predicted probabilities could then be used in combination with existing survey data produced by the World Bank Enterprise Surveys (WBES) to investigate the causal determinants of firm choice of governance structures. Or, more broadly, researchers could use our predicted probabilities in combination with their own datasets in order to address a broad range of research issues by using a Rajan-Zingales (1998) style methodology.
Going even further, readers could add different countries, or regions, or sectors to those we have studied here. Given any set of responses to the questions that we lay out in Section II, even responses from very few firms, a researcher could use the information we provide to obtain the predicted probabilities that the respondent firms use particular governance structures. The results could then be directly compared to those for the six South American countries. This could be particularly useful for researchers who have limited resources and can only sample a small number of firms, yet have an interest in obtaining reliable statistical results to use in a cross-country comparable setting.
These contributions show the feasibility of constructing and implementing a methodology that produces cross-country data characterizing the mix of governance structures that are typically used. Once the data exist, the possibility of new discoveries arises, one that simply is not possible from the existing studies that focus on one mechanism or one particular economic environment. For example, pure bilateralism (i.e., reliance on trust and mutual interest to enforce agreements) is the most common governance structure that we observe and all governance structures use bilateral enforcement mechanisms. Thus, we find no evidence for the presence of pure arm's length transactions, with firms relying on impersonal mechanisms and formal institutions to support their transactions. A corollary is that bilateralism and formal institutions are never substitutes, and for many firms, they are complements. Additionally, while much attention has been paid in the literature to various unpaid, third-party, mechanisms of supporting agreements, such as networks, social clubs, and culturally defined groups, our economy-wide, cross-country data suggest they are of minor importance. In fact, we conclude that networks are no more important in transactional governance than is the help of government officials, a group whose role in supporting inter-firm transactions has invariably been ignored in the literature. 6 The purpose of collecting new data is not only to examine existing hypotheses but also to generate new facts that surprise and therefore stimulate new avenues for research. Here we mention two. First, the data suggest that inter-regional variation in the use of legal institutions is more important than cross-country variation. In the countries analyzed, this is unexpected because institutional rules relevant to transactions are set at the national level, and because standard data sources suggest much variation in the strength of legal institutions across our sample of countries. 
Secondly, but in the same vein, we find that the use of legal mechanisms in Bolivia is comparatively greater than would be predicted using the standard country-level indicators of the strength of legal institutions. The fact that the respondents in the poorest country in our six, Bolivia, rate the use and effectiveness of legal institutions as greater than that in the richest country, Uruguay, is surely a puzzle in need of further investigation.
Section II details the process of data collection, discussing the logic of our new survey question and examining its validity. Section III introduces LCA, describing the data-generating process underlying LCA and introducing the criteria that we use to choose a particular model specification. Section IV considers the characteristics of each of the four estimated governance structures, showing how their features resonate with ideas in the transaction-cost and contract-theory literature. Section V presents illustrative examples of the use of the information generated by our method, providing descriptive statistics on the variation in the use of governance structures across subsets of firms. Section VI concludes by considering direct extensions of this research. We pose questions that our findings directly kindle and detail how others can use the data we have generated.

II. The Questions, the Surveys, and Raw Responses
We use responses to questions posed by the WBES in 2017 and 2018 to representatives of a total of 3,430 firms in Argentina, Bolivia, Ecuador, Paraguay, Peru, and Uruguay. The respondents were business owners and top managers in a sample of officially registered firms with at least five employees in the manufacturing and services sectors. The samples are designed to be nationally representative, using a stratified survey design. 7 Newly designed questions were fielded on the effectiveness of various methods of preventing or resolving problems when implementing agreements, i.e. methods of enforcing agreements. 8 Two composite questions were posed to respondents, one about agreements with suppliers and the other about agreements with customers, each question having six sub-questions. The composite questions differ only in whether they ask about suppliers or customers. We chose to ask about supplier and customer relations separately because firms might employ very different types of strategies when managing upstream relations than downstream ones. Respondents were presented with a 'show card' with a Likert scale of responses and the following was read aloud:

When making agreements with [suppliers][customers], please indicate to what degree each of the following is effective in resolving or preventing problems.
Each one of the following was posed separately (without numbering):
1. Personal relationship and trust
2. Mutual interest in maintaining business relationship, without involving others
3. Paid, private dispute resolution
4. Assistance of government officials
5. Intervention of other third-parties (excluding paid, private dispute resolution and government officials)
6. Legal system

The response scale (displayed on a 'show card') was:

Not at all | Slightly | Moderately | Very much | Extremely
The exact wording of the questions in Spanish and English is included in Appendix A.1. Nuances raised in the translation process are discussed in Appendix A.2.
When constructing questions to be administered in a long survey and addressed to firms of all types, in different institutional settings, both conceptual and practical difficulties immediately arise. A first difficulty we faced in question design was whether to ask firms about their relations with transactional partners in general, as in Hendley and Murrell (2003), or to focus on highly specific transactions, as when Mike and Kiss (2019) asked about relations with a typical partner, allowing the respondent to choose the typical interaction. 9 Governance structures do vary within firms between transactions, suggesting that asking about a specific transaction corresponds to the daily thought processes of respondents. But asking respondents to focus only on a transaction of their own choice risks losing generality and invites selection bias. We therefore chose to ask about transactions in general. The assumption is that respondents will convey information that summarizes governance structures across the range of their firm's transactions. The surveyors did not report any problems with posing the question in this way.

7 Full details of the methodology can be found at http://www.enterprisesurveys.org/methodology. Stratified random sampling was used, with strata based on firm size, geographical location, and economic sector. The data includes sampling weights. All results are obtained using these weights and thus refer to the entire population of pertinent establishments in the six countries. The WBES universe covers formal, private, non-agricultural, non-extractive firms with five or more employees.

8 Note that for simplicity we use firms interchangeably with establishments, which is the survey's unit of analysis. In fact, the phrase "top managers or owners of these establishments" is probably more accurate, but too cumbersome.
A second difficulty was deciding which properties of governance structures respondents should be asked to assess. We chose to focus on preventing and/or resolving problems in the implementation of agreements because this was a fairly well circumscribed objective that would be easy for respondents to understand. It also resonates with fundamental concerns common to both the transaction-cost and contract-theory literatures. This objective is narrower than asking respondents to focus upon whether agreements work in an efficient way, although certainly part of that broader perspective. 10 A third difficulty was framing questions whose tenor complemented that of other questions in the WBES. The questions were based on those in Hendley and Murrell (2003), but were much simplified to fit into the standard questionnaire and to preserve generality. This generality would allow some level of respondent interpretation, but as our goal is to develop data across heterogeneous firms and in a cross-country analysis, very general wording was essential.
A long array of conceptual and practical difficulties is inherent to the process of constructing questions. Indeed, it is worth remembering that in the seminal paper in the current line of inquiry, Macaulay (1963) was forced to remark that "…to a great extent, existing knowledge has been inadequate to permit more rigorous procedures-as yet one cannot formulate many precise questions to be asked a systematically selected sample…Much time has been spent fishing for relevant questions…" Despite the length of time since the publication of Macaulay's paper and the recognition in the intervening years that it had raised fundamental issues, there have been few attempts to address in a general way the problems of data collection that he so clearly articulated. We fill this gap by fielding new and carefully designed questions that can be analyzed individually or together to explore a wide range of topics. By drawing meaningful conclusions based on the responses to these questions, this paper demonstrates that the questions elicit reliable data.

9 This latter approach originated in the work of McMillan and Woodruff (1999) who asked respondents to reflect on their firm's first customer and its most recently added customer; and first supplier and its newest supplier. However, McMillan and Woodruff (1999) were interested in the determinants of highly specific elements of transactions: trade credit. In that exercise, a focus on highly specific transactions is necessary.

10 Hadfield and Bozovic (2016) have argued in the context of innovation-oriented interactions that legal contracts might be valuable in the planning of transactions, even if the parties have no intention of using the contract in a formal legal setting. To the extent that a contract is useful purely for planning, respondents would not identify the use of such a contract with the effectiveness of the legal system, but rather with bilateral mechanisms.
Interviews were conducted face-to-face by local contractors in Spanish using tablet devices. Information on the dates of fieldwork and the total number of observations available for each country is given in Appendix A.3. Summary statistics on rates of "Don't Know" responses (item non-response rates) are provided in Appendix A.4. These rates are negligible. 11 Table 1 displays the raw response percentages for the twelve sub-parts of the questions taken separately. Patterns emerge even at this level of analysis. Respondents tend to regard bilateral mechanisms (i.e., trust and mutual interest) as effective, while regarding third-parties, government officials, and legal mechanisms as less effective.
Much analysis could be conducted using the types of information appearing in Table 1, by, for example, relating these responses to other information contained in the WBES. This paper takes a more holistic look. The governance of transactional relations can involve the coordinated use of different mechanisms. Therefore a fuller understanding requires consideration of the vector of responses to all six sub-questions rather than looking at individual responses in isolation. This task involves analyzing a daunting number of possibilities: with our six sub-questions, each with five categories of responses, there are $5^6$ = 15,625 possible response patterns. The responses do include far fewer combinations than this. But even so, for relations with suppliers, we observe 711 distinct combinations and for relations with customers, we observe 631. (Appendix Tables A.3 and A.4 list the most common reported response patterns accounting for 50% of firm responses.) The fact that a sizeable percentage of responses is spread over hundreds of distinct patterns implies that systematic methods are needed to parsimoniously summarize observed responses. These insights and more are provided by LCA, to which we now turn.
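The counting argument in this paragraph can be illustrated with a short sketch. The simulated responses below are hypothetical stand-ins (they are not the WBES data), and the function name `pattern_summary` is an illustrative assumption; the sketch only shows how one might tabulate distinct response patterns and the number of patterns that account for half of all responses.

```python
from collections import Counter
import random

K, R = 6, 5          # six sub-questions, five Likert categories (from the text)
n_possible = R ** K  # number of possible response patterns

def pattern_summary(responses):
    """Return (number of distinct patterns, fewest patterns covering 50% of responses)."""
    counts = Counter(tuple(r) for r in responses)
    total = sum(counts.values())
    covered, n_top = 0, 0
    for _, n in counts.most_common():  # most frequent patterns first
        covered += n
        n_top += 1
        if covered >= 0.5 * total:
            break
    return len(counts), n_top

# Hypothetical data: 1,000 simulated firms answering uniformly at random.
rng = random.Random(0)
fake = [[rng.randint(1, R) for _ in range(K)] for _ in range(1000)]
n_distinct, n_half = pattern_summary(fake)
print(n_possible, n_distinct, n_half)
```

Real survey responses are far more concentrated than uniform random draws, which is why the observed number of patterns in the text is much smaller than the number possible. The sketch is unweighted; applying sampling weights would require summing weights rather than counting rows.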

III. Latent Class Analysis: Data-Generating Process and Diagnostics
Most economists will be unfamiliar with LCA, judging by the few examples appearing in the economics literature. 12 Therefore, we introduce in rather more detail than usual the general specification of the data-generating process (DGP) that is used to structure our estimation. The DGP used in LCA views the responses to questions like those we posed as comprising mixtures of contingency tables. The mixing proportions are unobserved. LCA handles this unobserved heterogeneity by simply separating the mixture into its constituent parts, treating the mixing proportions as a latent variable. The fundamentals of LCA are similar to those of other latent variable approaches. For example, there are immediate parallels with factor analysis. While both techniques can be applied (perhaps with small modifications) to exactly the same types of observable data, the key difference is that LCA estimates a single nominal latent variable, whereas factor analysis estimates one or more continuous (i.e., cardinal) latent variables.

We first introduce the DGP in its simplest form by using a particularly strong assumption on how firms answer questions (the local independence of all responses, introduced immediately below) and then afterwards partially relax this assumption. For readers wanting to understand LCA at a more intuitive level, Appendix B uses a simple fictitious example, framed in this paper's context, to introduce the underlying approach.

11 Due to the high item-response rates, we omit from our analysis any observation that has at least one "Don't Know" as a response to any of the set of six sub-questions.

III.1 The Simplest LCA Data-Generating Process
In our data, firms were asked a series of sub-questions about the effectiveness of six different mechanisms of supporting agreements with responses restricted to a 5-point Likert scale. More generally, each firm, $i$, reports on the effectiveness of $K$ mechanisms. The response vector is denoted $y_i = (y_{i1}, \ldots, y_{iK})$, which is observed. The response for mechanism $k$, $y_{ik}$, can take on one of $R$ values. In our data, $K = 6$ and $R = 5$. We observe separate $y_i$'s for relations with customers and relations with suppliers, but since we keep the analysis of each type of relations entirely separate, we use only one $y_i$ in specifying the DGP.
Firm $i$ chooses one of a number of governance structures. That is, $i$ is in one category (class) of a single nominal latent variable, $c = 1, \ldots, C$. The latent class of $i$ is denoted $c_i$, which will be estimated. As the latent variable is nominal, the estimated classes are not ordered along any single dimension, allowing full flexibility over the features underlying each class.
A standard approach in parsing the data using LCA is to assume that for any given firm the probability of choosing one of the five responses for mechanism k is conditionally independent of the probability of choosing one of the five responses for m when m ≠ k. This is called the local independence assumption. The strongest version of this assumption-local independence across all six sub-questions-leads to a particularly simple, intuitive form for the DGP.

Denote by $\rho_{kr|c}$ the probability that a firm in latent class $c$ chooses answer $r$ concerning mechanism $k$. Denote by $\gamma_c$ the probability that a firm is in latent class $c$. Then the probability of observing a specific vector of responses, $y_i$, for firm $i$ is:

$$P(y_i) = \sum_{c=1}^{C} \gamma_c \prod_{k=1}^{K} \prod_{r=1}^{R} \rho_{kr|c}^{\,I(y_{ik} = r)} \qquad (1)$$

where the exponent, $I(y_{ik} = r)$, on $\rho_{kr|c}$ is an indicator function that equals 1 if $y_{ik} = r$, and 0 otherwise. The parameters $\rho_{kr|c}$ and $\gamma_c$ are to be estimated. This DGP satisfies the local independence assumption across all six sub-questions because, conditional on $c_i$, $y_{ik}$ is independent of $y_{im}$ for all $k \neq m$.
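As a minimal sketch of this DGP, the probability of a response vector under full local independence is a mixture, over classes, of products of class-specific response probabilities. The names `gamma` and `rho`, the 0-based response coding, and all numerical values below are hypothetical assumptions chosen for illustration; they are not estimates from the paper.

```python
from itertools import product
from math import prod

def pattern_probability(y, gamma, rho):
    """P(y) under full local independence.

    y     : list of K responses, each coded 0..R-1
    gamma : list of C class probabilities (summing to 1)
    rho   : rho[c][k][r] = P(response r on mechanism k | class c)
    """
    return sum(
        g * prod(rho[c][k][r] for k, r in enumerate(y))
        for c, g in enumerate(gamma)
    )

# Hypothetical illustration: 2 classes, 2 mechanisms, 3 response categories.
gamma = [0.6, 0.4]
rho = [
    [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],  # class 0
    [[0.2, 0.3, 0.5], [0.3, 0.4, 0.3]],  # class 1
]

p = pattern_probability([0, 2], gamma, rho)  # 0.6*0.7*0.7 + 0.4*0.2*0.3
# Probabilities over all possible response patterns should sum to one.
total = sum(pattern_probability(list(y), gamma, rho)
            for y in product(range(3), repeat=2))
print(p, total)
```

Selecting `rho[c][k][r]` at the observed response plays the role of the indicator exponent: only the probability of the answer actually given enters the product.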

III.2 Relaxing Full Local Independence
Exceptions to using local independence for all six sub-questions can be built into LCA, and there are reasons why this might be advantageous in practical applications. 13 For example, in the piloting of the survey, we found that respondents had some degree of difficulty in separating the notions of mutual interest and personal trust. It is reasonable to assume, therefore, that errors in the responses for these two mechanisms are correlated. Moreover, when sub-questions about somewhat similar mechanisms are asked just before or just after each other, respondents might not exert the cognitive effort to distinguish their responses, answering the successive sub-questions in a similar way. 14 If one does not take into account the tendency for pairs of responses to contain partially the same information, then LCA will give too much weight to these responses (Vermunt and Magidson, 2002: 95). The analogy to weighted least-squares is immediate.
To formulate the relaxation of full local independence, split the $K$ mechanisms into $H$ subsets. Conditional on the latent class of the firm ($c_i$), responses on sub-questions about two mechanisms not in the same subset satisfy local independence. This is not the case for two responses within the same subset. Let $y_{ih}$ be the vector of firm $i$'s responses on the mechanisms in the $h$th subset. $y_{ih}$, $h = 1, \ldots, H$, is observed, with each $y_{ih}$ being a sub-vector of $y_i$.
Denote by $f_h(y_{ih} \mid c)$ the pdf of $y_{ih}$ given $c$. Then the probability of observing a specific response vector, $y_i$, for firm $i$ is:

$$P(y_i) = \sum_{c=1}^{C} \gamma_c \prod_{h=1}^{H} f_h(y_{ih} \mid c) \qquad (2)$$

Estimates of $f_h(\cdot \mid \cdot)$ and the class probabilities $\gamma_c$ are obtained by maximizing the following likelihood: 15

$$L = \prod_{i=1}^{N} \left[ \sum_{c=1}^{C} \gamma_c \prod_{h=1}^{H} f_h(y_{ih} \mid c) \right]^{w_i} \qquad (3)$$

where $N$ is the number of firms in the sample and the $w_i$ denote the standard sampling weights included in the WBES. Use of the sampling weights implies that our estimates are representative of the entire WBES universe of firms in the six countries. The formulations of the DGP in (2) and (3) illustrate the general features of LCA. Note that the elements of $y_i$ can be any type of variable (ordinal, nominal, continuous, or counts), but the latent variable, $c_i$, is discrete and nominal.

13 It is at this point that our application of LCA diverges from that of Mike and Kiss (2019).

14 This phenomenon is sometimes known as "nondifferentiation of responses" or simply "nondifferentiation" in the survey design literature (e.g. Krosnick and Alwin, 1988).

15 We use the Latent GOLD software (Vermunt and Magidson 2016). Latent GOLD uses both the EM and the Newton-Raphson optimization algorithms. It first relies on EM to exploit EM's stability even when the solution is far away from an optimum. When close enough to an optimum, Latent GOLD switches to Newton-Raphson, exploiting its speed.
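The objective being maximized can be sketched as follows. With sampling weights, the logarithm of the likelihood is a weighted sum of log-probabilities, where each probability mixes over classes and multiplies across subsets of mechanisms rather than across individual mechanisms. The sketch only evaluates this objective for given parameters; it does not perform the EM or Newton-Raphson maximization, and it is not Latent GOLD's implementation. The names `gamma`, `subsets`, `f`, and all numbers are hypothetical assumptions.

```python
from math import log

def response_probability(y, gamma, subsets, f):
    """P(y) when local independence holds only across subsets.

    subsets : list of tuples of mechanism indices, e.g. [(0, 1), (2,)]
    f       : f[c][h] maps the tuple of responses in subset h to its
              joint probability given class c
    """
    total = 0.0
    for c, g in enumerate(gamma):
        term = g
        for h, idx in enumerate(subsets):
            term *= f[c][h][tuple(y[k] for k in idx)]
        total += term
    return total

def weighted_loglik(data, weights, gamma, subsets, f):
    """Sampling-weighted log-likelihood: sum_i w_i * log P(y_i)."""
    return sum(w * log(response_probability(y, gamma, subsets, f))
               for y, w in zip(data, weights))

# Hypothetical example: 2 classes, 3 binary mechanisms. Mechanisms 0 and 1
# share a subset, so their responses may be dependent within a class.
gamma = [0.5, 0.5]
subsets = [(0, 1), (2,)]
f = [
    [{(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}, {(0,): 0.8, (1,): 0.2}],
    [{(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.5}, {(0,): 0.3, (1,): 0.7}],
]
data = [[0, 0, 0], [1, 1, 1]]
weights = [2.0, 1.0]  # hypothetical sampling weights
ll = weighted_loglik(data, weights, gamma, subsets, f)
print(ll)
```

Storing each subset's joint distribution as a lookup table is the most general choice for categorical responses; more restrictive parameterizations of within-subset dependence are possible and would use fewer parameters.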

III.3 Incorporating Determinants of Class Membership
One of the purposes of using LCA is to understand which factors determine the class to which a firm belongs. This is Williamson's (1991) discriminating-alignment research agenda. While the focus of the current paper is discovering governance structures and exploring their characteristics, we do aim to provide readers with tools that can be used to link the governance structures with their potential determinants. Therefore, in Section VI we conduct some preliminary exercises relating class membership to firm characteristics.
Equation (2) is easily modified for this change. $\gamma_c(x_i)$ is the probability of membership in latent class $c$ given that the firm has characteristics $x_i$. These may include features of the firm (e.g. size) as well as the environment or context in which it operates (e.g. culture). Then the probability of observing a specific response vector, $y_i$, for firm $i$ with characteristics $x_i$ is:

$$P(y_i \mid x_i) = \sum_{c=1}^{C} \gamma_c(x_i) \prod_{h=1}^{H} f_h(y_{ih} \mid c) \qquad (4)$$

With this model, one estimates the functions $f_h(\cdot \mid \cdot)$ and $\gamma_c(\cdot)$.
If (4) is the preferred model, there are two routes to estimation. One obvious choice is to form a likelihood from (4) and estimate the $f_h(\cdot \mid \cdot)$ and the $\gamma_c(\cdot)$ directly. Alternatively, one could proceed in a 3-step process. First, maximize the likelihood (3) and estimate $f_h(\cdot \mid \cdot)$ and the $\gamma_c$. Then, use Bayes' theorem to estimate firm-specific class membership probabilities for each firm, $\hat{p}_{ic}$. Finally, estimate the functions $\gamma_c(\cdot)$, $c = 1, \ldots, C$, using $\hat{p}_{ic}$ and $x_i$.
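The second step of the 3-step process can be sketched directly from Bayes' theorem: the posterior probability that a firm belongs to a class is proportional to the class probability times the probability of the firm's responses given that class. For brevity the sketch assumes full local independence, and the names `gamma`, `rho`, and all values are hypothetical. It does not implement the third step, for which the cited literature describes corrections that this sketch omits.

```python
def posterior_class_probabilities(y, gamma, rho):
    """Bayes' theorem: P(class c | y) is proportional to gamma_c * P(y | class c).

    y     : list of K responses, each coded 0..R-1
    gamma : list of C class probabilities
    rho   : rho[c][k][r] = P(response r on mechanism k | class c)
    """
    joint = []
    for c, g in enumerate(gamma):
        lik = 1.0
        for k, r in enumerate(y):
            lik *= rho[c][k][r]   # local independence within class c
        joint.append(g * lik)
    total = sum(joint)            # marginal probability of y
    return [j / total for j in joint]

# Hypothetical illustration: 2 classes, 2 mechanisms, 3 response categories.
gamma = [0.6, 0.4]
rho = [
    [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],  # class 0
    [[0.2, 0.3, 0.5], [0.3, 0.4, 0.3]],  # class 1
]
post = posterior_class_probabilities([0, 2], gamma, rho)
print(post)
```

In practice the parameters plugged into this calculation would be the step-1 estimates, so the resulting posteriors inherit their sampling error.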
Our choice of estimation method is the 3-step process. There is a large literature, both theoretical and applied, reflecting on this choice. From theory, there are procedures to obtain consistent estimates of $f_h(\cdot \mid \cdot)$ and $\gamma_c(\cdot)$ using the 3-step process (Vermunt 2010, Bakk et al. 2013, Bakk et al. 2014). The applied literature suggests that using the 3-step process is advisable unless one has great confidence in the specification of (4), especially understanding which $x_i$ to include and exclude (Nylund-Gibson and Masyn 2016). The use of the 3-step process is advised because many of the relevant $x_i$ remain unmeasured or even unknown when using cross-country data and because the discriminating-alignment research program (identifying the $x_i$'s) is still a work in progress.
Our decision to use the 3-step procedure also rests on the primary objective of this paper: understanding the nature of the classes themselves rather than the determinants of class membership. We are most interested in characterizing the most common governance structures that appear in general (step 1). By estimating the classes in general, independently of the determinants of class membership, we are able to focus on this goal and provide readers with results that are unencumbered by any more ambitious objectives. Moreover, by providing the Bayesian posterior probabilities produced at step 2 of the 3-step procedure (the $\hat{p}_{ic}$), we make it possible for others to conduct step-3 analyses, selecting their own $x_i$'s from the copious data available to all from the WBES, or using data from other favored sources.

III.4 Selecting a Specific LCA Model and Evaluating its Properties
Implementation of step 1 of the 3-step process entails estimation of the response probabilities and the class-membership probabilities by maximizing the likelihood at (3). LCA is typical of latent-variable models in that implementation requires making many detailed decisions on choices of specification. In most existing practical applications, choosing the number of classes is most important, and usually the number of options considered is kept relatively small. However, the many possibilities for relaxation of local independence increase the options in specifying the DGP enormously. In this subsection, we provide an overview of the procedures we use to choose our favored model, relegating details to Appendix C. Detailed background information can be found in Collins and Lanza (2010), Masyn (2013), and Vermunt and Magidson (2016). Settling on the details of model specification moves through three stages. First, statistical measures of model-fit are used as criteria to choose a very small set of satisfactory models, perhaps even one model. Second, the results of those model(s) are evaluated using more subjective criteria. Parsimony, the use of a simple model with fewer parameters, is important in order to avoid over-fitting and to facilitate meaningful interpretation.
Third, class homogeneity and separability are checked using standard measures. Homogeneity is the notion that the members of a specific class should exhibit similar characteristics or, equivalently, that there are certain configurations of responses typifying each of the classes. Separability is the notion that each class looks quite different from all other classes, or, equivalently, that there are certain configurations of responses that distinguish each class from the others. In all three steps, statistical measures are used as model-selection criteria.
Before beginning the estimation process, it was necessary to make three preliminary decisions. First, we chose to treat firms' mechanisms for making agreements with suppliers and customers separately. This decision followed from the judgment that firms might employ very different types of strategies in conducting upstream relations than downstream ones. After all, the firm's objectives in the two activities are very different: for the former, it is primarily about securing timely delivery at an appropriate level of product quality; for the latter it is primarily about getting paid by a satisfied customer. Because of this, our survey asked separate questions vis-à-vis supplier relations and customer relations. The large samples meant that sufficient statistical power could be generated in two separate statistical analyses. 16 The second decision was to choose the numbers of classes to be considered in model selection. Invoking parsimony, we focused on 3-, 4-, 5-, and 6-class specifications. Models with greater numbers of classes were not considered as the number of parameters would exceed 250, leading to the risk of over-fitting. Moreover, robustness results outlined in Appendix C show that there was no need to consider a wider variety of numbers of classes.
The third decision was whether to consider LCA models embodying a relaxation of local independence as described in III.2, or to keep local independence across all six sub-questions. Relaxing local independence results in a proliferation of design possibilities. Our six responses on mechanisms can form a total of 15 unique pairs, with 32,766 combinations of these pairs possible. 17 Dealing with this number of possible models is obviously untenable. We thus chose to look initially where theory and the observations from survey implementation point us. As mentioned previously, the responses on mechanisms numbers 1 (trust) and 2 (mutual interest) may be related. From the cognitive interviews conducted prior to the survey, we learned that individuals sometimes did not clearly distinguish assistance of government officials (mechanism 4) from intervention of other third-parties (mechanism 5). As a result, it seems natural to consider model specifications that relax independence of these two responses as well.
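The counts quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes, as footnote 17 states, that at least one pair of responses must remain locally independent, so the specification that relaxes all 15 pairs is excluded, as is the empty set (full local independence). The variable names are illustrative only.

```python
from math import comb

n_mechanisms = 6
# Number of unordered pairs that can be formed from the six responses.
n_pairs = comb(n_mechanisms, 2)

# Number of ways to choose k pairs to relax, for k = 1, ..., 14.
# k = 0 (no relaxation) and k = 15 (every pair relaxed) are left out.
n_specs = sum(comb(n_pairs, k) for k in range(1, n_pairs))

print(n_pairs, n_specs)  # prints: 15 32766
```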
Taking into account the various combinations of the relaxation of local independence and the four possibilities for the number of classes, there are already 16 models to consider. With this starting point, we conducted an empirical exploration of whether there was a need to further relax the local independence assumption. To do this, we estimated the 16 models and examined the size of bivariate residual correlations, a measure of the marginal increase in the log-likelihood function that could be obtained by any specific relaxation of the local independence assumption (Vermunt and Magidson, 2016: 83-5). We then observed which particular combinations of mechanisms had bivariate residual correlations that were most prominent in this set of models. Appendix C.1 reports the details.
The exploratory results suggested that answers to adjacent sub-questions are related. As already noted, this may be due to non-differentiation of responses. This effect is known to be smaller in face-to-face surveys (Holbrook et al., 2003; Heerwegh and Loosveldt, 2008), which may be a reason that only adjacent responses, and not the full series of responses, seem correlated in our data. 18 Based on these correlation patterns, we chose to consider a total of 20 models for each of customer relations and supplier relations.
Providing the details of the process of model selection is necessary for completeness, but understanding these details adds little when simply absorbing the substantive findings. Interested readers can consult the relevant appendixes, which describe the choice of the set of models that we consider (Appendix C.1), the statistical criteria that are used to compare the performance of the different models (Appendix C.2), and the final choice of the preferred models, one for customer relations and one for supplier relations (Appendix C.3). Appendix C.4 includes an analysis of robustness of our substantive conclusions, a comparison of the characteristics of the classes that are produced by our preferred models with the characteristics of the next-best models. 19 This process of model selection led us to estimate four latent classes for each of customer and supplier relations. That is, we conclude that four governance structures adequately describe the choices that firms make when constructing arrangements to enforce agreements.
17 $32{,}766 = \sum_{k=1}^{14} \binom{15}{k}$. LCA needs at least one local independence assumption for identification, hence the 14 in the summation.
18 These correlations might also reflect an "anchoring effect" (e.g. Furnham and Boo, 2011), where a subsequent response is biased towards a previously selected response; or "straight-lining" (Kaminska, McCutcheon, and Billiet, 2010), where respondents give the same answer to many consecutive, if not all, questions, visually indicated by the appearance of a "straight line" of responses. Notably, the second effect should be minimal in our data as the interview was conducted face-to-face by trained interviewers without the respondents seeing the screen with possible answers.

IV. The Estimated Model: Class Characteristics
The assignment of names to the classes is a crucial, substantive element of the analysis because important insights are generated only if LCA uncovers readily recognizable types of governance structures that appear in all countries. Finding resonance between our estimates and existing ideas and concepts provides additional substantive validation of the analysis.

IV.1 Characteristics of Chosen Models
The naming of classes builds primarily on an examination of the estimated mechanism response probabilities. In the notation of Subsection III.1, these are the class-conditional response probabilities: the estimated probability of choosing response r for mechanism k if the firm is in class c. 20 Tables 2a and 2b show these estimated probabilities for relations with suppliers and customers, respectively. Both tables are accompanied by graphical representations of these estimated probabilities. In the tables, classes are labeled in two ways, by a number and a name. The numbers are purely artefacts of the estimation process: they reflect the order in which the estimation extracts the classes. In the following subsection, we provide justifications for the class names. As the classes uncovered by LCA are nominal, they are not ordered along any single dimension, resulting in class names that do not follow a single scale or continuum.
The tables of estimated probabilities include standard errors. The probabilities are quite precisely estimated. Most estimated probabilities do not lie in the 95% confidence intervals of either their vertical or horizontal neighboring probabilities. This implies that easily discerned differences in the figures are almost certainly statistically significant differences.
19 Notably, both the first- and second-best models involve fewer classes than the maximum considered, suggesting no need to consider less parsimonious models (i.e. 7 or more classes).
20 The class-conditional response probabilities appear explicitly in the DGP given in equation (1), which satisfies local independence, but not in the DGP in equation (2), which relaxes full local independence and which is the one we use. Therefore, each should be interpreted here as the marginal probability that a firm in latent class c chooses answer r on question k.

IV.2 Class Names
This subsection argues that the following names capture the nature of the governance structures that firms in the six countries use to support their transactions.

          Relations with Suppliers             Relations with Customers
class 1   Pure bilateralism                    Pure bilateralism
class 2   Bilateralism with private support    Bilateralism with private support
class 3   Bilateralism with legal support      Bilateralism with weak support
class 4   Strong comprehensive governance      Weak comprehensive governance

The class characteristics look remarkably similar for suppliers and customers, even though the estimation for suppliers and customers is entirely separate. The nature of class 1 for both upstream and downstream relations is transparent and is the same for both types of relations: only trust or mutual interest are endorsed. Both class 1's are pure bilateralism. The use of 'pure' is emphasized as a contrast to the remaining classes, which differ primarily in what they add to bilateralism. In this way, the classes may seem ordered, implying that each subsequent class builds upon the previous. However, this is not the case, as classes uncovered by LCA are nominal and do not necessarily lie along any single dimension.
Turning to the bottom of the list, for both class 4's there are significant contributions from all mechanisms. For upstream relations, firms find the legal system as effective as any other mechanism, with governmental officials and third-parties both used almost as much as each of the two bilateral mechanisms. This governance structure is one where a full set of mechanisms is used. For class 4 of relations with suppliers, where every single mechanism is rated as effective as in every other class, we use the label strong comprehensive governance. However, for downstream relationships weak comprehensive governance is more appropriate given that all mechanisms are less effective (within the class 4's) for customer relations than for supplier relations.
In class 2, the two bilateral mechanisms are as important as in class 1 (pure bilateralism), but all other mechanisms also show a notable presence. Among the non-bilateral mechanisms, paid, private dispute resolution is the most important, followed by the legal system. This is consistent with how paid private third-parties often work in practice. Arbitration mechanisms need the backing of formal legal enforcement; the job of goons is often simply to remind miscreants of the possibility of legal sanctions; debt-collection firms invoke legalistic mechanisms while harassing. We thus use the name bilateralism with private support for the firms in class 2, remembering that only brevity precludes mentioning the secondary role of legal mechanisms.
It is for class 3 that there is a need to distinguish clearly between upstream and downstream governance structures. For supplier relations, there is a contribution from the two bilateral mechanisms, but it is weaker than in other supplier classes. Other mechanisms have a significant presence, with the legal system being the most important non-bilateral mechanism in this class; this is followed by the role of paid private dispute resolution. Thus, there is some complementarity between these two mechanisms, but this class affords greater importance to the legal system. That is, for supplier relations, class 3 differs from class 2 primarily in the relative emphasis on these two. We thus use the name bilateralism with legal support for class 3 on the upstream side, remembering that only brevity precludes mentioning the role of paid private dispute resolution.
For class 3 for customer relations, there is a contribution of the two bilateral mechanisms, but it is weaker than in all other classes on the customer side. The way in which this downstream class 3 differs from that on the upstream side is that the contribution of the non-bilateral mechanisms is quite weak. Thus, we name this class bilateralism with weak support, recognizing that among all eight estimated latent classes, this is the governance class where the aggregate effect of all 6 mechanisms is rated lowest by respondents. Compared to other classes, the label 'ineffective governance' might also be appropriate.

IV.3 What Has Been Learned on the Practices of Transactional Governance?
The identification and naming of the classes not only reveals which governance structures are used in practice, but also which possibilities are absent. All governance structures rely, at least in part, on bilateral mechanisms: no firm relies solely on a combination of third-parties and formal institutions. 21 This flies in the face of many claims in the economics and business literatures that characterize development as a process of escaping personalized interaction and moving to a rule-based, impersonalized set of interactions. 22 While our data do not capture the process of development, they do show evidence for countries at different levels of development, and there is no evidence of the existence of those purely rule-based, impersonalized transactions in the set of countries that we analyze.
A corollary of this is that bilateralism and the legal system should not be viewed as substitutes: indeed, in several of the classes they play highly complementary roles. 23 There is also evidence that paid private dispute resolution and the legal system are sometimes substitutes and sometimes complements. For example, for supplier relations, when moving from the pure bilateral class to any of the three other classes there is an increase in the use of both paid private dispute resolution and the law. But, as indicated by their very names, a move from bilateralism with private support to bilateralism with legal support indicates substitutability between private and legal support.
21 This is also a finding of Mike and Kiss (2019) for Hungary: "Law never stands alone."
22 Mike and Kiss (2019) characterize this as the classical view, and give many references to its use. For a very widely cited version of this view in the business economics literature, see Peng (2003: 276), which claims that the most important transition for emerging economies is the process of moving "from a relationship-based, personalized transaction structure calling for a network-centered strategy to a rule-based, impersonal exchange regime". To be sure, there are fields of enquiry where this is not the case, 'law and society' for example.
23 For a long time, the dominant view in the economics and business literatures was that the use of formal legal arrangements for transactions was inconsistent with the use of personalized relationships based on trust: the formality eroded the trust. But this view has been moderated somewhat, especially after Poppo and Zenger's (2002) seminal contribution. Our results are consistent with the changing view but are based on a broader overview of existing governance structures than any existing contribution to the literature.
Two further facets of the data are worth noting because of their contrast to emphases in the existing economics literature. First, in that literature, the role of government officials in supporting transactional governance is almost entirely neglected. 24 Yet, for several of our classes, government officials are awarded an important role, and in the strong comprehensive governance class of supplier-relations, they have a prominent role. Second, if one were to judge the importance of non-paid private third-parties by the amount of attention paid to them in the literature, especially in the study of networks, one would imagine that they are used a great deal. 25 Yet, in none of our classes do non-paid third-parties play any significant or defining role, and on the suppliers' side they are almost irrelevant to transactional governance.
Lastly, we examine LCA estimates of the proportion of firms placed within each class. Table 3 presents the probabilities of class membership directly estimated by maximum-likelihood (the unconditional class-membership probabilities in the notation of Section III). All class membership sizes are significantly different from 0. Firms are more willing to turn to private dispute resolution when dealing with customers than when dealing with suppliers. There are more purely bilateral firms on the supplier side than the customer side, but supplier relations generally also involve more use of the legal system than customer relations. Note also that a quick visual comparison of the figures summarizing Tables 2a and 2b reveals that, in general, firms rate mechanisms as less effective for customer-relations than for supplier-relations. This characteristic is epitomized in the two class-4 names: strong comprehensive governance and weak comprehensive governance. In fact, the difference between supplier-relations and customer-relations is one finding that will surface repeatedly in the remainder of the paper.
Our results are broadly consistent with those of Mike and Kiss (2019). Given that these authors use different survey questions, study a different context (Hungary), and implement LCA in a different way, such consistencies point to robust general conclusions about landscapes of transactions. Both this paper and that of Mike and Kiss (2019) find that bilateral mechanisms are important in all business relationships; that the key governance choice is between bilateralism alone, or bilateralism supplemented with other mechanisms; and that there are a significant number of firms that implement comprehensive governance. Nevertheless, there are differences between the two studies. Mike and Kiss (2019) find a latent class in which third-party reputational mechanisms are quite important. Whether this is a reflection of the different context, Hungary, or of different survey questions is an open question, to be answered only by implementing one of the key ingredients of the current paper, a consistent cross-country methodology.

V. Variations in the Use of Governance Structures
In this section, we study associations between the governance structure of firms and their other behaviors or characteristics. This is an exploratory venture in using our methodology to understand the links between governance structures and the broader environment of the firm. We simply explore patterns in the data and do not attempt to isolate ceteris paribus, causal effects of single variables. The latter would need a separate paper in itself.
To illustrate the type of thought experiments explored here, consider firm size. We compare the pattern of governance structures used by small firms with that used by large firms. We show the resultant change in the choice of governance structures as a firm becomes large for any reason and then simultaneously goes through all other changes associated with the differences between small and large firms. In terms of the notation of Section III, we estimate the class-membership functions one covariate at a time without considering why the covariates vary with i. 26
By implementing steps 1 and 2 of the 3-step method outlined in Section III, one obtains estimated posterior probabilities of membership for each firm in each of the four governance structures. Then the most natural way of exploring the class-membership functions would be to regress these probabilities on each covariate of interest. However, Bolck, Croon, and Hagenaars (2004) showed that this naïve approach leads to systematic underestimation of the strength of associations. These authors developed a correction procedure to eliminate this bias. Vermunt (2010), Bakk, Tekle and Vermunt (2013), and Bakk, Oberski, and Vermunt (2014) extended the correction procedure, developing a maximum-likelihood method that produces consistent estimates of the parameters defining the class-membership functions. We apply their method in implementing our step-3 analysis. 27
With the richness of the WBES data and the complex origins of the governance structures used by firms, it is challenging to select a manageable set of covariates that are particularly germane. While some covariates are obviously crucial to examine, e.g. country or sector, others are less so, e.g. a firm's experience of corruption. Recognizing the exploratory nature of the exercise, we selected a set of variables that piqued our curiosity, without requiring a precise theory.
Our interest is mainly in checking the validity of the estimated transactional governance structures by examining whether there are significant associations between governance choices and likely covariates. We study the associations between the use of the four governance structures and each of the covariates, one covariate at a time. We report Wald p-statistics in the second columns of Tables 4a and 4b. Each of these statistics is constructed to test the compound hypothesis that the variable listed in column 1 (a covariate in the notation of Section III) has no explanatory power for the estimated posterior class-membership probabilities of the four classes.
Interpreting these statistics involves a multiple comparisons problem: which criteria to apply when judging statistical significance? This depends upon the insights that the reader hopes to draw, that is, on the hypothesis being tested. One natural null hypothesis is that our estimated class probabilities are no better than random in terms of their relationship with the whole set of variables listed in Table D.1. Then, the appropriate approach is to apply a family-wise error rate (FWER) method. We use the Holm-Bonferroni method (Holm 1979), reporting criteria for statistical significance in the rightmost three columns of Tables 4a and 4b. A significant value for even one p-statistic in these columns is evidence of better-than-random performance for the LCA procedure.
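For readers unfamiliar with the Holm-Bonferroni step-down rule, the following minimal sketch shows how it works. The p-values are hypothetical and the function name `holm_bonferroni` is an illustrative choice. This is not the code used to produce Tables 4a and 4b.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-down rule: compare the smallest p-value with alpha/m, the next
    with alpha/(m-1), and so on, stopping at the first non-rejection.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also not rejected
    return reject


# Hypothetical p-values for five covariates.
p_values = [0.001, 0.012, 0.028, 0.035, 0.20]
holm_rejected = holm_bonferroni(p_values)
print(holm_rejected)
```

With these hypothetical inputs only the two smallest p-values survive the family-wise criterion. Under the logic described above, a single rejection would already count as evidence against the no-better-than-random null.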
The null hypothesis that our estimated class probabilities are no better than random is rejected decisively. This is the case for both relations with suppliers and with customers. This rejection of the null hypothesis provides overall support for the validity of the method developed in this paper, including the formulation of the survey questions and LCA's interpretation of the data.
Given the confidence that we have that our posterior-probability data are better than random, we can proceed to examine hypotheses on individual variables using a criterion that has more power than the FWER. However, using standard criteria applied to the highest values of a set of test statistics violates the conditionality assumptions of standard tests. Therefore, we use the false discovery rate (FDR). If the FDR is set at 5%, for example, significance levels are set so that, in expectation, at least 95% of the individual-variable effects labeled as significant are inconsistent with the null hypothesis of no effect. We use the Benjamini-Hochberg (1995) version of FDR in columns 3 through 5 of Tables 4a and 4b. These columns are most relevant to readers with no prior theoretical hypotheses.
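The Benjamini-Hochberg step-up rule can be sketched in the same spirit. The p-values below are hypothetical and the function name `benjamini_hochberg` is an illustrative choice. The sketch is meant only to show why an FDR criterion typically rejects more hypotheses than a family-wise criterion, not to reproduce the paper's computations.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, smallest p first) with
    p_(k) <= (k / m) * q, then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank  # keep the largest rank that satisfies the bound
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values for five covariates.
p_values = [0.001, 0.012, 0.028, 0.035, 0.20]
bh_rejected = benjamini_hochberg(p_values)
print(bh_rejected)
```

With these hypothetical inputs four of the five effects are labeled significant at an FDR of 5%. A Holm-Bonferroni criterion at the 5% level would reject only the two smallest p-values, which illustrates the gain in power.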
Lastly, some readers might come to this paper interested in a specific a priori hypothesis. These readers should focus on the standard statistical criteria for the corresponding variable in the second column of Tables 4a and 4b. The problem of multiple comparisons is not relevant to them. But note that if the information in Tables 4a and 4b is viewed before the formulation of a specific theory, the resultant hypothesis is no longer a priori, and the conditionality assumptions in standard tests of significance are no longer valid.

V.1 Cross-Country and Cross-Regional Variations in Governance Structures
We find a notable and statistically significant variation in the use of governance structures across countries. Figures 1a and 1b illustrate this variation. In these figures, and all that follow in this section, we use darker colors to denote governance structures that are more complex, that is, use more mechanisms more effectively (reflecting the estimated response probabilities in Tables 2a and 2b). 28 Thus, for example, it is easy to see the rather surprising result that governance structures that include more than just bilateralism are more effective in Bolivia, the least developed of the six countries, than in all other countries.
To facilitate interpretation of these results, Table 5 lists some standard statistics on the six countries, together with regional and global averages. Nevertheless, none of the statistics on the absolute quality of legal institutions in that table prepare us for the surprising result on Bolivia. Given the low levels of personal trust in Bolivia, it is tempting to think that this result, instead, might be a reflection of comparative, rather than absolute, advantage in the legal realm. 29 This cannot be a complete explanation. Note that the first two steps of the LCA do not differentiate between Bolivian firms and, for example, Uruguayan firms. Therefore, the greater effectiveness indicated by the LCA for legal institutions in Bolivia than in Uruguay is inconsistent with the much higher ratings for legal institutions in the latter country indicated in Table 5. This is a puzzle that needs further investigation.
We next look at within-country, regional variation in governance structures. We do so by applying the third step of the 3-step method to each country separately, using regional dummy variables as covariates. Tables 6a and 6b report p-values analogous to those in Tables 4a and 4b. The results on FWER strongly reject the hypothesis that our estimated governance structures simply reflect random variation. For supplier-relations, using the FDR we reject the null hypothesis of no association between governance structures and regions at the 5% level for all countries except Bolivia. In contrast, for customer-relations, only Ecuador exhibits significant within-country regional variation in governance structures using the FDR.

Figures 2a and 2b show inter-regional variation in governance structures for those countries where we find statistically significant variation. The prevalence of bilateralism varies enormously. For example, an average firm in Rosario (in Argentina) is 22 percentage points more likely to use pure bilateralism in its relations with suppliers than an average firm in the neighboring region of Cordoba. The difference on the customer side is even starker: 44%. Piura (in Peru) has the lowest level of pure bilateralism amongst any of the 17 regions in Figure 2a, even though Peru has the highest level of pure bilateralism of the six countries in Figure 1a.
Perusing all the tables and figures relating to country and regional variation, it is an inescapable conclusion that inter-regional variation is even more important than cross-country variation. For example, the standard deviation of the percentage of bilateralism in Figure 2a is greater within the regions of each of Argentina, Ecuador, Peru, and Uruguay than it is for countries in Figure 1a. Despite the fact that legal systems are country-level institutions in the six nations we study, regions, rather than countries, might be the best unit of analysis for conducting reform aimed at improving transactional mechanisms.

V.2 Attitudes Towards Courts
We examine two standard questions that appear in every WBES and have often been used as measures of court performance. The first ("fair-court") asks whether the respondent agrees or disagrees with the statement "the court system is fair, impartial and uncorrupted". The second asks whether the courts are an obstacle to the current operations of the firm ("court-as-obstacle").
Figures 3a and 3b show the patterns in the data. On the suppliers' side, consistent with our intuition, firms considering the court fair are more likely to employ governance structures with a stronger legal element. This relation is weaker on the customers' side, where there is little association between attitudes about the courts and the use of the law. The conclusion is that the fair-court question is not a reliable indicator of a firm's commitment to a legally-oriented governance strategy.
For the court-as-obstacle question, the firms that do not consider the court as an obstacle are the least likely to rely on the legal system. In contrast, as the assessment of the court as an obstacle increases, there is more reliance on governance mechanisms that involve legal systems. If one viewed this question as a measure of court quality, one would expect exactly the opposite association. 30 The most likely explanation of this apparent paradox is reverse causality: if firms do not choose to use the legal system, then the courts are not an obstacle. The firms that need the legal system are more likely to be hindered by its flaws. That is, the interpretation of answers to this court-as-obstacle question in the literature seems to be diametrically opposite to what it actually reflects. Our conclusion here is consistent with observations on data on Russia's early transition made by Hendley et al. (2000) and explored thoroughly in papers by Hendley (2016;2017): because going to court is inherently an unpleasant experience, attitudes to the courts are not good predictors of the use of the law.

V.3 Interactions with Business Associations
Figures 4a and 4b illustrate the correlation between business-association membership and governance structures. In all cases, firms with stronger ties to business associations are more likely to rely on bilateralism with private support. But, as shown in Tables 4a and 4b, this observation is backed by only weak statistical support, and only on the customer side. Perhaps what the data are showing here is that business associations are important in somewhat niche activities within particular sectors (Bernstein 2001), but not important generally in those sectors. Our test is too low-power to reflect such niche relationships.
30 Gutmann and Voigt (2017) use the courts-as-obstacles question as a dependent variable that is viewed as a proxy for the quality of the courts. See also the following from World Bank (2014) on survey results for the Kyrgyz Republic: "Courts are perceived as one of the least problematic areas for doing business…In 2013, only 13 percent of firms saw courts as a problem, and only 4 percentage points of respondents saw it as major or very severe problem…. This is a significant improvement compared to 2008 when 60 percent of firms saw courts as a problem and 29 percent saw them as a major/severe problem." Note that over the same period, there were declines in the percentages of firms believing that the court system is fair, impartial, and uncorrupted, quick, and able to enforce its decisions.

V.4 Sectors
As the statistical tests in Tables 4a and 4b show, governance classes do vary significantly between sectors. 31 Figures 5a and 5b illustrate this variation, which is substantial. For example, the use of bilateralism varies from 74% when food processors interact with their suppliers to 24% in the sales of construction companies. One conjecture on this difference immediately follows from Williamson's emphasis on frequency: the more frequent are exchanges, the easier it is to construct purely bilateral governance. In their sales, construction companies use governance structures that employ private, paid, dispute resolution and the legal system. This is consistent with Williamson's emphasis on more complex governance when exchange is infrequent and involves idiosyncratic interactions (Williamson 1985).

V.5 Management Practices
The effects of firms' management practices are an important avenue of investigation currently in economics (Bloom et al. 2012, Bloom et al. 2013). To quantify the role of management practices, Bloom and Van Reenen (2007, 2010), in coordination with the US Census Bureau, developed a set of survey questions, which the World Bank's Enterprise Analysis Unit modified and implemented as part of the standard WBES. 32 We examine the association between the responses to these questions and the governance structures chosen by firms. Figures 6a and 6b illustrate this association. As firms' management practices improve, the prevalence of pure bilateralism falls (from 76% to 54% on the suppliers' side and from 72% to 41% on the customers' side), indicating that the improvement in internal management practices is accompanied by the use of more complex methods of governance of external relations.

V.6 Miscellaneous Firm Characteristics
We follow the WBES indicators in calling firms "foreign owned" if they are at least 10% owned by foreign private entities. Similarly, we call firms "exporters" if at least 10% of their total sales are in foreign markets. As Tables 4a and 4b indicate, the associations between these measures and the choice of governance structures are weak. But the direction of association is intuitive, as illustrated in Figures 7 and 8. Foreign-owned firms and exporters use pure bilateralism less than firms that are domestically owned and oriented. Lastly, we examine firm size, which has only a weak association with the choice of governance structures. As Figure 9 shows, this association reflects the distinctive behavior of very large firms, which have a greater tendency to use governance structures that are comprehensive and make use of the legal system.
31 The WBES contains four-digit ISIC Rev.3.1 information on the main product and activity of each establishment. We used two-digit codes and grouped sectors as follows: Food (codes 15,16), Textiles and Garments (17,18,19), [...] 36)
We have not commented so far on the variables that fail to reach statistical significance in Tables 4a and 4b. There are also insights there. For example, we find no association between measures of corruption and governance structure, indicating that the effect of the quality of the legal system on these two might be orthogonal. Similarly, there are few connections between the governance of transactions and the type of ownership of the firm (apart from that of foreigners). Finally, there seems to be no difference between the transactional governance structures of the firms who trade locally and those of the firms who trade nationally, a result not to be expected from the existing literature (McMillan and Woodruff 1999).

VI. Lessons Learned and Avenues for Future Research
There has been no previous work consistently mapping cross-country variation in the governance structures that firms employ to support the successful implementation of transactions. In part, this has been due to lack of data. What has been missing is a method to elicit information on the conduct of transactions in a consistent way from firms of all types, functioning in very different environments. We have designed survey questions that have solved this problem, obtaining data whose validity is amply substantiated by the various exercises conducted above. Our paper provides a meaningful picture of the landscape of transactions. This is in no small part due to the new survey questions that we used to elicit information on governance structures. These questions permit the collection of data on a crucial part of economic activity that has hitherto been largely ignored in cross-country research.
Yet, obtaining the data provided only part of the solution to mapping the landscape. There was also the need to summarize the patterns in the data in a way that produced evocative measures, resonating comfortably with concepts standard in the economic and legal analysis of transactions. LCA eminently suits this task, generating economically meaningful constructs (distinctive types of governance structures) that were extracted from the data without the imposition of an a priori model that constrained the types of governance structures that would be estimated.
LCA is unsupervised in discovering patterns in the data, but it does rely on an underlying probabilistic, generative model. Thus, it combines the advantages of both machine learning and classical statistical methodology. The unsupervised learning offers the possibility of the discovery of new structures not imposed by the researcher and whose existence was not even contemplated before the analysis. The use of a generative model permits reliance on standard statistical techniques for model selection and evaluation of estimates. The paper has shown that LCA offers a fruitful approach that can be extended to other areas of economics where there is a need to construct parsimonious summaries of behavior whose essential nature is implicit in large amounts of data.
Some of our results would be entirely expected by most readers, but even in those cases we are able to add quantitative evidence. For example, pure bilateralism is the most common governance structure that we observe. But we are able to estimate the proportion of firms that rely on this approach, and importantly how that proportion varies across countries, regions, and different types of firms. In dealing with suppliers, sizeable numbers of firms supplement their bilateralism with the use of either paid private dispute resolution or formal legal mechanisms. In dealing with customers, a significant number of firms supplement their bilateralism with the use of paid private dispute resolution. Formal legal mechanisms, while used in customer relations, are less important than for supplier relations. For both upstream and downstream transactions, a relatively small proportion of firms rely on a comprehensive set of mechanisms to solve their transactional problems.
Notably, all governance structures use bilateralism. Thus, we find no evidence for the presence in our data of pure arm's length transactions, where firms rely on impersonal mechanisms and formal institutions to support their contracting. This is important because frequently, especially in the economics literature, arm's length transactions are viewed as something of an ideal, the aspirational endpoint in the process of economic development. 33 These types of transactions are sometimes even viewed to be a summary of the situation in developed economies. This view implicitly looks upon bilateralism and formal institutions as substitutes, for which we find no evidence. For many firms, they are indeed complements.
In the existing literature, there are naturally many different implicit assumptions that exist on the relative importance of the different governance structures. Given the lack of existing evidence analogous to that produced above, such assumptions have usually reflected intellectual concerns and ad hoc observation. For example, much attention has been paid in the literature to various unpaid, third-party, mechanisms of supporting agreements, such as networks, social clubs, and culturally defined groups. We see no evidence in our data to support this emphasis. Indeed, our data suggest that the role of government officials in supporting private transactions is at least as significant as the role of these types of third-parties. Any reader viewing these findings with priors gained from the existing economics literature would be quite surprised, given the relative emphasis in that literature on networks in supporting private transactions, while the role of government officials is almost completely ignored.
In Section V, we provided examples of further analyses that can be conducted once our estimates of governance structures are obtained. The observations are at the firm level and the dependent variables are the probabilities that the firm has chosen each of the four governance structures. Thus, one can relate such probabilities to the characteristics of individual firms. For example, we find that foreign-owned firms, exporters, larger firms, and better-managed ones are less likely to use pure bilateralism. Notably, we find that regional variation in the use of governance structures is more important than cross-country variation. This is somewhat of a puzzle given that institutional rules relevant to transactions are set at the national level in all the countries that we analyze. It suggests that the practicalities of institutional implementation are at least as important as the quality of formal rules.
Nevertheless, generating the conclusions reached in Section V has not been the prime objective of this paper. They are provided as examples to show the validity of the methodology we have developed and the potential in the datasets that we generate. Our methodology allows readers to go further than we have done, to consider testing other hypotheses by linking their own data to the data we have posted. 34 Moreover, given the information we have provided, readers could add different countries, or cities, or sectors to those we have studied here. If readers implemented the questions that we lay out in Section II in a survey of any size, even one firm, then they could use our posted tools to characterize the governance structure of the firms in their survey. Readers could produce results that are comparable with ours, thereby facilitating diagnosis of a country's strengths and weaknesses, without repeating the laborious steps described above.

[Table panels: Class 3 - Bilateralism with legal support; Class 4 - Strong comprehensive governance. Standard errors in parentheses.]

[Table panels: Class 1 - Pure Bilateralism; Class 2 - Bilateralism with private support; Class 3 - Bilateralism with weak support; Class 4 - Weak comprehensive governance. Standard errors in parentheses.]

Notes: Countries included in regional averages vary by the respective data availability (e.g. the interpersonal trust world average includes only Canada and the United States of America in addition to the Latin American and Caribbean countries). The WGI rule of law index captures perceptions of the extent to which agents have confidence in and abide by the society's rules. The WJP Civil Justice Index measures whether disputes can be resolved peacefully and effectively through the civil justice system. The DB distance to frontier score is measured on a scale of 0 to 100, where 100 is best practice on enforcing agreements and 0 represents the lowest performance. Interpersonal trust is based on Americas Barometer and shows the percentage answering 'Very trustworthy' or 'Somewhat trustworthy' to the following question: "And speaking of the people from around here, would you say that people in this community are very trustworthy, somewhat trustworthy, not very trustworthy or untrustworthy?" The data on fractionalization are from Alesina et al. (2003) and are available for different countries in different years, ranging from 1981 to 1998. The measure is 100 times the probability that two randomly selected members of the population are not from the same group.

Figure 4a: Membership and interactions with business associations as they relate to the governance structures in relations with suppliers
Panels: Belong to a business association? Regularly interact with business association?
B = bilateralism, BP = bilateralism with private support, BL = bilateralism with legal support, SC = strong comprehensive

Survey sub-questions (variable code, response codes):
Mutual interest in maintaining business relationship, without involving others: ASCd9b
Paid, private dispute resolution: ASCd9d, 1 2 3 4 5 -9
Assistance of government officials: ASCd9e, 1 2 3 4 5 -9
Intervention of other third-parties (excluding paid, private dispute resolution and government officials): ASCd9c, 1 2 3 4 5 -9
Legal system: ASCd9f, 1 2 3 4 5 -9

A.2 Translation of Questions
The questions were designed in English and then translated into Spanish. In translation to Spanish, the use of the phrase "which of the following" necessitated a noun, with "circunstancias" used, most directly translated as "circumstances" but also possibly understood as "situations". This phrasing also led to the translation of sub-question (or mechanism) 6 as "recurso al sistema legal" or "recourse to the legal system". Both adjustments merit some comment as they may affect our analysis.
The use of "circumstances" as well as the word "recourse" may result in respondents' understanding questions in terms of the realized circumstances in which they found themselves or the actions they had actively undertaken (for instance, through the legal system). On the other hand, the expression "resolving or preventing problems" does appear in the question, suggesting that respondents should have borne in mind pre-emptive acts that did not go as far as, for example, filing a legal action. Piloting of these questions indicated that some respondents understood these questions as referring to their own actions, rather than indicating problems being prevented by the threat, but not use, of an action. To the extent that these questions are understood as referring to actions or realized experience, mechanisms that involve only the threat of action rather than the action itself, such as the shadow of the law, will be under-reported or rated lower.

A.3 Data Collection
All interviews were conducted face-to-face with business owners and top managers using tablet devices. Table A.1 provides information on the dates of fieldwork and the total number of interviews conducted in each country. Fieldwork started in each country following a three- or four-day training and piloting phase.
Table A.2 shows item non-response rates. These rates consider (spontaneously given) "Don't Know" responses as non-response. "Don't Know" is not displayed as a possible option in the 'show card' listing possible responses. More than 97% of respondents answered all the six sub-questions about the methods of governing relations with both suppliers and customers, i.e. not once saying "Don't Know" to any of the 12 sub-questions. The question with the most frequent occurrence of "Don't Know" for relations with suppliers is on paid private dispute resolution (1.4% of sample). For relations with customers, the question about personal trust had the highest item non-response (1.2% of sample). Given the low item non-response rates, in our application of LCA we drop observations that have at least one "Don't Know" in the relevant series of questions. This leaves 3,350 observations on relations with suppliers (97.7% of the sample), and 3,339 observations on relations with customers (97.3% of the sample).

Appendix B: An Intuitive Introduction to LCA
LCA is analogous to factor analysis (or principal components). In the simplest case of factor analysis, a continuous, cardinal, latent variable is estimated using a set of observed measures that reflect the variable with error. LCA is used when estimating a discrete, nominal, latent variable from a set of measures that reflect the variable with error. The key difference, then, lies in the measurement characteristics of the estimated latent variable. For both factor analysis and LCA, all types of observed variables (categorical, continuous, etc.) can be used (perhaps with slight modifications in the details of the statistical procedures). In our application, the measures are the survey responses to the sub-questions on the use of the six different mechanisms. Each latent class is a transactional governance structure, reflecting a combination of a set of mechanisms.
For our simplified example, suppose that a researcher visits the country of Erewhon and asks the following questions to representatives of 500 firms:
1. When making agreements with suppliers, please indicate to what degree personal trust is effective in resolving or preventing problems: 'not at all', 'moderately', or 'extremely'.
2. The same question with 'legal system' substituted for 'personal trust'.
The fictitious responses appear in Table B.1, a 3x3 contingency table.
Systematic patterns in this table are not obvious. A standard approach in first parsing the data would be to assume that the probability of choosing one of the three answers for personal trust is independent of the probability of choosing any one of the answers for legal system; this is the independence assumption. But that assumption is obviously incorrect: a standard chi-squared test rejects it at the 0.001 level.
LCA is a method of uncovering a simple structure in such data. It begins by postulating that there are distinct classes of firms. In so doing, it suggests that the failure of the independence assumption in the aggregate data arises from the fact that the responses reflect a mixture of different classes of firms. Firms within a class are viewed as all having the same data-generating process for the survey responses. In the simplest application of LCA, the independence assumption is applied within classes, and hence is usually referred to as local independence.
Let us suppose that there are two classes of firms, each class having a different approach to the governance of agreements. Very roughly speaking, LCA uses correlations in the answers to the two different questions to estimate the row and column probabilities for each class and the proportion of firms falling into each class. This leads to two separate contingency tables (B.2 and B.3), the first reflecting the responses of 200 firms and the second those of 300 firms. (The numbers of firms in each category are a product of the estimation and are not imposed a priori. The number of classes is an a priori assumption.) The local independence assumption is satisfied exactly within each table: the number in each cell is a product of its row probability, column probability, and the number of firms in the class. Thus, within each table, the standard chi-squared test statistic is zero. Table B.1 is simply a cell-by-cell summation of Tables B.2 and B.3, which shows the essence of LCA: the aggregate data are assumed to arise from a mixture of simple distributions. Now, the patterns in the data stand out starkly and are easy to describe. For the firms in Table B.2, both transaction mechanisms are highly effective. For Table B.3, the legal system is ineffective and personal trust is effective. We could label the strategies of those in Table B.2 as 'comprehensive governance', while the strategies of those in Table B.3 are 'purely bilateral'. The litmus test of an insightful LCA application is an evocative description of the behavior typical within each class, and the identification of stark differences between the behaviors of each class.
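The mixture logic can be made concrete with a small numerical sketch. The response probabilities below are hypothetical and are not the entries of Tables B.1 to B.3; only the class sizes (200 and 300 firms) follow the example. The code builds each class table under local independence, sums the two tables, and computes the Pearson chi-squared statistic for each.

```python
# Categories are ordered ('not at all', 'moderately', 'extremely').
# Rows index the personal-trust answer, columns the legal-system answer.

def class_table(n_firms, p_trust, p_legal):
    """Expected counts under local independence: n * P(trust) * P(legal)."""
    return [[n_firms * pt * pl for pl in p_legal] for pt in p_trust]


def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 'comprehensive' class: trust and law both tend to be effective.
class_a = class_table(200, [0.2, 0.3, 0.5], [0.1, 0.2, 0.7])
# Hypothetical 'purely bilateral' class: trust effective, law ineffective.
class_b = class_table(300, [0.1, 0.1, 0.8], [0.7, 0.2, 0.1])

# The aggregate table is the cell-by-cell sum of the two class tables.
aggregate = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(class_a, class_b)]

CRITICAL_0_001 = 18.467  # chi-squared critical value with 4 df at the 0.001 level

within_a = pearson_chi2(class_a)
within_b = pearson_chi2(class_b)
pooled = pearson_chi2(aggregate)
```

Within each class the statistic is zero up to rounding error, while the pooled table exceeds the 0.001 critical value, so independence fails in the aggregate only because two internally independent classes have been mixed.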
Note that in this example LCA estimates nine parameters, requiring more estimates than the eight that would directly reflect Table B.1. Nevertheless, it adds a rich understanding of the data generating process by identifying two quite separate and meaningful patterns of behavior. The full benefit of LCA arrives only when the complexity of the problem increases. The number of parameters to be estimated by LCA increases linearly in the number of questions asked. In contrast, the number of cells in the contingency matrix analogous to Table B.1 increases exponentially. In the data analyzed in this paper, the number of cells is 5^6 (15,625) and nearly this many parameters would have to be estimated without the imposition of a simple structure. In contrast, a 2-class LCA model applied to the same data would require estimation of 49 parameters.

Based on the correlation patterns reported in Table C.1, a model with the correlation structure 1-2, 2-3, 3-4, 3-5, 4-5 was added to the original four model structures for transactions with suppliers. And for transactions with customers, a model with the correlation structure 1-2, 4-5, 4-6, 5-6 was added to the same four original structures. In sum, for each side of business relations (with suppliers and with customers), we chose to consider a total of 20 models, that is five correlation structures each with 3-, 4-, 5-, and 6-class specifications. 36
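The parameter counts quoted in the introductory example of Appendix B can be reproduced with a short calculation. The sketch assumes the simplest LCA specification, with local independence and no added correlations between questions; the function names are illustrative.

```python
def lca_parameter_count(n_classes, n_items, n_categories):
    """Free parameters in an LCA model with local independence."""
    # Class shares sum to one, so one of them is not free.
    class_shares = n_classes - 1
    # Within each class and item, response probabilities sum to one.
    response_probs = n_classes * n_items * (n_categories - 1)
    return class_shares + response_probs


def contingency_cells(n_items, n_categories):
    """Cells in the full cross-tabulation of all items."""
    return n_categories ** n_items


toy_lca = lca_parameter_count(2, 2, 3)       # two questions, three answers
toy_saturated = contingency_cells(2, 3) - 1  # free cell probabilities in Table B.1
paper_lca = lca_parameter_count(2, 6, 5)     # six questions, five answers
paper_cells = contingency_cells(6, 5)
```

Adding a question adds a fixed number of LCA parameters per class, whereas it multiplies the number of cells by the number of response categories.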

C.2: Criteria for Model Selection
Model-selection criteria employ a number of standard statistical measures. All measures begin with the log likelihood (LL), which reflects goodness-of-fit without any adjustment for the number of estimated parameters. The measures, other than LL itself, then add extra terms to the LL, where those terms reward parsimony and penalize classification uncertainty. The likelihood-ratio χ² goodness-of-fit statistic (referred to as L² in Vermunt and Magidson (2016)) is used to test the null hypothesis that the estimated model fits the data. In the tables that follow, we present only the p-values for L², since its distribution varies across models, precluding comparisons of absolute values. 37 The Bayesian information criterion (BIC), the consistent Akaike information criterion (CAIC), and the approximate weight of evidence criterion (AWE) are varieties of information criteria, all reflecting the log likelihood, and thus goodness-of-fit, plus a penalty term that is a function of the number of estimated parameters and the number of observations. 38 As a consequence of the specification of the penalty terms, AWE favors more parsimonious models than does CAIC, followed, in terms of favoring parsimony, by BIC and then log likelihood. Lower values of the information criteria indicate preferred models.
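The ordering of the penalty terms can be illustrated with a short sketch. It assumes the commonly cited log-likelihood-based forms of the three criteria (for AWE, the form given in Masyn 2013); the exact formulae used by the estimation software should be taken from the references cited in the notes. The log likelihood and sample size in the example call are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """BIC, CAIC, and AWE from the log likelihood (lower values are preferred)."""
    deviance = -2.0 * log_likelihood
    log_n = math.log(n_obs)
    return {
        "BIC": deviance + n_params * log_n,
        "CAIC": deviance + n_params * (log_n + 1.0),
        "AWE": deviance + 2.0 * n_params * (log_n + 1.5),
    }


# Hypothetical model: log likelihood of -5000 with 49 parameters and 3,350 firms.
example = information_criteria(-5000.0, 49, 3350)
```

For a given fit, the AWE penalty per parameter is more than twice that of BIC, which is why AWE tends to select models with fewer classes or fewer added correlations.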
Entropy R² is a measure of classification certainty. It has not been traditionally used as a model selection criterion but rather as an ex-post check on the model's results (Masyn 2013). An entropy R² that is close to zero indicates that the estimated latent classes are not well distinguished. Two additional information criteria add a term based on the entropy R², thus penalizing classification uncertainty (in addition to rewarding goodness-of-fit and parsimony). These are the 'classification AWE' and the 'integrated classification likelihood' (ICL-BIC). Again, lower values indicate preferred models. 39
36 As noted immediately above, the specifics of the 20 models differ between supplier- and customer-relations.
37 For background and formulae see Collins and Lanza (2010: 83) or Vermunt and Magidson (2016: 68).
38 We use the BIC and CAIC based on the log likelihood, not the alternatives that are based on L². The formulae are standard (Vermunt and Magidson 2016: 70). See Banfield and Raftery (1993) for the statistic we label AWE in Section IV, which is the standard one employing this label (Masyn 2013: 568). This is not directly reported by Latent GOLD, but is easily derived from the LL, the number of estimated parameters, and the number of observations.
Statistics on homogeneity and separability provide a final check on acceptability of a model. In terms of the notation of subsection III.1, homogeneity is characterized by estimated class-conditional response probabilities that are not too close to 1/R. For binary (R = 2) response variables, one standard implementation of this criterion is that these probabilities should not be in the interval [0.3, 0.7] (Masyn 2013). When we evaluate homogeneity, we aggregate responses into binary categories and apply this criterion.
The statistical measures related to separability are less ad hoc. Roughly speaking, in terms of the notation of subsection III.3, the measures assess whether the estimated posterior class probabilities are close to 0 or 1, that is, classification certainty. These measures use modal class assignment: respondent i is assigned to the class j that maximizes respondent i's estimated posterior class probability. Average posterior class probability for class c (AvePP_c) is the mean value of the posterior probability of class c over all i classified in c using modal class assignment. Satisfactory values are close to 1. Odds of correct classification (OCC_c) is a ratio of two odds. The numerator reflects AvePP_c and the denominator uses the estimated class membership probability for c derived at step-1 of the 3-step procedure. OCC_c equals 1 if class membership assignment is no better than random. A rule-of-thumb is that OCC_c should be at least 5.0, for all c. The modal class assignment proportion (mcaP_c) is the proportion of respondents in class c when respondent i's class assignment is set using modal class assignment. If respondents are assigned to classes with certainty, then mcaP_c equals the estimated class membership probability. Since step-1 of the LCA estimation gives standard errors for the estimated class membership probabilities, a natural diagnostic is to examine whether mcaP_c lies in a small confidence interval of the corresponding estimate.
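The three separability measures can be computed directly from a matrix of posterior class probabilities. The sketch below follows the definitions in Masyn (2013); the abbreviations AvePP, OCC, and mcaP, the function name, and the toy inputs are assumptions made for illustration, and the code is not the implementation used for the reported tables.

```python
def separability(posteriors, class_shares):
    """Per-class AvePP, OCC, and mcaP.

    posteriors[i][c] is the posterior probability that firm i is in class c;
    class_shares[c] is the estimated class membership probability.
    Assumes every class receives at least one firm and no AvePP equals 1.
    """
    n_classes = len(class_shares)
    # Modal assignment: each firm goes to its highest-probability class.
    modal = [max(range(n_classes), key=lambda c: row[c]) for row in posteriors]
    results = []
    for c in range(n_classes):
        members = [row[c] for row, m in zip(posteriors, modal) if m == c]
        ave_pp = sum(members) / len(members)
        odds_posterior = ave_pp / (1.0 - ave_pp)
        odds_prior = class_shares[c] / (1.0 - class_shares[c])
        results.append({
            "AvePP": ave_pp,
            "OCC": odds_posterior / odds_prior,
            "mcaP": len(members) / len(posteriors),
        })
    return results


# Toy example: four firms, two equally sized classes.
toy_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
toy_measures = separability(toy_posteriors, [0.5, 0.5])
```

In the toy data both classes have an average posterior probability of 0.85, which with equal class shares gives an odds-of-correct-classification value above the rule-of-thumb minimum of 5.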

C.3 Choosing the Preferred Model
In selecting one model from the 20 estimated, we use the statistical measures of model-fit and parsimony. At this stage the implementation of the lattermost criterion meant a preference, but not a constraint, for describing the two sides of business relations with the same number of latent classes. Tables C.2a and C.2b present the measures of model fit for the two sets of 20 estimated models. In addition, the column listing the number of parameters is included to reflect parsimony. 40 In both tables, the numbers in bold highlight the three best-performing models according to the statistic noted in the relevant column. A glance at Tables C.2a and C.2b already suggests that the models with complex correlation structures generally perform better for a variety of statistics. This is hardly surprising given the steps leading up to the consideration of this specific correlation structure (i.e. relaxation of local independence based on bivariate residual correlations).
For relations with suppliers, Table C.2a indicates that the model with 4 classes and correlation structure 1-2, 2-3, 3-4, 3-5, 4-5 performs well across most statistics. It is included in the best three models across all statistics except AWE; it is the best-performer on BIC, CAIC, and ICL-BIC; it is the second-best on Entropy R² and third-best on [...] and classification AWE. Note that among the Bayesian statistics both AWE statistics penalize an increase in the number of parameters most strongly and therefore, not surprisingly, the first- and second-best models on the classification AWE are far more parsimonious than those classified as best by other statistical criteria. However, given the strong performance on most statistics for the 4-class, 1-2, 2-3, 3-4, 3-5, 4-5 model for relations with suppliers, it is difficult to argue for a more parsimonious model. The 5-class model with the same correlation structure is the next best model.
Model selection for the relations with customers is less clear-cut. Since the 4-class model is preferred for suppliers, it is worth focusing first on 4-class models for the customer-side as well. Among these, the best performers are the one with no correlations and the one with the most complex correlation structure. While the model with no correlations performs better on some statistics (Entropy R² and classification AWE), it underperforms the correlation structure 1-2, 4-5, 4-6, 5-6 on all other Bayesian statistics. Importantly, both BIC statistics are lower for the more complex model. Consequently, among the 4-class models, the correlation structure 1-2, 4-5, 4-6, 5-6 is preferred. Comparing the performance of this model with other models more broadly, the 5- and 6-class models with the same correlation structure are the closest in performance. However, the 4-class model is the best-performer on CAIC and ICL-BIC, and is only slightly inferior on the BIC and other measures. Combining this statistical evidence and an a priori preference to select models with the same number of latent classes across the two types of relations, we select the 4-class, 1-2, 4-5, 4-6, 5-6 correlation model to describe relations with customers. Here too, the 5-class model with the same correlation structure is the next best alternative.
Note that in all these steps leading up to selecting one model for each of the two types of business relations, we did not examine the behavioral patterns reported by each of the 40 estimated models. This was entirely intentional, as we followed the standard model-selection steps, separating the process of selection from the analysis and interpretation of its findings.

C.4: Robustness: Comparison of the Chosen Models with the Next-Best Alternatives
We now provide further checks on the validity of our choices of LCA models. These checks use terminology and graphical formats that are laid out in Section IV of the paper, and we therefore recommend reading this part of the Supplementary Appendixes after completing Section IV.
We examine whether the behavioral patterns suggested by our chosen models differ from the behavioral patterns suggested by the next best alternatives, namely the 5-class models with the same correlation structure across questions as our chosen 4-class models (see Tables C.2a and C.2b). Tables C.3a and C.3b illustrate the governance structures of the 5-class models in the same format as Tables 2a and 2b from Section IV. Even a quick glance at these tables and figures is enough to recognize the same governance structures we already saw in Section IV, with no new behavioral pattern meriting a distinct name. Table C.4a presents the firm-by-firm correspondence between governance structures assigned (modally) by the 4-class models with those of 5-class models, for supplier-relations. Four of the five classes in the 5-class model have a near perfect mapping with the original four classes. The additional 5th class can be safely described as using pure bilateralism, albeit with a tinge of legal support (see Table C.3a). It comprises mostly the firms that were assigned to the group using pure bilateralism in the 4-class classification. A close mapping between the class assignments is also reflected in the estimates of class membership probabilities, with the prevalence of governance structures similar whether we apply the 4- or 5-class LCA.
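A firm-by-firm correspondence table of the kind described here is a cross-tabulation of two sets of modal class assignments. The following sketch uses made-up assignments and hypothetical labels; it shows the mechanics only and does not reproduce Table C.4a.

```python
from collections import Counter


def correspondence_table(assign_4, assign_5):
    """Count firms by (4-class label, 5-class label) pair."""
    if len(assign_4) != len(assign_5):
        raise ValueError("both assignment lists must cover the same firms")
    return Counter(zip(assign_4, assign_5))


# Hypothetical modal assignments for six firms under the two models.
four_class = ["B", "B", "BP", "BL", "SC", "B"]
five_class = [1, 5, 2, 3, 4, 1]

table = correspondence_table(four_class, five_class)
```

A near one-to-one mapping shows up as each 4-class label concentrating its count in a single 5-class label, with the extra class drawing most of its firms from one original class.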
While the governance structures for customer-relations suggested by the 5-class model (Table C.3b) do not contain a structure that is qualitatively different from the structures of the 4-class model (see Table 2b), the firm-by-firm correspondence exhibited in Table C.4b is less straightforward than it was for supplier-relations. Three governance structures in the 5-class group are clearly mapped into single classes in the 4-class group. The rest of the mapping is straightforwardly derived from the figures that illustrate the underlying behavior of classes 4 and 5. Namely, class 4 comprises firms that were assigned to pure bilateralism, or bilateralism with private support, or weak comprehensive. However, examining Table C.3b, class 4 is substantively indistinguishable from bilateralism with private support. Similarly, class 5 comprises firms that were assigned across all possible classes, but in terms of the behavioral pattern given in Table C.3b it is a close version of weak comprehensive governance.
To summarize, the 4-class and the 5-class models produce very similar overall estimates of governance structures. For supplier relations, nearly all firms are assigned to the same governance structures across the two models. For customer relations, the firm-by-firm assignments are clear-cut only for some governance structures. In cases with a noisier mapping of the firm-by-firm assignments, the governance structures of the 5-class models closely correspond to ones already suggested by the 4-class model. Such a close correspondence between the governance structures across our chosen and the next-best models indicates that our findings are robust to small changes in model selection.

C.5: Class Homogeneity and Separability
As a final check on our chosen models, we examined measures of class homogeneity and separability, as laid out in Subsection C.2 above. As noted by Masyn (2013), a class has a high degree of homogeneity if the predicted response probabilities for that class are either high or low (that is, high and low class-conditional response probabilities within each c). 41 A standard rule-of-thumb is to consider a category homogeneous if these probabilities are either below 0.3 or above 0.7, but this rule-of-thumb is applicable only to binary responses. Therefore, for this exercise alone, we converted the probability data given in Tables 2a and 2b into two categories: 'Not at all', 'Slightly' and 'Moderately' versus 'Very much' and 'Extremely'. Table C.5 reports counts of the estimated probabilities of responses in our model that qualify as homogeneous by this criterion. (Note that we now have 12 categories = 6 questions × 2 binary responses.) All four classes in both types of relations appear highly homogeneous.
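The homogeneity count can be sketched in a few lines. The response probabilities below are invented for one hypothetical class and are not taken from Tables 2a or 2b; the collapsing rule and the [0.3, 0.7] interval follow the description above.

```python
def collapse_to_high(probs_5):
    """P('Very much' or 'Extremely') from five ordered response probabilities."""
    return probs_5[3] + probs_5[4]


def is_homogeneous(prob, low=0.3, high=0.7):
    """A binary response probability is homogeneous if outside [low, high]."""
    return prob < low or prob > high


def count_homogeneous(class_probs):
    """Count homogeneous binary categories across all questions for one class."""
    count = 0
    for probs_5 in class_probs:
        p_high = collapse_to_high(probs_5)
        # Each question contributes two binary categories: high and low.
        for prob in (p_high, 1.0 - p_high):
            if is_homogeneous(prob):
                count += 1
    return count


# Hypothetical response probabilities for six questions in one class.
toy_class = [
    [0.02, 0.03, 0.10, 0.35, 0.50],
    [0.05, 0.05, 0.10, 0.30, 0.50],
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.20, 0.15, 0.15, 0.25, 0.25],
    [0.70, 0.15, 0.10, 0.03, 0.02],
    [0.50, 0.25, 0.15, 0.05, 0.05],
]

n_homogeneous = count_homogeneous(toy_class)
```

With binary categories, the high and low probabilities for a question are either both inside or both outside the interval, so the count for a class is always an even number out of 12.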
Because all classes could be highly homogeneous but very similar, it is also important to check whether one can reliably distinguish between the classes. This is the notion of separability, several measures of which are introduced in Subsection C.2. Tables C.6a and C.6b report the estimates of these measures for our classes.
The Average Posterior Class Probability measures the average class membership probability across all respondents classified into class c by modal class assignment (i.e., using the maximum posterior class probability). If the class memberships are assigned with certainty, then this measure equals 1. As Tables C.6a and C.6b show, it is very close to 1 for every class, comfortably exceeding the rule-of-thumb minimum.
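As a minimal sketch of this measure, the snippet below computes the average posterior class probability from a list of firm-level posterior probability vectors. The function names and the posterior values are hypothetical and serve only to illustrate the definition given above.

```python
def modal_class(posterior):
    """Index of the class with the maximum posterior probability."""
    return max(range(len(posterior)), key=lambda c: posterior[c])


def average_posterior_probability(posteriors, n_classes):
    """For each class c, average the posterior probability of c over the
    firms whose modal class is c. Returns NaN for a class with no firms."""
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for post in posteriors:
        c = modal_class(post)
        sums[c] += post[c]
        counts[c] += 1
    return [sums[c] / counts[c] if counts[c] else float("nan")
            for c in range(n_classes)]


# Hypothetical posteriors for three firms and two classes.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
avepp = average_posterior_probability(posteriors, n_classes=2)
print(avepp)  # approximately [0.85, 0.70]
```

If every firm were assigned with certainty, each posterior vector would contain a single 1 and the measure would equal 1 for every class.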
The Odds of Correct Classification Ratio is a ratio of odds, with the numerator reflecting the average posterior class probability and the denominator reflecting the estimated class membership probability. It equals 1 if average posterior probabilities are no better than a random application of the estimated class membership probabilities (that is, if using Bayes' theorem with firm-specific responses for class assignment does no better than class assignment ignoring the firm-specific data). Again, the tables show that our model exhibits a high degree of class separation, well above the rule-of-thumb minimum.
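The ratio can be written out in a few lines. The sketch below assumes the form given in Masyn (2013), namely the odds of the average posterior class probability divided by the odds of the estimated class membership probability; the function names and the input values are hypothetical.

```python
import math


def odds(p):
    """Odds p / (1 - p); infinite when p equals 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return math.inf if p == 1.0 else p / (1.0 - p)


def occ(avepp, class_prob):
    """Odds of correct classification for one class: odds based on the
    average posterior probability relative to odds based on the estimated
    class membership probability alone."""
    if not 0.0 < class_prob < 1.0:
        raise ValueError("class membership probability must lie in (0, 1)")
    return odds(avepp) / odds(class_prob)


# Hypothetical values: a class holding 25% of firms.
print(occ(0.25, 0.25))  # 1.0: no better than ignoring firm-specific data
print(occ(0.90, 0.25))  # about 27: far better than the baseline
```

Larger values indicate better separation, since the firm-specific responses then improve classification well beyond what the class proportions alone would achieve.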
The Modal Class Assignment Proportion is the proportion of respondents in each class when firms are assigned to classes modally. If respondents were assigned with certainty, then this proportion would exactly equal the directly estimated class membership probability. To assess any discrepancy, one rule of thumb is to check whether the proportion lies within a 95% confidence interval (CI) of the corresponding class membership probability estimate. Tables C.6a and C.6b demonstrate clearly separate classes: the modal class assignment proportions are close to the estimated class membership probabilities and fall within the 95% CIs. Indeed, all lie within the 33% CIs of the corresponding estimates.
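A minimal sketch of this check follows. It assumes a symmetric normal-approximation confidence interval built from the estimated class membership probability and its standard error; the interval actually reported in Tables C.6a and C.6b may have been constructed differently. The function names, posteriors, estimate, and standard error are all hypothetical.

```python
from statistics import NormalDist


def modal_assignment_proportions(posteriors, n_classes):
    """Share of firms whose maximum posterior probability falls in each class."""
    counts = [0] * n_classes
    for post in posteriors:
        counts[max(range(n_classes), key=lambda c: post[c])] += 1
    return [k / len(posteriors) for k in counts]


def within_ci(proportion, estimate, std_err, level=0.95):
    """True if `proportion` lies inside a symmetric normal-approximation CI
    of the given level around `estimate` (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return abs(proportion - estimate) <= z * std_err


# Hypothetical posteriors for four firms and two classes.
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
mcap = modal_assignment_proportions(posteriors, n_classes=2)
print(mcap)  # [0.75, 0.25]

# Hypothetical estimate 0.70 with standard error 0.05 for the first class.
print(within_ci(mcap[0], 0.70, 0.05, level=0.95))  # True
print(within_ci(mcap[0], 0.70, 0.05, level=0.33))  # False
```

A narrower interval, such as the 33% CI mentioned in the text, is a stricter test: passing it means the modal proportions sit very close to the estimated class membership probabilities.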
All classes in both upstream and downstream relations are homogeneous and well separated.

Appendix D