Policy Research Working Paper              9391




              Making Gravity Great Again
                             Will Martin




Development Economics
Development Research Group
September 2020
Policy Research Working Paper 9391


  Abstract
 The gravity model is now widely used for policy analysis                           that the traditional Ordinary Least Squares estimator in
 and hypothesis testing, but different estimators give sharply                      logarithms is strongly biased downwards. The popular Pois-
 different parameter estimates and popular estimators are                           son Pseudo Maximum Likelihood model also suffers from
 likely biased because dependent variables are limited-                             downward bias. An Eaton-Kortum maximum-likelihood
 dependent, error variances are nonconstant and missing                             approach dealing with the identified sources of bias pro-
 data frequently reported as zeros. Monte Carlo analysis                            vides unbiased parameter estimates.
 based on real-world parameters for aggregate trade shows




 This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the
 World Bank to provide open access to its research and make a contribution to development policy discussions around the
 world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The author may
 be contacted at W.Martin@cgiar.org.




         The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development
         issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the
         names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those
         of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and
         its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.


                                                       Produced by the Research Support Team
                       Making Gravity Great Again




                                         Will Martin


                         International Food Policy Research Institute


                                      Washington DC




EconLit Descriptors: C13, C24, C54, F14, F20
                               Making Gravity Great Again

I.     Introduction

The gravity model has become enormously important both for economic theory and for policy

analysis. Since its first use in the 1960s (Tinbergen 1962), it has been popular because of its

ability to represent patterns in many bilateral flows, including aid, investment, migration and

trade. The progress made in providing strong theoretical foundations (see Head and Mayer 2014)

has greatly increased its popularity for estimating critically important parameters, such as the

effects of distance on trade flows (Eaton, Kortum and Sotelo 2013) and the welfare costs of

economic distortions (Arkolakis, Costinot and RodrГ­guez-Clare 2014). It is now widely seen as a

basis for new quantitative models to supplement (Ottaviano 2016) or replace computable general

equilibrium models for policy analysis. Unfortunately, popular estimators yield different

parameter estimates, with some seriously biased because of frequent zero flows, missing data

presented as zeros and a combination of nonlinearity and heteroscedasticity.

       Bilateral trade data sets often contain many zeros, with around half common for total

trade and larger shares for sectoral trade. Up to the early 2000s, gravity equations were typically

estimated in logarithms, excluding the zero observations, despite the evidence (eg Olsen 1980)

that such sample truncation results in bias. Trade datasets frequently also include many missing

data not reported to UN COMTRADE and conflated with true zeros. An enormously influential

paper by Santos Silva and Tenreyro (SST) (2006) highlighted the potential for bias from the

combination of heteroscedasticity and nonlinearity in gravity equations. They provided a very

thoughtful explanation of why their preferred estimator, the Poisson Pseudo-Maximum

Likelihood (PPML), deals with this problem.




                                                 2
       Many have argued that PPML eliminates the bias associated with zero trade flows by

allowing the zero trade values to be included (eg Yotov et al 2016, p20). Certainly, this ability to

incorporate zeros is in sharp contrast with the conventional approach, where the zeros must be

omitted because the log of zero is undefined. However, it is not enough to eliminate concerns

about bias. Greene (1981) showed that where Ordinary Least Squares (OLS) can be and is used

on such censored samples the coefficients are scaled down by the share of non-limit observations

in the sample.

       SST (2006) presented Monte-Carlo analyses which led themвЂ”and many othersвЂ”to

conclude that the PPML estimator was the best available estimator for the gravity model.

However, these experiments included no zeros so they illuminated only the

heteroscedasticity/nonlinearity problem. SST (2011) presented an alternative Data Generating

Process (DGP) with many zeros for which the PPML estimator was unbiased but, as noted by

Head and Mayer (2014, p180), this DGP does not feature the fixed costs or threshold values that

play key roles in generating zero trade outcomes in many models based on modern theory.

Estimators such as the Tobit (Tobin 1958) or Heckman (1979) models that allow for zero-valued

outcomes are consistent with modern gravity but also suffer from potentially large biases

associated with heteroscedasticity (Arabmazar and Schmidt 1981; Donald 1995).

       Monte Carlo simulation allows application of alternative estimators to datasets with

known parameters so any bias can be identified. However, as argued by Leung and Yu (1996),

the DGPs used for Monte Carlo analysis must be close to the parameters of the process

generating the actual datasets, or their findings may be very misleading. They showed that many

Monte-Carlo based studies for limited-dependent variable models gave totally misleading

answers because their experimental designs ignored key features of the underlying data



                                                 3
generating processes. Martin and Pham (2020) found Monte Carlo conclusions about gravity

model estimators to be very sensitive to key parameters such as the distribution of predicted

trade.

         Unfortunately, no Monte Carlo simulations on data with properties matching those of the

relevant underlying data generating processes appear to be available. The important SST (2006)

Monte Carlo experiments involved: (i) trade predictions assumed log standard normal before

adding a dummy variable and (ii) model disturbances distributed log standard normal before

adding arbitrary patterns of heteroscedasticity (SST 2006, Case 3). Head and MayerвЂ™s Handbook

chapter (2014, p181) used a Monte Carlo design that yields the right share of zero trade

observations, but considered only two arbitrary types of heteroscedasticityвЂ”a constant variance

to mean ratio consistent with the PPML model and a constant variance in logs model consistent

with estimation in logarithms.

         The goals of this paper are to reduce the uncertainty about the best estimator for the

gravity model with zeros resulting from sample selection and missing data, and to identify good,

practical estimators that allow for potential heteroscedasticity. To foreshadow the key results, the

standard truncated OLS estimator was strongly biased downwardsвЂ”by between 38 and 48

percent. The PPML estimator was biased downwards by an average of 18 percent and threshold

Tobit models by 8 to 15 percent. A maximum likelihood estimator following Eaton and Kortum

(2001) in dealing with the limited-dependent variable problem but also allowing for

heteroscedasticity (the EK-H estimator) provided unbiased parameter estimates. Including

missing trade data mapped to zeros had small impacts on most parameter estimates but increased

some by around 20 percent.




                                                  4
         The second section of the paper considers the potential for biases from limited-dependent

variables, missing data and heteroscedasticity, and their implications for the design of Monte

Carlo experiments. The third section presents estimates of the gravity model from a range of

potential estimators, partly to see whether there are important differences in results, and partly to

provide estimates of the parameters needed for Monte Carlo analyses to identify the vulnerability

of different estimators to bias. The Monte Carlo framework is developed in the fourth section

and its results are discussed in the fifth section. A short sixth section provides some suggestions

for applied research, prior to the final conclusions.



II.      Sources and Solutions for Estimation Bias

Before designing Monte Carlo simulations that allow testing the suitability of alternative

estimators, it is important to understand the sources of potential bias. SST (2006) identified two:

(i) limited-dependent variable bias associated with zero trade values, and (ii) the combination of

nonlinearity and heteroscedasticity in the model. An additional source of potential bias

considered here arises from missing trade data treated as zeros.

         Limited-dependent variable bias results from winsorization of the distribution of a latent

variable at a threshold value, with all values beyond the threshold mapped to this value (Tobin

1958). This results in errors from Ordinary Least Squares (OLS) estimators having non-zero

means (Heckman 1979), violating the fundamental assumptions of OLS and leading to biased

parameter estimates. The combination of nonlinearity and heteroscedasticity may cause bias in

nonlinear models because of correlations between explanatory variables and error terms. In

limited-dependent models heteroscedasticity causes bias, even in linear models, as discussed

below.



                                                  5
       Eaton and Kortum (2001) suggest an estimator for the gravity model later formalized for

limited-dependent variable models by Carson and Sun (2007). While limited-dependent variable

estimators for decisions such as purchases of new automobiles frequently use a zero threshold,

Carson and Sun (2007) point out that the minimum possible purchase is not zero but the price of

the cheapest new car, implying a threshold above zero. Similarly, modern theory points to

threshold values below which market entry is not worthwhile given the distribution of

productivity across firms and countries (Chaney 2018). Consistent with this, the Eaton-Kortum

(EK) (2001) estimator replaces the limit observations with thresholds proxied by the smallest

observed import into each country. Importantly, Carson and Sun (2007) show that the asymptotic

distributions of other parameters are not affected by uncertainty about this threshold.

       The resulting model in levels, generalized to incorporate a nonlinear functional

relationship between the dependent and independent variables and different error variances for

different observations, can be represented as:

       (1)              yi* = f(xi,ОІ) + ui

where yi* is a latent variable, xi is a vector of exogenous regressors and ОІ a vector of parameters.

Observations of the dependent variable are:

       (2)              yi = yi*                                    if yi* > Оі or

       (3)              yi = 0                                      if yi* в‰¤ Оі

where Оі is the threshold that the latent variable must exceed before its value is observable.

       The likelihood function for this problem is the product:



                                    н µн±›н µн±›0       н µн»ѕн µн»ѕв€’н µн±“н µн±“(н µн±Ґн µн±Ґн µн±–н µн±– ,н µн»Ѕн µн»Ѕ)                     1           н µн±¦н µн±¦в€’н µн±“н µн±“(н µн±Ґн µн±Ґн µн±–н µн±– ,н µн»Ѕн µн»Ѕ)
       (4)      н µн°їн µн°ї(н µн»Ѕн µн»Ѕ, н µнјЋн µнјЋ) = в€Џн µн±–н µн±–=1 О¦пїЅ            н µнјЋн µнјЋн µн±–н µн±–
                                                                            пїЅ в€Џн µн±›н µн±›
                                                                               н µн±–н µн±–=н µн±›н µн±›0 +1 н µнјЋн µнјЋ н µнј™н µнј™(            н µнјЋн µнјЋн µн±–н µн±–
                                                                                                                                      )
                                                                                                н µн±–н µн±–




                                                                                 6
Where n0 is the number of zero observations; О¦ is the Cumulative Distribution Function (CDF)

and н µнј™н µнј™ the Density Function of the error terms; n is the total number of observations; н µнјЋн µнјЋн µн±–н µн±– is the

standard deviation of the distribution of errors at observation i which, extending Carson and Sun

(2007), varies by observation. The first product in equation (4) refers to observations at the

threshold. The second refers to nonzero trade observations. The positive EK threshold allows use

of log-linear models.

        Figure 1 provides useful insights into key problems with standard approaches to

estimation and the Eaton-Kortum (2001) version of the gravity model. The x-axis shows the

predicted value of the log of trade, which is mapped to actual trade in logarithms on the y-axis

with a unit coefficient. The EK threshold is represented by the horizontal line at Оі. This linear-in-

logarithms version makes the implications of limited-dependent variables and missing data much

more obvious than a nonlinear version. The small sample of observations was generated in

STATAвЂ™s pseudo-random number generator using parameters consistent with estimated

distributions of residuals and predicted trade.

        All the dots, squares and triangles in Figure 1 are realizations of the latent variable in

equation (1). Although it is not obvious with this sample, observations with lower predicted

values have higher error variances. If the full set of points were available, OLS would provide

unbiased estimates of the slope. However, only the round, and potentially the square, points are

observed. The triangles are realizations of the latent variable below the threshold, which are

observed as zeros and mapped to the x observations on the threshold in evaluating the likelihood

function. The displacement of these observations means that the residuals from applying OLS to

a censored sample consisting of the dots, the squares and the xвЂ™s have a non-zero mean, making

OLS a biased estimator (Heckman 1979). OLS estimation using only positive trade values also



                                                     7
results in bias because this truncation tends to remove observations with smaller errors, resulting

in errors with positive expected value.

        Heteroscedasticity can have substantial implications for Tobit estimators since the value

н µнјЋн µнјЋн µн±–н µн±– affects the realization of the CDF in equation (4) for each of the limit values, x. The resulting

bias increases with the degree of heteroscedasticity and the share of limit observations (see

Arabmazar and Schmidt 1981) and is quite different from bias due to the combination of

heteroscedasticity and nonlinearity identified by SST (2006).

        Gravity models are frequently estimated on flows of aid, investment (Chitu, Eichengreen

and Mehl 2014), migration (Artuc et al 2014) or trade data with missing values of the dependent

variable. When these are mapped to zeros, the resulting sample merges data from two distinct

regimesвЂ”the stochastic relationship between x and y and the deterministic relationship between

x and the zero observations. This can be expected to reduce the coefficient on x estimated with

non-limited-dependent estimators like Nonlinear Least Squares (NLS) or PPML. With limited-

dependent variable estimators, the impact of the missing data on coefficient estimates depends

firstly on whether these missing values would have exceeded the threshold (bias) or not (no

bias). Figure 1 shows that the direction of the bias from mapping nonzero trade values to zero

will depend on their valuesвЂ”mapping the square at (x = -1.1, y=4.4) to the threshold biases the

estimated slope upwards, while mapping the square at (8, 13) to the threshold biases it down.

This suggests that missing trade for relatively small flows may bias gravity model coefficients up

while missing trade for large flows may bias them down in absolute value.




                                                     8
Figure 1. Log-Log Gravity Model with Zeroes, Heteroscedasticity & Missing Values
    15
    10




y
    5




     Оі
    0
    -5




                                     x
             -5               0                5             10               15
                                           x




                                           9
The key parameters needed to ensure that Monte Carlo simulations replicate the system under

examination are:

    1. The threshold, Оі, distinguishing values at the limit and those above the limit,

    2. The distribution of predicted outcomes, including

        (i) Their distribution and especially whether it is normal or log-normal,

        (ii) The mean and variance of this distribution

    3. The distribution of the residuals around the expected trade outcomes, including:

        (i) Their distribution,

        (ii) Their standard deviation at any point, Пѓi, and

        (iii)The pattern of any heteroscedasticity

    4. Missing values and their treatment, and

    5. The fraction of observations at the threshold



Many estimators of limited-dependent variable models assume a threshold value of zero, since

only zero or positive values of gross flows of aid, migration, investment or trade are recorded.

But a zero threshold is questionable with multiplicative gravity models of the type considered

because of: (i) the need for enough volume to cover fixed costs in modern models of trade (eg

Chaney 2008) or other flows, (ii) the fact that it makes zeros impossible if residuals are

distributed log-normally, and (iii) the fact that the zero share must be below 50 percent if model

residuals are symmetrically distributed in levels, while larger shares are frequently observed.

        The distribution of predicted outcomes may influence the results substantially. Predicted

values of trade, н µн±“н µн±“(н µн±Ґн µн±Ґн µн±–н µн±– , н µн»Ѕн µн»Ѕ), below or near the threshold are more likely to result in zeros. When

predicted trade is far above the threshold, zeros are much less likely.



                                                       10
        The distribution of the residuals also influences outcomes. Whether they are in levels or

logs has profound implications while higher variances usually increase the share of limit

observations. Heteroscedasticity can result in bias through correlations with nonlinear functions,

misspecification of Equation (4), or both.

        The frequency of missing values and their treatment are potentially important. Many zero

observations reflect the failure of economies to report their trade or other flows. If these missing

values are treated as zeros, there may be substantial bias. If they have been replaced by analystsвЂ™

estimates, estimation of the gravity model becomes a differentвЂ”and much less interestingвЂ”

exercise in inferring the model used to fill out the missing parts of the trade matrix.

        The fraction of observations at the threshold is an important, observable outcome of the

data generating process, depending on all the parameters identified above.



III.    Estimating Key Parameters



This section examines a series of potential estimators of the gravity model both to assess the

importance of estimator choice for estimates of key parameters for trade flows, and to identify

the functional forms and parameter values needed for informative Monte Carlo analysis. The

models considered are: (i) traditional OLS in logs, (ii) PPML (iii) Feasible Generalized Least

Squares (GLS); (iv) the Tobit model, (v) Weighted Tobit, and (vi) the Eaton and Kortum (2001)

Maximum Likelihood with correction for heteroscedasticity (EK-H).

        The traditional gravity model includes national variables such as income and population

as well as bilateral variables such as the transport costs, tij, from country i to j. While intuitively

appealing, this does not capture multilateral resistances and hence generates misleading policy



                                                   11
parameters (Anderson and van Wincoop 2003). Feenstra (2004) and Redding and Venables

(2004) showed that policy relevant parameter estimates can be obtained by replacing national

variables with dummies in cross-sectional analysis, leaving only the bilateral variables, although

Anderson (2011) expressed some concerns about this approach. Because this fixed-effects (FE)

approach is the simplest solution to the problem it has become enormously popular and is

examined to ensure that the parameters for Monte Carlo analysis represent such structural gravity

models.

        Where heteroscedasticity is allowed for, the variance of the residuals is modelled as н µнјЋн µнјЋн µн±–н µн±–2 =

н µн±’н µн±’ (н µн±Ћн µн±Ћ+н µн±Џн µн±Џн µн±Џн µн±Џ) where x is the predicted log of trade and the exponent term ensures the variance is always

positive (see Cameron and Trivedi 2009, p155). Consistent with standard practice, estimates

based on the OLS residuals were used for Feasible GLS and Weighted Tobit estimators to see

how well these simple approaches perform and to allow comparison with the EK-H estimator.

        PPML was considered because of its widely perceived ability to deal with the bias

problems associated with both zeros and nonlinearity/heteroscedasticity, and the tentative

support for its performance provided by Martin and Pham (2020, p11) when predicted trade is

concentrated near zero. The Tobit model is designed to deal with limited-dependent variable

bias, and Weighted Tobit attempts to deal with both limited-dependent variable bias and impacts

of heteroscedasticity on this model. The EK-H model was included because it specifically deals

with both heteroscedasticity and limited-dependent variable bias.

        Several models used in earlier work were not examined. The Gamma Pseudo-Maximum

Likelihood model was excluded because it did not perform well in Head and MayerвЂ™s

simulations (2014, p181). NLS was excluded because of concerns about bias due to limited-

dependent variables, heteroscedasticity and nonlinearity; and evidence against normality of


                                                      12
residuals in levels. The Heckman model was not examined because it is challenging to correct

for heteroscedasticity (Donald 1995), and its need for an additional variable in the selection

equation makes model comparisons difficult. The Zero-Inflated-Poisson (ZIP) model was

excluded because it was found to give the same results as truncated PPML in real-world

regressions (Martin and Pham 2020) and because the inflation term requires an additional

variable.

       The analysis uses the 136*136 cross-sectional trade matrix for 1992 curated by Feenstra,

Lipsey and Bowen (1997) and studied by SST (2006) and by Jochmans (2017). A cross check

with UN COMTRADE revealed that 46 of these economies did not report their trade for 1992,

including Iran, Nigeria, Russia and South Africa. Of the potential 2070 trade flows between

these non-reporters 2059 were zero, with nine minuscule flows into St Kitts and Nevis, one to the

Bahamas and one to Guyana. This made clear that the developers of the dataset had made no

systematic attempted to predict flows between non-reporters. Given the possibility of bias from

treating these missing data as zeros, trade between these non-reporters was omitted in the main

analysis, with results from the full dataset presented in the Online Appendix. With these data

excluded, the full sample was 16290 with 41 percent zeros.

       The results for the Fixed Effects (FE) models in Table 1 show substantially different

results between estimators. The critical distance elasticity, for example, ranges from -0.75 with

the PPML estimator through -1.47 with the EK-H estimator to -1.88 with the Tobit model. Of the

categorical variables, common language and colonial ties have consistently positive and

significant effects, while the signs on FTA and Contiguity dummies are puzzlingly negative for

the three models based on limited-dependent variable models.




                                                13
       A key feature of the traditional model results in Table 2 is consistent patterns in the

estimates of key parameters, particularly the income and distance elasticities. The elasticity on

exportersвЂ™ GDP, for instance, rises by more than a factor of two from 0.73 with the PPML model

to 1.5 with the Weighted Tobit model. For the distance elasticity, the range is even larger,

from -0.78 with the PPML to -1.74 with the Tobit model.

       Several other features of Tables 1 and 2 are of interest. The first is that the

heteroscedasticity parameters are strikingly similar across estimators. A second is that the

differences between the estimators that account for heteroscedasticity and those that do not are

generally relatively small. This is clear for OLS and Feasible GLS, and for Tobit and Weighted

Tobit. A third is that the estimators accounting for limited dependent variable status tend to have

larger coefficients than those that do not. A fourth feature is that the PPML estimator has lower

coefficients on key distance and income terms than other estimators.




                                                 14
Table 1. Results for the Fixed Effects Gravity Model, Excluding Non-Reporters.
                                  OLS                  PPML                Feasible GLS               Tobit                 Wtd Tobit                   EK-H
                            Coeff       t-stat    Coeff    t-stat        Coeff       t-stat     Coeff         t-stat    Coeff      t-stat       Coeff          t-stat
    Log distance            -1.35       -43.1     -0.75    -18.5         -1.28       -44.6      -1.88         -44.9     -1.66      -42.3        -1.47          -26.1
     Contiguity              0.17        1.3       0.37     4.1          -0.01        -0.1      -0.57          -3.2     -0.68       -4.7        -0.60           -3.1
  Common language            0.41        6.0       0.38     4.1           0.39         6.2       0.79           9.2      0.70        8.4         0.65            6.7
     Colonial tie            0.67        9.5       0.08     0.6           0.68        10.4       0.98          10.9      1.09       12.3         0.99           10.5
        FTA                  0.31        3.2       0.38     4.9          -0.07        -0.8      -0.36          -2.5     -1.03       -8.4        -1.38           -9.5
    Fixed Effects            Yes                   Yes                    Yes                    Yes                     Yes
    Observations            9602                  16290                  9602                   16290                   16290                   16290
         R2                  0.75                  0.93                   0.80                   0.30                    0.31
      Hetero, ln
          a                 1.97        43.0        2.7      105.3                                                                               2.6            50.7
          b                 -0.12       -19.4      -0.09     -15.6                                                                              -0.12          -14.5
                                                   Mean       SD                                                                                Mean             SD
 Ln Predicted Trade                                 8.0       2.9                                                                                4.3             4.9

Source: AuthorвЂ™s estimates using Stata 16.0. The log of trade is the dependent variable in all but the PPML model. WLS and Weighted Tobit used

heteroscedasticity parameters shown under the OLS heading obtained by applying the Cameron and Trivedi (2009, p155) technique to residuals from the OLS

regression. Heteroscedasticity parameters for the PPML were estimated using an EK-H model to avoid truncation of the residuals in estimating the

heteroscedasticity function, with the log of the predicted value from the PPML as the explanatory variable, the log of actual trade in the density function in

equation (4) and the log of the threshold in the CDF.




                                                                                 15
Table 2. Results for the Traditional Gravity Model, Excluding Non-Reporters
                               OLS               PPML           Feasible GLS               Tobit            Weighted Tobit         EK-H
                          Coeff    t-stat   Coeff     t-stat   Coeff      t-stat   Coeff           t-stat   Coeff    t-stat   Coeff   t-stat
  Log exporter's GDP      0.94     80.6     0.73      27.3     0.96       86.9     1.49            94.6     1.50      99.1    1.49     91.3
  Log importer's GDP      0.80      71.7    0.74       26.9    0.83        78.4    1.15            75.3     1.17      80.0    1.15     73.6
 Log exporter's GDPC       0.21     12.5     0.16       2.9     0.21      13.1      0.27           12.1      0.25     11.6    0.24      9.6
 Log importer's GDPC        0.11      6.4     0.13       3.0     0.11       7.0      0.21            9.3      0.19      8.5    0.18      7.6
     Log distance          -1.17    -34.4    -0.78     -14.3    -1.15     -37.0     -1.74          -36.3     -1.62    -36.4   -1.41    -24.0
      Contiguity            0.32      2.2     0.20       1.9     0.10       0.9     -0.50           -2.4     -0.69     -4.1   -0.69     -3.0
   Common language          0.68     10.6     0.74       5.5     0.70      11.4      1.02           11.8      0.95     11.3    0.78      7.1
      Colonial tie          0.40      5.8     0.03       0.2     0.38       5.9      0.76           8.3       0.80      9.0    0.82      7.4
  Landlocked exporter      -0.06     -0.9    -0.86      -5.5    -0.09      -1.5     -0.11           -1.2     -0.14     -1.6    0.00     -0.1
  Landlocked importer      -0.66    -10.5    -0.69      -4.9    -0.65     -10.4     -0.74           -8.8     -0.78     -9.1   -0.62     -7.6
 ExporterвЂ™s remoteness      0.47      6.0     0.65       4.9     0.46       6.3      0.65            5.9      0.61      5.8    0.39      3.5
 ImporterвЂ™s remoteness     -0.21     -2.6     0.56       4.7    -0.13      -1.8     -0.31           -2.7     -0.26     -2.4   -0.30     -2.6
         FTA                0.48     4.5      0.18      2.1      0.17      1.9       0.32           2.0      -0.30     -2.3   -0.35     -2.5
       Openness            -0.17     -3.5    -0.12      -0.9    -0.16      -3.3      0.20            3.1      0.19      2.9    0.34      4.3
       Constant           -28.51    -26.2   -32.13     -15.4   -30.50     -30.2    -49.36          -32.2    -50.86    -35.3   -49.6    -31.9

      Observations        9602              16290              9602                16290                    16290             16290
          R2              0.66               0.86              0.72                 0.24                     0.26
       Hetero, ln
           a              2.21      48.0      2.8      115.1                                                                   2.8     94.3
           b              -0.11     -18.4   -0.098     -19.8                                                                  -0.11    -19.8
                                            Mean        SD                                                                    Mean      SD
   Ln Predicted Trade                         8.4       2.7                                                                    4.2      4.8

Source: As for Table 1.




                                                                    16
Comparison of the results for samples including Non-Reporters in Online Appendix Tables A.1

and A.2 provides some guidance on the extent of bias resulting from the missing trade values

considered in this paper. For OLS and Feasible GLS, no difference was expected or seen because

the estimation sample is almost unchanged. With PPML, the downward bias resulting from

adding these observations was almost invisible. The summary of key impacts with Tobit,

Weighted Tobit and EK-H in Figure 2, however, points to some larger differences. For GDP and

for distance using Fixed Effects, the limited-dependent estimators increased by less than 5

percent when the non-reporters were added, while the coefficients on per capita income rose

between 15 and 22 percent. The increases are consistent with the missing data sample including

some cases that would, like the left-hand square in Figure 1, have been small positive trade

values. However, there are some unexpected exceptions, with the distance coefficients lower in

absolute value in several cases. The unpredictable impacts of missing data evident both from

Figure 1 and in these results provide a strong case for identifying missing values masquerading

as zeros and removing them from the sample.




                                                17
Figure 2. Effects of Adding Missing Data on Coefficients, %


  25

  20

  15

  10

   5

   0

  -5

 -10




                                  Tobit    Wtd Tobit    EK-H


Note: Exp GDP is the natural log of Exporter GDP; Imp GDP is the natural log of importer GDP; Exp GDPC is the

natural log of Exporter GDP per capita; Imp GDPC is the natural log of Importer GDP per capita; Dist (Trad) is the

log of bilateral distance in the Traditional Gravity model; Dist (FE) is the log of distance in the Fixed Effects model.




                                                          18
IV.    Designing the Monte Carlo Experiments

Two questions guided the design of the Monte Carlo experiments: (i) are predicted trade and

model residuals best represented in logs or levels? and (ii) what are the parameters of the

distributions for traditional and FE gravity with PPML and EK-H estimators?

       Deviations from normality may result in large asymptotic biases with truncated

estimators like OLS in logs and much smaller bias with Tobit-type estimators (Arabmazar and

Schmidt 1982). To guard against such bias, Quantile plots were used to compare predictions and

residuals for each model-estimator combination against a benchmark of normality. These plots

for predictions and non-limit residuals from the Traditional Model using EK-H are presented in

the four panels of Figure A.1 in the Online Appendix, with heteroscedasticity removed from the

log residuals by dividing by their standard deviation at each observation, and similar plots for FE

gravity using PPML in Figure A.2. The same result was obtained in every caseвЂ”acceptance of

normality in logs and sharp rejection in levels, confirming SSTвЂ™s decision to perform their Monte

Carlo simulations in logs.

       The next preparatory step for the Monte Carlo analysis was to select the parameters for

the distributions of the logs of predicted trade and the log residuals, whose parameters are given

in the lower part of Tables 1 and 2. Because the PPML model is fitted to censored dataвЂ”which it

must convert into a continuous distribution because it uses a continuous version of the Poisson

distributionвЂ”the mean prediction used is larger (8.2 vs 4.2) and its standard deviation smaller

(2.8 vs 4.85) than for the uncensored latent variable in the EK-H model. While these two

distributions are quite different, and required two separate Monte Carlo analyses, both

distributions are very far from the log standard normal assumed by SST (2006). For the

residuals, a striking feature is the broad similarity across model types and estimators of the



                                                 19
heteroscedasticity parameters. The intercept was between 2.6 and 2.8 in all cases, while the

coefficients on predicted trade were between -0.09 and -0.12 in all cases.



V.       Monte Carlo Analyses


The Monte Carlo analyses were designed to be as simple as possible to focus on the question of

interest and to avoid the plethora of cases that made inference challenging in studies, such as

Martin and Pham (2020), using a wide range of models and degrees of heteroscedasticity.

Because the key challenge for estimation appears to be limited-dependent variable bias, all

coefficients should be affected equally (Greene 1981), allowing testing to focus on a single

explanatory variable.

         The experiments were performed by first generating observations using the model

presented graphically in Figure 1:

         (5)      y = ОІ.x + e

where all variables are in natural logarithms and ОІ=1. The pseudo-random samples used x ~

N(4.2, 4.85) for the EK-H-based data generating process (DGP) and N(8.2, 2.8) for the PPML-

based DGP. The error terms for the EK-H model were distributed with variance н µнјЋн µнјЋн µн±–н µн±– 2 = н µн±’н µн±’ (н µн±Ћн µн±Ћ+н µн±Џн µн±Џн µн±¦н µн±¦н µн±–н µн±– )

where a= 2.7 and b= вЂ“0.115, while simulations based on PPML parameters used a= 2.75 and b=

вЂ“0.095. Each simulation included 10,000 observations for comparability with typical samples of

bilateral trade flows, repeated 1000 times to obtain an indication of the standard error likely

associated with real-world analyses. The required random numbers were generated using the

pseudo-random number generator in STATA with seed value 10101.

         For the OLS estimator, the lowest 41 percent of outcomes were mapped to missing, and

eliminated from the sample. For the PPML, the values of the dependent variable were first

                                                         20
exponentiated into levels and then the lowest 41 percent of outcomes mapped to zero. For the

limited-dependent variable estimators, the smallest 41 percent of outcomes were replaced by the

minimum value above this range, the threshold value in the EK-H likelihood function. The

estimates for ОІ obtained from each of the four estimators considered, and the associated standard

errors, are presented in Table 3.




                                               21
Table 3. Coefficient Estimates and Standard Errors from Monte Carlo Results
                             EK-H Parameters                    PPML Parameters
                         пїЅ
                         ОІ               Std Error          пїЅ
                                                            ОІ               Std Error
 OLS                    0.62               0.01            0.52               0.01
 PPML                   0.87               0.13            0.78               0.11
 Tobit                  0.89               0.01            0.90               0.01
 Weighted Tobit         0.92               0.03            0.85               0.02
 MLE                   0.999               0.01           1.000               0.001
Note: ОІ=1




                                               22
These results suggest that Ordinary Least Squares is very seriously biased, with its coefficient

estimate with the EK-H parameters of 0.62 very close to GreeneвЂ™s (1983) prediction of 0.59 with

59 percent non-limit values. The PPML estimator appears to have much lower bias than OLS,

with downward bias of 13 percent using the EK-H based sample and 22 percent using parameters

derived with the PPML model itself.

       The Tobit model is less biased than the two models that do not account for the zeros, with

a downward bias of around 10 percent for both DGPs. Correction for heteroscedasticity in the

Tobit model reduces the bias to 8 percent using EK-H parameters, although this bias is 15

percent with PPML-based parameters.

       The EK-H estimator allowing for heteroscedasticity and limited-dependent variable bias

corrects for these problems and yields results almost exactly in line with the known true value of

1. This result extends the Head and Mayer (2014, Table 3.7, line 3) finding of low bias for an EK

Tobit estimator in the absence of heteroscedasticity to the real-world case with

heteroscedasticity. It is also consistent with Martin and Pham (2008) who obtained very low bias

from a heteroscedasticity-corrected Tobit model applied to data generated using an Eaton and

Tamura (1994) DGP.



VI.    Implications for Applied Research



A key consideration for applied research with the gravity model is whether there are many zero

values of the dependent variable. If there are, then models that do not explicitly allow for

potential sample selection biasвЂ”including OLS and PPMLвЂ”should be used with extreme

caution, considering the serious risk of biased parameter estimates.



                                                 23
          If the theoretical framework and analysis of data suggests the possible presence of bias

due to sample selection, heteroscedasticity/nonlinearity or missing values, then the extent of the

resulting bias will likely depend on parameters of the type considered in the design of the Monte

Carlo analysis in this paper. If the dataset involves very different shares of zeros and/or

distributions of predicted trade and model residuals from those in the dataset examined here, then

Monte Carlo analysis based on the parameters of the DGP is desirable before final choice of

estimator.

          Several papers have argued for the PPML estimator of the gravity model on the ancillary

grounds that it results in predicted trade flows that sum to actual imports and exports (Arvis and

Shepherd 2014; Fally 2015). This creates an advantage when gravity models estimated using

fixed effects are used to infer the multilateral resistance terms central to structural gravity. But if

this advantage of PPML comes with substantially biased coefficient estimates, the price for

analytical convenience seems too high. In this situation, the alternative of using fitted export and

import flows when inferring the multilateral resistance terms (Fally 2015, p78) seems a better

option.

          It is often attractive in applied work to use curated trade datasets such as the CEPII BACI

database (Gaulier and Zignago 2010), the IMF Direction of Trade database (Marini, Dippelsman

and Stanger 2018) used by Head and Mayer (2014) or the GTAP trade database (Gehlhar, Zhi

and Yao 2010) used by Fally (2015) rather than raw data from UN COMTRADE. The curators

of these databases add value, sometimes by choosing between the two estimates available for

many trade flowsвЂ”one from the reporter and one from its partnerвЂ”on the basis of data quality,

and sometimes by replacing missing values for intermittent reporters with trade patterns from

nearby years or with real data from other than their primary sources. Alternatively, researchers



                                                  24
can work with primary data but use similar data preparation steps to reduce the number of

missing values and improve data quality.

       Whoever processes the data, it is important to ascertain carefully what has been done in

the вЂњterra incognitaвЂќ where neither origin nor destination has reported data. Have these values

been set to zero as in the widely analyzed trade database used by SST (2006)? Or have model-

based predictions been introduced to fill out the dataset for their needs? Given the risk of biasвЂ”

of unknown directionвЂ”from use of missing values identified in this paper, and the futility of

reverse-engineering modelled dataвЂ”excluding these data from the analysis seems likely to be

preferable to their inclusion. When missing data have been set to zero, this does not upset the

adding up conditions emphasized by Fally (2015).

       The problem of unreported data remains important. Even though the number of

economies reporting trade data to COMTRADE more than doubled between 1992 and 2014,

over 60 economies identified as partners in the 2014 COMTRADE dataset did not report to

COMTRADE. This problem is probably even more serious with dependent variables other than

trade, such as foreign investment, foreign aid or migration (Artuc et al 2014). And the problem is

not unique to curated data sets. Missing values masquerading as zeros arise when bilateral trade

data arranged by importer-exporter are arranged into a bilateral trade matrix using tools such as

an EXCEL pivot table.



VII.   Summary and Conclusions



Having good estimates of the key parameters for the gravity model has become critically

important because of its increasingly wide use in economic theory, economic policy and



                                                25
hypothesis testing (Costinot and RodrГ­guez-Clare 2014). Unfortunately, widely used estimators

generate quite different results, making it particularly important to recognize which are suffering

from bias due to econometric challenges in estimating the gravity model.

       This paper considers three major econometric challenges for estimation of the gravity

modelвЂ”limited-dependent variable bias; a combination of heteroscedasticity and nonlinearity;

and missing data mapped to zeros. Many attempts have been made to assess, using Monte Carlo

simulations, which estimators are unbiased. Unfortunately, many of these have ignored key

features of the underlying data generating processesвЂ”whether by omitting all zeros from the

analysis; choosing arbitrary distributions for key variables; or examining only arbitrary patterns

of heteroscedasticity.

       This paper follows Leung and Yu (1996) in developing Monte Carlo simulations that

reflect the key parameters of the system. The analysis begins by identifying and removing nearly

2100 observations from the widely used SST (2006) dataset that are identifiable as missing

values mapped to zero. Simple quantile plots supported use of log-linear rather than linear

specifications for the Monte Carlo distributions of predicted trade flows and model residuals, so

this element of past practice was continued. The estimated degree of heteroscedasticity was quite

similar across three approaches to estimationвЂ”OLS, PPML, and Eaton-Kortum corrected for

heteroscedasticity (EK-H)вЂ”with larger proportional deviations from small trade flows than from

large. There were large differences in the distribution of predicted trade from the PPML model,

which predicts censored trade, and the EK-H model, which uses an unbounded latent variable.

To ensure robustness, Monte Carlo simulations were performed with both sets of parameters.

       The Monte Carlo results point to very large downward bias in coefficients when the

traditional estimator (OLS in logs) is used. The EK-H estimator accounting for heteroscedasticity



                                                26
exhibits almost no bias despite the large share of zeros in the sample. The PPML estimator

exhibits much less bias than the traditional OLS estimator but, at 13 to 18 percent downward

bias, is much less satisfactory than the EK-H estimator. For those who would prefer to use a

вЂњcannedвЂќ package, a weighted Tobit estimator suffered from downward bias of 8 to 15 percent.

While these recommendations on choice of estimator are specific to the type of dataset usedвЂ”a

cross-sectional matrix of total trade flowsвЂ”the framework for analysis provides a basis for

identifying the least biased estimators in other contexts.

       Like all research, this paper leaves many important and interesting questions to be

addressed. Although the Monte Carlo simulations were based on key features of the data

generating process, the results donвЂ™t appear to capture all features of the data. In particular, the

real-world analysisвЂ”on only one sampleвЂ”found lower estimates of key parameters with PPML

than with OLS, while the Monte Carlo analysis suggested greater downward bias using OLS.

With the insights on heteroscedasticity and selection bias in this paper, future researchers may be

better equipped to add additional features, such as the preferences for variety and extensive

margin growth emphasized in Helpman, Melitz and Rubinstein (2008), to help explain these

differences.




                                                  27
                                         References



Anderson, James E. and Eric van Wincoop, вЂњGravity with Gravitas: A Solution to the Border

       Puzzle,вЂќ American Economic Review 93:(2003), 170-92.

Anderson, James E., вЂњThe Gravity Model,вЂќ Annual Review of Economics 3(2011),133-60.

Arabmazar, Abbas. and Peter Schmidt, вЂњFurther Evidence on the Robustness of the Tobit

       Estimator to Heteroscedasticity,вЂќ Journal of Econometrics 17 (1981), 253-8.

Arabmazar, Abbas and Peter Schmidt, вЂњAn Investigation of the Robustness of the Tobit

       Estimator to Non-Normality,вЂќ Econometrica 50:4 (1982), 1055-63.

Arkolakis, Costas, Arnaud Costinot and AndrГ©s RodrГ­guez-Clare, вЂњNew Trade Models, Same

       Old Gains?вЂќ American Economic Review 102:1 (2012), 94-130.

Artuc, Erhan, FrГ©dГ©ric Docquier, Caglar Г–zden and Chrisopher Parsons, вЂњA Global Assessment

       of Human Capital Mobility: The Role of Non-OECD Destinations,вЂќ World Development

       65 (2014), 6-26.

Arvis, Jean-FranГ§ois and Ben Shepherd, вЂњThe Poisson Quasi-Maximum Likelihood Estimator: a

       Solution to the вЂAdding UpвЂ™ Problem in Gravity Models,вЂќ Applied Economics Letters

       20:6 (2014), 515-9.

Cameron, A. Colin and Pravin Trivedi, Microeconomics Using Stata, Stata Press, College

       Station, TX (2009).

Carson, Richard and Yixiao Sun вЂњThe Tobit Model with a Non-Zero Threshold,вЂќ Econometrics

       Journal 10 (2007), 488вЂ“502.

Chaney, Thomas, вЂњDistorted Gravity: The Intensive and Extensive Margins of International

       Trade,вЂќ American Economic Review 98:4 (2008),1707вЂ“21.



                                              28
Chaney, Thomas, вЂњThe Gravity Equation in International Trade: an Explanation,вЂќ Journal of

       Political Economy 126:1(2018),150-77.

Chitu, Livia, Barry Eichengreen and Arnaud Mehl, вЂњHistory, Gravity and International Finance,вЂќ

       Journal of International Money and Finance 46 (2014),104-29.

Costinot, Arnaud and AndrГ©s RodrГ­guez-Clare, вЂњTrade Theory with Numbers: Quantifying the

       Consequences of Globalization,вЂќ ch 3 in Gopinath, G., Helpman, E. and Rogoff, K. eds

       Handbook of International Economics, Vol 4 Elsevier, Amsterdam (2014),

       http://dx.doi.org/10.1016/B978-0-444-54314-1.00004-5

Donald, Stephen, вЂњTwo Step Estimation of Heteroskedastic Sample Selection Models,вЂќ in

       Journal of Econometrics 65:2 (1995), 347-80.

Eaton, Jonathan and Samuel Kortum, S. вЂњTrade in Capital Goods,вЂќ European Economic Review

       45 (2001),1195-235.

Eaton, Jonathan, Samuel Kortum and Sebastian Sotelo, вЂњInternational Trade: Linking Micro and

       Macro,вЂќ in Acemoglu, D., Arellano, M. and Dekel, E. eds. Advances in Economics and

       Econometrics, Tenth World Congress, Vol II: Applied Economics Cambridge University

       Press for the Econometric Society (2013).

Eaton, Jonathan. and Akiko Tamura, вЂњBilateralism and Regionalism in Japanese and US Trade

       and Direct Foreign Investment,вЂќ Journal of the Japanese and International Economies 8

       (1994), 478-510.

Fally, Thibault, вЂњStructural Gravity and Fixed Effects,вЂќ Journal of International Economics 97:1

       (2015), 75-86.

Feenstra, Robert, Advanced International Trade: Theory and Evidence. Princeton University

       Press, Princeton, New Jersey (2004).



                                               29
Feenstra, Robert, Robert Lipsey and Harry Bowen, вЂњWorld Trade Flows, 1970-1992, With

       Production and Tariff Data,вЂќ NBER Working Paper 5910, National Bureau of Economic

       Research, Cambrige MA (1997). https://www.nber.org/papers/w5910.pdf

Gaulier, G. and Zignago, S. (2010), вЂњBACI: International Trade Database at the Product Level:

       the 1994-2007 Version,вЂќ CEPII Working Paper 2010-23, CEPII, Paris.

Gehlhar, Mark, Zhi Wang and Shunli Yao, вЂњGTAP 7 Data Base Documentation- Chapter 9.A:

       Reconciling Merchandise Trade Data,вЂќ (2010) https://tinyurl.com/yyh7a5ua

Greene, William. вЂњOn the Asymptotic Bias of the Ordinary Least Squares Estimator of the Tobit

       Model,вЂќ Econometrica 49:2 (1981), 505-13.

Head, Keith. and Thierry Mayer, вЂњGravity Equations: Workhorse, Toolkit and Cookbook,вЂќ in

       Gopinath, Gita, Ehanan Helpman, and Ken Rogoff, eds Handbook of International

       Economics, Vol 4: (2014), 131-201.

Heckman, James, вЂњSample Selection Bias as a Specification Error,вЂќ Econometrica 47: (1979),

       153-61.

Helpman, Elhanan, Marc Melitz and Yona Rubinstein, вЂњEstimating Trade Flows: Trading

       Partners and Trading Volumes,вЂќ Quarterly Journal of Economics CXXIII:2 (2008), 441-

       87.

Jochmans, Koen. вЂњTwo Way Models for Gravity,вЂќ Review of Economics and Statistics 99:3:

       (2017), 478вЂ“85.

Leung, Siu. Fai. and Shihti Yu, вЂњOn the choice between sample selection and two-part models,вЂќ

       Journal of Econometrics 72 (1996), 197-229.

Marini, Marco, Robert Dippelsman and Michael Stanger вЂњNew Estimates for Direction of Trade

       Statistics,вЂќ WP/18/16, (2018) International Monetary Fund, Washington DC.



                                              30
Martin, Will and Cong Pham, вЂњEstimating the Gravity Model when Zero Trade Flows are

       Frequent,вЂќ Working Paper 2008_03, (2008), Deakin University.

       https://tinyurl.com/ybwj2hrk

Martin, Will and Cong Pham, вЂњEstimating the Gravity Model when Zero Trade Flows are

       Frequent and Economically Determined,вЂќ Applied Economics 52:26 (2020), 2766-79.

Olsen, Randall, вЂњApproximating a Truncated Normal Regression with the Method of Moments,вЂќ

       Econometrica 48:5 (1980), 1099-1105.

Ottaviano, Gianmarco, вЂњEuropean Integration and the Gains from Trade,вЂќ ch 12 in Badinger,

       Harald and Volker Nitsch, V. eds Routledge Handbook of the Economics of European

       Integration, Routledge, London and New York (2016).

Redding, Stephen and Anthony Venables, вЂњEconomic Geography and International Inequality,вЂќ

       Journal of International Economics 62:1 (2004), 53вЂ“82.

Santos Silva, JoГЈo and Silvana Tenreyro, вЂњThe Log of Gravity,вЂќ Review of Economics and

       Statistics 88 (2006), 641-58.

Santos Silva, JoГЈo and Silvana Tenreyro, (2011), вЂњFurther Simulation Evidence on the

       Performance of the Poisson Pseudo-maximum Likelihood Estimator,вЂќ Economics Letters

       112:220вЂ“2.

Tinbergen, Jan, Shaping the World Economy: Suggestions for an International Economic Policy.

       The Twentieth Century Fund, New York (1962).

Tobin, James, вЂњEstimation of Relationships for Limited Dependent Variables,вЂќ Econometrica 26

       (1958), 24-36.

Yotov, Yoto, Roberta Piermartini, JosГ©-Antonio Monteiro and Mario Larch, An Advanced Guide

       to Trade Policy, UNCTAD and the World Trade Organization, Geneva (2016).



                                              31
                     Online Appendix for вЂњMaking Gravity Great AgainвЂќ


Tables A.1 to A.3 of this Online Appendix present econometric results in parallel to those in

Tables 1 to 3 of the paper. These results, for the full sample used by Santos Silva and Tenreyro

(2006), are included both to show the effects on key coefficient estimates of excluding missing

Non-Reporter to Non-Reporter data from the sample, and to confirm that the key Monte Carlo

results reported in the paper are robust to the exclusion of these missing data.

       Quantile Plots for the normality of predicted trade and non-limit residuals from the

Traditional Gravity model estimated using the EK-H estimator on the sample excluding Non-

Reporters are presented in Figure A.1. Estimates for the Fixed Effects model using the PPML

estimator on the same sample are presented in Figure A.2. These plots are presented in levels and

in logs to provide guidance on which provides a better representation.
Table A.1. Results for the Gravity Model with Fixed Effects, Including Non-Reporters.
                                 OLS               PPML            Feasible GLS           Tobit               Wtd Tobit               EK-H
                           Coeff    t-stat   Coeff      t-stat   Coeff       t-stat   Coeff       t-stat   Coeff     t-stat   Coeff          t-stat
 Log distance              -1.35    -43.1    -0.75      -18.5    -1.28       -44.6    -1.79       -42.5    -1.59     -39.9    -1.49          -28.4
 Contiguity                0.17      1.3      0.37       4.1     -0.01        -0.1    -0.79        -4.5    -0.76      -5.2    -0.75           -3.6
 Common language           0.41      6.0      0.38       4.1      0.39         6.2     0.79         9.2     0.73      8.6      0.72           7.7
 Colonial tie              0.67      9.5      0.08       0.6      0.68        10.4     0.90        10.0     1.02     11.4      0.93           9.7
 FTA                       0.31      3.2      0.38       4.9     -0.07        -0.8    -0.49        -3.3    -1.21     -9.63    -1.38           -8.7
 Fixed Effects              Yes               Yes                 Yes                  Yes                  Yes                Yes
 Obsvns                    9613              18360               9613                 18360                18360              18360
 R2                        0.75                                   0.80
 Hetero
        a                   2.0      43.0     2.7       137.7                                                                  2.41          72.6
        b                  -0.1     -19.4    -0.1       -16.5                                                                 -0.09          -15.7
                                             Mean        SD                                                                   Mean            SD
 Ln Predicted Trade                           7.6        3.1                                                                   3.5            5.2
Sources: As for Table 1.
         Table A.2 Results for Alternative Estimators, Including Non-Reporters
                             OLS              PPML           Feasible GLS               Tobit            Weighted Tobit         EK-H
                        Coeff    t-stat   Coeff    t-stat   Coeff      t-stat   Coeff           t-stat   Coeff    t-stat   Coeff   t-stat
Log exporter's GDP      0.94      80.6    0.73      27.3    0.96        86.9    1.54            93.9     1.55      99.0    1.54     92.4
Log importer's GDP       0.80      71.9    0.74      27.0    0.83       78.5     1.19            75.7     1.22      80.8    1.20     74.6
Log exporter's GDPC      0.21      12.5    0.16       2.9    0.21       13.0     0.31            13.2     0.29      12.8    0.29     11.3
Log importer's GDPC      0.11       6.4    0.14       3.0    0.11        7.0     0.25            10.6     0.22       9.8    0.22      9.1
Log distance            -1.17     -34.4   -0.78     -14.4   -1.15      -37.0    -1.64           -33.6    -1.57     -34.2   -1.39    -24.0
Contiguity               0.31       2.2    0.19       1.9    0.10        0.9    -0.77            -3.8    -0.81      -4.7   -0.82     -3.5
Common language          0.68      10.6    0.75       5.5    0.69       11.4     1.11            12.7     1.06      12.4    0.95      9.0
Colonial tie             0.40       5.8    0.03       0.2    0.38        5.9     0.59             6.4     0.64       7.0    0.60      5.5
Landlocked exporter     -0.06      -1.0   -0.86      -5.5   -0.09       -1.5    -0.05            -0.6    -0.11      -1.3    0.02      0.3
Landlocked importer     -0.66     -10.5   -0.70      -5.0   -0.65      -10.4    -0.70            -8.1    -0.76      -8.8   -0.61     -7.2
ExporterвЂ™s remoteness    0.47       6.0    0.66       4.9    0.47        6.4     0.77             6.7     0.74       6.8    0.55      4.8
ImporterвЂ™s remoteness   -0.20      -2.5    0.56       4.7   -0.13       -1.7    -0.19            -1.6    -0.14      -1.2   -0.20     -1.7
FTA                      0.49       4.7    0.18       2.1    0.18        2.0     0.46             2.7    -0.25      -1.9   -0.29     -2.0
Openness                -0.17      -3.5   -0.11      -0.8   -0.16       -3.3     0.56             8.5     0.55       8.3    0.74      9.1
                                                                                                                              -
Constant                -28.49    -26.2   -32.33    -15.7   -30.50     -30.2    -55.53          -34.9    -57.29    -38.3   55.76    -34.8

Obsvns                  9613              18360             9613                18360                    18360             18360
R2                      0.66              0.860             0.720                0.25                     0.27
Hetero
a                        2.21      48.1                                                                                     2.73    145.7
b                       -0.11     -18.4                                                                                    -0.09    -20.3
                                          Mean     SD                                                                      Mean      SD
Ln Predicted Trade                         8.0     2.8                                                                      3.3      5.2
Table A.3. Coefficient Estimates and Standard Errors from Monte Carlo Results
                               EK-H Parameters                    PPML Parameters
                           пїЅ
                           ОІ              Std Error              пїЅ
                                                                 ОІ                 Std Error
 OLS                      0.58              0.09               0.48                  0.01
 PPML                     0.84              0.17               0.79                  0.11
 Tobit                    0.89              0.001              0.91                  0.01
 EK-H                   0.9999              0.008             0.9999                0.014
Note: Based parameters estimated including Non-Reporters, with 50 percent zeros.
Figure A.1. Quantile Plots: EK-H Model, Traditional Gravity, Excl Non-Reporters
 1. Predicted Values, logs                                     2. Predicted Values, Levels




                                                                   1.000e+09
         20        10
 Linear prediction




                                                               5.000e+08
                                                                 zblevel
       0 -10




                                                                   0
         -20




                        -20   -10       0            10   20       -1.000e+08   -50000000          0         50000000   1.000e+0
                                    Inverse Normal                                          Inverse Normal



 3. Residuals, logs                                            4. Residuals, levels
Figure A.2. Quantile Plots: PPML Model, Fixed Effects Gravity, Excl Non-Reporters
 1. Predicted Values, logs              2. Predicted Values, Levels




 3. Residuals, logs                     4. Residuals, levels
                                            Reference


Santos Silva, JoГЈo and Silvana Tenreyro, вЂњThe Log of Gravity,вЂќ Review of Economics and

       Statistics 88 (2006), 641-58.