A Flexible Modeling Framework to Estimate Interregional
                      Trade Patterns and Input-Output Accounts


            Patrick Canning, Economic Research Service, US Department of Agriculture
                     Zhi Wang, World Bank and City University of Hong Kong *


                                                  Abstract

         This study implements and tests a mathematical programming model to estimate

interregional, interindustry transaction flows in a national system of economic regions based on

an interregional accounting framework and initial information of interregional shipments. A

national input-output (IO) table, regional data on gross output, value-added, exports, imports and

final demand at sector level are used as inputs to generate an interregional IO account that

reconciles regional economic statistics and interregional transaction data. The model is tested

using data from a multi-regional global input-output database and shows remarkable capacity to

discover true interregional trade patterns from highly distorted initial estimates.


JEL Classification Numbers: R1, C67, C81




World Bank Policy Research Working Paper 3359, July 2004

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the
exchange of ideas about development issues. An objective of the series is to get the findings out quickly,
even if the presentations are less than fully polished. The papers carry the names of the authors and should
be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely
those of the authors. They do not necessarily represent the view of the World Bank, its Executive Directors,
or the countries they represent. Policy Research Working Papers are available online at
http://econ.worldbank.org.



* Patrick Canning is a senior economist at the Economic Research Service, United States
Department of Agriculture, 1800 M Street, NW, Washington, D.C. 20036; email:
pcanning@ers.usda.gov. Zhi Wang is a consultant at the World Bank and adjunct Professor at
City University of Hong Kong. Corresponding address: 838 Summer Walk Drive, Gaithersburg,
Maryland 20878. E-mail: zwang53@comcast.net. The authors thank the editor of this Journal and
three anonymous referees for their valuable comments.





                                      Executive Summary



         There are tremendous disparities in economic development across regions in large

developing countries such as China, India, Indonesia and Brazil. Globalization may have different

impacts on developed urban and coastal areas and on less developed rural and inland regions. A major

obstacle in conducting policy analysis for regional economic development under globalization is

the lack of consistent, reliable regional data, especially data on interregional trade and

interindustry transactions. This study implements and tests a mathematical programming model to

estimate interregional, interindustry transaction flows in a national system of economic regions

based on an interregional accounting framework and initial information on interregional

shipments. A complete national input-output table plus regional sectoral data on gross output,

value-added, exports, imports and final demand are used as inputs to generate an interregional

input-output system that reconciles regional market data and interregional transactions. The

model is tested on a four-region, 10-sector example against data aggregated from a multi-regional

global input-output database, and test results from seven experiments are evaluated against eight

mean absolute percentage error indexes. The model has the capacity to discover the true interregional

trade pattern from highly distorted initial estimates. The paper also discusses some general

guidelines for implementing the model for a large-dimension, multi-regional account based on

real national and regional data.





1. INTRODUCTION


        A major obstacle in regional economic analysis and empirical economic geography is the

lack of consistent, reliable regional data, especially data on interregional trade and interindustry

transactions. Despite decades of efforts by regional economists, data analogous to national input-

output accounts and international trade accounts, which have become increasingly available to the

public today, still are generally not available even for well defined sub-national regions in many

developed countries. Therefore, economists have had to develop various non-survey and semi-

survey methods to estimate such data. In earlier years, quotient based, gravity based and regional

purchase coefficient based non-survey methods were popular but lacked logical and theoretical

structures, and so have been deemed "deficiency methods" (Jensen, 1990).

        Since the 1980s, various constrained matrix-balancing procedures have become

increasingly popular for estimating unknown data based on limited initial information subject to a

set of linear constraints. Attempts have been made to estimate regional and interregional

transactions in a unified national accounting system of economic regions. Batten (1982) extended

earlier work by Wilson (1970)1 and laid out an optimization model based on information theory

and linkages between national and regional input-output accounts to simultaneously estimate

interregional deliveries in both intermediate and final goods. Batten and Martellato (1985)

establish a simple hierarchical relationship among five classical models associated with authors

such as Isard, Chenery and Leontief that address interregional trade within an input-output

system. They find those models could be reduced to a statistical estimation problem based on

varying degrees of available interregional trade data and demonstrate that the net effect of

additional data or additional theoretical assumptions is similar in reducing the number of

unknown variables in the underdetermined estimation problems. They also demonstrate such

estimation problems are best undertaken with a closed system, i.e., when all the geographic

components of the national or state data are estimated simultaneously. Following this philosophy,





Byron et al. (1993), Boomsma and Oosterhaven (1992) and Trendle (1999) find evidence that the

additional accounting constraints imposed by such a closed system are useful as a checking

device on individual cell values and so improve estimation accuracy. Golan, Judge and Robinson

(1994) further generalize such an estimation problem to an ill-posed, underdetermined, pure

inverse problem that can be formulated in an optimization context that involves a nonlinear

criterion function and certain adding up and consistency constraints. They also show that under

such a framework, it is easy to take account of whatever initial information and data that exist

through the specification of additional constraints. However, they do not pay attention to how

such procedures could be used in a multi-regional context and thus the potential gain from

implementing the procedure in a closed national system of economic regions.

         Methods for matrix balancing can be classified into two broad classes -- bi-proportional

scaling and mathematical programming. The scaling methods adjust the initial matrix by

multiplying its rows and columns by positive constants until the matrix is

balanced. The approach was developed by Stone and other members of the Cambridge Growth Project (Stone

et al., 1963) and is usually known as RAS. The basic method was originally applied to known

row and column totals but has been extended to cases where the totals themselves are not known

with certainty (Senesen and Bates, 1988; Lahr, 2001). Mathematical programming methods are

explicitly based on a constrained optimization framework, usually minimizing a penalty function,

which measures the deviation of the balanced matrix from the initial matrix subject to a set of

balance conditions.
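
As an illustration of the scaling approach, the following is a minimal Python sketch of bi-proportional (RAS) adjustment. The function name `ras_balance`, the tolerance, and the small 3 × 3 matrix with its row and column totals are hypothetical choices made for this example and are not taken from the studies cited above.

```python
import numpy as np

def ras_balance(initial, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Bi-proportional (RAS) scaling of a positive matrix to known margins."""
    x = np.asarray(initial, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        # Scale every row so that the row sums match the target row totals.
        x *= (row_totals / x.sum(axis=1))[:, None]
        # Scale every column so that the column sums match the target column totals.
        x *= (col_totals / x.sum(axis=0))[None, :]
        # Column sums are now exact; stop once the row sums are also within tolerance.
        if np.max(np.abs(x.sum(axis=1) - row_totals)) < tol:
            return x
    raise RuntimeError("RAS did not converge")

# Hypothetical initial matrix and margins (both sets of totals sum to 120).
initial = np.array([[10.0, 20.0, 5.0],
                    [15.0, 5.0, 30.0],
                    [8.0, 12.0, 10.0]])
row_totals = np.array([40.0, 45.0, 35.0])
col_totals = np.array([30.0, 50.0, 40.0])

balanced = ras_balance(initial, row_totals, col_totals)
print(balanced.round(3))
```

The sketch handles only fixed row and column totals. More general linear restrictions or reliability weights do not fit naturally into this iteration, which is one motivation for the programming formulation discussed in the text.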

         Scaling methods such as RAS have been one of the most widely applied computational

algorithms for the solution of constrained matrix balancing problems. They are simple, iterative,

and require minimal programming effort to implement. However, as pointed out by van der Ploeg

(1982), they are not straightforward to use when including more general linear restrictions and

when allowing for different degrees of uncertainty in the initial estimates and restraints. They also

lack a theoretical interpretation of the adjustment process. Those aspects are crucial for an




adjustment procedure to improve the information content of the balanced estimates rather than

only adjusting the initial estimates mechanically. Mohr, Crown and Polenske (1987) discuss the

problems encountered when the RAS procedure is used to adjust trade flow data. They point out

that the special properties of interregional trade data increase the likelihood of non-convergence

of the RAS procedure and propose a linear programming approach that incorporates exogenous

information to overcome the infeasibility of the RAS problem.

        In recent years, more and more researchers have tended to formulate constrained matrix

balancing problems as mathematical programming problems (van der Ploeg, 1988; Nagurney and

Robinson, 1989; Bartholdy, 1991; Byron et al., 1993), with an objective function that forces

"conservatism" on the process of rationalizing $X$ from the initial estimate $\bar{X}$. The theoretical

foundation for the approach can be viewed from both the perspectives of mathematical statistics

and information theory, and the solution of RAS is equivalent to constrained entropy

minimization with fixed row and column totals, as shown by Bregman (1967) and McDougall

(1999), and thus can be seen as a special case of the optimization methods2.

        Another important advantage of mathematical programming models over scaling

methods is their flexibility, which allows a wide range of initial information to be used

efficiently in the data adjustment process. Additional constraints can be easily imposed, such as

allowing precise upper and lower bounds to be placed on unknown elements, inequality

conditions, or incorporating an associated term in the objective function to penalize solution

deviations from the initial row or column total estimates when they are not known with certainty.

Therefore, it provides more flexibility to the matrix balancing procedure. This flexibility is very

important in terms of improving the information content of the balanced estimates, as shown by

Robinson, Cattaneo and El-Said (2001).

        A mathematical programming approach also permits one to routinely introduce relative

degrees of reliability for initial estimates. The idea of including data reliability in matrix





balancing can be traced back over half a century to Richard Stone and his colleagues (1942) when

they explored procedures for compiling national income accounts. Their ideas were formalized

into a mathematical procedure to balance the system of accounts after assigning reliability

weights to each entry in the system. The minimization of the sum of squares of the adjustments

between initial entries and balanced entries in the system, weighted by the reliabilities or the

reciprocal of the variances of the entries is carried out subject to linear (accounting) constraints.

This approach was first operationalized by Byron (1978) and applied to the System of

National Accounts of the United Kingdom by van der Ploeg (1982, 1984). Zenios and his

collaborators (1989) further extend this approach to balance a large social accounting matrix in a

nonlinear network-programming framework. Robinson and his colleagues (2001) provide a way

to handle measurement error in cross entropy minimization via an error-in-variables formulation.

Although computational burden is no longer a problem today, the difficulty of estimating the

error variances in a large data set by such approaches still remains unsolved.

        The objectives of this paper are threefold. The first is to develop and implement a formal

model to estimate interregional, interindustry transaction flows in a national system of economic

regions based on incomplete statistical information at the regional level. The second is to

evaluate the model's performance against data from the real world. And the third is to discuss the

issues that arise when applying this modeling framework to estimate a multi-regional input-output

account containing well-defined sub-regions.

        The paper is organized as follows. Section 2 specifies the modeling framework and

discusses its theoretical and empirical properties. Section 3 tests the model by using a four-region,

ten-sector data set compiled from a global database documented in McDougall, Elbehri, and

Trong (1998). Test results from seven experiments are evaluated against eight mean absolute

percentage error indexes. Section 4 discusses some empirical issues involved in applying such a

framework to data from a national statistical system. The paper ends with conclusions and

direction for future research.




2. MATHEMATICAL PROGRAMMING MODEL FOR ESTIMATING INTERREGIONAL
TRADE AND INTERINDUSTRY TRANSACTION FLOWS

         Our model builds upon earlier work by Wilson (1970) and Batten (1982) with two

important departures. First, it explicitly incorporates interregional trade flow information into

both its accounting framework and initial estimates. We find this greatly enhances the accuracy of

estimation results. Second, a multi-regional input-output (MRIO) account is estimated first, then

extended to an interregional input-output (IRIO) account, which substantially reduces data

requirements and the "dimension explosion" problem in real world applications.

         Consider a national economy consisting of N sectors that are distributed over M regions.

The sectors use each other's products as inputs for their own production, which is in turn used up

either in further production or by final users. Each region exports some of its products to other

regions and some to other nations. They also import products from other regions and nations to

meet their intermediate and final demand. Assuming a predetermined location of production that

defines the structure of the national economic system of regions, the deliveries of goods and

services between regions are determined by imbalances between supply and demand inside the

different regions.

         Denote $x_i^r$, $y_i^r$, $v_i^r$, $e_i^r$, and $m_i^r$ as sector i's gross output, final demand (excluding

exports), value-added, exports, and imports in region r respectively, and denote $x_i$, $y_i$, $v_i$, $e_i$,

and $m_i$ as their respective national counterparts. Also denote $d_i^{sr}$ as the delivery of sector i's product

from region s to region r, and $z_{ij}^{\cdot r}$ and $z_{ij}$ as intermediate transactions from sector i to sector j in

region r and at the national level respectively.3 All variables are measured in annual values. In such a

static national system of economic regions, the following accounting identities must hold in each

given year for all $i \in N$ and $s, r \in M$.





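(1)    $\sum_{j=1}^{n} z_{ji}^{\cdot r} + v_i^r = x_i^r$

(2)    $\sum_{j=1}^{n} z_{ij}^{\cdot r} + y_i^r = \sum_{s=1}^{m} d_i^{sr} + m_i^r$

(3)    $\sum_{s=1}^{m} d_i^{rs} + e_i^r = x_i^r$

(4)    $\sum_{r=1}^{m} z_{ij}^{\cdot r} = z_{ij}$

(5)    $\sum_{r=1}^{m} x_i^r = x_i$

(6)    $\sum_{r=1}^{m} v_i^r = v_i$

(7)    $\sum_{r=1}^{m} y_i^r = y_i$

(8)    $\sum_{r=1}^{m} e_i^r = e_i$

(9)    $\sum_{r=1}^{m} m_i^r = m_i$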


        Collectively, equations (1) to (9) define a multi-regional input-output account (MRIO).

Such an account stops short of assigning specific intermediate or final uses for inter/intra regional

product flows, but guarantees that these flows exactly meet all regional demands. The economic

meanings of the nine equations are straightforward. Equation (1) states that the sum of

sector i's intermediate and primary factor inputs equals the sector's total output in each region.

Equation (2) states that the sum of each region's intermediate and final demand must be met by

deliveries from all regions (including itself) within the nation plus imports from other

nations. Equation (3) states that a region can deliver to regions within the nation and export to

other nations only what it produces, while equations (4) to (9) simply state that the sums of all

regions' economic activities within a nation must equal the national totals.
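
To make the index conventions in identities (1) to (3) concrete, the sketch below builds a small synthetic two-sector, two-region account by construction and checks that the residuals vanish. All numbers are randomly generated for illustration only. The array layout (`z[i, j, r]`, `d[i, s, r]`) and the function name `mrio_residuals` are assumptions of this sketch, with `d[i, s, r]` read as the delivery of product i shipped from region s and received by region r.

```python
import numpy as np

def mrio_residuals(z, v, y, e, imp, d, x):
    """Residuals of identities (1)-(3); all zeros for a consistent account.

    z[i, j, r]: intermediate use of product i by sector j in region r
    d[i, s, r]: delivery of product i from region s to region r
    v, y, e, imp, x: arrays of shape (n, m), indexed [sector, region]
    """
    res1 = z.sum(axis=0) + v - x                    # (1): inputs + value added = output
    res2 = z.sum(axis=1) + y - d.sum(axis=1) - imp  # (2): demand = deliveries received + imports
    res3 = d.sum(axis=2) + e - x                    # (3): deliveries shipped + exports = output
    return res1, res2, res3

n, m = 2, 2                      # hypothetical: two sectors, two regions
rng = np.random.default_rng(0)   # synthetic data, for illustration only
z = rng.uniform(1.0, 5.0, (n, n, m))
v = rng.uniform(10.0, 20.0, (n, m))
d = rng.uniform(5.0, 6.0, (n, m, m))
imp = rng.uniform(1.0, 2.0, (n, m))

x = z.sum(axis=0) + v                      # output implied by identity (1)
e = x - d.sum(axis=2)                      # exports implied by identity (3)
y = d.sum(axis=1) + imp - z.sum(axis=1)    # final demand implied by identity (2)

residuals = mrio_residuals(z, v, y, e, imp, d, x)
print([float(np.abs(res).max()) for res in residuals])
```

Summing identities (2) and (3) over regions eliminates the interregional deliveries and leaves the familiar national balance: output equals intermediate use plus final demand plus exports minus imports.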




         Suppose statistics exist for each regional sector on gross output and value added

($x_i^r$ and $v_i^r$), the origin of exports and destination of imports ($e_i^r$ and $m_i^r$), and final regional

demand ($y_i^r$). The MRIO estimation problem can be formally stated as follows:


Given an n × m × m non-negative array $\bar{D} = \{\bar{d}_i^{sr}\}$ and an n × n × m non-negative array

$\bar{Z} = \{\bar{z}_{ij}^{\cdot r}\}$, determine a non-negative array $D = \{d_i^{sr}\}$ and a non-negative array $Z = \{z_{ij}^{\cdot r}\}$ that are

close to $\bar{D}$ and $\bar{Z}$ such that equations (1) to (9) are satisfied, where $s \in M$ denotes the shipping

regions, $r \in M$ denotes the receiving regions, and $i, j \in N$ denote the make and use sectors

respectively.


In plain English, the estimation problem is to modify a given set of initial inter-regional and inter-

industrial transaction estimates to satisfy the above nine known accounting constraints. The

mathematical programming model used to conduct the estimation employs an objective function

that penalizes the deviations of the estimated arrays $D$ and $Z$ from the initial arrays $\bar{D}$ and $\bar{Z}$. Two

types of alternative functional forms could be used:

         (i) Quadratic function:



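(10)    $\min S = \frac{1}{2} \left\{ \sum_{i=1}^{n} \sum_{s=1}^{m} \sum_{r=1}^{m} \frac{(d_i^{sr} - \bar{d}_i^{sr})^2}{wd_i^{sr}} + \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{r=1}^{m} \frac{(z_{ij}^{\cdot r} - \bar{z}_{ij}^{\cdot r})^2}{wz_{ij}^{\cdot r}} \right\}$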

         (ii) Cross-entropy function (Harrigan & Buchanan, 1984, Golan et al., 1994):


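(11)    $\min S = \sum_{i=1}^{n} \sum_{s=1}^{m} \sum_{r=1}^{m} \frac{d_i^{sr}}{wd_i^{sr}} \ln\!\left(\frac{d_i^{sr}}{\bar{d}_i^{sr}}\right) + \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{r=1}^{m} \frac{z_{ij}^{\cdot r}}{wz_{ij}^{\cdot r}} \ln\!\left(\frac{z_{ij}^{\cdot r}}{\bar{z}_{ij}^{\cdot r}}\right)$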



         There are desirable theoretical properties of the above estimation framework that are well

documented in the literature. Firstly, it is a separable nonlinear programming problem subject to

linear constraints. The entropy function is motivated from information theory and is the objective

function underlying the well-known RAS procedure with row and column totals known with

certainty (Senesen and Bates, 1988). It measures the information surprise contained in D and Z




   given the initial estimates $\bar{D}$ and $\bar{Z}$. The quadratic penalty function is motivated by statistical

   arguments. Different choices of the reliability weights $wd_i^{sr}$ and $wz_{ij}^{\cdot r}$ give the model

   different statistical interpretations. When the weights are all equal to one, the solution of

   this model gives a constrained least squares estimator. When the initial estimates are taken as the

   weights, the solution of the model gives a weighted constrained least squares estimator, which is

   identical to the Friedlander solution and a good approximation of the RAS solution. When those

   weights are proportional to the variances of the initial estimates and the initial estimates are

   statistically independent (the variance-covariance matrices of $\bar{D}$ and $\bar{Z}$ are diagonal), the

   solution of the model yields best linear unbiased estimates of the true unknown matrix (Byron,

   1978), which is identical to the Generalized Least Squares estimator if the weights are equal to

   the variance of initial estimates (Stone, 1984, van der Ploeg, 1984). Furthermore, as noted by

   Stone et al. (1942) and proven by Weale (1985), in cases where the error distributions of the

   initial estimates are normal, the solution also satisfies the maximum likelihood criteria.

            Secondly, the quadratic and entropy objective functions are equivalent in the

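   neighborhood of the initial estimates, under a properly selected weighting scheme. Taking a

   second-order Taylor expansion of equation (11) at the point ($\bar{d}_i^{sr}$, $\bar{z}_{ij}^{\cdot r}$), we have

(12)    $S = \sum_{i=1}^{n} \sum_{s=1}^{m} \sum_{r=1}^{m} \left\{ (d_i^{sr} - \bar{d}_i^{sr}) + \frac{(d_i^{sr} - \bar{d}_i^{sr})^2}{2 \bar{d}_i^{sr}} \right\} + \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{r=1}^{m} \left\{ (z_{ij}^{\cdot r} - \bar{z}_{ij}^{\cdot r}) + \frac{(z_{ij}^{\cdot r} - \bar{z}_{ij}^{\cdot r})^2}{2 \bar{z}_{ij}^{\cdot r}} \right\}$

        $= \frac{1}{2} \left\{ \sum_{i=1}^{n} \sum_{s=1}^{m} \sum_{r=1}^{m} \frac{(d_i^{sr} - \bar{d}_i^{sr})^2}{\bar{d}_i^{sr}} + \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{r=1}^{m} \frac{(z_{ij}^{\cdot r} - \bar{z}_{ij}^{\cdot r})^2}{\bar{z}_{ij}^{\cdot r}} \right\} + R$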




   This is the quadratic function (10) plus a remainder term R. As long as the posterior estimates

   and the initial estimates are close and the initial estimates are used as reliability weights4, the term

   R will be small and the two objective functions can be regarded as approximating one another.
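
The local equivalence can be checked numerically. The short Python sketch below compares the unweighted cross-entropy measure with the linear-plus-quadratic terms of its second-order expansion for a hypothetical three-element vector of initial estimates; the numbers and function names are illustrative assumptions. Because the neglected terms are of third order in the deviation, the gap between the two measures should shrink rapidly as the posterior estimates approach the initial ones.

```python
import numpy as np

def cross_entropy(d, d0):
    """Unweighted cross-entropy measure: sum of d * ln(d / d0)."""
    return np.sum(d * np.log(d / d0))

def second_order_terms(d, d0):
    """Linear plus quadratic terms of the expansion around d0."""
    diff = d - d0
    return np.sum(diff + diff ** 2 / (2.0 * d0))

d0 = np.array([50.0, 20.0, 30.0])        # hypothetical initial estimates
direction = np.array([1.0, -1.0, 0.5])   # hypothetical relative adjustment pattern

gaps = {}
for scale in (0.20, 0.02):
    d = d0 * (1.0 + scale * direction)   # posterior estimates at a given distance
    gaps[scale] = abs(cross_entropy(d, d0) - second_order_terms(d, d0))

for scale, gap in gaps.items():
    print(f"relative deviation {scale:.2f}: gap = {gap:.2e}")
```

Shrinking the relative deviation by a factor of ten reduces the gap by far more than a factor of ten, in line with a remainder that is small when posterior and initial estimates are close.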





         Thirdly, as proved by Harrigan (1990), in all but the trivial case, posterior estimates

derived from entropy or quadratic loss minimand will always better approximate the unknown,

true values than do the associated initial estimates. In this framework, information gain is

interpreted as the imposition of additional valid constraints or the narrowing of bounds on

existing constraints as long as the true but unknown values belong to the feasible solution set.

This is because adding valid constraints or further restricting the feasible set through the

narrowing of interval constraints cannot move the posterior estimates away from the true values,

unless the additional constraints are non-binding (have no information value). Although the

posterior estimates may not always be regarded as providing a "reasonable" approximation to the

true value5, they are always better than the initial estimates in the sense that the former are closer to the

true values than the latter, so long as the imposed constraints are true. In other words, the

optimization process has the effect of reducing, or at least not increasing, the variance of the

estimates. This property is simple to show by using matrix notation. Define $W$ as the variance

matrix of the initial estimates $\bar{D}$, and $A$ as the coefficient matrix of all linear constraints. The least

squares solution (equivalent to the quadratic minimand as noted above) to the problem of

adjusting $\bar{D}$ to $D$ that satisfies the linear constraint $A D = 0$ can be written as:


(13)    $D = (I - W A^T (A W A^T)^{-1} A) \bar{D}$

Thus,

(14)    $\mathrm{var}(D) = (I - W A^T (A W A^T)^{-1} A) W = W - W A^T (A W A^T)^{-1} A W$

Since $W A^T (A W A^T)^{-1} A W$ is a positive semi-definite matrix, the variance of the posterior estimates

will always be less than, or at least not greater than, the variance of the initial estimates as long as

$A D_{true} = 0$ holds. This is the fundamental reason why such an estimating framework will provide

better posterior estimates. Imposing accounting relationships (1) to (9) will definitely improve,

or at least not worsen, the initial estimates, since we are sure from economics that those constraints are

identities and must be true for any national system of economic regions.
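
A small numerical sketch of equations (13) and (14) follows. It assumes a single hypothetical accounting constraint on three flows and a diagonal variance matrix with made-up entries; `gls_adjust` is an illustrative helper name, not part of the authors' implementation.

```python
import numpy as np

def gls_adjust(d0, W, A):
    """Adjust d0 so that A @ d = 0, following equations (13) and (14)."""
    gain = W @ A.T @ np.linalg.inv(A @ W @ A.T)   # W A^T (A W A^T)^{-1}
    P = np.eye(len(d0)) - gain @ A
    d = P @ d0        # posterior estimates, equation (13)
    var_d = P @ W     # posterior variance, equation (14)
    return d, var_d

# Hypothetical constraint: the first two flows must add up to the third.
A = np.array([[1.0, 1.0, -1.0]])
d0 = np.array([40.0, 65.0, 100.0])   # initial estimates violate the constraint by 5
W = np.diag([4.0, 25.0, 1.0])        # assumed variances of the initial estimates

d, var_d = gls_adjust(d0, W, A)
print("posterior estimates:", d.round(3))
print("constraint residual:", float((A @ d)[0]))
print("initial variances:  ", np.diag(W))
print("posterior variances:", np.diag(var_d).round(3))
```

In this example the second flow, which has the largest assumed variance, absorbs most of the required adjustment, and no posterior variance exceeds its initial counterpart.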





        Finally, the choice of weights in the objective function has very important impacts on the

estimation results. For instance, using the initial estimates as weights has the nice property that

each entry of the array is adjusted in proportion to its magnitude in order to satisfy the accounting

identities, that the variables cannot change sign, and that large variables are adjusted more than

small variables. However, the adjustment relates directly to the size of the initial estimates

$\bar{d}_i^{sr}$ and $\bar{z}_{ij}^{\cdot r}$, and does not force the unreliable initial estimates to absorb the bulk of the required


adjustment. Furthermore, this commonly used weighting scheme (underlying RAS) yields best unbiased

estimates only under two assumptions: (1) the initial estimates for different elements in the array are

statistically independent, and (2) each error variance is proportional to the corresponding initial

estimate. Those assumptions may not hold in many cases. Fortunately,

the model is not restricted to a diagonal weighting matrix such as the initial estimates.

When a variance-covariance matrix of the initial estimates is available, it can be incorporated into

the model by modifying the objective function as follows:


(15)    $\min S = (D - \bar{D})^T W_D^{-1} (D - \bar{D}) + (Z - \bar{Z})^T W_Z^{-1} (Z - \bar{Z})$

The efficiency of the resulting posterior estimator will be further improved if the error structure

of the initial estimates is available, because such a weighting scheme makes the adjustment

independent of the size of the initial estimates. The larger the variance, the smaller its

contribution to the objective function, and hence the smaller the penalty for $d_i^{sr}$ and $z_{ij}^{\cdot r}$ to move

away from their initial estimates (only the relative, not the absolute size of the variance affects the

solution). A small variance of the initial estimates indicates, other things equal, they are very

reliable data and thus should not change by much, whilst a large variance of the initial estimates

indicates unreliable data and will be adjusted considerably in the solution process. Therefore, this

weighting scheme gives the best unbiased estimates of the true, unknown inter-regional and

inter-industrial transaction values under the assumption that initial estimates for different elements in

the array are statistically independent. Although solving a nonlinear programming problem of

this kind poses no difficulty today, the major problem is the lack of data to estimate the

variance-covariance matrix associated with the initial estimates.
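
The contrast between the two weighting schemes can be illustrated with a one-constraint example. The sketch below assumes three hypothetical flows that must sum to a known total and uses the closed-form solution of the weighted quadratic problem in that special case. The variances are invented for the illustration, and `balance_to_total` is a hypothetical helper name.

```python
import numpy as np

def balance_to_total(initial, weights, total):
    """Minimise sum((d - d0)**2 / w) subject to sum(d) == total.

    With a single adding-up constraint the Lagrangian solution is explicit:
    each entry absorbs a share of the gap proportional to its weight.
    """
    initial = np.asarray(initial, dtype=float)
    weights = np.asarray(weights, dtype=float)
    gap = total - initial.sum()
    return initial + weights * gap / weights.sum()

initial = np.array([100.0, 50.0, 10.0])   # hypothetical initial estimates
total = 176.0                             # known total; the gap to close is 16

# Scheme 1: initial estimates as weights (adjustment proportional to size).
by_size = balance_to_total(initial, weights=initial, total=total)

# Scheme 2: assumed error variances as weights (the small flow is the unreliable one).
variances = np.array([1.0, 1.0, 14.0])
by_variance = balance_to_total(initial, weights=variances, total=total)

print("size weights:    ", by_size)
print("variance weights:", by_variance)
```

Under size weights every flow rises by the same percentage, whereas under variance weights the unreliable small flow absorbs most of the gap and the two reliable flows barely move.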

         Stone (1984) proposed to estimate the variance of $\bar{z}_{ij}^{\cdot r}$ as $\mathrm{var}(\bar{z}_{ij}^{\cdot r}) = (\theta_{ij}^{\cdot r} \bar{z}_{ij}^{\cdot r})^2$, where

$\theta_{ij}^{\cdot r}$ is a subjectively determined reliability rating, expressing the percentage ratio of the standard

error to $\bar{z}_{ij}^{\cdot r}$. Weale (1989) used time series information on accounting discrepancies to infer

data reliability. Similar methods can be used to derive variances associated with the initial

estimates in our model.
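
Stone's rule is straightforward to apply once reliability ratings have been chosen. The sketch below assumes a hypothetical 2 × 2 block of initial estimates and subjective ratings (the standard error expressed as a fraction of the estimate); both arrays are invented for illustration, and the penalty function is a simplified single-array version of the quadratic objective (10).

```python
import numpy as np

# Hypothetical initial estimates and subjective reliability ratings.
z_initial = np.array([[120.0, 30.0],
                      [45.0, 80.0]])
rating = np.array([[0.05, 0.20],
                   [0.10, 0.05]])

# Stone's rule: variance = (rating * initial estimate) ** 2.
variance = (rating * z_initial) ** 2

def quadratic_penalty(z, z0, var):
    """Weighted quadratic penalty with variances used as weights."""
    return 0.5 * np.sum((z - z0) ** 2 / var)

# Move one entry at a time by 6 units: a 5% change in the reliable entry (120)
# and a 20% change in the less reliable entry (30).
move_reliable = z_initial + np.array([[6.0, 0.0], [0.0, 0.0]])
move_unreliable = z_initial + np.array([[0.0, 6.0], [0.0, 0.0]])

penalty_reliable = quadratic_penalty(move_reliable, z_initial, variance)
penalty_unreliable = quadratic_penalty(move_unreliable, z_initial, variance)
print(variance)
print(penalty_reliable, penalty_unreliable)
```

Both moves equal one assumed standard error and therefore incur the same penalty, so an entry rated as less reliable is allowed a proportionally larger adjustment at the same cost.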

         Despite the difficulties in obtaining data for the best weighting scheme, advantages of

such a model in estimating interregional trade flows and interindustry transactions are still

obvious from an empirical perspective. Firstly, it is very flexible regarding the required known

information. For example, it allows for the possibility that the state totals of output, value-added,

exports, imports and final demand are not known with certainty. In the real world, these regional

statistics typically have substantial gaps and inconsistencies with the national totals. Incorporating

associated terms similar to those for $D$ and $Z$ in the objective function to penalize solution deviations

from the initial estimates from statistical sources allows the estimation of those regional totals,

together with entries in the inter-regional delivery and inter-industrial transaction array. With the

use of upper and lower bounds, this fact can also be modeled by specifying ranges rather than

precise values for the linear constraints (1) - (3). In addition, the estimation of D or Z will be a

special case of the framework when only one set of additional data is available.

         Secondly, it permits a wider variety and volume of information to be brought into the

estimation process. For example, the ability to introduce upper and/or lower bounds on those

regional totals is one of the flexibilities not offered by commonly used scaling procedures such as

RAS. The gradient of the entropy function tends to infinity as $d_i^{sr}$ and $z_{ij}^{\cdot r} \to 0$, and hence

restricts the posterior estimates to nonnegative values. This is a desirable property for

estimating inter-regional trade data.6
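
To see why the gradient matters, consider a generic cross-entropy term $d\,\ln(d/\bar{d})$ with a positive initial estimate $\bar{d}$. This functional form and the symbol $\lambda$ (standing for the combined effect of the constraint multipliers) are assumptions made here for illustration only:

$$\frac{\partial}{\partial d}\left[d\,\ln\frac{d}{\bar{d}}\right] = \ln\frac{d}{\bar{d}} + 1 \;\to\; -\infty \quad \text{as } d \to 0^{+}$$

Under this assumption the first-order condition $\ln(d/\bar{d}) + 1 = \lambda$ gives $d = \bar{d}\,e^{\lambda - 1}$, which is strictly positive for any finite $\lambda$.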

         Thirdly, the weights in the objective function reflect the relative reliability of a given set

of initial estimates. The interpretation of the reliability weights is straightforward. Other things

equal, entries with higher reliability should be changed less than entries with a lower reliability.

The choice of those weights is also very flexible. They can draw on the best available information to

ensure that reliable data in the initial estimates are not modified by the optimization model

as much as unreliable data. In practice, such reliability weights can be put into a second array that

has the same dimension and structure as the initial estimates. The inverted variance-covariance

matrix of the initial estimates is statistically interpreted as the best index of the reliability for the

initial data.
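
The role of the weights can be illustrated with a deliberately small case: a quadratic penalty with a single adding-up constraint, for which the solution has a closed form (each entry absorbs a share of the discrepancy proportional to its weight). This is a sketch under stated assumptions, not the model itself: the function `adjust_to_total`, the use of variances as the weights that divide each squared deviation, and the numbers are all hypothetical.

```python
def adjust_to_total(initial, variances, total):
    """Minimize sum((z - z0)**2 / w) subject to sum(z) == total.

    The first-order conditions give z = z0 + w * lam, with the common
    multiplier lam = (total - sum(z0)) / sum(w).
    """
    lam = (total - sum(initial)) / sum(variances)
    return [z0 + w * lam for z0, w in zip(initial, variances)]


# Hypothetical initial estimates that sum to 100 but must sum to 110.
initial = [50.0, 30.0, 20.0]
# Hypothetical variances: the first entry is the most reliable.
variances = [1.0, 4.0, 15.0]

adjusted = adjust_to_total(initial, variances, total=110.0)
```

In this toy case the most reliable entry moves by 0.5 while the least reliable one moves by 7.5, which is the behavior the reliability weights are meant to produce.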

         Finally, the solution of this estimation problem provides exactly the data needed to construct

a so-called multi-regional input-output (MRIO) model (Miller and Blair, 1985; Isard et al., 1998).

This model was pioneered by Professor Polenske and her associates at MIT in the 1970s

(Polenske, 1980), and is still widely used in regional economic impact analysis today.

         The above model could be easily extended to further allocate Z and D to distinguish

intermediate and final delivery of goods and services within a national system of economic

regions. The extended model will be similar in many respects to the interregional accounting

framework proposed by Batten (1982) two decades ago. However, as we will show later in this

paper, it becomes more operational and provides better empirical estimation results on

interregional shipments because of the explicit incorporation of interregional trade flow

information into both the initial estimates and the accounting framework.

         To demonstrate, denote $z_{ij}^{sr}$ as intermediate inputs delivered from sector i in region s to

sector j in region r within a nation, and $y_{ih}^{sr}$ as final goods and services delivered from sector i in

region s to type h final demand in region r. Further, denote $m_{ij}^{r}$ and $m_{ih}^{r}$ as imported (from other

nations) intermediate and final goods and services delivered to sector j or final demand type h in

region r respectively.7 Other notation regarding state gross output, intermediate inputs, value-added,

exports and imports is the same as in the aggregated model. Then the accounting

framework for the national system of economic regions can be defined as follows:

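(16)    $$\sum_{j=1}^{n}\sum_{s=1}^{m} z_{ji}^{sr} + \sum_{j=1}^{n} m_{ji}^{r} + v_i^{r} = x_i^{r}$$

(17)    $$\sum_{j=1}^{n}\sum_{s=1}^{m} z_{ij}^{rs} + \sum_{h=1}^{H}\sum_{s=1}^{m} y_{ih}^{rs} + e_i^{r} = x_i^{r}$$

(18)    $$\sum_{h=1}^{H}\sum_{s=1}^{m} y_{ih}^{sr} + \sum_{h=1}^{H} m_{ih}^{r} = y_i^{r}$$

(19)    $$\sum_{j=1}^{n} z_{ij}^{sr} + \sum_{h=1}^{H} y_{ih}^{sr} = d_i^{sr}$$

(20)    $$\sum_{s=1}^{m} z_{ij}^{sr} = z_{ij}^{\cdot r}$$

(21)    $$\sum_{j=1}^{n} m_{ij}^{r} + \sum_{h=1}^{H} m_{ih}^{r} = m_i^{r}$$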
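
The way identities (17), (19) and (20) fit together can be checked on a toy data set. The sketch below is an illustration only: the array layout (`z[s][r][i][j]`, `y[s][r][i][h]`, `e[r][i]`), the function names and every number are assumptions. It shows that regional output computed from the detailed flows in (17) equals the sum of outbound shipments from (19) plus exports.

```python
m, n, H = 2, 2, 1  # toy sizes: regions, sectors, final demand types

# z[s][r][i][j]: intermediate deliveries from sector i in region s
# to sector j in region r (hypothetical numbers).
z = [
    [[[10.0, 2.0], [3.0, 8.0]], [[1.0, 0.5], [0.5, 2.0]]],
    [[[2.0, 1.0], [1.0, 1.5]], [[12.0, 3.0], [4.0, 9.0]]],
]
# y[s][r][i][h]: final deliveries from sector i in region s
# to final demand type h in region r (hypothetical numbers).
y = [
    [[[20.0], [15.0]], [[2.0], [1.0]]],
    [[[3.0], [2.0]], [[25.0], [18.0]]],
]
# e[r][i]: international exports of sector i from region r.
e = [[5.0, 4.0], [6.0, 3.0]]


def shipment(s, r, i):
    """Identity (19): total delivery of good i from region s to region r."""
    return sum(z[s][r][i]) + sum(y[s][r][i])


def pooled_input(r, i, j):
    """Identity (20): input i used by sector j in region r, over all origins."""
    return sum(z[s][r][i][j] for s in range(m))


def output_by_use(r, i):
    """Left-hand side of identity (17) for sector i in region r."""
    inter = sum(z[r][s][i][j] for s in range(m) for j in range(n))
    final = sum(y[r][s][i][h] for s in range(m) for h in range(H))
    return inter + final + e[r][i]


def output_by_shipment(r, i):
    """The same total expressed through shipments plus exports."""
    return sum(shipment(r, s, i) for s in range(m)) + e[r][i]


gaps = [output_by_use(r, i) - output_by_shipment(r, i)
        for r in range(m) for i in range(n)]
```

All entries of `gaps` are zero, which is the sense in which the extended accounts nest the shipment totals of the aggregated model.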


Adding a quadratic penalty objective function, we have an extended model to estimate a detailed

interregional input-output account based on the results from the earlier model.8


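(22)    $$\min S = \frac{1}{2}\Bigg\{ \sum_{s=1}^{m}\sum_{r=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{(z_{ij}^{sr} - \bar{z}_{ij}^{sr})^2}{wz_{ij}^{sr}} + \sum_{s=1}^{m}\sum_{r=1}^{m}\sum_{i=1}^{n}\sum_{h=1}^{H} \frac{(y_{ih}^{sr} - \bar{y}_{ih}^{sr})^2}{wy_{ih}^{sr}}$$

        $$\qquad + \sum_{r=1}^{m}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{(m_{ij}^{r} - \bar{m}_{ij}^{r})^2}{wm_{ij}^{r}} + \sum_{r=1}^{m}\sum_{i=1}^{n}\sum_{h=1}^{H} \frac{(m_{ih}^{r} - \bar{m}_{ih}^{r})^2}{wm_{ih}^{r}} \Bigg\}$$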
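
For readers who want to evaluate the penalty in (22) for a candidate solution, a minimal sketch follows. The function name `quadratic_penalty`, the flat-list layout of each block and the numbers are assumptions; the authors' own implementation is in GAMS and is not reproduced here.

```python
def quadratic_penalty(blocks):
    """Return 0.5 * sum of (estimate - initial)**2 / weight over all blocks.

    Each block is a tuple (estimates, initials, weights) of equal-length
    flat lists. The extended model would use four such blocks: intermediate
    flows, final flows, imported inputs and imported final goods.
    """
    total = 0.0
    for estimates, initials, weights in blocks:
        for value, start, weight in zip(estimates, initials, weights):
            total += (value - start) ** 2 / weight
    return 0.5 * total


# Two hypothetical blocks standing in for the z and m terms.
z_block = ([50.5, 32.0, 27.5], [50.0, 30.0, 20.0], [1.0, 4.0, 15.0])
m_block = ([9.0], [10.0], [2.0])

S = quadratic_penalty([z_block, m_block])
```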




        This model has theoretical and empirical properties similar to those of the earlier model, but

with much greater detail. The solution to (22), subject to constraints (16) to (21), provides a

complete set of data for a so-called inter-regional input-output (IRIO) model with imports

endogenous (Miller and Blair, 1985; Isard et al., 1998).




3. EMPIRICAL TEST OF THE MODEL AND EVALUATION MEASURES

The Testing Data Set

        How does the model specified above perform when applied to data from the real world?

In order to evaluate the models' performance, a benchmark data set from the real world is needed.

Because good interregional trade data are quite rare and very difficult to obtain in any country, a

natural place to find such data sets is existing global production and trade databases such as the

GTAP (Global Trade Analysis Project) database. For instance, version 4 of the GTAP database contains

detailed bilateral trade, transportation, and individual countries' input-output data covering 45

countries and 50 sectors (McDougall, Elbehri, and Truong, 1998). For our particular purpose,

the version 4 GTAP database was first aggregated into a 4-region, 10-sector data set. Then three of

the four regions (the United States, European Union and Japan) were further aggregated into a

single open economy which engages in both interregional trade among its 3 internal regions and

international trade with the rest of the world. We will use this partitioned data set as the

benchmark for a hypothetical national economy, and attempt to use our model to replicate the

underlying inter-continental trade flows among Japan, the EU and the United States as well as the

individual countries' input-output accounts.

Experiment Design

        In the first experiment, we do this without using the region-specific input-output

coefficients, as in the situation encountered in the real world, where only the national IO table is

available to economists (in our experiment it is the three regions' weighted average, and the initial estimates are

defined as $\bar{z}_{ij}^{\cdot r} = \left(z_{ij}/(x_j - v_j)\right) \times (x_j^{r} - v_j^{r})$ to make full use of the known information). Initial

estimates of interregional commodity flows are taken from the `true' interregional trade data in the

GTAP database but are distorted by a normally distributed random error term with zero mean

and a standard deviation as large as 5 times the "true" trade data. The solution from the

model is compared with the benchmark data set for both the inter-regional shipment and inter-

sector transaction flows.
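
The construction of the initial estimates in the first experiment can be sketched as follows. This is an illustration under stated assumptions: the function names, the toy numbers and the fixed random seed are hypothetical. How negative draws were handled is not stated in the text, so the sketch leaves them as drawn.

```python
import random


def initial_regional_flows(z_nat, x_nat, v_nat, x_reg, v_reg):
    """Scale national input shares to one region's intermediate input total.

    Entry [i][j] is z_nat[i][j] / (x_nat[j] - v_nat[j]) * (x_reg[j] - v_reg[j]).
    """
    n = len(z_nat)
    return [[z_nat[i][j] / (x_nat[j] - v_nat[j]) * (x_reg[j] - v_reg[j])
             for j in range(n)] for i in range(n)]


def distort(true_flows, sd_factor, rng):
    """Add zero-mean normal noise with standard deviation sd_factor * flow."""
    return [d + rng.gauss(0.0, sd_factor * d) for d in true_flows]


# Hypothetical two-sector national table and one region's totals.
z_nat = [[30.0, 20.0], [10.0, 40.0]]
x_nat, v_nat = [100.0, 120.0], [60.0, 60.0]
x_reg, v_reg = [50.0, 30.0], [30.0, 15.0]
z_init = initial_regional_flows(z_nat, x_nat, v_nat, x_reg, v_reg)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = distort([12.0, 3.0, 7.5], sd_factor=5.0, rng=rng)
```

With a standard deviation five times the size of each flow, many of the distorted shipments bear little resemblance to the true values, which is what makes the experiment a demanding test.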

         In the second experiment, we use the region-specific input-output coefficients as constant

in the model. We re-estimate the interregional shipment data as in the first experiment, and compare

the model solution with the benchmark data set for the inter-regional trade data only.

         In the third experiment, we assume the interregional shipment pattern is known with

certainty and we use the three regions' weighted average IO coefficients as initial estimates to

estimate the region-specific input-output accounts.

         In the fourth experiment, Batten's model was used to estimate the interregional shipment

and individual region's IO flows. In the fifth to the seventh experiments, experiments 1-3 were

repeated by using the extended model. Solutions from both models are compared with the "true"

interregional trade and inter-sector IO flow data in the aggregated GTAP data set. The

assumptions, initial estimates and expected model solution are summarized in Table 1.

         (Insert Table 1 here)

Measures to Evaluate Test Results

Each experiment produces a different set of estimates, and it is desirable to know how much each

set of estimates differs from the true, known data. However, it is difficult to use a single measure

to compare the estimated results. Since there are so many dimensions in the model solution sets, a

particular set of estimates may score well on one region or commodity but badly on others. It is

meaningful to use several measures to gain more insight on the model performance in different

experiments. Generally speaking, it is the proportionate errors and not the absolute errors that

matter; therefore, the "mean absolute percentage error" with respect to the true data will be

calculated for different commodity and regional aggregations. Consider the following aggregate

index measure for intra/inter-regional trade flows:




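(23)    $$MAPE_D = \frac{100 \cdot \sum_{i=1}^{n}\sum_{s=1}^{m}\sum_{r=1}^{m} \left|\hat{d}_i^{sr} - d_i^{sr}\right|}{\sum_{i=1}^{n}\sum_{s=1}^{m}\sum_{r=1}^{m} d_i^{sr}}$$

where $\hat{d}_i^{sr}$ denotes the model estimate and $d_i^{sr}$ the true value.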


Alternating the removal of summations over i, s, and r in equation (23) produces MAPE estimates

on shipments by commodities, shipping regions, and receiving regions respectively. For regional

intermediate transactions, the aggregate MAPE index is defined as:

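(24)    $$MAPE_Z = \frac{100 \cdot \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{m} \left|\hat{z}_{ij}^{\cdot r} - z_{ij}^{\cdot r}\right|}{\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{m} z_{ij}^{\cdot r}}$$

where $\hat{z}_{ij}^{\cdot r}$ denotes the model estimate and $z_{ij}^{\cdot r}$ the true value.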


Alternating the removal of summations over i, j, and r in equation (24) produces MAPE estimates

on intermediate transactions by inputs, using sectors, and regions respectively. The model and all

test experiments are implemented in GAMS and the complete GAMS program and related data

set are available from the authors upon request.
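
The aggregate index and one of its disaggregated variants can be sketched in a few lines. The sketch below is not the authors' GAMS code: the function names, the `[i][s][r]` array layout and the numbers are assumptions chosen for illustration.

```python
def mape(estimated, true):
    """100 * sum|estimate - true| / sum(true) over two flat sequences."""
    errors = sum(abs(a - b) for a, b in zip(estimated, true))
    return 100.0 * errors / sum(true)


def flatten(block):
    """Flatten a region-by-region table (list of rows) into one list."""
    return [value for row in block for value in row]


def mape_aggregate(est, true):
    """Index over all commodities i, shipping regions s, receiving regions r."""
    return mape([v for b in est for v in flatten(b)],
                [v for b in true for v in flatten(b)])


def mape_by_commodity(est, true):
    """Drop the summation over i: one index per commodity."""
    return [mape(flatten(e_i), flatten(t_i)) for e_i, t_i in zip(est, true)]


# Hypothetical flows indexed [i][s][r]: 2 commodities, 2 regions.
true_d = [[[10.0, 2.0], [3.0, 5.0]], [[18.0, 1.0], [1.0, 20.0]]]
est_d = [[[9.0, 3.0], [3.0, 5.0]], [[18.0, 2.0], [1.0, 19.0]]]

overall = mape_aggregate(est_d, true_d)
per_commodity = mape_by_commodity(est_d, true_d)
```

Because the denominator is the sum of true flows rather than a count of cells, large flows dominate the index and small flows with large relative errors carry little weight.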

Test Results

         Table 2 summarizes all eight measurement indexes from the seven experiments listed

in Table 1. The accuracy of the estimates is judged by their closeness to the true interregional

trade and individual region's input-output flows aggregated from the GTAP database.

        (Insert Table 2 here)

        Generally speaking, the model has remarkable capacity to rediscover the true

interregional trade flows from the highly distorted data. The estimated shipment data are very

close to the true data, as judged by the eight MAPE measurements, in all testing experiments

except the Batten model. Most of the mean absolute percentage errors are about 4-7 percent of the

true data value, which implies the model has great potential in the application of estimating

interregional trade flows. In contrast, recovering the individual region's input-output flows from

weighted average national values achieved only limited success, indicating national IO





coefficients in detailed sectors may be the best place to start in building regional IO accounts if

there is no additional prior information on regional technology or cost structure available.9

         Comparing estimates from the different test experiments yields several interesting

observations. First, when there is no additional information that could be incorporated into the

estimation framework, a more detailed model may not perform better than a simpler model.

Comparing results from Exp-1 and Exp-5, the more sophisticated extended model actually yields

less accurate estimates overall because of increasing numbers of unknown variables without

additional known data. However, as results in Experiments 2, 3, 6, and 7 show, the estimation

accuracy does improve with a more detailed model when more useful data become available.

Second, the marginal accuracy gained from actual individual regional IO flows is significant in

estimating interregional trade flows using the extended model, but very small in the aggregate

version. In contrast, the marginal value of accurate interregional shipment data is rather small in

estimating individual regional IO coefficients under both versions of the model. Finally, Batten's

model performed poorly in interregional shipment estimation, but obtained similar estimates on

individual regional IO flows as our model, providing further evidence that there may be no high

dependency between individual regional IO coefficients and interregional trade flows. However,

this is not a firm conclusion because the particular data set used to test the model in this paper

may be part of the problem. Since the United States, EU and Japan are all large economies, their

intermediate demands are largely met by their own production. Therefore, the correlation between

individual inter-industrial flow and inter-regional shipments may be particularly low.

          The extended model only provides better estimates of interregional shipments when

regional IO data are available, so the aggregate version of the model specified in this paper may

be the best practitioner's tool in estimating interregional trade flows because of the lack of sub-

national IO data in the real world. It demands less statistical information and has a smaller model

dimension, which facilitates the implementation and computation process.10
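
The difference in model dimension can be made concrete with the variable and constraint counts given in footnote 10. In the sketch below the function names are hypothetical and the number of final demand types `H` is an assumed value, since it is not reported for the test data set.

```python
def aggregate_size(N, M):
    """Variables and constraints of the aggregate model (footnote 10)."""
    return N * (N * M + M**2 + 5 * M), N * (3 * M + N + 5)


def extended_size(N, M, H):
    """Variables and constraints of the extended model (footnote 10)."""
    return (N**2 * M + N * H * M) * (M + 1), N * (M**2 + N * M + N + 5)


N, M = 10, 3   # sectors and regions in the test data set
H = 2          # number of final demand types: assumed for illustration
agg_vars, agg_cons = aggregate_size(N, M)
ext_vars, ext_cons = extended_size(N, M, H)
```

Under these assumptions the aggregate model has 540 variables and 240 constraints, against 1440 variables and 540 constraints for the extended model.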





4. IMPLICATIONS FOR APPLYING THE MODEL

        Results in the previous section offer some guidance for applying the framework outlined

in this paper to real world statistics. It was found that initial estimates of regional commodity

trade flows based on survey data with very high statistical variability are highly preferable (in the

experiments) to a widely used non-survey approach for producing initial estimates.11 This finding

holds promise for opportunities to use other survey data to recover unobserved regional economic

accounts. It was also found that solving an aggregate account (e.g., an MRIO or MR-SAM) as an

intermediate step is at least as accurate (in the experiments) as producing a direct solution to an

extended account (e.g., IRIO or IR-SAM) when superior data unique to the latter are not widely

available. This finding is useful when working with regional economic accounts of considerable

sector and region details. Results also support the product mix approach, whereby the most

feasible sector detail for regional gross output estimates is used to derive weighted average

national technical coefficients for more aggregated regional sectors.
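
A minimal sketch of the product mix calculation follows, assuming the usual form in which detailed national coefficients are aggregated with a region's detailed gross outputs as weights. The function name, the grouping of three detailed industries into two aggregate sectors and the numbers are hypothetical.

```python
def product_mix_coefficients(a_detail, x_region, groups):
    """Aggregate detailed national coefficients with regional output weights.

    a_detail[i][j]: national input coefficient for detailed industries.
    x_region[j]:    the region's gross output of detailed industry j.
    groups[K]:      list of detailed industries forming aggregate sector K.
    """
    result = []
    for rows in groups:
        line = []
        for cols in groups:
            # Inputs from the row group implied by the region's output mix.
            inputs = sum(x_region[j] * sum(a_detail[i][j] for i in rows)
                         for j in cols)
            line.append(inputs / sum(x_region[j] for j in cols))
        result.append(line)
    return result


a_detail = [[0.1, 0.2, 0.0],
            [0.3, 0.1, 0.1],
            [0.1, 0.1, 0.2]]
groups = [[0, 1], [2]]  # aggregate sector 0 = industries 0 and 1
region_a = product_mix_coefficients(a_detail, [10.0, 30.0, 20.0], groups)
region_b = product_mix_coefficients(a_detail, [30.0, 10.0, 20.0], groups)
```

Although both regions share the same detailed national coefficients, their aggregated coefficients differ (0.325 against 0.375 for the first cell) because their output mixes differ.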

        Statistical systems vary by nation and no one-size-fits-all rules exist that tell us how to

seamlessly employ every data-system to best advantage.12 However, there are general guidelines

for implementing the optimization framework presented in this paper to a large dimension multi-

regional account. To facilitate discussions of implementation, we assume that a detailed national

account always exists and regional sector statistics are also available in a variety of details. Then

the implementation process may be classified into three broad phases as discussed below.

Develop Independent Estimates for Major Components of a Multi-regional Account

        It has been stressed as far back as Wilson (1970) that information used to produce

parameters and initial estimates of a regional economic system should be estimated

independently. While this produces unbalanced initial accounts, it avoids introducing spurious

information that can lead to biased estimates (McDougall, 1999). A useful approach is to partition

the multi-regional account into components that coincide or are related to known statistical

survey series published regularly in the nation under study.




         For the multi-regional IO account outlined in equations (1) to (9), the major components

are gross regional output ($x_i^{r}$), final demand ($y_i^{r}$), primary factor payments ($v_i^{r}$), international

trade ($e_i^{r}$ and $m_i^{r}$), inter-industry transactions ($z_{ij}^{\cdot r}$) and inter/intra-regional trade flows ($d_i^{sr}$). In

many cases, data for several of these components are available from a single major statistical

survey series; for example, in the United States $x_i^{r}$ and $v_i^{r}$ are available from an Economic

Census conducted every five years. Other components, for example $y_i^{r}$, may themselves require

multiple disparate data sources to compile. While the strategic groupings may differ by country, it

is likely that for large dimension (N × M) multi-regional accounts, primary data for individual

regional sectors become sparse.

         When the best available data are not consistent with the model structure, it may be

necessary to restructure the adding-up requirements in the model to accommodate the data. For

example, in equations (2) and (3) of our model, the accounting identities require data for

international exports ($e_i^{r}$) and imports ($m_i^{r}$) on an origin of movement and destination of use

basis respectively. However, in many countries such as the United States, port of entry/exit data

are far more reliable. Therefore, a different formulation of the corresponding accounting identities

should be used.

         For certain elements of the multi-regional account, very often only a purely theoretical

inference is available to produce informed guesses about the initial estimates. A common

example is the information about service trade flows within and between regions. In using a

theory-based alternative to data, a case must be made for a prevailing empirical model that

calibrates the unobserved activities to some other statistics or available survey data.

Determine Model Dimensions Based on Maximum Concordance among Different Components

         In compiling different components of the multi-regional account, the volume and nature

of data available for each component can vary greatly. Detailed and survey-based data may be





obtained on, for example, gross regional output and incomes, but survey data on the inter/intra-

regional trade flows of this output may be far less detailed. Inter-industry transactions may only

be available at the national level, and international trade data may be very detailed, but based on a

different product classification system. The notion of conservatism, both in the information

theoretic sense and in terms of computational burden, should be the primary guiding principle in

reconciling this information.

         Robinson et al. (2001) interpret conservatism by the rule of using `only, and all'

information in the estimation problem. Considering this rule in the present context, the fact that a

component such as gross regional outputs is available from highly detailed and reliable statistics

suggests all this information should be used. However, if the associated intra/inter-regional trade

flow account has more general product aggregations than the output account, it appears that one is

faced with an `only or all' decision. Although the specific situation often guides the approach one

takes, it is worth noting that there are usually many opportunities to introduce all information

available into the estimation process.

         In practice, conserving on computational burden may also become an issue. When

employing a more general estimation framework such as the model presented in this paper, the

use of iterative techniques that diminish computational burden may not be readily available.13

Both computer hardware and software available to the researchers may become binding in many

such instances. For example, access to special solvers or greater programming finesse becomes a

more prominent issue when computational burdens grow tremendously as model dimensions

increase. In addition, while conventional personal computers have improved dramatically, limits

on current 32-bit operating systems to manage sufficient memory on PCs may become a binding

constraint for very large models. Solutions to these issues can become expensive.

Add Additional Constraints to Use All Available Information

         The greatest opportunities to use all relevant information are in the form of additional

binding linear constraints, beyond the adding up and consistency requirements, on any selected




groups of variables in the aggregate or extended model. Information deemed `superior' and that is

related to any group of elements in either the aggregate or extended accounts is a candidate for a

linear constraint. Since both interregional and multi-regional economic accounts are

comprehensive and detailed, there are many opportunities to introduce such constraints. A few

general guidelines are notable.

         Both the aggregate and extended accounts describe flows of payments and products in the

form of a matrix with known adding-up and consistency requirements. Any information used to

formulate new constraints--either equality or inequality linear constraints--can greatly diminish

the feasible solution set of the calibration procedure. However, new constraints that are non-

binding add no information to the problem, but do increase the computational burdens.
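
The distinction between binding and non-binding constraints can be illustrated on a toy problem with one adding-up constraint and one upper bound. This is a sketch under stated assumptions, not the model in this paper: the function names and numbers are hypothetical, and the simple fix-and-rebalance step is valid here only because the problem is convex and a single bound is added.

```python
def adjust_to_total(initial, weights, total):
    """Minimize sum((z - z0)**2 / w) subject to sum(z) == total."""
    scale = (total - sum(initial)) / sum(weights)
    return [z0 + w * scale for z0, w in zip(initial, weights)]


def adjust_with_upper_bound(initial, weights, total, k, upper):
    """Same problem with the extra constraint z[k] <= upper.

    Returns the solution and a flag telling whether the bound is binding.
    """
    free = adjust_to_total(initial, weights, total)
    if free[k] <= upper:
        return free, False          # bound is slack: solution unchanged
    # Bound binds: fix z[k] at the bound and re-balance the other entries.
    rest = adjust_to_total(initial[:k] + initial[k + 1:],
                           weights[:k] + weights[k + 1:],
                           total - upper)
    return rest[:k] + [upper] + rest[k:], True


initial = [50.0, 30.0, 20.0]
weights = [1.0, 4.0, 15.0]
slack, slack_binds = adjust_with_upper_bound(initial, weights, 110.0, 2, 30.0)
tight, tight_binds = adjust_with_upper_bound(initial, weights, 110.0, 2, 25.0)
```

With the bound at 30 the solution is identical to the unbounded one, so the constraint contributes nothing; with the bound at 25 the third entry is held down and the remaining discrepancy is pushed onto the more reliable entries.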

         Where and how information is used to formulate constraints depends on many factors.

For example, the U.S. Government has published state measures of farm productivity that include

estimates of purchased farm inputs by state for broad input categories. A pro-rated version of these

data could form the basis for additional linear constraints for agricultural sector I-O flows in the

model. Other restrictions could be designed to replicate certain highly reliable economic statistics

that can be formed by special groupings of certain flow statistics contained in the account being

estimated. Although such information must be carefully compiled, its incorporation in the form

of constraints will improve the estimation accuracy greatly.


5. CONCLUSIONS AND DIRECTION FOR FUTURE RESEARCH

         This study constructed a mathematical programming model to estimate interregional

trade patterns and input-output accounts based on an interregional accounting framework and

initial estimates of interregional shipments in a national system of economic regions. The model

is quite flexible in its data requirements and has desirable theoretical and empirical properties. An

empirical test of the model using a 4-region, 10-sector example aggregated from a global trade

database shows that the model performs remarkably well in discovering the true patterns of





interregional trade from highly distorted initial estimates on interregional shipments. It shows the

model may have great potential in the estimation and reconciliation of interregional trade flow

data, which often are the most elusive data to assemble. In addition, the solution from the aggregated

model provides exactly the data needed for an MRIO model and the solution from the extended

model provides exactly the data needed for an IRIO model. This will greatly reduce the data

processing burden in such analysis. Therefore, application of the model will further facilitate

quantitative economic analysis in regional sciences.

         Lessons from the experiments in this study shaped our view on approaches for applying

the model to real data from a particular nation's statistics. A logical conclusion is that widely

available and disparate survey data on the economy, including commodity flows data and

incomplete geographic data, can effectively be used to substantially narrow the margins for error

in obtaining feasible solutions to interregional input-output systems. It is also evident that data on

region-to-region commodity flows represent a limiting factor in determining the optimal sector

dimensions to be solved in the modeling framework.

         However, there are important questions not yet answered by the current study. First, test

results from the data set aggregated from GTAP also show that our model's ability to improve the

IO transaction estimates of individual regions from national averages may be limited. Continuing

research on the real underlying causes and means of improvement is needed to further enhance

the model's capacity as an estimating and reconciliation tool in building interregional production

and trade accounts. Second, the relative importance of regional sector output, value-added,

exports, imports and final demand as model input in the accuracy of a model solution is also not

analyzed, and could be addressed with minor changes of the current model. Third, the approach

employed in this study draws primarily from regional science and constrained matrix balancing

literatures. How insights from economic geography theory can help define a bounded solution

needs to be explored. Finally, the robustness of the model's performance should be further tested

using other data sets.




Footnotes:


1. Wilson (1970) had suggested an entropy maximizing solution for a model which integrated

gravity models and multi-regional input-output equations as constraints to estimate inter-regional

commodity flows. However, his work did not clearly incorporate a complete system of national

and regional input-output accounts as Batten (1982) did.

2. Using Monte Carlo simulation, Robinson, Cattaneo and El-Said (2001) show that when

updating column coefficients of a Social Accounting Matrix (SAM) is the major concern, the

cross entropy method appears superior, while if the focus is on the flows in the SAM, then the

two methods are very close with the RAS performing slightly better.

3. The variables $d_i^{sr}$ and $z_{ij}^{\cdot r}$ have no counterparts in Batten's framework, reflecting important

departures in the present approach.

4. The quadratic functional form has a numerical advantage in implementing the model. It is

easier to solve than the entropy function in very large models because it can use software

specifically designed for quadratic programming.

5. The minimand objective function reflects the principle that the 'distance' between the posterior

and initial estimates should be minimized. What we would like is to minimize the 'distance'

between the posterior estimates and the unknown true values. This 'distance' cannot be measured,

but a good estimation procedure should have a desirable influence on it.

6. Zeros can become non-zeros and vice versa under a quadratic penalty function. However, a

side effect for the cross entropy function is that if there are too many zeros in the initial estimates,

the whole problem may become infeasible.

7. The assignment of an intermediate (j) or final demand use (h) of international imports has no

counterpart in Batten's notation since he makes no such assignments. Either approach is valid and

would be dictated by the data available.

8. By incorporating the 6 accounting identities requiring that the sum over all regions in the nation

equal the national totals defined in equations (4-9), the model could be solved independently

without use of the earlier model. However, the dimension and data requirements of the model will

be much larger than those of the aggregated model.

9. Following the product mix method outlined in Miller & Blair (1985), initial estimates of IO

coefficients for each of the 10 aggregated industries are unique for each region. They are

weighted averages of the 3-region detailed (50-industry) IO coefficients where the weights are the

gross regional outputs of the relevant detailed industries. Experimental results show that a "product

mix" approach improves the accuracy of the estimates of the true regional IO flows compared to an

approach that directly uses the 3-region average IO coefficients, although the differences are

small in our particular model aggregation.
10. The aggregate model has only $N(NM + M^2 + 5M)$ variables and $N(3M + N + 5)$ constraints, while
the extended model has $(N^2 M + NHM)(M + 1)$ variables and $N(M^2 + NM + N + 5)$ constraints. This is
a much larger model, having $NM^2(N-1) + NM(HM-5)$ more variables and $MN(M+N-3)$

additional constraints.

11. A random normal distortion of the 'true' trade data by an average of 400 percent was

produced in the previous section to simulate a well-designed but poorly sampled transportation

survey of annual commodity flows.

12. Comprehensive studies by West (1990) and Lahr (2001) consider how to identify and use

superior data in a regional accounting system context.

13. For example, when both regional technical coefficients and intra- and interregional flows are allowed to

adjust, the cross-entropy or quadratic formulations in section 2 must be

solved jointly.
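The size comparison in note 10 can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the variable and constraint counts exactly as stated in the note, reading N as the number of regions, M as the number of sectors, and H as the number of final demand types (an interpretation of the notation). N = 3 and M = 10 follow the test aggregation described in note 9; H = 1 is an assumption, chosen because final demand types are not distinguished in experiments 5-7. Differences are obtained by direct subtraction of the stated counts.

```python
def aggregate_size(N, M):
    """Return (variables, constraints) of the aggregate model as stated in note 10."""
    variables = N * (N * M + M**2 + 5 * M)
    constraints = N * (3 * M + N + 5)
    return variables, constraints


def extended_size(N, M, H):
    """Return (variables, constraints) of the extended model as stated in note 10."""
    variables = (N**2 * M + N * H * M) * (M + 1)
    constraints = N * (M**2 + N * M + N + 5)
    return variables, constraints


# Illustrative dimensions: N and M follow note 9; H = 1 is an assumption.
N, M, H = 3, 10, 1
agg_vars, agg_cons = aggregate_size(N, M)
ext_vars, ext_cons = extended_size(N, M, H)

print("aggregate model:", agg_vars, "variables,", agg_cons, "constraints")
print("extended model: ", ext_vars, "variables,", ext_cons, "constraints")
print("additional constraints:", ext_cons - agg_cons)
```

With these dimensions the extended model has more than twice as many variables and more than three times as many constraints as the aggregate model, and the gap widens as N and M grow.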





                                              References



Bartholdy, Kasper. 1991. "A Generalization of the Friedlander Algorithm for Balancing of
National Accounts Matrices," Computer Science in Economics and Management 2, 163-174.

Batten, David F. 1982. "The Interregional Linkages Between National and Regional
Input-Output Models," International Regional Science Review, 7, 53-67.

Batten, David F. and D. Martellato. 1985. "Classical Versus Modern Approaches to Interregional
Input-Output Analysis," Annals of Regional Science, 19, 1-15.

Boomsma, Piet and Jan Oosterhaven. 1992. "A Double-entry Method for the Construction of
Bi-regional Input-Output Tables," Journal of Regional Science, 32(3), 269-284.

Bregman, L. M. 1967. "Proof of the Convergence of Sheleikhovskii's method for a problem with
transportation constraints," USSR Computational Mathematics and Mathematical Physics, 1(1),
191-204.

Byron, R. P. 1978. "The Estimation of Large Social Account Matrices," Journal of the Royal
Statistical Society, A, 141(Part 3), 359-367.

Byron, R. P., P.J. Crossman, J.E. Hurley and S.C. Smith. 1993. "Balancing Hierarchical Regional
Accounting Matrices," Paper presented to the International Conference in memory of Sir Richard
Stone, National Accounts, Economic Analysis and Social Statistics, Siena, Italy.

Golan, Amos, George Judge, and Sherman Robinson. 1994. "Recovering Information From
Incomplete or Partial Multisectoral Economic Data," The Review of Economics and Statistics,
LXXVI(3), 541-549.

Harrigan, Frank J. 1990. "The Reconciliation of Inconsistent Economic Data: the Information
Gain," Economic Systems Research, 2(1), 17-25.

Harrigan, Frank J. and Iain Buchanan. 1984. "A Quadratic Programming Approach to Input-
Output Estimation and Simulation." Journal of Regional Science, 24(3), 339-358.

Isard, Walter, Iwan Azis, Matthew P. Drennan, Ronald E. Miller, Sidney Saltzman, and Erik
Thorbecke, eds., 1998. Methods of Interregional and Regional Analysis. New York: Ashgate
Publishing Company.

Jensen, Rodney C. 1990. "Construction and Use of Regional Input-output Models: Progress and
Prospects," International Regional Science Review, 13(1 & 2), 9-25.

Lahr, M.L. 2001. "A strategy for producing hybrid regional input-output tables," in Lahr,
Michael, and Erik Dietzenbacher (eds.), Input-Output Analysis: Frontiers and Extensions.
Basingstoke, U.K: Palgrave, pp. 211-242.

McDougall, R.A., A. Elbehri, and T.P. Truong. 1998. "Global Trade Assistance and Protection:
The GTAP 4 database," Center for Global Trade Analysis, Purdue University.





McDougall, R. A. 1999. "Entropy Theory and RAS are Friends," Paper presented at the 5th
conference of Global Economic Analysis, Copenhagen, Denmark.

Miller, R. E. and P.D. Blair. 1985. Input-Output Analysis: Foundations and Extensions.
Englewood Cliffs, New Jersey: Prentice Hall.

Mohr, M., W. H. Crown and K. R. Polenske. 1987. "A Linear Programming Approach to Solving
Infeasible RAS Problems." Journal of Regional Science, 27(4), 587-603.

Nagurney, A. and A.G. Robinson. 1989. "Equilibration Operators for the Solution of Constrained
Matrix Problems," Working Paper, OR 196-89, Operations Research Center, MIT.


Polenske, Karen R. 1980. The U.S. Multiregional Input-Output Accounts and Model. Lexington,
Mass.: Lexington Books.

Robinson, Sherman, Andrea Cattaneo and Moataz El-Said. 2001. "Updating and Estimating a
Social Accounting Matrix Using Cross Entropy Methods," Economic Systems Research, 13(1),
47-64.

Senesen, G. and J. M. Bates. 1988. "Some Experiments with Methods of Adjusting Unbalanced
Data Matrices." Journal of the Royal Statistical Society, A. 151(Part 3), 473-490.

Stone, R. 1984. "Balancing the national accounts. The adjustment of initial estimates: a neglected
stage in measurement," in A. Ingham and A.M. Ulph (eds.), Demand, Equilibrium and Trade.
London: Macmillan.

Stone, R., J. M. Bates and M. Bacharach. 1963. A programme for Growth, Vol. 3 Input-Output
Relationship 1954-1966, London: Chapman and Hall.

Stone, R., D.G. Champernowne and J.E. Meade. 1942. "The precision of national income
estimates." Review of Economic Studies, 9(2), 110-125.

Trendle, Bernard. 1999. "Implementing A Multi-regional Input-Output Model - The Case of
Queensland," Economic Analysis & Policy, Special Edition, 17-27.

van der Ploeg, F. 1982. "Reliability and the adjustment of Sequences of Large Economic
Accounting Matrices," Journal of the Royal Statistical Society, A. 145, 169-194.

van der Ploeg, F. 1984. "General Least Squares Methods for Balancing Large Systems and tables
of National Accounts," Review of Public Data Use, 12, 17-33.

van der Ploeg, F. 1988. "Balancing Large Systems of National Accounts," Computer Science in
Economics and Management 1, 31-39.

Weale, M. R. 1985. "Testing Linear Hypotheses on National Account data," Review of Economics
and Statistics, 67, 685-689.

Weale, M. R. 1989. "Asymptotic maximum-likelihood estimation of national income and
expenditure," Cambridge, mimeo.





West, G. R. 1990. "Regional Trade Estimation: A Hybrid Approach," International Regional
Science Review, 13, 103-118.

Wilson, A. G. 1970. "Inter-regional Commodity Flows: Entropy Maximizing Approaches,"
Geographical Analysis, 2, 255-282.

Zenios, Stavros A., Arne Drud and John M. Mulvey. 1989. "Balancing Large Social Accounting
Matrices with Nonlinear Network Programming," Networks, 19, 569-585.





TABLE 1: Experiment Design

Experiment   Data known           Initial estimates                                                                What is estimated
number       with certainty (a)                                                                                    by the model

1            None                 $\bar{d}_i^{sr}$ is distorted from the "true" data $d_i^{sr}$                    Z and D
                                  $\bar{z}_{ij}^{\cdot r} = \frac{z_{ij}}{x_j - v_j} \times (x_j^r - v_j^r)$

2            $\bar{Z} = Z$        $\bar{D}$ is distorted from the "true" data $D$                                  D only

3            $\bar{D} = D$        $\bar{z}_{ij}^{\cdot r} = \frac{z_{ij}}{x_j - v_j} \times (x_j^r - v_j^r)$       Z only

4            None                 $\bar{z}_{ij}^{sr} = \frac{x_i^s + m_i^s - e_i^s}{x_i + m_i - e_i} \times \frac{x_j^r - v_j^r}{x_j - v_j} \times z_{ij}$      Z and D
                                  $\bar{y}_i^{sr} = y_i^r \times \frac{x_i^s + m_i^s - e_i^s}{x_i + m_i - e_i}$
                                  [Eqs. (16) and (17) in Batten (1982)]

5            None                 $\bar{z}_{ij}^{sr} = d_i^{sr} \times \frac{z_{ij}^{\cdot r}}{\sum_j z_{ij}^{\cdot r} + y_i^r}$      Z and D
                                  $\bar{y}_i^{sr} = d_i^{sr} - \sum_j \bar{z}_{ij}^{sr}$

6            $\bar{Z} = Z$        $\bar{z}_{ij}^{sr} = d_i^{sr} \times \frac{z_{ij}^{\cdot r}}{\sum_j z_{ij}^{\cdot r} + y_i^r}$      D only
                                  $\bar{y}_i^{sr} = d_i^{sr} - \sum_j \bar{z}_{ij}^{sr}$

7            $\bar{D} = D$        $\bar{z}_{ij}^{sr} = d_i^{sr} \times \frac{z_{ij}^{\cdot r}}{\sum_j z_{ij}^{\cdot r} + y_i^r}$      Z only
                                  $\bar{y}_i^{sr} = d_i^{sr} - \sum_j \bar{z}_{ij}^{sr}$

Notes:

a. In all experiments, the national totals $z_{ij}$, $x_i$, $y_i$, $v_i$, $e_i$, and $m_i$ are known with certainty, i.e., they

enter the model as constants. It is not necessary for the state totals $x_i^r$, $y_i^r$, $v_i^r$, $e_i^r$, and $m_i^r$ to be known

with certainty in the model; however, in all experiments reported in this paper, they enter the

model as constants.

b. In experiments 5-7, we did not distinguish different final demand types in the extended model.
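The initial estimate used for the intermediate flows in experiments 1 and 3 of Table 1 scales each national flow by region r's share of sector j's intermediate input use: the national flow z_ij divided by (x_j - v_j), times (x_j^r - v_j^r). The following is a minimal sketch of that formula only, not the authors' implementation; the function name and the two-sector, two-region data are hypothetical and chosen purely for illustration.

```python
def regional_flow_prior(z_nat, x_nat, v_nat, x_reg, v_reg):
    """Initial estimate of region r's flows: z_ij / (x_j - v_j) * (x_j^r - v_j^r).

    z_nat is the national flow matrix (rows i, columns j); x_* and v_* are
    gross output and value added by sector, nationally and for region r.
    """
    sectors = range(len(z_nat))
    return [
        [z_nat[i][j] / (x_nat[j] - v_nat[j]) * (x_reg[j] - v_reg[j]) for j in sectors]
        for i in sectors
    ]


# Hypothetical two-sector, two-region data (not from the paper).
z_nat = [[10.0, 20.0], [30.0, 60.0]]
x_nat, v_nat = [100.0, 200.0], [60.0, 120.0]
regions = {
    "A": {"x": [60.0, 80.0], "v": [36.0, 40.0]},
    "B": {"x": [40.0, 120.0], "v": [24.0, 80.0]},
}

priors = {
    name: regional_flow_prior(z_nat, x_nat, v_nat, data["x"], data["v"])
    for name, data in regions.items()
}
for name, matrix in priors.items():
    print(name, matrix)
```

Because the regional intermediate input totals in this example sum to the national total for each sector, the regional priors add up to the national flows cell by cell.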





TABLE 2: Mean Absolute Percentage Error from the True Data

Indexes: d = $d_i^{sr}$; z = $z_{ij}^{\cdot r}$. Prior = distorted priors. Exp-4 = Batten model.

Experiment                  Prior    Prior    Exp-1    Exp-1    Exp-2    Exp-3    Exp-4    Exp-4    Exp-5    Exp-5    Exp-6    Exp-7
Index                           d  Ave. IO        d        z        d        z        d        z        d        z        d        z

Total MAPE                 399.75    21.72     5.92    18.22     5.69    17.40   126.13    18.54     7.02    19.54     2.05    15.65

Receiving region MAPE
United States              265.83    17.28     8.75    19.03     8.68    15.41   129.88    16.49    10.46    24.12     3.90    13.82
European Union             447.06    20.94     3.97    15.31     3.61    15.72   111.73    16.51     4.93    14.74     0.74    14.22
Japan                      494.73    28.51     5.57    22.47     5.34    22.83   145.59    24.68     6.12    22.60     1.86    20.43

Sector MAPE I (Inputs)
Primary agriculture        304.53    25.48     5.37    25.61     5.19    24.61   125.51    34.92     7.51    27.43     1.67    23.16
Processed agriculture      319.40    14.18     9.99    15.73    10.67    11.82   129.42    13.06     9.74    18.23     2.97    10.81
Resource based sectors     392.24    53.70     3.16    20.06     5.52    21.76   135.00    13.28     4.10    15.17     2.15    16.90
Non-durable goods          312.28    15.85     4.46     9.03     3.85    10.04   127.87    11.44     5.82    10.72     3.36     9.38
Durable goods              413.91    13.69     4.81    12.74     4.36    12.02   121.60    14.06     5.24    12.91     3.38    10.43
Utility                    774.76    22.36     5.29    22.56     1.40    22.62   121.86    24.73     5.93    23.30     0.95    24.08
Construction               484.64    44.19     3.34    21.58     2.61    21.16   133.12    22.53     3.63    23.87     0.01    18.45
Trade and Transport        406.12    21.53    12.24    22.47    12.68    22.11   130.52    20.83    13.04    26.37     3.08    23.83
Private services           245.15    20.86     4.47    20.56     5.07    19.35   126.71    20.30     5.83    21.55     1.17    17.31
Public services            539.32    30.69     2.48    29.30     1.30    27.49   118.65    29.77     6.01    30.08     0.62    16.12

Shipping region MAPE
United States              264.78              9.17              9.08            130.65              9.92              2.90
European Union             445.56              3.83              3.64            111.83              5.30              1.57
Japan                      495.24              5.28              4.80            144.28              6.22              1.75

Sector MAPE II (Use)
Primary agriculture                  13.54             12.98             11.04             12.03             13.22              9.31
Processed agriculture                15.42             20.90             15.61             18.90             27.60             16.17
Resource based sectors               42.54             18.91             18.45             21.81             17.67             17.24
Non-durable goods                    14.22              9.83             10.65             12.32             11.35             11.68
Durable goods                        19.07             11.37             11.73             12.40             11.25             11.31
Utility                              33.77             25.90             27.60             29.16             24.46             22.75
Construction                         42.75             43.54             41.74             46.29             43.43             41.60
Trade and Transport                  21.89             22.42             20.04             20.88             29.75             18.02
Private services                     16.81             17.75             16.61             16.68             18.19             15.88
Public services                      51.25             46.73             46.64             50.94             40.98             16.26
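For readers who want to compute an index of the kind reported in Table 2, the following is a minimal sketch of a mean absolute percentage error. It assumes the conventional unweighted definition taken over cells with a nonzero true value; the exact index behind Table 2 is defined in the main text and may differ, for example by weighting cells. The function name and the sample numbers are hypothetical.

```python
def mape(estimates, truths):
    """Unweighted mean absolute percentage error over cells with a nonzero true value."""
    errors = [
        100.0 * abs(est - true) / abs(true)
        for est, true in zip(estimates, truths)
        if true != 0
    ]
    if not errors:
        raise ValueError("no nonzero true values to compare against")
    return sum(errors) / len(errors)


# Hypothetical flows (not from the paper): true values, a distorted prior,
# and a posterior estimate that lies closer to the truth.
true_flows = [100.0, 50.0, 200.0]
prior_flows = [150.0, 25.0, 200.0]
posterior_flows = [110.0, 45.0, 200.0]

print("prior MAPE:    ", mape(prior_flows, true_flows))
print("posterior MAPE:", mape(posterior_flows, true_flows))
```

A lower value for the posterior than for the prior is the pattern Table 2 reports when comparing the experiment columns with the distorted priors.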



