Teaching with the Test: Experimental Evidence on Diagnostic Feedback and Capacity Building for Public Schools in Argentina

Despite the recent growth in the number of large-scale student assessments, there is little evidence on their potential to inform improvements in school management and classroom instruction in developing countries. This study conducted an experiment in the Province of La Rioja, Argentina, that randomly assigned 105 public primary schools to: (a) a "diagnostic feedback" group, in which standardized tests were administered in math and reading comprehension at baseline and two follow-ups and the results were made available to the schools through user-friendly reports; (b) a "capacity-building" group, for which schools were provided with the reports as well as workshops and school visits for supervisors, principals, and teachers; or (c) a control group, in which the tests were administered only at the second follow-up. After two years, diagnostic feedback schools outperformed control schools by 0.34 and 0.36 standard deviations (SD) in third grade math and reading, and by 0.28 and 0.38 SD in fifth grade math and reading. The principals at these schools were more likely to report using assessment results for management decisions, and students were more likely to report that their teachers engaged in more instructional activities and improved their interactions with them. Capacity-building schools saw more limited impacts due to lower achievement at baseline, low take-up, and little value-added of workshops and visits. However, in most cases the results cannot rule out the possibility that both interventions had the same impact.


Introduction
A stylized fact of economic geography is that the productivity of firms increases with city size and urban density (Combes and Gobillon, 2015), and a large literature going back to Marshall (1890) explores the question of why cities have this productivity advantage. Microfoundations put forward for these agglomeration externalities are now typically grouped under the headings sharing, matching, learning and sorting (Duranton and Puga, 2004; Combes et al., 2008) and include different forms of knowledge spillovers between firms, costly trade, pro-competitive effects of city size, and sorting of workers (Syverson, 2011). The empirical literature suggests a range for the elasticity of productivity with respect to city size of 0.04-0.07 that is rather consistent across countries and years (Rosenthal and Strange, 2004). However, while theoretical microfoundations for agglomeration externalities rest on differences across space in total factor productivity (TFP), i.e., the capacity to turn inputs into more physical output, empirical work has so far considered what we call revenue TFP measures (TFP-R), i.e., productivity calculated using revenue as a measure of output and hence the capacity to turn inputs into more revenue.
To be more specific, researchers typically try to measure TFP as the residual obtained by estimating a production function through a regression of some measure of firm output on inputs. One key problem with this in practice is that usually the only output measures available are gross revenues or value-added, and not quantities. Revenues are of course made up of prices and quantities. Even though industry-level price deflators are usually available, they are of little use if the goal of the analysis is to pick up differences in productivity across space, because they do not take into account differences in prices across locations. More broadly, revenue-based measures of productivity will pick up any heterogeneity in firm-level prices, confounding efforts to measure 'true' physical TFP. This heterogeneity in prices across firms could be due to many factors including firm-level demand shifters, markups and production scale. At a regional level, for instance, if firms in larger cities systematically sell higher-priced, higher-quality goods, the econometrician working with a measure of revenue TFP will overstate the impact of city size on TFP. At the same time, establishing that part of the observed revenue productivity advantage of cities is due to factors other than technical efficiency would require a substantial reconsideration of agglomeration economies and in particular of the related mechanisms and policy implications.
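The point can be made concrete with a two-firm toy example (our own illustration, not from the paper): under a one-input Cobb-Douglas technology, two firms with identical physical TFP and identical input use but different prices yield different revenue-based productivity residuals.

```python
import numpy as np

# Toy illustration of the measurement problem described above: two firms with
# identical physical TFP yield different revenue-based residuals when prices
# differ. One-input Cobb-Douglas for simplicity; all numbers are invented.

alpha = 0.6                     # assumed output elasticity of the input
tfp = 1.0                       # identical (log) physical TFP for both firms
m = np.array([2.0, 2.0])        # identical log input use
q = tfp + alpha * m             # identical log quantity produced

p = np.array([0.0, 0.3])        # firm 2 charges a 30 log-point higher price
r = p + q                       # log revenue

tfp_r = r - alpha * m           # revenue-based "TFP" residual
print(tfp_r)                    # differs across firms despite equal physical TFP
```

The revenue residual equals physical TFP plus the log price, so any price heterogeneity is loaded onto measured "productivity".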
In order to address these issues we make use of high-quality and detailed quantity, price and revenue data on products produced by French manufacturing firms. This type of data is becoming more widely available, allowing researchers to measure firm-specific TFP while accounting for other forms of heterogeneity across firms, and that is what we do in this paper. More specifically, we build upon the framework developed in Forlani et al. (2016), henceforth FMMM, which allows us to measure heterogeneity in TFP, demand and markups across firms while further providing an exact decomposition of revenue TFP. We employ the FMMM framework to measure these heterogeneities at the firm level and subsequently aggregate them at the location level to analyze differences in TFP, demand and markups across space.
We first highlight two strong patterns in the data relating revenue TFP and density. First, a substantial portion of the revenue productivity advantage of denser areas stems from product composition effects: denser areas are specialised in products generating a higher revenue TFP. Second, the way one aggregates firm-level data into regional-level data matters considerably for the measurement of the elasticity of revenue TFP with respect to density. More specifically, magnitudes are considerably larger when considering a weighted (by firm revenue or employment cost) as opposed to an unweighted data aggregation, and the weighted results are more in line with the range suggested by regional-level studies (Rosenthal and Strange, 2004). 1 These patterns are driven by the relationship between firm revenue TFP and firm size (as measured by either revenue or employment cost) being positive in each location but systematically related to density: firms with higher (lower) TFP-R account for a larger (smaller) share of total revenue in denser areas. One way of interpreting this is that in denser areas the market better allocates market shares across firms with heterogeneous productivities, thereby amplifying, in aggregate regional-level figures, any firm-level differences in productivity across space. These findings have important implications for regional policy. For example, they suggest that achieving regional convergence is not only about improving the TFP or the revenue TFP of firms in lagging regions but also about increasing (decreasing) the relative size of the most (least) productive firms in those regions, a reallocation which might be hindered, more than in denser regions, by factors like input misallocation.
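The aggregation point can be seen in a small simulation of our own (all parameters are invented, not estimates from the paper): firm TFP-R rises mildly with density in every location, but revenue shares load more strongly on TFP-R in denser locations, so the revenue-weighted regional density gradient exceeds the unweighted one.

```python
import numpy as np

# Toy simulation of the aggregation effect described above. In each location
# the firm-level TFP-R/density relationship has the same mild slope, but market
# shares tilt more strongly toward high-TFP-R firms in denser locations, so
# the weighted regional slope is steeper than the unweighted one.

rng = np.random.default_rng(3)
n_loc, n_firms = 200, 100
density = rng.uniform(0, 2, n_loc)               # log density by location

unweighted, weighted = [], []
for d in density:
    tfpr = 0.03 * d + rng.normal(scale=0.3, size=n_firms)  # firm log TFP-R
    # Revenue shares load on TFP-R more strongly in denser locations:
    share = np.exp((1 + d) * tfpr)
    share /= share.sum()
    unweighted.append(tfpr.mean())               # simple regional average
    weighted.append(np.sum(share * tfpr))        # revenue-weighted average

b_unw = np.polyfit(density, unweighted, 1)[0]
b_w = np.polyfit(density, weighted, 1)[0]
print(round(b_unw, 3), round(b_w, 3))            # weighted slope is larger
```

The mechanism is purely compositional: no firm's productivity changes, only the distribution of market shares across firms.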
Concerning the factors driving the revenue productivity advantage of firms in denser areas that remains after accounting for the product composition and aggregation effects described above, we start by highlighting that a properly defined and measured revenue TFP should equal TFP plus the log price. Using information from the raw data, we first document that prices are higher in denser areas. At the same time, quantities sold at these higher prices are higher too, and so are revenues. This suggests that products sold by firms located in denser areas face a higher demand. Using measures obtained with the FMMM framework, we subsequently establish that marginal costs are higher while markups are lower in denser areas. Furthermore, there is no overall significant relationship between TFP and density, and so the revenue TFP advantage of denser areas is mainly driven by higher prices. 2 By using complementary information from exports data, we also provide evidence that prices charged and quantities sold by firms located in denser areas are higher also when conditioning on a given destination market, suggesting that products sold by firms located in denser areas are of higher (actual and/or perceived) quality. The above results have further implications for regional policy. In particular, the current policy approach is based on the presumption that firms in lagging regions are characterized by a lower TFP, and so interventions are directed towards increasing their technical efficiency. However, our evidence suggests that interventions should rather promote firms' product quality and marketing capabilities in order to increase revenue TFP in lagging regions.
In terms of data, we make use of Eurostat's Products of the European Community (Prodcom) dataset. Prodcom consists of surveys, standardized across the European Union, of firm-level production that cover over 90% of output in manufacturing industries at a detailed (8-digit) level. We use the French Prodcom provided by the Institut National de la Statistique et des Études Économiques (INSEE) for the 2008-15 period. Firm balance sheet and location information comes from the Fichier Approché des Résultats d'Esane (FARE) database and covers the same 2008-2015 period. We use Zones d'Emploi (ZE) as our spatial unit, a measure of local labour markets similar in construction to UK Travel-To-Work-Areas.
In order to provide reassurance about the robustness of our results, we employ two estimation techniques: the estimation procedure developed in FMMM as well as the procedure described in De Loecker et al. (2016), henceforth DGKP. Indeed, under the assumptions laid down in FMMM, the same revenue productivity decomposition holds and both estimation procedures are valid. We find the results of both procedures to be qualitatively identical and also quantitatively very similar. We further provide a number of additional results showing that our key findings are little affected by whether we focus on the sample of single-product firms or the larger sample of single-product and multi-product firms, by whether we employ the number of full-time equivalent employees or the total wage bill to measure the labour input, by whether we consider firm revenue or firm wage bill to weight observations, by whether we eliminate the Paris area or not, and by whether we use a Cobb-Douglas or a Translog production function.
Our paper is closely related to the literature on the measurement of agglomeration economies. Rosenthal and Strange (2004) and Combes and Gobillon (2015) provide summaries of this literature and agree on a range for the key elasticity of productivity with respect to density of 0.04-0.07. 3 These findings are robust to the endogeneity of current economic density and in particular to the use of long lags of historical density as instruments for current density (Ciccone and Hall, 1996; Ciccone, 2002). However, all these findings, including Combes et al. (2012), 4 relate to measures of revenue TFP. By contrast, we use data on quantity, prices and revenue to measure TFP and, via the decomposition provided in FMMM, we decompose the revenue TFP advantage of denser areas into its components.
Our paper is also related to the literature on firm TFP measurement on which Olley and Pakes (1996) has had a deep impact. The key endogeneity issue addressed in Olley and Pakes (1996) is omitted variables: the firm observes and takes decisions based on productivity shocks that are unobservable to the econometrician. Yet, the econometrician observes firm decisions (investments) that do not impact productivity today and that can (under certain conditions) be used as a proxy for productivity shocks. This proxy variable approach to tackle the issue of unobservable productivity shocks has been further developed in Levinsohn and Petrin (2003), Wooldridge (2009) and Ackerberg et al. (2015) while De Loecker et al. (2016) and Forlani et al. (2016) provide frameworks consistent with the presence of heterogeneity across firms in TFP, demand and markups.
The outline of the remainder of this paper is as follows. Section 2 provides details on the data we use. Section 3 presents the model and revenue TFP decomposition of FMMM while further providing highlights of the estimation procedures. Section 4 presents our main results and findings while Section 5 contains a number of additional results and robustness checks. Section 6 concludes while the Appendix provides additional Tables and details on the estimation procedures.

Data
This section describes the data used to study productivity and agglomeration in France. Our analysis focuses on the period 2008-2015. The core data required to estimate firm-level revenue productivity using standard methodologies comprise revenue (and/or value-added), labour, intermediates and capital. For these variables we turn to FARE, an annual census of French firms carried out by INSEE.
From the FARE dataset we take firm labour, intermediates and capital variables. The capital stock variable is the reported book value of capital, while intermediates is the value of intermediate inputs and services. For labour, we use the number of full-time equivalent employees. Some productivity studies use the firm wage bill instead, on the grounds that this controls in some way for the ability of workers. We prefer to use the number of full-time equivalent employees as our benchmark, while providing additional results obtained using the wage bill, for the following reasons. Our aim is not to establish what share of the productivity advantage of denser areas is related to workers' skills and abilities (possibly due to sorting of better workers across space), but rather to establish how much of the observed revenue-based productivity advantage of firms located in denser areas is due to actual TFP differences as opposed to demand and markup differences. In this light, we prefer a measure of the labour input that allows our firm-level revenue TFP and quantity TFP to incorporate differences in workers' skills and abilities across locations. Furthermore, as discussed in Section 4, using the number of full-time equivalent employees allows us to establish more clearly whether products sold by firms located in denser locations actually require more inputs to be produced as opposed to more expensive inputs.
FARE can be matched, via the unique firm identifier (SIREN code), to another dataset, the 'Répertoire des entreprises et des établissements', providing us with the location of the establishments of each firm. We use information on the municipality (commune), which we subsequently match to the corresponding 'Zone d'Emploi' (ZE), a measure of local labour markets similar in construction to the UK Travel-To-Work-Areas, of which there are 297 in mainland France (excluding overseas territories and Corsica). In order to give a more causal flavor to our results, in some of our regressions we instrument for current density building on an approach that is standard in the literature: using long-lagged historical densities as instruments for current densities (Combes and Gobillon, 2015). In particular, we use population density in 1831, 1861 and 1891 as our instruments. In doing so, we had to take into account two additional issues. First, historical censuses did not record municipalities with a population of less than 5,000 in their respective years. At the ZE level, this still leaves 24 ZEs with zero recorded population in 1831, so they exert no weight in subsequent regressions. Second, historical municipalities do not exactly match those of today. Several no longer exist, having been subsumed over the course of 150 years of administrative changes. We deal with these by manually matching them to the modern ZEs.
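The instrumenting strategy can be sketched with a hand-rolled two-stage least squares on simulated ZE-level data; the variable values and the 0.05 "true" elasticity below are invented for illustration, not estimates from the paper.

```python
import numpy as np

# Illustrative 2SLS with long-lagged historical densities as instruments for
# current density, as in the strategy described above. Simulated data; in the
# paper the unit of observation would be the ZE.

rng = np.random.default_rng(42)
n = 297                                            # number of mainland ZEs
hist_1831, hist_1861, hist_1891 = rng.normal(size=(3, n))   # log historical densities
density = 0.8 * hist_1831 + 0.5 * hist_1861 + rng.normal(size=n)  # current log density
tfp_r = 0.05 * density + rng.normal(scale=0.1, size=n)      # regional log TFP-R

def two_sls(y, x, instruments):
    """Two-stage least squares with a constant: project x on the instruments,
    then regress y on the fitted values."""
    Z = np.column_stack([np.ones(len(y))] + instruments)
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]        # first stage
    Xh = np.column_stack([np.ones(len(y)), x_hat])
    return np.linalg.lstsq(Xh, y, rcond=None)[0]            # second stage

beta = two_sls(tfp_r, density, [hist_1831, hist_1861, hist_1891])
print(round(float(beta[1]), 3))                             # close to 0.05
```

With strong instruments, the estimated slope recovers the simulated elasticity; in practice one would also compute instrument-robust standard errors.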
In our investigations, we consider firms as the unit of analysis and restrict our attention to firms whose establishments (if more than one) are all located in the same ZE, so that we can uniquely associate a firm to a ZE at a given point in time. In this respect, we believe that the most natural unit of analysis for productivity, demand and markups heterogeneity is the firm and not the establishment. Furthermore, inputs and outputs data are available at the level of the firm and not the establishment, and so measuring productivity, demand and markups heterogeneity across establishments would necessarily involve debatable assignment procedures. In doing so, while applying some cleaning to the data, 5 we end up with 628,940 firm-year observations, corresponding to NACE two-digit industries 10-32 (Manufacturing), that we label the FARE sample.
Quantity TFP estimation requires data on production quantities and sales values, and that information is available in the Products of the European Community (Prodcom) dataset at a detailed product level. Prodcom is a firm-level survey of manufacturing and production carried out by EU national statistical agencies using an 8-digit nomenclature established by Eurostat. The first four digits correspond to the 'Nomenclature Statistique des Activités Economiques dans la Communauté Européenne' (NACE) revision 2, and the first six digits to the 'Classification of Products by Activity' (CPA), with the last 2 digits adding further detail. There are approximately 3,800 Prodcom codes per year. 6 The Prodcom survey captures at least 90% of production in all the four-digit industries covered by the survey. Illustrating the advantages of highly disaggregated data, Table 1 shows an extract from the 2014 Prodcom list for the six-digit code 13.10.61: 'Cotton yarn (other than sewing thread)'.

5 wage bill and/or capital and/or value added. We then apply a small trimming (top and bottom 0.5%) based on the distribution of the following four ratios: intermediates over sales, wage bill over sales, capital over sales and sales in t over sales in t − 1. We further apply a final trimming based on the ratio of intermediates plus the wage bill over sales and also drop 2-digit sections with fewer than 500 observations.

6 In order to deal with Prodcom codes changing over time we use the correspondence tables provided by Eurostat RAMON and apply the methodology described in Van Beveren et al. (2012) to construct a time-consistent products breakdown. In practice, from 2008 to 2015 there have been only minor changes in Prodcom codes.
As can be appreciated from Table 1, the eight-digit product breakdown is quite detailed, and working at this level of disaggregation allows us to take into account rich differences in technology, demand and degree of competition across finely defined products.

Notes to Table 2: The FARE sample includes firms with complete balance sheet data in NACE 2 industries 10-32 that remain after an initial cleaning of the data. The Prodcom sample includes the subset of such firms that are in the Prodcom dataset. In both samples, an observation is a firm-year combination. SP and MP refer to single-product and multi-product firms in the Prodcom sample that have been subject to further data cleaning. We consider two samples: 1) the sample of SP and MP; 2) the sample of SP. In both samples an observation is a firm-product-year combination. For SP, a firm-product-year combination corresponds to a unique firm-year combination. Monetary values are in current thousands of euros.
The FARE sample can be matched to Prodcom by means of the unique firm identifier (SIREN code). We label the matched sample, comprising 201,261 firm-year observations, as Prodcom. We subsequently applied the following cleaning procedures:
• Drop products whose unit of measure is not consistent over time
• Identify single-product firms as those firms that produce only one product or produce a product representing 90% or more of their total sales
• Drop 2-digit sections with fewer than 500 single-product firm observations
• Drop observations corresponding to extreme markup values (top and bottom 1%)

This leaves us with a sample of 189,017 (121,004) firm-product-year (firm-year) observations for single-product and multi-product firms combined (SP+MP sample) and 55,432 firm-product-year observations for single-product firms only (SP sample). Clearly, for single-product firms a firm-product-year combination corresponds to a unique firm-year combination. Table 2 reports summary statistics for various samples.

Table 3 (industry grouping):
Textiles; Wearing apparel; Leather and related products
16+17 Wood and wood products; Paper and paper products
18 Printing and reproduction of recorded media
20+22 Chemicals and chemical products; Rubber and plastic products
23+24 Other non-metallic mineral products; Basic metals
25 Fabricated metal products, except machinery and equipment
26+27+28 Computer, electronic and optical products; Electrical equipment; Machinery and equipment n.e.c.
29+30 Motor vehicles, trailers and semi-trailers; Other transport equipment
31 Furniture
32 Other manufacturing

There are several NACE sections missing from the SP+MP and SP samples. Section 19 (Manufacture of coke and refined petroleum products) is not part of Prodcom. Sections 10-12 (Manufacture of food products, beverages and tobacco products) are covered and, in many countries, typically provide both a large number of observations and a sizeable contribution to economy-wide production. However, in France the Prodcom data for these sections is collected and stored separately from the main survey and we do not have access to it. We exclude section 21 (Manufacture of pharmaceuticals, medicinal chemical and botanical products) and section 30 (Manufacture of other transport equipment) when dropping sections with fewer than 500 single-product firm observations.
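The single-product classification rule above can be sketched in a few lines; this is a toy illustration with invented firms, whereas the real inputs would be Prodcom firm-product-year records.

```python
import pandas as pd

# Sketch of the rule above: a firm counts as single-product if one product
# accounts for 90% or more of its total sales. Toy data for illustration.

df = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C"],
    "product": ["p1", "p2", "p1", "p3", "p2"],
    "sales": [950.0, 50.0, 600.0, 400.0, 300.0],
})

# Total sales per firm, broadcast back to each firm-product row
df["firm_sales"] = df.groupby("firm")["sales"].transform("sum")
df["share"] = df["sales"] / df["firm_sales"]

# A firm is single-product if its largest product share is >= 0.90
single_product = df.groupby("firm")["share"].max() >= 0.90
print(single_product.to_dict())  # {'A': True, 'B': False, 'C': True}
```

Firm A qualifies because its main product accounts for 95% of sales, while firm B (a 60/40 split) does not.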
Finally, we apply some aggregation across sections in order to increase the number of observations for industry-specific production function estimations ending up with the industry grouping reported in Table 3.

The MULAMA model: TFP-R decomposed
This section follows FMMM; in particular, we provide here the single-product firm version of the model. See FMMM and Appendix C for the multi-product firm extension. The model is labelled MULAMA after the names of the three heterogeneities it allows for: markups MU, demand LAMbda and quantity productivity A. Crucially, the MULAMA model allows us to derive an exact decomposition of revenue-based TFP in terms of the underlying heterogeneities, thus bridging the gap between quantity TFP and revenue TFP.
In our empirical investigations, we perform estimations and provide results based on both the single-product firms sample and the larger sample of single- and multi-product firms. There are pros and cons to each of the two samples. On the one hand, the sample of single- and multi-product firms is larger and more representative of the population of French manufacturing firms. On the other hand, using single-product firms requires fewer assumptions in order to measure markups, demand and productivity heterogeneity. In particular, as discussed in more detail in DGKP and FMMM, the key operational issue with multi-product firms is the assignment of inputs to outputs. Produced quantities and generated revenues are observable for the different products of each firm in databases like ours. However, information on inputs used for a specific product is typically not available. Therefore, in order to handle multi-product firms, one needs to lay down additional assumptions in order to solve the problem of assigning inputs to outputs. We provide in Appendix C a full description of the procedure used to assign inputs to outputs. As in DGKP, our procedure first requires estimating the parameters of the production function using single-product firms only.

Demand heterogeneity
In what follows we index firms by i and time by t and denote with lower case the log of a variable (for example, r_it denotes the natural logarithm of revenue R_it). Standard profit maximization (marginal revenue equal to marginal cost) implies that the elasticity of revenue R_it with respect to quantity Q_it is one over the profit maximizing markup:

Electronic copy available at: https://ssrn.com/abstract=3594013

∂r_it/∂q_it = 1/µ_it,   (1)

where µ_it = P_it/MC_it is the profit maximizing markup. This result comes from static profit maximization and holds under different assumptions about demand (representative consumer and discrete choice models) and product market structure (monopolistic competition, monopoly and standard forms of oligopoly). Despite the log revenue function, i.e., the function relating log revenue to log quantity, being both unknown and potentially different across firms, equation (1) provides us with the slope of the firm-specific log revenue function for firm i, while data on the actual log revenue r_it and log quantity q_it of firm i provide us with a point where this firm-specific log revenue function cuts through the (q, r) space. If we now linearize the log revenue function around the observed data point (q_it, r_it) with a slope given by 1/µ_it, we can uniquely pin down an intercept for this linearized log revenue function on the r axis. We use this intercept λ̂_it as our measure of demand heterogeneity: 7

λ̂_it = r_it − (1/µ_it) q_it.   (2)

Given our definition of λ̂_it, observed firm log revenue is simply

r_it = λ̂_it + (1/µ_it) q_it,   (3)

and so λ̂_it is a firm-specific log revenue shifter 8 corresponding to the log price firm i would face if selling one unit of its product. 9 While being general and intuitive, this measure of demand heterogeneity also maps into more formal and explicit differences in the underlying structure of preferences. In particular, FMMM show that λ_it = λ̂_it µ_it, where λ_it is a parameter characterizing differences in utility derived from the consumption of products sold by different firms. More specifically, consider a representative consumer who maximises at each point in time t a differentiable utility function U(.) subject to the budget B_t:

max_Q U(Q̃) subject to Σ_i P_it Q_it ≤ B_t,

7 To simplify notation we ignore components that are constant across firms in a given time period or within a product category. Those constants will be captured in our empirical analysis by a suitable set of dummies.
8 Demand heterogeneity is the variation in revenue that is not explained by variation in quantities, i.e., two firms selling the same quantity but generating different revenues (because of different prices). Therefore, demand heterogeneity is a firm-specific log revenue shifter given quantity (or, equivalently, a firm-specific log price shifter given quantity).
9 At the intercept point q_it = 0, and so we have Q_it = 1, from which R_it = P_it and r_it = p_it = λ̂_it. Note this has no implications whatsoever about the presence/absence of a choke price.
where Q̃ is a vector with elements Λ_it Q_it and λ_it = log(Λ_it). Therefore, while the representative consumer chooses quantities Q, these quantities enter the utility function as Q̃, and Λ_it can be interpreted as a measure of the perceived quality/appeal of a particular variety. FMMM show that the log revenue function corresponding to the above preferences, r(q_it, λ_it), can be approximated, around the observed profit-maximizing solution, by the linear function:

r(q_it, λ_it) ≈ (1/µ_it)(q_it + λ_it),   (4)

and so λ_it is:

λ_it = µ_it r_it − q_it.   (5)

Two things are worth noting at this stage. First, (5) is valid as a first-order linear approximation and is the counterpart of (2), meaning that the log revenue shifter λ̂_it, which FMMM label demand heterogeneity, maps via markups into differences in product appeal across firms' varieties: λ_it = λ̂_it µ_it. Second, while the shape of the function relating revenue to quantity and product appeal will depend upon the specific underlying preferences, FMMM show that (4) applies to any preference structure that can be used to model monopolistic competition and for which a well-behaved differentiable utility function exists. 10 This includes standard CES preferences as well as generalized CES preferences (Spence, 1976), 11 CARA preferences (Behrens et al., 2014), HARA preferences (Haltiwanger et al., 2018), Translog preferences (Feenstra, 2003), as well as the class of Variable Elasticity of Substitution (VES) preferences discussed in Zhelobodko et al. (2012) and Dhingra and Morrow (2019). For example, in the case of CARA preferences, which are non-homothetic, the underlying utility behind heterogeneity in product appeal across firms would be:

U_t = Σ_{i∈Ω_t} [1 − exp(−α Λ_it Q_it)],

where Ω_t is the set of varieties available at time t. Finally, FMMM provide examples suggesting that a log-linear approximation of the revenue function, which is behind both the construction of λ̂_it and its interpretation as a markup-adjusted measure of product appeal, works well in many specifications.
For example, Figure 1 plots two CARA log revenue functions obtained using two different values for product appeal: λ_it = 1 for log revenue function 1 and λ_it = 2 for log revenue function 2. 12 As can be appreciated from Figure 1, a linear approximation looks both reasonable and accurate for most of the relevant part of the two log revenue functions, i.e., within the range where log revenue (and revenue) is increasing because marginal revenue is positive and demand is elastic.

10 FMMM also show λ_it is a measure characterizing differences in utility in the oligopoly model developed in Atkeson and Burstein (2008) and further refined in Hottman et al. (2016).

11 In the case of CES and generalized CES preferences (5) holds as an equality because the log revenue function is linear in both q_it and λ_it.

12 The other parameters are α = 0.001 and the Lagrange multiplier κ_t = 0.001.
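The near-linearity of the CARA log revenue function can be checked numerically. Under CARA preferences the consumer's first-order condition implies P = (Λα/κ) exp(−αΛQ), hence r(q) = log(Λα/κ) + q − αΛ e^q. The sketch below is our own check, using the footnote-12 parameter values and the assumption that λ is the log of appeal (Λ = e^λ).

```python
import numpy as np

# Numerical check that a CARA log revenue function is close to linear over
# its elastic range. Parameters follow footnote 12 (alpha = kappa = 0.001);
# treating lambda as log appeal (Lambda = exp(lambda)) is our assumption.

alpha, kappa = 0.001, 0.001

def log_revenue(q, lam):
    Lambda = np.exp(lam)
    # r(q) implied by the CARA first-order condition P = (Lambda*alpha/kappa)*exp(-alpha*Lambda*Q)
    return np.log(Lambda * alpha / kappa) + q - alpha * Lambda * np.exp(q)

lam = 1.0
# Marginal revenue is positive for Q < 1/(alpha*Lambda), i.e. q < -log(alpha*Lambda)
q_max = np.log(1.0 / (alpha * np.exp(lam)))
q = np.linspace(0.0, 0.85 * q_max, 200)
r = log_revenue(q, lam)

# Goodness of fit of a straight line over this range
slope, intercept = np.polyfit(q, r, 1)
r_hat = slope * q + intercept
r2 = 1 - np.sum((r - r_hat) ** 2) / np.sum((r - np.mean(r)) ** 2)
print(round(r2, 4))  # very close to 1
```

Over most of the increasing range the linear fit is nearly exact, consistent with the claim that the log-linear approximation works well.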

Markups and marginal costs
As far as markups are concerned, FMMM build upon a result, first highlighted in Hall (1986) and implemented in De Loecker and Warzynski (2012) and DGKP among others, based on cost minimization of a variable input free of adjustment costs (materials in our empirical implementation) and price-taking behaviour on the input side (the cost of materials W^M_it is allowed to be firm-time specific but is taken as given by the firm). The proof goes as follows.
Starting from the definition of marginal cost implied by cost minimization with respect to materials:

MC_it = W^M_it / (∂Q_it/∂M_it).

Now define the markup as:

µ_it = P_it / MC_it.

We thus have:

µ_it = P_it (∂Q_it/∂M_it) / W^M_it.

Multiplying by Q_it and dividing by M_it on both sides we get:

µ_it = [(∂Q_it/∂M_it)(M_it/Q_it)] × [(P_it Q_it)/(W^M_it M_it)].

Re-arranging we finally have:

µ_it = (∂q_it/∂m_it) / s^M_it,   (6)

where s^M_it = (W^M_it M_it)/R_it. This simple rule to pin down markups is consistent with many hypotheses on product market structure (monopolistic competition, monopoly and standard forms of oligopoly) and consists in taking the ratio of the output elasticity of materials (∂q_it/∂m_it) to the share of materials in revenue (s^M_it). Measuring the output elasticity of materials requires estimating the coefficients of the production function, while the share of materials in revenue is directly observable in most datasets (including ours). For example, in the case of a Cobb-Douglas production function with 3 inputs (labour L, materials M and capital K) and with (log) quantity TFP labeled a_it, log quantity is:

q_it = α_L l_it + α_M m_it + α_K k_it + a_it,

and so the output elasticity of materials is constant and equal to α_M, meaning that µ_it = α_M / s^M_it. When instead considering a Translog production function, log quantity is:

q_it = α_L l_it + α_M m_it + α_K k_it + α_LL l_it^2 + α_MM m_it^2 + α_KK k_it^2 + α_LM l_it m_it + α_LK l_it k_it + α_MK m_it k_it + a_it,

and so:

∂q_it/∂m_it = α_M + 2α_MM m_it + α_LM l_it + α_MK k_it.

Therefore, with estimates of the production function coefficients at hand, (6) can be used to recover firm-specific markups. At the same time, using information on prices and markups, one can recover the marginal cost:

MC_it = P_it / µ_it.

Finally, with markups as well as log quantity and log revenue at hand, (2) can be used to get a measure of demand heterogeneity λ̂_it.
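As a concrete illustration, the recovery sequence — markups from the elasticity-over-share rule (6), then marginal costs, then demand heterogeneity from (2) — can be sketched as follows. The Cobb-Douglas elasticity α_M and all firm-level numbers below are invented for illustration; in practice α_M would come from the production function estimation.

```python
import numpy as np

# Sketch of markup, marginal cost and demand heterogeneity recovery under a
# Cobb-Douglas technology. All numbers are invented for illustration.

alpha_M = 0.55                            # assumed output elasticity of materials
rng = np.random.default_rng(0)

n = 5
revenue = rng.uniform(500, 5000, n)                         # P*Q, thousands of euros
materials_bill = 0.45 * revenue * rng.uniform(0.8, 1.2, n)  # W^M * M
price = rng.uniform(1.0, 3.0, n)
quantity = revenue / price

s_M = materials_bill / revenue            # materials share in revenue
markup = alpha_M / s_M                    # rule (6): elasticity over share
marginal_cost = price / markup            # MC = P / markup
lambda_hat = np.log(revenue) - np.log(quantity) / markup  # demand shifter, eq. (2)

print(markup.round(2))
```

Each step uses only observables (revenue, materials bill, price, quantity) plus the estimated elasticity, mirroring the logic of the text.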

Quantity TFP
The last step to close the model involves estimating the parameters of the production function, thus recovering quantity TFP a_it and subsequently markups, marginal costs and demand heterogeneity as explained above. There are many different hypotheses, and related estimation procedures, one can use in order to achieve this, and in what follows we provide two examples. One readily available approach to estimate the production function that is consistent with the underlying presence of heterogeneity in markups and demand is provided in DGKP. This methodology relies on the popular proxy variable approach pioneered by Olley and Pakes (1996) and in particular, starting from the conditional input demand for materials, adds to this function a number of observables (prices and market shares in particular) to proxy for unobservables (markups and demand heterogeneity in our framework), while further imposing invertibility of the conditional input demand for materials. More specifically, DGKP build on the GMM approach outlined in Wooldridge (2009) and in particular consider the leading case of an AR(1) process for productivity:

a_it = ρ_a a_i,t−1 + G_ar + ν_ait,

where G_ar represents geographical factors affecting productivity (like the density of economic activities), 13 and ν_ait stands for productivity shocks that are iid and represent innovations with respect to the information set of the firm in t − 1. Therefore, productivity shocks ν_ait are uncorrelated with past values of all firm-level variables (capital, revenue, quantity, etc.) including productivity. However, the productivity level a_it is allowed to be correlated with past and present firm-level variables and in particular is a variable considered by the firm when making choices in t.
Under the (usual) additional assumption that capital is predetermined in t, i.e., capital is chosen beforehand and cannot adjust immediately to shocks $\nu_{ait}$ occurring in t,14 the firm will thus consider capital as given in t and will choose the optimal amount of materials in order to minimize costs based on the given values of capital $k_{it}$ and TFP $a_{it}$, as well as the price of materials $W^{M}_{it}$. This optimal amount will in general be a deterministic function h(.) of $k_{it}$, $a_{it}$ and $W^{M}_{it}$. Furthermore, with underlying differences in markups and demand, h(.) will also depend on markups $\mu_{it}$ and product appeal $\lambda_{it}$. Finally, if labour has also been chosen prior to t (because, like capital, it is difficult to adjust in the wake of short-term shocks $\nu_{ait}$), then h(.) will also contain $l_{it}$: $m_{it} = h(k_{it}, l_{it}, a_{it}, W^{M}_{it}, \mu_{it}, \lambda_{it})$. If h(.) is globally invertible with respect to $a_{it}$, the inverse function $a_{it} = g(k_{it}, l_{it}, m_{it}, W^{M}_{it}, \mu_{it}, \lambda_{it})$ exists and is well behaved, and so one can use a semi-parametric polynomial approximation of g(.) to proxy for the unobservable (to the econometrician) quantity TFP $a_{it}$. Furthermore, given that $W^{M}_{it}$, $\lambda_{it}$ and $\mu_{it}$ are also unobservable (to the econometrician), DGKP suggest using regional variables $G_{ar}$, as well as the observable output price and market share of firm i, as proxies for $W^{M}_{it}$, $\lambda_{it}$ and $\mu_{it}$ in the semi-parametric approximation of g(.),15 which thus becomes a function of observables only. Operationally, g(.) is approximated by a polynomial function in the 3 inputs, $G_{ar}$, the output price and the market share. We provide more details on the DGKP approach and estimation procedure in Appendix A.

Two shortcomings of the DGKP approach relate to its implicit assumptions and to the amount of identifying variation. More specifically, the existence and invertibility of a suitable conditional input demand for materials implies making implicit assumptions about demand and market structure that are not readily verifiable. Furthermore, in the main estimation procedure described in DGKP, firm market share (de facto firm revenue) and price in t − 1 are, among other things, added as covariates in a regression where quantity at time t is on the left-hand side. Therefore, there might be little variation left to precisely identify technology parameters.16 In an attempt to address these two issues, FMMM develop an alternative estimation methodology that does not rely on the proxy variable approach. More specifically, FMMM use both the first-order approximation of the log revenue function (4) and the production function equation to recover technology parameters. Indeed, FMMM are sufficiently explicit about demand to be able to write the log revenue function explicitly in terms of observables and heterogeneities, and use both this and the production function equation to estimate technology parameters. The key disadvantage of this methodology is that one has to be explicit about the process governing the evolution of product appeal $\lambda_{it}$; in particular, FMMM assume it follows an AR(1) process.17 In our analysis, we further allow for product appeal to be related to geographical factors $G_{\lambda r}$, which is a straightforward extension of FMMM.

13 The index r denotes the region where firm i is located at time t. In our empirical analysis, we use for $G_{ar}$ both the log of the 2009 population and the log of the land area of region r. Given our relatively short time frame (2008-15), it would not make much sense to consider a time-varying population.
14 Capital can nonetheless adjust to shocks $\nu_{ait}$ at time t + 1.
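Under our reading of the DGKP first stage, the polynomial approximation of g(.) amounts to regressing log quantity on a polynomial in the three inputs, the regional variables, the output price and the market share. The sketch below uses simulated placeholder data and a second-order expansion; it is not DGKP's actual code, and all coefficient values are arbitrary.

```python
import numpy as np
from itertools import combinations_with_replacement

# Hedged sketch of a proxy-variable first stage: approximate g(.) by a
# second-order polynomial in inputs, regional variables, price and market
# share, and regress log quantity on it. Simulated placeholder data only.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([
    rng.normal(size=n),   # k_it: log capital
    rng.normal(size=n),   # l_it: log labour
    rng.normal(size=n),   # m_it: log materials
    rng.normal(size=n),   # G_r: e.g. log 2009 population of the region
    rng.normal(size=n),   # p_it: log output price
    rng.normal(size=n),   # ms_it: market share
])
# arbitrary data-generating coefficients, for illustration only
q = X @ np.array([0.15, 0.25, 0.55, 0.02, -0.10, 0.05]) + rng.normal(0, 0.1, n)

# second-order polynomial expansion of the proxy arguments
terms = [np.ones(n)] + [X[:, i] for i in range(X.shape[1])]
terms += [X[:, i] * X[:, j]
          for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
Z = np.column_stack(terms)
beta, *_ = np.linalg.lstsq(Z, q, rcond=None)
q_fitted = Z @ beta   # expected quantity net of iid shocks / measurement error
```

In the actual procedure the fitted values from this stage are carried into a GMM step that identifies the technology parameters; the sketch only illustrates the construction of the polynomial proxy.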
More specifically, in our implementation of the FMMM procedure we use:
$$\lambda_{it} = \rho_\lambda \lambda_{it-1} + \gamma_\lambda' G_{\lambda r} + \nu_{\lambda it}, \qquad (11)$$
where $G_{\lambda r}$ represents geographical factors affecting demand (like the density of economic activities),18 and $\nu_{\lambda it}$ stands for product appeal shocks that are iid and represent innovations with respect to the information set of the firm in t − 1. However, we do not impose (in line with FMMM) any constraints on the correlation between product appeal shocks $\nu_{\lambda it}$ and quantity TFP shocks $\nu_{ait}$, and so ultimately we do not impose a priori any constraints on the correlation between product appeal $\lambda_{it}$ and quantity TFP $a_{it}$. Indeed, our results confirm previous findings in FMMM of a negative correlation between product appeal (as well as demand heterogeneity) and quantity TFP, irrespective of whether we use the FMMM or the DGKP procedure. This is suggestive of a trade-off between the appeal/perceived quality of a firm's products and their production cost, which is in line with findings in the demand system literature (Ackerberg et al., 2007). We provide more details on the FMMM approach and estimation procedure, which builds upon both (10) and (11), in Appendix A.

15 DGKP cite Kugler and Verhoogen (2011), who document how producers of more expensive products also use more expensive inputs, suggesting that observable output prices can reasonably be used to proxy for unobservable input prices.
16 DGKP use the market share in their preferred Translog production function specification. When using a Cobb-Douglas production function, DGKP argue that there is no need to use the market share.
17 $\lambda_{it}$ captures consumers' perception of a firm's products' quality and appeal; something that arguably does not change much from one year to another. It takes firms years of effort and costly investments to establish their brand and build their customer base, very much like it takes years of effort and costly investments to put in place and develop an efficient production process for their products. FMMM thus argue that there are profound similarities between the processes governing productivity (typically modelled as an autoregressive process) and product appeal.
18 In our empirical analysis, we use for $G_{\lambda r}$ both the log of the 2009 population and the log of the land area of region r.

TFP-R decomposed
To appreciate how the MULAMA model is useful in linking revenue-based TFP and quantity-based TFP, note that, with standard Hicks-neutral TFP, one can write the log of the production function as $q_{it} = \hat{q}_{it} + a_{it}$, where $\hat{q}_{it}$ is an index of input use that we label log scale.19 Finally, by defining revenue TFP as $TFPR_{it} \equiv r_{it} - \hat{q}_{it}$ and using equation (3) while substituting, we get:
$$TFPR_{it} = \tilde{\lambda}_{it} + \frac{a_{it}}{\mu_{it}} + \left(\frac{1}{\mu_{it}} - 1\right)\hat{q}_{it}, \qquad (12)$$
meaning that $TFPR_{it}$ is a (non-linear) function of quantity-based TFP $a_{it}$, the log revenue shifter $\tilde{\lambda}_{it}$, the profit-maximizing markup $\mu_{it}$ and log production scale $\hat{q}_{it}$. (12) can also be made linear by considering markup-adjusted quantity TFP and log scale ($\tilde{a}_{it} = a_{it}/\mu_{it}$ and $\tilde{q}_{it} = (1/\mu_{it} - 1)\hat{q}_{it}$):
$$TFPR_{it} = \tilde{a}_{it} + \tilde{\lambda}_{it} + \tilde{q}_{it}, \qquad (13)$$
so that $TFPR_{it}$ differences across firms located in different regions can be decomposed as the sum of differences in $\tilde{a}_{it}$, $\tilde{\lambda}_{it}$ and $\tilde{q}_{it}$ across such firms. In this respect, we note again that while the Urban Economics literature has focused on models featuring differences in quantity TFP across space, the empirical evidence gathered so far is at best about revenue TFP, and in this respect our framework can shed new light on the determinants of differences in $TFPR_{it}$ across space.
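The decomposition of revenue TFP can be checked numerically. The sketch below uses made-up numbers and assumes, consistent with our reading of the model, a log revenue function of the form $r_{it} = \tilde{\lambda}_{it} + q_{it}/\mu_{it}$:

```python
import numpy as np

# Numerical check (made-up numbers) of the linearized decomposition
# TFPR = a/mu + lambda + (1/mu - 1)*qhat implied by TFPR = r - qhat,
# r = lambda + q/mu and q = qhat + a.
rng = np.random.default_rng(2)
n = 1000
a = rng.normal(0, 0.8, n)        # quantity TFP
lam = rng.normal(0, 1.0, n)      # log revenue shifter (demand heterogeneity)
mu = rng.uniform(1.05, 1.4, n)   # markup
qhat = rng.normal(3, 0.5, n)     # log scale (input index)

q = qhat + a                     # log quantity
r = lam + q / mu                 # log revenue
tfpr = r - qhat                  # definition of revenue TFP

a_tilde = a / mu                 # markup-adjusted quantity TFP
q_tilde = (1.0 / mu - 1.0) * qhat  # markup-adjusted log scale
print(np.allclose(tfpr, a_tilde + lam + q_tilde))
```

The check also makes the substantive point visible: with demand heterogeneity and markups switched on, revenue TFP and quantity TFP are far from identical series.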

A few last remarks
In our empirical investigations, we perform estimations and provide results based on both the DGKP and FMMM estimation procedures, while considering the former as the baseline procedure. In both cases, we consider the Cobb-Douglas production function (7) as the leading case, while providing some robustness results based on the Translog production function (8). In all instances we assume, in light of the features of the heavily regulated French labour market, that labour is predetermined, i.e., it cannot immediately adjust to short-term productivity or demand shocks. Furthermore, we measure the labour input with the number of full-time equivalent employees, as in Combes et al. (2012),20 while providing some robustness results where we use the total wage bill to measure the labour input. Crucially, we will see later on that our key findings are little affected by whether we use the DGKP or the FMMM estimation procedure, by whether we employ the number of full-time equivalent employees or the total wage bill to measure the labour input, and by whether we use a Cobb-Douglas or a Translog production function. Last but not least, we also provide results based on both the single-product firms sample and the larger sample of single and multi-product firms, while considering the latter as our preferred sample. Again, our key findings are little affected by which sample we use. Three last operational issues are worth noting. First, as customary in productivity analyses, we correct (in all estimations) for the presence of measurement error in output (quantity and revenue) and/or unanticipated (to the firm) shocks using the methodology described in DGKP, on which we provide key highlights in Appendix B. Second, we perform TFP estimations separately for each two-digit industry (NACE Sections) and consider a full battery of 8-digit product dummies, as well as year dummies. Indeed, quantity in the data is measured in units (kilograms, litres, number of items, etc.)
that are specific to each 8-digit product, and so quantity TFP $a_{it}$ can be reasonably compared across firms and space only within an 8-digit product category. For similar reasons, $\lambda_{it}$ can also be reasonably compared across firms and space only within an 8-digit product category. Therefore, as we discuss in more detail below, our analysis will focus on differences across locations in prices, quantities, quantity TFP, markups, etc. within 8-digit product categories. Third, in comparing firm outcomes across space we are faced with the issue of how to deal with firms having more than one establishment. One solution, followed by Combes et al. (2012), is to consider single-establishment firms only. Despite serving the purpose, we believe this strategy is not ideal because it leaves out the group of large multi-establishment firms, which represent nearly half of employment. Therefore, in our analysis we adopt a different approach. More specifically, we consider firms as the unit of analysis and restrict our attention to firms whose establishments (if more than one) are all located in the same ZE, so that we can uniquely associate a firm to a ZE at a given point in time. In this respect, we believe that the most natural unit of analysis for productivity, demand and markup heterogeneity is the firm and not the establishment. Furthermore, inputs and outputs data are available at the level of the firm and not the establishment, and so measuring productivity, demand and markup heterogeneity across establishments would necessarily involve debatable assignment procedures.
4 Main results

4.1 Analysis of the firm-level measures obtained with the MULAMA model

Table 4 provides estimates of the coefficients of the Cobb-Douglas production function (7) obtained with the DGKP procedure applied to the sample of single-product firms (as in DGKP). Coefficient estimates are in line with expectations for a three-input production function: in particular, materials coefficients are larger than labour coefficients, which are in turn larger than capital coefficients.21 Overall, there seems to be evidence of slightly decreasing returns to scale, while coefficients are comparable to those reported in FMMM and DGKP using quantity and revenue data for Belgian and Indian firms, respectively.

We start from the sample of single-product firms and, using the materials, labour and capital coefficients from Table 4, as well as data on quantity produced and inputs used, we compute quantity TFP $a_{it}$ as a residual from (7). Further using the coefficient of materials, as well as the revenue share of materials, we get markups $\mu_{it}$ from (6). The marginal cost $MC_{it}$ is instead obtained from (9) using prices and markups, while demand heterogeneity is computed from (2) using markups as well as log quantity and log revenue. Finally, revenue TFP and its components are derived from (12) and (13). We subsequently apply the inputs assignment procedure described in Appendix C to allocate inputs across the different products of multi-product firms and use the above equations to obtain quantity TFP, markups, marginal costs, demand heterogeneity, as well as revenue TFP and its components, for each firm-product-year combination. The combined sample (which we label 'SP+MP firms') comprises both single-product and multi-product firms and spans a total of 189,017 firm-product-year observations corresponding to 121,004 unique firm-year combinations.

Notes: Summary statistics refer to the sample of SP and MP firms. An observation is a firm-product-year combination. For SP, a firm-product-year combination corresponds to a unique firm-year combination.

Table 5 provides some summary statistics of the various MULAMA model measures for the SP+MP firms sample. For most measures, averages and/or medians are of little value per se; what matters is instead the variation in the data. Concerning revenue TFP we find, in line with FMMM, that MULAMA TFP (TFP-R) is characterized by a standard deviation of about 0.5, which is also in line with the standard deviation of other TFP-R measures obtained from our data.22 As for the standard deviations of quantity TFP and demand heterogeneity, they are again comparable to results reported in FMMM and, for both measures, much larger than the standard deviation of TFP-R. Furthermore, there is actually more variation in demand heterogeneity than in quantity TFP, suggesting that heterogeneity in demand is a key component of firm idiosyncrasies, being at least as sizeable as heterogeneity in productivity. Last but not least, the average markup across observations is 1.161, which compares to a value of 1.158 obtained by FMMM with data on Belgian firms.
Tables 6 and 7 provide a number of OLS regressions suggesting that the correlations between the various elements of the MULAMA model are coherent with both intuition and economic theory. For example, column (1) of Table 6 provides results of a regression where quantity TFP is regressed on the marginal cost, while further considering year dummies as well as 8-digit product dummies and clustering standard errors at the firm level. The coefficient is negative and highly significant, as expected, and quite close to one. Column (2) of Table 6 displays results of a similar regression where the dependent variable is now the markup. The coefficient is negative and significant, indicating that firms with a lower marginal cost charge a higher markup. In this respect, note that a negative relationship between markups and marginal costs is not a property of any well-behaved preference structure: it points in the direction of preferences featuring increasing relative love for variety or sub-convexity, from which pro-competitive effects arise.23 Moving to column (3) of Table 6, one can appreciate that prices increase with the marginal cost with a pass-through elasticity of about 0.9, which is again in line with results from FMMM. Related to this point, FMMM note that a 0.9 average cost pass-through elasticity might seem too high compared to existing macro evidence (Campa and Goldberg, 2005). However, by looking at detailed product-destination level price and quantity data on French exporters, Berman et al. (2012) provide evidence that standard macro/aggregate measures of the pass-through elasticity mask substantial heterogeneity across firms, with many firms actually characterized by a very high pass-through elasticity. More specifically, they show that the pass-through elasticity is decreasing in firm size and productivity, with the un-weighted average across firms standing at 0.83 and a near-complete pass-through elasticity for smaller and less productive exporters.

Notes: * p < 0.1; ** p < 0.05; *** p < 0.01. Standard errors clustered by firm. Regressions include year dummies as well as 8-digit product dummies. Estimations are carried out on the sample of SP and MP firms.
In Table 7, column (1) provides results of a regression where demand heterogeneity (the revenue shifter $\tilde{\lambda}$) is regressed on quantity TFP, while further considering year dummies as well as 8-digit product dummies and clustering standard errors at the firm level. The coefficient is negative and highly significant, as in FMMM, and is suggestive of a trade-off between the appeal/perceived quality of a firm's products and their production cost, as indicated in the demand system literature (Ackerberg et al., 2007). Column (2) further indicates that markups are increasing in quantity TFP (again pointing in the direction of preferences featuring increasing relative love for variety or sub-convexity) as well as in the revenue shifter $\tilde{\lambda}$. At the same time, firms with larger investments, i.e., firms with a higher log capital in our regression, tend to charge (for given quantity TFP and demand heterogeneity) lower markups, which is consistent with these firms maximising their profits by selling higher quantities and so facing a more elastic portion of the demand curve. Moving to column (3), one can appreciate that firm revenue is increasing, as it should be, in quantity TFP as well as in the revenue function shifter $\tilde{\lambda}$ and the revenue function slope $1/\mu$. In terms of marginal costs, column (4) indicates that they are, as intuition would suggest, negatively related to TFP, also when controlling for the intercept $\tilde{\lambda}$ and the slope $1/\mu$ of the revenue function. Furthermore, marginal costs are increasing in both $\tilde{\lambda}$ and $1/\mu$, suggesting that firms facing a higher demand curve (because of a higher $\tilde{\lambda}$ and/or a higher $1/\mu$) spend more resources to produce their products. Such products are thus likely to be higher-quality products from a production point of view as well, and not simply from the viewpoint of consumers' perception.
Finally, column (5) shows that prices decrease with quantity TFP while increasing in both $\tilde{\lambda}$ and $1/\mu$, which is what one would expect if our measures capture well what they are supposed to measure.

4.2 On the revenue productivity advantage of denser areas: aggregation and product composition

Table 8 provides a number of OLS regressions where standard revenue productivity measures at the firm level are regressed on the log of population density of the ZE where firms are located, using various firm samples. More specifically, we use three measures of revenue productivity and four different samples. The three revenue productivity measures are: 1) log value added per worker; 2) revenue TFP obtained as the residual of a three-input Cobb-Douglas production function estimation where output is measured by revenue and coefficients are estimated via OLS (OLS TFP-R); 3) revenue TFP obtained as the residual of a three-input Cobb-Douglas production function estimation where output is measured by revenue and coefficients are estimated using the insights provided in Wooldridge (2009) (Wooldridge TFP-R). In terms of samples we use: 1) the FARE sample; 2) the Prodcom sample; 3) the SP+MP firms sample; 4) the SP firms sample. In all regressions, we add time and industry (4-digit) dummies, while standard errors are clustered at the ZE level.

Notes: * p < 0.1; ** p < 0.05; *** p < 0.01. Standard errors are clustered at the ZE level. Regressions include time and industry (4-digit) dummies. The FARE sample includes firms with complete balance sheet data in NACE 2 industries 10-32 that remain after an initial cleaning of the data. The Prodcom sample includes the subset of such firms that are in the Prodcom dataset. In both samples, an observation is a firm-year combination. SP and MP refer to single-product and multi-product firms in the Prodcom sample that have been subject to further data cleaning. We consider two samples: 1) the sample of SP and MP firms; 2) the sample of SP firms. In both samples, an observation is a firm-product-year combination. For SP, a firm-product-year combination corresponds to a unique firm-year combination.
As one can appreciate, the density elasticity parameter varies a bit depending on the revenue TFP measure considered; in particular, value added per worker is characterized by somewhat higher coefficients. However, coefficients remain rather stable across samples for a given revenue TFP measure, suggesting that focusing, as we do below, on the SP+MP sample or the SP sample does not appear to be particularly at odds with the relationship between revenue TFP and density in wider samples. At the same time, the range of magnitudes (0.6% to 3.2%) includes the value of 2.5% reported in Combes et al. (2012), obtained by aggregating firm-level data at the ZE level without using any particular weights.
In Tables 9 and 10 we focus on the SP+MP sample and run regressions very similar to those performed in Table 8. Again we consider the same three revenue productivity measures employed for Table 8, while also adding MULAMA revenue TFP (TFP-R). At the same time, we always add year dummies but consider either 2-digit or 8-digit product dummies in order to highlight the importance of product composition in measuring the elasticity of revenue TFP with respect to density. Furthermore, while in Table 9 we perform weighted regressions giving equal weight to all firms located in the same ZE (what we label number of firms weighted),25 in Table 10 we perform weighted regressions giving different weights to firms located in the same ZE depending on their revenue (what we label revenue weighted).26 In both cases we shift, by means of regression weighting, the unit of analysis from firms (Table 8) to ZEs (Tables 9 and 10). However, in doing so we either give the same importance to all observations corresponding to a ZE, which means we ultimately compare the average firm across ZEs in the regressions, or we give an importance that is proportional to the revenue share within a ZE, which means our regressions at the firm level should be more comparable to macro/aggregate regressions run at the regional level. Finally, in all regressions we cluster standard errors at the ZE level.
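As a hedged sketch of the two weighting schemes, the following simulates firm-level data and runs both weighted regressions of TFP-R on log density with the $1/N_r$ and $R_{ipt}/R_r$ weights described in the text. The dummies and the clustering of the actual specifications are omitted for brevity, and all numbers are placeholders.

```python
import numpy as np

# Sketch: weighted OLS of firm-level TFP-R on log ZE density, using either
# 1/N_r weights (each ZE's average firm counts equally) or revenue-share
# weights R/R_r. Simulated placeholder data; true slope set to 0.03.
rng = np.random.default_rng(3)
n_ze, firms_per_ze = 50, 40
ze = np.repeat(np.arange(n_ze), firms_per_ze)
log_density = np.repeat(rng.normal(4, 1, n_ze), firms_per_ze)
revenue = np.exp(rng.normal(1, 1, n_ze * firms_per_ze))
tfpr = 0.03 * log_density + rng.normal(0, 0.4, n_ze * firms_per_ze)

def wls_slope(y, x, w):
    """Weighted least squares with an intercept; returns slope on x."""
    X = np.column_stack([np.ones_like(x), x])
    W = w / w.sum()
    return np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))[1]

n_r = np.bincount(ze)
w_count = 1.0 / n_r[ze]                 # number of firms weighted
r_r = np.bincount(ze, weights=revenue)
w_rev = revenue / r_r[ze]               # revenue weighted
print(wls_slope(tfpr, log_density, w_count), wls_slope(tfpr, log_density, w_rev))
```

With homogeneous data as simulated here the two slopes coincide in expectation; the divergence documented in Tables 9 and 10 arises precisely because, in the real data, high-revenue firms differ systematically from the average firm.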
By looking at Tables 8 and 9, one can draw three conclusions. First, coefficient values are very similar between the two tables, suggesting that whether the unit of analysis is the firm or the average firm in a location does not matter much for the measurement of the relationship between revenue TFP and density. Second, coefficients reported in Table 9 and obtained using either 2-digit or 8-digit dummies are very similar, suggesting that product composition effects do not play a big role here. Third, coefficients corresponding to the MULAMA revenue TFP (TFP-R) are very much in line with the other measures of revenue TFP (OLS and Wooldridge).
The comparison of Tables 9 and 10 is more interesting and reveals two important results we highlight below: Result 1: Weighting impacts the measurement of the elasticity of revenue productivity with respect to density.
Result 2: A substantial portion of the aggregate revenue productivity advantage of denser areas stems from product composition effects.
Regarding Result 1, by simply comparing Table 9 and Table 10 it is readily apparent that coefficients in the latter are larger and, particularly when considering simple 2-digit product dummies, more in line with the 4-7% range suggested by aggregate regional-level studies (Rosenthal and Strange, 2004). The reason for this behavior lies in the relationship between revenue TFP and revenue. In spatial models à la Melitz (2003) like, for example, Behrens et al. (2017), there is a one-to-one mapping between firm TFP, as well as revenue TFP, and firm revenue within each location: a firm with higher TFP/revenue TFP will have a higher revenue and so a higher revenue share within a location. However, while the correlation between firm revenue TFP and firm revenue in our data is positive in each and every ZE (ranging between 0.050 and 0.788), it is far from one and systematically related to density. In particular, in denser areas the linear relationship is stronger, meaning that firms with higher (lower) TFP-R account for a larger (smaller) share of total revenue in denser regions. One way of interpreting this is that the market better allocates market shares across firms with heterogeneous productivities in denser areas, so amplifying in aggregate revenue-weighted figures any firm-level differences in productivity across space. Regarding Result 2, estimates obtained using 2-digit product dummies are systematically larger, sometimes close to a factor of two, than estimates obtained using 8-digit product dummies, and this is particularly the case when considering revenue weighting. This suggests that a considerable portion of the observed aggregate revenue productivity advantage of denser areas comes from these areas being specialised in 8-digit products generating a higher revenue TFP, as opposed to denser areas generating a higher revenue TFP for a given 8-digit product.

25 In the number of firms weighted case, each firm-product-year observation is weighted by $1/N_r$, where $N_r$ is the total number of firm-product-year observations corresponding to ZE r.
26 In the revenue weighted case, each firm-product-year observation is weighted by $R_{ipt}/R_r$, where $R_{ipt}$ is firm i's revenue corresponding to product p at time t and $R_r$ is the sum of $R_{ipt}$ across the firm-product-year observations corresponding to ZE r.

Notes: * p < 0.1; ** p < 0.05; *** p < 0.01. Standard errors clustered by ZE. Regressions are weighted and include year dummies as well as either 2-digit or 8-digit product dummies. Estimations are carried out on the sample of SP and MP firms. Each firm-product-year observation is weighted by $R_{ipt}/R_r$, where $R_{ipt}$ is firm i's revenue corresponding to product p at time t and $R_r$ is the sum of $R_{ipt}$ across the firm-product-year observations corresponding to ZE r. Note that, since regressions use weights, the R² does not necessarily improve when considering 8-digit dummies instead of 2-digit dummies.
Tables D-1 to D-4 in Appendix D provide additional evidence on Results 1 and 2 by further looking at the other samples: FARE, Prodcom and SP firms. More specifically, Tables D-1 and D-2 perform the very same analysis of Tables 9 and 10 for SP firms. Table D-3 displays the same regressions reported in Table 8 with 2-digit industry dummies and using revenue weighting across all firm samples. In the same vein, Table D-4 covers all firm samples while using 6-digit industry dummies and revenue weighting.27

4.3 On the revenue productivity advantage of denser areas: demand matters

From now onwards we systematically control for 8-digit product dummies, and so concentrate on the revenue productivity advantage stemming from denser areas generating a higher revenue TFP for a given 8-digit product, while providing both revenue weighted and number of firms weighted results. In particular, we now exploit the valuable information provided by the Prodcom database: quantities and prices. In doing so we more directly move the center of the analysis from firms to locations by aggregating firm-level variables, or more precisely firm-product-year variables, at the ZE level.28 However, before doing any aggregation, we first demean these variables by 8-digit product and year. For the aggregation, we use either revenue weights or number of firms weights, as in the previous section, while using robust standard errors in all ZE-level regressions.
More specifically, in order to construct the unique log price measure corresponding to ZE r, we first subtract from the raw log price of each firm located in ZE r the corresponding mean log price across all locations, with respect to the specific product of the firm and the year. We then aggregate these deviations from 8-digit product and year averages across all firm-product-year observations corresponding to ZE r using, for example in the case of revenue weights, the revenue share within ZE r corresponding to each observation as weight.29 In doing so we end up with a unique measure of prices, quantities, and revenues for each ZE that is consistent across ZEs. We then regress these measures on the log of population density corresponding to each ZE, while clustering standard errors at the ZE level. Furthermore, in order to give a more causal flavor to our results, we instrument for current density building on an approach that is standard in the literature: using long-lagged historical densities as instruments for current densities (Combes and Gobillon, 2015). In particular, we use population density in 1831, 1861 and 1891 as our instruments. The corresponding underidentification and weak-identification tests are reported in Tables 11 and 12 and strongly support the use of such instruments.

Notes: Weak-identification test critical values are from Stock and Yogo (2005). Starting from the sample of SP and MP firms, firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Each firm-product-year observation is weighted by $R_{ipt}/R_r$, where $R_{ipt}$ is firm i's revenue corresponding to product p at time t and $R_r$ is the sum of $R_{ipt}$ across the firm-product-year observations corresponding to ZE r.
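The demeaning-and-aggregation step can be sketched as follows; the data frame and column names are hypothetical placeholders, not our actual dataset:

```python
import numpy as np
import pandas as pd

# Sketch: demean firm-product-year log prices by 8-digit product x year,
# then collapse to one price measure per ZE using revenue-share weights.
rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({
    "ze": rng.integers(0, 20, n),           # employment zone identifier
    "product": rng.integers(0, 10, n),      # stand-in for 8-digit product code
    "year": rng.integers(2008, 2016, n),
    "log_price": rng.normal(0, 0.3, n),
    "revenue": np.exp(rng.normal(1, 1, n)),
})

# deviation from the product-year mean log price across all locations
df["p_dev"] = df["log_price"] - df.groupby(["product", "year"])["log_price"].transform("mean")
# revenue-share weight within each ZE
df["w"] = df["revenue"] / df.groupby("ze")["revenue"].transform("sum")
# one consistent log price measure per ZE
p_ze = (df["p_dev"] * df["w"]).groupby(df["ze"]).sum()
```

Replacing the weight column with $1/N_r$ (the inverse of the ZE's observation count) yields the number of firms weighted counterpart; the same two lines apply to log quantity and log revenue.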
The first three columns of Tables 11 and 12 provide results for log quantity, log revenue and log price, respectively. Note that this part of our analysis simply makes use of raw data, and so is not affected in any way by the possible limitations and restrictions of the MULAMA model. Furthermore, also note that, because of the way we constructed the variables and the properties of linear estimators, the density coefficient corresponding to log revenue is equal to the sum of the density coefficients corresponding to log quantity and log price. In this respect, inspection of Table 11 for revenue weighted results and Table 12 for number of firms weighted results reveals another important result:

29 Formally, our measure of log price is $p_r = \sum_{ipt \in r} (p_{ipt} - \bar{p}_{pt}) w_{ipt}$, where the weight $w_{ipt}$ is either $1/N_r$ or $R_{ipt}/R_r$.

Notes: Weak-identification test critical values are from Stock and Yogo (2005). Starting from the sample of SP and MP firms, firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Each firm-product-year observation is weighted by $1/N_r$, where $N_r$ is the total number of firm-product-year observations corresponding to ZE r.
Result 3: Prices are higher in denser areas. At the same time, quantities sold at these higher prices are higher too, and so are revenues.
Regarding Result 3, this evidence is present in both revenue weighted and number of firms weighted results, while being quantitatively stronger in the former. Furthermore, Tables D-10 and D-11, discussed in the next section, show that these patterns also hold in the SP firms sample. Result 3 is consistent with the idea that firms located in denser areas face, on average, higher demand curves than firms located in less dense areas, and so are able to sell higher quantities even while charging higher prices. Result 3 also has clear and strong implications for the revenue productivity advantage of denser areas. Indeed, from the definition of revenue productivity we have $TFPR_{it} \equiv r_{it} - \hat{q}_{it} = p_{it} + q_{it} - \hat{q}_{it} = p_{it} + a_{it}$, i.e., revenue TFP is quantity TFP plus the log price. Therefore, even if quantity TFP were on average the same across locations, the fact that firms in denser areas are able to charge higher prices will boost their revenue TFP.
The fact that demand and prices are higher for goods produced in denser regions does not necessarily mean that firms located in such areas sell higher (actual and/or perceived) quality products. For example, in the extreme case where demand is fully local and products are only horizontally differentiated, demand and prices could be higher in denser regions because of the high concentration of service sectors (driven by agglomeration economies) consuming manufacturing products and boosting local wages and consumption. In order to shed light on this issue we provide below two additional pieces of information.
First, in columns (4) and (5) of Tables 11 and 12 we push the analysis further by making use of two of the measures obtained from the MULAMA model: log marginal costs and log markups. For an individual firm, the log price equals the log marginal cost plus the log markup. In our aggregate regressions, because of the way we constructed variables and the properties of linear estimators, the sum of the density coefficients of log marginal cost and log markup equals the density coefficient of log price. In this respect, the results provided in Tables 11 and 12 strongly suggest that the single most important reason why prices are higher in denser areas is that marginal costs are higher. Furthermore, given that we use the number of full-time equivalent employees rather than the wage bill to measure the labour input, higher marginal costs are not mechanically due to wages being higher in denser areas. As far as log markups are concerned, they are lower in denser areas, but significantly so only in the number-of-firms-weighted regressions.

Notes: * p < 0.1; ** p < 0.05; *** p < 0.01. Standard errors clustered by firm. Regressions include year dummies as well as 8-digit product dummies. Estimations are carried out on the sample of SP and MP firms. The first column reports results of an un-weighted OLS regression, while column two provides results of a weighted OLS regression where each firm-product-year observation is weighted by R_ipt, where R_ipt is firm i's revenue corresponding to product p at time t.
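The adding-up property invoked above (the density coefficients of log marginal cost and log markup summing to the density coefficient of log price) is a mechanical consequence of OLS linearity, since log price equals log marginal cost plus log markup observation by observation. A minimal sketch with simulated data (all numbers are hypothetical, not the paper's estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
log_density = rng.normal(size=n)
# Hypothetical firm-level components: log price = log mc + log markup
log_mc = 0.05 * log_density + rng.normal(scale=0.1, size=n)
log_mu = -0.01 * log_density + rng.normal(scale=0.1, size=n)
log_p = log_mc + log_mu

def ols_slope(y, x):
    """OLS slope coefficient of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_p = ols_slope(log_p, log_density)
b_mc = ols_slope(log_mc, log_density)
b_mu = ols_slope(log_mu, log_density)
# Because p = mc + mu holds identically, the coefficients add up exactly
assert abs(b_p - (b_mc + b_mu)) < 1e-10
```

The same adding-up argument applies to any linear aggregation of firm-level variables, which is why it carries over to the ZE-level regressions in the paper.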
The fact that marginal costs are higher in denser areas is in line with the idea that products sold there are of higher actual quality, and so require more inputs to be produced, but it is not yet a proof. For example, marginal costs could be higher in denser areas simply because firms located there move along an increasing marginal cost curve in order to meet a higher demand, rather than having their marginal cost curve shifted upwards because a more expensive, higher-quality product is being produced. In this respect, Table 13 shows the results of a simple OLS regression across firms which is meant to give an idea of how much higher marginal costs should be in denser areas given the additional quantity sold. More specifically, in order to reconstruct the shape of the log marginal cost curve, in Table 13 we regress the log marginal cost corresponding to a firm-product-year observation 29 in the SP+MP sample on the corresponding quantity TFP and log quantity. To show that the coefficients are not much affected by firm weighting and/or sample choice, we report in columns 1 and 2 of Table 13 un-weighted and firm-weighted results respectively, while reporting in Table D-5 of Appendix D both un-weighted and firm-weighted results for the SP firms sample. Turning to Table 13, the coefficient of quantity TFP is around -1 and strongly significant, which makes sense. As for the coefficient of log quantity, it is around 0.2, indicating that, for example, a 10% higher quantity for given TFP would imply 2% higher marginal costs. In this respect, column 1 of Table 11 indicates that doubling density increases the quantity sold by 14.54%, which should translate, for given TFP and marginal cost curve, into about 3% higher marginal costs. Yet the same Table 11 indicates in column 4 that doubling density is associated with 4.62% higher marginal costs.
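The back-of-envelope calculation in this paragraph can be reproduced directly from the coefficients quoted in the text (0.2 from Table 13; 14.54% and 4.62% from Table 11):

```python
# Coefficients reported in the text
slope_mc_curve = 0.2   # elasticity of marginal cost w.r.t. quantity (Table 13)
dq_density = 14.54     # % quantity increase from doubling density (Table 11, col. 1)
dmc_observed = 4.62    # % marginal cost increase from doubling density (Table 11, col. 4)

# Implied marginal cost increase if firms simply moved along a fixed cost curve
dmc_implied = slope_mc_curve * dq_density  # 0.2 * 14.54 = 2.908, i.e. about 3%

# The positive gap suggests the marginal cost curve itself shifts up in denser
# areas, consistent with a higher-quality (costlier) product being produced
gap = dmc_observed - dmc_implied
print(f"implied: {dmc_implied:.2f}%, observed: {dmc_observed:.2f}%, gap: {gap:.2f}%")
```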
Repeating the same exercise with Table 12 yields an expected (from Table 13 and column 1 of Table 12) increase in marginal cost of about 1%, compared to the 2.51% coming from column 4 of Table 12. These findings are somewhat supportive of the idea that marginal costs are higher in denser areas compared with what they would be if quantities sold were the same, i.e., that products sold by firms located in denser areas cost more to produce and are of a higher actual quality.

Notes: * p < 0.1; ** p < 0.05; *** p < 0.01. Standard errors are clustered at the ZE level. We use exports data provided by the French customs. We first match exports data over the period 2008-2014 to the relevant sample data (FARE, Prodcom, SP+MP and SP) and so discard multi-ZE firms from the analysis. We further eliminate observations with missing prices and trim the data based on the top and bottom 1% of the distribution of the demeaned (by HS 8-digit product-country-year) log prices. We also apply a trimming based on the top 3% of the value of exports by ZE. We then use firm-product-country-year log quantity, log revenue and log price as y variables and regress them on the log density of the firm's location along with product-destination-year dummies, using the Stata command areg.
The second and more substantial piece of evidence to support the claim that products of firms located in denser regions are of higher perceived/actual quality comes from exports data. Exports represent a substantial portion of French manufacturing firms sales. For example, the overall 2015 goods exports to manufacturing production ratio was 0.7727 while using the sum of manufacturing production and goods imports as denominator delivers a ratio of 0.4187 for 2015. We thus match firm-product-country-year level data on French exporters over the period 2008-2014 30 to our samples and compare quantities, revenues and prices of the same product sold in the same destination and year by firms located in more or less dense areas. We do so for all of the four firm samples we consider in our analysis and overall find a consistent message provided in Table 14. More specifically, in Table 14 we regress log export quantity, log export revenue and log export price (unit value) on the log density of the location of the firm along with product-destination-year dummies and using revenue weights. Evidence across samples is consistently supportive of products coming from denser areas being sold in higher quantities, despite higher prices, in the same market.
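A regression on log density with a full set of product-destination-year dummies, as run in Table 14, can equivalently be computed by demeaning within cells (the Frisch-Waugh-Lovell theorem). A sketch in Python with hypothetical column names and simulated data (the paper itself uses the Stata command areg):

```python
import numpy as np
import pandas as pd

def density_coefficient(df, y):
    """Coefficient of log density in a regression of y on log density plus
    product-destination-year dummies, computed by demeaning both variables
    within cells (Frisch-Waugh-Lovell). Column names are placeholders."""
    cells = ["product", "destination", "year"]
    d = df[[y, "log_density"] + cells].copy()
    for col in (y, "log_density"):
        d[col] = d[col] - d.groupby(cells)[col].transform("mean")
    return float((d[y] * d["log_density"]).sum() / (d["log_density"] ** 2).sum())

# Simulated check: y depends on density (slope 0.05) plus a cell-level effect
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "product": rng.choice(["a", "b"], n),
    "destination": rng.choice(["DE", "US"], n),
    "year": rng.choice([2008, 2009], n),
    "log_density": rng.normal(size=n),
})
df["log_export_price"] = 0.05 * df["log_density"] + df["year"].map({2008: 1.0, 2009: 2.0})
assert abs(density_coefficient(df, "log_export_price") - 0.05) < 1e-8
```

The demeaning absorbs anything constant within a product-destination-year cell, so the coefficient compares firms selling the same product in the same market and year, as in the text.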
Considering all of the above evidence we draw Result 4: Marginal costs are higher and markups are lower in denser areas. At the same time, marginal costs are higher in denser areas also because of a higher product quality.

On the revenue productivity advantage of denser areas: it is all about demand
Tables 15 and 16 provide additional insights into the productivity advantage of denser areas by exploiting further measures obtained from the MULAMA model. In particular, columns (1) to (3) report MULAMA revenue TFP (TFP-R), quantity TFP (TFP) and the log price. For an individual firm, revenue TFP is quantity TFP plus the log price. In our aggregate regressions, because of the way we constructed variables and the properties of linear estimators, the sum of the density coefficients of quantity TFP and log price equals the density coefficient of revenue TFP. Results in Tables 15 and 16 point in the same direction, with findings referring to revenue weights being stronger in magnitude, as in the rest of the analysis, and allow us to establish a further important result: Result 5: The revenue productivity advantage of denser areas is driven by higher prices, with no overall significant differences in quantity TFP.
The picture emerging from combining Results 3 to 5 can be summarized as follows. Manufacturing firms located in denser areas are not necessarily characterized by a significantly higher quantity TFP. They do, however, enjoy a revenue TFP advantage due to their capacity to produce and sell higher-demand products at higher prices and in larger quantities compared to firms located in less dense areas. Furthermore, their products are characterized by lower markups and higher marginal costs, and part of these higher costs reflects an actually higher product quality.
Additional insights are provided in columns (5) and (7) of Tables 15 and 16. More specifically, looking at the density coefficients related to the log revenue function intercept λ̂ and slope 1/µ reveals that only the latter is significantly and positively related to density, suggesting that firms in denser areas face a higher revenue function, i.e., a higher demand curve, mainly because of a higher slope. Finally, columns (4) to (6) provide results of the revenue TFP decomposition of equation (13), with the density coefficients of columns (4) to (6) adding up to the density coefficient of revenue TFP in column (1). We already discussed that λ is not significantly increasing with density, and column (4) points to a similar result for markups-adjusted TFP ã. It is markups-adjusted scale q̃_it = ((1 − µ_it)/µ_it) q_it that is significantly higher in denser areas, because firms there sell higher quantities and use more inputs, and so operate at a larger scale, coupled with a higher revenue function slope. 31

Notes: * p < 0.1; ** p < 0.05; *** p < 0.01. Instruments for log density are 1831, 1861 and 1891 log density. Robust standard errors reported. The 'LM stat' is an LM test statistic for under-identification and 'Under-identif. p-value' is the corresponding p-value. 'Wald F stat' is the first-stage F test statistic corresponding to the excluded instruments and is a test statistic for weak identification. See Stock and Yogo (2005). Starting from the sample of SP and MP firms, firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Each firm-product-year observation is weighted by R_ipt/R_r, where R_ipt is firm i's revenue corresponding to product p at time t and R_r is the sum of R_ipt across the firm-product-year observations corresponding to the ZE r.

Notes: Same as the previous notes, except that each firm-product-year observation is weighted by 1/N_r, where N_r is the total number of firm-product-year observations corresponding to the ZE r.
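The algebra behind the markups-adjusted scale term is a one-line identity; a quick numerical check (illustrative values only):

```python
# Markups-adjusted scale from the text: q~_it = ((1 - mu)/mu) * q_it.
# Footnote 31 notes that (1 - mu)/mu = 1/mu - 1, so a higher revenue
# function slope 1/mu implies a larger markups-adjusted scale for given q.
def markup_adjusted_scale(q, mu):
    return (1.0 - mu) / mu * q

for mu in (0.5, 0.8, 1.25):
    assert abs((1.0 - mu) / mu - (1.0 / mu - 1.0)) < 1e-12

# Example: with mu = 0.8 and q = 10, markups-adjusted scale is 2.5
assert abs(markup_adjusted_scale(10.0, 0.8) - 2.5) < 1e-12
```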

Two examples
Results 1 to 5 refer to the aggregate of manufacturing products. Therefore, it might well be the case that, for some specific products, there is a positive and significant relationship between TFP and density. In this respect, we provide here one such example: 'Ready-mixed concrete' (NACE code 2363). 32 Indeed, this particular industry/product has been the object of a number of studies 33 also suggesting that there are significant differences in TFP across space. At the same time, we also provide an example, among many others, of a particular industry ('Manufacture of other parts and accessories for motor vehicles'; NACE code 293) behaving like the aggregate of manufacturing products.

31 Note that (1 − µ_it)/µ_it = 1/µ_it − 1, and so the higher the revenue function slope 1/µ_it, the higher is markups-adjusted scale.

Notes: See Stock and Yogo (2005). Focusing on the sample of SP and MP firms producing products belonging to the 'Ready-mixed concrete' industry (NACE code 2363), firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Each firm-product-year observation is weighted by R_ipt/R_r, where R_ipt is firm i's revenue corresponding to product p at time t and R_r is the sum of R_ipt across the firm-product-year observations (belonging to the 'Ready-mixed concrete' industry) corresponding to the ZE r.
Tables 17 and 18 provide the same type of information contained in Tables 12 and 15, but refer to the sub-sample of firm-product-year observations corresponding to the production of 'Ready-mixed concrete'. 34 At the same time, Tables 19 and 20 refer to the sub-sample of firm-product-year observations corresponding to the production of 'Manufacture of other parts and accessories for motor vehicles'. 35 Table 17 indicates that, within the 'Ready-mixed concrete' sample, firms located in denser areas sell higher quantities and generate higher revenues but do not charge significantly higher or lower prices, while having overall similar marginal costs and markups with respect to firms located in less dense areas. Furthermore, Table 18 reveals that 'Ready-mixed concrete' firms located in denser areas are characterized by a higher revenue TFP and that this is entirely driven by a higher TFP. At the same time, Table 19 indicates that, within the 'Manufacture of other parts and accessories for motor vehicles' sample, firms located in denser areas sell higher quantities and generate higher revenues while charging significantly higher prices and having higher marginal costs and lower markups than firms located in less dense areas. Table 20 further shows that 'Manufacture of other parts and accessories for motor vehicles' firms located in denser areas are characterized by a higher revenue TFP and that this is entirely driven by higher prices.

32 'Ready-mixed concrete' corresponds to a unique 8-digit Prodcom code.

33 See, for example, Syverson (2004) and Syverson (2008).

34 There are 726 firm-product-year observations corresponding to 'Ready-mixed concrete', distributed across 123 ZEs.

35 There are 2,036 firm-product-year observations corresponding to 'Manufacture of other parts and accessories for motor vehicles', distributed across 184 ZEs.
Notes: See Stock and Yogo (2005). Focusing on the sample of SP and MP firms producing products belonging to the 'Ready-mixed concrete' industry (NACE code 2363), firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Each firm-product-year observation is weighted by R_ipt/R_r, where R_ipt is firm i's revenue corresponding to product p at time t and R_r is the sum of R_ipt across the firm-product-year observations (belonging to the 'Ready-mixed concrete' industry) corresponding to the ZE r.

Notes: See Stock and Yogo (2005). Focusing on the sample of SP and MP firms producing products belonging to the 'Manufacture of other parts and accessories for motor vehicles' industry (NACE code 293), firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Each firm-product-year observation is weighted by R_ipt/R_r, where R_ipt is firm i's revenue corresponding to product p at time t and R_r is the sum of R_ipt across the firm-product-year observations (belonging to the 'Manufacture of other parts and accessories for motor vehicles' industry) corresponding to the ZE r.

Robustness checks
Results 1, 2 and 3 do not depend on the MULAMA model's assumptions and limitations: they are either shown to be consistent across several methodologies (Results 1 and 2) or come straight from the raw data (Result 3), while holding across several samples and weighting approaches. Results 4 and 5 are instead more reliant on the MULAMA model, and in this Section we provide a number of additional results showing that they are little affected by whether we use the DGKP or the FMMM estimation procedure; by whether we use the single-product firms sample or the larger sample of single-product and multi-product firms; by whether we employ the number of full-time equivalent employees or the total wage bill to measure the labour input; by whether we use firm revenue or the firm wage bill to weight observations; by whether or not we include the Paris area (more specifically, the Île-de-France region); and by whether we use a Cobb-Douglas or a Translog production function. 36

FMMM estimation procedure. Two shortcomings of the DGKP procedure are related to its implicit assumptions and the amount of identifying variation. More specifically, the existence and invertibility of a suitable conditional input demand for materials implies making implicit assumptions about demand and market structure that are not readily verifiable. Furthermore, in the estimation procedure described in DGKP, firm market share (de facto firm revenue) and price in t − 1 are, among other things, added as covariates in a regression where quantity at time t is on the left-hand side. Therefore, there might be little variation left to precisely identify technology parameters.
In an attempt to address these two issues, FMMM develop an alternative estimation methodology that does not rely on the proxy variable approach. More specifically, FMMM use both the first-order approximation of the log revenue function (4) and the production function equation to recover technology parameters. Indeed, FMMM are sufficiently explicit about demand to be able to write the log revenue function in terms of observables and heterogeneities, and they use both this and the production function equation to estimate technology parameters. The key disadvantage of this methodology is that one has to be explicit about the process governing the evolution of product appeal λ_it; in particular we, like FMMM, assume it follows an AR(1) process.
Tables D-6 to D-9 in Appendix D provide supporting evidence of Results 4 and 5 obtained using the FMMM procedure.
Single-product firms. The key advantage of using multi-product firms is coverage. Multi-product firms are large and account for the lion's share of manufacturing production. However, their technology needs to be inferred from information on single-product firms (the production function is actually estimated using data on single-product firms only), and assumptions need to be made about how to split inputs across the different products of a multi-product firm.
In order to side-step these limitations, Tables D-10 to D-13 in Appendix D report results referring to the smaller sample of single-product firms. Again, the evidence is in line with Results 4 and 5.
Using the firm wage bill to measure the labour input. Some spatial productivity studies use the firm wage bill instead of the number of full-time workers to measure the labour input, on the grounds that this controls in some way for workers' ability. However, our aim is not to establish what share of the productivity advantage of denser areas is related to workers' skills and abilities (possibly due to the sorting of better workers across space), but rather to establish how much of the observed revenue-based productivity advantage of firms located in denser areas is due to actual TFP differences as opposed to demand and markup differences. In this light, we prefer to use a measure of the labour input that allows our firm-level revenue TFP and quantity TFP to incorporate differences in workers' skills and abilities across locations. Furthermore, as discussed in Section 4, using the number of full-time equivalent employees allows us to more clearly establish whether products sold by firms located in denser locations actually require more inputs to be produced, as opposed to more expensive inputs.
We nevertheless provide evidence in Tables D-14 to D-17 in Appendix D that Results 4 and 5 are qualitatively, and to a large extent also quantitatively, unaffected by using the wage bill to measure the labour input.
Using the firm wage bill to weight observations. Using firm revenue to weight observations is simple and straightforward. However, given that firms generating a similar revenue might generate a very different value added over inputs, statistical offices often prefer other approaches when aggregating firm-level data. The most common approach is to consider either the number of employees or the wage bill. In Tables D-18 and D-19 in Appendix D we use the firm wage bill instead of firm revenue to weight observations, and in doing so we confirm Results 4 and 5.
Eliminating Paris. When considering the spatial distribution of economic activities and/or regional differences in productivity and wages in France, the elephant in the room is the Paris area. To check whether or not our findings are driven by some particular pattern arising in the Paris area, we provide in Tables D-20 to D-23 in Appendix D results obtained after eliminating firms located in the Île-de-France region. Again, the findings are strongly supportive of Results 4 and 5.
Translog production function. The Cobb-Douglas production function is widely used in productivity analyses, including the spatial productivity investigation of Combes et al. (2012). However, the Translog production function is more general, albeit more demanding in terms of the number of parameters to estimate and the degree of analytical complication. In Tables D-24 and D-25 in Appendix D we provide results obtained employing a Translog production function, while using product revenue shares to assign inputs to the different products of a multi-product firm. Reassuringly, Results 4 and 5 again find strong support.
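The revenue-share rule for assigning inputs to the products of a multi-product firm can be sketched as follows (a hypothetical minimal illustration, not the paper's estimation code):

```python
def split_inputs_by_revenue_share(product_revenues, firm_input):
    """Allocate a firm-level input total (e.g. labour or materials) to
    products in proportion to each product's share of firm revenue."""
    total = sum(product_revenues.values())
    return {p: firm_input * r / total for p, r in product_revenues.items()}

# Example: a two-product firm with 100 units of materials and revenues 300/100
shares = split_inputs_by_revenue_share({"p1": 300.0, "p2": 100.0}, 100.0)
# → {'p1': 75.0, 'p2': 25.0}
```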

Conclusions
We make use of detailed quantity, price and revenue data on products produced by French manufacturing firms and, building upon FMMM, we quantify heterogeneity in TFP, demand and markups across firms, while further providing an exact decomposition of revenue TFP. We measure these heterogeneities at the firm level and subsequently aggregate them at the regional level to analyze differences in TFP, demand and markups across space. We find a number of robust results providing fresh insights into agglomeration economies, with implications for both economic theory and regional policy.
For example, the current policy approach is based on the presumption that firms in lagging regions are characterized by a lower TFP, and so interventions are directed towards increasing their technical efficiency. In this respect, our evidence suggests that interventions should rather promote firms' product quality and marketing capabilities in order to increase revenue TFP in lagging regions. Furthermore, our findings suggest that achieving regional convergence has a lot to do with increasing the relative size of the most productive firms in lagging regions, a process which might be hindered there more than in other regions by factors like input misallocation.
On a concluding note, while our analysis provides a number of fresh insights on agglomeration economies, it does not address the old question of which micro-channels generate the observed advantages of denser areas and how important they are individually. However, our analysis does suggest that micro-channels related to product quality and demand are key to understanding differences in revenue TFP across space, while at the same time highlighting the importance of the largely understudied links between firm revenue TFP, firm size and density in generating aggregate regional-level outcomes. In terms of avenues for future research, we believe the analysis could be fruitfully pushed forward by exploring, along the lines of Combes et al. (2012), if and how much the distribution of each component of revenue TFP is subject to left-truncation (as a measure of the importance of selection) and/or right-shifting and dilation (as a measure of the importance of agglomeration economies).
We index firms by i and time by t. In what follows we consider a Cobb-Douglas production technology with three production factors: labour (L), materials (M) and capital (K). In line with the existing literature, we assume capital to be a dynamic input that is predetermined in the short run, i.e., current capital has been chosen in the past and cannot immediately adjust to current-period shocks. 37 We further assume, as is standard in the literature, that materials are a variable input free of adjustment costs. Concerning labour, we could assume it is a variable input free of adjustment costs, or that it is, very much like capital, predetermined in the short run as in DGKP, or, following Ackerberg et al. (2015), that it is a semi-flexible input. 38 In light of the features of the French labour market, we opt for the predetermined case.
We further assume firms are single-product (relaxing this assumption in Appendix C) and minimize costs while taking the price of materials W^M_it, which is allowed to be firm-time specific, as given. Consequently, at any given point in time, each firm i is dealing with the following short-run cost minimization problem: 39

min_{M_it} W^M_it M_it   subject to   Q_it = A_it L_it^α_L M_it^α_M K_it^α_K,

where A_it is quantity TFP, which is observable to the firm (and influences its choices) but not to the econometrician. In what follows we refer to the Cobb-Douglas production technology as the quantity equation and denote with lower case the log of a variable (for example, a_it

37 As described in Ackerberg et al. (2015), capital is often assumed to be a dynamic input subject to an investment process, with the period t capital stock of the firm actually determined at period t − 1. Intuitively, the restriction behind this assumption is that it takes a full period for new capital to be ordered, delivered, and installed.

38 More precisely, in the semi-flexible case L_it is chosen by firm i at time t − b (0 < b < 1), after K_it has been chosen at t − 1 but prior to M_it being chosen at t. In this case, one should expect L_it to be correlated with productivity shocks in t. Yet labour would not adjust fully to such shocks as materials do. The choice between predetermined and semi-flexible for L_it does not change the structure of the model and estimation procedure we provide below but only affects the set of moments used in the estimation. We highlight any differences later on.

39 To simplify notation we ignore components that are constant across firms in a given time period, as they will be controlled for by suitable dummies.

denotes the natural logarithm of A_it). The quantity equation can thus be written as:

q_it = α_L l_it + α_M m_it + α_K k_it + a_it. (A-1)

First order conditions to the firm's cost minimization problem imply that:

W^M_it = χ_it α_M Q_it / M_it, (A-2)

where χ_it is a Lagrange multiplier. 40 We can thus write the short-run cost function as:

C_it = W^M_it M_it = W^M_it (Q_it / (A_it L_it^α_L K_it^α_K))^{1/α_M}. (A-3)

Marginal cost thus satisfies the following property:

∂C_it/∂Q_it = W^M_it M_it / (α_M Q_it). (A-4)

By combining equations (A-2), (A-3) and (A-4) one obtains the result provided in Section 3.2 that the markup can be computed as the ratio of the output elasticity of materials to the share of materials' expenditure in revenue:

µ_it = α_M / s^M_it, with s^M_it ≡ W^M_it M_it / R_it. (A-5)

Moving to the time process of quantity TFP a_it, we assume, as is standard, that it can be characterized by a Markov process, and in particular we consider the leading AR(1) case. More specifically, we assume:

a_it = φ_a a_it−1 + G_ar + ν_ait, (A-6)

where G_ar represents geographical factors affecting productivity (like the density of economic activities), and ν_ait stands for productivity shocks that are iid and represent innovations with respect to the information set of the firm in t − 1.
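The markup result derived above, the output elasticity of materials divided by the share of materials' expenditure in revenue, is directly computable from observables once the elasticity is estimated. A quick numerical illustration (all values made up):

```python
def markup(alpha_M, materials_expenditure, revenue):
    """Markup as the output elasticity of materials over the share of
    materials' expenditure in revenue, as in the text."""
    s_M = materials_expenditure / revenue
    return alpha_M / s_M

# Illustrative: elasticity 0.5, materials cost 40 against revenue 100,
# so s_M = 0.4 and the implied markup is 0.5 / 0.4 = 1.25
assert abs(markup(0.5, 40.0, 100.0) - 1.25) < 1e-12
```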

A.2 The DGKP estimation procedure
From the above equations, the optimal expenditure on materials (A-3) is a function of labour, capital, the unit cost of materials and quantity TFP (which are known and given to the firm in t) as well as of the optimal quantity produced. The latter is obtained by equalizing marginal cost and marginal revenue and will thus depend upon the same four variables (labour, capital, the unit cost of materials and quantity TFP) plus factors characterizing the specific demand facing firm i. DGKP suggest proxying for the unobservable unit cost of materials and firm-specific demand factors with the observable price and market share of firm i as well as regional variables G_r. Operationally, they thus assume the conditional (log) input demand for materials can be expressed as a function h(.) of k_it, l_it, a_it, p_it, G_r, and the market share MS_it. If h(.) is globally invertible with respect to a_it, the inverse function a_it = g(k_it, l_it, m_it, p_it, MS_it, G_r) exists and is well behaved, and so one can use a semi-parametric polynomial approximation of g(.) in order to proxy for the unobservable (to the econometrician) quantity TFP a_it. Operationally, we use a second-order polynomial in the arguments of g(.) to proxy this function. Labeling this polynomial Poly_it, we thus have a_it = Poly_it.
Using a_it−1 = Poly_it−1 in (A-6) we have:

a_it = φ_a Poly_it−1 + G_ar + ν_ait,

while substituting this into the production function one gets:

q_it = α_L l_it + α_M m_it + α_K k_it + φ_a Poly_it−1 + G_ar + ν_ait. (A-7)

Note that in (A-7) one does not need to identify the parameter φ_a nor separately identify G_ar from the G_r contained in Poly_it−1. Therefore, one can write:

q_it = α_L l_it + α_M m_it + α_K k_it + Poly_it−1 + ν_ait, (A-8)

where Poly_it−1 is simply a second-order polynomial in k_it−1, l_it−1, m_it−1, p_it−1, G_r and MS_it−1. 41 Given the assumption that productivity shocks ν_ait are innovations with respect to the information set of the firm in t − 1, ν_ait is uncorrelated with Poly_it−1 in (A-8). Furthermore, labour and capital are predetermined and so uncorrelated with ν_ait too. Therefore, the only endogenous variable in (A-8) is materials m_it, and the parameters can be estimated by exploiting additional moment conditions. More specifically, we use materials, labour and capital at time t − 2 as instruments for materials in t. This ultimately allows us to get estimates of the production function parameters α̂_L, α̂_M and α̂_K as well as of productivity â_it. Equations (6) and (9) can then be used to recover the firm-specific markup and marginal cost, while (2) delivers demand heterogeneity λ̂_it. We perform estimations of (A-8) separately for each two-digit industry (NACE Sections) and consider a full battery of 8-digit product dummies, as well as year dummies.
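The estimation step just described, instrumenting materials at time t with inputs at t − 2, is a standard two-stage least squares problem. A generic sketch with simulated data (not the paper's code or data; the instrument-relevance structure below is made up):

```python
import numpy as np

def two_sls(y, X, Z):
    """Two-stage least squares: project X on the instrument set Z, then
    regress y on the projected X. Z must contain the exogenous columns
    of X plus the excluded instruments."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
    return np.linalg.solve(Xhat.T @ X, Xhat.T @ y)   # second-stage coefficients

# Simulated check: x is endogenous (correlated with the error u),
# z is a valid instrument (relevant and uncorrelated with u)
rng = np.random.default_rng(2)
n = 200_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), x])
beta = two_sls(y, X, Z)  # beta[1] should be close to 2; plain OLS would be biased
```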

A.3 The FMMM estimation procedure
As in FMMM, we assume that product appeal follows an AR(1) process; in particular:

λ_it = φ_λ λ_it−1 + G_λr + ν_λit,

where G_λr represents geographical factors affecting demand (like the density of economic activities) and ν_λit stands for product appeal shocks that are iid and represent innovations with respect to the information set of the firm in t − 1. Furthermore, we make use of the result that the log revenue function can be approximated (up to a constant across firms that will be controlled for by using suitable dummies) by a linear function of quantity and product appeal, and, to avoid burdening notation, we use '=' instead of '≈':

r_it = λ_it + (1/µ_it) q_it. (A-10)

We label (A-10) the revenue equation. This estimation procedure builds upon (A-10) and uses both the revenue and quantity equations to estimate technology parameters. The two-step procedure described below is not the only one that can be used to recover technology parameters under our set of assumptions, but it has the advantage of being simple to implement and linear. In what follows, it is convenient to rewrite the Cobb-Douglas production function in terms of a returns-to-scale parameter γ; substituting q_it with the Cobb-Douglas formula then allows transforming (A-10) further.

As for G_ar, it is, under our assumptions, uncorrelated with ν_ait. Operationally, we perform estimations of (A-20) separately for each two-digit industry (NACE Sections) and consider a full battery of 8-digit product dummies, as well as year dummies. IV estimation of (A-20) provides an estimate of γ that, together with γ/α_M and α_L/α_M coming from the first-stage revenue equation, uniquely delivers the production function parameters (α̂_L, α̂_M and γ̂) as well as productivity â_it. Equations (6) and (9) can then be used to recover the firm-specific markup and marginal cost, while (2) delivers demand heterogeneity λ̂_it.

B Measurement error in output and unanticipated shocks
As is customary in productivity analyses, an issue to account for before proceeding to any estimation of the production function is the presence of measurement error in output and/or unanticipated productivity shocks. In the former case, instead of q_it the econometrician might be observing q̃_it = q_it + e_it, where e_it is standard measurement error. Another interpretation of the same equation is that e_it represents productivity shocks unanticipated by the firm. (A-1) thus becomes:

q̃_it = a_it + α_L l_it + α_M m_it + α_K k_it + e_it.

The approach suggested by the literature (Ackerberg et al., 2015; De Loecker et al., 2016) to deal with measurement error in output and/or unanticipated shocks e_it is based on the proxy variable framework and a semi-parametric implementation. We follow this approach and, building on the same logic as equation (19) in DGKP, we estimate:

q̃_it = poly(l_it, m_it, p_it, k_it) + e_it,   (B-1)

where q̃_it is (log) quantity as reported in the data and poly(.) is a third-order polynomial in l_it, m_it, p_it and k_it. 42 We run (B-1) separately for each two-digit industry while adding the log of the 2009 population and the log of the land area of region r, as well as a full set of 8-digit product dummies and year dummies, to (B-1). We then use the OLS prediction of q_it, which we label q̂^OLS_it, as quantity in both the DGKP and FMMM procedures. 42 The logic behind using (B-1) to purge quantity of measurement error and unanticipated shocks is quite simple. From the quantity equation (A-1), q_it is a function of l_it, m_it, k_it and a_it. Using prices p_it as a proxy for a_it, while assuming invertibility, one can then write a_it as a function of l_it, m_it, p_it and k_it. Overall, q_it is thus a function of l_it, m_it, p_it and k_it that can be semi-parametrically approximated by a polynomial function. Crucially, measurement error and/or unanticipated shocks do not influence a firm's choices, and so they are not part of the polynomial approximation but rather the residual of equation (B-1).
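The purge in (B-1) amounts to an OLS projection on a third-order polynomial, with the fitted values used as purged quantity. A self-contained sketch (omitting, for brevity, the regional controls and the product and year dummies used in the paper):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree=3):
    """Full polynomial basis (including interactions) up to the given degree."""
    n, k = X.shape
    cols = [np.ones(n)]  # constant term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(k), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def purge_quantity(q_obs, l, m, p, k):
    """OLS of observed log quantity on poly(l, m, p, k); the fitted values
    strip out measurement error / unanticipated shocks, as in (B-1)."""
    Z = poly_features(np.column_stack([l, m, p, k]))
    beta, *_ = np.linalg.lstsq(Z, q_obs, rcond=None)
    return Z @ beta
```

The same two functions apply unchanged to revenue, delivering the analogue of (B-2).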

We also use the same approach for revenue and consider:

r̃_it = poly(l_it, m_it, p_it, k_it) + ē_it,   (B-2)

where ē_it now contains measurement error in both quantity and prices, as well as unobserved productivity shocks, and use the OLS prediction of r_it, which we label r̂^OLS_it, as revenue in both the DGKP and FMMM procedures. Again, we run (B-2) separately for each two-digit industry while adding the log of the 2009 population and the log of the land area of region r, as well as a full set of 8-digit product dummies and year dummies, to (B-2). Also note that, as suggested in DGKP, by purging revenue of measurement error and using r̂^OLS_it instead of r_it, we obtain a more reliable measure of the share of materials in revenue (s^M_it) that is needed to compute markups.

C Multi-product firms
Produced quantities and generated revenues may be observable for the different products of each firm in databases like ours. However, information on inputs used for a specific product is typically not available. We report here an extension of the MULAMA model from FMMM to solve the problem of assigning inputs to outputs for multi-product firms.
As usual we denote a firm by i and time by t. A firm i produces in t one or more products indexed by p, and the number of products produced by the firm is denoted by I_it. In our data p is an 8-digit Prodcom product code, but in other data, like the bar-code data used in Hottman et al. (2016), it can be much more detailed. We assume product appeal is firm-time specific (λ_it), while we allow markups (µ_ipt) and productivity (a_ipt) to be firm-product-time specific. The production function for product p produced by firm i is given by:

q_ipt = a_ipt + α_Lg l_ipt + α_Mg m_ipt + (γ_g − α_Lg − α_Mg) k_ipt + C_p + C_t,   (C-1)

where C_p and C_t are innocuous product and time constants (that will be controlled for by suitable dummies) that we disregard in what follows, and g identifies a product group/industry. Production function coefficients are the same for products within a product group because a certain level of data aggregation is needed to deliver enough observations to estimate the parameters. (C-1) means we allow the technology (α_Lg, α_Mg, γ_g) to differ across the different products p produced by a multi-product firm. At the same time, productivity is allowed to vary across products within a firm, and information coming from single-product firms needs to be used to infer the technology of multi-product firms, i.e., we rule out physical synergies in production but allow for some of the economies (or diseconomies) of scope discussed in DGKP. Furthermore,

we assume firm i maximizes profits and chooses (for each product p) the amount of labour L_ipt and materials M_ipt in order to minimize short-term costs, while taking capital K_ipt, as well as productivity a_ipt and product appeal λ_it, as given. We make use of (4), and in particular of its firm-product analogue, which we label (C-2). Profit maximization implies:

P_ipt = µ_ipt MC_ipt,   (C-3)

so that we can, starting from data on prices and markups, recover marginal costs. 43 Firms minimize costs, and so markups are such that:

µ_ipt = α_Mg / s^M_ipt,   (C-5)

where s^M_ipt is the expenditure share of materials for product p at time t in firm revenue for product p at time t.
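Assuming the standard flexible-input first-order condition above (markup equal to the materials output elasticity over the materials revenue share, with marginal cost then recovered from price and markup), the two steps can be sketched as:

```python
def markup_and_marginal_cost(alpha_mg, s_m, price):
    """Markup from the flexible-input FOC, mu = alpha_Mg / s^M, and
    marginal cost from price = markup * marginal cost."""
    mu = alpha_mg / s_m
    mc = price / mu
    return mu, mc

# Example: alpha_Mg = 0.5, materials revenue share 0.4, price 10
mu, mc = markup_and_marginal_cost(0.5, 0.4, 10.0)
# -> mu = 1.25, mc = 8.0
```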
As far as single-product firms are concerned, the DGKP procedure or the FMMM procedure described in Appendix A can be used to recover quantity TFP, markups, marginal costs and demand heterogeneity. Turning to multi-product firms, we impose, as in DGKP, that the technology parameters estimated on single-product producers extend to the products of multi-product firms. Yet, in order to quantify multi-product firms' productivity, markups, marginal costs and demand heterogeneity, we still need to solve the issue of how to assign inputs to outputs, and we do so by building on the above assumptions and the parameters estimated for single-product firms. As far as materials are concerned, we need to assign the observable total firm material expenditure M_it across the I_it products produced by firm i at time t, i.e., we need to assign values to M_ipt such that Σ_{p=1}^{I_it} M_ipt = M_it. We can use this condition along with (C-5) and (C-2) to operate this assignment. Substituting (C-5) into (C-2) and adding Σ_{p=1}^{I_it} M_ipt = M_it provides a system of I_it + 1 equations in I_it + 1 unknowns: the I_it input expenditures M_ipt plus λ_it. Indeed, at this stage we have data on r_ipt, q_ipt, α_Mg and M_it. Operationally, one can actually proceed in two stages: combining the above equations one can first recover λ_it and then obtain each materials expenditure M_ipt as a function of r_ipt, q_ipt, α_Mg and λ_it. Having recovered the input expenditures M_ipt, we subsequently compute materials expenditure shares in revenue s^M_ipt and so use (C-5) to recover a firm-product-time specific markup µ_ipt, as well as the marginal cost from (C-3). Since labour is a variable input, a condition analogous to (C-5) holds for this input, and so we use the computed markups µ_ipt and information on α_Lg to derive labour expenditure: L_ipt = α_Lg R_ipt / µ_ipt. Operationally, this is not guaranteed to satisfy the constraint Σ_{p=1}^{I_it} L_ipt = L_it for each firm, and so the L_ipt are re-scaled for each firm.
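The labour assignment step (FOC-implied expenditures α_Lg R_ipt / µ_ipt, then a within-firm rescaling so that product-level labour sums to the observed firm total) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def allocate_labour(alpha_lg, revenue, markup, labour_total):
    """FOC-implied labour expenditure per product, rescaled so the
    product-level values sum to the observed firm-level total."""
    L = alpha_lg * np.asarray(revenue) / np.asarray(markup)
    return L * (labour_total / L.sum())

# Two products: R = (100, 200), common markup 1.25, alpha_Lg = 0.3.
# Unconstrained values (24, 48) are rescaled to match L_it = 60 -> (20, 40).
L = allocate_labour(0.3, [100.0, 200.0], [1.25, 1.25], 60.0)
```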
The above procedure allows us to obtain markups, marginal costs and product appeal/demand heterogeneity, as well as information on labour and materials use, for each of the products of a multi-product firm. However, in order to recover productivity a_ipt we still need values for capital K_ipt. To do this one can proceed as follows. Combining the marginal cost, profit maximization and quantity equations one gets an expression for K_ipt, which we label (C-6), where α_Kg = γ_g − α_Lg − α_Mg is the capital coefficient. We further refine those values by running an estimation where the computed K_ipt from (C-6) is regressed on R_ipt, M_ipt and L_ipt, as well as total firm expenditure on materials and labour plus the capital stock, and a full battery of year and product dummies. The predicted values of this regression are then re-scaled for each firm to meet the constraint Σ_{p=1}^{I_it} K_ipt = K_it.
[Appendix tables omitted. Notes common to the tables: * p < 0.1; ** p < 0.05; *** p < 0.01. Standard errors are clustered at the ZE level (robust standard errors for the IV specifications). Regressions include year dummies together with industry (2- or 6-digit) or product (2- or 8-digit) dummies, as indicated. The Fare sample includes firms with complete balance sheet data in NACE 2 industries 10-32 that remain after an initial cleaning of the data; the Prodcom sample is the subset of such firms present in the Prodcom dataset; SP and MP denote single-product and multi-product firms in the Prodcom sample after further data cleaning. In the IV specifications, the instruments for log density are 1831, 1861 and 1891 log density; the 'LM stat' is an LM test statistic for under-identification, with 'Under-identif. p-value' the corresponding p-value, and the 'Wald F stat' is the first-stage F statistic for the excluded instruments, a test for weak identification (Stock and Yogo, 2005). Firm-product-year variables are aggregated at the ZE level after demeaning by 8-digit product and year. Depending on the table, each observation is weighted by its revenue share R_ipt/R_r (or R_it/R_r at the firm-year level), its wage-bill share W_ipt/W_r, or equally by 1/N_r within its ZE, where R_r, W_r and N_r are the ZE-level totals.]