Geographic Distribution of Firm Productivity and Production: A Market Access Approach


Yuxiao Huang and Wentao Xiong
Department of Economics, Harvard University
Nov. 1, 2017

Abstract

This paper studies the distribution of firm productivity and production in a network of geographic locations. Using firm-level data from China's manufacturing sector from 1998 to 2007, we document that, on average, firms' geographic locations explain approximately 14% of the productivity dispersion, yet the explanatory power of locations varies significantly across industries. These empirical patterns motivate us to develop two market access measures in a firm heterogeneity and trade framework to illustrate how changes in inter-location trade costs affect the selection of firms in each location and shape the distribution of firm productivity and production across locations. A decrease in a location's trade cost with other locations improves its (1) consumer market access (CMA), a measure of import competition that local firms face from rival firms elsewhere; and (2) firm market access (FMA), a measure of the size of the export market for local firms. To distinguish between these two competing effects empirically, we use the reduction in road travel time due to massive nationwide highway construction as a source of variation in trade costs between Chinese prefectures. Reduced-form evidence supports the theoretical predictions that better CMA is associated with higher local average productivity and smaller output of individual local firms, while better FMA is associated with lower average productivity and larger firm output. Counterfactual analysis suggests that, for the manufacturing sector in aggregate, the expansion of highway networks accounted for 24% of the observed rise in productivity level, 40% of the decline in productivity dispersion, and 16% of the increase in output, but these effects differ substantially across industries and locations.
JEL classification: F15, O18, R12.
Keywords: productivity distribution, geographic locations, market access.

Contact: wxiong@fas.harvard.edu. Wentao Xiong is deeply indebted to his advisers Shawn Cole, Edward Glaeser, and Elhanan Helpman for their invaluable advice and continued support. We would like to thank Pol Antras, Jie Bai, Marco Pagano, Kirill Borusyak, Raj Chetty, Alonso De Gortari, Xiang Ding, Richard Hornbeck, Andrew Garin, William Kerr, Michael Kremer, Marc Melitz, Nathan Nunn, Martin Rotemberg, and participants at the Harvard economics department's international, public, development, and informal practice seminars for many helpful suggestions. Errors are solely our own.

1 Introduction

This paper studies the distribution of firm productivity and production in the network of geographic locations in an economy. It is well documented that there exists substantial dispersion of firm productivity within narrowly defined industries (see Bernard et al. (2007, 2012) for reviews). While there have been attempts to investigate the causes underlying such dispersion, research to date is far from exhaustive. Meanwhile, a vast literature in economic geography explores the spatial distribution of production activities, and frequently links firms' location choices to their productivity (see Rosenthal & Strange (2004) and Puga (2010) for reviews). This paper highlights the under-studied role of geography from an inter-regional trade perspective: trade costs between locations affect where firms are located, influence the selection of firms that enter and produce in each location, and hence shape the geographic distribution of firm productivity and production in an economy.

Formally, we further develop the market access approach of Donaldson & Hornbeck (2016) in a Melitz (2003) framework to capture the effects of trade cost changes in general equilibrium. We first derive simple expressions for two types of market access: consumer market access (CMA), which measures competition from rival firms, and firm market access (FMA), which measures the market size available to local firms. We then use the dramatic expansion of highway networks in China from 1998 to 2007 as a source of variation in Chinese prefectures' market access, and examine how the geographic distribution of manufacturing firms' productivity and production evolved accordingly.

We start by establishing two empirical patterns related to how firm productivity and production are distributed across geographic locations. Our empirical setting is the Chinese manufacturing sector from 1998 to 2007, and geographic locations refer to prefectures.[1] First, geographic locations explain a significant fraction of between-firm productivity variation. On average across industries, where firms are located accounts for approximately 14% of the variation in firm productivity, yet this fraction ranges from below 2% for some industries to above 20% for others. Second, while firms tend to be geographically concentrated, the degree of concentration again differs substantially by industry. Some industries have over 50% of total sales from firms in fewer than 5 locations, while others have their sales spread out among firms in more than 200 locations, with no location accounting for more than 2% of the industry aggregate.

[1] A prefecture usually consists of 2–4 core urban districts and 4–8 surrounding counties, similar to a commuting zone in the U.S.A.

The stylized facts above provoke inquiry into the underlying mechanisms through which geographic locations matter. This paper tackles this question from an inter-regional trade perspective. Changes in inter-location trade costs induce two counteracting effects on firm productivity distribution and production in each location. As trade costs decline, on the one hand, local consumers will find it less expensive to buy from rival firms in other locations, or equivalently, local firms will face more competition from elsewhere ("import competition"). For each location, stronger import competition only allows the more productive firms to enter and produce, raises local average productivity, and shrinks each surviving firm's output. On the other hand, local firms can also more easily sell to markets elsewhere ("export access"). For each location, better export access allows less productive firms to survive, lowers local average productivity, and increases each surviving firm's output. Despite their countervailing effects, import competition and export access are also inter-related via an income-expenditure link: the more productive firms there are in a location, the higher the competitive pressure these firms exert on firms in other locations; meanwhile, more productive firms bring in more revenue for local consumers, who can thus spend more on purchases from firms elsewhere.

This paper formalizes the intuition above by extending the market access approach of Donaldson & Hornbeck (2016) in the firm heterogeneity and trade framework of Melitz (2003). We derive two distinct market access measures: consumer market access (CMA) to capture import competition, and firm market access (FMA) to capture export access. For a location, both market access measures increase in another location's economic mass, for example the number of competing firms or the total purchasing power of consumers, and decrease in the trade cost between the two locations.
A location's market access affects its local productivity distribution and production via firm selection: as trade costs decline, better CMA makes it more difficult for less productive firms to survive and compresses the output of surviving firms, while better FMA works in the opposite direction. As local firms respond to trade cost shocks, they generate spillovers on firms elsewhere in the network of locations, even if some other locations do not experience trade cost changes directly. For example, consider 3 locations o, m, and d that trade with one another. If many new firms enter in o after a highway connects o and d, these firms will exert higher competitive pressure on rival firms in m (an increase in m's CMA), even if m is not connected to the o–d highway. As less productive firms exit from m, m gives less intense competition to d, attracting more entrants in d, which can then feed back to firm selection in o. Such inter-location interaction necessitates our market access approach, examining the effect of trade cost shocks in general equilibrium.

To calibrate and test the model, this paper exploits a common and important source of variation in trade costs: construction of transportation infrastructure. Specifically, in the decade from 1998 to 2007, along with a 5-fold increase in aggregate manufacturing output, China witnessed dramatic growth in its highway networks, with only sporadic regional highways at the beginning but a well-connected nationwide highway system at the end. We treat each prefecture as a local economy, and approximate the trade costs between prefectures using road travel time, which highway networks reduced by over 40% on average across prefecture pairs in this period. Market access measures are then defined at the industry-prefecture level. The historic expansion of China's manufacturing sector and highway networks resulted in substantial changes in both CMA and FMA, with sizable variation in such changes across industry-prefectures.

We use firm-level data and inter-prefecture travel time data for model calibration, allowing the empirical relationship between trade cost and travel time to differ across industries. We find that trade cost increases in travel time, with a steeper slope for industries whose products have higher weight-to-value ratios, a proxy for the per-unit-value price of road transport, and for industries in which expenses on road transport account for a larger fraction of total input. We then use the resulting parameter estimates to compute market access measures and investigate how they are correlated with firm productivity distribution and production at the industry-prefecture level, which yields 3 sets of reduced-form evidence consistent with the theoretical predictions above. First, holding FMA constant, CMA exhibits a strong positive correlation with local average productivity, with an increase of 1 standard deviation (s.d.) in CMA corresponding to about a 0.24 s.d. increase in local mean productivity. In contrast, holding CMA constant, an increase of 1 s.d. in FMA is associated with a 0.18 s.d. decrease in local mean productivity. The correlations between market access and local average productivity are higher for high weight-to-value industries. Second, conditional on an individual firm's own productivity, FMA exhibits a strong positive correlation with this firm's output, with a 1 s.d. increase in FMA corresponding to a 0.32 s.d. increase in firm output, and this effect comes mostly from highly productive firms. Third, an increase in CMA or a decrease in FMA is related to a higher probability of exit among less productive firms and a smaller number of entrants.

A key concern about the causal interpretation of the reduced-form evidence above is that changes in a location's market access are endogenous to changes in local productivity and production outcomes. For instance, the government may choose to build highways to connect prefectures where firms are becoming more productive. To start with, we show that baseline prefecture characteristics, such as per-capita GDP and population density, do not predict when a prefecture receives a highway connection. Furthermore, we define two alternative sets of market access measures that are less correlated with local economic prospects. First, when computing market access measures, we exclude prefectures that are geographically close to the prefecture in question. While a prefecture's market access may comove with economic trends locally and in nearby areas, changes in market access due to distant neighbors are less correlated with changes in the local economy. Second, we use contemporary trade costs, but keep the economic mass of markets elsewhere at baseline. In this way, changes in market access only come from changes in trade cost, not from other locations' economic conditions. Our reduced-form results are robust to these alternative definitions of market access.

To characterize the new equilibrium after a trade cost shock, we rely on our structural model to recover counterfactual scenarios, incorporating interaction among locations in the network. For our main counterfactual, we assume that the 2007 highway networks were put in place in 1998, and examine how the 1998 geographic distribution of firm productivity and production would have responded. In this scenario, the dispersion of firm productivity across locations is significantly reduced, production is concentrated in slightly fewer locations, and aggregate productivity and output levels are much higher. This overall trend masks considerable heterogeneity across industries and locations.
For instance, productivity dispersion decreases more for industries with a higher share of output in (often inefficient) state-owned enterprises, and locations where firms are highly productive tend to see a larger increase in total output. We then compare this counterfactual, where only market access forces have been at work, with the actual 2007 firm-level data. We find that, for the manufacturing sector overall, the highway-induced market access dynamics alone account for a significant fraction of the observed changes between 1998 and 2007: around 24% of productivity growth, 40% of the decline in productivity dispersion, and 16% of output growth.

By emphasizing import competition and export access effects, we miss some potentially important impacts of highway construction that we cannot measure due to data limitations. For example, highway networks can encourage firms to cluster in certain regions and create more benefits of agglomeration, improve access to locations from which firms source intermediate inputs,[2] raise individual firms' productivity levels by making it more convenient for experienced technicians to travel between locations and exchange ideas, etc. We focus on import competition and export access to keep our theoretical framework simple, tractable, and easy to estimate given our data. In fact, this seemingly narrow focus can explain a significant fraction of the observed productivity and production evolution, as shown in the counterfactual analysis above.

[2] In Appendix A.1.2, we supplement our main model by adding an input channel, and derive an expression for input market access (IMA) that captures the effect of input price changes in response to trade cost shocks. We also discuss what data we need to empirically estimate this more comprehensive model.

This paper connects two vast fields of active research: economic geography and firm heterogeneity. The spatial distribution of economic activities, and most relevant to this paper, of firms, has long taken center stage in economic geography, and has been frequently linked to firms' productivity. In the firm heterogeneity literature, the sizable dispersion of firm productivity has been repeatedly highlighted, and remains prominent in research on international trade.

A voluminous literature on economic geography has proposed various theories of why firms choose to locate where they are, and many theories suggest that firms choose to locate where they can be most productive. Puga (2010) comprehensively reviews studies on the magnitude and causes of the geographic concentration of production activities. This paper stresses that, by determining where to locate, firms choose how close they are to their competitors and product markets, and affect other firms' location choices. A location where firms can be highly productive may seem less ideal if many competitors are nearby or product markets are distant. In other words, a location attracts firms not only because of its (immobile) endowment, but also because of its connection with other locations, and better connection often means lower trade costs. Redding & Turner (2015) offer a detailed review of transportation costs and the spatial organization of economic activity.
Directly contributing to the extensive research on firm heterogeneity (see Bernard et al. (2012) for an overview), this paper documents the important yet understudied role of geography in shaping firm productivity distribution. Many papers offer explanations of the enormous difference in productivity between firms, such as technology (Eaton & Kortum, 2002), policy distortions (Hsieh & Klenow, 2009), management practice (Bloom et al., 2013), labor mobility (Tombe & Zhu, 2015), etc. This paper focuses on an unexplored factor, namely geography-based market access, distinguishes between CMA and FMA, and discusses their respective implications for productivity distribution.

Using highway connection as an important source of variation in market access, this paper is also related to a large volume of research on the economic impact of transportation infrastructure. Many economists have explored how transportation infrastructure affects regional economic growth (Duranton & Turner (2012), Banerjee et al. (2012), Faber (2014), Lin (2017)), inter-regional trade (Donaldson (forthcoming), Duranton et al. (2014), Allen & Arkolakis (2017)), labor markets (Michaels, 2008), manufacturing activity (Ghani et al., 2016), and urban formation and development (Baum-Snow (2007), Baum-Snow et al. (forthcoming), Nagy (2017)), among others. We take a novel angle by examining the effect of highway networks on the geographic distribution of firm productivity and production, highlighting the market access dynamics in general equilibrium.

Very closely related to this paper, in terms of the market access approach, are Donaldson & Hornbeck (2016) and Baum-Snow et al. (2017).[3] Since both papers examine macro-level outcomes of local economies like total land value and GDP growth, for simplicity, they only specify one traded sector and one type of market access for each location, and find that better market access improves macro-level outcomes. Looking at the distribution of individual firms' productivity, this paper considers multiple industries with heterogeneous firms and the counteracting effects of two types of market access.[4] Although a decrease in trade cost enhances both CMA and FMA for a location, total local firm revenue will not increase if import competition dominates export access. This can be the case, for example, when location o becomes better connected to a location d that has many productive firms in industry j but few firms in other industries (so d's total income/expenditure is small): firms in industry-location jd then exert strong competitive pressure on firms in jo, but d offers only a tiny export market for firms in jo. As the geographic distribution of firms varies by industry, so will the overall effect of trade cost shocks on a given location. While both papers focus on reduced-form evidence, this paper estimates a fully specified structural model using detailed firm-level data and recovers counterfactuals related to trade cost shocks. In particular, we highlight how changes in location o's market access can influence the market access of other locations, and how changes in other locations' market access can in return affect o's market access.

[3] This approach, as we will elaborate in Sec. 3, captures the direct and indirect effects of trade cost changes in general equilibrium, not just the direct effects in partial equilibrium as in most earlier papers.

[4] As we will show in Sec. 3.2, the two types of market access, CMA and FMA, are proportional in a one-sector setting.

The rest of this paper proceeds as follows. Section 2 briefly introduces the empirical setting, and establishes two sets of stylized facts in order to motivate our model in Section 3. Section 4 describes how to measure historical road travel time between prefectures and link it to trade cost. Section 5 estimates the structural model and shows how well the model fits the data. Section 6 conducts reduced-form tests of the model's key predictions, together with robustness checks and discussion of alternative hypotheses. Section 7 uses the model to recover counterfactual scenarios related to trade cost changes. Section 8 concludes.

2 Empirical Patterns: Geographic Distribution of Firm Productivity and Production

In this section, we briefly introduce our empirical setting, and establish two stylized facts about the geographic distribution of firm productivity and production. First, a firm's location explains a significant fraction of the substantial variation in productivity, and the fraction of variation explained depends on the industry in question. Second, the degree of firms' geographic agglomeration also varies dramatically by industry. These results point to the importance of understanding the distribution of firm productivity and production in an economy from a geographic perspective, with particular attention to industry-specific patterns, and motivate subsequent work on how productivity and production distributions are affected by trade costs that separate geographic locations.

2.1 Empirical Setting: Chinese Industrial Firms Database

Detailed firm-level data, based on which we calculate individual firms' productivity, come from the Chinese Industrial Firms Database (CIFD). An annual firm-level survey conducted by China's National Bureau of Statistics, the database covers mining, manufacturing, and public utility industries. In this paper, we focus on manufacturing firms, and on the decade between 1998 and 2007, in which the surveys contain the variables necessary for standard productivity measurements.

Firms present in this dataset are relatively large. According to the official documentation, the surveys include all state-owned enterprises (SOEs), and non-state firms with sales greater than 5 million CNY ("above-scale" firms).[5] Yet in fact, a significant number of below-5-million non-state firms, accounting for about 5% of the unbalanced panel, are also included.
In comparison with the 2004 Economic Census that covers the universe of firms, the CIFD excludes 80% of firms, yet these below-scale firms only accounted for 28.8% of the industrial workforce, 9.9% of output, and 2.5% of exports (Brandt et al., 2012).[6] Appendix A.(1.1.1) provides details about this database and how the final sample is constructed.

[5] 1 U.S. dollar was worth about 8.3 Chinese yuan from 1998 to 2004, and gradually depreciated to 7.2 CNY at the end of […].

[6] Thus in this paper, the entry and exit of very small non-state firms are about when they first appear in and disappear from the CIFD. We discuss how this will affect our results in Sec. […].

Table 1(A) provides key summary statistics from firm balance sheets. From 1998 to 2007, aggregate real industrial output experienced an impressive 4-fold increase, value added and exports increased by even more, while total employment grew by 40%, suggesting considerable productivity gains. Though the total number of firms more than doubled from 129k to 288k, there was massive entry and exit simultaneously, as shown in Table 1(B). Approximately 10–15% of firms enter and exit every year, and only 30k firms remained throughout the sample period.

Table 1(C) summarizes firm productivity distributions across industries. For each of the 29 2-digit industries, we calculate the Levinsohn-Petrin output productivity for individual firms. The well-documented firm heterogeneity is confirmed: if we rank firms by their productivity in a year, firms at the 90th percentile are about 0.7 log points, or roughly 100%, more productive than firms at the 10th percentile. The decade saw a substantial increase in the aggregate productivity level, and a slight decline in productivity dispersion. Productivity dispersion quantified using value-added productivity measures like Olley-Pakes and Ackerberg-Caves-Frazer is an order of magnitude greater, echoing findings in Gandhi et al. (2016), but the overall rise in aggregate productivity and decline in productivity dispersion remain.

2.2 Geographic Location and Productivity Dispersion

We now show that a firm's geographic location explains a significant fraction of between-firm productivity dispersion. If we assume that a prefecture's contribution to local firms' productivity is the same across industries, then where firms are located accounts for about 2–3% of the variation in firm productivity (Table 2). This is already a sizable fraction in comparison with other common factors, such as ownership type (state-owned, private, etc.) and firm age, which explain less than 0.1%. Yet the effect of geographic locations likely varies by industry: while Boston can be a good fit for pharmaceutical firms, it does not seem an ideal environment for oil refineries. Allowing the effect of geographic locations to vary by industry raises the fraction of variation explained roughly fivefold, to 12–14% (Table 2).
One may think of the first 2–3% of the variation explained as the common contribution of geographic locations to firm productivity, and the next 10–11% as the industry-specific contribution. For example, a well-educated labor pool in a city supports local firms regardless of industry, yet industries that demand more skilled labor likely benefit more. While geography seems to matter for firm productivity overall, Figure 1(A) documents that the importance of geography varies across industries. In 2006, the fraction of variance explained by firms' geographic locations ranges from less than 2% for industries like oil refining and coking to more than 20% for industries like ferrous metal smelting and rolling.
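As a rough illustration of what "fraction of variation explained by location" means, the sketch below computes the R-squared from projecting a firm-level outcome on a full set of group indicators, which equals one minus the within-group share of the total sum of squares. This is only an assumption about the flavor of the calculation behind Table 2, not the paper's exact specification; the function name `share_explained` and the toy numbers are hypothetical.

```python
from collections import defaultdict


def share_explained(values, groups):
    """R^2 from regressing `values` on a full set of group dummies.

    Equivalent to 1 - (within-group sum of squares) / (total sum of squares).
    """
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    # Collect the observations that belong to each group.
    members = defaultdict(list)
    for v, g in zip(values, groups):
        members[g].append(v)

    # Sum squared deviations from each group's own mean.
    within_ss = 0.0
    for vals in members.values():
        group_mean = sum(vals) / len(vals)
        within_ss += sum((v - group_mean) ** 2 for v in vals)

    return 1.0 - within_ss / total_ss


# Hypothetical toy data: prefecture A suits industry y, prefecture B suits industry x.
log_tfp = [1.0, 1.2, 2.0, 2.2, 2.1, 1.9, 0.9, 1.1]
prefecture = ["A", "A", "B", "B", "A", "A", "B", "B"]
industry = ["x", "x", "x", "x", "y", "y", "y", "y"]

# Location effect forced to be common across industries.
common_share = share_explained(log_tfp, prefecture)
# Location effect allowed to differ by industry (industry-prefecture cells).
by_industry_share = share_explained(log_tfp, list(zip(industry, prefecture)))
```

In this made-up example the prefecture-only grouping explains essentially nothing, while industry-prefecture cells explain most of the variation, mirroring the Boston pharmaceuticals versus oil refineries intuition in the text.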

2.3 Geographic Agglomeration of Firms

Our second empirical pattern is about how firm production is distributed geographically. While we have documented that geography matters for firm productivity across industries, firms do not always agglomerate in highly productive locations. Figure 1(B) shows the Ellison-Glaeser agglomeration index for the 29 2-digit industries. Recall that, when Ellison & Glaeser (1997) apply this index to the 1987 U.S. Census of Manufactures data, the authors designate an industry with an index value above 0.05 as highly agglomerated, and an industry with a negative index value as the opposite. Overall, Chinese manufacturing firms in 2006 were not highly agglomerated, with an average index of […] across industries. Again, industry heterogeneity is evident: the index ranges from negative for industries like tobacco products to above 0.06 for industries like electronic equipment.[7]

Together, results in this section invite questions about what determines the distribution of firm productivity and production across space.

3 Theoretical Framework

To understand the geographic patterns of firm productivity and production distributions, we further develop the market access approach of Donaldson & Hornbeck (2016) in a Melitz (2003) framework of firm heterogeneity and trade. As inter-location trade costs decline, firms in a location face more import competition on the one hand, but also have greater export access on the other. We derive expressions for consumer market access (CMA) to capture import competition, and firm market access (FMA) to capture export access.
Both types of market access shape the geographic distribution of firm productivity and production via firm selection: for a location, higher CMA, or more intense import competition, only allows the more productive firms to survive, raising local average productivity and shrinking the output of every surviving firm; higher FMA, or larger export markets, works in the opposite direction.

The general equilibrium effect of trade cost changes becomes particularly salient in our setting and necessitates the market access approach. Many papers on the effect of transportation infrastructure strive to identify the partial equilibrium effect of trade cost shocks. For this purpose, the ideal experiment is to randomly assign highway connection, and compare locations that get connected with those that do not. However, in a network of locations, each location is affected by the global matrix of location-to-location trade costs. A change in trade cost between two locations not only directly affects trade between the two, but also generates spillovers on other locations. Consider two neighboring locations $o_1$ and $o_2$ and another distant location $d$, with trade connections among them. A highway that connects $o_1$ and $d$ reduces the travel time, and hence trade cost, between $o_1$ and $d$, changing the CMA and FMA of, and thus the set of firms in, both $o_1$ and $d$. As the sets of firms in $o_1$ and $d$ are now different, so are the CMA and FMA of $o_2$. In addition, the $o_1$-$d$ highway can reduce the travel cost between $o_2$ and $d$, for instance, if it saves time for an $o_2$-$d$ trip to go through $o_1$ to take advantage of the $o_1$-$d$ highway. This will also change the market access of, and hence the set of producing firms in, $o_2$. A change in the set of firms in $o_2$ feeds back to the market access of $o_1$ and $d$, and this loop continues in the network of locations until a new equilibrium is reached. A simple treatment-control (highway connection or not) comparison will miss such indirect effects of trade cost shocks, and thus we rely on our structural model to recover counterfactual scenarios.

[7] Both the percent of variation explained by prefecture and the agglomeration index have much higher averages and greater dispersion at the 4-digit industry level (about […] 4-digit industries); see Glaeser & Xiong (2017).

3.1 Setup

We adopt the Melitz (2003) framework, derive explicit expressions for CMA and FMA, and show their respective implications for local firm productivity and production distributions. We begin by outlining the setup.[8]

3.1.1 Notations

Representative consumers' preferences are defined over the consumption of goods produced by $J$ industries indexed by $j, k \in \{0, 1, 2, \ldots, J\}$. Industry $j = 0$ produces a homogeneous good. Each industry $j \geq 1$ has a continuum of horizontally differentiated varieties, and each firm produces one of the varieties. There are $N$ locations indexed by $o$ (origin) or $d$ (destination) $\in \{1, 2, \ldots, N\}$, and a symmetric iceberg trade cost $\tau_{od}$ between $o$ and $d$.
In industry $j$ and location $o$, the unit cost of production (or composite input price) is $c_{jo}$, and there is an underlying distribution of firm productivity $G_{jo}(\phi)$ and a potential mass of firms $S_{jo}$. The mass of firms that produce (and that we observe in data) is a fraction $h_{jo}$ of $S_{jo}$. Individual firms are indexed by $\omega \in \Omega_{jo}$, with each firm's productivity draw denoted $\phi_{jo\omega}$.

[8] Readers interested in the details of the Melitz setup may refer to Melitz & Redding (2014).

3.1.2 Preferences

Representative consumers have Cobb-Douglas upper-tier utility over industries,

$$U = \sum_{j=0}^{J} \beta_j \ln Q_j, \qquad \sum_{j=0}^{J} \beta_j = 1, \qquad \beta_j \in (0, 1) \;\; \forall j.$$

In location $o$, total expenditure on industry $j$'s good is thus $X_{jo} = \beta_j I_o$, where $I_o$ denotes the total income of $o$. Within each differentiated industry $j \geq 1$, the aggregator over varieties takes the Constant Elasticity of Substitution (CES) form:

$$Q_j = \left[ \int_{\omega \in \Omega_j} q(\omega)^{(\sigma_j - 1)/\sigma_j} \, d\omega \right]^{\sigma_j/(\sigma_j - 1)},$$

where $\sigma_j > 1$ denotes the within-industry CES elasticity. The price index of industry $j$'s good is then

$$P_j = \left[ \int_{\omega \in \Omega_j} p(\omega)^{1 - \sigma_j} \, d\omega \right]^{1/(1 - \sigma_j)}.$$

3.1.3 Production and Pricing

Industry $j = 0$ produces a homogeneous good with a unit input, and the price of this homogeneous good is chosen as the numeraire. In industry-location $jo$, a potential entrant firm can pay a fixed cost $c_{jo} e_{jo}$ for a productivity draw $\phi_{jo\omega}$, and can then choose to produce at unit cost $c_{jo}/\phi_{jo\omega}$ (so production features constant returns to scale). A firm will only start producing if it expects the revenue to at least cover the fixed cost of production $c_{jo} f_{jo}$. Under CES demand and monopolistic competition, firms charge a constant mark-up $\sigma_j/(\sigma_j - 1)$ over their unit cost of production. Given the trade cost $\tau_{od}$ between locations $o$ and $d$, the price of firm $jo\omega$'s product in location $d$ is

$$p_{j,od}(\phi_{jo\omega}) = \frac{\sigma_j}{\sigma_j - 1} \, \frac{c_{jo} \, \tau_{od}}{\phi_{jo\omega}}.$$
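The pricing rule and the CES price index above can be made concrete with a small numerical sketch. It assumes a finite set of varieties, so a sum replaces the integral over $\Omega_j$; the function names `delivered_price` and `ces_price_index` and all numbers are hypothetical and chosen only for illustration.

```python
def delivered_price(sigma, unit_cost, tau, productivity):
    """Price in the destination: constant mark-up times delivered marginal cost."""
    markup = sigma / (sigma - 1.0)
    return markup * unit_cost * tau / productivity


def ces_price_index(prices, sigma):
    """CES price index over a finite list of variety prices.

    Discrete stand-in for the integral over varieties in the text.
    """
    return sum(p ** (1.0 - sigma) for p in prices) ** (1.0 / (1.0 - sigma))


# Hypothetical parameters: sigma = 5 implies a 25% mark-up over delivered cost.
sigma = 5.0
price = delivered_price(sigma, unit_cost=1.0, tau=1.2, productivity=2.0)

# With 16 identical varieties the index falls below any single price:
# more competing varieties lower the price index faced by consumers.
index = ces_price_index([price] * 16, sigma)
```

Because $\sigma_j > 1$, adding varieties or lowering any delivered price reduces the index, which is the channel through which cheaper access to rival firms shows up as a lower local price level.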

3.2 Market Access

As we will characterize equilibrium in an industry, we now drop the industry subscript $j$ to streamline notation. While the Melitz model specifies a fixed cost of export for each pair of locations, we make a key simplifying assumption: inter-location trade is only subject to iceberg trade cost, not a fixed cost of export. In other words, as long as a firm chooses to produce, it sells its variety to all locations in the economy. This assumption allows us to derive clean expressions for market access measures, and is less stringent in intra-national trade, as in our empirical setting, than in international trade. In addition, since our data do not cover the export (selling to other prefectures in China) status of individual firms, it is quite challenging to estimate the fixed cost of export as many papers on international trade do.

Firm Market Access (FMA)

A destination $d$'s demand for the variety produced by a firm with productivity $\varphi$ in origin $o$ is

$$r_{od}(\varphi) = q_{od}(\varphi) \, p_{od}(\varphi) = X_d \, \frac{p_{od}(\varphi)^{1-\sigma}}{P_d^{1-\sigma}} = \left( \frac{\sigma}{\sigma - 1} \right)^{1-\sigma} \left( \frac{c_o}{\varphi} \right)^{1-\sigma} \tau_{od}^{1-\sigma} \, \frac{X_d}{P_d^{1-\sigma}}.$$

Total demand is the sum of demand over all destinations,

$$r_o(\varphi) = \left( \frac{c_o}{\varphi} \right)^{1-\sigma} \mathrm{FMA}_o, \tag{1}$$

where

$$\mathrm{FMA}_o \equiv \left( \frac{\sigma}{\sigma - 1} \right)^{1-\sigma} \sum_d \tau_{od}^{1-\sigma} \, \frac{X_d}{P_d^{1-\sigma}}.$$

A location has high FMA, or access to a large market, if (1) it is well connected to other locations, which shows up as small trade costs $\tau$; (2) other locations have high demand, which shows up as large total expenditure $X$; (3) other locations have low levels of competition, which shows up as a high price level $P$. Intuitively, even if a location spends a large amount on an industry's products, there will be little room for firms elsewhere if the local market is already highly competitive, or equivalently, at a low price level. FMA thus captures the aggregate size of markets that a local firm can potentially reach. It quickly follows that total demand/revenue for all firms in a location is

$$R_o = S_o \int_{\varphi} \left( \frac{c_o}{\varphi} \right)^{1-\sigma} \mathrm{FMA}_o \, dG_o(\varphi). \tag{2}$$

Consumer Market Access (CMA)

The price index in each destination $d$ is a CES aggregate of the prices of individual firms' varieties that sell in $d$. Competition among rival firms lowers the price index and enhances local consumers' welfare. We thus relate a location's price index to its consumer market access (CMA),9

$$P_d^{1-\sigma} = \sum_o \left[ M_o \int p_{od}(\varphi)^{1-\sigma} \, dG_o(\varphi) \right] = \sum_o \left[ M_o \int \left( \frac{\sigma}{\sigma - 1} \, \frac{c_o \tau_{od}}{\varphi} \right)^{1-\sigma} dG_o(\varphi) \right] \equiv \mathrm{CMA}_d.$$

Eq.(1) gives $(c_o/\varphi)^{1-\sigma} = r_o(\varphi)/\mathrm{FMA}_o$. Substituting this into the $P_d$-CMA equation above, and making use of Eq.(2), we obtain

$$\mathrm{CMA}_d = \left( \frac{\sigma}{\sigma - 1} \right)^{1-\sigma} \sum_o \tau_{od}^{1-\sigma} \, \frac{R_o}{\mathrm{FMA}_o}. \tag{3}$$

A location has higher CMA, or more competition, if (1) it is better connected to other locations, which shows up as smaller trade cost $\tau$; (2) other locations have large competing firms, which shows up as larger total revenue $R$; (3) firms in other locations can only reach a small market, which shows up as smaller FMA. Intuitively, if a location has many large firms, but at the same time sells to a very large market, the competitive pressure that local firms exert on firms elsewhere will be lower. Symmetrically, since $P_d^{1-\sigma} \equiv \mathrm{CMA}_d$, we write FMA as

$$\mathrm{FMA}_o = \left( \frac{\sigma}{\sigma - 1} \right)^{1-\sigma} \sum_d \tau_{od}^{1-\sigma} \, \frac{X_d}{\mathrm{CMA}_d}. \tag{4}$$

9 To the extent that firms in an industry use the output of all industries as intermediate input, and that the price of intermediate input is part of the composite cost of production, a location $o$'s higher CMA, or lower price index, of industry $j$ will lower the cost of production $c_{ko}$ for all industries $k$. We formalize this in Appendix A.1.2 and derive input market access (IMA) based on CMA.
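Eq. (3) and Eq. (4) define CMA and FMA in terms of each other. As an illustration only, the sketch below solves the two equations by alternating between them until the values stop changing. This iteration scheme is an assumption of the sketch, not a procedure described in the paper, and the function name, the three-location trade cost matrix, and the expenditure and revenue vectors are hypothetical. The sketch also assumes that total revenue equals total expenditure within the industry, which the two equations jointly require, and it fixes the otherwise free scale (if CMA is multiplied by a constant and FMA divided by it, both equations still hold) through the starting guess.

```python
import numpy as np

def solve_market_access(tau, X, R, sigma, tol=1e-12, max_iter=10_000):
    """Alternate between Eq. (3) and Eq. (4) until FMA stops changing.

    tau[o, d]: iceberg trade cost; X[d]: expenditure; R[o]: revenue.
    Assumes sum(R) == sum(X); otherwise the two equations have no joint solution.
    """
    tau, X, R = (np.asarray(a, dtype=float) for a in (tau, X, R))
    if not np.isclose(R.sum(), X.sum()):
        raise ValueError("total revenue must equal total expenditure")
    k = (sigma / (sigma - 1.0)) ** (1.0 - sigma)
    K = k * tau ** (1.0 - sigma)           # K[o, d] = k * tau_od^(1 - sigma)
    fma = np.ones_like(R)                  # starting guess pins down the free scale
    for _ in range(max_iter):
        cma = K.T @ (R / fma)              # Eq. (3): sum over origins o
        fma_new = K @ (X / cma)            # Eq. (4): sum over destinations d
        if np.max(np.abs(fma_new / fma - 1.0)) < tol:
            return cma, fma_new
        fma = fma_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical three-location example
sigma = 4.0
tau = np.array([[1.0, 1.2, 1.5],
                [1.2, 1.0, 1.3],
                [1.5, 1.3, 1.0]])
X = np.array([10.0, 20.0, 30.0])           # expenditure by destination
R = np.array([30.0, 20.0, 10.0])           # revenue by origin
cma, fma = solve_market_access(tau, X, R, sigma)
```

At the solution, the implied bilateral flows `K[o, d] * (R[o] / fma[o]) * (X[d] / cma[d])` sum to each origin's revenue and each destination's expenditure, which is a convenient consistency check.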

Eq.(3) and (4) illustrate that, if total revenue $R_{jo}$ is proportional to total expenditure $X_{jo}$ across industry-locations,10 so is CMA to FMA. This is the case if there is only one traded sector, as in several previous papers on market access like Donaldson & Hornbeck (2016) and Baum-Snow et al. (2017). In our setting, CMA and FMA differ from each other because of the uneven distribution of production across locations.

Empirically, it is difficult to measure CMA as a recursive index. Again using $(c_o/\varphi)^{1-\sigma} = r_o(\varphi)/\mathrm{FMA}_o$, we express CMA in unit cost of production,

$$\mathrm{CMA}_d = \left( \frac{\sigma}{\sigma - 1} \right)^{1-\sigma} \sum_o \tau_{od}^{1-\sigma} \left( \sum_{\omega \in \Omega_o} \left( \frac{\varphi_{o\omega}}{c_o} \right)^{\sigma - 1} \right). \tag{5}$$

Clearly, a location has high CMA if it is well connected to locations that have many productive firms (large sum of high $\varphi$ firms).

3.3 Local Productivity Distribution

In location $o$, a firm at the local productivity cutoff $\varphi_o^*$ makes just enough profit $\pi(\varphi_o^*)$ to offset the fixed cost of production $c_o f_o$. Eq.(1) implies

$$\pi(\varphi_o^*) = \frac{r(\varphi_o^*)}{\sigma} - c_o f_o = 0 \ \Rightarrow \ (\varphi_o^*)^{\sigma - 1} = \frac{(c_o)^{\sigma} \sigma f_o}{\mathrm{FMA}_o} \ \Rightarrow \ \varphi_o^* \propto (\mathrm{FMA}_o)^{\frac{1}{1 - \sigma}}. \tag{6}$$

Ceteris paribus, an increase in FMA lowers the cutoff $\varphi_o^*$: a larger market size allows local firms to spread their fixed costs of production over more product units, and hence unproductive firms can manage to enter or survive. By Eq.(4), FMA is decreasing in CMA, so an increase in CMA raises the cutoff $\varphi_o^*$: more competition makes it more difficult for unproductive firms to enter or survive. Similarly, by Eq.(1), local individual firms' revenue and total revenue of a location both increase in the location's FMA and decrease in its CMA.11

Assume an underlying Pareto local productivity distribution with scale and shape parameters $(\underline{\varphi}_o, \theta_o)$, so the probability density function is $g(\varphi_{o\omega}) = \theta_o (\underline{\varphi}_o)^{\theta_o} / (\varphi_{o\omega})^{\theta_o + 1}$. Changes in CMA or FMA move $\varphi_o^* \geq \underline{\varphi}_o$ around, but the lower-truncated Pareto distribution is still Pareto, with scale parameter $\varphi_o^*$ and the same shape parameter $\theta_o$. For $\theta_o > 1$,12 the cutoff pins down the mean productivity of surviving firms,

$$\bar{\varphi}_o = \frac{\theta_o \varphi_o^*}{\theta_o - 1} = \frac{\theta_o}{\theta_o - 1} \left( \frac{(c_o)^{\sigma} \sigma f_o}{\mathrm{FMA}_o} \right)^{\frac{1}{\sigma - 1}}. \tag{7}$$

All firms pay a fixed cost of entry $c_o e_o$ to draw from the underlying productivity distribution $G_o(\varphi)$, but only those who get a draw above the cutoff $\varphi_o^*$ produce. The free-entry condition imposes that the expected ex-ante profit of firms that produce just offsets the fixed cost of entry,

$$\int_0^{\infty} \pi(\varphi) \, dG_o(\varphi) = \int_{\varphi_o^*}^{\infty} \left( \frac{r(\varphi)}{\sigma} - c_o f_o \right) dG_o(\varphi) = c_o e_o \ \Rightarrow \ \frac{R_o}{\sigma} = h_o S_o c_o f_o + c_o e_o S_o, \tag{8}$$

where $h_o = \int_{\varphi_o^*}^{\infty} dG_o(\varphi)$ denotes the share of underlying firms that enter and produce, and thus $h_o S_o$ the observed mass of firms. For simplicity, we write $e_o S_o = E_o$. Under this free-entry condition, lowering barriers to entry in a location attracts more entrants there, intensifies local competition, and pushes down the local price level in equilibrium.

4 Trade Cost Measurement

Having derived explicit expressions of the market access measures, we proceed to the corresponding empirical measurement. A key missing piece in our setting is the trade cost between prefectures in China. Without inter-prefecture trade data, in particular data on spatial price gaps,13 we are unable to directly estimate trade cost, and hence follow an indirect route in the economic geography literature to infer trade cost from road travel time between prefectures.14

10 An extreme case: each industry-location maintains trade balance, i.e. $R_{jo} = X_{jo}$.
11 One can see this more clearly by dividing a location's FMA into that due to its home market and that due to its export market (selling to other locations), $\mathrm{FMA}_o = \mathrm{FMA}_{o,ex} + \left( \frac{\sigma}{\sigma - 1} \right)^{1-\sigma} \frac{X_o}{\mathrm{CMA}_o}$.
12 For $\theta_o \in (0, 1]$, the mean is infinite.
13 See Atkin & Donaldson (2015) for a recent advance in using spatial price gaps to identify intra-national trade costs. The authors also review a voluminous literature along this line.
14 In Appendix A.1.3, we discuss why we focus on road transport rather than other modes of transport, like railway, air, and waterway.
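Before turning to measurement, the selection mechanics of Sec. 3.3 can be illustrated numerically. The sketch below is a hedged illustration with hypothetical function names and parameter values: it computes the zero-profit cutoff implied by Eq. (6) and the mean productivity of surviving firms under a Pareto distribution truncated at that cutoff.

```python
def cutoff(c, f, fma, sigma):
    """Zero-profit cutoff from Eq. (6): phi*^(sigma-1) = c^sigma * sigma * f / FMA."""
    return (c ** sigma * sigma * f / fma) ** (1.0 / (sigma - 1.0))

def mean_above_cutoff(phi_star, theta):
    """Mean of a Pareto distribution with shape theta truncated below at phi_star."""
    if theta <= 1.0:
        raise ValueError("the mean is infinite for theta <= 1")
    return theta * phi_star / (theta - 1.0)

# Hypothetical values: c = 1, f = 1, sigma = 4, theta = 3
low_fma_cutoff = cutoff(1.0, 1.0, 4.0, 4.0)     # FMA = 4
high_fma_cutoff = cutoff(1.0, 1.0, 32.0, 4.0)   # FMA = 32, eight times larger
mean_phi = mean_above_cutoff(low_fma_cutoff, 3.0)
```

With these assumed numbers, an eightfold increase in FMA halves the cutoff, in line with the exponent $1/(1-\sigma) = -1/3$ in Eq. (6).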

4.1 National Highway Construction

Since the early 1990s, and in particular during our sample period from 1998 to 2007, China went through a massive expansion of highway networks.15 In 1990, there were essentially no limited-access highways between Chinese prefectures. Existing inter-city roads had at most two lanes with unrestricted access, and were often unpaved. Thus, Baum-Snow et al. (2017) assume merely 25 km/h for inter-prefecture traveling on local roads. Roads took care of less than 5% of freight ton-km's, and almost all goods were moved by railway or waterway.

In 1992 under the National Trunk Highway Development Program, the Chinese State Council approved the blueprint of the 5-7 system, which refers to 5 North-South vertical and 7 East-West horizontal axes (World Bank, 2007). The project aimed to connect all provincial capitals and cities with an urban registered population above half a million on a single highway network, and to connect targeted regional centers and the national border in border provinces as part of the Asian Highway Network. The Chinese Ministry of Communications (predecessor to the current Ministry of Transport) set [...] as the kick-off phase in which only a handful of highways were completed, and [...] as the rapid development phase in which large-scale construction took place. Originally earmarked for completion by 2020, this nationwide construction endeavor concluded ahead of schedule by the end of 2007, in large part due to the government's stimulus spending in response to the 1997 Asian Financial Crisis (Asian Development Bank, 2007). In consequence, from 1998 to 2007, China's total highway length grew from 8.7k km to 53.6k km. By 2010, highways and roads carried over 30% of freight ton-km's.16 Given the post-1998 acceleration of construction, the State Council approved an even more ambitious follow-up plan in Dec. [...].17 The stated purpose of this system was to bring highway connection to all cities with an urban registered population above 200k by [...].

Prior to the advent of national trunk highways, national highways and provincial highways served as the main routes between prefectures, usually subject to a speed limit of [...] km/h and 70 km/h, respectively. However, many prefectures were not on existing national highways. Moreover, due to poor road quality and frequent congestion, the actual speed on these highways was often far below the limit. In contrast, newly built as 4-lane limited-access tollways, the national trunk highways commonly feature a speed limit of [...] km/h, and run in parallel with existing major roads in many areas.

4.2 Road Travel Time and Trade Cost

We make use of 2 data sources to estimate by how much highway construction reduces inter-prefecture road travel time. First, the ACASIAN GIS data, which we describe in Appendix A.(1.1.2), provide historical maps of highway networks in China, allowing us to hand-collect the year in which each prefecture received highway connection. Where the maps are unclear, we supplement the data with news searches on highway construction. As Figure 2 shows, while there were only sparse regional highways in 1998, the vast majority of prefectures in China had been connected by highways in an integrated national network by [...].

Second, working with Google Maps APIs, we obtain contemporary (Nov. 2016, when this task was completed) travel distance and time matrices. Since very few, if any, prefectures are still off the national highway network, the "normal" travel time is almost completely based on highway travel, i.e. one spends the least time traveling between prefectures by staying on highways all the time. Recall that very few prefectures, typically the remote ones with little manufacturing activity, were still left behind during the highway boom by 2007, so the contemporary normal travel time provides a good proxy for the 2007 travel time. If we specify "avoid highways", Google Maps will avoid highways altogether when choosing routes between prefectures, typically staying on national and provincial major roads that were already in place in [...]. This usually results in minimal changes in travel distance, but doubles travel time.

Since Google Maps can't recover historical travel time, we use the historical highway maps and contemporary travel time in 3 steps, incorporating two main adjustments.

1. For an origin location $o$ and destination $d$ in year $y$, if the highway map of year $y$ shows that both $o$ and $d$ were connected to highways, then we use the normal travel time; otherwise, we use the "avoid highways" time.

2. Block adjustment. The first step probably underestimates the actual travel time in the early years of our sample period. As shown in Figure 2, in the late 1990s, there were several regional blocks in which highways connected regional centers and nearby cities, but sometimes no highways between different blocks. When $o$ and $d$ belong to different blocks, one can't always stay on highways when going from $o$ to $d$. Therefore, for each year, we identify a few highway blocks, and only use normal travel time if $o$ and $d$ are connected via the same block of highways. The number of blocks declined as the highway networks expanded over time, and by 2007, basically all prefectures lay on one nationwide highway block.

3. Detour or fastest path adjustment. Especially when $o$ and $d$ are far apart, the "avoid highways" travel time will likely exceed the actual travel time even if neither prefecture is directly connected to highway networks, since one may still go on highways between the two whenever available. We thus allow one to take detours between $o$ and $d$ as long as this reduces travel time. Let $t_{od}$ denote travel time between $o$ and $d$. Given a 3rd prefecture $m$, record $(t_{om} + t_{md})$. For all prefectures other than $o$ and $d$, if $t_{od} > \min_m \{t_{om} + t_{md}\}$, replace $t_{od}$ with $\min_m \{t_{om} + t_{md}\}$. Clearly, this is only possible if at least one of $t_{om}$ and $t_{md}$ is normal travel time. Iterate the above two steps until $t_{od}$ doesn't get smaller.

After these adjustments, Table 3 shows the reduction in travel time as a result of the nationwide highway expansion. In our sample period (1998 to 2007), there were 337 prefectures in China.18 We restrict our empirical analysis to 287 prefecture-level cities and leave out 50 prefecture-level regions that were mostly remote, underdeveloped areas in Western China with little manufacturing activity. Among all $\binom{287}{2} = 41{,}041$ prefecture pairs, while the average travel distance stayed more or less the same, the average travel time decreased by over 40% from 1998 to [...].

We then follow a widely used approach in economic geography to relate travel time to trade cost. Baum-Snow et al. (2017) specify the following formula to relate road travel time to trade cost,

$$\tau_{od} = \rho \, (\text{travel time}_{od})^{\lambda}, \quad \text{where } \rho > 0 \text{ and } \lambda \in (0, 1), \tag{9}$$

and use $\rho \in [0.5, 2]$, travel time in hours, and $\lambda = 0.8$ to incorporate some concavity. These parameter values correspond to a modest ad-valorem tariff of [...]% per day (24 hours) of travel time, consistent with estimates in Limao & Venables (2001) and Hummels & Schaur (2013). However, this relationship likely varies across industries based on, say, the per-unit-value cost of transport. For instance, high weight-to-value industries such as timber and coking likely have a larger $\rho$. We thus empirically estimate this relationship between travel time and trade cost.

15 This subsection, which tells the history of China's national highway system, borrows heavily from Sec.2.1 of Faber (2014) and Sec.2 of Baum-Snow et al. (2017).
16 Source: China Statistical Yearbook, 2011, published by the National Bureau of Statistics of China.
17 [...] refers to 7 radial axes from the capital Beijing, 9 vertical axes, and 18 horizontal ones. In 2013, the Ministry of Transport added two additional vertical axes, aiming to have in total 118k km of highways by [...].
18 The number changed slightly over time, due to the creation of new prefectures and the merger of old ones.
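The three travel-time steps and the conversion in Eq. (9) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the detour step is written as a Floyd-Warshall style relaxation, which reaches the same fixed point as repeating the pairwise replacement until no $t_{od}$ falls further, and the function names, the `same_block` indicator matrix, and the three-prefecture example are all hypothetical.

```python
import numpy as np

def adjusted_travel_time(t_normal, t_avoid, same_block):
    """Pick pairwise times (steps 1-2), then allow time-saving detours (step 3).

    same_block[o, d] is True when o and d lie on the same connected highway
    block in the year considered; such pairs get the "normal" travel time,
    all other pairs get the "avoid highways" time.
    """
    t = np.where(same_block, t_normal, t_avoid).astype(float)
    np.fill_diagonal(t, 0.0)
    for m in range(t.shape[0]):
        # Relax every pair through intermediate prefecture m:
        # t[o, d] = min(t[o, d], t[o, m] + t[m, d])
        t = np.minimum(t, t[:, [m]] + t[[m], :])
    return t

def trade_cost(t_hours, rho, lam):
    """Eq. (9) as written in the text: tau = rho * (travel time in hours)^lam."""
    return rho * t_hours ** lam

# Prefectures 0 and 1 share a highway block; prefecture 2 is off the network.
t_normal = np.array([[0.0, 1.0, 9.0], [1.0, 0.0, 9.0], [9.0, 9.0, 0.0]])
t_avoid = np.array([[0.0, 3.0, 6.0], [3.0, 0.0, 2.0], [6.0, 2.0, 0.0]])
same_block = np.array([[True, True, False],
                       [True, True, False],
                       [False, False, True]])
t_adj = adjusted_travel_time(t_normal, t_avoid, same_block)
tau = trade_cost(t_adj[0, 2], rho=1.0, lam=0.8)
```

In this example the direct local-road trip from 0 to 2 takes 6 hours, but driving the highway from 0 to 1 and then local roads from 1 to 2 takes 3 hours, so the detour adjustment lowers $t_{02}$ to 3.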

5 Structural Estimation and Model Fit

The structural model in Sec.(3) and trade cost in Sec.(4) pave the way for estimation of the model. This section estimates key parameters of the model based on Sec.(3). With these parameters in hand, we test the model's fit, and show that such a simple model provides a decent description of the actual geographic distribution of productivity and production.

5.1 Estimation

To focus on the two main forces (import competition vs. export access) in local firm selection, we make an additional assumption that significantly simplifies the structural estimation: the homogeneous good (in sector $j = 0$) is produced in all locations with a unit cost and is costlessly traded. In such an incomplete specialization equilibrium, the unit cost of production is equal across all locations, $c_{jo} = c_j \ \forall o \in \{1, 2, \dots, N\}$. While stringent, this assumption is in fact common in the firm heterogeneity and trade literature; see Melitz & Redding (2014) for an example. In Appendix A.1.2, we relax this assumption and point to an alternative method to measure market access empirically.

For the manufacturing sector (for which we have firm-level data), we also impose location-level trade balance on the income-expenditure relationship for each location. Under this assumption, although a location can spend more on an industry's goods than it makes from selling this industry's goods, a location's total revenue from local firms in all industries is equal to its total expenditure on all industries' goods, i.e. $X_{jd} = \gamma_j \sum_{k} R_{kd}$. Here $\gamma_j$ represents the representative consumers' proportion of total income from manufacturing (traded sector) spent on industry $j$'s output, and we obtain $\gamma_j$ from the 2002 national input-output matrix.
We now classify the model's parameters into 3 categories: (1) taken from existing literature; (2) computed directly from data; (3) estimated by fitting simulated moments to actual moments in the data.

To start with, we borrow from Broda & Weinstein (2006) and set the same CES elasticity for all 2-digit industries, i.e. $\sigma_j = 4 \ \forall j$.19

We now directly estimate two sets of parameters for each industry-location $j$-$o$ from firm-level data.20 First, as we assume a Pareto distribution of firm productivity in each $j$-$o$, we follow Gabaix and Ibragimov (2011) to recover the shape parameter.21 In each year, within each $j$-$o$, we rank firm productivity $\varphi_{jo\omega}$ in decreasing order, and estimate $\theta_{jo}$ in the following OLS regression,22

$$\ln\left( \text{rank}_{jo\omega} - \frac{1}{2} \right) = \eta_{jo} - \hat{\theta}_{jo} \ln \varphi_{jo\omega}.$$

Note that the standard error here on $\hat{\theta}_{jo}$ is not the OLS s.e. Recall that we use Levinsohn-Petrin output productivity, based on an estimated industry-specific production function, in the main empirical specification. We also replicate our main results using alternative productivity measures including Olley-Pakes, Ackerberg-Caves-Frazer, and Gandhi-Navarro-Rivers in Appendix A.??.

Second, with the unbalanced panel data, for an industry-location $j$-$o$ that has appeared at least twice (i.e. has at least 1 firm in at least 2 years), we rely on the free-entry condition Eq.(8) to solve for $c_{jo} f_{jo}$ and $c_{jo} e_{jo}$.23 For example, observing total revenue $R_{jo}$ and mass (number) of firms $S_{jo}$ in years 1 and 2, we write (omitting $j$-$o$ subscripts and using $y \in \{1, 2\}$ to indicate years; here $S_y$ stands for the observed mass of producing firms, $h_o S_o$ in Eq.(8)),

$$\frac{R_1}{\sigma} = S_1 c f + c E, \quad \frac{R_2}{\sigma} = S_2 c f + c E \ \Rightarrow \ f = \frac{1}{c \sigma} \, \frac{R_2 - R_1}{S_2 - S_1}, \quad E = \frac{R_1}{c \sigma} - S_1 f.$$

For industry-locations that appear in $T > 2$ years, we pick $\binom{T}{2}$ 2-year pairs, solve the system of equations above for each pair, and take the average for $f$ and $E$.24

Now we specify the parameters to recover from structural estimation, conducted at industry-year level.

Parameters used to convert travel time to trade cost, $\rho_j$ and $\lambda_j$ as in Eq.(9). As discussed, this time-cost relationship likely differs by industry.

Given the importance of international trade for China in our sample period, particularly since China joined the WTO in 2001, it is necessary to incorporate the market access effects due to the rest of the world (RoW). Without firm-level data from RoW, we assume a point-like RoW that all Chinese prefectures trade with, and use a prefecture's travel time to the nearest port as its travel time to RoW.25 This gives us two additional parameters: CMA-adjusted total expenditure $X/\mathrm{CMA}$ and FMA-adjusted total revenue $R/\mathrm{FMA}$ for RoW.

Having specified all parameters, we proceed to structural estimation based on Sec.(3). For each industry and year, we estimate $\Psi_j = \left\{ \rho, \lambda, \left( \frac{X}{\mathrm{CMA}} \right)_{RoW}, \left( \frac{R}{\mathrm{FMA}} \right)_{RoW} \right\}_j$ as follows.

1. Compute the inter-prefecture trade cost matrix, $T$.

2. Compute $\{\mathrm{CMA}\}_j$ following Eq.(5). $\{\mathrm{CMA}\}_j$ then gives $\{\mathrm{FMA}\}_j$ by Eq.(4).26

3. $f_{jo}$ and $\mathrm{FMA}_{jo}$ then give $\varphi_{jo}^*$ by Eq.(6). Under the assumption of a Pareto distribution of firm productivity in $j$-$o$, local mean productivity is $\bar{\varphi}_{jo} = \frac{\theta_{jo} \varphi_{jo}^*}{\theta_{jo} - 1}$ (this requires $\theta_{jo} > 1$).

4. The true $\Psi_j$ will meet the moment condition

$$E[y(\Psi_j)] = E[\tilde{\bar{\varphi}}(\Psi_j) - \bar{\varphi}] = 0,$$

where $\sim$ indicates moments from simulated data. We thus seek $\hat{\Psi}_j$ that achieves

$$\hat{\Psi}_j = \arg\min_{\Psi_j} \left\{ y(\Psi_j)^{\prime} \, W \, y(\Psi_j) \right\}, \tag{11}$$

where a location's weight in the weighting matrix $W$ is proportional to its mass/number of firms. We follow Eaton et al. (2011) to calculate standard errors by bootstrapping, taking into account both sampling error and simulation error.

5.2 Model Fit

This subsection summarizes parameter estimates from the previous subsection and presents evidence that this simple model fits well the actual geographic distribution of firm production.

Figure 3(A) plots the estimated $\rho$, which relates travel time to trade cost, against weight-to-value ratios, across industries in [...]. Weight-to-value ratios, which measure the weight of an industry's output per unit value, serve as a proxy of the price of transportation per unit value.27 For industries that pay a higher price for transportation, a decrease of one day in travel time likely means a larger decrease in trade cost. Indeed, Figure 3(A) shows a strong positive correlation between the estimated $\rho$ and weight-to-value ratios. Since the weight-to-value ratios based on U.S. commodity flows may deviate from their counterpart in Chinese data, we also obtain the proportion of total expenses spent on road transport from the 2002 Chinese national input-output table, and conduct a similar exercise. As shown in Figure 3(B), the strong positive correlation between $\rho$ and the share of road transport expenses remains.28

With the estimated $\Psi$ in hand, we compute CMA following Eq.(5), then FMA by Eq.(4), and finally industry-location total revenue $R$ by Eq.(2). Since we do not target $\{R\}$ when searching for $\Psi$ that satisfies the key moment condition (11), comparing the model-simulated $\{\tilde{R}\}$ with the actual $\{R\}$ in data will shed light on how well the model fits the geographic distribution of production. Figure 4 plots the simulated $\{\tilde{R}\}$ against the actual (both in natural logs), based on 1998 firm-level data (first year in our sample period). The simulated $\{\tilde{R}\}$ match the actual quite well, with a slope almost equal to 1 and a small and insignificant constant term. The match becomes worse (1) at the very high end, due to a small number of extremely large outliers in the firm productivity distribution; (2) at the low end, probably since the assumed Pareto distribution of firm productivity typically features a higher density than actual at the low end, but better approximates the actual distribution as one moves toward the high end.

Table 4 shows similar patterns about the actual vs. simulated geographic distribution of firm production, also based on 1998 data.

19 Without inter-prefecture trade data, we can't directly estimate the CES elasticity as in Broda and Weinstein (2006). For robustness checks, we try different values of $\sigma \in [3, 5]$. For an industry, a larger $\sigma$ implies more intense competition between firms.
20 For this step, location refers to province, which typically contains 8-16 prefectures, so that there are sufficient firms in each industry-location and we obtain more robust estimates.
21 Alternatively, maximum likelihood gives the following estimates: $\hat{\underline{\varphi}}_{jo} = \min\{\varphi_{jo\omega}\}$ and $\hat{\theta}_{jo}^{-1} = \text{Mean}[\ln \varphi_{jo\omega}] - \ln \hat{\underline{\varphi}}_{jo}$ (10). This approach is quite sensitive to outliers of firm productivity.
22 Around 6.2% of estimated $\theta$'s are smaller than 1, often from industry-locations with few firms. In this case, we replace the $\theta$'s with the national mean of the industry.
23 We ignore industry-locations that have appeared only once in the sample period, which account for a tiny fraction of output. For instance, industry-prefectures that only appear in 2004 and not other years account for 0.7% of the total output in [...].
24 About 7.4% of estimated $f$'s and 9.0% of $E$'s are negative, often from industry-locations with few firms. In this case, we replace the $f$'s and $E$'s with the respective national mean of the industry.
25 9 ports that handle the largest volume of international trade in 2001 (Baum-Snow et al., 2017): Dalian, Qinhuangdao, Tianjin, Qingdao, Lianyungang, Shanghai, Ningbo, Guangzhou, and Shenzhen.
26 $\{\mathrm{CMA}\}_j$ denotes the set of CMA's (of every location) for industry $j$.
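The chain used for the model-fit check (CMA from Eq. (5), FMA from Eq. (4), then revenue from Eq. (2)) can be sketched as follows. This is a simplified illustration, not the authors' code: it treats firms as a finite list of productivity draws per location, assumes a common unit cost `c` across locations as in Sec. 5.1, omits the rest-of-the-world terms, and uses hypothetical names and numbers throughout.

```python
import numpy as np

def simulate_revenue(phi_by_location, tau, X, sigma, c=1.0):
    """Return (CMA, FMA, revenue) by location for one industry.

    phi_by_location: list of arrays of firm productivities, one per origin.
    tau[o, d]: iceberg trade cost; X[d]: expenditure by destination.
    """
    tau = np.asarray(tau, dtype=float)
    X = np.asarray(X, dtype=float)
    k = (sigma / (sigma - 1.0)) ** (1.0 - sigma)
    # Sum over local firms of (phi / c)^(sigma - 1), the inner sum in Eq. (5)
    mass = np.array([np.sum((np.asarray(p, dtype=float) / c) ** (sigma - 1.0))
                     for p in phi_by_location])
    K = k * tau ** (1.0 - sigma)
    cma = K.T @ mass            # Eq. (5)
    fma = K @ (X / cma)         # Eq. (4)
    revenue = mass * fma        # discrete counterpart of Eq. (2)
    return cma, fma, revenue

# Hypothetical two-location example
phi_by_location = [np.array([1.0, 1.5, 3.0]), np.array([1.2, 2.0])]
tau = np.array([[1.0, 1.4], [1.4, 1.0]])
X = np.array([50.0, 70.0])
cma, fma, revenue = simulate_revenue(phi_by_location, tau, X, sigma=4.0)
```

In this closed sketch, simulated revenues add up to total expenditure by construction, so a comparison with actual revenues is informative about the distribution of production across locations rather than its aggregate level.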
In Panel A, within each of the 29 2-digit industries, we rank prefectures that have firms in this industry by total revenue, and examine the prefectures in different size groups (Top 10 vs. Bottom 40). Again, our model matches the actual data well in this respect: for an industry, the 10 prefectures with the most revenue in data are almost always (97.3% of the time) also among the Top 10 according to the model. This mean decreases to 83.5%, with a greater standard deviation, when it comes to the Bottom 40 prefectures. Panel B shows a similar story about the geographic concentration of firm production. Again across industries, the shares of revenue accounted for by the Top 5 and 10 prefectures are almost identical between simulated and actual data. Nonetheless, the match deteriorates toward the low end for Bottom 20 and 40 prefectures, but the bottom prefectures typically accounted for less than 0.5% of the total revenue of an industry. Hence we feel confident that our simple model does a decent job approximating the geographic distribution of the bulk of firm production, and will generate meaningful counterfactuals when we estimate the general equilibrium effect of highway expansion in Sec.7.

27 Industry-level weight-to-value ratios, in 1k U.S. dollars per metric ton, come from 2007 U.S. Commodity Flow Surveys. See Duranton et al. (2014) for details. We can only match 22 of 29 2-digit industries in CIFD to sectors in CFS.
28 Note that, if an industry spends a large fraction of total expenses on road transport, it can mean either a high price of transport per unit value (e.g. heavy output), or a large quantity of transport services used (e.g. producers are often far away from consumers).

6 Reduced-Form Evidence

This section reports reduced-form evidence about how a location's market access affects local firm productivity and production. To the extent that the evidence supports the theoretical predictions in Sec.3, this section provides further confidence in the model's fit.

Before presenting regressions, we note that a location's productivity distribution and production are co-determined in equilibrium together with its market access, generating endogeneity bias if we simply regress local productivity and production outcomes on market access measures. Thus for regressions in this section, we exclude each industry-location $j$-$o$'s own contribution, $\left( \frac{R}{\mathrm{FMA}} \right)_{jo}$ and $\left( \frac{X}{\mathrm{CMA}} \right)_{jo}$, respectively from its CMA and FMA (recall Eq.(3) and (4)). These terms are usually so small that this adjustment hardly affects our main results. Still, because changes in one location's market access will simultaneously induce changes in other locations' market access, the point estimates on market access measures do not identify an average treatment effect, but rather empirically characterize the relationship between market access and outcomes of interest in equilibrium.

Table 5 presents summary statistics at industry-prefecture-year level. Firm productivity and production outcomes, such as mean productivity, number of firms, total revenue, etc., all increased dramatically from 1998 to 2007, and so did CMA.
The increase in FMA is relatively modest, due to the considerable increase in CMA and a smaller increase in expenditure (recall Eq.(4)).

6.1 Market Access and Industry-Location Mean Productivity

We first examine how market access affects the productivity distribution at industry-location level. Recall that Eq.(6) predicts that the local productivity cutoff increases in CMA but decreases in FMA. Since the minimum productivity in an industry-location is measured with much noise (extreme outliers are common), we investigate the empirical relationship between market access and local mean productivity using the following regression,

$$\bar{\varphi}_{jot} = b_1 \mathrm{CMA}_{jot} + b_2 \mathrm{FMA}_{jot} + \epsilon_{jot}, \tag{12}$$

where $\bar{\varphi}_{jot}$ is the mean productivity in industry-prefecture-year $j$-$o$-$t$, and all variables are in natural logs. We include industry*prefecture and industry*year fixed effects, and cluster standard errors at prefecture level.

Table 6(A) presents the results. In Column (1), we use the basic definition of CMA and FMA based on Eq.(5) and (4). Conditional on FMA, CMA exhibits a strong positive correlation with local average productivity, with an increase of 1 standard deviation (s.d.) in CMA corresponding to about 0.24 s.d. increase in local mean productivity. In contrast, conditional on CMA, an increase of 1 s.d. in FMA is associated with 0.18 s.d. decrease in local mean productivity. Figures 4(A) and 4(B) show these empirical patterns visually. Column (2) replicates Column (1), but uses output-weighted mean productivity as the dependent variable. This puts higher weight on productive firms, due to the strong positive correlation between firm productivity and size. The empirical patterns in Column (1) remain in Column (2).

A key endogeneity concern is that local economic shocks might have affected both market access and firm productivity and production, for instance, if prefectures with promising prospects for industrial growth were given priority to receive highway connection. We first show that, in our sample period, a prefecture's industrial growth in total industrial output, value added, employment, etc. does not predict when it was connected to highway networks (see Appendix A.??).
We then develop two alternative measures of market access: (1) distant neighbor, which excludes prefectures within 300km of travel distance from the prefecture in question, and is thus less likely to be correlated with local economic conditions; (2) baseline mass, which uses contemporary travel time but sticks with baseline (1998) economic mass (firm productivity, output, etc.), and thus alleviates the bias due to the endogenous response of local economic mass to changes in trade costs. In Table 6(A), Columns (3)-(6) replicate Columns (1)-(2), but use the two alternative measures of market access, and the strong correlations between local mean productivity and market access are insensitive to alternative measures. Note that, since economic mass and market access are determined jointly in equilibrium involving all locations in the network, these alternative measures are not theoretically founded, and merely serve as robustness checks.

Following Table 6(A), Table 6(B) explores heterogeneous effects of market access across industries that rely more or less on road transportation. We categorize the 29 2-digit industries into tertiles based on the share of road transport expenses in total input, and repeat Column (1) of Table 6(A) on these 3 subsamples. From Column (1) to (3), the correlations between local mean productivity and market access become stronger and more significant as the share of road transport expenses rises.

6.2 Market Access and Individual Firm Production

Next, we test Eq.(1), which predicts a positive correlation between an industry-location's FMA and a local individual firm's revenue, conditional on the firm's productivity. The regression specification resembles Eq.(12), but the dependent variable is now individual firm revenue, and we control for individual firm productivity on the RHS. Column (1) in Table 7(A) starts by confirming the well-established positive correlation between firm productivity and size. Remarkably, controlling for a firm's own productivity, its industry-prefecture's market access still strongly affects its revenue. Specifically, 1 s.d. increase in FMA corresponds to 0.32 s.d. increase in firm revenue. Columns (2) and (3) show that these patterns persist if we use alternative market access measures.

As discussed in Sec.3, a decline in trade costs results in competing effects on an individual firm's output: import competition compresses firm output, while export access works against import competition. Which effect dominates likely depends on a firm's productivity: productive firms will gain more from a larger export market than they lose from more intense competition, and the other way round for unproductive firms. Following Table 7(A), Table 7(B) shows that, conditional on firm productivity, the positive correlation between FMA and firm revenue becomes stronger and more significant as firm productivity rises from Column (1) to (3).

6.3 Market Access and Firm Entry & Exit

We proceed to the entry & exit dynamics associated with market access, presenting results in Table 8.
Our firm selection mechanism entails that higher CMA makes unproductive firms more likely to exit, while higher FMA makes it easier for them to stay. Columns (1)-(3) test this hypothesis. Given two years, we set the earlier one (2004 here) as baseline and the later one (2007 here) as endline, and define firm exit as an indicator of a firm being present at baseline but not at endline. We take the long difference between baseline and endline for market access measures, and categorize firms at baseline into tertiles based on a firm's productivity ranking within its industry.

It is clear that market access affects exit decisions mostly among the less productive firms. Column (1) shows that a 1% increase in CMA is related to an increase of 2.1 percentage points, while a 1% increase in FMA is related to a decrease of 1.8 percentage points, in the exit probability of the less productive firms. This pattern diminishes as firm productivity rises in Columns (2) and (3). Column (4) supports the prediction that higher CMA deters, while higher FMA encourages, firm entry. We tag a firm as an entrant if it is present at endline but not at baseline, and count the number of entrants at the industry-prefecture level. In Column (4), a 1% increase in CMA is associated with 0.62% fewer entrants, while a 1% increase in FMA is associated with 0.41% more. The evidence on firm entry and exit suggests that market access works through firm selection.

Since the CIFD excludes very small non-state firms, this paper misses the very low end of the national productivity distribution (given the strong correlation between firm productivity and size). Still, such firms account for a tiny fraction of aggregate output and export (recall Sec.??). Even without such firms, the import competition and export access forces, captured by CMA and FMA respectively, are theoretically relevant and empirically present in the CIFD sample, as this section documents. Missing the very small non-state firms likely results in an under-estimation of the coefficients on market access measures, since firm selection due to import competition and export access effects happens at the lower end of each location's productivity distribution. For instance, an increase in CMA will likely make unobserved low-productivity firms exit.
In summary, the reduced-form evidence on the relationship between market access and local productivity and production outcomes is consistent with our theoretical framework, reassuring us that our simple model provides a good description of the actual trade dynamics and will likely generate meaningful counterfactuals. In comparison with many previous papers investigating the effect of trade cost shocks (see Goldberg & Pavcnik (2016) for a review), we decompose this effect into two, namely import competition and export access, capture them with respective market access measures, and find evidence of their counteracting effects.

7 Counterfactuals: Evolving Highway Networks

7.1 Fixed-Point Iteration

The reduced-form evidence in Sec.6, while informative, is insufficient for estimating the effect of trade cost changes in general equilibrium because of the inter-location spillovers and the interdependence between market access and productivity distribution in the network of locations, as explained at the beginning of Sec.3. The spillovers and interdependence also make it challenging to write moments of interest, for example the mean productivity in an industry after some trade cost shocks, as explicit functions of parameters of the model. Therefore, we conduct fixed-point iteration to characterize counterfactual equilibria: given (observed) initial firm-level and trade cost data, what will the new equilibrium look like if one changes trade costs to a different set?

1. Compute the initial and endpoint iceberg trade cost matrices, $T_I$ and $T_E$.

2. Denote empirical measures based on initial raw data with subscript 0. Assuming everything except trade costs stays the same (in particular, holding the FMAs constant), compute $\{CMA\}_1$ using the endline trade cost $T_E$ by Eq.(3). $\{CMA\}_1$ then gives $\{FMA\}_1$ by Eq.(4).

3. By Eq.(6), we find a new set of local productivity cutoffs $\{\phi\}_1$: for every industry-location $jo$,

$$\frac{\phi_{jo,1}}{\phi_{jo,0}} = \left(\frac{FMA_{jo,1}}{FMA_{jo,0}}\right)^{\frac{1}{1-\sigma_j}},$$

and hence the mass of producing firms

$$\frac{h_{jo,1} S_{jo}}{h_{jo,0} S_{jo}} = \frac{\int_{\phi_{jo,1}} dG_{jo}}{\int_{\phi_{jo,0}} dG_{jo}} = \left(\frac{\phi_{jo,1}}{\phi_{jo,0}}\right)^{-\theta_{jo}}.$$

4. By Eq.(8), total firm revenue satisfies

$$\frac{R_{jo,1} - \sigma_j c_{jo} E_{jo}}{R_{jo,0} - \sigma_j c_{jo} E_{jo}} = \frac{h_{jo,1} S_{jo} f_{jo}}{h_{jo,0} S_{jo} f_{jo}},$$

so that

$$R_{jo,1} = \left(\frac{FMA_{jo,1}}{FMA_{jo,0}}\right)^{\frac{\theta_{jo}}{\sigma_j - 1}} R_{jo,0} + \left[1 - \left(\frac{FMA_{jo,1}}{FMA_{jo,0}}\right)^{\frac{\theta_{jo}}{\sigma_j - 1}}\right] \sigma_j c_{jo} E_{jo}.$$

5. Given trade balance at location level, $\{R\}_1$ gives total expenditure: $X_{jo,1} = \sum_j R_{jo,1}$. $\{R\}_1$ and $\{X\}_1$, together with $\{CMA\}_1$ and $\{FMA\}_1$ computed in Step 2, give $\{CMA\}_2$ and $\{FMA\}_2$ by Eq.(3).

6. Iterate Steps 2-5 until both $\{CMA\}$ and $\{FMA\}$ converge.[30] Denote empirical measures at the fixed point with subscript $F$.
Given $\{\phi\}_F$ and $\{\theta\}$, we draw a number of firms from every $jo$ in proportion to $S_{jo,F}$ and obtain the national distribution of firm productivity and production.

[29] A similar approach has been formalized by Allen & Arkolakis (2014), and applied in Donaldson & Hornbeck (2016) (see Sec.III.B, "Procedure for Counterfactual Simulations", in their online appendix).
[30] Lacking firm-level data from RoW, we assume that $\frac{X}{FMA}$ and $\frac{R}{CMA}$ stay constant for RoW during the iteration.
[31] One may also derive explicit empirical moments to match with raw data, yet aggregating local productivity distributions to the national level quickly becomes messy.
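To make the loop in Steps 1-6 concrete, the following is a minimal sketch for a single industry and a handful of locations. Eq.(3) and Eq.(4) are not reproduced in this section, so the functional forms used for CMA and FMA below are assumptions, as are all function names, the scalar `theta`, and the vector `fixed` (standing in for the term $\sigma_j c_{jo} E_{jo}$). It illustrates the iteration only and is not the implementation used for the reported counterfactuals.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def trade_weights(T: Matrix, sigma: float) -> Matrix:
    """Map iceberg costs T[o][d] >= 1 to weights T[o][d] ** (1 - sigma)."""
    return [[t ** (1.0 - sigma) for t in row] for row in T]


def cma_from_fma(K: Matrix, R: Vector, fma: Vector) -> Vector:
    """ASSUMED form of Eq.(3): CMA_d = sum_o K[o][d] * R_o / FMA_o."""
    n = len(R)
    return [sum(K[o][d] * R[o] / fma[o] for o in range(n)) for d in range(n)]


def fma_from_cma(K: Matrix, X: Vector, cma: Vector) -> Vector:
    """ASSUMED form of Eq.(4): FMA_o = sum_d K[o][d] * X_d / CMA_d."""
    n = len(X)
    return [sum(K[o][d] * X[d] / cma[d] for d in range(n)) for o in range(n)]


def initial_market_access(K: Matrix, R: Vector, X: Vector,
                          n_iter: int = 500) -> Tuple[Vector, Vector]:
    """Solve the two assumed equations jointly on baseline data by alternating updates."""
    fma = [1.0] * len(R)
    cma = cma_from_fma(K, R, fma)
    for _ in range(n_iter):
        cma = cma_from_fma(K, R, fma)
        fma = fma_from_cma(K, X, cma)
    return cma, fma


def update_revenue(R0: Vector, fma0: Vector, fma1: Vector,
                   theta: float, sigma: float, fixed: Vector) -> Vector:
    """Steps 3-4: R1 = g * R0 + (1 - g) * fixed, g = (FMA1 / FMA0) ** (theta / (sigma - 1))."""
    expo = theta / (sigma - 1.0)
    growth = [(f1 / f0) ** expo for f0, f1 in zip(fma0, fma1)]
    return [g * r0 + (1.0 - g) * a for g, r0, a in zip(growth, R0, fixed)]


def counterfactual(T_I: Matrix, T_E: Matrix, R0: Vector, theta: float, sigma: float,
                   fixed: Vector, tol: float = 1e-10, max_iter: int = 1000):
    """Iterate Steps 2-5; returns (R, CMA, FMA, converged)."""
    K_I, K_E = trade_weights(T_I, sigma), trade_weights(T_E, sigma)
    # Baseline market access; X0 = R0 under trade balance with one industry.
    _, fma0 = initial_market_access(K_I, R0, R0)
    cma = cma_from_fma(K_I, R0, fma0)
    R, fma = list(R0), list(fma0)
    for _ in range(max_iter):
        cma_new = cma_from_fma(K_E, R, fma)        # Step 2: hold FMA, update CMA
        fma_new = fma_from_cma(K_E, R, cma_new)    # Step 2: CMA then gives FMA (X = R)
        R = update_revenue(R0, fma0, fma_new, theta, sigma, fixed)  # Steps 3-5
        gap = max(abs(new / old - 1.0)
                  for new, old in zip(cma_new + fma_new, cma + fma))
        cma, fma = cma_new, fma_new
        if gap < tol:                              # Step 6: stop at the fixed point
            return R, cma, fma, True
    return R, cma, fma, False
```

As a sanity check on the sketch, passing the same matrix for `T_I` and `T_E` should leave revenues essentially at `R0`, since no trade cost has changed. The loop is capped at `max_iter` and reports a `converged` flag because convergence is not guaranteed for arbitrary parameter values.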

Note that this method rules out the extensive margin at the industry-location level. Baseline zeros, i.e. industry-locations without a single firm at baseline, may see entrants as highway networks expand. Yet we do not observe the mass of firms or total revenue of these baseline zeros, so the above method is inapplicable to them. Still, baseline zeros do not seem to matter much for our counterfactual characterization. At baseline in 1998, there were 6,672 non-zeros and 1,561 zeros, 273 of which became non-zeros but accounted for only 0.82% of total output in 2007. In other words, the sizable growth in firm production between 1998 and 2007 comes largely from industry-locations that already had producing firms in 1998, which supports the fixed-point iteration approach above.

7.2 Counterfactual: 1998 Firms, 2007 Highways

A key question of policy significance is: how much did the large-scale highway expansion contribute to the dramatic productivity and output growth observed between 1998 and 2007 (recall Table 1(A))? In addition to aggregate growth, what about inclusive growth across regions, given that the highways connected relatively developed and underdeveloped prefectures? To answer these questions, we apply 2007 travel time to 1998 firm data, and examine the resulting equilibrium in which only the market access forces associated with highway expansion are at work.

Panel A of Table 9 compares this counterfactual equilibrium to the actual 1998 data. Overall, the highway construction between 1998 and 2007 raised aggregate productivity and, less powerfully, reduced productivity dispersion. Production became slightly less concentrated in the largest prefectures, and aggregate firm mass and revenue both increased substantially. This suggests the importance of both the import competition and export access effects of highway connection.
On the one hand, unproductive firms in the least productive prefectures (where average firm productivity was very low to begin with) were forced to exit under intense import competition, resulting in higher aggregate productivity and smaller productivity dispersion. On the other hand, more productive prefectures saw a large number of entrants due to market expansion. While the entrants were less productive than their local incumbents, they were still more productive than the exiting firms in very unproductive prefectures. This massive entry then had a relatively small effect on the aggregate productivity level, but a much larger effect on the aggregate number of firms and revenue. The growth of the most productive incumbents also contributed considerably to aggregate revenue growth.

We then compare this counterfactual equilibrium to the actual 2007 data, presenting the results in Panel B of Table 9. Comparison between the counterfactual and the actual changes shows that the firm selection mechanism fueled by highway construction alone accounts for about 24% of the observed productivity growth, and almost 40% of the reduction in productivity dispersion. Highway-induced changes in market access are less powerful in explaining the observed more-than-doubling of the number of firms and the 4-fold increase in aggregate output, accounting for around 16% of the actual changes.

The aggregate patterns mask enormous heterogeneity across industries and locations. In Table ??, we examine how the counterfactual changes vary by industry and location characteristics at baseline (in this case, 1998 data). Productivity dispersion decreases more for industries with a higher share of output in (often inefficient) state-owned enterprises, suggesting that the import competition effect is stronger where there are initially many inefficient, close-to-exit firms. Highly productive prefectures tend to see a large number of entrants and a sizable increase in total revenue, suggesting that the export access effect dominates at the higher end of the productivity distribution. In contrast, unproductive prefectures tend to see more of their firms exit, and experience a decrease in total revenue.

We keep in mind that highway expansion was among a myriad of factors that contributed to the observed evolution of productivity and production distributions. During the decade, surviving individual firms experienced substantial productivity growth (Brandt et al., 2012) by learning from the best performers, both domestic and abroad, and by investing in R&D themselves. The Chinese economy became increasingly market-oriented as the government reformed SOEs and pushed for privatization (Chen et al., 2016).
China's admission to the WTO also allowed domestic firms to interact with global markets in more complex ways than our model assumes. Given that more forces were at work than those enumerated here, it is encouraging that our simple model explains a significant part of the actual evolution of the Chinese manufacturing sector.

8 Conclusion

This paper develops a market access approach to studying the geographic distribution of firm productivity and production in an economy. In the Melitz framework, we distinguish between two distinct effects of trade cost changes, import competition and export access, and derive the CMA and FMA measures to capture each effect respectively. We empirically examine how market access forces shape the distribution of firm productivity and production across locations in China's manufacturing sector, taking advantage of the dramatic expansion of highway networks from 1998 to 2007 that brought about a substantial reduction in inter-regional trade costs. Consistent with the theoretical framework, we find strong reduced-form evidence that, conditional on FMA, CMA raises local average productivity, shrinks local firms' output, and makes unproductive firms more likely to exit, while FMA works in the opposite direction. We use our model to recover counterfactuals, and find that a significant proportion of the observed productivity growth and reduction in productivity dispersion is attributable to the massive highway construction.

This paper sheds light on an under-researched role of transportation infrastructure in the course of economic development: lifting inter-regional trade barriers and fostering economic integration in a national market. Our results suggest that a low level of integration may explain the greater productivity dispersion in developing countries than in developed countries, as documented in Hsieh & Klenow (2009), and point to higher marginal returns to transportation infrastructure investments in less economically integrated countries or regions. We emphasize that economic integration generates competing effects on local firms, and that firm heterogeneity is essential to evaluating the net benefit. Our market access approach highlights the importance of examining economic integration in general equilibrium, taking into account how local economic mass can endogenously respond to lower trade costs, and how such responses propagate through the network of geographic locations in an economy.

References

Allen, Treb, & Arkolakis, Costas. Trade and the Topography of the Spatial Economy. Quarterly Journal of Economics, 129(3).
Allen, Treb, & Arkolakis, Costas. The Welfare Effects of Transportation Infrastructure Improvements. Working Paper.
Asian Development Bank. Retrospective Analysis of the Road Sector.
Atkin, David, & Donaldson, Dave. Who's Getting Globalized? The Size and Implications of Intra-national Trade Costs. Working Paper.
Banerjee, Abhijit, Duflo, Esther, & Qian, Nancy (March). On the Road: Access to Transportation Infrastructure and Economic Growth in China. Working Paper. National Bureau of Economic Research.
Baum-Snow, Nathaniel. Did Highways Cause Suburbanization? Quarterly Journal of Economics, 122(2).
Baum-Snow, Nathaniel, Henderson, J. Vernon, Turner, Matthew A., Zhang, Qinghua, & Brandt, Loren. Does Investment in National Highways Help or Hurt Hinterland City Growth? Working Paper.
Baum-Snow, Nathaniel, Brandt, Loren, Henderson, J. Vernon, Turner, Matthew A., & Zhang, Qinghua. forthcoming. Roads, Railroads and Decentralization of Chinese Cities. Review of Economics and Statistics.
Bernard, Andrew B., Jensen, J. Bradford, Redding, Stephen J., & Schott, Peter K. Firms in International Trade. Journal of Economic Perspectives, 21(3).
Bernard, Andrew B., Jensen, J. Bradford, Redding, Stephen J., & Schott, Peter K. The Empirics of Firm Heterogeneity and International Trade. Annual Review of Economics, 4(1).
Bloom, Nicholas, Eifert, Benn, Mahajan, Aprajit, McKenzie, David, & Roberts, John. Does Management Matter? Evidence from India. Quarterly Journal of Economics, 128(1).
Brandt, Loren, Biesebroeck, Johannes Van, & Zhang, Yifan. Creative accounting or creative destruction? Firm-level productivity growth in Chinese manufacturing. Journal of Development Economics, 97(2).
Broda, Christian, & Weinstein, David E. Globalization and the Gains From Variety. Quarterly Journal of Economics, 121(2).
Caliendo, Lorenzo, & Parro, Fernando. Estimates of the Trade and Welfare Effects of NAFTA. Review of Economic Studies, 82(1).
Chen, Yuyu, Igami, Mitsuru, Sawada, Masayuki, & Xiao, Mo. Privatization and Productivity in China. Working Paper.
Donaldson, Dave. forthcoming. Railroads of the Raj: Estimating the Impact of Transportation Infrastructure. American Economic Review.
Donaldson, Dave, & Hornbeck, Richard. Railroads and American Economic Growth: A Market Access Approach. Quarterly Journal of Economics, 131(2).
Duranton, Gilles, & Turner, Matthew A. Urban Growth and Transportation. Review of Economic Studies, 79(4).
Duranton, Gilles, Morrow, Peter M., & Turner, Matthew A. Roads and Trade: Evidence from the US. Review of Economic Studies, 81(2).
Eaton, Jonathan, & Kortum, Samuel. Technology, Geography, and Trade. Econometrica, 70(5).
Eaton, Jonathan, Kortum, Samuel, & Kramarz, Francis. An Anatomy of International Trade: Evidence From French Firms. Econometrica, 79(5).
Ellison, Glenn, & Glaeser, Edward L. Geographic Concentration in U.S. Manufacturing Industries: A Dartboard Approach. Journal of Political Economy, 105(5).
Faber, Benjamin. Trade Integration, Market Size, and Industrialization: Evidence from China's National Trunk Highway System. Review of Economic Studies, 81(3).
Fang, Wanli. Dispersion of Agglomeration through Transport Infrastructure. Ph.D. dissertation, MIT Department of Urban Studies and Planning.
Gandhi, Amit, Navarro, Salvador, & Rivers, David. On the Identification of Production Functions: How Heterogeneous is Productivity. Working Paper.
Ghani, Ejaz, Goswami, Arti Grover, & Kerr, William R. Highway to Success: The Impact of the Golden Quadrilateral Project for the Location and Performance of Indian Manufacturing. Economic Journal, 126(591).
Glaeser, Edward L., & Xiong, Wentao. Urban Productivity in the Developing World. Oxford Review of Economic Policy, 33(3).
Goldberg, Pinelopi K., & Pavcnik, Nina. The Effects of Trade Policy. NBER Working Paper.
Hsieh, Chang-Tai, & Klenow, Peter J. Misallocation and Manufacturing TFP in China and India. Quarterly Journal of Economics, 124(4).
Hummels, David L., & Schaur, Georg. Time as a Trade Barrier. American Economic Review, 103(7).
Limao, Nuno, & Venables, Anthony J. Infrastructure, Geographical Disadvantage, Transport Costs, and Trade. World Bank Economic Review, 15(3).
Lin, Yatang. Travel costs and urban specialization patterns: Evidence from China's high speed railway system. Journal of Urban Economics, 98(Supplement C). Urbanization in Developing Countries: Past and Present.
Melitz, Marc J. The Impact of Trade on Intra-Industry Reallocations and Aggregate Industry Productivity. Econometrica, 71(6).
Melitz, Marc J., & Redding, Stephen J. Chapter 1 - Heterogeneous Firms and Trade. Pages 1-54 of: Gopinath, Gita, Helpman, Elhanan, & Rogoff, Kenneth (eds), Handbook of International Economics. Handbook of International Economics, vol. 4, no. Supplement C. Elsevier.
Michaels, Guy. The Effect of Trade on the Demand for Skill: Evidence from the Interstate Highway System. Review of Economics and Statistics, 90(4).
Nagy, David K. City Location and Economic Development. Working Paper.
Puga, Diego. The Magnitude and Causes of Agglomeration Economies. Journal of Regional Science, 50(1).
Redding, Stephen J., & Turner, Matthew A. Chapter 20 - Transportation Costs and the Spatial Organization of Economic Activity. In: Duranton, Gilles, Henderson, J. Vernon, & Strange, William C. (eds), Handbook of Regional and Urban Economics. Handbook of Regional and Urban Economics, vol. 5, no. Supplement C. Elsevier.
Rosenthal, Stuart S., & Strange, William C. Chapter 49 - Evidence on the Nature and Sources of Agglomeration Economies. In: Henderson, J. Vernon, & Thisse, Jacques-François (eds), Cities and Geography. Handbook of Regional and Urban Economics, vol. 4, no. Supplement C. Elsevier.
Tombe, Trevor, & Zhu, Xiaodong (June). Trade, Migration and Productivity: A Quantitative Analysis of China. Working Paper. University of Toronto, Department of Economics.
World Bank. Domestic Trade Impacts of the Expansion of the National Expressway Network in China.

36 Table 1. CIFD Summary Statistics, Table 1(A). Firm Balance Sheet Items This table present national aggregates of the firm-level Chinese Industrial Firms Database (CIFD), Year Firm Employ ment Labor compens ation Capital stock Intermed Value iate input added Industria l output Export , , , , , , , , , , Notes: (1) Real values in trillion CNY (deflated to 1998 level), employment in million. 36

37 Table 1. CIFD Summary Statistics, Table 1(B). Firm Entry & Exit This table counts the number of entrants and exitors every year in the CIFD. A firm is labeled "exitor" if it appears in the current year but not the following year, and is labeled "entrant" when it appears in the sample for the first time and reports a starting year no more than two years earlier. Year Total Continuing Exiting Incumbents Entrants , ,083 19, , ,450 18, ,350 6, , ,514 22, ,413 7, , ,892 18, ,562 12, , ,314 17, ,660 11, , ,424 35, ,253 15, , ,632 32, ,425 45, , ,045 22, ,680 19, , ,291 26, ,435 25, , ,710 29,076 Notes: (1) N (10-year balanced panel) = Firms that exit and later re-enter the sample (less 1% of all firms) are considered to be operating throughout. 37

38 Table 1. CIFD Summary Statistics, Table 1(C). Productivity Dispersion This table shows summary statistics of firm-level Levinsohn-Petrin productivity (in natural logs). For each of the 29 2-digit industries, we estimate a production function. Within each industry, we rank firms by productivity, take some statistic of the productivity distribution, and then average across industries. (ln) Levinsohn-Petrin output productivity Unweighte d mean Weighted mean S.D. IQR p90-p Note: (1) "Weighted mean" refers to mean productivity weighted by firm output. 38

Table 2. Productivity Dispersion: Variance Explained
This set of tables explores what explains between-firm productivity dispersion, and shows the cumulative percent of variance explained (CPVE). "Prefecture": individual prefecture dummies; "ownership": type-of-ownership dummies (state-owned, private); "age" and "size": dummies for respective quartiles.

Table 2(A). No Industry-Specific Effects
For all industries, regress firm productivity on each set of dummies, and take the adjusted R-squared.
Year Prefecture Ownership Age Size
% 0.08% 0.02% 0.05%
% 0.07% 0.04% 0.06%
% 0.06% 0.04% 0.06%
% 0.09% 0.11% 0.05%
% 0.11% 0.09% 0.05%
% 0.13% 0.09% 0.06%
% 0.05% 0.09% 0.08%
% 0.07% 0.07% 0.08%
% 0.06% 0.07% 0.15%
% 0.06% 0.05% 0.22%

Table 2(B). Industry-Specific Effects
For each industry, regress firm productivity on each set of dummies, and take the adjusted R-squared. Then take the mean adjusted R-squared across industries.
Year Prefecture Ownership Age Size
% 1.5% 2.8% 16.2%
% 1.7% 2.5% 17.1%
% 1.9% 2.4% 17.3%
% 1.8% 1.9% 17.0%
% 1.9% 1.6% 17.7%
% 1.4% 1.3% 17.8%
% 1.6% 0.9% 14.6%
% 1.1% 0.6% 18.3%
% 1.0% 0.5% 18.8%
% 0.8% 0.5% 18.6%
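The "variance explained" statistic in Table 2 can be illustrated with a short sketch. Regressing an outcome on a full set of group dummies yields the group mean as the fitted value, so the adjusted R-squared can be computed directly from group means. The function names and the toy data below are hypothetical, and the sketch handles one set of dummies at a time and does not reproduce the cumulative ordering used in the table.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def adjusted_r2_from_dummies(values: Sequence[float],
                             groups: Sequence[Hashable]) -> float:
    """Adjusted R-squared of an OLS regression of `values` on a full set of group dummies."""
    n = len(values)
    by_group: Dict[Hashable, List[float]] = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    k = len(by_group)  # number of estimated group means (intercept included)
    if n <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(values) / n
    sst = sum((v - grand_mean) ** 2 for v in values)
    # Fitted value for each observation is its own group's mean.
    ssr = sum((v - sum(vs) / len(vs)) ** 2 for vs in by_group.values() for v in vs)
    r2 = 1.0 - ssr / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


def mean_adjusted_r2_by_industry(rows: Sequence[Tuple[Hashable, Hashable, float]]) -> float:
    """Table 2(B)-style statistic: run the dummy regression per industry, then average.

    Each row is (industry, prefecture, productivity).
    """
    per_industry: Dict[Hashable, Tuple[List[float], List[Hashable]]] = {}
    for industry, prefecture, productivity in rows:
        vals, grps = per_industry.setdefault(industry, ([], []))
        vals.append(productivity)
        grps.append(prefecture)
    scores = [adjusted_r2_from_dummies(v, g) for v, g in per_industry.values()]
    return sum(scores) / len(scores)
```

When the groups explain everything, the statistic equals 1. When group means coincide with the grand mean, the raw R-squared is 0 and the adjustment pushes the statistic below zero, which is why very small positive values such as those in Table 2(A) are best read as "essentially no explanatory power".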

40 Table 3. Road Travel Distance and Time This table provides summary statistics of road travel time and distance between Chinese prefecture pairs, in the first and last year of our sample period. 287 prefecture-level cities, prefecture pairs every year. Year 1998 Mean Median SD p10 p90 Travel distance (km) Travel time (h) Avg. speed (km/h) Year 2007 Mean Median SD p10 p90 Travel distance (km) Travel time (h) Avg. speed (km/h)

41 Table 4. Industry-Prefecture Total Revenue, Data vs. Model This set of tables compare the industry-prefecture total revenue between actual data and model-simulated data, for the year Table 4(A): Size Groups Within each of the 29 2-digit industries, we rank prefectures by total revenue, and record the set of prefectures in a certain size group (Top 10, Bottom 40) in simulated data. We then compute the share of these prefectures that fall in the same size group in actual data, and show summary statistics of this share across industries. mean median s.d. IQR % in size group Top Bottom Table 4(B): Geographic Concentration of Production Within each of the 29 2-digit industries, we rank prefectures by total revenue, and record how much prefectures in a certain size group (largest 5, smallest 20, etc.) account for the industry's aggregate revenue. We show summary statistics of this revenue share across industries. Data Model mean s.d. mean s.d. Prefecture share (%) Top Top Bottom Bottom Log-log rank-size (0.012) (0.012) 41

42 Table 5. Industry-Prefecture Summary Statistics This table provides summary statistics at industry-prefecture level. Market access measures are computed based on parameter estimates from the structural model, and exclude home market. Year 1998, N = 6672 Mean Median SD p10 p90 Mean productivity Firm count Total employment 6,744 2,123 14, ,615 Total revenue CMA basic FMA basic Year 2007, N = 6706 Mean Median SD p10 p90 Mean productivity Firm count Total employment 9,455 2,296 29, ,714 Total revenue CMA basic FMA basic Notes: (1) Mean productivity is the mean Levinsohn-Petrin output productivity across firms in an industry-prefecture-year. (2) Total revenue is in billion CNY. 42

Table 6(A). Market Access and Industry-Prefecture Average Productivity
This table presents panel regressions of industry-prefecture mean productivity on market access measures. "Weighted" refers to mean productivity weighted by firm output, across all firms in an industry-prefecture-year. Market access measures are computed in 3 different ways: (1) "basic" follows the basic definitions in the theoretical framework, using all locations in the economy and contemporary economic mass (firm productivity, revenue, etc.); (2) "distant neighbors" excludes prefectures within 300 km of travel distance from the prefecture in question; (3) "baseline mass" restricts the economic mass to baseline (Year 1998).
(1) (2) (3) (4) (5) (6)
Dependent variable: id-pf-yr mean productivity
Basic Distant neighbor Baseline mass
Unweighted Weighted Unweighted Weighted Unweighted Weighted
Consumer market access 0.064*** 0.115*** 0.049*** 0.082*** 0.040*** 0.067***
(0.015) (0.036) (0.014) (0.025) (0.011) (0.023)
Firm market access *** *** *** *** *** ***
(0.007) (0.009) (0.007) (0.008) (0.005) (0.006)
R-squared
N
Notes: (1) All variables in natural logs. (2) Fixed effects: industry*prefecture, industry*year. Standard errors clustered at prefecture level.
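The difference between the three variants described in the table note can be sketched with a deliberately simplified, one-step market access sum. The paper's measures are defined jointly in equilibrium, so the weighting below (economic mass discounted by travel time raised to an assumed elasticity) is an illustrative assumption only, as are the function name, its parameters, and the toy numbers.

```python
from typing import List, Optional, Sequence


def simple_market_access(origin: int,
                         travel_time: Sequence[Sequence[float]],
                         distance_km: Sequence[Sequence[float]],
                         mass: Sequence[float],
                         elasticity: float = 1.0,
                         exclude_within_km: Optional[float] = None) -> float:
    """Sum of other locations' economic mass, discounted by travel time.

    The home market is always excluded. If `exclude_within_km` is given,
    destinations at or within that travel distance are dropped as well
    (the "distant neighbor" variant).
    """
    total = 0.0
    for dest, m in enumerate(mass):
        if dest == origin:
            continue
        if exclude_within_km is not None and distance_km[origin][dest] <= exclude_within_km:
            continue
        total += m * travel_time[origin][dest] ** (-elasticity)
    return total


# Hypothetical 3-prefecture example, viewed from prefecture 0.
time_h: List[List[float]] = [[0.0, 2.0, 5.0], [2.0, 0.0, 4.0], [5.0, 4.0, 0.0]]
dist_km: List[List[float]] = [[0.0, 200.0, 600.0], [200.0, 0.0, 450.0], [600.0, 450.0, 0.0]]
mass_now = [30.0, 10.0, 20.0]       # contemporary economic mass
mass_baseline = [25.0, 8.0, 10.0]   # baseline-year economic mass

basic = simple_market_access(0, time_h, dist_km, mass_now)
distant = simple_market_access(0, time_h, dist_km, mass_now, exclude_within_km=300.0)
baseline = simple_market_access(0, time_h, dist_km, mass_baseline)
```

The "distant neighbor" variant drops the nearby prefecture 1, and the "baseline mass" variant keeps the contemporary travel times but swaps in baseline-year mass, which mirrors the two robustness checks in Columns (3)-(6).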

44 Table 6(B). Market Access and Industry-Prefecture Average Productivity, Heterogeneous Effects This table presents the same regressions as in the preceding Table 6(A), but on 3 subsamples based on an industry's share of road transport expenses in its total input. This variable comes from the 2002 Chinese national input-output table. (1) (2) (3) Dependent variable: id-pf-yr mean productivity % of road transport expenses in total inputs 1st tertile 2nd tertile 3rd tertile Consumer market access - basic 0.023*** 0.057*** 0.083*** (0.011) (0.021) (0.033) Firm market access - basic ** *** (0.012) (0.012) (0.013) R-squared N Notes: (1) All variable in natural logs. (2) Fixed effects: industry*prefecture, industry*year. Standard errors clustered at prefecture level. 44

45 Table 7(A). Market Access and Individual Firm Revenue This table presents panel regressions of individual firm revenue on firm productivity and market access measures. See Table 6(A) for how the 3 types of market access measures are defined. (1) (2) (3) Dependent variable: individual firm revenue Basic Distant neighbor Baseline mass Firm productivity 0.827*** 0.805*** 0.841*** (0.145) (0.184) (0.177) Consumer market access *** (0.021) (0.023) (0.018) Firm market access 0.251*** 0.207*** 0.193** (0.081) (0.073) (0.071) R-squared N Notes: (1) All variable in natural logs. (2) Fixed effects: industry*prefecture, industry*year. Standard errors clustered at prefecture level. 45

46 Table 7(B). MA and Individual Firm Revenue, Heterogen. Effects This table presents the same regressions as in the preceding Table 7(A), but on 3 subsamples based on a firm's productivity. (1) (2) (3) Dependent variable: individual firm revenue Firm productivity tertile within industry-year 1st tertile 2nd tertile 3rd tertile Firm productivity 0.721*** 0.794*** 0.903*** (0.166) (0.209) (0.250) Consumer market access *** (0.029) (0.036) (0.043) Firm market access *** 0.389*** (0.094) (0.077) (0.133) R-squared N Notes: (1) All variable in natural logs. (2) Fixed effects: industry*prefecture, industry*year. Standard errors clustered at prefecture level. 46

Table 8. Market Access, Firm Productivity, and Entry & Exit
This table shows how market access affects firm entry and exit. In Columns (1)-(3), we define "firm exit" as an indicator of an individual firm being present at baseline year (2004 here) but not at endline (2007 here), and take the long difference between baseline and endline for market access measures. We categorize firms at baseline into tertiles based on a firm's productivity ranking within its industry. In Column (4), we tag a firm as "entrant" if it is present at endline but not at baseline, and count the number of entrants at industry-prefecture level.
(1) (2) (3) Firm exit by endline (4) (ln) Number of entrants
Firm productivity tertile within industry-year, at baseline. 1st tertile 2nd tertile 3rd tertile
(LD) (ln) Consumer market access 0.021** 0.008** **
(0.009) (0.004) (0.008) (0.314)
(LD) (ln) Firm market access ** * *
(0.004) (0.003) (0.004) (0.245)
R-squared
N
Notes: (1) Baseline year 2004, endline year 2007. Cross-section regression at endline. (2) LD for long difference (between baseline and endline), LL for long lag (variable at baseline). (3) FE: industry*prefecture for (1)-(3); industry & prefecture for (4). Standard errors clustered at prefecture level.
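The sample construction described in the table header can be sketched as follows: an exit indicator for baseline firms, the set of entrants, and within-industry productivity tertiles at baseline. Firm identifiers, function names, and the tie-breaking rule for tertiles are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def exit_indicator(baseline_ids: Iterable[str], endline_ids: Iterable[str]) -> Dict[str, bool]:
    """True for firms present at baseline but absent at endline."""
    endline = set(endline_ids)
    return {firm: firm not in endline for firm in baseline_ids}


def entrants(baseline_ids: Iterable[str], endline_ids: Iterable[str]) -> Set[str]:
    """Firms present at endline but not at baseline."""
    return set(endline_ids) - set(baseline_ids)


def productivity_tertiles(firms: Dict[str, Tuple[str, float]]) -> Dict[str, int]:
    """Assign each firm a tertile (1 = least productive) within its industry.

    `firms` maps firm id -> (industry, productivity at baseline).
    Ties are broken by sort order, which is an arbitrary choice.
    """
    by_industry: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for firm, (industry, productivity) in firms.items():
        by_industry[industry].append((productivity, firm))
    tertile: Dict[str, int] = {}
    for members in by_industry.values():
        members.sort()
        n = len(members)
        for rank, (_, firm) in enumerate(members):
            tertile[firm] = 1 + (3 * rank) // n
    return tertile
```

Columns (1)-(3) would then regress the exit indicator on the long-differenced market access measures separately for firms with tertile 1, 2, and 3, while Column (4) would use the count of entrants per industry-prefecture.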

Table 9. Highway Network Expansion: Counterfactual
This table presents results from the counterfactual equilibrium that we arrive at by applying 2007 trade cost to 1998 firm data. This counterfactual describes what would have happened if only the highway-induced market access forces had been at work between 1998 and 2007. Panel A compares this counterfactual to 1998 data. Panel B compares this counterfactual to 2007 data, showing how much of the observed evolution of firm productivity and production the highway-induced market access forces can account for.
1998 Data Panel A 1998 firms, 2007 highways Change 2007 Data Panel B Change Counterfactual vs. actual
At industry level, then average across industries
Productivity distribution
Weighted mean %
s.d %
IQR %
p90-p %
Production (revenue) distribution
Top 5 prfc's %
Bottom 20 prfc's %
National aggregate (real, deflated to 1998 level)
Mass of firms 128, , % 287, % 9.2%
Total revenue (trillion CNY) % % 16.0%
Notes: (1) Levinsohn-Petrin productivity in natural logs. (2) In the "Change" and " Change" columns, numbers without % indicate difference in levels, numbers with % indicate changes in percent.
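One natural reading of the "Counterfactual vs. actual" column is the counterfactual change divided by the actual change, both measured from the 1998 baseline. The sketch below assumes that reading; the function name and the numbers in the example are hypothetical and are not taken from Table 9.

```python
def share_explained(baseline: float, counterfactual: float, actual: float) -> float:
    """Fraction of the actual change (actual - baseline) reproduced by the counterfactual."""
    actual_change = actual - baseline
    if actual_change == 0:
        raise ValueError("no actual change to explain")
    return (counterfactual - baseline) / actual_change


# Hypothetical aggregate: 100 at baseline, 116 in the counterfactual, 200 in the endline data.
example_share = share_explained(baseline=100.0, counterfactual=116.0, actual=200.0)
```

Under this reading, the same ratio is obtained whether changes are expressed in levels or in percent of the baseline, since both numerator and denominator are scaled by the same baseline value.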

49 Figure 1: Geography-Related Characteristics across Industries This set of figures show geography-related characteristics across 29 2-digit industries, for the year Figure 1(A): Cumulative % of Variance in Firm Productivity Explained by Prefecture Figure 1(B): Ellison-Glaeser Agglomeration Index across Industries Figure 1: Figure 1 49

50 Figure 2. Highway Networks in China Maps of highway networks in China, based on ACASIAN GIS data. Figure 2(A): Highway Networks in 1998 Figure 2(B): Highway Networks in