The birth of new Clusters


Sebastian Bustos, Ricardo Hausmann, Rodrigo Wagner

[Preliminary - for submission at NEUDC. Neither quote nor circulate]

Abstract

Externalities across related industries have been mentioned as a potential key for economic development. Nonetheless, the literature provides little identified evidence of a causal link between the location of one large firm and the subsequent entry of new related industries into the region. To address that gap, in this paper we use two decades of US county-level data, from 1977 to 1997, and combine them with the Million Dollar Plants (MDP) natural experiment, in which a large industrial plant located in one shortlisted county for arguably exogenous reasons (see Greenstone, Hornbeck, and Moretti, 2010; JPE). After documenting a boom in non-tradable activities, our main result is that the subsequent birth of new industries related to the MDP's industry is 50%-75% higher for treated counties than for the control group. In contrast, the difference between treatment and control groups is economically and statistically insignificant for industries that are unrelated to these arguably exogenous MDP plants. Our preliminary results support the view that today's industries causally impact the type of firms that will be born in the future. There is hysteresis in how regions locate in the "industry-space", so movements in this space can have strategic consequences for subsequent entrants.

JEL: O25, R12, F23. Keywords: clusters, industrial policy, economic growth.

This is a very preliminary draft for submission to the NEUDC conference. The authors acknowledge the excellent work of Carla Tokman and the kind help of Richard Hornbeck, who shared some files with public information on Million Dollar Plants. Of course, their help does not imply endorsement, and the usual disclaimers apply. The authors alone are responsible for errors, opinions and omissions in this paper.
Corresponding author: Sebastian_Bustos@hks.harvard.edu. Affiliations, in author order: Harvard University; Harvard University; Tufts University, University of Chile and CID-Harvard.

1 Introduction

The diversity of related economic activities in an area can potentially spur positive spillovers, like Jacobs' (1969) urbanization externalities or Marshall's (1920) localization externalities. [1] But even if one believes in those channels, the literature still knows very little about how a government with limited tools can dynamically foster more diversity in a geographic area. In fact, existing empirical research on how the presence of one industry affects the location of subsequent industries suffers from Manski's "reflection problem": rather than industry A facilitating the entry of B, a third unobserved factor could be causing both industries to locate in the same place, sometimes with an unknown time lag.

One of the few recent attempts to understand the co-location of industries is Ellison, Glaeser, and Kerr (2010). They explore the cross section of related industries and try to tell apart which economic forces drive agglomeration (e.g. labor pooling, exchange of ideas, input-output relations). They do so by calculating the various types of industry relations in the UK and in non-agglomerated areas of the US, as a way to mitigate reverse causality concerns. Having said that, their paper offers essentially cross-sectional evidence. It remains a big "if" whether there is a time-series relation in which the entry of one firm fosters the entry of related industries.

To answer this question we take advantage of a natural experiment already exploited for other purposes by Greenstone, Hornbeck, and Moretti (2010): the Million Dollar Plants (MDP). In their paper they ask how the entry of a large industrial plant impacts TFP in incumbent plants (and therefore in incumbent industries). Using records from Site Selection magazine, they take advantage of the fact that a large industrial plant located in one shortlisted county for arguably exogenous reasons, so they can compare the treated county that received the plant against a runner-up county that was shortlisted but did not receive it. We use the same natural experiment to explore which new industries emerge after a large plant arrives in a region.

After documenting a boom in non-tradable activities, our main result is that the subsequent birth of new industries related to the MDP's industry is 50%-75% higher for treated counties than for the control group. In contrast, the difference between treatment and control groups is economically and statistically insignificant for industries that are unrelated to these arguably exogenous MDP plants. Our preliminary results support the view that today's industries causally impact the type of firms that will be born in the future. There is hysteresis in how regions locate in the "industry-space", so movements in this space can have strategic consequences for subsequent entrants.

Some authors have previously explored the relationship between clusters and entrepreneurship (see Glaeser and Kerr, 2009; Delgado, Porter, and Stern, 2010). We share with them the interest in new activities, but rather than focusing on new firms, mostly in existing industries, we look for new firms in new industries. On the other hand, Bustos, Gomez, Hausmann, and Hidalgo (2012) predict the emergence of new export industries across countries using industry relationships (in nested matrices) to other existing products. We share with them the interest in understanding the sequencing of new economic activities, but we focus on the causal channel rather than on providing a time-series prediction.

Other papers focus on comparative case studies of a few cities. For example, Klepper (2010) uses data on two canonical cases of clustering (automobiles in Detroit and integrated circuits in Silicon Valley) to explore the dynamics of inter-industry evolution. Our work shares with Klepper (2010) the interest in the dynamics of clustering, but has two important differences. First, we use a more systematic procedure that does not focus on canonical cases, but is general to our sample of counties. This mitigates the hindsight bias of focusing only on winners. Second, we focus not only on clustering in the same industry, but also on clustering of related industries that are important and complementary for the industrial ecosystem but that do not necessarily share the same SIC code. This is especially important for industries that have reached a large scale: it is very unlikely that another aircraft firm starts next door to Boeing's plant in Snohomish County, Washington State.

The rest of the paper is organized as follows. Section 2 describes the data and shows descriptive statistics. Section 3 shows that there is disproportionate birth of related industries after the entry of the MDP in the treated counties, but not in the control counties. The same section then presents various robustness checks on the central result and documents a boom in non-traded industries. Finally, Section 4 concludes with remarks and an invitation to further research.

[1] For a review of these types of externalities see Rosenthal and Strange (2006) and Ioannides (2012, chapter 4). See also Chatterji, Glaeser, and Kerr (2013) for a review on clusters of entrepreneurship and innovation, as well as Delgado, Porter, and Stern (2012).

2 Data and Descriptive Statistics

2.1 Data sources and procedures

Our data combine the County Business Patterns (CBP) with the shortlist of counties from Site Selection magazine, previously collected and used in Greenstone, Hornbeck, and Moretti (2010). [2] We compiled a panel at the industry-county-year level with the number of firms and employment using the 1977-1997 editions of CBP, produced by the Bureau of the Census. The year 1977 was chosen to provide at least 5 years before the first large plant opening reported in Site Selection magazine, while 1997 was the last year in which CBP followed the Standard Industrial Classification (SIC) before moving to the North American Industry Classification System (NAICS). The CBP dataset excludes government and military employment but covers the great majority of the private sector, excluding only agricultural workers, railroad workers and household employment.

Even within this period, CBP used different vintages of the SIC classification. We needed to build a single SIC concordance combining both vintages, since we are interested in new industries and we do not want the change of SIC system to drive our results. From 1977 until 1987 CBP followed the SIC classification of 1972. We first used the concordance tables provided by the Census to convert the SIC-1972 codes into SIC codes following the 1987 classification, but various codes do not have a direct concordance. In fact, we found that some codes reported in the CBP data that supposedly aggregated other codes (i.e. two-digit codes should aggregate the corresponding four-digit codes) accounted for more firms and employment than the sum of the disaggregated codes. We implemented an algorithm to identify these cases. To retain as much information as possible, we created a modified SIC industrial classification following the 1997 version, including some new codes for these cases. As a result, our data on counties has 692 different industries. The concordance, which might be of independent interest, is available from the authors upon request.

Large plant openings were identified using Site Selection magazine. Each issue of Site Selection includes an article titled "Million Dollar Plants" (MDP) describing how firms decided where to locate new plants; each such decision is labeled a "case". These articles always report the county that the firm chose (i.e., the "winner") and the runner-up county or counties (i.e., the "losers"). To ease the exposition we refer interchangeably to winners as the Treatment group and to runners-up as the Control group. This data was collected and first used in Greenstone and Moretti (2004), who in the complete sample had 82 cases of winner counties and 211 runner-up counties. One limitation is that the industry code of the opening plants does not appear in the magazine articles and is not published in Greenstone and Moretti (2004). We performed a search on Lexis/Nexis and other sources, identifying the SIC codes for 68 cases.

As in Greenstone, Hornbeck, and Moretti (2010), we note that some counties ended up being both treatment and control in different cases. We think this is less of a concern here, since we are looking for the entry of new industries and a county could be treated in one SIC industry in one year while being a control in a probably very different industry and a different year. Figure 1 displays a map of all US counties, highlighting MDP counties and whether they are treatment, control or both.

Figure 1. Map of counties in Million Dollar Plants

Combining the two data sets, our final sample is the CBP data restricted to those counties that were either winners or losers of a large plant opening. While we may have up to 21 years of data (1977-1997), we follow the same timing as Greenstone, Hornbeck, and Moretti (2010) and limit the sample to 13 years, namely 7 years before the entry and 5 years after. Table 1 summarizes the dimensions of the panel, which would be balanced were it not for the 2% of observations that are missing because the 13-year window may exceed the 1977-1997 boundaries. We thought that the methodological challenges of extending the sample after 1997, which would require a third correspondence with NAICS, might generate more noise than the benefit of a balanced sample. In almost all regressions we include a case fixed effect, so treatment and control groups are compared in the same years, attenuating the potential biases of a slightly unbalanced panel.

Table 1. Dimensions of the panel

Dimension                          Number
County cases                       151
Industries (SIC 4-digit)           692
Total cross-section (complete)     151 × 692 = 104,492
Time dimension                     13 years; t = {-7, ..., 0, ..., +5}
Panel                              1,333,484 (98% of 104,492 × 13)

[2] We thank Rick Hornbeck for kindly sharing the digitized version of the list of counties.

2.2 Descriptive statistics of the sample

Table 2 provides a basic description of our sample. Each observation is a unique combination of a case s, county c, industry i and event time t, for a total of 1.3 million observations. Among these observations, 46% of the industries exist ($E_{s,c,i,t} = 1$), where by existing we mean that the number of establishments $N_{s,c,i,t}$ is greater than zero. The unconditional average number of establishments $N_{s,c,i,t}$ is 10.2 and the unconditional average number of workers $L_{s,c,i,t}$ is 172. Conditional on the sector existing in that place and time, the average number of establishments is 22.1 and the average number of employees is 373. As expected, the distribution is highly skewed, with the average number of employees conditional on existence being above the 90th percentile of observations.
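The panel dimensions in Table 1 and the conditional averages quoted in Section 2.2 can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures reported in the text; the variable names are illustrative. Dividing an unconditional mean by the share of existing industries gives the conditional mean because establishments and employment are zero whenever the industry does not exist.

```python
# Figures quoted in Table 1 and Section 2.2 (rounded as in the text).
county_cases = 151
industries = 692
window_years = 13            # t = -7, ..., 0, ..., +5
observed_rows = 1_333_484

cross_section = county_cases * industries       # complete cross-section
balanced_rows = cross_section * window_years    # rows of a fully balanced panel
coverage = observed_rows / balanced_rows        # share of the balanced panel observed

share_existing = 0.46        # mean of the existence dummy E
mean_establishments = 10.2   # unconditional mean of N
mean_employees = 172         # unconditional mean of L

# E[N] = E[N | E = 1] * P(E = 1), because N = 0 whenever E = 0.
cond_establishments = mean_establishments / share_existing
cond_employees = mean_employees / share_existing

print(cross_section, balanced_rows, round(coverage, 3))
print(round(cond_establishments, 1), round(cond_employees, 1))
```

Because the inputs are rounded, the conditional averages computed this way match the reported 22.1 and 373 only approximately.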

Table 2. Descriptive statistics of the sample of observations, each one identified by a combination of case, county, industry and year.

stats        N of establishments    Employees          Existence: 1[N > 0]
             $N_{s,c,i,t}$          $L_{s,c,i,t}$      $E_{s,c,i,t}$
mean         10.2                   172                0.46
se(mean)     [...]                  [...]              [...]
count        1,333,484              1,333,484          1,333,484
p[...]       [...]                  [...]              [...]
p[...]       [...]                  [...]              [...]
p[...]       [...]                  [...]              [...]
p[...]       [...]                  [...]              [...]
p[...]       [...]                  [...]              [...]

Note: Conditional on existence, the average $N_{s,c,i,t}$ is 22.1 and the average $L_{s,c,i,t}$ is 373; these come from dividing the unconditional average by the unconditional average existence.

2.3 Measuring co-location proximity of industries

To measure co-location proximity across industries we use the 1977 County Business Patterns for all the counties in the US [3] and calculate the conditional probabilities that two industries i and j are located in the same county. If $\epsilon_i$ is a dummy indicating that industry i exists in a county, we compute both probabilities $E(\epsilon_i = 1 \mid \epsilon_j = 1)$ and $E(\epsilon_j = 1 \mid \epsilon_i = 1)$, taking the expectations across counties in the CBP. We then follow Hausmann and Klinger (2006) and Hidalgo, Klinger, Barabasi, and Hausmann (2007) and compute the co-location proximity index as the minimum of the two conditional probabilities:

\[
\mathrm{Proximity}_{ij} = \min \{ E(\epsilon_i = 1 \mid \epsilon_j = 1),\; E(\epsilon_j = 1 \mid \epsilon_i = 1) \} \in (0, 1) \tag{1}
\]

Future versions of the paper will also include the Ellison-Glaeser (1999) index of co-agglomeration, which tries to capture the same phenomenon.

[3] In future versions we will exclude the MDP counties used in our main sample when computing proximity measures. But since the MDP counties are a small fraction of the total number of counties (see Figure 1), this change seems unlikely to impact our results.
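As an illustration of the co-location proximity index in equation (1), the following is a minimal sketch that assumes the data are arranged as a hypothetical 0/1 matrix `presence` with one row per county and one column per industry. The function name, the toy matrix, and the convention of assigning zero proximity to pairs that include an industry present in no county are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def colocation_proximity(presence: np.ndarray) -> np.ndarray:
    """Minimum of the two conditional co-location probabilities for every industry pair.

    presence: (n_counties, n_industries) array of 0/1 existence dummies.
    """
    presence = presence.astype(float)
    joint = presence.T @ presence      # joint[i, j] = number of counties hosting both i and j
    counts = np.diag(joint)            # counts[i]   = number of counties hosting i
    with np.errstate(divide="ignore", invalid="ignore"):
        p_i_given_j = joint / counts[np.newaxis, :]   # share of j-counties that also host i
        p_j_given_i = joint / counts[:, np.newaxis]   # share of i-counties that also host j
    proximity = np.minimum(p_i_given_j, p_j_given_i)
    # Assumed convention: pairs involving an industry present nowhere get proximity 0.
    return np.nan_to_num(proximity, nan=0.0)

# Toy example: 4 counties (rows), 3 industries (columns).
toy = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
])
prox = colocation_proximity(toy)
print(np.round(prox, 3))
```

Taking the minimum of the two conditional probabilities makes the index symmetric and keeps a ubiquitous industry from appearing close to every other industry merely because it is present in most counties. The diagonal equals one for any industry present somewhere and would normally be ignored.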

2.4 Previous findings of Greenstone, Hornbeck, and Moretti (2010) and their identification

Before turning to our analysis, it is worth emphasizing the findings and identification of Greenstone, Hornbeck, and Moretti (2010), whose empirical strategy we borrow. They had access to micro-data on firms, while for our purposes we only need aggregate data at the county level to understand industry diversity.

Their main finding is an increase in the TFP of incumbent plants. They first show that treatment and control counties had parallel and similar TFP trends in the seven years before the MDP. In contrast, after the MDP, incumbent plants in treated counties experience a change in TFP trend, so that by the fifth year there is a TFP difference of 12% between plants in treated and control counties. They then correlate the TFP gains with various measures of relatedness between incumbent plants in other industries and the MDP's industry. They find that the TFP effect is larger for incumbent plants that share similar labor and technology pools with the new plant. Finally, they find evidence of an increase in wages and other costs, so not all TFP gains go to profits. Our results naturally complement those of Greenstone, Hornbeck, and Moretti (2010) because we focus on the effects of the MDP on a different subset of firms: the ones that were born in new industries after the entry of the MDP.

The identification assumption for their paper and ours is that there would have been no systematic differences in trends between treatment and control groups had there not been a treatment. On top of the parallel trends discussed above, the second piece of evidence supporting their identification assumption is in Table 3 of their paper, where they show that the MDP winner-loser research design balances many (although not all) observable county-level and plant-level covariates. Overall, we defend our identification strategy with the same bottom line they use. Even if some deviations between treatment and control groups make the natural experiment imperfect, the MDP seems, as of today, one of the few available strategies to get close to a causal link.

3 What happens to industrial diversity after a large plant opens in the county?

3.1 Entry of new industries

Here we study how related industries develop after the MDP enters, looking for differences between treatment and control groups. As a first look at the phenomenon, we run a non-parametric regression of the probability that an industry is born at any point $t \geq 1$, instead of remaining nonexistent, on the proximity of that industry to the MDP (see Figure 2). The vertical axis is the mean of a dummy variable $b_{s,c,i}$ that takes the value of one if the industry is born. We only look at industries that did not exist in the county prior to the entry of the MDP, and we also exclude the MDP's industry itself.

In the graph, the difference in industry birth probabilities between treatment and control groups is evident only above a threshold proximity of around 0.15 units, but not below it. This means that the difference shows up only for industries that are related enough to the MDP's industry. To give a sense of this threshold, 0.15 corresponds to the 90th percentile of proximity measured across any two SIC industries in the US, as shown in the Appendix.
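The text does not say which non-parametric smoother underlies Figure 2. Purely as an illustration, the sketch below applies a Nadaraya-Watson kernel regression with a Gaussian kernel, separately for treated and control observations, to synthetic data. The function `kernel_mean`, the bandwidth, and the simulated birth probabilities are all assumptions made for this example and are not the paper's data or estimates.

```python
import numpy as np

def kernel_mean(x, y, grid, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] at each point of `grid` (Gaussian kernel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # weights[g, n] is the kernel weight of observation n at grid point g
    z = (grid[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)
    return (weights @ y) / weights.sum(axis=1)

# Synthetic illustration only: a treatment gap that appears above proximity 0.15.
rng = np.random.default_rng(0)
n = 4000
proximity = rng.uniform(0.0, 0.4, size=n)
treated = rng.integers(0, 2, size=n)
p_birth = 0.10 + 0.10 * treated * (proximity > 0.15)   # made-up birth probabilities
born = rng.random(n) < p_birth

grid = np.linspace(0.02, 0.38, 10)
curve_treat = kernel_mean(proximity[treated == 1], born[treated == 1], grid, bandwidth=0.03)
curve_ctrl = kernel_mean(proximity[treated == 0], born[treated == 0], grid, bandwidth=0.03)
gap = curve_treat - curve_ctrl   # treatment-minus-control birth probability along the grid
```

Plotting `curve_treat` and `curve_ctrl` against `grid` gives the kind of picture described for Figure 2, with the two curves separating only at higher proximity in this simulated example.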

Figure 2. Non-parametric regression of industry birth

After this first visual approximation we explore the same question in the context of parametric models. Table 3 displays linear probability estimates of a regression of birth using the same cross section of case-county-industries. Namely,

\[
b_{s,c,i} = \gamma_0 + \gamma_1 \, Treat_{s,c} + \gamma_2 \, Related_{s,c,i} + \gamma_3 \, [Treat \times Related]_{s,c,i} + \varepsilon_{s,c,i} \tag{2}
\]

in which $Treat_{s,c}$ is a dummy variable for being a treatment county, $Related_{s,c,i}$ is a dummy indicating that the industry has high proximity to the MDP, and $\varepsilon_{s,c,i}$ are error components that are clustered to allow for correlation at the case-county level. Depending on the specification, $\varepsilon_{s,c,i}$ also includes various fixed effects. The parameters of interest are, on the one hand, $\gamma_1$, which gives the treatment effect for industries that are less related to the MDP and, on the other hand, $\gamma_3$, which gives the additional treatment effect on birth when the industry is related to the MDP in terms of proximity. Empirically we defined $Related \equiv 1[Proximity \geq 0.16]$, and we performed some sensitivity analysis.
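Because equation (2) regresses a dummy on two dummies and their interaction, the OLS coefficients are simply combinations of the four cell means of the birth dummy. The sketch below illustrates this on a made-up sample of 16 observations; the function names and the toy data are assumptions for the example, and clustered standard errors and fixed effects are omitted.

```python
import numpy as np

def saturated_lpm(born, treat, related):
    """OLS of born on [1, treat, related, treat*related]; returns (g0, g1, g2, g3)."""
    born = np.asarray(born, dtype=float)
    treat = np.asarray(treat, dtype=float)
    related = np.asarray(related, dtype=float)
    X = np.column_stack([np.ones_like(born), treat, related, treat * related])
    coef, *_ = np.linalg.lstsq(X, born, rcond=None)
    return coef

def cell_means(born, treat, related):
    """Mean of the birth dummy in each (treat, related) cell."""
    born = np.asarray(born, dtype=float)
    treat = np.asarray(treat)
    related = np.asarray(related)
    return {(t, r): born[(treat == t) & (related == r)].mean()
            for t in (0, 1) for r in (0, 1)}

# Made-up data: four observations per cell.
treat   = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])
related = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
born    = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0])

g0, g1, g2, g3 = saturated_lpm(born, treat, related)
means = cell_means(born, treat, related)

# gamma_0 is the control/unrelated mean; gamma_3 is the difference-in-differences of cell means.
did = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])
print(np.round([g0, g1, g2, g3], 3), round(did, 3))
```

In this toy sample the fitted value for every cell equals its cell mean, which is why predictions from such a saturated specification stay inside the unit interval.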

The coefficient $\gamma_3$ in specification (1) of Table 3 shows a 10.9 percentage point increase in the probability of being born (p-value < 0.05). That means that the estimated birth probability for the treatment group is 75% larger than for the control group in those industries that are related to the MDP. In contrast, when we look at the industries that are not related to the MDP we observe no difference: the coefficient $\gamma_1$ is statistically insignificant, with a point estimate whose economic significance is smaller by almost an order of magnitude.

Since this is a case-control study, in specifications (2) to (4) we also add a case fixed effect, so we exploit variation only within each case. Even if our observations are a cross section, they happen at different moments in time. To control for a potential time effect confounding our results, in (3) and (4) we also add a fixed effect for the year in which the MDP arrived. The central results are largely unchanged by these corrections, with point estimates for $\gamma_3$ in the range of 10-11 percentage points. The only difference is that when in (4) we control for 2-digit industry fixed effects we find a significant reduction in the probability of birth for the treatment group in industries that are less related to the MDP, although this effect is still an order of magnitude smaller.

The results so far seem consistent with a view in which the birth of new industries is shaped by the comparative advantage of the region. Industries similar to the MDP are more likely to be born, while industries unrelated to the MDP might be less likely to be born, maybe due to general equilibrium effects like the impact on skill-corrected wages reported by Greenstone, Hornbeck, and Moretti (2010).

The central finding of more births in industries related to the MDP is not surprising given the previous findings of Greenstone, Hornbeck, and Moretti (2010), who showed disproportionate productivity gains in incumbent plants from industries that were related to the MDP. They found effects on existing firms and sectors, while we find an effect on the birth of new sectors.
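To make the step from percentage points to a percent difference explicit, the relative effect for related industries can be written in terms of the coefficients of equation (2). This is a reading of the reported numbers that assumes $\gamma_1$ is approximately zero in specification (1), as the text indicates; the implied control-group baseline is backed out here and is not a figure reported in Table 3.

\[
\frac{\Pr(b = 1 \mid Treat = 1, Related = 1) - \Pr(b = 1 \mid Treat = 0, Related = 1)}{\Pr(b = 1 \mid Treat = 0, Related = 1)}
= \frac{\gamma_1 + \gamma_3}{\gamma_0 + \gamma_2}
\]

With $\gamma_3 = 0.109$ and $\gamma_1 \approx 0$, a relative increase of 75% corresponds to a control-group birth probability for related industries of roughly $\gamma_0 + \gamma_2 \approx 0.109 / 0.75 \approx 0.145$.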

Table 3. Linear probability regressions of the birth of new industries
Dependent variable: 1[Industry is born given it did not exist before MDP]

                                          (1)         (2)         (3)         (4)
Treated; $\gamma_1$                       [...]       [...]       [...]       [...]**
                                          ([...])     ([...])     ([...])     ([...])
1[Related to MDP]; $\gamma_2$             [...]       [...]       [...]       [...]       (* on two of the four estimates)
                                          (0.0421)    (0.0321)    (0.0321)    (0.0368)
Treated × 1[Related to MDP]; $\gamma_3$   0.109**     0.105**     0.105**     0.115***
                                          (0.0487)    (0.0409)    (0.0409)    (0.0433)
Constant; $\gamma_0$                      [...]***    [...]***    0.103***    0.179***
                                          ([...])     (0.0108)    ([...])     (0.0273)
Case s                                                FE          FE          FE
Year of t = 0                                                     FE          FE
2-digit SIC industrial sector                                                 FE
Observations                              46,372      46,372      46,372      46,372
R-squared                                 [...]       [...]       [...]       [...]

Robust standard errors in parentheses, clustered at the interaction of case and county (s,c). *** p<0.01, ** p<0.05, * p<0.1. Observations in this table are a cross section of case s, county c and industry i taken at a moment t > 0. The regression was performed only for sectors that did not exist at $t \leq 0$. If they existed at any time in $t \in (-7, 0)$ the observation is excluded from the sample. The industry of the Million Dollar Plant is excluded from the sample. $R^2$ values increase across specifications because they account for the fixed effects. Following Baltagi (2005), we preferred a linear probability model because it better handles the variety of fixed effects we control for. Moreover, since we are regressing a dummy variable on other dummy variables, the coefficients are just conditional means and no prediction escapes from the (0, 1) interval, so the standard limitation of linear probability models is not relevant for our case.

3.2 Survival of existing industries

After describing the pattern of entry into new industries, we now analyze whether there is differential survival of sectors due to the MDP. Table 4 shows linear probability regressions similar to the ones we ran for birth $b_{s,c,i}$ in equation (2), but now with survival $S_{s,c,i}$ on the left-hand side. The coefficient of the treatment effect for industries related to the MDP (i.e. $\gamma_3$) is always statistically insignificant, with point estimates close to zero. Regarding the existing industries that are farther away from the MDP's industry (i.e. $\gamma_1$), we observe a 1 to 2% lower survival for the treatment group, in line with the results we found in the last specification of the industry-birth regressions in Table 3. Again, this lower survival of existing industries that are less connected to the MDP seems consistent with a general equilibrium story in which the county focuses more on its comparative advantage, with factor prices driving other industries out. Although there is no difference between treatment and control groups, the coefficient $\gamma_2$ on survival is systematically in the range of 12 to 16 percentage points, indicating that survival in industries close to the MDP was higher for both groups.

Table 4. Linear probability regressions of the survival of existing industries
Dependent variable: 1[Industry exists at t = 5 given it existed before MDP]

                                          (1)         (2)         (3)         (4)
Treated; $\gamma_1$                       [...]       [...]       [...]       [...]       (*, *, ** on three of the four estimates)
                                          ([...])     ([...])     ([...])     (0.0101)
1[Related to MDP]; $\gamma_2$             [...]***    0.156***    0.156***    [...]***
                                          (0.0110)    (0.0178)    (0.0178)    (0.0134)
Treated × 1[Related to MDP]; $\gamma_3$   [...]       [...]       [...]       [...]
                                          (0.0269)    (0.0330)    (0.0330)    (0.0245)
Constant; $\gamma_0$                      [...]***    0.853***    0.834***    0.856***
                                          ([...])     (0.0353)    ([...])     (0.0273)
Case s                                                FE          FE          FE
Year of t = 0                                                     FE          FE
2-digit SIC industrial sector                                                 FE
Observations                              52,536      52,536      52,536      52,536
R-squared                                 [...]       [...]       [...]       [...]

Robust standard errors in parentheses, clustered at the interaction of case and county (s,c). *** p<0.01, ** p<0.05, * p<0.1. Observations in this table are a cross section of case s, county c and industry i. The regression was performed only for sectors that existed before the MDP, as stated in the dependent variable. The industry of the Million Dollar Plant is excluded from the sample. $R^2$ values increase across specifications because they account for the fixed effects. See the notes to Table 3 for the choice of a linear probability model.

3.3 Initial conditions: differences before the MDP

One concern is that the winner county may simply have been a better ecosystem of industrial activity than the runner-up county, and that this could be the reason why companies finally decided to locate in the treatment group. To deal with that concern, in Table 5 we look at the conditions before the MDP arrived, finding that there was not more activity in treated counties, at least when looking at industries related to the MDP. In all our specifications $\hat{\gamma}_3$ is statistically insignificant and quantitatively small. This is reassuring because our central finding remains robust: the additional birth around the MDP in the treatment group is not explained by differences in the original density of industries around the MDP.

In addition, specifications (1) to (4) show that both treatment and control groups have a higher density of industries related to the MDP, namely a higher probability of existing if an industry is related to the MDP (0.322 in specifications (2) and (3) of Table 5; p-value < 0.05). Thus, the original conditions mirror the survival results we observed in Table 4. Finally, we also observe that the treatment group has fewer industries among those that are unrelated to the MDP, with $\gamma_1$ ranging between 7 and 9 percentage points of lower probability of existence. If anything, winner counties started with less density of activity in industries far away from the MDP, while they did not have a significantly different density of industries around the MDP. This again supports the view that the additional births we observed in subsection 3.1 could have been caused by the entry of the MDP rather than by better initial conditions.

Table 5. Linear probability regressions of the existence of an industry at t = -1
Dependent variable: 1[Industry exists at t = -1]

                                          (1)         (2)         (3)         (4)
Treated; $\gamma_1$                       [...]**     [...]***    [...]***    [...]***
                                          (0.0329)    (0.0207)    (0.0207)    (0.0207)
1[Related to MDP]; $\gamma_2$             [...]***    0.322***    0.322***    [...]***
                                          (0.0455)    (0.0427)    (0.0427)    (0.0308)
Treated × 1[Related to MDP]; $\gamma_3$   [...]       [...]       [...]       [...]
                                          (0.0846)    (0.0755)    (0.0755)    (0.0527)
Constant; $\gamma_0$                      [...]***    0.452***    0.604***    0.779***
                                          (0.0200)    (0.157)     (0.0727)    (0.0748)
Case s                                                FE          FE          FE
Year of t = 0                                                     FE          FE
2-digit SIC industrial sector                                                 FE
Observations                              104,341     104,341     104,341     104,341
R-squared                                 [...]       [...]       [...]       [...]

Robust standard errors in parentheses, clustered at the interaction of case and county (s,c). *** p<0.01, ** p<0.05, * p<0.1. Observations in this table are a cross section of case s, county c and industry i taken at t = -1. The industry of the Million Dollar Plant is excluded from the sample. $R^2$ values increase across specifications because they account for the fixed effects. See the notes to Table 3 for the choice of a linear probability model.

3.4 A boom in non-tradable industries

So far we have argued that there is disproportionate birth of new industries related to the MDP, in what could be called the emergence of a new cluster. We also observe a small reduction in industry births and a reduction in survival for industries that are not related to the MDP, which seems consistent with the general equilibrium effects discussed before. To complement the story, here we also show that there was additional entry of firms in some non-traded sectors, like construction and services.

Table 6 shows the treatment effect for $t \in (-5, +5)$ on the log number of establishments, aggregating broad sectors. By the 5th year after the MDP, the treatment group shows an increase of 7 percentage points in Construction, 4 pp in Retail, 7 pp in Finance and 4 pp in Services. In contrast, we fail to observe statistically significant changes in the aggregate log number of establishments in Manufacturing, Wholesale trade and Transport, all of them arguably more tradable in nature than the sectors that grew. A similar, although not identical, pattern appears in Table 7, which shows the increase in aggregate employment in some non-tradable sectors after the MDP. One difference is that manufacturing increased its overall employment, although not its number of firms. Overall, the evidence seems consistent with a boom in non-traded sectors.

Table 6. Estimated Treatment Effect (Treatment vs Control) on the log Number of Establishments in various sectors at different event times

VARIABLES      Cons    Manuf   Transport   Whole   Retail   Fin     Serv.
               (1)     (2)     (3)         (4)     (5)      (6)     (7)
before [...]   ***
before [...]   ***
before [...]   ***
before [...]   ** ***
before [...]   **
after [...]    ** * *
after [...]    * * *
after [...]    * ** *
after [...]    ** ** **
after [...]    *** ** * **
Observations   2,635   2,635   2,635       2,635   2,635    2,635   2,635
R-squared      [...]

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Table 7. Estimated Treatment Effect (Treatment vs Control) on the log aggregate Employment in various sectors at different event times

VARIABLES      Cons    Manuf   Transport   Whole   Retail   Fin     Serv.
               (1)     (2)     (3)         (4)     (5)      (6)     (7)
before [...]   **
before [...]
before [...]
before [...]
before [...]
after [...]    **
after [...]    ***
after [...]    * * ***
after [...]    ** *** **
after [...]    * *** **
Observations   2,635   2,635   2,635       2,635   2,635    2,635   2,635
R-squared      [...]

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
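The exact regression behind Tables 6 and 7 is not spelled out in the text. As a rough illustration of what a treatment-versus-control effect on a log outcome at each event time measures, the sketch below computes the difference in mean log establishments between winner and runner-up counties at every event time. The function name and the toy numbers are assumptions for the example; the actual estimates may include controls and fixed effects not shown here.

```python
import numpy as np

def event_time_gaps(log_outcome, treated, event_time):
    """Treatment-minus-control gap in the mean log outcome at each event time."""
    log_outcome = np.asarray(log_outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    event_time = np.asarray(event_time)
    gaps = {}
    for t in np.unique(event_time):
        at_t = event_time == t
        gaps[int(t)] = (log_outcome[at_t & treated].mean()
                        - log_outcome[at_t & ~treated].mean())
    return gaps

# Toy example: one winner and one runner-up county observed at t = -1, 0, 1.
event_time     = np.array([-1, 0, 1, -1, 0, 1])
treated        = np.array([1, 1, 1, 0, 0, 0])
establishments = np.array([100, 100, 107, 100, 100, 100])   # made-up counts

gaps = event_time_gaps(np.log(establishments), treated, event_time)
print({t: round(g, 4) for t, g in gaps.items()})
```

A gap of about 0.07 in logs corresponds to roughly 7% more establishments in the treated county, which is the scale on which the effects in the text are described.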

4 Concluding remarks

Does bringing one industry to a region facilitate the emergence of other related industries? This question on the early formation of clusters has attracted a lot of attention among both economists and policymakers in development. Nonetheless, there is very little evidence of a causal channel through which one industry affects the entry of others. In this paper we used two decades of US county-level data, from 1977 to 1997, and combined them with the Million Dollar Plants (MDP) natural experiment, in which a large industrial plant located in one shortlisted county for arguably exogenous reasons (see Greenstone, Hornbeck, and Moretti, 2010; JPE). After documenting the existence of a boom in non-tradable activities, our main result is that the subsequent birth of new industries related to the MDP's industry is 50%-75% higher for treated counties than for the control group. In contrast, the difference between treatment and control groups is economically and statistically insignificant for industries that are unrelated to these arguably exogenous MDP plants. Our preliminary results support the view that today's industries causally impact the type of firms that will be born in the future. There is hysteresis in how regions locate in the "industry-space", so movements in this space can have strategic consequences for subsequent entrants. The findings in this paper are preliminary and require further research.

References

BALTAGI, B. H. (2005): Econometric Analysis of Panel Data. John Wiley & Sons.

BUSTOS, S., C. GOMEZ, R. HAUSMANN, AND C. A. HIDALGO (2012): The Dynamics of Nestedness Predicts the Evolution of Industrial Ecosystems, PLOS-One.

CHATTERJI, A., E. L. GLAESER, AND W. R. KERR (2013): Clusters of Entrepreneurship and Innovation, Working Paper 19013, National Bureau of Economic Research.

DELGADO, M., M. E. PORTER, AND S. STERN (2010): Clusters and entrepreneurship, Journal of Economic Geography.

DELGADO, M., M. E. PORTER, AND S. STERN (2012): Clusters, Convergence, and Economic Performance, Working Paper 18250, National Bureau of Economic Research.

ELLISON, G., E. L. GLAESER, AND W. R. KERR (2010): What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns, American Economic Review, 100(3).

GLAESER, E. L., AND W. R. KERR (2009): Local Industrial Conditions and Entrepreneurship: How Much of the Spatial Distribution Can We Explain?, Journal of Economics & Management Strategy, 18(3).

GREENSTONE, M., R. HORNBECK, AND E. MORETTI (2010): Identifying Agglomeration Spillovers: Evidence from Million Dollar Plants, Journal of Political Economy.

HAUSMANN, R., AND B. KLINGER (2006): Structural Transformation and Patterns of Comparative Advantage in the Product Space, CID Working Paper Series, Harvard University.

HIDALGO, C. A., B. KLINGER, A. L. BARABASI, AND R. HAUSMANN (2007): The product space conditions the development of nations, Science, 317(5837), 482.

IOANNIDES, Y. M. (2012): From Neighborhoods to Nations: The Economics of Social Interactions. Princeton University Press.

JACOBS, J. (1969): The economy of cities. Random House.

KLEPPER, S. (2010): The origin and growth of industry clusters: The making of Silicon Valley and Detroit, Journal of Urban Economics, 67(1), 15-32, Special Issue: Cities and Entrepreneurship, sponsored by the Ewing Marion Kauffman Foundation.

MARSHALL, A. (1920): Industry and trade, a study of industrial technique and business organization; and of their influences on the conditions of various classes and nations. Macmillan, 3rd edn.

ROSENTHAL, S., AND W. STRANGE (2006): The micro-empirics of agglomeration economies, in A Companion to Urban Economics, ed. by R. Arnott and D. McMillen, chap. 1. Blackwell Publishing.

5 Appendix

Figure 3. Cumulative distribution of proximity across 692 × (692 - 1) pairs of SIC 4-digit industries in the US using CBP data from 1977. Vertical line at 0.15 units of proximity.

Table 8. Was the MDP's industry new in the county?

                                      Treatment
MDP industry existed at t = [...]     0         1         Total
0                                     [...]%    19%       48%
1                                     [...]%    19%       52%
Total                                 [...]%    38%       100%

Pearson's chi-squared (1) = [...]; Pr = [...]