1 Introduction

Stewart (1941; 1947) applied the law of gravitation to demography, proposing the concept of ‘population potential’ to measure a location’s proximity to the population of the total system. Isard (1954) extended this idea to ‘income potential’, and Harris (1954) further extended it to ‘market potential’ to measure the strength of the attraction created by accessible market size in each location. A distance-decay function replicates the empirical fact that the interaction between two locations decreases as the distance between them increases. The simplest definition of this function is the reciprocal of distance. Krugman’s (1992; 1991) ‘wage-type’ equation predicts the effects of a very similar variable, defined in terms of the structural features of a formal model, thereby founding the New Economic Geography (NEG). For the European regions, later literature showed that the empirical results were very similar whether one used Harris’s indicators or the more sophisticated NEG procedure derived by Redding and Venables (2004).

A key issue in this literature is the measurement of internal market size. As trade costs are assumed to be related to location, locating markets is crucial. For foreign or external markets, market size is assumed to be located at the geographical center of that territory. Distances between centers of areal units are easy to measure. When we wish to compare these calculations to the calculation of internal market size, however, we confront an old problem: what is the distance (trade cost) of a country or region to its own domestic market?

Using cross-sectional regional European data, this paper evaluates the role of measuring internal market size when estimating a wage-type equation. It also shows the effect of including the internal market size on the spatial distribution of the variables. Given that regression is about fitting the average relative values of variables, similarities and differences between the average spatial distribution of variables are key to understanding regression results.Footnote 1

Assuming that the average region is circular, I compare the effects of two alternative methods for measuring internal distances. First, internal distances may be calculated considering the expected distribution of agents in the disk (Stewart 1947; Rich 1980; Keeble et al. 1982). Following this approach, I study the case of proxying internal distances as 1/3 of the regional radius. Second, internal distance may be determined as the mean length of trips joining the center to all possible points within the circle (Bonsall 1975; Frost and Spence 1995; Head and Mayer 2000). Based on this intrazonal travel, I study the case of measuring internal distances as 2/3 of the regional radius.

Both approaches to the empirical measurement of internal distances are subject to many caveats. Among others, the main cities in coastal regions are not usually in the center of the region; maritime routes are frequently ignored; and an ad hoc geometric justification affects the weight of internal market size in each indicator of market potential built as a summation, the results of which differ for different sample sizes, as will be shown. Moreover, including internal market size implies adding more endogenous information about the region itself, which aggravates the endogeneity problems of the indicator of External Market Potential (Breinlich 2006; Head and Mayer 2006). The NEG literature has not carefully studied the consequences of different methods for measuring internal market size. No previous study has focused on the role of the proxy for internal market size in the full variable of market potential.

The paper concludes that applying a single rule to proxy Internal Market Potential in the same way for any sample is poor methodology. Alternative methods give either more (Keeble et al. 1982) or less (Head and Mayer 2000) weight to the importance of domestic markets. To study long-run agglomeration forces in each geographical sample, we should balance methods to proxy the historical role of big cities with reasons to avoid domestic endogenous regional data.

The remainder of the paper is structured as follows. Section 2 introduces the NEG wage equation. Section 3 presents alternative methods to calculate internal market size. Section 4 presents the methodology and data, and Sect. 5 the results. Section 6 draws conclusions.

2 An empirical wage-type equation

The so-called ‘wage equation’ is a market-clearing condition of the basic NEG model in which labor is a unique production factor. I will now present a one-sector generalized form of this equation, in which the dependent variable is not wages but marginal costs and thus encompasses many of the ‘wage equations’ previously derived in the literature (Combes et al. 2008; Bruna 2015). For a firm in region \( i\) (\( i=1,\dots, R\)) with zero profit, the maximum value of marginal costs (\( {m}_{i}\)) the firm can afford to pay depends on its accessibility to the markets. It is proportional to its Real Market Potential (\( {RMP}_{i}\)) (to use Head and Mayer’s (2006) term) or Market Access (to use Redding and Venables’ (2004) term), as follows:

$$ {m}_{i}=Constant\cdot{\left({RMP}_{i}\right)}^{\frac{1}{\sigma }}=Constant\cdot{\left(\sum _{j}^{R}{{T}_{ij}}^{1-\sigma }\frac{{E}_{j}}{{S}_{j}}\right)}^{\frac{1}{\sigma }}$$
(1)

where \( \sigma >1\) is the elasticity of substitution between any pair of varieties in a love-of-variety utility function. \( {RMP}_{i}\) is a weighted sum of the market conditions in the other \( j\) regions, where \( {T}_{ij}\) is the trade cost from firm-or-region \( i\) to region \( j\), and \( {E}_{j}\) is total expenditure in \( j\). \( {S}_{j}\) is called the ‘competition index’ to stress that it measures the level of competition among varieties in the market of region \( j\), given consumers’ characteristic tastes. NEG’s long-term prediction is that firms and regions with higher Market Potential tend to earn more profits and pay higher remuneration to the production factors, resulting in higher regional income per capita.

If trade costs are proxied by physical distances (\( {d}_{ij}\)),Footnote 2 the explanatory variable of Eq. (1) becomes \( {RMP}_{i}=\sum _{j}^{R}{{d}_{ij}}^{1-\sigma }\frac{{E}_{j}}{{S}_{j}}\). As in some previous literature, marginal costs (\( {m}_{i}\)) can be proxied by data on gross value added per capita (\( GVApc\)) and total expenditure (\( {E}_{j}\)) with data on \( GVA\). Harris’ (1954) index of accessibility to markets, in contrast, can be defined as \( {HMP}_{i}=\sum _{j}^{R}{{d}_{ij}}^{-1}{GVA}_{j}\). Since a \( -1\) trade elasticity to distance is an extremely robust empirical finding in the literature on gravity equations (Head and Mayer 2014; Borchert et al. 2022),Footnote 3 the major difference between \( {RMP}_{i}\) and \( {HMP}_{i}\) lies in \( {S}_{j}\), which is not directly measurable in NEG theory. For samples of European regions, Head and Mayer (2006), Breinlich (2006), and the 2016-draft of a paper by Fichet and Hammer (2018) obtained similar empirical results using both Harris’ indicator and the more sophisticated procedure of Redding and Venables (2004) to proxy \( {S}_{j}\). Bruna (2024a) shows that both approaches capture the core-periphery spatial patterns in the data in a similar way.

Therefore, taking natural logarithms of Eq. (1) and replacing the variables with my proxies, I obtain the following estimable equation, in which \( C\) absorbs the constant and \( \beta \) plays the role of \( 1/\sigma \):

$$ \text{log}{GVApc}_{i}=C+\beta \text{log}{HMP}_{i}+{u}_{i}$$
(2)

3 Trying to measure the Internal Market Potential

NEG’s formal abstraction does not distinguish between internal and external markets. Conversely, during the founding years of regional science, economic geographers were developing ‘social physics’, which applied concepts from Newton’s law of gravity to study the attraction force of social masses. Using the latter framework, we can disaggregate Harris’ (1954) indicator as follows:

$$ {HMP}_{i}=\sum _{j=1}^{R}{{d}_{ij}}^{-1}{GVA}_{j}={{d}_{ii}}^{-1}{GVA}_{i}+\sum _{j\ne i}^{R-1}{{d}_{ij}}^{-1}{GVA}_{j}={IMP}_{i}+{EMP}_{i}$$
(3)

where \( {IMP}_{i}\) stands for a measure of the Internal Market Potential of region \( i\), and \( {EMP}_{i}\) for its External Market Potential. \( {IMP}_{i}\), or self-potential, represents the portion of total interaction of activities that is intra- rather than interzonal (Rich 1980). It is the mass (\( {GVA}_{i}\)) of the zone divided by the intrazonal distance.
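The decomposition in Eq. (3) can be illustrated with a minimal Python sketch. The function name `market_potential`, its argument names, and the three-region toy data are hypothetical and serve only as an illustration; they are not the paper's data or code.

```python
def market_potential(gva, dist, internal_dist):
    """Return (IMP, EMP, HMP) lists following the decomposition in Eq. (3).

    gva[j]           : market size of region j (GVA_j)
    dist[i][j]       : distance d_ij between centroids, used for i != j
    internal_dist[i] : assumed internal distance d_ii
    """
    n = len(gva)
    # Internal Market Potential: own mass divided by the internal distance.
    imp = [gva[i] / internal_dist[i] for i in range(n)]
    # External Market Potential: distance-weighted sum over all other regions.
    emp = [
        sum(gva[j] / dist[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]
    hmp = [imp[i] + emp[i] for i in range(n)]
    return imp, emp, hmp


# Toy data (hypothetical numbers, not taken from the paper's sample).
gva = [100.0, 50.0, 20.0]
dist = [
    [0.0, 200.0, 400.0],
    [200.0, 0.0, 250.0],
    [400.0, 250.0, 0.0],
]
internal_dist = [10.0, 20.0, 40.0]

imp, emp, hmp = market_potential(gva, dist, internal_dist)
```

In this toy case the first region's own market dominates its total potential, which is the kind of situation the choice of \( {d}_{ii}\) is meant to handle.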

To measure external distances (\( {d}_{ij}\)), the market size of other regions (\( {GVA}_{j}\)) is usually considered to be located at their regional geographic centers (centroids). Where, however, should we locate \( i\)’s market size to calculate an internal distance from its own centroid, \( {d}_{ii}\)? Stewart (1947) suggested representing each zone as a circle of equivalent area and estimating the intrazonal distance from some transformation of its radius, \( {r}_{i}= \sqrt{{area}_{i}/\pi }\). From this starting point, I test two methods for calculating internal distances.

First, Stewart (1947) used a physical analogy: the electric potential at the center of a uniformly charged circular disk equals the potential that its total charge would produce at a distance of half the radius. Under this assumption of a uniform distribution, the internal distance is set as \( {d}_{ii}=1/2\cdot r_{i}\). Rich (1980) noted that a centered conic or Gaussian distribution would imply a more concentrated mass of population or economic activity. Internal distances could thus be taken to be less than half of the radius to allow for the likely peaking of the mass in and around the centroid.Footnote 4 This approach, however, does not help in choosing a specific ratio of the radius for a given sample of spatial data. For a specific sample of European data, Keeble et al. (1982) chose \( {d}_{ii}=1/3\cdot r_{i}=0.188\sqrt{{area}_{i}}\).
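The half-radius value can be recovered with a short calculation. The following is only a sketch under the assumption that the mass \( {GVA}_{i}\) is spread uniformly over a disk of radius \( {r}_{i}\); the symbols \( \rho \) (density), \( s\) (distance from the center), and \( {V}_{i}\) (potential at the center) are introduced here for illustration and do not appear in the sources cited. Each ring of width \( ds\) holds a mass \( \rho \,2\pi s\,ds\) and contributes that mass divided by \( s\):

$$ {V}_{i}=\int_{0}^{{r}_{i}}\frac{\rho \,2\pi s}{s}\,ds=2\pi \rho \,{r}_{i}=\frac{2\,{GVA}_{i}}{{r}_{i}}=\frac{{GVA}_{i}}{{r}_{i}/2},\qquad \rho =\frac{{GVA}_{i}}{\pi {r}_{i}^{2}}$$

A uniform disk therefore behaves, at its center, like its whole mass placed at a distance \( {r}_{i}/2\). Any distribution that peaks around the centroid raises \( {V}_{i}\) and thus lowers the equivalent distance, which is the direction of Rich’s (1980) argument.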

Second, Bonsall (1975) and Frost and Spence (1995) approached the problem from the perspective of the mean length of trips joining the center to all possible points within a circle. Following Thisse’s suggestion, Head and Mayer (2000) adapted this idea to the NEG empirical literature. Internal distance may be approximated as the average distance from the centroid to all other points of a circular region, as follows: \( {d}_{ii}=2/3\cdot r_{i}=0.376\sqrt{{area}_{i}}\). Given that Krugman’s micro-fundamentals made social physics obsolete, this clean abstract argument fitted NEG’s modern approach and became the standard in the literature (see, for instance, Breinlich (2006), Tokunaga and ** (2011), or Gambuli (2023)). For Kordi et al. (2012), however, “estimating the average intra-zonal trip length is still an ongoing challenge in spatial models.”Footnote 5
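The two coefficients on \( \sqrt{{area}_{i}}\) and the 2/3 rule can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the mean-distance check assumes a uniform distribution of points over the disk.

```python
import math


def equivalent_radius(area):
    """Radius of the circle with the same area as the region."""
    return math.sqrt(area / math.pi)


def internal_distance(area, ratio):
    """Internal distance d_ii as a fixed ratio of the equivalent radius."""
    return ratio * equivalent_radius(area)


def mean_distance_to_center(radius, steps=100_000):
    """Midpoint-rule estimate of the mean center-to-point distance in a disk.

    Points at distance s from the center lie on a ring of length 2*pi*s,
    so the mean is the integral of s * (2*pi*s) ds divided by the disk area.
    """
    h = radius / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += s * (2.0 * math.pi * s) * h
    return total / (math.pi * radius ** 2)


# Coefficients on sqrt(area) for the two ratios discussed in the text.
coef_one_third = internal_distance(1.0, 1.0 / 3.0)   # about 0.188
coef_two_thirds = internal_distance(1.0, 2.0 / 3.0)  # about 0.376

# For a unit radius, the uniform-disk mean distance is close to 2/3.
ratio_estimate = mean_distance_to_center(1.0)
```

The check covers only the geometry. It says nothing about whether a uniform distribution is a reasonable description of any given region.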

Differences between these approaches were not only about measurement. The old tradition of economic geographers was interested in exploratory analysis to map the gradients of accessibility to markets, as distance increases from locations with the most demographic or economic activity (Bruna 2024a). Conversely, NEG’s empirical tradition has been more interested in confirmatory analysis. The NEG literature does not usually discuss the empirical implications of different methods to proxy self-potential. Redding and Venables (2004) and Boulhol et al. (2008) are an exception, though they do not explain the consequences of each alternative in the full variable of Market Potential.

The approach of social physics (Stewart 1947; Rich 1980; Keeble et al. 1982) may be considered as scientifically more honest, in that it recognizes uncertainty about the proper ratio of the circular radius to proxy Internal Market Potential. Conversely, the promise of a general rule by NEG’s empirical researchers using geometrical arguments may sound pretentious.

Since the approach of Keeble et al. (1982) gives \( {IMP}_{i}\) more weight in the full variable \( {HMP}_{i}\) (\( {IMP}_{i}\) is \( 3{GVA}_{i}/{r}_{i}\) instead of \( 1.5{GVA}_{i}/{r}_{i}\)), I will focus on this indicator to test the consequences of measuring Internal Market Potential when estimating a wage-type equation.

4 Methodology and data

The sample uses regional European 2019 data defined at aggregation level 2 of the Eurostat nomenclature of territorial units for statistics (NUTS).Footnote 6 It includes 311 regions from 32 countries (but not Switzerland). The explanatory variable is Harris’ indicator of Market Potential. As mentioned, for the estimation of a wage-type equation, I use \( GVApc\) to proxy marginal costs and \( GVA\) to proxy the market size of each region. Inter-regional distances (\( {d}_{ij}\)) are measured as great-circle distances between regional centroids.
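Great-circle distances between centroids can be computed in several ways, and the text does not state which formula was used. The sketch below uses the haversine formula as one common choice; the function name and the mean Earth radius of 6371 km are assumptions of this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# A quarter of the equator, between (0, 0) and (0, 90) degrees.
quarter_equator = great_circle_km(0.0, 0.0, 0.0, 90.0)
```

The formula is symmetric in its two points and returns zero for identical coordinates, which is why \( {d}_{ii}\) has to be defined separately.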

I will compare regression results and plots for indicators of External Market Potential and the full variable of Market Potential, which includes a measure of Internal Market Potential. The initial test of NEG in European regions is thus based on the following equation, estimable by Ordinary Least Squares (OLS):

$$ \text{log}{GVApc}_{i}=C+\beta \text{log}{EMP}_{i}+ {u}_{i}$$
(4)

Whereas excluding a region’s own market reduces the access measure of some economically larger locations (Breinlich 2006; Head and Mayer 2006), including it aggravates the general endogeneity problem of Market Potential in a wage-type equation. I compare the results of Eq. (4) to those derived from using \( {HMP}_{i}\) (Eq. 2) and calculating internal distances as 2/3 or 1/3 of the regional radius.
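A log-log specification of this kind with a single regressor can be fitted with the closed-form OLS formulas. The following is a minimal sketch, not the estimation code behind the paper's results: the function name `ols_loglog` is hypothetical, and the data are generated without noise from an assumed elasticity of 0.3 so that the fit can be verified.

```python
import math


def ols_loglog(y, x):
    """OLS fit of log(y) = C + beta * log(x) + u with a single regressor.

    Returns (C, beta, r_squared).
    """
    ly = [math.log(v) for v in y]
    lx = [math.log(v) for v in x]
    n = len(ly)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta = sxy / sxx
    const = mean_y - beta * mean_x
    ss_res = sum((b - (const + beta * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - mean_y) ** 2 for b in ly)
    return const, beta, 1.0 - ss_res / ss_tot


# Hypothetical data generated from GVApc = 2 * EMP**0.3 (no noise),
# so the fit should recover beta = 0.3 and C = log(2).
emp = [10.0, 20.0, 40.0, 80.0, 160.0]
gvapc = [2.0 * v ** 0.3 for v in emp]
const, beta, r2 = ols_loglog(gvapc, emp)
```

Passing a series of \( {HMP}_{i}\) values instead of \( {EMP}_{i}\) as `x` gives the corresponding fit of Eq. (2).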

Using different methodologies to measure \( {d}_{ii}\) affects the share of internal market size in total Market Potential. In this sample, the median share of \( IMP\) in \( HMP\) is 3.6% when \( {d}_{ii}\) is 2/3 of the regional radius and 7.0% when it is 1/3 of the radius. Moreover, geometrical arguments about domestic market size imply that \( IMP\)’s share in Market Potential decreases as sample size (and \( EMP\)) increases. This feature reduces comparability between studies and does not help us to understand the historical role of the domestic market for the firms in a particular sample. Additionally, Bruna (2024a) argues that peripheral European regions tend to have a larger area (see Fig. 1), and hence a lower Internal Market Potential, merely because of their size.
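The link between the two median shares can be checked with simple arithmetic: halving \( {d}_{ii}\) doubles \( IMP\) and leaves \( EMP\) unchanged. The sketch below uses hypothetical helper names and, apart from the 3.6% median quoted above, hypothetical numbers.

```python
def imp_share(imp, emp):
    """Share of Internal Market Potential in total Market Potential."""
    return imp / (imp + emp)


def share_after_scaling(share, factor):
    """Share of IMP in HMP after multiplying IMP by `factor`, EMP fixed."""
    odds = share / (1.0 - share)  # the ratio IMP / EMP
    scaled = factor * odds
    return scaled / (1.0 + scaled)


# Halving d_ii (from 2/3 to 1/3 of the radius) doubles IMP.
share_one_third = share_after_scaling(0.036, 2.0)  # roughly 0.07

# For a fixed IMP, the share falls as EMP grows (e.g. with more regions).
share_small_sample = imp_share(1.0, 10.0)
share_large_sample = imp_share(1.0, 20.0)
```

Because the transformation is increasing in the original share, it preserves the ranking of regions, so the median under the 2/3 rule maps to the median under the 1/3 rule, up to rounding of the reported figures.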

Fig. 1 Choropleth maps of the logs of Gross Value Added per capita, External Market Potential, and Market Potential (\( {d}_{ii}\) = 1/3 of the regional radius) (311 regions, 2019)

Figure 1 shows choropleth maps of the dependent variable in Eq. (4), its explanatory variable, and an alternative measure of Harris’ Market Potential that sets internal distances to 1/3 of the regional radius. The visual differences between the two right-hand plots are not obvious. In other samples, it is visually clearer that the main difference is for regions with capital cities. In this sample, the weight of \( IMP\) in \( HMP\) is greater than 30% in 20 regions for the 1/3 indicator of internal markets, but in only 5 regions for the 2/3 indicator. With the 1/3 approach, the regions with more than 40% of \( IMP\) in \( HMP\) are Inner London (West), Istanbul, Ile-de-France, Madrid, Brussels, Hamburg, Inner London (East), and Stockholm.

To analyze robustness, I also estimate the following Spatial Error Model (SEM):

$$ \text{log}{GVApc}_{i}=C+{\beta }_{1}\text{log}{EMP}_{i}+{\beta }_{2}{HK}_{i}+{\beta }_{c}{u}_{c}+\lambda \text{W}{u}_{i}+{\epsilon }_{i}$$
(5)

where human capital (\( {HK}_{i}\)) is proxied by Eurostat’s share of population in the labor force with tertiary education. \( {u}_{c}\) are 31 dummy variables for country fixed effects, and \( \lambda \) is the SEM parameter, whereas \( W\) is a spatial weights matrix defined as a row-standardized binary matrix considering the five nearest neighbors.Footnote 7
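A row-standardized binary matrix of nearest neighbors can be built as in the following minimal sketch. The function name `knn_weights` and the toy coordinates are hypothetical, and plain Euclidean distance on projected coordinates is an assumption of this illustration rather than the construction used for the estimates.

```python
def knn_weights(coords, k):
    """Row-standardized binary weights for the k nearest neighbors.

    coords : list of (east, north) centroid coordinates in a projected system
    Returns a dense n x n list of lists W with W[i][j] = 1/k if j is one of
    the k nearest neighbors of i, and 0 otherwise.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        others = [
            ((xj - xi) ** 2 + (yj - yi) ** 2, j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        ]
        others.sort()  # nearest first; ties broken by region index
        for _, j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights


# Hypothetical centroids (km) on a line; k = 2 for brevity (the text uses 5).
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
W = knn_weights(coords, 2)
```

Each row sums to one and the diagonal is zero, so \( W{u}_{i}\) is the average error of the neighbors of region \( i\). The matrix is generally not symmetric, because being a nearest neighbor is not a mutual relation.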

Additionally, to study the effects of including \( IMP\) on the indicator of market accessibility, I present scatterplots of the values of the variable against the east-north (\( E\), \( N\)) spatial coordinates of the regional centroids (see Fig. 2 below). The horizontal axes represent the location of the regional centroids, measured in kilometers from the origin of the projection system (in the south-west). Dotted lines represent regression lines, and solid lines the results of a locally weighted scatterplot smoother (LOWESS).
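The smoothed lines in such plots can be approximated with a simplified LOWESS. The sketch below fits a local weighted line with tricube weights around each point and omits the robustness iterations of the full procedure; the function name `lowess_fit`, the `frac` parameter, and the toy data are hypothetical.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Simplified LOWESS: local linear fits with tricube weights.

    No robustness iterations are performed, unlike the full procedure.
    Returns the smoothed value at each x[i].
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbors per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        bandwidth = dists[k - 1]
        if bandwidth == 0.0:
            bandwidth = 1.0  # all k nearest points coincide with x0
        w = []
        for xi in x:
            u = abs(xi - x0) / bandwidth
            w.append((1.0 - u ** 3) ** 3 if u < 1.0 else 0.0)
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical, exactly linear data: a local linear smoother reproduces it.
east = [float(v) for v in range(10)]
values = [1.0 + 0.5 * v for v in east]
smooth = lowess_fit(east, values, frac=0.5)
```

On real data the smoother departs from the global regression line wherever the local slope changes, which is what makes a core-periphery pattern visible.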

5 Results

Table 1 shows the regression results of the wage-type equation for EMP and the two variants of Harris MP including Internal Market Potential. Not surprisingly, the OLS regressions produce a higher coefficient of determination in column (3), for the variable giving more weight to the (endogenous) internal market size. The differences between columns (1) to (3) are, however, negligible. Since including a proxy for internal market size serves as a technical correction for regions with big cities, the inclusion of the internal markets has greater impact in the model incorporating spatial local correlations, as revealed by comparing the estimates in column (4) with those in columns (5) and (6). NEG theory, however, predicts a positive sign for Market Potential but says nothing about magnitudes such as 0.2 or 0.3.Footnote 8

Table 1 OLS and SEM models to explain regional Gross Value Added per capita

Scatterplots of the variables against the centroids’ spatial coordinates help to explain what is happening inside the regression algorithm. The horizontal axes represent the location of those centroids, measured in kilometers from an origin in the south-west. The first row of Fig. 2 shows these plots for the dependent variable of the models in Table 1. The left-hand plot displays the values of \( \text{log}{GVApc}_{i}\) against the position of each regional centroid as we go from the origin in the west towards the east. The right-hand plot shows the same values in the south-to-north direction. The dotted (regression) lines indicate a general decreasing trend from north-west to south-east. The solid (LOWESS) lines show a core-periphery spatial pattern, with higher values around the geographical center of Europe. This pattern is very clear in the left-hand plot.

The second row of plots shows the strong core-periphery spatial pattern of External Market Potential, with a smoother distribution of values because of the summation in Eq. (3). The third row of plots in Fig. 2 shows the spatial distribution of Market Potential when internal distances are calculated as 1/3 of the regional radius. Some outliers change their position in the plots, mainly those of regions with capital cities, but the core-periphery spatial pattern detected for External Market Potential remains very similar.

Fig. 2 Scatterplots of the logs of GVApc, EMP and MP (\( {d}_{ii}\) = 1/3 of the regional radius) against the regional centroids’ coordinates in the east and north directions

6 Conclusions

This paper studies the effects of including a measure of the internal market size when estimating an NEG wage-type equation. The paper shows that including an indicator of Internal Market Potential should be interpreted as a technical correction of the indicators for external market size, and that this correction is mostly relevant for regions with capital cities. Because this inclusion adds ad hoc endogenous information to the explanatory variable of External Market Potential, however, its impact must be examined carefully. Inner London of course has a larger internal market, but we cannot say that London has a high income per capita merely because it has a high income. Such circular reasoning is of no use for studying causality.

When estimating a wage-type equation in this European sample, including an indicator of internal market size does not crucially alter the empirical results. Both External Market Potential and the full variable of Market Potential display a core-periphery spatial pattern that helps to explain the core-periphery spatial pattern of European regional per capita income. This property, however, has a different impact on results using different regional samples. The problem of calculating the market size accessible to big cities still has not been solved. General geometrical argumentation about Internal Market Potential does not seem to be a careful way to proceed.

Improvements in the empirical literature on trade costs might help to solve this problem, but there is no easy solution. Theories such as NEG provide fundamental determinants of agglomeration based on historical explanations. Modern statistics on trade costs might not properly represent those historical conditions and their effects, which have accumulated over centuries.

Ultimately, researchers should be explicit when connecting their methods with the historical conditions of the theory studied, compare alternative approximations of those conditions, and identify the effects of those alternatives on the spatial attributes of the data.