Comparing spatial networks: A one-size-ﬁts-all efﬁciency-driven approach

Spatial networks are a powerful framework for studying a large variety of systems belonging to a broad diversity of contexts: from transportation to biology, from epidemiology to communications, and migrations, to cite a few. Spatial networks can be described in terms of their total cost (i.e., the total amount of resources needed for building or traveling their connections). Here, we address the issue of how to gauge and compare the quality of spatial network designs (i.e., efﬁciency vs. total cost) by proposing a two-step methodology. First, we assess the network’s design by introducing a quality function based on the concept of network’s efﬁciency. Second, we propose an algorithm to estimate computationally the upper bound of our quality function for a given network. Complementarily, we provide a universal expression to obtain an approximated upper bound to any spatial network, regardless of its size. Smaller differences between the upper bound and the empirical value correspond to better designs. Finally, we test the applicability of this analytic tool set on spatial network data-sets of different nature.

(Received 25 April 2019; accepted 3 March 2020; published 3 April 2020) Spatial networks are a powerful framework for studying a large variety of systems belonging to a broad diversity of contexts: from transportation to biology, from epidemiology to communications, and migrations, to cite a few. Spatial networks can be described in terms of their total cost (i.e., the total amount of resources needed for building or traveling their connections). Here, we address the issue of how to gauge and compare the quality of spatial network designs (i.e., efficiency vs. total cost) by proposing a two-step methodology. First, we assess the network's design by introducing a quality function based on the concept of network's efficiency. Second, we propose an algorithm to estimate computationally the upper bound of our quality function for a given network. Complementarily, we provide a universal expression to obtain an approximated upper bound to any spatial network, regardless of its size. Smaller differences between the upper bound and the empirical value correspond to better designs. Finally, we test the applicability of this analytic tool set on spatial network data-sets of different nature. DOI: 10.1103/PhysRevE.101.042301

I. INTRODUCTION
A large variety of systems, both natural and artificial, are composed of interconnected units embedded in space. All these systems can be mapped onto spatial networks [1,2], a powerful framework, which provides the mathematical and conceptual tools to study them formally. Such a framework, built on basic common features, applies to a broad diversity of contexts: from transportation [3][4][5][6][7] to biology [8,9], from neuroscience [10] to animal behavior [11][12][13], from epidemiology [14][15][16] to communications [17] and migrations [18], to cite a few.
Spatial networks are those whose nodes have associated spatial coordinates. Consequently, links, that is, connections between nodes, are characterized by the distance between the pair of nodes they connect. Such a distance usually, translates into a cost, standing for the amount of resources needed for building or traveling (or both) a given connection. For example, the cost to build a road connecting two cities increases as its extremities are farther from each other.
Most of the literature on the topic [4,19,20] considers the case of edges' costs directly proportional to their length (i.e., * ignacio.morer@gmail.com † alessio.cardillo@urv.cat ‡ albert.diaz@ub.edu § luceprignano@ub.edu slozanop@ub.edu distances between connected nodes). Despite other possible options (e.g., cost proportional to a monotonically increasing function of the connection's length), this is generally considered a good description of almost the totality of the systems representable as spatial networks. Notice that construction or operative costs can also be assigned to nodes in particular circumstances (e.g., routers in communication networks). In this paper, we neglect such node costs. By doing this, spatial networks can be characterized in terms of a total cost defined as the sum of their edges' costs.
When comparing two spatial networks with the same node layout it is reasonable to think that the one with a higher total link length can achieve better efficiency. Likewise, the same total link length might not be able to communicate in an equally efficient way two sets of nodes, which are different in number and relative location. Our goal is to enable a meaningful comparison between two spatial networks even in such cases. To reach that goal, we assume the total cost to be an external constraint (i.e., determined by external factors fixing the amount of resources available for building connections), and focus on assessing to what extent such resources have been employed profitably. Specifically, we frame the quality of a network with regards to the total cost and establish a reference of how better could it be under that constraint, similarly to what was done in Ref. [21] for nonspatial networks. It is also a very similar approach to that of Cardillo et al. in which real networks (urban street patterns) are compared by rescaling both their efficiencies and costs with values obtained from suitable null cases [22].
Here, we propose a two-step methodology: (i) to assess the design of different networks by means of a quality function; (ii) to compare those values with an upper bound estimated computationally and therefore calculate how far each system lies from an optimal, equally constrained network.
The paper is organized as follows. After characterizing the behavior of some reference models of spatial networks, we introduce the concept of integrated efficiency, E int , and set it as our quality function (Sec. II). Then, in Sec. III we devise an algorithm to estimate the maximum value that such a metric can take (upper bound) for a given set of node positions (node layout) and a given total link length (constrained total cost). We then provide an approximate universal relation that allows us to determine the upper bound of the integrated efficiency as a function of the average distance between nodes and the total link length, for any number of nodes. In Sec. IV, we measure the quality of several empirical systems and put them in context with their optimal counterparts, to enable a meaningful comparison regardless of their individual characteristics. Finally, as an example, we present some potential applications.

II. GLOBAL, LOCAL, AND INTEGRATED EFFICIENCIES
Given a spatial network G with N nodes, its structure is completely determined by the adjacency and distance matrices, A and D. Their corresponding elements {a i j } take value 1 (0) if the connection exists (does not exist), and {d i j } take finite positive values corresponding to the spatial distance between nodes i and j [23]. These two matrices fully determine the shortest paths matrix L, whose elements {l i j } stand for the length of the shortest path between nodes i and j. l i j is a straight sum of the weights of the links in the path, no matter the number of steps. 1 There are several ways to assess the quality of a spatial network. We chose to use the concept of network efficiency, which is, essentially, a comparison between the spatial distance and the shortest path. The efficiency in the communication between two nodes i and j, E i j corresponds to the ratio between the spatial distance and the shortest path length, E i j = d i j /l i j , commonly known as detour index or route factor [2]. By computing the average over all pairs of nodes we obtain the global efficiency as proposed in Ref. [24], (despite an alternative definition has been given in Ref. [25]): E glob quantifies the ability of the system as a whole to communicate efficiently among its elements. Additionally, it is also relevant to assess the fault tolerance of the system's communicability at local level. To this aim, Latora and Marchiori [25] introduced the so-called local efficiency, E loc . Such an indicator measures how efficient is the communication in the local neighborhood of a node i after its removal, thus accounting for the robustness of the connectivity against local damage. In this paper, we adopt a modified E loc proposed by Vragovic et al. [24] that measures the efficiency of the communication between any two neighbors j and m of node i, no matter how far they are, considering all possible paths connecting them: where i represents the local subgraph of neighbors of node i and l jm/i is the length of the shortest path joining nodes j and m in absence of i, and l jm/i = ∞ when j and m belong to disconnected components. Finally, k i is the degree (i.e., number of connections) of node i. Given a certain layout of the nodes in space, there are multiple connectivity patterns presenting (approximately) the same total length, L tot = i, j a i j d i j . Among the plethora of spatial network models available in the literature [4,19,[26][27][28][29], we selected a few models as benchmarks, thus encompassing a wide spectrum of possibilities: (i) the minimum spanning tree (MST) [2]; (ii) the greedy triangulation (GT), a maximally connected triangulation minimizing its total length [22]; (iii) the equitable efficiency model (EEM), a growth model adding one link at a time ensuring that such move constitutes the best improvement in the communication among any pair of nodes [30]; (iv) the Gastner-Newman model (GN), a growth model where edges have an effective length, which combines Euclidean and topological distances altogether [19].
A detailed explanation of all the models (except for the well-known minimum spanning tree) can be found in the Appendix.
Starting from a random distribution of nodes in a unit square, we built the corresponding MST, GT, EEM, and GN networks and computed their E glob and E loc . The results are shown in Fig. 1 and denote remarkable differences amidst the benchmarks. Preliminary work confirmed that differences persist when one considers different spatial distributions of points, that is, they do not depend critically on the details of the node positions. Additionally, the analysis of spatial layouts of the empirical networks considered in this study (see Sec. IV), showed that the distribution of distances among the points is compatible with a random uniform distribution.
In Fig. 1, points correspond to the networks grown until their total length is almost the same as the GT one. Lines, instead, account for the intermediate stages corresponding to the growth phase, i.e., progressive addition of edges, of the model (if available). A first glance at the panels reveals some interesting features. The first one is the opposite behavior of EEM and GT, with EEM performing better than GT in terms of E glob , and the other way around for E loc . Another feature is the fact that MST has a nonzero value of E glob but E loc = 0, although this is expected since the removal of a single node in a tree implies the impossibility of communicating between its neighbors. Finally, the position of GN networks indicates higher values of both efficiencies but at a higher total cost (i.e., total link length). This is due to the fact that GN networks are built using a cost function different from the mere spatial one. The analysis of Fig. 1 highlights how differently the benchmark models behave with respect to the two efficiencies. Such differences can be leveraged and used to characterize each network using both E glob and E loc . Thus, we can represent each network with the pair of values (E loc , E glob ), which corresponds to a point in a two- Any topology lies inside this square, its position depending on its specific features. For instance, a set of isolated nodes (networks with no links) will lie at the bottom left corner (0,0), while the top right one (1,1) corresponds to the complete graph, which, by definition, has the maximum possible efficiency. Topologies having (E loc , 0) or (E loc , 1) ∀E loc ∈ ]0, 1[ are not allowed since E glob = 0 and E glob = 1 can be obtained exclusively by a set of isolated nodes or a complete graph, respectively. Topologies falling on the E loc = 0 line correspond to treelike (acyclic) graphs, while those falling on the E loc = 1 line are ensembles of disconnected complete subgraphs.
Every model of link growth, i.e., a model that builds networks by progressively adding connections to a set of initially isolated nodes, draws a trajectory in the diagram starting from (0,0) and, if not bounded to stop earlier, reaching (1,1) [see Figs. 1 and 2(a)]. In this sense, we can regard each real network as an intermediate stage of an unknown growth model, ideally connecting the point (0,0) to (1,1). Such a framework provides us with a metric to assess directly how efficient a given topology is from an overall viewpoint: the normalized distance between the point representing the considered topology and the top right corner of the diagram (i.e., the final target of any network growth model).
Therefore, we adopt this metric, which we call integrated efficiency, as a comprehensive measure of the efficiency of spatial networks: which is the Euclidean distance to the (1,1) coordinates in the two-dimensional space displayed in Fig. 2(a). The above definition satisfies a crucial general consideration about the efficiency of real spatial networks: They perform reasonably well at both local and global scale [26,31]. Indeed, this specific formulation encapsulates equally both scales by rewarding the balance between the two efficiencies. Consider the alternative, much simpler, measure E int = (E glob + E loc )/2. Three hypothetical topologies located at coordinates (0,1), (1,0), and (0.5,0.5), respectively, would score the same in terms of E int . On the contrary, the proposed measure E int takes a higher value in the third case, enhancing the balance between E glob and E loc . The behavior of E int for benchmark models as a function of L tot is displayed in Fig. 2 For simplicity, we consider a formulation of E int where the contributions of E loc and E glob are equal. However, it is possible to alter Eq. (3) including a tunable multiplicative factor to account for asymmetric contributions.

III. COMPARING NETWORK DESIGNS
The measure introduced in the previous section informs whether a certain topology is more or less efficient than another. Our final goal, however, is to compare the design of spatial networks in terms of resource allocation, something that is conceptually different. When we compute a network's integrated efficiency, we are calculating, by definition, how far it lies from the complete graph, which is the only graph reaching maximum values of E loc and E glob simultaneously. Nonetheless, there is a limit to how close a system can get to such an extreme, which is conditioned by the total cost L tot . A simple solution to this issue would be to divide the integrated efficiency, E int , by L tot /L cg , where L cg is the cost of the complete graph with the same node layout. However, such a rescaling procedure implies assuming a linear dependence of the efficiency on the total link length, which in general is not true. For the purpose of a fair comparison, it is essential to avoid arbitrary assumptions that may introduce biases.

A. Quasiexact comparisons: A numerical recipe
Our proposal is based on a very simple idea: to build the best possible network for a given amount of resources (i.e., , and integrated (E opt int ) efficiencies with respect to the total length of the system, L tot , for the networks generated using our optimal model. (b) Evolution of the average local ( E loc ), and global ( E glob ) efficiencies for the same networks. The green star denotes the values at which 90% of the nodes belong to the giant component. Results are averages over N real = 100 different layouts of N = 100 nodes uniformly distributed on the unit square. Continuous black lines stand for E int equilines. a given total cost L tot ) and fixed node layout. In this manner, for every network under study, we can construct its optimal counterpart and hence compute E int = E opt int − E int , where E int is the integrated efficiency of the original network and E opt int that of its optimal counterpart. This value quantifies the room for improvement in a design with the same amount of resources. More importantly, if we consider two systems with different total cost and spatial scale, E int enable to perform an indirect but fair comparison between them.
Our goal is not to build an improved version of real networks, which should display all their ideal characteristics, but, rather, to generate appropriate benchmarks by computing the maximum value that a given object function could take under certain constraints. As a consequence, we have designed a growth algorithm that works as follows: we start considering an empty graph G with N nodes. Then, we add edges iteratively until the total length of the graph reaches the desired one. Adding edges means adding consecutive line segments in the (L, the origin with the [L tot , E int (L tot )] point. Since we want to maximize the value of E int (L) for L = L tot , at each iteration we have to look for the segment with the maximum vertical slope, i.e., the edge maximizing the ratio between the variation of E int and the increase in total cost: where L and E int are the total weight of links and the integrated efficiency at the current step, respectively, andL andẼ int stand for the same quantities after adding the link (i, j). Since the identification of the edge to add involves the evaluation of the contribution of all the possible candidates, the overall procedure is completely deterministic. Even though it is not possible to ensure that the topologies produced by such an algorithm reach the maximum possible value of the integrated efficiency, there are strong hints that they are very close to it. The optimization of E int is a non-Markovian process and there exists the chance that different choices, locally not optimal, could lead to a better final result. To address this issue, we have explored the possibility to use alternative search methods based on simulated annealing. Our conclusion was that slightly higher values of E int may be reached by a limited number of alternative topologies, perhaps with a different balance between the local and global efficiencies, requiring a considerably higher computational cost (i.e., exploring the space of the configurations more exhaustively) in order to be discovered. 2 In Fig. 3, we display the average behavior of E glob , E loc , and E int against L tot for the networks generated using our optimal algorithm. We also report the evolution of the global and local efficiencies across the growth process according to the bidimensional representation adopted in Fig. 2(a). In particular, averages are computed over 100 random distributions of N = 100 nodes within the unit square. Figure 3(a) tells us that the algorithm first favors increases in E loc up to a point where it is not possible to increase E int further at the expense of E loc . Unlike E glob , this metric has a nonmonotonous behavior. For example, a system made of separated cliques has a E loc = 1, and adding any other link will reduce E loc . This is the evolution's least interesting phase since, at this stage, systems are mainly composed by many connected components while the overall connectivity is extremely low. After the peak, the algorithm begins to link these isolated components at the expense of E loc increasing E glob . When the total cost is roughly equal to that of the MST, the curves of E loc , E glob , and E int merge together and start to behave in the same way.

B. Approximated comparisons: An estimated universal upper bound
The above algorithm provides an optimal network counterpart usable as a reference. However, in practice, the increase of the system size N severely affects the runtime of the algorithm, in particular of the calculation of E loc , jeopardizing the applicability of our methodology to large size systems. To overcome this limitation, we determined the expected value of E int for any value of L tot and any N, in the case of layouts of nodes randomly distributed in a square. First, we studied the behavior of the integrated efficiency against the overall cost, for several layouts of N ∈ {200, 300, 400} nodes. Figure 4(a) displays the behavior of the E int curves, confirming that the results are robust across sizes. Specifically, the increase in size translates into a shift of the E int curve towards higher values of L tot .
By rescaling the x coordinate of plots in Fig. 4, we collapse them into a single, universal one, which is the same for any value of N. This leads to the definition of a normalized total cost of a network G, L , which reads: where d stands for the average spatial distance among nodes and L cg is the length of the complete graph having the same node layout as G. Such a normalized length can be rewritten as a combination of two variables: d and N, since the length of a complete graph is Hence, we obtain: We have found that α = 1/3. As we can observe in Fig. 4(b), the rescaling of L tot returns perfectly overlapped curves. Such a new, universal, curve allows us to compute the expected maximum value of E int for a system with a given L , providing an approximate upped bound to perform an indirect comparison between different systems. Specifically, given an empirical network with a certain L , we can use the difference between its actual value of integrated efficiency and its expected maximum value, as a proxy of the system's performance.
To improve the usability of this upper bound in realworld applications, it would be extremely useful to provide an analytical expression for the dependence of the expected maximum value of E int on L . However, for L < 1, networks are usually disconnected and the behavior of the optimal integrated efficiency is very noisy [see also Fig. 3(a)]. In order to avoid this difficulty, we restricted the range of admissible values of the normalized length to L 0.91 (thus discarding the region where the slope of the curve is almost zero) and fitted our numerical data to the relation: Using nonlinear least squares, we found that c 1 = 0.187, c 2 = 0.406, and c 3 = 1.211. In this way, given any real network, it is possible to calculate the expected upper bound for its integrated efficiency without generating any artificial network, but simply from the average distance between its nodes and its total cost, through L and Eq. (7). Including shorter total link lengths (i.e., L 0.91) would need a separate specific discussion, which goes beyond the scope of the present work.

IV. APPLICATIONS
To illustrate the applicability of the proposed analytical tools, here we use them to compare some real spatial networks based on their integrated efficiency. Specifically, we consider seven different collections of networks, a few of them (UK Flights, Cities, and Latium Vetus/Southern Etruria) correspond to successive snapshots of the same evolving system. The collections are as follows.
UK Flights: Time-varying network of domestic flights in the United Kingdom between years 1990 and 2003 [32]. Nodes correspond to airports, while an edge between two airports accounts for the distance among them. For each year/graph, we keep only those routes with, at least, 5000 carried passengers across the year.
Cities: The evolution of urban street patterns of a small region in northern Italy, captured in four snapshots between 1955 and 2007 [33]. Nodes correspond to the intersection between two streets or dead ends, while the weight of an edge corresponds to the length of the street connecting two nodes. For computational reasons, we consider only a small rectangular sample of the whole data set centered around a single village.
Latium Vetus/Southern Etruria: The networks of trails among settlements between 950 and 509 BC (Iron Age) in two regions of Italy, namely: Latium Vetus (LV) [34] and Southern Etruria (SE) [30]. Nodes represent settlements, while an edge denotes a direct route connecting them. From older to earlier, we have five snapshots for both regions: Early Iron Age 1 Early, Early Iron Age 1 Late, Early Iron Age 2, Orientalizing Age, and Archaic Age. We label these snapshots chronologically as (LVx/SEx) with x ∈ [1,5]. The snapshots do not have the same duration, but the properties of the system are more or less stable within each snapshot.
Catalonia railway: This network describes the current regional railway network in Catalonia. Nodes correspond to aggregated groups of contiguous towns, while edges denote the length of the railway line connecting them.
Hispania roads: The networks of trails among cities and towns in Hispania (Iberian peninsula) during the Roman Empire. As for the Latium Vetus and Etruria collections, nodes represent settlements, while an edge denotes a direct route connecting them.
Rome railway: The network of rail connections in Rome, where nodes represent stops or stations and link constitutes a direct connection between two nodes [35]. Weights correspond to the geodesic distance between both ends of the link. The original data-set splits many stops in two, each one corresponding to the two ways of the line. We simplify the network by merging stops having the same name into a single node.
Power grid: A simplified model of the power grid network of Italy, where transmission lines are assumed bidirectional and identical. It is a subset of a model of the continental European electricity system, comprising 1494 buses and 2156 lines [36]. The main topological features of all these networks are reported in Table I. Such table reveals the diversity of the  networks under analysis.  TABLE I. Summary of the topological indicators for all the empirical datasets. For each network, we report its number of nodes, N, of edges, K, the edge density, ρ, the total length for the empirical, L tot , and complete graph, L cg , as well as the average spatial distance among the nodes, d , the rescaled length of the system, L , and their local E loc , global E glob , and integrated E int efficiencies. Finally, for each network dataset, we report the bibliographic source of the data.

A. Direct comparison of efficiencies
Efficiency values in Table I shows interesting features among the variety of datasets. First, we see that E loc and E glob in the UK Flights fall within a narrow range of values centered around 0.6, despite the edge density ρ is fairly high. On the other side, we noticed that the rest of networks are all terrestrial, and tend to be more efficient at a global rather than local scale. This is in line with the principles behind the design and growth of such kind of networks, which tend to privilege treelike structures spanning the whole system at the expenses of resilience [26,28,31,37]. In this sense, terrestrial infrastructure networks are likely to show fairly high E glob , since they provide paths among all nodes with little chance to large route factors. Nevertheless, exceptions are found. For example, the rail network of the city of Rome shows two connected components, dragging down the value of E glob compared to other similar systems. On the other end, terrestrial networks are more vulnerable at the local level, as denoted by their values of E loc . Beyond comparing the efficiencies of the different spatial networks in a descriptive way, our methodology allows for a systematic comparison of diverse spatial networks considering both their E int and L tot .

B. Systematic comparison considering total length
For each empirical network, we generate an optimized one using the algorithm presented in Sec. III while preserving the node layout (i.e., their position) and the total length, L tot . First, we analyze the differences on both local and global efficiencies separately. We compute the differences E X = E opt X − E X , X ∈ {loc, glob} among the efficiencies of the empirical and optimal networks. In Fig. 5(a)  Railway networks E glob < 0 (i.e., they are more efficient than their optimized counterparts). Latium Vetus, Etruria, and the Italian Power Grid, instead, fall very close to the optimum. Another interesting feature is that UK Flights and Rome Railway networks are sub optimal both locally and globally, confirming our guess about the existence of criteria behind their design other than merely spatial ones.
We sort all the data sets according to their E int = E opt int − E int and check how far the real networks lie from the upper bound [ Fig. 5(b)]. The extent of efficiency's difference tells us how much better the systems could have performed consuming the same amount of resources. The differences range from E int ≈ 15% (City 2007, LV5) to above 50% (Rome Railway), while the majority of the real networks are about 20% less efficient than their optimal counterparts.
We have thus proven that it is relevant to consider the upper bound of the integrated efficiency of each real network since it provides novel, complementary information with respect to the mere value of E int . However, as discussed in Sec. III B, the computational cost of determining such upper bound increases rapidly with the size and total cost of the system under scrutiny. Therefore, it is interesting to assess whether replacing the value of E opt int with its expected value provided by Eq. (7) leads to similar results. By doing so, we are disregarding the details of the node layouts, while still considering their overall characteristics through the average node distance. In Fig. 6 we report the values of the empirical E int as a function of the rescaled length L . As expected, all the values of E int in empirical systems lie way below the optimal curve. Generally speaking, networks with higher connectivity are more efficient than those with less links, albeit being more costly. However, this is not always true. By looking at Fig. 6 we notice that UK Flights networks attain approximately the same efficiency of Southern Etruria ones, but with a much higher cost. We rank networks according to E int =Ē opt int − E int , the difference between their E int and the corresponding value FIG. 6. Integrated efficiency, E int as a function of the rescaled length L for the same data sets. The filled area for L 0.91 denotes the region for which the system has not fully percolated yet. The hue of the color is used to order the data set in a chronological way. The continuous black line stand for the optimal curve E opt int (L ).
computed through Eq. (7). We find that rankings based on E int and on E int provide a Spearman rank correlation of r s = 0.618 (p-val = 0.0037). This indicates that the curve is a valid alternative to ranking networks according to the actual difference between optimal and empirical integrated efficiencies. Nevertheless, the correlation's value highlights non negligible discrepancies between the rankings. The reason for such differences is that real spatial layouts differ from layouts in our simulations. Random distribution of nodes in a square show very little fluctuations as indicated by the shadow of the curve. On the contrary, a real network's layout can be far from this distribution, thus affecting the output of a spatial network model [38].

C. Application to networks' comparison in meaningful scenarios
We have shown that our methodology can be used with diverse spatial networks in a plethora of application domains. Now, we aim at presenting how such comparison can be used to address relevant research questions. In the following, we use some of the empirical networks presented above to highlight two meaningful ways of applying our methodology. The first example is an archaeological case of study based on the Latium Vetus and Southern Etruria data sets [30,34]. These two data sets describe the pathway networks connecting settlements in two neighboring regions with similar characteristics through time. Since there are obvious differences in terms of covered distance, system size, and total cost (length) among the ten data sets, the networks corresponding to the two regions cannot be compared directly. On the contrary, our methodology allows assessing which region hosted better designed infrastructures by looking at the efficiency difference to their optimal counterparts. A second example would involve analyzing the evolution of roads in the Cities data set. There, we can assess the efficiency in the area and the evolution of a system that increases both in size and total cost, but for which we initially cannot know how much better it could have performed under those constraints. Despite the strong changes in structural terms (e.g., size increase and density decrease), our methodology would allow for a longitudinal comparison throughout the temporal snapshots in the data set.
Even when the computation of the optimal is jeopardized by the system size (like in 1994 and 2007) we could use the upper bound curve to obtain their estimation.

V. CONCLUSIONS
This paper provides tools to compare the design of different spatial networks in terms of the duality resources employed and quality reached. Following an approach similar to Ref. [21], such a comparison is performed indirectly by contrasting the quality improvement that each one of the networks' designs could reach.
The paper presents the different components of our tool set progressively. First, we have introduced the notion of integrated efficiency, E int , as a metric to quantify spatial networks performance at global and local scale simultaneously, while rewarding the balance between the two. Second, we propose an algorithm to computationally estimate the upper bound of our quality function for a given specific network: we have devised an algorithm to generate networks with maximal E int (E opt int ) with the same node layout and total cost as in the original network. The smaller is the difference between such an upper bound and the empirical value, the higher we consider the design quality of the network under analysis to be.
Since the high computational cost of the optimal network algorithm may hinder its applicability on large networks, we provide a universal expression for approximated upper bound to any network. Considering a setting of N nodes randomly distributed in a unit square, we computed the expected maximal value of E int (Ē opt int ) as a function of the total cost L tot . Then, by defining a rescaled total link length, L , we successfully collapsed theĒ opt int versus L tot curves for different sizes onto a single one. In this way, we have been able to expressĒ opt int as a function of the number of nodes N, the average distance between nodes d , and the total cost L tot of the network under study.
Finally, to test the applicability of our method, we have analyzed the performance of a heterogeneous set of spatial networked systems. We have checked that our approach provides new information beyond the mere comparison between two networks' efficiency. We provide also several examples of application among the included data sets, showing how it helps to compare networks with different size and total length in a meaningful way.
In conclusion, we have shown that a meaningful comparison of spatial networks cannot be exempt from the definition of proper upper bounds with specific cost constraints. This can be done (almost) exactly, by running our maximal efficiency algorithm, or approximately, thanks to the universal curve. The latter constitutes a good approximation for systems whose size does not make the computation of E opt int feasible. However, the particularities of the layout (especially for low L ) may affect the precision of the method. In the future, it will be worth exploring how the specificities of a layout affects a systems' E opt int with respect to the value provided by the curve. Another direction to pursue is to explore the effect of weighting differently the contributions of the local and global efficiencies in the computation of E int .
The following data sets are available as complementary files at [39]: Latium Vetus, Southern Etruria, Catalonia railway and Hispania roads. The rest of the data sets are available online in the following references: Rome railway [35] and Power grid [36]. Regarding Cities data sets, please contact the author (see Acknowledgments section). The code used for running the algorithm can also be found in Ref. [39].

APPENDIX A: GASTNER-NEWMAN MODEL
We present here a brief description of the Gastner-Newman (GN) model to generate spatial networks introduced in Ref. [19]. The main idea behind the model is that there are two types of costs associated with a given network: one related with the construction of the infrastructure, and another related with its usage. Given a graph with N nodes and K edges, G(N, K ) [40], we consider its embedding in the bidimensional space, R 2 . We denote with d i j the Euclidean distance between nodes i and j, respectively. Therefore, the total cost of construction of graph G, T , reads: Where a i j is the element of the adjacency matrix, A, of the graph [23]. The total usage cost, Z λ , instead, is withl i j being the shortest path length between nodes i and j, which, in turn, is the sum of the lengths of the edges forming the path between i and j [23]. The path length depends on a parameter λ such that: with 0 λ 1. Parameter λ accounts for the users' perception of distances. For λ = 0, users give more importance to paths made of few hops, while for λ = 1 they pay more attention to shorter paths (in terms of distance). Finally, the total cost, C, of the whole graph is where parameter γ ∈ [0, 1] determines the relative weight between construction and usage cost. The GN algorithm generates optimal spatial networks minimizing cost C. The cost optimization can be implemented using either greedy or simulated annealing techniques [41]. We decided to implement the latter, since it ensures higher probabilities of finding the optimal network. In our case, we were interested in building GN networks with a given total cost. Hence, given a spatial network G , we compute its cost C according to Eq. (A4).
Considering the same nodes layout of G , we build a GN network G through the following steps.
(1) Create the complete graph G 0 , and compute its cost C(0).
(2) At each step, t, perform with equal probability one of these two operations: (a) Add/Remove an edge: Choose a random pair of nodes i and j, and if they are connected (i.e., ∃ e i j ) we remove the corresponding edge. Otherwise, we add the edge. The removal can take place unless one of the two nodes has degree one (i.e., otherwise the node will get disconnected).
(b) Rewiring: Choose an edge e i j at random. Then, choose a node k = i, j at random and create the edge (i, k) or ( j, k) -if it does not exist already. Finally, remove the edge e i j .
(3) Compute the cost of the resulting graph, C(t ), and then: (a) Accept the move with the following probability p: increase the value of β of a quantity β = 1 + 3 · 10 −6 (with β(0) = 0.1 C MST , and being C MST the cost of the minimum spanning tree of the nodes layout). (4) Repeat stages 2 and 3 either t ∞ = 1.5×10 6 times, or until C C .

APPENDIX B: GREEDY TRIANGULATION
Here we provide a brief description on how to compute the greedy triangulation graph of a given layout of nodes embedded into a bidimensional metric space R 2 . Given a set of N nodes embedded into a two dimensional space, the most connected (planar) triangulation graph G P has, at most, K = 3N − 6 edges [42]; and the maximally connected triangulation minimizing its total length, L, is called the minimum weighted triangulation (MWT). Since no polynomial time algorithm is known to compute the MWT, following Refs. [22,43] we consider, instead, its greedy approximation known as greedy triangulation (GT). Taking N points embedded into a bidimensional Euclidean space, we compute first the corresponding complete weighted graph G 0 with K = N (N−1) 2 edges. Then, we prepare a list of G 0 's edges, sorted in ascending order of length (e.g., using quicksort [44]). Given the schematization displayed in Fig. 7, for each edge or segment, AB, connecting nodes A, B ∈ G 0 we store the following information: Where ID i , x i , y i are the ID, and x, y coordinates of node or point i, M is the middle point of AB, r = d AB 2 is its semilength, and m is its angular coefficient (i.e., the tangent of the angle between AB and the x axis). In its essence, the algorithm to compute the GT graph, G GT , parses the sorted edge list of G 0 and checks whether each candidate edge e ∈ G 0 belongs to the GT or not. After adding the shortest edge or segment of G 0 to the empty set of edges of G GT , we check if any other edge of G 0 's edge list intersects (and eventually how) with those of G GT . To check if two segments AB ∈ G GT and CD ∈ G 0 intersect, and assuming that x A x B and x C x D , we compute the distance between their middle points d 12 = d M AB M CD . Then: (1) If d 12 > (r AB + r CD ) the two segments do not intersect, and thus CD potentially belongs to G GT .
(2) If d 12 (r AB + r CD ) the two segments may intersect, and we need to perform further checks to include or exclude the edge from G GT . To perform such checks, we have to compute the coordinates of the intersection point, (X, Y ), which are: Where x α , y α α ∈ {1, 2} are the coordinates of the middle point of AB if α = 1, or CD if α = 2. If X ∈ [min(x A , x C ), max(x B , x D )], then the two segments will cross for sure [since d 12 (r AB + r CD )] and CD could be discarded (a similar criterion could be established for Y ). For each candidate edge CD, we repeat the procedure described above for all the edges AB already present in G GT . However, there are some exceptions to the intersection rule. In particular, segments sharing one vertex do technically intersect, but without breaking G GT 's planarity and, hence, might belong to G GT . Another case requiring special attention is that of segments either parallel to one axis or perpendicular to them. In such case the check on either X or Y alone is not enough, and we must ensure that both X and Y fall outside their respective intervals, instead. Lastly, for parallel segments (i.e., m CD = m AB ), the relations in Eqs. (B1) have a singularity and, thus, cannot be used to compute the coordinates of the intersection point. If segment CD does not cross any of those of G GT , it can be added to G GT and we proceed to check the next candidate of G 0 's edge list. The algorithm stops either when 3N − 6 edges have been added to G GT , or if no more candidates to check are available. In the latter case, the number of edges of the GT will be lesser than 3N − 6. This is due to the fact that some edges between nodes laying at the outskirts of the node layout might be added without breaking the planarity. However, such edges cannot be represented as straight lines, and thus their intersection cannot be computed using the above-mentioned method. The amount of missing edges is approximately in the order of √ N 3N − 6.

APPENDIX C: EQUITABLE EFFICIENCY MODEL
In this section, we present the essential traits of the equitable efficiency model (EEM) introduced by Prignano et al. in Ref. [30]. EEM is a growth model that builds networks from empty and static spatial node layouts. In its essence, the model adds one link at a time ensuring that such addition constitutes the best improvement in the efficiency of communication among any pair of nodes.
Given a node layout embedded in a two-dimensional space, R 2 , at each step we calculate the route factor, E i j , (i.e., the ratio between the spatial distance, d i j , and the shortest path length, l i j ) between all the pair of nodes i and j. According to its definition, E i j ∈ [0, 1] ∀ i, j; where E i j = 0 when i and j belong to different components of the system (i.e., l i j = ∞), and E i j = 1 when they are directly connected, instead. After computing all the values of E i j , we sort them in ascending order. The connection having the smallest E i j is added to the network, and the above procedure is repeated iteratively until the graph has a total length, L tot , equal to the desired one. However, it is worth noting that, according to the definition of route factor, E i j = 0 for all nodes belonging to different components, regardless of their distance. To ensure a parsimonious usage of resources, and avoid an arbitrary selection of one of the unconnected pairs, we ideally replace l i j = lim →∞ with l i j = where L cg is a large, but finite, length. This replacement implies that the route factor between pairs of nodes belonging to different components will be ranked according to their spatial distance. Therefore, until the graph has one single component, the algorithm will select connections between unreachable nodes, starting from those that are physically closer to each other. This means that the set of links connecting the nodes into a single component is nothing else than the minimum spanning tree (MST) of the layout under consideration.