A geometry-induced topological phase transition in random graphs

Clustering $\unicode{x2013}$ the tendency for neighbors of nodes to be connected $\unicode{x2013}$ quantifies the coupling of a complex network to its latent metric space. In random geometric graphs, clustering undergoes a continuous phase transition, separating a phase with finite clustering from a regime where clustering vanishes in the thermodynamic limit. We prove this geometric-to-nongeometric phase transition to be topological in nature, with anomalous features such as diverging entropy as well as atypical finite size scaling behavior of clustering. Moreover, a slow decay of clustering in the nongeometric phase implies that some real networks with relatively high levels of clustering may be better described in this regime.


Introduction
For many years, Landau's theory of symmetry breaking was believed to be the ultimate explanation of continuous phase transitions 1 . In the liquid-crystal transition, for instance, the continuous translational and rotational symmetry at high temperatures break into a set of discrete symmetries in the low temperature phase. This paradigm was challenged for the first time by Berezinskii, Kosterlitz, and Thouless (BKT) in the two dimensional XY model 2,3,4 . For this model, the Mermin-Wanger theorem 5 states that there is no ordered phase even at zero temperature, so that 1 a phase transition in Landau's sense cannot exist. Yet, BKT showed that, in fact, there is a finite temperature phase transition driven by topological defects: vortices and antivortices. At low temperature, vortex-antivortex pairs are bound together. Above the critical temperature, vortex-antivortex pairs unbind, moving freely on the surface.
No symmetry is broken in the transition since both phases are rotationally invariant and so magnetization is zero in both phases. Topological order and topological phase transitions are nowadays fundamental to understand the properties of quantum matter 6 .
In this paper, we show that a transition taking place in a very general class of sparse spatial random networks models is, in fact, topological in nature with no broken symmetry. In this transition, chordless cycles in the network play the role of topological defects with respect to a tree. A critical temperature separates a low temperature phase, where the underlying metric space forces chordless cycles to be short range -mostly triangles-and a high temperature phase, where chordless cycles decouple from the metric space and become of the order of the network diameter. This is similar to the unbinding of vortex-antivortex pairs in the BKT transition. However, the thermodynamics of the transition is very different. As opposed to the BKT transition, the entropy density diverges at the critical temperature. This is also at odds with the continuous entropy density (with discontinuous first-or higher order-derivative) usually observed in continuous phase transitions. We thus describe a topological phase transition with novel thermodynamic properties. The two distinct topological orders of the transition can be quantified by means of the average local clustering coefficient, a measure of the fraction of triangles attached to nodes.
Clustering is finite in the "geometric" phase with short range cycles -as a result of the triangle inequality of the underlying metric space-and vanishes in the thermodynamic limit of the "non-geometric" phase with long range chordless cycles. This geometric-to-nongeometric phase transition shows interesting atypical scaling behavior as compared with standard continuous phase transitions, where one observes a power law decay at the critical point and a faster decay in the disordered phase. Instead, at the critical point, the average local clustering coefficient decays logarithmically to zero for very large systems and, in the nongeometric phase, where the coefficient decays as a power law, we discover a quasi-geometric region where the exponent that characterizes this decay depends on the temperature.

Results and discussion
We use a geometric description of networks 7 , which provides a simple and comprehensive approach to complex networks. The existence of latent metric spaces underlying complex networks offers a deft explanation for their intricate topologies, giving at the same time important clues on their functionality. The small-world property, high levels of clustering, heterogeneity in the degree distribution, and hierarchical organization are all topological properties observed in real networks that find a simple explanation within the network geometry paradigm 7 . Within this paradigm, the results found in this work hold in a very general class of spatial networks defined in compact homogeneous and isotropic Riemannian manifolds of arbitrary dimensionality 8,9,10,11,12,13 . Yet, in this paper, we focus on the S 1 model 9 and its isomorphically equivalent formulation in the hyperbolic plane, the H 2 model 14 .
Interestingly, many analytic results have been derived for the S 1 /H 2 model, e. g. degree distribution 9,14,15 , clustering 14,15,16,17 , diameter 18,19,20 , percolation 21,22 , self-similarity 9 , or spectral properties 23 and it has been extended to growing networks 24 , as well as to weighted networks 25 , multilayer networks 26,27 , networks with community structure 28,29,30 and it is also the basis for defining a renormalization group for complex networks 31,32 . The analytical tractability of the S 1 model makes it the perfect framework for our work.
In the S 1 model, nodes are assumed to live in a metric similarity space, where similarity refers to all the attributes that control the connectivity in the network, except for the degrees. At the same time, nodes are heterogeneous, with nodes with different levels of popularity coexisting within the same system. The popularity of a given node is quantified by its hidden degree. In our model, expected degrees can match observed degrees in real networks and we fix the positions of nodes in the metric space so that generated networks can be compared against real networks. This imposes constraints on the connection probability. Specifically, a link between a pair of nodes is created with a probability that resembles a gravity law, increasing with the product of nodes' popularities and decreasing with their distance in the similarity space. We further ask the model to define an ensemble of geometric random graphs with maximum entropy under the constraints of having a fixed expected degree sequence. This determines completely the form of the connection probability depending on the value of one of the model parameters: temperature 8 . Next, we describe the S 1 model in the low and high temperature regimes. 3

Low temperature regime
The S 1 is a model with hidden variables representing the location of the nodes in a similarity space and their popularity within the network. Specifically, each node is assigned a random angular coordinate θ i distributed uniformly in [0, 2π], fixing its position in a circle of radius R = N/2π. In this way, in the limit N 1 nodes are distributed in a line according to a Poisson point process of density one with periodic boundary conditions. Each node is also given a hidden degree κ i , which corresponds to its ensemble expected degree. In the low temperature regime, each pair of nodes is connected with probability where x ij = R∆θ ij is the distance between nodes i and j along the circle, and β > β c = 1 andμ are model parameters fixing the average clustering coefficient and average degree of the network, respectively. In this representation, the parameter β plays the role of the inverse temperature, controlling the level of noise in the system. To see this, the connection probability in Eq. (1) can be rewritten as the Fermi distribution 8 where the energy of state ij is and where the chemical potential µ = lnμ fixes the expected number of links, as in the grand canonical ensemble.
This result is remarkable as it allows us to map our model to a system of identical particles with Fermi statistics.
First, links in our model are unlabeled -and so indistinguishable-objects. Second, the model generates simple graphs such that only one link can occupy a given state of energy ij . Third, such a state is occupied with the probability given in Eq. (2), which is the occupation probability of the Fermi statistics in the grand canonical ensemble. Thus, the S 1 model is equivalent to a system of noninteracting fermions at temperature T = 1 β 14,8 .
These Fermi-like "particles" correspond to the links of the network and live on a discrete phase space defined by the N (N − 1)/2 pairs among the N nodes of the network. Each such state ij has an associated energy given by ij , which grows slowly with the distance between nodes i and j in the metric space.
Despite the fact that links in the model are noninteracting particles, the system undergoes a continuous topological phase transition at a critical temperature T c = β −1 c = 1, separating a geometric phase, with a finite fraction of triangles attached to nodes induced by the triangle inequality, and a nongeometric phase, where clustering vanishes in the thermodynamic limit 9 . We can analyze the nature of the transition by studying the entropy of the ensemble.
Given the mapping of the S 1 model to a system of non-interacting fermions in the grand canonical ensemble, we start from the grand canonical partition function whereμ = exp µ, with µ = ln β 2π k sin π β so that, in the thermodynamic limit, the average degree is set to k 8 .
Given the homogeneity and rotational invariance of the distribution of nodes in the similarity space, we can place the i'th node on the origin, leading to N identical terms. When the system size is large, we can approximate the sums in Eq. (4) by integrals. This leads to the following expression We can then use the above expression to find the grand potential Ξ = −β −1 ln Z and the entropy as S = β 2 ( ∂Ξ ∂β ) µ From this, we can find the entropy per link of the system as where in the last stepμ was plugged in. Note that E = N k /2 is the number of links -and so particles-in the network. Interestingly, the entropy density is only a function of β, and so independent of the degree distribution.
From Eq. (6), we see that the entropy per link diverges at the critical temperature β → β + c = 1. This implies that there is a sudden change in the behavior of the system at the critical point β = β c , which could indicate the presence of a phase transition. This transition is, however, anomalous -at odds with the continuous entropy density usually observed in continuous phase transitions-and thus cannot be described by Landau's symmetry-breaking theory of continuous phase transitions. Figure 1 shows a numerical evaluation of the entropy for different system sizes in homogeneous networks confirming the divergence of the entropy per link at the critical temperature as predicted by our analysis. Nevertheless, as we show in the SI, entropy per link diverges logarithmically with the system size at β = β c so that the divergence can only be detected for very large systems.

High temperature regime
In the high temperature regime β < β c we again fix the angular coordinate and expected degree of the nodes (κ i , θ i ) so that the degree distribution of the network remains unaltered when temperature is increased beyond the critical point and the model can be directly compared with real networks. Under these constraints, maximizing the 6 entropy of the ensemble leads to the following connection probability 8 withμ (1 − β)2 −β N β−1 / k for β < 1 andμ (2 k ln N ) −1 when β = 1 1 . Notice that this definition of the model converges to the soft configuration model with a given expected degree sequence 33,34,35,36 in the limit of infinite temperature β = 0. As we show in the SI, in this regime long range connections dominate, which causes the entropy density to scale as ln N (see Fig. 1) in the whole interval β ∈ [0, 1] (and so to diverge in the limit N → ∞) and the clustering to vanish in the thermodynamic limit.
Notice that the S 1 model is rotationally invariant both above and below the critical temperature, which implies that there is no symmetry breaking at the critical point. In fact, we argue that β c separates two distinct phases with different organization of the cycles, or topological defects, in the network. Indeed, the cycle space of an undirected network with N nodes, E links, and N com connected components is a vector space of dimension E − N + N com 37 .
This dimension is also the number of chordless cycles in the network as they form a complete basis of the cycle space. In complex networks, we are typically interested in connected or quasi-connected networks, with a giant connected component extending almost to the entire network. In the S 1 model this is achieved in the percolated phase when the average degree is sufficiently high, but still in the sparse regime, so that the vast majority of cycles are contained in the giant component. In this case, by changing temperature without changing the degree distribution, the number of nodes, links, and components remain almost invariant and so does the number of chordless cycles. Thus, the two different phases correspond to a different arrangement of the chordless cycles of the network 2 , as illustrated in the sketch in Fig. 1. This is again similar to the BKT transition since the number of vortices and antivortices is preserved in both phases.
This difference in arrangement of the cycles is caused by the following process. At low temperatures, the high energy associated to connecting spatially distant points causes the majority of links attached to a given node to be local. This defines the geometric phase at β > β c where the triangle inequality plays a critical role in the 1 Here we define 'A B' as 'A is asymptotically equal to B', i.e. that the equality becomes exact as N → ∞. This in contrast to 'A ∼ B' which means that A and B are asymptotically proportional to one another. 2 We, however, notice that the preservation of the number of cycles is not a necessary condition for the transition to take place.  In the geometric phase, there are finite cycles of any order although, as we shown in the SI, the density of triangles is much higher than the density of squares, pentagons, etc. In the nongeometric phase, the cycles are of the order of the network diameter. However, due to the small-world property and finite size effects the diameter of the network can be quite small, so that the distinction between finite cycles of order higher than three and long range cycles can be difficult. Therefore, the average local clustering coefficient -measuring the density of the shortest possible cycles, which are also the most numerous-is the perfect order parameter to quantify this

Finite size scaling of the transition
To quantify the behavior of clustering in this transition, we compute the average local clustering coefficient,c, as the local clustering coefficient averaged over all nodes in a network. The local clustering coefficient for a given node i, with hidden variables (κ i , θ i ), is defined as the probability that a pair of randomly chosen neighbors are neighbors themselves and, using results from 38 , can be computed as In the SI we derive analytic results for the behavior of the average local clustering coefficient when hidden degrees follow a power law distribution ρ(κ) ∼ κ −γ with 2 < γ < 3 and a cutoff κ < κ c ∼ N α/2 . This is done by finding appropriate bounding functions f (N, β) ≤ c(N, β) ≤ g(N, β) that are both asymptotically proportional to N −σ(β) , implying that c ∼ N −σ(β) as well. When β > 1, which we call the geometric region, the average local clustering coefficient behaves as 9 for some constant Q(β) that depends on β. Moreover, there exists a constant Q such that When β c < β ≤ 1, i.e. in the quasi-geometric region, where the vale of β c depends on the parameter α. If α > 1 3 it is given by β c = 2/γ and if κ c grows with N slower than any power law (α = 0) then β c = 2 3 . This includes the case of homogeneous degree distributions with ρ(κ) = δ(κ − k ) that we study in this paper. Notice that the behavior in a close neighborhood of β c is independent of γ. The fact that the microscopic details of the model do not affect this scaling behavior points to the universality of our results.
Finally, when β < β c (in the non-geometric region), the exact scaling behavior depends on α (see the SI for the general case α ≤ 1): These results are remarkable in many respects. First, clustering undergoes a continuous transition at β c = 1, attaining a finite value in the geometric phase β > β c and becoming zero in the nongeometric phase β < β c in the thermodynamic limit. The approach to zero when β → β + c is very smooth since both clustering and its first derivative are continuous at the critical point. Second, right at the critical point, clustering decays logarithmically with the system size, and it decays as a power of the system size when β < β c . This is at odds with traditional continuous phase transitions, where one observes a power law decay at the critical point and an even faster decay in the disordered phase. Third, there is a quasi-geometric region β c < β < β c where clustering decays very slowly, with an exponent that depends on the temperature. Finally, for β < β c , we recover the same result as that of the soft configuration model for scale-free degree distributions 34 . The results in Eqs. (11,12) around the critical point suggest that N eff = ln N plays the role of the system size instead of N . Indeed, in terms of this effective size, we observe a power law decay at the critical point and a faster decay in the unclustered phase, as expected 3 α = 1 defines the onset of structural degree-degree correlations 39 for a continuous phase transition. Consequently, we expect the finite size scaling ansatz of standard continuous phase transitions to hold with this effective size. We then propose that, in the neighborhood of the critical point, clustering at finite size N can be written as with η = 2, ν = 1, and where f (x) is a scaling function that behaves as We test these results with numerical simulations, and by direct numerical integration of Eq. (8) using Eq. (1) for β > β c and Eq. (7) for β ≤ β c , see SI. Simulations are performed with the degree-preserving geometric (DPG) Metropolis-Hastings algorithm introduced in 40 , that allows us to explore different values of β while preserving exactly the degree sequence. Given a network, the algorithm selects at random a pair of links connecting nodes i, j and l, m and swaps them (avoiding multiple links and self-connections) with a probability given by where ∆θ is the angular separation between the corresponding pair of nodes. This algorithm maximizes the likelihood that the network is S 1 geometric while preserving the degree sequence and the set of angular coordinates, and it does so independently of whether the system is above or below the critical temperature. Notice that the continuity of Eq. (14) as a function of β makes it evident that, even if the connection probability takes a different functional form above and below the critical point, the model is the same. Heterogen. γ = 2.7 Num. Integration Homogen. Simulations Homogen.
Num. Integration N ∈ [5 × 10 5 , 10 8 ] and measure numerically the exponent σ(β). In this case, the agreement is also very good for heterogeneous networks. The remaining discrepancy when β ≈ β c is again expected since, as shown in Eq. (11), right at the critical point clustering decays logarithmically rather than as a power law. Finally, Fig. 4 shows the finite size scaling Eq. (13) both for the numerical simulations and numerical integration of Eq (8). In both cases, we find a very good collapse with exponent η/ν ≈ 2 in all cases. The exponent ν, however, departs from the theoretical value ν = 1 in numerical simulations due to their small sizes but improves significantly with numerical integration for bigger sizes. We then expect Eq. (13) to hold, albeit for very large system sizes.
The slow decay of clustering in the nongeometric phase implies that some real networks with significant levels of clustering may be better described using the S 1 model with temperatures in the quasi-geometric regime β < β c .
Given a real network, the DPG algorithm can be used to find its value of β. To do so, nodes in the real network are given random angular coordinates in (0, 2π). Then the DPG algorithm is applied, increasing progressively the value of β until the average local clustering coefficient of the randomized network matches the one measured in the real network. Many real networks have very high levels of clustering and lead to values of β > β c . However, there are notable cases with values of β below the critical point. As an example, in the SI table we show values of β obtained for several real networks with values below or slightly above β c . In fact, some of them are found to be very close to the critical point, like protein-protein interaction networks of specific human tissues 41 , with β ≈ 1, or the genetic interaction network of the Drosophila Melanogaster 42 , β ≈ 1.1.

Conclusion
Our results in this paper show that the dependence of the phase space on an underlying geometry in networks where edges are considered noninteracting particles leads to an anomalous phase transition between different topological orders. Despite particles being noninteracting, the set of states that they can occupy are correlated by the triangle inequality in the underlying metric space. This correlation induces an effective interaction between particles, ultimately leading to a clustered phase at low temperatures. Interestingly, the logarithmic dependence of the state-energy with the metric distance results in the divergence of the entropy at a finite temperature β c and, thus, to a different ordering of cycles below β c , where clustering vanishes in the thermodynamic limit. The finite size behavior of the transition is anomalous, with ln N and not N playing the role of the system size. This slow approach to the thermodynamic limit is relevant for real networks in the quasi-geometric phase β c < β < 1, for which high levels of clustering can still be observed. All together, our results describe an anomalous topological phase transition that cannot be described by the classic Landau theory but that, nevertheless, differs from other topological phase transitions, such as the BKT transition, in the behavior of thermodynamic properties.

Authors contributions
All authors participated in the design and implementation of the research, analysis of results, and in the writing of the paper.

Competing interests
The authors declare no competing interests.

S1 Analytics
In the following section we will first present some preliminaries about the S 1 -model. We will do so from the point of view of network theory, focussing on connection probabilities and hidden variables of nodes. This section functions as a summary of known results about the S 1 -model. For further reading about this model and its alternative formulation, the H 2 -model, we kindly direct the reader to 1 . We will then look at the model from another direction, focussing on the fact that the network can be represented as a gas of fermions (links), where the nodes of the network define the available states. This will allow us to fully determine the thermodynamics of the model, generalizing what was already done in the main text of the paper. We investigate the surprising result further by showing that many can be recovered by a highly simplified toy-model. We then revert back to the network point of view to find the finite size scaling behaviour of the clustering coefficient in different cases.
This is done by looking at N 1 but finite, taking only the dominant contributions into account. We distinguish between the cases β < β c and β = β c as these show different behaviour. Finally we will study how the clustering coefficient in the thermodynamic limit (N → ∞) approaches zero in the limit β → β + c . Notation-wise we choose to use a b to refer to 'a is asymptotically equivalent to b' and ∼ to 'a is asymptotically proportional to b'.

S1.1 Preliminaries
The average clustering coefficient for a node with hidden degree κ and angular position θ is defined in the S 1 2 model as in a network with system size N . Here the functionk(κ, θ) is the average degree of a node with hidden coordinates (κ, θ), ρ(κ) is the hidden degree density and p(κ, κ , θ, θ ) is the connection probability between two nodes with hidden coordinates (κ, θ) and (κ , θ ). The exact form of these functions will be discussed in the following. Note that as the model has rotation symmetry, one only needs to investigate the node at angular coordinate θ = 0. The average clustering coefficient can be computed from c(κ) in the following manner However, as c(κ) is a bounded monotonically decreasing function, it suffices to find the scaling of c(κ) for small κ 3 . In Eq. (S1), ρ(κ) defines the distribution of the hidden degrees. In the following, we always apply a power law distribution. As we are interested in finite-sized scale-free networks, we choose the following distribution Note that from this expression also the homogeneous distribution can be reached. This can be done in two different ways.
The first is by letting κ c → κ 0 . Note that then ρ(κ) → ∞ if κ 0 ≤ κ ≤ κ c , but that this region also goes to zero width. Thus, we end up with a delta distribution (note that the integral of ρ(κ) always gives 1, irrespective of κ c ), exactly what we want for a homogeneous distribution. We can then set κ 0 = k to obtain the correct average degree. One can make similar arguments by leaving κ c > κ 0 and γ → ∞. In this case, one again ends up with the same distribution. We choose to not specify the specific form of the cut-offs just yet. We just demand that κ 0 is such that the correct average degree is obtained and that, to lowest order, it does not depend on the system size. The average degree of nodes with hidden variable κ and angular position θ is defined ask The function p describes the probability of two nodes in the network being connected and is given by the Fermi-Dirac distribution. In Ref. 4 it is noted that for β > β c , one can define the connection probability in the thermodynamic limit, given in terms of the spatial coordinates r, in our 1D case the coordinates on a infinite line: Hereμ = exp µ where µ is the chemical potential which fixes the average degree of the network. We come back to this shortly. As was noted in the main text, the relation between the coordinate θ on a circle with a finite radius and the coordinate on the real line r is r = N θ 2π . So for finite sizes this becomes Here ∆θ = π − |π − |θ − θ ||. To find the value ofμ we demand that the average degree remains constant: ; z is the ordinary hypergeometric function 1 . Which one of these terms is dominant depends on β. If β > 1, the first term is more important and so we can isolateμ to obtain µ β sin(π/β) 2π k . (S10) If β < 1, the second term dominates and we obtain However, as explained in the main text, using connection probability (S6) also when β < 1 leads to an ever more homogeneous network. If instead we want to preserve the degree sequence also below the critical β we need to redefine the connection probability in this regime: (S12) If we use this connection probability instead in Eq. (S7) we obtain forμ when β < 1 (S13) It can be shown that higher order terms become relevant when β → 1. Thus, when β = 1 the form of the dominant contribution to chemical potential will change. Note that in this case both connection probabilities are equivalent. To fix the average degree we write This then leads toμ (S15) Having derived the expressions above, we can also determine k(κ, θ). For large N , Eq. (S4) evaluates to k(θ, κ) Cμ κ κ, where C is some constant that depends on β. Thus, we note that the expected degree is proportional to the hidden degree. 1 Note that we use a slightly different form than the standard 2 F 1 [a, b; c; z] for aesthetic purposes. Now, integrating over κ we obtain k Cμ κ 2 , where above we have defined the variousμ's s.t. k = κ . This then implies that k(θ, κ) κ, i.e. that the hidden degree exactly represents the expected degree of a node.

S1.2 The network as a gas of fermions
We will now look at the network in a different picture, using the fact that, as explained in the main text, the edges can be seen as fermions, with occupation numbers given by (1 + exp (β( − µ))) −1 .

S1.2.1 The density of states
We start from the most general form of the connection probability Now, if we want the connection probability to have the form as given in Eq. (S6), whereμ = exp(µ), we must define the energy per link/particle as As was mentioned above as well as in the main text, we must change the form of the connection probability for β < β c in order to have a degree distribution independent of temperature. The form we then use is that given in Eq. (S12), wherê µ = exp βµ, which leads to the following energy per particle.
Note that, from a standard statistical physics perspective having the energy levels depend on temperature explicitly is unusual.
In fact, we will see that we need to be very careful when deriving the thermodynamic properties. However, we will also show that, from a network perspective, the results we obtain are completely valid.
With the two expressions for the energy per particles we can then derive the density of states as follows: This leads to two distinct density of states. The first, using Eq. (S16), is with max = ln N 2κ 2 0 (connecting two points on opposite sides of the S 1 manifold with both points having the minimal expected degree) and the second, using Eq. (S17), is . Note that at β = 1 these two are the same. The general form is thus

S1.2.2 Chemical Potential
With this we can calculate the chemical potential. In order to do so we study the average amount of links Here, Φ[z, a, b] is the Lerch zeta function. If we now assume e β( max −µ) 1, we can approximate this as We know that max ∼ ln N so µ c ln N where c < 1. It can then be shown that for all c the dominant contributions are If we take E = N k /2 (sparse network) we obtain Note that in all these cases e β( max −µ) 1 and that these are exactly the same results as we found before.

S1.2.3 Thermodynamics
With this we can now study the grand potential.
where B z [a, b] is the incomplete beta function. Again assuming that e β( max −µ) 1 and b < 1 we get the following dominant terms, after having divided out E Normally with this we have enough to calculate the entropy. However, we need to be careful when using a temperature dependent density of states. Let us check if S = β 2 ∂Ξ ∂β µ still holds.
In the last step we recognize the entropy of a graphon gas 5 . So, indeed, in the case that ρ( ) or max depends on the temperature, we get extra terms (S = β 2 ∂Ξ ∂β µ − ∆). These terms, at least in the general case, are not trivial to evaluate. However, we also note that S = β( U − Ξ − µ E ) remains valid in all cases. We will therefore approach S in this way. Thus, the final thing we need to do is find an expression for the average energy.
We can again take the limit e β( max −µ) 1, dividing out E , to obtain Finally, this leads us to the entropy: Now we plug in the remaining variables to obtain The final entropy is, as expected, equal to that of the an Erdös-Renyi graph with connection probability k /N when β → 0 and γ → ∞ 6 and should give the entropy of the soft configuration model when β → 0.
Using the density of states and the Fermi-Dirac statistics we can also find the probability of a link having energy . This is namely given by We can plug in the approximated values of µ from before and plot the results for β = 1/2 (blue line) and β = 3/2 (orange line) in Fig. S3. Notice that this probability density changes dramatically at the "critical" point β = 1. Indeed, when β > 1 particles occupy low energy states and for β < 1 they occupy mainly high energy states. However, since the number of states grows exponentially with the energy, the number of available microstates per particle grows extremely fast in the β < 1 regime, inducing a sudden increase of the entropy, explaining the divergence of the entropy in the thermodynamic limit in this regime.

S1.2.4 Toy model
Above we have seen some interesting bahavior, most notably the non-extensivity of the entropy above the critical temperature.
We want to now investigate where this feature comes from, by looking at a simplified version of our model. Suppose we have a system made of N part non-interacting "particles", each of which can attain states of energy ∈ (0, max ). Suppose also that the degeneracy of states of energy grows as g( ) = V e βc with β c a fixed parameter and V the volume of the system. The probability density to find one such particle in a state of energy is We notice that here we find the same sudden change of behavior at the critical point β = β c as we found in the S 1 model.
Using Maxwell-Boltzmann statistics for identical particles, the entropy per particle of this system is easily calculated as The first two terms in this last equation are just the average energy per particle of the system. If the density of particles is kept fixed, so that lim Npart→∞ Npart V = cte, then entropy is an extensive quantity as it is proportional to the number of particles.
However, there is a clear change of behavior as one goes from β > β c to β < β c due to the change of behavior of the probability density Eq. (S34). If besides β max 1, then the entropy behaves as Thus, in the limit of max → ∞ the entropy per particle diverges at β → β + c and scales as max for β ≤ β c , just as in our model.
In the S 1 -model, the effective system size is given by the proportionality constant V eff = ae max in Eq. (S21) and the amount of particles is given by E = N k /2 as we are working with a sparse graph. In this case we indeed satisfy lim E →∞ E Veff = cte and the entropy is thus in principle extensive. However, as in the full model there is the extra constraint E ≤ N (N − 1)/2 and max = ln(N/(2κ 2 0 )), we are obliged to also send max to infinity when going to the thermodynamic limit, thus resulting in a non-extensive entropy for β < β c . We thus show that the essential feature of the S 1 model that leads to a non-extensive entropy is the exponential dependence on the energy of the density of states.

S1.3 Scaling Behaviour of Clustering with System Size
In the following section we find the dominant finite size scaling of the clustering coefficient for β ≤ 1. As was explained in the main text, in this region in the thermodynamic limit clustering vanishes. We will therefore study what happens when N 1 but finite for any β (we thus do not take any limit with respect to the temperature). As for β 1 higher order finite size correction become important, we study separately the case β = 1.
We start by manipulating the angular integrals of Eq. (S1) as to simplify the task at hand later on. We then turn to the scaling when β < 1 and conclude with an analysis of the scaling when β = 1. In the case of β < 1, in order to facilitate numerics later on, we choose to adopt the connection probability as defined by Eq. (S12), where the degree sequence at different temperatures is the same.
The basis of these calculations is the fact that we are looking for the scaling behaviour of the c with respect to the system size N . This allows us to always ignore terms that we know are smaller than than the main term, which simplifies the integrals that we study substantially. Once we have a term, say A, we want to know the scaling behaviour of, we use the fact that if the functions f (N ) and g(N ) in equation have the same dominant scaling, one can immediately conclude that A also has that exact dominant scaling. Therefore, by finding upper and lower bounds to the integrals in question we can extract there scaling behavior with respect to A. It is important to keep in mind that, even when the integrals representing the bounds become very tedious, the strategy we employ remains the same throughout this section.

S1.3.1 Angular Manipulation
We start by manipulating the angular integrals of Eq. (S1) to make it easier to work with, i.e. get rid of the absolute values in the expressions for ∆θ. The equation has the following form: Here, we have used θ = 0. Let us first investigate the trivial case of the denominator, where we only focus on the angular where in the last step we have performed the transformation t = 2π − θ and t → θ on the second integral. The numerator can be rewritten in a similar way to obtain four terms 2π 0 dθ 2π 0 dθ p(κ, κ , π − |π − |θ ||)p(κ, κ , π − |π − |θ ||)p(κ , κ , π − |π − |θ − θ ||) The first two terms are not exactly the same. However, as the full expression of the clustering coefficient also contains integrals over the hidden degrees, one can interchange κ ↔ κ . This thus shows that the first two terms contribute equally to the clustering coefficient. All in all, we will thus be working with the following three terms Now, before we get started on finding the scaling with respect to the system size of each term individually, it might be that we can avoid doing so by some simple arguments. Indeed, we will show that the first term will always dominate the others in the large N limit, and so we only have to find its scaling. Let us start with the second term The above statement is true as the integrand is strictly positive and so extending the integration domain will only make the integral larger. Now, we can split the θ integral and perform θ ↔ θ and κ ↔ κ on the second term to obtain In the final step we use the functional form of p with respect to the angular coordinate is As s β is monotonously increasing, and 1/(1 + s) is monotonously decreasing, p(s) is monotonously decreasing. Thus, it is largest when s is smallest. Obviously, θ + θ > θ − θ for all (θ , θ ) ∈ [0, π] × [0, θ ]. We have thus proven that the first term in Eq. (S41) dominates the second term. We can follow similar steps for the third term. We we will now only clarify steps if they are new.
For the same reasons as before, this then implies so this term is also dominated by the first term in Eq. (S41).

S1.3.2 Case 0 < β < 1
The first step is to perform the transformation x = κ κs and y = κ κs , where we define κ 2 s ≡ N β /((2π) βμ ). Note that we use assume the functional form ofμ defined in Eq. (S13), such that κ s ∼ √ N . This leads to c(κ) ∼ 2 κc/κs κ0/κs dx κc/κs κ0/κs We investigate the numerator and denominator separately and define Our investigation will focus on finding upper and lower bounds for these integrals. Note that from here on out we will drop the domains of the x and y integrals and assume them to be [κ 0 /κ s , κ c /κ s ] unless otherwise indicated. Using the fact that we can conclude that A + < A − . As numerical investigation leads us to expect that both have the same scaling, this implies that we do not need to worry about an upper bound for A+ nor the lower bound for A − . If the functions f (N ) and g(N ) in have the same dominant scaling, one can immediately conclude that A − also has that exact dominant scaling. One might ask why we introduce A + in the first place, when in the end we are only interested in the scaling of A − . The answer to this is that A + in general has nicer properties due to the lack of (θ −θ ), as it is thus easier to find a lower bound for it than for A − .
We start with the simplest integral, the B-term, which can be solved exactly. To this end we first need to rewrite it a bit. By performing two substitutions one obtains This then gives the following expression This expression can then be expanded w.r.t. N , using that κ s ∼ √ N . To lowest order, one finds that B then scales as Next we turn to the A + term. Here we use the following fact to bound this integral. If F = V f ( x), where V is the volume over which to integrate the function, and f ( x) ≥ f 0 for all x, where f 0 some constant, then F ≥ f 0 V. From the form of the standard connection probability given in Eq. (S44), we see that A + is smallest when the argument is largest, which is the case when θ , θ are largest, so when they are both π. Thus we can bound the angular integrals by replacing the integrand with its minimum, the same function where both angular coordinates are π. The integrand is then a constant so the bound is given by the value of that constant times the area of the integral. Plugging this in we obtain Now, this is exactly the same integral (with the exception of the π's, but they will obviously not change scaling) as the one evaluated in Ref. 3 . As was found in the reference (Eq. (6)), the scaling depends on how we set κ c relative to κ s . We distinguish two regimes. First, there is the regime where κ 0 κ s κ c . In this case, the scaling is Then, there is the region where κ 0 ≤ κ c ≤ κ s (κ 0 κ s must be required to hold) where one obtains This, however, does not give the full scaling behaviour, as numerical results show us that for large β the scaling with respect to N is different. To find where this different scaling comes from we take a step back and look at the full integral A + as given in Eq. (S50). One might be tempted to, as in Ref. 3  integrand is positive for all angles, which means that the integral over the full domain must be larger or equal to the integral over D 1 . The domain D 1 can only be defined in the case that κ c ≤ κ s , as only then the angular coordinates remain smaller than the maximal possible value of π for all x and y. For the case that κ c κ s we define the more restrictive domain Starting with the case κ c ≤ κ s , bounding the integral as before (by replacing the integrand by its minimum), one finds For the case κ c κ s one obtains where in the first step it was noted that irrespective of the value of x and y, the argument of the connection probabilities is small.
We now have five different scaling behaviours. Which terms dominate will depend on the value of β as well on κ c . To quantify how the scaling varies with κ c we introduce the exponent α such that κ c ∼ N α/2 . As κ s ∼ N 1/2 , the different regimes of κ c described above correspond to α ∈ [0, 1] for κ c ≤ κ s and α ∈ (1, 2 γ−1 ] for κ c κ s . Using these definitions and adding up the different scaling we found above, we conclude that where C +,i are constants. Note that, for example, the scaling of Eqs.
One sees that these three terms are similar, and so we treat the general integral where the transformation θ = θ − θ , θ → θ was performed. Now, the argument of the Lerch zeta function can in principle be smaller and larger than one. If it is smaller, it can be shown that Φ[−(θ − θ ) −β ζ, 2, 3 − γ] < 2 γ−3 . If it is bigger than one can use the identity described in Ref where and ϑ(γ) = −π 2 cot(πγ) csc(πγ).
The argument of the Lerch zeta function is exactly one, which is the inflection point between the behaviours, when We must thus split the integration domain D l in three regions Fig. S1. Now, the grey region is the one where the Lerch zeta function Figure S1: Integration regions. In the grey region (D Y + D Z ) the argument of the Lerch zeta function is bigger than one, in the hatched region (D X ) it is not and the black region is D s . argument is bigger than one, in the hatched region we can bound the Lerch zeta function away and the black region is D s and we thus do not care about it for the moment. Before going any further, let us note that Fig. S1 looks slightly different for different κ c and ζ. If κ c κ s and ζ = (κ c /κ s ) 2 , then ζ 1 and thus so is a. However, a as an integration limit must be smaller than π and thus in this case the D X and D Y regions disappear. When κ c ≤ κ s this is not the case as for all ζ, a < π. Finally, irrespective of the value of κ c , for ζ = (κ 0 /κ s ) 2 , a = b and thus region D Z vanishes. Implementing the transformation given by Eq. (S65) in the grey region one obtains As this leads to three different angular integrals, in the end we have seven different integrals to solve.
The next step is to organise the different scalings (see Tab. (S1), where we have defined c Yi = c Y1i + c Y2i + c Y3i and similarly for Z) that were found and find which is dominant.
Let us note that as, the final results (Eq. (S63)) contains I κ 2 c /κ 2 s − 2I κ0κc/κ 2 s , the terms containing ln(κ c /κ 0 ) cancel. We now have many different scaling behaviours, and the question of which one dominates again depends on the value of β as well as κ c . As a matter of fact, if one includes the κ −2 s pre-factor in Eq. (S63), one recovers the same behaviour as was found for the lower bound  where C −,i are constants.
This seems to go in the right direction. However, we have not explored the full integration domain yet. It turns out though that the integration domain D s does not lead to any new scaling: The contribution of D s is thus subleading for small β and equally dominant as the other contributions for large β. We have thus shown that for the the upper and lower bound the dominant scaling is the same. We now have the scaling of all distinct parts, B, A + , A−, so we can now combine them all.
Let us discuss the limiting cases of α. When α = 0, κ 0 ∼ κ c , and the network thus has a homogeneous degree distribution.

S1.3.3 Case β = 1
We now turn to the limit β = 1. The general practice of finding upper and lower bounds for the various relevant integrals will be again pursued here, and in many cases the integrals examined will be similar to the ones studied above. However, there are some important differences that force us to treat this case separately. For one, we know that in the case of β = 1, µ scales aŝ µ ∼ (ln N ) −1 instead ofμ ∼ N 1−β , and thus κ s ∼ √ N ln N , which of course alters scaling. We will represent all integrals evaluated at β = 1 by a tilde (Ã − ,Ã + ,B). We start withB: The second term is dominant and thusB scales as For the lower bound of the numerator of the clustering coefficient we can use the result found in Eq. (S62) as nowhere was it assumed that β < 1. Irrespective of κ c this gives us For the upper bound ofÃ − we cannot follow the same path as was done in the case of general β. This is because the upper bound employed, given by Eq. (S63), diverges in the β = 1 limit. Thus, we must find a stricter bound. This is The black region is region D s . The horizontally striped region is region D 2 . The vertically striped region is region D 3 . The grey region is region D 4 . as represented in Fig. S2. Note that regions D 2 and D 3 overlap, but that is not a problem as our integrand is positive and counting a region double just increases the value of the integral, which in turn work for our purposes as we are only looking for an upper bound. For the region D s we can use the result (S83): where we have bounded the integral by decreasing the size of the denominators of the first and second terms. We also performed a change of variables of x and y. We now extend the lower bounds of the x and y integrals to zero, which can be done as our integral is positive, and so the resulting integral will be larger or equal to the original one.
, so the argument is larger than one. We first turn to the second region. Here the argument can diverge and we should thus perform a similar transformation as Eq. (S65). It is not exactly the same as the second argument of the Φ's is now 1 and not two 2, but the derivation is equivalent. This leads us to For D 2s we can immediately bound away the Φ to find κ κ s (κ s /κ 0 ) 2γ−3 Combining the two results we find thatÃ −,2 ≤c −,2 N γ−3 (ln N ) γ−3 as expected. Then we investigate to D 3 :

S1.4 Exponent η
In this section we show that the scaling exponent η that encodes how the clustering approaches zero when β → β + c = 1. As this only requires working on the low temperature side of the transition, we can directly work in the thermodynamic limit (we thus take first the limit N → ∞ and then β → 1). To this end, we denote the general definition of the clustering coefficient with hidden degree κ and (without loss of generality) spacial coordinate r = 0 dr ρ(κ )ρ(κ )p(κ, κ , |r |)p(κ, κ , |r |)p(κ , κ , |r − r |) (S99) where we can use connection probability (S5) andμ (S10).
Let us first turn to the denominator: where we have plugged in the definition ofμ and used that k = γ−1 γ−2 κ 0 .
The next step is the numerator. We first perform the transformation t = r /(κκ μ) and τ = r /(κκ μ) to obtain c(κ) =μ (S101) We know thatμ 2 ∼ (β − 1) 2 . This is exactly the scaling that we expect from numerical investigation for the clustering coefficient. Thus, all we need to prove is that at β = 1, the numerator is finite. If so, its (β − 1) dependence must be order O(1). If the full expression contained (β − 1) −n terms with n > 0 it would diverge at the critical point and if the dominant term was O((β − 1) n ) with n > 0 the numerator would go to zero at the critical point. And indeed, numerical integration shows that at β = 1 the numerator is finite, leading to the conclusion that c(κ) ∼ (β − 1) 2 (S102) such that η = 2, which in turn implies that ν = 1.

S2 Real Networks
As was stated in the main text, the DPG algorithm can be used to find the temperature of its embedding in the S 1 model. We list here a collection of real networks and their corresponding inverse temperatures. We choose to restrict ourselves to models where the inverse temperature lies below or close to the transition point β c .  Table S2: Properties of a selection of networks with the inverse temperature β obtained with the DPG algorithm. Only networks with β < 1.5 are shown.       Figure S5: Degree-degree correlations (upper row) and average clustering coefficient per degree (lower row) for three of the real networks in Tab. S2. The first column corresponds to the Human1-P network, the second to Human2-P and the third to Drosophila-G. The green points represent the network measures corresponding to the original network. The orange points represent the the average of 100 randomized networks at the β that reproduces the correct global clustering coefficient (see Tab. S2).