Soft communities in similarity space

The $\mathbb{S}^1$ model has been a central geometric model in the development of the field of network geometry. It has been mainly studied in its homogeneous regime, in which angular coordinates are independently and uniformly scattered on the circle. We now investigate if the model can generate networks with targeted topological features and soft communities, that is, heterogeneous angular distributions. Under these circumstances, hidden degrees must depend on angular coordinates and we propose a method to estimate them. We conclude that the model can be topologically invariant with respect to the soft-community structure. Our results might have important implications, both in expanding the scope of the model beyond the independent hidden variables limit and in the embedding of real-world networks.


Introduction
Complex networks have been widely studied over the last twenty years in many different contexts, from biological to social and technological [1,2]. There seem to be some universal features common to the topology of many networks. For instance, in most cases they are scale-free, meaning that their degrees are power-law distributed. This phenomenon was explained early in the development of network theory by the preferential attachment mechanism: as the network grows, new nodes connect to highly connected (or popular) nodes with higher probability [3].
However, preferential attachment alone cannot explain the high clustering coefficient (the fraction of existing triangles) observed in real systems. To explain clustering, the concept of similarity was introduced [4]. The basic idea is that nodes connect not only because they are popular, but also because they are similar in some sense. Thus, if node A connects to nodes B and C because they are similar to A, then B and C should also be similar to each other and therefore have a high probability of being connected. This transitivity of similarity suggests encoding similarities between nodes as distances in metric spaces, since the triangle inequality is one of their defining properties: if the distance $d_{BC}$ in the underlying metric space measures the dissimilarity between B and C, it must be bounded by $d_{BC} \le d_{AB} + d_{AC}$, therefore inducing the observed transitive connections. The $\mathbb{S}^1$ model was proposed based on these ideas [4]. In this model, $N$ nodes are randomly scattered on a circle of radius $R = N/2\pi$. Every node $i$ is also assigned a hidden degree $\kappa_i$ from any distribution (for instance, a power law $P(\kappa) \sim \kappa^{-\gamma}$), and every pair of nodes $i$ and $j$ is connected with probability

$$p_{ij} = \frac{1}{1 + \left( \dfrac{d_{ij}}{\mu \kappa_i \kappa_j} \right)^{\beta}}, \tag{1}$$

where $d_{ij}$ is the distance along the circle; $\mu$ and $\beta$ are two global parameters controlling the average degree and the clustering coefficient, respectively. Notice that this connection probability takes the form of a gravity law, as it increases with the product of hidden degrees (popularities) and decreases with the distance between the nodes (dissimilarity). Despite its apparent simplicity, this model generates networks that closely resemble real networks: they are scale-free, small-world and highly clustered. In fact, the degrees $k_i$ are proportional to the hidden degrees $\kappa_i$, so the model is versatile enough to generate networks with different degree distributions.$^{1}$

arXiv:1707.09610v1 [physics.soc-ph] 30 Jul 2017
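The construction just described (uniform angles, hidden degrees, pairwise connections) can be illustrated with a minimal Python sketch. It assumes the gravity-law form $p_{ij} = 1/(1 + (d_{ij}/(\mu\kappa_i\kappa_j))^{\beta})$ for the connection probability; the function names, the parameter values and the lower cutoff of 2 on the hidden degrees are illustrative choices, not taken from the paper.

```python
import numpy as np


def connection_probability(d, kappa_i, kappa_j, mu, beta):
    """Gravity-law connection probability (assumed form of Eq. (1))."""
    return 1.0 / (1.0 + (d / (mu * kappa_i * kappa_j)) ** beta)


def generate_s1_network(kappa, theta, mu, beta, rng):
    """Return a symmetric boolean adjacency matrix for given hidden variables."""
    kappa = np.asarray(kappa, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n = len(kappa)
    radius = n / (2.0 * np.pi)                      # R = N / 2*pi
    # Shortest angular separation between every pair of nodes, in [0, pi]
    dtheta = np.pi - np.abs(np.pi - np.abs(theta[:, None] - theta[None, :]))
    p = connection_probability(radius * dtheta, kappa[:, None],
                               kappa[None, :], mu, beta)
    # One Bernoulli trial per pair i < j; the diagonal is excluded by k=1
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return upper | upper.T


# Illustrative parameters (assumptions, not values from the paper)
rng = np.random.default_rng(0)
n, gamma, beta, mu = 300, 2.5, 2.5, 0.05
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
# Inverse-transform sampling of P(kappa) ~ kappa^-gamma for kappa >= 2
kappa = 2.0 * (1.0 - rng.random(n)) ** (-1.0 / (gamma - 1.0))
adjacency = generate_s1_network(kappa, theta, mu, beta, rng)
print("mean degree:", adjacency.sum(axis=1).mean())
```

In this sketch the probability equals 1/2 exactly when $d_{ij} = \mu\kappa_i\kappa_j$, which makes the gravity-law reading explicit: the product of hidden degrees sets the distance scale below which a link is more likely than not.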
The possibilities of the $\mathbb{S}^1$ model go beyond generating realistic networks. The similarity-space coordinates of the nodes of a real network can be inferred by finding the coordinates that maximise the likelihood that the real network was generated by the model [6,7,8]. This embedding process yields a strikingly meaningful map of the network. For instance, it makes it possible to navigate the network efficiently by mapping the coordinates to hyperbolic space. Moreover, access to the similarity-space coordinates of nodes opens the path to a completely new way of analysing complex networks. For example, in Refs. [6,7,8] it was shown that the angular coordinates of nodes in real-world networks are not uniformly distributed. Instead, they are distributed heterogeneously, with some angular regions more densely populated than others. These dense regions reveal the community structure of the network [9,10,11]. Indeed, when the network is partitioned using the largest gaps between consecutive nodes along the circle as community boundaries, the resulting partitions have a modularity comparable to that of other community detection methods currently available in the literature. Furthermore, this geometric method seems to have higher resolution [8].
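The gap-based partition mentioned above can be sketched in a few lines of Python. This is only an illustration of the idea of cutting the circle at its largest angular gaps: the function name and the `n_communities` parameter are hypothetical, and the criterion used in the cited works to decide how many gaps count as boundaries is not reproduced here.

```python
import numpy as np


def partition_by_largest_gaps(theta, n_communities):
    """Label nodes by cutting the circle at its n_communities largest gaps."""
    theta = np.asarray(theta, dtype=float) % (2.0 * np.pi)
    n = len(theta)
    if not 1 <= n_communities <= n:
        raise ValueError("n_communities must be between 1 and the number of nodes")
    order = np.argsort(theta)
    sorted_theta = theta[order]
    # gaps[k] separates sorted node k from sorted node k + 1;
    # the last gap wraps around from the last node back to the first
    gaps = np.diff(np.append(sorted_theta, sorted_theta[0] + 2.0 * np.pi))
    boundaries = np.sort(np.argsort(gaps)[-n_communities:])
    # Community of sorted node m = number of boundaries lying before it;
    # the modulo merges the nodes after the last boundary with those
    # before the first one, since the circle is periodic
    labels_sorted = np.searchsorted(boundaries, np.arange(n), side="left")
    labels_sorted %= n_communities
    labels = np.empty(n, dtype=int)
    labels[order] = labels_sorted
    return labels


# Two well-separated angular clusters
print(partition_by_largest_gaps([0.1, 0.2, 0.3, 3.0, 3.1, 3.2], 2))
```

Fixing the number of communities in advance is a simplification made for this sketch; a threshold on the gap size would be an equally plausible way to choose the boundaries.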
In Ref. [12], the authors introduced the Geometric Preferential Attachment (GPA) model, a generalised version of the growing geometric model Popularity vs. Similarity Optimization [13] in which soft communities, as they named these denser angular regions, emerge from the growth dynamics of the network without altering topological properties like the degree distribution or the clustering spectrum of the resulting network. In this paper, we address the question of whether the $\mathbb{S}^1$ model can generate networks with given target topological features and soft communities, that is, heterogeneous angular distributions. To that end, any angular distribution could be considered in principle. For instance, one could impose some non-uniform distribution function a priori. However, the angular distribution from Ref. [12] is not an imposition, but rather emerges from a preferential attachment process in similarity space that seems to be a plausible explanation for the nature of communities in real systems. We focus on that particular angular distribution for our study.

$^{1}$ The original model defined in [4] is in fact more general, allowing for any connection probability $p_{ij}$ as long as it depends on the argument $d_{ij}/(\kappa_i \kappa_j)^{1/D}$, where now the space is the $D$-dimensional sphere and $d_{ij}$ is the geodesic distance on the sphere. The particular functional form in Eq. (1) allows us to interpret the network as a set of non-interacting fermions (the links) embedded in the hyperbolic plane, with the hyperbolic length of a link playing the role of its energy and $\beta$ playing the role of the inverse of the bath temperature [5].

Results
In the bare $\mathbb{S}^1$ model, hidden degrees and similarity coordinates are typically assumed to be uncorrelated, so every node's hidden variables are drawn independently from some joint distribution $\rho(\kappa, \theta)$ that factorises [4,14]. The GPA, however, is a growing model, so the angular coordinates and hidden degrees of different nodes are correlated. We are therefore forced to drop these simplifying assumptions.
In the GPA growth process, the degree of a node is determined by its age: the older the node, the higher its degree. Moreover, when a new node $t$ is added to the system, the probability for it to be placed at polar coordinate $\theta_t$ depends on the number of nodes $s < t$ at angular distance $\Delta\theta_{st}$ < 2/(s …, where $\gamma$ is the exponent of the power-law degree distribution. This implies a very particular dependence between similarity coordinates and degrees: the angular coordinate of a node must depend on the angular coordinates of all nodes with higher degree. Hence, we must include the implicit ordering in the sequence of nodes induced by the degree sequence in our heterogeneous version of the $\mathbb{S}^1$ model. That could be done by first assigning a hidden degree $\kappa_i$ from a power-law distribution $P(\kappa) \sim \kappa^{-\gamma}$ to every node, ordering the nodes according to their hidden degrees and reproducing the angular preferential attachment from the GPA model with that particular ordering. At the end of the process, we would obtain a set of $N$ nodes with hidden degrees power-law distributed with exponent $\gamma$ and the same angular distribution as the GPA model for that value of $\gamma$. However, if we then connected every pair of nodes with the probabilities given by Eq. (1), degrees and hidden degrees would not be proportional. The reason for this deviation from the usual behaviour of the model is that the proportionality between hidden and observed degrees requires a homogeneous angular distribution [14], a condition that is not fulfilled here by construction; hidden degrees must therefore depend on the spatial distribution of nodes as well. In the following subsection, we address this issue. We explore the heterogeneous regime of the $\mathbb{S}^1$ model and show that it is capable of generating networks with power-law degree distributions, high clustering and soft communities.

Geometric Preferential Attachment in the $\mathbb{S}^1$ model
Fig. 1. Geometric layout of the networks generated by the $\mathbb{S}^1$ model with Geometric Preferential Attachment. In all cases, $N = 1000$ and $\beta = 2.5$. Every column corresponds to a value of $\gamma$ and every row to a value of $\Lambda$. As in Ref. [12], soft communities emerge for low values of the initial attractiveness $\Lambda$. To make the figure clearer, every node's target degree is represented as a radial coordinate $r_i = R - 2\ln\left(k_i^{\mathrm{tar}}/k_N^{\mathrm{tar}}\right)$, where $k_N^{\mathrm{tar}}$ is the smallest target degree and $R = 2\ln\left(N/\left(\pi\mu (k_N^{\mathrm{tar}})^2\right)\right)$. When the hidden degrees are used instead of the target degrees, this mapping constitutes the isomorphism between the $\mathbb{S}^1$ model and the $\mathbb{H}^2$ model in hyperbolic space [14,6,7,8].

From the previous discussion, we see that hidden degrees and angles are considerably entangled in the modelling of geometric networks with soft communities. In this context, the $\mathbb{S}^1$ model requires the following steps:

1. Assigning angular coordinates: Angular coordinates are assigned according to the Geometric Preferential Attachment, which requires an ordering. Therefore, assign a label $i = 1, \ldots, N$ to every node. Then, for every value of $i$ from 1 to $N$:
a. Sample $i$ candidate angular positions $\varphi_j$, $j = 1, \ldots, i$, for node $i$ from $U(0, 2\pi)$.
b. For every candidate position, define the attractiveness $A(\varphi_j)$ of candidate $j$ as the number of nodes $s$ with an already defined angular position, that is, with $s < i$, at angular distance $\Delta\theta_{is}$ < 2/(s …
c. Assign to node $i$ the angular coordinate of candidate $j$, i.e. set $\theta_i = \varphi_j$, with probability

$$\Pi(\varphi_j) = \frac{A(\varphi_j) + \Lambda}{\sum_{k=1}^{i} \left( A(\varphi_k) + \Lambda \right)}.$$

The initial attractiveness $\Lambda \ge 0$ is a parameter that sets the strength of the geometric preferential attachment. For very high values of $\Lambda$, all candidate angles become equally likely, so the resulting angular distribution is homogeneous (see Fig. 1).
This process generates a distribution of nodes on the circle analogous to the angular distribution of the GPA model. However, notice that, in the GPA model, connections are established at the same time as positions are decided, whereas in the steps above no connections have yet been made.

2. Assigning hidden degrees: Once every node has a defined angular position, we need to determine its hidden degree such that the resulting observed degrees, that is, the degrees after the connections have actually been established, are power-law distributed with exponent $\gamma$. As mentioned earlier, we must take into account that the spatial distribution is heterogeneous (especially for low values of $\Lambda$). We propose the following method:
a. Generate a set of $N$ target degrees $k^{\mathrm{tar}}$ from a power-law distribution with exponent $\gamma$. Order the target degrees such that $k_1^{\mathrm{tar}} > k_2^{\mathrm{tar}} > \cdots > k_N^{\mathrm{tar}}$.
b. Assign to every node $i$ a hidden degree $\kappa_i$, initially set to $\kappa_i = k_i^{\mathrm{tar}}$.
c. Repeat $N$ times:
i. Choose some node $i$ randomly.
ii. Compute the expected degree $\bar{k}_i$ of node $i$ as $\bar{k}_i = \sum_{j \neq i} p_{ij}$, with $p_{ij}$ given by Eq. (1).
iii. Correct the value of $\kappa_i$ so that the expected degree $\bar{k}_i$ matches the target degree $k_i^{\mathrm{tar}}$. We propose to reset $\left| \kappa_i + \left( k_i^{\mathrm{tar}} - \bar{k}_i \right) \delta \right| \to \kappa_i$, where $\delta$ is a random variable drawn from the uniform distribution $U(0, 0.1)$. Other numerical methods could be used to the same end.
d. Compute all relative deviations $\epsilon_i = \left| k_i^{\mathrm{tar}} - \bar{k}_i \right| / k_i^{\mathrm{tar}}$. If $\max_i \{\epsilon_i\} < \eta$, where $\eta$ is a tolerance which we set to $\eta = 10^{-2}$, continue to step 3. Otherwise, go back to step 2c.

3. Generating the network with the $\mathbb{S}^1$ model: In this last step, we simply connect every pair of nodes with the probabilities in Eq. (1). Since step 2 assigns a hidden degree to every node such that its expected degree matches its target degree, the resulting observed degrees in the network must be similar to the target degrees as well.

Figure 1 shows the networks generated by the model for different values of $\gamma$ and $\Lambda$. As in Ref. [12], the angular distribution has an evident soft-community structure for low values of $\Lambda$, whereas for high values of the initial attractiveness, the angular density resembles that of the homogeneous $\mathbb{S}^1$ model. Despite the considerable differences in the similarity-space distances between nodes for different values of $\Lambda$, the displayed networks are extremely similar from a topological perspective (see Fig. 2), with almost indistinguishable degree distributions and clustering spectra. Notice that step 2 is not specifically designed for the GPA angular distribution.$^{2}$ In principle, it should be valid for other distributions as well.
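Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The angular threshold is left as a caller-supplied function `window(s, i)` and the example value `2 / sqrt(s * i)` is purely illustrative, because the threshold expression is incomplete in the text. The attachment weights are assumed proportional to $A(\varphi_j) + \Lambda$, the connection probability is assumed to be $1/(1 + (d_{ij}/(\mu\kappa_i\kappa_j))^{\beta})$, and the `max_sweeps` cap, the uniform fallback when all weights vanish, and the cutoff on the target degrees are additions made here so that the sketch always terminates.

```python
import numpy as np


def angular_gap(a, b):
    """Shortest angular separation (radians) between angles a and b."""
    return np.pi - np.abs(np.pi - np.abs(a - b))


def assign_angles(n, lam, window, rng):
    """Step 1: geometric preferential attachment on the circle."""
    theta = np.empty(n)
    for i in range(1, n + 1):                        # node labels 1..N
        phi = rng.uniform(0.0, 2.0 * np.pi, size=i)  # i candidate positions
        s = np.arange(1, i)                          # nodes already placed
        gaps = angular_gap(phi[:, None], theta[None, : i - 1])
        # A(phi_j): placed nodes closer than the assumed window(s, i)
        attractiveness = (gaps < np.asarray(window(s, i))).sum(axis=1)
        weights = attractiveness + float(lam)
        total = weights.sum()
        # Assumption: uniform choice when every weight is zero (lam = 0)
        probs = weights / total if total > 0 else np.full(i, 1.0 / i)
        theta[i - 1] = rng.choice(phi, p=probs)
    return theta


def expected_degree(i, kappa, theta, mu, beta):
    """Sum over j != i of the assumed connection probability p_ij."""
    radius = len(theta) / (2.0 * np.pi)
    d = radius * angular_gap(theta[i], theta)
    p = 1.0 / (1.0 + (d / (mu * kappa[i] * kappa)) ** beta)
    p[i] = 0.0                                       # no self-loops
    return p.sum()


def max_relative_deviation(k_tar, kappa, theta, mu, beta):
    """Largest |k_tar_i - expected degree_i| / k_tar_i over all nodes."""
    k_tar = np.asarray(k_tar, dtype=float)
    k_bar = np.array([expected_degree(i, kappa, theta, mu, beta)
                      for i in range(len(kappa))])
    return np.max(np.abs(k_tar - k_bar) / k_tar)


def fit_hidden_degrees(k_tar, theta, mu, beta, rng, eta=1e-2, max_sweeps=1000):
    """Step 2: correct hidden degrees until expected degrees match targets."""
    k_tar = np.asarray(k_tar, dtype=float)
    kappa = k_tar.copy()                             # step 2b
    for _ in range(max_sweeps):                      # cap is an assumption
        for _ in range(len(kappa)):                  # step 2c
            i = rng.integers(len(kappa))
            delta = rng.uniform(0.0, 0.1)
            k_bar = expected_degree(i, kappa, theta, mu, beta)
            kappa[i] = abs(kappa[i] + (k_tar[i] - k_bar) * delta)
        if max_relative_deviation(k_tar, kappa, theta, mu, beta) < eta:
            break                                    # step 2d satisfied
    return kappa


# Illustrative run (all parameter values are assumptions)
rng = np.random.default_rng(0)
n, gamma, beta, mu = 100, 2.5, 2.5, 0.08
window = lambda s, i: 2.0 / np.sqrt(s * i)           # illustrative only
theta = assign_angles(n, lam=1.0, window=window, rng=rng)
k_tar = 2.0 * (1.0 - rng.random(n)) ** (-1.0 / (gamma - 1.0))
k_tar = np.sort(np.minimum(k_tar, 20.0))[::-1]       # capped, descending
kappa = fit_hidden_degrees(k_tar, theta, mu, beta, rng, max_sweeps=200)
print(max_relative_deviation(k_tar, kappa, theta, mu, beta))
```

Because each correction is damped by $\delta \le 0.1$, the loop moves every hidden degree only a small fraction of the remaining error at a time; this keeps the coupled updates stable at the cost of many sweeps. Step 3 then reduces to one Bernoulli trial per pair of nodes with the fitted hidden degrees.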

Discussion
There is abundant evidence of the geometric origin of many properties of complex networks, not only regarding their topology [6,7,15,8,16], but also their weighted organisation [17]. The field of network geometry has therefore attracted much attention recently, and the $\mathbb{S}^1$ model is one of its cornerstones. On the one hand, it provides an intuitive and plausible explanation for clustering in real networks by introducing the concept of similarity space. On the other hand, it allows geometric maps of real networks to be built by embedding them. These maps are remarkably meaningful, to the extent of predicting symmetries in real systems [16]. In addition, they are very useful: they can be used to navigate the network efficiently [6], to detect communities [6,8] or even to construct smaller-scale replicas of real networks for efficiently testing dynamics on real networks [16]. So far, the $\mathbb{S}^1$ model has only been studied under several simplifying premises, like power-law degree distributions or independent hidden variables, and it has nevertheless been able to explain many observed phenomena in complex networks. However, it can be exploited beyond these assumptions, since the correlation between hidden degrees and angles might clarify many more topological features of real-world networks. This work opens the path towards such a line of study by showing that the model does not require those simplifying assumptions, as it is capable of generating topologically similar networks with highly correlated hidden variables.
Moreover, the results presented in this paper might also have an impact on the embedding of real networks. Typically, the likelihood maximisation procedure only seeks the best angular coordinates, whereas hidden degrees are considered to be a function of degree only and known from the start [6,7]. This hypothesis is a direct consequence of the aforementioned simplifying assumptions usually made in the $\mathbb{S}^1$ model. Nevertheless, as we have shown in this work, a heterogeneous angular distribution requires correcting hidden degrees in such a way that they depend on the hidden variables of all other nodes. This is an important result, since it suggests that also inferring hidden degrees via likelihood maximisation might noticeably improve the quality of embeddings of real-world networks with community structure.