Generalized percolation in random directed networks

We develop a general theory for percolation in directed random networks with arbitrary two-point correlations and bidirectional edges, that is, edges pointing in both directions simultaneously. These two ingredients alter the previously known scenario and open new perspectives on percolation phenomena. Equations for the percolation threshold and the sizes of the giant components are derived in the most general case. We also present simulation results for a particular example of an uncorrelated network with bidirectional edges, which confirm the theoretical predictions.


I. INTRODUCTION
A wide class of real systems of many interacting elements can be mapped into graphs or networks. Under this approach, vertices or nodes of the network represent the elements of the system whereas edges or links among them stand for interactions between different elements. This mapping has triggered a huge number of works and a surge of interest in the field of complex networks that has led to a general framework within which to analyze their topology as well as the dynamical processes running on top of them [1,2,3]. In many cases, these dynamical processes are directly related to functionality and involve some kind of transport or traffic flow. Furthermore, the very existence of those networks could be naturally explained as a direct consequence of the need for communication among their constituents. The Internet and the World Wide Web are clear examples [4]. In order to preserve functionality, networks characterized by transport processes must be connected, that is, a path must exist between any pair of nodes or, at least, there must exist a macroscopic portion of vertices (a giant component) able to communicate. In this context, percolation theory appears as an indispensable tool to analyze the conditions under which such connected structures emerge in large networks.
The general theory of percolation phenomena for uncorrelated undirected random networks was first developed by Newman et al. [6,7] after previous results in [8,9]. The phase transition at which the giant component forms was well characterized, and the size distribution of finite connected components below and above the critical point was calculated as well. Some further refinements were needed in order to approach real networks. Hence, correlations between degrees of neighboring vertices were taken into account in [10,11,12] and growing networks were treated by Dorogovtsev et al. [13] and Krapivsky et al. [14].
To go further, directedness must be taken into consideration since some of the most interesting real networks present asymmetric interactions. Notable examples are the World Wide Web [15], citation networks [16,17], email networks [18], gene regulatory networks [19], and metabolic networks [20]. Percolation theory for purely directed networks was first developed by Newman et al. [6,7] and later by Dorogovtsev et al. [21]. In the particular case of scale-free degree distributions a number of interesting specific results were obtained [22]. While allowing for general correlations between the incoming and the outgoing number of edges of a given vertex, all these studies refer to networks with no degree correlations between neighboring vertices. Furthermore, the theory is restricted to the class of directed networks with no bidirectional edges, although this class of edges is ubiquitous in real directed networks (see [23] and references therein). All this limits the applicability of the theory to real networks, since bidirectional edges and degree correlations are common to all real directed networks. In this paper we present a general theory for percolation in directed random networks with general two-point correlations and bidirectional links. We will show that the presence of bidirectional edges and degree correlations modifies the picture previously drawn [6,7,21,22] in a nontrivial way, opening new scenarios for percolation phenomena.
The paper is organized as follows. In section II, we review concepts, definitions, and the main results previously obtained in the analysis of percolation in random networks. In section III, we develop the general theory for directed networks with bidirectional edges and arbitrary degree correlations and we show how the theories for undirected and purely directed random networks stand as particular cases. Section IV is devoted to the uncorrelated case, which deserves special attention as a null or benchmark model. The relative sizes of the giant components are computed and the explicit expression for the percolation condition is provided. The well-known critical points signalling the phase transition in undirected and purely directed uncorrelated networks are recovered as limiting cases. A practical application of the formalism for uncorrelated networks is presented in section V, where the transformation from a purely directed network to a purely bidirectional one is studied. For power-law degree distributions, the transformation is shown to undergo a nontrivial phase transition. Simulation results support this prediction, showing excellent agreement with numerical solutions of the theoretical equations. Finally, we conclude with a brief summary of results in section VI.

II. PERCOLATION IN UNCORRELATED PURELY DIRECTED NETWORKS
The topological structure of directed networks is more complex than that of undirected graphs. The edges attached to each node in a directed network are differentiated into incoming and outgoing. Usually, no bidirectional links are considered, so that each vertex has two coexisting degrees, $k_i$ and $k_o$, which sum up to the total degree $k = k_i + k_o$. Hence, the degree distribution for a directed network is a joint degree distribution $P(k_i, k_o)$ of in- and out-degrees, which in general may be correlated. The bidirectional edge symmetry of undirected networks is thus completely broken in purely directed ones, with implications down to the level of percolation properties. The giant connected component in undirected graphs becomes internally structured in the case of directed networks, so that four different types of giant components may arise. Whether giant or not, these components are characterized as follows (according to definitions in [21]):
• The weakly connected component, WCC, the percolative cluster in undirected graphs. In the WCC, every vertex is reachable from every other, provided that the directed nature of the edges is ignored.
• The strongly connected component, SCC, the set of vertices in which every vertex is reachable from every other by a directed path.
• The in-component, IN, all vertices from which the SCC is reachable by a directed path.
• The out-component, OUT, all vertices which are reachable from the SCC by a directed path.
Notice that, with these definitions, the SCC is included in both the IN and OUT components. The percolation theory developed by Newman et al. [6,7] and Dorogovtsev et al. [21] for directed graphs with arbitrary degree distribution and statistically uncorrelated vertices has shown that there are two phase transitions: the one at which the giant weakly connected component (GWCC) appears, and the one at which the other three giant components appear simultaneously: the giant in-component (GIN), the giant out-component (GOUT), and the giant strongly connected component (GSCC) as the intersection of the other two. The first phase transition corresponds in fact to the standard phase transition in an undirected random graph with arbitrary degree sequence and statistically uncorrelated vertices. The condition for this phase transition was first given by Molloy and Reed [8] and reads
$$\sum_{k} k(k-2)P(k) \ge 0. \tag{1}$$
When this condition is fulfilled, and although some disconnected finite components may remain, the GWCC emerges. It contains a macroscopic portion of the vertices in the network, which are able to communicate with each other regardless of the orientation of their links. The second phase transition is characteristic of directed networks. The critical point [7]
$$\sum_{k_i,k_o} (2k_i k_o - k_i - k_o)P(k_i,k_o) = 0 \tag{2}$$
marks the first simultaneous appearance of the other three giant components: the GSCC, the GIN, and the GOUT, as well as other secondary structures such as tubes or tendrils [15]. The efforts to understand how this landscape is modified by correlations between degrees of neighboring vertices have focused exclusively on the percolation analysis of undirected networks [10,11,12]. Assortative mixing by degree, observed in the vast majority of real social networks, has been found to favor percolation in the sense that the giant component appears at a lower edge density. On the contrary, disassortative correlations, characteristic of technological and biological networks, hinder the formation of the giant component even if the second moment of the degree distribution diverges.
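Both threshold conditions can be evaluated directly from an observed degree sequence. The following is a minimal sketch, assuming that the directed critical condition of [7] has the form $\sum_{k_i,k_o}(2k_ik_o - k_i - k_o)P(k_i,k_o) = 0$; the function names are hypothetical and chosen only for illustration.

```python
def molloy_reed_sum(degrees):
    """Estimate sum_k k(k-2)P(k) from a list of total degrees, one per vertex."""
    n = len(degrees)
    return sum(k * (k - 2) for k in degrees) / n


def directed_criterion_sum(in_out_degrees):
    """Estimate sum (2*k_i*k_o - k_i - k_o) P(k_i, k_o) from (k_i, k_o) pairs.

    Assumed form of the directed critical condition; a positive value
    indicates the phase with giant directed components.
    """
    n = len(in_out_degrees)
    return sum(2 * ki * ko - ki - ko for ki, ko in in_out_degrees) / n


if __name__ == "__main__":
    # A ring (all degrees 2) and a directed cycle (all degrees (1, 1))
    # both sit exactly at the marginal point of their respective criteria.
    print(molloy_reed_sum([2] * 10))
    print(directed_criterion_sum([(1, 1)] * 10))
```

The estimates replace $P(k)$ by empirical frequencies, so for finite samples of heavy-tailed distributions they can fluctuate strongly.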

III. GENERALIZED PERCOLATION
In this section we develop a general theory for percolation in directed random networks with arbitrary two-point degree correlations and bidirectional edges. We term our networks "random" (or Markovian) because, apart from purely local properties and two-point correlations being fixed, networks are maximally random. This implies that the whole topology is encoded into the degree distribution $P(\mathbf{k}) \equiv P(k_i, k_o, k_b)$ and the transition probabilities $P_i(\mathbf{k}'|\mathbf{k})$, $P_o(\mathbf{k}'|\mathbf{k})$, and $P_b(\mathbf{k}'|\mathbf{k})$, which measure the probability of reaching a vertex of degree $\mathbf{k}'$ leaving from a vertex of degree $\mathbf{k}$ using an incoming, outgoing, or bidirectional edge, respectively. Notice that we have used the notation $\mathbf{k} \equiv (k_i, k_o, k_b)$ and that we consider all three kinds of edges as independent entities. These transition probabilities are related through the following degree detailed balance conditions [24]
$$k_o P(\mathbf{k}) P_o(\mathbf{k}'|\mathbf{k}) = k'_i P(\mathbf{k}') P_i(\mathbf{k}|\mathbf{k}') \tag{3}$$
and
$$k_b P(\mathbf{k}) P_b(\mathbf{k}'|\mathbf{k}) = k'_b P(\mathbf{k}') P_b(\mathbf{k}|\mathbf{k}'). \tag{4}$$
These conditions ensure that any edge leaving a vertex points to another vertex or, in other words, that the network is closed. Notice that this may not be the case in situations where information is incomplete and the out-degree of a vertex is known but not the neighbors at the end of these edges.
To analyze the percolation properties for this class of networks, it is necessary to calculate the joint distribution $G(s, s')$ of the number of vertices (plus itself), $s$, that are reachable from a given vertex and the number of vertices (plus itself), $s'$, that can reach that vertex. Notice that we can leave a vertex using an outgoing or bidirectional edge and we can arrive at a vertex through an incoming or bidirectional edge. These sets of vertices are called the out- and in-components of a given vertex, respectively. Analogously, we can define the marginal probabilities $G_o(s)$ and $G_i(s)$ for the number of vertices reachable from a given one and the number of vertices that can reach it, respectively. These three functions contain the information on the sizes of the different giant components of the network. Notice that, if the network is above the percolation threshold, $G_o(s) \neq \sum_{s'} G(s, s')$ and $G_i(s) \neq \sum_{s'} G(s', s)$. The probability to belong to a finite component is smaller than one in this situation, which implies that the remainder corresponds to the probability to belong to an infinite component, that is, the giant component. Function $G_o(s)$, for instance, measures the probability that a given vertex has a finite out-component of size $s$ regardless of the size of its in-component, which can be finite or not. On the other hand, function $\sum_{s'} G(s, s')$ only accounts for finite components. Therefore, the relative sizes of the different giant components of the network can be written as in Eqs. (5) and (6), where the GOUT is thought of as the set of vertices with an infinite in-component, the GIN as the set of vertices with an infinite out-component, and the GSCC as the set of vertices with infinite in- and out-components simultaneously.
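One explicit form of these size relations follows from the definitions above by inclusion-exclusion: a vertex belongs to the GSCC when neither its in-component nor its out-component is finite. The sketch below is a reconstruction under that reading; the symbols $S_{\mathrm{GIN}}$, $S_{\mathrm{GOUT}}$, and $S_{\mathrm{GSCC}}$ are notation assumed here for the relative sizes.

```latex
% Assumed notation: S_X is the fraction of vertices in giant component X.
\begin{align}
S_{\mathrm{GIN}}  &= 1 - \sum_{s} G_o(s), \qquad
S_{\mathrm{GOUT}}  = 1 - \sum_{s} G_i(s), \\
S_{\mathrm{GSCC}} &= 1 - \sum_{s} G_o(s) - \sum_{s} G_i(s) + \sum_{s,s'} G(s,s').
\end{align}
```

The last term restores the vertices whose in- and out-components are both finite, which the two preceding sums subtract twice.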
In heterogeneous networks, this set of probabilities depends on the degree of the vertex from which we start the count. Therefore, functions $G(s, s')$, $G_o(s)$, and $G_i(s)$ are to be expressed as
$$G(s, s') = \sum_{\mathbf{k}} P(\mathbf{k})\, G(s, s'|\mathbf{k}) \tag{7}$$
and
$$G_o(s) = \sum_{\mathbf{k}} P(\mathbf{k})\, G_o(s|\mathbf{k}), \qquad G_i(s) = \sum_{\mathbf{k}} P(\mathbf{k})\, G_i(s|\mathbf{k}), \tag{8}$$
where functions $G$ inside the summations have the same meaning as the original ones under the condition of starting from a vertex of degree $\mathbf{k}$. Thus, the complete solution of the problem requires finding the conditional probabilities $G_i(s|\mathbf{k})$, $G_o(s|\mathbf{k})$, and $G(s, s'|\mathbf{k})$. In the following subsections we will show how to compute them in the general correlated case.

A. In/Out component
To proceed further, we first focus our attention on the out-component size distribution of a vertex of degree $\mathbf{k}$, $G_o(s|\mathbf{k})$. Starting from a vertex of degree $\mathbf{k} = (k_i, k_o, k_b)$, we can leave it using the $k_o$ outgoing edges and the $k_b$ bidirectional ones. Then, the number of reachable vertices will be the sum of the reachable vertices of each of the $k_o + k_b$ neighbors plus 1. In mathematical terms, this translates into
$$G_o(s|\mathbf{k}) = \sum_{s_1,\dots,s_{k_o+k_b}} \delta_{s,\,1+s_1+\dots+s_{k_o+k_b}} \prod_{j=1}^{k_o} g_o(s_j|\mathbf{k}) \prod_{j=k_o+1}^{k_o+k_b} g^b_o(s_j|\mathbf{k}), \tag{9}$$
where $g_o(s|\mathbf{k})$ [$g^b_o(s|\mathbf{k})$] is the distribution of the number of reachable vertices from a vertex given that we have arrived at it from another source vertex of degree $\mathbf{k}$ following one of its outgoing (bidirectional) edges. In writing Eq. (9), we have used the fact that random networks are locally tree-like. Equations of the type of (9) find in the discrete Laplace space their natural representation in terms of the generating function formalism. Using this formalism, these equations simplify enormously and can be manipulated very easily. Within this formalism, Eq. (9) simplifies as
$$\hat{G}_o(z|\mathbf{k}) = z\,[\hat{g}_o(z|\mathbf{k})]^{k_o}\,[\hat{g}^b_o(z|\mathbf{k})]^{k_b}, \tag{10}$$
where we have adopted the notation $\hat{f}(z) \equiv \sum_s f(s) z^s$.
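In what follows, we will work in the discrete Laplace space, using the generating function formalism. Functions $\hat{g}_o$ and $\hat{g}^b_o$ satisfy the following set of coupled equations
$$\begin{aligned}
\hat{g}_o(z|\mathbf{k}) &= z \sum_{\mathbf{k}'} P_o(\mathbf{k}'|\mathbf{k})\,[\hat{g}_o(z|\mathbf{k}')]^{k'_o}\,[\hat{g}^b_o(z|\mathbf{k}')]^{k'_b}, \\
\hat{g}^b_o(z|\mathbf{k}) &= z \sum_{\mathbf{k}'} P_b(\mathbf{k}'|\mathbf{k})\,[\hat{g}_o(z|\mathbf{k}')]^{k'_o}\,[\hat{g}^b_o(z|\mathbf{k}')]^{k'_b-1}.
\end{aligned} \tag{11}$$
The term $k'_b - 1$ in the second line of Eqs. (11) comes from the fact that one of the bidirectional edges has already been used to reach the vertex of degree $\mathbf{k}'$ and, thus, cannot be used again to leave it. Notice that this restriction is not needed in the first line of Eqs. (11) since, in this case, we have reached the vertex using an outgoing edge of the source vertex. The set of equations (11) is closed for the functions $\hat{g}$. Its solution for $z = 1$ will allow us to compute the size of the GIN component through Eqs. (5), (8), and (10). The trivial solution of Eqs. (11) is $\hat{g}_o(1|\mathbf{k}) = 1$, $\hat{g}^b_o(1|\mathbf{k}) = 1$, corresponding to the only case without a giant component. Therefore, the network will percolate at the directed level when this trivial solution becomes unstable. To analyze the stability of this solution we use the approach adopted in [12] and look for solutions of the form $\hat{g}_o(1|\mathbf{k}) = 1 - \epsilon x(\mathbf{k})$, $\hat{g}^b_o(1|\mathbf{k}) = 1 - \epsilon y(\mathbf{k})$ in the limit $\epsilon \to 0$. Replacing these expressions in Eqs. (11) and taking the limit $\epsilon \to 0$, we obtain a linear homogeneous system for $x(\mathbf{k})$ and $y(\mathbf{k})$ whose coefficients define the matrix $C^o_{\mathbf{k}\mathbf{k}'}$. The stability of the solution $\hat{g}_o(1|\mathbf{k}) = 1$, $\hat{g}^b_o(1|\mathbf{k}) = 1$ is thus determined by the maximum eigenvalue of the matrix $C^o_{\mathbf{k}\mathbf{k}'}$, $\Lambda_m$. When $\Lambda_m \le 1$ this solution is stable and the GIN component does not exist. In contrast, when $\Lambda_m > 1$ a nontrivial solution of the set of equations (11) exists and the GIN component emerges. The analysis for the in-component of individual vertices is identical to that of the out-component if, in Eqs. (10) and (11), the outgoing quantities are replaced by the corresponding incoming ones ($k_o \to k_i$, $P_o \to P_i$, $\hat{g}_o \to \hat{g}_i$, $\hat{g}^b_o \to \hat{g}^b_i$). In this case, the onset of the GOUT component is controlled by the analogous matrix $C^i_{\mathbf{k}\mathbf{k}'}$. As before, the condition for the appearance of the GOUT component is ruled by the maximum eigenvalue of the matrix $C^i_{\mathbf{k}\mathbf{k}'}$.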
At first glance, one could be tempted to conclude that, since matrices $C^i_{\mathbf{k}\mathbf{k}'}$ and $C^o_{\mathbf{k}\mathbf{k}'}$ are different, their eigenvalues are also different, leading to different phase transitions for the appearance of the GIN and GOUT components. However, it can be proved that the eigenvalue spectra of both matrices are identical and, therefore, the GIN and the GOUT components appear simultaneously.
It is illustrative to recover from this formalism the results for the purely undirected and the purely directed cases. In undirected networks, only bidirectional edges are present, $k_i \equiv 0$ and $k_o \equiv 0$, and both matrices reduce to a single one involving only $P_b$, recovering the results in [12]. In the case of purely directed networks, $k_b \equiv 0$ and the matrices involve only the directed transition probabilities $P_o$ and $P_i$. This result generalizes the percolation theory for purely directed random networks developed in [6,7,21] to the case of arbitrary degree-degree correlations.
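The linear stability analysis of this subsection can be made explicit. The sketch below is a reconstruction from the verbal description of Eqs. (11), not a transcription of the original displays: it uses the first-order expansion $[1-\epsilon x]^{k'_o}[1-\epsilon y]^{k'_b} \simeq 1 - \epsilon(k'_o x + k'_b y)$ together with the normalization of the transition probabilities. The block layout of the matrices is an assumption made for illustration.

```latex
% Linearized equations around the trivial solution (assumed reconstruction).
\begin{align}
x(\mathbf{k}) &= \sum_{\mathbf{k}'} P_o(\mathbf{k}'|\mathbf{k})
                 \left[ k'_o\, x(\mathbf{k}') + k'_b\, y(\mathbf{k}') \right], \\
y(\mathbf{k}) &= \sum_{\mathbf{k}'} P_b(\mathbf{k}'|\mathbf{k})
                 \left[ k'_o\, x(\mathbf{k}') + (k'_b - 1)\, y(\mathbf{k}') \right].
\end{align}
% Corresponding 2x2 block matrices for the out- and in-problems.
\begin{equation}
C^{o}_{\mathbf{k}\mathbf{k}'} =
\begin{pmatrix}
k'_o P_o(\mathbf{k}'|\mathbf{k}) & k'_b P_o(\mathbf{k}'|\mathbf{k}) \\
k'_o P_b(\mathbf{k}'|\mathbf{k}) & (k'_b-1) P_b(\mathbf{k}'|\mathbf{k})
\end{pmatrix},
\qquad
C^{i}_{\mathbf{k}\mathbf{k}'} =
\begin{pmatrix}
k'_i P_i(\mathbf{k}'|\mathbf{k}) & k'_b P_i(\mathbf{k}'|\mathbf{k}) \\
k'_i P_b(\mathbf{k}'|\mathbf{k}) & (k'_b-1) P_b(\mathbf{k}'|\mathbf{k})
\end{pmatrix}.
\end{equation}
% Limiting cases.
\begin{equation}
k_i \equiv k_o \equiv 0:\quad C_{kk'} = (k'-1)\,P(k'|k),
\qquad
k_b \equiv 0:\quad C^{o}_{\mathbf{k}\mathbf{k}'} = k'_o P_o(\mathbf{k}'|\mathbf{k}),\;\;
C^{i}_{\mathbf{k}\mathbf{k}'} = k'_i P_i(\mathbf{k}'|\mathbf{k}).
\end{equation}
```

In the undirected limit the surviving block is the usual branching matrix for correlated undirected networks, with $k \equiv k_b$ and $P \equiv P_b$.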

B. Strongly connected component
The analysis of the GSCC requires a more careful development of the ideas introduced in the previous section. In this case, the expression for the joint distribution of the in- and out-components of a vertex of degree $\mathbf{k}$ involves the function $g^b(s, s'|\mathbf{k})$, which is defined analogously to $g^b_o(s|\mathbf{k})$ and $g^b_i(s|\mathbf{k})$. It is worth mentioning that, if the network contains bidirectional edges, $G(s, s'|\mathbf{k}) \neq G_o(s|\mathbf{k})\,G_i(s'|\mathbf{k})$ because, in this case, such edges are common to the in- and out-components of the vertex. Using the same reasoning as in the previous section, we can write down a closed equation for the joint distribution $g^b(s, s'|\mathbf{k})$. This equation, together with Eq. (10) and Eqs. (11), constitutes the complete solution of the problem. The solutions for arbitrary values of $z$ and $z'$ allow one to find the distribution of the sizes of the in- and out-components of single vertices, whereas the nontrivial solution for $z = 1$ and $z' = 1$, combined with Eqs. (5), (6), (7), and (8), provides a method to calculate the sizes of the different giant components of the network.
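A plausible explicit form of the two relations described in this subsection is sketched below. It is a reconstruction under the locally tree-like assumption, with the two-variable transform $\hat{f}(z,z') \equiv \sum_{s,s'} f(s,s') z^{s} z'^{s'}$ introduced here as assumed notation: outgoing edges feed only the out-component, incoming edges only the in-component, and bidirectional edges feed both.

```latex
% Assumed reconstruction of the joint generating functions.
\begin{align}
\hat{G}(z,z'|\mathbf{k}) &= z z'\,
  [\hat{g}_o(z|\mathbf{k})]^{k_o}\,
  [\hat{g}_i(z'|\mathbf{k})]^{k_i}\,
  [\hat{g}^b(z,z'|\mathbf{k})]^{k_b}, \\
\hat{g}^b(z,z'|\mathbf{k}) &= z z' \sum_{\mathbf{k}'} P_b(\mathbf{k}'|\mathbf{k})\,
  [\hat{g}_o(z|\mathbf{k}')]^{k'_o}\,
  [\hat{g}_i(z'|\mathbf{k}')]^{k'_i}\,
  [\hat{g}^b(z,z'|\mathbf{k}')]^{k'_b-1}.
\end{align}
```

As a consistency check, below the percolation threshold $\hat{g}_i(1|\mathbf{k}) = 1$, and setting $z' = 1$ in the second relation gives back the equation for $\hat{g}^b_o$ described in the previous subsection.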

IV. UNCORRELATED NETWORKS
As mentioned before, real networks are usually correlated in the sense that the degrees of pairs of connected vertices are correlated random quantities. Nevertheless, uncorrelated networks are equally useful as benchmarks, or null models, to test topological and dynamical properties and compare them to the results obtained in correlated networks.
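When two-point correlations are absent, the transition probabilities become independent of the degree of the source vertex. In this situation, after some elementary algebra, we obtain
$$P_i(\mathbf{k}'|\mathbf{k}) = \frac{k'_o P(\mathbf{k}')}{\langle k_i \rangle}, \qquad P_o(\mathbf{k}'|\mathbf{k}) = \frac{k'_i P(\mathbf{k}')}{\langle k_i \rangle} \tag{21}$$
and
$$P_b(\mathbf{k}'|\mathbf{k}) = \frac{k'_b P(\mathbf{k}')}{\langle k_b \rangle}, \tag{22}$$
where we have made use of the fact that $\langle k_o \rangle = \langle k_i \rangle$. Using these expressions, the set of equations (11) reduces to the set of transcendental equations (23), written in terms of $\hat{P}(x, y, z)$, the generalized generating function of the degree distribution, that is,
$$\hat{P}(x, y, z) \equiv \sum_{k_i,k_o,k_b} P(k_i, k_o, k_b)\, x^{k_i} y^{k_o} z^{k_b}. \tag{24}$$
The condition for the existence of a nontrivial solution of the set of equations (23) is easily obtained using the formalism developed in the previous sections. For the uncorrelated case, the matrices $C^i_{\mathbf{k}\mathbf{k}'}$ and $C^o_{\mathbf{k}\mathbf{k}'}$ become independent of $\mathbf{k}$ and their maximum eigenvalue $\Lambda_m$ is given by Eq. (25), so that, whenever the condition $\Lambda_m \ge 1$ is fulfilled, the network is in the percolated phase. As it can be seen from Eq.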
This result is easy to understand since, when Γ ∼ 0, vertices cannot have directed and bidirectional edges simultaneously and, as a consequence, the network is composed of two isolated networks, one of them purely directed and the other one containing bidirectional edges only.
In the purely undirected case, $k_i \equiv 0$ and $k_o \equiv 0$ and we recover the well-known condition for percolation in undirected networks with given degree distribution, Eq. (1). In the case of purely directed networks, $k_b \equiv 0$ and the maximum eigenvalue reads $\Lambda_m = \langle k_i k_o \rangle / \langle k_i \rangle$, recovering Eq. (2). When $\Gamma \gg 1$, $\Lambda_m$ must be computed using Eq. (25) and, in general, will depend on the density of edges, as well as on the type of correlations between directed and bidirectional edges. In particular, a positive correlation between $k_b$ and $k_i$ or $k_o$ can strongly favor the emergence of the giant component even if the density of bidirectional edges is very small. We will illustrate this point in the example of the next section.
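The percolation condition for uncorrelated networks can be checked numerically from a list of vertex degrees. The sketch below assumes that the uncorrelated transition probabilities are proportional to the relevant degree of the destination vertex, so that the linearized problem reduces to a 2x2 matrix of moments with diagonal entries $\langle k_i k_o \rangle / \langle k_i \rangle$ and $\langle k_b (k_b - 1) \rangle / \langle k_b \rangle$ and off-diagonal product $\langle k_i k_b \rangle \langle k_o k_b \rangle / (\langle k_i \rangle \langle k_b \rangle)$. This is a reconstruction that may differ in form from Eq. (25), and the function name is hypothetical.

```python
import math


def max_eigenvalue_uncorrelated(degrees):
    """Largest eigenvalue of the assumed 2x2 moment matrix.

    degrees: list of (k_i, k_o, k_b) triples, one per vertex.
    """
    n = len(degrees)
    mean_ki = sum(ki for ki, _, _ in degrees) / n
    mean_kb = sum(kb for _, _, kb in degrees) / n
    mean_kiko = sum(ki * ko for ki, ko, _ in degrees) / n
    mean_kikb = sum(ki * kb for ki, _, kb in degrees) / n
    mean_kokb = sum(ko * kb for _, ko, kb in degrees) / n
    mean_kbkb1 = sum(kb * (kb - 1) for _, _, kb in degrees) / n

    # Diagonal entries: purely directed and purely bidirectional branching ratios.
    a = mean_kiko / mean_ki if mean_ki > 0 else 0.0
    e = mean_kbkb1 / mean_kb if mean_kb > 0 else 0.0
    # Product of the two off-diagonal entries (coupling between edge types).
    cross = 0.0
    if mean_ki > 0 and mean_kb > 0:
        cross = mean_kikb * mean_kokb / (mean_ki * mean_kb)
    # Closed-form largest eigenvalue of a 2x2 matrix with these invariants.
    return 0.5 * (a + e + math.sqrt((a - e) ** 2 + 4.0 * cross))


if __name__ == "__main__":
    print(max_eigenvalue_uncorrelated([(0, 0, 3)] * 4))  # purely undirected
    print(max_eigenvalue_uncorrelated([(2, 2, 0)] * 4))  # purely directed
    print(max_eigenvalue_uncorrelated([(1, 1, 1)] * 4))  # mixed edges
```

When the coupling term vanishes, the result is the larger of the two diagonal entries, which is the decoupled situation of two isolated networks described in the text.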
To finish the uncorrelated analysis, let us compute the relative sizes of the giant components. Let $(x_c, y_c, z_c, z'_c, z''_c)$ be the nontrivial solution of the set of Eqs. (23); then, using Eq. (5) and Eq. (6), the relative sizes of the different giant components of the network follow by evaluating the corresponding generating functions at this solution. In the next section, we present a practical application of this formalism.

V. BIDIRECTIONAL EDGES AS PERCOLATION CATALYSTS
Suppose we have a purely directed network with degree distribution $P(k_i, k_o)$ and no two-point degree correlations. Suppose also that the network is in a regime in which the GWCC exists but not the GSCC, that is, condition (1) holds for the total degree $k = k_i + k_o$ while the network lies below the critical point of Eq. (2). Now we transform the original network by converting each directed edge into a bidirectional one with probability $p$. After this transformation, we end up with a network with $pE$ bidirectional edges and $(1 - p)E$ directed ones, where $E$ is the original number of directed edges in the network. The degree distribution of the transformed network can be written, in the discrete Laplace space, as in Eq. (32). This transformation undergoes a phase transition as we increase the value of $p$. When $p = 0$, the network is purely directed and, by construction, it has no GSCC. When $p = 1$, all edges become bidirectional and, thus, the GSCC is identical to the GWCC of the original network. Therefore, at some intermediate value $p_c$, the network percolates and a GSCC emerges. The value of $p_c$ can be easily obtained using the expression for the maximum eigenvalue, Eq. (25), and the final degree distribution, Eq. (32). The most interesting case corresponds to networks with marginal degree distributions following power laws of the form $P_i(k_i) \sim k_i^{-\gamma_i}$ and $P_o(k_o) \sim k_o^{-\gamma_o}$ with $\gamma_i, \gamma_o \le 3$. When the transformation described above is performed in this type of network, some of the terms in Eq. (25) are proportional to $\langle k_i^2 \rangle$ and $\langle k_o^2 \rangle$ and, consequently, $\Lambda_m \to \infty$ in the thermodynamic limit. This, in turn, implies that $p_c = 0$, that is, even an infinitesimally small fraction of bidirectional edges suffices to percolate the network.
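The generating function of the transformed degree distribution can be sketched as follows, assuming that every directed edge is converted independently with probability $p$. Each original incoming edge then stays incoming with probability $1-p$ or becomes bidirectional with probability $p$, and likewise for outgoing edges. This is a reconstruction; the arguments $x$, $y$, $z$ are taken to be conjugate to $k_i$, $k_o$, $k_b$, respectively.

```latex
% Assumed reconstruction: P(k_i,k_o) is the original, purely directed distribution.
\begin{equation}
\hat{P}(x,y,z) = \sum_{k_i,k_o} P(k_i,k_o)\,
  \left[(1-p)\,x + p\,z\right]^{k_i}
  \left[(1-p)\,y + p\,z\right]^{k_o}.
\end{equation}
% First moments after the transformation (brackets on the right refer to the original network).
\begin{equation}
\langle k_i \rangle_p = (1-p)\,\langle k_i \rangle, \qquad
\langle k_b \rangle_p = p\,\bigl(\langle k_i \rangle + \langle k_o \rangle\bigr) = 2p\,\langle k_i \rangle.
\end{equation}
```

The first moments agree with the edge counts quoted in the text: $N \langle k_b \rangle_p / 2 = pE$ bidirectional edges and $N \langle k_i \rangle_p = (1-p)E$ directed ones, with $E = N \langle k_i \rangle$.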

A. Numerical simulation
To check the accuracy of our theory in the case of power-law marginal degree distributions, we have performed extensive numerical simulations. We first generate purely directed random networks with degree distribution of the form $P(k_i, k_o) = P_i(k_i) P_o(k_o)$ and no two-point correlations. The in- and out-degree distributions are taken to be identical and to follow a scale-free form of the type
$$P(k) = \begin{cases} P_0, & k = 0, \\ (1 - P_0)\,\dfrac{k^{-\gamma}}{\zeta(\gamma)}, & k \ge 1, \end{cases} \tag{33}$$
where $\zeta(\gamma)$ is the Riemann zeta function. To generate a purely directed random network, we use a natural extension of the configuration model [8,9,25,26], an algorithm intended to generate uncorrelated random networks with a given degree distribution. The algorithm starts by assigning to each of the $N$ vertices a number of incoming and outgoing "stubs", $k_i$ and $k_o$, randomly drawn from the distribution $P(k_i, k_o)$. The only requirement is that the total number of incoming stubs equals the total number of outgoing stubs, $\sum_j k_{i,j} = \sum_j k_{o,j}$, whenever one wishes to close the network. The network is constructed by selecting pairs of in- and out-stubs chosen uniformly at random to create directed edges, avoiding multiple, bidirected, and self-connections among vertices.
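The stub-matching step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it takes the in- and out-degree sequences as given, and it simply discards stub pairs that would create self-loops, repeated edges, or reciprocal edges, a simplification that can leave a few stubs unmatched. The function name is hypothetical.

```python
import random


def directed_configuration_model(in_deg, out_deg, seed=None):
    """Return a set of directed edges (u, v) built by random stub matching.

    in_deg[v] and out_deg[v] are the numbers of incoming and outgoing stubs
    of vertex v. Invalid pairs are dropped (assumed simplification).
    """
    if sum(in_deg) != sum(out_deg):
        raise ValueError("in- and out-stub totals must match")
    rng = random.Random(seed)
    # One list entry per stub, labelled by the vertex that owns it.
    out_stubs = [v for v, k in enumerate(out_deg) for _ in range(k)]
    in_stubs = [v for v, k in enumerate(in_deg) for _ in range(k)]
    # Shuffling one side yields a uniformly random matching of the stubs.
    rng.shuffle(in_stubs)
    edges = set()
    for u, v in zip(out_stubs, in_stubs):
        # Skip self-loops, repeated edges, and reciprocal (bidirected) pairs.
        if u == v or (u, v) in edges or (v, u) in edges:
            continue
        edges.add((u, v))
    return edges


if __name__ == "__main__":
    edges = directed_configuration_model([2, 1, 1, 2, 0, 1], [1, 2, 1, 0, 2, 1], seed=0)
    print(sorted(edges))
```

Dropping invalid pairs slightly lowers the realized degrees of high-degree vertices; re-drawing the rejected stubs is an alternative when the exact degree sequence matters.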
Once the network has been assembled, each directed edge is transformed into a bidirectional one with probability $p$, and the relative sizes of the giant components are measured. Figures 2 and 3 show simulation results for a scale-free network following Eq. (33) with exponent $\gamma = 3$ and size $N = 10^6$ as compared to the numerical solution of Eqs. (23). As can be seen, the agreement between simulation results and the theoretical prediction is excellent. The relative sizes of the GSCC and the GIN are shown as a function of the conversion probability $p$ for different values of the distribution parameter $P_0$, the probability of nodes having null in- or out-degree. Even for very small values of $p$, the GSCC and the GIN are evident. As expected, small values of $P_0$ favor the growth of bigger giant components. Thus, bidirectional edges act as percolation catalysts, favoring the appearance of a fine structure in the giant connected component. The scale-free property of many real networks is, once more, indicative of interesting features since, in this case, the presence of an infinitesimal fraction of bidirectional edges is enough to ensure percolation at the level of the directed components.
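The conversion and measurement steps can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure used to produce the figures: a bidirectional edge is represented by the pair of arcs (u, v) and (v, u), the largest strongly connected component is found by intersecting forward and backward reachable sets (quadratic in the worst case, so suitable only for small networks), and the function names are hypothetical.

```python
import random
from collections import defaultdict, deque


def convert_edges(edges, p, seed=None):
    """Make each directed edge (u, v) bidirectional with probability p."""
    rng = random.Random(seed)
    arcs = set(edges)
    for u, v in sorted(edges):
        if rng.random() < p:
            arcs.add((v, u))  # the reverse arc makes the edge bidirectional
    return arcs


def _reach(sources, adj):
    """Breadth-first set of vertices reachable from any vertex in sources."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def giant_components(n, arcs):
    """Return relative sizes (SCC, IN, OUT) built around the largest SCC."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        fwd[u].append(v)
        bwd[v].append(u)
    best, unassigned = set(), set(range(n))
    while unassigned:
        v = next(iter(unassigned))
        # The SCC of v is the set of vertices both reachable from v and reaching v.
        scc = _reach([v], fwd) & _reach([v], bwd)
        unassigned -= scc
        if len(scc) > len(best):
            best = scc
    gin = _reach(best, bwd)   # vertices from which the SCC is reachable
    gout = _reach(best, fwd)  # vertices reachable from the SCC
    return len(best) / n, len(gin) / n, len(gout) / n


if __name__ == "__main__":
    toy = {(0, 1), (1, 2), (2, 0), (3, 0), (2, 4)}
    print(giant_components(5, toy))
    print(giant_components(5, convert_edges(toy, 1.0)))
```

In line with the definitions of section II, both the IN and OUT sets returned here include the SCC itself.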

VI. CONCLUSIONS
We have derived a very general formulation of the theory of percolation in directed random networks with bidirectional edges and arbitrary two point degree correlations. Our formalism accounts for all the previously known results for percolation in purely directed and purely undirected random networks, which stand as limiting cases of our theory. The percolation threshold for the most general situation is derived as a function of the maximum eigenvalue of the connectivity matrices. In particular, for networks with no two point correlations, explicit expressions are provided depending on the first and second moments of the degree distribution P (k). In this case, we have also shown that bidirectional edges act as a catalyst for percolation, favoring the emergence of the GSCC, and for scale-free networks, only an infinitesimal fraction of bidirectional edges is needed.
After the completion of this work, we have become aware of a recent preprint [27] where the classical Susceptible-Infected-Recovered (SIR) model of epidemiology is analyzed in uncorrelated directed random networks with bidirectional edges. Since there exists a mapping between the SIR model and percolation theory, some of the results derived in that reference overlap with our results for the uncorrelated case.