Noise-Induced Polarization Switch in Single and Multiplex Complex Networks

The combination of bistability and noise is ubiquitous in complex systems, from biological to social interactions, and has important implications for their functioning and resilience. We analyze a simple three-state model for bistability in networks under varying unbiased noise. In a fully connected network increasing noise yields a collapse of bistability to an unpolarised state. In contrast, in complex networks noise can abruptly switch the polarization state in an irreversible way. When two networks are combined through increasing multiplex coupling, one is dominant and progressively imposes its state on the other, offsetting or promoting the ability of noise to switch polarization. Our results show that dynamical correlations and asymmetry in dynamical processes in networks are sufficient for allowing unbiased noise to produce abrupt irreversible transitions between extremes, which can be neutralized or enhanced by multiplex coupling.

Bistable behavior in systems which can operate in two competing modes is recurrent in a wide range of domains, from cell biology [1,2] to social dilemmas [3]. Concurrently, stochastic fluctuations are inherent to most of these systems. In bistable systems of finite size, an obvious effect of noise is to cause fluctuating transitions between the alternate states. However, noise may also impact on bistability intrinsically by unfolding a bistable phase into a monostable one. This less intuitive effect has been proved in very different systems, like autocatalytic chemical reactions [4] and decision making in financial markets [5].
Even in systems without bistability, noise can induce striking phenomena [6]. A paradigmatic example is the Brownian ratchet, where directed motion can be induced by combining Brownian motion with an asymmetric potential [7]. A discretised version of this, the "Parrondo paradox", describes the alternation of two processeseach with losing expectations -which combine to produce a winning result [8][9][10]. Such noise phenomena are a consequence of temporal correlations in the dynamics. In systems of many constituents, heterogeneity in the patterns of interaction [11] and non-trivial topological features, e.g. in complex networks, can further correlate the dynamics [12][13][14][15]. For instance, systems of coupled networks can yield monostable-bistable transitions in the mean field Ising model [16].
Here, we study the interplay between bistability, noise, and the connectivity structure of networks. We prove that, under quite loose requirements, dynamical nodenode correlations subject to unbiased noise can induce abrupt irreversible switching from bistability to monostability. The switching disappears when dynamical correlations are removed in the limit of infinite connectivity; noise can then only bring about a neutral mean state in a continuous transition. With dynamical correlations, FIG. 1: Three-state model for bistability. a. Unconditional random process. At probability ε, a node is selected and changes its state with probability 1/2 as shown. b. Conditional random process. At probabilty 1 − ε, transitions are conditional on the state of a selected neighbor, e.g. transition u → v for a given node is possible at probability p3 if the selected neighbor is in w. No transition takes place otherwise.
the system remains polarized and the neutral state can only be attained when the dynamics is entirely produced by noise. Additionally, in multiplexes with increasing coupling, there is a dominant layer which attracts the other towards its "resilient" extreme. We use a simple and general dynamical model for bistability [2,17,18] under the presence of unbiased noise which affects all constituents. We derive an analytic expression for the fixed points of the dynamics for infinite fully-connected systems, and a second-order mean field approximation for regular random graphs, which is solved numerically. Scale-free, Watts-Strogatz, and multiplex networks are investigated using numerical simulations. Model -We consider undirected networks of N nodes, where each node can be in one of three states, labeled u, v and w (Fig. 1) [20]. These states could encode unmethylated, hemi-methylated, and fully methylated CpG sites in DNA methylation, or left-wing, centrism and rightwing orientation in politics. Transitions between the extreme states u and w always visit the intermediate state v and can occur in two ways: Type-1 transitions depend only on the state of the affected node (Fig. 1a). This is analog to Brownian motion, where nodes perform random walks in the space of configurations. Type-2 transitions depend on the state of neighbors (Fig. 1b). The dynamics proceeds as follows: a node l is chosen at random. With probability ε respectively 1 − ε, a type-1 or type-2 event occurs at l. In the latter case, one of l's neighbors m is selected at random. Depending on the state of m, a transition will occur at l with the corresponding probability p i , i ∈ {1, . . . , 4} (Fig. 1b). Note that only four of the nine combinations of states l and m lead to conditional transitions, others are not affected.
Fully connected systems -We first let p 1 =p 4 =1 and p 2 =p 3 ≡ p, leaving only p ≤ 1 free. We useq for the system average of any quantity q as well as the symbol M ≡ w − u to denote net polarization. The steady state can be computed exactly for systems consisting of few sites. For two nodes connected by a link,M maintains its sign for all values of noise and p (Fig. S1). In contrast, for a three-node system forming a triangular loop, a change of average polarization occurs as ε is increased beyond a certain value ε 0 , e.g.M < 0 for ε < ε 0 andM > 0 for ε > ε 0 . We find that ε 0 increases monotonically with p such that even an infinitesimal asymmetry in the model can combine with noise to produce a flip in the polarization sign. For larger fully connected systems, this overall change of polarization is maintained. However, as N → ∞, correlations between the states of any two nodes disappear. It then becomes appropriate to describe the system's dynamics in a mean field approximation where each node is influenced only by the system averagesū,v andw. Dropping overbars, the dynamics becomeu = f (u, w) andẇ = g(u, w) [21]. In the steady state (u =ẇ = 0), we see that u = h = w = 1/3 is a fixed point for all ε and p. Further, for p = 1 and ε < (1 + 3/ √ p) −1 ≡ ε c (p) the stable fixed point polariza- . For p < 1, M must be computed numerically and looses its symmetry w.r.t. the origin. Nevertheless, ε c (p) still represents a bifurcation point for the fully-connected system below which two stable polarized fixed points exists. (Details: Supplement). Switching through dynamical correlations -How do dynamical correlations influence bistability in the presence of noise? To answer this, we explore the range of systems where dynamical correlations are important, hence connectivity is low (k N ), but system size is large (N 1). We first simulated graphs where each node has identical degree k. Dynamical correlations are markedly reflected in the phase portrait where trajectories are distorted compared to the fully-connected system, and often cross themselves or one another (Fig. S2). For p = 1, which makes the model symmetric regarding the states u and w, we yield a symmetric bifurcation diagram when initializing the system in the extremal states M = ±1, as expected. This diagram is qualitatively similar to the one obtained for the fully-connected system q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q p=.2 FIG. 2: Dynamics in networks. a, Regular random graph with ki = 3 for all nodes i and symmetric system (p = 1). Dependence on ε for infinite, fully-connected system: solid (dashed) gray lines show stable (unstable) fixed points; simulated, sparsely-connected system (bold black lines) when initializing in the two extremal statesw = 1 andū = 1, respectively. Vertical orange mark indicates the bifurcation point εc(p) = (1 + 3/ √ p) −1 = 1/4 for (i). b, Similar to (a) but for p = .2, i.e. for a non-symmetric system. Dashed vertical line marks the abrupt transition fromM < 0 toM > 0. The vertical orange line marks εc(.2) ≈ .13. Light blue symbols mark the distribution maxima obtained through the extended mean field approximation (Fig. S4) for k = 3. Arrows mark the path taken byM when starting fromM = −1 and ε = 0, then first increasing ε towards unity and then again reducing ε to zero. c, Similar to (b) but for scale-free degree distribution with α = 2.4 and k ≈ 3. Hysteresis in (c) is similar as for (b), arrows not shown. System size for simulations: N = 5000 (see also: ( Fig. 2a). However, the bifurcation point in the network, referred to as ε * (p), has now moved to considerably lower values of ε * (p = 1) ≈ .14 < ε c (p = 1) (Fig. 2a) -an effect of the dynamical correlations retained by demanding small k . Bistability has become more sensitive to noise.
Second, we introduce asymmetry (0 ≤ p < 1), which means that conditional transitions (i.e. contingent on either u or w at neighboring sites) back-and-forth between u and v are now less frequent than conditional transitions between v and w. This changes the picture qualitatively ( Fig. 2b) as compared to the symmetric case: while the system remains bistable at low noise (ε < .04), the lower branch of fixed points is more sensitive to ε and collapses when ε is increased. The system switches abruptly -i.e. changes the sign of M -to a w-dominant configuration, and remains polarized there, even as noise is increased further [22]. Reducing ε again below ε * (p) does not restore the u-dominant state, the system remains polarized atM > 0. Only when ε → 1 does the system approach the neutral state whereM = 0. Henceforth we use the terms "vulnerable" and "resilient" for the corresponding branches of the system. In contrast, in fully-connected systems, switching polarization is ruled out for any 0 ≤ p < 1 (Fig. S3b). |M | of a polarized state will decrease as ε is increased and finally collapse to the neutral state (M = 0) for sufficiently large noise ε < 1.
Third, we ask whether such abrupt switches also occur in heterogeneous graphs. We synthesized random networks with algebraic degree distributions P (k) ∝ k −α , α ≈ 2.4, which gave similar k ≈ 3 as in the previous simulation for the regular graph. The qualitative features are preserved (Fig. 2c): bistability is present at low noise but an abrupt switch fromM < 0 toM > 0 occurs at a value of noise significantly lower than that of the bifurcation in the fully-connected graph for comparable p. Heterogeneous degree implies that the expected state of any individual node may also be different from that of any other. Indeed, states not only depend on a node's degree, but additionally on the broader context nodes play within the network (Fig. 3). When extracting any node's stationary temporal mean state, we find that a node's neighborhood defines its rank, i.e. relative mean state within the network, and this rank is roughly unchanged, even when varying noise or initial conditions. Further, random graphs with Poissonian degree distributions as well as the two-dimensional square lattice (Fig. S6) again gave qualitatively similar behavior as in Fig. 2b,c.
We also checked the behavior of networks with k > 3. For fixed p < 1, we monitored the abrupt transition from bistability to monostability in degree-regular networks of increasing k . Over approximately two orders of magnitude of k such transitions are always found (Fig. S9). As k → N , the transition point ε * (p) approaches the bifurcation point ε c (p) of the fully-connected system. We find the difference |ε c (p)−ε * (p)| to decrease algebraically as |ε c (p) − ε * (p)| ∼ k −s(p) with 0 < s(p) < 1 weakly increasing with the asymmetry parameter p (Fig. S10). Cause of switching -To further explore the origin of the switching we developed a mean-field calculation for sparsely connected networks. The introductory discussion on the N = 2, N = 3, and fully-connected systems highlights that any mean-field approach ignoring dynamical correlations is bound to fail and that it should consider states formed by more than two sites. Using a star motif (Fig. S11) and retaining correlations up to the size of this motif, we write a self-consistent equation for all occupation probabilities of states for the motif. The time dependence of occupation probability of each state is then a sum over contributions from internal (within the motif) and external states. External states involve conditional probabilities, satisfying compatibility with the internal states. Solving numerically for the steady state approximates the occupation probabilities Π i (Details: Supplement). We find that, for k = 3, an asymmetric bifurcation diagram can be obtained when the fixed points are identified as the states of maximum probability (Fig. 2b). However, as the degree of the star motif is raised, our mean field approach predicts that one of the branches of fixed points disappears. We note the limitation that a mean-field calculation based on a star-motif assumes a tree-like topology and hence disregards effects of clustering and long-range connections. To distinguish those effects, we performed additional analysis by implementing Watts-Strogatz networks to interpolate between a regular ring lattice and a random graph. As a result of repeated rewiring (Fig. S7b-d), which gradually reduces clustering by increasing longrange connectivity, the switching ε * (p) systematically moves towards larger values of noise (Fig. S7). We see this as evidence of the role of long-range connections in enabling the presence of two stable states at small values of noise. Bistability can be suppressed by increased locality with associated node-node dynamical correlations.
This interpretation further implies the existence of dynamical correlations reaching beyond the immediate neighborhood of a given node. Using a real-world network, the neural wiring diagram of C. elegans [19] (Fig. 3) and a synthetic scale-free network (Fig. S14), we consider the long-term average of each individual network node. Disregarding the transient phase, we consider only the behavior afterM settles for either of the polarized states (Fig. 3a). We find that a node's degree alone is not a sufficient predictor of its average state within the network (compare Fig. 3a and b). Remarkably, even when changing ε, nodes maintain their relative polarization w.r.t. the remaining network, i.e. the curves shown in Fig. 3 are approximately still smooth when noise is varied. Two-layer networks -Finally, we combine two network layers A and B, where nodes can change their state by influences from either of the layers. For simplicity, we consider each layer to consist of statistically equivalent regular random graphs. We allow the dynamics in layer A to be defined by the parameters used previously (i.e. p 1 = p 4 = 1, p 2 = p 3 = .2) whereM > 0 in its resilient branch. When uncoupled, layer A would hence bahave as described above. In layer B, rates are generally smaller: p 1 = p 4 = .1, p 2 = p 3 = .5 and swapped regarding u and w. Hence, B's resilient branch is characterized bȳ M < 0 (Fig. 4, inset). Both subsystems are initialized as M = 1 and allowed to reach a steady state. We couple A and B by requiring a randomly chosen fraction c f of nodes to have shared states in the two layers, i.e. their state in one network is copied to the other network at any update. The remaining fraction of nodes never transfer their states from one layer to the other. At every time step, A or B is chosen at equal probability and a random node of that network is updated as before.
At low noise ε, both systems are w-dominant (Fig. 4). As ε is increased, the behavior depends on c f . For weak c f , B will eventually transition to its resilient configuration, where M < 0. Notably, this transition is still abrupt for small c f (Fig. 4, red curves). As c f is moderately increased (purple curves), the abrupt switch of B is first reduced and eventually disappears but B still shows changing sign of M [23]. A, conversely, is also impacted upon by B, since the magnitude ofM in A is somewhat reduced by the coupling. As c f is increased further, even the change of sign finally disappears (blue curves). For sufficiently large coupling (e.g. c f = .64) the curves of A and B assimilate and the magnitude ofM at low noise becomes maximal: the copying process between layers is now so efficient that they act as a single layer with enhanced effective degreek > 3 and intermediate effective parameters (compare Fig. S12). Hence, the effect of combining layers is to dilute the correlations between any two neighbors by the larger number of influences from their enlarged neighborhood. The relative frequencies of updating (e.g. the rates p i ) in the two layers determine which of the two will be able to impose its state on the other. Variants are explored in Fig. S15. For instance, if network A is initialized in its vulnerable state, it will eventually -after its own change of polarization -still dominate the system dynamics by pulling B through the switch.
Discussion -We analyzed noise-induced switching caused by dynamical correlations in a simple three-state model for bistability, but our findings may generalize to other models that induce similar dynamics. The switching is a first-order irreversible transition from bistability to monostability. Our results suggest that this switching state exists in the intermediate regime where local dynamical correlations are present while nonlocal coupling allows for bistability to be maintained. When two systems interact, the dominant one imposes its resilient state on the other as coupling increases. This suggests that the propensity of a system to switch polarization under noisy environments can be both counterbalanced or enhanced by its matching with another dominant system.
Conceptually, our findings could be seen as an extreme case of Parrondo's paradox. Within this analogy, unbiased noise is a neutral game which transforms a second biased game from overwhelmingly losing to winning when the two are combined. In contrast to Parrondo's paradox, the implications of our findings may hence be far less subtle: modulating noise could be exploited as a simple mechanism to trigger strong behavioral changes in systems with bistability, e.g. in epigenetics or economics. For several systems coupled within an overall noisy environment, those with higher rates of conditional reactions will not only be more resilient but may often dominate other systems that are coupled to them. Our work might allow for a new perspective on a wide range of transitions in noisy bistable systems such as those that are found in biology or the social sciences.   Fig. 2b, main text. Panels were obtained using simulations with 30000 nodes.

Finite systems
For small systems consisting only of few sites and links connecting each pair of sites, the steady state can be computed exactly. As each node can take one of three states, the n-particle state space consists of 3 n states, i.e. the basis increases exponentially with the number of sites. However, considering permutation symmetry the basis reduces to m(n) = (n 2 + 3n + 2)/2 states, i.e. the basis then only increases approximately quadratically with the number of sites. The smallest interacting system is that consisting of only two nodes, connected by a link. Hence, there are m(2) = 6 basis states. By evaluating all m(2) × m(2) transition probabilities P (χ j → χ i ) for transitions between n-particle states χ i and χ j with i, j ∈ {1, . . . , m} we write a dynamical equation for the occupation prob-abilities Π i : Solving for Π i = 0 and considering the vector M ≡ (M 1 , . . . , M m ) of "magnetization", where M i ≡ w i − u i for each n-particle state, we obtain the average "magnetization"M ≡ i Π i M i as a function of noise ε and the asymmetry parameter p,M =M (ε, p). As a function of ε, for any given value of p ∈ [0, 1] we find that, when n = 2,M maintains its sign for all values of noise ε (Fig. S1a). Conversely, for n = 3,M has a zero-crossing as ε is increased. Retaining only terms linear in ε we approximate the curveM (ε, p) and, by settingM (ε, p) = 0, reach an approximate expression for the zero crossing ε 0 (p) as a function of p: We find that the function ε 0 (p) increases monotonically with p ( Fig. S1b) and approaches a finite value ε 0 (1) = 1/33 as p → 1, p < 1. Hence, even infinitesimal asymmetry p = 1 in the model (Fig. 1) can induce a noise-dependent change of sign akin to the original Parrondo paradox.
Examining larger system (n > 3) numerically, we find the zero crossing and qualitative behavior ofM to be similar to that of n = 3. However, when considering a comparison with the noise effect in the Parrondo paradox, as system size is increased, the system at hand will hardly reside in a state nearM . This is because the model ( Fig. 1) induces bistability, whereas this effect is absent in the Parrondo paradox (discussed in the main text).

Infinite system
To see the existence of bistability, consider the limit n → ∞, where each node is connected with any other. In this limit, the probability for one node to be influenced by any given neighbor in two consecutive node updates decays as 1/n. For sufficiently large n it hence becomes appropriate to neglect all correlations and assume any node to only "feel", i.e. respond to, the mean densities u,v, andw. Withū +v +w = 1 one is left with two coupled nonlinear differential equations describing dū/dt and dw/dt.
We now show that these equations allow for two stable fixed points whereM = 0 when ε < ε c (p). For ε ≥ ε c (p) a stable fixed point exists atM = 0. This is the extreme case of the model in Fig. 1 (main text) where the limit of large systems and full connectivity is considered, i.e. N → ∞ and 2L/N = N − 1. As mentioned, when all nodes are connected any correlation between individual nodes is lost. The conditional probabilities in Fig. 1b can then just be described as contingent on mean densities of the three statesū,v, andw. We drop the overbar for simplified notation. The resulting dynamical equations then readu The functions f (u, w) and g(u, w) depend only on the densities of the states u and w because the density of v results from the conservation of probability, i.e. v(w, u) = 1 − w − u. Using the conditions in Fig. 1, the expression on the RHS of Eqs S3 and S4 are: To further discuss the fixed points of Eqs S5 and S6, as in the main text we again specialize to the case where p 2 = p 3 and p 4 = p 1 . By rescaling time, we can always express all quantities in units of p 1 , i.e. we make the replacement p ≡ p 2 → p 2 /p 1 , ε → ε/p 1 and p 1 → 1. The equations S5 and S6 then read: For these simplified equations, there is a line of fixed points at u = w = 1/3 for all ε and linear stability anal-ysis of these fixed points reveals a bifurcation point at For ε < ε c this fixed point is an unstable saddle, while for ε > ε c it is a stable node. The characterization of the remaining fixed points depends on the remaining parameter p (Fig. S3). For p = 1, in the diagram of M = w − u two symmetric branches of stable fixed points exist for ε < ε c = 1/4, which approach ±1 as ε → 0. In this case, solving for f (u, w) = g(u, w) = 0 yields the fixed points where we have defined ε ≡ ε/(1 − ε) for simplified notation ( Fig. S3a and Fig. 2a, gray curve). The difference w − u becomes For p < 1, ε c diminishes and the branches of stable fixed points become asymmetric (Figs 2 and S3b). An additional line of unstable fixed points appears, which joins the bifurcation point and the upper branch of stable fixed points. Analytical solutions are hard to come by for this asymmetric case.
Consider now a system of low noise, i.e. ε → 0. In this case, the system will approach one of the stable fixed points w − u = ±1. As the noise level is increased, the state of the system will follow the corresponding branch of stable fixed points, until it eventually reaches w − u = 0. In the case of the lower branch, i.e. w − u < 0, this approach will be continuous. In the case of the upper branch, i.e. w − u > 0, there will be a jump discontinuity (compare Fig. S3b). However, in all cases, the neutral value of w − u = 0 will be approached monotonically as the noise level is increased. The effect of noise hence is to reduce the polarization of the system state and eventually yield a state where all three configurations u, h and w are equally likely. Maintaining the limit of sufficiently large system size, we have confirmed that the results described in the main text for the infinite system size limit are retained when using numerical simulation for large systems (n > 10 4 sites), hence benchmarking our quasi-infinite system size simulations with the exact results. Lowering the average degree k we retrieve the effect of correlations, hence k 1. In these simulations with correlations, we refer to ε * (p) as the transition point between a single and multi-valued functionM (ε, p).

SPARSELY CONNECTED SYSTEMS -MEAN FIELD APPROXIMATION
Consider again the network of N nodes but take the connectivity to be sparse, i.e. L/N N . For simplicity, assume that the links are equally distributed among all nodes, so that each node has exactly k connections to other nodes. This topological structure is sometimes called a regular graph of degree k. In the following, we refer to the link between nodes n and r as l nr . Choose any node at random for update. Consider a central node and its k nearest neighbors, forming a cluster of k + 1 nodes and k links. In general, the configuration space of this k+1-node cluster spans s k ≡ 3 k+1 states. We refer to any of the k−node states i as χ i and to the probability of this state being occupied as Π i . Writing the dynamical equations for the transition probabilities (Fig. 1, main text) for each transition between these states, the steady-state solutions can be computed.
However, this would ignore all interaction with the surroundings of the cluster. In order to achieve a better approximation, consider the following: When all nodes have equal connectivity, choosing a target node and its partner is equivalent to choosing a link and assigning one of the two nodes involved as the target node. Now consider all links within the cluster (shown in red in Fig. S4) as well as those outside the cluster (black). There are k ways to choose an internal link and the target node will always lie within the cluster. There are k(k − 1) ways to choose external links, however, the target node will only lie within the cluster at probability 1/2. Together, the probability to choose internal updating is then We now need to separate the two cases of selecting an internal versus external link. Internal update -Selecting a link and target node generates a transition matrix for all cluster states. A given state χ i thereby has a probability P (χ i → χ j ) to transition to state χ j . This probability is straightforward to determine, given the schematic in Fig. 1 and noting that any possible transition is constrained to states that differ only by one elementary move. E.g. the transition {uuuv} → {uuuw} is possible when node four is selected as the target site and its probability is then Mean-field approximation. Example for connectivity k = 3. A central node is connected to three nearest neighbors, which themselves are each connected to three nearest neighbors. In the tree-approximation it is assumed that nearest neighbors are connected only by one path. We distinguish internal and external links, which are indicated by red and black colors, respectively. Also shown: several configurations of unique four-node states, i.e. those that do not map onto one another by permutations of the individual single-node states.
External update -Updating an external link can modify any but the central node of the cluster. We now need to consider the updating of external cluster states, i.e. clusters with central nodes located at sites 2, . . . , k+1 w.r.t. the original cluster. These updates require some book keeping: For a given cluster state χ i not all external cluster states will be possible. A sum needs to be carried out over nodes j ∈ {2, . . . , k + 1}, where the respective states of the pairs (1, j) constrain the possible external cluster states. Given this subset of states, all possible transitions are again enumerated. It is then back-tracked how these transitions affect the respective node j, which in turn leads to an update of the internal cluster state.
The temporal evolution, in time units of N node updates, of the occupation probability Π i hence is: Here, is the conditional probability that the remaining nodes {q} are in the states {s q } given that nodes 1 and x are in the single-node state s 1 and s x , respectively. The sum in the denominator extends over all configurations of external states that are compatible with the single node states at sites 1 and x and serves as a normalization. Notably, in Eq. S14 in the first term all configurations χ i of the internal cluster can contribute while in the second term only those configurations Π j contribute, which are compatible with a given single-node state s 1 at the central site and a state s x at a given peripheral site. Note further that in the first term only linear contributions in the occupation probabilities Π i occur, while the second term involves also product terms of probabilities Π i .
The mean field approximation in the equation consists in the closure of correlations up to the size of the cluster. It is implicitly assumed that longer-ranged correlations do not contribute and can be neglected. Eq. S14 constitutes a set of non-linear equations in the occupation probabilities Π i , which itself has to be solved numerically. Fig. S11 shows examples of the density of states for the solution to Eq. S14 when only the internal cluster is considered. The density of states is initially peaked at low and high values of w − u, with the high-u state the most likely. As noise is increased, one peak gradually disappears and a single-peak distribution results.
The crucial features of this plot can be represented more compactly by extracting the peaks of the distribution function (Fig. S5). For the cluster consisting of four sites, excluding the surroundings, Fig. S5a shows the dependence of maxima on the noise ε. Indeed, at low ε a bimodal distribution is present, whereas for larger noise (ε > .15) the distribution is unimodal. We compare this to the mean field solution where also the surroundings are taken into account (Fig. S5b). Now the peaks for the u-dominant state decay more rapidly, the pattern is much more skewed than in Fig. S5a. At a value of ε ≈ .04 the distribution is unimodal and peaks are present for the w- dominant state. This result should be compared to the simulation result in Fig. 2b (main text), where similar qualitative features are present.

Additional network structures
In Fig. S6 we supplement main text Fig. 2 by including two additional network geometries. A random graph is one where links are assigned randomly without restrictions to each node's degree. The resulting degree distribution is then binomial and for sufficiently large systems approximated as a Poisson distribution. We simulated the dynamics (Fig. 1, main text) for otherwise similar parameter configurations as in Fig. 2b,c (main text), i.e. k = 3 and p = .2. The result is very similar to that of the degree-regular graph (Fig. 2b). This means that the relatively small degree fluctuations about the average do not lead to strong deviations of the polarization.
We also simulated a two-dimensional square lattice, i.e. translationally invariant graph of degree k = k = 4 (Fig. S6b). Despite the larger degree, which would be expected to shift the zero-crossing to larger values of ε, the zero-crossing now actually moves to markedly smaller values of ε as compared to the Poissonian (Fig. S6a), degree-regular (Fig. 2b) or scale-free (Fig. 2c) cases. This finding again highlights the potential importance of disorder in granting a large range of bistability.
To explore this further, we tested for the effect of disordering an initially clustered system (Fig. S7). We start from a one-dimensional ring of N sites and connected each node to several of its immediate neighbors by a link. After computing the transition for this configuration, we rewired the network by swapping links randomly. In the link swapping procedure, two links are chosen at random, and if they do not involve any nodes twice, one end of each is exchanged with the corresponding node for the partner link [1]. The effect of repeated rewiring can be q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q p=.2 q q q q q q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q p=. captured in terms of the clustering coefficient (Fig. S7). We find that for several different values of mean degree k ranging from 4 to 20, lower clustering, i.e. more disorder, means larger values of ε * .

Sensitivity analysis
We performed sensitivity tests regarding system size and dependence on degree. System size -For varying values of the asymmetry parameter p, we investigate the dependence of the bifurcation point on system size (Fig. S8). The results show that for regular graphs already system sizes of approximately 5, 000 nodes are sufficient to reach stable estimates for the critical value of noise. Dependence on connectivity -As pointed out in the main text, even for p = 1 the bifurcation point is influenced by connectivity (degree). As nodes are more and more connected, the bifurcation point approaches the analytical value ε c = 1 + 3/ √ p −1 (compare Eq. S9). We define the departure of the transition point ε * (dropping explicit reference to p for simplified notation) from the fully-connected bifurcation point ε c as As ∆(k) will approach zero as k → ∞, i.e. when approaching the fully-connected system, we can quantify the functional form of the approach. Fig. S9 shows ∆(k) on a double-logarithmic scale. For different values of p the curves approach zero as a power-law, with varying coefficients (Fig. S10). We further explore the effect of biased noise, i.e. we test whether the transition in polarization is still possible when an overall drift towards the u-state is present. We induce such a drift by modifying the noise rate ε in such a way that ε − ≡ ε + ε is the rate of transition for the reactions w → v as well as v → u, while for the inverse reactions ε + ≡ ε − ε (To be clear, note that the symbol ε used here should not be confused with the previous symbol in Eqs S10 and S11). The coefficient ε thereby describes the drift in the direction of u. For various values of ε we show again the dependence on degree k (Fig. S9b). The plot demonstrates that indeed the transition is present for varying levels of drift. However, the curves for ε > 0 are no longer power-law curves, the decay as a function of degree is now faster. Even moderate values of connectivity now remove the transition, i.e. connectivity can be seen as supporting the disorder in the system, thereby removing the correlations that previously were necessary for a transition to occur. Fig. S12 shows the transition for a perfectly coupled two-layer system, i.e. each node exchanges its state between the two layers when it is updated. We ask, whether different dynamics, defined by differing parameters p in the two layers, can induce differences in the transition of the coupled system. The figure shows that the transition for the coupled system can be described well by averaging the parameters within the two individual layers, i.e. a coupled system with parameters p a = .2 (layer a) and p b = .4 (layer b) behaves similarly as a system with both parameters set equal to the average .3.
Verifying abruptness of transition -To prove that the sparsely-connected systems, e.g. shown in Fig. 2, in fact perform an abrupt transition, we map out the occupation probability density of polarization P (w − u) for each value of noise. To obtain the occupation probability density, we have performed long simulations, requiring that the system transition sufficiently many times between the two extremal states within the bistable regime, i.e. the entire range of states must be sufficiently explored. As transitions become exceedingly rare for larger systems, we were only able to compute the spectrum for systems up to 59 sites. Larger systems gave very noisy results, but slightly smaller ones were qualitatively consistent with the results we show here. The function −log(P (w − u)) can be seen as describing the "potential function" of the correlated system (Fig. S13). Notably, a bifurcation is defined here as the transition from a bimodal to a unimodal regime, associated with the disappearance of one stable fixed point as noise is increased. Indeed, the figure shows the presence of two potential minima, corresponding to stable fixed points, for low noise but only one remaining stable fixed point (w − u > 0) for larger noise. The disappearance of the fixed point w − u < 0 is abrupt in the sense that the two fixed points do not merge as noise is increased. Rather, the potential minimum at low w − u gradually becomes more and more shallow and finally disappears altogether.
Variants of coupled two-layer systems -We have further investigated several ways of coupling twolayer systems. Beyond the case shown in the main text (Fig. 4), we here show additional cases: • Not a fraction of nodes is permanently coupled, but all nodes are coupled at a reduced rate c r . The rate is a probability that states how likely it is that one node will couple its state to the other layer after it has been updated (Fig. S15b).
• The product of the two coupling coefficients c r c f is held constant, and different values of c r are explored (Fig. S15c).
• Another type of game is explored, where the two layers have similar hierarchy of parameters (p < 1 in both layers), but the magnitude of this parameter varies between the two layers (Fig. S15). Effect of clustering. a, Rewiring operation, by which a 1D-lattice with nearest and next-nearest neighbor couplings is gradually converted to a random graph of unchanged degree. Green points are network nodes, black lines are links, red lines indicate a pair of rewired links. The lattice is taken to have cyclic boundary conditions. b, Transitions for varying levels of disorder for k = 4, ε * increases with disorder. The fully disordered system is a regular graph of degree k = 4. System size: N = 2, 000. c, Similar to (b) but for k = 6. System size: N = 2, 000. d, Similar to (b) but for k = 20. System size: N = 5, 000. Note that we use a larger system size for the higher connectivity here to ensure that finite size effects are still minimal. e, Summary of ε * for the simulations in (b)-(d) as a function of clustering coefficient. q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q p=0.1 p=0.2 p=0.5

FIG. S8
: System size dependence. Simulation for regular graph with k = 3 (compare Fig. 2b), but varying system size and p. Each curve represents one value of p (marked in legend) and points are simulation results for the transition between u-dominant and w-dominant states. Lines serve as guide to the eye and connect the points obtained by the simulations. Vertical dashed line indicates N = 5, 000, a value we find sufficient for finite size effects to be considered small. FIG. S13: Potential wells. For each value of noise ε long simulations with n = 59 nodes and k ≈ 3 n were carried out. As these systems are comparably small, sufficiently many fluctuations between the quasi-stable states at negative and positive M do occur to compute the probability density function P (M = w−u). Curves ranging from blue to red show the function −log(P (w − u)) for a range of ε. Inspecting these functions shows that the minimum at M < 0 becomes increasingly weak and finally disappears. The function obtains an overall slant towards positive M .   S15: Effect of coupling a two-layer system. a, A fraction c f is coupled between two layers of a regular (k = 3) network. There is no randomness of coupling noise (cr = 1), i.e. at each update of a given coupled node, the state of this node in one layer is copied to that in the other. The parameter c f is varied as shown in the legend. b, Similar to (a) but now keeping c f = 1 maximal, i.e. all nodes are capable of copying their states. Now, cr is varied as shown in the legend, i.e. the probability of copying the state in one layer to the other layer. c, Similar to (a) but keeping the product crc f = .1 fixed. Parameters of different curves as shown in legend. d, Coupling two systems as shown in panel inset. Solid (dashed) lines represent systems A and B, respectively. cr = 1. Numbers near lines and in legend represent the fraction of nodes permanently coupled (c f ). System sizes in all panels: N = 2, 000.