Universality of power-law exponents by means of maximum-likelihood estimation

Víctor Navas-Portella ,1,2,3 Álvaro González ,1,4 Isabel Serra,1,5 Eduard Vives ,6,7 and Álvaro Corral1,2,8,9 1Centre de Recerca Matemàtica, Edifici C, Campus Bellaterra, E-08193 Bellaterra, Catalonia, Spain 2Barcelona Graduate School of Mathematics, Edifici C, Campus Bellaterra, E-08193 Barcelona, Spain 3Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, E-08007 Barcelona, Spain 4GFZ German Research Centre for Geosciences. Telegrafenberg, 14473 Potsdam, Germany 5Barcelona Supercomputing Center, Nexus II, 08034 Barcelona, Spain 6Departament de Matèria Condensada, Facultat de Física, Universitat de Barcelona, 08028 Barcelona, Catalonia, Spain 7Universitat de Barcelona Institute of Complex Systems (UBICS), Facultat de Física, Universitat de Barcelona, E-08028 Barcelona, Catalonia, Spain 8Complexity Science Hub Vienna, 1080 Vienna, Austria 9Departament de Matemàtiques, Universitat Autònoma de Barcelona, E-08193 Barcelona, Spain


I. INTRODUCTION
Generally speaking, a complex system can be understood as a large number of interacting elements whose global behavior cannot be derived from the local laws that characterize each of its components. The global response of the system can be observed at different scales and the vast number of degrees of freedom makes prediction very difficult. In this context, a probabilistic description of the phenomenon is needed in order to reasonably characterize it in terms of random variables. When the response of these systems exhibits lack of characteristic scales, it can be described in terms of power-law-type probability density functions (PDF), f (x) ∝ x −γ , where x corresponds to the values that the random variable that characterizes the response of the system can take, ∝ denotes proportionality and the power-law exponent γ acquires values larger than 1. The power law is the only function which is invariant under any scale transformation of the variable x [1]. This property of scale invariance confers a description of the response of the system where there are no characteristic scales. This common framework is very usual in different disciplines [2,3] such as condensed matter physics [4], economics [5], linguistics [6], geoscience [7], and, in particular, seismology [8,9].
It has been broadly studied [10,11] that when different complex systems present common values of all their powerlaw exponents and share the same scaling functions they can be grouped into the same universality class. Therefore, it is important to determine these exponents rigorously, not only to properly characterize phenomena but also to provide a good classification into universality classes.
In practice, exponents are difficult to measure empirically. Due to experimental limitations that distort the power-law behavior, the property of scale invariance can only be measured in a limited range. Therefore, when a power-law distribution is fitted to empirical data is more convenient to talk about local or restricted scale invariance. In this context, the wider the range the fitted PDF spans, the more reliable and strong this property will be.
A paradigmatic example of power-law behavior in complex systems is the well-known Gutenberg-Richter (GR) law for earthquakes [12]. This law states that, above a lower cut-off value, earthquake magnitudes follow an exponential distribution; in terms of the magnitude PDF f (m) = (b log 10)10 −b(m−m min ) ∝ 10 −bm , (1) defined for m m min , with m the magnitude (moment magnitude in our case), m min the lower cut-off in magnitude, b is the so called b value, and log corresponds to the natural logarithm. The general relationship between seismic moment x and moment magnitude m is given by: measured in units of Nm [13]. The GR law is a power-law distribution when it is written as a function of the seismic moment x: where we conveniently define γ = 1 + 2 3 b (b > 0 and γ > 1) and x min corresponds to the value of the seismic moment of the cut-off magnitude m min [14] introduced in Eq. (2). Note that this PDF has a domain x ∈ [x min , +∞).
An earthquake catalog is an empirical dataset that characterizes each earthquake by an array of observations: time of occurrence, spatial coordinates, magnitude, and so on. The magnitude m min is usually associated to the completeness threshold, such as all earthquakes with m m min are recorded in the catalog [14]. For m < m min , some events are missing from the catalog due to the difficulties of detecting them (e.g., Refs. [15,16]), especially when the seismic activity rate suddenly increases, such as in aftershock sequences, in which earthquake waveforms tend to overlap each other difficulting detection [14,[17][18][19]. This incompleteness distorts the powerlaw behavior below m min , whose value should be an upper bound to encompass these variations.
One has to keep in mind that there also exists an upper cutoff due to the finite-size effects [20], implying that, at a certain value of the magnitude, there are deviations from the powerlaw behavior. Consequently, strictly speaking, the range of validity of the GR law cannot be extended up to infinity [7,21]. By ignoring which is the model that conveniently fits the tail of the distribution, the power-law behavior has to be restricted to an upper cut-off x max and the PDF for the truncated GR law is written: defined for x ∈ [x min , x max ]. Recent studies regarding the acoustic emission (AE) in compression experiments of porous glasses and minerals [22][23][24][25][26] or wood [27] have focused the attention on the energy distribution of AE events due to the similarities with the GR law for earthquakes [21]. According to the terminology which is used in some of these studies, we will name as labquakes those AE events that occur during the compression of materials.
Earthquake and labquake catalogs as well as other empirical datasets in complex systems only report a limited range of events, making it difficult to estimate parameters of the power-law PDF accurately. In this work we try to solve this problem by combining datasets with rigorous statistical tools, with the goal of finding a broader range of validity when the different datasets correspond to the same system. If different datasets can be combined and characterized by a unique power-law exponent, that means that the particular exponents of each dataset are statistically compatible. When different phenomena share the same power-law exponents for the distributions of all their observables, they can be classified into the same universality class. Consequently, this methodology represents a statistical technique to discern whether different phenomena can be classified into the same universality class or not. In order to conveniently classify earthquakes and labquakes into the same universality class, all the observables should be taken into account and all the corresponding exponents should be compatible. In this work we will illustrate this methodology by focusing on the distribution of seismic moment for earthquakes and the AE energy for labquakes. The results will reveal whether they are candidates to be classified into the same universality class or not, but the final establishment of their belonging to the same universality class would require the study of all the possible observables.
The manuscript is structured as follows: in Sec. II we will describe the different statistical procedures that are used in order to conveniently merge datasets and to find a global power-law PDF. Although previous procedures to combine independent datasets have been already proposed [28], the approach presented in this paper is totally different. In Sec. III we apply this methodology to different datasets that are explained in Sec. III A and analyzed in Sec. III B for earthquake catalogs and in Secs. III C and III D for AE data obtained during the compression of two different materials. In Sec. IV we present our conclusions, and details supporting the methodology are presented in Appendixes A-D.

Statistical methodology: Merging datasets
By considering n ds datasets of N i (i = 1, . . . , n ds ) observations each, one wants to fit a general power-law distribution with a unique global exponent for all of them. We assume that for the ith dataset, the variable X (seismic moment if one works with the GR law for earthquakes or AE energy if one works with the GR law for labquakes) follows a power-law min , x (i) max ] given by Eq. (4) from a certain lower cut-off x (i) min to an upper cutt-off x (i) max with exponent γ i and number of data n i (n i N i ) in the range [x (i) min , x (i) max ]. Note that one also can consider the untruncated power-law model for the ith dataset if x (i) max → ∞. By means of the methods explained in Refs. [7,29] (or, alternatively, Ref. [30]), one can state that data from the ith dataset does not lead to the rejection of the the power-law hypothesis for a certain range. Note that, in the ith dataset, the variable X can acquire values in a range typically spanning several orders of magnitude.
Generally, the procedure of merging datasets is performed by selecting upper and lower cut-offs x (i) min and max ] for each dataset. Data outside these ranges are not considered. All the possible combinations of cut-offs {x min } and {x max } are checked with a fixed resolution (see below). The residual coefficient of variation (CV) test can be used to fix some upper cut-offs, thus reducing the computational effort. For more details about the CV test, see Appendix B.
Given a set of cut-offs, datasets can be merged by considering two different models: (i) Model α: All datasets are merged by considering a unique global exponent (γ i = for all datasets).
Note that model α is nested in model β and the difference in the number of parameters characterizing these models is n β − n α = n ds − 1. Since we are interested in merging datasets with a unique global exponent (model α), we need enough statistical evidence that this simpler model is suitable to fit the data. For a given set of cut-offs, the fit is performed by means of the following protocol: (i) Maximum likelihood estimation (MLE) of model α: The log-likelihood function of model α can be written as: where x i j corresponds to the n i values of the variable X that are in the range x (i) min x i j x (i) max in the ith dataset and is the global exponent. The definition of likelihood is consistent with the fact that likelihoods from different datasets can be combined in this way [31]. At this step, one has to find the valueˆ of the global exponent that maximizes the loglikelihood expression in Eq. (5). For the particular expressions corresponding to the truncated and untruncated power-law PDF, see Eqs. (A2) and (A3) in Appendix A. If all the powerlaw distributions are untruncated, then this exponent can be easily found analytically [32] aŝ where the hats denote the values of the exponents that maximize the log-likelihood of the particular power-law distribution (model β) and the general one in Eq. (5). If truncated power-law distributions are considered, one has to use a numerical method in order to determine the exponent that maximizes this expression [29,33].
(ii) MLE of model β: The log-likelihood function of model β can be written as: using the same notation as in Eq. (5). For the particular expressions corresponding to the truncated and untruncated power-law PDFs, see Eqs. (A2) and (A3) in Appendix A. The values of the exponents that maximize Eq. (6) are denoted asγ i . (iii) Likelihood-ratio test: We perform the likelihood ratio test (LRT) for the models α and β in order to check whether model α is good enough to fit data or not in comparison with model β. For more details about the LRT see Appendix A. If model α "wins," then the procedure goes to step (iv). Otherwise, this fit is discarded and a different set of cut-offs {x min } and {x max } is chosen, and the procedure goes back to step (i). Note that model α can be a good model to fit if the particular exponentsγ i do not exhibit large differences among each other in relation to their uncertainty.
(iv) Goodness-of-fit test: In order to check whether it is reasonable to consider model α as a good candidate to fit data, we formulate the next null hypothesis H 0 : The variable X is power-law distributed with the global exponentˆ for all the datasets. In this work, we are going to use two different statistics in order to carry out the goodnessof-fit tests: the Kolmogorov-Smirnov distance of the merged datasets (KSDMD) and the composite Kolmogorov-Smirnov distance (CKSD). The KSDMD statistic can be used as long as datasets overlap each other, whereas the CKSD statistic does not require this condition. For more details about how these statistics are defined and how the p value of the test is found, see Appendix C. If the resulting p value is greater than a threshold value p c (in the present work we are going to use p c = 0.05 and p c = 0.20), we consider this as a valid fit and it can be stated that the variable X is power-law distributed with exponentˆ along all the different datasets for the different ranges {x min } and {x max }. Otherwise, this fit will not be considered as valid, a different set of cut-offs is chosen and the procedure goes back to step (i).
When all the combinations of cut-offs have been checked, one may have a list of valid fits. In order to determine which one of them is considered the definitive fit, the following procedure is carried out: (i) The fit that covers the largest sum of orders of mag- ]} is chosen. If the power-law fit is untruncated, then x (i) max can be substituted by the maximum observed value x (i) top . If there is a unique candidate with a maximum number of orders of magnitude, then this is considered as the definitive global fit; otherwise, the procedure goes to the next step (ii).
(ii) The fit with the broadest global range: min ] } for i = 1, . . . , n ds is chosen. If there is a unique candidate, then this is considered as the definitive global fit; otherwise, the procedure goes to the next step (iii).
(iii) The fit with the maximum number of data N = n ds i=1 n i is considered as the definitive global fit. By means of these three steps, a unique fit has been found for all the datasets analyzed in this work. Nevertheless, one could deal with datasets in which more conditions are needed in order to choose a definitive fit unambiguously.
At the end of this procedure, if a solution is found, then one is able to state that the datasets that conform the global fit correspond to phenomena that are candidates to be be classified into the same universality class, at least regarding the observable X . If no combination of cut-offs is found to give a good fit, then it can be said that there exists at least one catalog that corresponds to a phenomenon that must be classified in a different universality class. The performance of the methodology over synthetic power-law data with different sample sizes and relative difference in exponents δ γ is shown in Appendix D.

III. APPLICATIONS
In this section we apply the methodology of merging datasets to different earthquake and labquake catalogs. First, we try to merge three earthquake catalogs. Second, a fourth catalog of charcoal labquakes is added in order to check whether these two phenomena are candidates to be classified into the same universality class or not. Finally, we apply the methodology to four catalogs of Vycor labquakes that cover different observation windows.

Earthquake catalogs
We have selected catalogs that have different completeness magnitudes in order to cover different magnitude ranges. We hope that a convenient combination of these datasets will give us a larger range of validity of the GR law with a unique exponent. Let us briefly describe the catalogs that have been used in this work (see also Fig. 1): (a) Global Centroid Moment Tensor (CMT) Catalog: This catalog compiles earthquakes worldwide since 1977 [34,35], from which we analyze the dataset until the end of 2017. This catalog reports the values of the moment magnitude as well as the seismic moment. Given that the seismic moment is provided with three significant digits, the resolution of the magnitude in the catalog is approximately m 10 −3 .
(b) Yang-Hauksson-Shearer (YHS) A: It records earthquakes in Southern California with m 0 in the period 1981-2010 [36]. This catalog does not report the seismic moment but a preferred magnitude that is approximately converted into seismic moment according to Eq. (2). The resolution of the catalog is m = 0.01.
(c) Yang-Hauksson-Shearer (YHS) B: It is a subset of YHS A that contains the earthquakes in the region of Los Angeles in the period 2000-2010 [36]. This LA region is defined by the following four vertices in longitude and latitude: Fig. 1). This region has been selected because it is among the best monitored ones [15,38]. Furthermore, we selected this time period due to the better detection of smaller earthquakes than in previous years [38], which should reduce the completeness magnitude of the catalog [15]. The resolution of this catalog is the same as YHS A. Figure 1 shows the epicentral locations of the earthquakes contained in each catalog. In the statistical analysis, in order to not to count the same earthquake more than once, the spatio-temporal window corresponding to the YHS B catalog has been excluded from the YHS A and the spatial [ Fig.  1(b)] and temporal window corresponding to the full YSH catalogue (A+B) has been excluded from the CMT catalog.

Labquake catalogs
We performed uniaxial compression experiments of porous materials in a conventional test machine Z005 (Zwick/Roell). Samples with no lateral confinement were placed between two plates that approached each other at a certain constant displacement rateż. Simultaneous to the compression, recording of an AE signal was performed by using a piezoelectric transducer embedded in one of the compression plates. The electric signal U (t ) was pre-amplified, band filtered (between 20 kHz and 2 MHz), and analyzed by means of a PCI-2 acquisition system from Euro Physical Acoustics (Mistras Group) with an AD card working at 40 megasamples per second with 18-bit precision [39]. Signal preamplification was necessary to record small AE events. Some values of the preamplified signal were so large that could not be detected correctly by the acquisition system. This fact led to signal saturation and, consequently, to an underestimated energy of the AE event [32]. Recording of data stopped when a big failure event occurred and the sample got destroyed.
An AE event (often called AE hit in specialized AE literature) starts at the time t j when the signal U (t ) exceeds a fixed detection threshold and finishes at time t j + τ j when the signal remains below threshold from t j + τ j to at least t j + τ j + 200 μs. The energy E j of each event is determined  2 3 4 5 6 7 8 9 f Estimated survivor functions S(x) of the seismic moment for each catalog (a) and for the merged catalogs (b). Estimated PDFs f (x) of the Gutenberg-Richter law for each catalog (c) and for the merged catalogs (d). The merged histograms are plotted by following the procedure explained in Ref. [32]. Fits are represented by solid black lines. Top axes represent the same scale in moment magnitude. Note that earthquakes from the CMT and YSH A catalogs below their respective lower cut-offs x min have been excluded in the bottom plots. of 24 dB for the detection threshold referring to the signal U (t ), not the preamplified signal (in such a way that after preamplification the threshold moves to 64 dB). This value of the threshold was set as low as possible in order to avoid parasitic noise. The sample corresponded to commercially available fine art fusains (HB 5 mm, NITRAM, Canada).
(ii) Vycor: We use the labquake catalogs of Ref. [32], in which the authors performed four experiments with cylindrical samples (diameters = 4.45 mm and heights H = 8 mm) of Vycor (a mesoporous silica glass with 40% porosity) at a constant rateż = 0.005 mm/min. Before compression, samples were cleaned with a 30% solution of H 2 O 2 during 24 h and dried at 130 • C. The four experiments were performed with the following preamplification values: 60, 40, 20, and 0 dB, and the respective values of the detection threshold, 23, 43, 63, and 83 dB (in such a way that after preamplification the threshold always moves to 83 dB). This value of the threshold was set as low as possible in order to avoid parasitic noise.
An important difference between these two materials is the degree of heterogeneity. The synthetic mesoporous silica structure of Vycor is much more homogeneous than the one of the charcoal, which has been formed through different natural processes and may contain voids and macropores. These structural differences may lead to differences in the energy exponents [40].

B. Merging earthquake catalogs
In its usual form, the GR law fits a power-law model that contemplates a unique power-law exponent. Nevertheless, several studies have elucidated the existence of double-powerlaw behavior in the GR law for global seismicity [7,41,42]. Authors in Ref. [7] claim that a truncated power law with exponent γ 1.66 cannot be rejected up to m max 7.4 and a second power-law tail emerges from m min = 7.67 with an exponent γ = 2.1 ± 0.1.
of the radiated energy for the merged earthquake and charcoal labquake catalogs. The fit is represented by a solid black line. The methodology to merge the three earthquake catalogs is the same as in the rest of plots, whereas the addition of the charcoal catalog to this fit has been done ad hoc by conveniently rescaling both parts (those corresponding to charcoal and earthquakes, respectively). Each part has been divided by an effective number of events by assuming that the probability in each part corresponds to that obtained from a global power-law exponent with exponentˆ from x (0) min to x (3) max . Note that events from the CMT, YSH A and YSH B catalogs below their respective lower cut-offs x min have been excluded in the plot.
We have checked these results by using the residual CV test. Furthermore, we have been able to establish that, by fixing the upper truncation at m max = m min = 7.67, the truncated power-law hypothesis cannot be rejected (see Table I). Consequently, if one wants to fit a power-law PDF with a unique exponent, one has to exclude all those earthquakes with m m max = 7.67. For the CMT catalog, we fix the upper cut-off x max = 10 3 2 m max +9.1 and we work by following the same procedure described in the previous section. The rest of catalogs can be safely fitted by untruncated power-law PDFs because the CV test does not reject the hypothesis of a unique power-law tail and the magnitudes which are studied are considerably smaller than those in the CMT catalog.
Thus, we consider two untruncated power-law distributions and a third one which is truncated for the CMT catalog In Table II we present the results of the global fit for models α and β. The values of the statistics as well as the resulting p values are shown in Table III. It can be observed that, for this particular set of cut-offs, the CKSD test has a smaller p value and can be considered to be a more strict statistic with respect the KSDMD (in agreement with the results in Appendix D). No fit with a smaller p value has been found for the KSDMD statistic, and the fit shown in Table II [32,43]. The results for the CKSD statistic do not show remarkable differences if the critical p value is set to p c = 0.20 (see Table II). We consider that the definitive fit is the one whose goodness-of-fit test has been performed with a threshold value of p c = 0.20. Given the number of data for each dataset and the relative difference among exponents (around 1%), these results are compatible with the ones obtained from synthetic catalogs shown in Appendix D.

C. Merging earthquakes and charcoal labquake catalogs
Motivated by the fact that the power-law exponents of earthquake catalogs and charcoal labquakes are very similar (see Table I), we apply this methodology by adding also the charcoal catalog.
At this step it is important to stress the fact that the seismic moment does not correspond with the radiated energy E r by the earthquake, which would be the reasonable energy to compare with the AE energy. Whether the ratio of seismically radiated energy over the seismic moment is independent on the moment magnitude is still an unsolved question [44]. A constant ratio would imply that the static stress drop is constant for all the earthquakes. This ratio may depend on different earthquake parameters such as moment magnitude and the depth of the source [45][46][47][48]. On the contrary, some authors argue that the seismically radiated energy is in some cases underestimated and this ratio can be considered as constant [49][50][51][52]. For our study, we are going to consider this ratio as constant so that the values of the seismic moment should just be multiplied by an unique factor. The value of this unique factor is E r M = 10 −4.6 , see Ref. [53], where M is the seismic moment (previously called x). In this case, x corresponds to the energy radiated in seismic waves by earthquakes E r and the AE energy. . Estimated PDFs f (x) of the Gutenberg-Richter law for each Vycor catalog (c) and for the merged catalogs (d). The merged histograms are plotted by following the procedure explained in Ref. [32]. Fits are represented by solid black lines. In the bottom plots, events corresponding to the experiments performed at 40, 20, and 0 dB below their respective lower cut-offs as well as those corresponding to the experiments performed at 60, 40, and 20 dB above their respective upper cut-offs have been excluded.
It can be shown that, for both models α and β, multiplying the variable by a constant factor only introduces a constant term in the log-likelihood that does not change neither the maximum nor the difference of the log-likelihoods. As the CKSD statistic is a weighted average of the particular KS distances of each dataset, it does not change either. Note   (2) TABLE II. Results of fitting the models α and β to the earthquake catalogs when doing the goodness-of-fit test with the CKSD statistic and different values of p c . If the goodness-of-fit test is performed with the KSDMD statistic, the same set of cut-offs as the fit done by using CKSD 0.20 is found (see Table III). Same notation as in Table I (5) that, as the data do not overlap, the KSDMD cannot be used. Therefore, the results shown in Table I Table IV we present the results of the global fit for the CKSD statistic. The value of the global exponent is approximately in agreement with the harmonic mean of the particular exponents of the GR law for each catalog [32]. The results do not show remarkable differences if the critical p value is set to p c = 0.20 (see Table IV). We consider that the definitive fit is the one whose goodness-of-fit test has been performed with a TABLE III. Results of fitting model α to all the catalogs when doing the goodness-of-fit test with the KSDMD and the CKSD statistics for p c = 0.05 (same cut-offs for all the catalogs) and for the set of cut-offs for which the p value is larger or equal than p c = 0.20 for the CKSD statistic. KSDMD has a unique fit because the fit already exceeds p c = 0.20. For the definition of D e , see Appendix C. The remaining notation is as in previous tables. threshold value of p c = 0.20. The fit is shown in Fig. 3. These results are compatible with the ones obtained from synthetic catalogs shown in Appendix D.

D. Merging Vycor labquake catalogs
The values of the particular exponents γ i of Vycor labquake catalogs differ remarkably from those found for earthquakes and charcoal labquakes [32]. No combination of the cut-offs has lead to a good fit if Vycor labquakes ar merged with charcoal labquakes or earthquakes. Therefore, Vycor labquakes can be considered to be in a different universality class.
Due to the limitations due to saturation effects for highenergy AE events, it is convenient to check whether it is necessary an upper truncation for the power-law regime [see  rejects the null hypothesis. See Appendix D for more details.
Although the fit performed with the KSDMD covers a larger number of orders magnitude, the definitive fit will be the one performed with the CKSD and p c = 0.20 (see Table V). Given that the ranges covered by the other acceptable fits lead to the rejection of the global power-law hypothesis when the CKSD is used with p c = 0.20, this fit is more restrictive than the other ones (see Appendix D) and it is the "safest" choice. The definitive fit is shown in Fig. 4(b) and 4(d) and covers about nine orders of magnitude in energy with a number of data N = 24 496 entering into the fit.

IV. CONCLUSIONS
We have presented a statistical procedure to merge different datasets in order to validate the existence of universal power-law exponents across different scales or phenomena. This methodology can be useful in the study of different complex systems in order to check whether the power-law exponents obtained via maximum likelihood estimation are statistically compatible among them or not. Therefore, the procedure presented in this paper provides a statistical tool that enables us to establish whether different complex systems can be classified into the same universality class or not.
In this work, the methodology has been applied to the Gutenberg-Ricther law for earthquakes and labquakes. By merging earthquake catalogs, a global power law with a global exponent = 1.667 holds for more than eight orders of magnitude in seismic moment (from m min = 1.93 to m max = 7.67 in moment magnitude). To our knowledge, this is the broadest fitting range that has been found for the Gutenberg-Richter law for earthquakes with a unique value of the exponent [54]. There are catalogs of tiny mining-induced earthquakes which exhibit a much smaller completeness magnitude [55] than the ones of natural seismicity used in this work. They were not considered here because they are not currently public and show b values significantly different [56] from the one found here, which would result in nonacceptable fits when merging them with the rest of catalogs, possibly pointing to a different universality class. Future works involving different earthquake catalogs can be carried out in order to find a broader fitting range of the Gutenbrg-Richter law and also to check whether different regions have compatible power-law exponents or not. This kind of studies would be of interest in order to statistically strengthen the geological arguments that justify the difference in the b values observed in some regions [43].
Earthquake catalogs have been also merged with a charcoal labquake catalog with a global power-law exponent = 1.688 suggesting that these different systems might be classified into the same universality class. Further investigations involving different observables, such as the distribution of waiting times (time between consecutive events), might be necessary in order to properly classify charcoal labquakes and real earthquakes into the same universality class.
A previous methodology for merging datasets was not able to find a good fit for the Gutenberg-Richter law for the four Vycor Labquake catalogs [32]. The previous procedure did not take into account the fact that the cut-offs of the merged catalogs could be different to those that limited the power-law regime for the individual catalogs. With the methodology exposed in this paper, which allows to consider different cut-offs, these Vycor labquake catalogs have been merged and a GR law with exponent = 1.35 for the energy AE events has been found to hold for nine orders of magnitude. As it was proposed in Ref. [40], the labquakes in this material have been found to belong to a different universality class than earthquakes and charcoal labquakes.

ACKNOWLEDGMENTS
The research leading to these results has received funding from the "La Caixa

APPENDIX A: LIKELIHOOD RATIO TEST
The LRT is a statistical procedure used to check whether it is worth using a statistical model α (characterized by a set of n α parameters) which is nested into a more general statistical model β (characterized by a set of n β parameters). The null hypothesis of this test states that the simpler model α is good enough to describe the data and a more general model β does not provide a significant improvement. In order to perform this test, the likelihood ratio is defined as: where log is the natural logarithm and the hats denote that the likelihoods L α and L β are evaluated at those values of the set of parameters for which they reach their maximum values.
Although not all the regularity conditions exposed in Ref. [31] are satisfied when dealing with power-law PDFs, it has been numerically tested that the statistic 2R follows a chisquared distribution with n β − n α > 0 degrees of freedom, at least for the largest percentiles [21]. Once the significance level of the test has been fixed, one is able to determine the critical value 2R c . If the empirical value 2R e is below this threshold, then one can consider that there are not enough statistical evidences to say that the more complex model β is needed in order to describe the data. In this situation, due to its simplicity, model α is preferable to fit data. This procedure can be naively understood as a statistical version of the Occam's razor criterion. Note that the rejection of the null hypothesis does not imply the rejection of model α as a fit to data, on the contrary, the "acceptance" of model α does not imply that it is a good fit to the data. The test is just a relative comparison between two models.
We present the expressions that are necessary to compute the statistic of the likelihood ratio test in this work. The log-likelihood for the untruncated PDF in Eq. (3) is written: whereas the log-likelihood for the truncated PDF in Eq. (4) is written: These expressions are valid for model β and also for model α replacing γ by .

APPENDIX B: RESIDUAL COEFFICIENT OF VARIATION TEST
A useful statistical tool to check whether an untruncated power-law distribution is a good model to fit data in a certain range is the so-called residual coefficient of variation (CV) test [57].
Let us suppose that we have a variable X that acquires N values {x}. We define x ( j) as the jth value of the variable when it is sorted in ascending order: (N ) . The null hypothesis of the test states that there exists a power-law tail given by x > x (k) . The test is based on the idea that for an untruncated power-law distribution the logarithmic coefficient of variation C l is close to 1: where m l corresponds to the mean of the logarithm of the rescaled variable l = log(x/x (k) ): and s 2 l is the unbiased variance: In order to check whether the value of C l is close to 1 or not for a particular number of remaining data, one simulates many samples power-law distributed with the same number of data N − k in order to extract a distribution for C l . It is important to remark that the distribution of the statistic does not depend on the power-law exponent and, therefore, it is not necessary to estimate it previously. Once one determines the level of significance for the test (in this case 0.05), one can obtain the upper and lower critical values of the statistic by checking the percentiles of the distribution of simulated values (percentiles 2.5 and 97.5 in our case). If the empirical value of C l lies between these two critical values, then one cannot reject the null hypothesis of power-law tail. Otherwise, if the empirical value is below percentile 2.5, then the power law is rejected in favor of a truncated log-normal [58]; if the empirical value is above percentile 97.5, then the power law is rejected but there is no alternative. One proceeds by computing the residual C l for increasing values of the index k, thus analyzing different possible starting points for the tail of the distribution.

APPENDIX C: GOODNESS-OF-FIT TEST
Determining whether the null hypothesis of considering a global exponent is compatible with the values of the particular fits presents some differences with the standard Kolmogorov-Smirnov goodness-of-fit test presented in Ref. [29]. The test statistic used in this standard goodnessof-fit test is the Kolmogorov-Smirnov distance (KSD), which is defined as the maximum difference between the empirical and the theoretical cumulative distribution functions. In this Appendix we expose the two different statistics for the goodness-of-fit test that have been used for the global fit: The first statistic is essentially an adaptation of the standard KS test for the merged case, whereas the second method uses a statistic which is a composition of KS distances.
When the value of the global exponentˆ has been found through MLE, one can understand that each dataset contributes to the global PDF with a global exponentˆ in their particular ranges [x (i) min , x (i) max ] (i = 1, . . . , n cat ). The global distribution can be understood as a global power-law PDF with exponent ranging from X min = min{x (i) min } to X max = max{x (i) max }. In this situation, we need to redefine the KSD where we have merged several datasets (the formulas below for truncated power laws, the untruncated case is recovered in the limit case in which the upper cut-off goes to infinity).
(i) Kolmogorov-Smirnov distance of the merged datasets (KSDMD): This statistic can be used as long as datasets overlap each other. Datasets can be merged by pairs by giving a certain weight to each point from data. Depending on which overlapping or nonoverlapping region a point from our data comes from, each point will have a certain weight ω j , j = 1, . . . , N . For more details about the expression of these weights, see Ref. [32]. By sorting data in ascending order, it is easy to construct the empirical cumulative distribution function for the merged datasets by using the following expression: with k = 1, . . . , N and the subindex e refers to the empirical CDF. Note that, for a standard situation in which datasets are not merged, all the weights would be 1 [29]. Note also that the survivor function is S e (x (k) ) = 1 − F e [x (k) ]. Once we have constructed the merged F e (x), we can easily compute the KS distance by: where the values of x are taken from the ranges [x (i) min , x (i) max ] for each dataset.
(ii) Composite KS distance (CKSD): This second statistic can be computed independently on whether the datasets are overlapping each other. This statistic is constructed from the 062106- 11  TABLE VI. Performance of the methodology for merging datasets explained in the main text for two untruncated power laws with exponents γ 1 and γ 2 , lower cut-offs x (1) min = 10 12 and x (2) min = 10 14 (arbitrary units) and sizes n 1 and n 2 . Five different groups are presented depending on the relative difference of the power-law exponents δ γ . For these datasets, the fitted global exponentˆ is estimated and the LRT statistic 2R e is computed. The statistics D (CKSD) e and D (KSDMD) e together with their corresponding p values p-CKSD and p-KSDMD are presented for each combination of datasets. As a complement, the z statistic and the p value p norm -z from to the z test are also shown. p values in bold correspond to those which are below p c = 0.05.
where the subindex i refers to the ith catalog and F e is the empirical cumulative distribution function. In order to compute the statistic, we perform the following summation: where the factors √ n i are due to the scaling of the KS distance with the number of data [59].
Once the statistic is computed, one needs to determine whether it is big or small in relation to the one found for data sampled from a PDF with the same parameters {x (i) min }, {x (i) max }, , and {n i }. Data are sampled from a power-law PDF in the range given by the ith catalog with probability q i = n i /N , where N = n ds i=1 n i . Note that the particular number in each simulated dataset is not necessarily the empirical one n i but the total number of data in the global fit N is maintained. Hence, one needs a first random number to choose the dataset i and therefore the range [x (i) min , x (i) max ] and a second one to generate the random power-law number in that range with exponentˆ [29]. When N events have been generated according to this procedure, one finds the global exponent sim by maximizing the global log-likelihood and computes either D (KSDMD) sim or D (CKSD) sim for the merged simulated datasets.
An estimation of the standard deviation of the global exponent can be computed from the standard deviation ofˆ sim . By performing several realizations of the previous procedure, one can estimate the p value of the fit by computing the fraction of simulated datasets where the simulated statistic is larger than the empirical one.

APPENDIX D: PERFORMANCE ON SYNTHETIC DATASETS
Once the methodology for merging datasets has been presented together with the different goodness-of-fit tests, it is important to check the performance of the method on synthetic data. In order to carry out this analysis, two untruncated power-law-distributed datasets with exponents γ 1 and γ 2 , lower cut-offs x (1) min and x (2) min , and sizes n 1 and n 2 , were generated. The sizes of both datasets were considered to be equal n 1 = n 2 so as to simplify the analysis.
The global exponentˆ of the merged catalogs was estimated according to the methodology exposed in the main text and the empirical LRT statistic 2R e was computed according to the methods exposed in Appendix B. Given that the difference on parameters between model α and β in this case is 1, the critical value of the test with a significance level equal to 0.05 is 2R c = 3.84. If the empirical LRT statistic 2R e is found to be larger than the critical one, then one rejects the null hypothesis that the simpler model α is good enough to describe data and, consequently, more parameters are needed. Once the global exponent and the likelihood ratio statistic were found, the two different goodness-of-fit tests exposed in Appendix C were performed.
The analysis was performed for different dataset sizes as well as different power-law exponents γ 1 and γ 2 . The results of the method for synthetic datasets are shown in Table VI. Five groups are presented depending on the relative difference δ γ between exponents. The p values of the CKSD and the KSDMD goodness-of-fit tests have been computed with 10 4 Monte Carlo simulations (see Appendix C for more details).
In order to compare our results with a test that checks whether two power-law exponents are significantly different or not, the p norm value of the z test exposed in Ref. [54] is also shown. These p values can be computed by assuming that, for a sufficiently large sample sizes, the z statistic follows a normal distribution with zero mean and standard deviation equal to 1.
As one would expect, if both datasets have exactly the same power-law exponent, the null hypothesis that variable X is power-law distributed for all the datasets with the global exponentˆ is not rejected (this has to happen in the 95% of the cases, for a significance level equal to 0.05). When the CKSD goodness-of-fit test rejects the null hypothesis of a power-law distribution with a global exponent, the z test also rejects the null hypothesis of considering γ 1 = γ 2 . The same does not apply for the KSDMD goodness of fit test, where some fits yield to nonrejectable p values, whereas the z test clearly rejects the null hypothesis. In this sense, one can consider that the KSDMD statistic is less strict than the CKSD statistic, which has therefore more power.
However, for the sample sizes which are involved in this work, both goodness-of-fit tests reject the null hypothesis for sufficiently large difference in the exponents. It can also be seen that the null hypothesis is rejected independently on the goodness-of-fit test for those fits in which the likelihood ratio statistic exceeds the critical value 2R c = 3.84. This fact justifies the decision of performing the LRT before the goodness-of-fit test.