Extraction of the Underlying Structure of Systematic Risk from Non-Gaussian Multivariate Financial Time Series Using Independent Component Analysis: Evidence from the Mexican Stock Exchange

Given the problems caused by the multivariate non-Gaussianity of financial time series, namely unreliable results when extracting underlying risk factors via Principal Component Analysis or Factor Analysis, we use Independent Component Analysis (ICA) to estimate the pervasive risk factors that explain the returns on stocks in the Mexican Stock Exchange. The extracted systematic risk factors are considered within a statistical definition of the Arbitrage Pricing Theory (APT), which is tested by means of a two-stage econometric methodology. Using the extracted factors, we find evidence of a suitable estimation via ICA and some results in favor of the APT.


Introduction
The goal of the present paper is to determine the statistical pervasive systematic risk factors in the Mexican Stock Exchange (BMV, for its acronym in Spanish) by means of an uncommon computational technique, Independent Component Analysis (ICA), in order to detect a more reliable structure of the pervasive factors driving the returns on its equities.
By design, ICA assumes that the observed variables are a linear mixture of random variables that are not normally distributed, a property that is relevant to the problem we are dealing with. The technique reveals the underlying time series behind a linear mixture: by extracting their statistically independent components, it can explain the pervasive sources of a set of observed parallel time series.
ICA has been used mainly in fields such as signal and image processing, speech and audio separation, biomedical signal and image analysis, telecommunications, neurophysiology, text and document processing, bioinformatics, environmental issues and some industrial applications. In recent years, studies applying ICA to different fields of Finance have been carried out in several countries.
The works that we consider most relevant to our research have used ICA to extract the following: the underlying factors explaining stock returns in Japan [2], Hong Kong [4], Italy [9] and the USA [24], and during the crisis period [25]; the relevant factors driving the movements of implied volatility surfaces of index options [1]; the factors driving the movements of the term structure of interest rates in Germany [35]; the factors driving spot rate curve movements in the USA [3]; the factors moving the returns of real estate investment trusts in the USA [30]; the factor model of returns for the USA Thrift Savings Plan funds [37]; and the factors for pricing multiasset derivatives [26].
Moreover, other representative studies of ICA in Finance have used this technique for the following purposes: (1) to analyze the interactions between currencies in the foreign exchange market [36]; (2) to model the conditional higher-moment risk in international stock markets [48], the term structure of multiple yield curves [46], and the volatility of market price indexes [47]; (3) to manage investment portfolios [8]; (4) to allocate assets [32]; (5) to forecast financial time series [30]; (6) to compute improved portfolio risk measures, such as VaR, in the banking sector [6,7]; (7) to explain the volatility of investment funds [45]; (8) to generate an equity sector classification [43]; (9) to improve bank performance evaluation [29]; (10) to produce multifactor index variance from the SPX sector ETF returns [38]; (11) to measure the dependency between stocks in the USA [17]; and (12) to analyze herding among hedge fund styles [27].
As far as we know, there is no study applying ICA in Finance focused on Mexico. Consequently, we try to fill this gap in the financial literature by applying a novel technique to extract the underlying structure of risk factors in the Mexican Stock Exchange.
The outline of this paper is as follows. In section 2, we briefly describe the ICA technique; in section 3, we present an empirical study; and in section 4, we draw the main conclusions.

ICA Basics
Despite the widespread evidence of the non-Gaussianity of the returns on equities, the most popular latent variable techniques used for extracting the pervasive factors underlying multivariate financial data are Principal Component Analysis (PCA) and Factor Analysis (FA), which assume a Gaussian distribution of the latent factors.
ICA is a better-suited extraction technique for this kind of data, since it is based on a multivariate non-normality approach and looks for mutually and statistically independent components. According to [21], statistical independence means that none of the components gives any information about the others.
Following [10], mutually and statistically independent components can also be interpreted as being of a different nature. ICA was introduced in the fields of signal processing and neural computation as a tool to solve the problem of Blind Source Separation (BSS) and signal reconstruction.
According to [40], BSS consists in revealing hidden factors from observable measures when we know very little about the original signals and their generating process.¹ The basic technique for solving this kind of problem is ICA, which assumes that the observed variables are the result of an unknown mixing process of some latent original sources. Consequently, the observed variables can be decomposed by means of a demixing process capable of estimating statistically independent components, which can be considered reliable proxies for the original sources that generated the observed variables (s ≈ y).
¹ According to [44], there are two approaches to solve the BSS problem: one based on Independent Component Analysis and another based on Second Order Statistics.
The main characteristic of the latent sources is that they are assumed to be non-Gaussian and mutually independent. They are known as the independent components of the observed multivariate data.
According to [5], the formal expressions of the mixing and demixing processes in the basic ICA model are as follows:

Mixing process:
$$\mathbf{x} = \mathbf{A}\mathbf{s}, \qquad (1)$$
Demixing process:
$$\mathbf{y} = \mathbf{W}\mathbf{x} = \mathbf{W}\mathbf{A}\mathbf{s}, \qquad (2)$$

where x represents the vector of observed variables; A, the mixing matrix; s, the vector of original sources; y, the vector of independent components; and W, the demixing matrix, which we assume to be invertible. Since both the mixing process and the original sources are unknown, the ICA methodology makes several assumptions: a) both the original sources and the components y are non-Gaussian and mutually independent; b) the number of observed mixtures is equal to the number of original sources, so the unknown mixing matrix is square; c) if the independent components are equal to the original sources, the mixing matrix A is the inverse of the demixing matrix W:

$$\mathbf{A} = \mathbf{W}^{-1}. \qquad (3)$$

Under these assumptions, we can estimate both W and y from x by looking for components that are as statistically independent as possible. Thus, the objective of ICA is to find a linear demixing mapping W under which the components y are as statistically independent as possible.
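The mixing and demixing processes can be illustrated with a minimal pure-Python sketch, assuming a hypothetical 2x2 mixing matrix and simulated (not BMV) sources: x = As mixes two independent non-Gaussian sources, and when W is exactly the inverse of A, y = Wx returns them unchanged. In practice W is unknown and has to be estimated, which is the task of the ICA algorithms discussed next.

```python
import random

def matvec(m, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def inverse_2x2(m):
    """Closed-form inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

rng = random.Random(0)
# Simulated independent, non-Gaussian sources s(t): uniform and Laplace-type
sources = [[rng.uniform(-1.0, 1.0), rng.expovariate(1.0) - rng.expovariate(1.0)]
           for _ in range(5)]

A = [[1.0, 0.6], [0.4, 1.0]]                   # hypothetical square mixing matrix
observed = [matvec(A, s) for s in sources]      # mixing process: x = A s
W = inverse_2x2(A)                              # ideal demixing matrix: W = A^{-1}
recovered = [matvec(W, x) for x in observed]    # demixing process: y = W x
```

Real ICA estimates recover the sources only up to order, sign and scale, which is why the ranking step discussed later in the paper is needed.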
In the relevant literature, we mainly find three estimation criteria for ICA: a) maximization of non-Gaussianity, b) maximum likelihood estimation, and c) minimization of mutual information. As stated in [23], under some conditions the three approaches are essentially equivalent or at least closely related.
These three criteria give rise to different methods of computing the ICs, which resemble one another in that the optimization is carried out by an iterative algorithm. The two main methods are the adaptive algorithms based on gradient methods and the fixed-point iteration algorithm known as the fast fixed-point or FastICA algorithm.
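To illustrate the first criterion, maximization of non-Gaussianity, the sketch below uses the well-known negentropy approximation J(y) ∝ [E{G(y)} - E{G(ν)}]², with G(u) = log cosh u and ν a standard Gaussian variable. It runs on simulated data and is not the ICASSO/FastICA code used in the study; the Gaussian reference value is obtained here by numerical integration.

```python
import math
import random

def standardize(xs):
    """Zero mean and unit (population) variance, as the approximation requires."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in xs) / n)
    return [(v - mu) / sd for v in xs]

def log_cosh(u):
    """Numerically stable G(u) = log cosh(u) = |u| + log(1 + e^{-2|u|}) - log 2."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def gaussian_reference(points=20001, bound=10.0):
    """E{G(nu)} for nu ~ N(0, 1), by the trapezoidal rule on [-bound, bound]."""
    h = 2.0 * bound / (points - 1)
    total = 0.0
    for i in range(points):
        u = -bound + i * h
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * log_cosh(u) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)

def negentropy(xs, reference):
    """J(y) proportional to [E{G(y)} - E{G(nu)}]^2 (the constant is omitted)."""
    ys = standardize(xs)
    return (sum(log_cosh(v) for v in ys) / len(ys) - reference) ** 2

rng = random.Random(1)
ref = gaussian_reference()
j_gauss = negentropy([rng.gauss(0.0, 1.0) for _ in range(20000)], ref)
j_laplace = negentropy([rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)], ref)
```

The fat-tailed (Laplace) sample yields a clearly larger value than the Gaussian one, whose negentropy is close to zero; maximizing this quantity over projections wᵀz is one way to obtain the ICs.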

PCA, FA, ICA and Finance
Regarding PCA and FA, [21] state that ICA is capable of finding the underlying factors when these techniques fail; furthermore, [39] declare that ICA might reveal features that would otherwise remain hidden. In addition, PCA and FA present a limitation that ICA overcomes. It is often believed that PCA and FA generate independent components; however, this is only true if the data are multivariate normally distributed, since only for Gaussian data are uncorrelated components also independent.
Real-world data, and especially financial time series, are usually non-Gaussian, and ICA searches for statistically independent components in non-Gaussian data. Moreover, independence is a stronger property than uncorrelatedness, since the former implies the latter but not vice versa; therefore, uncorrelatedness is not enough to separate the underlying components. From a different perspective, PCA and FA use only the covariance matrix to obtain linearly decorrelated components, i.e., they rely only on second-order statistics.
ICA, in contrast, also exploits statistics that the covariance matrix does not capture, i.e., it additionally minimizes higher-order statistical dependence. Another problem related to the use of PCA and FA on financial time series is that financial return distributions have fat tails, so outliers can distort the parameter estimates of both techniques.
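The gap between uncorrelatedness and independence shows up in a toy example on simulated data (not from the study): if x is symmetric around zero and y = x², the Pearson correlation between x and y is close to zero even though y is completely determined by x. A covariance-based method would treat the pair as unrelated, whereas higher-order statistics reveal the dependence.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(10000)]
y = [v * v for v in x]                       # y is completely determined by x
corr_xy = pearson(x, y)                      # close to 0: uncorrelated
corr_x2_y = pearson([v * v for v in x], y)   # 1: the dependence is nonlinear
```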
Conversely, ICA presents a particular problem absent in both PCA and FA: the estimated independent components (ICs) are not explicitly ranked, whereas in the other methods the factors are automatically ranked by their eigenvalues. Therefore, we have to apply an algorithm able to order the ICs according to some criterion.
In the case of financial series, it is reasonable to assume that a set of independent factors underlies the observed time series, which might be related to political, meteorological, technical, fundamental, macroeconomic, market, national or international aspects, and that ICA might be an appropriate model to extract them. ICA is well suited to financial time series for the following reasons: first, ICA deals with the problem of blind source separation in parallel time series, like those obtained from financial variables; second, ICA works with non-Gaussian random variables, which are the ones most commonly found in financial data; third, from statistical and financial standpoints, ICA produces more reliable underlying components or factors, since they are statistically independent and not only uncorrelated. This contributes directly to the aim of extracting the systematic risk factors affecting the returns on equities in a multifactor asset-pricing model such as the Arbitrage Pricing Theory.

The Data
We used four different databases, formed as follows. First, for the sake of comparison with previous research [28], we ran our study over two databases consisting of 291 observations, formed from the weekly closing prices, in log-returns, of 20 stocks of the Mexican Stock Exchange over the period from July 3, 2000 to January 27, 2006.² One of these two databases is expressed in returns (DBWR) and the other in excesses over the riskless interest rate (DBWE).³ In addition, we used two daily databases, one expressed in returns (DBDR) and the other in excesses (DBDE). The daily databases, consisting of 1410 observations of 22 stocks, cover the same period, from July 3, 2000 to January 27, 2006.⁴
² The criteria used to choose the sample of stocks were their inclusion in the main index of the Mexican Stock Exchange (IPC) and a survival bias during the analyzed period. The period considered was defined by the available information, the terms of the IPC index's samples and the explanatory character of this study in the pre-crisis period. More recent periods will be used in future research, where we will analyze the predictive potential of this technique during other periods (crisis/post-crisis).
³ Consistently with our previous research [28], the riskless interest rate is assumed to be equal to the government securities' daily funding interest rate published by the Bank of Mexico.
⁴ In the same sense, as stated in our previous research [28]: "The number of assets and the periods considered were defined by the available information in accordance with a survival bias criterion. Unfortunately, since there are many gaps in the observations of several stocks in the Mexican market, it is very difficult to build a dataset of quotations which contains both a long number of observations and a large number of stocks. In our case, the 20 and 22 stocks considered represents the maximum number of shares from which we could obtain a good enough number of observations of all of them, that allowed us to build complete and homogeneous datasets for both periodicities (without missing values). This fact constitutes a very important aspect for the correct application of the extraction technique presented. In addition, we decided to use two differently structured databases in order to test the case of weekly and daily returns as well as a larger and a smaller number of observations, according to the different studies found in literature."
ISSN 2007-9737
The returns were calculated as the logarithmic returns of the stocks' closing prices, according to the following expression:
$$r_{it} = \ln P_{it} - \ln P_{i,t-1} = \ln\left(\frac{P_{it}}{P_{i,t-1}}\right), \qquad (4)$$
where $P_{it}$ is the closing price of stock $i$ at time $t$. Although ICA does not require the time series to be stationary, by computing the returns on equities as continuous logarithmic returns, as in expression (4), we already take into account that the price series are not stationary, since differencing makes them stationary in mean. In addition, as the returns are differences, the underlying mean and trend are discarded, and thus the ICA algorithm is able to capture the interactions between the different stocks at a given moment.
On the other hand, ICA as a methodology does not require each time series to be intrinsically stationary. What ICA assumes is that the overall set of time series preserves the same kind of interactions between series; that is, the statistics of the observations might change, but the interaction between them captured by the matrix W does not.
Finally, averaging over longer time intervals, e.g., moving from daily to weekly to monthly data, yields time series with an increasingly lower discrepancy from the Gaussian (see [11]); however, the discrepancies at the high values of the returns in the QQ plots with respect to a Gaussian, even at the monthly level, are compatible with the non-Gaussianity assumptions needed for the ICA algorithm.
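The log-return computation of expression (4) and the construction of the excess-return databases can be sketched as follows. The prices and riskless rates are made-up numbers, and the sketch assumes that the riskless rate has already been expressed per period (daily or weekly), a conversion the text does not detail.

```python
import math

def log_returns(prices):
    """r_t = ln(P_t) - ln(P_{t-1}) for consecutive closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def excess_returns(returns, riskless):
    """Subtract the same-period riskless rate from each return."""
    if len(returns) != len(riskless):
        raise ValueError("returns and riskless rates must be aligned")
    return [r - rf for r, rf in zip(returns, riskless)]

prices = [100.0, 110.0, 99.0]          # hypothetical closing prices
r = log_returns(prices)                # [ln(1.1), ln(0.9)]
e = excess_returns(r, [0.001, 0.001])  # hypothetical per-period riskless rates
```

Note that one observation is lost in the differencing, so T prices yield T - 1 returns.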

Tests for Univariate and Multivariate Normality
It is known [21] that PCA (implicitly) and FA (explicitly) require a multivariate normally distributed sample in order to produce completely reliable results, i.e., they only produce uncorrelated and independent components if the sample data have no higher-order statistics beyond the variance. Thus, if the samples do not fulfill these conditions, we are prompted to use a more suitable technique, such as ICA, to uncover the underlying sources in a non-Gaussian sample. Therefore, we first tested the univariate normality (UVN) of each individual series, since ICA requires that no more than one of the signals be Gaussian.
Tables 1 to 4 present the descriptive statistics, up to the fourth moment, of the four databases used in this study. The skewness and kurtosis of practically all the stocks differ from those of the Gaussian distribution (zero and three, respectively).
We also carried out the Jarque-Bera test for UVN on the four databases, rejecting the null hypothesis of normality at the 5% significance level for all the stocks in the daily databases and for all but one stock in the weekly databases. The last two columns of Tables 1 to 4 present the results of the Jarque-Bera test.
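A minimal sketch of the Jarque-Bera statistic, JB = (n/6)[S² + (K - 3)²/4], where S and K are the sample skewness and kurtosis; under normality JB is asymptotically χ² with 2 degrees of freedom, whose survival function is exp(-JB/2). The fat-tailed Laplace sample is simulated and only stands in for equity returns; the study's own tests were run on the BMV databases.

```python
import math
import random

def jarque_bera(xs):
    """Return the JB statistic and its asymptotic chi-square(2) p-value."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((v - mu) ** 2 for v in xs) / n
    m3 = sum((v - mu) ** 3 for v in xs) / n
    m4 = sum((v - mu) ** 4 for v in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                  # equals 3 for a Gaussian distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)       # P(chi2_2 > jb) = exp(-jb / 2)

rng = random.Random(3)
fat_tailed = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(2000)]
jb_stat, p_value = jarque_bera(fat_tailed)
reject_normality = p_value < 0.05        # 5% significance level, as in the text
```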
We used two classical alternatives for assessing multivariate normality (MVN): the Mardia [33] and the Henze-Zirkler [18] tests.
We performed two tests, following the accepted criterion of applying more than one MVN test when assessing this property of a sample.⁵ Both tests reject the null hypothesis of MVN at the 5% significance level for all the databases. Tables 5 and 6 present the results of Mardia's and H-Z's tests, respectively.
⁵ We performed both MVN tests using the Matlab scripts developed by [41,42].
We extended this analysis with an experiment on the horizon of Mardia's test, i.e., we ran the test using different numbers of observations so as to check multivariate normality in different scenarios. The results showed that from 101 observations on, inclusive, the sample is non-Gaussian according to the three statistics.
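Mardia's statistics can be sketched directly from their definitions. With $d_{ij} = (\mathbf{x}_i-\bar{\mathbf{x}})^T\mathbf{S}^{-1}(\mathbf{x}_j-\bar{\mathbf{x}})$ and S the maximum-likelihood covariance, the multivariate skewness is $b_1 = n^{-2}\sum_{i,j} d_{ij}^3$ and the kurtosis is $b_2 = n^{-1}\sum_i d_{ii}^2$. Under MVN, $n b_1/6$ is asymptotically χ² with p(p+1)(p+2)/6 degrees of freedom, and $(b_2 - p(p+2))/\sqrt{8p(p+2)/n}$ is asymptotically standard normal. This is not the Matlab code of [41,42] used in the study: it omits the small-sample skewness correction and, to avoid a χ² routine, computes only the kurtosis p-value. The bivariate Laplace sample is simulated.

```python
import math
import random

def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of rows)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def mardia(data):
    """Mardia's b1, b2 and test statistics for an n x p data set (rows = observations)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cen = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[i] * r[j] for r in cen) / n for j in range(p)] for i in range(p)]
    s_inv = invert(cov)
    q = [[sum(s_inv[i][j] * r[j] for j in range(p)) for i in range(p)] for r in cen]
    b1 = b2 = 0.0
    for i in range(n):
        for j in range(n):
            d = sum(cen[i][k] * q[j][k] for k in range(p))   # d_ij
            b1 += d ** 3
            if i == j:
                b2 += d * d
    b1 /= n * n
    b2 /= n
    skew_stat = n * b1 / 6.0                                     # ~ chi2, p(p+1)(p+2)/6 df
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)   # ~ N(0, 1)
    kurt_p = math.erfc(abs(kurt_z) / math.sqrt(2.0))             # two-sided p-value
    return b1, b2, skew_stat, kurt_z, kurt_p

rng = random.Random(4)
sample = [[rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(2)] for _ in range(400)]
b1, b2, skew_stat, kurt_z, kurt_p = mardia(sample)
```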
On the basis of the foregoing results,⁶ we cannot accept as completely reliable the outcomes of techniques that assume multivariate normality of the data, such as PCA and FA; thus, we are led to the use of ICA.
⁶ In addition, the tests of normality are based on checking this assumption. In particular, the nonlinearities used in the implementation of the experiments in this paper guarantee the presence of higher-order interactions from the Taylor expansion and, therefore, the presence of moments of all orders.

Estimation of the ICA Model
In order to estimate the ICA model in expression (2), we used the ICASSO methodology [20], which is based on the FastICA algorithm [22].⁷ According to these authors, the FastICA algorithm is based on a fixed-point iteration scheme for finding the local extrema of the objective function. The software uses the FastICA Matlab package by [13] to run the FastICA algorithm.
For the whitened observed data z, the one-unit FastICA iteration updates a weight vector w as
$$\mathbf{w}^{+} = E\{\mathbf{z}\,g(\mathbf{w}^{T}\mathbf{z})\} - E\{g'(\mathbf{w}^{T}\mathbf{z})\}\,\mathbf{w}, \qquad \mathbf{w} = \mathbf{w}^{+}/\lVert\mathbf{w}^{+}\rVert,$$
where the nonlinearity g can be almost any smooth function, such as
$$g_1(y) = \tanh(a_1 y), \qquad (6)$$
$$g_2(y) = y\exp(-y^2/2), \qquad (7)$$
$$g_3(y) = y^3, \qquad (8)$$
and g' is the derivative of g(·).⁸ The final vector gives one of the ICs as the linear combination y = wᵀz. The specific resulting algorithm depends both on the estimation principle used and on the approach selected to estimate several ICs, i.e., on the nonlinearity and the decorrelation method chosen. In [21], the authors state that by setting the options nonlinearity tanh (hyperbolic tangent) and symmetric approach, one can obtain a good estimation of the ICA model; this would be equivalent to performing the three estimation approaches at the same time.
⁸ According to [21], the nonlinearity tanh(a₁y) is optimal for super-Gaussian, fat-tailed distributions; y³ performs better for sub-Gaussian, thin-tailed ones; and y exp(-y²/2) is recommended for highly super-Gaussian distributions or when robustness is very important.
In addition, the excess kurtosis detected in the multivariate normality tests leads us to use the hyperbolic tangent nonlinearity.
Furthermore, as reported in [14], the best trade-off for estimating the ICA model, from the statistical performance and computational load perspectives, is the FastICA algorithm with symmetric orthogonalization and the tanh nonlinearity. In our study, we followed these specifications.
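A compact pure-Python sketch of this configuration, symmetric FastICA with g(u) = tanh(u) (that is, a₁ = 1), is shown below. It is restricted to two signals so that the matrix square roots have closed forms, and it is an illustrative re-implementation on simulated sources, not the FastICA Matlab package [13] or ICASSO. Whitening uses C^{-1/2}; the data, seed and mixing matrix are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def inv_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

def sqrtm_2x2(m):
    """Principal square root of a symmetric positive-definite 2x2 matrix."""
    s = math.sqrt(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    t = math.sqrt(m[0][0] + m[1][1] + 2.0 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t], [m[1][0] / t, (m[1][1] + s) / t]]

def sym_decorrelate(w):
    """Symmetric orthogonalization W <- (W W^T)^{-1/2} W."""
    wwt = matmul(w, [list(col) for col in zip(*w)])
    return matmul(inv_2x2(sqrtm_2x2(wwt)), w)

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def fastica_2d(x, max_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g(u) = tanh(u) for two observed series; returns (W, y)."""
    n = len(x[0])
    means = [sum(row) / n for row in x]
    xc = [[v - m for v in row] for row, m in zip(x, means)]
    cov = [[sum(p * q for p, q in zip(xc[i], xc[j])) / n for j in range(2)] for i in range(2)]
    z = matmul(inv_2x2(sqrtm_2x2(cov)), xc)        # whitening: E{z z^T} = I
    rng = random.Random(seed)
    w = sym_decorrelate([[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)])
    for _ in range(max_iter):
        w_new = []
        for wi in w:
            g = [math.tanh(wi[0] * a + wi[1] * b) for a, b in zip(z[0], z[1])]
            mean_dg = sum(1.0 - gi * gi for gi in g) / n          # E{g'(w^T z)}
            # Fixed-point step: w+ = E{z g(w^T z)} - E{g'(w^T z)} w
            w_new.append([sum(zk * gi for zk, gi in zip(z[k], g)) / n - mean_dg * wi[k]
                          for k in range(2)])
        w_next = sym_decorrelate(w_new)
        # Converged when every row is unchanged up to its sign
        change = max(1.0 - abs(sum(p * q for p, q in zip(r1, r0))) for r1, r0 in zip(w_next, w))
        w = w_next
        if change < tol:
            break
    return w, matmul(w, z)

rng = random.Random(42)
n = 2000
sources = [[rng.uniform(-1.0, 1.0) for _ in range(n)],                      # sub-Gaussian
           [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]]  # super-Gaussian
mixed = matmul([[1.0, 0.6], [0.4, 1.0]], sources)   # hypothetical mixing matrix
W, y = fastica_2d(mixed)
# ICs are identified only up to order and sign, so check both pairings
recovery = max(abs(corr(y[0], sources[0]) * corr(y[1], sources[1])),
               abs(corr(y[0], sources[1]) * corr(y[1], sources[0])))
```

Because each run starts from a random W, repeated runs can return the components in a different order or with flipped signs, which is the variability that ICASSO is designed to assess.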
The choice of the ideal number of ICs to estimate is still an unsolved problem.
Although diverse criteria to determine this number can be found in the ICA literature, in most cases it is actually chosen by trial and error without any theoretical basis. One alternative is to reduce the number of dimensions in the whitening preprocessing stage, considering some of the criteria used in PCA or FA, and to estimate the same number of ICs. For the sake of comparison with our previous study, we use the same test window, which ranges from two to nine components.⁹
⁹ The criteria adopted were the same used in our previous research [28]: "the arithmetic mean of the eigenvalues, the percentage of explained variance, the exclusion of the components or factors explaining a small amount of variance, the scree plot, the unretained eigenvalue contrast (Q statistic), the likelihood ratio contrast, Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and the maximum number of components feasible to estimate in each technique."
As stated by [20], one problem with ICA estimation is that the reliability of the estimated ICs is not known, since the results are stochastic, i.e., they might differ between runs of the algorithm.
Thus, the results of a single run of the FastICA algorithm cannot be completely trusted, and an additional analysis of the reliability of the estimation should be performed. In this context, reliability has two aspects: algorithmic and statistical. According to these authors, the ICASSO methodology is an alternative for dealing with this problem, since it ensures both the algorithmic and the statistical reliability of the estimates. Following [20], ICASSO first runs the FastICA algorithm M times on the dataset $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]$, composed of N samples of k-dimensional vectors; then, ICASSO clusters the ICs produced in the different runs according to their similarity. Mutual similarities between estimates are computed using the absolute value of their linear correlation coefficient as the similarity measure:
$$\sigma_{ij} = |r_{ij}|,$$
where $r_{ij}$ is the linear correlation coefficient between estimates i and j. These elements form the similarity matrix, which can be obtained as
$$\boldsymbol{\Sigma} = \left|\hat{\mathbf{W}}\,\mathbf{C}\,\hat{\mathbf{W}}^{T}\right|,$$
where C is the covariance matrix of the dataset x and $\hat{\mathbf{W}}$ gathers the estimated demixing matrices $\hat{\mathbf{W}}_m$ from each run m = 1, 2, …, M in a single matrix (its rows scaled so that each estimated component has unit variance):
$$\hat{\mathbf{W}} = \begin{bmatrix} \hat{\mathbf{W}}_1 \\ \hat{\mathbf{W}}_2 \\ \vdots \\ \hat{\mathbf{W}}_M \end{bmatrix}.$$
According to [19], reliable estimates of ICs correspond to tight clusters, since these agglomerate similar estimates generated by many runs of the algorithm, even when the initial values and the datasets used for the estimation have been changed. Conversely, estimates that do not belong to any cluster are considered unreliable. The centrotype of each cluster is considered a more reliable estimate than that generated by any single run.
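The similarity measure and the centrotype idea can be sketched on toy component estimates. The sketch assumes that cluster memberships are already known (ICASSO obtains them by agglomerative clustering, which is not reproduced here) and takes the centrotype to be the member with the largest sum of similarities to the other members of its cluster. The estimates and function names are hypothetical.

```python
import math

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

def similarity_matrix(estimates):
    """sigma_ij = |r_ij| between all IC estimates pooled from every run."""
    m = len(estimates)
    return [[abs(corr(estimates[i], estimates[j])) for j in range(m)] for i in range(m)]

def centrotype(sim, cluster):
    """Cluster member with the largest summed similarity to the other members."""
    return max(cluster, key=lambda i: sum(sim[i][j] for j in cluster if j != i))

estimates = [
    [1.3, -0.7, 0.7, -1.3],   # run 1: source plus an orthogonal perturbation
    [1.0, -1.0, 1.0, -1.0],   # run 2: clean estimate of the source
    [0.7, -1.3, 1.3, -0.7],   # run 3: source minus the same perturbation
    [-1.0, 1.0, -1.0, 1.0],   # run 4: sign-flipped estimate (similarity 1 with run 2)
    [1.0, 1.0, -1.0, -1.0],   # run 5: estimate of an unrelated component
]
sim = similarity_matrix(estimates)
center = centrotype(sim, [0, 1, 2])   # run 2 sits in the middle of the cluster
```

Taking the absolute value makes the measure insensitive to the sign indeterminacy of ICA, so a sign-flipped estimate of the same source is treated as identical.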
Besides the previously declared FastICA parameters, there are some additional parameters to set when using ICASSO, such as the resampling mode, the number of resampling cycles (M) and the number of clusters (L). In order to ensure both statistical and algorithmic reliability, we used both resampling modes, i.e., each time the dataset was bootstrapped and the initial conditions of the algorithm were randomized. We used the default number of resampling cycles fixed by the software, i.e., 30, and we set the number of clusters equal to the number of ICs (m) estimated in each experiment in order to obtain square mixing (A) and demixing (W) matrices.
The rows of the demixing matrix (W) computed by ICASSO correspond to the centrotypes of the clusters, which represent a more reliable estimate than that produced by a single run of FastICA; however, they are not strictly orthogonal. Since our research requires orthogonal ICs, we have to apply an orthogonalization procedure in a later step.
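Consequently, we first took the demixing matrix (W) produced by ICASSO; then, we computed the mixing matrix
$$\mathbf{A} = \mathbf{W}^{-1}$$
and the matrix of independent components or sources
$$\mathbf{S} = \mathbf{W}\mathbf{X},$$
where X is the matrix of observed returns (or excesses).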

Ranking and Orthogonalization of the Independent Components
The ICA model performs a decomposition based on a criterion related to statistical independence, which provides no natural ordering of the components and, therefore, no natural way to identify a residual. The criterion presented in this section is one that makes sense for the application at hand. In contrast with linear regression or PCA, where the driving noise is easy to identify because it is the residual left after the components of maximum variance are determined, such an interpretation is not natural in ICA.
Because of this, the ICA literature does not clearly specify the difference between the components and the residual, and the results are usually presented as a complete projection onto the space of statistically independent components. Next, we ordered the independent components in terms of their explained variability by means of the criterion proposed by [12]. This criterion ranks the ICs according to the amount of the stocks' variance that each of them explains; thus, we obtain a ranked matrix of independent components (S_r), as well as sorted mixing (A_r) and demixing (W_r) matrices.
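The exact criterion of [12] is not reproduced here. As a hypothetical stand-in, the sketch scores IC j by var(s_j)·Σ_i a_ij², the variance it would contribute to all stocks if the ICs were uncorrelated, and then permutes the columns of A and the rows of S and W consistently, so that A_r S_r still equals A S. The matrices are toy values.

```python
def variance(xs):
    """Population variance of a series."""
    mu = sum(xs) / len(xs)
    return sum((v - mu) ** 2 for v in xs) / len(xs)

def rank_components(a, s, w):
    """Sort ICs by a hypothetical variance-contribution score (not the criterion of [12]).

    a: k x m mixing matrix (rows = stocks); s: m x T ICs; w: m x k demixing matrix.
    """
    m = len(s)
    score = [variance(s[j]) * sum(row[j] ** 2 for row in a) for j in range(m)]
    order = sorted(range(m), key=lambda j: score[j], reverse=True)
    a_r = [[row[j] for j in order] for row in a]   # permute the columns of A
    s_r = [s[j] for j in order]                    # permute the rows of S
    w_r = [w[j] for j in order]                    # permute the rows of W
    return order, a_r, s_r, w_r

a = [[0.1, 2.0], [0.2, 1.0]]
s = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
order, a_r, s_r, w_r = rank_components(a, s, w)   # the second IC moves to the front
```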
Finally, we orthogonalized the matrix of ICs by means of the following transformation:
$$\mathbf{S}_o = \mathbf{V}\mathbf{S}_r,$$
where V is a transformation matrix that decorrelates the matrix of sorted independent components, and S_o is the matrix of orthogonalized ICs.
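The text does not state how V is built. One possible choice, assumed here, is V = L⁻¹, where C = LLᵀ is the Cholesky factorization of the covariance of S_r. This choice leaves the first-ranked IC unchanged up to scale and gives S_o an identity covariance. The sketch also checks that A_r V⁻¹ S_o reproduces A_r S_r, which is why the inverse of V enters the reconstruction of the original variables. Symmetric whitening, V = C^{-1/2}, would be another valid option. All matrices are toy values.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def covariance(rows):
    """Population covariance matrix of equally long series (one per row)."""
    n = len(rows[0])
    means = [sum(r) / n for r in rows]
    cen = [[v - m for v in r] for r, m in zip(rows, means)]
    return [[sum(p * q for p, q in zip(ci, cj)) / n for cj in cen] for ci in cen]

def cholesky(c):
    """Lower-triangular L with C = L L^T (C symmetric positive definite)."""
    m = len(c)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def invert_lower(low):
    """Inverse of a lower-triangular matrix by forward substitution, column by column."""
    m = len(low)
    inv = [[0.0] * m for _ in range(m)]
    for col in range(m):
        for i in range(m):
            rhs = (1.0 if i == col else 0.0) - sum(low[i][k] * inv[k][col] for k in range(i))
            inv[i][col] = rhs / low[i][i]
    return inv

s_r = [[1.0, 2.0, 0.0, -1.0, -2.0],      # ranked ICs (toy values, correlated)
       [0.5, 1.0, 1.0, -1.5, -1.0]]
a_r = [[0.9, 0.2], [0.4, 0.7], [0.1, 1.1]]   # ranked mixing matrix (toy values)
low = cholesky(covariance(s_r))
v = invert_lower(low)                    # V = L^{-1}
s_o = matmul(v, s_r)                     # S_o = V S_r, with identity covariance
x_hat = matmul(matmul(a_r, low), s_o)    # A_r V^{-1} S_o, since V^{-1} = L
```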

Extraction of Underlying Systematic Risk Factors Via ICA
For each of the four databases, we computed eight multifactor models, extracting from two to nine independent components. Then, we reconstructed the original variables according to the generating process of expression (1), including the inverse of the transformation matrix V in order to apply the orthogonalization to the mixing matrix A as well:
$$\hat{\mathbf{X}} = \mathbf{A}_r\,\mathbf{V}^{-1}\,\mathbf{S}_o.$$
The reproduced values were very similar to the observed series for most of the equities in all the datasets, which indicates that the generative multifactor model estimated by ICA was effective. However, stocks such as GMODELO, CEMEX, SORIANA and GCARSO were not very well reconstructed, especially in the daily returns and excesses, due to the high volatility they presented during the studied period. To save space, we only present the line plots of the observed and reproduced returns and excesses of the first five stocks in each database.
Figures 1 to 4 present the results when we extracted nine underlying factors; the quality of the reconstruction is evident.¹⁰ An interesting feature of the ICA algorithm is that it captures the global interaction between stocks regardless of the non-stationarity of their joint behavior. That is, the only assumption required by the model is that there are independent sources mixed by a constant matrix (A, the inverse of W).
As long as this matrix does not change, the ICA algorithm yields an estimate and attributes the changes in volatility to some of the unobservable factors.
¹⁰ As in our previous paper [28], the remaining estimations, with 2, 3, 4, 5, 6, 7 and 8 components, showed similar behavior; the results shown are typical.

Independence Test
In order to test the independence of the computed ICs, we ran the Hilbert-Schmidt Independence Criterion (HSIC) test [15],¹¹ which tests whether two random variables X and Y are independent based on a sample of observed pairs (x_i, y_i). The results of our independence tests confirmed the statistical independence between each pair of components estimated from the weekly and daily databases.
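For intuition, the biased HSIC statistic, trace(KHLH)/n², can be computed with Gaussian kernels and calibrated by permutation. This is a simplified sketch: it assumes standardized data, a fixed kernel width of 1 and a permutation p-value, whereas the test of [15] used in the study may rely on different bandwidth and null-distribution choices. The simulated dependent pair is hypothetical.

```python
import math
import random

def standardize(xs):
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in xs) / n)
    return [(v - mu) / sd for v in xs]

def gram(xs, sigma=1.0):
    """Gaussian-kernel Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 sigma^2))."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * sigma ** 2)) for b in xs] for a in xs]

def center(k):
    """H K H with H = I - (1/n) 1 1^T; K is symmetric, so row means = column means."""
    n = len(k)
    row = [sum(r) / n for r in k]
    total = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def hsic_stat(kc, l):
    """Biased HSIC: trace(K H L H) / n^2 = sum_ij (HKH)_ij L_ij / n^2."""
    n = len(kc)
    return sum(kc[i][j] * l[i][j] for i in range(n) for j in range(n)) / n ** 2

def hsic_test(x, y, num_perm=200, seed=0):
    """Statistic and permutation p-value for H0: x and y are independent."""
    kc = center(gram(standardize(x)))
    l = gram(standardize(y))
    stat = hsic_stat(kc, l)
    rng = random.Random(seed)
    idx = list(range(len(y)))
    exceed = 0
    for _ in range(num_perm):
        rng.shuffle(idx)                      # permuting y breaks any dependence on x
        lp = [[l[a][b] for b in idx] for a in idx]
        exceed += hsic_stat(kc, lp) >= stat
    return stat, (exceed + 1) / (num_perm + 1)

rng = random.Random(5)
x = [rng.uniform(-1.0, 1.0) for _ in range(50)]
y = [v * v + 0.05 * rng.gauss(0.0, 1.0) for v in x]   # dependent yet nearly uncorrelated
stat, p_value = hsic_test(x, y)
```

Applied to pairs of estimated ICs, a p-value above the chosen significance level means that independence is not rejected.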

Econometric Contrast
We carried out an econometric contrast under a statistical approach to the Arbitrage Pricing Theory (APT) using the underlying systematic risk factors extracted via ICA. The APT pricing equation is expressed as follows:
$$E(R_i) = \lambda_0 + \lambda_1\beta_{i1} + \lambda_2\beta_{i2} + \cdots + \lambda_K\beta_{iK}, \qquad (17)$$
where, with the same notation as in [28], $\lambda_0$ represents the riskless interest rate, $\lambda_k$ the risk premium for each kind of systematic risk factor, and $\beta_{ik}$ the exposures to each type of systematic risk. We tested this expression by means of an average cross-section methodology, estimating the coefficients by ordinary least squares (OLS) in the following regression model:
$$\bar{R}_i = \lambda_0 + \lambda_1\hat{\beta}_{i1} + \lambda_2\hat{\beta}_{i2} + \cdots + \lambda_K\hat{\beta}_{iK} + u_i, \qquad (18)$$
where $\bar{R}_i$ is the average return (or excess return) of stock i and $u_i$ is the error term. We again used the two-stage methodology for the econometric contrast of the APT applied in our aforementioned study [28]: in the first stage, we estimated the betas to be used in expression (18) from the scores of the extracted factors; in the second stage, we estimated the lambdas. In the first stage, we estimated the betas by regressing the returns and excesses on the factor scores obtained by ICA. In order to improve the efficiency of the parameter estimates and to eliminate autocorrelation in the error terms of the regressions, we used weighted least squares (WLS) to estimate the entire system of equations simultaneously.
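The first stage can be sketched as one time-series regression per stock of its returns on a constant and the factor scores, keeping the slope coefficients as betas; the Durbin-Watson statistic is computed from the residuals. This simplification uses equation-by-equation OLS instead of the joint WLS estimation described above. The factor scores and returns are made up and generated without noise, so the true betas are recovered exactly.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def ols(y, rows):
    """OLS via the normal equations; rows[t] is the regressor vector at time t."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [yt - sum(c * v for c, v in zip(coef, r)) for r, yt in zip(rows, y)]
    return coef, resid

def durbin_watson(e):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 suggest no autocorrelation."""
    return sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / sum(v * v for v in e)

def first_stage_betas(returns, factors):
    """Regress each stock's series on [1, f_1t, ..., f_Kt]; return slopes and residuals."""
    rows = [[1.0] + [f[t] for f in factors] for t in range(len(factors[0]))]
    betas, residuals = [], []
    for series in returns:
        coef, resid = ols(series, rows)
        betas.append(coef[1:])            # drop the intercept, keep the exposures
        residuals.append(resid)
    return betas, residuals

f1 = [0.010, -0.020, 0.015, 0.000, -0.005, 0.020]    # hypothetical factor scores
f2 = [0.003, 0.001, -0.002, 0.004, 0.000, -0.001]
returns = [[0.001 + 0.8 * a + 1.5 * b for a, b in zip(f1, f2)],
           [0.002 + 1.2 * a - 0.5 * b for a, b in zip(f1, f2)]]
betas, residuals = first_stage_betas(returns, [f1, f2])
dw_example = durbin_watson([1.0, -1.0, 1.0, -1.0])   # 3.0: negative autocorrelation
```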
The results of the regressions in the four databases were very good, producing, in almost all cases, statistically significant parameters, high R² values and Durbin-Watson statistics that led us not to reject the null hypothesis of no autocorrelation. In the second stage, we estimated the lambdas or risk premia in expression (17) by regressing, as a cross-section, the average returns and excesses on the betas obtained in the first stage, using ordinary least squares (OLS).
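The second stage can then be sketched as a single cross-sectional OLS regression of the stocks' average returns on a constant and their estimated betas, as in expression (18): the intercept estimates λ0 and the slopes the risk premia. The betas and average returns below are invented and noise-free, so the regression recovers the λ values used to generate them; the HAC correction and the tests are left out.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def second_stage_lambdas(mean_returns, betas):
    """Cross-sectional OLS of average returns on [1, beta_i1, ..., beta_iK]."""
    rows = [[1.0] + list(b) for b in betas]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, mean_returns)) for i in range(k)]
    return solve(xtx, xty)               # [lambda_0, lambda_1, ..., lambda_K]

betas = [[0.5, 0.2], [1.0, -0.3], [1.5, 0.4], [0.8, 0.1], [1.2, 0.0]]   # hypothetical
mean_returns = [0.001 + 0.002 * b1 + 0.0005 * b2 for b1, b2 in betas]
lambdas = second_stage_lambdas(mean_returns, betas)
```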
In order to avoid the econometric problems of heteroskedasticity and autocorrelation in the residuals of the model estimated through OLS, we corrected the covariance matrix of the estimates by means of the Newey-West heteroskedasticity and autocorrelation consistent (HAC) covariance estimator. Additionally, we verified the normality of the residuals with the Jarque-Bera test.
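A minimal sketch of the Newey-West estimator for OLS coefficients is (X'X)⁻¹ Ŝ (X'X)⁻¹, where Ŝ sums the residual cross-products up to lag L with Bartlett weights w_l = 1 - l/(L+1). The lag length, the absence of a small-sample degrees-of-freedom correction and the toy regression (a constant only) are assumptions of this illustration; the econometric software used in the study may differ in these details.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of rows)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def newey_west_cov(rows, resid, lags):
    """HAC covariance of OLS coefficients; rows[t] is the regressor vector at t."""
    n, k = len(rows), len(rows[0])
    bread = invert([[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)])
    meat = [[0.0] * k for _ in range(k)]
    for lag in range(lags + 1):
        weight = 1.0 - lag / (lags + 1.0)        # Bartlett kernel; equals 1 at lag 0
        for t in range(lag, n):
            u = resid[t] * resid[t - lag]
            for i in range(k):
                for j in range(k):
                    term = rows[t][i] * rows[t - lag][j]
                    if lag > 0:
                        term += rows[t - lag][i] * rows[t][j]   # add the transpose
                    meat[i][j] += weight * u * term
    return matmul(matmul(bread, meat), bread)

rows = [[1.0]] * 4                       # regression on a constant only
resid = [1.0, -1.0, 1.0, -1.0]           # negatively autocorrelated residuals
cov_hc0 = newey_west_cov(rows, resid, lags=0)   # White (HC0) case: 0.25
cov_nw = newey_west_cov(rows, resid, lags=1)    # autocorrelation shrinks it: 0.0625
```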
In order to accept the APT pricing model, we require the statistical significance of at least one lambda parameter other than λ0, and the equality of the independent term to its theoretical value: the average riskless rate of return in the models expressed in returns,
$$H_0: \lambda_0 = \bar{r}_f,$$
and zero in the models expressed in excesses over the riskless interest rate,
$$H_0: \lambda_0 = 0.$$
We used Wald's test to confirm these equalities.
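For a single restriction of the form H0: λ0 = c, the Wald statistic reduces to W = (λ̂0 - c)²/Var(λ̂0), which is asymptotically χ² with 1 degree of freedom, so its p-value equals erfc(√(W/2)). The estimate and (HAC) variance in the example are hypothetical numbers; joint restrictions would need the matrix form of the test.

```python
import math

def wald_test(estimate, hypothesized, variance):
    """Single-restriction Wald test: W = (b - c)^2 / Var(b) ~ chi-square(1)."""
    w = (estimate - hypothesized) ** 2 / variance
    # chi-square(1) survival function: P(X > w) = erfc(sqrt(w / 2))
    return w, math.erfc(math.sqrt(w / 2.0))

# Hypothetical: lambda_0 estimated at 0.002 with variance 1e-6, tested against 0
w_stat, p_value = wald_test(0.002, 0.0, 1e-6)   # W = 4, p is about 0.0455
```

With these numbers, the equality λ0 = 0 would be rejected at the 5% level.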
In Table 7, we present a summary of the results of the econometric contrast for the four databases. In general, the results of the explanatory power, the adjusted R-squared (R …

Fig. 1. Line plots of the observed and reproduced stocks
[Panels: Database of weekly returns; Database of weekly excesses; Database of daily returns; Database of daily excesses. Note: Logarithmic returns of the first five stocks observed in each database and their respective reconstructions using the estimated ICA model. The symbols of the stocks presented appear above each line plot.]

Table 1. Descriptive statistics and Jarque-Bera test. Database of weekly returns
Computación y Sistemas, Vol. 22, No. 4, 2018, pp. 1049-1064, doi: 10.13053/CyS-22-4-3083
Regarding the MVN tests, Mardia's test [33] is based on the multivariate skewness and kurtosis of the sample, whereas the Henze-Zirkler (H-Z) test [18] considers a measure of the distance between the characteristic function of the MVN and the empirical one; the computed statistic is approximately lognormally distributed if the data are multivariate normal. Both techniques have shown very good performance in assessing MVN against other classic and newer alternatives, as [34] remark in their study.

Table 2. Descriptive statistics and Jarque-Bera test. Database of weekly excesses

Table 4. Descriptive statistics and Jarque-Bera test. Database of daily excesses

Table 5. Mardia test for multivariate normality
Note: DBWR = Database of weekly returns. DBWE = Database of weekly excesses. DBDR = Database of daily returns. DBDE = Database of daily excesses. H0 = Multivariate normality. p-value lower than 0.05 = Rejection of H0.

Table 7. Summary of the econometric contrast