Volatility: a hidden Markov process in financial time series

The volatility characterizes the amplitude of price return fluctuations. It is a central magnitude in finance closely related to the risk of holding a certain asset. Despite its popularity on trading floors, the volatility is unobservable and only the price is known. Diffusion theory has many common points with the research on volatility, the key of the analogy being that volatility is the time-dependent diffusion coefficient of the random walk for the price return. We present a formal procedure to extract volatility from price data, by assuming that it is described by a hidden Markov process which together with the price form a two-dimensional diffusion process. We derive a maximum likelihood estimate valid for a wide class of two-dimensional diffusion processes. The choice of the exponential Ornstein-Uhlenbeck (expOU) stochastic volatility model performs remarkably well in inferring the hidden state of volatility. The formalism is applied to the Dow Jones index. The main results are: (i) the distribution of estimated volatility is lognormal, which is consistent with the expOU model; (ii) the estimated volatility is related to trading volume by a power law of the form $\sigma \propto V^{0.55}$; and (iii) future returns are proportional to the current volatility which suggests some degree of predictability for the size of future returns.


I. INTRODUCTION
The volatility measures the amplitude of return fluctuations and it is one of the central quantities in finance [1]. Investors sometimes place even greater emphasis on the level of volatility than on the market trend itself. The reason for this is mainly that the risk of holding an asset is classically associated with its volatility [2]. The theoretical framework used to quantify aspects of price fluctuations has many common points with areas of physics dealing with noisy signals. The research of random diffusion aims to describe the dynamics of particles in random media and its methods have been applied to a large variety of phenomena in statistical physics and condensed matter [3]. Time series describing solar flares, earthquakes, the human heartbeat or climate records show strong correlations, multi-scaling, non-Gaussian statistics and self-organized behavior [4,5,6,7]. These are properties also observed in financial time series where volatility is considered to play a key role [1,8].
The picture that prices follow a simple diffusion process was first proposed by Bachelier in 1900 [9]. Later in 1959, the physicist Osborne introduced the geometric Brownian motion and suggested that volatility can be viewed as the diffusion coefficient of this random walk [10]. The simplest possible assumption -that it is a time-independent constant -lies at the heart of classical models such as the Black-Scholes option pricing formula [11]. More recently it has become widely accepted that such an assumption is inadequate to explain the richness of the markets' behavior [1]. Instead, volatility itself should be treated as a random quantity with its own particular dynamics.
Among its most relevant properties [1,8,12,13,14], volatility is the responsible for the observed clustering in price changes. That is: large fluctuations are commonly followed by other large fluctuations and similarly for small changes [1,8]. Another related feature is that, in clear contrast with price changes which show negligible autocorrelations, volatility autocorrelation is still significant for time lags longer than a year [8,12,13]. Most of these studies introduce a subordinated process which is associated with the volatility in one way or another [15,16,17,18,19,20,21,22,23,24,25,26].
The main obstacle of the appropriate analysis of volatility is that it is directly unobservable. As we have mentioned, volatility provides important information to traders but it is very unclear how reliable the estimates of such a hidden process can be. Investors use several proxies to infer the level of current asset volatility. The most common ways are: (i) to make it equivalent to the absolute value of return changes, (ii) to assume a proportional law between volatility and market volume [26,27,28,29], and (iii) to use the information contained in option prices and obtain the so-called "implied volatility" which, in fact, corresponds to the market's belief of volatility [1].
This already raises the questions: What process is a proper model of volatility and how to adjust the possible parameters to describe various stocks and markets? Among the possible candidates, multifractals [16,17,18,19,20] and stochastic volatility models [15,21,22,23,24,25,26] are the most promising. The fact that volatility is directly unobservable makes even more difficult to get a conclusive answer about the best model. This doubled challenge has deserved the attention from the most diverse disciplines that look at financial markets. All of them have converged on the description of what is technically known as realized volatility and on methodologies for reconstructing volatility path. Several procedures for reconstructing volatility have already been presented being more or less dependent on the volatility model chosen. Mathematics, econometrics, and finance research have a large number of publications during the last decade on this issue (see e.g. [15,30,31,32,33,34,35,36,37,38,39,40,41]).
Our research presents an alternative procedure to estimate volatility from the price dynamics only. Likewise most of the papers in the literature, the price dynamics is represented by a two-dimensional diffusion process: one dimension for price and second dimension for a volatility described by a hidden Markov process. Our case estimates the volatility subordinated time series through maximum likelihood but, in contrast with other studies, having fixed in advance the parameters of the model.
We have decided to focus on a particular stochastic volatility model that is able to circumvent both mathematical and computational difficulties: the exponential Ornstein-Uhlenbeck volatility model [21,22] although the same procedure can be used for a larger class of models. As its name indicates, the model assumes that the logarithm of the volatility follows an Ornstein-Uhlenbeck process, that is: a mean reverting process with linear drift. The resulting model is capable of reproducing the statistical properties of the financial markets fairly well [21].
The paper is organized as follows. In Sec. II we outline the general stochastic volatility framework and more specifically the exponential Ornstein-Uhlenbeck model. In Sec. III we present a maximum likelihood estimator for a wide class of stochastic volatility models. For the case of the exponential Ornstein-Uhlenbeck (expOU) model we present Monte Carlo simulations to show that it performs remarkably well in inferring the hidden state of the volatility process. In Sec. IV the procedure is applied to the Dow Jones Industrial Average. Conclusions are drawn in Sec. V and some more technical details are left to the appendices.

II. STOCHASTIC VOLATILITY MODELS
The geometric Brownian motion (GBM) [10] is the most widely used model in finance. In this setting the asset price S(t) is described through the following Langevin equation (in Itô sense): where σ is the volatility, assumed to be constant, µ is some deterministic drift indicating an eventual trend in the market, and W 1 (t) is the Wiener process. We define the zero-mean return X(t) as where the symbol · · · designates the average value and t 0 is the initial time which is usually set to be zero. In terms of X(t) the GBM is simply written as However, especially after the 1987 market crash, compelling empirical evidence has become available that the assumption of a constant volatility is doubtful [8]. Neither is volatility a deterministic function of time as one might expect on account of the non-stationarity of financial data, but a random quantity [22].
In the most general setting one therefore assumes that the volatility σ is a given function of a random process Y (t): Most stochastic volatility (SV) models assume Y (t) is also a diffusion process that may or may not be correlated with price. The main difference between various models is only the parametrization of this scheme. In a general notation the zero-mean return X(t) defined above is described by the following set of stochastic differential equations where dX = dS/S − dS/S and f , g and h are given functions of Y (t). As shown in Eq. (4), f [Y (t)] corresponds to the volatility, i.e., the amplitude of return fluctuations. However, since f (x) is usually chosen to be a monotonically increasing function, it is not misleading to think of Y as a measure of volatility. Thus, as far as there is no confusion, we will refer to the process Y (t) as "volatility" as well. On the other hand, the function g[Y (t)] describes a reverting force that drives the volatility toward the so-called "normal level". This force brings the volatility process to a stationary regime for long time horizons and the normal level is related to the average volatility in that limit. Finally, the subordinated process Y (t) may have a non-constant diffusion coefficient defined in terms of the function h[Y (t)] which is called the volatility-of-volatility ("vol of vol"). The functions g and h fully describe the volatility process. The resulting dynamics is comparable with the one described by a Gaussian particle trapped in a potential well V (y) whose associated force is −g(y), where g(y) = V ′ (y). In finance one typically proposes convex potentials with only one minimum whose value is related to the normal level of the volatility.
In what follows we will mostly work with one particular SV model, the exponential Ornstein-Uhlenbeck (expOU) model, which follows from the substitutions: Note that in this model the process Y (t) is precisely the logarithm of the volatility, or "log-volatility" for short. The main statistical properties of the model are thoroughly discussed in Ref. [21]. We simply recall that the stationary distribution of the process Y (t) is a Gaussian (i.e., a lognormal distribution for σ) with zero mean and variance β: where β ≡ k 2 /2α.

III. VOLATILITY ESTIMATION
A. The Wiener measure and volatility estimation Let X and Y denote a simultaneous realization of the variables X(τ ) and Y (τ ) in the time interval t − s ≤ τ ≤ t. Omitting all Y-independent terms, we show in Appendix A that the probability density (likelihood) of such a realization is approximately given by Before proceeding further, we will discuss the meaning of this expression. We first note that Eq. (11) has to be understood in the sense of generalized functions [42] since the Wiener process is only differentiable in this sense and, hence,Ẋ(t) andẎ (t) do not exist as ordinary functions and Eq. (11) is just a symbolic expression. Nevertheless, the formula is still valid when the integral and the derivatives therein are discretized with arbitrary small time steps, a requirement that is indeed necessary for numerical computations.
Let us now see some qualitative properties of Eq. (11). The first summand measures the fluctuations of the zero-mean return with respect to the volatility, [Ẋ(τ )/f (Y (τ ))] 2 , and their contribution to the likelihood (probability density) of a given return realization. Note that the higher this contribution is, the lower those "relative" fluctuations are. In the same fashion, the second summand in Eq. (11) measures the fluctuations of the volatility process Y (t) with respect to the vol of vol h(Y ), although in this case these fluctuations are gauged with the mean reverting force −g(Y ). As before, the lower these fluctuations, the higher their contribution to the log-likelihood (11).
While Eqs. (5)-(6) represent a joint model for return and volatility, the stock market data only include recordings of the return process X. The Y process and, hence, the volatility f (Y ), must be inferred indirectly in a Bayesian fashion through Eq. (11). Indeed, the conditional probability density that the realization of the hidden Y -process is Y, given that the observed return is X, reads ln P(Y|X) = ln P(X, Y) − ln P(X).
In consequence, we can find the maximum likelihood sample path of the (hidden) volatility process by maximizing Eq.
Since the second summand of Eq. (12) is independent of Y, we can neglect P(X) in this maximization process. Therefore, the maximization of ln P(Y|X) yields the same result as that of ln P(X, Y). Note that, besides the specification of the stochastic volatility model (that is, the explicit forms of f, g, and h), the only free parameter is s: the duration of past return data to take into account. After substituting the observed return history as X, we will obtain by maximum likelihood the quantity: We should mention that similar maximization problems have been studied in the context of hidden Markov models, where this procedure is called "decoding". When the state space (the number of possible X and Y values) is finite, the optimization can be done exactly by the Viterbi algorithm [43], while there has been limited success in the continuous case [44]. A similar technique has been applied to the forecasting of volatility assuming a binomial cascade model which, unlike stochastic volatility models, has a finite state space (see Ref. [20]). More focused on the stochastic volatility, we should also mention the efforts based on Kalman or Particle filtering, Bayesian inference, conditional likelihood or Fourier methods among other similar techniques (see e.g. Refs. [15,30,31,32,33,34,35,36,37,38,39,40,41]). Most of the cited works assume an specific model, although they never focuss on the expOU model. In addition, they are mainly worried about intraday (high-frequency) data while we are here focussed on reproducing daily (low-frequency) data. It deserves an special attention the work on a superposition of Lévy Ornstein-Uhlenbeck processes for σ 2 crafted to describe high-frequency (intraday) data and provide a power law slow decay for the volatility autocorrelation [15,32]. All the techniques mentioned above provide an efficient way to reproduce the volatility path but, in contrast to our case, the method also serves to estimate the parameters of the model. We have decided to provide the parameters beforehand thus using an independent way of estimating them (see Sec. IV A). It should be left for a future research an accurate comparison between our method and others. We may look for a way to implement parameter estimation in our method.
As we have already stated, our main objective is to design a method able to filter the Wiener noise dW 1 (t) out of Eq. (5) and thus to obtain a reliable estimateŶ (t) of the hidden volatility process Y (t). The method, an extension of the deconvolution procedure previously presented in [21], has basically the following five steps: (i) We simulate a random sample path of the Wiener process dW 1 (τ ) for t − s ≤ τ ≤ t.
(ii) Then a surrogate realization of Y is generated aŝ where t − s ≤ τ ≤ t. Note that this equation requires that f (x) is invertible which implies that f (x) be chosen to be a monotonic function.
(iii) SubstituteŶ s and X into Eq. (11) to calculate the log-likelihood of this realization.
(iv) Iterate (i)-(iii) for I steps, keep the highest likelihood random realization (the conditional median) and assume this to be the proper estimateŶ (t).
(v) The estimate of the hidden process at time t is thenŶ (t). The estimate of the volatility is given bŷ B. Interpretation of the method Let us further elaborate the meaning of such an estimate. In finance, the volatility is often identified with the absolute value of returns variations. Indeed, as a first approximation, we can replace in Eq. (5) the noise term by its expected value and write which shows that the volatility is approximately proportional to the absolute returns. Eq. (5) can be thus thought of as a first approximation toward estimating volatility. Our method, based on the maximization of Eq. (12), takes this estimation two steps further. In effect, the first step was taken in Ref. [21] where we replaced the average |dW 1 (t)| by a simulated sample path. We are now taking a second and more refined step in which we are not only replacing the Wiener noise by a random simulation but, in addition, we perform the maximum likelihood method described by items (i)-(v). Thus we are basically separating the observed returns dX(t) into two sources: σ(t) and dW 1 (t). To do this, we have first considered a specific form of stochastic volatility. Secondly, we have taken the driving Wiener noises dW 1 (t) and dW 2 (t) appearing in Eqs. (5) and (6) to be uncorrelated. Finally, we have assumed that σ(t) is approximately constant over the time step during which we numerically evaluate the derivativesẎ (t) andẊ(t) appearing in Eq. (11). We incidentally note that if h(x) = 0 -the vol of vol is equal to zero -then the stationary solution of Eq. (6) is Y (t) ≡ 0. Thus the model reduces to the Wiener process in which the volatility is constant and absolute returns are uncorrelated [9].

C. The performance of the estimator
In order to test the performance of the estimator, we simulate the expOU process by using Eqs. (7)-(8) with the realistic parameters obtained in Section IV. The relationship between the simulated value of the log-volatility Y (t) and its estimateŶ (t) is given in Fig. 1. The two quantities agree within error bars, so we may state that In what follows we will always use s = 10 days of past data and I = 10 5 iterations for maximization. The time step for discretization will be ∆t = 1 day. For random optimization, the necessary number of iterations I grows very fastly, perhaps exponentially, with s/∆t. The values of s and ∆t cited here were chosen to keep the task computationally feasible. As for the value of I, in this case an increase to I = 10 6 does not improve the estimates noticeably. The estimates generated with this parameter set have negligible bias and they can efficiently distinguish between low and high volatility periods (see Appendix B for a discussion on the robustness and possible bias of the procedure).
An additional verification of the good performance of our estimate is shown in Fig. 2, where we give the actual sample path of Y for a single realization of the expOU model together with the estimatedŶ . As we can see the estimate follows the true log-volatility Y (t) very closely.

D. Volatility forecasting
Another possible approach to the utility of the expOU model is to evaluate its forecasting performance. One can for example use the mean error of absolute returns |dX(t + h)| for a given Monte Carlo simulated path: where | dX(t + h)| is our estimate for the absolute return |dX(t + h)| given an estimateŶ (t) while h is the time horizon (number of days) to forecast ahead. We recall that for our estimates we take the conditional median of our MonteCarlo simulations. Thus for the expOU model we have where M [|dW 1 (t + h)|] denotes the median of the absolute value of Wiener increment. Figure 3 compares five methods to forecast |dX(t + h)| using only information available before time t. For clarity we also give the titles used in Fig.  1 in brackets ("..."):  Finally, note that for h ∼ 500 days all estimators based on the expOU model lose any information included inŶ (t), and converge to the same error level. This is consistent with real data (cf. Sec. IV D).

IV. APPLICATION TO STOCK MARKET DATA
In this section we present an application of the method to actual stock market data. We analyze the Dow Jones Industrial Average (DJIA) index in the period 01/01/1900-05/26/2006, a total of 29, 038 days. In order to work with zero-mean returns, the mean return was subtracted from the actual data. Trading volumes for the index are only available for the period 04/01/1993 − 05/26/2006, a total of 3, 375 days.

A. Parameter estimation
We recall that in the estimation procedure presented here one necessarily needs to assume a theoretical model for the volatility. Having done this, the next step is to estimate the parameters involved in the model chosen. For the expOU model, Eqs. (7)-(8), these parameters are: m, k and α.
Before proceeding further we remark that time increments in real data have a finite size since the market always works on discrete times (for daily data the minimum time increment is 1 day). Thus, in practice, the (infinitesimal) return variation dX(t) = X(t + dt) − X(t) corresponds to a (finite) return increment ∆X(t) = X(t + ∆t) − X(t) where ∆t is the time step between two consecutive ticks. Also the Wiener differentials dW (t) correspond, in mean square sense [45], to the increments where ε(t) is a Gaussian process with zero mean and unit variance [45]. In the present case our time step has a fixed width and is equal to ∆t = 1 day. Coming back to the estimation of parameters, we show in Appendix C that ln m ≈ (γ + ln 2)/2 + ln |∆X|/ √ ∆t , where γ = 0.5772 · · · is the Euler constant. Taking into account that the third summand can be evaluated from data we see that Eq. (20) provides a direct estimation of m.
On the other hand, if the expOU model is appropriate then the empirical estimateŶ (t) of the hidden volatility Y (t) should also be a Gaussian process with a stationary distribution of zero mean and variance given by β = k 2 /2α [see Eq. (10)]. As shown in Fig. 4, if one takes β = 0.61 ± 0.05, the distribution ofŶ (t) is Gaussian and coincides with the theoretical distribution of Y (t) given by Eq. (9). The assumption of a Gaussian distribution for our estimate is robust and it holds for a wide range of parameters.
To fully specify the model we have to obtain the parameter α. We have chosen the value found in Ref. [21] which was obtained in order to capture the long range correlations of the volatility, at least up to 500 days. The model is able to provide the appropriate long range behavior with an infinite sum of exponentials but it is also true that it does not provide a pure power law decay like the models in Refs. [15,17]. In any case, we have seen in Ref. [21] that at least for daily data our approach is satisfactory. Our final set of parameter estimates is thus m = (7.5 ± 0.5) × 10 −3 days −1/2 , α = (1.82 ± 0.03) × 10 −3 days −1 , and k = (4.7 ± 0.3) × 10 −2 days −1/2 .
The errors were determined based on Fig. 4, similarly to the error of β. In this parameter range the distributions of the estimatedŶ (t) and simulated Y (t) series agree well. The results reported throughout this paper are insensitive to the misspecification of these parameters, even beyond these error bars (see also Appendix B).
The distributions of log-volatility are compared in Fig. 4 for four cases: our maximum likelihood procedure applied to Dow Jones, a simulation of expOU, the simple estimate of Eq. (21) and the deconvolution procedure introduced in Ref. [21] which can be written asŶ where W 1 (t) is a simulation of the Wiener process. Note that this deconvoluted log-return estimator has indeed a Gaussian distribution but with a larger variance as hinted in Fig. 4 in view of its wider density. We finally mention that the estimate for the log-volatility ln(σ/m) ≈ ln |dX| given by Eq. (16) shows a non Gaussian and biased distribution as was also reported in Ref. [21]. This suggests thatŶ decon is an appropriate "null model" to contrast withŶ . Both quantities are generated by dividing dX(t) by the increments of a realization of the Wiener process and then taking the logarithm of the absolute value of this ratio. The difference lays in the fact thatŶ takes the realizations that satisfy a maximum likelihood requirement whileŶ decon takes a Wiener process realization that is purely random. In such a way, our method keeps the divisor correlated with dX and, as we will see next, it seems to conserve clustering and memory effects in the log-volatility time series.

B. Clustering and the estimated volatility
To support our claim that the technique presented is powerful enough in filtering the noise dW 1 out of returns we will give a visual comparison based on the following qualitative experiments. Figure 5 [Left] displays a comparison over a 1000-day time interval. One can observe there that the noise level in Y is substantially smaller than inŶ decon , as also inferred from Fig. 4.
In order to show that such a correlation is responsible for suppressing large fluctuations in the ratio, we can perform a second experiment. Thus in Fig. 5 [Right] we see a comparison between the logarithm of absolute returns variations, ln |dX|, and the estimated volatilityŶ . Note that the proper clustering of volatility becomes clearly visible.

C. A comparison with trading volume
The hidden nature of the volatility process has been addressed by several authors [26,27,28]. For instance, Ref. [26] suggests that, instead of the volatility, a good estimate would be the square root of the daily trading volume. That is, where α = 0.5 and M[·] denotes the median. In Fig. 6 we show evidence that supports this assumption. Again, the first estimation for the volatility is |dX(t)|. Therefore, we regress ln |dX(t)| versus ln V (t) as shown in Fig. 6.   In the same figure we also present the regression between the maximum likelihood estimateŶ (t) and ln V (t) which appears to be less noisy than the former regression in accordance with the smaller variance ofŶ (t) compared to that of ln |dX(t)|. Nevertheless, the exponent α ≈ 0.55 is the same for both regressions. There have been similar [46], albeit controversial [47,48], findings for the price impact of single transactions. However, Eq. (22) does not yet imply that volatility is proportional to volume, only that its typical value is (i.e., the median). Fluctuations around the average behavior due to changes in liquidity might have a key role in the process [48]. Therefore, the conditional median of ln |dX(t)| given Y (t) is We point out that this relationship implies some degree of predictability of the absolute changes in return through their median, if one knows the current value of the log-volatility Y (t). We test Eq. (23) for real data and with Y (t) replaced by its estimateŶ (t). As shown by the bottom curve of Fig. 7, the slope of the linear regression between M ln |dX(t)| Y andŶ (t) is not equal to 1 -as would have been implied by Eq. (23) -but 0.9 which still suggests strong predictive power.
Recall that the minimum time step of the empirical data used is 1 day. Hence, Eq. (23) implies the prediction of tomorrow's return based on today's volatility and return. We now want to extend the prediction horizon. To this end we generalize Eq. (23) and propose the following ansatz: where h = 0, 1, 2, · · · . In Fig. 7 we test this ansatz for several values of the horizon: h = 0, 5, 20, 100 and 1000 days. We find that the slope γ(h) is a decreasing function from the value γ(0) = 0.9 to practically zero when h = 1000 days which means a complete loss of memory. Note that, when h = 100 trading days we have γ = 0.25 still implying a slight degree of prediction after approximately five months, which is of the same order of magnitude than the DJIA characteristic time scale, 1/α ∼ 500 days, for the relaxation of the volatility [21].

V. CONCLUSIONS
The volatility is a crucial quantity for financial markets since it provides a measure of the amplitude of price fluctuations. Traders try to follow carefully the level of volatility because it gives the perception of the risk associated with any asset. Although volatility was originally conceived to be the diffusion coefficient of the price return random walk, there is compelling evidence not to consider it a constant, but a subordinated random process. The framework is analogous to that of random diffusion processes which have been applied to a large variety of phenomena in statistical mechanics and condensed matter physics. The main obstacle to get a better knowledge of the volatility's nature is that it is not directly observed. In fact, this is precisely the motivation behind the present research. Our main objective has been to develop a tool which visualizes the sample path of volatility. The procedure derives a maximum likelihood estimate assuming that the volatility is a hidden Markov process. To do so, one needs also to assume a specific model for the volatility. We have chosen a class of two dimensional diffusions commonly known as stochastic volatility models, where the volatility acts as a diffusion particle trapped in a potential well. We have focused on the expOU model and obtained promising results, especially for three reasons: (i) the model is computationally feasible; (ii) its parameters can be easily obtained and fit the data reasonably well; and (iii) the distribution of the estimated volatility is log-normal, which is consistent with the assumed expOU model.
We have shown for the Dow Jones index daily data that the sample path of our estimated volatility improves other estimates. We have compared our estimation with a rather typical one which identifies volatility with absolute return changes. Our estimation is able to remove the existing bias in the stationary distribution of volatility while still preserving the clustering in volatility time series. We have also studied the estimate of volatility that deconvolutes the return by the simulation of a random Wiener path [21]. This last procedure also provides a Gaussian distribution for the log-volatility, albeit the distribution has too fat tails and pays the price of losing clustering and memory in the volatility time series. Our new procedure is in fact a more sophisticated variant of this estimate since it filters out Wiener realizations via maximum likelihood. The estimate drastically reduces the noise in the volatility path thus preserving data clustering. In this way we have thus proposed an alternative methods to those already provided by mathematical finance literature (see e.g. [15,30,31,32,33,34,35,36,37,38,39,40,41]). It should however be left for a future research an accurate comparison between our method and others. We may even look for a way to implement parameter estimation in our method.
The median of the estimated volatility has also been related to trading volume by the power-law expression M [σ] ∝ V 0.55 . A link between volatility and trading volume has been previously mentioned in different studies however our estimate is again capable to provide a less noisy regression. We must, indeed, stress the fact that this does not imply that volatility is proportional to a power of the volume, but only that its typical value is and that fluctuations around the average might play an important role.
We have also seen that current returns are proportional to the estimated volatility, as otherwise expected. However, the main novelty is that we have observed how future returns are proportional to current volatility and their predictability diminishes monotonically with the number of time steps ahead. This last finding implies that our estimation method can be applied to predict the size of future returns with the knowledge of current volatility.
As a final remark we stress the fact that the technique herein presented can be applied to a variety of physical phenomena besides finance. One typical problem of this sort is provided by the Brownian motion inside a field of force in which inertial effects are not negligible [49]. In this situation the dynamics of the particle is described by a two dimensional diffusion process (X(t), V (t)) representing the position and the velocity of the Brownian particle. The maximum likelihood technique might provide a reliable estimate of the velocity in the case that, for instance, the only accessible experimental measures are the positions of the particle at wide time steps, so that a measure of the velocity -which implies the knowledge of two very close positions -is too noisy and unreliable.
where the first term, P(X t−s , Y t−s ), corresponds to the initial realizations of X and Y s/∆t time steps far from the present time t; all the remaining terms of the form P(X τ , Y τ |X τ −∆t , Y τ −∆t ) are the conditional pdf's for transitions between consecutive steps: The logarithm of Eq. (A8) is On the other hand from Eqs. (A4)-(A5) we realize that where |J| is the Jacobian of the transformation (X τ , Y τ ) −→ (ε 1 (τ − ∆t), ε 2 (τ − ∆t)) defined by Eqs. (A4)-(A5), that is, But ε 1 and ε 2 are independent standard Gaussians, hence Let us briefly explain the origin of some of these contributions. The first summand comes from the normalization constant of the Gaussian distribution (A10). It appears in every conditional probability density and this is the reason for the factor s/∆t, which is the number of time steps between t − s and t. The resulting term does not depend on the realization, so that we can neglect it for a maximization with respect to Y. The term also goes to −∞ in the ∆t → 0 limit, which means that any individual realization has a probability measure zero.
The second summand is mostly the sum of the Jacobian transformations of each transition probability and depends on Y . Stochastic volatility models typically assume that these f and g [cf. Eqs. (A1) and (A2)] are continuous and monotonically increasing functions or even constants. For instance, in the expOU model we have f (x) = m exp(x) and g(x) = k. Because of this, we will also neglect this term in the maximization procedure. The contribution shifts the maximum at the excessive cost of adding more noise to the numerical computations. We can however look at the situation from another point of view. Ignoring this term is equivalent to omitting the Jacobian transformation between the two probability density measures [cf. Eq.(A10)]. In this way, we are stating that what we are really going to maximize is the probability of the realization of ε 1 (t) and ε 2 (t) -instead of Y (t)-in terms of the past history of the process expressed by {X, Y}.
The term ln P(X t−s , Y t−s ) is fixed by the initial conditions of the process. If we assume a known initial return Xwhich can be set to zero -and take a random Y t−s following the stationary distribution P st (Y t−s ) given by Eq. (9), then P(X t−s , Y t−s ) = δ(X t−s − X) P st (Y t−s ) and hence ln P(X t−s , Y t−s ) = ln P st (Y t−s ) + ln δ(X t−s − X).
(A12) Had we taken another initial condition, the technique would have given equivalent results (we have checked this by using several initial distributions). For this reason and in order to improve the convergence of the maximum likelihood estimate we have neglected also this contribution. We therefore write ln P({X, Y}) ≃ − 1 2 We can represent this equation in the continuous time framework if ∆t is sufficiently small and if f (x), h(x) and g(x) are continuous. In such a case, Eq. (A13) yields the result given in Eq. (11). The results are given in Fig. 8. The plot shows, that either option gives the same width of theŶ distribution, and the bias introduced by either specification error is small. We have also run similar simulations but with very different parameters values and the procedure still provides consistent results similar to those given by Fig. 8.
Finally, a more detailed test along the lines of point (d) above can be performed (that is: a test on wether the detrending causes a bias in our estimation procedure). We thus have repeated the test (d) also including errors 10µ and 100µ. The procedure tolerates up to 10 times more detrending error than expected in the real dataset. Only, when we have about 100 times the error, the reconstruction shows a strong upward bias for low volatility periods as shown in Fig. 9. All these results imply that possible errors in the detrending procedure could affect our procedure only when any wrong specification of the drift is far outside the error domain involved in our DJIA data set. ln |ε| = 1 2 √ 2π ∞ 0 x −1/2 e −x/2 ln xdx = − π/2(γ + ln 2), where γ = 0.5772 · · · is the Euler constant. Therefore, ln |∆W 1 (t)| ≈ (ln ∆t)/2 − (γ + ln 2)/2. (C2) Substituting Eq. (C2) into Eq. (C1) proves Eq. (20).