Forecasting tourism demand using consumer expectations

This paper focuses on forecasting tourism demand in Catalonia from its four main visitor markets (France, the United Kingdom, Germany and Italy) by combining qualitative information with quantitative models: autoregressive (AR), autoregressive integrated moving average (ARIMA), self-exciting threshold autoregressive (SETAR) and Markov switching regime (MKTAR) models. The forecasting performance of the different models is evaluated for different time horizons (1, 2, 3, 6 and 12 months).


INTRODUCTION
Catalonia is one of the seventeen autonomous communities of Spain. It is located in the northeast of the country and its capital is Barcelona. Its population (over seven million inhabitants) represents 16% of the total population of Spain. Catalonia is a major tourist region: over 14 million foreign visitors come to Catalonia every year, accounting for 111 million overnight stays. Tourism represents 12% of GDP and provides employment for around 19% of the working population in the service sector. The study of aggregate tourism demand supports business decision-making and tourism policy, and provides in-depth information about tourist flows. Although such studies have been undertaken for other countries, to date there have been no analyses of tourism demand forecasting in Catalonia.
Consumer surveys have become an essential tool for gathering information about different economic variables (Ludvigson, 2004; Garrett et al., 2004; Howrey, 2001). Their results are weighted percentages of respondents expecting an economic variable to increase, decrease or remain constant. Therefore, the information refers to the direction of change but not to its magnitude. As pointed out by Pesaran (1987), these data are less likely to be affected by sampling and measurement errors than surveys that require respondents to give point forecasts. Information from consumer surveys is available well before the corresponding quantitative statistics and reflects agents' expectations. The timeliness of the results and the wide range of variables covered make them very useful for monitoring the current state of the economy, but there is no consensus on their usefulness for forecasting macroeconomic developments.
The objective of this paper is to analyse whether forecasts of tourism demand in Catalonia can be improved using the information provided by consumer surveys. Since expansions tend to last longer than recessions (Hansen, 1997), most economic variables seem to display a cyclical asymmetry that linear models are not able to capture. To account for this, four different sets of models are considered: two linear ones, autoregressive (AR) and autoregressive integrated moving average (ARIMA) models, and two nonlinear regime-switching ones, self-exciting threshold autoregressive (SETAR) and Markov switching regime (MKTAR) models. The Root Mean Square Error (RMSE) is then computed for different forecast horizons (1, 2, 3, 6 and 12 months).
To test whether survey results provide useful information for improving forecasts of tourism demand in Catalonia, we consider the consumer confidence indicator (CCI) for the four main visitor markets (France, the United Kingdom, Germany and Italy) from January 2002 to June 2008, and introduce it as an explanatory variable in autoregressive (AR) and Markov switching regime (MKTAR) models; in the latter, the probability of changing regime depends on the information in the qualitative indicators rather than on the series' own evolution. Comparing these forecasts with those obtained from models that do not include information from business and consumer surveys makes it possible to assess whether these indicators improve the forecasts.
The structure of the paper is as follows. The next section describes our methodological approach, including both the benchmark models and the models that incorporate consumer survey information. Section 3 discusses the results of the forecasting competition. Finally, Section 4 presents the conclusions.

Benchmark models
A variety of time-series models have been used and compared to estimate and forecast tourism demand, the most commonly used being exponential smoothing and autoregressive integrated moving average (ARIMA) models (Li, Song and Witt, 2005; Witt and Witt, 1995). In this work, four different models (AR, ARIMA, SETAR and MKTAR) are used to obtain forecasts for the quantitative variables, expressed as year-on-year growth rates. As there are few attempts in the literature to incorporate qualitative information into quantitative forecasting models (Lee, Song and Mjelde, 2008), we also estimate AR models that include qualitative survey data.

Autoregressions (AR)
Autoregressions explain the behaviour of the endogenous variable $x_t$ as a linear combination of its own past values:

$$x_t = \phi_0 + \sum_{i=1}^{p} \phi_i \, x_{t-i} + \varepsilon_t$$

where $\varepsilon_t$ is a white noise innovation. To determine the number of lags $p$ to include in the model, we selected the model with the lowest value of the Akaike Information Criterion (AIC), considering models with a minimum of 1 lag and a maximum of 24 (including all intermediate lags).
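The lag-selection rule can be illustrated with a minimal pure-Python sketch. It assumes an AR(p) with intercept estimated by ordinary least squares on a common sample (so that AIC values are comparable across p) and uses the Gaussian-likelihood form n·ln(SSR/n) + 2(p+1) of the AIC. The function names are hypothetical and this is not the authors' Gauss code.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS via the normal equations X'X b = X'y; returns (coefficients, SSR)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    ssr = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for row, yi in zip(X, y))
    return b, ssr


def select_ar_order(x, p_max=24):
    """Fit AR(p), p = 1..p_max, on a common sample; return (p, coefs) with the lowest AIC."""
    y = x[p_max:]  # same dependent observations for every candidate p
    n = len(y)
    best = None
    for p in range(1, p_max + 1):
        X = [[1.0] + [x[t - i] for i in range(1, p + 1)] for t in range(p_max, len(x))]
        b, ssr = ols(X, y)
        aic = n * math.log(ssr / n) + 2 * (p + 1)
        if best is None or aic < best[0]:
            best = (aic, p, b)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(42)
    x = [0.0]
    for _ in range(300):  # simulated AR(1) with coefficient 0.6
        x.append(0.6 * x[-1] + rng.gauss(0, 1))
    p, coefs = select_ar_order(x, p_max=6)
    print("selected p:", p, "coefficients:", [round(c, 3) for c in coefs])
```

Fixing the dependent sample at observations p_max onward is a design choice: without it, larger p would be evaluated on fewer observations and the AIC values would not be directly comparable.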

Autoregressive integrated moving average models (ARIMA)
The general expression of an ARIMA model (Box and Jenkins, 1970) is the following:

$$\phi_p(L)\,\Delta^d \Delta_s^D \, x_t^{(\lambda)} = \theta_q(L)\,\varepsilon_t$$

where $L$ is the lag operator, $\theta_q(L) = 1 - \theta_1 L - \dots - \theta_q L^q$ is a regular moving average polynomial, $\phi_p(L) = 1 - \phi_1 L - \dots - \phi_p L^p$ is a regular autoregressive polynomial, $\lambda$ is the value of the Box-Cox (1964) transformation, $\Delta_s^D$ is the seasonal difference operator, $\Delta^d$ is the regular difference operator, $s$ is the periodicity of the considered time series, and $\varepsilon_t$ is the innovation, which is assumed to behave as a white noise. For forecasting purposes, we considered models with up to 12 AR and MA terms, selecting the model with the lowest value of the AIC.
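The left-hand side of the ARIMA equation can be made concrete with a short sketch that applies the Box-Cox transformation and then the seasonal and regular differences, producing the series that the ARMA polynomials model. It assumes monthly data (s = 12) and strictly positive values; the function names are hypothetical, and the ARMA estimation and AIC search themselves are not shown.

```python
import math


def box_cox(x, lam):
    """Box-Cox (1964) transform x^(lambda); lam = 0 gives the natural log. Needs x > 0."""
    if any(v <= 0 for v in x):
        raise ValueError("the Box-Cox transform requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def difference(x, lag=1, times=1):
    """Apply the operator (1 - L^lag) `times` times; each pass drops `lag` observations."""
    for _ in range(times):
        x = [x[t] - x[t - lag] for t in range(lag, len(x))]
    return x


def arima_stationary_part(x, lam, d, D, s=12):
    """Return w_t = Delta^d Delta_s^D x_t^(lambda), the series modelled by the ARMA part."""
    w = box_cox(x, lam)
    w = difference(w, lag=s, times=D)  # seasonal differences
    w = difference(w, lag=1, times=d)  # regular differences
    return w


if __name__ == "__main__":
    # hypothetical monthly series with trend and multiplicative seasonality
    x = [(100 + t) * (1 + 0.2 * math.sin(2 * math.pi * t / 12)) for t in range(48)]
    w = arima_stationary_part(x, lam=0, d=0, D=1)
    print(len(w), [round(v, 4) for v in w[:3]])
```

With λ = 0, D = 1, d = 0 and s = 12, w_t is the annual log difference, which approximates the year-on-year growth rate in which the variables are expressed.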

Self-exciting threshold autoregressive models (SETAR)
A self-exciting threshold autoregressive (SETAR) model (Hansen, 1997) for the time series $x_t$ can be summarised as follows:

$$x_t = \begin{cases} \alpha + B(L)\,x_{t-1} + u_t & \text{if } x_{t-k} \le \gamma \\ \beta + \zeta(L)\,x_{t-1} + v_t & \text{if } x_{t-k} > \gamma \end{cases}$$

where $\alpha$ and $\beta$ are regime-specific intercepts, $u_t$ and $v_t$ are white noises, $B(L)$ and $\zeta(L)$ are autoregressive polynomials, the value $k$ is known as the delay and the value $\gamma$ is known as the threshold. This two-regime self-exciting threshold autoregressive process is estimated, and a Monte Carlo procedure is used to generate multi-step forecasts. The delay is selected as the value between 1 and 12 that minimises the sum of squared errors. The threshold values are given by the variation of the analysed variable.
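The estimation and forecasting steps can be sketched in pure Python. As simplifying assumptions, each regime is an AR(1) with its own intercept (the model above allows general polynomials B(L) and ζ(L)), candidate thresholds are the observed values of x_{t-k} with 15% trimmed from each tail, and the Monte Carlo forecasts bootstrap each regime's residuals. All names are hypothetical.

```python
import random


def fit_line(pairs):
    """OLS of y on (1, x) for (x, y) pairs; returns (intercept, slope, residuals)."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return a, b, [p[1] - a - b * p[0] for p in pairs]


def fit_setar(x, k_max=12, trim=0.15):
    """Grid search over delay k (1..k_max) and threshold gamma minimising the total SSR."""
    ts = range(k_max, len(x))  # common sample for every delay
    best = None
    for k in range(1, k_max + 1):
        cands = sorted(x[t - k] for t in ts)
        lo, hi = int(trim * len(cands)), int((1 - trim) * len(cands))
        for gamma in cands[lo:hi]:  # trimmed so that both regimes have data
            low = [(x[t - 1], x[t]) for t in ts if x[t - k] <= gamma]
            high = [(x[t - 1], x[t]) for t in ts if x[t - k] > gamma]
            if len(low) < 3 or len(high) < 3:
                continue
            a1, b1, e1 = fit_line(low)
            a2, b2, e2 = fit_line(high)
            ssr = sum(e * e for e in e1) + sum(e * e for e in e2)
            if best is None or ssr < best["ssr"]:
                best = {"ssr": ssr, "k": k, "gamma": gamma,
                        "low": (a1, b1, e1), "high": (a2, b2, e2)}
    return best


def forecast_setar(x, model, h, n_sims=2000, seed=0):
    """Monte Carlo h-step forecasts: simulate paths, bootstrapping each regime's residuals."""
    rng = random.Random(seed)
    k, gamma = model["k"], model["gamma"]
    totals = [0.0] * h
    for _ in range(n_sims):
        path = list(x[-k:])  # path[-k] is x_{t+1-k} when forecasting x_{t+1}
        for step in range(h):
            a, b, resid = model["low"] if path[-k] <= gamma else model["high"]
            path.append(a + b * path[-1] + rng.choice(resid))
            totals[step] += path[-1]
    return [tot / n_sims for tot in totals]


if __name__ == "__main__":
    rng = random.Random(1)
    x = [0.0]
    for _ in range(200):  # simulated two-regime process with delay 1 and threshold 0
        prev = x[-1]
        x.append((0.5 + 0.6 * prev if prev <= 0 else -0.5 + 0.3 * prev) + rng.gauss(0, 0.5))
    m = fit_setar(x, k_max=3)
    print("delay:", m["k"], "threshold:", round(m["gamma"], 3))
    print("forecasts:", [round(f, 3) for f in forecast_setar(x, m, h=3)])
```

Simulation is needed because, beyond one step ahead, the regime that applies depends on future values that are themselves uncertain, so the conditional mean cannot be obtained by simply iterating the fitted equations.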

Markov switching regime models (MKTAR)
As an alternative to SETAR models, time series regime-switching models assume that the distribution of the variable is known conditional on a particular regime or state occurring. Hamilton (1989) presented the Markov regime-switching model, in which the unobserved regime evolves over time as a first-order Markov process. In this analysis, we use a Markov-switching threshold autoregressive (MKTAR) model allowing for regime-dependent intercepts, autoregressive parameters and variances. Once the probabilities of expansion and recession have been estimated using the Hamilton filter together with the smoothing filter of Kim (1994), we construct the following model for the time series $x_t$ using the estimated probabilities of changing regime:

$$x_t = \begin{cases} \alpha + B(L)\,x_{t-1} + u_t & \text{if } p_{t-k} \le P \\ \beta + \zeta(L)\,x_{t-1} + v_t & \text{if } p_{t-k} > P \end{cases}$$

where $p_t$ is the estimated regime probability, $\alpha$ and $\beta$ are regime-specific intercepts, $u_t$ and $v_t$ are white noises, $B(L)$ and $\zeta(L)$ are autoregressive polynomials, $k$ is the delay, chosen as the value between 1 and 12 that minimises the sum of squared errors, and the value $P$, known as the threshold, is given by the variation of the probability.
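The regime probabilities that drive the MKTAR threshold rule can be sketched with the Hamilton filter followed by the Kim (1994) backward smoother. This is a simplified illustration, not the authors' estimation code: it assumes a two-state model with switching mean and variance only (no autoregressive terms), takes the parameters as already estimated (in practice they are obtained by maximum likelihood), and the helper `regime_indicator` is a hypothetical name for the switch on p_{t-k}.

```python
import math


def normal_pdf(y, mu, sigma):
    """Gaussian density of y given mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def hamilton_kim(y, mu, sigma, trans):
    """Filtered and smoothed probabilities of a two-state Markov-switching mean model.

    mu, sigma: regime means and standard deviations (assumed already estimated);
    trans[i][j] = Pr(s_t = j | s_{t-1} = i).
    """
    p01, p10 = trans[0][1], trans[1][0]
    prev = [p10 / (p01 + p10), p01 / (p01 + p10)]  # ergodic probabilities as start
    pred, filt = [], []
    for yt in y:
        # prediction step: Pr(s_t = j | data up to t-1)
        pr = [sum(prev[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # update step (Hamilton filter): weight by the regime-conditional densities
        joint = [pr[j] * normal_pdf(yt, mu[j], sigma[j]) for j in range(2)]
        total = sum(joint)
        prev = [v / total for v in joint]
        pred.append(pr)
        filt.append(prev)
    # Kim (1994) smoother: backward recursion that uses the full sample
    smooth = [None] * len(y)
    smooth[-1] = filt[-1]
    for t in range(len(y) - 2, -1, -1):
        smooth[t] = [filt[t][i] * sum(trans[i][j] * smooth[t + 1][j] / pred[t + 1][j]
                                      for j in range(2))
                     for i in range(2)]
    return filt, smooth


def regime_indicator(prob, k, threshold):
    """1 if prob_{t-k} > threshold (upper regime), 0 otherwise; None for the first k periods."""
    return [None if t < k else int(prob[t - k] > threshold) for t in range(len(prob))]


if __name__ == "__main__":
    growth = [2.0, 2.5, 1.8, 2.2, 2.4, -1.0, -1.5, -0.8, -1.2, 2.1, 2.6, 2.3]  # hypothetical
    filt, smooth = hamilton_kim(growth, mu=[-1.0, 2.0], sigma=[0.7, 0.7],
                                trans=[[0.8, 0.2], [0.1, 0.9]])
    p_exp = [s[1] for s in smooth]  # smoothed probability of the higher-mean regime
    print([round(p, 3) for p in p_exp])
    print(regime_indicator(p_exp, k=1, threshold=0.5))
```

The filtered probabilities use only data up to each date, whereas the smoothed ones use the whole sample; the regime indicator then plays the role of the threshold condition on p_{t-k}.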

Models where consumer surveys information is incorporated
One way to use the qualitative survey information on the direction of change to improve forecasts of quantitative variables consists in introducing selected indicators as explanatory variables in autoregressions. Several recent works have estimated autoregressive models for a target variable, adding current and lagged values of a consumer confidence index in order to test its significance and assess the extent of its effects (Claveria, Pons and Ramos, 2007; Easaw and Heravi, 2004; Vuchelen, 2004). We have followed the same approach by incorporating the consumer confidence indicator (CCI) into autoregressive (AR) models. The remaining benchmark models were not extended in this way because of the available data set.
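An autoregression augmented with current and lagged CCI values can be sketched as an OLS regression with extra columns. This is an illustrative sketch under stated assumptions: the lag orders p and q and the function names are hypothetical, and the one-step forecast requires the CCI value for the forecast month, which mirrors the assumption that survey information is known in advance.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_ar_cci(x, cci, p=1, q=0):
    """Regress x_t on a constant, x_{t-1},...,x_{t-p} and cci_t,...,cci_{t-q}."""
    start = max(p, q)
    X = [[1.0] + [x[t - i] for i in range(1, p + 1)] + [cci[t - j] for j in range(q + 1)]
         for t in range(start, len(x))]
    return ols(X, x[start:])


def forecast_ar_cci(x, cci, coefs, p=1, q=0):
    """One-step forecast of the next x; cci must already contain that month's survey value."""
    T = len(x)  # 0-based index of the month being forecast
    row = [1.0] + [x[T - i] for i in range(1, p + 1)] + [cci[T - j] for j in range(q + 1)]
    return sum(c * r for c, r in zip(coefs, row))


if __name__ == "__main__":
    rng = random.Random(7)
    cci = [rng.gauss(0, 1) for _ in range(201)]  # hypothetical CCI, one value ahead of x
    x = [0.0]
    for t in range(1, 200):
        x.append(0.5 * x[-1] + 0.8 * cci[t] + rng.gauss(0, 0.3))
    coefs = fit_ar_cci(x, cci, p=1, q=1)
    print("coefficients:", [round(c, 3) for c in coefs])
    print("next-month forecast:", round(forecast_ar_cci(x, cci, coefs, p=1, q=1), 3))
```

Comparing the out-of-sample errors of this model with those of the plain AR model is what indicates whether the survey indicator adds forecasting information.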
The consumer confidence indicator (CCI) was designed by the European Commission to summarise the results of the consumer surveys. It is obtained as the arithmetic mean of the (seasonally adjusted) balances of the answers to four questions, with the unemployment balance entering with an inverted sign:

$$CCI = \frac{Q_1 + Q_2 - Q_3 + Q_4}{4}$$

where $Q_1$ refers to the financial situation over the next 12 months, $Q_2$ to the general economic situation over the next 12 months, $Q_3$ to unemployment expectations over the next 12 months and $Q_4$ to savings over the next 12 months.
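The arithmetic of the indicator can be shown in a few lines. As a simplification, each balance here is the share of positive answers minus the share of negative answers (the Commission's own balances use a more detailed weighting of answer categories), seasonal adjustment is omitted, and the survey shares are hypothetical.

```python
def balance(pct_positive, pct_negative):
    """Simplified balance: share of positive answers minus share of negative answers."""
    return pct_positive - pct_negative


def consumer_confidence(q1, q2, q3, q4):
    """CCI = (Q1 + Q2 - Q3 + Q4) / 4; the unemployment balance Q3 enters with inverted sign."""
    return (q1 + q2 - q3 + q4) / 4.0


if __name__ == "__main__":
    # hypothetical shares (%) of (positive, negative) answers for one month
    q1 = balance(30.0, 20.0)  # financial situation, next 12 months
    q2 = balance(25.0, 40.0)  # general economic situation, next 12 months
    q3 = balance(45.0, 15.0)  # unemployment expectations, next 12 months
    q4 = balance(20.0, 50.0)  # savings, next 12 months
    print(consumer_confidence(q1, q2, q3, q4))  # -16.25
```

The sign inversion reflects the fact that a rise in expected unemployment signals lower, not higher, confidence.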

RESULTS
The tourism data used in this paper were obtained from Turisme de Catalunya and the Statistical Institute of Catalonia (IDESCAT), as well as Frontur data from the Institute of Tourism Studies (IET), while survey data were obtained from the European Commission. A descriptive analysis of this data set can be found in Claveria and Datzira (2009).
To evaluate relative forecasting accuracy, all models were first estimated from January 2002 to June 2007, and forecasts 1, 2, 3, 6 and 12 months ahead were computed. The model specifications are based on information up to that date; the models are then re-estimated each month and new forecasts are computed. Since actual values are available until June 2008, forecast errors can be computed recursively (e.g., for the 1-month forecast horizon, 12 forecast errors can be computed). All calculations were performed with Gauss for Windows 6.0.
To summarise this information, the Root Mean Squared Error (RMSE) has been computed so methods can be ranked according to their values. It is worth mentioning that in all cases we have assumed that the information of business and consumer surveys is known in advance, which is not a strong assumption for shorter forecasting horizons but it could be for longer ones.
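The recursive evaluation scheme and the RMSE, RMSE_h = sqrt((1/n_h) Σ e²), can be sketched as follows. A simple AR(1) with intercept stands in for any of the benchmark models (an assumption made for brevity); its specification stays fixed while it is re-estimated at each forecast origin, and multi-step forecasts are obtained by iterating the one-step rule. With 66 estimation months (January 2002 to June 2007) and 12 evaluation months, horizon h yields 12 − h + 1 errors. Function names are hypothetical.

```python
import math


def fit_ar1(x):
    """OLS of x_t on (1, x_{t-1}); returns (intercept, slope)."""
    xs, ys = x[:-1], x[1:]
    n = len(ys)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def recursive_rmse(x, n_train, horizons=(1, 2, 3, 6, 12)):
    """Re-estimate at every forecast origin from n_train on; return {h: (rmse, n_errors)}."""
    errors = {h: [] for h in horizons}
    for origin in range(n_train, len(x)):
        a, b = fit_ar1(x[:origin])  # only information available at the origin
        f = x[origin - 1]
        for h in range(1, max(horizons) + 1):
            f = a + b * f  # iterate the one-step rule to reach horizon h
            target = origin + h - 1  # index of the value being forecast
            if h in errors and target < len(x):
                errors[h].append(x[target] - f)
    return {h: (math.sqrt(sum(e * e for e in es) / len(es)), len(es))
            for h, es in errors.items() if es}


if __name__ == "__main__":
    # hypothetical series: 66 estimation months plus 12 evaluation months
    x = [5 * math.sin(0.4 * t) + 0.05 * t for t in range(78)]
    for h, (rmse, n) in recursive_rmse(x, n_train=66).items():
        print(f"h={h:2d}  errors={n:2d}  RMSE={rmse:.3f}")
```

Because longer horizons leave fewer evaluation points (a single error at 12 months), RMSE comparisons at those horizons rest on very little information.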

ANNEX. ESTIMATION RESULTS
The estimation results of the best models for forecasting tourism demand in Catalonia one month ahead, which correspond to Germany for both arrivals and overnight stays, are reported in Tables A.1 to A.12.