A new forecasting approach for the hospitality industry

Purpose – This study aims to apply a new forecasting approach to improve predictions in the hospitality industry. To do so, we develop a multivariate setting that allows the cross-correlations in the evolution of tourist arrivals from visitor markets to a specific destination to be incorporated in neural network models. Design/methodology/approach – This multiple-input multiple-output approach allows predictions to be generated for all visitor markets simultaneously. Official data of tourist arrivals to Catalonia (Spain) from 2001 to 2012 are used to generate forecasts for 1, 3 and 6 months ahead with three different networks. Findings – The study reveals that multivariate architectures that take into account the connections between different markets may improve the predictive performance of neural networks. Additionally, we develop a new forecasting accuracy measure and find that radial basis function networks outperform the rest of the models. Research limitations/implications – This research contributes to the hospitality literature by developing an innovative framework to improve the forecasting performance of artificial intelligence techniques and by providing a new forecasting accuracy measure. Practical implications – The proposed forecasting approach may prove very useful for planning purposes, helping managers to anticipate the evolution of variables related to the daily activity of the industry. Originality/value – A multivariate neural network framework is developed to improve forecasting accuracy, providing professionals with an innovative and practical forecasting approach.


I. Introduction
International tourism is becoming one of the most important activities worldwide. As a result, an increasing amount of financial resources is flowing to the hospitality industry. In order to allocate these growing investments efficiently, both the public and private sectors need accurate forecasts of tourism demand, especially at the destination level (Reynolds et al., 2013). Accurate estimates of the demand for tourism enable decision makers and managers in the hospitality industries to improve strategic planning.
New forecasting methodologies play an important role in this context, as they often make it possible to improve the predictive accuracy of estimates. As a result, a growing body of literature has focused on tourism demand forecasting. Most research efforts apply econometric models such as static almost-ideal demand system (AIDS) models (Han et al., 2006), error correction (EC) and co-integration (CI) models (Veloce, 2004; Ouerfelli, 2008; Lee, 2011), vector autoregressive (VAR) models (Song and Witt, 2006) and time varying parameter (TVP) models (Song et al., 2011). Choice models (Talluri and van Ryzin, 2004) have also been increasingly used in revenue management.
Time series models such as exponential smoothing (Athanasopoulos et al., 2012) and autoregressive integrated moving average (ARIMA) models (Claveria and Datzira, 2010; Assaf et al., 2011; Gounopoulos et al., 2012) have been widely used in the literature. While there is no consensus on the most appropriate approach to forecast tourism demand (Kim and Schwartz, 2013), there seems to be unanimity on the importance of applying new approaches to tourism demand forecasting (Song and Li, 2008), and on the fact that nonlinear methods outperform linear methods in modelling economic behaviour (Cang, 2013).
Nevertheless, nonlinear models are still limited in that an explicit relationship for the data series has to be assumed with little knowledge of the underlying data generating process. Since there are too many possible nonlinear patterns, the specification of a nonlinear model for a particular data set becomes a difficult task.
The suitability of artificial intelligence (AI) techniques to handle nonlinear behaviour and the need for more accurate forecasts explain why these techniques have become an essential tool for economic and tourism forecasting. AI methods can be divided into five categories: grey theory, fuzzy time series, rough sets approach, support vector machines (SVMs) and artificial neural networks (ANNs). Yu and Schwartz (2006) use grey theory and fuzzy time series models in predicting annual US tourist arrivals. Goh et al. (2008) apply a rough sets algorithm to forecast US and UK tourism demand for Hong Kong. SVMs were first applied to tourism demand forecasting by Pai et al. (2006) in order to obtain predictions of visitors to Barbados. Chen and Wang (2007) find empirical evidence that SVMs outperform ARIMA models in predicting quarterly tourist arrivals to China.
ANNs are a flexible tool for modelling and forecasting. The introduction of the backpropagation algorithm fostered the use of ANNs for forecasting purposes, such as predicting business failure (Coleman et al., 1991), personnel inventory (Huntley, 1991) and international airline passenger traffic (Nam and Schaefer, 1995). Some of the first applications of ANNs for tourism demand forecasting are those of Pattie and Snyder (1996) and Law (1998). Many authors have found evidence that ANNs outperform time series models for tourism demand forecasting (Law, 2000; Cho, 2003; Claveria and Torra, 2014). Zhang et al. (1998) review the literature comparing ANNs with time series models.
Additionally, the fact that tourism data are characterised by strong seasonal patterns and volatility makes it a particularly interesting field in which to apply new forecasting techniques. ANNs can be regarded as a multivariate nonlinear nonparametric statistical method. As data characteristics are associated with forecast accuracy (Peng et al., 2014), nonlinear data-driven approaches such as ANNs represent a flexible tool for forecasting, allowing for nonlinear modelling without a priori knowledge about the relationships between input and output variables.
ANNs can be classified into two major types of architectures depending on the connecting patterns of the different layers: feed-forward networks and recurrent networks. In feed-forward networks the information runs only in one direction, while in recurrent networks there are bidirectional data flows. The feed-forward topology was the first to be developed. The most widely used feed-forward model in tourism demand forecasting is the multi-layer perceptron (MLP) network (Uysal and El Roubi, 1999; Law, 2000, 2001; Law and Au, 1999; Burger et al., 2001; Tsaur et al., 2002; Kon and Turner, 2005; Palmer et al., 2006; Padhi and Aggarwal, 2011).
A special class of multi-layer feed-forward architecture with two layers of processing is the radial basis function (RBF) network. RBF ANNs were devised by Broomhead and Lowe (1988). The first attempt to use RBF ANNs in tourism demand forecasting is that of Cang (2013), who generates RBF, MLP and SVM forecasts of UK inbound tourist arrivals and combines them in non-linear models.
Another paper in which RBF ANNs are implemented is that of Cuhadar et al. (2014), who evaluate the forecasting accuracy of RBF networks to predict cruise tourist demand to Izmir (Turkey).
The fact that recurrent networks allow for temporal feedback connections from outer layers to lower layers of neurons makes them especially suitable for time series modelling. There are many recurrent architectures. A special case of recurrent network is the Elman network (Elman, 1990). This ANN model has not been used in tourism demand forecasting except by Cho (2003), who applies the Elman architecture to predict the number of arrivals from different countries to Hong Kong.
This study seeks to break new ground by evaluating the forecasting performance of three different ANN models (MLP, RBF and Elman) in a multivariate setting based on multiple-input multiple-output (MIMO) structures.
By incorporating cross-correlations between foreign visitor markets to a specific destination we can simultaneously obtain forecasts for all countries, without having to estimate a separate model for each market. The proposed forecasting approach is designed to improve the forecasting accuracy of ANNs and may prove very useful for effective policy planning. The main aim of the study is to provide investors and managers with a useful procedure to anticipate the evolution of demand for all different markets simultaneously.
Multivariate approaches to tourist demand forecasting are also few and have yielded mixed results. While Athanasopoulos and Silva (2012) find that exponential smoothing methods in a multivariate setting improve the forecasting accuracy of univariate alternatives, du Preez and Witt (2003) obtain evidence that multivariate time series models do not generate more accurate forecasts than univariate time series models.
When comparing the performance of different ANN models we are evaluating the impact of alternative ways of processing data on forecast accuracy. ANNs learn from experience. In non-supervised learning networks, the underlying structure of data patterns is explored so as to organize such patterns according to their correlations. While MLP networks are supervised learning models, RBF networks combine both supervised and non-supervised learning, which is known as hybrid learning.
The present study focuses on international tourist arrivals to Catalonia, which is a region of Spain, the world's third most important destination after France and the US, with 60 million tourist arrivals in 2013. Capó et al. (2007) and Balaguer and Cantavella-Jordá (2002) show the fundamental role of tourism in Spanish long-run economic development. Catalonia received 15.5 million tourists in 2013, which accounted for 25% of tourism revenues in Spain. In the first four months of 2014, Catalonia experienced a 10.4% growth in tourist arrivals with respect to the same period of the previous year. It follows that tourism is one of the fastest growing sectors in Catalonia. These figures also show the importance that accurate forecasts of tourism volume have for policy makers and professionals in the hospitality and leisure industry.
We use official statistical data of tourist arrivals from all countries of origin to Catalonia over the period 2001 to 2012. By means of the Johansen test we find correlated accelerations between the different markets, which leads us to apply a multivariate approach to obtain forecasts of tourism demand for different forecast horizons (1, 3 and 6 months). To assess the effect of expanding the memory on forecast accuracy, we repeat the analysis assuming different topologies with respect to the number of lags used for concatenation. Finally, we compute several measures of forecast accuracy and the Diebold-Mariano (DM) test for significant differences between each two competing series.
The structure of the paper is as follows. Section 2 describes the different neural network architectures used in the analysis. Section 3 analyses the data set. Section 4 presents the methodology used in the study. In Section 5, results of the forecasting comparison are presented. Concluding remarks are given in Section 6. Finally, we discuss the limitations of the analysis and the lines for future research in Section 7.

Artificial Neural Network models
This is the first study that implements multiple-input multiple-output ANN architectures for tourism demand forecasting. This framework makes it possible to incorporate in ANNs the common trends in inbound international tourism demand from all visitor markets to a specific destination. Multivariate networks also make it possible to generate predictions for all countries simultaneously, instead of implementing a separate model for each visitor market. Additionally, we repeat the analysis using different memory values in order to test the effect on the forecasting accuracy of the number of lags included in the models.
In this study we focus on three ANN models (MLP, RBF and Elman). Each network represents a different way of handling information. A complete summary of ANN modelling issues can be found in Bishop (1995) and Haykin (1999).

Multi-layer perceptron neural network
MLP networks consist of multiple layers of computational units interconnected in a feed-forward way. MLP networks are supervised neural networks that use a simple perceptron model as a building block. The topology consists of layers of parallel perceptrons, with connections between layers that include optimal connections. The number of neurons in the hidden layer determines the MLP network's capacity to approximate a given function. In this work we used the MLP specification suggested by Bishop (1995), with a single hidden layer and an optimum number of neurons derived from a range between 5 and 25. In this specification, $y_t$ is the output vector of the MLP at time $t$; $g$ is the nonlinear function of the neurons in the hidden layer; $x_{t-i}$ is the input value at time $t-i$, where $i$ stands for the memory (the number of lags that are used to introduce the context of the current observation); $q$ is the number of neurons in the hidden layer; $\varphi_{ij}$ are the weights of neuron $j$ connecting the input with the hidden layer; and $\beta_j$ are the weights connecting the output of neuron $j$ at the hidden layer with the output neuron.
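The displayed MLP equation is not present in this copy of the text. A single-hidden-layer form that is consistent with the symbol definitions above can be sketched as follows; the bias terms $\beta_0$ and $\varphi_{0j}$ and the upper lag limit $p$ are assumptions added for completeness, not taken from the source:

$$
y_t = \beta_0 + \sum_{j=1}^{q} \beta_j \, g\!\left( \varphi_{0j} + \sum_{i=1}^{p} \varphi_{ij}\, x_{t-i} \right)
$$

Read from the inside out, each hidden neuron $j$ forms a weighted sum of the lagged inputs, passes it through $g$, and the output neuron combines the $q$ hidden activations linearly.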

Radial basis function neural network
RBF networks consist of a linear combination of radial basis functions, such as kernels centred at a set of centroids, with a given spread that controls the volume of the input space represented by a neuron (Bishop, 1995). RBF networks typically include three layers: an input layer, a hidden layer and an output layer.
The hidden layer consists of a set of neurons, each of them computing a symmetric radial function. The output layer also consists of a set of neurons, one for each given output, linearly combining the outputs of the hidden layer. The output of the network is a scalar function of the output vector of the hidden layer.
In the equations that describe the input/output relationship of the RBF, $y_t$ is the output vector of the RBF at time $t$; $\beta_j$ are the weights connecting the output of neuron $j$ at the hidden layer with the output neuron; $q$ is the number of neurons in the hidden layer; $g_j$ is the activation function, which usually has a Gaussian shape; $x_{t-i}$ is the input value at time $t-i$, where $i$ stands for the memory (the number of lags that are used to introduce the context of the current observation); $\mu_j$ is the centroid vector for neuron $j$; and the spread $\sigma_j$ is a scalar that can be defined as the area of influence of neuron $j$ in the space of the inputs. The spread $\sigma_j$ is a hyperparameter selected before determining the topology of the network, and it was determined by cross-validation on the training database.
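The RBF equations themselves are missing from this copy. Assuming the Gaussian activation mentioned above, a form consistent with the listed symbols would be the following sketch; the bias $\beta_0$ and the exact normalisation of the exponent are assumptions:

$$
y_t = \beta_0 + \sum_{j=1}^{q} \beta_j \, g_j(x_{t-i}), \qquad
g_j(x_{t-i}) = \exp\!\left( -\frac{\lVert x_{t-i} - \mu_j \rVert^{2}}{2\sigma_j^{2}} \right)
$$

Under this form, a larger spread $\sigma_j$ makes neuron $j$ respond to inputs further from its centroid $\mu_j$, which matches the description of the spread as an area of influence.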

Elman neural network
An Elman network is a special architecture of the class of recurrent neural networks, and it was first proposed by Elman (1990). The architecture is also based on a three-layer network but with the addition of a set of context units that allow feedback on the internal activation of the network. There are connections from the hidden layer to these context units fixed with a weight of one. At each time step, the input is propagated in a standard feed-forward fashion, and then a back-propagation type of learning rule is applied. The output of the network is a scalar function of the output vector of the hidden layer. In this specification, $y_t$ is the output vector of the Elman network at time $t$; $z_{j,t}$ is the output of the hidden layer neuron $j$ at the moment $t$; $g$ is the nonlinear function of the neurons in the hidden layer; $x_{t-i}$ is the input value at time $t-i$, where $i$ stands for the memory (the number of lags that are used to introduce the context of the current observation); $\varphi_{ij}$ are the weights of neuron $j$ connecting the input with the hidden layer; $q$ is the number of neurons in the hidden layer; $\beta_j$ are the weights of neuron $j$ that link the hidden layer with the output; and $\delta_{ij}$ are the weights that correspond to the output layer and connect the activation at moment $t$. Note that the output $y_t$ in our study is the estimate of the value of the time series at time $t+1$, while the input vector to the neural network will have a dimensionality of $p+1$.
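The Elman equations are also absent from this copy. One form that agrees with the symbols defined above is sketched below; the bias terms and the placement of the feedback weight $\delta_{ij}$ on the lagged hidden activation $z_{j,t-1}$ are assumptions about how the context units enter:

$$
y_t = \beta_0 + \sum_{j=1}^{q} \beta_j \, z_{j,t}, \qquad
z_{j,t} = g\!\left( \varphi_{0j} + \sum_{i=1}^{p} \varphi_{ij}\, x_{t-i} + \delta_{ij}\, z_{j,t-1} \right)
$$

The only difference from the feed-forward case is the term in $z_{j,t-1}$, which carries the previous hidden state forward through the context units.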

Data set
In this study we use the number of international tourist arrivals to Catalonia (first destination) over the period 2001:01 to 2012:07. Data are provided by the National Institute of Statistics (INE). Table 1 shows a descriptive analysis of tourist arrivals to Catalonia for the out-of-sample period (January 2009 to July 2012). Since data characteristics, including the origin of tourists, have an influence on the predictive ability of the models, we use tourist arrivals from all source markets. The first four visitor markets (France, the United Kingdom, Belgium and the Netherlands, and Germany) account for more than half of the total number of tourist arrivals to Catalonia. Russia is the country that presents the highest dispersion in tourist arrivals, while Italy shows the highest levels of skewness and kurtosis. Before estimating the models, we determine whether the underlying process which generated the series is stationary. If the series are found to have a unit root, differencing is necessary to obtain stationary series (Lim et al., 2009). In Table 2a we present the results of several unit root tests: the ADF test (Dickey and Fuller, 1979), the PP test (Phillips and Perron, 1988) and the KPSS test (Kwiatkowski et al., 1992). While the ADF and the PP statistics test the null hypothesis of a unit root in $y_t$, the KPSS statistic tests the null hypothesis of stationarity. In most countries we cannot reject the null hypothesis of a unit root at the 5% level.
Similar results are obtained for the KPSS test, where the null hypothesis of stationarity is rejected in most cases. These results imply that differencing is required in most cases and prove the importance of detrending tourism demand data (Zhang and Qi, 2005). In order to eliminate both linear trends and seasonality we used the first differences of the natural log of tourist arrivals. Given the common patterns displayed by most countries, we test for cointegration using Johansen's trace tests (Johansen, 1988, 1991). Trace statistics test the null hypothesis of $r$ cointegrating vectors against the alternative hypothesis of $n$ cointegrating vectors. In Table 2b we present the results of five different unrestricted cointegration rank tests. It can be seen that we can only reject the null hypothesis of nine cointegrating vectors with two of the tests. The fact that the evolution of tourist arrivals is multicointegrated has led us to apply a multivariate multiple-output neural network approach to obtain forecasts of tourism demand for all visitor markets.
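As an illustration of the transformation described above, the first differences of the natural log can be computed as in the following minimal sketch. The function name `log_diff` and the arrival figures are hypothetical and serve only to show the arithmetic.

```python
import math

def log_diff(arrivals):
    """Return the first differences of the natural log of a series."""
    logs = [math.log(value) for value in arrivals]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]

# Hypothetical monthly arrival counts, for illustration only.
example_arrivals = [100.0, 110.0, 121.0]
growth = log_diff(example_arrivals)
```

Each element of `growth` approximates the month-on-month growth rate, and the transformed series is one observation shorter than the original.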

Methodology
As the evolution of tourist arrivals from different visitor markets shows a common stochastic trend, we apply a multivariate neural network approach to obtain forecasts of tourism demand. To our knowledge this is the first study to evaluate the performance of different ANN models in a multiple-input multiple-output setting that uses the cross-correlations between the evolution of foreign visitors from different markets to a specific destination. This approach also makes it possible to generate predictions for all visitor markets simultaneously without having to replicate the analysis for each country.
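To make the multiple-input multiple-output setting concrete, the sketch below builds input and target vectors from a table with one row per month and one column per visitor market. The function name, the toy data and the handling of the horizon (a direct target at `t + horizon`) are assumptions made for illustration; the study does not spell out its data layout.

```python
def make_mimo_samples(series, memory, horizon):
    """Build MIMO training pairs from a list of monthly rows.

    series  : list of rows, one per month, one column per market
    memory  : number of months concatenated into each input (>= 1)
    horizon : number of months between the last input month and the target
    """
    inputs, targets = [], []
    for t in range(memory - 1, len(series) - horizon):
        window = series[t - memory + 1 : t + 1]
        # Concatenate the lagged rows, most recent month first.
        features = [value for row in reversed(window) for value in row]
        inputs.append(features)
        # The target holds all markets at once (multiple outputs).
        targets.append(list(series[t + horizon]))
    return inputs, targets

# Hypothetical data: five months, two markets.
toy_series = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
toy_inputs, toy_targets = make_mimo_samples(toy_series, memory=2, horizon=1)
```

With `memory=2`, each input holds two months for every market, so its length is the memory times the number of markets, while each target always has one value per market.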
We divide the collected data into three sets: training, validation and test. This division is done in order to assess the performance of the network on unseen data (Bishop, 1995; Ripley, 1996). The initial size of the training set is determined to cover a five-year span in order to accurately train the networks. Therefore, the first sixty monthly observations (from January 2001 to January 2006) are selected as the initial training set, the next thirty-six (from January 2007 to January 2009) as the validation set, and the last 20% as the test set.
We implement an iterative forecasting scheme: after each forecast, a retraining is done by increasing the size of the training set by one period and sliding the validation set by another period. This iterative process is repeated until the test set consists of the out-of-sample period. The selection criterion for the topology and the parameters is the performance on the validation set. To ensure optimal parameter estimation, we apply a multi-start technique that initializes the neural network three times for different initial random values, returning the best result. Using the performance on the validation set as a criterion, the results correspond to the selection of the best topology and, in the case of the RBF neural networks, the best spread.
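The retraining loop and the multi-start selection described above can be sketched as follows. This is only an outline under stated assumptions: `toy_fit` is a hypothetical stand-in for training a network from a random initialisation (it merely draws a random scaling factor), the series is univariate for brevity, and all names and figures are illustrative.

```python
import random

def toy_fit(train, seed):
    """Hypothetical placeholder for network training from a random start."""
    rng = random.Random(seed)
    factor = rng.uniform(0.8, 1.2)
    return lambda last_value: factor * last_value

def validation_mse(model, history, validation):
    """Mean squared one-step-ahead error over the validation block."""
    previous = history[-1]
    total = 0.0
    for actual in validation:
        total += (model(previous) - actual) ** 2
        previous = actual
    return total / len(validation)

def expanding_window_forecasts(series, initial_train, val_size, n_starts=3):
    """One-step-ahead forecasts with retraining after every forecast."""
    forecasts = []
    train_end = initial_train
    while train_end + val_size < len(series):
        train = series[:train_end]
        validation = series[train_end:train_end + val_size]
        # Multi-start: several random initialisations, keep the best one
        # according to its performance on the validation block.
        candidates = [toy_fit(train, train_end * n_starts + k)
                      for k in range(n_starts)]
        best = min(candidates,
                   key=lambda m: validation_mse(m, train, validation))
        # Forecast the first period after the validation block.
        forecasts.append(best(series[train_end + val_size - 1]))
        # Grow the training set by one period; the validation block slides.
        train_end += 1
    return forecasts

# Hypothetical series of twelve monthly values.
toy_values = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0,
              110.0, 112.0, 111.0, 115.0, 117.0, 116.0]
toy_forecasts = expanding_window_forecasts(toy_values, initial_train=5, val_size=3)
```

In a real application `toy_fit` would be replaced by the training routine of the chosen network, and the same validation criterion would also select the topology and, for RBF networks, the spread.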
Following the suggestions made by Koupriouchina et al. (2014), we evaluate our predictions at a number of forecasting horizons. Therefore, forecasts for 1, 3 and 6 months ahead are generated in a recursive way. To assess the forecast accuracy we make use of three different measures. First, we compute the Root Mean Squared Error (RMSE). An additional key aspect that should be addressed is whether the reduction in RMSE is statistically significant when comparing models.
With this in mind, we obtain the measure of predictive accuracy proposed by Diebold and Mariano (1995). The DM loss-differential test for predictive accuracy tests the null hypothesis that the difference between the two competing series is non-significant. We calculate the DM test using a Newey-West type estimator to construct the t-statistic from a simple regression of the loss function on a constant.
A negative sign of the t-statistic associated with the DM test implies that the absolute value of the forecast error associated with the prediction coming from the competing series is lower, while a positive sign implies the opposite.
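A minimal sketch of such a statistic is given below. It assumes an absolute-error loss, a Bartlett-kernel (Newey-West) estimate of the long-run variance, and a user-chosen truncation lag `max_lag`; none of these choices is specified in the text, and the function name and data are illustrative. Regressing the loss differential on a constant amounts to taking its mean, so the statistic is that mean divided by its estimated standard error.

```python
import math

def dm_statistic(errors_a, errors_b, max_lag):
    """DM-type t-statistic for the loss differential |e_A| - |e_B|."""
    diff = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    n = len(diff)
    mean_diff = sum(diff) / n
    centred = [value - mean_diff for value in diff]

    def autocov(lag):
        return sum(centred[t] * centred[t - lag] for t in range(lag, n)) / n

    # Newey-West long-run variance with Bartlett weights.
    long_run_var = autocov(0)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)
        long_run_var += 2.0 * weight * autocov(lag)

    return mean_diff / math.sqrt(long_run_var / n)

# Hypothetical forecast errors for two competing models.
errors_model_a = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1]
errors_model_b = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6]
dm_value = dm_statistic(errors_model_a, errors_model_b, max_lag=1)
```

With the differential defined as model A minus model B, a negative value indicates that model A has the lower absolute errors on average. The statistic is undefined when the loss differential is constant, since the variance estimate is then zero.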
Finally, in order to attain a more comprehensive forecasting performance evaluation, we propose a dimensionless measure based on the CJ statistic for testing market efficiency (Cowles and Jones, 1937). This accuracy measure allows us to compare the forecasting performance between two competing models. The statistic consists of a ratio that calculates the proportion of periods in which the model under evaluation obtains a lower absolute forecasting error than the benchmark model. In this study we use the no-change model as a benchmark. This new measure of forecast accuracy will be referred to as the percentage of periods with lower absolute error (PLAE).
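Let us denote $y_t$ as the actual value and $\hat{y}_t$ as the forecast at period $t$. Forecast errors can then be defined as $e_t = y_t - \hat{y}_t$. Given two competing models $A$ and $B$, where $A$ refers to the forecasting model under evaluation and $B$ stands for the benchmark model, we can then obtain the proposed statistic as follows:

$$
\mathrm{PLAE} = \frac{1}{n} \sum_{t=1}^{n} \lambda_t, \qquad
\lambda_t = \begin{cases} 1 & \text{if } |e_{t,A}| < |e_{t,B}| \\ 0 & \text{otherwise} \end{cases}
\qquad (4)
$$

where $n$ is the number of periods evaluated.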
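The measure can be computed directly from its verbal definition, as in the sketch below. The function names and the data are hypothetical; the no-change benchmark is implemented here as "the forecast for period t + h equals the value observed at t".

```python
def no_change_forecasts(series, horizon):
    """No-change benchmark: forecast for t + horizon is the value at t."""
    return series[:-horizon]

def plae(actual, forecast_model, forecast_benchmark):
    """Proportion of periods in which the model beats the benchmark."""
    wins = 0
    for y, f_model, f_bench in zip(actual, forecast_model, forecast_benchmark):
        if abs(y - f_model) < abs(y - f_bench):
            wins += 1
    return wins / len(actual)

# Hypothetical series and one-step-ahead model forecasts.
observed = [10.0, 12.0, 11.0, 15.0, 14.0]
actual_values = observed[1:]                       # periods being forecast
benchmark = no_change_forecasts(observed, horizon=1)
model_forecasts = [11.5, 13.0, 13.0, 14.5]
plae_value = plae(actual_values, model_forecasts, benchmark)
```

The function returns a proportion between 0 and 1; multiplying by 100 gives the percentage form. Values above 0.5 mean the model under evaluation beats the benchmark in more than half of the periods.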

Results
When analysing the forecast accuracy, MLP and RBF networks show lower RMSE values than Elman networks. Nevertheless, the no-change model outperforms both MLP and Elman networks. While RBF networks display the lowest RMSE values for longer horizons in most countries, the no-change model shows higher percentages of PLAE than the rest of the networks, especially for the shorter forecasting horizons. The fact that the no-change model generates more accurate one-period-ahead forecasts than other more sophisticated models confirms previous research by Witt et al. (1994). When the forecasts are obtained incorporating additional lags of the time series, that is to say when the memory is set to three periods, the forecasting performance of RBF networks significantly improves in Switzerland and Russia for 6 months ahead. Nevertheless, the highest forecasting errors are always found in Russia. This result may be caused by the high dispersion observed in Russian tourist arrivals, and it endorses the conclusions obtained by Peng et al. (2014) with reference to the fact that the country of origin affects the accuracy of the predictions. By contrast, the lowest RMSE value is obtained with the RBF network for total tourist arrivals, for 6 months ahead when the memory is zero, and for 3 months ahead when using a memory of three lags.
In Fig. 1 we compare the actual evolution of total tourist arrivals to the six-month-ahead forecasts obtained with the RBF network. Unlike the results obtained by Cho (2003) when comparing the forecasting performance of Elman ANNs to exponential smoothing and ARIMA models to forecast tourist arrivals to Hong Kong, we do not find evidence in favour of Elman networks, which suggests that divergence issues may arise when using dynamic networks for forecasting purposes. The only network that outperforms the benchmark model in most cases is the RBF. These results also show that hybrid models such as RBF, which combine supervised and non-supervised learning, are better suited to tourism demand forecasting than models using supervised learning alone. Our finding on the suitability of using RBF ANNs for tourism demand forecasting in the hospitality industry confirms previous research by Cuhadar et al. (2014).
Finally, we repeat the analysis assuming different topologies regarding the memory values, which refer to the number of past months included in the context of the input, ranging from one to four months. Therefore, when the memory is one, the forecast is made using only the current value of the time series, without any additional temporal context. In Table 4 we present the results of the DM test so as to assess whether the different memory values have a statistically significant effect on forecast accuracy. We find that in most cases, as the number of previous months used for concatenation increases, the forecasting performance of the different networks shows no significant improvement. This result suggests that the increase in the weight matrix is not compensated by the more complex specification, leading to overparametrization.

Conclusion
The increasing importance of the hospitality sector worldwide has led to a growing interest in new approaches to tourism demand forecasting. New methods provide more accurate estimations of anticipated tourist arrivals for effective managerial and policy planning. Artificial intelligence techniques are generating an increasing interest as a way to refine the predictions of tourist arrivals at the destination level. The main aim of this study is to develop a new forecasting framework to improve the forecasting performance of artificial neural networks.
We use three models that represent alternative ways of handling information: the multi-layer perceptron, the radial basis function and the Elman recurrent neural network.

Theoretical implications
The theoretical contribution of this study to the hospitality and tourism literature is twofold. On the one hand, we apply an innovative approach to improve the forecasting accuracy of artificial intelligence techniques. Based on multiple-input multiple-output structures, we design a framework that makes it possible to incorporate in neural networks the existing cross-correlations in tourist arrivals from different visitor markets to a specific destination. This new procedure makes it possible to estimate the evolution of demand from all different markets simultaneously.
On the other hand, we propose a dimensionless forecasting accuracy measure based on the statistic for testing market efficiency. This new measure makes it possible to compare the forecasting performance between two competing models by giving the percentage of periods in which the model under evaluation obtains a lower absolute forecasting error than the benchmark model.
By means of cointegration analysis we find that the evolution of arrivals from international visitor markets to Catalonia shares a common trend, which leads us to apply a multivariate approach. The out-of-sample forecasting comparison shows that radial basis function models outperform the rest of the networks in most cases.
The research highlights that hybrid models, which combine supervised and non-supervised learning, are better suited to tourism demand forecasting than models using supervised learning alone. Our results also suggest that scaling issues may arise when using dynamic or recurrent neural networks for forecasting purposes.
In order to evaluate the effect of the memory on the forecasting results, we repeat the analysis assuming different topologies regarding the number of lags used for concatenation and find no significant differences when additional lags are incorporated, especially in the case of multi-layer perceptron neural networks. This result can in part be explained by the cross-correlations accounted for in the multivariate approach.

Practical Implications
From a managerial perspective, the study provides a new and practical approach to predict the arrival of tourists, overnight stays or any other variable related to the daily activity of the hospitality and leisure industries. This new procedure allows practitioners to incorporate the common trends in customers from all markets in neural networks in order to anticipate the evolution of demand for all different markets simultaneously.
The study also provides managers with a new and easily interpretable measure to evaluate forecasts from different methods with the aim of attaining a more comprehensive forecasting performance evaluation.
The research reveals the suitability of hybrid models such as radial basis function networks for medium-term forecasts. The results of the analysis also highlight that the predictive performance can be improved by taking into account the connections between the different markets.
Demand is the most basic force driving the industry's development. Therefore, these findings aim to improve forecasting practices in the hospitality industry and provide new tools that may prove very helpful to anticipate future demand for planning purposes.

Limitations and future research
Nevertheless, this study is not without its limitations. First, a comparison between different tourist destinations would make it possible to analyze to what extent regional differences have a significant influence on forecasting accuracy. Another question to be considered in further research is whether a combination of the forecasts from different topologies may improve the accuracy of predictions.
Finally, there is the question of whether the implementation of alternative artificial intelligence techniques such as support vector regressions could improve the forecasting performance of neural networks.

Notes:
The black line represents the evolution of total tourist arrivals to Catalonia. The dotted line represents the 6-month-ahead forecasts of total tourist arrivals to Catalonia obtained with the RBF model.
first sixty monthly observations (from January 2001 to January 2006) are selected as the initial training set, the next thirty-six (from January 2007 to January 2009)

Table 1. Descriptive analysis of tourist arrivals (levels)

Table 2. Unit Root Tests and Unrestricted Cointegration Rank Tests

Table 4. Diebold-Mariano loss-differential test statistic for predictive accuracy
