Empirical modelling of survey-based expectations for the design of economic indicators in five European regions

In this study we use agents’ expectations about the state of the economy to generate indicators of economic activity in twenty-six European countries grouped into five regions (Western, Eastern, and Southern Europe, and Baltic and Scandinavian countries). We apply a data-driven procedure based on evolutionary computation to transform survey variables into economic growth rates. In a first step, we design five independent experiments to derive a formula using survey variables that best replicates the evolution of economic growth in each region by means of genetic programming, limiting the integration schemes to the main mathematical operations. We then rank survey variables according to their performance in tracking economic activity, finding that agents’ “perception about the overall economy compared to last year” is the survey variable with the highest predictive power. In a second step, we assess the out-of-sample forecast accuracy of the evolved indicators. Although we obtain different results across regions, Austria, Slovakia, Portugal, Lithuania and Sweden are the economies of each region that show the best forecast results. We also find evidence that the forecasting performance of the survey-based indicators improves during periods of higher growth.

These findings have led us to use evolutionary computation to generate indicators of economic growth that combine different survey variables of 26 European countries grouped into five major European regions (Western, Eastern, and Southern Europe, and Baltic and Scandinavian countries). First, we design five independent experiments that link survey expectations to economic growth, limiting the preliminary functions to the main mathematical operations with the aim of facilitating the implementation of the evolved economic indicators. Once we obtain the optimal combination of survey variables that best replicates the evolution of economic activity in each region, we rank the expectations according to the relative weight of each one in the evolved indicators. In a second step, we assess the out-of-sample forecast accuracy of the obtained economic indicators country by country.
Some of the features of empirical modelling are particularly well suited to the problem at hand. First, empirical modelling is especially suitable for finding patterns in large data sets with little or no prior information about the system. Second, empirical modelling allows us to simultaneously evolve both the structure and the parameters of the model without imposing any assumptions regarding agents' expectations. In a recent study, Lahiri and Zhao (2015) found significant improvements in the forecasting performance of quantified expectations when relaxing the assumptions of quantification methods of qualitative survey data.
The empirical modelling approach applied in this research is based on symbolic regression (SR) via genetic programming (GP). While SR is a modelling approach characterised by the search of the space of mathematical expressions that best fit a given dataset, GP is a soft computing search technique for problem-solving (Cramer 1985). GP is based on the implementation of genetic algorithms (GAs), which are a specific type of evolutionary algorithm (EA). Evolutionary computation can be regarded as a subfield of artificial intelligence, and is being increasingly applied to automated problem solving in economics.
The main aim of this study is twofold. On the one hand, we implement GP to find the optimal combinations of survey expectations to forecast economic growth at a regional level, restricting the integration schemes to the four main mathematical operations so as to obtain easily replicable expressions. This allows us to rank survey variables according to their predictive capacity. On the other hand, we assess the forecasting performance of the evolved economic indicators in each country and compare it to a benchmark model.
The structure of the paper is as follows. The next section reviews the existing literature. In Sect. 3 we present the methodological approach, describing the data and the experimental set-up. Empirical results are provided in Sect. 4. Finally, conclusions are drawn in Sect. 5.

[...] average percentage change of an aggregate variable on the percentage of respondents expecting a variable to rise and to fall. Carlson and Parkin (1975) developed the theoretical framework for quantifying survey expectations by assuming that respondents report a variable to go up if the mean of their subjective probability distribution lies above a threshold level, also known as the indifference interval (Theil 1952).
This relationship has also been explored by matching individual responses with firm-by-firm realisations, both empirically (Lahiri and Zhao 2015; Lui et al. 2011a, b; Mitchell et al. 2002, 2005a; Mokinski et al. 2015; Müller 2010), and experimentally via Monte Carlo simulations (Claveria et al. 2006; Nardo 2003). Common (1985) used experimental expectations to test the rational expectations hypothesis. Muth (1961) assumed that rationality implied that expectations had to be generated by the same stochastic process that generates the variable to be predicted. While Common (1985) rejected the presence of rational agents, Miah et al. (2016) have recently found survey expectations in 18 emerging economies to be mostly unbiased and efficient. Simulation experiments have also been used to assess the forecasting performance of different quantification methods of survey expectations (Claveria 2010; Löffler 1999; Nardo and Cabeza-Gutés 1999; Terai 2009).
In this study we fill this gap by linking survey data and economic growth in a SR setting solved by means of evolutionary computation. This approach is based on the implementation of GAs, which adopt Darwinian principles of the theory of natural selection in the context of expensive optimisation (Fogel et al. 1966). GAs are the most common type of EA, and were initially proposed by Holland (1975). GP allows the model structure to vary during the evolution, which makes it particularly indicated for non-linear and empirical modelling. See Banzhaf et al. (2008), Dabhi and Chaudhary (2015) and Poli et al. (2010) for a review of the state of the art in GP.
Most economic applications of evolutionary computing are in finance (Chen and Kuo 2002; Fogel 2006; Goldberg 1989). GAs have been used to predict the financial failure of firms (Acosta-González and Fernández 2014), to explain the 2008 financial crisis (Acosta-González et al. 2012), to model exchange rates (Lawrenz and Westerhoff 2003), to evaluate the convergence to the rational expectations equilibrium (Maschek 2010), to optimize the signals generated by technical trading tools (Thinyane and Millin 2011), to forecast stock price trends in Taiwan (Wei 2013), etc. See Drake and Marks (2002) for a review of the applications of GAs in financial forecasting.
Regarding GP, Vasilakis et al. (2013) proposed a GP-based technique to predict returns in the trading of the euro/dollar exchange rate. GP has also been applied to model short-term capital flows (Yu et al. 2004), to forecast exchange rates (Álvarez-Díaz and Álvarez 2005), and for stock price forecasting (Chen et al. 2008; Kaboudan 2000; Larkin and Ryan 2008; Wilson and Banzhaf 2009). Wilson and Banzhaf (2009) compared a developmental co-evolutionary GP approach to standard linear GP for interday stock price prediction. Alexandridis et al. (2017) have recently compared the forecasting performance of GP in the context of weather derivatives pricing with other state-of-the-art machine learning algorithms and classic linear approaches, finding that non-linear methods outperformed the alternative linear models significantly.
Up until now there have been very few applications of GP in macroeconomics. The first GP application is that of Koza (1992), who used GP to solve a SR problem designed to reassess the exchange equation, relating the price level, gross national product, money supply, and the velocity of money. More recent macroeconomic applications of GP have been used for forecasting purposes (Chen et al. 2010; Duda and Szydło 2011). Ferreira (2001) developed a version of GP known as gene expression programming (GEP). Recently, Peng et al. (2014) proposed an improved GEP algorithm especially suitable for dealing with SR problems. Gandomi and Roke (2015) compared the forecasting performance of artificial neural network models to that of GEP techniques.
SR is an empirical modelling technique used to construct regression models. Given a predetermined set of operations and functions, SR searches appropriate models from the space of all possible mathematical expressions that best fit the data. Zelinka et al. (2005) introduced analytical programming in order to synthesise suitable solutions in SR.
Given its versatility, SR has been increasingly used in different areas (Barmpalexis et al. 2011; Ceperic et al. 2014; Sarradj and Geyer 2014; Wu et al. 2008; Yang et al. 2015; Yao and Lin 2009; Zameer et al. 2017), but there have been very few SR applications in macroeconomics. Claveria et al. (2016) implemented SR via GP to derive a set of building blocks used to estimate economic activity. Klúcik (2012) used SR to estimate total exports and imports to Slovakia. Kotanchek et al. (2010) implemented SR via GP to predict economic activity. By means of SR, Kronberger et al. (2011) identified interactions between economic indicators in order to estimate the evolution of prices in the US. The authors suggested using SR for the exploration of variable interplay when approaching complex modelling tasks, as it provides a quick overview of the most relevant interactions and can help to identify new unknown links between variables.
In this study we design five independent SR experiments and apply GP in order to find the optimal combinations of survey expectations that best fit the actual evolution of economic activity in each region. We also assess the forecast accuracy of the evolved economic indicators and compare it with several benchmark models.

Data and methodology
In this study we use SR via GP to formalize the optimal interactions between survey variables that best predict economic growth, restricting them to the main mathematical operations (addition, subtraction, multiplication, and division). In order to do so, we need to combine two types of information: qualitative survey expectations and quantitative official statistics from 2000:Q2 to 2016:Q3. Regarding the former, we make use of survey data on expectations from the World Economic Survey (WES) carried out quarterly by the Ifo Institute for Economic Research. As a proxy of economic activity we use the year-on-year growth rates of the Gross Domestic Product (GDP) retrieved from the Organisation for Economic Cooperation and Development (OECD) (https://data.oecd.org/gdp/quarterly-gdp.htm#indicator-chart).
The analysis is carried out for 26 European economies grouped in five regions based on the criteria used for statistical processing purposes by the United Nations Statistics Division. As a result, Austria, Belgium, France, Germany, Ireland, the Netherlands and the United Kingdom (UK) are grouped as Western Europe (1); Bulgaria, the Czech Republic, Hungary, Poland, Romania and the Slovak Republic as Eastern Europe (2); Croatia, Greece, Italy, Portugal, Slovenia and Spain as Southern Europe (3); Estonia, Latvia and Lithuania as the Baltic countries (4); Denmark, Finland, Norway and Sweden as the Scandinavian countries (5).
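The grouping above can be written down as a simple lookup table, which makes it easy to run one experiment per region. This is only an illustrative sketch: the names `REGIONS` and `COUNTRY_TO_REGION` are hypothetical and the country spellings are our own choice, not identifiers from the study.

```python
# Regional grouping as listed in the text (hypothetical variable names).
REGIONS = {
    "Western Europe": ["Austria", "Belgium", "France", "Germany",
                       "Ireland", "Netherlands", "United Kingdom"],
    "Eastern Europe": ["Bulgaria", "Czech Republic", "Hungary",
                       "Poland", "Romania", "Slovak Republic"],
    "Southern Europe": ["Croatia", "Greece", "Italy",
                        "Portugal", "Slovenia", "Spain"],
    "Baltic countries": ["Estonia", "Latvia", "Lithuania"],
    "Scandinavian countries": ["Denmark", "Finland", "Norway", "Sweden"],
}

# Reverse lookup: country name -> region name.
COUNTRY_TO_REGION = {
    country: region
    for region, countries in REGIONS.items()
    for country in countries
}

if __name__ == "__main__":
    for region, countries in REGIONS.items():
        print(region, len(countries))
```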
In Table 1 we present the twelve survey variables used in the study, denoted as $X_{it}$, where $i$ refers to each country and $t$ to the time period. Survey variables can be divided into judgements, perceptions and expectations, depending on whether they refer to the expected value in the present, in the present compared to last year, or for the next six months. See Garnitz et al. (2015) for an appraisal of the WES data.

By means of GP we evolve a symbolic expression for each region combining the different survey variables for each country until a stopping criterion is reached. This criterion can either be a predetermined value of the fitness function or a given number of generations. As there is a trade-off between accuracy and simplicity, we have chosen a maximum number of 50 generations as the stopping criterion. In Table 2 we summarize the steps for implementing the experiment in each of the regions.
Genetic operators (crossover and mutation) are applied to the parents selected on the basis of the fitness function. Crossover consists of the recombination of randomly chosen parts of the parents, while mutation randomly alters a part of a parent. Consequently, the fitness of the population tends to increase generation after generation. The output of this process is a set with the best individual functions from all generations for each region. In this study we have used the open source Distributed Evolutionary Algorithms in Python (DEAP) framework (Fortin et al. 2012; Gong et al. 2015).
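To make the evolutionary loop more concrete, the following is a minimal sketch of SR via GP written with the Python standard library only. It is not the DEAP set-up used in the study: the tree encoding, the mean squared error fitness, the tournament size of 3, the mutation probability of 0.2, the population of 100 and the size cap of 30 nodes are all assumptions made for illustration, and every function name is hypothetical. Only the restriction to the four main operations and the 50-generation stopping criterion come from the text, and the data in the demo are synthetic.

```python
import operator
import random

def protected_div(a, b):
    # Common GP convention: return 1.0 when the denominator is (near) zero.
    return a / b if abs(b) > 1e-9 else 1.0

# The four main operations to which the search is restricted.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": protected_div}

def random_tree(n_vars, depth, rng):
    # Leaves are input variables ("x", index) or random constants ("c", value).
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("x", rng.randrange(n_vars))
        return ("c", rng.uniform(-1.0, 1.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(n_vars, depth - 1, rng),
            random_tree(n_vars, depth - 1, rng))

def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def fitness(tree, X, y):
    # Mean squared error against the target series; lower is better.
    try:
        mse = sum((evaluate(tree, r) - t) ** 2 for r, t in zip(X, y)) / len(y)
    except OverflowError:
        return float("inf")
    return mse if mse == mse else float("inf")  # map NaN to inf

def paths(tree, prefix=()):
    # Enumerate the index path to every node of the tree.
    yield prefix
    if tree[0] in OPS:
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))

def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    # Graft a random subtree of parent b onto a random point of parent a.
    donor = subtree(b, rng.choice(list(paths(b))))
    return replace_at(a, rng.choice(list(paths(a))), donor)

def mutate(tree, n_vars, rng):
    # Replace a random node with a freshly generated small subtree.
    point = rng.choice(list(paths(tree)))
    return replace_at(tree, point, random_tree(n_vars, 2, rng))

def evolve(X, y, n_gen=50, pop_size=100, max_nodes=30, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    for _ in range(n_gen):  # stopping criterion: fixed number of generations
        scored = [(fitness(t, X, y), t) for t in pop]
        best = min(scored, key=lambda s: s[0])[1]
        new_pop = [best]  # elitism: the best individual always survives
        while len(new_pop) < pop_size:
            p1 = min(rng.sample(scored, 3), key=lambda s: s[0])[1]  # tournament
            p2 = min(rng.sample(scored, 3), key=lambda s: s[0])[1]
            child = crossover(p1, p2, rng)
            if rng.random() < 0.2:
                child = mutate(child, n_vars, rng)
            if sum(1 for _ in paths(child)) <= max_nodes:  # crude bloat control
                new_pop.append(child)
        pop = new_pop
    return min(pop, key=lambda t: fitness(t, X, y))

if __name__ == "__main__":
    # Synthetic stand-in data (NOT survey data): target is x0 - 0.5 * x1.
    data_rng = random.Random(1)
    X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
    y = [r[0] - 0.5 * r[1] for r in X]
    best = evolve(X, y)
    print(best, fitness(best, X, y))
```

Because the best individual of each generation is copied unchanged into the next one, the best fitness in this sketch can never get worse as generations pass; the size cap is one simple way of handling the trade-off between accuracy and simplicity mentioned above.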
The obtained symbolic expressions are then used to generate out-of-sample forecasts of economic growth in all countries. To evaluate the performance of the evolved economic indicators we compute the accuracy of the forecasts and we compare it to that of the predictions obtained with both naïve and autoregressive (AR) time series models used as a benchmark.

Results
In this section we first present the results of the different experiments undertaken for each region ($R = 1, 2, 3, 4, 5$) for the in-sample period (2000:Q2 to 2014:Q1). The output, $\hat{y}_{R,it}$, is the evolved expression obtained in each region, formed by the optimal combination of survey variables for each set of countries. The evolved symbolic expressions can be regarded as survey-based indicators of economic activity for each region.

We can observe that variable $X4_{it}$ (perception of the overall economic situation compared to last year) is by far the variable that appears most frequently in the symbolic expressions, being present in all five evolved indicators. The second most frequent variable is $X3_{it}$, which refers to the judgement about the present situation of private consumption. Expectations about the future are the variables with the lowest weight, with $X7_{it}$ and $X8_{it}$ (expectations for the next six months about the overall economy and capital expenditures respectively) being the only variables that do not appear in any of the regional indicators. Klein and Özmucur (2010) analysed the role of survey expectations in 26 European countries and found that the question related to production expectations was more useful in improving the forecasting performance than the aggregated confidence and sentiment indicators.
Given that $X4_{it}$ and $X3_{it}$ seem to be the most relevant survey variables, we repeated the five experiments using only those two variables, obtaining a second set of evolved symbolic expressions.

Next, we generate forecasts of GDP growth using both sets of evolved indicators and evaluate their predictive performance in an out-of-sample forecasting comparison for the period 2014:Q2 to 2016:Q3. With this aim we compute several measures of prediction accuracy. First, we compute the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE) in order to assess the predictive content in terms of forecast accuracy. Table 3 summarises the information of the different accuracy measures and the change in precision resulting from the use of the indicators of the second set of experiments. The best forecasting performance is obtained for Austria and the UK in Western Europe, for Slovakia in Eastern Europe, for Portugal in Southern Europe, and for Lithuania in the Baltic countries. If we average the results by region, we obtain the best results for the Baltic countries. These results are in line with those of Claveria et al. (2017), who in a similar experiment obtained the lowest MAE and RMSE values for Austria, Belgium, Bulgaria, Estonia, and Lithuania.
By using only $X4_{it}$ and $X3_{it}$ as the input variables, we find a deterioration of predictive accuracy in most countries, with the exception of Belgium, Germany, Italy, the Netherlands and the UK, where the three accuracy measures (MAE, RMSE and MAPE) decrease with respect to the ones obtained in the first set of experiments.
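For reference, the three accuracy measures can be computed as in the sketch below, which uses their standard textbook definitions. The function names are ours, and the MAPE is expressed in percent; note that it is undefined whenever an actual value equals zero, which can happen with growth rates.

```python
import math

def mae(actual, forecast):
    # Mean absolute error.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    # Root mean square error.
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    # Raises ZeroDivisionError if any actual value is exactly zero.
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    actual = [2.0, 4.0]
    forecast = [1.0, 5.0]
    print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```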
Finally, we compare the forecast accuracy of the evolved economic indicators obtained by means of all survey variables to the benchmark models. With this aim, we compute the mean absolute scaled error (MASE) and the percentage of periods with lower absolute error (PLAE) to compare the forecasting performance to the benchmarks.
Let us denote $y_t$ as the actual value and $\hat{y}_t$ as the forecast at period $t$, $t = 1, \ldots, n$. Forecast errors can then be defined as $e_t = y_t - \hat{y}_t$. We have two competing models, A and B, where A refers to the forecasting model under evaluation and B stands for the benchmark model. Given that there is a delay of more than a quarter between the publication of official quantitative data and that of survey data, in this study we use two-step-ahead naïve forecasts as a baseline. The MASE can then be obtained as the mean of the absolute value of the scaled error $q_t$.

The MASE, proposed by Hyndman and Koehler (2006), scales the forecast errors by the mean absolute in-sample error obtained with a benchmark model. This statistic presents several advantages over other forecast accuracy measures. On the one hand, it is independent of the scale of the data. On the other hand, it is easy to interpret: values less than one indicate that the average prediction computed with the benchmark model is worse than the estimates obtained with the proposed method.
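A possible implementation of the MASE is sketched below. It assumes that the scaling factor is the in-sample mean absolute error of a two-step-ahead naïve forecast, that is, of predicting each value with the observation two quarters earlier; this is our reading of the baseline described above rather than a formula taken from the study, and the function names are hypothetical.

```python
def naive_in_sample_mae(train, step=2):
    # In-sample MAE of the naive forecast y_hat[t] = y[t - step].
    errors = [abs(train[t] - train[t - step]) for t in range(step, len(train))]
    return sum(errors) / len(errors)

def mase(actual, forecast, train, step=2):
    # Mean of the absolute scaled errors |q_t| = |e_t| / scale.
    scale = naive_in_sample_mae(train, step)
    scaled = [abs(a - f) / scale for a, f in zip(actual, forecast)]
    return sum(scaled) / len(scaled)

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    train = [1.0, 2.0, 3.0, 4.0]   # in-sample observations
    actual = [5.0, 6.0]            # out-of-sample observations
    forecast = [4.0, 7.0]          # out-of-sample forecasts
    print(mase(actual, forecast, train))
```

The scale is zero, and the statistic undefined, only if every in-sample observation equals the one two periods before it.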
With the aim of finding an easy to interpret measure to compare the forecast accuracy between two models, Claveria et al. (2015) proposed the PLAE statistic, which is also a dimensionless measure. The PLAE is based on the CJ statistic proposed by Cowles and Jones (1937) for testing market efficiency and the 'percent better' measure proposed by Makridakis and Hibon (2000) to compare the forecast accuracy of the models to a random walk. The PLAE is a ratio that gives the proportion of periods in which the model under evaluation obtains a lower absolute forecasting error than the benchmark model.

When comparing the obtained out-of-sample forecasts with the models used as a benchmark (last two columns of Table 3), MASE values show that in the case of Germany, Ireland, Italy and Greece, the forecast accuracy of the evolved indicators does not improve on that of the in-sample average prediction of the naïve method. The PLAE values obtained in these four countries also highlight that the percentage of out-of-sample periods in which the proposed regional indicator generates lower absolute forecasting errors is very low. Conversely, the high PLAE values obtained for both the naïve method and the AR model for Croatia, Spain, Lithuania and Sweden are indicative of the good forecasting performance of the generated indicators for these countries. In most cases, the PLAE values obtained for both benchmarks are very similar, with the exception of Bulgaria and Slovakia, where the relative performance of the AR model improves. In Fig. 1 we graphically compare actual and predicted economic growth.
In Fig. 1 we can observe that the proposed evolved economic indicators seem to capture the evolution of GDP growth better in most countries. The worst results are obtained in Ireland, which shows particularly high economic growth during 2015. Therefore, with the aim of graphically assessing whether there are differences in the accuracy of the estimates of economic activity across regions depending on the level of growth, we compute the correlation between actual and predicted economic growth, differentiating between those periods in which economic growth lies outside or within the interquartile range (IQR) of the distribution in the European Union. The IQR, also known as midspread, is a measure of statistical dispersion, obtained as the difference between upper and lower quartiles, Q3-Q1.
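The PLAE described above can be sketched in a few lines. The version below is an assumption-based illustration: it expresses the result as a percentage, counts ties as not lower, and uses a hypothetical function name.

```python
def plae(actual, forecast_a, forecast_b):
    # Share of periods (in percent) in which model A has a strictly lower
    # absolute forecast error than benchmark model B.
    wins = sum(
        1
        for y, a, b in zip(actual, forecast_a, forecast_b)
        if abs(y - a) < abs(y - b)
    )
    return 100.0 * wins / len(actual)

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    actual = [1.0, 2.0, 3.0, 4.0]
    model_a = [1.0, 2.0, 5.0, 4.0]    # model under evaluation
    benchmark = [2.0, 3.0, 3.0, 5.0]  # benchmark model
    print(plae(actual, model_a, benchmark))
```

Under this convention, values above 50 mean that the model under evaluation beats the benchmark in most periods.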
By discriminating between these two states of growth, we can graphically determine whether there are notable differences in the accuracy of the estimates of economic activity across regions. In Fig. 2 we present the boxplots for each region. We want to note that empirical correlation values in the smaller samples containing the extreme values are likely to be higher than in the subsets containing the remaining larger samples.
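The split by growth state can be sketched as follows. We assume here that a period is classified by comparing a country's actual growth with the quartiles of a reference distribution (for instance, growth rates in the European Union); the exact classification rule used in the study may differ, and the function names are hypothetical. Each subset needs at least two observations with non-zero variance for the correlation to be defined.

```python
import math
import statistics

def pearson(x, y):
    # Pearson correlation coefficient of two equally long sequences.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_correlations(actual, predicted, reference):
    # Quartiles of the reference distribution delimit the IQR (Q1 to Q3).
    q1, _, q3 = statistics.quantiles(reference, n=4)
    pairs = list(zip(actual, predicted))
    inside = [(a, p) for a, p in pairs if q1 <= a <= q3]
    outside = [(a, p) for a, p in pairs if not (q1 <= a <= q3)]

    def corr(subset):
        return pearson([a for a, _ in subset], [p for _, p in subset])

    # Returns (correlation within the IQR, correlation outside the IQR).
    return corr(inside), corr(outside)

if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    predicted = [1.5, 1.8, 3.2, 3.9, 5.5, 5.8, 7.1, 8.4, 8.9]
    print(split_correlations(actual, predicted, reference=actual))
```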
In Fig. 2 we can observe that the highest correlations during periods of high growth rates are obtained in Western Europe, with the exception of Ireland. It can also be seen that in all regions the performance of the evolved indicators seems to vary depending on the level of dispersion: during periods of average growth the correlation between estimates and actual values is lower than during periods of high growth rates. These results are in line with those obtained by Łyziak and Mackiewicz-Łyziak (2014), who found that the 2008 financial crisis period had led [...]

In the present study we also find evidence regarding the informative value of survey-based expectations. Our results are in line with recent findings by Altug and Çakmakli (2016), Klein and Özmucur (2010), Kłopocka (2017) and Lehmann and Wohlrabe (2017). Altug and Çakmakli (2016) found survey expectations useful to improve inflation forecasts. Klein and Özmucur (2010) found evidence that survey expectations improved the forecasting performance of autoregressive time series models in European countries. Kłopocka (2017) showed the usefulness of survey indicators to forecast household saving and borrowing rates in Poland. Lehmann and Wohlrabe (2017) found that consumers' unemployment expectations and new orders improved predictions of employment growth in Germany.
While there is ample evidence in the literature in favour of the usefulness of expectations to improve the predictive capacity at the macroeconomic level ([...] Dua 1992, 1998; Batchelor and Orr 1988; Christiansen et al. 2014; Dees and Brinca 2013; Girardi 2014; Hansson et al. 2005; Ivaldi 1992; Kumar et al. 1995; Leduc and Sill 2013; Lemmens et al. 2005; Müller 2010; Qiao et al. 2009; Schmeling and Schrimpf 2011), several authors have recently proposed refinements in order to enhance the explanatory power of survey expectations in forecasting models. Bruestle and Crain (2015) have shown that controlling for significant versus insignificant changes in consumer confidence improved the accuracy of [...] Dreger and Kholodilin (2013) have noted that better performing survey-based indicators should be built upon preselection methods and data-driven approaches to determine the weights.

In this work we have shown the appropriateness of the SR framework for empirical modelling. SR makes it possible to address complex modelling issues in large data sets where the potential relationships between variables are unknown. In these circumstances, the implementation of SR via evolutionary computation provides researchers with an overview of the most relevant interactions and helps identify new unknown links between variables. These features make this approach particularly suitable for non-linear modelling.
By means of GP we have simultaneously evolved the structure and the parameters of the models without imposing any a priori assumptions. In this regard, Bruno (2014) has recently noted the importance of avoiding restrictive assumptions about the functional form when modelling using survey indicators. Thus, a SR via GP approach can be of particular interest when it comes to quantifying survey expectations, constructing data-driven survey-based indicators or testing economic hypotheses about the formation of agents' expectations.

Conclusion
This paper proposes an empirical modelling approach to design survey-based economic indicators at a regional level. By means of SR via GP we find the optimal combination of survey variables that best tracks the evolution of the economic activity in twenty-six European countries grouped in five regions (Western Europe, Eastern Europe, Southern Europe, Baltic countries and Scandinavian countries). This data-driven approach based on evolutionary computation allows us to transform qualitative survey expectations into quantitative estimates of economic activity.
We have used survey variables regarding expectations about the economic situation from the World Economic Survey in order to find the most relevant interactions in each region. This exercise allows us to rank the expectations according to the relative weight of each one in the evolved economic indicators. Although results differ across regions, agents' "perception about the overall economy compared to the same time last year" is the best predictor of economic activity.
In a second step, we assess the out-of-sample forecast accuracy of the evolved survey-based indicators in each region. The best forecasting performance is obtained for Austria and the UK in Western Europe, for Slovakia in Eastern Europe, for Portugal in Southern Europe, for Lithuania in the Baltic countries, and for Sweden in the Scandinavian countries. At the regional level we obtain the best results for the Baltic and the Scandinavian countries.
Finally, we evaluate whether there are differences in the accuracy of the estimates of economic activity across regions depending on the level of growth. We find that during periods of average growth rates the correlation between estimates and actual values is lower in all regions. The highest correlations during periods of high variability are obtained in Western Europe.

In spite of the novelty of the proposed approach, this research is not without limitations. On the one hand, given that we used a data-driven method, the evolved economic indicators are not grounded in any theoretical background. On the other hand, extending the analysis to other survey data would allow us to examine the extent of the similarities in the derived functional forms. Another issue left for further research is testing whether the implementation of alternative algorithms could improve the forecast accuracy of empirically generated quantitative estimates of expectations.

Empirica (2019) 46:205-227