A new approach for the quantification of qualitative measures of economic expectations

In this study a new approach to quantify qualitative survey data about the direction of change is presented. We propose a data-driven procedure based on evolutionary computation that avoids making any assumption about agents' expectations. The research focuses on experts' expectations about the state of the economy from the World Economic Survey in twenty-eight countries of the Organisation for Economic Co-operation and Development. The proposed method is used to transform qualitative responses into estimates of economic growth. In a first experiment, we combine agents' expectations about the future to construct a leading indicator of economic activity. In a second experiment, agents' judgements about the present are combined to generate a coincident indicator. Then, we use index tracking to derive the optimal combination of weights for both indicators that best replicates the evolution of economic activity in each country. Finally, we compute several accuracy measures to assess the performance of these estimates in tracking economic growth. The differences in results across countries have led us to use multidimensional scaling analysis to group all economies into four clusters according to their performance.


Introduction
Business and consumer surveys are directly addressed to representatives of firms and agents. Respondents are asked about the expected direction of change of a wide range of variables. Accordingly, these surveys provide important information about agents' economic expectations and perceptions. Survey results allow comparisons among different countries' business cycles, and have become an indispensable tool for monitoring the evolution of the economy. The fact that survey results are available ahead of the publication of quantitative official data makes them very useful for prediction.
However, unlike experimental expectations, survey-based expectations are qualitative in nature. With the aim of overcoming this limitation, numerous quantification methods have been proposed in the literature (Nardo 2003; Driver and Urga 2004; Pesaran and Weale 2006; Vermeulen 2014). This line of research, centred on the conversion of qualitative responses about the expected direction of change into quantitative measures, has evolved in parallel with the application of new econometric techniques.
Recent developments in empirical modelling allow us to generate mathematical models from a given dataset. Empirical modelling has two main advantages over conventional approaches. On the one hand, it is especially suitable for finding patterns in large data sets where little or no information is known about the system. On the other hand, empirical modelling makes it possible to evolve both the structure and the parameters of the model simultaneously.
In a recent study, Lahiri and Zhao (2015) examined the quality of quantified expectations by comparing them to quantitative realisations at the firm level, obtaining significant improvements when relaxing the assumptions of quantification methods of qualitative survey data, particularly during periods of uncertainty, with high levels of disagreement between respondents.
These findings have led us to look for a data-driven assumption-free approach to transform survey measures of agents' expectations into quantitative estimates. We aim to break new ground in the quantification of survey responses on the direction of change by presenting a new method based on the implementation of recent developments in evolutionary computation to qualitative survey data.
The CESifo Institute for Economic Research conducts the World Economic Survey (WES), which polls experts in 123 countries about economic trends (Kudymowa et al. 2013). We use 12 survey variables from the WES in 28 countries of the Organisation for Economic Co-operation and Development (OECD) to generate quantitative estimates of economic growth. In the first step, we combined and transformed agents' expectations about the future state of the economy. We repeated the experiment for agents' judgements about the present state of the economy. As a result, we derived a leading indicator and a coincident indicator of economic activity. In the second step, we applied index tracking, a procedure used in portfolio management, to calculate the optimal relative weights of both indicators that best replicated the evolution of the gross domestic product (GDP) in each country.
With the aim of examining the leading properties of these estimates of economic growth, we computed several accuracy measures to assess their predictive content and to evaluate their cyclical properties in terms of the level of synchronisation with the quantitative variable of reference. Finally, by means of a dimensionality reduction technique, we synthesise all the information provided by these performance measures and the characteristics of the data into two factors that allow us to cluster all economies into four groups.
The structure of the paper is as follows. The next section reviews the existing literature. In Sect. 3 we present the methodological approach and describe the experiment. Empirical results are provided in Sect. 4. Finally, conclusions are presented.
Literature review

Quantification of qualitative survey data

The first attempt to quantify survey expectations was that of Anderson (1951, 1952), who proposed the balance statistic as a measure of the evolution of the quantitative variable it refers to. Aggregating individual replies as percentages of the respondents in each category, and assuming that the expected percentage change in a variable remains constant over time for all agents, the balance statistic is obtained as the difference between the percentage of agents reporting an increase and the percentage reporting a decrease. Building on these premises, Pesaran (1984, 1985) extended this framework to allow for an asymmetric relationship between individual changes and the evolution of the quantitative variable of reference. Using the relationship between actual values and respondents' perceptions of the past as a yardstick for the quantification of expectations about the future, the author proposed the regression approach. By making positive and negative individual changes dependent on past values of the quantitative variable of reference, Smith and McAleer (1995) proposed a non-linear dynamic regression model to quantify survey responses that can be regarded as an extension of the regression approach. A drawback of the regression approach is that there is no empirical evidence that agents judge past values in the same way that they formulate expectations about the future (Nardo 2003). As a result, the regression approach is restricted to expectations of variables over which agents have direct control, such as prices or production. The development of this approach has also been conditioned by the need for a rationale for its application, which can only be obtained through the analysis of individual data. For an appraisal of individual firm data on expectations see Zimmermann (1997).
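As a minimal illustration, the balance statistic can be computed directly from the response shares (the shares below are hypothetical):

```python
def balance_statistic(rise_share, fall_share):
    """Anderson's balance statistic: the percentage of respondents expecting
    a rise minus the percentage expecting a fall."""
    return rise_share - fall_share

# Hypothetical survey round: 45% expect a rise, 30% no change, 25% a fall
print(balance_statistic(45.0, 25.0))  # 20.0
```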
Theil (1952) designed a theoretical framework to generate quantitative estimates from the balance statistic proposed by Anderson (1951). Based on the assumption that respondents report that a variable will go up (or down) if the mean of their subjective probability distribution lies above (or below) a certain level, the author defined the indifference threshold, also known as the difference limen. This threshold was conceived as an interval around zero within which respondents perceive no significant change in the variable, and consequently respond that the variable remains unchanged. Let y_it denote the percentage change of variable Y_it for agent i from time t-1 to time t, and let R_t and F_t denote the aggregate percentages of respondents at time t-1 expecting the variable to rise or fall at time t, respectively. If y_it^e is the unobservable expectation that agent i has over the change of variable Y_it, the indifference interval can be defined as (-a_it, b_it), where a_it and b_it are the lower and upper limits of the indifference threshold for agent i regarding time t. Assuming that response bounds are symmetric and fixed both across respondents and over time (a_it = b_it = k, for all i and t), and that agents base their answers on independent subjective probability distributions with the same form across respondents, the author generated quantitative estimates ŷ_t. Knöbl (1974) and Carlson and Parkin (1975) further developed the probability approach proposed by Theil (1952). As estimates of ŷ_t are conditional on a particular value of the imperceptibility parameter k and a specific form of the aggregate density function, Carlson and Parkin (1975) assumed that the individual density functions were normally distributed, and estimated k by assuming that over the sample period ŷ_t is an unbiased estimate of y_t. Consequently, the role of k is to scale the aggregate expectations y_t^e such that the average value of y_t equals that of y_t^e.
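A minimal sketch of the Carlson-Parkin point estimate under the normality assumption (the response shares and the threshold k below are hypothetical; Python's standard-library normal distribution stands in for the assumed aggregate density):

```python
from statistics import NormalDist

def carlson_parkin(rise_share, fall_share, k):
    """Carlson-Parkin estimate of the mean expected change.

    rise_share / fall_share: fractions R_t and F_t of respondents expecting
    a rise / a fall; k: the symmetric indifference threshold.
    Assumes expectations are normally distributed across respondents:
    P(y < -k) = F_t and P(y > k) = R_t pin down the mean and variance.
    """
    inv = NormalDist().inv_cdf
    f = inv(fall_share)        # standardised lower bound from P(y < -k) = F_t
    r = inv(1.0 - rise_share)  # standardised upper bound from P(y > k) = R_t
    return -k * (r + f) / (r - f)

# Hypothetical round: 80% expect a rise, 10% a fall, threshold k = 1
print(carlson_parkin(0.8, 0.1, 1.0))
```

Note that when the rise and fall shares coincide, the estimate is zero, as the symmetry of the threshold implies.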
Thus, using the evolution of the observed variable as a yardstick, qualitative responses can be transformed into quantitative estimates. Fishe and Lahiri (1981), Batchelor (1982), Visco (1984), and Foster and Gregory (1977) used alternative distributions. The evidence on the type of probability distribution from which aggregate average expectations are drawn is inconclusive. While Carlson (1975), Batchelor (1981), Batchelor and Dua (1987), Foster and Gregory (1977) and Lahiri and Teigland (1987) reject the hypothesis of normality, Dasgupta and Lahiri (1992), Balcombe (1996), Berk (1999) and Mitchell (2002) find evidence that normal distributions provide expectations that are as accurate as those obtained with non-normal distributions.
Another line of research has focused on refining the probability approach by relaxing the assumptions of symmetry and constancy of the indifference bounds. Several strategies have been proposed in the literature in order to introduce dynamic imperceptibility parameters in the probability approach. Bennett (1984), Batchelor (1986), Kariya (1990), and Berk (1999) made the threshold dependent on time-varying quantitative variables. Batchelor and Orr (1988) imposed the unbiasedness condition over predefined subperiods. Mitchell et al. (2007) generalised the Carlson-Parkin procedure to generate cross-sectional and time-varying proxies of the variance.
Using a time-varying parameter model (Cooley and Prescott 1976) together with the Kalman filter (Kalman 1960) for parameter estimation, Seitz (1988) was able to simultaneously introduce asymmetric and time-varying indifference thresholds. The author assumed that the imperceptibility parameters were subject to permanent and temporary shocks. Claveria et al. (2007) extended this framework by using a state-space representation that allowed for asymmetric and dynamic response thresholds generated by a first-order Markov process.
Further improvements of quantification procedures have been developed at the micro level, either by means of experimental expectations generated by Monte Carlo simulations, or by comparing individual responses with firm-by-firm realisations. Regarding the former option, Common (1985) generated simulated expectations to test the rational expectations hypothesis. Nardo and Cabeza-Gutés (1999) designed a simulation experiment to assess the performance of the different quantification methods, both in terms of the size of the measurement error and of its systematic nature, obtaining the best results with the probabilistic approach with time-varying parameters.
By means of simulation-based expectations, Löffler (1999) and Terai (2009) also estimated the measurement error introduced by the probabilistic method. Additionally, Löffler (1999) and Claveria et al. (2006) proposed refinements of the Carlson-Parkin method. Claveria (2010) used computer-generated expectations to assess the forecasting performance of different quantification methods, and presented a variation of the balance statistic that took into account the proportion of respondents reporting that the variable remains unchanged.
Using firm-level survey responses, Mitchell et al. (2002) developed a procedure to quantify individual categorical expectations based on the assumption that responses are triggered by a latent continuous random variable as it crosses time-varying thresholds, and found evidence against time-invariant thresholds. By introducing the "conditional absolute null" property, based on the empirical finding that the median of realised quantitative values corresponding to the "no change" category is zero, Müller (2010) proposed a variant of the Carlson-Parkin method with asymmetric and time-invariant thresholds, which makes it possible to solve the zero-response problem that occurs when all respondents fall into one of the extreme categories (an increase or a decrease).
The variation of the indifference thresholds across individuals can only be tested by means of the analysis of individual expectations. Using a matched sample of qualitative and quantitative individual stock market forecasts, Breitung and Schmeling (2013) corroborated the importance of introducing asymmetric and dynamic indifference parameters, but found that individual heterogeneity across respondents plays a minor role in forecast accuracy. On the other hand, Lahiri and Zhao (2015) have recently found strong evidence against the threshold constancy, symmetry, homogeneity and overall unbiasedness assumptions of the probability method. The authors generalised the Carlson-Parkin framework by means of a hierarchical ordered probit model. Based on a matched sample of households, they found that when the unbiasedness assumption is replaced by a time-varying calibration, the resulting quantified series better track the quantitative benchmark.

Evolutionary computation
Evolutionary computation is a subfield of artificial intelligence that is being increasingly applied in economics in the context of expensive optimisation. It is based on the implementation of algorithms that adopt the Darwinian principles of natural selection for automated problem solving. These algorithms are known as evolutionary algorithms (EAs). Evolutionary programming was introduced by Fogel et al. (1966). The most popular type of EA is the genetic algorithm (GA), initially proposed by Holland (1975). Cramer (1985) developed a generalisation of GAs known as genetic programming (GP). GP is a soft-computing search technique that allows the model structure to vary during the evolution, which makes it particularly suitable for non-linear and empirical modelling. See Poli et al. (2010) for a review of the state of the art in GP. Chen and Kuo (2002) classified the literature on the application of evolutionary computation to economics and finance. Most applications of evolutionary computation in economics have been in finance (Goldberg 1989). On the one hand, Acosta-González and Fernández (2014) used a GA to predict the financial failure of firms, and Acosta-González et al. (2012) also used this method to explain the 2008 financial crisis. Lawrenz and Westerhoff (2003) modelled exchange rates with a GA. Maschek (2010) evaluated the performance of the self-adaptation mechanism in GAs for convergence to the rational expectations equilibrium. Thinyane and Millin (2011) applied GAs to optimise the signals generated by technical trading tools. Vasilakis et al. (2013) presented a GP-based technique to predict returns in euro/dollar exchange rate trading. Wei (2013) used an adaptive expectation GA to optimise a fuzzy model to forecast stock price trends in Taiwan. For a review of the applications of GAs for financial forecasting see Drake and Marks (2002).
On the other hand, Álvarez-Díaz and Álvarez (2005) applied GP to predict exchange rates. Chen et al. (2008) analysed the performance of GP in financial trading. Kaboudan (2000), Larkin and Ryan (2008), and Wilson and Banzhaf (2009) used GP for stock price forecasting. Yu et al. (2004) implemented a GP approach to model short-term capital flows.
Applications of GP in macroeconomics have been very limited. The first GP application is that of Koza (1992), who used GP to reassess the exchange equation relating the price level, gross national product, money supply, and the velocity of money. Chen et al. (2010) applied GP in a vector error correction model for macroeconomic forecasting. Duda and Szydło (2011) developed economic forecasting models by means of gene expression programming (GEP), which can be regarded as a version of GP (Ferreira 2001). Koza (1992) developed GP to find the best single computer program to implement symbolic regression (SR). SR can be regarded as a new approach to empirical modelling. Given a predetermined set of operations and functions, SR searches the space of all possible mathematical expressions for the models that best fit the data. Zelinka et al. (2005) introduced analytical programming in order to synthesise suitable solutions in SR. Due to its versatility, SR is being increasingly used in different areas: from industrial data analysis and the experimental design of manufacturing systems (Can and Heavey 2011), to signal processing (Yao and Lin 2009) and various other applications (Barmpalexis et al. 2011; Cai et al. 2006; Ceperic et al. 2014; Sarradj and Geyer 2014; Wu et al. 2008).
There have been very few applications in macroeconomics. Claveria et al. (2016) implemented SR via GP to derive a set of building blocks used for forecasting purposes. Kl'účik (2012) applied SR to estimate the total exports and imports of Slovakia. Kotanchek et al. (2010) used SR via GP for GDP forecasting. By means of SR, Kronberger et al. (2011) identified interactions between economic indicators in order to estimate the evolution of prices in the US. Yang et al. (2015) used SR for production forecasting of crude oil. Recently, Peng et al. (2014) have proposed an improved GEP algorithm especially suitable for dealing with SR problems.

Data
This study matches two sources of information for 28 countries of the OECD: Austria, Belgium, Bulgaria, Croatia, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Japan, Latvia, Lithuania, the Netherlands, Norway, Poland, Portugal, Romania, the Slovak Republic, Slovenia, Spain, Sweden, the United Kingdom (UK), and the United States (US). On the one hand, we use quantitative official statistics on the evolution of economic activity; specifically, the year-on-year growth rates of quarterly GDP data from the OECD (https://data.oecd.org/gdp/quarterly-gdp.htm#indicator-chart). The sample period goes from the third quarter of 2000 to the first quarter of 2014. On the other hand, we use qualitative survey data reflecting agents' expectations about the future, and their judgements about the present economic situation.
We focus on the main survey variables from the WES, which assesses worldwide economic trends by polling professionals and experts on current economic developments in their respective countries (Kudymowa et al. 2013). Białowolski (2016) notes that professional respondents are characterised by significantly lower biases in responding to survey questions than consumers. Franses et al. (2011) also find evidence in favour of experts' forecasts when compared with pure model forecasts. See Henzel and Wollmershäuser (2005), Stangl (2007) and Hutson et al. (2014) for an appraisal of the WES.
Respondents are asked about the economic situation in three different forms: their expectation by the end of the next 6 months (variables X7-X12), their present judgement (variables X1-X3), and their assessment compared to the same time last year (variables X4-X6). The economic situation is assessed with respect to three items: the overall economy (variables X1, X4 and X7), capital expenditures (variables X2, X5 and X8), and private consumption (variables X3, X6 and X9). Respondents are also asked about their expectations about the volume of exports (X10), imports (X11) and the balance of trade (X12). All twelve variables are presented in Table 1.
In order to present the survey results, the Ifo uses a grading procedure that is conceptually equivalent to calculating balances: positive replies are assigned a grade of nine; indifferent replies, a grade of five; and negative replies, a grade of one. A country's results are weighted according to its share of exports and imports in total world trade (CESifo World Economic Survey 2016). The Ifo also constructs an aggregate indicator, obtained as the arithmetic mean of the assessments of the general economic situation and the expectations for the economic situation in the next 6 months: the economic climate indicator (ECI). The ECI tends to correlate closely with the actual business-cycle trend measured in annual growth rates of real GDP (Claveria et al. 2016; Garnitz et al. 2015).
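The equivalence between the grading procedure and the balance can be checked directly: with response shares p, i and n summing to one, the grade 9p + 5i + 1n equals 5 + 4(p - n), an affine function of the balance (the shares below are hypothetical):

```python
def wes_grade(p_pos, p_neutral, p_neg):
    """Ifo WES grading: positive replies get 9, indifferent 5, negative 1."""
    return 9 * p_pos + 5 * p_neutral + 1 * p_neg

# With shares summing to one, 9p + 5i + n = 5 + 4(p - n):
# the grade and the balance carry the same information.
p, i, n = 0.45, 0.30, 0.25
grade = wes_grade(p, i, n)
balance = p - n
print(grade, 5 + 4 * balance)
```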

Methods
SR is based on the search for relationships between a given set of variables. The major difference in relation to conventional regression analysis resides in the fact that while the former is based on a certain model specification, SR does not rely on a specific a priori determined model structure. The only assumption made in SR is that the response surface can be described by an algebraic expression.
GP can be regarded as an extension of GAs in which the solutions are expressed in the form of computer programs. The main difference between them is in the representation of the structure: while GP codes potential solutions by means of tree-structured, variable length representations, GAs use fixed length binary string representations. GP's more general representation scheme allows the model structure to vary during the evolution. This feature is particularly suitable in the current study, in which the functional relationship between the set of survey variables is arbitrary and unknown. Consequently, we use GP to solve the SR experiment, and to transform qualitative survey data into quantitative estimates of economic activity, formalising the interactions between a wide range of survey-based indicators. The implementation of GP for SR was based on the following sequence of steps: First-the creation of an initial population. We determined a population size of 3 million individuals.
Second-determination of a fitness function. In order to evaluate the fitness of each member of the population we use the root mean square error (RMSE).
Third-determination of a strategy for the selection of parents for replacement, which are the programs used to create offspring programs. In order to guarantee the diversity in the population, we use the tournament method of selection of individuals for crossover. This method is based on the selection of the fittest individual in each of the tournaments, which are run among a group of individuals chosen at random from the population. One of the main advantages over other alternative methods is that the selection pressure can be easily adjusted, and it is efficient to code.
Fourth-determination of the probability of a new generation and application of genetic operators to the parents. The main genetic operations are reproduction (copy), crossover (recombination of randomly chosen parts of parents), and mutation (random alteration of a part of a parent). We set the mutation probability to 0.1 to prevent the search from becoming trapped in local optima.
Fifth-determination of constants. To prevent the search path from deviating from the optimum, we include the automatic generation of constants provided by the algorithm; these constants are optimised after a number of generations according to their correlation with the functional form.
Sixth-determination of a stopping criterion. We set a maximum of 150 generations as the termination criterion. Steps three and four are repeated until a new generation is created. The process is then repeated with the new generation as the population until some individual reaches the required minimal fitness or the stopping criterion is fulfilled. As a result, the fitness of the population steadily improves.
The search process is characterised by a trade-off between accuracy and simplicity. To limit the complexity of the resulting expressions, the set of functions is restricted to some elementary functions. We use the Distributed Evolutionary Algorithms Package (DEAP) framework implemented in Python (Fortin et al. 2012;Gong et al. 2015).
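The evolutionary loop described above can be illustrated with a much-reduced, self-contained sketch combining tournament selection, subtree crossover, mutation and RMSE fitness. It is a toy stand-in rather than the DEAP configuration used in the study: the population size, number of generations, primitive set and target relationship below are illustrative only.

```python
import math
import random

# Binary operators available to the evolved expressions
OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}

def random_tree(depth):
    """Grow a random expression tree; leaves are the variable x or a constant."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else round(random.uniform(-2, 2), 2)
    op = random.choice(sorted(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def rmse(tree, xs, ys):
    """Fitness: root mean square error of the tree's predictions."""
    se = [(evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(se) / len(se))

def paths(tree, prefix=()):
    """Yield the path of every node, for subtree crossover and mutation."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def subtree(tree, path):
    return tree if not path else subtree(tree[path[0]], path[1:])

def graft(tree, path, new):
    if not path:
        return new
    branch = list(tree)
    branch[path[0]] = graft(branch[path[0]], path[1:], new)
    return tuple(branch)

def tournament(pop, fits, k=3):
    """Select the fittest of k individuals chosen at random from the population."""
    return pop[min(random.sample(range(len(pop)), k), key=lambda i: fits[i])]

def evolve(xs, ys, pop_size=200, generations=30, p_mut=0.1, max_nodes=60):
    pop = [random_tree(3) for _ in range(pop_size)]
    best, best_fit = None, float("inf")
    for _ in range(generations):
        fits = [rmse(t, xs, ys) for t in pop]
        for t, f in zip(pop, fits):
            if f < best_fit:
                best, best_fit = t, f
        nxt = []
        while len(nxt) < pop_size:
            a, b = tournament(pop, fits), tournament(pop, fits)
            donor = subtree(b, random.choice(list(paths(b))))
            child = graft(a, random.choice(list(paths(a))), donor)
            if random.random() < p_mut:  # mutation guards against local optima
                child = graft(child, random.choice(list(paths(child))),
                              random_tree(2))
            if sum(1 for _ in paths(child)) > max_nodes:  # crude bloat control
                child = a
            nxt.append(child)
        pop = nxt
    return best, best_fit

random.seed(0)
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + x for x in xs]          # hypothetical target relationship
best, fit = evolve(xs, ys)
print(fit)
```

The best individual ever seen is tracked across generations, so the reported fitness can only improve, mirroring the monotone improvement described in the sixth step above.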
By matching qualitative survey indicators from the WES to quantitative official data in two successive SR experiments, we are able to derive two analytical expressions: one linking agents' expectations about the future (variables X7-X12) to economic growth, and another one combining agents' judgements about the present (variables X1-X6).

Results
First, we present the output of the two SR experiments that were undertaken. On the one hand, expression (1) combines agents' expectations about the future and can therefore be regarded as a leading indicator of economic activity (ŷ_1,it). On the other hand, expression (2) combines agents' judgements about the present state of the economy and can be seen as a coincident indicator (ŷ_2,it), where the subindex i refers to each specific country at time t. In Fig. 1 we graphically compare the evolution of the two SR-generated indicators to that of GDP. We can observe that while ŷ_2,it closely tracks the oscillations of GDP in all countries, ŷ_1,it shows a worse performance, especially since the 2008 financial crisis. Łyziak and Mackiewicz-Łyziak (2014) found that the 2008 financial crisis period led to a decrease in expectational errors in transition economies. Claveria et al. (2016) obtained a similar result for ten Eastern European countries.

In a second step, we derive estimates of economic growth by combining the information of both indicators. We use a procedure of constrained optimisation known as index tracking, which is used in finance to replicate the performance of stock indexes (Karlow 2012; Kwiatkowski 1992; Rudd 1980). Index tracking consists of the minimisation of a tracking error, defined as the expected squared deviation of a portfolio's return from that of the index, with the aim of obtaining the proportion of capital to be invested in each asset. Based on this premise, we use a generalised reduced gradient algorithm to minimise the sum of squared forecast errors. We impose two restrictions on the weights in the optimisation process. First, the two weights must sum to one. Second, we impose a non-negativity restriction, so that each weight must be equal to or larger than zero.
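Because the two weights sum to one and are non-negative, the tracking problem reduces to a one-dimensional constrained least squares. The following sketch (with hypothetical series) solves it in closed form rather than with the generalised reduced gradient algorithm used in the study:

```python
def optimal_weight(leading, coincident, gdp):
    """Weight w on the leading indicator (and 1 - w on the coincident one)
    that minimises the sum of squared tracking errors, subject to 0 <= w <= 1.

    Substituting the sum-to-one constraint turns the problem into a
    one-dimensional least squares; since the objective is convex in w,
    clipping the unconstrained solution to [0, 1] yields the optimum.
    """
    d = [a - b for a, b in zip(leading, coincident)]   # indicator gap
    r = [y - b for y, b in zip(gdp, coincident)]       # residual vs coincident
    denom = sum(x * x for x in d)
    if denom == 0:
        return 0.0
    w = sum(x * y for x, y in zip(d, r)) / denom
    return min(1.0, max(0.0, w))

# Hypothetical series: GDP growth generated as a 30/70 blend of the indicators
lead = [1.0, 2.0, 0.5, -1.0]
coin = [0.8, 1.5, 0.2, -0.6]
gdp = [0.3 * a + 0.7 * b for a, b in zip(lead, coin)]
print(optimal_weight(lead, coin, gdp))  # ≈ 0.3
```

A null weight on the leading indicator, as reported below for several countries, corresponds to the unconstrained solution falling at or below the lower bound.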
As a result, we obtain the optimal weights of both the leading and the coincident indicator for each country (Table 2).
While the relative weight obtained for the coincident indicator is generally higher than that of the leading indicator, we observe numerous differences across countries. In Greece, Hungary, Latvia, Romania, the Slovak Republic, and Slovenia, the algorithm assigns a null weight to the leading indicator constructed with agents' expectations about the future. This result contrasts with that of Lacová and Král (2015), who found that in Slovakia companies are slightly more forward-looking than backward-looking. At the other extreme, in countries such as Norway and Japan, future expectations outweigh judgements about the present. This result raises the question of whether survey-based indicators should weight the information regarding expectations about the future and judgements about the present equally in all countries.
There is no consensus in the literature regarding the information content of survey expectations. Breitung and Schmeling (2013) compared quantified stock market expectations with quantitative forecasts, and found only a weak correlation between them. Lacová and Král (2015) found that quantified survey expectations in Slovakia systematically failed to capture changes in the consumer price index. Österholm (2011, 2012), Lui et al. (2011a, b) and Maag (2009) reached similar conclusions. On the other hand, there is ample evidence that survey expectations provide useful information for economic modelling (Altug and Çakmakli 2016; Dua 1992, 1998; Dees and Brinca 2013; Girardi 2014; Hansson et al. 2005; Jean-Baptiste 2012; Klein and Özmucur 2010; Leduc and Sill 2013; Lemmens et al. 2005; Müller 2009; Qiao et al. 2009; Schmeling and Schrimpf 2011).

In order to evaluate the performance of the resulting estimates in monitoring economic activity, we compute several measures of forecast accuracy: the mean absolute error (MAE) and the RMSE to assess the predictive content in terms of forecast accuracy, the mean absolute scaled error (MASE) to compare the forecasting performance to that of a benchmark model, and the concordance index (CI) proposed by Harding and Pagan (2002) to evaluate the cyclical properties in terms of the level of synchronisation with the quantitative benchmark variable.
In Table 3 we present the MAE and the RMSE. We observe differences across countries. Austria, Belgium, Bulgaria, Estonia, and Lithuania present the lowest MAE and RMSE values. At the other extreme, Croatia, Hungary, Ireland, Italy and Poland are the economies for which we obtain the least accurate predictions.
Next, we complement the assessment of the estimates by comparing them to those obtained with a benchmark model. To this end, we compute the MASE proposed by Hyndman and Koehler (2006). The idea behind the MASE is to scale the errors by the mean absolute error obtained with a benchmark model. The MASE presents several advantages over other forecast accuracy measures, as it is independent of the scale of the data, and it does not suffer from some of the problems presented by other relative measures of forecast accuracy (Hyndman and Koehler 2006). An additional advantage is its easy interpretation: values less than one indicate that the average prediction computed with the benchmark model is worse than the estimates obtained with the proposed method. Given that official quantitative data are published with a delay of more than a quarter compared to survey data, we use two-step ahead naïve forecasts as a benchmark. If we denote by e_t the forecast error, the MASE can be obtained as the mean of the absolute value of the scaled error q_t:

MASE = (1/T) Σ_t |q_t|, where q_t = e_t / [(1/(T-2)) Σ_{i=3}^{T} |y_i - y_{i-2}|]

In Table 4 we present the MASE results. In Austria, Belgium, Bulgaria, Estonia, Greece, Japan, Latvia, Lithuania, Norway, the Slovak Republic and the UK, the estimates of GDP obtained with the proposed method outperform those of the benchmark model.
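A minimal sketch of the MASE with an m-step naïve scaling (m = 2, matching the two-step ahead benchmark; the growth rates and estimates below are hypothetical):

```python
def mase(actual, forecast, m=2):
    """Mean absolute scaled error: the MAE of the forecasts divided by the
    in-sample MAE of an m-step naive (no-change) benchmark. Values below
    one mean the forecasts beat the naive benchmark on average."""
    T = len(actual)
    scale = sum(abs(actual[i] - actual[i - m]) for i in range(m, T)) / (T - m)
    errors = sum(abs(a - f) for a, f in zip(actual, forecast)) / T
    return errors / scale

# Hypothetical quarterly growth rates and model estimates
y = [2.0, 2.2, 1.8, 1.0, -0.5, 0.4, 1.2, 1.6]
yhat = [1.9, 2.1, 1.9, 1.2, -0.2, 0.3, 1.0, 1.7]
print(mase(y, yhat))  # below 1: estimates beat the two-step naive benchmark
```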
In Fig. 2 we compare the obtained forecast results to the standard deviation of the year-on-year growth rates of GDP. There seems to be no relation between the MASE results and the variability of economic activity. While Estonia and Latvia are the countries with the highest levels of dispersion, their forecast errors are low, and the opposite holds true for Hungary, Croatia and Poland.
As there is evidence that the trend in the Ifo's ECI correlates closely with the actual business-cycle trend measured in annual growth rates of real GDP (CESifo WES 2016), we next evaluate the cyclical properties of the proposed SR-generated estimates, comparing them with the ECI in terms of the level of synchronisation. To that end, we use the CI proposed by Harding and Pagan (2002):

CI = (1/T) Σ_{t=1}^{T} [S_yt S_xt + (1 - S_yt)(1 - S_xt)]

where T is the number of observations, y_t refers to the percentage growth rate of GDP and x_t to the variable under analysis. S is a binary variable that takes the value of one if the series is in expansion, and zero otherwise. As in the C index developed by Harrell et al. (1996), the CI is expressed as the proportion of sample periods in which the two series are in the same phase of the cycle. Thus, the CI allows us to assess the proposed indicator in terms of regime shifts. Results in Table 5 show that, in most cases, there are no major differences in CI values between the two proxies. While in countries like Belgium, Finland, Hungary, the Netherlands and Spain the ECI shows a lower level of synchronisation, the opposite holds true for Japan and Norway.
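A minimal sketch of the CI (expansion is proxied here simply by positive growth, and the two series are hypothetical):

```python
def concordance_index(y, x):
    """Harding-Pagan concordance index: the share of periods in which the
    two series are in the same cyclical phase. A value of one means the
    cycles are in the same phase one hundred percent of the time."""
    sy = [1 if v > 0 else 0 for v in y]   # expansion proxied by positive growth
    sx = [1 if v > 0 else 0 for v in x]
    T = len(y)
    return sum(a * b + (1 - a) * (1 - b) for a, b in zip(sy, sx)) / T

# Hypothetical growth series: the phases agree in 5 of 6 periods
gdp = [1.2, 0.8, -0.4, -1.1, 0.3, 0.9]
ind = [0.9, 0.5, -0.2, 0.1, 0.6, 1.1]
print(concordance_index(gdp, ind))
```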
To synthesise all the information provided by the above performance measures and the characteristics of the data, we finally compute two factors that allow us to cluster all economies into four groups. By transforming the original set of correlated performance measures into a smaller and more interpretable set of uncorrelated factors, we aim to summarise the results of the present study. We make use of multidimensional scaling (MDS) analysis to generate a two-dimensional perceptual map in which we position all 28 economies according to their coordinates on the two factors. MDS is a multivariate analytical procedure also known as principal coordinates analysis (Torgerson 1952, 1958). For a detailed description of this technique see Hair et al. (2009) and Jolliffe (2002). MDS allows us to visualise the level of similarity of the individual cases of a dataset. In our case, the proximity between the different countries in the perceptual map indicates how similar they are in terms of the performance of survey-based measures of economic expectations. First, we rank all 28 countries in decreasing order according to the performance experienced over the sample period for each of the following measures: the weight of the leading indicator, the sum of squared forecast errors (SSE), the MAE, the RMSE, the MASE, the CI, and the standard deviation of GDP growth in each country. Second, we assign a numerical value to each country corresponding to its position. We used the Kaiser-Guttman method (Guttman 1954; Kaiser 1960; Yeomans and Golder 1982) to determine the number of factors to retain. According to this criterion, only the factors with eigenvalues greater than one are retained for interpretation.
In our case, only the first two factors have eigenvalues greater than one, indicating that the optimal number of dimensions is two.
After deciding on the number of components, we reduce the information from all rankings to two dimensions, which can be regarded as two synthetic indicators that preserve the original ordinal structures. We obtain a Kruskal stress value of 0.012, which measures the amount of distortion introduced when representing the original distances in two dimensions. Stress values range from 0 to 1, where zero denotes a perfect representation of the input data. The two-dimensional scatterplot representing the coordinates of the first two dimensions for each country is presented in Fig. 3.
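To fix ideas, classical (Torgerson) MDS can be implemented directly from a matrix of pairwise distances via double centring and an eigendecomposition; the sketch below is a generic illustration in numpy, not a reproduction of the paper's computations:

```python
import numpy as np

def classical_mds(d, k=2):
    """Classical (Torgerson) MDS / principal coordinates analysis.

    d: (n, n) symmetric matrix of pairwise distances between cases
       (here one could use distances between countries' ranking profiles).
    Returns an (n, k) matrix of coordinates in k dimensions."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    b = -0.5 * j @ (d ** 2) @ j             # double-centred squared distances
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1]        # largest eigenvalues first
    eigval = eigval[order[:k]]
    eigvec = eigvec[:, order[:k]]
    # Clip tiny negative eigenvalues arising from numerical noise.
    return eigvec * np.sqrt(np.maximum(eigval, 0))
```

When the distances are exactly Euclidean and k is large enough, the recovered coordinates reproduce the input distances; with fewer dimensions, the discrepancy is what stress-type measures quantify.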
The perceptual map is divided into four quadrants. In the top right quadrant, we find the countries with the highest scores in both dimensions: Belgium, Norway, Austria, Lithuania, Japan and the UK. In this group of economies, the evolution of GDP displays a stable pattern, and expectations show a good forecasting performance. At the other extreme, in the lower left quadrant, we find the economies with the lowest scores in both dimensions: Ireland, Hungary, Slovenia, Poland, Italy, Finland, Sweden and Denmark, the last of which is very close to Spain, Portugal and Germany in the top left quadrant. Croatia is grouped apart, obtaining the lowest score in the first dimension, as opposed to Bulgaria, Estonia and Latvia, which are the economies with the highest scores in that dimension.
According to Lee (1994), the differences between the actual values of a variable and the quantified expectations may arise from three different sources: the measurement or conversion error, due to the use of quantification methods to approximate unobservable expectations; the expectational error, due to the agents' limited ability to predict the movements of the actual variable; and sampling error. The groupings in Fig. 3 are indicative of different values of these three sources of error.

Conclusion
This paper proposes an empirical approach to transform qualitative survey responses on the direction of change into quantitative estimates of economic activity by means of SR via GP. We used survey-based agents' expectations about the economic situation from the World Economic Survey in 28 countries of the OECD to derive a leading indicator, consisting of an optimal combination of survey variables that best tracks the evolution of economic activity. We repeated the experiment using agents' perceptions about the present economic situation to obtain a coincident indicator.
We then combined the information from the leading and the coincident indicator by means of index tracking, which is a procedure to find the optimal relative weights of both indicators. By doing so, we generated quantitative proxies of economic activity. To assess the forecasting performance of the generated estimates, we computed several stylised facts and compared them to a benchmark model and to the ECI constructed by the Ifo. The heterogeneity of the results across countries led us to synthesise the information provided by all the forecast accuracy measures by means of a dimensionality reduction technique that allows us to cluster all economies according to their performance.
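The index-tracking step can be illustrated with a deliberately simplified sketch. Assuming the target is approximated by a convex combination of the two indicators (a simplification of the paper's procedure; the variable names are ours), the optimal weight has a closed-form least-squares solution:

```python
import numpy as np

def tracking_weight(y, leading, coincident):
    """Weight w minimising the squared tracking error of
    w * leading + (1 - w) * coincident against GDP growth y.

    A one-parameter least-squares sketch of index tracking, not the
    paper's actual optimisation."""
    z = np.asarray(leading) - np.asarray(coincident)
    r = np.asarray(y) - np.asarray(coincident)
    w = float(z @ r) / float(z @ z)     # unconstrained least-squares optimum
    return min(max(w, 0.0), 1.0)        # restrict to a convex combination
```

When GDP growth is exactly a convex combination of the two indicators, the procedure recovers the true weight; otherwise it returns the best approximation in the least-squares sense.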
We obtained significant differences between mean values of each cluster, which indicates that the forecasting performance of survey-based expectations could be improved by designing ad-hoc quantification procedures for countries with similar characteristics.
Due to the novelty of the proposed approach, there are still several limitations to be addressed. Given that we used a data-driven method, the obtained quantitative estimates lack any theoretical background. Extending the analysis to other questionnaires would allow us to examine to what extent the obtained functional forms generalise to different survey data. Another issue left for further research is to assess the effect of GDP updates on the results. Finally, there is the question of whether the implementation of alternative EAs could improve the forecast accuracy of empirically generated quantitative estimates of expectations.