A new metric of consensus for Likert-type scale questionnaires: an application to consumer expectations

In this study, we present a metric of consensus for Likert-type scales. The statistic provides the level of agreement for any given number of response options as the percentage of consensus among respondents. With this aim, we use a geometric framework that allows us to analytically derive a positional indicator. The statistic is obtained as the relative weight of the distance from the point containing the proportions of observations that fall in each category to the centre of a regular polygon with as many vertices as categories, which corresponds to the point of maximum dissent. The polygon can be regarded as the area that encompasses all possible answering combinations. In order to assess the performance of the proposed metric of consensus, we conduct an iterated forecasting experiment to test whether the inclusion of the degree of agreement in households’ expectations improves out-of-sample forecast accuracy of the unemployment rate in seven European countries and the Euro Area. We find evidence that the level of consensus among households contains useful information to predict unemployment rates in all cases. This result shows the potential of agreement metrics to track the evolution of economic variables. Finally, we design a simulation experiment in which we compare the sampling distribution of the proposed metric for three- and five-response alternatives, finding that the distribution of the former shows a higher level of granularity and dispersion.


Introduction
The widespread practice of online shopping has fostered the use of customer satisfaction surveys to gather feedback from consumers. Most of these questionnaires make use of Likert scales to elicit the degree of satisfaction of web users. Likert scales were developed to measure people's attitudes [1], and they are the most common approach to scaling responses in survey research. Likert scales result when survey participants are asked to rate their agreement with a set of items on a scale with a limited number of possible responses presented in sequence. The number of responses is usually five (for example, "strongly agree", "agree", "undecided", "disagree", "strongly disagree"), although it may vary. See [2] for a discussion on the optimal number of response alternatives.
As a result, Likert scales are ubiquitous in opinion polls. Consumer surveys, which are conducted among households, use Likert-type questionnaires with three and five reply options. These surveys are the main source for eliciting the expectations of consumers. Consequently, household survey-based expectations are widely used as explanatory variables in quantitative forecasting models [3,4] and also to test economic hypotheses [5][6][7][8].
The aim of the paper is twofold. On the one hand, we present a metric to compute the degree of consensus among respondents in Likert-type questionnaires for any given number of responses. The measurement of agreement (and dissent) is a key component of data analysis and the study of human behaviour [9]. Although most metrics of qualitative variation are designed so that the extreme categories have a greater weight, in this study we propose a metric that equates all categories, making the neutral or no-change values have the same importance in the calculation. By means of a geometric framework, we derive a measure of positional agreement that has a straightforward interpretation, as it provides the percentage of agreement among the respondents. In addition, the positional nature of the indicator allows monitoring the evolution of the response pattern in a regular polygon with as many vertices as answering categories, which contains all possible response combinations.
On the other hand, we assess the performance of the proposed consensus statistic. With this aim, we first compute the level of agreement between households regarding their expectations about the future evolution of unemployment. The analysis is performed for a set of long-term member states of the European Union (EU) that reflects the diversity in terms of the evolution of unemployment in European countries, especially since the 2008 financial crisis. Then, we evaluate whether the metric helps to improve the accuracy of unemployment rate forecasts.
With this objective, we design an iterated one-period ahead forecasting experiment in which we generate pseudo out-of-sample predictions of the unemployment rates using an autoregressive model as a benchmark. Next, we replicate the experiment including the agreement metric as a predictor, and test whether the reduction in the mean absolute percentage forecast error is statistically significant.
The structure of the paper is as follows. The next section introduces consumer surveys. We then present the methodological approach. Empirical results are provided next. Finally, we end with some concluding remarks.

Survey data on households' expectations
Households' expectations are not directly observable, and therefore are elicited through surveys. In the United States, the University of Michigan Survey Research Center started conducting Surveys of Consumers in 1946, covering three broad areas of consumer sentiment: personal finances, business conditions in the economy as a whole, and the appraisal of market conditions for large household durables, vehicles and houses. Based on the significance of consumers' subjective assessments of economic and social trends, the European Commission has conducted the harmonised Consumer Survey since May 1972. The joint harmonised EU Consumer Survey comprises twelve monthly questions and three quarterly questions. Its main purpose is to collect information on households' spending and savings intentions, and to assess their perception of the factors influencing these decisions. To this end, consumers are asked about the household financial situation, the general economic situation, savings, and intentions with regard to major purchases [10].
The Consumer Survey mainly consists of qualitative questions, following a similar answer scheme in which responses are given according to a three-option ordinal scale ("increase", "remain unchanged", "decrease") or a five-option ordinal scale ("sharp increase", "slight increase", "constant", "slight decrease", "sharp decrease"). Due to their simplicity, the questionnaires are easy to complete. Surveys are conducted during the first three weeks of each month, and results are published at the end of each month. Consequently, households' expectations are available prior to the publication of official data, making them particularly useful for monitoring economic developments and short-term forecasting.
Since the 2008 Great Recession, there has been a renewed interest in the measurement of economic uncertainty. In [11], the authors found that during times of high uncertainty survey respondents tended to give more heterogeneous answers to the questions focused on relevant economic variables. Since then, economic uncertainty has been increasingly approximated by the degree of disagreement among survey respondents [12][13][14]. In this sense, in [15], the authors proposed using the cross-sectional variation of individual expectations. With the objective of capturing this heterogeneity, different measures have been considered, including the index of qualitative variation (see [16,17]).
In this study, instead of measuring the level of disagreement, the focus is on the level of agreement among survey respondents. Another difference with previous research is that, while most measures of disagreement are built exclusively from the responses that fall into the extreme categories ("increase" and "decrease"), we also incorporate the information coming from the respondents expecting a variable to remain constant. With the aim of assessing the proposed metric of consensus, we use households' expectations about the future evolution of unemployment, elicited from the joint harmonised EU Consumer Survey. In the questionnaire, households are asked how they expect the number of people unemployed in the country to change over the next twelve months. In the "Appendix", we present the results of a simulation experiment in which we compare the sampling distribution of the proposed metric with that of the index of qualitative variation for different numbers of response options.

Methodology
In this section we present a methodology to compute a metric of agreement among survey respondents of Likert-type questionnaires. The frame of reference is based on a geometric application to determine the likelihood of disagreement among election outcomes proposed by [18]. Let us assume a Likert-type questionnaire with K reply options, where R_{i,t} denotes the aggregate percentage of responses in category i at time t, with i = 1, …, K and t = 1, …, n.
Given that the sum of the shares of reply options adds to 100, the vector X_t containing all the information from the surveyed units (R_{i,t}), which corresponds to the barycentric coordinates, can be represented as a point on a regular polygon [19]. This polygon, within which all possible combinations of response options are contained, has as many vertices as answering categories. Each vertex corresponds to a point of maximum consensus. We propose measuring the level of agreement as the ratio between 'the distance of the point to the barycentre' and 'the distance from the barycentre to the nearest vertex'. See Fig. 1 in the next section for two specific cases (K = 3 and K = 5).
Since the barycentric coordinate system allows computing the vertical distance of a point in the polygon to the nearest edge, a positional measure of consensus at a given time period t can be formalised (Eq. 1) as the ratio between the distance of the point to the barycentre and the distance from the barycentre to the nearest vertex, expressed as a percentage. This metric reaches the maximum of 100% when a response category draws all the responses (total consensus), and the minimum value of zero when the answers are evenly distributed across the K response categories (maximum dissent).
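The ratio of distances described in this section can be illustrated with a minimal sketch. It assumes that both distances are Euclidean distances between vectors of barycentric coordinates (the shares rescaled to add to one), in which case the distance from the barycentre to any vertex is sqrt((K − 1)/K). This is one reading of the verbal definition, not necessarily the exact expression used in Eq. (1); for K > 3 a distance measured on a planar drawing of the polygon could differ. The function name `consensus` is hypothetical.

```python
import math


def consensus(shares):
    """Percentage of consensus for a list of K response shares.

    Assumption (not taken from the source): Euclidean distance between the
    vector of proportions and the barycentre (1/K, ..., 1/K), divided by the
    distance from the barycentre to a vertex, sqrt((K - 1) / K).
    """
    k = len(shares)
    if k < 2:
        raise ValueError("at least two response categories are needed")
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must add up to a positive number")
    # Rescale so the shares add to one (they may be given as percentages).
    proportions = [s / total for s in shares]
    dist_to_centre = math.sqrt(sum((p - 1.0 / k) ** 2 for p in proportions))
    centre_to_vertex = math.sqrt((k - 1.0) / k)
    return 100.0 * dist_to_centre / centre_to_vertex


print(consensus([100, 0, 0]))           # total consensus, about 100
print(consensus([20, 20, 20, 20, 20]))  # maximum dissent, about 0
print(consensus([50, 50, 0]))           # two categories tied, about 50
```

Under this assumption the two boundary cases stated in the text are reproduced: a single category drawing all the responses gives 100%, and evenly spread answers give zero.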

Empirical results
The empirical analysis is done for households' survey data regarding their unemployment expectations. We use monthly data from the Consumer Survey conducted by the European Commission (https://ec.europa.eu/info/business-economy-euro/indicators-statistics/economic-databases/business-and-consumer-surveys_en). The sample period goes from January 2006 to December 2017. Consumers are asked whether they expect a certain variable to "sharply increase", "slightly increase", "remain constant", "slightly decrease" or "sharply decrease". We respectively denote the aggregated percentages of the individual replies in each category as PP_t, P_t, E_t, M_t and MM_t. Consumers are also faced with questions with three reply options, in which they are asked whether they expect a variable to "increase", "remain constant" or "decrease". In this case, the percentages of respondents are respectively noted as P_t, E_t and M_t. These shares configure vector X_t, which is represented with a grey point in the polygons in Fig. 1, where we depict the resulting polygons for both three- and five-response options.
In this study, we compare the performance of the proposed metric for both scenarios: the percentage of agreement for three (C3_t) and for five reply options (C5_t). As unemployment expectations are elicited via five reply options, in order to compute C3_t we opt for grouping all positive responses in P_t, all negative ones in M_t, and incorporating the "do not know" share in E_t. In Fig. 2, we graph the evolution of both consensus measures for the set of countries analysed in the study (France, Germany, Greece, Ireland, Italy, Portugal, Spain and the Euro Area). As a backdrop we represent the distribution of survey responses grouped in three categories. We can see that both measures co-evolve during the sample period, with the five-response consensus metric showing less dispersion. This notion is further confirmed in Table 2 and Fig. 3 of the "Appendix", where we present the results of the simulated sampling distributions of the polygons defined in Fig. 1 by generating a uniform set of 10,000 vectors.
Fig. 1 Note: The equilateral triangle corresponds to the three-reply option, where E denotes the % of "remains constant" replies, P the % of "increase", and M the % of "decrease". The regular pentagon corresponds to the five-reply question, where E denotes the % of "remains constant" replies, P the % of "slight increase", PP the % of "sharp increase", MM the % of "sharp fall", and M the % of "slight fall". The grey point in the polygons corresponds to a unique convex combination of all reply options for a given period in time.
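The grouping of the five reply options into three that is used to compute C3_t can be sketched as follows. The function name `group_five_to_three`, the argument names and the example shares are hypothetical and serve only as an illustration; they are not data from the survey.

```python
def group_five_to_three(pp, p, e, m, mm, dont_know=0.0):
    """Collapse five reply shares (plus an optional "do not know" share) into three."""
    p3 = pp + p          # all "increase" replies (sharp and slight)
    e3 = e + dont_know   # "remain constant" plus "do not know"
    m3 = m + mm          # all "decrease" replies (slight and sharp)
    return p3, e3, m3


# Illustrative shares in percent (made up for this example).
shares3 = group_five_to_three(pp=20.0, p=35.0, e=25.0, m=12.0, mm=5.0, dont_know=3.0)
print(shares3)       # (55.0, 28.0, 17.0)
print(sum(shares3))  # the grouped shares still add to 100.0
```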
Several authors have recently addressed the effect of online job searches on unemployment forecasts [20,21], but the role of unemployment expectations has been largely overlooked. To fill this gap, we test if the proposed measure of consensus in unemployment expectations helps to improve the accuracy of unemployment forecasts. With this aim, we use seasonally adjusted unemployment rates provided by the OECD (https://stats.oecd.org/index.aspx?queryid=36324).
In order to evaluate the forecasting performance of both metrics, we introduce the percentage of agreement among households regarding the future evolution of unemployment. We use both C3_t and C5_t as explanatory variables in autoregressive (AR) models. This model is usually referred to as ARX or dynamic regression model. AR models explain the behaviour of the endogenous variable as a linear combination of its own past values:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t \qquad (2)$$
where y_t refers to the rate of unemployment at period t, φ_1, …, φ_p are the autoregressive coefficients, and ε_t is the innovation, which is assumed to behave as a white noise process. Regarding the number of lags that should be included for each period in every country, we choose between models with a minimum of 1 lag and a maximum of 4 lags, selecting the model with the lowest Akaike's information criterion [22].
We design an iterated one-period ahead forecasting experiment in order to assess the forecast accuracy of the out-of-sample predictions. We use the last twelve periods to compute the mean absolute percentage forecast error (MAPFE), which is a scale-independent measure that weighs the absolute forecast error by the actual value of the variable for every point in time, that is, the average over the evaluation periods of |e_t| / y_t expressed as a percentage, where e_t denotes the forecast error at period t. The fact that we are dealing with positive data and comparing countries with different unemployment rates makes the MAPFE particularly suitable in this case (see [23]). We also compute the small-sample modification of the Diebold-Mariano (DM) test of forecast accuracy proposed by [24] to evaluate whether the inclusion of the consensus metric significantly lowers forecast errors. Under the null hypothesis that there is no significant difference in precision, the statistic follows a standard normal distribution. A negative sign indicates that the second model has larger forecast errors. Finally, in order to provide further insight regarding the forecasting performance of both C3_t and C5_t, we also compute the % of periods with lower absolute error (PLAE). This accuracy measure was proposed by [25], and is a dimensionless measure based on the CJ statistic for testing market efficiency [26]. It can be regarded as a generalisation of the 'percent better' measure proposed by [27] to compare the forecast accuracy of the models to a random walk. The statistic consists of the proportion of periods in which the model under evaluation obtains a lower absolute forecast error than the benchmark model.
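The iterated one-period ahead experiment and the MAPFE can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: the stand-in forecaster is an AR(1) with an intercept fitted by ordinary least squares, whereas the models described above select between 1 and 4 lags by AIC and may be augmented with the consensus metric. All function names and the example series are hypothetical.

```python
def ar1_forecast(history):
    """One-step-ahead forecast from an AR(1) with intercept, fitted by OLS.

    Stand-in model (assumption): the study selects 1 to 4 lags by AIC.
    """
    x = history[:-1]  # lagged values y_{t-1}
    y = history[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return intercept + slope * history[-1]


def iterated_one_step(series, n_test, forecaster):
    """Re-estimate on an expanding window and forecast each of the last n_test periods."""
    predictions = []
    for h in range(n_test, 0, -1):
        history = series[: len(series) - h]  # data available before the target period
        predictions.append(forecaster(history))
    return predictions


def mapfe(actual, predicted):
    """Mean absolute percentage forecast error, in percent (actual values must be positive)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Illustrative unemployment rates (made-up numbers, not data from the study).
rates = [7.9, 8.0, 8.2, 8.1, 8.4, 8.6, 8.5, 8.8, 9.0, 9.1,
         9.0, 9.3, 9.5, 9.4, 9.6, 9.9, 10.0, 10.2]
preds = iterated_one_step(rates, n_test=12, forecaster=ar1_forecast)
print(round(mapfe(rates[-12:], preds), 2))
```

Passing the forecaster as an argument keeps the evaluation loop unchanged when the benchmark is replaced by a model that also uses the agreement metric as a predictor.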
Given two competing models A and B, where A refers to the forecasting model under evaluation and B stands for the benchmark model, the PLAE can be computed as the percentage of forecast periods in which the absolute forecast error of A is lower than that of B. In this study we use the AR model as a benchmark. Table 1 contains the results of the pseudo out-of-sample forecasting evaluation.
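As a minimal sketch of the PLAE, following its verbal definition as the share of periods in which model A obtains a lower absolute forecast error than benchmark B, one could write the function below. The name `plae` and the error values are hypothetical, and ties are counted here as not better, which is an assumption.

```python
def plae(errors_model, errors_benchmark):
    """Percentage of periods in which the model beats the benchmark in absolute error."""
    if len(errors_model) != len(errors_benchmark) or not errors_model:
        raise ValueError("both error series must be non-empty and of equal length")
    wins = sum(
        1 for e_a, e_b in zip(errors_model, errors_benchmark) if abs(e_a) < abs(e_b)
    )
    return 100.0 * wins / len(errors_model)


# Illustrative forecast errors (made-up numbers).
errors_a = [0.1, -0.2, 0.3, 0.5]
errors_b = [0.2, 0.1, -0.4, 0.5]
print(plae(errors_a, errors_b))  # 50.0: A is strictly better in 2 of 4 periods
```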
We can observe that the information coming from the degree of agreement helps to refine unemployment rate predictions in all countries. This improvement is statistically significant in most of the countries for the model augmented with the metric of agreement for K = 3. The reduction in forecast errors seems more relevant in countries with high unemployment rates (Greece, Ireland, Portugal, and Spain). These results are in line with those obtained by [28] and [29], who found that households' expectations exhibit a high forecasting accuracy. To compare the forecasting performance between C3_t and C5_t, we use the % of PLAE and find that, with the exception of Greece, the three-response metric performs better in all countries.

Concluding remarks
The paper presented a new measure of consensus for Likert-type questionnaires. We proposed using a geometric framework to construct a positional indicator that gives the degree of agreement as the relative weight of a distance. This ratio weighs the distance of each point, containing the proportions of observations that fall in each category, to the centre of a regular polygon, which has as many vertices as categories.
Fig. 2 Evolution of the consensus measures by country (panels: France, Germany, Greece, Ireland, Italy, Portugal, Spain and the Euro Area). Note: The blue area represents the evolution of the percentage of "fall" responses (sharply and slightly) regarding the level of unemployment over the next 12 months, while the light grey area represents the % of "increase" responses (sharply and slightly) and the white area the % of "remain the same (no-change)" responses. The black dashed line represents the evolution of the consensus measure for three categories, and the red line represents the consensus for five categories.
With the aim of assessing the performance of the proposed metric of agreement, we used it to compute the level of consensus in consumers' unemployment expectations in seven European countries and the Euro Area. We designed a two-step iterated forecasting experiment in which we first generated out-of-sample predictions of the unemployment rates using an autoregressive model as a benchmark. Then we replicated the experiment augmenting the models with the agreement metric as a predictor so as to test whether its inclusion significantly improved forecast accuracy.
We found that the degree of agreement improved forecast accuracy in all cases. The reduction in forecast errors was found to be statistically significant in four of the countries included in the analysis (Greece, Ireland, Italy and Spain) and in the Euro Area. On the one hand, this finding reveals that the degree of consensus in households' unemployment expectations contains useful information to predict unemployment rates. On the other hand, these results underline the importance of the measurement of agreement, and hint at the usefulness of consensus-based metrics to track the evolution of economic variables.

Appendix
We design a simulation experiment to compare the performance of the proposed metric of consensus (dissent) to the index of qualitative variation (IQV). Starting from the simulation of 100,000 aggregated response combinations for five categories, we group the shares of the extreme reply options in order to compute the consensus metrics for K = 3 and K = 5 (C3 and C5), and the inverse of the IQV for both scenarios (1-IQV_3 and 1-IQV_5).
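A simulation of this kind can be sketched as follows, under several assumptions that are not taken from the source: response vectors are drawn uniformly over the five-category simplex (normalised exponential draws), the extreme options are grouped as PP + P and M + MM, the consensus metric is computed with Euclidean distances on barycentric coordinates, and the IQV uses its usual definition K/(K − 1) · (1 − Σ p_i²). All names are hypothetical, and fewer draws than in the experiment are used to keep the example quick.

```python
import math
import random
import statistics


def consensus(p):
    """Consensus in percent for proportions p (assumed Euclidean distances on the simplex)."""
    k = len(p)
    return 100.0 * math.sqrt(sum((x - 1.0 / k) ** 2 for x in p) / ((k - 1.0) / k))


def inverse_iqv(p):
    """1 - IQV in percent, with IQV = K / (K - 1) * (1 - sum of squared proportions)."""
    k = len(p)
    iqv = k / (k - 1.0) * (1.0 - sum(x * x for x in p))
    return 100.0 * (1.0 - iqv)


def draw_uniform_shares(k, rng):
    """Uniform draw on the simplex: normalised independent exponential variates."""
    g = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]


def simulate(n_draws=10_000, seed=0):
    rng = random.Random(seed)
    out = {"C3": [], "C5": [], "1-IQV_3": [], "1-IQV_5": []}
    for _ in range(n_draws):
        pp, p, e, m, mm = draw_uniform_shares(5, rng)
        five = [pp, p, e, m, mm]
        three = [pp + p, e, m + mm]  # group the extreme reply options
        out["C3"].append(consensus(three))
        out["C5"].append(consensus(five))
        out["1-IQV_3"].append(inverse_iqv(three))
        out["1-IQV_5"].append(inverse_iqv(five))
    return out


results = simulate()
for name, values in results.items():
    q1, _, q3 = statistics.quantiles(values, n=4)
    print(name, round(statistics.mean(values), 2), round(q3 - q1, 2))  # mean and IQR
```

Under these assumptions the consensus metric equals the square root of 1 − IQV once both are rescaled to the unit interval, which would be consistent with a non-linear relationship between the two measures and with higher average values for the consensus metric.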
In Table 2 we present the summary statistics for both metrics. While extreme values and standard deviations are quite similar, the interquartile range (IQR), which is obtained as the difference between upper and lower quartiles, is higher for the consensus indicators, especially for three categories. We also observe that the consensus metrics yield higher mean values. This notion is further confirmed in Fig. 3, where we compare the boxplots for both measures. The plots represent the respective distributions through their quartiles, without making any assumptions about the underlying statistical distribution. When comparing the consensus metric for three- and five-response options, it can be seen that the distribution of C3 encompasses a wider range of the scale, and its distribution of scores is more uniform, indicating a higher level of granularity for the median values of the distribution in comparison to C5. Something similar happens when comparing the inverse of IQV for K = 3 and K = 5.
When the proposed consensus measure is compared with the inverse of IQV, we obtain higher average values with the former. Finally, we use a scatter plot to depict the type of relationship between both metrics, observing a non-linear relationship between them (Fig. 4).