Construction of a Composite Index of European Identity

In this work, we build a Composite Index of European Identity. We analyse the main theoretical backgrounds explaining citizens’ identification with Europe: the alternative types of identity (civic versus ethnic-cultural constructions) and the social psychology theory (based on awareness, emotional attachment and evaluation). We find that neither of them reports spatially homogeneous results across Europe. The solution proposed in this work is the use of fuzzy set techniques, an option flexible enough to report an index of identification with Europe at the individual level. The Composite Index of European Identity offers an alternative specification to the perception self-reported by individuals. We provide a brief description of the index, which reports higher values in Eastern European countries and Spain, and lower average figures in the United Kingdom, France, Sweden and the Netherlands. The index clearly increases with education, income and city size, and does not display any first-sight association with respondents’ age and gender. The Composite Index is an important complement to the survey’s scores, as it captures the grounds on which social identification is built and consequently displays a more robust picture of the determinants of European identity, particularly of the role played by regional policies.
Acknowledgements: This paper is a result of the PERCEIVE Project, funded by the EC under the call H2020-REFLECTIVE-SOCIETY-2015. Grant Agreement number: 693529.


Introduction
The main aim of this work is the construction of a synthetic indicator of citizens' identification with Europe. A first question is why we need such a synthetic indicator when we can ask individuals directly about their identification with Europe. The short answer lies in the complexity of the concept and in whether, and how well, a single survey question can approximate it.
In order to capture the concept of European identity, individuals are usually asked if they feel European in their everyday life, if they feel close to the European Union project, or if they are happy to be European. Alternative questions use inverted scales, for example asking whether they feel that Europe is worthless. In that regard, Mendez and Bachtler (2017) identify a list of questions on European identity, grouped into five categories:
- Geographical belonging: usually captured by the answer to a question such as "To which of these geographical groups would you say you belong first of all? And the next?" Given options such as the locality or town where the respondent lives, the region, the country or Europe, respondents who answer Europe in first or second place can be thought to have some sort of European feeling of belonging.
- Thinking of self as European: the basic question is whether respondents think of themselves as citizens of Europe often, sometimes or never. Other alternatives include a comparison with national identity, such as "Do you ever think of yourself as not only (nationality), but also European?" Clearly, a cognitive aspect is involved in this type of question.
- Attachment to Europe/EU: attachment is used as a synonym of being close or emotionally close to Europe. This is an emotional measure of identity.
- National versus European: in addition to the national/European comparison, a prospective question is posed: "In the near future do you see yourself as (nationality) only, (nationality) and European, European and (nationality), European only?" The requested evaluation looks to the future.
- Proud to be European: another type of emotional evaluation asks about the satisfaction or gratification linked with being European: "And would you say you are very proud, fairly proud, not very proud, not at all proud to be European?"
Many works (such as Fligstein et al. 2012 or Roose 2013) consider the so-called Moreno question in the Eurobarometer: "In the near future, do you see yourself as (1) European only, (2) European and [nationality], (3) [nationality] and European, or (4) [nationality] only". During the 1970s and 1980s the Eurobarometer data was mainly used to combine political support for the European Community with European identity, two clearly distinct aspects, as Mendez and Bachtler (2017) stress. The Eurobarometer usually proxies European identity by asking about support for European integration or about the feeling of being European (a citizen of the EU) (Scheuer and Schmitt, 2009, Verhaegen et al., 2014), and also asks about trust in people from other countries (Scheuer and Schmitt, 2009), or trust in EU institutions (Arnold et al., 2012).
According to Hobolt and de Vries (2016), a key advantage of Eurobarometer data is that it allows studying identification with Europe both over time and across countries. Still, several other works have used specific surveys to analyse particular aspects: a self-esteem scale applied to European feelings (Agirdag et al., 2012), feeling European in one's day-to-day life and attachment to the EU (Hooghe and Verhaegen, 2017), and a battery of questions such as the Europeans Values Index, or Support to the EU (Rünz, 2015).
Clearly, the idea of identification with Europe can be addressed from many angles. As a consequence, no measure is free of critique, as all suffer from limitations, such as failing to measure the intensity of the identity or the meaning that citizens associate with it (Luhmann, 2017a). A debate on the types of collective identities is therefore both necessary and helpful, since it improves the understanding of the aspects veiled within the latent concept of European identity.
These ideas make clear that measuring European identity is not an easy task, and that a simple survey question is a good option but not necessarily the optimal one. In this document we provide a definition of European identity together with the main aspects that we can label as constituents of the concept (section 2), which we later use to build a composite index of European identity. In section 3 we study the experiences in the literature, the possible alternatives and their pros and cons. We also develop a spatial analysis by means of geographically weighted regression (GWR) using survey data from the 2016 wave of the Eurobarometer, in order to stay close to previous empirical works, replicate them and, most of all, have a wider spatial coverage of the concept of European identity when studying the spatial heterogeneity of the estimated parameters. In section 4 we propose a Composite Index of European Identity based on fuzzy set techniques using the PERCEIVE survey data, which allows us to report a detailed description of the regions analysed in the PERCEIVE project. Finally, section 5 presents the concluding remarks.

Defining European identity
Two main approaches are used in the literature to define individual identification with Europe and the European social identity. In this section we follow Bergbauer (2018) and Royuela and López-Bazo (2020), who use this categorisation to build a definition of identification with Europe.

Individual identification with Europe
Personal identity is composed of two major dimensions: personality characters and fitting to societal groups and categories (Lengyel, 2011). The concept of individual identification with Europe is derived from social psychology. According to Tajfel (1981), social identity is "that part of an individual's self-concept which derives from his knowledge of his membership of a social group (or groups) together with the value and emotional significance attached to that membership" (Tajfel, 1981, p. 255). Consequently, it is not enough to share a common characteristic with a group, such as race or religion. David and Bar-Tal (2009) propose that it is a psychological attribute, a subjective claim and a person's self-recognition of membership in a social group.
This subjective awareness of identification involves cognitive, affective and evaluative aspects of identity, close to the three main dimensions proposed by Brubaker and Cooper (2000): identification and categorisation; self-understanding and social location; and commonality, connectedness and groupness. This triple distinction does not imply that all identity dimensions need to be simultaneously present to qualify as a collective identity, although they interact and can even reinforce the identification with a social group.
The cognitive dimension refers to the self-categorisation as member of a group, whether people categorise themselves as European. Self-categorisation is the first requirement for other aspects of identification. Individuals need to acknowledge in-group/out-group classifications, building taxonomies of social groups and evaluating their differences and similarities. The cognitive process, then, requires a sort of meta-contrast of reality.
Once individuals have some knowledge of social groups, they evaluate both the groups and their membership of them. Both carry specific value connotations, positive and negative. One can expect that the higher (lower) the perception of a social group, the better (worse) the evaluation of group membership and of the status associated with it. Of course, an individual may acknowledge membership of a social group and, at some point in time, evaluate it negatively. This can lead the individual to leave the group.
The final aspect of individual identity is an affective component. Group members can cultivate emotional attachments and feelings of love and concern for the group, leading to a sense of fidelity and responsibility to the group. This relates to the emotional significance, the 'we-feeling'.
Based on these arguments, Bergbauer (2018) defines individual identification with Europe: "Individual identification with Europe refers to citizens' self-categorisation as European together with their evaluations of their membership in the European collective and their affective attachment to Europe and other Europeans" (Bergbauer, 2018, p. 18).
The distinction between civic and cultural citizenship was first defined by Kohn (1944) and continues to be one of the most influential frameworks for the analysis of national identities (Brubaker, 1992, Shulman, 2002, Bruter, 2005). The cultural approach proposes that the common identity is based on cultural roots, historic experiences and traditions, a shared heritage that differentiates one social group from another. This line is particularly adequate for national identities, built on long-term processes of common experiences, and is more difficult to justify for the existence of an actual European identity. Still, there is a cultural background of European identity: Europeans share a common cultural background, such as the right-wing sentiment that Europeans are Christians sharing a joint history (Holmes, 2009). Cultural aspects embrace common history, traditions and moral values and norms (Bruter, 2003). By contrast, the civic approach considers the set of institutions, rights, and rules that preside over the political life of a community (Bruter, 2005). This can be a good basis for building the European identity, as European ideas are linked with civic values such as solidarity and cohesion, ultimately leading to rights and obligations resulting from European laws and treaties (Reeskens and Hooghe, 2010).
The cognitive, evaluative and affective aspects of social membership identify, evaluate and define an emotional attachment to similarities and differences that can be cultural or civic, ethnic or political. These issues apply to different types of identities, including local, national and European, and there is no need to choose one over another. Multiple identities can coexist in a hierarchical system, in which each identity becomes relevant in different situations, as when an individual highlights his love for his football team on weekends but expresses his support for a political party at work from Monday to Friday. Still, national and European identities can be conflicting, as both polities compete for sovereignty. Some authors have argued that these two identities can be complementary and even talk about the 'marble cake of multiple identities' (Risse, 2010). This complementarity can also take place between the civic and cultural dimensions, as pure civic or ethnic bodies only exist in theory (Kuzio, 2002).

Collective identification with Europe
Bergbauer (2018) lists two approaches to define the concept of collective identity: one based on social psychology and another grounded in a sociological approach to collective identity. According to social psychology, collective identity is "a situation in which individuals in a society identify with the collective and are aware that other members identify with this collective as well" (David and Bar-Tal, 2009, p. 361). This implies that individuals are conscious that other group fellows self-categorise as group members. Two aspects are thus important: the individual's self-identification with the group and awareness of other group members, which is clearly more difficult. According to these authors, such mutual awareness is a precondition for the cognitive and emotional consequences of belonging to a social group, such as collective mobilisation or coordinated activities, which associates some sort of functionalism with collective identities at the group level.
The second approach is the sociological concept of collective identity, based on the notion of "we-feeling" or "sense of community" (Easton, 1965): the feeling of belonging together as a group, the affective ties among members of a community and the amount of political unity and solidarity between fellows, which implies, for instance, the acceptance of communal mandatory decisions. There is a challenge in this definition, though, as here collective identity is a characteristic at the group level. Using this framework Bergbauer (2018) defines collective European identity in the following terms: "A collective European identity will be stronger, the higher the number of EU citizens who identify with Europe, the stronger citizens' identification with Europe, and the more citizens are aware of other citizens' identification with Europe" (Bergbauer, 2018, p. 25).
This definition implies mutual awareness, an aspect that is hard to measure, as it is hard to believe that citizens know the feelings of other members of a community as large as the EU. As a result, the usual approach considers only the share and strength of citizens' identification with Europe, neglecting aspects related to mutual awareness.
Other authors have also analysed the concept of collective European identity. Agirdag et al (2012) list two main theories to explain this concept. Social identity theory (Tajfel, 1981 and Tajfel and Turner, 1986) holds that any collective identity is part of an individual's social identity, accepting that an individual takes part in a social group. Self-categorization theory (Oakes et al, 1994) affirms that social contexts provide the environment in which individuals' identities become important. Hooghe and Verhaegen (2017) analyse two alternative academic views of European identity. The functionalist institutional approach suggests that European identity is grounded in trust in how European institutions promote prosperity and economic development. The society-based approach holds that citizens have to identify themselves with other Europeans in order to build a European community, which implies a need for democratic legitimacy in the process of European integration (Habermas, 2011, and Risse, 2014). Still, it is easier to recognise European citizenship than a European identity, although there is a clear link between European laws and the rights granted by them, and the feeling of belonging to the European Union (Risse, 2010). Fligstein et al (2012) define a collective identity as a collection of individuals (a group) accepting a dominant similarity (which can be driven by religion, ethnicity, language, social class, gender and, of course, nation) that ultimately leads to a feeling of solidarity within the group, which implies some level of social interaction. Fligstein et al (2012) quote Anderson (1983, p. 5) to propose a definition of a nation: "In an anthropological spirit, then, I propose the following definition of the nation: it is an imagined political community and imagined as both inherently limited and sovereign. (…) Regardless of the actual inequality and exploitation that may prevail in each, the nation is always conceived as a deep, horizontal comradeship. Ultimately it is this fraternity that makes it possible, over the past two centuries, for so many millions of people, not so much to kill, as willingly to die for such limited imaginings".
According to this statement, nations are communities capable of creating social rules, limits and boundaries and frontiers when they become states. Among these social rules, nation-states create directions for reproducing and reinforcing the national side of the common identity.

Composite measures of European identity
The academic literature has considered several variables together to build a joint measure of European identity, mostly when it has been considered as the 'dependent variable' of an empirical work. Next we review how the empirical literature has addressed the issue and subsequently we provide the pros and cons of every approach.
We first note that many works use as dependent variable the answers to a question directly addressing the concept of European or national identity, for instance the Moreno question in the Eurobarometer. A variable of interest is then proposed and a list of controls is introduced in the regression. Some recent examples are: Verhaegen et al. (2014) look at the role of perceived economic benefits in closeness to the idea of Europe; Luhmann (2017b) analyses the impact of European integration on European identity; Hooghe and Verhaegen (2017) study the role of political trust and trust in European citizens on European identity; and Bergbauer (2018) inspects the role of EU enlargement and the Great Recession on European identity.
Most of these works consider a list of controls containing individual variables, such as age or gender, and context variables, including national characteristics. Among the right-hand-side variables, several works include the dimensions reviewed in the previous section: cognitive, affective and emotional aspects, and also civic, ethnic and cultural dimensions of identity. Next, we provide some insights into each alternative approach by looking at the empirical literature.

The civic-ethnic dichotomy
The works explicitly considering civic and ethnic dimensions of identity usually seek to understand the type of identity at play, analysing either national identity (Wright and Reeskens, 2013) or different aspects of European identity (Shulman, 2002, Reeskens and Hooghe, 2010, Ariely, 2013). Indeed, one of the major topics analysed in the literature is the types of identity one can find. As stressed by Reeskens and Hooghe (2010), the civic-ethnic division is usually operationalised assuming cross-national equivalence, although there is a classic assumption that Eastern and Western forms of European identity differ, with identification with Europe being more ethnic-oriented in Eastern countries and more civic-oriented in Western countries. Reeskens and Hooghe (2010) perform a cross-country validation analysis and find that the civic and ethnic typology is not cross-nationally equivalent.
We also briefly analyse this aspect at the regional level by means of the Eurobarometer data. We use this source for two reasons. First, it allows us to stay close to previous academic works on the topic and consequently to replicate previous results. Second, it has a wide geographical coverage. This is an important aspect, as we want to develop a spatial analysis in which spatial proximity, and consequently spatial coverage, can play an important role when studying the spatial heterogeneity of parameters. We provide a very simple aggregate model at the regional level using 2016 data. We have collected information associated with European identity, and we have built a 0-1 variable at the individual level, in which 1 refers to some type of European identity. Figure 1 displays a map representing the share of individuals with some identification with Europe. There is a clear geographic pattern within the considered regions, which is not a problem in itself but a spatial representation of reality: higher values in central and western continental Europe and lower values in Great Britain and eastern countries.

Figure 1. European identity
Next we consider an aggregate model at the regional level in which the share of citizens reporting a European identity is a function of civic and ethnic variables and a short list of controls. In particular, we define the civic/ethnic divide by means of the Eurobarometer question on the aspects that most create a feeling of community among EU citizens. Respondents could choose up to three options from the following list: History; Religion; Values; Geography; Languages; The rule of law; Sports; Inventions, science and technology; Economy; Healthcare, education and pensions; Solidarity with poorer regions; Culture; Other; None, such a feeling does not exist. We create an individual indicator of the civic dimension of European identity by building a dichotomous variable equal to 1 if the respondent reported any of three options as aspects most creating a feeling of community among EU citizens: Values; The rule of law; and Solidarity with poorer regions. We did the same to build an indicator for the ethnic component, now identifying Religion, Geography and Languages. Both indicators were built at the social level and also at the individual level (now considering the most important values personally for the respondent). The regional variables for the civic and ethnic dimensions were built as the share of respondents for whom the aspects most creating a feeling of community among EU citizens were either civic or ethnic. In fact, considering the same coordinates for all individuals within every region in this regression would be somehow equivalent to the use of regional aggregate data.
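The construction of the dichotomous indicators and their regional shares can be illustrated with a minimal sketch. The option groupings follow the text; the function names, the region codes and the respondent records are hypothetical and serve only as an illustration, not as the actual Eurobarometer processing.

```python
from collections import defaultdict

# Option groupings as described in the text.
CIVIC = {"Values", "The rule of law", "Solidarity with poorer regions"}
ETHNIC = {"Religion", "Geography", "Languages"}


def dimension_indicator(choices, dimension):
    """Return 1 if any of the (up to three) chosen options belongs to the dimension."""
    return int(any(choice in dimension for choice in choices))


def regional_shares(respondents):
    """Aggregate individual 0-1 indicators into regional shares."""
    totals = defaultdict(lambda: {"n": 0, "civic": 0, "ethnic": 0})
    for region, choices in respondents:
        cell = totals[region]
        cell["n"] += 1
        cell["civic"] += dimension_indicator(choices, CIVIC)
        cell["ethnic"] += dimension_indicator(choices, ETHNIC)
    return {
        region: {"civic": c["civic"] / c["n"], "ethnic": c["ethnic"] / c["n"]}
        for region, c in totals.items()
    }


# Hypothetical respondents: (region code, options chosen). Not real survey data.
sample = [
    ("R1", ["Values", "Economy"]),
    ("R1", ["Religion", "History", "Sports"]),
    ("R2", ["The rule of law", "Languages"]),
    ("R2", ["Culture"]),
]
shares = regional_shares(sample)
```

A respondent can score 1 on both dimensions at once, so the civic and ethnic regional shares need not sum to one.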
We use as further aggregate controls the average political position of individuals (on a scale from 0, left, to 10, right), marital status (share of married respondents), age (average age) and gender (share of men). We acknowledge that the definitions of the variables and the model itself are quite simple, and we agree that a more sophisticated analysis could be developed, for instance using other sources. Nevertheless, we aim at examining a simple question: the spatial variability of the composition of the ethnic-civic divide. We do so by means of a geographically weighted regression (GWR) analysis. The proposed empirical model is then:

$$\text{EuIdentity}_i = \beta_0 + \beta_1 \text{Civic}^{soc}_i + \beta_2 \text{Civic}^{ind}_i + \beta_3 \text{Ethnic}^{soc}_i + \beta_4 \text{Ethnic}^{ind}_i + \beta_5 \text{Politics}_i + \beta_6 \%\text{Married}_i + \beta_7 \%\text{Men}_i + \beta_8 \text{Av\_age}_i + \varepsilon_i$$

where i corresponds to every region. The basic results are displayed in tables 1 and 2, and the spatial variability of the parameters is shown in figure 2. For the GWR estimates we used an adaptive Gaussian kernel, and the selection criterion for finding the optimal bandwidth was cross validation. The results show a poor fit at the aggregate level (the R2 is below 0.2). This calls for interpreting the results with caution, as the global model is only capable of reproducing a small fraction of total variance. The parameters of the aggregate model confirm that civic values are positively associated with European identity, while ethnic values show heterogeneous results, with a negative association for the index reported at the individual level. Due to the aggregate nature of our experiment it is not unexpected that some of the controls (politics, the share of married and men) report non-significant parameters. Our results thus depict a picture where European identity is formed from both civic and ethnic values, although the importance of each vector differs strongly:
- The social index of the civic dimension is always positively and significantly associated with European identity, although the intensity of the relationship varies. Higher values of the parameters are found in the eastern Mediterranean regions, Germany, Denmark and the United Kingdom and Ireland.
- The index for civic values measured at the individual level is also positively associated with European identity, although it is non-significant in eastern regions of Europe and positive and significant in western regions.
- The social ethnic index, measured at the aggregate level, displays a positive association with European identity, although, again, it is not significant in eastern European regions.
- Finally, the index of ethnic values measured at the individual level is the one reporting a negative association with European identity, and this parameter is strong in Western Europe.
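To make the estimation strategy more concrete, the following is a minimal sketch of a geographically weighted regression with an adaptive Gaussian kernel and a bandwidth chosen by leave-one-out cross validation. It is an illustration under stated assumptions, not the code used to produce the results above: the kernel form (bandwidth equal to the distance to the k-th nearest neighbour), all function names and the synthetic data are assumptions, and dedicated GWR software may define the kernel slightly differently.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distances between all pairs of region centroids."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def adaptive_gaussian_weights(dist, k):
    """Gaussian weights; the bandwidth at each location is the distance
    to its k-th nearest neighbour (assumed form of the adaptive kernel)."""
    h = np.sort(dist, axis=1)[:, k]  # column 0 is the location itself
    return np.exp(-0.5 * (dist / h[:, None]) ** 2)


def local_fit(X, y, w):
    """Weighted least squares estimate for a single location."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)


def gwr(X, y, coords, k):
    """Return one row of local coefficients per region."""
    W = adaptive_gaussian_weights(pairwise_distances(coords), k)
    return np.array([local_fit(X, y, W[i]) for i in range(len(y))])


def cv_score(X, y, coords, k):
    """Leave-one-out CV: each region is predicted from a fit that excludes it."""
    W = adaptive_gaussian_weights(pairwise_distances(coords), k)
    np.fill_diagonal(W, 0.0)
    preds = np.array([X[i] @ local_fit(X, y, W[i]) for i in range(len(y))])
    return float(((y - preds) ** 2).sum())


def select_k(X, y, coords, candidates):
    """Pick the number of neighbours with the lowest CV score."""
    return min(candidates, key=lambda k: cv_score(X, y, coords, k))


# Synthetic illustration (not survey data): the civic slope drifts west to east.
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
civic = rng.uniform(0, 1, size=n)
X = np.column_stack([np.ones(n), civic])
y = 0.3 + (0.2 + 0.05 * coords[:, 0]) * civic + rng.normal(0, 0.02, size=n)

best_k = select_k(X, y, coords, candidates=[10, 20, 40])
betas = gwr(X, y, coords, best_k)  # betas[:, 1] holds the local civic slopes
```

Mapping the columns of `betas` region by region is what yields parameter maps of the kind reported in figure 2; a small `best_k` indicates that the relationship varies over short distances.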
This latter result is of particular importance: the ethnic dimension can be negatively associated with European identity. The literature has usually argued that this result can be partly due to a negative consequence of the Moreno question, that is, a sort of competition between national and European identity. In any case, it matters for the theoretical structure of a composite index of European identity. As ethnic values are more likely to be connected with national identity, we find that a key vector of social identity might not be a good proxy of citizens' identification with the idea of Europe.
Most parameters display a significantly heterogeneous pattern in space, with the only exception of the Ethnic social index, which is always positive and significant. Our results are in line with Reeskens and Hooghe (2010), who report that civic and ethnic typologies are not spatially equivalent. In our view these results show that building a composite index of European identity based on aggregate weights, either loadings from a principal component analysis or parameters in a regression framework, would be inappropriate or would, at least, produce misleading results at the local level. Consequently, we discard the alternative of building a composite index of European identity based on the two typologies described by Kohn (1944). This leads us to consider building such an index grounded in social psychology, based on the cognitive, affective and emotional aspects of identity.

The cognitive, affective and emotional construction of identity
As we saw in section 2.1, individual identification with Europe is derived from social psychology, and most academic papers cite the works of Henri Tajfel on social identity, according to which the social identity of an individual comes from his/her knowledge of his/her membership of a social group (or groups) and from the value and emotional attachment to such membership. Consequently, it is defined at the individual level and requires a self-recognition of membership in a social group (David and Bar-Tal, 2009).
This definition is followed in a number of academic works. Next we list some of them in chronological order:
- Scheuer and Schmitt (2009) use feeling like a European citizen, and thinking of oneself as European or national, as ways to measure cognitive aspects, and pride in being European as a proxy for the emotional dimension. They test the dimensionality of European pride and self-perceptions as a European citizen, and find that both dimensions originate in the same latent construct.
- Agirdag et al. (2012) study the determinants of European identity among children in Belgian schools and use a scale based on five items from the Collective Self-Esteem Scale: 'I consider myself a European', 'I often regret that I am a European' (reverse scored), 'I am glad to be a European', 'I often feel that Europe is worthless' (reverse scored) and 'I feel good about Europe'. These items capture cognitive and affective considerations. The responses to the five items were finally averaged.
- Quintelier et al (2014) and Verhaegen and Hooghe (2015) use measures of both cognitive and emotional identification as a European citizen. They use data from the International Civic and Citizenship Education Study 2009 (ICCS 2009) and consider the following statements: "I am proud to live in Europe," "I feel part of Europe," "I am proud that my country is a member of the European Union," and "I feel part of the EU", associating the feeling variables with cognitive aspects and those asking about pride with the emotional dimension. They finally build an index of European identity by means of a principal component analysis: a single component of the four items captures up to 63% of the overall variance, confirming that all aspects correspond to the latent factor.
- Rünz (2015) uses two dependent variables in his analysis of the impact of EU simulation exercises in the European Parliament: European identity and support of the EU. This is similar to the approach of Verhaegen et al. (2014), who analyse the impact of the perceived economic benefits from the EU on support for European integration and on European identity. Both works analyse the two dependent variables separately.
- In a more recent paper, Hooghe and Verhaegen (2017) again consider both cognitive and emotional variables. They use the 2009 IntUne Mass Survey and the information on European identity collected in two items: one on feeling European in one's day-to-day life, and the other on attachment to the European Union. Both variables are measured on a 1 to 4 scale. The final index is just the sum of both items. In this work they develop a confirmatory factor analysis and find that both variables measure the same underlying idea. Interestingly, they run this test for every country in the sample (16 EU member states) and "model fit indicators suggest a similar model fit in each country, so we can assume measurement equivalence" (Hooghe and Verhaegen, 2017, p. 167).
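The two simplest aggregation rules in this list, averaging items after reverse scoring and summing two items, can be sketched as follows. This is only an illustration: the item keys, the function names and the 1 to 5 response range assumed for the averaged scale are hypothetical, since the text does not state the range of that scale.

```python
def reverse_score(value, low, high):
    """Flip an item so that high values always mean stronger identification."""
    return low + high - value


def averaged_scale(responses, reverse_items, low=1, high=5):
    """Average the items after reverse scoring the negatively worded ones.
    The 1-5 range is an assumption made for this sketch."""
    scored = [
        reverse_score(value, low, high) if name in reverse_items else value
        for name, value in responses.items()
    ]
    return sum(scored) / len(scored)


def summed_index(everyday_feeling, attachment):
    """Sum of two items measured on a 1 to 4 scale, giving a 2 to 8 index."""
    return everyday_feeling + attachment


# Hypothetical answers of one respondent (item keys are illustrative).
answers = {
    "consider_myself_european": 4,
    "regret_being_european": 2,      # reverse scored
    "glad_to_be_european": 5,
    "europe_is_worthless": 1,        # reverse scored
    "feel_good_about_europe": 4,
}
score = averaged_scale(
    answers, reverse_items={"regret_being_european", "europe_is_worthless"}
)
total = summed_index(everyday_feeling=3, attachment=4)
```

Both rules weight every item equally, which is precisely the assumption that a principal component or a fuzzy set approach relaxes.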
These experiences make clear that the social psychology approach mixing cognitive and affective/emotional dimensions is an accepted method. The most sophisticated method considered among these works is principal component analysis. This technique exploits a hypothetical association between a group of latent factors and the set of observed attributes or variables, and aims at identifying separate dimensions and which attribute is explained by each dimension. The two primary objectives of the technique are in line with the main aim of the search for a composite index: identification and data reduction. The first step of principal component analysis is the determination of the factor loadings. These can be interpreted both as regression parameters and as correlation coefficients. This way, the main (first) principal component can be interpreted as a composite measure of the optimally weighted variables under analysis. As in the previous section, we face a list of concerns: a reasonable fit of the model, weights (parameters) with the wrong sign, and the spatial stability of the parameters. Next we provide some insights, again by means of the 2016 Eurobarometer data. As above, we perform an exercise using an OLS regression and a GWR analysis.
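The reading of the first principal component as an optimally weighted composite can be illustrated with a short sketch. It is a generic example on synthetic data, not a replication of any of the cited studies: the function name, the four simulated items and their assumed dependence on a single latent factor are all hypothetical.

```python
import numpy as np


def first_principal_component(items):
    """Return weights, loadings, explained variance share and scores
    of the first principal component of the standardised items."""
    z = (items - items.mean(axis=0)) / items.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    weights = eigenvectors[:, -1]
    if weights.sum() < 0:  # the sign of an eigenvector is arbitrary
        weights = -weights
    # Loadings are the correlations between each item and the component.
    loadings = weights * np.sqrt(eigenvalues[-1])
    explained = eigenvalues[-1] / eigenvalues.sum()
    scores = z @ weights  # the composite measure for each respondent
    return weights, loadings, explained, scores


# Synthetic items driven by one latent factor (illustrative only).
rng = np.random.default_rng(1)
latent = rng.normal(size=500)
items = latent[:, None] * np.array([0.8, 0.7, 0.9, 0.6])
items = items + rng.normal(scale=0.5, size=(500, 4))

weights, loadings, explained, scores = first_principal_component(items)
```

Because `weights` is estimated once on the pooled sample, the same weights apply to every region, which is the source of the spatial stability concern raised above.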
Again we use two indicators for every dimension. For awareness we consider the following two questions: "Overall, to what extent do you think that in (COUNTRY) people are well informed or not about European matters?" and "Overall, to what extent do you think that you are well informed or not about European matters?", both of them on a 1 to 4 scale, which we turn into 0-1 dichotomous variables with a value of 1 for those declaring being Very well or Fairly well informed. The former question is used to build the "social awareness" indicator (also labelled awareness 1) and the latter the "individual awareness" indicator (or awareness 2). For attachment we consider the following item: "Please tell me how attached you feel to…", with answers referring to The European Union ("attachment 1") and Europe ("attachment 2"). Again both variables are collected on a 1 to 4 scale, which we turn into 0-1 dichotomous variables with a value of 1 for respondents who are Very or Fairly attached. All four indicators are considered at the aggregate regional level and can be interpreted as the share of respondents who are aware of or attached to Europe. The proposed empirical model relates the regional share of respondents reporting a European identity to these four indicators. The results are displayed in tables 3 and 4 and in the maps in figure 3. Again, some descriptive statistics are reported in appendix 1. The results show a much better fit than the model considering the civic-ethnic typology. Still, we find that the variables associated with awareness of Europe display non-significant parameters, which calls for some caution regarding the estimated models, as most models in the empirical literature point to the importance of cognitive aspects for social identification. We understand at this stage that this simple exercise developed at the aggregate level does not properly capture the importance of this variable, which we continue to consider in the remaining parts of the work.
As for the spatial variability of parameters, we find that all F tests report negative and large values, which we interpret as important evidence of spatial heterogeneity in our regression. Besides, looking at the maps in figure 3, we see significant negative parameters for the first indicator of awareness in northern European regions and wide areas of non-significance for most parameters of all four indicators.
These results call for a technique capable of accommodating heterogeneity in the construction of European identity, an aspect that we develop in the next section.

Empirical approach: fuzzy sets applied to European identity
In this section we adopt the Tajfel-based social psychology approach to define European identity and the definition derived by Bergbauer, according to which individual identification with Europe results from: citizens' awareness as Europeans; citizens' evaluation of their membership; and citizens' affective attachment to Europe and other Europeans.
These three aspects are assumed to be the constituents of European identity. In all dimensions, then, the higher the awareness, the better the evaluation and the stronger the affective attachment, the higher the identification with Europe will be.
Having adopted this theoretical framework, we propose the use of a fuzzy sets method to build a Composite Index of European Identity (CI_Eu_Ident). To the best of our knowledge, it has not been applied to the study of European identity. It is an adequate mathematical technique for analysing concepts for which membership of a set is hard to establish in all-or-nothing terms. The result provides a continuous index on the [0, 100] interval capturing partial membership. In our case, the index ranges from 0, that is, total absence of membership (no European identity at all), to 100, meaning full membership of European identity.
In brief, this technique captures in a single measure multifaceted concepts such as well-being, poverty or, in our case, identity. A classical set application would imply that individuals are either fully identified with a group or not identified with it at all. In a fuzzy set approach, an element is allowed to partially belong to a set; that is, to be partially identified with the idea of Europe. This implies that the transition from no identification to full identification takes place gradually. "Fuzzy reasoning aims, in fact, at providing models that mirror people's intuitions and thinking processes when confronted with fuzzy categories in reality" (Lelli, 2001, p. 6). What becomes relevant in this approach is not only the membership value associated with every alternative, but the fact that the final result implies an ordering of these alternatives. A number of technical approaches have been defined to evaluate the degree of membership: the distance approach estimates membership through similarity judgements, by defining optimal values; the frequency approach avoids any prior evaluation of the alternatives and stresses the role played by the social environment in measuring membership, as it takes the distribution of elements in society as its key reference. This option not only makes membership depend on the empirical evidence, but also requires adopting a standpoint and consequently assumes some sort of normative nature.
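The contrast between crisp membership and the two fuzzy approaches mentioned above can be made concrete with a small sketch. The three functions below are illustrative simplifications: the threshold, the "worst" and "best" reference values and the sample of answers are all hypothetical, and the frequency version uses the plain empirical cumulative distribution.

```python
def crisp_membership(score, threshold):
    """Classical set: full membership at or above the threshold, none below."""
    return 100.0 if score >= threshold else 0.0

def distance_membership(score, worst, best):
    """Distance approach: closeness to an a priori optimal value, on a 0-100 scale."""
    return 100.0 * (score - worst) / (best - worst)

def frequency_membership(score, sample):
    """Frequency approach: share of the sample scoring at or below `score`, 0-100."""
    return 100.0 * sum(1 for v in sample if v <= score) / len(sample)

# Hypothetical answers on a 0-10 identification scale.
answers = [2, 5, 5, 7, 8, 8, 8, 10]
for s in (5, 8):
    print(s,
          crisp_membership(s, threshold=6),
          distance_membership(s, worst=0, best=10),
          frequency_membership(s, answers))
```

The same answer can therefore receive different membership degrees depending on how the rest of the sample is distributed, which is what makes the frequency approach sensitive to the social environment.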
We believe that these characteristics make this approach particularly suitable for the analysis of European identity, given the importance of spatially varying frameworks described above. This approach overcomes some limitations that are present in the structure of other composite indices: it does not require defining subjective weights for the attributes; the weighting scheme is sensitive to the distribution of every attribute, assigning lower weights to less frequent characteristics; and, contrary to factor analysis, it does not depend only on the loadings, but also takes into account the environment in which individuals are, as the degree of membership takes into account the cumulated distribution of all dimensions and weights them by the frequency of membership of the group. This technique has been applied in a wide list of disciplines, including the social sciences, in areas such as well-being and poverty (see Lemmi & Betti 2006; Bérenger & Verdier-Chouchane 2007) or quality of work (Gómez et al., 2013; Agovino and Parodi, 2014). Nurmi and Kacprzyk (2007) review several applications of fuzzy sets in political science. They exemplify the use of these techniques in this discipline by mentioning that a fuzzy line exists in sovereignty from local governments to national bodies and supranational polities. In another example, related to voting, they assume that individual preferences are fuzzy and can subsequently be aggregated into collective fuzzy preferences. Besides, they declare that "using individual rather than collective preference relations as the point of departure enables us to define new solution concepts akin to the core, minimax set and least vulnerable set" (Nurmi and Kacprzyk, 2007, p. 283).
The main inputs of this technique are the indicators associated with the concept under analysis, in our case European identity. As the purpose of the PERCEIVE project is the analysis of the association between European identity and the perception of the Regional and Cohesion Policy, we consider the specific survey conducted for the project, as other surveys are neither designed nor intended to capture the impact of EU Cohesion Policy on citizens' identification with Europe. Besides, this survey is designed to report representative figures for a group of selected regions, a key aspect for us that is not covered by the Eurobarometer and that we exploit in the next subsections. A brief description of the PERCEIVE survey is reported in Appendix 1. In order to build the index we consider a list of variables included in the PERCEIVE survey. We use two variables resulting from this question. First, a variable of effectiveness of the EU, ranging from 1 to 3, with a positive graduation (the higher the better): eval_1a. Besides, we build a ratio of the perceived effectiveness of the European Union relative to that of national and regional/local governing bodies: eval_1b.

Q8. In general, do you think that (YOUR COUNTRY'S) EU membership is (NOT for UK).
We build a variable with three alternatives: 1-A bad thing; 2-Neither good nor bad or "not sure"; 3-A good thing: eval_2.
Q16. On a 0-10 scale, with '0' being that 'there is no corruption' and '10' being that corruption is widespread, how would you rate the European Union? 0-There is no corruption … 10-Corruption is widespread. We use two variables: one for the raw answer, recoded in a positive way (the higher the score the better, that is, less corruption): eval_3a; and another one capturing the relative perception of corruption between the EU and the national or regional/local governments, built as a ratio such that the higher the score, the lower the relative perception of European corruption: eval_3b. Finally, there is a specific question about identification with Europe, which we will use to test the validity of our approach:

Attachment
Q9. On a 0-10 scale, with '0' being 'I don't identify at all', and '10' being 'I identify very strongly', how strongly do you identify yourself with Europe?: 0-I don't identify at all … 10-I identify very strongly: Eu_ident.
The considered variables are the best proxies for the concepts of awareness, evaluation and attachment within the PERCEIVE survey, and we believe that they largely capture the spirit of the theoretical concepts behind European identification. We keep question Q9 (Eu_ident) as the reference question to evaluate whether the considered variables, and finally the resulting index, report results similar to the directly asked European identity.
First, then, we check whether all the considered variables of awareness, evaluation and attachment display the right sign and significance with European identity. As can be seen in table 5, all variables are defined positively, that is, the higher the score, the stronger the identification with Europe, and none of them reports a particularly strong correlation with European identity. In fact, there is a substantial amount of idiosyncratic information in this group of variables. We have performed a PCA (principal components analysis, not reported) of the eight considered variables and we find up to four components with an eigenvalue above unity; in fact, we need to consider these four dimensions to account for more than two thirds of the total variance. Finally, in a further PCA (again not reported for brevity) we also include the survey results for European identity. The first component of this analysis reports positive loadings for all considered variables, including that for identification with Europe.
The computation using fuzzy sets techniques and the eight considered variables results in the Composite Index of European Identification. In order to validate this index, we next provide some descriptive statistics. First, figure 4 displays the density function of the index together with the histogram of the European identification variable from the survey. As we can see, the two variables clearly do not share a similar shape: the Composite Index, which ranges from 0 to 100, displays a bell-shaped density, with a mode of 42.3, close to the median (42.6) and slightly below the average (44). This distribution is slightly skewed to the right (0.27), with a kurtosis statistic of 2.5 (slightly below the normality threshold of 3). On the contrary, the variable on citizens' identification with Europe resulting from the survey shows a negatively skewed distribution (-0.63), with the average (6.4) well below the median (7) and the mode (10). Clearly, the Composite Index reports less extreme values than the surveyed European identity. In order to check whether both indicators of European identification are associated, we computed the linear correlation statistic, which is positive and significant but not very high (0.31). Figure 5 plots the association between these two variables. Despite the large variability of the Composite Index for every value of the survey, the association is clearly positive, as for higher values of citizens' responses the median of the Composite Index is clearly larger.
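For readers who wish to reproduce this kind of shape statistic, the following sketch computes moment-based skewness and kurtosis in plain Python. It assumes the simple population-moment definitions (under which a normal distribution has kurtosis 3); statistical packages may apply small-sample corrections, and the data below are illustrative only.

```python
def shape_statistics(values):
    """Return (skewness, kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2     # 3 for a normal distribution
    return skewness, kurtosis

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric sample
long_right_tail = [1, 1, 2, 2, 3, 9]  # hypothetical right-skewed sample
print(shape_statistics(symmetric))
print(shape_statistics(long_right_tail))
```

A positive skewness, as reported for the Composite Index, indicates a longer right tail, whereas the negative value for the survey variable reflects the concentration of answers at the top of the 0-10 scale.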
Next, we confirm this positive association by regressing the Composite Index on the survey results for European identity together with demographic controls. Table 6 shows a list of columns sequentially including controls for gender, education, age, years residing in current region, employment, city size, income and even country fixed effects. Even though the estimated parameter slightly declines, it remains strongly significant throughout the alternative specifications.

Sensitivity analysis and robustness checks
Next, we perform a sensitivity analysis of the Composite Index by computing it again once every attribute is selectively removed. As above, we test our approach by looking at the partial correlations between the resulting Composite Index and the variable directly derived from the survey. Table 7 presents the coefficient of correlation between the original variable from the survey and the Composite Index considering all eight attributes or alternative combinations. We confirm that the correlation between the Composite Index based on eight attributes and citizens' identification with Europe resulting from the survey is 0.31 (first column). Once we remove the first attribute associated with awareness (awar_1) and build a new Composite Index based on just the seven remaining variables, the correlation with citizens' identification with Europe declines to 0.30 (second column). The same reading applies to all other columns, the last one being the only one based on just six attributes (once eval_1b and eval_3b are dropped). We can see that the exclusion of these two attributes slightly increases the correlation between the six-attribute Composite Index and the survey score of European identity. We recall that these variables were built in relative terms: relative effectiveness of European government and relative index of corruption, always defined in positive terms and comparing Europe with national and regional/local governing bodies.
These results could call for removing these two variables from the analysis. Nevertheless, we recall at this stage that our objective is not to reproduce the scores resulting from the survey, as we already have this information. The main objective of the reported analysis is to build a comprehensive indicator including the maximum amount of information resulting from a theoretical basis, that is, considering all dimensions behind the construction of a social identity such as citizens' identification with Europe. In this line, figure 6 plots the composite indices considering 8 and 6 attributes respectively. We see that for every value of the Composite Index built using 6 attributes we have a wide range of values of the Composite Index using 8 attributes.
We interpret this as evidence that the 8-variable Composite Index incorporates more information than the 6-variable option and consequently we prefer the richer version of the index. In line with the correlation analysis, the attributes measuring relative concepts yield non-significant (eval_1b) or even negative parameters (eval_3b), which in fact capture non-linearities in the evaluation of corruption, as one can see once we keep the relative measures (eval_1b and eval_3b) and remove the absolute ones (eval_1a and eval_3a) (not reported). A further interpretation of the negative signs can be given in the following terms: once the evaluation of the EU is controlled for, an improvement in the evaluation of local institutions results in lower levels of citizens' European identity.
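The leave-one-out logic of this sensitivity analysis can be sketched as follows. The sketch is deliberately generic: `placeholder_index` is a hypothetical stand-in (a plain average of the selected attributes), not the fuzzy-set index, and the respondents and scores are invented for illustration. Any function with the same signature could be passed as `build_index`.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def placeholder_index(rows, attributes):
    """Stand-in composite: plain mean of the selected attributes (NOT the fuzzy index)."""
    return [sum(row[a] for a in attributes) / len(attributes) for row in rows]

def leave_one_out(rows, survey, attributes, build_index=placeholder_index):
    """Correlation with the survey score for the full index and for each reduced index."""
    results = {"all": pearson(build_index(rows, attributes), survey)}
    for dropped in attributes:
        kept = [a for a in attributes if a != dropped]
        results["without " + dropped] = pearson(build_index(rows, kept), survey)
    return results

# Hypothetical respondents with attributes already rescaled to 0-1.
rows = [
    {"awar_1": 0.0, "eval_1a": 0.2, "eval_1b": 0.9},
    {"awar_1": 0.0, "eval_1a": 0.4, "eval_1b": 0.1},
    {"awar_1": 1.0, "eval_1a": 0.5, "eval_1b": 0.8},
    {"awar_1": 1.0, "eval_1a": 0.8, "eval_1b": 0.2},
    {"awar_1": 1.0, "eval_1a": 1.0, "eval_1b": 0.5},
]
survey = [2, 4, 6, 8, 10]  # hypothetical Eu_ident answers
results = leave_one_out(rows, survey, ["awar_1", "eval_1a", "eval_1b"])
for label, value in results.items():
    print(label, round(value, 3))
```

In this toy example, dropping the relative measure raises the correlation with the survey score, which mirrors the pattern discussed above; as argued in the text, a higher correlation alone is not a sufficient reason to discard an attribute.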

Description of the results
We finally provide a description of the Composite Index of European Identity according to a list of dimensions, including a comparison with the scores of citizens' identification with Europe in the PERCEIVE survey, assuming that the fitted association is over-fitted and should be read with care. We acknowledge that a proper association analysis should be done at the individual level. Still, due to the characteristics of our analysis, an aggregate analysis shows some features of the similarities between the composite index and the European identity survey's scores.
As for the country classification, we find that Eastern European countries are those with the highest levels of the Composite Index of European Identity (figure 7). On the contrary, the United Kingdom, France, Sweden and the Netherlands are the ones with the lowest scores. At the aggregate country level, we find a correlation coefficient of 0.65 with the European identity survey's scores.

Figure 7. European Identity by Country
By level of education (figure 8) we find that more educated citizens display higher levels of the Composite Index of European Identity. This is in line with what we find for results of the Survey, with a very strong correlation with the Composite Index (0.97). A similar picture is found by household income (figure 9), with higher values of European identification for wealthier citizens and with a strong association between both considered scores. In figure 10 we find that larger cities, usually with more educated and wealthier people, are the ones with higher levels of the Composite Index of European Identity, again with a strong association with the results of the Survey.

Concluding remarks
In this work, we build a Composite Index of European Identification. We have analysed the main theoretical backgrounds explaining citizens' identification with Europe. We have reviewed the approach grounded on alternative types of identities that have been defined in the literature: the civic against the ethnic-cultural constructions. The main disadvantage of this alternative is the competing component of the ethnic approach, as it is more likely to be linked with national and local identities than with supranational ones. In fact, we have found negative associations between individual definitions of the ethnic dimension of identity and citizens' identification with Europe using the 2016 Eurobarometer dataset at the aggregate regional level.
We have also studied Tajfel's approach of social psychology theory. According to this framework, social identity results from individuals' self-awareness together with the value and emotional connotation derived from this membership. Consequently, awareness, emotional attachment and evaluation are the three main axes on which identification is grounded. This second alternative is found to be much more theoretically consistent with the data, as all axes are found to be positively associated with European identity. Besides, aggregate regressions show a much stronger association with citizens' self-reported identification with Europe.
Nevertheless, we have found that neither the social psychology theory nor the civic-ethnic approach reports spatially homogeneous results over Europe. In fact, using Geographically Weighted Regression analysis we have shown that European identity is formed in multiple ways. This is a clear warning against techniques such as regression analysis or principal components, which in practice report results with parameters or factor loadings that are constant for all spatial units.
The proposed solution in this work is the use of fuzzy set techniques. This option is flexible enough to report an index of identification with Europe at the individual level. Besides, it does not require defining subjective weights, the weighting scheme is sensitive to the distribution of every attribute and, more importantly, it takes into account the environment in which individuals are, as the degree of membership takes into account the cumulated distribution of all dimensions and weights them by the frequency of membership of the group. To the best of our knowledge, it has not been applied to the study of European identity, which makes this report innovative from a technical point of view.
The final outcomes of the Composite Index of European Identity provide an alternative specification to the self-reported perception given by individuals in the PERCEIVE survey. In this work, we have reported a brief description of the index, which reports higher values in Eastern European countries and Spain, and lower average figures in the United Kingdom, France, Sweden and the Netherlands. The index clearly increases with education, income and city size, and does not display any first-sight association with respondents' age and gender.
The computed synthetic indicator has also allowed us to present an alternative ranking of citizens' identification with Europe. Similarly to what happens with the direct answer in the PERCEIVE survey (the correlation with the index for the nine regions is 0.87), convergence regions display higher values than competitiveness regions. Besides, in those regions with higher average levels of the computed index we find lower values in rural areas, a fact that is not found, though, in regions less identified with Europe, where the results are quite balanced across all areas.
We understand that the Composite Index is an important complement to the survey's scores, as it captures the grounds on which social identification is built. It will thus enrich further research and consequently give a more robust picture of the determinants of European identity, and particularly of the role played by regional policies. We finally acknowledge that some new communication policies should be promoted by the European Commission in order to foster citizens' identification with the European project, in line with the recommendations by Molica and Salvai (2019).

OSM 1. Geographically weighted regressions
Geographically Weighted Regression (GWR) is a technique that aims at explicitly modelling non-stationarity in regression analysis involving spatial data. In particular, it allows for spatial variation of the estimated parameters. GWR has been used, among others, in the analysis of house prices, rural poverty, migration patterns, the role of microenterprises in economic growth, the wage curve, and the impact of quality of life on population growth.
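The GWR model expands a traditional OLS regression model such as:

$$y_i = \beta_0 + \sum_{k} \beta_k x_{ik} + \varepsilon_i$$

into a model in which the parameters are specific to every location $i$:

$$y_i = \beta_0(i) + \sum_{k} \beta_k(i)\, x_{ik} + \varepsilon_i, \qquad \hat{\beta}(i) = (X' W_i X)^{-1} X' W_i\, y,$$

where $W_i$ is a diagonal matrix whose elements $w_{ij}$ apply a heavier weight to observations closer to location $i$. For instance, using a Gaussian scheme, the weight of observation $j$ in the regression linked to location $i$ is $w_{ij} = \exp\left(-(d_{ij}/h)^2\right)$, where $d_{ij}$ is the distance between observations $i$ and $j$, and $h$ is the bandwidth, which can be estimated using alternative methods, such as cross-validation and those based on the likelihood, such as AIC or BIC.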
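A minimal sketch of the idea, in plain Python, is given below. It assumes a single explanatory variable, Euclidean distances, a fixed (not estimated) bandwidth and the Gaussian weight exp(-(d/h)^2); the coordinates and data are hypothetical and chosen so that the true slope changes from west to east. It is meant only to show the mechanics of a local weighted regression, not to replace a dedicated GWR package.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance from the focal location."""
    return math.exp(-((distance / bandwidth) ** 2))

def local_fit(i, coords, x, y, bandwidth):
    """Weighted least squares of y on x (with intercept) centred on location i."""
    w = [gaussian_weight(math.dist(coords[i], c), bandwidth) for c in coords]
    total = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / total
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / total
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def gwr(coords, x, y, bandwidth):
    """One (intercept, slope) pair per location."""
    return [local_fit(i, coords, x, y, bandwidth) for i in range(len(coords))]

# Hypothetical regions on a west-east line: slope 1 in the west, slope 3 in the east.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 6.0, 9.0]

local = gwr(coords, x, y, bandwidth=1.0)         # parameters vary over space
global_like = gwr(coords, x, y, bandwidth=1e6)   # huge bandwidth: close to plain OLS
print([round(s, 3) for _, s in local])
print([round(s, 3) for _, s in global_like])
```

With a small bandwidth the estimated slopes differ markedly between the two ends of the line, whereas a very large bandwidth gives every observation almost the same weight and the local estimates collapse to the single OLS slope.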

OSM 2. Using Fuzzy Sets to build a composite index
For simplicity we follow a formulation in which $X$ is the universe of $N$ individuals represented by $x_i$. These individuals have a number of characteristics $j = 1, \dots, J$. Let $A$ be a fuzzy subset of $X$. If $x_i \in A$, citizen $i$ can be labelled as having identification with Europe. If the degree of attachment of $x_i$ to $A$ can be stated as a function $\mu_A$, taking values in the interval $[0, 100]$, then $A$ is a fuzzy set. The function $\mu_A$ is such that: $\mu_A(x_i) = 0$ if $x_i$ does not belong to $A$; $\mu_A(x_i) \in (0, 100)$ in case of partial membership of subset $A$, which can imply total or limited identification in just several attributes; and $\mu_A(x_i) = 100$ if $x_i$ has total membership of $A$.
$\mu_A(x_{ij})$ provides an individual measure of identification for attribute $j$, with the single condition of lying in the range between 0 and 100. The cumulated distribution of every attribute is recommended for evaluating the degree of membership, and the overall membership is then obtained as:

$$\mu_A(x_i) = \frac{\sum_{j=1}^{J} w_j\, \mu_A(x_{ij})}{\sum_{j=1}^{J} w_j},$$

$J$ being the number of dimensions and $w_j$ the matching weight:

$$w_j = \ln\left[\frac{1}{\frac{1}{N}\sum_{i=1}^{N} \mu_A(x_{ij})}\right],$$

where the denominator denotes the fuzzy proportion of individuals with some degree of identification in attribute $j$ (with memberships expressed as shares of full membership). The final result of the fuzzy sets method is an individual measurement of identification. The further aggregation for different characteristics of the data set (age, gender, political definition, country, etc.) allows describing the European identification of every subgroup.
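The whole procedure can be sketched in a few lines of plain Python. The sketch rests on several assumptions that go beyond the text: memberships are computed from the cumulated distribution of each attribute, normalised so that the lowest observed category gets 0 and the highest gets 1; weights are the logarithm of the inverse of the average membership; and the final index is rescaled to 0-100. Variable names and data are hypothetical, and the actual computation behind the Composite Index may differ in its details.

```python
import math

def membership(values):
    """Normalised cumulative-frequency membership in [0, 1] for one attribute."""
    n = len(values)
    levels = sorted(set(values))
    if len(levels) < 2:
        raise ValueError("attribute must take at least two distinct values")
    cumulative = {}
    count = 0
    for level in levels:
        count += sum(1 for v in values if v == level)
        cumulative[level] = count / n          # empirical cumulated distribution
    lowest = cumulative[levels[0]]
    return [(cumulative[v] - lowest) / (1.0 - lowest) for v in values]

def composite_index(attributes):
    """attributes: {name: list of scores}. Returns (index on 0-100, weights)."""
    mu = {name: membership(vals) for name, vals in attributes.items()}
    # Weight: log of the inverse of the fuzzy proportion (average membership).
    weights = {name: math.log(1.0 / (sum(m) / len(m))) for name, m in mu.items()}
    total = sum(weights.values())
    n = len(next(iter(mu.values())))
    index = [100.0 * sum(weights[a] * mu[a][i] for a in mu) / total
             for i in range(n)]
    return index, weights

# Five hypothetical respondents and three attributes (higher = more identification).
attributes = {
    "awar_1": [0, 1, 1, 0, 1],
    "eval_2": [1, 2, 3, 3, 3],
    "eval_3a": [0, 4, 6, 8, 10],
}
index, weights = composite_index(attributes)
print([round(v, 1) for v in index])
print({name: round(w, 3) for name, w in weights.items()})
```

In this example the first respondent, who is in the lowest category of every attribute, obtains an index of 0, and the last one, in the highest category of every attribute, obtains 100; the remaining respondents fall in between according to their position in each distribution and the weight of each attribute. Averaging the individual values within any subgroup yields the subgroup figures described above.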