Your Language or Mine? The Noncommunicative Benefits of Language Skills

Do languages matter beyond their communicative benefits? We explore the potential role of preferences over the language of use, theoretically and empirically. We focus on Catalonia, a bilingual society where everyone is fully proficient in Spanish, to isolate linguistic preferences from communicative benefits. Moreover, we exploit the language-in-education reform of 1983 to identify the causal effects of language skills. Results indicate that the policy change has improved the Catalan proficiency of native Spanish speakers, which in turn increased their propensity to find Catalan-speaking partners. Hence, the acquisition of apparently redundant language skills has reduced endogamy.


Introduction
The days when most human beings could go through life using exclusively their native language are long gone. The latest wave of globalization, and the Internet in particular, has dramatically increased individuals' exposure to multiple languages.
It has been estimated that more than one-half of the world's population speak more than one language (Tucker, 2001). Thus, it is not surprising that multilingualism is attracting a great deal of attention, also among economists. Indeed, economic research has clearly established that language skills matter for economic outcomes.
For instance, it has been shown that sharing a common language promotes international trade (e.g., Frankel and Rose, 2002; Melitz, 2008; Egger and Lassmann, [...] [...]burgh and [...]. They note that preserving linguistic diversity involves non-negligible costs. However, individuals tend to develop some kind of emotional attachment to the language that better defines their identity; therefore, limiting the number of languages also generates losses. Hence, policy makers should pay attention both to the role of languages as means of communication and to their subjective, emotional aspects.
The relevance of the non-communicative aspects of languages can also be inferred from two other strands of the economics literature. First, several studies (including Alesina et al., 2003, and Desmet et al., 2012) use language as a proxy for ethnicity or culture in order to examine the effects of ethnic or cultural diversity on civil conflict and redistribution. Second, certain language characteristics have been linked to values and economic behavior. In particular, Chen (2013) shows that languages that grammatically associate the future and the present foster forward-looking behavior. In a similar spirit, Gay et al. (2013) demonstrate that women speaking languages that more pervasively mark gender distinctions are less likely to participate in economic and political life. 2
In this paper we examine the non-communicative aspects of languages both theoretically and empirically. In contrast to the existing literature, we focus on the effects of acquiring a second language. In particular, we show that the acquisition of language skills that are redundant from a communicative viewpoint can significantly influence the pattern of social interactions, undermining endogamic behavior.
We interpret such non-communicative effects as arising from a broad notion of linguistic preferences: most individuals develop an emotional attachment to their native language and, even if fully bilingual, prefer to use it over their second language.
Clearly, linguistic preferences may also emerge from the ties between language and culture, and reflect ethnic or political identity. In any case, it is important to note that our theory focuses on the effect of language skills on social behavior, taking preferences as exogenous. Nevertheless, the interpretation of the empirical results may also depend on the nature of preferences, and hence we take this issue up again in Section 6.
More specifically, we first provide a theoretical framework that illustrates a new channel by which the distribution of language skills in a bilingual society affects the pattern of social interactions. We build on standard theory and assume that sharing a common language enhances economic and social interactions. 3 On top of this, we assume that even fully bilingual individuals have a preference for using their native language or the language adopted as their own in later stages. 4 We model a bilingual society with an initial asymmetric distribution of language skills: all native speakers of the weak language are bilingual, with full command of both the strong and the weak language, but most native speakers of the strong language are either monolingual or only partially proficient in the weak language. 5 Thus, all agents share a common language, and hence the role of linguistic preferences can be isolated from the communicative benefits. Cooperation (trade partnerships, marriages, etc.) requires communication and hence the use of a particular language. Such a choice is trivial when all partners belong to the same speech community. However, in the case of mixed partnerships, individuals with strong linguistic preferences may reject optimal partners (in terms of non-linguistic dimensions) and instead match with less desirable, but linguistically homogeneous, partners. In other words, the formation of mixed partnerships requires a satisfactory resolution of a linguistic conflict. The crucial observation is that the intensity of the conflict varies with language skills.
2 See also Galor et al. (2016) and their list of references.
In particular, as native speakers of the strong language improve their skills in the weak language: (i) the frequency of mixed partnerships increases, (ii) the use of the weak language also increases.
It is important to note that, if we abstract from learning costs, such an improvement in language skills increases total surplus. That is, the promotion of language skills that do not expand the ability to communicate generates social benefits that need to be weighed against the learning costs. Thus, policies that promote minority languages can be justified not only in terms of fairness (Van Parijs, 2011) but also, under some conditions, on efficiency grounds. The intuition behind these benefits is that the equilibrium rate of mixed partnerships is inefficiently low, because individuals do not internalize the negative externalities inflicted on their potential partners when they unilaterally decide to match with an inferior but linguistically homogeneous partner. Thus, the increase in mixed partnerships generated by the additional language skills is bound to raise total surplus.
3 See, for instance, Selten and Pool (1991), Church and King (1993), and Weber et al. (2011).
4 Some kinds of linguistic preferences have already been introduced in a variety of economic frameworks. See, for example, Grin (1992), Wickström (2005), Caminal (2010), and Melitz (2012). Our main focus is on how language skills and preferences affect cooperation between speech communities.
5 The relative strength of the two languages does not necessarily reflect the relative size of their local speech communities. A language may be strong because of its status and prestige, or because it is widely spoken outside the country or region (think of Russian in Latvia, or English in Quebec) and hence incentives to learn it may surpass its local communicative benefits. See the next section for precise definitions.
Next we empirically test these predictions using survey data from the particular but very fitting case of Catalonia (Spain). Two main reasons make Catalonia a unique test field. First, it is a bilingual society (Spanish and Catalan are the two main languages) where the ability to communicate is not at stake because everyone speaks the strong language (Spanish), just as in the theoretical model.
Hence, any implications of additional language skills must be attributed to linguistic preferences. Second, new language-in-education policies were introduced three decades ago, after the approval in 1983 of the Language Normalization Act (LNA).
With the implementation of this reform, education experienced a smooth transition from a system in which Catalan was excluded to one in which Catalan has become the main language of instruction in compulsory education. This reform led to a significant improvement of the Catalan skills of native Spanish speakers, whereas all other language skills remained basically unchanged. 6 Hence, the heterogeneous effect of language exposure during compulsory education allows us to generate quasi-experimental variation in the variables of interest.
The main goal of the empirical analysis is to study the influence of improved language skills among native speakers of the strong language (Spanish) on their propensity to form a linguistically mixed couple and on the use of the weak language (Catalan) with the partner. 7 In order to identify the causal effect, we exploit an instrumental variable based on the differential effect by native language of exposure to Catalan as a language of instruction during compulsory schooling. Compulsory language exposure was already considered as an exogenous determinant of identity formation by Clots-Figueras and Masella (2013) in a reduced-form framework. 8 Here, we exploit the interaction between compulsory exposure and the indicator for being a native Spanish speaker as the identifying variable in a Two-Stage Least Squares (2SLS) setting. This exclusion restriction captures the improvement in oral fluency in Catalan among native Spanish speakers that was induced by reform exposure during compulsory schooling. The main underlying assumption behind the validity of this identification strategy is that non-linguistic cohort effects are common to both linguistic communities (in the spirit of the identification strategy originally proposed by Bleakley and Chin, 2004, 2010). Several robustness checks and falsification exercises are carried out in order to validate the use of such an exclusion restriction.
6 We are referring to oral skills, which are the most relevant regarding the formation of a couple. As discussed in Section 4, written skills in Catalan improved for both native Spanish and native Catalan speakers, although much less so for the latter group, and Spanish skills remained at very high levels for both speech communities.
7 It has been shown (Bleakley and Chin, 2010; Furtado and Theodoropoulos, 2011; and Chiswick and Houseworth, 2011) that the frequency of inter-ethnic marriages among US immigrants is positively affected by English-speaking ability. See also Meng and Meurs (2009) for the case of France. Since proficiency in the strong language varies a lot from individual to individual, these studies cannot distinguish between linguistic preferences and communicative benefits.
8 Thus, they study the effects of the same education reform, but focus on a different topic and use a different dataset. They find that attending compulsory schooling after the LNA reform reinforces individuals' self-identification as "Catalans". See also Aspachs et al. (2008). In Sections 5 and 6 we discuss whether identity considerations matter in interpreting our empirical results.
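The logic of this identification strategy can be illustrated with a deliberately simplified sketch. It assumes a binary exposure indicator and only four cells (native language by exposure); with the main effects of native language and exposure as controls, the 2SLS estimate in such a saturated setting equals the ratio of two difference-in-differences. All cell means below are hypothetical numbers chosen for illustration, not estimates from the data, and the function names are ours.

```python
# Hypothetical cell means (illustration only, not results from the paper).
# Keys: (native language, exposure to the reform during compulsory schooling).
skill = {  # mean oral Catalan proficiency on a 0-10 scale
    ("spanish", "unexposed"): 6.0, ("spanish", "exposed"): 8.0,
    ("catalan", "unexposed"): 9.5, ("catalan", "exposed"): 9.5,
}
outcome = {  # share with a Catalan-speaking partner
    ("spanish", "unexposed"): 0.20, ("spanish", "exposed"): 0.30,
    ("catalan", "unexposed"): 0.80, ("catalan", "exposed"): 0.82,
}

def did(cell_means):
    """Difference-in-differences: change for native Spanish speakers
    minus change for native Catalan speakers (the common cohort effect)."""
    spanish = cell_means[("spanish", "exposed")] - cell_means[("spanish", "unexposed")]
    catalan = cell_means[("catalan", "exposed")] - cell_means[("catalan", "unexposed")]
    return spanish - catalan

def wald_iv(skill_means, outcome_means):
    """Reduced form divided by first stage for the interaction instrument."""
    first_stage = did(skill_means)      # effect of the instrument on skills
    reduced_form = did(outcome_means)   # effect of the instrument on the outcome
    return reduced_form / first_stage

print("first stage:", did(skill))
print("IV estimate per skill point:", wald_iv(skill, outcome))
```

The subtraction of the native Catalan change is where the common-cohort-effects assumption enters: any trend shared by both communities cancels out of both the first stage and the reduced form.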
Our results are in line with the theoretical predictions. In particular, the 1983 education reform, by improving the oral Catalan skills of native Spanish speakers, raised their propensity to find a Catalan-speaking partner and to speak Catalan with the partner. These results are robust to a battery of sensitivity checks, and clearly indicate that linguistic preferences are relevant. In particular, the acquisition of language skills that appear redundant from a communicative viewpoint can significantly reduce segregation.
In the next section we lay out the theoretical framework and derive two testable hypotheses. In Section 3 we provide some historical background and describe the data. Section 4 discusses some descriptive evidence. The main results as well as the robustness and sensitivity tests are presented in Section 5. Finally, Section 6 summarizes the paper and discusses alternative interpretations.

The theory
Consider a country with two languages, A and B. A fraction $\alpha$ of the population is initially socialized in A (they are native A speakers), and a fraction $1 - \alpha$ in B (native B speakers). Everyone is fully competent in their mother tongue. These two languages differ in their status and knowledge. In particular, all native B speakers are also fully proficient in language A, but only some native A speakers are proficient in language B. Because of the (domestically) universal knowledge we call language A the strong language, and B the weak language. Perhaps these asymmetric language skills are induced by the fact that A is widely known in the rest of the world and hence very useful for communicating with foreigners. 9 In any case, we take language skills as exogenous, and the identification of a language as strong or weak as country-specific. Thus, a particular language can be weak in one country or region and strong in another. 10 In spite of the universal knowledge of the strong language, the existence of different speech communities (defined according to native languages) still matters because individuals develop a preference towards their initial language, as specified below.
9 Another reason could be that knowledge of A provides access to an abundant supply of media outlets and leisure goods produced in that language.
Individuals derive utility from forming partnerships with other compatriots (e.g., trade partnerships, couples). 11 In particular, each individual can match with a single person. The level of utility obtained from a partnership depends on linguistic as well as non-linguistic factors. With respect to the latter, for each agent $i$ there is a single best match, $j$, which is reciprocal (so that $j$'s best match is also $i$). The best match generates, for each partner, a level of utility $g_{ij} > 0$ (pair-specific). For simplicity, we assume that all other potential matches provide the same level of utility, which is normalized to zero.
The activities of the partnership require communication, and hence the use of a particular language. Everyone has a preference for using their native language.
Hence, if the two members of a best match belong to the same speech community, 12 then nothing prevents the formation of the best match, since each partner obtains $g_{ij}$, which is higher than any alternative. However, if they belong to different speech communities (a mixed match), then language preferences can prevent the formation of the best match. More specifically, let individual $a$ be the native A speaker, and $b$ the native B speaker of a mixed match. If they form the partnership and choose A as the language of communication, then $a$ and $b$ would obtain a payoff of $g_{ab}$ and $g_{ab} - w_b$, respectively. That is, individual $b$ incurs a cost $w_b$ for using their second language. Individuals differ in the intensity of their linguistic preferences.
In particular, $w_b$ is the realization of a random variable $w$ distributed over some interval $[0, \bar{w}]$ with density function $f(w)$ and distribution function $F(w)$. We assume that $f(w) > 0$ for all $w \in [0, \bar{w}]$ and there are no mass points. If instead they choose B, then their payoffs would be $g_{ab} - \delta_a - w_a$ and $g_{ab}$, respectively.
10 The universal knowledge of the strong language guarantees communication, independently of the knowledge of the weak language. The model literally applies to cases like Catalonia, Wales or the Basque Country. However, in other cases like Belgium or Quebec some speakers of the weak language (Flemish and French, respectively) remain monolingual. The model can be easily extended to take into account a fraction of monolingual speakers of B. In that case, language skills will affect segregation not only through linguistic preferences but also through changing the ability to communicate.
11 For simplicity, we ignore potential foreign partners.
12 If everyone has the same probability of being $i$'s best match, independently of their native language, then the probability of a linguistically homogeneous best match is $\alpha$ for a native A speaker and $1 - \alpha$ for a native B speaker.
That is, if individual $a$ uses B instead of A, this incurs an extra cost of $w_a + \delta_a$, where $w_a$ again represents the cost of using $a$'s second language (pure preference), whereas $\delta_a \ge 0$ represents the disutility caused by limited proficiency in the second language. Hence, individuals with a better command of B have lower values of $\delta_a$.
For simplicity, we assume that both speech communities have identical distributions of pure preferences. That is, $w_a$ and $w_b$ are two independent realizations of the random variable $w$. Whereas $w$ is a fixed individual characteristic, $\delta_a$ varies as $a$ becomes more proficient in B. 13 The value of the outside option for both partners is 0, since there is always a member of their own speech community among their second-best partners.
Given the set of values $(g_{ab}, w_a, \delta_a, w_b)$, the two potential members of a mixed match must decide whether or not to form the partnership, and the language of use in case they do. Our main qualitative results rely on the existence of some kind of bargaining friction. For expositional convenience, we consider the following environment. First, partners negotiate under full information about the relevant parameters. Second, if both parties agree on forming the partnership, then they choose the language that maximizes the joint surplus. Thus, the only friction is the absence of monetary compensations (non-transferable utility). At the end of this section we discuss some alternative frameworks that provide very similar insights and qualitatively identical comparative statics and welfare results.
Hence, in our setup $a$ will accept forming the partnership and using B only if $g_{ab} - \delta_a - w_a \ge 0$. Similarly, $b$ will accept using A only if $g_{ab} - w_b \ge 0$. These two participation constraints imply that in equilibrium the coalition will be formed if and only if
$$\min\{\delta_a + w_a,\; w_b\} \le g_{ab}.$$
Thus, individuals do not internalize the negative externality imposed on their potential partners in case they unilaterally decide not to form the partnership.
Therefore, if decisions were instead taken by a social planner aiming at maximizing total surplus (first best), then the best match would be formed if and only if
$$\min\{\delta_a + w_a,\; w_b\} \le 2g_{ab}.$$
Figure 1a depicts the equilibrium outcome (i.e., when individuals are allowed to unilaterally reject the best match), for the case $\bar{w} > 2g_{ab}$. The region marked with N (no best match) corresponds to the case where one of the parties prefers not to make the match. Regions marked with A and B correspond to the cases where the partnership is formed and that particular language, A or B, is selected. Figure 1b represents the socially efficient outcome (the solution that maximizes total surplus). Comparing the two figures, it becomes apparent that there is a region of parameter values for which the best match is not formed in equilibrium but should form according to the first best. 14 In order to avoid uninteresting technical issues, in the rest of the exposition we will focus on the case that $g_{ab} = g$ and $\delta_a$ is distributed on $[\underline{\delta}, \bar{\delta}]$, with a density function that takes strictly positive values in this interval and has no mass points.
Moreover, $\delta_a$ and $w_a$ are assumed to be independent variables. It will be convenient to first compare two extreme scenarios. Suppose first that $\underline{\delta} \ge g$ (Scenario 0). That is, all $a$s are essentially monolingual. In this case, B will never be used in a mixed match, and hence the best match will be formed if and only if $w_b \le g$. Alternatively, suppose now that all $a$s are fully competent in B, i.e., $\bar{\delta} = 0$ (Scenario 1). In this case, the two languages are in a symmetric position, which generates a symmetric outcome: each language is used with a fifty percent chance. Moreover, the fraction of best matches that materialize is higher than in Scenario 0. That is: (i) if $w_b \le g$, as in Scenario 0, all best matches happen; moreover, (ii) if $w_b > g$, then those matches where $w_a \le g$ also materialize.
The comparative statics are analogous if we consider gradual but general changes in $\delta_a$. More specifically, for all $\alpha \in (0, 1)$, if we start from a situation where $\underline{\delta} < g$ (i.e., a positive fraction of $a$s are willing to make the best match and use B) and there is a shift in the distribution of the $\delta_a$s such that the final distribution is first-order stochastically dominated by the initial distribution, then:
Result 1 (i) the fraction of successful mixed matches increases, and (ii) B is used more often in those matches.
See the Appendix for details.
14 Instead of choosing between A and B, we could have allowed linear combinations of the two languages, assuming, for instance, that individual utility decreases linearly with the fraction of time in which the second language is used. The qualitative results would remain unchanged.
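The matching rule and its comparative statics can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: it assumes $w$ is uniform on $[0, \bar{w}]$, writes the proficiency cost of a native A speaker as `delta_a` (a name chosen for this sketch), draws it uniformly on an interval, and reuses the same random draws across scenarios so that a downward shift in `delta_a` is a pointwise shift.

```python
import random

def outcome(g, w_a, delta_a, w_b, threshold_factor=1):
    """Return 'A', 'B', or 'N' (no match) for a mixed best match.

    threshold_factor=1 gives the equilibrium with non-transferable utility;
    threshold_factor=2 gives the surplus-maximizing (planner) rule.
    """
    cost_if_b = delta_a + w_a   # borne by a when language B is used
    cost_if_a = w_b             # borne by b when language A is used
    if min(cost_if_a, cost_if_b) > threshold_factor * g:
        return "N"
    # The language with the lower cost maximizes joint surplus.
    return "B" if cost_if_b < cost_if_a else "A"

def simulate(g, w_bar, delta_low, delta_high, n=20000, seed=0):
    """Share of mixed best matches formed, and share of those using B."""
    rng = random.Random(seed)
    matched = used_b = 0
    for _ in range(n):
        w_a = rng.uniform(0, w_bar)
        w_b = rng.uniform(0, w_bar)
        delta_a = delta_low + rng.random() * (delta_high - delta_low)
        result = outcome(g, w_a, delta_a, w_b)
        matched += result != "N"
        used_b += result == "B"
    return matched / n, (used_b / matched if matched else 0.0)

# Essentially monolingual A speakers versus fully proficient ones.
print("low proficiency :", simulate(1.0, 3.0, 1.2, 2.0))
print("full proficiency:", simulate(1.0, 3.0, 0.0, 0.0))
```

Calling `outcome(..., threshold_factor=2)` on the same draws shows the cases in which the planner would form a match that the equilibrium rejects.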
Result 1 contains the main hypothesis we want to test in the empirical analysis.
That is, an exogenous improvement in the proficiency in the weak language on the part of native speakers of the strong language reduces segregation and fosters the use of the weak language.
We can now investigate the welfare consequences of such a change in language skills. First, we focus again on the two extreme scenarios. If all $a$s are monolingual (Scenario 0), then the average payoffs to the $a$s and $b$s, when their best match is linguistically mixed, are given by:
$$U_a^0 = F(g)\,g,$$
$$U_b^0 = F(g)\,g - \int_0^g w\, f(w)\, dw.$$
Thus, the best match will materialize with probability $F(g)$, in which case each party obtains $g$. However, the $b$s bear all the costs of using their second language.
That is, in Scenario 0, bilinguals are worse off than monolinguals.
Alternatively, if all $a$s are also fully competent in B (Scenario 1), then the average payoffs are
$$U_b^1 = F(g)\,g - \int_0^g \left[ \int_0^{w_a} w_b\, f(w_b)\, dw_b \right] f(w_a)\, dw_a + [1 - F(g)] \int_0^g (g - w_b)\, f(w_b)\, dw_b,$$
with $U_a^1$ given by the same expression with the roles of $a$ and $b$ interchanged. Consider $b$'s expected utility (it is symmetric for the $a$s). With probability $F(g)$, $w_a < g$; the match is feasible and each member obtains $g$, which explains the first term of the above expression. However, in this region, $b$ incurs the cost of using A whenever $w_b < w_a$, which is the second term. Also, $w_a > g$ with probability $1 - F(g)$. In this case, the match is feasible only if $w_b < g$, in which case $b$ always incurs the full costs of using A, which is the third term.
Note that the $b$s are better off in Scenario 1: $U_b^1 > U_b^0$. Also, the total surplus is higher in Scenario 1: $U_a^1 + U_b^1 > U_a^0 + U_b^0$. 15 However, the $a$s may be better off in Scenario 0 or in Scenario 1: $U_a^1 \gtrless U_a^0$. The reason for this ambiguity is the following. Compared to Scenario 0, in Scenario 1, on the one hand, $a$ benefits from the higher frequency of successful best matches, which increases from $F(g)$ to $F(g)[2 - F(g)]$. On the other hand, they lose their power to impose their preferred language, and have to bear half of the costs of using their second language. 16 In other words, even abstracting from learning costs, native A speakers may or may not benefit from learning B. In contrast, native B speakers always benefit from this change, since on top of the higher frequency of successful best matches, they enjoy a better language treatment. 17 Finally, the total surplus is always higher in Scenario 1.
That is, in case native A speakers lose, they lose less than the amount gained by native B speakers. The reason is twofold. Scenario 1 generates: (i) a higher rate of occurrence of best matches, and (ii) a reduction in the total discomfort from using the second language, since B can now be used whenever $w_a < w_b$. 18 In the Appendix we show that the same comparative statics hold for gradual but general changes in $\delta_a$. That is, for all $\alpha \in (0, 1)$, if we start from a situation where $\underline{\delta} < g$, and there is a shift in the distribution of the $\delta_a$s such that the final distribution is first-order stochastically dominated by the initial distribution, then:
Result 2 (i) Native B speakers are better off, (ii) native A speakers may be better off or worse off, and (iii) aggregate welfare increases.
Thus, if we abstract from learning costs, an exogenous improvement in the proficiency in the weak language among native speakers of the strong language raises total welfare. However, it may also have non-trivial distributional implications.
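The welfare comparison between the two extreme scenarios can be checked with a small worked example. The sketch below assumes that $w$ is uniform on $[0, \bar{w}]$ with $g \le \bar{w}$ and uses closed forms that we derived from the matching rule described above (match if and only if the lower language cost does not exceed $g$, and the lower-cost language is used); the function name and the parameter values are illustrative choices, not taken from the paper.

```python
def payoffs_uniform(g, w_bar):
    """Expected payoffs in a mixed best match when w ~ Uniform[0, w_bar].

    Returns (u0_a, u0_b, u1): Scenario 0 payoffs of a and b, and the
    common Scenario 1 payoff (the two sides are symmetric there).
    """
    x = g / w_bar                      # F(g), probability that w <= g
    u0_a = x * g                       # a never bears a language cost
    u0_b = x * g - g * x / 2           # b bears w_b whenever w_b <= g
    u1 = (x * g                        # own partner's cost below g
          - g * x ** 2 / 6             # own cost borne when it is the lower one
          + (1 - x) * g * x / 2)       # partner's cost above g, own cost below g
    return u0_a, u0_b, u1

for w_bar in (1.1, 1.5, 2.0):
    u0_a, u0_b, u1 = payoffs_uniform(1.0, w_bar)
    print(f"w_bar={w_bar}: U0_a={u0_a:.4f} U0_b={u0_b:.4f} U1={u1:.4f} "
          f"surplus gain={2 * u1 - u0_a - u0_b:.4f}")
```

Under these assumptions the sign of the change for native A speakers flips at $F(g) = 3/4$, while native B speakers and total surplus gain for every admissible $\bar{w}$, in line with the ambiguity described in the text.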
The model presented in this section is highly stylized. In the working paper version (Caminal and Di Paolo, 2015) we discuss various possible extensions and interpretations. None of these additional considerations affects the main message.
In particular, one may argue that the assumption of non-transferable utility could be highly restrictive in some applications. If we allowed for monetary compensations then we would need to invoke informational asymmetries (on linguistic preferences, for example). As is well known, bargaining under asymmetric information results in excessively frequent break-ups (Myerson and Satterthwaite, 1983). In such a framework, changes in the language skills of native A speakers also reduce the inefficiency associated with asymmetric information, and Results 1 and 2 still hold. 19
16 For example, if $f(w) = 1/\bar{w}$, then $U_a^0 - U_a^1$ takes a positive value if $\bar{w} - g$ is sufficiently small, and takes a negative value if $\bar{w} - 2g$ is also sufficiently small.
17 Notice that the third term of $U_b^1$ is positive and the second term has a lower absolute value than the second term of $U_b^0$.
18 Notice again that the third term of $U_a^1$ is positive. Also, [...]
19 Alternatively, we could model the matching process as the result of directed (costly) search decisions. Individuals might join a bunch of social activities in order to find their optimal partners.

Empirical analysis: preliminaries

Historical background
Catalan can be regarded as the native language of Catalonia. It is a Romance language, originating from Latin in the territory in the ninth century. Spanish (Castilian), another Romance language, arrived in Catalonia as early as the fifteenth century and consolidated its position among the elites during the eighteenth century.
The general population remained primarily monolingual in Catalan, and only gained access to Spanish with the expansion of elementary education, which was relatively slow. 20 During Franco's dictatorship, Catalan was restricted to the private sphere, but was nevertheless transmitted (mostly orally) from parents to children in a large fraction of the native Catalan families. Towards the second half of this period, efforts to revive Catalan as a vehicle of culture intensified, although those efforts systematically clashed with the legal framework and often resulted in fines, or exile and jail sentences. In contrast, Spanish was the only official language and the only language used in education. Moreover, the social use of Spanish in Catalonia was strongly reinforced by the massive migration from southern Spain (especially in the 1960s). By the end of the 1970s, Catalan was the native language of almost one-half of the population, who at the same time were fully competent in Spanish. In contrast, most of the native Spanish speakers (40% of the population of Catalonia had been born outside the region) were monolingual or only passively bilingual (Woolard and Gahng, 1990; Siguan, 1991). Regarding attitudes and social prestige, Catalan was in a somewhat awkward position. On the one hand, it was a language excluded from public life; on the other hand, it was the language of a large fraction of the better educated: the middle and the upper-middle class. 21 The social composition of its native speakers is probably crucial to explain the vast political support for "normalizing" the use of Catalan in the post-Franco era.
Right after the constitution of the Catalan regional government (the Autonomous Community), the regional parliament unanimously passed in 1983 the "Language Normalization Act" (LNA), which set the legal framework that allowed the dramatic changes in language-in-education policy that occurred over the next two decades.
19 (continued) If different activities are conducted in different languages, then language skills and preferences will also affect the formation of mixed partnerships in a way similar to the stylized model we have presented in the main text.
20 Massive school enrollment did not take place in Spain until the twentieth century. In 1872 the percentage of the primary-school age population enrolled in school was only 42%, far below the levels prevailing in contemporary France and England (Nuhoglu Soysal and Strang, 1989).
21 The economic elite and those social groups in direct contact with Franco's regime adopted Spanish as the unique language in their repertoire.
The LNA aimed at making all pupils fully competent in both languages (Spanish and Catalan) by the end of compulsory education. It also defined an integrative education model, in which children were not separated on the basis of the language spoken at home. The application of the LNA was gradual. In the period 1984-1993, the two languages were both used as the language of instruction in proportions that varied geographically, depending on the linguistic characteristics of the students and teachers' language skills. Throughout this period the average fraction of subjects taught in Catalan increased significantly over time.
As a result, at the beginning of the 1990s, Catalan had become the preferred language of instruction in most primary schools, although Spanish was still dominant in secondary education (Artigal, 1997). Since 1994, the authorities have given Catalan full priority as the language of instruction in all public educational institutions, but in practice Spanish has also been used, particularly in secondary education (Muñoz, 2005). In summary, education experienced a gradual transition from a system from which Catalan was excluded to one in which Catalan has become the main language of instruction, at least in compulsory education. 22 Such an asymmetric treatment of the two languages has apparently produced a fairly symmetric distribution of language skills. At the end of compulsory education, students' levels of proficiency in Catalan and Spanish are similar (Consell Superior d'Avaluació del Sistema Educatiu, 2013). Moreover, the level of proficiency in Spanish of students coming out of Catalan schools is similar to that in the rest of Spain (Instituto de Evaluación, 2011). From a dynamic perspective, the educational reform improved the oral Catalan skills of native Spanish speakers (and the written skills of both native Catalan and native Spanish speakers), with basically no effect on the Spanish skills of either speech community. 23 The regional authorities also sought to promote the knowledge and use of Catalan using a variety of means, including a Catalan-only TV channel, several catalanization campaigns, and language proficiency requirements for public sector jobs.
22 The education reform affected not only the language of instruction. New textbooks and instructional materials replaced the ones produced under the supervision of Franco's educational authorities, and new generations of school teachers, better educated and more proficient in Catalan, joined the system. Also, specialized teachers were hired to fulfil the LNA's objectives.
23 See also Vila (2008) and references contained there.
The results of these policies have been mixed. The use of Catalan by the overall population has never exceeded 50%. Regarding specific environments, the use of Catalan is preeminent in the regional and local governments and, more generally, in the political life of the region. In contrast, its use in other branches of government (for example, the judiciary) is close to zero. Similarly, cultural activities and media outlets also exhibit very heterogeneous linguistic patterns. For example, whereas about 50% of the radio audiences consume programs in Catalan, less than 5% of movies projected in Catalan theaters are either originally filmed or dubbed into Catalan.

Data and descriptive statistics
The data used in the empirical analysis are drawn from the Survey of Language [...] (i.e., individuals younger than 18 in 2008) and those who were students at the time of the survey are also excluded from the analysis. Given the main research question, it is also natural to exclude individuals who never had a partner (less than 7% of the restricted sample). Finally, in order to reduce the degree of unobserved heterogeneity in the data, we also discard the very few remaining observations of individuals whose native language or whose partner's native language is neither Spanish nor Catalan. The resulting restricted sample has 5,357 observations, 2,553 from the 2008 wave and 2,804 from the 2013 wave.
Individuals' native languages (as well as self-identification language) are classified into three categories: (1) [...] The language proficiency variables are coded on a 0-10 scale. In our analysis we focus on oral skills (and in particular, the ability to speak), which are much more relevant in couple formation. Figure [...] observed for native Spanish speakers in the raw data, it seems plausible that the language-in-education reform of 1983 is one of the main reasons behind such a positive trend. Indeed, in the identification strategy that we adopt to recover causal estimates, we only exploit the variation in oral language skills that is induced by the different degree of exposure of successive cohorts of native Spanish speakers to the language-in-education reform (which is arguably an exogenous component of the positive trend in language fluency). For the sake of comparison, Figure 3a displays written Catalan skills. Note that written proficiency improves for the younger cohorts of both speech communities, with a more pronounced increase for native Spanish speakers. Also, the level of written Spanish proficiency (Figure 3b) is uniformly high and virtually identical for both speech communities.26 The partner's language is also classified into the same three categories as the respondent's native language. In the baseline analysis, consistent with the definition of the respondent's native language, we define a respondent's partner as a Catalan speaker if either option (1) or (2) [...]
26 Note that this evidence clearly identifies Spanish as the strong language, as defined in the theoretical model: that is, the language shared by all speech communities. This evidence is also compatible with the results of the systematic tests mentioned above conducted by the national educational authorities.
27 The distribution of this variable is quite concentrated on the extreme options, (1) and (5): only 16% of the sample report an intermediate option.

Descriptive evidence: OLS estimates
We consider two different left-hand-side variables: (i) an indicator that takes the value of 1 if individual i is matched with a Catalan-speaking partner, and zero otherwise, and (ii) an indicator that takes the value of 1 if individual i uses only Catalan with their partner, and zero otherwise. For each of the two outcomes, we specify a linear probability model estimated by OLS, in which the outcome Y of individual i born in year t depends on a set of controls, X, oral proficiency in Catalan, Cat, year-of-birth fixed effects, and a random disturbance. The coefficient of interest is the one on Cat. We start with a parsimonious specification that includes as controls a dummy for wave, a gender indicator, and a cubic polynomial of age, which picks up age differences that are not fully captured by cohort dummies.28 We next include several controls for parental background (parents' place of birth, education, native language) and for individual attributes (place of birth, place of residence, and completed education). The full set of control variables is presented in Table 2, together with basic descriptive statistics.
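As a reading aid, the linear probability model described in words above can be written as follows. The Greek symbols are assumed notation chosen here for illustration, not necessarily the symbols used in the original display: $\theta$ is the coefficient of interest on $Cat_i$, $\lambda_t$ denotes the year-of-birth fixed effects, and $\varepsilon_{it}$ the random disturbance.

```latex
\[
  Y_{it} = \alpha + \beta' X_i + \theta \, Cat_i + \lambda_t + \varepsilon_{it}
\]
```

Because $Y_{it}$ is a 0/1 indicator, $\theta$ is read as the change in the probability of the outcome associated with a one-point increase in oral proficiency on the 0-10 scale.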
We start by presenting the results obtained for the subsample of native Spanish speakers. Selected estimates for the two outcomes are presented in Table 3 (the complete results can be found in Tables A1a and A1b in the online Appendix). The estimates from the baseline specification (column a) indicate that a one-unit increase in oral proficiency in Catalan is associated with an increase of about 4.5 percentage points in the probability of having a Catalan-speaking partner. Similarly, better skills in Catalan are associated, to a similar extent, with a higher likelihood of using only Catalan with the partner. These conditional correlations are similar, but slightly lower, when we control for parental characteristics, individual characteristics, or both sets of controls simultaneously (columns b, c, and d, respectively).29
28 Notice that the use of two different cross-sections enables the simultaneous inclusion of age and year of birth (since the sample contains individuals born in the same year but of different ages), which is especially useful for the identification strategy discussed in the next section.
29 We are aware of the fact that the above-mentioned controls are unlikely to represent exogenous covariates. This is because some of the individual characteristics (like place of residence and education) are choice variables, potentially related to the error term of the outcome equation(s). Moreover, parental characteristics, as well as individual place of birth, could reflect unmeasured parental characteristics that are potentially endogenous with respect to the two outcomes. Therefore, the evidence regarding these control variables must be interpreted with caution and is not discussed in detail for the sake of brevity.
Overall, the evidence using observational data seems to be consistent with the theoretical predictions of the model. Nevertheless, these conditional correlations might not represent the causal mechanism portrayed by the theoretical model. First, partner choice/language use and language skills are likely to be correlated with common unobserved factors, opening the door to the typical omitted variable bias. Second, language competence is self-reported, and hence measurement error bias could also be an issue due to the systematic tendency to over-report language skills. Third, we observe language skills only at the time of the interview, but this variable itself is likely to be affected by the linguistic characteristics of the partner. In other words, a native Spanish speaker is likely to improve their Catalan proficiency if matched with a Catalan speaker. This implies that reverse causality might generate an additional source of inconsistency.
[...] interpreted as an "Intention to Treat" variable.30 More specifically, Clots-Figueras and Masella (2013) assumed that individuals born in 1977 or after received all their compulsory schooling in Catalan, while those born between 1970 and 1976 were only partially exposed to the reform, with one year of exposure for the 1970 cohort and up to seven years for the 1976 cohort. Individuals born before 1970 were never affected. The length of compulsory education in Spain was eight years under the legal framework implemented in 1974 ("Ley General de Educación"), from ages 6 to 14. A new law passed in 1990 (LOGSE) extended compulsory education to ten years (from ages 6 to 16). This means that individuals born before 1983 were subject to eight years of compulsory schooling, and those born in 1983 or after to ten years.31 Thus, the variable capturing compulsory exposure to Catalan at school, $ce_t$, can be expressed in the following way:
$$ce_t = \begin{cases} 0 & \text{if } t < 1970 \\ t - 1969 & \text{if } 1970 \le t \le 1976 \\ 8 & \text{if } 1977 \le t \le 1982 \\ 10 & \text{if } t \ge 1983 \end{cases}$$
Notice that the variation in $ce_t$ is only determined by the individual's year of birth, which is obviously not a choice variable. Indeed, $ce_t$ seems to be an appealing way to extract an exogenous component from the positive trend in oral language skills observed over the successive cohorts of native Spanish speakers. However, this variable by itself is unlikely to be a valid exclusion restriction to identify the causal effect of language proficiency on outcomes. In fact, $ce_t$ could capture both the language proficiency effect of the LNA and other cohort effects that potentially affect the outcomes of interest (i.e., partnership formation and language use) directly, through non-language-related channels.
30 That is, the number of years of schooling in Catalan, assuming: (a) no grade repetition, (b) perfect compliance with the compulsory age of school attendance, and (c) uniform use of Catalan as the medium of instruction in schools. The last assumption is the most restrictive, since in the early years of application of the reform, the use of Catalan for general teaching purposes was weaker in schools with a majority of native Spanish speakers. However, the focus of our analysis is precisely the effect of the reform on native Spanish speakers (for whom the treatment was less intense). In this sense, we are probably capturing a lower-bound effect.
31 The results are unaffected by the change in the length of compulsory education, since we obtained virtually the same results imputing eight years of exposure (instead of ten) also to individuals born after 1982.
In order to control for the direct (common) effects of birth cohort on the outcomes of interest, we include native Catalan speakers in the analysis. This is in the spirit of the identification strategy proposed by Bleakley and Chin (2004, 2010). They estimate the (private and social) returns to English proficiency among US immigrants, exploiting the well-established existence of a "critical period" of language acquisition (i.e., immigrants who arrive in the host country at a very young age assimilate the language more easily). Their identifying variable is the interaction between age at arrival and a dummy that takes the value one if the immigrant comes from a non-English-speaking country. Under the assumption that the non-language effects of early migration are the same for immigrants arriving from English-speaking countries as for those from non-English-speaking countries, the differential effect of age at arrival for those who migrated from a non-English-speaking country should be purged of non-language-related effects and thus would represent a valid exclusion restriction.
In our case, we exploit the fact that oral language skills are also acquired within the family at an early age. Hence, the language-in-education reform did not exert any significant effect on the oral Catalan proficiency of native Catalan speakers. Moreover, the Spanish skills of native Catalan speakers have remained very high and stable over cohorts.
Therefore, using the pooled sample of native Spanish speakers and native Catalan speakers, we use the interaction between exposure to Catalan during compulsory schooling ($ce_t$) and the indicator that identifies native Spanish speakers as an exclusion restriction, controlling for (common) cohort effects in the outcomes of interest.
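The construction of the identifying variable can be sketched in a few lines. This is a minimal illustration based on the cohort rules stated in the text (no exposure before the 1970 cohort, one to seven years for 1970-1976, eight years for 1977-1982, ten years from 1983). The function names are hypothetical, and the linear increase between the 1970 and 1976 cohorts is an assumption read off the "one year ... up to seven years" description.

```python
def compulsory_exposure(birth_year: int) -> int:
    """Imputed years of compulsory schooling in Catalan, by year of birth."""
    if birth_year < 1970:
        return 0                      # never affected by the reform
    if birth_year <= 1976:
        return birth_year - 1969      # assumed linear: 1 year (1970) up to 7 years (1976)
    if birth_year <= 1982:
        return 8                      # full exposure, eight compulsory years
    return 10                         # full exposure, ten compulsory years (LOGSE)


def identifying_variable(birth_year: int, native_spanish: bool) -> int:
    """Exposure interacted with the native-Spanish-speaker indicator."""
    return compulsory_exposure(birth_year) * int(native_spanish)
```

By construction the interaction is zero for every native Catalan speaker, so the cohort pattern shared by both groups is left to the year-of-birth fixed effects.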
The underlying assumption of this identification strategy is that both language communities were subject to the same general cohort effects, except that we allow the treatment (compulsory policy exposure) to affect (with increasing intensity) the oral proficiency in Catalan of the treated cohorts of native Spanish speakers. In other words, we assume that any specific cohort effect experienced by native Spanish speakers affected by the policy change should be (plausibly) attributed to better language skills. This identification setup can be easily represented by a two-equation system, where the oral skills in Catalan (Cat) of individual i, born in cohort t and a native speaker of l (l = Spanish, Catalan), is the dependent variable of the first-stage equation, which contains as right-hand-side variables a set of controls (X), year-of-birth fixed effects ($\varphi_t$), an indicator for native Spanish speaker (l = Spanish), and its interaction with $ce_t$ (as identifying variable). The second-stage equation explains the two outcomes of interest (having a Catalan-speaking partner and use of Catalan with the partner). Alternatively, we could define the first outcome as having a mixed couple and the second outcome as speaking the non-native language with the partner. Such a symmetric treatment of the two speech communities seems desirable. Unfortunately, the data do not support a symmetric approach. The problem is that the survey reports more information about the respondent than about the partner. If the respondent is a native Spanish speaker, then we know his/her Catalan proficiency and year of birth (so that we can impute years of exposure to the reform). However, if the respondent is a native Catalan speaker, then we do not know his/her partner's Catalan proficiency and year of birth. Thus, we need to define the first outcome as having a Catalan-speaking partner and the second outcome as the use of Catalan with the partner.
The second-stage equation includes proficiency in oral Catalan as an endogenously determined covariate. Under the validity of the identifying assumption, the 2SLS estimation of Equations (3) and (4) should provide the causal effect of oral fluency in Catalan on each of the outcomes (the IV coefficient) among native Spanish speakers who improved their language proficiency due to exposure to the language in their compulsory schooling. This is because 2SLS provides an estimate of the coefficient on the endogenous right-hand-side variable that exploits only the variability of language skills that is produced by the instrument among the subpopulation of compliers (i.e., a "local" estimate of the treatment effect).32
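For reference, the two-equation system described in words above can be written as follows. This is a sketch under assumed notation: only $X$, $Cat$, $ce_t$, $\varphi_t$ and the indicator $I(l = \text{Spanish})$ come from the text, while the remaining Greek symbols (including $\theta_{IV}$ for the coefficient of interest and $\psi_t$ for the second-stage cohort effects) are labels chosen here for illustration.

```latex
\begin{align}
  Cat_{itl} &= \alpha + \beta' X_i + \gamma \, I(l = \text{Spanish})
             + \delta \, I(l = \text{Spanish}) \cdot ce_t + \varphi_t + u_{itl} \tag{3} \\
  Y_{itl}   &= \mu + \pi' X_i + \kappa \, I(l = \text{Spanish})
             + \theta_{IV} \, Cat_{itl} + \psi_t + \varepsilon_{itl} \tag{4}
\end{align}
```

The interaction term $I(l = \text{Spanish}) \cdot ce_t$ appears only in the first stage, which is what makes it the excluded instrument; the cohort effects $\varphi_t$ and $\psi_t$ absorb whatever is common to both speech communities.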

Estimation results
Selected 2SLS estimates of Equations (3) and (4), estimated with the pooled sample of Spanish and Catalan speakers,33 are displayed in Table 4. Overall, the results obtained from our identification strategy are in line with those obtained by OLS and, more importantly, consistent with the theoretical predictions. More specifically, the causal effect of better Catalan skills among Spanish speakers on the probability of having a Catalan-speaking partner is just slightly higher (but not statistically different) than the OLS estimate. Using the parsimonious set of controls, a unit increase in fluency in oral Catalan increases the likelihood of a mixed match by 7.6 percentage points (versus an OLS estimate of 4.5 percentage points for the joint sample; see Table A2). In order to gauge the magnitude of the effect, we must note that, according to the first-stage regression, the Catalan proficiency of native Spanish speakers fully affected by the reform is approximately one point higher (on a 0-10 scale) than that of those not exposed to the reform (8.5 versus 7.5, respectively). Also, the 2SLS estimates indicate that such an increase in the level of proficiency raises the probability that a native Spanish speaker is matched with a native Catalan speaker from 0.33 to 0.40. This is a sizable effect. The two speech communities have a similar size, which implies that in the absence of any language-related bias such a probability would be approximately 0.50. Hence, according to our estimates, the reform has eliminated roughly 40% of the initial bias.
32 In the empirical analysis, we cluster the standard errors on year of birth, which is the level of variation of our instrument.
33 The results obtained by applying OLS to the subsample of native Spanish speakers are virtually identical to those obtained from the pooled sample of both native Spanish and native Catalan speakers, as shown in Table A2 in the online Appendix. This means that most of the conditional correlations between oral proficiency in Catalan and the two outcomes are driven by the variation observed within the Spanish-speaking community.
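The "roughly 40%" figure follows from simple arithmetic on the three probabilities quoted above. The short sketch below reproduces it; the variable names are illustrative only.

```python
p_not_exposed = 0.33   # probability of a Catalan-speaking partner, cohorts not exposed
p_fully_exposed = 0.40 # same probability after the one-point gain in proficiency
p_no_bias = 0.50       # benchmark with two speech communities of similar size

initial_bias = p_no_bias - p_not_exposed
share_eliminated = (p_fully_exposed - p_not_exposed) / initial_bias

print(f"Share of the initial bias eliminated: {share_eliminated:.0%}")
```

The gap to the no-bias benchmark is 0.17 and the estimated gain is 0.07, so the ratio is about 0.41.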
As we add parental controls, the point estimate drops slightly. However, in contrast to the OLS strategy, including individual controls generates a modest increase in the coefficient of interest, while controlling for both parental and individual characteristics provides virtually the same estimate as in the baseline specification.
Regarding the second outcome (the use of Catalan with the partner), our IV approach generates estimates that are much more similar to those obtained by OLS.
In particular, for the baseline specification (column (a)), a one-unit increase in fluency in oral Catalan increases the probability of speaking only Catalan with the partner by 5.3 percentage points, slightly above the OLS estimate of 4.3 percentage points (see Table A2). The effects of including parental and individual covariates on the second outcome are analogous to those for the first outcome, and hence the results of the baseline specification appear very robust. Overall, the differences between the OLS and 2SLS estimates could be due to the fact that the former estimator exploits all the variation that is observed in the data, whereas the latter is based only on the variation generated by the instrument among the treated cohorts of native Spanish speakers. Moreover, the presence of measurement error in self-reported language proficiency, which could cause a downward bias in the OLS estimate, could be an additional (and probably complementary) explanation for this divergence. It is important to note that the first-stage estimates corresponding to our identifying variable (the interaction between language exposure during compulsory schooling and the indicator for being a native Spanish speaker), presented in the upper panel of Table 4, have the expected sign and are strongly significant (the complete results of the first-stage regressions can be found in Table A3 in the online Appendix). Thus, native Spanish speakers affected by the language reform did improve their oral proficiency in Catalan. The corresponding coefficients obtained using different specifications are quite stable. Moreover, the F test for weak identification indicates that the instrument is sufficiently strong in all specifications.
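The measurement-error argument can be illustrated with a small simulation. This is a sketch on synthetic data, not a replication of the paper's estimates: all numbers (sample size, true effect, noise levels) are hypothetical, the outcome is continuous rather than binary, the noise in reported skill is classical (mean zero), and the first-stage F statistic uses the simple homoskedastic formula rather than clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_effect = 0.05                                   # hypothetical effect of one skill point

z = rng.integers(0, 9, size=n).astype(float)         # instrument: 0-8 years of exposure
skill = 5.0 + 0.3 * z + rng.normal(0.0, 1.0, n)      # true (unobserved) proficiency
y = 0.2 + true_effect * skill + rng.normal(0.0, 0.5, n)
skill_reported = skill + rng.normal(0.0, 1.0, n)     # self-report with classical noise


def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


const = np.ones(n)

# OLS on the noisy self-report: attenuated toward zero.
b_ols = ols(np.column_stack([const, skill_reported]), y)[1]

# 2SLS by hand: first stage, then outcome on first-stage fitted values.
Z = np.column_stack([const, z])
fitted = Z @ ols(Z, skill_reported)
b_2sls = ols(np.column_stack([const, fitted]), y)[1]

# First-stage F statistic for the single excluded instrument.
rss_unrestricted = np.sum((skill_reported - fitted) ** 2)
rss_restricted = np.sum((skill_reported - skill_reported.mean()) ** 2)
f_stat = (rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 2))

print(f"OLS: {b_ols:.3f}  2SLS: {b_2sls:.3f}  first-stage F: {f_stat:.0f}")
```

In this setup the OLS coefficient is pulled below the true value because part of the variation in reported skill is noise, while the 2SLS coefficient uses only the variation driven by the instrument and stays close to the true effect.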
Overall, the results obtained from the IV strategy provide empirical support for the causal predictions of the theoretical model. Thus, better proficiency in the weak language among native speakers of the strong language (generated by a plausibly exogenous source of variation) fosters their propensity to form mixed partnerships and to use the weak language more intensively.
We have performed a battery of robustness checks on the specification of our baseline regression. In the online Appendix, we present in detail the results of our sensitivity analysis. In particular, we show that the estimates of interest are quite stable when we run separate estimations for males and females (Table A4), use alternative specifications of the age polynomial (Table A5), use alternative specifications of the exposure variable (Table A6) [...] Table 7A in the online Appendix).
One of the sensitivity checks is very relevant to the discussion of the role of individual or group identity in explaining our baseline results. If we exclude from the sample those respondents with a language of self-identification different from their native language ("language switchers"), then the main results remain basically unchanged.
It is important to note that the vast majority of language switchers are native Spanish speakers who chose Catalan as their language of self-identification. As reported in column (b) of Table 5, the effect of oral skills in Catalan on partnership formation is slightly smaller, and the effect on the use of Catalan slightly higher, when we exclude switchers. Moreover, the instrument becomes much stronger and the coefficients are estimated more precisely. In Section 6 we further discuss how this exercise helps interpret the nature of the main results. Here it seems worth highlighting that the stability of the estimates obtained after dropping language switchers represents a first piece of evidence in favor of our identification strategy. In fact, it can be argued that ethnic or political identity (Catalan or Spanish) could be an unobserved determinant of partnership formation and, as reported by Aspachs et al. (2008) and Clots-Figueras and Masella (2013), was also affected by the reform. In other words, according to such an alternative theory, some native Spanish speakers might have adopted a Catalan identity as a result of exposure to the reform, and this would have helped them in finding a Catalan-speaking partner. However, if identity were the main driving force behind our results, and as long as the language of self-identification is positively correlated with ethnic or political identity (a very plausible hypothesis), then we would expect a significant change in the main coefficients when language switchers are excluded. Instead, the observed invariance of the coefficients suggests that the exclusion restriction is not picking up unobserved identity traits that affect the potential to find a Catalan-speaking partner.
On top of these sensitivity analyses, in the next section we provide more detailed evidence on some key robustness checks concerning the two components of our identifying variable (the interaction between compulsory language exposure and the native language indicator) and the assumptions underlying our identification strategy. First, we present the results from several placebo experiments, which aim at providing evidence that our treatment variable (compulsory exposure) is not capturing any spurious effects due to pre-existing trends across cohorts. Second, we repeat the estimations using two alternative proxies of native language, namely parental language and parental regional origins, in order to ensure that our results are robust to the potential endogeneity of self-reported native language. Third, thanks to the availability of these two proxies for native language, we are able to (partially) relax the underlying hypothesis of our identification strategy, which requires that the non-linguistic effects that operate across cohorts are common to both language groups.

Falsification and identification checks
Evidence from placebo experiments. One component of the identifying variable, exposure to Catalan during compulsory schooling, only depends on the year of birth. We need to consider the possibility that compulsory exposure could capture spurious relations due to potential cohort-specific trends in (language-related) couple formation and/or language use. We have run a set of placebo experiments, which aim at providing evidence that our identifying variable is not contaminated by any spurious effects.
We consider the reduced-form equation to test for falsification. Equation (5) shows the reduced-form representation of our baseline 2SLS approach, where the reduced-form coefficient "directly" relates exposure to Catalan during compulsory schooling among native Spanish speakers to the outcomes of interest.
Then, we consider the placebo sample of never-treated individuals born between 1944 and 1969 who were schooled in Catalonia before the reform was implemented (i.e., they were never exposed to Catalan during compulsory schooling). In line with the falsification strategy adopted by Clots-Figueras and Masella (2013), we impute years of (pseudo) exposure to Catalan at school ($ce_t$) "as if" the reform had been applied from 13 to 20 years before the true reform; that is, first in 1970 instead of 1983, then in 1969, and so forth (until 1963).34 We estimate the reduced-form model (5), but using the placebo sample of individuals born in Catalonia (or who migrated from the rest of Spain before age 6) who were never affected by the compulsory component of the reform. Obtaining a positive and significant coefficient for placebo exposure would cast doubt on the reliability of our (real) exposure variable, because it could be reflecting pre-existing cohort trends that apply to the outcomes of interest. However, the battery of falsification experiments we performed suggests that this is not the case. In fact, while the reduced-form estimates based on real reform exposure reflect a positive causal effect of our identifying variable on both outcomes (see the first column of Tables 6a and 6b, respectively), all the coefficients associated with the different placebo exposure variables are small in size and not statistically different from zero.
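The imputation of pseudo-exposure can be sketched as follows. This is only an illustration of the "as if the reform had started earlier" logic: the function name and the `reform_shift` parameter are hypothetical, the linear phase-in mirrors the rule for the 1970-1976 cohorts, and capping placebo exposure at eight years (the length of compulsory schooling for these cohorts) is an assumption, not something stated in the text.

```python
def exposure(birth_year: int, reform_shift: int = 0) -> int:
    """Imputed years of exposure if the reform had started `reform_shift` years earlier."""
    t = birth_year + reform_shift     # shifting the cohort forward = moving the reform back
    if t < 1970:
        return 0
    if t <= 1976:
        return t - 1969               # partial exposure, assumed linear phase-in
    return 8                          # assumption: capped at eight compulsory years


placebo_cohorts = range(1944, 1970)   # never-treated cohorts, born 1944-1969
placebo_exposure = {
    shift: {t: exposure(t, shift) for t in placebo_cohorts}
    for shift in range(13, 21)        # pseudo-reforms dated 1970 back to 1963
}
```

With a shift of 13 years, for example, the 1957 cohort is treated as if it were the first partially exposed cohort, while true exposure (`reform_shift=0`) is zero for every cohort in the placebo sample.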
Overall, this evidence suggests that the compulsory exposure variable constructed à la Clots-Figueras and Masella (2013) is unlikely to be capturing spurious relations unrelated to the policy reform, as also highlighted in their original paper.
Native languages. We also address the validity of the second component of the identifying variable: the definition of native Spanish speakers. It could be argued that the self-reported native language might not be exogenous; respondents could be influenced by endogenous factors. In particular, some Spanish speakers might be tempted to misreport their true native language in favor of Catalan (or Spanish and Catalan), perhaps because of the influence of the language-in-education reform on their self-identification. In order to address these concerns, we have replaced the native language variable used in the baseline estimations with two alternative proxies.
In particular, an individual is classified as a native Spanish speaker (i) if both parents have Spanish only as their native language (parental language) or, alternatively, (ii) if both parents were born outside Catalonia (parental origins). We then re-estimated our 2SLS model using these two alternative definitions of language groups.
The results obtained for each of the two proxies of native language are presented in column (a) of Tables 7a and 7b, respectively. These estimates are generally similar to those obtained using the original native language variable. We only observe a mild reduction in the coefficient of Catalan skills in the partner's language equation when individuals are classified into language groups by parental language, and somewhat higher coefficients for both outcomes when the groups are formed by parental origins.35 This evidence indicates that the main results are robust to the use of alternative proxies of native language. Moreover, the fact that the estimates obtained using parental origins as a proxy for native language are higher than in the baseline estimation is consistent with the idea that the sub-population of compliers captured by this new instrument consists of individuals affected by the reform with both parents born outside Catalonia, who are likely to be more sensitive to exposure to Catalan at school. In other words, native Spanish speakers with at least one parent born in Catalonia were probably exposed to Catalan through alternative channels, and hence were less sensitive to the reform than their counterparts with both parents born outside Catalonia.36
35 Notice that using parental language as a proxy for native language creates some ambiguity in the (few) cases in which the individual declares that both parents had both Catalan and Spanish as their native languages. However, the results are virtually the same when these observations are excluded (detailed results available upon request).
36 Nevertheless, defining language groups on the basis of parental origins is not ideal for the purpose of testing the predictions of the theoretical model (which is structured around the concept of native language), since there is a relevant fraction of individuals with Catalan origins (i.e., at least one parent born in Catalonia) who are native Spanish speakers (around 20%).
The availability of two alternative proxies to define language groups opens the possibility of analyzing the sensitivity of the results to the main identifying assumptions in our model. First, we were able to specify two alternative overidentified 2SLS models, in which we use exposure to Catalan interacted with both the native language indicator and each of the two alternative proxies as exclusion restrictions.
The results obtained from the overidentified models are presented in column (b) of Tables 7a and 7b for Spanish-speaking parents and parents of non-Catalan origin, respectively. In both cases, the point estimates of interest are very similar to those obtained from the baseline specification. More importantly, the Hansen J test for overidentification does not reject the null hypothesis that the exclusion restrictions can be reasonably excluded from the outcome equation(s). This result suggests that the instrument is uncorrelated with unobservable determinants of partnership formation and language use (e.g., identity feelings, aspirations, social networks). Second, we are also able to perform an additional (and related) exercise. We relax the hypothesis that the only channel through which exposure to Catalan during compulsory schooling of native Spanish speakers affects the outcomes is language proficiency, by including the interaction between language exposure and each of these two proxies as a control in the outcome equations (column (c) of Tables 7a and 7b). In this case, we obtain higher point estimates for Catalan skills when we consider the first proxy, which also lose precision (and instrument strength) due to the correlation between the exclusion restriction and these control variables. When we instead control for parental origins interacted with exposure to Catalan, the coefficient of Catalan proficiency in the partner's language equation is virtually identical to the baseline (but again imprecisely estimated), while it becomes smaller in the language use equation. In any case, the coefficients for the interaction between exposure during compulsory schooling and the two alternative proxies for language groups are not statistically significant and very small in size (which is consistent with the evidence from the overidentification test).
Common non-language effects. We have also tried to relax the assumption that the direct cohort effects in the two outcomes are common to native Spanish speakers and native Catalan speakers, which is a non-trivial underlying hypothesis of our identification strategy. We allow for language-specific cohort effects by including interactions between year of birth and indicators of the above language group proxies. This should capture potentially heterogeneous cohort effects on each of the two outcomes. Therefore, the first-stage equation of the 2SLS system becomes
$$Cat_{itl} = \alpha + \beta' X_i + \gamma \, I(l = \text{Spanish}) + \delta \, I(l = \text{Spanish}) \cdot ce_t + \varphi^l_t + u_{itl} \tag{7}$$
and the outcome equation is modified analogously, where l is one of the two proxies of native language, and the terms $\varphi^l_t$ and its second-stage counterpart represent birth-cohort fixed effects that are allowed to differ by either parental language or parental origins. The corresponding estimates are presented in column (d) of Tables 7a and 7b, respectively, and show the same pattern that emerged from the models that contain the interactions between exposure and language proxy as controls. That is, the coefficients for Catalan skills are somewhat higher (and imprecisely estimated) when parental language is considered as a proxy, while controlling for parental origin-specific year-of-birth effects yields the same point estimate for Catalan proficiency on partnership formation and a small and insignificant coefficient for the language use equation.
Subsample of native Spanish speakers. As a final exercise, we repeat the 2SLS estimation for the subsample of native Spanish speakers using the same specification as our baseline model, but using the interaction between parental origins and exposure to Catalan as an exclusion restriction.37 We estimate the model(s) for the whole sample of native Spanish speakers and also excluding individuals whose partner has both Catalan and Spanish as native languages. These results are displayed in columns (a) and (b) of Table 8. They are qualitatively similar to those obtained from the whole sample, which exploits all the variation among Spanish speakers to identify the causal effects, while here the estimates reflect the variation among Spanish speakers with non-Catalan origins who improved their oral fluency in Catalan due to language exposure during compulsory education. The estimates are less precise and the identification is somewhat weak, but the results are still in line with the evidence obtained using simple OLS.

Discussion and concluding remarks
We have presented empirical evidence and theoretical arguments that endorse the idea that languages are much more than neutral communication devices, due to the plausible existence of some form of emotional attachment. However, one may claim that our results could also be compatible with alternative, plausible interpretations.
37 The heterogeneous effect of exposure to Catalan by parental language cannot be used as an exclusion restriction, since virtually all Spanish speakers have both parents who have only Spanish as native language.
Let us consider the following three alternatives:
Alternative 1: Results are driven by a combination of social mobility and assortative matching.
A large fraction of native Spanish speakers either migrated from the South of Spain in the 1960s or are descendants of those migrants. Thus, native Catalan speakers have enjoyed, on average, a better socio-economic status. Some of these immigrants or their children have climbed the social ladder, which may have raised their propensity to match with members of the upper social groups, who in turn are more likely to speak Catalan. Finally, native Spanish speakers may be more inclined to learn and use Catalan as they improve their socio-economic status, perhaps using the language as a signaling device.
Some of the control variables we use in the estimation actually reflect the socio-economic status of individuals or their families: education of the respondent, parental education, and even the place of birth or residence of the respondent and their families. Hence, if this alternative interpretation had any bite, the introduction of these control variables should affect the point estimates of the effect of language skills on both outcomes. Since this is not the case, and the main estimates are very stable to the inclusion of various sets of controls, we find little support for such an alternative interpretation.
Alternative 2: Results are driven by changes in ethnic or political identity.
It is well known that language is a key symbol of ethnic, national, or class identity. For the case of Catalonia, the American anthropologist Kathryn Woolard (Woolard, 1989; Woolard and Gahng, 1990) pointed out that back in the 1980s ethnicity was critical to understanding language attitudes and choices. More specifically, she found that Catalan was perceived by non-Catalan speakers as the language of native Catalans and completely alien to everybody else. Moreover, the adoption of Catalan was interpreted as sheer assimilation. In contrast, Spanish was perceived by almost everyone as "the language of everybody", free of ethnic marks.
Thus, one may wonder if our results may simply reflect the dynamics of ethnic politics in Catalonia. In particular, the educational reform may have affected the frequency of mixed couples (according to our definition) not so much by changing language skills and reducing the language conflict, but by inducing a fraction of native Spanish speakers to cross over and become "ethnically Catalan" (that is, by assimilation). In other words, it could be the case that endogamy has remained roughly unchanged, but the composition of ethnic groups has varied over time.
Our data set allows us to tentatively approach the issue of ethnic identity. In particular, we believe that ethnic or cultural assimilation should show up in those respondents who choose a language of self-identification different from their native language. That is, if an "ethnically Spaniard" (a native Spanish speaker) crosses over and becomes "ethnically Catalan", then such a switch should probably involve adopting Catalan as the language of self-identification. In fact, in our baseline sample, whereas only about 3% of the native Catalan speakers report Spanish as their language of self-identification, about 20% of native Spanish speakers report Catalan as their language of self-identification. When we eliminate these "switchers" from the sample, results remain largely unchanged (see Section 5.2 and Table 5). This can be taken as an informal test of the role played by ethnic identity formation in driving our results. Indeed, this suggestive evidence indicates that language skills matter beyond ethnic identity. In particular, native Spanish speakers who keep Spanish as their language of self-identification, to the extent that they improved their Catalan skills during compulsory education, are more likely to find Catalan-speaking partners and to use Catalan with their partner more often. This interpretation seems compatible with the latest research on language attitudes in Catalonia (Woolard, 2011, and Newman, Trenchs-Parera and Ng, 2008). These studies suggest that the perceived link between language and ethnicity has drastically softened and that nowadays both speech communities value bilingual proficiency.
Alternative 3: Our instrumental variable may capture spurious relations due to potential cohort-speci…c trends in couple formation.
The historical period under consideration is highly non-stationary in many dimensions. First, it includes two antithetical political regimes (dictatorship and democracy). Second, it has witnessed major demographic changes, especially the large migration inflows of the 1960s and 1970s. Third, the new information and communication technologies, especially during the last decades, might have affected social behavior, particularly in the marriage market. Thus, there may exist underlying trends in couple formation that can be captured by our instrumental variable.
The evidence from our placebo experiments reported in Section 5.2 suggests that the compulsory exposure variable is unlikely to be capturing spurious relations, unrelated to the policy reform.
Summarizing, in this paper we examine the non-communicative aspects of languages both theoretically and empirically. In particular, this is the first work showing that policies that promote the acquisition of language skills that appear redundant from a communicative viewpoint can significantly reduce segregation along linguistic lines. We have interpreted these results using an abstract and comprehensive notion of linguistic preferences, which is far more general than the presumed link between language and ethnic identity. We are also confident that (at least part of) the increase in the frequency of mixed couples can indeed be interpreted as a reduction in segregation.
Our empirical analysis focuses on the case of couple formation in Catalonia.
Obviously, more research is needed before we can claim that linguistic preferences
Hence, A and N increase and B decreases with a.
Result 2. The expected utilities of those individuals in potential partnerships with a < g are given by
The effect of a on U a has an ambiguous sign. However, both U b and U a + U b decrease with a.