DOCUMENT DE TREBALL XREAP2010-9 Knowledge of Catalan, public/private sector choice and earnings: Evidence from a double sample selection model

Antonio Di Paolo

This paper explores the earnings return to Catalan knowledge for public and private workers in Catalonia. In doing so, we allow for a double simultaneous selection process. We consider, on the one hand, the non-random allocation of workers into one sector or another and, on the other, the potential self-selection into Catalan proficiency. In addition, when correcting the earnings equations, we take into account the correlation between the two selectivity rules. Our findings suggest that the apparent higher language return for public sector workers is entirely accounted for by selection effects, whereas knowledge of Catalan has a significant positive return in the private sector, which is somewhat higher when the selection processes are taken into account.


Introduction
The period of democracy in Spain, which began with the end of the dictatorship (1939-1975), has been characterized by large-scale regional devolution. The economic, legal and political decentralization has brought with it a significant degree of independence from the State, especially for regions such as Catalonia. An additional feature of the democratization process is the general expansion of the Welfare State, which has led to a huge increase in public sector employment. In Catalonia, the combination of these two factors has meant a great expansion of local government in the last thirty years, which has progressively gained importance with respect to some of the existing centralized institutions. Another important aspect of the democratization process is the recognition of regional culture and language, which were severely repressed during the dictatorship (the Franco regime had prohibited the use of Catalan in public and strongly disapproved of its use in private). In the case of Catalonia, this "cultural devolution" has been spearheaded by two major public linguistic policies aimed at promoting and enhancing the use of the Catalan language among the population.
Specifically, the first linguistic policy implemented by Catalonia's Autonomous Government (the Generalitat) was the "Linguistic Normalization Act" of 1983 (Llei de Normalització Lingüística), which aimed to reinstate the public use of Catalan and to stimulate its learning and its use in private. This law established not only that Catalan was to be the official language of the Catalan government and of local public administrations, but also that it was to be the main language used in primary and secondary education.¹ In order to stimulate and improve private study, the Generalitat also began to organize language courses for the adult population, usually offered completely free of charge. More relevant today for the economic value of knowledge of Catalan was the "Linguistic Policy Act" of 1998 (Llei de Política Lingüística), which attempted to reassert the presence of Catalan (versus Castilian) by a) increasing fluency requirements for public sector employees, and b) introducing major incentives (and in some cases requirements) for increasing the use of Catalan in private business and other socioeconomic and cultural domains² (Solé and Alarcón 2001).
These two public policies have increased the economic value of knowledge of Catalan in the local labour market, to the extent that proficiency in the language is believed to be very important today. In fact, using Census data from 1991 and 1996 (that is, even before the implementation of the linguistic reform of 1998), Rendon (2007) shows that knowledge of Catalan has a substantial impact on the probability of employment among native-born individuals and immigrants from the rest of Spain. Moreover, Quella and Rendon (2009) suggest that the skills of speaking and writing Catalan have a significant effect on employment prospects, especially in white-collar occupations, services, government and education. Finally, a recent study by Di Paolo and Raymond (2010) stresses the existence of significant earnings returns to Catalan proficiency (defined as the ability to speak and write in Catalan) among first- and second-generation immigrants in Catalonia. In general, the potential return to Catalan proficiency exists because, in spite of the efforts at institutional level to raise Catalan fluency among the population, full functional knowledge is far from widespread.
This study attempts to take a further step forward by simultaneously considering the role of knowledge of Catalan in sector choice and the relationship between language proficiency and monthly earnings. Applying a level of detail that goes beyond our previous study, the general aim of this paper is to determine whether there is an earnings/productivity effect of the knowledge of Catalan, distinguishing between the public and the private sectors. With the objective of providing evidence that is consistent for the whole employed population, we also aim to establish whether the statistical association between knowledge of Catalan and earnings in the two sectors is affected by the potential relationship between language proficiency and private/public sector selection. In other words, we want to examine whether knowledge of Catalan is considered an "advantage" in the two sectors (and consequently increases an individual's earnings), or whether it merely represents a "requirement" for working in the Catalan public sector. We hypothesize that, with the strict regulations imposed by the legislation on language mentioned above and the progressive contraction of public institutions of the State (which are not regulated by the regional legislation), Catalan proficiency will emerge as a prerequisite to enter the public sector, and may represent an advantage in the private sector.
With these purposes in mind, this paper proceeds as follows. The next section offers a brief review of the relevant literature and situates the study; section 3 presents the data used in the empirical analysis and some descriptive statistics. Section 4 describes the econometric methodology for dealing with this particular issue; section 5 contains the empirical results, and section 6 concludes.

Related Literature
In any case, it seems that the relationship between language proficiency and earnings cannot be considered in isolation from decisions related to occupation and sector choice. In fact, the papers by Berman et al. (2003) and Lang and Siniver (2009), based on panel data, suggest that knowing Hebrew in Israel has a positive value only in high-skill occupations, and that the apparent language return for low-skilled workers is due entirely to unobserved heterogeneity (ruled out by taking the first difference from the longitudinal estimates). However, neither of these papers explicitly considers that occupation is the intervening activity that links income to education and the other forms of Human Capital, including language proficiency. This would mean that language knowledge may play an important role in determining the type of occupation the individual can enter; therefore, more fluent individuals would tend to be self-selected into occupations with higher language requirements. This possibility has been explicitly considered in a recent paper by Chiswick and Miller (2010), who exploited the O*NET database, which contains information on occupation-specific language requirements in the US. The authors find a positive sorting of more proficient workers into occupations that require a higher level of language fluency; moreover, they argue that the positive effect of language proficiency increases with the level of English required in the occupation. Finally, Aldashev et al. (2009) explore the effect of language knowledge on immigrants' earnings in Germany, considering the potential effect of language fluency on the simultaneous selection process into economic sector and occupation type. After estimating a two-step model with multiple simultaneous sources of selection, they find that when the positive effect of language proficiency in the simultaneous selection of economic sector and occupation type is taken into account, there is no evidence to suggest a pure productivity language effect. In other words, it seems that the earnings return to language knowledge is only indirect, because more proficient workers are more likely to be selected into high-paid jobs.
As briefly mentioned above, in this paper we investigate the potential earnings return to Catalan proficiency in the public and private sectors. Given the strict regulation of language requirements in public sector occupations, we suspect that the economic value of knowledge of Catalan may differ radically between these domains. Nevertheless, we believe that, in the case of Catalonia, language proficiency and the public/private sector choice can be taken as a joint simultaneous decision, and that this simultaneity must be taken into account in order to obtain a correct estimate of the earnings return to Catalan in the two sectors. Therefore, as references we also take some of the studies of earnings differentials across the public and private sectors.
In particular, we refer to studies that consider the existence of a potential selectivity process behind the sector choice decision, which is taken into account in the estimation of earnings determinants for predicting earnings differentials in the two sectors, normally using Endogenous Switching Models (see van der Gaag and Vijverberg 1988, Hartog and Oosterbeek 1993, Dustmann and van Soest 1998, Adamchik and Bedi 2000, Bender 2003, among others). We also refer to other studies that consider the existence of further sources of selection apart from the public/private sector choice, which are treated in much the same way as in Aldashev et al. (2009). Specifically, the paper by Christofides and Pashardes (2002) takes into account the existence of a double selection problem in the choice between self- and paid employment and public/private sector selection. Nevertheless, their results indicate that these two underlying choices are not interrelated, and they eventually estimate the wage equations controlling for two independent selection correction terms, one for the type of employment and the other for the desired sector. Moreover, Heitmueller (2006) treats labour force participation and sector choice as simultaneous decisions, and he takes account of this potential double selection mechanism when computing the earnings gap between public and private sector employees. Finally, Recotillet (2007) estimates the return to participating in postdoctoral training for French doctorate holders who work in the private sector, which seems to be positive when estimated by simple OLS. However, once she controls for the simultaneous selectivity of participation in a postdoctoral programme and sector choice, she finds that the postdoctoral programme has no effect on earnings.
Following these two strands of the literature, we start by modelling a Bivariate Probit Equation to explain, on the one hand, the propensity to be proficient in Catalan or not and, on the other hand, the decision to work in the public or the private sector. Subsequently, as explained in detail in what follows, from this bivariate estimation we construct two selectivity correction terms (one for sector choice and the other for knowledge of Catalan) that take into account the simultaneity of the two self-selection mechanisms when estimating the earnings equations. In general, the inclusion of the correction terms would adjust the biases in the estimates caused by the non-random assignment of workers to the economic sectors, and by the potential self-selection into knowledge of Catalan. However, if the likelihood of working in one sector or the other and the propensity to be fluent in Catalan are interrelated (a realistic possibility in the case of Catalonia), neglecting this simultaneity would lead to inconsistent estimates of the economic value of knowledge of Catalan in the two sectors.
As illustrated below, the empirical results indicate that the two selection processes are positively related, and this relationship must be taken into account for a correct estimation of the economic value of knowledge of Catalan in the public and private sectors. Moreover, the results indicate that the apparently higher return to knowledge of Catalan in the public sector estimated by a simple regression is accounted for entirely by the fact that proficient workers are more likely to be selected into the public sector (and vice versa). In contrast, there exists a significant return to proficiency in Catalan for individuals who work in the private sector, which is significantly higher when selection into language knowledge and economic sector is accounted for. This empirical evidence could be taken as an indication that the economic value of Catalan proficiency for public sector workers is only represented by a higher chance of working in this sector but that, once they have entered, language knowledge will have little effect on improving earning opportunities⁵ (i.e. knowledge of Catalan is merely an entry requirement). However, in the absence of strict regulation regarding language capabilities in the private sector, fluency in Catalan may represent an advantage for proficient (private) workers, which is reflected in higher expected monthly earnings.

Data and Descriptive Statistics
The empirical analysis is based on the data from the 2006 "Survey of Living Conditions and Habits of the Catalan Population" (ECVHP06) carried out by the Statistical Institute of Catalonia (IDESCAT). The original sample comprises 10,358 observations of individuals aged 16 or more residing in Catalonia. The survey aims to obtain socioeconomic and demographic information on the overall population, focusing on individuals and their families. The data were collected between the fourth quarter of 2005 and the third quarter of 2006; therefore, the information on individual labour market status and monthly earnings reflects the situation in 2005-2006. Analyzing this period is very attractive for our purposes, since the unemployment rate was exceptionally low (6.6%);⁶ this means that we can focus only on the employed population, as we consider that neglecting the potential self-selection into employment would not be problematic during this period.⁷ Moreover, the relatively high rate of female activity (52.3%) allowed us to include women in the analysis. This means that, as commented above, our final aim is to provide evidence that is consistent for the employed population in 2005-2006.
The final sample consists of 5,019 observations on all individuals aged 16 to 65 in regular employment with valid information on earnings (net of taxes), which are recorded in brackets. Notice that the interval coding of monthly earnings is the most important limitation of the ECVHP06 database; in fact, it makes it unfeasible to construct an hourly wage measure, which is the standard dependent variable in a Mincer-type regression. Therefore, we are forced to limit the analysis to monthly earnings, but we address this issue in the Robustness Checks section (5.4). In any case, the ECVHP06 is the only database that enables the analysis of the earnings returns to Catalan knowledge, because it contains information about language skills and several potential instruments for dealing with their endogeneity in the earnings equation.⁸
The information on Catalan knowledge contained in the survey is reported in four categories: an individual may report that he or she "does not understand", "understands but is unable to speak", "is able to speak but not to write", or "is able to speak and write" Catalan. Only individuals who could speak and write Catalan were considered fully proficient. This restrictive definition of language proficiency might help to minimize the potential measurement error in the self-reported language knowledge variable.⁹ Table 1 contains a basic description of the selected sample in terms of the two relevant dimensions of analysis, i.e. proficiency in Catalan and economic sector. Specifically, each cell of this 2×2 table contains the expected frequencies of every possible outcome, and the row percentages in italics inside brackets.
Slightly more than 16% of the selected sample work in the public sector, and 83% of them are fully proficient in Catalan. In all likelihood, the remaining 17% comprise individuals who work in the institutions of the central government, where knowledge of Catalan is not strictly considered a requirement. In the private sector, which represents 83% of the final sample, the proportion of fully proficient workers falls to 61%, reflecting the lack of any strict public regulation concerning knowledge of Catalan in this sector. These differences in the distribution of language capital in the Catalan labour market may be reflected in different rates of return to language knowledge in the two sectors. Given that knowledge of Catalan is significantly more widespread in the public than in the private sector, one might expect to observe a higher return among private sector workers.

Table 2 illustrates the means of log earnings¹⁰ in the two sectors according to Catalan proficiency. This descriptive evidence is the exact opposite of what we might have expected. As commonly reported in the literature, public sector workers earn significantly more than those in the private sector, and the positive statistical association between monthly earnings and language proficiency is also higher in the former (0.20 log points). Moreover, the positive earnings premium in the public sector is significantly more pronounced among proficient workers.¹¹ Nevertheless, the interest lies in the ceteris paribus earnings return to knowledge of Catalan, unaffected by the potential mechanisms of self-selection behind this estimation (see the next section for details).
Therefore, in order to model monthly earnings, knowledge of Catalan and sector choice, we exploit all the relevant information contained in the ECVHP06 database.¹² Table 2A in the Appendix contains the basic descriptive statistics separately for public and private workers. The subsample of public sector workers is somewhat older, highly feminized, and better educated than the subsample of private sector workers. As expected, there is a higher presence of foreign workers in the private sector. Moreover, public sector jobs are more stable, given the higher job tenure and the higher unionization rate; finally, the proportion of part-time workers is almost the same in both sectors.

Empirical Strategy
We start by estimating two log-earnings regressions, one for private sector employees (PUB_i = 0) and one for public sector employees (PUB_i = 1). The δ coefficients in (1) represent our parameters of interest, which capture the percentage earnings increase associated with Catalan knowledge.

A selectivity problem arises when the likelihood of entering the public sector and/or the propensity to achieve full language proficiency depend on unobservable individual characteristics that are potentially related to the unobservable earnings determinants. The two selection processes can be treated with the standard methods proposed by Lee (1978) and Heckman (1979), but only if the two selection rules are strictly independent.
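The display for equation (1) did not survive in this version of the text. As a hedged reconstruction from the surrounding definitions, the two sector-specific regressions plausibly take the following form; the symbols Y (monthly earnings), X (covariate vector) and β are assumed names, while δ, ε_j, CAT and PUB follow the text.

```latex
% Sketch of equation (1) under assumed notation: j = 0 (private), j = 1 (public)
\begin{equation*}
\ln Y_{ji} = X_{i}\beta_{j} + \delta_{j}\,CAT_{i} + \varepsilon_{ji},
\qquad \text{observed only if } PUB_{i} = j, \quad j \in \{0, 1\}.
\end{equation*}
```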
However, in our case, the selection rules (public/private sector choice and proficiency in Catalan) are unlikely to be independent. In fact, because of the Catalan institutional setting, those who work in the public sector are, in general, more likely to know Catalan, and those who are fully proficient in Catalan may be more likely to work in the public sector.¹³ This means that we must deal with a joint double selection rule, which can be written as the pair of latent-variable equations (2a) and (2b), where Z_i and W_i contain the observable determinants of the latent propensity to know Catalan (CAT*) and of the desired sector choice (PUB*) respectively, and ρ_{u,w} represents the correlation coefficient between the unobservable elements of the two equations. If this correlation coefficient is statistically different from zero, we must generalize the selectivity problems to a double simultaneous selection process, which can be addressed with the methodology proposed by Fishe et al. (1981), Ham (1982) and Tunali (1986), and more recently used by Heitmueller (2006), Recotillet (2007) and Aldashev et al. (2009), among others.

Specifically, taking expectations of the earnings equations in (1), we obtain the conditional expectations in (3), where the last terms in both equations contain the joint double selectivity bias. Following the two-step procedure proposed by Tunali (1986), we consider this generalized selectivity problem as a double simultaneous selection situation, with full information on the outcomes of the two selection rules (giving four distinct cells). That is, as shown in the descriptive analysis, the sample contains cases of proficient individuals in both the public and the private sector. Moreover, there are also public sector employees who are not proficient in Catalan, namely those who work in the central government's institutions, which do not consider knowledge of Catalan a strict requirement for workers. Finally, we obviously observe private sector employees who do not know Catalan, given the absence of a general legal requirement regarding knowledge of Catalan and its coexistence with Spanish. In addition, we may also reasonably assume that the two selection rules are simultaneous (rather than sequential), given that during the "linguistic normalization" process public sector workers with limited knowledge of Catalan were allowed to improve their proficiency by attending specific language courses for public employees, provided free of charge by the Catalan government (and normally taught during part of the working day).
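The displays for (2a), (2b) and (3) are missing from this version of the text. A hedged sketch of what the description above implies is given below; the coefficient vectors γ and θ, the indicator notation and the symbols Y, X and β are assumptions, whereas Z, W, u, w, ρ_{u,w}, δ and ε_j come from the text.

```latex
% Sketch of the joint selection rule, equations (2a)-(2b), under assumed notation
\begin{align*}
CAT_{i}^{*} &= Z_{i}\gamma + u_{i}, & CAT_{i} &= \mathbf{1}\,[\,CAT_{i}^{*} > 0\,], \\
PUB_{i}^{*} &= W_{i}\theta + w_{i}, & PUB_{i} &= \mathbf{1}\,[\,PUB_{i}^{*} > 0\,],
\qquad \operatorname{corr}(u_{i}, w_{i}) = \rho_{u,w}.
\end{align*}

% Sketch of equation (3): expected log earnings conditional on both selection outcomes
\begin{equation*}
E\left[\ln Y_{ji} \mid X_{i}, CAT_{i}, PUB_{i} = j\right]
= X_{i}\beta_{j} + \delta_{j}\,CAT_{i}
+ E\left[\varepsilon_{ji} \mid CAT_{i}, PUB_{i} = j\right],
\qquad j \in \{0, 1\}.
\end{equation*}
```

The last term is the joint double selectivity bias: it is non-zero whenever ε_j is correlated with u or w.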
Tunali (1986) shows that, assuming a joint normal distribution of the error terms (ε_j, u, w) with zero mean and a general variance-covariance matrix,¹⁴ the correction terms for the two selectivity processes take the form given in (5), where Φ(·) stands for the bivariate normal distribution function, evaluated using the predictions from the joint estimation of (2a) and (2b) with a Bivariate Probit model. This means that the conditional expected earnings in (3) can be written as in (7), where λ_PUB and λ_CAT represent two additional variables that must be included in the earnings equation for the two sectors, in order to correct the estimation of the parameters of interest (the δ coefficients) for the potential bias caused by the double simultaneous selectivity problem described above. Note that if the correlation between the error terms of the two selection rules, ρ_{u,w}, is equal to zero, the λ terms reduce to two independent correction terms, as in Heckman's standard method. On the other hand, if ρ_{u,w} ≠ 0, correcting with two independent terms, and thus neglecting the statistical relationship between the selection rules, would still lead to inconsistent estimates.
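To make the correction terms concrete, the following minimal Python sketch computes them for one of the four cells, namely a worker who is proficient and works in the public sector. It relies on the standard truncated bivariate normal result rather than on the paper's own display, and the function names, the index values `a` (for Z_i γ) and `b` (for W_i θ), and the Simpson-rule approximation of the bivariate normal distribution function are all assumptions made for illustration. The other three cells follow by flipping the signs of the indices and of the correlation.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Phi2(a, b, rho, n=2000, lower=-10.0):
    """P(U <= a, W <= b) for a standard bivariate normal with |rho| < 1.

    Uses Phi2 = integral of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2))
    over x from -infinity to a, approximated by Simpson's rule on [lower, a].
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return phi(x) * Phi((b - rho * x) / s)

    h = (a - lower) / n  # n must be even for Simpson's rule
    total = f(lower) + f(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lower + k * h)
    return total * h / 3.0


def double_selection_terms(a, b, rho):
    """Correction terms (lambda_CAT, lambda_PUB) for the cell CAT = 1, PUB = 1.

    a and b are hypothetical index values Z*gamma and W*theta.
    """
    s = math.sqrt(1.0 - rho * rho)
    joint = Phi2(a, b, rho)
    lam_cat = phi(a) * Phi((b - rho * a) / s) / joint
    lam_pub = phi(b) * Phi((a - rho * b) / s) / joint
    return lam_cat, lam_pub


# Illustrative call with made-up index values and correlation.
print(double_selection_terms(0.4, -0.9, 0.3))
```

With `rho = 0` the two terms collapse to the familiar inverse Mills ratios φ(a)/Φ(a) and φ(b)/Φ(b), which is the reduction to two independent Heckman-type corrections noted above.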

Identification
In general, standard selectivity models à la Heckman require the presence of at least one exclusion restriction to ensure that the parameters are not identified solely through the non-linearity of the selectivity-correction term. This means that at least one variable that appears in the selection equation can be reasonably assumed to be excludable from the outcome equation(s) of the second stage. However, as pointed out by Tunali (1986, p. 245), the bivariate selectivity model requires additional exclusion restrictions for identifying the correlation coefficient of the error terms of the simultaneous selection equations. That is, at least one determinant of each selection process must not be related to the unexplained earnings component. Moreover, for complete identification at least one variable included in the sector choice equation must not influence Catalan knowledge and vice versa, and this variable must not appear in the earnings equations.
In the case of the Catalan proficiency equation, like Rendon (2007) we assume that having received schooling in Catalan (i.e. after the 1983 reform) affects knowledge of Catalan but does not directly affect individual earnings. We also assume that, after controlling for the years since migration in the earnings equations, having arrived at the age of 10 or younger only affects an individual's earnings through Catalan proficiency. Moreover, we consider that language use with the children acts as a determinant of language proficiency and can be excluded from the earnings equation. Finally, regional origins of national immigrants only appear in the Catalan knowledge equation.
In order to identify the sector choice equation, we consider that father's occupation influences sector choice but does not directly affect individual earnings (following Dustmann and van Soest 1998). Moreover, in accordance with Christofides and Pashardes (2002), the sector choice equation contains a variable indicating whether the individual's spouse or partner is an employer (capturing a potential reduction in the cost of job search in the private sector), and another variable indicating whether the individual receives income from other sources (non-labour income). The number of children is assumed to influence only sector choice, and does not directly affect earnings. In order to reinforce identification, years of completed schooling are included in the knowledge of Catalan and earnings equations, but completed education is included as a dummy variable in the sector choice equation (Hartog and Oosterbeek 1993, among others, adopt a similar strategy). This reflects that earnings and language knowledge depend on the length of completed studies (human capital view), whereas the sector choice decision is more closely related to the legal value of educational certificates. Moreover, following the same logic, we also assume that the type of University studies only affects the decision to enter the public sector (and does not directly affect earnings).
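The bookkeeping behind these exclusion restrictions can be sketched in a few lines of Python. The variable names below are hypothetical labels for the instruments described above, and the earnings covariate list is only an illustrative subset; the check covers the counting requirement (each selection equation has its own excluded variables), not the economic validity of the exclusions.

```python
# Hypothetical labels for the variables that enter only one selection equation.
catalan_only = {
    "schooled_in_catalan",
    "arrived_by_age_10",
    "catalan_with_children",
    "region_of_origin_in_spain",
}
sector_only = {
    "father_occupation",
    "partner_is_employer",
    "nonlabour_income",
    "number_of_children",
    "education_certificate",
    "university_field",
}
# Illustrative subset of the earnings-equation covariates.
earnings = {
    "catalan_proficiency",
    "years_of_schooling",
    "job_tenure",
    "experience",
    "female",
    "married",
    "part_time",
    "union_member",
    "years_since_migration",
}


def exclusion_restrictions_hold(sel_a_only, sel_b_only, outcome):
    """True if each selection equation has at least one variable that appears
    neither in the outcome equation nor in the other selection equation."""
    return (
        bool(sel_a_only)
        and bool(sel_b_only)
        and sel_a_only.isdisjoint(outcome)
        and sel_b_only.isdisjoint(outcome)
        and sel_a_only.isdisjoint(sel_b_only)
    )


print(exclusion_restrictions_hold(catalan_only, sector_only, earnings))
```

Moving a variable such as `years_of_schooling` into one of the "only" sets makes the check fail, because it also appears in the earnings equation.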

Baseline Earnings Equations
The analysis of the empirical results starts with the estimation of (1) without accounting for selection;¹⁵ the estimates are reported in table 3. The high R² indicates that the covariates included have satisfactory power for explaining the log of monthly earnings. Comparing the estimates across the two sectors, we observe that native-born individuals of Catalan origin (i.e. with at least one parent born in Catalonia) earn somewhat less than second-generation immigrants in the private sector; moreover, immigrants from elsewhere in Spain present a clear earnings advantage in both sectors.¹⁶ There is no clear penalty for European immigrants, whereas private sector workers from Africa, Asia and other countries earn significantly less than native-born second-generation immigrants, even accounting for Catalan knowledge. Immigrants who arrived many years ago are paid less than recent immigrants with similar characteristics, but only in the private sector. Females earn less than males, although the earnings gap is somewhat lower in the public sector; and, as commonly found in the literature, married individuals tend to earn more than their unmarried counterparts.
The return to one additional year of schooling is considerably higher in the public sector,¹⁷ while an additional year of job tenure has practically the same impact on monthly earnings in both sectors. Previous experience shows a positive linear effect on earnings in the public sector¹⁸ and an inverse U-shaped effect in the private sector. As expected, monthly earnings for part-time workers are about 0.4 log points lower than for full-time workers. Union members earn more than non-union members, and the earnings effect of union membership is significantly higher in the private sector. Among private workers, those who work in a large firm and those who are self-employed earn more than the mean.

Table 3. Baseline earnings equations (dependent variable: log monthly earnings; figures in parentheses as reported)

Variable                            Public             Private
(unlabelled row)                    -0.031              0.017
Born in the rest of Spain            0.089              0.077
Born in Europe                      -0.025             -0.005
Born in Latin America               -0.079             -0.040
Born in Africa                      -0.176             -0.092
Born in Asia or other countries      0.052             -0.092
YSM/10                              -0.003             -0.017
Female                              -0.251 (-10.5)     -0.288 (-24.28)
Married                              0.053              0.077
Years of Schooling                   0.059 (-14.96)     0.042 (-22.3)
Job Tenure (in months)/10            0.011 (-10.44)     0.010 (-14.67)
(Previous) Experience/10             0.068 (-3.75)      0.121 (-6.94)
Experience²/100                      -                 -0.020 (-4.72)
Part-time Worker                    -0.388 (-8.29)     -0.438 (-19.84)
Union membership                     0.077              0.090
#Workers>500                         -                  0.053 (-3.51)
Self-Employed                        -                  0.109 (-5.86)
Living in Barcelona                  0.023              0.020

Moreover, proficiency in Catalan has a significant and positive effect on monthly earnings in both sectors and, consistent with the descriptive evidence presented above, the return to knowledge of Catalan seems to be higher in the public sector: point estimates indicate a 5% (= exp(δ) − 1) return in the private sector, whereas the language premium for public workers amounts to extra earnings of 9.4%. However, as noted above, these estimates may be seriously biased. One possible source of bias is the potential non-randomness of the mechanism that allocates workers to the public or the private sector. Unobserved individual heterogeneity may represent another source of bias, if individuals opt to learn Catalan on the basis of their unobservable attributes, which are potentially related to unexplained earnings components. Finally, a third source of bias may be the correlation between the unobservable determinants of the two selection mechanisms (Catalan proficiency and sector choice).
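The conversion exp(δ) − 1 used above to read a log-earnings coefficient as a percentage return can be illustrated with a short Python sketch. The function names are hypothetical, and the δ values are not taken from the tables: they are simply what the reported 5% and 9.4% returns would imply (roughly 0.049 and 0.090 log points).

```python
import math


def log_points_to_percent(delta):
    """Percentage earnings change implied by a coefficient in a log-earnings regression."""
    return (math.exp(delta) - 1.0) * 100.0


def percent_to_log_points(pct):
    """Inverse mapping: the coefficient implied by a given percentage return."""
    return math.log(1.0 + pct / 100.0)


# Coefficients implied by the reported private (5%) and public (9.4%) returns.
for pct in (5.0, 9.4):
    delta = percent_to_log_points(pct)
    print(pct, round(delta, 4), round(log_points_to_percent(delta), 2))
```

For small coefficients the two scales nearly coincide, but they diverge for large ones: a coefficient of −0.4 corresponds to a fall of about 33%, not 40%.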

Bivariate Selection Equations
In order to deal with these multiple sources of bias, we implement the double simultaneous selection correction with the methodology presented in the previous section. The first step is the joint estimation of the selection equations (2a) and (2b) to explain Catalan proficiency and sector choice respectively. Table 4 shows the maximum likelihood estimates of the resulting Bivariate Probit.
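As a rough illustration of what the joint estimation maximizes, the sketch below writes out the textbook bivariate probit log-likelihood in plain Python. It is not the paper's estimation code: the index values `a` (for Z_i γ) and `b` (for W_i θ) are taken as given rather than estimated, the function names are hypothetical, and the bivariate normal distribution function is approximated with Simpson's rule.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Phi2(a, b, rho, n=2000, lower=-10.0):
    """P(U <= a, W <= b) for a standard bivariate normal with |rho| < 1 (Simpson's rule)."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return phi(x) * Phi((b - rho * x) / s)

    h = (a - lower) / n  # n must be even
    total = f(lower) + f(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lower + k * h)
    return total * h / 3.0


def cell_probability(cat, pub, a, b, rho):
    """Probability of observing the pair (CAT = cat, PUB = pub), each coded 0 or 1."""
    q1 = 2 * cat - 1  # +1 if proficient, -1 otherwise
    q2 = 2 * pub - 1  # +1 if public sector, -1 otherwise
    return Phi2(q1 * a, q2 * b, q1 * q2 * rho)


def log_likelihood(rows, rho):
    """Sum of log cell probabilities over rows of (cat, pub, a, b)."""
    return sum(math.log(cell_probability(c, p, a, b, rho)) for c, p, a, b in rows)


# Made-up observations: (CAT, PUB, Z*gamma, W*theta).
rows = [(1, 1, 0.8, 0.1), (1, 0, 0.5, -1.2), (0, 0, -0.3, -0.9), (0, 1, -0.6, 0.2)]
print(log_likelihood(rows, 0.3))
```

A full estimator would maximize this function jointly over γ, θ and ρ_{u,w}; statistical packages provide this directly.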
The results of the estimation of the knowledge of Catalan equation indicate that females are somewhat more likely to speak and write Catalan than males with similar characteristics. As expected, the propensity to be proficient in Catalan decreases with age, indicating that older individuals have more difficulty in assimilating the language. Second-generation immigrants are clearly less likely to speak and write Catalan than native-born individuals of Catalan origin. Individuals born outside Catalonia are also clearly penalized, except for those from eastern Spain (Valencia and the Balearic Islands); this result is no surprise, since Catalan is also spoken in these regions of Spain (even though it is less institutionalized). Moreover, the disadvantage is even greater for those individuals who were born outside Spain, especially for Latin American immigrants; in all likelihood, this is because their mother tongue is Spanish, and the incentives for learning Catalan are lower for them (ceteris paribus). However, the positive and statistically significant coefficient for time in Catalonia (years since migration) indicates that a longer exposure to the local language favours its assimilation.¹⁹
Schooling is clearly one of the most important determinants of the probability of speaking and writing Catalan, with a positive and highly significant estimated coefficient. As found by Rendon (2007), linguistic assimilation is easier for immigrants who arrived at a young age (even controlling for the years since migration); moreover, individuals affected by the 1983 language legislation are more likely to be able to speak and write Catalan, with a stronger effect for those who were schooled entirely in Catalan after the 1983 reform. Our results also show that the individual's environment plays an important role in explaining the chances of achieving language proficiency. In fact, use of the language with the children²⁰ significantly increases the probability of speaking and writing Catalan.

The estimates of the sector choice equation reveal that females are significantly more likely to work in the public sector than males. As commonly found in the literature, the number of children does not significantly affect the probability of working in one sector or in the other. The likelihood of being selected into the public sector increases with age but at a decreasing rate; foreigners are less likely to enter the public sector, and those who were born in Europe are even less likely to do so than non-European immigrants.
Individuals with a post-compulsory education certificate have a higher chance of working in the public sector than individuals with lower-secondary education or less; among tertiary-educated workers, those who studied exact sciences or social sciences at university are significantly less likely to work in the public sector than those who studied humanities. As expected, the relative cost of searching for a public sector job is higher for those individuals whose partner is an employer; moreover, being the child of a skilled white-collar or skilled blue-collar father increases the likelihood of having a public occupation. Surprisingly, those individuals who have some non-labour income are more likely to work in the public sector.
Finally, the correlation coefficient r_uw is positive and statistically different from zero, which means that the two selection rules are not independent. Specifically, this result indicates that individuals who are more likely to be proficient in Catalan are also more likely to work in the public sector and vice versa; notice that in this Bivariate Probit model the positive relationship between the two selectivity mechanisms is indirect, i.e. it is captured by the correlation between the unobservables of the two equations. The evidence of significant correlation between the disturbances of the selection equations also suggests that joint estimation provides more efficient results than independent estimation of the two selection rules. In addition, it implies that this correlation should be taken into account in order to obtain a consistent estimate of the return to knowledge of Catalan in each sector, because controlling for the two selectivity rules under the assumption that they are independent may not entirely eliminate the selection bias(es).

Selectivity-Corrected Earnings Equations
As noted above, the previous estimation of the language return in the private and public sectors could be biased, on the one hand, by the non-random allocation of workers into one sector or the other and, on the other hand, by self-selection into knowledge of Catalan. We should also take into account the positive correlation between these two selectivity rules, as suggested by the previous results. In order to obtain a consistent estimate of the d parameters for each sector, we implement the bivariate selection correction as presented in section 4. Under the assumption that the identification conditions are valid, table 5 reports the bivariate selectivity-corrected earnings equations for public and private sector workers, which contain the consistent estimates of the return to knowledge of Catalan in the two sectors (note 21). The bivariate estimation of the two selection rules enables us to construct the selectivity-correction terms in (5), which have been inserted into the earnings equation as additional regressors (eq. 7). Notice that the coefficients' standard errors have been obtained through bootstrapping (note 22), given that the calculation of the correct variance-covariance matrix derived by Ham (1982) and by Tunali (1986) is cumbersome.
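The interval-regression step behind these corrected earnings equations can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`norm_cdf`, `linear_index`, `interval_loglik`), the coefficient values and the two correction terms are hypothetical, and the correction terms of eq. (5) are assumed to have been computed beforehand from the bivariate probit step. Each observation contributes the probability that log earnings fall inside the reported bracket, given a linear index that includes the correction terms as extra regressors.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_index(x, beta, corrections, gammas):
    """x'beta plus the selectivity-correction terms entered as extra regressors.

    `corrections` holds the two correction terms (assumed already computed
    from the selection step); `gammas` are their coefficients.
    """
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    return xb + sum(c * g for c, g in zip(corrections, gammas))


def interval_loglik(lower: float, upper: float, mu: float, sigma: float) -> float:
    """Log-likelihood of one observation whose log earnings lie in (lower, upper].

    Use -math.inf / math.inf for the open-ended bottom and top brackets.
    """
    p_upper = 1.0 if upper == math.inf else norm_cdf((upper - mu) / sigma)
    p_lower = 0.0 if lower == -math.inf else norm_cdf((lower - mu) / sigma)
    return math.log(p_upper - p_lower)


# Hypothetical worker: regressors are a constant and a Catalan dummy;
# all coefficients and correction terms below are made up for illustration.
mu = linear_index(x=[1.0, 1.0], beta=[7.0, 0.12],
                  corrections=[0.3, -0.2], gammas=[-0.1, 0.05])
print(interval_loglik(math.log(1000), math.log(1500), mu, sigma=0.5))
```

Summing `interval_loglik` over all workers in a sector and maximizing over the coefficients and `sigma` would give the interval-regression estimates; bootstrapped standard errors would then repeat both estimation steps on resampled data.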
In general, the estimated coefficients for both sectors are roughly identical to those estimated using the simple Interval Regression Method, and for brevity we will not describe them again. Even so, we observe some interesting differences with respect to the previous estimates, which appear to be worth analysing in more detail. The minor changes in the earnings equation estimates are consistent with the reduction in the return to schooling and to potential previous experience when estimated with the double simultaneous selection correction.
Above all, the significant changes concern the estimation of the return to knowledge of Catalan in the public and private sectors. Specifically, the apparently higher return to language knowledge in the public sector estimated by standard regression methods (i.e. without controlling for the endogenous selectivity) seems to be composed entirely of selection-bias effects. In fact, when we take into account the selection process behind sector choice and language proficiency and the correlation between the two selectivity mechanisms, the return to Catalan knowledge for public sector workers is statistically zero. In contrast, the estimated return to Catalan proficiency for private sector workers is significantly higher when the two simultaneous sources of selection are taken into account, representing almost 13% (≈ exp(d) - 1) of extra monthly earnings (note 23).
Table 5 (fragment; the row and column alignment of the following values could not be recovered, so they are listed in the order in which they appear)
[label missing]: -0.043; 0.032
Born in the rest of Spain: 0.047; 0.106
Born in Europe: 0.031; 0.083
Born in Latin America: -0.136; 0.059
Born in Africa: -0.218; -0.024
Years of Schooling: 0.054; 0.029; (-6.37); (-9.64)
Job Tenure (in months)/10: 0.01; 0.009; (-8.58); (-11.56)
(Previous) Experience/10: 0.051; 0.112
Experience²/100: -0.019; -; (-4.44)
Part-time Worker: -0.389; -0.438; (-8.23); (-20.04)
Union membership: 0.078; 0.088
#Workers>500: 0.055; -; (-3.73)
Self-Employed: 0.115; -; (-6.05)
Living in Barcelona: 0.022; 0.017
Notice also that the correlation coefficients between the unexplained earnings component and the error term of the sector choice equation are negative in both equations. This shows that an individual who is selected for work in the public sector performs worse than a random individual. However, the estimated correlation coefficient is clearly statistically significant only in the private sector equation, and not in the public sector equation (probably due to the reduced sample size). Moreover, the correlation between the earnings equation's error term and the unobservable determinants of Catalan proficiency is positive for public sector workers and negative for private sector workers. This could indicate that the apparently higher language return in the public sector may only reflect the fact that those public workers who are more likely to be proficient in Catalan are also more likely to earn more, and are also more likely to be allocated to that sector; nevertheless, we do not have sufficient statistical evidence to argue that this correlation is different from zero for public sector workers. In contrast, the significant correlation between the unobservable determinants of language proficiency and unexplained earnings is negative in the private sector, suggesting that those individuals whose propensity to know Catalan is largely determined by unobservable determinants of language knowledge earn less than the mean private sector worker.
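The percentage returns quoted in this section follow from the usual conversion of a coefficient in a log-earnings equation, exp(d) - 1. The short sketch below illustrates the arithmetic only; the function names are hypothetical, and the coefficient is backed out from the 13% figure reported in the text rather than taken from the tables.

```python
import math


def log_points_to_percent(d: float) -> float:
    """Percentage change in earnings implied by a log-earnings coefficient d."""
    return (math.exp(d) - 1.0) * 100.0


def percent_to_log_points(pct: float) -> float:
    """Inverse conversion: the coefficient that implies a given percentage change."""
    return math.log(1.0 + pct / 100.0)


# A 13% premium corresponds to a coefficient of roughly 0.12 log points.
d_private = percent_to_log_points(13.0)
print(round(d_private, 3), round(log_points_to_percent(d_private), 1))
```

For small coefficients the two measures are close, but the gap widens as d grows, which is why the text reports the exponentiated figure rather than the raw coefficient.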

Robustness Checks
The evidence presented up to this point indicates that, among the whole employed population in Catalonia, there is a positive relationship between Catalan proficiency and monthly earnings. This association is, apparently, more pronounced in the public sector than in the private sector. In contrast, when the double selection process (Catalan knowledge/sector choice) is taken into account, the language return is null for public workers, whereas it is clearly positive for private workers. However, the inclusion of the overall employed population may generate some concern about the presence of (neglected) individual heterogeneity and its potential effect on the empirical results (note 24). In this subsection we check the robustness of the results obtained, considering three different restrictions on the subsample used for the estimations.
First of all, we deal with a specific restriction of the database used in the empirical analysis. As commented above, the dependent variable used in the estimates consists of (interval-coded) monthly earnings, which means that the construction of an hourly wage measure is unfeasible (note 25). Moreover, the joint estimation of two equations, one for monthly earnings and another for hours of work, together with the complex double selectivity process, would introduce additional econometric complications. Consequently, we repeat the analysis excluding from the estimation sample all the individuals who work less than thirty or more than sixty hours per week (Restriction 1). A second point concerns the inclusion of self-employed workers among the group of private workers; indeed, the information about monthly earnings may have a very different meaning for self-employed workers, and the role of language knowledge might also be different for them. Therefore, self-employed workers are dropped from the estimation sample (Restriction 2).
Finally, we consider another issue related to the heterogeneity of the language acquisition process. The idea is that experiencing some pre-migration exposure to Catalan is almost impossible for those immigrants who come from non-Catalan-speaking regions (note 26). This means that the language acquisition process of adult immigrants from non-Catalan-speaking regions is hardly comparable to that of younger immigrants, and of those who were already exposed to the language during their childhood (i.e. native-born individuals and immigrants from Catalan-speaking regions). Therefore, we perform all the estimates including only native individuals, immigrants from Catalan-speaking regions and other immigrants who migrated to Catalonia at a (relatively) young age (Restriction 3). Instead of classifying individuals according to a fixed age-at-arrival threshold, we opted for including only those who migrated before their potential entry into the labour market (i.e. years since migration > potential experience (note 27), defined as age - years of schooling - 6).
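The three sample restrictions described above can be expressed as simple filters. The sketch below is illustrative only: the `Worker` record and its field names are hypothetical and do not correspond to variable names in the survey, and the thresholds are taken directly from the text (thirty to sixty weekly hours; years since migration greater than age - years of schooling - 6).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    age: float
    years_of_schooling: float
    weekly_hours: float
    self_employed: bool
    # True for native-born individuals and immigrants from Catalan-speaking regions.
    native_or_catalan_region: bool
    # None for individuals who never migrated.
    years_since_migration: Optional[float] = None


def potential_experience(w: Worker) -> float:
    """Potential labour market experience: age - years of schooling - 6."""
    return w.age - w.years_of_schooling - 6


def restriction_1(w: Worker) -> bool:
    """Keep workers with between thirty and sixty hours per week."""
    return 30 <= w.weekly_hours <= 60


def restriction_2(w: Worker) -> bool:
    """Drop self-employed workers."""
    return not w.self_employed


def restriction_3(w: Worker) -> bool:
    """Keep natives, immigrants from Catalan-speaking regions, and other
    immigrants who arrived before their potential labour market entry."""
    if w.native_or_catalan_region:
        return True
    return (w.years_since_migration is not None
            and w.years_since_migration > potential_experience(w))


def in_restricted_sample(w: Worker) -> bool:
    """All three restrictions applied jointly."""
    return restriction_1(w) and restriction_2(w) and restriction_3(w)
```

For example, a 40-year-old immigrant with 12 years of schooling has 22 years of potential experience, so this person stays in the Restriction 3 sample only if he or she has lived in Catalonia for more than 22 years.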
The estimates obtained under these three restrictions (separately and jointly implemented) are contained in table 6; for brevity we only report the estimates for the parameter of interest (the return to Catalan knowledge, d) for both sectors, with and without controlling for the double selectivity process (note 28). Consistent with the evidence obtained from the baseline specification (i.e. tables 3 and 5), the return to Catalan knowledge seems to be higher for public sector workers when the joint selectivity is not considered. But for each of the proposed restrictions, and even with a combination of the three, the estimated return is virtually zero when the two selection rules are taken into account. Moreover, the language return for private workers is always positive, but higher when the selection correction terms are included as additional regressors. In each case, the coefficients' standard errors indicate that the point estimates are not statistically different from the ones obtained using the baseline specification; however, as expected, the restrictions on the estimation sample generate a substantial loss of precision. In sum, these robustness checks suggest that the results obtained are quite stable, which makes us more confident about the general evidence regarding the economic value of Catalan knowledge in the public and in the private sector.
Restriction 3: including only native-born individuals, national immigrants from Catalan-speaking regions and other immigrants who migrated before entering the labour market (i.e. years since migration > potential labour market experience, computed as "age - years of schooling - 6").

Discussion and Conclusion
This paper investigates the economic value of knowledge of Catalan for private and public sector workers in Catalonia. The descriptive evidence and the results from a simple estimation indicate that, apparently, the earnings return to being able to speak and write Catalan (our measure of linguistic proficiency) is positive in both sectors, but is significantly higher for public workers. However, following the main literature, we argue that both knowledge of Catalan and the decision to work in the public or in the private sector are choice variables; this represents a double selectivity process, which must be taken into account in order to obtain a consistent estimate of the return to Catalan proficiency in the two sectors. In addition, in accordance with the Catalan institutional setting, we allow for potential correlation between these two selection rules, which we control for by implementing a double simultaneous selection correction of the earnings equations.
Once this complex self-selection process is taken into account, the results are completely different and are consistent with our ex-ante expectation. Specifically, on the one hand, the return to knowledge of Catalan is virtually zero for public workers when we control for selection on observables and unobservables into proficiency and sector choice, as well as for the significant positive correlation between the unobservable determinants of the two selectivity rules. On the other hand, when allowing for the double simultaneous selection process, the return to Catalan proficiency in the private sector is still positive and is significantly higher (rising from 5% to 13% of extra monthly earnings). These results suggest that there is no productivity effect of knowledge of Catalan among public sector workers, and that the positive economic value of language proficiency consists only in a higher chance of being selected into that sector.
In contrast, Catalan proficiency seems to increase productivity for private sector workers (assuming a correspondence between earnings and productivity), given that we obtain a positive earnings premium even when controlling for the double simultaneous selection. In more detail, we found that the return to language proficiency in the private sector is underestimated using a standard regression, because of the presence of negative selection effects. First, the propensity to be proficient in Catalan and the likelihood of working in the private sector are clearly negatively correlated. Second, the private sector workers who are more likely to be selected into the public sector perform worse than a random private sector worker. Third, those individuals who are more likely to be proficient in Catalan (keeping the observable determinants of language knowledge fixed) tend to earn less than a random individual. Especially with respect to the third point, it is quite likely that this negative selection of proficient workers operates through occupational choices, i.e. private workers who are more likely to be proficient in Catalan because of their unobserved language determinants are also more likely to be selected into low-paid occupations than others.
Indeed, a potential caveat of this work is that it neglects the role played by the type of occupation and its interrelation with language proficiency, in the spirit of Aldashev et al. (2009). However, we consider that occupation-type selection is a less relevant issue for estimating the return to knowledge of Catalan, because we believe and assume that education, and not Catalan proficiency, is the main channel for entering high-skill occupations. In other words, highly educated individuals may manage to enter highly paid occupations with or without being fluent in Catalan (e.g. in multinational firms where English is the main language spoken). In contrast, low-educated individuals are precluded from entering highly paid and high-skill occupations, regardless of their functional knowledge of Catalan. Even so, occupational components may account for some part of the estimated productivity effect for private sector workers; therefore, extending the selection process to a potential occupational selection for private sector workers would be an interesting issue for future research into the economic value of knowledge of Catalan in the labour market.
In any case, the overall results show the existence of a positive economic value of the knowledge of Catalan. Even though we still need to clarify whether the positive estimated value for private sector workers corresponds to a productivity effect or to an occupational effect, it is clear that knowledge of Catalan only represents a selection effect in the public sector, i.e. Catalan proficiency does not increase the productivity of public sector workers. This result clearly calls into question the strict regulation and the high requirements of knowledge of Catalan in the Catalan public sector. On the one hand, it seems that, after accounting for self-selection, linguistic proficiency is not associated with higher productivity of public sector workers, and merely represents a requirement for being hired in that sector. On the other hand, the results also indicate that the probability of being proficient in Catalan is strongly related to individual characteristics which are also related to labour market success (e.g. age, origins, education). This means that, in all likelihood, many disadvantaged individuals are prevented from being able to speak and write Catalan because of the same characteristics that tend to penalize them in the labour market. As a consequence, the strict regulations on language requirements for entering the public sector represent a clear barrier to them, and may be responsible for some discrimination in the labour market. In terms of policy implications, it is quite possible that lowering the linguistic requirements for working in the public sector may generate a positive effect, at least in terms of equity; this is especially true if we consider the historical role of public sector occupation in Mediterranean countries as a social safety net.

Notes
1. In fact, Spanish (or Castilian) is taught as a second language in pre-university education. At university the language used is not determined by law, and is established by the professor.
2. With the 1998 Act, private suppliers of public services have been subjected to almost the same linguistic requirements as the public sector. Moreover, the Catalan government has introduced economic incentives for "normalizing" Catalan in private firms and stimulating active learning of this language by workers; finally, the government has also promoted the language through the mass media by introducing incentives for the use of Catalan on radio, TV and also in newspapers and written publications in general.
3. Moreover, given that the information on language knowledge is habitually self-reported in the surveys, misclassification/measurement error is also a problematic issue in the empirical literature, which introduces additional methodological complications; see Dustmann and Van Soest (2001, 2002) for further details.
4. Their work is focused on the Spanish labour market in Ginsburgh and Prieto-Rodriguez (2007), whereas they extend the analysis to nine European countries (including Spain) in Ginsburgh and Prieto-Rodriguez (2011).
5. Consistently, individual earnings in the public sector are determined on the basis of educational attainment, professional category and seniority. This means that two identical individuals who are working in the same occupation cannot be discriminated against in terms of earnings on the basis of language proficiency; however, during the hiring stage, there may be significant discrimination in favour of the candidate with knowledge of Catalan.
6. The value reported is the mean unemployment rate between the fourth quarter of 2005 and the third quarter of 2006. The information is taken from the EPA (Encuesta de la Población Activa, Active Population Survey (INE)) for Catalonia. Similar values for the unemployment rate in Catalonia had not been recorded since 1978.
7. We also tried to estimate a Probit model for employment, but it performed very poorly due to the extremely low number of zeros (unemployed individuals).
8. Indeed, Rendon (2007) recognizes that an important limitation of his study is due to the Census data that he used, which does not permit the estimation of the return to Catalan knowledge in terms of earnings (because this information is not reported in the database).
9. This means that we may be estimating the lower bound of the true return to Catalan knowledge; in fact, it is quite reasonable to assume that individuals tend to overreport their true language abilities. Therefore, we believe that individuals who claim to be able to speak and write Catalan have at least an acceptable functional knowledge of the language.
10. The information on individual earnings is presented in brackets in the ECVHP06 survey. We adopt the standard solution of estimating an interval regression ("intreg" command in Stata); in any case, using a continuous earnings variable built from the midpoints of each earnings interval yields almost the same results; see table 1A in the Appendix for more details about the coding of monthly earnings.
11. Another potential explanation for this descriptive evidence could be that public sector workers who speak and write Catalan are more likely to work in Local Government Institutions, which may pay higher salaries than Central State Institutions; we are grateful to an anonymous referee for this suggestion.
12. Descriptions of each explanatory variable can be found in table 1A in the Appendix.
13. In fact, consider two similar individuals who differ only with respect to knowledge of Catalan, because one of them is fully proficient and the other is not: the former has a clear comparative advantage over the latter for entering the public sector, which would mean a higher likelihood of working in that sector (ceteris paribus).
14. Notice that, since the covariance between e_PUB1 and e_PUB0 is not directly identifiable, the variance-covariance matrix has been split into two matrices.
15. Given that the dependent variable is defined in intervals (see table 1A in the Appendix), we use the Interval Regression Method, which is considered a better approximation for this kind of dependent variable; we thank an anonymous referee for this suggestion. In any case, the estimation by OLS using the midpoint of each interval as the dependent variable yields almost the same results (available upon request).
16. Please notice that these earnings differentials by origin represent "ceteris paribus" evidence. The results from a model with only origin dummies indicate that immigrants from other Spanish regions tend to earn less than natives; this differential is progressively reduced once other determinants of earnings are introduced into the model, and reversed once Catalan proficiency is controlled for. This means that a significant earnings gap by origin is explained by differences in characteristics that are in some way associated with origin, among which language knowledge represents an important element.
17. Notice that, as usually argued in the literature, schooling could be an endogenous variable and its coefficient should not be interpreted in causal terms. Even so, we tried to instrument years of schooling using parental education and parental occupational status, and the results for the return to Catalan knowledge (the variable of interest in this paper) are virtually the same (even controlling for the endogeneity of Catalan proficiency).
18. The coefficient estimates for quadratic job tenure in both equations and for previous experience in the public sector earnings equation are not statistically different from zero, and these variables have been dropped from the equations. Their exclusion does not modify the rest of the results.
19. Even so, the negative sign of the interaction (Born in Spain)×YSM indicates that immigrants from the rest of Spain are less likely to be proficient in Catalan as the length of their stay increases; therefore, the advantage of individuals who were born in Spain with respect to foreigners decreases with time spent in Catalonia. This shows that individuals who came from the rest of Spain in the past may have had fewer incentives to learn Catalan, since they may well have arrived when the use of this language was still restricted to oral communication.
20. We also tried to include the number of children, but its coefficient is not statistically different from zero (a common result in the literature); moreover, including or excluding this variable does not modify the overall results obtained.
21. Also in this case, we use the Interval Regression Method to better account for the interval coding of the dependent variable; as before, the results obtained using OLS with the midpoint of each interval are statistically the same.
22. Specifically, we display the z-statistics obtained with the bias-corrected bootstrapped standard errors, which have been computed with 1000 replications.
23. Estimating the return to Catalan proficiency with two independent correction terms yields similar results in qualitative terms but, for both sectors, the estimated coefficients are somewhat higher than when we control for the correlation between the two selection rules (the results are not shown and are available upon request). This means that, to some extent, a part of the positive effect of knowledge of Catalan is captured by its correlation with sector choice, which must be taken into account in order to obtain an unbiased estimate of the true value of knowledge of Catalan for private and public workers.
24. This analysis is mainly driven by the useful suggestions received from an anonymous referee, which were greatly appreciated by the authors.
25. In the baseline specification we just control for part-time workers by introducing an indicator variable, considering that it is less likely to be endogenously determined with monthly earnings than the hours of work.
26. That is, it is quite hard to find an individual who acquired some knowledge of Catalan before migrating to Catalonia, with the exception of those coming from Catalan-speaking regions (mainly Valencia and the Balearic Islands).
27. This definition would also capture the fact that these are immigrants who completed their education in Catalonia; even if Catalan was not (yet) the language of education when they moved, contact with Catalan-speaking pupils at school facilitates the assimilation of the language and the local culture. In any case, the results are the same using a fixed value of the age at migration, even with very restrictive thresholds. In fact, 776 of the 950 immigrants who satisfy this condition migrated when they were 10 or younger.
Previous Experience = potential work experience prior to the current job (age - schooling - (job tenure (in months))/12 - 6).
*Variables constructed by IDESCAT staff from the original registers of the survey (maximum disaggregation).

Table 2 MEANS AND EQUALITY TESTS FOR MONTHLY EARNINGS BY ECONOMIC SECTOR AND LANGUAGE KNOWLEDGE
Columns: expected log monthly earnings by economic sector (Public, Private); difference in means; t-statistic. First row: Total sample.
Note: Interval regression estimates. Robust standard errors in italics.

Table 6 ROBUSTNESS CHECKS FOR THE RETURN TO CATALAN KNOWLEDGE (d)
Age = midpoints of the original age variable collected in intervals.
Arrived at 10 or younger* = 1 if the individual arrived in Catalonia at age 10 or younger, 0 otherwise.
Complete Normalization* = 1 if the individual was born after 1977, and arrived at age 6 or younger if born outside Catalonia, 0 otherwise.
Partial Normalization* = 1 if the individual was born between 1969 and 1977 and arrived younger than age 16 if born outside Catalonia, 0 otherwise.
Years of Schooling = 4 if the individual has no education, or if he/she has completed only primary education (grouped in the original database); 7 for lower-secondary uncompleted; 8 for lower-secondary completed; 12 for vocational education; 13 for general upper-secondary education; 18 for completed tertiary education.
Job Tenure (in months) = midpoint of the original variable collected in intervals (< 2 years,