The Evolution of Educational Inequalities in Spain: Dynamic Evidence from Repeated Cross-Sections

A lack of longitudinal data prevents many countries from estimating dynamic models and, thus, from obtaining valuable evidence for policymaking in the field of education. This is the case of Spain, where recent education reforms have targeted secondary schools, but their design has been based on incomplete information regarding the evolution of student performance and far from robust evidence concerning just when educational inequalities are generated. This paper addresses the absence of longitudinal data required for performing such analyses by using a dynamic model with repeated cross-sectional data. We are able to link the reading competencies of students from the same cohort that participated in two international assessments at different ages (9/10 and 15/16) and so identify when educational gaps (in terms of gender, socio-economic status and place of birth) emerge. Our results suggest that educational inequalities in Spain originate in lower educational levels. These results stress the importance of early intervention for improving performance during compulsory education and for tackling educational inequalities.


Introduction
Education plays a major role in skills acquisition. However, as this is a cumulative process (Cunha et al. 2010), inequalities in the acquisition of these skills can emerge at different stages of life, and identifying these moments is a necessary step for the effective design of education policies. Reducing educational inequalities is relevant not only from an equity point of view (for example, Jerrim and MacMillan (2015) show that education is one of the main channels through which the Great Gatsby Curve seems to operate) but also for enhancing educational efficiency. For example, recent reports highlight the fact that some of the top-performing countries in international educational assessments are also amongst the most equitable (OECD 2016). Research has also shown that socioeconomic inequalities may emerge early in students' lives (Feinstein 2003; Cunha and Heckman 2007; Heckman 2011), but this evolution may not be homogeneous across countries. Le Donné (2014), for example, shows that the interaction between the institutional features of the education system and the socioeconomic status of schools and students plays an important role in driving the effect of social inequalities on cognitive achievement. Thus, policy makers interested in reducing educational inequalities need to identify the moment when socio-economic gaps in performance are generated in their educational system. However, this critical information is not available for many countries.
In practical terms, understanding the impact on academic achievement of the set of individual, household, school and social factors included in the education production function typically requires the use of longitudinal information. Yet, the fact that such panel data are not available in many countries places a major constraint on researchers and policymakers. Given this situation, it is essential to try to identify alternative methodological strategies. One such alternative is the use of repeated cross-sections (RCS) which allow information on different individuals pertaining to the same cohort to be gathered.
RCS are more abundant than panel data and, under certain conditions (formalized by Moffitt 1993; Verbeek and Vella 2005), they are useful for providing consistent estimates in dynamic models of achievement. To the best of our knowledge, only De Simone (2013) and Contini and Grand (2015) have applied this methodology to dynamic achievement models, focusing on the evolution of the socioeconomic gap between primary and secondary school in Italy. There are nevertheless some discrepancies in their results, probably due to a combination of factors related to the use of different datasets and identification strategies.
Spain is an ideal country for performing this exercise. To begin with, there is an urgent need to provide evidence on the moment in which performance gaps and educational inequalities arise. Seven General Education Acts have been passed since 1978 and the latest of these, the 2013 Organic Law for the Improvement of Quality in Education (LOMCE), focuses its reforms specifically on lower-secondary education, given the poor performance of Spanish students in international assessments (specifically PISA). Among other measures, the LOMCE stresses the need to raise the profile of school principals, foster greater autonomy of schools, introduce new external assessment tests at the end of primary and lower-secondary education and initiate tracking between academic and vocational pathways from the age of 15 (as opposed to the current age of 16).
These reforms were drawn up on very little solid evidence and, although Choi and Jerrim (2016) provide an initial analysis from a comparative perspective (their results appearing to indicate that educational inequalities emerge long before children enter secondary school), further research is needed to clarify what are critical questions for policymakers. Indeed, previous studies have shown the existence of important educational inequalities at different stages of the Spanish educational system. For example, MEC (2016) reports that the performance gap between 4th grade students whose parents have completed higher education and those whose parents have completed at most lower secondary education is smaller than the OECD average. However, the conclusion is the opposite (that is, educational inequalities at ages 9/10 are larger in Spain than the OECD and EU averages) when the occupational category of parents is considered instead of their educational level. Furthermore, OECD (2016) shows that 15-year-old Spanish students from low socioeconomic backgrounds face a 600% larger risk of obtaining a low score in the scientific competencies assessed by PISA than their high socioeconomic status counterparts. This figure is among the highest across the OECD countries (the OECD average is 441%). The effect of parental socioeconomic status has also been linked by authors such as Fernández-Macías et al. (2013) or Guio et al. (2017) to one of the main problems of the Spanish educational system, the high early school dropout rate (19% in 2017). However, there is very little evidence on the evolution of these inequalities (Fernández 2014). The current lack of evidence for Spain may well reflect the absence of adequate longitudinal data for assessing such questions. However, because various Spanish cohorts have participated in several international assessments, we are able to exploit the strategy proposed by Moffitt (1993).
The contribution of this article is twofold: first, it describes the evolution of educational inequalities by gender, country of birth and socio-economic status (SES) in Spain between the ages of 9/10 (primary education) and 15/16 (lower-secondary education). Second, it combines RCS from two different international assessment tools, the Progress in International Reading Literacy Study (PIRLS) and the Programme for International Student Assessment (PISA), and employs a strategy that should widen the number of countries capable of overcoming their data constraints through the use of RCS. In addition, and given its widespread use in Spain, we explore the effect of grade retention at the lower-secondary school level on academic performance.
This paper now proceeds as follows: Sect. 2 reviews the conditions that have to be met in order to estimate dynamic models with RCS. Section 3 describes the data. Section 4 outlines the empirical approach employed to implement the analysis and discusses the main results and policy implications. Section 5 concludes.

Methodology
Building on the idea that the formation of human capital is a cumulative process, the learning contribution of each stage in the educational process is added to the learning acquired in the previous period. Here, we present a methodology for examining the impact of a set of individual and household-level characteristics on reading competencies at age 15/16, considering previous achievement at age 9/10. Educational inequalities may emerge during this process and understanding the evolution of these inequalities and whether they are reduced or not is crucial to improving the education system. In this regard, we assume a linear autoregressive model, Eq. (1), the theoretical properties of which provide a good representation of a cumulative learning process. In this model, $Y_{i,t}$ and $Y_{i,t-1}$ account for the performance of student $i$ during two stages of her schooling (i.e., secondary and primary school, respectively), $X_i$ is a set of time-invariant determinants of cognitive skills, and $\varepsilon_{i,t}$ is the error term. Our aim is to identify how the total effect of the individual and household-level variables on education performance evolves over time. These gross effects are composed of direct effects, as well as of indirect effects working through school and peer characteristics. Other time-variant characteristics are deliberately excluded from the estimation to ensure consistency of the model. Therefore, our set of explanatory variables is time-invariant. In Sects. 2.1 and 2.2, we address the conditions for the identification and consistent estimation of Eq. (1) using imputed regression methodology on our sources of data.
To analyse the contribution of each stage of schooling to the competencies acquired by students, we allow our parameters to change over time, given that the effect of the explanatory variables is not expected to be constant over the whole process. Therefore, we need to consider both assessments separately and estimate one equation for each stage of the student's schooling: Eq. (2) for primary school achievement and Eq. (3) for secondary school achievement. We are particularly interested in the parameter $\beta$ (the coefficient on the explanatory variables in the secondary school equation), which indicates differentials in achievement between both stages conditioned on primary school performance. Besides, the relation between $\gamma$ (their coefficient in the primary school equation) and $\beta$ measures the evolution of learning inequalities:
a. If $\gamma \neq 0$ and $\beta = 0$, the effect of the explanatory variables is centred on primary school, and students catch up in secondary school conditioned on previous achievement.
b. If $\gamma = 0$ and $\beta \neq 0$, learning inequalities emerge at secondary school conditioned on primary school achievement.
c. If $\gamma$ and $\beta$ have the same signs, inequalities increase, and if they have opposite signs they decrease or change direction.
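For reference, a specification consistent with the definitions above can be sketched as follows. Only $Y$, $X$, $\beta$, $\gamma$ and $\varepsilon_{i,t}$ are named in the text; the intercepts $\alpha$, $\alpha_1$, $\alpha_2$, the persistence parameter $\theta$ and the primary-stage error $\varepsilon_{i,t-1}$ are notational assumptions introduced here for illustration.

```latex
% Sketch only: \alpha, \alpha_1, \alpha_2, \theta and \varepsilon_{i,t-1} are assumed notation.
\begin{align}
Y_{i,t}   &= \alpha + \theta\, Y_{i,t-1} + \beta X_i + \varepsilon_{i,t} \tag{1}\\
Y_{i,t-1} &= \alpha_1 + \gamma X_i + \varepsilon_{i,t-1} \tag{2}\\
Y_{i,t}   &= \alpha_2 + \theta\, Y_{i,t-1} + \beta X_i + \varepsilon_{i,t} \tag{3}
\end{align}
% Substituting (2) into (3) gives the gross effect of X at age 15/16:
\begin{equation*}
Y_{i,t} = (\alpha_2 + \theta\alpha_1) + (\beta + \theta\gamma)\, X_i
          + \varepsilon_{i,t} + \theta\,\varepsilon_{i,t-1}
\end{equation*}
```

Under this sketch, and assuming $\theta > 0$, a static regression of $Y_{i,t}$ on $X_i$ would recover $\beta + \theta\gamma$ rather than $\beta$. This is why $\gamma$ and $\beta$ of the same sign reinforce each other (case c), while $\beta = 0$ means the whole gap at age 15/16 is inherited from primary school (case a).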

Estimation of the Dynamic Model in the Absence of Panel Data: Imputed Regression
In order to estimate Eq. (3) as it stands, we need longitudinal data about the students' performance. Unfortunately, this data is not available for Spain so, as an alternative empirical strategy, we use data from independent cross-sectional surveys conducted at primary and secondary schools. Here, we draw on the previous work developed by Moffitt (1993) and, later, by Verbeek and Vella (2005), which discusses the conditions for the identification and consistent estimation of linear dynamic panel data models with RCS.
The main challenge is obtaining information about $Y_{i,t-1}$ in the absence of panel data. Basically, Moffitt (1993) proposes replacing the lagged dependent variable $Y_{i,t-1}$ in Eq. (3) with an estimated value $\hat{Y}_{i,t-1}$ based on an auxiliary regression on individuals from previous cross-sections that share the same observed characteristics. Moreover, Verbeek and Vella (2005) argue that to obtain consistent estimates, the explanatory variables must be time-invariant or time-variant variables that are not auto-correlated. Our set-up meets this requirement by construction, as all our exogenous variables are time-invariant individual and household characteristics. However, if exactly the same set of independent variables is included in Eqs. (2) and (3), the model is not identified when substituting the lagged dependent variable with its corresponding estimate, as $\hat{Y}_{i,t-1}$ is a linear combination of the explanatory variables. Thus, to address this multicollinearity, we need to find additional time-invariant regressors, $W$, that fulfil two specific conditions:
a. They must be correlated with $Y_{i,t-1}$ and cannot be relevant for $Y_{i,t}$ once $Y_{i,t-1}$ is taken into account.
b. They must be observed at each stage of the educational process.
When we impose these conditions upon our model, $W$ enters the primary school equation but not the secondary school equation, and $Y_{i,t-1}$ in the latter is substituted by its OLS estimate $\hat{Y}_{i,t-1}$. By including additional regressors, $W$, that fulfil the above conditions, the measurement error in primary education achievement, $(Y_{i,t-1} - \hat{Y}_{i,t-1})$, is not correlated with the $X$'s. Besides, the measurement error is also uncorrelated with the predicted lagged dependent variable, by the properties of OLS. Hence, our model is identified and OLS estimates can be considered consistent.
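The substitution step can be written out as a short derivation. This is a sketch in which $\delta$ (the coefficient on $W$), the intercepts, $\theta$ and the primary-stage error are assumed notation rather than the authors' own symbols.

```latex
% Sketch only: \delta, \alpha_1, \alpha_2, \theta and \varepsilon_{i,t-1} are assumed notation.
\begin{align*}
Y_{i,t-1} &= \alpha_1 + \gamma X_i + \delta W_i + \varepsilon_{i,t-1} \\
Y_{i,t}   &= \alpha_2 + \theta\, Y_{i,t-1} + \beta X_i + \varepsilon_{i,t} \\
          &= \alpha_2 + \theta\, \hat{Y}_{i,t-1} + \beta X_i
             + \underbrace{\varepsilon_{i,t}
             + \theta\,(Y_{i,t-1} - \hat{Y}_{i,t-1})}_{\text{composite error}},
\qquad
\hat{Y}_{i,t-1} = \hat{\alpha}_1 + \hat{\gamma} X_i + \hat{\delta} W_i
\end{align*}
```

With $\delta \neq 0$, $\hat{Y}_{i,t-1}$ varies independently of $X_i$, which is what breaks the collinearity that arises when the two equations share exactly the same regressors.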

Selection of Additional Explanatory Variables (W)
To the best of our knowledge, only De Simone (2013), using TIMSS, and Contini and Grand (2015), drawing on Italian data, have applied this methodology to the analysis of achievement inequalities between primary and secondary school. Here, we adopt an identification strategy that relies on two variables: month of birth and attendance of pre-primary education. We expect these variables to have a strong impact during early stages of education, while their effect during lower secondary schooling, if any, should operate via the students' previous performance. While we are unable to check this condition directly for Spain (again, owing to a lack of longitudinal data), there is an abundant literature indicating that both are suitable variables.
In the case of the first variable (month of birth), Crawford et al. (2007a, b, 2013) and Robertson (2011) report that the differences in academic performance attributable to this variable diminish as children grow older. But while Crawford et al. (2007b) find these differences still to be significant at age 16, Robertson (2011) shows that the gap has been eliminated by eighth grade (age 13/14). A more detailed discussion on the suitability of using month of birth as a means for identification can be found in Contini and Grand (2015).
As for the second identification variable, there is an established strand in the Economics of Education literature that investigates the effect of school-entry age on educational achievement and other outcomes. A common finding is that attendance of pre-primary education has a large positive effect during lower grades, but that it weakens over time (Bedard and Dhuey 2006; Black et al. 2011; Fletcher and Kim 2016). Crawford et al. (2007a) found that the large and significant differences observed in educational performances do not lead to pervasive differences in adulthood. Likewise, Elder and Lubotsky (2009) present evidence that age-related differences in academic performance dissipate as children advance in their schooling, the authors attributing most of the initial differences to the accumulation of skills before children enter kindergarten.

Data
Since the 1990s, Spain has participated in various international assessments gathering cross-sectional information on student performance in relation to a number of competencies. Having specified above the conditions for applying an RCS strategy, it is clear that we need to identify at least two assessments that (i) follow the same cohort of Spanish students at different points in time; (ii) measure performance in similar competencies; and (iii) include the same information about the students' characteristics and background. Below, we discuss the suitability of PIRLS 2006 and PISA 2012 for performing this analysis.
The OECD's PISA assesses the reading, mathematics, science and problem-solving competencies of 15-year-old students, on a triennial basis. However, it does not follow the evolution of students over time and it provides no information regarding their previous achievement. A total of 65 countries, 34 belonging to the OECD and 31 partner countries, participated in the PISA 2012 assessment (OECD 2014a). PISA 2012 assessed students born in 1996, that is, in the case of Spain, students who are typically enrolled in their last year of compulsory secondary school (ESO).
PIRLS, conducted every five years by the International Association for the Evaluation of Educational Achievement (IEA), located at Boston College's Lynch School of Education, assesses student reading achievement in fourth grade and, in 2006, was implemented in 40 countries. As such, our analysis focuses solely on reading competencies. PIRLS and PISA are regarded as being representative at the national level, share similar sampling designs and response rates and, interestingly for our purposes here, most students participating in PIRLS 2006 were born during 1996 and so belong to the same cohort as PISA 2012 students. However, certain adjustments had to be made to enhance comparability of the two assessments. In the case of the PIRLS database, we discarded those students not born in 1996, so that none of our final sample had repeated a grade during primary school. Likewise, we also removed from the PISA database students that reported having repeated at least one grade during primary school. Additionally, we eliminated from PISA 2012 first generation immigrants who reported arriving in Spain after 2006 and who, as a result, could not have participated in PIRLS 2006. However, this means we have to assume there was no international mobility of students during the period. As will be seen, we impose one more restriction: we assume no cross-regional mobility within Spain during the period.
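The sample restrictions described above amount to a few row filters. The sketch below is purely illustrative: the field names (`birth_year`, `repeated_primary`, `arrival_year`) are hypothetical and do not correspond to the actual PIRLS or PISA variable names.

```python
def restrict_pirls(students):
    """Keep PIRLS students born in 1996 (hypothetical field name)."""
    return [s for s in students if s["birth_year"] == 1996]


def restrict_pisa(students):
    """Drop primary-school repeaters and immigrants who arrived after 2006.

    `arrival_year` is assumed to be None for students born in Spain.
    """
    kept = []
    for s in students:
        if s["repeated_primary"]:
            continue  # repeated at least one grade during primary school
        arrival = s.get("arrival_year")
        if arrival is not None and arrival > 2006:
            continue  # could not have been sampled in PIRLS 2006
        kept.append(s)
    return kept


# Toy records, for illustration only.
pirls = [{"id": 1, "birth_year": 1996}, {"id": 2, "birth_year": 1995}]
pisa = [
    {"id": 1, "repeated_primary": False, "arrival_year": None},
    {"id": 2, "repeated_primary": True, "arrival_year": None},
    {"id": 3, "repeated_primary": False, "arrival_year": 2004},
    {"id": 4, "repeated_primary": False, "arrival_year": 2008},
]
pirls_ids = [s["id"] for s in restrict_pirls(pirls)]
pisa_ids = [s["id"] for s in restrict_pisa(pisa)]
```

In the toy data, only the 1996-born PIRLS student survives, and the PISA filter keeps the native non-repeater and the immigrant who arrived before the PIRLS assessment year.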
Throughout the following analysis, we account for the clustering of children within schools in both assessments by making the appropriate adjustment to the estimated standard errors (using either the Stata 'repest' or 'pv' survey commands). Weights, which attempt to correct for bias induced by non-response, while also scaling the sample up to the size of the national population, have been applied throughout the analysis.
As discussed, our strategy is to treat the results from PIRLS 2006 (the auxiliary sample) as an indicator of student reading competencies towards the end of primary school, and those from PISA 2012 (the main sample) as an indicator of reading competencies towards the end of compulsory secondary school. However, there are differences between the skills being measured by the two assessments: PIRLS focuses upon children's reading performance in an internationally agreed curriculum; PISA focuses on reading competencies, that is, the use of skills in everyday situations. Jerrim and Choi (2014: 353), in discussing the two, conclude that we cannot rule out the possibility of there being some 'subtle' differences in the precise skills being measured. As such, we recognize this limitation and proceed with due caution.
Differences also occur in the respective score metrics used by PIRLS and PISA. Although they both use a set of five plausible values for measuring reading competencies, with a mean of 500 and a standard deviation of 100, the assessments base the performance scores on two different sets of countries. This means the results are not directly comparable, as the countries participating in the two assessments are not the same. We overcome this by adopting the approach proposed by Brown et al. (2007), that is, we transform the test scores from each survey into international z-scores with a mean of 0 and a standard deviation of 1, across the 25 jurisdictions participating in PIRLS and PISA.
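The standardisation step can be illustrated with a minimal sketch. It pools the scores from the common jurisdictions and rescales them to mean 0 and standard deviation 1. The function name `to_z_scores`, the toy scores and the omission of survey weights are assumptions made here for brevity, not a description of the exact procedure in Brown et al. (2007).

```python
from statistics import fmean, pstdev


def to_z_scores(scores):
    """Rescale pooled scores to mean 0 and standard deviation 1 (unweighted)."""
    mu = fmean(scores)
    sigma = pstdev(scores)  # population standard deviation of the pooled scores
    return [(s - mu) / sigma for s in scores]


# Toy pooled scores on the original 500/100-type metric (illustrative only).
pooled = [400.0, 450.0, 500.0, 550.0, 600.0]
z = to_z_scores(pooled)
```

Applying the same transformation separately to each survey, over the same set of jurisdictions, puts both assessments on a common relative scale, although it does not remove differences in the skills each test measures.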
Finally, PIRLS and PISA provide comparable information on time-invariant student background characteristics, which are required to estimate the evolution of performance gaps across time. School characteristics, which are also available in the two assessments, are not used, as the individuals in the RCS differ. Moreover, the names of the schools are coded in both assessments and, even if we were able to identify them, it would not be possible to link the primary schools in PIRLS to the students in PISA. Both assessments provide information on gender, month of birth, attendance of pre-primary education, place of birth of students and their parents, and background characteristics. It is important to consider the timing of potential gender differences, as Spanish girls, as in most countries (OECD 2014a, 2016), outperformed boys in the PISA 2012 and 2015 reading competencies. Likewise, immigrants in Spain tend to achieve worse results than native students, and their performance improves with time spent in the country (Zinovyeva et al. 2014). We therefore include in our estimation controls for first and second-generation immigrants to capture this source of inequality. We proxy SES using two variables: the highest level of parental education and the number of books in the home. The choice between these variables is not trivial. Bukodi and Goldthorpe (2012) discuss the independent and distinctive effects of the different components of socioeconomic status. The positive relationship between the education of parents and that of their children has been studied in depth by the intergenerational mobility literature (Holmlund et al. 2011). In the case of the number of books in the home, Jerrim and Micklewright (2014) have raised some concerns, which we acknowledge here, as to whether it is a robust proxy for SES and regarding the accuracy of its measurement. However, given that this variable has been frequently used as a proxy for SES (Schütz et al. 2008; Hanushek and Wößmann 2011, among others), we estimate our models twice, employing the two variables separately.
Finally, in line with Contini and Grand (2015), we introduce regional (Comunidad Autónoma) dummies, which requires us to assume that students did not migrate across regions during the 2006-2012 period. This is particularly important in Spain given the existence of decentralized educational competences that might lead to regional differences. The multiple imputation by chained equations (MICE) algorithm (Royston and White 2011; StataCorp 2013) is applied in both databases to account for missing data.

Empirical Approach, Results and Discussion
Below, we specify the application of the two-step methodology adopted here to create a pseudo-panel that combines microdata from two international cross-sectional databases, namely, PIRLS 2006 and PISA 2012. These two tools assess the same cohort of students at two different moments in time: when the students are 9/10 (2006) and when they are 15/16 (2012).

First Stage: Estimating Achievement at Age 9/10
Our aim in the first stage is to estimate predicted reading skills of students aged 15/16 in 2012, taking into account their performance six years earlier. Thus, using PIRLS 2006 data, we first estimate the determinants of their academic achievement in reading at age 9/10. In this linear model, the dependent variable takes into account the five plausible reading scores provided by PIRLS, while the independent variables comprise a battery of individual and household-level time-invariant variables, available and identical in both PIRLS (2006) and PISA (2012); summary statistics are presented in Tables 3 and 4 in the Appendix, respectively. The results of the education production function in PIRLS are shown in Table 1.
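Estimates based on plausible values are conventionally obtained by fitting the model once per plausible value and then combining the results. A minimal sketch of that combination rule follows. The function name and the numbers are illustrative assumptions, and in practice this step is handled by the survey commands mentioned in the Data section.

```python
from math import sqrt
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine one coefficient estimated separately on each plausible value.

    The point estimate is the average across plausible values. The total
    variance adds the average sampling variance to the between-value
    (imputation) variance, inflated by (1 + 1/M).
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return point, sqrt(total)


# Illustrative coefficient estimates from five plausible values.
estimates = [0.30, 0.32, 0.28, 0.31, 0.29]
sampling_variances = [0.0025] * 5
point, se = combine_plausible_values(estimates, sampling_variances)
```

The combined standard error is larger than the per-regression one (0.05 in this toy example) because it also reflects the uncertainty about each student's true proficiency.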
We first focus on the analysis of the additional explanatory variables (W) that allow the estimation of our model: month of birth and attendance of pre-primary education. The fact that both variables are statistically significant indicates their relevance during early stages of education, which is reassuring for our identification purposes. Moreover, the negative impact on reading scores at age 9/10 of having attended ISCED0 (pre-primary) for less than one year and of being born in the final months of the year is consistent with previous studies. For example, research in human capital development has emphasised that differences in children's cognitive skills emerge at early ages, and therefore early investments (e.g. pre-primary schooling) provide the support for later attainment (Carneiro and Heckman 2004; Cunha and Heckman 2008; Almond and Currie 2011). Regarding month of birth, previous research has found that children who are older within their academic cohort achieve better examination results, on average, than their younger peers (Bedard and Dhuey 2006; Datar 2006; Puhani and Weber 2007; McEwan and Shapiro 2008; Smith 2009; Black et al. 2011; Fredriksson and Öckert 2014). This pattern is consistent across countries for children at early stages of education.
All the remaining variables included in the estimation are significant, with the exception of gender and some of the regional dummies. Their coefficients have the expected signs and values. In primary education, there appear to be no gender differences in reading scores. Belonging to an immigrant household (first or second generation) has a negative influence on scores. In contrast, a household's socio-economic background, proxied through the parents' highest level of education (or the number of books in the home; see Table 5 in the Appendix, first column), is significantly related to children obtaining higher reading scores. As in similar studies (Contini and Grand 2015), the model's goodness-of-fit is not high, as time-variant and school-level variables are not included in the analysis.

Second Stage: Estimating Achievement at Age 15/16
In the second stage, we apply the parameters obtained in the first regression to the PISA sample and obtain the predicted value that a student in this PISA database would have obtained on PIRLS. To do so, we add an additional column to the PISA 2012 database: i.e. the student's predicted score on PIRLS 2006. In the PISA 2012 sample, the predicted z-scores of earlier reading achievement have a mean of 0.151 and a standard deviation of 0.326.
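The two steps can be sketched on simulated data. Everything in the block is an assumption for illustration: the data-generating coefficients (0.5, 0.8, 0.6, 0.3), the sample sizes, a single regressor `x` standing in for the background variables and a single `w` standing in for the identification variables. Weights, clustering and plausible values are ignored, and a static regression that omits prior achievement is included for comparison.

```python
import random


def ols(y, rows):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n, rng):
    """Draw (x, w, y1, y2): w affects y2 only through earlier achievement y1."""
    data = []
    for _ in range(n):
        x, w = rng.gauss(0, 1), rng.gauss(0, 1)
        y1 = 0.2 + 0.5 * x + 0.8 * w + rng.gauss(0, 1)   # primary stage
        y2 = 0.1 + 0.6 * y1 + 0.3 * x + rng.gauss(0, 1)  # secondary stage
        data.append((x, w, y1, y2))
    return data


rng = random.Random(0)
aux = simulate(5000, rng)           # auxiliary sample: y1 observed, y2 unused
main_sample = simulate(20000, rng)  # main sample: y2 observed, y1 unused

# Step 1: regress earlier achievement on x and w in the auxiliary sample.
first = ols([y1 for _, _, y1, _ in aux], [(1.0, x, w) for x, w, _, _ in aux])

# Step 2: impute earlier achievement in the main sample, then regress y2 on
# the imputed value and x only (w is excluded from this equation).
y1_hat = [first[0] + first[1] * x + first[2] * w for x, w, _, _ in main_sample]
y2 = [row[3] for row in main_sample]
dynamic = ols(y2, [(1.0, yh, row[0]) for yh, row in zip(y1_hat, main_sample)])
static = ols(y2, [(1.0, row[0]) for row in main_sample])

theta_hat, beta_hat = dynamic[1], dynamic[2]
static_beta = static[1]
```

In this simulation, the dynamic coefficients should land near the values used to generate the data (0.6 on imputed prior achievement and 0.3 on `x`), whereas the static coefficient on `x` should be close to 0.3 + 0.6 × 0.5 = 0.6, since it absorbs the part of the gap already present at the earlier stage.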
With this information, we are now in a position to work with the PISA 2012 database. We estimate a linear model in which the five plausible values for reading competencies provided by PISA depend on the set of individual and household variables included in PIRLS, excluding our two identification variables, Attended ISCED0 and Month of Birth. Following Hox (1995) and OECD (2014b), we take into account the five plausible values, the set of weights and the nested nature of PISA. More specifically, we estimate three models of reading achievement: first, a static cross-sectional model; second, a dynamic model (which includes previous achievement); and, third, a dynamic model that incorporates a grade retention variable. It should be borne in mind here that other characteristics (e.g. type of school attended) are intentionally not controlled, so that the parameters proxy all the channels via which family background influences the students' test performance; a discussion of the different channels via which SES can affect academic performance can be found in Willms (2006). The results of the three models are shown in Table 2. To check the robustness of the household socio-economic background proxy, these estimates were replicated with the "Books at home" variable and similar results were obtained (Table 5 in the Appendix).
Our PIRLS sample consists of 2,381 individuals and the PISA sample contains 21,230. While the PISA sample is close to the size that Contini and Grand (2015) consider optimal for obtaining reliable estimates (30,000), the PIRLS sample size may be cause for concern. However, as long as the PIRLS sample represents the total population (which is the case here), given the aim of the first stage (namely, obtaining consistent estimates for imputing predicted previous performance), sample size is not a critical issue.
Indeed, in the two-sample two-stage least squares (TSTSLS) methodology (Arellano and Meghir 1992) applied in the earnings mobility literature, which is theoretically similar to the approach we adopt here, the sample size in the first-stage auxiliary database is frequently considerably smaller than that of the main sample. This strand of the literature, as well as Contini and Grand (2015), therefore stresses the importance of the correct selection of the imputed variables. Jerrim et al. (2016) analyse the robustness of the TSTSLS methodology and provide a recent review of articles using this approach. They also review the sample sizes of the main and auxiliary databases employed in these articles.
Table 2 shows the results from the static model and two dynamic specifications, in the second of which we incorporate grade retention information. The results of the static model, displayed in the first column, show that most of our explanatory variables are statistically significant, have a substantial effect on achievement and present the expected signs. Individual socio-economic characteristics, measured by parental education and immigrant condition, are strong predictors of performance and indicators of the presence of marked educational inequalities at this stage. Likewise, female students perform decidedly better than males. Results in the first column also show the existence of heterogeneity across regions, which is coherent with the substantial mean differences in PISA results across Comunidades Autónomas. Determining the causes of these cross-regional differences, however, falls outside the scope of this research. The static specification is especially informative about the learning differences in place at age 15/16.
However, as the specific aim of our study is to analyse how these inequalities evolve over time, the results derived from the dynamic model are of more interest. Thus, if we examine the pseudo-panel estimates in the second column of Table 2, we observe that previous academic achievement has a strong and significant effect on secondary school performance. Gender and immigrant condition inequalities seem to accumulate during secondary school, as the corresponding coefficients have similar magnitudes and are statistically significant. However, the value of the coefficient for first generation immigrants falls when we control for previous achievement, suggesting that the poor performance of these students is generated at an earlier stage in the education system. This is consistent with the cultural assimilation hypothesis (Levels et al. 2008). Results for gender are also in line with the gaps identified by other studies such as Machin and Pekkarinen (2008).

Findings
Interestingly, the estimates for the variables of a family's socio-economic background present a sizable reduction in magnitude when we condition on primary school achievement. The magnitude of this reduction depends on the SES variable chosen; thus, we find a greater reduction for parental education than for number of books in the home. This result indicates that socio-economic characteristics affect secondary school performance through their impact on earlier academic achievement. Students from more disadvantaged family backgrounds perform worse in primary education and this seems to operate as a transmission mechanism that increases inequalities in secondary education.
In the dynamic specification, it should be borne in mind that the model is estimated on children from the 1996 birth cohort. This means we exclude children who have repeated a grade during primary school. The potential sample selection generated by this exclusion operates through our independent variables and, as such, will not generate biased estimates, although the standard errors will be larger.
Finally, we re-estimate the dynamic model, incorporating grade retention at the lower secondary school level as a covariate (column 3 of Table 2). While our empirical strategy does not allow us to determine causality, it does show that grade repetition during lower secondary education has a negative association with performance at age 15/16 (even after controlling for prior performance, an exercise which has hitherto not been performed, to the best of our knowledge, for Spain). This result lends further support to the recommendations of Liddell and Rae (2001) and Choi and Calero (2013), among others, who argue for the need to introduce alternative measures to grade retention, given the ineffectiveness of grade retention in increasing academic performance.
In summary, our findings suggest that: (i) reading competencies at the end of lower-secondary school are heavily dependent on achievement at primary school; (ii) the size of the socio-economic gap in lower-secondary school is narrowed when previous achievement is taken into account, and the magnitude of this reduction depends on the chosen proxy for SES; (iii) there is a consistent widening of the gender gap in reading competencies between the ages of 9/10 and 15/16; (iv) the negative effect of being a first-generation immigrant on reading performance seems to be carried over from the early stages of the education system; and (v) grade retention during lower-secondary school is negatively and strongly correlated with reading performance.

Conclusions
This article has sought to (1) assess the evolution of educational inequalities between primary and lower secondary education in Spain; and, (2) explore the utility and limitations of RCS for undertaking dynamic analyses of academic performance in the absence of longitudinal data.
As regards the first of these objectives, our results stress the relevance of achievement at early stages of the education system: receiving early childhood education (ages 0-3) has a positive effect on reading competencies at age 9/10, which in turn affects performance at age 15/16. Being able to incorporate previous achievement into the analysis reveals an important finding for Spanish policymakers: SES-based inequalities in reading competencies are already present at age 9/10 and appear to become more marked during lower secondary schooling. The achievement gap between native and immigrant students also increases between ages 9/10 and 15/16, but is narrowed when previous achievement is incorporated into the static framework. These results stress the importance of early intervention for improving performance during compulsory secondary education and for tackling educational inequalities. They also seem to indicate, in line with Choi and Jerrim (2016), that the education reform act passed in Spain in 2013 (our results refer to 2012) would ideally have placed more emphasis on reforming the lower levels of the education system, where most problems seem to be concentrated. For example, extending compulsory education to early childhood and introducing targeted measures at the primary school level may help both to enhance academic performance and to reduce educational gaps. Our results also suggest that Spanish education authorities need to reconsider the systematic application of grade retention in secondary schools, as grade repetition during lower secondary education is negatively associated with students' subsequent performance, even after controlling for their prior performance at primary school.
As for the second of our objectives, we have reported an applied example of the potential and limitations of RCS for estimating dynamic achievement models. Our strategy has shown that, in the absence of panel data, the use of RCS may be a valid strategy for identifying the specific points in the educational system at which different types of inequalities are generated. However, our findings need to be treated with some caution, given a number of limitations. First, the small set of time-invariant individual characteristics constrains the types of inequality we have been able to analyse. Second, although this is not a feature exclusive to this empirical strategy, our results may be sensitive to small differences in the definitions of variables between cross-sections. Finally, the estimation of dynamic achievement models from RCS using international assessments is currently restricted (a) to mathematical, scientific and reading competencies (given that these tools focus solely on these cognitive competencies), which means other relevant cognitive and non-cognitive competencies are excluded; and (b) to primary and lower secondary education levels (the levels on which international institutions such as the OECD and IEA focus their attention). Future research needs to analyse the magnitude of these limitations and, in this regard, replicating analyses in countries where both longitudinal and RCS data are available may be highly fruitful. Whatever the case, this article has shown that, in the absence of longitudinal data, the use of RCS should be considered by policymakers as a valid alternative for designing evidence-based reforms.
Reference category: non-immigrant household, student did not repeat during secondary level, parents' highest level of education (ISCED 2), attended ISCED 0 for one year or more, region of residence: ES61. Regions are expressed in NUTS-2 codes provided by EUROSTAT. *** p < 0.01, ** p < 0.05, * p < 0.1