Vocabulary learning at primary school: a comparison of EFL and CLIL

Comparative studies in content and language integrated learning (CLIL) often show CLIL students to be at something of an advantage over their non-CLIL peers. However, such studies are often difficult to interpret given problems of cross-group comparability (different schooling systems, different numbers of instructional hours, bias attributable to selection/self-selection, etc.). This study focuses on a single group of eight-year-old schoolchildren (n = 22) who were exposed to English as a foreign language (EFL) instruction in the fall term and to CLIL instruction (Science) in the winter term. The main objectives are to analyze the vocabulary of the class materials and to examine gains in productive lexical knowledge. Our results show that students were exposed to a greater number of words and to more abstract and technical vocabulary in the CLIL materials, but that they made significant progress in vocabulary learning in both contexts. The study also reveals that learning English through Science proved to be a more challenging experience than learning English in the EFL class.

Learning vocabulary is acknowledged to be an important building block in the acquisition of a language, and it is also reported as being closely related to enhanced language proficiency. In studies of children learning a new language in a classroom setting, the analysis of vocabulary learning is crucial since acquiring vocabulary is traditionally one of the main objectives for teachers and materials developers alike in the early phases of second language acquisition. This paper reports a small-scale study of vocabulary learning by a group of primary schoolchildren learning English in a traditional English as a foreign language (EFL) classroom and in a Science class taught in English based on a content and language integrated learning (CLIL) methodology. The study first offers an analysis of the EFL and CLIL materials used in the two classes before presenting a comparison of the vocabulary gains associated with these materials.

Background
Before 1980, European schools that offered the possibility of learning content through a second/foreign language were limited to a minority of elite institutions, but since the […]
*Corresponding author. Email: tragant@ub.edu
[…] provide evidence of significant development in the receptive vocabulary skills of preschool children (Buyl and Housen 2014). Adopting the perspective afforded by discourse analysis, Dalton-Puffer (2007) has shown how CLIL students often notice lexical gaps in their own oral output and are able to self-initiate repair, something that is less frequent in typical EFL students. Ruiz de Zarobe's (2010) work on written production indicates that after the second year, a group of secondary school students enrolled in an intensive CLIL program obtained better vocabulary scores than those recorded by students in the non-CLIL and in the less-intensive CLIL programs. Quantitative analyses of written production in upper secondary education (Jexenflicker and Dalton-Puffer 2010) also show that CLIL students tend to make greater use of less frequent words than their non-CLIL counterparts and that the former show more lexical variation. Lexical variation in essays was also found to be higher among primary school CLIL students in Agustín Llach and Jiménez Catalán's study (2007). A different approach to vocabulary research in CLIL is the use of discrete-point vocabulary tests. Within this framework, Jiménez Catalán and Ruiz de Zarobe (2009) found that the difference in favor of the CLIL group was more evident in the case of the most frequent 2000-word test than in that of the 1000-word test among students aged 11-12. Sylvén (2004) and Seregëly (2008) also found that CLIL learners outperformed traditional learners in a battery of vocabulary tests.
However, and in spite of these positive findings, evidence about the advantages of CLIL has generally been more conclusive in the case of receptive rather than of productive knowledge and skills (Dalton-Puffer 2007; Fusté and Miralpeix 2013; Heras and Lasagabaster 2015). The present study opts to examine specifically the productive knowledge schoolchildren gain when learning vocabulary through CLIL instruction.
This study has, moreover, been specifically designed to avoid one of the most salient problems in existing research, namely the comparability of CLIL and non-CLIL groups, especially when the number of schools involved is small (see Aguilar and Muñoz 2014, and Bruton 2011, for a discussion of other problematic aspects of CLIL research). This is the case of studies in which the two groups are from different cities and/or belong to different school systems as occurs, for example, in Moghadam and Fatemipour (2014). The limitation is often attributable to the fact that the CLIL groups receive considerably more hours of instruction, a frequently unavoidable confounding variable, as in Jiménez Catalán and Ruiz de Zarobe (2009). Another factor that makes comparisons less valid is the selection or self-selection of CLIL students. In fact, this was the case in Heras and Lasagabaster's study (2015) of a school in which entering the CLIL program was voluntary. The lack of group comparability is, in general, a major challenge in L2 classroom research, but it is especially difficult, if not impossible, to avoid in the context of CLIL instruction. Aware of these challenges, we took the opportunity presented to us to conduct a study of vocabulary gains in which the question of comparability is not a problem, since the same group of students underwent a first period of EFL instruction followed by a period of CLIL instruction.

Aims of the study and research questions
The present study focuses on a group of primary school students who followed a program that concentrated their regular English instruction in the fall term and their CLIL instruction in the winter term. The main objectives of the study are to analyze the vocabulary profiles of the instructional materials used in the two teaching contexts and to compare the vocabulary learning resulting from the two instructional periods. More specifically, we aim to answer the following questions: (1) How much and what type of target vocabulary are students exposed to in their textbooks during a first term of EFL instruction and a second term of CLIL instruction? (2) Do students experience any gains in their productive knowledge of the target vocabulary between the start and end of each term? (3) Do students make more vocabulary gains in one or other of the two programs: EFL vs. CLIL?
In addressing the three research questions above, we focus on the key vocabulary appearing in the teacher's manuals of the textbooks under analysis. From this point on this vocabulary is referred to as the 'target vocabulary' and comprises the words that students are expected to learn through direct instruction.

Methodology and context
The school and participants
The school is a prestigious boys' school located in a residential area of a medium-sized city in Catalonia (Spain). It is a small school (one class per grade) with spacious facilities and one of its distinctive features is the importance attached to English by its administrators. For example, in the third grade (the target grade in this study), four hours a week are usually devoted to English instruction (the official requirement in Catalonia being two hours per week). In addition, a further three hours a week are normally devoted to the teaching of Science in English in the third grade, which is when CLIL instruction is introduced in this school. For the purposes of this study, however, the school was requested to modify this program and to concentrate English language instruction into the fall term (seven one-hour periods per week) and the instruction of Science in English into the winter term (seven one-hour periods per week). This new program allows us to undertake an independent examination of vocabulary learning in EFL and CLIL classes. In both contexts students worked with a class book/textbook and an activity book, and lessons took place in their regular classroom, except for one lesson each week when the class was held in the school's computer room. The participants in the study were 22 boys aged between 8 and 9 (third year in primary education in Spain) and their English/Science teacher. One student was excluded from the study because of serious learning difficulties and, on analyzing the data, he was found to be an outlier. With the exception of one of the fathers and two of the mothers, the rest of the parents have university degrees and 10 of the children study English as an after-school activity. Even though almost all of the children (20 out of the 22) report English to be an important language to learn, when asked to name their three favorite subjects, English was mentioned by just three of the boys.
These children are taught English and Science in English by the same teacher, an experienced male teacher with a good command of English and effective management skills. He is a certified English primary schoolteacher who also has a BA degree in English Studies.

Instructional materials and practices
The EFL lessons were based on a class and activity book called Incredible English Kit 3 (Phillips and Morgan 2007). The class book includes a variety of materials: speaking and reading/listening practice, songs and vocabulary, and pronunciation and covert grammar activities. Vocabulary in this class book is formally introduced on the first page of each unit accompanied by a large illustration, but it is not typographically enhanced in any way on the remaining pages of the unit. In the vocabulary exercises in the activity book, words are printed on a gray background.
Most EFL classes started as a whole class session using the class book, while the last part of the class was often devoted to individual work in the activity book. One session a week took place in the computer room, where students were asked to prepare PowerPoint presentations to illustrate the vocabulary of the unit they were then studying. The teacher covered most of the materials from the class book and the boys were usually asked to complete all the activities in the activity book. At times the teacher was also observed to provide unplanned explanations outside the textbook (e.g., similarities between L1 and L2 words or cross-cultural differences). During these lessons the teacher almost always addressed the children in English and, when asking them questions, he expected them to answer in English.
The Science lessons were based on a text and activity book called Top Science 3 (Santillana 2011). In the textbook, information is presented in texts and via numerous photographs and drawings, which are complemented with questions, hands-on activities, and written exercises. In this textbook, target words are highlighted in bold in the body of the texts in which content is introduced, and these texts are often accompanied by pictures and drawings. The visuals are usually accompanied by a short descriptive caption, as well as arrows and labels to highlight concepts and vocabulary. In the activity book, words are highlighted individually on a yellow background within a black frame.
In class, most Science sessions started with a teacher presentation of the contents followed by practice activities that were usually carried out with the whole class group. As in the EFL lessons, one session a week took place in the computer room, and it was also devoted to preparing PowerPoint presentations on the vocabulary being studied. The teacher made selective use of the textbook as he thought some of the texts were too difficult for the students, but all the activities from the workbook were completed and all the target vocabulary from each unit was covered. At times students were observed to engage in spontaneous discussions or to self-initiate questions to the teacher in relation to the content of the lesson (e.g., a discussion in L1 about whether plants can move; a student's question in L1 as to why salamanders do not die when they lose their tail). However, these opportunities were not always picked up on by the teacher. During these lessons, the teacher almost always used English but, in contrast to the EFL lessons, he sometimes allowed students to use Spanish or Catalan to respond to his questions.
Students' self-reported evaluations of the English and Science lessons were quite similar, with three quarters of the class saying that they liked them or liked them a lot and only a minority (one or two students) saying they did not like them much or did not like them at all. About the same number of students (roughly half the class) thought English and Science lessons were neither difficult nor easy subjects, although there were a few more students who found the EFL class easier or much easier than the Science class (26% vs. 17.4% respectively).

Instruments and analysis
Textbook analysis and vocabulary tests constitute the two primary instruments employed in this study. Secondary sources include six classroom observations, two teacher interviews, and a questionnaire with bio and attitudinal data that students completed at the beginning and end of the school year. The vocabulary analysis of the two course books focused on the target words in the units that were covered during the instructional period under analysis; it was carried out with Lextutor. This online program is based on the Lexical Frequency Profiler (Cobb 2014; Heatley, Nation, and Coxhead 2002) and divides the words of texts into first and second thousand levels, academic words, and the remainder or 'off-list' words (see footnote 1).
The vocabulary covered in the EFL class during the fall term was tested at the beginning and end of that term (September-Time 1 and December-Time 2) while the vocabulary covered in the Science class during the winter term was tested at the beginning and end of that term (December-Time 1 and March-Time 2). The December test included items from both the English and Science classes presented at random. In each vocabulary test, the students' knowledge of 30 items was assessed. Tests included small drawings, next to which students were asked to write a word indicating what was being illustrated; thus, productive vocabulary knowledge was assessed. The initial letter of each word was provided as a prompt. A receptive vocabulary test was also administered but the results proved to be unreliable and, so, the test is not used in this study (see footnote 2 for an explanation). Because of the format of the productive vocabulary test, only concrete nouns were included (in contrast, the receptive vocabulary test did include adjectives and verbs as well as abstract and concrete nouns). Approximately the same number of items from each teaching unit was assessed in the productive and receptive tests.
These items were chosen from the vocabulary component of the syllabus for each unit in the two textbooks. In the EFL test, we included the words that were highlighted on the vocabulary list. The CLIL vocabulary list included more items than the EFL list and they were not highlighted. We included vocabulary items that were easily depicted (and therefore interpreted). See Appendix 1 for the list of words included in the EFL and Science tests; note that these words were the same in both the pre- and post-tests. The tests were tailor-made for this study since no standardized tests could be used, but they were previously piloted with children of the same age to ensure that the pictures were clear and that the length of the test was adequate.
In scoring the tests, correctly produced items were awarded one point, even if words were misspelled. In analyzing the results, related and independent samples t-tests were conducted. The level of significance was set at .05.
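As a minimal sketch of the related-samples comparison described above, the Python snippet below computes a paired t statistic and a Cohen's d for pre/post scores. The scores are invented for illustration and are not the study's data. The function name `paired_t_and_d` is hypothetical, and Cohen's d is computed here as the mean difference divided by the standard deviation of the differences, which is one common convention and may not be the one used in the study.

```python
import math
import statistics


def paired_t_and_d(pre, post):
    """Return (t, df, d) for a related-samples t-test on paired scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    # Work on the per-student differences (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d = mean_diff / sd_diff  # assumed convention for Cohen's d
    return t, n - 1, d


# Hypothetical scores out of 30 for six students (not the study's data).
pre_scores = [5, 9, 12, 7, 10, 8]
post_scores = [15, 22, 25, 14, 24, 19]

t_stat, df, cohen_d = paired_t_and_d(pre_scores, post_scores)
print(f"t({df}) = {t_stat:.2f}, d = {cohen_d:.2f}")
```

The t statistic would then be referred to the t distribution with n - 1 degrees of freedom to obtain the p-value that is checked against the .05 threshold.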

Textbook analysis: vocabulary profile
During the fall term, the first six units of the EFL class and activity books (Incredible English 3) were taught in class. They included the following vocabulary areas: personal possessions, clothes, food, places in a town, animals, and adjectives to describe people and music. During the winter term, five units were covered in class from the CLIL text and activity books (Top Science 3) including the following topics: the body, the senses, living things, vertebrate animals, and invertebrate animals. The textbook analysis reported below focuses on the vocabulary that was encountered in these units, since this serves as the pool of vocabulary that students were subsequently tested on as we sought to answer research questions two and three.
A preliminary examination of the wordlists from the two textbooks reveals certain differences. The average number of lexical items per unit is higher in the Science textbook than in the EFL class book (77 vs. 49 words, respectively), but the proportion of multiword lexical items is considerably lower on the Science wordlists (11% vs. 27% respectively). The most frequent multiword items in the EFL class book are fixed expressions, that is, utterances to be learnt as a whole (e.g., Here you are; What's the matter?; What a mess!), and productive expressions (e.g., I can't find …; Here's …). The most common types of multiword units in the Science textbook are phrasal verbs (e.g., to breathe in; to cut off) and nouns preceded by modifiers (e.g., life cycle; magnifying glass).
A further analysis of the two wordlists (including chunks but not productive expressions) by word frequency (tokens) was performed with Lextutor (Cobb 2014). This analysis is based on the total number of target words in the first six units in the EFL class book and the first five in the Science textbook, the number being considerably higher in the latter (a total of 256 vs. 337 words, respectively). Table 1 shows the proportions of the first and second thousand level words as well as the academic and off-list words in both course books. Here, although some differences are apparent, they are not great. Word types forming part of the first and second thousand levels are slightly more numerous in the case of the EFL class book, while the proportions of academic and off-list words are a little higher in the Science textbook. The same trend applies for Latin cognates, with the Science textbook showing slightly higher indexes.
A closer inspection of the items included among the off-list words reveals more significant differences. For example, in the case of the EFL class book there is a good proportion of lexis related to food (e.g., asparagus, biscuits, broccoli), clothing (e.g., tights, trousers, gloves), and everyday objects (e.g., magazine, jar, torch), whereas the Science textbook list includes a good proportion of classifying nouns and adjectives (e.g., carnivore, canine, auditory), words designating processes or abstract concepts (e.g., germinate, height, vibration), and nouns related to human and animal body parts (e.g., iris, sternum, womb), some of which are quite technical.
Finally, the words included on the tests used to measure vocabulary gains (see next section) reflected the vocabulary profiles of the EFL and CLIL course books. The EFL test contained more words belonging to the first and second frequency bands (27% and 36%) than the CLIL test (23% and 27%). In contrast, the proportion of 'off-list' words on the CLIL test was higher (50%) than on the EFL test (36%).
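The kind of frequency profiling described in this section can be illustrated with a short Python sketch. It is not Lextutor's implementation: the band lists below are tiny, hypothetical stand-ins for the full first-thousand, second-thousand, and academic word lists, and the assignment of any given word to a band is an assumption made purely for illustration.

```python
from collections import Counter

# Tiny hypothetical band lists; a real profiler uses full frequency lists.
BANDS = {
    "K1": {"food", "town", "body", "animal", "water"},
    "K2": {"skirt", "cheese", "bone", "insect"},
    "AWL": {"process", "cycle"},
}


def frequency_profile(words):
    """Return the percentage of words in each band, or 'off-list' otherwise."""
    if not words:
        return {}
    counts = Counter()
    for word in words:
        token = word.lower()
        # Pick the first band containing the token; default to off-list.
        band = next(
            (name for name, members in BANDS.items() if token in members),
            "off-list",
        )
        counts[band] += 1
    total = len(words)
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


# Hypothetical target-word list mixing everyday and technical items.
target_words = ["food", "skirt", "broccoli", "bone", "sternum", "cycle", "water", "iris"]
profile = frequency_profile(target_words)
print(profile)
```

Running the same function over two wordlists would give band percentages that can be compared side by side, in the spirit of the EFL vs. CLIL comparison reported here.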

Vocabulary gains
Gains in students' productive knowledge of EFL words from Time 1 (beginning of the fall term) to Time 2 (end of the fall term) are examined first. The mean score at Time 1 was 8.9 for the whole sample (n = 22), increasing to 20.1 at Time 2. The related samples t-test indicates that the difference between these two scores is significant with a large effect size, according to Cohen's d (see Table 2). When we repeated this same test with the subsample of students not taking English lessons as an extra-curricular activity (n = 11), the results were equally significant. It should be noted that in both cases the standard deviations are quite high, showing considerable variability among students.
Gains in students' productive knowledge of Science words from Time 1 (beginning of the winter term) to Time 2 (end of the winter term) are examined next. The mean score at Time 1 was 11.8 for the whole sample (n = 23), increasing to 20 at Time 2. The related samples t-test indicates that the difference between these two scores is significant with a large effect size (see Table 3). When we repeated this same test with the subsample of students not taking English lessons as an extra-curricular activity (n = 11), the results were equally significant.
Finally, a comparison of the gains between the EFL and CLIL vocabulary tests is drawn. In order to do this, the gains for each student were obtained by subtracting their scores at Time 1 from those obtained at Time 2. The results from an independent samples t-test show that students' mean gains are significantly higher in the EFL context and that the magnitude of this difference is large (see Table 4). In the EFL tests students were able to produce an average of 11 more words at Time 2 than at Time 1, whereas in the CLIL tests students were able to produce an average of eight more words at Time 2 than at Time 1.
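The gain-score comparison just described can be sketched in Python as follows. The scores are invented and are not the study's data; the helper names `gains` and `independent_t` are hypothetical, and the pooled-variance (Student) form of the independent-samples t statistic is assumed, since the text does not specify which variant was used.

```python
import math
import statistics


def gains(time1, time2):
    """Per-student gain: Time 2 score minus Time 1 score."""
    return [t2 - t1 for t1, t2 in zip(time1, time2)]


def independent_t(sample_a, sample_b):
    """Student's t statistic with pooled variance for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / se


# Hypothetical scores out of 30 for five students in each context.
efl_t1, efl_t2 = [5, 9, 12, 7, 10], [16, 21, 24, 17, 22]
clil_t1, clil_t2 = [8, 12, 15, 10, 13], [16, 21, 22, 18, 22]

efl_gains = gains(efl_t1, efl_t2)
clil_gains = gains(clil_t1, clil_t2)
t_stat = independent_t(efl_gains, clil_gains)
print(f"mean EFL gain = {statistics.mean(efl_gains):.1f}")
print(f"mean CLIL gain = {statistics.mean(clil_gains):.1f}")
print(f"t = {t_stat:.2f}")
```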

Discussion and conclusions
A comparison of the target vocabularies that students were exposed to in the EFL and CLIL contexts revealed a number of differences. A frequency analysis of the target vocabularies showed slight differences between the EFL and Science materials, with the former presenting somewhat higher proportions in the first and second frequency word bands and the latter presenting slightly higher proportions of Latin-based and off-list words. We would expect these differences to become more marked in materials targeting higher level students as contents become more specialized and topics are dealt with in greater depth.
A more detailed analysis revealed other more obvious differences. Thus, the CLIL materials included roughly 25% more items than the EFL materials, which probably reflects the greater role played by content and information in the former. In terms of word types, the differences identified probably reflect the text types included in the EFL and CLIL materials. The frequent presence of stories, songs, and dialogs in the EFL materials would explain why multiword units often included chunks and productive language (more typical of oral language), while the predominance of expository texts in the CLIL materials would explain the presence of nominal premodification and abstract language (more typical of descriptive language and the scientific genre). In terms of the visual presentation of vocabulary, the target words in the CLIL materials were more often and more prominently highlighted than those in the EFL materials. An analysis of the words used in the vocabulary tests that the students took confirmed this difference between CLIL and EFL vocabularies.
An examination of the gains students made between the start and the end of the two-term period demonstrated that there was a considerable growth in the productive vocabulary of the learners in both contexts, thus confirming the overall positive benefits of CLIL instruction on vocabulary learning (Buyl and Housen 2014; Jiménez Catalán and Ruiz de Zarobe 2009; Seregëly 2008; Sylvén 2004). However, a comparison of the gains between the two contexts showed that, on average, students learned more words from the EFL lessons than from the CLIL lessons and that this difference was significant. The fact that students in the CLIL lessons used their L1 more often than in the EFL lessons to answer the teacher's questions may account, in part, for these results. Likewise, the fact that the vocabulary in the EFL class book included more frequent words of probably greater relevance to students' daily lives may be another explanatory factor. In fact, it seems likely that the vocabulary in the EFL textbook could more readily become part of the existing everyday word network in the child's mental lexicon. In contrast, some of the vocabulary in the CLIL textbook may not yet have an appropriate web of words in the child's mental lexicon to which it might easily connect. Although receptive knowledge could not be tested in this study (see footnote 2), which is one of its limitations, it is possible that the impact of CLIL would have been more marked in the case of receptive than productive vocabularies, as reported elsewhere (Ruiz de Zarobe 2011). Additional limitations that need to be taken into account when interpreting the results are that the study involved a small sample of language learners and a few units from the respective textbooks. The fact that the vocabulary test was restricted to concrete words made the test less representative of the vocabulary students were exposed to in their class books, this being especially pertinent in the case of CLIL.
A further limitation is the fact that important factors affecting vocabulary learning, such as word length, frequency of use, and textual context, among others, were not taken into account in the analysis of the data.
Despite the small scale of the study, we feel it has a number of implications for further research. First, the research design employed here, with the number of hours of instruction in CLIL and EFL being constant, strengthens the validity of the study. The design is also methodologically innovative, at least in comparative studies involving CLIL, and it has proved to be feasible in practice. Second, in the process of designing the vocabulary tests, we faced major difficulties, to the point of having to discard the receptive vocabulary test. Further research needs to pay special attention to how CLIL vocabulary might be assessed with young children. The fact that some of this vocabulary typically includes abstract and technical terms is an added difficulty when it comes to conducting assessments with this age group. Finally, some of the aspects of classroom practice that were briefly described above in contextualizing the results of this study point to some intriguing questions. For example, our observation data suggest certain differences in classroom interaction between the CLIL and EFL sessions, there being more teacher-fronted teaching during the CLIL than the EFL sessions. There also seemed to be more student-initiated moves in the CLIL sessions, even though these were in the students' L1. Another issue of interest is the difference in difficulty of the EFL and CLIL textbooks, as identified to us by the teacher. A subsequent analysis of a random unit from the two textbooks confirmed the teacher's impressions, with an average of 9 and 5.1 words per sentence in the CLIL and EFL texts and an average grade level in terms of comprehension difficulty of 4.2 and 0.6, respectively (see footnote 3). Another issue worth exploring is the profile of the teacher and how this affects classroom practice.
In this study, the teacher was a language teacher not a science teacher and this might explain the fact that he was observed to provide more unplanned explanations during the EFL than during the CLIL lessons. He also had students spend time during the CLIL sessions on vocabulary practice (PowerPoint presentations), an activity that was also carried out in the EFL sessions and which is more characteristic of a language than a science class. The significance of these aspects of classroom practice could be of special interest for textbook development and teacher training and require further exploration.
All in all, there seem to be sufficient indications in the study reported here to suggest that learning Science in English was a more difficult experience for our eight-year-old learners than learning the language in the EFL class. The CLIL textbook was considerably more difficult than the EFL class book, and the teacher was more permissive with the use of L1 when teaching Science. There were also more target words to be learned in each of the CLIL units, and the analysis with Lextutor revealed there to be slightly more academic, Latin cognate, and off-list words in the Science vocabulary. An examination of the off-list words also revealed that the Science textbook included more words designating processes and abstract nouns in contrast to the more everyday topics covered in the EFL class book. The complexity of the Science lessons was also reflected in students' self-reports, with a few more students finding the EFL class easier or much easier and none finding Science to be easier. Given this evidence, it would be simplistic to interpret our analysis of word gains in isolation without acknowledging that CLIL and EFL lessons offered quite different learning contexts. Besides, it should not be forgotten that vocabulary learning in the context of CLIL is a more complex process, since it is not limited just to learning new words but it also implies understanding new concepts and performing tasks with different cognitive demands.
In sum, the present study has examined the CLIL and EFL approaches with young learners of English following a design that guarantees the comparability of the groups under analysis. The analysis of the vocabulary in the course books used by the students revealed marked differences, indicating that CLIL and EFL materials offer complementary sources of input. The results have also shown that both approaches offer valid contexts for the development of productive vocabulary knowledge and that Science in English is a more challenging learning experience for most children.