Geòrgia Pujadas & Carmen Muñoz (2019) Extensive viewing of captioned and subtitled TV series: a study of L2 vocabulary learning by adolescents, The Language Learning Journal, 47:4, 479–496, DOI: 10.1080/09571736.2019.1616806

Extensive viewing of captioned and subtitled TV series: a study of L2 vocabulary learning by adolescents

Geòrgia Pujadas and Carmen Muñoz
Department of Modern Languages and Literatures and English Studies, University of Barcelona, Barcelona, Spain

ABSTRACT
This study aims at exploring the potential of extensive TV viewing for L2 vocabulary learning, and the effects associated with the language of the on-screen text (L1 or L2), type of instruction (pre-teaching target items or not) and learners’ proficiency. A total of 106 secondary school students (Grade 8) divided into 4 classes participated in a one-year pedagogical intervention, viewing 24 episodes of a TV series under four experimental conditions with each class being assigned to a different treatment: (1) captions and pre-teaching, (2) captions and non-pre-teaching, (3) subtitles and pre-teaching, and (4) subtitles and non-pre-teaching. Following a pre-/post-test design, form recall and meaning recall gains were examined. Results showed that participants learnt vocabulary in all four conditions, with greater gains in recalling form than in recalling form and meaning. The analysis also showed that, overall, groups that were pre-taught the target items performed better, independently of the language of the on-screen text. An important finding is the role of learners’ proficiency prior to the intervention, with higher proficiency related to higher gains. The study contributes to the area of foreign language learning through audio-visual input with results from a longitudinal, classroom-based study with adolescent learners.
KEYWORDS
L2 vocabulary learning; extensive viewing; captions; subtitles; classroom instruction; adolescents

Introduction
Vocabulary is an essential part of learning a foreign language (Schmitt 2008) and therefore a main concern for foreign language teachers. Research has shown that learners need to know around 3000 word families to understand oral discourse (e.g. van Zeeland and Schmitt 2013) and between 8000 and 9000 to understand written discourse (e.g. Nation 2006). However, in classroom settings the time that can be devoted to vocabulary learning is limited, and there is a sizable gap between the number of words that can be explicitly taught and learnt in class and the number needed to achieve higher second language (L2) proficiency (Malone 2018). Research has shown that extensive reading outside the classroom can foster vocabulary acquisition (Nation 2015; Schmitt 2008), but its impact remains limited: the typical learner does not read enough to encounter the same words frequently enough to avoid forgetting them (Laufer 2005), and reading has declined in popularity (European Commission 2017), especially among young people, who prefer watching TV to reading (Lindgren and Muñoz 2013; Peters 2018). Recognising the potential limitations of reading, Webb and Rodgers (2009a) proposed extensive viewing as an alternative source of rich authentic input, pointing out the potential of television programmes and movies for learning vocabulary due to their lexical richness, repeated encounters with low-frequency words, and visual image support (Rodgers 2018; Sydorenko 2010; Webb and Rodgers 2009b). The addition of captions to this authentic input can make foreign language material accessible to non-native speakers who would otherwise find it difficult to comprehend (Vanderplank 2016).
Indeed, research has found that even a small amount of input enriched with captions or subtitles can lead to significant improvement in listening, content comprehension and vocabulary learning (e.g. Birulés-Muntaner and Soto-Faraco 2016; D’Ydewalle and Van Poel 1999), but there is still some debate on which language to use for the on-screen text. Although viewing is typically proposed as a supplementary activity, combining it with explicit teaching could yield further benefits for L2 learning by deliberately drawing attention to specific vocabulary (Hulstijn 2013). The emergence of multimedia learning environments and the greater accessibility of TV series, movies and other online platforms in recent years have created opportunities for teachers and learners to boost language learning inside and outside formal settings. The aim of the present study is to explore the learning potential of audio-visual input (i.e. a TV series) with adolescent school learners.

Background

TV programmes and vocabulary learning
TV programmes have several features that make them an effective tool for L2 vocabulary learning. First of all, watching TV is already a popular activity: 88% of people in Spain watch TV daily (European Commission 2017), and if learners were to watch it in the L2 for enjoyment it could be a valuable source of meaning-focused input (Webb and Nation 2017). TV input also meets Nation’s (2007) five conditions for suitable input (Rodgers 2013): it is processed in large quantities, it is familiar to language learners, it provides context cues (i.e. through image and dialogue), it is comprehensible (Rodgers and Webb 2011) and it is engaging.
Although the association of TV programmes with entertainment has raised the concern that some learners will not pay enough attention to this type of input and will derive few benefits for language learning (Vanderplank 1990, 2016), a number of studies have shown that audio-visual material may enhance learners’ motivation and attention. For instance, Pujadas and Muñoz (2017) interviewed a group of adolescent and young adult learners who had been watching an L2 TV series in the classroom. They reported that watching audio-visual materials was more ‘natural’, ‘enjoyable’ and ‘motivating’ than other classroom activities. This, in turn, made them more attentive, helped lessen anxiety and encouraged a stronger feeling of learning. Most research on vocabulary acquisition through audio-visual input has involved incidental learning, where learning occurs as a by-product of another (meaning-focused) activity (i.e. watching a video for its information content). Indeed, a growing number of studies in this area consistently suggest that incidental vocabulary acquisition does occur through viewing short clips, full movies (Peters and Webb 2018) and TV series (Rodgers 2013). Furthermore, Rodgers and Webb (2011) found that related television programmes, such as episodes in a series, are likely to contain fewer word families than unrelated programmes, and that word families from the 4000 to 14000 levels were more likely to recur in a complete season of a television series than in a random sample of television programmes. This suggests that the more episodes of the same TV series learners watch, the greater the potential to learn vocabulary from them (Webb and Rodgers 2009a). However, most studies investigating audio-visual input for L2 learning have used only short clips, segments of films or educational videos, which are largely unrelated and not fully representative of what a viewer might normally choose to watch (Rodgers 2013).
Only a few studies have used longer, authentic input, such as full TV episodes, documentaries or movies (e.g. Peters and Webb 2018). Longitudinal classroom-based studies using several full-length TV episodes are scarce as well. Zarei (2008) used 9 episodes of a British comedy series to assess vocabulary acquisition and comprehension. Rodgers (2013) also investigated incidental vocabulary learning through the viewing of 10 episodes of a TV series and the effects of frequency and range of occurrence. Bava Harji, Alavi and Letchumanan (2014) used 30 episodes of a TV series to examine the effects of captioned instructional videos on EFL learners’ content comprehension, vocabulary acquisition and language proficiency. Frumuselu et al. (2015) studied the acquisition of informal and conversational speech through 13 episodes of a subtitled TV series. Chen, Liu and Todd (2018) explored spoken vocabulary acquisition through 10 episodes of an animated television series. Overall – despite differences in the design and focus of these studies comparing different types of on-screen support – results indicate that incidental learning through sustained viewing does occur and that the presence of captions or subtitles is beneficial rather than distracting. One way to optimise the effectiveness of vocabulary learning through TV programmes, and to promote greater learning effort during viewing, is to involve intentional or explicit learning, on the principle that incidental and intentional learning are complementary approaches that can be integrated (Schmitt 2010). Research in the area of extensive reading suggests that learning rates can be increased by deliberately focusing attention on vocabulary (Elley 1989; Hulstijn 2013), and research on listening and the use of advance organisers has also shown that including some kind of pre-listening support has a positive effect on comprehension (Chang and Read 2006; Chung 1996, 1999).
In the area of audio-visual input, Webb (2010; see also Webb and Rodgers 2009a) investigated the potential of pre-teaching low-frequency words to increase comprehension by analysing the lexical profiles of several TV series. Webb pointed out that television programmes may be too demanding for lower-level learners because they do not have the vocabulary necessary to understand the content, and suggested pre-learning unknown topic-related words in a specific television programme to improve comprehension and vocabulary learning; to date, however, no study has looked at the effects of teaching target expressions as part of a classroom intervention. To the authors’ knowledge, the only studies including such pre-viewing activities are Bravo (2008) and Gesa and Miralpeix (2018), who investigated the learning of lexical items through several episodes of a TV series. However, since their aim was not to compare teaching with no teaching, no conclusions can be drawn about the effects of instruction in those studies. In sum, while there has been an increase in studies focusing on vocabulary learning through audio-visual input, there is a scarcity of research on explicit vocabulary teaching to boost vocabulary acquisition through extensive viewing.

Captions, subtitles and proficiency
The use of original version (OV) TV programmes in the regular language classroom with children or adolescents raises the concern that learners may not be proficient enough to cope with a fast speech rate and advanced vocabulary (Guillory 1998). The addition of on-screen text in the form of subtitles (native language [L1] text) or captions (L2 text)1 – commonly available nowadays – may help these learners (Vanderplank 2016). L2 audio-visual materials enhanced with subtitles or captions are robust tools for second language learning, as learners are exposed to a large amount of input simultaneously through image, text and sound.
Studies on L2 learning from captioned and non-captioned audio-visual materials have consistently shown the advantage of viewing videos enhanced with on-screen text compared with viewing them without it (Mohd Jelani and Boers 2018; Montero Perez, Van Den Noortgate and Desmet 2013), but it is still a matter of debate whether, when and why subtitles or captions may be preferable (Matielo, D’Ely and Baretta 2015). The general consensus in this area is that captions provide more exposure to the target language and are thus more beneficial for language learning and vocabulary acquisition (e.g. Danan 2004; Vanderplank 2010; Winke et al. 2010). Indeed, the majority of comparative studies have found that captions have more positive effects on vocabulary learning than subtitles (Birulés-Muntaner and Soto-Faraco 2016; Frumuselu et al. 2015; Matielo, Collet and D’Ely 2013; Naghizadeh and Darabi 2015; Zarei 2008; Zarei and Rashvand 2011). Some studies, on the contrary, show that more benefits are derived from subtitling, especially for low-proficiency learners (Bianchi and Ciabattoni 2008), while yet others report inconclusive results, with small or non-significant differences between captioning and subtitling (Bisson et al. 2014; Bravo 2008; Stewart and Pertusa 2004). These mixed results might be due to differences in methodology (test modality, length of exposure, target items)2 and in the characteristics of participants, especially their proficiency level (Malone 2018; Mohd Jelani and Boers 2018). Participants’ proficiency, when reported, ranges from beginner to advanced, sometimes even within the same sample (e.g. Frumuselu et al. 2015), which makes it difficult to compare results across studies (Zarei and Rashvand 2011). It has been found that learners at different proficiency levels within the same study respond differently to on-screen text in different languages, especially when learners are less proficient (Lwo and Lin 2012).
It also appears that learners with larger vocabulary knowledge perform better than learners with smaller vocabularies (Horst, Cobb and Meara 1998; Peters and Webb 2018; Webb and Chang 2015a), suggesting that more proficient participants will normally perform better (‘the rich get richer’ effect). The influence of proficiency on learners’ L2 reading behaviour when faced with captions and subtitles was explored in an eye-tracking study by Muñoz (2017). Unexpectedly, child participants with a very low proficiency level, despite their slower reading rate, were found to spend a very short time on each fixation when words were in the L2. Muñoz suggested that these learners did not even try to understand because of their perceived proficiency limitations. Similarly, research on L2 reading has observed that if learners are unable to follow the overall story, they do not pay attention to the precise meaning of words (Laufer 2005). If a minimum competency threshold is necessary to benefit from captioning (Neuman and Koskinen 1992), subtitling might be a better option for supporting beginner learners (Danan 2004). However, the use of subtitles is normally discouraged in foreign language classroom settings, because it is believed that the availability of the L1 will stop learners from listening to the foreign language and that they will focus only on reading (Danan 2004). It has been argued, though, that ‘it seems perfectly sensible to exploit it [the L1] when it is to our advantage’ (Schmitt 2008: 337), which seems particularly true for younger viewers.

Young viewers
Studies on audio-visual input and L2 learning have frequently been conducted at university level or with adult language learners (e.g. Montero Perez et al. 2014; Sydorenko 2010; Winke, Gass and Sydorenko 2010).
Although still scarce, the number of studies focusing on multimodal input with children and adolescents has increased in the last two decades, and research has demonstrated that watching subtitled or captioned television has positive effects on both first and foreign language learning. Early studies observed that primary school children benefitted from subtitles in their L1 (Koolstra and Beentjes 1999), and that even pre-schoolers could learn new L1 vocabulary through exposure to audio-visual input (Rice et al. 1990). D’Ydewalle and Van Poel (1999) conducted a study with 8- to 12-year-olds testing incidental L2 learning through a 10-minute still-motion movie, comparing normal and reversed subtitles, and found that, even with short exposure, participants in both conditions already showed small gains in vocabulary. From data collected through questionnaires, Kuppens (2010) also reported that Grade 6 students who frequently watched subtitled English programmes (before formal instruction) performed significantly better in English tests. The stronger influence of frequent viewing of audio-visual material, compared with other types of out-of-school exposure, has been observed in several studies as well. Lindgren and Muñoz (2013) found that watching movies had the strongest explanatory power for the listening and reading comprehension skills of a very large group of 10- to 11-year-old learners in seven European countries. Similarly, in a study comparing Danish and Spanish learners of English (ages 7 and 9), Muñoz, Cadierno and Casas (2018) found that Danish children’s more frequent viewing of OV audio-visual material helped compensate for their comparatively smaller amount of formal instruction to a greater extent than other activities such as gaming or listening to music. Peters (2018) showed that 40% of surveyed students (15- to 16-year-olds) watched OV TV several times a week, compared with only 1% who reported reading books with the same frequency.
Several recent studies have compared young learners’ viewing of audio-visual material with and without subtitles or captions. Hsu et al. (2013) investigated the effect of subtitle mode on vocabulary and comprehension with Grade 5 participants during a one-month experiment, comparing non-captioning, full-captioning and keyword-captioning. They found no differences between the two captioning groups, and both outperformed the non-captioning group. Lekkai (2014) explored incidental vocabulary learning through a 15-minute cartoon with Grade 4, 5 and 6 learners (ages 9–12), with and without L1 subtitles. Again, learners in the subtitles group outperformed the non-subtitles and control groups, supporting the idea that even at this young age students can learn from subtitled videos. Chen, Liu and Todd (2018) explored the effect of captioning (against non-captioning) on spoken vocabulary with Grade 8 learners (aged 13), and found that the availability of captions significantly improved learners’ recognition of form and form-meaning knowledge of novel L2 spoken vocabulary, especially for higher-proficiency learners. Studies comparing the effects of on-screen text language (L1 or L2) have also been conducted recently. Bravo (2008) compared the effects of watching captioned or subtitled episodes of a TV show on lexical expressions and comprehension scores for 13- to 14-year-olds. She found similar results for both experimental groups but also reported that the absence of the L1 required greater effort and higher L2 fluency from her young participants when completing post-viewing tasks. Lwo and Lin (2012) compared L1 and L2 text, using a multimedia animated reading tool to explore the effects of different types of on-screen text (L1, L2, L1 + L2 and none) on vocabulary and reading comprehension with Grade 8 learners.
They found that the effects of the different modes on scores depended on learners’ L2 proficiency, that for the lower-proficiency learners L2 or L1 + L2 subtitles were more beneficial, and that learners relied on visual information for comprehension. Naghizadeh and Darabi’s (2015) study on L2 vocabulary with intermediate-level 15- to 17-year-olds reported that learners in the captions groups learnt significantly more than those in the subtitles group, who in turn had similar results to the non-subtitles group. Peters, Heynen and Puimège (2016) conducted two experimental studies on the effects of L1 and L2 text on vocabulary gains for 17- to 18-year-olds (intermediate and low-intermediate). Their results showed that, even if gains were low, captions had the potential to increase form learning, and that the captions group outperformed the subtitles group. Altogether, the above studies suggest that, regardless of subtitling mode, length of exposure to the input or proficiency, young learners benefit from exposure to audio-visual input enhanced with L1 and L2 on-screen text. As with older learners, results seem to indicate that captions are more suitable for older/more proficient young learners, while subtitles might be more appropriate for younger/less proficient children.

Aims and research questions
The present study aims to address some of the gaps in this field of research highlighted above. It focuses on L2 vocabulary learning by adolescents through extensive viewing of an OV TV series over a complete academic year, providing much-needed longitudinal data. It compares vocabulary learning gains from the same audio-visual material with two types of text support, captions and subtitles, which allows the benefits of the two modes to be compared for beginner/low-intermediate learners and the role of learners’ proficiency in this learning environment to be explored.
The study also aims to compare vocabulary learning gains under two different instructional conditions: focused (with pre-teaching of target items) and non-focused (without pre-teaching),3 thus contributing to the understanding of the pedagogical value of both types of instruction. Specifically, the study addresses the following research questions:
(1) To what extent can L2 vocabulary (form and meaning) be learnt through extended exposure to TV series?
(2) How is L2 vocabulary learning through TV series affected by (a) language of the on-screen text, (b) type of instruction and (c) learners’ proficiency level?

Methodology

Participants
The initial pool of participants comprised 106 secondary school learners (65 female, 41 male) in Grade 8, from a state school in the Barcelona area. Only participants who had 85% attendance or more per term were included in the analysis (N = 80). For the second research question, participants who had not completed the proficiency test had to be excluded, leaving a total of 74 (46 female, 28 male). They were 13–14 years old at the time the intervention started. Participants were Catalan-Spanish bilinguals, most of them balanced bilinguals for whom both languages may be considered first languages. They had a beginner to low-intermediate proficiency level in English (Pre-A to A2/B1 according to the Common European Framework of Reference [CEFR]) and a mean vocabulary size in English of 1,967 words. Prior to the intervention, around 55% of participants reported watching movies or TV series in English with L1 subtitles on a weekly basis, and around 15% with L2 subtitles or no subtitles. More than 50% said they found subtitles to be useful or very useful while only 4% considered them to be useless or annoying.
The participants had been randomly distributed into four classes by the school, and each class was assigned to a different experimental condition (see Figure 1): captions and focused instruction (CF) (n = 22); captions and non-focused instruction (CNF) (n = 22); subtitles and focused instruction (SF) (n = 19); and subtitles and non-focused instruction (SNF) (n = 17). There were no significant differences between the four groups in either proficiency or vocabulary size (see below).

Figure 1. Experimental conditions. CF = captions focused; CNF = captions non-focused; SF = subtitles focused; SNF = subtitles non-focused.

Audio-visual material
The TV series selected for the intervention was Fresh off the Boat (Khan et al. 2015), and it was chosen for the following reasons: the episodes had an appropriate length (with an average running time of 20 minutes); its content was appropriate for this particular age group; it was engaging (participants could identify with the main character and get hooked on the story); it was serial in nature, which allowed participants to gather information about the characters as they continued watching new episodes (Rodgers 2013); it was not strongly accented; it was a sitcom, a format with which participants are familiar through watching similar TV programmes; and at the time the intervention took place Fresh off the Boat had not been aired in Spain, thus minimising the possibility that participants had watched any of the episodes before. From the series, 24 consecutive episodes were selected, and by the end of the intervention participants had watched a total of 8 hours and 35 minutes of audio-visual input. Subtitles (in Spanish) were manipulated by the first author to ensure that the number of encounters with the target items was equal in both captions and subtitles conditions. The 24 episodes chosen for the intervention were analysed using the RANGE software (Nation and Heatley 2002).
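Lexical coverage of the kind reported from a RANGE analysis is a cumulative percentage of running words (tokens). The Python sketch below assumes per-band token counts have already been obtained (for instance from a RANGE profile) and treats proper nouns and marginal words as known at every level, mirroring the "plus proper nouns and marginal words" wording used for this series. The function names and the counts in the usage example are hypothetical illustrations, not the study's data.

```python
from itertools import accumulate


def coverage_profile(band_tokens, extra_tokens, total_tokens):
    """Cumulative coverage (%) after each 1,000-word frequency band.

    band_tokens: tokens falling in the 1K, 2K, 3K, ... bands, in order.
    extra_tokens: tokens always counted as known (e.g. proper nouns and
        marginal words), added at every level.
    total_tokens: all running words in the text.
    """
    return [100 * (covered + extra_tokens) / total_tokens
            for covered in accumulate(band_tokens)]


def bands_needed(band_tokens, extra_tokens, total_tokens, threshold=95.0):
    """Smallest number of bands whose cumulative coverage reaches the
    threshold, or None if the supplied bands never reach it."""
    profile = coverage_profile(band_tokens, extra_tokens, total_tokens)
    for level, coverage in enumerate(profile, start=1):
        if coverage >= threshold:
            return level
    return None


# Hypothetical counts for a 1,000-token transcript.
bands = [800, 80, 30, 20]
print(coverage_profile(bands, extra_tokens=50, total_tokens=1000))  # [85.0, 93.0, 96.0, 98.0]
print(bands_needed(bands, extra_tokens=50, total_tokens=1000))      # 3
```

With 95% taken as the threshold, this hypothetical text would require knowledge of the first three 1,000-word bands plus proper nouns and marginal words.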
The analysis of the lexical profile shows that the series reached a 95.66% coverage at the 3000-word level plus proper nouns and marginal words.4

Target items
A total of 120 target items (TIs) (40 each term, 5 per episode) were selected from the series according to (a) frequency of occurrence (between 2 and 14 times within the episode) and (b) low likelihood of being known by participants at this level of proficiency (school teachers were consulted on the TIs selected and these were replaced when necessary).5 TIs were from the first to the fourteenth frequency word lists on the BNC/COCA (Nation 2012): 52% belonged to the 1–3 K word families, 21% to the 4–8 K, and 12% to the 9–19 K (15% were off-list). TIs also belonged to different parts of speech, with the majority of them being nouns (60%) and verbs (25%). As for the frequency of occurrence across the intervention, 75% occurred between 2 and 5 times, 20% between 6 and 9 times and only 5% between 10 and 14 times.

Test instruments
Initial general proficiency was assessed by means of the Oxford Placement Test (OPT) and vocabulary size was measured using X_Lex (Meara and Milton 2003). Two questionnaires were also administered prior to the intervention to collect background information and attitudes towards subtitles (see above). A third questionnaire was administered after the intervention to gather participants’ insights about the perceived usefulness and overall learning value of the viewing sessions. The pre- and post-test assessing learners’ knowledge of the TIs consisted of two parts: (1) an aural form recognition and written form recall test (henceforth form recall), and (2) a meaning recall test (henceforth meaning recall). Participants listened to each TI twice and had to write down the English word, and then provide a translation or a short definition in Catalan or Spanish. Tests were administered by the first author and included a set of trial items to ensure participants completed them correctly.
This type of test was chosen because, in order to assess the benefits of captions and subtitles for vocabulary learning, tests had to be congruent with the input modality (Mohd Jelani and Boers 2018): written L2 word prompts in the test could have been used to the caption groups’ advantage.

Procedure
The classroom intervention took place over a whole academic year and was embedded in normal English lessons. Because of school calendar constraints, the intervention was divided into 3 terms with 8 viewing sessions each, and participants were tested at the beginning and at the end of each term on their knowledge of the corresponding 40 TIs (Figure 2 shows the structure of the intervention). Pre-tests were administered 1–2 weeks prior to the first session to reduce pre-test effects. Three sets of pre-/post-tests were used so that the post-test would not fall too long after the first sessions of the term, which could have led to decay. The complete intervention extended over 32 sessions.

Figure 2. Structure of the intervention (prior to intervention: proficiency tests; Term 1, September–December; Term 2, January–March; Term 3, April–June; each term: pre-test, treatment, post-test; questionnaires).

For the two focused instruction groups (one with captions, one with subtitles), each viewing session started with a pre-viewing task aimed at teaching the five TIs appearing in the episode plus three distractors. Pre-viewing activities included matching exercises, word searches, fill-in-the-blanks tasks and crosswords, and they were corrected orally. These activities aimed to draw learners’ attention to the TIs while they watched the video, but no specific strategies were suggested.
Then, participants watched the episode (with either captions or subtitles, depending on the group) and completed two immediate post-viewing tasks, namely a vocabulary task and a content comprehension task,6 which were given to encourage learners to pay attention to both vocabulary and content. The two non-focused instruction groups’ sessions followed the same outline but did not include the pre-viewing task; thus, learners were unaware of which words they were going to be tested on later.

Scoring and analysis
Pre- and post-tests were scored dichotomously (1 or 0). For form recall, words had to be correctly spelled to be considered correct. For meaning recall, translations were scored by two raters (interrater reliability was 95%; cases of disagreement were discussed until agreement was reached). A word was considered learnt when it was unknown in the pre-test and known in the post-test. Words known in both the pre- and post-test were considered known but not learnt. Relative gains were calculated at item level following the formula used in previous studies (Horst, Cobb and Meara 1998; Peters and Webb 2018; Rodgers 2013), in which items already known at the pre-test are removed from the denominator:

Relative gains = number of learnt TIs / (total number of TIs − number of known TIs) × 100

Prior to conducting the analysis, as stated above, participants with less than 85% attendance at the viewing sessions across the intervention were excluded from the data. TIs corresponding to missed episodes were not taken into account when calculating the vocabulary gains for those participants. The measure of relative gains used for analysis is the average relative gain across the three terms.

Results
Table 1 shows descriptive statistics for the two proficiency measures: OPT and X_Lex. There were no significant differences between groups in proficiency (F (3,70) = 1.545, p = .210) or in vocabulary size (F (3,64) = .816, p = .490).
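The relative-gains formula given under Scoring and analysis can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis script: it assumes each participant's dichotomous scores for one term are stored as dictionaries mapping target items to 1 (known) or 0 (not known), and the names `relative_gain`, `mean_relative_gain` and `ATTENDANCE_THRESHOLD` are hypothetical. Items from missed episodes are dropped before counting, attendance below 85% of the 8 sessions in any term excludes the participant, and the final score is the mean across terms. Returning `None` when every remaining item was already known at pre-test is an added assumption, since the formula is undefined in that case.

```python
from statistics import mean

ATTENDANCE_THRESHOLD = 0.85  # hypothetical name: minimum share of sessions attended per term
SESSIONS_PER_TERM = 8        # 8 viewing sessions per term


def relative_gain(pre, post, missed_items=frozenset()):
    """Relative gain (%) for one participant in one term.

    pre, post: dicts mapping each target item to 1 (known) or 0 (not known).
    missed_items: items from episodes the participant did not watch; they are
        left out of the calculation entirely.
    """
    items = [item for item in pre if item not in missed_items]
    known = sum(1 for item in items if pre[item] == 1)
    # Learnt = unknown at pre-test and known at post-test.
    learnt = sum(1 for item in items if pre[item] == 0 and post[item] == 1)
    unknown_at_pretest = len(items) - known  # total TIs minus known TIs
    if unknown_at_pretest == 0:
        return None  # assumption: gain is undefined when nothing was left to learn
    return 100 * learnt / unknown_at_pretest


def mean_relative_gain(terms, sessions_attended):
    """Average relative gain across terms, or None if the participant fell
    below the attendance threshold in any term.

    terms: one (pre, post) or (pre, post, missed_items) tuple per term.
    sessions_attended: number of viewing sessions attended in each term.
    """
    if any(attended / SESSIONS_PER_TERM < ATTENDANCE_THRESHOLD
           for attended in sessions_attended):
        return None
    gains = [relative_gain(*term) for term in terms]
    gains = [gain for gain in gains if gain is not None]
    return mean(gains) if gains else None


# Hypothetical five-item term: "a" already known; "b" and "c" learnt.
pre = {"a": 1, "b": 0, "c": 0, "d": 0, "e": 0}
post = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0}
print(relative_gain(pre, post))         # 50.0, i.e. 2 / (5 - 1) * 100
print(relative_gain(pre, post, {"e"}))  # about 66.7, i.e. 2 / (4 - 1) * 100
```

Keeping the per-term calculation separate from the averaging step makes it easy to drop missed-episode items within a term without affecting the other terms.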
For the present study, OPT scores (henceforth proficiency) were used because the OPT yields a more general measure of proficiency, including a section on listening, which was deemed appropriate given that listening skills are especially relevant in this learning environment.

Table 1. General proficiency and vocabulary size descriptives, M (SD)
Group     n    OPT score        n    X_Lex score
CF        21   99.71 (14.05)    17   1971 (547)
SF        19   90.47 (13.07)    16   2097 (332)
CNF       20   92.75 (15.04)    19   1992 (601)
SNF       14   93.43 (15.33)    16   1825 (434)
Average        94.27 (14.49)         1972 (494)

Research question 1: To what extent can vocabulary be acquired through TV series?
Our first research question focused on the extent to which participants could learn L2 vocabulary through extended exposure to TV series shown in the classroom. The descriptive statistics for relative gains in form recall and meaning recall are presented in Table 2. The table displays the percentage of relative gains (average across the three terms) – with standard deviations shown in brackets – by experimental group, and the average across the four groups.

Table 2. Relative gains (%) for form recall and meaning recall, M (SD)
Group     n    Form             Meaning
CF        21   30.10 (16.83)    14.54 (10.43)
SF        19   21.53 (11.16)     8.45 (6.36)
CNF       20   13.02 (5.85)      5.97 (6.25)
SNF       14   14.30 (9.32)      8.34 (7.73)
Average        20.29 (13.50)     9.49 (8.48)

Overall, the focused groups who were pre-taught the TIs performed better than the non-focused groups, with the captions-focused group (CF) being the most successful, followed by the subtitles-focused group (SF), the subtitles non-focused group (SNF) and finally the captions non-focused group (CNF). As can be observed, for form recall the two non-focused groups – who were not told what words to pay attention to – performed similarly, independently of the language of the on-screen text.
In contrast, when learners were pre-taught the words, the CF group, with simultaneous exposure to both L2 sound and text, outperformed the SF group. For meaning recall, however, the two subtitles groups, with access to L1 translations, performed similarly, independently of instruction. Paired-sample t-tests showed that differences between pre- and post-test were significant in all three terms for both form recall and meaning recall (p < .001) and for each experimental group (ranging from p < .001 to p = .029). On average, the most successful group had gains of 30.10% in form recall and 14.64% in meaning recall, which means that participants in that group learnt approximately 36 word forms and 18 word meanings on average. Individual differences within the group were considerable: the most successful participants had gains of 62.18% in form and 32.11% in meaning, while the least successful had gains of 5.4% in form and 0.83% in meaning. Differences between the groups in form recall were first explored by means of a Welch’s ANOVA using squared transformations, since a Levene’s test showed that variances were unequal (F (3,76) = 2.774, p = .047). The ANOVA showed a statistically significant difference between groups (F (3,76) = 7.714, p < .001, ω² = .199), with approximately 20% of the total variance in form recall gains accounted for by experimental group. A Tamhane’s T2 post-hoc test revealed statistically significant differences in relative gains in form between the CF and CNF groups (p = .001) and between the CF and SNF groups (p = .003). For meaning recall, a one-way ANOVA showed that there were also significant differences between the experimental groups (F (3,76) = 3.301, p = .024). A Tukey HSD post-hoc test revealed that differences were only significant between the CF and CNF groups (p = .023).
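Because Levene’s test indicated unequal variances, form recall was compared with Welch’s ANOVA rather than a standard one-way ANOVA. The sketch below computes Welch’s F statistic and its degrees of freedom from raw group scores using only the Python standard library. It is an illustration of the textbook formula, not the authors’ analysis code: the function name `welch_anova` is hypothetical, the squared transformation mentioned above is not applied, and the p-value step (which needs the F distribution, e.g. from SciPy) is omitted.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's F statistic for k independent groups with unequal variances.

    groups: list of k lists of scores, each with at least 2 values and
    non-zero variance. Returns (F, df1, df2); df2 is generally non-integer.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    w = [len(g) / variance(g) for g in groups]  # weights n_j / s_j^2
    w_sum = sum(w)
    # precision-weighted grand mean
    grand = sum(wj * mj for wj, mj in zip(w, means)) / w_sum
    between = sum(wj * (mj - grand) ** 2 for wj, mj in zip(w, means)) / (k - 1)
    lam = sum((1 - wj / w_sum) ** 2 / (nj - 1) for wj, nj in zip(w, n))
    f = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    return f, k - 1, (k ** 2 - 1) / (3 * lam)
```

A convenient check: with two groups, Welch’s F equals the square of Welch’s t statistic, and df2 equals the Welch–Satterthwaite degrees of freedom.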
Research question 2: Effect of language, type of instruction and proficiency

Our second research question addressed the extent to which the language of the on-screen text, type of instruction and learners’ proficiency could explain the differences in vocabulary learning observed among the four groups. Generalised linear models (GLMs) were run to evaluate the influence of these three factors on the two vocabulary outcome measures: form recall and meaning recall. Exploration of the data showed that the variable proficiency was not linearly distributed. For this analysis, proficiency was therefore re-categorised into three levels according to the OPT scores, assigning participants to three CEFR groups: Pre-A (n = 25), A1 (n = 35) and A2/B1 (n = 14). A GLM was first calculated with form gains (relative gains in percentage for form recall across the intervention) as the dependent variable, and language (captions or subtitles), type of instruction (focused or non-focused) and proficiency (Pre-A, A1 or A2/B1) as fixed effects. Non-significant interactions were removed from the model, leaving significant main effects. Table 3 presents the fitted model for relative gains in form.

Table 3. Results of fitted generalized linear model: influence of fixed factors on form recall

| Factor | Level | Mean (SE) | Comparison | M Diff (SE) | df | F | Sig. |
|---|---|---|---|---|---|---|---|
| Language | Captions (EN) | 22.25 (1.75) | | | | | |
| | Subtitles (SP) | 19.72 (1.95) | | 2.53 (2.52) | 1, 66 | 1.012 | .318 |
| Instruction | Focused | 26.93 (1.78) | | | | | |
| | Non-focused | 15.04 (1.91) | | 11.89 (2.50) | 1, 66 | 22.596 | < .001 |
| Proficiency | Pre-A (a) | 14.65 (2.13) | a–b | 4.30 (2.79) | | | |
| | A1 (b) | 18.95 (1.81) | b–c | 10.41 (3.45) | | | |
| | A2/B1 (c) | 29.35 (2.96) | a–c | 14.70 (3.65) | 2, 66 | 8.164 | .001 |

The model revealed a main effect for type of instruction and proficiency, but no main effect for the language of the on-screen text. Participants in the focused condition scored significantly higher than their classmates (p < .001), by about 11.89 percentage points.
Gains in form also depended significantly on participants’ proficiency level (p = .001), with A2/B1 learners scoring 14.70 percentage points higher than Pre-A learners. Once it was established that these two factors had a significant effect on learning outcomes, we further explored whether the effect of learners’ proficiency differed between instruction groups. For this analysis, the CF and SF groups were jointly considered focused, and the CNF and SNF groups non-focused (since on-screen language had no significant effect). Table 4 shows the percentage of relative gains per type of instruction and proficiency group, with standard errors (SE) in brackets.

Table 4. Relative gains in form and meaning per type of instruction and proficiency group

| Type of instruction | Proficiency | n | Form, % Mean (SE) | Meaning, % Mean (SE) |
|---|---|---|---|---|
| Focused | Pre-A | 13 | 20.44 (2.99) | 8.08 (2.01) |
| Focused | A1 | 19 | 24.17 (2.46) | 10.61 (1.65) |
| Focused | A2/B1 | 8 | 37.57 (4.09) | 18.89 (2.75) |
| Non-focused | Pre-A | 12 | 8.86 (3.10) | 2.73 (2.09) |
| Non-focused | A1 | 16 | 13.90 (2.70) | 6.99 (1.82) |
| Non-focused | A2/B1 | 6 | 20.83 (4.37) | 14.65 (2.95) |

GLM results showed significant differences between types of instruction at each proficiency level, with the focused group significantly outperforming the non-focused group at the Pre-A (F (1,66) = 7.182, p = .009), A1 (F (1,66) = 7.928, p = .006) and A2/B1 (F (1,66) = 7.820, p = .007) levels. Within the focused group, differences between the three proficiency levels were also significant (F (2,66) = 5.880, p = .004), and pairwise comparisons showed significant differences between the A2/B1 level and both the Pre-A (p = .001) and A1 levels (p = .006). In the non-focused group, by contrast, differences were only marginally significant (F (2,66) = 2.553, p = .087), and a significant difference was found only between the A2/B1 and Pre-A levels (p = .029).
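The GLMs above were reduced to main effects only. This section does not state the error distribution or link function; assuming a Gaussian family with identity link, the coefficient estimates coincide with ordinary least squares, which the minimal sketch below computes in pure Python. The dummy coding (captions, focused and Pre-A as reference levels), the function names and any data passed in are hypothetical, and the sketch omits F-tests and standard errors; it is not the authors’ analysis code.

```python
# Hypothetical dummy coding; reference levels: captions, focused, Pre-A.
COLUMNS = ["intercept", "subtitles", "non_focused", "A1", "A2/B1"]


def design_row(language, instruction, proficiency):
    """One row of a main-effects design matrix (no interaction terms)."""
    return [
        1.0,
        1.0 if language == "subtitles" else 0.0,
        1.0 if instruction == "non-focused" else 0.0,
        1.0 if proficiency == "A1" else 0.0,
        1.0 if proficiency == "A2/B1" else 0.0,
    ]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r, c=col: abs(m[r][c]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_main_effects(rows):
    """Least-squares fit of gain ~ language + instruction + proficiency.

    rows: iterable of (gain, language, instruction, proficiency) tuples.
    Solves the normal equations X'X b = X'y and returns {name: estimate}.
    """
    X, y = [], []
    for gain, language, instruction, proficiency in rows:
        X.append(design_row(language, instruction, proficiency))
        y.append(gain)
    p = len(COLUMNS)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return dict(zip(COLUMNS, solve(xtx, xty)))
```

Because the model has no interaction terms, each coefficient equals a difference between estimated marginal means: the `non_focused` coefficient corresponds to the Focused versus Non-focused M Diff in Table 3 with the sign reversed, and the `A2/B1` coefficient to the a–c difference.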
Figure 3 shows estimated marginal means for the focused and non-focused groups when participants were divided by proficiency level (Pre-A, A1 and A2/B1).

Figure 3. Estimated marginal means for form recall of focused and non-focused groups

A second GLM was calculated with meaning gains (relative gains in percentage for meaning recall across the intervention) as the dependent variable, and language (captions or subtitles), type of instruction (focused or non-focused) and proficiency (Pre-A, A1 or A2/B1) as fixed effects. Again, non-significant interactions were removed, leaving significant main effects in the model. Table 5 presents the fitted model.

Table 5. Results of fitted generalized linear model: influence of fixed factors on meaning recall

| Factor | Level | Mean (SE) | Comparison | M Diff (SE) | df | F | Sig. |
|---|---|---|---|---|---|---|---|
| Language | Captions (EN) | 10.91 (1.75) | | | | | |
| | Subtitles (SP) | 9.73 (1.31) | | 1.18 (1.69) | 1, 66 | .488 | .487 |
| Instruction | Focused | 12.48 (1.19) | | | | | |
| | Non-focused | 8.16 (1.28) | | 4.32 (1.68) | 1, 66 | 6.643 | .012 |
| Proficiency | Pre-A (a) | 5.43 (1.42) | a–b | 3.35 (1.87) | | | |
| | A1 (b) | 8.77 (1.21) | b–c | 8.00 (2.31) | | | |
| | A2/B1 (c) | 16.77 (1.99) | a–c | 11.34 (2.45) | 2, 66 | 10.810 | < .001 |

Similar to gains in form, the model showed a main effect for type of instruction and proficiency, but no main effect for the language of the on-screen text. Participants in the focused condition scored significantly higher than their classmates (p = .012), by 4.32 percentage points. Gains in meaning also depended significantly on participants’ proficiency level (p < .001), with A2/B1 learners scoring 11.34 percentage points higher than Pre-A learners. Once more we examined the relationship between instruction and proficiency (see Table 4). In contrast with form recall, for meaning recall there were no significant differences between instruction groups at the Pre-A (F (1,66) = 3.375, p = .071), A1 (F (1,66) = 2.179, p = .145) or A2/B1 level (F (1,66) = 1.103, p = .298), although learners in the focused group always had higher gains than those in the non-focused group.
GLM also showed, however, that differences in proficiency were significant within each instruction group (F (2,66) = 5.111, p = .009 for focused, and F (2,66) = 5.452, p = .006 for non-focused). Pairwise comparisons revealed that in both conditions it was the most advanced group (A2/B1 level) that significantly outperformed the other two: in the focused group, differences were found between the A2/B1 level and the Pre-A (p = .002) and A1 levels (p = .012), and in the non-focused group, between the A2/B1 level and the Pre-A (p = .002) and A1 (p = .006) levels. Figure 4 shows estimated marginal means per type of instruction when participants were divided by proficiency level.

Figure 4. Estimated marginal means for meaning recall of focused and non-focused groups

Finally, and although a complete analysis falls beyond the scope of this paper, a preliminary exploration of the self-reported data from the end-of-intervention questionnaires revealed interesting findings concerning learners’ attention to the audio-visual input. As many as 89% of participants reported having been attentive or very attentive, and only 17% said they paid less attention during the viewing sessions than during other classroom activities. Learners also reported that they understood the series better by the end of the intervention (74%), that they had a strong feeling of learning (58%) and that they felt relaxed during the sessions (53%) (Pujadas 2019).

Discussion

This study explored the effects of extensive viewing of a TV series on L2 vocabulary learning, investigating the influence of the language of the on-screen text, type of instruction and learners’ proficiency. Results concerning the first research question showed that, independently of the experimental condition, participants did learn L2 vocabulary from extensive exposure to audio-visual input.
This concurs with findings from the majority of studies in the field, which show that L2 vocabulary can be acquired through watching TV series (e.g. Peters and Webb 2018; Rodgers 2013). Additionally, participants made greater gains in recalling form than in recalling meaning, in line with previous studies that also showed greater gains in form recognition than in meaning recognition (Montero-Perez et al. 2014; Peters, Heynen and Puimège 2016). This was found across all conditions, that is, with captions or subtitles, and with or without pre-teaching of TIs. It should be borne in mind, however, that in the present study meaning recall depended on prior form recall, which will have affected meaning recall scores. Our second research question looked at the effect that the language of the on-screen text, type of instruction and learners’ proficiency had on participants’ vocabulary learning regarding form recall and meaning recall. Overall, focused instruction groups performed significantly better than non-focused groups in both form recall and meaning recall, regardless of whether they watched the series with captions or subtitles. This is not surprising, since past research has shown that intentional learning is significantly more efficient than incidental learning (Hulstijn 2003). However, although the language of the on-screen text had no significant effect on either form recall or meaning recall, a pattern can be observed: in the focused condition, the group with captions outperformed the group with subtitles in both form and meaning recall, while in the non-focused condition it was the subtitles group that performed slightly better than the captions group in meaning recall. This may suggest that when learners are pre-taught the words appearing in an episode, they make a first connection between the form and meaning of the new words through the pre-viewing activities.
Then, having the audio and text in the same language (captions) reinforces the connection between the oral and written form (Webb and Nation 2017), which in turn helps learners recall the meaning. On the other hand, when learners are not pre-taught the TIs (a condition more comparable to incidental learning), the subtitles group tends to have higher gains in meaning, as learners can connect the meaning provided by the L1 subtitles to the L2 oral form, a shortcut not available with captions (with which it takes them longer to learn the words). This would also suggest that the L1 text might have compensated for the lack of instruction in the SNF group. The lack of statistical differences between the captions and subtitles groups is in line with results from other aforementioned studies (e.g. Bisson et al. 2014; Stewart and Pertusa 2004). If we narrow the comparison down to studies with young viewers, our results coincide with those of Bravo’s study (2008), with participants at A2/B1 proficiency level, in which the L1 and L2 groups did not differ statistically, though the subtitles group performed slightly better. However, Bravo acknowledged that the L1 group was initially more proficient than the L2 group, and since the presence of captions required a higher L2 proficiency level, this would explain the lack of differences. Lwo and Lin (2012) also found that varying the language of the text did not have a significant impact on vocabulary gains and that the effect of different types of text presentation varied depending on learners’ proficiency. This was more evident in less proficient learners, who benefited the most from captions, whereas for advanced learners the presence of the L1 was a distractor, a result that was not found in the present study.
On the other hand, the studies by Naghizadeh and Darabi (2015) and by Peters, Heynen and Puimège (2016) consistently found that captions groups performed significantly better than subtitles groups in vocabulary learning. However, learners in these two studies were older (aged 15–18) and more proficient than participants in the present study (aged 13). This suggests that there might indeed be an age/proficiency threshold, and that the older and more proficient learners are, the more they benefit from captioning rather than subtitling. In line with the above-mentioned studies, we found that, as expected, learners’ proficiency level was significantly related to vocabulary gains in both form and meaning recall, with more advanced learners obtaining higher gains (Chen, Liu and Todd 2018). Since instruction had a strong effect on vocabulary learning outcomes, we further investigated the relationship between learners’ proficiency and instruction and found that the effect of proficiency level (Pre-A, A1 or A2/B1) differed depending on whether TIs were pre-taught or not. For form recall, participants in the focused groups significantly outperformed the non-focused groups at each proficiency level, and within the focused groups the A2/B1 group had significantly higher gains than the A1 and Pre-A groups. In the non-focused groups, significant differences were only found between the A2/B1 and Pre-A levels. In contrast, for meaning recall, the differences between focused and non-focused groups at each proficiency level did not reach significance, although focused groups consistently outperformed non-focused groups. This would suggest that for meaning recall, proficiency might have had a slightly stronger effect than instruction. Again, A2/B1 groups in both types of instruction setting significantly outperformed A1 and Pre-A groups.
The fact that significant differences were mostly found between the A2/B1 group and the other two suggests the possibility of a threshold at A2/B1 level, above which learners seem better able to benefit from exposure to audio-visual input for vocabulary learning. This finding confirms the crucial role played by proficiency and suggests that results in this area cannot be adequately interpreted if this key variable is not taken into account. In other words, to reach robust conclusions in this line of research, results need to be seen as contingent on the proficiency level of the participants in each particular study. Further research controlling for age or for proficiency could help determine which of the two factors interacts more strongly with the outcomes from either mode. In summary, this study provides evidence that extensive viewing of TV programmes can support L2 vocabulary learning. Arguably, compared with the outcomes of other kinds of vocabulary instruction, the gains in both form and meaning were relatively small (in a total of 515 minutes, participants in the most successful group (CF) learned on average around 36 word forms and 18 word meanings). However, one needs to bear in mind that vocabulary learning is a gradual process, and we did not test the partial acquisition of the TIs or possible long-term effects. Likewise, we did not test other aspects of knowledge, such as pronunciation, or other words that might have been learned.

Conclusions

This study has made several contributions to the growing area of research into the benefits of watching captioned/subtitled audio-visual material for L2 learning. First, results confirm previous findings regarding the potential of TV programmes for vocabulary learning and as a rich source of comprehensible input.
Second, this study has shown that the integration of explicit instruction and extensive viewing is possible and effective, and suggests that a small amount of teaching (instruction consisted of simple 5-minute activities) and directing learners’ attention to target vocabulary can already bring about significant improvement, especially in form recall. Third, it has provided valuable data on the so-far unresolved issue of the relative gains from captions and subtitles, underscoring the key role played by learners’ proficiency. In that respect, it has been found that the benefits of either on-screen text language depend not only on linguistic competence itself but also on the instruction condition. Fourth, the study looked at adolescent learners in regular classrooms and measured gains over a whole academic year, lending ecological validity and generalisability to its findings and complementing results more frequently obtained from research with university students, although replication with larger groups is still needed. Finally, relevant pedagogical implications emerge from this classroom-based intervention. As mentioned above, a minimal time investment in pre-teaching with receptive activities helps students’ vocabulary learning, which could be further enhanced with additional productive activities (Chung 1996; Sockett 2014). Data on learners’ perceptions also showed that learners were very positive about and appreciative of encountering ‘real’ English, which is of special interest for teachers. Students paid attention to the input and had a strong feeling of learning, and the entertaining nature of the media did not seem to distract them but, on the contrary, boosted attention.
Moreover, although results show the greater efficiency of pre-teaching, the fact that learners who did not receive instruction also learned vocabulary, albeit to a lesser extent, supports previous findings on the learning potential that watching TV series may also have outside the classroom. Especially in settings where L2 input is limited, teachers may encourage learners to watch OV series out of school and may also train them to enhance their learning through the use of strategies and focus on form. The study also has social implications, demonstrating to learners and teachers the value of watching captioned/subtitled TV series as an L2 resource in a traditionally dubbing country and thus, hopefully, contributing to a change in viewing habits. The study has some limitations that should be acknowledged. First, the type of test used to evaluate learning might explain the low gains obtained, especially in meaning, since a recall test (e.g. a translation test) is more difficult than a recognition test (e.g. a multiple-choice test) (Jones 2004). If a student failed to aurally identify a target item in the first place, they could not provide a translation, but this does not mean that they could not recognise the word form if encountered, or that they did not know the meaning of the word. A second test checking partial knowledge of the target items might have provided a more accurate picture of their learning. As regards form, the requirement that words be spelt correctly might have put the subtitles groups at a disadvantage. Another limitation is that the study did not take into account the degree of attention participants were actually paying to the tasks, which could be a concern when extrapolating results. Although more faithful to the real learning environment, this is a common shortcoming of classroom studies, which may be seen as an inevitable concomitant of their ecological validity.
Similarly, this setting precluded the use of a control group since, ethically, withholding exposure to OV TV might have reduced those learners’ learning opportunities. Lastly, we did not take into account the role that imagery might have played in providing semantic support, especially for learners who did not receive that information from either the instruction or the subtitles (the CNF group) and who nonetheless acquired a number of the word meanings. Future research also needs to look into the characteristics of the target items, such as imageability (Rodgers 2018), frequency of occurrence or relevance, which may influence the learnability of vocabulary in this particular context.

Notes

1. In the present study, the term subtitles is used to refer to L1 subtitles or interlingual subtitles (in this case, Spanish), and captions is used to refer to same-language subtitles or intralingual subtitles (in this case, English).
2. Within the aforementioned studies, input materials vary from 2-minute clips to full-length 90-minute movies or several TV episodes (adding up to 325 minutes in total), and the number of target items ranges from 10 to 78, with a wide range of test types and constructs measured.
3. The terms focused and non-focused are used here to refer, respectively, to the conditions in which learners are and are not pre-taught the target items. These terms are preferred over the terms intentional and incidental. Usually, incidental learning is narrowly identified as not having forewarning of an upcoming post-test (Dörnyei 2009; Hulstijn 2003), and the distinction from intentional learning is centred on the role of learners’ intention (Bruton, García López and Esquiliche Mesa 2011). However, in classroom studies ‘it is difficult [ ... ] to ensure that learners do not become intentionally focused on learning vocabulary’ (Malone 2018, 3), and it is impossible to know what participants actually do (Hummel 2010). This becomes even more obvious in the context of a longitudinal study, where participants come to expect a vocabulary test after a couple of sessions.
4. Nation (2006) suggested that proper nouns have a minimal learning burden, and Webb and Rodgers (2009b) showed in their study of the lexical coverage of movies that if learners knew proper nouns and marginal words (e.g. ah, oh, huh), they could reach 95.76% coverage with the most frequent 3,000 word families. In the present study, proper nouns make up 3.11% of the running words, providing more coverage than words from the 3,000 word-family level (1.62%). Considering that characters and locations recur throughout the episodes, it seems safe to assume learners are familiar with most of the proper nouns.
5. Frequency of occurrence across the intervention counts the occurrences of a TI between the pre-test and the post-test in which that TI was tested; that is, it includes occurrences within the 8 episodes of the term in which the word was targeted.
6. The vocabulary task was an aural recall and meaning recognition task (participants had to write down the word they heard and select the correct translation from 5 options provided) and included 5 TIs plus 3 distractors. The comprehension task included 10 items (5 multiple-choice, 5 true-false) assessing content comprehension. Post-viewing tasks were not corrected. Although the analysis of comprehension falls beyond the scope of the present paper, it is interesting to note that, while comprehension was observed in all groups, it differed by on-screen text language, with the subtitles groups outperforming the captions groups (Pujadas and Muñoz in press).
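The coverage figures in note 4 are percentages of running words (tokens) that fall into each lexical category. The minimal sketch below only illustrates that arithmetic: it is not the Range program (Nation and Heatley 2002), it does not use the BNC/COCA lists, and the category lookup, function names and toy tokens are hypothetical.

```python
from collections import Counter


def lexical_coverage(tokens, category_of):
    """Percentage of running words (tokens) in each lexical category.

    category_of: hypothetical lookup, word -> category label such as
    "1000", "2000", "3000" or "proper noun"; words absent from the lookup
    are counted as "off-list".
    """
    counts = Counter(category_of.get(t.lower(), "off-list") for t in tokens)
    total = len(tokens)
    return {cat: 100 * n / total for cat, n in counts.items()}


def cumulative_coverage(coverage, categories):
    """Summed coverage of the listed categories (e.g. known bands plus proper nouns)."""
    return sum(coverage.get(cat, 0.0) for cat in categories)


# Hypothetical toy transcript and lookup
tokens = ["Eddie", "went", "to", "the", "restaurant"]
lookup = {"eddie": "proper noun", "went": "1000", "to": "1000", "the": "1000"}
coverage = lexical_coverage(tokens, lookup)
print(coverage)
print(cumulative_coverage(coverage, ["1000", "proper noun"]))
```

Adding proper nouns to the known bands in `cumulative_coverage` mirrors the reasoning in note 4, where familiar character and location names raise the proportion of the input that learners can be assumed to know.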
Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by Ministerio de Economía y Competitividad (MINECO) [grant number FFI2016-80564-R] and by Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) [grant numbers 2017 SGR560 and FI-DGR 2016].

ORCID
Geòrgia Pujadas http://orcid.org/0000-0002-0290-1158
Carmen Muñoz http://orcid.org/0000-0002-7001-4155

References
Bavaharji, M., Z.K. Alavi and K. Letchumanan. 2014. Captioned instructional video: effects on content comprehension, vocabulary acquisition and language proficiency. English Language Teaching 7, no. 5: 1–16. DOI: 10.5539/elt.v7n5p1.
Bianchi, F. and T. Ciabattoni. 2008. Captions and subtitles in EFL learning: an investigative study in a comprehensive computer environment. In From Didactas to Ecolingua, ed. A. Baldry, M. Pavesi, C. Taylor Torsello and C. Taylor, 69–90. Trieste: Edizioni Università di Trieste.
 Birulés-Muntané, J. and S. Soto-Faraco. 2016. Watching subtitled films can help learning foreign languages. PloS One 11, no. 6: e0158409. DOI: 10.1371/journal.pone.0158409.
Bisson, M.J., W.J.B. Van Heuven, K. Conklin and R.J. Tunney. 2014. The role of repeated exposure to multimodal input in incidental acquisition of foreign language vocabulary. Language Learning 64, no. 4: 855–77. DOI: 10.1111/lang.12085.
Bravo, M.C.C. 2008. Putting the reader in the picture. Screen translation and foreign-language learning. PhD diss., University Rovira i Virgili and University of Algarve.
 Bruton, A., M. García López and R. Esquiliche Mesa. 2011. Incidental L2 vocabulary learning: an impracticable term? TESOL Quarterly 45, no. 4: 759–68. DOI: 10.5054/tq.2011.268061.
 Chang, A.C.S. and J. Read. 2006. The effects of listening support on the listening performance of EFL learners. TESOL Quarterly 40, no. 2: 375–97.
Chen, Y., Y. Liu and A.G. Todd. 2018. Transient but effective? Captioning and adolescent EFL learners’ spoken vocabulary acquisition. English Teaching & Learning 42, no. 1: 25–56. DOI: 10.1007/s42321-018-0002-8.
 Chung, J.M. 1996. The effects of using advance organizers and captions to introduce video in the foreign language classroom. TESL Canada Journal 14, no. 1: 61–5.
 Chung, J.M. 1999. The effects of using video texts supported with advance organizers and captions on Chinese college students’ listening comprehension: an empirical study. Foreign Language Annals 32, no. 3: 295–308.
 Danan, M. 2004. Captioning and subtitling: undervalued language learning strategies. Meta: Journal Des Traducteurs 49, no. 1: 67–77. DOI: 10.7202/009021ar.
 Dörnyei, Z. 2009. The Psychology of Second Language Acquisition. Oxford: Oxford University Press.
D’Ydewalle, G. and M. Van de Poel. 1999. Incidental foreign-language acquisition by children watching subtitled television programs. Journal of Psycholinguistic Research 28, no. 3: 227–44. DOI: 10.1023/A:1023202130625.
Elley, W.B. 1989. Vocabulary acquisition from listening to stories. Reading Research Quarterly 24, no. 2: 174–87.
European Commission. 2017. Media use in the European Union. European Union, Standard Eurobarometer 88 Project. DOI: 10.2775/116707.
Frumuselu, A.D., S. De Maeyer, V. Donche and M.M. Gutiérrez Colon-Plana. 2015. Television series inside the EFL classroom: bridging the gap between teaching and learning informal language through subtitles. Linguistics and Education 32: 107–17. DOI: 10.1016/j.linged.2015.10.001.
 Gesa, F. and I. Miralpeix. 2018. Enhancing foreign language learning by means of multimodal input. The case of subtitled TV series and young learners. Paper presented at the 2018 Conference on Technological Innovation for Specialized Linguistic Domains, Ghent, May 24–26.
 Guillory, H.G. 1998. The effects of keyword captions to authentic French video on learner comprehension. Calico Journal 15, no. 1–3: 89–108.
 Horst, M., T. Cobb and P. Meara. 1998. Beyond a clockwork orange: acquiring second language vocabulary through reading. Reading in a Foreign Language 11, no. 2: 207–23.
Hsu, C.K., J.G. Hwang, Y.T. Chang and C.K. Chang. 2013. Effects of video caption modes on English listening comprehension and vocabulary acquisition using handheld devices. Journal of Educational Technology & Society 16, no. 1: 403–14. ISSN 1436-4522.
Hulstijn, J.H. 2003. Incidental and intentional learning. In Handbook of Second Language Acquisition, ed. C.J. Doughty and M.H. Long, 349–81. Malden, MA: Wiley-Blackwell.
Hulstijn, J.H. 2013. Incidental learning in second language acquisition. In The Encyclopedia of Applied Linguistics, ed. C.A. Chapelle, vol. 5, 2632–40. Chichester: Wiley-Blackwell. DOI: 10.1002/9781405198431.wbeal0530.
Hummel, K.M. 2010. Translation and short-term L2 vocabulary retention. Hindrance or help? Language Teaching Research 14: 61–74. DOI: 10.1177/1362168809346497.
Jones, L. 2004. Testing L2 vocabulary recognition and recall using pictorial and written test items. Language Learning & Technology 8, no. 3: 122–43. ISSN 1094-3501.
Khan, N., J. Kasdar, M. Melvin, R. Blomquist, E. Huang and J. McEwen. 2015. Fresh off the Boat [TV Series]. Los Angeles, CA: ABC.
Koolstra, C.M. and J.W.J. Beentjes. 1999. Children’s vocabulary acquisition in a foreign language through watching subtitled television programs at home. Educational Technology Research and Development 47, no. 1: 51–60. DOI: 10.1007/bf02299476.
Kuppens, A.H. 2010. Incidental foreign language acquisition from media exposure. Learning, Media and Technology 35, no. 1: 65–85. DOI: 10.1080/17439880903561876.
Laufer, B. 2005. Focus on form in second language vocabulary learning. EUROSLA Yearbook 5, no. 1: 223–50.
Lekkai, I. 2014. Incidental foreign-language acquisition by children watching subtitled television programs. TOJET: The Turkish Online Journal of Educational Technology 13, no. 4: 81–7.
Lindgren, E. and C. Muñoz. 2013. The influence of exposure, parents, and linguistic distance on young European learners’ foreign language comprehension. International Journal of Multilingualism 10, no. 1: 105–29. DOI: 10.1080/14790718.2012.679275.
 Lwo, L. and M.C. Lin. 2012. The effects of captions in teenagers’ multimedia L2 learning. ReCALL 24, no. 2: 188–208. DOI: 10.1017/s0958344012000067.
Malone, J. 2018. Incidental vocabulary learning in SLA. Studies in Second Language Acquisition, 1–25. DOI: 10.1017/s0272263117000341.
Matielo, R., T. Collet and R.C.S.F. D’Ely. 2013. The effects of interlingual and intralingual subtitles on vocabulary learning by Brazilian EFL learners: an exploratory study. Intercâmbio. Revista do Programa de Estudos Pós-Graduados em Linguística Aplicada e Estudos da Linguagem 27: 83–99. ISSN 2237-759x.
Matielo, R., R.C.S.F. D’Ely and L. Baretta. 2015. The effects of interlingual and intralingual subtitles on second language learning/acquisition: a state-of-the-art review. Trabalhos Em Linguística Aplicada 54, no. 1: 161–82. DOI: 10.1590/0103-18134456147091.
 Meara, P. and J. Milton. 2003. X-lex: The Swansea Levels Test. Newbury: Express Publishing.
Mohd Jelani, N.A. and F. Boers. 2018. Examining incidental vocabulary acquisition from captioned video. Approaches to Learning, Testing, and Researching L2 Vocabulary ITL – International Journal of Applied Linguistics 169, no. 1: 169–90. DOI: 10.1075/itl.00011.jel.
 Montero-Perez, M., E. Peters, G. Clarebout and P. Desmet. 2014. Effects of captioning on video comprehension and incidental vocabulary learning. Language Learning & Technology 18, no. 1: 118–41. DOI: 10125/44357
 Montero-Perez, M., W. Van Den Noortgate and P. Desmet. 2013. Captioned video for L2 listening and vocabulary learning: a meta-analysis. System 41, no. 3: 720–39. DOI: 10.1016/j.system.2013.07.013.
Muñoz, C. 2017. The role of age and proficiency in subtitle reading. An eye-tracking study. System 67: 77–86. DOI: 10.1016/j.system.2017.04.015.
Muñoz, C., T. Cadierno and I. Casas. 2018. Different starting points for English language learning: a comparative study of Danish and Spanish young learners. Language Learning 68. DOI: 10.1111/lang.12309.
Naghizadeh, M. and T. Darabi. 2015. The impact of bimodal, Persian and no-subtitle movies on Iranian EFL learners’ L2 vocabulary learning. Journal of Applied Linguistics and Language Research 2, no. 2: 66–79. ISSN 2376-760X.
Nation, P. 2006. How large a vocabulary is needed for reading and listening? Canadian Modern Language Review 63, no. 1: 59–82. DOI: 10.3138/cmlr.63.1.59.
Nation, P. 2007. The four strands. International Journal of Innovation in Language Learning and Teaching 1, no. 1: 2–13. DOI: 10.2167/illt039.0.
Nation, P. 2012. The BNC/COCA Word Family Lists. Document bundled with Range program with BNC/COCA lists, 25.
Nation, P. 2015. Principles guiding vocabulary learning through extensive reading. Reading in a Foreign Language 27, no. 1: 136–45. http://nflrc.hawaii.edu/rfl
Nation, P. and A. Heatley. 2002. Range: a program for the analysis of vocabulary in texts [software]. https://www.victoria.ac.nz/lals/about/staff/paul-nation
Neuman, S.B. and P. Koskinen. 1992. Captioned television as comprehensible input: effects of incidental word learning from context for language minority students. Reading Research Quarterly 27, no. 1: 94–106. DOI: 10.2307/747835.
Peters, E. 2018. The effect of out-of-class exposure to English language media on learners’ vocabulary knowledge. ITL – International Journal of Applied Linguistics 169, no. 1 (Approaches to Learning, Testing, and Researching L2 Vocabulary): 142–68. DOI: 10.1075/itl.00010.pet.
Peters, E., E. Heynen and E. Puimège. 2016. Learning vocabulary through audiovisual input: the differential effect of L1 subtitles and captions. System 63: 134–48. DOI: 10.1016/j.system.2016.10.002.
Peters, E. and S. Webb. 2018. Incidental vocabulary acquisition through viewing L2 television and factors that affect learning. Studies in Second Language Acquisition 1–27. DOI: 10.1017/s0272263117000407.
Pujadas, G. 2019. Language learning through extensive TV viewing. A study with adolescent EFL learners. PhD diss., Universitat de Barcelona.
Pujadas, G. and C. Muñoz. 2017. Learning through subtitles. Learners’ preferences and task perception. Paper presented at the 2017 International Conference on Task-Based Language Teaching, Barcelona, 19–21 April.
Pujadas, G. and C. Muñoz. In press. Examining adolescent EFL learners’ TV viewing comprehension through captions and subtitles. Studies in Second Language Acquisition. First View: https://doi.org/10.1017/S0272263120000042
Rice, M.L., A.C. Huston, R. Truglio and J.C. Wright. 1990. Words from ‘Sesame Street’: learning vocabulary while viewing. Developmental Psychology 26, no. 3: 421–8. DOI: 10.1037//0012-1649.26.3.421.
Rodgers, M.P.H. 2013. English language learning through viewing television: an investigation of comprehension, incidental vocabulary acquisition, lexical coverage, attitudes, and captions. PhD diss., Victoria University of Wellington.
Rodgers, M.P.H. 2018. The images in television programs and the potential for learning unknown words. ITL – International Journal of Applied Linguistics 169, no. 1 (Approaches to Learning, Testing, and Researching L2 Vocabulary): 191–211. DOI: 10.1075/itl.00012.rod.
Rodgers, M.P.H. and S. Webb. 2011. Narrow viewing: the vocabulary in related television programs. TESOL Quarterly 45, no. 4: 689–717. DOI: 10.5054/tq.2011.268062.
Schmitt, N. 2008. Review article: instructed second language vocabulary learning. Language Teaching Research 12, no. 3: 329–63. DOI: 10.1177/1362168808089921.
Schmitt, N. 2010. Researching Vocabulary: A Vocabulary Research Manual. Basingstoke: Palgrave Macmillan.
 Sockett, G. 2014. The Online Informal Learning of English. Basingstoke: Palgrave Macmillan.
Stewart, M.A. and I. Pertusa. 2004. Gains to language learners from viewing target language closed-captioned films. Foreign Language Annals 37, no. 3: 438–42. DOI: 10.1111/j.1944-9720.2004.tb02701.x.
Sydorenko, T. 2010. Modality of input and vocabulary acquisition. Language Learning & Technology 14, no. 2: 50–73. hdl: 10125/44214.
 Vanderplank, R. 1990. Paying attention to the words: Practical and theoretical problems in watching television programmes with uni-lingual (CEEFAX) sub-titles. System 18, no. 2: 221–34.
Vanderplank, R. 2010. Déjà vu? A decade of research on language laboratories, television and video in language learning. Language Teaching 43, no. 1: 1–37. DOI: 10.1017/S0261444809990267.
Vanderplank, R. 2016. ‘Effects of’ and ‘effects with’ captions: how exactly does watching a TV programme with same-language subtitles make a difference to language learners? Language Teaching 49, no. 2: 235–50. DOI: 10.1017/s0261444813000207.
Van Zeeland, H. and N. Schmitt. 2013. Lexical coverage in L1 and L2 listening comprehension: the same or different from reading comprehension? Applied Linguistics 34, no. 4: 457–79. DOI: 10.1093/applin/ams074.
Webb, S. 2010. Pre-learning low-frequency vocabulary in second language television programmes. Language Teaching Research 14, no. 4: 501–15. DOI: 10.1177/1362168810375371.
Webb, S. and A.C.S. Chang. 2015a. How does prior word knowledge affect vocabulary learning progress in an extensive reading program? Studies in Second Language Acquisition 37: 651–75. DOI: 10.1017/S0272263114000606.
 Webb, S. and P. Nation. 2017. How Vocabulary is Learned. Oxford: Oxford University Press.
 Webb, S. and M.P.H. Rodgers. 2009a. Vocabulary demands of television programs. Language Learning 59, no. 2: 335–66. DOI: 10.1111/j.1467-9922.2009.00509.x.
Webb, S. and M.P.H. Rodgers. 2009b. The lexical coverage of movies. Applied Linguistics 30, no. 3: 407–27. DOI: 10.1093/applin/amp010.
Winke, P., S. Gass and T. Sydorenko. 2010. The effects of captioning videos used for foreign language listening activities. Language Learning & Technology 14, no. 1: 65–86. hdl: 10125/44203.
 Zarei, A.A. 2008. The effect of bimodal, standard, and reversed subtitling on L2 vocabulary recognition and recall. Pazhuhesh-e Zabanha-ye Khareji 49: 65–84.
Zarei, A.A. and Z. Rashvand. 2011. The effect of interlingual and intralingual, verbatim and nonverbatim subtitles on L2 vocabulary comprehension and production. Journal of Language Teaching and Research 2, no. 3: 618–25. DOI: 10.4304/jltr.2.3.618-625.