A Longitudinal Analysis of the Effects of One Year Abroad

The purpose of this study is to analyze the progress of 14 Spanish-speaking learners of English during a period abroad from a longitudinal perspective. Oral and written data were collected three times during an academic year at a British university. These samples were analyzed in terms of fluency, syntactic complexity, lexical richness, and accuracy. The results of the statistical analyses indicate that, while a few months abroad might be sufficient for some gains in oral performance to occur, improvement in written production is slower and does not seem to take place until students have spent more than one semester abroad. In addition, it was observed that the type of interaction experienced abroad and some attitudinal features can partly explain language development in some areas.


Literature review
Learning context has been an important focus of second language acquisition (SLA) research during the past decade (Collentine, 2009; Freed, 1995, 1998; Freed, Segalowitz, & Dewey, 2004; Freed, So, & Lazar, 2003; Llanes, 2011). This interest in learning context has grown in tandem with the popularity of study abroad (SA) experiences.
According to the Institute of International Education (2012) and the European Commission for Higher Education (2011), which report data regarding SA participation in the US and Europe, respectively, the number of students studying abroad has increased dramatically during the past decade (see Figures 1 and 2). Similarly, Canadian universities are becoming more and more interested in sending their students abroad, and an increasing number of these students (up to 17,850 in 2006, three times more than in the year 2000) are taking advantage of this opportunity according to the Association of Universities and Colleges of Canada (AUCC, 2007).
It has been documented that SA has an impact on several areas of second language development, both in non-linguistic aspects, such as motivation (Allen, 2010) and affective and cultural factors (Ismail, & Hayes, 2005), and in linguistic ones. The effects of SA on the participants' linguistic development in particular have been widely documented. The most investigated domain in relation to learning context is oral production, especially oral fluency, as it is believed to be the most sensitive to learning context (Freed, 1995; Freed, Segalowitz, & Dewey, 2004; Lennon, 1990; Llanes & Muñoz, 2009; Yager, 1998). Vocabulary development is another important domain that has been reported to be different between SA and traditional instructional settings. Lennon's (1990) and Llanes and Muñoz's (2009) studies analyzed the oral fluency of a group of learners who spent time abroad and found that time abroad was crucial for the improvement of second language (L2) fluency. Freed's (1995) study also examined oral production using both objective measures and rating scales through which native speakers judged the native-likeness of learners' speech samples. The results obtained on both sets of measures revealed that the SA context led to greater gains than the at-home (AH) context. Similarly, Yager (1998) found that after the SA experience, participants were perceived to be more fluent. Further evidence for the benefits of SA on participants' oral production comes from Segalowitz and Freed (2004), who examined oral fluency using a series of objective measures and attempted to relate the gains that participants achieved to their cognitive abilities. These authors concluded that cognitive abilities also play a role in the oral fluency improvement that participants experienced.
Vocabulary acquisition is another commonly researched aspect of learning-context studies. Dewey (2008) examined the receptive vocabulary of American undergraduates learning Japanese in three learning contexts: AH, IM (domestic immersion), and SA. Dewey found that participants in the SA group scored higher than participants in the IM group, who in turn scored higher than participants in the AH group. Foster (2009) compared the L2 vocabulary of learners in different contexts, AH and SA, and also included data from native speakers of the L2. She found that SA participants' L2 use was closer to native speakers' use of the language than AH participants' L2 use. Other studies such as Ife, Vives, and Meara (2000), Llanes and Muñoz (2009), and Milton and Meara (1995) examined the L2 vocabulary development of participants who spent some time abroad, but they did not offer a comparison group. The studies reported analyzed vocabulary use in different ways; however, they have all concluded that staying abroad is beneficial for the participants' lexical development.
Although not as commonly investigated as oral fluency and vocabulary, other language skills have also been the subject of learning-context studies. Some evidence has been provided in the literature for the advantage of the SA context for the development of listening skills (Cubillos, Chieffo, & Fan, 2008; Dyson, 1988; Llanes & Muñoz, 2009), reading comprehension skills (Dewey, 2004; Lapkin, Hart, & Swain, 1995), writing skills (Sasaki, 2004, 2007, 2009), and also grammar (Guntermann, 1995; Howard, 2005, 2006). An interesting finding is that no benefits have been reported for the SA context in terms of pronunciation (Díaz-Campos, 2004; Mora, 2008). In relation to sociolinguistic appropriateness, Regan (1995, p. 261) claims that 'the effect of the year abroad is very striking in the acquisition of the vernacular grammar and sociolinguistic competence.' Regan corroborated her own claim in her 2005 study, in which she examined the deletion of the particle ne ('no,' 'not') in French (L2) by a group of five Irish undergraduates who spent an academic year in France. The author collected data at three different points (pre-test, post-test, and delayed post-test) and found that, between the pre- and post-test, participants deleted the particle ne more frequently (i.e., showing more native-like sociolinguistic patterns) and that the ne deletion rates attained after their year abroad were still maintained one year after their return from France.
It must be borne in mind, however, that not all the empirical evidence in terms of learning context suggests significant differences in favour of students going abroad over students receiving classroom instruction. Some studies have reported no differences between learning contexts or no significant improvement after a period abroad (Collentine, 2004; DeKeyser, 1991; Dewey, 2004; Díaz-Campos, 2004; Mora, 2008). Similarly, studies examining different linguistic areas do not necessarily find advantages in all of these areas for the SA context (Freed et al., 2003).
The fact that the SA context has not been found uniformly more beneficial for language development than classroom L2 learning, despite the popular belief that the best (or even the only) way to learn a language is by spending time abroad, can be due to the different factors that determine whether students will take advantage of the opportunities they supposedly have abroad. One such factor is the length of stay abroad. It has been shown that, in general, students who stay abroad for a whole academic year tend to show significantly greater gains than those who stay for only one semester (Dwyer, 2004; Ife et al., 2000). Other variables that can have an impact on the type and rate of L2 development that occurs abroad include initial proficiency level (Brecht, Davidson, & Ginsberg, 1995; Freed, 1990, 1995; Ife et al., 2000; Milton & Meara, 1995), language contact while abroad (Juan-Garau & Pérez-Vidal, 2007), personality (Kinginger, 2008), or even gender (Brecht et al., 1995; Polanyi, 1995; Regan et al., 2009).
The present study aims to investigate language gains in an SA context from a longitudinal perspective, which is not a commonly adopted design in the literature (with a few exceptions, such as Regan, 2005). The same group of participants was followed at different time points during their stay abroad and their oral and written production was assessed over time. The learners' progress was examined through different data collection points after a few months abroad and after a whole academic year. This design allows us to analyze whether L2 development in an SA context is linear in the different areas of oral and written production under examination and whether oral and written production develop in tandem. A longitudinal design also makes it possible to examine whether there are L2 areas that develop more quickly than others. In addition to the longitudinal development of students' oral and written production, we have also considered two factors for analysis that we thought might affect language progress in the SA context: attitudes toward the L2 and its speakers, which have been claimed to affect SLA in general (Masgoret & Gardner, 2003), and language interaction while abroad.
More specifically, this study aims to answer the following research questions:
1. Does L2 proficiency in oral and written production develop at the same pace while abroad, or is improvement in one modality faster than in the other?
2. Can learners' individual variables, such as attitudes or chances to interact abroad, explain certain aspects of language development in oral and written production?

Participants
The participants in this study are 14 Spanish-speaking students from Spain who were enrolled in a UK university for one year as part of the Erasmus European Exchange Programme, which is the most popular program for college students to study abroad in Europe. The participants were all young adults between the ages of 20 and 24, with a mean age of 22. There were nine females and five males. The participants studied different majors in Spain, seven of them related to English studies or translation, six in scientific fields, and one in history. The students also differed in terms of their current academic year at their universities of origin: two were in their second year, eight were in their third or fourth year, and four were writing their undergraduate thesis. For all of the students the SA period was optional and for six, or roughly half, this was their first time abroad. In terms of their previous experience with the English language in formal settings, all of the students had received instruction at school beginning at ages 6-10. Apart from this, eight students had also taken extra-curricular courses in language schools, while six students did not have this experience. When they were asked about their perception of their English proficiency in reading, writing, speaking, and listening, they all rated themselves between lower intermediate and advanced, and this rating corresponded with students' actual proficiency level as determined by their performance on the pre-test.

Instruments
The instruments that were used in this study were designed to examine students' oral and written production on the one hand and students' background information, referring to language attitude and language use, on the other.
Students' oral production was elicited by means of an oral narrative ('The Picnic Story,' Heaton, 1966). To the authors' knowledge, this task was first used for research purposes by the Barcelona Age Factor Project (see Muñoz, 2006), and since then it has been used in a variety of studies (Collins & White, 2011; Llanes & Muñoz, 2009; Serrano, 2011; Tavakoli & Foster, 2008). The participants were shown six pictures representing two children going on a picnic with their dog (see Appendix A). The interviewer allowed the students to become familiar with the story before they were asked to narrate it.
To assess students' written production, descriptive essays were elicited. Each time, the students were asked to write a description of a person: in their first essay 'their best friend,' in their second essay 'someone they admired,' and in their third essay 'their best friend in the study abroad context.' For each individual essay, the students were given 15 minutes and were asked to write approximately 150 words.
Self-reported data, in the form of a written questionnaire, were used to obtain biodata, including information about participants' language learning history, as well as attitudinal data and information about different aspects of their stay abroad. The present study will focus on the questions that elicited information about students' attitudes toward English people and the English language as well as language contact while abroad (see Appendix B for some examples of key questions).

Procedure
The data collection took place in situ, that is, in the study abroad context. Most studies that analyze gains after an experience abroad tend to assess students' competence when they have returned to their home country. We believe that analyzing students' language production while they are still abroad provides a better reflection of the actual language gains that take place abroad than examining their skills once they have returned home. First of all, the students are still in contact with the L2 and should have less interference from the first language (L1) and more automatic production of the L2 than when they are in a setting in which the L1 is dominant. Also, depending on how long students have been back in their home country before their language production is examined, some of the gains that occurred while abroad might not be as apparent as in the SA context (especially those referring to procedural knowledge, to use DeKeyser's [2007] terminology).
Longitudinal data were collected at three time points. The pre-test (time 1) took place toward the beginning of the stay abroad (the last week of September). The data collection at time 2 occurred in December, before the students returned to their home country for the Christmas holidays. Finally, the data at time 3 were collected in the month of May. Even though the time lapse is longer from time 2 to time 3 than from time 1 to time 2, it should be borne in mind that the Easter break occurs between time 2 and time 3 and most students travel during that break, often to their home country.
The same procedure was followed for all three data collection points by the same researcher (one of the authors of this study). The researcher met with the students either individually or in pairs on university premises. They first completed the oral task, which was recorded in a quiet room with the presence of the researcher only. The students then performed the written task, and the questionnaire was completed at the end of the session. The students spent an average of 20-25 minutes to finish all the tasks.

Oral and written production
The same measures were adopted to analyze oral and written production, except for the case of fluency, for which syllables per minute (SPM) was adopted for oral fluency, while words per T-unit (W/T) was used for written fluency. The T-unit was adopted as the production unit except as otherwise noted.
The T-unit is defined as 'one main clause with all subordinate clauses attached to it' (Hunt, 1965, p. 20). Hunt developed the T-unit as an alternative to the sentence, the latter being subject to the learner's knowledge and command of the punctuation system of a specific language. The T-unit was considered appropriate for this study for the same reason.
Fluency in written production was examined in terms of words per T-unit (W/T), which is a frequently used ratio. The total number of words in a sample was divided by the total number of T-units. Several studies have claimed that W/T is a good measure of development in L2 writing (Larsen-Freeman, 2006; Larsen-Freeman & Strom, 1977; Wolfe-Quintero, Inagaki, & Kim, 1998). It must be indicated that W/T has sometimes been assumed to measure grammar complexity more than fluency (Norris & Ortega, 2009; Ortega, 2003). Nevertheless, as Cooper (1976) and Wolfe-Quintero et al. (1998) suggest, longer does not necessarily mean more complex. Some evidence for the fact that longer T-units do not need to include more complex clauses is found in Casanave (1994), who observed that many of her students produced longer and more accurate T-units after some hours of instruction, though they were less complex. Fluency in oral production was examined by means of SPM, since this measure is generally considered more appropriate for oral fluency than W/T (Griffiths, 1991). For our study, the syllable count did not include false starts, repetitions, self-corrections, unfinished sentences, or words in a language other than English.
To analyze syntactic complexity, the T-unit complexity ratio (clauses per T-unit [C/T]) was adopted in this study, and within the term 'clauses,' both finite and non-finite clauses were considered. The total number of clauses in a sample was divided by the total number of T-units. Wolfe-Quintero et al. (1998) claimed that the majority of the studies they reviewed 'do support the usefulness of the clauses per T-units measure' (p. 86).
Lexical richness was examined using Guiraud's Index of Lexical Richness: word types divided by the square root of the word tokens (Types/√Tokens). Some studies have shown that this measure is one of the most adequate for analyzing lexical richness in L2 learners' productions (van Hout & Vermeer, 2007; Vermeer, 2000). In her review of the most commonly used measures of lexical richness in spontaneous speech data, Vermeer (2000) concludes that Guiraud's Index is highly reliable, while the traditionally used Type/Token ratio lacks validity and reliability due to its dependence on text length.
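As an illustration of the definition above, the following minimal sketch computes Guiraud's Index alongside the plain Type/Token ratio. The tokenizer (lowercased alphabetic strings, apostrophes kept) and the function names are assumptions made for this example; they do not reproduce the CLAN procedure used in the study.

```python
import math
import re


def tokenize(text):
    """Naive tokenizer (an assumption for this sketch, not the CLAN procedure)."""
    return re.findall(r"[a-z']+", text.lower())


def guiraud_index(text):
    """Word types divided by the square root of the word tokens."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("sample contains no word tokens")
    return len(set(tokens)) / math.sqrt(len(tokens))


def type_token_ratio(text):
    """Traditional Type/Token ratio, shown only for comparison."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("sample contains no word tokens")
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: 11 tokens, 7 types.
sample = "the dog ate the food and the children saw the dog"
print(round(guiraud_index(sample), 2), round(type_token_ratio(sample), 2))
```

Dividing by the square root of the tokens rather than by the raw token count dampens the penalty that longer samples incur under the Type/Token ratio, which is the text-length dependence noted by Vermeer (2000).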
The measure errors per T-unit (Err/T) was adopted in this study to examine learners' accuracy. Err/T was obtained by dividing the total number of errors by the total number of T-units. The errors that were considered included lexical, morphological, and syntactic errors. Mechanical or pronunciation errors were not taken into account.
It should be emphasized that the accuracy scale works in the opposite direction from the other measures described above. While a higher number of W/T, SPM, C/T or a higher Guiraud's Index would indicate improvement over time, in the case of Err/T the opposite pattern occurs: fewer errors would indicate more accurate performance over time.
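The ratio measures described in this section can be sketched as follows. The counts are hypothetical and would in practice come from the manual coding of T-units, clauses, and errors; the class and function names are illustrative assumptions, not part of the study's tooling.

```python
from dataclasses import dataclass


@dataclass
class SampleCounts:
    """Hand-coded counts for one learner sample (hypothetical structure)."""
    words: int
    t_units: int
    clauses: int
    errors: int


def per_t_unit(count, t_units):
    """Generic ratio used for W/T, C/T and Err/T."""
    if t_units <= 0:
        raise ValueError("a sample needs at least one T-unit")
    return count / t_units


def syllables_per_minute(pruned_syllables, seconds):
    """SPM; the syllable count is assumed to already exclude false starts,
    repetitions, self-corrections, unfinished sentences and non-English words."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return pruned_syllables * 60 / seconds


# Hypothetical written sample.
essay = SampleCounts(words=150, t_units=12, clauses=18, errors=9)
print("W/T  ", per_t_unit(essay.words, essay.t_units))    # higher = more fluent
print("C/T  ", per_t_unit(essay.clauses, essay.t_units))  # higher = more complex
print("Err/T", per_t_unit(essay.errors, essay.t_units))   # lower = more accurate
print("SPM  ", syllables_per_minute(240, 120))            # hypothetical oral sample
```

Only Err/T runs in the opposite direction: a drop in its value between two data collection times counts as improvement.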
The Computerized Language Analysis (CLAN) program (MacWhinney, 2000) and the Statistical Package for the Social Sciences (SPSS, 2007) were used for the coding and analyses of the writing samples. Three different researchers (the three authors of this study) coded the data for the more objective measures (W/T, C/T, SPM). Inter-rater reliability was calculated for the division of the oral and written samples in T-units and clauses as well as for errors. In the first two cases, percentage agreement reached 100% (on 15% of the data, coded by all three researchers). For accuracy, which is usually more problematic, two researchers were in charge of the coding. Inter-rater reliability was calculated on 30% of the data, reaching 95% agreement. After all the samples were coded, analyses were performed using SPSS.
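Percentage agreement between two coders can be computed as in the sketch below. It assumes that each coder's decisions are stored as parallel lists, one entry per coded item; the data shown are invented for illustration and are not the study's codings.

```python
def percentage_agreement(codes_a, codes_b):
    """Share of items (in percent) on which two coders made the same decision."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must have coded the same items")
    if not codes_a:
        raise ValueError("no coded items supplied")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)


# Hypothetical error codings for 20 T-units (1 = contains an error, 0 = error-free).
coder_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
coder_2 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1]
print(percentage_agreement(coder_1, coder_2))
```

Percentage agreement does not correct for agreement expected by chance, so it is best read as a descriptive check on coding consistency.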

Questionnaire
Attitudinal data included six items related to attitudes toward English people and four items related to attitudes toward the English language. All of the items used semantic differential five-level scales. The bipolar adjectives regarding English people included sociable/unsociable, friendly/unfriendly, open-/narrow-minded, humble/snob, honest/false, reliable/unreliable. The adjectives regarding the English language included simple/complex, beautiful/ugly, well-/bad-sounding, easy/difficult to learn.
Regarding language contact, students were asked to state the type of accommodation they had chosen as well as a maximum of four people they had most contact with in their place of residence while abroad (either in their residence hall or apartment/house), a variable that is referred to as interaction in this article. Students were asked to indicate the language of communication and nationality of each person. Students who were living with British families were excluded from the analysis since they were too few (n = 2). Students were also asked if there was someone from Spain with whom they spent considerable time while in England. For this interaction variable, two values were calculated, one for the number of reported Spanish-speaking roommates and one for the number of reported English-speaking roommates. The data obtained through the questionnaire were also analyzed using SPSS.

Statistical analyses
To analyze the language progress from time 1 to time 2 and from time 2 to time 3, Wilcoxon Signed Rank tests were performed with the different measures of fluency, complexity, and accuracy as dependent variables, first for the oral production task and then for the written production task. Non-parametric tests were preferred because of the low number of participants (n = 14 in written production; n = 13 in oral production).
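For readers unfamiliar with the test, the sketch below implements a Wilcoxon Signed Rank test with an exact two-sided p value obtained by enumerating all sign assignments, which is feasible for samples of this size (2^13 or 2^14 cases). It is a plain-Python illustration under stated assumptions (zero differences are dropped, tied differences receive average ranks) and is not the SPSS routine used in the study; the scores are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank_exact(time_a, time_b):
    """Return (W, exact two-sided p) for paired scores from two time points."""
    if len(time_a) != len(time_b):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(time_a, time_b) if b != a]  # drop zero differences
    if not diffs:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis every rank is equally likely to carry either sign.
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w + 1e-9:
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Hypothetical scores for five learners at two data collection times.
time_1 = [1, 2, 3, 4, 5]
time_2 = [2, 4, 6, 8, 10]
print(wilcoxon_signed_rank_exact(time_1, time_2))
```

The example also shows why small samples limit what the test can detect: with five pairs that all change in the same direction, the smallest attainable two-sided p value is 2/32.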
In the analysis of the self-reported data, the Mann-Whitney U test, also a non-parametric test, was used. Because of the small sample, the exact sig. value (instead of the asymp. sig (2-tailed) value) was used to determine the level of significance of the results, as recommended by Field (2005). Independent variables with more than two levels in the original questionnaire were transformed into two levels because of the small size of the sample.
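The idea behind an exact significance value can be illustrated with the following sketch of a Mann-Whitney U test, in which the two-sided p value is obtained by enumerating every possible split of the pooled ranks into two groups of the observed sizes. This is an illustrative plain-Python implementation, not the SPSS procedure; the gain scores and group labels are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u_exact(group_a, group_b):
    """Return (U, exact two-sided p) for two independent groups."""
    n1, n2 = len(group_a), len(group_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    ranks = average_ranks(list(group_a) + list(group_b))

    def u_statistic(indices):
        rank_sum = sum(ranks[i] for i in indices)
        u1 = rank_sum - n1 * (n1 + 1) / 2
        return min(u1, n1 * n2 - u1)

    u_observed = u_statistic(range(n1))
    # Under the null hypothesis every assignment of n1 ranks to group A is equally likely.
    splits = 0
    extreme = 0
    for indices in combinations(range(n1 + n2), n1):
        splits += 1
        if u_statistic(indices) <= u_observed + 1e-9:
            extreme += 1
    return u_observed, extreme / splits


# Hypothetical gain scores for two dichotomized attitude groups.
rated_sociable = [1, 2, 3]
rated_unsociable = [4, 5, 6]
print(mann_whitney_u_exact(rated_sociable, rated_unsociable))
```

With two groups of seven, as in some of the comparisons reported below, the enumeration covers 3,432 splits, so the exact distribution is cheap to compute and avoids relying on the normal approximation behind the asymptotic value.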

Results
The results of the different statistical analyses will be presented first for the oral production data, followed by the written production data, and lastly the results of the self-reported data.

Oral production data
The descriptive statistics for the mean scores obtained by the participants in the oral production task appear in Table 1. This table also contains information about the standard deviations in parentheses and the median scores. The results of the Wilcoxon Signed Rank tests performed for each of the measures comparing times 1-2, times 2-3, and times 1-3 appear in Table 2. After the significance value, we also include Cohen's d for effect size.
In view of these results, it appears that one semester abroad was enough for significant progress to occur in certain areas of oral production, namely, fluency and lexical richness. The effect size of these differences is large in the case of fluency and medium-large in the case of lexical richness. In contrast, the progress that the students experienced between the end of the first semester and the end of the second semester was not significant except in accuracy. In this area, the effect size of the difference between time 2 and time 3 was large. Considering the whole stay, all the areas of oral production under examination experienced a significant improvement (with the effect size of the differences from time 1 to time 3 being large), with the exception of syntactic complexity.

Written production data
Table 3 presents the descriptive statistics, including the mean, median, and standard deviation, for the scores obtained by the students at each data collection time for all the measures of written production. The results of the Wilcoxon Signed Rank tests as well as the effect sizes appear in Table 4.
Unlike the results for oral production, there was no significant progress by the students in terms of written production during their first semester abroad. Some significant progress begins to occur from the end of the first semester to the end of the second semester in terms of accuracy and syntactic complexity, and the effect size of these differences is large. However, the most significant development in terms of written production occurs between times 1 and 3, that is, when initial and final performance are compared. All four areas under analysis (fluency, syntactic complexity, lexical richness, and accuracy) show significant growth and the effect size of the differences between the two time points is large (or medium-large in the case of lexical richness).

Comparing oral and written production
As can be seen from the descriptive and inferential statistics, the progress experienced by students abroad during the first and second semester in oral and written production differs. Some significant progress in oral production was already apparent in the first semester, but significant improvement in written production did not manifest itself until the second semester. The progress in each of the areas analyzed (fluency, syntactic complexity, lexical richness, and accuracy) is represented in Figures 3-6. In the case of fluency (Figure 3), the scores for written fluency (W/T) have been multiplied by 10 to have a similar scale to SPM, which makes the relationship more apparent in the visual representation. Also, as explained above, the accuracy measure (Err/T) is the only one in which lower scores indicate improvement (fewer errors = more accuracy). Figure 3 shows that even though students' development of fluency can be said to be linear in both oral and written production, the progress in oral fluency during the first semester is more significant than during the second semester. The opposite is true for written fluency, for which the second semester seems to be more significant. In the case of syntactic complexity (Figure 4), development is apparent in the case of oral production. However, for written production, syntactic complexity declined at the end of the first semester but improved by the end of the second semester. Figure 5 also shows that the lexical richness of oral production improved more than that of written production. Figure 6 indicates that significant development in accuracy did not occur until the second semester.

Self-reported data
To examine the relationship between attitudes and linguistic gains, students' answers to the questionnaire at time 3 and students' gains from time 1 to time 3 were examined as it was expected that attitudes would have more impact on language gains after a longer period. Out of the six scales related to attitudes toward English people, significant differences were found in two of the six bipolar adjectives in the questionnaire. The students who rated English people as more sociable than unsociable made more gains in accuracy in their written production (U = 8, n1 = 7, n2 = 7, Z = −2.11, p = .04). The same is true for students who rated English people as more humble than snobbish (U = 5, n1 = 9, n2 = 5, Z = −2.33, p = .02). The effect size in both tests was large (Cohen's d = 0.59 and 0.65, respectively).
Out of the four scales related to attitudes toward the English language, significant differences were found in one of the four bipolar adjectives in the questionnaire. The students who rated English as more complex than simple made more gains in the lexical measure both in their written (U = 9, n1 = 8, n2 = 6, Z = −1.94, p = .05) and in their oral production (U = 6, n1 = 6, n2 = 7, Z = −2.14, p = .03). In both tests, the effect size was medium-large (Cohen's d = 0.52 and 0.59, respectively).
In examining the relationship between language contact and learning gains, gains from time 1 to time 3 were used in the analyses of two variables that remained constant throughout the academic year: accommodation and contact with a close Spanish friend during the academic stay. Gains from time 1 to time 2, time 2 to time 3, and time 1 to time 3 were used in the analysis of the variable that was more liable to change between semester 1 and semester 2, that is, the linguistic profile of the people with whom students had more contact in their residence hall or apartment. As regards accommodation, it was found that there were significant differences between students who were living in an apartment/house and those living in a residence hall, with the former having more gains in lexical richness (oral production) between time 1 and time 3 (U = 3, n1 = 4, n2 = 7, Z = 2.08, p = .04; Cohen's d = 0.63). Results also indicated that students who generally did not have someone from Spain with whom they did almost everything experienced more gains in lexical richness (written production; U = 4, n1 = 5, n2 = 8, Z = −2.34, p = .02) and accuracy (oral production; U = 5, n1 = 5, n2 = 8, Z = −2.03, p = .05). The effect size in both tests was medium-large (Cohen's d = 0.62 and 0.72, respectively). Whether students were living with only English-speaking people or with one or more Spanish-speaking people turned out to be significant in the lexical richness measure (of written production) between time 2 and time 3 (U = 7, n1 = 6, n2 = 7, Z = −2.0, p = .05; Cohen's d = 0.5), with those living with only English-speaking people experiencing more gains. No significant differences were found in the oral production measures between times 1 and 2 or between times 1 and 3.

Discussion
In answer to the first research question, our results seem to suggest that L2 proficiency in oral and written production while abroad develops in somewhat different ways. The longitudinal design made it possible to observe that the 14 English learners examined in this study made significant progress in some areas of oral production (namely fluency and lexical richness) at the end of the first semester abroad, while no parallel improvement was registered in terms of written production. It is especially interesting that the areas that seem to improve the most after one semester abroad coincide with what most studies in the literature seem to suggest as being the areas for which spending time abroad could be especially beneficial, namely, oral fluency (Freed, 1995; Lennon, 1990; Llanes & Muñoz, 2009; Yager, 1998) and vocabulary (Ife et al., 2000; Milton & Meara, 1995). Similarly, the findings from this study, concerning development from time 1 to time 2 in terms of writing, are in line with results reported by other researchers in which the SA context is not found to be particularly helpful for the development of written production (Freed et al., 2003). Indeed, most of the studies from the literature that report advantages in written production for SA students seem to analyze long periods of time (Sasaki, 2004, 2007, 2009). To our knowledge, there is only one study (Pérez-Vidal & Juan-Garau, 2009) that shows significant improvement in some aspects of written production after a relatively short experience abroad (three months). However, the improvement was observed on only two of the five measures considered to analyze fluency, complexity, and accuracy.
It is from time 2 to time 3 that students' oral accuracy improved. It seems as if the students benefited first from the SA context in terms of fluency and lexical richness, and only later does this progress extend to accuracy. The period between time 2 and time 3 is also when accuracy in written production develops significantly. The implications from these findings are that for L2 accuracy to develop, longer stays might be necessary in some cases, which could also explain why some studies focusing on accuracy in short-term stays have found little or no improvement (DeKeyser, 2010). It might also be the case, as has been found in previous studies, that other areas (or sub-systems) need to develop before a development in accuracy can occur (Caspi, 2010).
Considering students' progress throughout the whole academic year (time 1-time 3), the results reported in this study are quite hopeful for the SA experience as significant improvement occurs in almost all of the areas of oral and written production under analysis. These results could imply that the reason a clear advantage has not been unanimously reported in the literature might be related (among other possible factors, of course) to the short-term stays that tend to be analyzed (usually one semester or less).
Another objective of this study was to analyze whether some attitudinal and interactional factors were associated with the progress the students experienced abroad. In answer to the second research question we can say that several factors appear to have a certain relation with language development. We have found that some attitudes toward the L2 ('English is a complex language') or the people who speak it ('English people are sociable and humble rather than snobbish') were associated with gains in accuracy and lexical richness. The reason why 'sociable' and 'humble' were key adjectives could be that those students who considered English people sociable and not snobbish might have interacted more with them, which contributed in turn to language gains. It is also interesting that those who found the English language more complex were the ones who made more gains in lexical richness. Probably, these learners paid more attention to complexity, were challenged by this feature of the language, and as a result their production was more complex in terms of vocabulary. Nevertheless, it should be emphasized that not all attitudes toward the English language or English people under analysis were associated with language gains. This could be because the choice of adjectives (which were selected through an Internet search of stereotypes of British people by foreigners) may not have been exhaustive enough or because some of the adjectives included may have referred to attitudes that had less effect on L2 learners' use of the language. More studies should analyze attitudes in a more detailed way to establish a clearer relationship between this variable and language gains abroad, as the present study has demonstrated that this is an area worth exploring in depth.
Moreover, our results suggest that living arrangements also played a role in the progress experienced by students. Similarly, those students who did not spend most of their time with a Spanish student improved their lexical richness more than those who did. In fact, these two situations likely lead to more possibilities for interaction in the L2, and such use/practice is probably responsible for the language improvement.
Although the current study was not designed within Dynamic Systems Theory (DST), the results that are reported here can be explained using some of the major tenets of this theory (de Bot, 2008; Larsen-Freeman & Cameron, 2008; Verspoor, de Bot, & Lowie, 2011). Indeed, from a DST perspective, language development is seen as the interaction between a wide variety of internal and external factors that can be grouped in different levels and sub-levels (Lowie, Verspoor, & de Bot, 2009). Among these levels, Lowie et al. highlight the social, psycholinguistic, cultural, and linguistic levels. In the present study, we consider all of these levels and how they interact: we have examined how the learning context (which encompasses socio-cultural factors) may be related to language development. In addition, we have analyzed different sub-systems within the linguistic level: fluency, syntactic complexity, lexical richness, and accuracy, both in oral and written mode. Our findings certainly demonstrate that there is an interaction between the different levels and sub-levels.
Furthermore, the results of the present investigation suggest that some sub-systems develop faster than others in the SA context: globally, it seems that progress occurs earlier in oral production than in written production. In the case of oral production, it seems that the development of accuracy is slower, perhaps requiring other areas to develop before it (namely fluency and lexical richness). This finding is in line with Caspi (2010), whose study suggests that the development of both lexical and syntactic complexity precedes the development of lexical and syntactic accuracy, which is explained by the 'nestedness' and hierarchical structure of dynamic systems (van Geert, 1995). As Caspi (2010) suggested, these two characteristics (nestedness and especially hierarchical structure), which are typical of language development according to DST, can explain why, for example, vocabulary acquisition is a prerequisite for the development of syntactic complexity.
For all the above-mentioned reasons, we consider that DST offers an appropriate framework to investigate language development abroad and further studies should be conducted to examine the SA context from a DST perspective, ideally with more data collection points than the present study and with more information about other variables and individual development of participants instead of focusing on group means.

Conclusion
The results that we have reported in this study suggest that the SA context potentially provides an advantageous experience for students to improve their L2 skills. Nevertheless, the word 'potentially' must be emphasized here since not all learners will necessarily find such a context beneficial, as studies with larger groups of participants and different measures of socio-cultural and individual variables may reveal. According to the findings from this study, length of stay is an influential variable in terms of the progress that is to be expected for oral and written skills. More time is necessary for measurable progress in written production to occur than for oral production. The findings of this study are certainly innovative in this respect since they contribute to the debate on whether the SA context is beneficial for written development or not. According to our findings, written development can occur while abroad; however, a substantial amount of time in the L2 country (in this study, two full semesters) is necessary before such development can take place. Our results also suggest that attitudes and types of interaction can influence linguistic improvement to a certain extent.
In this study, we have only analyzed three factors that can contribute to L2 development in the SA context, namely duration of the stay, attitudes, and living arrangements. We are also aware of many other factors that can determine whether the potential of the SA context materializes: initial proficiency level (which according to DeKeyser [2007; is crucial), aptitude (DeKeyser, 2010), and motivation (Dwyer, 2004; Isabelli-García, 2006), to name a few. Future studies should concentrate on different individual factors and relate them to the kind of progress that occurs abroad.
More longitudinal studies like the one reported here (and ideally inclusive of more L2 samples) are also necessary to gain better insight into L2 development in the SA context. As in the present study, it is important that longitudinal analyses include a variety of measures that tap different areas of language proficiency so as to better understand what L2 aspects are more likely to improve after a stay abroad experience, and in which order L2 gains should be expected to appear.