The Graded Effect of Valence on Word Recognition in Spanish


Abstract
The influence of emotional content on language processing remains unclear. Previous research conducted in English has obtained contradictory results regarding the effects of valence on word recognition. Whereas some studies indicate that valence predicts recognition latencies such that negative words are processed more slowly than positive words, other studies indicate facilitation of responses to emotional (both positive and negative) compared to neutral words. We examined the influence of valence and arousal on word recognition reaction time using large-scale word naming and lexical decision data-sets in Spanish. We found that linear mixed-effects model estimates revealed a valence but not an arousal effect on reading latencies. The influence of valence was better captured by a graded (RTs to positive words < neutral < negative) than by a categorical (positive < negative) valence effect. A categorical emotional vs. neutral effect was not reliably observed. In an advance on previous research, our analyses showed that the valence effect is substantially more prominent in lexical decision than in pronunciation. These results mirror some of those reported previously in English, adding evidence to support their validity, and demonstrating important parallels in word recognition processes in orthographically shallow as well as deep languages.
Keywords: valence; word recognition; Spanish; reading; lexical decision

The interplay between emotion and cognition is central to the study of human psychology (Russell, 2003). Emotion is usually characterized in a bidimensional space framed by the theoretically orthogonal dimensions of valence and arousal (Bradley & Lang, 1999; Osgood, Suci, & Tannenbaum, 1957; Russell, 2003), and is argued to modulate our focus of attention, directly influencing word processing (Lang, Bradley, & Cuthbert, 1997). Valence refers to the pleasurable nature of a stimulus, ranging from negative or unpleasant to neutral to positive or pleasant, whereas arousal refers to the degree of activation elicited by a stimulus, ranging from calming to exciting. Despite considerable effort expended in previous studies, based either on direct experimental manipulation of the emotional qualities of words (Kanske & Kotz, 2007; Kousta, Vinson, & Vigliocco, 2009) or on analyses of data gathered from large-scale studies (Algom, Chajut, & Lev, 2004; Estes & Adelman, 2008a; Kousta et al., 2009; Kuperman, Estes, Brysbaert, & Warriner, 2014; Vinson, Ponari, & Vigliocco, 2014), the effects of valence and arousal during word recognition remain unclear. We report findings from an analysis of observations on word recognition in Spanish, in the word naming and lexical decision tasks. Our analyses help to clarify the influence of emotion on word recognition, in a shallow orthography, and under varying task demands.
According to the Automatic Vigilance model of emotion (Pratto & John, 1991), undesirable aversive events are more likely to retain attention than neutral or pleasant ones. This feature of negative stimuli complicates the disengagement of attention, delaying a possible response (Fox, Russo, Bowles, & Dutton, 2001). The effect of such a bias could then be taken to explain evidence of increased reaction times for negative words in a variety of tasks including the Stroop test (Algom et al., 2004; Pratto & John, 1991) as well as lexical decision or word naming tasks (Algom et al., 2004; Estes & Adelman, 2008a; Kuperman et al., 2014; Yao et al., 2016).
The model of Motivated Attention and Affective States (Lang et al., 1997), on the other hand, proposes that motivationally relevant events, including both positive and negative stimuli, are more likely to attract attention compared to affectively neutral events, thus speeding responses to emotional words. This hypothesis is supported by emotional facilitation effects obtained in lexical decision experiments in which negative and positive stimuli elicited faster reaction times than neutral words (Citron, Weekes, & Ferstl, 2013; Kanske & Kotz, 2007; Kousta et al., 2009; Palazova, Mantwill, Sommer, & Schacht, 2011; Vinson et al., 2014).
A further inconsistency among the results of previous studies relates to the role of arousal during word recognition and its possible interaction with valence effects. Thus, whereas Kousta et al. (2009) and Vinson et al. (2014) reported effects of emotional valence with no significant influence of arousal, Estes and Adelman (2008a) and Kuperman et al. (2014) observed independent effects of both arousal and valence, with arousing words being recognized more slowly than calming words.
Finally, there is a debate concerning whether valence effects are graded or categorical. Kousta et al. (2009), who observed facilitatory effects of both negative and positive stimuli, and Kuperman et al. (2014), who observed increased reaction times for negative words, attributed their otherwise contradictory results to a graded measure of valence. In contrast, Vinson et al. (2014) observed significant facilitation for positive and negative words, compared to neutral words, as a categorical emotion effect. Furthermore, Estes and Adelman (2008b) showed that the interaction between arousal and valence observed by Larsen et al. (2008) appeared only when valence was entered in the analyses as a continuous factor but not when it was coded as a categorical, positive vs. negative, variable.
In the current article, we present a set of analyses in which we tested the influence of affective content on word naming and lexical decision reaction times obtained from previously gathered data in Spanish (Davies, Barbón, & Cuetos, 2013;González-Nosti, Barbón, Rodriguez-Ferreiro, & Cuetos, 2014). Given the inconsistencies observed among the results of previous research, we aimed to clarify the form of the valence effect on word recognition. Does emotional valence have an effect, and, if it does, what is the best measure for capturing the effect?
Spanish is a language in which the spelling-to-sound mappings are regular so that its orthography is characterized as shallow or transparent. Research in English has limited the observation of the influence of semantic content on reading performance to low frequency irregular words that are harder to encode phonologically (Plaut, McClelland, Seidenberg, & Patterson, 1996; Strain, Patterson, & Seidenberg, 1995; Woollams, Lambon-Ralph, Plaut, & Patterson, 2007; but see Balota, Cortese, Sergent-Marshall, Spieler, & Yap, 2004; Cortese & Khanna, 2007; Monaghan & Ellis, 2002). However, Davies et al. (2013) and González-Nosti et al. (2014) reported that a semantic component is apparent among the factors that influence oral reading in Spanish, suggesting that the influence of semantic content may be more widespread than previously reported (cf. Ricketts, Davies, Masterson, Stuart, & Duff, 2016). Davies et al. (2013) and González-Nosti et al. (2014) did not investigate the effects of valence or arousal. Finding such effects would therefore add evidence suggesting that reading processes are richly influenced by semantic information, not just imageability or, arguably, Age-of-Acquisition (Balota et al., 2004; Brysbaert & Ghyselinck, 2006; Cortese & Khanna, 2007; Davies et al., 2013), but also by valence or arousal. Extending observations on the shape of the emotion effects to a transparent orthography is thus a critical contribution of the present study.
Another important contribution lies in the fact that by comparing emotion effects on reading in different tasks we were able to examine the locus of the effects.
Psycholinguistic effects, most prominently effects associated with lexical or semantic knowledge, are typically reported to be larger in lexical decision than in reading aloud (e.g. in English, Balota et al., 2004; Cortese & Khanna, 2007; Davies, Arnell, Birchenough, Grimmond, & Houlson, 2017). Critically, a comparison between word naming and lexical decision in Spanish provides valuable information on the extent to which the effects of emotion variables were moderated by the effects of task demands and therefore (Cortese & Khanna, 2007) the extent to which the impact of emotion variables could be linked to reliance on semantic processing in word recognition (Chumbley & Balota, 1984) in transparent orthographic systems.
One potentially important source of the discrepancies between the results of previous studies is related to the differing extent to which possible psycholinguistic confounds were taken into account in different analyses. In a review of 32 studies, Larsen, Mercer and Balota (2006) showed that lexical variables like length, frequency or orthographic lexical density were confounded with valence differences and therefore the effects of these variables were potentially responsible for what had been interpreted as valence effects in the Stroop task. Although recent studies using lexical decision and word naming tasks have invested considerable effort in the control of potentially confounding lexical variables, different research groups have focused on different sets of variables. This could explain, at least in part, the inconsistency among results. For example, Kuperman et al. (2014) included in their models different measures of word length, lexical density and lexical frequency, as well as Age-of-Acquisition (AoA), Context Distinctiveness (CD) and initial phoneme information, whereas Vinson et al. (2014) included only one predictor variable each to capture effects of length, frequency and density, but introduced positional bigram frequencies and extended the control of lexical-semantic factors with the inclusion in models of, not only AoA, but also concreteness, imageability and familiarity measures. In our analyses, we included a large set of variables as control factors to facilitate comparability with previous research and to strengthen our results by isolating the impact of emotional factors, over and above the effects of better known psycholinguistic variables.
We investigated the composition of emotion effects in reading, examining the impact of valence and arousal on word recognition in Spanish. We investigated the shape of the valence effect, examining whether the valence effect should best be described as a graded (positive-neutral-negative) or a categorical positive-negative valence effect, or as a categorical emotional-neutral effect. Close examination of recent reports (e.g. Kuperman et al., 2014; Vinson et al., 2014) makes it apparent that a number of alternate routes can be taken, and have been taken, through the process of analysing word recognition data to uncover the effects of emotional variables. Gelman and Loken (2014) characterize such variation as resembling a "garden of forking paths", and Silberzahn and Uhlmann (2015; see also Simmons, Nelson, & Simonsohn, 2011) have demonstrated the consequences of variation in analytic approach in relation to differences in the size and direction of the effects that can be estimated. Our approach to analysing the psycholinguistic effects on word recognition therefore, firstly, assimilated critical alternative steps employed in previous studies. Secondly, we examined the impact on our findings of variation in analytic choices by comparing results across critical alternative permutations in analysis steps. We share our data and analysis code to enable readers to review our choices or to examine alternative approaches.

Data
We gathered reaction time data from previous large-scale studies that had examined word recognition in Spanish using the word naming (Davies et al., 2013) and lexical decision (González-Nosti et al., 2014) tasks. Davies et al. (2013) recorded word naming reaction times from 25 monolingual speakers of Spanish using randomized lists of 2,765 words. Their stimulus set included all nouns, verbs and adjectives between 3 and 10 letters long, excluding compounds, from the LEXESP (Sebastián-Gallés, Carreiras, Cuetos, & Martí, 2000) database, which is one of the most used psycholinguistic databases in Spanish. González-Nosti et al. (2014) obtained lexical decision reaction times for the same words from a group of 36 participants. These data were combined with valence and arousal values gathered through Emofinder (Guasch, Padrón, Haro, Ferré, & Fraga, 2017), a web-based search engine for Spanish word properties from different normative databases (Ferré, Guasch, Moldovan, & Sánchez-Casas, 2012; Guasch, Ferré, & Fraga, 2016; Hinojosa et al., 2016; Redondo, Fraga, Comesaña, & Perea, 2005; Redondo, Fraga, Padrón, & Comesaña, 2007; Stadthagen-Gonzalez, Imbault, Pérez Sánchez, & Brysbaert, 2017), resulting in lexical decision and word naming data for a set of 2,555 words. Affective norms were gathered using nine-point scales for valence and arousal dimensions by means of the self-assessment manikin standard method (Bradley & Lang, 1994), a non-verbal pictorial assessment technique that allows direct measurement of these dimensions using simple icons to depict various points along each of them. When data for a given word were available in several databases we used averaged values.
In addition, we compiled lexical characteristics known to influence word recognition including: word length measured as number of letters, phonemes and syllables; initial phoneme; written subtitle-based lexical frequency, as CD or as word form occurrence count; mean positional bigram frequency; orthographic and phonological neighbourhood size (N, Coltheart, Davelaar, Jonasson, & Besner, 1977) as well as averaged Orthographic Levenshtein Distance (OLD, see Yarkoni, Balota, & Yap, 2008) measures of lexical similarity neighbourhoods; and subjective ratings of familiarity, imageability, concreteness and AoA. All the values were gathered from the EsPAL database (Duchon, Perea, Sebastián-Gallés, Martí, & Carreiras, 2013) except for the AoA data which were obtained by Davies et al. (2013). Lexical frequency values represent per-million occurrences from a 462,611,693 token data-set constructed from movie subtitles in Spanish. CD refers to the percentage of movies in which a given word appeared in the corpus, out of a total of 98,339 distinct movies. Bigram frequency and neighbourhood values were taken from the same database. Familiarity and imageability ratings correspond to averaged scores obtained with seven-point scales from at least 30 participants, except for the AoA data which correspond to averaged scores from 25 informants. A summary of the normative values of the psycholinguistic variables is given in

Results
We begin by reporting steps taken to clean the data-set for analysis, removing outliers and missing values. We then report the correlations between psycholinguistic variables and the steps taken to reduce the problem of multicollinearity indicated by the correlations. We firstly report an analysis of the combined cross-task data-set. We then report the results of separate task-specific analyses of the lexical decision and word naming data, in a step analogous to simple effects analyses, designed to aid the interpretation of interactions between the effects of task and of the psycholinguistic variables.

Data treatment
We analysed trial-level data corresponding to the latencies of correct responses made by 61 participants to the 2,555 stimulus words for which we had complete critical psycholinguistic variable information. After excluding errors, outlier fast responses (responses associated with RT < 200ms) or responses to words for which data were missing on one or more psycholinguistic variables, we had a data-set of 60,690 word naming latencies and 79,616 lexical decision latencies for the task-specific analyses. A total of 140,306 observations was available for the primary cross-task analysis.
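The exclusion rules described above can be sketched as a simple trial filter. This is a minimal illustration only, not the authors' analysis code: the `Trial` record, its field names, and the example trials are hypothetical, and only the 200 ms cut-off and the reported observation counts come from the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RT_MS = 200.0  # responses with RT < 200 ms are excluded (threshold from the text)


@dataclass
class Trial:
    """One response; field names are hypothetical, chosen for illustration."""
    participant: str
    word: str
    task: str                 # "naming" or "decision"
    rt_ms: float
    correct: bool
    valence: Optional[float]  # None stands for a missing predictor value


def keep_trial(trial: Trial) -> bool:
    """True if the trial is correct, not too fast, and has no missing predictor."""
    return trial.correct and trial.rt_ms >= MIN_RT_MS and trial.valence is not None


# Hypothetical example trials (values invented for illustration).
trials = [
    Trial("p01", "casa", "naming", 512.0, True, 6.8),      # kept
    Trial("p01", "miedo", "naming", 150.0, True, 2.1),     # dropped: RT < 200 ms
    Trial("p02", "mesa", "decision", 640.0, False, 5.2),   # dropped: error
    Trial("p02", "lodo", "decision", 700.0, True, None),   # dropped: missing value
]
cleaned = [t for t in trials if keep_trial(t)]

# Consistency check on the counts reported in the text.
naming_n, decision_n = 60_690, 79_616
cross_task_n = naming_n + decision_n  # 140,306 observations in total
```

The reported task-specific counts sum to the cross-task total, which is what the final line reproduces.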

Preparation of predictor variables
Correlations between predictor variables must be examined because of the potential problem of multicollinearity. This problem arises in a linear model or, by extension, in a linear mixed-effects model, when the information associated with predictors overlaps, as indicated by high pairwise correlations (r > .8, according to a commonly used threshold) or condition numbers (kappa > 12, according to another common threshold; compare Baayen, Davidson, & Bates, 2008; Cohen, Cohen, West, & Aiken, 2003). It can be seen that there are correlations r >= .7 for a number of pairs of variables, including correlations between length (letters, phonemes, syllables), and orthographic similarity measures (orthographic neighbourhood size, phonological neighbourhood size, and orthographic Levenshtein distance), as well as between the different measures of frequency (word form frequency and CD), and semantic content (imageability and concreteness). These correlations or, rather, the multicollinearity they indicate, would if ignored pose the risk that analyses would not be capable of estimating the unique contributions to outcome variance of separate predictor variables, or would estimate effects that would not be stable between different samples (Cohen et al., 2003). Therefore we took the following steps to address the multicollinearity, prior to conducting our formal analyses: (1.) we combined the length measures, number of letters and number of phonemes, by averaging them together to create a new variable, "length"; (2.) we combined the orthographic and phonological neighbourhood size measures by averaging them together to create another new variable, "N-size"; (3.) we selected the CD measure of frequency for use in the analyses, given recent findings (Adelman, Brown, & Quesada, 2006; Brysbaert & New, 2009) indicating its superior performance compared to word form frequency in explaining variance in reading performance; (4.) between concreteness and imageability, we selected imageability as a measure of semantic content for inclusion as a model predictor, given its common use in previous analyses of large data samples (e.g. Cortese & Khanna, 2007); and (5.) we standardized all continuous numeric predictor variables. We include the aggregated length and N-size variables in the Table 2 correlations, for information.
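As a rough illustration of steps (1.) and (2.), the sketch below computes a pairwise Pearson correlation, compares it with the r > .8 threshold mentioned above, and averages two overlapping measures into one aggregate. The word-length values are invented for illustration and the function name is hypothetical; the authors' own code is in their Supplementary Materials.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical length measures for eight words (not taken from the data-set).
letters = [3, 4, 5, 6, 7, 8, 9, 10]
phonemes = [3, 4, 4, 6, 6, 8, 8, 10]

r_length = pearson_r(letters, phonemes)
too_collinear = r_length > 0.8  # commonly used threshold cited in the text

# Aggregate the two measures by averaging, as in step (1.).
length = [(l + p) / 2 for l, p in zip(letters, phonemes)]
```

Averaging keeps the shared information in a single predictor, so the model no longer has to apportion variance between two near-duplicate measures.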
We examined whether these choices influenced our results. We checked if the choice of frequency, orthographic similarity or semantic measure affected the estimates of emotion effects. We fitted models: (1.) using either the CD or the word form frequency measure, but not both, in separate analyses; (2.) using either the aggregated N-size or the OLD measures of neighbourhood similarity (but not both) in separate analyses; and (3.) using either the imageability or concreteness measures (but not both) in separate analyses. We found that alternation in the choice of frequency, orthographic similarity or semantic measures did not substantially influence the size or direction of the estimates of the valence or arousal effects. The interested reader is referred to the Supplementary Materials for detailed information.
Standardizing continuous numeric predictors removes non-essential collinearity due to scaling (Cohen et al., 2003) and it is critical for the estimation of interaction or curvilinear effects because lower-and higher-order terms are collinear if numeric predictor variables are not first rescaled to center on zero. Although Vinson et al. (2014) and Kuperman et al. (2014) chose to center their numeric predictors on mean values, we preferred to standardize predictors because transforming the variables to the same scale allowed straightforward comparison of effects.
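The point about non-essential collinearity can be checked numerically. In this sketch, with a hypothetical predictor taking the values 1 to 9, the raw variable is almost perfectly correlated with its own square, whereas the standardized variable is not. The data and helper names are illustrative assumptions.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(values):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # hypothetical predictor values
z = standardize(raw)

# Correlation between the linear and quadratic terms, before and after rescaling.
r_raw = pearson_r(raw, [v ** 2 for v in raw])
r_std = pearson_r(z, [v ** 2 for v in z])
```

For this symmetric example the correlation after standardizing is essentially zero; with skewed real data it would typically shrink rather than vanish.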

Construction of categorical valence variables
We followed previous authors (Estes & Adelman, 2008a; Vinson et al., 2014) in constructing categorical valence predictor variables: (1.) a variable coding for whether word valence was positive or negative, termed positive-negative valence; (2.) a variable coding for whether word valence was emotional or neutral, termed emotional-neutral valence. We constructed categorical positive-negative and emotional-neutral valence terms for the cross-task (naming and decision), the word naming, and the lexical decision data-sets. In our cross-task data-set, raw valence ratings varied from 1.2 to 8.7, with a mean (SD) of 5.3 (1.4); in the valence ratings studies (e.g. Stadthagen-Gonzalez et al., 2017), ratings had been elicited for a scale ranging from 1 (unhappy) to 9 (happy) via 5 (neutral, neither happy nor sad). To create the positive-negative factor, we divided the data by coding words with rated valence < 5 as negative, and words with rated valence >= 5 as positive. In the word naming data-set, the same categorization scheme resulted in the classification of 19,326 observations as concerning responses to negative words, 41,364 concerning positive words, 17,478 concerning emotional words, and 43,212 concerning neutral words. In the lexical decision data-set, the same scheme resulted in the classification of 24,505 observations as concerning negative words, 55,111 concerning positive words, 23,557 concerning emotional words, and 56,059 concerning neutral words. It can be noted that we used linear mixed-effects models to analyse latencies, and that such models are robust to imbalances in numbers of observations.
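The positive-negative coding can be sketched as a one-line rule, assuming that ratings of 5 or above count as positive and ratings below 5 as negative. The function name is hypothetical. The cut-offs for the emotional-neutral factor are not given in this section, so that factor is not coded here; the sketch only checks that the reported category counts are internally consistent with the data-set sizes.

```python
NEUTRAL_MIDPOINT = 5.0  # scale midpoint on the 1 (unhappy) to 9 (happy) scale


def positive_negative(valence: float) -> str:
    """Code a rated valence as 'negative' (< 5) or 'positive' (>= 5)."""
    return "negative" if valence < NEUTRAL_MIDPOINT else "positive"


# Observation counts as reported in the text.
reported = {
    "naming": {"negative": 19_326, "positive": 41_364,
               "emotional": 17_478, "neutral": 43_212, "total": 60_690},
    "decision": {"negative": 24_505, "positive": 55_111,
                 "emotional": 23_557, "neutral": 56_059, "total": 79_616},
}

# Each two-level factor should partition all observations in its data-set.
consistent = all(
    c["negative"] + c["positive"] == c["total"]
    and c["emotional"] + c["neutral"] == c["total"]
    for c in reported.values()
)
```

Both factors partition each data-set exactly, so `consistent` evaluates to `True` for the reported figures.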

Cross-task analysis
We examined the latencies of correct responses to words in both the lexical decision and naming tasks, in a cross-task analysis, fitting linear mixed-effects models to estimate effects using the lme4 package version 1.1-14 (Bates et al., 2017) in R version 3.4.2 (R development core team, 2017). All predictors were entered simultaneously.
We report the results of analyses of the effects of psycholinguistic variables on reading response RT but it is common practice to transform the outcome variable to log10(RT) to ameliorate skew in the distribution of latencies. We checked if the choice of outcome variable transformation made any difference to our results. We therefore repeated the final models (described later), for each valence measure, for the cross-task and for the task-specific lexical decision and word naming data-sets. To anticipate, we found that the significance and, more critically, the direction and the relative size of psycholinguistic effects were replicated in alternate log10(RT) or -1/RT compared to the RT models (see Supplementary Materials).
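For reference, the two alternative outcome transformations mentioned above can be written as follows. This is a minimal sketch with hypothetical function names and invented latencies. Both transformations are strictly increasing in RT, so they compress the long right tail of the latency distribution without changing the ordering of responses.

```python
import math


def log10_rt(rt_ms: float) -> float:
    """Base-10 logarithm of a latency in milliseconds."""
    return math.log10(rt_ms)


def neg_inverse_rt(rt_ms: float) -> float:
    """Negative reciprocal of a latency; the sign keeps slower responses larger."""
    return -1.0 / rt_ms


# Hypothetical latencies in ms, sorted from fastest to slowest.
latencies = [450.0, 600.0, 1000.0, 2400.0]

log_values = [log10_rt(rt) for rt in latencies]
inv_values = [neg_inverse_rt(rt) for rt in latencies]

# Both transformed series preserve the original ordering.
order_preserved = (log_values == sorted(log_values)
                   and inv_values == sorted(inv_values))
```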
Following Vinson et al. (2014) and Kuperman et al. (2014), we began our analyses by specifying a baseline model. Because our primary focus was on the cross-task data, the baseline model had to incorporate effects due to task, psycholinguistic variables, and interactions between the effects of task and of the psycholinguistic variables. We report, firstly, our observations from the process of specifying an adequate baseline model. We report, then, the results from subsequent analysis steps conducted to evaluate the contribution of valence and arousal to our account of the variance of response latencies in reading in Spanish. In these steps, as we explain below, we examined whether the addition of the affective variables was warranted by improved model fit to data. We evaluated model fit using information criteria (e.g. Burnham & Anderson, 2004) and Likelihood Ratio Test (LRT, see, e.g., Baayen, 2008; Baayen et al., 2008; Pinheiro & Bates, 2000) comparisons.
The research questions investigated in our analysis were: We structure our results reporting correspondingly. We addressed each question in turn, examining whether the addition to our model of a term corresponding to the effect of interest, for example, of valence, improved model fit to data. We did this separately for each valence measure. We compared model fits for models with vs. without the effect of interest using the LRT. In addition, we report the results of t-tests of the coefficient estimates for each effect of interest, employing Satterthwaite approximations to denominator degrees of freedom (p-values were derived with the lmerTest package, Kuznetsova, Brockhoff, & Bojesen Christensen, 2016). At present, different methods are commonly used to examine the utility of hypothesised effects or, equivalently, the relative utility of alternative models (with or without the effects). It was reassuring to find, as we shall report, that, concerning the utility of hypothesized effects, the indications from the model fit comparisons and the hypothesis tests coincided.
We began by comparing models that varied in fixed effects, corresponding to psycholinguistic effects, but were consistent in the inclusion of random effects due to unexplained differences between sampled participants or between items in intercepts (random intercepts). We fitted terms corresponding to all effects of interest, ultimately.
Model comparisons are reported as tests of the utility (for model fit to data) of the inclusion of terms corresponding to these effects, not as the basis for including the terms. We conclude this section by presenting a summary of the full model including all effects of interest.
We then examined the utility for model fit of adding random effects due to differences between participants in the slopes of the (within-subjects) psycholinguistic effects or between items in the slope of the (within-items) task effect. Matuschek, Kliegl, Vasishth, Baayen, and Bates (2017; see also Barr, Levy, Scheepers, & Tily, 2013) argue that an adequate balance between the relative sensitivity and conservatism of an analysis can be found by examining whether the inclusion of random effects terms improves model fit to data. We did this by fitting a model with all fixed effects of interest and all random effects permitted by the study design, then removing random effects terms until we arrived at a model with a parsimonious random effects structure (as complex as appeared defensible, given the data). We present the cross-task and task-specific models, finally, with this random effects structure. We note that the results of comparisons between models varying in fixed effects did not differ if we specified only random intercepts (as presented) or instead the more complex, but parsimonious, random effects structure identified in our later checks (code and results for both sets of models are presented in the Supplementary Materials).

Baseline task x psycholinguistic effects models
We began by fitting a baseline model. We examined four candidate baseline models. In all models, we included terms corresponding to the following key variables: a word initial phoneme coding variable; a variable coding for reading task (word naming vs. lexical decision); the CD frequency measure; AoA; familiarity; imageability; the aggregate word length measure (the average of length in letters and in phonemes); word length in syllables; and the aggregate neighbourhood size measure (the average of orthographic and phonological neighbourhood size). The models incorporated fixed effects terms corresponding to the effects of the psycholinguistic variables and, at this stage, random effects terms corresponding to variance due to unexplained differences between sampled participants or words in intercepts (random intercepts).
In model 1, the fixed effects included the effects of task, phoneme, and the critical psycholinguistic variables except valence or arousal. No interactions were included. All numeric predictor variables were specified as terms corresponding to linear effects.
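One way to write out a random-intercepts model of this kind is given below. The notation is an illustrative assumption rather than the authors' own: $i$ indexes participants, $j$ indexes words, $\mathrm{task}_i$ codes the task performed by participant $i$, $\mathbf{p}_j$ is the coding of the initial phoneme of word $j$, and $x_{kj}$ are the $K$ standardized psycholinguistic predictors, each entered as a linear term.

```latex
\begin{align*}
\mathrm{RT}_{ij} &= \beta_0
  + \beta_{T}\,\mathrm{task}_{i}
  + \boldsymbol{\beta}_{P}^{\top}\mathbf{p}_{j}
  + \sum_{k=1}^{K} \beta_{k}\, x_{kj}
  + u_{i} + w_{j} + \varepsilon_{ij}, \\
u_{i} &\sim \mathcal{N}(0, \sigma_{u}^{2}), \qquad
w_{j} \sim \mathcal{N}(0, \sigma_{w}^{2}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{align*}
```

Here $u_i$ and $w_j$ are the random intercepts for participants and words respectively, capturing unexplained differences in average latency.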
Previous observations have indicated that the effects of some psycholinguistic variables on response latencies, like the effects of word frequency or length, may be curvilinear in English (e.g. Baayen, Feldman, & Schreuder, 2006; New, Ferrand, Pallier, & Brysbaert, 2006) and in Spanish (Davies et al., 2013). Studies of the impact of emotion on word recognition have included reports suggesting non-linear effects of valence (Estes & Adelman, 2008b; Kousta et al., 2009; but see Kuperman et al., 2014). We therefore examined, in model 2, if curvilinearity should be allowed for the effects of any of the psycholinguistic variables (excluding task, initial phoneme, and length in syllables). A comparison of model 1 and model 2, where, in the latter, all numeric predictors were fitted to latencies using restricted cubic splines (with up to k = 3 knots), indicated that the model allowing for curvilinearity fit the data better (LRT comparison, χ²(7) = 357.7, p < .001). (See the Supplementary Materials for summaries of all models.) We examined curvilinearity in the effects of psycholinguistic variables using restricted cubic splines (e.g. Baayen, 2008; Davies et al., 2013) but checked if the influence of emotion effects was the same in analyses using polynomial (up to quadratic) terms. This is because, while Vinson et al. (2014) preferred to fit polynomial terms to estimate potentially non-monotonic emotion effects, Kuperman et al. (2014) preferred to fit restricted cubic splines (in Generalized Additive Models, GAMs; see also Kousta et al., 2009). The results of the check analyses indicated that the size and direction of critical effects estimates were not substantially different if polynomials or splines were used to capture curvilinearity in effects (see Supplementary Materials).
The effects of CD, AoA, familiarity, length and N-size were associated with significant curvilinear components (model 2, t-tests on corresponding coefficients, p < .05). Task, imageability and length in syllables were associated with significant linear effects only (model 1, all t-tests on corresponding coefficients, p < .01). Bigram frequency was associated with a marginal linear effect (t-test, p = .074) and a non-significant curvilinear effect (t-test, p > .10). To fit the most parsimonious defensible baseline model, given our data, we specified the CD, AoA, familiarity, length and N-size effects as curvilinear, and specified all other effects as linear, in all further models. The simplified model (model 3) fit the data as well as (or not detectably different than) model 2 (χ²(2) = .6, p = .758).
In the final baseline interactions model, we specified the described linear or curvilinear psycholinguistic effects, plus the effect of task, and the effects of all possible two-way interactions between the effect of task and the effect of each of the psycholinguistic variables. An LRT comparison indicated that including interactions between task and psycholinguistic effects improved model fit (χ²(36) = 1953.2, p < .001). Task differences significantly modulated the effects of CD, AoA, familiarity and length (t-tests of task by psycholinguistic interaction effects, all ps < .05). A summary of the model is given in Table 3. Response latencies decreased with increasing CD frequency, familiarity and neighbourhood size though the impact of each effect diminished for higher variable values. Latencies increased with unit increase in AoA, word length measured in syllables or with increasing bigram frequency. The effect of the aggregate length variable was curvilinear such that latencies decreased slightly for increasing length, at first, and then increased with increasing length for longer words. The effect of AoA was curvilinear such that the impact of AoA was greater for later-acquired words.
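To make the spline terminology concrete: with k = 3 knots, a restricted cubic spline adds a single non-linear basis term to the linear term for a predictor, and the fitted curve is constrained to be linear beyond the outer knots. The sketch below uses the standard textbook truncated-power form of that term, without any rescaling. It is an illustration with hypothetical knot positions, not a reproduction of the R routine used in the analyses, which may scale the term differently.

```python
def cube_plus(u: float) -> float:
    """Truncated cubic: u**3 when u is positive, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def rcs_nonlinear_term(x: float, knots) -> float:
    """Non-linear basis term of a restricted cubic spline with three knots.

    A model would include x itself (linear term) plus this term, so a
    three-knot spline costs two coefficients per predictor.
    """
    t1, t2, t3 = knots
    return (cube_plus(x - t1)
            - cube_plus(x - t2) * (t3 - t1) / (t3 - t2)
            + cube_plus(x - t3) * (t2 - t1) / (t3 - t2))


# Hypothetical knots for a standardized predictor.
KNOTS = (-1.0, 0.0, 1.0)

below = rcs_nonlinear_term(-2.0, KNOTS)                           # zero below first knot
above = [rcs_nonlinear_term(x, KNOTS) for x in (2.0, 3.0, 4.0)]   # linear above last knot
steps = [above[1] - above[0], above[2] - above[1]]                # equal increments
```

Equal increments above the last knot confirm the restriction: the cubic and quadratic components cancel there, leaving a straight line.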
Each psycholinguistic effect was more pronounced in lexical decision than in naming.
(Table 3, about here)
In the following sequence of analyses, to address each of the research questions, we successively added terms corresponding to the effects of interest over a series of models. The addition of terms was cumulative so that later models included all terms specified in earlier models. At each step, we first added the term as a "main effect", that is, ignoring any potential interaction with task. We then added the term as both the lower-order component and the task by effect interaction. Stepping up the increments in model complexity in this way allowed us to evaluate whether the addition of the effect was warranted with or without allowing for the modulation of the effect by task differences. In the following, we report the results of LRT comparisons of the model with versus without the additional term, added as a "main effect", then of the model with the additional "main effect" versus the model with the additional term added as a "main effect" and as a task by effect interaction.

Test of the effect of valence
Our first research question was: Does valence have an effect, and, if it does, what is the best measure for capturing the effect? To answer the first part of that question, we added the main effect of valence to the baseline interactions model, in separate models, one model for each valence measure: graded valence, categorical positive-negative valence, or categorical emotional-neutral valence. By comparing the fit of the baseline interactions model to the fit of the model including a valence measure, we evaluated whether the addition of valence was useful in accounting for observed variance in Spanish reading. Likelihood ratio test comparisons showed that the addition of valence was justified by significantly improved model fit with the addition of graded valence (χ 2 (1) = 25.3, p < .001) or positive-negative valence (χ 2 (1) = 27.7, p < .001) but not of emotional-neutral valence (χ 2 (1) = .8, p = .363). (Allowing the effect of graded valence to be curvilinear did not improve model fit, χ 2 (2) = 3.5, p = .178.) In the second step, we examined whether the valence effect was moderated by the effect of task differences. We compared the fit of a model including the baseline interactions terms plus valence to the fit of a model with the same baseline interactions terms plus valence and the task by valence interaction. LRT comparisons showed that the addition of a task by valence interaction was justified by significantly improved model fit for the models including graded valence (χ 2 (1) = 16.3, p < .001) or positive-negative valence (χ 2 (1) = 7.2, p = .007) but not emotional-neutral valence (χ 2 (1) = .3, p = .599). We term these models the baseline-plus-valence models.
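As an illustration of how the reported likelihood ratio tests can be read, the following minimal Python sketch converts a χ 2 statistic and its degrees of freedom into an upper-tail p-value. It is not the analysis code used in the study: the function names (chi2_sf, lrt_statistic) are hypothetical, and the inputs are the rounded statistics reported in the text, so the computed p-values can differ slightly from the reported ones.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Closed forms are used for df = 1 and df = 2; higher df are reached with the
    recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be a positive integer")
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # df = 1
    else:
        q, k = math.exp(-half), 2              # df = 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lrt_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood ratio statistic for two nested models: 2 * (logLik_full - logLik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


# Rounded statistics taken from the text: (comparison, chi-square, df).
reported = [
    ("graded valence, main effect", 25.3, 1),
    ("emotional-neutral valence, main effect", 0.8, 1),
    ("graded valence, curvilinear terms", 3.5, 2),
    ("task by graded valence interaction", 16.3, 1),
]

for label, chi2, df in reported:
    print(f"{label}: chi2({df}) = {chi2}, p = {chi2_sf(chi2, df):.3f}")
```

The degrees of freedom equal the number of parameters added in the more complex model, which is why adding a single term gives a test on 1 df.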
In evaluating competing models using information criteria, we are concerned with the relative, not the absolute, AIC values. Criteria with lower values (smaller values if positive, closer to negative infinity if negative) indicate that models have higher likelihood (log likelihood, scaled by multiplication by -2), that they incorporate effects estimates that allow better prediction of observed latencies, minimising Kullback-Leibler information loss (Burnham & Anderson, 2004;McElreath, 2016). The graded rated valence or categorical positive-negative valence models better approximated Spanish word recognition performance data than did the baseline or categorical emotional-neutral models. A summary of the baseline-plus-valence models is shown in Table 3. The influence of rated valence on word recognition RTs, and its greater prominence in lexical decision, is clearly illustrated in Figure 2.

Test of the effect of arousal
Our second research question was: Does arousal have an effect, either as a main effect or in a task by arousal interaction? We answered this question by comparing the fit of the baseline-plus-valence model with the fit of models including the same terms as the baseline-plus-valence model plus, successively, the main effect of arousal, and the effects of arousal and the task by arousal interaction.
For models representing valence as a graded measure, likelihood ratio tests indicated that, compared to the baseline-plus-valence model, the addition of arousal did not significantly improve model fit if added as a main effect (χ 2 (1) = 2.0, p = .158).
Compared to the baseline-plus-valence and arousal model, the addition of arousal as main and task by arousal interaction effects did not improve fit (χ 2 (1) = 1.3, p = .258).
For models incorporating valence as a positive-negative factor, the same pattern of results was found. LRTs indicated that, compared to the baseline-plus-valence model, the addition of arousal did not significantly improve model fit if added as a main effect (χ 2 (1) = 2.2, p = .140). Compared to the baseline-plus-valence and arousal model, the addition of arousal as main and interaction effects did not improve fit (χ 2 (1) = .2, p = .622). For models incorporating valence as an emotional-neutral factor, likewise, the addition of arousal did not significantly improve model fit if added as a main effect (χ 2 (1) = .3, p = .603) or as main and interaction effects (χ 2 (1) = .7, p = .403).
Adding arousal, as a main effect, or as main and task by arousal interaction effects, did not improve the fit to data, compared to models incorporating baseline and valence terms. The limited impact of arousal in either task is clearly illustrated in Figure 3. We termed the models including the arousal and task by arousal interaction effects the baseline-plus-affect model.

Test of the interaction between the effects of valence and arousal
Our third research question was: Do the effects of valence and arousal interact? To answer this question, we compared the fit of the baseline-plus-affect model with the fit of models including the same terms plus, successively, the valence by arousal interaction effect, and the valence by arousal as well as the task by valence by arousal interaction effects.
For models representing valence as a graded measure, LRTs indicated that, compared to the baseline-plus-affect model, the addition of the valence by arousal interaction did not significantly improve model fit (χ 2 (1) = .2, p = .695). However, compared to a baseline-plus-affect model that also included a valence by arousal interaction, further adding the task by valence by arousal interaction effect did improve model fit (χ 2 (1) = 5.4, p = .020).
For models representing valence as a positive-negative factor, a different pattern of results was found. LRTs indicated that, compared to the baseline-plus-affect model, the addition of the valence by arousal interaction did not significantly improve model fit (χ 2 (1) = 2.4, p = .118). Nor, if added as valence by arousal and task by valence by arousal interaction effects, did that addition improve model fit to data (χ 2 (1) = .1, p = .796).
For models representing valence as an emotional-neutral factor, likewise, LRTs indicated that, compared to the baseline-plus-affect model, the addition of the valence by arousal interaction did not significantly improve model fit (χ 2 (1) = .0, p = .859). Nor, if added as valence by arousal and task by valence by arousal interaction effects, did that addition improve model fit (χ 2 (1) = .4, p = .550).
In sum, a potential interaction between the effects of valence and arousal was apparent but it was expressed in different ways depending on the measure of valence incorporated in the model. For models in which valence was represented using a positive-negative or an emotional-neutral categorical factor, the inclusion of the interaction between valence and arousal did not improve model fit. For models in which valence was represented as a graded valence measure, the impact of the valence by arousal interaction appeared to be constrained by task differences. We termed the models including the valence, arousal, valence by arousal and task interaction effects the baseline-plus-affect-interaction model.

Evaluating the modulation of valence and arousal effects by task differences
Our fourth research question was: Do the effects of valence and arousal interact with the effect of task? We addressed this question by estimating potential interactions between the effect of task and the effects associated with critical psycholinguistic variables. Our observations indicated, as seen, that psycholinguistic effects are modulated by task differences, with variation in the size and shape of the effects of frequency, AoA, familiarity and valence in lexical decision compared to word naming. These differences were explored in the task-specific analyses reported in a following section.

Evaluating the modulation of valence and arousal effects by frequency
Our final research question was: Are the effects of valence or arousal modulated by the influence of word frequency in interactions between the frequency and emotion effects?
To answer this question, we compared the fit of the baseline-plus-affect-interaction model with the fit of models including the same terms plus, successively, both CD frequency by valence and CD frequency by arousal interaction effects, and models including these interactions as well as terms corresponding to the modulation of the interactions by task differences.
We found that the addition of interactions between the effects of frequency and the effects of valence or arousal did not improve model fit to data, irrespective of the valence measure, whether comparing the fit of baseline-plus-affect-interaction models to models with the same terms plus just the frequency by valence and frequency by arousal interaction effects (graded valence, χ 2 (4) = 3.6, p = .470; positive-negative valence, χ 2 (4) = 2.4, p = .662; emotional-neutral valence, χ 2 (4) = 3.7, p = .445) or comparing the fit of the latter models to models with the same terms plus the task by CD frequency by valence or task by CD frequency by arousal interactions (for graded valence, χ 2 (4) = 5.1, p = .278; for positive-negative valence, χ 2 (4) = 2.8, p = .597). For models including emotional-neutral valence, the addition of terms corresponding to interactions between task, frequency and valence or arousal together did improve model fit to data (emotional-neutral valence, χ 2 (4) = 14.5, p = .006).
We termed the models including the frequency by valence, frequency by arousal, and corresponding task interaction effects, the baseline-plus-affect-frequency-interaction model. Our conclusion is that frequency did not significantly modulate the effects of valence or arousal, except in models including valence coded as an emotional-neutral factor, for which the main effect of categorical valence was not, itself, reliably detected.

Comparison of model fit across different measures of valence
A comparison of information criteria statistics indicated that models representing the valence effect with a graded valence or a categorical positive-negative measure fit the data better than either a baseline model not including a valence measure, or a model including the categorical emotional-neutral measure (see Table 3). The ranking of the relative utility of models incorporating different valence measures remained the same after models had been expanded to include effects associated with arousal. We evaluated, for each valence measure, the models including the baseline effects plus the effects of valence, arousal, and the valence by arousal interaction, as well as the interactions between these effects and the effects of frequency and task. Information criteria values indicated better fit to data for the model representing valence as a graded measure (AIC = 1666762) compared to the model representing valence as a positive-negative measure (AIC = 1666776), while both graded and categorical positive-negative valence models were better fits than a model including the categorical emotional-neutral variable as the valence measure (AIC = 1666800).
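Because only relative AIC values matter, one way to read the three values above is through AIC differences and Akaike weights (Burnham & Anderson, 2004). The sketch below is illustrative only: Akaike weights are not reported in this article, and the function name akaike_weights is a hypothetical helper. The AIC values are those given in the text.

```python
import math


def akaike_weights(aic_values):
    """Return (AIC differences from the best model, Akaike weights).

    delta_i = AIC_i - min(AIC); weight_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2).
    """
    best = min(aic_values.values())
    deltas = {name: aic - best for name, aic in aic_values.items()}
    relative = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(relative.values())
    weights = {name: r / total for name, r in relative.items()}
    return deltas, weights


# AIC values reported in the text for the fully expanded models.
aic = {
    "graded": 1666762,
    "positive-negative": 1666776,
    "emotional-neutral": 1666800,
}

deltas, weights = akaike_weights(aic)
for name in aic:
    print(f"{name}: delta AIC = {deltas[name]}, Akaike weight = {weights[name]:.4g}")
```

On these values the graded model is preferred by 14 AIC units over the positive-negative model and by 38 units over the emotional-neutral model.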

Comparison of model fit when only valence and arousal are entered as predictors
Readers may ask if the observed utility of the valence or arousal effects would appear to be different if only valence or arousal were entered as predictors or if the order in which valence or arousal were entered was varied. (We thank an anonymous reviewer for this suggestion.) We should note that the psycholinguistic variables were entered simultaneously in each of the reported models. However, entering valence and arousal as the only fixed effects (alongside random effects due to between-subjects or between-items differences in intercepts) allowed us to estimate a further measure of relative fit, to bring converging evidence to bear on the question of how valence or arousal influenced word recognition latencies in Spanish.
We fitted models of the cross-task data-set response latencies, separately for each valence measure, in which we specified as fixed effects: valence alone; arousal alone; valence and arousal as additive main effects; valence, arousal, and the interaction between valence and arousal. For each model, we calculated the marginal R2_m, the variance explained by the fixed effect(s) as a proportion of the sum of all the variance components, including the fixed effects as well as the random effects and the residuals (with R2_m calculated using the MuMIn package, version 1.15.6, Barton, 2016; Johnson, 2014; Nakagawa & Schielzeth, 2013). We found that .3% of variance was explained by the graded valence effect, compared to .2% explained by the categorical positive-negative valence effect, and .1% by the emotional-neutral effect. The valence effect is small but, consistent with the results reported in the foregoing, we found that it was best captured by the graded valence measure. We estimated that .02% of variance was explained by the effect of arousal, entered as a fixed effect on its own.
We calculated that a valence by arousal interaction explained, at best, .05% of variance.
Arousal, or the valence by arousal interaction, thus added little to our account.
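For reference, the marginal R2_m described above can be written out for a linear mixed-effects model with random intercepts for subjects and items. The notation is an assumption adopted here for illustration, following the general form given by Nakagawa and Schielzeth (2013): σ²_f is the variance of the fixed-effects predictions, σ²_subjects and σ²_items are the random intercept variances, and σ²_ε is the residual variance.

```latex
\[
R^2_m \;=\;
\frac{\sigma^2_f}
     {\sigma^2_f + \sigma^2_{\mathrm{subjects}} + \sigma^2_{\mathrm{items}} + \sigma^2_{\varepsilon}}
\]
```

On this definition, a value of .3% for graded valence means that σ²_f is about .003 of the total of the four variance components in the denominator.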

Random effects
The models reported to this point have incorporated fixed effects due to the psycholinguistic variables, and random effects due to the differences between participants or between stimulus words in intercepts. We did not, up to this point, include variance terms corresponding to random differences between participants in the slopes of the within-subjects psycholinguistic effects, or between words in the slope of the within-items task effect (random slopes). This was a potentially important omission.
Not including random slopes has been argued to increase the Type I error rate (Barr, Levy, Scheepers, & Tily, 2013). However, Matuschek et al. (2017) have demonstrated that some caution is required because a loss of sensitivity can be associated with including random effects not warranted by the data.
We fit a model with the same fixed effects as the final baseline-plus-affect-frequency-interaction models, with both random intercepts and random slopes. We excluded terms corresponding to covariances between random intercepts and random slopes, to random differences between subjects in the coefficients of the curvilinear components of the psycholinguistic effects, and to random differences between subjects in the coefficients of the word initial effect, because models including those terms did not converge. We fit a model (1) including the critical fixed effects plus random effects corresponding to random differences between subjects or items in intercepts, random differences between subjects in the slopes of the linear (main and interaction) psycholinguistic effects, and random differences between words in the slope of the task effect. This model fit the data approximately as well as a model (2) excluding terms corresponding to random differences in the slopes of interactions (χ 2 (3) = .7, p = .863); the latter fit the data better than a model (3) excluding a term corresponding to random differences between items in the task effect (χ 2 (1) = 637.1, p < .001), while the last fit the data better than a model with just random intercepts (χ 2 (10) = 1047.5, p < .001).
The model comparisons indicate that model (1) represents the best account of the Spanish reading data, including fixed effects terms that test theoretically critical questions, as well as a random effects structure that is as complex as necessary to fit the data, capturing random differences between subjects or items in intercepts and slopes.
We present a summary of the final model in Table 4. We show effects estimated with a model including the graded valence measure only because that measure was found to be most useful in capturing the influence of affect.

Task-specific analyses
The results of the cross-task analysis show that the effects of critical psycholinguistic variables are moderated by the influence of differences between reading tasks. The psycholinguistic effects were consistent in direction but smaller in size in the word naming compared to the lexical decision task. This pattern matches previous observations in English and other languages (e.g. Burani, Arduino, & Barca, 2007 in Italian; Balota et al., 2004; Cortese & Khanna, 2007 in English). However, we observed, for the first time, interactions between curvilinear psycholinguistic effects and task differences within the same analysis. To clarify how task differences moderated the psycholinguistic effects, we fitted the same model to the lexical decision and word naming data-sets. For each task-specific analysis, we estimated the effects of the same linear and curvilinear psycholinguistic effects, including the effects of valence, arousal, and the valence by arousal interaction. For each analysis, we included the same random effects structure as we identified for the cross-task final model, minus the random effect of items on the slope of the task effect. We fit models using each different valence measure though we report in detail only the results for the models representing valence as a graded measure.
In the task-specific model of lexical decision latencies, we found significant curvilinear effects of frequency, AoA, familiarity, length and neighbourhood size, along with linear effects of bigram frequency and valence (represented as a graded measure). In the model of word naming latencies, we found significant curvilinear effects of frequency, AoA, length, and neighbourhood size, along with linear effects of familiarity, word length in syllables, and the valence x arousal interaction. In Table 5 we present summaries of mixed-effects models of the task-specific data.
(Table 5, about here)
The valence effect in lexical decision was, on average, negative going, with words that were associated with higher (more positive) valence ratings associated with faster responses (estimate of the valence effect, coefficient = -3.4, SE = 1.5). The impact of task differences was to strongly reduce the valence effect in naming compared to lexical decision (estimate of the word naming valence effect, coefficient = .9, SE = 1.0).
The valence x arousal interaction effect in lexical decision was, on average, small and positive (estimate of the interaction effect, coefficient = .4, SE = .5), suggesting that the valence effect was slightly smaller for higher arousal words. In word naming, a contrasting pattern was apparent. The valence x arousal interaction effect in naming was small and negative (estimate of the word naming valence x arousal interaction effect, coefficient = -1.0, SE = .4), suggesting that the valence effect was slightly larger for higher arousal words.
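The coefficients and standard errors reported above can be related to the pattern of significant effects through the ratio of each estimate to its standard error. The following minimal sketch assumes a normal approximation to the test statistic, which is a common rule of thumb for large samples and is not necessarily the method used for the t-tests in this article; the function names are hypothetical.

```python
import math


def wald_ratio(coefficient: float, standard_error: float) -> float:
    """Ratio of an estimate to its standard error (a Wald-type statistic)."""
    return coefficient / standard_error


def two_sided_p_normal(z: float) -> float:
    """Two-sided p-value under a standard normal approximation (an assumption here)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (task, effect) -> (coefficient, SE), as reported in the text.
estimates = {
    ("lexical decision", "valence"): (-3.4, 1.5),
    ("word naming", "valence"): (0.9, 1.0),
    ("lexical decision", "valence x arousal"): (0.4, 0.5),
    ("word naming", "valence x arousal"): (-1.0, 0.4),
}

for (task, effect), (coef, se) in estimates.items():
    z = wald_ratio(coef, se)
    print(f"{task}, {effect}: estimate/SE = {z:.2f}, approx. p = {two_sided_p_normal(z):.3f}")
```

Under this approximation, only the lexical decision valence effect and the word naming valence x arousal interaction have ratios larger than 2 in absolute value.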
In summary, the graded effect of valence was significant for lexical decision but not for word naming, though there was a trend suggesting an effect of valence in naming.
Consistent with the full cross-task analysis, the task-specific results indicated larger effects in lexical decision than naming for variables typically associated with lexical or semantic processes, frequency, AoA, familiarity and, critically for our study, valence.
While we do not report summaries of full models including categorical positive-negative or emotional-neutral valence measures, we note that positive-negative valence was associated with a significant effect in lexical decision but not naming, while emotional-neutral valence was not associated with a significant effect in either task.

Discussion
We aimed to assess the impact of affective content on word recognition in Spanish. We explored the influence of valence on performance in the lexical decision and naming tasks. In addition, we examined the effects of arousal, and of the interaction between valence and arousal. Our analyses revealed a significant effect of valence on word recognition, as emotional negativity delayed the participants' responses in lexical decision and to some extent in naming. These results support theoretical accounts of emotional stimulus evaluation in which negative affective values delay reaction times.
They contradict accounts in which emotional (negative or positive valence) words elicit faster responses than neutral words. Our results add to current understanding by showing that the valence effect is larger in lexical decision than in word naming, consistent with an interpretation of the effect as located in semantic processing. They demonstrate the importance of the valence effect in reading in Spanish, a language with a transparent orthography, significantly extending the apparent scope of the influence of emotion on reading. We discuss the theoretical implications of our observations in the following.
Pratto and John's Automatic Vigilance model of emotion (1991) proposes that undesirable stimuli grab more attention than desirable ones. According to this model, the effect occurs during automatic monitoring of the environment (i.e. monitoring without the perceiver's intent), functioning as a signal of potential danger. Based on their observations, in which undesirable stimuli retained more attention than positive ones, regardless of their relative valence, Pratto and John (1991) proposed that the valence effect was categorical in nature (see also Estes & Adelman, 2008a, 2008b). However, more recently, Kuperman et al. (2014) reported graded linear valence effects, leading those authors to argue that the automatic vigilance process is graded: the retention of attention is proportional to the negative affective value of the stimulus. In our study, word recognition reaction times were better explained by models including a graded (positive-negative) version of the valence measure, adding to the empirical support for a graded view of automatic vigilance.
We observed that the fits of models incorporating graded versus categorical positive-negative valence measures were not greatly different. It would be appropriate, then, to acknowledge that the impact of valence on word recognition can be captured by graded or by categorical measures of positive-negative valence differences. In our analysis, the graded measure of valence was found to be a more sensitive means of estimating the influence of valence on word recognition latencies. This is consistent with the greater information associated with a graded compared to a categorical measure of psychological variation (Cohen, 1983). However, as seen, the effect of valence is relatively small: the variance explained by the fixed effect of graded valence was about .3% (the marginal R2_m; Nakagawa & Schielzeth, 2013) while for positive-negative valence it was about .2%. This means that the graded valence effect may be readily detected in the long run. For comparison, we note that Estes and Adelman (2008b) reported a valence effect of .8% (lexical decision) or .6% (word naming) but reminded their readers that the theoretical importance of the effect is nevertheless large. The difference between the size of the effect of valence in Spanish compared to English is interesting but should be the topic of future research.
The important point is that the difference between speed of response to positive compared to negative valence words was reliably detected in our analyses of Spanish reading behaviour. The balance of evidence is that the difference between response latencies for positive and negative words is graded. Equally, our results are clearly in conflict with findings from previous studies that indicated emotional facilitation during word recognition. Both Kousta et al. (2009) and Vinson et al. (2014) observed inverted-U effects of valence with faster reaction times for negatively and positively valenced stimuli compared to neutral words. Their observations favored the Motivated Attention and Affective States model (Lang et al., 1997), according to which both positive and negative affective stimuli are more likely to draw attention than neutral stimuli because emotional stimuli are motivationally relevant. In our analyses, an emotional-neutral coding of valence failed to capture the valence effect that was otherwise evident (using positive-negative measures) across an extensive set of analyses.
Our sample of Spanish reading behaviour, while substantial, did not indicate an effect of arousal, either. We did not observe an effect of arousal, overall, or in an interaction, moderated by task. The lack of an effect of arousal in our analyses is congruent with the results obtained by Kousta et al. (2009) and Vinson et al. (2014), who also observed specific effects of valence but no influence of arousal on word recognition. Our results, however, contrast with the effect of arousal identified in the large-scale study conducted by Kuperman et al. (2014). Although the inclusion of arousal in our analysis did not improve the fit of our model to word recognition data, the fact that our word sample was smaller than that analysed by Kuperman et al. (2014) does not allow us to rule out the existence of a small arousal effect.
We analyzed whether the effects of valence and arousal interacted. A valence by arousal interaction was not reliably detected in previous studies (Kousta et al., 2009; Vinson et al., 2014; Estes & Adelman, 2008a; Kuperman et al., 2014). In contrast, in our study, the effect of valence on word naming latencies was modulated by that of arousal, with stronger valence effects for higher arousal words. A similar interaction was not observed in lexical decision. Our observations thus suggest that an influence due to arousal may be found, to the extent that the valence effect is slightly different for words varying in arousal, in Spanish, but not to the extent that the impact of arousal is, on its own, detectable for our data. The variation in the valence by arousal interaction, depending on which valence measure is included in the model, suggests that the influence of arousal merits further investigation but will be difficult to characterize with confidence.
Critically, we tested whether the effect of valence was different in response to different task demands. The inclusion of the interaction between task (lexical decision or word naming) and graded valence in the analysis significantly improved model fit. Valence affected lexical decision responses more strongly than word naming responses. This finding extends previous observations in English (Estes & Adelman, 2008a;Kuperman et al., 2014) in which the valence effect was compared between tasks qualitatively but not formally. Importantly, our cross-task analysis allowed a direct estimate of the moderation of the valence effect by task differences as the effect of the task by valence interaction. (See Nieuwenhuis, Forstmann, & Wagenmakers, 2011, for a discussion of the inferential problems inherent in comparing effects in different data sub-sets when interactions are hypothesized but not formally tested).
The comparison between the results of lexical decision and word naming tasks is of interest because it could help to clarify the nature of the effect. Previous research (Balota et al., 2004;Cortese & Khanna, 2007) has indicated that semantic effects tend to be larger or easier to detect in lexical decision than in word naming because lexical decision response preparation is more reliant or draws more readily on such information (although see Plaut, 1997;Seidenberg & McClelland, 1989, for alternative interpretations). We do not think that the greater size of psycholinguistic effects, like the effect of valence, in lexical decision, is due to the fact that responses were slower than in word naming (as is usually observed). The average speed of response varies at random between subjects within and between tasks, as well as between items within tasks. Our use of linear mixed-effects models allowed us to include variance terms to account for such differences between sampled participants or words in response speed (while controlling for all other predictors). We thus observed the interaction between task and valence effects while taking into account differences in average speed of response. The fact that we observed stronger effects on lexical decision would, in our view, therefore support a semantic interpretation of the valence effect. According to this account, valence would join the group of variables argued to be related to semantic knowledge, like AoA, imageability or familiarity (Balota et al., 2004;Cortese & Khanna, 2007;Davies et al., 2013;Davies, Wilson, Cuetos, & Burani, 2014).
The observation of the valence and task by valence interaction effects in Spanish has significant implications for a language-general account of reading. Our findings demonstrate that emotional content affects reading in a transparent orthography.
Granted that valence can be understood as a semantic effect, this contrasts with the account proposed by some researchers, that semantic information tends to influence word recognition more prominently where words are difficult to encode, as appears to be the case, in English, for low frequency irregular words (Plaut et al., 1996;Strain et al., 1995). It may well be that semantic information influences word recognition in English more widely across the vocabulary (as reported by Balota et al., 2004;Cortese & Khanna, 2007). Our results demonstrate with certainty that word recognition is richly influenced by semantic content when the words being read have regular pronunciations.
In sum, we did not observe a significant effect of arousal in word recognition in Spanish. Further studies should be conducted to ascertain whether the lack of a reliable arousal effect in our data is due to specific characteristics of our stimuli or whether it instead indicates differences between the influence of this variable on word recognition in deep and transparent orthographies.
In contrast, we identified a substantial effect of emotional valence on word recognition, with positive valence words eliciting faster reaction times than negative valence words.
This finding provides empirical support to the Automatic Vigilance model of emotion (Pratto & John, 1991), according to which emotionally negative information slows down cognitive activity. In contrast, our data do not support the model of Motivated Attention and Affective States (Lang et al., 1997), which predicts faster reaction times for both positive and negative stimuli. Our data suggest that the observed effect is graded, such that the latency reduction associated with positive compared to negative valence is proportional to the positivity of the stimuli. This finding is inconsistent with the categorical effect for negative stimuli predicted by the original version of the Automatic Vigilance model. Critically, our observation that the influence of valence was stronger in lexical decision than naming indicates a semantic location for the effect.
This has implications for theoretical accounts of the cognitive architecture of the reading system, and of the role of semantic information in reading performance in different languages. Our observation of a valence effect in Spanish, a language with a transparent orthography, shows that emotion influences the recognition of words with regular pronunciations. These results mirror some of those reported previously in English, thus demonstrating important parallels in word recognition processes between orthographically shallow and deep languages.
Table 3. Summary of linear mixed-effects models of the cross-task data, including lexical decision and word naming data.