The Relative Salience of Facial Features When Differentiating Faces Based on an Interference Paradigm

Research on face recognition and social judgment usually addresses the manipulation of facial features (eyes, nose, mouth, etc.). Using a procedure based on a Stroop-like task, Montepare and Opeyo (J Nonverbal Behav 26(1):43–59, 2002) established a hierarchy of the relative salience of cues based on facial attributes when differentiating faces. Using the same perceptual interference task, we established a hierarchy of facial features. Twenty-three participants (13 men and 10 women) volunteered for the experiment to compare pairs of frontal faces. The participants had to judge if the eyes, nose, mouth and chin in the pair of images were the same or different. The factors manipulated were the target-distractive factor (4 face components × 3 distractive factors), interference (absent vs. present) and correct answer (the same vs. different). The analysis of reaction times and errors showed that the eyes and mouth were processed before the chin and nose, thus highlighting the critical importance of the eyes and mouth, as shown by previous research.

processed in a complex visual task such as face recognition. It is of interest to social psychology because it could reveal the importance of processing facial features as the first stage in making inferences about people in social interactions. Despite the great amount of cognitive and social research that has been done, the relative salience of facial features in perceptual tasks has received much less attention and the role of these features in face processing is still not well known.
There is considerable controversy about the featural and configurational properties underlying face recognition (see, for example, Halberstadt et al. 2003; Ingvalson and Wenger 2005; Lahaie et al. 2006; Leder and Carbon 2005; Tanaka and Sengco 1997; Wenger and Ingvalson 2003). Featural properties refer to isolated facial features (such as the nose, eyes, mouth, etc.), while configurational properties refer to the spatial relationships among the facial features, as well as their interactions (such as the distance between the eyes and the nose) (Rakover 2002). In general, empirical research has focused on the role of configurational properties in face-recognition processing (e.g., among many other works, Cheung et al. 2008; Donnelly and Davidoff 1999; Farah et al. 1998; Leder and Bruce 2000; Maurer et al. 2002; McKone 2008), but there is also some evidence that featural properties play a role in the process (Farivar and Chaudhuri 2003; Greenberg and Goshen-Gottstein 2009; Hosie et al. 1988; Macho and Leder 1998; Mondloch et al. 2002). Cabeza and Kato (2000) carried out two experiments using the prototype effect (i.e., the tendency to falsely recognize a new face that is perceptually related to a series of previously seen faces). They used prototypes that emphasized either featural or configural processing. The first experiment showed that both featural and configural prototypes yielded a robust prototype effect. The second experiment showed that face inversion eliminated the prototype effect for configural prototypes, but not for featural prototypes. These results suggest that both featural and configural processing make important contributions to face recognition. Collishaw and Hole (2000) tested the effects produced by blurring, inversion and scrambling on the recognition of the faces of unfamiliar people and celebrities.
The results showed that, in the case of unfamiliar faces and celebrities, when only configurational or featural processing was disrupted (i.e., when the faces were blurred, scrambled, inverted or inverted and scrambled), the faces were easily recognized. However, when configurational and featural processing was disrupted (i.e., when the faces were blurred and scrambled, or blurred and inverted), it was not easy to recognize them. In the same way, Schwaninger et al. (2002) showed scrambled, blurred, or scrambled and blurred familiar and unfamiliar faces to participants. The faces were recognized when they were scrambled or blurred (i.e., the featural or configurational information was degraded), but they were not recognized when they were scrambled and blurred (i.e., when both featural and configurational information was degraded). Finally, Zhang et al. (2004) used a neural network first developed by Dailey et al. (2002) to model the Mondloch et al. (2002) data. The neural network was built for holistic facial processing, but when important facial features were introduced in the stimuli to be tested (such as eyes and mouth), the neural network became more sensitive to the features. All these results show that featural processing contributes in some way to face recognition.
A new framework for explaining how sensory information is adjusted to the information stored in memory was introduced in the late 1990s (Schyns 1998). According to this framework, the information required to place a given object into one category or another will change depending on the interaction between task constraints and object information. Task constraints refer to the information needed to place the object in the category required by the task, e.g., given the question, "Is this a face?," it will be necessary to find certain visual information (such as a nose, mouth, eyes, etc.) before giving an answer. Object information refers to the perceptual information available for placing the object in the category demanded by the task, e.g., if it is possible to observe a nose, mouth, eyes, etc., then we have the information we need for categorization and the question, "Is this a face?," can be answered. Therefore, given a specific task, a group of characteristics of the object becomes especially useful (i.e., diagnostic), since it provides the necessary information to solve the task (Harel and Bentin 2009; Morrison and Schyns 2001).
We suggest that the diagnostic-recognition approach accounts for the empirical evidence supporting the notion that face processing is mainly holistic (based on configurational aspects or based on all the information in the entire image), as well as the empirical evidence supporting the idea that face processing is occasionally based on local features (Ruiz-Soler and Beltran 2006). Since the features in a face are always very similar, as a general rule they cannot be used as criteria for fast differentiation of faces (or differentiation with limited conditions), i.e., such aspects will not be diagnostic; only the particular interrelationship between facial features (configuration) will be diagnostic for recognizing someone's face. However, holistic processing is probably not efficient for recognizing people we have just met (e.g., at a crowded party). In this case, facial components (e.g., a nose or beard) will probably be more diagnostic for successfully recognizing a new face. Therefore, task constraints could bias the perceiver towards configurational or featural processing. However, before testing any hypothesis about the effect of task constraints on the information used by a perceiver, it is necessary to have a more accurate idea of the role of the features themselves. The aim of the present study was to establish a hierarchy of facial features based on their relative salience as visual attributes for differentiating between faces.
Montepare and Opeyo (2002) explored the relative salience of facial cues or attributes that distinguished race (black or white), age (old or young), gender (male or female), and emotional expression (angry or neutral). The participants judged if the faces were the same or different in terms of one of the attributes by means of a task that tried to highlight Stroop-like interference effects in order to control influences such as social desirability and reactivity (e.g., Stangor et al. 1992; Zarate and Smith 1990), because it involves a kind of automatic decision making that makes it difficult for participants to control their judgments (MacLeod 1991). The authors hypothesized that if a given facial cue was more salient and someone was asked to make a judgment based on that cue, the target cue would be processed more quickly and accurately than the other cues. The results showed a hierarchy of attributes in terms of reaction times (RTs) and errors. A possible explanation for these results was that the facial features that indicate the more salient attributes (e.g., skin color and hair type for race) are easier visual cues to discern than the features that indicate the less salient attributes (e.g., the salience of race may reflect the distinctive perceptual qualities of skin color and hair type). But, given the fact that the study manipulated only two facial features (i.e., skin color and hair type for race, facial wrinkling and cranial hair for gender, and so on), "additional research with stimuli containing more varied racial markers (…) is needed to ascertain the extent to which racial information captures perceivers' visual attention more strongly than other facial information" (Montepare and Opeyo 2002, p. 54). Therefore, a main conclusion could be that the salience of facial attributes should be based on the role of the underlying facial features.
We argue that facial features may have a more significant role than has traditionally been thought. Firstly, the fact that configural processing of human faces by adults takes priority does not rule out the possibility that featural processing also takes place. Moreover, studies with young children show that they rely on featural processing more often than on configural processing (e.g., Freire and Lee 2001; Mondloch et al. 2002). Based on the diagnostic-recognition approach, we understand that visual processing uses featural or configurational properties depending on task demands, and that specific facial features therefore probably differ in stimulus salience. We used Montepare and Opeyo's Stroop-like procedure with the aim of establishing a hierarchy of facial features. However, whereas Montepare and Opeyo (2002) focused on the cues that ascribe qualities such as race, gender, age and emotional expression, we focused on the facial features that differentiate faces.
We hypothesize that if certain facial features are salient to perceivers, they will be processed more quickly and with fewer errors. Thus, when task demands require featural processing, the different parts of the face show different salience. More specifically, we predict that the components with the most information (such as the eyes and mouth) will produce more interference patterns than other facial components (such as the nose and chin). A hierarchy of facial features would therefore allow us to formulate more accurate hypotheses in future research about the configurational and featural information used by perceivers when performing complex visual tasks and these hypotheses could provide useful information on cognitive processes and social judgments.

Participants
Twenty-three participants (13 men and 10 women) volunteered for the experiment (M age = 28.3, age range: 18-46 years). All of them were graduates of the University of Málaga, Spain, except for the youngest participant, who was an undergraduate student at the same university. The participants were recruited through a public advertisement at the university and were compensated for their participation.

Material
The experimental stimuli were pairs of frontal Caucasian faces (half of which were men and the other half women) built using Power SuperGoo (Metacreations 2000), a commercial software program that allows users to modify the size of facial components, and thus easily allowed us to design our facial stimuli. Four components were manipulated to compose the faces: eyes (round or almond shaped), nose (wide or thin), mouth (fat lips or thin lips) and chin (wide or narrow). Although the faces were built using computer software, none of them could be distinguished from real faces (see Fig. 1). We tested this by asking six judges if the images were real or artificial. Each judge was tested individually and was shown a sample of the pictures. All of them agreed that every face was an image of a real person. Each pair of faces was displayed simultaneously and in each case the faces were either the same or different; in the latter case, the faces were always different in terms of two components and were identical in terms of the other two components (e.g., the same eyes and nose and different mouth and chin). The faces were designed from two original male faces and two original female faces. Because the four facial components could vary in two ways, 16 different faces were obtained from each original face, giving a total of 64 faces.
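The count of 64 faces follows from crossing the two variants of each of the four components within each original face. A minimal Python sketch of that enumeration is given below; the labels for the four original faces are hypothetical placeholders, and the code only illustrates the combinatorics, not the authors' image-editing workflow.

```python
from itertools import product

# Two variants per component, as listed in the Material section.
COMPONENT_VARIANTS = {
    "eyes": ("round", "almond"),
    "nose": ("wide", "thin"),
    "mouth": ("fat lips", "thin lips"),
    "chin": ("wide", "narrow"),
}

# Hypothetical labels for the four original faces (two male, two female).
ORIGINAL_FACES = ("male_1", "male_2", "female_1", "female_2")


def build_face_set():
    """Return one dict per face: its base face plus one variant per component."""
    components = list(COMPONENT_VARIANTS)
    faces = []
    for base in ORIGINAL_FACES:
        # Cross the two variants of every component: 2**4 = 16 faces per base.
        for combo in product(*(COMPONENT_VARIANTS[c] for c in components)):
            face = {"base": base}
            face.update(zip(components, combo))
            faces.append(face)
    return faces


faces = build_face_set()
print(len(faces))  # 4 originals x 16 variants = 64
```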
Trials were constructed using SuperLab Pro (Abboud and Sugar 1997), which made it possible to completely control the procedure (i.e., the size and position of stimuli on the screen, exposure time measured in milliseconds, interstimulus interval, etc.) and to automatically record the responses (reaction times and errors).

Procedure
The participants were tested individually in an acoustically isolated room. Their consent to participate in the experiment was obtained and they were informed that they were going to perform a perceptual task. Their instructions were displayed on the computer screen: "Below are pairs of faces which may be the same or different. You will first see a word in the center of the screen referring to a part of the face (e.g., eyes). Your task is to indicate whether the two faces are the same with respect to that feature by pressing a button. If the two faces are the same with respect to the feature, press the right mouse button; if the faces are different with respect to the feature, press the left mouse button." The participants therefore had to judge if the pair displayed on the computer screen was the same or different in terms of the eyes, nose, mouth and chin by pressing one of two mouse buttons (the right button for the same and left button for different). The participants were asked to keep their left index finger in contact with the left mouse button and their right index finger in contact with the right mouse button and to respond as quickly as possible. If the participant took longer than 1,500 ms, a warning message was displayed asking for faster answers. A single word (eyes, nose, mouth or chin) informed participants about the facial component they had to judge before the faces were displayed. All stimuli were displayed on a 14-inch screen at a distance of approximately 40 cm from the participant. The factors manipulated included the Target-distractive factor, Interference (absent vs. present) and Correct answer (the same vs. different). The target-distractive factor had 12 levels (4 face components × 3 distractive factors).
The levels of this factor were as follows: eyes-nose (EN), eyes-mouth (EM), eyes-chin (EC), nose-eyes (NE), nose-mouth (NM), nose-chin (NC), mouth-eyes (ME), mouth-nose (MN), mouth-chin (MC), chin-eyes (CE), chin-nose (CN), and chin-mouth (CM). So, for example, in the EN condition, the participants were asked if the eyes were the same in both faces (here the nose could also be different), whereas in the NE condition, the participants were asked if the nose was the same in both faces (the eyes could also be different). Interference could occur in one of two ways: when the correct answer was the same for the target component and different for the distractive component, or when the correct answer was different for the target component and the same for the distractive component. RTs and accuracy were recorded.
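The 12 target-distractive levels and the definition of interference can be expressed compactly. The following Python sketch is illustrative only; the function names are assumptions introduced here and do not come from the original study.

```python
from itertools import permutations

COMPONENTS = ("eyes", "nose", "mouth", "chin")

# Ordered (target, distractor) pairs: 4 targets x 3 distractors = 12 levels.
LEVELS = list(permutations(COMPONENTS, 2))


def level_code(target, distractor):
    """Two-letter code used in the text, e.g. ('eyes', 'nose') -> 'EN'."""
    return (target[0] + distractor[0]).upper()


def is_interference(target_same, distractor_same):
    """Interference is present when target and distractor call for opposite answers."""
    return target_same != distractor_same


def correct_answer(target_same):
    """The correct response depends only on the target component."""
    return "same" if target_same else "different"


print([level_code(t, d) for t, d in LEVELS])
```

Under this definition, a trial in which both components match or both differ is a non-interference trial, because the distractor points to the same response as the target.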
The participants evaluated the six pairs of faces twice. Each time they were asked to say if the two faces were the same or different in terms of one of the two variable components (e.g., eyes or nose). In one trial, they judged one of the components (e.g., eyes) and in the other they judged the other component (e.g., nose). The training phase had 16 trials (4 non-interference trials in which the correct answer was the same, 4 non-interference trials in which the correct answer was different, 4 interference trials in which the correct answer was the same, and 4 interference trials in which the correct answer was different). In the test phase, a word (e.g., nose) was displayed on the screen for 750 ms and the two faces were displayed simultaneously immediately afterwards.
The overall experiment had 384 trials, i.e., eight stimuli for each of the 48 experimental conditions. The experiment was run in four blocks of 96 trials and for each of these blocks the other factor combinations appeared the same number of times. In other words, there were 48 non-interference trials and 48 interference trials and, in each of these groups, the correct answer for half of the trials was the same and the correct answer for the other half was different. All the trials were randomized within each block. Each block was separated by a short break. Answering all the questions in the experiment took about 15 min. Stimuli were counterbalanced in order to avoid carryover effects.
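The trial arithmetic above (12 target-distractive levels × 2 interference levels × 2 correct answers = 48 conditions, eight trials each, split into four blocks of 96) can be checked with a short Python sketch. It assumes that each condition appears exactly twice per block, which is one way of satisfying the statement that factor combinations appeared equally often in every block; the seed and the dictionary keys are hypothetical.

```python
import random
from itertools import permutations, product

COMPONENTS = ("eyes", "nose", "mouth", "chin")
N_BLOCKS = 4
REPS_PER_CONDITION = 8  # eight stimuli per condition, as reported


def build_blocks(seed=0):
    """Return four shuffled blocks of 96 trials covering all 48 conditions."""
    levels = list(permutations(COMPONENTS, 2))  # 12 target-distractor levels
    conditions = list(product(levels, (False, True), ("same", "different")))
    # Assumption: repetitions are spread evenly, i.e. 8 / 4 = 2 per block.
    reps_per_block = REPS_PER_CONDITION // N_BLOCKS
    rng = random.Random(seed)
    blocks = []
    for _ in range(N_BLOCKS):
        block = [
            {"target": t, "distractor": d, "interference": i, "answer": a}
            for (t, d), i, a in conditions
            for _rep in range(reps_per_block)
        ]
        rng.shuffle(block)  # trials are randomized within each block
        blocks.append(block)
    return blocks


blocks = build_blocks()
print(len(blocks), sum(len(b) for b in blocks))
```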

Results
In line with the overall hypothesis of this work, our aim was to find out whether there was a hierarchy or sequencing in the processing of facial features. In terms of data analysis, this involves determining whether interference asymmetries are observed in twin experimental conditions (i.e., those in which the same stimuli were used, but with the target and distractor exchanged). For example, the target-distractive conditions EN and NE were twins, but while the target in the EN condition was "eyes" and the distractor was "nose," the target in the NE condition was "nose" and the distractor was "eyes." If the only thing that mattered was that the distractor could lead to a different response from the target (e.g., answering that the eyes were the same when the noses were different or answering that the eyes were different when the noses were the same), then there should be no differences between twin conditions. Conversely, if the distractor in one of the twin conditions was more salient, then we would expect to observe an asymmetry. Given the fact that two dependent variables were recorded in this experiment (RTs and errors), the results will be presented separately for each one.

Reaction Time
The unit of analysis was the pair of faces and an average reaction time score for each face pair was calculated for all the research participants and each of the 12 face pair conditions, including the four categories within each experimental condition (2 interference factors × 2 correct responses). Prior to the analysis of RTs, we eliminated those over 1,500 ms (less than 1%), because we considered them to be abnormally high and to reflect distractions.
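The trimming rule can be sketched in Python as follows. This is a minimal illustration under the assumption that RTs of exactly 1,500 ms are retained; the RT values in the example are made up and are not data from the experiment.

```python
from statistics import mean

RT_CUTOFF_MS = 1500  # RTs above this value are discarded


def trimmed_mean_rt(rts_ms, cutoff_ms=RT_CUTOFF_MS):
    """Mean RT after dropping responses slower than the cutoff (None if none remain)."""
    kept = [rt for rt in rts_ms if rt <= cutoff_ms]
    return mean(kept) if kept else None


# Hypothetical RTs (ms) for one face pair in one condition.
example_rts = [820, 910, 1620, 770]
print(trimmed_mean_rt(example_rts))  # the 1620 ms response is excluded
```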
A repeated measures analysis of variance comparing the differences in overall reaction times indicated the following effects: Target-distractive, F(11, 231) = 8, p < .0001, ηp² = .08, Target-distractive × Interference, F(11, 231) = 3.97, p = .001, ηp² = .32, Tar
The post hoc analysis of RTs showed a pattern generally consistent with our hypothesis, because significant differences occurred in conditions where the eyes or mouth acted as distractors. The eyes interfered in the processing of the nose and chin (contrasts EN-NE and EC-CE), and the mouth interfered in the processing of the nose and chin (contrasts MN-NM and MC-CM). However, the lack of differences in interference between the eyes and the mouth and the mouth and the eyes (EM-ME) suggests that these facial features have a similar level of salience. Moreover, the lack of differences in interference between the nose and the chin was also consistent with the general hypothesis, because these features have little stimulus salience (namely, they require a low level of attention) and their level of interference is therefore similar. The only somewhat unexpected result was the lack of differences in interference between the NM-MN conditions.
We also calculated the overall interference of all the other components with respect to the target (see Table 1). We thought it would be of interest to estimate the interference caused by each facial component directly by measuring the increase in milliseconds when participants evaluated the faces. Such interference was calculated by adding the differences in RTs between the interference condition and the non-interference condition. For example, the overall interference for the nose target was the sum of the increases in milliseconds produced by each of the distractors ([eyes = 214] + [mouth = 159] + [chin = 140] = 513). Following this computation, the results, in ascending order of interference, were the mouth (190 ms), the eyes (269 ms), the chin (467 ms) and the nose (513 ms). Therefore, the nose and chin were the targets with the most interference and the mouth and the eyes were the targets with the least.
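The overall interference score is a plain sum of per-distractor RT increases. The Python sketch below reproduces the worked example for the nose target and the reported ordering of the four targets; only values stated in the text are used, and the variable and function names are assumptions introduced for illustration.

```python
# RT increase (ms) produced by each distractor when the nose is the target,
# as given in the text.
nose_increments_ms = {"eyes": 214, "mouth": 159, "chin": 140}


def overall_interference(increments_ms):
    """Sum of (interference RT - non-interference RT) across distractors."""
    return sum(increments_ms.values())


# Overall interference per target (ms), as reported in the text.
reported_totals_ms = {"mouth": 190, "eyes": 269, "chin": 467, "nose": 513}

# Targets ordered from least to most interference suffered.
ranking = sorted(reported_totals_ms, key=reported_totals_ms.get)

print(overall_interference(nose_increments_ms))  # 513
print(ranking)  # ['mouth', 'eyes', 'chin', 'nose']
```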

Errors
In cognitive psychology, RT analyses are usually performed on the correct answers, but in our experiment the focus was on the interference caused by different facial elements, which generally coincided with the production of errors. We will therefore focus on the pattern of errors found. Errors were calculated for all the participants and all face pairs. Since the number of errors is a count variable, a Poisson regression was applied (likelihood ratio χ²(47) = 733.77, p < .001) and the model selected was Target-distractive × Interference + Target-distractive × Response, which yielded a fit to the data of 83% (computed as [(Null Model Deviance − Selected Model Deviance) / Null Model Deviance] × 100). A general pattern can be observed in Fig. 2. The experimental conditions in which the right answer was different yielded more errors than when the right answer was the same, except for the chin-mouth condition. Although this general pattern could be produced by a configurational effect from this stimulus zone because of the close proximity of the mouth and chin, the other subset of results was rather clear: the eyes and mouth produced maximum interference with the nose, thus pointing out the salience of these features compared to the nose. It was also observed that the mouth and eyes had a similar salience. Moreover, contrasts among twin conditions showed the expected differences in interfer-
J Nonverbal Behav (2012) 36:191-203
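The goodness-of-fit percentage reported for the Poisson model is the proportion of the null deviance accounted for by the selected model. A minimal Python sketch of that formula follows; the deviance values in the example are hypothetical, since the individual deviances are not given in the text.

```python
def percent_deviance_explained(null_deviance, model_deviance):
    """[(null deviance - model deviance) / null deviance] x 100."""
    if null_deviance <= 0:
        raise ValueError("null deviance must be positive")
    return (null_deviance - model_deviance) / null_deviance * 100


# Hypothetical deviances, chosen only to illustrate the arithmetic.
print(percent_deviance_explained(null_deviance=200.0, model_deviance=50.0))  # 75.0
```

A value of 100 would mean the selected model reproduces the error counts perfectly, while 0 would mean it does no better than the intercept-only model.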

Discussion
As per our hypothesis, the RTs and number of errors in a perceptual interference task appear to depend on the specific facial features being compared. The results suggest a priority in the order of processing facial features, with the eyes and mouth being preferred over the chin and nose. The empirical evidence indicating such a hierarchy was based on an experimental Stroop-like procedure that involved automatic processing so as to avoid some unwanted effects such as social desirability and reactivity to the faces. The importance of the eyes over other facial features when people examine a face was stated in Argyle's classical work (Argyle 1970) and shown to be empirically well founded using a variety of experimental procedures (e.g., Itier et al. 2007; Itier and Batty 2009; Janik et al. 1987; Vinette et al. 2004). However, Smith et al. (2004) showed that in a categorization task by gender, the diagnostic information was in the eyes, whereas in a categorization task by expression, the diagnostic information was in the mouth. Moreover, Calder and Young (2005) suggested that the recognition of facial identity and facial expression are somewhat related and do not follow strictly separate functional and neural pathways.
Thus, our results reflect the preference for the eyes and the mouth over other features when people process a face and seem to indicate that the critical features (eyes and mouth) used to process a face form a single group that takes precedence over the other features tested in the experiment.

[Figure caption: The EN condition means Eyes-Nose (with eyes as the target and nose as the distractor), while the NE condition means Nose-Eyes (with nose as the target and eyes as the distractor). The two graphs on the left display the results for the non-interference condition and the two graphs on the right display the results for the interference condition. The two graphs on the top display the results for correct response = "same" and the two graphs on the bottom display the results for correct response = "different."]
Our results with frontal faces are congruent with current findings on the processing of facial features. Given the fact that configurational processing takes precedence over featural processing of profiles (McKone 2008), probably because some parts of the face are not visible, the question arises as to whether the hierarchy would be maintained in profiles and three-quarter views of faces. However, although it would be necessary to test the hierarchy of facial features from different viewpoints, the hierarchy obtained for frontal faces may be useful in many cognitive and social psychology studies. For example, in the aforementioned article by Montepare and Opeyo (2002), they found a hierarchy of facial attributes in which race was more salient than gender, age and emotional expression. A possible explanation for these results was that the quality of the features chosen to reflect race had a relatively different quality from the features used as cues that indicate age, gender and emotional expression. Thus, it would be of interest to select the cues based on the criterion established by a hierarchy of features, such as we found in our study. Also, Montepare and Dobish (2003) found that pictures of participants instructed not to show any emotional facial expression elicited trait impressions from perceivers related to happy or angry emotional expressions, but not related to surprise. Although it would need to be empirically tested, it is possible to suppose that this variability eliciting trait impressions could arise from the hierarchy of the facial features involved in such emotional facial expressions. In general, the hierarchy of facial features could be useful in a variety of research projects in social perception of faces, such as research grounded on the ecological theory (McArthur and Baron 1983). According to the ecological theory, both the nature of the stimulus information and the situational demands play a role in perceived social traits. 
Therefore, the hierarchy of facial features could be based on the cues chosen by a perceiver to infer trait impressions and attributes in a given social situation.
Also, a hierarchy of features could be useful in studies that try to determine the role of featural processing in face recognition. For example, holistic processing is less efficient in some tasks than the processing of specific features (Harel and Bentin 2009), probably because the kind of categorization required by the task means that some specific features become diagnostic. The role of the task in face recognition has been pointed out (e.g., Ruiz-Soler and Beltran 2007), but the hierarchy of features found in our study may provide a tool for selecting the stimuli used in studies to establish the role of facial features based on perceptual task demands.
Although some may argue that the task of this experiment is somewhat artificial because faces are usually processed as a whole rather than part by part, there are some real-life situations that require facial components to be compared, thus giving ecological validity to our task. For example, when customs police officers compare the passport photo with the face of the person in front of them, focusing on the specific facial components helps with identification. In situations in which many people are introduced in a short period of time (e.g., at a party), it may be more diagnostic to notice a specific feature (a nose or eyes with a distinguishing trait) to identify the person. Awareness of this priority when processing facial features could also be used to compare the performance of adults with other groups in which task-oriented processing is based more on features (such as young children or children with autism spectrum disorders). Therefore, based on the diagnostic-recognition approach, if the information required to place an object in one category changes according to the interaction between task constraints and object information, the temporal sequence in which facial features are processed could be a key methodological contribution to testing the specific role of features when people process faces.