Diagnostic Accuracy, Item Analysis and Age Effects of the UPSIT Spanish Version in Parkinson's Disease.

OBJECTIVE
The University of Pennsylvania Smell Identification Test (UPSIT) is the most commonly used test to detect olfactory impairment in Parkinson's disease (PD), but the cut-off score for clinical purposes is often difficult to establish because of age and sex effects. The current work aims to study the sensitivity and specificity of the UPSIT Spanish version and its accuracy in discriminating PD patients in different age groups from healthy controls (HC), and to perform an item analysis.


METHOD
Ninety-seven non-demented PD patients and 65 HC were assessed with the UPSIT Spanish version. Sensitivity, specificity, and diagnostic accuracy for PD were calculated. Multiple regression analysis was used to define predictors of UPSIT scores.


RESULTS
Using the normative cut-off score for anosmia (≤18), the UPSIT showed a sensitivity of 54.6% with a specificity of 100.0% for PD. We found that, using the UPSIT cut-off score of ≤25, sensitivity was 81.4% and specificity 84.6% (area under the receiver operating characteristic curve = 0.908). Diagnosis and age were good predictors of UPSIT scores (B = -10.948; p < .001; B = -0.203; p < .001). When optimal cut-off scores were considered according to age ranges (≤60, 61-70, and ≥71), sensitivity and specificity values were >80.0% for all age groups.


CONCLUSIONS
In the Spanish UPSIT version, sensitivity and specificity are improved when specific cut-off scores for different age groups are computed.


Introduction
Olfactory impairment has been found to be a prodromal manifestation in Parkinson's disease (PD) (Berendse, Roos, Raijmakers, & Doty, 2011) and other neurodegenerative disorders. The prevalence of olfactory deficits in PD is estimated to be about 90% (Fullard et al., 2016; Haehner et al., 2009). Olfactory dysfunction substantially affects wellbeing and quality of life, reducing enjoyment of food, beverages, personal care products, and the natural environment. Moreover, decreased smell function increases the risk of harm from fire, environmental toxins, leaking natural gas, and spoiled food (Doty, 2017).
Odour identification tests, such as the University of Pennsylvania Smell Identification Test (UPSIT) (Doty, 1995), are used in both research and clinical settings to assess olfactory dysfunction in PD patients. The UPSIT has demonstrated high capability in differentiating PD patients from healthy subjects (Bohnen, Studenski, Constantine, & Moore, 2008), and has been used as a tool for the diagnosis of PD (Doty, Bromley, & Stern, 1995; Silveira-Moriyama et al., 2008).
It is well known that ageing per se is associated with olfactory decline; McKinnon et al. (2010), using data collected from 732 subjects, estimated that UPSIT scores decrease by around 3.2 points for every 10 years of age. Loss of olfactory identification ability is seen from the 5th decade onwards but is most marked between the 7th and 8th decades of life. In PD patients, UPSIT scores have also been found to decrease as a function of age.
Olfactory function in normal subjects is clearly influenced by gender (Doty, Ford, Preti, & Huggins, 1975; Doty, Orndorff, Leyden, & Kligman, 1978; Doty, Green, Ram, & Yankell, 1982). Sex-related differences in the UPSIT have also been described in PD, with lower performance for men. These results suggest that age and sex should be taken into consideration when using olfactory measures as a tool for PD diagnosis. In this setting, Doty et al. proposed optimal cut-off scores that best discriminate between PD and healthy subjects, according to different age groups and according to gender. As far as we know, these results have not been validated in the UPSIT Spanish version.
Furthermore, previous works found a decreased identification of some UPSIT items in healthy populations (Picillo et al., 2014; Silveira-Moriyama et al., 2008; Rodriguez-Violante, Gonzalez-Latapi, Camacho-Ordonez, et al., 2014b; Kondo, Matsuda, Hashiba, 1998). In this sense, the analysis of individual smell scores of the UPSIT Spanish version may allow the identification of items with greater accuracy in PD diagnosis, and of items that are culturally unsuitable.
In the current work, we aim to investigate the sensitivity and specificity of the UPSIT Spanish version, its accuracy in discriminating PD patients from healthy controls (HC), age- and sex-related effects on test performance, and individual item reliability. It was hypothesised that: (1) PD patients will have poorer performance than HC; (2) the UPSIT will discriminate PD patients from healthy controls; (3) some UPSIT items will show a low percentage of success in control subjects; (4) age will have an effect on UPSIT scores; (5) sex will have an effect on UPSIT scores; (6) diagnostic accuracy will improve when the effects of age and sex are considered.
Effects of ageing and sex on olfaction in PD 5

Participants
The study sample included 97 non-demented PD patients and 65 HC who underwent olfactory evaluation. The inclusion criteria for PD were: (1) the fulfilment of UK PD Society Brain Bank diagnostic criteria for PD (Hughes, Daniel, Kilford, & Lees, 1992); (2) no surgical treatment with deep brain stimulation. Exclusion criteria for PD were: (1) Hoehn and Yahr (H&Y) (Hoehn & Yahr, 1967) score > 3 (defined as mild to moderate bilateral disease with some postural instability but physically independent); (2) dementia according to Movement Disorder Society criteria (Dubois et al., 2007): (a) PD developed prior to the onset of dementia, (b) PD associated with a decreased Global Cognitive Efficiency, (c) cognitive impairment in more than one cognitive domain, (d) impact on daily living resulting from cognitive deficits, over and above those imposed by motor and autonomic problems (Dubois et al., 2007).
All PD patients were taking antiparkinsonian medication consisting of different combinations of L-DOPA, COMT inhibitors, MAO inhibitors, dopamine agonists, and amantadine. In order to standardise doses, levodopa equivalent daily dose (LEDD) was calculated as suggested by Tomlinson et al. (2010). All assessments were done in the on state. Motor disease severity was evaluated using H&Y staging and the Unified Parkinson's Disease Rating Scale motor section (UPDRS-III).
The study was approved by the Ethics Committee of the University of Barcelona (IRB00003099). All subjects provided written informed consent to participate after full explanation of the procedures involved.

Olfactory and clinical assessment
Odour identification was assessed using the Spanish version of the University of Pennsylvania Smell Identification Test (UPSIT-40) (Doty, 1995).
The UPSIT is a standardised multiple-choice scratch-and-sniff test consisting of four test booklets with 10 items each. In accordance with normative instructions, subjects scratch the impregnated area and are asked to select one of four possible answers for each item.
Following normative data presented in the UPSIT manual, which includes adjustment for age and sex, scores greater than 33 in males and 34 in females are considered to reflect normosmia, and scores lower than or equal to 18 reflect anosmia. Microsmia ranges from 19 to 33 or 34 (depending on the subject's gender).
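The three normative categories described above can be expressed as a small lookup. The following is a minimal sketch based only on the thresholds quoted in this section; the function name and the "male"/"female" labels are illustrative choices, not part of the UPSIT manual.

```python
def classify_upsit(score: int, sex: str) -> str:
    """Classify a UPSIT-40 total score with the thresholds quoted in the text.

    Normosmia: score > 33 (males) or > 34 (females).
    Anosmia:   score <= 18.
    Microsmia: everything in between.
    """
    if not 0 <= score <= 40:
        raise ValueError("UPSIT-40 total scores range from 0 to 40")
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")

    # Sex-specific upper bound of the microsmic range.
    microsmia_upper = 33 if sex == "male" else 34

    if score <= 18:
        return "anosmia"
    if score > microsmia_upper:
        return "normosmia"
    return "microsmia"


if __name__ == "__main__":
    for score, sex in [(18, "male"), (34, "male"), (34, "female")]:
        print(score, sex, classify_upsit(score, sex))
```

A score of 34 is therefore normosmic for a male subject but still microsmic for a female subject.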
In order to achieve higher cultural familiarity, the Smell Identification Test™ (UPSIT) Spanish version commercialised by Sensonics (https://sensonics.com) substitutes some items with respect to the original UPSIT. Specifically, in item 13, talcum powder replaces liquorice; in item 20, apple replaces gingerbread; in item 24, rubber tire replaces root beer; in item 27, raspberry replaces lime; and in item 29, walnut replaces wintergreen.
The Beck Depression Inventory II (Beck, Steer, 1996), Starkstein's Apathy Scale (Starkstein et al., 1992), and the Neuropsychiatric Inventory (Cummings et al., 1994) were administered to all subjects to explore the presence of psychiatric symptoms.

Demographic, clinical, and olfactory data analysis
Statistical analyses of demographic, clinical, and olfactory data were performed using IBM SPSS Statistics 24.0.0 (2016; Armonk, NY: IBM Corp). Group differences in demographic and clinical variables between HC and PD patients were analysed with independent Student's t-tests for quantitative measures or with Pearson's chi-squared test for categorical measures.
Bonferroni correction was used to correct for multiple testing. A multiple regression analysis with stepwise method was used to define possible UPSIT score predictors in the whole sample and in the PD group. Predictors were chosen based on previous literature (Deeb et al., 2010; Doty et al., 1978; Frye, Schwartz, & Doty, 1990; Hubert, Fabsitz, Feinleib, & Brown, 1980; Joyner, 1964; McKinnon et al., 2010) and correlation analysis. Age, PD diagnosis, sex, years of education, and smoking status were introduced as possible predictors for the whole sample regression. For the PD subsample, PD diagnosis was excluded, and clinical measures such as H&Y, UPDRS-III, and LEDD were added.
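As a sketch of the whole-sample candidate model described above, assuming an ordinary least-squares formulation with indicator coding for the categorical predictors (the coding scheme and the symbols are illustrative; neither is stated in the source):

```latex
\[
\mathrm{UPSIT}_i = \beta_0
  + \beta_1\,\mathrm{Age}_i
  + \beta_2\,\mathrm{PD}_i
  + \beta_3\,\mathrm{Sex}_i
  + \beta_4\,\mathrm{Education}_i
  + \beta_5\,\mathrm{Smoking}_i
  + \varepsilon_i
\]
```

Under this formulation, the stepwise procedure would retain only the subset of terms that meet the entry criterion, and each retained coefficient would be read as the expected change in UPSIT score per unit change in that predictor with the others held constant.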

UPSIT item analysis
Sensitivity, specificity, diagnostic accuracy, positive predictive value, and negative predictive value for PD were calculated for the UPSIT-40 total score, and for each item.
Pearson's chi-squared test was used to compare the percentage of HC and PD patients who correctly identified each item of the 40-item UPSIT version.
Incorrect answers were considered as positive findings (i.e., altered performance), whereas correct answers defined negatives. True positives (TP) were defined as the number of PD patients who gave a wrong response to a given item; true negatives (TN) as the number of HC who gave a correct response; false positives (FP) as the number of HC who gave a wrong response; and, finally, false negatives (FN) as the number of PD patients who gave a correct response.
Considering the normative cut-off score of ≤18, 54.6% of PD patients and no HC were identified as anosmic. Supplementary Table 2.

UPSIT diagnostic measures
For the normative cut-off score of ≤18 to define anosmia, UPSIT total score had a sensitivity of 54.6% with a specificity of 100.0%, a PPV of 100.0%, a NPV of 59.6%, and a diagnostic accuracy of 72.8% for PD. The receiver operating characteristic (ROC) curve showed that the optimal cut-off score for UPSIT-40 was ≤25 with a sensitivity of 81.4%, a specificity of 84.6%, a PPV of 88.8%, a NPV of 75.3%, and a diagnostic accuracy of 82.7% (area under the ROC curve = .908) (Table 1).
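The diagnostic measures reported above all follow from a 2x2 classification table. The sketch below shows the arithmetic; the TP/FN/TN/FP counts are not given in the text and are back-calculated here from the reported percentages and the group sizes (97 PD, 65 HC), so they should be read as an assumption rather than as the study's raw data.

```python
def diagnostic_measures(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard diagnostic measures from the cells of a 2x2 table."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # PD patients correctly flagged
        "specificity": tn / (tn + fp),   # HC correctly cleared
        "ppv": tp / (tp + fp),           # flagged subjects who have PD
        "npv": tn / (tn + fn),           # cleared subjects who are HC
        "accuracy": (tp + tn) / total,
    }


# Assumed counts, back-calculated from the reported percentages.
anosmia_cutoff = diagnostic_measures(tp=53, fn=44, tn=65, fp=0)    # score <= 18
optimal_cutoff = diagnostic_measures(tp=79, fn=18, tn=55, fp=10)   # score <= 25

if __name__ == "__main__":
    for label, measures in [("<=18", anosmia_cutoff), ("<=25", optimal_cutoff)]:
        print(label, {k: round(100 * v, 1) for k, v in measures.items()})
```

With these assumed counts, the rounded percentages reproduce the values reported for both cut-off scores.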
When considering the whole sample, UPSIT total score correlated negatively with age (r=-.300; p<.001). In the HC group, significant correlations were found between UPSIT total score and age (r=-.406; p=.001) and years of education (r=.296; p=.017). In the PD patient group, a negative correlation was found between UPSIT total score and age (r=-.366; p<.001). Sex did not achieve significant correlation with UPSIT total score, either in HC (r=.221; p=.076) or in PD (r=-.016; p=.875). Diagnosis (B=-10.948; t=-11.333; p<.001) and age (B=-.203; t=-4.370; p<.001) were found to be the best predictors of UPSIT scores when the whole sample was considered. For the PD patient sample, clinical measures were included in the regression analysis, and only age was identified as a significant UPSIT score predictor (B=-.250; t=-3.545; p<.001). According to these data, we plotted ROC curves for three age groups (≤60, 61-70, and ≥71 years). Table 2 shows cut-off scores with higher sensitivity and specificity for each age group. Sensitivity and specificity for PD diagnosis considering age groups for each UPSIT cut-off score are represented in Figure 1. In addition, we plotted ROC curves considering the same three groups divided by gender (Table 3).
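One common way to pick a cut-off from a ROC curve is to maximise Youden's J (sensitivity + specificity - 1). The text does not state which criterion was used, so the sketch below is an assumption about the general approach, and the scores in it are made up purely for illustration.

```python
def best_cutoff(pd_scores, hc_scores, max_score=40):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A subject screens positive when score <= cutoff.
    Ties keep the lowest cutoff reaching the maximum J.
    """
    best = None
    for cutoff in range(max_score + 1):
        sensitivity = sum(s <= cutoff for s in pd_scores) / len(pd_scores)
        specificity = sum(s > cutoff for s in hc_scores) / len(hc_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical scores, not study data.
pd_scores = [12, 15, 16, 18, 20, 22, 24, 25, 27, 30]
hc_scores = [24, 28, 29, 30, 31, 32, 33, 34, 35, 36]

if __name__ == "__main__":
    cutoff, sens, spec = best_cutoff(pd_scores, hc_scores)
    print(f"cut-off <= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Applying the same function separately to each age band (and, where sample size allows, to each age-by-sex subgroup) would give subgroup-specific cut-offs of the kind reported in the tables.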

Discussion
Since olfactory deficits have been found to be a relevant clinical feature in PD (Berendse, Roos, Raijmakers, & Doty, 2011), there has been growing interest in incorporating odour tests in routine protocols both in patient assessment and in data collection. Our results show that age and sex should be taken into consideration when using the UPSIT Spanish version as a tool for identifying olfactory deficits associated with PD. In the current study, the normative cut-off score for anosmia proposed by the UPSIT manual (≤18) demonstrated high specificity (100.0%) and low sensitivity (54.6%). In our sample, the optimal cut-off score was identified as ≤25, with a diagnostic accuracy of 82.7%, a sensitivity of 81.4%, and a specificity of 84.6%. This observed sensitivity is similar to that found in a Mexican population using the same cut-off score (Rodriguez-Violante, Gonzalez-Latapi, Camacho-Ordonez, et al., 2014a, 2014b). In an Italian study, the cut-off score of ≤21 differentiated PD and HC with similar sensitivity (82.0%) and specificity (88.2%) (Picillo et al., 2014). Given that cross-cultural aspects may influence these results (Rodriguez-Violante, Gonzalez-Latapi, Camacho-Ordoñez, et al., 2014a), it could be of interest to perform a more detailed analysis of the UPSIT-40 items to identify those with less capability to detect olfactory deficits in a given population due to cultural biases.
We performed an item analysis similar to previous works (Kondo, Matsuda, Hashiba, 1998; Picillo et al., 2014; Rodriguez-Violante, Gonzalez-Latapi, Camacho-Ordonez, et al., 2014b; Silveira-Moriyama et al., 2008). Our results showed that some items had a rate of correct answers of less than 50.0% in HC (cedar, lemon, cherry, dill pickle, and walnut); dill pickle and walnut were below 50.0% in all three age groups. Such items showed no significant differences in the percentage of success between HC and PD patients. As has been proposed in the literature (Picillo et al., 2014), a low percentage of success in control subjects may be an indicator of culturally unsuitable items. A decreased ability to identify some odours has also been reported in Italian (Picillo et al., 2014), Brazilian (Silveira-Moriyama et al., 2008), Mexican (Rodriguez-Violante, Gonzalez-Latapi, Camacho-Ordonez, et al., 2014b), and Japanese (Kondo, Matsuda, Hashiba, 1998) control subjects. Out of the five items with low identification rates by HC in our study, three items (cherry, cedar, lemon) also showed a low percentage of correct identification in a previous study of an Italian population (Picillo et al., 2014). Rodríguez-Violante et al. also found that cherry, walnut, and dill pickle were identified by less than 50.0% of subjects in a Mexican population (Rodriguez-Violante, Gonzalez-Latapi, Camacho-Ordonez, et al., 2014b). In Brazil, Silveira-Moriyama et al. found an identification rate of less than 50.0% for cedar, dill pickle, and lemon as well (Silveira-Moriyama et al., 2008). In this sense, a large percentage (86.0%) of HC fell in the microsmic range of performance, with only a small number (13.8%) being classified as normosmic.
This increased rate of microsmia may reflect, among other reasons, the impact of cultural differences on UPSIT performance. These findings suggest a need to provide and to promote the use of country-specific normative data.
Our results show that UPSIT performance was explained mainly by group, but also by age.
Contrary to previous studies (Campabadal et al., 2017;Sharma & Turton, 2012;Silveira-Moriyama et al., 2008), clinical measures such as motor scales and disease duration did not significantly predict smell identification performance in our PD subsample. One explanation could be that olfactory loss is an early symptom that progresses mainly in prodromal stages and may reach maximum severity levels even earlier than motor symptoms, which progress through advanced stages (Doty, Deems, & Stellar, 1988). Further studies in de novo samples could help identify a more accurate cut-off score for PD patients in initial stages, which could eventually have a positive impact on their quality of life.
The contribution of age to olfactory deficits is well known both in normal subjects (McKinnon et al., 2010; Sorokowska et al., 2015) and in PD patients. This is demonstrated by the fact that, in the original English version, discriminatory UPSIT scores are presented for different age groups. Even though changes in some specific items have been made, these age-related discrimination criteria have not been reviewed for the Spanish version.
Age-group analysis showed that the UPSIT-40 cut-off score of ≤25 demonstrated higher sensitivity for all age ranges than the normative cut-off score for anosmia. When cut-off scores for each age group were considered, sensitivity and specificity values were above 80.0% for all age groups. These values were similar when groups were classified by age and gender. In particular, our results show that cut-off scores for the youngest group (≤60 years) were 29 for males and 30 for females; for the intermediate age group (61-70 years), they were 22 for males and 26 for females; and for the oldest group (≥71 years), they were 24 and 27, respectively. Cut-off scores for all groups were similar to but below those proposed by Doty et al. (1995). With a sample of 180 PD patients and 612 HC, Doty et al. (1995) found that the sensitivity and specificity of the UPSIT in distinguishing between male PD patients and HC under the age of 61 years were 91.0% and 88.0%, respectively (cut-off score 31). For females of the same age, the corresponding values were 79.0% and 85.0% (cut-off score 33). Sensitivity and specificity of the UPSIT-40 for subjects between 61 and 13 for females, respectively (cut-off score 30). Finally, for subjects older than 71, sensitivity and specificity for males were 76.0% and 78.0% (cut-off score 22), and 78.0% and 82.0% for females (cut-off score 25). Considering that the onset of motor features in PD takes place around the age of 60, there is a need for developing more accurate procedures for detecting olfactory impairment in young PD patients and in the prodromal stages.
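The age- and sex-specific cut-off scores reported for this sample can be organised as a lookup table. The sketch below is illustrative only: it assumes that a score less than or equal to the cut-off counts as a positive screen (following the "≤" convention used for the other cut-offs in the text), that "respectively" in the oldest group refers to males and then females, and it uses hypothetical names throughout. As noted in the limitations, these cut-offs are preliminary and would need replication before any clinical use.

```python
# Cut-off scores reported in the discussion, keyed by age band and sex.
CUTOFFS = {
    "<=60": {"male": 29, "female": 30},
    "61-70": {"male": 22, "female": 26},
    ">=71": {"male": 24, "female": 27},
}


def age_band(age: int) -> str:
    """Map an age in years to one of the three age bands used in the study."""
    if age <= 60:
        return "<=60"
    if age <= 70:
        return "61-70"
    return ">=71"


def screens_positive(score: int, age: int, sex: str) -> bool:
    """True if the UPSIT-40 score is at or below the subgroup cut-off.

    Assumes 'score <= cutoff' defines a positive screen.
    """
    return score <= CUTOFFS[age_band(age)][sex]


if __name__ == "__main__":
    # The same score of 28 is read differently depending on the subgroup.
    print(screens_positive(28, 58, "male"))   # youngest band, male cut-off 29
    print(screens_positive(28, 65, "male"))   # intermediate band, male cut-off 22
```

The point of the table is that a single total score can fall on different sides of the threshold depending on the subject's age band and sex.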
Screening for potential early indicators of PD such as olfaction, constipation, or sleep disturbances facilitates the identification of individuals at high risk for PD who could participate in medical trials aimed at preventing or slowing disease progression (Ross et al., 2008). Therefore, although the UPSIT has high capability in differentiating PD patients from healthy controls, it should be used within the context of a comprehensive clinical assessment.
In conclusion, our results demonstrate that the sensitivity and specificity of the UPSIT Spanish version were improved when specific cut-off scores for different age and sex groups were considered.
Limitations of this study include the fact that most of our PD patients are in mild to moderate stages of the disease. Since the PD patients of our study were already identified as having PD, our findings cannot address the issue of early diagnosis of PD. Further studies examining early clinical or prodromal stages of PD are needed to address this question. In this study we have excluded patients with low IQ and dementia; thus, our findings are not generalizable to the whole PD population. Finally, our data are preliminary, and the proposed cut-offs need to be replicated before being used clinically.

Figure 1
UPSIT sensitivity and specificity for PD diagnosis considering age group.

Figure 2
Percentage of subjects who correctly identified each item of the UPSIT.
* Items that did not show significant differences between groups in the chi-squared test after Bonferroni correction. HC: healthy controls; PD: Parkinson's disease.