Determination of Phenolic Compounds in Paprika by Ultra-High Performance Liquid Chromatography-Tandem Mass Spectrometry. Application to Product Designation of Origin Authentication by Chemometrics.

A UHPLC-ESI-MS/MS method was developed for the determination of 36 phenolic compounds in paprika. The proposed method showed good performance, with limits of quantitation between 0.03 and 50 µg/L for 16 compounds, and between 50 µg/L and 1 mg/L for 12 compounds. Good linearity (r² > 0.995), run-to-run and day-to-day precisions (%RSD values <12.3% and <19.2%, respectively), and trueness (relative errors <15.0%) were obtained. The proposed method was applied to the analysis of 111 paprika samples from different production regions: Spain (La Vera PDO and Murcia PDO) and Czech Republic, each one including different flavor varieties (sweet, bittersweet, spicy). Phenolic profiles and concentration levels proved to be good chemical descriptors to achieve paprika classification and authentication according to the production region by principal component analysis (PCA) and partial least squares regression-discriminant analysis (PLS-DA). In addition, perfect classification among flavor varieties was obtained for Murcia PDO and Czech Republic samples.


INTRODUCTION
Paprika is a spice obtained after drying and grinding fruits of the genus Capsicum, which belongs to the Solanaceae family. 1 Within this genus there are approximately 39 species, including wild, semi-domestic and domestic ones, such as C. annuum, C. chinense, C. baccatum, C. frutescens, and C. pubescens, growing in different parts of the world, with C. annuum being the most common. 2,3 Paprika contains a large number of bioactive compounds with great health-promoting properties such as carotenoids (provitamin A), ascorbic acid (vitamin C), tocopherols (vitamin E), capsaicinoids and phenolic compounds. 10 Among them, it is worth noting the importance of phenolic compounds, which are widely distributed in plants; many of them are essential secondary metabolites that contribute to the sensory properties of foods such as color and aroma. 11 By far, liquid chromatography with either UV detection or coupled to mass spectrometry is the most widely used technique for the determination of polyphenols. 13,17 Nevertheless, the great chemical diversity of these compounds and the low concentration levels at which they are found make liquid chromatography coupled to mass spectrometry or tandem mass spectrometry (LC-MS(/MS)) the most effective method for the characterization, identification and determination of polyphenols in paprika samples. 11,18,19 Previous studies have reported that the main phenolic compounds found in paprika are vanillic, caffeic, ferulic, p-coumaric and p-hydroxybenzoic acids. 20
Manufacturers, as well as the public in general, are increasingly concerned about food quality attributes and, therefore, the demand for food products of a specific geographical origin is growing. Within this context, and with the aim of preserving the reputation of the products and supporting good practices in rural and agricultural activities, the European legislation has established several quality parameters related to the protection of geographical indications and appellations of origin of agricultural and food products (Council Regulation, EEC No. 510/2006 21): Protected Designation of Origin (PDO), which links the products with the defined geographical area where they are produced; Protected Geographical Indication (PGI), which links products to a geographical area where at least one step of production occurred; and Traditional Specialties Guaranteed (TSG), which protects traditional production methods. 22 In Spain, there are two production areas of paprika with PDO recognized by the European Union: La Vera, from the north of the province of Cáceres (Extremadura), and the province of Murcia. Despite having a common origin and practically parallel development, the production process is different in each of these areas. 23 In both cases, the product is the result of drying and grinding the fruits of Capsicum species, but differences in fruit varieties and drying processes provide different organoleptic characteristics. The red fruits used for the production of La Vera paprika are dried with oak or holm oak firewood, by the traditional Vera system, and belong to the Capsicum annuum varieties of the Ocales group (Jaranda, Jariza and Jeromín) and Bola.
Paprika is a spice consumed worldwide and is susceptible to adulteration practices aimed at economic gain. The substitution of ingredients, the addition of (illegal) substances and false declarations of origin are important and challenging issues facing the authorities of the food industry. 27 Moreover, the characteristics of paprika, as well as the content of phenolic compounds, may differ due to multiple factors such as the varieties, climatic conditions, growing areas, water resources, ripening stage, agronomic conditions, pre- and post-harvest treatment, etc. 11 As a result, the content of phenolic and polyphenolic compounds in paprika products can be exploited as a source of analytical data to establish the product classification and authentication, both in the prevention of fraudulent adulterations and in the correct assignment of the PDO declarations.
In this work, an ultra-high performance liquid chromatography-electrospray-tandem mass spectrometry (UHPLC-ESI-MS/MS) method using a triple quadrupole (QqQ) analyzer has been developed for the determination and quantification of 36 phenolic and polyphenolic compounds in paprika, and for the subsequent characterization, classification and authentication of paprika samples by multivariate chemometric methodologies. Chromatographic and electrospray ion source conditions were optimized, and the method performance was established by determining quality parameters such as linearity, limits of detection, limits of quantitation, run-to-run and day-to-day precision, and trueness. A total of 111 paprika samples belonging to La Vera PDO and Murcia PDO (Spain) and to Czech Republic were analyzed with the proposed methodology after applying a simple extraction method using an acetonitrile/water (80:20 v/v) solution as extracting agent. Then, the contents of the 36 phenolic and polyphenolic compounds were employed as chemical descriptors of the analyzed paprika samples for their classification and authentication by principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA).

Reagents and solutions
All standards and chemicals used in this work were of analytical grade, unless otherwise indicated. Structures, family group, CAS number and supplier of the 36 phenolic and polyphenolic compounds under study are indicated in Table S1 (Supporting information). Individual stock standard solutions (ca. 1000 mg/L) were prepared in methanol in amber glass vials. Intermediate standard working solutions were prepared weekly from these individual stock standard solutions by appropriate dilution with water. All stock and intermediate working solutions were stored at 4 °C for no more than 1 month. LC-MS quality water, methanol and acetonitrile (Chromasolv™ quality) were purchased from Honeywell (Riedel-de-Haën, Seelze, Germany). Formic acid (≥98%) was obtained from Sigma-Aldrich (St. Louis, MO, USA).

Instrumentation
The determination of polyphenols and phenolic acids was carried out on an Open Accela UHPLC instrument (Thermo Fisher Scientific, San José, CA, USA), equipped with a quaternary pump and a CTC autosampler. The separation was performed by reversed-phase chromatography using an Ascentis Express C18 fused-core (100 x 2.1 mm, 2.7 µm partially porous particle size) column from Supelco (Bellefonte, PA, USA), and gradient elution using 0.1% formic acid in water (solvent A) and 0.1% formic acid in acetonitrile (solvent B) as mobile phase components, with a mobile phase flow rate of 300 µL/min. The elution gradient program was as follows: 0-5.5 min, isocratic elution at 5% solvent B; 5.5-6.5 min, linear gradient up to 10% solvent B; 6.5-12 min, isocratic elution at 10% solvent B, followed by a 1 min increase to 20% solvent B; 13-18 min, isocratic elution at 20% solvent B; 18-19 min, linear gradient up to 50% solvent B, followed by 2 min at this percentage; 21-22 min, linear gradient to 95% solvent B, followed by 3 min at this mobile phase composition. Afterwards, the system was returned to initial conditions for a 5 min column re-equilibration, giving a total elution program time of 30 min. The chromatographic column was kept at room temperature, and an injection volume of 10 µL, in full loop mode, was employed.
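As an illustration only, the elution program described above can be written as a list of breakpoints and interpolated to give the percentage of solvent B at any time. This is a minimal sketch: the names `GRADIENT` and `percent_b` are hypothetical, and treating the return to initial conditions at 25 min as an instantaneous step is an assumption not stated in the text.

```python
# Breakpoints (time in min, % solvent B) transcribed from the elution program.
GRADIENT = [
    (0.0, 5.0), (5.5, 5.0), (6.5, 10.0), (12.0, 10.0), (13.0, 20.0),
    (18.0, 20.0), (19.0, 50.0), (21.0, 50.0), (22.0, 95.0), (25.0, 95.0),
]
REEQUILIBRATION_START = 25.0  # min; assumed instantaneous step back to 5% B
TOTAL_RUN_TIME = 30.0         # min


def percent_b(t_min):
    """Return the programmed % solvent B at time t_min (minutes)."""
    if not 0.0 <= t_min <= TOTAL_RUN_TIME:
        raise ValueError("time outside the 0-30 min elution program")
    if t_min > REEQUILIBRATION_START:
        # Column re-equilibration at initial conditions
        return GRADIENT[0][1]
    # Linear interpolation inside the segment that contains t_min
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (0.0, 6.0, 12.5, 20.0, 24.0, 27.0):
    print(f"{t:5.1f} min -> {percent_b(t):5.1f} % B")
```

Solvent A is simply the complement, 100 minus the returned value.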
The UHPLC instrument was coupled to a TSQ Quantum Ultra AM triple quadrupole (QqQ) mass analyzer (Thermo Fisher Scientific), equipped with hyperbolic quadrupoles and a heated-electrospray ionization (H-ESI) source. Nitrogen with a purity of 99.98% was employed for the ESI sheath gas, ion sweep gas, and auxiliary gas at flow rates of 60, 20 and 0 a.u. (arbitrary units), respectively. Other H-ESI parameters were as follows: capillary voltage in negative ion mode, -2. was employed for all studied compounds except betulinic acid that showed no fragmentation under working conditions. A mass resolution of 0.7 m/z full width at half maximum (FWHM) on both quadrupoles (Q1 and Q3), and a scan width of 0.5 m/z were used. Fragmentation was carried out using argon as collision gas at a pressure of 1.5 mTorr, and the optimal normalized collision energies (NCE) for each SRM transition monitored (quantifier and qualifier) are shown in Table 1. The precursor ion selected, precursor and product ion assignments, quantifier/qualifier ion ratios, and the tube lens offset voltage for each compound under study are also summarized in Table 1. To improve sensitivity, the acquired chromatogram was segmented into 4 windows (Table 1), and a dwell time of 50 ms and 1 microscan were employed. The control of the UHPLC-ESI-MS/MS system and the data processing were performed using the Xcalibur software version 2.1 (Thermo Fisher Scientific).

Samples and sample treatment
A total of 111 paprika samples, purchased from local markets in Spain and Czech Republic, were analyzed. The set included 72 La Vera PDO paprika samples (26 sweet, 23 bittersweet, and 23 spicy flavor), 24 Murcia PDO paprika samples (12 sweet and 12 spicy flavor), and 15 Czech Republic paprika samples (5 sweet, 5 smoked-sweet, and 5 spicy flavor).
Sample treatment was performed following a previously described method. 1,28 Briefly, 0.3 g of paprika were extracted with 3 mL of a water:acetonitrile (20:80 v/v) solution in a 15 mL PTFE tube. Extraction was performed by stirring in a Vortex for 1 min (Stuart, Stone, United Kingdom) followed by sonication for 15 min (2510 Branson ultrasonic bath, Hampton, NH, USA). Then, sample extracts were centrifuged for 30 min at 4500 rpm (Rotana 460 HR centrifuge, Hettich, Germany), and the supernatant extract was filtered through 0.45 µm nylon filters (Whatman, Clifton, NJ, USA) and stored at -18 °C in 2 mL glass injection vials until analysis.
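For readers who wish to relate a concentration measured in the extract to the content of the solid sample, the extraction ratio above (0.3 g in 3 mL) gives a simple conversion. The sketch below is illustrative only: the function name is hypothetical, and it assumes quantitative extraction and no change in extract volume, neither of which is stated in the text.

```python
def extract_to_sample_mg_per_kg(conc_extract_mg_per_l,
                                extract_volume_ml=3.0,
                                sample_mass_g=0.3):
    """Convert an extract concentration (mg/L) into mg of analyte per kg of paprika.

    Assumes complete extraction and an unchanged extract volume (hypothetical).
    """
    analyte_mass_mg = conc_extract_mg_per_l * extract_volume_ml / 1000.0  # mL -> L
    sample_mass_kg = sample_mass_g / 1000.0                               # g -> kg
    return analyte_mass_mg / sample_mass_kg


# With 0.3 g extracted into 3 mL, 1 mg/L in the extract corresponds to about 10 mg/kg
print(round(extract_to_sample_mg_per_kg(1.0), 3))
```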
A quality control (QC) solution was prepared by mixing 50 µL of each sample extract. This QC was employed to evaluate the repeatability of the method and the robustness of the chemometric results.
Samples were randomly analyzed with the proposed UHPLC-ESI-MS/MS method. Moreover, a QC and an instrumental chromatographic blank of acetonitrile were also injected every ten analyzed samples.

Data analysis
Principal component analysis (PCA) and partial least squares regression-discriminant analysis (PLS-DA) calculations were performed using Stand Alone Chemometrics Software (SOLO) from Eigenvector Research. 29 A detailed description of the theoretical background of these methods can be found elsewhere. 30 Data matrices in both PCA and PLS-DA consisted of the concentration levels of the 36 phenolic and polyphenolic compounds quantified in the set of paprika samples and QCs, whereas the Y-data matrix in PLS-DA defined the membership of each sample in the corresponding class. Data were autoscaled to equalize the influence of major and minor compounds on the descriptive models. Scatter plots of scores and loadings from principal components (PCs), in PCA, and from latent variables (LVs), in PLS-DA, were employed to study the distribution of samples and variables (quantified compounds), revealing patterns that could be correlated to their characteristics.
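The autoscaling step mentioned above (mean-centering each compound and scaling it to unit standard deviation) can be sketched as follows. This is not the SOLO implementation; the function name, the example values, the use of the sample (n-1) standard deviation, and the handling of constant columns are all assumptions made for illustration.

```python
from statistics import mean, stdev


def autoscale(matrix):
    """Mean-center each column and scale it to unit sample standard deviation.

    Rows are samples, columns are compounds.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    scaled_columns = []
    for col in columns:
        m, s = mean(col), stdev(col)
        # Assumption: a constant column (e.g. a compound never detected) is set to zero
        scaled_columns.append([(v - m) / s if s > 0 else 0.0 for v in col])
    return [[scaled_columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


# Hypothetical concentrations (mg/L): a major compound, a minor one, and one not detected
example = [
    [12.0, 0.10, 0.0],
    [15.0, 0.30, 0.0],
    [9.0, 0.20, 0.0],
]
for row in autoscale(example):
    print([round(v, 3) for v in row])
```

After scaling, the major and the minor compound span the same numerical range, which is why neither dominates the PCA or PLS-DA model purely because of its abundance.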

UHPLC chromatographic separation
As commented in the introduction section, one of the objectives of the present work is the development of an LC-MS/MS method for the determination of a total of 36 phenolic and polyphenolic compounds, which belong to different phenolic classes, in paprika samples. The separation of polyphenols and phenolic acids in food products by LC-MS techniques is normally addressed by reversed-phase chromatography under gradient elution conditions using acidified water and methanol or acetonitrile as mobile phase components. 18 For that purpose, as a first attempt in this work, the separation was carried out with an Ascentis Express C18 fused-core (100 x 2.1 mm, 2.7 µm partially porous particle size) column, using water and acetonitrile (both acidified with 0.1% formic acid) as mobile phase components, and applying a universal gradient elution profile from 0 to 90% acetonitrile in 25 min. Under these conditions, multiple co-elutions were observed, and almost all the analyzed compounds eluted within the first 5 min, showing that, when acetonitrile was used as organic mobile phase modifier, low eluotropic strength was needed for the elution of this family of compounds by reversed-phase chromatography. Therefore, the separation of the studied compounds was optimized by combining isocratic and linear gradient elution steps at low acetonitrile content (between 5 and 50%) to improve separation among the more polar phenolic acids, and then increasing the acetonitrile content to elute all the compounds. It should be noted that, due to the high number of compounds under study, a compromise between chromatographic resolution and analysis time was considered. Figure 1 shows the proposed UHPLC chromatographic separation for the 36 studied phenolic and polyphenolic compounds (see elution program in the instrumentation section). As can be seen, an acceptable chromatographic separation was obtained in less than 26 min, although some partial and total co-elutions were still found for some compounds, such as homovanillic and syringic acids (peaks 11 and 12), p-coumaric acid, (-)-epigallocatechin gallate and syringaldehyde (peaks 16, 17 and 18), and veratric and ferulic acids (peaks 21 and 22). However, the use of MS detection under MRM acquisition mode made it possible to overcome the problems caused by partial and total co-elutions and to correctly determine the studied compounds. In this regard, different SRM transitions were monitored for the co-eluting compounds and no ion-suppression effects within ESI were present (this will be addressed in the next section).

UHPLC-ESI-MS/MS acquisition conditions
The ionization of the studied compounds under H-ESI conditions was thoroughly investigated. First, ion source parameters were tuned to generate the highest number of ions and to improve the obtained signal. For that purpose, these parameters were optimized by infusion of 100 mg/L standard solutions of each of the studied compounds at a flow rate of 15 µL/min, using the syringe pump integrated in the TSQ QqQ instrument, mixed with 200 µL/min of a water/acetonitrile (1:1 v/v) solution acidified with 0.1% formic acid by means of a Valco zero dead volume tee piece from Supelco. Then, for each of the indicated ion source parameters, the optimal value was selected as the one providing the highest signal for most of the studied compounds (see instrumentation section). In contrast, a specific ESI tube lens offset voltage was selected for each compound, and the optimal values obtained are summarized in Table 1.
Full scan MS spectra (m/z 50-1000) of individual solutions of all the studied compounds in negative ionization mode were also registered. As an example, Figure S1a (Supporting information) shows the obtained MS spectra of syringaldehyde and ethyl gallate. As can be seen, the most abundant ion (base peak) in both spectra is the deprotonated molecule, [M-H]⁻, at m/z 181.1 and 197.2 for syringaldehyde and ethyl gallate, respectively. Similar results were obtained for most of the studied phenolic and polyphenolic compounds, with the deprotonated molecule as the spectrum base peak.
Moreover, adduct formation with the mobile phase components was not observed. In general, no in-source fragmentation was observed, except for some particular compounds. For instance, in the case of polydatin the spectrum base peak was not the deprotonated molecule but the [M-H-C6H10O5]⁻ ion at m/z 227.0, although the [M-H]⁻ ion was also very abundant. In the case of syringaldehyde (Figure S1a), and gentisic and 4-hydroxybenzoic acids, ion source fragmentations with relative intensities lower than 40% and 60%, respectively, were observed. Finally, it should be mentioned that in most of the MS spectra obtained, a signal at m/z 91.2 was also observed due to the dimer formation of the formic acid present in the mobile phase ([HCOOH-HCOO]⁻). After the study of the MS spectra, the deprotonated ion was then proposed as the precursor ion for the further fragmentation studies (Table 1).
Fragmentation of the phenolic and polyphenolic compounds under study in the QqQ mass analyzer was also evaluated under tandem MS conditions. As an example, Figures S1b and S1c (Supporting information) show the normalized collision energy curves and the product ion scan spectra, respectively, for syringaldehyde and ethyl gallate. The two most intense and characteristic product ions of each compound were selected for the quantifier and qualifier SRM transitions, and they are summarized in Table 1, together with the optimal NCE for each SRM transition and the quantifier/qualifier ion ratio. As can be seen in the table, all the compounds with partial or total co-elution in the chromatographic separation previously commented (Figure 1) showed different precursor-product ion transitions for both quantifier and qualifier ions.
In addition, the ion-suppression effect in the ESI source for those co-eluting compounds was evaluated by comparing their signal when analyzed individually and under co-elution conditions at the same concentration level. In all cases, ion-suppression was lower than 10%, in agreement with previously reported studies. 31 Therefore, baseline chromatographic separation is not mandatory because these co-elutions can be selectively resolved by tandem MS using the appropriate SRM transitions.
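The comparison described above can be expressed as a percentage signal loss. The following sketch assumes that ion suppression is computed from peak areas as the relative decrease of the signal under co-elution; the text does not give the exact formula, and the function name and peak areas are hypothetical.

```python
def ion_suppression_percent(area_individual, area_coelution):
    """Percentage signal loss under co-elution relative to the individual injection.

    Assumed definition: 100 * (1 - area_coelution / area_individual).
    """
    if area_individual <= 0:
        raise ValueError("individual peak area must be positive")
    return 100.0 * (1.0 - area_coelution / area_individual)


# Hypothetical peak areas for one compound at the same concentration level
suppression = ion_suppression_percent(area_individual=1000.0, area_coelution=930.0)
print(f"ion suppression: {suppression:.1f} %")
print("within the 10% figure reported above:", suppression < 10.0)
```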

Instrumental method performance
Method performance was evaluated from instrumental quality parameters such as limits of detection, limits of quantitation, linearity, run-to-run and day-to-day precision, and trueness. The results obtained for the 36 phenolic and polyphenolic compounds determined are summarized in Table 2.
Limits of detection (LODs), based on a signal-to-noise ratio of 3:1, were assessed by analyzing standard solutions at low concentration levels, obtaining values in a wide range depending on the compound (from 0.01 µg/L for D-(-)-quinic acid to 1.4 mg/L for kaempferol). Limits of quantitation (LOQs), based on a signal-to-noise ratio of 10:1, in the range from 0.03 µg/L to 4.5 mg/L were then established. Of those, seven compounds showed LOQ values equal to or below 1 µg/L, nine compounds in the range 1 provided LOQ values higher than 1 mg/L. Taking into account that these compounds are naturally occurring secondary metabolites in plant-based products, and the huge variety of compounds and concentration levels that can be found (usually at the relatively low to high mg/L level), these values are acceptable for the quantitation of this family of compounds in paprika samples.
External calibration curves based on peak area were established using phenolic and polyphenolic standards prepared in water at concentrations from above the LOQ to 15 mg/L. Very good linearities, with correlation coefficients (R²) higher than 0.995, were obtained.
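External calibration of this kind amounts to an ordinary least-squares line of peak area against concentration, followed by inversion of that line for unknowns. The sketch below is a generic illustration rather than the procedure implemented in Xcalibur: the function names and the standard data are hypothetical, and an unweighted fit is assumed.

```python
def fit_calibration(conc, area):
    """Unweighted least-squares fit area = slope * conc + intercept; also returns R^2."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, area))
    ss_tot = sum((y - mean_y) ** 2 for y in area)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to obtain a concentration from a peak area."""
    return (peak_area - intercept) / slope


# Hypothetical standards (mg/L) and peak areas lying exactly on a straight line
standards = [0.05, 0.5, 1.0, 5.0, 10.0, 15.0]
areas = [2000.0 * c + 100.0 for c in standards]

slope, intercept, r2 = fit_calibration(standards, areas)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R2={r2:.4f}")
print(f"unknown with area 10100 -> {quantify(10100.0, slope, intercept):.2f} mg/L")
```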
Run-to-run and day-to-day precisions for compound quantification were also calculated at four concentration levels (5 µg/L, 50 µg/L, 500 µg/L and 10 mg/L) and the results are also given in Table 2. In the case of run-to-run precision, five replicate determinations for each concentration level were performed within the same day. For day-to-day precision, 15 replicate determinations at each concentration level were carried out on three non-consecutive days (five replicate determinations each day).
In general, run-to-run precisions below 12.3%, expressed as percentage relative standard deviation (%RSD), were obtained in all cases. As expected, better precisions were achieved at the highest concentration level evaluated (10 mg/L), with RSD values in the range 0.2-4.4% (for 33 compounds), and only asiatic and betulinic acids showed higher RSD values (6.7% and 7.3%, respectively). Precision slightly worsened at lower concentrations for those compounds that were still detected under the selected conditions, but the figures of merit were very acceptable, with values below 5.9%, 9.9% and 12.3% for the 500, 50 and 5 µg/L concentration levels, respectively. RSD values slightly increased when calculating day-to-day precisions, as expected. Nevertheless, RSD values below 13.2%, 8.6%, 15.9% and 19.2% were obtained for the 10 mg/L, 500 µg/L, 50 µg/L and 5 µg/L concentration levels, respectively, which is quite acceptable taking into consideration the evaluated concentration levels and the methodology employed.
Method trueness was also evaluated at the four concentration levels by comparing the spiked concentrations with those calculated by external calibration using standards prepared in water. Relative errors lower than 8.2%, 12.6%, 15.0% and 13.3% for the 10 mg/L, 500 µg/L, 50 µg/L and 5 µg/L concentration levels, respectively, were obtained.
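The two figures of merit used above, %RSD for precision and relative error for trueness, reduce to short formulas. The sketch below uses hypothetical replicate values and function names; taking the absolute value of the relative error is an assumption, since the text does not state whether signed errors were reported.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Percentage relative standard deviation of replicate determinations."""
    return 100.0 * stdev(values) / mean(values)


def relative_error_percent(measured, spiked):
    """Absolute relative error (%) of a measured value against the spiked concentration."""
    return 100.0 * abs(measured - spiked) / spiked


# Hypothetical run-to-run data: five replicates of a 50 µg/L standard
replicates = [48.0, 50.0, 52.0, 49.0, 51.0]
print(f"run-to-run precision: {percent_rsd(replicates):.2f} %RSD")

# Hypothetical trueness check: 46 µg/L found for a 50 µg/L spike
print(f"relative error: {relative_error_percent(46.0, 50.0):.1f} %")
```

For day-to-day precision the same `percent_rsd` function would be applied to the pooled 15 determinations obtained over the three days.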
The results showed that the proposed UHPLC-ESI-MS/MS method was very satisfactory in terms of sensitivity, precision and trueness for the determination of the 36 studied phenolic and polyphenolic compounds at the expected concentration levels.

Sample analysis
The applicability of the proposed UHPLC-ESI-MS/MS method for the determination of the 36 studied compounds in paprika was evaluated. Paprika samples were extracted by solid-liquid extraction with water:acetonitrile (20:80 v/v) as described in the experimental section. The obtained extracts were then analyzed in triplicate with the proposed analytical method and targeted compounds were quantified by external calibration. Quantitation results for all the 111 paprika samples analyzed are provided in the Supporting information (Phenolic and Polyphenolic concentration in Analysed Paprika Samples.xlxs). As an overview, Table 3 shows, for each compound, the concentration ranges and the mean values ± standard deviations found in the analyzed paprika samples depending on the production region (La Vera PDO, Murcia PDO and Czech Republic) and the paprika flavors. Gallic acid, quercetin and kaempferol were always detected below the LOQ value. Sixteen of the studied compounds (D-(-)-quinic acid, arbutin, 4-hydroxybenzoic acid, gentisic acid, (+)-catechin, syringic acid, (-)-epicatechin, ethyl gallate, (-)-epigallocatechin gallate, procyanidin C1, veratric acid, polydatin, procyanidin A2, fisetin, morin and asiatic acid) were not detected in any of the 111 paprika samples (these compounds were not included in Table 3).
Data were first analyzed with univariate methods to try to recognize some tentative biomarkers of the different paprika types. The average concentrations and boxplots comparing the three geographical origins and/or the flavor varieties suggested that some compounds were up- or down-expressed depending on the classes. Some representative examples are given in the boxplots with whiskers of Figure S2 (Supporting information), including model compounds much more abundant in one of the classes and others quite homogeneously distributed.
In more detail, some compounds were only found in some specific paprika samples depending on the production region, so they could be considered as putative markers with high selectivity with respect to origin. For example, homogentisic acid was only detected in Czech Republic samples, although always below the LOQ. Umbelliferone was only found, at low concentrations, in the spicy flavor paprika from Czech Republic, while betulinic acid was only found in La Vera PDO samples.
Other general patterns were extracted concerning non-selective compounds. For instance, homoplantaginin, rosmarinic acid and nepetin-7-glucoside exhibited concentrations 3- to 10-fold higher in Czech Republic samples compared to the other origins. A similar trend was found with hydroxycinnamic acids, also more abundant in Czech Republic paprika. For La Vera PDO, homovanillic acid and, especially, syringaldehyde were quite characteristic. In contrast, no unique or featured molecules were encountered for Murcia samples, which displayed, in general, intermediate concentration values between La Vera and Czech Republic. As an example, Figure S3 (Supporting information) depicts bar plots showing the distribution of three selected compounds (syringaldehyde, rutin, and nepetin-7-glucoside) in the analyzed paprika samples. It can be seen that rutin shows quite similar levels within all the paprika samples. In contrast, as commented above, syringaldehyde and nepetin-7-glucoside are more characteristic of La Vera PDO and Czech Republic samples, respectively. These clear differences in phenolic and polyphenolic distribution and concentrations depending on the region and flavor varieties suggest that polyphenols may be good chemical descriptors to address paprika authentication.
The significance of the differences in the concentration values among classes was evaluated using statistical tests. As a result, most of the previous considerations regarding the occurrence of quite featured compounds of the different classes could be confirmed. The results commented here have been limited to several illustrative cases, since a comprehensive analysis dealing with all variables would be excessive. The data given below correspond to the probabilities (p values) of Student's t-test for the comparison of the means of two classes, preceded by a Fisher test of variances. We assume a confidence level of 0.99, so when p < 0.01 the differences in the analyte concentrations among the classes are significant. Results reveal the existence of several compounds such as syringaldehyde (at least, p < 0.0006), caffeic acid (at least, p < 0.0042) and homoplantaginin (at least, p < 0.0016) with statistically relevant differences in the concentration levels depending on the origin. Other species such as ferulic acid and nepetin-7-glucoside show no significant differences between Murcia and Czech Republic (p = 0.02 and 0.048, respectively). Finally, compounds such as chlorogenic acid are unspecific, so their role in class description and discrimination is quite irrelevant (p = 0.04, 0.04 and 0.91 for La Vera/Murcia, Murcia/Czech Republic and La Vera/Czech Republic, respectively).

PDO authentication
Phenolic and polyphenolic concentration levels found in the analyzed paprika samples were evaluated as potential chemical descriptors to address sample classification and authentication. As a first approach, a non-supervised exploratory PCA strategy was employed with the aim of studying the grouping trends among the analyzed samples. A data matrix was built including the 36 compound concentrations found in the 111 paprika samples and the QCs, and was subjected to PCA. Figure 2 shows the score plot of PC1 vs PC2 obtained. As can be seen, QCs appeared grouped and located close to the center area of the plot, showing the good performance and robustness of the proposed method and the chemometric results. QCs appeared distributed in the same area as La Vera PDO paprika samples because the QC composition is dominated by La Vera paprika due to the high number of samples belonging to this group (72 out of 111 paprika samples). Paprika samples were perfectly discriminated by PC1 into three separate groups: La Vera PDO at the left of the score plot, Murcia PDO at the top-right area, and Czech Republic samples at the bottom-right area of the plot. Therefore, concentration levels found with the proposed UHPLC-ESI-MS/MS method are excellent chemical descriptors to achieve sample discrimination regarding the paprika production region. In addition, paprika flavors from Murcia PDO (sweet vs spicy) and from Czech Republic (sweet vs smoked-sweet vs spicy) samples are also perfectly separated, being discriminated by PC2 and by PC1 in the case of Murcia PDO and Czech Republic samples, respectively. In contrast, no discrimination was observed among La Vera PDO paprika flavors (sweet, bittersweet and spicy), and all the samples appeared mixed. As previously commented in the introduction, phenolic and polyphenolic distribution and content in plant-based products may be related to multiple parameters such as climatic conditions, growing areas, water resources, agronomic conditions, etc.
The study of the PCA loadings plot shows which variables (concentrations) define the separation observed in the score plot. Figure S4 (Supporting information) shows the obtained PCA loadings plot of PC1 vs PC2. Thus, the separation of Czech Republic samples is achieved mainly by the presence of homoplantaginin, nepetin-7-glucoside, p-coumaric acid and kaempferol, among other compounds. Chlorogenic acid, rutin and hesperidin are more discriminating compounds for the Murcia PDO samples. In contrast, vanillin, homovanillic acid, syringaldehyde and quercetin seem to be the more characteristic compounds to separate La Vera PDO samples from the other two groups. Although more studies will be necessary, a priori these compounds would be good candidates as potential biomarkers for the authentication of paprika.
A supervised pattern recognition technique such as PLS-DA was used to discriminate paprika according to their geographical and/or botanical origins for authentication purposes. In this case, the X-data matrix was again the concentration of the compounds determined in the studied samples, while the Y-data matrix was the sample class.
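In PLS-DA the class information is usually supplied as a matrix of binary membership indicators, one column per class. The sketch below shows one common way to build such a Y-data matrix; it is an assumption about the encoding, not a description of how SOLO stores it, and the function name and labels are illustrative.

```python
def class_membership_matrix(labels):
    """Build a binary Y matrix: one row per sample, one column per class."""
    classes = sorted(set(labels))
    y = [[1 if label == c else 0 for c in classes] for label in labels]
    return classes, y


# Hypothetical sample labels by production region
labels = ["La Vera", "Murcia", "Czech Republic", "La Vera"]
classes, Y = class_membership_matrix(labels)

print(classes)
for label, row in zip(labels, Y):
    print(f"{label:15s} {row}")
```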
A first study was focused on the classification of paprika samples according to geographical origin into La Vera, Murcia and Czech Republic types. In this case, the calibration set was composed of 48 La Vera, 16 Murcia and 10 Czech Republic samples randomly selected, which approximately corresponded to 70% of the analyzed samples. The other ca. 30% of the samples was used as the test set for prediction purposes. The optimum number of LVs established by cross validation using Venetian blinds was 4, providing the minimum of the root mean square error of cross validation (RMSECV) function. The analysis of scores and loadings of LV1 vs LV2 (not shown here) revealed that the three classes were perfectly separated and that the relevant compounds for their discrimination were similar to those annotated for PCA.
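In Venetian blinds cross validation, every s-th sample is left out in turn, so each sample is predicted exactly once. The following sketch illustrates how the folds can be generated and how a root mean square error is computed from the held-out predictions. The number of splits (10) and the blind thickness of one sample are assumptions, since the text does not report them, and the function names are hypothetical.

```python
def venetian_blinds(n_samples, n_splits):
    """Return (train_indices, test_indices) pairs; fold k holds out samples k, k+s, k+2s, ..."""
    folds = []
    for k in range(n_splits):
        test_idx = list(range(k, n_samples, n_splits))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        folds.append((train_idx, test_idx))
    return folds


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# 74 calibration samples (48 La Vera + 16 Murcia + 10 Czech Republic); 10 splits assumed
folds = venetian_blinds(n_samples=74, n_splits=10)
print("first fold leaves out:", folds[0][1])
print("samples held out per fold:", [len(test) for _, test in folds])
```

RMSECV for a given number of LVs would be obtained by fitting the model on each training part, predicting the held-out part, and applying `rmse` to the pooled predictions; the number of LVs giving the minimum is then retained.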
Figure S5 (Supporting information) shows the plots of the qualitative parameters (regression vector, variable importance in projection (VIP) and selectivity ratio) for the previously obtained PLS-DA model. These parameters indicate which variables (compounds) are more discriminant in achieving the obtained PLS-DA distribution. As can be seen, homovanillic acid and syringaldehyde are the compounds appearing as the most important variables in the three qualitative parameters, therefore being the two most relevant compounds for the PLS-DA classification when dealing with the paprika production region.
Figure 3 shows the classification plots corresponding to (a) La Vera (rhombus symbols) vs the other samples, (b) Murcia (square symbols) vs the other samples, and (c) Czech Republic (triangle symbols) vs the other samples. The dashed line indicates the classification boundary, so samples belonging to the targeted class are located at the top while those belonging to the other types are at the bottom. Samples used for calibration are on the left side and those used for prediction are on the right side. Results indicated that the classification rate was 100%, so all the samples were correctly assigned to the corresponding classes in both calibration and prediction steps (the confusion matrix for prediction was [24, 0, 0; 0, 8, 0; 0, 0, 5] for La Vera, Murcia and Czech Republic, respectively).
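The classification rate, sensitivity and specificity follow directly from the confusion matrix quoted above. The sketch below assumes that rows are true classes and columns are predicted classes; for this diagonal matrix the orientation does not affect the result. The function name is hypothetical.

```python
def classification_metrics(cm, class_names):
    """Overall classification rate plus per-class sensitivity and specificity."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    per_class = {}
    for i, name in enumerate(class_names):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                                 # class i predicted as another class
        fp = sum(cm[r][i] for r in range(len(cm))) - tp      # other classes predicted as i
        tn = total - tp - fn - fp
        per_class[name] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return correct / total, per_class


# Prediction-set confusion matrix reported in the text (rows assumed to be true classes)
confusion = [[24, 0, 0], [0, 8, 0], [0, 0, 5]]
rate, per_class = classification_metrics(confusion, ["La Vera", "Murcia", "Czech Republic"])

print(f"classification rate: {100 * rate:.0f}%")
for name, values in per_class.items():
    print(name, values)
```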
Table S2 (Supporting information) shows the validation results for both calibration and prediction. The validation results are satisfactory: calibration sensitivity and specificity are 1, and the root mean square error of calibration (RMSEC) and the bias tend to zero, indicating a good calibration model.
PLS-DA models were also applied to each paprika production region in order to study the classification of samples according to flavor variety, and the results are shown in Figure S6 (Supporting information). A total of 4, 2 and 2 LVs were needed to build the models for La Vera, Murcia and Czech Republic samples, respectively. As can be seen, again no discrimination was observed among the different La Vera PDO paprika samples, showing that the distribution and content of the targeted compounds in La Vera samples are not sufficient to discriminate between sweet, bittersweet and spicy samples. In contrast, perfect discrimination among flavor varieties was obtained for both Murcia PDO and Czech Republic paprika samples. Based on the qualitative parameters (regression vector, VIP and selectivity ratio) for the PLS-DA models applied to Murcia PDO and Czech Republic samples (Figure S7 in the Supporting information), compounds such as vanillin, kaempferol and p-coumaric acid seem to be important for the discrimination of Murcia PDO flavor varieties, and others such as rutin, hesperidin and chlorogenic acid also play an important role. In the case of Czech Republic samples, nepetin-7-glucoside seems to be the most important compound for discriminating among the three flavor varieties under study, together with other compounds such as rutin, hesperidin and p-coumaric acid.
In this work, and for the first time, a large number of phenolic and polyphenolic compounds belonging to different families were determined in a large set of Spanish paprika samples with PDO attributes. Knowing the distribution and levels of these antioxidant compounds in PDO paprika samples is important because it adds value to the agricultural practices and regions producing paprika. In addition, the results obtained in this work demonstrate that the phenolic and polyphenolic profiles and contents obtained by the proposed UHPLC-ESI-MS/MS method after a very simple sample extraction can be employed as good chemical descriptors for the characterization and classification of paprika samples. These compounds also proved very useful for the discrimination of flavor varieties in the case of Murcia PDO and Czech Republic paprika samples.
Finally, several compounds proved to be important for sample classification by PCA and PLS-DA, and could be considered as potential biomarkers for paprika authentication.
5 kV; H-ESI vaporizer temperature, 350 °C; ion transfer tube temperature, 350 °C. For compound quantitation and confirmation, multiple reaction monitoring (MRM) acquisition mode by recording two selected reaction monitoring (SRM) transitions (quantifier and qualifier transitions)

Figure captions

Figure 2. PCA score plot of PC1 vs PC2 when using the 36 compound concentrations found in the analyzed paprika samples as chemical descriptors.

Figure 3. PLS-DA classification plots according to the production region. (a) La Vera versus other classes; (b) Murcia versus other classes; (c) Czech Republic versus other classes. Sample assignment: rhombus = La Vera PDO, square = Murcia, triangle = Czech Republic. The dashed line marks the classification boundary.

Figure S5. Plots of La Vera PDO class qualitative parameters (regression vector, variable importance in projection (VIP) and selectivity ratio) for the PLS-DA model obtained for the classification of paprika samples according to the production region (La Vera PDO, Murcia PDO and Czech Republic).

Figure S6. PLS-DA score plots of LV1 vs LV2 when using the 36 compound concentrations as chemical descriptors for the classification of the samples from each production region (La Vera PDO, Murcia PDO and Czech Republic) according to their flavor varieties.