Non-targeted metabolomic biomarkers and metabotypes of type 2 diabetes: A cross-sectional study of PREDIMED trial participants

Aim. – To characterize the urinary metabolomic fingerprint and multi-metabolite signature associated with type 2 diabetes (T2D), and to classify the population into metabotypes related to T2D. Methods. – A metabolomics analysis using a 1H-NMR-based, non-targeted metabolomic approach was conducted to determine the urinary metabolomic fingerprint of T2D compared with non-T2D participants in the PREDIMED trial. The discriminant metabolite fingerprint was subjected to logistic regression analysis and ROC analyses to establish and to assess the multi-metabolite signature of T2D prevalence, respectively. Metabotypes associated with T2D were identified using the k-means algorithm. multi-metabolite guanidoacetate, methylguanidine, hydro- acetoacetate

Urine samples were thawed at 4 °C and gently vortexed before metabolomic analysis, using a procedure based on a previously published methodology [17]. Briefly, 300 µL of urine were diluted in 200 µL of H2O/D2O (8:2 ratio) and mixed with an internal standard solution [0.1% chemical-shift reference 3-(trimethylsilyl)propionic-2,2,3,3-d4 acid sodium salt (TSP), 2 mM of sodium azide (NaN3) and 1.5 M of KH2PO4 in 99% deuterated water (D2O)]; the pH was set at 7.0 with a KOD solution. The 1H-NMR experiments were conducted using a 500-MHz spectrometer (Varian INOVA; Varian Medical Systems, Palo Alto, CA, USA), with presaturation of water resonance using a nuclear Overhauser enhancement spectroscopy (NOESY)-presat pulse sequence. Internal temperature was kept constant at 298 K during acquisition. Spectra were acquired by collecting 128 scans at 32-K datapoints with a spectral width of 14 ppm, acquisition time of 2 s, relaxation delay of 5 s and mixing time of 100 ms. For spectral processing, the free induction decay (FID) was multiplied by an exponential function corresponding to a 0.3-Hz line broadening before Fourier transformation. All spectra were phased, baseline-corrected and referenced to TSP (δ 0.0) using TopSpin version 3.2 software (Bruker BioSpin GmbH, Rheinstetten, Germany). The spectral data were processed, using an intelligent bucketing algorithm, in domains of 0.005 ppm [17] and integrated using ACD/NMR Processor 12.0 software (Advanced Chemistry Development, Inc., Toronto, ON, Canada). The spectral region 4.75-5.00 ppm was excluded from the dataset to avoid spectral interference from residual water.
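The processing steps described above can be illustrated with a minimal sketch. It is not the TopSpin or ACD/NMR Processor implementation: the function names are hypothetical, the apodization assumes the common exp(-π·LB·t) form for a line broadening of LB Hz, and the bucketing uses fixed 0.005-ppm bins as a simplification of the intelligent (adaptive) bucketing actually used.

```python
import math


def apply_line_broadening(fid, acquisition_time_s, lb_hz=0.3):
    """Multiply a FID by exp(-pi * LB * t) (assumed exponential apodization)."""
    dwell = acquisition_time_s / len(fid)  # time between consecutive points
    return [x * math.exp(-math.pi * lb_hz * i * dwell) for i, x in enumerate(fid)]


def bucket_spectrum(ppm, intensity, width=0.005, exclude=(4.75, 5.00)):
    """Sum intensities into fixed-width ppm buckets, skipping the water region.

    Fixed-width binning is a simplification; intelligent bucketing shifts
    bucket edges to local minima, which is not attempted here.
    """
    low, high = exclude
    buckets = {}
    for shift, value in zip(ppm, intensity):
        if low <= shift <= high:
            continue  # residual-water region is left out of the dataset
        index = math.floor(shift / width + 1e-9)  # small guard against float error
        buckets[index] = buckets.get(index, 0.0) + value
    # Key each bucket by its lower edge in ppm
    return {round(i * width, 3): total for i, total in sorted(buckets.items())}
```

Each bucket integral then becomes one variable of the dataset passed on to the statistical analysis.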

Statistical analysis
A dataset containing integrals of NMR spectra was imported into MetaboAnalyst 3.0, a web-based platform for extensive analysis of metabolomic data [18], filtered using the interquartile range (IQR) and row-wise normalized by the sum of the spectral intensities. The normalized dataset was then imported into SIMCA-P+ 13.0 (Umetrics, Umeå, Sweden), where it was log-transformed and range-scaled before a principal component analysis was performed to explore data distribution [17]. To reduce variability not associated with T2D classification, orthogonal signal correction (OSC) was applied to the dataset followed by partial least squares discriminant analysis (PLS-DA) to determine differences in metabolite profiling between the T2D and non-T2D groups. The predictive ability of the OSC-[...] removed from the whole dataset (training set), and the OSC-PLS-DA models calculated. This procedure was repeated five times, and was used to evaluate the ability of the models to classify prediction sets, and to calculate quality parameters of the method and the misclassification [...]

Multi-metabolite signature model for T2D prevalence
The results obtained by OSC-PLS-DA analysis were subjected to forward conditional stepwise logistic regression analysis to design a multi-metabolite signature model of T2D prevalence. The prediction model was applied to a training set (two-thirds of participants) and subsequently validated against a validation set (one-third of participants). Quality of the models was evaluated by calculating the sensitivity, specificity and area under the receiver operating characteristic curves (AUROCs). Urinary glucose was not included in this analysis because of its high AUROC. The optimal cut-off for calculating sensitivity and specificity was determined as the minimum distance to the top left-hand corner [21]. Significance was set at P < 0.05. IBM SPSS version 21 statistical software (IBM Corp) was used to perform the logistic regression and ROC analyses.
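The cut-off rule mentioned above (minimum distance to the top left-hand corner of the ROC plot) can be sketched as follows. This is an illustrative reimplementation, not the SPSS procedure: the function names are hypothetical, a participant is assumed to be classified as T2D when the model score is at or above the cut-off, and labels are coded 1 for T2D and 0 for non-T2D.

```python
import math


def roc_points(scores, labels):
    """Return (cut-off, sensitivity, specificity) for every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, true_pos / positives, true_neg / negatives))
    return points


def closest_to_top_left(points):
    """Pick the cut-off minimizing sqrt((1 - sens)^2 + (1 - spec)^2)."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))
```

The top left-hand corner corresponds to sensitivity = 1 and specificity = 1, so the selected cut-off is the one whose ROC point lies nearest to perfect classification.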

Metabolic phenotypes by k-means algorithm
Cluster analysis to identify metabolic phenotypes, or metabotypes, was performed using the k-means cluster algorithm in MetaboAnalyst 3.0 [12,22]. Taking as inputs the metabolites identified by the OSC-PLS-DA analysis, this generated two clusters in the diabetes patients and two clusters in the non-diabetic participants [12]. After k-means analysis, the results for the four clusters were visualized using hierarchical clustering analysis.
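A minimal sketch of this clustering step is given below, assuming a plain Lloyd-type k-means run separately within the T2D and non-T2D groups with k = 2. It does not reproduce MetaboAnalyst's implementation: the function names are hypothetical, the initial centroids are simply the first k samples (a simplification), and which cluster receives which number within a group is arbitrary.

```python
def kmeans(points, k=2, max_iter=100):
    """Plain Lloyd k-means; initial centroids are the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every sample to its nearest centroid (squared Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move every centroid to the mean of its current members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def metabotypes(profiles, has_t2d):
    """Cluster T2D samples into clusters 1-2 and non-T2D samples into clusters 3-4."""
    result = [None] * len(profiles)
    for group_flag, offset in ((True, 1), (False, 3)):
        index = [i for i, flag in enumerate(has_t2d) if flag == group_flag]
        labels, _ = kmeans([profiles[i] for i in index], k=2)
        for i, lab in zip(index, labels):
            result[i] = lab + offset
    return result
```

Here each entry of `profiles` would hold the scaled levels of the discriminant metabolites for one participant.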

Results
Subjects' characteristics
Our participants were 67 ± 6 years old and nearly one-third were male (Table 1). Also, 55% of participants had T2D and 47% were obese. They were divided according to T2D diagnosis, as previously reported [14,16]. Both groups (T2D and non-T2D) were well balanced in terms of demographic characteristics and other cardiovascular risk factors, such as blood pressure, plasma lipids, and antihypertensive and hypolipidaemic medications (P > 0.05). Otherwise, measures of waist circumference, plasma glucose and use of antidiabetic agents were significantly higher in the T2D patients, as expected.

[...] were calculated from the OSC-PLS-DA models when samples were predicted (n = 5); these values were then included in a misclassification table (Table S1; see supplementary materials associated with this article online). Thus, t-test analyses among variables with VIP > 1.0 identified 33 metabolites that were significantly different between the T2D and non-T2D participants (Table 2). Of these metabolites, 17 were significantly increased in T2D patients compared with non-T2D subjects, while the remaining 16 metabolites were decreased in T2D patients. In addition, the metabolic fingerprint associated with T2D was found to be independent of waist circumference, except for 4-deoxythreonic acid and citrate (P = 0.11), and 3-hydroxybutyrate (3HB) (P = 0.062; Table 2). Furthermore, no statistical differences were observed in levels of metabolites among T2D patients whether taking drug treatment or not (data not shown). A comprehensive analysis of the metabolic pathways (P and impact values) revealed that the carbohydrate and amino-acid pathways were the most altered among T2D patients (Fig. S2, Table S2; see supplementary materials associated with this article online). The metabolites involved in these pathways can be up- and downregulated (Fig. S3; see supplementary materials associated with this article online), and each metabolite is related to its own pathway (Table S3; see supplementary materials associated with this article online).

Profiles of discriminant metabolites of T2D biomarkers by 1H-NMR metabolomics
Multi-metabolite signature of T2D prevalence
The multi-metabolite signature that best discriminated T2D prevalence included higher levels of methylsuccinate, alanine, dimethylglycine and guanidinoacetate, as well as lower levels of glutamine, methylguanidine, 3-hydroxymandelate and hippurate (Table S4; see supplementary materials associated with this article online). In the validation set, the specificity and sensitivity of the multi-metabolite signature were 87.0% and 96.4%, respectively, while the AUROC was 96.4% (95% CI: 92.0-100%; P < 0.001). However, the specificity, sensitivity and AUROC values of each individual metabolite as well as urinary glucose were lower than those of the multi-metabolite signature (Fig. 1, Table 3).
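To make the AUROC figures above easier to interpret, the sketch below computes an AUROC as the probability that a randomly chosen T2D participant receives a higher score than a randomly chosen non-T2D participant, counting ties as one half. The function name and the toy scores in the usage note are hypothetical; this is not the SPSS computation and no study data are used.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (T2D, non-T2D) pairs ranked correctly.

    Labels are assumed to be 1 for T2D and 0 for non-T2D; ties count 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))
```

For example, with toy scores in which a single non-T2D participant outranks a single T2D participant among 20 possible pairs, the function returns 19/20.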

Characterization of metabotypes
Unsupervised analysis of k-means gave two metabotypes of diabetes participants and two metabotypes of non-diabetes participants (Table S5; see supplementary materials associated with this article online) from data for the 33 identified metabolites. After determining those four metabotypes, the results were visualized using hierarchical clustering (heatmap) analysis (Fig. 2), where samples/individuals are shown on the x-axis and metabolites are displayed on the y-axis. Most of the up- and downregulated metabolites observed in the clusters were similar to those reported in Table 2 for T2D and non-T2D participants except for four metabolites: acetoacetate (AA); p-cresol; phenylalanine; and phenylacetylglutamine (PAG). Levels of these four metabolites were significantly higher in clusters 2 and 3 than in clusters 1 and 4 (Fig. 2; P < 0.05) on stratifying the entire cohort, and were orthogonal to T2D. Thus, the two metabotypes of T2D (clusters 1 and 2) and two metabotypes of non-T2D (clusters 3 and 4) differed in these four metabolites. Cluster differences for subjects' characteristics, concentrations of biochemical parameters and use of medication are presented in Table S5 (see supplementary materials associated with this article online). The main difference was that cluster 2, followed by cluster 1, had the highest plasma glucose levels, and both were significantly different from clusters 3 and 4 (P < 0.001). As expected, the use of insulin and oral antidiabetic agents was significantly different between T2D and non-T2D participants (P < 0.001), but did not differ between T2D metabotypes (P = 0.20).

Discussion
The present study found significant differences in the profile of 33 urinary metabolites between T2D and non-T2D participants, using a 1H-NMR-based, non-targeted metabolomic approach. Specifically, a model of eight metabolites was the multi-metabolite signature that discriminated between T2D and non-T2D after stepwise logistic regression analysis and AUROC evaluation. To the best of our knowledge, this was the first-ever study to use spot urine to determine the pathways altered in T2D in a free-living population, along with identifying a multi-metabolite signature of T2D prevalence while highlighting the key metabolites involved. This metabolomic clinical study also confirms the associated perturbations of amino-acid metabolism, with some amino acids being used as substrates for gluconeogenesis. In addition, the increased excretion of amino acids could indicate [...] findings that a microbiota imbalance could be key in the pathogenesis of a diabetic state and that healthy diets and/or lifestyle patterns directed towards improving microbiota quality are essential for preventing advanced pathological states [31].

The present study identified two distinct metabotypes in T2D patients (clusters 1 and 2) and two in the non-T2D participants (clusters 3 and 4) using k-means cluster analysis based on their identified metabolic profiles. It should be noted that the metabotype comprising higher levels of four metabolites (phenylalanine, PAG, p-cresol and AA) was found in the entire study population and was orthogonal to T2D. In particular, differences were observed in some parameters between clusters 1 and 2 (T2D patients) whereas no differences were noted between clusters 3 and 4 (non-T2D subjects).
Although the increase in these metabolites was orthogonal to T2D, when the focus was on diabetes patients, those with higher levels of those four metabolites also had higher levels of plasma glucose, but with no differences in use of antidiabetic medications or in other characteristics. Thus, our hypothesis is that the T2D patients in cluster 2 could have had a greater lack of control over their disease which, in the long term, could have led to a greater number of complications such as myocardial infarction, stroke, heart failure and kidney disease [32]. Certainly, phenylalanine has been described as a marker of higher diabetes risk [7,23] and, furthermore, has also been used together with tyrosine and isoleucine to predict long-term future cardiovascular events, an increased disposition towards atherosclerosis and perhaps even inducible myocardial ischaemia [33]. In addition, phenylalanine has been identified as a biomarker associated with future cardiovascular events in meta-analyses [34]. PAG and p-cresol are metabolites of microbial origin [35]. PAG comes from the conversion of phenylalanine to phenylacetate by microbiota and its subsequent conjugation with glutamine [36]; and p-cresol, the most widely studied uraemic retention solute, is formed by microbial metabolism of tyrosine [36]. PAG has been described as a strong independent risk factor for mortality and CVD in patients with chronic kidney disease [35], while p-cresol has been described as a predictor of cardiovascular events independent of GFR in patients with mild-to-moderate kidney disease [37]. The fourth metabolite that differed between T2D clusters was the ketone body AA. This is generated from the ketogenic amino-acid lysine and may also be derived from β-oxidation of fatty acids. AA and 3HB are at a ratio of 1:1 in a physiological state, although 3HB increases its excretion in ketoacidosis [38].
Recent evidence has highlighted the association between elevated levels of ketone bodies and hyperglycaemia and T2D [39]. It is also worth noting that the T2D patients in clusters 1 and 2 had similar mean ratios of AA:3HB (1:2), whereas clusters 3 and 4 (non-T2D) had mean ratios of 1:1 (albeit not statistically significant). However, there were statistically significant differences (P = 0.007) between ratios in T2D (1:2) vs non-T2D (1:1) participants. Both hyperketonaemia and ketosis have been related to liver, brain and microvasculature complications, which can increase the risk of morbidity and mortality [40]. Therefore, the subjects in clusters 2 and 3 with increased levels of these four metabolites could have higher risks of CVD and other such events in future. Thus, further long-term studies should now evaluate these metabolites in such populations.

One limitation of our present study is that the panel of metabolites and the model used for the multi-metabolite signature imprinting of T2D were obtained from a high-cardiovascular-risk population, and so need to be validated and replicated in other populations. In addition, the metabolite panel should also be tested in patients with different grades of T2D, including prediabetes states, to determine its limit values for prediction. Moreover, it would be of interest to evaluate whether our metabotypes are modified in states such as prediabetes. Another limitation of our study is that the microbial composition in these participants was unknown, thereby preventing any correlations with the identified metabolites. On the other hand, one strength of our study is that it reproduced the real-life conditions of the participants.
In conclusion, the results of our cross-sectional study using a non-targeted 1H-NMR metabolomics approach reveal a multi-metabolite signature of T2D prevalence comprising eight metabolites belonging to pathways related mainly to glucogenic and ketogenic amino acids, glycolysis and gluconeogenesis, carboxylic acid metabolism and changes in gut microbiota metabolism. This is also the first study to identify metabotypes in T2D, revealing that such patients have higher levels of phenylalanine, PAG, p-cresol and AA (metabolites related to higher risks of long-term cardiovascular events) and also higher levels of plasma glucose. Nevertheless, as these metabolites were orthogonal to T2D, further studies now need to evaluate their long-term effects. In addition, this study reinforces the use of metabolomics to discover and to evaluate the main metabolic pathways altered in T2D and the metabotypes of individuals. Such an approach would therefore be highly useful for investigating T2D diagnosis and treatment and for further supporting the development of stratified and precision medicine.