Predicting clinical outcome with phenotypic clusters in COVID-19 pneumonia: an analysis of 12,066 hospitalized patients from the Spanish registry SEMI-COVID-19.

(1) Background: This study aims to identify different clinical phenotypes in COVID-19 pneumonia using cluster analysis and to assess the prognostic impact among identified clusters in such patients. (2) Methods: Cluster analysis including 11 phenotypic variables was performed in a large cohort of 12,066 COVID-19 patients, collected and followed up from March 1 to July 31, 2020, from the nationwide Spanish SEMI-COVID-19 Registry. (3) Results: Of the total of 12,066 patients included in the study, most were males (7,052, 58.5%) and Caucasian (10,635, 89.5%), with a mean age at diagnosis of 67 years (SD 16). The main pre-admission comorbidities were arterial hypertension (6,030, 50%), hyperlipidemia (4,741, 39.4%) and diabetes mellitus (2,309, 19.2%). The average number of days from COVID-19 symptom onset to hospital admission was 6.7 days (SD 7). The triad of fever, cough, and dyspnea was present almost uniformly in all 4 clinical phenotypes identified by clustering. Cluster C1 (8,737 patients, 72.4%) was the largest, and comprised patients with the triad alone. Cluster C2 (1,196 patients, 9.9%) also presented with ageusia and anosmia; cluster C3 (880 patients, 7.3%) also had arthromyalgia, headache, and sore throat; and cluster C4 (1,253 patients, 10.4%) also manifested with diarrhea, vomiting, and abdominal pain. Among the four clusters, C1 presented the highest in-hospital mortality (C1 24.1% vs. C2 4.3% vs. C3 14.7% vs. C4 18.6%; p<0.001). The multivariate study identified phenotypic clusters as an independent factor for in-hospital death. (4) Conclusion: The present study identified 4 phenotypic clusters in patients with COVID-19 pneumonia, which predicted the in-hospital prognosis of clinical outcomes.


Since January 2020, the COVID-19 pneumonia pandemic has spread across the globe. As of

medRxiv preprint doi: https://doi.org/10.1101/2020.09.14.20193995; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Treatments prescribed
The treatments received were in accordance with the medical guidelines available at the time of the pandemic [5-11]. In the absence of clinical evidence for any of the treatments early in the pandemic, their off-label use was allowed.

The cluster analysis was performed by ascending (agglomerative) hierarchical clustering on the 11 previously selected variables, using Ward's minimum variance method with squared Euclidean distance [12]. Results are graphically depicted by a dendrogram. The number of clusters was estimated by a visual distance criterion of the dendrogram. The cluster analysis model was included in a binary logistic regression, taking the two above-mentioned outcomes as dependent variables. Mortality among the groups was represented by Kaplan-Meier curves, compared with the log-rank test.

Table 1. Patients were mostly males (7,052, 58.5%) and Caucasian (10,635, 89.5%).
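As an illustration of the clustering step described in the methods above, the following is a minimal Python sketch of agglomerative clustering with Ward's minimum variance criterion on squared Euclidean distance. It is not the registry's analysis code. The toy patients are invented, the 11 binary variables are assumed to be the 11 symptoms named in the cluster descriptions, and the number of clusters is fixed at 4 instead of being read off a dendrogram. The names SYMPTOMS, PATIENTS, centroid, ward_cost, and ward_clusters are hypothetical.

```python
from itertools import combinations

# Assumption: the 11 variables are the symptoms named in the cluster descriptions.
SYMPTOMS = ["fever", "cough", "dyspnea", "ageusia", "anosmia", "arthromyalgia",
            "headache", "sore_throat", "diarrhea", "vomiting", "abdominal_pain"]

# Invented toy patients (1 = symptom present), three per symptom pattern.
PATIENTS = [
    (1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),  # triad only
    (1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0),
    (1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
    (1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),  # triad + ageusia/anosmia
    (1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
    (1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0),  # triad + arthromyalgia/headache/sore throat
    (1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0),
    (1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1),  # triad + digestive symptoms
    (1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1),
    (0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1),
]


def centroid(points):
    """Mean vector of a list of equal-length points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(data, k):
    """Merge the cheapest pair (by Ward's criterion) until k clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [data[m] for m in clusters[pair[0]]],
                [data[m] for m in clusters[pair[1]]],
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


for members in ward_clusters(PATIENTS, 4):
    print(members)
```

On this toy data the four symptom patterns come out as separate groups. The naive pairwise search is only practical for small samples; for a cohort of this size an optimized library implementation would be preferable.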

The mean age at diagnosis was 67 years (SD 16). The average number of days from symptom onset to hospital admission was 6.7 days (SD 7). The main pre-admission comorbidities were arterial hypertension (6,030, 50%), hyperlipidemia (4,741, 39.4%) and diabetes mellitus (2,309, 19.2%). The mean Charlson index among patients was 1.2 (SD 1.8). The most common symptoms (Table 2)

The treatments received are shown in Table 4.

Admissions to the ICU numbered 1,120 patients (9.3%). Overall, the mortality rate was 20.9% (2,522 patients). The outcomes are shown in Table 5.

arthromyalgia, headache, and/or sore throat. Finally, the C4 cluster also manifests with digestive symptoms such as diarrhea, vomiting, and/or abdominal pain.

In terms of prognosis, the C1 cluster showed the highest mortality rate (24.1%) in this large Spanish nationwide series. It was followed by C4 (18.6%), C3 (14.7%), and finally C2 (4.3%). The crude survival study identified the C2 cluster as a cluster of good prognosis. The multivariate regression study showed a non-significant trend toward better prognosis. It also identified the C3 cluster as another good-prognosis subgroup, in addition to C2. In contrast, the C1 and C4 clusters were identified as the clusters with the poorest prognosis.
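The reported figures can be checked for internal consistency with a short calculation. The sketch below, in Python, combines the cluster sizes given in the abstract with the mortality rates above. The per-cluster death counts are not reported in this text, so they are approximated by rounding size times rate. The source also does not say which test produced p<0.001, so a Pearson chi-square test on the reconstructed 4x2 table is assumed here. All function names are hypothetical.

```python
import math

# Cluster sizes and in-hospital mortality rates as reported in the text.
CLUSTERS = {
    "C1": (8737, 0.241),
    "C2": (1196, 0.043),
    "C3": (880, 0.147),
    "C4": (1253, 0.186),
}


def approx_deaths(clusters):
    """Approximate death counts, reconstructed by rounding size * rate."""
    return {name: round(n * rate) for name, (n, rate) in clusters.items()}


def chi_square_stat(clusters, deaths):
    """Pearson chi-square statistic for the table cluster x (died, survived)."""
    total_n = sum(n for n, _ in clusters.values())
    total_d = sum(deaths.values())
    stat = 0.0
    for name, (n, _) in clusters.items():
        cells = (
            (deaths[name], total_d),                 # died
            (n - deaths[name], total_n - total_d),   # survived
        )
        for observed, column_total in cells:
            expected = n * column_total / total_n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


deaths = approx_deaths(CLUSTERS)
total_patients = sum(n for n, _ in CLUSTERS.values())
total_deaths = sum(deaths.values())
stat = chi_square_stat(CLUSTERS, deaths)

print("patients:", total_patients)
print("approximate deaths:", total_deaths)
print("overall mortality: %.1f%%" % (100 * total_deaths / total_patients))
print("chi-square (3 df):", round(stat, 1), "p < 0.001:", chi2_sf_df3(stat) < 0.001)
```

The reconstructed counts add up to about 2,519 deaths among 12,066 patients, or 20.9%, which agrees with the overall mortality reported in the results (2,522 patients) up to rounding of the percentages. Under the stated assumptions, the difference between clusters is also significant at the 0.001 level.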

The risk factors recognized so far for poor prognosis have been repeated in several studies. The

As for the generalization of our results, it should be noted that the data come from a developed Western European country with a mostly Caucasian population and little representation of other ethnicities. Furthermore, it should also be taken into account that Spain has a universal-coverage public healthcare system, not comparable with those of some other developed and developing countries. On the other hand, proportionally speaking, Spain has one of the largest elderly populations in the world and, as is well known, age has been described as a fundamental factor in the poor prognosis of

suggested phenotyping as a function of pathophysiology [16,17]. It would be interesting to combine all methods of phenotyping.

We believe that the identification of the present clusters may be of great help to clinicians in identifying those cases with a better or worse prognosis, and thus in directing more individualized therapeutic strategies. In this regard, we also believe that identification of phenotypes can serve as a guide for clinical trials, so that new treatments are not evaluated in COVID-19 patients as a whole, since not all subgroups of COVID-19 patients may benefit from the same therapeutic strategies. On the other hand, drugs previously discarded, but with a rational pathophysiological basis to be tested, should be reanalyzed to clarify their real efficacy, taking into account the varied clinical spectrum of COVID-19 patients.

The main strength of this study is the identification of different phenotypic clusters in COVID-19 pneumonia from a very large sample of more than 12,000 patients from more than 100 hospitals.

Among limitations, data were obtained from a retrospective registry from a single country, which means that some specific data could be missing or collected with some degree of heterogeneity.

In conclusion, the present study identified 4 phenotypic clusters that predicted in-hospital prognosis of clinical outcome in a large nationwide series of patients with COVID-19 pneumonia.

Clusters associated with poor in-hospital prognosis were C1, in which subjects presented with the isolated triad of fever, cough, and dyspnea, and C4, in which subjects also presented with diarrhea, vomiting, and/or abdominal pain. In contrast, subjects grouped in the C2 cluster (who also presented with ageusia and/or anosmia) showed the best prognosis, followed by those in cluster C3 (who also presented with arthromyalgia, headache, and/or sore throat), which was second only to C2 in showing a good outcome.

We gratefully acknowledge all the investigators who participate in the SEMI-COVID-19 Registry.

We also thank the SEMI-COVID-19 Registry Coordinating Center, S&H Medical Science Service, for

This preprint is made available under a CC-BY-NC 4.0 International license.
H. Virgen de la Salud. Toledo
