New Sepsis Definition (Sepsis‐3) and Community‐acquired Pneumonia Mortality. A Validation and Clinical Decision‐Making Study

Rationale: The Sepsis‐3 Task Force updated the clinical criteria for sepsis, excluding the need for systemic inflammatory response syndrome (SIRS) criteria. The clinical implications of the proposed flowchart including the quick Sequential (Sepsis‐related) Organ Failure Assessment (qSOFA) and SOFA scores are unknown. Objectives: To perform a clinical decision‐making analysis of Sepsis‐3 in patients with community‐acquired pneumonia. Methods: This was a cohort study including adult patients with community‐acquired pneumonia from two Spanish university hospitals. SIRS, qSOFA, the Confusion, Respiratory Rate and Blood Pressure (CRB) score, modified SOFA (mSOFA), the Confusion, Urea, Respiratory Rate, Blood Pressure and Age (CURB‐65) score, and Pneumonia Severity Index (PSI) were calculated with data from the emergency department. We used decision‐curve analysis to evaluate the clinical usefulness of each score and the primary outcome was in‐hospital mortality. Measurements and Main Results: Of 6,874 patients, 442 (6.4%) died in‐hospital. SIRS presented the worst discrimination, followed by qSOFA, CRB, mSOFA, CURB‐65, and PSI. Overall, overestimation of in‐hospital mortality and miscalibration was more evident for qSOFA and mSOFA. SIRS had lower net benefit than qSOFA and CRB, significantly increasing the risk of over‐treatment and being comparable with the “treat‐all” strategy. PSI had higher net benefit than mSOFA and CURB‐65 for mortality, whereas mSOFA seemed more applicable when considering mortality/intensive care unit admission. Sepsis‐3 flowchart resulted in better identification of patients at high risk of mortality. Conclusions: qSOFA and CRB outperformed SIRS and presented better clinical usefulness as prompt tools for patients with community‐acquired pneumonia in the emergency department. Among the tools for a comprehensive patient assessment, PSI had the best decision‐aid tool profile.

Community-acquired pneumonia (CAP) represents a significant infection burden worldwide, and it is often complicated by sepsis (1-4). Early recognition of sepsis is fundamental to guide treatment, improve outcomes, and decrease costs (5-7). In contrast, in patients with uncomplicated infection, over-treatment should be avoided to prevent unnecessary harm.
Sepsis is a syndrome characterized by a dysregulated host response to infection leading to life-threatening organ dysfunction (5). In 2016, the Sepsis-3 Task Force updated previous recommendations primarily aiming to accurately differentiate between sepsis and uncomplicated infection (5). By applying a data-driven approach to identify patients at risk of worse outcomes, the Task Force proposed a new clinical definition, removing the need for systemic inflammatory response syndrome (SIRS) criteria. Thus, in infected patients, sepsis was clinically defined by an increase in Sequential (Sepsis-related) Organ Failure Assessment (SOFA) score of two points or more. Additionally, a bedside score for risk stratification, namely the quick SOFA (qSOFA), has been proposed, which incorporates hypotension, altered mental status, and tachypnea (5,8).
In patients with CAP, several scores have been developed to identify high-risk patients and support therapeutic decisions (4,9). Two of these scores, Confusion, Urea, Respiratory Rate, Blood Pressure and Age (CURB-65) and Pneumonia Severity Index (PSI), are well-validated scores to support CAP management and prognosis (9,10). Simplifications of CURB-65 (i.e., Confusion, Respiratory Rate and Blood Pressure [CRB]-65 and CRB) (11) have been developed and validated to facilitate the risk stratification process; these simplified scores do not require blood tests (12), as in the qSOFA. Yet the definitions for hypotension and tachypnea parameters on the CRB tool differ from those of the qSOFA.
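To illustrate how the hypotension and tachypnea definitions differ between the two bedside tools, the following minimal sketch computes qSOFA and CRB for one hypothetical patient. The cutoffs are taken from the original Sepsis-3 and CRB publications rather than from this text, mental status is simplified to a single boolean, and the `Vitals` class and its field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    confusion: bool        # simplified: altered mental status / new confusion
    respiratory_rate: int  # breaths per minute
    systolic_bp: int       # mm Hg
    diastolic_bp: int      # mm Hg


def qsofa(v: Vitals) -> int:
    # Assumed cutoffs (Sepsis-3): respiratory rate >= 22/min,
    # systolic blood pressure <= 100 mm Hg, altered mentation.
    return int(v.confusion) + int(v.respiratory_rate >= 22) + int(v.systolic_bp <= 100)


def crb(v: Vitals) -> int:
    # Assumed cutoffs (CRB): respiratory rate >= 30/min,
    # systolic < 90 mm Hg or diastolic <= 60 mm Hg, confusion.
    low_bp = v.systolic_bp < 90 or v.diastolic_bp <= 60
    return int(v.confusion) + int(v.respiratory_rate >= 30) + int(low_bp)


# Hypothetical patient: mildly tachypneic and borderline hypotensive.
patient = Vitals(confusion=False, respiratory_rate=24, systolic_bp=98, diastolic_bp=65)
print(qsofa(patient), crb(patient))
```

In this example the same vital signs are positive on two qSOFA items but on no CRB item, which is one way the two tools can classify a patient differently.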
Sepsis-3 will change clinical practice and influence medical decisions. However, clinical decision-making cannot rely only on predictive performance measures, such as discrimination and calibration (13,14). Indeed, decision-aid tools must also account for the benefits and harms resulting from clinicians' choice (13,14). To date, no clinical decision-making analysis of Sepsis-3 is available, including the proposed bedside tool (qSOFA) and the Sepsis-3 Flowchart, which includes qSOFA and SOFA scores. Therefore, the aim of our study was to evaluate three tools for initial assessment (SIRS, qSOFA, and CRB) and three tools for a comprehensive assessment (SOFA, CURB-65, and PSI) as decision-aid prognostic tools in CAP using decision-curves methodology. Additionally, the Sepsis-3 flowchart was also applied in this population.
Some of the results of this study have been previously reported in the form of an abstract (15).

Study Design and Patients
We retrospectively analyzed patients from two cohorts, which prospectively included patients aged greater than or equal to 16 years. CAP was defined as a new pulmonary infiltrate on chest radiograph on hospital admission and acute symptoms of lower respiratory tract infection (e.g., fever, cough, sputum production, pleuritic chest pain). Immunosuppression (i.e., patients taking more than 10 mg of prednisone-equivalent per day for at least 2 wk, on cytotoxic therapy, or with acquired immunodeficiency syndrome) and active tuberculosis were exclusion criteria. We included patients from nursing homes. Demographic variables, comorbidities, and physiologic parameters were collected in the emergency department (ED). All patients had a complete microbiologic evaluation and microbiologic confirmation of CAP was defined according to current guidelines (16,17). In each institution, a dedicated clinical researcher prospectively included patients, under the supervision of an experienced pulmonary physician. Patients were followed up until hospital discharge, and all survivors were reexamined or contacted by telephone 30 days after hospital discharge. Further details are reported in previous publications (16,17).

Outcomes
Our primary outcome was all-cause in-hospital mortality (5,8). We also explored two secondary outcomes: in-hospital mortality and/or need for critical care support greater than or equal to 3 days (composite outcome) (5,8); and 30-day mortality. We defined need for critical care support as admission to an intensive care unit (ICU) or high-dependency unit (HDU).

At a Glance Commentary
Scientific Knowledge on the Subject: In 2016, the Sepsis-3 Task Force updated the clinical criteria for sepsis, excluding the need for systemic inflammatory response syndrome and introducing a flowchart that comprises the quick Sequential (Sepsis-related) Organ Failure Assessment (qSOFA) and SOFA scores. However, the clinical decision-making process cannot rely on risk stratification scores alone, because a decision-aid tool must account for the benefits and harms of clinicians incorporating that tool into clinical practice. A clinical decision-making analysis of Sepsis-3 is not yet available.

What This Study Adds to the Field: We demonstrated that qSOFA outperformed systemic inflammatory response syndrome (SIRS) criteria and presented better clinical usefulness in patients with community-acquired pneumonia. Among the tools for initial assessment, SIRS presented the worst net benefit versus qSOFA and Confusion, Respiratory Rate and Blood Pressure (CRB), significantly increasing the risk of over-treatment and being comparable with the "treat-all" strategy. Among the tools for a comprehensive assessment, Pneumonia Severity Index (PSI) had better predictive performance and net benefit for mortality than modified SOFA and the Confusion, Urea, Respiratory Rate, Blood Pressure and Age score (CURB-65), whereas modified SOFA was more useful when considering mortality/intensive care unit admission. Finally, following the Sepsis-3 flowchart resulted in better identification of patients at high risk of worse outcomes.

Score Definition
We clustered the six scores into those that might facilitate the clinician's initial decision (SIRS, qSOFA, and CRB) and those that might facilitate the clinician's decision after initial management and additional examinations (SOFA, CURB-65, and PSI). We adapted the Sepsis-3 flowchart illustrating this approach and the timeline of the clinical decision-making processes involved in the ED (Figure 1).
We defined SIRS, qSOFA, CRB, CURB-65, and PSI as originally described (see Table  E1 in the online supplement) (5,8,9,12). For SOFA score, we calculated the respiratory, hematologic, hepatic, and renal systems as originally described. However, we adapted the SOFA calculation for neurologic and cardiovascular parameters, using a conservative approach similar to Sepsis-3 (modified SOFA [mSOFA]) (see Table E1). We used the first clinical signs/symptoms documented in the ED for all scores. For mSOFA, we used the first reported data, comprising the early resuscitation phase, as previously validated (18). For missing mSOFA values, we attributed a normal value (i.e., zero points), reflecting clinical practice and as widely reported (5,8). In a sensitivity analysis, we used multiple imputation (5,8). We also compared qSOFA and CRB with their corresponding qSOFA-65 and CRB-65, by adding the age component.

Statistical Analysis
We assessed the predictive performance of SIRS, qSOFA, CRB, mSOFA, CURB-65, and PSI for the primary and secondary outcomes (19). We evaluated calibration with calibration plots and two complementary goodness-of-fit statistics (Hosmer-Lemeshow and the Le Cessie-van Houwelingen-Copas-Hosmer tests) (20). Calibration curves were built with a smoothed nonparametric method (20,21). We used the area under the receiver operating characteristic curve (AUROC) to assess discrimination. The 95% confidence interval (CI) estimation for the AUROCs and their comparisons were performed using bootstrapping methods in 10,000 samples (21,22). Overall fit was assessed using scaled Brier score and Nagelkerke R-square (19,21). To incorporate important information that clinicians might have at the bedside (8), we evaluated the additional predictive contribution of SIRS, qSOFA, CRB, and mSOFA to a baseline risk for in-hospital mortality estimated by a multivariate logistic regression model. The baseline risk model included age, sex, chronic respiratory disease, chronic neurologic disease, liver disease, heart failure, diabetes mellitus, neoplasia, chronic renal disease, and microbiologic confirmation. The baseline and additional risk models were fitted after multiple imputation.
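As an illustration of the discrimination analysis, the sketch below computes an AUROC from integer score points and a percentile bootstrap confidence interval. It is a simplified stand-in, not the authors' R code: the percentile method, the 2,000 resamples (the study used 10,000), the toy data, and the function names are all assumptions.

```python
import random


def auroc(scores, outcomes):
    """Probability that a random patient who died has a higher score than a
    random survivor; tied scores count as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, outcomes, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the AUROC (resampling patients)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both deaths and survivors
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Toy data: score points and in-hospital death (1) or survival (0).
toy_scores = [0, 0, 1, 1, 2, 3]
toy_deaths = [0, 0, 0, 1, 1, 1]
auc = auroc(toy_scores, toy_deaths)
ci_low, ci_high = bootstrap_ci(toy_scores, toy_deaths)
print(round(auc, 3), ci_low, ci_high)
```

The pairwise definition handles the many ties produced by scores with only a few possible values, which is why it is used here instead of a threshold sweep.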
Figure 1. Flowchart of the decision-making process for community-acquired pneumonia management at the emergency department, covering initial assessment and further assessment. *First clinical decision encompasses the decision to assess organ dysfunction and pneumonia severity with additional laboratory and/or invasive procedures. †Second clinical decision encompasses the decision, after full assessment, to admit the patient to the ward/intensive care unit, consider additional treatment not yet started, or change treatments already started. The flowchart does not regulate timing for instituting life-saving treatments or, for instance, prompt starting of empiric antibiotic treatment. CAP = community-acquired pneumonia; CRB = Confusion, Respiratory Rate and Blood Pressure; CURB-65 = Confusion, Urea, Respiratory Rate, Blood Pressure and Age; PSI = Pneumonia Severity Index; qSOFA = quick Sequential (Sepsis-related) Organ Failure Assessment; SIRS = systemic inflammatory response syndrome.
For a score to be clinically useful, it must have good discrimination and be well calibrated, but those alone are not enough (14,23,24). Indeed, discrimination and calibration may not reflect clinical utility (25). The main barrier to translating discrimination and calibration to clinical practice is that sensitivity, specificity, and prediction errors are weighted equally (e.g., true-positive and false-positive rates), whereas clinicians usually apply different weights during the decision-making process (23). Decision-curve analysis is a method that depicts the predicted net benefit (NB; NB = benefit × true-positive classifications − harm/cost × false-positive classifications) of a prediction tool over a range of threshold probabilities. Threshold probabilities quantify how over-treatment is considered against treatment benefits (19,23,25-28). For instance, if a clinician weights the harm/cost of overtreatment versus the benefit of appropriate treatment at 1:19, we have a threshold probability of 5% and a number willing to treat (NWT) of 20 (26,29). Decision curves have the advantage of being able to plot a plausible range of threshold probabilities.
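The relationship between the harm-to-benefit weighting, the threshold probability, the NWT, and the NB can be sketched as follows. The NB expression used here is the standard form from the decision-curve literature (true positives per patient minus false positives per patient weighted by the threshold odds); that the study used exactly this standardization is an assumption, and the counts are hypothetical.

```python
def threshold_from_odds(harm, benefit):
    """A harm:benefit weighting of 1:19 gives a threshold of 1 / (1 + 19) = 5%."""
    return harm / (harm + benefit)


def number_willing_to_treat(threshold):
    """NWT is the reciprocal of the threshold probability."""
    return 1.0 / threshold


def net_benefit(true_pos, false_pos, n, threshold):
    """NB = TP/n - FP/n * pt / (1 - pt), where pt is the threshold probability."""
    weight = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * weight


pt = threshold_from_odds(1, 19)
print(pt, number_willing_to_treat(pt))

# Hypothetical cohort of 1,000 patients with 64 deaths. A rule flags 300
# patients, of whom 50 die (true positives) and 250 survive (false positives).
nb_rule = net_benefit(50, 250, 1000, pt)
nb_treat_all = net_benefit(64, 936, 1000, pt)
print(round(nb_rule, 4), round(nb_treat_all, 4))
```

With these made-up counts the rule has the higher NB at a 5% threshold, so it would be preferred over treating everyone at an NWT of 20.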
We defined 100 to 5 NWTs as a plausible range (i.e., threshold probabilities from 1% to 20%), because it is unlikely that clinicians will use a score to make decisions about treatment of infected patients for higher threshold probabilities. At any given NWT, the score with the higher NB is the preferred one. The NB of each score was estimated for the primary and secondary outcomes and compared with the "treat-none" and "treat-all" strategies. The "treat-all" strategy assumes everyone will develop the event and receive the intervention independent of any score. The associated intervention comprises the initial treatment of patients with sepsis in the ED, such as additional blood sampling, aggressive resuscitation, intensive monitoring, invasive procedures, and place of treatment. We hypothesized harm, at patient and hospital levels, associated with over-treatment and overuse of hospital resources, such as adverse events of broad-spectrum antibiotics and aggressive resuscitation/invasive procedures, ICU admission for patients unlikely to benefit, and hospital costs (Figure 1) (4,5,30). Finally, we described the distribution and outcomes of patients based on combinations among SIRS (resembling Sepsis-2 definition), qSOFA (Sepsis-3 flowchart), and CRB with mSOFA.
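A decision curve over the NWT range can be sketched as below, comparing one score cutoff with the "treat-all" and "treat-none" strategies. The cohort is hypothetical (100 patients, 6 deaths, 20 patients flagged by the rule), the function name is illustrative, and the NB expression is the standard decision-curve form, assumed rather than taken from the study's code.

```python
def decision_curve(flags, outcomes, nwts):
    """Return (NWT, NB of the rule, NB of treat-all) for each NWT.
    The NB of treat-none is zero at every threshold."""
    n = len(outcomes)
    events = sum(outcomes)
    tp = sum(1 for f, y in zip(flags, outcomes) if f and y)
    fp = sum(1 for f, y in zip(flags, outcomes) if f and not y)
    rows = []
    for nwt in nwts:
        pt = 1.0 / nwt                 # threshold probability
        weight = pt / (1.0 - pt)       # harm-to-benefit odds at this threshold
        nb_rule = tp / n - fp / n * weight
        nb_all = events / n - (n - events) / n * weight
        rows.append((nwt, nb_rule, nb_all))
    return rows


# Hypothetical cohort: 6 deaths among 100 patients; the rule flags
# 4 of the deaths and 16 of the 94 survivors.
outcomes = [1] * 6 + [0] * 94
flags = [True] * 4 + [False] * 2 + [True] * 16 + [False] * 78

curve = decision_curve(flags, outcomes, nwts=[100, 20, 5])
for nwt, nb_rule, nb_all in curve:
    print(nwt, round(nb_rule, 4), round(nb_all, 4))
```

In this toy cohort "treat-all" is preferable at an NWT of 100, the rule is preferable at an NWT of 20, and at an NWT of 5 the rule is no better than "treat-none", which shows why the curve is read across the whole plausible range instead of at a single threshold.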
Sensitivity, specificity, and positive and negative predictive values were calculated as shown elsewhere. Because we expected few missing values for SIRS, qSOFA, and CRB, our main analysis was conducted on the complete-case data; for sensitivity analysis, we conducted multiple imputation. We prespecified two subgroups, defined by age (less than 65 yr, greater than or equal to 65 yr) and chronic comorbidities (without chronic comorbidities, one or more chronic comorbidities). All statistical analyses were performed using R software version 3.2.2 (R Foundation for Statistical Computing, Vienna, Austria) (31). We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guidelines (32); further information about methods and statistical analysis is in the online supplement.

Score Distribution
Our complete-case analysis comprised 6,024 patients (87.6%) (see Tables E2 and E3, Figure E1). There was a clear association of qSOFA, CRB, mSOFA, CURB-65, and PSI with in-hospital mortality. Nevertheless, higher SIRS points poorly predicted in-hospital mortality (Figure 2). Similar results were found in the imputed data (see Figure E2) and for secondary outcomes (see Figures E3 and E4). Very few patients who were discharged after a short ED stay had qSOFA and CRB greater than or equal to two points (4% and 2%, respectively), whereas 61% had SIRS greater than or equal to two points (Figure 3). These patients had very low 30-day mortality (3 of 744; 0.4%). In contrast, patients admitted to the ICU/HDU had in-hospital mortality of 15.7%, and higher scores. Sepsis (infection + mSOFA greater than or equal to two points) was present in 17% of patients discharged after a short ED stay, 64% of those admitted to the ward, and 89% of patients treated in the ICU/HDU (Figure 3).
SIRS presented the worst discrimination, followed by qSOFA, CRB, mSOFA, CURB-65, and PSI (Table 2; see Figure E5). All scores presented worse discrimination for in-hospital mortality in patients greater than or equal to 65 years old. In those patients without chronic comorbidities, the discrimination of all scores improved (see Table E4). Regarding calibration, in general scores overestimated in-hospital mortality, and miscalibration was more evident for qSOFA, mSOFA, and CURB-65 (Table 2; see Figure E5).
The overall performance measured by the scaled Brier score and R-square increased from SIRS to qSOFA, CRB, CURB-65, mSOFA, and PSI (Table 2). We observed similar results when analyzing the Barcelona and Valencia cohorts separately, but mSOFA and CURB-65 had better discrimination in the Valencia cohort (see Table E5). We found similar results when analyzing the imputed data (see Table E6, Figure E6) and for secondary outcomes (see Tables E7 and E8, Figures E7 and E8). Nevertheless, for the composite outcome, CRB had better discrimination than qSOFA, and mSOFA had the highest discrimination and best calibration. CRB-65 outperformed qSOFA, CRB, and qSOFA-65 for in-hospital mortality (see Table E9, Figure E9).

Clinical Usefulness and Decision-Curve Analysis
Among the tools for the initial assessment, SIRS greater than or equal to two presented high sensitivity and low specificity, whereas qSOFA greater than or equal to two and CRB greater than or equal to two presented moderate sensitivity and high specificity for in-hospital mortality (Table 2). Among the follow-up tools, mSOFA greater than or equal to two presented high sensitivity and low specificity, and CURB-65 greater than or equal to two and PSI greater than or equal to four presented a good compromise between sensitivity (78% and 92%, respectively) and specificity (60% and 47%, respectively). CRB had the highest positive likelihood ratio (3.05; 95% CI, 2.65-3.51) and PSI the lowest negative likelihood ratio (0.16; 95% CI, 0.12-0.23) (Table 2). We observed the same pattern in the imputed data (see Table E6) and for secondary outcomes (see Tables E7 and E8). In the subgroup analysis for in-hospital mortality, we observed similar findings except that mSOFA greater than or equal to two had higher specificity in the subgroup of patients aged less than 65 years (sensitivity, 94%; specificity, 51%) and without chronic comorbidities (sensitivity, 88%; specificity, 51%) (see Table E11).
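For readers who want to relate sensitivity and specificity to likelihood ratios, a minimal sketch follows. It applies the textbook definitions to the rounded sensitivity and specificity quoted above for CURB-65 and PSI; the 2 × 2 counts in the second example are hypothetical, and the function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def accuracy_from_counts(tp, fp, fn, tn):
    """Sensitivity, specificity, and predictive values from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Rounded values quoted in the text for in-hospital mortality.
curb65 = likelihood_ratios(0.78, 0.60)  # CURB-65 >= 2
psi = likelihood_ratios(0.92, 0.47)     # PSI >= 4
print([round(x, 2) for x in curb65], [round(x, 2) for x in psi])

# Hypothetical 2 x 2 table: 4 TP, 16 FP, 2 FN, 78 TN.
table = accuracy_from_counts(tp=4, fp=16, fn=2, tn=78)
print({k: round(v, 3) for k, v in table.items()})
```

Because the inputs are rounded, the negative likelihood ratio computed for PSI (about 0.17) differs slightly from the reported 0.16.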
The NB of qSOFA and CRB outperformed SIRS for in-hospital mortality, and SIRS showed an NB close to the "treat-all" strategy for most of the NWT values (Figure 4A). For NWT between 15 and 30 and lower than 8, CRB had higher NB than qSOFA. PSI had the highest NB over the whole NWT range, except for values lower than eight, when mSOFA outperformed PSI for in-hospital mortality. When translating these findings to the number of avoided interventions in a hypothetical population of 100 patients with pneumonia, assuming a physician weights the harm/cost of overtreatment versus the benefit of appropriate treatment at 1:19 (NWT = 20), the number of interventions could have been decreased by 7% without missing any death using SIRS, 16% using qSOFA, 27% using CRB or mSOFA, 30% using CURB-65, and 35% using PSI (Figure 4B). We observed similar findings on NB for secondary outcomes, except that mSOFA outperformed other scores for a wide range of NWT for the composite outcome (Figures 4C-4F). The NB of the full models showed that baseline + SIRS had virtually no advantage compared with the baseline model alone. The models baseline + qSOFA and baseline + CRB had higher NB than previous models for NWTs between 25 and 7. In contrast, baseline + mSOFA presented the highest NB over the whole NWT range (see Figure E11).
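The translation from NB to avoided interventions per 100 patients can be sketched with the expression commonly used in the decision-curve literature: the NB gain over "treat-all" divided by the threshold odds, times 100. Whether the study used exactly this expression is an assumption, and the counts below are hypothetical and do not reproduce the reported percentages.

```python
def interventions_avoided_per_100(tp, fp, events, n, threshold):
    """Net reduction in interventions per 100 patients relative to treat-all,
    expressed without an increase in missed events:
    (NB_rule - NB_treat_all) / (pt / (1 - pt)) * 100."""
    weight = threshold / (1.0 - threshold)
    nb_rule = tp / n - fp / n * weight
    nb_all = events / n - (n - events) / n * weight
    return (nb_rule - nb_all) / weight * 100.0


# Hypothetical cohort: 100 patients, 6 deaths; the rule flags 4 deaths and
# 16 survivors. Threshold probability 5% (NWT = 20).
avoided = interventions_avoided_per_100(tp=4, fp=16, events=6, n=100, threshold=0.05)
print(round(avoided, 1))
```

The same figure can be read as the true negatives minus the false negatives weighted by 19 (78 − 2 × 19 = 40 per 100 here), which makes explicit that each missed death is penalized according to the chosen NWT.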

Discussion
In a population of patients with CAP, qSOFA outperformed SIRS for in-hospital mortality risk stratification and presented better clinical usefulness in virtually all evaluations. CRB had slightly better predictive performance than qSOFA for discrimination and calibration measures, but presented similar clinical usefulness for most scenarios. For a comprehensive assessment of CAP, mSOFA and PSI had the best predictive performance and highest NB. The combination of qSOFA or CRB with mSOFA better selected high-risk patients, while potentially decreasing the burden of intensive monitoring and overtreatment.
The Sepsis-2 definitions, published in 2001, raised awareness of sepsis syndrome and have been associated with better care and outcomes (6,7). However, SIRS criteria weakly predicted patient outcomes (3,33); this, together with their high sensitivity and low specificity, likely makes SIRS an unreliable tool for bedside clinical decision-making and research (5,8,34,35). Our current analysis in patients with CAP confirmed these limitations (3) and highlighted risks of overtreatment, demonstrating that the NB of SIRS is comparable with the "treat-all" strategy.
Indeed, the decision-curve analysis showed that when different weights for true-positive and false-positive classifications were applied, SIRS did not provide any additional benefit for decision-making. In contrast, we found a positive NB if clinicians incorporated qSOFA or CRB for the initial assessment, decreasing the number of unnecessary interventions while not missing any death. qSOFA and CRB were better than SIRS or a "treat-all" strategy for NWT values lower than 40, which seems reasonable for use in the ED (5,30), where qSOFA and CRB can be easily assessed. Given that CRB and CRB-65 were specifically developed for patients with CAP, they had better calibration and discrimination than qSOFA, and higher specificity. Thus, rather than qSOFA, physicians could consider CRB or CRB-65 for the initial risk stratification of patients with CAP.
For a comprehensive assessment of CAP, PSI had the best mortality prediction and highest NB from high NWT values down to an NWT of eight, reinforcing its pivotal role in CAP management. mSOFA seemed to be more applicable for NWT values lower than 12, mainly when considering ICU admission. This might be because PSI comprises 20 variables and has age as a main determinant for risk classification, whereas mSOFA measures acute organ dysfunctions in six domains. Further studies should investigate whether both scores are complementary in CAP management. Of note, the baseline + mSOFA model, which could be analogous to PSI + mSOFA, had higher discrimination and NB than PSI alone.
Our results are in line with those of the pivotal Sepsis-3 clinical criteria study (5,8), which showed better discrimination for qSOFA and mSOFA compared with SIRS. In contrast, mSOFA clearly outperformed qSOFA in our population. The discrimination of qSOFA in our study was lower than that reported originally in Sepsis-3 (5), which might be because of the differences in the populations included and because we measured qSOFA and mSOFA using ED data. Sepsis-3 aimed to identify infected patients with greater than or equal to 10% of mortality (5,8,36). In our study that goal was achieved: 18% of the patients presented positive qSOFA/mSOFA and in-hospital mortality in these patients was 16.6%.
Interestingly, when describing the prevalence of each score categorized by place of treatment, it seems that clinicians relied on the parameters hypotension, altered mental status, and tachypnea for decision-making. Indeed, only 2% and 4% of patients who were not hospitalized had qSOFA greater than or equal to two and CRB greater than or equal to two, respectively. However, SIRS greater than or equal to two was present in most of the promptly discharged patients (61%). Interestingly, 46% of patients had qSOFA less than two/mSOFA greater than or equal to two. In-hospital mortality in these patients was low (5.4%); this might indicate that patients with qSOFA less than two presented some points on mSOFA that were ultimately not associated with death.
Among the scores we evaluated, qSOFA was recently developed by a data-driven process from large databases. As with CRB, it attributes one point to each clinical parameter, is promptly available at bedside, and is easily repeated without invasive measures. Yet it is important to emphasize that the suggested cutoff of two points for qSOFA had low sensitivity and would be inappropriate if applied as a single screening tool, because it would result in delayed recognition of sepsis (37).
Our study has some strengths that must be highlighted. First, we described challenges in decision-making that could be faced by clinicians on a daily basis, not only during evaluation of hospitalized patients, but also in those rapidly discharged following ED evaluation. Additionally, it is known that predictive performance measures have disadvantages (19-21,32,38) and are difficult to translate into clinical practice (14). Thus, we used clinical decision-making analyses (13) to complement predictive performance evaluations, which are fundamental to better support clinicians' decisions (23,24,39).
This study also has some limitations. First, we analyzed one type of infection, from only two Spanish institutions, potentially limiting generalizability of our results. However, the data came from two prospective CAP cohorts, increasing our ability to capture data granularity. Second, although our data were prospectively collected from consecutive patients and had few missing values, misclassification and selection bias could have occurred. We expect both to be low, because of the standard procedures for prospective data collection and researchers' extensive expertise in this field. Moreover, our outcomes were objective (mortality/ICU admission) and we had few losses to follow-up, decreasing the possibility of outcome bias. Third, we could not fully calculate the SOFA score for the cardiovascular and neurologic parameters; thus, by adopting a conservative approach we may have hampered the SOFA performance. However, the mSOFA score maintained its high predictive power, confirming feasibility of SOFA score calculation outside the ICU (18). Fourth, we could not differentiate between acute and chronic organ dysfunction; however, our analysis excluding patients with chronic comorbidities showed similar findings. Fifth, we observed score miscalibration, which can influence clinical decisions based on NB (40). Finally, we did not incorporate clinical judgment into the models, which could ultimately improve the performance of the Sepsis-3 flowchart.

Conclusions
We demonstrated that for initial assessment, qSOFA outperformed SIRS and presented better clinical usefulness in patients with CAP in the ED. Moreover, CRB and CRB-65 had better predictive performance than qSOFA for initial stratification of patients with CAP in some scenarios, including higher NB for some values of NWT. For the comprehensive assessment of CAP, PSI had the best predictive performance and NB for mortality, whereas mSOFA seemed more suitable when considering ICU admission. Finally, the Sepsis-3 flowchart provided an improved, feasible approach for identifying patients with CAP at higher risk of death. Further studies, including other CAP cohorts and other sources of infection, should be conducted to corroborate our findings.
Author disclosures are available with the text of this article at www.atsjournals.org.

Acknowledgment:
The authors thank the clinicians and healthcare professionals who assiduously work in the collaborating institutions, and who helped in the development of both cohorts.

Figure 4 legend (excerpt): Illustrative example: if a clinician weights the harm/cost of overtreatment versus the benefit of appropriate treatment at 1:19 for in-hospital mortality, there is a threshold probability of 5% and an NWT of 20. This choice specifies that death of a patient with community-acquired pneumonia who remained untreated is 19 times worse than the consequences of overtreatment of an unnecessarily treated patient. At an NWT of 19, the net benefit of the SOFA, qSOFA, and CRB scores outperforms SIRS and treat-all strategies. At the same time, at an NWT of 20, one could reduce the number of interventions without missing any in-hospital death by 7% using SIRS criteria, 16% using qSOFA, 27% using CRB or mSOFA scores, 30% for CURB-65, and 35% for PSI. CRB = Confusion, Respiratory Rate and Blood Pressure; CURB-65 = Confusion, Urea, Respiratory Rate, Blood Pressure and Age; ICU = intensive care unit; mSOFA = modified Sequential (Sepsis-related) Organ Failure Assessment; PSI = Pneumonia Severity Index; qSOFA = quick Sequential (Sepsis-related) Organ Failure Assessment; SIRS = systemic inflammatory response syndrome.