
Matthew A. Reyna, Christopher S. Josef, Russell Jeter | Critical Care Medicine | (2019)

Abstract

OBJECTIVES: Sepsis is a major public health concern with significant morbidity, mortality, and healthcare expenses. Early detection and antibiotic treatment of sepsis improve outcomes. However, although professional critical care societies have proposed new clinical criteria that aid sepsis recognition, the fundamental need for early detection and treatment remains unmet. In response, researchers have proposed algorithms for early sepsis detection, but directly comparing such methods has not been possible because of different patient cohorts, clinical variables and sepsis criteria, prediction tasks, evaluation metrics, and other differences. To address these issues, the PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data.

DESIGN: Participants submitted containerized algorithms to a cloud-based testing environment, where we graded entries for their binary classification performance using a novel clinical utility-based evaluation metric. We designed this scoring function specifically for the Challenge to reward algorithms for early predictions and penalize them for late or missed predictions and for false alarms.

SETTING: ICUs in three separate hospital systems. We shared data from two systems publicly and sequestered data from all three systems for scoring.

PATIENTS: We sourced over 60,000 ICU patients with up to 40 clinical variables for each hour of a patient's ICU stay. We applied Sepsis-3 clinical criteria for sepsis onset.

INTERVENTIONS: None.

MEASUREMENTS AND MAIN RESULTS: A total of 104 groups from academia and industry participated, contributing 853 submissions. Furthermore, 90 abstracts based on Challenge entries were accepted for presentation at Computing in Cardiology.

CONCLUSIONS: Diverse computational approaches predict the onset of sepsis several hours before clinical recognition, but generalizability to different hospital systems remains a challenge.


Plain English Takeaway

This paper describes a large competition in which teams built computer programs to spot sepsis early in hospital patients. It shows that such programs can flag sepsis several hours before doctors usually recognize it, but that making these tools work equally well in every hospital is still hard.

Study Aim

The main goal of this paper is to address the ongoing challenge of detecting sepsis (a life-threatening reaction to infection) early in hospital patients. Past studies used different patient groups, clinical variables, sepsis definitions, and scoring methods, so their algorithms could not be compared directly; the authors set out to create a fair way to compare them. To do so, they organized a public challenge that encouraged the development of open-source tools for early sepsis detection and tested those tools with a new scoring system that rewards early, accurate predictions and penalizes late or missed predictions and false alarms. Simply put: The study wants to find the best way to use computers to spot sepsis early and to compare different methods fairly.

Study Design

The research is based on the PhysioNet/Computing in Cardiology Challenge 2019. In this challenge, 104 teams from universities and companies made 853 submissions, which were packaged as containers and tested in a cloud-based system. The data came from over 60,000 intensive care unit (ICU) patients in three hospital systems. Each patient record included up to 40 clinical variables for each hour of the ICU stay, such as heart rate, blood pressure, and lab results. The study used the Sepsis-3 criteria (a standard definition of sepsis) to label when sepsis started. The scoring system rewarded early, correct predictions and penalized late or missed predictions and false alarms. Data from two of the hospital systems were shared publicly, while data from all three were kept hidden for fair testing. Simply put: The study ran a big contest where teams tested computer programs on hospital data to see which could best predict sepsis early.
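The idea behind a utility-based score can be made concrete with a small sketch. The code below is a simplified, hypothetical clinical utility function of the kind the abstract describes: every hourly yes/no prediction earns a reward or penalty depending on its timing relative to sepsis onset. The timing window (credit starting 12 hours before onset, peaking 6 hours before, and ending 3 hours after) and the reward and penalty values are assumptions chosen for illustration; the official Challenge scoring code defines its own parameters and may aggregate scores across patients differently.

```python
from typing import Optional, Sequence

# Hypothetical parameters (assumptions), in hours relative to sepsis onset.
DT_EARLY = -12.0    # earliest hour at which an alarm starts to earn credit
DT_OPTIMAL = -6.0   # hour at which an alarm earns full credit
DT_LATE = 3.0       # after this hour, predictions no longer affect the score
MAX_U_TP = 1.0      # reward for a well-timed alarm
MIN_U_FN = -2.0     # penalty for still not alarming at DT_LATE
U_FP = -0.05        # small penalty for each false-alarm hour
U_TN = 0.0          # correct "no alarm" hours are neutral


def hourly_utility(t: float, predicted: bool, t_sepsis: Optional[float]) -> float:
    """Utility of one hourly binary prediction; t_sepsis is None for non-septic patients."""
    if t_sepsis is None:
        return U_FP if predicted else U_TN
    dt = t - t_sepsis  # hours relative to onset
    if predicted:
        if dt <= DT_OPTIMAL:
            # Credit ramps linearly from 0 at DT_EARLY to MAX_U_TP at DT_OPTIMAL;
            # alarms raised even earlier are scored like false alarms.
            ramp = MAX_U_TP * (dt - DT_EARLY) / (DT_OPTIMAL - DT_EARLY)
            return max(ramp, U_FP)
        if dt <= DT_LATE:
            # Late alarms earn linearly less credit, reaching 0 at DT_LATE.
            return MAX_U_TP * (DT_LATE - dt) / (DT_LATE - DT_OPTIMAL)
        return 0.0  # alarms after DT_LATE are ignored
    if dt <= DT_OPTIMAL:
        return 0.0  # no alarm is expected yet
    if dt <= DT_LATE:
        # A missed case is penalized increasingly, down to MIN_U_FN at DT_LATE.
        return MIN_U_FN * (dt - DT_OPTIMAL) / (DT_LATE - DT_OPTIMAL)
    return 0.0


def patient_utility(predictions: Sequence[bool], t_sepsis: Optional[float]) -> float:
    """Sum of hourly utilities; predictions[i] is the alarm state at hour i."""
    return sum(hourly_utility(t, p, t_sepsis) for t, p in enumerate(predictions))
```

In this toy setup, for a 24-hour stay with onset at hour 20, raising the alarm at hour 14 and holding it scores 5, while never alarming scores -10: the large penalty for a missed case outweighs the small per-hour cost of false alarms.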

Findings

The study shows that many different computer approaches can predict sepsis several hours before doctors usually recognize it. The challenge attracted wide participation, and 90 abstracts based on challenge entries were accepted for presentation at the Computing in Cardiology conference. However, the authors note that although these algorithms work well on the data provided, it is still difficult to make sure they perform just as well in other hospitals. The new scoring system helped highlight which methods were best at early, accurate detection. The authors suggest that more work is needed before these tools can be used reliably in real-world hospital settings. Simply put: The study found that computers can help spot sepsis early, but it's still tough to make sure these tools work well in every hospital.

Referenced In

🚨 The Sepsis Prediction Paradox: Our AI Models Are Getting More Explainable—But Are They Explaining the Right Things?

New systematic review reveals a critical gap between what ML models highlight and what clinicians actually need to see.

A PRISMA-guided review of 37 studies (2019–2025) offers encouraging news, along with some serious concerns:

📈 The Good News: Explainability Is Surging

Adoption of explainable ML methods is accelerating: the odds that a study used them rose by about 67% per year. By 2023–2025, techniques like SHAP and LIME were substantially more common than in 2019–2021, and SHAP dominated, appearing in 74% of the studies that used explainability methods.
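A per-year odds ratio compounds multiplicatively, and a rise in odds is not the same as a rise in probability. The sketch below assumes the ~67% figure is a constant per-year odds ratio of 1.67 (as a logistic model with a linear year term would give); the 20% baseline adoption rate in the usage lines is purely hypothetical, not a number from the review.

```python
def odds_multiplier(odds_ratio_per_year: float, years: float) -> float:
    """Factor by which the odds change over `years`, assuming a constant per-year odds ratio."""
    return odds_ratio_per_year ** years


def project_probability(baseline_prob: float, odds_ratio_per_year: float, years: float) -> float:
    """Convert a baseline probability to odds, apply the compounded odds ratio, convert back."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_multiplier(odds_ratio_per_year, years)
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # About four years separate the midpoints of 2019-2021 and 2023-2025.
    print(round(odds_multiplier(1.67, 4), 2))  # prints 7.78
    # Hypothetical 20% baseline: one year later the probability is ~29.5%, not 20% * 1.67 = 33.4%.
    print(round(project_probability(0.20, 1.67, 1), 3))  # prints 0.295
    print(round(project_probability(0.20, 1.67, 4), 3))  # prints 0.66
```

Under this assumption, a 67% yearly rise in odds compounds to roughly a 7.8-fold rise in odds over four years, while the corresponding probability grows more slowly and can never exceed 100%.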

Better yet, studies that used explainability methods scored higher on methodological quality (82.6% vs. 76.9%), suggesting that transparency and rigor tend to go hand in hand.

⚠️ The Concerning Gap: Models Explain Vitals, Not Key Biomarkers

Across the 25 studies that reported feature importance, heart rate topped the charts, appearing among the top five predictors in 11 of them. Temperature and respiratory rate were also frequent top predictors.
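Feature-importance rankings like these are usually produced by attribution methods such as SHAP, which is based on Shapley values from cooperative game theory. As a minimal sketch (not the SHAP library's algorithm, which uses faster model-specific or sampling-based estimators and a background dataset rather than a single reference point), the code below computes exact Shapley values by brute force for a hypothetical three-variable risk score. The score, its coefficients, the reference values, and the patient are invented for illustration and are not a clinical model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set

Features = Dict[str, float]


def toy_risk_score(z: Features) -> float:
    """Hypothetical sepsis risk score (invented for illustration, not validated)."""
    score = 0.03 * (z["HR"] - 80) + 0.5 * (z["Temp"] - 37.0) + 0.1 * (z["Resp"] - 16)
    if z["HR"] > 100 and z["Temp"] > 38.0:
        score += 1.0  # interaction: fever counts extra when the patient is tachycardic
    return score


def shapley_values(model: Callable[[Features], float], x: Features, reference: Features) -> Features:
    """Exact Shapley values by enumerating every coalition (exponential in feature count)."""
    names = list(x)
    n = len(names)

    def value(coalition: Set[str]) -> float:
        # Features in the coalition take the patient's values; the rest take reference values.
        return model({k: x[k] if k in coalition else reference[k] for k in names})

    phi: Features = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


patient = {"HR": 120.0, "Temp": 38.5, "Resp": 22.0}
reference = {"HR": 80.0, "Temp": 37.0, "Resp": 16.0}
phi = shapley_values(toy_risk_score, patient, reference)
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
print(ranking)  # prints ['HR', 'Temp', 'Resp']
```

Here heart rate receives 1.7, temperature 1.25, and respiratory rate 0.6, and the values sum to the score difference between the patient and the reference. The interaction term is split evenly between heart rate and temperature, a reminder that attributions describe the model's arithmetic, not the underlying pathophysiology.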

Yet clinically crucial sepsis biomarkers were strikingly absent:

  • C-reactive protein (CRP): Used in only 4 of 37 studies

  • Procalcitonin (PCT): Used in only 1 of 37 studies

This is alarming because CRP and PCT are among the most extensively studied and widely used sepsis biomarkers in clinical practice. This led the authors to highlight a striking problem with "data architecture".

Complete CRP and PCT records were rare in public datasets: values were sometimes missing entirely and, more often than not, were not sampled consistently.
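Whether a biomarker can ever rank as important depends first on whether it was recorded. A minimal sketch, assuming each patient's stay is stored as a list of hourly records (dictionaries mapping variable names to values, with None or an absent key for hours without a measurement), shows one way to quantify that sparsity. The variable names and toy records are hypothetical, and a real dataset would need its own loading code.

```python
from typing import Dict, List, Optional

HourlyRecord = Dict[str, Optional[float]]


def missingness_report(
    patients: Dict[str, List[HourlyRecord]], variables: List[str]
) -> Dict[str, Dict[str, float]]:
    """Per variable: share of patient-hours without a value, and share of patients never measured."""
    total_hours = sum(len(rows) for rows in patients.values())
    if total_hours == 0:
        raise ValueError("no hourly records to summarize")
    report = {}
    for var in variables:
        # An absent key and an explicit None both count as "not measured this hour".
        missing_hours = sum(
            1 for rows in patients.values() for row in rows if row.get(var) is None
        )
        never_measured = sum(
            1 for rows in patients.values() if all(row.get(var) is None for row in rows)
        )
        report[var] = {
            "missing_hour_fraction": missing_hours / total_hours,
            "never_measured_fraction": never_measured / len(patients),
        }
    return report


# Hypothetical toy data: heart rate charted hourly, CRP drawn once for one patient, PCT never.
patients = {
    "A": [{"HR": 92.0}, {"HR": 95.0}, {"HR": 101.0, "CRP": 48.0}, {"HR": 99.0}],
    "B": [{"HR": 78.0}, {"HR": 80.0}, {"HR": 83.0}, {"HR": 85.0}],
}
for var, stats in missingness_report(patients, ["HR", "CRP", "PCT"]).items():
    print(var, stats)
```

In the toy data, CRP is missing for 87.5% of patient-hours and PCT for all of them; a variable that is absent from the schema altogether reports as 100% missing, which is exactly the situation a model faces when a biomarker was never captured.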

🏗️ Structural Barriers Only Run Deeper

  • Reproducibility: Only 27% of papers shared code; 22% released datasets

  • Generalizability: 27% used local-only data; 38% lacked external validation

  • Real-world evidence: Only 2 of 37 studies were prospective, meaning about 95% relied on retrospective data, which cannot establish causal relationships

Moreover, 17 of the 37 studies analyzed the same PhysioNet Challenge 2019 dataset, potentially creating an illusion of independent progress.

💡 The Path Forward

This review quantifies two parallel trends: rising explainability and a persistent gap in biological relevance. Closing that gap requires:

  1. Prospective designs capturing CRP, PCT, and other biomarkers at clinically meaningful time points

  2. Explanations reflecting pathophysiology: not just generic feature rankings, but insights clinicians can act on

  3. Transparency as standard: code and data sharing must become routine, not exceptional

  4. Clinician-validated interpretability: SHAP values are only useful if they improve bedside decision-making

🔬 Bottom Line

We're at an inflection point. Sepsis prediction models are getting more transparent, but their explanations remain constrained by whatever health records happen to capture.

The gap between algorithmic performance and biological relevance won't close with better neural networks alone; it requires rethinking how we collect and validate clinical data for machine learning.
