🚨 The Sepsis Prediction Paradox: Our AI Models Are Getting More Explainable—But Are They Explaining the Right Things?

New systematic review reveals a critical gap between what ML models highlight and what clinicians actually need to see.

A PRISMA-guided review of 37 studies (2019–2025) offers encouraging news, along with some concerns:

📈 The Good News: Explainability Is Surging

Adoption of explainable ML methods is accelerating sharply, with the odds of a study using them rising ~67% per year. By 2023–2025, techniques like SHAP and LIME were substantially more common than in 2019–2021. SHAP dominated, used in 74% of explainability studies.
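To make the "~67% per year" figure concrete, here is a minimal sketch of how an odds ratio of about 1.67 per year compounds. The 20% starting share is an assumed, purely illustrative baseline (it is not a number from the review), and `project_share` is a hypothetical helper name.

```python
def project_share(baseline_share: float, odds_ratio_per_year: float, years: int) -> float:
    """Project a proportion forward when its odds multiply by a fixed ratio each year."""
    if not 0.0 < baseline_share < 1.0:
        raise ValueError("baseline_share must be strictly between 0 and 1")
    # Convert the proportion to odds, compound the yearly odds ratio, convert back.
    odds = baseline_share / (1.0 - baseline_share)
    odds *= odds_ratio_per_year ** years
    return odds / (1.0 + odds)


if __name__ == "__main__":
    ASSUMED_BASELINE = 0.20  # illustrative starting share, not taken from the review
    for years in range(5):
        share = project_share(ASSUMED_BASELINE, 1.67, years)
        print(f"year +{years}: {share:.1%} of studies")
```

Note that a 67% increase in odds is not a 67% increase in the share of studies: the closer the share gets to 100%, the smaller each year's gain in percentage points.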

Better yet, explainability-friendly studies scored higher on methodological quality (82.6% vs. 76.9%), suggesting transparency and rigor increasingly go hand-in-hand.

⚠️ The Concerning Gap: Models Explain Vitals, Not Key Biomarkers

Across studies reporting feature importance, heart rate topped the charts, appearing as a top-5 predictor in 11 of 25 studies. Temperature and respiratory rate were similarly prominent.

Yet clinically crucial sepsis biomarkers were strikingly absent:

C-reactive protein (CRP): Used in only 4 of 37 studies
Procalcitonin (PCT): Used in only 1 of 37 studies

This is alarming because CRP and PCT are among the most extensively studied and widely used sepsis biomarkers in clinical practice. It led the authors to highlight a striking "data architecture" problem.

Complete CRP and PCT records in public datasets were rare: values were sometimes missing and, more often than not, never consistently sampled.
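One way to see this kind of gap in your own data is a quick coverage audit. The sketch below assumes a hypothetical hourly record layout of `(patient_id, hour, crp, pct)` with `None` for unmeasured values; the layout, the function name, and the example rows are all illustrative and do not describe any specific public dataset.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical hourly record: (patient_id, hour, crp_mg_per_l, pct_ng_per_ml).
# None means the biomarker was not measured in that hour.
Record = tuple[str, int, Optional[float], Optional[float]]

CRP_COLUMN = 2
PCT_COLUMN = 3


def biomarker_coverage(records: list[Record], column: int) -> dict[str, float]:
    """Summarize how often one biomarker column is actually populated."""
    if not records:
        raise ValueError("records must not be empty")
    patients = set()
    measured_hours = defaultdict(list)  # patient_id -> hours with a value
    for rec in records:
        patients.add(rec[0])
        if rec[column] is not None:
            measured_hours[rec[0]].append(rec[1])
    measured_rows = sum(len(hours) for hours in measured_hours.values())
    repeated = sum(1 for hours in measured_hours.values() if len(hours) >= 2)
    return {
        "row_coverage": measured_rows / len(records),
        "patients_with_any": len(measured_hours) / len(patients),
        "patients_with_repeat": repeated / len(patients),
    }


# Made-up example rows, for illustration only.
EXAMPLE_RECORDS: list[Record] = [
    ("A", 0, 12.0, None),
    ("A", 1, None, None),
    ("A", 2, 30.5, 0.8),
    ("B", 0, None, None),
    ("B", 1, None, None),
    ("C", 0, 5.0, None),
]

if __name__ == "__main__":
    print("CRP:", biomarker_coverage(EXAMPLE_RECORDS, CRP_COLUMN))
    print("PCT:", biomarker_coverage(EXAMPLE_RECORDS, PCT_COLUMN))
```

The `patients_with_repeat` share is the one to watch: a single measurement per stay says little about trajectory, and a model cannot learn from, or explain with, a signal that is rarely recorded.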

🏗️ Structural Barriers Only Run Deeper

Reproducibility: Only 27% of papers shared code; 22% released datasets
Generalizability: 27% used local-only data; 38% lacked external validation
Real-world evidence: Only 2 of 37 studies were prospective—meaning 95% trained on retrospective data that can't establish causal relationships

In fact, 17 of 37 studies analyzed the same PhysioNet Challenge 2019 dataset, potentially creating an illusion of independent progress.
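For readers who want to check the arithmetic, the short sketch below turns the raw counts quoted in this post into percentages. Only counts stated in the post are used; the labels and the helper name `as_percent` are my own.

```python
def as_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (count, total) pairs exactly as quoted in the post.
COUNTS = {
    "heart rate in top-5 predictors": (11, 25),
    "studies using CRP": (4, 37),
    "studies using PCT": (1, 37),
    "prospective studies": (2, 37),
    "studies on PhysioNet Challenge 2019": (17, 37),
}

if __name__ == "__main__":
    for label, (count, total) in COUNTS.items():
        print(f"{label}: {count}/{total} = {as_percent(count, total)}%")
    # The share of retrospective studies is the complement of the prospective share.
    print(f"retrospective studies: {round(100 - 100 * 2 / 37, 1)}%")
```

Two of 37 is about 5.4%, which is where the roughly 95% retrospective figure comes from, and 17 of 37 means close to half of the reviewed studies share a single dataset.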

💡 The Path Forward

This review newly quantifies two parallel trends: rising explainability and persistent biological relevance gaps. Closing them requires:

Prospective designs capturing CRP, PCT, and other biomarkers at clinically meaningful time points
Explanations reflecting pathophysiology: not just generic feature rankings, but insights clinicians can act on
Transparency as standard: code and data sharing must become routine, not exceptional
Clinician-validated interpretability: SHAP values are only useful if they improve bedside decision-making
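To ground what a SHAP-style explanation actually reports, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions for a toy risk score. It is not the SHAP library and not a method from the review: the linear score, its weights, the feature values, and the single reference patient used as a baseline are all hypothetical and have no clinical meaning.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance: dict, baseline: dict) -> dict:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition: set) -> float:
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk_score(x: dict) -> float:
    # Hypothetical linear score for illustration only, NOT a clinical model.
    return 0.02 * x["heart_rate"] + 0.5 * x["temperature"] + 0.01 * x["crp"]


if __name__ == "__main__":
    patient = {"heart_rate": 120.0, "temperature": 39.0, "crp": 150.0}
    reference = {"heart_rate": 80.0, "temperature": 37.0, "crp": 10.0}
    for feature, contribution in shapley_values(toy_risk_score, patient, reference).items():
        print(f"{feature}: {contribution:+.2f}")
```

The attributions sum to the difference between the patient's score and the reference score, and a feature can only receive credit if it is in the model's inputs at all. If CRP is never recorded, no explanation method can surface it, however important it is at the bedside.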

🔬 Bottom Line

We're at an inflection point. Sepsis prediction models are getting more transparent, but their explanations remain constrained by what the health records conveniently capture.

The gap between algorithmic performance and biological relevance won't close with better neural networks alone. It requires reimagining how we collect and validate clinical data for machine learning.

Top five features for ML in sepsis


Ioannis Papapanagiotou, Apostolos Karalis, Stelios Kokkoris | International Journal of Medical Informatics | 2026
Matthew A. Reyna, Christopher S. Josef, Russell Jeter | Critical Care Medicine | 2019


Created: May 5, 2026