
Ioannis Papapanagiotou, Apostolos Karalis, Stelios Kokkoris | International Journal of Medical Informatics | (2026)

Key Takeaways

Plain English Takeaway

Many computer programs can now help doctors spot sepsis early, but these programs often focus on easy-to-measure signs instead of the most important blood markers for sepsis.

Study Aim

The main goal of this paper is to review recent studies that use machine learning (computer programs that learn from data) to predict sepsis (a serious infection response) in adults. The authors specifically want to see how well these programs explain their predictions and whether they use the most important blood markers for sepsis. Simply put: The paper checks if computer tools for spotting sepsis use the right medical clues and explain their decisions clearly.

Study Design

The authors conducted a systematic review, following PRISMA guidelines (a standard for reviewing studies). They searched four major databases for studies published between January 2019 and July 2025. Only studies that used the Sepsis-3 definition (a standard way to define sepsis) and included critically ill adult patients were chosen. Two reviewers independently checked each study for quality and how well they explained their results. Simply put: The researchers carefully looked at recent studies about computer programs that predict sepsis in very sick adults.

Findings

The review found that more studies are now using explainability methods (ways to show how computer programs make decisions) in sepsis prediction, with the odds that a study uses one rising by about 67% each year. However, the most important blood markers for sepsis, like procalcitonin and C-reactive protein, were rarely used by these programs and were not among their top predictive features. Instead, the programs mostly relied on vital signs (like heart rate and blood pressure), which are measured more often in hospitals. This is partly because the key blood markers are often missing or irregularly recorded in public datasets. The authors also note that differences in which features are chosen and the use of local data make it hard to apply these findings everywhere. They recommend better data sharing and more focus on using the right medical clues in future research. Simply put: The study found that computer tools for sepsis prediction are getting better at explaining their choices, but they often miss the most important blood tests for sepsis.

Abstract

OBJECTIVE: To systematically review machine learning-based sepsis prediction studies, examining model explainability and the extent to which explanations reflect key sepsis biomarkers. DATA SOURCES: Following the PRISMA guidelines, we reviewed the titles, abstracts, and full texts. The search was conducted in four major bibliographic databases with publication dates from January 1, 2019 to July 16, 2025. STUDY SELECTION: The included studies provided a clear definition of sepsis based on the Sepsis-3 criteria and involved critically ill adult human subjects. DATA EXTRACTION AND SYNTHESIS: Two authors (IP and AKa) independently reviewed and assessed each study. Using statistical methods, we assessed study quality and explainability trends. RESULTS: A total of 37 studies were included. Our analysis revealed a notable temporal increase (≈67% greater odds per year) in the use of explainability methods in sepsis prediction models. However, key sepsis biomarkers (procalcitonin or C-reactive protein) were not among the top predictive features, highlighting a gap between the model output and known sepsis pathophysiology. DISCUSSION: Model attributions often mirror what electronic health records measure most consistently (vital signs) rather than what is most biologically specific, partly due to the high missingness and irregular sampling of CRP/PCT in public datasets. Heterogeneity in feature selection and reliance on local datasets limit generalizability, while sparse code/data sharing constrains reproducibility. CONCLUSION: This review newly quantifies the rise of explainability use in sepsis prediction and identifies a consistent gap between model explanations and key sepsis biomarkers, providing a foundation for future work to bridge data-driven insights with sepsis pathophysiology. SYSTEMATIC REVIEW REGISTRATION NUMBER: CRD420251101470.

Referenced In

🚨 The Sepsis Prediction Paradox: Our AI Models Are Getting More Explainable—But Are They Explaining the Right Things?

New systematic review reveals a critical gap between what ML models highlight and what clinicians actually need to see.

A PRISMA-guided review of 37 studies (2019–2025) offers encouraging news, along with some concerns:

📈 The Good News: Explainability Is Surging

Adoption of explainable ML methods is accelerating: the odds that a study uses one rose by roughly 67% per year. By 2023–2025, techniques like SHAP and LIME were substantially more common than in 2019–2021. SHAP dominated, used in 74% of explainability studies.
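A quick way to read "~67% greater odds per year": it is a multiplier on the odds, not on the share of studies. The sketch below assumes a constant odds ratio of 1.67 per year and a hypothetical 20% starting share (the baseline is an illustrative assumption, not a figure from the review).

```python
def probability_after_years(baseline_prob: float,
                            odds_ratio_per_year: float,
                            years: int) -> float:
    """Apply a constant per-year odds ratio to a baseline probability."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    odds = baseline_odds * odds_ratio_per_year ** years
    return odds / (1.0 + odds)


# Hypothetical baseline: 20% of studies use an explainability method in year 0.
for year in range(5):
    share = probability_after_years(0.20, 1.67, year)
    print(f"year {year}: {share:.1%}")
```

Because odds and probabilities differ, the share of studies grows by less than 67% per year, and the growth flattens as the share approaches 100%.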

Better yet, explainability-friendly studies scored higher on methodological quality (82.6% vs. 76.9%), suggesting transparency and rigor increasingly go hand-in-hand.

⚠️ The Concerning Gap: Models Explain Vitals, Not Key Biomarkers

Across studies reporting feature importance, heart rate topped the charts, appearing as a top-5 predictor in 11 of 25 studies. Temperature and respiratory rate were similarly prominent.
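For readers wondering how a "top-5 in N of M studies" figure could be tallied, here is a minimal sketch. The study names and feature lists are hypothetical placeholders, not data from the review, and the review's actual extraction procedure may differ.

```python
from collections import Counter

# Hypothetical top-5 feature lists, one per study (NOT data from the review).
top5_by_study = {
    "study_a": ["heart_rate", "temperature", "respiratory_rate", "lactate", "age"],
    "study_b": ["respiratory_rate", "heart_rate", "wbc", "map", "temperature"],
    "study_c": ["lactate", "creatinine", "heart_rate", "crp", "platelets"],
}


def count_top_features(rankings: dict) -> Counter:
    """Count, for each feature, how many studies list it in their top 5."""
    counts = Counter()
    for features in rankings.values():
        counts.update(set(features))  # count each feature once per study
    return counts


feature_counts = count_top_features(top5_by_study)
for feature, n in feature_counts.most_common(3):
    print(f"{feature}: top-5 in {n} of {len(top5_by_study)} studies")
```

A feature that a study never measured can never appear in that study's ranking, which is one reason a tally like this reflects data availability as much as biology.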

Yet clinically crucial sepsis biomarkers were strikingly absent:

  • C-reactive protein (CRP): Used in only 4 of 37 studies

  • Procalcitonin (PCT): Used in only 1 of 37 studies

This is alarming because CRP and PCT are among the most extensively studied and widely used sepsis biomarkers in clinical practice. The authors trace the problem to "data architecture".

Complete CRP and PCT records were rare in public datasets: values were often missing and, more often than not, irregularly sampled.
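To make "missingness" concrete, the sketch below computes the fraction of missing values per feature in a tiny, made-up set of hourly ICU records. The field names and values are illustrative assumptions and do not come from any of the datasets the review covers.

```python
from typing import Optional

# Hypothetical hourly ICU records; None marks a value that was not measured.
records: list[dict[str, Optional[float]]] = [
    {"heart_rate": 88.0, "temperature": 37.1, "crp": None, "pct": None},
    {"heart_rate": 92.0, "temperature": 37.4, "crp": 54.0, "pct": None},
    {"heart_rate": 101.0, "temperature": None, "crp": None, "pct": None},
    {"heart_rate": 97.0, "temperature": 38.0, "crp": None, "pct": 1.2},
]


def missing_fraction(rows: list, feature: str) -> float:
    """Return the share of rows in which `feature` is absent or None."""
    missing = sum(1 for row in rows if row.get(feature) is None)
    return missing / len(rows)


missingness = {name: missing_fraction(records, name) for name in records[0]}
for name, fraction in missingness.items():
    print(f"{name}: {fraction:.0%} missing")
```

In a pattern like this, a model has far more usable signal from vital signs than from CRP or PCT, so its explanations tend to favor the vitals regardless of which marker is more specific to sepsis.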

🏗️ Structural Barriers Run Deeper

  • Reproducibility: Only 27% of papers shared code; 22% released datasets

  • Generalizability: 27% used local-only data; 38% lacked external validation

  • Real-world evidence: Only 2 of 37 studies were prospective, meaning about 95% were trained on retrospective data, which cannot establish causal relationships

In addition, 17 of 37 studies analyzed the same PhysioNet Challenge 2019 dataset, potentially creating an illusion of independent progress.

💡 The Path Forward

This review newly quantifies two parallel trends: rising use of explainability and a persistent gap in biological relevance. Closing that gap requires:

  1. Prospective designs capturing CRP, PCT, and other biomarkers at clinically meaningful time points

  2. Explanations reflecting pathophysiology: not just generic feature rankings, but insights clinicians can act on

  3. Transparency as standard: code and data sharing must become routine, not exceptional

  4. Clinician-validated interpretability: SHAP values are only useful if they improve bedside decision-making

🔬 Bottom Line

We're at an inflection point. Sepsis prediction models are getting more transparent, but their explanations remain constrained by what the health records conveniently capture.

The gap between algorithmic performance and biological relevance won't close with better neural networks alone. It requires reimagining how we collect and validate clinical data for machine learning.
