Carlos R. Ramírez Medina, Jose Benitez-Aurioles, David Jenkins | npj Digital Medicine | (2025)

Key Takeaways

Sample Definition And Size

The review included 44 studies that applied supervised machine learning to predict opioid-related adverse events. These studies were published between 2017 and 2023. The sample sizes varied widely across studies—for example, in opioid use disorder prediction studies, sample sizes ranged from 130,120 to 5,183,566 (mean ≈1,116,761; median ≈361,527) ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)).

Study Type

This is a systematic review of supervised machine learning prediction models for opioid-related harms. It follows PRISMA guidelines and uses the CHARMS and PROBAST tools for data extraction and risk-of-bias assessment ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)).

Conflicts Of Interest

The authors declared no competing interests ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)).

Results Summary

Key findings: Among the 44 studies, most originated from North America (96%), with only 7% reporting external validation. Common predicted outcomes included postoperative opioid use (15 studies, 34%), opioid overdose (8, 18%), opioid use disorder (8, 18%), and persistent opioid use (5, 11%) ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)). Model performance varied: AUC ranged from 0.68 to 0.96 across studies ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)). Calibration reporting was missing in 41% of studies ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)). Risk of bias assessment (PROBAST) found 16/44 studies had high risk in at least one domain, 19/44 were unclear in at least one domain, and only 9/44 had low risk across all domains ([nature.com](https://www.nature.com/articles/s41746-024-01312-4?utm_source=openai)).

Referenced In

🚨 AI Can Predict Opioid Death Risk… So Why Isn’t It Used in Clinics?

What began as an effort to treat PAIN has, in many parts of the world, evolved into a devastating public health crisis — with prescription opioids contributing to a growing number of opioid-related deaths.

A recent review compiled 44 machine learning studies attempting to predict opioid harm — yet almost NONE have made it into real clinical use.

Interestingly, model performance across these papers was moderate to strong, but 41% lacked proper calibration, meaning their risk predictions may not reflect real-world probabilities.
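To make the calibration point concrete, here is a minimal sketch with a made-up toy cohort (the numbers are illustrative assumptions, not data from the review). It shows how a model can rank patients perfectly and still predict about twice as many events as actually occur. The helper names `c_statistic` and `expected_observed_ratio` are hypothetical.

```python
def c_statistic(predicted_risks, outcomes):
    """Share of (event, non-event) pairs where the event got the higher risk."""
    pos = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    neg = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    pairs = [(a, b) for a in pos for b in neg]
    # Ties count as half-concordant, as in the usual AUC definition.
    concordant = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in pairs)
    return concordant / len(pairs)


def expected_observed_ratio(predicted_risks, outcomes):
    """Expected events (sum of predicted risks) divided by observed events."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events; ratio is undefined")
    return sum(predicted_risks) / observed


# Made-up toy cohort: 10 patients, 1 event (illustrative only).
predicted = [0.30, 0.20, 0.20, 0.10, 0.10, 0.05, 0.05, 0.40, 0.30, 0.30]
outcomes = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

auc = c_statistic(predicted, outcomes)
eo_ratio = expected_observed_ratio(predicted, outcomes)
print(f"c-statistic: {auc:.2f}")
print(f"expected/observed events: {eo_ratio:.1f}")
```

Discrimination (the c-statistic) and calibration (expected vs. observed) answer different questions, which is why a study reporting only AUC leaves the second one open.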

This UK study adds one more to the pile, using competing risk time-to-event models on over 1 million patients. It predicts opioid-related death with ~82% accuracy.

Top predictors include prior substance abuse, lung/liver co-morbidities, strong opioids at initiation, and gabapentinoid co-prescription.

What they did differently:

  • Predicted mortality rather than overdose

  • Implemented a competing risks framework that accounts for deaths from other causes

  • Tested whether deep learning helps: a 48,500-parameter neural network underperformed a simple LASSO regression

  • Acknowledged poor calibration in external validation, where models overestimated absolute risk by 2-7×, and designed percentile-based scores as a workaround

  • Built for deployment: EHR-native features, SHAP interpretability, no data leakage
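A rough sketch of why a percentile-based score can sidestep miscalibration: if a model inflates every absolute risk by some factor, the ranking of patients does not change, so their percentiles stay the same. The paper's exact scoring scheme is not described here, so the function `percentile_scores`, the toy risks, and the 3× inflation factor are all assumptions for illustration.

```python
from bisect import bisect_right


def percentile_scores(risks, reference_risks):
    """Percent (0-100) of the reference cohort with predicted risk <= each risk."""
    ref = sorted(reference_risks)
    return [100.0 * bisect_right(ref, r) / len(ref) for r in risks]


# Hypothetical absolute risks for five patients (illustrative only).
true_risks = [0.001, 0.002, 0.004, 0.010, 0.020]
# The same patients scored by a model that overestimates risk 3x.
inflated_risks = [3 * r for r in true_risks]

scores_true = percentile_scores(true_risks, true_risks)
scores_inflated = percentile_scores(inflated_risks, inflated_risks)

print(scores_true)
print(scores_inflated)
```

Both lists come out identical, because percentiles depend only on rank order. The trade-off is that a percentile says "riskier than most patients", not "X% chance of death".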

This new paper is a more rigorous model. Yet, it still may not reach patients.

Given its high recall and specificity but low precision, the model works best when used against its design. The "Implementation Irony" holds: it is trained to flag danger, yet it succeeds only at clearing safety.

It can suggest who probably will not die.

It cannot say who WILL.

And in the process, it overwhelms clinicians with false alarms.
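The arithmetic behind this is just Bayes' rule with a rare outcome. Here is a minimal sketch using hypothetical numbers (80% sensitivity, 80% specificity, 0.1% prevalence); these are assumptions chosen for illustration, not figures reported by the paper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and outcome prevalence."""
    tp = sensitivity * prevalence               # flagged and has the outcome
    fn = (1 - sensitivity) * prevalence         # cleared but has the outcome
    tn = specificity * (1 - prevalence)         # cleared and outcome-free
    fp = (1 - specificity) * (1 - prevalence)   # flagged but outcome-free
    ppv = tp / (tp + fp)  # chance a flagged patient truly has the outcome
    npv = tn / (tn + fn)  # chance a cleared patient is truly outcome-free
    return ppv, npv


# Hypothetical operating point for a rare outcome (illustrative only).
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.80, prevalence=0.001)
print(f"PPV: {ppv:.4f}")
print(f"NPV: {npv:.4f}")
```

Under these assumptions, fewer than 1 in 100 flagged patients would have the outcome, while more than 99.9% of cleared patients would not. When the outcome is rare, even a decent classifier is far better at ruling out than ruling in.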

⸂⸂⸜(രᴗര๑)⸝⸃⸃ Hey everyone!! 👋 Biomed engineering PhD student here — I always enjoy seeing how technology might actually translate into real healthcare impact. Anyways, this study recently caught my attention, and I’m curious to hear what others think.

🤔 Food for thought:

  • If simple models beat deep learning, why do we keep building bigger ones?

  • Is negative screening (identifying those safe to proceed) even useful to clinicians? 

  • Thus far, most models are based in the US or the UK. How far would the predictions shift in a new region or culture?
