Alif Munim, Adibvafa Fallahpour, Teodora Szasz | ArXiv.org | (2026)
Abstract
Foundation models for echocardiography often struggle to disentangle anatomical signal from the stochastic speckle and acquisition artifacts inherent to ultrasound. We present EchoJEPA, a foundation model trained on 18 million echocardiograms across 300K patients, representing the largest pretraining corpus for this modality to date. By leveraging a latent predictive objective, EchoJEPA learns robust anatomical representations that ignore speckle noise. We validate this using a novel multi-view probing framework with frozen backbones, where EchoJEPA outperforms leading baselines by approximately 20% in left ventricular ejection fraction (LVEF) estimation and 17% in right ventricular systolic pressure (RVSP) estimation. The model also exhibits remarkable sample efficiency, reaching 79% view classification accuracy with only 1% of labeled data versus 42% for the best baseline trained on 100%. Crucially, EchoJEPA demonstrates superior generalization, degrading by only 2% under physics-informed acoustic perturbations compared to 17% for competitors. Most remarkably, its zero-shot performance on pediatric patients surpasses fully fine-tuned baselines, establishing latent prediction as a superior paradigm for robust, generalizable medical AI.
Tags
Sample Definition And Size
The study pretrained EchoJEPA on 18 million echocardiograms from approximately 300,000 patients, representing the largest pretraining corpus for echocardiography to date ([arxiv.org](https://arxiv.org/abs/2602.02603?utm_source=openai)).
Study Type
This is a foundation model development study employing self-supervised learning with a latent predictive objective, evaluated via a novel multi-view probing framework using frozen backbones. It is not a clinical trial but a methodological AI model development and evaluation study ([arxiv.org](https://arxiv.org/abs/2602.02603?utm_source=openai)).
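Frozen-backbone probing, as described above, can be illustrated with a minimal sketch. This is not the paper's implementation: the random-projection encoder, the mean pooling across views, the ridge-regression probe, and every name and dimension below are assumptions chosen only to show the general pattern of training a small head on top of features from an encoder whose weights never change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained video encoder.
# Its weights are fixed and are never updated while the probe is fit.
FRAME_DIM = 32
EMBED_DIM = 16
W_frozen = rng.normal(size=(FRAME_DIM, EMBED_DIM))


def frozen_encode(clip):
    """Map one clip of shape (frames, FRAME_DIM) to an EMBED_DIM vector."""
    return np.tanh(clip @ W_frozen).mean(axis=0)


def study_embedding(views):
    """Average the frozen embeddings of all views in a study (assumed pooling)."""
    return np.mean([frozen_encode(v) for v in views], axis=0)


def fit_ridge_probe(X, y, lam=1e-2):
    """Closed-form ridge regression; these weights are the only trained parameters."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(X, w):
    """Apply the fitted probe to a matrix of study embeddings."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


# Synthetic data only: random "clips" and random targets, not real measurements.
n_studies, n_views, n_frames = 40, 3, 8
studies = [
    [rng.normal(size=(n_frames, FRAME_DIM)) for _ in range(n_views)]
    for _ in range(n_studies)
]
targets = rng.uniform(30.0, 70.0, size=n_studies)

W_before = W_frozen.copy()
X = np.stack([study_embedding(s) for s in studies])
w = fit_ridge_probe(X, targets)
preds = predict(X, w)
```

Because only the probe weights `w` are fit, any difference in probe accuracy between two backbones reflects the quality of their frozen features rather than differences in fine-tuning.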
Conflicts Of Interest
No conflict-of-interest or funding disclosures appear in the arXiv metadata or abstract, and no competing interests are declared. The acknowledgment of support from the Simons Foundation and member institutions appears to be arXiv's standard site-wide notice rather than a statement about this paper's funding ([arxiv.org](https://arxiv.org/abs/2602.02603)).
Results Summary
Key findings, as reported in the abstract: EchoJEPA outperforms leading baselines by approximately 20% in left ventricular ejection fraction (LVEF) estimation and by 17% in right ventricular systolic pressure (RVSP) estimation; it reaches 79% view classification accuracy with only 1% of the labeled data, versus 42% for the best baseline trained on 100%; its performance degrades by only about 2% under physics-informed acoustic perturbations, versus about 17% for competitors; and its zero-shot performance on pediatric patients surpasses that of fully fine-tuned baselines ([arxiv.org](https://arxiv.org/abs/2602.02603)).
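The abstract does not state how the LVEF and RVSP margins are computed. The sketch below shows two common conventions, a relative reduction in error and an absolute gap in percentage points. The function names are hypothetical, and the error values in the first call are illustrative placeholders, not figures from the paper. Only the 79% and 42% accuracies come from the abstract.

```python
def relative_reduction(baseline_error, model_error):
    """Fraction by which the model's error is lower than the baseline's."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


def absolute_gap_points(value_a, value_b):
    """Difference between two percentages, in percentage points."""
    return value_a - value_b


# Placeholder errors (NOT from the paper): a baseline error of 5.0 and a
# model error of 4.0 would correspond to a 20% relative reduction.
example_reduction = relative_reduction(5.0, 4.0)

# Accuracies from the abstract: 79% with 1% of labels vs 42% with 100%.
view_gap = absolute_gap_points(79, 42)

print(f"relative reduction: {example_reduction:.0%}")
print(f"view classification gap: {view_gap} percentage points")
```

Under these conventions the view classification result is a gap of 37 percentage points, while a figure such as "approximately 20%" reads more naturally as a relative change.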
Referenced In
Mercedes C.
Created: Mar 7, 2026