Document type: Article
Version: Published version
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/227851
Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations
Abstract
Robust assessment of artificial intelligence (AI) models in medical imaging is paramount for reliable clinical integration. This international collaborative review paper provides an overview of key evaluation metrics across diverse tasks, including classification, regression, survival analysis, detection, and segmentation, as well as specialized metrics for calibration, foundation models, large language models, and synthetic images. Challenges of comparing models statistically and translating metric scores to clinical practice are also discussed. For each section, the paper outlines fundamental metrics, identifies common pitfalls and misapplications, and offers recommendations for more robust evaluations. Key recommendations often involve utilizing multiple, complementary metrics tailored to the specific task and dataset properties, transparent reporting of methodology, and critically, considering the clinical utility and real-world implications of model performance. Ultimately, effective evaluation requires a comprehensive, context-aware approach that goes beyond statistical metrics to ensure.
Citation
KOCAK, Burak, et al. Evaluation metrics in medical imaging AI: fundamentals, pitfalls, misapplications, and recommendations. European Journal of Radiology Artificial Intelligence. 2025. Vol. 3, num. 100030. [consulted: 22 May 2026]. Available at: https://hdl.handle.net/2445/227851