Biased accuracy in multisite machine-learning studies due to incomplete removal of the effects of the site

Solanes, Aleix; Palau, Pol; Fortea, Lydia; Salvador, Raymond; González Navarro, Laura; Llach, Cristian; Valentí Ribas, Marc; Vieta i Pascual, Eduard; Radua, Joaquim

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/219882

Full metadata record

DC Field	Value	Language
dc.contributor.author	Solanes, Aleix	-
dc.contributor.author	Palau, Pol	-
dc.contributor.author	Fortea, Lydia	-
dc.contributor.author	Salvador, Raymond	-
dc.contributor.author	González Navarro, Laura	-
dc.contributor.author	Llach, Cristian	-
dc.contributor.author	Valentí Ribas, Marc	-
dc.contributor.author	Vieta i Pascual, Eduard, 1963-	-
dc.contributor.author	Radua, Joaquim	-
dc.date.accessioned	2025-03-20T13:53:02Z	-
dc.date.available	2025-03-20T13:53:02Z	-
dc.date.issued	2021-08-30	-
dc.identifier.issn	0925-4927	-
dc.identifier.uri	https://hdl.handle.net/2445/219882	-
dc.description.abstract	Brain MRI researchers conducting multisite studies, such as within the ENIGMA Consortium, are very aware of the importance of controlling the effects of the site (EoS) in the statistical analysis. In contrast, authors of novel machine-learning MRI studies may remove the EoS when training the machine-learning models but not control them when estimating the models' accuracy, potentially leading to severely biased estimates. We show examples from a toy simulation study and real MRI data in which we remove the EoS from both the "training set" and the "test set" during the training and application of the model. However, the accuracy is still inflated (or occasionally shrunk) unless we further control the EoS during the estimation of the accuracy. We also provide several methods for controlling the EoS during the estimation of the accuracy, and a simple R package ("multisite.accuracy") that performs this task seamlessly for several accuracy estimates (e.g., sensitivity/specificity, area under the curve, correlation, hazard ratio, etc.).	-
dc.format.extent	21 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Elsevier B.V.	-
dc.relation.isformatof	Postprint version of the document published in: https://doi.org/10.1016/j.pscychresns.2021.111313	-
dc.relation.ispartof	Psychiatry Research-Neuroimaging, 2021, vol. 314	-
dc.relation.uri	https://doi.org/10.1016/j.pscychresns.2021.111313	-
dc.rights	cc-by-nc-nd (c) Elsevier B.V., 2021	-
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	-
dc.source	Articles published in journals (Medicine)	-
dc.subject.classification	Machine learning	-
dc.subject.classification	Medical statistics	-
dc.subject.classification	Magnetic resonance imaging	-
dc.subject.other	Machine learning	-
dc.subject.other	Medical statistics	-
dc.subject.other	Magnetic resonance imaging	-
dc.title	Biased accuracy in multisite machine-learning studies due to incomplete removal of the effects of the site	-
dc.type	info:eu-repo/semantics/article	-
dc.type	info:eu-repo/semantics/acceptedVersion	-
dc.identifier.idgrec	717059	-
dc.date.updated	2025-03-20T13:53:02Z	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
dc.identifier.idimarina	9243705	-
dc.identifier.pmid	34098248	-
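The abstract's central point, that a site effect left in a model's predictions can inflate a pooled accuracy estimate, can be illustrated with a minimal sketch in Python. It assumes hypothetical toy data (not the paper's simulation or MRI data) in which the scores carry a residual site offset and the two sites have very different proportions of patients, so the scores separate sites rather than diagnoses. The helper names `auc` and `site_stratified_auc` are hypothetical and are not part of the "multisite.accuracy" R package. Restricting the AUC to within-site patient/control pairs is one simple way of controlling the EoS during accuracy estimation, assumed here for illustration; it is not necessarily one of the methods the authors propose.

```python
from collections import defaultdict


def auc(scores, labels):
    """Mann-Whitney estimate of the AUC: P(patient > control) + 0.5 * P(tie).

    Hypothetical helper for illustration, not the multisite.accuracy API.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def site_stratified_auc(scores, labels, sites):
    """AUC using only patient/control pairs from the same site.

    Assumed approach: per-site AUCs averaged with weights equal to the number
    of within-site pairs, so between-site comparisons never enter the estimate.
    """
    by_site = defaultdict(lambda: ([], []))
    for s, y, site in zip(scores, labels, sites):
        by_site[site][0].append(s)
        by_site[site][1].append(y)
    total_wins = 0.0
    total_pairs = 0
    for site_scores, site_labels in by_site.values():
        n_pos = sum(site_labels)
        n_neg = len(site_labels) - n_pos
        if n_pos == 0 or n_neg == 0:
            continue  # a single-class site has no within-site pairs
        pairs = n_pos * n_neg
        total_wins += auc(site_scores, site_labels) * pairs
        total_pairs += pairs
    return total_wins / total_pairs


# Hypothetical toy data: site A has 8 patients and 2 controls plus a residual
# +1 offset in the scores; site B has 2 patients and 8 controls and no offset.
# Within each site, patients and controls get the same spread of scores.
scores_a = [0.9, 1.1] * 4 + [0.9, 1.1]
labels_a = [1] * 8 + [0] * 2
scores_b = [-0.1, 0.1] + [-0.1, 0.1] * 4
labels_b = [1] * 2 + [0] * 8

scores = scores_a + scores_b
labels = labels_a + labels_b
sites = ["A"] * 10 + ["B"] * 10

print(auc(scores, labels))                         # 0.8
print(site_stratified_auc(scores, labels, sites))  # 0.5
```

Within each site the scores carry no diagnostic information (AUC = 0.5), yet the pooled AUC is 0.8 because site A contributes most of the patients and also has the higher scores.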
Appears in Collections:	Articles published in journals (Medicine); Articles published in journals (IDIBAPS: Institut d'investigacions Biomèdiques August Pi i Sunyer)

Files in This Item:

File	Description	Size	Format
244296.pdf		240.46 kB	Adobe PDF


This item is licensed under a Creative Commons License