A comparative analysis of tree-based models classifying imbalanced breath alcohol data

Alcañiz, Manuela; Santolino, Miguel; Ramon, Lluís

Please use this identifier to cite or link to this item: https://hdl.handle.net/2445/120281

Full metadata record

DC Field	Value	Language
dc.contributor.author	Alcañiz, Manuela	-
dc.contributor.author	Santolino, Miguel	-
dc.contributor.author	Ramon, Lluís	-
dc.date.accessioned	2018-02-27T09:49:03Z	-
dc.date.available	2018-02-27T09:49:03Z	-
dc.date.issued	2017	-
dc.identifier.issn	1889-3805	-
dc.identifier.uri	https://hdl.handle.net/2445/120281	-
dc.description.abstract	When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when one single class includes the majority of cases, good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification.	-
dc.format.extent	34 p.	-
dc.format.mimetype	application/pdf	-
dc.language.iso	eng	-
dc.publisher	Sociedad de Estadística e Investigación Operativa	-
dc.relation.isformatof	Reproduction of the document published at: http://www.seio.es/BBEIO/BEIOVol33Num3/index.html#10	-
dc.relation.ispartof	Boletín de Estadística e Investigación Operativa, 2017, vol. 33, num. 3, p. 189-222	-
dc.rights	(c) Sociedad de Estadística e Investigación Operativa, 2017	-
dc.source	Articles publicats en revistes (Econometria, Estadística i Economia Aplicada)	-
dc.subject.classification	Consum d'alcohol	-
dc.subject.classification	Mostreig (Estadística)	-
dc.subject.classification	Algorismes	-
dc.subject.other	Drinking of alcoholic beverages	-
dc.subject.other	Sampling (Statistics)	-
dc.subject.other	Algorithms	-
dc.title	A comparative analysis of tree-based models classifying imbalanced breath alcohol data	-
dc.type	info:eu-repo/semantics/article	-
dc.type	info:eu-repo/semantics/publishedVersion	-
dc.identifier.idgrec	674019	-
dc.date.updated	2018-02-27T09:49:03Z	-
dc.rights.accessRights	info:eu-repo/semantics/openAccess	-
Appears in Collections:	Articles publicats en revistes (Econometria, Estadística i Economia Aplicada)

Files in This Item:

File	Description	Size	Format
674019.pdf		333.9 kB	Adobe PDF
