Document type

Article

Version

Published version

All rights reserved

Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/120281

A comparative analysis of tree-based models classifying imbalanced breath alcohol data

Abstract

When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when a single class includes the majority of cases, good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification.
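The down-sampling strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `down_sample`, the toy records and the 950/50 class split are all hypothetical, and only the Python standard library is used. The idea is to discard majority-class rows at random until every class is as large as the rarest one, which also shrinks the training set and so tends to cut computing time.

```python
import random
from collections import Counter


def down_sample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class.

    Hypothetical helper for illustration; returns new (rows, labels) lists.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Size of the rarest class: every class is reduced to this size.
    target = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)

    # Shuffle so that the classes are not stored in contiguous blocks.
    order = list(range(len(out_rows)))
    rng.shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]


# Assumed toy data: 950 negative tests (0) and 50 positive tests (1).
labels = [0] * 950 + [1] * 50
rows = [{"hour": i % 24, "built_up": i % 2 == 0} for i in range(1000)]

bal_rows, bal_labels = down_sample(rows, labels)
print(Counter(bal_labels))  # each class now has 50 rows
```

A tree-based classifier would then be trained on `bal_rows` and `bal_labels` and evaluated on an untouched, still imbalanced test set, so that the reported performance reflects the real class proportions.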

Citation

ALCAÑIZ, Manuela, SANTOLINO, Miguel, RAMON, Lluís. A comparative analysis of tree-based models classifying imbalanced breath alcohol data. _Boletín de Estadística e Investigación Operativa_. 2017. Vol. 33, no. 3, pp. 189-222. [accessed: 27 January 2026]. ISSN: 1889-3805. [Available at: https://hdl.handle.net/2445/120281]
