Document type: Article
Version: Published version
All rights reserved
Please always use this identifier to cite or link to this document: https://hdl.handle.net/2445/120281
A comparative analysis of tree-based models classifying imbalanced breath alcohol data
Abstract
When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when a single class includes the majority of cases, good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification.
Citation
ALCAÑIZ, Manuela, SANTOLINO, Miguel, RAMON, Lluís. A comparative analysis of tree-based models classifying imbalanced breath alcohol data. _Boletín de Estadística e Investigación Operativa_. 2017. Vol. 33, no. 3, pp. 189-222. [accessed: 27 January 2026]. ISSN: 1889-3805. [Available at: https://hdl.handle.net/2445/120281]