Please use this identifier to cite or link to this item:
http://hdl.handle.net/2445/120281
Title: | A comparative analysis of tree-based models classifying imbalanced breath alcohol data |
Author: | Alcañiz, Manuela; Santolino, Miguel; Ramon, Lluís |
Keywords: | Drinking of alcoholic beverages; Sampling (Statistics); Algorithms |
Issue Date: | 2017 |
Publisher: | Sociedad de Estadística e Investigación Operativa |
Abstract: | When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when a single class includes the majority of cases, good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification. |
Note: | Reproduction of the document published at: http://www.seio.es/BBEIO/BEIOVol33Num3/index.html#10 |
It is part of: | Boletín de Estadística e Investigación Operativa, 2017, vol. 33, num. 3, p. 189-222 |
URI: | http://hdl.handle.net/2445/120281 |
ISSN: | 1889-3805 |
Appears in Collections: | Articles published in journals (Econometrics, Statistics and Applied Economics) |
Files in This Item:
File | Description | Size | Format
---|---|---|---
674019.pdf | | 333.9 kB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.