A comparative analysis of tree-based models classifying imbalanced breath alcohol data
| dc.contributor.author | Alcañiz, Manuela | |
| dc.contributor.author | Santolino, Miguel | |
| dc.contributor.author | Ramon, Lluís | |
| dc.date.accessioned | 2018-02-27T09:49:03Z | |
| dc.date.available | 2018-02-27T09:49:03Z | |
| dc.date.issued | 2017 | |
| dc.date.updated | 2018-02-27T09:49:03Z | |
| dc.description.abstract | When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when one single class includes the majority of cases, good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification. | |
| dc.format.extent | 34 p. | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.idgrec | 674019 | |
| dc.identifier.issn | 1889-3805 | |
| dc.identifier.uri | https://hdl.handle.net/2445/120281 | |
| dc.language.iso | eng | |
| dc.publisher | Sociedad de Estadística e Investigación Operativa | |
| dc.relation.isformatof | Reproduction of the document published at: http://www.seio.es/BBEIO/BEIOVol33Num3/index.html#10 | |
| dc.relation.ispartof | Boletín de Estadística e Investigación Operativa, 2017, vol. 33, num. 3, p. 189-222 | |
| dc.rights | (c) Sociedad de Estadística e Investigación Operativa, 2017 | |
| dc.rights.accessRights | info:eu-repo/semantics/openAccess | |
| dc.source | Articles published in journals (Econometria, Estadística i Economia Aplicada) | |
| dc.subject.classification | Consum d'alcohol | |
| dc.subject.classification | Mostreig (Estadística) | |
| dc.subject.classification | Algorismes | |
| dc.subject.other | Drinking of alcoholic beverages | |
| dc.subject.other | Sampling (Statistics) | |
| dc.subject.other | Algorithms | |
| dc.title | A comparative analysis of tree-based models classifying imbalanced breath alcohol data | |
| dc.type | info:eu-repo/semantics/article | |
| dc.type | info:eu-repo/semantics/publishedVersion | |