Please use this identifier to cite or link to this item:
http://hdl.handle.net/2445/120281
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Alcañiz, Manuela | - |
dc.contributor.author | Santolino, Miguel | - |
dc.contributor.author | Ramon, Lluís | - |
dc.date.accessioned | 2018-02-27T09:49:03Z | - |
dc.date.available | 2018-02-27T09:49:03Z | - |
dc.date.issued | 2017 | - |
dc.identifier.issn | 1889-3805 | - |
dc.identifier.uri | http://hdl.handle.net/2445/120281 | - |
dc.description.abstract | When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when one single class includes the majority of cases, a good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if a high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification. | - |
dc.format.extent | 34 p. | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | eng | - |
dc.publisher | Sociedad de Estadística e Investigación Operativa | - |
dc.relation.isformatof | Reproduction of the document published at: http://www.seio.es/BBEIO/BEIOVol33Num3/index.html#10 | - |
dc.relation.ispartof | Boletín de Estadística e Investigación Operativa, 2017, vol. 33, num. 3, p. 189-222 | - |
dc.rights | (c) Sociedad de Estadística e Investigación Operativa, 2017 | - |
dc.source | Articles publicats en revistes (Econometria, Estadística i Economia Aplicada) | - |
dc.subject.classification | Consum d'alcohol | - |
dc.subject.classification | Mostreig (Estadística) | - |
dc.subject.classification | Algorismes | - |
dc.subject.other | Drinking of alcoholic beverages | - |
dc.subject.other | Sampling (Statistics) | - |
dc.subject.other | Algorithms | - |
dc.title | A comparative analysis of tree-based models classifying imbalanced breath alcohol data | - |
dc.type | info:eu-repo/semantics/article | - |
dc.type | info:eu-repo/semantics/publishedVersion | - |
dc.identifier.idgrec | 674019 | - |
dc.date.updated | 2018-02-27T09:49:03Z | - |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | - |
Appears in Collections: | Articles publicats en revistes (Econometria, Estadística i Economia Aplicada) |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
674019.pdf | | 333.9 kB | Adobe PDF | View/Open |