A comparative analysis of tree-based models classifying imbalanced breath alcohol data

dc.contributor.author Alcañiz, Manuela
dc.contributor.author Santolino, Miguel
dc.contributor.author Ramon, Lluís
dc.date.accessioned 2018-02-27T09:49:03Z
dc.date.available 2018-02-27T09:49:03Z
dc.date.issued 2017
dc.date.updated 2018-02-27T09:49:03Z
dc.description.abstract When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when a single class includes the majority of cases, good predictive performance for the minority class is difficult to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice when high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification.
dc.format.extent 34 p.
dc.format.mimetype application/pdf
dc.identifier.idgrec 674019
dc.identifier.issn 1889-3805
dc.identifier.uri https://hdl.handle.net/2445/120281
dc.language.iso eng
dc.publisher Sociedad de Estadística e Investigación Operativa
dc.relation.isformatof Reproduction of the document published at: http://www.seio.es/BBEIO/BEIOVol33Num3/index.html#10
dc.relation.ispartof Boletín de Estadística e Investigación Operativa, 2017, vol. 33, num. 3, p. 189-222
dc.rights (c) Sociedad de Estadística e Investigación Operativa, 2017
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.source Articles publicats en revistes (Econometria, Estadística i Economia Aplicada)
dc.subject.classification Consum d'alcohol
dc.subject.classification Mostreig (Estadística)
dc.subject.classification Algorismes
dc.subject.other Drinking of alcoholic beverages
dc.subject.other Sampling (Statistics)
dc.subject.other Algorithms
dc.title A comparative analysis of tree-based models classifying imbalanced breath alcohol data
dc.type info:eu-repo/semantics/article
dc.type info:eu-repo/semantics/publishedVersion
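The abstract mentions sampling and cost-sensitive strategies for imbalanced binary data. As a generic illustration only (this is not the authors' code; their exact procedures are described in the PDF), the sketch below shows two common techniques: random down-sampling of the majority class, and inverse-frequency class weights as one possible cost-sensitive heuristic. The function names, the `ratio` parameter and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_downsample(X, y, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class rows so that about ratio * n_minority
    majority rows remain. All minority rows are kept. (Hypothetical helper.)"""
    rng = random.Random(seed)
    maj_idx = [i for i, lab in enumerate(y) if lab == majority_label]
    min_idx = [i for i, lab in enumerate(y) if lab != majority_label]
    n_keep = min(len(maj_idx), int(ratio * len(min_idx)))
    # Sample majority indices without replacement, then restore original order.
    kept = sorted(rng.sample(maj_idx, n_keep) + min_idx)
    return [X[i] for i in kept], [y[i] for i in kept]


def inverse_frequency_weights(y):
    """Weight each class by n_samples / (n_classes * class_count), so rarer
    classes cost more when misclassified. (Hypothetical helper.)"""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {lab: n / (k * c) for lab, c in counts.items()}


if __name__ == "__main__":
    # Toy data: 95 negative tests (label 0) and 5 positive tests (label 1).
    X = [[i] for i in range(100)]
    y = [0] * 95 + [1] * 5
    X_bal, y_bal = random_downsample(X, y, majority_label=0)
    print(len(X_bal), Counter(y_bal))
    print(inverse_frequency_weights(y))
```

Down-sampling discards majority rows and therefore shrinks the training set, which is one reason it can reduce computing time; class weighting keeps every row and instead changes how much each misclassification costs the learner.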

Files

Original bundle

Name: 674019.pdf
Size: 333.9 KB
Format: Adobe Portable Document Format