Journal Article

Citation

Li Y, Yang Z, Xing L, Yuan C, Liu F, Wu D, Yang H. Accid. Anal. Prev. 2023; 192: e107271.

Copyright

(Copyright © 2023, Elsevier Publishing)

DOI

10.1016/j.aap.2023.107271

PMID

37659275

Abstract

For each road crash event, it is necessary to predict its injury severity. However, predicting crash injury severity from imbalanced data frequently results in an ineffective classifier. Because severe injuries are rare in road traffic crashes, crash data are extremely imbalanced among injury severity classes, which makes prediction models difficult to train. To achieve interclass balance, minority class samples can be generated using data augmentation techniques. To address the imbalance in crash injury severity data, this study applies a novel deep learning method, the Wasserstein generative adversarial network with gradient penalty (WGAN-GP), to a large crash dataset; the method generates synthetic injury severity records linked to traffic crashes to rebalance the dataset. To evaluate the effectiveness of the WGAN-GP model, we systematically compare its performance with that of several commonly used sampling techniques (random under-sampling, random over-sampling, synthetic minority over-sampling technique and adaptive synthetic sampling) with respect to dataset balance and crash injury severity prediction. After rebalancing the dataset, this study classifies crash injury severity using logistic regression, multilayer perceptron, random forest, AdaBoost and XGBoost. The area under the ROC curve (AUC), specificity and sensitivity are employed as evaluation indicators to compare prediction performance.
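As a rough illustration of what rebalancing means here, the sketch below implements random over-sampling, the simplest of the baseline techniques named in the abstract. It is not the authors' code and it is not WGAN-GP; the function name `random_oversample`, the toy data, and the label coding (1 for severe, 0 for non-severe) are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 8 non-severe crashes (0) and 2 severe crashes (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
counts = Counter(balanced_labels)  # both classes now have 8 rows
```

Random over-sampling only repeats existing minority records, whereas generative approaches such as the WGAN-GP described above aim to produce new synthetic records.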

RESULTS demonstrate that sampling techniques can considerably improve the prediction performance of minority classes in an imbalanced dataset, and the combination of XGBoost and WGAN-GP performs best with an AUC of 0.794 and a sensitivity of 0.698. Finally, the interpretability of the model is improved by the explainable machine learning technique SHAP (SHapley Additive exPlanations), allowing for a deeper understanding of the effects of each variable on crash injury severity.
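For readers unfamiliar with the reported indicators, the following is a minimal sketch of how sensitivity, specificity and AUC can be computed for a binary classifier. The helper names, the toy labels and scores, the 0.5 decision threshold, and the coding of the severe class as 1 are assumptions for illustration; the numbers are unrelated to the study's results.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 2 severe (1) and 4 non-severe (0) crashes.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]          # predicted probability of "severe"
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity measures how many severe cases are caught, which is why it matters most when the severe class is rare.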

FINDINGS of this study shed light on the prediction of crash injury severity with data imbalance using data-driven approaches.


Language: en

Keywords

Imbalanced data; Crash injury severity; Generative adversarial network; Sampling technique
