SAFETYLIT WEEKLY UPDATE

Journal Article

Citation

Hara K, Kataoka H, Inaba M, Narioka K, Hotta R, Satoh Y. Predicting appearance of vehicles from blind spots based on pedestrian behaviors at crossroads. IEEE Trans. Intell. Transp. Syst. 2022; 23(8): 11917-11929.

Copyright

(Copyright © 2022, IEEE (Institute of Electrical and Electronics Engineers))

DOI

10.1109/TITS.2021.3108795

PMID

unavailable

Abstract

Conventional prediction approaches for traffic scenes primarily predict the future states of visible objects (i.e., not in blind spots) based on their current observations. This study focused on predicting the future states of objects in blind spots (e.g., those outside the field of view or in occluded regions) based on the current observations of other visible objects. We proposed a method that predicts the appearance of vehicles from a blind spot based on the behaviors of visible pedestrians who can see the vehicles in the blind spot. Our proposed method utilizes a spatiotemporal 3D convolutional neural network and learns pedestrian behaviors for prediction. The method explicitly represents the subtle motions and surrounding environments of pedestrians using pose estimation and semantic segmentation. To conduct evaluation experiments, we built two datasets of videos capturing real traffic scenes, collected by cameras with and without ego-motion. Using these datasets, we conducted experiments not only in simple configurations but also in realistic traffic environments. The experimental results support the following conclusions: (i) our proposed method achieved performance comparable to that of humans on our prediction task and predicted the appearance of vehicles from blind spots more than 1.5 s before they actually appeared; (ii) explicit representations of pose and semantic masks captured information complementary to RGB videos, and ensembling the representations improved prediction performance; (iii) fine-tuning the models on videos with ego-motion is important for achieving good predictions on videos captured from driving cars.
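
Editor's illustration. The abstract describes a multi-stream design: separate spatiotemporal 3D CNNs over RGB video, pedestrian pose representations, and semantic segmentation masks, whose outputs are ensembled into a single prediction of whether a vehicle will appear from a blind spot. The paper's own code is not reproduced here; the following is a minimal PyTorch sketch of that general pattern only. All module names, channel counts, the single-channel encodings of pose and segmentation, and the simple logit-averaging fusion are assumptions for illustration, not the authors' implementation.

    # Hedged sketch: multi-stream spatiotemporal 3D CNN ensemble.
    # Stream structure and fusion rule are assumptions, not the authors' code.
    import torch
    import torch.nn as nn

    class Stream3DCNN(nn.Module):
        """Small 3D CNN over a clip of shape (B, C, T, H, W); emits one logit."""
        def __init__(self, in_channels: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(2),
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),  # global spatiotemporal pooling
            )
            self.head = nn.Linear(64, 1)  # logit: vehicle will appear / will not

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            x = self.features(clip).flatten(1)  # (B, 64)
            return self.head(x)                 # (B, 1)

    class BlindSpotEnsemble(nn.Module):
        """Averages per-modality logits (RGB, pose map, semantic mask)."""
        def __init__(self):
            super().__init__()
            self.rgb = Stream3DCNN(in_channels=3)   # raw video frames
            self.pose = Stream3DCNN(in_channels=1)  # hypothetical rendered pose/keypoint map
            self.seg = Stream3DCNN(in_channels=1)   # hypothetical rendered segmentation map

        def forward(self, rgb, pose, seg):
            logits = (self.rgb(rgb) + self.pose(pose) + self.seg(seg)) / 3.0
            return torch.sigmoid(logits)  # P(vehicle appears from the blind spot)

    if __name__ == "__main__":
        B, T, H, W = 2, 16, 112, 112  # arbitrary clip size for the demo
        model = BlindSpotEnsemble()
        p = model(torch.randn(B, 3, T, H, W),
                  torch.randn(B, 1, T, H, W),
                  torch.randn(B, 1, T, H, W))
        print(p.shape)  # torch.Size([2, 1])

Late fusion by averaging per-stream logits is one simple reading of "ensembling the representations"; the paper may combine modalities differently (e.g., feature-level fusion or learned weights).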


Language: en

Keywords

Accidents; action recognition; cameras; deep learning; future prediction; semantics; spatiotemporal 3D convolution; spatiotemporal phenomena; three-dimensional displays; vehicles; videos
