SAFETYLIT WEEKLY UPDATE

Journal Article

Citation

Hara K, Kataoka H, Inaba M, Narioka K, Hotta R, Satoh Y. Predicting appearance of vehicles from blind spots based on pedestrian behaviors at crossroads. IEEE Trans. Intell. Transp. Syst. 2022; 23(8): 11917-11929.

Copyright

(Copyright © 2022, IEEE (Institute of Electrical and Electronics Engineers))

DOI

10.1109/TITS.2021.3108795

PMID

unavailable

Abstract

Conventional prediction approaches for traffic scenes primarily predict the future states of visible objects (i.e., not in blind spots) based on their current observations. This study focused on predicting the future states of objects in blind spots (e.g., those outside the field of view or in occluded regions) based on the current observations of other visible objects. We proposed a method that predicts the appearance of vehicles from a blind spot based on the behaviors of visible pedestrians who can see the vehicles in the blind spot. Our proposed method utilizes a spatiotemporal 3D convolutional neural network and learns pedestrian behaviors for prediction. The method explicitly represents the subtle motions and surrounding environments of pedestrians using pose estimation and semantic segmentation. To conduct evaluation experiments, we built two datasets of videos capturing real traffic scenes, collected by cameras with and without ego-motion. Using these datasets, we conducted experiments not only in simple configurations but also in realistic traffic environments. The experimental results support the following conclusions: (i) our proposed method achieved performance comparable to that of humans on our prediction task and predicted the appearance of vehicles from blind spots more than 1.5 s before they actually appeared; (ii) explicit representations of pose and semantic masks captured information complementary to RGB videos, and ensembling the representations improved prediction performance; (iii) fine-tuning the models on videos with ego-motion is important for achieving good predictions on videos captured from driving cars.
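
Editor's illustration. The abstract describes a multi-stream design: separate spatiotemporal 3D CNNs over RGB video, pedestrian pose representations, and semantic segmentation masks, whose outputs are ensembled into a single prediction of whether a vehicle will appear from a blind spot. The paper's own code is not reproduced here; the following is a minimal PyTorch sketch of that general pattern only. All module names, channel counts, the single-channel encodings of pose and segmentation, and the simple logit-averaging fusion are assumptions for illustration, not the authors' implementation.

    # Hedged sketch: multi-stream spatiotemporal 3D CNN ensemble.
    # Stream structure and fusion rule are assumptions, not the authors' code.
    import torch
    import torch.nn as nn

    class Stream3DCNN(nn.Module):
        """Small 3D CNN over a clip of shape (B, C, T, H, W); emits one logit."""
        def __init__(self, in_channels: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(2),
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d(1),  # global spatiotemporal pooling
            )
            self.head = nn.Linear(64, 1)  # logit: vehicle will appear / will not

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            x = self.features(clip).flatten(1)  # (B, 64)
            return self.head(x)                 # (B, 1)

    class BlindSpotEnsemble(nn.Module):
        """Averages per-modality logits (RGB, pose map, semantic mask)."""
        def __init__(self):
            super().__init__()
            self.rgb = Stream3DCNN(in_channels=3)   # raw video frames
            self.pose = Stream3DCNN(in_channels=1)  # hypothetical rendered pose/keypoint map
            self.seg = Stream3DCNN(in_channels=1)   # hypothetical rendered segmentation map

        def forward(self, rgb, pose, seg):
            logits = (self.rgb(rgb) + self.pose(pose) + self.seg(seg)) / 3.0
            return torch.sigmoid(logits)  # P(vehicle appears from the blind spot)

    if __name__ == "__main__":
        B, T, H, W = 2, 16, 112, 112  # arbitrary clip size for the demo
        model = BlindSpotEnsemble()
        p = model(torch.randn(B, 3, T, H, W),
                  torch.randn(B, 1, T, H, W),
                  torch.randn(B, 1, T, H, W))
        print(p.shape)  # torch.Size([2, 1])

Late fusion by averaging per-stream logits is one simple reading of "ensembling the representations"; the paper may combine modalities differently (e.g., feature-level fusion or learned weights).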


Language: en

Keywords

Accidents; action recognition; cameras; deep learning; future prediction; semantics; spatiotemporal 3D convolution; spatiotemporal phenomena; three-dimensional displays; vehicles; videos
