Visual attention framework for identifying semantic information from construction monitoring video

Zhong, Botao; Shen, Luoxin; Pan, Xing; Lei, Lei

doi:10.1016/j.ssci.2023.106122

Journal Article

Citation	Zhong B, Shen L, Pan X, Lei L. Safety Sci. 2023; 163: e106122.
Copyright	(Copyright © 2023, Elsevier Publishing)
DOI	10.1016/j.ssci.2023.106122
PMID	unavailable
Abstract	Construction safety management has been extensively investigated, and cameras are widely used to monitor worker performance on construction sites. However, manually analyzing large quantities of video or image data is time-consuming and labor-intensive. Existing studies mostly focus on identifying single elements in videos or images, while deeper semantic understanding of the construction scene as a whole remains limited. Drawing on the attention mechanism, this study proposes a framework that identifies semantic information such as multiple objects, relationships, and attributes from construction videos. The framework follows a two-step modeling approach: (1) a frame extraction model with an interframe difference mechanism extracts frames/images from construction videos, and (2) an image scene understanding model that integrates a ResNet101 "encoder" and an LSTM + Attention "decoder" generates natural language descriptions of the semantic information in those frames/images. The framework is validated through multiple experiments on offline image datasets of construction scenes. The contributions of this research are twofold: (1) the proposed visual attention framework represents a significant, data-driven advancement in cross-modal processing from construction video to images to natural language descriptions; (2) the automatic generation of video semantic information facilitates construction safety management tasks such as estimating workers' safety state and retrieving and storing monitoring video/images. Language: en
Keywords	Construction safety management; Frame extraction; Monitoring video; Scene understanding; Semantic information; Visual attention framework
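
The first step named in the abstract, frame extraction with an interframe difference mechanism, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_key_frames`, the mean-absolute-difference score, and the `threshold` value are all assumptions chosen for illustration, and the record does not say how the paper scores or thresholds frame changes.

```python
import numpy as np


def select_key_frames(frames, threshold=10.0):
    """Return indices of frames that differ noticeably from their predecessor.

    Hypothetical helper: the score is the mean absolute pixel difference
    between consecutive grayscale frames, and `threshold` is an assumed
    tuning parameter, not a value taken from the paper.
    """
    if len(frames) == 0:
        return []
    kept = [0]  # always keep the first frame as a starting reference
    for i in range(1, len(frames)):
        # Cast to float so that uint8 subtraction cannot wrap around.
        previous = frames[i - 1].astype(np.float32)
        current = frames[i].astype(np.float32)
        score = float(np.mean(np.abs(current - previous)))
        if score > threshold:
            kept.append(i)
    return kept


# Synthetic 4x4 grayscale clip: the scene changes at frame 2 and again at frame 4.
static = np.zeros((4, 4), dtype=np.uint8)
changed = np.full((4, 4), 50, dtype=np.uint8)
clip = [static, static, changed, changed, static]

key_frames = select_key_frames(clip, threshold=10.0)
print(key_frames)  # [0, 2, 4]
```

In a real pipeline the selected frames would then be passed to the image scene understanding model; comparing each frame with its predecessor is only one of several possible interframe difference schemes.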