Research on behavioral decision at an unsignalized roundabout for automatic driving based on proximal policy optimization algorithm

Gan, Jingpeng; Zhang, Jiancheng; Liu, Yuansheng

doi:10.3390/app14072889

Journal Article

Citation	Gan J, Zhang J, Liu Y. Appl. Sci. (Basel) 2024; 14(7): e2889.
Copyright	© 2024, MDPI (Multidisciplinary Digital Publishing Institute)
DOI	10.3390/app14072889
PMID	unavailable
Abstract	Unsignalized roundabouts have a significant impact on traffic flow and vehicle safety. To help autonomous vehicles pass through roundabouts efficiently, safely, and stably when their penetration rate is low, we apply the proximal policy optimization (PPO) algorithm to behavioral decision-making. We develop an optimization-based behavioral choice model for autonomous vehicles that combines gap acceptance theory with deep reinforcement learning using the PPO algorithm. Additionally, we employ the CoordConv network to build an aerial view for gathering spatial perception information. Furthermore, a dynamic multi-objective reward mechanism is introduced to maximize the PPO algorithm's reward pool function while quantifying environmental factors. Through simulation experiments, we demonstrate that our optimized PPO algorithm significantly improves training efficiency, raising the reward value function by 2.85%, 7.17%, and 19.58% in scenarios with 20, 100, and 200 social vehicles, respectively, compared to the PPO+CCMR algorithm. The effectiveness of simulation training also increases by 11.1%, 13.8%, and 7.4%, and crossing time falls by 2.37%, 2.62%, and 13.96%. Our optimized PPO algorithm improves path selection during simulation training: autonomous vehicles increasingly tend to drive in the inner ring over time. However, the influence of social vehicles on path selection diminishes as their number increases. The safety of autonomous vehicles remains largely unaffected by our optimized PPO algorithm.
Language	en
Keywords	autonomous vehicle; deep reinforcement learning; gap acceptance theory; optimized PPO algorithm; unsignalized roundabout
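
For readers unfamiliar with PPO, the clipped surrogate objective at the core of the standard algorithm can be sketched as follows. This is a minimal illustration of generic PPO only, not the authors' optimized variant: the function name, the clip_eps default of 0.2, and the plain-list inputs are assumptions made for this example, and the record gives no details of the paper's reward mechanism, CoordConv network, or gap acceptance model.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the clip_eps default are illustrative assumptions;
    they are not taken from the paper summarized above.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds 1 + clip_eps, which limits how far a single update can move the policy away from the one that collected the data.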