TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

Oct 1010, 10100·
Daeun Song, Jing Liang, Xuesu Xiao, Dinesh Manocha
Abstract
We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in challenging scenarios with unstructured off-road features like buildings, grass, and curbs. Our goal is to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) match human-like paths when navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We then use Vision Language Models (VLMs) with a visual prompting approach, relying on their zero-shot semantic understanding and logical reasoning, to choose the best trajectory given contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare its performance with other global navigation algorithms. In practice, we observe an improvement of at least 3.35% in traversability and 20.61% in human-like navigation in the generated trajectories in challenging outdoor navigation scenarios such as sidewalks and crosswalks.
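The generate-then-select structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `decode_candidates` is a hypothetical stand-in for a trained CVAE decoder (it bends a straight line by a sampled latent value), `blocked` is an assumed traversability check, and `score` is a placeholder for the VLM's choice among visually prompted candidates. All names and parameters here are assumptions made for illustration.

```python
import math
import random


def decode_candidates(start, goal, num_samples, rng):
    """Hypothetical stand-in for a CVAE decoder: one latent sample per trajectory."""
    candidates = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)  # latent sample; a real CVAE would decode z with a network
        traj = []
        for i in range(11):
            t = i / 10
            x = start[0] + t * (goal[0] - start[0])
            y = start[1] + t * (goal[1] - start[1])
            y += z * math.sin(math.pi * t)  # bow the path sideways by the latent value
            traj.append((x, y))
        candidates.append(traj)
    return candidates


def is_traversable(traj, blocked):
    """Assumed constraint: no waypoint may fall in a blocked region."""
    return all(not blocked(x, y) for x, y in traj)


def plan(start, goal, blocked, score, num_samples=20, seed=0):
    """Generate candidates, drop non-traversable ones, return the best-scoring one."""
    rng = random.Random(seed)
    candidates = decode_candidates(start, goal, num_samples, rng)
    feasible = [t for t in candidates if is_traversable(t, blocked)]
    if not feasible:
        return None
    # Placeholder for the VLM selection step: pick the highest-scoring candidate.
    return max(feasible, key=score)


# Usage: an obstacle sits on the straight line between start and goal.
def obstacle(x, y):
    return 4 <= x <= 6 and -0.5 <= y <= 0.5


def smallest_detour(traj):
    return -max(abs(y) for _, y in traj)


best = plan((0.0, 0.0), (10.0, 0.0), obstacle, smallest_detour)
```

In this toy setup the scoring function simply prefers the smallest detour; in the approach the abstract describes, that decision is instead made by a VLM reasoning over the scene and the task context.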
Type