TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

Oct 1010, 10100·

Daeun Song

Jing Liang

Xuesu Xiao

Dinesh Manocha




Abstract

We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in challenging scenarios with unstructured off-road features such as buildings, grass, and curbs. Our goal is to compute trajectories that (1) satisfy environment-specific traversability constraints and (2) follow human-like paths when navigating crosswalks, sidewalks, and similar areas. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model, enhanced with traversability constraints, to generate multiple candidate trajectories for global navigation. We then use vision language models (VLMs) with a visual prompting approach, relying on their zero-shot semantic understanding and logical reasoning, to choose the best trajectory given contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare its performance with other global navigation algorithms. In practice, we observe at least a 3.35% improvement in traversability and a 20.61% improvement in human-like navigation for the generated trajectories in challenging outdoor scenarios such as sidewalks and crosswalks.
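The generate-then-select structure described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: `decode` is a hand-written stand-in for a learned CVAE decoder, `on_sidewalk` is a hypothetical traversability test, and `select` replaces the VLM visual-prompting step with a simple rule (highest traversable fraction, then shortest path). All names and parameters here are assumptions made for illustration.

```python
import math
import random


def decode(z, goal, steps=20):
    """Toy stand-in for a CVAE decoder (assumption, not the learned model):
    the scalar latent z bends the straight start-to-goal line sideways."""
    gx, gy = goal
    length = math.hypot(gx, gy)
    nx, ny = -gy / length, gx / length  # unit normal to the start-goal line
    path = []
    for i in range(steps + 1):
        t = i / steps
        bend = z * math.sin(math.pi * t)  # vanishes at both endpoints
        path.append((t * gx + bend * nx, t * gy + bend * ny))
    return path


def sample_candidates(goal, n=8, seed=0):
    """Draw n latents from a standard normal prior and decode each one."""
    rng = random.Random(seed)
    return [decode(rng.gauss(0.0, 1.0), goal) for _ in range(n)]


def traversable_fraction(path, is_traversable):
    """Fraction of waypoints that lie on traversable ground."""
    return sum(1 for p in path if is_traversable(p)) / len(path)


def path_length(path):
    """Total Euclidean length of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def select(candidates, is_traversable):
    """Placeholder for the VLM selection step (assumption): prefer the most
    traversable candidate, breaking ties by shorter length."""
    return max(
        candidates,
        key=lambda p: (traversable_fraction(p, is_traversable), -path_length(p)),
    )


def on_sidewalk(point):
    """Hypothetical traversability test: a sidewalk corridor with |y| <= 0.5."""
    return abs(point[1]) <= 0.5


candidates = sample_candidates(goal=(10.0, 0.0))
best = select(candidates, on_sidewalk)
print(f"best candidate stays on the sidewalk for "
      f"{traversable_fraction(best, on_sidewalk):.0%} of its waypoints")
```

In the approach the abstract describes, the selection rule would instead be a VLM choosing among candidates drawn on the camera image, so the scoring function above should be read only as a runnable placeholder for that step.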

Type

Preprint


Traversability Analysis, Language Models, Outdoor Navigation

Authors

Jing Liang

PhD Student
