Open-vocabulary predictive world models from sensor observations

Karlsson, Robin; Asfandiyarov, Ruslan; Carballo, Alexander; Fujii, Keisuke; Ohtani, Kento; Takeda, Kazuya

doi:10.3390/s24144735

Journal Article

Citation	Karlsson R, Asfandiyarov R, Carballo A, Fujii K, Ohtani K, Takeda K. Sensors (Basel) 2024; 24(14).
Copyright	(Copyright © 2024, MDPI: Multidisciplinary Digital Publishing Institute)
DOI	10.3390/s24144735
PMID	39066133
PMCID	PMC11281213
Abstract	Cognitive scientists believe that adaptable intelligent agents like humans perform spatial reasoning tasks by learned causal mental simulation. The problem of learning these simulations is called predictive world modeling. We present the first framework for learning an open-vocabulary predictive world model (OV-PWM) from sensor observations. The model is implemented as a hierarchical variational autoencoder (HVAE) capable of predicting diverse and accurate fully observed environments from accumulated partial observations. We show that the OV-PWM can model high-dimensional embedding maps of latent compositional embeddings representing sets of overlapping semantics, which are inferable by sufficient similarity inference. The OV-PWM simplifies the prior two-stage closed-set PWM approach to a single-stage end-to-end learning method. CARLA simulator experiments show that the OV-PWM can learn compact latent representations and generate diverse and accurate worlds with fine details like road markings, achieving 69 mIoU over six query semantics on an urban evaluation sequence. We propose the OV-PWM as a versatile continual learning paradigm for providing spatio-semantic memory and learned internal simulation capabilities to future general-purpose mobile robots.
Language	en
Keywords	autonomous driving; BEV generation; continual learning; generative models; mobile robots; open-vocabulary semantics; self-supervised learning; world models
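
The abstract describes querying an embedding map by "sufficient similarity" and scoring the result with mIoU over a set of query semantics. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the function names `query_semantic_mask` and `mean_iou`, the use of cosine similarity, and the threshold of 0.5 are all assumptions made for this example. It shows how one map cell can match several queries at once when its embedding combines more than one concept.

```python
import numpy as np


def query_semantic_mask(embedding_map, query_embedding, threshold=0.5):
    """Return a boolean (H, W) mask of cells sufficiently similar to the query.

    embedding_map: float array of shape (H, W, D), one embedding per map cell.
    query_embedding: float array of shape (D,), e.g. an encoded text query.
    threshold: assumed cosine-similarity cutoff (hypothetical value).
    """
    # Normalize both sides so the dot product equals cosine similarity.
    map_norm = embedding_map / (
        np.linalg.norm(embedding_map, axis=-1, keepdims=True) + 1e-8
    )
    query_norm = query_embedding / (np.linalg.norm(query_embedding) + 1e-8)
    similarity = map_norm @ query_norm  # shape (H, W)
    return similarity >= threshold


def mean_iou(pred_masks, true_masks):
    """Mean intersection-over-union across query semantics, in [0, 1]."""
    ious = []
    for pred, true in zip(pred_masks, true_masks):
        union = np.logical_or(pred, true).sum()
        if union == 0:
            continue  # query absent from both prediction and ground truth
        ious.append(np.logical_and(pred, true).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Toy 2x2 map with 3-D embeddings (hypothetical concept vectors).
road = np.array([1.0, 0.0, 0.0])
marking = np.array([0.0, 1.0, 0.0])
vegetation = np.array([0.0, 0.0, 1.0])

toy_map = np.array([
    [road, road + marking],      # second cell carries two overlapping semantics
    [vegetation, vegetation],
])

road_mask = query_semantic_mask(toy_map, road)
marking_mask = query_semantic_mask(toy_map, marking)
```

In this toy map the cell holding `road + marking` passes the threshold for both queries, which is the sense in which a single embedding can represent overlapping semantics. Multiplying the value returned by `mean_iou` by 100 gives a percentage-style score of the kind conventionally reported as mIoU.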
