FOCUS: Object-Centric World Models for Robotic Manipulation

Stefano Ferraro*, Pietro Mazzaglia*, Tim Verbelen, Bart Dhoedt
* Equal contribution

Abstract

Understanding the world in terms of objects and the possible interactions with them is an important cognitive ability, especially in robotic manipulation, where many tasks require robot-object interactions. However, learning a structured world model that explicitly captures entities and their relationships remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus derived from the object-centric representation, FOCUS can be deployed on robotic manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could be adopted in real-world settings.

Exploration Behaviour

Exploration behaviour of FOCUS at different training snapshots.
Snapshots: 100k, 500k, 1M, 2M

World Model Reconstructions

Reconstructions of world models for the different tasks (random actions).
Columns: ground truth (GT), FOCUS, DreamerV2

Exploration experiments

Detailed results of exploration experiments for the different environments.

Dense reward experiments

Detailed results of dense reward experiments for the different tasks.