UNISafe: Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures

Carnegie Mellon University
Under Review, 2025

Abstract

Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space--spanning both the latent representation and the epistemic uncertainty--we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions.
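
To make the calibration step concrete, the sketch below shows one way an OOD threshold on the world model's epistemic uncertainty could be computed with split conformal prediction. It is a minimal illustration assuming a held-out set of in-distribution uncertainty scores; the function name and the way scores are extracted from the world model are placeholders, not the paper's exact implementation.

```python
import numpy as np

def calibrate_uncertainty_threshold(calib_scores: np.ndarray, alpha: float = 0.05) -> float:
    """Split-conformal calibration of an OOD threshold on uncertainty scores.

    calib_scores: epistemic-uncertainty scores of the world model on a held-out
    set of in-distribution transitions (hypothetical preprocessing).
    alpha: target miscoverage rate; a fresh in-distribution score stays below
    the returned threshold with probability at least 1 - alpha.
    """
    n = len(calib_scores)
    # Finite-sample-corrected quantile level used by split conformal prediction.
    q_level = min(1.0, np.ceil((n + 1) * (1.0 - alpha)) / n)
    return float(np.quantile(calib_scores, q_level, method="higher"))

# Scores above the calibrated threshold are flagged as OOD, e.g.:
#   tau = calibrate_uncertainty_threshold(scores, alpha=0.05)
#   is_ood = new_score > tau
```

States whose predicted uncertainty exceeds this threshold form the OOD failure set that the downstream reachability analysis treats as a failure.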

Challenge: An Unreliable World Model Can Result in OOD Failures

While latent safety filters can compute control strategies that prevent hard-to-model failures, both their training and their runtime filtering rely on imagined futures generated by the latent dynamics model. However, a pretrained world model can hallucinate in scenarios where it lacks knowledge, leading to OOD failures.

Consider the simple example in the figure above, where a Dubins car must avoid two failure sets: a grey circular region and a purple rectangular region. The world model is trained on RGB images of the environment and angular-velocity actions, but its training data is limited and never shows the robot entering the purple failure set. When the world model imagines an action sequence that drives the robot into this region, it hallucinates as soon as the scenario goes out-of-distribution: the imagined robot teleports out of the failure region to a safe state. This leads to latent safety filters that cannot prevent unseen failures, or even known ones, because they produce optimistic safety estimates in uncertain out-of-distribution scenarios.
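
One common way to expose this kind of hallucination is to measure how much an ensemble of latent dynamics models disagrees along an imagined rollout. The sketch below illustrates that idea; the ensemble-disagreement estimator and the `dynamics_heads` interface are assumptions for illustration, not necessarily the uncertainty measure used in the paper.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical one-step latent dynamics interface: z_next = f(z, a).
LatentDynamics = Callable[[np.ndarray, np.ndarray], np.ndarray]

def imagined_rollout_uncertainty(
    z0: np.ndarray,
    actions: Sequence[np.ndarray],
    dynamics_heads: Sequence[LatentDynamics],
) -> list:
    """Per-step ensemble disagreement along an imagined rollout.

    Each head predicts the next latent state from the current latent and action.
    Large disagreement is a proxy for epistemic uncertainty, flagging imagined
    states (like the 'teleporting' rollout above) as out-of-distribution.
    """
    z = np.asarray(z0, dtype=float)
    disagreement = []
    for a in actions:
        preds = np.stack([f(z, a) for f in dynamics_heads])  # (K, latent_dim)
        # Mean variance across ensemble members at this step.
        disagreement.append(float(preds.var(axis=0).mean()))
        # Continue imagining from the ensemble-mean prediction.
        z = preds.mean(axis=0)
    return disagreement
```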

UNISafe: UNcertainty-aware Imagination for Safety filtering

(Left): We quantify the world model’s epistemic uncertainty for detecting unseen failures in latent space and calibrate an uncertainty threshold via conformal prediction, resulting in an OOD failure set. (Center): Uncertainty-aware latent reachability analysis synthesizes a safety monitor and fallback policy that steers the system away from both known and OOD failures. (Right): Our safety filter reliably safeguards arbitrary task policies during hard-to-model vision-based tasks, like a teleoperator playing the game of Jenga.
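
At deployment time, the resulting monitor can be applied as a least-restrictive filter: the task policy's action is kept whenever the learned safety value on the uncertainty-augmented latent state deems it safe, and the fallback policy takes over otherwise. The sketch below assumes hypothetical `safety_value` and `fallback_policy` interfaces; because OOD states are treated as failures during the reachability analysis, the calibrated uncertainty threshold is already reflected in the value function rather than checked separately at runtime.

```python
import numpy as np
from typing import Callable

def filter_action(
    z: np.ndarray,            # current latent state
    u_hat: float,             # current epistemic-uncertainty estimate
    task_action: np.ndarray,  # action proposed by an arbitrary task policy
    safety_value: Callable[[np.ndarray, float, np.ndarray], float],
    fallback_policy: Callable[[np.ndarray, float], np.ndarray],
) -> np.ndarray:
    """Least-restrictive filtering on the uncertainty-augmented state (z, u_hat).

    safety_value(z, u, a): learned reachability value of applying action a;
    negative values indicate the action can lead to a known or OOD failure.
    fallback_policy(z, u): safety-maximizing action from the reachability solution.
    Both interfaces are illustrative placeholders.
    """
    if safety_value(z, u_hat, task_action) < 0.0:
        # The task action risks entering the (known or OOD) failure set:
        # override it with the fallback action.
        return fallback_policy(z, u_hat)
    return task_action
```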

Benchmark Safe Control Task with a 3D Dubins Car

Simulation: Block Plucking

Hardware Experiments: Vision-based Jenga with a Robotic Manipulator

BibTeX

BibTex Code Here