Recent advances in generative world models have enabled classical safe control methods, such as Hamilton-Jacobi (HJ) reachability, to generalize to complex robotic systems operating directly from high-dimensional sensor observations. However, obtaining comprehensive coverage of all safety-critical scenarios during world model training is extremely challenging. As a result, latent safety filters built on top of these models may miss novel hazards and even fail to prevent known ones, overconfidently misclassifying risky out-of-distribution (OOD) situations as safe. To address this, we introduce an uncertainty-aware latent safety filter that proactively steers robots away from both known and unseen failures. Our key idea is to use the world model's epistemic uncertainty as a proxy for identifying unseen potential hazards. We propose a principled method to detect OOD world model predictions by calibrating an uncertainty threshold via conformal prediction. By performing reachability analysis in an augmented state space--spanning both the latent representation and the epistemic uncertainty--we synthesize a latent safety filter that can reliably safeguard arbitrary policies from both known and unseen safety hazards. In simulation and hardware experiments on vision-based control tasks with a Franka manipulator, we show that our uncertainty-aware safety filter preemptively detects potential unsafe scenarios and reliably proposes safe, in-distribution actions.
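As a concrete illustration of the calibration step mentioned above, the sketch below shows how split conformal prediction could turn held-out uncertainty scores into an OOD threshold. The miscoverage level `alpha`, the synthetic calibration scores, and the function name are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def calibrate_uncertainty_threshold(cal_scores, alpha=0.05):
    """Split conformal prediction: take the (1 - alpha) empirical quantile
    (with the standard finite-sample correction) of uncertainty scores
    computed on a held-out, in-distribution calibration set."""
    n = len(cal_scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    return np.quantile(np.asarray(cal_scores), q_level, method="higher")

# Illustrative usage with synthetic calibration scores (stand-in for real ones).
rng = np.random.default_rng(0)
cal_scores = rng.gamma(shape=2.0, scale=0.1, size=500)
threshold = calibrate_uncertainty_threshold(cal_scores, alpha=0.05)
print(f"calibrated OOD threshold: {threshold:.3f}")
```

Any world-model prediction whose uncertainty score exceeds this threshold is then flagged as out-of-distribution.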
While latent safety filters can compute control strategies that prevent hard-to-model failures, their training and runtime filtering rely on imagined futures generated by the latent dynamics model. However, a pretrained world model can hallucinate in uncertain scenarios where it lacks knowledge, leading to OOD failures.
Consider the simple example in the figure above, where a Dubins car must avoid two failure sets: a grey circular region and a purple rectangular region. The world model is trained on RGB images of the environment and angular-velocity actions, but its training data is limited and never shows the robot entering the purple failure set. When the world model imagines an action sequence that drives the robot into this region, it hallucinates as soon as the scenario goes out-of-distribution: the robot teleports away from the failure region to a safe state. This phenomenon yields latent safety filters that cannot prevent unseen failures, and can even miss known ones, because of optimistic safety estimates for uncertain out-of-distribution scenarios.
(Left): We quantify the world model’s epistemic uncertainty for detecting unseen failures in latent space and calibrate an uncertainty threshold via conformal prediction, resulting in an OOD failure set. (Center): Uncertainty-aware latent reachability analysis synthesizes a safety monitor and fallback policy that steers the system away from both known and OOD failures. (Right): Our safety filter reliably safeguards arbitrary task policies during hard-to-model vision-based tasks, like a teleoperator playing the game of Jenga.
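One common way to realize the left panel's uncertainty quantification is ensemble disagreement over one-step latent dynamics predictions. The sketch below is a minimal version of that idea; the module names, network sizes, and the variance-based score are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LatentEnsemble(nn.Module):
    """Ensemble of one-step latent dynamics heads. Disagreement across heads
    serves as a proxy for the world model's epistemic uncertainty."""

    def __init__(self, latent_dim: int, action_dim: int, n_heads: int = 5, hidden: int = 256):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(latent_dim + action_dim, hidden), nn.ELU(),
                nn.Linear(hidden, latent_dim),
            )
            for _ in range(n_heads)
        ])

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        x = torch.cat([z, a], dim=-1)
        preds = torch.stack([head(x) for head in self.heads])  # (n_heads, B, latent_dim)
        # Epistemic uncertainty: mean predictive variance across ensemble members.
        return preds.var(dim=0).mean(dim=-1)                   # (B,)

# Illustrative usage on random latents and actions.
ens = LatentEnsemble(latent_dim=32, action_dim=6)
z, a = torch.randn(8, 32), torch.randn(8, 6)
uncertainty = ens(z, a)  # higher score -> more out-of-distribution
```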
We first conduct experiments with a low-dimensional, benchmark safe navigation task where privileged information about the state, dynamics, safe set, and safety controller is available.
UNISafe reliably identifies the OOD failure: To evaluate OOD detection, we first consider a setting where failure states are never observed by the world model during training.
UNISafe robustly learns safety filters despite high uncertainty in the world model: We evaluate whether our method can synthesize a robust safety filter under uncertainty arising from limited data coverage. Here, the vehicle must avoid a circular obstacle.
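Schematically, the known failure margin and the calibrated OOD threshold can be folded into a single safety margin over the augmented (latent, uncertainty) state, and the reachability value can then be learned with a discounted safety Bellman backup. The sketch below is a hypothetical rendering of that idea under these assumptions; the margin functions, discount factor, and backup form are not the paper's exact losses.

```python
def augmented_margin(l_known: float, uncertainty: float, threshold: float) -> float:
    """Safety margin over the augmented (latent, uncertainty) state; negative
    means failure. A state fails if it lies in the known failure set
    (l_known < 0) OR its epistemic uncertainty exceeds the calibrated
    threshold (an OOD failure)."""
    l_ood = threshold - uncertainty  # goes negative once uncertainty > threshold
    return min(l_known, l_ood)

def safety_bellman_target(l_now: float, v_next: float, gamma: float = 0.99) -> float:
    """Discounted safety Bellman backup used in reachability-style RL:
    V(s) = (1 - gamma) * l(s) + gamma * min(l(s), max_a V(s')).
    Here v_next stands in for the max over actions of the next-state value."""
    return (1.0 - gamma) * l_now + gamma * min(l_now, v_next)

# Illustrative usage: an in-distribution state near a known obstacle.
l_now = augmented_margin(l_known=0.2, uncertainty=0.08, threshold=0.15)
print(safety_bellman_target(l_now, v_next=0.1))
```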
We scale our method to a visual manipulation task in IsaacLab, where a Franka manipulator must pluck the middle block from a stack of three while keeping the top block resting on the bottom one. Observations consist of images from a wrist-mounted camera and a tabletop camera, together with 7-D proprioceptive inputs. Actions are a 6-DoF end-effector delta pose plus a discrete gripper command.
UNISafe minimizes failure by preventing safety overestimation: UNISafe, which accounts for both known and OOD failures, achieves the lowest failure rates and model errors. In contrast, LatentSafe, which does not account for OOD failures, overestimates the safety of OOD actions, leading to unsafe action proposals.
We evaluate our method on a real-world robotic manipulation task using a fixed-base Franka Research 3 arm equipped with a third-person camera and a wrist-mounted camera. The robot must extract a target block from a tower without collapsing it, then place the block on top.
Teleoperator Playing Jenga with Safety Filters. UNISafe enables non-conservative yet effective filtering of the teleoperator’s actions, keeping the system within in-distribution regions. In contrast, the uncertainty-unaware safety filter (LatentSafe) optimistically treats uncertain actions as safe, leading to failure.
Our latent safety filter (UNISafe) allows stable block removal that is safe and predictable.
UNISafe reliably corrects the teleoperator by proposing in-distribution safe actions.
The uncertainty-unaware latent safety filter (LatentSafe) fails: it optimistically imagines futures in regions of high uncertainty and treats them as safe.
OOD visual inputs. Although the block colors differ from those seen during training, such visual variation does not necessarily imply an out-of-distribution input. Instead, the decision to halt is based on the reliability of the filtering system: if the color change falls within the model’s generalization capacity, the latent dynamics model remains accurate and its predictive uncertainty stays below the calibrated threshold. In contrast, when the visual input departs significantly from the training distribution, the model’s predictions become unreliable; the resulting increase in uncertainty causes the safety filter to trigger a halt, preventing potentially unsafe actions.
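Putting the pieces together, the runtime behavior described above can be summarized as a small decision rule: halt when the model's predictive uncertainty exceeds the calibrated threshold, fall back to the safe policy when the learned safety value flags the proposed action, and otherwise pass the task (or teleoperator) action through. The function below is a hypothetical sketch; all names are placeholders rather than the paper's API.

```python
def filter_action(z, task_action, uncertainty_fn, value_fn, fallback_policy, threshold):
    """Runtime filtering sketch (all names are placeholders):
      uncertainty_fn(z, a): world model's predictive uncertainty for action a at latent z
      value_fn(z, a):       learned safety value (negative => unsafe under a)
      fallback_policy(z):   safe action from the reachability analysis
    """
    if uncertainty_fn(z, task_action) > threshold:
        return None                   # OOD: halt rather than trust the model
    if value_fn(z, task_action) < 0.0:
        return fallback_policy(z)     # a failure is reachable: override with the fallback
    return task_action                # safe and in-distribution: pass through unchanged
```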
@inproceedings{seo2025uncertainty,
  title     = {Uncertainty-aware Latent Safety Filters for Avoiding Out-of-Distribution Failures},
  author    = {Seo, Junwon and Nakamura, Kensuke and Bajcsy, Andrea},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2025}
}