Conformalized Interactive Imitation Learning:

Handling Expert Shift and Intermittent Feedback

Carnegie Mellon University

When the learned robot policy is deployed, ConformalDAgger calibrates its uncertainty (represented by red prediction interval boxes) based on feedback received from the human. The robot refrains from asking questions when its uncertainty is low, but when the human shifts their task policy, the robot's uncertainty increases, causing it to actively query the expert for help.

Abstract

In interactive imitation learning (IL), uncertainty quantification offers a way for the learner (i.e. robot) to contend with distribution shifts encountered during deployment by actively seeking additional feedback from an expert (i.e. human) online. Prior works use mechanisms like ensemble disagreement or Monte Carlo dropout to quantify when black-box IL policies are uncertain; however, these approaches can lead to overconfident estimates when faced with deployment-time distribution shifts. Instead, we contend that we need uncertainty quantification algorithms that can leverage the expert human feedback received during deployment time to adapt the robot's uncertainty online. To tackle this, we draw upon online conformal prediction, a distribution-free method for constructing prediction intervals online given a stream of ground-truth labels. Human labels, however, are intermittent in the interactive IL setting. Thus, from the conformal prediction side, we introduce a novel uncertainty quantification algorithm called intermittent quantile tracking (IQT) that leverages a probabilistic model of intermittent labels, maintains asymptotic coverage guarantees, and empirically achieves desired coverage levels. From the interactive IL side, we develop ConformalDAgger, a new approach wherein the robot uses prediction intervals calibrated by IQT as a reliable measure of deployment-time uncertainty to actively query for more expert feedback. We compare ConformalDAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn't) present because of changes in the expert's policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, ConformalDAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior.


Key Contributions

  • Contribution 1: Intermittent Quantile Tracking

How can we do online conformal prediction without access to expert feedback at every timestep (i.e., with only intermittent feedback)?

  • Contribution 2: Conformal Interactive Imitation Learning

Human feedback received during deployment is a valuable uncertainty quantification signal: we leverage it to update the robot's uncertainty estimate online and to inform when the robot should ask for help!



Intermittent Online Conformal Prediction

In this work, we relax the assumption that labels must be observed at each time point in the streaming data and extend the online conformal paradigm to ensure coverage in settings where ground truth labels are intermittently observed. We present our approach in the context of quantile tracking.
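As a concrete illustration, below is a minimal Python sketch of one plausible intermittent update. It extends the standard online quantile-tracking recursion q_{t+1} = q_t + η(err_t − α) by inverse-propensity weighting the observed miscoverage indicator with the label-observation probability p_t; the function name and this specific weighting are illustrative assumptions for the sketch, not necessarily the exact IQT update rule.

```python
import numpy as np

def intermittent_quantile_tracking(scores, observed, p_obs,
                                   alpha=0.1, eta=0.05, q_init=0.0):
    """Sketch of an IQT-style online quantile update under intermittent labels.

    scores:   nonconformity score s_t at each step (may be np.nan when unobserved)
    observed: indicator o_t of whether expert feedback arrived at step t
    p_obs:    probability p_t that feedback is observed at step t

    Assumption (for this sketch): weighting the observed miscoverage
    indicator by 1 / p_t keeps the update unbiased relative to standard
    online quantile tracking, q_{t+1} = q_t + eta * (err_t - alpha).
    """
    q, history = q_init, []
    for s_t, o_t, p_t in zip(scores, observed, p_obs):
        # Surrogate for the miscoverage indicator 1{s_t > q}:
        # 1{s_t > q} / p_t when feedback arrives, 0 otherwise.
        err_t = (float(s_t > q) / p_t) if o_t else 0.0
        q += eta * (err_t - alpha)
        history.append(q)
    return np.array(history)

if __name__ == "__main__":
    # Toy demo: labels observed ~30% of the time, targeting 90% coverage.
    T, p = 1000, 0.3
    rng = np.random.default_rng(0)
    scores = rng.exponential(1.0, T)
    observed = rng.random(T) < p
    qs = intermittent_quantile_tracking(scores, observed, np.full(T, p))
    print("final tracked quantile:", qs[-1])
```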


ConformalDAgger

After obtaining an initial learner policy, ConformalDAgger calibrates uncertainty during the interactive deployment episode with the expert via intermittent quantile tracking. When the uncertainty intervals are large, the robot actively queries the user for feedback. When uncertainty is low, the robot executes its predicted action, and the human may independently intervene with some low probability. After the deployment episode ends, the data is aggregated and the learner is retrained.
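To make the deployment loop concrete, here is a minimal Python sketch of one episode. All interfaces and parameters (policy.predict, policy.nonconformity, expert.query, env.reset/step/done, width_threshold, p_intervene) are illustrative placeholders under our assumptions, not ConformalDAgger's actual API.

```python
import random

def conformal_dagger_episode(policy, expert, env, q,
                             width_threshold=0.5, p_intervene=0.1,
                             alpha=0.1, eta=0.05):
    """Sketch of one ConformalDAgger deployment episode (assumed interfaces).

    The calibrated quantile q sizes the robot's prediction interval. When
    it exceeds width_threshold, the robot actively queries the expert;
    otherwise it acts and the human may intervene with probability
    p_intervene. Observed labels drive the intermittent quantile update.
    """
    dataset, obs = [], env.reset()
    while not env.done():
        a_robot = policy.predict(obs)
        if q >= width_threshold:
            # High uncertainty: actively ask the expert (label observed w.p. 1).
            a_exec, observed, p_t = expert.query(obs), True, 1.0
        elif random.random() < p_intervene:
            # Human independently intervenes with low probability.
            a_exec, observed, p_t = expert.query(obs), True, p_intervene
        else:
            # Execute the robot's own action; no label observed this step.
            a_exec, observed, p_t = a_robot, False, p_intervene
        # Intermittent quantile-tracking update (inverse-propensity weighted).
        if observed:
            dataset.append((obs, a_exec))
            s_t = policy.nonconformity(a_robot, a_exec)  # e.g. prediction error
            q += eta * (float(s_t > q) / p_t - alpha)
        else:
            q += eta * (0.0 - alpha)
        obs = env.step(a_exec)
    return dataset, q  # aggregate dataset with prior data; retrain the learner
```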