Training generalist agents is difficult across several axes: it requires handling high-dimensional inputs (space), long horizons (time), and multiple as well as novel tasks. Recent architectural advances have enabled improved scaling along one or two of these dimensions, but remain computationally prohibitive. In this paper, we propose to address all three axes by leveraging Language to Control Diffusion models (LCD) as a hierarchical planner conditioned on language.
We effectively and efficiently scale diffusion models for planning across extended temporal, state, and task dimensions to tackle long-horizon control problems conditioned on natural language instructions. We compare LCD with other state-of-the-art models on the CALVIN language robotics benchmark and find that LCD outperforms them in multi-task success rate while dramatically improving computational efficiency, achieving a single-task success rate (SR) of 88.7% against the previous best of 82.7%.
We show that LCD can successfully leverage the unique strength of diffusion models to produce coherent long-range plans while addressing their weakness at generating low-level details and control.
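The hierarchy above can be sketched as follows. This is a minimal, illustrative mock-up, not the repo's actual API: `high_level_plan` and `low_level_act` are hypothetical stand-ins for the diffusion planner and the low-level (HULC-style) controller, and the networks are replaced with random projections.

```python
import numpy as np

rng = np.random.default_rng(0)

def high_level_plan(lang_emb, horizon=4, latent_dim=8):
    """Stand-in for the diffusion planner: propose `horizon` latent subgoals.

    A real planner would iteratively denoise a noisy plan conditioned on the
    language embedding; here we just project the embedding for illustration.
    """
    W = rng.standard_normal((lang_emb.size, horizon * latent_dim))
    return (lang_emb @ W).reshape(horizon, latent_dim)

def low_level_act(obs, subgoal):
    """Stand-in for the low-level controller: map (obs, subgoal) to an action."""
    return np.tanh(obs + subgoal)  # toy action, bounded like a real controller's

lang_emb = rng.standard_normal(16)   # e.g. a frozen text-encoder embedding
obs = rng.standard_normal(8)         # current (latent) observation

plan = high_level_plan(lang_emb)                     # (horizon, latent_dim)
actions = [low_level_act(obs, g) for g in plan]      # one action per subgoal
```

The key design point is the division of labor: the diffusion model only has to produce a short sequence of abstract subgoals, while the low-level policy handles fine-grained control at every step.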
This project is built on some exceptional prior work.
Planning with Diffusion for Flexible Behavior Synthesis introduces the base diffusion-based planning model that we adopt as the high-level policy.
CALVIN provides the dataset and benchmark for evaluating the performance of our agent.
Finally, HULC offers a strong baseline policy that we use for comparison and as our low-level controller.
$ git clone git@github.com:ezhang7423/language-control-diffusion.git
$ make install && conda activate lcd
$ lcd
Usage: lcd [OPTIONS] COMMAND [ARGS]...
╭─ Options ─────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or │
│ customize the installation. │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ────────────────────────────────────────────────────────────────────────────────╮
│ rollout Rollout in the environment for evaluation or dataset collection │
│ train_hulc Train the original hulc model │
│ train_lcd Train the lcd model │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
@inproceedings{zhang2024language,
title={Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks},
author={Edwin Zhang and Yujie Lu and Shinda Huang and William Yang Wang and Amy Zhang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=0H6DFoZZXZ}
}