Behavior-regularized zero-shot RL with Expressivity Enhancement (BREEZE)

Towards Robust Zero-shot Reinforcement Learning (NeurIPS 2025)

License: MIT Code style: black

Paper

This is the official codebase for Towards Robust Zero-shot Reinforcement Learning by Kexin Zheng*, Lauriane Teyssier*, Yinan Zheng, Yu Luo, and Xianyuan Zhan.
*Equal contribution

Content

Overview

BREEZE is a forward-backward (FB) representation-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality through three key designs:

  • Behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm.
  • Task-conditioned diffusion model policy extraction, enabling the generation of high-quality, multimodal action distributions in zero-shot RL settings.
  • Attention-based architectures for representation modeling, capturing the complex relationships underlying environment dynamics.

Performance

BREEZE achieves the best or near-best returns with faster convergence and enhanced stability.

[Figure: training curves on Quadruped with RND exploration data]
BREEZE matches or exceeds baselines trained for 1M steps within 400k steps.
[Table: per-task performance summary]

Setup

Requirements

  • Python 3.9
  • MuJoCo - required by the DM Control suite. Note: while our experiments used standalone MuJoCo binaries, the latest mujoco pip package now bundles them.
  • Wandb - for experiment tracking. Set WANDB_API_KEY before launching experiments, or pass --wandb_logging False to disable logging (see the example below).
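
A minimal way to configure logging before a run (assuming a bash-compatible shell; the key value is a placeholder for your own Wandb API key):

export WANDB_API_KEY=<your_api_key>

If you prefer not to log to Wandb, append --wandb_logging False to any command shown below.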

Install dependencies

conda create -n breeze python=3.9
conda activate breeze
pip install -r requirements.txt
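
Optionally, you can sanity-check the simulator installation. This assumes the mujoco and dm_control pip packages were installed from requirements.txt:

python -c "import mujoco, dm_control; print('MuJoCo', mujoco.__version__)"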

Data Preparation

All experiments rely on offline datasets from ExORL. Our repository includes a script to automatically download and reformat the datasets for the tasks and algorithms below.

ExORL Download & Reformat

bash data_prepare.sh

Domains and Tasks

| Domain | Eval Tasks | Dimensionality | Type | Reward | Command Line Argument |
| --- | --- | --- | --- | --- | --- |
| Walker | stand walk run flip | Low | Locomotion | Dense | walker |
| Quadruped | stand walk run jump | High | Locomotion | Dense | quadruped |
| Jaco | reach_top_left reach_top_right reach_bottom_left reach_bottom_right | High | Goal-reaching | Sparse | jaco |

Exploration Algorithms for Dataset Collection

| Exploration Algorithm | Command Line Argument |
| --- | --- |
| Random Network Distillation (RND) | rnd |
| Diversity is All You Need (DIAYN) | diayn |
| Active Pretraining with Successor Features (APS) | aps |
| Reinforcement Learning with Prototypical Representations (PROTO) | proto |

Repository Structure

We provide the repository structure in repository_structure.md.

Running Experiments

The main entry point is main_offline.py, which takes the algorithm name, domain, and exploration policy that generated the dataset. Key flags:

usage: main_offline.py <algorithm> <domain_name> <exploration_algorithm> \
                       --eval_tasks TASK [TASK ...] [--train_task TASK]
                       [--seed INT] [--learning_steps INT]
                       [--z_inference_steps INT]
                       [--wandb_logging {True,False}]
  • algorithm: one of breeze, fb, cfb, vcfb, mcfb, cql, sac, td3, sf-lap (see the table below).
  • domain_name: DMC domain (walker, quadruped, jaco, point_mass_maze, ...).
  • exploration_algorithm: dataset source tag (proto, rnd, aps, etc.).
  • --eval_tasks: list of downstream tasks for zero-shot evaluation.

Example

# BREEZE on Quadruped with RND exploration data
python main_offline.py breeze quadruped rnd \
  --eval_tasks stand run walk jump \
  --seed 42 --learning_steps 1000000
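
A second example on a sparse-reward goal-reaching domain (the --z_inference_steps value here is purely illustrative; the defaults live in the per-algorithm config.yaml):

# BREEZE on Jaco with PROTO exploration data, Wandb logging disabled
python main_offline.py breeze jaco proto \
  --eval_tasks reach_top_left reach_top_right reach_bottom_left reach_bottom_right \
  --z_inference_steps 10000 --wandb_logging False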

Configuration defaults (network sizes, optimizers, diffusion settings, etc.) are stored in agents/<algo>/config.yaml. Override any value via CLI flags or edit the YAML.
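
For example, an illustrative CLI override (the flag names --batch_size and --learning_rate below are hypothetical placeholders; check agents/breeze/config.yaml for the actual key names):

# hypothetical overrides; the real keys live in agents/breeze/config.yaml
python main_offline.py breeze walker rnd --eval_tasks stand walk run flip \
  --batch_size 1024 --learning_rate 1e-4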

Available algorithms

| Algorithm | Authors | Type | Command Line Argument |
| --- | --- | --- | --- |
| BREEZE | Zheng et al. (2025) | Zero-shot RL | breeze |
| FB Representations | Touati et al. (2023) | Zero-shot RL | fb |
| Conservative FB Representations (VCFB/MCFB) | Jeen et al. (2024) | Zero-shot RL | mcfb / vcfb |
| Conservative Q-Learning (CQL) | Kumar et al. (2020) | Single-task Offline RL | cql |
| Soft Actor-Critic (SAC) | Haarnoja et al. (2018) | Online RL | sac |
| Twin Delayed DDPG (TD3) | Fujimoto et al. (2018) | Online RL | td3 |
| Successor Features with Laplacian Eigenfunctions (SF-LAP) | Borsa et al. (2018) | Zero-shot RL | sf-lap |
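
To run one of these baselines on the same dataset as the BREEZE example above, only the algorithm argument changes, e.g.:

# FB baseline on Quadruped with RND exploration data
python main_offline.py fb quadruped rnd \
  --eval_tasks stand run walk jump --seed 42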

Reproducing the Paper

We provide the domain-specific hyperparameters used in our experiments in domain_specific_hyp.md.
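
As a sketch of a reproduction sweep, the loop below runs several seeds for one domain/dataset pair (the seed values and step count are illustrative, not the exact settings from the paper):

# multi-seed sweep for BREEZE on Walker with RND data (illustrative values)
for seed in 0 1 2; do
  python main_offline.py breeze walker rnd \
    --eval_tasks stand walk run flip --seed $seed --learning_steps 1000000
done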

Acknowledgements

We gratefully acknowledge the contributions of the prior studies and codebases that this work builds upon.

Citation

If you find this repository helpful, please consider citing our paper:

@inproceedings{zheng2025towards,
  title={Towards Robust Zero-Shot Reinforcement Learning},
  author={Kexin Zheng and Lauriane Teyssier and Yinan Zheng and Yu Luo and Xianyuan Zhan},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}

License

This project is licensed under the MIT License. See LICENSE for the full text.
