This is the official codebase for Towards Robust Zero-shot Reinforcement Learning by Kexin Zheng*, Lauriane Teyssier*, Yinan Zheng, Yu Luo, and Xianyuan Zhan.
*Equal contribution
BREEZE is an FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality, through three key designs:
- Behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm.
- Task-conditioned diffusion model policy extraction, enabling the generation of high-quality and multimodal action distributions in zero-shot RL settings.
- Attention-based architectures for representation modeling, capturing the complex relationships underlying the environment dynamics.
BREEZE achieves the best or near-best returns with faster convergence and enhanced stability.
Within 400k training steps, BREEZE can match or exceed baselines trained for 1M steps.

Requirements
- Python 3.9
- MuJoCo - required by the DM Control suite. Note: while our experiments used standalone MuJoCo binaries, the latest mujoco pip package now includes them.
- Wandb - for experiment tracking. Set `WANDB_API_KEY` before launching experiments, or pass `--wandb_logging False`.
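For example, in a bash shell (the key value is a placeholder):

```bash
export WANDB_API_KEY=<your_api_key>
```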
Install dependencies
```bash
conda create -n breeze python=3.9
conda activate breeze
pip install -r requirements.txt
```

All experiments rely on offline datasets from ExORL. Our repository includes a script to automatically download and reformat the datasets for the tasks and algorithms below.
ExORL Download & Reformat
```bash
bash data_prepare.sh
```

Domains and Tasks
| Domain | Eval Tasks | Dimensionality | Type | Reward | Command Line Argument |
|---|---|---|---|---|---|
| Walker | stand walk run flip | Low | Locomotion | Dense | walker |
| Quadruped | stand walk run jump | High | Locomotion | Dense | quadruped |
| Jaco | reach_top_left reach_top_right reach_bottom_left reach_bottom_right | High | Goal-reaching | Sparse | jaco |
Exploration Algorithms for Dataset Collection
| Exploration Algorithm | Command Line Argument |
|---|---|
| Random Network Distillation (RND) | rnd |
| Diversity is All You Need (DIAYN) | diayn |
| Active Pretraining with Successor Features (APS) | aps |
| Reinforcement Learning with Prototypical Representations (PROTO) | proto |
We provide the repository structure in repository_structure.md.
The main entry point is main_offline.py, which takes the algorithm name, domain, and exploration policy that generated the dataset. Key flags:
```
usage: main_offline.py <algorithm> <domain_name> <exploration_algorithm> \
       --eval_tasks TASK [TASK ...] [--train_task TASK]
       [--seed INT] [--learning_steps INT]
       [--z_inference_steps INT]
       [--wandb_logging {True,False}]
```
- `algorithm`: one of `breeze`, `fb`, `cfb`, `vcfb`, `mcfb`, `cql`, `sac`, `td3`, `sf-lap` (see table below).
- `domain_name`: DMC domain (`walker`, `quadruped`, `jaco`, `point_mass_maze`, ...).
- `exploration_algorithm`: dataset source tag (`proto`, `rnd`, `aps`, etc.).
- `--eval_tasks`: list of downstream tasks for zero-shot evaluation.
Example
```bash
# BREEZE on Quadruped with RND exploration data
python main_offline.py breeze quadruped rnd \
    --eval_tasks stand run walk jump \
    --seed 42 --learning_steps 1000000
```

Configuration defaults (network sizes, optimizers, diffusion settings, etc.) are stored in agents/<algo>/config.yaml. Override any value via CLI flags or edit the YAML.
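A second illustrative invocation showing the remaining flags from the usage string above; the numeric values here are placeholders, not recommended settings:

```bash
# FB baseline on Walker with PROTO exploration data,
# custom reward-inference budget, wandb logging disabled
python main_offline.py fb walker proto \
    --eval_tasks stand walk run flip \
    --z_inference_steps 10000 \
    --wandb_logging False
```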
| Algorithm | Authors | Type | Command Line Argument |
|---|---|---|---|
| BREEZE | Zheng et al. (2025) | Zero-shot RL | breeze |
| FB Representations | Touati et al. (2023) | Zero-shot RL | fb |
| Conservative FB Representations (VCFB/MCFB) | Jeen et al. (2024) | Zero-shot RL | mcfb/vcfb |
| Conservative Q-learning | Kumar et al. (2020) | Single-task Offline RL | cql |
| Soft Actor-Critic (SAC) | Haarnoja et al. (2018) | Online RL | sac |
| Twin Delayed DDPG (TD3) | Fujimoto et al. (2018) | Online RL | td3 |
| Successor Features with Laplacian Eigenfunctions (SF-LAP) | Borsa et al. (2018) | Zero-shot RL | sf-lap |
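Any baseline in the table can be launched with the same CLI pattern as BREEZE; for instance, a sketch for VCFB on Walker with PROTO exploration data:

```bash
python main_offline.py vcfb walker proto \
    --eval_tasks stand walk run flip \
    --seed 42
```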
We provide the domain-specific hyperparameters used in our experiments in domain_specific_hyp.md.
We gratefully acknowledge the prior works this repository builds on:
- This implementation is based on the Zero-Shot Reinforcement Learning from Low Quality Data codebase.
- The diffusion model implementation is based on IDQL.
If you find this repository helpful, please consider citing our paper:
```bibtex
@inproceedings{zheng2025towards,
  title={Towards Robust Zero-Shot Reinforcement Learning},
  author={Kexin Zheng and Lauriane Teyssier and Yinan Zheng and Yu Luo and Xianyuan Zhan},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
```

This project is licensed under the MIT License. See LICENSE for the full text.

