Dichotomous Diffusion Policy Optimization

Ruiming Liang*, Yinan Zheng*, Kexin Zheng*, Tianyi Tan*, Jianxiong Li, Liyuan Mao, Zhihao Wang, Guang Chen, Hangjun Ye, Jingjing Liu, Jinqiao Wang $\dagger$, Xianyuan Zhan $\dagger$

📢 News

Jan 6, 2026: DIPOLE is now available on arXiv.
Jan 1, 2026: We released the official website and repo for DIPOLE.

🔥 Quick Start

Coming soon.

📊 Benchmarks

ExORL

Average score over 8 random seeds (w/o rs: without rejection sampling)

Domain	Task	IQL	ReBRAC	CFGRL	IFQL	FQL	DIPOLE w/o rs	DIPOLE
Walker	stand	603 ± 8	461 ± 3	782 ± 8	873 ± 6	801 ± 4	793 ± 11	953 ± 4
Walker	walk	444 ± 4	208 ± 6	608 ± 32	844 ± 11	755 ± 12	679 ± 16	910 ± 5
Walker	run	247 ± 10	98 ± 2	282 ± 6	406 ± 8	294 ± 11	256 ± 12	442 ± 9
Quadruped	walk	776 ± 15	344 ± 7	762 ± 25	883 ± 12	739 ± 25	813 ± 21	928 ± 55
Quadruped	run	485 ± 7	344 ± 3	571 ± 25	595 ± 18	503 ± 5	560 ± 11	657 ± 10
Cheetah	run	168 ± 7	97 ± 13	216 ± 15	269 ± 16	222 ± 14	194 ± 9	274 ± 12
Cheetah	run-backward	146 ± 8	85 ± 4	262 ± 26	310 ± 24	231 ± 12	227 ± 7	350 ± 15
Jaco	reach-top-right	33 ± 2	38 ± 13	72 ± 6	193 ± 9	224 ± 17	84 ± 5	117 ± 18
Jaco	reach-top-left	30 ± 8	59 ± 5	46 ± 6	181 ± 11	222 ± 42	63 ± 8	110 ± 12
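
Each table entry has the form "mean ± spread" over 8 seeds. The README does not say whether the spread is a standard deviation or a standard error, so the sketch below assumes the sample standard deviation. The helper names and the per-seed numbers are hypothetical placeholders, not results from the paper.

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    mean = statistics.fmean(scores)
    # Assumption: the spread is the sample standard deviation across seeds.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std

def format_entry(scores):
    """Format per-seed scores as a 'mean ± std' table cell."""
    mean, std = summarize(scores)
    return f"{mean:.0f} ± {std:.0f}"

# Hypothetical per-seed returns for one task (8 seeds); placeholder values only.
hypothetical_scores = [100, 110, 90, 105, 95, 100, 108, 92]
print(format_entry(hypothetical_scores))
```

If the reported spread is instead a standard error, divide the standard deviation by the square root of the number of seeds before formatting.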

OGBench

Aggregate score over all single tasks for each category (average over 8 random seeds)

Task Category	IQL	ReBRAC	IDQL	IFQL	FQL	DIPOLE
humanoidmaze-medium-navigate (5 tasks)	33 ± 2	2 ± 8	1 ± 0	60 ± 14	58 ± 5	68 ± 3
humanoidmaze-large-navigate (5 tasks)	2 ± 1	2 ± 1	1 ± 0	11 ± 2	4 ± 2	6 ± 2
antsoccer-arena-navigate (5 tasks)	8 ± 2	0 ± 0	12 ± 4	33 ± 6	60 ± 2	57 ± 7
cube-single-play (5 tasks)	83 ± 3	91 ± 2	95 ± 2	79 ± 2	96 ± 1	97 ± 2
cube-double-play (5 tasks)	7 ± 1	12 ± 1	15 ± 6	14 ± 3	29 ± 2	44 ± 7
scene-play (5 tasks)	28 ± 1	41 ± 3	46 ± 3	30 ± 3	56 ± 2	60 ± 2

NavSim

Method	Input	NC↑	DAC↑	TTC↑	Comf.↑	EP↑	PDMS↑
Constant Velocity	-	68.0	57.8	50.0	100.0	19.4	20.6
Ego Status MLP	-	93.0	77.3	83.6	100.0	62.8	65.6
UniAD	Cam	97.8	91.9	92.9	100.0	78.8	83.4
PARA-Drive	Cam	97.9	92.4	93.0	99.8	79.3	84.0
LFT	Cam	97.4	92.8	92.4	100.0	79.0	83.8
Transfuser	Cam & Lidar	97.7	92.8	92.8	100.0	79.2	84.0
Hydra-MDP	Cam & Lidar	98.3	96.0	94.6	100.0	78.7	86.5
DP-VLA (ours)	Cam	98.0	97.0	94.3	100.0	82.5	88.3
DP-VLA w/ DIPOLE navtrain (ours)	Cam	98.2	98.0	95.2	100.0	83.6	89.7
DP-VLA w/ DPPO navtest	Cam	97.9	97.6	94.1	100.0	83.5	89.0
DP-VLA w/ DIPOLE navtest (ours)	Cam	99.2	98.7	95.6	99.8	94.2	94.8
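
The README does not define the column abbreviations. Assuming the usual NAVSIM benchmark conventions, NC is no at-fault collision, DAC is drivable area compliance, TTC is time-to-collision, Comf. is comfort, EP is ego progress, and PDMS is the PDM score that combines them. The sketch below uses the commonly cited weighting, with NC and DAC as multiplicative penalties and a 5:2:5 weighted average of TTC, comfort, and EP. The function names and scenario values are hypothetical and are not taken from this repository.

```python
def pdm_score(nc, dac, ttc, comfort, ep):
    """Per-scenario PDM score; every input is a sub-score in [0, 1]."""
    # Assumed weighting: TTC and EP count 5 each, comfort counts 2.
    weighted = (5.0 * ttc + 2.0 * comfort + 5.0 * ep) / 12.0
    # NC and DAC act as multiplicative penalties on the weighted average.
    return nc * dac * weighted

def mean_pdms(scenarios):
    """Average the per-scenario scores and scale to a 0-100 range."""
    total = sum(pdm_score(**s) for s in scenarios)
    return 100.0 * total / len(scenarios)

# Hypothetical scenarios for illustration only.
scenarios = [
    {"nc": 1.0, "dac": 1.0, "ttc": 1.0, "comfort": 1.0, "ep": 1.0},
    {"nc": 0.0, "dac": 1.0, "ttc": 1.0, "comfort": 1.0, "ep": 0.8},
]
print(mean_pdms(scenarios))
```

Under this assumption the score is computed per scenario and then averaged, so plugging the column averages from the table into the formula will generally not reproduce the reported PDMS.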

✍️ Citation

@article{liang2026dipole,
  title={Dichotomous Diffusion Policy Optimization},
  author={Ruiming Liang and Yinan Zheng and Kexin Zheng and Tianyi Tan and Jianxiong Li and Liyuan Mao and Zhihao Wang and Guang Chen and Hangjun Ye and Jingjing Liu and Jinqiao Wang and Xianyuan Zhan},
  journal={arXiv preprint arXiv:2601.00898},
  year={2026}
}
