[TL;DR] PIWM is a lightweight, physics‑informed generative model that predicts future images from the current image and actions — enabling forecasting with strong existential and temporal consistency in dynamic environments.
The following implementation is based on DIAMOND.
- One-page paper upload (to be done before [2025/10/30])
- What-is-it video creation (to be done before [2025/10/15])
- Dataset release
- Code release
- Preprint release
Quick start with Miniconda:

```shell
git clone https://github.com/TUM-AVS/physics-wm.git
cd physics-wm
conda create -n PIWM python=3.10
conda activate PIWM
pip install -r requirements.txt
```
To quickly try our trained physics‑informed BEV world model (PIWM), run the command below. The first run automatically downloads the pretrained model and several spawn points from the HuggingFace Hub 🤗; please reserve ~1.55 GB of disk space.
```shell
python src/play.py
```
When the download completes, press Enter to start. Use WASD and Space to control.
*(Demo video: top.mp4)*
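The control keys presumably map onto HighwayEnv's discrete meta-actions. A hypothetical sketch of such a binding (the real mapping lives in `src/play.py` and may differ):

```python
# Hypothetical key binding onto HighwayEnv's five discrete meta-actions.
# The actual bindings used by src/play.py are not shown here.
KEY_TO_ACTION = {
    "w": "FASTER",       # accelerate
    "s": "SLOWER",       # decelerate
    "a": "LANE_LEFT",    # change lane left
    "d": "LANE_RIGHT",   # change lane right
    "space": "IDLE",     # keep current speed and lane
}

def action_for_key(key: str) -> str:
    # Unmapped keys fall back to IDLE so the rollout never stalls.
    return KEY_TO_ACTION.get(key, "IDLE")
```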
The default `fast` config runs best on a CUDA GPU (>12 FPS on an RTX 4080 Laptop GPU). The model runs even faster when compiled (>21 FPS on the same GPU).
```shell
python src/play.py --compile
```
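The `--compile` flag most plausibly wraps the model with PyTorch's `torch.compile`. A minimal sketch of such a toggle (the function name `maybe_compile` is ours; the actual wiring in `src/play.py` may differ):

```python
import torch

def maybe_compile(model: torch.nn.Module, enabled: bool) -> torch.nn.Module:
    # Sketch of what a --compile flag typically toggles: wrapping the world
    # model with torch.compile so repeated rollout steps reuse compiled
    # kernels. The first forward pass pays a one-off compilation cost.
    if enabled and hasattr(torch, "compile"):
        return torch.compile(model)
    return model
```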
All demo videos and performance measurements use the `fast` config referenced by the trainer. You can switch to `higher_quality` for improved quality at reduced speed.
We collected 2,000 episodes in HighwayEnv with an MCTS agent, yielding 2 million BEV frames with aligned states and actions as the training dataset.
We used a random test split of 200 episodes of 1,000 steps each (listed in test_split.txt) and the remaining 1,800 episodes for training.
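The split can be reproduced with a few lines of Python. This sketch assumes one directory per episode and episode names listed one per line in test_split.txt (the actual on-disk layout may differ):

```python
from pathlib import Path

def split_episodes(dataset_dir: str, test_split_file: str):
    # Illustrative sketch (the exact file layout is an assumption): episodes
    # named in test_split.txt form the test set; every other episode
    # directory is training data, reproducing the 1,800/200 split.
    test_names = set(Path(test_split_file).read_text().split())
    episodes = sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())
    train = [e for e in episodes if e not in test_names]
    test = [e for e in episodes if e in test_names]
    return train, test
```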
To get the data ready for training on your machine:
Step 1: Download the preprocessed dataset from the HuggingFace Hub 🤗 and unzip it (~550GB disk space).
Dataset structure:
```
your_path/highway_dataset_processed
├── full_res
└── low_res
```
Step 2: Edit config/env/piwm.yaml and set:

- `path_data_low_res` to `<your_path>/highway_dataset_processed/low_res`
- `path_data_full_res` to `<your_path>/highway_dataset_processed/full_res`
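A hedged excerpt of how the edited entries in config/env/piwm.yaml might look (the placeholder path is kept as-is; surrounding keys are omitted):

```yaml
# config/env/piwm.yaml (excerpt; other keys unchanged)
path_data_low_res: <your_path>/highway_dataset_processed/low_res
path_data_full_res: <your_path>/highway_dataset_processed/full_res
```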
Then you can launch a training run with
```shell
python src/main.py
```

The provided configuration took around 18 hours on an RTX 4090.
Adjust the Soft Mask weights used during training and inference in piwm.yaml. Our released model uses:
- Training:

  ```yaml
  mask_gain_train: 1.0
  mask_gain_infer: 0.45  # not used in training
  mask_w_blue: 1.0
  mask_w_green: 0.8
  ```

- Inference:

  ```yaml
  mask_gain_train: 1.0  # not used in inference
  mask_gain_infer: 0.45
  mask_w_blue: 0.55
  mask_w_green: 0.45
  ```
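As an illustrative sketch only of how such weights could enter a per-pixel mask (the channel assignment and formula below are assumptions, not PIWM's actual implementation): the gain scales the mask, while `mask_w_blue` and `mask_w_green` weight the blue and green channels of the BEV frame.

```python
import numpy as np

def soft_mask(bev: np.ndarray, gain: float, w_blue: float, w_green: float) -> np.ndarray:
    # Hypothetical combination of channel weights and gain into a soft
    # mask over an H x W x 3 BEV frame with values in [0, 1]. The channel
    # order (RGB) and the clipped linear form are assumptions.
    blue = bev[..., 2].astype(np.float32)
    green = bev[..., 1].astype(np.float32)
    mask = gain * (w_blue * blue + w_green * green)
    return np.clip(mask, 0.0, 1.0)
```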
Enable zero-shot Warm Start to inject contextual information at inference time and improve stability for smaller models:
```shell
python src/play.py --warmstart
```
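Conceptually, warm start replays a few real frames through the model before letting it predict on its own. A minimal sketch under an assumed model interface (`init_state`, `step`, and `predict` are hypothetical names, not PIWM's API):

```python
def warm_start(model, context, n_steps, action_fn):
    # Hypothetical sketch of zero-shot warm start: condition the world
    # model on real (observation, action) pairs before free-running
    # imagination, so the rollout starts from an informed state.
    state = model.init_state()
    for obs, act in context:           # inject real context
        state = model.step(state, obs, act)
    preds = []
    obs = context[-1][0]
    for _ in range(n_steps):           # free-running prediction
        obs, state = model.predict(state, action_fn(obs))
        preds.append(obs)
    return preds
```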
```bibtex
@misc{anonymous,
  title={Enhancing Physical Consistency in Lightweight World Models},
  author={Anonymous Author(s) for now},
  year={2025},
  eprint={2509.12437},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2509.12437},
}
```
