Dream up accurate 3D bounding boxes for objects in the wild!
- [2025.10.06] BoxDreamer demo upgraded! Now supports CLI usage for easier interaction!
- [2025.06.26] Paper accepted by ICCV 2025! Code is now open-sourced on GitHub!
- [2025.04.10] BoxDreamer paper released on arXiv!
- Table of Contents
- Method Overview
- Installation
- CLI Demo
- Gradio Demo
- Dataset Preparation
- Reference Database Creation (Optional)
- Reconstruction
- Training
- Evaluation
- Model Zoo
- Frequently Asked Questions
- Citation
- License
- Acknowledgements
BoxDreamer supports two installation methods: a fast automated script or manual step-by-step installation. Choose the method that best suits your needs.
If your system is compatible with PyTorch 2.5.1 + CUDA 12.1, use our automated installation script:
```bash
bash install.sh
```

After successful installation, you can immediately start using BoxDreamer via the CLI or Gradio demo.
For custom configurations or troubleshooting, follow these steps:
```bash
# Create and activate conda environment
conda create -n boxdreamer python=3.11
conda activate boxdreamer
```

We recommend using uv for faster dependency installation:

```bash
pip install uv

# Install PyTorch (adjust CUDA version if needed)
uv pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

# Install PyTorch3D
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

# Install Flash Attention
pip install flash_attn

# Install xformers
pip install xformers==0.0.28.post3

# Install required packages
uv pip install -r requirements.txt

# Install BoxDreamer
pip install -e .
```

For CUDA 12.1 + Python 3.11 + PyTorch 2.5.1:

```bash
# Download the pre-built wheel from https://miropsota.github.io/torch_packages_builder/sam-2/
pip install https://github.com/MiroPsota/torch_packages_builder/releases/download/SAM_2-1.0%2Bc2ec8e1/SAM_2-1.0%2Bc2ec8e1pt2.5.1cu121-cp311-cp311-linux_x86_64.whl

# Install additional demo dependencies
pip install decord pyqt5 gradio transformers
```

Initialize the submodules and configure the environment file:

```bash
git submodule update --init --recursive

touch .env
echo "PYTHONPATH=three/dust3r" >> .env
```

Download the DUSt3R and GroundingDINO weights:

```bash
mkdir -p weights && cd weights
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```

Test your installation with the CLI:

```bash
boxdreamer-cli --help
```

If you see the help menu, installation was successful! Proceed to CLI Usage or try the Gradio demo.
```bash
# Display help and available options
boxdreamer-cli --help

# Process video with text prompt (automatic object detection)
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4 \
    --show_point_cloud --interactive --use_grounding_dino \
    --text_prompt "Controller"

# Manual object annotation mode
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4 \
    --show_point_cloud --interactive

# Auto reference frame selection
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4 \
    --show_point_cloud

# Quick processing (without point cloud rendering)
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4
```

Launch the interactive web interface:
```bash
# Using a local checkpoint
python -m src.demo.gradio_demo --ckpt path_to_boxdreamer_ckpt

# Or load the checkpoint from Hugging Face
python -m src.demo.gradio_demo --hf
```

LINEMOD: You can download the dataset from CDPN. Then extract it to the data/lm folder.
OnePose & OnePose-LowTexture: Download the OnePose dataset from OpenDataLab OnePose and the OnePose-LowTexture dataset from here. Then extract them to the data/onepose and data/onepose_lowtexture folders, respectively.
LINEMOD-Occlusion: Download the dataset from here. Then extract it to the data/lmo folder.
YCB-Video: Download the dataset from OpenDataLab YCB-Video. Then extract it and move the YCB_Video_Dataset folder into data/ycbv.
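For convenience, the sketch below creates the dataset roots named above in one go; the archive filenames are hypothetical placeholders, since the actual names depend on the download source:

```bash
# Create the dataset roots referenced above
mkdir -p data/lm data/onepose data/onepose_lowtexture data/lmo data/ycbv

# Example extraction (archive names are placeholders, not the real download filenames)
tar -xf linemod_archive.tar -C data/lm
tar -xf onepose_archive.tar -C data/onepose
```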
You can get the FoundationPose reference database from here.
```bash
# Process the FoundationPose reference database
python src/datasets/utils/ycbv/foundationpose_ref_process.py

# Preprocess YCB-Video
python src/datasets/utils/ycbv/ycbv_preprocess.py

# Preprocess LINEMOD-Occlusion
python src/datasets/utils/linemod_utils/linemod_o_process.py

# Create FPS 5 views database for LINEMOD
python -m src.datasets.utils.view_sampler --dataset linemod --method fps --num_views 5 --root data/lm
```

```bash
# Basic usage: Reconstruct LINEMOD with DUSt3R
python -m src.reconstruction.main --dataset LINEMOD --reconstructor dust3r --ref_suffix _fps_5
```

Key Parameters:
- --dataset: Dataset name (LINEMOD, OnePose, etc.)
- --reconstructor: Reconstruction method (dust3r, etc.)
- --ref_suffix: Suffix for reference views database
```bash
# Basic usage: Train on OnePose with 5 reference views
python run.py --config-name=train.yaml \
    datamodule.train_datasets=[OnePose] \
    datamodule.val_datasets=[OnePose] \
    length=6
```

Note: For zsh, escape the brackets with backslashes: \[OnePose\]
Key Parameters:
- datamodule.train_datasets: List of training datasets
- datamodule.val_datasets: List of validation datasets
- length: Number of reference views + 1 query view (e.g., 6 means 5 reference views)
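As a concrete sketch of the length convention (same OnePose setup as above, only the view count changes), training with 10 reference views would use length=11:

```bash
# 10 reference views + 1 query view -> length=11
python run.py --config-name=train.yaml \
    datamodule.train_datasets=[OnePose] \
    datamodule.val_datasets=[OnePose] \
    length=11
```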
```bash
# Basic usage: Evaluate on LINEMOD using the FPS 5-view database
python run.py --config-name=test.yaml \
    pretrain_name=subfolder \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    datamodule.LINEMOD.config.reference_suffix=_fps_5 \
    length=6

# Or, load the checkpoint from Hugging Face (latest checkpoint)
python run.py --hf --config-name=test.yaml \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    datamodule.LINEMOD.config.reference_suffix=_fps_5 \
    length=6

# Use the reproducible-version checkpoint
python run.py --hf --reproducibility --config-name=test.yaml \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    datamodule.LINEMOD.config.reference_suffix=_fps_5 \
    length=6
```
Key Parameters:
- pretrain_name: Name of the pretrained model folder
- datamodule.test_datasets: List of test datasets
- datamodule.LINEMOD.config.model_suffix: Suffix for model files. If not provided, ground-truth models are used for bounding box extraction
- datamodule.LINEMOD.config.reference_suffix: Suffix for the reference database. If not provided, the full-view database is used
- length: Number of reference views + 1 query view
For evaluation with a dense reference database, set length to the total number of reference images plus one. Enabling the DINO feature filter (model.modules.dense_cfg.enable=True) will further assist in selecting the most relevant neighbor views for the decoder input.
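For illustration, here is a sketch of such a dense evaluation run, assuming the reference database holds 100 images (so length=101) and leaving reference_suffix unset so the full-view database is used; the remaining values mirror the example above:

```bash
# Dense-reference evaluation sketch: 100 reference images + 1 query view -> length=101
# reference_suffix is omitted so the full-view database is used (see the notes above)
python run.py --config-name=test.yaml \
    pretrain_name=subfolder \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    model.modules.dense_cfg.enable=True \
    length=101
```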
| Version | Training Data | Params | Download |
|---|---|---|---|
| Latest | Objaverse + OnePose | 88.6M | Download |
| Pretrained | Objaverse | 88.6M | Coming soon |
Download the checkpoint, place it in the models/checkpoints/subfolder folder, and rename it to last.ckpt.
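A minimal sketch of that step, where boxdreamer_latest.ckpt stands in for whatever filename the download provides:

```bash
# Move the downloaded checkpoint to the location expected by the test config
# (boxdreamer_latest.ckpt is a placeholder filename)
mkdir -p models/checkpoints/subfolder
mv boxdreamer_latest.ckpt models/checkpoints/subfolder/last.ckpt
```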
Does BoxDreamer require CAD models or mesh representations of objects?
No, BoxDreamer does not require any 3D CAD models or mesh representations of objects during inference. This is a key advantage of our approach, as it enables generalization to novel objects without access to their 3D models. During training, we do use bounding box annotations, but no detailed 3D models are required.
How computationally expensive is BoxDreamer during inference?
The BoxDreamer-Base model runs at over 40 FPS on a single NVIDIA RTX 4090 GPU with 5 reference images.
Can BoxDreamer work with RGB-D images?
Yes! While the base version of BoxDreamer works with RGB images only, depth information additionally provides access to real-world object coordinates. We plan to introduce a variant of BoxDreamer that incorporates depth information in the future.
If you find BoxDreamer useful in your research, please consider citing our paper:
```bibtex
@article{yu2025boxdreamer,
  title={BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation},
  author={Yu, Yuanhong and He, Xingyi and Zhao, Chen and Yu, Junhao and Yang, Jiaqi and Hu, Ruizhen and Shen, Yujun and Zhu, Xing and Zhou, Xiaowei and Peng, Sida},
  journal={arXiv preprint arXiv:2504.07955},
  year={2025}
}
```
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Our implementation is based on several open-source repositories. We thank the authors of these repositories for making their code available.
We would also like to thank Yating Wang, Chengrui Dong, and Yiguo Fan for their sincere suggestions and valuable live demonstrations.
