πŸ›Έ BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation

Yuanhong Yu β€’ Xingyi He β€’ Chen Zhao β€’ Junhao Yu β€’ Jiaqi Yang β€’ Ruizhen Hu β€’ Yujun Shen β€’ Xing Zhu β€’ Xiaowei Zhou β€’ Sida Peng



πŸ“¦βœ¨ Dream up accurate 3D bounding boxes for objects in the wild!

πŸ“… News

  • [2025.10.06] πŸŽ‰ BoxDreamer demo upgraded! Now supports CLI usage for easier interaction!
  • [2025.06.26] πŸ† Paper accepted by ICCV 2025! Code is now open-sourced on GitHub!
  • [2025.04.10] πŸ“„ BoxDreamer paper released on arXiv!

πŸ“‹ Table of Contents

  • πŸ“¦ Method Overview
  • πŸ’» Installation
  • πŸ“± CLI Usage
  • πŸ€— Gradio Demo
  • πŸ“‚ Dataset Preparation
  • πŸš€ Reconstruction
  • πŸ‹οΈ Training
  • πŸ“Š Evaluation
  • πŸ“¦ Model Zoo
  • ❓ Frequently Asked Questions
  • πŸ“ Citation
  • πŸ“„ License
  • πŸ™ Acknowledgements

πŸ“¦ Method Overview

(Method overview figure)

πŸ’» Installation

BoxDreamer supports two installation methods: a fast automated script or manual step-by-step installation. Choose the method that best suits your needs.

Method 1: Fast Installation (Recommended)

If your system is compatible with PyTorch 2.5.1 + CUDA 12.1, use our automated installation script:

bash install.sh

After successful installation, you can immediately start using BoxDreamer via the CLI or Gradio demo.

Method 2: Manual Installation

For custom configurations or troubleshooting, follow these steps:

Step 1: Create Environment

# Create and activate conda environment
conda create -n boxdreamer python=3.11
conda activate boxdreamer

Step 2: Install uv Package Manager

We recommend using uv for faster dependency installation:

pip install uv

Step 3: Install Core Dependencies

# Install PyTorch (adjust CUDA version if needed)
uv pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

# Install PyTorch3D
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

# Install Flash Attention
pip install flash_attn

# Install xformers
pip install xformers==0.0.28.post3

Step 4: Install Project Dependencies

# Install required packages
uv pip install -r requirements.txt

# Install BoxDreamer
pip install -e .

Step 5: (Optional) Install SAM2 for Real-World Demo

For CUDA 12.1 + Python 3.11 + PyTorch 2.5.1:

# Download pre-built wheel from https://miropsota.github.io/torch_packages_builder/sam-2/
pip install https://github.com/MiroPsota/torch_packages_builder/releases/download/SAM_2-1.0%2Bc2ec8e1/SAM_2-1.0%2Bc2ec8e1pt2.5.1cu121-cp311-cp311-linux_x86_64.whl

# Install additional demo dependencies
pip install decord pyqt5 gradio transformers

Step 6: Initialize Submodules

git submodule update --init --recursive

Step 7: (Optional) Configure VS Code Python Environment

touch .env
echo "PYTHONPATH=three/dust3r" >> .env

Download Model Checkpoints

Required: Reconstruction Models

mkdir -p weights && cd weights
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth

Optional: Grounding DINO (for demo usage)

wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

Verify Installation

Test your installation with the CLI:

boxdreamer-cli --help

If you see the help menu, installation was successful! Proceed to CLI Usage or try the Gradio demo.
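
You can also confirm that the core libraries from Steps 3–4 import cleanly. A minimal sanity check, assuming the boxdreamer conda environment is active:

# Check PyTorch and CUDA availability (expects 2.5.1 / True on a CUDA 12.1 machine)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Check PyTorch3D and xformers
python -c "import pytorch3d, xformers; print(pytorch3d.__version__, xformers.__version__)"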


πŸ“± CLI Usage

# Display help and available options
boxdreamer-cli --help

# Process video with text prompt (automatic object detection)
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4 \
  --show_point_cloud --interactive --use_grounding_dino \
  --text_prompt "Controller"

# Manual object annotation mode
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4 \
  --show_point_cloud --interactive

# Auto reference frame selection
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4 \
  --show_point_cloud

# Quick processing (without point cloud rendering)
boxdreamer-cli --video src/demo/examples/mode1/mode1-4.mp4
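
To process several clips in one pass, you can loop over a directory of videos. A minimal sketch, assuming your clips live under src/demo/examples/mode1/ (the directory used in the examples above) and using only the quick-processing mode shown there:

# Batch-process every .mp4 in the example directory (no point cloud rendering)
for vid in src/demo/examples/mode1/*.mp4; do
    boxdreamer-cli --video "$vid"
done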

πŸ€— Gradio Demo

Launch the interactive web interface:

# Using local checkpoint
python -m src.demo.gradio_demo --ckpt path_to_boxdreamer_ckpt

# Or load from Hugging Face
python -m src.demo.gradio_demo --hf

πŸ“‚ Dataset Preparation

LINEMOD

You can download the dataset from CDPN, then extract it to the data/lm folder.

OnePose & OnePose-LowTexture

Download the OnePose dataset from OpenDataLab OnePose and OnePose-LowTexture from here. Then extract them to the data/onepose and data/onepose_lowtexture folders, respectively.

Occluded LINEMOD

Download the dataset from here, then extract it to the data/lmo folder.

YCB-Video

Download the dataset from OpenDataLab YCB-Video, then extract it and move the YCB_Video_Dataset folder into the data/ycbv folder. You can get the FoundationPose reference database from here.
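
After extraction, all datasets are expected to live under the data/ directory. A minimal sketch of the layout implied by the instructions above (create the folders first if your archive tool does not):

# Expected top-level dataset layout
mkdir -p data/lm                               # LINEMOD
mkdir -p data/lmo                              # Occluded LINEMOD
mkdir -p data/onepose data/onepose_lowtexture  # OnePose & OnePose-LowTexture
mkdir -p data/ycbv                             # move YCB_Video_Dataset into this folder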

Preprocess

YCB-Video FoundationPose Reference (Optional)

python src/datasets/utils/ycbv/foundationpose_ref_process.py

YCB-Video Preprocess

python src/datasets/utils/ycbv/ycbv_preprocess.py

Occluded LINEMOD Preprocess

python src/datasets/utils/linemod_utils/linemod_o_process.py

Reference Database Creation (Optional)

# Create FPS 5 views database for LINEMOD
python -m src.datasets.utils.view_sampler --dataset linemod --method fps --num_views 5 --root data/lm

πŸš€ Reconstruction

# Basic usage: Reconstruct LINEMOD with DUSt3R
python -m src.reconstruction.main --dataset LINEMOD --reconstructor dust3r --ref_suffix _fps_5

Key Parameters:

  • --dataset: Dataset name (LINEMOD, OnePose, etc.)
  • --reconstructor: Reconstruction method (dust3r, etc.)
  • --ref_suffix: Suffix for the reference-views database (see the sketch after this list)
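
As a sketch of how the reference database and the reconstruction step fit together, the _fps_5 suffix produced by the view sampler is the value passed to --ref_suffix (both commands are taken from above):

# 1) Sample a 5-view FPS reference database for LINEMOD
python -m src.datasets.utils.view_sampler --dataset linemod --method fps --num_views 5 --root data/lm

# 2) Reconstruct with DUSt3R using that reference database
python -m src.reconstruction.main --dataset LINEMOD --reconstructor dust3r --ref_suffix _fps_5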

πŸ‹οΈ Training

# Basic usage: Train on OnePose with 5 reference views
python run.py --config-name=train.yaml \
    datamodule.train_datasets=[OnePose] \
    datamodule.val_datasets=[OnePose] \
    length=6

Note: In zsh, escape the brackets with backslashes: \[OnePose\]

Key Parameters:

  • datamodule.train_datasets: List of training datasets
  • datamodule.val_datasets: List of validation datasets
  • length: Number of reference views + 1 query view (e.g., 6 means 5 reference views; see the variant below)
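
For example, a hypothetical run that trains with 10 reference views instead of 5 only changes length to 11 (10 reference views + 1 query view); the dataset lists keep the same syntax:

# Illustrative variant: 10 reference views per sample
python run.py --config-name=train.yaml \
    datamodule.train_datasets=[OnePose] \
    datamodule.val_datasets=[OnePose] \
    length=11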

πŸ“Š Evaluation

# Basic usage: Evaluate on LINEMOD using FPS 5 views
python run.py --config-name=test.yaml \
    pretrain_name=subfolder \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    datamodule.LINEMOD.config.reference_suffix=_fps_5 \
    length=6

# Or load the checkpoint from Hugging Face (latest checkpoint)

python run.py --hf --config-name=test.yaml \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    datamodule.LINEMOD.config.reference_suffix=_fps_5 \
    length=6

# Use the reproducible version checkpoint

python run.py --hf --reproducibility --config-name=test.yaml \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    datamodule.LINEMOD.config.reference_suffix=_fps_5 \
    length=6

Key Parameters:

  • pretrain_name: Name of the pretrained model folder
  • datamodule.test_datasets: List of test datasets
  • datamodule.LINEMOD.config.model_suffix: Suffix for model files. If not provided, ground-truth models will be used for bounding box extraction
  • datamodule.LINEMOD.config.reference_suffix: Suffix for the reference database. If not provided, the full-views database will be used
  • length: Number of reference views + 1 query view

For evaluation with a dense reference database, set length to the total number of reference images plus one. Enabling the DINO feature filter (model.modules.dense_cfg.enable=True) will further assist in selecting the most relevant neighbor views for the decoder input.
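
As an illustrative sketch only (the reference count of 200 below is a made-up example; use your database's actual size), a dense-reference evaluation combines these options as follows:

# Dense-reference evaluation sketch: omit reference_suffix to use the full-views database,
# set length to (number of reference images + 1), and enable the DINO feature filter
python run.py --config-name=test.yaml \
    pretrain_name=subfolder \
    exp_name=lm \
    datamodule.test_datasets=[LINEMOD] \
    datamodule.LINEMOD.config.model_suffix=_dust3r_5 \
    model.modules.dense_cfg.enable=True \
    length=201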

πŸ“¦ Model Zoo

Version    | Training Data       | Params | Download
Latest     | Objaverse + OnePose | 88.6M  | Download
Pretrained | Objaverse           | 88.6M  | Coming soon

Download the checkpoint, place it in the models/checkpoints/subfolder folder, and rename it to last.ckpt.
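
For example, assuming the downloaded file is named boxdreamer.ckpt (the actual filename may differ):

mkdir -p models/checkpoints/subfolder
mv boxdreamer.ckpt models/checkpoints/subfolder/last.ckpt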

❓ Frequently Asked Questions

Does BoxDreamer require CAD models or mesh representations of objects?

No, BoxDreamer does not require any 3D CAD models or mesh representations of objects during inference. This is a key advantage of our approach, as it enables generalization to novel objects without access to their 3D models. During training, we do use bounding box annotations, but no detailed 3D models are required.

How computationally expensive is BoxDreamer during inference?

The BoxDreamer-Base model runs at over 40 FPS on a single NVIDIA RTX 4090 GPU with 5 reference images.

Can BoxDreamer work with RGB-D images?

Yes! While the base version of BoxDreamer works with RGB images only, depth information also provides access to real-world object coordinates. We plan to introduce a variant of BoxDreamer that incorporates depth information in the future.

πŸ“ Citation

If you find BoxDreamer useful in your research, please consider citing our paper:

@article{yu2025boxdreamer,
  title={BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation},
  author={Yu, Yuanhong and He, Xingyi and Zhao, Chen and Yu, Junhao and Yang, Jiaqi and Hu, Ruizhen and Shen, Yujun and Zhu, Xing and Zhou, Xiaowei and Peng, Sida},
  journal={arXiv preprint arXiv:2504.07955},
  year={2025}
}

πŸ“„ License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgements

Our implementation is based on several open-source repositories. We thank the authors of these repositories for making their code available.

We would also like to thank Yating Wang, Chengrui Dong, and Yiguo Fan for their sincere suggestions and valuable live demonstrations.
