Rig3R is a transformer-based model for multiview 3D reconstruction and camera pose estimation that incorporates rig-aware conditioning. Unlike prior models (e.g., DUSt3R, Fast3R), Rig3R learns to leverage rig metadata (camera ID, timestamp, and rig pose) when available, and can infer rig structure when not — enabling robust 3D reconstruction across unstructured and rig-based image sets.
This repository is a reimplementation of the original Rig3R paper (Li et al., 2025), built for clarity, reproducibility, and extensibility.
- Rig-aware transformer architecture using ViT-Large encoder-decoder.
- Joint prediction of:
  - Pointmaps (dense 3D coordinates)
  - Pose raymaps (global frame)
  - Rig raymaps (rig-relative frame)
- Rig discovery: infers rig calibration from unordered image collections.
- Closed-form camera pose recovery from raymaps.
- Multi-task learning with dropout-based metadata conditioning.
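The closed-form pose recovery above can be sketched as follows. Assuming a raymap supplies per-pixel ray origins and unit directions in the global frame, and that the corresponding canonical camera-frame directions are known from the intrinsics, the camera center is the shared ray origin and the rotation is the orthogonal-Procrustes (Kabsch) solution. The helper `pose_from_raymap` is hypothetical and not this repository's API:

```python
import numpy as np

def pose_from_raymap(origins, dirs_world, dirs_cam):
    """Closed-form pose from a raymap (hypothetical helper, not the repo API).

    origins:    (N, 3) per-pixel ray origins in the global frame
    dirs_world: (N, 3) per-pixel unit ray directions in the global frame
    dirs_cam:   (N, 3) the same rays in the canonical camera frame
    Returns (R, t): rotation mapping camera frame to world, and camera center.
    """
    # All rays of one camera share an origin; averaging suppresses noise.
    t = origins.mean(axis=0)
    # Kabsch / orthogonal Procrustes: R = argmin_R sum ||R d_cam - d_world||^2
    H = dirs_cam.T @ dirs_world
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    # Correct a possible reflection so R is a proper rotation (det = +1).
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, t
```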
Set up the environment with uv:
```shell
make setup-env
source .venv/bin/activate
make install
```

For complete setup instructions:
- Waymo: Dataset Preparation – Waymo
- CO3D: Dataset Preparation – CO3D
```shell
make setup-train
make train
```

- Note: for debugging or training on a less powerful machine, you can run the command below
```shell
make train-debug
```

Recommended configuration values:
- Batch size: 128
- Frames per sample: 24
- Image size: 512×512
- Optimizer: AdamW (lr=1e-4)
- Scheduler: Cosine annealing
- Dropout on metadata: 0.5
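The metadata dropout above (p = 0.5) could be implemented by masking each rig-metadata field independently during training, so the model learns to work with or without rig information. A minimal sketch; `drop_metadata` is a hypothetical helper, not this repository's API:

```python
import random

def drop_metadata(meta, p=0.5, rng=random):
    """Mask each rig-metadata field (e.g. camera ID, timestamp, rig pose)
    independently with probability p during training.

    Hypothetical helper illustrating dropout-based metadata conditioning.
    """
    return {key: None if rng.random() < p else value
            for key, value in meta.items()}
```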
- Note: Check out here for more training details
- Note: Make sure trained model checkpoints are ready before evaluating
```shell
make download-wayve101
make evaluate
```

- Note: for debugging or evaluating on a less powerful machine, you can run the command below
```shell
make evaluate-debug
```

Metrics:
- RRA / RTA @5°/15°/30°
- mAA (mean angular accuracy)
- Chamfer distance
- Rig discovery accuracy (Hungarian assignment)
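The rotation metrics above reduce to the geodesic angle between rotation matrices, thresholded at 5°/15°/30°. A minimal sketch under that assumption; `rotation_angle_deg` and `rra_at` are illustrative names, not this repository's API:

```python
import numpy as np

def rotation_angle_deg(R_est, R_gt):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip guards against arccos domain errors from floating-point noise.
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def rra_at(errors_deg, tau):
    """Relative Rotation Accuracy: fraction of pairwise errors under tau degrees."""
    errors_deg = np.asarray(errors_deg)
    return float((errors_deg < tau).mean())
```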
Run unsupervised rig calibration discovery:
```shell
python scripts/infer_rig_discovery.py --config configs/evaluate.yaml
```

- Note: for debugging or running it on a less powerful machine, you can run the command below
```shell
python scripts/infer_rig_discovery.py --config configs/evaluate_mini.yaml
```

The script calculates and reports:
- Rig ID Accuracy: How well frames are clustered into cameras.
- Rig mAA: Mean Angular Accuracy of the recovered rig extrinsics.
- Chamfer Distance: Quality of the reconstructed 3D pointcloud.
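Rig ID accuracy can be scored by matching predicted cluster labels to ground-truth camera IDs under the best label permutation. The sketch below brute-forces the permutation (equivalent to Hungarian matching for small camera counts); `rig_id_accuracy` is an illustrative name, not this repository's API:

```python
from itertools import permutations

def rig_id_accuracy(pred_ids, gt_ids):
    """Clustering accuracy under the best label-to-label assignment.

    Brute-force over label permutations; fine for a handful of cameras,
    and equivalent to the Hungarian assignment in that regime.
    """
    labels = sorted(set(gt_ids) | set(pred_ids))
    best = 0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        correct = sum(mapping[p] == g for p, g in zip(pred_ids, gt_ids))
        best = max(best, correct)
    return best / len(gt_ids)
```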
It also produces:
- Clustered rig raymaps (internal representation)
- Reconstructed 3D pointclouds
- Estimated rig configurations (extrinsics)
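The Chamfer distance used to score the reconstructed pointclouds can be sketched directly in NumPy. This is a generic symmetric Chamfer formulation (quadratic in memory, so a sketch rather than something you would run on full-resolution clouds); `chamfer_distance` is an illustrative name, not this repository's API:

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3).

    Average nearest-neighbor distance from P to Q plus from Q to P.
    """
    # Pairwise distance matrix, shape (N, M).
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```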
If you use this reimplementation, please cite the original paper:
```bibtex
@article{li2025rig3r,
  title={Rig3R: Rig-Aware Conditioning for Learned 3D Reconstruction},
  author={Li, Samuel and Kachana, Pujith and Chidananda, Prajwal and Nair, Saurabh and Furukawa, Yasutaka and Brown, Matthew},
  journal={arXiv preprint arXiv:2506.02265},
  year={2025}
}
```