GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering (SIGGRAPH 2025 Conference Paper)
Zinuo You, Stamatios Georgoulis, Anpei Chen, Siyu Tang, Dengxin Dai
git clone https://github.com/sinoyou/gavs.git
GaVS has been trained and tested with the following software versions: Python 3.10 and CUDA 12.6.
# create and activate a conda environment
conda create -n gavs python=3.10
conda activate gavs

or place the virtual environment in a local directory:

conda create -p ./venv python=3.10
conda activate ./venv

Finally, install the required packages in the following order:
pip install -r requirements-torch.txt --extra-index-url https://download.pytorch.org/whl/cu126
pip install ./thirdparty/diff-gaussian-rasterization-w-pose
pip install -r requirements.txt
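Before moving on, you can sanity-check that the CUDA-enabled PyTorch wheel was picked up (a quick check from a Python shell, not part of the repo's scripts):

# Quick sanity check, not part of the repo's scripts.
import torch

print(torch.__version__, torch.version.cuda)  # expect a +cu126 build and "12.6"
print(torch.cuda.is_available())              # expect True on a CUDA-capable machine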
In the project root directory, download and unzip the dataset, pre-trained checkpoints, and results from the Hugging Face repo:

# download and unzip
python datasets/download_gavs_data.py

Our method is mainly evaluated on scenes selected from Google Deep Online Fused Video Stabilization: 15 challenging scenes categorized as dynamic, mild, and intense. The selection accounts for our method's reliance on 3D modeling and keeps the categories balanced.
The original dataset contains only raw unstable videos with gyro information. We first extract video frames and compensate for rolling shutter with 2D homographies derived from the gyro history. We then enhance the video frames with the following modules:
- GLOMAP: computes 3D camera poses and sparse point clouds from the static regions.
- ProPainter: a video-completion module that extrapolates the original frames guided by optical flow.
- Grounded SAM 2: extracts dynamic masks from text prompts.
Additionally, we compute monocular depth maps and optical flows on-the-fly with UniDepth and RAFT.
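For intuition, here is a minimal sketch of the gyro-based rotation compensation idea (illustrative only, not the repo's preprocessing code): for a purely rotational camera motion, the induced image warp is the homography H = K R K^-1, where K is the camera intrinsics and R the rotation integrated from gyro readings. Real rolling-shutter correction applies a different R per scanline; the sketch below warps whole frames for brevity.

# Illustrative sketch, NOT the repo's preprocessing code.
import cv2
import numpy as np

def compensate_rotation(frame: np.ndarray, K: np.ndarray, R: np.ndarray) -> np.ndarray:
    # For pure rotation, the induced image warp is H = K @ R @ K^-1.
    H = K @ R @ np.linalg.inv(K)
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))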
After cloning and data extraction, the project directory is expected to look as follows:
./
|-- gavs-data
| |-- re10k_v2_checkpoints/ # pretrained Flash3D model
| |-- dataset
| | |-- dynamic_dance/ # preprocessed data of 'dynamic dance'
| | |-- dynamic_dance.zip # downloaded zip file
| | |-- ...
| |-- result_and_comparison # reference results from different methods
|-- configs
| |-- dataset/ # dataset configs
| |-- experiment/ # finetuning and evaluation configs
| |-- ...
|-- train.py # entry point of finetuning
|-- evaluate.py # entry point of evaluation
|-- models/
|-- README.md
|-- ...

You can start finetuning with the following command (e.g., for the dynamic_dance scene). The evaluation will start automatically afterwards:
python train.py \
hydra.run.dir=./exp/dynamic_dance/ \
+experiment=layered_gavs_overfit \
dataset.data_path=./gavs-data/dataset/dynamic_dance
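To finetune all scenes sequentially, a small wrapper along the following lines could be used (a hypothetical helper, assuming every subfolder of gavs-data/dataset is a preprocessed scene):

# run_all_scenes.py -- hypothetical helper, not shipped with the repo.
# Finetunes (and thus also evaluates) every preprocessed scene found
# under gavs-data/dataset, using the same overrides as the command above.
import subprocess
from pathlib import Path

DATA_ROOT = Path("./gavs-data/dataset")

for scene in sorted(p for p in DATA_ROOT.iterdir() if p.is_dir()):
    subprocess.run(
        [
            "python", "train.py",
            f"hydra.run.dir=./exp/{scene.name}/",
            "+experiment=layered_gavs_overfit",
            f"dataset.data_path={scene}",
        ],
        check=True,  # abort on the first failing scene
    )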
You can also run the evaluation only (e.g., with a different stability level) with the following command:

python evaluate.py \
train.mode='eval' \
hydra.run.dir=./exp/dynamic_dance \
+experiment=layered_gavs_eval \
dataset.data_path=./gavs-data/dataset/dynamic_dance \
config.eval_dir='customized_eval'

Ablation on dynamic compensation:
python train.py \
hydra.run.dir=./exp/dynamic_dance/ \
+experiment=layered_gavs_overfit \
dataset.data_path=./gavs-data/dataset/dynamic_dance \
train.handle_dynamic_by_flow=False

Ablation on window regularization:
python train.py \
hydra.run.dir=./exp/dynamic_dance/ \
+experiment=layered_gavs_overfit \
dataset.data_path=./gavs-data/dataset/dynamic_dance \
loss.window.weight=0

Ablation on the inpainting module:
python train.py \
hydra.run.dir=./exp/dynamic_dance/ \
+experiment=layered_gavs_overfit \
dataset.data_path=./gavs-data/dataset/dynamic_dance \
dataset.use_inpainted_images=False \
dataset.pad_border_aug=0

Ablation on video stability (changes the evaluation only):
python evaluate.py \
train.mode='eval' \
hydra.run.dir=./exp/dynamic_dance \
+experiment=layered_gavs_eval \
dataset.data_path=./gavs-data/dataset/dynamic_dance \
config.eval_dir='stability8' \
dataset.stability=8
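To sweep several stability levels in one run, a wrapper like this could be used (a hypothetical helper; the scene name and stability values are examples):

# sweep_stability.py -- hypothetical helper, not shipped with the repo.
# Re-evaluates a finetuned scene at several stability levels; the scene
# name and the stability values below are examples.
import subprocess

SCENE = "dynamic_dance"
for stability in (2, 4, 8):
    subprocess.run(
        [
            "python", "evaluate.py",
            "train.mode=eval",
            f"hydra.run.dir=./exp/{SCENE}",
            "+experiment=layered_gavs_eval",
            f"dataset.data_path=./gavs-data/dataset/{SCENE}",
            f"config.eval_dir=stability{stability}",
            f"dataset.stability={stability}",
        ],
        check=True,
    )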
This repo is built on Flash3D. We thank the authors of all the open-source tools we use: GLOMAP, ProPainter, Grounded SAM 2, UniDepth, and RAFT. We also thank the authors of the following projects for providing evaluation scenes: Google Deep Online Fused Video Stabilization and LocalRF.

@article{you2025gavs,
title={GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering},
author={You, Zinuo and Georgoulis, Stamatios and Chen, Anpei and Tang, Siyu and Dai, Dengxin},
journal={arXiv preprint arXiv:2506.23957},
year={2025}
}
