M³-ReID: Unifying Multi-View, Granularity, and Modality for Video-Based Visible-Infrared Person Re-Identification
This repository contains the official implementation of the paper:
By Tengfei Liang, Yi Jin, Zhun Zhong, Xin Chen, Xianjia Meng, Tao Wang, Yidong Li.
Notes: This repository offers the complete implementation of the method, featuring a well-organized directory structure and detailed comments that facilitate model training and testing. Moreover, the codebase is designed to be easy to extend, allowing users to quickly adapt or build upon the existing framework. We hope this work can serve as a strong yet simple baseline for video-based visible–infrared person re-identification and further contribute to the advancement of the VVI-ReID field.
The video-based visible-infrared person re-identification (VVI-ReID) task focuses on cross-modality retrieval of pedestrian videos captured in the visible and infrared modalities by non-overlapping cameras across diverse scenes, and it holds significant value for security surveillance. It is a challenging task due to three main issues: the difficulty of capturing comprehensive spatio-temporal cues, intra-class variations within video sequences, and inter-modality discrepancies between visible and infrared data.
To address these issues, we propose M³-ReID, a unified framework that simultaneously handles:
- Multi-View Learning (MVL): Captures diverse spatio-temporal patterns (Spatial, Temporal-Width, Temporal-Height views) with the Diverse Attention Constraint.
- Multi-Granularity Representation (MGR): Optimizes features across both fine-grained frame level (with Orthogonal Frame Regularizer) and coarse-grained video level.
- Multi-Modality Alignment (MMA): Explicitly aligns metric learning with cross-modality retrieval goals using a retrieval-oriented contrastive objective.
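For intuition, the snippet below sketches a generic retrieval-style cross-modality contrastive objective: each visible feature is treated as a query against an infrared gallery built from the same batch. This is only an illustration of the idea behind MMA, not the exact loss from the paper (see `losses/mma_loss.py` for the actual implementation); the tensor names are placeholders.

```python
# Illustrative sketch of a retrieval-oriented cross-modality contrastive objective.
# Not the exact MMA loss; see losses/mma_loss.py for the real implementation.
import torch
import torch.nn.functional as F

def cross_modality_contrastive(vis_feats, ir_feats, labels, temperature=0.1):
    """vis_feats / ir_feats: (N, D) embeddings; labels: (N,) shared identity labels."""
    vis = F.normalize(vis_feats, dim=1)
    ir = F.normalize(ir_feats, dim=1)
    logits = vis @ ir.t() / temperature                              # query-gallery similarities
    pos_mask = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()   # same-identity pairs
    log_prob = F.log_softmax(logits, dim=1)
    # Average the log-likelihood over all positive gallery entries per query.
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()
```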
The codebase is organized into modular components to facilitate understanding and maintenance.
```
M3-ReID/
├── data/                      # Data Pipeline
│   ├── manager.py             # DataManager: Parses raw data (HITSZ-VCM/BUPTCampus)
│   ├── dataset.py             # VideoVIDataset: Handles video loading and sampling
│   ├── sampler.py             # Diverse sampling strategies
│   └── transform.py           # Custom transforms
├── losses/                    # Loss Functions
│   ├── mma_loss.py            # Multi-Modality Alignment (MMA) Loss
│   └── sep_loss.py            # Separation Loss (used for OFR and DAC)
├── models/                    # Model Definitions
│   ├── backbones/             # Backbone Networks
│   │   └── resnet.py          # ResNet backbone implementation (FC layers removed)
│   ├── modules/               # Custom Modules
│   │   ├── mvl_attention.py   # Multi-View Learning (MVL) Attention module
│   │   ├── non_local.py       # Spatio-Temporal Non-Local Block
│   │   └── normalize.py       # Feature Normalization layer
│   └── model_m3reid.py        # The M3-ReID architecture definition
├── tools/                     # Utilities
│   ├── eval_metrics.py        # Evaluation (CMC, mAP, mINP)
│   └── utils.py               # Logger, random seed, path handling
├── train_m3reid.py            # Training Entry Point
└── test_m3reid.py             # Evaluation Entry Point
```
We have implemented several engineering optimizations to ensure robustness, efficiency, and extensibility:
- 🧩 Modular & Extensible Design: The framework decouples the Data, Model, and Loss components. You can easily plug in new backbones, custom losses, or optimization strategies with minimal changes to the core codebase.
- 🎞️ Synchronized Video Augmentation: Unlike previous VVI-ReID methods that often overlook temporal consistency during augmentation, our `SyncTrackTransform` ensures that random operations (e.g., cropping, flipping, erasing) are applied identically across all frames within a video tracklet. This preserves temporal coherence and avoids introducing artificial jitter noise that could confuse the temporal learning modules (a minimal sketch of this idea follows this list).
- 🎨 Rich Sampling Strategies: `sampler.py` provides diverse sampling logic, including standard PxK sampling, Identity-Balanced Cross-Modality sampling (guaranteeing half IR / half RGB per batch), and random sampling, satisfying various training requirements.
- 🔥 Mixed Precision Training: The training loop natively supports Automatic Mixed Precision (AMP) (`--fp16`), allowing for reduced GPU memory usage and faster training throughput without compromising performance.
- 📝 Comprehensive Logging: We provide a unified logging system that simultaneously records training progress to the console, text files, and TensorBoard. This makes it easy to monitor loss curves and accuracy in real time.
- 🚀 GPU-Based Metric Calculation: The evaluation script (`eval_metrics.py`) computes CMC, mAP, and mINP entirely on the GPU. This significantly accelerates the evaluation process, especially for large-scale gallery sets, compared to traditional CPU-based implementations.
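Below is a minimal sketch of the idea behind `SyncTrackTransform`: draw random parameters once per tracklet and apply them to every frame. The class name `SyncRandomHorizontalFlip` and its structure are illustrative only; the actual implementation lives in `data/transform.py`.

```python
# Minimal sketch of frame-synchronized augmentation for a video tracklet.
# Illustrative only: the real SyncTrackTransform lives in data/transform.py.
import random
import torchvision.transforms.functional as TF

class SyncRandomHorizontalFlip:
    """Flip all frames of a tracklet together, or none of them."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, frames):
        # `frames` is a list of PIL images (or tensors) from one tracklet.
        if random.random() < self.p:                # decide once per tracklet ...
            frames = [TF.hflip(f) for f in frames]  # ... and apply to every frame
        return frames
```

The same pattern extends to random cropping or erasing: sample the crop box or erased region once, then reuse it for all frames of the clip.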
- CUDA 11.3+
- Python 3.9+
- PyTorch 1.12+
- TorchVision 0.13+
- NumPy
- Pillow
- TensorBoard
P.S. In our experiments, we used CUDA 11.3, Python 3.9, PyTorch 1.12.1, TorchVision 0.13.1, NumPy 1.26.4, Pillow 11.1.0, and TensorBoard 2.19.0. Since our code depends only on these common libraries and does not require any unusual packages, newer or older versions are likely to work as well; it is not strictly necessary to follow the exact versions listed in our `requirements.txt`. Users are free to explore different versions, and any warnings that appear can usually be resolved with small adjustments to the code.
You can clone the repository using Git or download the ZIP archive manually.
Option A: Using Git

```bash
git clone https://github.com/workingcoder/M3-ReID.git
cd M3-ReID
```

Option B: Manual Download
- Click the "Code" button at the top right of this page.
- Select "Download ZIP" and extract it.
- Enter the directory via terminal:

```bash
cd M3-ReID-main
```

Install the required packages. Note that the provided `requirements.txt` is intended as a reference rather than a strict specification, and users are free to explore different versions.
```bash
pip install -r requirements.txt
```

During our experiments, we evaluated the proposed method on two publicly available datasets: HITSZ-VCM and BUPTCampus, which are benchmarks for VVI-ReID.
Please download the corresponding datasets from their official sources. Our code is designed to support their original directory structures.
We provide training scripts (train_m3reid.py) for both the HITSZ-VCM and BUPTCampus datasets. In our experiments, Mixed Precision Training (AMP) is enabled by default to reduce GPU memory consumption and accelerate training. Users can adjust all parameters as needed for further exploration.
We provide a comprehensive set of arguments to configure the training process.
Data Configurations
- `--dataset`: Target dataset name. Choices: `HITSZVCM`, `BUPTCampus`.
- `--dataset_dir`: Path to the root directory of the dataset.
- `--img_h`, `--img_w`: Input image resolution. Default: `288x144`.
- `--p_num`: Number of identities (P) per batch. Default: `4`.
- `--k_num`: Number of images (K) per identity. Default: `8`.
  - Note: The total batch size is calculated as `p_num * k_num` (a simplified sketch of this PxK sampling is given after this list).
- `--workers`: Number of data loading threads. Default: `4`.
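For intuition, the following is a simplified sketch of the standard PxK batching logic described above. The real sampling strategies (including identity-balanced cross-modality sampling) live in `data/sampler.py`; the mapping `index_by_pid` is a placeholder assumed to map each identity label to the list of its tracklet indices.

```python
# Simplified P x K batch construction: P identities, K samples per identity.
# Placeholder sketch; see data/sampler.py for the actual sampler implementations.
import random

def sample_pk_batch(index_by_pid, p_num=4, k_num=8):
    pids = random.sample(list(index_by_pid.keys()), p_num)
    batch = []
    for pid in pids:
        candidates = index_by_pid[pid]
        if len(candidates) < k_num:
            picks = random.choices(candidates, k=k_num)  # sample with replacement
        else:
            picks = random.sample(candidates, k_num)     # sample without replacement
        batch.extend(picks)
    return batch  # length == p_num * k_num
```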
Optimization
- `--lr`: Initial learning rate. Default: `0.0002`.
- `--wd`: Weight decay. Default: `0.0005`.
Training & Logging
- `--log_interval`: Frequency of logging training status (in batches).
- `--test_interval`: Frequency of evaluating on the test set (in epochs).
- `--save_interval`: Frequency of saving model checkpoints (in epochs).
- `--resume`: Path to a checkpoint file to resume training from.
- `--seed`: Random seed for reproducibility. Default: `0`.
- `--fp16`: [Flag] Enable Automatic Mixed Precision (AMP) training to save memory and speed up training.
- `--desc`: A short description string for the experiment (used for naming the log folder).
- `--gpu`: GPU device ID to use (e.g., `0`).
```bash
python train_m3reid.py \
--dataset HITSZVCM \
--dataset_dir path/to/HITSZ-VCM \
--img_h 288 \
--img_w 144 \
--p_num 4 \
--k_num 8 \
--lr 0.0002 \
--fp16 \
--desc M3-ReID \
--gpu 0
```

```bash
python train_m3reid.py \
--dataset BUPTCampus \
--dataset_dir path/to/BUPTCampus \
--img_h 288 \
--img_w 144 \
--p_num 4 \
--k_num 8 \
--lr 0.0001 \
--fp16 \
--desc M3-ReID \
--gpu 0
```

We provide three complementary ways to monitor the training process:
- Console Output: Real-time progress, loss values, and accuracy metrics are printed directly to the terminal standard output.
- Text Log: A complete log file is automatically saved in the `ckptlog` directory.
  - Path Structure: `ckptlog/<DATASET>/Time-<TIMESTAMP>_<DESC>/log_*.txt`
  - Example: `ckptlog/HITSZVCM/Time-2025-12-07_M3-ReID/log_Time-2025-12-07_M3-ReID.txt`
- TensorBoard: Visualizations of loss curves and evaluation metrics are recorded in the `tensorboard` subdirectory.
To view training curves via TensorBoard:
```bash
# Option A: Point to the specific experiment directory
tensorboard --logdir ckptlog/HITSZVCM/Time-2025-XX-XX_M3-ReID/tensorboard --port 6006

# Option B: Point to the root log directory to compare multiple experiments
tensorboard --logdir ckptlog/HITSZVCM/ --port 6006

# Then open http://localhost:6006 in your browser to view the curves
```

The testing script (`test_m3reid.py`) performs cross-modality retrieval evaluation in both "Visible-to-Infrared" (V2I) and "Infrared-to-Visible" (I2V) modes. It calculates and reports the Rank-1, Rank-5, Rank-10, mAP, and mINP metrics.
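As a rough illustration of the GPU-based evaluation mentioned above, the sketch below computes a cosine-distance matrix and per-query ranking entirely on the GPU. The full CMC / mAP / mINP computation is in `tools/eval_metrics.py`; `q_feats` and `g_feats` are placeholder query/gallery feature tensors.

```python
# Sketch of GPU-side retrieval scoring; the complete metric computation
# (CMC, mAP, mINP) is implemented in tools/eval_metrics.py.
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank_gallery(q_feats, g_feats):
    """q_feats: (num_query, D), g_feats: (num_gallery, D), both on the GPU."""
    q = F.normalize(q_feats, dim=1)
    g = F.normalize(g_feats, dim=1)
    dist = 1.0 - q @ g.t()         # (num_query, num_gallery) cosine distances
    indices = dist.argsort(dim=1)  # gallery indices, most similar first
    return dist, indices
```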
Data Configuration
- `--dataset`: Target dataset name. Choices: `HITSZVCM`, `BUPTCampus`.
- `--dataset_dir`: Path to the root directory of the dataset.
- `--img_h`, `--img_w`: Input image resolution. Default: `288x144`.
- `--batch_size`: Batch size for testing. Default: `32`.
  - Note: This refers to the number of video tracks processed simultaneously. Increasing it (e.g., to 64 or 128) can significantly speed up inference if GPU memory allows.
- `--workers`: Number of data loading threads. Default: `4`.
Model & Environment
- `--resume`: [Required] Path to the trained model checkpoint file (e.g., `path/to/modelckpt/model_epoch-200.pth`).
- `--desc`: A short description string for this testing process (used for naming the log folder).
- `--gpu`: GPU device ID to use (e.g., `0`).
```bash
python test_m3reid.py \
--dataset HITSZVCM \
--dataset_dir path/to/HITSZ-VCM \
--img_h 288 \
--img_w 144 \
--batch_size 32 \
--resume ckptlog/HITSZVCM/Time-Your_Desc/modelckpt/model_epoch-200.pth \
--desc M3-ReID \
--gpu 0
```

```bash
python test_m3reid.py \
--dataset BUPTCampus \
--dataset_dir path/to/BUPTCampus \
--img_h 288 \
--img_w 144 \
--batch_size 32 \
--resume ckptlog/BUPTCampus/Time-Your_Desc/modelckpt/model_epoch-200.pth \
--desc M3-ReID \
--gpu 0
```

Our method achieves strong performance on both VVI-ReID benchmarks. Detailed cross-modality retrieval performance is reported below.
Results on HITSZ-VCM:

| Method | Mode | Rank-1 | Rank-5 | Rank-10 | Rank-20 | mAP |
|---|---|---|---|---|---|---|
| M³-ReID | Visible to Infrared | 75.31% | 85.17% | 88.66% | 91.43% | 60.51% |
| M³-ReID | Infrared to Visible | 73.21% | 82.81% | 86.74% | 89.70% | 59.02% |
Results on BUPTCampus:

| Method | Mode | Rank-1 | Rank-5 | Rank-10 | Rank-20 | mAP |
|---|---|---|---|---|---|---|
| M³-ReID | Visible to Infrared | 70.74% | 85.19% | 89.44% | 92.04% | 63.75% |
| M³-ReID | Infrared to Visible | 68.28% | 85.45% | 88.99% | 91.79% | 64.96% |
(Results cited from Table I and Table II of the original paper)
Our framework is designed to be easily extensible.
Follow these steps to implement a new VVI-ReID method (e.g., YourMethod).
Create a new model file models/model_yourmethod.py. You can refer to models/model_m3reid.py as a template.
- Inherit `nn.Module`: Define your class (e.g., `class YourMethod(nn.Module)`).
- Backbone Integration: Initialize your backbone in `__init__`.
  - Flexibility: While we use ResNet-50 by default, you can easily integrate other backbones (e.g., ViT, Swin Transformer, or ConvNeXt) by adding their implementations to `models/backbones/` and importing them here.
- Custom Modules: Add your specific components (e.g., Attention mechanisms, Temporal Aggregation) in `models/modules/` and assemble them in your model.
- Forward Pass: Implement the feature extraction flow. Ensure `forward()` returns the necessary tensors for loss calculation (during training) or the final embedding (during evaluation). A hedged skeleton of such a model is shown after this list.
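The following is a hedged skeleton of such a model file. The constructor signature mirrors how the training script builds the model (`sample_seq_num`, `num_train_class`), but the layer choices are placeholders, and torchvision's `resnet50` is only a stand-in for `models/backbones/resnet.py`.

```python
# Hedged skeleton for models/model_yourmethod.py (layer choices are placeholders).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class YourMethod(nn.Module):
    def __init__(self, sample_seq_num, num_train_class):
        super().__init__()
        backbone = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool/fc
        self.seq_num = sample_seq_num
        self.bottleneck = nn.BatchNorm1d(2048)
        self.classifier = nn.Linear(2048, num_train_class, bias=False)

    def forward(self, x):
        # x: (batch, seq_len, 3, H, W) video clips -> fold frames into the batch dim.
        b, t, c, h, w = x.shape
        feat_map = self.backbone(x.view(b * t, c, h, w))       # (b*t, 2048, h', w')
        frame_feat = feat_map.mean(dim=[2, 3]).view(b, t, -1)  # frame-level features
        video_feat = frame_feat.mean(dim=1)                    # temporal average pooling
        bn_feat = self.bottleneck(video_feat)
        if self.training:
            return video_feat, self.classifier(bn_feat)        # tensors for loss computation
        return bn_feat                                         # final embedding for retrieval
```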
Create a new training script train_yourmethod.py. You can duplicate train_m3reid.py as a robust starting point and modify:
- Import Model:
  ```python
  # from models.model_m3reid import M3ReID  <-- Replace with your model
  from models.model_yourmethod import YourMethod
  ```
- Custom Data Pipeline (Samplers & Transforms):
  - You can customize augmentations in `data/transform.py` or design new sampling strategies in `data/sampler.py`.
  - Note: Our existing samplers already support balanced batch sampling (half IR, half RGB). Even if not used in our default configuration, you can easily enable them to get separate samples for each modality using the `track_mid` (modality labels) from a batch returned by the dataloader.
- Define Custom Losses:
  - If your method requires unique loss functions, implement them in the `losses/` directory (e.g., `losses/your_new_loss.py`).
  - Import and initialize them in your training script:

    ```python
    from losses.your_new_loss import YourNewLoss
    # ... inside the script ...
    criterion_new = YourNewLoss().cuda()
    ```
- Initialize Model:
  ```python
  model = YourMethod(sample_seq_num, num_train_class).cuda()
  ```
- Optimization Loop:
  - Update the forward/backward logic inside the training loop (`for batch_idx, batch_data in enumerate(train_loader):`). A compressed sketch of such a loop, with AMP and logging, is given after this list.
  - Flexibility: You are free to customize the optimization flow to support complex paradigms, such as Generative Adversarial Networks (GANs, with alternating optimizer steps) or multi-stage training, by modifying how `optimizer.step()` and gradients are handled here.
- Custom Logging & Visualization:
  - Text Logs: Since `sys.stdout` is redirected to the `Logger` class, any `print()` statement inside the training loop will automatically be saved to the log file in `ckptlog/`. You can modify the formatted string in the loop to include your new losses or any other values.
  - TensorBoard: Use the initialized `writer` object to track your new metrics. Simply add `writer.add_scalar('group/metric_name', value, iter_num)` alongside the existing logging code to visualize your custom curves in real time.
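As a reference point for the items above, here is a compressed, hedged sketch of an optimization loop with AMP and logging. The names (`train_loader`, `optimizer`, `writer`, `criterion_new`) mirror the structure of `train_m3reid.py` but are assumptions rather than the exact code, and the batch layout is simplified.

```python
# Compressed sketch of a training loop with AMP and TensorBoard logging.
# Assumed names: the model returns (video_feat, logits) in training mode, batches
# yield (images, identity labels, ...), and `writer` is a SummaryWriter.
import torch
import torch.nn.functional as F

def train_one_epoch(model, train_loader, optimizer, criterion_new, writer,
                    epoch, log_interval=50, fp16=True):
    scaler = torch.cuda.amp.GradScaler(enabled=fp16)
    model.train()
    for batch_idx, batch_data in enumerate(train_loader):
        imgs, pids = batch_data[0].cuda(), batch_data[1].cuda()
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=fp16):
            video_feat, logits = model(imgs)
            loss = criterion_new(video_feat, pids) + F.cross_entropy(logits, pids)
        scaler.scale(loss).backward()   # AMP: scale the loss before backward
        scaler.step(optimizer)
        scaler.update()
        if batch_idx % log_interval == 0:
            iter_num = epoch * len(train_loader) + batch_idx
            # print() output is captured by the Logger, so it also lands in ckptlog/.
            print(f'Epoch {epoch} Iter {batch_idx} Loss {loss.item():.4f}')
            writer.add_scalar('train/total_loss', loss.item(), iter_num)
```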
Create a new testing script test_yourmethod.py. Duplicate test_m3reid.py and update the model import:
```python
# from models.model_m3reid import M3ReID  <-- Replace with your model
from models.model_yourmethod import YourMethod

# ... inside the script ...
model = YourMethod(sample_seq_num, num_train_class).cuda()
```

Now you can run your own method using:
```bash
python train_yourmethod.py --dataset HITSZVCM ...
python test_yourmethod.py --dataset HITSZVCM ...
```

Enjoy your research journey!
If you find this code or paper useful for your research, please cite:
```bibtex
@article{TIFS_M3ReID,
author = {Tengfei Liang and
Yi Jin and
Zhun Zhong and
Xin Chen and
Xianjia Meng and
Tao Wang and
Yidong Li},
title = {M$^3$-ReID: Unifying Multi-View, Granularity, and Modality
for Video-Based Visible-Infrared Person Re-Identification},
journal = {IEEE Transactions on Information Forensics and Security},
year = {2025}
}
```

This repository is released under the MIT license. Please see the LICENSE file for more information.
